Professional Documents
Culture Documents
Shu Hotta - Mathematical Physical Chemistry_ Practical and Intuitive Methodology-Springer (2023)
Shu Hotta - Mathematical Physical Chemistry_ Practical and Intuitive Methodology-Springer (2023)
Mathematical
Physical
Chemistry
Practical and Intuitive Methodology
Third Edition
Mathematical Physical Chemistry
Shu Hotta
Mathematical Physical
Chemistry
Practical and Intuitive Methodology
Third Edition
Shu Hotta
Takatsuki, Osaka, Japan
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
To the memory of
my brother Kei Hotta
and
Emeritus Professor Yûsaku Hamada
Preface to the Third Edition
This book is the third edition of Mathematical Physical Chemistry. Although the
main concept and main theme remain unchanged, this book’s third edition contains
the introductory quantum theory of fields. The major motivation for including the
relevant topics in the present edition is as follows: The Schrödinger equation has
prevailed as a quantum-mechanical fundamental equation up to the present date in
many fields of natural science. The Dirac equation was formulated as a relativisti-
cally covariant expression soon after the Schrödinger equation was discovered. In
contrast to the Schrödinger equation, however, the Dirac equation is less prevailing
even nowadays.
One of the major reasons for this is that the Dirac equation has been utilized
almost exclusively by elementary particle physicists and related researchers. And
what is more, a textbook of the elementary particle physics or quantum theory of
fields cannot help but be thick because it has to deal with so many topics that require
a detailed explanation, starting with, e.g., renormalization and regularization as well
as the underlying gauge theory. Such a situation inevitably leads to significant
reduction of descriptions about fundamental properties of the Dirac equation.
Under these circumstances, the author has aimed to explain the fundamental prop-
erties of the Dirac equation in a plain style so that scientific audience of various fields
of specialization can readily understand the essence. In particular, the author
described in detail the transformation properties of the Dirac equation (or Dirac
operator) and its plane wave solutions in terms of standard matrix algebra.
Another motivation for writing the present edition lies in extending the concepts
of vector spaces. The theory of linear vector spaces has been developed in previous
editions, in which the inner product space is intended as a vector space. The Dirac
equation was constructed so as to be consistent with the requirement of the special
theory of relativity that is closely connected to the Minkowski space, a type of vector
space. In relation to the Minkowski space, moreover, various kinds of vector spaces
play a role in the quantum theory of fields. Examples include tensor space and spinor
space. In particular, the tensor spaces are dealt with widely in different fields of
vii
viii Preface to the Third Edition
natural science. In this book, therefore, the associated theory of tensor spaces has
been treated rather systematically along with related concepts.
As is the case with the previous editions of this book, the author has disposed the
mathematical topics at the last chapter of individual parts (Parts I–V). In the present
edition, we added a chapter that dealt with advanced topics of Lie algebra at the end
of the book. The description has been improved in several parts so that readers can
gain clear understanding. Errors found in the previous editions have been corrected.
As in the case of the previous editions, readers benefit from going freely back and
forth across the whole topics of this book.
The author owes sincere gratitude to the late Emeritus Professor Yûsaku Hamada
in Kyoto Institute of Technology for a course of excellent lectures delivered by him
on applied mathematics in the undergraduate days of the author. The author is deeply
grateful as well to Emeritus Professor Chiaki Tsukamoto in Kyoto Institute of
Technology for his helpful discussions and suggestions on exponential functions
of matrices (Chap. 15). Once again, the author wishes to thank many students for
valuable discussions and Dr. Shin’ichi Koizumi, Springer, for giving him an oppor-
tunity to write this book. Finally, the author is most grateful to his wife, Kazue Hotta,
for continually encouraging him to write the book.
ix
x Preface to the Second Edition
Once again, the author wishes to thank many students for valuable discussions
and Dr. Shin’ichi Koizumi, Springer, for giving him an opportunity to write
this book.
The contents of this book are based upon manuscripts prepared for both undergrad-
uate courses of Kyoto Institute of Technology by the author entitled “Polymer
Nanomaterials Engineering” and “Photonics Physical Chemistry” and a master’s
course lecture of Kyoto Institute of Technology by the author entitled “Solid-State
Polymers Engineering.”
This book is intended for graduate and undergraduate students, especially those
who major in chemistry and, at the same time, wish to study mathematical physics.
Readers are supposed to have basic knowledge of analysis and linear algebra.
However, they are not supposed to be familiar with the theory of analytic functions
(i.e., complex analysis), even though it is desirable to have relevant knowledge
about it.
At the beginning, mathematical physics looks daunting to chemists, as used to be
the case with myself as a chemist. The book introduces the basic concepts of
mathematical physics to chemists. Unlike other books related to mathematical
physics, this book makes a reasonable selection of material so that students majoring
in chemistry can readily understand the contents in spontaneity. In particular, we
stress the importance of practical and intuitive methodology. We also expect engi-
neers and physicists to benefit from reading this book.
In Parts I and II, the book describes quantum mechanics and electromagnetism.
Relevance between the two is well considered. Although quantum mechanics covers
the broad field of modern physics, in Part I we focus on a harmonic oscillator and a
hydrogen (like) atom. This is because we can study and deal with many of funda-
mental concepts of quantum mechanics within these restricted topics. Moreover,
knowledge acquired from the study of the topics can readily be extended to practical
investigation of, e.g., electronic sates and vibration (or vibronic) states of molecular
systems. We describe these topics by both analytic method (that uses differential
equations) and operator approach (using matrix calculations). We believe that the
basic concepts of quantum mechanics can be best understood by contrasting the
analytical and algebraic approaches. For this reason, we give matrix representations
xi
xii Preface to the First Edition
operators and matrices. Also, projection operators and nilpotent matrices appear in
many parts along with their tangible applications to individual topics. Hence, readers
are recommended to carefully examine and compare the related contents throughout
the book. We believe that readers, especially chemists, benefit from the writing style
of this book, since it is suited to chemists who are good at intuitive understanding.
The author would like to thank many students for their valuable suggestions and
discussions at the lectures. The author also wishes to thank many students for
valuable discussions and Dr. Shin’ichi Koizumi, Springer, for giving him an oppor-
tunity to write this book.
xv
xvi Contents
Part II Electromagnetism
7 Maxwell’s Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
7.1 Maxwell’s Equations and Their Characteristics . . . . . . . . . . . 277
7.2 Equation of Wave Motion . . . . . . . . . . . . . . . . . . . . . . . . . . 284
7.3 Polarized Characteristics of Electromagnetic Waves . . . . . . . . 289
7.4 Superposition of Two Electromagnetic Waves . . . . . . . . . . . . 294
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
Contents xvii
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1305
Part I
Quantum Mechanics
In the last part, we study the theory of analytic functions, one of the most elegant
theories of mathematics. This approach not only helps cultivate a broad view of pure
mathematics, but also leads to the acquisition of practical methodology. The last part
deals with the introductory set theory and topology as well.
Chapter 1
Schrödinger Equation and Its Application
E = hν = ħω, ð1:1Þ
where ħ h/2π and ω = 2πν. The quantity ω is called angular frequency with ν
being frequency. The quantity ħ is said to be a reduced Planck constant.
Also Einstein (1917) concluded that momentum of light quantum p is identical to
the energy of light quantum divided by light velocity in vacuum c. That is, we have
p = ħk, ð1:3Þ
where k 2πλ n (n: a unit vector in the direction of propagation of light) is said to be a
wavenumber vector.
Meanwhile, Arthur Compton (1923) conducted various experiments where he
investigated how an incident X-ray beam was scattered by matter (e.g., graphite,
copper, etc.). As a result, Compton found out a systematical redshift in X-ray
wavelengths as a function of scattering angles of the X-ray beam (Compton effect).
Moreover he found that the shift in wavelengths depended only on the scattering
angle regardless of quality of material of a scatterer. The results can be summarized
in a simple equation described as
h
Δλ = ð1 - cos θÞ, ð1:4Þ
me c
λe h=me c: ð1:5Þ
In other words, λe is equal to the maximum shift in the wavelength of the scattered
beam; this shift is obtained when θ = π/2. The quantity λe is called an electron
Compton wavelength and has an approximate value of 2.426 × 10-12 (m).
Let us derive (1.4) on the basis of conservation of energy and momentum. To this
end, in Fig. 1.1 we assume that an electron is originally at rest. An X-ray beam is
incident to the electron. Then the X-ray is scattered and the electron recoils as shown.
The energy conservation reads as
ħω þ me c2 = ħω0 þ p2 c2 þ m e 2 c4 , ð1:6Þ
1.1 Early-Stage Quantum Theory 5
(b)
−
scattered X-ray ( )
Fig. 1.1 Scattering of an X-ray beam by an electron. (a) θ denotes a scattering angle of the X-ray
beam. (b) Conservation of momentum
where ω and ω′ are initial and final angular frequencies of the X-ray; the second term
of RHS is an energy of the electron in which p is a magnitude of momentum after
recoil. Meanwhile, conservation of the momentum as a vector quantity reads as
ħk = ħk0 þ p, ð1:7Þ
where k and k′ are wavenumber vectors of the X-ray before and after being scattered;
p is a momentum of the electron after recoil. Note that an initial momentum of the
electron is zero since the electron is originally at rest. Here p is defined as
p mu, ð1:8Þ
Figure 1.1 shows that -ħk, ħk′, and p form a closed triangle.
From (1.6), we have
2
me c2 þ ħðω - ω0 Þ = p2 c2 þ me 2 c4 : ð1:10Þ
Hence, we get
2me c2 ħðω - ω0 Þ þ ħ2 ðω - ω0 Þ = p2 c2 :
2
ð1:11Þ
= ħ2 ðk - k0 Þ = ħ2 k2 þ k0 - 2kk 0 cos θ
2 2
p2
ð1:12Þ
ħ2 2
ω þ ω0 - 2ωω0 cos θ ,
2
=
c2
where we used the relations ω = ck and ω′ = ck′ with the third equality. Therefore,
we get
p2 c2 = ħ2 ω2 þ ω0 - 2ωω0 cos θ :
2
ð1:13Þ
That is,
Thus, we get
ω - ω0 1 1 1 0 ħ
= 0- = ð λ - λÞ = ð1 - cos θÞ, ð1:16Þ
ωω0 ω ω 2πc m e c2
where λ and λ′ are wavelengths of the initial and final X-ray beams, respectively.
Since λ′ - λ = Δλ, we have (1.4) from (1.16) accordingly.
We have to mention another important person, Louis-Victor de Broglie (1924) in
the development of quantum mechanics. Encouraged by the success of Einstein and
Compton, he propounded the concept of matter wave, which was referred to as the
de Broglie wave afterward. Namely, de Broglie reversed the relationship of (1.1) and
(1.2) such that
ω = E=ħ, ð1:17Þ
and
where p equals |p| and λ is a wavelength of a corpuscular beam. This is said to be the
de Broglie wavelength. In (1.18), de Broglie thought that a particle carrying an
1.1 Early-Stage Quantum Theory 7
p
u= : ð1:19Þ
me 1 þ ðp=me cÞ2
p ≈ me u:
Ee = p2 c2 þ me 2 c4 : ð1:20Þ
mc2 = p2 c2 þ me 2 c4 :
Ee = mc2 : ð1:21Þ
The relation (1.21) is due to Einstein (1905, 1907) and is said to be the equiva-
lence theorem of mass and energy.
If an electron is accompanied by a matter wave, that wave should be propagated
with a certain phase velocity vp and a group velocity vg. Thus, using (1.17) and (1.18)
we have
vp = ω=k = Ee =p = p2 c2 þ me 2 c4 =p > c,
vg = ∂ω=∂k = ∂E e =∂p = c2 p= p2 c2 þ me 2 c4 < c, ð1:22Þ
vp vg = c :
2
photons (or light quanta), vp = vg = c and, hence, once again we get vpvg = c2. We
will encounter the last relation of (1.22) in Part II as well.
The above discussion is a brief historical outlook of early-stage quantum theory
before Erwin Schrödinger (1926) propounded his equation.
2
1 ∂ ψ
— 2ψ = , ð1:23Þ
v2 ∂t 2
2 2 2
∂ ∂ ∂
—2 þ þ : ð1:24Þ
∂x2 ∂y2 ∂z2
One of the special solutions for (1.24) called a plane wave is well-studied and
expressed as
x
x = ðe1 e2 e3 Þ y , ð1:26Þ
z
where e1, e2, and e3 denote basis vectors of an orthonormal base pointing to positive
directions of x-, y-, and z-axes, respectively. Here we make it a rule to represent basis
vectors by a row vector and represent a coordinate or a component of a vector by a
column vector; see Sect. 11.1.
The other way around, now we wish to seek a basic equation whose solution is
described as (1.25). Taking account of (1.1)–(1.3) as well as (1.17) and (1.18), we
rewrite (1.25) as
1.2 Schrödinger Equation 9
ψ = ψ 0 eiðħx - ħ tÞ ,
p E
ð1:27Þ
px
where we redefine p = ðe1 e2 e3 Þ py and E as quantities associated with those of
pz
matter (electron) wave. Taking partial differentiation of (1.27) with respect to x, we
obtain
∂ψ
= px ψ 0 eiðħx - ħ tÞ = px ψ:
i p E i
ð1:28Þ
∂x ħ ħ
ħ ∂ψ
= px ψ: ð1:29Þ
i ∂x
Similarly we have
ħ ∂ψ ħ ∂ψ
= py ψ and = pz ψ: ð1:30Þ
i ∂y i ∂z
ħ ∂ ħ ∂ ħ ∂
$ px , $ py , $ pz : ð1:31Þ
i ∂x i ∂y i ∂z
2
∂ ψ 2
ψ 0 eiðħx - ħ tÞ = -
i p E 1 2
= px p ψ: ð1:32Þ
∂x 2 ħ ħ2 x
Hence,
2
∂ ψ
- ħ2 = p2x ψ: ð1:33Þ
∂x2
Similarly we have
10 1 Schrödinger Equation and Its Application
2 2
∂ ψ ∂ ψ
- ħ2 = p2y ψ and - ħ2 = p2z ψ: ð1:34Þ
∂y2 ∂z2
2 2 2
∂ ∂ ∂
- ħ2 $ p2x , - ħ2 $ p2y , - ħ2 $ p2z : ð1:35Þ
∂x2 ∂y2 ∂z2
Summing both sides of (1.33) and (1.34) and then dividing by 2m, we have
ħ2 2 p2
- — ψ= ψ ð1:36Þ
2m 2m
ħ2 2 p2
- — $ , ð1:37Þ
2m 2m
∂ψ
= - Eψ 0 eiðħx - ħ tÞ = - Eψ:
i p E i
ð1:38Þ
∂t ħ ħ
That is,
∂ψ
iħ = Eψ: ð1:39Þ
∂t
∂
iħ $ E: ð1:40Þ
∂t
∂ψ ħ2 2 p2
iħ þ — ψ = E- ψ: ð1:41Þ
∂t 2m 2m
we have
p2
E= þ V, ð1:43Þ
2m
∂ψ ħ2 2
iħ þ — ψ = Vψ: ð1:44Þ
∂t 2m
ħ2 2 ∂ψ
- — þ V ψ = iħ : ð1:45Þ
2m ∂t
ħ2 2
H - — þ V: ð1:46Þ
2m
∂ψ
Hψ = iħ : ð1:47Þ
∂t
me c2
K= - m e c2 :
2
1 - ðu=cÞ
1 u 2 1 p2
K ≈ m e c2 1 þ - m e c2 = me u2 ≈ ,
2 c 2 2me
1 1
p ≈1 þ x
1-x 2
2
when x (>0) corresponding to uc is enough small than 1. This implies that in the
above case the group velocity u of a particle is supposed to be well below light
velocity c. Dirac (1928), however, formulated an equation that describes relativistic
quantum mechanics of electron (the Dirac equation). The detailed description with
respect to the Dirac equation will be found in Part V.
In (1.45) ψ varies as a function of x and t. Suppose, however, that a potential
V depends only upon x. Then we have
ħ2 2 ∂ψ ðx, t Þ
- — þ V ðxÞ ψ ðx, t Þ = iħ : ð1:48Þ
2m ∂t
Now, let us assume that separation of variables can be done with (1.48) such that
Then, we have
ħ2 2 ∂ϕðxÞξðt Þ
- — þ V ðxÞ ϕðxÞξðt Þ = iħ : ð1:50Þ
2m ∂t
ħ2 2 ∂ξðt Þ
- — þ V ðxÞ ϕðxÞ=ϕðxÞ = iħ =ξðt Þ: ð1:51Þ
2m ∂t
For (1.51) to hold, we must equate both sides to a constant E. That is, for a certain
fixed point x0 we have
ħ2 2 ∂ξðt Þ
- — þ V ðx0 Þ ϕðx0 Þ=ϕðx0 Þ = iħ =ξðt Þ, ð1:52Þ
2m ∂t
where ϕ(x0) of a numerator should be evaluated after operating — 2, while with ϕ(x0)
in a denominator, ϕ(x0) is evaluated simply replacing x in ϕ(x) with x0. Now, let us
define a function Φ(x) such that
1.2 Schrödinger Equation 13
ħ2 2
Φ ð xÞ - — þ V ðxÞ ϕðxÞ=ϕðxÞ: ð1:53Þ
2m
Then, we have
∂ξðt Þ
Φðx0 Þ = iħ =ξðt Þ: ð1:54Þ
∂t
Equation (1.56) can readily be solved. Since (1.56) depends on a sole variable t,
we have
dξðt Þ E E
= dt or d ln ξðt Þ = dt: ð1:57Þ
ξðt Þ iħ iħ
ξðt Þ Et
ln = : ð1:58Þ
ξ ð 0Þ iħ
That is,
Comparing (1.59) with (1.38), we find that the constant E in (1.55) and (1.56)
represents an energy of a particle (electron).
Thus, the next task we want to do is to solve an eigenvalue equation of (1.55).
After solving the problem, we get a solution
where the constant ξ(0) has been absorbed in ϕ(x). Normally, ϕ(x) is to be normal-
ized after determining the functional form (vide infra).
14 1 Schrödinger Equation and Its Application
The Schrödinger equation has been expressed as (1.48). The equation is a second-
order linear differential equation (SOLDE). In particular, our major interest lies in
solving an eigenvalue problem of (1.55). Eigenvalues consist of points in a complex
plane. Those points sometimes form a continuous domain, but we focus on the
eigenvalues that comprise discrete points in the complex plane. Therefore, in our
studies the eigenvalues are countable and numbered as, e.g., λn (n = 1, 2, 3, ⋯). An
example is depicted in Fig. 1.2. Having this common belief as a background, let us
first think of a simple form of SOLDE.
Example 1.1 Let us think of a following differential equation:
d 2 yð xÞ
þ λyðxÞ = 0, ð1:61Þ
dx2
The BCs of (1.62) are called Dirichlet conditions. We define the following
differential operator D described as
d2
D - : ð1:63Þ
dx2
This is because the above functions do not change a functional form with respect
to the differentiation and we ascribe solving a differential equation to solving an
algebraic equation among constants (or parameters). In the present case, λ and k are
such constants.
The parameter k could be a complex variable, because λ is allowed to take a
complex value as well. Linear independence of these functions is ensured from a
non-vanishing Wronskian, W. That is,
where a and b are (complex) constant. We call two linearly independent solutions
eikx and e-ikx (k ≠ 0) a fundamental set of solutions of a SOLDE. Inserting (1.66) into
(1.61), we have
λ - k 2 = 0, i:e:, λ = k2 : ð1:68Þ
eikL e - ikL a 0
= : ð1:70Þ
e - ikL eikL b 0
eikL e - ikL
= 0, i:e:, e2ikL - e - 2ikL = 0: ð1:71Þ
e - ikL eikL
It is because if (1.71) were not zero, we would have a = b = 0 and y(x) 0. Note
that with an eigenvalue problem we must avoid having a solution that is identically
zero. Rewriting (1.71), we get
or
eikL ða - bÞ = 0: ð1:75Þ
Therefore,
a = b, ð1:76Þ
where we used the fact that eikL is a non-vanishing function for any ikL (either real or
complex). Similarly, in the case of (1.74), we have
a = - b: ð1:77Þ
cos kL = 0: ð1:80Þ
Hence,
π
kL = þ mπ ðm = 0, ± 1, ± 2, ⋯Þ: ð1:81Þ
2
π π
In (1.81), for instance, we have k = 2L for m = 0 and k = - 2L for m = - 1. Also,
we have k = 2L for m = 1 and k = - 2L for m = - 2. These cases, however,
3π 3π
individually give linearly dependent solutions for (1.78). Therefore, to get a set of
linearly independent eigenfunctions we may define k as positive. Correspondingly,
from (1.68) we get eigenvalues of
sin kL = 0: ð1:83Þ
Hence,
kL = nπ ðn = 1, 2, 3, ⋯Þ: ð1:84Þ
where we chose positive numbers n for the same reason as the above. With the
second equality of (1.85), we made eigenvalues easily comparable to those of (1.82).
Figure 1.3 shows the eigenvalues given in both (1.82) and (1.85) in a unit of π 2/4L2.
From (1.82) and (1.85), we find that λ is positive definite (or strictly positive), and
so from (1.68) we have
18 1 Schrödinger Equation and Its Application
(x /4 )
+∞
0 1 4 9 16 25 ⋯
Fig. 1.3 Eigenvalues of a differential equation (1.61) under boundary conditions given by (1.62).
The eigenvalues are given in a unit of π 2/4L2 on a real axis
p
k = λ: ð1:86Þ
L L
I= yðxÞ yðxÞdx = jyðxÞj2 dx = 1: ð1:87Þ
-L -L
That is,
L L
1
I = 4jaj2 cos 2 kxdx = 4jaj2 ð1 þ cos 2kxÞdx
-L -L 2
ð1:88Þ
L
21 2
= 2jaj x þ sin 2kx = 4Ljaj :
2k -L
1 1
j aj = : ð1:89Þ
2 L
Thus, we have
1 1 iθ
a= e , ð1:90Þ
2 L
where θ is any real number and eiθ is said to be a phase factor. We usually set eiθ 1.
Then, we have a = 1
2
1
L. Thus for a normalized cosine eigenfunctions, we get
1 π
yðxÞ = cos kx kL = þ mπ ðm = 0, 1, 2, ⋯Þ ð1:91Þ
L 2
1
yð xÞ = sin kx ½kL = nπ ðn = 1, 2, 3, ⋯Þ ð1:92Þ
L
1 iz
cos z e þ e - iz ; z = x þ iy ðx, y : realÞ: ð1:93Þ
2
1
cos z = ½cos xðey þ e - y Þ þ i sin xðe - y - ey Þ: ð1:94Þ
2
For cos z to vanish, both its real and imaginary parts must be zero. Since
e y + e-y > 0 for all real numbers y, we must have cos x = 0 for the real part to
vanish; i.e.,
π
x= þ mπ ðm = 0, ± 1, ± 2, ⋯Þ: ð1:95Þ
2
Note in this case that sin x = ± 1(≠0). Therefore, for the imaginary part to vanish,
e-y - e y = 0. That is, we must have y = 0. Consequently, the zeros of cos z are real
numbers. In other words, with respect to z0 that satisfies cos z0 = 0 we have
π
z0 = þ mπ ðm = 0, ± 1, ± 2, ⋯Þ: ð1:96Þ
2
Example 1.2 A particle confined within a potential well The results obtained in
Example 1.1 can immediately be applied to dealing with a particle (electron) in a
one-dimensional infinite potential well. In this case, (1.55) reads as
ħ2 d 2 ψ ð xÞ
þ Eψ ðxÞ = 0, ð1:97Þ
2m dx2
0 ð - L ≤ x ≤ LÞ,
V ð xÞ =
1 ð - L > x; x > LÞ:
d2 ψ ðxÞ 2mE
þ 2 ψ ð xÞ = 0 ð1:98Þ
dx2 ħ
with BCs
ψ ðLÞ = ψ ð- LÞ = 0: ð1:99Þ
ħ2 λ
E= ð1:100Þ
2m
with λ = k2 in (1.68). For k we use the values of (1.81) and (1.84). Therefore, with
energy eigenvalues we get either
2
ħ2 ð2l þ 1Þ π 2
E= ðl = 0, 1, 2, ⋯Þ,
2m 4L2
to which
1 π
ψ ð xÞ = cos kx kL = þ lπ ðl = 0, 1, 2, ⋯Þ ð1:101Þ
L 2
corresponds or
1.4 Quantum-Mechanical Operators and Matrices 21
2
ħ2 ð2nÞ π 2
E= ðn = 1, 2, 3, ⋯Þ,
2m 4L2
to which
1
ψ ð xÞ = sin kx ½kL = nπ ðn = 1, 2, 3, ⋯Þ ð1:102Þ
L
corresponds.
Since the particle behaves as a free particle within the potential well (-L ≤ x ≤ L )
and p = ħk, we obtain
p2 ħ2 2
E= = k ,
2m 2m
where
PΨ = pΨ : ð1:103Þ
clear understanding about it. In Part III, we will deal with matrix calculation in detail
from a point of view of a general principle. At present, a (2, 2) matrix suffices. Let
A be a (2, 2) matrix expressed as
a b
A= : ð1:104Þ
c d
e
j ψi = : ð1:105Þ
f
Note that operating (2, 2) matrix on a (2, 1) matrix produces another (2, 1) matrix.
Furthermore, we define an adjoint matrix A{ such that
a c
A{ = , ð1:106Þ
b d
In this case, jψi{ also denotes a complex conjugate transpose of jψi. The notation
jψi and hψj are due to Dirac. He named hψj and jφi a bra vector and ket vector,
respectively. This naming or equivoque comes from that hψ| | φi = hψ| φi forms a
bracket. This is a (1, 2) × (2, 1) = (1, 1) matrix, i.e., a c-number (including a complex
number) and hψ| φi represents an inner product. These notations are widely used
nowadays in the field of mathematics and physics.
g
Taking another vector j ξi = and using a matrix calculation rule, we have
h
a c e a e þ c f
A{ j ψi = A{ ψi = = : ð1:108Þ
b d f b e þ d f
Thus, we get
1.4 Quantum-Mechanical Operators and Matrices 23
g
A{ ψjξ = ðae þ cf be þ df Þ = ðag þ bhÞe þ ðcg þ dhÞf : ð1:110Þ
h
Similarly, we have
a b g
hψjAξi = ðe f Þ = ðag þ bhÞe þ ðcg þ dhÞf : ð1:111Þ
c d h
Also, we have
{
A{ ψjξ = ψjA{ ξ : ð1:114Þ
{
A{ = A: ð1:115Þ
where the second equality comes from (1.113) obtained by exchanging ψ and ξ
there. Moreover, we have a following relation:
ðABÞ{ = B{ A{ : ð1:117Þ
Making an inner product by multiplying jξi from the right of the leftmost and
rightmost sides of (1.118) and using (1.116), we get
24 1 Schrödinger Equation and Its Application
This relation may be regarded as the associative law with regard to the symbol “j”
of the inner product. This is equivalent to the associative law with regard to the
matrix multiplication.
The results obtained above can readily be extended to a general case where (n, n)
matrices are dealt with.
Now, let us introduce an Hermitian operator (or matrix) H. When we have
H { = H, ð1:119Þ
A norm is a natural extension for a notion of a “length” of a vector. The norm kψk
is zero, if and only if jψ i = 0 (zero vector). For, from (1.105) and (1.107), we have
hψjψ i = jej2 þ jf j2 :
H j ψi = λ j ψi, ð1:122Þ
where we assume that jψi is normalized; namely hψ| ψi = 1 or ||ψ|| = 1. Notice that
the symbol “j” in an inner product is of secondary importance. We may disregard
this notation as in the case where a product notation “×” is omitted by denoting ab
instead of a × b.
Taking a complex conjugate of (1.123), we have
1.4 Quantum-Mechanical Operators and Matrices 25
hψjHψ i = λ : ð1:124Þ
where with the third equality we used the definition (1.119) for the Hermitian
operator. Hence, the relation λ = λ obviously shows that any eigenvalue λ is real,
if H is Hermitian. The relation (1.125) immediately tells us that even though jψi is
not an eigenfunction, hψ| Hψi is real as well, if H is Hermitian. The quantity hψ| Hψi
is said to be an expectation value. This value is interpreted as the most probable or
averaged value of H obtained as a result of operation of H on a physical state jψi. We
sometimes denote the expectation value as
hH i hψjHψ i, ð1:126Þ
where jψi is normalized. Unless jψi is normalized, it can be normalized on the basis
of (1.121) by choosing jΦi such that
j Φi = j ψi= ψ : ð1:127Þ
b
hgjf i gðxÞ f ðxÞdx, ð1:128Þ
a
b
jf ðxÞj2 dx < 1: ð1:129Þ
a
Using the above definition, let us calculate hg| Dfi, where D was defined in (1.63).
Then, using the integration by parts, we have
b
d 2 f ðxÞ b
gð x Þ -
0
dx = - ½g f 0 a þ g f 0 dx
b
hgjDf i =
a dx2 a
b b b b
0 00 0 00
- ½g f 0 a þ g f g fdx = g f - g f 0 - g f dx
b
= - þ
a a a a
b
0 0
= g f -g f þ hDgjf i:
a
ð1:130Þ
f ð bÞ = f ð aÞ = 0 and
ð1:131Þ
gð bÞ = gð aÞ = 0 i:e:, gðbÞ = gðaÞ = 0,
we get
In light of (1.120), (1.132) implies that D is Hermitian. In (1.131), notice that the
functions f and g satisfy the same BCs. Normally, for an operator to be Hermitian has
this property. Thus, the Hermiticity of a differential operator is closely related to BCs
of the differential equation.
Next, we consider a following inner product:
b b b
0
f f 00 dx = - ½f f 0 a þ f f 0 dx = - ½f f 0 a þ jf 0 j dx: ð1:133Þ
b b 2
hf jDf i = -
a a a
Note that the definite integral of (1.133) cannot be negative. There are two
possibilities for D to be Hermitian according to different BCs.
1. Dirichlet conditions: f(b) = f(a) = 0. If we could have f ′ = 0, hfj Dfi would be
zero. But, in that case f should be constant. If so, f(x) 0 according to BCs. We
must exclude this trivial case. Consequently, to avoid this situation we must have
1.4 Quantum-Mechanical Operators and Matrices 27
b
jf 0 j dx > 0
2
or hf jDf i > 0: ð1:134Þ
a
In this case, the operator D is said to be positive definite. Suppose that such a
positive-definite operator has an eigenvalue λ. Then, for a corresponding
eigenfunction y(x) we have
Both hy| Dyi and ||y||2 are positive and, hence, we have λ > 0. Thus, if D has an
eigenvalue, it must be positive. In this case, λ is said to be positive definite as
well; see Example 1.1.
2. Neumann conditions: f ′(b) = f ′(a) = 0. From (1.130), D is Hermitian as well.
Unlike the condition (1), however, f may be a non-zero constant in this case.
Therefore, we are allowed to have
b
jf 0 j dx = 0
2
or hf jDf i = 0: ð1:137Þ
a
hf jDf i ≥ 0: ð1:138Þ
where the presence of a unit matrix E is implied. Explicitly writing it, we have
The relations (1.140) and (1.141) are called the canonical commutation relation.
On the basis of a relation p = ħi ∂q
∂
, a brief proof for this is as follows:
G{ = - G, ð1:145Þ
1.5 Commutator and Canonical Commutation Relation 29
G j ψi = λ j ψi, ð1:146Þ
where G is an anti-Hermitian operator and jψi has been normalized. As in the case of
(1.123) we have
hψjGψ i = λ : ð1:148Þ
b
ħ ∂
hgjpf i = gðxÞ ½f ðxÞdx, ð1:150Þ
a i ∂x
where the domain [a, b] depends on a physical system; this can be either bounded or
unbounded. Performing integration by parts, we have
b
ħ ∂ ħ
½ gð x Þ f ð xÞ a - ½gðxÞ f ðxÞdx
b
hgjpf i =
i a ∂x i
b ð1:151Þ
ħ ħ ∂
= ½ gð bÞ f ð bÞ - gð aÞ f ð aÞ þ gðxÞ f ðxÞdx:
i a i ∂x
If we require f(b) = f(a) and g(b) = g(a), the first term vanishes and we get
30 1 Schrödinger Equation and Its Application
b
ħ ∂
hgjpf i = gðxÞ f ðxÞdx = hpgjf i: ð1:152Þ
a i ∂x
Thus, as in the case of (1.120), the momentum operator p is Hermitian. Note that a
position operator q of (1.142) is Hermitian as a priori assumption.
Meanwhile, the z-component of angular momentum operator Lz is described in a
polar coordinate as follows:
ħ ∂
Lz = , ð1:153Þ
i ∂ϕ
where ϕ is an azimuthal angle varying from 0 to 2π. The notation and implication of
Lz will be mentioned in Chap. 3. Similarly as the above, we have
2π
ħ ħ ∂
hgjLz f i = ½gð2π Þ f ð2π Þ - gð0Þ f ð0Þ þ gðxÞ f ðxÞdϕ: ð1:154Þ
i 0 i ∂ϕ
Note that we must have the above BC, because ϕ = 0 and ϕ = 2π are spatially the
same point. Thus, we find that Lz is Hermitian as well on this condition.
On the basis of aforementioned argument, let us proceed to quantum-mechanical
studies of a harmonic oscillator. Regarding the angular momentum, we will study
their basic properties in Chap. 3.
Reference
d 2 xð t Þ
m = - sxðt Þ, ð2:1Þ
dt 2
d 2 xð t Þ
þ ω2 xðt Þ = 0: ð2:2Þ
dt 2
ω= s=m, ð2:3Þ
where a and b are suitable constants. Let us consider BCs different from those of
Example 1.1 or 1.2 this time. That is, we set BCs such that
Notice that (2.5) gives initial conditions (ICs). Mathematically, ICs are included
in BCs (see Chap. 10). From (2.4) we have
v0 iωx v
xð t Þ = e - e - iωx = 0 sin ωt: ð2:7Þ
2iω ω
1
E=K þ V = mv 2 : ð2:8Þ
2 0
1 2 1
V ð qÞ = sq = mω2 q2 , ð2:9Þ
2 2
p2 p2 1
H= þ V ð qÞ = þ mω2 q2 : ð2:10Þ
2m 2m 2
Hψ ðqÞ = Eψ ðqÞ
or
ħ2 2 1
- — ψ ðqÞ þ mω2 q2 ψ ðqÞ = Eψ ðqÞ: ð2:11Þ
2m 2
This is a SOLDE and it is well known that the SOLDE can be solved by a power
series expansion method.
In the present studies, however, let us first use an operator method to solve the
eigenvalue equation (2.11) of a one-dimensional oscillator. To this end, we use a
quantum-mechanical Hamiltonian where a momentum operator p is explicitly
represented. Thus, the Hamiltonian reads as
p2 1
H= þ mω2 q2 : ð2:12Þ
2m 2
Equation (2.12) is formally the same as (2.10). Note, however, that in (2.12) p and
q are expressed as quantum-mechanical operators.
As in (1.126), we first examine an expectation value hHi of H. It is given by
34 2 Quantum-Mechanical Harmonic Oscillator
p2 1
hHi ¼ hψjHψi ¼ ψj ψ þ ψj mω2 q2 ψ
2m 2
1 1 1 1
¼ p{ ψjpψi þ mω2 q{ ψjqψi ¼ hpψjpψi þ mω2 hqψjqψi
2m 2 2m 2
1 1
¼ kpψ k2 þ mω2 kqψ k2 ≥ 0, ð2:13Þ
2m 2
where again we assumed that jψi has been normalized. In (2.13) we used the
notation (1.126) and the fact that both q and p are Hermitian. In this situation, hHi
takes a non-negative value.
In (2.13), the equality holds if and only if jpψi = 0 and jqψi = 0. Let us specify a
vector jψ 0i that satisfies these conditions such that
j pψ 0 i = 0 and j qψ 0 i = 0: ð2:14Þ
Multiplying q from the left on the first equation of (2.14) and multiplying p from
the left on the second equation, we have
qp j ψ 0 i = 0 and pq j ψ 0 i = 0: ð2:15Þ
Subtracting the second equation of (2.15) from the first equation, we get
where with the first equality we used (1.140). Therefore, we would have jψ 0(q)i 0.
This leads to the relations (2.14). That is, if and only if jψ 0(q)i 0, hHi = 0. But,
since it has no physical meaning, jψ 0(q)i 0 must be rejected as unsuitable for the
solution of (2.11). Regarding a physically acceptable solution of (2.13), hHi must
take a positive definite value accordingly. Thus, on the basis of the canonical
commutation relation, we restrict the range of the expectation values.
Instead of directly dealing with (2.12), it is well known to introduce the following
operators [1]:
mω i
a qþp p ð2:17Þ
2ħ 2mħω
mω i
a{ = q- p p: ð2:18Þ
2ħ 2mħω
Notice here again that both q and p are Hermitian. Using a matrix representation
for (2.17) and (2.18), we have
2.2 Formulation Based on an Operator Method 35
mω i
p
a 2ħ 2mħω q
= : ð2:19Þ
a{ mω
-p
i p
2ħ 2mħω
Then we have
where the second last equality comes from (1.140). Rewriting (2.20), we get
1
H = ħωa{ a þ ħω: ð2:21Þ
2
Similarly we get
1
H = ħωaa{ - ħω: ð2:22Þ
2
That is,
a, a{ = 1 or a, a{ = E, ð2:24Þ
1
H, a{ = ħω a{ a þ , a{ = ħω a{ aa{ - a{ a{ a = ħωa{ a, a{ = ħωa{ : ð2:25Þ
2
Similarly, we get
1 1
hψjHjψi ¼ ψjħωa{ a þ ħωjψ ¼ ħω ψja{ ajψi þ ħωhψjψi
2 2
1 1 1
2
¼ ħωhaψjaψi þ ħω ¼ ħωkaψ k þ ħω ≥ ħω: ð2:27Þ
2 2 2
Thus, the expectation value is equal to or larger than 12 ħω. This is consistent with
that an energy eigenvalue is positive definite as mentioned above. Equation (2.27)
also tells us that if we have
j aψ 0 i = 0, ð2:28Þ
we get
1
hψ 0 jHjψ 0 i = ħω: ð2:29Þ
2
Equation (2.29) means that the smallest expectation value is 12 ħω on the condition
of (2.28). On the same condition, using (2.21) we have
1 1
H j ψ 0 i = ħωa{ a j ψ 0 i þ ħω j ψ 0 i = ħω j ψ 0 i: ð2:30Þ
2 2
1
H j ψ 0i = ħω j ψ 0 i = E 0 j ψ 0 i: ð2:31Þ
2
a{ H j ψ 0 i = a{ E 0 j ψ 0 i: ð2:32Þ
(x )
This implies that a{ j ψ 0i belongs to an eigenvalue (E0 + ħω), which is larger than
E0 as expected. Again multiplying a{ on both sides of (2.34) from the left and using
(2.25), we get
2 2
H a{ j ψ 0 i = ðE 0 þ 2ħωÞ a{ j ψ 0 i: ð2:35Þ
This implies that (a{)2 j ψ 0i belongs to an eigenvalue (E0 + 2ħω). Thus, repeat-
edly taking the above procedures, we get
n n
H a{ j ψ 0 i = ðE 0 þ nħωÞ a{ j ψ 0 i: ð2:36Þ
1
En E 0 þ nħω = n þ ħω, ð2:37Þ
2
where En denotes an energy eigenvalue of the n-th excited state. The energy
eigenvalues are plotted in Fig. 2.1.
Our next task is to seek normalized eigenvectors of the n-th excited state. Let cn
be a normalization constant of that state. That is, we have
n
j ψ n i = c n a{ j ψ 0 i, ð2:38Þ
n n-1 n-1
a a{ ¼ aa{ - a{ a a{ þ a{ a a{
n-1 n-1 n-1 n-1
¼ a, a{ a{ þ a { a a{ ¼ a{ þ a{ a a{
n-1 n-2 2 n-2
¼ a{ þ a{ a, a{ a{ þ a { a a{
n-1 2 n-2 ð2:39Þ
¼ 2 a{ þ a{ a a {
n-1 2 n-3 3 n-3
¼ 2 a{ þ a{ a, a{ a{ þ a { a a{
n-1 3 n-3
¼ 3 a{ þ a{ a a {
¼ ⋯:
In the above procedures, we used [a, a{] = 1. What is implied in (2.39) is that a
coefficient of (a{)n - 1 increased one by one with a transferred toward the right one
by one in the second term of RHS. Notice that in the second term a is sandwiched
such that (a{)ma(a{)n - m (m = 1, 2, ⋯). Finally, we have
n n-1 n
a a{ = n a{ þ a{ a: ð2:40Þ
Meanwhile, operating a on both sides of (2.38) from the left and using (2.40), we
get
n n-1 n n-1
a ψ n i ¼ c n a a{ ψ 0 i ¼ c n n a{ þ a{ a j ψ 0 i ¼ c n n a{ j ψ 0i
cn n-1 cn ð2:41Þ
¼n c n - 1 a{ j ψ 0i ¼ n j ψ n - 1 i,
cn - 1 cn - 1
n n-1 n
a2 a{ ¼ na a{ þ a a{ a
n-2 n-1 n
¼ n ð n - 1Þ a { þ a{ a þ a a{ a ð2:42Þ
n-2 n-1 n
¼ n ð n - 1Þ a{ þ n a{ a þ a a{ a:
n n-2 n-1 n
a3 a{ = nðn - 1Þa a{ þ na a{ a þ a2 a{ a
n-3 n-2 n-1 n
= nðn - 1Þ ðn - 2Þ a{ þ a{ a þ na a{ a þ a2 a{ a
n-3 n-2 n-1 n
= nðn - 1Þðn - 2Þ a{ þ nðn - 1Þ a{ a þ na a{ a þ a2 a{ a
= ⋯:
ð2:43Þ
where m < n and f(a, a{) is a polynomial of a{ that has a power of a as a coefficient.
Further operating hψ 0j and jψ 0i from the left and right on both sides of (2.44),
respectively, we have
n n-m
ψ 0 jam a{ jψ 0 ¼ nðn - 1Þðn - 2Þ⋯ðn - m þ 1Þ ψ 0 j a{ jψ 0
þ ψ 0 j f a, a{ ajψ 0
n-m
¼ nðn - 1Þðn - 2Þ⋯ðn - m þ 1Þ ψ 0 j a{ jψ 0 : ð2:45Þ
hψ 0 j a{ = 0: ð2:46Þ
n-m
ψ 0 j a{ jψ 0 = 0:
n
ψ 0 jam a{ jψ 0 = 0: ð2:47Þ
m
ψ 0 jan a{ jψ 0 = 0: ð2:48Þ
Equation (2.48) can be obtained by repeatedly using (1.117). From (2.47) and
(2.48), we get
n
ψ 0 jam a{ jψ 0 = 0, ð2:49Þ
n
ψ 0 jan a{ jψ 0 = n!hψ 0 jψ 0 i: ð2:50Þ
n
ψ 0 jan a{ jψ 0 = n!: ð2:51Þ
1 n
j ψ n i = p a{ j ψ 0 i, ð2:52Þ
n!
then we have
1
hψ n j = p hψ 0 j an :
n!
hψ m jψ n i = δmn : ð2:53Þ
1
cn = p : ð2:54Þ
n!
Notice here that an undetermined phase factor eiθ (θ : real) is intended such that
1
cn = p eiθ :
n!
But eiθ is usually omitted for the sake of simplicity. Thus, we have constructed a
series of orthonormal eigenfunctions jψ ni.
Furthermore, using (2.41) and (2.54) we obtain
2.2 Formulation Based on an Operator Method 41
p
a j ψ n i = n j ψ n - 1 i: ð2:55Þ
H j ψ n i = ðE 0 þ nħωÞ j ψ n i: ð2:56Þ
H j ψ n i = ħωa{ a þ E0 j ψ n i: ð2:57Þ
a{ a j ψ n i = n j ψ n i: ð2:58Þ
{ {
a{ a = a{ a{ = a{ a, ð2:59Þ
Thus,
p
a{ j ψ n - 1 i = n j ψ n i: ð2:61Þ
Equations (2.55) and (2.62) clearly represent the relationship between an operator
and eigenfunction (or eigenvector). The relationship is characterized by
Thus, we are now in a position to construct this relation using matrices. From
(2.53), we should be able to construct basis vectors using a column vector such that
1 0 0
0 1 0
0 0 1
j ψ 0i = 0 , j ψ 1i = 0 , j ψ 2i = 0 , ⋯: ð2:64Þ
0 0 0
⋮ ⋮ ⋮
Notice that these vectors form a vector space of an infinite dimension. The
orthonormal relation (2.53) can easily be checked using (2.64). We represent
a and a{ so that (2.55) and (2.62) can be satisfied. We obtain
0 1 0 0 0 ⋯
p
0 0 2 0 0 ⋯
p
0 0 0 3 0 ⋯
a= : ð2:65Þ
0 0 0 0 2 ⋯
0 0 0 0 0 ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋱
Similarly,
0 0 0 0 0 ⋯
1 0 0 0 0 ⋯
p
0 2 0 0 0 ⋯
a{ = p : ð2:66Þ
0 0 3 0 0 ⋯
0 0 0 2 0 ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋱
Since a determinant of the matrix of (2.19) is -i/ħ ≠ 0, using its inverse matrix we
get
ħ ħ
q 2mω 2mω a
= : ð2:67Þ
p 1 mħω 1 mħω a{
-
i 2 i 2
ħ 1 mħω
q= a þ a{ and p= a - a{ : ð2:68Þ
2mω i 2
With the inverse matrix, we will deal with it in Part III. Note that q and p are both
Hermitian. Inserting (2.65) and (2.66) into (2.68), we get
0 1 0 0 0 ⋯
p
1 0 2 0 0 ⋯
p p
ħ 0 2 0 3 0 ⋯
q¼ p , ð2:69Þ
2mω 0 0 3 0 2 ⋯
0 0 0 2 0 ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋱
0 -1 0 0 0 ⋯
p
1 0 - 2 0 0 ⋯
p p
mħω 0 2 0 - 3 0 ⋯
p= i p : ð2:70Þ
2 0 0 3 0 -2 ⋯
0 0 0 2 0 ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋱
Equations (2.69) and (2.70) obviously show that q and p are Hermitian. We can
derive various physical quantities from these equations. For instance, following
matrix algebra Hamiltonian H can readily be calculated. The result is expressed as
44 2 Quantum-Mechanical Harmonic Oscillator
1 0 0 0 0 ⋯
0 3 0 0 0 ⋯
p2 1 ħω 0 0 5 0 0 ⋯
H= þ mω2 q2 = : ð2:71Þ
2m 2 2 0 0 0 7 0 ⋯
0 0 0 0 9 ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋱
1 0 0 0 0 ⋯
0 0 0
0 3 0 0 0 ⋯ 0 0 0
ħω 0 0 5 0 0 ⋯ 1 ħω 5 5ħω 1
Hjψ 2 i ¼ ¼ ¼
2 0 0 0 7 0 ⋯ 0 2 0 2 0
0 0 0 0 9 ⋯ 0 0 0
⋮ ⋮ ⋮
⋮
⋮ ⋮ ⋮ ⋮ ⋱
5ħω 1
¼ ψ 2i ¼ þ 2 ħω ψ 2 i:
2 2
ð2:72Þ
1 ħ mħω
qp - pq = a þ a{ a - a{ - a - a{ a þ a{
i 2mω 2 ð2:73Þ
ħ
= ð - 2Þ aa{ - a{ a = iħ a, a{ = iħ = iħE,
2i
where with the second last equality we used (2.24) and the identity matrix E of an
infinite dimension is given by
2.4 Coordinate Representation of Schrödinger Equation 45
1 0 0 0 0 ⋯
0 1 0 0 0 ⋯
0 0 1 0 0 ⋯
E= : ð2:74Þ
0 0 0 1 0 ⋯
0 0 0 0 1 ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋱
The Schrödinger equation has been given in (1.55) or (2.11) as a SOLDE form. In
contrast to the matrix representation, (1.55) and (2.11) are said to be coordinate
representation of Schrödinger equation. Now, let us derive a coordinate representa-
tion of (2.28) to obtain an analytical solution.
On the basis of (1.31) and (2.17), a is expressed as
mω i mω i ħ ∂ mω ħ
a= qþp p= qþp = qþ
2ħ 2mħω 2ħ 2mħω i ∂q 2ħ 2mω
∂
: ð2:75Þ
∂q
mω ħ ∂
qþ ψ ð qÞ = 0 ð2:76Þ
2ħ 2mω ∂q 0
or
mω ħ ∂ψ 0 ðqÞ
qψ ðqÞ þ = 0: ð2:77Þ
2ħ 0 2mω ∂q
∂ψ 0 ðqÞ mω
þ qψ 0 ðqÞ = 0: ð2:78Þ
∂q ħ
ψ 0 ðqÞ = N 0 e - αq ,
2
ð2:79Þ
mω
q N 0 e - αq = 0:
2
- 2αq þ ð2:80Þ
ħ
Hence, we get
mω
α= : ð2:81Þ
2ħ
Thus,
ψ 0 ðqÞ = N 0 e - 2ħ q :
mω 2
ð2:82Þ
1
π
e - cq dq =
2
ðc > 0Þ, ð2:84Þ
-1 c
we have
1
πħ
e-
mω 2
ħ q dq = : ð2:85Þ
-1 mω
1 - cq2
To get (2.84), putting I - 1e dq, we have
2.4 Coordinate Representation of Schrödinger Equation 47
1 1 1
e - cq dq e - cs ds = e - cð q þs2 Þ
2 2 2
I2 = dqds
-1 -1 -1
1 2π 1 2π
ð2:86Þ
- cr 2 1 - cR π
= e rdr dθ = e dR dθ = ,
0 0 2 0 0 c
mω 1=4 mω 1=4
e - 2ħ q :
mω 2
N0 = and ψ 0 ð qÞ = ð2:87Þ
πħ πħ
Also, we have
mω i mω i ħ ∂ mω ħ
a{ = q- p p= q- p = q-
2ħ 2mħω 2ħ 2mħω i ∂q 2ħ 2mω
∂
: ð2:88Þ
∂q
Putting
we rewrite (2.89) as
48 2 Quantum-Mechanical Harmonic Oscillator
n
n
ξ 1 mω 2 ∂
ψ n ð qÞ = ψ n =p ξ- ψ 0 ðqÞ
β n! 2ħβ2 ∂ξ
n
1 1 n=2 ∂
=p ξ- ψ 0 ð qÞ ð2:91Þ
n! 2 ∂ξ
n
1 mω 1=4 ∂
e - 2ξ :
1 2
= ξ-
2 n! πħ
n
∂ξ
β2
α= : ð2:92Þ
2
Moreover, putting
1 mω 1=4 β
Nn = , ð2:93Þ
2 n! πħ
n
π 1=2 2n n!
we get
n
∂
e - 2ξ :
1 2
ψ n ðξ=βÞ = N n ξ - ð2:94Þ
∂ξ
We have to normalize (2.94) with respect to a variable ξ. Since ψ n(q) has already
been normalized as in (2.53), we have
1
jψ n ðqÞj2 dq = 1: ð2:95Þ
-1
Let us define ψ n ðξÞ as being normalized with ξ. In other words, ψ n(q) is converted
to ψ n ðξÞ by means of variable transformation and concomitant change in normali-
zation condition. Then, we have
1
jψ n ðξÞj2 dξ = 1: ð2:97Þ
-1
1
ψ n ðξÞ ψ ðξ=βÞ, ð2:98Þ
β n
n
∂ 1
e - 2ξ
1 2
ψ n ðξÞ = N n ξ - with Nn : ð2:99Þ
∂ξ π 1=2 2n n!
dn
e-x
2 2
H n ðxÞ ð- 1Þn ex ðn ≥ 0Þ, ð2:100Þ
dxn
where Hn(x) is a n-th order polynomial. We wish to show the following relation on
the basis of mathematical induction:
ψ n ðξÞ = N n H n ðξÞe - 2ξ :
1 2
ð2:101Þ
Comparing (2.87), (2.98), and (2.99), we make sure that (2.101) holds with n = 0.
When n = 1, from (2.99) we have
∂
e - 2ξ = N 1 ξe - 2ξ - ð - ξÞe - 2ξ = N 1 2ξe - 2ξ
1 2 1 2 1 2 1 2
ψ 1 ðξÞ = N 1 ξ -
∂ξ
ð2:102Þ
d
1 ξ2
e-ξ - 12ξ2 - 12ξ2
2
= N 1 ð- 1Þ e e = N 1 H 1 ðξÞe :
dξ
nþ1 n
∂ 1 ∂ ∂
e - 2ξ ¼ e - 2ξ
1 2 1 2
ψ nþ1 ðξÞ ¼ N nþ1 ξ - ξ- Nn ξ -
∂ξ 2ðn þ 1Þ ∂ξ ∂ξ
1 ∂
N n H n ðxÞe - 2ξ
1 2
¼ ξ-
2 ð n þ 1Þ ∂ξ
1 ∂ dn
ð- 1Þn eξ e-ξ e - 2ξ
2 2 1 2
¼ Nn ξ -
2 ð n þ 1Þ ∂ξ dξn
1 ∂ dn
e 2ξ e-ξ
1 2 2
¼ N n ð- 1Þn ξ -
2 ð n þ 1Þ ∂ξ dξn
1 1 2 d
n
∂ 12ξ2 dn
¼ N n ð- 1Þn ξe2ξ n e
- ξ2
- e n e
- ξ2
2 ð n þ 1Þ dξ ∂ξ dξ
n n
1 1 2 d 1 2 d
N n ð- 1Þn ξe2ξ n e - ξ - ξe2ξ - ξ2
2
¼ n e
2 ð n þ 1Þ dξ dξ
d nþ1
- e 2ξ e-ξ
1 2 2
dξnþ1
nþ1
1 1 2 d
N n ð- 1Þnþ1 e2ξ nþ1 e - ξ
2
¼
2 ð n þ 1Þ dξ
dnþ1
¼ N nþ1 ð- 1Þnþ1 eξ e-ξ e - 2ξ ¼ N nþ1 H nþ1 ðxÞe - 2ξ :
2 2 1 2 1 2
dξnþ1
This means that (2.101) holds with n + 1 as well. Thus, it follows that (2.101) is
true of n that is zero or any positive integer.
Orthogonal relation reads as
1
ψ m ðξÞ ψ n ðξÞdξ = δmn , ð2:103Þ
-1
1 ðm = nÞ,
δmn = ð2:104Þ
0 ðm ≠ nÞ:
ψ n ð qÞ = βψ n ðβqÞ: ð2:105Þ
mω 1=4 1 mω
q e - 2ħ q ðn = 0, 1, 2, ⋯Þ:
mω 2
ψ n ð qÞ = Hn ð2:106Þ
ħ π 1=2 2n n! ħ
We tabulate first several Hermite polynomials Hn(x) in Table 2.1, where the index
n represents the highest order of the polynomials. In Table 2.1 we see that even
functions and odd functions appear alternately (i.e., parity). This is the case with
ψ n(q) as well, because ψ n(q) is a product of Hn(x) and an even function e - 2ħ q .
mω 2
Combining (2.101) and (2.103) as well as (2.99), the orthogonal relation between
ψ n ðξÞ ðn = 0, 1, 2, ⋯Þ can be described alternatively as [3]
1 p
e - ξ H m ðξÞH n ðξÞdξ = π 2n n!δmn :
2
ð2:107Þ
-1
Note that Hm(ξ) is a real function, and so Hm(ξ) = Hm(ξ). The relation (2.107) is
well known as the orthogonality of Hermite polynomials with e - ξ taken as a weight
2
function [3]. Here the weight function is a real and non-negative function within the
domain considered [e.g., (-1, +1) in the present case] and independent of indices
m and n. We will deal with it again in Sect. 10.4.
The relation (2.101) and the orthogonality relationship described as (2.107) can
more explicitly be understood as follows: From (2.11) we have the Schrödinger
equation of a one-dimensional quantum-mechanical harmonic oscillator such that
ħ2 d 2 uð qÞ 1
- þ mω2 q2 uðqÞ = EuðqÞ: ð2:108Þ
2m dq2 2
d 2 uð ξ Þ 2E
- þ ξ 2 uð ξ Þ = uðξÞ: ð2:109Þ
dξ2 ħω
2E
λ ð2:110Þ
ħω
d2
D - þ ξ2 , ð2:111Þ
dξ2
uðξÞ = vðξÞe - ξ =2
2
: ð2:113Þ
d 2 vð ξ Þ dvðξÞ - ξ22 ξ2
- 2
þ 2ξ e = ðλ - 1Þ vðξÞe - 2 : ð2:114Þ
dξ dξ
ξ2
Since e - 2 does not vanish with any ξ, we have
d 2 vð ξ Þ dvðξÞ
- þ 2ξ = ðλ - 1Þ vðξÞ: ð2:115Þ
dξ2 dξ
d2 d
D - 2
þ 2ξ , ð2:116Þ
dξ dξ
d 2 H n ðξÞ dH ðξÞ
2
- 2ξ n þ 2nH n ðξÞ = 0: ð2:118Þ
dξ dξ
λ = 2n þ 1, ð2:120Þ
we get
where c is an arbitrary constant. Thus, using (2.113), for a solution of (2.109) we get
un ðξÞ = cH n ðξÞe - ξ =2
2
, ð2:122Þ
where the solution u(ξ) is indexed with n. From (2.110), as an energy eigenvalue we
have
1
En = n þ ħω:
2
ðΔAÞ2 ,
where we have
ΔA A - hAi: ð2:123Þ
where we used the fact that an expectation value of an Hermitian operator is real.
Then, h(ΔA)2i is non-negative as in the case of (2.13). Moreover, if jψi is an
eigenstate of A, h(ΔA)2i = 0. Therefore, h(ΔA)2i represents a measure of how
large measured values are dispersed when A is measured in reference to a quantum
state jψ i. Also, we define a standard deviation δA as
δA ðΔAÞ2 : ð2:126Þ
then we have
δA δB ≥ jkj=2 ð2:128Þ
In (2.129), we used the fact that hψ| A| ψi and hψ| B| ψi are just real numbers and
those commute with any operator. Next, we calculate a following quantity in relation
to a real number λ:
where we used the fact that ΔA and ΔB are Hermitian. For the above quadratic form
to hold with any real number λ, we have
Theorem 2.2 Let A be an Hermitian operator. The necessary and sufficient condi-
tion for a physical state jψ 0i to be an eigenstate of A is δA = 0.
Proof Suppose that jψ 0i is a normalized eigenstate of A that belongs to an eigen-
value a. Then, we have
where we used the fact that ΔA is Hermitian. From the definition of norm of (1.121),
for δA = 0 to hold, we have
where we assumed that jψi is arbitrarily chosen normalized vector. Suppose now
that jψ 0i is an eigenstate of A that belongs to an eigenvalue a. Then, we have
A j ψ 0 i = a j ψ 0 i: ð2:137Þ
hψ 0 j A{ = hψ 0 j A = hψ 0 j a = ahψ 0 j , ð2:138Þ
where the last equality comes from the fact that A is Hermitian. From (2.138), we
would have
This would imply that (2.136) does not hold with jψ 0i, in contradiction to (2.127),
where ik ≠ 0. Namely, we conclude that any physical state cannot be an eigenstate of
A on condition that (2.127) holds. Equation (2.127) is rewritten as
ħ
hψ n jqjψ n i = ψ ja þ a{ jψ n
2mω n
ð2:141Þ
ħn ħ ð n þ 1Þ
= hψ jψ iþ ψ n jψ nþ1 = 0,
2mω n n - 1 2mω
ħ 2 ħ 2
q2 = a þ a{ = a2 þ E þ 2a{ a þ a{ , ð2:142Þ
2mω 2mω
where E denotes an identity operator and we used (2.24) along with the following
relation:
2.5 Variance and Uncertainty Principle 57
ħ ħ
ψ n jq2 jψ n = hψ n jψ n i þ 2hψ n ja{ aψ n i = ð2n þ 1Þ, ð2:144Þ
2mω 2mω
ħ
ðΔqÞ2 = ψ n jq2 jψ n - hψ n jqjψ n i2 = ð2n þ 1Þ:
2mω
mħω
hψ n jpjψ n i = 0 and ψ n jp2 jψ n = ð2n þ 1Þ: ð2:145Þ
2
Thus, we get
mħω
ðΔpÞ2 = ψ n jp2 jψ n = ð2n þ 1Þ:
2
Accordingly, we have
ħ ħ
δq δp = ðΔqÞ2 ðΔpÞ2 = ð2n þ 1Þ ≥ : ð2:146Þ
2 2
ħ
δq δp ≥ : ð2:147Þ
2
cannot obtain a proper solution that has an eigenstate with a fixed momentum in a
confined physical system.
References
two particles are moving under control only by a force field between the two
particles without other external force fields [1].
In the classical mechanics, equation of motion is separated into two equations
related to the relative coordinates and center-of-mass coordinates accordingly. Of
these, a term of the potential field is only included in the equation of motion with
respect to the relative coordinates.
The situation is the same with the quantum mechanics. Namely, the Schrödinger
equation of motion with the relative coordinates is expressed as an eigenvalue
equation that reads as
ħ2 2
- — þ V ðr Þ ψ = Eψ, ð3:1Þ
2μ
where μ is a reduced mass of two particles [1], i.e., an electron and a proton; V(r) is a
potential with r being a distance between the electron and proton. In (3.1), we
assume the spherically-symmetric potential; i.e., the potential is expressed only as
a function of the distance r. Moreover, if the potential is coulombic,
ħ2 2 e2
- — - ψ = Eψ, ð3:2Þ
2μ 4πε0 r
ħ2 2 Ze2
- — - ψ = Eψ, ð3:3Þ
2μ 4πε0 r
Lx
L = ð e1 e2 e3 Þ Ly , ð3:4Þ
Lz
e1 e2 e3
L=x×p= x y z ,
px py pz
where x denotes a position vector with respect to the relative coordinates x, y, and z.
That is,
x
x = ðe1 e2 e3 Þ y : ð3:5Þ
z
where we have used canonical commutation relation (1.140) in the second to the last
equality. In the above calculations, we used commutability of, e.g., y and pz; z and py.
For example, we have
62 3 Hydrogen-Like Atoms
ħ ∂ ∂ ħ ∂ j ψi ∂ j ψi
pz , y j ψi = y-y j ψi = y -y = 0:
i ∂z ∂z i ∂z ∂z
Since jψi is arbitrarily chosen, this relation implies that pz and y commute. We
obtain similar relations regarding Ly2 and Lz2 as well. Thus, we have
L2 ¼ Lx 2 þ Ly 2 þ Lz 2
¼ y 2 pz 2 þ z 2 py 2 þ z 2 px 2 þ x 2 pz 2 þ x 2 py 2 þ y 2 px 2
þ x2 px 2 - x2 px 2 þ y2 py 2 - y2 py 2 þ z2 pz 2 - z2 pz 2 - yzpz py
¼ y2 pz 2 þ z 2 py 2 þ z 2 px 2 þ x 2 pz 2 þ x 2 py 2 þ y 2 p x 2 þ x 2 px 2 þ y 2 py 2
In (3.7), we are able to ease the calculations by virtue of putting a term (x2px2 -
x px2 + y2py2 - y2py2 + z2pz2 - z2pz2). As a result, for the second term after the
2
þ xypy px þ yxpx py
The calculations of r2 p2 [the first term of (3.7)] and r p (in the third term) are
straightforward.
In a spherical coordinate, momentum p is expressed as
3.2 Constitution of Hamiltonian 63
θ e(θ )
O y
(b)
z
e(r)
e(θ )
e(φ )
where pr, pθ, and pϕ are components of p; e(r), e(θ), and e(ϕ) are orthonormal basis
vectors of ℝ3 in the direction of increasing r, θ, and ϕ, respectively (see Fig. 3.1). In
Fig. 3.1b, e(ϕ) is perpendicular to the plane shaped by the z-axis and a straight line of
y = x tan ϕ. Notice that the said plane is spanned by e(r) and e(θ). Meanwhile, the
momentum operator is expressed as [2]
64 3 Hydrogen-Like Atoms
ħ
p = —
i
ð3:9Þ
ħ ðrÞ ∂ 1 ∂ 1 ∂
= e þ eðθÞ þ eðϕÞ :
i ∂r r ∂θ r sin θ ∂ϕ
The vector notation of (3.9) corresponds to (1.31). That is, in the Cartesian
coordinate, we have
ħ ħ ∂ ∂ ∂
p= —= e þ e2 þ e3 , ð3:10Þ
i i 1 ∂x ∂y ∂z
r = reðrÞ ,
ħ ħ ∂
r p=r — =r : ð3:11Þ
i i ∂r
Hence,
2
ħ ∂ ħ ∂ ∂
r ð r pÞ p = r r = - ħ2 r 2 2 : ð3:12Þ
i ∂r i ∂r ∂r
Thus, we have
2
∂ ∂ ∂ ∂
L2 = r 2 p2 þ ħ2 r 2 þ 2ħ2 r = r 2 p2 þ ħ2 r2 : ð3:13Þ
∂r 2 ∂r ∂r ∂r
Therefore,
ħ2 ∂ ∂ L2
p2 = - r2 þ 2: ð3:14Þ
r ∂r
2 ∂r r
Notice here that L2 does not contain r (vide infra); i.e., L2 commutes with r2, and
so it can freely be divided by r2. Thus, the Hamiltonian H is represented by
3.2 Constitution of Hamiltonian 65
p2
H = þ V ðr Þ
2μ
ð3:15Þ
1 ħ2 ∂ ∂ L2 Ze2
= - 2 r2 þ 2 - :
2μ r ∂r ∂r r 4πε0 r
1 ħ2 ∂ ∂ L2 Ze2
- 2 r2 þ 2 - ψ = Eψ: ð3:16Þ
2μ r ∂r ∂r r 4πε0 r
x = r sin θ cos ϕ,
y = r sin θ sin ϕ, ð3:17Þ
z = r cos θ,
1=2
r = ðx2 þ y2 þ z2 Þ ,
2 1=2
ð x2 þ y Þ
θ = tan - 1 , ð3:18Þ
z
y
ϕ = tan - 1 :
x
Thus, we have
∂ ∂
Lz = xpy - ypx = - iħ x -y , ð3:19Þ
∂y ∂x
∂ ∂r ∂ ∂θ ∂ ∂ϕ ∂
= þ þ : ð3:20Þ
∂x ∂x ∂r ∂x ∂θ ∂x ∂ϕ
In turn, we have
66 3 Hydrogen-Like Atoms
∂r x
¼ ¼ sin θ cos ϕ,
∂x r
- 1=2
∂θ 1 ð x 2 þ y2 Þ 2x z x
¼ ¼
∂x 1 þ ðx þ y Þ=z
2 2 2 2z x2 þ y2 þ z2 ðx2 þ y2 Þ1=2
ð3:21Þ
cos θ cos ϕ
¼ ,
r
∂ϕ 1 1 sin ϕ
¼ y - 2 ¼ - :
∂x 1 þ ðy2 =x2 Þ x r sin θ
0 1
tan - 1 x = :
1 þ x2
Similarly, we have
Inserting (3.22) and (3.23) together with (3.17) into (3.19), we get
∂
Lz = - iħ : ð3:24Þ
∂ϕ
∂ ∂r ∂ ∂θ ∂ ∂ sin θ ∂
= þ = cos θ - :
∂z ∂z ∂r ∂z ∂θ ∂r r ∂θ
∂ ∂
Lx = iħ sin ϕ þ cot θ cos ϕ , ð3:25Þ
∂θ ∂ϕ
∂ ∂
Ly = - iħ cos ϕ - cot θ sin ϕ : ð3:26Þ
∂θ ∂ϕ
Then, we have
∂ ∂ ∂ ∂
LðþÞ = ħeiϕ þ i cot θ and Lð- Þ = ħe - iϕ - þ i cot θ : ð3:28Þ
∂θ ∂ϕ ∂θ ∂ϕ
Thus, we get
∂ ∂ ∂ ∂
LðþÞ Lð- Þ ¼ ħ2 eiϕ þ i cot θ e - iϕ - þ icot θ
∂θ ∂ϕ ∂θ ∂ϕ
2 2
∂ 1 ∂ ∂
¼ ħ2 eiϕ e - iϕ - þi - þ i cot θ
∂θ2 sin 2 θ ∂ϕ ∂θ∂ϕ
2 2
∂ ∂ ∂ ∂
þ e -iϕ cot θ - þ i cot θ þ ie - iϕ cot θ - þ i cot θ 2
∂θ ∂ϕ ∂ϕ∂θ ∂ϕ
2 2
∂ ∂ ∂ ∂
¼ - ħ2 þ cot θ þ i þ cot 2 θ 2 :
∂θ 2 ∂θ ∂ϕ ∂ϕ
ð3:29Þ
2
∂ ∂ ∂ cot θ ∂ ∂
icot θ =i þ cot θ
∂θ ∂ϕ ∂θ ∂ϕ ∂θ∂ϕ
2
1 ∂ ∂
=i - þ cot θ :
sin θ ∂ϕ
2 ∂θ∂ϕ
2 2
∂ ∂
Note also that ∂θ∂ϕ = ∂ϕ∂θ . This is because we are dealing with continuous and
differentiable functions.
Meanwhile, we have the following commutation relations:
Lx , Ly = Lx Ly - Ly Lx
= ypz - zpy zpx - xpz - zpx - xpz ypz - zpy
= ypz zpx - ypz xpz - zpy zpx þ zpy xpz
- zpx ypz þ zpx zpy þ xpz ypz - xpz zpy
= ypx pz z - zpx ypz þ zpy xpz - xpz zpy
þ xpz ypz - ypz xpz þ zpx zpy - zpy zpx
= - ypx zpz - pz z þ xpy zpz - pz z = iħ xpy - ypx = iħLz :
2 2
∂ ∂ ∂ ∂ ∂ j ψi ∂ j ψi
px , py j ψi = - ħ2 - j ψi = - ħ2 - = 0:
∂x ∂y ∂y ∂x ∂x∂y ∂y∂x
In the above equation, we assumed that the order of differentiation with respect to
x and y can be switched. It is because we are dealing with continuous and differen-
tiable normal functions. Thus, px and py commute.
For other important commutation relations, we have
Lx , L2 = 0, Ly , L2 = 0, and Lz , L2 = 0: ð3:31Þ
The derivation is straightforward and it is left for readers. The relations (3.30) and
(3.31) imply that a simultaneous eigenstate exists for L2 and one of Lx, Ly, and Lz.
This is because L2 commute with them from (3.31), whereas Lz does not commute
with Lx or Ly. The detailed argument about the simultaneous eigenstate can be seen
in Part III.
Thus, we have
LðþÞ Lð- Þ = Lx 2 þ Ly 2 þ i Ly Lx - Lx Ly = Lx 2 þ Ly 2 þ i Ly , Lx
= Lx 2 þ Ly 2 þ ħLz :
2
∂
L z 2 = - ħ2 : ð3:33Þ
∂ϕ2
Finally, we get
2 2
∂ ∂ 1 ∂
L2 = - ħ2 þ cot θ þ
∂θ 2 ∂θ sin θ ∂ϕ2
2
or
2
1 ∂ ∂ 1 ∂
L2 = - ħ2 sin θ þ : ð3:34Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2
2
ħ2 ∂ ∂ 1 ∂ ∂ 1 ∂
H= - r2 þ sin θ þ
2μr 2 ∂r ∂r sin θ ∂θ ∂θ sin 2 θ ∂ϕ2
Ze2
- : ð3:35Þ
4πε0 r
2
ħ2 ∂ ∂ 1 ∂ ∂ 1 ∂ Ze2
- r2 þ sin θ þ - ψ
2μr ∂r
2 ∂r sin θ ∂θ ∂θ sin θ ∂ϕ
2 2 4πε0 r
= Eψ:
ð3:36Þ
L2 = Lx 2 þ Ly 2 þ Lz 2 : ð3:38Þ
{
Lx { = ypz - zpy = pz { y{ - py { z{ = pz y - py z = ypz - zpy = Lx : ð3:39Þ
L2 ψjL2 ψ
= ψjLx 2 ψ þ ψjLy 2 ψ þ ψjLz 2 ψ
= Lx { ψjLx ψ þ Ly { ψjLy ψ þ Lz { ψjLz ψ ð3:40Þ
= hLx ψjLx ψ i þ Ly ψjLy ψ þ hLz ψjLz ψ i
2
= jLx ψ jj2 þ Ly ψ þ jjLz ψ jj2 ≥ 0:
Notice that the second last equality comes from that Lx, Ly, and Lz are Hermitian.
An operator that satisfies (3.40) is said to be non-negative (see Sects. 1.4, 2.2, etc.
where we saw the calculation routines). Note also that in (3.40) the equality holds
only when the following relations hold:
j Lx ψi = j Ly ψi = j Lz ψi = 0: ð3:41Þ
j L2 ψi = j Lx 2 þ Ly 2 þ Lz 2 ψi = j Lx 2 ψiþ j Ly 2 ψiþ j Lz 2 ψi
ð3:42Þ
= Lx j Lx ψi þ Ly j Ly ψi þ Lz j Lz ψi = 0:
The eigenfunction that satisfies (3.42) and the next relation (3.43) is a simulta-
neous eigenstate of Lx, Ly, Lz, and L2. This could seem to be in contradiction to the
fact that Lz does not commute with Lx or Ly. However, this is an exceptional case. Let
jψ 0i be the eigenfunction that satisfies both (3.41) and (3.42). Then, we have
j Lx ψ 0 i = j Ly ψ 0 i = j Lz ψ 0 i = j L2 ψ 0 i = 0: ð3:43Þ
3.3 Separation of Variables 71
As can be seen from (3.24) to (3.26) along with (3.34), the operators Lx, Ly, Lz,
and L2 are differential operators. Therefore, (3.43) implies that jψ 0i is a constant. We
will come back this point later. In spite of this exceptional situation, it is impossible
that all Lx, Ly, and Lz as well as L2 take a whole set of eigenfunctions as simultaneous
eigenstates. We briefly show this as follows.
In Chap. 2, we mention that if [A, B] = ik, any physical state cannot be an
eigenstate of A or B. The situation is different, on the other hand, if we have a
following case
where A, B, and C are Hermitian operators. The relation (3.30) is a typical example
for this. If C j ψi = 0 in (3.44), jψi might well be an eigenstate of A and/or B.
However, if C j ψi = c j ψi (c ≠ 0), jψi cannot be an eigenstate of A or B. This can
readily be shown in a fashion similar to that described in Sect. 2.5. Let us think of,
e.g., [Lx, Ly] = iħLz. Suppose that for ∃ψ 0 we have Lz j ψ 0i = 0. Taking an inner
product using jψ 0i, from (3.30) we have
ψ 0 j Lx Ly - Ly Lx ψ 0 = 0:
γ = ħ2 λ ðλ ≥ 0Þ: ð3:46Þ
1 ħ2 ∂ ∂ L2 Ze2
- 2 r2 þ 2 - Rðr ÞY ðθ, ϕÞ = ERðr ÞY ðθ, ϕÞ: ð3:48Þ
2μ r ∂r ∂r r 4πε0 r
That is,
1 ħ2 ∂ ∂Rðr Þ ħ2 λ Ze2
- 2 r2 þ 2 Rðr Þ - Rðr Þ = ERðr Þ: ð3:51Þ
2μ r ∂r ∂r r 4πε0 r
Regarding angular components θ and ϕ, using (3.34), (3.37), and (3.46), we have
2
1 ∂ ∂ 1 ∂
L2 Y ðθ, ϕÞ = - ħ2 sin θ þ Y ðθ, ϕÞ
sin θ ∂θ ∂θ sin θ ∂ϕ2
2
2
1 ∂ ∂ 1 ∂
- sin θ þ Y ðθ, ϕÞ = λY ðθ, ϕÞ: ð3:53Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2
Notice in (3.53) that the angular part of SOLDE does not depend on a specific
form of the potential.
Now, we further assume that (3.53) can be separated into a zenithal angle part θ
and azimuthal angle part ϕ such that
Then we have
3.3 Separation of Variables 73
2
1 ∂ ∂ΘðθÞ 1 ∂ Φ ð ϕÞ
- sin θ Φ ð ϕÞ þ ΘðθÞ = λΘðθÞΦðϕÞ: ð3:55Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2
Multiplying both sides by sin2θ/Θ(θ)Φ(ϕ) and arranging both the sides, we get
2
1 ∂ Φð ϕÞ sin 2 θ 1 ∂ ∂ΘðθÞ
- = sin θ þ λΘðθÞ : ð3:56Þ
ΦðϕÞ ∂ϕ 2 Θ ðθ Þ sin θ ∂θ ∂θ
Since LHS of (3.56) depends only upon ϕ and RHS depends only on θ, we must
have
1 d2 ΦðϕÞ
- = η: ð3:58Þ
ΦðϕÞ dϕ2
d2
Putting D - dϕ2
, we get
The SOLDEs of (3.58) and (3.59) are formally the same as (1.61) of Sect. 1.3,
where boundary conditions (BCs) are Dirichlet conditions. Unlike (1.61), however,
we have to consider different BCs, i.e., the periodic BCs.
As in Example 1.1, we adopt two linearly independent solutions. That is, we have
Meanwhile, we have
Then,
1
ΦðϕÞ = p eimϕ ðm = 0, ± 1, ± 2, ⋯Þ: ð3:64Þ
2π
m2 eimϕ = ηeimϕ :
Therefore, we get
η = m2 ðm = 0, ± 1, ± 2, ⋯Þ: ð3:65Þ
1 d dΘðθÞ m2 ΘðθÞ
- sin θ þ = λΘðθÞ ðm = 0, ± 1, ± 2, ⋯Þ: ð3:66Þ
sin θ dθ dθ sin 2 θ
p
In (3.64) putting m = 0 as an eigenvalue, we have ΦðϕÞ = 1= 2π as a
corresponding eigenfunction. Unlike Examples 1.1 and 1.2, this reflects that the
differential operator -d2/dϕ2 accompanied by the periodic BCs is a non-negative
operator that allows an eigenvalue of zero. Yet, we are uncertain of a range of m. To
clarify this point, we consider generalized angular momentum in the next section.
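The following short symbolic sketch is not part of the original text; it assumes the sympy library. It simply confirms that the eigenfunctions (3.64) satisfy DΦ = m²Φ of (3.65), obey the periodic boundary condition Φ(0) = Φ(2π), and are orthonormal on [0, 2π), including the zero eigenvalue for m = 0.

```python
# Minimal sympy sketch (not from the text): eigenfunctions (3.64) of D = -d^2/dphi^2
# with periodic boundary conditions, including the zero eigenvalue m = 0.
import sympy as sp

phi = sp.symbols('phi', real=True)
for m in range(-3, 4):                                              # m = 0, +-1, +-2, +-3
    Phi = sp.exp(sp.I * m * phi) / sp.sqrt(2 * sp.pi)
    assert sp.simplify(-sp.diff(Phi, phi, 2) - m**2 * Phi) == 0     # D Phi = m^2 Phi, cf. (3.65)
    assert sp.simplify(Phi.subs(phi, 0) - Phi.subs(phi, 2*sp.pi)) == 0  # periodic BC

# orthonormality on [0, 2*pi): <Phi_m | Phi_n> = delta_mn
def inner(m, n):
    integrand = sp.conjugate(sp.exp(sp.I*m*phi)) * sp.exp(sp.I*n*phi) / (2*sp.pi)
    return sp.integrate(integrand, (phi, 0, 2*sp.pi))

assert inner(2, 2) == 1 and inner(2, -1) == 0
```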
\[ \mathbf{J} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} J_x \\ J_y \\ J_z \end{pmatrix}. \qquad (3.67) \]

For the sake of simple notation, let us define J as follows so that we can eliminate ħ and deal with dimensionless quantities in the present discussion:

\[ \mathbf{J} \equiv \mathbf{J}/\hbar = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} J_x/\hbar \\ J_y/\hbar \\ J_z/\hbar \end{pmatrix} \equiv (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} J_x \\ J_y \\ J_z \end{pmatrix}, \qquad \mathbf{J}^2 = J_x^{\,2} + J_y^{\,2} + J_z^{\,2}. \qquad (3.68) \]

\[ [J_x, J_y] = iJ_z, \quad [J_y, J_z] = iJ_x, \quad \text{and} \quad [J_z, J_x] = iJ_y. \qquad (3.69) \]

\[ [J_x, \mathbf{J}^2] = 0, \quad [J_y, \mathbf{J}^2] = 0, \quad \text{and} \quad [J_z, \mathbf{J}^2] = 0. \qquad (3.70) \]
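The commutation relations (3.69) and (3.70) are all that the following derivation relies on. As a quick numerical illustration (not in the text; it assumes numpy), the dimensionless spin-1/2 matrices J_k = σ_k/2 satisfy exactly these relations:

```python
# Numerical sketch (assumption, not from the text): the spin-1/2 matrices J_k = sigma_k / 2
# obey the commutation relations (3.69) and (3.70).
import numpy as np

Jx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
Jy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
Jz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
J2 = Jx @ Jx + Jy @ Jy + Jz @ Jz

def comm(a, b):
    return a @ b - b @ a

assert np.allclose(comm(Jx, Jy), 1j * Jz)     # [Jx, Jy] = i Jz
assert np.allclose(comm(Jy, Jz), 1j * Jx)     # [Jy, Jz] = i Jx
assert np.allclose(comm(Jz, Jx), 1j * Jy)     # [Jz, Jx] = i Jy
for Jk in (Jx, Jy, Jz):
    assert np.allclose(comm(Jk, J2), 0)       # [Jk, J^2] = 0
```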
The implication of (3.71) is that jζ, μi is the simultaneous eigenstate and that μ is
an eigenvalue of Jz which jζ, μi belongs to with ζ being an eigenvalue of J2 which jζ,
μi belongs to as well.
Since Jz and J2 are Hermitian, both μ and ζ are real (see Sect. 1.4). Of these, ζ ≥ 0
as in the case of (3.45). We define the following operators J(+) and J(-) as in the case
of (3.27):

\[ J^{(+)} \equiv J_x + iJ_y, \qquad J^{(-)} \equiv J_x - iJ_y. \qquad (3.72) \]

\[ [J^{(+)}, \mathbf{J}^2] = [J^{(-)}, \mathbf{J}^2] = 0. \qquad (3.73) \]
Equation (3.75) indicates that both J^{(+)}|ζ, μ⟩ and J^{(-)}|ζ, μ⟩ are eigenvectors of J² that correspond to an eigenvalue ζ.

\[ J_x^{\,2} + J_y^{\,2} = \mathbf{J}^2 - J_z^{\,2}. \qquad (3.78) \]
Therefore,
J x 2 þ J y 2 j ζ, μi = J 2 - J z 2 j ζ, μi = ζ - μ2 j ζ, μi: ð3:79Þ
ζ - μ2 ≥ 0: ð3:80Þ
\[ j,\ j-1,\ j-2,\ \cdots,\ j'. \qquad (3.82) \]

Operating these operators on |ζ, j⟩ or |ζ, j′⟩ and using (3.81), we get

\[ J^{(-)}J^{(+)}|\zeta, j\rangle = (\mathbf{J}^2 - J_z^{\,2} - J_z)|\zeta, j\rangle = (\zeta - j^2 - j)|\zeta, j\rangle = 0, \]
\[ J^{(+)}J^{(-)}|\zeta, j'\rangle = (\mathbf{J}^2 - J_z^{\,2} + J_z)|\zeta, j'\rangle = (\zeta - j'^2 + j')|\zeta, j'\rangle = 0. \qquad (3.84) \]

\[ \zeta - j^2 - j = \zeta - j'^2 + j' = 0. \qquad (3.85) \]

\[ j + j' = 0 \quad \text{or} \quad j = -j'. \qquad (3.88) \]

\[ \mu = j,\ j-1,\ j-2,\ \cdots,\ -j+1,\ -j. \qquad (3.89) \]

That is, the number of values μ can take is (2j + 1). The relation (3.89) implies that, taking a positive integer k,

\[ j - k = -j \quad \text{or} \quad j = k/2. \qquad (3.90) \]
where the second equality comes from the fact that |ζ, μ − 1⟩ has been normalized; i.e., ‖ |ζ, μ − 1⟩ ‖ = 1. Meanwhile, taking the adjoint of both sides of the first equation of (3.77), we have
\[ \langle\zeta,\mu|\,[J^{(+)}]^{\dagger} = a_{\mu}^{(+)*}\,\langle\zeta,\mu+1|. \qquad (3.92) \]

However, from (3.72) and the fact that J_x and J_y are Hermitian,

\[ [J^{(+)}]^{\dagger} = J^{(-)}. \qquad (3.93) \]

where again |ζ, μ⟩ is assumed to be normalized. Comparing (3.91) and (3.95), we get

\[ a_{\mu}^{(-)} = a_{\mu-1}^{(+)*}. \qquad (3.96) \]
Taking an inner product regarding the first equation of (3.77) and its adjoint,
\[ \langle\zeta,\mu|J^{(-)}J^{(+)}|\zeta,\mu\rangle = a_{\mu}^{(+)*}a_{\mu}^{(+)}\,\langle\zeta,\mu+1|\zeta,\mu+1\rangle = \left|a_{\mu}^{(+)}\right|^2. \qquad (3.97) \]

Once again, the second equality of (3.97) results from the normalization of the vector.
Using (3.83) as well as (3.71) and (3.86), (3.97) can be rewritten as

\[ \langle\zeta,\mu|\mathbf{J}^2 - J_z^{\,2} - J_z|\zeta,\mu\rangle = \langle\zeta,\mu|\,j(j+1) - \mu^2 - \mu\,|\zeta,\mu\rangle = \langle\zeta,\mu|\zeta,\mu\rangle\,(j-\mu)(j+\mu+1) = \left|a_{\mu}^{(+)}\right|^2. \qquad (3.98) \]
Thus, we get
In (3.99) and (3.100), we routinely put δ = 0 so that aμ(+) and aμ(-) can be positive
numbers. Explicitly rewriting (3.77), we get
where j is a fixed given number chosen from among zero, positive integers, and
positive half-integers (or half-odd-integers).
As discussed above, we have derived various properties and relations with respect
to the generalized angular momentum on the basis of (1) the relation (3.30) or (3.69)
and (2) the fact that Jx, Jy, and Jz are Hermitian operators. This notion is very useful
in dealing with various angular momenta of different origins (e.g., orbital angular
momentum, spin angular momentum, etc.) from a unified point of view. In Chap. 20,
we will revisit this issue in more detail.
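A compact way to see these results at work is to build the matrices of J_z and J^{(±)} for a given j directly from the coefficients a_μ^{(+)} = √((j − μ)(j + μ + 1)) that follow from (3.98) with the phase δ = 0. The sketch below is an illustration, not the author's code; it assumes numpy and checks that the matrices so constructed reproduce the commutation relation (3.69) and the eigenvalue j(j + 1) of J².

```python
# Illustrative sketch (not from the text): J_z and the ladder operators J(+-) built from
# a_mu^(+) = sqrt((j - mu)(j + mu + 1)), for j = 1/2, 1, 3/2, 2.
import numpy as np

def angular_momentum_matrices(j):
    dim = int(round(2 * j)) + 1
    mu = j - np.arange(dim)                     # mu = j, j-1, ..., -j
    Jz = np.diag(mu).astype(complex)
    Jp = np.zeros((dim, dim), dtype=complex)
    for k in range(1, dim):                     # J(+)|j, mu> = a_mu^(+) |j, mu+1>
        Jp[k - 1, k] = np.sqrt((j - mu[k]) * (j + mu[k] + 1))
    return Jz, Jp, Jp.conj().T                  # J(-) = [J(+)]^dagger, cf. (3.93)

for j in (0.5, 1.0, 1.5, 2.0):
    Jz, Jp, Jm = angular_momentum_matrices(j)
    Jx, Jy = (Jp + Jm) / 2, (Jp - Jm) / (2 * 1j)
    assert np.allclose(Jx @ Jy - Jy @ Jx, 1j * Jz)               # (3.69)
    J2 = Jx @ Jx + Jy @ Jy + Jz @ Jz
    assert np.allclose(J2, j * (j + 1) * np.eye(Jz.shape[0]))    # eigenvalue j(j+1)
```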
In Sect. 3.4 we have derived various important results on angular momenta on the
basis of the commutation relations (3.69) and the assumption that Jx, Jy, and Jz are
Hermitian. Now, let us return to the discussion on orbital angular momenta we dealt
with in Sects. 3.2 and 3.3. First, we treat the orbital angular momenta via operator
approach. This approach enables us to understand why a quantity j introduced in
Sect. 3.4 takes a value zero or positive integers with the orbital angular momenta. In
the next section (Sect. 3.6) we will deal with the related issues by an analytical
method.
In (3.28) we introduced differential operators L(+) and L(-). According to Sect.
3.4, we define the following operators to eliminate ħ so that we can deal with
dimensionless quantities:
\[ \mathbf{M} \equiv \mathbf{L}/\hbar = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} M_x \\ M_y \\ M_z \end{pmatrix}, \qquad M^2 = L^2/\hbar^2 = M_x^{\,2} + M_y^{\,2} + M_z^{\,2}. \qquad (3.102) \]
Hence, we have
\[ M^{(+)} \equiv M_x + iM_y = L^{(+)}/\hbar = e^{i\phi}\left(\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\phi}\right), \qquad (3.104) \]

\[ M^{(-)} \equiv M_x - iM_y = L^{(-)}/\hbar = e^{-i\phi}\left(-\frac{\partial}{\partial\theta} + i\cot\theta\,\frac{\partial}{\partial\phi}\right). \qquad (3.105) \]

Then we have

\[ M^2 = -\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right]. \qquad (3.106) \]
Changing the variable as ξ ≡ cos θ, so that

\[ \frac{\partial}{\partial\theta} = \frac{\partial\xi}{\partial\theta}\frac{\partial}{\partial\xi} = -\sin\theta\,\frac{\partial}{\partial\xi} = -\sqrt{1-\xi^2}\,\frac{\partial}{\partial\xi}, \qquad \sin\theta\,\frac{\partial}{\partial\theta} = -\sin^2\theta\,\frac{\partial}{\partial\xi} = -(1-\xi^2)\frac{\partial}{\partial\xi}, \qquad (3.108) \]

we get

\[ M^{(+)} = e^{i\phi}\left(-\sqrt{1-\xi^2}\,\frac{\partial}{\partial\xi} + \frac{i\xi}{\sqrt{1-\xi^2}}\,\frac{\partial}{\partial\phi}\right), \]
\[ M^{(-)} = e^{-i\phi}\left(\sqrt{1-\xi^2}\,\frac{\partial}{\partial\xi} + \frac{i\xi}{\sqrt{1-\xi^2}}\,\frac{\partial}{\partial\phi}\right), \qquad (3.109) \]
\[ M^2 = -\frac{\partial}{\partial\xi}\left[(1-\xi^2)\frac{\partial}{\partial\xi}\right] - \frac{1}{1-\xi^2}\frac{\partial^2}{\partial\phi^2}. \]
\[ m = j,\ j-1,\ j-2,\ \cdots,\ -j+1,\ -j. \qquad (3.110) \]

\[ Y_j^m(\theta,\phi) \propto e^{im\phi}. \]
Therefore, we get
∂ mξ
M ðþÞ Y m
j ðθ, ϕÞ = e
iϕ
- 1 - ξ2 - j ðθ, ϕÞ
Ym
∂ξ 1 - ξ2
ð3:112Þ
mþ1 -m
∂
= -e iϕ
1-ξ 2
1-ξ 2
j ðθ, ϕÞ
Ym ,
∂ξ
-m -m-1 -1
∂ 1
1 - ξ2 = ð - mÞ 1 - ξ2 1 - ξ2 ð - 2ξÞ
∂ξ 2
-m-2
= mξ 1 - ξ2 ,
-m -m-2
ð3:113Þ
∂
1 - ξ2 j ðθ, ϕÞ = mξ
Ym 1 - ξ2 j ðθ, ϕÞ
Ym
∂ξ
-m
∂Y m
j ðθ, ϕÞ
þ 1 - ξ2 :
∂ξ
Similarly, we get
∂ mξ
M ð- Þ Y m
j ðθ, ϕÞ = e
- iϕ
1 - ξ2 - j ðθ, ϕÞ
Ym
∂ξ 1 - ξ2
ð3:114Þ
- mþ1 m
∂
= e - iϕ 1 - ξ2 1 - ξ2 j ðθ, ϕÞ :
Ym
∂ξ
mþn n
n ∂
M ðþÞ Y m n inϕ
j ðθ, ϕÞ = ð- 1Þ e 1 - ξ2
∂ξn
-m
1 - ξ2 j ðθ, ϕÞ :
Ym ð3:115Þ
nþ1 n
M ðþÞ j ðθ, ϕÞ ¼ M
Ym ðþÞ
M ðþÞ Y m
j ðθ, ϕÞ
∂ ξ ∂
¼ eiϕ - 1 - ξ2 þi
∂ξ 1- ξ 2 ∂ϕ
mþn n -m
∂
× ð- 1Þn einϕ 1 -ξ2 1 - ξ2 j ðθ,ϕÞ
Ym
∂ξn
∂ ξðn þ mÞ
¼ ð- 1Þn eiðnþ1Þϕ - 1- ξ2 -
∂ξ 1 - ξ2
mþn n -m
∂
× 1 - ξ2 1 -ξ2 j ðθ, ϕÞ
Ym
∂ξn
mþn -1
¼ ð- 1Þn eiðnþ1Þϕ - 1 - ξ2 ðm þ nÞ 1- ξ2
-1 n -m
1 ∂
× 1 - ξ2 ð - 2ξÞ 1 - ξ2 j ðθ,ϕÞ
Ym
2 ∂ξn
mþn nþ1 -m
∂
- 1- ξ2 1 - ξ2 1- ξ2 j ðθ,ϕÞ
Ym
∂ξnþ1
mþn -m
ξðn þ mÞ ∂
n
- 1 - ξ2 1 - ξ2 j ðθ,ϕÞ
Ym
1 - ξ2 ∂ξn
mþðnþ1Þ nþ1 -m
∂
¼ ð- 1Þnþ1 eiðnþ1Þϕ 1 - ξ2 1 - ξ2 j ðθ, ϕÞ :
Ym
∂ξnþ1
ð3:116Þ
Notice that the first and third terms in the second last equality canceled each other.
Thus, (3.115) certainly holds with (n + 1).
Similarly, we have [3]
- mþn n
n ∂
M ð- Þ Y m
j ðθ, ϕÞ = e
- inϕ
1 - ξ2
∂ξn
m
1 - ξ2 j ðθ, ϕÞ :
Ym ð3:117Þ
jþ1 -j
∂
M ð- Þ Y j- j ðθ, ϕÞ = e - iϕ 1 - ξ2 1 - ξ2 Y j- j ðθ, ϕÞ
∂ξ
= 0: ð3:118Þ
-j
This implies that 1 - ξ2 Y j- j ðθ, ϕÞ is a constant with respect to ξ. We
describe this as
-j
1 - ξ2 Y j- j ðθ, ϕÞ = c ðc : constant with respect to ξÞ: ð3:119Þ
jþ1 2jþ1
2jþ1 ∂
M ðþÞ Y j- j ðθ, ϕÞ = ð- 1Þ2jþ1 eið2jþ1Þϕ 1 - ξ2
∂ξ2jþ1
j
1 - ξ2 Y j- j ðθ, ϕÞ = 0: ð3:120Þ
j
1 - ξ2 Y j- j ðθ, ϕÞ = ðat most a 2j‐degree polynomial with ξÞ: ð3:121Þ
j j
j
c 1 - ξ2 1 - ξ2 = c 1 - ξ2
At the same time, so far as the orbital angular momentum is concerned, from
(3.71) and (3.86) we can identify ζ in (3.71) with l(l + 1). Namely, we have
ζ = lðl þ 1Þ:
\[ m = l,\ l-1,\ l-2,\ \cdots,\ 1,\ 0,\ -1,\ \cdots,\ -l+1,\ -l. \qquad (3.124) \]

\[ -\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta(\theta)}{d\theta}\right) + \frac{m^2\,\Theta(\theta)}{\sin^2\theta} = l(l+1)\,\Theta(\theta). \qquad (3.125) \]

Putting

\[ P_l^m(\xi) \equiv \Theta(\theta) \qquad (3.126) \]

and considering (3.109) along with (3.54), we arrive at the next SOLDE described as

\[ \frac{d}{d\xi}\left[(1-\xi^2)\frac{dP_l^m(\xi)}{d\xi}\right] + \left[l(l+1) - \frac{m^2}{1-\xi^2}\right]P_l^m(\xi) = 0. \qquad (3.127) \]
lþ1 -l
∂
M ðþÞ Y ll ðθ, ϕÞ = - eiϕ 1 - ξ2 1 - ξ2 Y ll ðθ, ϕÞ : ð3:128Þ
∂ξ
-l
1 - ξ2 Y ll ðθ, ϕÞ = c ðc : constant with respect to ξÞ: ð3:129Þ
2π π π
2
dϕ sin θdθ Y ll ðθ, ϕÞ = 2π jκ l j2 sin 2lþ1 θdθ = 1, ð3:131Þ
0 0 0
where the integration is performed on a unit sphere. Note that an infinitesimal area
element on the unit sphere is represented by sin θdθdϕ.
We evaluate the above integral denoted as
π
I sin 2lþ1 θdθ: ð3:132Þ
0
π
2l 2l - 2 2 22lþ1 ðl!Þ2
I= ⋯ sin θdθ = : ð3:135Þ
2l þ 1 2l - 1 3 0 ð2l þ 1Þ!
Then,
-1 1
Ym
l ðθ, ϕÞ = M ð- Þ Y m
l ðθ, ϕÞ: ð3:138Þ
ðl - m þ 1Þðl þ mÞ
1
Y ll - 1 ðθ, ϕÞ = p M ð- Þ Y ll ðθ, ϕÞ: ð3:139Þ
2l
1 l-m
l ðθ, ϕÞ =
Ym M ð- Þ Y ll ðθ, ϕÞ
2lð2l - 1Þ⋯ðl þ m þ 1Þ 1 2⋯ðl - mÞ
ðl þ mÞ! l-m
= M ð- Þ Y ll ðθ, ϕÞ:
ð2lÞ!ðl - mÞ!
ð3:140Þ
-m l-m
l-m ∂
M ð- Þ Y ll ðθ, ϕÞ = e - iðl - mÞϕ 1 - ξ2
∂ξl - m
l
1 - ξ2 Y ll ðθ, ϕÞ : ð3:141Þ
l-m l
Further replacing M ð- Þ Y l ðθ, ϕÞ in (3.140) with that of (3.141), we get
-m l-m
ðl þ mÞ! - iðl - mÞϕ ∂
l ðθ, ϕÞ =
Ym e 1 - ξ2
ð2lÞ!ðl - mÞ! ∂ξl - m
l
1 - ξ2 Y ll ðθ, ϕÞ : ð3:142Þ
l-m
eiχ ð2l þ 1Þðl þ mÞ! imϕ - m=2 ∂ l
l ðθ, ϕÞ =
Ym e 1 - ξ2 1 - ξ2 : ð3:143Þ
2l l! 4π ðl - mÞ! ∂ξl - m
eiχ ð- 1Þl 2l þ 1 ∂
l
l
Y 0l ðθ, ϕÞ = 1 - ξ2 , ð3:144Þ
2l l!ð- 1Þl 4π ∂ξl
where we put (-1)l on both the numerator and denominator. In RHS of (3.144),
ð- 1Þl ∂l l
1 - ξ2 Pl ðξÞ: ð3:145Þ
2l l! ∂ξl
eiχ 2l þ 1
Y 0l ðθ, ϕÞ = P ðξÞ: ð3:146Þ
ð- 1Þl 4π l
eiχ 2l þ 1 eiχ 2l þ 1
Y 0l ð0, ϕÞ = Pl ð1Þ = , ð3:147Þ
ð- 1Þl 4π ð- 1Þl 4π
where we used Pl(1) = 1. For this important relation, see Sect. 3.6.1. Also noting that
eiχ
ð- 1Þl
= 1, we must have
eiχ
=1 or eiχ = ð- 1Þl ð3:148Þ
ð- 1Þl
l-m
ð- 1Þl ð2l þ 1Þðl þ mÞ! imϕ - m=2 ∂
l ðθ, ϕÞ =
Ym e 1 - ξ2
2l l! 4π ðl - mÞ! ∂ξl - m
l
1 - ξ2 : ð3:149Þ
j ψ 0 i Y 00 ðθ, ϕÞ:
M ð- Þ j l, mi = ðl - m þ 1Þðl þ mÞ j l, m - 1i,
ð3:151Þ
ð þÞ
M j l, mi = ðl - mÞðl þ m þ 1Þ j l, m þ 1i,
M ð- Þ
p
0 2l 1
0 ð2l - 1Þ 2
0 ⋱
⋱
= 0 ð2l - k þ 1Þ k
0
⋱ ⋱
p
0 1 2l
0
ð3:152Þ
where diagonal elements are zero and a (k, k + 1) element is ð2l - k þ 1Þ k. That
is, non-zero elements are positioned just above the zero diagonal elements.
Correspondingly, we have
0
p
2l1 0
ð2l-1Þ2 0
⋱ 0
ð þÞ
M = ð2l-k þ1Þk 0 , ð3:153Þ
⋱ ⋱
0
⋱ 0
p
12l 0
1 0
0 1
0 0
j l, - li = , j l, - l þ 1i = , ⋯,
⋮ ⋮
0 0
0 0
0 0
0 0
⋮ ⋮
j l, l - 1i = , j l, li = , ð3:154Þ
0 0
1 0
0 1
where the first number l in jl, - li, jl, - l + 1i, etc. denotes the quantum number
associated with λ = l(l + 1) given in (3.46) and is kept constant; the latter number
denotes m. Note from (3.154) that the column vector whose k-th row is 1 corresponds
to m such that
m = - l þ k - 1: ð3:155Þ
The operator M(-) converts the column vector whose (k + 1)-th row is 1 to that
whose k-th row is 1. The former column vector corresponds to jl, m + 1i and the
latter corresponding to jl, mi. Therefore, using (3.152), we get the following
representation:
M ð- Þ j l, m þ 1i = ð2l - k þ 1Þ k j l, mi = ðl - mÞðl þ m þ 1Þ
ð3:156Þ
× j l, mi,
where the second equality is obtained by replacing k with that of (3.155), i.e.,
k = l + m + 1. Changing m to (m - 1), we get the first equation of (3.151). Similarly,
we obtain the second equation of (3.151) as well. That is, we have
M 2 = M ðþÞ M ð- Þ þ M z 2 - M z :
In the above, M(+)M(-) and Mz are diagonal matrices and, hence, Mz2 and M2 are
diagonal matrices as well such that
-l
-l þ 1
-lþ 2
Mz = ⋱ ,
k-l
⋱
l
0
2l 1
ð2l - 1Þ 2
M ðþÞ M ð- Þ = ,
⋱
ð2l - k þ 1Þ k
⋱
1 2l
ð3:158Þ
lðl þ 1Þ
l ð l þ 1Þ
⋱
M =
2
lðl þ 1Þ : ð3:159Þ
⋱
l ð l þ 1Þ
lðl þ 1Þ
These expressions are useful to understand how the vectors of (3.154) constitute
simultaneous eigenstates of M2 and Mz. In this situation, the matrix representation is
said to diagonalize both M2 and Mz. In other words, the quantum states represented
by (3.154) are simultaneous eigenstates of M2 and Mz.
The matrices (3.152) and (3.153) that represent M(-) and M(+), respectively, are
said to be ladder operators or raising and lowering operators, because operating
column vectors those operators convert jmi to jm ∓ 1i as mentioned above. The
operators M(-) and M(+) correspond to a and a{ given in (2.65) and (2.66), respec-
tively. All these operators are characterized by that the corresponding matrices have
diagonal elements of zero and that non-vanishing elements are only positioned on
“right above” or “right below” relative to the diagonal elements.
These matrices are a kind of triangle matrices and all their diagonal elements are
zero. The matrices are characteristic of nilpotent matrices. That is, if a suitable power
of a matrix is zero as a matrix, such a matrix is said to be a nilpotent matrix (see Part
III). In the present case, (2l + 1)-th power of M(-) and M(+) becomes zero as a matrix.
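These statements are easy to verify numerically. The following sketch is not from the text (it assumes numpy); it builds M^(−) from the elements a_k = √((2l − k + 1)k) of (3.152), sets M^(+) to its transpose as in (3.153), and checks both the nilpotency just mentioned and the relation M² = M^(+)M^(−) + M_z² − M_z stated above, which yields a diagonal matrix with entries l(l + 1), cf. (3.158) and (3.159).

```python
# Numerical sketch (assumption, not from the text): the ladder matrices of (3.152)/(3.153)
# are nilpotent and diagonalize M^2 with eigenvalue l(l+1).
import numpy as np
from numpy.linalg import matrix_power

l = 3
dim = 2 * l + 1
a = lambda k: np.sqrt((2 * l - k + 1) * k)        # a_k = sqrt((2l - k + 1) k), cf. (3.160)
Mminus = np.zeros((dim, dim))
for k in range(1, dim):                           # non-zero (k, k+1) elements (1-based)
    Mminus[k - 1, k] = a(k)
Mplus = Mminus.T

assert np.allclose(matrix_power(Mminus, dim), 0)  # (2l+1)-th power vanishes: nilpotent
assert np.allclose(matrix_power(Mplus, dim), 0)

Mz = np.diag(np.arange(-l, l + 1)).astype(float)  # m = -l + k - 1, cf. (3.155)
M2 = Mplus @ Mminus + Mz @ Mz - Mz                # M^2 = M(+)M(-) + Mz^2 - Mz
assert np.allclose(M2, l * (l + 1) * np.eye(dim)) # cf. (3.159)
```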
The operator M(-) and M(+) can be described by the following shorthand
representations:
ak ð2l - k þ 1Þ k,
2
M ð- Þ = aδ aδ
p k kþ1,p p pþ1,j
= ak akþ1 δkþ2,j
kj ð3:161Þ
= ð2l - k þ 1Þ k ½2l - ðk þ 1Þ þ 1 ðk þ 1Þδkþ2, j ,
M ðþÞ M ð- Þ = a δ aδ
p k - 1 k,pþ1 p pþ1, j
= ak - 1 aj - 1 δk, j = ðak - 1 Þ2 δk, j
kj
= ½2l - ðk - 1Þ þ 1 ðk - 1Þδk, j = ð2l - k þ 2Þðk - 1Þδk, j :
ð3:163Þ
Notice that although a0 is not defined, δ1, j + 1 = 0 for any j, and so this causes no
inconvenience. Hence, [M(+)M(-)]kj of (3.163) is well-defined with 1 ≤ k ≤ 2l + 1.
Important properties of angular momentum operators examined above are based
upon the fact that those operators are ladder operators and represented by nilpotent
matrices. These characteristics will further be studied in Parts III–V.
In this section, our central task is to solve the associated Legendre differential
equation expressed by (3.127) by an analytical method. Putting m = 0 in (3.127),
we have
\[ \frac{d}{dx}\left[(1-x^2)\frac{dP_l^0(x)}{dx}\right] + l(l+1)P_l^0(x) = 0, \qquad (3.164) \]
\[ (1-x^2)\,\frac{d}{dx}(1-x^2)^l = -2lx\,(1-x^2)^l, \qquad (3.166) \]

\[ d^{\,n}(uv) = \sum_{m=0}^{n}\frac{n!}{m!(n-m)!}\,(d^{\,m}u)(d^{\,n-m}v), \qquad (3.167) \]
where
dm u=dxm dm u:
The above shorthand notation is due to Byron and Fuller [4]. We use this notation
for simplicity from place to place.
Noting that the third order and higher order differentiations of (1 - x2) vanish in
LHS of (3.166), we have
\[ \mathrm{LHS} = d^{\,l+1}\!\left[(1-x^2)\,d(1-x^2)^l\right] = (1-x^2)\,d^{\,l+2}(1-x^2)^l - 2(l+1)x\,d^{\,l+1}(1-x^2)^l - l(l+1)\,d^{\,l}(1-x^2)^l. \]

Also noting that the second order and higher order differentiations of 2lx vanish in the RHS of (3.166), we have

\[ \mathrm{RHS} = -d^{\,l+1}\!\left[2lx\,(1-x^2)^l\right] = -2lx\,d^{\,l+1}(1-x^2)^l - 2l(l+1)\,d^{\,l}(1-x^2)^l. \]

Therefore,

\[ \mathrm{LHS} - \mathrm{RHS} = (1-x^2)\,d^{\,l+2}(1-x^2)^l - 2x\,d^{\,l+1}(1-x^2)^l + l(l+1)\,d^{\,l}(1-x^2)^l = 0. \]
We define Pl(x) as
\[ P_l(x) \equiv \frac{(-1)^l}{2^l\,l!}\,\frac{d^l}{dx^l}(1-x^2)^l, \qquad (3.168) \]

where the constant (−1)^l/(2^l l!) is multiplied according to the custom so that we can explicitly represent the Rodrigues formula of the Legendre polynomials. Thus, from (3.164) P_l(x) defined above satisfies the Legendre differential equation. Rewriting it, we get

\[ (1-x^2)\frac{d^2P_l(x)}{dx^2} - 2x\frac{dP_l(x)}{dx} + l(l+1)P_l(x) = 0. \qquad (3.169) \]
Or equivalently, we have
\[ \frac{d}{dx}\left[(1-x^2)\frac{dP_l(x)}{dx}\right] + l(l+1)P_l(x) = 0. \qquad (3.170) \]
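Before proceeding, the Rodrigues construction (3.168) and the Legendre equation (3.170) can be verified symbolically. The sketch below is an illustration, not part of the book; it assumes the sympy library.

```python
# Sympy sketch (assumption, not from the text): Pl(x) from the Rodrigues formula (3.168)
# satisfies the Legendre differential equation (3.170) and obeys Pl(1) = 1.
import sympy as sp

x = sp.symbols('x')

def P(l):                                   # Rodrigues formula (3.168)
    return sp.diff((1 - x**2)**l, x, l) * (-1)**l / (2**l * sp.factorial(l))

for l in range(6):
    Pl = sp.expand(P(l))
    lhs = sp.diff((1 - x**2) * sp.diff(Pl, x), x) + l * (l + 1) * Pl
    assert sp.simplify(lhs) == 0            # (3.170)
    assert Pl.subs(x, 1) == 1               # Pl(1) = 1, cf. (3.198) below
```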
\[ \frac{d}{dx}\left[(1-x^2)\frac{dP_l^m(x)}{dx}\right] + \left[l(l+1) - \frac{m^2}{1-x^2}\right]P_l^m(x) = 0, \qquad (3.171) \]

\[ m = l,\ l-1,\ l-2,\ \cdots,\ 1,\ 0,\ -1,\ \cdots,\ -l+1,\ -l. \]
SOLDEs of the form

\[ \frac{d}{dx}\left[p(x)\frac{dy(x)}{dx}\right] + c(x)\,y(x) = 0 \]

are of particular importance. We will come back to this point in Sect. 10.3.
Since m can be either positive or negative, from (3.171) we notice that P_l^m(x) and P_l^{-m}(x) must satisfy the same differential equation (3.171). This implies that P_l^m(x) and P_l^{-m}(x) are connected, i.e., linearly dependent. First, let us assume that m ≥ 0; the case of m < 0 will be examined shortly.
According to Dennery and Krzywicki [5], we assume
\[ P_l^m(x) = \kappa\,(1-x^2)^{m/2}\,C(x), \qquad (3.172) \]
where κ is a constant. Inserting (3.172) into (3.171) and rearranging the terms, we
obtain
\[ (1-x^2)\frac{d^2C}{dx^2} - 2(m+1)x\frac{dC}{dx} + (l-m)(l+m+1)\,C = 0 \quad (0 \le m \le l). \qquad (3.173) \]

\[ (1-x^2)\frac{d^2}{dx^2}\left(\frac{d^mP_l}{dx^m}\right) - 2(m+1)x\,\frac{d}{dx}\left(\frac{d^mP_l}{dx^m}\right) + (l-m)(l+m+1)\,\frac{d^mP_l}{dx^m} = 0, \qquad (3.174) \]
where we used the Leibniz rule about differentiation of (3.167). Comparing (3.173)
and (3.174), we find that
\[ C(x) = \kappa'\,\frac{d^mP_l}{dx^m}, \]

where κ′ is a constant. Inserting this relation into (3.172) and setting κκ′ = 1, we get

\[ P_l^m(x) = (1-x^2)^{m/2}\,\frac{d^mP_l(x)}{dx^m} \quad (0 \le m \le l). \qquad (3.175) \]
Equations (3.175) and (3.176) define the associated Legendre functions. Note,
however, that the function form differs from literature to literature [2, 5, 6].
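One such difference is worth flagging explicitly: SciPy's scipy.special.lpmv includes the Condon–Shortley phase (−1)^m, whereas in this book that phase enters only later through (3.149). The hedged comparison below (not from the text; it assumes sympy and scipy) shows that the two conventions differ by exactly that factor.

```python
# Comparison sketch (assumption, not from the text): (3.175) versus scipy.special.lpmv,
# which carries an extra Condon-Shortley factor (-1)^m.
import numpy as np
import sympy as sp
from scipy.special import lpmv

xs = sp.symbols('x')

def P(l):                                            # Rodrigues formula (3.168)
    return sp.diff((1 - xs**2)**l, xs, l) * (-1)**l / (2**l * sp.factorial(l))

def Plm(l, m):                                       # (3.175), 0 <= m <= l
    return (1 - xs**2)**sp.Rational(m, 2) * sp.diff(P(l), xs, m)

x = np.linspace(-0.99, 0.99, 7)
for l in range(4):
    for m in range(l + 1):
        ours = sp.lambdify(xs, Plm(l, m), 'numpy')(x)
        assert np.allclose((-1)**m * ours, lpmv(m, l, x))
```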
Among classical orthogonal polynomials, Gegenbauer polynomials C λn ðxÞ often
appear in the literature. The relevant differential equation is defined by
d2 λ d
1 - x2 C ðxÞ - ð2λ þ 1Þx C λn ðxÞ þ nðn þ 2λÞC λn ðxÞ
2 n
dx dx
1
=0 λ> - : ð3:177Þ
2
d 2 mþ12 d mþ1
1 - x2 C
2 l-m
ðxÞ - 2ðm þ 1Þx Cl - m2 ðxÞ þ ðl - mÞ
dx dx
mþ1
ðl þ m þ 1ÞC l - m2 ðxÞ = 0: ð3:178Þ
dm Pl ðxÞ mþ1
m = constant Cl - m2 ðxÞ ð0 ≤ m ≤ lÞ: ð3:179Þ
dx
Next, let us determine the constant appearing in (3.179). To this end, we consider
a following generating function of the polynomials C λn ðxÞ defined by [7, 8]
-λ 1 1
1 - 2tx þ t 2 n=0
Cλn ðxÞt n λ> - : ð3:180Þ
2
1 -λ m
ð 1 þ xÞ - λ = m=0
x , ð3:181Þ
m
-λ
where λ is an arbitrary real number and we define as
m
-λ -λ
- λð- λ - 1Þð- λ - 2Þ⋯ð- λ - m þ 1Þ=m! and
m 0
1: ð3:182Þ
n n! n
= and = 1:
m ðn - mÞ!m! 0
1 Γð- λ þ 1Þ
ð 1 þ xÞ - λ = xm , ð3:183Þ
m=0 m!Γð- λ - m þ 1Þ
In (3.184), ℜe denotes the real part of a number z. Changing variables such that
t = u2, we have
1
e - u u2z - 1 du ðℜe z > 0Þ:
2
ΓðzÞ = 2 ð3:185Þ
0
Note that the above expression is associated with the following fundamental
feature of the gamma functions:
-λ 1 Γð- λ þ 1Þ
1 - 2tx þ t 2 = ð- t Þm ð2x - t Þm : ð3:187Þ
m=0 m!Γð- λ - m þ 1Þ
Assuming that x is a real number belonging to an interval [-1, 1], (3.187) holds
with t satisfying |t| < 1 [8]. The discussion is as follows: When x satisfies the above
condition, solving 1 - 2tx + t2 = 0 we get solution t± such that
p
t ± = x ± i 1 - x2 :
Defining r as
r minfjt þ j, jt - jg,
(1 - 2tx + t2)-λ, which is regarded as a function of t, is analytic in the disk |t| < r.
However, we have
jt ± j = 1:
Thus, (1 - 2tx + t2)-λ is analytic within the disk |t| < 1 and, hence, it can be
expanded in a Taylor’s series (see Chap. 6).
Continuing the calculation of (3.187), we have
-λ 1 Γð-λ þ 1Þ m m!
1-2tx þ t2 ¼ m¼0 m!Γð-λ -m þ 1Þ
ð-1Þm t m k¼0 k!ðm -k Þ!
2m -k xm - k ð-1Þk tk
where the last equality results from that we rewrote gamma functions using (3.186).
Replacing (m + k) with n, we get
where [n/2] represents an integer that does not exceed n/2. This expression comes
from a requirement that an order of x must satisfy the following condition:
n - 2k ≥ 0 or k ≤ n=2: ð3:190Þ
Comparing (3.164) and (3.177) and putting λ = 1/2, we immediately find that the
two differential equations are identical [7]. That is,
n ðxÞ = Pn ðxÞ:
C1=2 ð3:192Þ
½n=2ð- 1Þk 2n - 2k Γ 1
þ n - k n - 2k
P n ð xÞ = k = 0 k!ðn - 2k Þ!
2
x : ð3:193Þ
Γ 12
1 1 1 1 3 3
Γ þ m ¼ m- Γ m- ¼ m- m- Γ m- ¼⋯
2 2 2 2 2 2
1 3 1 1
¼ m- m- ⋯ Γ
2 2 2 2
1 ð3:195Þ
¼ 2 - m ð2m - 1Þð2m - 3Þ⋯3 1 Γ
2
ð2m - 1 Þ! 1 ð2mÞ! 1
¼ 2-m m-1 Γ ¼ 2 - 2m Γ :
2 ðm - 1Þ! 2 m! 2
1 ð2n - 2kÞ! 1
Γ þ n - k = 2 - 2ðn - kÞ Γ : ð3:196Þ
2 ðn - kÞ! 2
1 p
1
e - u du = π :
2
Γ =2
2 0
For the derivation of the above definite integral, see (2.86) of Sect. 2.4. From
(3.184), we also have
Γð1Þ = 1:
In relation of the discussion of Sect. 3.5, let us derive an important formula about
Legendre polynomials. From (3.180) and (3.192), we get
- 1=2 1
1 - 2tx þ t2 P ðxÞt
n=0 n
n
: ð3:197Þ
- 12 1 1 1
1 - 2tx þ t 2 = = tn = P ð1Þt n
: ð3:198Þ
1-t n=0 n=0 n
Pn ð1Þ = 1:
k
½ðl - mÞ=2 ð- 1Þ ð2l - 2k Þ!ðl - 2k Þðl - 2k - 1Þ⋯ðl - 2k - m þ 1Þ l - 2k - m
d m Pl ðxÞ=dxm ¼ x
k¼0 l
2 k!ðl - k Þ!ðl - 2k Þ!
k
½ðl - mÞ=2 ð- 1Þ ð2l - 2kÞ!
¼ xl - 2k - m :
k¼0 2l k!ðl - kÞ!ðl - 2k - mÞ!
ð3:199Þ
Meanwhile, we have
Γ l þ 12 - k ð2l - 2kÞ! m!
= 2 - 2ðl - k - mÞ :
Γ m þ 12 ðl - kÞ! ð2mÞ!
Therefore, we get
where we used m ! = Γ(m + 1) and (2m) ! = Γ(2m + 1). Comparing (3.199) and
(3.201), we get
Hence, we have
When m = 0, we have
ð- 1Þl dl l
P0l ðxÞ = 1 - x2 = Pl ðxÞ: ð3:207Þ
2l l! dxl
ð- 1Þm ðl - mÞ! m
Pl- m ðxÞ = Pl ðxÞ ð- l ≤ m ≤ lÞ: ð3:210Þ
ðl þ mÞ!
-m
l ðxÞ and Pl
Thus, as expected earlier, Pm ðxÞ are linearly dependent.
Now, we return back to (3.149). From (3.206) we have
ð- 1Þl- m ðl þ mÞ!
l ðxÞ ¼
Pm ð1 þ xÞ - m=2 ð1 -xÞ - m=2
2l l!ðl -mÞ!
l- m ðl -mÞ! d r l d
l- m- r
× r¼0 r!ðl- m- r Þ! dx r ð 1 þ x Þ ð1- xÞl
dxl- m- r
ðl -mÞ! l!ð1 þ xÞl- r - 2 ð- 1Þl - m- r l!ð1-xÞmþr - 2
m m
ð- 1Þl- m ðl þ mÞ! l -m
¼
2l l!ðl -mÞ! r¼0 r!ðl -m -rÞ! ðl -rÞ! ðm þ r Þ!
ð-1Þr ð1 þ xÞl -r - 2 ð1-xÞrþ 2
m m
l!ðl þ mÞ! l- m
¼ :
2l r¼0 r!ðl - m- rÞ! ðl -r Þ! ðm þ r Þ!
ð3:214Þ
\[ P_l^m(x) = l!\,(l+m)!\sum_{r=0}^{l-m}\frac{(-1)^r\,\cos^{2l-2r-m}\frac{\theta}{2}\,\sin^{2r+m}\frac{\theta}{2}}{r!\,(l-m-r)!\,(l-r)!\,(m+r)!}. \qquad (3.215) \]
\[ Y_l^m(\theta,\phi) = l!\,\sqrt{\frac{(2l+1)(l+m)!\,(l-m)!}{4\pi}}\;e^{im\phi}\sum_{r=0}^{l-m}\frac{(-1)^{r+m}\,\cos^{2l-2r-m}\frac{\theta}{2}\,\sin^{2r+m}\frac{\theta}{2}}{r!\,(l-m-r)!\,(l-r)!\,(m+r)!}. \qquad (3.216) \]
For l = 3 and m = ±3, for instance, we have

\[ Y_3^3(\theta,\phi) = -\sqrt{\frac{35}{64\pi}}\,e^{i3\phi}\sin^3\theta, \qquad Y_3^{-3}(\theta,\phi) = \sqrt{\frac{35}{64\pi}}\,e^{-i3\phi}\sin^3\theta. \]

The minus sign appearing in Y_3^3(θ, ϕ) is due to the Condon–Shortley phase.
For l = 3 and m = 0, moreover, we have

\[ Y_3^0(\theta,\phi) = 3!\,\sqrt{\frac{7\cdot 3!\,3!}{4\pi}}\,\sum_{r=0}^{3}\frac{(-1)^r\,\cos^{6-2r}\frac{\theta}{2}\,\sin^{2r}\frac{\theta}{2}}{r!\,(3-r)!\,(3-r)!\,r!} \]
\[ = 18\sqrt{\frac{7}{\pi}}\left[\frac{\cos^6\frac{\theta}{2}}{0!3!3!0!} - \frac{\cos^4\frac{\theta}{2}\sin^2\frac{\theta}{2}}{1!2!2!1!} + \frac{\cos^2\frac{\theta}{2}\sin^4\frac{\theta}{2}}{2!1!1!2!} - \frac{\sin^6\frac{\theta}{2}}{3!0!0!3!}\right] \]
\[ = 18\sqrt{\frac{7}{\pi}}\left[\frac{\cos^6\frac{\theta}{2} - \sin^6\frac{\theta}{2}}{36} + \frac{\cos^2\frac{\theta}{2}\sin^2\frac{\theta}{2}\left(\sin^2\frac{\theta}{2} - \cos^2\frac{\theta}{2}\right)}{4}\right] \]
\[ = \sqrt{\frac{7}{\pi}}\left(\frac{5}{4}\cos^3\theta - \frac{3}{4}\cos\theta\right), \]

where in the last equality we used formulae of elementary algebra and trigonometric functions. At the same time, we get

\[ Y_3^0(0,\phi) = \sqrt{\frac{7}{4\pi}}. \]
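The closed form just obtained can be cross-checked numerically. The following sketch is not part of the text; it assumes a SciPy version that still provides scipy.special.sph_harm, whose calling convention places the azimuthal angle before the polar one.

```python
# Numerical cross-check (assumption, not from the text) of
# Y_3^0(theta) = sqrt(7/pi) * (5/4 cos^3(theta) - 3/4 cos(theta)).
import numpy as np
from scipy.special import sph_harm

theta = np.linspace(0.0, np.pi, 9)                       # polar (zenithal) angle
closed_form = np.sqrt(7/np.pi) * (1.25*np.cos(theta)**3 - 0.75*np.cos(theta))
scipy_value = sph_harm(0, 3, 0.0, theta).real            # m = 0, l = 3, azimuth = 0
assert np.allclose(closed_form, scipy_value)
assert np.isclose(closed_form[0], np.sqrt(7/(4*np.pi)))  # Y_3^0(0, phi) = sqrt(7/(4*pi))
```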
Orthogonality relation of functions is important. Here we deal with it, regarding the
associated Legendre functions.
m mþ1 m-1 m
1 - x2 d Pl - 2mx 1 - x2 d Pl
2 m-1 m-1
þ ð l þ m Þ ð l - m þ 1Þ 1 - x d Pl = 0:
m-1 m-1
d 1 - x2 Þm dm Pl = - ðl þ mÞðl - m þ 1Þ 1 - x2 d Pl : ð3:220Þ
1
m
f ðmÞ 1 - x2 ðdm Pl Þdm Pl0 dx ð0 ≤ m ≤ l, l0 Þ: ð3:221Þ
-1
1
m
f ðmÞ = 1 - x2 d dm - 1 Pl d m Pl0 dx
-1
1
m m 1 m m
= d m - 1 Pl 1 - x2 d Pl0 -1
- dm - 1 Pl d 1 - x2 d Pl0 dx
-1
1
m m
=- d m - 1 P l d 1 - x2 d Pl0 dx
-1
1
m-1 m-1
= dm - 1 Pl ðl0 þ mÞðl0 - m þ 1Þ 1 - x2 d Pl0 dx
-1
= ðl þ mÞðl0 - m þ 1Þf ðm - 1Þ,
0
ð3:222Þ
where with the second equality the first term vanished and with the second last
equality we used (3.220). Equation (3.222) gives a recurrence formula regarding
f(m). Further performing the calculation, we get
where we have
1
f ð 0Þ = Pl ðxÞPl0 ðxÞdx: ð3:224Þ
-1
To evaluate (3.224), we have two cases; i.e., (1) l ≠ l′ and (2) l = l′. With the first
case, assuming that l > l′ and taking partial integration, we have
1
l 0 l0
I= d l 1 - x2 d l 1 - x2 dx
-1
l 0 l0 1
= d l - 1 1 - x2 d l 1 - x2 ð3:226Þ
-1
1
l 0 l0
- d l - 1 1 - x2 dl þ1 1 - x2 dx:
-1
In the above, we find that the first term vanishes because it contains (1 - x2) as a
factor. Integrating (3.226) another l′ times as before, we get
1
0 0 0 l0
I = ð- 1Þl þ1
l
dl - l -1
1 - x2 d2l þ1 1 - x2 dx: ð3:227Þ
-1
0
In (3.227) we have 0 ≤ l - l′ - 1 ≤ 2l, and so d l - l - 1 ð1 - x2 Þ does not vanish,
l
l0 0 l0
but ð1 - x2 Þ is an at most 2l′-degree polynomial. Hence, d2l þ1 ð1 - x2 Þ vanishes.
Therefore,
f ð0Þ = 0: ð3:228Þ
If l < l′, changing Pl(x) and Pl0 ðxÞ in (3.224), we get f(0) = 0 as well.
1
l 2
I= d l 1 - x2 dx: ð3:229Þ
-1
1 1
l l l
I = ð- 1Þl 1 - x2 d 2l 1 - x2 dx = ð- 1Þ2l ð2lÞ! 1 - x2 dx: ð3:230Þ
-1 -1
1 π
l
1 - x2 dx = sin 2lþ1 θdθ: ð3:231Þ
-1 0
22lþ1 ðl!Þ2
We have already estimate this integral of (3.132) to have ð2lþ1Þ! in (3.135).
Therefore,
Thus, we get
ðl þ mÞ! ðl þ mÞ! 2
f ðm Þ = f ð 0Þ = : ð3:233Þ
ðl - mÞ! ðl - mÞ! 2l þ 1
\[ \int_{-1}^{1}P_l^m(x)\,P_{l'}^m(x)\,dx = \frac{(l+m)!}{(l-m)!}\,\frac{2}{2l+1}\,\delta_{ll'}. \qquad (3.234) \]
Accordingly, putting
we get
1
l ðxÞPl0 ðxÞdx = δll0 :
Pm ð3:236Þ
m
-1
2l þ 1 ð- 1Þl 2l þ 1 d l l
Pl ðxÞ P l ð xÞ = l 1 - x2 : ð3:237Þ
2 2 l! 2 dxl
1
-m ð- 1Þm ðl - mÞ! ðl þ mÞ! 2 2ð- 1Þm
l ðxÞPl
Pm ðxÞdx = = : ð3:239Þ
-1 ðl þ mÞ! ðl - mÞ! 2l þ 1 2l þ 1
This means that P_l^m(x) and P_l^{-m}(x) are not orthogonal. Thus, we need the factor e^{imϕ} to constitute a complete orthonormal system. In other words,
\[ \int_{0}^{2\pi}d\phi\int_{-1}^{1}d(\cos\theta)\;Y_{l'}^{m'}(\theta,\phi)^{*}\,Y_{l}^{m}(\theta,\phi) = \delta_{ll'}\,\delta_{mm'}. \qquad (3.240) \]
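The orthogonality relation (3.234) for the associated Legendre functions can also be spot-checked numerically. The sketch below is not from the text; it assumes scipy, whose lpmv carries the Condon–Shortley phase (−1)^m, a factor that cancels in the product.

```python
# Numerical sketch (assumption, not from the text): orthogonality relation (3.234),
# int_{-1}^{1} P_l^m P_{l'}^m dx = [(l+m)!/(l-m)!] * 2/(2l+1) * delta_{ll'}.
import numpy as np
from math import factorial
from scipy.special import lpmv
from scipy.integrate import quad

m = 1
for l in (1, 2, 3):
    for lp in (1, 2, 3):
        val, _ = quad(lambda x: lpmv(m, l, x) * lpmv(m, lp, x), -1, 1)
        expected = factorial(l + m) / factorial(l - m) * 2 / (2*l + 1) if l == lp else 0.0
        assert abs(val - expected) < 1e-8
```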
corresponding approach to the radial equation for the electron has been less popular
to date. The initial approach, however, was made by Sunakawa [3]. The purpose of
this chapter rests upon further improvement of that approach.
In Sect. 3.2, the separation of variables led to the radial part of the Schrödinger equation described as
\[ -\frac{\hbar^2}{2\mu}\,\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial R(r)}{\partial r}\right) + \frac{\hbar^2\lambda}{2\mu r^2}R(r) - \frac{Ze^2}{4\pi\varepsilon_0 r}R(r) = E\,R(r). \qquad (3.51) \]

We identified λ with l(l + 1); see Sect. 3.5. Thus, rewriting (3.51) and indexing R(r) with l, we have

\[ -\frac{\hbar^2}{2\mu r^2}\frac{d}{dr}\left(r^2\frac{dR_l(r)}{dr}\right) + \left[\frac{\hbar^2\,l(l+1)}{2\mu r^2} - \frac{Ze^2}{4\pi\varepsilon_0 r}\right]R_l(r) = E\,R_l(r), \qquad (3.241) \]
where Rl(r) is a radial wave function parametrized with l; μ, Z, ε0, and E denote a
reduced mass of hydrogen-like atom, atomic number, permittivity of vacuum, and
eigenvalue of energy, respectively. Otherwise we follow conventions.
Now, we are in position to solve (3.241). As in the cases of Chap. 2 of a quantum-
mechanical harmonic oscillator and the previous section of the angular momentum
operator, we present the operator formalism in dealing with radial wave functions of
hydrogen-like atoms. The essential point rests upon that the radial wave functions
can be derived by successively operating lowering operators on a radial wave
function having a maximum allowed orbital angular momentum quantum number.
The results agree with the conventional coordinate representation method based
upon power series expansion that is related to associated Laguerre polynomials.
Sunakawa [3] introduced the following differential equation by suitable trans-
formations of a variable, parameter, and function:
\[ -\frac{d^2\psi_l(\rho)}{d\rho^2} + \left[\frac{l(l+1)}{\rho^2} - \frac{2}{\rho}\right]\psi_l(\rho) = \mathcal{E}\,\psi_l(\rho), \qquad (3.242) \]

where ρ ≡ Zr/a, \(\mathcal{E} \equiv (2\mu/\hbar^2)(a/Z)^2E\), and ψ_l(ρ) ≡ ρR_l(r), with a ≡ 4πε₀ħ²/(μe²) being the Bohr radius of a hydrogen-like atom. Note that ρ and \(\mathcal{E}\) are dimensionless quantities. The related calculations are as follows: We have
related calculations are as follows: We have
\[ \frac{dR_l}{dr} = \frac{d(\psi_l/\rho)}{d\rho}\frac{d\rho}{dr} = \left(\frac{1}{\rho}\frac{d\psi_l}{d\rho} - \frac{\psi_l}{\rho^2}\right)\frac{Z}{a}. \]

Thus, we get

\[ r^2\frac{dR_l}{dr} = \frac{a}{Z}\left(\rho\frac{d\psi_l}{d\rho} - \psi_l\right), \qquad \frac{d}{dr}\left(r^2\frac{dR_l}{dr}\right) = \rho\,\frac{d^2\psi_l(\rho)}{d\rho^2}. \]

We define the operator b_l as

\[ b_l \equiv \frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}. \qquad (3.243) \]

Hence,

\[ b_l^{\dagger} = -\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}, \qquad (3.244) \]

where the operator b_l^† is the adjoint operator of b_l. Notice that these definitions are different from those of Sunakawa [3]. The operator d/dρ (≡ A) is formally an anti-Hermitian operator. We have mentioned such an operator in Sect. 1.5. The second terms of (3.243) and (3.244) are Hermitian operators, which we define as H. Thus, we foresee that b_l and b_l^† can be denoted as follows:

\[ b_l = A + H \quad \text{and} \quad b_l^{\dagger} = -A + H. \]

\[ b_lb_l^{\dagger} = \left(\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}\right)\left(-\frac{d}{d\rho} + \frac{l}{\rho} - \frac{1}{l}\right) = -\frac{d^2}{d\rho^2} + \frac{d}{d\rho}\left(\frac{l}{\rho} - \frac{1}{l}\right) - \left(\frac{l}{\rho} - \frac{1}{l}\right)\frac{d}{d\rho} + \frac{l^2}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2} \]
\[ = -\frac{d^2}{d\rho^2} - \frac{l}{\rho^2} + \frac{l^2}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2} = -\frac{d^2}{d\rho^2} + \frac{l(l-1)}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2}. \qquad (3.245) \]

Also, we have

\[ b_l^{\dagger}b_l = -\frac{d^2}{d\rho^2} + \frac{l(l+1)}{\rho^2} - \frac{2}{\rho} + \frac{1}{l^2}. \qquad (3.246) \]

We define

\[ H^{(l)} \equiv -\frac{d^2}{d\rho^2} + \frac{l(l+1)}{\rho^2} - \frac{2}{\rho}. \]

Then, from (3.243) and (3.244) as well as (3.245) and (3.246) we have

\[ H^{(l)} = b_l^{\dagger}b_l + \varepsilon^{(l-1)} = b_{l+1}b_{l+1}^{\dagger} + \varepsilon^{(l)}, \]

where ε^{(l)} ≡ −1/(l+1)². Alternatively,
Here we assume that χ is normalized (i.e., hχ| χi = 1). On the basis of the
variational principle [9], the above expected value must take a minimum ε(n - 1)
so that χ can be an eigenfunction. To satisfy this condition, we have
\[ |b_n^{\dagger}\chi\rangle = 0. \qquad (3.251) \]

\[ H^{(n-1)}\chi = \varepsilon^{(n-1)}\chi. \qquad (3.252) \]

\[ \psi_{n-1}^{(n)} \equiv \chi. \qquad (3.253) \]

\[ \psi_{n-s}^{(n)} \equiv b_{n-s+1}b_{n-s+2}\cdots b_{n-1}\,\psi_{n-1}^{(n)} \quad (2 \le s \le n). \qquad (3.255) \]
ðnÞ
With these functions (s - 1) operators have been operated on ψ n - 1 . Note that if
s took 1, no operation of bl would take place. Thus, we find that bl functions upon the
l-state to produce the (l - 1)-state. That is, bl acts as an annihilation operator. For the
sake of convenience we express
H ðn,sÞ H ðn - sÞ : ð3:256Þ
ðnÞ
H ðn,sÞ ψ ðnn-Þ s = H ðn,sÞ bn - sþ1 bn - sþ2 ⋯bn - 1 ψ n - 1
ðnÞ
= bn - sþ1 H ðn,s - 1Þ bn - sþ2 ⋯bn - 1 ψ n - 1
ðnÞ
= bn - sþ1 bn - sþ2 H ðn,s - 2Þ ⋯bn - 1 ψ n - 1
⋯⋯
ðnÞ
= bn - sþ1 bn - sþ2 ⋯H ðn,2Þ bn - 1 ψ n - 1 ð3:257Þ
ðnÞ
= bn - sþ1 bn - sþ2 ⋯bn - 1 H ðn,1Þ ψ n - 1
ðnÞ
= bn - sþ1 bn - sþ2 ⋯bn - 1 εðn - 1Þ ψ n - 1
ðnÞ
= εðn - 1Þ bn - sþ1 bn - sþ2 ⋯bn - 1 ψ n - 1
= εðn - 1Þ ψ ðnn-Þ s :
Thus, total n functions ψ ðnn-Þ s ð1 ≤ s ≤ nÞ belong to the same eigenvalue ε(n - 1).
Notice that the eigenenergy En corresponding to ε(n - 1) is given by
\[ E_n = -\frac{\hbar^2}{2\mu}\left(\frac{Z}{a}\right)^2\frac{1}{n^2}. \qquad (3.258) \]
ðnÞ
If we define l n - s and take account of (3.252), total n functions ψ l (l = 0,
1, 2, ⋯, n - 1) belong to the same eigenvalue ε(n - 1).
ðnÞ
The quantum state ψ l is associated with the operators H(l ). Thus, the solution of
ðnÞ
(3.242) has been given by functions ψ l parametrized with n and l on condition that
(3.251) holds. As explicitly indicated in (3.255) and (3.257), bl lowers the parameter
ðnÞ
l by one from l to l - 1, when it operates on ψ l . The operator b0 cannot be defined
as indicated in (3.243), and so the lowest number of l should be zero. Operators such
as bl are known as a ladder operator (lowering operator or annihilation operator in the
ðnÞ
present case). The implication is that the successive operations of bl on ψ n - 1
produce various parameters l as a subscript down to zero, while retaining the same
integer parameter n as a superscript.
\[ \frac{d\psi_{n-1}^{(n)}}{d\rho} - \left(\frac{n}{\rho} - \frac{1}{n}\right)\psi_{n-1}^{(n)} = 0. \qquad (3.259) \]

\[ \psi_{n-1}^{(n)} = c_n\,\rho^n e^{-\rho/n}. \qquad (3.260) \]

Namely,

\[ |c_n|^2\int_0^{\infty}\rho^{2n}e^{-2\rho/n}\,d\rho = 1. \qquad (3.262) \]

\[ \int_0^{\infty}e^{-2\rho\xi}\,d\rho = \frac{1}{2\xi}. \]

Differentiating both sides 2n times with respect to ξ, we get

\[ \int_0^{\infty}\rho^{2n}e^{-2\rho\xi}\,d\rho = \left(\frac{1}{2}\right)^{2n+1}(2n)!\,\xi^{-(2n+1)}. \qquad (3.263) \]

\[ \int_0^{\infty}\rho^{2n}e^{-2\rho/n}\,d\rho = \left(\frac{1}{2}\right)^{2n+1}(2n)!\,n^{2n+1}. \qquad (3.264) \]

Hence,

\[ c_n = \sqrt{\left(\frac{2}{n}\right)^{2n+1}\frac{1}{(2n)!}}. \qquad (3.265) \]
To further normalize the other wave functions, we calculate the following inner
product:
ðnÞ ðnÞ
ψ l jψ l = εðn - 1Þ - εðn - 2Þ εðn - 1Þ - εðn - 3Þ ⋯ εðn - 1Þ - εðlÞ
ðnÞ ðnÞ
ψ n - 1 jψ n - 1 : ð3:268Þ
ðnÞ
To show this, we use mathematical induction. We have already normalized ψ n - 1
ðnÞ ðnÞ
in (3.261). Next, we calculate ψ n - 2 jψ n - 2 such that
where with the third equality we used (3.267) with l = n - 1; with the last equality
we used (3.251). Therefore, (3.268) holds with l = n - 2. Then, it suffices to show
ðnÞ ðnÞ ðnÞ ðnÞ
that assuming that (3.268) holds with ψ lþ1 jψ lþ1 , it holds with ψ l jψ l as well.
ðnÞ ðnÞ
Let us calculate ψ l jψ l , starting with (3.266) as follows:
In the next step, using b{lþ2 blþ2 = blþ3 b{lþ3 þ εðlþ2Þ - εðlþ1Þ , we have
Thus, we find that in the first term the index number of b{lþ3 has been increased by
one with itself transferred toward the right side. On the other hand, we notice that
ðnÞ ðnÞ
with the second and third terms, εðlþ1Þ ψ lþ1 jψ lþ1 cancels out. Repeating the above
processes, we reach a following expression:
ð3:272Þ
In (3.272), the first term of RHS vanishes because of (3.251); the subsequent
ðnÞ ðnÞ
terms produce ψ lþ1 jψ lþ1 whose coefficients have canceled out one another except
for [ε(n - 1) - ε(l )].
Meanwhile, from assumption of the mathematical induction we have
ðnÞ ðnÞ
ψ lþ1 jψ lþ1 = εðn - 1Þ - εðn - 2Þ εðn - 1Þ - εðn - 3Þ ⋯ εðn - 1Þ - εðlþ1Þ
ðnÞ ðnÞ
ψ n - 1 jψ n - 1 :
Inserting this equation into (3.272), we arrive at (3.268). In other words, we have
shown that if (3.268) holds with l = l + 1, (3.268) holds with l = l as well. This
ðnÞ ðnÞ
completes the proof to show that (3.268) is true of ψ l jψ l with l down to 0.
The normalized wave functions ψ_l^{(n)} are expressed from (3.255) as

\[ \psi_l^{(n)} = \kappa(n,l)^{-1/2}\,b_{l+1}b_{l+2}\cdots b_{n-1}\,\psi_{n-1}^{(n)}, \qquad (3.273) \]

\[ \psi_{n-1}^{(n)} = \sqrt{\left(\frac{2}{n}\right)^{2n+1}\frac{1}{(2n)!}}\;\rho^n e^{-\rho/n}. \qquad (3.276) \]

\[ \tilde{b}_l \equiv \left[\varepsilon^{(n-1)} - \varepsilon^{(l-1)}\right]^{-1/2}b_l. \qquad (3.277) \]

\[ \psi_l^{(n)} = \tilde{b}_{l+1}\tilde{b}_{l+2}\cdots\tilde{b}_{n-1}\,\psi_{n-1}^{(n)}. \qquad (3.278) \]

It will be of great importance to compare the functions ψ_l^{(n)} with conventional wave functions that are expressed using associated Laguerre polynomials. For this purpose we define the following functions Φ_l^{(n)}(ρ) such that

\[ \Phi_l^{(n)}(\rho) \equiv \left(\frac{2}{n}\right)^{l+3/2}\sqrt{\frac{(n-l-1)!}{2n\,(n+l)!}}\;e^{-\rho/n}\,\rho^{\,l+1}\,L_{n-l-1}^{2l+1}\!\left(\frac{2\rho}{n}\right). \qquad (3.279) \]

Here the associated Laguerre polynomials are defined by

\[ L_n^{\nu}(x) \equiv \frac{1}{n!}\,x^{-\nu}e^{x}\frac{d^n}{dx^n}\left(x^{n+\nu}e^{-x}\right) \quad (\nu > -1). \qquad (3.280) \]

In a form of power series expansion, the polynomials are expressed for integer k ≥ 0 as

\[ L_n^{k}(x) = \sum_{m=0}^{n}\frac{(-1)^m\,(n+k)!}{(n-m)!\,(k+m)!\,m!}\,x^m. \qquad (3.281) \]
Hence, instead of (3.280) and (3.281), the Rodrigues formula and power series
expansion of Ln(x) are given by [2, 5]
1 x dn n - x
Ln ðxÞ = e ðx e Þ,
n! dxn
n ð- 1Þm n!
Ln ðxÞ = m=0
xm :
ðn - mÞ!ðm!Þ2
ðnÞ ρ
The function Φl ðρÞ contains multiplication factors e - n and ρl + 1. The function
Ln - l - 1 2ρ
2lþ1
n is a polynomial of ρ with the highest order of ρn - l - 1. Therefore,
ðnÞ ρ
Φl ðρÞ consists of summation of terms containing e - n ρt , where t is an integer equal
ðnÞ
to 1 or larger. Consequently, Φl ðρÞ → 0 when ρ → 0 and ρ → 1 (vide supra).
ðnÞ
Thus, we have confirmed that Φl ðρÞ certainly satisfies proper BCs mentioned
d
earlier and, hence, the operator dρ is indeed an anti-Hermitian. To show this more
explicitly, we define D dρd
. An inner product between arbitrarily chosen functions
f and g is
1 1
hf jDgi f Dgdρ = ½f g1
0 - ðDf Þgdρ
0 0 ð3:282Þ
= ½f
g 1
0
þ h - D f jgi,
hf jDgi = D{ f jg : ð3:283Þ
Thus we get
D{ = - D:
ðnÞ
This means that D is anti-Hermitian. The functions Φl ðρÞ we are dealing with
certainly satisfy the required boundary conditions. The operator H(l ) appearing in
(3.247) and (3.248) is Hermitian accordingly. This is because
The Hermiticity is true of bl b{l as well. Thus, the eigenvalue and eigenstate
(or wave function) which belongs to that eigenvalue are physically meaningful.
Next, consider the following operation:
lþ32
ðnÞ
- 12 2 ðn - l - 1Þ! d l 1
bl Φl ðρÞ ¼ εðn - 1Þ - εðl - 1Þ þ -
n 2nðn þ lÞ! dρ ρ l
ρ 2ρ
e - n ρlþ1 Ln2lþ1
-l-1 , ð3:286Þ
n
where
- 12 nl
εðn - 1Þ - εðl - 1Þ = : ð3:287Þ
ðn þ lÞðn - lÞ
2ρ
Rewriting Ln2lþ1
-l-1 n in a power series expansion form using (3.280) and
rearranging the result, we obtain
lþ32
ðnÞ 2 nl 1 d l 1 ρ d n-l-1 nþl - 2ρn
bl Φl ðρÞ ¼ þ - en ρ -l ρ e
n ðn þ lÞðn-lÞ 2nðn þ lÞ!ðn-l-1Þ! dρ ρ l dρn-l-1
lþ32
2 nl 1 d l 1
¼ þ -
n ðn þ lÞðn-lÞ 2nðn þ lÞ!ðn-l-1Þ! dρ ρ l
m
n-l-1 ð-1Þ ðn þ lÞ! lþmþ1 2 m ðn-l-1Þ!
×e -ρ=n ρ ,
m¼0 ð2l þ m þ 1Þ! n m!ðn-l-m-1Þ!
ð3:288Þ
lþ32
ðnÞ 2 nþl nl 1
bl Φl ð ρÞ ¼ e - ρ=n ρl
n 2l ðn þ lÞðn - lÞ 2nðn þ lÞ!ðn - l - 1Þ!
ð3:289Þ
In (3.289), calculation of the part {⋯} next to the multiplication sign for RHS is
somewhat complicated, and so we describe the outline of the calculation procedure
below. We have
Notice that with the second equality of (3.290), the summation is divided into
three terms; i.e., 1 ≤ m ≤ n - l - 1, m = n - l (the highest-order term), and m = 0
(the lowest-order term). Note that with the second last equality of (3.290), the
highest-order (n - l) term and the lowest-order term (i.e., a constant) have been
absorbed in a single equation, namely, an associated Laguerre polynomial. Corre-
spondingly, with the second last equality of (3.290) the summation range is extended
to 0 ≤ m ≤ n - l.
Summarizing the above results, we get
lþ32
ðnÞ 2 nþl nlðn - lÞ! 1 2ρ
bl Φ l ð ρÞ ¼ e - ρ=n ρl Ln2l--l1
n 2l ðn þ lÞðn - lÞ 2nðn þ lÞ!ðn - l - 1Þ! n
lþ12
2 ðn - lÞ! 2ρ
¼ e - ρ=n ρl Ln2l--l1
n 2nðn þ l - 1Þ! n
ðl - 1Þþ32
2 ½n - ðl - 1Þ - 1! - ρn ðl - 1Þþ1 2ðl - 1Þþ1 2ρ
¼ e ρ Ln - ðl - 1Þ - 1
n 2n½n þ ðl - 1Þ! n
ðnÞ
Φl - 1 ðρÞ:
ð3:291Þ
ðnÞ ðnÞ
Thus, we find out that Φl ðρÞ behaves exactly like ψ l . Moreover, if we replace
l in (3.279) with n - 1, we find
ðnÞ ðnÞ
Φn - 1 ðρÞ = ψ n - 1 : ð3:292Þ
ðnÞ ðnÞ
Φn - 2 ð ρÞ = ψ n - 2 : ð3:293Þ
ðnÞ ðnÞ
Φl ðρÞ = ψ l ðρÞ, ð3:294Þ
ðnÞ ðnÞ
Φl ðρÞ ψ l ðρÞ: ð3:295Þ
ðnÞ ðnÞ
Rl ðr Þ = ψ l =ρ: ð3:296Þ
ðnÞ
To normalize Rl ðr Þ, we have to calculate the following integral:
1 2 1 2 2 3 1 2
ðnÞ 1 ðnÞ a a a ðnÞ
Rl ðr Þ r 2 dr = ψ ρ dρ = ψl dρ
0 0 ρ2 l Z Z Z 0
a 3
= : ð3:297Þ
Z
ðnÞ
Accordingly, we choose the following functions Rl ðr Þ for the normalized radial
wave functions:
ðnÞ ðnÞ
R l ðr Þ = ðZ=aÞ3 Rl ðr Þ: ð3:298Þ
Substituting (3.296) into (3.298) and taking account of (3.279) and (3.280), we
obtain
\[ R_l^{(n)}(r) = \sqrt{\left(\frac{2Z}{an}\right)^{3}\frac{(n-l-1)!}{2n\,(n+l)!}}\;\exp\!\left(-\frac{Zr}{an}\right)\left(\frac{2Zr}{an}\right)^{l}L_{n-l-1}^{2l+1}\!\left(\frac{2Zr}{an}\right). \qquad (3.299) \]
Equation (3.298) is exactly the same as the normalized radial wave functions that can be obtained as the solution of (3.241) through the power series expansion. All these functions belong to the same eigenenergy E_n = −(ħ²/2μ)(Z/a)²(1/n²); see (3.258).
Returning back to the quantum states sharing the same eigenenergy, we have such
n states that belong to the principal quantum number n with varying azimuthal
quantum number l (0 ≤ l ≤ n - 1). Meanwhile, from (3.158), (3.159), and
(3.171), 2l + 1 quantum states possess an eigenvalue of l(l + 1) with the quantity
M2. As already mentioned in Sects. 3.5 and 3.6, the integer m takes the 2l + 1
different values of
\[ m = l,\ l-1,\ l-2,\ \cdots,\ 1,\ 0,\ -1,\ \cdots,\ -l+1,\ -l. \]

Summing over all these quantum states, we have

\[ \sum_{l=0}^{n-1}(2l+1) = n^2. \]
Regarding this situation, we say that the quantum states belonging to the principal quantum number n, or the energy eigenvalue E_n = −(ħ²/2μ)(Z/a)²(1/n²), are degenerate n²-fold. Note that we ignored the freedom of the spin state.
In summary of this section, we have developed the operator formalism in dealing
with radial wave functions of hydrogen-like atoms and seen how the operator
formalism features the radial wave functions. The essential point rests upon that
the radial wave functions can be derived by successively operating the lowering
ðnÞ
operators bl on ψ n - 1 that is parametrized with a principal quantum number n and an
orbital angular momentum quantum number l = n - 1. This is clearly represented by
(3.278). The results agree with the conventional coordinate representation method
based upon the power series expansion that leads to associated Laguerre polyno-
mials. Thus, the operator formalism is again found to be powerful in explicitly
representing the mathematical constitution of quantum-mechanical systems.
Since we have obtained angular wave functions and radial wave functions, we
ðnÞ
describe normalized total wave functions Λl,m of hydrogen-like atoms as a product
of the angular part and radial part such that
\[ \Lambda_{l,m}^{(n)} = Y_l^m(\theta,\phi)\,R_l^{(n)}(r). \qquad (3.300) \]
ðnÞ
ð1Þ 1 - 3=2 ψ n - 1 1 - 3=2 - r=a
ϕð1sÞ Y 00 ðθ, ϕÞR0 ðr Þ = a = a e , ð3:301Þ
4π ρ π
ð2Þ 1 r
ϕð2sÞ Y 00 ðθ, ϕÞR0 ðr Þ = p a - 2 e - 2a 2 - :
3 r
ð3:302Þ
4 2π a
ð2Þ 3 1 3 r
ðcos θÞ p a - 2 e - 2a
r
ϕ 2pz Y 01 ðθ, ϕÞR1 ðr Þ =
4π 2 6 a
1 - 32 r - 2a 1 3 r r z 1
e cos θ = p a - 2 e - 2a = p a - 2 e - 2a z:
r 5 r
= p a
4 2π a 4 2π a r 4 2π
ð3:303Þ
ð2Þ 1 3 r
ϕ 2pxþiy Y 11 ðθ, ϕÞR1 ðr Þ = - p a - 2 e - 2a sin θeiϕ
r
8 π a
ð3:304Þ
1 3 r r x þ iy 1
= - p a - 2 e - 2a = - p a - 2 e - 2a ðx þ iyÞ:
5 r
8 π a r 8 π
In (3.304), the minus sign comes from the Condon–Shortley phase. Furthermore,
we have
ð2Þ 1 3 r
ϕ 2px - iy Y 1- 1 ðθ, ϕÞR1 ðr Þ = p a - 2 e - 2a sin θe - iϕ
r
8 π a
ð3:305Þ
1 3 r r x - iy 1
= p a - 2 e - 2a = p a - 2 e - 2a ðx - iyÞ:
5 r
8 π a r 8 π
Notice that the above notations ϕ(2px + iy) and ϕ(2px - iy) differ from the custom
that uses, e.g., ϕ(2px) and ϕ(2py). We will come back to this point in Sect. 4.3.
\[ H\psi = i\hbar\frac{\partial\psi}{\partial t}. \qquad (1.47) \]

\[ i\hbar\frac{\partial\xi(t)}{\partial t} = E\,\xi(t). \qquad (1.56) \]
The probability density of the system (i.e., normally a particle such as an electron,
a harmonic oscillator, etc.) residing at a certain place x at a certain time t is expressed
as
ψ ðx, t Þψ ðx, t Þ:
This means that the probability density of the system depends only on spatial
coordinate and is constant in time. Such a state is said to be a stationary state. That is,
the system continues residing in a quantum state described by ϕ(x) and remains
unchanged independent of time.
Next, we consider a linear combination of functions described by (1.60). That is,
we have
where the first term is pertinent to the state 1 and second term to the state 2 and c1 and
c2 are complex constants with respect to the spatial coordinates but may be weakly
time-dependent. The state described by (4.2) is called a coherent state. The proba-
bility distribution of that state is described as
2
ψ ðx, t Þψ ðx, t Þ = jc1 j2 ϕ1 j2 þ c2 jϕ2 j2 þ c1 c2 ϕ1 ϕ2 e - iωt þ c2 c1 ϕ2 ϕ1 eiωt , ð4:3Þ
where ω is expressed as
This equation shows that the probability density of the system undergoes a
sinusoidal oscillation with time. The angular frequency equals the energy difference
between the two states divided by the reduced Planck constant. If the system is a
charged particle such as an electron and proton, the sinusoidal oscillation is accom-
panied by an oscillating electromagnetic field. Thus, the coherent state is associated
with the optical transition from one state to another, when the transition is related to
the charged particle.
The optical transitions result from various causes. Of these, the electric dipole
transition yields the largest transition probability and the dipole approximation is
often chosen to represent the transition probability. From the point of view of optical
measurements, the electric dipole transition gives the strongest absorption or emis-
sion spectral lines. The matrix element of the electric dipole, more specifically a
square of an absolute value of the matrix element, is a measure of the optical
transition probability. Labeling the quantum states as a, b, etc. and describing the
corresponding state vector as jai, jbi, etc., the matrix element Pba is given by
\[ \mathbf{P} \equiv e\sum_{j}\mathbf{x}_j, \qquad (4.6) \]
where e is an elementary charge (e < 0) and xj is a position vector of the j-th electron.
Detailed description of εe and P can be seen in Chap. 7. The quantity Pba is said to be
transition dipole moment, or more precisely, transition electric dipole moment with
respect to the states jai and jbi. We assume that the optical transition occurs from a
quantum state jai to another state jbi. Since Pba is generally a complex number, |
Pba|2 represents the transition probability.
If we adopt the coordinate representation, (4.5) is expressed by
\[ \psi(x,t) = \frac{1}{\sqrt{2}}\left[\phi_1(x)\exp(-iE_1t/\hbar) + \phi_2(x)\exp(-iE_2t/\hbar)\right], \qquad (4.8) \]

where we put c₁ = c₂ = 1/√2 in (4.2). In (4.8), we have

\[ \phi_1(x) = \sqrt{\frac{2}{\pi}}\cos x \quad \text{and} \quad \phi_2(x) = \sqrt{\frac{2}{\pi}}\sin 2x. \qquad (4.9) \]
\[ \psi^{*}(x,t)\psi(x,t) = \frac{1}{\pi}\left[\cos^2x + \sin^22x + (\sin 3x + \sin x)\cos\omega t\right], \qquad (4.10) \]

\[ \omega = 3\hbar/2m, \qquad (4.11) \]

\[ \psi^{*}(x,t)\psi(x,t) = \frac{1}{\pi}\left[1 + \frac{1}{2}(\cos 2x - \cos 4x) + (\sin 3x + \sin x)\cos\omega t\right]. \qquad (4.12) \]
Integrating (4.12) over - π2, π2 , a contribution from only the first term is
non-vanishing to give 1, as anticipated (because of the normalization).
Putting t = 0 and integrating (4.12) over the positive domain [0, π/2], we have

\[ \int_0^{\pi/2}\psi^{*}(x,0)\psi(x,0)\,dx = \frac{1}{2} + \frac{4}{3\pi} \approx 0.924. \qquad (4.13) \]
[Figure: probability density of the particle in the one-dimensional potential; the solid curve and broken curve represent the density at t = 0 and t = π/ω (i.e., a half period), respectively.]

Similarly, integrating (4.12) at t = 0 over the negative domain [−π/2, 0], we have

\[ \int_{-\pi/2}^{0}\psi^{*}(x,0)\psi(x,0)\,dx = \frac{1}{2} - \frac{4}{3\pi} \approx 0.076. \qquad (4.14) \]
P = ex = exe1 , ð4:15Þ
where x is a position vector of the electron. Let us define the matrix element of the
electric dipole transition as
Notice that we only have to consider that the polarization of light is parallel to the
x-axis. With the coordinate representation, we have
\[ P_{21} = \int_{-\pi/2}^{\pi/2}\phi_2(x)\,ex\,\phi_1(x)\,dx = \int_{-\pi/2}^{\pi/2}\sqrt{\frac{2}{\pi}}(\sin 2x)\,ex\,\sqrt{\frac{2}{\pi}}\cos x\,dx \]
\[ = e\,\frac{2}{\pi}\int_{-\pi/2}^{\pi/2}x\cos x\sin 2x\,dx = \frac{e}{\pi}\int_{-\pi/2}^{\pi/2}x(\sin x + \sin 3x)\,dx = \frac{16e}{9\pi}, \qquad (4.17) \]
where we used a trigonometric formula and integration by parts. The factor 16/9π in (4.17) is about 36% of π/2. This number is in pretty good agreement with the value of about 40% estimated above from the major maximum of ψ*(x, 0)ψ(x, 0).
Note that the transition moment vanishes if the two states associated with the
transition have the same parity. In other words, if these are both described by sine
functions or cosine functions, the integral vanishes.
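The definite integral in (4.17) is short enough to check symbolically. The sketch below is not from the text; it assumes sympy.

```python
# One-line sympy check (assumption, not from the text) of the integral in (4.17):
# (2/pi) * int_{-pi/2}^{pi/2} x cos(x) sin(2x) dx = 16/(9*pi).
import sympy as sp

x = sp.symbols('x')
val = sp.integrate(2/sp.pi * x * sp.cos(x) * sp.sin(2*x), (x, -sp.pi/2, sp.pi/2))
assert sp.simplify(val - 16/(9*sp.pi)) == 0
```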
Example 4.2 One-dimensional harmonic oscillator Second, let us think of an
optical transition regarding a harmonic oscillator that we dealt with in Chap. 2. We
denote the state of the oscillator as jni in place of jψ ni (n = 0, 1, 2, ⋯) of Chap. 2.
Then, a general expression (4.5) can be written as
\[ \boldsymbol{\varepsilon}_e\cdot\mathbf{P} = eq. \qquad (4.20) \]

That is,

\[ P_{kl} = e\sqrt{\frac{\hbar}{2m\omega}}\,\langle k|a + a^{\dagger}|l\rangle = e\sqrt{\frac{\hbar}{2m\omega}}\left[\langle k|a|l\rangle + \langle k|a^{\dagger}|l\rangle\right]. \qquad (4.23) \]

\[ P_{kl} = e\sqrt{\frac{\hbar}{2m\omega}}\left[\sqrt{k+1}\,\langle k+1|l\rangle + \sqrt{l+1}\,\langle k|l+1\rangle\right]. \qquad (4.25) \]

\[ P_{kl} = e\sqrt{\frac{\hbar}{2m\omega}}\left[\sqrt{k+1}\,\delta_{k+1,l} + \sqrt{l+1}\,\delta_{k,l+1}\right]. \qquad (4.26) \]
Pkl = Plk :
The matrix element Pkl is symmetric with respect to indices k and l. Notice that
the first term does not vanish only when k + 1 = l. The second term does not vanish
only when k = l + 1. Therefore we get
\[ P_{k,k+1} = e\sqrt{\frac{\hbar(k+1)}{2m\omega}} \quad \text{and} \quad P_{l+1,l} = e\sqrt{\frac{\hbar(l+1)}{2m\omega}}. \qquad (4.27) \]
\[ P = eq = e\sqrt{\frac{\hbar}{2m\omega}}\,(a + a^{\dagger}) = e\sqrt{\frac{\hbar}{2m\omega}}\begin{pmatrix} 0 & 1 & 0 & 0 & 0 & \cdots \\ 1 & 0 & \sqrt{2} & 0 & 0 & \cdots \\ 0 & \sqrt{2} & 0 & \sqrt{3} & 0 & \cdots \\ 0 & 0 & \sqrt{3} & 0 & 2 & \cdots \\ 0 & 0 & 0 & 2 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}, \qquad (4.28) \]
where we used (2.68). Note that a real Hermitian matrix is a symmetric matrix.
Practically, it is a fast way to construct a transition matrix (4.28) using (2.65) and
(2.66). It is an intuitively obvious and straightforward task. Having a glance at the
matrix form immediately tells us that the transition matrix elements are
non-vanishing with only (k, k + 1) and (k + 1, k) positions. Whereas the (k, k + 1)-
element represents transition from the k-th excited state to (k - 1)-th excited state
accompanied by photoemission, the (k + 1, k)-element implies the transition from
(k - 1)-th excited state to k-th excited state accompanied by photoabsorption. The
two transitions give the same transition moment. Note that zeroth excited state
means the ground state; see (2.64) for basis vector representations.
We should be careful about “addresses” of the matrix accordingly. For example,
P0, 1 in (4.27) represents a (1, 2) element of the matrix (4.28); P2, 1 stands for a (3, 2)
element.
Suppose that we seek the transition dipole moments using coordinate represen-
tation. Then, we need to use (2.106) and perform definite integration. For instance,
we have
\[ e\int_{-\infty}^{\infty}\psi_0(q)\,q\,\psi_1(q)\,dq \]

that corresponds to the (1, 2) element of (4.28). Indeed, the above integral gives \(e\sqrt{\hbar/2m\omega}\).
The confirmation is left for the readers. Nonetheless, to seek a definite integral of
product of higher excited-state wave functions becomes increasingly troublesome. In
this respect, the operator method described above provides us with a much better
insight into complicated calculations.
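The operator construction of (4.28) is easy to carry out on a computer as well. The sketch below is not from the text; it assumes numpy, truncates the number basis, and confirms that the only non-vanishing elements sit on the first off-diagonals, in accordance with the selection rule discussed next.

```python
# Numerical sketch (assumption, not from the text): the truncated matrix of a + a^dagger,
# proportional to (4.28); only the (k, k+1) and (k+1, k) elements survive.
import numpy as np

N = 6                                              # truncate the number basis at |N-1>
a = np.diag(np.sqrt(np.arange(1, N)), k=1)         # a|n> = sqrt(n)|n-1>
q_matrix = a + a.T                                 # proportional to q = sqrt(hbar/2m omega)(a + a^dagger)

for k in range(N):
    for l in range(N):
        expected = np.sqrt(min(k, l) + 1) if abs(k - l) == 1 else 0.0
        assert np.isclose(q_matrix[k, l], expected)
```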
Equations (4.26)–(4.28) imply that the electric dipole transition is allowed to
occur only when the quantum number changes by one. Notice also that the transition
takes place between the even function and odd function; see Table 2.1 and (2.101).
Such a condition or restriction on the optical transition is called a selection rule. The
former equation of (4.27) shows that the transition takes place from the upper state to
the lower state accompanied by the photon emission. The latter equation, on the
other hand, shows that the transition takes place from the lower state to the upper
accompanied by the photon absorption.
The hydrogen-like atoms give us a typical example. Since we have fully investigated
the quantum states of those atoms, we make the most of the related results.
Example 4.3 An electron in a hydrogen atom Unlike the one-dimensional sys-
tem, we have to take account of an angular momentum in the three-dimensional
system. We have already obtained explicit wave functions. Here we focus on 1s and
2p states of a hydrogen. For their normalized states, we have
\[ \phi(1s) = \frac{1}{\sqrt{\pi a^3}}\,e^{-r/a}, \qquad (4.29) \]

\[ \phi(2p_z) = \frac{1}{4\sqrt{2\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\cos\theta, \qquad (4.30) \]

\[ \phi(2p_{x+iy}) = -\frac{1}{8\sqrt{\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\sin\theta\,e^{i\phi}, \qquad (4.31) \]

\[ \phi(2p_{x-iy}) = \frac{1}{8\sqrt{\pi a^3}}\,\frac{r}{a}\,e^{-r/2a}\sin\theta\,e^{-i\phi}, \qquad (4.32) \]
where a denotes Bohr radius of a hydrogen. Note that a minus sign of ϕ(2px + iy) is
due to the Condon–Shortley phase. Even though the transition probability is pro-
portional to a square of the matrix element and so the phase factor cancels out, we
describe the state vector faithfully. The energy eigenvalues are
\[ E(1s) = -\frac{\hbar^2}{2\mu a^2}, \qquad E(2p_z) = E(2p_{x+iy}) = E(2p_{x-iy}) = -\frac{\hbar^2}{8\mu a^2}, \qquad (4.33) \]
where μ is a reduced mass of a hydrogen. Note that the latter three states are
degenerate.
First, we consider a transition between ϕ(1s) and ϕ(2pz) states. Suppose that the
normalized coherent state is described as
\[ \psi(x,t) = \frac{1}{\sqrt{2}}\left[\phi(1s)\exp\!\left(-\frac{iE(1s)t}{\hbar}\right) + \phi(2p_z)\exp\!\left(-\frac{iE(2p_z)t}{\hbar}\right)\right]. \qquad (4.34) \]
As before, we have
where ω is given by
In virtue of the third term of (4.35) that contains a cos ωt factor, the charge
distribution undergoes a sinusoidal oscillation along the z-axis with an angular
frequency described by (4.36). For instance, ωt = 0 gives +1 factor to (4.35) when
t = 0, whereas it gives -1 factor when ωt = π, i.e., t = 8π μa2/3ħ.
Integrating (4.35), we have
\[ \int\psi^{*}(x,t)\psi(x,t)\,d\tau = \frac{1}{2}\int\left[\phi(1s)^2 + \phi(2p_z)^2\right]d\tau = 1, \]
where we used normalized functional forms of ϕ(1s) and ϕ(2pz) together with
orthogonality of them. Note that both of the functions are real.
Next, we calculate the matrix element. For simplicity, we denote the matrix
element simply as Pðεe Þ only by designating the unit polarization vector εe. Then,
we have
where
\[ \mathbf{P} = e\mathbf{x} = e\,(\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \qquad (4.38) \]
We have three possibilities of choosing εe out of e1, e2, and e3. Choosing e3, we
have
\[ P_{z,|p_z\rangle}^{(\mathbf{e}_3)} = e\,\langle\phi(1s)|z|\phi(2p_z)\rangle = \frac{e}{4\sqrt{2}\,\pi a^4}\int_0^{\infty}r^4e^{-3r/2a}\,dr\int_0^{\pi}\cos^2\theta\sin\theta\,d\theta\int_0^{2\pi}d\phi = \frac{2^7\sqrt{2}}{3^5}\,ea \approx 0.745\,ea. \qquad (4.39) \]
ðe Þ
In (4.39), we express the matrix element as Pz,jp3 i to indicate the z-component of
z
position vector and to explicitly show that ϕ(2pz) state is responsible for the
transition. In (4.39), we used z = r cos θ. We also used a radial part integration
such that
\[ \int_0^{\infty}r^4e^{-3r/2a}\,dr = 24\left(\frac{2a}{3}\right)^5. \]
\[ P_{z,\langle p_z|}^{(\mathbf{e}_3)} = e\,\langle\phi(2p_z)|z|\phi(1s)\rangle. \qquad (4.40) \]

Since all the functions related to the integration are real, we have

\[ P_{z,|p_z\rangle}^{(\mathbf{e}_3)} = P_{z,\langle p_z|}^{(\mathbf{e}_3)}, \]
where hpzj means that ϕ(2pz) is designated as the final state. Meanwhile, if we choose
e1 for εe to evaluate the matrix element Px, we have
\[ P_{x,|p_z\rangle}^{(\mathbf{e}_1)} = e\,\langle\phi(1s)|x|\phi(2p_z)\rangle = \frac{e}{4\sqrt{2}\,\pi a^4}\int_0^{\infty}r^4e^{-3r/2a}\,dr\int_0^{\pi}\sin^2\theta\cos\theta\,d\theta\int_0^{2\pi}\cos\phi\,d\phi = 0, \qquad (4.41) \]

where cos ϕ comes from x = r sin θ cos ϕ and the integration of cos ϕ gives zero. In a similar manner, we have

\[ P_{y,|p_z\rangle}^{(\mathbf{e}_2)} = e\,\langle\phi(1s)|y|\phi(2p_z)\rangle = 0. \qquad (4.42) \]
Next, we estimate the matrix elements associated with 2px and 2py. For this
purpose, it is convenient to introduce the following complex coordinates by a unitary
transformation:
\[ (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} & 0 \\ -\tfrac{i}{\sqrt{2}} & \tfrac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{i}{\sqrt{2}} & 0 \\ \tfrac{1}{\sqrt{2}} & -\tfrac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} \]
\[ = \left(\frac{1}{\sqrt{2}}(\mathbf{e}_1 - i\mathbf{e}_2)\quad \frac{1}{\sqrt{2}}(\mathbf{e}_1 + i\mathbf{e}_2)\quad \mathbf{e}_3\right)\begin{pmatrix} \tfrac{1}{\sqrt{2}}(x + iy) \\ \tfrac{1}{\sqrt{2}}(x - iy) \\ z \end{pmatrix}, \qquad (4.43) \]

where the two 3 × 3 matrices are unitary and adjoint to each other; denoting either of them by U, we have

\[ U^{\dagger}U = UU^{\dagger} = E. \qquad (4.44) \]
We will investigate details of the unitary transformation and matrix in Parts III
and IV.
We define e+ and e- as follows [1]:
\[ \mathbf{e}_{+} \equiv \frac{1}{\sqrt{2}}(\mathbf{e}_1 + i\mathbf{e}_2) \quad \text{and} \quad \mathbf{e}_{-} \equiv \frac{1}{\sqrt{2}}(\mathbf{e}_1 - i\mathbf{e}_2), \qquad (4.45) \]
where complex vectors e+ and e- represent the left-circularly polarized light and
right-circularly polarized light that carry an angular momentum ħ and -ħ, respec-
tively. We will revisit the characteristics and implication of these complex vectors in
Sect. 7.4.
We have
\[ (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix} = (\mathbf{e}_{-}\ \mathbf{e}_{+}\ \mathbf{e}_3)\begin{pmatrix} \tfrac{1}{\sqrt{2}}(x + iy) \\ \tfrac{1}{\sqrt{2}}(x - iy) \\ z \end{pmatrix}. \qquad (4.46) \]
In this situation, e+, e-, and e3 are said to form an orthonormal basis in a three-
dimensional complex vector space (see Sect. 11.4).
Now, choosing e+ for εe, we have [2]
\[ P_{x-iy,|p_{+}\rangle}^{(\mathbf{e}_{+})} \equiv e\,\left\langle\phi(1s)\left|\frac{1}{\sqrt{2}}(x - iy)\right|\phi(2p_{x+iy})\right\rangle, \qquad (4.48) \]
where jp+i is a shorthand notation of ϕ(2px + iy) that is designated as the initial state;
x - iy represents a complex electric dipole. Equation (4.48) represents an optical
process in which an electron causes transition from ϕ(2px + iy) to ϕ(1s) to lose an
angular momentum ħ, whereas the radiation field gains that angular momentum to
ðe Þ
conserve a total angular momentum ħ. The notation Px -þ iy,jp i reflects this situation.
þ
Using the coordinate representation, we rewrite (4.48) as
\[ P_{x-iy,|p_{+}\rangle}^{(\mathbf{e}_{+})} = -\frac{e}{8\sqrt{2}\,\pi a^4}\int_0^{\infty}r^4e^{-3r/2a}\,dr\int_0^{\pi}\sin^3\theta\,d\theta\int_0^{2\pi}e^{-i\phi}e^{i\phi}\,d\phi = -\frac{2^7\sqrt{2}}{3^5}\,ea, \qquad (4.49) \]
where we used
x - iy = r sin θe - iϕ : ð4:50Þ
In the definite integral of (4.49), e-iϕ comes from x - iy, while eiϕ comes from
ϕ(2px + iy). Note that from (3.24) eiϕ is an eigenfunction corresponding to an angular
momentum eigenvalue ħ. Notice that in (4.49) exponents e-iϕ and eiϕ cancel out and
that an azimuthal integral is non-vanishing.
If we choose e- for εe, we have
ðe - Þ 1
Pxþiy,jp = e ϕð1sÞjp ðx þ iyÞjϕ 2pxþiy ð4:51Þ
þi
2
1 π 2π
e
= p r 4 e - 3r=2a dr sin 3 θdθ e2iϕ dϕ = 0, ð4:52Þ
8 2πa4 0 0 0
where we used
x þ iy = r sin θeiϕ :
With (4.52), a factor e2iϕ results from the product ϕ(2px + iy)(x + iy), which
renders the integral (4.51) vanishing. Note that the only difference between (4.49)
and (4.52) is about the integration of ϕ factor. For the same reason, if we choose e3
for εe, the matrix element vanishes. Thus, with the ϕ(2px + iy)-related matrix element,
ðe Þ
only Px -þ iy,jp i survives. Similarly, with the ϕ(2px - iy)-related matrix element, only
þ
ðe - Þ
Pxþiy,jp survives. Notice that jp-i is a shorthand notation of ϕ(2px - iy) that is
-i
designated as the initial state. That is, we have, e.g.,
p
ðe - Þ 1 27 2
Pxþiy,jp = e ϕð1sÞjp ðx þ iyÞjϕ 2px - iy = 5 ea,
-i
2 3
ð4:53Þ
ðe Þ 1
Px -þ iy,jp i = e ϕð1sÞjp ðx - iyÞjϕ 2px - iy = 0:
-
2
ðe Þ
Here recall (1.116) and (x - iy){ = x + iy. Also note that since Px -þ iy,jp is real,
þi
ðe Þ
Px -þ iy,jp i is real as well so that we have
þ
ðe Þ ðe Þ ðe Þ
Px -þ iy,jp = Px -þ iy,jp i = Pxþiy,hp
-
j, ð4:55Þ
þi þ þ
where hp+j means that ϕ(2px + iy) is designated as the final state.
Comparing (4.48) and (4.55), we notice that the polarization vector has been
switched from e+ to e- with the allowed transition, even though the matrix element
remains the same. This can be explained as follows: In (4.48) the photon emission is
occurring, while the electron is causing a transition from ϕ(2px + iy) to ϕ(1s). As a
result, the radiation field has gained an angular momentum by ħ during the process in
which the electron has lost an angular momentum ħ. In other words, ħ is transferred
from the electron to the radiation field and this process results in the generation of
left-circularly polarized light in the radiation field.
In (4.54), on the other hand, the reversed process takes place. That is, the photon
absorption is occurring in such a way that the electron is excited from ϕ(1s) to
ϕ(2px + iy). After this process has been completed, the electron has gained an angular
momentum by ħ, whereas the radiation field has lost an angular momentum by ħ. As
a result, the positive angular momentum ħ is transferred to the electron from the
radiation field that involves left-circularly polarized light. This can be translated into
the statement that the radiation field has gained an angular momentum by -ħ. This is
equivalent to the generation of right-circularly polarized light (characterized by e-)
in the radiation field. In other words, the electron gains the angular momentum by ħ
to compensate the change in the radiation field.
The implication of the first equation of (4.53) can be interpreted in a similar
manner. Also we have
ðe Þ ðe - Þ ðe Þ 1
-
Pxþiy,jp = Pxþiy,jp = Px -þ iy,hp j = e ϕ 2px - iy jp ðx - iyÞjϕð1sÞ
-i -i -
2
p
27 2
= ea:
35
In the above relation, hp-j implies that ϕ(2px - iy) is present as the final state.
Notice that the inner products of (4.49) and (4.53) are real, even though neither x + iy
ðe Þ ðe - Þ
nor x - iy is Hermitian. Also note that Px -þ iy,jp i of (4.49) and Pxþiy,jp of (4.53)
þ -i
have the same absolute value with minus and plus signs, respectively. The minus
sign of (4.49) comes from the Condon–Shortley phase. The difference, however, is
2
- ðe Þ
not essential, because the transition probability is proportional to Pxþiy,jp or
-i
2
ðe Þ
Px -þ iy,jp . Some literature [3, 4] uses -(x + iy) instead of x + iy. This is simply
þi
1
ψ ðx, t Þ = p
2
ϕð1sÞ exp - iE ð1sÞt=ħ þϕ 2pxþiy exp - iE 2pxþiy t=ħ , ð4:56Þ
where ϕ(1s) is described by (3.301) and ϕ(2px + iy) is expressed as (3.304). Then we
have
That is, ℜe 2pxþiy represents a real part of ϕ(2px + iy) that depends only on r and
θ. For example, from (3.304) we have
1 3 r
ℜe 2pxþiy = - p a - 2 e - 2a sin θ:
r
8 π a
The third term of (4.57) implies that the existence probability density of an
electron represented by |ψ(x, t)|2 is rotating counterclockwise around the z-axis
with an angular frequency of ω. Similarly, in the case of ϕ(2px - iy), the existence
probability density of an electron is rotating clockwise around the z-axis with an
angular frequency of ω.
Integrating (4.57), we have
1 2
= jϕð1sÞj2 þ ϕ 2pxþiy dτ
2
1 π 2π
þ r 2 dr sin θdθϕð1sÞℜe 2pxþiy dϕ cosðϕ - ωt Þ
0 0 0
1 1
= þ = 1,
2 2
where we used normalized functional forms of ϕ(1s) and ϕ(2px + iy); the last term
vanishes because
2π
dϕ cosðϕ - ωt Þ = 0:
0
where we assume that m is positive so that we can appropriately take into account the
Condon–Shortley phase. Alternatively, we describe it via unitary transformation as
follows:
$$\left(\Lambda_l^{(n)}\cos m\phi\;\;\;\Lambda_l^{(n)}\sin m\phi\right) = \left(\Lambda_{l,m}^{(n)}\;\;\;\Lambda_{l,-m}^{(n)}\right)\begin{pmatrix}\dfrac{(-1)^m}{\sqrt{2}} & -\dfrac{(-1)^m i}{\sqrt{2}}\\[2mm] \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}\end{pmatrix},\qquad(4.61)$$
$$\left(\phi(2p_x)\;\;\;\phi(2p_y)\right) = \left(\Lambda_{1,1}^{(2)}\;\;\;\Lambda_{1,-1}^{(2)}\right)\begin{pmatrix}-\dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}\\[2mm] \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}\end{pmatrix} = \left(\phi(2p_{x+iy})\;\;\;\phi(2p_{x-iy})\right)\begin{pmatrix}-\dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}\\[2mm] \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}\end{pmatrix}$$
$$= \left(\frac{1}{4\sqrt{2\pi}}\,a^{-\frac{3}{2}}\,\frac{r}{a}\,e^{-\frac{r}{2a}}\sin\theta\cos\phi\;\;\;\;\frac{1}{4\sqrt{2\pi}}\,a^{-\frac{3}{2}}\,\frac{r}{a}\,e^{-\frac{r}{2a}}\sin\theta\sin\phi\right).\qquad(4.62)$$
Using this expression, we calculate matrix elements of the electric dipole transi-
tion. We have
$$P^{(\mathbf{e}_1)}_{x,\,|p_x\rangle_i} = e\left\langle\phi(1s)\middle|x\middle|\phi(2p_x)\right\rangle = \frac{e}{4\sqrt{2}\,\pi a^4}\int_0^{\infty}r^4e^{-3r/2a}dr\int_0^{\pi}\sin^3\theta\,d\theta\int_0^{2\pi}\cos^2\phi\,d\phi = \frac{2^7\sqrt{2}}{3^5}\,ea.\qquad(4.63)$$
Thus, we obtained the same result as (4.49) apart from the minus sign. Since a
square of an absolute value of the transition moment plays a role, the minus sign is
again of secondary importance. With $P^{(\mathbf{e}_2)}_{y}$, similarly we have
$$P^{(\mathbf{e}_2)}_{y,\,|p_y\rangle_i} = e\left\langle\phi(1s)\middle|y\middle|\phi(2p_y)\right\rangle = \frac{e}{4\sqrt{2}\,\pi a^4}\int_0^{\infty}r^4e^{-3r/2a}dr\int_0^{\pi}\sin^3\theta\,d\theta\int_0^{2\pi}\sin^2\phi\,d\phi = \frac{2^7\sqrt{2}}{3^5}\,ea.\qquad(4.64)$$
along the z-, x-, and y-axes, respectively, and so linearly polarized light is relevant.
Note moreover that the operators z, x, and y in (4.39), (4.63), and (4.64) are Hermitian
and that $\phi(2p_z)$, $\phi(2p_x)$, and $\phi(2p_y)$ are real functions.
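As a quick cross-check of (4.63), one may evaluate the matrix element with a computer algebra system. The following sympy sketch is our own illustration (the symbol names are ours); it uses the hydrogen-like 1s and 2p_x orbitals quoted in (3.301) and (4.62) and reproduces the value $2^7\sqrt{2}\,ea/3^5$.

```python
# Minimal sympy check of <phi(1s)| x |phi(2p_x)> = (2^7*sqrt(2)/3^5) a  (our sketch).
import sympy as sp

r, theta, phi, a = sp.symbols('r theta phi a', positive=True)

phi_1s  = sp.exp(-r/a) / sp.sqrt(sp.pi * a**3)
phi_2px = (1/(4*sp.sqrt(2*sp.pi))) * a**sp.Rational(-3, 2) * (r/a) \
          * sp.exp(-r/(2*a)) * sp.sin(theta) * sp.cos(phi)
x_op = r * sp.sin(theta) * sp.cos(phi)

integrand = phi_1s * x_op * phi_2px * r**2 * sp.sin(theta)   # volume element r^2 sin(theta)
result = sp.integrate(integrand, (r, 0, sp.oo), (theta, 0, sp.pi), (phi, 0, 2*sp.pi))

print(sp.simplify(result))                                   # 128*sqrt(2)*a/243
print(sp.simplify(result - 2**7*sp.sqrt(2)/3**5 * a))        # 0
```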
Notice that in the upper line the indices change cyclic like (z, x, y), whereas in the
lower line they change anti-cyclic such as (z, y, x). The proof of (4.65) is left for the
reader. Thus, we have, e.g.,
Putting
$$Q_+ \equiv x + iy\quad\text{and}\quad Q_- \equiv x - iy,$$
we have
$$[M_z, Q_+] = Q_+,\qquad [M_z, Q_-] = -Q_-.\qquad(4.67)$$
$$\langle m'|[M_z,Q_+]|m\rangle = \langle m'|M_zQ_+ - Q_+M_z|m\rangle = m'\langle m'|Q_+|m\rangle - m\langle m'|Q_+|m\rangle = \langle m'|Q_+|m\rangle,\qquad(4.68)$$
where the quantum state $|m\rangle$ is identical to $|l,m\rangle$ in (3.151). Here we need no information about $l$, and so it is omitted. Thus, we have, e.g., $M_z|m\rangle = m|m\rangle$. Taking its adjoint, we have $\langle m|M_z^{\dagger} = \langle m|M_z = m\langle m|$, where $M_z$ is Hermitian. These results lead to (4.68). From (4.68), we get
Therefore, for the matrix element hm′| Q+| mi not to vanish, we must have
$$m' - m - 1 = 0\quad\text{or}\quad\Delta m = 1\ \ (\Delta m \equiv m' - m).$$
This represents the selection rule with respect to the coordinate Q+.
Similarly, we get
In this case, for the matrix element hm′| Q-| mi not to vanish, we have
$$m' - m + 1 = 0\quad\text{or}\quad\Delta m = -1.$$
To derive (4.70), we can alternatively use the following: Taking the adjoint of
(4.69), we have
½M z , z = 0: ð4:71Þ
Therefore, for the matrix element hm′| z| mi not to vanish, we must have
m0 - m = 0 or Δm = 0: ð4:72Þ
These results are fully consistent with Example 4.3 of Sect. 4.3. That is, if
circularly polarized light takes part in the optical transition, Δm = ± 1. For instance,
using the present notation, we rewrite (4.48) as
$$\left\langle\phi(1s)\left|\frac{1}{\sqrt{2}}(x-iy)\right|\phi(2p_{x+iy})\right\rangle = \frac{1}{\sqrt{2}}\langle 0|Q_-|1\rangle = -\frac{2^7\sqrt{2}}{3^5}\,a.$$
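The commutation relations (4.67) can also be checked symbolically with $M_z$ written as a differential operator. The short sympy sketch below is our own verification (in units of ħ, with $M_z = -i(x\,\partial_y - y\,\partial_x)$); it confirms $[M_z, Q_\pm] = \pm Q_\pm$ acting on an arbitrary test function.

```python
# Our own symbolic check of (4.67): [M_z, Q_+] = Q_+ and [M_z, Q_-] = -Q_-.
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Function('f')(x, y)          # arbitrary test function

def Mz(g):
    """M_z = -i (x d/dy - y d/dx), in units of hbar."""
    return -sp.I * (x * sp.diff(g, y) - y * sp.diff(g, x))

for sign, Q in [(+1, x + sp.I*y), (-1, x - sp.I*y)]:
    commutator = Mz(Q * f) - Q * Mz(f)              # [M_z, Q] acting on f
    print(sp.simplify(commutator - sign * Q * f))   # prints 0 in both cases
```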
In the above calculations, (1) we used $[M_z, z] = 0$ (with the second equality); (2) RHS was modified so that the commutation relations can be used (the third equality); (3) we used $-M_xy = M_xy - 2M_xy$ and $M_yx = -M_yx + 2M_yx$ so that we can use (4.65) (the second to the last equality). Moreover, using (4.65), (4.73) can be written as
$$\left[M^2, z\right] = 2i\left(xM_y - M_xy\right) = 2i\left(M_yx - yM_x\right).$$
Similar results on the commutator can be obtained with [M2, x] and [M2, y]. For
further use, we give alternative relations such that
$$\left[M^2, x\right] = 2i\left(yM_z - M_yz\right) = 2i\left(M_zy - zM_y\right),\qquad
\left[M^2, y\right] = 2i\left(zM_x - M_zx\right) = 2i\left(M_xz - xM_z\right).\qquad(4.74)$$
$$\begin{aligned}
\left[M^2,\left[M^2,z\right]\right] &= 2i\left\{\left[M^2,M_yx\right] - \left[M^2,M_xy\right] + i\left[M^2,z\right]\right\}\\
&= 2i\left\{M_y\left[M^2,x\right] - M_x\left[M^2,y\right] + i\left[M^2,z\right]\right\}\\
&= 2i\left\{2iM_y\left(yM_z - M_yz\right) - 2iM_x\left(M_xz - xM_z\right) + i\left[M^2,z\right]\right\}\\
&= 4\left(M_x^2 + M_y^2\right)z + 4zM_z^2 - 2M^2z + 2zM^2\\
&= 2\left(M^2z + zM^2\right).
\end{aligned}\qquad(4.75)$$
In the above calculations, (1) we used [M2, My] = [M2, Mx] = 0 (with the second
equality); (2) we used (4.74) (the third equality); (3) RHS was modified so that we
can use the relation M ⊥ x from the definition of the angular momentum operator,
i.e., Mxx + Myy + Mzz = 0 (the second to the last equality). We used [Mz, z] = 0 as
well. Similar results are obtained with x and y. That is, we have
$$\left[M^2,\left[M^2,x\right]\right] = 2\left(M^2x + xM^2\right),\qquad(4.76)$$
$$\left[M^2,\left[M^2,y\right]\right] = 2\left(M^2y + yM^2\right).\qquad(4.77)$$
$$M^4z - 2M^2zM^2 + zM^4 = 2\left(M^2z + zM^2\right).\qquad(4.78)$$
Using the relation (4.78) and taking inner products of both sides, we get, e.g.,
$$\left\langle l'\left|M^4z - 2M^2zM^2 + zM^4\right|l\right\rangle = \left\langle l'\left|2\left(M^2z + zM^2\right)\right|l\right\rangle.\qquad(4.79)$$
That is,
$$\left\langle l'\left|M^4z - 2M^2zM^2 + zM^4\right|l\right\rangle - \left\langle l'\left|2\left(M^2z + zM^2\right)\right|l\right\rangle = 0.$$
Considering that both terms of LHS contain a factor hl′| z| li in common, we have
where the quantum state jli is identical to jl, mi in (3.151) with m omitted.
To factorize the first factor of LHS of (4.80), we view it as a quartic equation with
respect to l′. Replacing l′ with -l, we find that the first factor vanishes, and so the first
factor should have a factor (l′ + l ). Then, we factorize the first factor of LHS of (4.80)
such that
[the first factor of LHS of (4.80)]
We have similar relations with respect to hl′| x| li and hl′| y| li because of (4.76) and
(4.77). For the electric dipole transition to be allowed, among hl′| x| li, hl′| y| li, and
hl′| z| li, at least one term must be non-vanishing. For this, at least one of the four
factors of (4.81) should be zero. Since l′ + l + 2 > 0, this factor is excluded.
For l′ + l to vanish, we should have l′ = l = 0; notice that both l′ and l are
nonnegative integers. We must then examine this condition. This condition is equivalent to saying that the spherical harmonics related to the angular variables θ and ϕ take the form of $Y_0^0(\theta,\phi) = 1/\sqrt{4\pi}$, i.e., a constant. Therefore, the θ-related integral for the matrix element ⟨l′|z|l⟩ only consists of the following factor:
$$\int_0^{\pi}\cos\theta\sin\theta\,d\theta = \frac{1}{2}\int_0^{\pi}\sin 2\theta\,d\theta = -\frac{1}{4}\left[\cos 2\theta\right]_0^{\pi} = 0,$$
where cos θ comes from a polar coordinate z = r cos θ and sinθ is due to an
infinitesimal volume of space, i.e., r2 sin θdrdθdϕ. Thus, we find that hl′| z| li
vanishes on condition that l′ = l = 0. As a polar coordinate representation,
x = r sin θ cos ϕ and y = r sin θ sin ϕ, and so the ϕ-related integrals hl′| x| li and
hl′| y| li vanish as well. That is,
$$\int_0^{2\pi}\cos\phi\,d\phi = \int_0^{2\pi}\sin\phi\,d\phi = 0.$$
$$l' - l + 1 = 0\quad\text{or}\quad l' - l - 1 = 0.\qquad(4.84)$$
Or, defining $\Delta l \equiv l' - l$, we get
$$\Delta l = \pm1.\qquad(4.85)$$
Thus, for the transition to be allowed, the azimuthal quantum number must
change by one.
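The selection rule Δl = ±1 can also be seen numerically. The following Python sketch is our own illustration (not part of the original derivation): the θ-part of ⟨l′|z|l⟩ for m = 0 is proportional to an integral over Legendre polynomials, which vanishes unless l′ = l ± 1.

```python
# Our own numerical illustration of Delta l = +-1 for the theta-integral of <l'|z|l>.
import numpy as np
from numpy.polynomial.legendre import Legendre
from scipy.integrate import quad

def theta_integral(lp, l):
    """int_0^pi P_{l'}(cos t) * cos t * P_l(cos t) * sin t dt."""
    Pl, Plp = Legendre.basis(l), Legendre.basis(lp)
    f = lambda t: Plp(np.cos(t)) * np.cos(t) * Pl(np.cos(t)) * np.sin(t)
    val, _ = quad(f, 0.0, np.pi)
    return val

for l in range(3):
    for lp in range(4):
        print(f"l'={lp}, l={l}: {theta_integral(lp, l):+.4f}")
# Only the entries with l' = l +- 1 come out non-zero (e.g. l'=1, l=0 gives 2/3).
```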
4.5 Angular Momentum of Radiation [6]
In Sect. 4.3 we mentioned circularly polarized light. If the circularly polarized light
acts on an electron, what can we anticipate? Here we deal with this problem within a
framework of a semiclassical theory.
Let E and H be an electric and magnetic field of a left-circularly polarized light,
respectively. They are expressed as
$$\mathbf{E} = \frac{1}{\sqrt{2}}E_0\left(\mathbf{e}_1 + i\mathbf{e}_2\right)\exp i(kz - \omega t),\qquad(4.86)$$
$$\mathbf{H} = \frac{1}{\sqrt{2}}H_0\left(\mathbf{e}_2 - i\mathbf{e}_1\right)\exp i(kz - \omega t) = \frac{1}{\sqrt{2}}\frac{E_0}{\mu v}\left(\mathbf{e}_2 - i\mathbf{e}_1\right)\exp i(kz - \omega t).\qquad(4.87)$$
Here we assume that the light is propagating in the direction of the positive z-axis.
The electric and magnetic fields described by (4.86) and (4.87) represent the left-
circularly polarized light. A synchronized motion of an electron is expected, if the
electron exerts a circular motion in such a way that the motion direction of the
electron is always perpendicular to the electric field and parallel to the magnetic field
(see Fig. 4.2). In this situation, magnetic Lorentz force does not affect the electron
motion.
Here, the Lorentz force F(t) is described by
$$\mathbf{F}(t) = e\mathbf{E}(\mathbf{x}(t)) + e\,\dot{\mathbf{x}}(t)\times\mathbf{B}(\mathbf{x}(t)),\qquad(4.88)$$
where the first term is the electric Lorentz force and the second term represents the magnetic Lorentz force.
Fig. 4.2 (schematic): an electron near the origin O, with the electric field E lying in the xy-plane
$$\mathbf{E} = \frac{1}{\sqrt{2}}E_0\left[\mathbf{e}_1\cos(kz-\omega t) - \mathbf{e}_2\sin(kz-\omega t)\right] + i\,\frac{1}{\sqrt{2}}E_0\left[\mathbf{e}_2\cos(kz-\omega t) + \mathbf{e}_1\sin(kz-\omega t)\right].\qquad(4.89)$$
Suppose that the electron exerts the circular motion in a region narrow enough
around the origin and that the said electron motion is confined within the xy-plane
that is perpendicular to the light propagation direction. Then, we can assume that
z ≈ 0 in (4.89). Ignoring kz in (4.89) accordingly and taking a real part, we have
$$\mathbf{E} = \frac{1}{\sqrt{2}}E_0\left(\mathbf{e}_1\cos\omega t + \mathbf{e}_2\sin\omega t\right).$$
$$\mathbf{F} = e\mathbf{E},\qquad(4.90)$$
$$m\ddot{\mathbf{x}} = e\mathbf{E},\qquad(4.91)$$
$$m\ddot{x} = \frac{1}{\sqrt{2}}eE_0\cos\omega t\quad\text{and}\quad m\ddot{y} = \frac{1}{\sqrt{2}}eE_0\sin\omega t.\qquad(4.92)$$
2 2
$$mx = -\frac{eE_0}{\sqrt{2}\,\omega^2}\cos\omega t + Ct + D,\qquad(4.93)$$
$$my = -\frac{eE_0}{\sqrt{2}\,\omega^2}\sin\omega t + C't + D',\qquad(4.94)$$
where C′ and D′ are integration constants. Setting $y(0) = 0$ and $y'(0) = -\frac{eE_0}{\sqrt{2}\,m\omega}$, we have C′ = D′ = 0. Thus, making t a parameter, we get
$$x^2 + y^2 = \left(\frac{eE_0}{\sqrt{2}\,m\omega^2}\right)^2.\qquad(4.95)$$
This implies that the electron is executing a counterclockwise circular motion with a radius of $-\frac{eE_0}{\sqrt{2}\,m\omega^2}$ under the influence of the electric field. This is consistent with the motion of an electron in the coherent state of $\phi(1s)$ and $\phi(2p_{x+iy})$ as expressed in (4.57).
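The classical picture of (4.92)–(4.95) can be reproduced by direct numerical integration. The following sketch is our own illustration with arbitrary test values of eE₀, m, and ω (not values used in the text); it confirms that the orbit is a circle of radius $eE_0/(\sqrt{2}m\omega^2)$.

```python
# Our own numerical sketch of (4.92)-(4.95): the driven electron traces a circle.
import numpy as np
from scipy.integrate import solve_ivp

e_E0, m, w = 1.0, 1.0, 2.0                      # arbitrary test values of e*E0, mass, omega
r0 = e_E0 / (np.sqrt(2) * m * w**2)             # expected radius

def rhs(t, s):
    x, y, vx, vy = s
    return [vx, vy,
            e_E0 * np.cos(w*t) / (np.sqrt(2)*m),
            e_E0 * np.sin(w*t) / (np.sqrt(2)*m)]

# Initial conditions matching (4.93)-(4.94) with C = D = C' = D' = 0:
s0 = [-r0, 0.0, 0.0, -e_E0/(np.sqrt(2)*m*w)]
sol = solve_ivp(rhs, (0, 10), s0, rtol=1e-9, atol=1e-12)

radii = np.hypot(sol.y[0], sol.y[1])
print(r0, radii.min(), radii.max())             # all three agree
```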
The angular momentum the electron has acquired is
$$L = \left(\mathbf{x}\times\mathbf{p}\right)_z = xp_y - yp_x = \left(-\frac{eE_0}{\sqrt{2}\,m\omega^2}\right)\left(-\frac{eE_0}{\sqrt{2}\,\omega}\right) = \frac{e^2E_0^2}{2m\omega^3}.\qquad(4.96)$$
$$\frac{e^2E_0^2}{2m\omega^3} = \hbar.\qquad(4.97)$$
$$\frac{e^2E_0^2}{2m\omega^2} = \hbar\omega.\qquad(4.98)$$
$$\alpha = \frac{eE_0}{\sqrt{2}\,m\omega^2}.\qquad(4.99)$$
References
3. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham, MA
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)
6. Rossi B (1957) Optics. Addison-Wesley, Reading, MA
Chapter 5
Approximation Methods of Quantum
Mechanics
the quantum state or energy is small. The applied external field causes the change in
Hamiltonian of a quantum system and results in the change in that quantum state
usually accompanied by the energy change of the system. We describe this process
as a following eigenvalue equation:
H j Ψi i = E i j Ψi i, ð5:1Þ
H = H 0 þ λV, ð5:2Þ
where H0 is the Hamiltonian of the unperturbed system without the external field;
V is an additional Hamiltonian due to the applied external field, or interaction
between the physical system and the external field. The second term includes a
parameter λ that can be modulated according to the change in the strength of the
applied field. The parameter λ should be real so that H of (5.2) can be Hermitian. The
parameter λ can be the applied field itself or can merely be a dimensionless parameter
that can be set at λ = 1 after finishing the calculation. We will come back to this point
later in examples.
In (5.2) we assume that the following equation holds:
$$H_0|i\rangle = E_i^{(0)}|i\rangle,\qquad(5.3)$$
where jii is the i-th quantum state. We designate j0i as the ground state. We assume
the excited states jii (i = 1, 2, ⋯) to be numbered in order of increasing energies.
These functions (or vectors) jii (including j0i) can be a quantum state that appeared
as various eigenfunctions in the previous chapters. Of these, two or more functions
may have the same eigenenergy (i.e., degenerate). If we are to deal with such
degenerate states, those states jii have to be distinguished by, e.g., jidi (d = 1, 2,
⋯, s), where s denotes the degeneracy (or degree of degeneracy). For simplicity,
however, in this chapter we are going to examine only the change in energy and
physical states with respect to non-degenerate states. We disregard the issue on
notation of the degenerate states accordingly. We pay attention to each individual
case later, when necessary.
Regarding a one-dimensional physical system, e.g., a particle confined within
potential well and quantum-mechanical harmonic oscillator, for example, all the
quantum states are non-degenerate as we have already seen in Chaps. 1 and 2. As a
special case we also consider the ground state of a hydrogen-like atom. This is
because that state is non-degenerate, and so the problem can be dealt with in parallel
to the cases of the one-dimensional physical systems. We assume that the quantum
states jii (i = 0, 1, 2, ⋯) constitute the orthonormal eigenvectors such that
Remember that the notation has already appeared in (2.53); see Chap. 13 as well.
One of the most important applications of the perturbation method is to evaluate the
shift in quantum state and energy level caused by the applied external field. To this
end, we expand both the quantum state and energy as a power series of λ. That is, in
(5.1) we expand jΨii and Ei such that
$$\langle i|\Psi_i\rangle = 1.\qquad(5.7)$$
This condition, however, does not necessarily mean that $|\Psi_i\rangle$ has been normalized (vide infra). From (5.4) and (5.7) we have
$$\left\langle i\middle|\phi_i^{(1)}\right\rangle = \left\langle i\middle|\phi_i^{(2)}\right\rangle = \cdots = 0.\qquad(5.8)$$
Let us calculate hΨi| Ψii on condition of (5.7) and (5.8). From (5.5) we have
$$\begin{aligned}
\langle\Psi_i|\Psi_i\rangle &= \langle i|i\rangle + \lambda\left[\left\langle i\middle|\phi_i^{(1)}\right\rangle + \left\langle\phi_i^{(1)}\middle|i\right\rangle\right] + \lambda^2\left[\left\langle i\middle|\phi_i^{(2)}\right\rangle + \left\langle\phi_i^{(2)}\middle|i\right\rangle + \left\langle\phi_i^{(1)}\middle|\phi_i^{(1)}\right\rangle\right] + \cdots\\
&= \langle i|i\rangle + \lambda^2\left\langle\phi_i^{(1)}\middle|\phi_i^{(1)}\right\rangle + \cdots \approx 1 + \lambda^2\left\langle\phi_i^{(1)}\middle|\phi_i^{(1)}\right\rangle,
\end{aligned}$$
where we used (5.4) and (5.8). Thus, $\langle\Psi_i|\Psi_i\rangle$ is normalized if we ignore the term of order $\lambda^2$.
Now, inserting (5.5) and (5.6) into (5.1) and comparing the same power factors
with respect to λ, we obtain
$$\left(E_i^{(0)} - H_0\right)|i\rangle = 0,\qquad(5.9)$$
Equation (5.9) is identical with (5.3). Operating hjj on both sides of (5.10) from
the left, we get
$$E_i^{(1)} = \langle i|V|i\rangle.\qquad(5.13)$$
This represents the first-order correction term with the energy eigenvalue.
Meanwhile, assuming j ≠ i on (5.12), we have
ð1Þ hjjVjii
jjϕi = ðj ≠ iÞ, ð5:14Þ
ð0Þ ð0Þ
Ei - Ej
where we used $E_i^{(0)}\ne E_j^{(0)}$ on the assumption that the quantum state $|i\rangle$ that belongs to the eigenenergy $E_i^{(0)}$ is non-degenerate. Here, we emphasize that the eigenstates that correspond to $E_j^{(0)}$ may or may not be degenerate. In other words, the quantum state
that we are making an issue of with respect to the perturbation is non-degenerate. In
this context, we do not question whether or not other quantum states [represented by
jji in (5.14)] are degenerate.
$$\sum_k|k\rangle\langle k| = E,\qquad(5.15)$$
where with the third equality we used (5.8) and with the last equality we used (5.14).
Operating hij from the left on (5.16) and using (5.4), we recover (5.8). Hence, using
(5.5) the approximated quantum state to the first order of λ is described by
$$|\Psi_i\rangle \approx |i\rangle + \lambda\left|\phi_i^{(1)}\right\rangle = |i\rangle + \lambda\sum_{k\ne i}\frac{\langle k|V|i\rangle}{E_i^{(0)} - E_k^{(0)}}\,|k\rangle.\qquad(5.17)$$
For a while, let us think of a case where we deal with the change in the
eigenenergy and corresponding eigenstate with respect to the degenerate quantum
state. In that case, regarding the state $|i\rangle$ in question in (5.12) we have $^{\exists}|j\rangle\ (j\ne i)$ that satisfies $E_i^{(0)} = E_j^{(0)}$. Therefore, we would have $\langle j|V|i\rangle = 0$ from (5.12). Gener-
ally, it is not the case, however, and so we need a relation different from (5.12) to
deal with the degenerate case. However, once again we do not get into details about
this issue in this book.
Next let us seek the second-order correction term $E_i^{(2)}$ of the eigenenergy. Operating $\langle j|$ on both sides of (5.11) from the left, we get
$$E_i^{(2)} = \left\langle i\middle|V\middle|\phi_i^{(1)}\right\rangle.\qquad(5.18)$$
Notice that we used (5.8) to derive (5.18). Using (5.16) furthermore, we get
where with the second equality we used (1.116) in combination with (1.118) and
with the third equality we used the fact that V is an Hermitian operator; i.e., V{ = V.
The state jki is sometimes called an intermediate state.
Considering the expression of (5.16), we find that the approximated state of j
ð1Þ
ii þ λ j ϕi i in (5.5) is not an eigenstate of energy, because the approximated state
contains a linear combination of different eigenstates jki that have eigenenergies
ð0Þ
different from E i of jii. Substituting (5.13) and (5.19) for (5.6), with the energy
correction terms up to the second order we have
Since $E_0^{(0)} < E_k^{(0)}\ (k = 1, 2, \cdots)$ for any $|k\rangle$, the second-order correction term is
always negative. This contributes to the energy stabilization.
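The structure of (5.13) and (5.19)–(5.20) is easy to verify numerically for a small model problem. The following Python sketch is our own check (it is not tied to any specific system in the text): for a random Hermitian matrix $H = H_0 + \lambda V$ with a non-degenerate ground state, the exact lowest eigenvalue agrees with the second-order perturbative estimate up to terms of order $\lambda^3$.

```python
# Our own numerical check of second-order perturbation theory against exact diagonalization.
import numpy as np

rng = np.random.default_rng(0)
n = 6
E0 = np.sort(rng.uniform(0.0, 10.0, n))        # distinct unperturbed energies
H0 = np.diag(E0)
A = rng.normal(size=(n, n))
V = (A + A.T) / 2                               # Hermitian (real symmetric) perturbation
lam = 1e-3

# E_0 + lam*<0|V|0> + lam^2 * sum_{k!=0} |<k|V|0>|^2 / (E_0 - E_k)
E_pt = E0[0] + lam * V[0, 0] + lam**2 * sum(
    V[k, 0]**2 / (E0[0] - E0[k]) for k in range(1, n))

E_exact = np.linalg.eigvalsh(H0 + lam * V)[0]
print(E_pt, E_exact, abs(E_pt - E_exact))       # difference is O(lambda^3)
```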
“nano capacitor” where, e.g., an electron is confined within the capacitor while
applying a voltage between the two electrodes.
We adopt the coordinate system the same as that of Example 1.2. That is, suppose
that we apply an electric field F between the electrodes that are positioned at x = ± L
and that the electron is confined between the two electrodes. Then, the Hamiltonian
of the system is described as
$$H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} - eFx,\qquad(5.22)$$
2m dx2
where m is a mass of the particle. Note that we may choose -eF for the parameter λ
of Sect. 5.1.1. Then, the coordinate representation of the Schrödinger equation is
given by
$$-\frac{\hbar^2}{2m}\frac{d^2\psi_i(x)}{dx^2} - eFx\,\psi_i(x) = E_i\psi_i(x),\qquad(5.23)$$
where Ei is the energy of the i-th state from the bottom (i.e., the ground state); ψ i is its
corresponding eigenstate. We wish to seek the perturbation energy up to the second
order and the quantum state up to the first order. We are particularly interested in the
change in the ground state j0i. Notice that the ground state is obtained in (1.101) by
putting l = 0.
According to (5.21) we have
$$E_0 \approx E_0^{(0)} + \lambda\langle 0|V|0\rangle + \lambda^2\sum_{k\ne0}\frac{|\langle k|V|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}},\qquad(5.24)$$
$$E_0 \approx E_0^{(0)} - eF\langle 0|x|0\rangle + (-eF)^2\sum_{k\ne0}\frac{|\langle k|x|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}}.\qquad(5.25)$$
Considering that j0i is an even function [i.e., a cosine function in (1.101)] with
respect to x and that x is an odd function, we find that the second term vanishes.
Notice that the explicit coordinate representation of j0i is
$$|0\rangle = \sqrt{\frac{1}{L}}\cos\frac{\pi}{2L}x,\qquad(5.26)$$
Fig. 5.1 Notation of the quantum states that distinguish cosine and sine functions. $E_k^{(0)}$ (k = 0, 1, 2, ⋯) represents the energy eigenvalues of the unperturbed states. In order of increasing energy the levels are $|0_{\cos}\rangle \equiv |0\rangle$, $|1_{\sin}\rangle \equiv |1\rangle$, $|1_{\cos}\rangle \equiv |2\rangle$, $|2_{\sin}\rangle \equiv |3\rangle$, and $|2_{\cos}\rangle \equiv |4\rangle$
which is obtained by putting $l = 0$ in $|l_{\cos}\rangle \equiv \sqrt{\frac{1}{L}}\cos\!\left[\left(\frac{\pi}{2L} + \frac{l\pi}{L}\right)x\right]$ ($l = 0, 1, 2, \cdots$) in (1.101).
By the same token, hk| x| 0i in the third term of RHS of (5.25) vanishes if jki
denotes a cosine function in (1.101). If, however, jki denotes a sine function, hk| x|
0i does not vanish. To distinguish the sine functions from cosine functions, we
denote
$$|n_{\sin}\rangle \equiv \sqrt{\frac{1}{L}}\sin\frac{n\pi}{L}x\quad(n = 1, 2, 3, \cdots)\qquad(5.27)$$
that is identical with (1.102); see Fig. 5.1 with the notation of the quantum states.
Now, putting $E_0^{(0)} = \frac{\hbar^2}{2m}\frac{\pi^2}{4L^2}$ and $E_{2n-1}^{(0)} = \frac{\hbar^2}{2m}\frac{(2n\pi)^2}{4L^2}$ ($n = 1, 2, 3, \cdots$) in (5.25), we rewrite it as
$$E_0 \approx E_0^{(0)} - eF\langle 0|x|0\rangle + (-eF)^2\sum_{k=1}^{\infty}\frac{|\langle k|x|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}} = E_0^{(0)} - eF\langle 0|x|0\rangle + (-eF)^2\sum_{n=1}^{\infty}\frac{|\langle n_{\sin}|x|0\rangle|^2}{E_0^{(0)} - E_{2n-1}^{(0)}},$$
where with the second equality n denotes the number appearing in (5.27) and $|n_{\sin}\rangle$ belongs to $E_{2n-1}^{(0)}$. Referring to, e.g., (4.17) with regard to the calculation of ⟨k|x|0⟩, we arrive at the following equation:
$$E_0 \approx \frac{\hbar^2}{2m}\frac{\pi^2}{4L^2} - (-eF)^2\,\frac{32^2\cdot8}{\pi^6}\,\frac{mL^4}{\hbar^2}\sum_{n=1}^{\infty}\frac{n^2}{(4n^2-1)^5}.\qquad(5.28)$$
The series of the second term in (5.28) rapidly converges. Defining the first N-term partial sum of the series as
$$S(N) \equiv \sum_{n=1}^{N}\frac{n^2}{(4n^2-1)^5},$$
we obtain
$$\frac{S(1000) - S(1)}{S(1000)} = 0.001324653.$$
That is, only the first term of S(N ) occupies ~99.9% of the infinite series. Thus,
ð2Þ
the stabilization energy λ2 E0 due to the perturbation is satisfactorily given by
$$|\Psi_0\rangle \approx |0\rangle + \lambda\left|\phi_0^{(1)}\right\rangle \approx |0\rangle - eF\sum_{k\ne0}\frac{\langle k|x|0\rangle}{E_0^{(0)} - E_k^{(0)}}\,|k\rangle = |0\rangle + eF\,\frac{32\cdot8}{\pi^4}\,\frac{mL^3}{\hbar^2}\sum_{n=1}^{\infty}\frac{(-1)^{n-1}n}{(4n^2-1)^3}\,|n_{\sin}\rangle.\qquad(5.30)$$
$$|\Psi_0\rangle \approx |0\rangle + eF\,\frac{32\cdot8}{27\pi^4}\,\frac{mL^3}{\hbar^2}\,|1_{\sin}\rangle.\qquad(5.31)$$
Thus, we have roughly estimated the stabilization energy and the corresponding
quantum state as in (5.29) and (5.31), respectively. It may well be worth estimating
rough numbers of L and F in the actual situation (or perhaps in an actual “nano
device”). The estimation is left for readers.
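The convergence statement about S(N) and the resulting stabilization energy can be checked in a few lines. The Python sketch below is our own illustration; the values of m, L, ħ, and eF are arbitrary test numbers, not quantities taken from the text.

```python
# Our own numerical sketch of Example 5.1: convergence of S(N) and the energy lowering.
import numpy as np

def S(N):
    n = np.arange(1, N + 1)
    return np.sum(n**2 / (4*n**2 - 1)**5)

print((S(1000) - S(1)) / S(1000))                # 0.001324653..., i.e. n = 1 carries ~99.9 %

m = hbar = L = 1.0                               # arbitrary illustrative units
eF = 1e-3
E2_full = -(eF)**2 * (32**2 * 8 / np.pi**6) * (m * L**4 / hbar**2) * S(1000)
E2_n1   = -(eF)**2 * (32**2 * 8 / np.pi**6) * (m * L**4 / hbar**2) * S(1)
print(E2_full, E2_n1)                            # nearly identical second-order shifts
```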
Example 5.2 [1] In Chap. 2 we dealt with a quantum-mechanical harmonic oscil-
lator. Here let us suppose that a charged particle (with its charge e) is performing
harmonic oscillation under an applied electric field. First we consider the coordinate
representation of Schrödinger equation. Without the external field, the Schrödinger
equation as an eigenvalue equation reads as
$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dq^2} + \frac{1}{2}m\omega^2q^2u(q) = Eu(q).\qquad(2.108)$$
Under the applied electric field, the perturbation is expressed as -eFq and, hence,
the equation can be written as
$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dq^2} + \frac{1}{2}m\omega^2q^2u(q) - eF(q)\,q\,u(q) = Eu(q).\qquad(5.32)$$
For simplicity we assume that the electric field is uniform independent of q so that
we can deal with it as a constant. Then, (5.32) can be rewritten as
$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dq^2} + \frac{1}{2}m\omega^2\left(q - \frac{eF}{m\omega^2}\right)^2u(q) = \left[E + \frac{1}{2}m\omega^2\left(\frac{eF}{m\omega^2}\right)^2\right]u(q).\qquad(5.33)$$
$$Q \equiv q - \frac{eF}{m\omega^2},\qquad(5.34)$$
we have
$$-\frac{\hbar^2}{2m}\frac{d^2u(q)}{dQ^2} + \frac{1}{2}m\omega^2Q^2u(q) = \left[E + \frac{1}{2}m\omega^2\left(\frac{eF}{m\omega^2}\right)^2\right]u(q).\qquad(5.35)$$
Taking account of the change in functional form that results from the variable
transformation, we rewrite (5.35) as
$$-\frac{\hbar^2}{2m}\frac{d^2\bar{u}(Q)}{dQ^2} + \frac{1}{2}m\omega^2Q^2\bar{u}(Q) = \left[E + \frac{1}{2}m\omega^2\left(\frac{eF}{m\omega^2}\right)^2\right]\bar{u}(Q),\qquad(5.36)$$
where
$$u(q) = u\!\left(Q + \frac{eF}{m\omega^2}\right) \equiv \bar{u}(Q).\qquad(5.37)$$
Defining E as
$$\bar{E} \equiv E + \frac{1}{2}m\omega^2\left(\frac{eF}{m\omega^2}\right)^2 = E + \frac{e^2F^2}{2m\omega^2},\qquad(5.38)$$
we get
$$-\frac{\hbar^2}{2m}\frac{d^2\bar{u}(Q)}{dQ^2} + \frac{1}{2}m\omega^2Q^2\bar{u}(Q) = \bar{E}\,\bar{u}(Q).\qquad(5.39)$$
Thus, we recover exactly the same form as (2.108). Equation (5.39) can be solved
analytically as already shown in Sect. 2.4. From (2.110) and (2.120), we have
$$\lambda \equiv \frac{2\bar{E}}{\hbar\omega},\qquad \lambda = 2n + 1.$$
Then, we have
$$\bar{E} = \frac{\hbar\omega}{2}\lambda = \frac{\hbar\omega(2n+1)}{2}.\qquad(5.40)$$
$$E = \bar{E} - \frac{e^2F^2}{2m\omega^2} = \hbar\omega\left(n + \frac{1}{2}\right) - \frac{e^2F^2}{2m\omega^2}.\qquad(5.41)$$
2mω 2 2mω2
This implies that the energy stabilization due to the applied electric field is $-\frac{e^2F^2}{2m\omega^2}$. Note that the system is always stabilized regardless of the sign of e.
Next, we wish to consider the problem on the basis of the matrix (or operator)
representation. Let us evaluate the perturbation energy using (5.20). Conforming the
notation of (5.20) to the present case, we have
$$E_n \approx E_n^{(0)} - eF\langle n|q|n\rangle + (-eF)^2\sum_{k\ne n}\frac{|\langle k|q|n\rangle|^2}{E_n^{(0)} - E_k^{(0)}},\qquad(5.42)$$
$$E_n^{(0)} = \hbar\omega\left(n + \frac{1}{2}\right).\qquad(5.43)$$
Taking account of (2.55), (2.68), and (2.62), the second term of (5.42) vanishes.
With the third term, using (2.68) as well as (2.55) and (2.61) we have
$$\begin{aligned}
|\langle k|q|n\rangle|^2 &= \sqrt{\frac{\hbar}{2m\omega}}\left\langle k\middle|a + a^{\dagger}\middle|n\right\rangle\sqrt{\frac{\hbar}{2m\omega}}\left\langle k\middle|a + a^{\dagger}\middle|n\right\rangle\\
&= \frac{\hbar}{2m\omega}\left[\sqrt{n}\,\langle k|n-1\rangle + \sqrt{n+1}\,\langle k|n+1\rangle\right]^2\\
&= \frac{\hbar}{2m\omega}\left[\sqrt{n}\,\delta_{k,n-1} + \sqrt{n+1}\,\delta_{k,n+1}\right]^2\\
&= \frac{\hbar}{2m\omega}\left[n\,\delta_{k,n-1} + (n+1)\,\delta_{k,n+1} + 2\delta_{k,n-1}\delta_{k,n+1}\sqrt{n(n+1)}\right]\\
&= \frac{\hbar}{2m\omega}\left[n\,\delta_{k,n-1} + (n+1)\,\delta_{k,n+1}\right].
\end{aligned}\qquad(5.44)$$
Notice that with the last equality of (5.44) there is no k that satisfies k = n -
1 = n + 1 at once. Also note that hk| q| ni is a real number. Hence, as the third term of
(5.42) we get
$$\sum_{k\ne n}\frac{|\langle k|q|n\rangle|^2}{E_n^{(0)} - E_k^{(0)}} = \sum_{k\ne n}\frac{1}{\hbar\omega(n-k)}\cdot\frac{\hbar}{2m\omega}\left[n\,\delta_{k,n-1} + (n+1)\,\delta_{k,n+1}\right] = \frac{1}{2m\omega^2}\left[\frac{n}{n-(n-1)} + \frac{n+1}{n-(n+1)}\right] = -\frac{1}{2m\omega^2}.\qquad(5.45)$$
$$E_n \approx \hbar\omega\left(n + \frac{1}{2}\right) - \frac{e^2F^2}{2m\omega^2}.\qquad(5.46)$$
As mentioned in Sect. 5.1.1, however, (5.46) does not represent an eigenenergy. In fact, using (5.17)
together with (4.26) and (4.28), we have
$$|\Psi_0\rangle \approx |0\rangle + \lambda\left|\phi_0^{(1)}\right\rangle = |0\rangle + \lambda\sum_{k\ne0}\frac{\langle k|V|0\rangle}{E_0^{(0)} - E_k^{(0)}}\,|k\rangle = |0\rangle + \lambda\,\frac{\langle 1|V|0\rangle}{E_0^{(0)} - E_1^{(0)}}\,|1\rangle = |0\rangle - eF\,\frac{\langle 1|x|0\rangle}{E_0^{(0)} - E_1^{(0)}}\,|1\rangle = |0\rangle + \frac{eF}{\sqrt{2m\hbar\omega^3}}\,|1\rangle.\qquad(5.47)$$
Thus, jΨ0i is not an eigenstate because jΨ0i contains both j0i and j1i that have
different eigenenergies. Hence, jΨ0i does not possess a fixed eigenenergy. Notice
also that in (5.47) the factors hk| x| 0i vanish except for h1| x| 0i; see Chaps. 2 and 4.
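Both the exact shift (5.41)/(5.46) and the admixture coefficient in (5.47) are easy to verify by diagonalizing the Hamiltonian in a truncated number basis. The following sketch is our own check, with ħ = m = ω = 1 and an arbitrary small value of eF chosen purely for illustration.

```python
# Our own numerical check of Example 5.2 in a truncated harmonic-oscillator basis.
import numpy as np

hbar = m = w = 1.0
eF = 0.05
N = 60                                           # basis truncation

n = np.arange(N)
a = np.diag(np.sqrt(n[1:]), k=1)                 # annihilation operator, a|n> = sqrt(n)|n-1>
q = np.sqrt(hbar / (2*m*w)) * (a + a.T)
H = np.diag(hbar*w*(n + 0.5)) - eF * q

evals, evecs = np.linalg.eigh(H)
print(evals[0] - 0.5*hbar*w, -eF**2/(2*m*w**2))  # both ~ -0.00125: exact shift of (5.41)

c1 = evecs[1, 0] / evecs[0, 0]                   # admixture of |1> in the ground state
print(c1, eF*np.sqrt(hbar/(2*m*w))/(hbar*w))     # agree to first order in eF, cf. (5.47)
```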
To think of this point further, let us come back to the coordinate representation of
Schrödinger equation. From (5.39) and (2.106), we have
$$\bar{u}_n(Q) = \left(\frac{m\omega}{\pi\hbar}\right)^{\frac{1}{4}}\frac{1}{\sqrt{2^nn!}}\,H_n\!\left(\sqrt{\frac{m\omega}{\hbar}}\,Q\right)e^{-\frac{m\omega}{2\hbar}Q^2}\quad(n = 0, 1, 2, \cdots),$$
where $H_n\!\left(\sqrt{\frac{m\omega}{\hbar}}\,Q\right)$ is the Hermite polynomial of the n-th order. Using (5.34) and (5.37), we replace Q with $q - \frac{eF}{m\omega^2}$ to obtain the following form described by
$$u_n(q) = \bar{u}_n\!\left(q - \frac{eF}{m\omega^2}\right) = \left(\frac{m\omega}{\pi\hbar}\right)^{\frac{1}{4}}\frac{1}{\sqrt{2^nn!}}\,H_n\!\left(\sqrt{\frac{m\omega}{\hbar}}\left(q - \frac{eF}{m\omega^2}\right)\right)\exp\!\left[-\frac{m\omega}{2\hbar}\left(q - \frac{eF}{m\omega^2}\right)^2\right]\quad(n = 0, 1, 2, \cdots).\qquad(5.48)$$
Since $\psi_n(q)$ of (2.106) forms the complete orthonormal system (CONS) [2], $u_n(q)$ should be expanded using $\psi_n(q)$. That is, as a solution of (5.32) we get
$$u_n(q) = \sum_kc_{nk}\psi_k(q),\qquad(5.49)$$
where a set of cnk are appropriate coefficients. Since the functions ψ k(q) are
non-degenerate, un(q) expressed by (5.49) lacks a definite eigenenergy.
Example 5.3 [1] In the previous examples we studied the perturbation in
one-dimensional systems, where all the energy eigenstates are non-degenerate.
Here we deal with the change in the energy and quantum states of a hydrogen
atom (i.e., a three-dimensional system). Since the energy eigenstates are generally
degenerate (see Chap. 3), for simplicity we consider only the ground state, which is non-degenerate. With the electric field F applied along the z-direction, the perturbation is
V = - eFz, ð5:50Þ
where e is a charge of an electron (e < 0). Then, the total Hamiltonian H is described
by
H = H 0 þ V, ð5:51Þ
$$H_0 = -\frac{\hbar^2}{2\mu r^2}\left[\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right] - \frac{e^2}{4\pi\varepsilon_0r},\qquad(5.52)$$
$$|\Psi_0\rangle \approx |0\rangle + \sum_{k\ne0}\frac{\langle k|V|0\rangle}{E_0^{(0)} - E_k^{(0)}}\,|k\rangle = |0\rangle - eF\sum_{k\ne0}\frac{\langle k|z|0\rangle}{E_0^{(0)} - E_k^{(0)}}\,|k\rangle,\qquad(5.53)$$
where j0i denotes the ground state expressed as (3.301). Since the quantum states
jki represent all the quantum states of the hydrogen atom as implied in (5.3), in
the second term of RHS in (5.53) we are considering all those states except for
$|0\rangle$. The eigenenergy $E_0^{(0)}$ is identical with $E_1 = -\frac{\hbar^2}{2\mu a^2}$ obtained from (3.258),
$$\mathbf{P} = \sum_je_j\mathbf{x}_j,\qquad(4.6)$$
where $\mathbf{x}_j$ is a position vector of the j-th charged particle. Placing the proton of hydrogen at the origin, by use of the notation (3.5) $\mathbf{P}$ is expressed as
$$\mathbf{P} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix}P_x\\ P_y\\ P_z\end{pmatrix} = e\mathbf{x} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix}ex\\ ey\\ ez\end{pmatrix}.$$
Pz = ez: ð5:54Þ
where jΨ0i is taken from (5.53). Substituting (5.53) for (5.55), we get
$$\begin{aligned}
\langle\Psi_0|z|\Psi_0\rangle &= \left[\langle 0| - eF\sum_{k\ne0}\frac{\langle k|z|0\rangle}{E_0^{(0)} - E_k^{(0)}}\langle k|\right]z\left[|0\rangle - eF\sum_{k\ne0}\frac{\langle k|z|0\rangle}{E_0^{(0)} - E_k^{(0)}}|k\rangle\right]\\
&= \langle 0|z|0\rangle - eF\sum_{k\ne0}\frac{\langle k|z|0\rangle}{E_0^{(0)} - E_k^{(0)}}\langle 0|z|k\rangle - eF\sum_{k\ne0}\frac{\langle k|z|0\rangle}{E_0^{(0)} - E_k^{(0)}}\left\langle 0\middle|z^{\dagger}\middle|k\right\rangle\\
&\quad + (eF)^2\sum_{k\ne0}\langle k|z|k\rangle\,\frac{|\langle k|z|0\rangle|^2}{\left(E_0^{(0)} - E_k^{(0)}\right)^2}.
\end{aligned}\qquad(5.56)$$
$$\langle\Psi_0|z|\Psi_0\rangle \approx \langle 0|z|0\rangle - 2eF\sum_{k\ne0}\frac{|\langle k|z|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}}.$$
$$\langle P_z\rangle \approx e\langle 0|z|0\rangle - 2e^2F\sum_{k\ne0}\frac{|\langle k|z|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}}.\qquad(5.57)$$
where we used $\int_0^{\pi}\sin 2\theta\,d\theta = 0$. Then, we have
$$\langle P_z\rangle \approx -2e^2F\sum_{k\ne0}\frac{|\langle k|z|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}}.\qquad(5.59)$$
$$\alpha = -2e^2\sum_{k\ne0}\frac{|\langle k|z|0\rangle|^2}{E_0^{(0)} - E_k^{(0)}}.\qquad(5.61)$$
At first glance, evaluating (5.61) appears formidable, but by making the most of the fact that the total wave functions of a hydrogen atom form the CONS, the evaluation is straightforward. The discussion is as follows:
First, let us seek an operator G that satisfies [1]
To this end, we take account of all the quantum states jki of a hydrogen atom
(including j0i) that are solutions (or eigenstates of energy) of (3.36). In Sect. 3.8
we have obtained the total wave functions described by
$$\Lambda_{l,m}^{(n)} = Y_l^m(\theta,\phi)R_l^{(n)}(r),\qquad(3.300)$$
where $\Lambda_{l,m}^{(n)}$ is expressed as a product of spherical surface harmonics and radial
wave functions. The latter functions are described by associated Laguerre func-
tions. It is well-known [2] that the spherical surface harmonics constitute the
CONS on a unit sphere and that associated Laguerre functions form the CONS in
the real one-dimensional space. Thus, their product expressed as (3.300), i.e.,
aforementioned collection of jki constitutes the CONS in the real three-
dimensional space. This is equivalent to
$$\sum_j|j\rangle\langle j| = E,\qquad(5.63)$$
$$Ef(\mathbf{r}) = f(\mathbf{r}) = \sum_k|k\rangle\langle k|f(\mathbf{r})\rangle = \sum_kf_k|k\rangle.\qquad(5.64)$$
In other words, (5.64) implies that any function f(r) can be expanded into a
series of jki. The coefficients fk are defined as
$$(GH_0 - H_0G)|0\rangle = -\frac{\hbar^2}{2\mu}\left[G\,\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) - \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right)G - \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right)G\right]|0\rangle.\qquad(5.66)$$
$$\begin{aligned}
-\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right)(G|0\rangle) &= -\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial G}{\partial r}|0\rangle\right) - \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2G\frac{\partial|0\rangle}{\partial r}\right)\\
&= -\frac{2}{r}\frac{\partial G}{\partial r}|0\rangle - \frac{\partial^2G}{\partial r^2}|0\rangle - \frac{\partial G}{\partial r}\frac{\partial|0\rangle}{\partial r} - \frac{G}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial|0\rangle}{\partial r}\right) - \frac{\partial G}{\partial r}\frac{\partial|0\rangle}{\partial r}\\
&= -\frac{2}{r}\frac{\partial G}{\partial r}|0\rangle - \frac{\partial^2G}{\partial r^2}|0\rangle - 2\frac{\partial G}{\partial r}\frac{\partial|0\rangle}{\partial r} - \frac{G}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial|0\rangle}{\partial r}\right)\\
&= -\frac{2}{r}\frac{\partial G}{\partial r}|0\rangle - \frac{\partial^2G}{\partial r^2}|0\rangle + \frac{2}{a}\frac{\partial G}{\partial r}|0\rangle - \frac{G}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial|0\rangle}{\partial r}\right),
\end{aligned}\qquad(5.67)$$
where with the last equality we used (3.301), namely $|0\rangle \propto e^{-r/a}$, where a is the Bohr radius (Sect. 3.7.1). The last term of (5.67) cancels the first term of (5.66). Then, we get
$$(GH_0 - H_0G)|0\rangle = \frac{\hbar^2}{2\mu}\left[2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{\partial G}{\partial r} + \frac{\partial^2G}{\partial r^2} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial G}{\partial\theta}\right)\right]|0\rangle.\qquad(5.68)$$
$$r\cos\theta\,\phi(1s) = \frac{\hbar^2}{2\mu}\left[2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{\partial G}{\partial r} + \frac{\partial^2G}{\partial r^2} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial G}{\partial\theta}\right)\right]\phi(1s),\qquad(5.70)$$
where ϕ(1s) is given by (3.301). Dividing (5.70) by ϕ(1s) and rearranging it, we
have
$$\frac{\partial^2G}{\partial r^2} + 2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{\partial G}{\partial r} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial G}{\partial\theta}\right) = \frac{2\mu}{\hbar^2}\,r\cos\theta.\qquad(5.71)$$
$$\frac{d^2g}{dr^2} + 2\left(\frac{1}{r} - \frac{1}{a}\right)\frac{dg}{dr} - \frac{2g}{r^2} = \frac{2\mu}{\hbar^2}\,r.\qquad(5.73)$$
Equation (5.73) has a regular singular point at the origin and resembles an
Euler equation whose general form of a homogeneous equation is described by
[5]
$$\frac{d^2g}{dr^2} + \frac{a}{r}\frac{dg}{dr} + \frac{b}{r^2}\,g = 0,\qquad(5.74)$$
$$\frac{d^2g}{dr^2} + \frac{a}{r}\frac{dg}{dr} - \frac{a}{r^2}\,g = 0,\qquad(5.75)$$
$$g = pr^2 + qr.\qquad(5.76)$$
$$4p - \frac{4pr}{a} - \frac{2q}{a} = \frac{2\mu}{\hbar^2}\,r.\qquad(5.77)$$
$$p = -\frac{a\mu}{2\hbar^2}\quad\text{and}\quad q = -\frac{a^2\mu}{\hbar^2}.\qquad(5.78)$$
Hence, we get
$$g(r) = -\frac{a\mu}{\hbar^2}\left(\frac{r}{2} + a\right)r.\qquad(5.79)$$
Accordingly, we have
aμ r aμ r
Gðr, θÞ = gðr Þ cos θ = - þ a r cos θ = - 2 þ a z: ð5:80Þ
ħ2 2 ħ 2
$$H_0|0\rangle = E_0^{(0)}|0\rangle\qquad(5.82)$$
and that
$$\langle j|H_0 = \left[H_0^{\dagger}|j\rangle\right]^{\dagger} = \left[H_0|j\rangle\right]^{\dagger} = \left[E_j^{(0)}|j\rangle\right]^{\dagger} = E_j^{(0)}\langle j|,\qquad(5.83)$$
where we used a computation rule of (1.118) and with the second equality we used the fact that $H_0$ is Hermitian, i.e., $H_0^{\dagger} = H_0$. Rewriting (5.81), we have
$$\langle j|G|0\rangle = \frac{\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}}.\qquad(5.84)$$
Multiplying ⟨0|z|j⟩ on both sides of (5.84) and summing over j (≠0), we obtain
$$\sum_{j\ne0}\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j\ne0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}}.\qquad(5.85)$$
$$\sum_j\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j\ne0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}} + \langle 0|z|0\rangle\langle 0|G|0\rangle.\qquad(5.86)$$
$$\langle 0|zG|0\rangle = \sum_j\langle 0|z|j\rangle\langle j|G|0\rangle = \sum_{j\ne0}\frac{\langle 0|z|j\rangle\langle j|z|0\rangle}{E_0^{(0)} - E_j^{(0)}} = \sum_{j\ne0}\frac{|\langle j|z|0\rangle|^2}{E_0^{(0)} - E_j^{(0)}} = -\frac{\alpha}{2e^2},\qquad(5.87)$$
where with the second equality we used (5.84) and with the last equality we used
(5.61). Also, we used the fact that z is Hermitian.
Meanwhile, using (5.80), LHS of (5.87) is expressed as
aμ r aμ a2 μ
h0jzGj0i = - 0jz þ a zj0 = - h0jzrzj0 i - 0jz2 j0 : ð5:88Þ
ħ2 2 2ħ2 ħ2
Noting that j0i is spherically symmetric and that z and r are commutative,
(5.88) can be rewritten as
$$\langle 0|zG|0\rangle = -\frac{a\mu}{6\hbar^2}\left\langle 0\middle|r^3\middle|0\right\rangle - \frac{a^2\mu}{3\hbar^2}\left\langle 0\middle|r^2\middle|0\right\rangle,\qquad(5.89)$$
where we used h0| x2| 0i = h0| y2| 0i = h0| z2| 0i = h0| r2| 0i/3.
Returning to the coordinate representation and taking account of the change of variables from the Cartesian coordinates to the polar coordinates as in (5.58), we get
To perform the definite integral calculations of (5.89) and obtain the result of (5.90), modify (3.263) and use the formula described below. That is, differentiate
$$\int_0^{\infty}e^{-r\xi}\,dr = \frac{1}{\xi}$$
four (or five) times with respect to ξ and replace ξ with 2/a to get the result of (5.90) appropriately.
Using (5.87) and (5.90), as the polarizability α we finally get
where we used
$$E_0 \approx E_0^{(0)} - eF\langle 0|z|0\rangle + (-eF)^2\sum_{j\ne0}\frac{|\langle j|z|0\rangle|^2}{E_0^{(0)} - E_j^{(0)}}.\qquad(5.93)$$
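The integrals that enter (5.89) can be evaluated symbolically in a few lines. The sympy sketch below is our own verification (the variable names are ours): for the 1s state $\langle r^2\rangle = 3a^2$ and $\langle r^3\rangle = \tfrac{15}{2}a^3$, so (5.87) with (5.80) gives $\langle 0|zG|0\rangle = -\tfrac{9\mu a^4}{4\hbar^2}$ and hence $\alpha = \tfrac{9e^2\mu a^4}{2\hbar^2}$, i.e., the familiar hydrogen value $(9/2)\cdot4\pi\varepsilon_0 a^3$ when $a = 4\pi\varepsilon_0\hbar^2/(\mu e^2)$ is inserted.

```python
# Our own sympy check of the expectation values behind (5.89) and the resulting polarizability.
import sympy as sp

r, a, mu, hbar, e = sp.symbols('r a mu hbar e', positive=True)

weight = 4*sp.pi * r**2 * sp.exp(-2*r/a) / (sp.pi*a**3)    # |phi(1s)|^2 * 4*pi*r^2
r2 = sp.integrate(r**2 * weight, (r, 0, sp.oo))            # 3*a**2
r3 = sp.integrate(r**3 * weight, (r, 0, sp.oo))            # 15*a**3/2
print(r2, r3)

zG = -(a*mu/(6*hbar**2))*r3 - (a**2*mu/(3*hbar**2))*r2     # (5.89)
alpha = -2*e**2 * zG                                        # from (5.87)
print(sp.simplify(alpha))                                   # 9*a**4*e**2*mu/(2*hbar**2)
```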
Another approximation method is a variational method. This method also has a wide
range of applications in mathematical physics.
A major application of the variational method lies in seeking an eigenvalue that is
appropriately approximated. Suppose that we have an Hermitian differential operator D that satisfies an eigenvalue equation
λ1 ≤ λ2 ≤ λ3 ≤ ⋯, ð5:96Þ
$$\langle y_n|Du\rangle = \langle Dy_n|u\rangle = \langle\lambda_ny_n|u\rangle = \lambda_n^{*}\langle y_n|u\rangle = \lambda_n\langle y_n|u\rangle.\qquad(5.97)$$
In (5.97) we used (1.132) and the computation rule of the inner product (see Sect.
13.1). Also, we used the fact that any eigenvalue of an Hermitian operator is real
(Sect. 1.4). From the assumption that the functions {yn} constitute the CONS, u can
be expanded in a series described by
$$u = \sum_jc_j|y_j\rangle\qquad(5.98)$$
with
$$c_j \equiv \langle y_j|u\rangle.\qquad(5.99)$$
This relation is in parallel with (5.64). Similarly, Du can be expanded such that
$$Du = \sum_jd_j|y_j\rangle = \sum_j\langle y_j|Du\rangle|y_j\rangle = \sum_j\lambda_j\langle y_j|u\rangle|y_j\rangle = \sum_j\lambda_jc_j|y_j\rangle,\qquad(5.100)$$
where $d_j$ is an expansion coefficient and with the third equality we used (5.95) and with the last equality we used (5.99). Comparing the individual coefficients of $|y_j\rangle$ in (5.100), we get
$$d_j = \lambda_jc_j.\qquad(5.101)$$
Thus, we obtain
$$Du = \sum_j\lambda_jc_j|y_j\rangle.\qquad(5.102)$$
Comparing (5.98) and (5.102), we find that we may termwise operate D on (5.98).
Meanwhile, taking an inner product hu| Dui again termwise with (5.100), we get
$$\langle u|Du\rangle = \left\langle u\,\middle|\,\sum_j\lambda_jc_jy_j\right\rangle = \sum_j\lambda_jc_j\langle u|y_j\rangle = \sum_j\lambda_jc_jc_j^{*} = \sum_j\lambda_j\left|c_j\right|^2 \ge \sum_j\lambda_1\left|c_j\right|^2 = \lambda_1\sum_j\left|c_j\right|^2,\qquad(5.103)$$
$$\langle u|u\rangle = \left\langle\sum_jc_jy_j\,\middle|\,\sum_jc_jy_j\right\rangle = \sum_jc_j^{*}c_j\langle y_j|y_j\rangle = \sum_j\left|c_j\right|^2,\qquad(5.104)$$
where with the last equality we used the fact that the functions {yn} are normalized.
Comparing once again (5.103) and (5.104), we finally get
$$\langle u|Du\rangle \ge \lambda_1\langle u|u\rangle\quad\text{or}\quad\lambda_1 \le \frac{\langle u|Du\rangle}{\langle u|u\rangle}.\qquad(5.105)$$
Dy = λy, ð5:106Þ
H = H 0 - eFx, ð5:107Þ
where H0 is the Hamiltonian without the external field F. We suppose that the
quantum state jui is expressed as
j ui ≈ j 0i þ a j 1i, ð5:108Þ
where j0i and j1i denote the ground state and first excited state, respectively; a is an
unknown parameter to be decided after the analysis based on the variational method. In
In the present case, jui is a trial function. Then, we have
$$\langle H\rangle \equiv \frac{\langle u|Hu\rangle}{\langle u|u\rangle}.\qquad(5.109)$$
Note that in (5.110) four terms vanish because of the symmetry requirement of the
symmetric potential well and harmonic oscillator with respect to the origin as well as
because of the orthogonality between the states j0i and j1i. Here x is Hermitian and
we assume that the coordinate representation of j0i and j1i is real as in (1.101) and
(1.102) or in (2.106). Then, we have
where we used the normalized conditions h0| 0i = h1| 1i = 1 and the orthogonal
conditions h0| 1i = h1| 0i = 0. Thus, we get
Here, defining the ground state energy and first excited energy as E0 and E1,
respectively, we put
where E0 ≠ E1. As already shown in Chaps. 1 and 2, both the systems have
non-degenerate energy eigenstates. Then, we have
$$\langle H\rangle = \frac{E_0 - 2eFa\langle 0|x|1\rangle + a^2E_1}{1 + a^2}.\qquad(5.115)$$
Now we wish to determine the condition where hHi has an extremum. That is, we
are seeking a that satisfies
$$\frac{\partial\langle H\rangle}{\partial a} = 0.\qquad(5.116)$$
For $\frac{\partial\langle H\rangle}{\partial a}$ to be zero, we must have
$$a = \frac{E_0 - E_1 \pm (E_1 - E_0)\sqrt{1 + \dfrac{\left[2eF\langle 0|x|1\rangle\right]^2}{(E_1 - E_0)^2}}}{2eF\langle 0|x|1\rangle}.\qquad(5.119)$$
$$\sqrt{1 + \Delta} \approx 1 + \frac{\Delta}{2},\qquad(5.120)$$
$$a \approx \frac{eF\langle 0|x|1\rangle}{E_1 - E_0}.\qquad(5.121)$$
E1 - E0
$$|u\rangle \approx |0\rangle + \frac{eF\langle 0|x|1\rangle}{E_1 - E_0}\,|1\rangle.\qquad(5.122)$$
If one compares (5.122) with (5.17), one should recognize that these two equa-
tions are the same within the first-order approximation. Notice that putting i = 0 and
k = 1 in (5.17) and replacing V with -eFx, we have the expression similar to (5.122).
Notice once again that h0| x| 1i = h1| x| 0i as x is an Hermitian operator and that we
are considering real inner products.
Inserting (5.121) into (5.115) and approximating
$$\frac{1}{1 + a^2} \approx 1 - a^2,\qquad(5.123)$$
we can estimate hHi. The estimation depends upon the nature of the system we
choose. The next examples deal with these specific characteristics.
Example 5.4 We adopt the same problem as Example 5.1. That is, we consider how
an energy of a particle carrying a charge e confined within a one-dimensional
potential well is changed by the applied electric field. Here we deal with it using
the variational method.
We deal with the same Hamiltonian as in the case of Example 5.1. It is described
as
$$H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} - eFx.\qquad(5.22)$$
By putting
$$H_0 \equiv -\frac{\hbar^2}{2m}\frac{d^2}{dx^2},\qquad(5.124)$$
H = H 0 - eFx:
Expressing the ground state as j0i and the first excited state as j1i, we adopt a trial
function jui as
j ui = j 0i þ a j 1i: ð5:125Þ
$$|0\rangle = \sqrt{\frac{1}{L}}\cos\frac{\pi}{2L}x\qquad(5.26)$$
and
$$|1\rangle = \sqrt{\frac{1}{L}}\sin\frac{\pi}{L}x.\qquad(5.126)$$
$$\langle 0|H_0|0\rangle = E_0 = \frac{\hbar^2}{2m}\frac{\pi^2}{4L^2},\qquad
\langle 1|H_0|1\rangle = E_1 = \frac{\hbar^2}{2m}\frac{4\pi^2}{4L^2} = 4E_0.\qquad(5.127)$$
$$a \approx eF\,\frac{\langle 0|x|1\rangle}{3E_0}.\qquad(5.128)$$
$$\langle 0|x|1\rangle = \frac{32L}{9\pi^2}.\qquad(5.129)$$
For this calculation, use the coordinate representation of (5.26) and (5.126). Thus,
we get
$$a \approx eF\,\frac{32\cdot8}{27\pi^4}\,\frac{mL^3}{\hbar^2}.\qquad(5.130)$$
$$|u\rangle \approx |0\rangle + eF\,\frac{\langle 0|x|1\rangle}{3E_0}\,|1\rangle.\qquad(5.131)$$
The resulting jui in (5.131) is the same as (5.31), which was obtained by the first-
order approximation of (5.30). Inserting (5.130) into (5.113) and approximating the
denominator as before such that
$$\frac{1}{1 + a^2} \approx 1 - a^2,\qquad(5.123)$$
$$\langle H\rangle \approx E_0 - (eF)^2\,\frac{32^2\cdot8}{243\pi^6}\,\frac{mL^4}{\hbar^2}.\qquad(5.132)$$
Once again, this is the same result as that of (5.28) and (5.29) where only n = 1 is
taken into account.
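The two key numbers of Example 5.4, the matrix element (5.129) and the variational energy lowering (5.132), can be reproduced numerically. The sketch below is our own check; the units and the value of eF are arbitrary illustrative choices.

```python
# Our own numerical check of (5.129) and (5.132) for the particle in a box with a field.
import numpy as np
from scipy.integrate import quad

L = 1.0
psi0 = lambda x: np.cos(np.pi*x/(2*L)) / np.sqrt(L)      # |0> of (5.26)
psi1 = lambda x: np.sin(np.pi*x/L) / np.sqrt(L)          # |1> of (5.126)

x01, _ = quad(lambda x: psi0(x)*x*psi1(x), -L, L)
print(x01, 32*L/(9*np.pi**2))                            # both ~ 0.3602

m = hbar = 1.0
eF = 1e-2
E0 = hbar**2 * np.pi**2 / (8*m*L**2)
dE_var = -(eF*x01)**2 / (3*E0)                           # <H> - E0 from (5.115) with (5.121)
dE_ref = -(eF)**2 * (32**2*8/(243*np.pi**6)) * m*L**4/hbar**2
print(dE_var, dE_ref)                                    # agree, cf. (5.132)
```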
Example 5.5 We adopt the same problem as Example 5.2. The calculation pro-
cedures are almost the same as those of Example 5.2. We use
$$\langle 0|q|1\rangle = \sqrt{\frac{\hbar}{2m\omega}}\quad\text{and}\quad E_1 - E_0 = \hbar\omega.\qquad(5.133)$$
The detailed procedures are left for readers as an exercise. We should get the same
results as those obtained in Example 5.2.
As discussed in the above five simple examples, we showed the calculation
procedures of the perturbation method and variational method. These methods not
only supply us with suitable approximation techniques, but also provide physical
and mathematical insight in many fields of natural science.
References
Theory of analytic functions is one of the major fields of modern mathematics. Its
application covers a broad range of topics of natural science. A complex function
f(z), or a function that takes a complex number z as a variable, has various properties
that often differ from those of functions that take a real number x as a variable. In
particular, the analytic functions hold a paramount position in the complex analysis.
In this chapter we explore various features of the analytic functions accordingly.
From a practical point of view, the theory of analytic functions is very frequently
utilized for the calculation of real definite integrals. For this reason, we describe the
related topics together with tangible examples.
The complex plane (or Gaussian plane) can be dealt with as a topological space
where the metric (or distance function) is defined. Since the complex plane has a
two-dimensional extension, we can readily imagine and investigate its topological
feature. Set theory allows us to make an axiomatic approach along with the topology.
Therefore, we introduce basic notions and building blocks of the set theory and
topology.
z = x þ iy, ð6:1Þ
where x and y are the real numbers and i is an imaginary unit. Graphically, the
number z is indicated as a point in a complex plane where the real axis is drawn as
abscissa and the imaginary axis is depicted as ordinate (see Fig. 6.1). Since the
complex plane has a two-dimensional extension, we can readily imagine the domain
of variability of z and make a graphical object for it on the complex plane. A disk-
like diagram that is enclosed with a closed curve C is frequently dealt with in the
theory of analytic functions. Such a diagram provides a good subject for study of the
set theory and topology. Before developing the theory of complex numbers and
analytic functions, we briefly mention the basic notions as well as notations and
meanings of sets and topology.
where A contains ten elements in the former case and uncountably infinite elements
are contained in the latter set B. In the latter case (sometimes in the former case as
well), the elements are usually called points.
When we speak of the sets, a universal set [1] is implied in the background. This
set contains all the sets in consideration of the study and is usually clearly defined in
the context of the discussion. For example, with the real analysis the universal set is
ℝ and with the complex analysis it is ℂ (an entire complex plane). In the latter case a
two-dimensional complex plane (see Fig. 6.1) is frequently used as the universal set.
We study various characteristics of sets (or subsets) that are contained in the
universal set and use a Venn diagram to represent them. Figure 6.2a illustrates an
example of Venn diagram that shows a subset A. Naturally, every set contained in the
universal set is its subset. As is often the case, the universal set U is omitted in
Fig. 6.2a accordingly.
$$a \in A$$
and depict it as a point inside the closed curve. To show that b is not contained in A,
we write
$$b \notin A.$$
In that case, we depict b outside the closed curve of Fig. 6.2a. To show that a set
A is contained in another set B, we write
A ⊂ B: ð6:3Þ
A = B: ð6:4Þ
$$\{x;\ x^2 < 0,\ x\in\mathbb{R}\} = \varnothing,\quad \{x;\ x\ne x\} = \varnothing,\quad x\notin\varnothing,\ \text{etc.}$$
The subset A may have different cardinal numbers (or cardinalities) depending on
the nature of A. To explicitly show this, elements are sometimes indexed as, e.g.,
ai (i = 1, 2, ⋯) or aλ (λ 2 Λ), where Λ can be a finite set or infinite set with different
cardinalities; (6.2) is an example. A complement (or complementary set) of A is
defined as a difference U - A and denoted by Ac(U - A). That is, the set Ac is said
to be a complement of A with respect to U and indicated with a shaded area in
Fig. 6.2b.
Let A and B be two different subsets in U (Fig. 6.3). Figure 6.3 represents a sum
(or union, or union of sets) A [ B, an intersection A \ B, and differences A - B and
B - A. More specifically, A - B is defined as a subset of A to which B is subtracted
from A and referred to as a set difference of A relative to B. We have
$$A\cup B = (A - B)\cup(B - A)\cup(A\cap B).\qquad(6.5)$$
$$A\cup A = A.$$
The well-known de Morgan’s law can be understood graphically using Fig. 6.3.
$$u\in(A\cup B)^c \iff u\notin A\cup B \iff u\notin A\ \text{and}\ u\notin B \iff u\in A^c\ \text{and}\ u\in B^c \iff u\in A^c\cap B^c.$$
Furthermore, we have
$$u\in(A\cap B)^c \iff u\notin A\cap B \iff u\notin A\ \text{or}\ u\notin B \iff u\in A^c\ \text{or}\ u\in B^c \iff u\in A^c\cup B^c.$$
$$(A\cup B)\cap C = (A\cap C)\cup(B\cap C),\qquad(6.7)$$
$$(A\cap B)\cup C = (A\cup C)\cap(B\cup C).\qquad(6.8)$$
The confirmation of (6.7) and (6.8) is left for readers. The Venn diagrams are
intuitively acceptable, but we need more rigorous discussion to characterize and
analyze various aspects of sets (vide infra).
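For readers who like a concrete check, de Morgan's law (6.6) and the distributive laws (6.7)–(6.8) can be illustrated with finite sets in Python. The example below is our own; the sets and the universal set are arbitrary.

```python
# Our own finite-set illustration of de Morgan's law and the distributive laws.
U = set(range(10))                  # universal set
A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {2, 4, 6, 8}
comp = lambda S: U - S              # complement with respect to U

print(comp(A | B) == (comp(A) & comp(B)))     # True: (A ∪ B)^c = A^c ∩ B^c
print(comp(A & B) == (comp(A) | comp(B)))     # True: (A ∩ B)^c = A^c ∪ B^c
print(((A | B) & C) == ((A & C) | (B & C)))   # True: (6.7)
print(((A & B) | C) == ((A | C) & (B | C)))   # True: (6.8)
```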
(O2) If $O_1, O_2, \cdots, O_n\in\tau$, then $O_1\cap O_2\cap\cdots\cap O_n\in\tau$.
(O3) If $O_\lambda\ (\lambda\in\Lambda)\in\tau$, then $\bigcup_{\lambda\in\Lambda}O_\lambda\in\tau$.
In the above axioms, T is called an underlying set. The coupled object of T and τ
is said to be a topological space and denoted by (T, τ). The members (i.e., subsets) of
τ are defined as open sets. (If we do not assume the topological space, the underlying
set is equivalent to the universal set. Henceforth, however, we do not have to be too
strict with the difference in this kind of terminology.)
Let (T, τ) be a topological space with a subset O′ ⊂ T. Let τ′ be defined such that
τ0 fO0 \ O; O 2 τg:
Then, (O′, τ′) can be regarded as a topological space. In this case τ′ is called a
relative topology for O′ [2, 3] and O′ is referred to as a subspace of T. The
topological space (O′, τ′) shares the same topological feature as that of (T, τ). Notice
that O′ does not necessarily need to be an open set of T. However, if O′ is an open set
of T, then any open set belonging to O′ is an open set of T as well. This is evident
from the definition of the relative topology and Axiom (O2).
The above-mentioned Axioms (O1) to (O3) sound somewhat pretentious and the
definition of the open sets seems to be “descent from heaven.” Nonetheless, these
axioms and terminology turn out useful soon. Once a topological space (T, τ) is
given, subsets of various types arise and a variety of relationships among them ensue
as well. Note that the above axioms mention nothing about a set that is different from
an open set. Let S be a subset (maybe an open set or maybe not) of T and let us think
of the properties of S.
Definition 6.1 Let (T, τ) be a topological space and S be an open set of τ; i.e., S 2 τ.
Then, a subset defined as a complement of S in (T, τ); namely, Sc = T - S is called a
closed set in (T, τ).
Replacing S with Sc in Definition 6.1, we have (Sc)c = S = T - Sc. The above
definition may be rephrased as follows: Let (T, τ) be a topological space and let A be
a subset of T. Then, if Ac = T - A 2 τ, then A is a closed set in (T, τ).
In parallel with the axioms (O1) to (O3), we have the following axioms related to
the closed sets of (T, τ): Let ~τ be a collection of all the closed sets of (T, τ).
These axioms are obvious from those of (O1) to (O3) along with de Morgan’s law
(6.6).
Next, we classify and characterize a variety of elements (or points) and subsets as
well as relationships among them in terms of the open sets and closed sets. We make
the most of these notions and properties to study various aspects of topological
spaces and their structures. In what follows, we assume the presence of a topological
space (T, τ).
$$a \in O \subset S,$$
With the notion of neighborhoods at the core, we shall see further characterization of
sets and elements of the topological spaces.
Definition 6.3 Let x be a point of T and S be a subset of T. If S is a neighborhood of
x, x is said to be in the interior of S. In this case, x is called an interior point of S. The
interior of S is denoted by S∘.
The interior as a technical term can be understood as follows: Suppose that $x\notin S$. Then, by Definition 6.2, S is not a neighborhood of x. By Definition 6.3, this leads to $x\notin S^{\circ}$. This statement can be translated into $x\in S^c \Rightarrow x\in(S^{\circ})^c$. That is, $S^c\subset(S^{\circ})^c$. This is equivalent to
$$S^{\circ}\subset S.\qquad(6.9)$$
$$S\subset\bar{S}.\qquad(6.10)$$
$$S^{\circ}\subset S\subset\bar{S}.\qquad(6.11)$$
In relation to the interior and closure, we have the following important lemmas.
Lemma 6.1 Let S be a subset of T and O be any open set contained in S. Then, we
have O ⊂ S∘.
Proof Suppose that with an arbitrary point x, x 2 O. Since we have x 2 O ⊂ S, S is a
neighborhood of x (due to Definition 6.2). Then, from Definition 6.3 we have x 2 S∘.
Then, we have x 2 O ) x 2 S∘. This implies O ⊂ S∘. This completes the proof.
Lemma 6.2 Let S be a subset of T. If x 2 S∘, then there is some open set ∃O that
satisfies x 2 O ⊂ S.
Proof Suppose that with a point x, x 2 S∘. Then by Definition 6.3, S is a neighbor-
hood of x. Meanwhile, by Definition 6.2 there is some open set ∃O that satisfies
x 2 O ⊂ S. This completes the proof.
Lemmas 6.1 and 6.2 teach us further implications. From Lemma 6.1 and (6.11)
we have
O ⊂ S∘ ⊂ S ð6:12Þ
with an arbitrary open set 8O contained in S∘. From Lemma 6.2 we have
$x\in S^{\circ} \Rightarrow x\in O$; i.e., $S^{\circ}\subset O$ with some open set $^{\exists}O$ containing $S^{\circ}$. Expressing this specific O as $\tilde{O}$, we have $S^{\circ}\subset\tilde{O}$. Meanwhile, using Lemma 6.1 once again we must get
$$\tilde{O}\subset S^{\circ}\subset S.\qquad(6.13)$$
Fig. 6.5 (a) Largest open set $S^{\circ}$ contained in S; O denotes an open set. (b) Smallest closed set $\bar{S}$ containing S; C denotes a closed set
That is, we have $S^{\circ}\subset\tilde{O}$ and $\tilde{O}\subset S^{\circ}$ at once. This implies that with this specific open set $\tilde{O}$, we get $\tilde{O} = S^{\circ}$. That is,
$$\tilde{O} = S^{\circ}\subset S.\qquad(6.14)$$
Relation (6.14) obviously shows that S∘ is an open set and moreover that S∘ is the
largest open set contained in S. Figure 6.5a depicts this relation. In this context we
have a following important theorem.
Theorem 6.1 Let S be a subset of T. The subset S is open if and only if S = S∘.
Proof From (6.14) S∘ is open. Therefore, if S = S∘, then S is open. Conversely,
suppose that S is open. In that event S itself is an open set contained in S. Meanwhile,
S∘ has been characterized as the largest open set contained in S. Consequently, we
have S ⊂ S∘. At the same time, from (6.11) we have S∘ ⊂ S. Thus, we get S = S∘.
This completes the proof.
In contrast to Lemmas 6.1 and 6.2, we have the following lemmas with respect to
closures.
Lemma 6.3 Let S be a subset of T and C be any closed set containing S. Then, we
have S ⊂ C.
Proof Since C is a closed set, $C^c$ is an open set. Suppose that with a point x, $x\notin C$. Then, $x\in C^c$. Since $C\supset S$, $C^c\subset S^c$. This implies $C^c\cap S = \varnothing$. As $C^c$ is an open set containing x, by Definition 6.2 $C^c$ is a neighborhood of x. Then, by Definition 6.4 x is not in $\bar{S}$; i.e., $x\notin\bar{S}$. This statement can be translated into $x\in C^c \Rightarrow x\in\left(\bar{S}\right)^c$; i.e., $C^c\subset\left(\bar{S}\right)^c$. That is, we get $\bar{S}\subset C$.
Lemma 6.4 Let S be a subset of T and suppose that a point x does not belong to $\bar{S}$; i.e., $x\notin\bar{S}$. Then, $x\notin C$ for some closed set $^{\exists}C$ containing S.
Proof If $x\notin\bar{S}$, then by Definition 6.4 we have some neighborhood $^{\exists}N$ and some open set $^{\exists}O$ contained in that N such that $x\in O\subset N$ and $N\cap S = \varnothing$. Therefore, we
$$S\subset\bar{S}\subset C\qquad(6.15)$$
we have $\tilde{C}\subset\bar{S}$. Meanwhile, using Lemma 6.3 once again we must get
$$S\subset\bar{S}\subset\tilde{C}.\qquad(6.16)$$
That is, we have $\tilde{C}\subset\bar{S}$ and $\bar{S}\subset\tilde{C}$ at once. This implies that with this specific closed set $\tilde{C}$, we get $\tilde{C} = \bar{S}$. Hence, we get
$$S\subset\bar{S} = \tilde{C}.\qquad(6.17)$$
Relation (6.17) obviously shows that S is a closed set and moreover that S is the
smallest closed set containing S. Figure 6.5b depicts this relation. In this context we
have a following important theorem.
Theorem 6.2 Let S be a subset of T. The subset S is closed if and only if $S = \bar{S}$.
Proof The proof can be done analogously to that of Theorem 6.1. From (6.17) $\bar{S}$ is closed. Therefore, if $S = \bar{S}$, then S is closed. Conversely, suppose that S is closed. In that event S itself is a closed set containing S. Meanwhile, $\bar{S}$ has been characterized as the smallest closed set containing S. Then, we have $S\supset\bar{S}$. At the same time, from (6.11) we have $\bar{S}\supset S$. Thus, we get $S = \bar{S}$. This completes the proof.
$$S^b = \bar{S}\cap\overline{S^c}.\qquad(6.18)$$
Thus, we have
$$S^b = \left(S^c\right)^b.\qquad(6.20)$$
This means that S and Sc have the same boundary. Since both S and Sc are closed
sets and the boundary is defined as their intersection, from Axiom (C3) the boundary
is a closed set. From Definition 6.1 a closed set is defined as a complement of an
open set, and vice versa. Thus, the open set and closed set do not make a confron-
tation concept but a complementation concept. In this respect we have the following
lemma and theorem.
Lemma 6.5 Let S be an arbitrary subset of T. Then, we have
S c = ð S∘ Þ c :
position. Thus, we must not have such a, implying that S = S∘. Next, starting with
S ⊃ S and S ≠ S, we assume that there is a such that a 2 S and a 2 = S. That is, we
have a 2 Sc ) a 2 S \ Sc ⊂ S \ Sc = Sb , in contradiction to the supposition.
Thus, we must not have such a, implying that S = S. Combining this relation
with S = S∘ obtained above, we get S = S = S∘ . In other words, S is both open and
closed at once. These complete the proof.
196 6 Theory of Analytic Functions
We have another important class of elements and sets that include accumulation
points and isolated points.
Definition 6.6 Let (T, τ) be a topological space and S be a subset of T. Suppose that
we have a point p 2 T. If for any neighborhood N of p we have N \ (S - {p}) ≠ ∅,
then p is called an accumulation point of S. A set comprising all the accumulation
points of S is said to be a derived set of S and denoted by Sd.
Note that from Definition 6.4 we have
$$S^d = \left(S\cup S^c\right)\cap S^d = \left(S^d\cap S\right)\cup\left(S^d\cap S^c\right),\qquad(6.22)$$
$$\left(S^d\right)^+ \equiv S^d\cap S\quad\text{and}\quad\left(S^d\right)^- \equiv S^d\cap S^c.$$
Then, (6.22) means that $S^d$ is divided into two sets consisting of the accumulation points, i.e., $(S^d)^+$ that belongs to S and $(S^d)^-$ that does not belong to S. Thus, as another direct sum we have
$$S = \left(S^d\cap S\right)\cup S^{dsc} = \left(S^d\right)^+\cup S^{dsc}.\qquad(6.23)$$
Equation (6.23) represents the direct sum consisting of a part of the derived set
and the whole discrete set. Here we have the following theorem.
Theorem 6.4 Let S be a subset of T. Then we have a following equation:
$$\bar{S} \supset S^d\cup S^{dsc}.$$
It is natural to ask how to deal with other points that do not belong to Sd [ Sdsc.
Taking account of the above discussion on the accumulation points and isolated
points, however, such remainder points should again be classified into accumulation
and isolated points. Since all the isolated points are contained in S, they can be taken
into S.
Suppose that among the accumulation points, some points p are not contained in
S; i.e., $p\notin S$. In that case, we have $S - \{p\} = S$, namely $\overline{S - \{p\}} = \bar{S}$. Thus, $p\in\overline{S - \{p\}} \Leftrightarrow p\in\bar{S}$. From (6.21), in turn, this implies that in case p is not contained in S, for p to be an accumulation point of S is equivalent to that p is an adherent point of S. Then, those points p can be taken into $\bar{S}$ as well. Thus, finally, we get (6.24). From Definitions 6.6 and 6.7, obviously we have $S^d\cap S^{dsc} = \varnothing$. This completes the proof.
Rewriting (6.24), we get
$$\bar{S} = \left(S^d\right)^-\cup\left(S^d\right)^+\cup S^{dsc} = \left(S^d\right)^-\cup S.\qquad(6.25)$$
$$\bar{S} = S^d\cup S.\qquad(6.26)$$
Proof Let us assume that p 2 S. Then, we have the following two cases: (i) if p 2 S,
we have trivially p 2 S ⊂ Sd [ S. (ii) Suppose p 2
= S. Since p 2 S, from Definition 6.4
any neighborhood of p contains a point of S (other than p). Then, from Definition
6.6, p is an accumulation point of S. Thus, we have p 2 Sd ⊂ Sd [ S and, hence, get
S ⊂ Sd [ S. Conversely, suppose that p 2 Sd [ S. Then, (i) if p 2 S, obviously we
have p 2 S. (ii) Suppose p 2 Sd. Then, from Definition 6.6 any neighborhood of
p contains a point of S (other than p). Thus, from Definition 6.4, p 2 S. Taking into
account the above two cases, we get Sd [ S ⊂ S. Combining this with S ⊂ Sd [ S
obtained above, we get (6.26). This completes the proof.
Theorem 6.6 Let S be a subset of T. A necessary and sufficient condition for S be a
closed set is that S contains all the accumulation points of S.
Proof From Theorem 6.2, that S is a closed set is equivalent to S = S. From
Theorem 6.5 this is equivalent to S = Sd [ S. Obviously, this is equivalent to
Sd ⊂ S. In other words, S contains all the accumulation points of S. This completes
the proof.
As another proof, taking account of (6.25), we have
$$\left(S^d\right)^- = \varnothing \iff \bar{S} = S \iff S\ \text{is a closed set (from Theorem 6.2)}.$$
This relation is equivalent to the statement that S contains all the accumulation
points of S.
In Fig. 6.6 we show the relationship among S, S, Sd, and Sdsc. From Fig. 6.6, we
can readily show that
$$S^{dsc} = S - S^d = \bar{S} - S^d.$$
$$S^b = \bar{S}\cap\overline{S^c} = \bar{S}\cap\left(S^{\circ}\right)^c = \bar{S} - S^{\circ}.\qquad(6.27)$$
Since $\bar{S}\supset S^{\circ}$, we get
$$\bar{S} = S^{\circ}\cup S^b;\qquad S^{\circ}\cap S^b = \varnothing.$$
$$S^b = \varnothing \iff \bar{S} = S^{\circ} \iff S\ \text{is a clopen set}.$$
(e) Connectedness
In the precedent several paragraphs we studied various building blocks and their
properties of the topological space and sets contained in the space. In this paragraph,
we briefly mention the relationship between the sets. The connectedness is an
important concept in the topology. The concept of the connectedness is associated
with the subspaces defined earlier in this section. In terms of the connectedness, a
topological space can be categorized into two main types; a subspace of the one type
is connected and that of the other is disconnected.
Let A be a subspace of T. Intuitively, for A to be connected is defined as follows:
Let z1 and z2 be any two points of A. If z1 and z2 can be joined by a continuous line
that is contained entirely within A (see Fig. 6.7a), then A is said to be connected. Of
the connected subspaces, suppose that the subspace A has the property that all the
points inside any closed path (i.e., continuous closed line) C within A contain only
the points that belong to A (Fig. 6.7b). Then, that subspace A is said to be simply
connected. A subspace that is not simply connected but connected is said to be
multiply connected. A typical example for the latter is a torus. If a subspace is given
as a union of two (or more) disjoint non-empty (open) sets, that subspace is said to be
disconnected. Notice that if two (or more) sets have no element in common, those
sets are said to be disjoint. Fig. 6.7c shows an example for which a disconnected
subspace A is a union of two disjoint sets B and C.
Interesting examples can be seen in Chap. 20 in relation to the connectedness.
6.1.3 T1-Space
We can “enter” a variety of topologies τ into a set T to get a topological space (T, τ).
We have two extremes of them. One is an indiscrete topology (or trivial topology)
and the other is a discrete topology [2]. For the former the topology is characterized
as
τ = f∅, T g ð6:28Þ
Fig. 6.7 Connectedness of subspaces. (a) Connected subspace A. Any two points z1 and z2 of A can
be connected by a continuous line that is contained entirely within A. (b) Simply connected
subspace A. All the points inside any closed path C within A contain only the points that belong
to A. (c) Disconnected subspace A that comprises two disjoint sets B and C
With τ all the subsets are open and complements to the individual subsets are
closed by Definition 6.1. However, such closed subsets are contained in τ, and so
they are again open. Thus, all the subsets are clopen sets. Practically speaking, the
aforementioned two extremes are of less interest and value. For this reason, we need
some moderate separation conditions with topological spaces. These conditions are
well-established as the separation axioms. We will not get into details about the
discussion [2], but we mention the first separation axiom (Fréchet axiom) that
produces the T1-space.
Definition 6.8 Let (T, τ) be a topological space. Suppose that with respect to
$^{\forall}x, y\ (x\ne y)\in T$ there is a neighborhood N in such a way that
$$x\in N\quad\text{and}\quad y\notin N.$$
The topological space that satisfies the above separation condition is called a
T1-space.
We mention two important theorems of the T1-space.
Theorem 6.7 A necessary and sufficient condition for (T, τ) to be a T1-space is that
for each point x 2 T, {x} is a closed set of T.
Proof
1. Necessary condition: Let (T, τ) be a T1-space. Choose any pair of elements x, y
(x ≠ y) 2 T. Then, from Definition 6.8 there is a neighborhood ∃N of y (≠x) that
does not contain x. Then, we have $y\in N$ and $N\cap\{x\} = \varnothing$. From Definition 6.4 this implies that $y\notin\overline{\{x\}}$. Thus, we have $y\ne x \Rightarrow y\notin\overline{\{x\}}$. This means that $y\in\overline{\{x\}} \Rightarrow y = x$. Namely, $\overline{\{x\}}\subset\{x\}$, but from (6.11) we have $\{x\}\subset\overline{\{x\}}$. Thus, $\overline{\{x\}} = \{x\}$. From Theorem 6.2 this shows that $\{x\}$ is a closed set of T.
2. Sufficient condition: Suppose that $\{x\}$ is a closed set of T. Choose any x and y such that $y\ne x$. Since $\{x\}$ is a closed set, $N = T - \{x\}$ is an open set that contains y; i.e., $y\in T - \{x\}$ and $x\notin T - \{x\}$. Since $y\in T - \{x\}$ where $T - \{x\}$ is an open set, from Definition 6.2, $N = T - \{x\}$ is a neighborhood of y, and moreover N does not contain x. Then, from Definition 6.8, (T, τ) is a T1-space.
These complete the proof.
These complete the proof.
A set comprising a unit element is called a singleton or a unit set.
Theorem 6.8 Let S be a subset of a T1-space. Then S^d (the derived set of S) is a closed set in the T1-space.
More specialized topological spaces can be obtained by imposing stronger constraints upon the T1-space [2]. A
typical example for this is a metric space. In this context, the T1-space has acquired
primitive notions of metric that can be shared with the metric space. Definitions of
the metric (or distance function) and metric space will briefly be summarized in
Chap. 13 in reference to an inner product space.
In the above discussion, we have described a brief outline of the set theory and
topology. We use the results in this chapter and in Part IV as well.
z = x + iy   (6.1)

|z| = \sqrt{x^2 + y^2}.   (6.31)

(Fig. 6.8). In the complex analysis the angular coordinate is called an argument and denoted by arg z so as to be

arg z = θ + 2πn;  n = 0, ±1, ±2, ⋯.

Combining this with Euler's formula (6.33), we get

z = r e^{iθ}.   (6.34)
where the last equality comes from replacing θ with nθ in (6.33). Comparing both
sides with the last equality of (6.35), we have
Equation (6.36) is called the de Moivre’s theorem. This relation holds with
n being integers including zero along with rational numbers including negative
numbers. Euler’s formula (6.33) immediately leads to the following important
formulae:
\cos θ = \frac{1}{2}\left(e^{iθ} + e^{-iθ}\right), \qquad \sin θ = \frac{1}{2i}\left(e^{iθ} - e^{-iθ}\right).   (6.37)
Notice that although in (6.32) and (6.33) we assumed that θ is a real number,
(6.33) and (6.37) hold with any complex number θ. Note also that (6.33) results from
(6.37) and the following definition of power series expansion of the exponential
function
e^z \equiv \sum_{k=0}^{\infty} \frac{z^k}{k!},   (6.38)
where z is any complex number [5]. In fact, from (6.37) and (6.38) we have the
following familiar expressions of power series expansion of the cosine and sine
functions
\cos z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k}}{(2k)!}, \qquad \sin z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k+1}}{(2k+1)!}.
z - w = (x - u) + i(y - v).

|z - w| = \sqrt{(x - u)^2 + (y - v)^2}.   (6.39)

ρ(z, w) \equiv |z - w|,
ρ(z, w) defines a metric (or distance function); see Chap. 13. Thus, the metric gives a
distance between any arbitrarily chosen pair of elements z, w 2 ℂ and (ℂ, ρ) can be
dealt with as a metric space. This makes it easy to view the complex plane as a
topological space. Discussions about the set theory and topology already developed
earlier in this chapter basically hold.
In the theory of analytic functions, a subset A depicted in Fig. 6.7 can be regarded
as a part of the complex plane. Since the complex plane ℂ itself represents a
topological space, the subset A may be viewed as a subspace of ℂ. A complex
function f(z) of a complex variable z of (6.1) is usually defined in a connected open
set called a region [5]. The said region is of practical use among various subspaces
and can be an entire domain of ℂ or a subset of ℂ. The connectedness of the region
can be considered in a manner similar to that described in Sect. 6.1.2.
6.2 Analytic Functions of a Complex Variable
Since the complex number and complex plane have been well-characterized in the
previous section, we deal with the complex functions of a complex variable in the
complex plane.
Taking the complex conjugate of (6.1), we have

z^* = x - iy.   (6.40)

Combining (6.1) and (6.40), we get

(z \;\; z^*) = (x \;\; y) \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}.   (6.41)

The matrix \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix} is non-singular, and so the inverse matrix exists (see Sect. 11.3) and (x  y) can be described as

(x \;\; y) = (z \;\; z^*)\,\frac{1}{2}\begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix}.   (6.42)

That is, we have

x = \frac{1}{2}(z + z^*) \quad \text{and} \quad y = -\frac{i}{2}(z - z^*),   (6.43)
In terms of (6.40)–(6.43), a complex function f(z) is expressed as

f(z) = u(x, y) + iv(x, y),   (6.44)

where both u(x, y) and v(x, y) are real functions of the real variables x and y.
Here, let us pause for a second to consider the differentiation of a function.
Suppose that a mathematical function is defined on a real domain (i.e., a real number
line). Consider whether that function is differentiable at a certain point x0 of the real
number line. On this occasion, we can approach x0 only from two directions, i.e.,
from the side of x < x0 (from the left) or from the side of x0 < x (from the right); see
Fig. 6.10a. Meanwhile, suppose that a mathematical function is defined on a
complex domain (i.e., a complex plane). Also consider whether the function is
differentiable at a certain point z0 of the complex plane. In this case, we can approach
z0 from continuously varying directions; see Fig. 6.10b where only four directions
are depicted.
In this context let us think of a simple example.
Example 6.1 Think of a function f(x, y) described by
f(x, y) = 2x + iy = z + x.   (6.45)

Rewriting (6.45) as a function of z and z^* by use of (6.43), we have

h(z, z^*) = 2 \cdot \frac{1}{2}(z + z^*) + i\left[-\frac{i}{2}(z - z^*)\right] = z + z^* + \frac{1}{2}(z - z^*) = \frac{1}{2}(3z + z^*),
where f(x, y) and h(z, z^*) denote different functional forms of the same quantity. The derivative of (6.45) at the origin varies depending on the way z = 0 is approached. For instance, suppose that the differentiation is taken along a straight line in the complex plane represented by iy = (ik)x (k, x, y: real). Then, we have
\left.\frac{df}{dz}\right|_0 = \lim_{x \to 0,\, y \to 0} \frac{2x^2 + k^2 x^2 - ikx^2}{x^2 + k^2 x^2} = \lim_{x \to 0,\, y \to 0} \frac{2 + k^2 - ik}{1 + k^2} = \frac{2 + k^2 - ik}{1 + k^2}.   (6.46)
However, this means that df/dz|_0 takes varying values depending upon k. Namely, df/dz|_0 cannot uniquely be defined but depends on different ways to approach the origin of the complex plane. Thus, we find that the derivative takes different values depending on straight lines along which the differentiation is taken. This means that f(x, y) is not differentiable or analytic at z = 0.
Meanwhile, think of g(z) expressed as
g(z) = x + iy = z.   (6.47)
As a result, the derivative takes the same value 1, regardless of what straight lines
the differentiation is taken along.
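The following is a minimal numerical sketch (not part of the original text, assuming only standard Python) that reproduces the direction dependence discussed above: the difference quotient of f(x, y) = 2x + iy at the origin depends on the slope k, whereas that of g(z) = z does not.

```python
# Compare the difference quotient f(z)/z as z -> 0 along the lines y = k*x
# for f(x, y) = 2x + iy (= z + x) and for g(z) = z.  f(0) = g(0) = 0, so the
# quotient equals [f(z) - f(0)]/(z - 0).
def quotient(func, k, eps=1e-8):
    z = eps * (1 + 1j * k)                 # a point approaching 0 along y = k*x
    return func(z) / z

f = lambda z: 2 * z.real + 1j * z.imag     # f = 2x + iy
g = lambda z: z                            # g = x + iy = z

for k in (0.0, 1.0, 2.0):
    print(k, quotient(f, k), quotient(g, k))
# quotient(f, k) reproduces (2 + k**2 - 1j*k)/(1 + k**2); quotient(g, k) is always 1.
```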
Though simple, the above example gives us a heuristic method. As before, let f(z) be a function described as (6.44), i.e., f(z) = u(x, y) + iv(x, y), where both u(x, y) and v(x, y) possess the first-order partial derivatives with respect to x and y. Then we have a derivative df(z)/dz defined through the limit of [f(z + Δz) − f(z)]/Δz with Δz = Δx + iΔy. We wish to seek the condition that df(z)/dz gives the same result regardless of the order of taking the limits Δx → 0 and Δy → 0. That is, taking the limit of Δy → 0 first, we have
first, we have
Consequently, by equating the real and imaginary parts of (6.50) and (6.51) we
must have
\frac{\partial u(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y},   (6.52)

\frac{\partial v(x, y)}{\partial x} = -\frac{\partial u(x, y)}{\partial y}.   (6.53)
These relationships (6.52) and (6.53) are called the Cauchy-Riemann conditions.
Differentiating (6.52) with respect to x and (6.53) with respect to y and further
subtracting one from the other, we get
\frac{\partial^2 u(x, y)}{\partial x^2} + \frac{\partial^2 u(x, y)}{\partial y^2} = 0.   (6.54)

Similarly we have

\frac{\partial^2 v(x, y)}{\partial x^2} + \frac{\partial^2 v(x, y)}{\partial y^2} = 0.   (6.55)

\frac{df(z)}{dz} = \frac{\partial f}{\partial x}.   (6.56)

\frac{df(z)}{dz} = -i\frac{\partial f}{\partial y}.   (6.57)

Meanwhile, we have

\frac{\partial}{\partial x} = \frac{\partial z}{\partial x}\frac{\partial}{\partial z} + \frac{\partial z^*}{\partial x}\frac{\partial}{\partial z^*} = \frac{\partial}{\partial z} + \frac{\partial}{\partial z^*}.   (6.58)

Also, we get

\frac{\partial}{\partial y} = \frac{\partial z}{\partial y}\frac{\partial}{\partial z} + \frac{\partial z^*}{\partial y}\frac{\partial}{\partial z^*} = i\left(\frac{\partial}{\partial z} - \frac{\partial}{\partial z^*}\right),   (6.59)
where the change in the functional form is due to the variables transformation.
Hence, from (6.56) and using (6.58) we have
Similarly, we get
\frac{\partial F(z, z^*)}{\partial z^*} = 0.   (6.63)

This clearly shows that if F(z, z^*) is differentiable with respect to z, F(z, z^*) does not depend on z^*, but only depends upon z. Thus, we may write the differentiable function as

F(z, z^*) \equiv f(z).

Conversely, let f(z) = u(x, y) + iv(x, y), where u(x, y) and v(x, y) satisfy the Cauchy-Riemann conditions and possess con-
tinuous first-order partial derivatives with respect to x and y in some region of ℂ.
Then, we have
u(x + Δx, y + Δy) - u(x, y) = \frac{\partial u(x, y)}{\partial x}Δx + \frac{\partial u(x, y)}{\partial y}Δy + ε_1 Δx + δ_1 Δy,   (6.64)

v(x + Δx, y + Δy) - v(x, y) = \frac{\partial v(x, y)}{\partial x}Δx + \frac{\partial v(x, y)}{\partial y}Δy + ε_2 Δx + δ_2 Δy,   (6.65)
where four positive numbers ε1, ε2, δ1, and δ2 can be made arbitrarily small as Δx and
Δy tend to be zero. The relations (6.64) and (6.65) result from the continuity of the
first-order partial derivatives of u(x, y) and v(x, y).
Then, we have
\frac{f(z + Δz) - f(z)}{Δz} = \frac{\partial u(x, y)}{\partial x} + i\frac{\partial v(x, y)}{\partial x} + (ε_1 + iε_2)\frac{Δx}{Δz} + (δ_1 + iδ_2)\frac{Δy}{Δz},   (6.66)
where with the third equality we used the Cauchy-Riemann conditions (6.52) and
(6.53). Accordingly, we have
Notice that (6.69) and (6.70) are equivalent to the Cauchy-Riemann conditions.
From (6.56) and (6.57), the relations (6.69) and (6.70) imply that for f(z) to be
differentiable requires that the first-order partial derivatives of u(x, y) and v(x, y) with
respect to x and y should exist. Meanwhile, once f(z) is found to be differentiable
(or analytic), its higher order derivatives must be analytic as well (vide infra). This
requires, in turn, that the above first-order partial derivatives should be continuous.
(Note that the analyticity naturally leads to continuity.) Thus, the following theorem
will follow.
Theorem 6.9 Let f(z) be a complex function of complex variables z such that
f(z) = u(x, y) + iv(x, y), where x and y are the real variables. Then, a necessary and
sufficient condition for f(z) to be differentiable is that the first-order partial deriva-
tives of u(x, y) and v(x, y) with respect to x and y exist and are continuous and that
u(x, y) and v(x, y) satisfy the Cauchy-Riemann conditions.
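As a small symbolic illustration (added here, not part of the original text; it assumes the sympy library), the Cauchy-Riemann conditions (6.52), (6.53) and the Laplace equations (6.54), (6.55) can be checked for the analytic function f(z) = z², i.e., u = x² − y², v = 2xy.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
u = x**2 - y**2        # real part of z**2
v = 2*x*y              # imaginary part of z**2

print(sp.simplify(sp.diff(u, x) - sp.diff(v, y)))        # 0 -> (6.52)
print(sp.simplify(sp.diff(v, x) + sp.diff(u, y)))        # 0 -> (6.53)
print(sp.simplify(sp.diff(u, x, 2) + sp.diff(u, y, 2)))  # 0 -> (6.54)
print(sp.simplify(sp.diff(v, x, 2) + sp.diff(v, y, 2)))  # 0 -> (6.55)
```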
Now, we formally give definitions of the differentiability and analyticity of a
complex function of complex variables.
Definition 6.9 Let z0 be a given point of the complex plane ℂ. Let f(z) be defined in
a neighborhood containing z0. If f(z) is single-valued and differentiable at all points
of this neighborhood containing z0, f(z) is said to be analytic at z0. A point at which
f(z) is analytic is called a regular point of f(z). A point at which f(z) is not analytic is
called a singular point of f(z).
We emphasize that the above definition of analyticity at some point z0 requires the
single-valuedness and differentiability at all the points of a neighborhood containing
z0. This can be understood by Fig. 6.10b and Example 6.1. The analyticity at a point
is deeply connected to how and in which direction we take the limitation process.
Thus, we need detailed information about the neighborhood of the point in question
to determine whether the function is analytic at that point.
The next definition is associated with a global characteristic of the analytic
function.
Definition 6.10 Let R be a region contained in the complex plane ℂ. Let f(z) be a
complex function defined in R . If f(z) is analytic at all points of R , f(z) is said to be
analytic in R . In this case R is called a domain of analyticity. If the domain of
analyticity is an entire complex plane ℂ, the function is called an entire function.
To understand various characteristics of the analytic functions, it is indispensable
to introduce Cauchy’s integral formula. In the next section we deal with the
integration of complex functions.
6.3 Integration of Analytic Functions: Cauchy's Integral Formula

Let C be a curve in the complex plane parametrized as z = z(t) (t_a ≤ t ≤ t_b), where z_0 ≡ z_a = z(t_a) and z_n ≡ z_b = z(t_b) together with z_i = z(t_i) (0 ≤ i ≤ n). For this parametrization, we assume that C is subdivided into n pieces designated by z_0, z_1, ⋯, z_n.
Let f(z) be a complex function defined in a region containing C. In this situation,
let us assume the following summation Sn:
S_n = \sum_{i=1}^{n} f(ζ_i)(z_i - z_{i-1}),   (6.72)

I = \lim_{n \to \infty} S_n \equiv \int_C f(z)\, dz = \int_{z_a}^{z_b} f(z)\, dz.   (6.74)

I = \int_C \left[u(x, y)\frac{dx}{dt} - v(x, y)\frac{dy}{dt}\right] dt + i\int_C \left[v(x, y)\frac{dx}{dt} + u(x, y)\frac{dy}{dt}\right] dt
= \int_C [u(x, y) + iv(x, y)]\frac{dx}{dt}\, dt + i\int_C [u(x, y) + iv(x, y)]\frac{dy}{dt}\, dt
= \int_C [u(x, y) + iv(x, y)]\left(\frac{dx}{dt} + i\frac{dy}{dt}\right) dt = \int_C [u(x, y) + iv(x, y)]\frac{dz}{dt}\, dt
= \int_C f(z)\frac{dz}{dt}\, dt = \int_{t_a}^{t_b} f(z)\frac{dz}{dt}\, dt = \int_{z_a}^{z_b} f(z)\, dz.   (6.76)
Notice that the contour integral I of (6.74), (6.75), or (6.76) depends in general on
the paths connecting the points za and zb. If so, (6.76) is merely of secondary
importance. However, if f(z) can be described by the derivative of another function,
the situation becomes totally different. Suppose that we have
f(z) = \frac{d\tilde{F}(z)}{dz}.   (6.77)

f(z)\frac{dz}{dt} = \frac{d\tilde{F}(z)}{dz}\frac{dz}{dt} = \frac{d\tilde{F}[z(t)]}{dt}.   (6.78)

\int_C f(z)\frac{dz}{dt}\, dt = \int_{z_a}^{z_b} f(z)\, dz = \int_{t_a}^{t_b} \frac{d\tilde{F}[z(t)]}{dt}\, dt = \tilde{F}(z_b) - \tilde{F}(z_a).   (6.79)
This implies that the contour integral I does not depend on the path C connecting
the points za and zb. We must be careful, however, about a situation where there
would be a singular point zs of f(z) in a region R . Then, f(z) is not defined at zs. If one
chooses C′′ for a contour of the return path in such a way that zs is on C′′ (see
Fig. 6.11), RHS of (6.80) cannot exist, nor can (6.80) hold. To avoid that situation,
we have to “expel” such singularity from the region R so that R can wholly be
contained in the domain of analyticity of f(z) and that f(z) can be analytic in the entire
region R .
Moreover, we must choose R so that the whole region inside any closed paths
within R can contain only the points that belong to the domain of analyticity of f(z).
That is, the region R must be simply connected in terms of the analyticity of f(z) (see
Sect. 6.1). For a region R to be simply connected ensures that (6.80) holds with any
pair of points za and zb in R , so far as the integration path with respect to (6.80) is
contained in R .
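As a numerical sketch of the path independence stated above (an illustration added here, assuming numpy is available), the contour integral of the analytic function f(z) = z² from z_a = 0 to z_b = 1 + i can be evaluated along two different paths and compared with F̃(z_b) − F̃(z_a), F̃(z) = z³/3, as in (6.79).

```python
import numpy as np

def contour_integral(f, path, n=4000):
    t = np.linspace(0.0, 1.0, n)
    z = path(t)
    dz = np.diff(z)
    zm = 0.5 * (z[1:] + z[:-1])            # midpoint of each small segment
    return np.sum(f(zm) * dz)              # Riemann sum approximating the contour integral

f = lambda z: z**2
path1 = lambda t: t + 1j*t                         # straight line from 0 to 1+i
path2 = lambda t: t**2 + 1j*np.sin(np.pi*t/2.0)    # a curved path with the same endpoints

F = lambda z: z**3/3.0
print(contour_integral(f, path1), contour_integral(f, path2), F(1+1j) - F(0))
# All three values coincide (up to discretization error), because f is analytic.
```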
From (6.80), we further get
\int_{z_a}^{z_b} f(z)\, dz + \int_{z_b}^{z_a} f(z)\, dz = 0.   (6.81)

Since the integral does not depend on the paths, we can take a path from z_b to z_a along a curve C′ (see Fig. 6.11). Thus, we have

\oint_{\tilde{C}} f(z)\, dz = 0,   (6.83)

\oint_C f(z)\, dz = 0.   (6.84)
There is a variation in description of Cauchy’s integral theorem. For instance, let f(z)
be analytic in R except for a singular point at zs; see Fig. 6.12a. Then, the domain of
analyticity of f(z) is not identical to R , but is defined as R - fzs g. That is, the domain
of analyticity of f(z) is no longer simply connected and, hence, (6.84) is not generally
Fig. 6.12 Region R and contour C for integration. (a) A singular point at z_s is contained within R. (b) The singular point at z_s is not contained within R̃ so that f(z) can be analytic in a simply connected region R̃. Regarding the symbols and notations, see text
true. In such a case, we may deform the integration path of f(z) so that its domain of
analyticity can be simply connected and that (6.84) can hold. For example, we take
R~ so that it can be surrounded by curves C and Γ ~ as well as lines L1 and L2
(Fig. 6.12b). As a result, zs has been expelled from R~ and, at the same time, R~
becomes simply connected. Then, as a tangible form of (6.84) we get
\oint_{C + \tilde{Γ} + L_1 + L_2} f(z)\, dz = 0,   (6.85)

where \tilde{Γ} denotes the clockwise integration (see Fig. 6.12b) and the lines L_1 and L_2 are rendered as closely as possible. In (6.85) the integrations with L_1 and L_2 cancel, because they are taken in the reverse direction. Thus, we have

\oint_C f(z)\, dz = \sum_i \oint_{Γ_i} f(z)\, dz,   (6.87)

where Γ_i is taken so that it can encircle the individual singular points. Reflecting the nature of singularity, we get different results with (6.87). We will come back to this point later.
On the basis of Cauchy’s integral theorem, we show an integral representation of
an analytic function that is well-known as Cauchy’s integral formula.
Theorem 6.11: Cauchy’s Integral Formula Let f(z) be analytic in a simply
connected region R . Let C be an arbitrary closed curve within R that encircles z.
Then, we have
f(z) = \frac{1}{2\pi i}\oint_C \frac{f(ζ)}{ζ - z}\, dζ,   (6.88)

\oint_C \frac{f(ζ)}{ζ - z}\, dζ = \oint_Γ \frac{f(ζ)}{ζ - z}\, dζ = f(z)\oint_Γ \frac{dζ}{ζ - z} + \oint_Γ \frac{f(ζ) - f(z)}{ζ - z}\, dζ.   (6.89)

ζ = z + ρ e^{iθ}.   (6.90)

\oint_Γ \frac{dζ}{ζ - z} = \int_0^{2\pi} i\, dθ = 2\pi i.   (6.92)

\oint_Γ \frac{f(ζ)}{ζ - z}\, dζ = 2\pi i\, f(z) + \oint_Γ \frac{f(ζ) - f(z)}{ζ - z}\, dζ.   (6.93)

\left|\frac{1}{2\pi i}\oint_C \frac{f(ζ)}{ζ - z}\, dζ - f(z)\right| = \left|\frac{1}{2\pi i}\oint_Γ \frac{f(ζ)}{ζ - z}\, dζ - f(z)\right| = \left|\frac{1}{2\pi i}\oint_Γ \frac{f(ζ) - f(z)}{ζ - z}\, dζ\right|
\le \frac{1}{2\pi}\oint_Γ \frac{|f(ζ) - f(z)|}{|ζ - z|}\, |dζ| < \frac{ε}{2\pi}\oint_Γ \frac{|dζ|}{|ζ - z|} = \frac{ε}{2\pi}\int_0^{2\pi} dθ = ε.   (6.94)
In the above proof, the domain of analyticity R' of f(ζ)/(ζ − z) (with respect to ζ) is given by R' = R ∩ [ℂ − {z}] = R − {z}. Since both R and ℂ − {z} are open sets (i.e., ℂ − {z} = {z}^c is a complementary set of a closed set {z} and, hence, an open set), R' is an open set as well; see Sect. 6.1. Notice that R' is not simply connected. For this reason, we had to evaluate (6.88) using (6.89).
Theorem 6.10 (Cauchy’s integral theorem) and Theorem 6.11 (Cauchy’s integral
formula) play a crucial role in the theory of analytic functions.
To derive (6.94), we can equally use the Darboux’s inequality [5]. This is
intuitively obvious and frequently used in the theory of analytic functions.
Theorem 6.12: Darboux’s Inequality [5] Let f(z) be a function for which jf(z)j is
bounded on C. Here C is a piecewise continuous path in the complex plane. Then,
with the following integral I described by
I = \int_C f(z)\, dz,

we have

|I| \le \max_C |f(z)| \cdot L,

where L represents the arc length of the curve C for the contour integration.
Proof As discussed earlier in this section, the integral is the limit of n → 1 of the
sum described by
S_n = \sum_{i=1}^{n} f(ζ_i)(z_i - z_{i-1})   (6.72)

with

I = \lim_{n \to \infty} S_n = \lim_{n \to \infty} \sum_{i=1}^{n} f(ζ_i)(z_i - z_{i-1}).   (6.73)

Since |S_n| ≤ Σ_i |f(ζ_i)||z_i − z_{i-1}| and the sum of the chord lengths does not exceed the arc length L, we have

|S_n| \le \max|f| \cdot L.
That is,

\left|\frac{1}{2\pi i}\oint_Γ \frac{f(ζ) - f(z)}{ζ - z}\, dζ\right| < ε.   (6.97)

\tilde{f}(z) = \frac{1}{2\pi i}\oint_C \frac{f(ζ)}{ζ - z}\, dζ.   (6.98)

Δ \equiv \frac{\tilde{f}(z + Δz) - \tilde{f}(z)}{Δz} - \frac{1}{2\pi i}\oint_C \frac{f(ζ)}{(ζ - z)^2}\, dζ = \frac{1}{2\pi i\,Δz}\oint_C \frac{f(ζ)(Δz)^2}{(ζ - z - Δz)(ζ - z)^2}\, dζ,   (6.99)

|Δ| = \frac{|Δz|}{2\pi}\left|\oint_C \frac{f(ζ)}{(ζ - z - Δz)(ζ - z)^2}\, dζ\right|.   (6.100)

\frac{d\tilde{f}(z)}{dz} = \frac{1}{2\pi i}\oint_C \frac{f(ζ)}{(ζ - z)^2}\, dζ.   (6.101)

\tilde{f}(z) = \frac{1}{2\pi i}\oint_C \frac{f(ζ)}{ζ - z}\, dζ.   (6.102)

f(z) = \frac{1}{2\pi i}\oint_C \frac{f(ζ)}{ζ - z}\, dζ.   (6.103)

\frac{df(z)}{dz} = \frac{1}{2\pi i}\oint_C \frac{f(ζ)}{(ζ - z)^2}\, dζ.   (6.105)
An analogous result holds with the n-th derivative of f(z) such that [5]
\frac{d^n f(z)}{dz^n} = \frac{n!}{2\pi i}\oint_C \frac{f(ζ)}{(ζ - z)^{n+1}}\, dζ \quad (n: \text{zero or positive integers}).   (6.106)
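The next short numerical sketch (an illustration added here, assuming numpy) checks (6.88) and (6.106) for the entire function f(z) = e^z on a circle C of radius 2 centered at the origin: since every derivative of e^z equals e^z, all the reconstructed values should coincide with e^{z_0}.

```python
import numpy as np
from math import factorial

f = np.exp
z0 = 0.3 + 0.4j
theta = np.linspace(0.0, 2*np.pi, 20001)
zeta = 2.0*np.exp(1j*theta)          # points on the circle C
dz = np.diff(zeta)
zm = 0.5*(zeta[1:] + zeta[:-1])      # midpoints used in a simple Riemann sum

for n in range(4):                   # n = 0 reproduces (6.88); n >= 1 reproduces (6.106)
    integral = np.sum(f(zm)/(zm - z0)**(n + 1) * dz)
    approx = factorial(n)/(2j*np.pi) * integral
    print(n, approx, np.exp(z0))
```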
\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{f(ζ)}{(ζ - z)^2}\, dζ.

Since f(z) is an entire function, we can arbitrarily choose a large enough circle of radius R centered at z for a closed contour C. On the circle, we have

ζ = z + R e^{iθ},

where θ is a real number changing from 0 to 2π. Then, the above equation can be rewritten as

\frac{df}{dz} = \frac{1}{2\pi i}\int_0^{2\pi} \frac{f(ζ)}{(Re^{iθ})^2}\, iRe^{iθ}\, dθ = \frac{1}{2\pi R}\int_0^{2\pi} \frac{f(ζ)}{e^{iθ}}\, dθ.   (6.107)

\left|\frac{df}{dz}\right| \le \frac{1}{2\pi R}\int_0^{2\pi} |f(ζ)|\, dθ \le \frac{1}{2\pi R}\int_0^{2\pi} M\, dθ = \frac{M}{R},   (6.108)
sin(π/2 + ia) takes real values with sin(π/2 + ia) → ∞ as a → ∞. This simple example clearly shows that sin φ is an unbounded entire function in the complex domain. As a very familiar example of a bounded entire function, we show
Using Cauchy’s integral formula, we study Taylor’s series and Laurent’s series in
relation to the power series expansion of analytic function. Taylor’s series and
Laurent’s series are fundamental tools to study various properties of complex
functions in the theory of analytic functions. First, let us examine Taylor’s series
of an analytic function.
Theorem 6.14 [6] Any analytic function f(z) can be expressed by uniformly
convergent power series at an arbitrary regular point of the function in the domain
of analyticity R .
Proof Let us assume that we have a circle C of radius r centered at a within R .
Choose z inside C with |z - a| = ρ (ρ < r) and consider the contour integration of
f(z) along C (Fig. 6.14). Then, we have
\frac{1}{ζ - z} = \frac{1}{ζ - a - (z - a)} = \frac{1}{ζ - a}\cdot\frac{1}{1 - \frac{z - a}{ζ - a}} = \frac{1}{ζ - a} + \frac{z - a}{(ζ - a)^2} + \frac{(z - a)^2}{(ζ - a)^3} + \cdots,   (6.110)
where ζ is an arbitrary point on the circle C. To derive (6.110), we used the following
formula:
\frac{1}{1 - x} = \sum_{n=0}^{\infty} x^n,

where x = \frac{z - a}{ζ - a}. Since \left|\frac{z - a}{ζ - a}\right| = \frac{ρ}{r} < 1, the geometric series of (6.110) converges.
Suppose that f(ζ) is analytic on C. Then, we have a finite positive number M on
C such that
|f(ζ)| < M.   (6.111)

\frac{f(ζ)}{ζ - z} = \frac{f(ζ)}{ζ - a} + \frac{(z - a) f(ζ)}{(ζ - a)^2} + \frac{(z - a)^2 f(ζ)}{(ζ - a)^3} + \cdots.   (6.112)

\left|\sum_{ν=n}^{\infty} \frac{(z - a)^ν f(ζ)}{(ζ - a)^{ν+1}}\right| \le \frac{M}{r}\sum_{ν=n}^{\infty}\left(\frac{ρ}{r}\right)^ν = \frac{M}{r}\left(\frac{ρ}{r}\right)^n \frac{1}{1 - \frac{ρ}{r}} = \frac{M}{r - ρ}\left(\frac{ρ}{r}\right)^n.   (6.113)

\frac{1}{2\pi i}\oint_C \frac{f(ζ)}{ζ - z}\, dζ = \sum_{n=0}^{\infty} \frac{1}{2\pi i}\oint_C \frac{(z - a)^n f(ζ)}{(ζ - a)^{n+1}}\, dζ.   (6.114)

A_n \equiv \frac{1}{2\pi i}\oint_C \frac{f(ζ)}{(ζ - a)^{n+1}}\, dζ,   (6.115)
we get
6.4 Taylor’s Series and Laurent’s Series 225
1 n
f ðzÞ = A ð z - aÞ
n=0 n
: ð6:116Þ
1 f ðζ Þ 1 d n f ðzÞ
dζ = : ð6:117Þ
2πi C ðζ - zÞ
nþ1 n! dzn
1 f ðζ Þ 1 d n f ðzÞ 1 d n f ð aÞ
An = dζ = = : ð6:118Þ
2πi C ðζ - aÞ
nþ1 n! dzn n! dzn
z=a
1 1 d n f ð aÞ
f ðzÞ = ð z - aÞ n :
n = 0 n! dzn
This is the same form as that obtained with respect to a real variable.
Equation (6.116) is a uniformly convergent power series called a Taylor’s series
or Taylor’s expansion of f(z) with respect to z = a with An being its coefficient. In
Theorem 6.14 we have assumed that a union of a circle C and its inside is simply
connected. This topological environment is virtually the same as that of Fig. 6.13. In
Theorem 6.11 (Cauchy’s integral formula) we have shown the integral representa-
tion of an analytic function. On the basis of Cauchy’s integral formula, Theorem
6.14 demonstrates that any analytic function is given a tangible functional form.
Example 6.2 Let us think of a function f(z) = 1/z. This function is analytic except
for z = 0. Therefore, it can be expanded in the Taylor’s series in the region ℂ - {0}.
We consider the expansion around z = a (a ≠ 0). Using (6.115), we get the
expansion coefficients that are expressed as
A_n = \frac{1}{2\pi i}\oint_C \frac{1}{z(z - a)^{n+1}}\, dz,

where C is a closed curve that does not contain z = 0 on C or in its inside. Therefore, 1/z is analytic in the simply connected region encircled with C so that the point z = 0 may not be contained inside C (Fig. 6.15).
Then, from (6.106) we get

\frac{1}{2\pi i}\oint_C \frac{1}{z(z - a)^{n+1}}\, dz = \frac{1}{n!}\left.\frac{d^n (1/z)}{dz^n}\right|_{z=a} = \frac{1}{n!}(-1)^n n!\,\frac{1}{a^{n+1}} = \frac{(-1)^n}{a^{n+1}}.

Thus, we obtain

f(z) = \frac{1}{z} = \sum_{n=0}^{\infty} \frac{(-1)^n}{a^{n+1}}(z - a)^n = \frac{1}{a}\sum_{n=0}^{\infty}\left(1 - \frac{z}{a}\right)^n.   (6.119)
The series uniformly converges within a convergence circle called a “ball” [7]
(vide infra) whose convergence radius r is estimated to be
\frac{1}{r} = \limsup_{n \to \infty} \sqrt[n]{\left|\frac{(-1)^n}{a^{n+1}}\right|} = \lim_{n \to \infty} \frac{1}{|a|^{(n+1)/n}} = \frac{1}{|a|}.   (6.120)

Correspondingly, the region of convergence is expressed as

R = \{z;\; z, z_0 \in ℂ,\; ρ(z, z_0) < r\}.
6.4 Taylor’s Series and Laurent’s Series 227
The region R is an open set by definition (see Sect. 6.1) and the Taylor's series defined in R is convergent. On the other hand, the Taylor's series may or may not be convergent at z that is on the convergence circle. In Example 6.2, f(z) = 1/z is not defined at z = 0, even though z (= 0) is on the convergence circle. At other points on the convergence circle, however, the Taylor's series is convergent.
Note that a region B_n in ℝ^n similarly defined as

B_n = \{x;\; x, x_0 \in ℝ^n,\; ρ(x, x_0) < r\}
is sometimes called a ball or open ball. This concept can readily be extended to any
metric space.
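A small numerical sketch (an illustration added here, assuming numpy) of Example 6.2: partial sums of the Taylor's series (6.119) of 1/z around a = 1 + i converge inside the convergence circle of radius |a|, and the coefficients A_n reproduce that radius as in (6.120).

```python
import numpy as np

a = 1.0 + 1.0j
z = a + 0.8*abs(a)*np.exp(0.3j)          # a point inside the convergence circle |z - a| < |a|
N = 60
n = np.arange(N)
A = (-1.0)**n / a**(n + 1)               # Taylor coefficients from (6.118)/(6.119)

partial = np.sum(A * (z - a)**n)
print(partial, 1.0/z)                    # the partial sum agrees closely with 1/z
print(1.0/np.abs(A[-1])**(1.0/N))        # ~ |a| = sqrt(2), the convergence radius (6.120)
```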
Next, we examine Laurent’s series of an analytic function.
Theorem 6.15 [6] Let f(z) be analytic in a region R except for a point a. Then, f(z)
can be described by the uniformly convergent power series within a region R - fag.
Proof Let us assume that we have a circle C1 of radius r1 centered at a and another
circle C2 of radius r2 centered at a both within R . Moreover, let another circle Γ be
centered at z (≠a) within R so that z can be outside of C2; see Fig. 6.16.
In this situation we consider the contour integration along C1, C2, and Γ. Using
(6.87), we have
\frac{1}{2\pi i}\oint_Γ \frac{f(ζ)}{ζ - z}\, dζ = \frac{1}{2\pi i}\oint_{C_1} \frac{f(ζ)}{ζ - z}\, dζ - \frac{1}{2\pi i}\oint_{C_2} \frac{f(ζ)}{ζ - z}\, dζ.   (6.121)
Notice that the region containing Γ and its inside form a simply connected region
in terms of the analyticity of f(z). With the first term of RHS of (6.121), from
Theorem 6.14 we have
\frac{1}{2\pi i}\oint_{C_1} \frac{f(ζ)}{ζ - z}\, dζ = \sum_{n=0}^{\infty} A_n (z - a)^n   (6.122)

with

A_n = \frac{1}{2\pi i}\oint_{C_1} \frac{f(ζ)}{(ζ - a)^{n+1}}\, dζ \quad (n = 0, 1, 2, \cdots).   (6.123)
In fact, if ζ lies on the circle C1, the Taylor’s series is uniformly convergent as in
the case of the proof of Theorem 6.14.
With the second term of RHS of (6.121), however, the situation is different in
such a way that z lies outside the circle C2. In this case, from Fig. 6.16 we have
\left|\frac{ζ - a}{z - a}\right| = \frac{r_2}{ρ} < 1.   (6.124)

\frac{1}{ζ - z} = \frac{1}{ζ - a - (z - a)} = \frac{-1}{z - a}\left[1 + \frac{ζ - a}{z - a} + \frac{(ζ - a)^2}{(z - a)^2} + \cdots\right],   (6.125)

-\oint_{C_2} \frac{f(ζ)}{ζ - z}\, dζ = \frac{1}{z - a}\oint_{C_2} f(ζ)\, dζ + \frac{1}{(z - a)^2}\oint_{C_2} f(ζ)(ζ - a)\, dζ + \frac{1}{(z - a)^3}\oint_{C_2} f(ζ)(ζ - a)^2\, dζ + \cdots.   (6.126)

-\frac{1}{2\pi i}\oint_{C_2} \frac{f(ζ)}{ζ - z}\, dζ = \sum_{n=1}^{\infty} A_{-n}(z - a)^{-n},   (6.127)

where

A_{-n} = \frac{1}{2\pi i}\oint_{C_2} f(ζ)(ζ - a)^{n-1}\, dζ \quad (n = 1, 2, 3, \cdots).   (6.128)
A_n = \frac{1}{2\pi i}\oint_{\tilde{C}} \frac{f(ζ)}{(ζ - a)^{n+1}}\, dζ \quad (n = 0, \pm 1, \pm 2, \cdots)   (6.130)
but
\left.\frac{d^n f(z)}{dz^n}\right|_{z=a} \ne 0,   (6.132)

f(z) = (z - a)^n \sum_{k=0}^{\infty} A_{n+k}(z - a)^k = (z - a)^n h(z).
Then, h(z) is analytic and non-vanishing at z = a. From the analyticity h(z) must
be continuous at z = a and differ from zero in some finite neighborhood of z = a.
Consequently, it is also the case with f(z). Therefore, if the set of zeros had an
accumulation point at z = a, any neighborhood of z = a would contain another zero,
in contradiction to the above assumption. To avoid this contradiction, the analytic
function f(z) 0 throughout the region R . Taking the contraposition of the above
statement, if an analytic function is not identically zero [i.e., f(z) ≢ 0], the zeros of
that function are isolated; see the relevant discussion of Sect. 6.1.2.
Meanwhile, the Laurent’s series (6.129) can be rewritten as
f(z) = \sum_{k=0}^{\infty} A_k (z - a)^k + \sum_{k=1}^{\infty} A_{-k}\frac{1}{(z - a)^k}   (6.134)
so that the constitution of the Laurent’s series can explicitly be recognized. The
second term is called a singular part (or principal part) of f(z) at z = a. The singularity
is classified as follows:
(1) Removable singularity:
If (6.134) lacks the singular part (i.e., A-k = 0 for k = 1, 2, ⋯), the singular point
is said to be a removable singularity. In this case, we have
f(z) = \sum_{k=0}^{\infty} A_k (z - a)^k.   (6.135)

Defining f(a) \equiv A_0, f(z) can be regarded as analytic at z = a as well. Examples include the following case where the sine function is expanded as the Taylor's series:

\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots + \frac{(-1)^{n-1} z^{2n-1}}{(2n-1)!} + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n-1} z^{2n-1}}{(2n-1)!}.   (6.136)

Hence, if we define

f(z) \equiv \frac{\sin z}{z} \quad \text{and} \quad f(0) \equiv \lim_{z \to 0} \frac{\sin z}{z} = 1,   (6.137)
Then, the function f(z) is said to have a pole of order n at z = a. If, in particular,
n = 1 in (6.138), i.e.,
the function f(z) is said to have a simple pole. This special case is important in the
calculation of various integrals (vide infra).
In the general cases for (6.138), (6.134) can be rewritten as
f(z) = \frac{g(z)}{(z - a)^n},   (6.140)

f(z) = \sum_{k=0}^{\infty} A_k (z - a)^k + \sum_{k=1}^{n} A_{-k}\frac{1}{(z - a)^k}.
When we discussed the Cauchy’s integral formula (Theorem 6.11), we have known
that if a function is analytic in a certain region of ℂ and on a curve C that encircles the
region, the values of the function within the region are determined once the values of
the function on C are given. We have the following theorem for this.
Theorem 6.16 Let f1(z) and f2(z) be two functions that are analytic within a region
R . Suppose that the two functions coincide in a set chosen from among (i) a
neighborhood of a point z 2 R , or (ii) a segment of a curve lying in R , or (iii) a
set of points containing an accumulation point belonging to R . Then, the two
functions f1(z) and f2(z) coincide throughout R .
Proof Suppose that f1(z) and f2(z) coincide on the above set chosen from the three.
Then, since f1(z) - f2(z) = 0 on that set, the set comprises the zeros of f1(z) - f2(z).
This implies that the zeros are not isolated in R . Thus, as evidenced from the
discussion of Sect. 6.5, we conclude that f1(z) − f2(z) ≡ 0 throughout R. That is, f1(z) ≡ f2(z). This completes the proof.
\frac{1}{z} = \sum_{n=0}^{\infty} \frac{(-1)^n}{a^{n+1}}(z - a)^n = \frac{1}{a}\sum_{n=0}^{\infty}\left(1 - \frac{z}{a}\right)^n.   (6.119)
Thus, we obtain the same functional entity throughout R except for the origin
(z = 0). Further continuing similar procedures, we finally reach the entire complex
plane except for the origin, i.e., ℂ - {0} regarding the domain of analyticity of 1/z.
We further compare the above results with those for an analytic function defined in the real domain. The Taylor's expansion of f(x) = 1/x (x: real with x ≠ 0) around a (a ≠ 0) reads as

f(x) = \frac{1}{x} = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n}{a^{n+1}}(x - a)^n.   (6.142)
This is exactly the same form as that of (6.119) aside from notation of the
argument.
The aforementioned procedure that determines the behavior of an analytic func-
tion outside the region where that function has been originally defined is called
analytic continuation or analytic prolongation. Comparing (6.119) and (6.142), we
can see that (6.119) is a consequence of the analytic continuation of 1/x from the real
domain to the complex domain.
6.7 Calculus of Residues

The theorem of residues and its application to the calculus of various integrals is one of the central themes of the theory of analytic functions.
The Cauchy’s integral theorem (Theorem 6.10) tells us that if f(z) is analytic in a
simply connected region R , we have
\oint_C f(z)\, dz = 0,   (6.84)
where C is a closed curve within R . As we have already seen, this is not necessarily
the case if f(z) has singular points in R . Meanwhile, if we look at (6.130), we are
aware of a simple but important fact. That is, replacing n with 1, we have
1
A-1 = f ðζ Þdζ or f ðζ Þdζ = 2πiA - 1 : ð6:143Þ
2πi C C
This implies that if we could obtain the Laurent's series of f(z) with respect to a singularity located at z = a and encircled by C, we can immediately estimate \oint_C f(ζ)\, dζ of (6.143) from the coefficient of the term (z − a)^{-1}, i.e., A_{-1}. This fact further implies that even though f(z) has singularities, \oint_C f(ζ)\, dζ may happen to vanish. Thus, whether (6.84) vanishes depends on the nature of f(z) and its singularities. To examine it, we define a residue of f(z) at z = a (that is encircled by C) as

\mathrm{Res}\, f(a) \equiv \frac{1}{2\pi i}\oint_C f(z)\, dz \quad \text{or} \quad \oint_C f(z)\, dz = 2\pi i\, \mathrm{Res}\, f(a),   (6.144)

where Res f(a) denotes the residue of f(z) at z = a. From (6.143) and (6.144), we have
In a formal sense, the point a does not have to be a singular point. If it is a regular point of f(z), we have Res f(a) = 0 trivially. If there is more than one singularity at z = a_j, we have

\oint_C f(z)\, dz = 2\pi i \sum_j \mathrm{Res}\, f(a_j).   (6.146)
Notice that we assume the isolated singularities with (6.144) and (6.146). These
equations are associated with Cases (1) and (2) dealt with in Sect. 6.5. Within this
framework, we wish to evaluate the residue of f(z) at a pole of order n located at
z = a. In Sect. 6.5, we had f(z) described by
f(z) = \frac{g(z)}{(z - a)^n},   (6.140)

\mathrm{Res}\, f(a) = \frac{1}{2\pi i}\oint_C \frac{g(ζ)}{(ζ - a)^n}\, dζ = \frac{1}{(n-1)!}\left.\frac{d^{n-1} g(z)}{dz^{n-1}}\right|_{z=a} = \frac{1}{(n-1)!}\left.\frac{d^{n-1}[f(z)(z - a)^n]}{dz^{n-1}}\right|_{z=a},   (6.147)

In particular, for a simple pole (n = 1) we have

\mathrm{Res}\, f(a) = \frac{1}{2\pi i}\oint_C f(z)\, dz = \frac{1}{2\pi i}\oint_C \frac{g(z)}{z - a}\, dz = f(z)(z - a)\big|_{z=a} = g(z)\big|_{z=a}.   (6.148)
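As a brief symbolic illustration (added here, assuming the sympy library), the residue of a pole of order n can be computed both with sympy's built-in residue() and directly from the derivative formula above; the function exp(z)/(z − 1)³, with a pole of order 3 at z = 1, is used as a hypothetical test case.

```python
import sympy as sp

z = sp.symbols('z')
f = sp.exp(z)/(z - 1)**3
n = 3                                             # order of the pole at z = 1

res_builtin = sp.residue(f, z, 1)
res_formula = sp.diff(f*(z - 1)**n, z, n - 1).subs(z, 1)/sp.factorial(n - 1)
print(res_builtin, sp.simplify(res_formula))      # both give E/2
```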
Then, let us calculate the contour integral of f(z) on a closed circle Γ of radius ρ
centered at z = a. We have
I \equiv \oint_Γ f(ζ)\, dζ = \sum_{n=-\infty}^{\infty} A_n \oint_Γ (ζ - a)^n\, dζ = \sum_{n=-\infty}^{\infty} iA_n ρ^{n+1}\int_0^{2\pi} e^{i(n+1)θ}\, dθ.   (6.149)

Since the θ-integral on the RHS vanishes except for n = −1, we get

I = 2\pi i A_{-1}.
To this end, we convert the real variable x to the complex variable z and evaluate a contour integral I_C (Fig. 6.18) described by

I_C = \int_{-R}^{R} \frac{dz}{1 + z^2} + \int_{Γ_R} \frac{dz}{1 + z^2}.   (6.151)

Within the contour C the function f(z) \equiv \frac{1}{1 + z^2} has a simple pole at z = i, the residue of which is calculated using (6.148) to be

\mathrm{Res}\, f(i) = f(z)(z - i)\big|_{z=i} = \frac{1}{2i}.   (6.153)
To estimate the integral (6.150), we must calculate the second term of (6.151). To
this end, we change the variable z such that
z = R e^{iθ},   (6.154)

where θ changes from 0 to π on Γ_R. Then, we have

\left|\int_{Γ_R} \frac{dz}{1 + z^2}\right| = \left|\int_0^{\pi} \frac{iRe^{iθ}\, dθ}{R^2 e^{2iθ} + 1}\right| \le \int_0^{\pi} \frac{|iRe^{iθ}|}{|R^2 e^{2iθ} + 1|}\, dθ = \int_0^{\pi} \frac{R\, dθ}{\sqrt{(R^2 e^{2iθ} + 1)(R^2 e^{-2iθ} + 1)}}
= R\int_0^{\pi} \frac{dθ}{\sqrt{R^4 + 2R^2\cos 2θ + 1}} \le R\,\frac{\pi}{R^2\left(1 - \frac{1}{R^2}\right)} = \frac{\pi}{R\left(1 - \frac{1}{R^2}\right)}.
Taking R → ∞, we have

\int_{Γ_R} \frac{dz}{1 + z^2} \to 0.   (6.155)

Consequently, if R → ∞, we have

\lim_{R \to \infty} I_C = \int_{-\infty}^{\infty} \frac{dx}{1 + x^2} + \lim_{R \to \infty}\int_{Γ_R} \frac{dz}{1 + z^2} = \int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = I = 2\pi i\, \mathrm{Res}\, f(i) = \pi,   (6.156)
where with the first equality we placed the variable back to x; with the last equality
we used (6.152) and (6.153). Equation (6.156) gives an answer to (6.150). Notice in
general that if a meromorphic function is given as a quotient of two polynomials, an
integral of a type of (6.155) tends to be zero with R → 1 in the case where the
degree of the denominator of that function is at least two units higher than the degree
of the numerator [8].
We may equally choose another contour C ~ (the closed curve comprising the
interval [-R, R] and the lower semicircle Γ~R ) for the integration (Fig. 6.18). In that
case, we have
\lim_{R \to \infty} I_{\tilde{C}} = \int_{\infty}^{-\infty} \frac{dx}{1 + x^2} + \lim_{R \to \infty}\int_{\tilde{Γ}_R} \frac{dz}{1 + z^2} = \int_{\infty}^{-\infty} \frac{dx}{1 + x^2} = 2\pi i\, \mathrm{Res}\, f(-i) = -\pi.   (6.157)
Note that in the above equation the contour integral is taken in the counter-clockwise direction and, hence, that the real definite integral has been taken from +∞ to −∞. Notice also that
\mathrm{Res}\, f(-i) = f(z)(z + i)\big|_{z=-i} = -\frac{1}{2i},

because f(z) has a simple pole at z = −i in the lower semicircle (Fig. 6.18) for which the residue has been taken. From (6.157), we get the same result as before such that

\int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = \pi.

In fact, partial fraction decomposition gives

\frac{1}{1 + z^2} = \frac{1}{2i}\left(\frac{1}{z - i} - \frac{1}{z + i}\right).   (6.158)

The residue of \frac{1}{1 + z^2} at z = i is \frac{1}{2i} [i.e., the coefficient of \frac{1}{z - i} in (6.158)], giving the same result as (6.153).
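As a quick numerical cross-check of (6.156) (an illustration added here, assuming numpy and scipy), the real definite integral of 1/(1 + x²) over the whole real axis equals π, in agreement with 2πi·Res f(i) = 2πi/(2i).

```python
import numpy as np
from scipy.integrate import quad

value, err = quad(lambda x: 1.0/(1.0 + x**2), -np.inf, np.inf)
print(value, np.pi)                  # ~3.14159 vs pi
print(2j*np.pi * (1.0/(2j)))         # residue-theorem value, also pi
```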
Example 6.5 Evaluate the following real definite integral:
I = \int_{-\infty}^{\infty} \frac{dx}{(1 + x^2)^3}.   (6.159)

As before, we consider the contour integral

I_C = \oint_C \frac{dz}{(1 + z^2)^3} = \int_{-R}^{R} \frac{dz}{(1 + z^2)^3} + \int_{Γ_R} \frac{dz}{(1 + z^2)^3},   (6.160)
where C stands for the closed curve comprising the interval [-R, R] and the upper
semicircle ΓR (see Fig. 6.18).
The function f(z) \equiv \frac{1}{(1 + z^2)^3} has two isolated poles at z = ±i. In the present case, however, the poles are of order 3. Hence, we use (6.147) to estimate the residue.
The inside of the upper semicircle contains the pole of order 3 at z = i. Therefore, we have

\mathrm{Res}\, f(i) = \frac{1}{2\pi i}\oint_C f(z)\, dz = \frac{1}{2\pi i}\oint_C \frac{g(z)}{(z - i)^3}\, dz = \frac{1}{2!}\left.\frac{d^2 g(z)}{dz^2}\right|_{z=i},   (6.161)

where

g(z) \equiv \frac{1}{(z + i)^3} \quad \text{and} \quad f(z) = \frac{g(z)}{(z - i)^3}.

Hence, we get

\frac{1}{2!}\left.\frac{d^2 g(z)}{dz^2}\right|_{z=i} = \frac{1}{2}(-3)(-4)(z + i)^{-5}\Big|_{z=i} = \frac{1}{2}(-3)(-4)(2i)^{-5} = \frac{3}{16i} = \mathrm{Res}\, f(i).   (6.162)
For a reason similar to that of Example 6.4, the second term of (6.160) vanishes
when R → 1. Then, we obtain
\lim_{R \to \infty} I_C = \int_{-\infty}^{\infty} \frac{dz}{(1 + z^2)^3} + \lim_{R \to \infty}\int_{Γ_R} \frac{dz}{(1 + z^2)^3} = \lim_{R \to \infty}\oint_C \frac{g(z)}{(z - i)^3}\, dz = 2\pi i\, \mathrm{Res}\, f(i) = 2\pi i \times \frac{3}{16i} = \frac{3\pi}{8}.

That is,

\int_{-\infty}^{\infty} \frac{dz}{(1 + z^2)^3} = \int_{-\infty}^{\infty} \frac{dx}{(1 + x^2)^3} = I = \frac{3\pi}{8}.

Alternatively, the residue can be read off from the Laurent expansion. We have

f(z) = \frac{1}{(1 + z^2)^3} = \frac{1}{(z + i)^3 (z - i)^3}.

Putting ζ \equiv z - i, we get

f(z) = \tilde{f}(ζ) = \frac{1}{(ζ + 2i)^3 ζ^3} = \frac{1}{ζ^3}\frac{1}{(2i)^3}\left(1 + \frac{ζ}{2i}\right)^{-3}
= \frac{1}{ζ^3}\frac{1}{(2i)^3}\left[1 + (-3)\frac{ζ}{2i} + \frac{(-3)(-4)}{2!}\left(\frac{ζ}{2i}\right)^2 + \frac{(-3)(-4)(-5)}{3!}\left(\frac{ζ}{2i}\right)^3 + \cdots\right]
= -\frac{1}{8i}\left[\frac{1}{ζ^3} - \frac{3}{2iζ^2} - \frac{3}{2ζ} + \frac{5}{4i} + \cdots\right],   (6.163)

f(z) = -\frac{1}{8i}\left[\frac{1}{(z - i)^3} - \frac{3}{2i(z - i)^2} - \frac{3}{2(z - i)} + \frac{5}{4i} + \cdots\right].   (6.164)
Equation (6.164) is a Laurent's expansion of f(z). From (6.164) we see that the coefficient of \frac{1}{z - i} in f(z) is \frac{3}{16i}, in agreement with (6.145) and (6.162). To obtain the answer of (6.159), once again we may equally choose the contour C̃ (Fig. 6.18) and get the same result as in the case of Example 6.4. In that case, f(z) can be expanded into a Laurent's series around z = −i. Readers are encouraged to check it.
We make a few remarks about the (generalized) binomial expansion formula. We
described it in (3.181) of Sect 3.6.1 such that
(1 + x)^{-λ} = \sum_{m=0}^{\infty} \binom{-λ}{m} x^m,   (3.181)

(1 + z)^{-λ} = \sum_{m=0}^{\infty} \binom{-λ}{m} z^m,   (6.165)
where λ is any complex number with z also being a complex number of |z| < 1. We
may view (6.165) as a consequence of the analytic continuation and (6.163) has been
dealt with as such indeed.
We put
f(x) \equiv \frac{1}{1 + x^3}.   (6.167)

Figure 6.19 depicts a graphical form of f(x). The modulus of f(x) tends to infinity at x = −1. Inflection points are present at x = 0 and x = (1/2)^{1/3} (≈ 0.7937).
Now, in the former two examples, the isolated singular points (i.e., poles) existed
only in the inside of a closed contour. In the present case, however, a pole is located
on the real axis. Bearing in mind this situation, we wish to estimate the integral
(6.166).
Rewriting (6.166), we have
I = \int_{-\infty}^{\infty} \frac{dx}{(1 + x)(x^2 - x + 1)} \quad \text{or} \quad I = \int_{-\infty}^{\infty} \frac{dz}{(1 + z)(z^2 - z + 1)}.

The function

f(z) \equiv \frac{1}{(1 + z)(z^2 - z + 1)}
has three simple poles at z = - 1 along with z = e±iπ/3. This time, we have a contour
C depicted in Fig. 6.20, where the contour integration IC is performed in such a way
that
I_C = \int_{-R}^{-1-r} \frac{dz}{(1 + z)(z^2 - z + 1)} + \int_{Γ_{-1}} \frac{dz}{(1 + z)(z^2 - z + 1)} + \int_{-1+r}^{R} \frac{dz}{(1 + z)(z^2 - z + 1)} + \int_{Γ_R} \frac{dz}{(1 + z)(z^2 - z + 1)},   (6.168)
where Γ-1 stands for a small semicircle of radius r around z = - 1 as shown. Of the
complex roots z = e±iπ/3, only z = eiπ/3 is responsible for the contour integration (see
Fig. 6.20).
Here we define a principal value P of the integral such that
P\int_{-\infty}^{\infty} \frac{dx}{(1 + x)(x^2 - x + 1)} \equiv \lim_{r \to 0}\left[\int_{-\infty}^{-1-r} \frac{dx}{(1 + x)(x^2 - x + 1)} + \int_{-1+r}^{\infty} \frac{dx}{(1 + x)(x^2 - x + 1)}\right],   (6.169)
where r is an arbitrary real positive number. As shown in this example, the principal
value is a convenient device to traverse the poles on the contour and gives a correct
answer in the contour integral. At the same time, getting R to infinity, we obtain
\lim_{R \to \infty} I_C = P\int_{-\infty}^{\infty} \frac{dx}{(1 + x)(x^2 - x + 1)} + \int_{Γ_{-1}} \frac{dz}{(1 + z)(z^2 - z + 1)} + \int_{Γ_{\infty}} \frac{dz}{(1 + z)(z^2 - z + 1)},   (6.170)

where in the present case f(z) \equiv \frac{1}{(1 + z)(z^2 - z + 1)} and we have only one simple pole at a_j = e^{iπ/3} within the contour C.
We are going to get the answer by combining (6.170) and (6.146) such that

I \equiv P\int_{-\infty}^{\infty} \frac{dx}{(1 + x)(x^2 - x + 1)} = 2\pi i\, \mathrm{Res}\, f\!\left(e^{iπ/3}\right) - \int_{Γ_{-1}} \frac{dz}{(1 + z)(z^2 - z + 1)} - \int_{Γ_{\infty}} \frac{dz}{(1 + z)(z^2 - z + 1)}.   (6.171)
The third term of RHS of (6.171) tends to be zero as in the case of Examples 6.4
and 6.5. Therefore, if we can estimate the second term as well as the residues
properly, the integral (6.166) can be adequately determined. Note that in this case
the integral is meant by the principal value. To estimate the integral of the second
term, we make use of the fact that defining g(z) as
g(z) \equiv \frac{1}{z^2 - z + 1},   (6.172)

g(z) is analytic at and around z = −1. Hence, g(z) can be expanded in the Taylor's series around z = −1 such that

g(z) = A_0 + \sum_{n=1}^{\infty} A_n (z + 1)^n,   (6.173)

where A_0 ≠ 0. In fact, we have

A_0 = g(-1) = \frac{1}{3}.   (6.174)

Then, we have

\int_{Γ_{-1}} \frac{dz}{(1 + z)(z^2 - z + 1)} = \int_{Γ_{-1}} \frac{g(z)\, dz}{1 + z} = A_0\int_{Γ_{-1}} \frac{dz}{1 + z} + \sum_{n=1}^{\infty} A_n\int_{Γ_{-1}} (z + 1)^{n-1}\, dz.   (6.175)

On Γ_{-1} we set

z = -1 + r e^{iθ},   (6.176)

so that the first term of RHS of (6.175) becomes

A_0\int_{Γ_{-1}} \frac{ire^{iθ}\, dθ}{re^{iθ}} = A_0\int_{Γ_{-1}} i\, dθ = A_0\int_{\pi}^{0} i\, dθ = -A_0\int_{0}^{\pi} i\, dθ = -A_0 i\pi.   (6.177)

Note that in (6.177) the contour integration along Γ_{-1} was performed in the decreasing direction of the argument θ from π to 0. Thus, (6.175) is rewritten as

\int_{Γ_{-1}} \frac{dz}{(1 + z)(z^2 - z + 1)} = -A_0 i\pi + \sum_{n=1}^{\infty} A_n i\int_{\pi}^{0} r^n e^{inθ}\, dθ,   (6.178)
where with the last equality we exchanged the order of the summation and integra-
tion. The second term of RHS of (6.178) vanishes when r → 0. Thus, (6.171) is
further rewritten as
We have only one simple pole for f(z) at z = eiπ/3 within the semicircle contour C.
Then, using (6.148), we have
\mathrm{Res}\, f\!\left(e^{iπ/3}\right) = \frac{1}{(1 + e^{iπ/3})(e^{iπ/3} - e^{-iπ/3})} = -\frac{1}{6}\left(1 + \sqrt{3}\, i\right).   (6.180)

The above calculus is straightforward, but (6.37) can be conveniently used. Thus, finally we get

I = \frac{\pi i}{3}\left[-\left(1 + \sqrt{3}\, i\right)\right] + \frac{\pi i}{3} = \frac{\sqrt{3}\,\pi}{3} = \frac{\pi}{\sqrt{3}}.   (6.181)

That is,

I \approx 1.8138.   (6.182)

P\int_{-\infty}^{\infty} f(x)\, dx \equiv \lim_{R \to \infty}\int_{-R}^{R} f(x)\, dx.

The integral can also be calculated using a lower semicircle including the simple pole located at z = e^{-iπ/3} = \frac{1 - \sqrt{3}\, i}{2}. That gives the same answer as (6.181). The calculations are left for readers as an exercise.
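The principal value obtained in (6.181) can be cross-checked numerically (an illustration added here, assuming numpy and scipy); the singular factor 1/(x + 1) is handled with quad's Cauchy-principal-value weight.

```python
import numpy as np
from scipy.integrate import quad

g = lambda x: 1.0/(x**2 - x + 1.0)        # 1/(1 + x^3) = g(x)/(x + 1)

# principal value of g(x)/(x - (-1)) on an interval containing the pole at x = -1
pv_core, _ = quad(g, -10.0, 8.0, weight='cauchy', wvar=-1.0)
tail_left, _ = quad(lambda x: 1.0/(1.0 + x**3), -np.inf, -10.0)
tail_right, _ = quad(lambda x: 1.0/(1.0 + x**3), 8.0, np.inf)

print(pv_core + tail_left + tail_right, np.pi/np.sqrt(3.0))   # ~1.8138 vs pi/sqrt(3)
```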
Example 6.7 [9, 10] It will be interesting to compare the result of Example 6.6 with
that of the following real definite integral:
I = \int_{0}^{\infty} \frac{dx}{1 + x^3}.   (6.183)

This time we use the sector contour C shown in Fig. 6.21a and consider

I_C = \int_{0}^{R} \frac{dz}{1 + z^3} + \int_{Γ_R} \frac{dz}{1 + z^3} + \int_{L} \frac{dz}{1 + z^3}.   (6.184)

In the third integral of (6.184), we take arg z = θ (constant) so that with z = re^{iθ} we may have dz = dr\, e^{iθ}. Then, that integral is given by

Fig. 6.21 Sector of circle for contour integration of \frac{1}{1 + z^3} that appears in Example 6.7 (panels a and b)

\int_{L} \frac{dz}{1 + z^3} = \int_{L} \frac{e^{iθ}\, dr}{1 + r^3 e^{3iθ}}.

On L we have θ = 2π/3, so that e^{3iθ} = 1 and

\int_{L} \frac{e^{iθ}\, dr}{1 + r^3 e^{3iθ}} = e^{2\pi i/3}\int_{R}^{0} \frac{dr}{1 + r^3} = -e^{2\pi i/3}\int_{0}^{R} \frac{dr}{1 + r^3}.
Meanwhile, considering that f(z) has a simple pole at z = eiπ/3 within the contour
C, i.e., inside the sector (see Fig. 6.21a), we have
where f(x) was given in (6.167) of Example 6.6. Then, equating (6.186) with (6.187)
we have
To calculate Res f(e^{iπ/3}), once again we used (6.148) at z = e^{iπ/3}; see (6.180). Noting that (1 + e^{iπ/3})(1 - e^{2\pi i/3}) = 3 and e^{iπ/3} - e^{-iπ/3} = 2i\sin(π/3) = \sqrt{3}\, i, we get

I = 2\pi i\big/\!\left(3\sqrt{3}\, i\right) = \frac{2\pi}{3\sqrt{3}}.   (6.188)
Next, we consider the related integral

\tilde{I} = P\int_{-\infty}^{0} \frac{dx}{1 + x^3}.   (6.189)

In this case we use the contour C' shown in Fig. 6.21b and consider

I_{C'} = P\int_{-R}^{0} \frac{dz}{1 + z^3} + \int_{Γ_{-1}} \frac{dz}{(1 + z)(z^2 - z + 1)} + \int_{L'} \frac{dz}{1 + z^3} + \int_{Γ_R} \frac{dz}{1 + z^3}.   (6.190)

Since there is no pole inside C', taking R → ∞ we get

\lim_{R \to \infty} I_{C'} = 0 = P\int_{-\infty}^{0} \frac{dz}{1 + z^3} - \frac{\pi i}{3} + e^{2\pi i/3}\int_{0}^{\infty} \frac{dr}{1 + r^3} = P\int_{-\infty}^{0} \frac{dz}{1 + z^3} - \frac{\pi}{3\sqrt{3}},

where we used (6.188) and (6.178) in combination with (6.174). Thus, we get

\tilde{I} = \int_{-\infty}^{0} \frac{dx}{1 + x^3} = \frac{\pi}{3\sqrt{3}}.   (6.191)
I_R \equiv \int_{Γ_R} f(z)\, e^{iαz}\, dz = iR\int_0^{\pi} f\!\left(Re^{iθ}\right) e^{iαR\cos θ - αR\sin θ + iθ}\, dθ.

where ε(R) is a certain positive number that depends only on R and tends to be zero as |ζ| → ∞. Therefore, we have

|I_R| < ε(R)\, R\int_0^{\pi} e^{-αR\sin θ}\, dθ = 2ε(R)\, R\int_0^{\pi/2} e^{-αR\sin θ}\, dθ,

where the last equality results from the symmetry of sin θ about π/2. Meanwhile, we have

\sin θ \ge \frac{2θ}{\pi} \quad \left(0 \le θ \le \frac{\pi}{2}\right).

Hence, we have

|I_R| < 2ε(R)\, R\int_0^{\pi/2} e^{-2αRθ/\pi}\, dθ = \frac{\pi ε(R)}{α}\left(1 - e^{-αR}\right).

Therefore, we get

\lim_{R \to \infty} I_R = 0.
Rewriting (6.192) as
I = \int_{-\infty}^{\infty} \frac{\sin z}{z}\, dz = \frac{1}{2i}\int_{-\infty}^{\infty} \frac{e^{iz} - e^{-iz}}{z}\, dz,   (6.193)

I_C = P\int_{-R}^{R} \frac{e^{iz}}{z}\, dz + \int_{Γ_0} \frac{e^{iz}}{z}\, dz + \int_{Γ_R} \frac{e^{iz}}{z}\, dz,   (6.194)

e^{iz} = \sum_{n=0}^{\infty} \frac{(iz)^n}{n!} = 1 + \sum_{n=1}^{\infty} \frac{(iz)^n}{n!}.   (6.195)
\int_{Γ_0} \frac{e^{iz}}{z}\, dz = -1 \times i\pi = -i\pi.   (6.196)
Notice that in (6.196) the argument is decreasing from π → 0 during the contour
integration. Within the contour C, there is no singular point, and so IC = 0 due to
Theorem 6.10 (Cauchy’s integral theorem). Thus, from (6.194) we obtain
P\int_{-\infty}^{\infty} \frac{e^{iz}}{z}\, dz = -\int_{Γ_0} \frac{e^{iz}}{z}\, dz = i\pi.   (6.197)
-P\int_{\infty}^{-\infty} \frac{e^{-iz}}{-z}\, dz = P\int_{-\infty}^{\infty} \frac{e^{-iz}}{-z}\, dz = i\pi.   (6.198)

P\int_{-\infty}^{\infty} \frac{e^{iz} - e^{-iz}}{z}\, dz = 2iP\int_{-\infty}^{\infty} \frac{\sin z}{z}\, dz = 2i\pi.   (6.200)

P\int_{-\infty}^{\infty} \frac{\sin z}{z}\, dz = P\int_{-\infty}^{\infty} \frac{\sin x}{x}\, dx = I = \pi.   (6.201)
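The Dirichlet integral (6.201) can be cross-checked numerically (an illustration added here, assuming numpy and scipy); the oscillatory tail is handled with quad's weight='sin' option.

```python
import numpy as np
from scipy.integrate import quad

part1 = quad(lambda x: np.sin(x)/x, 0.0, 1.0)[0]                         # smooth near x = 0
part2 = quad(lambda x: 1.0/x, 1.0, np.inf, weight='sin', wvar=1.0)[0]    # oscillatory tail
print(2.0*(part1 + part2), np.pi)                                        # ~3.14159 vs pi
```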
I = \int_{-\infty}^{\infty} \frac{\sin^2 x}{x^2}\, dx.   (6.202)

\sin^2 x = \frac{1}{2}(1 - \cos 2x).   (6.203)
Then, the integrand of (6.202) with the variable changed is rewritten using (6.37)
as
e^{2iz} = \sum_{n=0}^{\infty} \frac{(2iz)^n}{n!} = 1 + 2iz + \sum_{n=2}^{\infty} \frac{(2iz)^n}{n!}.   (6.205)

Then, we get

1 - e^{2iz} = -2iz - \sum_{n=2}^{\infty} \frac{(2iz)^n}{n!},

\frac{1 - e^{2iz}}{z^2} = -\frac{2i}{z} - \sum_{n=2}^{\infty} \frac{(2i)^n z^{n-2}}{n!}.   (6.206)
where the contour for the integration is the same as that of Fig. 6.22.
With the second integral of (6.207), as before, only the first term -2i/z of (6.206)
contributes to the integral. The result is given as
\int_{Γ_0} \frac{1 - e^{2iz}}{z^2}\, dz = -2i\int_{Γ_0} \frac{dz}{z} = -2i\int_{\pi}^{0} i\, dθ = -2\pi.   (6.208)

\int_{Γ_{\infty}} \frac{1 - e^{2iz}}{z^2}\, dz = \int_{Γ_{\infty}} \frac{dz}{z^2} - \int_{Γ_{\infty}} \frac{e^{2iz}}{z^2}\, dz.   (6.209)

The first term of RHS of (6.209) vanishes similarly to the integral of (6.155) that appeared in Example 6.4. The second term of RHS vanishes as well because of Jordan's lemma. Within the contour C of (6.207), again there is no singular point, and so \lim_{R \to \infty} I_C = 0. Hence, from (6.207) we have

P\int_{-\infty}^{\infty} \frac{1 - e^{2iz}}{z^2}\, dz = -\int_{Γ_0} \frac{1 - e^{2iz}}{z^2}\, dz = 2\pi.   (6.210)

-P\int_{\infty}^{-\infty} \frac{1 - e^{-2iz}}{(-z)^2}\, dz = P\int_{-\infty}^{\infty} \frac{1 - e^{-2iz}}{z^2}\, dz = 2\pi.   (6.211)

Summing both sides of (6.210) and (6.211) in combination with (6.204), we get an answer described by

\int_{-\infty}^{\infty} \frac{\sin^2 z}{z^2}\, dz = \pi.   (6.212)
\frac{1}{(x - a)(x - b)} = \frac{1}{a - b}\left(\frac{1}{x - a} - \frac{1}{x - b}\right).   (6.214)
I = \frac{1}{a - b}\int_{-\infty}^{\infty}\left(\frac{\cos z}{z - a} - \frac{\cos z}{z - b}\right) dz = \frac{1}{2(a - b)}\int_{-\infty}^{\infty}\left(\frac{e^{iz} + e^{-iz}}{z - a} - \frac{e^{iz} + e^{-iz}}{z - b}\right) dz.   (6.215)
We wish to consider the integration by dividing the integrand and calculate each
term individually. First, we estimate
\int_{-\infty}^{\infty} \frac{e^{iz}}{z - a}\, dz.   (6.216)

To apply Jordan's lemma to the integration of (6.216), we use the upper contour C that consists of the real axis, Γ_a, and Γ_R; see Fig. 6.23a. As usual, we consider the following integral:

\lim_{R \to \infty} I_C = P\int_{-\infty}^{\infty} \frac{e^{iz}}{z - a}\, dz + \int_{Γ_a} \frac{e^{iz}}{z - a}\, dz + \int_{Γ_{\infty}} \frac{e^{iz}}{z - a}\, dz,   (6.217)

where with the last equality we used (6.196). There is no singular point within the contour C, and so \lim_{R \to \infty} I_C = 0. Then, we have

P\int_{-\infty}^{\infty} \frac{e^{iz}}{z - a}\, dz = -\int_{Γ_a} \frac{e^{iz}}{z - a}\, dz = e^{ia} i\pi.   (6.219)
To apply the Jordan’s lemma to the integration of (6.220), we use the lower
contour C′ that consists of the real axis, Γ~a , and Γ~R (Figure 6.23b). This time, the
integral to be evaluated is given by
\lim_{R \to \infty} I_{C'} = -P\int_{-\infty}^{\infty} \frac{e^{-iz}}{z - a}\, dz + \int_{\tilde{Γ}_a} \frac{e^{-iz}}{z - a}\, dz + \int_{\tilde{Γ}_{\infty}} \frac{e^{-iz}}{z - a}\, dz   (6.221)

so that we can trace the contour C' counter-clockwise. Notice that the minus sign is present in front of the principal value. The third term of (6.221) vanishes due to Jordan's lemma. Evaluating the second term of (6.221) similarly just above, we get

\int_{\tilde{Γ}_a} \frac{e^{-iz}}{z - a}\, dz = \int_{\tilde{Γ}_0} \frac{e^{-i(z + a)}}{z}\, dz = e^{-ia}\int_{\tilde{Γ}_0} \frac{e^{-iz}}{z}\, dz = -e^{-ia} i\pi.   (6.222)

Hence, we obtain

P\int_{-\infty}^{\infty} \frac{e^{-iz}}{z - a}\, dz = \int_{\tilde{Γ}_a} \frac{e^{-iz}}{z - a}\, dz = -e^{-ia} i\pi.   (6.223)
Notice that in (6.222) the argument is again decreasing from 0 → - π during the
contour integration. Summing (6.219) and (6.223), we get
P\int_{-\infty}^{\infty} \frac{e^{iz}}{z - a}\, dz + P\int_{-\infty}^{\infty} \frac{e^{-iz}}{z - a}\, dz = P\int_{-\infty}^{\infty} \frac{e^{iz} + e^{-iz}}{z - a}\, dz = i\pi\left(e^{ia} - e^{-ia}\right) = -2\pi\sin a.

Replacing a with b, the same result holds with a replaced by b. Consequently, we obtain

I = \frac{1}{2(a - b)}\int_{-\infty}^{\infty}\left(\frac{e^{iz} + e^{-iz}}{z - a} - \frac{e^{iz} + e^{-iz}}{z - b}\right) dz = \frac{2\pi(\sin b - \sin a)}{2(a - b)} = \frac{\pi(\sin b - \sin a)}{a - b}.
The calculation is left for readers. Hints: (i) Use cosx = (eix + e-ix)/2 and Jordan’s
lemma. (ii) Use the contour as shown in Fig. 6.18. Besides the examples listed
above, a variety of integral calculations can be seen in literature [9–12].
6.9 Multivalued Functions and Riemann Surfaces

The last topics of the theory of analytic functions are related to multivalued functions. Riemann surfaces play a central role in developing the theory of the multivalued functions.
Definition 6.12 Let n be a positive integer and let z be an arbitrary complex number.
Suppose z can be written as
z = w^n.   (6.226)

Then, w is denoted by

w = z^{1/n}   (6.227)

and is called an n-th root of z. Writing w = ρ(\cos φ + i\sin φ), we have

z = w^n = ρ^n(\cos nφ + i\sin nφ),   (6.228)

where with the last equality we used de Moivre's theorem (6.36). Comparing the real and imaginary parts of (6.32) and (6.228), we have

r = ρ^n \quad \text{or} \quad ρ = r^{1/n}   (6.229)

and

Fig. 6.24 Multiple roots in the complex plane. (a) Square roots of 1 and −1. (b) Cubic roots of 1 and −1
φ = \frac{1}{n}(θ - 2kπ) \quad (k = 0, \pm 1, \pm 2, \cdots).

w_k(θ) = z^{1/n} = r^{1/n}\left[\cos\frac{1}{n}(θ - 2kπ) + i\sin\frac{1}{n}(θ - 2kπ)\right] \quad (k = 0, 1, 2, \cdots, n - 1),   (6.231)
That is, wk(θ) is a periodic function of the period 2nπ. The number of the total
branches is n; the index k of wk(θ) denotes these different branches. In particular,
when k = 0, we have
w_0(θ) = z^{1/n} = r^{1/n}\left(\cos\frac{θ}{n} + i\sin\frac{θ}{n}\right).   (6.233)
From (6.234), we find that wk(θ) is obtained by shifting (or rotating) w0(θ) by 2kπ
toward the positive direction of θ. The branch w0(θ) is called a principal branch of
the n-th root of z. The value of w0(θ) is called the principal value of that branch. The
implication of the presence of the branches is as follows: Suppose that the branches
besides w0(θ) would be absent. Then, from (6.233) we have
w_0(2π) = z^{1/n} = r^{1/n}\left(\cos\frac{2π}{n} + i\sin\frac{2π}{n}\right) \ne w_0(0) = r^{1/n}.
This means that the value of w0(0) is recovered and, hence, the single-valuedness
remains intact. After another cycle around the origin, similarly we have
w_n(θ) = z^{1/n} = r^{1/n}\left[\cos\frac{1}{n}(θ - 2nπ) + i\sin\frac{1}{n}(θ - 2nπ)\right]
= r^{1/n}\left[\cos\left(\frac{θ}{n} - 2π\right) + i\sin\left(\frac{θ}{n} - 2π\right)\right]
= r^{1/n}\left(\cos\frac{θ}{n} + i\sin\frac{θ}{n}\right) = w_0(θ).
Thus, we have no more new function. At the same time, the single-valuedness of
wk(θ) (k = 0, 1, 2, ⋯, n - 1) remains intact during these processes. To summarize the
above discussion, if we have n planes (or sheets) as the complex planes and allocate
them to w0(θ), w1(θ), ⋯, wn - 1(θ) individually, we have a single-valued function
w(θ) as a whole throughout these planes. In a word, the following relation represents
the situation:
w(θ) = \begin{cases} w_0(θ) & \text{for Plane 0}, \\ w_1(θ) & \text{for Plane 1}, \\ \;\;\cdots & \;\;\cdots \\ w_{n-1}(θ) & \text{for Plane } n-1. \end{cases}
This is the essence of the Riemann surface. The superposition of these n planes is
called a Riemann surface and each plane is said to be a Riemann sheet [of the
function w(θ)]. Each single-valued function w0(θ), w1(θ), ⋯, wn - 1(θ) defined on
each Riemann sheet is called a branch of w(θ). In the above discussion, the origin is
called a branch point of w(z).
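A tiny numerical sketch of (6.231) (an illustration added here, assuming numpy): the n branches w_k(θ) of z^{1/n} for n = 3 all reproduce z when raised to the n-th power, and w_n(θ) coincides with w_0(θ).

```python
import numpy as np

n = 3
r, theta = 2.0, 0.7                        # z = r*exp(i*theta)

def w(k):
    phi = (theta - 2.0*np.pi*k)/n
    return r**(1.0/n)*(np.cos(phi) + 1j*np.sin(phi))

for k in range(n + 1):
    print(k, w(k), w(k)**n)                # each branch cubed gives back z = r*e^{i*theta}
print(np.isclose(w(n), w(0)))              # True: w_n is the same function as w_0
```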
For simplicity, let us think of the function w(z) described by
w(z) \equiv z^{1/2} = \sqrt{z}.

In this case, for z = re^{iθ} (0 ≤ θ ≤ 2π) we have two different values w_0 and w_1 given by
Then, we have
In this case, w(z) is called a “two-valued” function of z and the functions w0 and
w1 are said to be branches of w(z). Suppose that z makes a counter-clockwise circuit
around the origin, starting from, e.g., a real positive number z = z0 to come full circle
to the original point z = z0. Then, the argument of z has been increased by 2π. In this
situation the arguments of the individual branches w0 and w1 are increased by π.
Accordingly, w0 is switched to w1 and w1 is switched to w0. This situation can be
understood more clearly from (6.232) and (6.234). That is, putting n = 2 in (6.232),
we have
w_1(θ) = w_0(θ - 2π).   (6.238)

Fig. 6.25 Simple kit that helps visualize the Riemann surface. To make it, follow next procedures: (a) Take two sheets of paper (Planes I and II) and make slits (indicated by dashed lines) as shown. Next, put Plane I on top of Plane II so that the two slits (i.e., branch cuts) can fit in line. (b) Tape together the downside of the cut of Plane I and the foreside of the cut of Plane II. Then, also tape together the downside of the cut of Plane II and the foreside of the cut of Plane I
Fig. 6.25a for the processes (i) and (ii). (iii) Next, put Plane I on top of Plane II so that
the two branch cuts can fit in line. (iv) Tape together the downside of the cut of Plane
I and the foreside of the cut of Plane II (Fig. 6.25b). (v) Then, also tape together the
downside of the cut of Plane II and the foreside of the cut of Plane I (see Fig. 6.25b
once again).
Thus, what we see is that starting from, e.g., a real positive number z = z0 of Plane
I and coming full circle to the original point z = z0, then we cross the cut to enter
Plane II located underneath Plane I. After another cycle within Plane II, we come
back to Plane I again by crossing the cut. After all, we come back to the original
Plane I after two cycles on the combined planes. This combined plane is called the
Riemann surface. In this way, Planes I and II correspond to different branches w0 and
w1, respectively, so that each branch can be single-valued. In other words, w(z) = z1/2
is a single-valued function that is defined on the whole Riemann surface.
There are a variety of Riemann surfaces according to the nature of the complex
functions. Another example of the multivalued function is expressed as
where two branch points are located at z = a and z = b. As before, we assume that
Then, we have
w(θ_a, θ_b) = [(z - a)(z - b)]^{1/2} = \sqrt{r_a r_b}\, e^{i(θ_a + θ_b)/2}.   (6.239)
From (6.242), we find that adding 2π to θa or θb, w0 and w1 are switched to each
other as before. We also find that after the variable z has come full circle around one
of a and b, w0(θa, θb) changes sign as in the case of (6.235).
We also have
From (6.243), however, after the variable z has come full circle around both a and
b, both w0(θa, θb) and w1(θa, θb) keep the original value. This is also the case where
the variable z comes full circle without encircling a or b. These behaviors imply that
(i) if a contour encircles one of z = a and z = b, w0(θa, θb) and w1(θa, θb) change sign
and switch to each other. Thus, w0(θa, θb) and w1(θa, θb) form branches. On the other
hand, (ii) if a contour encircles both z = a and z = b, w0(θa, θb) and w1(θa, θb) remain
intact. (iii) If a contour encircles neither z = a nor z = b, w0(θa, θb) and w1(θa, θb)
remain intact as well.
These three cases (i), (ii), and (iii) are depicted in Fig. 6.27a, b, and c, respec-
tively. In Fig. 6.27c after z comes full circle, argz relative to z = a returns to the
original value keeping θ1 ≤ arg z ≤ θ2. This is similarly the case with argz relative to
z = b. Correspondingly, the branch cut(s) are depicted with doubled broken line(s),
e.g., as shown in Fig. 6.28. Note that in the cases (ii) and (iii) the branch cut can be
chosen so that the circle (or contour) may not cross the branch cut. Although a bit
complicated, the Riemann surface can be shaped accordingly. Other choices of the
branch cuts are shown in Sect. 6.9.2 (vide infra).
When we consider logarithm functions, we have to deal with a rather complicated situation. For the polar coordinate representation, we have z = re^{iθ}. That is,

\ln z = \ln r + i(θ + 2kπ) \quad (k = 0, \pm 1, \pm 2, \cdots),

where the different integers k correspond to the (infinitely many) different branches.
When we deal with a function having a branch point, we must bear in mind that unless the
contour crosses the branch cut (either algebraic or logarithmic), a function we are
Fig. 6.27 Geometrical relationship between contour C and two branch points (located at z = a and z = b) for \sqrt{(z - a)(z - b)}. (a) The contour C encircles only z = a. (b) C encircles both z = a and z = b. (c) C encircles neither z = a nor z = b

Fig. 6.28 Branch cut(s) for \sqrt{(z - a)(z - b)}. (a) A branch cut is shown by a line connecting z = a and z = b. (b) Branch cuts are shown by a line connecting z = a and z = ∞ and another line connecting z = b and z = ∞. In both (a) and (b) the branch cut(s) are depicted with doubled broken line(s)
thinking of is held single-valued and analytic (if that function is originally analytic)
and can be differentiated or integrated in a normal way. We give some examples
for this.
Example 6.12 [6] Evaluate the following real definite integral:
I = \int_{0}^{\infty} \frac{x^{a-1}}{1 + x}\, dx \quad (0 < a < 1).   (6.246)

We rewrite (6.246) as

I = \int_{0}^{\infty} \frac{z^{a-1}}{1 + z}\, dz.   (6.247)

Here, we define

f(z) \equiv \frac{z^{a-1}}{1 + z}, \qquad z^{a-1} = e^{(a-1)\ln z}.
The function f(z) has a branch point at z = 0 and a simple pole at z = - 1. Bearing
in mind this situation, we consider a contour for integration (see Fig. 6.29). In
Fig. 6.29 we depict the branch cut with a doubled broken line. Lines PQ and Q′P′
are contour lines located over and under the branch cut, respectively. We assume that
the lines PQ and Q′P′ are illimitably close to the real axis. Thus, starting from the
point P the contour integration IC is described by
I_C = \int_{PQ} \frac{z^{a-1}}{1 + z}\, dz + \int_{Γ_R} \frac{z^{a-1}}{1 + z}\, dz + \int_{Q'P'} \frac{z^{a-1}}{1 + z}\, dz + \int_{Γ_0} \frac{z^{a-1}}{1 + z}\, dz,   (6.248)
where ΓR and Γ0 denote the outer large circle and inner small circle of their radius
R (≫1) and r (≪1), respectively. Note that with the contour integration ΓR is traced
counter-clockwise but Γ0 is traced clockwise (see Fig. 6.29). Since the simple pole is
present at z = - 1, the related residue Res f(-1) is
\mathrm{Res}\, f(-1) = z^{a-1}\big|_{z=-1} = (-1)^{a-1} = \left(e^{i\pi}\right)^{a-1} = -e^{ai\pi}.

Then, we have

\left|\int_{Γ_R} \frac{z^{a-1}}{1 + z}\, dz\right| < \frac{R^{a-1}}{R - 1}\, 2\pi R \to 0 \quad \text{and} \quad \left|\int_{Γ_0} \frac{z^{a-1}}{1 + z}\, dz\right| < \frac{r^{a-1}}{1 - r}\, 2\pi r \to 0.   (6.250)

Moreover, we have

\int_{Q'P'} \frac{e^{(a-1)\ln z}}{1 + z}\, dz = e^{(a-1)2\pi i}\int_{Q'P'} \frac{e^{(a-1)\ln |z|}}{1 + z}\, dz.   (6.252)
Notice that the argument underneath the branch cut (i.e., the line Q′P′) is
increased by 2π relative to that over the branch cut.
Considering (6.249)–(6.252) and taking the limit of R → 1 and r → 0, (6.248) is
rewritten by
-2\pi i\, e^{a\pi i} = \int_{0}^{\infty} \frac{e^{(a-1)\ln |z|}}{1 + z}\, dz + e^{(a-1)2\pi i}\int_{\infty}^{0} \frac{e^{(a-1)\ln |z|}}{1 + z}\, dz
= \left[1 - e^{(a-1)2\pi i}\right]\int_{0}^{\infty} \frac{e^{(a-1)\ln |z|}}{1 + z}\, dz = \left[1 - e^{(a-1)2\pi i}\right]\int_{0}^{\infty} \frac{z^{a-1}}{1 + z}\, dz.   (6.253)
In (6.253) we have assumed z to be real and positive (see Fig. 6.29), and so we
have e(a - 1) ln |z| = e(a - 1) ln z = za - 1. Therefore, from (6.253) we get
\int_{0}^{\infty} \frac{z^{a-1}}{1 + z}\, dz = -\frac{2\pi i\, e^{a\pi i}}{1 - e^{(a-1)2\pi i}} = -\frac{2\pi i\, e^{a\pi i}}{1 - e^{2\pi a i}}.   (6.254)

Using (6.37) for (6.254) and performing simple trigonometric calculations (after multiplying both the numerator and denominator by 1 - e^{-2\pi a i}), we finally get

\int_{0}^{\infty} \frac{z^{a-1}}{1 + z}\, dz = \int_{0}^{\infty} \frac{x^{a-1}}{1 + x}\, dx = I = \frac{\pi}{\sin a\pi}.
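The closed form obtained in Example 6.12 can be checked numerically for one sample exponent (an illustration added here, assuming numpy and scipy; a = 0.3 is an arbitrary choice in 0 < a < 1).

```python
import numpy as np
from scipy.integrate import quad

a = 0.3
v1 = quad(lambda x: x**(a - 1)/(1.0 + x), 0.0, 1.0)[0]      # integrable singularity at x = 0
v2 = quad(lambda x: x**(a - 1)/(1.0 + x), 1.0, np.inf)[0]
print(v1 + v2, np.pi/np.sin(a*np.pi))                        # both ~ 3.8833
```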
Example 6.13 Evaluate the following real definite integral:

I = \int_{a}^{b} \frac{dx}{\sqrt{(x - a)(b - x)}} \quad (a, b: \text{real with } a < b).   (6.255)

We consider the corresponding contour integral

I_C = \oint_C \frac{dz}{\sqrt{(z - a)(z - b)}},   (6.256)

where

f(z) \equiv \frac{1}{\sqrt{(z - a)(z - b)}}.
The total contour C comprises Ca, PQ, Cb, and Q′P′ as depicted in Fig. 6.30. The
function f(z) has two branch points at z = a and z = b; otherwise f(z) is analytic. We
draw a branch cut with a doubled broken line as shown. Since the contour C does not
cross the branch cut but encircle it, we can evaluate the integral in a normal manner.
Starting from the point P′, (6.256) can be expressed by
1 1 1
IC = dz þ dz þ dz
Ca ðz - aÞðz - bÞ PQ ð z - aÞ ð z - bÞ Cb ðz - aÞðz - bÞ
1
þ dz: ð6:257Þ
QP0 0 ð z - aÞ ð z - bÞ
We assume that the lines PQ and Q′P′ are illimitably close to the real axis. This
time, C_a and C_b are both traced counter-clockwise. Putting z - a = r_1 e^{iθ_1} and z - b = r_2 e^{iθ_2}, we have

f(z) = \frac{1}{\sqrt{r_1 r_2}}\, e^{-i(θ_1 + θ_2)/2}.   (6.258)
z - b = εeiθ2 ð- π ≤ θ2 ≤ π Þ; i:e:, r 2 = ε:
π π p
1 iε ε
dz = p e - iðθ1 þθ2 Þ=2 dθ2 ≤ p dθ2 :
Cb ðz - aÞðz - bÞ -π r1 ε -π r1
Therefore, we get
lim_{ε→0} |∫_{C_b} dz/√((z − a)(z − b))| = 0   or   lim_{ε→0} ∫_{C_b} dz/√((z − a)(z − b)) = 0.   (6.259)

Similarly, we have

lim_{ε→0} ∫_{C_a} dz/√((z − a)(z − b)) = 0.   (6.260)
∫_{Q′P′} dz/√((z − a)(z − b)) = ∫_b^a e^{−iπ/2}/√((x − a)(b − x)) dx = −i ∫_b^a dx/√((x − a)(b − x))
  = i ∫_a^b dx/√((x − a)(b − x)).   (6.261)
On the line PQ, in turn, we have r1 = x - a and θ1 = 2π. This is because when
going from P′ to P, the argument is increased by 2π; see Fig. 6.30 once again. Also,
we have r2 = b - x, θ2 = π, and dz = dx. Notice that the argument θ2 remains
unchanged. Then, we get
∫_{PQ} dz/√((z − a)(z − b)) = ∫_a^b e^{−3iπ/2}/√((x − a)(b − x)) dx = i ∫_a^b dx/√((x − a)(b − x)).   (6.262)
Thus, combining (6.257) with (6.259)–(6.262), we get

I_C = ∫_C dz/√((z − a)(z − b)) = 2i ∫_a^b dx/√((x − a)(b − x)).   (6.263)
Meanwhile, to evaluate I_C itself, we make the substitution

z = 1/w,   (6.264)
we have
√((z − a)(z − b)) = (1/w)√((1 − aw)(1 − bw))   with   dz = −dw/w².   (6.265)
Thus, we have
Fig. 6.31 Branch cuts (shown with doubled broken lines) and the contour in the w-plane for the integration of 1/[w√((1 − aw)(1 − bw))]. (a) Case of 0 < a < b. (b) Case of a < 0 < b. Notice that in both cases the contour C′ is traced clockwise (see text)
I_C = ∫_C dz/√((z − a)(z − b)) = −∫_{C′} dw/[w√((1 − aw)(1 − bw))],   (6.266)
where the closed contour C′ of diameter 1/R comes full circle clockwise around the
origin of the w-plane (see Fig. 6.31). This is because, just as the contour C in the
z-plane is traced with the point z = ∞ kept on the right, the contour C′ is traced
with the corresponding point w = 0 kept on the right. The above clockwise cycle results from such
a situation. The integrand of (6.266) has a simple pole at w = 0.
Regarding other singularities of the integrand, there are two branch points at
w = 1/a and w = 1/b. Then, we appropriately choose R ≫ 1 (or 1/R ≪ 1) so that C′
does not cut across the branch cut. To meet this condition, we have two choices.
(i) The branch cut connects w = 1/a and w = 1/b. (ii) Another choice of the branch
cut is that it connects w = 1/a and w = −∞ along with w = 1/b and w = +∞ (see
Fig. 6.31). If we have 0 < a < b, we can choose a branch cut as that depicted in
Fig. 6.31a corresponding to Fig. 6.28a. In case a < 0 < b, the branch cut can be that
shown in Fig. 6.31b corresponding to Fig. 6.28b. (When a < b < 0, the branch cut
geometry may be similar to that of Fig. 6.31a.)
Consequently, in both the above cases of (i) and (ii) there is only a simple pole at
w = 0 without any other singularities inside the contour C′. Therefore, according to
(6.148) we have a contour integration described by
∫_{C′} dw/[w√((1 − aw)(1 − bw))] = −2πi Res[1/(w√((1 − aw)(1 − bw)))]_{w=0}
  = −2πi [1/√((1 − aw)(1 − bw))]_{w=0} = −2πi,   (6.267)
where the above minus sign is due to the clockwise rotation of the contour integra-
tion. Hence, from (6.266) we get
I_C = ∫_C dz/√((z − a)(z − b)) = 2πi.   (6.268)

Comparing (6.263) with (6.268), we obtain I = π regardless of a and b.
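As a quick sanity check, the sketch below (not part of the original text; it assumes NumPy and SciPy) evaluates (6.255) numerically for several pairs (a, b) and confirms the value π.

```python
# Numerical check that the integral of 1/sqrt((x-a)(b-x)) over [a, b] equals pi.
import numpy as np
from scipy.integrate import quad

for a, b in ((-1.0, 2.0), (0.3, 0.7), (-5.0, -1.0)):
    value, _ = quad(lambda x: 1.0/np.sqrt((x - a)*(b - x)), a, b)
    print(f"a = {a}, b = {b}: integral = {value:.6f}  (pi = {np.pi:.6f})")
```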
As another example, let us consider a function described by

(1 − 2tx + t²)^{−λ}.   (6.269)

Putting

f(t) ≡ 1 − 2tx + t².   (6.270)
We rewrite (6.270) as
f(t) = (t − x + i√(1 − x²))(t − x − i√(1 − x²)).   (6.271)

With f(t) = 0 we have two roots at t_± = x ± i√(1 − x²), and so |t_±| = 1 (see Sect. 3.6.1).
When λ = 1/2, we are dealing with essentially the same problem as that of
Example 6.13. For instance, when x = 0, the branch points are located at t = ± i.
When x = 1/√2, the branch points are positioned at t = (1 ± i)/√2. These values of
t form the branch points. In Fig. 6.32 we depict these branch points and
corresponding branch cuts. In this situation, the function
h(t) ≡ (1 − 2tx + t²)^{−1/2}   (6.272)
is analytic within a circle jt j < 1 of the complex plane t [14] (Fig. 6.32). Therefore,
we can freely expand h(t) into a Taylor’s series at any point within that circle. If we
expand h(t) around t = 0, we expect to have a following Taylor’s series:
h(t) = Σ_{n=0}^{∞} c_n(x) tⁿ.   (6.273)
The expansion coefficients c_n(x) are given by

c_n(x) = (1/2πi) ∮_C h(t)/t^{n+1} dt,   (6.274)
where the contour C can be chosen so that C can encircle t = 0 in a region jt j < 1.
Let us make a substitution such that [14]
1 − ut = (1 − 2tx + t²)^{1/2}.   (6.275)
Then we have
t = 2(u − x)/(u² − 1)   and   dt = 2(1 − ut) du/(u² − 1)   with   u ≠ ±1.   (6.276)
c_n(x) = (1/2πi) ∮_{C′} (−1)ⁿ(1 − u²)ⁿ/[2ⁿ(u − x)^{n+1}] du = [(−1)ⁿ/(2ⁿ n!)] dⁿ(1 − u²)ⁿ/duⁿ |_{u=x} = P_n(x).   (6.278)
The functions Pn(x) are Legendre polynomials that have already appeared in
(3.145) and (3.207) of Sects. 3.5 and 3.6. In this manner, the above argument directly
confirms that the Legendre polynomials of (3.194) derived from the generating
function (3.180) or defined as Rodrigues formula of (3.145) are identical in the
complex domain. Originally, (3.145) and (3.207) were defined in the real domain.
The identical representations in both the domains real and complex are the conse-
quence of the analytic continuation.
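The identification of the Taylor coefficients with P_n(x) can also be checked numerically. The sketch below (not part of the original text; it assumes NumPy is available and uses numpy.polynomial.legendre only for reference values) discretizes the contour integral (6.274) on a circle |t| = r < 1 and compares the result with P_n(x).

```python
# Discretized Cauchy-coefficient formula (6.274) for h(t) = (1 - 2tx + t^2)^(-1/2),
# compared with Legendre polynomials P_n(x).
import numpy as np
from numpy.polynomial import legendre

x, r, M = 0.6, 0.5, 2048
theta = np.linspace(0.0, 2.0*np.pi, M, endpoint=False)
t = r*np.exp(1j*theta)
h = (1.0 - 2.0*t*x + t**2) ** (-0.5)       # single-valued for |t| < 1
for n in range(6):
    c_n = np.mean(h*np.exp(-1j*n*theta)) / r**n   # c_n(x) from (6.274)
    Pn = legendre.legval(x, [0.0]*n + [1.0])       # reference P_n(x)
    print(f"n = {n}: c_n(x) ≈ {c_n.real: .6f}, P_n(x) = {Pn: .6f}")
```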
Substituting x = ± 1 for (6.277), we get
c_n(1) = (1/2πi) ∮_{C′} (−1)ⁿ(1 − u²)ⁿ/[2ⁿ(u − 1)^{n+1}] du = (1/2ⁿ)(1/2πi) ∮_{C′} (u + 1)ⁿ/(u − 1) du = (1/2ⁿ)(u + 1)ⁿ|_{u=1} = 1   (6.279)
and
c_n(−1) = (1/2πi) ∮_{C′} (−1)ⁿ(1 − u²)ⁿ/[2ⁿ(u + 1)^{n+1}] du = (1/2ⁿ)(1/2πi) ∮_{C′} (u − 1)ⁿ/(u + 1) du = (1/2ⁿ)(u − 1)ⁿ|_{u=−1} = (−1)ⁿ.   (6.280)
Notice that in (6.279) and (6.280), the integrand has a simple pole at u = 1 and
u = - 1, respectively. Hence, we used (6.144) and (6.148) with the contour
integration. The contour C′ is taken so that it can encircle u = 1 in the former case
and u = - 1 in the latter. Hence, we have important relations.
P n ð 1Þ = 1 and Pn ð- 1Þ = ð- 1Þn :
Summarizing this chapter, we have explored the theory and applications of the
analytic functions on the complex domain. We have studied how the theory of
analytic functions produces fruitful results.
References
Chapter 7
Maxwell's Equations

In this chapter, we first represent Maxwell's equations in vector form. The equations are represented as differential forms consistent with the viewpoint of "action through a medium." The equation of wave motion (or wave equation) is naturally derived from these equations.

7.1 Maxwell's Equations and Their Characteristics

Maxwell's equations of electromagnetism are expressed as follows:
div D = ρ, ð7:1Þ
div B = 0, ð7:2Þ
rot E + ∂B/∂t = 0,   (7.3)

rot H − ∂D/∂t = i.   (7.4)
In (7.3), RHS denotes a zero vector. Let us take some time to get acquainted with the physical quantities and their dimensions, as well as the basic ideas, concepts, and definitions of electromagnetism.
The quantity D is called electric flux density [As/m² = C/m²] (or electric displacement). In general, a vector V is expressed with the Cartesian basis vectors e₁, e₂, and e₃ as

V = (e₁ e₂ e₃)(V_x, V_y, V_z)ᵀ.   (7.5)
div V ≡ ∂V_x/∂x + ∂V_y/∂y + ∂V_z/∂z.   (7.6)
Thus, the div operator converts a vector to a scalar. The quantities D and the
electric field E [V/m] are associated with the following expression:
D = εE, ð7:7Þ
where ε [C²/(N·m²)] is called a dielectric constant (or permittivity) of the dielectric medium.
The dimension can be understood from the following Coulomb’s law that describes a
force exerted between two charges:
F = (1/4πε) QQ′/r²,   (7.8)
where F is the force; Q and Q′ are electric charges of the two charges; r is a distance
between the two charges. Equation (7.1) represents Gauss’ law of electrostatics.
The electric charge of 1 Coulomb (1 [C]) is defined as follows: Suppose that two
point charges having the same electric charge are placed in vacuum 1 m apart. In that
situation, if a force F between the two charges is

F = c̃² × 10⁻⁷ [N],

where c̃ is a light velocity, then we define the electric charge which each point charge possesses as 1 [C]. Here, note that c̃ is the dimensionless number related to the light velocity in vacuum that is measured in a unit of [m/s]. Notice also that the light velocity c in vacuum is defined as

c = 299 792 458 [m/s].

Thus, we have

ε₀ = 10⁷/(4πc̃²) [C²/(N·m²)] ≈ 8.854 × 10⁻¹² [C²/(N·m²)].   (7.9)
Meanwhile, 1 s (second) has been defined from a certain spectral line emitted
from 133Cs. Thus, 1 m (meter) is defined by
ðdistance along which light is propagated in vacuum during 1 sÞ=299 792 458:
Equation (7.2) represents Gauss's law of magnetostatics. In contrast to (7.1), RHS of (7.2) is zero. This
corresponds to the fact that although a true charge exists, a true "magnetic charge"
does not exist. (More precisely, such a charge has not been detected so far.) The
quantity B is connected with the magnetic field H by the following relation:
B = μH, ð7:10Þ
where μ is called a magnetic permeability. In vacuum, the permeability μ₀ is given by

μ₀ = 4π/10⁷ [N/A²].   (7.11)
Also we have
μ₀ε₀ = 1/c².   (7.12)
We will encounter this relation later again. Since the quantities c and ε0 have a
defined magnitude, so does μ0 from (7.12).
We often make an issue of a relative magnitude of dielectric constant and
magnetic permeability of a dielectric medium. That is, relative permittivity εr and
relative permeability μr are defined as follows:

ε_r ≡ ε/ε₀,   μ_r ≡ μ/μ₀.   (7.13)
Note that both εr and μr are dimensionless quantities. Those magnitudes are equal
to 1 (in the case of vacuum) or larger than 1 (with any other dielectric media).
Equations (7.3) and (7.4) deal with the change in electric and magnetic fields with
time. Of these, (7.3) represents Faraday’s law of electromagnetic induction due to
Michael Faraday (1831). He found that when a permanent magnet was thrust into or
out of a closed circuit, a transient current flowed. Moreover, that experiment
implied that even without the closed circuit, an electric field was generated in the
space whose position changed relative to the permanent magnet. Equation (7.3)
is easier to understand if it is rewritten as follows:
rot E = −∂B/∂t.   (7.14)
That is, the electric field E is generated in such a way that the induced electric
field (or induced current) tends to lessen the change in magnetic flux (or magnetic
field) produced by the permanent magnet (Lenz’s law). The minus sign in RHS
indicates that effect.
The rot operator appearing in (7.3) and (7.4) is defined by
rot V = ∇ × V = det[ e₁ e₂ e₃ ; ∂/∂x ∂/∂y ∂/∂z ; V_x V_y V_z ]
  = (∂V_z/∂y − ∂V_y/∂z)e₁ + (∂V_x/∂z − ∂V_z/∂x)e₂ + (∂V_y/∂x − ∂V_x/∂y)e₃.   (7.15)
The operator ∇ has already appeared in (3.9). This operator transforms a vector to
a vector. Let us think of the meaning of the rot operator. Suppose that there is a
vector field that varies with time and spatial positions. Suppose also at some instant
the spatial distribution of the field varies as shown in Fig. 7.1, where a spiral vector
field V is present. For a z-component of rot V around the origin, we have
(rot V)_z = ∂V_y/∂x − ∂V_x/∂y = lim_{Δx→0, Δy→0} [((V₁)_y − (V₃)_y)/Δx − ((V₂)_x − (V₄)_x)/Δy].
In the case of Fig. 7.1, (V₁)_y − (V₃)_y > 0 and (V₂)_x − (V₄)_x < 0 and, hence, we find
that rot V has a positive z-component. If V_z = 0 and ∂V_y/∂z = ∂V_x/∂z = 0, we find from
(7.15) that rot V possesses only the z-component. The equation ∂V_y/∂z = ∂V_x/∂z = 0
implies that the vector field V is uniform in the direction of the z-axis. Thus, under
the above conditions the spiral vector field V is accompanied by the rot V vector field
that is directed toward the upper side of the plane of paper (i.e., the positive direction
of the z-axis).
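As a small illustration (not from the text; it assumes NumPy), the sketch below evaluates the z-component of rot V for a simple spiral field by central differences, reproducing the positive value the discussion above describes.

```python
# The spiral field V = (-y, x, 0) has rot V = (0, 0, 2): a positive z-component,
# as suggested by the discussion around Fig. 7.1. Finite differences mimic (7.15).
import numpy as np

def V(x, y, z):
    return np.array([-y, x, 0.0])

h = 1e-5
dVy_dx = (V(h, 0, 0)[1] - V(-h, 0, 0)[1]) / (2*h)
dVx_dy = (V(0, h, 0)[0] - V(0, -h, 0)[0]) / (2*h)
print("(rot V)_z ≈", dVy_dx - dVx_dy)   # -> 2.0
```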
Equation (7.4) can be rewritten as
7.1 Maxwell’s Equations and Their Characteristics 281
( x/2, 0) O x
( x/2, 0)
z
V3 V4
(0, y/2)
rot H = i + ∂D/∂t.   (7.16)
Notice that ∂D/∂t has the same dimension [A/m²] as the current density i and is called the displacement current.
Without this term, we have
rot H = i: ð7:17Þ
This relation is well known as Ampère's law or Ampère's circuital law (André-Marie Ampère: 1827), which determines the magnetic field yielded by a stationary current. Again with the aid of Fig. 7.1, (7.17) implies that the current i produces a spiral magnetic field.
Now, let us think of a change in amount of charges with time in a part of three-
dimensional closed space V surrounded by a closed surface S. It is given by
(d/dt) ∫_V ρ dV = ∫_V ∂ρ/∂t dV = −∮_S i·n dS = −∫_V div i dV,   (7.18)
where n is an outward-directed normal unit vector; with the last equality we used Gauss's theorem. Gauss's theorem is described by

∫_V div i dV = ∮_S i·n dS.
Figure 7.2 shows an intuitive diagram that explains the Gauss’s theorem. The
diagram shows a cross-section of the closed space V surrounded by a surface S. In
this case, imagine a cube or a hexahedron as V. The periphery is the cross-section of
the closed surface accordingly. Arrows in the diagram schematically represent div i
on individual fragments; only those of the center infinitesimal fragment are shown
with solid lines. The arrows of adjacent fragments cancel out each other and only the
components on the periphery are non-vanishing. Thus, the volume integration of
div i is converted to the surface integration of i. Readers are referred to appropriate
literature with the vector integration [1].
Consequently, from (7.18) we have
∫_V (∂ρ/∂t + div i) dV = 0.   (7.19)

Since the volume V can be chosen arbitrarily, we get

∂ρ/∂t + div i = 0.   (7.20)
The relation (7.20) is called a current continuity equation. This relation represents
the law of conservation of charge.
Meanwhile, taking div of both sides of (7.17), we have

div rot H = div i.   (7.21)

Here, div rot H is calculated to be
div rot H = ∂/∂x(∂H_z/∂y − ∂H_y/∂z) + ∂/∂y(∂H_x/∂z − ∂H_z/∂x) + ∂/∂z(∂H_y/∂x − ∂H_x/∂y)
  = (∂²/∂x∂y − ∂²/∂y∂x)H_z + (∂²/∂y∂z − ∂²/∂z∂y)H_x + (∂²/∂z∂x − ∂²/∂x∂z)H_y = 0.   (7.22)
With the last equality of (7.22), we used the fact that if, e.g., ∂²H_z/∂x∂y and ∂²H_z/∂y∂x are continuous and differentiable in a certain domain (x, y), then ∂²H_z/∂x∂y = ∂²H_z/∂y∂x. That is, we assume "ordinary" functions for H_z, H_x, and H_y. Thus, from (7.21) we have
div i = 0: ð7:23Þ
Combining (7.23) with (7.20), we have

∂ρ(x, t)/∂t = 0,   (7.24)
where we explicitly show that ρ depends upon both x and t. Note that x is a position
vector described as (3.5). Therefore, (7.24) shows that ρ(x, t) is temporally constant
at a position x, consistent with the stationary current.
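The identity div rot H = 0 used in (7.22) can also be verified symbolically. The sketch below (an illustration, not from the text; it assumes SymPy) checks it for arbitrary smooth components.

```python
# Symbolic check of div(rot H) = 0, assuming sufficiently smooth ("ordinary") H.
import sympy as sp

x, y, z = sp.symbols('x y z')
Hx, Hy, Hz = [sp.Function(name)(x, y, z) for name in ('Hx', 'Hy', 'Hz')]
rot = (sp.diff(Hz, y) - sp.diff(Hy, z),
       sp.diff(Hx, z) - sp.diff(Hz, x),
       sp.diff(Hy, x) - sp.diff(Hx, y))
div_rot = sp.diff(rot[0], x) + sp.diff(rot[1], y) + sp.diff(rot[2], z)
print(sp.simplify(div_rot))   # -> 0
```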
Nevertheless, we encounter a problem when ρ(x, t) is temporally varying. In other
words, (7.17) goes against the charge conservation law, when ρ(x, t) is temporally
varying. It was James Clerk Maxwell (1861–1862) that solved the problem by
introducing a concept of the displacement current. In fact, taking div of both sides
of (7.16), we have
div rot H = div i + ∂(div D)/∂t = div i + ∂ρ/∂t = 0,   (7.25)
where with the first equality we exchanged the order of differentiations with respect
to t and x; with the second equality we used (7.1). The last equality of (7.25) results
from (7.20). In other words, by virtue of the term ∂D/∂t, (7.4) is consistent with the
charge conservation law. Thus, the set of Maxwell's Eqs. (7.1)–(7.4) has supplied us
with a well-established basis of natural science up until the present.
Although this set of equations describes spatial and temporal changes in
electric and magnetic fields in vacuum and in matter including metals, in Part II we
confine ourselves to the changes in the electric and magnetic fields in a uniform
dielectric medium.
7.2 Equation of Wave Motion

If we further confine ourselves to the case where neither electric charge nor electric
current is present in a uniform dielectric medium, we can readily obtain equations of
wave motion for the electric and magnetic fields. That is,
div D = 0, ð7:26Þ
div B = 0, ð7:27Þ
rot E + ∂B/∂t = 0,   (7.28)

rot H − ∂D/∂t = 0.   (7.29)
The relations (7.27) and (7.28) are identical to (7.2) and (7.3), respectively.
Let us start with a formula of vector analysis. First, we introduce a grad operator.
We have
grad f = ∇f = (∂f/∂x)e₁ + (∂f/∂y)e₂ + (∂f/∂z)e₃.
That is, the grad operator transforms a scalar to a vector. We have the following formula:

rot rot V = grad div V − ∇²V.   (7.30)

In fact, for the x-component we have

[rot rot V]_x = ∂/∂y(∂V_y/∂x − ∂V_x/∂y) − ∂/∂z(∂V_x/∂z − ∂V_z/∂x),   (7.31)
[grad div V − ∇²V]_x = ∂/∂x(∂V_x/∂x + ∂V_y/∂y + ∂V_z/∂z) − ∂²V_x/∂x² − ∂²V_x/∂y² − ∂²V_x/∂z²
  = ∂/∂x(∂V_y/∂y + ∂V_z/∂z) − ∂²V_x/∂y² − ∂²V_x/∂z².   (7.32)
Again assuming that ∂²V_y/∂y∂x = ∂²V_y/∂x∂y and ∂²V_z/∂z∂x = ∂²V_z/∂x∂z, we find that (7.31) coincides with (7.32); the same holds for the y- and z-components, which confirms (7.30). Taking rot of both sides of (7.28) and using (7.30), we have
rot rot E + rot(∂B/∂t) = grad div E − ∇²E + ∂(rot B)/∂t = −∇²E + με ∂²E/∂t² = 0,   (7.34)
where with the first equality we used (7.30) and for the second equality we used
(7.7), (7.10), and (7.29). Thus we have
∇²E = με ∂²E/∂t².   (7.35)

Similarly, we have

∇²H = με ∂²H/∂t².   (7.36)
Equations (7.35) and (7.36) are called the equations of wave motions for the
electric and magnetic fields.
To consider implications of these equations, let us think of for simplicity a
following equation in a one-dimensional space.
∂²y(x, t)/∂x² = (1/v²) ∂²y(x, t)/∂t²,   (7.37)
where y is an arbitrary scalar function that depends on x and t; v is a constant. Let f(x,
t) and g(x, t) be arbitrarily chosen functions. Then, f(x - vt) and g(x + vt) are two
solutions of (7.37). In fact, putting X = x - vt, we have
∂f/∂x = (∂f/∂X)(∂X/∂x) = ∂f/∂X,   ∂²f/∂x² = ∂/∂X(∂f/∂X)(∂X/∂x) = ∂²f/∂X²,   (7.38)

∂f/∂t = (∂f/∂X)(∂X/∂t) = (−v) ∂f/∂X,   ∂²f/∂t² = (−v) ∂/∂X(∂f/∂X)(∂X/∂t) = (−v)² ∂²f/∂X².   (7.39)
Deleting ∂2f/∂X2 from the second equation of (7.38) and the second equation of
(7.39), we recover
∂²f/∂x² = (1/v²) ∂²f/∂t².   (7.40)
Similarly, we get

∂²g/∂x² = (1/v²) ∂²g/∂t².   (7.41)

Thus, the general solution of (7.37) is described by the superposition

y(x, t) = f(x − vt) + g(x + vt).   (7.42)
The implication of (7.42) is as follows: (i) The function f(x - vt) can be obtained
by parallel translation of f(x) by vt in a positive direction of x-axis. In other words,
f(x - vt) is obtained by translating f(x) by v in a unit of time in a positive direction of
x-axis, or the function represented by f(x) is translated at a rate of v with its form
unchanged in time. (ii) The function g(x + vt), however, is translated at a rate of -v
with its form unchanged in time as well. (iii) Thus, y(x, t) of (7.42) represents two
“waves,” i.e., a forward wave and a backward wave. Propagation velocity of the two
waves is jvj accordingly. Usually we choose a positive number for v and v is called a
phase velocity.
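For concreteness, the following minimal sketch (not part of the text; the Gaussian profile is an arbitrary illustrative choice and NumPy is assumed) checks numerically that f(x − vt) satisfies (7.37).

```python
# A forward-travelling profile f(x - v t) satisfies the 1D wave equation (7.37).
import numpy as np

v, h = 2.0, 1e-4
f = lambda s: np.exp(-s**2)          # any smooth profile
y = lambda x, t: f(x - v*t)          # forward wave

x0, t0 = 0.3, 0.7
d2y_dx2 = (y(x0 + h, t0) - 2*y(x0, t0) + y(x0 - h, t0)) / h**2
d2y_dt2 = (y(x0, t0 + h) - 2*y(x0, t0) + y(x0, t0 - h)) / h**2
print(d2y_dx2, d2y_dt2 / v**2)       # the two values agree
```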
Comparing (7.35) and (7.36) with (7.41), we have
με = 1/v².   (7.43)
where f̃ shows the change in the functional form according to the variable transformation. In (7.45), we have
kv = kλν = (2π/λ)λν = 2πν = ω,   (7.46)
where ν and ω are said to be frequency and angular frequency, respectively. For a
three-dimensional wave f of a scalar function, we have a following form:
∇²f = (1/v²) ∂²f/∂t².   (7.49)
Substituting a plane wave of the form f = A e^{i(k·x − ωt)} into (7.49), we have

A(−k²) e^{i(k·x − ωt)} = (1/v²) A(−ω²) e^{i(k·x − ωt)}.   (7.50)
k²v² = ω²   or   kv = ω.   (7.51)
k = kn = (2π/λ)n,   n = (e₁ e₂ e₃)(n_x, n_y, n_z)ᵀ,   (7.52)
where nx, ny, and nz define direction cosines. Then (7.47) can be rewritten as
where an exponent is called a phase. Suppose that the phase is fixed at zero. That is,
kn·x − ωt = 0.   (7.54)

Dividing both sides by k and using (7.51), we have

n·x − vt = 0.   (7.55)
n ≈ √ε_r.   (7.57)
In (7.58) and (7.59), E0 and H0 are constant vectors and may take a complex
magnitude. Substituting (7.58) for (7.28) and using (7.10) as well as (7.43) and
(7.46), we have
Comparing coefficients of the exponential functions of the first and last sides,
we get
H₀ = (n × E₀)/√(μ/ε).   (7.60)

E₀ = √(μ/ε) H₀ × n.   (7.61)

n·E₀ = n·H₀ = 0.   (7.62)
This indicates that E and H are both perpendicular to n, i.e., the propagation
direction of the electromagnetic wave. Thus, the electromagnetic wave is character-
ized by a transverse wave. The fields E and H have the same phase on P at an
arbitrary given time. Taking account of (7.60)–(7.62), E, H, and n are mutually
perpendicular to one another. We depict a geometry of E, H, and n for the
electromagnetic plane wave in Fig. 7.4, where a plane P is perpendicular to n.
We find that (7.60) is not independent of (7.61). In fact, taking an outer product
from the right with respect to both sides of (7.60), we have
H₀ × n = (n × E₀) × n/√(μ/ε) = E₀/√(μ/ε),   i.e.,   (n × E₀) × n = E₀.   (7.63)

Here we have used the relation that holds for an arbitrary vector A:

A = n(n·A) + n × (A × n).
This relation means that A can be decomposed into a component parallel to n and
that perpendicular to n. Equation (7.63) shows that E0 has no component parallel to
n. This is another confirmation that E is perpendicular to n.
In (7.60) and (7.61), √(μ/ε) has a dimension of [Ω]. Make sure that this can be confirmed by (7.9) and (7.11). Hence, √(μ/ε) is said to be the characteristic impedance [2]. We denote it by

Z ≡ √(μ/ε).   (7.64)
Thus, we have
H₀ = (n × E₀)/Z.
In vacuum we have Z₀ = √(μ₀/ε₀) ≈ 376.7 [Ω].
For the electromagnetic wave to be the transverse wave means that neither E nor
H has component along n. Choosing the positive direction of the z-axis for n and
ignoring components related to partial differentiation with respect to x and y (i.e., the
component related to ∂/∂x and ∂/∂y), we rewrite (7.28) and (7.29) for individual
Cartesian coordinates as
−∂E_y/∂z + ∂B_x/∂t = 0   or   −∂D_y/∂z + με ∂H_x/∂t = 0,

∂E_x/∂z + ∂B_y/∂t = 0   or   ∂D_x/∂z + με ∂H_y/∂t = 0,

−∂H_y/∂z − ∂D_x/∂t = 0   or   −∂B_y/∂z − με ∂E_x/∂t = 0,

∂H_x/∂z − ∂D_y/∂t = 0   or   ∂B_x/∂z − με ∂E_y/∂t = 0.   (7.65)
Differentiating the first equation of (7.65) with respect to z, we have

−∂²E_y/∂z² + ∂²B_x/∂z∂t = 0.
Also differentiating the fourth equation of (7.65) with respect to t and multiplying
both sides by -μ, we have
−μ ∂²H_x/∂t∂z + μ ∂²D_y/∂t² = 0.
Summing both sides of the above equations and arranging terms, we get
∂²E_y/∂z² = με ∂²E_y/∂t².
Similarly, we get

∂²E_x/∂z² = με ∂²E_x/∂t².
In the same way, we also obtain

∂²H_x/∂z² = με ∂²H_x/∂t²   and   ∂²H_y/∂z² = με ∂²H_y/∂t².
From the above relations, we have two plane electromagnetic waves polarized along either the x-axis or the y-axis.
What is implied in the above description is that as solutions of (7.35) and (7.36)
we have a plane wave characterized by a specific direction defined by E0 and H0.
This implies that if we observe the electromagnetic wave at a fixed point, both E and
H oscillate along the mutually perpendicular directions E0 and H0. Hence, we say
that the “electric wave” is polarized in the direction E0 and that the “magnetic wave”
is polarized in the direction H0.
To characterize the polarization of the electromagnetic wave, we introduce
following unit polarization vectors εe and εm (with indices e and m related to the
electric and magnetic field, respectively) [3]:
where E0 and H0 are said to be amplitude and may be again complex. We have
εe × εm = n: ð7:67Þ
We call εe and εm a unit polarization vector of the electric field and magnetic field,
respectively. As noted above, εe, εm, and n constitute a right-handed system in this
order and are mutually perpendicular to one another.
The phase of E in the plane wave (7.58) and that of H in (7.59) are individually
the same on all the points of P. From the wave equations of (7.35) and (7.36),
however, it is unclear whether E and H have the same phase. Suppose that E and H
would have a different phase such that
E = E0 eiðkx - ωtÞ
and
where H~0 ð= H 0 eiδ Þ is complex and a phase factor eiδ in the exponent is unknown.
This factor, however, can be set at zero. To show this, let us make qualitative
discussion using Fig. 7.5. Figure 7.5 shows the electric field of the plane wave at
some instant as a function of phase Φ. Suppose that Φ is taken in the direction of n in
Fig. 7.4. Then, from (7.53) we have
Φ = kn x - ωt = kn ρn - ωt = kρ - ωt,
Suppose furthermore that the phase is measured at t = 0 and that the electric field
is measured along the εe direction. Then, we have
Fig. 7.5 Electric field of the plane wave as a function of the phase Φ; the sinusoidal curve representing the electric field is shifted with time from left to right with its form unchanged
where E1 (>0) and E2 (>0) are amplitudes and e1 and e2 represent unit polarization
vectors in the direction of positive x-axis and y-axis; we assume that two waves are
being propagated in the direction of the positive z-axis; δ is a phase difference. The
total electric field E is described as the superposition of E1 and E2 such that
E_y = (E₂/E₁) E_x.   (7.71)

When δ = ±π, in turn, we have

E_y = −(E₂/E₁) E_x.
This gives a straight line as well. Therefore, if we wish to seek the relationship
between Ex and Ey, it suffices to examine it as a function of δ in a region of
- π2 ≤ δ ≤ π2.
(i) Case I: E1 ≠ E2.
Let us consider the case where δ ≠ 0 in (7.70). Rewriting the second equation of
(7.70) and inserting the first equation into it so that we can eliminate t, we have
E_y = E₂(cos ωt cos δ + sin ωt sin δ) = E₂[(E_x/E₁) cos δ ± sin δ √(1 − E_x²/E₁²)].

That is,

E_y/E₂ − (cos δ)(E_x/E₁) = ±(sin δ) √(1 − E_x²/E₁²).   (7.72)

Squaring both sides and arranging terms, we get

E_x²/(E₁² sin²δ) − 2(cos δ)E_xE_y/(E₁E₂ sin²δ) + E_y²/(E₂² sin²δ) = 1.   (7.73)

In matrix form, (7.73) reads

(E_x E_y) ( 1/(E₁² sin²δ)   −cos δ/(E₁E₂ sin²δ) ; −cos δ/(E₁E₂ sin²δ)   1/(E₂² sin²δ) ) (E_x, E_y)ᵀ = 1.   (7.74)
Note that the above matrix is real symmetric. In that case, to examine properties
of the matrix we calculate its determinant along with principal minors. The principal
minor means a minor with respect to a diagonal element. In this case, the two principal minors are 1/(E₁² sin²δ) and 1/(E₂² sin²δ). Also we have

det( 1/(E₁² sin²δ)   −cos δ/(E₁E₂ sin²δ) ; −cos δ/(E₁E₂ sin²δ)   1/(E₂² sin²δ) ) = 1/(E₁²E₂² sin²δ).   (7.75)
Evidently, two principal minors as well as a determinant are all positive (δ ≠ 0). In
this case, the (2, 2) matrix of (7.74) is said to be positive definite. The related
discussion will be given in Part III. The positive definiteness means that in a
quadratic form described by (7.74), LHS takes a positive value for any real number
Ex and Ey except a unique case where Ex = Ey = 0, which renders LHS zero. The
positive definiteness of a matrix ensures the existence of positive eigenvalues with
the said matrix.
Let us consider a real symmetric (2, 2) matrix that has positive principal minors and a positive determinant in a general case. Let such a matrix M be

M = ( a  c ; c  b ),
where a, b > 0 and detM > 0; i.e., ab - c2 > 0. Let a corresponding quadratic form
be Q. Then, we have
Q = (x y) ( a  c ; c  b ) (x, y)ᵀ = ax² + 2cxy + by² = a(x + cy/a)² − c²y²/a + by²
  = a[(x + cy/a)² + (ab − c²) y²/a²].
Thus, Q ≥ 0 for any real numbers x and y. We seek a condition under which
Q = 0. We readily find that with M that has the above properties, only x = y = 0
makes Q = 0. Thus, M is positive definite. We will deal with this issue from a more
general standpoint in Part III.
In general, it is pretty complicated to seek eigenvalues and corresponding eigen-
vectors in the above case. Yet, we can extract important information from (7.74).
The eigenvalues λ of the matrix that appeared in (7.74) are estimated as follows:
λ = [E₁² + E₂² ± √((E₁² + E₂²)² − 4E₁²E₂² sin²δ)] / (2E₁²E₂² sin²δ).   (7.76)
Note that

(E₁² − E₂²)² + 4E₁²E₂² cos²δ > 0   (δ ≠ ±π/2).

Also we have

(E₁² + E₂²)² > (E₁² + E₂²)² − 4E₁²E₂² sin²δ.
These clearly show that the quadratic form of (7.74) gives an ellipse (i.e.,
elliptically polarized light). Because of the presence of the second term of LHS of
(7.73), both the major and minor axes of the ellipse are tilted and diverted from the x-
and y-axes.
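As an illustration (not part of the text; NumPy assumed), the sketch below checks for sample amplitudes and phase that the matrix of (7.74) is positive definite and that its eigenvalues agree with (7.76).

```python
# Eigenvalues of the quadratic-form matrix in (7.74) versus formula (7.76).
import numpy as np

E1, E2, delta = 1.0, 0.5, 0.6
s2 = np.sin(delta)**2
M = np.array([[1.0/(E1**2*s2), -np.cos(delta)/(E1*E2*s2)],
              [-np.cos(delta)/(E1*E2*s2), 1.0/(E2**2*s2)]])
disc = np.sqrt((E1**2 + E2**2)**2 - 4*E1**2*E2**2*s2)
lam = np.array([E1**2 + E2**2 - disc, E1**2 + E2**2 + disc]) / (2*E1**2*E2**2*s2)
print("numpy eigenvalues:", np.sort(np.linalg.eigvalsh(M)))
print("formula (7.76):   ", np.sort(lam))   # both positive, identical
```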
Let us inspect the ellipse described by (7.74). Inserting Ex = E1 obtained at t = 0
in (7.70) into (7.73) and solving a quadratic equation with respect to Ey, we get Ey as
a double root such that
E_y = E₂ cos δ.

Similarly, inserting E_y = E₂ into (7.73), we get E_x as a double root such that

E_x = E₁ cos δ.

Thus,
we find that at t = 0 the electric field is represented by the point P (Ex = E1,
Ey = E2 cos δ); see Fig. 7.6a. From (7.70), if δ > 0, P traces the ellipse counter-
clockwise. It reaches a maximum point of Ey = E2 at t = δ/2ω. Since the trace of
electric field forms an ellipse as in Fig. 7.6, the associated light is said to be an
elliptically polarized light. If δ < 0 in (7.70), however, P traces the ellipse clockwise.
In a special case of δ = π/2, the second term of (7.73) vanishes and we have a
simple form described as
E_x²/E₁² + E_y²/E₂² = 1.   (7.77)
Thus, the principal axes of the ellipse coincide with the x- and y-axes. On the basis
of (7.70), we see from Fig. 7.6b that starting from P at t = 0, again the coordinate
point representing the electric field traces the ellipse counter-clockwise with time;
see the curved arrow of Fig. 7.6b. If δ < 0, the coordinate point traces the ellipse
clockwise with time.
(ii) Case II: E₁ = E₂.

In this case, (7.73) is rewritten as

(E_x E_y) ( 1  −cos δ ; −cos δ  1 ) (E_x, E_y)ᵀ = E₁² sin²δ.   (7.79)
The eigenvalues of the above matrix are

λ = 1 ± |cos δ|.   (7.80)

Setting −π/2 ≤ δ ≤ π/2, we have

λ = 1 ± cos δ.   (7.81)
As the corresponding eigenvectors, we can take

v₁ = (1/√2, −1/√2)ᵀ   and   v₂ = (1/√2, 1/√2)ᵀ.   (7.82)
Arranging v₁ and v₂ as column vectors, we construct an orthogonal matrix P such that

P = ( 1/√2  1/√2 ; −1/√2  1/√2 ).   (7.83)
Putting

A = ( 1  −cos δ ; −cos δ  1 ),   (7.84)
we obtain
P⁻¹AP = ( 1 + cos δ  0 ; 0  1 − cos δ ).   (7.85)
Notice that eigenvalues (1 + cos δ) and (1 - cos δ) are both positive as expected.
Rewriting (7.79), we have
(E_x E_y) PP⁻¹ ( 1  −cos δ ; −cos δ  1 ) PP⁻¹ (E_x, E_y)ᵀ
  = (E_x E_y) P ( 1 + cos δ  0 ; 0  1 − cos δ ) P⁻¹ (E_x, E_y)ᵀ = E₁² sin²δ.   (7.86)
Here we define new variables u and v such that

(u, v)ᵀ ≡ P⁻¹ (E_x, E_y)ᵀ.   (7.87)
Meanwhile, we have

(e₁ e₂)(E_x, E_y)ᵀ = (e₁ e₂)PP⁻¹(E_x, E_y)ᵀ = (e₁′ e₂′)(u, v)ᵀ,   (7.88)
(e₁′ e₂′) = (e₁ e₂)P = ((1/√2)e₁ − (1/√2)e₂   (1/√2)e₁ + (1/√2)e₂).   (7.89)
The coordinate system is depicted in Fig. 7.7 along with the basis vectors. The
relevant discussion will again appear in Part. III.
Then, (7.86) is rewritten as

u²/[E₁²(1 − cos δ)] + v²/[E₁²(1 + cos δ)] = 1.   (7.90)
Equation (7.90) indicates that a major axis and minor axis are E₁√(1 + cos δ) and E₁√(1 − cos δ), respectively. When δ = ±π/2, (7.90) becomes
u²/E₁² + v²/E₁² = 1.   (7.91)
This represents a circle. For this reason, the wave described by (7.91) is called a
circularly polarized light. In (7.90) where δ ≠ ± π/2, the wave is said to be an
elliptically polarized light. Thus, we have linearly, elliptically, and circularly polar-
ized lights depending on a magnitude of δ.
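The diagonalization (7.83)–(7.85) and the resulting semi-axes of (7.90) can be reproduced with a few lines of code. The sketch below is illustrative only (NumPy assumed).

```python
# Diagonalizing A of (7.84) with P of (7.83) reproduces (7.85); the semi-axes
# of the ellipse (7.90) follow directly.
import numpy as np

delta, E1 = 0.9, 1.0
A = np.array([[1.0, -np.cos(delta)], [-np.cos(delta), 1.0]])
P = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2.0)
print(np.linalg.inv(P) @ A @ P)        # diag(1 + cos(delta), 1 - cos(delta))
print("semi-axes:", E1*np.sqrt(1.0 - np.cos(delta)), E1*np.sqrt(1.0 + np.cos(delta)))
```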
Let us closely examine characteristics of the elliptically and circularly polarized
lights in the case of E1 = E2. When t = 0, from (7.70) we have
√(E_x² + E_y²) = √2 E₁ cos(δ/2) = E₁√(1 + cos δ).   (7.93)
E_x = E₁ cos(−ωt)   and   E_y = E₁ cos(−ωt ± π/2).   (7.95)
counter-clockwise. In this situation, we see the light from above the z-axis. In other
words, we are viewing the light against the direction of its propagation. The wave is
said to be left-circularly polarized and have positive helicity. In contrast, when
δ = - π/2, starting from Point C1 the electric field traces the circle clockwise.
That light is said to be right-circularly polarized and have negative helicity.
With the left-circularly polarized light, (7.69) can be rewritten as
To normalize them, it is convenient to use the following vectors as in the case of Sect. 4.3 [3]:
e₊ ≡ (1/√2)(e₁ + ie₂)   and   e₋ ≡ (1/√2)(e₁ − ie₂).   (4.45)
In the case of δ = 0, we have a linearly polarized light. For this, the points A1, A2,
and A3 coalesce to be a point on a straight line of Ey = Ex.
References
1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
2. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
3. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York
Chapter 8
Reflection and Transmission of Electromagnetic Waves in Dielectric Media
Fig. 8.1 A small rectangle S that strides an interface formed by two semi-infinite dielectric media of D1 and D2. Let a curve C be a closed loop surrounding the rectangle S. A unit vector n is directed to a normal of S. Unit vectors t1 and t2 are directed to a tangential line of the interface plane
Applying (7.28) to the small rectangle S of Fig. 8.1, we have

∫_S rot E·n dS + ∫_S (∂B/∂t)·n dS = 0,   (8.1)

∮_C E·dl + (∂B/∂t)·n ΔlΔh = 0.   (8.2)
With the line integral of the first term, C is a closed loop surrounding the rectangle
S and dl = tdl, where t is a unit vector directed toward the tangential direction of
C (see t1 and t2 in Fig. 8.1). The line integration is performed such that C is followed
counterclockwise in the direction of t.
Figure 8.2 gives an intuitive diagram that explains Stokes’ theorem. The diagram
shows an overview of a surface S encircled by a closed curve C. Suppose that we
have a spiral vector field E represented by arrowed circles as shown. In that case,
rot E is directed toward the upper side of the plane of paper in the individual
fragments. A summation of rot E ndS forms a surface integral covering S.
Meanwhile, the arrows of adjacent fragments cancel out each other and only the
components on the periphery (i.e., the curve C) are non-vanishing (see Fig. 8.2).
Thus, the surface integral of rot E is equivalent to the line integral of E. Accordingly, we get Stokes' theorem described by [1]

∫_S rot E·n dS = ∮_C E·dl.   (8.3)

Taking the limit Δh → 0 in (8.2), the second term vanishes, and we are left with

∮_C E·dl = 0.
ΔlðE1 t1 þ E2 t2 Þ = 0,
where E1 and E2 represent the electric field in the dielectrics D1 and D2 close to the
interface, respectively. Considering t2 = - t1 and putting t1 = t, we get
ðE1 - E2 Þ t = 0, ð8:4Þ
where t represents a unit vector in the direction of a tangential line of the interface
plane. Equation (8.4) means that the tangential components of the electric field are
continuous on both sides of the interface. We obtain a similar result with the
magnetic field. This can be shown by taking a surface integral of both sides of
(7.29) as well. As a result, we get
ðH 1 - H 2 Þ t = 0, ð8:5Þ
where H1 and H2 represent the magnetic field in D1 and D2 close to the interface,
respectively. Hence, from (8.5) the tangential components of the magnetic field are
continuous on both sides of the interface as well.
where Fi, Fr, and Ft denote an amplitude of the field; εi, εr, and εt represent a unit
vector of polarization direction, i.e., the direction along which the field oscillates;
and ki, kr, and kt are wavenumber vectors such that ki ⊥ εi, kr ⊥ εr, and kt ⊥ εt. These
wavenumber vectors represent the propagation directions of individual waves. In
(8.6)–(8.8), indices of i, r, and t stand for incidence, reflection, and transmission,
respectively.
Let xs be an arbitrary position vector at the interface between the dielectrics. Also,
let t be a unit vector paralleling the interface. Thus, tangential components of the
field are described as
Note that F_it and F_rt represent the field in D1 just close to the interface and that F_tt denotes the field in D2 just close to the interface. Thus, in light of (8.4) and (8.5), we have

F_it + F_rt = F_tt.   (8.12)
Notice that (8.12) holds with any position xs and any time t.
Let us think of elementary calculation of exponential functions or exponential
polynomials and the relationship between individual coefficients and exponents.
With respect to the two functions e^{ikx} and e^{ik′x}, we have two alternatives according to the value the Wronskian takes. Here, the Wronskian W is expressed as
W = det( e^{ikx}  e^{ik′x} ; (e^{ikx})′  (e^{ik′x})′ ) = −i(k − k′) e^{i(k+k′)x}.   (8.13)
(i) W ≠ 0 if and only if k ≠ k′. In this case, eikx and eik x are said to be linearly
independent. That is, on condition of k ≠ k′, for any x we have
0
aeikx þ beik x = 0 ⟺ a = b = 0: ð8:14Þ
Notice that eikx never vanishes with any x. To conclude, if we think of an equation
of an exponential polynomial
a e^{ikx} + b e^{ik′x} = 0,
where W ≠ 0 if and only if k1 ≠ k2, k2 ≠ k3, and k3 ≠ k1. That is, on this condition for
any x we have
If the three exponential functions are linearly dependent, at least two of k1, k2, and
k3 are equal to each other and vice versa. On this condition, again consider a
following equation of an exponential polynomial:
a + b + c = 0.   (8.19)

Now, let us apply the above argument to (8.12). Rewriting it, we have

F_i(t·ε_i) e^{i(k_i·x_s − ωt)} + F_r(t·ε_r) e^{i(k_r·x_s − ωt)} − F_t(t·ε_t) e^{i(k_t·x_s − ωt)} = 0.   (8.20)
Again, (8.20) must hold with any position xs and any time t. Meanwhile, for
(8.20) to have a physical meaning, we should have
F_i(t·ε_i) ≠ 0,   F_r(t·ε_r) ≠ 0,   and   F_t(t·ε_t) ≠ 0.   (8.21)
On the basis of the above consideration, we must have following two relations:
k_i·x_s − ωt = k_r·x_s − ωt = k_t·x_s − ωt

or

k_i·x_s = k_r·x_s = k_t·x_s,   (8.22)

and

F_i(t·ε_i) + F_r(t·ε_r) − F_t(t·ε_t) = 0

or

F_i(t·ε_i) + F_r(t·ε_r) = F_t(t·ε_t).   (8.23)
In this way, we are able to obtain a relation among amplitudes of the fields of
incidence, reflection, and transmission. Notice that we get both the relations between
exponents and coefficients at once.
First, let us consider (8.22). Suppose that the incident light (ki) is propagated in a
dielectric medium D1 in parallel to the zx-plane and that the interface is the xy-plane
(see Fig. 8.3). Also suppose that at the interface the light is reflected partly back to
D1 and transmitted (or refracted) partly into another dielectric medium D2. In
Fig. 8.3, ki, kr, and kt represent the incident, reflected, and transmitted lights that
make an angle θ, θ′, and ϕ with the z-axis, respectively. Then we have
k_i = (e₁ e₂ e₃)(k_i sin θ, 0, −k_i cos θ)ᵀ,   (8.24)

x_s = (e₁ e₂ e₃)(x, y, 0)ᵀ,   (8.25)

k_i·x_s = k_i x sin θ,   (8.26)

k_r·x_s = k_rx x + k_ry y,   (8.27)

k_t·x_s = k_tx x + k_ty y,   (8.28)
where ki = j kij; krx and kry are x and y components of kr; similarly k tx and kty are
x and y components of kt.
Since (8.22) holds with any x and y, we have

k_i sin θ = k_rx = k_tx,   (8.29)

k_ry = k_ty = 0.   (8.30)
From (8.30) neither kr nor kt has a y component. This means that ki, kr, and kt are
coplanar. That is, the incident, reflected, and transmitted waves are all parallel to the
zx-plane. Notice that at the beginning we did not assume the coplanarity of those
waves. We did not assume the equality of θ and θ′ either (vide infra). From (8.29)
and Fig. 8.3, however, we have

k_i sin θ = k_r sin θ′ = k_t sin ϕ,   (8.31)

where k_r = |k_r| and k_t = |k_t|; θ′ and ϕ are said to be a reflection angle and a refraction angle, respectively. Thus, the end points of k_i, k_r, and k_t are connected on a straight line that parallels the z-axis. Figure 8.3 clearly shows it.
Now, we suppose that a wavelength of the electromagnetic wave in D1 is λ1 and
that in D2 is λ2. Since the incident light and reflected light are propagated in D1,
we have
k i = kr = 2π=λ1 : ð8:32Þ
Combining (8.31) with (8.32), we get sin θ = sin θ′, namely

θ = θ′.   (8.34)
This implies that the components tangential to the interface of ki, kr, and kt are
the same.
Meanwhile, we have
kt = 2π=λ2 : ð8:36Þ
Also we have
c = λ0 ν, v1 = λ1 ν, v2 = λ2 ν, ð8:37Þ
where v1 and v2 are phase velocities of light in D1 and D2, respectively. Since ν is
common to D1 and D2, we have
or

sin θ/sin ϕ = k_t/k_i = λ₁/λ₂ = n₂/n₁ (≡ n),   (8.39)

where n₁ and n₂ are the refractive indices of D1 and D2, respectively, and n is the relative refractive index. This relation is well known as Snell's law.
8.3 Transverse Electric (TE) Waves and Transverse Magnetic (TM) Waves

On the basis of the above argument, we are now in a position to determine the
relations among amplitudes of the electromagnetic fields of waves of incidence,
reflection, and transmission. Notice that since we are dealing with non-absorbing
media, the relevant amplitudes are real (i.e., positive or negative). In other words,
when the phase is retained upon reflection, we have a positive amplitude due to
ei0 = 1. When the phase is reversed upon reflection, on the other hand, we will be
treating a negative amplitude due to eiπ = - 1. Nevertheless, when we consider the
total reflection, we deal with a complex amplitude (vide infra).
We start with the discussion of the vertical incidence of an electromagnetic wave
before the general oblique incidence. In Fig. 8.4a, we depict electric fields E and
magnetic fields H obtained at a certain moment near the interface. We index, e.g., Ei,
for the incident field. There we define unit polarization vectors of the electric field εi,
εr, and εt as identical to be e1 (a unit vector in the direction of the x-axis). In (8.6), we
also define Fi (both electric and magnetic fields) as positive.
We have two cases about a geometry of the fields (see Fig. 8.4). The first case is
that all Ei, Er, and Et are directed in the same direction (i.e., the positive direction of
the x-axis); see Fig. 8.4a. Another case is that although Ei and Et are directed in the
Fig. 8.4 Geometry of the electromagnetic fields near the interface between dielectric media D1 and D2 in the case of vertical incidence. (a) All Ei, Er, and Et are directed in the same direction e1 (i.e., a unit vector in the positive direction of the x-axis). (b) Although Ei and Et are directed in the same direction, Er is reversed. In this case, we define Er as negative
Example 8.1 TE Wave In Fig. 8.5 we depict the geometry of oblique incidence of a
TE wave. The xy-plane defines the interface of the two dielectrics and t of (8.9) lies
on that plane. The zx-plane defines the incidence plane. In this case, E is polarized
along the y-axis with H polarized in the zx-plane. That is, regarding E, we choose
polarization direction εi, εr, and εt of the electric field as e2 (a unit vector toward the
positive direction of the y-axis that is perpendicular to the plane of paper). In Fig. 8.5,
the polarization direction of the electric field is denoted by a symbol ⨂. Therefore,
we have
e2 εi = e2 εr = e2 εt = 1: ð8:40Þ
For H we define the direction of unit polarization vectors εi, εr, and εt so that their
direction cosine relative to the x-axis can be positive; see Fig. 8.5. Choosing e1
(a unit vector in the direction of the x-axis) for t of (8.9) with regard to H, we have
Note that we have used the same symbols εi, εr, and εt to represent the unit
polarization vectors of both the electric field E and magnetic field H. In Fig. 8.5, the
unit polarization vectors of H are depicted with red arrows.
According as Hr is directed to the same direction as εr or the opposite direction to
εr, the amplitude is defined as positive or negative, as in the case of the vertical
incidence. In Fig. 8.5, we depict the case where the amplitude Hr is negative. That is,
the phase of the magnetic field is reversed upon reflection and, hence, Hr is in an
opposite direction to εr in Fig. 8.5. Note that Hi and Ht are in the same direction as εi
and εt, respectively. To avoid complication, neither Hi nor Ht is shown in Fig. 8.5.
Taking account of the above note of caution, we apply (8.23) to both E and H to
get the following relations:
Ei þ Er = Et , ð8:42Þ
E r H r < 0: ð8:44Þ
Z₁ = √(μ₁/ε₁) = E_i/H_i = −E_r/H_r,   (8.45)

Z₂ = √(μ₂/ε₂) = E_t/H_t,   (8.46)
where ε1 and μ1 are permittivity and permeability of D1, respectively, and ε2 and μ2
are permittivity and permeability of D2, respectively. As an example, we have
where we distinguish polarization vectors of electric field (εi, e) and magnetic field
(εi, m). With the notation and geometry of the fields Ei and Hi, see Fig. 7.4.
Comparing coefficients of the last relation of (8.47), we get
E i =Z 1 = H i : ð8:48Þ
On the basis of (8.42)–(8.46), we are able to decide Er, Et, Hi, Hr, and Ht.
What we wish to determine, however, is a ratio among those quantities. To this
end, dividing (8.42) and (8.43) by Ei (>0), we define following quantities:
R_E^⊥ ≡ E_r/E_i   and   T_E^⊥ ≡ E_t/E_i,   (8.49)

where R_E^⊥ and T_E^⊥ are said to be a reflection coefficient and a transmission coefficient with the electric field, respectively; the symbol ⊥ means a quantity of the TE wave (i.e., the electric field oscillating vertically with respect to the incidence plane). Thus, rewriting (8.42) and (8.43) and using R_E^⊥ and T_E^⊥, we have
R_E^⊥ − T_E^⊥ = −1,

R_E^⊥ (cos θ/Z₁) + T_E^⊥ (cos ϕ/Z₂) = cos θ/Z₁.   (8.50)
Solving (8.50), we get

R_E^⊥ = [cos θ/Z₁ − cos ϕ/Z₂] / [cos θ/Z₁ + cos ϕ/Z₂] = (Z₂ cos θ − Z₁ cos ϕ)/(Z₂ cos θ + Z₁ cos ϕ),   (8.51)

T_E^⊥ = [2 cos θ/Z₁] / [cos θ/Z₁ + cos ϕ/Z₂] = 2Z₂ cos θ/(Z₂ cos θ + Z₁ cos ϕ).   (8.52)
Similarly, defining

R_H^⊥ ≡ H_r/H_i   and   T_H^⊥ ≡ H_t/H_i,   (8.53)

where R_H^⊥ and T_H^⊥ are said to be a reflection coefficient and a transmission coefficient with the magnetic field, respectively, we get

R_H^⊥ = (Z₁ cos ϕ − Z₂ cos θ)/(Z₂ cos θ + Z₁ cos ϕ),   (8.54)

T_H^⊥ = 2Z₁ cos θ/(Z₂ cos θ + Z₁ cos ϕ).   (8.55)
In this case, rewrite (8.42) as a relation among Hi, Hr, and Ht using (8.45) and
(8.46). Derivation of (8.54) and (8.55) is left for readers. Notice also that
R_H^⊥ = −R_E^⊥.   (8.56)

H_i + H_r = H_t.   (8.58)
R_E^∥ = (Z₂ cos ϕ − Z₁ cos θ)/(Z₁ cos θ + Z₂ cos ϕ),   (8.59)

T_E^∥ = 2Z₂ cos θ/(Z₁ cos θ + Z₂ cos ϕ).   (8.60)

Also, we get

R_H^∥ = (Z₁ cos θ − Z₂ cos ϕ)/(Z₁ cos θ + Z₂ cos ϕ) = −R_E^∥,   (8.61)

T_H^∥ = 2Z₁ cos θ/(Z₁ cos θ + Z₂ cos ϕ).   (8.62)
In Examples 8.1 and 8.2, we have examined how the reflection and transmission
coefficients vary as a function of characteristic impedance as well as incidence and
refraction angles. Meanwhile, in a nonmagnetic substance, a refractive index n can
be approximated as
n ≈ √ε_r,   (7.57)
Using this relation, we can readily rewrite the reflection and transmission coef-
ficients as a function of refractive indices of the dielectrics. The derivation is left for
the readers.
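The rewriting suggested here is easy to carry out numerically. The following sketch (illustration only, not the author's code; NumPy assumed) evaluates the TE and TM coefficients of (8.51), (8.52), (8.59), and (8.60) for non-magnetic media, where Z = Z₀/n so that Z₁/Z₂ = n₂/n₁, and the refraction angle follows Snell's law (8.39).

```python
# Fresnel-type coefficients from (8.51)-(8.52) and (8.59)-(8.60) for non-magnetic media.
import numpy as np

def te_tm(n1, n2, theta):
    phi = np.arcsin(n1*np.sin(theta)/n2)      # Snell's law (8.39)
    Z1, Z2 = 1.0/n1, 1.0/n2                   # characteristic impedances up to Z0
    RE_perp = (Z2*np.cos(theta) - Z1*np.cos(phi))/(Z2*np.cos(theta) + Z1*np.cos(phi))
    TE_perp = 2*Z2*np.cos(theta)/(Z2*np.cos(theta) + Z1*np.cos(phi))
    RE_par  = (Z2*np.cos(phi) - Z1*np.cos(theta))/(Z1*np.cos(theta) + Z2*np.cos(phi))
    TE_par  = 2*Z2*np.cos(theta)/(Z1*np.cos(theta) + Z2*np.cos(phi))
    return RE_perp, TE_perp, RE_par, TE_par

print(te_tm(1.0, 1.5, np.deg2rad(30.0)))
```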
where εe and εm are unit polarization vector; we assume that both E and H are
positive. Notice again that εe, εm, and n constitute a right-handed system in this
order.
The energy transport is characterized by a Poynting vector S that is described by
S = E × H: ð8:65Þ
Since E and H have dimensions [V/m] and [A/m], respectively, S has a dimension of [W/m²].
Hence, S represents an energy flow per unit time and per unit area with respect to the
propagation direction. For simplicity, let us assume that the electromagnetic wave is
propagating toward the z-direction. Then we have
S̄ = e₃ (EH/T) ∫₀^T cos²ωt dt,   (8.68)
cos²ωt = (1 + cos 2ωt)/2,   (8.69)
S̄ = (1/2) EH e₃.   (8.70)
Equivalently, we have
S̄ = (1/2) E × H.   (8.71)
W = (1/2)(E·D + H·B),   (8.72)

where the first and second terms are pertinent to the electric and magnetic fields, respectively. Note in (8.72) that the dimension of E·D is [V/m][C/m²] = [J/m³] and that the dimension of H·B is the same.
W = (1/2)(εE² + μH²).   (8.73)
W̄ = (1/2)[(1/2)εE² + (1/2)μH²] = (1/4)εE² + (1/4)μH².   (8.74)
εE2 = μH 2 : ð8:75Þ
This implies that the energy density resulting from the electric field and that due
to the magnetic field have the same value. Thus, rewriting (8.74) we have
W̄ = (1/2)εE² = (1/2)μH².   (8.76)
S̄ = (1/2)vεE² e₃ = (1/2)vμH² e₃.   (8.78)
−R_E^⊥ R_H^⊥ + T_E^⊥ T_H^⊥ (cos ϕ/cos θ) = 1,   (8.79)

−R_E^∥ R_H^∥ + T_E^∥ T_H^∥ (cos ϕ/cos θ) = 1.   (8.80)
R ≡ −R_E R_H = R_E² = S̄_r/S̄_i,   (8.81)

where S̄_r and S̄_i are the magnitudes of the time-averaged Poynting vectors of the reflected and incident waves, respectively.
R þ T = 1: ð8:83Þ
The relation (8.83) represents the energy conservation. The factor cos ϕ/cos θ can be understood from Fig. 8.6 that depicts a luminous flux near the interface. Suppose that we have an incident wave with an irradiance I [W/m²] whose incidence plane is the zx-
plane. Notice that I has the same dimension as a Poynting vector.
Here let us think of the luminous flux that is getting through a unit area (i.e., a unit length square) perpendicular to the propagation direction of the light. Then, this flux illuminates an area on the interface of a unit length (in the y-direction) multiplied by a length of cos ϕ/cos θ (in the x-direction). That is, the luminous flux has been widened (or squeezed) by cos ϕ/cos θ times after getting through the interface. The irradiance has been weakened (or strengthened) accordingly (see Fig. 8.6). Thus, to take a balance of income and outgo with respect to the luminous flux before and after getting through the interface, the transmission irradiance must be multiplied by a factor cos ϕ/cos θ.
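The energy balance (8.79)/(8.83) is easy to confirm numerically with the coefficients obtained above. The sketch below is illustrative only (NumPy assumed).

```python
# Check of R + T = 1 with R = (R_E_perp)^2 and T = T_E_perp * T_H_perp * cos(phi)/cos(theta).
import numpy as np

n1, n2 = 1.0, 1.5
theta = np.deg2rad(40.0)
phi = np.arcsin(n1*np.sin(theta)/n2)
Z1, Z2 = 1.0/n1, 1.0/n2
den = Z2*np.cos(theta) + Z1*np.cos(phi)
RE = (Z2*np.cos(theta) - Z1*np.cos(phi))/den
TE = 2*Z2*np.cos(theta)/den
TH = 2*Z1*np.cos(theta)/den
R, T = RE**2, TE*TH*np.cos(phi)/np.cos(theta)
print(R, T, R + T)    # R + T = 1
```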
R_E^⊥ = (cos θ − n cos ϕ)/(cos θ + n cos ϕ),   (8.84)

R_E^∥ = (cos ϕ − n cos θ)/(cos ϕ + n cos θ).   (8.85)
With regard to the TE wave, we have

[numerator of (8.84)] = cos θ − n cos ϕ = cos θ − (sin θ/sin ϕ) cos ϕ = sin(ϕ − θ)/sin ϕ,   (8.86)
where with the second equality we used Snell’s law; the last equality is due to
trigonometric formula. Since we assume 0 < θ < π/2 and 0 < ϕ < π/2, we have -π/
2 < ϕ - θ < π/2. Therefore, if and only if ϕ - θ = 0, sin(ϕ - θ) = 0. Namely, only
when ϕ = θ, R⊥ E could vanish. For different dielectrics having different refractive
indices, only if ϕ = θ = 0 (i.e., a vertical incidence), we have ϕ = θ. But, in that case
we obtain
lim_{ϕ→0, θ→0} sin(ϕ − θ)/sin ϕ = 0/0.
R_E^⊥ = (1 − n)/(1 + n),   (8.87)

and similarly

R_E^∥ = (1 − n)/(1 + n).
Regarding the TM wave, in turn, we have

[numerator of (8.85)] = cos ϕ − n cos θ = cos ϕ − (sin θ/sin ϕ) cos θ = sin(ϕ − θ) cos(ϕ + θ)/sin ϕ.   (8.88)
With the last equality of (8.88), we used a trigonometric formula. From (8.86) we know that sin(ϕ − θ)/sin ϕ does not vanish. Therefore, for R_E^∥ to vanish, we need cos(ϕ + θ) = 0. Since 0 < ϕ + θ < π, cos(ϕ + θ) = 0 if and only if
ϕ þ θ = π=2: ð8:89Þ
That is, when the incidence and refraction angles θ_B and ϕ_B satisfy

ϕ_B + θ_B = π/2,   (8.90)

we have R_E^∥ = 0, i.e., we do not observe a reflected wave. The particular angle θ_B is said to be the Brewster angle. For θ_B we have

sin ϕ_B = sin(π/2 − θ_B) = cos θ_B,

so that Snell's law (8.39) gives tan θ_B = n; equivalently,

ϕ_B = tan⁻¹ n⁻¹.   (8.91)
θ̃_B = ϕ_B.   (8.93)

Thus, regarding the TM wave that is propagating in D2 after getting through the interface and is to get back to D1, θ̃_B = ϕ_B is again the Brewster angle. In this way, the said TM wave is propagated from D1 to D2 and then gets back from D2 to D1 without being reflected at the two interfaces. This conspicuous feature is often utilized for optical devices.
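As a numerical illustration (not from the text; NumPy assumed), the sketch below computes the Brewster angle θ_B = tan⁻¹(n₂/n₁) for an air–glass interface and verifies that the TM reflection coefficient (8.59) vanishes there.

```python
# Brewster angle for n1 = 1.0, n2 = 1.5: theta_B ≈ 56.3°, and R_E_par ≈ 0.
import numpy as np

n1, n2 = 1.0, 1.5
theta_B = np.arctan(n2/n1)
phi_B = np.arcsin(n1*np.sin(theta_B)/n2)
Z1, Z2 = 1.0/n1, 1.0/n2
RE_par = (Z2*np.cos(phi_B) - Z1*np.cos(theta_B))/(Z1*np.cos(theta_B) + Z2*np.cos(phi_B))
print(np.rad2deg(theta_B), np.rad2deg(phi_B), RE_par)   # phi_B + theta_B = 90 deg
```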
Next, consider the case where the light is incident from a medium with a higher refractive index onto one with a lower index (n = n₂/n₁ < 1). The critical angle θ_c, above which the incident light is totally reflected, is given by

θ_c = sin⁻¹ n.   (8.94)

In fact, setting ϕ = π/2 in Snell's law (8.39), we have

sin θ_c/sin(π/2) = sin θ_c = n₂/n₁ (≡ n).   (8.95)

Also we have

θ_c > θ_B.   (8.96)
The critical angle is always larger than the Brewster angle with TM waves.
In Sect. 8.2 we saw that Snell’s law results from the kinematical requirement. For
this reason, we may consider it as a universal relation that can be extended to complex refraction angles. In fact, for Snell's law to hold with θ > θ_c, we must have

sin ϕ = sin θ/n > 1.   (8.97)
Putting

ϕ = π/2 + ia   (a: real, a ≠ 0),   (8.98)
we have
sin ϕ ≡ (1/2i)(e^{iϕ} − e^{−iϕ}) = (1/2)(e^{−a} + e^{a}) > 1,   (8.99)

cos ϕ ≡ (1/2)(e^{iϕ} + e^{−iϕ}) = (i/2)(e^{−a} − e^{a}).   (8.100)
where k_tx and k_tz are x and z components of k_t, respectively; k_t = |k_t|. Putting cos ϕ = ib (b: real), we have

E_t = Eε_t e^{i(xk_t sin ϕ + ibzk_t − ωt)} = Eε_t e^{i(xk_t sin ϕ − ωt)} e^{−bzk_t}.   (8.104)
b > 0: ð8:106Þ
Meanwhile, we have
cos²ϕ = 1 − sin²ϕ = 1 − sin²θ/n² = (n² − sin²θ)/n²,   (8.107)
where notice that n < 1, because we are dealing with the incidence of light from a
medium with a higher refractive index to a low index medium. When we consider
the total reflection, the numerator of (8.107) is negative, and so we have two choices
such that
cos ϕ = ±i √(sin²θ − n²)/n.   (8.108)
Then, we have
R^⊥ = −R_E^⊥ R_H^⊥ = R_E^⊥ (R_E^⊥)* = [cos θ − i√(sin²θ − n²)]/[cos θ + i√(sin²θ − n²)] · [cos θ + i√(sin²θ − n²)]/[cos θ − i√(sin²θ − n²)] = 1.   (8.111)
The relations (8.111) and (8.113) ensure that the energy flow gets back to a higher
refractive index medium.
Thus, the total reflection is characterized by the complex reflection coefficient
expressed as (8.110) and (8.112) as well as a reflectance of 1. From (8.110) and
(8.112), we can estimate a change in a phase of the electromagnetic wave that takes
place by virtue of the total reflection. For this purpose, we put
R_E^⊥ ≡ e^{iα}   and   R_E^∥ ≡ e^{iβ}.   (8.114)
sin θc = n: ð8:116Þ
Therefore, we have
1 − n² = cos²θ_c.   (8.117)
R_E^⊥|_{θ=θ_c} = 1.   (8.118)

R_E^⊥|_{θ=π/2} = −1.   (8.119)
The argument α defines a phase shift upon the total reflection. Considering
(8.115) and (8.118), we have
α|_{θ=θ_c} = 0.
Since 1 − n² > 0 (i.e., n < 1) and in the total reflection region sin²θ − n² > 0, the imaginary part of R_E^⊥ is negative for any θ (i.e., 0 to π/2). On the other hand, the real part of R_E^⊥ varies from 1 to −1, as is evidenced from (8.118) and (8.119). At θ₀ that satisfies the following condition:
sin θ₀ = √((1 + n²)/2),   (8.121)
the real part is zero. Thus, the phase shift α varies from 0 to −π as indicated in Fig. 8.8. Comparing (8.121) with (8.116) and taking into account n < 1, we have θ_c < θ₀ < π/2.

Meanwhile, with regard to the TM wave, R_E^∥ is rewritten as
R_E^∥ = [−n⁴cos²θ + sin²θ − n² + 2in²cos θ √(sin²θ − n²)] / [(1 − n²)(sin²θ − n²cos²θ)].   (8.122)
Then, we have
R_E^∥|_{θ=θ_c} = −1.   (8.123)

R_E^∥|_{θ=π/2} = 1.   (8.124)
β|_{θ=θ_c} = π.   (8.126)
Therefore, the denominator of (8.122) is positive and, hence, the imaginary part of R_E^∥ is positive as well for any θ (i.e., 0 to π/2). From (8.123) and (8.124), on the other hand, the real part of R_E^∥ in (8.122) varies from −1 to 1. At θ̃₀ that satisfies the following condition:
cos θ̃₀ = √((1 − n²)/(1 + n⁴)),   (8.128)
the real part of R_E^∥ is zero. Once again, we have
θ_c < θ̃₀ < π/2.   (8.129)
8.7 Waveguide Applications

There are many optical devices based upon light propagation. Among them, wave-
guide devices utilize the total reflection. We explain their operation principle.
Suppose that we have a thin plate (usually said to be a slab) comprising a
dielectric medium that infinitely spreads two-dimensionally and that the plate is
sandwiched with another dielectric (or maybe air or vacuum) or metal. In this
situation electromagnetic waves are confined within the slab. Moreover, only
under a restricted condition those waves are allowed to propagate in parallel to the
slab plane. Such electromagnetic waves are usually called propagation modes or
simply “modes.” An optical device thus designed is called a waveguide. These
modes are characterized by repeated total reflection during the propagation. Another
mode is an evanescent mode. Because of the total reflection, the energy transport is
not allowed to take place vertically to the interface of two dielectrics. For this reason,
the evanescent mode is thought to be propagated in parallel to the interface very
close to it.
Fig. 8.10 Cross-section of a slab waveguide comprising a dielectric medium. (a) A waveguide is sandwiched between a couple of metal layers. (b) A waveguide is sandwiched between a couple of layers consisting of another dielectric called clad layer. The sandwiched layer is called core layer
longer have the same phase in that direction. Yet, as solutions of equations of wave
motion, we can have a solution that has the same phase with the direction of the x-
axis. Bearing in mind such a situation, let us think of Maxwell’s equations in relation
to the equations of wave motion.
Ignoring components related to partial differentiation with respect to x (i.e., the
component related to ∂/∂x) and rewriting (7.28) and (7.29) for individual Cartesian
coordinates, we have [3]
∂E_z/∂y − ∂E_y/∂z + ∂B_x/∂t = 0,   (8.130)

∂E_x/∂z + ∂B_y/∂t = 0,   (8.131)

−∂E_x/∂y + ∂B_z/∂t = 0,   (8.132)

∂H_z/∂y − ∂H_y/∂z − ∂D_x/∂t = 0,   (8.133)

∂H_x/∂z − ∂D_y/∂t = 0,   (8.134)

−∂H_x/∂y − ∂D_z/∂t = 0.   (8.135)
Differentiating (8.131) with respect to z and (8.132) with respect to y, we have

∂²E_x/∂z² + ∂²B_y/∂z∂t = 0,   (8.136)

∂²E_x/∂y² − ∂²B_z/∂y∂t = 0,   (8.137)
Also, differentiating (8.133) with respect to t, we get

∂²H_z/∂t∂y − ∂²H_y/∂t∂z − ∂²D_x/∂t² = 0.   (8.138)
Multiplying (8.138) by μ and further adding (8.136) and (8.137) to it and using
(7.7), we get
∂²E_x/∂y² + ∂²E_x/∂z² = με ∂²E_x/∂t².   (8.139)
Similarly, we get

∂²H_x/∂y² + ∂²H_x/∂z² = με ∂²H_x/∂t².   (8.140)
Equations (8.139) and (8.140) are two-dimensional wave equations with respect
to the y- and z-coordinates. With the direction of the x-axis, a propagating wave has
the same phase. Suppose that we have plane wave solutions for them as in the case of
(7.58) and (7.59). Then, we have
Note that a plane wave expressed by (7.58) and (7.59) is propagated uniformly in
a dielectric medium. In a waveguide, on the other hand, the electromagnetic waves
undergo repeated (total) reflections from the two boundaries positioned either side of
the waveguide while being propagated.
In a three-dimensional version, the wavenumber vector has three components kx,
ky, and kz as expressed in (7.48). In (8.141), in turn, k has y- and z-components
such that
k2 = k 2 = k 2y þ k2z : ð8:142Þ
Note that (8.139) and (8.140) remain unchanged even if the signs of y, z, and t are reversed:

∂²E_x/∂(±y)² + ∂²E_x/∂(±z)² = με ∂²E_x/∂(±t)²,   ∂²H_x/∂(±y)² + ∂²H_x/∂(±z)² = με ∂²H_x/∂(±t)².
k_y = ±|k_y|   and   k_z = ±|k_z|.
Figure 8.12 shows the geometries of the electromagnetic waves within the slab
waveguide. The slab plane is parallel to the zx-plane. Let the positions of the two
interfaces of the slab waveguide be
where E₊ (E₋) and ε_e (ε_e′) represent an amplitude and unit polarization vector of the
incident (reflected) waves, respectively. The vector ε_e (or ε_e′) is defined in (7.67).
Equation (8.145) is common to both the cases of TE and TM waves. From now,
we consider the TE mode case. Suppose that the slab waveguide is sandwiched with
a couple of metal sheet of high conductance. Since the electric field must be absent
inside the metal, the electric field at the interface must be zero owing to the
continuity condition of a tangential component of the electric field. Thus, we require
the following condition should be met with (8.145):
t·ε_e E₊ + t·ε_e′ E₋ = 0,   (8.147)

that is,

E₊ + E₋ = 0.
l ≤ kd/π.   (8.150)
Meanwhile, we have
k = nk 0 , ð8:151Þ
where n is a refractive index of a dielectric that shapes the slab waveguide and the
quantity k0 is a wavenumber of the electromagnetic wave in vacuum. The index n is
given by
n = c=v, ð8:152Þ
where c and v are light velocity in vacuum and the dielectric media, respectively.
Here v is meant as a velocity in an infinitely spreading dielectric. Thus, θ is allowed
to take several (or more) numbers depending upon k, d, and l.
Since in the z-direction no specific boundary conditions are imposed, we have
propagating modes in that direction characterized by a propagation constant (vide
infra). Looking at (8.148), we notice that k sin θ plays a role of a wavenumber in a
free space. For this reason, a quantity β defined as
\beta = \left(k^2 - \frac{l^2\pi^2}{d^2}\right)^{1/2}.   (8.154)
Thus, the allowed TE waves indexed by l are called TE modes and represented as
TEl. The phase velocity vp is given by
v_p = \omega/\beta.   (8.155)
The group velocity v_g is given by

v_g = \frac{d\omega}{d\beta} = \left(\frac{d\beta}{d\omega}\right)^{-1}.   (8.156)

Using (8.154) together with k = \omega/v, we get

v_g = v^2\beta/\omega.   (8.157)
Thus, we have
v_p v_g = v^2.   (8.158)
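As a quick numerical illustration of (8.150)-(8.158), the sketch below lists the allowed TE_l modes of a metal-clad slab for freely chosen parameters (thickness, wavelength, and refractive index are assumptions made only for this example) and checks that v_p v_g = v^2 holds for each of them.

```python
import numpy as np

# Hypothetical parameters (illustration only)
n = 1.5                     # refractive index of the slab, cf. (8.152)
lam0 = 1.0e-6               # vacuum wavelength (m)
d = 3.0e-6                  # slab thickness (m)

c = 2.99792458e8
v = c / n                   # phase velocity in the bulk dielectric
k = n * (2 * np.pi / lam0)  # wavenumber in the dielectric, (8.151)

l_max = int(np.floor(k * d / np.pi))     # allowed mode indices, (8.150)
for l in range(1, l_max + 1):            # l = 0 is not allowed for the metal waveguide
    beta = np.sqrt(k**2 - (l * np.pi / d)**2)   # propagation constant, (8.154)
    omega = v * k
    vp = omega / beta                           # phase velocity, (8.155)
    vg = v**2 * beta / omega                    # group velocity, (8.157)
    print(f"TE{l}: beta = {beta:.3e} 1/m, vp*vg/v^2 = {vp * vg / v**2:.6f}")
```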
[Fig. 8.13: Cross-section of the waveguide. A virtual plane N'X'Y' parallel to NXY (separation d) is used to estimate the optical path difference (or, more specifically, the phase difference) between the propagating and reflected waves; the points A, B, P, Q, A', B', P', Q' and the angle θ indicate the wave geometry.]
n1 > n2 ð8:159Þ
so that the total internal reflection can take place at the interface of D1 and D2. In this
case, the dielectrics D1 and D2 act as a core layer and a clad layer, respectively.
The biggest difference between the present waveguide and the previous one is
that unlike the previous case, the total internal reflection occurs in the present case.
Concomitantly, an evanescent wave is present in the clad layer very close to the
interface.
First, let us estimate the conditions that are satisfied so that an electromagnetic
wave can be propagated within a waveguide. Figure 8.13 depicts a cross-section of
the waveguide where the light is propagated in the direction of k. In Fig. 8.13,
suppose that we have a normal N to the plane of paper at P. Then, N and a straight
line XY shape a plane NXY. Also suppose that a dielectric fills a semi-infinite space
situated below NXY. Further suppose that there is another virtual plane N′X′Y′ that
is parallel with NXY as shown. Here N′ is parallel to N. The separation of the two
parallel planes is d. We need the virtual plane N′X′Y′ just to estimate an optical path
difference (or phase difference, more specifically) between two waves, i.e., a prop-
agating wave and a reflected wave.
Let n be a unit vector in the direction of k, i.e.,
\mathbf{n} = \mathbf{k}/|\mathbf{k}| = \mathbf{k}/k.   (8.160)
\mathbf{x} = r\mathbf{n} + s\mathbf{u} + t\mathbf{v},   (8.162)
where u and v represent unit vectors in the direction perpendicular to n. Then (8.161)
can be expressed by
Suppose that the wave is propagated starting from a point A to P and reflected at
P. Then, the wave is further propagated to B and reflected again to reach Q. The
wave front is originally at AB and finally at PQ. Thus, the Z-shaped optical path
length APBQ is equal to a separation between A′B′ and PQ. Notice that the
separation between A′B′ and P′Q′ is taken so that it is equal to that between AB
and PQ. The geometry of Fig. 8.13 implies that two waves starting simultaneously from AB and A'B' reach PQ simultaneously as well.
We find the separation between AB and A′B′ is
2d\cos\theta.
Let us tentatively call these waves Wave-AB and Wave-A'B' and describe their electric fields as E_AB and E_A'B', respectively. Then, we denote

where k is the wavenumber in the dielectric. Note that since E_A'B' lags behind E_AB, a plus sign appears in the first term of the exponent. Therefore, the phase difference between the two waves is
Now, let us come back to the actual geometry of the waveguide. That is, the core
layer of thickness d is sandwiched by a couple of clad layers (Fig. 8.10b). In this
situation, the wave EAB experiences the total internal reflection two times, which we
ignored in the above discussion of the metal waveguide. Since the total internal
reflection causes a complex phase shift, we have to take account of this effect. The
phase shift was defined as α of (8.114) for a TE mode and β for a TM mode. Notice
that in Fig. 8.13 the electric field oscillates perpendicularly to the plane of paper with
the TE mode, whereas it oscillates in parallel with the plane of paper with the TM
mode. For both cases the electric field oscillates perpendicularly to n. Consequently,
the phase shift due to these reflections has to be added to (8.166). Thus, for the phase
commensuration to be obtained, the following condition must be satisfied:
where δ is either δTE or δTM defined below according to the case of the TE wave and
TM wave, respectively. For a practical purpose, (8.167) is dealt with by a numerical
calculation, e.g., to design an optical waveguide.
Unlike (8.149), what is most important about (8.167) is that the condition l = 0 is permitted because δ < 0 (see just below).
For convenience and according to the custom, we adopt a phase shift notation
other than that defined in (8.114). With the TE mode, the phase is retained upon
reflection at the critical angle, and so we identify α with an additional component
δTE. In the TM case, on the other hand, the phase is reversed upon reflection at the
critical angle (i.e., a π shift occurs). Since this π shift has been incorporated into β, it
suffices to consider only an additional component δTM. That is, we have
We rewrite (8.110) as
R_{\perp}^{E} = e^{i\alpha} = e^{i\delta_{TE}} = \frac{a e^{-i\sigma}}{a e^{i\sigma}} = e^{-2i\sigma}
and
where we have
Therefore,
\tan\sigma = -\tan\frac{\delta_{TE}}{2} = \frac{\sqrt{\sin^2\theta - n^2}}{\cos\theta}.   (8.171)
R_{\parallel}^{E} = e^{i\beta} = e^{i(\delta_{TM}+\pi)} = -\frac{b e^{-i\tau}}{b e^{i\tau}} = -e^{-2i\tau} = e^{i(-2\tau+\pi)},   (8.172)
where
Note that the minus sign with the second last equality in (8.172) is due to the
phase reversal upon reflection. From (8.172), we may put
\beta = -2\tau + \pi.
Consequently, we get
\tan\tau = -\tan\frac{\delta_{TM}}{2} = \frac{\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta}.   (8.175)
Finally, the additional phase changes δTE and δTM upon total reflection are given by [4]

\delta_{TE} = -2\tan^{-1}\frac{\sqrt{\sin^2\theta - n^2}}{\cos\theta} \quad \text{and} \quad \delta_{TM} = -2\tan^{-1}\frac{\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta}.   (8.176)
We emphasize that in (8.176) both δTE and δTM are negative quantities. This
phase shift has to be included in (8.167) as a negative quantity δ.
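The phase shifts (8.176) are easy to evaluate numerically. The short sketch below (with a freely chosen relative refractive index, for illustration only) computes δTE and δTM for angles beyond the critical angle and confirms that both are negative, which is what allows l = 0 in (8.167).

```python
import numpy as np

n_rel = 1.0 / 1.5                      # assumed relative refractive index n = n2/n1 (< 1)
theta_c = np.arcsin(n_rel)             # critical angle of total internal reflection

for theta in np.linspace(theta_c + 0.05, np.pi / 2 - 0.05, 5):
    root = np.sqrt(np.sin(theta)**2 - n_rel**2)
    delta_te = -2.0 * np.arctan(root / np.cos(theta))                 # (8.176), TE
    delta_tm = -2.0 * np.arctan(root / (n_rel**2 * np.cos(theta)))    # (8.176), TM
    print(f"theta = {np.degrees(theta):5.1f} deg: delta_TE = {delta_te:+.3f} rad,"
          f" delta_TM = {delta_tm:+.3f} rad")
```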
At first glance, (8.176) seems to differ largely from (8.120) and (8.125). Nevertheless, noting the trigonometric formula

\tan 2x = \frac{2\tan x}{1 - \tan^2 x}

and remembering that δTM in (8.168) includes π arising from the phase reversal, we find that both the relations are virtually identical.
Evanescent waves are drawing considerable attention in the fields of basic physics and applied device physics. If the total internal reflection is absent, ϕ is real. But, under the total internal reflection, ϕ is pure imaginary. The electric field of the evanescent wave is described as
\mathbf{E}_t = E\boldsymbol{\varepsilon}_t\, e^{i(k_t z\sin\phi + k_t y\cos\phi - \omega t)} = E\boldsymbol{\varepsilon}_t\, e^{i(k_t z\sin\phi + i k_t y b - \omega t)},
v_1 < v_p^{(s)} = \frac{v_1}{\sin\theta} = \frac{\omega}{k\sin\theta} = \frac{\omega}{k_t\sin\phi} = v_p^{(e)} = \frac{v_2}{\sin\phi} < v_2,   (8.178)
where v1 and v2 are light velocity in a free space filled by the dielectrics D1 and D2,
respectively. For this, we used a relation described as
\omega = v_1 k = v_2 k_t.   (8.179)
We also used Snell’s law with the third equality of (8.178). Notice that sinϕ > 1
in the evanescent region and that kt sin ϕ is a propagation constant in the clad layer.
Also note that v_p^{(s)} is equal to v_p^{(e)} and that these phase velocities are in between the
two velocities of the free space. Thus, the evanescent waves must be present,
accompanying propagating waves that undergo the total internal reflections in a
slab waveguide.
As remarked in (8.105), the electric field of evanescent waves decays exponen-
tially with increasing z. This implies that the evanescent waves exist only in the clad
layer very close to an interface of core and clad layers.
So far, we have been dealing with propagating waves either in a free space or in a
waveguide. If the dielectric shaping the waveguide is confined in another direction,
the propagating waves show specific properties. Examples include optical fibers.
8.8 Stationary Waves

In this section we consider a situation where the electromagnetic wave is propagating in a dielectric medium and reflected by a "wall" formed by metal or another
dielectric. In such a situation, the original wave (i.e., a forward wave) causes
interference with the backward wave, and a stationary wave is formed as a conse-
quence of the interference.
To approach this issue, we deal with superposition of two waves that have
different phases and different amplitudes. To generalize the problem, let ψ 1 and
ψ 2 be two cosine functions described as
Here we wish to unify (8.181) as a single cosine (or sine) function. To this end,
we modify a description of ψ 2 such that
\psi_2 = a_2\cos[b_1 + (b_2 - b_1)]
\psi = \psi_1 + \psi_2
we get
where θ is expressed by
\tan\theta = \frac{a_2\sin(b_1 - b_2)}{a_1 + a_2\cos(b_1 - b_2)}.   (8.186)
where the former equation represents a forward wave, whereas the latter a backward
wave. Then we have
b1 - b2 = 2kx: ð8:188Þ
with
and
\tan\theta = \frac{a_2\sin 2kx}{a_1 + a_2\cos 2kx}.   (8.191)
Equation (8.189) looks simple, but since both R and θ vary as a function of x, the
situation is somewhat complicated unlike a simple sinusoidal wave. Nonetheless,
when x takes a special value, (8.189) is expressed by a simple function form. For
example, at t = 0,
This corresponds to (i) of Fig. 8.15. If t = T/2 (where T is a period, i.e., T = 2π/ω),
we have
\psi\!\left(x, \frac{T}{2}\right) = -(a_1 + a_2)\cos kx.
This corresponds to (iii) of Fig. 8.15. But, the waves described by (ii) or (iv) do
not have a simple function form.
We characterize Fig. 8.15 below. If we have
respectively. Notice that in Fig. 8.15 we took a1, a2 > 0. At another instant t = T/4,
we have similarly
Thus, the waves that vary with time are characterized by two drum-shaped envelopes that have extremals |a1 + a2| and |a1 - a2| or those -|a1 + a2| and -|a1 - a2|. An important implication of Fig. 8.15 is that no node is present in the
superposed wave. In other words, there is no instant t0 when ψ(x, t0) = 0 for any x.
From the aspect of energy transport of electromagnetic waves, if |a1| > |a2| (where a1 and a2 represent the amplitudes of the forward and backward waves, respectively), the net energy flow takes place in the travelling direction of the forward wave. If, on the other hand, |a1| < |a2|, the net energy flow takes place in the travelling direction of the backward wave. In this respect, think of Poynting vectors.
In case |a1| = |a2|, the situation is particularly simple. No net energy flow takes
place in this case. Correspondingly, we observe nodes. Such waves are called
stationary waves. Let us consider this simple situation for an electromagnetic wave
that is incident perpendicularly to the interface between two dielectrics (one of them
may be a metal).
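The distinction between the |a1| ≠ |a2| and |a1| = |a2| cases is easy to see numerically. The sketch below (illustrative values only, not from the text) superposes a forward and a backward cosine wave consistent with (8.188) and reports the smallest local oscillation amplitude found over one wavelength; it drops to zero, i.e., true nodes appear, only when the two amplitudes are equal.

```python
import numpy as np

def envelope_minimum(a1, a2, k=2 * np.pi, omega=2 * np.pi, n=400):
    """Smallest value over x of max_t |psi(x, t)| for psi = a1*cos(kx - wt) + a2*cos(kx + wt).
    A true node (zero field at all times) exists only if this minimum is zero."""
    x = np.linspace(0.0, 2 * np.pi / k, n, endpoint=False)
    t = np.linspace(0.0, 2 * np.pi / omega, n, endpoint=False)
    X, T = np.meshgrid(x, t)                       # t varies along axis 0
    psi = a1 * np.cos(k * X - omega * T) + a2 * np.cos(k * X + omega * T)
    return np.abs(psi).max(axis=0).min()           # min over x of the temporal envelope

print(envelope_minimum(1.0, 0.6))   # forward wave dominates: minimum stays > 0 (no nodes)
print(envelope_minimum(1.0, 1.0))   # equal amplitudes: stationary wave with true nodes (~0)
```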
Returning back to (7.58) and (7.66), we describe two electromagnetic waves that
are propagating in the positive and negative direction of the z-axis such that
where εe is a unit polarization vector arbitrarily fixed so that it can be parallel to the
interface, i.e., wall (i.e., perpendicular to the z-axis). The situation is depicted in
Fig. 8.16. Notice that in Fig. 8.16 E1 represents the forward wave (i.e., incident
wave) and E2 the backward wave (i.e., wave reflected at the interface). Thus, a
superposed wave is described as
\mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2.   (8.196)
(i) Syn-phase:
Note that in (8.198) variables z and t have been separated. This implies that we
have a stationary wave. For this case to be realized, the characteristic impedance of the dielectric on the incident wave side should be sufficiently smaller than that of the other side; see (8.51) and (8.59). In other words, the dielectric constant of the incident side should be large enough. We have nodes at positions that satisfy
-kz = \frac{\pi}{2} + m\pi \ (m = 0, 1, 2, \cdots) \quad \text{or} \quad -z = \frac{1}{4}\lambda + \frac{m}{2}\lambda.   (8.199)
Note that we are thinking of the stationary wave in the region of z < 0. Equation
(8.199) indicates that nodes are formed at a quarter wavelength from the interface
and every half wavelength from it. The node means the position where no electric
field is present.
Meanwhile, antinodes are observed at positions
-kz = m\pi \ (m = 0, 1, 2, \cdots) \quad \text{or} \quad -z = \frac{m}{2}\lambda.
Thus, the nodes and antinodes alternate with every quarter wavelength.
(ii) Antiphase:
The phase of the electric field is reversed. We assume that E1 = - E2 (>0). Then,
we have
In (8.201) variables z and t have been separated as well. For this case to be realized, the characteristic impedance of the dielectric on the incident wave side should be sufficiently larger than that of the other side. In other words, the dielectric constant of the incident side should be small enough. Practically, this situation can easily be attained by choosing a metal of high reflectance for the wall material. We have nodes at positions that satisfy
nodes at positions that satisfy
-kz = m\pi \ (m = 0, 1, 2, \cdots) \quad \text{or} \quad -z = \frac{m}{2}\lambda.   (8.202)
The nodes are formed at the interface and every half wavelength from it. As in the
case of the syn-phase, the antinodes take place with the positions shifted by a quarter
wavelength relative to the nodes.
If there is another interface at, say, -z = L (>0), the wave goes back and forth many times. If the absolute value of the reflection coefficient at the interface is high enough (i.e., close to unity), attenuation of the wave is negligible. For both the syn-phase and antiphase cases, we must have
kL = m\pi \ (m = 1, 2, \cdots) \quad \text{or} \quad L = \frac{m}{2}\lambda   (8.203)
so that stationary waves can stand stable. The stationary waves indexed with a
positive integer m in (8.203) are said to be longitudinal modes. Note that the index
l that features the transverse modes in the waveguide can be zero or a positive
integer. Unlike the previous case of (8.167) in Sect. 8.7, the index m of (8.203) is not
allowed to be zero.
If L in (8.203) is large enough compared to λ, m is a pretty large number as well,
and what is more, many different numbers of m are allowed between given
References
1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Aca-
demic Press, Waltham
2. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
3. Smith FG, King TA, Wilkins D (2007) Optics and photonics, 2nd edn. Wiley, Chichester
4. Born M, Wolf E (2005) Principles of optics, 7th edn. Cambridge University Press, Cambridge
5. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
Chapter 9
Light Quanta: Radiation and Absorption
Historically, the relevant theory was first propounded by Max Planck and then
Albert Einstein as briefly discussed in Chap. 1. The theory was developed on the
basis of the experiments called cavity radiation or blackbody radiation. Here,
however, we wish to derive Planck’s law of radiation on the assumption of the
existence of quantum harmonic oscillators.
As discussed in Chap. 2, the ground state of a quantum harmonic oscillator has an
energy ℏω/2. Therefore, we measure energies of the oscillator in reference to that
state. Let N0 be the number of oscillators (i.e., light quanta) present in the ground
state. Then, according to Boltzmann distribution law the number of oscillators of the
first excited state N1 is
N_1 = N_0 e^{-\hbar\omega/k_B T},   (9.1)

and, more generally, the number of oscillators N_j in the j-th excited state is

N_j = N_0 e^{-j\hbar\omega/k_B T}.   (9.2)
N = N_0 + N_0 e^{-\hbar\omega/k_B T} + \cdots + N_0 e^{-j\hbar\omega/k_B T} + \cdots = N_0 \sum_{j=0}^{\infty} e^{-j\hbar\omega/k_B T}.   (9.3)
Let E be a total energy of the oscillator system in reference to the ground state.
That is, we put a ground state energy at zero. Then we have
\frac{E}{N} = \frac{\sum_{j=0}^{\infty} j\hbar\omega\, e^{-j\hbar\omega/k_B T}}{\sum_{j=0}^{\infty} e^{-j\hbar\omega/k_B T}} = \hbar\omega\, \frac{\sum_{j=0}^{\infty} j\, e^{-j\hbar\omega/k_B T}}{\sum_{j=0}^{\infty} e^{-j\hbar\omega/k_B T}}.   (9.5)
\sum_{j=0}^{\infty} j x^j = \sum_{j=0}^{\infty} j x^{j-1}\, x = \left(\sum_{j=0}^{\infty} \frac{d}{dx} x^j\right) x = \left(\frac{d}{dx}\frac{1}{1-x}\right) x = \frac{x}{(1-x)^2},   (9.7)

where we used

\sum_{j=0}^{\infty} x^j = \frac{1}{1-x}.   (9.8)
Then, we get
The function
\frac{1}{e^{\hbar\omega/k_B T} - 1}   (9.10)

If \hbar\omega \ll k_B T, we may approximate

e^{\hbar\omega/k_B T} \approx 1 + (\hbar\omega/k_B T).
Therefore, we get
E \approx k_B T.   (9.11)
Thus, the relation (9.9) asymptotically agrees with a classical theory. In other
words, according to the classical theory related to law of equipartition of energy,
energy of kBT/2 is distributed to each of two degrees of freedom of motion, i.e., a
kinetic energy and a potential energy of a harmonic oscillator.
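The series manipulations behind (9.5)-(9.11) can also be checked by direct summation. The minimal sketch below (an illustrative check, not part of the original text; the energy quantum is set to unity) sums the Boltzmann-weighted energies, compares the result with the mean energy ℏω/(e^{ℏω/kBT} − 1) quoted as (9.9), and confirms the classical limit (9.11) at high temperature.

```python
import numpy as np

hbar_omega = 1.0                        # energy quantum in arbitrary units
for kT in (0.2, 1.0, 5.0, 50.0):
    j = np.arange(0, 2000)              # enough terms for convergence
    w = np.exp(-j * hbar_omega / kT)                       # Boltzmann weights, cf. (9.2)
    E_mean = np.sum(j * hbar_omega * w) / np.sum(w)        # direct evaluation of (9.5)
    planck = hbar_omega / np.expm1(hbar_omega / kT)        # closed form, (9.9)
    print(f"kT = {kT:5.1f}: direct = {E_mean:.6f}, closed form = {planck:.6f}, "
          f"classical kT = {kT:.1f}")
```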
Researchers at the time tried to find the relationship between the energy density inside the cavity and the (angular) frequency of radiation. To reach the relationship, let
us introduce a concept of mode density of electromagnetic waves related to the
blackbody radiation. We define the mode density D(ω) as the number of modes of
electromagnetic waves per unit volume per unit angular frequency. We refer to the
electromagnetic waves having allowed specific angular frequencies and polarization
as modes. These modes must be described as linearly independent functions.
Determination of the mode density is related to boundary conditions (BCs)
imposed on a physical system. We already dealt with this problem in Chaps. 2, 3,
and 8. These BCs often appear when we find solutions of a differential equation. Let
us consider a following wave equation:
\frac{\partial^2\psi}{\partial x^2} = \frac{1}{v^2}\frac{\partial^2\psi}{\partial t^2}.   (9.12)
Substituting (9.13) into (9.12) and dividing both sides by X(x)T(t), we have
\frac{1}{X}\frac{d^2X}{dx^2} = \frac{1}{v^2}\frac{1}{T}\frac{d^2T}{dt^2} = -k^2,   (9.14)

\frac{d^2X}{dx^2} + k^2X = 0.   (9.15)
we find a solution of
where a is a constant. The constant k can be determined to satisfy the BCs; i.e.,
This solution has already appeared in Chap. 8 as a stationary solution. The readers
are encouraged to derive these results.
In a three-dimensional case, we have a wave equation
\frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} + \frac{\partial^2\psi}{\partial z^2} = \frac{1}{v^2}\frac{\partial^2\psi}{\partial t^2}.   (9.21)
Similarly we get
\frac{1}{X}\frac{d^2X}{dx^2} + \frac{1}{Y}\frac{d^2Y}{dy^2} + \frac{1}{Z}\frac{d^2Z}{dz^2} = \frac{1}{v^2}\frac{1}{T}\frac{d^2T}{dt^2} = -k^2.   (9.23)
Putting
\frac{1}{X}\frac{d^2X}{dx^2} = -k_x^2, \quad \frac{1}{Y}\frac{d^2Y}{dy^2} = -k_y^2, \quad \frac{1}{Z}\frac{d^2Z}{dz^2} = -k_z^2,   (9.24)
we have
Then, we get a stationary wave solution as in the one-dimensional case such that
k_x L = m_x\pi, \quad k_y L = m_y\pi, \quad k_z L = m_z\pi \quad (m_x, m_y, m_z = 1, 2, \cdots).   (9.27)
Returning to the main issue, let us deal with the mode density. Think of a cube of
each side of L that is placed as shown in Fig. 9.1. Calculating k, we have
k = \sqrt{k_x^2 + k_y^2 + k_z^2} = \frac{\pi}{L}\sqrt{m_x^2 + m_y^2 + m_z^2} = \frac{\omega}{c},   (9.28)
where we assumed that the inside of a cavity is vacuum and, hence, the propagation
velocity of light is c. Rewriting (9.28), we have
\sqrt{m_x^2 + m_y^2 + m_z^2} = \frac{L\omega}{\pi c}.   (9.29)
The numbers m_x, m_y, and m_z represent allowable modes in the cavity; the set of (m_x, m_y, m_z) specifies individual modes. Note that m_x, m_y, and m_z are all positive integers. If, for instance, -m_x were allowed, this would produce sin(-k_x x) = -sin k_x x; but this function is linearly dependent on sin k_x x. Then, a mode indexed by -m_x should not be regarded as an independent mode. Given ω, a set (m_x, m_y, m_z) that satisfies (9.29) corresponds to each mode.
Therefore, the number of modes that satisfies a following expression
\sqrt{m_x^2 + m_y^2 + m_z^2} \le \frac{L\omega}{\pi c}   (9.30)

is estimated as

N_L = \frac{4\pi}{3}\left(\frac{L\omega}{\pi c}\right)^3 \cdot \frac{1}{8} \cdot 2 = \frac{L^3\omega^3}{3\pi^2 c^3},   (9.31)

where the factor 1/8 restricts the count to the positive octant (m_x, m_y, m_z > 0) and the factor 2 accounts for the two independent polarizations. Accordingly, the mode density is

D(\omega)\,d\omega = \frac{1}{L^3}\frac{dN_L}{d\omega}\,d\omega = \frac{\omega^2}{\pi^2 c^3}\,d\omega,   (9.32)
where D(ω)dω represents the number of modes per unit volume whose angular frequencies range between ω and ω + dω.
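The counting argument behind (9.30)-(9.32) can be checked by brute force. The sketch below (an illustrative check with an arbitrary cutoff, not from the original text) enumerates all positive-integer triples satisfying (9.30), doubles the count for the two polarizations, and compares the result with the closed form L^3ω^3/3π^2c^3 of (9.31).

```python
import numpy as np

c = 2.99792458e8
L = 1.0e-2                       # cavity edge length (m), arbitrary choice
omega = 5.0e12                   # angular frequency cutoff (rad/s), arbitrary choice

R = L * omega / (np.pi * c)      # right-hand side of (9.30)
m = np.arange(1, int(R) + 1)
mx, my, mz = np.meshgrid(m, m, m, indexing='ij')
count = 2 * np.count_nonzero(mx**2 + my**2 + mz**2 <= R**2)   # factor 2: polarizations

N_formula = L**3 * omega**3 / (3 * np.pi**2 * c**3)           # closed form (9.31)
print(f"R = {R:.1f}, direct count = {count}, formula = {N_formula:.0f}")
```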
Now, we introduce a function ρ(ω) as an energy density per unit angular
frequency. Then, combining D(ω) with (9.9), we get
\rho(\omega) = D(\omega)\,\frac{\hbar\omega}{e^{\hbar\omega/k_B T} - 1} = \frac{\hbar\omega^3}{\pi^2 c^3}\,\frac{1}{e^{\hbar\omega/k_B T} - 1}.   (9.33)
The relation (9.33) is called Planck’s law of radiation. Notice that ρ(ω) has a
dimension [J m-3 s].
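As a quick numerical illustration of (9.33) (a minimal sketch with arbitrary parameters, not from the original text), the snippet below evaluates the spectral energy density ρ(ω) at a given temperature and verifies that it approaches the classical form ω^2 k_B T/π^2c^3, i.e., the limit implied by (9.11) and given later as (9.37), when ℏω ≪ k_BT.

```python
import numpy as np

hbar = 1.054571817e-34
kB = 1.380649e-23
c = 2.99792458e8

def rho_planck(omega, T):
    """Planck's law (9.33): spectral energy density in J m^-3 s."""
    return hbar * omega**3 / (np.pi**2 * c**3) / np.expm1(hbar * omega / (kB * T))

def rho_classical(omega, T):
    """Classical limit D(omega) * k_B T, cf. (9.37)."""
    return omega**2 * kB * T / (np.pi**2 * c**3)

T = 300.0
for omega in (1.0e11, 1.0e13, 1.0e15):     # far below, near, and far above k_B T / hbar
    print(f"omega = {omega:.1e} rad/s: Planck = {rho_planck(omega, T):.3e}, "
          f"classical = {rho_classical(omega, T):.3e}")
```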
To solve (9.15) under the Dirichlet conditions (9.16) is pertinent to analyzing an
electric field within a cavity surrounded by a metal husk, because the electric field
must be absent at an interface between the cavity and metal. The problem, however,
can equivalently be solved using the magnetic field. This is because at the interface
the reflection coefficient of electric field and magnetic field has a reversed sign (see
Chap. 8). Thus, given an equation for the magnetic field, we may use the Neumann
condition. This condition requires differential coefficients to vanish at the boundary
(i.e., the interface between the cavity and metal). In a similar manner to the above,
we get
By imposing BCs, again we have (9.18) that leads to the same result as the above.
We may also impose the periodic BCs. This type of equation has already been
treated in Chap. 3. In that case we have a solution of
Notice that eikx and e-ikx are linearly independent and, hence, minus sign for m is
permitted. Correspondingly, we have
N_L = \frac{4\pi}{3}\left(\frac{L\omega}{2\pi c}\right)^3 \cdot 2 = \frac{L^3\omega^3}{3\pi^2 c^3}.   (9.36)
Combining D(ω) with the classical value (9.11) instead of (9.9), we would get

\rho(\omega) = D(\omega)\,k_B T = \frac{\omega^2}{\pi^2 c^3}\,k_B T.   (9.37)
\hbar\omega_{21} = E_2 - E_1,   (9.38)
where ω21 is an angular frequency of light that takes part in the optical transitions.
For the time being, let us follow Einstein’s postulates.
1. Stimulated absorption: This process is simply said to be an “absorption.” Let Wa
[s-1] be the transition probability that the electron absorbs a light quantum to be
excited to the excited state. Wa is described as
where N1 is the number of atoms occupying the ground state; B21 is a proportional
constant; ρ(ω21) is due to (9.33). Note that in (9.39) we used ω21 instead of ω in
(9.33). The coefficient B21 is called Einstein B coefficient; more specifically one
of Einstein B coefficients. Namely, B21 is pertinent to the transition from the
ground state to excited state.
2. Emission processes: The processes include both the spontaneous and stimulated
emissions. Let We [s-1] be the transition probability that the electron emits a light
quantum and returns back to the ground state. We is described as
where N2 is the number of atoms occupying the excited state; B12 and A12 are
proportional constants. The coefficient A12 is called Einstein A coefficient rele-
vant to the spontaneous emission. The coefficient B12 is associated with the
stimulated emission and also called Einstein B coefficient together with B21.
Here, B12 is pertinent to the transition from the excited state to ground state.
Now, we have
The reasoning for this is as follows: The coefficients B12 and B21 are proportional
to the matrix elements pertinent to the optical transition. Let T be an operator
associated with the transition. Then, a matrix element is described using an inner
product notation of Chap. 1 by
B_{21} = \langle\psi_2|T|\psi_1\rangle,   (9.42)
where ψ 1 and ψ 2 are initial and final states of the system in relation to the optical
transition. As a good approximation, we use er for T (dipole approximation), where
e is an elementary charge and r is a position operator (see Chap. 1). If (9.42)
represents the absorption process (i.e., the transition from the ground state to excited
state), the corresponding emission process should be described as a reversed process
by
360 9 Light Quanta: Radiation and Absorption
B_{12} = \langle\psi_1|T|\psi_2\rangle.   (9.43)

B_{21} = \langle\psi_1|T^{\dagger}|\psi_2\rangle,   (9.44)

H^{\dagger} = H.   (1.119)

T^{\dagger} = T.   (9.45)
Thus, we get
But, as in the cases of Sects. 4.2 and 4.3, ψ 1 and ψ 2 can be represented as real
functions. Then, we have
That is, we assume that the matrix B is real symmetric. In the case of two-level
atoms, as a matrix form we get
B = \begin{pmatrix} 0 & B_{12} \\ B_{12} & 0 \end{pmatrix}.   (9.47)
W_e = W_a.   (9.48)
That is,
where we used (9.41) for LHS. Assuming Boltzmann distribution law, we get
\frac{N_2}{N_1} = \exp[-(E_2 - E_1)/k_B T].   (9.50)

Using (9.38), this is rewritten as

\frac{N_2}{N_1} = \exp(-\hbar\omega_{21}/k_B T).   (9.51)
\exp(-\hbar\omega_{21}/k_B T) = \frac{B_{21}\rho(\omega_{21})}{B_{21}\rho(\omega_{21}) + A_{12}}.   (9.52)
Assuming that
\frac{A_{12}}{B_{21}} = \frac{\hbar\omega_{21}^3}{\pi^2 c^3},   (9.54)
we have
\rho(\omega_{21}) = \frac{\hbar\omega_{21}^3}{\pi^2 c^3}\,\frac{1}{\exp(\hbar\omega_{21}/k_B T) - 1}.   (9.55)
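Rearranging (9.52) gives the ratio of spontaneous to stimulated emission rates, A12/[B21 ρ(ω21)] = e^{ℏω21/kBT} − 1, which the short sketch below (illustrative frequencies and temperature only) evaluates; at optical frequencies and room temperature spontaneous emission dominates overwhelmingly, whereas at microwave frequencies stimulated emission dominates.

```python
import numpy as np

hbar = 1.054571817e-34
kB = 1.380649e-23
T = 300.0                                   # room temperature (K)

for name, omega in (("microwave", 2 * np.pi * 1.0e10),   # ~10 GHz
                    ("optical",   2 * np.pi * 5.0e14)):  # ~500 THz
    ratio = np.expm1(hbar * omega / (kB * T))   # A12 / (B21 * rho), from (9.52)
    print(f"{name:10s}: spontaneous/stimulated = {ratio:.3e}")
```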
In (9.54) we only know the ratio of A12 to B21. To have a good knowledge of these
Einstein coefficients, we briefly examine a mechanism of the dipole radiation. The
electromagnetic radiation results from an accelerated motion of a dipole.
A dipole moment p(t) is defined as a function of time t by
consider a system comprising point charges, integration can immediately be carried
out to yield
\mathbf{p}(t) = \sum_i q_i\mathbf{x}_i,   (9.57)
where qi is a charge of each point charge i and xi is a position vector of the point
charge i. From (9.56) and (9.57), we find that p(t) depends on how we set up the
coordinate system. However, if a total charge of the system is zero, p(t) does not
depend on the coordinate system. Let p(t) and p′(t) be a dipole moment viewed from
the frame O and O′, respectively (see Fig. 9.3). Then we have
\mathbf{p}'(t) = \sum_i q_i\mathbf{x}_i' = \sum_i q_i(\mathbf{x}_0 + \mathbf{x}_i) = \left(\sum_i q_i\right)\mathbf{x}_0 + \sum_i q_i\mathbf{x}_i = \sum_i q_i\mathbf{x}_i = \mathbf{p}(t).   (9.58)
Notice that with the third equality the first term vanishes because the total charge
is zero. The system comprising two point charges that have an opposite charge (±q)
is particularly simple but very important. In that case we have
Here we assume that q > 0 according to the custom and, hence, x̃ is a vector directed from the minus point charge to the plus point charge.
Figure 9.4 displays geometry of an oscillating dipole and electromagnetic radia-
tion from it. Figure 9.4a depicts the dipole. It is placed at the origin of the coordinate
Fig. 9.4 Electromagnetic radiation from an accelerated motion of a dipole. (a) A dipole placed at
the origin of the coordinate system is executing harmonic oscillation along the z-direction around an
equilibrium position. (b) Electromagnetic radiation from a dipole in a wave zone. εe and εm are unit
polarization vectors of the electric field and magnetic field, respectively. εe, εm, and n form a right-
handed system
where z+ and z- are position vectors of a plus charge and minus charge, respectively;
z0 and -z0 are equilibrium positions of each charge; a is an amplitude of the
harmonic oscillation; ω is an angular frequency of the oscillation. Then, accelera-
tions of the charges are given by
Meanwhile, we have
Therefore,
The quantity p̈(t), i.e., the second derivative of p(t) with respect to time, produces the electric field described by [2]

\mathbf{E} = -\ddot{\mathbf{p}}/4\pi\varepsilon_0 c^2 r = -qa\omega^2 e^{i\omega t}\mathbf{e}_3/2\pi\varepsilon_0 c^2 r,   (9.66)
where r is a distance between the dipole and observation point. In (9.66) we ignored
a term proportional to inverse square and cube of r for the aforementioned reason.
As described in (9.66), the strength of the radiation electric field in the wave zone measured at a point away from the oscillating dipole is proportional to a component of the vector of the accelerated motion of the dipole [i.e., p̈(t)]. The radiation electric field lies in the direction perpendicular to a line connecting the observation
point and the point of the dipole (Fig. 9.4). Let εe be a unit polarization vector of the
electric field in that direction and let E⊥ be the radiation electric field. Then, we have
\mathbf{E}_{\perp} = -qa\omega^2 e^{i\omega t}(\mathbf{e}_3\cdot\boldsymbol{\varepsilon}_e)\boldsymbol{\varepsilon}_e/2\pi\varepsilon_0 c^2 r = -qa\omega^2 e^{i\omega t}\boldsymbol{\varepsilon}_e\sin\theta/2\pi\varepsilon_0 c^2 r.   (9.67)
As shown in Sect. 7.3, (εe · e3)εe in (9.67) "extracts" from e3 a vector component parallel to εe. Such an operation is said to be a projection of a vector. The related discussion can be seen in Part III.
It takes a time of r/c for the emitted light from the charge to arrive at the
observation point. Consequently, the acceleration of the charge has to be measured
at the time when the radiation leaves the charge. Let t be the instant when the electric
field is measured at the measuring point. Then, it follows that the radiation leaves the
charge at a time of t - r/c. Thus, the electric field relevant to the radiation that can be
observed far enough away from the oscillating charge is described as [2]
\mathbf{E}_{\perp}(\mathbf{x}, t) = -\frac{qa\omega^2 e^{i\omega(t - r/c)}\sin\theta}{2\pi\varepsilon_0 c^2 r}\,\boldsymbol{\varepsilon}_e.   (9.68)

The accompanying magnetic field is

\mathbf{H}_{\perp}(\mathbf{x}, t) = -\frac{qa\omega^2 e^{i\omega(t - r/c)}\sin\theta}{c\mu_0\cdot 2\pi\varepsilon_0 c^2 r}\,\mathbf{n}\times\boldsymbol{\varepsilon}_e = -\frac{qa\omega^2 e^{i\omega(t - r/c)}\sin\theta}{2\pi c r}\,\mathbf{n}\times\boldsymbol{\varepsilon}_e = -\frac{qa\omega^2 e^{i\omega(t - r/c)}\sin\theta}{2\pi c r}\,\boldsymbol{\varepsilon}_m,   (9.69)
where n represents a unit vector in the direction parallel to a line connecting the
observation point and the dipole. The εm is a unit polarization vector of the magnetic
field as defined by (7.67). From the above, we see that the radiation electromagnetic
waves in the wave zone are transverse waves that show the properties the same as
those of electromagnetic waves in a free space.
Now, let us evaluate a time-averaged energy flux from an oscillating dipole.
Using (8.71), we have
S(\theta) = \frac{\omega^4\sin^2\theta}{8\pi^2\varepsilon_0 c^3 r^2}(ea)^2\,\mathbf{n}.   (9.71)
Let us relate the above argument to Einstein A and B coefficients. Since we are
dealing with an isolated dipole, we might well suppose that the radiation comes from
the spontaneous emission. Let P be a total power of emission from the oscillating
dipole that gets through a sphere of radius r. Then we have
P = \oint S(\theta)\cdot\mathbf{n}\,dS = \int_0^{2\pi} d\phi\int_0^{\pi} S(\theta)\, r^2\sin\theta\, d\theta = \frac{\omega^4}{8\pi^2\varepsilon_0 c^3}(ea)^2\int_0^{2\pi} d\phi\int_0^{\pi}\sin^3\theta\, d\theta.   (9.72)
Changing \cos\theta to t, the integral I \equiv \int_0^{\pi}\sin^3\theta\, d\theta can be converted into

I = \int_{-1}^{1}(1 - t^2)\, dt = 4/3.   (9.73)
Thus, we have
P = \frac{\omega^4}{3\pi\varepsilon_0 c^3}(ea)^2.   (9.74)

Identifying P with \hbar\omega_{21}A_{12}, i.e., the energy emitted per unit time through spontaneous emission, and replacing ω with ω21, we get

A_{12} = \frac{\omega_{21}^3}{3\pi\varepsilon_0 c^3\hbar}(ea)^2.   (9.75)
B_{12} = \frac{\pi}{3\varepsilon_0\hbar^2}(ea)^2.   (9.76)
A_{12} = \frac{\omega_{21}^3 e^2}{3\pi\varepsilon_0 c^3\hbar}\left|\langle 1|\mathbf{r}|2\rangle\right|^2 \quad \text{and} \quad B_{12} = \frac{\pi e^2}{3\varepsilon_0\hbar^2}\left|\langle 1|\mathbf{r}|2\rangle\right|^2,   (9.77)
where with the second equality we used (1.116); the last equality comes from the fact that r is Hermitian. Meanwhile, we have
9.5 Lasers
Fig. 9.5 Rectangular parallelepiped of a laser medium with a length L and a cross-section area
S (not shown). I(x) denotes irradiance at a point of x from the origin
N = N_1 + N_2,   (9.80)
where N1 and N2 represent the number of atoms occupying the ground and excited
states, respectively. Suppose that light is propagated from the left of the rectangular parallelepiped and enters it. Then, we expect that three processes occur simultaneously. One is the stimulated absorption and the others are the stimulated emission and spontaneous emission. After these processes, an increment dE in photon energy of the total system (i.e., the rectangular parallelepiped) during dt is described by
I = \frac{E}{SL}c'.   (9.84)
c' = c/n,   (9.85)
dI = \frac{c'}{SL}dE = \frac{c'}{SL}B_{21}\rho(\omega_{21})(N_2 - N_1)\hbar\omega_{21}\,dt = B_{21}\rho(\omega_{21})\tilde{N}\hbar\omega_{21}c'\,dt,   (9.86)

where Ñ = (N2 - N1)/SL denotes a "net" density of atoms that occupy the excited state.
The energy density ρ(ω21) can be written as
The quantity I(ω21) is an energy flux that gets through per unit area per unit time. This flux corresponds to an energy contained in a long and thin rectangular parallelepiped of a length c' and a unit cross-section area. To obtain ρ(ω21), I(ω21) should be divided by c' in (9.87) accordingly. Using (9.87) and replacing c'dt with a distance dx and I(ω21) with I(x) as a function of x, we rewrite (9.86) as
dI(x) = \frac{B_{21}\,g(\omega_{21})\,\tilde{N}\hbar\omega_{21}}{c'}\,I(x)\,dx.   (9.88)
\int_{I_0}^{I}\frac{dI(x)}{I(x)} = \int_{I_0}^{I} d\ln I(x) = \frac{B_{21}\,g(\omega_{21})\,\tilde{N}\hbar\omega_{21}}{c'}\int_0^{x} dx,   (9.89)
where I0 is an irradiance of light at an instant when the light is entering the laser
medium from the left. Thus, we get
I(x) = I_0\exp\!\left[\frac{B_{21}\,g(\omega_{21})\,\tilde{N}\hbar\omega_{21}}{c'}\,x\right].   (9.90)
Equation (9.90) shows that an irradiance of the laser light is augmented expo-
nentially along the path of the laser light. In (9.90), denoting an exponent as G
G \equiv \frac{B_{21}\,g(\omega_{21})\,\tilde{N}\hbar\omega_{21}}{c'},   (9.91)
we get
The constant G is called a gain constant. This is an index that indicates the laser performance. Large values of B21, g(ω21), and Ñ yield a high performance of the laser.
In Sects. 8.8 and 9.2, we sought conditions for electromagnetic waves to cause
constructive interference. In a one-dimensional dielectric medium, the condition is
described as
kL = m\pi \quad \text{or} \quad m\lambda = 2L \ (m = 1, 2, \cdots),   (9.92)
It is often the case that if the laser is a long and thin rod, rectangular parallele-
piped, etc., we see that sharply resolved and regularly spaced spectral lines are
observed in emission spectrum. These lines are said to be a longitudinal multimode.
The separation between two neighboring emission lines is referred to as the free
spectral range [2]. If adjacent emission lines are clearly resolved so that the free
spectral range can easily be recognized, we can derive useful information from the
laser oscillation spectra (vide infra).
Rewriting (9.94) as, e.g.,

\omega_m n = \frac{\pi c}{L}m,   (9.95)
and taking the variation of both sides, we get

n\,\delta\omega_m + \omega_m\,\delta n = n\,\delta\omega_m + \omega_m\frac{\delta n}{\delta\omega_m}\delta\omega_m = \left(n + \omega_m\frac{\delta n}{\delta\omega_m}\right)\delta\omega_m = \frac{\pi c}{L}\delta m.   (9.96)
Therefore, we get
\delta\omega_m = \frac{\pi c}{L}\left(n + \omega_m\frac{\delta n}{\delta\omega_m}\right)^{-1}\delta m.   (9.97)
The quantity

n_g \equiv n + \omega_m\frac{\delta n}{\delta\omega_m}   (9.98)
plays a role of refractive index when the laser material has a wavelength dispersion.
The quantity ng is said to be a group refractive index (or group index). Thus, (9.97) is
rewritten as
\delta\omega = \frac{\pi c}{L n_g}\,\delta m,   (9.99)
where we omitted the index m of ωm. When we need to distinguish the refractive
index n clearly from the group refractive index, we refer to n as a phase refractive
index. Rewriting (9.98) as a relation of continuous quantities and using differenti-
ation instead of variation, we have [2]
n_g = n + \omega\frac{dn}{d\omega} \quad \text{or} \quad n_g = n - \lambda\frac{dn}{d\lambda}.   (9.100)
With the second equality we used

\omega\,d\lambda + \lambda\,d\omega = 0 \quad \text{or} \quad \frac{\omega}{d\omega} = -\frac{\lambda}{d\lambda}.
A frequently used empirical formula for the wavelength dispersion is Sellmeier's dispersion formula

n = \left[A + \frac{B}{1 - (C/\lambda)^2}\right]^{1/2},   (9.101)
where A, B, and C are appropriate constants with A and B being dimensionless and
C having a dimension [m]. In an actual case, it would be difficult to determine
n analytically. However, if we are able to obtain well-resolved spectra, δm can be put
as 1 and δωm can be determined from the free spectral range. Expressing it as δωFSR
from (9.99) we have
\delta\omega_{FSR} = \frac{\pi c}{L n_g} \quad \text{or} \quad n_g = \frac{\pi c}{L\,\delta\omega_{FSR}}.   (9.102)
Fig. 9.6 Broadband emission spectra of an organic semiconductor crystal AC′7. (a) Full spectrum.
(b) Enlarged profile of the spectrum around 530 nm. (Reproduced from Yamao T, Okuda Y,
Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer
single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing.
https://doi.org/10.1063/1.3634117)
[Fig. 9.7 Laser oscillation spectrum of the AC'7 crystal in the 522-528 nm region. (Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117)]
measurements were carried out using an AC′7 crystal that had a pair of parallel
crystal faces which worked as an optical resonator called Fabry-Pérot resonator. It
caused the interference modulation of emissions and, as a result, produced a periodic
modulation that was superposed onto the broadband emission. It is clearly seen in
Fig. 9.6b.
As another example, Fig. 9.7 [6] displays a laser oscillation spectrum obtained
under a strong excitation regime (using a laser beam) with the same AC′7 crystal as
the above. The structural formula of AC′7 is shown in Fig. 9.8 together with other
Fig. 9.8 Structural formulae of several organic semiconductors BP1T, AC5, and AC′7
related organic semiconductors. The spectrum again exhibits the periodic modula-
tion accompanied by the longitudinal multimode. Sharply resolved emission lines
are clearly noted with a regular spacing in the spectrum. If we compare the regular
spacing with that of Fig. 9.6b, we immediately recognize that the regular spacing is
virtually the same for the two spectra in spite of a considerably large difference between their overall spectral profiles. In terms of the interference, both the broadband spectra and laser emission lines gave the same free spectral range [6].
Thus, both the broadband emission spectra and laser oscillation spectra are
equally useful to study the wavelength dispersion of refractive index of the crystal.
Choosing a suitable empirical formula of the wavelength dispersion, we can com-
pare it with experimentally obtained results. If we choose, e.g., (9.101) for the empirical formula, what we do next is to determine the constants A, B, and C of (9.101) by comparing the computation results with actual emission data.
If well-resolved interference modulation spectra can be obtained, this enables one
to immediately get a precise value of ng from (9.102). In turn, it leads to information
about the wavelength dispersion of n through (9.100) and (9.101). Bearing in mind this situation, Yamao et al. obtained the following expression [6]:
Table 9.1 Optimized constants of A, B, and C for Sellmeier’s dispersion formula (9.101) with
several organic semiconductor crystalsa
Material A B C (nm)
BP1T 5.7 1.04 397
AC5 3.9 1.44 402
AC′7 6.0 1.06 452
a
Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive
indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages
[6], with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117
n_g = \frac{A\left[1 - (C/\lambda)^2\right]^2 + B}{\left[1 - (C/\lambda)^2\right]^{3/2}\left[A\left(1 - (C/\lambda)^2\right) + B\right]^{1/2}}.   (9.103)
Notice that (9.103) is obtained by inserting (9.101) into (9.100) and expressing ng
as a function of λ. Determining optimum constants A, B, and C, a set of these
constants yield a reliable dispersion formula in (9.101). Numerical calculations can
be utilized effectively. The procedures are as follows: (1) Tentatively choosing
probable numbers for A, B, and C in (9.103), ng can be expressed as a function of
λ. (2) The resulting fitting curve is then compared with ng data experimentally
determined from (9.102). After this procedure, one can choose another set of A, B, and C and again compare the fitting curve with the experimental data. (3) This procedure can be repeated many times through iterative numerical computations of (9.103) using different sets of A, B, and C.
Thus, we should be able to adjust and determine a better and better combination of A, B, and C so that the refined function (9.103) can reproduce the experimental results as precisely as one pleases. At the same time, we can determine the most
reliable combination of A, B, and C. Optimized constants A, B, and C of (9.101) are
listed in Table 9.1 [6]. Using these numbers A, B, and C, the dispersion of the phase
refractive indices n can be calculated using (9.101) as a function of wavelengths λ.
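Using the AC'7 constants from Table 9.1 (A = 6.0, B = 1.06, C = 452 nm), the following sketch evaluates the phase refractive index n(λ) of (9.101) and the group index n_g(λ) of (9.103). It is only an illustrative evaluation of the formulas; a full least-squares fit as described above would adjust A, B, and C against measured n_g data (e.g., with scipy.optimize.curve_fit).

```python
import numpy as np

A, B, C = 6.0, 1.06, 452.0e-9     # Table 9.1, AC'7 crystal

def n_phase(lam):
    """Sellmeier dispersion (9.101)."""
    u = 1.0 - (C / lam)**2
    return np.sqrt(A + B / u)

def n_group(lam):
    """Group index (9.103), equivalent to n - lambda*dn/dlambda from (9.100)."""
    u = 1.0 - (C / lam)**2
    return (A * u**2 + B) / (u**1.5 * np.sqrt(A * u + B))

for lam in (500e-9, 530e-9, 600e-9):
    print(f"lambda = {lam*1e9:.0f} nm: n = {n_phase(lam):.3f}, n_g = {n_group(lam):.3f}")
```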
Figure 9.9 [6] shows several examples of the wavelength dispersion of ng and n for
the organic semiconductor crystals. Once again, it is intriguing to see that ng
associated with the laser oscillation line sits on the dispersion curve determined
from the broadband emission lines (Fig. 9.9a).
The formulae (9.101) and (9.103) along with associated procedures to determine
the constants A, B, and C are expected to apply widely to various laser and light-
emitting materials consisting of semiconducting inorganic and organic materials.
Example 9.2 [7] If we wish to construct a laser device, it will be highly desired to
equip the laser material with a suitable diffraction grating or resonator [8, 9]. Suppose
a situation where we choose a thin crystal for an optically active material equipped
with the diffraction grating. In that case, we are to deal with a device consisting of a
crystal slab waveguide. We have already encountered such a device configuration in
Sect. 8.7 where the transverse modes must be considered on the basis of (8.167). We
Fig. 9.9 Examples of the wavelength dispersion of refractive indices for several organic semicon-
ductor crystals. (a) Dispersion curves of the group indices (ng). The optimized solid curves are
plotted using (9.103). The data plotted by filled circles (BP1T), filled triangles (AC5), and filled
diamonds (AC′7) were obtained by the broadband interference modulation of emissions. An open
circle, an open triangle, and open diamonds show the group indices estimated from the mode
intervals of the multimode laser oscillations. (b) Dispersion curves of the phase refractive indices
(n). These data are plotted using the Sellmeier’s dispersion formula (9.101) with the optimized
parameters A, B, and C (listed in Table 9.1). (Reproduced from Yamao T, Okuda Y, Makino Y,
Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single
crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing. https://
doi.org/10.1063/1.3634117)
should also take account of the longitudinal modes that arise from the presence of the
diffraction grating. As a result, allowed modes in the optical device are indexed by,
e.g., TMml, TEml, where m and l denote the longitudinal modes and transverse
modes, respectively (see Sects. 8.7 and 8.8).
Besides the information about the dispersion of phase refractive index, we need
numerical data of the propagation constant and its dispersion. The propagation
constant β has appeared in Sect. 8.7.1 and is defined as
where θ is identical to an angle denoted by θ that was defined in Fig. 8.12. The
propagation constant β (>0) characterizes the light propagation within a slab
waveguide where the transverse modes of light are responsible.
As the phase refractive index n has a dispersion, the propagation constant β has a
dispersion as well. In optics, n sin θ is often referred to as an effective index. We
denote it by
Fig. 9.10 Construction of an organic light-emitting device. (a) Structural formula of P6T. (b)
Diffraction grating of an organic device. The purple arrow indicates the direction of grating
wavevector K. (Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y,
Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal
light-emitting devices. J Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP
Publishing. https://doi.org/10.1063/1.5030486). (c) Cross-section of the device consisting of P6T
crystal/AZO substrate
where β is the propagation vector with |β| ≡ β that appeared in (8.153). The quantity β is called a propagation constant. In (9.105), furthermore, m is the diffraction order
with a natural number (i.e., a positive integer); k0 denotes the wavenumber vector of
an emission in vacuum with
|\mathbf{k}_0| \equiv k_0 = 2\pi/\lambda_p,
where λp is the emission peak wavelength in vacuum. Note that k0 is almost identical
to the wavenumber of the emission in air (i.e., the emission to be detected). The unit
vector ẽ that is oriented in the direction parallel to β - mK within the crystal plane is defined as
\tilde{\mathbf{e}} = \frac{\boldsymbol{\beta} - m\mathbf{K}}{|\boldsymbol{\beta} - m\mathbf{K}|}.   (9.106)
|\mathbf{K}| \equiv K = 2\pi/\Lambda,
where Λ is the grating period and the direction of K is perpendicular to the grating
grooves; that direction is indicated in Fig. 9.10b with a purple arrow. Equation
(9.105) can be visualized in Fig. 9.11 as geometrical relationship among the light
propagation in crystal (indicated with β), the propagation outside the crystal (k0), and
the grating wavevector (K).
If we are dealing with the emission parallel to the substrate plane, i.e., grazing
emission, (9.105) can be reduced to
\boldsymbol{\beta} - m\mathbf{K} = \mathbf{k}_0.   (9.107)
Equations (9.105) and (9.107) represent the relationship between β and k0, i.e.,
the phase matching conditions between the emission inside the device (organic slab
crystal) and the out-coupled emission, namely the emission outside the device
(in air).
Thus, the boundary conditions can be restated as (1) the tangential component continuity of electromagnetic fields at both the planes of interface (see Sect. 8.1) and (2) the phase matching of the electromagnetic fields both inside and outside the laser medium. We examine these two factors below.
1. Tangential component continuity of electromagnetic fields (waveguide
condition):
Let us suppose that we are dealing with a dielectric medium with anisotropic
dielectric constant, but isotropic magnetic permeability. In this situation, Max-
well’s equations must be formulated so that we can deal with the electromagnetic
properties in an anisotropic medium. Such substances are widely available and
organic crystals are counted as typical examples. Among those crystals, P6T
crystallizes in the monoclinic system as in many other cases of organic crystals
[10, 11]. In this case, the permittivity tensor (or electric permittivity tensor) ε is
written as [7]
\varepsilon = \begin{pmatrix} \varepsilon_{aa} & 0 & \varepsilon_{ac} \\ 0 & \varepsilon_{bb} & 0 \\ \varepsilon_{ca} & 0 & \varepsilon_{cc} \end{pmatrix}.   (9.108)

The magnetic permeability, in turn, is isotropic, i.e.,

\mu = \begin{pmatrix} \mu_0 & 0 & 0 \\ 0 & \mu_0 & 0 \\ 0 & 0 & \mu_0 \end{pmatrix}.   (9.109)
\mathrm{rot}\,\mathbf{E} = -\mu_0\frac{\partial\mathbf{H}}{\partial t},   (9.110)
\mathrm{rot}\,\mathbf{H} = \varepsilon_0\begin{pmatrix} \varepsilon_{aa} & 0 & \varepsilon_{ac} \\ 0 & \varepsilon_{bb} & 0 \\ \varepsilon_{ca} & 0 & \varepsilon_{cc} \end{pmatrix}\frac{\partial\mathbf{E}}{\partial t}.   (9.111)
(\xi\ \eta\ \zeta)\,\tilde{\varepsilon}\begin{pmatrix}\xi\\ \eta\\ \zeta\end{pmatrix} = 1,   (9.112)

(0\ \eta\ \zeta)\,\tilde{\varepsilon}\begin{pmatrix}0\\ \eta\\ \zeta\end{pmatrix} = \varepsilon_{\eta\eta}\eta^2 + \varepsilon_{\zeta\zeta}\zeta^2 = 1.   (9.113)
where R_c(α) and R_ξ(δ) stand for the first and second rotation matrices of the above, respectively. In (9.114) we have

R_c(\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad R_\xi(\delta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\delta & -\sin\delta \\ 0 & \sin\delta & \cos\delta \end{pmatrix}.   (9.115)
\mathrm{rot}\,\mathbf{H} = \varepsilon_0\begin{pmatrix} \varepsilon_{\xi\xi} & 0 & \varepsilon_{\xi\zeta} \\ 0 & \varepsilon_{\eta\eta} & 0 \\ \varepsilon_{\zeta\xi} & 0 & \varepsilon_{\zeta\zeta} \end{pmatrix}\frac{\partial\mathbf{E}}{\partial t}.   (9.116)
Notice here that the electromagnetic quantities H and E are measured in the
ξηζ-coordinate system.
Meanwhile, the electromagnetic waves that are propagating within the slab
crystal are described as
where Φν stands for either an electric field or a magnetic field with ν chosen from
ν = 1, 2, 3 representing each component of the ξηζ-system. Inserting (9.117) into
(9.116) and separating the resulting equation into ξ, η, and ζ components, we
obtain six equations with respect to six components of H and E. Of these, we are
particularly interested in Eξ and Eζ as well as Hη, because we assume that the
electromagnetic wave is propagated as a TM mode [7].
Using the set of the above six equations, with Hη in the P6T crystal we get the
following second-order linear differential equation (SOLDE):
and inserting (9.119) into (9.118), we get a following type of equation described
by
In (9.119) Hcryst represents the magnetic field within the P6T crystal. The
constants Ζ and Ω in (9.120) can be described using κ and k together with the
constant coefficients of dHη/dζ and Hη of SOLDE (9.118). Meanwhile, we have
where the differentiation is pertinent to ζ. This shows that e^{-iκζ} cos(kζ + ϕs) and e^{-iκζ} sin(kζ + ϕs) are linearly independent. This in turn implies that

Z = \Omega = 0.   (9.121)
\kappa = \frac{\varepsilon_{\xi\zeta}}{\varepsilon_{\zeta\zeta}}\beta = \frac{\varepsilon_{\xi\zeta}}{\varepsilon_{\zeta\zeta}}n_{\mathrm{eff}}k_0, \qquad k^2 = \frac{(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2)(\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^2)}{\varepsilon_{\zeta\zeta}^2}\,k_0^2.   (9.122)
E_\xi = i\,\frac{\varepsilon_{\zeta\zeta}}{\omega\varepsilon_0(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2)}\,kH_{\mathrm{cryst}}\,e^{-i\kappa\zeta}\sin(k\zeta + \phi_s).   (9.123)
\frac{d^2H_\eta}{d\zeta^2} + (\tilde{n}^2 - n_{\mathrm{eff}}^2)k_0^2\,H_\eta = 0,   (9.124)
where ñ is the refractive index of either the substrate or air. Since

1 < \tilde{n} \le n_{\mathrm{eff}} \le n,   (9.125)

where n is the phase refractive index of the P6T crystal, we have \tilde{n}^2 - n_{\mathrm{eff}}^2 < 0.
Defining a quantity
\gamma \equiv \sqrt{n_{\mathrm{eff}}^2 - \tilde{n}^2}\;k_0   (9.126)
for the substrate (i.e., AZO) and air, from (9.124) we express the electromagnetic
fields as
H_\eta = \tilde{H}e^{\gamma\zeta},   (9.127)
where H̃ denotes the magnetic field relevant to the substrate or air. Considering that ζ < 0 for the substrate and ζ > d/cos δ in air (see Fig. 9.13), with the magnetic field we have

H_\eta = \tilde{H}e^{\gamma\zeta} \ \text{(substrate)} \qquad \text{and} \qquad H_\eta = \tilde{H}e^{-\gamma(\zeta - d/\cos\delta)} \ \text{(air)},   (9.128)

and with the electric field

E_\xi = -i\,\frac{\gamma}{\omega\varepsilon_0\tilde{n}^2}\,\tilde{H}e^{\gamma\zeta} \ \text{(substrate)} \qquad \text{and} \qquad E_\xi = i\,\frac{\gamma}{\omega\varepsilon_0\tilde{n}^2}\,\tilde{H}e^{-\gamma(\zeta - d/\cos\delta)} \ \text{(air)}.   (9.129)
where the factor of cosδ comes from the requirement of the tangential continuity
condition. (b) The tangential continuity condition for the electric field at the same
interface reads as
i\,\frac{\varepsilon_{\zeta\zeta}}{\omega\varepsilon_0(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2)}\,kH_{\mathrm{cryst}}\,e^{-i\kappa\cdot 0}\sin(k\cdot 0 + \phi_s) = -i\,\frac{\gamma}{\omega\varepsilon_0\tilde{n}^2}\,\tilde{H}e^{\gamma\cdot 0}.   (9.131)
\frac{\varepsilon_{\zeta\zeta}}{\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2}\,k\tan\phi_s = -\frac{\gamma}{\tilde{n}^2} \quad \text{or} \quad \frac{\varepsilon_{\zeta\zeta}}{\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2}\,k\tan(-\phi_s) = \frac{\gamma}{\tilde{n}^2},   (9.132)
where γ and ñ are substrate-related quantities; see (9.125) and (9.126).
Another boundary condition at the air/crystal interface can be obtained in a
similar manner. Notice that in that case ζ = d/ cos δ; see Fig. 9.13. As a result,
we have
\frac{\varepsilon_{\zeta\zeta}}{\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2}\,k\tan\!\left(\frac{kd}{\cos\delta} + \phi_s\right) = \frac{\gamma}{\tilde{n}^2},   (9.133)
where γ and ñ are air-related quantities. The quantity ϕs can be eliminated by taking the arctangent of (9.132) and (9.133).
In combination with (9.122), we finally get the following eigenvalue equation
for TM modes:
\frac{k_0 d}{\varepsilon_{\zeta\zeta}}\left[(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2)(\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^2)\right]^{1/2}/\cos\delta
 = l\pi + \tan^{-1}\!\left\{\frac{1}{n_{\mathrm{air}}^2}\left[\frac{(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2)(n_{\mathrm{eff}}^2 - n_{\mathrm{air}}^2)}{\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^2}\right]^{1/2}\right\}
 + \tan^{-1}\!\left\{\frac{1}{n_{\mathrm{sub}}^2}\left[\frac{(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^2)(n_{\mathrm{eff}}^2 - n_{\mathrm{sub}}^2)}{\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^2}\right]^{1/2}\right\},   (9.134)
where nair (=1) is the refractive index of air, nsub is that of AZO substrate
equipped with a diffraction grating, d is the crystal thickness, and l is the order
of the transverse mode. The refractive index nsub is estimated as the average of the
refractive indices of air and AZO weighted by volume fraction they occupy in the
diffraction grating.
Regarding the index l, the condition l ≥ 0 (i.e., zero or a positive integer) must
be satisfied. Note that l = 0 is allowed because the total internal reflection (Sects.
8.5 and 8.6) is responsible in the crystal waveguide. Thus, neff can be determined
as a function of k0 with d and l being parameters. To solve (9.134), iterative
numerical computation is needed. The detailed calculation procedures for this can
be seen in the literature [7] and supplementary material therein.
Note that (9.134) includes the well-known eigenvalue equation for a wave-
guide that comprises an isotropic core dielectric medium sandwiched by a couple
of clad layers having the same refractive index (symmetric waveguide) [12]. In
that case, in fact, the second and third terms in RHS of (9.134) are the same and
their sum is identical with -δTM of (8.176) in Sect. 8.7.2. To confirm it, in (9.134)
use εξξ = εζζ ≈ n2 [see (7.57)] and εξζ = 0 together with the relative refractive
index given by (8.39).
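To make the iterative computation mentioned above concrete, here is a minimal numerical sketch (not the authors' code) that finds n_eff from the TM eigenvalue equation (9.134) by bracketed root finding. All numbers below are illustrative assumptions; the permittivity components and substrate index in particular are invented, and the thickness and wavelength merely echo the order of magnitude discussed in Example 9.3.

```python
import numpy as np
from scipy.optimize import brentq

def tm_residual(n_eff, k0, d, eps_xx, eps_zz, eps_xz, n_air, n_sub, l, cos_delta=1.0):
    """LHS minus RHS of the TM eigenvalue equation (9.134); a guided mode gives a zero."""
    det = eps_xx * eps_zz - eps_xz**2
    lhs = k0 * d * np.sqrt(det * (eps_zz - n_eff**2)) / (eps_zz * cos_delta)
    def half_shift(n_clad):
        return np.arctan(np.sqrt(det * (n_eff**2 - n_clad**2) / (eps_zz - n_eff**2)) / n_clad**2)
    return lhs - (l * np.pi + half_shift(n_air) + half_shift(n_sub))

# Illustrative parameters (assumed, not from the text)
lam, d = 569e-9, 428e-9                      # vacuum wavelength and crystal thickness
k0 = 2 * np.pi / lam
eps_xx, eps_zz, eps_xz = 7.3, 7.3, 0.0       # assumed permittivity components (isotropic here)
n_air, n_sub = 1.0, 1.6                      # air and assumed substrate index
lo, hi = max(n_air, n_sub) + 1e-6, np.sqrt(eps_zz) - 1e-6

for l in range(4):
    try:
        n_eff = brentq(tm_residual, lo, hi,
                       args=(k0, d, eps_xx, eps_zz, eps_xz, n_air, n_sub, l))
        print(f"TM l = {l}: n_eff = {n_eff:.4f}")
    except ValueError:
        print(f"TM l = {l}: no guided solution in ({lo:.2f}, {hi:.2f})")
```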
magnitudes of β, k0, and K according to the above two cases. Using an elementary
trigonometric formula, we describe the results as follows:
or
where we assume that m is a positive integer. In (9.135) the former and latter
equations correspond to Case (1) and Case (2), respectively (see Fig. 9.14).
Replacing β with β = k0neff and taking the variation of the resulting equation
with regard to k0 and θ, we get
where the signs - and + are associated with Case (1) and Case (2), respectively.
In (9.136) we did not take the variation of neff, because it was assumed to be
weakly dependent on θ compared to k0. Note that since k0 was not allowed to take
continuous numbers, we used the sign of variation δ instead of the sign of
differentiation d or ∂. The denominator of (9.136) is positive (readers, check
it), and so δk0/δθ is negative or positive with 0 ≤ θ ≤ π/2 for Case (1) and Case
(2), respectively. The quantity δk0/δθ is positive or negative with -π/2 ≤ θ ≤ 0 for
Case (1) and Case (2), respectively. For both Case (1) and Case (2), δk0/δθ = 0 at
θ = 0, namely, k0 (or the emission wavelength) takes an extremum at θ = 0. In
terms of the emission wavelength, Case (1) and Case (2) are related to the redshift
and blueshift in the spectra, respectively. These shifts can be detected and
inspected closely by the angle-dependent emission spectra (vide infra).
Meanwhile, taking inner products (see Chap. 13) of both sides of (9.107),
we have
(a) Z
(b) Z
k0 k0
mK mK
Grating Grating
Crystal β Crystal β
X X
(c) Z
mK
Grating
k0
Crystal −β β
X
Fig. 9.15 Schematic diagram for the relationship among k0, β, and K. This geometry is obtained by
substituting φ = 0 for (9.137). (a) Redshifted case. (b) Blueshifted case. (c) Bragg’s condition case.
All these diagrams are depicted as a cross section of the devices cut along the K direction.
(Reproduced from Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at
an intended wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl
Phys 127(5): 053102/6 pages [13], with the permission of AIP Publishing. https://doi.org/10.1063/
1.5129425)
(\boldsymbol{\beta} - m\mathbf{K})^2 = k_0^2 \quad \text{or} \quad \beta - mK = \pm k_0.
(1)\quad \beta = k_0 + mK = n_{\mathrm{eff}}k_0, \qquad k_0 = \frac{2\pi}{\lambda_p} = \frac{mK}{n_{\mathrm{eff}} - 1},
(2)\quad -\beta = k_0 - mK = -n_{\mathrm{eff}}k_0, \qquad k_0 = \frac{2\pi}{\lambda_p} = \frac{mK}{n_{\mathrm{eff}} + 1}.   (9.138)
In (9.138), all the physical quantities are assumed to be positive. According to the
above two cases, we have the following corresponding geometry such that
1. β, K ∥ k0 (β and K are parallel to k0),
2. β, K antiparallel to k0 (β and K point opposite to k0).
The above geometry is depicted in Fig. 9.15 [13]. Figure 9.15 includes a case of
the Bragg’s condition as well (Fig. 9.15c). Notice that Fig. 9.15a and Fig. 9.15b are
special cases of the configurations depicted in Fig. 9.14a and Fig. 9.14b,
respectively.
where the sign ∓ corresponds to the aforementioned Case (1) and Case (2), respec-
tively; we used the relation of K = 2π/Λ. Equation (9.139) is useful for us to guess
the magnitude of m. It is because since m is changed stepwise (e.g., m = 1, 2, 3,etc.),
we have only to examine a limited number of m so that neff may satisfy (9.125).
When the laser oscillation is responsible, (9.138) and (9.139) should be replaced
with the following relation that represents the Bragg’s condition [14]:
2β = mK ð9:140Þ
or
\lambda_p = \frac{2n_{\mathrm{eff}}\Lambda}{m}.   (9.141)
Once we find a suitable m, (9.107) enables us to seek the relationship between λp and neff experimentally.
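The Bragg condition (9.141) makes grating design a one-line calculation once n_eff is known. The sketch below loosely echoes device D1 of Example 9.3 (m = 3, target wavelength 569.0 nm); the effective index is an assumed value standing in for the result of solving (9.134), so the output is only an illustrative back-of-the-envelope number.

```python
m = 3                      # Bragg order used in the design
lam_target = 569.0e-9      # intended oscillation wavelength (m)
n_eff = 2.67               # assumed effective index at lam_target (e.g., from (9.134))

Lambda = m * lam_target / (2 * n_eff)            # required grating period, from (9.141)
print(f"grating period: {Lambda * 1e9:.1f} nm")

# Cross-check: the wavelength selected by this grating at the same effective index
print(f"selected wavelength: {2 * n_eff * Lambda / m * 1e9:.1f} nm")
```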
Fig. 9.16 Angle-dependent emission spectra of a P6T crystal. (a) Emission spectra. (Reproduced
from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018)
Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys
123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.1063/1.
5030486). (b) Geometrical relationship between k0 and K. The grazing emissions (k0) were
collected at various angles θ made with the grating wavevector direction (K)
and, hence, (9.134) is associated with the experimental data in that sense. As clearly
recognized from Fig. 9.17, agreement between the experiments and numerical
computations is satisfactory. These curves are compared with a black solid curve
drawn at the upper right that indicates the wavelength dispersion of the phase
refractive indices (n) of the P6T crystal related to the one principal axis [15].
In a practical sense, it will be difficult to make a device that causes the laser oscillation at an intended wavelength using an anisotropic crystal, as in the case of the present example. This is mainly because of the difficulty in predicting neff of the anisotropic crystal. From (9.134), however, we can readily determine the wavelength dispersion
of neff of the crystal. From the Bragg’s condition (9.141), in turn, Λ can precisely be
designed for a given index m and suitable λp (at which neff can be computed with a
crystal of its thickness d ). Naturally, it is sensible to set λp at a wavelength where
high optical gain (or emission gain) is anticipated [16].
Example 9.3 [13] Thus, everything has been arranged for one to construct a high-
performance optical device, especially a laser. Using a single crystal of BP3T (see
Fig. 9.18a for the structural formula), Inada and co-workers have developed a device
that causes the laser oscillation at an intended wavelength [13].
To this end, the BP3T crystal was placed on a silicon oxide substrate and
subsequently spin-coated with a photoresist film. The photoresist film was then
Fig. 9.18 Wavelength dispersions of the effective/phase refractive indices for the BP3T crystal
waveguide. (a) Structural formula of BP3T. (b) Dispersion data of the device D1 (see Table 9.2).
Colored solid curves (either green or red) are obtained from (9.134). The data of green circles and
red triangles are obtained through (9.107) using the emission directions and the emission line
locations. The laser oscillation line is indicated with a blue square as a TM30 mode occurring at
568.1 nm. The black solid curve drawn at the upper right indicates the dispersion of the phase
refractive index of the BP3T crystal pertinent to the one principal axis [15]. (Reproduced from
Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended wavelength
from a distributed feedback structure on anisotropic organic crystals. J Appl Phys 127(5): 053102/6
pages [13], with the permission of AIP Publishing. https://doi.org/10.1063/1.5129425)
engraved with a one-dimensional diffraction grating so that the a-axis of the BP3T
crystal was parallel to the grating wavevector. With this device geometry, the angle-
dependent spectroscopy was carried out in a weak excitation regime to determine the
wavelength dependence of the effective index. At the same time, strong excitation
measurement was performed to analyze the laser oscillation data.
The BP3T crystal exhibited a high emission gain around 570 nm [17]. Accord-
ingly, setting the laser oscillation wavelength at 569.0 nm, Inada and co-workers
made the BP3T crystal devices with different crystal thicknesses. As an example,
using a crystal of 428 nm in thickness (device D1; Table 9.2), on the basis of (9.134)
they estimated the effective index neff at 569.0 nm in the direction of the a-axis of the
BP3T crystal. Meanwhile, using (9.141) and assuming m to be 3, the grating period was determined; the actual period of the resulting device was 319.8 nm (see Table 9.2).
The results of the emission measurements for the device D1 are displayed in
Fig. 9.18b. The figure shows the dispersion of the effective indices neff of the BP3T
crystal. Once again, the TM20 (blueshifted) and TM10 (redshifted) modes are
seamlessly joined. As in the case of Fig. 9.9a, the laser oscillation line shows up
as a TM30 mode among other modes. That is, the effective index neff of the laser
oscillation line sits on the dispersion curve consisting of other emission lines arising
from the weak excitation.
Figure 9.19 shows the distinct single-mode laser oscillation line occurring at
568.1 nm (Table 9.2). This clearly indicates that the laser oscillation has been
achieved at a location very close to the designed wavelength (at 569.0 nm). Compare
the spectrum of Fig. 9.19 with that of Fig. 9.7. Manifestation of the single-mode line
is evident and should be contrasted with the longitudinal multimode laser oscillation
of Fig. 9.7.
To quantify the validity of (9.134), the following quantity V is useful [13]:
where V stands for validity of (9.134); neff (WG) was estimated from the waveguide
condition of (9.134) in combination with the emission data obtained with the device;
neff (PM) was obtained from the phase matching condition described by (9.107),
where we designated the polarization vector as a direction of the x-axis. At the same
time, we omitted the index from the amplitude. Thus, from the second equation of
(7.65) we have
\frac{\partial H_y}{\partial t} = -\frac{1}{\mu}\frac{\partial E_x}{\partial z} = -\frac{\sqrt{2}\,Ek}{\mu}\sin\omega t\,\cos kz. \quad (9.143)
Note that this equation appeared as (8.131) as well. Integrating both sides of (9.143), we get
H_y = \frac{\sqrt{2}\,Ek}{\mu\omega}\cos\omega t\,\cos kz.
Using \omega = vk, this can be rewritten as
H_y = \frac{\sqrt{2}\,E}{\mu v}\cos\omega t\,\cos kz + C,
where C is an integration constant that can be set to zero. Defining
H \equiv \frac{E}{\mu v},
we have
H_y = \sqrt{2}\,H\cos\omega t\,\cos kz.
Thus, E (ke1), H (ke2), and n (ke3) form the right-handed system in this order. As
noted in Sect. 8.8, at the interface (or wall) the electric field and magnetic field form
nodes and antinodes, respectively. Namely, the two fields are in-quadrature.
Let us calculate electromagnetic energy of the dielectric medium within a cavity.
In the present case the cavity is meant as the dielectric sandwiched by a couple of
metal layers. We have
W = W_e + W_m,
where W is the total electromagnetic energy; We and Wm are electric and magnetic
energies, respectively. Let the length of the cavity be L. Then the energy per unit
cross-section area is described as
W_e = \frac{\varepsilon}{2}\int_0^L E^2\,dz \quad\text{and}\quad W_m = \frac{\mu}{2}\int_0^L H^2\,dz.
W_e = \frac{\varepsilon}{2}LE^2\sin^2\omega t,
W_m = \frac{\mu}{2}LH^2\cos^2\omega t = \frac{\mu}{2}L\left(\frac{E}{\mu v}\right)^2\cos^2\omega t = \frac{\varepsilon}{2}LE^2\cos^2\omega t, \quad (9.144)
W = \frac{\varepsilon}{2}LE^2 = \frac{\mu}{2}LH^2. \quad (9.145)
\tilde{W}_e = \frac{\varepsilon}{2}E^2\sin^2\omega t, \quad \tilde{W}_m = \frac{\varepsilon}{2}E^2\cos^2\omega t, \quad \tilde{W} = \frac{\varepsilon}{2}E^2 = \frac{\mu}{2}H^2. \quad (9.146)
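As a minimal symbolic check (a sketch, assuming H = E/(μv) and v² = 1/(εμ) as in the text), one can confirm that the total energy (9.145) is independent of time:

import sympy as sp

E, mu, eps, L, t, omega = sp.symbols('E mu epsilon L t omega', positive=True)
v = 1/sp.sqrt(eps*mu)
H = E/(mu*v)

We = sp.Rational(1, 2)*eps*L*E**2*sp.sin(omega*t)**2   # electric part of (9.144)
Wm = sp.Rational(1, 2)*mu*L*H**2*sp.cos(omega*t)**2    # magnetic part of (9.144)
W = sp.simplify(We + Wm)
print(W)                                               # -> E**2*L*epsilon/2, no t-dependence
print(sp.simplify(W - sp.Rational(1, 2)*mu*L*H**2))    # -> 0, consistent with (9.145)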
x(t) = \frac{v_0}{\omega}\sin\omega t = x_0\sin\omega t, \quad (2.7)
where
x_0 \equiv \frac{v_0}{\omega}.
Defining
p_0 \equiv m\omega x_0,
we have p(t) = mv_0\cos\omega t = p_0\cos\omega t.
Let the kinetic energy and potential energy of the oscillator be K and V, respectively. Then, we have
K = \frac{1}{2m}p(t)^2 = \frac{1}{2m}p_0^2\cos^2\omega t,
V = \frac{1}{2}m\omega^2x(t)^2 = \frac{1}{2}m\omega^2x_0^2\sin^2\omega t, \quad (9.147)
W = K + V = \frac{p_0^2}{2m} = \frac{1}{2}mv_0^2 = \frac{1}{2}m\omega^2x_0^2.
A general form of n-th order linear differential equations has the following form:
a_n(x)\frac{d^nu}{dx^n} + a_{n-1}(x)\frac{d^{n-1}u}{dx^{n-1}} + \cdots + a_1(x)\frac{du}{dx} + a_0(x)u = d(x). \quad (10.1)
Of these, a second-order linear differential equation (SOLDE) is described as
a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = d(x). \quad (10.2)
In (10.2) we assume that the variable x is real. The equation can be solved under
appropriate boundary conditions (BCs). A general form of BCs is described as
B_1(u) = \alpha_1u(a) + \beta_1\left.\frac{du}{dx}\right|_{x=a} + \gamma_1u(b) + \delta_1\left.\frac{du}{dx}\right|_{x=b} = \sigma_1, \quad (10.3)
B_2(u) = \alpha_2u(a) + \beta_2\left.\frac{du}{dx}\right|_{x=a} + \gamma_2u(b) + \delta_2\left.\frac{du}{dx}\right|_{x=b} = \sigma_2, \quad (10.4)
where \alpha_1, \beta_1, \gamma_1, \delta_1, \sigma_1, etc. are real constants; u(x) is defined in an interval [a, b],
where a and b can be infinity (i.e., \pm\infty). The LHSs of B_1(u) and B_2(u) are referred to as
boundary functionals [1, 2]. If σ 1 = σ 2 = 0, the BCs are called homogeneous;
otherwise, the BCs are said to be inhomogeneous. In combination with the inhomo-
geneous equation expressed as (10.2), Table 10.1 summarizes characteristics of
SOLDEs. We have four types of SOLDEs according to homogeneity and inhomo-
geneity of equations and BCs.
Even though SOLDEs are mathematically tractable, it is not necessarily easy
to solve them, depending upon the nature of a(x), b(x), and c(x) of (10.2). Nonethe-
less, if those coefficients are constants, the equation can readily be solved. We will deal
with SOLDEs of that type in detail later. Suppose that we find two linearly
independent solutions u1(x) and u2(x) of the following homogeneous equation:
a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = 0. \quad (10.5)
Then, any solution u(x) of (10.5) can be expressed as their linear combination
such that
f_n(x) = -\frac{a_1}{a_n}f_1(x) - \frac{a_2}{a_n}f_2(x) - \cdots - \frac{a_{n-1}}{a_n}f_{n-1}(x). \quad (10.8)
If f1(x), f2(x), ⋯, and fn(x) are not linearly dependent, they are called linearly
independent. In other words, the statement that f1(x), f2(x), ⋯, and fn(x) are linearly
independent is equivalent to that (10.7) holds if and only if a1 = a2 = ⋯ = an = 0.
We will have relevant discussion in Part III.
Now suppose that with the above two linearly independent functions u1(x) and
u2(x), we have
\begin{pmatrix} u_1(x) & u_2(x) \\ du_1(x)/dx & du_2(x)/dx \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = 0. \quad (10.11)
Thus, that u1(x) and u2(x) are linearly independent is equivalent to that the
following expression holds:
\begin{vmatrix} u_1(x) & u_2(x) \\ du_1(x)/dx & du_2(x)/dx \end{vmatrix} \equiv W(u_1, u_2) \neq 0, \quad (10.12)
where W(u1, u2) is called Wronskian of u1(x) and u2(x). In fact, if W(u1, u2) = 0, then
we have
u_1\frac{du_2}{dx} - u_2\frac{du_1}{dx} = 0. \quad (10.13)
This implies that there is a functional relationship between u1 and u2. In fact, if we
can express u2 = u2(u1(x)), then \frac{du_2}{dx} = \frac{du_2}{du_1}\frac{du_1}{dx}. That is,
where the second equality of the first equation comes from (10.13). The third
equation can easily be integrated to yield
\ln\frac{u_2}{u_1} = c \quad\text{or}\quad \frac{u_2}{u_1} = e^c \quad\text{or}\quad u_2 = e^cu_1. \quad (10.15)
Equation (10.15) shows that u1(x) and u2(x) are linearly dependent. It is easy to
show if u1(x) and u2(x) are linearly dependent, W(u1, u2) = 0. Thus, we have a
following statement:
Two functions are linearly dependent ⟺ W(u1, u2) = 0.
Then, as the contraposition of this statement we have
Two functions are linearly independent ⟺ W(u1, u2) ≠ 0.
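As a quick symbolic illustration of (10.12) (a minimal sketch using sympy; the example functions are ours, not the text's), a nonzero Wronskian signals linear independence and a vanishing one signals linear dependence:

import sympy as sp

x = sp.symbols('x')

def wronskian2(u1, u2):
    # W(u1, u2) = u1*u2' - u2*u1' as in (10.12)
    return sp.simplify(u1*sp.diff(u2, x) - u2*sp.diff(u1, x))

print(wronskian2(sp.sin(x), sp.cos(x)))    # -> -1 (never zero: independent)
print(wronskian2(sp.exp(x), 3*sp.exp(x)))  # -> 0  (dependent: u2 = 3*u1)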
Conversely, suppose that we have another solution u3(x) for (10.5) besides u1(x)
and u2(x). Then, we have
a(x)\frac{d^2u_1}{dx^2} + b(x)\frac{du_1}{dx} + c(x)u_1 = 0,
a(x)\frac{d^2u_2}{dx^2} + b(x)\frac{du_2}{dx} + c(x)u_2 = 0, \quad (10.16)
a(x)\frac{d^2u_3}{dx^2} + b(x)\frac{du_3}{dx} + c(x)u_3 = 0.
dx dx
Regarding a(x), b(x), and c(x) in (10.16) as unknowns of a set of simultaneous linear equations, we can rewrite (10.16) as
\begin{pmatrix} d^2u_1/dx^2 & du_1/dx & u_1 \\ d^2u_2/dx^2 & du_2/dx & u_2 \\ d^2u_3/dx^2 & du_3/dx & u_3 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix} = 0. \quad (10.17)
For (10.17) to have a non-trivial solution with respect to a, b, and c, we must have
\begin{vmatrix} d^2u_1/dx^2 & du_1/dx & u_1 \\ d^2u_2/dx^2 & du_2/dx & u_2 \\ d^2u_3/dx^2 & du_3/dx & u_3 \end{vmatrix} = 0. \quad (10.18)
Meanwhile, we have
\begin{vmatrix} d^2u_1/dx^2 & du_1/dx & u_1 \\ d^2u_2/dx^2 & du_2/dx & u_2 \\ d^2u_3/dx^2 & du_3/dx & u_3 \end{vmatrix} = -\begin{vmatrix} u_1 & u_2 & u_3 \\ du_1/dx & du_2/dx & du_3/dx \\ d^2u_1/dx^2 & d^2u_2/dx^2 & d^2u_3/dx^2 \end{vmatrix} \equiv -W(u_1, u_2, u_3), \quad (10.19)
where W(u1, u2, u3) is Wronskian of u1(x), u2(x), and u3(x). In the above relation, we
used the fact that a determinant of a matrix is identical to that of its transposed matrix
and that a determinant of a matrix changes the sign after permutation of row vectors.
To be short, a necessary and sufficient condition to get a non-trivial solution is that
W(u1, u2, u3) vanishes.
This implies that u1(x), u2(x), and u3(x) are linearly dependent. However, we have
assumed that u1(x) and u2(x) are linearly independent, and so (10.18) and (10.19)
mean that u3(x) must be described as a linear combination of u1(x) and u2(x). That is,
we have no third linearly independent solution. Consequently, the general solution
of (10.5) must be given by (10.6). In this sense, u1(x) and u2(x) are said to be a
fundamental set of solutions of (10.5).
Next, let us consider the inhomogeneous equation of (10.2). Suppose that up(x) is
a particular solution of (10.2). Let us think of the following function v(x) such that
v(x) = u(x) - u_p(x), \quad (10.20)
where u(x) is a general solution of (10.2). Inserting u(x) = v(x) + u_p(x) into (10.2), we have
a(x)\frac{d^2v}{dx^2} + b(x)\frac{dv}{dx} + c(x)v + a(x)\frac{d^2u_p}{dx^2} + b(x)\frac{du_p}{dx} + c(x)u_p = d(x). \quad (10.21)
Therefore, we have
a(x)\frac{d^2v}{dx^2} + b(x)\frac{dv}{dx} + c(x)v = 0.
But v(x) can be described by a linear combination of u1(x) and u2(x) as in the case
of (10.6). Hence, the general solution of (10.2) should be expressed as
u(x) = c_1u_1(x) + c_2u_2(x) + u_p(x). \quad (10.22)
a(x)\frac{du}{dx} + b(x)u = d(x) \quad [a(x) \neq 0], \quad (10.23)
where \alpha, \beta, and \sigma are real constants; u(x) is defined in an interval [a, b]. If in (10.23)
d(x) \equiv 0, (10.23) can readily be integrated to yield a solution. Let us multiply both
sides of (10.23) by w(x). Then we have
w(x)a(x)\frac{du}{dx} + w(x)b(x)u = w(x)d(x), \quad (10.25)
where w(x) is called a weight function. As mentioned in Sect. 2.4, the weight
function is a real and non-negative function within the domain considered. Here
we suppose that
p(x) \equiv w(x)a(x) \quad (10.26)
and
\frac{dp(x)}{dx} = w(x)b(x). \quad (10.27)
Then, (10.25) is rewritten as
\frac{d}{dx}[p(x)u] = w(x)d(x). \quad (10.28)
Hence, we get
u = \frac{1}{p(x)}\left[\int^x w(x')d(x')\,dx' + C\right], \quad (10.29)
where C is an integration constant. From (10.26) and (10.27), we have
p' = (wa)' = wb = wa\cdot\frac{b}{a}. \quad (10.30)
Therefore, we get
wa = C'\exp\left(\int\frac{b}{a}\,dx\right) \quad\text{or}\quad w = \frac{C'}{a}\exp\left(\int\frac{b}{a}\,dx\right), \quad (10.31)
where C' is an arbitrary integration constant. The quantity C'/a must be non-negative so
that w can be non-negative.
Example 10.1 Let us think of a following FOLDE within an interval [a, b]; i.e.,
a ≤ x ≤ b.
\frac{du}{dx} + xu = x. \quad (10.32)
u(a) = \sigma. \quad (10.33)
u(x) = \frac{1}{p(x)}\left[\int_a^x w(x')d(x')\,dx' + u(a)p(a)\right]. \quad (10.34)
Also, we have
p(x) = w(x) = \exp\left(\int_a^x x'\,dx'\right) = \exp\left[\frac{1}{2}\left(x^2 - a^2\right)\right]. \quad (10.35)
Therefore, we get
u(x) = \frac{1}{p(x)}\left[p(x) - 1 + \sigma\right] = 1 + \frac{\sigma - 1}{p(x)} = 1 + (\sigma - 1)\exp\left[\frac{1}{2}\left(a^2 - x^2\right)\right]. \quad (10.37)
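A short symbolic check (a sketch, not part of the original example) confirms that the solution (10.37) indeed satisfies the FOLDE (10.32) and the BC (10.33):

import sympy as sp

x, a, sigma = sp.symbols('x a sigma')
u = 1 + (sigma - 1)*sp.exp((a**2 - x**2)/2)

print(sp.simplify(sp.diff(u, x) + x*u - x))   # -> 0, so du/dx + x*u = x holds
print(sp.simplify(u.subs(x, a) - sigma))      # -> 0, so u(a) = sigma holds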
L_x = \frac{d}{dx}. \quad (10.38)
\int_a^b dx\,\varphi^*\frac{d\psi}{dx} + \int_a^b dx\,\left(\frac{d\varphi}{dx}\right)^*\psi = [\varphi^*\psi]_a^b. \quad (10.39)
\int_a^b dx\,\varphi^*\frac{d\psi}{dx} - \int_a^b dx\,\left(-\frac{d\varphi}{dx}\right)^*\psi = [\varphi^*\psi]_a^b. \quad (10.40)
Defining
\tilde{L}_x \equiv -\frac{d}{dx}
in (10.40), we have
Comparing (10.42) and (10.43) and considering that \varphi and \psi are arbitrary
functions, we have \tilde{L}_x = L_x^\dagger. Thus, as an operator adjoint to L_x we get
L_x^\dagger = \tilde{L}_x = -\frac{d}{dx} = -L_x.
Notice that only if the surface term vanishes, the adjoint operator Lx{ can
appropriately be defined. We will encounter a similar expression again in Part III.
We add that if with an operator A we have a relation described by
A^\dagger = -A, \quad (10.44)
the operator A is said to be anti-Hermitian.
Let us then examine on what condition the surface term vanishes. The RHS of
(10.40) and (10.41) is given by
\varphi^*(b)\psi(b) = \varphi^*(a)\psi(a) \quad\text{or}\quad \frac{\varphi^*(b)}{\varphi^*(a)} = \frac{\psi(a)}{\psi(b)}.
If ψ(b) = 2ψ(a), then we should have φ(b) = φ(a)/2 for the surface term to
vanish. Recalling (10.24), the above conditions are expressed as
B'(\varphi) = \varphi(a) - \frac{1}{2}\varphi(b) = 0. \quad (10.46)
The boundary functional B′(φ) is said to be adjoint to B(ψ). The two boundary
functionals are admittedly different. If, however, we set ψ(b) = ψ(a), then we should
have φ(b) = φ(a) for the surface term to vanish. That is
Thus, the two functionals are identical and ψ and φ satisfy homogeneous BCs
with respect to these functionals.
As discussed above, a FOLDE is characterized by its differential operator as well
as a BC (or boundary functional). This is similarly the case with SOLDEs as well.
Example 10.3 Next, let us consider a following differential operator:
L_x = \frac{1}{i}\frac{d}{dx}. \quad (10.49)
\int_a^b dx\,\varphi^*\frac{1}{i}\frac{d\psi}{dx} - \int_a^b dx\,\left(\frac{1}{i}\frac{d\varphi}{dx}\right)^*\psi = \frac{1}{i}[\varphi^*\psi]_a^b. \quad (10.50)
\langle\varphi|L_x\psi\rangle - \langle L_x\varphi|\psi\rangle = \frac{1}{i}[\varphi^*\psi]_a^b. \quad (10.51)
Repeating a discussion similar to Example 10.2, when the surface term vanishes,
we get
L_x^\dagger = L_x. \quad (10.54)
The second-order differential operators are the most common operators and fre-
quently treated in mathematical physics. The general differential operators are
described as
L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x), \quad (10.55)
where a(x), b(x), and c(x) can in general be complex functions of a real variable x.
Let us think of the following identities [1]:
v^*\left(a\frac{d^2u}{dx^2}\right) - u\frac{d^2(av^*)}{dx^2} = \frac{d}{dx}\left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx}\right],
v^*\left(b\frac{du}{dx}\right) + u\frac{d(bv^*)}{dx} = \frac{d}{dx}[buv^*], \quad (10.56)
v^*cu - ucv^* = 0.
Summing both sides of (10.56), we get
v^*\left[a\frac{d^2u}{dx^2} + b\frac{du}{dx} + cu\right] - u\left[\frac{d^2(av^*)}{dx^2} - \frac{d(bv^*)}{dx} + cv^*\right] = \frac{d}{dx}\left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx}\right] + \frac{d}{dx}[buv^*]. \quad (10.57)
Hence, following the expressions of Sect. 10.2, we define L_x^\dagger such that
L_x^\dagger v \equiv \frac{d^2(a^*v)}{dx^2} - \frac{d(b^*v)}{dx} + c^*v. \quad (10.58)
Taking its complex conjugate, we have
\left[L_x^\dagger v\right]^* = \frac{d^2(av^*)}{dx^2} - \frac{d(bv^*)}{dx} + cv^*. \quad (10.59)
Expanding (10.58), we get
L_x^\dagger = a^*\frac{d^2}{dx^2} + \left(2\frac{da^*}{dx} - b^*\right)\frac{d}{dx} + \left(\frac{d^2a^*}{dx^2} - \frac{db^*}{dx} + c^*\right). \quad (10.60)
Then, (10.57) can be rewritten as
v^*(L_xu) - \left[L_x^\dagger v\right]^*u = \frac{d}{dx}\left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]. \quad (10.61)
Assuming that the relevant SOLDE is defined in [r, s] and integrating (10.61)
within that interval, we get
\int_r^s dx\,\left\{v^*(L_xu) - \left[L_x^\dagger v\right]^*u\right\} = \left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]_r^s. \quad (10.62)
In the inner product notation, (10.62) is expressed as
\langle v|L_xu\rangle - \langle L_x^\dagger v|u\rangle = \left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]_r^s.
Here if RHS of the above (i.e., the surface term of the above expression) vanishes,
we get
\langle v|L_xu\rangle = \langle L_x^\dagger v|u\rangle.
\frac{da(x)}{dx} = b(x). \quad (10.63)
In that case, we get
L_x^\dagger = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x) = L_x,
and the surface term of (10.62) is simplified so that
\langle v|L_xu\rangle - \langle L_xv|u\rangle = \left[a(x)\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]_r^s. \quad (10.64)
Notice that b(x) is eliminated from (10.64). If RHS of (10.64) vanishes, we get
\langle v|L_xu\rangle = \langle L_xv|u\rangle.
This notation is consistent with (1.119) and the Hermiticity of Lx becomes well-
defined.
If we do not have the condition of da(x)/dx = b(x), how can we deal with the
problem? The answer is that following the procedures in Sect. 10.2, we can convert
Lx to a self-adjoint operator by multiplying Lx by a weight function w(x) introduced
in (10.26), (10.27), and (10.31). Replacing a(x), b(x), and c(x) with w(x)a(x), w(x)
b(x), and w(x)c(x), respectively, in the identity (10.57), we rewrite (10.57) as
v^*\left[aw\frac{d^2u}{dx^2} + bw\frac{du}{dx} + cwu\right] - u\left[\frac{d^2(awv^*)}{dx^2} - \frac{d(bwv^*)}{dx} + cwv^*\right] = \frac{d}{dx}\left[awv^*\frac{du}{dx} - u\frac{d(awv^*)}{dx}\right] + \frac{d}{dx}[bwuv^*]. \quad (10.65)
Let us calculate {⋯} of the second term for LHS of (10.65). Using the conditions
of (10.26) and (10.27), we have
\frac{d^2(awv^*)}{dx^2} - \frac{d(bwv^*)}{dx} + cwv^*
= \left[(aw)'v^* + awv^{*\prime}\right]' - \left[(bw)'v^* + bwv^{*\prime}\right] + cwv^*
= \left[bwv^* + awv^{*\prime}\right]' - \left[(bw)'v^* + bwv^{*\prime}\right] + cwv^* \quad (10.66)
= (bw)'v^* + bwv^{*\prime} + (aw)'v^{*\prime} + awv^{*\prime\prime} - (bw)'v^* - bwv^{*\prime} + cwv^*
= w\left(av^{*\prime\prime} + bv^{*\prime} + cv^*\right) = w\left(a^*v'' + b^*v' + c^*v\right)^*
= w\left(av'' + bv' + cv\right)^*.
The second last equality of (10.66) is based on the assumption that a(x), b(x), and
c(x) are real functions. Meanwhile, for RHS of (10.65) we have
\frac{d}{dx}\left[awv^*\frac{du}{dx} - u\frac{d(awv^*)}{dx}\right] + \frac{d}{dx}[bwuv^*]
= \frac{d}{dx}\left[awv^*\frac{du}{dx} - u(aw)'v^* - uaw\frac{dv^*}{dx}\right] + \frac{d}{dx}[bwuv^*]
= \frac{d}{dx}\left[awv^*\frac{du}{dx} - ubwv^* - uaw\frac{dv^*}{dx}\right] + \frac{d}{dx}[bwuv^*] \quad (10.67)
= \frac{d}{dx}\left[aw\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right] = \frac{d}{dx}\left[p\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right].
With the last equality of (10.67), we used (10.26). Using (10.66) and (10.67), we
rewrite (10.65) once again as
v^*w\left[a\frac{d^2u}{dx^2} + b\frac{du}{dx} + cu\right] - uw\left[a\frac{d^2v^*}{dx^2} + b\frac{dv^*}{dx} + cv^*\right] = \frac{d}{dx}\left[p\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]. \quad (10.68)
\int_r^s dx\,w(x)\left\{v^*(L_xu) - [L_xv]^*u\right\} = \left[p\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]_r^s. \quad (10.69)
The relations (10.69) along with (10.62) are called the generalized Green’s
identity.
We emphasize that as far as the coefficients a(x), b(x), and c(x) in (10.55) are real
functions, the associated differential operator Lx can be converted to a self-adjoint
form following the procedures of (10.66) and (10.67).
In the above, LHS of the original homogeneous differential equation (10.5) is
rewritten as
a(x)w(x)\frac{d^2u}{dx^2} + b(x)w(x)\frac{du}{dx} + c(x)w(x)u = \frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + cw(x)u.
Thus, L_x can be put into the self-adjoint form
L_xu = \frac{1}{w(x)}\frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + cu \quad [w(x) > 0], \quad (10.70)
where we have
p(x) = a(x)w(x) \quad\text{and}\quad \frac{dp(x)}{dx} = b(x)w(x). \quad (10.71)
The homogeneous adjoint BCs corresponding to (10.3) and (10.4) are expressed as
B_1^\dagger(v) = \alpha_1v(r) + \beta_1\left.\frac{dv}{dx}\right|_{x=r} + \gamma_1v(s) + \delta_1\left.\frac{dv}{dx}\right|_{x=s} = 0, \quad (10.72)
B_2^\dagger(v) = \alpha_2v(r) + \beta_2\left.\frac{dv}{dx}\right|_{x=r} + \gamma_2v(s) + \delta_2\left.\frac{dv}{dx}\right|_{x=s} = 0. \quad (10.73)
B 1 ð uÞ = u ð r Þ = σ 1 : ð10:74Þ
Further putting
σ 1 = σ 2 = 0, ð10:76Þ
For RHS of (10.69) to vanish, it suffices to define B{1 ðuÞ and B{2 ðuÞ such that
In this manner, we can readily construct the homogeneous adjoint BCs the same
as those of (10.77) so that Lx can be Hermitian.
We list several prescriptions of typical BCs below.
Yet, care should be taken when handling RHS of (10.69), i.e., the surface terms. It
is because conditions (1)–(3) are not necessary but sufficient conditions for the
surface terms to vanish. Such conditions are not limited to them. Meanwhile, we
often have to deal with the non-vanishing surface terms. In that case, we have to start
with (10.62) instead of (10.69).
In Sect. 10.2, we mentioned the definition of Hermiticity of the differential
operator in such a way that the said operator is self-adjoint and that homogeneous
BCs and homogeneous adjoint BCs are the same. In light of the above argument,
however, we may relax the conditions for a differential operator to be Hermitian.
This is particularly the case when p(x) = a(x)w(x) in (10.69) vanishes at both the
endpoints. We will encounter such a situation in Sect. 10.7.
10.4 Green’s Functions 417
L_xu(x) = d(x) \quad (10.81)
L_x^\dagger v(x) = h(x) \quad (10.82)
under homogeneous adjoint BCs [1, 2] with an inhomogeneous term h(x) being an
arbitrary function as well.
Let us describe the above relations as
GL = LG = E, ð10:84Þ
This implies that (10.81) has been solved and the solution is given by G| di. Since
an inverse operation to differentiation is integration, G is expected to be an integral
operator.
We have
Using a weight function w(x), we generalize an inner product of (1.128) such that
\langle g|f\rangle \equiv \int_r^s w(x)g(x)^*f(x)\,dx. \quad (10.88)
Alternatively, we have
f(x_0) = \int_r^s dx\,f(x)\delta(x - x_0). \quad (10.92)
w(x)\langle x_0|x\rangle = \delta(x - x_0) \quad\text{or}\quad \langle x_0|x\rangle = \frac{\delta(x - x_0)}{w(x)} = \frac{\delta(x_0 - x)}{w(x)}. \quad (10.94)
L_xG(x, y) = \frac{\delta(x - y)}{w(x)}. \quad (10.95)
L_x^\dagger g(x, y) = \frac{\delta(x - y)}{w(x)}. \quad (10.96)
That is, G(x, y) and g(x, y) satisfy the homogeneous equation with respect to the
variable x. Accordingly, we require G(x, y) and g(x, y) to satisfy the same homoge-
neous BCs with respect to the variable x as those imposed upon u(x) and v(x) of
(10.81) and (10.82), respectively [1].
The relation (10.88) can be obtained as follows: Operating hg| on (10.89),
we have
\langle g|f\rangle = \int_r^s dx\,w(x)f(x)\langle g|x\rangle = \int_r^s w(x)g(x)^*f(x)\,dx, \quad (10.98)
For this, see (1.113) where A is replaced with an identity operator E with regard to
a complex conjugate of an inner product of two vectors. Also see (13.2) of Sect.
13.1.
If in (10.69) the surface term (i.e., RHS) vanishes under appropriate conditions
e.g., (10.80), we have
\int_r^s dx\,w(x)\left\{v^*(L_xu) - \left[L_x^\dagger v\right]^*u\right\} = 0, \quad (10.100)
which is called Green’s identity. Since (10.100) is derived from identities (10.56),
(10.100) is an identity as well (as a terminology of Green’s identity shows).
Therefore, (10.100) must hold with any functions u and v so far as they satisfy
homogeneous BCs. Thus, replacing v in (10.100) with g(x, y) and using (10.96)
together with (10.81), we have
\int_r^s dx\,w(x)\left\{g^*(x, y)[L_xu(x)] - \left[L_x^\dagger g(x, y)\right]^*u(x)\right\}
= \int_r^s dx\,w(x)\left[g^*(x, y)d(x) - \frac{\delta(x - y)}{w(x)}u(x)\right] \quad (10.101)
= \int_r^s dx\,w(x)g^*(x, y)d(x) - u(y) = 0,
where with the second last equality we used a property of the δ functions. Also notice
that \delta(x - y)/w(x) is a real function. Rewriting (10.101), we get
u(y) = \int_r^s dx\,w(x)g^*(x, y)d(x). \quad (10.102)
Similarly, replacing u in (10.100) with G(x, y) and using (10.95) together with
(10.82), we have
v(y) = \int_r^s dx\,w(x)G^*(x, y)h(x). \quad (10.103)
Replacing u and v in (10.100) with G(x, q) and g(x, t), respectively, we have
\int_r^s dx\,w(x)\left\{g^*(x, t)[L_xG(x, q)] - \left[L_x^\dagger g(x, t)\right]^*G(x, q)\right\} = 0. \quad (10.104)
Notice that we have chosen q and t for the second argument y in (10.95) and
(10.96), respectively. Inserting (10.95) and (10.96) into the above equation after
changing arguments, we have
\int_r^s dx\,w(x)\left[g^*(x, t)\frac{\delta(x - q)}{w(x)} - \frac{\delta(x - t)}{w(x)}G(x, q)\right] = 0. \quad (10.105)
Thus, we get
g^*(q, t) = G(t, q). \quad (10.106)
10.4 Green’s Functions 421
This implies that G(t, q) must satisfy the adjoint BCs with respect to the second
argument q. Inserting (10.106) into (10.102), we get
u(y) = \int_r^s dx\,w(x)G(y, x)d(x). \quad (10.107)
Or, we have
v(x) = \int_r^s dy\,w(y)g(x, y)h(y). \quad (10.110)
In Sect. 10.3 we assume that the coefficients a(x), and b(x), and c(x) are real to
assure that Lx is Hermitian [1]. On this condition G(x, y) is real as well (vide infra).
Then we have
That is, G(x, y) is real symmetric with respect to the arguments x and y.
To be able to apply Green’s functions to practical use, we will have to estimate a
behavior of the Green’s function near x = y. This is because in light of (10.95) and
(10.96) there is a “jump” at x = y.
When we deal with a case where a self-adjoint operator is relevant, using a
function p(x) of (10.69) we have
\frac{a(x)}{p(x)}\frac{\partial}{\partial x}\left[p\frac{\partial G}{\partial x}\right] + c(x)G = \frac{\delta(x - y)}{w(x)}. \quad (10.114)
f(x)\delta(x) = f(0)\delta(x) \quad\text{or}\quad f(x)\delta(x - y) = f(y)\delta(x - y), \quad (10.116)
we have
p\frac{\partial G(x, y)}{\partial x} = \frac{p(y)}{a(y)w(y)}\theta(x - y) - \int_r^x dt\,\frac{p(t)c(t)}{a(t)}G(t, y) + C, \quad (10.118)
where \theta(x) is defined such that
\theta(x) = \begin{cases} 1 & (x > 0), \\ 0 & (x < 0), \end{cases} \quad (10.119)
and we have
\frac{d\theta(x)}{dx} = \delta(x). \quad (10.120)
The function θ(x) is called a step function or Heaviside step function. In RHS of
(10.118) the first term has a discontinuity at x = y because of θ(x - y), whereas the
second term is continuous with respect to y. Thus, we have
10.4 Green’s Functions 423
\lim_{\varepsilon\to+0}\left[p(y + \varepsilon)\left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y+\varepsilon} - p(y - \varepsilon)\left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y-\varepsilon}\right] = \lim_{\varepsilon\to+0}\frac{p(y)}{a(y)w(y)}\left[\theta(+\varepsilon) - \theta(-\varepsilon)\right] = \frac{p(y)}{a(y)w(y)}. \quad (10.121)
Since p( y) is continuous with respect to the argument y, this factor drops off and
we get
\lim_{\varepsilon\to+0}\left[\left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y+\varepsilon} - \left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y-\varepsilon}\right] = \frac{1}{a(y)w(y)}. \quad (10.122)
Thus, \partial G(x, y)/\partial x is accompanied by a discontinuity at x = y of magnitude 1/[a(y)w(y)].
Since RHS of (10.122) is continuous with respect to the argument y, integrating
(10.122) again with respect to x, we find that G(x, y) is continuous at x = y. These
properties of G(x, y) are useful to calculate Green’s functions in practical use. We
will encounter several examples in next sections.
Suppose that there are two Green’s functions that satisfy the same homogeneous
BCs. Let G(x, y) and G ~ ðx, yÞ be such functions. Then, we must have
L_xG(x, y) = \frac{\delta(x - y)}{w(x)} \quad\text{and}\quad L_x\tilde{G}(x, y) = \frac{\delta(x - y)}{w(x)}. \quad (10.123)
Subtracting one from the other, we get
L_x\left[\tilde{G}(x, y) - G(x, y)\right] = 0. \quad (10.124)
Since both functions satisfy the same homogeneous BCs, we must have
G(x, y) - \tilde{G}(x, y) \equiv 0 \quad\text{or}\quad G(x, y) \equiv \tilde{G}(x, y). \quad (10.125)
That is, the Green's function is uniquely determined. Meanwhile, taking the complex conjugate of (10.95) and noting that L_x has real coefficients, we have
L_xG^*(x, y) = \frac{\delta(x - y)}{w(x)}. \quad (10.126)
Notice here that both δ(x - y) and w(x) are real functions. Subtracting (10.95)
from (10.126), we have
Again, from the uniqueness of the Green's function, we get G^*(x, y) = G(x, y);
i.e., G(x, y) is real accordingly. This is independent of specific structures of Lx. In
other words, so far as we are dealing with real coefficients a(x), b(x), and c(x), G(x, y)
is real whether or not Lx is self-adjoint.
a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = d(x), \quad (10.2)
where coefficients a(x), b(x), and c(x) are real. In this case, if d(x) \equiv 0 in (10.108),
namely, the SOLDE is a homogeneous equation, we have a solution u(x) \equiv 0 under
homogeneous boundary conditions on the basis of (10.108). If, however, we have
inhomogeneous boundary conditions (BCs), additional terms appear on RHS of
(10.108) in both the cases of homogeneous and inhomogeneous equations. In this
section, we examine how we can deal with this problem.
Following the remarks made in Sect. 10.3, we start with (10.62) or (10.69). If we
deal with a self-adjoint or Hermitian operator, we can apply (10.69) to the problem.
In a more general case where the operator is not self-adjoint, (10.62) is useful. In this
respect, in Sect. 10.6 we have a good opportunity for this.
In Sect. 10.3, we mentioned that we may relax the definition of Hermiticity of the
differential operator in the case where the surface term vanishes. Meanwhile, we
should bear in mind that the Green’s functions and adjoint Green’s functions are
constructed using homogeneous BCs regardless of whether we are concerned with a
homogeneous equation or inhomogeneous equation. Thus, even if the surface terms
do not vanish, we may regard the differential operator as Hermitian. This is because
we deal with essentially the same Green’s function to solve a problem with both the
cases of homogeneous equation and inhomogeneous equations (vide infra). Notice
also that whether or not RHS vanishes, we are to use the same Green’s function
[1]. In this sense, we do not have to be too strict with the definition of Hermiticity.
Now, suppose that for a differential (self-adjoint) operator Lx we are given
\int_r^s dx\,w(x)\left\{v^*(L_xu) - [L_xv]^*u\right\} = \left[p(x)\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]_r^s, \quad (10.69)
where w(x) > 0 in a domain [r, s] and p(x) is a real function. Note that since (10.69) is
an identity, we may choose G(x, y) for v with an appropriate choice of w(x). Then
from (10.95) and (10.112), we have
\int_r^s dx\,w(x)\left\{G(y, x)^*[L_xu(x)] - [L_xG(x, y)]^*u(x)\right\}
= \int_r^s dx\,w(x)\left[G(y, x)d(x) - \frac{\delta(x - y)}{w(x)}u(x)\right] \quad (10.127)
= \left[p(x)\left(G(y, x)\frac{du(x)}{dx} - u(x)\frac{\partial G(y, x)}{\partial x}\right)\right]_{x=r}^{x=s}.
Note in (10.127) we used the fact that both δ(x - y) and w(x) are real.
Using a property of the δ function, we get
u(y) = \int_r^s dx\,w(x)G(y, x)d(x) - \left[p(x)\left(G(y, x)\frac{du(x)}{dx} - u(x)\frac{\partial G(y, x)}{\partial x}\right)\right]_{x=r}^{x=s}. \quad (10.128)
The function G(x, y) satisfies homogeneous BCs. Hence, if we assume, e.g., the
Dirichlet BCs [see (10.80)], we have
Using the symmetric property of G(x, y) with respect to arguments x and y, from
(10.129) we get
Thus, the first term of the surface terms of (10.128) is eliminated to yield
u(y) = \int_r^s dx\,w(x)G(y, x)d(x) + \left[p(x)u(x)\frac{\partial G(y, x)}{\partial x}\right]_{x=r}^{x=s}.
u(x) = \int_r^s dy\,w(y)G(x, y)d(y) + \left[p(y)u(y)\frac{\partial G(x, y)}{\partial y}\right]_{y=r}^{y=s}. \quad (10.131)
Then, (1) substituting surface terms of u(s) and u(r) that are associated with the
inhomogeneous BCs described as
and (2) calculating \left.\frac{\partial G(x, y)}{\partial y}\right|_{y=r} and \left.\frac{\partial G(x, y)}{\partial y}\right|_{y=s}, we will be able to obtain a unique
solution. Notice that even though we have formally the same differential operators,
we get different Green’s functions depending upon different BCs. We see tangible
examples later.
On the basis of the general discussion of Sect. 10.4 and this section, we are in the
position to construct the Green’s functions. Except for the points of x = y, the
Green’s function G(x, y) must satisfy the following differential equation:
Lx Gðx, yÞ = 0, ð10:133Þ
where Lx is given by
L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x). \quad (10.55)
The differential equation Lxu = d(x) is defined within an interval [r, s], where
r may be -1 and s may be +1.
From now on, we regard a(x), b(x), and c(x) as real functions. From (10.133), we
expect the Green’s function to be described as a linear combination of a fundamental
set of solutions u1(x) and u2(x). Here the fundamental set of solutions are given by
two linearly independent solutions of a homogeneous equation Lxu = 0. Then we
should be able to express G(x, y) as a combination of F1(x, y) and F2(x, y) that are
described as
where c1, c2, d1, and d2 are arbitrary (complex) constants to be determined later.
These constants are given as a function of y. The combination has to be made
such that
Notice that F1(x, y) and F2(x, y) are “ordinary” functions and that G(x, y) is not,
because G(x, y) contains the θ(x) function.
If we have
Hence, we get
\frac{d^2u}{dx^2} + u = 1. \quad (10.141)
We assume that a domain of the argument x is [0, L]. We set boundary conditions
such that
Thus, if at least one of σ 1 and σ 2 is not zero, we are dealing with an inhomoge-
neous differential equation under inhomogeneous BCs.
Next, let us seek conditions that the Green’s function satisfies. We also seek a
fundamental set of solutions of a homogeneous equation described by
\frac{d^2u}{dx^2} + u = 0. \quad (10.143)
A fundamental set of solutions of (10.143) is given by
e^{ix} \quad\text{and}\quad e^{-ix}.
Then, we have
The functions F1(x, y) and F2(x, y) must satisfy the following BCs such that
Thus, we have
Therefore, at x = y we have
\left.\frac{\partial F_2(x, y)}{\partial x}\right|_{x=y} - \left.\frac{\partial F_1(x, y)}{\partial x}\right|_{x=y} = 1. \quad (10.148)
This is because both F1(x, y) and F2(x, y) are ordinary functions and supposed to
be differentiable at any x. The relation (10.148) then reads as
c_1 = \frac{\begin{vmatrix} 0 & -e^{iy} + e^{2iL}e^{-iy} \\ i & -e^{iy} - e^{2iL}e^{-iy} \end{vmatrix}}{\begin{vmatrix} e^{iy} - e^{-iy} & -e^{iy} + e^{2iL}e^{-iy} \\ e^{iy} + e^{-iy} & -e^{iy} - e^{2iL}e^{-iy} \end{vmatrix}} = \frac{i\left(e^{iy} - e^{2iL}e^{-iy}\right)}{2\left(1 - e^{2iL}\right)}, \quad (10.150)
d_1 = \frac{\begin{vmatrix} e^{iy} - e^{-iy} & 0 \\ e^{iy} + e^{-iy} & i \end{vmatrix}}{\begin{vmatrix} e^{iy} - e^{-iy} & -e^{iy} + e^{2iL}e^{-iy} \\ e^{iy} + e^{-iy} & -e^{iy} - e^{2iL}e^{-iy} \end{vmatrix}} = \frac{i\left(e^{iy} - e^{-iy}\right)}{2\left(1 - e^{2iL}\right)}. \quad (10.151)
u(x) = \int_0^L dy\,G(x, y)
= \frac{\cos(x - 2L) - \cos x}{2\sin^2L}\int_0^x \sin y\,dy + \frac{\sin x}{2\sin^2L}\int_x^L\left[\cos(y - 2L) - \cos y\right]dy. \quad (10.155)
This can readily be integrated to yield the solution of the inhomogeneous equation
such that
Next, let us consider the surface term. This is given by the second term of
(10.131). We get
Therefore, with the inhomogeneous BCs we have the following solution for the
inhomogeneous equation:
u(x) \equiv 1.
Looking at (10.141), we find that u(x) \equiv 1 is certainly a solution for (10.141) with
inhomogeneous BCs of \sigma_1 = \sigma_2 = 1. The uniqueness of the solution then ensures
that u(x) \equiv 1 is the sole solution under the said BCs.
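A minimal numerical cross-check of this conclusion (a sketch using scipy; the tolerance values and the choice L = 2.0 are ours) solves the BVP directly and confirms that u(x) is identically 1 under these inhomogeneous BCs:

import numpy as np
from scipy.integrate import solve_bvp

L = 2.0                              # any L that is not a multiple of pi

def rhs(x, y):                       # y[0] = u, y[1] = u'
    return np.vstack([y[1], 1.0 - y[0]])

def bc(ya, yb):                      # u(0) = u(L) = 1
    return np.array([ya[0] - 1.0, yb[0] - 1.0])

x = np.linspace(0.0, L, 50)
sol = solve_bvp(rhs, bc, x, np.zeros((2, x.size)))
print(np.max(np.abs(sol.sol(x)[0] - 1.0)))   # ~0, i.e., u(x) = 1 everywhere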
From (10.154), we find that G(x, y) has a singularity at L = nπ (n : integers). This
is associated with the fact that a homogenous equation (10.143) has a non-trivial
solution, e.g., u(x) = sin x under homogeneous BCs u(0) = u(L ) = 0. The present
situation is essentially the same as that of Example 1.1 of Sect. 1.3. In other words,
when λ = 1 in (1.61), the form of a differential equation is identical to (10.143) with
virtually the same Dirichlet conditions. The point is that (10.143) can be viewed as a
homogeneous equation and, at the same time, as an eigenvalue equation. In such a
case, a Green’s function approach will fail.
The IVPs frequently appear in mathematical physics. The relevant conditions are
dealt with as BCs in the theory of differential equations. With boundary functionals
B1(u) and B2(u) of (10.3) and (10.4), setting α1 = β2 = 1 and other coefficients as
zero, we get
B_1(u) = u(p) = \sigma_1 \quad\text{and}\quad B_2(u) = \left.\frac{du}{dx}\right|_{x=p} = \sigma_2. \quad (10.159)
In the above, note that we choose [r, s] for a domain of x. The points r and s can be
infinity as before. Any point p within the domain [r, s] may be designated as a special
point on which the BCs (10.159) are imposed. The initial conditions are particularly
prominent among BCs. This is because the conditions are set at one point of the
argument. This special condition is usually called initial conditions. In this section,
we investigate fundamental characteristics of IVPs.
Suppose that we have
u(p) = \left.\frac{du}{dx}\right|_{x=p} = 0, \quad (10.160)
L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x), \quad (10.55)
Lx uðxÞ = 0: ð10:161Þ
A general solution u(x) for (10.161) is given by a linear combination of u1(x) and
u2(x) such that
where c1 and c2 are arbitrary (complex) constants. Suppose that we have homoge-
neous BCs expressed by (10.160). Then, we have
\begin{pmatrix} u_1(p) & u_2(p) \\ u_1'(p) & u_2'(p) \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0.
Since the matrix represents the Wronskian of a fundamental set of solutions u1(x) and
u2(x), its determinant never vanishes at any point p. That is, we have
\begin{vmatrix} u_1(p) & u_2(p) \\ u_1'(p) & u_2'(p) \end{vmatrix} \neq 0. \quad (10.163)
Thus, we have c_1 = c_2 = 0 and, hence, u(x) \equiv 0.
\int_r^s dx\,\left\{v^*(L_xu) - \left[L_x^\dagger v\right]^*u\right\} = \left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]_r^s. \quad (10.62)
For the surface term (RHS) to vanish, for homogeneous BCs we have, e.g.,
u(s) = \left.\frac{du}{dx}\right|_{x=s} = 0 \quad\text{and}\quad v(r) = \left.\frac{dv}{dx}\right|_{x=r} = 0,
for the two sets of BCs adjoint to each other. Obviously, these are not identical
simply because the former is determined at s and the latter is determined at a different
point r. For this reason, the operator Lx is not Hermitian, even though it is formally
self-adjoint. In such a case, we would rather use Lx directly than construct a self-
adjoint operator because we cannot make the operator Hermitian either way.
Hence, unlike the precedent sections we do not need a weight function w(x). Or,
we may regard w(x) 1. Then, we reconsider the conditions which the Green’s
functions should satisfy. On the basis of the general consideration of Sect. 10.4,
especially (10.86), (10.87), and (10.94), we have [2]
Therefore, we have
\frac{\partial^2G(x, y)}{\partial x^2} + \frac{b(x)}{a(x)}\frac{\partial G(x, y)}{\partial x} + \frac{c(x)}{a(x)}G(x, y) = \frac{\delta(x - y)}{a(x)}.
Integrating this equation with respect to x from x_0 to x, we get
\frac{\partial G(x, y)}{\partial x} + \left.\frac{b(x)}{a(x)}G(x, y)\right|_{x_0}^{x} - \int_{x_0}^x\frac{d}{d\xi}\left[\frac{b(\xi)}{a(\xi)}\right]G(\xi, y)\,d\xi + \int_{x_0}^x\frac{c(\xi)}{a(\xi)}G(\xi, y)\,d\xi = \frac{\theta(x - y)}{a(y)}.
Noting that the functions other than \partial G(x, y)/\partial x and \theta(x - y)/a(y) are continuous, as before
we have
\lim_{\varepsilon\to+0}\left[\left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y+\varepsilon} - \left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y-\varepsilon}\right] = \frac{1}{a(y)}. \quad (10.165)
From a practical point of view, we may set r = 0 in (10.62). Then, we can choose a
domain for [0, s] (for s > 0) or [s, 0] (for s < 0) with (10.2). For simplicity, we use
x instead of s. We consider two cases of x > 0 and x < 0.
1. Case I (x > 0): Let u1(x) and u2(x) be a fundamental set of solutions. We define
F1(x, y) and F2(x, y) as before such that
As before, we set
u ð 0Þ = 0 and u0 ð0Þ = 0:
Correspondingly, we have
\begin{pmatrix} u_1(0) & u_2(0) \\ u_1'(0) & u_2'(0) \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0.
As mentioned above, since u1(x) and u2(x) are a fundamental set of solutions,
we have c1 = c2 = 0. Hence, we get
F 1 ðx, yÞ = 0:
From the continuity and discontinuity conditions (10.165) imposed upon the
Green’s functions, we have
As before, we get
d_1 = -\frac{u_2(y)}{a(y)W(u_1(y), u_2(y))} \quad\text{and}\quad d_2 = \frac{u_1(y)}{a(y)W(u_1(y), u_2(y))},
where W(u1( y), u2( y)) is Wronskian of u1( y) and u2( y). Thus, we get
Here notice that the sign is reversed in (10.170) relative to (10.168). This is
because on the discontinuity condition, instead of (10.167) we have to have
This results from the fact that magnitude relationship between the arguments
x and y has been reversed in (10.169) relative to (10.166).
[Fig. 10.2 (graphic): domain of the Θ(x, y) function]
Notice that
Note that Θ(x - a, y - a) can be obtained by shifting Θ(x, y) toward the positive
direction of the x- and y-axes by a (a can be either positive or negative; in Fig. 10.3
we assume a > 0). Using the Θ(x, y) function, the Green’s function is described as
[Fig. 10.3 (graphic): domain of the Θ(x − a, y − a) function]
we have
Notice that
\int_r^s dx\,\left\{v^*(L_xu) - \left[L_x^\dagger v\right]^*u\right\} = \left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]_r^s. \quad (10.62)
Fig. 10.4 Domain of G(y, x). Areas for which G(y, x) does not vanish are hatched
\int_0^\infty dx\,\left\{v^*(L_xu) - \left[L_x^\dagger v\right]^*u\right\} = \left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]_0^\infty. \quad (10.176)
Note that in the above we used Lx{g(x, y) = δ(x - y). In Fig. 10.4, we depict a
domain of G(y, x) in which G(y, x) does not vanish. Notice that we get the domain
by folding back that of Θ(x, y) (see Fig. 10.2) relative to a straight line y = x.
Thus, we find that G(y, x) vanishes at x > y. So does \partial G(y, x)/\partial x; see Fig. 10.4.
Namely, the second term of RHS of (10.178) vanishes at x = \infty. In other words,
g(x, y) and G(y, x) must satisfy the adjoint BCs; i.e.,
At the same time, the upper limit of integration range of (10.178) can be set at
y. Noting the above, we have
[\text{the first term of }(10.178)] = \int_0^y dx\,G(y, x)d(x). \quad (10.180)
u(0) = \sigma_1 \quad\text{and}\quad \left.\frac{du}{dx}\right|_{x=0} = \sigma_2, \quad (10.182)
for (10.181) along with other appropriate values, we should be able to get a
unique solution as
u(y) = \int_0^y dx\,G(y, x)d(x) + \left[\sigma_2a(0) - \sigma_1\frac{da(0)}{dx} + \sigma_1b(0)\right]G(y, 0) - \sigma_1a(0)\left.\frac{\partial G(y, x)}{\partial x}\right|_{x=0}. \quad (10.183)
u(x) = \int_0^x dy\,G(x, y)d(y) + \left[\sigma_2a(0) - \sigma_1\frac{da(0)}{dy} + \sigma_1b(0)\right]G(x, 0) - \sigma_1a(0)\left.\frac{\partial G(x, y)}{\partial y}\right|_{y=0}. \quad (10.184)
Here, we consider that Θ(x, y) = 1 in this region and use (10.174). Meanwhile,
from (10.174), we have
where we used
as well as
and
u(x) = \int_0^x dy\,F(x, y)d(y) + \left[\sigma_2a(0) - \sigma_1\frac{da(0)}{dy} + \sigma_1b(0)\right]F(x, 0) - \sigma_1a(0)\left.\frac{\partial F(x, y)}{\partial y}\right|_{y=0}. \quad (10.190)
u(y) = \int_{-\infty}^0 dx\,G(y, x)d(x) - \left[a(x)G(y, x)\frac{du(x)}{dx} - u(x)\frac{da(x)}{dx}G(y, x) - u(x)a(x)\frac{\partial G(y, x)}{\partial x} + b(x)u(x)G(y, x)\right]_{x=-\infty}^{0}
= \int_y^0 dx\,G(y, x)d(x) - \left[a(x)G(y, x)\frac{du(x)}{dx} - u(x)\frac{da(x)}{dx}G(y, x) - u(x)a(x)\frac{\partial G(y, x)}{\partial x} + b(x)u(x)G(y, x)\right]_{x=0}
= -\int_0^y dx\,G(y, x)d(x) - \left[\sigma_2a(0) - \sigma_1\frac{da(0)}{dx} + \sigma_1b(0)\right]G(y, 0) + \sigma_1a(0)\left.\frac{\partial G(y, x)}{\partial x}\right|_{x=0}. \quad (10.191)
10.6.4 Examples
\frac{d^2u}{dx^2} + u = 1. \quad (10.192)
Note that (10.192) is formally the same differential equation of (10.141). We may
encounter (10.192) when we are observing a motion of a charged harmonic oscillator
that is placed under a static electric field. We assume that a domain of the argument
x is a whole range of real numbers. We set boundary conditions such that
u(x) = \int_0^x dy\,\sin(x - y) + \sigma_2\sin x - \sigma_1\left[-\cos(x - y)\right]_{y=0}
= 1 - \cos x + \sigma_2\sin x + \sigma_1\cos x. \quad (10.196)
If \sigma_1 = 1 and \sigma_2 = 0, we have
u(x) \equiv 1. \quad (10.197)
This also ensures that this is a unique solution under the inhomogeneous BCs
described as σ 1 = 1 and σ 2 = 0.
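A brief symbolic check (a sketch; the sympy symbols are ours) confirms that (10.196) solves the IVP (10.192) with the stated initial conditions:

import sympy as sp

x, s1, s2 = sp.symbols('x sigma1 sigma2')
u = 1 - sp.cos(x) + s2*sp.sin(x) + s1*sp.cos(x)

print(sp.simplify(sp.diff(u, x, 2) + u - 1))      # -> 0, so u'' + u = 1 holds
print(u.subs(x, 0), sp.diff(u, x).subs(x, 0))     # -> sigma1, sigma2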
Example 10.6 Damped Oscillator If a harmonic oscillator undergoes friction, the
oscillator exerts damped oscillation. Such an oscillator is said to be a damped
oscillator. The damped oscillator is often dealt with when we think of bound
electrons in a dielectric medium that undergo an effect of a dynamic external field
varying with time. This is the case when the electron is placed in an alternating
electric field or an electromagnetic wave.
An equation of motion of the damped oscillator is described as
m\frac{d^2u}{dt^2} + r\frac{du}{dt} + ku = d(t). \quad (10.198)
For the corresponding homogeneous equation
m\frac{d^2u}{dt^2} + r\frac{du}{dt} + ku = 0,
putting u(t) = e^{i\rho t} we have
\left(-\rho^2 + i\frac{r}{m}\rho + \frac{k}{m}\right)e^{i\rho t} = 0.
Since e^{i\rho t} never vanishes, we get
-\rho^2 + i\frac{r}{m}\rho + \frac{k}{m} = 0. \quad (10.200)
We call this equation a characteristic quadratic equation. We have three cases for
the solution of a quadratic equation of (10.200). Solving (10.200) with respect to ρ,
we get
ir r2 k
ρ= ∓ - þ : ð10:201Þ
2m 4m2 m
Regarding (10.201), we have following three cases for the roots ρ: (1) ρ has two
r2
different pure imaginary roots; i.e., - 4m 2 þ m < 0 (an over damping). (2) ρ has
k
r2
double root 2m ir
; - 4m 2 þ m = 0 (a critical damping). (3) ρ has two complex roots;
k
2
- 4mr
2 þ m > 0 (a weak damping). Of these, Case (3) is characterized by an oscil-
k
lating solution and has many applications in mathematical physics. For the Cases
(1) and (2), however, we do not have an oscillating solution.
Case (1):
The characteristic roots are given by
\rho = \frac{ir}{2m} \mp i\sqrt{\frac{r^2}{4m^2} - \frac{k}{m}}.
Therefore, the two independent solutions are
u(t) = \exp\left(-\frac{rt}{2m}\right)\exp\left(\mp\sqrt{\frac{r^2}{4m^2} - \frac{k}{m}}\,t\right),
and the general solution is given by
u(t) = \exp\left(-\frac{rt}{2m}\right)\left[a\exp\left(\sqrt{\frac{r^2}{4m^2} - \frac{k}{m}}\,t\right) + b\exp\left(-\sqrt{\frac{r^2}{4m^2} - \frac{k}{m}}\,t\right)\right].
Case (2):
The characteristic roots are given by
\rho = \frac{ir}{2m}.
One of the solutions is
u_1(t) = \exp\left(-\frac{rt}{2m}\right).
Another linearly independent solution is obtained as
u_2(t) = c\frac{\partial u_1(t)}{\partial(i\rho)} = c't\exp\left(-\frac{rt}{2m}\right),
and the general solution is given by
u(t) = a\exp\left(-\frac{rt}{2m}\right) + bt\exp\left(-\frac{rt}{2m}\right).
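As a small numerical sketch of the three regimes of the characteristic equation (10.200) (the parameter values are illustrative; the weak-damping values mirror those used later for Fig. 10.5):

import numpy as np

def char_roots(m, r, k):
    # numpy.roots takes coefficients of -rho**2 + 1j*(r/m)*rho + k/m in decreasing power
    return np.roots([-1.0, 1j*r/m, k/m])

print(char_roots(1.0, 0.006, 0.94**2 + (0.006/2)**2))  # weak damping: two complex roots
print(char_roots(1.0, 2.0, 1.0))                        # critical damping: double root 1j
print(char_roots(1.0, 4.0, 1.0))                        # over damping: two pure imaginary roots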
\frac{d^2u}{dt^2} + \frac{r}{m}\frac{du}{dt} + \frac{k}{m}u = \frac{1}{m}d(t).
Putting
\omega \equiv \sqrt{-\frac{r^2}{4m^2} + \frac{k}{m}}, \quad (10.202)
for Case (3) we have a fundamental set of solutions described by
u(t) = \exp\left(-\frac{rt}{2m}\right)\exp(\mp i\omega t). \quad (10.203)
Using the fundamental set of solutions u_1(t) = \exp\left(-\frac{rt}{2m}\right)e^{i\omega t} and u_2(t) = \exp\left(-\frac{rt}{2m}\right)e^{-i\omega t}, the Green's function is constructed as
G(t, \tau) = \frac{1}{\omega}e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t - \tau)\,\Theta(t, \tau). \quad (10.204)
We examine whether G(t, τ) is eligible for the Green’s function as follows:
\frac{dG}{dt} = \frac{1}{\omega}\left[-\frac{r}{2m}e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau) + \omega e^{-\frac{r}{2m}(t-\tau)}\cos\omega(t-\tau)\right]\Theta(t, \tau)
+ \frac{1}{\omega}e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau)\left[\delta(t-\tau)\theta(\tau) + \delta(\tau-t)\theta(-\tau)\right]. \quad (10.205)
\frac{d^2G}{dt^2} = \frac{1}{\omega}\left[\left(\frac{r}{2m}\right)^2e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau) - \frac{r}{m}\omega e^{-\frac{r}{2m}(t-\tau)}\cos\omega(t-\tau) - \omega^2e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau)\right]\Theta(t, \tau)
+ \frac{1}{\omega}\left[-\frac{r}{2m}e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau) + \omega e^{-\frac{r}{2m}(t-\tau)}\cos\omega(t-\tau)\right]\left[\delta(t-\tau)\theta(\tau) + \delta(\tau-t)\theta(-\tau)\right]. \quad (10.206)
In the last term using the property of the δ function and θ function, we get δ(t - τ).
That is, the following part in (10.206) is calculated such that
\frac{d^2G}{dt^2} = \delta(t-\tau) + \frac{1}{\omega}\left\{-\left[\left(\frac{r}{2m}\right)^2 + \omega^2\right]e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau) - \frac{r}{m}\left[-\frac{r}{2m}e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t-\tau) + \omega e^{-\frac{r}{2m}(t-\tau)}\cos\omega(t-\tau)\right]\right\}\Theta(t, \tau)
= \delta(t-\tau) - \frac{k}{m}G - \frac{r}{m}\frac{dG}{dt}, \quad (10.207)
where we used (10.202) and (10.204) for the last equality. Rearranging (10.207)
once again, we have
\frac{d^2G}{dt^2} + \frac{r}{m}\frac{dG}{dt} + \frac{k}{m}G = \delta(t - \tau). \quad (10.208)
Defining
L_t \equiv \frac{d^2}{dt^2} + \frac{r}{m}\frac{d}{dt} + \frac{k}{m}, \quad (10.209)
we get
L_tG(t, \tau) = \delta(t - \tau). \quad (10.210)
Note that this expression is consistent with (10.164). Thus, we find that (10.210)
satisfies the condition (10.123) of the Green’s function, where the weight function is
identified with unity.
Now, suppose that a sinusoidally changing external field eiΩt influences the
motion of the damped oscillator. Here we assume that an amplitude of the external
field is unity. Then, we have
\frac{d^2u}{dt^2} + \frac{r}{m}\frac{du}{dt} + \frac{k}{m}u = \frac{1}{m}e^{i\Omega t}. \quad (10.211)
u(t) = \frac{1}{m\omega}\int_0^t e^{-\frac{r}{2m}(t-\tau)}e^{i\Omega\tau}\sin\omega(t - \tau)\,d\tau, \quad (10.212)
where t is an arbitrary positive or negative time. Equation (10.212) shows that with
the real part we have an external field cosΩt and that with the imaginary part we have
an external field sinΩt. To calculate (10.212) we use
\sin\omega(t - \tau) = \frac{1}{2i}\left[e^{i\omega(t-\tau)} - e^{-i\omega(t-\tau)}\right]. \quad (10.213)
For the real part (i.e., the external field is \cos\Omega t), we get
Cu(t) = \frac{1}{m}\frac{r}{m}\Omega\sin\Omega t + \frac{1}{m}\left(\frac{r}{2m}\right)^2\cos\Omega t - \frac{1}{m}\left(\Omega^2 - \omega^2\right)\cos\Omega t \quad (10.214)
+ \frac{1}{m}e^{-\frac{r}{2m}t}\left[\left(\Omega^2 - \omega^2\right)\cos\omega t - \frac{r}{2m}\frac{1}{\omega}\left(\Omega^2 + \omega^2\right)\sin\omega t - \left(\frac{r}{2m}\right)^2\cos\omega t - \left(\frac{r}{2m}\right)^3\frac{1}{\omega}\sin\omega t\right],
where
C = \left(\Omega^2 - \omega^2\right)^2 + \frac{r^2}{2m^2}\left(\Omega^2 + \omega^2\right) + \left(\frac{r}{2m}\right)^4. \quad (10.215)
For the imaginary part (i.e., the external field is sinΩt) we get
Cu(t) = -\frac{1}{m}\left(\Omega^2 - \omega^2\right)\sin\Omega t - \frac{1}{m}\frac{r}{m}\Omega\cos\Omega t + \frac{1}{m}\left(\frac{r}{2m}\right)^2\sin\Omega t \quad (10.216)
+ \frac{1}{m}e^{-\frac{r}{2m}t}\left[\frac{\Omega}{\omega}\left(\Omega^2 - \omega^2\right)\sin\omega t + \frac{r}{m}\Omega\cos\omega t + \left(\frac{r}{2m}\right)^2\frac{\Omega}{\omega}\sin\omega t\right].
In Fig. 10.5 we show an example that depicts the positions of a damped oscillator
as a function of time. In Fig. 10.5a, an amplitude of an envelope gradually dimin-
ishes with time. An enlarged diagram near the origin (Fig. 10.5b) clearly reflects the
initial conditions u(0) = \dot{u}(0) = 0. In Fig. 10.5, we put m = 1 [kg], Ω = 1 [1/s],
ω = 0.94 [1/s], and r = 0.006 [kg/s]. In the above calculations, if r/m is small enough
(i.e., damping is small enough), the third order and fourth order of r/m may be
ignored and the approximation is precise enough.
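As a numerical sketch (the grid sizes and final time are our choices), one can confirm that the Green's-function convolution (10.212), taken for the real drive cos Ωt, reproduces direct integration of (10.198) with u(0) = u̇(0) = 0:

import numpy as np
from scipy.integrate import solve_ivp

m, r, k, Omega = 1.0, 0.006, 0.94**2 + (0.006/2)**2, 1.0
omega = np.sqrt(k/m - (r/(2*m))**2)          # = 0.94, as in Fig. 10.5

def green_solution(t, n=8000):
    # u(t) from (10.212) with drive cos(Omega*tau), trapezoidal rule
    tau = np.linspace(0.0, t, n)
    f = np.exp(-r*(t - tau)/(2*m))*np.sin(omega*(t - tau))*np.cos(Omega*tau)
    return (np.sum(f[1:] + f[:-1])*0.5*(tau[1] - tau[0]))/(m*omega)

def rhs(t, y):                               # direct integration of (10.198)
    return [y[1], (np.cos(Omega*t) - r*y[1] - k*y[0])/m]

sol = solve_ivp(rhs, (0.0, 50.0), [0.0, 0.0], rtol=1e-9, atol=1e-12, dense_output=True)
print(green_solution(50.0), sol.sol(50.0)[0])   # the two results agree closely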
In the case of inhomogeneous BCs, given \sigma_1 = u(0) and \sigma_2 = \dot{u}(0) we can decide
the additional surface term S(t) such that
S(t) = \left(\sigma_2 + \sigma_1\frac{r}{2m}\right)e^{-\frac{r}{2m}t}\frac{1}{\omega}\sin\omega t + \sigma_1e^{-\frac{r}{2m}t}\cos\omega t. \quad (10.217)
The term S(t) corresponds to the second and third terms of (10.190). Thus, from
(10.212) and (10.217),
Fig. 10.5 Example of a damped oscillation as a function of t. The data are taken from (10.214). (a)
Overall profile. (b) Profile enlarged near the origin
uð t Þ þ Sð t Þ
gives a unique solution for the SOLDE with inhomogeneous BCs. Notice that S(t)
does not depend on the external field.
In the above calculations, use
F(t, \tau) = \frac{1}{\omega}e^{-\frac{r}{2m}(t-\tau)}\sin\omega(t - \tau)
together with a(t) \equiv 1, b(t) \equiv r/m, d(t) \equiv e^{i\Omega t}/m with respect to (10.190). The
confirmation of (10.217) is left for readers.
10.7 Eigenvalue Problems
So far, we have been dealing with the homogeneous equation
a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = 0. \quad (10.5)
Using the differential operator
L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x), \quad (10.55)
(10.5) is simply written as
L_xu(x) = 0. \quad (10.218)
Meanwhile, let us consider the following equation containing a constant \lambda:
a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} - \lambda u = 0. \quad (10.219)
If we define
L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} - \lambda, \quad (10.220)
we may use
L_xu = 0 \quad (10.221)
to express (10.219). Alternatively, defining
L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx},
we may use
L_xu = \lambda u \quad (10.222)
to express (10.219).
Equations (10.221) and (10.222) are essentially the same except that the expres-
sion is different. The expression using (10.222) is familiar to us as an eigenvalue
equation. The difference between (10.5) and (10.219) is that whereas c(x) in (10.5) is
a given fixed function, λ in (10.219) is constant, but may be varied according to the
solution of u(x). One of the most essential properties of the eigenvalue problem that
is posed in the form of (10.222) is that its solution is not uniquely determined as
already studied in various cases of Part I.
Remember that the methods based upon the Green’s function are valid for a
problem to which a homogeneous differential equation has a trivial solution (i.e.,
identically zero) under homogeneous BCs. In contrast to this situation, even though
the eigenvalue problem is basically posed as a homogeneous equation under homo-
geneous BCs, non-trivial solutions are expected to be obtained. In this respect, we
have seen that in Part I we rejected a trivial solution (i.e., identically zero) because of
no physical meaning.
As exemplified in Part I, the eigenvalue problems that appear in mathematical
physics are closely connected to the Hermiticity of (differential) operators. This is
because in many cases an eigenvalue is required to be real. We have already
examined how we can convert a differential operator to the self-adjoint form. That
is, if we define p(x) as in (10.26), we have the self-adjoint operator as described in
(10.70). As a symbolic description, we have
w(x)L_xu = \frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + c(x)w(x)u. \quad (10.223)
For instance, the Hermite differential equation that has already appeared as (2.118) in
Sect. 2.3 is described as
\frac{d^2u}{dx^2} - 2x\frac{du}{dx} + 2nu = 0. \quad (10.225)
Multiplying both sides of (10.225) by the weight function e^{-x^2}, we have
\frac{d}{dx}\left[e^{-x^2}\frac{du}{dx}\right] + 2ne^{-x^2}u = 0. \quad (10.226)
Notice that the differential operator has been converted to a self-adjoint form
according to (10.31), which defines a real and positive weight function, e^{-x^2} in the
present case. The domain of the Hermite differential equation is (-\infty, +\infty), at the
endpoints (i.e., \pm\infty) of which the surface term of RHS of (10.69) approaches zero
sufficiently rapidly in virtue of e^{-x^2}.
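A small numerical sketch (using numpy's Gauss-Hermite quadrature, which integrates against the weight e^{-x^2}; the chosen orders are ours) illustrates the resulting orthogonality of the Hermite polynomials:

import numpy as np
from numpy.polynomial import hermite as H

x, wq = H.hermgauss(60)                 # nodes/weights for the weight exp(-x**2)

def overlap(n1, n2):
    c1 = np.zeros(n1 + 1); c1[-1] = 1.0
    c2 = np.zeros(n2 + 1); c2[-1] = 1.0
    return np.sum(wq * H.hermval(x, c1) * H.hermval(x, c2))

print(overlap(2, 3))   # ~0 (orthogonal)
print(overlap(3, 3))   # ~85.1, i.e., 2**3 * 3! * sqrt(pi) (the normalization)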
In Sects. 3.4, and 3.5, in turn, we dealt with the (associated) Legendre differential
equation (3.127) for which the relevant differential operator is self-adjoint. The
surface term corresponding to (10.69) vanishes. It is because (1 - ξ2) vanishes at
the endpoints ξ = cos θ = ± 1 (i.e., θ = 0 or π) from (3.107). Thus, the Hermiticity
is automatically ensured for the (associated) Legendre differential equation as well
as Hermite differential equation. In those cases, even though the differential equa-
tions do not satisfy any particular BCs, the Hermiticity is yet ensured.
In the theory of differential equations, the aforementioned properties of the
Hermitian operators have been fully investigated as the so-called Sturm-Liouville
system (or problem) in the form of a homogeneous differential equation. The related
differential equations are connected to classical orthogonal polynomials bearing
personal names such as Hermite, Laguerre, Jacobi, Gegenbauer, Legendre,
Tchebichef, etc. These equations frequently appear in quantum mechanics and
electromagnetism as typical examples of the Sturm-Liouville system. They can be
converted to the self-adjoint form by multiplying the original equation by a weight
function. The resulting equations can be expressed as
\frac{d}{dx}\left[a(x)w(x)\frac{dY_n(x)}{dx}\right] + \lambda_nw(x)Y_n(x) = 0, \quad (10.227)
or equivalently
\frac{1}{w(x)}\frac{d}{dx}\left[a(x)w(x)\frac{dY_n(x)}{dx}\right] + \lambda_nY_n(x) = 0, \quad (10.228)
where we put (aw)' = bw. That is, the differential equation is originally described as
a(x)\frac{d^2Y_n(x)}{dx^2} + b(x)\frac{dY_n(x)}{dx} + \lambda_nY_n(x) = 0. \quad (10.229)
In the case of Hermite polynomials, for instance, a(x) = 1 and w(x) = e^{-x^2}. Since
we have (aw)' = -2xe^{-x^2} = bw, we can put b = -2x. Examples including this
case are tabulated in Table 10.2. The eigenvalues \lambda_n are associated with real numbers
that characterize the individual physical systems. The related fields have wide
applications in many branches of natural science.
After having converted the operator to the self-adjoint form; i.e., Lx = Lx{, instead
of (10.100) we have
\int_r^s dx\,w(x)\left\{v^*(L_xu) - [L_xv]^*u\right\} = 0. \quad (10.230)
\langle\psi_j|L_x\psi_i\rangle = \langle\psi_j|\lambda_i\psi_i\rangle = \lambda_i\langle\psi_j|\psi_i\rangle \quad\text{and}\quad \langle L_x\psi_j|\psi_i\rangle = \langle\lambda_j\psi_j|\psi_i\rangle = \lambda_j^*\langle\psi_j|\psi_i\rangle. \quad (10.234)
With the second and third equalities, we have used a rule of the inner product (see
Parts I and III). Since L_x is Hermitian, \langle\psi_j|L_x\psi_i\rangle = \langle L_x\psi_j|\psi_i\rangle, and therefore we get
\left(\lambda_i - \lambda_j^*\right)\langle\psi_j|\psi_i\rangle = 0. \quad (10.235)
Putting j = i, we have
\lambda_i - \lambda_i^* = 0 \quad\text{or}\quad \lambda_i = \lambda_i^*. \quad (10.237)
The relation (10.237) obviously indicates that \lambda_i is real; i.e., we find that the eigen-
values of an Hermitian operator are real. If \lambda_i \neq \lambda_j = \lambda_j^*, from (10.235) we get
\langle\psi_j|\psi_i\rangle = 0. \quad (10.238)
In this part, we treat vectors and their transformations in linear vector spaces so that
we can address various aspects of mathematical physics systematically but intui-
tively. We outline the general principles of linear vector spaces mostly from an
algebraic point of view. Starting with abstract definition and description of vectors,
we deal with their transformation in a vector space using a matrix. An inner product
is a central concept in the theory of a linear vector space so that two vectors can be
associated with each other to yield a scalar. Unlike many books of linear algebra and
linear vector spaces, however, we describe canonical forms of matrices before
considering the inner product. This is because we can treat the topics in light of
the abstract theory of matrices and vector space without a concept of the inner
product. Of the canonical forms of matrices, Jordan canonical form is of paramount
importance. We study how it is constructed providing a tangible example.
In relation to the inner product space, normal operators such as Hermitian
operators and unitary operators frequently appear in quantum mechanics and elec-
tromagnetism. From a general aspect, we revisit the theory of Hermitian operators
that often appeared in both Parts I and II.
The last part deals with exponential functions of matrices. The relevant topics can
be counted as one of the important branches of applied mathematics. Also, the
exponential functions of matrices play an essential role in the theory of continuous
groups that will be dealt with in Parts IV and V.
Chapter 11
Vectors and Their Transformation
In this chapter, we deal with the theory of finite-dimensional linear vector spaces.
Such vector spaces are spanned by a finite number of linearly independent vectors,
namely, basis vectors. In conjunction with developing an abstract concept and
theory, we mention a notion of mapping among mathematical elements. A linear
transformation of a vector is a special kind of mapping. In particular, we focus on
endomorphism within a n-dimensional vector space Vn. Here, the endomorphism is
defined as a linear transformation: Vn → Vn. The endomorphism is represented by a
(n, n) square matrix. This is most often the case with physical and chemical
applications, when we deal with matrix algebra. In this book we focus on this type
of transformation.
A non-singular matrix plays an important role in the endomorphism. In this
connection, we consider its inverse matrix and determinant. All these fundamental
concepts supply us with a sufficient basis for better understanding of the theory of
the linear vector spaces. Through these processes, we should be able to get
acquainted with connection between algebraic and analytical approaches and gain
a broad perspective on various aspects of mathematical physics and related fields.
11.1 Vectors
From both fundamental and practical points of view, it is desirable to define linear
vector spaces in an abstract way. Suppose that V is a set of elements denoted by a, b,
c, etc. called vectors. The set V is a linear vector space (or simply a vector space), if a
sum a + b 2 V is defined for any pair of vectors a and b and the elements of V satisfy
the following mathematical relations:
(a + b) + c = a + (b + c), \quad (11.1)
a + b = b + a, \quad (11.2)
a + 0 = a, \quad (11.3)
a + (-a) = 0. \quad (11.4)
For the above 0 is called the zero vector. Furthermore, for a 2 V, ca 2 V is defined
(c is a complex number called a scalar) and we assume the following relations among
vectors (a and b) and scalars (c and d ):
1a = a, ð11:6Þ
On the basis of the above relations, we can construct the following expression
called a linear combination:
c_1a_1 + c_2a_2 + \cdots + c_na_n.
If we have a relation
c_1a_1 + c_2a_2 + \cdots + c_na_n = 0, \quad (11.9)
If (11.9) holds only in the case where every ci = 0 (1 ≤ i ≤ n), the vectors a1, a2, ⋯,
an are said to be linearly independent. In this case the relation represented by (11.9)
is said to be trivial. If the relation is non-trivial (i.e., ∃ci ≠ 0), those vectors are said to
be linearly dependent.
If in the vector space V the maximum number of linearly independent vectors is
n, V is said to be an n-dimensional vector space and sometimes denoted by Vn. In this
case any vector x of Vn is expressed uniquely as a linear combination of linearly
independent vectors such that
x = x_1a_1 + x_2a_2 + \cdots + x_na_n. \quad (11.10)
Suppose x is denoted by
x = (a_1 \cdots a_n)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \quad (11.13)
A set of coordinates (x_1 \cdots x_n)^T is called a column vector (or a numerical vector) that
indicates an "address" of the vector x with respect to the basis vectors a1, a2, ⋯, an.
These vectors are represented as a row vector in (11.13). Any vector in Vn can be
expressed as a linear combination of the basis vectors and, hence, we say that Vn is
spanned by a1, a2, ⋯, an. This is represented as
V n = Spanfa1 , a2 , ⋯, an g: ð11:14Þ
Let us think of a subset W of Vn (i.e., W ⊂ Vn). If the following relations hold for
W, W is said to be a (linear) subspace of Vn.
a, b 2 W ) a þ b 2 W,
ð11:15Þ
a 2 W ) ca 2 W:
These two relations ensure that the relations of (11.1)-(11.8) hold for W as well.
The dimension of W is equal to or smaller than n. For instance, W = Span{a1, a2, ⋯,
ar} (r ≤ n) is a subspace of Vn. If r = n, W = Vn. Suppose that there are two
subspaces W1 = Span{a1} and W2 = Span{a2}. Note that in this case W1 ∪ W2 is not
a subspace, because W1 ∪ W2 does not contain a1 + a2. However, a set U defined by
U = \left\{x = x_1 + x_2;\ ^\forall x_1 \in W_1,\ ^\forall x_2 \in W_2\right\} \quad (11.16)
is a subspace of V^n. In particular, if any vector x of V is uniquely expressed as
x = x_1 + x_2; \quad x_1 \in W_1,\ x_2 \in W_2,
then V is said to be the direct sum of W_1 and W_2, and we write
V = W_1 \oplus W_2. \quad (11.17)
Fig. 11.1 Decomposition of a vector in a three-dimensional Cartesian space ℝ³ into two subspaces.
(a) ℝ³ = W1 + W2; W1 ∩ W2 = Span{e2}, where e2 is a unit vector in the positive direction of the y-
axis. (b) ℝ³ = W1 + W3; W1 ∩ W3 = {0}
where “dim” stands for dimension of the vector space considered. To prove (11.18),
we suppose that V is a n-dimensional vector space and that W1 and W2 are spanned
by r1 and r2 linearly independent vectors, respectively, such that
Then, we have n ≤ r1 + r2. This is almost trivial. Suppose r1 + r2 < n. Then, these
(r1 + r2) vectors cannot span V, but we need additional vectors for all vectors
including the additional vectors to span V. Thus, we should have n ≤ r1 + r2
accordingly. That is,
Now, let us assume that V = W_1 \oplus W_2. Then, e_i^{(2)}\ (1 \leq i \leq r_2) must be linearly
independent of e_1^{(1)}, e_2^{(1)}, \cdots, e_{r_1}^{(1)}. If not, e_i^{(2)} could be described as a linear combina-
tion of e_1^{(1)}, e_2^{(1)}, \cdots, e_{r_1}^{(1)}. But this would imply that e_i^{(2)} \in W_1, i.e., e_i^{(2)} \in W_1 \cap W_2,
in contradiction to that we have V = W_1 \oplus W_2. It is because W_1 \cap W_2 = \{0\} by
assumption. Likewise, e_j^{(1)}\ (1 \leq j \leq r_1) is linearly independent of e_1^{(2)}, e_2^{(2)}, \cdots, e_{r_2}^{(2)}.
Hence, e_1^{(1)}, \cdots, e_{r_1}^{(1)}, e_1^{(2)}, \cdots, e_{r_2}^{(2)} must be linearly independent. Thus,
n \geq r_1 + r_2. This is because in the vector space V we may well have additional
vector(s) that are independent of the above (r_1 + r_2) vectors. Meanwhile, n \leq r_1 + r_2
from the above. Consequently, we must have n = r_1 + r_2. Thus, we have proven that
\dim V = \dim W_1 + \dim W_2. Conversely, suppose that n = r_1 + r_2. Then, any x \in V can be expressed as
x = a_1e_1^{(1)} + \cdots + a_{r_1}e_{r_1}^{(1)} + b_1e_1^{(2)} + \cdots + b_{r_2}e_{r_2}^{(2)}. \quad (11.22)
The vector described by the first term is contained in W1 and that described by the
second term in W2. Both the terms are again expressed uniquely. Therefore, we get
V = W1 ⨁ W2. This is a proof of
Furthermore, we have
In the previous section we introduced vectors and their calculation rules in a linear
vector space. It is natural and convenient to relate a vector to another vector, as a
function f relates a number (either real or complex) to another number such that
y = f(x), where x and y are certain two numbers. A linear transformation from the
vector space V to another vector space W is a mapping A : V → W such that
We will briefly discuss the concepts of mapping at the end of this section.
It is convenient to define addition of the linear transformations. It is defined as
x = xe_1 + ye_2 = (e_1\ e_2)\begin{pmatrix} x \\ y \end{pmatrix}, \quad (11.27)
where e1 and e2 are unit basis vectors in the xy-plane and x and y are coordinates of
the vector x in reference to e1 and e2. The expression (11.27) is consistent with
(11.13). The rotation represented in Fig. 11.2 is an example of a linear transforma-
tion. We call this rotation R. According to the definition
Putting
we have
x' = xe_1' + ye_2' = (e_1'\ e_2')\begin{pmatrix} x \\ y \end{pmatrix}. \quad (11.29)
(e_1'\ e_2') = (e_1\ e_2)\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \quad (11.31)
Meanwhile, x′ can be expressed relative to the original basis vectors e1 and e2.
x' = x'e_1 + y'e_2. \quad (11.33)
x' = x\cos\theta - y\sin\theta,
y' = x\sin\theta + y\cos\theta. \quad (11.34)
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \quad (11.35)
x' = R(x) = (e_1\ e_2)\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \quad (11.36)
The above example demonstrates that the linear transformation R has a (2, 2)
matrix representation shown in (11.36). Moreover, this example obviously shows
that if a vector is expressed as a linear combination of the basis vectors, the
“coordinates” (represented by a column vector) can be transformed as well by the
same matrix.
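A minimal numerical illustration of (11.35) (a sketch; the angle and coordinates are arbitrary) shows the column vector of coordinates being transformed by the same rotation matrix R:

import numpy as np

def R(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = np.pi/6
xy = np.array([1.0, 0.0])       # coordinates (x, y) of x with respect to e1, e2
print(R(theta) @ xy)            # (x', y') = (cos 30deg, sin 30deg)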
Regarding an abstract n-dimensional linear vector space Vn, the linear vector
transformation A is given by
A(x) = (e_1 \cdots e_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad (11.37)
where e1, e2, ⋯, and en are basis vectors and x1, x2, ⋯, and xn are the corresponding
coordinates of a vector x = \sum_{i=1}^n x_ie_i. We assume that the transformation is a mapping
A: Vn → Vn (i.e., endomorphism). In this case the transformation is represented by a
(n, n) matrix. Note that the matrix operates on the basis vectors from the right and
that it operates on the coordinates (i.e., a column vector) from the left. In (11.37) we
often omit a parenthesis to simply write Ax.
Here we mention matrix notation for later convenience. We often identify a linear
vector transformation with its representation matrix and denote both transformation
and matrix by A. On this occasion, we write
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \quad A = (A)_{ij} = (a_{ij}), \ \text{etc.}, \quad (11.38)
where with the second expression (A)_{ij} and (a_{ij}) represent the matrix A itself, for we
frequently use indexed matrix notations such as A^{-1}, A^\dagger, \tilde{A}, etc. The notation (11.38)
can conveniently be used in such cases. Note moreover that a_{ij} represents the matrix
A as well.
Equation (11.37) has duality such that the matrix A operates either on the basis
vectors or coordinates. This can explicitly be written as
A(x) = (e_1 \cdots e_n)\left[\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right] = \left[(e_1 \cdots e_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\right]\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \quad (11.39)
That is, we assume the associative law with the above expression. Making
summation representation, we have
A(x) = \left(\sum_{k=1}^n e_ka_{k1} \ \cdots \ \sum_{k=1}^n e_ka_{kn}\right)\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \cdots e_n)\begin{pmatrix} \sum_{l=1}^n a_{1l}x_l \\ \vdots \\ \sum_{l=1}^n a_{nl}x_l \end{pmatrix}
= \sum_{k=1}^n\left(\sum_{l=1}^n e_ka_{kl}x_l\right) = \sum_{l=1}^n\left(\sum_{k=1}^n e_ka_{kl}\right)x_l = \sum_{k=1}^n\left(\sum_{l=1}^n a_{kl}x_l\right)e_k.
That is, the above equation can be viewed in either of two ways, i.e., as a coordinate transformation with fixed basis vectors or as a basis-vector transformation with fixed coordinates.
Also, (11.37) can formally be written as
A(x) = (e_1 \ \cdots \ e_n) A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 A \ \cdots \ e_n A) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},   (11.40)
where we assumed that the distributive law holds with operation of A on (e1⋯en).
Meanwhile, if in (11.37) we put xi = 1, xj = 0 ( j ≠ i), we get
A(e_i) = \sum_{k=1}^{n} e_k a_{ki}.

A(x) = (A(e_1) \ \cdots \ A(e_n)) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.   (11.41)

A(x) = (e_1 \ \cdots \ e_n) \begin{pmatrix} a'_{11} & \cdots & a'_{1n} \\ \vdots & \ddots & \vdots \\ a'_{n1} & \cdots & a'_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.   (11.43)
transformation matrix be B = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. This means that e_1 should remain e_1 after the transformation. At the same time, the vector e_2 (= e_1) should be converted to 2e_2 (= 2e_1). This is impossible except for the case of e_1 = e_2 = 0. The above matrix B in its own right is an object of matrix algebra, of course.
Putting a = b = 0 and c = d = 1 in the definition of (11.25) for the linear
transformation, we obtain
Að0Þ = 0: ð11:44Þ
Then, A(Vn) forms a subspace of Vn. In fact, for any x, y 2 Vn, we have A(x),
A(y) 2 A(Vn) ⟹ A(x) + A(y) = A(x + y) 2 A(Vn) and cA(x) = A(cx) 2 A(Vn). Since
A is an endomorphism, obviously A(Vn) ⊂ Vn. Hence, A(Vn) is a subspace of Vn. The
subspace A(Vn) is said to be an image of transformation A and sometimes denoted by
Im A.
The number dim A(V_n) is said to be the rank of the linear transformation A. That is, we write
dim A(V_n) = rank A.
Also, the number dim Ker A is said to be the nullity of the linear transformation A. That is, we have
dim Ker A = nullity A.
With respect to the two subspaces A(Vn) and Ker A, we show an important
theorem (the so-called dimension theorem) below.
Theorem 11.3 Dimension Theorem Let V_n be a linear vector space of dimension n. Also let A be an endomorphism: V_n → V_n. Then we have
rank A + nullity A = dim A(V_n) + dim Ker A = n.   (11.45)
Proof Let e_1, e_2, ⋯, and e_n be basis vectors of V_n. First, assume that Ker A = V_n. This implies that nullity A = n. Then, A\left(\sum_{i=1}^{n} x_i e_i\right) = \sum_{i=1}^{n} x_i A(e_i) = 0. Since the x_i are arbitrarily chosen, the expression means that A(x) = 0 for ∀x ∈ V_n. This implies A = 0. That is, rank A = 0. Thus, (11.45) certainly holds.
To proceed with the proof of the theorem, we think of a linear combination \sum_{i=1}^{n} c_i e_i. Next, assume that Ker A = Span{e_1, e_2, ⋯, e_ν} (ν < n); dim Ker A = ν. After A is operated on the above linear combination, we are left with \sum_{i=ν+1}^{n} c_i A(e_i). We put
\sum_{i=1}^{n} c_i A(e_i) = \sum_{i=ν+1}^{n} c_i A(e_i) = 0.   (11.46)
If, for instance, c_{ν+1} ≠ 0, then dividing by c_{ν+1} and using the linearity of A we would successively have
A(e_{ν+1}) + \frac{c_{ν+2}}{c_{ν+1}} A(e_{ν+2}) + \cdots + \frac{c_n}{c_{ν+1}} A(e_n) = 0,
A(e_{ν+1}) + A\!\left(\frac{c_{ν+2}}{c_{ν+1}} e_{ν+2}\right) + \cdots + A\!\left(\frac{c_n}{c_{ν+1}} e_n\right) = 0,
A\!\left(e_{ν+1} + \frac{c_{ν+2}}{c_{ν+1}} e_{ν+2} + \cdots + \frac{c_n}{c_{ν+1}} e_n\right) = 0.
That is,
A\!\left(\sum_{i=ν+1}^{n} c_i e_i\right) = 0.
From the above discussion, however, this relation holds if and only if c_{ν+1} = ⋯ = c_n = 0, because otherwise the vector inside the parentheses would be a non-zero vector of Ker A = Span{e_1, ⋯, e_ν}. This implies that Ker A ∩ V_{n-ν} = {0}, where V_{n-ν} ≡ Span{e_{ν+1}, ⋯, e_n}. Then, we have
V_{n-ν} + Ker A = V_{n-ν} ⊕ Ker A.
To conclude, we get
dim A(V_n) + dim Ker A = (n - ν) + ν = n,
so that (11.45) holds. The dimension theorem is one of the important theorems in the theory of linear vector spaces. We add that Theorem 11.3 holds for a linear transformation that connects two vector spaces with different dimensions [1–3].
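As a small numerical check of the dimension theorem (a NumPy sketch, using the rank-deficient matrix that appears in Example 11.2 just below), the rank and the nullity indeed add up to n:

```python
import numpy as np

# The (4, 4) matrix of Example 11.2 below, deliberately rank-deficient
A = np.array([[1., 0., 1., 0.],
              [0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [0., 1., 1., 0.]])

n = A.shape[0]
rank = np.linalg.matrix_rank(A)        # dim A(V_n)
nullity = n - rank                     # dim Ker A, by Theorem 11.3

print(rank, nullity, rank + nullity)   # 2 2 4: rank A + nullity A = n
```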
As an exercise, we have a following example:
Example 11.2 Let e1, e2, e3, e4 be basis vectors of V4. Let A be an endomorphism of
V4 and described by
A = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}.
We have
(e_1 \ e_2 \ e_3 \ e_4) \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix} = (e_1 + e_3 \quad e_2 + e_4 \quad e_1 + e_2 + e_3 + e_4 \quad 0).
That is,
A(e_1) = e_1 + e_3, \quad A(e_2) = e_2 + e_4, \quad A(e_3) = e_1 + e_2 + e_3 + e_4, \quad A(e_4) = 0.
Then, we find
x = c_1 e_1 + c_2 e_2 + c_3 e_3 + c_4 e_4
= \frac{1}{2}(c_1 + c_3)(e_1 + e_3) + \frac{1}{2}(2c_2 + c_3 - c_1)(e_2 + e_4) + \frac{1}{2}(c_3 - c_1)(-e_1 - e_2 + e_3) + \frac{1}{2}(c_1 - 2c_2 - c_3 + 2c_4)\, e_4.   (11.50)
Thus, x has been uniquely represented as (11.50) with respect to basis vectors
(e1 + e3), (e2 + e4), (-e1 - e2 + e3), and e4. The linear independence of these vectors
can easily be checked by equating (11.50) with zero. We also confirm that
V_4 = A(V_4) \oplus \mathrm{Ker}\,A.
The existence of the inverse transformation plays a particularly important role in the
theory of linear vector spaces. The inverse transformation is a linear transformation.
Let x1 = A-1(y1), x2 = A-1(y2). Also, we have A(c1x1 + c2x2) = c1A(x1) + c2A(x2) =
c1y1 + c2y2. Thus, c1x1 + c2x2 = A-1(c1y1 + c2y2) = c1A-1(y1) + c2A-1(y2), showing
that A-1 is a linear transformation. As already mentioned, a matrix that represents a
linear transformation A is uniquely determined with respect to fixed basis vectors.
This should be the case with A-1 accordingly. We have an important theorem
for this.
Theorem 11.5 [4] The necessary and sufficient condition for the matrix A-1 that
represents the inverse transformation to A to exist is that detA ≠ 0 (“det” means a
determinant). Here the matrix A represents the linear transformation A. The matrix
A-1 is uniquely determined and given by
\left(A^{-1}\right)_{ij} = (-1)^{i+j} (M)_{ji} / \det A,   (11.51)
where (M)_{ji} denotes the minor with respect to a_{ji}. Equivalently,
\sum_{k=1}^{n} \left(A^{-1}\right)_{ik} (A)_{kj} = \delta_{ij} \quad (1 \le i, j \le n).   (11.52)
If \det A = 0, the columns of A are linearly dependent, so that for some column m
(A)_{km} = \sum_{j \ne m} (A)_{kj} c_j.   (11.53)
Multiplying both sides by \left(A^{-1}\right)_{mk}, summing over k, and using (11.52), the right-hand side vanishes, whereas the left-hand side gives
\sum_{k=1}^{n} \left(A^{-1}\right)_{mk} (A)_{km} = 1,   (11.55)
a contradiction. Hence \det A \ne 0 is necessary for A^{-1} to exist.
i=m
Let a matrix A be
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.   (11.56)
Its determinant is defined by
\det A = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = \sum_{\sigma} \varepsilon(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)},
where σ means permutation among 1, 2, . . ., n and ε(σ) denotes a sign of + (in the
case of even permutations) or - (for odd permutations).
We deal with triangle matrices for future discussion. Such a matrix is denoted by
T = \begin{pmatrix} a_{11} & & * \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix},   (11.58)
where an asterisk (*) means that upper right off-diagonal elements can take any
complex numbers (including zero). A large zero shows that all the lower left
off-diagonal elements are zero. Its determinant is given by
\det T = a_{11} a_{22} \cdots a_{nn}.   (11.59)
In fact, focusing on a_{n i_n} in the expansion of the determinant, we notice that a_{n i_n} does not vanish only if i_n = n. Repeating this argument for the remaining rows, we get (11.59).
A(x) = y,   (11.60)
where we have vectors such that x = \sum_{i=1}^{n} x_i e_i and y = \sum_{i=1}^{n} y_i e_i. In reference to the same set of basis vectors (e_1 ⋯ e_n) and using a matrix representation, we have
A(x) = (e_1 \ \cdots \ e_n) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \ \cdots \ e_n) \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.   (11.61)
Hence,
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.   (11.62)
From the above discussion, for the linear transformation A to be bijective, we must have det A ≠ 0. In terms of the system of linear equations, we say that for (11.62) to have a unique solution (x_1, ⋯, x_n)^T for a given (y_1, ⋯, y_n)^T, we must have det A ≠ 0. Conversely, det A = 0 is equivalent to saying that (11.62) has indefinite solutions or has no solution. As far as the matrix algebra is concerned, (11.61) is symbolically described by omitting the parenthesis as
Ax = y.   (11.64)
However, when the vector transformation is explicitly taken into account, the full
representation of (11.61) should be borne in mind.
The relations (11.60) and (11.64) can be considered as a set of simultaneous
equations. A necessary and sufficient condition for (11.64) to have a unique solution
x for a given y is det A ≠ 0. In that case, the solution x of (11.64) can be symbolically
expressed as
x = A^{-1} y,   (11.65)
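As a numerical illustration of (11.64) and (11.65) (a NumPy sketch with an arbitrarily chosen non-singular matrix, not one taken from the text):

```python
import numpy as np

# Arbitrary non-singular matrix, so Ax = y has a unique solution
A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
y = np.array([1., 2., 3.])

assert abs(np.linalg.det(A)) > 1e-12   # the condition det A != 0 of Theorem 11.5

x = np.linalg.solve(A, y)              # numerically preferable to forming A^{-1} explicitly
assert np.allclose(A @ x, y)           # x indeed satisfies Ax = y
```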
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.   (11.56)
We often need a matrix "derived" from A of (11.56) so that we may obtain a (m, m) matrix with m < n. Let us call such a matrix derived from A a submatrix of A. Among various types of submatrices of A, we think of a special type of matrix A(i, j) that is defined as the submatrix obtained by striking out the i-th row and j-th column. Then, A(i, j) is a (n - 1, n - 1) submatrix.
We call det A(i, j) a minor with respect to a_{ij}. Moreover, we define a cofactor Δ_{ij} as
\Delta_{ij} \equiv (-1)^{i+j} \det A(i, j).
R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = R^{-1} \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}.
P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
This matrix transforms a vector x = x e_1 + y e_2 + z e_3 into y = x e_1 + y e_2 as follows:
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}.   (11.67)
Fig. 11.4 Example of an endomorphism P: ℝ³ → ℝ³ (axes x, y, z with origin O)

\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}.   (11.68)
First let us think of a linear transformation of a set of basis vectors e1, e2, ⋯,
and en. The transformation matrix A representing a linear transformation A is defined
as follows:
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.
Notice here that we often denote both a linear transformation and its corresponding matrix by the same character. After the transformation, suppose that the resulting vectors are given by e'_1, e'_2, ⋯, and e'_n. This is explicitly described as
(e_1 \ \cdots \ e_n) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} = (e'_1 \ \cdots \ e'_n).   (11.69)
Care should be taken not to confuse (11.70) with (11.63). Here, a set of vectors
e'_1, e'_2, ⋯, and e'_n may or may not be linearly independent. Let us operate both sides of (11.70) from the left on \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} and equate both sides to zero. That is,
(e_1 \ \cdots \ e_n) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e'_1 \ \cdots \ e'_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0.
Since e_1, ⋯, e_n are basis vectors, this is equivalent to
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = 0.   (11.71)
Meanwhile, we must have (x_1, ⋯, x_n)^T = 0 so that e'_1, e'_2, ⋯, and e'_n can be linearly independent (i.e., so as to be a set of basis vectors). But this means that (11.71) has only the unique (and trivial) solution and, hence, det A ≠ 0. If conversely det A ≠ 0, (11.71) has a unique trivial solution and e'_1, e'_2, ⋯, and e'_n are linearly independent. Thus, a necessary and sufficient condition for e'_1, e'_2, ⋯, and e'_n to be a set of basis vectors is det A ≠ 0. If det A = 0, (11.71) has indefinite solutions (including the trivial solution) and e'_1, e'_2, ⋯, and e'_n are linearly dependent, and vice versa. In case det A ≠ 0, an inverse matrix A^{-1} exists, and so we have
(e'_1 \ \cdots \ e'_n) A^{-1} = (e_1 \ \cdots \ e_n).   (11.72)
In the previous steps we see how the linear transformation (and the corresponding
matrix representation) converts a set of basis vectors (e1⋯ en) to another set of basis
vectors e'_1 ⋯ e'_n. Is it then possible to find a suitable transformation between two
sets of arbitrarily chosen basis vectors? The answer is yes. This is because any vector
can be expressed uniquely by a linear combination of any set of basis vectors. A whole
array of such linear combinations uniquely define a transformation matrix between the
two sets of basis vectors as expressed in (11.69) and (11.72). The matrix has non-zero
determinant and has an inverse matrix. A concept of the transformation between basis
vectors is important and very often used in various fields of natural science.
Example 11.5 We revisit Example 11.2. The relation (11.49) tells us that the basis
vectors of A(V4) and those of Ker A span V4 in total. Therefore, in light of the above
argument, there should be a linear transformation R between the two sets of vectors,
i.e., e1, e2, e3, e4 and e1 + e3, e2 + e4, - e1 - e2 + e3, e4. Moreover, the matrix
R associated with the linear transformation must be non-singular (i.e., detR ≠ 0). In
fact, we find that R is expressed as
R = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}.
This is because we have a following relation between the two sets of basis vectors:
(e_1 \ e_2 \ e_3 \ e_4) \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix} = (e_1 + e_3 \quad e_2 + e_4 \quad -e_1 - e_2 + e_3 \quad e_4).
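A short NumPy check of Example 11.5 (the old basis is taken as the identity columns for illustration) confirms that R is non-singular and carries the old basis onto the new one:

```python
import numpy as np

# Basis-change matrix R of Example 11.5
R = np.array([[1., 0., -1., 0.],
              [0., 1., -1., 0.],
              [1., 0.,  1., 0.],
              [0., 1.,  0., 1.]])

E = np.eye(4)                      # columns: e1, e2, e3, e4
new_basis = E @ R                  # columns: e1+e3, e2+e4, -e1-e2+e3, e4

print(np.linalg.det(R))            # 2.0 (non-zero), so R is non-singular
print(new_basis.T)                 # each row shows one new basis vector in the old basis
```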
P(x) = (e_1 \ \cdots \ e_n) \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},   (11.73)
where the non-singular matrix P represents the transformation P. Notice here that the
transformation and its matrix are represented by the same P. As mentioned in Sect.
11.2, the matrix P can be operated either from the right on the basis vectors or from
the left on the column vector. We explicitly write
P(x) = (e_1 \ \cdots \ e_n) \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e'_1 \ \cdots \ e'_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},   (11.74)
where (e'_1 \ \cdots \ e'_n) = (e_1 \ \cdots \ e_n) P [here P is the non-singular matrix defined in (11.73)]. Alternatively, we have
P(x) = (e_1 \ \cdots \ e_n) \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \ \cdots \ e_n) \begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix},   (11.75)
where
\begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix} = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.   (11.76)
Equation (11.75) gives a column vector representation regarding the vector that
has been obtained by the transformation P and is viewed in reference to the basis
vectors (e1⋯en). Combining (11.74) and (11.75), we get
P(x) = (e'_1 \ \cdots \ e'_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \ \cdots \ e_n) \begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix}.   (11.77)
Next, suppose that the linear transformation A is represented by the matrices A_O and A' in reference to the basis vectors (e_1 ⋯ e_n) and (e'_1 ⋯ e'_n), respectively. Then, A[P(x)] can be described in two different ways as follows:
A[P(x)] = (e'_1 \ \cdots \ e'_n) A' \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \ \cdots \ e_n) A_O \begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix}.   (11.78)
Equivalently,
A[P(x)] = [(e_1 \ \cdots \ e_n) P] A' \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \ \cdots \ e_n) A_O P \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.   (11.79)
As (11.79) is described for a vector x = \sum_{i=1}^{n} x_i e_i arbitrarily chosen in V_n, we get
P A' = A_O P.   (11.80)
Hence,
A' = P^{-1} A_O P.   (11.81)
Meanwhile, the transformation A itself is expressed as
A(x) = (e_1 \ \cdots \ e_n) A_O \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.   (11.82)
Note that since the transformation A has been performed in reference to the basis
vectors (e1⋯en), AO should be used for the matrix representation. This is rewritten as
A(x) = (e_1 \ \cdots \ e_n) P P^{-1} A_O P P^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.   (11.83)

x = (e_1 \ \cdots \ e_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e_1 \ \cdots \ e_n) P P^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e'_1 \ \cdots \ e'_n) P^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (e'_1 \ \cdots \ e'_n) \begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}.   (11.84)

In (11.84) we put
(e_1 \ \cdots \ e_n) P = (e'_1 \ \cdots \ e'_n), \qquad P^{-1} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}.   (11.85)
Equation (11.84) gives a column vector representation of the same vector x viewed in reference to the basis set of vectors e_1, ⋯, e_n or e'_1, ⋯, e'_n. Equation (11.85) should not be confused with (11.76). That is, (11.76) relates the two coordinate sets (x_1, ⋯, x_n)^T and (x'_1, ⋯, x'_n)^T of different vectors before and after the transformation viewed in reference to the same set of basis vectors. The relation (11.85), however, relates the two coordinate sets (x_1, ⋯, x_n)^T and (\tilde{x}_1, ⋯, \tilde{x}_n)^T of the same vector viewed in reference to different sets of basis vectors. Thus, from (11.83) we have
A(x) = (e'_1 \ \cdots \ e'_n) P^{-1} A_O P \begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix}.   (11.86)
Comparing this with
A(x) = (e'_1 \ \cdots \ e'_n) A' \begin{pmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{pmatrix},   (11.87)
we recover
A' = P^{-1} A_O P.   (11.88)
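A compact NumPy sketch (matrices chosen arbitrarily for illustration) shows the same linear map expressed in two bases related by (11.81)/(11.88), with the coordinates transforming as in (11.85):

```python
import numpy as np

A_O = np.array([[2., 1.],
                [0., 3.]])         # matrix of A in the original basis (e1 e2)
P   = np.array([[1., 1.],
                [1., 2.]])         # non-singular basis change, (e1' e2') = (e1 e2) P

A_prime = np.linalg.inv(P) @ A_O @ P   # matrix of A in the new basis, cf. (11.88)

x  = np.array([1.0, -1.0])             # coordinates of x in the original basis
xt = np.linalg.inv(P) @ x              # coordinates of the same x in the new basis, cf. (11.85)

# A(x) computed in the original basis agrees with the result computed in the new basis
lhs = A_O @ x
rhs = P @ (A_prime @ xt)
assert np.allclose(lhs, rhs)
```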
References
1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
Chapter 12
Canonical Forms of Matrices
In Sect. 11.4 we saw that the transformation matrices are altered depending on the
basis vectors we choose. Then a following question arises. Can we convert a
(transformation) matrix to as simple a form as possible by similarity transforma-
tion(s)? In Sect. 11.4 we have also shown that if we have two sets of basis vectors in
a linear vector space Vn we can always find a non-singular transformation matrix
between the two. In conjunction with the transformation of the basis vectors, the
matrix undergoes similarity transformation. It is our task in this chapter to find a
simple form or a specific form (i.e., canonical form) of a matrix as a result of the
similarity transformation. For this purpose, we should first find eigenvalue(s) and
corresponding eigenvector(s) of the matrix. Depending upon the nature of matrices,
we get various canonical forms of matrices such as a triangle matrix and a diagonal
matrix. Regarding any form of matrices, we can treat these matrices under a unified
form called the Jordan canonical form.
12.1 Eigenvalues and Eigenvectors

An eigenvalue problem is one of the important subjects of the theory of linear vector spaces. Let A be a linear transformation on V_n. Depending on the choice of basis vectors, the matrix representing A takes several different kinds of canonical forms; a typical example is a diagonal matrix. To reach a satisfactory answer, we start with the so-called eigenvalue problem.
Suppose that after the transformation of x we have
A(x) = \alpha x,   (12.1)
where α is a certain (complex) number.
x is an eigenvector that corresponds to the eigenvalue α. Using a notation of (11.37)
of Sect. 11.2, we have
A(x) = (e_1 \ \cdots \ e_n) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \alpha\, (e_1 \ \cdots \ e_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.

\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \alpha \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
If we identify x with the column vector (x_1, ⋯, x_n)^T at fixed basis vectors (e_1 ⋯ e_n), we may naturally
rewrite (12.1) as
Ax = \alpha x.   (12.2)
Equivalently,
(A - \alpha E)x = 0,   (12.3)
where E is a (n, n) unit matrix. Equations (12.2) and (12.3) are said to be an
eigenvalue equation (or eigenequation). In (12.2) or (12.3) x = 0 always holds
(as a trivial solution). Consequently, for x ≠ 0 to be a solution we must have
|A - \alpha E| = 0.   (12.4)
The characteristic polynomial of A is defined as
f_A(x) = |xE - A| = \begin{vmatrix} x - a_{11} & \cdots & -a_{1n} \\ \vdots & \ddots & \vdots \\ -a_{n1} & \cdots & x - a_{nn} \end{vmatrix}.   (12.5)
Writing
f_A(x) = x^n + a_1 x^{n-1} + \cdots + a_n,   (12.6)
we have, in particular,
a_n = (-1)^n |A|.   (12.8)
The characteristic equation f_A(x) = 0 has n roots including possible multiple roots. Let those roots be α_1, ⋯, α_n (some of them may be identical). Then we have
f_A(x) = \prod_{i=1}^{n} (x - \alpha_i).   (12.9)
Comparing the coefficients, we get
\alpha_1 + \cdots + \alpha_n = -a_1 = \mathrm{Tr}\,A.   (12.10)
This leads to invariance of the trace under a similarity transformation. That is,
\mathrm{Tr}\left(P^{-1} A P\right) = \mathrm{Tr}\,A.   (12.13)
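A quick NumPy sketch (random matrices, generated only for illustration) checks that the trace and the spectrum are invariant under a similarity transformation and that the trace equals the sum of eigenvalues as in (12.10):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))        # arbitrary (4, 4) matrix
P = rng.standard_normal((4, 4))        # a generic P is non-singular with probability 1

B = np.linalg.inv(P) @ A @ P           # similarity transformation

print(np.trace(A), np.trace(B))                          # equal up to round-off
print(np.sort_complex(np.linalg.eigvals(A)))
print(np.sort_complex(np.linalg.eigvals(B)))             # same spectrum
print(np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)).real))  # (12.10)
```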
T = \begin{pmatrix} a_{11} & & * \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix}.   (11.58)
The matrix of this type is thought to be one of the canonical forms of matrices. Its characteristic polynomial f_T(x) is
f_T(x) = |xE - T| = \begin{vmatrix} x - a_{11} & & * \\ & \ddots & \\ 0 & & x - a_{nn} \end{vmatrix}.   (12.14)
Therefore, we get
f_T(x) = \prod_{i=1}^{n} (x - a_{ii}),   (12.15)
where we used (11.59). From (11.58) and (12.15), we find that eigenvalues of a
triangle matrix are given by its diagonal elements accordingly. Our immediate task
will be to examine whether and how a given matrix is converted to a triangle matrix
through a similarity transformation. A following theorem is important.
Theorem 12.1 Every (n, n) square matrix can be converted to a triangle matrix by
similarity transformation [1].
Proof A triangle matrix is either an “upper” triangle matrix [to which all the lower
left off-diagonal elements are zero; see (11.58)] or a “lower” triangle matrix
(to which all the upper right off-diagonal elements are zero). In the present case
we show the proof for the upper triangle matrix. Regarding the lower triangle matrix
the theorem is proven in a similar manner.
The proof is due to mathematical induction. First we show that the theorem is true
of a (2, 2) matrix. Suppose that one of the eigenvalues of A2 is α1 and that its
corresponding eigenvector is x1. Then we have
A_2 x_1 = \alpha_1 x_1.   (12.16)
Let us define a (2, 2) matrix
P_1 = (x_1 \ p_1),   (12.17)
where p1 is another column vector chosen in such a way that x1 and p1 are linearly
independent so that P1 can be a non-singular matrix. Then, we have
Meanwhile,
x_1 = (x_1 \ p_1) \begin{pmatrix} 1 \\ 0 \end{pmatrix} = P_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix}.   (12.19)
Hence, we have
P_1^{-1} A_2 x_1 = \alpha_1 P_1^{-1} x_1 = \alpha_1 P_1^{-1} P_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \alpha_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ 0 \end{pmatrix}.   (12.20)
Therefore, P_1^{-1} A_2 P_1 takes an upper triangle form
P_1^{-1} A_2 P_1 = \begin{pmatrix} \alpha_1 & * \\ 0 & * \end{pmatrix}.   (12.21)
\tilde{A}_n = \tilde{P}^{-1} A_n \tilde{P}.   (12.22)
Here we show that \tilde{A}_n can be given the form
\tilde{A}_n = \begin{pmatrix} \alpha_n & x_1 \ \cdots \ x_{n-1} \\ 0 & \\ \vdots & A_{n-1} \\ 0 & \end{pmatrix},   (12.23)
where \alpha_n is an eigenvalue of A_n and A_{n-1} is a (n-1, n-1) matrix. In fact, choosing an eigenvector a_n corresponding to \alpha_n and column vectors p_1, ⋯, p_{n-1} such that
\tilde{P} = (a_n \ p_1 \ p_2 \ \cdots \ p_{n-1})
is non-singular, we have
\tilde{A}_n = \tilde{P}^{-1} A_n \tilde{P} = \left( \tilde{P}^{-1} A_n a_n \ \ \tilde{P}^{-1} A_n p_1 \ \ \tilde{P}^{-1} A_n p_2 \ \cdots \ \tilde{P}^{-1} A_n p_{n-1} \right)   (12.24)
and
a_n = (a_n \ p_1 \ p_2 \ \cdots \ p_{n-1}) \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \tilde{P} \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
Therefore, we have
\tilde{P}^{-1} A_n a_n = \tilde{P}^{-1} \alpha_n a_n = \alpha_n \tilde{P}^{-1} \tilde{P} \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \alpha_n \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha_n \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
Thus, from (12.24) we see that one can express a matrix form of A~n as (12.23).
By a hypothesis of the mathematical induction, we assume that there exists a (n -
1, n - 1) non-singular square matrix Pn - 1 and an upper triangle matrix Δn - 1
such that
P_{n-1}^{-1} A_{n-1} P_{n-1} = \Delta_{n-1}.   (12.25)
Using a non-zero number d, let us define a (n, n) non-singular matrix
P_n = \begin{pmatrix} d & 0 \ \cdots \ 0 \\ 0 & \\ \vdots & P_{n-1} \\ 0 & \end{pmatrix}.
Then we have
\tilde{A}_n P_n = \begin{pmatrix} \alpha_n & x_1 \ \cdots \ x_{n-1} \\ 0 & \\ \vdots & A_{n-1} \\ 0 & \end{pmatrix} \begin{pmatrix} d & 0 \ \cdots \ 0 \\ 0 & \\ \vdots & P_{n-1} \\ 0 & \end{pmatrix} = \begin{pmatrix} \alpha_n d & x_{n-1}^{T} P_{n-1} \\ 0 & \\ \vdots & A_{n-1} P_{n-1} \\ 0 & \end{pmatrix},   (12.26)
where x_{n-1}^{T} is a transpose of a column vector \begin{pmatrix} x_1 \\ \vdots \\ x_{n-1} \end{pmatrix}; therefore, x_{n-1}^{T} P_{n-1} is a (1, n-1) matrix (i.e., a row vector). Meanwhile, we have
\begin{pmatrix} d & 0 \ \cdots \ 0 \\ 0 & \\ \vdots & P_{n-1} \\ 0 & \end{pmatrix} \begin{pmatrix} \alpha_n & x_{n-1}^{T} P_{n-1}/d \\ 0 & \\ \vdots & \Delta_{n-1} \\ 0 & \end{pmatrix} = \begin{pmatrix} \alpha_n d & x_{n-1}^{T} P_{n-1} \\ 0 & \\ \vdots & P_{n-1} \Delta_{n-1} \\ 0 & \end{pmatrix}.   (12.27)
Since A_{n-1} P_{n-1} = P_{n-1} \Delta_{n-1} by (12.25), comparison of (12.26) and (12.27) gives
\tilde{A}_n P_n = P_n \Delta_n,   (12.29)
where
\Delta_n = \begin{pmatrix} \alpha_n & x_{n-1}^{T} P_{n-1}/d \\ 0 & \\ \vdots & \Delta_{n-1} \\ 0 & \end{pmatrix}   (12.30)
is an upper triangle matrix. Consequently,
\Delta_n = (\tilde{P} P_n)^{-1} A_n (\tilde{P} P_n),   (12.31)
which completes the induction.
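Theorem 12.1 is, in essence, a triangularization theorem. As a numerical illustration (a sketch relying on SciPy's Schur routine, which uses a unitary similarity transformation, rather than the constructive proof above), a randomly chosen matrix is similar to an upper triangle matrix whose diagonal carries its eigenvalues:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Complex Schur form: A = Z T Z^H with T upper triangular and Z unitary
T, Z = schur(A, output='complex')

print(np.allclose(Z @ T @ Z.conj().T, A))      # True: the similarity transformation recovers A
print(np.allclose(np.tril(T, -1), 0))          # True: T is an upper triangle matrix
print(np.allclose(np.sort_complex(np.diag(T)),
                  np.sort_complex(np.linalg.eigvals(A))))  # diagonal elements = eigenvalues
```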
A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}.   (12.32)
The eigenvalues are 2 and 1. For the eigenvalue 1 we solve
\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} x = 0,
which gives an eigenvector (1, -1)^T. Taking the two eigenvectors as the columns of P, we have
P^{-1} A P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}.   (12.33)
c_1 a_1 + c_2 a_2 = 0.   (12.34)
Suppose that a_1 and a_2 are linearly dependent. Then, without loss of generality we can put c_1 ≠ 0. Accordingly, we get
a_1 = -\frac{c_2}{c_1} a_2.   (12.35)
Operating A on both sides,
\alpha_1 a_1 = -\frac{c_2}{c_1} \alpha_2 a_2 = \alpha_2 a_1.   (12.36)
With the second equality we have used (12.35). From (12.36) we have (\alpha_1 - \alpha_2) a_1 = 0 and, hence, a_1 = 0 because \alpha_1 \ne \alpha_2, in contradiction to a_1 being an eigenvector. Thus, a_1 and a_2 are linearly independent. Extending the argument to n eigenvectors, suppose that
c_1 a_1 + c_2 a_2 + \cdots + c_{n-1} a_{n-1} + c_n a_n = 0.   (12.38)
If c_n = 0, this reduces to
c_1 a_1 + c_2 a_2 + \cdots + c_{n-1} a_{n-1} = 0.   (12.39)
If c_n ≠ 0, we get
a_n = -\left( \frac{c_1}{c_n} a_1 + \frac{c_2}{c_n} a_2 + \cdots + \frac{c_{n-1}}{c_n} a_{n-1} \right).   (12.40)
Operating A on both sides,
\alpha_n a_n = -\left( \frac{c_1}{c_n} \alpha_1 a_1 + \frac{c_2}{c_n} \alpha_2 a_2 + \cdots + \frac{c_{n-1}}{c_n} \alpha_{n-1} a_{n-1} \right).   (12.41)
Meanwhile, multiplying both sides of (12.40) by \alpha_n,
\alpha_n a_n = -\left( \frac{c_1}{c_n} \alpha_n a_1 + \frac{c_2}{c_n} \alpha_n a_2 + \cdots + \frac{c_{n-1}}{c_n} \alpha_n a_{n-1} \right).   (12.42)
Subtracting (12.42) from (12.41), we get
0 = \frac{c_1}{c_n}(\alpha_n - \alpha_1) a_1 + \frac{c_2}{c_n}(\alpha_n - \alpha_2) a_2 + \cdots + \frac{c_{n-1}}{c_n}(\alpha_n - \alpha_{n-1}) a_{n-1}.   (12.43)
In the special case of \alpha_n = 0, (12.41) reads
0 = -\left( \frac{c_1}{c_n} \alpha_1 a_1 + \frac{c_2}{c_n} \alpha_2 a_2 + \cdots + \frac{c_{n-1}}{c_n} \alpha_{n-1} a_{n-1} \right).   (12.44)
Equations (12.21), (12.30), and (12.31) show that if we adopt an eigenvector as one
of the basis vectors, the matrix representation of the linear transformation A in
reference to such basis vectors is obtained so that the leftmost column is zero except
for the left top corner on which an eigenvalue is positioned. (Note that if the said
eigenvector is zero, the leftmost column is a zero vector.) Meanwhile, neither
Theorem 12.1 nor Theorem 12.2 tells about multiplicity of eigenvalues. If the
eigenvalues have multiplicity, we have to think about different aspects. This is a
major issue of this section.
Let us start with a discussion of invariant subspaces. Let A be a (n, n) square
matrix. If a subspace W in Vn is characterized by
x ∈ W ⟹ Ax ∈ W,
the subspace W is said to be A-invariant. Taking basis vectors a_1, a_2, ⋯, a_m of W and supplementing them with a_{m+1}, ⋯, a_n so that a_1, ⋯, a_n form basis vectors of V_n, we have
(a_1 \ a_2 \ \cdots \ a_m \ a_{m+1} \ \cdots \ a_n) A = (a_1 \ a_2 \ \cdots \ a_m \ a_{m+1} \ \cdots \ a_n) \begin{pmatrix} A_m & * \\ 0 & * \end{pmatrix},   (12.45)
where A_m is a (m, m) matrix. Now, choose an arbitrary vector a ∈ V_n and consider the following n + 1 vectors:
a, \ Aa, \ A^2 a, \ \cdots, \ A^n a.
These vectors are linearly dependent since there are at most n linearly indepen-
dent vectors in Vn. These vectors span a subspace in Vn. Let us call this subspace M;
i.e.,
M \equiv \mathrm{Span}\{a, \ Aa, \ A^2 a, \ \cdots, \ A^n a\}.
The linear dependence of these vectors can be expressed as
c_0 a + c_1 Aa + c_2 A^2 a + \cdots + c_n A^n a = 0.   (12.46)
Not all c_i (0 ≤ i ≤ n) are zero, because the vectors are linearly dependent. Suppose that c_n ≠ 0. Then, from (12.46) we have
A^n a = -\frac{1}{c_n}\left( c_0 a + c_1 Aa + c_2 A^2 a + \cdots + c_{n-1} A^{n-1} a \right).
Operating A on both sides from the left,
A^{n+1} a = -\frac{1}{c_n}\left( c_0 Aa + c_1 A^2 a + c_2 A^3 a + \cdots + c_{n-1} A^n a \right).
If instead c_n = 0 and c_{n-1} ≠ 0, we similarly have
A^{n-1} a = -\frac{1}{c_{n-1}}\left( c_0 a + c_1 Aa + c_2 A^2 a + \cdots + c_{n-2} A^{n-2} a \right),
A^{n+1} a = -\frac{1}{c_{n-1}}\left( c_0 A^2 a + c_1 A^3 a + c_2 A^4 a + \cdots + c_{n-2} A^n a \right).
In the extreme case where only c_0 and c_1 are non-zero, we have
c_0 a + c_1 Aa = 0, \quad \text{i.e.,} \quad Aa = -\frac{c_0}{c_1} a.
Operating once again An on the above equation from the left, we have
A^{n+1} a = -\frac{c_0}{c_1} A^n a.
In all cases, therefore, A^{n+1} a ∈ M, so that M is an A-invariant subspace with m ≡ dim M ≤ dim V_n = n. Choosing the basis vectors of V_n so that the first m of them span M, we find that A is represented as
A = \begin{pmatrix} A_m & * \\ 0 & * \end{pmatrix}.   (12.47)
Note again that in (12.45) Vn is spanned by the m basis vectors in M together with
other (n - m) linearly independent vectors.
We happen to encounter a situation where two subspaces W1 and W2 are at once
A-invariant. Here we can take basis vectors a1, a2, ⋯, and am for W1 and am + 1,
am + 2, ⋯, and an for W2. In reference to such a1, a2, ⋯, and an as basis vector of Vn,
we have
A = \begin{pmatrix} A_m & 0 \\ 0 & A_{n-m} \end{pmatrix},   (12.48)
or, symbolically, A = A_m ⊕ A_{n-m}. Correspondingly,
V_n = W_1 ⊕ W_2 = \mathrm{Span}\{a_1, a_2, ⋯, a_m\} ⊕ \mathrm{Span}\{a_{m+1}, a_{m+2}, ⋯, a_n\}.   (12.49)
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \qquad \tilde{A} = \begin{pmatrix} \Delta_{11} & \cdots & \Delta_{1n} \\ \vdots & \ddots & \vdots \\ \Delta_{n1} & \cdots & \Delta_{nn} \end{pmatrix},
where \Delta_{ij} is the cofactor of a_{ij}. Then we have
\tilde{A}^{T} A = A \tilde{A}^{T} = |A|\, E,   (12.50)
where \tilde{A}^{T} is a transposed matrix of \tilde{A}. We now apply (12.50) to the characteristic polynomial to get a following equation expressed as
\widetilde{(xE - A)}^{T} (xE - A) = (xE - A)\, \widetilde{(xE - A)}^{T} = |xE - A|\, E = f_A(x)\, E,   (12.51)
where \widetilde{xE - A} is the cofactor matrix of xE - A. Let the cofactor of the (i, j)-element of (xE - A) be \Delta_{ij}. Note in this case that \Delta_{ij} is an at most (n - 1)-th order polynomial of x. Let us put accordingly
\widetilde{xE - A} = \begin{pmatrix} \Delta_{11} & \cdots & \Delta_{1n} \\ \vdots & \ddots & \vdots \\ \Delta_{n1} & \cdots & \Delta_{nn} \end{pmatrix} = \begin{pmatrix} b_{11,0}x^{n-1} + \cdots + b_{11,n-1} & \cdots & b_{1n,0}x^{n-1} + \cdots + b_{1n,n-1} \\ \vdots & \ddots & \vdots \\ b_{n1,0}x^{n-1} + \cdots + b_{n1,n-1} & \cdots & b_{nn,0}x^{n-1} + \cdots + b_{nn,n-1} \end{pmatrix} = x^{n-1} B_0 + x^{n-2} B_1 + \cdots + B_{n-1}.   (12.53)
Thus, we get
f_A(A) = 0.
This means that f_A(A)x = 0 for ∀x ∈ V_n. These complete the proof of the Hamilton–Cayley theorem.
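A brief NumPy sketch (random matrix, illustrative only) verifies the Hamilton–Cayley theorem numerically: evaluating the characteristic polynomial at the matrix itself, by Horner's scheme with matrix arithmetic, gives the zero matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
n = A.shape[0]

coeffs = np.poly(A)              # characteristic polynomial coefficients, highest power first

# Evaluate f_A(A) by Horner's scheme with matrix multiplications
F = np.zeros_like(A)
for c in coeffs:
    F = F @ A + c * np.eye(n)

print(np.allclose(F, 0))         # True: f_A(A) = 0
```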
In relation to Hamilton–Cayley theorem, we consider an important concept of a
minimal polynomial.
Definition 12.1 Let f(x) be a polynomial of x with scalar coefficients such that
f(A) = 0, where A is a (n, n) matrix. Among f(x), a lowest-order polynomial with the
highest-order coefficient of one is said to be a minimal polynomial. We denote it by
φA(x); i.e., φA(A) = 0.
We have an important property for this. Namely, a minimal polynomial φ_A(x) is a divisor of f(x). In fact, suppose that dividing f(x) by φ_A(x) we have
f(x) = g(x)\,\varphi_A(x) + h(x).
Since f(A) = φ_A(A) = 0, it follows that h(A) = 0, where h(x) is a polynomial whose order is lower than that of φ_A(x). But the presence of such a non-zero h(x) is in contradiction to the definition of the minimal polynomial. This implies that h(x) ≡ 0. Thus, we get
f(x) = g(x)\,\varphi_A(x).
Suppose that there were two minimal polynomials φ_A(x) and φ'_A(x). Then, we would have
\varphi_A(x) = c\, \varphi'_A(x),
where c is a constant, because φ_A(x) and φ'_A(x) are of the same order. But c should be one, since the highest-order coefficient of the minimal polynomial is one. Thus, φ_A(x) is uniquely defined.
V_n = W_1 ⊕ W_2 ⊕ \cdots ⊕ W_n = \mathrm{Span}\{a_1\} ⊕ \mathrm{Span}\{a_2\} ⊕ \cdots ⊕ \mathrm{Span}\{a_n\}.   (12.57)
For example, consider
A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.   (12.58)
The matrix has a triple root 2. As can easily be seen below, individual eigenvec-
tors a1, a2, and a3 form basis vectors of each invariant subspace, indicating that V3
can be decomposed to a direct sum of the three invariant subspaces as in (12.57) such
that
(a_1 \ a_2 \ a_3) \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} = (2a_1 \ \ 2a_2 \ \ 2a_3).
A(x) = (a_1 \ a_2) \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},   (12.59)
where a1 and a2 are basis vectors and for 8x 2 V2,x = x1a1 + x2a2. This example has a
double root 3. We have
(a_1 \ a_2) \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} = (3a_1 \ \ a_1 + 3a_2).   (12.60)
Thus, Span {a1} is A-invariant, but this is not the case with Span{a2}. This
implies that a1 is an eigenvector corresponding to an eigenvalue 3 but a2 is not.
Detailed discussion about matrices of this kind follows below.
Nilpotent matrices play an important role in matrix algebra. These matrices are
defined as follows.
Definition 12.2 Let N be a linear transformation in a vector space Vn. Suppose that
we have
N^k = 0 \quad \text{and} \quad N^{k-1} \ne 0,   (12.61)
where N is a (n, n) square matrix and k (≥ 2) is a certain natural number. Then, N is said to be a nilpotent matrix of an order k or a k-th order nilpotent matrix. If (12.61) holds with k = 1, N is a zero matrix.
Nilpotent matrices have the following properties:
1. Eigenvalues of a nilpotent matrix are zero. Let N be a k-th order nilpotent matrix.
Suppose that
Nx = \alpha x.   (12.62)
Operating N repeatedly (k - 1 more times), we get
N^k x = \alpha^k x.   (12.63)
~
P - 1 NP = N,
f N ðN Þ = N n = 0:
P - 1 NP = 0:
The above equation holds, because N only takes eigenvalues of zero. Operat-
ing P from the left of the above equation and P-1 from the right, we have
N = 0:
N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
8
A = 0 , Ax = 0 for x 2 V n: ð12:64Þ
∃
A ≠ 0 , Ax ≠ 0 for x 2 V n:
x, \ Nx, \ N^2 x, \ \cdots, \ N^{k-1} x.
Suppose that
\sum_{i=0}^{k-1} c_i N^i x = 0.   (12.65)
Operating N^{k-1} on (12.65) from the left, we get
c_0 N^{k-1} x = 0,   (12.66)
and, hence, c_0 = 0 because N^{k-1} x ≠ 0. Then (12.65) reads
\sum_{i=1}^{k-1} c_i N^i x = 0.   (12.67)
Operating N^{k-2} on (12.67) from the left, we similarly get
c_1 N^{k-1} x = 0,   (12.68)
and c_1 = 0. Repeating this procedure, we find c_0 = c_1 = \cdots = c_{k-1} = 0, so that the above k vectors are linearly independent.
This immediately tells us that for a k-th order nilpotent matrix N : Vn → Vn, we
should have k ≤ n. This is because the number of linearly independent vectors is at
most n.
In Example 12.4, N causes a transformation of basis vectors in V2 such that
(e_1 \ e_2) N = (e_1 \ e_2) \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = (0 \ \ e_1).
That is, Ne_2 = e_1. Then, the two linearly independent vectors e_2 and Ne_2 correspond to the case of Theorem 12.4.
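A tiny NumPy check of the nilpotent matrix of Example 12.4 (purely illustrative) confirms N² = 0, the zero eigenvalues, and the linear independence of the chain x, Nx:

```python
import numpy as np

N = np.array([[0., 1.],
              [0., 0.]])         # the second-order nilpotent matrix of Example 12.4

print(np.allclose(N @ N, 0))     # True: N^2 = 0 while N != 0
print(np.linalg.eigvals(N))      # [0. 0.]: eigenvalues of a nilpotent matrix are all zero

x = np.array([0., 1.])           # x = e2 satisfies N x != 0
chain = np.column_stack([x, N @ x])       # {x, Nx}
print(np.linalg.matrix_rank(chain))       # 2: x and Nx are linearly independent (Theorem 12.4)
```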
So far we have shown simple cases where matrices can be diagonalized via
similarity transformation. This is equivalent to that the relevant vector space is
decomposed to a direct sum of (invariant) subspaces. Nonetheless, if a characteristic
polynomial of the matrix has multiple root(s), it remains uncertain whether the vector
space is decomposed to such a direct sum. To answer this question, we need a
following lemma.
Lemma 12.1 Let f1(x), f2(x), ⋯, and fs(x) be polynomials without a common factor.
Then we have s polynomials M1(x), M2(x), ⋯, Ms(x) that satisfy the following
relation:
Proof Let Mi(x) (1 ≤ i ≤ s) be arbitrarily chosen polynomials and deal with a set of
g(x) that can be expressed as
Let a whole set of g(x) be H. Then H has the following two properties:
1. g1 ðxÞ, g2 ðxÞ 2 H ⟹ g1 ðxÞ þ g2 ðxÞ 2 H,
2. gðxÞ 2 H, M(x): an arbitrarily chosen polynomial ⟹ M ðxÞgðxÞ 2 H.
Now let the lowest-order polynomial out of the set H be g0(x). Then 8 gðxÞð2 HÞ
are a multiple of g0(x). Suppose that dividing g(x) by g0(x), we have
where h(x) is a certain polynomial. Since gðxÞ, g0 ðxÞ 2 H, we have hðxÞ 2 H from
the above properties (1) and (2). If h(x) ≠ 0, the order of h(x) is lower than that of
g0(x) from (12.71). This is, however, in contradiction to the definition of g0(x).
Therefore, h(x) = 0. Thus,
On assumption the greatest common factor of f1(x), f2(x), ⋯, and fs(x) is 1. This
implies that g0(x) = 1. Thus, we finally get (12.69) and complete the proof.
ðA - αEÞl x = 0, ðA - αEÞl - 1 x ≠ 0:
AðAx - xÞ = 0:
Then, we have
Ax = x or Ax = 0: ð12:74Þ
From Theorem 11.3 (dimension theorem), for a general case of linear operators
we had
V n = A ðV n Þ Ker A,
where
ðE - AÞ2 = E - 2A þ A2 = E - 2A þ A = E - A:
AðE - AÞ = ðE - AÞA = 0:
Bx - x = ðE - AÞx - x = x - Ax - x = - Ax:
we get
W A = AV n , W A = ðE - AÞV n = BV n = V n - AV n = Ker A:
That is,
V n = W A þ W A:
Vn = WA W A:
A1 þ A2 þ ⋯ þ As = E,
Ai Aj = Ai δij :
Then
Vn = W1 W2 ⋯ W s: ð12:78Þ
In fact, suppose that ∃x( 2 Vn) 2 Wi, Wj (i ≠ j). Then Aix = x = Ajx. Operating Aj
( j ≠ i) from the left, AjAix = Ajx = AjAjx. That is, 0 = x = Ajx, implying that
Wi \ Wj = {0}. Meanwhile,
V n = ðA1 þ A2 þ ⋯ þ As ÞV n = A1 V n þ A2 V n þ ⋯ þ As V n
ð12:79Þ
= W 1 þ W 2 þ ⋯ þ W s:
1 0 0 0
0 1 0 0
ð e1 e2 e3 e4 ÞA = ð e1 e2 e3 e4 Þ = ð e1 e2 0 0 Þ:
0 0 0 0
0 0 0 0
4
Put x = i = 1 xi ei . Then, we have
4 2
AðxÞ = x Aðei Þ =
i=1 i
xe
i=1 i i
= W A,
4
ðE - AÞðxÞ = x - AðxÞ = xe
i=3 i i
= W A:
In the above,
0 0 0 0
0 0 0 0
ð e1 e2 e3 e4 ÞðE - AÞ = ð e1 e2 e3 e4 Þ = ð 0 0 e3 e 4 Þ :
0 0 1 0
0 0 0 1
Thus, we have
W A = Spanfe1 , e2 g, W A = Spanfe3 , e4 g, V4 = WA W A:
The properties of idempotent matrices can easily be checked. It is left for readers.
Using the aforementioned idempotent operators, let us introduce the following
theorem.
Theorem 12.5 [3, 4] Let A be a linear transformation Vn → Vn. Suppose that a
vector x (2Vn) satisfies the following relation:
ðA - αE Þl x = 0, ð12:80Þ
V n = W α1 W α2 ⋯ W αs : ð12:81Þ
~ αi ð1 ≤ i ≤ sÞ is given by
Here W
~ αi = x; x 2 V n , ðA - αi E Þli x = 0 ,
W ð12:82Þ
~ α i = ni .
where li is an enough large natural number. If multiplicity of αi is ni, dim W
Proof Let us define the aforementioned A-invariant subspaces as W ~ α k ð 1 ≤ k ≤ sÞ.
Let fA(x) be a characteristic polynomial of A. Factorizing fA(x) into a product of
powers of first-degree polynomials, we have
f_A(x) = \prod_{i=1}^{s} (x - \alpha_i)^{n_i},   (12.83)
Then f1(x), f2(x), ⋯, and fs(x) do not have a common factor. Consequently,
Lemma 12.1 tells us that there are polynomials M1(x), M2(x), ⋯, and Ms(x) such that
A1 þ A2 þ ⋯ þ As = E, ð12:87Þ
Ai Aj = Ai δij : ð12:88Þ
In fact, if i ≠ j,
The second equality results from the fact that Mj(A) and fi(A) are commutable
since both are polynomials of A. The last equality follows from Hamilton–Cayley
theorem. On the basis of (12.87) and (12.89),
We have
Operating Mi(A) from the left and from the fact that Mi(A) commutes with
ðA - αi E Þni , we get
ðA - αi EÞni Ai = 0, ð12:94Þ
where we used Mi(A)fi(A) = Ai. Operating both sides of this equation on Vn,
furthermore, we have
ðA - αi E Þni Ai V n = 0:
~ αi ð1 ≤ i ≤ sÞ:
Ai V n ⊂ W ð12:95Þ
and, hence,
x = Ai ½N ðAÞx 2 Ai V n : ð12:98Þ
Thus, we get
~ αi ⊂ Ai V n ð1 ≤ i ≤ sÞ:
W ð12:99Þ
~ αi = Ai V n ð1 ≤ i ≤ sÞ:
W ð12:100Þ
Vn = W1 W2 ⋯ Ws
or
~ α1
Vn = W ~ α2
W ⋯ ~ αs :
W ð12:101Þ
This completes the former half of the proof. With the latter half, the proof is as
follows.
Suppose that dim W~ αi = n0i . In parallel to the decomposition of Vn to the direct
sum of (12.81), A can be reduced as
Að1Þ ⋯ 0
A ⋮ ⋱ ⋮ , ð12:102Þ
0 ⋯ AðsÞ
where A(i) (1 ≤ i ≤ s) is a (n0i , n0i ) matrix and a symbol indicates that A has been
transformed by suitable similarity transformation. The matrix A(i) represents a linear
transformation that A causes to W ~ αi . We denote a n0i order identity matrix by E n0 .
i
Equation (12.82) implies that the matrix represented by
is a nilpotent matrix. The order of a nilpotent matrix is at most n0i (vide supra) and,
hence, li can be n0i . With Ni we have
The last equality is because eigenvalues of a nilpotent matrix are all zero.
Meanwhile,
0
f AðiÞ ðxÞ = xE n0i - AðiÞ = f N i ðx - αi Þ = ðx - αi Þni : ð12:105Þ
The last equality comes from (12.83). Thus, n0i = ni . At the same time, we may
equate li in (12.82) to ni. These procedures complete the proof.
Theorem 12.1 shows that any square matrix can be converted to a (upper) triangle
matrix by a similarity transformation. Theorem 12.5 demonstrates that the matrix can
further be segmented according to individual eigenvalues. Considering Theorem
12.1 again, A(i) (1 ≤ i ≤ s) can be described as an upper triangle matrix by
αi ⋯
AðiÞ ⋮ ⋱ ⋮ : ð12:107Þ
0 ⋯ αi
0 ⋯
N ðiÞ = AðiÞ - αi E ni = ⋮ ⋱ ⋮ , ð12:108Þ
0 ⋯ 0
we find that N(i) is nilpotent. This is because all the eigenvalues of N(i) are zero. From
(12.108), we have
μi
N ðiÞ = 0 ðμi ≤ ni Þ:
Readers, check this. We have a following important theorem with the matrix
decomposition.
Theorem 12.6 [3, 4] Any (n, n) square matrix A is expressed uniquely as
A = S þ N, ð12:109Þ
α 1 E n1 ⋯ 0
S ⋮ ⋱ ⋮ , ð12:111Þ
0 ⋯ α s E ns
E n1 ⋯ 0
A1 ⋮ ⋱ ⋮
0 ⋯ 0
N ð 1Þ ⋯ 0
N ⋮ ⋱ ⋮ , N ðiÞ = AðiÞ - αi E ni : ð12:112Þ
0 ⋯ N ðsÞ
Since each N(i) is nilpotent as stated in Sect. 12.4, N is nilpotent as well. Also
(12.112) is a polynomial of A as in the case of S. Therefore, S and N are commutable.
To prove the uniqueness of the decomposition, we show the following:
1. Let S and S′ be commutable semi-simple matrices. Then, those matrices are
simultaneously diagonalized. That is, with a certain non-singular matrix
P, P-1SP and P-1S′P are diagonalized at once. Hence, S ± S′ is semi-simple
as well.
V n = W α1 ⋯ W αs :
invariant. Therefore, if we adopt the basis vectors {a1, ⋯, an} with respect to the
direct sum decomposition, we get
α1 E n1 ⋯ 0 S01 ⋯ 0
S ⋮ ⋱ ⋮ , S0 ⋮ ⋱ ⋮ :
0 ⋯ αs E ns 0 ⋯ S0s
ð e1 ⋯ en ÞP = ð a1 ⋯ an Þ:
Thus, we get
α1 E n1 ⋯ 0 S01 ⋯ 0
P - 1 SP = ⋮ ⋱ ⋮ , P - 1 S0 P = ⋮ ⋱ ⋮ :
0 ⋯ αs E ns 0 ⋯ S0s
This means that both P-1SP and P-1S′P are diagonal. That is, P-1SP ±
P S P = P-1(S ± S′)P is diagonal, indicating that S ± S′ is semi-simple as well.
-1 ′
0
2. Suppose that Nν = 0 and N 0ν = 0. From the assumption, N and N′ are commut-
able. Consequently, using binomial theorem we have
m!
ðN ± N 0 Þ = N m ± mN m - 1 N 0 þ ⋯ þ ð ± 1Þi N m - iN0 þ ⋯
m i
i!ðm - iÞ! ð12:113Þ
þð ± 1Þm N 0 :
m
α1 ⋯ 0
S ⋮ ⋱ ⋮ , ð12:114Þ
0 ⋯ αn
A = S þ N = S0 þ N 0 or S - S0 = N 0 - N: ð12:115Þ
From the assumption, S′ and N′ are commutable. Moreover, since S, S′, N, and
′
N are described by a polynomial of A, they are commutable with one another.
Hence, from (1) and (2) along with the second equation of (12.115), S - S′ and
N′ - N are both semi-simple and nilpotent at once. Consequently, from (3) S -
S′ = N′ - N = 0. Thus, we finally get S = S′ and N = N′. That is, the
decomposition is unique.
These complete the proof.
Theorem 12.6 implies that the matrix decomposition of (12.109) is unique. This
matrix decomposition is said to be Jordan decomposition. On the basis of Theorem
12.6, we investigate Jordan canonical forms of matrices in the next section.
Once the vector space Vn has been decomposed to a direct sum of generalized
eigenspaces with the matrix reduced in parallel, we are able to deal with individual
~ αi ð1 ≤ i ≤ sÞ and the corresponding A(i) (1 ≤ i ≤ s) given in (12.102)
eigenspaces W
separately.
Then we have
Note that when ν = 1, we have trivially Vn = W(ν) ⊃ W(0) {0}. Let us put dim
W = mi, mi - mi - 1 = ri (1 ≤ i ≤ ν), m0 0. Then we can add rν linearly
(i)
independent vectors a1 , a2 , ⋯, and arν to the basis vectors of W(ν - 1) so that those rν
vectors can be basis vectors of W(ν). Unless N = 0 (i.e., zero matrix), we must have at
least one such vector; from the supposition with ∃x ≠ 0 we have Nν - 1x ≠ 0, and so
x2= W(ν - 1). At least one such vector x is present and it is eligible for a basis vector of
W(ν). Hence, rν ≥ 1 and we have
Moreover, these rν vectors Na1 , Na2 , ⋯, and Narν are linearly independent.
Suppose that
W ðiÞ =
Span N ν - i a1 , ⋯, N ν - i arν , N ν - i - 1 arν þ1 , ⋯, N ν - i - 1 arν - 1 , ⋯, ariþ1 þ1 , ⋯, ari
W ði - 1Þ :
ð12:119Þ
Table 12.1 [3, 4] shows the resulting structure of these basis vectors pertinent to
Jordan blocks. In Table 12.1, if laterally counting basis vectors, from the top we have
rν, rν - 1, ⋯ , r1 vectors. Their sum is n. This is the same number as that vertically
counted. The dimension n of the vector space Vn is thus given by
ν ν
n= r
i=1 i
= i=1
iðr i - r iþ1 Þ: ð12:121Þ
Let us examine the structure of Table 12.1 more closely. More specifically, let us
inspect the i-layered structures of (ri - ri + 1) vectors. Picking up a vector from
among ariþ1 þ1 , ⋯, ari , we call it aρ. Then, we get following set of vectors aρ, Naρ
N2aρ, ⋯, Ni - 1aρ in the i-layered structure. These i vectors are displayed “vertically”
in Table 12.1. These vectors are linearly independent (see Theorem 12.4) and form
an i-dimensional N-invariant subspace; i.e., Span{Ni - 1aρ, Ni - 2aρ, ⋯, Naρ, aρ},
where ri + 1 + 1 ≤ ρ ≤ ri. Matrix representation of the linear transformation N with
respect to the set of these i vectors is [3, 4]
Table 12.1 Jordan blocks and structure of basis vectors for a nilpotent matrix.ᵃ Counted laterally from the top, the rows contain r_ν, r_{ν-1}, ⋯, r_1 basis vectors: the top row holds a_1, ⋯, a_{r_ν}; the next row holds Na_1, ⋯, Na_{r_ν} together with a_{r_ν+1}, ⋯, a_{r_{ν-1}}; and so on, down to the bottom row N^{ν-1}a_1, ⋯, N^{ν-1}a_{r_ν}, N^{ν-2}a_{r_ν+1}, ⋯, N^{ν-2}a_{r_{ν-1}}, ⋯, N^{i-1}a_{r_{i+1}+1}, ⋯, N^{i-1}a_{r_i}, ⋯, Na_{r_3+1}, ⋯, Na_{r_2}, a_{r_2+1}, ⋯, a_{r_1}. Counted vertically, the column totals give n = \sum_{k=1}^{ν} r_k = ν r_ν + (ν-1)(r_{ν-1} - r_ν) + \cdots + i(r_i - r_{i+1}) + \cdots + 2(r_2 - r_3) + (r_1 - r_2).
ᵃ Adapted from Satake I (1974) Linear algebra (Mathematics Library 1: in Japanese) [4], with the permission of Shokabo Co., Ltd., Tokyo
\left( N^{i-1} a_\rho \ \ N^{i-2} a_\rho \ \cdots \ N a_\rho \ \ a_\rho \right) N = \left( N^{i-1} a_\rho \ \ N^{i-2} a_\rho \ \cdots \ N a_\rho \ \ a_\rho \right) \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & 0 & \ddots & \\ & & & \ddots & 1 \\ & & & & 0 \end{pmatrix}.   (12.122)
These (i, i) matrices of (12.122) are called i-th order Jordan blocks. Equation
(12.122) clearly shows that Ni - 1aρ is a proper eigenvector and that we have i - 1
generalized eigenvectors Ni - 2aρ, ⋯, Naρ, aρ.
Notice that the number of those Jordan blocks is (ri - ri + 1). Let us expressly
define this number as [3, 4]
J_i = r_i - r_{i+1},   (12.123)
where J_i is the number of the i-th order Jordan blocks. The total number of Jordan blocks within a whole vector space V_n = W^{(ν)} is
\sum_{i=1}^{ν} J_i = \sum_{i=1}^{ν} (r_i - r_{i+1}) = r_1.   (12.124)
Meanwhile, since W(i) = Ker Ni, dim W(i) = mi = dim Ker Ni. From (12.116),
m0 0. Then (11.45) is now read as
That is
n = mi þ rankN i ð12:127Þ
or
mi = n - rankN i : ð12:128Þ
Fig. 12.1 Two extreme structures of Jordan blocks: (a) the case ν = 1, (b) the case ν = n

\dim W^{(i)} = m_i = \sum_{k=1}^{i} r_k, \qquad \dim W^{(i-1)} = m_{i-1} = \sum_{k=1}^{i-1} r_k.   (12.129)
Hence, we have
r_i = m_i - m_{i-1} = \mathrm{rank}\,N^{i-1} - \mathrm{rank}\,N^{i},   (12.130)
and therefore
J_i = r_i - r_{i+1} = \mathrm{rank}\,N^{i-1} + \mathrm{rank}\,N^{i+1} - 2\,\mathrm{rank}\,N^{i},   (12.131)
where the last equality arises from the dimension theorem expressed as (11.45).
In Table 12.1 [3, 4], moreover, we have two extreme cases. That is, if ν = 1 in
(12.116), i.e., N = 0, from (12.132) we have r1 = n and r2 = ⋯ = rn = 0; see
Fig. 12.1a. Also we confirm n = νi = 1 r i in (12.121). In this case, all the eigenvectors
are proper eigenvectors with multiplicity of n and we have n first-order Jordan
blocks. The other is the case of ν = n. In that case, we have
r 1 = r 2 = ⋯ = rn = 1: ð12:133Þ
In the latter case, we also have n = νi = 1 r i in (12.121). We have only one proper
eigenvector and (n - 1) generalized eigenvectors; see Fig. 12.1b. From (12.132), we
have this special case where we have only one n-th order Jordan block, when
rankN = n - 1.
Let us think of (12.81) on the basis of (12.102). Picking up A(i) from (12.102) and
considering (12.108), we put
where Eni denotes (ni, ni) identity matrix. We express a nilpotent matrix as Ni as
before. In (12.134) the number ni corresponds to n in Vn of Sect. 12.6.1. As Ni is a (ni,
ni) matrix, we have
N i ni = 0:
Here we are speaking of ν-th order nilpotent matrices Ni such that Niν - 1 ≠ 0 and
ν
Ni = 0 (1 ≤ ν ≤ ni). We can deal with Ni in a manner fully consistent with the theory
we developed in Sect. 12.6.1. Each A(i) comprises one or more Jordan blocks A(κ)
that is expressed as
AðκÞ = N κi þ αi E κi ð1 ≤ κi ≤ ni Þ, ð12:135Þ
where A(κ) denotes the κ-th Jordan block in A(i). In A(κ), N κi and Eκi are nilpotent
(κi, κ i)-matrix and (κ i, κi) identity matrix, respectively. In N κi , κi zeros are displayed
on the principal diagonal and entries of 1 are positioned on the matrix element next
above the principal diagonal. All other entries are zero; see, e.g., a matrix of
(12.122). As in the Sect. 12.6.1, the number κi is called a dimension of the Jordan
block. Thus, A(i) of (12.102) can further be reduced to segmented matrices A(κ). Our
next task is to find out how many Jordan blocks are contained in individual A(i) and
what is the dimension of those Jordan blocks.
Corresponding to (12.122), the matrix representation of the linear transformation
by A(κ) with respect to the set of κ i vectors is
AðκÞ - αi E κi κi - 1
aσ AðκÞ - αi E κi κi - 2
aσ ⋯ aσ AðκÞ - αi Eκi
= AðκÞ - αi E κi κi - 1
aσ AðκÞ - αi Eκi κi - 2
aσ ⋯ aσ
0 1
0 1
0 ⋱ ð12:136Þ
× ⋮ ⋱ ⋱ ⋮ :
0 1
⋯ 0 1
0
A vector aσ stands for a vector associated with the κ-th Jordan block of A(i). From
(12.136) we obtain
κi - 1
AðκÞ - αi E κi AðκÞ - αi Eκi aσ = 0: ð12:137Þ
Namely,
κi - 1 κi - 1
AðκÞ AðκÞ - αi E κi aσ = αi AðκÞ - αi E κi aσ : ð12:138Þ
κi - 1
This shows that AðκÞ - αi Eκi aσ is a proper eigenvector of A(κ) that corre-
sponds to an eigenvalue αi. Conversely, aσ is a generalized eigenvector of rank κi.
There are another (κi - 2) generalized eigenvectors of
μ
AðκÞ - αi E κi aσ ð1 ≤ μ ≤ κi - 2Þ. In total, there are κ i eigenvectors [a proper eigen-
vector and (κ i - 1) generalized eigenvectors]. Also we see that the sole proper
eigenvector can be found for each Jordan block.
In reference to these κi eigenvectors as the basis vectors, the (κi, κi)-matrix A(κ)
(i.e., a Jordan block) is expressed as
A^{(\kappa)} = \begin{pmatrix} \alpha_i & 1 & & & \\ & \alpha_i & 1 & & \\ & & \alpha_i & \ddots & \\ & & & \ddots & 1 \\ & & & & \alpha_i \end{pmatrix}.   (12.139)
αp 0
αp 0
αp 1 ...
αp 1
⋮ αp 0 ⋮
AðpÞ :
αp 1
αp 1
... αp 1
αp 1
αp
α1
α2
α2
α2
⋱
αi
A
, ð12:140Þ
αi
αi
αi
⋱
αs
αs
where
In (12.141), A(1) is a (1, 1) matrix (i.e., simply a number); A(2) is a (3, 3) matrix; ⋯
A is a (4, 4) matrix; ⋯ A(s) is a (2, 2) matrix.
(i)
The above matrix form allows us to further deal with segmented triangle matrices
separately. In the case of (12.140) we may use a following matrix for similarity
transformation:
1
1
1
1
⋱
p11 p12 p13 p14
P= ,
p21 p22 p23 p24
p31 p32 p33 p34
p41 p42 p43 p44
⋱
1
1
A - αi E
α1 - αi
α2 - αi
α2 - αi
α2 - αi
⋱
0
:
0
0
0
⋱
αs - αi
αs - αi
Here we are treating the (n, n) matrix A on Vn. Note that a matrix A - αiE is not
nilpotent as a whole. Suppose that the multiplicity of αi is ni; in (12.140) ni = 4.
Since eigenvalues α1, α2, ⋯, and αs take different values from one another, α1 - αi,
α2 - αi, ⋯, and αs - αi ≠ 0. In a triangle matrix diagonal elements give eigenvalues
and, hence, α1 - αi, α2 - αi, ⋯, and αs - αi are non-zero eigenvalues of A - αiE. We
rewrite (12.140) as
M ð1Þ
M ð2Þ
⋱
A - αi E , ð12:142Þ
M ðiÞ
⋱
M ðsÞ
α2 - αi
where M ð1Þ = ðα1 - αi Þ, M ð2Þ = α2 - αi
, ⋯, M ðiÞ =
α2 - αi
0
0 αs - αi
, ⋯, M ðsÞ = :
0 αs - αi
0
Thus, M( p) ( p ≠ i) is a non-singular matrix and M(i) is a nilpotent matrix. Note that
μ -1 μ
if we can find μi such that M ðiÞ i ≠ 0 and M ðiÞ i = 0, with a minimal polynomial
φM ðiÞ ðxÞ for M(i), we have φM ðiÞ ðxÞ = xμi . Consequently, we get
μi
M ð1Þ
μi
M ð2Þ
⋱
ð A - α i E Þ μi : ð12:143Þ
0
⋱
μi
M ðsÞ
μ
In (12.143) diagonal elements of non-singular triangle matrices M ðpÞ i ðp ≠ iÞ
μ
are αp - αi i ð ≠ 0Þ. Thus, we have a “perforated” matrix ðA - αi EÞμi , where
ðiÞ μi
M = 0 in (12.143). Putting
s
ΦA ð x Þ i=1
ðx - αi Þμi ,
we get
s
ΦA ðAÞ i=1
ðA - αi E Þμi = 0:
and
k k
~ αi = ni = dim Ker AðiÞ - αi E ni
dim W þ rank AðiÞ - αi Eni , ð12:146Þ
k
dim Ker ðA - αi EÞk = dim Ker AðiÞ - αi E ni , ð12:147Þ
we get
k
n - rank ðA - αi E Þk = ni - rank AðiÞ - αi Eni : ð12:148Þ
rankðA - αi E Þl = n - ni ðl ≥ μi Þ: ð12:149Þ
ðiÞ
The value r 1 gives the number of Jordan blocks with an eigenvalue αi.
Moreover, we must consider a following situation. We know how the matrix A(i)
in (12.134) is reduced to Jordan blocks of lower dimension. To get detailed infor-
mation about it, however, we have to get the information about (generalized)
eigenvalues corresponding to eigenvalues other than αi. In this context,
Eq. (12.148) is useful. Equation (12.131) tells how the number of Jordan blocks in
a nilpotent matrix is determined. If we can get this knowledge before we have found
out all the (generalized) eigenvectors, it will be easier to address the problem. Let us
rewrite (12.131) as
J_q^{(i)} = r_q - r_{q+1} = \mathrm{rank}\left(A^{(i)} - \alpha_i E_{n_i}\right)^{q-1} + \mathrm{rank}\left(A^{(i)} - \alpha_i E_{n_i}\right)^{q+1} - 2\,\mathrm{rank}\left(A^{(i)} - \alpha_i E_{n_i}\right)^{q},   (12.151)
where we define J_q^{(i)} as the number of the q-th order Jordan blocks within A^{(i)}. Note that these blocks are expressed as (q, q) matrices. Meanwhile, using (12.148), J_q^{(i)} is expressed in terms of the full matrix A as
J_q^{(i)} = \mathrm{rank}\,(A - \alpha_i E)^{q-1} + \mathrm{rank}\,(A - \alpha_i E)^{q+1} - 2\,\mathrm{rank}\,(A - \alpha_i E)^{q}.   (12.152)
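A short NumPy sketch (the helper function name below is ours, not the book's) applies the rank formula (12.152) to the matrix A of (12.153) treated in the example that follows and reproduces the block counts obtained there:

```python
import numpy as np

# Matrix A of (12.153); the rank formula (12.152) counts Jordan blocks of each order
A = np.array([[ 1., -1.,  0., 0.],
              [ 1.,  3.,  0., 0.],
              [ 0.,  0.,  2., 0.],
              [-3., -1., -2., 4.]])
E = np.eye(4)

def jordan_block_count(A, alpha, q):
    """Number of (q, q) Jordan blocks of A for eigenvalue alpha, cf. (12.152)."""
    r = lambda k: np.linalg.matrix_rank(np.linalg.matrix_power(A - alpha * E, k))
    return r(q - 1) + r(q + 1) - 2 * r(q)

print(jordan_block_count(A, 4.0, 1))   # 1: one first-order block for the eigenvalue 4
print(jordan_block_count(A, 2.0, 1))   # 1: one first-order block for the eigenvalue 2
print(jordan_block_count(A, 2.0, 2))   # 1: one second-order block for the eigenvalue 2
```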
A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix}.   (12.153)
f_A(x) = \begin{vmatrix} x-1 & 1 & 0 & 0 \\ -1 & x-3 & 0 & 0 \\ 0 & 0 & x-2 & 0 \\ 3 & 1 & 2 & x-4 \end{vmatrix} = (x - 4)(x - 2)^3.   (12.154)
To find an eigenvector corresponding to the eigenvalue 4, we solve
(A - 4E)x = 0,
i.e.,
\begin{pmatrix} -3 & -1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & -2 & 0 \\ -3 & -1 & -2 & 0 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0.   (12.155)
That is,
-3c_1 - c_2 = 0, \quad c_1 - c_2 = 0, \quad -2c_3 = 0, \quad -3c_1 - c_2 - 2c_3 = 0.
These equations give c_1 = c_2 = c_3 = 0 with c_4 arbitrary, so that a proper eigenvector is
e_1^{(4)} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.
The number of Jordan blocks for the eigenvalue 4 is
r_1^{(4)} = 4 - \mathrm{rank}(A - 4E) = 1.   (12.156)
J_1^{(4)} = \mathrm{rank}\,(A - 4E)^0 + \mathrm{rank}\,(A - 4E)^2 - 2\,\mathrm{rank}\,(A - 4E) = 4 + 3 - 2 \times 3 = 1.   (12.157)
In (12.157), J_1^{(4)} gives the number of the first-order Jordan blocks for the eigenvalue 4. We used
(A - 4E)^2 = \begin{pmatrix} 8 & 4 & 0 & 0 \\ -4 & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 8 & 4 & 4 & 0 \end{pmatrix}.   (12.158)
Next, for the eigenvalue 2 we solve
(A - 2E)x = 0,   (12.159)
i.e.,
\begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -3 & -1 & -2 & 2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0.
That is,
c_1 + c_2 = 0, \quad -3c_1 - c_2 - 2c_3 + 2c_4 = 0.
From the above, we can put c_1 = c_2 = 0 and c_3 = c_4 (= 1). The equations also allow the existence of another proper eigenvector; for this we have c_1 = -c_2 = 1, c_3 = 0, and c_4 = 1. Thus,
e_1^{(2)} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}, \qquad e_2^{(2)} = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 1 \end{pmatrix}.
(A - 2E)^2 x = 0.   (12.160)
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -4 & 0 & -4 & 4 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = 0.
That is,
-c_1 - c_3 + c_4 = 0.   (12.161)
Furthermore, we have
(A - 2E)^3 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -8 & 0 & -8 & 8 \end{pmatrix}.   (12.162)
r_1^{(2)} = 4 - \mathrm{rank}(A - 2E) = 4 - 2 = 2.   (12.163)
J_1^{(2)} = \mathrm{rank}\,(A - 2E)^0 + \mathrm{rank}\,(A - 2E)^2 - 2\,\mathrm{rank}\,(A - 2E) = 4 + 1 - 2 \times 2 = 1.   (12.164)

Fig. 12.2 Constitution of the Jordan blocks for the eigenvalues 4 and 2

J_2^{(2)} = \mathrm{rank}\,(A - 2E) + \mathrm{rank}\,(A - 2E)^3 - 2\,\mathrm{rank}\,(A - 2E)^2 = 2 + 1 - 2 \times 1 = 1.   (12.165)
In the above, J_1^{(2)} and J_2^{(2)} are obtained from (12.152). Thus, Fig. 12.2 gives the constitution of Jordan blocks for the eigenvalues 4 and 2. The overall number of Jordan blocks is three; the number of the first-order and second-order Jordan blocks is two and one, respectively.
The proper eigenvector e_1^{(2)} is related to J_1^{(2)} of (12.164). A set of the proper eigenvector e_2^{(2)} and the corresponding generalized eigenvector g_2^{(2)} is pertinent to J_2^{(2)} of (12.165). We must have the generalized eigenvector g_2^{(2)} in such a way that
(A - 2E)\, g_2^{(2)} = e_2^{(2)}.   (12.166)
Solving (12.166), we get, e.g.,
g_2^{(2)} = \begin{pmatrix} 0 \\ -1 \\ 0 \\ 0 \end{pmatrix}.   (12.167)
We stress here that e_1^{(2)} is not eligible for a proper pair with g_2^{(2)} in J_2^{(2)}. It is because from (12.166) we have
(A - 2E)\, g_2^{(2)} = e_2^{(2)} \ (\ne e_1^{(2)}).
R = \left( e_1^{(4)} \ e_1^{(2)} \ e_2^{(2)} \ g_2^{(2)} \right) = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix},   (12.169)
R^{-1} A R = \begin{pmatrix} -1 & 0 & -1 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.   (12.170)
The structure of the Jordan blocks is shown in Fig. 12.2. Notice here that the trace of A remains unchanged before and after the similarity transformation.
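The result (12.170) can be reproduced numerically with a few lines of NumPy (using exactly the matrices A and R given above):

```python
import numpy as np

A = np.array([[ 1., -1.,  0., 0.],
              [ 1.,  3.,  0., 0.],
              [ 0.,  0.,  2., 0.],
              [-3., -1., -2., 4.]])

# Columns of R: the (generalized) eigenvectors e1(4), e1(2), e2(2), g2(2) of (12.169)
R = np.array([[0., 0.,  1.,  0.],
              [0., 0., -1., -1.],
              [0., 1.,  0.,  0.],
              [1., 1.,  1.,  0.]])

J = np.linalg.inv(R) @ A @ R
print(np.round(J, 10))             # the Jordan canonical form of (12.170)
print(np.trace(A), np.trace(J))    # traces agree: 10.0 and 10.0
```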
Next, we consider column vector representations. According to (11.37), let us
view the matrix A as a linear transformation over V4. Then A is given by
A(x) = (e_1 \ e_2 \ e_3 \ e_4)\, A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = (e_1 \ e_2 \ e_3 \ e_4) \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix},   (12.171)
where e_1, e_2, e_3, and e_4 are basis vectors and x_1, x_2, x_3, and x_4 are the corresponding coordinates of a vector x = \sum_{i=1}^{4} x_i e_i \in V_4. We rewrite (12.171) as
x1
x2
AðxÞ = ð e1 e2 e3 e4 ÞRR - 1 ARR - 1
x3
x4
4 0 0 0 x01 ð12:172Þ
0 2 0 0 x02
= eð14Þ ð2Þ
e1
ð2Þ
e2
ð2Þ
g2 ,
0 0 2 1
x03
0 0 0 2 x04
where we have
x1 x1
x2 x2
x = ð e1 e2 e3 e4 Þ = ð e1 e2 e3 e4 ÞRR - 1
x3 x3
x4 x4
x01 ð12:174Þ
x02
= eð14Þ e1
ð2Þ ð2Þ
e2 g2
ð2Þ
:
x03
x04
As for Vn in general, let us put R = ( p)ij and R-1 = (q)ij. Also represent j-th
(generalized) eigenvectors by a column vector and denote them by p( j ). There we
display individual (generalized) eigenvectors in the order of (e(1) e(2)⋯e( j )⋯e(n)),
where e( j ) (1 ≤ j ≤ n) denotes either a proper eigenvector or a generalized
eigenvector according to (12.174). Each p( j ) is represented in reference to original
basis vectors (e1⋯en). Then we have
n ðjÞ ðjÞ
R - 1 pðjÞ = q p
k = 1 ik k
= δi , ð12:175Þ
ðjÞ
where δi denotes a column vector to which only the j-th row is 1, otherwise 0. Thus,
ðjÞ
a column vector δi is an “address” of e( j ) in reference to (e(1) e(2)⋯e( j )⋯e(n)) taken
as basis vectors.
In our present case, in fact, we have
-1 0 -1 1 0 1
0 0 1 0 0 0
R - 1 pð1Þ = = ,
1 0 0 0 0 0
-1 -1 0 0 1 0
-1 0 -1 1 0 0
0 0 1 0 0 1
R - 1 pð2Þ = = ,
1 0 0 0 1 0
-1 -1 0 0 1 0
ð12:176Þ
-1 0 -1 1 1 0
0 0 1 0 -1 0
R - 1 pð3Þ = = ,
1 0 0 0 0 1
-1 -1 0 0 1 0
-1 0 -1 1 0 0
0 0 1 0 -1 0
R - 1 pð4Þ = = :
1 0 0 0 0 0
-1 -1 0 0 0 1
ð4Þ
In (12.176) p(1) is a column vector representation of e1 ; p(2) is a column vector
ð2Þ
representation of e1 , and so on.
A minimal polynomial ΦA(x) is expressed as
Φ A ð xÞ = ð x - 4Þ ð x - 2Þ 2 :
0 0 -1 0
′ 0 0 1 1
R = : ð12:177Þ
0 1 0 0
1 1 -1 0
In this case, we also get the same Jordan canonical form as before. That is,
4 0 0 0
-1 0 2 0 0
R0 AR0 = :
0 0 2 1
0 0 0 2
0 1 0 0
0 -1 -1 0
ðe1 e2 e3 e4 ÞR00 = ðe1 e2 e3 e4 Þ
1 0 0 0
1 1 0 1
ð2Þ ð2Þ ð2Þ ð4Þ
= e1 e2 g2 e1 :
2 0 0 0
-1 0 2 1 0
R00 AR00 = :
0 0 2 0
0 0 0 4
Note that we get a different disposition of the matrix elements from that of
(12.172).
Next, we decompose A into a semi-simple matrix and a nilpotent matrix. In
(12.172), we had
R^{-1} A R = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
Defining
S = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix} \quad \text{and} \quad N = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix},
we have
A = \tilde{S} + \tilde{N}
with
\tilde{S} = R S R^{-1} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -2 & 0 & -2 & 4 \end{pmatrix} \quad \text{and} \quad \tilde{N} = R N R^{-1} = \begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}.
That is,
A = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -3 & -1 & -2 & 4 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ -2 & 0 & -2 & 4 \end{pmatrix} + \begin{pmatrix} -1 & -1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 0 \end{pmatrix}.   (12.178)
Even though the matrix forms of S and N differ depending on the choice of the similarity transformation R (namely, R′, R′′, etc. represented above), the decomposition (12.178) is unique. That is, \tilde{S} and \tilde{N} are uniquely determined once a matrix A is given. The matrices \tilde{S} and \tilde{N} are commutable. The confirmation is left for readers as an exercise.
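As one way of carrying out that exercise, the following NumPy lines (using the matrices of (12.178) as given) confirm that \tilde{N} is nilpotent, that \tilde{S} is semi-simple, and that the two matrices commute:

```python
import numpy as np

# Jordan decomposition (12.178): A = S~ + N~ with S~ semi-simple and N~ nilpotent
S_tilde = np.array([[ 2., 0.,  0., 0.],
                    [ 0., 2.,  0., 0.],
                    [ 0., 0.,  2., 0.],
                    [-2., 0., -2., 4.]])
N_tilde = np.array([[-1., -1., 0., 0.],
                    [ 1.,  1., 0., 0.],
                    [ 0.,  0., 0., 0.],
                    [-1., -1., 0., 0.]])

A = S_tilde + N_tilde                                      # recovers (12.153)
print(np.allclose(N_tilde @ N_tilde, 0))                   # True: N~ is nilpotent
print(np.allclose(S_tilde @ N_tilde, N_tilde @ S_tilde))   # True: S~ and N~ are commutable
print(sorted(np.linalg.eigvals(S_tilde)))                  # [2, 2, 2, 4]: S~ is diagonalizable
```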
We present another simple example. Let A be a matrix described as
0 4
A= :
-1 4
ð2Þ 2
e1 = :
1
ð2Þ
Another eigenvector is a generalized eigenvector g1 of rank 2. This can be
decided such that
As an option, we get
ð2Þ
g1 = 1
1
:
Thus, we can choose R for a diagonalizing matrix together with an inverse matrix
R-1 such that
-1
R= 2
1
1
1
, and R-1 = 1
-1 2
:
R - 1 AR = 2
0
1
2
: ð12:179Þ
As before, putting
S= 2
0
0
2
and N= 0
0
1
0
,
we have
A=~ ~
SþN
with
2 0 -2 4
S= and N= : ð12:180Þ
0 2 -1 2
We may also choose R′ for a diagonalizing matrix together with an inverse matrix
R′ instead of R and R-1, respectively, such that we have, e.g.,
-1
2 -3 -1 -1 3
R′ = and R′ = :
1 -1 -1 2
Using these matrices, we get exactly the same Jordan canonical form and the
matrix decomposition as (12.179) and (12.180). Thus, again we find that the matrix
decomposition is unique.
Another simple example is a lower triangle matrix
2 0 0
A= -2 1 0 :
0 0 1
1 0 0 1 0 0
R= -2 1 0 and R - 1 = 2 1 0 :
0 0 1 0 0 1
Then, we get
2 0 0
R - 1 AR = S = 0 1 0 :
0 0 1
2 0 0 0 0 0
A = RSR - 1 = -2 1 0 þ 0 0 0 ,
0 0 1 0 0 0
where the first term is a semi-simple matrix and the second is a nilpotent matrix (i.e.,
zero matrix). Thus, the decomposition is once again unique.
α1
⋱
α1
α2
⋱
P - 1 AP = : ð12:181Þ
α2
⋱
αs
⋱
αs
In this case, let us examine what form a minimal polynomial φA(x) for A takes. A
characteristic polynomial fA(x) for A is invariant through similarity transformation,
so is φA(x). That is,
In (12.186), the third equality comes from the fact that the domain of A is
restricted to W. Concomitantly, A-1(0) is restricted to A-1(0) \ W as well; notice
that A-1(0) \ W is a subspace. Considering these situations, we use a relation
corresponding to that of (11.45). The fourth equality is due to the dimension theorem
of (11.45). Applying (12.186) to (12.183) successively, we have
Finally we get
s
i=1
rank ½n - ðA - αi EÞ ≥ n: ð12:187Þ
Meanwhile, we have
V n ⊃ W α1 W α2 ⋯ W αs ,
s ð12:189Þ
n ≥ dim W α1 W α2 ⋯ W αs = i=1
dimW αi :
The equality results from the property of a direct sum. From (12.188) and
(12.189),
s
i=1
dimW αi = n: ð12:190Þ
Hence,
V n = W α1 W α2 ⋯ W αs : ð12:191Þ
Thus, we have proven that if the minimal polynomial does not have a multiple
root, Vn is decomposed into direct sum of eigenspaces as in (12.191).
If in turn Vn is decomposed into direct sum of eigenspaces as in (12.191), A can be
diagonalized by a similarity transformation. The proof is as follows: Suppose that
(12.191) holds. Then, we can take only eigenvectors for the basis vectors of Vn.
Suppose that dimW αi = ni . Then, we can take vectors
i-1 i
ak j = 1 nj þ 1 ≤ k ≤ j = 1 nj so that ak can be the basis vectors of W αi . In
reference to this basis set, we describe a vector x 2 Vn such that
x1
x = ð a1 a 2 ⋯ a n Þ x2 :
⋮
xn
Operating A on x, we get
x1 x1
x2 x2
AðxÞ = ð a1 a2 ⋯ an ÞA = ð a1 A a2 A ⋯ an A Þ
⋮ ⋮
xn xn
x1
x2
= ð α1 a1 α2 a2 ⋯ αn an Þ
⋮
xn
α1 x1
α2 x2
= ð a1 a2 ⋯ an Þ ,
⋱ ⋮
αn xn
ð12:192Þ
where with the second equality we used the notation (11.40); with the third equality
some of αi (1 ≤ i ≤ n) may be identical; ai is an eigenvector that corresponds to an
eigenvalue αi. Suppose that a1, a2, ⋯, and an are obtained by transforming an
“original” basis set e1, e2, ⋯, and en by R. Then, we have
ð0Þ
α1 x1
ð0Þ
α2 x2
AðxÞ = ð e1 e2 ⋯ en Þ R R-1 :
⋱ ⋮
αn xðn0Þ
We denote the transformation A with respect to a basis set e1, e2, ⋯, and en by A0;
see (11.82) with the notation. Then, we have
ð0Þ
x1
ð0Þ
x2
AðxÞ = ð e1 e2 ⋯ en Þ A0 :
⋮
xnð0Þ
Therefore, we get
α1
α2
R - 1 A0 R = : ð12:193Þ
⋱
αn
A = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}.   (12.32)
From (12.33), fA(x) = (x - 2)(x - 1). Note that f A ðxÞ = f P - 1 AP ðxÞ. Let us treat a
problem according to Theorem 12.5. Also we use the notation of (12.85). Given
f1(x) = x - 1 and f2(x) = x - 2, let us decide M1(x) and M2(x) such that these can
satisfy
We find M1(x) = 1 and M2(x) = - 1. Thus, using the notation of Theorem 12.5,
Sect. 12.4 we have
A_1 = M_1(A) f_1(A) = A - E = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad A_2 = M_2(A) f_2(A) = -A + 2E = \begin{pmatrix} 0 & -1 \\ 0 & 1 \end{pmatrix}.
We also have
A_1 + A_2 = E, \qquad A_i A_j = A_j A_i = A_i \delta_{ij}.   (12.195)
Thus, we find that A1 and A2 are idempotent matrices. As both A1 and A2 are
expressed by a polynomial A, they are commutative with A.
We find that A is represented by
A = \alpha_1 A_1 + \alpha_2 A_2,   (12.196)
where \alpha_1 = 2 and \alpha_2 = 1 are the eigenvalues of A.
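A brief NumPy check (using the matrix of (12.32) and the projectors just constructed) verifies the idempotent relations (12.195) and the spectral decomposition (12.196):

```python
import numpy as np

A = np.array([[2., 1.],
              [0., 1.]])             # the matrix of (12.32)
E = np.eye(2)

A1 = A - E                            # = M1(A) f1(A)
A2 = -A + 2. * E                      # = M2(A) f2(A)

print(np.allclose(A1 @ A1, A1), np.allclose(A2 @ A2, A2))   # True True: idempotent
print(np.allclose(A1 @ A2, 0))                              # True: A1 A2 = 0
print(np.allclose(A1 + A2, E))                              # True: A1 + A2 = E
print(np.allclose(2. * A1 + 1. * A2, A))                    # True: A = α1 A1 + α2 A2, (12.196)
```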
1 0 1
A= 0 1 1 : ð12:197Þ
0 0 0
1 0 1 1 0 1 1 0 -1
~ = P - 1 AP =
A 0 1 1 0 1 1 0 1 -1
0 0 1 0 0 0 0 0 1
1 0 0
= 0 1 0 : ð12:198Þ
0 0 0
~ = A.
As can be checked easily, A ~ We also have
2
0 0 0
~=
E-A 0 0 0 , ð12:199Þ
0 0 1
x = c1 a1 þ c 2 a2 þ ⋯ þ c n - 1 an - 1 þ c n an : ð12:200Þ
Here let us define a following linear transformation P(k) such that P(k) “extracts”
the k-th component of x. That is,
n n n ½k
PðkÞ ðxÞ = PðkÞ ca
j=1 j j
= j=1
p ca
i = 1 ij j i
= c k ak , ð12:201Þ
½k
where pij is the matrix representation of P(k). In fact, suppose that there is another
arbitrarily chosen vector x such that
y = d 1 a1 þ d 2 a2 þ ⋯ þ d n - 1 an - 1 þ d n an : ð12:202Þ
Then we have
PðkÞ ðax þ byÞ = ðack þ bdk Þak = ack ak þ bd k ak = aPðkÞ ðxÞ þ bPðkÞ ðyÞ: ð12:203Þ
Thus P(k) is a linear transformation. In (12.201), for the third equality to hold, we
should have
½k ðk Þ
pij = δi δjðkÞ , ð12:204Þ
ðjÞ
where δi has been defined in (12.175). Meanwhile, δjðkÞ denotes a row vector to
ðk Þ
which only the k-th column is 1, otherwise 0. Note that δi represents a (n, 1) matrix
ðk Þ
and that δðj kÞ denotes a (1, n) matrix. Therefore, δi δjðkÞ represents a (n, n) matrix
(k)
whose (k, k) element is 1, otherwise 0. Thus, P (x) is denoted by
0
⋱
x1
0
x2
PðkÞ ðxÞ = ða1 ⋯ an Þ 1 = x k ak , ð12:205Þ
⋮
0
xn
⋱
0
where only the (k, k) element is 1, otherwise 0. Then P(k)[P(k)(x)] = P(k)(x). That is
2
PðkÞ = PðkÞ : ð12:206Þ
$$A = \alpha_1 A_1 + \cdots + \alpha_n A_n, \tag{12.208}$$
where α1, ⋯, and αn are eigenvalues (some of which may be identical) and A1, ⋯,
and An are idempotent matrices such as those represented by (12.205).
In fact, the above situation is equivalent to saying that A is semi-simple; that is, A can be decomposed into the form described by (12.208).
As an example, consider the two matrices
$$A = \begin{pmatrix}3 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 2\end{pmatrix}, \qquad B = \begin{pmatrix}3 & 0 & 0\\ 0 & 2 & 1\\ 0 & 0 & 2\end{pmatrix}. \tag{12.213}$$
Hence, we get
$$A_1 \equiv M_1(A)f_1(A) = (A - 2E)^{3}, \qquad A_2 \equiv M_2(A)f_2(A) = (A - 3E)\left(-A^{2} + 3A - 3E\right). \tag{12.217}$$
$$A_1 = B_1 = \begin{pmatrix}1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{pmatrix}, \qquad A_2 = B_2 = \begin{pmatrix}0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}. \tag{12.218}$$
Notice that we get the same idempotent matrices of (12.218), even though the matrix forms of A and B differ. Also we have
$$A_1 + A_2 = B_1 + B_2 = E. \tag{12.219}$$
Nonetheless, although A = 3A1 + 2A2 holds, B ≠ 3B1 + 2B2. That is, the decomposition of the form of (12.208) does not hold for B. A decomposition of this kind is possible only with diagonalizable matrices.
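This contrast between A and B is also easy to confirm numerically. The following sketch (NumPy assumed) shows that the decomposition (12.208) reproduces A but not B:

```python
import numpy as np

A = np.diag([3.0, 2.0, 2.0])
B = np.array([[3.0, 0.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])

A1 = np.diag([1.0, 0.0, 0.0])   # idempotent matrices of (12.218)
A2 = np.diag([0.0, 1.0, 1.0])

print(np.allclose(A, 3 * A1 + 2 * A2))  # True: A is diagonalizable (semi-simple)
print(np.allclose(B, 3 * A1 + 2 * A2))  # False: B has a generalized eigenvector
```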
In summary, a (n, n) matrix with s (1 ≤ s ≤ n) different eigenvalues has at least
s proper eigenvectors. (Note that a diagonalizable matrix has n proper eigenvectors.)
In the case of s < n, the matrix has multiple root(s) and may have generalized
eigenvectors. If the matrix has a generalized eigenvector of rank ν, the matrix is
accompanied by (ν - 1) generalized eigenvectors along with a sole proper eigenvector. Those vectors form an invariant subspace along with the proper eigenvector(s). In total, such n (generalized) eigenvectors span the whole vector space Vn.
With the eigenvalue equation A(x) = αx, we have an indefinite but non-trivial solution x ≠ 0 only for restricted numbers α (i.e., eigenvalues) in the complex plane. However, we have a unique but trivial solution x = 0 for complex numbers α other than eigenvalues. This is characteristic of the eigenvalue problem.
Thus far we have treated the theory of linear vector spaces. The vector spaces,
however, were somewhat “structureless,” and so it will be desirable to introduce a
concept of metric or measure into the linear vector spaces. We call a linear vector
space where the inner product is defined an inner product space. By virtue of the
concept of the inner product, the linear vector space is given a variety of structures.
For instance, introduction of the inner product to the linear vector space immediately
leads to the definition of adjoint operators and Gram matrices.
Above all, the concept of inner product can readily be extended to a functional
space and facilitate understanding of, e.g., orthogonalization of functions, as was
exemplified in Parts I and II. Moreover, definition of the inner product allows us to
relate matrix operators and differential operators. In particular, it is a key to
understanding the logical structure of quantum mechanics. This can easily be understood
from the fact that Paul Dirac, who was known as one of the prominent founders of
quantum mechanics, invented bra and ket vectors to represent an inner product.
An inner product relates two vectors to a complex number. To do this, we introduce the notation |a⟩ and ⟨b| to represent the vectors. This notation is due to Dirac and widely used in physics and mathematical physics. Usually |a⟩ and ⟨b| are called a "ket" vector and a "bra" vector, respectively, again due to Dirac. Alternatively, we may call ⟨a| an adjoint vector of |a⟩. Or we denote ⟨a| ≡ (|a⟩)†. The symbol "†" (dagger) means that for a matrix its transposed matrix should be taken with complex conjugate matrix elements. That is, $\left(a^{\dagger}\right)_{ij} = a_{ji}^{*}$. If we represent a full matrix, we have
In (13.4), equality holds if and only if jai = 0. Note here that two vectors are said
to be orthogonal to each other if their inner product vanishes, i.e., hb| ai = ha| bi = 0.
In particular, if a vector jai 2 Vn is orthogonal to all the vectors in Vn, i.e., hx| ai = 0
for 8jxi 2 Vn, then jai = 0. This is because if we choose jai for jxi, we have ha|
ai = 0. This means that jai = 0. We call a linear vector space to which the inner
product is defined an inner product space.
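For readers who wish to experiment, the inner product of column vectors can be mimicked with NumPy's `vdot`, which conjugates its first argument; the snippet below is only an illustration of the rules ⟨a|b⟩ = ⟨b|a⟩* and ⟨a|a⟩ ≥ 0 stated above (the vectors are arbitrary).

```python
import numpy as np

a = np.array([1.0 + 1.0j, 2.0])
b = np.array([0.5j, -1.0])

# <b|a> = sum_i b_i^* a_i ; note <a|b> = <b|a>^*
print(np.vdot(b, a))                                        # inner product <b|a>
print(np.isclose(np.vdot(a, b), np.conj(np.vdot(b, a))))    # True
print(np.vdot(a, a).real >= 0)                              # <a|a> >= 0, zero only for a = 0
```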
We can give another structure to a vector space. An example is a metric
(or distance function). Suppose that there is an arbitrary set Q. If a real
non-negative number ρ(a, b) is defined as follows with any arbitrary elements a, b,
c ∈ Q, the set Q is called a metric space [1]:
In our study a vector space is chosen for the set Q. Here let us define a norm for
each vector a. The norm is defined as
or
The inequality (13.9) related to the quadratic equation in x with real coefficients
requires the inequality such that
That is,
Namely,
Thus, (13.16) is equivalent to (13.7). At the same time, the norm defined in (13.8)
may be regarded as a “length” of a vector a.
As β| bi + γ| ci represents a vector, we use a shorthand notation for it as
Comparing the leftmost side and rightmost side of (13.19) and considering | ai
can arbitrarily be chosen, we get
Therefore, when we take out a scalar from a bra vector, it should be converted to its complex conjugate. When the scalar is taken out from a ket vector, however, it is unaffected by definition (13.3). To show this, we have |αa⟩ = α|a⟩. Taking its adjoint, ⟨αa| = α*⟨a|.
We can view (13.17) as a linear transformation in a vector space. In other words, if we regard |·⟩ as a linear transformation of a vector a ∈ Vn to |a⟩ ∈ Ṽn, |·⟩ is that of Vn to Ṽn. Conversely, ⟨·| cannot be regarded as a linear transformation of a ∈ Vn to ⟨a| ∈ Ṽn′. Sometimes the said transformation is referred to as "antilinear" or "sesquilinear." From the point of view of formalism, the inner product can be viewed as an operation Ṽn′ × Ṽn → ℂ. The vector space Ṽn′ is said to be a dual vector space (or dual space) of Ṽn. The notion of the dual space occupies an important position in the theory of vector spaces. We will come back to the definition and properties of the dual space in Part V (Chap. 24).
Let us consider jxi = j yi in an inner product space V~n . Then jxi - j yi = 0. That
is, jx - yi = 0. Therefore, we have x = y, or j0i = 0. This means that the linear
transformation | ∙i converts 0 2 Vn to j 0i 2 V~n . This is a characteristic of a linear
transformation represented in (11.44). Similarly we have h0 j = 0. However, we do
not have to get into further details about the notation in this book. Also if we have to
specify a vector space, we simply do so by designating it as Vn.
Once we have defined an inner product between any pair of vectors jai and jbi of a
vector space Vn, we can define and calculate various quantities related to inner
products. As an example, let ja1i, ⋯, j ani and jb1i, ⋯, j bni be two sets of vectors
in Vn. The vectors ja1i, ⋯, j ani may or may not be linearly independent. This is true
of jb1i, ⋯, j bni. Let us think of a following matrix M defined as below:
Then we have
$$M = \begin{pmatrix} c_2\langle a_1|b_2\rangle + c_3\langle a_1|b_3\rangle + \cdots + c_n\langle a_1|b_n\rangle & \langle a_1|b_2\rangle & \cdots & \langle a_1|b_n\rangle\\ \vdots & \vdots & \ddots & \vdots\\ c_2\langle a_n|b_2\rangle + c_3\langle a_n|b_3\rangle + \cdots + c_n\langle a_n|b_n\rangle & \langle a_n|b_2\rangle & \cdots & \langle a_n|b_n\rangle \end{pmatrix}. \tag{13.23}$$
Multiplying the second column, ⋯, and the n-th column by (-c2), ⋯, and (-cn),
respectively, and adding them to the first column, we get
$$\tilde{M} = \begin{pmatrix} 0 & \cdots & 0\\ \langle a_2|b_1\rangle & \cdots & \langle a_2|b_n\rangle\\ \vdots & \ddots & \vdots\\ \langle a_n|b_1\rangle & \cdots & \langle a_n|b_n\rangle \end{pmatrix}. \tag{13.26}$$
Again, det M = 0.
Next, let us examine the case where det M = 0. In that case, n column vectors of
M in (13.21) are linearly dependent. Without loss of generality, we suppose that the
first column is expressed as a linear combination of the other (n - 1) columns such
that
$$\begin{pmatrix}\langle a_1|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle\\ \vdots\\ \langle a_n|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle\end{pmatrix} = \begin{pmatrix}0\\ \vdots\\ 0\end{pmatrix}. \tag{13.28}$$
Multiplying the first row, ⋯, and the n-th row of (13.28) by appropriate complex numbers $p_1^{*}$, ⋯, and $p_n^{*}$, respectively, we have
$$\langle p_1 a_1|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle = 0,\ \cdots,\ \langle p_n a_n|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle = 0.$$
Summing these relations, we get
$$\langle p_1 a_1 + \cdots + p_n a_n|b_1 - c_2 b_2 - \cdots - c_n b_n\rangle = 0.$$
Now, suppose that ja1i, ⋯, j ani are the basis vectors. Then, jp1a1 + ⋯ + pnani
represents any vectors in a vector space. This implies that jb1 - c2b2 - ⋯ -
cnbni = 0; for this see remarks after (13.4). That is,
$$|b_1\rangle = c_2|b_2\rangle + \cdots + c_n|b_n\rangle. \tag{13.29}$$
$$|x\rangle = x_1|e_1\rangle + x_2|e_2\rangle + \cdots + x_n|e_n\rangle = |x_1 e_1 + x_2 e_2 + \cdots + x_n e_n\rangle = (|e_1\rangle\ \cdots\ |e_n\rangle)\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}. \tag{13.32}$$
$$\langle x| = (x_1^{*}\ \cdots\ x_n^{*})\begin{pmatrix}\langle e_1|\\ \vdots\\ \langle e_n|\end{pmatrix}. \tag{13.33}$$
$$\langle x|x\rangle = (x_1^{*}\ \cdots\ x_n^{*})\begin{pmatrix}\langle e_1|\\ \vdots\\ \langle e_n|\end{pmatrix}(|e_1\rangle\ \cdots\ |e_n\rangle)\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}.$$
As ⟨ej|ei⟩ = ⟨ei|ej⟩*, we have G = G†. From Theorem 13.1, det G ≠ 0. With a shorthand notation, we write (G)ij = ⟨ei|ej⟩. As already mentioned in Sect. 1.4, if
for a matrix H we have a relation described by
$$H = H^{\dagger}, \tag{1.119}$$
Since the Gram matrices frequently appear in matrix algebra and play a role, their
properties are worth examining. Since G is an Hermitian matrix, it can be diagonal-
ized through similarity transformation using a unitary matrix. We will give its proof
later (see Sect. 14.3).
Let us deal with (13.34) further. We have
where U satisfies UU† = U†U = E (or U⁻¹ = U†). Such a matrix U is called a unitary matrix. We represent a matrix form of U as
Here, putting
$$U^{\dagger}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = \begin{pmatrix}\xi_1\\ \vdots\\ \xi_n\end{pmatrix},$$
or equivalently
$$\sum_{k=1}^{n} x_k u_{ki}^{*} = \xi_i, \tag{13.38}$$
or equivalently
$$\sum_{k=1}^{n} x_k^{*} u_{ki} = \xi_i^{*},$$
we have
$$U^{-1}GU = U^{\dagger}GU = G' = \begin{pmatrix}\lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_n\end{pmatrix}. \tag{13.40}$$
Thus, we get
$$\langle x|x\rangle = (\xi_1^{*}\ \cdots\ \xi_n^{*})\begin{pmatrix}\lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_n\end{pmatrix}\begin{pmatrix}\xi_1\\ \vdots\\ \xi_n\end{pmatrix} = \lambda_1|\xi_1|^{2} + \cdots + \lambda_n|\xi_n|^{2}. \tag{13.41}$$
From the relation (13.4), hx| xi ≥ 0. This implies that in (13.41) we have
λi ≥ 0 ð1 ≤ i ≤ nÞ: ð13:42Þ
To show this, suppose that λi < 0 for some i, and take a vector with ξi ≠ 0 and all the other components ξj (j ≠ i) equal to zero. Then we would have ⟨x|x⟩ = λi|ξi|² < 0, in contradiction.
Since we have det G ≠ 0, taking the determinant of (13.40) we have
$$\det G' = \det\left(U^{\dagger}GU\right) = \det\left(U^{\dagger}UG\right) = \det E\,\det G = \det G = \prod_{i=1}^{n}\lambda_i \ne 0. \tag{13.43}$$
Combining (13.42) and (13.43), all the eigenvalues λi are positive; i.e.,
xi (1 ≤ i ≤ n). Then, the said Hermitian quadratic form is called positive definite and
we write
H > 0: ð13:46Þ
If ϕ(x1, ⋯, xn) ≥ 0 for any xi (1 ≤ i ≤ n) and ϕ(x1, ⋯, xn) = 0 for at least one set of
(x1, ⋯, xn) in which ∃xi ≠ 0, ϕ(x1, ⋯, xn) is said to be positive semi-definite or
non-negative. In that case, we write
H ≥ 0: ð13:47Þ
Notice here that eigenvalues λi remain unchanged after (unitary) similarity trans-
formation. Namely, the eigenvalues are inherent to H.
We have already encountered several examples of positive definite and
non-negative operators. A typical example of the former case is Hamiltonian of
quantum-mechanical harmonic oscillator (see Chap. 2). In this case, energy eigen-
values are all positive (i.e., positive definite). Orbital angular momenta L2 of
hydrogen-like atoms, however, are non-negative operators and, hence, an eigenvalue
of zero is permitted.
Alternatively, the Gram matrix is defined as B†B, where B is any (n, n) matrix. If
we take an orthonormal basis |η1⟩, |η2⟩, ⋯, |ηn⟩, |ei⟩ can be expressed as
$$|e_i\rangle = \sum_{j=1}^{n} b_{ji}|\eta_j\rangle,$$
$$\langle e_k|e_i\rangle = \sum_{j=1}^{n}\sum_{l=1}^{n} b_{lk}^{*}b_{ji}\langle\eta_l|\eta_j\rangle = \sum_{j=1}^{n}\sum_{l=1}^{n} b_{lk}^{*}b_{ji}\delta_{lj} = \sum_{j=1}^{n} b_{jk}^{*}b_{ji} = \sum_{j=1}^{n}\left(B^{\dagger}\right)_{kj}(B)_{ji} = \left(B^{\dagger}B\right)_{ki}. \tag{13.49}$$
For the second equality of (13.49), we used the orthonormal condition ⟨ηi|ηj⟩ = δij.
Thus, the Gram matrix G defined in (13.35) can be regarded as identical to B†B.
In Sect. 11.4 we have dealt with a linear transformation of a set of basis vectors e1, e2, ⋯, and en by a matrix A defined in (11.69) and examined whether the transformed vectors e′1, e′2, ⋯, and e′n are linearly independent. As a result, a necessary and sufficient condition for e′1, e′2, ⋯, and e′n to be linearly independent (i.e., to be a set of basis vectors) is det A ≠ 0. Thus, we notice that B plays the same role as A of (11.69) and, hence, det B ≠ 0 if and only if the set of vectors |e1⟩, |e2⟩, ⋯, and |en⟩ defined in (13.32) is linearly independent. By the same token as the above, we conclude that the eigenvalues of B†B are all positive if det B ≠ 0 (i.e., B is non-singular). Alternatively, if B is singular, det B†B = det B† det B = 0. In that case at least one of the eigenvalues of B†B must be zero.
frequently dealt with in the field of mathematical physics in conjunction with
quadratic forms. Further topics can be seen in the next chapter.
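The statement about the eigenvalues of B†B can be illustrated numerically. In the sketch below (NumPy assumed; the matrices are randomly generated only for illustration), a non-singular B yields strictly positive eigenvalues of B†B, whereas a singular B produces a (near-)zero eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
B_ns = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))  # generically non-singular
B_s = B_ns.copy()
B_s[:, 2] = B_s[:, 0]                                                  # singular: repeated column

for B in (B_ns, B_s):
    G = B.conj().T @ B                     # Gram matrix B†B (Hermitian)
    lam = np.linalg.eigvalsh(G)
    print(np.round(lam, 6))                # all positive vs. one (near-)zero eigenvalue
```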
Example 13.1 Let us take two vectors |ε1⟩ and |ε2⟩ that are expressed as
$$|\varepsilon_1\rangle = |e_1\rangle + |e_2\rangle, \qquad |\varepsilon_2\rangle = |e_1\rangle + i|e_2\rangle. \tag{13.50}$$
Here we have ⟨ei|ej⟩ = δij (1 ≤ i, j ≤ 2). Then we have a Gram matrix expressed as
$$G = \begin{pmatrix}2 & 1+i\\ 1-i & 2\end{pmatrix}. \tag{13.51}$$
$$\det|G - \lambda E| = \begin{vmatrix}2-\lambda & 1+i\\ 1-i & 2-\lambda\end{vmatrix} = 0, \qquad \lambda^{2} - 4\lambda + 2 = 0. \tag{13.52}$$
We have λ = 2 ± √2. Then as a diagonalizing unitary matrix U we get
$$U = \begin{pmatrix}\dfrac{1+i}{2} & \dfrac{1+i}{2}\\[4pt] \dfrac{\sqrt{2}}{2} & -\dfrac{\sqrt{2}}{2}\end{pmatrix}. \tag{13.53}$$
Thus, we get
$$U^{\dagger}GU = \begin{pmatrix}\dfrac{1-i}{2} & \dfrac{\sqrt{2}}{2}\\[4pt] \dfrac{1-i}{2} & -\dfrac{\sqrt{2}}{2}\end{pmatrix}\begin{pmatrix}2 & 1+i\\ 1-i & 2\end{pmatrix}\begin{pmatrix}\dfrac{1+i}{2} & \dfrac{1+i}{2}\\[4pt] \dfrac{\sqrt{2}}{2} & -\dfrac{\sqrt{2}}{2}\end{pmatrix} = \begin{pmatrix}2+\sqrt{2} & 0\\ 0 & 2-\sqrt{2}\end{pmatrix}. \tag{13.54}$$
The eigenvalues $2+\sqrt{2}$ and $2-\sqrt{2}$ are real positive as expected. That is, the Gram matrix is positive definite. □
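Example 13.1 can be reproduced with a few lines of NumPy; the sketch below builds the Gram matrix from the coefficients of |ε1⟩ and |ε2⟩ and recovers the eigenvalues 2 ± √2.

```python
import numpy as np

# Coefficients of |eps1> = |e1> + |e2> and |eps2> = |e1> + i|e2>, with <ei|ej> = delta_ij
eps = np.array([[1.0, 1.0],
                [1.0, 1.0j]])              # rows: |eps1>, |eps2>
G = eps.conj() @ eps.T                     # G_ij = <eps_i|eps_j>
print(np.round(G, 6))                      # [[2, 1+i], [1-i, 2]] as in (13.51)
print(np.round(np.linalg.eigvalsh(G), 6))  # 2 - sqrt(2), 2 + sqrt(2): positive definite
```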
Example 13.2 Let |e1⟩ and |e2⟩ (= -|e1⟩) be two vectors. Then, we have a Gram matrix expressed as
$$G = \begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix},$$
so that
$$\det|G - \lambda E| = \begin{vmatrix}1-\lambda & -1\\ -1 & 1-\lambda\end{vmatrix} = 0, \qquad \lambda^{2} - 2\lambda = 0. \tag{13.56}$$
We have λ = 2 and 0. Then as a diagonalizing unitary matrix U we get
$$U = \begin{pmatrix}\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\\[4pt] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\end{pmatrix}. \tag{13.57}$$
Thus, we get
$$U^{\dagger}GU = \begin{pmatrix}\dfrac{1}{\sqrt{2}} & -\dfrac{1}{\sqrt{2}}\\[4pt] \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\end{pmatrix}\begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}\begin{pmatrix}\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\\[4pt] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\end{pmatrix} = \begin{pmatrix}2 & 0\\ 0 & 0\end{pmatrix}. \tag{13.58}$$
As expected, we have a diagonal matrix, one of the eigenvalues for which is zero.
That is, the Gram matrix is non-negative.
In the present case, let us think of the following Hermitian quadratic form:
$$\phi(x_1, x_2) = \sum_{i,j=1}^{2} x_i^{*}(G)_{ij}x_j = (x_1^{*}\ x_2^{*})\,UU^{\dagger}GUU^{\dagger}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = (x_1^{*}\ x_2^{*})\,U\begin{pmatrix}2 & 0\\ 0 & 0\end{pmatrix}U^{\dagger}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = (\xi_1^{*}\ \xi_2^{*})\begin{pmatrix}2 & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}\xi_1\\ \xi_2\end{pmatrix} = 2|\xi_1|^{2}.$$
Then, if we take $\begin{pmatrix}\xi_1\\ \xi_2\end{pmatrix} = \begin{pmatrix}0\\ \xi_2\end{pmatrix}$ with ξ2 ≠ 0 as a column vector, we get ϕ(x1, x2) = 0. With this type of column vector, we have
$$U^{\dagger}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}0\\ \xi_2\end{pmatrix}, \quad\text{i.e.,}\quad \begin{pmatrix}x_1\\ x_2\end{pmatrix} = U\begin{pmatrix}0\\ \xi_2\end{pmatrix} = \begin{pmatrix}\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\\[4pt] -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\end{pmatrix}\begin{pmatrix}0\\ \xi_2\end{pmatrix} = \frac{\xi_2}{\sqrt{2}}\begin{pmatrix}1\\ 1\end{pmatrix},$$
For instance, with (x1, x2) = (1, 1) we have
$$\phi(1,1) = (1\ \ 1)\begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}\begin{pmatrix}1\\ 1\end{pmatrix} = (1\ \ 1)\begin{pmatrix}0\\ 0\end{pmatrix} = 0,$$
whereas, e.g., with (x1, x2) = (1, 2),
$$\phi(1,2) = (1\ \ 2)\begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}\begin{pmatrix}1\\ 2\end{pmatrix} = (1\ \ 2)\begin{pmatrix}-1\\ 1\end{pmatrix} = 1 > 0.$$
$$A(|x\rangle) = (|e_1\rangle\ \cdots\ |e_n\rangle)\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}, \tag{13.59}$$
where (aij) is a matrix representation of A. Defining A(|x⟩) = |A(x)⟩ and ⟨(x)A†| = (⟨x|)A† = [A(|x⟩)]† to be consistent with (1.117), we have
Therefore, putting $|y\rangle = \sum_{i=1}^{n} y_i|e_i\rangle$, we have
$$\langle (x)A^{\dagger}|y\rangle = (x_1^{*}\ \cdots\ x_n^{*})\begin{pmatrix}a_{11}^{*} & \cdots & a_{n1}^{*}\\ \vdots & \ddots & \vdots\\ a_{1n}^{*} & \cdots & a_{nn}^{*}\end{pmatrix}\begin{pmatrix}\langle e_1|\\ \vdots\\ \langle e_n|\end{pmatrix}(|e_1\rangle\ \cdots\ |e_n\rangle)\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix} = (x_1^{*}\ \cdots\ x_n^{*})\,A^{\dagger}G\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}. \tag{13.61}$$
Meanwhile, we get
$$\langle y|A(x)\rangle = (y_1^{*}\ \cdots\ y_n^{*})\begin{pmatrix}\langle e_1|\\ \vdots\\ \langle e_n|\end{pmatrix}(|e_1\rangle\ \cdots\ |e_n\rangle)\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = (y_1^{*}\ \cdots\ y_n^{*})\,GA\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}. \tag{13.62}$$
Hence, we have
x1 x1
hyjAðxÞi = ðy1 ⋯ yn ÞG A
T
⋮ = ðy1 ⋯ yn ÞGT A{ ⋮ : ð13:63Þ
xn xn
a11 ⋯ a1n
A ⋮ ⋱ ⋮ :
an1 ⋯ ann
Comparing (13.61) and (13.63), we find that one is transposed matrix of the other.
Also note that (AB)T = BTAT, (ABC)T = CTBTAT, etc. Since an inner product can be
viewed as a (1, 1) matrix, two mutually transposed (1, 1) matrices are identical.
Hence, we get
xi A{ ik
ðGÞkj yj = yj ðG Þjk ðA Þki xi
i, j, k i, j, k
= xi yj A{ ik
ðGÞkj - ðG Þjk ðA Þki
i, j, k
= xi yj A{ ik
- ðA Þki ðGÞkj = 0: ð13:65Þ
i, j, k
With the third equality of (13.65), we used (G)jk = (G)kj, i.e., G{ = G (Hermitian
matrix). Thanks to the freedom in choice of basis vectors as well as xi and yi, we
must have
$$\left(A^{\dagger}\right)_{ik} = \left(A^{*}\right)_{ki}. \tag{13.66}$$
$$\left(A^{\dagger}\right)_{ik} = a_{ki}^{*}. \tag{13.67}$$
$$\langle (x)A^{\dagger}|y\rangle = \left\langle y\middle|\left(A^{\dagger}\right)^{\dagger}(x)\right\rangle = \langle y|A(x)\rangle.$$
$$\left(A^{\dagger}\right)^{\dagger} = A. \tag{13.68}$$
$$A' = P^{-1}A_0 P. \tag{11.88}$$
$$\left(A'\right)^{\dagger} = P^{\dagger}A^{\dagger}\left(P^{-1}\right)^{\dagger} = P^{\dagger}A^{\dagger}\left(P^{\dagger}\right)^{-1}. \tag{13.69}$$
In (13.69), we denote the adjoint operator before the basis vectors transformation
simply by A† to avoid complicated notation. We also have
$$\left(A^{\dagger}\right)' = \left(A'\right)^{\dagger}. \tag{13.70}$$
Meanwhile, suppose that (A†)′ = Q⁻¹A†Q. Then, from (13.69) and (13.70)
we have
$$P^{\dagger} = Q^{-1},$$
or
Equation (13.71) states the equality of two vectors in an inner product space on
both sides, whereas (13.72) states that in a vector space where the inner product is
not defined. In either case, both (13.71) and (13.72) show that A{ is indeed a linear
transformation. In fact, the matrix representation of (13.66) and (13.67) is indepen-
dent of the concept of the inner product.
Suppose that there are two (or more) adjoint operators B and C that correspond to
A. Then, from (13.64) we have
Also we have
$$\langle (x)A^{\dagger}|A(x)\rangle = (x_1^{*}\ \cdots\ x_n^{*})\begin{pmatrix}a_{11}^{*} & \cdots & a_{n1}^{*}\\ \vdots & \ddots & \vdots\\ a_{1n}^{*} & \cdots & a_{nn}^{*}\end{pmatrix}\begin{pmatrix}\langle e_1|\\ \vdots\\ \langle e_n|\end{pmatrix}(|e_1\rangle\ \cdots\ |e_n\rangle)\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = (x_1^{*}\ \cdots\ x_n^{*})\,A^{\dagger}GA\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}. \tag{13.75}$$
$$|x\rangle = (|e_1\rangle\ |e_2\rangle)\begin{pmatrix}x_1\\ x_2\end{pmatrix}. \tag{13.76}$$
Then we have
$$\langle x|x\rangle = (x_1^{*}\ x_2^{*})\begin{pmatrix}\langle e_1|\\ \langle e_2|\end{pmatrix}(|e_1\rangle\ |e_2\rangle)\begin{pmatrix}x_1\\ x_2\end{pmatrix} = |x_1|^{2} + 4|x_2|^{2}. \tag{13.77}$$
(Fig. 13.1 Transformation of the basis vectors |e1⟩ and |e2⟩ into |e1′⟩ and |e2′⟩ by R; rotation through θ in the x–y plane.)
$$R = \begin{pmatrix}\cos\theta & -2\sin\theta\\ (\sin\theta)/2 & \cos\theta\end{pmatrix}. \tag{13.78}$$
$$R(|x\rangle) = (|e_1\rangle\ |e_2\rangle)\begin{pmatrix}\cos\theta & -2\sin\theta\\ (\sin\theta)/2 & \cos\theta\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix}. \tag{13.79}$$
As a result of the transformation R, the basis vectors |e1⟩ and |e2⟩ are transformed into |e1′⟩ and |e2′⟩, respectively, as in Fig. 13.1 such that
$$(|e_1'\rangle\ |e_2'\rangle) = (|e_1\rangle\ |e_2\rangle)\begin{pmatrix}\cos\theta & -2\sin\theta\\ (\sin\theta)/2 & \cos\theta\end{pmatrix}.$$
$$\langle (x)R^{\dagger}|R(x)\rangle = (x_1\ x_2)\begin{pmatrix}\cos\theta & (\sin\theta)/2\\ -2\sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & 4\end{pmatrix}\begin{pmatrix}\cos\theta & -2\sin\theta\\ (\sin\theta)/2 & \cos\theta\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = (x_1\ x_2)\begin{pmatrix}1 & 0\\ 0 & 4\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = x_1^{2} + 4x_2^{2}. \tag{13.80}$$
Putting
$$G = \begin{pmatrix}1 & 0\\ 0 & 4\end{pmatrix},$$
we have
$$R^{\dagger}GR = G. \tag{13.81}$$
Comparing (13.77) and (13.80), we notice that a norm of jxi remains unchanged
after the transformation R. This means that R is virtually a unitary transformation. A
somewhat unfamiliar matrix form of R resulted from the choice of basis vectors other
than an orthonormal basis. □
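The invariance (13.81) and the conservation of the norm under R can be confirmed numerically; the following sketch (NumPy assumed, θ and the test vector chosen arbitrarily) does so.

```python
import numpy as np

theta = 0.7
G = np.diag([1.0, 4.0])                         # Gram matrix of the non-orthonormal basis
R = np.array([[np.cos(theta), -2.0 * np.sin(theta)],
              [np.sin(theta) / 2.0, np.cos(theta)]])

# R leaves the Gram matrix invariant, cf. (13.81)
print(np.allclose(R.T @ G @ R, G))              # True
x = np.array([1.0, -2.0])
print(np.isclose(x @ G @ x, (R @ x) @ G @ (R @ x)))  # the norm is conserved
```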
Now we introduce an orthonormal basis, the simplest and most important basis set in
an inner product space. If we choose the orthonormal basis so that hei| eji = δij, a
Gram matrix G = E. Thus, R{GR = G reads as R{R = E. In that case a linear
transformation is represented by a unitary matrix and it conserves a norm of a vector
and an inner product with two arbitrary vectors.
So far we assumed that an adjoint operator A{ operates only on a row vector from
the right, as is evident from (13.61). At the same time, A operates only on the column
vector from the left as in (13.62). To render the notation of (13.61) and (13.62)
consistent with the associative law, we have to examine the commutability of A{
with G. In this context, choice of the orthonormal basis enables us to get through
such a troublesome situation and largely eases matrix calculation. Thus,
ðxÞA{ jAðxÞ
a11 ⋯ an1 h e1 j a11 ⋯ a1n x1
= x1 ⋯ xn ⋮ ⋱ ⋮ ⋮ ðje1 i ⋯ jen iÞ ⋮ ⋱ ⋮ ⋮
a1n ⋯ ann h en j an1 ⋯ ann xn
x1 x1
= x1 ⋯ xn A{ EA ⋮ = x1 ⋯ xn A{ A ⋮ : ð13:82Þ
xn xn
x1
xA{ jAx = x1 ⋯ xn A{ A ⋮ : ð13:83Þ
xn
This notation has become now consistent with the associative law. Note that A{
and A operate on either a column or row vector. We can also do without a symbol “|”
in (13.83) and express it as
Thus, we can freely operate A{ and A from both the left and right. By the same
token, we rewrite (13.62) as
a11 ⋯ a1n x1 x1
hyjAxi = hyAjxi = y1 ⋯ yn ⋮ ⋱ ⋮ ⋮ = y1 ⋯ yn A ⋮ : ð13:85Þ
an1 ⋯ ann xn xn
Here, notice that a vector jxi is represented by a column vector with respect to the
orthonormal basis. Using (13.64), we have
y1
hyjAxi = xA{ jy = x1 ⋯ xn A{ ⋮
yn
a11 ⋯ an1 y1
= x1 ⋯ xn ⋮ ⋱ ⋮ ⋮ : ð13:86Þ
a1n ⋯ ann yn
If in (13.83) we put jyi = j Axi, hxA{| Axi = hy| yi ≥ 0. Thus, we define a norm of
jAxi as
Proof First let us take |1⟩. This can be normalized such that
$$|e_1\rangle = \frac{|1\rangle}{\sqrt{\langle 1|1\rangle}}, \qquad \langle e_1|e_1\rangle = 1. \tag{13.88}$$
Next, we define
$$|e_2\rangle = \frac{1}{L_2}\left[|2\rangle - \langle e_1|2\rangle|e_1\rangle\right], \tag{13.89}$$
where L2 is a normalization constant such that ‹e2 j e2 › = 1. Note that | e2i cannot be
a zero vector. This is because if | e2i were a zero vector, | 2i and | e1i (or | 1i) would
be linearly dependent, in contradiction to the assumption. We have he1| e2i = 0.
Thus, | e1i and | e2i are orthonormal.
After this, the proof is based upon mathematical induction. Suppose that the
theorem is true of (n - 1) vectors. That is, let | eii (1 ≤ i ≤ n - 1) so that hei|
eji = δij (1 ≤ j ≤ n - 1) and each vector | eii can be a linear combination of the
vectors |i⟩. Meanwhile, let us define
$$|\tilde{n}\rangle \equiv |n\rangle - \sum_{j=1}^{n-1}\langle e_j|n\rangle|e_j\rangle. \tag{13.90}$$
Again, the vector |ñ⟩ cannot be a zero vector as asserted above. We have
$$\langle e_k|\tilde{n}\rangle = \langle e_k|n\rangle - \sum_{j=1}^{n-1}\langle e_j|n\rangle\langle e_k|e_j\rangle = \langle e_k|n\rangle - \sum_{j=1}^{n-1}\langle e_j|n\rangle\delta_{kj} = 0, \tag{13.91}$$
where 1 ≤ k ≤ n - 1. Thus, normalizing |ñ⟩, we may define
$$|e_n\rangle = \frac{|\tilde{n}\rangle}{\sqrt{\langle\tilde{n}|\tilde{n}\rangle}}, \qquad \langle e_n|e_n\rangle = 1. \tag{13.92}$$
Note that |eₙ⟩ is determined only up to a phase factor; more generally we may take
$$|e_n\rangle = \frac{e^{i\theta}|\tilde{n}\rangle}{\sqrt{\langle\tilde{n}|\tilde{n}\rangle}}. \tag{13.93}$$
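The constructive procedure (13.88)–(13.92) is nothing but the Gram–Schmidt orthonormalization. A minimal Python/NumPy sketch of the procedure (the input vectors below are an arbitrary illustration) reads:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors, following (13.88)-(13.92)."""
    es = []
    for v in vectors:
        w = v - sum(np.vdot(e, v) * e for e in es)   # subtract projections <e_j|v>|e_j>
        es.append(w / np.sqrt(np.vdot(w, w).real))   # normalize
    return es

vs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
es = gram_schmidt(vs)
print(np.round([[np.vdot(a, b) for b in es] for a in es], 6))  # identity matrix
```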
~
To prove Theorem 13.2, we have used the following simple but important
theorem.
Theorem 13.3 Let us have any n vectors jii ≠ 0 (1 ≤ i ≤ n) in Vn and let these
vectors jii be orthogonal to one another. Then the vectors jii are linearly
independent.
Proof Let us think of the following equation:
Multiplying (13.94) by hij from the left and considering the orthogonality among
the vectors, we have
ci hijii = 0: ð13:95Þ
$$U^{-1}GU = U^{\dagger}GU = G' = \begin{pmatrix}\lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_n\end{pmatrix}, \tag{13.40}$$
Then, we have
$$D^{T}U^{\dagger}GUD = \begin{pmatrix}1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & 1\end{pmatrix}, \tag{13.97}$$
where RHS of (13.97) is a (n, n) identity matrix. Thus, we find that (13.97) is
equivalent to what Theorem 13.2 implies.
If the vector space we are thinking of is a real vector space, the corresponding
Gram matrix is real as well and U of (13.97) is an orthogonal matrix. Then, instead of
(13.97) we have
$$D^{T}U^{T}GUD = (UD)^{T}G\,UD = \begin{pmatrix}1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & 1\end{pmatrix}. \tag{13.98}$$
Defining UD P, we have
PT GP = En , ð13:99Þ
where En stands for the (n, n) identity matrix and P is a real non-singular matrix. The
transformation of G represented by (13.99) is said to be an equivalence transforma-
tion (see Sect. 14.5). Equation (13.99) gives a special case where Theorem 14.11
holds; also see Sect. 14.5.
The real vector space characterized by a Gram matrix En is called Euclidean
space. Its related topics are discussed in Parts IV and V. In Part V, in particular, we
will investigate several properties of vector spaces other than the (real) inner product
space in relation to the quantum theory of fields.
Hermitian operators and unitary operators are quite often encountered in mathemat-
ical physics and, in particular, quantum physics. In this chapter we investigate their
basic properties. Both Hermitian operators and unitary operators fall under the
category of normal operators. The normal matrices are characterized by an important
fact that those matrices can be diagonalized by a unitary matrix. Moreover,
Hermitian matrices always possess real eigenvalues. This fact largely eases mathe-
matical treatment of quantum mechanics. In relation to these topics, in this chapter
we investigate projection operators systematically. We find their important applica-
tion to physicochemical problems in Part IV. We further investigate Hermitian
quadratic forms and real symmetric quadratic forms as an important branch of matrix
algebra. In connection with this topic, positive definiteness and non-negative prop-
erty of a matrix are an important concept. This characteristic is readily applicable to
theory of differential operators, thus rendering this chapter closely related to basic
concepts of quantum physics.
V n = W⨁W⊥ : ð14:2Þ
Proof Suppose that an orthonormal basis comprising je1i, j e2i, ⋯, j eni spans Vn;
Of the orthonormal basis, let je1i, j e2i, ⋯, j eri (r < n) span W. Let an arbitrarily
chosen vector from Vn be jxi. Then we have
n
j xi = x1 j e1 i þ x2 j e2 i þ ⋯ þ xn j en i = x
i=1 i
j ei i: ð14:3Þ
That is,
n
j xi = i=1
hei jxijei i: ð14:5Þ
Meanwhile, put
r
j x0 i = i=1
hei jxijei i: ð14:6Þ
hei jx00 i = hei jxi - hei jx0 i = hei jxi - hei jxi = 0: ð14:7Þ
Taking account of W = Span{| e1i, | e2i, ⋯, | eri}, from the definition of the
orthogonal complement we get jx′′i 2 W⊥. That is, for 8jxi 2 Vn
j xi = j x0 iþ j x00 i: ð14:8Þ
V n = W 1 ⨁W 2 ⨁⋯⨁W r , ð14:9Þ
j xi = j w1 iþ j w2 i þ ⋯þ j wr i = jw1 þ w2 þ ⋯ þ wr i: ð14:10Þ
Pi ðjxiÞ = j wi i ð1 ≤ i ≤ r Þ: ð14:11Þ
Thus, the operator Pi extracts a vector jwii in a subspace Wi. Then we have
ðP1 þ P2 þ ⋯ þ Pr ÞðjxiÞ = P1 j xi þ P2 j xi þ ⋯ þ Pr j xi = j w1 iþ
j w2 i þ ⋯þ j wr i = j xi: ð14:12Þ
P1 þ P2 þ ⋯ þ Pr = E: ð14:13Þ
Moreover,
Pi 2 = Pi : ð14:16Þ
j yi = j u1 iþ j u2 i þ ⋯þ j ur i = j u1 þ u2 þ ⋯ þ ur i: ð14:17Þ
Then, we have
With the last equality, we used the mutual orthogonality of the subspaces.
Meanwhile, we have
where we used (13.64) with the second equality. Since jxi and jyi are arbitrarily
chosen, we get
Pi { = Pi : ð14:21Þ
r
ðP1 þ P2 þ ⋯ þ Pr ÞðP1 þ P2 þ ⋯ þ Pr Þ = P
i=1 i
2
þ PP
i≠j i j
r
= P
i=1 i
þ PP
i≠j i j
=E þ PP
i≠j i j
= E: ð14:22Þ
PP
i≠j i j
= 0:
In particular, we have
Pi Pj = 0, Pj Pi = 0 ði ≠ jÞ: ð14:23Þ
In fact, we have
The second equality comes from Wi \ Wj = {0}. Notice that in (14.24), indices
i and j are interchangeable. Again, jxi is arbitrarily chosen, and so (14.23) holds.
Combining (14.16) and (14.23), we write
Pi Pj = δij : ð14:25Þ
2
Pi þ Pj = Pi 2 þ Pi Pj þ Pj Pi þ Pj 2 = Pi 2 þ Pj 2 = Pi þ Pj ,
{
Pi þ Pj = Pi { þ Pj { = Pi þ Pj :
$$P_i = \frac{|w_i\rangle\langle w_i|}{\langle w_i|w_i\rangle}. \tag{14.26}$$
Then we have
$$P_i|x\rangle = \frac{|w_i\rangle\langle w_i|}{\|w_i\|\,\|w_i\|}\left(|w_1\rangle + |w_2\rangle + \cdots + |w_r\rangle\right) = |w_i\rangle.$$
Furthermore, we have
Pi 2 j xi = Pi j wi i = j wi i = Pi j xi: ð14:27Þ
ðjwi ihwi jÞ{ = ðhwi jÞ{ ðjwi iÞ{ = jwi ihwi j : ð14:28Þ
Hence, we recover
Pi { = Pi : ð14:21Þ
With the second equality of (14.29), we used (13.86) where A is replaced with
B and jyi is replaced with A{ j yi. Since (14.29) holds for arbitrarily chosen vectors
jxi and jyi, comparing the first and last sides of (14.29) we have
ðABÞ{ = B{ A{ : ð14:30Þ
{ {
xB{ jA{ y = y A{ j B{ x = hyAjBxi = hyABxi , ð14:31Þ
where with the second equality we used (13.68). Also recall the remarks after (13.83)
with the expressions of (14.29) and (14.31). Other notations can be adopted.
We can view a projection operator under a more strict condition. Related oper-
ators can be defined as well. As in (14.26), let us define an operator such that
where jeki (k = 1, ⋯, n) are the orthonormal basis set of Vn. Operating P~k on
jxi = x1 j e1i + x2 j e2i + ⋯ + xn j eni from the left, we get
$$\tilde{P}_k|x\rangle = |e_k\rangle\langle e_k|\left(\sum_{j=1}^{n} x_j|e_j\rangle\right) = |e_k\rangle\sum_{j=1}^{n} x_j\langle e_k|e_j\rangle = |e_k\rangle\sum_{j=1}^{n} x_j\delta_{kj} = x_k|e_k\rangle.$$
Thus, we find that P~k plays the same role as P(k) defined in (12.201). Represented
by a matrix, P~k has the same structure as that denoted in (12.205). Evidently,
$$\tilde{P}_k^{\dagger} = \tilde{P}_k, \qquad \tilde{P}_k^{2} = \tilde{P}_k, \qquad \tilde{P}_1 + \tilde{P}_2 + \cdots + \tilde{P}_n = E. \tag{14.33}$$
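The properties (14.33) are straightforward to verify for the matrices P̃ₖ = |eₖ⟩⟨eₖ| written in an orthonormal basis; a short NumPy sketch follows (the dimension and the test vector are arbitrary illustrations).

```python
import numpy as np

n = 3
E = np.eye(n)
projs = [np.outer(E[:, k], E[:, k].conj()) for k in range(n)]   # P_k = |e_k><e_k|

print(all(np.allclose(P @ P, P) and np.allclose(P, P.conj().T) for P in projs))  # P^2 = P, P† = P
print(np.allclose(sum(projs), E))                               # P_1 + ... + P_n = E
x = np.array([1.0, 2.0 - 1.0j, 3.0])
print(projs[1] @ x)                                             # extracts x_2 |e_2>
```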
ðk Þ
Now let us modify P(k) in (12.201). There PðkÞ = δi δjðkÞ , where only the (k, k)
ðk Þ ðk Þ
element is 1, otherwise 0, in the (n, n) matrix. We define a matrix PðmÞ = δi δjðmÞ . A
full matrix representation for it is
0
⋱ 1
0
ðk Þ
PðmÞ = 0 , ð14:34Þ
0
⋱
0
ðk Þ ðlÞ ðk Þ ðk Þ ðk Þ
PðmÞ PðnÞ = δi δqðmÞ δðqlÞ δjðnÞ = δi δml δjðnÞ = δml PðnÞ : ð14:35Þ
q
ðk Þ
Note that PðkÞ PðkÞ defined in (12.201). From (14.35), moreover, we have
ðk Þ ðmÞ ðk Þ ðmÞ ðk Þ
U { PðmÞ PðnÞ U = U { PðmÞ U U { PðnÞ U = U { PðnÞ U:
ðk Þ
Among these operators, only PðkÞ is eligible for a projection operator. We will
encounter further examples in Parts IV and V.
ðk Þ
Replacing A in (13.62) with PðkÞ , we obtain
x1
ðk Þ ðk Þ
yjPðkÞ ðxÞ = y1 ⋯ yn GPðkÞ ⋮ : ð14:36Þ
xn
x1
ðk Þ ðk Þ
yjPðkÞ ðxÞ = y1 ⋯ yn PðkÞ ⋮ = yk xk : ð14:37Þ
xn
There are a large group of operators called normal operators that play an important
role in mathematical physics, especially quantum physics. A normal operator is
defined as an operator on an inner product space that commutes with its adjoint
operator. That is, let A be a normal operator. Then, we have
AA{ = A{ A: ð14:38Þ
$$\|A^{\dagger}x\| = \sqrt{\left\langle x\left(A^{\dagger}\right)^{\dagger}\middle|A^{\dagger}x\right\rangle} = \sqrt{\langle x|AA^{\dagger}|x\rangle} = \sqrt{\langle x|A^{\dagger}A|x\rangle} = \|Ax\|. \tag{14.39}$$
Conversely, suppose that ‖Ax‖ = ‖A†x‖. Then, since ‖Ax‖² = ‖A†x‖², we have ⟨x|A†A|x⟩ = ⟨x|AA†|x⟩. That is, ⟨x|A†A - AA†|x⟩ = 0 for an arbitrarily chosen vector |x⟩. To assert A†A - AA† = 0, i.e., A†A = AA†, on the assumption that ⟨x|A†A - AA†|x⟩ = 0, we need the following theorems:
Theorem 14.2 [4] A linear transformation A on an inner product space is the zero
transformation if and only if hy| Axi = 0 for any vectors jxi and jyi.
From the assumption that hx| Axi = 0 with any vectors jxi, we have
That is,
hxjAyi = 0: ð14:44Þ
Theorem 14.2 means that A = 0, indicating that the sufficient condition holds.
This completes the proof.
Thus returning to the beginning, i.e., remarks made after (14.39), we establish the
following theorem:
Theorem 14.4 A necessary and sufficient condition for a linear transformation A on
an inner product space to be a normal operator is that a following relation
A{ x = jjAxjj ð14:45Þ
holds.
A normal operator has a distinct property. The normal operator can be diagonalized
by a similarity transformation by a unitary matrix. The transformation is said to be a
unitary similarity transformation. Let us prove the following theorem.
Theorem 14.5 [5] A necessary and sufficient condition for a matrix A to be
diagonalized by unitary similarity transformation is that the matrix A is a normal
matrix.
Proof To prove the necessary condition, suppose that A can be diagonalized by a
unitary matrix U. That is,
For the third equality, we used DD{ = D{D (i.e., D and D{ are commutable). This
shows that A is a normal matrix.
To prove the sufficient condition, let us show that a normal matrix can be
diagonalized by unitary similarity transformation. The proof is due to mathematical
induction, as is the case with Theorem 12.1.
First we show that Theorem is true of a (2, 2) matrix. Suppose that one of the
eigenvalues of A2 is α1 and that its corresponding eigenvector is jx1i. Following
procedures of the proof for Theorem 12.1 and remembering the Gram-Schmidt
orthonormalization theorem, we can construct a unitary matrix U1 such that
where jx1i represents a column vector and jp1i is another arbitrarily determined
column vector. Then we can convert A2 to a triangle matrix such that
A~2 U 1 { A2 U 1 = α1
0
x
y
: ð14:49Þ
Then we have
{ {
A~2 A~2 = U 1 { A2 U 1 U 1 { A2 U 1 = U 1 { A2 U 1 U 1 { A2 { U 1
= U 1 { A2 A2 { U 1 = U 1 { A2 { A2 U 1 = U 1 { A2 { U 1 U 1 { A2 U 1
{
= A~2 A~2 : ð14:50Þ
With the fourth equality, we used the supposition that A2 is a normal matrix.
Equation (14.50) means that A~2 defined in (14.49) is a normal operator. Via simple
matrix calculations, we have
α1 0
A2 = : ð14:52Þ
0 y
This implies that a normal matrix A2 has been diagonalized by the unitary
similarity transformation.
Now let us examine a general case where we consider a (n, n) square normal
matrix An. Let αn be one of the eigenvalues of An. On the basis of the argument of the
~ we
(2, 2) matrix case, after a suitable similarity transformation by a unitary matrix U
first have
~ { An U,
A~2 = U ~ ð14:53Þ
αn xT
An = , ð14:54Þ
0 B
{ αn 0
An = , ð14:55Þ
x B{
{ jαn j2 þ xT x xT B{
An An = ,
Bx BB{
{ jαn j2 αn xT
An An = : ð14:56Þ
α n x x x þ B{ B
T
For A~n ½A~n { = ½A~n { ½A~n to hold with (14.56), we must have x = 0. Thus we get
αn 0
An = : ð14:57Þ
0 B
αn 0 1 0 1 0 αn 0
= : ð14:58Þ
0 B 0 C 0 C 0 D
Here putting
1 0 αn 0
Cn = and Dn = , ð14:59Þ
0 C 0 D
we get
{
As C~n is a (n, n) unitary matrix, [C~n { C~n = C~n C~n = E. Hence,
{
C~n A~n C~n = D~n . Thus, from (14.53) finally we get
{
C~n ~ { An U
U ~ C~n = D~n : ð14:61Þ
V { An V = D~n : ð14:62Þ
The first equality comes from the fact that |xi 2 W ⟹ A|xi (=| Axi) 2 W as W is
A-invariant. From the last equality of (14.63), we have
1
⋱
1
0 ⋯ 1
1
U= ⋮ ⋱ ⋮ , ð14:66Þ
1
1 ⋯ 0
1
⋱
1
where except (i, j) and ( j, i) elements equal to 1, all the off-diagonal elements are
zero. If operated from the left, U exchanges the i-th and j-th rows of the matrix. If
operated from the right, U exchanges the i-th and j-th columns of the matrix. Note
that U is at once unitary and Hermitian with eigenvalue 1 or -1. Note that U2 = E.
This is because exchanging two columns (or two rows) two times produces identity
transformation.
Thus performing such unitary similarity transformations appropriate times, we
get
α1
⋱
α1
α2
⋱
Dn : ð14:67Þ
α2
⋱
αs
⋱
αs
$$V^{n} = W_{\alpha_1}\oplus W_{\alpha_2}\oplus\cdots\oplus W_{\alpha_s}. \tag{14.68}$$
Að1Þ
⋯
An Að2Þ , ð14:69Þ
⋮ ⋱ ⋮
⋯ AðsÞ
A normal operator has other distinct properties. Following theorems are good
examples.
Theorem 14.7 Let A be a normal operator on Vn. Then jxi is an eigenvector of
A with an eigenvalue α, if and only if jxi is an eigenvector of A{ with an eigenvalue
α.
Proof We apply (14.45) for the proof. Both (A - αE){ = A{ - αE and (A - αE) are
normal, since A is normal. Consequently, we have k(A - αE)xk = 0 if and only if
k(A{ - αE)xk = 0. Since only the zero vector has a zero norm, we get
(A - αE) j xi = 0 if and only if (A{ - αE) j xi = 0.
This completes the proof.
Theorem 14.8 Let A be a normal operator on Vn. Then, eigenvectors corresponding
to different eigenvalues are mutually orthogonal.
Proof Let A be a normal operator on Vn. Let jui be an eigenvector corresponding to
an eigenvalue α and jvi be an eigenvector corresponding to an eigenvalue β with
α ≠ β. Then we have
αhvjui = hvjαui = hvjAui = ujA{ v = hujβ vi = hβ vjui = βhvjui, ð14:70Þ
where with the fourth equality we used Theorem 14.7. Then we get
ðα - βÞhvjui = 0:
Since α - β ≠ 0, hv| ui = 0. Namely, the eigenvectors jui and jvi are mutually
orthogonal. This completes the proof.
In (12.208) we mentioned the decomposition of diagonalizable matrices. As for
the normal matrices, we have a related matrix decomposition. Let A be a normal
operator. Then, according to Theorem 14.5, A can be diagonalized and expressed as
(14.67). This is equivalently expressed as a following succinct relation. That is, if we
choose U for a diagonalizing unitary matrix, we have
U { AU = α1 P1 þ α2 P2 þ ⋯ þ αs Ps , ð14:71Þ
E n1
⋯
0n 2
P1 = , ð14:72Þ
⋮ ⋱ ⋮
⋯ 0n s
where E n1 stands for a (n1, n1) identity matrix with n1 corresponding to multiplicity
of α1. A matrix represented by 0n2 is a (n2, n2) zero matrix, and so forth. This
expression is in accordance with (14.69). From a matrix form (14.72), obviously
Pl (1 ≤ l ≤ s) is a projection operator. Thus, operating U and U{ on both sides (14.71)
from the left and right of (14.71), respectively, we obtain
In (14.74), we can easily check that P~l is a projection operator with α1, α2, ⋯,
and αs being different eigenvalues of A. If one of the eigenvalues is zero, in (14.74)
we may disregard the term related to the zero eigenvalue. For example, suppose that
α1 = 0 in (14.74), we get
A = α2 P~2 þ ⋯ þ αs P~s :
P~l P~k = 0:
Thus, we have
A A={ ~i
αi P ~j
αj P = ~iP
αi αj P ~j = ~i =
αi αj δij P ~i,
jαi j2 P
i j i, j i, j i
AA{ = ~j
αj P ~i
αi P = ~jP
αi αj P ~i = ~j
αi αj δji P
j i i, j i, j
2~
= jαi j Pi : ð14:76Þ
i
Hence, A{A = AA{. If projection operators are further decomposed as in the case
of (14.75), we have a related expression to (14.76). Thus, a necessary and sufficient
condition for an operator to be a normal operator is that the said operator is expressed
as (14.74). The relation (14.74) is well-known as a spectral decomposition theorem.
Thus, the spectral decomposition theorem is equivalent to Theorem 14.5.
In Sect. 12.7, we have already encountered the matrix decomposition (12.211)
similar to that of (14.74). In this context, (12.211) may well be referred to as the
spectral decomposition in a broader sense. Notice, however, that in (12.211)
A~k ð1 ≤ k ≤ nÞ is not necessarily Hermitian. In Part V, we will encounter interesting
examples of matrix decomposition such as the spectral decomposition.
The relations (12.208) and (14.74) are virtually the same, aside from the fact that
whereas (14.74) premises an inner product space, (12.208) does not premise
it. Correspondingly, whereas the related operators are called projection operators
with the case of (14.74), those operators are said to be idempotent operators for
(12.208).
Example 14.1 Let us think of the Gram matrix of Example 13.1, as shown below:
$$G = \begin{pmatrix}2 & 1+i\\ 1-i & 2\end{pmatrix}. \tag{13.51}$$
By a back calculation of $U\tilde{G}U^{\dagger} = G$, we get
$$G = \left(2+\sqrt{2}\right)\begin{pmatrix}\dfrac{1}{2} & \dfrac{\sqrt{2}(1+i)}{4}\\[4pt] \dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2}\end{pmatrix} + \left(2-\sqrt{2}\right)\begin{pmatrix}\dfrac{1}{2} & -\dfrac{\sqrt{2}(1+i)}{4}\\[4pt] -\dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2}\end{pmatrix}. \tag{14.77}$$
Putting eigenvalues $\alpha_1 = 2+\sqrt{2}$ and $\alpha_2 = 2-\sqrt{2}$ along with
$$A_1 = \begin{pmatrix}\dfrac{1}{2} & \dfrac{\sqrt{2}(1+i)}{4}\\[4pt] \dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2}\end{pmatrix}, \qquad A_2 = \begin{pmatrix}\dfrac{1}{2} & -\dfrac{\sqrt{2}(1+i)}{4}\\[4pt] -\dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2}\end{pmatrix}, \tag{14.78}$$
we get
$$G = \alpha_1 A_1 + \alpha_2 A_2. \tag{14.79}$$
$$A_1^{2} = A_1, \quad A_2^{2} = A_2, \quad A_1 A_2 = A_2 A_1 = 0, \quad A_1 + A_2 = E. \tag{14.80}$$
Moreover, (14.78) obviously shows that both A1 and A2 are Hermitian. Thus,
(14.77) and (14.79) are an example of spectral decomposition. The decomposition is
unique. □
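The spectral decomposition of Example 14.1 can be reconstructed numerically from the eigenvectors of G. The sketch below (NumPy assumed) forms the projection operators from the columns of the diagonalizing unitary matrix and checks (14.79) and (14.80).

```python
import numpy as np

G = np.array([[2.0, 1.0 + 1.0j],
              [1.0 - 1.0j, 2.0]])
lams, U = np.linalg.eigh(G)                   # Hermitian (normal) matrix: unitary diagonalization

# Projection operators onto the eigenspaces, as in (14.74)/(14.78)
Ps = [np.outer(U[:, k], U[:, k].conj()) for k in range(2)]
G_rec = sum(l * P for l, P in zip(lams, Ps))

print(np.round(lams, 6))                      # 2 - sqrt(2), 2 + sqrt(2)
print(np.allclose(G_rec, G))                  # the spectral decomposition reproduces G
print(np.allclose(Ps[0] @ Ps[1], 0), np.allclose(Ps[0] + Ps[1], np.eye(2)))
```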
Example 14.1 can be dealt with in parallel to Example 12.5. In Example 12.5,
however, an inner product space is not implied, and so we used an idempotent matrix
instead of a projection operator. Note that as can be seen in Example 12.5 that
idempotent matrix was not Hermitian.
Of normal matrices, Hermitian matrices and unitary matrices play a crucial role both
in fundamental and applied science. Let us think of several topics and examples.
In quantum physics, one frequently treats expectation value of an operator. In
general, such an operator is Hermitian, more strictly an observable. Moreover, a
vector on an inner product space is interpreted as a state on a Hilbert space. Suppose
that there is a linear operator (or observable that represents a physical quantity)
O that has discrete (or countable) eigenvalues α1, α2, ⋯. The number of the
eigenvalues may be a finite number or an infinite number, but here we assume the
finite number; i.e., let us suppose that we have eigenvalues α1, α2, ⋯, and αs, in
consistent with our previous discussion.
In quantum physics, we have a well-known Born probability rule. The rule says
the following: Suppose that we carry out a physical measurement on A with respect
to a physical state jui. Here we assume that jui has been normalized. Then, the
probability ℘l that A takes αl (1 ≤ l ≤ s) is given by
℘l = P~l u
2
, ð14:81Þ
{
℘l = P~l u = uP~l jP~l u = uP~l jP~l u = uP~l P~l u = uP~l u :
2
ð14:84Þ
For the third equality we used the fact that P~l is Hermitian; for the last equality we
used P~l = P~l . Summing (14.84) with the index l, we have
2
Hence,
ujA - A{ ju = 0: ð14:87Þ
From Theorem 14.3, we get A - A{ = 0, i.e., A = A{. This completes the proof.
Theorem 14.10 [4] The eigenvalues of an Hermitian operator A are real.
Proof Let α be an eigenvalue of A and let jui be a corresponding eigenvector. Then,
A j ui = α j ui. Operating huj from the left, we have hu| A| ui = αhu|
ui = αkuk2. Thus,
α = hujAjui=kuk2 : ð14:88Þ
Since A is Hermitian, hu| A| ui is real. Then, the eigenvalue α is real as well. This
completes the proof.
Unitary matrices have a following conspicuous features: (1) An inner product is
held invariant under unitary transformation: Suppose that jx′i = U j xi and
jy′i = U j yi, where U is a unitary operator. Then hy′| x′i = hyU{| Uxi = hy| xi. A
norm of any vector is held invariant under unitary transformation as well. This is
easily checked by replacing jyi with jxi in the above. (2) Let U be a unitary matrix
and suppose that λ is an eigenvalue with jλi being its corresponding eigenvector of
that matrix. Then we have
ð1 - λλ Þ j λi = 0: ð14:90Þ
Thus, eigenvalues of a unitary matrix have unit absolute value. Let those eigen-
values be λk = eiθk ðk = 1, 2, ⋯; θk : realÞ. Then, from (12.11) we have
That is, any unitary matrix has a determinant of unit absolute value.
Example 14.2 Let us think of the following unitary matrix R:
$$R = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}. \tag{14.92}$$
A characteristic equation is
$$\begin{vmatrix}\cos\theta - \lambda & -\sin\theta\\ \sin\theta & \cos\theta - \lambda\end{vmatrix} = \lambda^{2} - 2\lambda\cos\theta + 1. \tag{14.93}$$
Its roots are $\lambda = e^{\pm i\theta}$. As a diagonalizing unitary matrix, we take
$$U = \begin{pmatrix}\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\\[4pt] -\dfrac{i}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}\end{pmatrix}, \qquad U^{\dagger} = \begin{pmatrix}\dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}\\[4pt] \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}}\end{pmatrix}.$$
Therefore, we have
$$U^{\dagger}RU = \begin{pmatrix}\dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}\\[4pt] \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}}\end{pmatrix}\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\\[4pt] -\dfrac{i}{\sqrt{2}} & \dfrac{i}{\sqrt{2}}\end{pmatrix} = \begin{pmatrix}e^{i\theta} & 0\\ 0 & e^{-i\theta}\end{pmatrix}. \tag{14.94}$$
A trace of the resulting matrix is 2 cos θ. In the case of π < θ < 2π, we get a
diagonal matrix similarly. The confirmation is left for readers. □
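A quick numerical confirmation of Example 14.2 (NumPy assumed, θ arbitrary) shows that the eigenvalues of R indeed have unit absolute value and that |det R| = 1:

```python
import numpy as np

theta = 1.2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

lams = np.linalg.eigvals(R)
print(np.round(lams, 6))                       # exp(+i*theta), exp(-i*theta)
print(np.allclose(np.abs(lams), 1.0))          # unit absolute value, as for any unitary matrix
print(np.isclose(abs(np.linalg.det(R)), 1.0))  # |det R| = 1
```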
The Hermitian quadratic forms appeared in, e.g., (13.34) and (13.83) in relation to
Gram matrices in Sect. 13.2. The Hermitian quadratic forms have wide applications
in the field of mathematical physics and materials science.
Let H be an Hermitian operator and jxi on an inner product space Vn. We define
the Hermitian quadratic form in an arbitrary orthonormal basis as follows:
x1
hxjHjxi x1 ⋯ xn H ⋮ = x ðH Þij xj ,
i,j i
xn
where jxi is represented as a column vector, as already mentioned in Sect. 13.4. Let
us start with unitary diagonalization of (13.40), where a Gram matrix is a kind of
Hermitian matrix. Following similar procedures, as in the case of (13.36) we obtain a
diagonal matrix and an inner product such that
λ1 ⋯ 0 ξ1
hxjHjxi = ξ1 ⋯ ξn ⋮ ⋱ ⋮ ⋮ = λ1 jξ1 j2 þ ⋯ þ λn jξn j2 : ð14:95Þ
0 ⋯ λn ξn
Notice, however, that the Gram matrix comprising basis vectors (that are linearly
independent) is positive definite. Remember that if a Gram matrix is constructed by
A{A (Hermitian as well) according to whether A is non-singular or singular, A{A is
either positive definite or non-negative. The Hermitian matrix we are dealing with
here, in general, does not necessarily possess the positive definiteness or
non-negative feature. Yet, remember that hx| H| xi and eigenvalues λ1, ⋯, λn
are real.
Positive definiteness of matrices is an important concept in relation to the
Hermitian (and real) quadratic forms (see Sect. 13.2). In particular, in the case
where all the matrix elements are real and | xi is defined in a real domain, we are
dealing with the quadratic form with respect to a real symmetric matrix. In the case
of the real quadratic forms, we sometimes adopt the following notation [1, 2]:
n x1
A½x xT Ax = a xx;
i,j = 1 ij i j
x= ⋮ ,
xn
Here, we wish to show that the positive definiteness is invariant under a trans-
formation PTAP, where P is non-singular. In other words, if A > 0, we must have
A0 = PT AP > 0: ð14:96Þ
A0 ≃ A:
Note that the equivalence transformation satisfies the equivalence law; readers,
check it. To prove the above proposition stated by (14.96), we take in advance the
notion of the polar decomposition of a matrix (see Sect. 24.9.1). The point is that any
real non-singular matrix P is uniquely decomposed into a product such that P = OS,
where O is an orthogonal matrix and S is a positive definite real symmetric matrix
(see Corollary 24.3).
Then, we have
A0 = ðOSÞT AðOSÞ = S OT AO S = S O - 1 AO S:
O ~ =O
~ T A0 O ~O
~TS O ~ O
~T A ~O ~= O
~ T SO ~ O
~ T SO ~O
~TA ~ O ~ = DAD,
~ T SO ð14:97Þ
d1 ⋯
~ T SO
D=O ~ = ⋮ ⋱ ⋮
⋯ dn
~TA
with di > 0 (i = 1, ⋯, n). Also, we have A = O ~ O. ~ > 0, we have A > 0.
~ Since A
Meanwhile, let us think of a following inner product xjDADjx , where x is an
arbitrary real vector in Vn. We have
x1 x1
xjDADjx = ðx1 ⋯ xn ÞDAD ⋮ = ½ðx1 ⋯ xn ÞDA D ⋮
xn xn
x1 d 1
= ðx1 d1 ⋯ xn d n ÞA ⋮ : ð14:98Þ
xn d n
λ1 ⋯ 0
O ^=
^ T AO ⋮ ⋱ ⋮ ,
0 ⋯ λn
T T n
det O AO = detO ðdetAÞ detO = ð ± 1ÞdetAð ± 1Þ = detA = i=1
λi :
where the principal submatrices mean a matrix made by striking out rows and
columns on diagonal elements. Then, we have
Proof First, suppose that A > 0. Then, in a quadratic form A[x] equating (n - k)
variables xl = 0 (l ≠ i1, i2, ⋯, ik), we obtain a quadratic form of
k
a x x :
μ,ν = 1 iμ iν iμ iν
Since A > 0, this (partial) quadratic form is positive definite as well; i.e., we have
AðkÞ > 0:
Therefore, we get
detAðkÞ > 0:
This is due to (13.48). Notice that detA(k) is said to be a principal minor. Thus, we
have proven ⟹ of (14.99).
To prove ⟸, in turn, we use mathematical induction. If n = 1, we have a trivial
case; i.e., A is merely a real positive number. Suppose that ⟸ is true of n - 1. Then,
we have A(n - 1) > 0 by supposition. Thus, it follows that it will suffice to show
A > 0 on condition that A(n - 1) > 0 and detA > 0 in addition. Let A be a n-
dimensional real symmetric non-singular matrix such that
Aðn - 1Þ a
A= ,
aT an
where A(n - 1) is a symmetric matrix and non-singular as well. We define P such that
-1
P= E Aðn - 1Þ a ,
0 1
E 0
PT = -1 :
aT Aðn - 1Þ 1
-1 T -1 T -1
aT Aðn - 1Þ = Aðn - 1Þ = Aðn - 1Þ
T
aT a:
Aðn - 1Þ 0
A = PT -1 P:
0 an - Aðn - 1Þ ½a
-1
E 0 Aðn - 1Þ 0 E Aðn - 1Þ a :
= ðn - 1Þ - 1 ðn - 1Þ - 1
aT A 1 0 an - A ½ a 0 1
ð14:100Þ
-1
detA = detPT detAðn - 1Þ an - Aðn - 1Þ ½a detP
-1
= detAðn - 1Þ an - Aðn - 1Þ ½ a :
-1
an - Aðn - 1Þ ½a > 0:
-1 xðn - 1Þ
Putting a~n an - Aðn - 1Þ ½a and x xn
, we get
Aðn - 1Þ 0
½x = Aðn - 1Þ xðn - 1Þ þ an xn 2 :
0 an
~ Aðn - 1Þ 0
A > 0:
0 an
Meanwhile, A is expressed as
~ or A ≃ A:
A = PT AP ~
a11 ⋯ a1k
A~ðkÞ = ⋮ ⋱ ⋮ :
ak1 ⋯ akk
$$H = \begin{pmatrix}5 & 2\\ 2 & 1\end{pmatrix}, \qquad \langle x|H|x\rangle = (x_1\ x_2)\begin{pmatrix}5 & 2\\ 2 & 1\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix}. \tag{14.101}$$
$$U = \begin{pmatrix}\dfrac{1}{\sqrt{4-2\sqrt{2}}} & -\dfrac{1}{\sqrt{4+2\sqrt{2}}}\\[6pt] \dfrac{1}{\sqrt{4+2\sqrt{2}}} & \dfrac{1}{\sqrt{4-2\sqrt{2}}}\end{pmatrix}. \tag{14.102}$$
5 2 x
hxjHjxi = ðx1 x2 ÞUU { UU { 1
2 1 x2
p
3þ2 2 0p x
= ðx1 x2 ÞU U{ 1 : ð14:104Þ
0 3-2 2 x2
Making an argument analogous to that of Sect. 13.2 and using similar notation,
we have
x~1
h~xjH 0 j~xi = ð x~1 x~2 ÞH 0 x~2
= ðx1 x2 ÞUU { HUU { x1
x2
= hxjHjxi: ð14:105Þ
x1 x1
= U{ : ð14:106Þ
x2 x2
Or, taking an adjoint of (14.106), we have equivalently ð x~1 x~2 Þ = ðx1 x2 ÞU.
x~1 x1
Notice here that x~2
and x2
are real and that U is real (i.e., orthogonal matrix).
Likewise,
H 0 = U { HU: ð14:107Þ
x1 ~x1
Thus, it follows that x2
and ~x2
are different column vector representations
of the identical vector that is viewed in reference to two different sets of orthonormal
bases (i.e., different coordinate systems).
From (14.102), as an approximation we have
$$\frac{\tilde{x}_1^{2}}{\left(\dfrac{1}{\sqrt{3+2\sqrt{2}}}\right)^{2}} + \frac{\tilde{x}_2^{2}}{\left(\dfrac{1}{\sqrt{3-2\sqrt{2}}}\right)^{2}} = 1. \tag{14.109}$$
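The positive definiteness of H in this example can also be checked numerically, either from its eigenvalues 3 ± 2√2 or from the leading principal minors of (14.99); the following sketch (NumPy assumed) does both.

```python
import numpy as np

H = np.array([[5.0, 2.0],
              [2.0, 1.0]])

print(np.round(np.linalg.eigvalsh(H), 6))        # 3 - 2*sqrt(2), 3 + 2*sqrt(2): both positive
# Leading principal minors, cf. (14.99): det H^(1) = 5 > 0, det H^(2) = 1 > 0
print(np.linalg.det(H[:1, :1]), np.linalg.det(H))
```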
Theorem 14.14 [4] Two Hermitian matrices B and C commute if and only if there
exists a complete orthonormal set of common eigenvectors.
Proof Suppose that there exists a complete orthonormal set of common eigenvec-
tors {jxi; ki (1 ≤ k ≤ mi)} that span the linear vector space, where i and mi are a
positive integer and that jxii corresponds to an eigenvalue bi of B and ci of C. Note
that if mi is 1, we say that the spectrum is non-degenerate and that if mi is equal to
two or more, the spectrum is said to be degenerate. Then we have
Therefore, BC(| xii) = B(ci| xii) = ciB(| xii) = cibi j xii. Similarly, CB(| xii) = cibi j xii.
Consequently, (BC - CB)(| xii) = 0 for any jxii. As all the set of jxii span the vector
space, BC - CB = 0, namely BC = CB.
In turn, assume that BC = CB and that B(| xii) = bi j xii, where jxii are
orthonormal. Then, we have
This implies that C(| xii) is an eigenvector of B corresponding to the eigenvalue bi.
We have two cases.
1. The spectrum is non-degenerate: The spectrum is said to be non-degenerate if
only one eigenvector belongs to an eigenvalue. In other words, multiplicity of bi
is one. Then, C(| xii) must be equal to some constant times jxii; i.e., C(| xii) = ci j xii.
That is, jxii is an eigenvector of C corresponding to an eigenvalue ci. That is, jxii
is a common eigenvector to B and C. This completes the proof for the
non-degenerate case.
2. The spectrum is degenerate: The spectrum is said to be degenerate if two or more
eigenvectors belong to an eigenvalue. Multiplicity of bi is two or more; here
ð1Þ ð2Þ ðmÞ
suppose that the multiplicity is m (m ≥ 2). Let j xi i, j xi i, ⋯, and j xi i
be linearly independent vectors and belong to the eigenvalue bi of B. Then, from
the assumption we have m eigenvectors
γ1
m ðk Þ ð1Þ ðmÞ
C γ jx i = jxi i⋯jxi i C
k=1 k i
⋮
γm
α11 ⋯ α1m γ1
ð1Þ ðmÞ
= jxi i⋯jxi i ⋮ ⋱ ⋮ ⋮ : ð14:113Þ
αm1 ⋯ αmm γm
where the third equality comes from the orthonormality of the basis vectors.
Meanwhile,
ðlÞ ðk Þ ðk Þ ðlÞ ðk Þ ðlÞ
xi jCxi = xi jC{ xi = xi jCxi = αkl , ð14:115Þ
where the second equality comes from the Hermiticity of C. From (14.114) and
(14.115), we get
m ðk Þ m ðjÞ
C γ jx
k=1 k i
=c γ jx
j=1 j i
: ð14:117Þ
α11 ⋯ α1m γ1 γ1
ð1Þ ðmÞ ð1Þ ðmÞ
jxi i⋯jxi i ⋮ ⋱ ⋮ ⋮ = jxi i⋯jxi i c ⋮ : ð14:118Þ
αm1 ⋯ αmm γm γm
ðjÞ
The vectors j xi ið1 ≤ j ≤ mÞ span an invariant subspace (i.e., an eigenspace
corresponding to an eigenvalue of bi). Let us call this subspace Wm. Consequently,
ðjÞ
in (14.118) we can equate the scalar coefficients of individual j xi ið1 ≤ j ≤ mÞ.
Then, we get
α11 ⋯ α1m γ1 γ1
⋮ ⋱ ⋮ ⋮ =c ⋮ : ð14:119Þ
αm1 ⋯ αmm γm γm
ðμÞ ð μÞ
α11 ⋯ α1m γ1 γ1
⋮ ⋱ ⋮ ⋮ = cμ ⋮ : ð14:120Þ
αm1 ⋯ αmm γ ðmμÞ γ ðmμÞ
Equation (14.120) implies that we can construct a (m, m) unitary matrix from the
m orthonormal column vectors γ (μ). Using the said unitary matrix, we will be able to
diagonalize (αij) according to Theorem 14.5.
Having determined m eigenvectors γ (μ) (1 ≤ μ ≤ m), we can construct a set of
eigenvectors such that
ð μÞ m ð μÞ ðk Þ
yi = γ
k=1 k
xi : ð14:121Þ
ð μÞ
Finally let us confirm that j yi ið1 ≤ μ ≤ mÞ in fact constitute an orthonormal
basis. To show this, we have
The last equality comes from the fact that a matrix comprising m orthonormal
ðμÞ
column vectors γ (μ) (1 ≤ μ ≤ m) forms a unitary matrix. Thus, j yi ið1 ≤ μ ≤ mÞ
certainly constitute an orthonormal basis.
The above completes the proof.
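Theorem 14.14 can be illustrated with a small commuting pair of Hermitian matrices. In the sketch below (NumPy assumed; the pair is an arbitrary illustration with a non-degenerate spectrum of B), the eigenvectors of B simultaneously diagonalize C.

```python
import numpy as np

# A commuting pair of Hermitian (here real symmetric) matrices -- an illustrative choice
B = np.array([[2.0, 1.0],
              [1.0, 2.0]])
C = np.array([[1.0, 2.0],
              [2.0, 1.0]])
print(np.allclose(B @ C, C @ B))          # True: B and C commute

_, U = np.linalg.eigh(B)                  # common orthonormal eigenvectors of B
print(np.round(U.conj().T @ B @ U, 6))    # diagonal
print(np.round(U.conj().T @ C @ U, 6))    # simultaneously diagonal (Theorem 14.14)
```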
Theorem 14.14 can be restated as follows: Two Hermitian matrices B and C can
be simultaneously diagonalized by a unitary similarity transformation. As mentioned
above, we can construct a unitary matrix U such that
ð1Þ ðmÞ
γ1 ⋯ γ1
U= ⋮ ⋱ ⋮ :
γ ðm1Þ ⋯ γ ðmmÞ
bi c1
bi c2
U { BU = , U { CU = : ð14:123Þ
⋱ ⋱
bi cm
1 1
A= A þ A{ þ i A - A{ , ð14:124Þ
2 2i
A = B þ iC: ð14:125Þ
Note here that B and C commute if and only if A and A{ commute, that is A is a
normal matrix. In fact, from (14.125) we get
From (14.126), if and only if B and C commute, AA{ - A{A = 0, i.e., AA{ = A{A.
This indicates that A is a normal matrix.
Taking account of the above context along with Theorem 14.14, we see that we
can construct the diagonalizing matrix of A from the complete orthonormal set of
eigenvectors common to B and C. This immediately implies that the diagonalizing
matrix is unitary. In other words, with respect to a normal operator A on Vn a
diagonalizing unitary matrix U must exist. On top of it, such matrix
U diagonalizes the Hermitian matrices B and C at once. Then, we have
U { AU = U { BU þ iU { CU
λ1 ⋯ μ1 ⋯ λ1 þ iμ1 ⋯
= ⋮ ⋱ ⋮ þi ⋮ ⋱ ⋮ = ⋮ ⋱ ⋮ ,
⋯ λn ⋯ μn ⋯ λn þ iμn
ð14:127Þ
ω1
⋱
ω1
ω2
⋱
U { AU = , ð14:128Þ
ω2
⋱
ωs
⋱
ωs
References
1. Satake I-O (1975) Linear algebra (pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Byron FW, Fuller RW (1992) Mathematics of classical and quantum physics. Dover, New York
5. Mirsky L (1990) An introduction to linear algebra. Dover, New York
Chapter 15
Exponential Functions of Matrices
In Chap. 12, we dealt with a function of matrices. In this chapter we study several
important definitions and characteristics of functions of matrices. If elements of
matrices consist of analytic functions of a real variable, such matrices are of
particular importance. For instance, differentiation can naturally be defined with
the functions of matrices. Of these, exponential functions of matrices are widely used
in various fields of mathematical physics. These functions frequently appear in a
system of differential equations. In Chap. 10, we showed that SOLDEs with suitable
BCs can be solved using Green’s functions. In the present chapter, in parallel, we
show a solving method based on resolvent matrices. The exponential functions of
matrices have broad applications in group theory that we will study in Part IV. In
preparation for it, we study how the collection of matrices forms a linear vector
space. In accordance with Chap. 13, we introduce basic notions of inner product and
norm to the matrices.
dAðt Þ
A0 ðt Þ = a0ij ðt Þ : ð15:2Þ
dt
We have
$$\exp A \equiv E + A + \frac{1}{2!}A^{2} + \cdots + \frac{1}{\nu!}A^{\nu} + \cdots, \tag{15.7}$$
where A is a (n, n) square matrix and E is a (n, n) identity matrix. Note that E is used
instead of a number 1 that appears in the power series expansion of expx described as
$$\exp x \equiv 1 + x + \frac{1}{2!}x^{2} + \cdots + \frac{1}{\nu!}x^{\nu} + \cdots.$$
Let A be a (n, n) square matrix. Putting A = (aij) and $\max_{i,j}|a_{ij}| = M$, and defining $A^{\nu} \equiv \left(a_{ij}^{(\nu)}\right)$, where $a_{ij}^{(\nu)}$ is defined as the (i, j) component of $A^{\nu}$, we get
$$\left|a_{ij}^{(\nu)}\right| \le n^{\nu}M^{\nu} \quad (1 \le i, j \le n). \tag{15.8}$$
ð0Þ
Note that A0 = E. Hence, aij = δij . Then, if ν = 0, (15.8) routinely holds. When
ν = 1, (15.8) obviously holds as well from the definition of M and assuming n ≥ 2;
we are considering (n, n) matrices with n ≥ 2. We wish to show that (15.8) holds with
any ν (≥2) using mathematical induction. Suppose that (15.8) holds with ν = ν - 1.
That is,
ðν - 1Þ
aij ≤ nν - 1 M ν - 1 ð1 ≤ i, j ≤ nÞ:
Then, we have
$$\left(A^{\nu}\right)_{ij} \equiv a_{ij}^{(\nu)} = \left(A^{\nu-1}A\right)_{ij} = \sum_{k=1}^{n}\left(A^{\nu-1}\right)_{ik}(A)_{kj} = \sum_{k=1}^{n} a_{ik}^{(\nu-1)}a_{kj}. \tag{15.9}$$
Thus, we get
$$\left|a_{ij}^{(\nu)}\right| = \left|\sum_{k=1}^{n} a_{ik}^{(\nu-1)}a_{kj}\right| \le \sum_{k=1}^{n}\left|a_{ik}^{(\nu-1)}\right||a_{kj}| \le n\cdot n^{\nu-1}M^{\nu-1}M = n^{\nu}M^{\nu}. \tag{15.10}$$
$$e^{nM} = \sum_{\nu=0}^{\infty}\frac{1}{\nu!}n^{\nu}M^{\nu}. \tag{15.11}$$
The series (15.11) certainly converges; note that nM is not a matrix but a number.
From (15.10) we obtain
$$\sum_{\nu=0}^{\infty}\frac{1}{\nu!}\left|a_{ij}^{(\nu)}\right| \le \sum_{\nu=0}^{\infty}\frac{1}{\nu!}n^{\nu}M^{\nu} = e^{nM}.$$
This implies that $\sum_{\nu=0}^{\infty}\frac{1}{\nu!}a_{ij}^{(\nu)}$ converges (i.e., absolute convergence [3]). Thus, from Definition 15.1, (15.7) converges.
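The convergence of the series (15.7) can also be observed numerically by comparing its partial sums with a library implementation of the matrix exponential; the sketch below assumes NumPy and SciPy (`scipy.linalg.expm`), and the matrix A is an arbitrary illustration.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

# Partial sums of the series (15.7): E + A + A^2/2! + ...
S, term = np.eye(2), np.eye(2)
for nu in range(1, 20):
    term = term @ A / nu
    S = S + term

print(np.allclose(S, expm(A)))   # the truncated series converges to exp A
```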
To further examine the exponential functions of matrices, in the following
proposition, we show that a collection of matrices forms a linear vector space.
Proposition 15.1 Let M be a collection of all (n, n) square matrices. Then, M forms
a linear vector space.
Proof Let A = aij 2 M and B = bij 2 M be (n, n) square matrices. Then,
αA = (αaij) and A + B = (aij + bij) are again (n, n) square matrices, and so we
have αA 2 M and A þ B 2 M . Thus, M forms a linear vector space.
2
In Proposition 15.1, M forms a n2-th order vector space V n . To define an inner
ðk Þ
product in this space, we introduce j PðlÞ ðk, l = 1, 2, ⋯, nÞ as a basis set of n2
vectors. Now, let A = (aij) be a (n, n) square matrices. To explicitly show that A is a
vector in the inner product space, we express it as
n ðk Þ
j Ai = a
k,l = 1 kl
j PðlÞ :
n ðk Þ
hA jj Ai{ = a
k,l = 1 kl
PðlÞ j , ð15:12Þ
ðk Þ ðk Þ
where PðlÞ j is an adjoint matrix of j PðlÞ . With the notations in (15.12), see Sects.
ðk Þ
1.4 and 13.3; for PðlÞ see Sect. 14.1. Meanwhile, let B = (bij) be another (n, n) square
matrix denoted by
n ðsÞ
j Bi = b
s,t = 1 st
j PðtÞ :
ðk Þ
Here, let us define an orthonormal basis set j PðlÞ and an inner product between
vectors described in a form of matrices such that
ðk Þ ðsÞ
PðlÞ jPðtÞ δks δlt :
n ðk Þ ðsÞ n
hAjBi a b
k,l,s,t = 1 kl st
PðlÞ jPðtÞ = a b δ δ
k,l,s,t = 1 kl st ks lt
n
= a b :
k,l = 1 kl kl
ð15:13Þ
The relation (15.13) leads to a norm already introduced in (13.8). The norm of a
matrix is defined as
n 2
kAk hAjAi = i,j = 1
aij : ð15:14Þ
Proof Let C = AB. Then, from Cauchy–Schwarz inequality (Sect. 13.1), we get
2 2 2
kC k2 = cij = a b
k ik kj
≤ jaik j2 blj
i, j i, j i, j k l
2
= i,k
jaik j2 blj = kAk2 kBk2 : ð15:16Þ
j, l
Proof Since A and B are mutually commutative, (A + B)n can be expanded through
the binomial theorem such that
$$\frac{1}{n!}(A+B)^{n} = \sum_{k=0}^{n}\frac{A^{k}}{k!}\frac{B^{n-k}}{(n-k)!}. \tag{15.18}$$
(Fig. 15.1 The combinations (k, l) appearing in Rm; reproduced from [4] (in Japanese), with the permission of Baifukan Co., Ltd., Tokyo.)
$$\sum_{n=0}^{2m}\frac{1}{n!}(A+B)^{n} = \left(\sum_{n=0}^{m}\frac{A^{n}}{n!}\right)\left(\sum_{n=0}^{m}\frac{B^{n}}{n!}\right) + R_m, \tag{15.19}$$
where Rm has a form of Ak Bl =k!l!, where the summation is taken over all
ðk , lÞ
combinations of (k, l ) that satisfy max(k, l) ≥ m + 1 and k + l ≤ 2m. The number of all
the combinations (k, l ) is m(m + 1) accordingly; see Fig. 15.1 [4]. We remark that if
in (15.19) we put m = 0, (15.19) trivially holds with Rm = 0 (i.e., zero matrix). The
matrix Rm, however, changes with increasing m, and so we must evaluate it at
m → 1. Putting max(| |A|| , | |B|| , 1) = C and using Theorem 15.1, we get
jjAjjk jjBjjl
jjRm jj ≤ : ð15:20Þ
k!l!
ðk, lÞ
where m(m + 1) in the numerator comes from the number of combinations (k, l ) as
pointed out above. From Stirling’s interpolation formula [5], we have
p
Γðz þ 1Þ ≈ 2πe - z zzþ2 :
1
ð15:22Þ
Then, we have
C2m em - 1 C 2m
½RHS of ð15:21Þ = ≈ p
ðm - 1Þ! 2π ðm - 1Þm - 2
1
m - 12
C eC 2
=p ð15:24Þ
2πe m - 1
and
m - 12
C eC 2
lim p = 0: ð15:25Þ
m → 1 2πe m - 1
lim kRm k = 0:
m→1
ð15:26Þ
Remember that kRmk = 0 if and only if Rm = 0 (zero matrix); see (13.4) and
(13.8) in Sect. 13.1. Finally, taking limit (m → 1) of both sides of (15.19), we
obtain
2m 1 m An m Bn
lim n = 0 n!
ðA þ BÞn = lim n = 0 n! n = 0 n!
þ Rm :
m→1 m→1
2m 1 m An m Bn
lim n=0
ðA þ BÞn = lim n=0 n=0
: ð15:27Þ
m→1 n! m→1 n! n!
This implies that (15.17) holds. It is obvious that if A and B commute, then expA
and expB commute. That is, expA exp B = exp B exp A. These complete the proof.
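Theorem 15.2 — and the failure of (15.17) for non-commuting matrices — is easy to demonstrate numerically; the sketch below assumes NumPy and SciPy and uses arbitrarily chosen matrices.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0], [0.0, 3.0]])
B = 2.0 * A                                   # commutes with A
C = np.array([[0.0, 1.0], [0.0, 0.0]])        # does not commute with A

print(np.allclose(expm(A + B), expm(A) @ expm(B)))   # True for commuting A, B
print(np.allclose(expm(A + C), expm(A) @ expm(C)))   # generally False otherwise
```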
From Theorem 15.2, we immediately get the following property.
To show this, it suffices to find that A and -A commute. That is, A(-A) = -
A2 = (-A)A. Replacing B with -A in (15.17), we have
This leads to (15.28). Notice here that exp0 = E from (15.7). We further have
important properties of exponential functions of matrices.
$$\left(P^{-1}AP\right)^{n} = P^{-1}A\,PP^{-1}A\,PP^{-1}\cdots PP^{-1}AP = P^{-1}AEAE\cdots EAP = P^{-1}A^{n}P. \tag{15.33}$$
$$\exp\left(P^{-1}AP\right) = E + P^{-1}AP + \frac{1}{2!}\left(P^{-1}AP\right)^{2} + \cdots + \frac{1}{n!}\left(P^{-1}AP\right)^{n} + \cdots = P^{-1}EP + P^{-1}AP + P^{-1}\frac{A^{2}}{2!}P + \cdots + P^{-1}\frac{A^{n}}{n!}P + \cdots = P^{-1}\left(E + A + \frac{A^{2}}{2!} + \cdots + \frac{A^{n}}{n!} + \cdots\right)P = P^{-1}(\exp A)P. \tag{15.34}$$
= ðA n Þ{ :
n
= ðAn ÞT and A{
n
AT ð15:35Þ
The confirmation of (15.31) and (15.32) is left for readers. For future purpose, we
describe other important theorem and properties of the exponential functions of
matrices.
Theorem 15.3 Let a1, a2, ⋯, an be eigenvalues of a (n, n) square matrix A. Then,
expa1, exp a2, ⋯, exp an are eigenvalues of expA.
Proof From Theorem 12.1 we know that every (n, n) square matrix can be
converted to a triangle matrix by similarity transformation. Also, we know that
eigenvalues of a triangle matrix are given by its diagonal elements (Sect. 12.1). Let
P be a non-singular matrix used for such a similarity transformation. Then, we have
15.2 Exponential Functions of Matrices and Their Manipulations 615
a1
P - 1 AP ¼ ⋱ : ð15:36Þ
0 an
t 11
T¼ ⋱ : ð15:37Þ
0 t nn
tm
11
Tm ¼ ⋱ :
0 tm
nn
am
1
m
P - 1 AP ¼ ⋱ ¼ P - 1 Am P, ð15:38Þ
0 am
n
where the last equality is due to (15.33). Summing (15.38) over m following (15.34),
we obtain another triangle matrix expressed as
Once again considering that eigenvalues of a triangle matrix are given by its
diagonal elements and that the eigenvalues are invariant under a similarity transfor-
mation, (15.39) implies that expa1, exp a2, ⋯, exp an are eigenvalues of expA.
These complete the proof.
A question of whether the eigenvalues are a proper eigenvalue or a generalized
eigenvalue is irrelevant to Theorem 15.3. In other words, the eigenvalue may be a
proper one or a generalized one (Sect. 12.6). That depends solely upon the nature of
A in question.
Theorem 15.3 immediately leads to a next important relation. Taking a determi-
nant of (15.39), we get
To derive (15.40), refer to (11.59) and (12.10) as well. Since (15.40) represents an
important property of the exponential functions of matrices, we restate it separately
as a following formula [4]:
We have examined general aspects until now. Nonetheless, we have not yet
studied the relationship between the exponential functions of matrices and specific
types of matrices so far. Regarding the applications of exponential functions of
matrices to mathematical physics, especially with the transformations of functions
and vectors, we frequently encounter orthogonal matrices and unitary matrices. For
this purpose, we wish to examine the following properties:
(6) Let A be a real skew-symmetric matrix. Then, expA is a real orthogonal matrix.
(7) Let A be an anti-Hermitian matrix. Then, expA is a unitary matrix.
To show (6), we note that a skew-symmetric matrix is described as
AT = - A or AT þ A = 0: ð15:42Þ
where the last equality comes from Property (3) of (15.31). Equation (15.43) implies
that expA is a real orthogonal matrix.
Since an anti-Hermitian matrix A is expressed as
A{ = - A or A{ þ A = 0, ð15:44Þ
where the last equality comes from Property (4) of (15.32). This implies that expA is
a unitary matrix.
Regarding Properties (6) and (7), if A is replaced with tA (t : real), the relations
(15.42)–(15.45) are held unchanged. Then, Properties (6) and (7) are rewritten as
(6)′ Let A be a real skew-symmetric matrix. Then, exptA (t : real) is a real
orthogonal matrix.
(7)′ Let A be an anti-Hermitian matrix. Then, exptA (t : real) is a unitary matrix.
Now, let a function F(t) be expressed as F(t) = exp tx with real numbers t and x.
Then, we have a well-known formula
15.2 Exponential Functions of Matrices and Their Manipulations 617
d
F ðt Þ = xF ðt Þ: ð15:46Þ
dt
Next, we extend (15.46) to the exponential functions of matrices. Let us define the
differentiation of an exponential function of a matrix with respect to a real parameter
t. Then, in concert with (15.46), we have a following important theorem.
Theorem 15.4 [1, 2] Let F(t) exp tA with t being a real number and A being a
matrix that does not depend on t. Then, we have
d
F ðt Þ = AF ðt Þ: ð15:47Þ
dt
where we considered that tA and ΔtA are commutative and used Theorem 15.2.
Therefore, we get
1 1 1 1 1
½F ðt þ Δt Þ - F ðt Þ = ½F ðΔt Þ - E F ðt Þ = ðΔtAÞν - E F ðt Þ
Δt Δt Δt ν=0 ν!
1 1
= ðΔt Þν - 1 Aν F ðt Þ: ð15:49Þ
ν=1 ν!
1 d
lim ½F ðt þ Δt Þ - F ðt Þ F ðt Þ = AF ðt Þ:
Δt → 0 Δt dt
Notice that in (15.49) only the first term (i.e., ν = 1) of RHS is non-vanishing with
respect to Δt → 0. Then, (15.47) will follow. This completes the proof.
On the basis of the power series expansion of the exponential function of a matrix
defined in (15.7), A and F(t) are commutative. Therefore, instead of (15.47) we may
write
d
F ðt Þ = F ðt ÞA: ð15:50Þ
dt
Using Theorem 15.4, we show the following important properties. These are
converse propositions to Properties (6) and (7).
618 15 Exponential Functions of Matrices
(8) Let exptA be a real orthogonal matrix with any real number t. Then, A is a real
skew-symmetric matrix.
(9) Let exptA be a unitary matrix with any real number t. Then, A is an anti-
Hermitian matrix.
To show (8), from the assumption with any real number t, we have
Since (15.52) must hold with any real number, (15.52) must hold with t → 0 as
well. On this condition, from (15.52) we get A + AT = 0. That is, A is a real skew-
symmetric matrix.
In the case of (9), similarly we have
15.3.1 Introduction
In Chap. 10 we have dealt with SOLDEs using Green’s functions. In the present
section, we examine the properties of systems of differential equations. There is a
close relationship between the two topics. First we briefly outline it.
We describe an example of the system of differential equations. First, let us think
of the following general equation:
where x(t) and y(t) vary as a function of t; a(t), b(t), c(t) and d(t) are coefficients.
Differentiating (15.54) with respect to t, we get
= að_t Þxðt Þ þ aðt Þxð_t Þ þ bð_t Þyðt Þ þ bðt Þ½ cðt Þxðt Þ þ d ðt Þyðt Þ
= að_t Þ þ bðt Þcðt Þ xðt Þ þ aðt Þxð_t Þ þ bð_t Þ þ bðt Þdðt Þ yðt Þ, ð15:56Þ
where with the second equality we used (15.55). To delete y(t) from (15.56), we
multiply b(t) on both sides of (15.56) and multiply bð_t Þ þ bðt Þdðt Þ on both sides of
(15.54) and further subtract the latter equation from the former. As a result, we
obtain
Upon the above derivation, we assume b ≠ 0. Solving (15.57) and substituting the
resulting solution for (15.54), we can get a solution for y(t). If, on the other hand,
b = 0, we have xð_t Þ = aðt Þxðt Þ. This is a simple FOLDE and can readily be integrated
to yield
t
x = C exp aðt 0 Þdt 0 , ð15:58Þ
_ þ d y_ = c_ x þ cax þ dy
€y = c_ x þ c_x þ dy _ þ dy_
t
= ðc_ þ caÞ C exp aðt 0 Þdt 0 _ þ d y,_
þ dy
where we used (15.58) as well as x_ = ax. Thus, we are going to solve a following
inhomogeneous SOLDE:
t
_ = ðc_ þ caÞC exp
€y - d y_ - dy aðt 0 Þdt 0 : ð15:59Þ
In Sect. 10.2 we have shown how we are able to solve an inhomogeneous FOLDE
using a weight function. For a later purpose, let us consider a next FOLDE assuming
a(x) 1 in (10.23):
620 15 Exponential Functions of Matrices
t
xðt Þ = C exp - pðt 0 Þdt 0 uðt Þ: ð15:62Þ
where we assume that the functional form u(t) remains unchanged in the inhomo-
geneous equation (15.60) and that instead the “constant” k(t) may change as a
function of t. Inserting (15.63) into (15.60), we have
_ þ ku_ þ pku = ku
x_ þ px = ku _ þ kðu_ þ puÞ = ku
_ þ k ð- pu þ puÞ
_ = q,
= ku ð15:64Þ
where with the third equality we used (15.61) and (15.62) to get u_ = - pu. In this
way, (15.64) can easily be integrated to give
t
qð t 0 Þ 0
k ðt Þ = dt þ C,
uð t 0 Þ
t
qðt 0 Þ 0
xðt Þ = k ðt Þuðt Þ = uðt Þ dt þ Cuðt Þ: ð15:65Þ
uðt 0 Þ
Comparing (15.65) with (10.29), we notice that these two expressions are related.
In other words, the method of variation of constants in (15.65) is essentially the same
as that using the weight function in (10.29).
Another important point to bear in mind is that in Chap. 10 we have solved
SOLDEs of constant coefficients using the method of Green’s functions. In that case,
we have separately estimated a homogeneous term and inhomogeneous term (i.e.,
surface term). In the present section, we are going to seek a method corresponding to
that using Green’s functions. In the above discussion, we see that a system of
differential equations with two unknows can be translated into a SOLDE. Then,
15.3 System of Differential Equations 621
The equations of (15.54) and (15.55) are unified in a homogeneous single matrix
equation such that
að t Þ cð t Þ
xð_t Þ yð_t Þ = ðxðt Þ yðt ÞÞ : ð15:66Þ
bð t Þ d ðt Þ
In (15.66), x(t) and y(t) and their derivatives represent the functions (of t), and so
we describe them as row matrices. With this expression of equation, see Sect. 11.2
and (11.37) therein. Another expression is a transposition of (15.66) described by
xð_t Þ að t Þ bð t Þ xð t Þ
= : ð15:67Þ
yð_t Þ cð t Þ d ð t Þ yð t Þ
Notice that the coefficient matrix has been transposed in (15.67) accordingly. We
are most interested in an equation of constant coefficient, and hence, we rewrite
(15.66) once again explicitly as
a c
xð_t Þ yð_t Þ = ðxðt Þ yðt ÞÞ ð15:68Þ
b d
or
d a c
ð xð t Þ yð t Þ Þ = ð xð t Þ yð t Þ Þ : ð15:69Þ
dt b d
dF ðt Þ a c
= F ðt Þ : ð15:70Þ
dt b d
a c
A= : ð15:71Þ
b d
Then, we get
1 1
exp tA E þ tA þ ðtAÞ2 þ ⋯ þ ðtAÞν þ ⋯: ð15:73Þ
2! ν!
Note that if A (or tA) is a (n, n) matrix, F(t) of (15.72) is a (n, n) matrix as well.
Since in that general case F(t) is non-singular, it consists of n linearly independent
column vectors, i.e., (n, 1) matrices; see Chap. 11. Then, F(t) is symbolically
described as
F ðt Þ = ðx1 ðt Þ x2 ðt Þ ⋯ xn ðt ÞÞ,
where xi(t) (1 ≤ i ≤ n) represents a i-th column vector. Note that since F(0) = E, xi(0) has
its i-th row component of 1, otherwise 0. Since F(t) is non-singular, n column vectors
xi(t) (1 ≤ i ≤ n) are linearly independent. In the case of, e.g., (15.66), A is a (2, 2)
matrix, which implies that x(t) and y(t) represent two linearly independent column
vectors, i.e., (2, 1) matrices. Furthermore, if we look at the constitution of (15.70), x(t)
and y(t) represent two linearly independent solutions of (15.70). We will come back to
this point later.
On the basis of the method of variation of constants, we deal with inhomogeneous
equations and describe below the procedures for solving the system of equations
with two unknowns. But, they can be readily conformed to the general case where
the equations with n unknowns are dealt with.
Now, we rewrite (15.68) symbolically such that
a c
xð_t Þ yð_t Þ = ðxðt Þ yðt ÞÞ þ ðp qÞ: ð15:77Þ
b d
Using the same solution F(t) that appears in the homogeneous equation, we
assume that the solution of the inhomogeneous equation is described by
where k(t) is a variable “constant.” Replacing x(t) in (15.76) with x(t) = k(t)F(t) of
(15.78), we obtain
_ þ kF_ = kF
x_ = kF _ þ kFA = kFA þ b, ð15:79Þ
_ = b:
kF ð15:80Þ
t
k= ds bðsÞF ðsÞ - 1 : ð15:81Þ
t
xðt Þ = kðt ÞF ðt Þ = ds bðsÞF ðsÞ - 1 F ðt Þ : ð15:82Þ
t0
Here we define
The matrix Rðs, t Þ is said to be a resolvent matrix [6]. This matrix plays an
essential role in solving the system of differential equations both homogeneous and
inhomogeneous with either homogeneous or inhomogeneous boundary conditions
(BCs). Rewriting (15.82), we get
624 15 Exponential Functions of Matrices
t
xðt Þ = kðt ÞF ðt Þ = ds½bðsÞRðs, t Þ : ð15:84Þ
t0
-1
ð2Þ Rðs, t Þ - 1 = F ðsÞ - 1 F ðt Þ = F ðt Þ - 1 F ðsÞ = Rðt, sÞ: ð15:86Þ
ð3Þ Rðs, t ÞRðt, uÞ = F ðsÞ - 1 F ðt ÞF ðt Þ - 1 F ðuÞ = F ðsÞ - 1 F ðuÞ = Rðs, uÞ: ð15:87Þ
uðaÞ = σ, ð15:88Þ
where u(t) is a solution of a FOLDE. In that case, the BC can be translated into the
“initial” condition. In the present case, we describe the BCs as
where σ x(t0) and τ y(t0). Then, the said term associated with the inhomoge-
neous BCs is expected to be written as
xðt 0 ÞRðt 0 , t Þ:
In fact, since xðt 0 ÞRðt 0 , t 0 Þ = xðt 0 ÞE = xðt 0 Þ, this satisfies the BCs. Then, the full
expression of the solution of the inhomogeneous equation that takes account of the
BCs is assumed to be expressed as
t
xðt Þ = xðt 0 ÞRðt 0 , t Þ þ ds½bðsÞRðs, t Þ: ð15:90Þ
t0
Let us confirm that (15.90) certainly gives a proper solution of (15.76). First we
give a formula on the differentiation of
15.3 System of Differential Equations 625
t
d
Qðt Þ = Pðt, sÞds, ð15:91Þ
dt t0
where P(t, s) stands for a (1, 2) matrix whose general form is described by (q(t, s)
r(t, s)), in which q(t, s) and r(t, s) are functions of t and s. We rewrite (15.91) as
tþΔ t
1
Qðt Þ = lim Pðt þ Δ, sÞds - Pðt, sÞds
Δ→0 Δ t0 t0
tþΔ t t
1
= lim Pðt þ Δ, sÞds - Pðt þ Δ, sÞds þ Pðt þ Δ, sÞds
Δ→0 Δ t0 t0 t0
t
- Pðt, sÞds
t0
tþΔ t t
1
= lim Pðt þ Δ, sÞds þ Pðt þ Δ, sÞds - Pðt, sÞds
Δ→0 Δ t t0 t0
t t
1 1
≈ lim Pðt, t ÞΔ þ lim Pðt þ Δ, sÞds - Pðt, sÞds
Δ→0 Δ Δ→0 Δ t0 t0
t
∂Pðt, sÞ
= Pðt, t Þ þ ds: ð15:92Þ
t0 ∂t
d t t
∂Rðs, t Þ
bðsÞRðs, t Þds = bðt ÞRðt, t Þ þ bðsÞ ds
dt t0 t0 ∂t
t
∂Rðs, t Þ
= bð t Þ þ bð s Þ ds, ð15:93Þ
t0 ∂t
where with the last equality we used (15.85). Considering (15.83) and (15.93) and
differentiating (15.90) with respect to t, we get
t
xð_t Þ = xðt0 ÞF ðt 0 Þ - 1 F ð_t Þ þ ds bðsÞF ðsÞ - 1 F ð_t Þ þ bðt Þ
t0
t
= xðt 0 ÞF ðt 0 Þ - 1 F ðt ÞA þ ds bðsÞF ðsÞ - 1 F ðt ÞA þ bðt Þ
t0
t
= xðt 0 ÞRðt 0 , t ÞA þ ds½bðsÞRðs, t ÞA þ bðt Þ
t0
626 15 Exponential Functions of Matrices
t
= xðt 0 ÞRðt 0 , t Þ þ ds½bðsÞRðs, t Þ A þ bðt Þ
t0
where with the second equality we used (15.70) and with the last equality we used
(15.90). Thus, we have certainly recovered the original differential equation of
(15.76). Consequently, (15.90) is the proper solution for the given inhomogeneous
equation with inhomogeneous BCs.
The above discussion and formulation equally apply to the general case of the
differential equation with n unknowns, even though the calculation procedures
become increasingly complicated with the increasing number of unknowns.
xð_t Þ = x þ 2y þ 1,
yð_t Þ = 2x þ y þ 2, ð15:95Þ
1 2
xð_t Þ yð_t Þ = ðx yÞ þ ð1 2Þ: ð15:96Þ
2 1
1 2
xð_t Þ yð_t Þ = ðx yÞ : ð15:97Þ
2 1
As discussed in Sect. 15.3.2, this can readily be solved to produce a solution F(t)
such that
15.3 System of Differential Equations 627
where A = 1
2
2
1
. Since A is a real symmetric (i.e., Hermitian) matrix, it should be
diagonalized by the unitary similarity transformation (see Sect. 14.3). For the
purpose of getting an exponential function of a matrix, it is easier to estimate
exptD (where D is a diagonal matrix) than to directly calculate exptA. That is, we
rewrite (15.97) as
1 1
p p
U= 2 2
1 1
-p p
2 2
and
1 2 -1 0
D = U -1 U = U - 1 AU = : ð15:100Þ
2 1 0 3
Or we have
A = UDU - 1 : ð15:101Þ
Hence, we get
where with the last equality we used (15.30). From Theorem 15.3, we have
e-t 0
exptD = : ð15:103Þ
0 e3t
In turn, we get
628 15 Exponential Functions of Matrices
1 1 1 1
p p p -p
2 2 e-t 0 2 2
F ðt Þ = 1 1 0 e3t 1 1
-p p p p
2 2 2 2
1 e - t þ e3t - e - t þ e3t
= : ð15:104Þ
2 - e - t þ e3t e - t þ e3t
1 e - 3t þ et e - 3t - e3t
F ðt Þ - 1 = : ð15:105Þ
2 e - 3t - e3t e - 3t þ et
Rðs; t Þ = F ðsÞ - 1 F ðt Þ
1 e - ðt - sÞ þ e3ðt - sÞ - e - ðt - sÞ þ e3ðt - sÞ
= : ð15:106Þ
2 - e - ðt - sÞ þ e3ðt - sÞ e - ðt - sÞ þ e3ðt - sÞ
t
xðt Þ = xð0ÞRð0, t Þ þ ds½bðsÞRðs, t Þ, ð15:107Þ
0
where x(0) = (x(0) y(0)) = (c d) represents the BCs and b(s) = (1 2) comes from
(15.96). This can easily be calculated so that we can have
xð0ÞRð0, t Þ
1 1
= ðc - dÞe - t þ ðc þ dÞe3t ðd - cÞe - t þ ðc þ dÞe3t ð15:108Þ
2 2
and
t
1 -t 1 -t
ds½bðsÞRðs, t Þ = e þ e3t - 1 - e - e3t : ð15:109Þ
0 2 2
Note that the first component of (15.108) and (15.109) represents the
x component of the solution for (15.95) and that the second component represents
the y component. From (15.108), we have
Thus, we find that the BCs are certainly satisfied. Putting t = 0 in (15.109), we
have
15.3 System of Differential Equations 629
0
ds½bðsÞRðs, 0Þ = 0, ð15:111Þ
0
1
xð t Þ = ðc - d þ 1Þe - t þ ðc þ d þ 1Þe3t - 2 ,
2
1
yðt Þ = ðd - c - 1Þe - t þ ðc þ d þ 1Þe3t : ð15:112Þ
2
1 2 2 1
t A þ ⋯ þ t ν Aν þ ⋯,
exp tA = E þ tA þ ð15:113Þ
2! ν!
1 1
expð- sAÞ = E þ ð- sAÞ þ ð- sÞ2 A2 þ ⋯ þ ð- sÞν Aν þ ⋯: ð15:114Þ
2! ν!
Both (15.113) and (15.114) are polynomials of a constant matrix A and, hence,
exptA and exp(-sA) commute as well. Then, from Theorem 15.2 and Property (1) of
(15.28), we get
This implies that once we get exptA, we can safely obtain the resolvent matrix by
automatically replacing t of (15.72) with t - s. Also, exptA and exp(-sA) are
commutative. In the present case, therefore, by exchanging the order of products
of F(s)-1 and F(t) in (15.106), we have the same result as (15.106) such that
630 15 Exponential Functions of Matrices
Rðs; t Þ = F ðt ÞF ðsÞ - 1
1 e - ðt - sÞ þ e3ðt - sÞ - e - ðt - sÞ þ e3ðt - sÞ
= : ð15:117Þ
2 - e - ðt - sÞ þ e3ðt - sÞ e - ðt - sÞ þ e3ðt - sÞ
2. The resolvent matrix of the present example is real symmetric. This is because if
A is real symmetric, i.e., AT = A, we have
n
ðAn ÞT = AT = An , ð15:118Þ
that is, An is real symmetric as well. From (15.7), expA is real symmetric accord-
ingly. In a similar manner, if A is Hermitian, expA is also Hermitian.
3. Let us think of a case where A is not a constant matrix but varies as a function of
t. In that case, we have
aðt Þ bðt Þ
A ðt Þ = : ð15:119Þ
cð t Þ d ð t Þ
Also, we have
að s Þ bð s Þ
AðsÞ = : ð15:120Þ
cðsÞ d ðsÞ
namely, A(t) and A(s) are not generally commutative. The matrix tA(t) is not
commutative with -sA(s), either. In turn, (15.115) does not hold either. Thus, we
need an elaborated treatment for this.
Example 15.2 Find the resolvent matrix of the following homogeneous equation:
xð_t Þ = 0,
yð_t Þ = x þ y: ð15:122Þ
xð_t Þ yð_t Þ = ðx yÞ 0
0
1
1
: ð15:123Þ
15.3 System of Differential Equations 631
The matrix A = 0
0
1
1
is characterized by an idempotent matrix (see Sect. 12.4).
As in the case of Example 15.1, we get
-1
where we choose P = 1
0
1
1
and, hence, P - 1 = 1
0 1
. Notice that in this
example since A was not a symmetric matrix, we did not use a unitary matrix with the
similarity transformation. As a diagonal matrix D, we have
D = P-1 0
0
1
1
P = P - 1 AP = 0
0
0
1
: ð15:125Þ
Or we have
A = PDP - 1 : ð15:126Þ
Hence, we get
1 0
exptD = : ð15:128Þ
0 et
In turn, we get
1 1 1 0 1 -1 1 - 1 þ et
F ðt Þ = = : ð15:129Þ
0 1 0 et 0 1 0 et
1 e-t - 1
F ðt Þ - 1 = : ð15:130Þ
0 e-t
1 - 1 þ et - s
Rðs; t Þ = F ðsÞ - 1 F ðt Þ = : ð15:131Þ
0 et - s
Using the resolvent matrix, we can include the inhomogeneous term along with
BCs (or initial conditions). Once again, by exchanging the order of products of F(s)-1
632 15 Exponential Functions of Matrices
and F(t) in (15.131), we have the same resolvent matrix. This can readily be checked.
Moreover, we get Rðs, t Þ = F ðt - sÞ.
Example 15.3 Solve the following equation
xð_t Þ = x þ c,
yð_t Þ = x þ y þ d, ð15:132Þ
under the BCs x(0) = σ and y(0) = τ. In a matrix form, (15.132) can be written as
xð_t Þ yð_t Þ = ðx yÞ 1
0
1
1
þ ðc d Þ: ð15:133Þ
M2 = 1
0
1
1
1
0
1
1
= 1
0
2
1
, M3 = 1
0
2
1
1
0
1
1
= 1
0
3
1
, ⋯:
Mn = 1
0
n
1
: ð15:134Þ
Thus, we have
1 2 2 1
exp tM = Rð0, t Þ = E þ tM þ t M þ ⋯ þ tν M ν þ ⋯
2! ν!
1 2 1
= 1 0
þt 1 1
þ t 1 2
þ ⋯ þ t ν 10 1ν þ ⋯
0 1 0 1 2! 0 1 ν!
1 1
et t 1 þ t þ t2 þ ⋯ þ tν - 1 þ ⋯ et tet
= 2! ðν - 1Þ! = : ð15:135Þ
0 et
0 et
et - s ðt - sÞet - s
Rðs; t Þ = expð - sM ÞexptM = : ð15:136Þ
0 et - s
t
xðt Þ = ðσ τÞRð0, t Þ þ ds½ðc dÞRðs, t Þ: ð15:137Þ
0
xðt Þ = ðc þ σ Þet - c,
yðt Þ = ðc þ σ Þtet þ ðd - c þ τÞet þ c - d: ð15:138Þ
From (15.138), we find that the original inhomogeneous equation (15.132) and
BCs of x(0) = σ and y(0) = τ have been recovered.
Example 15.4 Find the resolvent matrix of the following homogeneous equation:
xð_t Þ = 0,
yð_t Þ = x: ð15:139Þ
xð_t Þ yð_t Þ = ðx yÞ 0
0
1
0
: ð15:140Þ
The matrix N = 0
0
1
0
is characterized by a nilpotent matrix (Sect. 12.3). Since
2
N and matrices of higher order of N vanish, we have
exp tN = E þ tN = 1
0
t
1
: ð15:141Þ
t-s
Rðs, t Þ = 1
0 1
: ð15:142Þ
Using the resolvent matrix, we can include the inhomogeneous term along
with BCs.
Example 15.5 As a trivial case, let us consider a following homogeneous equation:
634 15 Exponential Functions of Matrices
xð_t Þ yð_t Þ = ðx yÞ a
0
0
b
, ð15:143Þ
where one of a and b can be zero or both of them can be zero. Even though (15.143)
merely apposes two FOLDEs, we can deal with such cases in an automatic manner.
From Theorem 15.3, as eigenvalues of expA where A = a0 0b , we have ea and eb.
That is, we have
eat
F ðt Þ = exp tA = 0
0
ebt
: ð15:144Þ
is eat. We only have to perform routine calculations with a counterpart of x(t) [or y(t)]
of (x(t) y(t)) using (15.107) to solve an inhomogeneous differential equation
under BCs.
Returning back to Example 15.1, we had
xð_t Þ yð_t Þ = ðx yÞ 1
2
2
1
þ ð1 2Þ: ð15:96Þ
1 2
xð_t Þ yð_t Þ U = ðx yÞUU - 1 U þ ð1 2ÞU, ð15:146Þ
2 1
p1 p1 p1 p1
where U = 2 2
. Then, ðx yÞU = ðx yÞ 2 2
. Defining (X Y)
- p12 p1
2
- p12 p1
2
as
1 1
p p
ðX Y Þ ðx yÞU = ðx yÞ 2 2 , ð15:147Þ
1 1
-p p
2 2
we have
15.3 System of Differential Equations 635
-1 0
X ð_t Þ Y ð_t Þ = ðX Y Þ þ ð1 2ÞU: ð15:148Þ
0 3
t
~ ðt 0 , t Þ þ
X ðt Þ = X ðt 0 ÞR ds ~bðsÞR
~ ðs, t Þ , ð15:149Þ
t0
~ ðs; t Þ = e - ðt - sÞ 0
R : ð15:150Þ
0 e3ðt - sÞ
t
~ ðt 0 , t ÞU - 1 þ
X ðt ÞU - 1 = X ðt 0 ÞU - 1 U R ds ~ ~ ðs, t Þ U - 1 , ð15:151Þ
bðsÞU - 1 U R
t0
where with the first and second terms we insert U-1U = E. Then, we get
t
~ ðt 0 , t ÞU - 1 þ
x ð t Þ = xð t 0 Þ U R ~ ðs, t ÞU - 1 :
dsbðsÞ U R ð15:152Þ
t0
~ ðs, t ÞU - 1 to have
We calculate U R
~ ðs; t ÞU - 1 = 1 1 1 e - ðt - sÞ 0 1 -1
UR
2 -1 1 0 e3ðt - sÞ 1 1
1 e - ðt - sÞ þ e3ðt - sÞ - e - ðt - sÞ þ e3ðt - sÞ
= = Rðs; t Þ: ð15:153Þ
2 - e - ðt - sÞ þ e3ðt - sÞ e - ðt - sÞ þ e3ðt - sÞ
t
xðt Þ = xðt 0 ÞRðt 0 , t Þ þ ds½bðsÞRðs, t Þ: ð15:90Þ
t0
Equation (15.153) shows that Rðs, t Þ and R ~ ðs, t Þ are connected to each other
through the unitary similarity transformation such that
636 15 Exponential Functions of Matrices
0 -ω
xð_t Þ yð_t Þ = ðx yÞ þ ða 0Þ: ð15:155Þ
ω 0
0 -ω
The matrix is characterized by an anti-Hermitian operator. This type
ω 0
of operator frequently appears in Lie groups and Lie algebras that we will deal with
in Chap. 20. Here we define an operator D as
0 -1 0 -ω
D= or ωD = : ð15:156Þ
1 0 ω 0
The calculation procedures will be seen in Chap. 20, and so we only show the
result here. That is, we have
cos ωt - sin ωt
exp tωD = : ð15:157Þ
sin ωt cos ωt
a
x= sin ωt,
ω
a
y= ðcos ωt - 1Þ: ð15:159Þ
ω
− /
− /
a 2 a 2
x2 þ y þ = : ð15:160Þ
ω ω
In other words, the point mass performs a circular motion (see Fig. 15.2).
In Example 15.1 we mentioned that if the matrix A that defines the system of
differential equations varies as a function of t, then the solution method based upon
exptA does not work. Yet, we have an important case where A is not a constant
matrix. Let us consider a next illustration.
In Sect. 4.5 we dealt with an electron motion under a circularly polarized light. Here
we consider the motion of a charged particle in a linearly polarized
electromagnetic wave.
Suppose that a charged particle is placed in vacuum. Suppose also that an
electromagnetic wave (light) linearly polarized along the x-direction is propagated
in the positive direction of the z-axis. Unlike the case of Sect. 4.5, we have to take
account of influence from both of the electric field and magnetic field components of
Lorentz force.
From (7.58) we have
638 15 Exponential Functions of Matrices
where e1, e2, and e3 are the unit vector of the x-, y-, and z-direction. (The latter two
unit vectors appear just below.) If the extent of motion of the charged particle is
narrow enough around the origin compared to a wavelength of the electromagnetic
field, we can ignore kz in the exponent (see Sect. 4.5). Then, we have
E ≈ E0 e1 e - iωt : ð15:162Þ
E0 E 0 e1 E 0 e2
H 0 = e3 × = e3 × = : ð15:165Þ
μ0 μ0 c μ0 c
ε0
E 0 e2 cos ωt
B = μ0 H ≈ μ0 H 0 cos ωt = : ð15:166Þ
c
F = eE þ e x_ × B, ð15:167Þ
where x (=xe1 + ye2 + ze3) is a position vector of the charged particle having a
charge e. In (15.167) the first term represents the electric Lorentz force and the
second term shows the magnetic Lorentz force; see Sect. 4.5. Replacing E and B in
(15.167) with those in (15.163) and (15.166), we get
1 eE
F= 1- z_ eE 0 e1 cos ωt þ 0 x_ e3 cos ωt
c c
x = m€xe1 þ m€ye2 þ m€ze3 ,
= m€ ð15:168Þ
1
m€x = 1 - z_ eE0 cos ωt,
c
m€y = 0,
eE0
m€z = x_ cos ωt: ð15:169Þ
c
Note that the Lorentz force is not exerted in the direction of the y-axis. Putting
a eE0/m and b eE0/mc, we get
0 b cos ωt
ξð_t Þ ζ ð_t Þ = ðξ ζ Þ þ ða cos ωt 0Þ, ð15:172Þ
- b cos ωt 0
0 b cos ωt
ξð_t Þ ζ ð_t Þ = ðξ ζ Þ : ð15:173Þ
- b cos ωt 0
Closely inspecting (15.173) and (15.174), we find that if we can decide f(t) so that
f(t) may satisfy
640 15 Exponential Functions of Matrices
we might well be able to solve (15.172). In that case, moreover, we expect that two
linearly independent column vectors
cos f ðt Þ sin f ðt Þ
and ð15:176Þ
- sin f ðt Þ cos f ðt Þ
give a fundamental set of solutions of (15.172). A function f(t) that satisfies (15.175)
can be given by
b
f ðt Þ = sin ωt = ~
b sin ωt, ð15:177Þ
ω
where ~
b b=ω. Notice that at t = 0 two column vectors of (15.176) give
1 0
0
and 1
,
if we choose f ðt Þ = ~
b sin ωt for f(t) in (15.176).
Thus, defining F(t) as
cos ~
b sin ωt sin ~b sin ωt
F ðt Þ ð15:178Þ
-sin ~
b sin ωt cos ~b sin ωt
dF ðt Þ
= F ðt ÞA,
dt
0 b cos ωt
A : ð15:179Þ
- b cos ωt 0
Equation (15.179) is formally the same as (15.47), even though F(t) is not given
in the form of exptA. This enables us to apply the general scheme mentioned in Sect.
15.3.2 to the present case. That is, we are going to address the given problem using
the method of variation of constants such that
where we define x(t) (ξ(t) ζ(t)) and k(t) (u(t) v(t)) as a variable constant.
Hence, using (15.80)–(15.90), we should be able to find the solution of (15.172)
described by
15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave 641
t
xðt Þ = xðt 0 ÞRðt 0 , t Þ þ ds½bðsÞRðs, t Þ, ð15:90Þ
t0
where b(s) (a cos ωt 0), i.e., the inhomogeneous term in (15.172). The resolvent
matrix Rðs, t Þ is expressed as
where f(t) is given by (15.177). With F(s)-1, we simply take a transpose matrix of
(15.178), because it is an orthogonal matrix. Thus, we obtain
cos ~
b sin ωs - sin ~b sin ωs
F ðsÞ - 1 = :
sin ~
b sin ωs cos ~b sin ωs
t
xðt Þ = xð0ÞRð0, t Þ þ ds½ðacos ωt 0ÞRðs, t Þ ð15:181Þ
0
and setting BCs as x(0) = (σ τ), with the individual components, we finally obtain
a
ξðt Þ = σ cos ~b sin ωt - τ sin ~
b sin ωt þ sin ~b sin ωt ,
b
a a
ζ ðt Þ = σ sin ~b sin ωt þ τ cos ~b sin ωt - cos ~b sin ωt þ : ð15:182Þ
b b
a
ξðt Þ = sin ~
b sin ωt ,
b
a
ζ ðt Þ = 1 - cos ~
b sin ωt : ð15:183Þ
b
642 15 Exponential Functions of Matrices
Notice that the above BCs correspond to that the charged particle is initially
placed (at the origin) at rest (i.e., zero velocity). Deleting the argument t, we get
a 2 a 2
ξ2 þ ζ - = : ð15:184Þ
b b
Figure 15.3 depicts the particle velocities ξ and ζ in the x- and z-directions,
respectively. It is interesting that although the velocity in the x-direction switches
from negative to positive, that in the z-direction is kept nonnegative. This implies
that the particle is oscillating along the x-direction but continually drifting toward the
z-direction while oscillating. In fact, if we integrate ζ(t), we have a continually
drifting term ab t.
Figure 15.4 shows the Lorentz force F as a function of time in the case where the
charge of the particle is positive. Again, the electric Lorentz force (represented by E)
switches from positive to negative in the x-direction, but the magnetic Lorentz force
(represented by x_ × B) holds nonnegative in the z-direction all the time.
To precisely analyze the positional change of the particle, we need to integrate
(15.182) once again. This requires the detailed numerical calculation. For this
purpose the following relation is useful [7]:
1
cosðxsin t Þ = J ðxÞ cos mt,
m= -1 m
1
sinðxsin t Þ = J ðxÞ sin mt,
m= -1 m
ð15:185Þ
where the functions Jm(x) are called the Bessel functions that satisfy the following
equation:
15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave 643
× ,
(b)
× ,
,
d2 J n ðxÞ 1 dJ n ðxÞ n2
þ þ 1 - J ðxÞ = 0: ð15:186Þ
dx2 x dx x2 n
Although we do not examine the properties of the Bessel functions in this book,
the related topics are dealt with in detail in the literature [5, 7–9]. The Bessel
functions are widely investigated as one of special functions in many branches of
mathematical physics.
To conclude this section, we have described an example to which the matrix A of
(15.179) that characterizes the system of differential equations is not a constant
matrix but depends on the parameter t. Unlike the case of the constant matrix A, it
644 15 Exponential Functions of Matrices
may well be difficult to find a fundamental set of solutions. Once we can find the
fundamental set of solutions, however, it is always possible to construct the resolvent
matrix and solve the problem using it. In this respect, we have already encountered a
similar situation in Chap. 10 where Green’s functions were constructed by a
fundamental set of solutions.
In this section, a fundamental set of solutions could be found and, as a conse-
quence, we could construct the resolvent matrix. It is, however, not always the case
that we can recast the system of differential equation (15.66) as the form of (15.179).
Nonetheless, whenever we are successful in finding the fundamental set of solutions,
we can construct a resolvent matrix and, hence, solve the problems. This is a generic
and powerful tool.
References
1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo
3. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
4. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Inami T (1998) Ordinary differential equations. Iwanami, Tokyo
7. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd
edn. Cambridge University Press, Cambridge
8. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Aca-
demic Press, Waltham
9. Hassani S (2006) Mathematical physics. Springer, New York
Part IV
Group Theory and Its Chemical
Applications
The universe comprises space and matter. These two mutually stipulate their modal-
ity of existence. We often comprehend related various aspects as a manifestation of
symmetry. In this part we deal with the symmetry from a point of view of group
theory. In this part of the book, we outline and emphasize chemical applications of
the methods of mathematical physics. This part supplies us with introductory
description of group theory. The group theory forms an important field of both
pure and applied mathematics. Starting with the definition of groups, we cover a
variety of topics related to group theory. Of these, symmetry groups are familiar to
chemists, because they deal with a variety of matter and molecules that are charac-
terized by different types of symmetry. The symmetry group is a kind of finite group
and called a point group as well. Meanwhile, we have various infinite groups that
include rotation group as a typical example. We also mention an introductory theory
of the rotation group of SO(3) that deals with an important topic of, e.g., Euler
angles. We also treat successive coordinate transformations.
Next, we describe representation theory of groups. Schur’s lemmas and related
grand orthogonality theorem underlie the representation theory of groups. In parallel,
characters and irreducible representations are important concepts that support the
representation theory. We present various representations, e.g., regular representa-
tion, direct-product representation, symmetric and antisymmetric representations.
These have wide applications in the field of quantum mechanics and quantum
chemistry, and so forth.
On the basis of the above topics, we highlight quantum chemical applications of
group theory in relation to a method of molecular orbitals. As tangible examples, we
adopt aromatic molecules and methane.
The last part deals with the theory of continuous groups. The relevant theory has
wide applications in many fields of both pure and applied physics and chemistry. We
highlight the topics of SU(2) and SO(3) that very often appear there. Tangible
examples help understand the essence.
Chapter 16
Introductory Group Theory
In contrast to a broad range of applications, the definition of the group is simple. Let
ℊ be a set of elements gν, where ν is an index either countable (e.g., integers) or
uncountable (e.g., real numbers) and the number of elements may be finite or
infinite. We denote this by ℊ = {gν}. If a group is a finite group, we express it as
ℊ = fg1 , g2 , ⋯, gn g, ð16:1Þ
is denoted by a symbol “⋄” below. Note that the symbol ⋄ implies an ordinary
multiplication, an ordinary addition, etc.
(A1) If a and b are any elements of the set ℊ, then so is a ⋄ b. (We sometimes say
that the set is “closed” regarding the multiplication.)
(A2) Multiplication is associative, i.e., a ⋄ (b ⋄ c) = (a ⋄ b) ⋄ c.
(A3) The set ℊ contains an element of e called the identity element such that we
have a ⋄ e = e ⋄ a = a with any element a of ℊ.
(A4) For any a of ℊ, we have an element b such that a ⋄ b = b ⋄ a = e. The
element b is said to be the inverse element of a. We denote b a-1.
In the above definitions, we assume that the commutative law does not necessar-
ily hold, that is, a ⋄ b ≠ b ⋄ a. In that case the group ℊ is said to be a
non-commutative group. However, we have a case where the commutative law
holds, i.e., a ⋄ b = b ⋄ a. If so, the group ℊ is called a commutative group or an
Abelian group.
Let us think of some examples of groups. Henceforth, we follow the convention
and write ab to express a ⋄ b.
Example 16.1 We present several examples of groups below. Examples (1)–(4) are
simple, but Example (5) is general.
1. ℊ = {1, -1}. The group ℊ makes a group with respect to the multiplication. This
is an example of a finite group.
2. ℊ = {⋯, -3, -2, -1, 0, 1, 2, 3, ⋯}. The group ℊ makes a group with
respect to the addition. This is an infinite group. For instance, take a(>0) and
make a + 1 and make (a + 1) + 1, [(a + 1) + 1] + 1, ⋯ again and again. Thus,
addition is not closed and, hence, we must have an infinite group.
-1
3. Let us start with a matrix a = 0
1 0
. Then, the inverse a - 1 = a3 = 0
-1
1
0
.
-1
We have a2 = 0
0
-1
. Its inverse is a2 itself. These four elements make a
group. That is, ℊ = {e, a, a2, a3}. This is an example of cyclic groups.
4. ℊ = {1}. It is a most trivial case, but sometimes the trivial case is very important
as well. We will come back later to this point.
5. Let us think of a more general case. In Chap. 11, we discussed endomorphism on
a vector space and showed the necessary and sufficient condition for the existence
of an inverse transformation. In this relation, consider a set that comprises
matrices such that
This may be either a finite group or an infinite group. The former can be a
symmetry group and the latter can be a rotation group. This group is characterized
by a set of invertible and endomorphic linear transformations over a vector space Vn
and called a linear transformation group or a general linear group and denoted by
GL(n, ℂ), GL(Vn), GL(V ), etc. The relevant transformations are bijective. We can
16.2 Subgroups 649
readily make sure that axioms (A1) to (A4) are satisfied with GL(n, ℂ). Here, the
vector space can be ℂn or a function space.
The structure of a finite group is tabulated in a multiplication table. This is made
up such that group elements are arranged in a first row and a first column and that an
intersection of an element gi in the row and gj in the column is designated as a
product gi ⋄ gj. Choosing the above (3) for an example, we make its multiplication
table (see Table 16.1). There we define a2 = b and a3 = c.
Having a look at Table 16.1, we notice that in the individual rows and columns
each group element appears once and only once. This is well known as a
rearrangement theorem.
Theorem 16.1 Rearrangement theorem [1] In each row or each column in the
group multiplication table, individual group elements appear once and only once.
From this, each row and each column list merely rearranged group elements.
Proof Let a set ℊ = {g1 e, g2, ⋯, gn} be a group. Arbitrarily choosing any
element h from ℊ and multiplying individual elements by h, we obtain a set
H = fhg1 , hg2 , ⋯, hgn g. Then, all the group elements of ℊ appear in H once and
only once. Choosing any group element gi, let us multiply gi by h-1 to get h-1gi.
Since h-1gi must be a certain element gk of ℊ, we put h-1gi = gk. Multiplying both
sides by h, we have gi = hgk. Therefore, we are able to find this very element hgk in
H, i.e., gi in H. This implies that the element gi necessarily appears in H. Suppose in
turn that gi appears more than once. Then, we must have gi = hgk = hgl (k ≠ l ).
Multiplying the relation by h-1, we would get h-1gi = gk = gl, in contradiction to the
supposition. This means that gi appears in H once and only once. This confirms that
the theorem is true of each row of the group multiplication table.
A similar argument applies with a set H0 = fg1 h, g2 h, ⋯, gn hg. This confirms in
turn that the theorem is true of each column of the group multiplication table. These
complete the proof.
16.2 Subgroups
group by itself. Both {e} and ℊ are subgroups as well. We often call subgroups other
than {e} and ℊ “proper” subgroups.
A necessary and sufficient condition for the subset H to be a subgroup is the
following:
ð1Þ hi , hj 2 H ⟹ hi ⋄hj 2 H ,
ð 2Þ h 2 H ⟹ h - 1 2 H :
ℊ = g1 H þ g2 H þ ⋯ þ gk H , ð16:2Þ
where g1, g2, ⋯, gk are mutually different elements with g1 being the identity e. In
that case (16.2) is said to be the left coset decomposition of ℊ by H . Similarly, right
coset decomposition can be done to give
ℊ = H g1 þ H g 2 þ ⋯ þ H g k : ð16:3Þ
In general, however,
gk H ≠ H gk or gk H gk- 1 ≠ H : ð16:4Þ
Taking the case of left coset as an example, let us examine whether different
cosets mutually contain a common element. Suppose that gi H and gj H mutually
16.3 Classes 651
n = sk, ð16:5Þ
16.3 Classes
-1 -1 -1 -1
c = g0 bg0 , b = gag - 1 ⟹ c = g0 bg0 = g0 gag - 1 g0 = g0 gaðg0 gÞ : ð16:6Þ
In the above a set containing a and all the elements conjugate to a is said to be a
(conjugate) class of a. Denoting this set by ∁a, we have
In ∁a the same element may appear repeatedly. It is obvious that in every group
the identity element e forms a class by itself. That is,
∁e = feg: ð16:8Þ
As in the case of the decomposition of a group into (left or right) cosets, we can
decompose a group to classes. If group elements are not exhausted by a set
comprising ∁e or ∁a, let us take b such that b ≠ e and b 2= ∁a and make ∁b similarly
to (16.7). Repeating this procedure, we should be able to decompose a group into
classes. In fact, if group elements have not yet exhausted after these procedures, take
remaining element z and make a class. If the remaining element is only z in this
moment, z can make a class by itself (as in the case of e). Notice that for an Abelian
group every element makes a class by itself. Thus, with a finite group, we have a
decomposition such that
652 16 Introductory Group Theory
ℊ = ∁e þ ∁a þ ∁b þ ⋯ þ ∁z : ð16:9Þ
To show that (16.9) is really a decomposition, suppose that, for instance, a set ∁a \ ∁b
is not an empty set and that x 2 ∁a \ ∁b. Then we must have α and β that satisfy
a following relation: x = αaα-1 = βbβ-1, i.e., b = β-1 αaα-1β = β-1 αa(β-1 α)-1.
This implies that b has already been included in ∁a, in contradiction to the
supposition. Thus, (16.9) is in fact a decomposition of ℊ into a finite number of
classes.
In the above we thought of a class conjugate to a single element. This notion can
be extended to a class conjugate to a subgroup. Let H be a subgroup of ℊ. Let g be
an element of ℊ. Let us now consider a set H 0 = gH g - 1 . The set H 0 is a subgroup of
ℊ and is called a conjugate subgroup.
In fact, let hi and hj be any two elements of H , that is, let ghig-1 and ghjg-1 be any
two elements of H 0 . Then, we have
g - 1H g = H , ð16:11Þ
gH = H g: ð16:12Þ
This implies that the left coset is identical to the right coset. Thus, as far as we are
dealing with a coset pertinent to an invariant subgroup, we do not have to distinguish
left and right cosets.
Now let us anew consider the (left) coset decomposition of ℊ by an invariant
subgroup H
ℊ = g1 H þ g2 H þ ⋯ þ gk H , ð16:13Þ
= gi gj h q , ð16:14Þ
where the third equality comes from (16.11). That is, we should have ∃hp such that
gj- 1 hl gj = hp and hphm = hq. In (16.14) hα 2 H (α stands for l, m, p, q, etc. with
1 ≤ α ≤ s). Note that gi gj hq 2 gi gj H . Accordingly, a product of elements belonging
to gi H and gj H belongs to gi gj H . We rewrite (16.14) as a relation between the sets
ð g i H Þ g j H = g i gj H : ð16:15Þ
Viewing LHS of (16.15) as a product of two cosets, we find that the said product
is a coset as well. This implies that a collection of the cosets forms a group. Such a
group that possesses cosets as elements is said to be a factor group or quotient group.
In this context, the multiplication is a product of cosets. We denote the factor group
by
ℊ=H :
As in the case of the linear vector space, we consider the mapping between group
elements. Of these, the notion of isomorphism and homomorphism is important.
Definition 16.1 Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups and let a
mapping ℊ → ℊ′ exist. Suppose that there is a one-to-one correspondence (i.e.,
injective mapping)
x $ x0 , y $ y0 , ⋯
between the elements such that xy = z implies that x′y′ = z′ and vice versa.
Meanwhile, any element in ℊ′ must be the image of some element of ℊ. That is,
the mapping is surjective as well and, hence, the mapping is bijective. Then, the two
654 16 Introductory Group Theory
ℊ ffi ℊ0 :
Note that the aforementioned groups can be either a finite group or an infinite
group. We did not designate identity elements. Suppose that x is the identity e. Then,
from the relations xy = z and x′y′ = z′, we have
ey = z = y, x0 y0 = z0 = y0 : ð16:17Þ
Then, we get
x0 = e0 , i:e:, e $ e0 : ð16:18Þ
0
xx - 1 = z = e, x0 x - 1 = e0 , x0 y0 = z0 = e0 : ð16:19Þ
-1 0
y0 = x 0 = x-1 : ð16:20Þ
Then, the two groups ℊ and ℊ′ are said to be homomorphic. The relevant
mapping is called homomorphism. We symbolically denote this relation by
ℊ ℊ0 :
ρðxÞρ x - 1 = ρ xx - 1 = ρðeÞ = e0 :
Therefore,
½ ρð xÞ - 1 = ρ x - 1 :
The two groups can be either a finite group or an infinite group. Note that in the
above the mapping is not injective. The mapping may or may not be surjective.
Regarding the identity and inverse elements, we have the same relations as (16.18)
and (16.20). From Definitions 16.1 and 16.2, we say that the bijective homomor-
phism is the isomorphism.
Let us introduce an important notion of a kernel of a mapping. In this regard, we
have a following definition:
Definition 16.3 Let ℊ = {e, x, y, ⋯} and ℊ′ = {e′, x′, y′, ⋯} be groups and
let e and e′ be the identity elements. Suppose that there exists a homomorphic
mapping ρ: ℊ → ℊ′. Also let F be a subset of ℊ such that
ρðF Þ = e0 : ð16:22Þ
The first and second equalities result from the homomorphism of ρ. Since
F = feg, xy-1 = e, i.e., x = y. Therefore, ρ is injective (i.e., one-to-one correspon-
dence). As ρ is surjective from the assumption, ρ is bijective. The mapping ρ is
isomorphic accordingly.
Conversely, suppose that ρ is isomorphic. Also suppose for ∃x 2 ℊ ρ(x) = e′.
From (16.18), ρ(e) = e′. We have ρ(x) = ρ(e) = e′ ⟹ x = e due to the isomorphism
of ρ (i.e., one-to-one correspondence). This implies F = feg. This completes the
proof.
We become aware of close relationship between Theorem 16.2 and linear trans-
formation versus kernel already mentioned in Sect. 11.2 of Part III. Figure 16.1
shows this relationship. Figure 16.1a represents homomorphic mapping ρ: ℊ → ℊ′
between two groups, whereas Fig. 16.1b shows linear transformation (endomor-
phism) A: Vn → Vn in a vector space Vn.
656 16 Introductory Group Theory
: invertible (bijective)
ρð k i Þ = ρ k j = e0 , ð16:24Þ
ρ k i kj = ρðk i Þρ k j = e0 e0 = e0 : ð16:25Þ
-1
ρ ki- 1 = ½ρðk i Þ - 1 = e0 = e0 : ð16:26Þ
gF g - 1 = F : ð16:28Þ
~
ρðgi F Þ = ρðgi Þ: ð16:29Þ
Then, ~
ρ is an isomorphic mapping.
Proof From (16.15) and (16.21), it is obvious that ρ~ is homomorphic. The confir-
mation is left for readers. Let gi F and gj F be two different cosets. Suppose here that
ρ(gi) = ρ(gj). Then we have
where hp = hihk and h0q = h0j h0l . The identity element is ee = e; ehihk = hihke. The
-1 -1 -1
inverse element is hi h0j = h0j hi - 1 = hi - 1 h0j . Associative law is obvious
from hi h0j = h0j hi .
Thus, ℊ forms a group. This is said to be a direct product of groups,
or a direct-product group. The groups H and H 0 are called direct factors of ℊ. In this
case, we succinctly represent
ℊ = H H 0:
In the above, the condition (2) is equivalent to that 8 g2ℊ is uniquely represented
as
g = hh0 ; h 2 H , h0 2 H 0 : ð16:33Þ
In fact, suppose that H \ H 0 = feg and that g can be represented in two ways
such that
Then, we have
-1 -1
h2 - 1 h1 = h02 h01 ; h2 - 1 h1 2 H , h02 h01 2 H 0: ð16:35Þ
-1
h2 - 1 h1 = h02 h01 = e: ð16:36Þ
That is, h2 = h1 and h02 = h01 . This means that the representation is unique.
Conversely, suppose that the representation is unique and that x 2 H \ H 0 . Then
we must have
x = xe = ex: ð16:37Þ
-1 -1
ghg - 1 = hν h0μ hh0μ hν - 1 = hν hh0μ h0μ hν - 1 = hν hhν - 1 2 H : ð16:38Þ
gH g - 1 = H : ð16:39Þ
Reference 659
Reference
1. Cotton FA (1990) Chemical applications of group theory, 3rd edn. Wiley, New York
Chapter 17
Symmetry Groups
x
P= y : ð17:1Þ
z
where e1, e2, and e3 denote an orthonormal basis vectors pointing to positive
directions of x-, y-, and z-axes, respectively. Similarly we denote a linear transfor-
mation A by
(a) (b)
Disk A Disk B
Fig. 17.1 Rotation of an object. (a) Case where we cannot recognize that the object has been
moved. (b) Case where we can recognize that the object has been moved because of its attribute
(i.e., because the round disk is partly painted black)
moved. According to Fig. 17.1a, b, we have the former case and the latter case,
respectively. In the latter case, we have recognized the movement of the object by its
attribute, i.e., by that the object is partly painted black. □
In the above example, we do not have to be rigorous. We have a clear intuitive
criterion for a judgment of whether a geometric object has been moved. From now
on, we assume that the geometric character of an object is pertinent to both its
positional relationship and attribute. We define the equivalent (or indistinguishable)
disposition of an object and the operation that yields such an equivalent disposition
as follows:
Definition 17.1
1. Symmetry operation: A geometric operation that produces an equivalent
(or indistinguishable) disposition of an object.
2. Equivalent (or indistinguishable) disposition: Suppose that regarding a geometric
operation of an object, we cannot recognize that the object has been moved before
and after that geometric operation. In that case, the original disposition of the
object and the resulting disposition reached after the geometric operation are
referred to as an equivalent disposition. The relevant geometric operation is the
symmetry operation. ▽
Here we should clearly distinguish translation (i.e., parallel displacement) from
the abovementioned symmetry operations. This is because for a geometric object to
possess the translation symmetry the object must be infinite in extent, typically an
infinite crystal lattice. The relevant discipline is widely studied as space group and
has a broad class of applications in physics and chemistry. However, we will not deal
with the space group or associated topics, but focus our attention upon symmetry
groups in this book.
In the above example, the rotation is a symmetry operation with Fig. 17.1a, but
the said geometric operation is not a symmetric operation with Fig. 17.1b.
Let us further inspect properties of the symmetry operation. Let us consider a set
H consisting of symmetry operations. Let a and b be any two symmetric operations
of H. Then, (1) a ⋄ b is a symmetric operation as well. (2) Multiplication of
successive symmetric operations a, b, c is associative, i.e., a ⋄ (b ⋄ c) = (a ⋄ b) ⋄ c.
(3) The set H contains an element of e called the identity element such that we have
a ⋄ e = e ⋄ a = a with any element a of H. Operating “nothing” should be e. If the
664 17 Symmetry Groups
-1 0 0
A= 0 -1 0 : ð17:4Þ
0 0 1
This operation represents a π rotation around the z-axis. Let us think of another
symmetric operation described by
1 0 0
B= 0 1 0 : ð17:5Þ
0 0 -1
This produces a mirror symmetry with respect to the xy-plane. Then, we have
-1 0 0
C AB = BA = 0 -1 0 : ð17:6Þ
0 0 -1
The operation C shows an inversion about the origin. Thus, A, B, and C along
with an identity element E form a group. Here E is expressed as
1 0 0
E= 0 1 0 : ð17:7Þ
0 0 1
y
O
or nπ (n: integer). For later use, let us have a matrix form that expresses a θ rotation
around the z-axis. Figure 17.2 depicts a graphical illustration for this.
A matrix R has a following form:
cos θ - sin θ 0
R= sin θ cos θ 0 : ð17:8Þ
0 0 1
Note that R has been reduced. This implies that the three-dimensional Euclidean
space is decomposed into a two-dimensional subspace (xy-plane) and a
one-dimensional subspace (z-axis). The xy-coordinates are not mixed with the z-
component after the rotation R. Note, however, that if the rotation axis is oblique
against the xy-plane, this is not the case. We will come back to this point later.
Taking only the xy-coordinates in Fig. 17.3, we make a calculation. Using an
addition theorem of trigonometric functions, we get
where we used x = r cos α and y = r sin α. Combining (17.9) and (17.10), as a matrix
form, we get
666 17 Symmetry Groups
θ
α
O x
x′ cos θ - sin θ x
= : ð17:11Þ
y′ sin θ cos θ y
The matrix of RHS in (17.11) is the same as (11.31) and represents a transfor-
mation matrix of a rotation angle θ within the xy-plane. Whereas in Chap. 11 we
considered this from the point of view of the transformation of basis vectors, here we
deal with the transformations of coordinates in the fixed coordinate system. If θ ≠ π,
off-diagonal elements do not vanish. Including the z-component we get (17.8).
Let us summarize symmetry operations and their (3, 3) matrix representations.
The coordinates before and after a symmetry operation are expressed as
x x0
y and y0 , respectively:
z z0
1. Identity transformation:
To leave a geometric object or a coordinate system unchanged (or unmoved).
By convention, we denote it by a capital letter E. It is represented by a (3, 3)
identity matrix.
2. Rotation symmetry around a rotation axis:
Here a “proper” rotation is intended. We denote a rotation by a rotation axis
and its magnitude (i.e., rotation angle). Thus, we have
17.1 A Variety of Symmetry Operations 667
1 0 0
Rxφ = 0 cos φ - sin φ : ð17:12Þ
0 sin φ cos φ
z′ cos ϕ - sin ϕ 0 z
x′ = sin ϕ cos ϕ 0 x : ð17:13Þ
y′ 0 0 1 y
This can easily be visualized in Fig. 17.4; consider that cyclic permutation of
x z
y produces x . Shuffling the order of coordinates, we get
z y
x′ cos ϕ 0 sin ϕ x
y′ = 0 1 0 y : ð17:14Þ
z′ - sin ϕ 0 cos ϕ z
n:
Cm
1 0 0 -1 0 0 1 0 0
M xy = 0 1 0 , M yz = 0 1 0 , M zx = 0 -1 0 : ð17:15Þ
0 0 -1 0 0 1 0 0 1
668 17 Symmetry Groups
y
O A
x
Fig. 17.4 Rotation by ϕ around the y-axis
-1 0 0
IO = 0 -1 0 : ð17:16Þ
0 0 -1
Note that as obviously from the matrix form, IO is commutable with any other
symmetry operations. Note also that IO can be expressed as successive symmetry
operations or product of symmetry operations. For instance, we have
17.2 Successive Symmetry Operations 669
-1 0 0 1 0 0
I O = Rzπ M xy = 0 -1 0 0 1 0 : ð17:17Þ
0 0 1 0 0 -1
Note that Rzπ and Mxy are commutable, i.e., RzπMxy = MxyRzπ .
5. Improper rotation:
This is a combination of a proper rotation and a reflection by a mirror
symmetry plane. That is, rotation around an axis is performed first and then
reflection is carried out by a mirror plane that is perpendicular to the rotation
axis. For instance, an improper rotation is expressed as
1 0 0 cos θ – sin θ 0
M xy Rzθ = 0 1 0 sin θ cos θ 0
0 0 -1 0 0 1
cos θ – sin θ 0
= sin θ cos θ 0 : ð17:18Þ
0 0 -1
Let us now consider successive reflections in different planes and successive rota-
tions about different axes [2]. Figure 17.5a displays two reflections with respect to
the planes σ and σ~ both perpendicular to the xy-plane. The said planes make a
dihedral angle θ with their intersection line identical to the z-axis. Also, the plane
σ is identical with the zx-plane. Suppose that an arrow lies on the xy-plane perpen-
dicularly to the zx-plane. As in (17.15), an operation σ is represented as
670 17 Symmetry Groups
2
A A
‒2
O
y
Fig. 17.5 Successive two reflections about two planes σ and σ~ that make an angle θ. (a) Reflections
σ and σ~ with respect to two planes. (b) Successive operations of σ and σ~ in this order. The combined
operation is denoted by σ~σ. The operations result in a 2θ rotation around the z-axis. (c) Successive
operations of σ~ and σ in this order. The combined operation is denoted by σ~ σ . The operations result
in a -2θ rotation around the z-axis
1 0 0
σ= 0 -1 0 : ð17:19Þ
0 0 1
cos 2θ sin 2θ 0
= sin 2θ - cos 2θ 0 : ð17:20Þ
0 0 1
cos 2θ - sin 2θ 0
σ~ σ = sin 2θ cos 2θ 0 : ð17:21Þ
0 0 1
The expression (17.21) means that the multiplication should be done first by σ
and then by σ~ ; see Fig. 17.5b. Note that σ and σ~ are conjugate to each other (Sect.
16.3). In this case, the combined operations produce a 2θ rotation around the z-axis.
If, on the other hand, the multiplication is made first by σ~ and then by σ, we have a
-2θ rotation around the z-axis; see Fig. 17.5c. As a matrix representation, we have
17.2 Successive Symmetry Operations 671
cos 2θ sin 2θ 0
σ~
σ= - sin 2θ cos 2θ 0 : ð17:22Þ
0 0 1
Thus, successive operations of reflection by the planes that make a dihedral angle
θ yield a rotation ±2θ around the z-axis (i.e., the intersection line of σ and σ~ ). The
operation σ~σ is an inverse to σ~σ. That is,
ðσ~
σ Þðσ~σ Þ = E: ð17:23Þ
σ = det~
We have detσ~ σ σ = 1.
Meanwhile, putting
cos 2θ - sin 2θ 0
R2θ = sin 2θ cos 2θ 0 , ð17:24Þ
0 0 1
we have
This implies the following: Suppose a plane σ and a straight line on it. Also
suppose that one first makes a reflection about σ and then makes a 2θ rotation around
the said straight line. Then, the resulting transformation is equivalent to a reflection
about σ~ that makes an angle θ with σ. At the same time, the said straight line is an
intersection line of σ and σ~. Note that a dihedral angle between the two planes
σ and σ~ is half an angle of the rotation. Thus, any two of the symmetry operations
related to (17.25) are mutually dependent; any two of them produce the third
symmetry operation.
In the above illustration, we did not take account of the presence of a symmetry
axis. If the aforementioned axis is a symmetry axis Cn, we must have
1 0 0
C xπ = 0 -1 0 ,
0 0 -1
cos 2θ sin 2θ 0
= sin 2θ - cos 2θ 0 , ð17:27Þ
0 0 -1
cos 2θ -sin 2θ 0
C aπ C xπ = sin 2θ cos 2θ 0 ,
0 0 1
cos 2θ sin 2θ 0
C xπ C aπ = - sin 2θ cos 2θ 0 : ð17:28Þ
0 0 1
cos 2θ - sin 2θ 0
R2θ = sin 2θ cos 2θ 0 , ð17:24Þ
0 0 1
we have [1, 2]
Note that in the above two illustrations for the successive symmetry operations,
both the relevant operators have been represented in reference to the original xyz-
system. For this reason, the latter operation was done from the left in (17.28).
From a symmetry requirement, once again the aforementioned C2 axes must be
present in combination with the Cn axis. Moreover, those C2 axes should be
perpendicular to the Cn axis. This can be seen in various Dn groups.
Another illustration of successive symmetry operations is an improper rotation. If
the rotation angle is π, this causes an inversion symmetry. In this illustration,
reflection, rotation, and inversion symmetries coexist. A C2h symmetry is a typical
example.
Equations (17.21), (17.22), and (17.24) demonstrate the same relation. Namely,
two successive mirror symmetry operations about a couple of planes and two
successive π-rotations about a couple of C2 axes cause the same effect with regard
to the geometric transformation. In this relation, we emphasize that two successive
reflection operations make a determinant of relevant matrices 1. These aspects cause
an interesting effect and we will briefly discuss it in relation to O and Td groups.
If furthermore the abovementioned mirror symmetry planes and C2 axes coexist,
the symmetry planes coincide with or bisect the C2 axes and vice versa. If these were
not the case, another mirror plane or C2 axis would be generated from the symmetry
requirement and the newly generated plane or axis would be coupled with the
original plane or axis. From the above argument, these processes again produce
another Cn axis. That must be prohibited.
Next, suppose that a Cn axis intersects obliquely with a plane of mirror symmetry.
A rotation of 2π/n around such an axis produces another mirror symmetry plane.
This newly generated plane intersects with the original mirror plane and produces a
different Cn axis according to the above discussion. Thus, in this situation a mirror
symmetry plane cannot coexist with a sole rotation axis. In a geometric object with
higher symmetry such as Oh, however, several mirror symmetry planes can coexist
with several rotation axes in such a way that the axes intersect with the mirror planes
obliquely. In case the Cn axis intersects perpendicularly to a mirror symmetry plane,
that plane can coexist with a sole rotation axis (see Fig. 17.7). This is actually the
case with a geometric object having a C2h symmetry. The mirror symmetry plane is
denoted by σ h.
Now, let us examine simple examples of molecules and geometric figures as well
as associated symmetry operations.
674 17 Symmetry Groups
(c) (d)
π/2 yield a C2 axis around their line of intersection. There are three possibilities of
choosing two axes out of three (i.e., 3C2 = 3). (3) If we choose the inversion (i) along
with, e.g., one of the three C2 axes, a σ necessarily results. This is also the case when
we first combine a σ with i to obtain a C2 axis. We have three possibilities (i.e.,
3C1 = 3) as well. Thus, we have only seven choices to construct subgroups of D2h
having an order of four. This is summarized in Table 17.5.
An inverse of any element is that element itself. Therefore, if with any above
subgroup one chooses any element out of the remaining four elements and combines
it with the identity, one can construct a subgroup Cs, C2, or Ci of an order of 2. Since
all those subgroups of an order of 4 and 2 are commutative with D2h, these sub-
groups are invariant subgroups. Thus in terms of a direct-product group, D2h can be
expressed as various direct factors. Conversely, we can construct factor groups from
coset decomposition. For instance, we have
□
Example 17.3 Figure 17.9 shows an equilateral triangle placed on the xy-plane of a
three-dimensional Cartesian coordinate. An orthonormal basis e1 and e2 are desig-
nated as shown. As for a chemical species of molecules, we have, e.g., boron
trifluoride (BF3). In the molecule, a boron atom is positioned at a molecular center
with three fluorine atoms located at vertices of the equilateral triangle. The boron
atom and fluorine atoms form a planar molecule (Fig. 17.10).
A symmetry group belonging to D3h comprises 12 symmetry operations such that
where symmetry operations of the same species but distinct operation are denoted by
a “prime” or “double prime.” When we represent these operations by a matrix, it is
straightforward in most cases. For instance, a matrix for σ v is given by Myz of
(17.15). However, we should make some matrix calculations about
C 02 , C 002 , σ 0v , and σ 00v .
To determine a matrix representation of, e.g., σ 0v in reference to the xyz-coordinate
system with orthonormal basis vectors e1 and e2, we consider the x′y′z′-coordinate
678 17 Symmetry Groups
(iii)
(ii)
(Σ );
Σ
system with orthonormal basis vectors e01 and e02 (see Fig. 17.9). A transformation
matrix between the two sets of basis vectors is represented by
p
3 1
- 0
2 2
p
e1′ e2′ e3′ = ðe1 e2 e3 ÞRzπ6 = ðe1 e2 e3 Þ 1 3 : ð17:33Þ
0
2 2
0 0 1
1 0 0
Σv = 0 -1 0 : ð17:34Þ
0 0 1
-1
Rzπ6 Σv = σ v′ Rzπ6 or σ v′ = Rzπ6 Σv Rzπ6 : ð17:35Þ
Thus, we see that in the first equation of (17.35) the order of multiplications is
reversed according as the latter operation is expressed in the xyz-system or in the
x′y′z′-system. As a full matrix representation, we get
p p
3 1 3 1
- 0 0
2 2 1 0 0 2 2
p p
σ v′ = 1 3 0 -1 0 1 3
0 0 0 1 - 0
2 2 2 2
0 0 1 0 0 1
p
1 3
2 0
2
p
= 3 1 : ð17:36Þ
- 0
2 2
0 0 1
The matrix form of (17.37) can also be decided in a manner similar to the above
according to three successive operations shown in Fig. 17.9. In terms of classes,
σ v , σ 0v , and σ 00v form a conjugacy class and C 2 , C02 , and C 002 form another conjugacy
class. With regard to the reflection and rotation, we have detσ 0v = - 1 and detC2 = 1,
respectively. □
680 17 Symmetry Groups
1 0 0 0 0 1 0 -1 0
Rxπ2 = 0 0 -1 , Ryπ2 = 0 1 0 , Rzπ2 = 1 0 0 :
0 1 0 -1 0 0 0 0 1
ð17:38Þ
χ=3
z
100
010
y
001
z x
χ=1
10 0 1 00 0 0 1 0 0 -1 0 -1 0 0 1 0
0 0 -1 0 01 0 1 0 0 1 0 1 0 0 -1 0 0
01 0 0 -1 0 -1 0 0 1 0 0 0 0 1 0 0 1 y
x
z
χ=0
001 0 01 0 0 -1 0 0 -1 010 0 -1 0 0 1 0 0 -1 0 y
100 -1 0 0 1 00 -1 0 0 001 0 0 -1 0 0 -1 0 0 1
010 0 -1 0 0 -1 0 0 10 100 10 0 -1 0 0 -1 0 0
x
χ = −1 z
10 0 -1 0 0 -1 0 0
0 -1 0 0 1 0 0 -1 0
0 0 -1 0 0 -1 0 01
y z
x
χ = −1
-1 0 0 -1 0 0 0 1 0 0 -1 0 0 0 1 0 0 -1 y
00 1 0 0 -1 1 0 0 -1 0 0 0 -1 0 0 -1 0
01 0 0 -1 0 0 0 -1 0 0 -1 1 0 0 -1 0 0 x
Fig. 17.12 Point group O and its five conjugacy classes. Geometric characteristics of individual
classes are briefly sketched
these notations, e.g., Rxπ2 stands for a π/2 counterclockwise rotation around the x-axis;
Rxπ2 denotes a π/2 counterclockwise rotation around the -x-axis. Consequently, Rxπ2
implies a π/2 clockwise rotation around the x-axis and, hence, an inverse element of
Rxπ2 . Namely, we have
-1
Rx π2 = Rx π2 : ð17:39Þ
Now let us consider the successive two rotations. This is denoted by the multi-
plication of matrices that represent the related rotations. For instance, the multipli-
cation of, e.g., Rxπ2 and R0yπ produces the following:
2
1 0 0 0 0 1 0 0 1
= 0 0 -1 0 1 0 = 1 0 0 : ð17:40Þ
0 1 0 -1 0 0 0 1 0
0 0 1 1 0 0 0 1 0
= 0 1 0 0 0 -1 = 0 0 -1 , ð17:41Þ
-1 0 0 0 1 0 -1 0 0
where Rxyz2π3 is a 2π/3 counterclockwise rotation around an axis that trisects the x-, y-,
and -z-axes. Notice that we used R0xπ this time, because it was performed after Ryπ2 .
2
Thus, we notice that there are eight related operations that trisect eight octants of the
coordinate system. These operations are further categorized into four sets in which
the two elements are an inverse element of each other. For instance, we have
-1
Rxyz2π3 = Rxyz2π3 : ð17:42Þ
Notice that a 2π/3 counterclockwise rotation around an axis that trisects the -x-,
-y-, and -z-axes is equivalent to a 2π/3 clockwise rotation around an axis that
17.3 O and Td Groups 683
-1
trisects the x-, y-, and z-axes. Also, we have Rxyz2π3 = Rxyz2π3 , etc. Moreover, we
have “cyclic” relations such as
x1 x1
A½PðxÞ = ½ðe1 ⋯ en ÞPA0 ⋮ = ð e1 ⋯ en Þ A O P ⋮ : ð11:79Þ
xn xn
-1
Rxπ2 R0yπ = RO Rxπ2 , i:e:, RO = Rxπ2 R0yπ Rxπ2 , ð17:44Þ
2 2
where RO is viewed in reference to the original (or fixed) coordinate system and
conjugate to R0yπ . Thus, we have
2
1 0 0 0 0 1 1 0 0
RO = 0 0 -1 0 1 0 0 0 1
0 1 0 -1 0 0 0 -1 0
0 -1 0
= 1 0 0 : ð17:45Þ
0 0 1
produce a C2 rotation around the remaining axis, as is the case with naphthalene
belonging to the D2h symmetry (see Sect. 17.2).
Regarding the class comprising six π rotation elements, a combination of, e.g.,
Rxyπ and Rxyπ crossing each other at a right angle causes a related effect. For the other
combinations, the two C2 axes intersect each other at π/3; see Fig. 17.6 and put
θ = π/3 there. In this respect, elementary analytic geometry teaches the positional
relationship among planes and straight lines. The argument is as follows: A plane
x1 x2 x3
determined by three points y1 , y2 , and y3 that do not sit on a line is
z1 z2 z3
expressed by a following equation:
x y z 1
x1 y1 z1 1
= 0: ð17:46Þ
x2 y2 z2 1
x3 y3 z3 1
0 1 0 x1 x2 x3
Substituting 0 , 1 , and 1 for y1 , y2 , and y3 in (17.46),
0 0 1 z1 z2 z3
respectively, we have
x - y þ z = 0: ð17:47Þ
Taking account of direction cosines and using Hesse’s normal form, we get
1
p ðx - y þ zÞ = 0, ð17:48Þ
3
where the normal to the plane expressed in (17.48) has direction cosines of p13, - p13,
and p13 in relation to the x-, y-, and z-axes, respectively.
Therefore, the normal is given by a straight line connecting the origin and
1
-1 . In other words, a line connecting the origin and a corner of a cube is the
1
normal to the plane described by (17.48). That plane is formed by two intersecting
lines, i.e., rotation axes of C2 and C02 (see Fig. 17.13 that depicts a cube of each side
of 2). These axes make an angle π/3; this can easily be checked by taking an inner
p
1=p2 0p
product between 1= 2 and 1=p2 . These column vectors are two direction
0 1= 2
cosines of C2 and C02 . On
the basis of the discussion of Sect. 17.2, we must have a
rotation axis of C3. That is, this axis trisects a solid angle π/2 shaped by three
intersecting sides.
17.3 O and Td Groups 685
(1, –1, 1)
3 y
O
2
(1, 1, 0)
Fig. 17.14 Simple kit that helps visualize the positional relationship among planes and straight
lines in three-dimensional space. To make it, follow next procedures: (1) Take three thick sheets of
paper and make slits (dashed lines) as shown. (2) Insert Sheet 2 into Sheet 1 so that the two sheets
can make a right angle. (3) Insert Sheet 3 into combined Sheets 1 and 2
E:
1 0 0
0 1 0
0 0 1
8C3:
0 0 1 0 0 1 0 0 -1 0 0 -1 0 1 0 0 -1 0 0 1 0 0 -1 0
1 0 0 -1 0 0 1 0 0 -1 0 0 0 0 1 0 0 -1 0 0 -1 0 0 1
0 1 0 0 -1 0 0 -1 0 0 1 0 1 0 0 1 0 0 -1 0 0 -1 0 0
3C2:
-1 0 0 -1 0 0 1 0 0
0 -1 0 0 1 0 0 -1 0
0 0 1 0 0 -1 0 0 -1
6S4:
-1 0 0 -1 0 0 0 0 1 0 0 -1 0 -1 0 0 1 0
0 0 -1 0 0 1 0 -1 0 0 -1 0 1 0 0 -1 0 0
0 1 0 0 -1 0 -1 0 0 1 0 0 0 0 -1 0 0 -1
6σ d:
1 0 0 1 0 0 0 0 1 0 0 -1 0 1 0 0 - 1 0
0 0 1 0 0 -1 0 1 0 0 1 0 1 0 0 -1 0 0
0 1 0 0 -1 0 1 0 0 -1 0 0 0 0 1 0 0 1
17
Symmetry Groups
17.3 O and Td Groups 687
y
O
1
p ðx ± yÞ = 0, ð17:49Þ
2
1
p ðy ± zÞ = 0, ð17:50Þ
2
1
p ðz ± xÞ = 0: ð17:51Þ
2
1 1 1
cos α = p ∙ p = ð17:52Þ
2 2 2
or
1 1 1 1
cos α = p ∙ p - p ∙ p = 0: ð17:53Þ
2 2 2 2
688 17 Symmetry Groups
That is, α = π/3 or α = π/2. On the basis of the discussion of Sect. 17.2, the
intersection of the two planes must be a rotation axis of C3 or C2. Once again, in the
case of C3, the intersection is a straight line connecting the origin and a vertex of the
cube. This can readily be verified as follows: For instance, two planes given by x = y
and y = z make an angle π/3 and produce an intersection line x = y = z. This line, in
turn, connects the origin and a vertex of the cube.
If we choose, e.g., two planes x = ± y from the above, these planes make a right
angle and their intersection must be a C2 axis. The three C2 axes coincide with the x-,
y-, and z-axes. In this light, σ d functions similarly to 6C2 of O in that their
combinations produce 8C3 or 3C2. Thus, constitution and operation of Td and
O are related.
Let us more closely inspect the structure and constitution of O and Td. First we
construct mapping ρ between group elements of O and Td such that
ρ : g 2 O ⟷ g ′ 2 T d , if g 2 T,
-1
ρ : g 2 O ⟷ - g′ 2 T d , if g 2
= T:
In the above relation, the minus sign indicates that with an inverse representation
matrix R must be replaced with -R. Then, ρ(g) = g′ is an isomorphic mapping. In
fact, comparing Fig. 17.12 and Table 17.6, ρ gives identical matrix representation for
O and Td. For example, taking the first matrix of S4 of Td, we have
-1
-1 0 0 -1 0 0 1 0 0
- 0 0 -1 =- 0 0 1 = 0 0 -1 :
0 1 0 0 -1 0 0 1 0
1 2 ⋯ n
σ= ,
i1 i2 ⋯ in
In Parts III and IV thus far, we have dealt with a broad class of linear transforma-
tions. Related groups are finite groups. Here we will describe characteristics of
special orthogonal group SO(3), a kind of infinite groups. The SO(3) represents
rotations in three-dimensional Euclidean space ℝ3. Rotations are made around an
axis (a line through the origin) with the origin fixed. The rotation is defined by an
azimuth of direction and a magnitude (angle) of rotation. The azimuth of direction is
defined by two parameters and the magnitude is defined by one parameter, a rotation
angle. Hence, the rotation is defined by three independent parameters. Since those
parameters are continuously variable, SO(3) is one of the continuous groups.
Two rotations result in another rotation with the origin again fixed. A reverse
rotation is unambiguously defined. An identity transformation is naturally defined.
An associative law holds as well. Thus, the relevant rotations form a group, i.e.,
SO(3). The rotation is represented by a real (3, 3) matrix whose determinant is 1. A
matrix representation is uniquely determined once an orthonormal basis is set in ℝ3.
Any rotation is represented by a rotation matrix accordingly. The rotation matrix R is
defined by
RT R = RRT = E, ð17:54Þ
where R is a real matrix with detR = 1. Notice that we exclude the case where
detR = - 1. Matrices that satisfy (17.54) with detR = ± 1 are referred to as
orthogonal matrices that cause orthogonal transformation. Correspondingly, orthog-
onal groups (represented by orthogonal matrices) contain rotation groups as a special
case. In other words, the orthogonal groups contain the rotation groups as a
690 17 Symmetry Groups
In this section we represent a vector such as jxi. We start with showing that any
rotation has a unique presence of a rotation axis. The rotation axis is defined by the
following: Suppose that there is a rigid body with some point within the body fixed.
Here the said rigid body can be that with infinite extent. Then the rigid body exerts
rotation. The rotation axis is a line on which every point is unmoved during the
rotation. As a matter of course, identity matrix E has three linearly independent
rotation axes. (Practically, this represents no rotation.)
Theorem 17.2 Any rotation matrix R is accompanied by at least one rotation axis.
Unless the rotation matrix is identity, the rotation matrix should be accompanied by
one and only one rotation axis.
Proof As R is an orthogonal matrix of a determinant 1, so are RT and R-1. Then we
have
ðR - E ÞT = RT - E = R - 1 - E: ð17:55Þ
Hence, we get
This equality holds with ℝn (n : odd), but in ℝn (n : even), we have detA = det (-
A). Then, (17.56) results in a trivial equation 0 = 0 accordingly. Therefore, the
discussion made below only applies to ℝn (n : odd). Thus, from (17.56) we have
17.4 Special Orthogonal Group SO(3) 691
detðR - E Þ = 0: ð17:57Þ
ðR - E Þ j x0 i = 0: ð17:58Þ
Therefore, we get
Rðajx0 iÞ = a j x0 i, ð17:59Þ
ðR - E Þ j ui = 0, ð17:61Þ
ðR - E Þ j vi = 0: ð17:62Þ
Let us consider a vector 8jyi that is chosen from Span{| ui, | vi}, to which we
assume that
That is, Span{| ui, | vi} represents a plane P formed by two mutually intersecting
straight lines. Then we have
P
| ⟩
| ⟩
Similarly, we have
hy j R T = hy j : ð17:68Þ
R{ = RT : ð17:69Þ
Now we have
In (17.70) the second equality comes from the associative law; the third is due to
(17.54). The last equality comes from that jwi is perpendicular to P.
From (17.70) we have
where the first equality comes from that R is an orthogonal matrix. Since detR=1,
a = 1. Thus, from (17.72) this time around, we have jRwi = j wi; that is,
ðR - E Þ j wi = 0: ð17:73Þ
Equations (17.65) and (17.73) imply that for any vector jxi arbitrarily chosen
from ℝ3, we have
ðR - E Þ j xi = 0: ð17:74Þ
Consequently, we get
R - E = 0 or R = E: ð17:75Þ
cos θ - sin θ 0
R= sin θ cos θ 0 : ð17:8Þ
0 0 1
1 1 1 i
p p 0 p p 0
2 2 2 2
U= - p
i
p
i
0 , U{ = 1
p - p
i
0 : ð17:76Þ
2 2 2 2
0 0 1 0 0 1
1 i 1 1
p p 0 p p 0
2 2 cos θ - sin θ 0 2 2
U { RU = p
1
-p 0
i sin θ cos θ 0 i i
0 0 1 -p p 0
2 2 2 2
0 0 1 0 0 1
iθ
e 0 0
= 0 e - iθ 0 : ð17:77Þ
0 0 1
Thus, eigenvalues are 1, eiθ, and e-iθ. The eigenvalue 1 results from the existence
of the unique rotation axis. When θ = 0, (17.77) gives an identity matrix with all the
eigenvalues 1 as expected. When θ = π, eigenvalues are -1, - 1, and 1. The
eigenvalue 1 is again associated with the unique rotation axis. The (unitary) simi-
larity transformation keeps a trace unchanged, that is, the trace χ is
χ = 1 þ 2 cos θ: ð17:78Þ
1 i 1 i
0 - 0
2 2 2 2 0 0 0
R = eiθ -
i 1
0 þ e - iθ i 1
0 þ 0 0 0 :
2 2 2 2 0 0 1
0 0 0 0 0 0
Euler angles are well known and have been used in various fields of science. We
wish to connect the above discussion with Euler angles.
In Part III we dealt with successive linear transformations. This can be extended
to the case of three or more successive transformations. Suppose that we have three
successive transformation R1, R2, and R3 and that the coordinate system (a three-
dimensional orthonormal basis) is transformed from O ⟶ I ⟶ II ⟶ III accordingly.
The symbol “O” stands for the original coordinate system and I, II, and III represent
successively transformed systems.
With the discussion that follows, let us denote the transformation by R02 , R03 , R003 ,
etc. in reference to the coordinate system I. For example, R03 means that the third
transformation is viewed from the system I. The transformation R003 indicates the third
transformation is viewed from the system II. That is, the number of primes “′”
denotes the number of the coordinate system to distinguish the systems I and II. Let
R2 (without prime) stand for the second transformation viewed from the system O.
Meanwhile, we have
R1 R02 = R2 R1 : ð17:79Þ
Let us call R02 , R003 , etc. a transformation on a “moving” coordinate system (i.e., the
systems I, II, III, ⋯). On the other hand, we call R1, R2, etc. a transformation on a
“fixed” system (i.e., original coordinate system O). Thus (17.81) shows that the
multiplication order is reversed with respect to the moving system and fixed
system [3].
For a practical purpose, it would be enough to consider three successive trans-
formations. Let us think of, however, a general case where n successive transforma-
tions are involved (n denotes a positive integer). For the purpose of succinct
notation, let us define the linear transformations and relevant coordinate systems
as those in Fig. 17.17. The diagram shows the transformation of the basis vectors.
ðiÞ
We define a following orthogonal transformation Rj :
696 17 Symmetry Groups
( ) ( ) ( ) ( )
≡
O I II III
z
y
x
ðiÞ ð0Þ
Rj ð0 ≤ i < j ≤ nÞ, and Ri Ri , ð17:83Þ
ðiÞ
where Rj is defined as a transformation Rj described in reference to the coordinate
ð0Þ
system i; Ri means that Ri is referred to the original coordinate system (i.e., the
fixed coordinate). Then we have
ði - 2Þ ði - 1Þ ði - 2Þ ði - 2Þ
Ri - 1 Rk = Rk Ri - 1 ðk > i - 1Þ: ð17:84Þ
Particularly, when i = 3
For i = 2 we have
ð1Þ
R1 Rk = Rk R1 ðk > 1Þ: ð17:86Þ
ð1Þ ð2Þ ðn - 3Þ ðn - 2Þ
R~n R1 R2 R3 ⋯Rn - 2 Rn - 1 Rðnn - 1Þ : ð17:87Þ
ðn - 2Þ
Applying (17.84) on Rn - 1 Rðnn - 1Þ and rewriting (17.87), we have
ð1Þ ð2Þ ð n - 3Þ ðn - 2Þ
R~n = R1 R2 R3 ⋯Rn - 2 Rðnn - 2Þ Rn - 1 : ð17:88Þ
ðn - 3Þ
Applying (17.84) again on Rn - 2 Rðnn - 2Þ , we get
17.4 Special Orthogonal Group SO(3) 697
ð1Þ ð2Þ ðn - 3Þ ðn - 2Þ
R~n = R1 R2 R3 ⋯Rðnn - 3Þ Rn - 2 Rn - 1 : ð17:89Þ
where with the last equality we used (17.86). In this case, we have
R1 Rðn1Þ = Rn R1 : ð17:91Þ
ð1Þ ð2Þ ð n - 3Þ
R~n = Rn Rn - 1 R1 R2 R3 ⋯Rn - 2 : ð17:92Þ
In total, we have applied the permutation of (17.84) n(n - 1)/2 times. When n is
2, n(n - 1)/2 = 1. This is the case with (17.79). If n is 3, n(n - 1)/2 = 3. This is the
case with (17.81). Thus, (17.93) once again confirms that the multiplication order is
reversed with respect to the moving system and fixed system.
Meanwhile, we define P ~ as
Alternately, we describe
~ = Rn - 1 Rn - 2 ⋯R3 R2 R1 :
P ð17:95Þ
R~n = PR ~
~ ðnn - 1Þ = Rn P: ð17:96Þ
Equivalently, we have
Rn = PR ~ -1
~ ðnn - 1Þ P ð17:97Þ
or
698 17 Symmetry Groups
~ - 1 Rn P:
Rðnn - 1Þ = P ~ ð17:98Þ
Moreover, we have
T
~ T = R1 Rð21Þ Rð32Þ ⋯Rðnn--23Þ Rðnn--12Þ
P
ð n - 2Þ T ðn - 3Þ T ð1Þ T
= Rn - 1 Rn - 2 ⋯ R2 ½R1 T
ðn - 2Þ - 1 ð n - 3Þ - 1 ð1Þ - 1
= Rn - 1 Rn - 2 ⋯ R2 R1 - 1
ð1Þ ð2Þ ðn - 3Þ ðn - 2Þ - 1 ~ -1
= R1 R2 R3 ⋯Rn - 2 Rn - 1 =P : ð17:99Þ
ðn - 2Þ ðn - 3Þ
The third equality of (17.99) comes from the fact that matrices Rn - 1 , Rn - 2 ,
ð1Þ
R2 , and R1 are orthogonal matrices. Then, we get
~T P
P ~=P
~P~ T = E: ð17:100Þ
cos α cos β cos γ - sin α sin γ - cos α cos β sin γ - sin α cos γ cos α sin β
= sin α cos β cos γ þ cos α sin γ - sin α cos β sin γ þ cos α cos γ sin α sin β :
- sin β cos γ sin β sin γ cos β
ð17:101Þ
cos α – sin α 0
R1 = sin α cos α 0 , ð17:102Þ
0 0 1
cos β 0 sin β
ð1Þ
R2 = 0 1 0 , ð17:103Þ
- sin β 0 cos β
cos γ – sin γ 0
ð2Þ
R3 = sin γ cos γ 0 : ð17:104Þ
0 0 1
-1
R3 = PR ~ - 1 = R1 Rð21Þ Rð32Þ Rð21Þ
~ ð32Þ P R1 - 1 : ð17:106Þ
ð cos 2 αcos 2 β þ sin 2 αÞ cosγ cos αsin αsin 2 βð1- cos γ Þ cos αcos β sin βð1- cos γ Þ
þ cos 2 αsin 2 β - cos β sin γ þ sin αsin β sin γ
cos αsin αsin 2 βð1 - cos γ Þ ð sin 2 αcos 2 β þ cos 2 αÞ cosγ sinα cosβ sin βð1- cos γ Þ
= þ cos β sin γ þ sin 2 α sin 2 β - cos αsin β sin γ :
cosβ sin β cos αð1- cos γ Þ sin αcos β sin βð1 - cos γ Þ sin 2 β cosγ
- sin αsin β sin γ þ cosα sinβ sin γ þ cos 2 β
ð17:107Þ
χ = 1 þ 2 cos γ: ð17:108Þ
ð2Þ
The trace is the same as that of R3 , as expected from (17.106) and (17.97).
Equation (17.107) apparently seems complicated but has simple and well-defined
ð2Þ
meaning. The rotation represented by R3 is characterized by a rotation by γ around
the z -axis. Figure 17.18 represents the orientation of the z′′-axis viewed in reference
′′
to the original xyz-system. That is, the z′′-axis (identical to the rotation axis A) is
designated by an azimuthal angle α and a zenithal angle β as shown. The operation
R3 is represented by a rotation by γ around the axis A in the xyz-system. The angles α,
β, and γ coincide with the Euler angles designated with the same independent
parameters α, β, and γ.
From (17.77) and (17.107), a diagonalizing matrix for R3 is PU. ~ That is,
eiγ 0 0
{~ -1 { ð2Þ
U P ~U= P
R3 P ~ U R3 P
~U = U { R3 U = 0 e - iγ 0 : ð17:109Þ
0 0 1
~{ = P
P ~ -1
~T = P ð17:110Þ
and
17.4 Special Orthogonal Group SO(3) 701
O y
1 1
p p 0
cos α cos β - sin α cos α sin β 2 2
~ ¼
PU sin α cos β cos α sin α sin β i i
-p p 0
- sin β 0 cos β 2 2
0 0 1
1 i 1 i
p cos α cos β þ p sin α p cos α cos β - p sin α cos α sin β
2 2 2 2
1 i 1 i
¼ p sin α cos β - p cos α p sin α cos β þ p cos α sin α sin β :
2 2 2 2
1 1
- p sin β - p sin β cos β
2 2
ð17:111Þ
jR3 - λE j = 0: ð17:112Þ
jR3 -λE j
ð cos 2 αcos 2 β þ sin 2 αÞcosγ cosαsinαsin 2 βð1- cosγ Þ cosαcosβ sinβð1- cosγ Þ
þcos 2 αsin 2 β -λ - cosβ sinγ þsinαsinβ sinγ
cosαsinαsin 2 βð1- cosγ Þ ð sin 2 αcos 2 β þ cos 2 αÞcosγ sinαcosβ sinβð1- cosγ Þ
= - cosαsinβ sinγ :
þcosβ sinγ þsin 2 αsin 2 β -λ
ð cos 2 β cos 2 α þ sin 2 αÞ cos γ sin 2 β cos α sin αð1 - cos γ Þ cos β sin β cos αð1 - cos γ Þ
þ sin 2 β cos 2 α - 1 - cos β sin γ þ sin β sin α sin γ
sin 2 β cos α sin αð1 - cos γ Þ ð cos 2 β sin 2 α þ cos 2 αÞ cos γ sin α cos β sin βð1 - cos γ Þ
þ cos β sin γ þ sin 2 α sin 2 β - 1 - cos α sin β sin γ
cos α cos β sin βð1 - cos γ Þ sin α cos β sin βð1 - cos γ Þ sin 2 β cos γ
- sin α sin β sin γ þ cos α sin β sin γ þ cos 2 β - 1
cos α sin β 0
× sin α sin β = 0 :
cos β 0
ð17:114Þ
The above matrix calculations certainly verify that (17.114) holds. The confir-
mation is left as an exercise for readers.
As an application of (17.107) to an illustrative example, let us consider a 2π/3
rotation around an axis trisecting a solid angle π/2 formed by the x-, y-, and z-axes
(see Fig. 17.19 and Sect. 17.3). Then we have
p p p
cos α = sin α = 1= 2, cos β = 1= 3, sin β = 2= 3,
p
cos γ = - 1=2, sin γ = 3=2: ð17:115Þ
z z
(a) (b)
O
y
O
y
x
Fig. 17.19 Rotation axis of C3 that permutates basis vectors. (a) The C3 axis is a straight line that
connects the origin and each vertex of a cube. (b) Cube viewed along the C3 axis that connects the
origin and vertex
0 0 1
R3 = 1 0 0 : ð17:116Þ
0 1 0
0 0 1 x1
R3 ðjxiÞ = ðje1 i je2 i je3 iÞ 1 0 0 x2 , ð17:117Þ
0 1 0 x3
where je1i, j e2i, and j e3i represent unit basis vectors in the direction of x-, y-, and z-
axes, respectively. We have jxi = x1 j e1i + x2 j e2i + x3 j e3i. Thus, we get
x1
R3 ðjxiÞ = ðje2 i je3 i je1 iÞ x2 : ð17:118Þ
x3
This implies a cyclic permutation of the basis vectors. This is well characterized
by Fig. 17.19. Alternatively, (17.117) can be expressed in terms of the column
vectors (i.e., coordinate) transformation as
704 17 Symmetry Groups
x3
R3 ðjxiÞ = ðje1 i je2 i je3 iÞ x1 : ð17:119Þ
x2
Care should be taken on which linear transformation is intended out of the basis
vector transformation or coordinate transformation.
As mentioned above, we have compared geometric features on the moving
coordinate system and fixed coordinate system. The features apparently seem to
differ at a glance, but are essentially the same. □
References
1. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
2. Cotton FA (1990) Chemical applications of group theory, 3rd edn. Wiley, New York
3. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press,
Princeton
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Aca-
demic Press, Waltham
Chapter 18
Representation Theory of Groups
In Sect. 16.4 we dealt with various aspects of the mapping between group elements.
Of these, we studied fundamental properties of isomorphism and homomorphism. In
this section we introduce the notion of representation of groups and study it. If we
deal with a finite group consisting of n elements, we describe it as ℊ = {g1 e, g2,
⋯, gn} as in the case of Sect. 16.1.
Definition 18.1 Let ℊ = {gν} be a group comprising elements gν. Suppose that a
(d, d ) matrix D(gν) is given for each group element gν. Suppose also that in
correspondence with
gμ gν = gρ , ð18:1Þ
D gμ Dðgν Þ = D gρ ð18:2Þ
L = fDðgν Þg
is said to be a representation.
Although the definition seems somewhat daunting, the representation is, as
already seen, merely a homomorphism.
We call individual matrices D(gν) a representation matrix. A dimension d of the
matrix is said to be a dimension of representation as well. In correspondence with
gνe = gν, we have
D g μ D ð e Þ = D gμ : ð18:3Þ
Therefore, we get
DðeÞ = E, ð18:4Þ
That is,
D gν - 1 = ½Dðgν Þ - 1 : ð18:6Þ
n
H= i=1
Dðgi Þ{ Dðgi Þ: ð18:7Þ
{ n {
D gj HD gj = i=1
D gj Dðgi Þ{ Dðgi ÞD gj
n {
= i=1
Dðgi ÞD gj Dðgi ÞD gj
n { n
= i=1
D g i gj D gi gj = k=1
D ð gk Þ { D ð gk Þ
= H, ð18:8Þ
where the third equality comes from homomorphism of the representation and the
second last equality is due to rearrangement theorem (see Sect. 16.1).
Note that each matrix D(gi){D(gi) is an Hermitian Gram matrix constructed by a
non-singular matrix and that H is a summation of such Gram matrices. Conse-
quently, on the basis of the argument of Sect. 13.2, H is positive definite and all
the eigenvalues of H are positive. Then, using an appropriate unitary matrix U, we
can get a diagonal matrix Λ such that
U { HU = Λ: ð18:9Þ
λ1
λ2
Λ= , ð18:10Þ
⋱
λd
λ1- 1
-1
Λ1=2 = λ2- 1 : ð18:12Þ
⋱
λd- 1
Notice that both Λ1/2 and (Λ1/2)-1 are non-singular. Furthermore, we define a
matrix V such that
-1
V = U Λ1=2 : ð18:13Þ
Then, multiplying both sides of (18.8) by V-1 from the left and by V from the
right and inserting VV-1 = E in between, we get
{
V - 1 D gj VV - 1 HVV - 1 D gj V = V - 1 HV: ð18:14Þ
Meanwhile, we have
-1 -1
V - 1 HV = Λ1=2 U { HU Λ1=2 = Λ1=2 Λ Λ1=2 = Λ: ð18:15Þ
With the second equality of (18.15), we used (18.9). Inserting (18.15) into
(18.14), we get
{
V - 1 D gj VΛV - 1 D gj V = Λ: ð18:16Þ
{
Λ - 1 V - 1 D gj ðVΛÞ V - 1 D gj V = E: ð18:17Þ
-1 -1 -1
Λ - 1 V - 1 = ðVΛÞ - 1 = UΛ2 U - 1 = Λ2 U{:
1 1 1
= Λ2
-1 { -1
V{ = Λ1=2 U { = Λ1=2 U{:
VΛ = UΛ1=2 :
{
{ -1 -1 { { {
V -1 = Λ2 U { = U{
1 1 1 1
= U Λ2 Λ2 = UΛ2 ,
where we used unitarity of U and Hermiticity of Λ1/2 indicated in (18.11). Using the
above relations and rewriting (18.17), finally we get
{ {
V { D gj V -1 ∙ V - 1 D gj V = E: ð18:18Þ
~ gj as follows
Defining D
~ gj V - 1 D gj V,
D ð18:19Þ
{ { {
V { D gj V -1 ~ gj :
=D ð18:20Þ
~ gj { D
D ~ gj = E: ð18:21Þ
d
gi ð ψ ν Þ = ψ μ Dμν ðgi Þ ð1 ≤ ν ≤ dÞ: ð18:22Þ
μ=1
d d
gj ½gi ðψ ν Þ = gj gi ðψ ν Þ = gj ψ μ Dμν ðgi Þ = gj ψ μ Dμν ðgi Þ
μ=1 μ=1
d d
d
= gj ψ μ Dμν ðgi Þ = λ=1
ψ λ Dλμ gj Dμν ðgi Þ
μ=1 μ=1
d
= λ=1
ψ λ D gj Dðgi Þ λν
: ð18:24Þ
d
gk ð ψ ν Þ = λ=1
ψ λ D gj Dðgi Þ λν
: ð18:25Þ
d
gk ð ψ ν Þ = ψ λ Dλν ðgk Þ: ð18:26Þ
μ=1
Comparing (18.25) and (18.26) and considering the uniqueness of vector repre-
sentation based on linear independence of the basis vectors (see Sect. 11.1), we get
18.2 Basis Functions of Representation 711
D gj D ð gi Þ λν
= Dλν ðgk Þ ½Dðgk Þλν , ð18:27Þ
where the last identity follows the notation (11.38) of Sect. 11.2. Hence, we have
B = fψ 1 , ψ 2 , ⋯, ψ d g ð18:29Þ
d
ψ 0ν = ψ λ T λν : ð18:31Þ
λ=1
d
ψμ = ψ 0λ T - 1 λμ
: ð18:32Þ
λ=1
d d
d
gi ψ 0ν = gi ðψ λ ÞT λν = λ=1
ψ μ Dμλ ðgi ÞT λν
λ=1 μ=1
d d
d d
= λ=1
ψ 0κ T - 1 κμ
Dμλ ðgi ÞT λν = κ=1
ψ 0κ T - 1 Dðgi ÞT κν
: ð18:33Þ
μ=1 κ=1
d
gi ψ 0ν = κ=1
ψ 0κ D ′ κν :
Dðgi Þ
Δ ð gi Þ = ~ ð gi Þ , ð18:34Þ
D
~ ðgi Þ.
A dimension Δ(gi) is a sum of that of D(gi) and that of D
Definition 18.4 Let D be a representation of ℊ and let D(gi) be a representation
matrix of a group element gi. If we can convert D(gi) to a block matrix such as
(18.34) by its similarity transformation, D is said to be a reducible representation, or
completely reducible. If the representation is not reducible, it is irreducible.
A reducible representation can be decomposed (or reduced) to a direct sum of
plural irreducible representations. Such an operation is called reduction.
Definition 18.5 Let ℊ = {g1, g2, ⋯, gn} be a group and let V be a vector space.
Then, we denote a representation D of ℊ operating on V by
18.2 Basis Functions of Representation 713
D : ℊ → GLðV Þ:
where with the second equality we used the fact that D(gi) is a unitary matrix; also,
see (18.6). But, since W is D(gi)-invariant from the supposition, jD(gi-1)| bi 2 W.
Here notice that gi-1 must be identical to ∃gk (1 ≤ k ≤ n) 2 ℊ. Therefore, we
should have
ajD gi - 1 jb = 0:
This implies that jD(gi)| ai 2 W⊥. In turn, this means that W⊥ is a D(gi)-invariant
subspace of V. This completes the proof.
From Theorem 14.1, we have V = W ⨁ W⊥. Correspondingly, as in the case of
(12.102), D(gi) is reduced as follows:
DðW Þ ðgi Þ
Dðgi Þ = ⊥ ,
DðW Þ ðgi Þ
⊥
where D(W )(gi) and DðW Þ ðgi Þ are representation matrices associated with subspaces
W and W⊥, respectively. This is the converse of the additive representations that
appeared in Definition 18.3. From (11.17) and (11.18), we have
H H
H C1 H H C1 H
C3 C2 C3 C2
H H H H
y
x
Fig. 18.1 Allyl radical and its resonance structure. The molecule is placed on the yz-plane. The z-
axis is identical with a straight line connecting C1 and H (i.e., C2 axis)
c1
= ðjϕ1 i jϕ2 i jϕ3 iÞ c2 :
c3
Let us now operate a group element of C2v. For example, choosing C2(z), we have
-1 0 0 c1
C 2 ðzÞðjψiÞ = ðjϕ1 i jϕ2 i jϕ3 iÞ 0 0 -1 c2 : ð18:39Þ
0 -1 0 c3
Table 18.1 Matrix representation of symmetry operations given with respect to jϕ1i, jϕ2i, and jϕ3i
E C2(z) σ v(zx) σ 0v ðyzÞ
1 0 0 -1 0 0 1 0 0 -1 0 0
0 1 0 0 0 -1 0 0 1 0 -1 0
0 0 1 0 -1 0 0 1 0 0 0 -1
-λ-1 0 0
0 -λ - 1 = 0:
0 -1 -λ
1 0 0
1 1
0 p p
U= 2 2 = U{: ð18:40Þ
1 1
0 p -p
2 2
-1 0 0 c1
C 2 ðzÞðjψiÞ = ðjϕ1 i jϕ2 i jϕ3 iÞUU { 0 0 -1 UU { c2
0 -1 0 c3
c1
1
1 1 -1 0 0 p ð c2 þ c 3 Þ
= jϕ1 i p jϕ2 þ ϕ3 i p jϕ2 - ϕ3 i 0 -1 0 2 :
2 2 0 0 1 1
p ð c2 - c3 Þ
2
The diagonalization of the representation matrix σ v(zx) using the same U as the
above is left for the readers as an exercise. Table 18.2 shows the results of the
diagonalized representations with regard to the vectors
j ϕ1 i, p12 j ϕ2 þ ϕ3 i, and p12 j ϕ2 - ϕ3 i. Notice that traces of the matrices remain
unchanged in Tables 18.1 and 18.2, i.e., before and after the unitary similarity
transformation. From Table 18.2, we find that the vectors are eigenfunctions of the
individual symmetry operations. Each diagonal element is a corresponding eigen-
value of those operations. Of the three vectors, jϕ1i and p12 j ϕ2 þ ϕ3 i have the same
eigenvalues with respect to individual symmetry operations. With symmetry oper-
ations C2 and σ v(zx), another vector p12 j ϕ2 - ϕ3 i has an eigenvalue of a sign
opposite to that of the former two vectors. Thus, we find that we have arrived at
“symmetry-adapted” vectors by taking linear combination of original vectors.
18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT) 717
Table 18.2 Matrix representation of symmetry operations given with respect to jϕ1i,
p1 ðjϕþ ϕ3 iÞ, and p12 ðjϕ2 - ϕ3 iÞ
2 2
Returning back to Table 17.2, we see that the representation matrices have already
been diagonalized. In terms of the representation space, we constructed the repre-
sentation matrices with respect to x, y, and z as symmetry-adapted basis vectors. In
the next chapter, we make the most of such vectors constructed by the symmetry-
adapted linear combination.
Meanwhile, allocating appropriate numbers to coefficients c1, c2, and c3, we
represent any vector in V3 spanned by jϕ1i, jϕ2i, and jϕ3i. In the next chapter, we
deal with molecular orbital (MO) calculations. Including the present case of the allyl
radical, we solve the energy eigenvalue problem by appropriately determining those
coefficients (i.e., eigenvectors) and corresponding energy eigenvalues in a represen-
tation space. A dimension of the representation space depends on the number of
molecular orbitals. The representation space is decomposed into several or more (but
finite) orthogonal complements according to (18.38). We will come back to the
present example in Sect. 19.4.4 and further investigate the problems there.
As can be seen in the above example, the representation space accommodates
various types of vectors, e.g., mathematical functions in the present case. If the
representation space is decomposed into invariant subspaces, we can choose appro-
priate basis vectors for each subspace, the number of which is equal to the dimen-
sionality of each irreducible representation [4]. In this context, the situation is related
to that of Part III where we have studied how a linear vector space is decomposed
into direct sum of invariant subspaces that are spanned by associated basis vectors.
In particular, it will be of great importance to construct mutually orthogonal
symmetry-adapted vectors through linear combination of original basis vectors in
each subspace associated with the irreducible representation. We further study these
important subjects in the following several sections.
Pivotal notions of the representation theory of finite groups rest upon Schur’s
lemmas (first lemma and second lemma).
Schur’s First Lemma [1, 2]
Let D and D ~ be two irreducible representations of ℊ. Let dimensions of represen-
~ be m and n, respectively. Suppose that with 8g 2 ℊ the following
tations of D and D
relation holds:
718 18 Representation Theory of Groups
~ ðgÞ,
DðgÞM = M D ð18:41Þ
m
gðψ ν Þ = ψ μ Dμν ðgÞ ð1 ≤ ν ≤ mÞ:
μ=1
m
ϕν = ψ μ M μν ð1 ≤ ν ≤ nÞ: ð18:42Þ
μ=1
m
m m
gðϕν Þ = g ψ μ M μν = μ=1 λ=1
ψ λ Dλμ ðgÞ M μν
μ=1
=
m
ψλ
m
Dλμ ðgÞM μν =
m
ψλ
n
~ μν ðgÞ
M λμ D
λ=1 μ=1 λ=1 μ=1
=
n m
~ μν ðgÞ =
ψ λ M λμ D
n
~
ϕ D ðgÞ: ð18:43Þ
μ=1 λ=1 μ = 1 μ μν
~ ðgÞ{ M { = M { DðgÞ{ :
D ð18:44Þ
Therefore, we get
D g - 1 = DðgÞ - 1 = DðgÞ{ :
Similarly, we have
D ~ ð gÞ - 1 = D
~ g-1 = D ~ ð gÞ { :
~ g - 1 M{ = M{D g - 1 :
D ð18:46Þ
The number of rows of M{ is larger than that of columns. This time, dimension
n of D~ ðg - 1 Þ is larger than m of D(g-1). A group element g designates an arbitrary
element of ℊ, so does g-1. Thus, (18.46) can be treated in parallel with (18.41)
where m > n with a (m, n) matrix M. Figure 18.2 graphically shows the magnitude
relationship in dimensions of representation between representation matrices and M.
Thus, in parallel with the above argument of (a), we exclude the case where m < n as
well.
(c) We consider the third case of m = n. Similarly as before we make a linear
combination of vectors contained in the basis set B = fψ 1 , ψ 2 , ⋯, ψ m g such that
m
ϕν = ψ μ M μν ð1 ≤ ν ≤ mÞ, ð18:47Þ
μ=1
where M is a (m, m) square matrix. If detM = 0, then ϕ1, ϕ2, ⋯, and ϕm are linearly
dependent (see Sect. 11.4). With the number of linearly independent vectors p, we
have p < m accordingly. As in (18.43) again, this implies that we would have
obtained a representation of a smaller dimension p for D, ~ in contradiction to the
~
supposition that D is irreducible. To avoid this contradiction, we must have detM ≠ 0.
These complete the proof.
Schur’s Second Lemma [1, 2]
Let D be a representation of ℊ. Suppose that with 8g 2 ℊ we have
720 18 Representation Theory of Groups
(a): >
( ) × = ( )
×
(m, m) (m, n) (m, n) (n, n)
(b): <
( ) × = × ( )
(n, n) (n, m) (n, m) (m, m)
~ ðgÞ]
Fig. 18.2 Magnitude relationship between dimensions of representation matrices [D(g) and D
and M. (a) m > n. (b) m < n. The diagram is based on (18.41) and (18.46) of the text
If D is irreducible, Schur’s first lemma implies that we must have either (1) M -
cE = 0 or (2) det(M - cE) ≠ 0. A matrix M has at least one proper eigenvalue λ (Sect.
12.1), and so choosing λ for c in (18.49), we have det(M - λE) = 0. Consequently,
only former case is allowed. That is, we have
M = cE: ð18:50Þ
n. Let D(α) and D(β) be two irreducible representations chosen from among D(1), D(2),
⋯. Then, regarding their matrix representations, we have the following relationship:
ðαÞ ðβ Þ n
Dij ðgÞ Dkl ðgÞ = δ δ δ , ð18:51Þ
g dα αβ ik jl
where Σg means that the summation should be taken over all n group elements and dα
denotes a dimension of the representation D(α). The symbol δαβ means that δαβ = 1
when D(α) and D(β) are equivalent and that δαβ = 0 when D(α) and D(β) are
inequivalent.
Proof First we prove the case where D(α) = D(β). For the sake of simple expression,
we omit a superscript and denote D(α) simply by D. Let us construct a matrix
A such that
A= g
DðgÞXD g - 1 , ð18:52Þ
-1
Dðg0 ÞA = g
Dðg0 ÞDðgÞXD g - 1 = g
Dðg0 ÞDðgÞXD g - 1 D g0 D ð g0 Þ
-1
= g
Dðg0 ÞDðgÞXD g - 1 g0 D ð g0 Þ
-1
= g
Dðg0 gÞXD ðg0 gÞ Dðg0 Þ: ð18:53Þ
Thanks to the rearrangement theorem, for fixed g′ the element g′g runs through all
the group elements as g does so. Therefore, we have
-1
g
Dðg0 gÞXD ðg0 gÞ = g
DðgÞXD g - 1 = A: ð18:54Þ
Thus,
A = λE: ð18:56Þ
ðlÞ
The value of a constant λ depends upon the choice of X. Let X be δi δjðmÞ where all
the matrix elements are zero except for the (l, m)-component that takes 1 (Sects. 12.5
and 12.6). Thus from (18.53) we have
722 18 Representation Theory of Groups
where λlm is a constant to be determined. Using the fact that the representation is
unitary, from (18.57) we have
g
Dil ðgÞDjm ðgÞ = λlm δij : ð18:58Þ
g i
Dil ðgÞDmi g - 1 = g
D g - 1 DðgÞ ml
= g
D g - 1g ml
= g
½DðeÞml = δ
g ml
= nδml , ð18:59Þ
λ δ
i lm ii
= λlm d, ð18:60Þ
n
λlm d = nδlm or λlm = δ : ð18:61Þ
d lm
n
Dil ðgÞDjm ðgÞ = δ δ : ð18:62Þ
g d lm ij
ðαÞ ðαÞ n
Dil ðgÞDjm ðgÞ = δ δ , ð18:63Þ
g dα lm ij
B= g
DðαÞ ðgÞXDðβÞ g - 1 , ð18:64Þ
DðαÞ ðg0 ÞB = g
DðαÞ ðg0 ÞDðαÞ ðgÞXDðβÞ g - 1
-1
= g
DðαÞ ðg0 ÞDðαÞ ðgÞXDðβÞ g - 1 DðβÞ g0 DðβÞ ðg0 Þ
-1
= g
DðαÞ ðg0 gÞXDðβÞ ðg0 gÞ DðβÞ ðg0 Þ = BDðβÞ ðg0 Þ: ð18:65Þ
B = 0: ð18:66Þ
ðlÞ
Putting X = δi δjðmÞ as before and rewriting (18.64), we get
ðαÞ ðβ Þ
g
Dil ðgÞDjm ðgÞ = 0: ð18:67Þ
Combining (18.63) and (18.67), we get (18.51). These procedures complete the
proof.
18.4 Characters
d
χ ðgÞ TrDðgÞ = i=1
Dii ðgÞ, ð18:68Þ
where g stands for group elements g1, g2, ⋯, and gn, Tr stands for “trace,” and d is
a dimension of the representation D. Let ∁ be a set defined as
This is because
ggi g - 1 = gj : ð18:73Þ
χ ðgi Þ = χ gj : ð18:75Þ
4. Any two equivalent representations have the same trace. This immediately
follows from (18.30).
There are several orthogonality theorem about a trace. Among them, a following
theorem is well known.
Theorem 18.4 A trace of irreducible representations satisfies the following orthog-
onality relation:
g
χ ðαÞ ðgÞ χ ðβÞ ðgÞ = nδαβ , ð18:76Þ
where χ (α) and χ (β) are traces of irreducible representations D(α) and D(β),
respectively.
Proof In (18.51) putting i = j and k = l in both sides and summing over all i and k,
we have
18.4 Characters 725
ðαÞ ðβÞ n n n
Dii ðgÞ Dkk ðgÞ = δαβ δik δik = δαβ δik = δαβ dα
g i, k i, k
d α i, k
d α d α
= nδαβ : ð18:77Þ
From (18.68) and (18.77), we get (18.76). This completes the proof.
Since a character is identical with group elements belonging to the same
conjugacy class Kl, we may write it as χ(Kl) and rewrite a summation of (18.76) as
a summation of the conjugacy classes. Thus, we have
nc
l=1
χ ðαÞ ðK l Þ χ ðβÞ ðK l Þk l = nδαβ , ð18:78Þ
where nc denotes the number of conjugacy classes in a group and kl indicates the
number of group elements contained in a class Kl.
We have seen a case where a representation matrix can be reduced to two
(or more) block matrices as in (18.34). Also as already seen in Part III, the block
matrix decomposition takes place with normal matrices (including unitary matrices).
The character is often used to examine a constitution of a reducible representation or
reducible matrix.
Alternatively, if a unitary matrix (a normal matrix, more widely) is decomposed
into block matrices, we say that the unitary matrix comprises a direct sum of those
block matrices. In physics and chemistry dealing with atoms, molecules, crystals,
etc., we very often encounter such a situation. Extending (18.35), the relation can
generally be summarized as
where D(gi) is a reducible representation for a group element gi and D(1)(gi), D(2)(gi),
⋯, and D(ω)(gi) are irreducible representations in a group. The notation D(ω)(gi)
means that D(ω)(gi) may be equivalent (or identical) to D(1)(gi), D(2)(gi), etc. or may
be inequivalent to them. More specifically, the same irreducible representations may
well appear several times.
To make the above situation clear, we usually use a following equation instead:
ðαÞ
Dðgi Þ = q D
α α
ðgi Þ, ð18:80Þ
ðαÞ
χ ð gÞ = q χ
α α
ðgÞ, ð18:81Þ
where χ(g) is a character of the reducible representation for a group element g, where
we omitted a subscript i indicating the i-th group element gi (1 ≤ i ≤ n). To find qα,
let us multiply both sides of (18.81) by χ (α)(g) and take summation over group
elements. That is,
g
χ ð α Þ ð gÞ χ ð gÞ = q
β β
χ ðαÞ ðgÞ χ ðβÞ ðgÞ = qβ nδαβ = qα n, ð18:82Þ
g β
1
qα = χ ðαÞ ðgÞ χ ðgÞ: ð18:83Þ
n g
The integer qα explicitly gives the number of appearance of D(α)(g) that appears in
a reducible representation D(g). The expression pertinent to the classes is
1
qα = χ ðαÞ K j χ K j kj , ð18:84Þ
n j
where Kj and kj denote the j-th class of the group and the number of elements
belonging to Kj, respectively.
Now, the readers may wonder how many different irreducible representations exist
for a group. To answer this question, let us introduce a special representation of the
regular representation.
Definition 18.7 Let ℊ = {g1, g2, ⋯, gn} be a group. Let us define a (n, n) square
matrix D(R)(gν) for an arbitrary group element gν (1 ≤ ν ≤ n) such that
where
gν g μ g k = g i : ð18:88Þ
That is,
gi- 1 gν gμ gk = e: ð18:89Þ
= D ð RÞ g ν g μ : ð18:90Þ
ik
0 1 0 0
1 0 0 0
DðRÞ ðC 2 Þ = : ð18:92Þ
0 0 0 1
0 0 1 0
1 0 0 0
ð RÞ 0 1 0 0
D ðE Þ = : ð18:93Þ
0 0 1 0
0 0 0 1
0 0 1 0 0 0 0 1
ð RÞ 0 0 0 1 ð RÞ 0 0 1 0
D ðσ v Þ = ,D ðσ v ′ Þ = :
1 0 0 0 0 1 0 0
0 1 0 0 1 0 0 0
n for gν = e,
n
χ ðRÞ ðgν Þ = i=1
δ gi- 1 gν gi = ð18:94Þ
0 for gν ≠ e:
As can be seen from (18.92), the regular representation is reducible because the
matrix is decomposed into block matrices. Therefore, the representation can be
reduced to a direct sum of irreducible representations such that
χ ðRÞ ðgν Þ = q χ
α α
ðαÞ
ðgν Þ, ð18:96Þ
where χ (α) is a trace of the irreducible representation D(α). Using (18.83) and (18.94),
1 1 ðαÞ ðRÞ 1
qα = χ ðαÞ ðgÞ χ ðRÞ ðgÞ = χ ðeÞ χ ðeÞ = χ ðαÞ ðeÞ n = χ ðαÞ ðeÞ
n g n n
= dα : ð18:97Þ
Note that a dimension dα of the representation D(α) is equal to a trace of its identity
matrix. Also notice from (18.95) and (18.97) that D(R) contains every irreducible
representation D(α) dα times. To show it more clearly, Table 18.4 gives a character
table of C2v. The regular representation matrices are given in Example 18.2. For this,
we have
This relation obviously indicates that all the irreducible representations of C2v are
contained one time (that is equal to the dimension of representation of C2v).
To examine general properties of the irreducible representations, we return to
(18.94) and (18.96). Replacing qα with dα there, we get
n for gν = e,
d χ ðαÞ ðgν Þ =
α α
ð18:99Þ
0 for gν ≠ e:
d
α α
2
= n: ð18:100Þ
This is very important relation in the representation theory of a finite group in that
(18.100) sets an upper limit to the number of irreducible representations and their
dimensions. That number cannot exceed the order of a group.
730 18 Representation Theory of Groups
ℵ= ag g, ð18:101Þ
g
ℵ0 = a0g0 g0 : ð18:102Þ
g0
Also, we get
-1 -1
= ag gg0 a0g0 - 1 = agg0 gg0 g0 a0g0 - 1
g0 - 1 g g0 - 1 gg0
where we used the rearrangement theorem and suitable exchange of group elements.
Thus, we see that the above-defined set ℵ is closed under summations (i.e., linear
combinations) and multiplications. A set closed under summations and multiplica-
tions is said to be an algebra. If the set forms a group, the said set is called a group
algebra.
18.5 Regular Representation and Group Algebra 731
gK i g - 1 ⊂ K i : ð18:106Þ
Multiplying g-1 from the left and g from the right of both sides, we get
Ki ⊂ g-1Kig. Since g is arbitrarily chosen, replacing g with g-1, we have
K i ⊂ gK i g - 1 : ð18:107Þ
Therefore, we get
gK i g - 1 = K i : ð18:108Þ
ðiÞ ðiÞ
Meanwhile, for AðαiÞ , Aβ 2 K i ; AðαiÞ ≠ Aβ (1 ≤ α, β ≤ ki), we have
ðiÞ
gAαðiÞ g - 1 ≠ gAβ g - 1 8
g2ℊ : ð18:109Þ
ðiÞ
This is because if the equality holds with (18.109), we have AðαiÞ = Aβ , in
contradiction.
(c) Let K be a set collecting several classes and described as
K= ai K i , ð18:110Þ
i
gKg - 1 = K: ð18:111Þ
732 18 Representation Theory of Groups
K= ai K i þ Q, ð18:112Þ
i
gKg - 1 = g ai K i þ Q g - 1 = g ai K i g - 1 þ gQg - 1 = ai K i
i i i
þ gQg - 1
=K = ai K i þ Q, ð18:113Þ
i
where the equality before the last comes from (18.111). Thus, from (18.113) we get
gQg - 1 = Q 8
g2ℊ : ð18:114Þ
By definition of the classes, this implies that Q must form a “complete” class, in
contradiction to the supposition. Thus, (18.110) holds.
In Sect. 16.3, we described several characteristics of an invariant subgroup. As
readily seen from (16.11) and (18.111), any invariant subgroup consists of two or
more entire classes. Conversely, if a group comprises entire classes, it must be an
invariant subgroup.
(d) Let us think of a product of classes. Let Ki and Kj be sets described as (18.105).
We define a product KiKj as a set containing products
ðjÞ
AðαiÞ Aβ 1 ≤ α ≤ ki , 1 ≤ β ≤ kj . That is,
ki kj ðiÞ
KiKj = l=1
A AðmjÞ :
m=1 l
ð18:115Þ
Multiplying (18.115) by g (8g 2 ℊ) from the left and by g-1 from the right, we get
gK i K j g - 1 = ðgK i gÞ g - 1 K j g - 1 = K i K j : ð18:116Þ
where cijl is a positive integer or zero. In fact, when we take gKiKjg-1, we merely
permute the terms in (18.117).
(e) If two group elements of the group ℊ = {g1, g2, ⋯, gn} are conjugate to each
other, their inverse elements are conjugate to each other as well. In fact, suppose
that for gμ, gν 2 Ki, we have
gμ = ggν g - 1 : ð18:118Þ
gμ - 1 = ggν - 1 g - 1 : ð18:119Þ
Thus, given a class Ki, there exists another class K i0 that consists of the inverses of
the elements of Ki. If gμ ≠ gν, gμ-1 ≠ gν-1. Therefore, Ki and K i0 are of the same
order. Suppose that the number of elements contained in Ki is ki and that in K i0 is ki0 .
Then, we have
k i = k i0 : ð18:120Þ
b = gρ - 1 and b 2 K i0 : ð18:121Þ
where K1 = {e}. As mentioned below, we are most interested in the first term in
(18.122).
In (18.122), if K j = K i0 , we have
cij1 = ki : ð18:123Þ
ki for K j = K i ′ ,
cij1 = ð18:124Þ
0 for K j ≠ K i ′ :
nr n
χ ðαÞ ðK i Þχ ðαÞ K j = δ , ð18:126Þ
α=1 ki ij
8
gK i = K i g g2ℊ : ð18:127Þ
Ki = DðαÞ : ð18:128Þ
g2K i
DðαÞ K i = K i DðαÞ 8
g2ℊ : ð18:129Þ
K i = λE: ð18:130Þ
To determine λ, we take a trace of both sides of (18.130). Then, from (18.128) and
(18.130), we get
Thus, we get
k i ðαÞ
Ki = χ ðK i ÞE: ð18:132Þ
dα
ðαÞ
d χ
α α
ðK i Þ = nδi1 , ð18:135Þ
where again we have K1 = {e}. With respect to (18.134), we sum over all the
irreducible representations α. Then we have
= cij1 n: ð18:136Þ
we get
ki kj χ ðαÞ ðK i Þχ ðαÞ K j = cij1 n = k i nδji : ð18:139Þ
α
nr n
χ ðαÞ ðK i Þχ ðαÞ K j = δ : ð18:126Þ
α=1 k i ij
nr ≤ n c : ð18:141Þ
nc ≤ n r : ð18:143Þ
Thus, from (18.141) and (18.143), we finally reach a simple but very important
conclusion about the relationship between nc and nr such that
18.7 Projection Operators: Revisited 737
nr = nc : ð18:144Þ
That is, the number (nr) of inequivalent irreducible representations is equal to that
(nc) of conjugacy classes of the group. An immediate and important consequence of
(18.144) together with (18.100) is that the representation of an Abelian group is
one-dimensional. This is because in the Abelian group individual group elements
constitute a conjugacy class. We have nc = n accordingly.
In Sect. 18.2 we have described how basis vectors (or functions) are transformed by
a symmetry operation. In that case the symmetry operation is performed by a group
element that belongs to a transformation group. More specifically, given a group
ℊ = {g1, g2, ⋯, gn} and a set of basis vectors ψ 1, ψ 2, ⋯, and ψ d, the basis vectors
are transformed by gi 2 ℊ (1 ≤ i ≤ n) such that
d
gi ð ψ ν Þ = ψ μ Dμν ðgi Þ: ð18:22Þ
μ=1
We may well ask then how we can construct such basis vectors. In this section we
address this question. A central concept about this is a projection operator. We have
already studied the definition and basic properties of the projection operators. In this
section, we deal with them bearing in mind that we apply the group theory to
molecular science, especially to quantum chemical calculations.
In Sect. 18.6, we examined the permissible number of irreducible representations.
We have reached a conclusion that the number (nr) of inequivalent irreducible
representations is equal to that (nc) of conjugacy classes of the group. According
to this conclusion, we modify the above Eq. (18.22) such that
dα
ðαÞ ðαÞ ðαÞ
g ψi = ψ j Dji ðgÞ, ð18:145Þ
j=1
where α and dα denote the α-th irreducible representation and its dimension,
respectively; the subscript i is omitted from gi for simplicity. This naturally leads
ðαÞ ðβ Þ
to the next question of how the basis vectors ψ j are related to ψ j that are basis
vectors as well, but belong to a different irreducible representation β.
Suppose that we have an arbitrarily chosen function f. Then, f is assumed to
contain various components of different irreducible representations. Thus, let us
assume that f is decomposed into the component such that
738 18 Representation Theory of Groups
f= α
cðαÞ ψ ðmαÞ ,
m m
ð18:146Þ
ðαÞ
where cm is a coefficient of the expansion and ψ ðmαÞ is the m-th component of α-th
irreducible representation.
Now, let us define the following operator:
ðαÞ dα ðαÞ
PlðmÞ Dlm ðgÞ g, ð18:147Þ
n g
where Σg means that the summation should be taken over all n group elements g and
ðαÞ
dα denotes a dimension of the representation D(α). Operating PlðmÞ on f, we have
dα ðαÞ ð νÞ ðνÞ
= Dlm ðgÞ ν
c gψ k
k k
n g
dν
dα ðαÞ ðνÞ ð νÞ ðνÞ
= Dlm ðgÞ ν
c
k k
ψ j Djk ðgÞ
n g j=1
dν
dα ðνÞ ðνÞ ðαÞ ðνÞ
= ν
c
k k
ψj Dlm ðgÞ Djk ðgÞ
n j=1 g
dν
dα ðνÞ ðνÞ n ðαÞ ðαÞ
= c ψj δ δ δ = cm ψl , ð18:148Þ
n ν k k
j=1
d α αν lj mk
ðαÞ ðβÞ
Theorem 18.6 [1, 2] Let PlðmÞ and PsðtÞ be projection operators defined in (18.147).
Then, the following equation holds:
Proof We have
dα ðαÞ dβ ðβ Þ
= Dlm ðgÞ g Dst g - 1 g0 g - 1 g0
n g n g0
dα dβ ðαÞ ðβ Þ
= Dlm ðgÞ Dst g - 1 g0 gg - 1 g0
n n g g0
{
dα dβ ðαÞ
= Dlm ðgÞ DðβÞ ðgÞ DðβÞ ðg0 Þ eg0
n n g g0 st
dα dβ
ðαÞ
= Dlm ðgÞ DðβÞ ðgÞks DðβÞ ðg0 Þkt g0
n n g g0
k
dα dβ ðαÞ
= Dlm ðgÞ DðβÞ ðgÞks DðβÞ ðg0 Þkt g0
n n k
g0 g
dα dβ n dβ
= δ δ δ DðβÞ ðg0 Þkt g0 = δαβ δms DðβÞ ðg0 Þlt g0
n n k
g0
dα αβ lk ms n g0
dβ ðαÞ
= δαβ δms DðβÞ ðg0 Þlt g0 = δαβ δms PlðtÞ : ð18:150Þ
n g0
ðαÞ ðβ Þ ðαÞ
PlðmÞ PmðtÞ = δαβ PlðtÞ : ð18:151Þ
Putting t = l furthermore,
2
ðαÞ ðαÞ ðαÞ ðαÞ
PlðlÞ PlðlÞ = PlðlÞ = PlðlÞ : ð18:154Þ
ðαÞ ðαÞ
This means that the term cl ψ l has been extracted entirely.
Moreover, in (18.149) putting β = α, s = l, and t = m, we have
ðαÞ
Therefore, for PlðmÞ to be an idempotent operator, we must have m = l. That is, of
ðαÞ ðαÞ
various operators PlðmÞ , only PlðlÞ is eligible for an idempotent operator.
ðαÞ
Meanwhile, fully describing PlðlÞ we have
ðαÞ dα ðαÞ
PlðlÞ = Dll ðgÞ g: ð18:156Þ
n g
{ dα dα {
ðαÞ ðαÞ ðαÞ
PlðlÞ = Dll ðgÞg{ = Dll g-1 g-1
n g
n
g-1
dα ðαÞ ðαÞ
= Dll ðgÞ g = PlðlÞ , ð18:157Þ
n g
18.7 Projection Operators: Revisited 741
where we used unitarity of g (with the third equality) and equivalence of summation
over g and g-1. Notice that the notation of g{ is less common. This should be
interpreted as meaning that g{ operates on a vector constituting a representation
space. Thus, notation g{ implies that g{ is equivalent to its unitary representation
ðαÞ
matrix D(α)(g). Note also that Dll is not a matrix but a (complex) number. Namely,
ðαÞ
in (18.156) Dll ðgÞ is a coefficient of an operator g.
ðαÞ
Equations (18.154) and (18.157) establish that PlðlÞ is a projection operator in a
rigorous sense; namely, it is an idempotent and Hermitian operator (see Definition
ðαÞ
14.1). Let us call PlðlÞ a projection operator sensu stricto accordingly. Also, we notice
that cðmαÞ ψ m
ðαÞ
is entirely extracted from an arbitrary function f including a coefficient.
This situation resembles that of (12.205).
ðαÞ
Regarding PlðmÞ ðl ≠ mÞ, on the other hand, we have
{ dα dα {
ðαÞ ðαÞ ðαÞ
PlðmÞ = Dlm ðgÞg{ = Dlm g - 1 g - 1
n g
n
g-1
dα ðαÞ ðαÞ
= Dml ðgÞ g = PmðlÞ : ð18:158Þ
n g
ðαÞ
Hence, PlðmÞ is not Hermitian. We have many other related operators and
equations. For instance,
ðαÞ ðαÞ
Therefore, PlðmÞ PmðlÞ is Hermitian, recovering the relation (18.153).
In (18.149) putting m = l and t = s, we get
h= α
d ðαÞ ϕðmαÞ ,
m m
ð18:161Þ
dα
ðαÞ ðαÞ ðαÞ
g ϕi = ϕj Dji ðgÞ: ð18:162Þ
j=1
ðαÞ
Operating PlðlÞ on both sides of (18.155), we have
where with the second equality we used (18.154). That is, we have
ðαÞ ðαÞ
This equation means that cl ψ l is an eigenfunction corresponding to an
ðαÞ ðαÞ ðαÞ
eigenvalue 1 of PlðlÞ . In other words, once cl ψ l is extracted from f, it belongs to
the “position” l of an irreducible representation α. Furthermore, for some constants
c and d as well as the functions f and h that appeared in (18.146) and (18.161), we
consider a following equation:
ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ
PlðlÞ ccl ψ l þ ddl ϕl = cPlðlÞ cl ψ l þ dPlðlÞ d l ϕl
ðαÞ ðαÞ ðαÞ ðαÞ
= ccl ψ l þ ddl ϕl , ð18:165Þ
ðαÞ ðαÞ
= δαβ δls hPlðlÞ jPlðsÞ jf , ð18:166Þ
where with the first equality we used (18.154). Meanwhile, from (18.148) we have
ðαÞ ðαÞ
PlðsÞ j f i = cðsαÞ ψ l : ð18:167Þ
{
ðαÞ ðαÞ ðαÞ ðαÞ
hh j PlðlÞ = h PlðlÞ = d l ϕl , ð18:169Þ
where we used (18.157). The relation (18.169) is due to the notation of Sect. 13.3.
Substituting (18.167) and (18.168) for (18.166),
ðαÞ ðβÞ ðαÞ ðαÞ ðαÞ
hjPlðlÞ PsðsÞ jf = δαβ δls dl cðsαÞ ϕl jψ l :
ðαÞ ðβÞ
hjPlðlÞ PsðsÞ jf = 0 ðα ≠ β or l ≠ sÞ: ð18:172Þ
ðβÞ ðαÞ
That is, under a given condition α ≠ β or l ≠ s, PsðsÞ j f i and PlðlÞ j hi are
orthogonal (see Theorem 13.3 of Sect. 13.4). The relation clearly indicates that
functions belonging to different irreducible representations (α ≠ β) are mutually
orthogonal. Even though the functions belong to the same irreducible representation,
the functions are orthogonal if they are allocated to the different “place” as a basis
ðαÞ
vector designated by l, s, etc. Here the place means the index j of ψ j in (18.146) or
ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ
ϕj in (18.161) that designates the “order” of ψ j within ψ 1 , ψ 2 , ⋯, ψ dα or ϕj
744 18 Representation Theory of Groups
ðαÞ ðαÞ
PlðlÞ PsðsÞ = 0:
ðαÞ ðαÞ
Therefore, on the basis of the discussion of Sect. 14.1, PlðlÞ þ PsðsÞ is a projection
ðαÞ ðαÞ
operator as well in the case of l ≠ s. Notice, however, that if l = s, PlðlÞ þ PsðsÞ is not a
projection operator. Readers can readily show it.
Moreover, let us define P(α) as below:
dα ðαÞ
PðαÞ P :
l = 1 lðlÞ
ð18:173Þ
dα dα
dα ðαÞ
PðαÞ = l=1
Dll ðgÞ g = χ ðαÞ ðgÞ g: ð18:174Þ
n g n g
dα ðαÞ ðαÞ
PðαÞ f = c ψl :
l=1 l
ð18:175Þ
Thus, an operator P(α) has a clear meaning. That is, P(α) plays a role in extracting
all the vectors (or functions) that belong to the irreducible representation α including
their coefficients. Defining those functions as
dα ðαÞ ðαÞ
ψ ðαÞ c ψl ,
l=1 l
ð18:176Þ
In turn, let us calculate P(β)P(α). This can be done with (18.174) as follows:
dβ dα
PðβÞ PðαÞ = χ ðβÞ ðgÞ g χ ðαÞ ðg0 Þ g0 : ð18:178Þ
n g n g0
To carry out the calculation, (1) first we replace g′ with g-1g′ and rewrite the
summation over g′ as that over g-1g′. (2) Using the homomorphism and unitarity of
the representation, we rewrite [χ (α)(g′)] as
χ ðαÞ ðg0 Þ = i,j
DðαÞ ðgÞ DðαÞ ðg0 Þ : ð18:179Þ
ji ji
dβ dα
PðβÞ PðαÞ = DðβÞ ðgÞ DðαÞ ðgÞ DðαÞ ðg0 Þ g0
n n g k , i, j kk ji
g0
ji
dβ dα n dα
= δ δ δ DðαÞ ðg0 Þ g0 = DðαÞ ðg0 Þ g0
n n k, i, j d β αβ kj ki g0
ji n k g0
kk
dα
= δαβ χ ðαÞ ðg0 Þ g0 = δαβ PðαÞ : ð18:180Þ
n g0
These relationships are anticipated from the fact that both P(α) and P(β) are
projection operators.
As in the case of (18.160) and (18.180) is useful to evaluate an inner product of
functions. Again taking an inner product of (18.180) with arbitrary functions f and g,
we have
then P is once again a projection operator (see Sect. 14.1). Now taking summation in
(18.177) over all the irreducible representations α, we have
746 18 Representation Theory of Groups
nr nr
α=1
PðαÞ f = α=1
ψ ðαÞ = f : ð18:183Þ
P = E: ð18:184Þ
dα
ðαÞ
gðψ i Þ = ψ k Dki ðgÞ ð1 ≤ i ≤ dα Þ, ð18:185Þ
k=1
18.8 Direct-Product Representation 747
dβ
ðβ Þ
g ϕj = ϕl Dlj ðgÞ 1 ≤ j ≤ d β , ð18:186Þ
l=1
where ψ k (1 ≤ k ≤ dα) and ϕl (1 ≤ l ≤ dβ) are basis functions of D(α) and D(β),
respectively. We can construct dαdβ new basis vectors using ψ kϕl. These functions
are transformed according to g such that
ðαÞ ðβ Þ
g ψ i ϕj = gðψ i Þg ϕj = k
ψ k Dki ðgÞ ϕ D ð gÞ
l l lj
ðαÞ ðβÞ
= k l
ψ k ϕl Dki ðgÞDlj ðgÞ: ð18:187Þ
Here let us define the following matrix [D(α × β)(g)]kl, ij such that
ðαÞ ðβÞ
Dðα × βÞ ðgÞ Dki ðgÞDlj ðgÞ: ð18:188Þ
kl,ij
Then we have
g ψ i ϕj = k l
ψ k ϕl Dðα × βÞ ðgÞ : ð18:189Þ
kl,ij
Thus,
ðαÞ ðαÞ
d1,1 DðβÞ ðgÞ ⋯ d1, dα DðβÞ ðgÞ
Dðα × βÞ ðgÞ = DðαÞ ðgÞ DðβÞ ðgÞ = ⋮ ⋱ ⋮ : ð18:191Þ
ðαÞ ðαÞ
d dα ,1 DðβÞ ðgÞ ⋯ ddα , dα DðβÞ ðgÞ
we get
ðαÞ ðβ Þ
Dðα × βÞ ðgg0 Þ = Dki ðgg0 ÞDlj ðgg0 Þ
kl,ij
ðαÞ ðαÞ 0 ðβ Þ ðβ Þ 0
= μ Dkμ ðgÞDμi ðg Þ ν Dlν ðgÞDνj ðg Þ]
= μ ν
Dðα × βÞ ðgÞ Dðα × βÞ ðg0 Þ : ð18:194Þ
kl,μν μν,ij
Equation (18.194) shows that the rule of matrix calculation with respect to the
double subscript is satisfied. Consequently, we get
ðαÞ ðβÞ
Dðα × βÞ ðgÞ = Dii ðgÞDjj ðgÞ: ð18:196Þ
ij,ij
Denoting
18.8 Direct-Product Representation 749
χ ð α × β Þ ð gÞ i,j
Dðα × βÞ ðgÞ , ð18:197Þ
ij,ij
we have
1 1
qγ = χ ð γ Þ ð gÞ χ ð α × β Þ ð gÞ = χ ðγÞ ðgÞ χ ðαÞ ðgÞχ ðβÞ ðgÞ, ð18:200Þ
n g n g
where n is an order of the group. This relation is often used to perform quantum
mechanical or chemical calculations, especially to evaluate optical transitions of
matter including atoms and molecular systems. This is also useful to examine
whether a definite integral of a product of functions (or an inner product of vectors)
vanishes.
In Sect. 16.5, we investigated definition and properties of direct-product groups.
Similarly to the case of the direct-product representation, we consider a representa-
tion of the direct-product groups. Let us consider two groups ℊ and H and assume
that a direct-product group ℊ H is defined (Sect. 16.5). Also let D(α) and D(β) be
dα- and dβ-dimensional representations of groups ℊ and H , respectively. Further-
more, let us define a matrix D(α × β)(ab) as in (18.188) such that
ðαÞ ðβÞ
Dðα × βÞ ðabÞ Dki ðaÞDlj ðbÞ, ð18:201Þ
kl,ij
ðαÞ ðβ Þ
χ ðα × βÞ ðabÞ = k l
Dðα × βÞ ðabÞ = k l
Dkk ðaÞDll ðbÞ
kl,kl
of the direct-product group comprising two different groups. In fact, even though a
character is computed with respect to a sole group element g in (18.198), in (18.202)
we evaluate a character regarding two group elements a and b chosen from different
groups ℊ and H , respectively.
ðαÞ ðαÞ
g ψ i ϕj = gðψ i Þg ϕj = k l
ψ k ϕl Dki ðgÞDlj ðgÞ: ð18:203Þ
Regarding a product function ψ jϕi, we can get a similar equation such that
ðαÞ ðαÞ
g ψ j ϕi = g ψ j gð ϕi Þ = k l
ψ k ϕl Dkj ðgÞDli ðgÞ: ð18:204Þ
On the basis of the linearity of the relations (18.203) and (18.204), let us construct
a linear combination of the product functions. That is,
g ψ i ϕj ± ψ j ϕi = g ψ i ϕj ± g ψ j ϕi
ðαÞ ðαÞ ðαÞ ðαÞ
= k l
ψ k ϕl Dki ðgÞDlj ðgÞ ± k l
ψ k ϕl Dkj ðgÞDli ðgÞ
ðαÞ ðαÞ
= k l
ðψ k ϕl ± ψ l ϕk ÞDki ðgÞDlj ðgÞ: ð18:205Þ
Ψ ij± ψ i ϕj ± ψ j ϕi , ð18:206Þ
we rewrite (18.205) as
Notice that Ψ þ -
ij and Ψ ij in (18.206) are symmetric and antisymmetric with
respect to the interchange of subscript i and j, respectively. That is,
1 ð αÞ ð αÞ
= Ψ kl± μ ν
Dðα × αÞ ðgÞ ± Dðα × αÞ ðgÞ Dμi ðg0 ÞDνj ðg0 Þ ,
k l 2 kl,μν lk,μν
ð18:209Þ
where the last equality follows from the definition of a direct-product representation
(18.188). We notice that the terms of [D(α × α)(gg′)]kl, μν ± [D(α × α)(gg′)]lk, μν in
(18.209) are symmetric and antisymmetric with respect to the interchange of sub-
scripts k and l together with subscripts μ and ν, respectively. Comparing both sides of
(18.209), we see that this must be the case with i and j as well. Then, the last factor of
(18.209) should be rewritten as:
Now, we define the following notations D[α × α](g) and D{α × α}(g) represented by
1
D½α × α ðgÞ Dðα × αÞ ðgÞ þ Dðα × αÞ ðgÞ
kl,μν 2 kl,μν lk,μν
and
752 18 Representation Theory of Groups
1
Dfα × αg ðgÞ kl,μν
Dðα × αÞ ðgÞ - Dðα × αÞ ðgÞ
2 kl,μν lk,μν
gg0 Ψ þ
ij = k l μ ν
Ψþ
kl D
½α × α
ð gÞ D½α × α ðg0 Þ
kl,μν μν,ij
= k l
Ψþ
kl D
½α × α
ðgÞD½α × α ðg0 Þ : ð18:212Þ
kl,ij
gΨ þ
ij = k l
Ψþ
kl D
½α × α
ð gÞ : ð18:213Þ
kl,ij
Then we have
gg0 Ψ þ
ij = k l
Ψþ
kl D
½α × α
ðgg0 Þ : ð18:214Þ
kl,ij
gΨ ij- = k l
Ψ kl- Dfα × αg ðgÞ kl,ij
, ð18:216Þ
gg ′ Ψ ij- = k l
Ψ kl- Dfα × αg ðgÞDfα × αg ðg0 Þ kl,ij
: ð18:217Þ
Hence, we obtain
Thus, both D[α × α](g) and D{α × α}(g) produce well-defined representations.
Letting dimension of the representation α be dα, we have dα(dα + 1)/2 functions
belonging to the symmetric representation and dα(dα - 1)/2 functions belonging to
References 753
ψ 1 ϕ1 , ψ 1 ϕ2 þ ψ 2 ϕ1 , and ψ 2 ϕ2 : ð18:219Þ
ψ 1 ϕ2 - ψ 2 ϕ1 : ð18:220Þ
χ ½ α × α ð gÞ = k l
D½α × α ðgÞ D½α × α ðgÞ
kl,kl kl,kl
1 2
= χ ð α Þ ð gÞ þ χ ðαÞ g2 : ð18:221Þ
2
Similarly we have
1 2
χ fα × αg ðgÞ = χ ðαÞ ðgÞ - χ ðαÞ g2 : ð18:222Þ
2
References
1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer,
Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics. Shokabo,
Tokyo
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
Chapter 19
Applications of Group Theory to Physical
Chemistry
On the basis of studies of group theory, now in this chapter we apply the knowledge
to the molecular orbital (MO) calculations (or quantum chemical calculations). As
tangible examples, we adopt aromatic hydrocarbons (ethylene, cyclopropenyl radi-
cal, benzene, and ally radical) and methane. The approach is based upon a method of
linear combination of atomic orbitals (LCAO). To seek an appropriate LCAO MO,
we make the most of a method based on a symmetry-adapted linear combination
(SALC). To use projection operators is a powerful tool for this purpose. For the sake
of correct understanding, it is desired to consider transformation of functions. To this
end, we first show several examples.
In the process of carrying out MO calculations, we encounter a secular equation
as an eigenvalue equation. Using a SALC eases the calculations of the secular
equation. Molecular science relies largely on spectroscopic measurements and
researchers need to assign individual spectral lines to a specific transition between
the relevant molecular states. Representation theory works well in this situation.
Thus, the group theory finds a perfect fit with its applications in the molecular
science.
O x
f 0 OR f : ð19:1Þ
The RHS of (19.1) means that f ′ is produced as a result of operating the rotation
R on f. We describe the position vector after being transformed as r00 . Following the
notation of Sect. 11.1, r00 and r′ are expressed as
x00
r0 = ðe1 e2 Þ x0
y0
and r00 = ðe1 e2 Þ y00
, ð19:3Þ
where e1 and e2 are orthonormal basis vectors in the xy-plane. Also we have
x0
r = ðe 1 e2 Þ x
y
and r 0 = ð e1 e2 Þ y0
: ð19:4Þ
where
x x0
r = ð e1 e 2 e 3 Þ y and r 0 = ð e1 e2 e3 Þ y0 : ð19:8Þ
z z0
The last relation of (19.7) comes from (19.1). Using (19.2) and (19.3), we rewrite
(19.7) as
That is,
f 0 OR f Rf : ð19:13Þ
A contour is shown in Fig. 19.2. We consider a π/2 rotation around the z-axis.
Then, f(x, y) is transformed into Rf(x, y) = f ′(x, y) such that
758 19 Applications of Group Theory to Physical Chemistry
y' O x
z
We also have
r 0 = ð e1 e2 Þ a
b
,
-1 -b
r00 = ðe1 e2 Þ 0
1 0
a
b
= ð e1 e2 Þ a
,
where we define R as
-1
R= 0
1 0
:
Similarly,
-y
r = ð e1 e 2 Þ x
y
and r0 = ðe1 e2 Þ x
:
This ensures that (19.5) holds. The implication of (19.16) combined with (19.5) is
that a view of f ′(x′, y′) from (-b, a) is the same as that of f(x, y) from (a, b). Imagine
that if we are standing at (-b, a) of f ′(x′, y′), we are in the bottom of the “valley” of
f ′(x′, y′). Exactly in the same manner, if we are standing at (a, b) of f(x, y), we are in
19.1 Transformation of Functions 759
y
, ,
(− , 0) ( , 0)
O
z x
Fig. 19.3 Contour map of f ðrÞ = e - 2½ðx - aÞ þy þz þ e - 2½ðxþaÞ þy þz ða > 0Þ. The function form
2 2 2 2 2 2
the bottom of the valley of f(x, y) as well. Notice that for both f(x, y) and f ′(x′, y′), (a,
b) and (-b, a) are the lowest point, respectively.
Meanwhile, we have
R-1 = 0
-1
1
0
, R - 1 ðrÞ = ðe1 e2 Þ 0
-1
1
0
x
y
= ð e1 e2 Þ y
-x
:
Then we have
(a) (b)
0
‒a 0 a ‒a 0 a
-1 0 0 -1 0 0
R= 0 1 0 and R-1 = 0 1 0 : ð19:20Þ
0 0 1 0 0 1
Plotting f(r) and g(r) as a function of x on the x-axis, we depict results in Fig. 19.4.
Looking at (19.21) and (19.22), we find that f(r) and g(r) are solutions of an
eigenvalue equation for an operator R. Corresponding eigenvalues are 1 and -1 for
f(r) and g(r), respectively. In particular, f(r) holds the functional form after the
transformation R. In this case, f(r) is said to be invariant with the transformation R.
Moreover, f(r) is invariant with the following eight transformations:
±1 0 0
R= 0 ±1 0 : ð19:23Þ
0 0 ±1
ħ2 2
Hψ i ðrÞ - — þ V ðrÞ ψ i ðrÞ = E i ψ i ðrÞ, ð19:24Þ
2m
Or we have
Comparing the first and last sides and remembering that ψ is an arbitrary function,
we get
RV = VR: ð19:29Þ
Next, let us examine the symmetry of the Laplacian — 2. The Laplacian is defined
in Sect. 1.2 as
2 2 2
∂ ∂ ∂
—2 2
þ 2þ 2, ð1:24Þ
∂x ∂y ∂z
where x, y, and z denote the Cartesian coordinates. Let S be an orthogonal matrix that
transforms the xyz-coordinate system to x′y′z′-coordinate system. We suppose that
S is expressed as
SST = ST S = E, ð19:32Þ
∂ ∂x ∂ ∂y ∂ ∂z ∂ ∂ ∂ ∂
= þ þ = s11 þ s12 þ s13 : ð19:33Þ
∂x0 ∂x0 ∂x ∂x0 ∂y ∂x0 ∂z ∂x ∂y ∂z
∂ ∂ ∂ ∂ ∂ ∂ ∂
= s11 0 þ s12 0 þ s13 0
∂x 0 2 ∂x ∂x ∂x ∂y ∂x ∂z
19.2 Method of Molecular Orbitals (MOs) 763
∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂
= s11 s11 þ s12 þ s13 þ s12 s11 þ s12 þ s13
∂x ∂y ∂z ∂x ∂x ∂y ∂z ∂y
∂ ∂ ∂ ∂
þs13 s11 þ s12 þ s13
∂x ∂y ∂z ∂z
2 2 2
∂ ∂ ∂
= s11 2 þ s11 s12 þ s11 s13
∂x
2 ∂y∂x ∂z∂x
2 2 2
∂ ∂ ∂
þs12 s11 þ s12 2 2 þ s12 s13
∂x∂y ∂y ∂z∂y
2 2 2
∂ ∂ ∂
þs13 s11 þ s13 s12 þ s13 2 2 : ð19:34Þ
∂x∂z ∂y∂z ∂z
∂ ∂
Calculating terms of ∂y0 2
and ∂z0 2
, we have similar results. Then, collecting all
those 27 terms, we get, e.g.,
2 2
∂ ∂
s11 2 þ s21 2 þ s31 2 2
= 2, ð19:35Þ
∂x ∂x
2 2
∂
where we used an orthogonal relationship of S. Cross terms of , ∂ ,
∂x∂y ∂y∂x
etc. all
vanish. Consequently, we get
2 2 2
∂ ∂ ∂ ∂ ∂ ∂
þ þ = þ þ : ð19:36Þ
∂x0 2 ∂y0 2 ∂z0 2 ∂x2 ∂y2 ∂z2
Defining — ′2 as
∂ ∂ ∂
—0
2
þ 02 þ 02 , ð19:37Þ
∂x 0 2
∂y ∂z
we have
— 0 = — 2:
2
ð19:38Þ
Notice that (19.38) holds with not only the symmetry operation of the molecule,
but also any rotation operation in ℝ3 (see Sect. 17.4).
764 19 Applications of Group Theory to Physical Chemistry
Consequently, we get
R— 2 = — 2 R: ð19:40Þ
R — 2 þ V = — 2 þ V R: ð19:41Þ
RH = HR: ð19:42Þ
Thus, we confirm that the Hamiltonian H commutes with any symmetry operation
R. In other words, H is invariant with the coordinate transformation relevant to the
symmetry operation.
Now we consider matrix representation of (19.42). Also let D be an irreducible
representation of the symmetry group ℊ. Then, we have
DðgÞH = HDðgÞ g 2 ℊ :
H = λE, ð19:43Þ
λ ⋯ 0
H= ⋮ ⋱ ⋮ , ð19:44Þ
0 ⋯ λ
where the above matrix is (d, d ) square diagonal matrix. This implies that H is
reduced to
19.2 Method of Molecular Orbitals (MOs) 765
H = λDð0Þ ⋯ λDð0Þ ,
where D(0) denotes the totally symmetric representation; notice that it is given by
1 for any symmetry operation. Thus, the commutability of an operator with all
symmetry operation is equivalent to that the operator belongs to the totally symmet-
ric representation. Since H is Hermitian, λ should be real.
Operating both sides of (19.42) on {ψ 1, ψ 2, ⋯, ψ d} from the right, we have
ðψ 1 ψ 2 ⋯ ψ d ÞRH = ðψ 1 ψ 2 ⋯ ψ d ÞHR
λ ⋯
λ
= ðψ 1 ψ 2 ⋯ ψ d Þ ⋮ ⋱ ⋮ R = ðλψ 1 λψ 2 ⋯ λψ d ÞR
⋯ λ
ðψ 1 ψ 2 ⋯ ψ d ÞH = λðψ 1 ψ 2 ⋯ ψ d Þ:
ψ i H = λψ i ð1 ≤ i ≤ dÞ: ð19:46Þ
d
Rðψ i Þ = k=1
ψ k Dki ðRÞ,
we rewrite (19.45) as
ðψ 1 ψ 2 ⋯ ψ d ÞRH = ðψ 1 R ψ 2 R ⋯ ψ d RÞH
d d d
= k=1
ψ k Dk1 ðRÞ k=1
ψ k Dk2 ðRÞ ⋯ k=1
ψ k Dkd ðRÞ H
d d d
=λ k=1
ψ k Dk1 ðRÞ k=1
ψ k Dk2 ðRÞ ⋯ k=1
ψ k Dkd ðRÞ ,
where with the second equality we used the relation (11.42); i.e.,
766 19 Applications of Group Theory to Physical Chemistry
ψ i R = Rðψ i Þ:
Thus, we get
d d
k=1
ψ k Dki ðRÞH = λ k=1
ψ k Dki ðRÞ ð1 ≤ i ≤ dÞ: ð19:47Þ
d d d d
k=1
ψ~ k Dki ðRÞj l=1
ψ~ l Dlj ðRÞ = k=1 l=1
Dki ðRÞ Dlj ðRÞhψ~ k j~
ψ li
d d
= k=1
Dki ðRÞ Dkj ðRÞ = k=1
D{ ðRÞ ik ½DðRÞkj = D{ ðRÞDðRÞ ij
= δij :
With the last equality, we used unitarity of D(R). Thus, the orthonormal basis set
is retained after unitary transformation of fψ~ 1 , ψ~ 2 , ⋯, ψ~ d g.
In the above discussion, we have assumed that ψ 1, ψ 2, ⋯, and ψ d belong to a
certain irreducible representation D(ν). Since ψ~ 1 , ψ~ 2 , ⋯, and ψ~ d consist of their linear
combinations, ψ~ 1 , ψ~ 2 , ⋯, and ψ~ d belong to D(ν) as well. Also, we have assumed that
ψ~ 1 , ψ~ 2 , ⋯, and ψ~ d form an orthonormal basis and, hence, according to Theorem 11.3
these vectors constitute basis vectors that belong to D(ν). In particular, functions
derived using projection operators share these characteristics. In fact, the above
principles underlie molecular orbital calculations dealt with in Sects. 19.4 and
19.5. We will go into more details in subsequent sections.
Bearing the aforementioned argument in mind, we make the most of the relation
expressed by (18.171) to evaluate inner products of functions (or vectors). In this
connection, we often need to calculate matrix elements of an operator. One of the
most typical examples is matrix elements of Hamiltonian. In the field of molecular
science we have to estimate an overlap integral, Coulomb integral, resonance
integral, etc. Other examples include transition matrix elements pertinent to, e.g.,
electric dipole transition. To this end, we deal with direct-product representation (see
Sects. 18.7 and 18.8). Let O(γ) be an Hermitian operator belonging to the γ-th
irreducible representation. Let us think of a following inner product:
19.2 Method of Molecular Orbitals (MOs) 767
ðβ Þ
ϕl jOðγÞ ψ ðsαÞ , ð19:48Þ
ðβ Þ
where ψ ðsαÞ and ϕl are the s-th component of α-th irreducible representation and the
l-th component of β-th irreducible representation, respectively; see (18.146) and
(18.161).
(i) Let us think of first the case where O(γ) is Hamiltonian H. Suppose that ψ ðsαÞ
belongs to an eigenvalue λ of H. Then, we have
ðαÞ ðαÞ
H j ψ l i = λ j ψ l i:
ðβ Þ ðβÞ
ϕl jHψ ðsαÞ = λ ϕl jψ ðsαÞ : ð19:49Þ
At the first glance this equation seems trivial, but the equation gives us a powerful
tool to save us a lot of troublesome calculations. In fact, if we encounter a series of
inner product calculations (i.e., definite integrals), we ignore many of them. In
(19.50), the matrix elements of Hamiltonian do not vanish only if α = β and l = s.
ðαÞ ðαÞ
That is, we only have to estimate ϕl jψ l . Otherwise the inner products vanish.
ðαÞ ðαÞ
The functional forms of ϕl and ψ l are determined depending upon individual
practical problems. We will show tangible examples later (Sects. 19.4 and 19.5).
(ii) Next, let us consider matrix elements of the optical transition. In this case, we
are thinking of transition probability between quantum states. Assuming the dipole
ðβ Þ
approximation, we use εe P(γ) for O(γ) in ϕl jOðγÞ ψ ðsαÞ of (19.48). The quantity
P(γ) represents an electric dipole moment associated with the position vector.
Normally, we can readily find it in a character table, in which we examine which
irreducible representation γ the position vector components x, y, and z indicated in a
rightmost column correspond to. Table 18.4 is an example. If we take a unit
polarization vector εe in parallel to the position vector component that the character
ðβ Þ
table designates, εe P(γ) is non-vanishing. Suppose that ψ ðsαÞ and ϕl are an initial
state and final state, respectively. At the first glance, we would wish to use (18.200)
and count how many times a representation β occurs for a direct-product represen-
tation D(γ × α). It is often the case, however, where β does not belong to an irreducible
representation.
Even in such a case, if either α or β belongs to a totally symmetric representation,
the handling will be easier. This situation corresponds to that an initial electronic
configuration or a final electronic configuration forms a closed shell an electronic
768 19 Applications of Group Theory to Physical Chemistry
state of which is totally symmetric [1]. The former occurs when we consider the
optical absorption that takes place, e.g., in a molecule of a ground electronic state.
The latter corresponds to an optical emission that ends up with a ground state. Let us
ðγ Þ
consider the former case. Since Oj is Hermitian, we rewrite (19.48) as
ðβÞ ðγ Þ ðγ Þ ðβÞ ðγ Þ ðβ Þ
ϕl jOj ψ ðsαÞ = Oj ϕl jψ ðsαÞ = ψ ðsαÞ jOj ϕl , ð19:51Þ
where we assume that ψ ðsαÞ is a ground state having a closed shell electronic
configuration. Therefore, ψ ðsαÞ belongs to a totally symmetric irreducible represen-
tation. For (19.48) not to vanish, therefore, we may alternatively state that it is
necessary for
DðβÞ = q D
ω ω
ðωÞ
,
Dðγ Þ × DðβÞ = q D
ω ω
ðγ Þ
× DðωÞ :
Thus, applying (18.199) and (18.200), we can obtain a direct sum of irreducible
representations. After that, we examine whether an irreducible representation α is
contained in D(γ) × D(β). Since the totally symmetric representation is
one-dimensional, s = 1 in (19.51).
where cki are complex coefficients. We assume that ϕk are normalized. That is,
19.3 Calculation Procedures of Molecular Orbitals (MOs) 769
where dτ implies that an integration should be taken over ℝ3. The notation hg| fi
means an inner product defined in Sect. 13.1. The inner product is usually defined by
a definite integral of gf whose integration range covers a part or all of ℝ3 depending
on a constitution of a physical system.
First we try to solve Schrödinger equation given as an eigenvalue equation. The
said equation is described as
Hψ = λψ or ðH - λψ Þ = 0: ð19:54Þ
where the subscript i in (19.52) has been omitted for simplicity. Multiplying ϕj from
the left and integrating over whole ℝ3, we have
n
c
k=1 k
ϕj ðH - λÞϕk dτ = 0: ð19:56Þ
n
c
k=1 k
ϕj Hϕk - λϕj ϕk dτ = 0: ð19:57Þ
H 11 - λ ⋯ H 1n - λS1n c1
⋮ ⋱ ⋮ ⋮ = 0: ð19:60Þ
H n1 - λSn1 ⋯ H nn - λ cn
det H jk - λSjk = 0
or
H 11 - λ ⋯ H 1n - λS1n
⋮ ⋱ ⋮ = 0: ð19:61Þ
H n1 - λSn1 ⋯ H nn - λ
The above notation has already been introduced in Sect. 1.4. Equation (19.62)
certainly satisfies the definition of the inner product described in (13.2)–(13.4);
readers, please check it. On the basis of (13.64), we have
hyjHxi = xH { jy : ð19:63Þ
~ 11 - λ~S11
H
~ 22 - λ~S22
H
⋱
= 0, ð19:66Þ
~ nn - λ~Snn
H
where all the off-diagonal elements are zero and an eigenvalue λ is given by
19.3 Calculation Procedures of Molecular Orbitals (MOs) 771
~ ii =~
λi = H Sii : ð19:67Þ
This means that the eigenvalue equation has automatically been solved.
The best way to achieve this is to choose basis functions (vectors) such that the
functions conform to the symmetry which the molecule belongs to. That is, we
“shuffle” the atomic orbitals as follows:
n
ξi = d ϕ
k = 1 ki k
ð1 ≤ i ≤ nÞ, ð19:68Þ
where ξi are new functions chosen instead of ψ i of (19.52). That is, we construct a
linear combination of the atomic orbitals that belongs to individual irreducible
representation. The said linear combination is called symmetry-adapted linear com-
bination (SALC). We remark that all atomic orbitals are not necessarily included in
the SALC. Then we have
~ jk =
H ξj Hξk dτ and ~
Sjk = ξj ξk dτ: ð19:69Þ
~ jk - λ~
det H Sjk = 0: ð19:70Þ
ðαÞ dα ðαÞ
PlðlÞ = Dll ðgÞ g: ð19:71Þ
n g
dα dα
dα ðαÞ
PðαÞ = D ðgÞ g =
l = 1 ll
χ ðαÞ ðgÞ g: ð18:174Þ
n g n g
ðαÞ
In the one-dimensional representation, PlðlÞ and P(α) are identical. As expressed in
(18.155) and (18.175), these projection operators act on an arbitrary function and
extract specific component(s) pertinent to a specific irreducible representation of the
point group which the molecule belongs to.
ðαÞ
At first glance the definition of PlðlÞ and P(α) looks daunting, but the use of
character tables relieves a calculation task. In particular, all the irreducible represen-
tations are one-dimensional (i.e., just a number!) with Abelian groups as mentioned
in Sect. 18.6. For this, notice that individual group elements form a class by
themselves. Therefore, utilization of character tables becomes easier for Abelian
groups. Even though we encounter a case where a dimension of representation is
ðαÞ
more than one (i.e., the case of non-commutative groups), Dll can be determined
without much difficulty.
19.4 MO Calculations Based on π-Electron Approximation 773
On the basis of the general argument developed in Sects. 19.2 and 19.3, we preform
molecular orbital calculations of individual molecules. First, we apply group theory
to the molecular orbital calculations about aromatic hydrocarbons such as ethylene,
cyclopropenyl radical and cation as well as benzene and allyl radical. In these cases,
in addition to adoption of the molecular orbital theory, we adopt the so-called
π-electron approximation. With the first three molecules, we will not have to
“solve” a secular equation. However, for allyl radical we deal with two SALC
orbitals belonging to the same irreducible representations and, hence, the final
molecular orbitals must be obtained by solving the secular equation.
19.4.1 Ethylene
We start with one of the simplest examples, ethylene. Ethylene is a planar molecule
and belongs to D2h symmetry (see Sect. 17.2). In the molecule two π-electrons
extend vertically to the molecular plane toward upper and lower directions
(Fig. 19.5). The molecular plane forms a node to atomic 2pz orbitals; that is those
atomic orbitals change a sign relative to the molecular plane (i.e., the xy-plane).
In Fig. 19.5, two pz atomic orbitals of carbon are denoted by ϕ1 and ϕ2. We
should be able to construct basis vectors using ϕ1 and ϕ2. Corresponding to the two
atomic orbitals, we are dealing with a two-dimensional vector space. Let us consider
how ϕ1 and ϕ2 are transformed by a symmetry operation. First we examine an
operation C2(z). This operation exchanges ϕ1 and ϕ2. That is, we have
H z H
y y
C C
H H x
x
Fig. 19.5 Ethylene molecule placed on the xyz-coordinate system. Two pz atomic orbitals of carbon
are denoted by ϕ1 and ϕ2. The atomic orbitals change a sign relative to the molecular plane (i.e., the
xy-plane)
774 19 Applications of Group Theory to Physical Chemistry
C 2 ð z Þ ð ϕ1 Þ = ϕ2 ð19:72Þ
and
C 2 ð z Þ ð ϕ2 Þ = ϕ 1 : ð19:73Þ
C 2 ðzÞ = 0
1
1
0
: ð19:75Þ
cos θ – sin θ 0
Rzθ = sin θ cos θ 0 ð19:76Þ
0 0 1
or
-1 0 0
C2 ðzÞ = 0 -1 0 : ð19:77Þ
0 0 1
Whereas (19.76) and (19.77) show the transformation of a position vector in ℝ3,
(19.75) represents the transformation of functions in a two-dimensional vector space.
A vector space composed of functions may be finite-dimensional or infinite-
dimensional. We have already encountered the latter case in Part I where we dealt
with the quantum mechanics of a harmonic oscillator. Such a function space is often
referred to as a Hilbert space. An essence of (19.75) is characterized by that a trace
(or character) of the matrix is zero.
Let us consider another transformation C2( y). In this case the situation is different
from the above case in that ϕ1 is converted to -ϕ1 by C2( y) and that ϕ2 is converted
to -ϕ2. Notice again that the molecular plane forms a node to atomic p-orbitals.
Thus, C2( y) is represented by
-1
C 2 ð yÞ = 0
0
-1
: ð19:78Þ
The trace of the matrix C2( y) is -2. In this way, choosing ϕ1 and ϕ2 for the basis
functions for the representation of D2h we can determine the characters for individual
symmetry transformations of atomic 2pz orbitals in ethylene belonging to D2h. We
collect the results in Table 19.1.
19.4 MO Calculations Based on π-Electron Approximation 775
Table 19.1 Characters for individual symmetry transformations of atomic 2pz orbitals in ethylene
D2h E C2(z) C2( y) C2(x) i σ(xy) σ(zx) σ(yz)
Γ 2 0 -2 0 0 -2 0 2
As a next step, we are going to find an appropriate basis function belonging to two
irreducible representations of (19.80). For this purpose, we use projection operators
expressed as
dα
PðαÞ = χ ðαÞ ðgÞ g: ð18:174Þ
n g
1
PðB1u Þ ϕ1 = χ ðB1u Þ ðgÞ gϕ1
8 g
776 19 Applications of Group Theory to Physical Chemistry
1
= ½1 ∙ ϕ1 þ 1 ∙ ϕ2 þ ð- 1Þð- ϕ1 Þ þ ð- 1Þð- ϕ2 Þ þ ð- 1Þð- ϕ2 Þ
8
1
þð- 1Þð- ϕ1 Þ þ 1 ∙ ϕ2 þ 1 ∙ ϕ1 = ðϕ1 þ ϕ2 Þ: ð19:81Þ
2
PðB3g Þ ϕ1 = χ ðB3g Þ ðgÞ gϕ1
1
8 g
1
= ½1 ∙ ϕ1 þ ð- 1Þϕ2 þ ð- 1Þð- ϕ1 Þ þ 1ð- ϕ2 Þ þ 1ð- ϕ2 Þ þ ð- 1Þð- ϕ1 Þ
8
1
þð- 1Þϕ2 þ 1 ∙ ϕ1 = ðϕ - ϕ2 Þ: ð19:82Þ
2 1
Thus, after going through routine but sure procedures, we have reached appro-
priate basis functions that belong to each irreducible representation.
As mentioned earlier (see Table 17.5), D2h can be expressed as a direct-product
group and C2 group is contained in D2h as a subgroup. We can use it for the present
analysis. Suppose now that we have a group ℊ and that H is a subgroup of ℊ. Let
g be an arbitrary element of ℊ and let D(g) be a representation of g. Meanwhile, let
h be an arbitrary element of H . Then with 8 h 2 H , a collection of D(h) is a
representation of H . We write this relation as
D#H: ð19:83Þ
1 1 1
PðAÞ ϕ1 = χ ðAÞ ðgÞ gϕ1 = ½1 ∙ ϕ1 þ 1 ∙ ϕ2 = ðϕ1 þ ϕ2 Þ: ð19:85Þ
2 g 2 2
Also, we have
19.4 MO Calculations Based on π-Electron Approximation 777
1 1 1
PðBÞ ϕ1 = χ ðBÞ ðgÞ gϕ1 = ½1 ∙ ϕ1 þ ð- 1Þϕ2 = ðϕ1 - ϕ2 Þ: ð19:86Þ
2 g 2 2
The relations of (19.85) and (19.86) are essentially the same as (19.81) and
(19.82), respectively.
We can easily construct a character table (see Table 19.3). There should be two
irreducible representations. Regarding the totally symmetric representation, we
allocate 1 to each symmetry operation. For another representation we allocate 1 to
an identity element E and -1 to an element C2 so that the row and column of the
character table are orthogonal to each other.
Readers might well ask why we bother to make circuitous approaches to reaching
predictable results such as (19.81) and (19.82) or (19.85) and (19.86). This question
seems natural when we are dealing with a case where the number of basis vectors
(i.e., a dimension of the vector space) is small, typically 2 as in the present case. With
increasing dimension of the vector space, however, to seek and determine appropri-
ate SALCs become increasingly complicated and difficult. Under such circum-
stances, a projection operator is an indispensable tool to address the problems.
We have a two-dimensional secular equation to be solved such that
~ 11 - λS~11
H ~ 12 - λS~12
H
~ 21 - λ~S21 ~ 22 - λ~S22
= 0: ð19:87Þ
H H
In the above argument, 12 ðϕ1 þ ϕ2 Þ belongs to B1u and 12 ðϕ1 - ϕ2 Þ belongs to B3g.
Since they belong to different irreducible representations, we have
~ 12 = ~
H S12 = 0: ð19:88Þ
~ 11 - λ~S11
H
0 H
0
~ 22 - λ~S22 = 0: ð19:89Þ
~ 11 =~
λ1 = H S11 ~ 22 =~S22 :
and λ2 = H ð19:90Þ
The next step is to determine the energy eigenvalue of the molecule. Note here
that a role of SALCs is to determine a suitable irreducible representation that
corresponds to a “direction” of a vector. As the coefficient keeps the direction of a
vector unaltered, it would be of secondary importance. The final form of normalized
MOs can be decided last. That procedure includes the normalization of a vector.
Thus, we tentatively choose following functions for SALCs, i.e.,
778 19 Applications of Group Theory to Physical Chemistry
ξ 1 = ϕ1 þ ϕ2 and ξ 2 = ϕ 1 - ϕ2 :
Then, we have
~ 11 =
H ξ1 Hξ1 dτ = ðϕ1 þ ϕ2 Þ H ðϕ1 þ ϕ2 Þdτ
= H 11 þ H 12 þ H 21 þ H 22 = H 11 þ 2H 12 þ H 22 : ð19:91Þ
Similarly, we have
~ 22 =
H ξ2 Hξ2 dτ = H 11 - 2H 12 þ H 22 : ð19:92Þ
The last equality comes from the fact that we have chosen real functions for ϕ1
and ϕ2 as studied in Part I. Moreover, we have
= H 21 : ð19:93Þ
The first equation comes from the fact that both H11 and H22 are calculated using
the same 2pz atomic orbital of carbon. The second equation results from the fact that
H is Hermitian. Notice that both ϕ1 and ϕ2 are real functions. Following the
convention, we denote
α H 11 = H 22 and β H 12 , ð19:94Þ
~ 11 = 2ðα þ βÞ:
H ð19:95Þ
~ 22 = 2ðα - βÞ:
H ð19:96Þ
Meanwhile, we have
~
S11 = hξ1 jξ1 i = ðϕ1 þ ϕ2 Þ ðϕ1 þ ϕ2 Þdτ = ðϕ1 þ ϕ2 Þ2 dτ
where we used the fact that ϕ1 and ϕ2 have been normalized. Also following the
convention, we denote
S ϕ1 ϕ2 dτ = S12 , ð19:98Þ
~
S11 = 2ð1 þ SÞ: ð19:99Þ
Similarly, we get
~
S22 = 2ð1 - SÞ: ð19:100Þ
Substituting (19.95) and (19.96) along with (19.99) and (19.100) for (19.90), we
get as the energy eigenvalue
αþβ α-β
λ1 = and λ2 = : ð19:101Þ
1þS 1-S
j ξ1 i ϕ1 þ ϕ 2
Ψ1 = = : ð19:103Þ
ξ1 2ð1 þ SÞ
Also, we have
780 19 Applications of Group Theory to Physical Chemistry
j ξ2 i ϕ1 - ϕ 2
Ψ2 = = : ð19:105Þ
ξ2 2ð 1 - SÞ
Note that both normalized MOs and energy eigenvalues depend upon whether we
ignore an overlap integral, as being the case with the simplest Hückel approximation.
Nonetheless, MO functional forms (in this case either ϕ1 + ϕ2 or ϕ1 - ϕ2) remain the
same regardless of the approximation levels about the overlap integral. Regarding
quantitative evaluation of α, β, and S, we will briefly mention it later.
Once we have decided symmetry of MO (or an irreducible representation which
the orbital belongs to) and its energy eigenvalue, we will be in a position to examine
various physicochemical properties of the molecule. One of them is an optical
transition within a molecule, particularly electric dipole transition. In most cases,
the most important transition is that occurring among the highest-occupied molec-
ular orbital (HOMO) and lowest-unoccupied molecular orbital (LUMO). In the case
of ethylene, those levels are depicted in Fig. 19.6. In a ground state (i.e., the most
stable state), two electrons are positioned in a B1u state (HOMO). An excited state is
assigned to B3g (LUMO). The ground state that consists only of fully occupied MOs
belongs to a totally symmetric representation. In the case of ethylene, the ground
state belongs to Ag accordingly.
If a photon is absorbed by a molecule, that molecule is excited by an energy of the
photon. In ethylene, this process takes place by exciting an electron from B1u to B3g.
The resulting electronic state ends up as an electron remaining in B1u and another
excited to B3g (a final state). Thus, the representation of the final sate electronic
configuration (denoted by Γ f) is described as
That is, the final excited state is expressed as a direct product of the states
associated with the optical transition. To determine the symmetry of the final sate,
we use (18.198) and (18.200). If Γ in (19.106) is reducible, using (18.200) we can
determine the number of times that individual representations take place. Calculating
χ ðB1u Þ ðgÞχ ðB3g Þ ðgÞ for each group element, we can readily get the result. Using a
character table, we have
Γ f = B2u : ð19:107Þ
Thus, we find that the transition is Ag ⟶ B2u, where Ag is called an initial state
electronic configuration and B2u is called a final state electronic configuration. This
transition is characterized by an electric dipole transition moment operator P. Here
19.4 MO Calculations Based on π-Electron Approximation 781
= ( )
= ( )
Fig. 19.6 HOMO and LUMO energy levels and their assignments of ethylene
where εe is a unit polarization vector of the electric field; Θi and Θf are the initial state
and final sate electronic configuration, respectively. The description (19.108) is in
parallel with (4.5). Notice that in (4.5) we dealt with a single electron system such as
a particle confined in a square-well potential, a sole one-dimensional harmonic
oscillator, and an electron in a hydrogen atom.
In the present case, however, we are dealing with a two-electron system, ethylene.
Consequently, we cannot describe Θf by a simple wave function, but must use a
more elaborated function. Nonetheless, when we discuss the optical transition of a
molecule, it is often the case that when we study optical absorption or optical
emission we first wish to know whether such a phenomenon truly takes place. In
such a case, qualitative prediction for this is of great importance. This can be done by
judging whether the integral (19.108) vanishes. If the integral does not vanish, the
relevant transition is said to be allowed. If, however, the integral vanishes, the
transition is called forbidden. In this context, a systematic approach based on
group theory is a powerful tool for this.
Let us consider optical absorption of ethylene. With the transition matrix element
for this, we have
ðβÞ ðαÞ
M fi = Φf jOðγÞ jΦi , ð19:110Þ
1 1
qα = χ ð α Þ ð gÞ χ ð γ × β Þ ð gÞ = χ ðαÞ ðgÞχ ðγ × βÞ ðgÞ
n g n g
1 1
= χ ðαÞ ðgÞχ ðγ Þ ðgÞχ ðβÞ ðgÞ = χ ðγÞ ðgÞχ ðαÞ ðgÞχ ðβÞ ðgÞ
n g n g
1 1
= χ ðγ Þ ðgÞχ ðα × βÞ ðgÞ = χ ðγÞ ðgÞ χ ðα × βÞ ðgÞ = qγ : ð19:111Þ
n g n g
Consequently, the number of times that D(γ) appears in D(α × β) is identical to the
number of times that D(α) appears in D(γ × β). Thus, it suffices to examine whether
D(γ × β) contains D(α). In other words, we only have to examine whether qα ≠ 0 in
(19.111).
Thus, applying (19.111)–(19.109), we examine whether DðB2u × Ag Þ contains the
irreducible representation D(η) that is related to x. We easily get
1
qB2u = ½1 × 1 þ ð- 1Þ × ð- 1Þ þ 1 × 1 þ ð- 1Þ × ð- 1Þ
8
19.4 MO Calculations Based on π-Electron Approximation 783
þð- 1Þ × ð- 1Þ þ 1 × 1 þ ð- 1Þ × ð- 1Þ þ 1 × 1 = 1:
B2u × B2u = Ag :
This means that if a light polarized along the y-axis is incident, i.e., εe is parallel to
the y-axis, the transition is allowed. In that situation, ethylene is said to be polarized
along the y-axis or polarized in the direction of the y-axis. As a molecular axis is
parallel to the y-axis, ethylene is polarized in the direction of the molecular axis. This
is often the case with aromatic molecules having a well-defined molecular long axis
such as ethylene.
We would examine whether ethylene is polarized along, e.g., the x-axis. From a
character table of D2h (see Table 19.2), x belongs to B3u. In that case, using (19.111)
we have
1
qB3u = ½1 × 1 þ ð- 1Þ × ð- 1Þ þ 1 × ð- 1Þ þ ð- 1Þ × 1
8
þð- 1Þ × ð- 1Þ þ 1 × 1 þ ð- 1Þ × 1 þ 1 × ð- 1Þ = 0:
Let us think of another example, cyclopropenyl radical that has three resonant
structures (Fig. 19.7). It is a planar molecule and three carbon atoms form an
equilateral triangle. Hence, the molecule belongs to D3h symmetry (see Sect.
17.2). In the molecule three π-electrons extend vertically to the molecular plane
toward upper and lower directions. Suppose that cyclopropenyl radical is placed on
the xy-plane. Three pz atomic orbitals of carbons located at vertices of an equilateral
triangle are denoted by ϕ1, ϕ2 and ϕ3 in Fig. 19.8. The orbitals are numbered
clockwise so that the calculations can be consistent with the conventional notation
of a character table (vide infra). We assume that these π-orbitals take positive and
negative signs on the upper and lower sides of the plane of paper, respectively, with a
nodal plane lying on the xy-plane. The situation is similar to that of ethylene and the
problem can be treated in parallel to the case of ethylene.
As in the case of ethylene, we can choose ϕ1, ϕ2 and ϕ3 as real functions. We
construct basis vectors using these vectors. What we want to do to address the
problem is as follows: (i) We examine how ϕ1, ϕ2 and ϕ3 are transformed by the
784 19 Applications of Group Theory to Physical Chemistry
⟷ ⟷ ⋅
⋅ ⋅
Fig. 19.7 Three resonant structures of cyclopropenyl radical
O x
z
C 3 = E, C3 , C 23 :
In the above, we use the same notation for the group name and group element, and
so we should be careful not to confuse them.
They are transformed as follows:
0 1 0
C 3 ðzÞ = 0 0 1 : ð19:116Þ
1 0 0
0 0 1
C 23 ðzÞ = 1 0 0 : ð19:117Þ
0 1 0
ðϕ1 ϕ2 ϕ3 ÞC 2 = ð- ϕ1 - ϕ3 - ϕ2 Þ: ð19:118Þ
Therefore,
-1 0 0
C2 = 0 0 -1 : ð19:119Þ
0 -1 0
-1 0 0
ðϕ1 ϕ2 ϕ3 Þσ h = ð- ϕ1 - ϕ2 - ϕ3 Þ and σ h = 0 -1 0 : ð19:120Þ
0 0 -1
786 19 Applications of Group Theory to Physical Chemistry
Table 19.5 Characters for individual symmetry transformations of 2pz orbitals in cyclopropenyl
radical
D3h E 2C3 3C2 σh 2S3 3σ v
Γ 3 0 -1 -3 0 1
In this way we can determine the trace for individual symmetry transformations
of basis functions ϕ1, ϕ2, and ϕ3. We collect the results of characters of a reducible
representation Γ in Table 19.5. It can be reduced to a summation of irreducible
representations according to the procedures given in (18.81)–(18.83) and using a
character table of D3h (Table 19.4). As a result, we get
Γ = A2 00 þ E00 : ð19:121Þ
As in the case of ethylene, we make the best use of the information of a subgroup
C3 of D3h. Let us consider a subduced representation of D of D3h to C3. For this, in
Table 19.6 we show a character table of irreducible representations of C3. We can
readily construct this character table. There should be three irreducible representa-
tions. Regarding the totally symmetric representation, we allocate 1 to each sym-
metry operation. Hence, for other representations we allocate 1 to an identity element
E and two other triple roots of 1, i.e., ε and ε [where ε = exp (i2π/3)] to an element
C3 and C32 as shown so that the row and column vectors of the character table are
orthogonal to each other.
Returning to the construction of the subduced representation, we have
1 1
PðAÞ ϕ1 = χ ðAÞ ðgÞ gϕ1 = ½1 ∙ ϕ1 þ 1 ∙ ϕ2 þ 1 ∙ ϕ3
3 g 3
1
= ðϕ1 þ ϕ2 þ ϕ3 Þ: ð19:123Þ
3
Also, we have
19.4 MO Calculations Based on π-Electron Approximation 787
P½E ϕ1 = χ ½E ðgÞ gϕ1 = ½1 ∙ ϕ1 þ ε ϕ3 þ ðε Þ ϕ2
ð 1Þ 1 ð 1Þ 1
3 g 3
1
= ðϕ þ εϕ2 þ ε ϕ3 Þ: ð19:124Þ
3 1
Also, we get
P½E ϕ1 = χ ½E ðgÞ gϕ1 = ½1 ∙ ϕ1 þ ðε Þ ϕ3 þ ε ϕ2
ð 2Þ 1 ð 2Þ 1
3 g
3
1
= ðϕ þ ε ϕ2 þ εϕ3 Þ: ð19:125Þ
3 1
C 3 ð ξ 1 Þ = C 3 ð ϕ1 þ ϕ 2 þ ϕ3 Þ = C 3 ϕ1 þ C 3 ϕ2 þ C 3 ϕ3 = ϕ3 þ ϕ1
þ ϕ2 = ξ 1 , ð19:127Þ
where for the second equality we used the fact that C3 is a linear operator. That is,
regarding a SALC ξ1, an eigenvalue of C3 is 1.
Similarly, we have
Furthermore, we get
C3 ðξ3 Þ = ε ξ3 : ð19:129Þ
Thus, we find that regarding SALCs ξ2 and ξ3, eigenvalues of C3 are ε and ε,
respectively. These pieces of information imply that if we appropriately choose
proper functions for basis vectors, a character of a symmetry operation for a
one-dimensional representation is identical to an eigenvalue of the said symmetry
operation (see Table 19.6).
788 19 Applications of Group Theory to Physical Chemistry
Regarding the last parts of the calculations, we follow the procedures described in
the case of ethylene. Using the above functions ξ1, ξ2, and ξ3, we construct the
secular equation such that
~ 11 - λ~S11
H
~ 22 - λ~S22
H = 0: ð19:130Þ
~ 33 - λ~S33
H
Since we have obtained three SALCs that are assigned to individual irreducible
representations A, E(1), and E(2), these SALCs span the representation space V3. This
makes off-diagonal elements of the secular equation vanish and it is simplified as in
(19.130). Here, we have
~ 11 =
H ξ1 Hξ1 dτ = ðϕ1 þ ϕ2 þ ϕ3 Þ H ðϕ1 þ ϕ2 þ ϕ3 Þdτ
where we used the same α and β as defined in (19.94). Strictly speaking, α and β
appearing in (19.131) should be slightly different from those of (19.94), because a
Hamiltonian is different. This approximation, however, would be enough for the
present studies. In a similar manner, we get.
~ 22 =
H ~ 33 =
ξ2 Hξ2 dτ = 3ðα - βÞ and H ξ3 Hξ3 dτ = 3ðα - βÞ: ð19:132Þ
Meanwhile, we have
~
S11 = hξ1 jξ1 i = ðϕ1 þ ϕ2 þ ϕ3 Þ ðϕ1 þ ϕ2 þ ϕ3 Þdτ = ðϕ1 þ ϕ2 þ ϕ3 Þ2 dτ
Similarly, we get
S22 = ~
~ S33 = 3ð1 - SÞ: ð19:134Þ
α þ 2β α-β α-β
λ1 = ,λ = , and λ3 = : ð19:135Þ
1 þ 2S 2 1 - S 1-S
Notice that two MOs belonging to E′′ have the same energy. These MOs are said
to be energetically degenerate. This situation is characteristic of a two-dimensional
representation. Actually, even though the group C3 has only one-dimensional
representations (because it is an Abelian group), the two complex conjugate repre-
sentations labelled E behave as if they were a two-dimensional representation
[2]. We will again encounter the same situation in a next example, benzene.
From (19.126), we get
j ξ1 i ϕ1 þ ϕ2 þ ϕ 3
Ψ1 = = : ð19:137Þ
ξ1 3ð1 þ 2SÞ
Also, we have
j ξ2 i ϕ1 þ εϕ2 þ ε ϕ3
Ψ2 = = : ð19:139Þ
ξ2 3ð 1 - SÞ
j ξ3 i ϕ1 þ ε ϕ2 þ εϕ3
Ψ3 = = : ð19:140Þ
ξ3 3ð 1 - SÞ
1 i
p - p
2 2
U= : ð19:141Þ
1 i
p p
2 2
Then we have
1 i
ðΨ 2 Ψ 3 ÞU = p ðΨ 2 þ Ψ 3 Þ p ð- Ψ 2 þ Ψ 3 Þ : ð19:142Þ
2 2
1 i
Ψ~2 = p ðΨ 2 þ Ψ 3 Þ and Ψ~3 = p ð- Ψ 2 þ Ψ 3 Þ, ð19:143Þ
2 2
we get
1
Ψ~2 = ½2ϕ1 þ ðε þ ε Þϕ2 þ ðε þ εÞϕ3
6 ð 1 - SÞ
1
= ð2ϕ1 - ϕ2 - ϕ3 Þ: ð19:144Þ
6ð 1 - SÞ
Also, we have
i i p p
Ψ~3 = ½ðε - εÞϕ2 þ ðε - ε Þϕ3 = - i 3 ϕ2 þ i 3 ϕ 3
6ð 1 - S Þ 6ð 1 - SÞ
1
= ðϕ2 - ϕ3 Þ: ð19:145Þ
2 ð 1 - SÞ
the light is polarized parallel to the molecular plane (i.e., the xy-plane in Fig. 19.8).
The proof is left for readers as an exercise. This polarizing feature is typical of planar
molecules with high molecular symmetry.
19.4.3 Benzene
Benzene has structural formula which is shown in Fig. 19.9. It is a planar molecule
and six carbon atoms form a regular hexagon. Hence, the molecule belongs to D6h
symmetry. In the molecule six π-electrons extend vertically to the molecular plane
toward upper and lower directions as in the case of ethylene and cyclopropenyl
radical. This is a standard illustration of quantum chemistry and dealt with in many
textbooks. As in the case of ethylene and cyclopropenyl radical, the problem can be
treated similarly.
As before, six equivalent pz atomic orbitals of carbon are denoted by ϕ1 to ϕ6 in
Fig. 19.10. These vectors or their linear combinations span a six-dimensional
representation space. We construct basis vectors using these vectors. Following
the previous procedures, we construct proper SALC orbitals along with MOs.
Similarly as before, a subgroup C6 of D6h plays an essential role. This subgroup
contains six group elements such that
C 6 = E, C 6 , C 3 , C2 , C 23 , C 56 : ð19:146Þ
O x
z
Table 19.7 Characters for individual symmetry transformations of 2pz orbitals in benzene
D6h E 2C6 2C3 C2 3C 02 3C 002 i 2S3 2S6 σh 3σ d 3σ v
Γ 6 0 0 0 -2 0 0 0 0 -6 0 2
0 1 0 0 0 0
0 0 1 0 0 0
C6 ðzÞ = 0
0
0
0
0
0
1
0
0
1
0
0
: ð19:148Þ
0 0 0 0 0 1
1 0 0 0 0 0
Once again, we can determine the trace for individual symmetry transformations
belonging to D6h. We collect the results in Table 19.7. The representation is
reducible and this is reduced as follows using a character table of D6h
(Table 19.8). As a result, we get
Here, we used Table 19.9 that shows a character table of irreducible representa-
tions of C6. Following the previous procedures, as SALCs we have
19.4 MO Calculations Based on π-Electron Approximation 793
6PðAÞ ϕ1 ξ1 = ϕ1 þ ϕ2 þ ϕ3 þ ϕ4 þ ϕ5 þ ϕ6 ,
6PðBÞ ϕ1 ξ2 = ϕ1 - ϕ2 þ ϕ3 - ϕ4 þ ϕ5 - ϕ6 ,
~ 11 - λ~
H S11
~ 22 - λ~S22
H
~ 33 - λ~S33
H
¼ 0: ð19:152Þ
~ 44 - λ~S44
H
~ 55 - λ~S55
H
~ 66 - λ~S66
H
~ 11 =
H ξ1 Hξ1 dτ = hξ1 jHξ1 i = 6ðα þ 2β þ 2β0 þ β00 Þ: ð19:153Þ
~ 22 =
H ξ2 Hξ2 dτ = hξ2 jHξ2 i = 6ðα - 2β þ 2β0 - β00 Þ,
~ 33 = H
H ~ 44 = 6ðα þ β - β0 - β00 Þ,
~ 55 = H
H ~ 66 = 6ðα - β - β0 þ β00 Þ: ð19:154Þ
Meanwhile, we have
~
S11 = hξ1 jξ1 i = 6ð1 þ 2S þ 2S0 þ S00 Þ,
~
S22 = hξ2 jξ2 i = 6ð1 - 2S þ 2S0 - S00 Þ,
~
S33 = ~
S44 = 6ð1 þ S - S0 - S00 Þ,
19.4 MO Calculations Based on π-Electron Approximation 795
~
S55 = ~
S66 = 6ð1 - S - S0 þ S00 Þ, ð19:155Þ
where S, S′, and S′′ are overlap integrals between the ortho, meta, and para positions,
respectively. Substituting (19.153) through (19.155) for (19.152), the energy eigen-
values are readily obtained as
α - β - β0 þ β00
λ5 = λ6 = : ð19:156Þ
1 - S - S0 þ S00
Notice that two MOs ξ3 and ξ4 as well as ξ5 and ξ6 are degenerate. As can be seen,
λ3 and λ4 are doubly degenerate. So are λ5 and λ6.
From (19.151), we get, for instance,
Thus, for one of the normalized MOs corresponding to an energy eigenvalue λ1,
we get
j ξ1 i ϕ1 þ ϕ2 þ ϕ3 þ ϕ 4 þ ϕ5 þ ϕ6
Ψ1 = = : ð19:158Þ
ξ1 6ð1 þ 2S þ 2S0 þ S00 Þ
Following the previous examples, we have other normalized MOs. That is,
we have
j ξ2 i ϕ1 - ϕ2 þ ϕ3 - ϕ4 þ ϕ5 - ϕ6
Ψ2 = = ,
ξ2 6ð1 - 2S þ 2S0 - S00 Þ
j ξ3 i ϕ1 þ εϕ2 - ε ϕ3 - ϕ4 - εϕ5 þ ε ϕ6
Ψ3 = = ,
ξ3 6ð1 þ S - S0 - S00 Þ
j ξ4 i ϕ1 þ ε ϕ2 - εϕ3 - ϕ4 - ε ϕ5 þ εϕ6
Ψ4 = = ,
ξ4 6ð1 þ S - S0 - S00 Þ
796 19 Applications of Group Theory to Physical Chemistry
= ( )
( )
j ξ5 i ϕ1 - ε ϕ2 - εϕ3 þ ϕ4 - ε ϕ5 - εϕ6
Ψ5 = = ,
ξ5 6ð1 - S - S0 þ S00 Þ
j ξ6 i ϕ1 - εϕ2 - ε ϕ3 þ ϕ4 - εϕ5 - ε ϕ6
Ψ6 = = : ð19:159Þ
ξ6 6ð1 - S - S0 þ S00 Þ
2ϕ þ ϕ2 - ϕ3 - 2ϕ4 - ϕ5 þ ϕ6
Ψ~3 = 1 ,
12ð1 þ S - S0 - S00 Þ
ϕ2 þ ϕ3 - ϕ5 - ϕ6
Ψ~4 = p : ð19:160Þ
2 1 þ S - S0 - S00
2ϕ - ϕ2 - ϕ3 þ 2ϕ4 - ϕ5 - ϕ6
Ψ~5 = 1 ,
12ð1 - S - S0 þ S00 Þ
ϕ2 - ϕ3 þ ϕ5 - ϕ6
Ψ~6 = p : ð19:161Þ
2 1 - S - S0 þ S00
A major optical transition takes place among HOMO (E1g) and LUMO (E2u)
levels. In the case of optical absorption, an initial electronic configuration is assigned
to the totally symmetric representation A1g and the symmetry of the final electronic
configuration is described as
Γ = E1g × E2u :
where ΦðA1g Þ stands for the totally symmetric ground state electronic configuration;
ΦðE1g × E2u Þ denotes an excited state electronic configuration represented by a direct-
product representation. This representation is reducible and expressed as a direct
sum of irreducible representations such that
We revisit the allyl radical and perform its MO calculations. As already noted,
Tables 18.1 and 18.2 of Example 18.1 collected representation matrices of individual
symmetry operations in reference to the basis vectors comprising three atomic
orbitals of allyl radical. As usual, we examine traces (or characters) of those
matrices. Table 19.10 collects them. The representation is readily reduced according
to the character table of C2v (see Table 18.4) so that we have
798 19 Applications of Group Theory to Physical Chemistry
Table 19.10 Characters for individual symmetry transformations of 2pz orbitals in allyl radical
C2v E C2(z) σ v(zx) σ 0v ðyzÞ
Γ 3 -1 1 -3
Γ = A2 þ 2B1 : ð19:164Þ
We have two SALC orbitals that belong to the same irreducible representation of
B1. As noted in Sect. 19.3, the orbitals obtained by a linear combination of these two
SALCs belong to B1 as well. Such a linear combination is given by a unitary
transformation. In the present case, it is convenient to transform the basis vectors
two times. The first transformation is carried out to get SALCs and the second one
will be done in the process of solving a secular equation using the SALCs. Sche-
matically showing the procedures, we have
ϕ1 Ψ1 Φ1
ϕ2 → Ψ2 → Φ2 ,
ϕ3 Ψ3 Φ3
where ϕ1, ϕ2, ϕ3 show the original atomic orbitals; Ψ 1, Ψ 2, Ψ 3 the SALCs; Φ1, Φ2,
Φ3 the final Mos. Thus, the three sets of vectors are connected through unitary
transformations.
Starting with ϕ1 and following the previous cases, we have, e.g.,
1
PðB1 Þ ϕ1 = χ ðB1 Þ ðgÞ gϕ1 = ϕ1 :
4 g
1 1
PðB1 Þ ϕ2 = χ ðB1 Þ ðgÞ gϕ2 = ðϕ2 þ ϕ3 Þ:
4 g 2
Meanwhile, we get
PðA2 Þ ϕ1 = 0,
1 1
PðA2 Þ ϕ2 = χ ðA2 Þ ðgÞ gϕ2 = ðϕ - ϕ3 Þ:
4 g 2 2
Thus, we recovered the results of Example 18.1. Notice that ϕ1 does not partic-
ipate in A2, but take part in B1 by itself.
Normalized SALCs are given as follows:
19.4 MO Calculations Based on π-Electron Approximation 799
det H jk - λSjk = 0:
2
α-λ ðβ - SλÞ 0
1 þ S′
α þ β′
2 -λ 0 = 0,
ðβ - SλÞ 1 þ S′
1 þ S′
α-β′
0 0 -λ
1 - S′
where α, β, and S are similarly defined as (19.94) and (19.98). The quantities β′ is
defined as
2
α-λ ðβ - SλÞ
1 þ S′ α-β′
=0 and - λ = 0: ð19:165Þ
2 α þ β′ 1 - S′
ðβ - SλÞ -λ
1 þ S′ 1 þ S′
α - β0
λ= :
1 - S0
The first equation of (19.165) is somewhat complicated, and so we adopt the next
approximation. That is,
S0 = β0 = 0: ð19:166Þ
This approximation is justified, because two carbon atoms C2 and C3 are pretty
remote, and so the interaction between them is likely to be weak enough. Thus, we
rewrite the first equation of (19.165) as
p
p α-λ 2ðβ - SλÞ
= 0: ð19:167Þ
2ðβ - SλÞ α-λ
p 2αS p αS
λ = α - 2βS ± 2β 1- ≈ α - 2βS ± 2β 1 - ,
β β
p 1
1-x≈1- x
2
where λL < λH. To determine the corresponding eigenfunctions Φ (i.e., MOs), we use
a linear combination of two SALCs Ψ 1 and Ψ 2. That is, putting
Φ = c1 Ψ 1 þ c2 Ψ 2 ð19:169Þ
p
ðα - λÞc1 þ 2ðβ - SλÞc2 = 0:
1 1 p - 1=2 p
ΦL = p ð Ψ 1 þ Ψ 2 Þ = 1 þ 2S 2 ϕ1 þ ϕ2 þ ϕ3 : ð19:170Þ
2 2
1 1 p - 1=2 p
ΦH = p ð- Ψ 1 þ Ψ 2 Þ = 1 - 2S - 2ϕ1 þ ϕ2 þ ϕ3 : ð19:171Þ
2 2
1
λ0 ≈ α and Φ0 = p ðϕ2 - ϕ3 Þ, ð19:172Þ
2
where we have λL < λ0 < λH. The eigenfunction Φ0 does not participate in chemical
bonding and, hence, is said to be a non-bonding orbital.
It is worth noting that eigenfunctions ΦL, Φ0, and ΦH have the same function
forms as those obtained by the simple Hückel theory that ignores the overlap
integrals S. It is because the interaction between ϕ2 and ϕ3 that construct SALCs
of Ψ 2 and Ψ 3 is weak.
Notice that within the framework of the simple Hückel theory, two sets of basis
vectors j ϕ1 i, p12 j ϕ2 þ ϕ3 i, p12 j ϕ2 - ϕ3 i and jΦLi, jΦHi, jΦ0i are connected by a
following unitary matrix V:
1 1
ðjΦL i jΦH i jΦ0 iÞ ¼ j ϕ1 i p j ϕ2 þ ϕ 3 i p j ϕ2 - ϕ 3 i V
2 2
1 1
p - p 0
2 2
1 1
¼ j ϕ1 i p j ϕ2 þ ϕ 3 i p j ϕ2 - ϕ 3 i 1 1 : ð19:173Þ
2 2 p p 0
2 2
0 0 1
( )
( ) + 2 )(1 − 2
Fig. 19.12 Electronic configurations and symmetry species of individual eigenstates along with
their corresponding energy eigenvalues for the allyl cation. (a) Ground state. (b) First excited state.
(c) Second excited state
1 1
p -p 0
2 2
1 1 1
j ϕ1 i j ϕ2 i j ϕ3 i UV = j ϕ1 i j ϕ2 i j ϕ3 i p
2 2 2
1 1 1
-p
2 2 2
The optical transition of allyl radical represents general features of the optical
transitions of molecules. To make a story simple, let us consider a case of allyl
cation. Figure 19.12 shows the electronic configurations together with symmetry
species of individual eigenstates and their corresponding energy eigenvalues for the
allyl cation. In Fig. 19.13, we redraw its geometry where the origin is located at the
center of a line segment connecting C2 and C3 (r2 + r3 = 0). Major optical transitions
(or optical absorption) are ΦL → Φ0 and ΦL → ΦH.
(i) ΦL → Φ0: In this case, following (19.109) the transition matrix element Tfi is
described by
Γ = B1 × A2 = B2 :
Since a direct product of the initial electronic configuration [Θi ðA1 Þ ] and the final
configuration Θf ðB1 × A2 Þ is B1 × A2 × A1 = B2. Then, if εe P belongs to the same
irreducible representation B2, the associated optical transition should be allowed.
Consulting a character table of C2v, we find that y belongs to B2. Thus, the allowed
optical transition is polarized along the y-direction (see Fig. 18.1).
(ii) ΦL → ΦH: In parallel with the above case, Tfi is described by
A1 → A1 ,
where the former A1 indicates the electronic ground state and the latter A1 indicates
the excited state given by B1 × B1 = A1. The direct product of them is simply
described as B1 × B1 × A1 = A1. Consulting a character table again, we find that
z belongs to A1. This implies that the allowed transition is polarized along the z-
direction (see Fig. 18.1).
Next, we investigate the above optical transition in a semi-quantitative manner. In
principle, the transition matrix element Tfi should be estimated from (19.175) and
(19.176) that use electronic configurations of two-electron system. Nonetheless, a
formulation of (4.7) of Sect. 4.1 based upon one-electron states well serves our
present purpose. For ΦL → Φ0 transition, we have
1 p 1
= eεe 2ϕ1 þ ϕ2 þ ϕ3 r p ðϕ2 - ϕ3 Þ dτ
2 2
eε p p
= pe 2ϕ1 rϕ2 - 2ϕ1 rϕ3 þ ϕ2 rϕ2 þ ϕ3 rϕ2 - ϕ3 rϕ3 dτ
2 2
804 19 Applications of Group Theory to Physical Chemistry
eε
≈ pe ðϕ2 rϕ2 - ϕ3 rϕ3 Þ dτ
2 2
eε eε eε
≈ pe r2 jϕ2 j2 dτ - r3 jϕ3 j2 dτ = pe ðr2 - r3 Þ = p e r2 ,
2 2 2 2 2
where with the first near equality we ignored integrals ϕirϕjdτ (i ≠ j); with the last
near equality r ≈ r2 or r3. For these approximations, we assumed that an electron
density is very high near C2 or C3 with ignorable density at a place remote from
them. Choosing εe for the direction of r2, we get
e
T fi ðΦL → Φ0 Þ ≈ p jr2 j:
2
eεe eε
T fi ðΦL → ΦH Þ = ΦH εe PΦL dτ ≈ ðr3 - 2r1 þ r2 Þ = - e r1 :
4 2
e
T fi ðΦL → ΦH Þ ≈ jr j:
2 1
Transition
p probability is proportional to a square of an absolute value of Tfi. Using
j r2 j ≈ 3 j r1 j, we have
2 e2 3e2 2
T fi ðΦL → Φ0 Þ ≈ jr2 j2 = jr j ,
2 2 1
2 e2
T fi ðΦL → ΦH Þ ≈ jr j2 :
4 1
Thus, we obtain
2 2
T fi ðΦL → Φ0 Þ ≈ 6 T fi ðΦL → ΦH Þ : ð19:177Þ
Thus, the transition probability of ΦL → Φ0 is about six times that for ΦL → ΦH.
Note that in the above simple estimation we ignore an overlap integral S.
From the above discussion, we conclude that (i) the ΦL → Φ0 transition is
polarized along the r2 direction (i.e., the molecular long axis) and that the ΦL → ΦH
transition is polarized along the r1 direction (i.e., the molecular short axis).
(ii) Transition probability of ΦL → Φ0 is about six times that of ΦL → ΦH. Note
19.5 MO Calculations of Methane 805
that the polarized characteristics are consistent with those obtained from the discus-
sion based on the group theory. The conclusion reached by the semi-quantitative
estimation of a simple molecule of allyl cation well typifies the general optical
features of more complicated molecules having a well-defined molecular long axis
such as polyenes.
O
y
x
806 19 Applications of Group Theory to Physical Chemistry
where by the above notations we denoted atomic species and molecular orbitals.
Hence, as a matrix representation we have
1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
C 3xyz = , ð19:178Þ
0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0
where Cxyz
3 is the same operation as Rxyz2π
3
appeared in Sect. 17.3. Therefore,
χ Cxyz
3 = 2:
As another example σ yz
d , we have
where σ yz
d represents a mirror symmetry with respect to the plane that includes the x-
axis and bisects the angle formed by the y- and z-axes. Thus, we have
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
σ yz
d = : ð19:179Þ
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0
Then, we have
19.5 MO Calculations of Methane 807
Table 19.11 Characters for individual symmetry transformations of hydrogen 1s orbitals and
carbon 2s and 2p orbitals in methane
Td E 8C3 3C2 6S4 6σ d
Γ 8 2 0 0 4
χ σ yz
d = 4:
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
C 2z = , ð19:180Þ
0 0 0 0 1 0 0 0
0 0 0 0 0 -1 0 0
0 0 0 0 0 0 -1 0
0 0 0 0 0 0 0 1
χ C z2 = 0:
zπ
With S42 (i.e., an improper rotation by π2 around the z-axis), we get
zπ
H 1 H 2 H 3 H 4 C2s C2px C2py C2pz S42
0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0
1 0 0 0 0 0 0 0
zπ 0 0 1 0 0 0 0 0
S4 2 = , ð19:181Þ
0 0 0 0 1 0 0 0
0 0 0 0 0 0 -1 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 -1
808 19 Applications of Group Theory to Physical Chemistry
zπ
χ S42 = 0:
χ ðE Þ = 8:
In other words, V8 is decomposed into the above two R-invariant subspaces (see
Part III); one is the hydrogen-related subspace and the other is the carbon-related
subspace.
Correspondingly, a representation D comprising the above representation matri-
ces should be reduced to a direct sum of irreducible representations D(α). That is, we
should have
ðαÞ
D= q D
α α
,
1
qα = χ ðαÞ ðgÞ χ ðgÞ: ð18:83Þ
n g
1 1
qA1 = χ ðA1 Þ ðgÞ χ ðgÞ = ð1 8 þ 1 8 2 þ 1 6 4Þ = 2:
24 g 24
1
qA 2 = ½1 8 þ 1 8 2 þ ð- 1Þ 6 4 = 0:
24
1
qT 2 = ð3 8 þ 1 6 4Þ = 2:
24
0 0 0 1
0 0 1 0
C 2z = :
0 1 0 0
1 0 0 0
1 1 1 1
2 2 2 2
1 1 1 1
- -
P= 2 2 2 2
1 1 1 1
- -
2 2 2 2
1 1 1 1
- -
2 2 2 2
1 0 0 0
-1 0 -1 0 0
P C 2z P = :
0 0 -1 0
0 0 0 1
Note that this diagonal matrix is identical to submatrix of (19.180) for Span{C2s,
zπ
C2px, C2py, C2pz}. Similarly, S42 of (19.181) gives eigenvalues ±1 and ±i for both
zπ
the subspaces. Notice that since S42 is unitary, its eigenvalues take a complex number
with an absolute value of 1. Since these symmetry operation matrices are unitary,
these matrices must be diagonalized according to Theorem 14.5. Using unitary
matrices whose column vectors are chosen from eigenvectors of the matrices,
diagonal elements are identical to eigenvalues including their multiplicity. Writing
representation matrices of symmetry operations for the hydrogen-associated sub-
space and carbon-associated subspace as H and C, we find that H and C have the
810 19 Applications of Group Theory to Physical Chemistry
-1
P - 1 HP = Q - 1 CQ or PQ - 1 H PQ - 1 = C:
Namely, H and C are similar, i.e., the representation is equivalent. Thus, recalling
Schur’s First Lemma, (19.184) results.
Our next task is to construct SALCs, i.e., proper basis vectors using projection
operators. From the above, we anticipate that the MOs comprise a linear combina-
tion of the hydrogen-associated SALC and carbon-associated SALC that belongs to
the same irreducible representation (i.e., A1 or T2). For this purpose, we first find
proper SALCs using projection operators described as
ðαÞ dα ðαÞ
PlðlÞ = Dll ðgÞ g: ð18:156Þ
n g
We apply this operator to Span{H1, H2, H3, H4}. Taking, e.g., H1 and operating
both sides of (18.156) on H1, as a basis vector corresponding to a one-dimensional
representation of A1 we have
ðA Þ d A1 ðA Þ 1
P1ð11Þ H 1 = D111 ðgÞ gH 1 = ½ð1 H 1 Þ þ ð1 H 1 þ 1 H 1 þ 1 H 3
n g 24
þ1 H 4 þ 1 H 3 þ 1 H 2 þ 1 H 4 þ 1 H 2 Þ þ ð1 H 4 þ 1 H 3 þ 1 H 2 Þ
þð1 H 3 þ 1 H 2 þ 1 H 2 þ 1 H 4 þ 1 H 4 þ 1 H 3 Þ
þð1 H 1 þ 1 H 4 þ 1 H 1 þ 1 H 2 þ 1 H 1 þ 1 H 3 Þ:
1
= ð6 H 1 þ 6 H 2 þ 6 H 3 þ 6 H 4 Þ
24
1
= ðH 1 þ H 2 þ H 3 þ H 4 Þ: ð19:185Þ
4
The case of C2s is simple, because all the symmetry operations convert C2s to
itself. That is, we have
ðA Þ 1
P1ð11Þ C2s = ð24 C2sÞ = C2s: ð19:186Þ
24
ðA Þ 1
P1ð11Þ C2px = ð1 C2px Þ þ 1 C2py þ 1 C2pz þ 1 - C2py
24
= 0:
The calculation is somewhat tedious, but it is natural that since C2s is spherically
symmetric, it belongs to the totally symmetric representation. Conversely, it is
natural to think that C2px is totally unlikely to contain a totally symmetric represen-
tation. This is also the case with C2py and C2pz.
Table 19.12 shows the character table of Td in which the three-dimensional
irreducible representation T2 is spanned by basis vectors (x y z). Since in
Table 17.6 each (3, 3) matrix is given in reference to the vectors (x y z), it can
directly be utilized to represent T2. More specifically, we can directly choose the
ðT Þ
diagonal elements (1, 1), (2, 2), and (3, 3) of the individual (3, 3) matrices for D112 ,
ðT 2 Þ ðT 2 Þ
D22 , and D33 elements of the projection operators, respectively. Thus, we can
construct SALCs using projection operators explained in Sect. 18.7. For example,
using H1 we obtain SALCs that belong to T2 such that
ðT Þ 3 ðT Þ 3
P1ð12Þ H 1 = D112 ðgÞ gH 1 = fð 1 H 1 Þ
24 g 24
812 19 Applications of Group Theory to Physical Chemistry
þ½ð- 1Þ H 4 þ ð- 1Þ H 3 þ 1 H 2
þ½0 ð- H 3 Þ þ 0 ð- H 2 Þ þ 0 ð- H 2 Þ þ 0 ð- H 4 Þ þ ð- 1Þ H 4 þ ð- 1Þ H 3
þð0 H 1 þ 0 H 4 þ 1 H 1 þ 1 H 2 þ 0 H 1 þ 0 H 3 Þg
3
= ½2 H 1 þ 2 H 2 þ ð- 2Þ H 3 þ ð- 2Þ H 4
24
1
= ðH þ H 2 - H 3 - H 4 Þ: ð19:187Þ
4 1
Similarly, we have
ðT Þ 1
P2ð22Þ H 1 = ðH - H 2 þ H 3 - H 4 Þ, ð19:188Þ
4 1
ðT Þ 1
P3ð32Þ H 1 = ðH 1 - H 2 - H 3 þ H 4 Þ: ð19:189Þ
4
ðT Þ
Now we can easily guess that P1ð12Þ C2px solely contains C2px. Likewise,
ðT Þ ðT Þ
P2ð22Þ C2py and P3ð32Þ C2pz contain only C2py and C2pz, respectively. In fact, we
obtain what we anticipate. That is,
ðT Þ ðT Þ ðT Þ
P1ð12Þ C2px = C2px , P2ð22Þ C2py = C2py , P3ð32Þ C2pz = C2pz : ð19:190Þ
This gives a good example to illustrate the general concept of projection operators
and related calculations of inner products discussed in Sect. 18.7. For example, we
ðT Þ ðT Þ
have a non-vanishing inner product of P1ð12Þ H 1 jP1ð12Þ C2px and an inner product
ðT Þ ðT Þ
of, e.g., P2ð22Þ H 1 jP1ð12Þ C2px is zero. This significantly reduces efforts to solve a
ðT Þ ðT Þ
secular equation (vide infra). Notice that functions P1ð12Þ H 1 , P1ð12Þ C2px , etc. are
ðT Þ ðT Þ
linearly independent of one another. Recall also that P1ð12Þ H 1 and P1ð12Þ C2px
ðαÞ ðαÞ
correspond to ϕl and ψ l in (18.171), respectively. That is, the former functions
are linearly independent, while they belong to the same place “1” of the same three-
dimensional irreducible representation T2.
Equations (19.187)–(19.190) seem intuitively obvious. This is because if we
draw a molecular geometry of methane (see Fig. 19.14), we can immediately
recognize the relationship between the directionality in ℝ3 and “directionality” of
19.5 MO Calculations of Methane 813
SALCs represented by sings of hydrogen atomic orbitals (or C2px, C2py, and C2pz of
carbon).
As stated above, we have successfully obtained SALCs relevant to methane.
Therefore, our next task is to solve an eigenvalue problem and construct appropriate
MOs. To do this, let us first normalize the SALCs. We assume that carbon-based
SALCs have already been normalized as well-studied atomic orbitals. We suppose
that all the functions are real. For the hydrogen-based SALCs
hH 1 þ H 2 þ H 3 þ H 4 jH 1 þ H 2 þ H 3 þ H 4 i = 4hH 1 jH 1 i þ 12hH 1 jH 2 i
where the second equality comes from the fact that jH1i, i.e., a 1 s atomic orbital is
normalized. We define hH1| H2i = SHH, i.e., an overlap integral between two
adjacent hydrogen atoms. Note also that hHi| Hji (1 ≤ i, j ≤ 4; i ≠ j) is the same
because of the symmetry requirement.
Thus, as a normalized SALC we have
j H1 þ H2 þ H3 þ H4i
H ðA1 Þ : ð19:192Þ
2 1 þ 3SHH
Also, we have
hH 1 þ H 2 - H 3 - H 4 jH 1 þ H 2 - H 3 - H 4 i = 4hH 1 jH 1 i - 4hH 1 jH 2 i
Thus, we have
ðT 2 Þ j H1 þ H2 - H3 - H4i
H1 p : ð19:195Þ
2 1 - SHH
p
Also defining a denominator as d = 2 1 - SHH , we have
814 19 Applications of Group Theory to Physical Chemistry
ðT 2 Þ
H1 j H 1 þ H 2 - H 3 - H 4 i=d:
ðT 2 Þ
H2 j H 1 - H 2 þ H 3 - H 4 i=d,
ðT 2 Þ
H3 j H 1 - H 2 - H 3 þ H 4 i=d:
The next step is to construct MOs using the above SALCs. To this end, we make a
linear combination using SALCs belonging to the same irreducible representations.
In the case of A1, we choose H ðA1 Þ and C2s. Naturally, we anticipate two linear
combinations of
where a1 and b1 are arbitrary constants. On the basis of the discussions of projection
operators in Sect. 18.7, both the above two linear combinations belong to A1 as well.
ðT Þ ðT Þ ðT Þ
Similarly, according to the projection operators P1ð12Þ , P2ð22Þ , and P3ð32Þ , we make three
sets of linear combinations
ðT Þ ðT Þ ðT Þ
q1 P1ð12Þ H 1 þ r 1 C2px ; q2 P2ð22Þ H 1 þ r 2 C2py ; q3 P3ð32Þ H 1 þ r 3 C2pz , ð19:197Þ
where q1, r1, etc. are arbitrary constants. These three sets of linear combinations
belong to individual “addresses” 1, 2, and 3 of T2. What we have to do is to
determine coefficients of the above MOs and to normalize them by solving the
secular equations. With two different energy eigenvalues, we get two orthogonal
(i.e., linearly independent) MOs for the individual four sets of linear combinations of
(19.196) and (19.197). Thus, total eight linear combinations constitute MOs of
methane.
In light of (19.183), the secular equation can be reduced as follows according to
the representations A1 and T2. There we have changed the order of entries in the
equation so that we can deal with the equation easily. Then we have
H 11 - λ H 12 - λS12
H 21 - λS21 H 22 - λ
G11 - λ G12 - λT 12
G21 - λT 21 G22 - λ
F 11 - λ F 12 - λV 12
F 21 - λV 21 F 22 - λ
K 11 - λ K 12 - λW 12
K 21 - λW 21 K 22 - λ
¼ 0, ð19:198Þ
19.5 MO Calculations of Methane 815
where off-diagonal elements are all zero except for those explicitly described. This is
because of the symmetry requirement (see Sect. 19.2). Thus, in (19.198) the secular
equation is decomposed into four (2, 2) blocks. The first block is pertinent to A1 of a
hydrogen-based component and carbon-based component from the left, respectively.
Lower three blocks are pertinent to T2 of hydrogen-based and carbon-based compo-
ðT Þ ðT Þ ðT Þ
nents from the left, respectively, in order of P1ð12Þ , P2ð22Þ , and P3ð32Þ SALCs from the
top. The notations follow those of (19.59).
We compute these equations. The calculations are equivalent to solving the
following four two-dimensional secular equations:
H 11 - λ H 12 - λS12
H 21 - λS21 H 22 - λ
= 0,
G11 - λ G12 - λT 12
G21 - λT 21 G22 - λ
= 0,
F 11 - λ F 12 - λV 12
F 21 - λV 21 F 22 - λ
= 0,
K 11 - λ K 12 - λW 12
K 21 - λW 21 K 22 - λ
= 0: ð19:199Þ
Notice that these four secular equations are the same as a (2, 2) determinant of
(19.59) in the form of a secular equation. Note, at the same time, that while (19.59)
did not assume SALCs, (19.199) takes account of SALCs. That is, (19.199)
expresses a secular equation with respect to two SALCs that belong to the same
irreducible representation.
The first equation of (19.199) reads as
2SCH
A1
= : ð19:201Þ
1 þ 3SHH
A1 hH 1 jC2si:
SCH ð19:202Þ
αH þ 3βHH
H 11 H ðA1 Þ jHH ðA1 Þ = , ð19:203Þ
1 þ 3SHH
H 22 hC2sjHC2si, ð19:204Þ
Similarly, we obtain related solutions for the latter three eigenvalue equations of
(19.199). With the second equation of (19.199), for instance, we have
G11 þG22 -2G12 T 12 ± ðG11 -G22 Þ2 þ4 G12 2 þG11 G22 T 12 2 -G12 T 12 ðG11 þG22 Þ
λ¼ :
2 1-T 12 2
ð19:209Þ
ðT Þ ðT Þ hH 1 þ H 2 - H 3 - H 4 jC2px i
T 12 H 1 2 C2px dτ H 1 2 jC2px = p
2 1 - SHH
ðT Þ ðT 2 Þ αH - βHH
G11 H 1 2 jHH 1 = , ð19:211Þ
1 - SHH
ðT 2 Þ hH 1 þ H 2 - H 3 - H 4 jHC2px i
G12 H 1 j HC2px = p
2 1 - SHH
2hH 1 jHC2px i 2βCH
= p = p T2 : ð19:213Þ
1 - SHH 1 - SHH
T 2 hH 1 jC2px i, βT 2 hH 1 jHC2px i:
SCH ð19:214Þ
CH
In (19.210), an overlap integral T12 between four hydrogen atomic orbitals and
C2px is identical again from the symmetry requirement. That is, the integrals of
(19.213) are additive regarding a product of plus components of a hydrogen atomic
orbital of (19.187) and C2px as well as another product of minus components of a
hydrogen atomic orbital of (19.187) and C2px; see Fig. 19.14. Notice that C2px has a
node on yz-plane.
The third and fourth equations of (19.199) give exactly the same eigenvalues as
that given in (19.209). This is obvious from the fact that all the latter three equations
of (19.199) are associated with a irreducible representation T2. The corresponding
three MOs are triply degenerate.
818 19 Applications of Group Theory to Physical Chemistry
A B
In (19.207) and (19.209), the plus sign gives a higher orbital energy and the minus
sign gives a lower energy. Equations (19.207) and (19.209), however, look some-
what complicated. To simplify the situation, (i) in, e.g., (19.207) let us consider a
case where jH11 j ≫ j H22j or jH11 j ≪ j H22j. In that case, (H11 - H22)2 dominates
inside a square root, and so ignoring -2H12S12 and -S122 we have λ ≈ H11 or
λ ≈ H22. Inserting these values into (19.199), we have either Ψ ≈ H ðA1 Þ or Ψ ≈ C2s,
where Ψ is a resulting MO. This implies that no interaction would arise between
H ðA1 Þ and C2s. (ii) If, however, H11 = H22, we would get
H 11 - H 12 S12 ± j H 12 - H 11 S12 j
λ= : ð19:215Þ
1 - S12 2
The Hamiltonian of Hþ
2 is described as
ħ2 2 e2 e2 e2
H= - — - - þ , ð19:216Þ
2m 4πε0 r A 4πε0 r B 4πε0 R
where m is an electron rest mass and other symbols are defined in Fig. 19.15; the last
term represents the repulsive interaction between the two hydrogen nuclei. To
estimate the quantities in (19.215), it is convenient to use dimensionless ellipsoidal
μ
coordinates ν such that [3].
ϕ
rA þ rB rA - rB
μ= and ν= : ð19:217Þ
R R
The quantity ϕ is an azimuthal angle around the molecular axis (i.e., the straight
line connecting the two nuclei). Then, we have
ħ2 2 e2 e2
H 11 = hAjHjAi = Aj - — - jA þ Aj - jA
2m 4πε0 r A 4πε0 r B
e2
þ Aj jA
4πε0 R
e2 1 e2
= E 1s - Aj jA þ , ð19:218Þ
4πε0 rB 4πε0 R
where E1s is the same as that given in (3.258) with Z = n = 1 and μ replaced with
ħ2
m (i.e., E1s = - 2ma 2 ). Using a coordinate representation of (3.301), we have
1 - 3=2 - rA =a
j Ai = a e : ð19:219Þ
π
1 1 1
Aj jA = 3 dτe - 2rA =a :
rB πa rB
3 2π 1 1
R
dτ = dϕ dμ dν μ2 - ν2 ,
2 0 1 -1
we have
820 19 Applications of Group Theory to Physical Chemistry
1
1 R3 1
e - ðμþνÞR=a
Aj jA = 2π dμ dν μ2 - ν2
2 ðμ - ν Þ
rB 8πa3 R
1 -1
1 1
R2
= dμ dνðμ þ νÞe - ðμþνÞR=a : ð19:220Þ
2a3 1 -1
1
þ νÞe - ðμþνÞR=a , we obtain
1
Putting I = 1 dμ - 1 dνðμ
1 1 1 1
I= μe - μR=a dμ e - νR=a dν þ e - μR=a dμ νe - νR=a dν: ð19:221Þ
1 -1 1 -1
The above definite integrals can readily be calculated using the methods
described in Sect. 3.7.2; see, e.g., (3.262) and (3.263). For instance, we have
1
1
e - cx 1
e - cν dν = = ðec - e - c Þ: ð19:222Þ
-1 -c -1
c
1
1
νe - cν dν = ½ð1 - cÞec - ð1 þ cÞe - c : ð19:223Þ
-1 c2
In the present case, c is to be replaced with R/a. Other calculations of the definite
integrals are left for readers. Thus, we obtain
a3 R
I= 2 - 2 1 þ e - 2R=a :
R3 a
In turn, we have
1 R2 1 R
Aj jA = 3 I = 1 - 1 þ e - 2R=a :
rB 2a R a
e2
j0 ,
4πε0
we finally obtain
19.5 MO Calculations of Methane 821
j0 R j
1 - 1 þ e- a þ 0 :
2R
H 11 = E 1s - ð19:224Þ
R a R
2π 1 1
R3
S12 = hBjAi = dϕ dμ dν μ2 - ν2 e - μR=a , ð19:225Þ
8πa3 0 1 -1
1 - 3=2 - rB =a
where we used j Bi = πa e and (19.217). Noting that
1 1 1
2 - μR=a
dμ dν μ2 - ν2 e - μR=a = dμ 2μ2 - e
1 -1 1 3
2
R 1 R
S12 = 1 þ þ e - R=a : ð19:226Þ
a 3 a
ħ2 2 e2 e2
H 12 = hAjHjBi = Aj - — - jB þ Aj - jB
2m 4πε0 r B 4πε0 r A
e2
þ Aj jB
4πε0 R
e2 1 e2
= E 1s hAjBi - Aj jB þ hAjBi: ð19:227Þ
4πε0 rA 4πε0 R
ħ 2
Note that in (19.227) jBi is an eigenfunction belonging to E1s = - 2ma 2 that is an
ħ2 e2
eigenvalue of an operator - 2m — - 4πε0 rB . From (3.258), we estimate E1s to be
2
13.61 eV. Using (19.217) and converting Cartesian coordinates to ellipsoidal coor-
dinates once again, we get
1 1 R
Aj jB = 1 þ e - R=a :
rA a a
Thus, we have
822 19 Applications of Group Theory to Physical Chemistry
j0 j R
S - 0 1 þ e - a:
R
H 12 = E1s þ ð19:228Þ
R 12 a a
1 1
j0 j0 Aj jA and k0 j0 Aj jB :
rB rA
Then, we have
H 12 - H 11 S12
2
1 2R - R=a j0 R R 1 R
= j0 - e - 1þ 1þ þ e - 3R=a
R 3a2 R a a 3 a
j0 a 2R - R=a j0 2
a R 1 R
= - e - 1þ 1þ þ e - 3R=a : ð19:230Þ
a R 3a a R a 3 a
In (19.230) we notice that whereas the second term is always negative, the first
term may be negative or positive depending upon R. We could not tell a priori
whether H12 - H11S12 is negative accordingly. If we had R ≪ a, (19.230) would
become positive.
Let us then make a quantitative estimation. The Bohr radius a is about 52.9 pm
(using an electron rest mass). As an experimental result, R is approximately 106 pm
[3]. Hence, for a Hþ2 ion we have
R=a ≈ 2:0:
H 11 þ H 12 H 11 - H 12
λL = and λH = , ð19:231Þ
1 þ S12 1 - S12
where λL and λH indicate the lower and higher energy eigenvalues, respectively.
Namely, Case II in the above is more likely. Correspondingly, for MOs we have
j Aiþ j Bi j Ai - j Bi
ΨL = and ΨH = , ð19:232Þ
2ð1 þ S12 Þ 2ð1 - S12 Þ
where Ψ L and Ψ H belong to λL and λH, respectively. The results are virtually the
same as those given in (19.101)–(19.105) of Sect. 19.1. In (19.101) and Fig. 19.6,
however, we merely assumed that λ1 = αþβ α-β
1þS is lower than λ2 = 1 - S. Here, we have
confirmed that this is truly the case. A chemical bond is formed in such a way that an
electron is distributed as much as possible along the molecular axis (see Fig. 19.4 for
a schematic) and that in this configuration a minimized orbital energy is achieved.
In our present case, H11 in (19.203) and H22 in (19.204) should differ. This is also
the case with G11 in (19.211) and G22 in (19.212). Here we return back to (19.199).
Suppose that we get a MO by solving, e.g., the first secular equation of (19.199)
such that
Ψ = a1 H ðA1 Þ þ b1 C2s:
H 12 - λS12
a1 = - b , ð19:233Þ
H 11 - λ 1
where S12 and H12 were defined as in (19.201) and (19.205), respectively. A
~ is described by
normalized MO Ψ
~= a1 H ðA1 Þ þ b1 C2s
Ψ : ð19:234Þ
a1 2 þ b1 2 þ 2a1 b1 S12
Thus, according to two different energy eigenvalues λ we will get two linearly
independent MOs. Other three secular equations are dealt with similarly.
From the energetical consideration of Hþ 2 we infer that a1b1 > 0 with a bonding
MO and a1b1 < 0 with an anti-bonding MO. Meanwhile, since both H ðA1 Þ and C2s
belong to the irreducible representation A1, so does Ψ~ on the basis of the discussion
on the projection operators of Sect. 18.7. Thus, we should be able to construct proper
MOs that belong to A1. Similarly, we get proper MOs belonging to T2 by solving
other three secular equations of (19.199). In this case, three bonding MOs are triply
degenerate, so are three anti-bonding MOs. All these six MOs belong to the
irreducible representation T2. Thus, we can get a complete set of MOs for methane.
These eight MOs span the representation space V8.
824 19 Applications of Group Theory to Physical Chemistry
CH4
where ΘðA1 Þ stands for an electronic configuration of the totally symmetric ground
state; Θ(α × β) denotes an electronic configuration of an excited state represented by a
direct-product representation pertinent to irreducible representations α and β; P is an
electric dipole operator and εe is a unit polarization vector of the electric field. From a
19.5 MO Calculations of Methane 825
character table for Td, we find that εe P belongs to the irreducible representation T2
(see Table 19.12).
The ground state electronic configuration is A1 (totally symmetric). It is
denoted by
a21 t 22 t ′ 22 t″22 ,
where three T2 states are distinguished by a prime and double prime. For possible
configuration of excited states, we have
T 2 × T 2 × T 2 = 3T 1 þ 4T 2 þ 2E þ A2 þ A1 , ð19:238Þ
T 2 × T 2 = T 1 þ T 2 þ E þ A1 , ð19:239Þ
T 2 × A1 = T 2 : ð19:240Þ
Admittedly, (19.238) and (19.239) contain A1, and so the transition is allowed. As
for (19.240), however, the transition is forbidden because it does not contain A1. In
light of the character table of Td (Table 19.12), we find that the allowed transitions
(i.e., t 2 → t 2 , t 2 → a1 , and a1 → t 2) equally take place in the direction polarized along
the x-, y-, z-axes. This is often the case with molecules having higher symmetries
such as methane. The transition a1 → a1 , however, is forbidden.
In the above argument including the energetic consideration, we could not tell
magnitude relationship of MO energies or photon energies associated with the
826 19 Applications of Group Theory to Physical Chemistry
optical transition. Once again, this requires more accurate calculations and experi-
ments. Yet, the discussion we have developed gives us strong guiding principles in
the investigation of molecular science.
References
1. Cotton FA (1990) Chemical applications of group theory, 3rd edn. Wiley, New York
2. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
3. Atkins P, Friedman R (2005) Molecular quantum mechanics, 4th edn. Oxford University Press,
Oxford
4. Anslyn EV, Dougherty DA (2006) Modern physical organic chemistry. University Science
Books, Sausalito
5. Gable KP (2014) Molecular orbitals: methane. http://www.science.oregonstate.edu/~gablek/
CH334/Chapter1/methane_MOs.htm
Chapter 20
Theory of Continuous Groups
In Chap. 16, we classified the groups into finite groups and infinite groups. We
focused mostly on finite groups and their representations in the precedent chapters.
Of the infinite groups, continuous groups and their properties have been widely
investigated. In this chapter we think of the continuous groups as a collective term
that include Lie groups and topological groups. The continuous groups are also
viewed as the transformation group. Aside from the strict definition, we will see the
continuous group as a natural extension of the rotation group or SO(3) that we
studied briefly in Chap. 17. Here we reconstruct SO(3) on the basis of the notion of
infinitesimal rotation. We make the most of the exponential functions of matrices the
theory of which we have explored in Chap. 15. In this context, we study the basic
properties and representations of the special unitary group SU(2) that has close
relevance to SO(3). Thus, we focus our attention on the representation theory of
SU(2) and SO(3). The results are associated with the (generalized) angular momenta
which we dealt with in Chap. 3. In particular, we show that the spherical surface
harmonics constitute the basis functions of the representation space of SO(3).
Finally, we study various important properties of SU(2) and SO(3) within a
framework of Lie groups and Lie algebras. The last sections comprise the description
of abstract ideas, but those are useful for us to comprehend the constitution of the
group theory from a higher perspective.
First let us consider how a function Ψ is transformed by the rotation (see Sect.
19.1). Here we express Ψ by
c1
Ψ = ðψ 1 ψ 2 ⋯ ψ d Þ c2
⋮
, ð20:1Þ
cd
where the index θ indicates the rotation angle in a real space whose dimension is two
or three. Note that the dimensions of the real space and representation space may or
may not be the same. Equation (20.2) is essentially the same as (19.11).
As a three-dimensional matrix representation of rotation, we have, e.g.,
cos θ – sin θ 0
Rθ = sin θ cos θ 0 : ð20:3Þ
0 0 1
The matrix Rθ represents a rotation around the z-axis in ℝ3; see Fig. 11.2 and
(11.31). Borrowing the notation of (17.3) and (17.12), we have
cos θ – sin θ 0 x
R θ ð xÞ = ð e1 e 2 e 3 Þ sin θ cos θ 0 y : ð20:4Þ
0 0 1 z
cos θ sin θ 0 x
Rθ- 1 ðxÞ = ðe1 e2 e3 Þ - sin θ cos θ 0 y : ð20:5Þ
0 0 1 z
1 θ 0 x x þ θy
Rθ- 1 ðxÞ ≈ ðe1 e2 e3 Þ - θ 1 0 y = ðe1 e2 e3 Þ y - θx
0 0 1 z z ð20:6Þ
= ðx þ θyÞe1 þ ðy - θxÞe2 þ ze3 :
Putting
we have
or
∂ ∂
Mz - i x -y , ð20:7Þ
∂y ∂x
we get
Note that the operator Mz has already appeared in Sect. 3.5 [also see (3.19), Sect.
3.2], but in this section we denote it by Mz to distinguish it from the previous
notation Mz. This is because Mz and Mz are connected through a unitary similarity
transformation (vide infra).
From (20.8), we express an infinitesimal rotation θ around the z-axis as
Rθ = 1 - iθMz : ð20:9Þ
n
Rθ = Rθ=n , ð20:10Þ
where RHS of (20.10) implies that the rotation Rθ of a finite angle θ is attained by
n successive infinitesimal rotations of Rθ/n with a large enough n. Taking a limit
n → 1 [1, 2], we get
n
n θ
Rθ = lim Rθ=n = lim 1 - i Mz = expð- iθMz Þ: ð20:11Þ
n→1 n→1 n
Recall the following definition of an exponential function other than (15.7) [3]:
x n
ex lim 1 þ : ð20:12Þ
n→1 n
Note that as in the case of (15.7), (20.12) holds when x represents a matrix. Thus,
if a dimension of the representation space d is two or more, (20.9) should be read as
Rθ = E - iθMz , ð20:13Þ
Rθ = expðθAz Þ, ð20:14Þ
with
Az - iMz : ð20:15Þ
Operators of this type play an essential role in continuous groups. This is because
in (20.14) the operator Az represents a Lie algebra, which in turn generates a Lie
group. The Lie group is categorized as one of the continuous groups along with a
topological group. In this chapter we further explore various characteristics of
SO(3) and SU(2), both of which are Lie groups and have been fully investigated
from various aspects. The Lie algebras frequently appear in the theory of continuous
groups in combination with the Lie groups. Brief outline of the theory of Lie groups
and Lie algebras will be given at the end of this chapter.
Let us further consider implications of (20.9) and (20.13). From those equations
we have
Mz = ð1 - Rθ Þ=iθ ð20:16Þ
or
20.1 Introduction: Operators of Rotation and Infinitesimal Rotation 831
Mz = ðE - Rθ Þ=iθ: ð20:17Þ
R θ ð e3 Þ = e 3 :
1 0 0 1 -θ 0
E= 0 1 0 and Rθ = θ 1 0 : ð20:18Þ
0 0 1 0 0 1
0 -i 0
Mz = i 0 0 : ð20:19Þ
0 0 0
0 -i 0 c1
Mz ðΧÞ = ðx y zÞ i 0 0 c2 : ð20:20Þ
0 0 0 c3
c1
Χ = ðx y zÞ c2 , ð20:21Þ
c3
832 20 Theory of Continuous Groups
where c1, c2, and c3 are coefficients (or coordinates). Again, this notation is a natural
extension of (11.13). We emphasize once more that (x y z) does not represent
coordinates but functions. As an example, we depict a function f(x, y) = ay
(a > 0) in Fig. 20.1. The function ay (drawn green in Fig. 20.1) behaves as a vector
with respect to a linear transformation like a basis vector e2 of the Cartesian
coordinate. The “directionality” as a vector can easily be visualized with the function
ay.
Since Mz of (20.19) is Hermitian, it must be diagonalized by unitary similarity
transformation (Sect. 14.3). We find a diagonalizing unitary matrix P with respect to
Mz such that
1 1
-p p 0
2 2
P= i i : ð20:22Þ
-p -p 0
2 2
0 0 1
0 -i 0 c1
Mz ðX Þ = ðx y zÞP P - 1 i 0 0 P P-1 c2
0 0 0 c3
1 1
-p p 0
2 2 1 0 0
= ðx y zÞ i i 0 -1 0
-p -p 0
2 2 0 0 0
0 0 1
1 i
-p p 0
2 2 c1
1 i 1 i
× 1 i c2 = - p x- p y p x- p y z
p p 0 2 2 2 2
2 2 c3
0 0 1
1 i
1 0 0 - p c1 þ p c2
2 2
× 0 -1 0 1 i : ð20:23Þ
p c 1 þ p c2
0 0 0 2 2
c3
1 0 0
Mz P -1
Mz P = 0 -1 0 : ð20:24Þ
0 0 0
Equation (20.24) was obtained by putting l = 1 and shuffling the rows and
columns of (3.158) in Chap. 3. That is, choosing a (real) unitary matrix R3 given by
0 0 1
R3 = 1 0 0 , ð17:116Þ
0 1 0
we have
-1 0 0
-1
R3- 1 M z R3 = R3- 1 P - 1 Mz PR3 = ðPR3 Þ Mz ðPR3 Þ = 0 0 0 = Mz:
0 0 1
834 20 Theory of Continuous Groups
3 1 1 1
Y 11 Y 1- 1 Y 01 = - p ðx þ iyÞ p ðx - iyÞ z , ð20:25Þ
4π r 2 2
where r is a radial coordinate (i.e., a positive number) that appears in the polar
coordinate (r, θ). Note that the factor r contained in Y 11 Y 1- 1 Y 01 is invariant
with respect to the rotation.
We can generalize the results and conform them to the discussion of related
characteristics in representation spaces of higher dimensions. The spherical surface
harmonics possess an important position for this (vide infra).
From Sect. 3.5, (3.150) we have
M z Y 00 = 0,
where Y 00 ðθ, ϕÞ = 1=4π. The above relation obviously shows that Y 00 ðθ, ϕÞ belongs
to an eigenvalue of zero of Mz. This is consistent with the earlier comments made on
the one-dimensional real space.
0 0 0 0 0 i
Mx = 0 0 -i and My = 0 0 0 :
0 i 0 -i 0 0
0 -1 0 0 0 0
Az = 1 0 0 , Ax = 0 0 -1 ,
0 0 0 0 1 0
0 0 1
Ay = 0 0 0 , ð20:26Þ
-1 0 0
1 1 1 1
expðθAz Þ = E þ θAz þ ðθAz Þ2 þ ðθAz Þ3 þ ðθAz Þ4 þ ðθAz Þ5
2! 3! 4! 5!
þ ⋯: ð20:27Þ
We note that
-1 0 0 1 0 0
A2z = 0 -1 0 = A6z , A4z = 0 1 0 = A8z etc:;
0 0 0 0 0 0
0 1 0 0 -1 0
A3z = -1 0 0 = A7z , A5z = 1 0 0 = A9z :
0 0 0 0 0 0
Thus, we get
836 20 Theory of Continuous Groups
expðθAz Þ
1 1 1 1
= E þ ðθAz Þ2 þ ðθAz Þ4 þ ⋯ þ θAz þ ðθAz Þ3 þ ðθAz Þ5 þ ⋯
2! 4! 3! 5!
1 1
1 - θ2 þ θ4 - ⋯ 0 0
2! 4!
= 1 1
0 1 - θ 2 þ θ4 - ⋯ 0
2! 4!
0 0 1
1 1
0 - θ þ θ3 - θ5 þ ⋯ 0
3! 5!
þ 1 1 ð20:28Þ
θ - θ 3 þ θ5 þ ⋯ 0 0
3! 5!
0 0 0
cos θ 0 0 0 - sin θ 0
= 0 cos θ 0 þ sin θ 0 0
0 0 1 0 0 0
cos θ - sin θ 0
= sin θ cos θ 0 :
0 0 1
1 0 0
expðφAx Þ = 0 cos φ - sin φ , ð20:29Þ
0 sin φ cos φ
cos ϕ 0 sin ϕ
exp ϕAy = 0 1 0 : ð20:30Þ
- sin ϕ 0 cos ϕ
Of the above operators, using (20.28) and (20.30) we recover (17.101) related to
the Euler angles such that
Although (20.31) is useful for the matrix representation in the real space, its
extension to abstract representation spaces would somewhat be limited. In this
respect, the method developed in SU(2) can widely be used to produce representa-
tion matrices for a representation space of an arbitrary dimension. Let us start with
constructing those representation matrices using (2, 2) complex matrices.
In Sect. 16.1 we mentioned a general linear group GL(n, ℂ). There are many sub-
groups of GL(n, ℂ). Among them, SU(n), in particular SU(2), plays a central role in
theoretical physics and chemistry. In this section we examine how to construct
SU(2) matrices.
Definition 20.1 A group consisting of (n, n) unitary matrices with a determinant 1 is
called a special unitary group of degree n and denoted by SU(n).
In Chap. 14 we mentioned that the “absolute value” of determinant of a unitary
matrix is 1. In Definition 20.1, however, there is an additional constraint in that we
are dealing with unitary matrices whose determinant is 1.
The SU(2) matrices U have the following general form:
a b
U= , ð20:32Þ
c d
ac þ bd
2
a b a c aj2 þ b
=
c d b d a c þ b d cj2 þ d
2
2
ð20:33Þ
a c a b aj2 þ c a b þ c d 1 0
= = = :
b d c d ab þ cd bj2 þ d
2 0 1
Then we have
Of the above conditions, (vii) comes from detU = 1. Condition (iv) can be
eliminated by conditions (i)-(iii). From (v) and (vii), using Cramer’s rule we have
838 20 Theory of Continuous Groups
0 b a 0
1 a - b -b 1 a
c= = = - b , d= =
a b jaj2 þjbj2 a b jaj2 þjbj2
-b a -b a
= a : ð20:34Þ
From (20.34) we see that (i)–(iv) are equivalent and that (v) and (vi) are redun-
dant. Thus, as a general form of U we get
a b
U= with aj2 þ bj2 = 1: ð20:35Þ
- b a
we have
p2 þ q2 þ r 2 þ s2 = 1: ð20:36Þ
a -b
U - 1 = U{ = :
b a
Also putting
a1 b1 a2 b2
U1 = and U2 = , ð20:37Þ
- b1 a1 - b2 a2
we get
a1 a2 - b1 b2 a1 b2 þ b1 a2
U1U2 = : ð20:38Þ
- a2 b1 - a1 b2 - b1 b2 þ a1 a2
Thus, both U-1 and U1U2 satisfy criteria (20.35) and, hence, the unitary matrices
having a matrix form of (20.35) constitute a group; i.e., SU(2).
20.2 Rotation Groups: SU(2) and SO(3) 839
Next we wish to connect the SU(2) matrices to the matrices that represent the
angular momentum. In Sect. 3.4 we developed the theory of generalized angular
momentum. In the case of j = 1/2 we can obtain accurate information about the spin
states. Using (3.101) as a starting point and defining the state belonging to the spin
1 0
angular momentum μ = - 1/2 and μ = 1/2 as and , respectively, we
0 1
obtain
0 0 0 1
J ð þÞ = and J ð- Þ = : ð20:39Þ
1 0 0 0
1 0
Note that andare in accordance with (2.64) as a column vector
0 1
representation. Using (3.72), we have
1 0 1 1 0 i 1 -1 0
Jx = , Jy = , Jz = : ð20:40Þ
2 1 0 2 -i 0 2 0 1
1 1
In (20.40) Jz was given so that we may have J z =- 1
2 and
0 0
0 0
Jz = 1
2 . Deleting the coefficient 12 from (20.40), we get Pauli spin matrices
1 1
such that
0 1 0 i -1 0
σx = , σy = , σz = : ð20:41Þ
1 0 -i 0 0 1
1 0 -i 1 0 1 1 i 0
ζx = , ζy = , ζz = : ð20:42Þ
2 -i 0 2 -1 0 2 0 -i
expðtAÞ
α α
α α cos þ i sin 0
expðαζ z Þ = E cos - iσ z sin = 2 2
α α
2 2 0 cos - i sin ð20:43Þ
2 2
iα=2
e 0
= :
0 e - iα=2
Readers are recommended to derive this. Using ζ y and applying similar calcula-
tion procedures to the above, we also get
β β
cos sin
exp βζ y = 2 2 : ð20:44Þ
β β
- sin cos
2 2
The representation matrices describe “complex rotations” and (20.43) and (20.44)
ð1=2Þ
correspond to (20.28) and (20.30), respectively. We define Dα,β,γ as the representa-
tion matrix that describes the successive rotations α, β, and γ and corresponds to R~3
in (17.101) and (20.31) including Euler angles. As a result, we have
ð1=2Þ
Dα,β,γ = expðαζz Þ exp βζ y expðγζ z Þ
β β
e2ðαþγÞ cos e2ðα - γÞ sin
i i
= 2 2 , ð20:45Þ
β β
- e2ðγ - αÞ sin e - 2ðαþγÞ cos
i i
2 2
ð1=2Þ
where the superscript 1/2 of Dα,β,γ corresponds to the non-negative generalized
angular momentum j (=1/2). Equation (20.45) can be obtained by putting a =
e2ðαþγÞ cos β2 and b = e2ðα - γÞ sin β2 in the matrix U of (20.35). Using this general
i i
Suppose that we have two functions (or vectors) v and u. We regard v and u as
linearly independent vectors that undergo a unitary transformation by a unitary
matrix U of (20.35). That is, we have
a b
v ′ u ′ = ðv uÞU = ðv uÞ , ð20:46Þ
- b a
v0 = av - b u, u0 = bv þ a u:
u jþm v j - m
fm = ðm = - j; - j þ 1; ⋯; j - 1; jÞ, ð20:47Þ
ðj þ mÞ!ðj - mÞ!
ðjÞ
Rða; bÞðf m Þ f m ′ Dm ′ m ða; bÞ, ð20:48Þ
m′
ðjÞ
where Dm0 m ða, bÞ denotes matrix elements with respect to the transformation.
Replacing u and v with u′ and v′, respectively, and using the binomial theorem we get
1
Rða, bÞf m ¼ ðbv þ a uÞjþm ðav - b uÞj - m
ðj þ mÞ!ðj - mÞ!
1
¼ μ,ν
ðbv þ a uÞjþm ðav - b uÞj - m
ðj þ mÞ!ðj - mÞ!
1 ðjþmÞ!ðj-mÞ!
¼ ða uÞjþm-μ ðbvÞμ ð-b uÞj-m-ν ðavÞν
μ,ν ðjþmÞ!ðj-mÞ! ðjþm-μÞ!μ!ðj-m-νÞ!ν!
ðj þ mÞ!ðj - mÞ!
¼ ða Þjþm - μ bμ ð- b Þj - m - ν aν u2j - μ - ν vμþν :
ðj þ m - μÞ!μ!ðj - m - νÞ!ν!
μ,ν
ð20:49Þ
Here, to eliminate ν, putting
2j - μ - ν = j þ m0 , μ þ ν = j - m0 ,
we get
ð j þ mÞ!ð j - mÞ!
ℜða, bÞf m ¼
μ, m 0
ð j þ m - μ Þ!μ! ðm0 - m þ μÞ!ð j - m0 - μÞ!
0
ða Þ jþm - μ bμ ð- b Þm - mþμ a j - m
0
- μ jþm0 j - m0
u v : ð20:50Þ
Expressing f m0 as
842 20 Theory of Continuous Groups
0 0
u jþm v j - m
f m0 = ðm0 = - j, - j þ 1, ⋯, j - 1, jÞ, ð20:51Þ
ð j þ m0 Þ!ð j - m0 Þ!
we obtain
ℜða,bÞð f m Þ
ð j þ mÞ!ð j-mÞ!ð j þ m0 Þ!ð j-m0 Þ! 0
ða Þ jþm-μ bμ ð-b Þm -mþμ a j-m - μ :
0
= f m0
μ, m 0
ð j þ m-μÞ!μ!ðm0 -m þ μÞ!ð j-m0 -μÞ!
ð20:52Þ
ð jÞ
Dm0 m ða, bÞ
ð j þ mÞ!ð j - mÞ!ð j þ m0 Þ!ð j - m0 Þ! 0
ða Þ jþm - μ bμ ð- b Þm - mþμ aj - m - μ :
0
=
μ ð j þ m - μÞ!μ!ðm0 - m þ μÞ!ð j - m0 - μÞ!
ð20:53Þ
0
ð jÞ ð- 1Þm - mþμ ð j þ mÞ!ð j - mÞ!ð j þ m0 Þ!ð j - m0 Þ!
Dm0 m ðα, β, γ Þ =
μ ð j þ m - μÞ!μ!ðm0 - m þ μÞ!ð j - m0 - μÞ! ð20:54Þ
0 0
- iðαm0 þγmÞ β 2jþm - m - 2μ β m - mþ2μ
× e cos 2 sin 2 :
In (20.54), the notation D( j )(a, b) has been replaced with D( j )(α, β, γ), where α, β,
and γ represent Euler angles that appeared in (17.101). Equation (20.54) is called
Wigner formula [5–7].
ðjÞ
To confirm that Dm0 m is indeed unitary, we carry out the following calculations.
From (20.47) we have
Meanwhile, we have
20.2 Rotation Groups: SU(2) and SO(3) 843
where with the last equality k was replaced with j - m. Comparing the above two
equations, we get
j 2j
1
jf m j2 = juj2 þ jvj2 : ð20:55Þ
m= -j
ð2jÞ!
v0 v
= U{ , ð20:56Þ
u0 u
a b
where we defined U = (i.e., a unitary matrix). Operating (20.56) on
- b a
both sides of (20.46) from the right, we get
v0 v v
ð v0 u0 Þ = ðv uÞUU { = ð v uÞ :
u0 u u
Once representation matrices D( j )(α, β, γ) have been obtained, we are able to get
further useful information. We immediately notice that (20.54) is related to (3.216)
that is explicitly described by trigonometric functions. Putting m = 0 in (20.54) we
have
844 20 Theory of Continuous Groups
ðjÞ ðjÞ
Dm0 0 ðα, β, γ Þ = Dm0 0 ðα, βÞ
0
2j - m0 - 2μ m0 þ2μ
ð- 1Þm þμ j! ðj þ m0 Þ!ðj - m0 Þ! - iαm0 β β
= e cos sin :
μ ðj - μÞ!μ!ðm0 þ μÞ!ðj - m0 - μÞ! 2 2
ð20:58Þ
Note that (20.58) does not depend on the variable γ, because m = 0 leads to
eimγ = ei 0 γ = e0 = 1 in (20.54). Notice also that the event of m = 0 in (20.54)
never happens with the case where j is a half-odd-integer but happens with the case
where j is a half-odd-integer but happens with the case where j is an integer l. In fact,
D(l )(α, β, γ) is a (2l + 1)-degree representation of SO(3). Later in this section, we will
give a full matrix form in the case of l = 1. Replacing m′ with m in (20.58), we get
ðjÞ
Dm0 ðα, β, γ Þ
2j - m - 2μ
ð- 1Þmþμ j! ðj þ mÞ!ðj - mÞ! - iαm β β
mþ2μ
= e cos sin :
μ ðj - μÞ!μ!ðm þ μÞ!ðj - m - μÞ! 2 2
ð20:59Þ
ðlÞ
Dm0 ðϕ, θÞ
2l - 2r - m
ð- 1Þmþr l! ðl þ mÞ!ðl - mÞ! - imϕ θ θ
2rþm
= e cos sin :
r r!ðl - m - r Þ!ðl - r Þ!ðm þ r Þ! 2 2
ð20:60Þ
ðlÞ 4π m
Dm0 ðϕ, θÞ = Y ðθ, ϕÞ: ð20:61Þ
2l þ 1 l
From a general point of view, let us return to a discussion of SU(2) and introduce
a following important theorem in relation to the basis functions of SU(2).
Theorem 20.1 [1] Let {ψ -j, ψ -j + 1, ⋯, ψ j} be a set of (2j + 1) functions. Let D( j )
be a representation of the special unitary group SU(2). Suppose that we have a
following expression for ψ m, k (m = - j, ⋯, j) such that
20.2 Rotation Groups: SU(2) and SO(3) 845
( , , )
II
ðjÞ
ψ m,k ðα, β, γ Þ = Dmk ðα, β, γ Þ , ð20:62Þ
where α, β, and γ are Euler angles that appeared in (17.101). Then, the above
functions are basis functions of the representation of D( j ).
Proof In Fig. 20.2 we show coordinate systems O, I, and II. Let P, Q, R be operators
of transformation (i.e., rotation) between the coordinate system as shown. The
transformation Q is defined as the coordinate transformation that transforms I to
II. Note that their inverse transformations, e.g., Q-1 exist. In. Fig. 20.2, P and R are
specified as P(α0, β0, γ 0) and R(α, β, γ), respectively. Then, we have [1, 2]
Q - 1 R = P: ð20:63Þ
ðjÞ
Qψ m,k ðα, β, γ Þ = ψ m,k Q - 1 ðα, β, γ Þ = ψ m,k ðα0 , β0 , γ 0 Þ = Dmk Q - 1 R ,
where the last equality comes from (20.62) and (20.63); (α, β, γ) and (α0, β0, γ 0) stand
for the transformations R(α, β, γ) and P(α0, β0, γ 0), respectively. The matrix element
ðjÞ
Dmk Q - 1 R can be written as
ðjÞ ðjÞ ðjÞ
Dmk Q - 1 R = Dðmn
jÞ
Q - 1 Dnk ðRÞ = Dðnm
jÞ
ðQÞ Dnk ðRÞ,
n n
846 20 Theory of Continuous Groups
where with the last equality we used the unitarity of the representation matrix D( j ).
Taking complex conjugate of the above expression, we get
ðjÞ
Qψ m,k ðα, β, γ Þ = ψ m,k ðα0 , β0 , γ 0 Þ = Dðnm
jÞ
ðQÞ Dnk ðRÞ
n
ðjÞ
= Dðnm
jÞ
ðQÞ Dnk ðα, β, γ Þ = ψ n,k ðα, β, γ ÞDðnm
jÞ
ðQÞ, ð20:64Þ
n n
where with the last equality we used the supposition (20.62). Equation (20.64)
implies that ψ m, k (m = - j, ⋯, 0, ⋯, j) are basis functions of the representation
of D( j ). This completes the proof.
If we restrict Theorem 20.1 to SO(3), we get an important result. Replacing j with
an integer l and putting k = 0 in (20.62) and (20.64), we get
ðlÞ
ψ m,0 ðα, β, γ Þ = Dm0 ðα, β, γ Þ : ð20:65Þ
This expression shows that the complex conjugates of individual components for
the “center” column vector of the representation matrix give the basis functions of
the representation of D(l ). Removing γ as it is redundant and by the aid of (20.61), we
get
4π m
ψ m,0 ðα, βÞ Y ðβ, αÞ:
2l þ 1 l
Thus, in combination with (20.61), Theorem 20.1 immediately indicates that the
spherical surface harmonics Y m l ðθ, ϕÞ ðm = - l, ⋯, 0, ⋯, lÞ span the representation
space with respect to D(l ). That is, we have
l ðβ, αÞ =
QY m Y nl ðβ, αÞDðnm
lÞ
ðQÞ: ð20:66Þ
n
Q = RP - 1 : ð20:67Þ
l ðβ, αÞ =
Rðα, β, γ ÞY m Y nl ðβ, αÞDðnm
lÞ
ðα, β, γ Þ: ð20:68Þ
n
Thus, we have
l ðβ - β, α - γ - αÞ = Y l ð0, - γ Þ:
Ym m
l ð0, - γ Þ =
Ym Y nl ðβ, αÞDðnm
lÞ
ðα, β, γ Þ: ð20:70Þ
n
2l þ 1 ðlÞ
l ð0, - γ Þ =
Ym
4π
Dm0 ð- γ, 0Þ :
ðlÞ
Dm0 ð- γ, 0Þ = δm0 :
2l þ 1
l ð0, - γ Þ =
Ym
4π 0m
δ : ð20:71Þ
Notice that this expression is consistent with (3.147) and (3.148). Meanwhile,
replacing ϕ and θ in (20.61) with α and β, respectively, multiplying the resulting
ðlÞ
equation by Dmk ðα, β, γ Þ, and further summing over m, we get
where with the second equality we used the unitarity of the matrices Dðnm
lÞ
ðα, β, γ Þ (see
Sect. 20.2.2) and with the last equality we used (20.70) multiplied by the constant
4π
2lþ1. At the same time, we recovered (20.71) comparing the last two sides of
(20.72).
Rewriting (20.48) in a (2j + 1, 2j + 1) matrix form, we get
f - j f - jþ1 ⋯ f j - 1 f j ℜða, bÞ
D - j, - j ⋯ D - j,j
ð20:73Þ
= f - j f - jþ1 ⋯ f j - 1 f j ⋮ ⋱ ⋮ ,
Dj, - j ⋯ Dj,j
ð0Þ
D00 ðα, β, γ Þ = 1:
ð0Þ p
D00 ðϕ, θÞ = 1 = 4π Y 00 ðθ, ϕÞ, i:e:, Y 00 ðθ, ϕÞ = 1=4π:
D - 1, - 1 D - 1,0 D - 1,1
ð1Þ
D = D0, - 1 D0,0 D0,1
D1, - 1 D1,0 D1,1
β 1 iα β
eiðαþγÞ cos 2 p e sin β eiðα - γÞ sin 2 ð20:74Þ
2 2 2
1 1
= - p eiγ sin β cos β p e - iγ sin β :
2 2
β 1 β
eiðγ - αÞ sin 2 - p e - iα sin β e - iðαþγÞ cos 2
2 2 2
1
p e - iα sin β
ð1Þ 2
Dm0 ðϕ; θÞ = cos β ð20:75Þ
1 iα
- p e sin β
2
that is related to the spherical surface harmonics (see Sect. 3.6); also see p. 760,
Table 15.4 of Ref. [4].
Now, (20.25), (20.61), (20.65), and (20.73) provide a clue to relating D(1) to the
rotation matrix of ℝ3, i.e., R~3 of (17.101) that includes Euler angles. The argument is
as follows: As already shown in (20.61) and (20.65), we have related the basis
850 20 Theory of Continuous Groups
~ 1 p1 ðx - iyÞ,
f- f~0 z,
1
f~1 p ð- x - iyÞ, ð20:76Þ
2 2
we have
1 1
p 0 - p
2 2
~ 1 f~0 f~1 = ðx z yÞU = ðx z yÞ
f- , ð20:77Þ
0 1 0
i i
- p - p
2 0 2
1 1
p 0 - p
2 2
U 0 1 0 :
i i
- p - p
2 0 2
~ 1 0 f~0 0 f~1 0 f -
f- ~ 1 f~0 f~1 Dð1Þ :
~ 1 f~0 f~1 Rða, bÞ = f - ð20:78Þ
~ 1 0 f~0 0 f~1 0 U - 1 = f -
ð x0 z 0 y0 Þ = f - ~ 1 f~0 f~1 Dð1Þ U - 1
= ðx z yÞUDð1Þ U - 1 : ð20:79Þ
1 0 0
V= 0 0 1
0 1 0
-1
ð x0 y0 z 0 Þ = ð x y z Þ U - 1 V Dð1Þ U - 1 V: ð20:80Þ
Note that (x′ z′ y′)V = (x′ y′ z′) and (x z y)V = (x y z); that is, V exchanges the order
of z and y in (x z y).
Meanwhile, regarding (x y z) as the basis vectors in ℝ3 we have
ðx0 y0 z0 Þ = ðx y zÞ R , ð20:81Þ
-1 {
R U - 1V Dð1Þ U - 1 V = U - 1 V Dð1Þ U - 1 V: ð20:82Þ
β β
More explicitly, using a = e2ðαþγÞ cos and b = e2ðα - γÞ sin
i i
2 2 as in the case of
(20.45), we describe R as
Schur’s First Lemma of Sect. 18.3. Thus, D(1) is certainly a representation matrix of
SO(3). The trace of D(1) and R~3 is given by
Let us continue the discussion still further. We think of the quantity (x/r y/r z/r),
where r is radial coordinate of (20.25). Operating (20.82) on (x/r y/r z/r) from the
right, we have
For the confirmation of (20.85), use the spherical coordinate representation (3.17)
to denote x, y, and z in the above equation. This seems to be merely a confirmation of
the unitarity of representation matrices; see (20.72). Nonetheless, we emphasize that
the simple relation (20.85) holds only with a special case where the parameters α and
β in D(1) (or R ) are identical with the angular components of spherical coordinates
associated with the Cartesian coordinates x, y, and z. Equation (20.85), however,
does not hold in a general case where the parameters α and β are different from those
angular components. In other words, (20.68) that describes the special case where
Q R(α, β, γ) leads to (20.85); see Fig. 20.2. Equation (20.66), however, is used for
the general case where Q represents an arbitrary transformation in ℝ3.
Regarding further detailed discussion on the topics, readers are referred to the
literature [1, 2, 5, 6].
20.2 Rotation Groups: SU(2) and SO(3) 853
m0 - m þ 2μ = 0: ð20:86Þ
ðjÞ 0
Dm0 m ðα, 0, γ Þ = δm0 m e - iðαm þγmÞ : ð20:87Þ
ðjÞ
Djm ðα, β, γ Þ
jþm j-m
ð2jÞ! β β
= ð- 1Þj - m e - iðαjþγmÞ cos sin : ð20:88Þ
ðj þ mÞ!ðj - mÞ! 2 2
854 20 Theory of Continuous Groups
Meanwhile, we obtain
ðjÞ
Djm ðα, β, γ Þ aj - am = 0: ð20:91Þ
ðjÞ
Considering that the matrix elements Djm ðα, β, γ Þ in (20.88) do not vanish in a
general case for α, β, γ, from (20.91) we must have aj - am = 0 for m that satisfies
1 ≤ m ≤ j - 1. When m = j, we have a trivial equation 0 = 0. This implies that aj
cannot be determined. Thus, putting am = aj a ≠ 0 (1 ≤ m ≤ j), we have
Am0 m = aδm0 m ,
In this context, Theorem 18.2 (Sect. 18.2) as well as (18.38) teaches us that the
unitary representation D(g) is completely reducible so that we can describe it as a
direct sum such that
where g is any group element. Then, we can choose a following matrix A for a linear
transformation that is commutative with D(g):
where α(1), ⋯, α(ω) are complex constants that may be different from one another;
E1, ⋯, Eω are unit matrices having the same dimension as D(1)(g), ⋯, D(ω)(g),
respectively. Then, even though A is not a type of A = αδm0 m , A commutes with D(g)
for any g.
Here we rephrase Schur’s Second Lemma as the following theorem.
Theorem 20.2 Let D be a unitary representation of a group G. Suppose that with
8
g 2 G we have
ðlÞ
DðlÞ ðα, β, γ Þ DðlÞ R~3 = DðαlÞ ðRzα ÞDβ R0y0 β DðγlÞ R ′ ′ z00 γ : ð20:95Þ
we have
where with each representation two out of three Euler angles are zero.
In light of (20.94), we estimate each factor of (20.96). By the same token as
before, putting β = γ = 0 in (20.94) let us examine on what conditions
m0 - mþ2μ
sin 02 survives. For this factor to survive, we must have m′ -
m + 2μ = 0 as in (20.87). Then, following the argument as before, we should have
μ = 0 and m′ = m. Thus, D(l )(α, 0, 0) must be a diagonal matrix. This is the case with
D(l )(0, 0, γ) as well.
Equation (3.216) implies that the functions Y m l ðβ, αÞ ðm = - l, ⋯, 0, ⋯, lÞ are
eigenfunctions with regard to the rotation about the z-axis. In other words, we have
- imα m
l ðθ, ϕÞ = Y l ðθ, ϕ - αÞ = e
Rzα Y m Y l ðθ, ϕÞ:
m
l ðθ, ϕÞ = F ðθ Þe
Ym imϕ
,
imðϕ - αÞ
l ðθ, ϕ - αÞ = F ðθ Þe
Ym = e - imα F ðθÞeimϕ = e - imα Y m
l ðθ, ϕÞ:
Y l- l Y l- lþ1 ⋯Y 0l ⋯Y ll Rzα
e - ið- lÞα
e - ið- lþ1Þα
= Y l- l Y l- lþ1 ⋯ Y 0l ⋯Y ll
⋱
e - ilα ð20:97Þ
ilα
e
eiðl - 1Þα
= Y l- l Y l- lþ1 ⋯Y 0l ⋯Y ll :
⋱
e - ilα
From (20.96), with the (m′, m) matrix elements of D(l )(α, β, γ) we get
DðlÞ ðα, β, γ Þ
m0 m
0
= e - im α DðlÞ ð0, β, 0Þ e - imγ :
m0 m
DðlÞ ð0, β, 0Þ
m0 m
0
2lþm - m0 - 2μ m0 - mþ2μ
ð- 1Þm - mþμ ðl þ mÞ!ðl - mÞ!ðl þ m0 Þ!ðl - m0 Þ! β β
¼ cos sin :
μ ðl þ m - μÞ!μ!ðm0 - m þ μÞ!ðl - m0 - μÞ! 2 2
ð20:100Þ
Thus, we find that [DðlÞ ðα, β, γ Þm0 m of (20.99) has been factorized into three
factors. Equation (20.94) has such an implication. The same characteristic is shared
with the SU(2) representation matrices (20.54) more generally. This can readily be
understood from the aforementioned discussion.
Taking D(1) of (20.74) as an example, we have
858 20 Theory of Continuous Groups
ADðlÞ ðα, 0, 0Þ m0 m
= am0 k DðlÞ ðα, 0, 0Þ km
= am0 k e - imα δkm
k k ð20:101Þ
- imα
= am 0 m e ,
0
DðlÞ ðα, 0, 0ÞA = DðlÞ ðα, 0, 0Þ akm = am0 m e - im α : ð20:102Þ
m0 m m0 k
k
0
am0 m e - imα - e - im α = 0:
A = am δm0 m , ð20:103Þ
l ðβ, αÞ =
QY m Y nl ðβ, αÞDðnm
lÞ
ðQÞ: ð20:66Þ
n
l ðβ, 0Þ =
Rð0, θ, 0ÞY m Y nl ðβ, 0ÞDðnm
lÞ
ð0, θ, 0Þ: ð20:104Þ
n
-1
l ðβ, 0Þ = Y l Rð0, θ, 0Þ
Rð0, θ, 0ÞY m m
ðβ, 0Þ = Y m
l ½Rð0, - θ, 0Þðβ, 0Þ
= Y l ðβ - θ, 0Þ:
m
ð20:105Þ
l ðβ - θ, 0Þ =
Ym Y nl ðβ, 0ÞDðnm
lÞ
ð0, θ, 0Þ: ð20:106Þ
n
l ð- θ, 0Þ =
Ym Y nl ð0, 0ÞDðnm
lÞ
ð0, θ, 0Þ: ð20:107Þ
n
Taking account of (20.71), RHS does not vanish in (20.107) only when n = 0. On
this condition, we obtain
ðlÞ
l ð- θ, 0Þ = Y l ð0, 0ÞD0m ð0, θ, 0Þ:
Ym ð20:108Þ
0
2l þ 1
Y 0l ð0, 0Þ = : ð20:109Þ
4π
2l þ 1 ðlÞ
l ð- θ, 0Þ =
Ym D ð0, θ, 0Þ:
4π 0m
ð20:110Þ
ðlÞ 4π m
D0m ð0, - θ, 0Þ = Y ðθ, 0Þ: ð20:111Þ
2l þ 1 l
Compare (20.111) with (20.61) and confirm (20.111) using (20.74) with l = 1.
Viewing (20.111) as a “row vector,” (20.111) with l = 1 can be expressed as
4π - 1 1
= Y ðθ, 0Þ Y 01 ðθ, 0Þ Y 11 ðθ, 0Þ = p sin θ cos θ - p1 sin θ :
3 1 2 2
and
where we assume A = am δm0 m in (20.103). Since from (20.108) [D(l )(0, θ, 0)]0m does
not vanish other than specific values of θ, for A and D(l )(0, θ, 0) to commute we must
have
a m = a0
A = a0 δ m 0 m , ð20:112Þ
where a0 is an arbitrary complex number. Thus, a matrix that commutes with both
D(l )(α, 0, 0) and D(l )(0, β, 0) must be of a form of (20.112). The same discussion
holds with D(l )(α, β, γ) of (20.94) as a whole with regard to the commutativity. On
the basis of Schur’s Second Lemma (Sect. 18.3), we conclude that the representation
D(l )(α, β, γ) is again irreducible. This is one of the prominent features of SO(3) as
well as SU(2).
As can be seen clearly in (20.97), the dimension of representation matrix
D(l )(α, β, γ) is 2l + 1. Suppose that there is another representation matrix
0 0
Dðl Þ ðα, β, γ Þ ðl0 ≠ lÞ. Then, D(l )(α, β, γ) and Dðl Þ ðα, β, γ Þ are inequivalent (see Sect.
18.3). Meanwhile, the unitary representation of SO(3) is completely reducible and,
hence, any reducible unitary representation D can be described by
20.2 Rotation Groups: SU(2) and SO(3) 861
where al is zero or a positive integer. If al ≥ 2, this means that the same represen-
tations D(l )(α, β, γ) repeatedly appear. Equation (20.113) provides a tangible exam-
ple of (20.92) expressed as
and applies to the representation matrices of SU(2) more generally. We will encoun-
ter related discussion in Sect. 20.3 in connection with the direct-product
representation.
As already discussed in Sect. 17.4.2, we need three angles α, β, and γ to specify the
rotation in ℝ3. Their domains are usually taken as follows:
0 ≤ α ≤ 2π, 0 ≤ β ≤ π, 0 ≤ γ ≤ 2π:
u1
u= u2 :
u3
Then, we have
A u = 1u = u: ð20:114Þ
AT A u = Eu = u = AT u, ð20:115Þ
where we used the property of an orthogonal matrix; i.e., ATA = E. From (20.114)
and (20.115), we have
A - AT u = 0: ð20:116Þ
That is,
u1
If we normalize u, i.e., u12 + u22 + u32 = 1, u2 gives direction cosines of the
u3
eigenvector, namely the rotation axis. Equation (20.119) applies to any orthogonal
matrix. A simple example is (20.3); a more complicated example can be seen in
(17.107). Confirmation is left for readers. We use this property soon below.
Let us consider infinitesimal transformations of rotation. As already implied in
(20.6), an infinitesimal rotation θ around the z-axis is described by
1 -θ 0
Rzθ = θ 1 0 : ð20:120Þ
0 0 1
y
O
1 0 0 1 0 η
Rxξ = 0 1 -ξ , Ryη = 0 1 0 ,
0 ξ 1 -η 0 1
1 -ζ 0
Rzζ = ζ 1 0 : ð20:121Þ
0 0 1
These operators represent infinitesimal rotations ξ, η, and ζ around the x-, y-, and
z-axes, respectively (see Fig. 20.3). Note that these operators commute one another
to the first order of infinitesimal quantities ξ, η, and ζ. That is, we have
Rxξ Ryη = Ryη Rxξ , Ryη Rzζ = Rzζ Ryη , Rxξ Ryη Rzζ = Rzζ Ryη Rxξ , etc:
with, e.g.,
1 -ζ η
Rxξ Ryη Rzζ = ζ 1 -ξ ð20:122Þ
-η ξ 1
to the first order. Notice that the order of operations Rxξ, Ryη, and Rzζ is disregarded.
Next, let us consider a successive transformation with a finite rotation angle ω that
follows the infinitesimal rotations. Readers may well ask why and for what purpose
we need to make such elaborate calculations. It is partly because when we were
dealing with finite groups in the previous chapters, group elements were finite and
864 20 Theory of Continuous Groups
1 -ζ η cos ω – sin ω 0
R = RΔ Rω = ζ 1 -ξ sin ω cos ω 0
-η ξ 1 0 0 1
ð20:123Þ
cos ω - ζ sin ω - sin ω - ζ cos ω η
= sin ω þ ζ cos ω cos ω - ζ sin ω - ξ :
ξ sin ω - η cos ω ξ cos ω þ η sin ω 1
Hence, we have
Since (20.124) gives a relative directional ratio of the rotation axis, we should
normalize a vector whose components are given by (20.124) to get direction cosines
of the rotation axis. Note that the direction of the rotation axis related to R of
(20.123) should be close to Rω and that only R12 - R21 in (20.124) has a term
(i.e., 2 sin ω) lacking infinitesimal quantities ξ, η, ζ. Therefore, it suffices to
normalize R12 - R21 to seek the direction cosines to the first order. Hence, dividing
(20.124) by 2 sin ω, as components of the direction cosines we get
Meanwhile, combining (17.78) and (20.123), we can find a rotation angle ω′ for
R in (20.123). The trace χ of (20.123) is written as
χ = 1 þ 2 cos ω0 : ð20:127Þ
1 02 1
1- ω ≈ 1 - ω2 - ζω: ð20:129Þ
2 2
Hence, we obtain
ω0 ≈ ω þ ζ: ð20:130Þ
Combining (20.125) and (20.130), we get their product to the first order such that
Since these quantities are products of individual direction cosines and the rotation
angle ω + ζ, we introduce variables ~x, ~y, and ~z as the x-, y-, and z-related quantities,
respectively. Namely, we have
where we used formulae of trigonometric functions with the last equality. Note that
(20.133) does not depend on the ± signs of ω. This is because J represents the
relative volume density ratio between two parameter spaces of the ~x~y~z -system and
ξηζ-system and this ratio is solely determined by the modulus of rotation angle.
Taking ω → 0, we have
0
ω2 ð ω2 Þ 1 2ω
lim J = lim ω = ωlim 0 = lim
ω→0 ω→0
4 sin 2 → 0 sin 2 ω 4 ω→0 ω 1 ω
2 2 2 sin cos
2 2 2
0
ω ð ωÞ 1
= lim = lim 0 = lim = 1:
ω → 0 sin ω ω → 0 ð sin ωÞ ω → 0 cos ω
Thus, the relative volume density ratio in the limit of ω → 0 is 1, as expected. Let
dV = d~xd~yd~z and dΠ = dξdηdζ be volume elements of each coordinate system. Then
we have
dV = JdΠ: ð20:134Þ
We assume that
dV ρξηζ
J= = ,
dΠ ρ~x~y~z
where ρ~x~y~z and ρξηζ are a “number density” of group elements in each coordinate
system. We suppose that the infinitesimal rotations in the ξηζ-coordinate are
converted by a finite rotation Rω of (20.123) into the ~x~y~z -coordinate. In this way,
J can be viewed as an “expansion” coefficient as a function of rotation angle jωj.
Hence, we have
20.2 Rotation Groups: SU(2) and SO(3) 867
ρξηζ
dV = dΠ or ρ~x~y~z dV = ρξηζ dΠ: ð20:135Þ
ρ~x~y~z
Equation (20.135) implies that total (infinite) number of group elements that are
contained in the group SO(3) is invariant with respect to the transformation.
Let us calculate the total volume of SO(3). This can be measured such that
1
dΠ = dV: ð20:136Þ
J
π
π 2π
4 sin 2 ω~2 2
Π½SOð3Þ = dΠ = ~
dω dθ dϕ ~ sin θ = 8π 2 ,
ω ð20:137Þ
0 0 ~2
ω
0
where we denote the total volume of SO(3) by Π[SO(3)]. Note that ω in (20.133) is
replaced with ω ~ j ω j so that the radial coordinate can be positive. As already
noted, (20.133) does not depend on the ± signs of ω. In other words, among the
angles α, β, and ω (usually γ is used instead of ω; see Sect. 17.4.2), ω is taken as -
π ≤ ω < π (instead of 0 ≤ ω < 2π) so that it can be conformed to the radial
coordinate.
We utilize (20.137) for calculation of various functions. Let f(ϕ, θ, ω) be an
arbitrary function on a parameter space. We can readily estimate a “mean value”
of f(ϕ, θ, ω) on that space. That is, we have
f ðϕ, θ, ωÞdΠ
f ðϕ, θ, ωÞ , ð20:138Þ
dΠ
π
π 2π
~
ω
f ðϕ, θ, ωÞ = 4 ~
dω dθ dϕf ðϕ, θ, ωÞ sin 2 sin θ = dΠ: ð20:139Þ
0 0 2
0
To evaluate (20.139), let us get back to the irreducible representations D(l )(α, β, γ) of
Sect. 20.2.3. Also, we recall how successive coordinate transformations produce
changes in the orthogonal matrices of transformation (Sect. 17.4.2). Since the
parameters α, β, and γ uniquely specify individual rotation, different sets of α, β,
and γ (in this section we use ω instead of γ) cause different irreducible representa-
tions. A corresponding rotation matrix is given by (17.101). In fact, the said matrix is
a representation of the rotation. Keeping these points in mind, we discuss irreducible
representations and their characters particularly in relation to the orthogonality of the
irreducible characters.
Figure 20.4 depicts a geometrical arrangement of two rotation axes accompanied
by the same rotation angle ω. Let Rω and R0ω be such two rotations around the
rotation axis A and A′, respectively. Let Q be another rotation that transforms the
rotation axis A to A′. Then we have [1, 2]
This implies that Rω and R0ω belong to the same conjugacy class. To consider the
situation more clearly, let us view the two rotations Rω and R0ω from two different
coordinate systems, say some xyz-coordinate system and another x′y′z′-coordinate
system. Suppose also that A coincides with the z-axis and that A′ coincides with the
z′-axis. Meanwhile, if we describe Rω in reference to the xyz-system and R0ω in
reference to the x′y′z′-system, the representation of these rotations must be identical
in reference to the individual coordinate systems. Let this representation matrix be
Rω .
(i) If we describe Rω and R0ω with respect to the xyz-system, we have
-1
Rω = Rω , R0ω = Q - 1 Rω Q - 1 = QRω Q - 1 : ð20:141Þ
Namely, we reproduce
R0ω = Rω , Rω = Q - 1 Rω Q: ð20:142Þ
Hence, again we reproduce (20.140). That is, the relation (20.140) is independent
of specific choices of the coordinate system.
Taking the representation indexed by l of Sect. 20.2.4 with respect to (20.140), we
have
where with the last equality we used (18.6). Operating D(l )(Q) on (20.143) from the
right, we get
Since both DðlÞ R0ω and D(l )(Rω) are irreducible, from (18.41) of Schur’s First
Lemma DðlÞ R0ω and D(l )(Rω) are equivalent. An infinite number of rotations
determined by the orientation of the rotation axis A that is accompanied by an
azimuthal angle α (0 ≤ α < 2π) and a zenithal angle β (0 ≤ β < π) (see
Fig. 17.18) form a conjugacy class with each specific ω shared. Thus, we have
classified various representations D(l )(Rω) according to ω and l. Meanwhile, we may
identify D(l )(Rω) with D(l )(α, β, γ) of Sect. 20.2.4.
In Sect. 20.2.2 we know that the spherical surface harmonics
l θ, ϕÞ ðm = - l, ⋯, 0, ⋯, lÞ constitute basis functions of D
ð (l )
Ym and span the repre-
sentation space. A dimension of the matrix (or the representation space) is 2l + 1.
0
Therefore, D(l ) and Dðl Þ for different l and l′ have a different dimension. From
0
(18.41) of Schur’s First Lemma, such D(l ) and Dðl Þ are inequivalent.
Returning to (20.139), let us evaluate a mean value of f ðϕ, θ, ωÞ. Let the trace of
DðlÞ R0ω and D(l )(Rω) be χ ðlÞ R0ω and χ (l )(Rω), respectively. Remembering (12.13),
the trace is invariant under a similarity transformation, and so χ ðlÞ R0ω = χ ðlÞ ðRω Þ.
Then, we put
Consequently, it suffices to evaluate the trace χ (l )(ω) using D(l )(α, 0, 0) of (20.96)
whose representation matrix is given by (20.97). We have
870 20 Theory of Continuous Groups
l
e - ilω 1 - eiωð2lþ1Þ
χ ðlÞ ðωÞ = eimω =
m= -l
1 - eiω
- ilω
e ð1 - e - iω Þ 1 - eiωð2lþ1Þ
= , ð20:146Þ
ð1 - eiω Þð1 - e - iω Þ
where
½numerator of ð20:146Þ
= e - ilω þ eilω - eiωðlþ1Þ - e - iωðlþ1Þ = 2½cos lω - cosðl þ 1Þω
1 ω
= 4 sin l þ ω sin
2 2
and
ω
½denominator of ð20:146Þ = 2ð1 - cos ωÞ = 4 sin 2 :
2
Thus, we get
sin l þ 12 ω
χ ðlÞ ðωÞ = : ð20:147Þ
sin ω2
1
χ ðαÞ ðgÞ χ ðβÞ ðgÞ = δαβ : ð20:148Þ
n g
π π 2π ω
4 sin 2
f ðα, β, ωÞdΠ = dω dβ dαf ðα, β, ωÞ 2 ω2 sin β
ω2
π
0
π
0
2π
0
ð20:149Þ
ω
= 4 dω dβ dαf ðα, β, ωÞ sin 2
sin β:
2
0 0 0
20.2 Rotation Groups: SU(2) and SO(3) 871
If, moreover, f(α, β, ω) depends only on ω, again as in the case of the character
described by (20.147), the calculation is further simplified such that
π
ω
f ðωÞdΠ = 16π dωf ðωÞ sin 2 : ð20:150Þ
2
0
0
Replacing f(ω) with χ ðl Þ ðωÞ χ ðlÞ ðωÞ, we have
π
0 0 ω
χ ðl Þ ðωÞ χ ðlÞ ðωÞdΠ = 16π dω χ ðl Þ ðωÞ χ ðlÞ ðωÞ sin 2
2
0
π
sin l0 þ 12 ω sin l þ 12 ω ω
= 16π dω sin 2
sin ω2 sin ω2 2
0
π
1 1
= 16π dω sin l0 þ ω sin l þ ω
2 2
0
π
0
χ ðl Þ ðωÞ χ ðlÞ ðωÞdΠ = 8π 2 δl0 l : ð20:152Þ
0
Finally, with χ ðl Þ ðωÞ χ ðlÞ ðωÞ defined as in (20.139), we get
0
χ ðl Þ ðωÞ χ ðlÞ ðωÞ = δl0 l : ð20:153Þ
π
ðlÞ
ω
χ ðωÞ KðωÞdΠ = 16π dω χ ðlÞ ðωÞ KðωÞ sin 2 : ð20:154Þ
2
0
π
ðlþ1Þ
ω
χ ðωÞ KðωÞdΠ = 16π dω χ ðlþ1Þ ðωÞ KðωÞ sin 2 , ð20:155Þ
2
0
which must vanish as well. Subtracting RHS of (20.154) from RHS of (20.155) and
taking account of the fact that χ (l )(ω) is real [see (20.147)], we have
  ∫₀^π dω [χ^{(l+1)}(ω) − χ^{(l)}(ω)] K(ω) sin²(ω/2) = 0.  (20.156)

Using the trigonometric formula

  sin a − sin b = 2 cos[(a+b)/2] sin[(a−b)/2]

together with (20.147), we obtain

  2 ∫₀^π dω [cos(l+1)ω] K(ω) sin²(ω/2) = 0  (l = 0, 1, 2, ⋯).  (20.157)
Putting l = 0 in LHS of (20.154) and from the assumption that the integral of
(20.154) vanishes, we also have
  ∫₀^π dω K(ω) sin²(ω/2) = 0,  (20.158)
where we used χ (0)(ω) = 1. Then, (20.157) and (20.158) are combined to give
  ∫₀^π dω (cos lω) K(ω) sin²(ω/2) = 0  (l = 0, 1, 2, ⋯).  (20.159)
This implies that all the Fourier cosine coefficients of K(ω) sin²(ω/2) vanish. Considering that cos lω (l = 0, 1, 2, ⋯) forms a complete orthogonal system in the region [0, π], we must have

  K(ω) sin²(ω/2) ≡ 0.  (20.160)
In the infinite groups, typically the continuous groups, this situation often appears
when we deal with the addition of two angular momenta. Examples include the
addition of two (or more) orbital angular momenta and that of the orbital angular
momentum and spin angular momentum [6, 8].
Meanwhile, there is a set of basis vectors that spans the representation space of each individual irreducible representation. Here we have the following question: What are the basis vectors that span the total reduced representation space described by (18.199)? The answer is that these vectors must be constructed from the basis vectors relevant to the individual irreducible representations. The Clebsch–Gordan coefficients appear as the coefficients of such linear combinations.
Recall from (20.147) that

  χ^{(j)}(ω) = Σ_{m=−j}^{j} e^{imω} = sin[(j + 1/2)ω] / sin(ω/2),  (20.161)

and write

  χ^{(j₁×j₂)}(ω) = χ^{(j₁)}(ω) χ^{(j₂)}(ω),  (20.162)

where χ^{(j₁×j₂)}(ω) gives a character of the direct-product representation; see (18.198).
Furthermore, we have [5]

  χ^{(j₁×j₂)}(ω) = Σ_{m₁=−j₁}^{j₁} Σ_{m₂=−j₂}^{j₂} e^{im₁ω} e^{im₂ω}
               = Σ_{m₁=−j₁}^{j₁} Σ_{m₂=−j₂}^{j₂} e^{i(m₁+m₂)ω}  (20.163)
               = Σ_{J=|j₁−j₂|}^{j₁+j₂} Σ_{M=−J}^{J} e^{iMω} = Σ_{J=|j₁−j₂|}^{j₁+j₂} χ^{(J)}(ω),

that is,

  χ^{(j₁×j₂)}(ω) = χ^{(|j₁−j₂|)}(ω) + χ^{(|j₁−j₂|+1)}(ω) + ⋯ + χ^{(j₁+j₂)}(ω).  (20.165)
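Relation (20.165) is easy to verify numerically at any sample angle ω; the following sketch (our own illustration, not part of the text) multiplies two characters and compares with the sum over J.

```python
import numpy as np

chi = lambda j, w: np.sin((j + 0.5) * w) / np.sin(0.5 * w)   # (20.161), integer or half-integer j

def check_product(j1, j2, w=0.7):
    lhs = chi(j1, w) * chi(j2, w)                     # chi^(j1 x j2)(w), cf. (20.162)
    Js = np.arange(abs(j1 - j2), j1 + j2 + 1)         # J = |j1-j2|, ..., j1+j2
    rhs = sum(chi(J, w) for J in Js)                  # RHS of (20.165)
    assert np.isclose(lhs, rhs), (j1, j2)

for j1, j2 in [(0.5, 0.5), (1, 0.5), (1, 1), (2, 1.5)]:
    check_product(j1, j2)
print("character decomposition (20.165) verified")
```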
Multiplying (20.165) by χ^{(k)}(ω)* and integrating over the group, we have

  ∫ χ^{(k)}(ω)* χ^{(j₁×j₂)}(ω) dΠ = 8π² (δ_{k,|j₁−j₂|} + δ_{k,|j₁−j₂|+1} + ⋯ + δ_{k,j₁+j₂}),  (20.166)

where k is zero or an integer and with the last equality we used (20.152). Dividing both sides by ∫dΠ, we get

  ⟨χ^{(k)}(ω)* χ^{(j₁×j₂)}(ω)⟩ = δ_{k,|j₁−j₂|} + δ_{k,|j₁−j₂|+1} + ⋯ + δ_{k,j₁+j₂},  (20.167)

where we used (20.153). Equation (20.166) implies that ⟨χ^{(k)}(ω)* χ^{(j₁×j₂)}(ω)⟩ does not vanish only if k is identical to one out of |j₁−j₂|, |j₁−j₂|+1, ⋯, j₁+j₂. If, for example, we choose |j₁−j₂| for k in (20.167), we have

  ⟨χ^{(|j₁−j₂|)}(ω)* χ^{(j₁×j₂)}(ω)⟩ = 1.  (20.168)
This relation corresponds to (18.200), where a finite group is relevant. Note that the volume of SO(3), i.e., ∫dΠ = 8π², corresponds to the order n of a finite group in (20.148). Considering (18.199) and (18.200) once again, it follows that in the above case D^{(|j₁−j₂|)} takes place once and only once in the direct-product representation D^{(j₁×j₂)}. Thus, we obtain the following important relation:

  D^{(j₁×j₂)} = D^{(j₁)}(ω) ⊗ D^{(j₂)}(ω) = ⊕_{J=|j₁−j₂|}^{j₁+j₂} D^{(J)}.  (20.169)
Let the basis functions of two irreducible representations μ and ν be

  ψ_j^{(μ)} ( j = 1, ⋯, n_μ)  and  φ_l^{(ν)} (l = 1, ⋯, n_ν).

We know from Sect. 18.8 that n_μ n_ν new basis functions can be constructed using the functions described by ψ_j^{(μ)} φ_l^{(ν)}. Our next task will be to classify these n_μ n_ν functions into those belonging to different irreducible representations. For this, we need the relation (18.199). According to custom, we rewrite it as follows:

  D^{(μ)}(ω) ⊗ D^{(ν)}(ω) = Σ_γ (μνγ) D^{(γ)}(ω),  (20.170)
where ω denotes a group element (i.e., rotation angle); (μνγ) is identical with qγ of
RHS of (18.199) and shows how many times the same representation γ takes place in
the direct-product representations. Naturally, we have
  (μνγ) = (νμγ).

Then, we get

  Σ_γ (μνγ) n_γ = n_μ n_ν.
The new basis functions are expressed as

  Ψ_s^{(γ_{τγ})} = Σ_{j,l} ⟨μj, νl | γ τ_γ s⟩ ψ_j^{(μ)} φ_l^{(ν)},  (20.171)

where τ_γ = 1, ⋯, (μνγ) and s stands for the number that designates the basis functions of the irreducible representation γ, i.e., s = 1, ⋯, n_γ. (We often designate the basis functions with a negative integer. In that case, we choose that negative integer for s; see Examples 20.1 and 20.2 shown later.) If γ takes place more than once, we label γ as γ_{τγ}. The coefficients ⟨μj, νl | γ τ_γ s⟩ with respect to the above linear combination are called the Clebsch–Gordan coefficients. If γ takes place more than once, any linear combination

  Σ_{τγ} c_{τγ} Ψ_s^{(γ_{τγ})}

gives functions
that belong to the irreducible representation γ. This would cause arbitrariness and
ambiguity [5]. Nevertheless, we do not need to get into further discussion about this
problem. From now on, we focus on the Clebsch–Gordan coefficients of SU(2) and SO(3), which are typical simply reducible groups; namely, τ_γ is at most one.
That SU(2) and SO(3) are simply reducible saves us a lot of labor. The argument
for this is as follows: Equation (20.169) shows how the dimension of (2j1 + 1)
(2j2 + 1) for the representation space with the direct product of two representations is
redistributed to individual irreducible representations that take place only once. The
arithmetic based on this fact is as follows:
  (2j₁+1)(2j₂+1)
    = 2(j₁−j₂)(2j₂+1) + 2·(1/2)(2j₂)(2j₂+1) + (2j₂+1)
    = 2(j₁−j₂)(2j₂+1) + 2(0 + 1 + 2 + ⋯ + 2j₂) + (2j₂+1)  (20.172)
    = {2[(j₁−j₂)+0] + 2[(j₁−j₂)+1] + 2[(j₁−j₂)+2] + ⋯ + 2[(j₁−j₂)+2j₂]} + (2j₂+1)
    = [2(j₁−j₂)+1] + {2[(j₁−j₂)+1]+1} + ⋯ + [2(j₁+j₂)+1],

where with the second equality we used the formula 1 + 2 + ⋯ + n = n(n+1)/2; the resulting numbers 0, 1, 2, ⋯, 2j₂ are distributed one each to the 2j₂+1 terms of the subsequent RHS of (20.172); the remaining term 2j₂+1 is likewise distributed, one unit to each of the 2j₂+1 terms, on the rightmost side. Equation (20.172) is symmetric in j₁ and j₂, but if we assume that j₁ ≥ j₂, each term of the rightmost side of (20.172) is positive. Therefore, for convenience we assume j₁ ≥ j₂ without loss of generality.
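The bookkeeping in (20.172) — that the (2j₁+1)(2j₂+1) product functions are exactly redistributed among D^(J), J = |j₁−j₂|, ⋯, j₁+j₂ — is easy to confirm by brute force; the fragment below is a minimal, illustrative check.

```python
from fractions import Fraction

def dims_match(j1, j2):
    j1, j2 = Fraction(j1), Fraction(j2)
    lhs = (2 * j1 + 1) * (2 * j2 + 1)          # dimension of D^(j1) x D^(j2)
    J, rhs = abs(j1 - j2), Fraction(0)
    while J <= j1 + j2:                        # J = |j1-j2|, |j1-j2|+1, ..., j1+j2
        rhs += 2 * J + 1
        J += 1
    return lhs == rhs

assert all(dims_match(a, b) for a in ("1/2", "1", "3/2", "2") for b in ("1/2", "1", "3/2", "2"))
print("(2j1+1)(2j2+1) = sum of (2J+1) confirmed")
```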
To clarify the situation, we list several tables (Tables 20.1, 20.2, and 20.3). Tables 20.1 and 20.2 show specific cases, whereas Table 20.3 represents the general case. In these tables the topmost layer corresponds to the case of J = |j₁ − j₂| and contains 2|j₁ − j₂| + 1 different quantum states. The lower layers correspond to the cases of J = |j₁ − j₂| + 1, ⋯, j₁ + j₂. For each J, 2J + 1 functions (or quantum states) are included and labeled according to the different numbers M ≡ m₁ + m₂ = −J, −J + 1, ⋯, J. The rightmost column displays J = |j₁ − j₂|, |j₁ − j₂| + 1, ⋯, j₁ + j₂ from the topmost row through the bottommost row.
Tables 20.1, 20.2, and 20.3 comprise 2j₂ + 1 layers, in which the total of (2j₁ + 1)(2j₂ + 1) functions that form basis vectors of D^{(j₁)}(ω) ⊗ D^{(j₂)}(ω) are redistributed according to the M number. The same numbers that appear from the topmost row through the bottommost row are marked red and connected with dotted aqua lines. These lines form a parallelogram together with the top and bottom horizontal lines as shown. The upper and lower sides of the parallelogram are 2|j₁ − j₂| in width. The parallelogram gets flattened with decreasing |j₁ − j₂|. If j₁ = j₂, the parallelogram coalesces into a line (Table 20.1). Regarding the notations M and J, see Sects. 20.3.2 and 20.3.3 (vide infra).
Bearing in mind the above remarks, we focus on the coupling of two angular
momenta as a typical example. Within this framework we deal with the Clebsch-
Gordan coefficients with regard to SU(2). We develop the discussion according to
[Tables 20.1–20.3: layouts of the values M = m₁ + m₂ for the coupling of j₁ and j₂ (Table 20.1: j₁ = j₂ = 1; Table 20.2: j₁ = 2, j₂ = 1; Table 20.3: general j₁, j₂). Rows are labeled by m₂, columns by m₁, and the rightmost column lists the resulting J.]
literature [5]. The relevant approach helps address a more complicated situation
where the coupling of three or more angular momenta including j - j coupling and
L - S coupling is responsible [6].
From (20.47) we know that fm forms a basis set that spans a (2j + 1)-dimensional
representation space associated with D( j ) such that
  f_m = u^{j+m} v^{j−m} / √[( j+m)!( j−m)!]  (m = −j, −j+1, ⋯, j−1, j),  (20.47)

  R(a, b)( f_m) = Σ_{m′} f_{m′} D^{(j)}_{m′m}(a, b).  (20.48)
In the previous section we have described how systematically the quantum states are
made up of two angular momenta. We express those states in terms of a linear
combination of the basis functions related to the direct-product representations of
(20.169). To determine those coefficients, we follow calculation procedures due to
Hamermesh [5]. The calculation procedures are rather lengthy, and so we summarize
each item separately.
1. Invariant quantities AJ and BJ:
We start with seeking invariant quantities under the unitary transformation U of
(20.35) expressed as
  U = (  a    b
        −b*   a* )   with |a|² + |b|² = 1.  (20.35)

As in (20.46) let us consider combinations of variables u ≡ (u₂ u₁) and v ≡ (v₂ v₁) that are transformed as in (20.46). In analogy with (20.47), we put

  ψ^{j₁}_{m₁} = u₁^{j₁+m₁} u₂^{j₁−m₁} / √[( j₁+m₁)!( j₁−m₁)!]  (m₁ = −j₁, −j₁+1, ⋯, j₁−1, j₁),

  φ^{j₂}_{m₂} = v₁^{j₂+m₂} v₂^{j₂−m₂} / √[( j₂+m₂)!( j₂−m₂)!]  (m₂ = −j₂, −j₂+1, ⋯, j₂−1, j₂).  (20.176)
  (x₂′ x₁′) = (x₂ x₁) (  a    b
                        −b*   a* ),  (20.177)

i.e.,

  x′ = x m,  (20.178)

where we have

  x′ ≡ (x₂′ x₁′),  x ≡ (x₂ x₁),  m ≡ (  a    b
                                       −b*   a* ).  (20.179)

We also define

  g ≡ ( 0  −1
        1   0 ).  (20.180)
Rewriting (20.181) as

  u′g = (ug)m,  (20.182)

and noting that

  x′^T = (m*)^T x^T = m† x^T,

we get

  u′x′^T = (um)(m† x^T) = u(m m†)x^T = u x^T,  (20.183)

where with the third equality we used the unitarity of m. This implies that ux^T is an invariant quantity under the unitary transformation by m. We likewise have

  v′x′^T = vx^T,  (20.184)

  v′(u′g)^T = v(ug)^T.  (20.185)

Thus, v(ug)^T (or u₁v₂ − u₂v₁) is another invariant under the unitary transformation
by m. In the above discussion we can view (20.183)–(20.185) as a quadratic form of
two variables (see Sect. 14.5).
Using these invariants, we define other important invariants A_J and B_J as follows [5]:

  A_J = (u₁v₂ − u₂v₁)^{j₁+j₂−J} (u₁x₁ + u₂x₂)^{j₁−j₂+J} (v₁x₁ + v₂x₂)^{j₂−j₁+J},  (20.186)

  B_J = (u₁x₁ + u₂x₂)^{2J}.  (20.187)

The polynomial A_J is of degree 2j₁ with respect to the covariant variables u₁ and u₂ and is of degree 2j₂ with respect to v₁ and v₂. The degree of the contravariant variables x₁ and x₂ is 2J. Expanding A_J in powers of x₁ and x₂, we get
  A_J = Σ_{M=−J}^{J} W^J_M X^J_M,  (20.188)

where X^J_M is defined as

  X^J_M ≡ x₁^{J+M} x₂^{J−M} / √[(J+M)!(J−M)!]  (M = −J, −J+1, ⋯, J−1, J)  (20.189)

and the coefficients W^J_M are polynomials in u₁, u₂, v₁, and v₂. The coefficients W^J_M will be determined soon in combination with X^J_M.
Meanwhile, we have
  B_J = Σ_{k=0}^{2J} [(2J)! / (k!(2J−k)!)] (u₁x₁)^k (u₂x₂)^{2J−k}
      = (2J)! Σ_{M=−J}^{J} [1/((J+M)!(J−M)!)] (u₁x₁)^{J+M} (u₂x₂)^{J−M}
      = (2J)! Σ_{M=−J}^{J} [1/((J+M)!(J−M)!)] u₁^{J+M} u₂^{J−M} x₁^{J+M} x₂^{J−M}  (20.190)
      = (2J)! Σ_{M=−J}^{J} Φ^J_M X^J_M,

where

  Φ^J_M ≡ u₁^{J+M} u₂^{J−M} / √[(J+M)!(J−M)!]  (M = −J, −J+1, ⋯, J−1, J).  (20.191)
  (u₁v₂ − u₂v₁)^{j₁+j₂−J} = Σ_{λ=0}^{j₁+j₂−J} C(j₁+j₂−J, λ) (−1)^λ (u₁v₂)^{j₁+j₂−J−λ} (u₂v₁)^λ,

  (u₂x₂ + u₁x₁)^{j₁−j₂+J} = Σ_{μ=0}^{j₁−j₂+J} C(j₁−j₂+J, μ) (u₁x₁)^{j₁−j₂+J−μ} (u₂x₂)^μ,  (20.192)

  (v₂x₂ + v₁x₁)^{j₂−j₁+J} = Σ_{ν=0}^{j₂−j₁+J} C(j₂−j₁+J, ν) (v₁x₁)^{j₂−j₁+J−ν} (v₂x₂)^ν,

where C(n, k) denotes the binomial coefficient.
Therefore, we have
  A_J = Σ_{λ,μ,ν} (−1)^λ C(j₁+j₂−J, λ) C(j₁−j₂+J, μ) C(j₂−j₁+J, ν)
        × u₁^{2j₁−λ−μ} u₂^{λ+μ} v₁^{j₂−j₁+J−ν+λ} v₂^{j₁+j₂−J−λ+ν} x₁^{2J−μ−ν} x₂^{μ+ν}  (20.193)

      = Σ_{λ,μ,ν} (−1)^λ C(j₁+j₂−J, λ) C(j₁−j₂+J, j₁−λ−m₁) C(j₂−j₁+J, j₂−λ+m₂)
        × u₁^{j₁+m₁} u₂^{j₁−m₁} v₁^{j₂+m₂} v₂^{j₂−m₂} x₁^{J+m₁+m₂} x₂^{J−m₁−m₂},  (20.194)

where to derive C(j₂−j₁+J, j₂−λ+m₂) in RHS we used

  C(a, b) = C(a, a−b).  (20.195)
Setting m₁ + m₂ = M, we have

  Λ^J_M = U W^J_M,

where C(J) is a constant that depends only on J with given numbers j₁ and j₂. An implication of (20.199) is that we can get suitably normalized functions Ψ^J_M using arbitrary basis vectors Λ^J_M. Writing the expansion coefficients of Ψ^J_M with respect to the products ψ^{j₁}_{m₁} φ^{j₂}_{m₂} in the form ρ_J C^J_{m₁,m₂}, we obtain (20.202).
3. Normalization of Ψ^J_M:
The normalization condition for Ψ^J_M of (20.201) or (20.202) is given by

  |ρ_J|² Σ_{m₁,m₂; m₁+m₂=M} |C^J_{m₁,m₂}|² = 1.  (20.203)
Here we use the orthogonality relation of the monomials,

  ⟨z₁^l z₂^{m−l} | z₁^k z₂^{m−k}⟩ = √[l!(m−l)!] √[k!(m−k)!] δ_{lk},

i.e.,

  ⟨ z₁^l z₂^{m−l}/√[l!(m−l)!] | z₁^k z₂^{m−k}/√[k!(m−k)!] ⟩ = δ_{lk}.  (20.204)
  λ + m₁ − j₁ ≥ 0,  or  λ ≥ j₁ − m₁.  (20.205)

Similarly,

  λ ≤ j₁ − m₁.  (20.206)

From the above equations, we obtain only one choice for λ such that [5]

  λ = j₁ − m₁.  (20.207)
In this case, we have

  C^J_{m₁,m₂} = (−1)^{j₁−m₁} { (2J)! / [(J−j₂+j₁)!(J−j₁+j₂)!] · C(j₁+m₁, j₂−m₂) C(j₂+m₂, j₁−m₁) }^{1/2}.  (20.208)

To confirm (20.208), calculate C(j₁+m₁, j₂−m₂) and C(j₂+m₂, j₁−m₁) and use, e.g.,

  j₁ − j₂ + m₁ + m₂ = j₁ − j₂ + M = j₁ − j₂ + J.  (20.209)
Thus, we get
  Σ_{m₁,m₂; m₁+m₂=M} |C^J_{m₁,m₂}|²
    = { (2J)! / [(J−j₂+j₁)!(J−j₁+j₂)!] } Σ_{m₁,m₂; m₁+m₂=M} C(j₁+m₁, j₂−m₂) C(j₂+m₂, j₁−m₁).  (20.210)
  C(y−x−1, y) = (−1)^y Γ(x+1) / [y! Γ(x−y+1)] = (−1)^y C(x, y),

where with the last equality we used (20.213). That is, we get

  C(x, y) = (−1)^y C(y−x−1, y).  (20.214)
  Σ_{m₁,m₂; m₁+m₂=M} |C^J_{m₁,m₂}|²
    = { (2J)! / [(J−j₂+j₁)!(J−j₁+j₂)!] }
      × Σ_{m₁,m₂; m₁+m₂=M} (−1)^{j₂−m₂} C(j₂−m₂−j₁−m₁−1, j₂−m₂) (−1)^{j₁−m₁} C(j₁−m₁−j₂−m₂−1, j₁−m₁).  (20.215)
  (1+x)^r (1+x)^s = [Σ_α C(r, α) x^α][Σ_β C(s, β) x^β] = Σ_{α,β} C(r, α) C(s, β) x^{α+β}
                 = (1+x)^{r+s} = Σ_γ C(r+s, γ) x^γ.  (20.216)

Comparing the coefficients of the γ-th power monomials of the last three sides, we get

  C(r+s, γ) = Σ_{α,β; α+β=γ} C(r, α) C(s, β).  (20.217)
Carrying out the sum Σ_{m₁,m₂; m₁+m₂=M} |C^J_{m₁,m₂}|² with the aid of (20.217), we finally obtain

  C(J) = (2J+1) / [(J−j₂+j₁)!(J−j₁+j₂)!( j₁+j₂−J)!( j₁+j₂+J+1)!].  (20.220)
During the course of the above calculations, we encountered, e.g., Γ(−x) and factorials of negative integers. Under ordinary circumstances such quantities must be avoided. Nonetheless, it is convenient in practice to first choose a number −x close to (but not identical with) a negative integer and to take the limit of −x toward that negative integer only after the related calculations have been finished.
Example 20.1 We examine the direct product D^(1/2) ⊗ D^(1/2), i.e., j₁ = j₂ = 1/2, so that we can get the basis functions of the direct-product representation described by

  (u₂v₂  u₁v₂  u₂v₁  u₁v₁).
Notice that we have four functions above; i.e., (2j₁+1)(2j₂+1) = 4 with j₁ = j₂ = 1/2. Using (20.221) we can determine the proper basis functions with respect to Ψ^J_M of (20.202). For example, for Ψ¹₋₁ we have only one choice, m₁ = m₂ = −1/2, to get m₁ + m₂ = M = −1. In (20.221) we have no other choice but to take λ = 0. In this way, we have

  Ψ¹₋₁ = u₂v₂ = ψ^{1/2}_{−1/2} φ^{1/2}_{−1/2}.

Similarly, we get

  Ψ¹₁ = u₁v₁ = ψ^{1/2}_{1/2} φ^{1/2}_{1/2}.
We put the above results in a simple matrix form representing the basis vector transformation such that

  (u₂v₂  u₁v₂  u₁v₁  u₂v₁) ( 1    0     0    0
                             0   1/√2   0   1/√2
                             0    0     1    0
                             0   1/√2   0  −1/√2 ) = (Ψ¹₋₁  Ψ¹₀  Ψ¹₁  Ψ⁰₀).  (20.222)
In this way, (20.222) shows how the basis functions that possess a proper
irreducible representation and span the direct-product representation space are
constructed using the original product functions. The Clebsch-Gordan coefficients
play a role as a linker (i.e., a unitary operator) of those two sets of functions. In this
respect these coefficients resemble those appearing in the symmetry-adapted linear
combinations (SALCs) mentioned in Sects. 18.2 and 19.3.
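The entries of (20.222) are just the Clebsch–Gordan coefficients ⟨½ m₁, ½ m₂ | J M⟩, so they can be cross-checked with a computer-algebra system. The snippet below is an illustrative sketch using SymPy's CG class (our own check, not part of the original text).

```python
from sympy import Rational, sqrt, Matrix
from sympy.physics.quantum.cg import CG

h = Rational(1, 2)
rows = [(-h, -h), (h, -h), (h, h), (-h, h)]   # u2v2, u1v2, u1v1, u2v1  ->  (m1, m2)
cols = [(1, -1), (1, 0), (1, 1), (0, 0)]      # Psi^1_-1, Psi^1_0, Psi^1_1, Psi^0_0  ->  (J, M)

U = Matrix([[CG(h, m1, h, m2, J, M).doit() for (J, M) in cols] for (m1, m2) in rows])
expected = Matrix([[1, 0, 0, 0],
                   [0, 1/sqrt(2), 0, 1/sqrt(2)],
                   [0, 0, 1, 0],
                   [0, 1/sqrt(2), 0, -1/sqrt(2)]])
assert U == expected                          # reproduces (20.222)
assert U.T * U == Matrix.eye(4)               # the coefficient matrix is unitary (real orthogonal)
print("Clebsch-Gordan matrix (20.222) reproduced")
```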
Expressing the unitary transformation according to (20.173) and choosing (Ψ¹₋₁ Ψ¹₀ Ψ¹₁ Ψ⁰₀) as the basis vectors, we describe the transformation in a manner similar to (20.48) such that

  R(a, b)(Ψ¹₋₁ Ψ¹₀ Ψ¹₁ Ψ⁰₀) = (Ψ¹₋₁ Ψ¹₀ Ψ¹₁ Ψ⁰₀) [D^(1/2) ⊗ D^(1/2)].  (20.223)
  D^(1/2) ⊗ D^(1/2) = (  a²        √2 a b        b²       0
                        −√2 a b*   a a* − b b*   √2 a* b   0
                        (b*)²     −√2 a* b*     (a*)²      0
                         0          0            0        1 ).  (20.224)
Notice that (20.224) is a block matrix; namely, (20.224) has been reduced according to the symmetric and antisymmetric representations. Symbolically writing (20.223) and (20.224), we have

  D^(1/2) ⊗ D^(1/2) = D^(1) ⊕ D^(0).

The block matrix of (20.224) is unitary, and so the (3, 3) submatrix and the (1, 1) submatrix (i.e., merely the number 1) are unitary accordingly. To show it, use the conditions (20.35) and (20.173). The confirmation is left to readers. The (3, 3) submatrix is a symmetric representation, whose representation space is spanned by (Ψ¹₋₁ Ψ¹₀ Ψ¹₁). Note that these functions are symmetric under the exchange m₁ ↔ m₂. Or, we may think of these functions as symmetric under the exchange ψ ↔ φ. Though trivial, the basis function Ψ⁰₀ spans the antisymmetric representation space. The corresponding submatrix is merely the number 1. This is directly related to the fact that v(ug)^T (or u₁v₂ − u₂v₁) of (20.185) is an invariant. The function Ψ⁰₀ changes sign (i.e., is antisymmetric) under the exchange m₁ ↔ m₂ or ψ ↔ φ.
Once again, readers might well ask why we need to work out such an elaborate
means to pick up a small pebble. With increasing dimension of the vector space,
however, to seek an appropriate set of proper basis functions becomes increasingly
as difficult as lifting a huge rock. Though simple, a next example gives us a feel for
it. Under such situations, a projection operator is an indispensable tool to address the
problems.
Example 20.2 We examine the direct product D^(1) ⊗ D^(1), where j₁ = j₂ = 1. This is another example of the symmetric and antisymmetric representations. Taking two sets of complex variables u ≡ (u₂ u₁) and v ≡ (v₂ v₁) and rewriting them according to (20.176) again, we have nine, i.e., (2j₁+1)(2j₂+1), product functions expressed as

  ψ¹₋₁φ¹₋₁, ψ¹₀φ¹₋₁, ψ¹₁φ¹₋₁, ψ¹₋₁φ¹₀, ψ¹₀φ¹₀, ψ¹₁φ¹₀, ψ¹₋₁φ¹₁, ψ¹₀φ¹₁, ψ¹₁φ¹₁.  (20.225)
These product functions form the basis functions of the direct-product represen-
tation. Consequently, we will be able to construct proper eigenfunctions by means of
linear combinations of these functions. This method is again related to that based on
SALCs discussed in Chap. 19. In the present case, it is accomplished by finding the
Clebsch-Gordan coefficients described by (20.221).
As implied in Table 20.1, we need to determine the individual coefficients of nine
functions of (20.225) with respect to the following functions that are constituted by
linear combinations of the above nine functions:
(i) With the first five functions, we have J = j₁ + j₂ = 2. In this case, the determination of the coefficients is considerably simplified. Substituting J = j₁ + j₂ into the second factor of the denominator of (20.221), we get (−λ)! as well as λ! in the first factor of the denominator. Therefore, for the arguments of the factorials not to be negative we must have λ = 0. Consequently, we have [5]

  ρ_J = √[ (2j₁)!(2j₂)! / (2J)! ],

  C^J_{m₁,m₂} = { (J+M)!(J−M)! / [( j₁+m₁)!( j₁−m₁)!( j₂+m₂)!( j₂−m₂)!] }^{1/2}.
With Ψ²_{±2} we have only one choice for ρ_J and C^J_{m₁,m₂}. That is, we have m₁ = ±j₁ and m₂ = ±j₂. Then, Ψ²_{±2} = ψ¹_{±1}φ¹_{±1}. In the case of M = −1, we have two choices in (20.221), i.e., m₁ = −1, m₂ = 0 and m₁ = 0, m₂ = −1. Then, we get

  Ψ²₋₁ = (1/√2)(ψ¹₋₁φ¹₀ + ψ¹₀φ¹₋₁).  (20.226)

Similarly, we obtain

  Ψ²₁ = (1/√2)(ψ¹₁φ¹₀ + ψ¹₀φ¹₁),  (20.227)

  Ψ²₀ = (1/√6)(ψ¹₋₁φ¹₁ + 2ψ¹₀φ¹₀ + ψ¹₁φ¹₋₁).  (20.228)
(ii) With J = 1, i.e., Ψ¹₋₁, Ψ¹₀, and Ψ¹₁, we start with (20.221). Noting the λ! and (1−λ)! in the first two factors of the denominator, we have λ = 0 or λ = 1. In the case of M = −1, we have only one choice, m₁ = 0, m₂ = −1, for λ = 0. Similarly, m₁ = −1, m₂ = 0 for λ = 1. Hence, we get

  Ψ¹₋₁ = (1/√2)(ψ¹₀φ¹₋₁ − ψ¹₋₁φ¹₀),  (20.229)

  Ψ¹₀ = (1/√2)(ψ¹₁φ¹₋₁ − ψ¹₋₁φ¹₁),  (20.230)

  Ψ¹₁ = (1/√2)(ψ¹₁φ¹₀ − ψ¹₀φ¹₁).  (20.231)
(iii) With J = |j₁ − j₂| = 0, i.e., Ψ⁰₀, we similarly obtain from (20.221)

  ρ_J = √[ (2J+1)!(2j₂)! / (2j₁+1)! ],

  C^J_{m₁,m₂} = (−1)^{j₂+m₂} { ( j₁+m₁)!( j₁−m₁)! / [(J+M)!(J−M)!( j₂+m₂)!( j₂−m₂)!] }^{1/2},

so that

  Ψ⁰₀ = (1/√3)(ψ¹₋₁φ¹₁ − ψ¹₀φ¹₀ + ψ¹₁φ¹₋₁).
Collecting the above results in matrix form, we have

  (ψ¹₋₁φ¹₋₁  ψ¹₀φ¹₋₁  ψ¹₁φ¹₋₁  ψ¹₀φ¹₁  ψ¹₁φ¹₁  ψ¹₋₁φ¹₀  ψ¹₋₁φ¹₁  ψ¹₁φ¹₀  ψ¹₀φ¹₀)

   ( 1    0      0      0     0    0      0      0     0
     0   1/√2    0      0     0   1/√2    0      0     0
     0    0     1/√6    0     0    0     1/√2    0    1/√3
     0    0      0     1/√2   0    0      0    −1/√2   0
     0    0      0      0     1    0      0      0     0       (20.232)
     0   1/√2    0      0     0  −1/√2    0      0     0
     0    0     1/√6    0     0    0    −1/√2    0    1/√3
     0    0      0     1/√2   0    0      0     1/√2   0
     0    0     2/√6    0     0    0      0      0   −1/√3 )

  = (Ψ²₋₂  Ψ²₋₁  Ψ²₀  Ψ²₁  Ψ²₂  Ψ¹₋₁  Ψ¹₀  Ψ¹₁  Ψ⁰₀).
In other words, we have some arbitrariness associated with a unitary similarity transformation of the matrix, but a unitary matrix that has undergone a unitary similarity transformation is again a unitary matrix (see Chap. 14). To show that the determinant of (20.232) is −1, use the expansion of the determinant together with the fact that when (a multiple of) a certain column (or row) is added to another column (or row), the determinant is unchanged (see Sect. 11.3).
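Both statements — that the coefficient matrix of (20.232) is real unitary (orthogonal) and that its determinant equals −1 — are quickly confirmed numerically. The following fragment is a minimal illustration written for this purpose; the variable names are ours.

```python
import numpy as np

s2, s3, s6 = np.sqrt(2.0), np.sqrt(3.0), np.sqrt(6.0)
# Row order as in (20.232): psi(-1)phi(-1), psi(0)phi(-1), psi(1)phi(-1), psi(0)phi(1),
#                           psi(1)phi(1), psi(-1)phi(0), psi(-1)phi(1), psi(1)phi(0), psi(0)phi(0)
U = np.array([
    [1, 0,    0,    0,    0, 0,     0,     0,     0   ],
    [0, 1/s2, 0,    0,    0, 1/s2,  0,     0,     0   ],
    [0, 0,    1/s6, 0,    0, 0,     1/s2,  0,     1/s3],
    [0, 0,    0,    1/s2, 0, 0,     0,    -1/s2,  0   ],
    [0, 0,    0,    0,    1, 0,     0,     0,     0   ],
    [0, 1/s2, 0,    0,    0, -1/s2, 0,     0,     0   ],
    [0, 0,    1/s6, 0,    0, 0,    -1/s2,  0,     1/s3],
    [0, 0,    0,    1/s2, 0, 0,     0,     1/s2,  0   ],
    [0, 0,    2/s6, 0,    0, 0,     0,     0,    -1/s3],
])
assert np.allclose(U.T @ U, np.eye(9))        # columns are orthonormal: real unitary
assert np.isclose(np.linalg.det(U), -1.0)     # determinant is -1, as stated in the text
print("matrix (20.232): orthogonal, det = -1")
```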
We find that the functions belonging to J = 2 and J = 0 are the basis functions of
the symmetric representations. Meanwhile, those belonging to J = 1 are the basis
functions of the antisymmetric representations (see Sect. 18.9). Expressing the
unitary transformation of this example according to (20.173) and choosing
(Ψ²₋₂ Ψ²₋₁ Ψ²₀ Ψ²₁ Ψ²₂ Ψ¹₋₁ Ψ¹₀ Ψ¹₁ Ψ⁰₀) as the basis vectors, we describe the transformation as

  R(a, b)(Ψ²₋₂ Ψ²₋₁ Ψ²₀ Ψ²₁ Ψ²₂ Ψ¹₋₁ Ψ¹₀ Ψ¹₁ Ψ⁰₀)
    = (Ψ²₋₂ Ψ²₋₁ Ψ²₀ Ψ²₁ Ψ²₂ Ψ¹₋₁ Ψ¹₀ Ψ¹₁ Ψ⁰₀) [D^(1) ⊗ D^(1)].  (20.233)
Notice again that the representation matrix of D^(1) ⊗ D^(1) can be converted (or reduced) to block unitary matrices according to the symmetric and antisymmetric representations such that

  D^(1) ⊗ D^(1) = D^(2) ⊕ D^(1) ⊕ D^(0),  (20.234)

where D^(2) and D^(0) are the symmetric representations and D^(1) is the antisymmetric
representation. Recall that SO(3) is simply reducible. To show (20.234), we need
somewhat lengthy calculations, but the computation procedures are straightforward.
A set of basis functions (Ψ²₋₂ Ψ²₋₁ Ψ²₀ Ψ²₁ Ψ²₂) spans a five-dimensional symmetric representation space. In turn, Ψ⁰₀ spans a one-dimensional symmetric representation space. Another set of basis functions (Ψ¹₋₁ Ψ¹₀ Ψ¹₁) spans a three-dimensional antisymmetric representation space.
We add that some special Clebsch–Gordan coefficients are easy to find. For example, we can readily find them in the case of J = j₁ + j₂ = M or J = j₁ + j₂ = −M. The final result for the normalized basis functions is

  Ψ^{j₁+j₂}_{j₁+j₂} = ψ^{j₁}_{j₁} φ^{j₂}_{j₂},   Ψ^{j₁+j₂}_{−( j₁+j₂)} = ψ^{j₁}_{−j₁} φ^{j₂}_{−j₂}.

In this case, only the Clebsch–Gordan coefficients

  ( j₁, m₁ = j₁, j₂, m₂ = j₂ | J = j₁ + j₂, M = J)

and

  ( j₁, m₁ = −j₁, j₂, m₂ = −j₂ | J = j₁ + j₂, M = −J)

survive and take a value of 1; see the corresponding parts of the matrix elements in (20.221). This can readily be seen in Tables 20.1, 20.2, and 20.3.
20.4 Lie Groups and Lie Algebras

In Sect. 20.2 we have developed a practical approach to dealing with various aspects and characteristics of SU(2) and SO(3). In this section we introduce the elegant theory of Lie groups and Lie algebras that enables us to study SU(2) and SO(3) systematically.
We start with the following discussion. Let A(t) be a non-singular matrix whose
elements vary as a function of a real number t. Here we think of a matrix as a function
of a real number t. If under such conditions A(t) constitutes a group, A(t) is said to be
a one-parameter group. In particular, we are interested in a situation where we have

  A(s)A(t) = A(s + t)  (s, t: real).  (20.235)

We further get

  A(0)A(0) = A(0 + 0) = A(0).  (20.237)

Since A(0) is non-singular, A(0)⁻¹ must exist. Then, multiplying A(0)⁻¹ on both sides of (20.237), we obtain

  A(0) = E,  (20.238)

meaning that

  A(−t) = A(t)⁻¹.
Differentiating (20.235) with respect to s and then putting s = 0, we obtain

  A′(t) = A′(0)A(t).  (20.240)

Defining

  X ≡ A′(0),  (20.241)

we obtain

  A′(t) = XA(t).  (20.242)

The solution of (20.242) is

  A(t) = exp(tX)  (20.243)

under the initial condition A(0) = E. Thus, any one-parameter group A(t) is described by (20.243) on condition that A(t) is represented by (20.235).
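Equation (20.243) can be explored numerically: pick any matrix X, build A(t) = exp(tX) with a matrix exponential, and check the group property (20.235) together with A′(0) = X. The code below is a small sketch of this idea (function names and the sample X are our own choices).

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0.0, -1.3], [1.3, 0.0]])        # any fixed matrix; here a real skew-symmetric example
A = lambda t: expm(t * X)                       # one-parameter group, (20.243)

s, t = 0.4, 1.1
assert np.allclose(A(s) @ A(t), A(s + t))       # group law (20.235)
assert np.allclose(A(0.0), np.eye(2))           # A(0) = E, (20.238)
assert np.allclose(np.linalg.inv(A(t)), A(-t))  # A(-t) = A(t)^{-1}

h = 1e-6                                        # finite-difference check of A'(0) = X, (20.241)
assert np.allclose((A(h) - A(-h)) / (2 * h), X, atol=1e-6)
print("one-parameter group properties verified")
```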
In Sect. 20.1 we mentioned the Lie algebra in relation to (20.14). In this context
(20.243) shows how the Lie algebra is related to the one-parameter group. For a
group to be a continuous group is thus deeply connected to this one-parameter group.
Equation (20.45) of Sect. 20.2 is an example of product of one-parameter groups.
Bearing in mind the above situation, we describe definitions of the Lie group and Lie
algebra.
Definition 20.2 [9] Let G be a subgroup of GL(n, ℂ). Suppose that A_k (k = 1, 2, ⋯) ∈ G and that lim_{k→∞} A_k = A [∈ GL(n, ℂ)]. If A ∈ G, then G is said to be a linear Lie group of dimension n.
We usually call a linear Lie group simply a Lie group. Note that GL(n, ℂ) has
appeared in Sect. 16.1. The definition again is most likely to bewilder chemists in
that they read it as though the definition itself including the word of Lie group was
written in the air. Nevertheless, a next example is again expected to relieve the
chemists of useless anxiety.
Example 20.3 Let G be a group and let A_k ∈ G with det A_k = 1 such that

  A_k = ( cos θ_k   −sin θ_k
          sin θ_k    cos θ_k )   (θ_k: real number).  (20.244)
Definition 20.3 [9] Let G be a linear Lie group. The matrices X for which

  exp(tX) ∈ G  (20.246)

holds for every real number t are collectively called the Lie algebra of the corresponding Lie group G. We denote the Lie algebra of the Lie group G by g.
The relation (20.246) includes the case where X is a (1, 1) matrix (i.e., merely a
complex number) as the trivial case. Most importantly, (20.246) is directly
connected to (20.243) and Definition 20.3 shows the direct relevance between the
Lie group and Lie algebra. The two theories of Lie group and Lie algebra underlie
continuous groups.
The Lie algebra has the following properties:
(i) If X 2 g, aX 2 g as well, where a is any real number.
(ii) If X, Y 2 g, X þ Y 2 g as well.
(iii) If X, Y 2 g, ½X, Y ð XY - YX Þ 2 g as well.
In fact, exp[t(aX)] = exp [(ta)X] and ta is a real number so that we have (i).
Regarding (ii) we should examine individual properties of X. As already shown in
Property (7)′ of Chap. 15 (Sect. 15.2), e.g., a unitary matrix of exp(tX) is associated
with an anti-Hermitian matrix of X; we are interested in this particular case. In the
case of SU(2) X is a traceless anti-Hermitian matrix and as to SO(3) X is a real skew-
symmetric matrix. In the former case, for example, traceless anti-Hermitian matrices
X and Y can be expressed as
  X = ( ic       −a + ib
        a + ib    −ic    ),   Y = ( ih       −f + ig
                                    f + ig    −ih    ),  (20.247)

  X + Y = ( i(c + h)             −(a + f) + i(b + g)
            (a + f) + i(b + g)    −i(c + h)          ).  (20.248)
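Properties (i)–(iii) are easy to confirm numerically for su(2): sums, real multiples, and commutators of traceless anti-Hermitian matrices stay traceless anti-Hermitian, and their exponentials are unitary with unit determinant. The snippet below is a minimal illustration under these assumptions (all names are ours).

```python
import numpy as np
from scipy.linalg import expm

def random_su2_element(rng):
    """Traceless anti-Hermitian 2x2 matrix of the form (20.247)."""
    a, b, c = rng.normal(size=3)
    return np.array([[1j * c, -a + 1j * b], [a + 1j * b, -1j * c]])

def is_su2(X):
    return np.allclose(X.conj().T, -X) and np.isclose(np.trace(X), 0.0)

rng = np.random.default_rng(0)
X, Y = random_su2_element(rng), random_su2_element(rng)
assert is_su2(X + Y)                      # property (ii), cf. (20.248)
assert is_su2(2.7 * X)                    # property (i)
assert is_su2(X @ Y - Y @ X)              # property (iii): the commutator stays in the algebra
U = expm(X)                               # exp maps the algebra into the group
assert np.allclose(U.conj().T @ U, np.eye(2)) and np.isclose(np.linalg.det(U), 1.0)
print("su(2) closure and exp(X) in SU(2) verified")
```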
Notice that we ignored the terms having s2, t2, st2, s2t, and s2t2 as a coefficient.
Defining
we have
  d/dt ⟨f(t)|g(t)⟩ = ⟨df(t)/dt | g(t)⟩ + ⟨f(t) | dg(t)/dt⟩.  (20.253)
Dividing both sides of (20.254) by a real number Δ and taking a limit as below,
we have
  lim_{Δ→0} (1/Δ)[⟨f(t+Δ)|g(t+Δ)⟩ − ⟨f(t)|g(t)⟩]
    = lim_{Δ→0} { ⟨[f(t+Δ) − f(t)]/Δ | g(t+Δ)⟩ + ⟨f(t) | [g(t+Δ) − g(t)]/Δ⟩ },  (20.255)
where we used the calculation rules for inner products of (13.3) and (13.20). Then,
we get (20.253). This completes the proof.
Now, let us apply (20.253) to a unitary operator A(t) we dealt with in the previous
section. Replacing f(t) and g(t) in (20.253) with ψA(t){ and A(t)χ, respectively, we
have
  d/dt ⟨ψA(t)† | A(t)χ⟩ = ⟨ψ dA(t)†/dt | A(t)χ⟩ + ⟨ψA(t)† | [dA(t)/dt] χ⟩,  (20.256)

where ψ and χ are arbitrarily chosen vectors that do not depend on t. Then, we have

  LHS of (20.256) = d/dt ⟨ψ|A(t)†A(t)|χ⟩ = d/dt ⟨ψ|E|χ⟩ = d/dt ⟨ψ|χ⟩ = 0,  (20.257)
dt dt dt
where with the second equality we used the unitarity of A(t). Taking limit t → 0 in
(20.256), we have
dAð0Þ{ dAð0Þ
lim ½RHS of ð20:256Þ = ψ jAð0Þχ þ ψAð0Þ{ j χ :
t→0 dt dt
{
dAðt Þ{ dAðt Þ {
= = ½A0 ð0Þ ,
dt t→0
dt t→0
where we assumed that operations of the adjoint and t → 0 are commutable. Then,
we get
  lim_{t→0} [RHS of (20.256)] = ⟨ψ[A′(0)]† | A(0)χ⟩ + ⟨ψA(0)† | [dA(0)/dt] χ⟩
                            = ⟨ψX†|χ⟩ + ⟨ψ|Xχ⟩ = ⟨ψ(X† + X)|χ⟩,  (20.258)

where we used X ≡ A′(0) of (20.241) and A(0) = A(0)† = E. From (20.257) and (20.258), we have

  0 = ⟨ψ(X† + X)|χ⟩.

Since ψ and χ are arbitrary, this means

  X† + X ≡ 0,  i.e.,  X† = −X.  (20.259)
Conversely, suppose that X is anti-Hermitian. Then, with any real number t we have

  [exp(tX)]† exp(tX) = exp(tX†) exp(tX) = exp(−tX) exp(tX) = E,  (20.261)

where we used (15.32) and Theorem 15.2 as well as the assumption that X is anti-Hermitian. Note that exp(tX) and exp(−tX) are commutative; see (15.29) of Sect. 15.2. Equation (20.261) implies that exp(tX) is unitary, i.e., exp(tX) ∈ U(n). The
notation U(n) means a unitary group. Hence, X 2 uðnÞ, where uðnÞ means the Lie
algebra corresponding to U(n).
Next, let us seek a Lie algebra that corresponds to a special unitary group SU(n).
By Definition 20.3, it is obvious that since U(n) ⊃ SU(n), uðnÞ ⊃ suðnÞ , where
suðnÞ means the Lie algebra corresponding to SU(n). It suffices to find the condition
under which det[exp(tX)] = 1 holds with any real number t and the elements
X 2 uðnÞ. From (15.41) [9], we have
where Tr stands for trace (see Chap. 12). This is equivalent to that
holds with any real t. For this, we must have Tr(X) = 0 with m = 0. This implies that
suðnÞ comprises anti-Hermitian matrices with its trace being zero (i.e., traceless).
Consequently, the Lie algebra suðnÞ is certainly a subset (or subspace) of uðnÞ that
consists of anti-Hermitian matrices.
In relation to (20.256) let us think of a real orthogonal matrix A(t). If A(t) is real
and orthogonal, (20.256) can be rewritten as
  d/dt ⟨ψA(t)^T | A(t)χ⟩ = ⟨ψ dA(t)^T/dt | A(t)χ⟩ + ⟨ψA(t)^T | [dA(t)/dt] χ⟩.  (20.264)

Repeating the argument that led to (20.259), we obtain

  X^T + X ≡ 0,  i.e.,  X^T = −X.  (20.265)
In this case we obtain a skew-symmetric matrix. All its diagonal elements are zero
and, hence, the skew-symmetric matrix is traceless. Other typical Lie groups are an
orthogonal group O(n) and a special orthogonal group SO(n) and the corresponding
Lie algebras are denoted by oðnÞ and soðnÞ, respectively. Note that both oðnÞ and
soðnÞ consist of skew-symmetric matrices.
Conversely, let us consider what type of operator exp(tX) would be if X is a skew-
symmetric matrix. Here we assume that X is a (n, n) matrix. With an arbitrary real
number t we have
where we used (15.31). This implies that exp(tX) is an orthogonal matrix, i.e., exp
(tX) 2 O(n). Hence, X 2 oðnÞ. Notice that exp(tX) and exp(-tX) are commutative.
Summarizing the above, we have the following theorem.
Theorem 20.4 The n-th order Lie algebra uðnÞ corresponding to the Lie group U(n)
(i.e., unitary group) consists of all the anti-Hermitian (n, n) matrices. The n-th order
Lie algebra suðnÞ corresponding to the Lie group SU(n) (i.e., special unitary group)
comprises all the anti-Hermitian (n, n) matrices with the trace zero. The n-th order
Lie algebras oðnÞ and soðnÞ corresponding to the Lie groups O(n) and SO(n),
respectively, are all the real skew-symmetric (n, n) matrices.
Notice that if A and B are anti-Hermitian matrices, so are A + B and cA (c: real
number). This is true of skew-symmetric matrices as well. Hence, uðnÞ, suðnÞ, oðnÞ,
and soðnÞ form a linear vector space.
In Sect. 20.3.1 we introduced a commutator. The commutators are ubiquitous in
quantum mechanics, especially as commutation relations (see Part I). Major prop-
erties are as follows:
where gðd Þ stands for a Lie algebra of dimension d. As their commutators belong to
gðdÞ as well, with an arbitrary pair of Xi and Xj (i, j = 1, 2, ⋯, d) we have
  [X_i, X_j] = Σ_{k=1}^{d} f_{ijk} X_k,  (20.269)

where the set of real coefficients f_{ijk} is said to be the structure constants, which define the structure of the Lie algebra.
Example 20.4 In (20.42) we write ζ₁ ≡ ζ_x, ζ₂ ≡ ζ_y, and ζ₃ ≡ ζ_z. Rewriting it explicitly, we have

  ζ₁ = (1/2)( 0   −i
              −i   0 ),   ζ₂ = (1/2)(  0   1
                                      −1   0 ),   ζ₃ = (1/2)( i    0
                                                              0   −i ).  (20.270)
Then, we get

  [ζ_i, ζ_j] = Σ_{k=1}^{3} ε_{ijk} ζ_k,  (20.271)

where ε_{ijk} is the Levi-Civita symbol defined in (20.272). Notice that in (20.272), if (i, j, k) represents an even permutation of (1, 2, 3), ε_{ijk} = 1, but if (i, j, k) is an odd permutation of (1, 2, 3), ε_{ijk} = −1. Otherwise, e.g., for i = j, j = k, or k = i, ε_{ijk} = 0. The relation (20.271) is essentially the same as (3.30) and (3.69) of Chap. 3. We have
  A₁ = ( 0  0   0
         0  0  −1
         0  1   0 ),   A₂ = (  0  0  1
                               0  0  0
                              −1  0  0 ),   A₃ = ( 0  −1  0
                                                   1   0  0
                                                   0   0  0 ).  (20.274)

Again, we have

  [A_i, A_j] = Σ_{k=1}^{3} ε_{ijk} A_k.  (20.275)
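The structure constants in (20.271) and (20.275) can be verified directly. Below is a short numerical check (our own illustration) using the matrices as written above in (20.270) and (20.274); it confirms that both sets share ε_{ijk} as structure constants.

```python
import numpy as np

zeta = [0.5 * np.array([[0, -1j], [-1j, 0]]),                 # zeta_1, cf. (20.270)
        0.5 * np.array([[0, 1], [-1, 0]], dtype=complex),      # zeta_2
        0.5 * np.array([[1j, 0], [0, -1j]])]                   # zeta_3
A = [np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float),  # A_1, cf. (20.274)
     np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float),
     np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)]

def eps(i, j, k):                                              # Levi-Civita symbol
    return int((i - j) * (j - k) * (k - i) / 2)

def check(gen):
    for i in range(3):
        for j in range(3):
            comm = gen[i] @ gen[j] - gen[j] @ gen[i]
            target = sum(eps(i + 1, j + 1, k + 1) * gen[k] for k in range(3))
            assert np.allclose(comm, target)

check(zeta); check(A)
print("su(2) and so(3) share the structure constants eps_ijk")
```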
In terms of the angular momentum operators, defining

  J̃_x ≡ −iJ_x/ħ,  J̃_y ≡ −iJ_y/ħ,  J̃_z ≡ −iJ_z/ħ,

we get

  [J̃_l, J̃_m] = Σ_{n=1}^{3} ε_{lmn} J̃_n.  (20.276)
We can readily check that the definition (20.252) satisfies those of the inner
product of (13.2), (13.3), and (13.4). In fact, we have
  ⟨B|A⟩ = Σ_{i,j} b*_{ij} a_{ij} = Σ_{i,j} a_{ij} b*_{ij} = ( Σ_{i,j} a*_{ij} b_{ij} )* = ⟨A|B⟩*.  (20.277)

Equation (13.3) is obvious from the calculation rules of a matrix. With (13.4), we get

  ⟨A|A⟩ = Tr(A†A) = Σ_{i,j} |a_{ij}|² ≥ 0,  (20.278)

where we have A = (a_{ij}) and the equality holds if and only if all a_{ij} = 0, i.e., A = 0. Thus, ⟨A|A⟩ gives a positive definite inner product on g.
From (20.278), we may equally define the inner product on g as

  ⟨A|B⟩ = Tr(A†B).  (20.279)
In fact, we have
In another way, we can readily compare (20.279) with (20.252) using, e.g., a
(2, 2) matrix and find that both equations give the same result. It is left for readers as
an exercise.
From Theorem 20.4, u(n) corresponding to the unitary group U(n) consists of all the anti-Hermitian (n, n) matrices. With these matrices, we have

  ⟨A|B⟩* = ⟨A|B⟩.

This means that ⟨A|B⟩ is real. For example, using u(2), let us evaluate an inner product. We denote arbitrary X₁, X₂ ∈ u(2) by

  X₁ = (  ia       c + id
         −c + id    ib    ),   X₂ = (  ip       r + is
                                      −r + is    iq    ),

which gives a real inner product. Notice that this is the case with su(2) as well.
Next, let us think of the following inner product:
  Tr[(gXg⁻¹)†(gYg⁻¹)] = Tr[(gXg†)†(gYg†)]
    = Tr(gX†g† gYg†) = Tr(gX†Yg†) = Tr(X†Y) = ⟨X|Y⟩,  (20.282)

where with the second last equality we used (12.13), namely the invariance of the trace under the (unitary) similarity transformation. If g is represented by a real orthogonal matrix, instead of (20.282) we have

  Tr[(gXg⁻¹)†(gYg⁻¹)] = Tr[(gXg^T)†(gYg^T)]
    = Tr(g X† g^T g Y g^T) = Tr(g X† Y g^T) = Tr(X†Y) = ⟨X|Y⟩.  (20.283)
This relation clearly shows that the real inner product of (20.284) remains unchanged by the operation

  X ⟼ gXg⁻¹.

This mapping is denoted by (Definition 20.4)

  Ad[g] : g → g.
From Definition 20.4, Ad½gðX Þ 2 g. The operator Ad[g] is a kind of mapping (i.e.,
endomorphism) discussed in Sect. 11.2. Notice that both g and X are represented by
(n, n) matrices. The matrix g is either a unitary matrix or an orthogonal matrix. The
matrix X is either an anti-Hermitian matrix or a skew-symmetric matrix.
Theorem 20.5 Let g be an element of a Lie group G and g be a Lie algebra of G.
Then, Ad[g] is a linear transformation on g.
Proof Let X, Y ∈ g and let a, b be scalars. Then, we have Ad[g](aX + bY) = g(aX + bY)g⁻¹ = a(gXg⁻¹) + b(gYg⁻¹) = a Ad[g](X) + b Ad[g](Y), so that Ad[g] is a linear transformation on g. This completes the proof.
The correspondence between g and Ad[g] is written as

  g ⟼ Ad[g]  or  Ad : G → G′.  (20.289)
The G and G′ may or may not be identical. Examples can be seen in, e.g., (20.294)
and (20.302) later.
As mentioned above, if g is either uðnÞ or oðnÞ, Ad[g] is an orthogonal
transformation on g. The representation of Ad[g] is real accordingly.
An immediate implication of this is that the transformation of the basis vectors of
g has a connection with the corresponding Lie groups such as U(n) and O(n).
Moreover, it is well known and studied that there is a close relationship between
Ad[SU(2)] and SO(3). First, we wish to seek basis vectors of suð2Þ. A general form
of X 2 suð2Þ is a following traceless anti-Hermitian matrix described by
ic - a þ ib
X= ,
a þ ib - ic
where a, b, and c are real. We have encountered this type of matrix in (20.42) and
(20.270). Using an inner product described in (20.252), we determine an orthonor-
mal basis set of suð2Þ such that
1 -i 1 1
e1 = p 0
-i 0
, e2 = p 0
-1
1
0
, e3 = p i
0
0
-i
, ð20:290Þ
2 2 2
  g_α ≡ ( e^{iα/2}     0
           0       e^{−iα/2} )   and   g_β ≡ (  cos(β/2)   sin(β/2)
                                               −sin(β/2)   cos(β/2) ).

The elements g_α and g_β have appeared in (20.43) and (20.44). Then, we have [9]

  Ad[g_α](e₁) = g_α e₁ g_α⁻¹
    = (1/√2)( e^{iα/2}  0 ; 0  e^{−iα/2} )( 0  −i ; −i  0 )( e^{−iα/2}  0 ; 0  e^{iα/2} )
    = (1/√2)(       0            sin α − i cos α
             −sin α − i cos α          0          ) = e₁ cos α + e₂ sin α.  (20.291)
Similarly, we get

  Ad[g_α](e₂) = (1/√2)(        0            cos α + i sin α
                       −cos α + i sin α           0          ) = −e₁ sin α + e₂ cos α,  (20.292)

  Ad[g_α](e₃) = (1/√2)( i    0
                        0   −i ) = e₃.  (20.293)

In matrix form, (20.291)–(20.293) are combined into

  (e₁ e₂ e₃) Ad[g_α] = (e₁ e₂ e₃) ( cos α  −sin α  0
                                    sin α   cos α  0
                                      0       0    1 ),  (20.294)

i.e.,

  Ad[g_α] = ( cos α  −sin α  0
              sin α   cos α  0
                0       0    1 ).  (20.295)

In the same way, we obtain

  Ad[g_β] = (  cos β  0  sin β
                0     1    0
              −sin β  0  cos β ).  (20.296)
Moreover, with

  g_γ = ( e^{iγ/2}     0
           0       e^{−iγ/2} ),  (20.297)

we have

  Ad[g_γ] = ( cos γ  −sin γ  0
              sin γ   cos γ  0
                0       0    1 ).  (20.298)

Since Ad is a homomorphism, we have

  Ad[g_α] Ad[g_β] Ad[g_γ] = Ad[g_α g_β g_γ],  (20.299)

where

  g_{αβγ} ≡ g_α g_β g_γ = (  e^{i(α+γ)/2} cos(β/2)     e^{i(α−γ)/2} sin(β/2)
                            −e^{i(γ−α)/2} sin(β/2)    e^{−i(α+γ)/2} cos(β/2) ).  (20.300)
Note that the matrix of (20.300) is identical with (20.45). Using (20.300), we
calculate Ad[gαβγ ](e1) such that
  Ad[g_{αβγ}](e₁) = g_{αβγ} e₁ g_{αβγ}⁻¹
   = (1/√2)(           −i sin β cos γ                        −i(cos α cos β cos γ − sin α sin γ) + (sin α cos β cos γ + cos α sin γ)
            −i(cos α cos β cos γ − sin α sin γ) − (sin α cos β cos γ + cos α sin γ)            i sin β cos γ            )
   = (e₁ e₂ e₃) ( cos α cos β cos γ − sin α sin γ
                  sin α cos β cos γ + cos α sin γ
                  −sin β cos γ ).  (20.301)

Similarly calculating Ad[g_{αβγ}](e₂) and Ad[g_{αβγ}](e₃) and combining the results with (20.301), we obtain

  (e₁ e₂ e₃) Ad[g_{αβγ}] =
  (e₁ e₂ e₃) ( cos α cos β cos γ − sin α sin γ    −cos α cos β sin γ − sin α cos γ    cos α sin β
              sin α cos β cos γ + cos α sin γ    −sin α cos β sin γ + cos α cos γ    sin α sin β
              −sin β cos γ                        sin β sin γ                         cos β      ).  (20.302)
The matrix of RHS of (20.302) is the same as (17.101) that contains Euler angles.
The calculations are somewhat lengthy but straightforward. As can be seen, e.g., in
(20.294) and (20.302), we may symbolically express the above adjoint representa-
tion as [9]
  Ad[g_{αβγ}] = ( cos α cos β cos γ − sin α sin γ    −cos α cos β sin γ − sin α cos γ    cos α sin β
                 sin α cos β cos γ + cos α sin γ    −sin α cos β sin γ + cos α cos γ    sin α sin β
                 −sin β cos γ                        sin β sin γ                         cos β      ).  (20.304)
Next, let us seek the kernel of Ad, i.e., the elements h ∈ SU(2) for which Ad[h] = E. Such an h must commute with e₁, e₂, and e₃. Considering he₃ = e₃h, where e₃ is denoted by (20.290), and expressing h as ( h₁₁  h₁₂ ; h₂₁  h₂₂ ), we get

  ( ih₁₁   −ih₁₂
    ih₂₁   −ih₂₂ ) = (  ih₁₁    ih₁₂
                       −ih₂₁   −ih₂₂ ).
From the above relation, we must have h₁₂ = h₂₁ = 0. As h ∈ SU(2), det h = 1. Hence, for h we choose h = ( a  0 ; 0  a⁻¹ ), where a ≠ 0. Moreover, considering he₂ = e₂h we get

  (1/√2)(  0     a
          −a⁻¹   0 ) = (1/√2)(  0    a⁻¹
                               −a     0  ).  (20.306)

Hence a = a⁻¹, i.e., a = ±1, so that h = ±e. In other words, the kernel of Ad consists of the two elements ±e, and

  Ad[±e] = E.  (20.309)
Now, suppose that Ad[g₁] = Ad[g₂] for g₁, g₂ ∈ SU(2). Then Ad[g₁g₂⁻¹] = E, so that from (20.309)

  g₁g₂⁻¹ = ±e  or  g₁ = ±g₂.  (20.311)

Conversely, we have

  Ad[−g] = Ad[−e] Ad[g] = Ad[g],  (20.312)

where with the first equality we used Theorem 20.6 and with the second equality we used (20.309). The relations (20.311) and (20.312) imply that with any g ∈ SU(2) there are two elements ±g that satisfy Ad[g] = Ad[−g]. These complete the proof.
In the above proof of Theorem 20.7, (20.309) is a simple, but important expres-
sion. In view of Theorem 16.4 (Homomorphism theorem), we summarize the above
discussion as follows: Let Ad be a homomorphism mapping such that
  Ad : SU(2) ⟶ SO(3)  (20.303)

with the kernel F = {e, −e}. The relation is schematically represented in Fig. 20.6. Notice the resemblance between Figs. 20.6 and 16.1a.
Correspondingly, from Theorem 16.4 of Sect. 16.4 we have an isomorphic mapping Ãd such that

  Ãd : SU(2)/F ⟶ SO(3).  (20.313)
Both the above two elements gαβγ and -gαβγ produce an identical orthogonal
matrix given in RHS of (20.302).
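The two-to-one character of Ad can be made very concrete numerically: build g_{αβγ} from (20.300), compute the matrix of Ad[g] in the basis (20.290) via the trace inner product, and compare with the Euler-angle matrix (20.302)/(20.304); the same matrix is obtained from −g. The snippet below is an illustrative sketch of this computation (all names are ours).

```python
import numpy as np

a, b, g = 0.7, 1.2, -0.4                                   # sample Euler angles (alpha, beta, gamma)
e = [np.array([[0, -1j], [-1j, 0]]) / np.sqrt(2),          # orthonormal basis of su(2), (20.290)
     np.array([[0, 1], [-1, 0]], dtype=complex) / np.sqrt(2),
     np.array([[1j, 0], [0, -1j]]) / np.sqrt(2)]

def g_abc(a, b, g):                                        # g_{alpha beta gamma}, (20.300)
    return np.array([[np.exp(1j*(a+g)/2)*np.cos(b/2),  np.exp(1j*(a-g)/2)*np.sin(b/2)],
                     [-np.exp(1j*(g-a)/2)*np.sin(b/2), np.exp(-1j*(a+g)/2)*np.cos(b/2)]])

def Ad(u):                                                 # matrix of Ad[u] in the basis e_k
    return np.array([[np.trace(e[i].conj().T @ (u @ e[j] @ u.conj().T)).real
                      for j in range(3)] for i in range(3)])

Rz = lambda t: np.array([[np.cos(t), -np.sin(t), 0], [np.sin(t), np.cos(t), 0], [0, 0, 1]])
Ry = lambda t: np.array([[np.cos(t), 0, np.sin(t)], [0, 1, 0], [-np.sin(t), 0, np.cos(t)]])
euler = Rz(a) @ Ry(b) @ Rz(g)                              # the matrix of (20.302)/(20.304)

u = g_abc(a, b, g)
assert np.allclose(Ad(u), euler)                           # Ad[g] reproduces the SO(3) Euler rotation
assert np.allclose(Ad(-u), Ad(u))                          # Ad[g] = Ad[-g]: the 2-to-1 covering
print("Ad: SU(2) -> SO(3) verified; kernel {e, -e}")
```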
To associate the above results of abstract algebra with the tangible matrix form
obtained earlier, we rewrite (20.54) by applying a simple trigonometric formula of
β β
sin β = 2 sin cos :
2 2
From (20.315), we clearly see that (i) if j is zero or a positive integer, so are m and
m′. Therefore, (20.315) is a periodic function with 2π with respect to α, β, and γ. But
(ii) if j is a half-odd-integer, so are m and m′. Then, (20.315) is a periodic function
with 4π. Note that in the case of (ii) 2j = 2n + 1 (n = 0.1, 2, ⋯). Therefore, in
(20.315) we have
As studied in Sect. 20.2.3, in the former case (i) the spherical surface harmonics
span the representation space of D(l ) (l = 0, 1, 2, ⋯). The simplest case for it is
D(0) = 1; a (1, 1) identity matrix, i.e., merely a number 1. The second simplest case
was given by (20.74) that describes D(1). Meanwhile, the simplest case for (ii) is
ð1=2Þ
D(1/2) whose matrix representation was given by Dα,β,γ in (20.45) or by gαβγ in
(20.300). With ±gαβγ , both Ad[±gαβγ] produce a real orthogonal (3, 3) matrix
expressed as (20.302). This representation matrix allowedly belongs to SO(3).
  f(t) = ( cos πt   −sin πt
           sin πt    cos πt ).
Then, f(t) is a continuous matrix function of all real numbers t. We have f(0) = a
and f(1) = b and, hence, a and b are connected to each other within SO(2).
  h(t) = { f(2t)       (0 ≤ t ≤ 1/2)
         { g(2t − 1)   (1/2 ≤ t ≤ 1),
so that h(t) can be a third continuous curve. Then, h(1/2) = f(1) = g(0). From the
supposition we have h(0) = f(0) = a, h(1/2) = f(1) = g(0) = b, and h(1) = g(1) = c.
This implies if a~b and b~c, then we have a~c.
Let us give another related definition.
Definition 20.6 Let A be a subset of GL(n, ℂ). Let a point a 2 A. Then, a collection
of points that are connected to a within A is said to be a connected component of
a and denoted by C(a).
From Proposition 20.2 (i), a ∈ C(a). Let C(a) and C(b) be two different connected components. We have two alternatives with C(a) and C(b): (I) C(a) ∩ C(b) = ∅; (II) C(a) = C(b). Suppose c ∈ C(a) ∩ C(b). Then, it follows that c~a and c~b. From Proposition 20.2 (ii) and (iii), we have a~b. Hence, we get C(a) = C(b). Thus, the subset A is a direct sum of several connected components such that

  A = C(a) ∪ C(b) ∪ ⋯ ∪ C(z)  with  C(p) ∩ C(q) = ∅ (p ≠ q),  (20.317)
satisfy f(a) = b and f(a′) = b′. Then, from Definition 20.5 there is a continuous
function g(t) (0 ≤ t ≤ 1) that satisfies g(0) = a and g(1) = a′.
Now, define a function h(t) f[g(t)]. Then, h(t) is a continuous mapping with
h(0) = f[g(0)] = f(a) = b and h(1) = f[g(1)] = f(a′) = b′. Again from Definition 20.5,
h(t) is a continuous curve that connects b and b′ within B. Meanwhile, from
supposition of f(A) = B, with 8b, b′ 2 B we must have ∃ a0 , a00 2 A that satisfy
f(a0) = b and f a00 = b0 . Consequently, from Definition 20.7, B is a connected set as
well. This completes the proof.
Applications of Theorem 20.8 are given below as examples.
Example 20.7 Let f(x) be a real continuous function defined on a subset
A ⊂ GL(n, ℝ), where x 2 A with A being a connected set. Suppose that for a
pair of a, b 2 A, f(a) = p and f(b) = q ( p, q : real) with p < q. Then, f(x) takes all
real numbers for its value that exist on the interval [p, q]. In other words, we have
f(A) ⊃ [p, q]. This is well-known as an intermediate value theorem.
Example 20.8 [9] Suppose that A is a connected subset of GL(n, ℝ). Then, with any
pair of elements a, b 2 A ⊂ GL(n, ℝ), we must have a continuous function
g(t) (0 ≤ t ≤ 1) such that g(0) = a and g(1) = b. Meanwhile, suppose that we can
define a real determinant for any 8g 2 A as a real continuous function detg(t).
Now, suppose that deta = p < 0 and detb = q > 0 with p < q. Then, from
Theorem 20.8 we would have det(A) ⊃ [p, q]. Consequently, we must have some
x 2 A such that det(x) = 0. But this is in contradiction to A ⊂ GL(n, ℝ), because we
must have det(A) ≠ 0. Thus, within a connected set the sign of the determinant should
be constant, if the determinant is defined as a real continuous function.
In relation to the discussion of the connectedness, we have the following general
theorem.
Theorem 20.9 [9] Let G0 be a connected component within a linear Lie group
G that contains the identity element e. Then, G0 is an invariant subgroup of G. A
connected component C(g) containing g 2 G is identical with gG0 = G0g.
Proof Let a, b 2 G0. Since G0 = C(e), from the supposition there are continuous
curves f(t) and g(t) within G0 that connect a with e and b with e, respectively, such
that f(0) = a, f(1) = e and g(0) = b, g(1) = e. Meanwhile, let h(t) f(t)[g(t)]-1. Then,
h(t) is another continuous curve. We have h(0) = f(0)[g(0)]-1 = ab-1 and h(1) = f(1)
[g(1)]-1 = e(e-1) = e e = e. That is, ab-1 and e are connected, being indicative of
ab-1 2 G0. This implies that G0 is a subgroup of G.
Next, with 8g 2 G and 8a 2 G0 we have f(t) in the same sense as the above.
Defining k(t) as k(t) = g f(t)g-1, k(t) is a continuous curve within G with k(0) = gag-1
and k(1) = g eg-1 = e. Hence, we have gag-1 2 G0. Since a is an arbitrary point of G0,
we have gG0g-1 ⊂ G0 accordingly. Similarly, g is an arbitrary point of G, and so
replacing g with g-1 in the above, we get g-1G0g ⊂ G0. Operating g and g-1 from the
left and right, respectively, we obtain G0 ⊂ gG0g-1. Combining this with the above
relation gG0g-1 ⊂ G0, we obtain gG0g-1 = G0 or gG0 = G0g. This means that G0 is
an invariant subgroup of G; see Sect. 16.3.
Taking 8a 2 G0 and f(t) in the same sense as the above again, l(t) = g f(t) is a
continuous curve within G with l(0) = ga and l(1) = g e = g. Then, ga and g are
connected within G. That is, ga 2 C(g). Since a is an arbitrary point of G0,
gG0 ⊂ C(g). Meanwhile, choosing any 8p 2 C(g), we set a continuous curve
m(t) (0 ≤ t ≤ 1) that connects p with g within G. This implies that m(0) = p and
m(1) = g.
Also setting a continuous curve n(t) = g-1m(t), we have n(0) = g-1m(0) = g-1p
and n(1) = g-1m(1) = g-1g = e. This means that g-1p 2 G0. Multiplying g on its
both sides, we get p 2 gG0. Since p is arbitrarily chosen from C(g), we get
C(g) ⊂ gG0. Combining this with the above relation gG0 ⊂ C(g), we obtain
C(g) = gG0 = G0g. These complete the proof.
In the above proof, we add the following statement: In Sect. 16.2 with the
necessary and sufficient condition for the subset to be a subgroup, we describe
(1) hi , hj 2 H ⟹hi ⋄hj 2 H . (2) h 2 H ⟹h - 1 2 H . In (1) if we replace hj with
hi- 1 , hi ⋄hj- 1 = hi ⋄hi- 1 = e 2 H . In turn, if we replace hi with e,
hi ⋄hj- 1 = e⋄hj- 1 = hj- 1 2 H . Finally, if we replace hj = hj- 1 ,
-1
hi ⋄hj- 1 = hi ⋄ hj- 1 = hi ⋄hj 2 H . Thus, H satisfies the axioms (A1), (A3), and
(A4) of Sect. 16.1 and, hence, forms a (sub)group. In other words, the conditions
(1) and (2) are combined so as to be
hi 2 H , hj 2 H ⟹hi ⋄hj- 1 2 H :
In Chap. 17 we dealt with finite groups related to O(3) and SO(3), i.e., their
subgroups. In this section, we examine several important properties in terms of Lie
groups.
The groups O(3) and SO(3) are characterized by their determinant detO(3) = ± 1
and detSO(3) = 1. Example 20.8 implies that in O(3) there should be two different
connected components according to detO(3) = ± 1. Also Theorem 20.9 tells us that
a connected component C(E) is an invariant subgroup G0 in O(3), where E is the
identity element of O(3). Obviously, G0 is SO(3). In turn, another connected
component is SO(3)c, where SO(3)c denotes a complementary set of SO(3); with
the notation see Sect. 6.1. Remembering (20.317), we have
where E denotes the (3, 3) unit matrix and −E = diag(−1, −1, −1). In the above, SO(3) is identical to C(E), with SO(3) itself being a connected set; SO(3)ᶜ is identical with C(−E), which is another connected set.
Another description of O(3) is given by [13]
  R_ω = ( cos ω  −sin ω  0
          sin ω   cos ω  0
            0       0    1 ).  (20.322)

Note that −E is commutative with any Q (or Q⁻¹). Also, notice from (20.323) that −R′_ω and −R_ω are again connected via a unitary similarity transformation. From (20.322), we have
  −R_ω = ( −cos ω    sin ω    0
           −sin ω   −cos ω    0
              0        0     −1 )
       = ( cos(ω+π)  −sin(ω+π)   0
           sin(ω+π)   cos(ω+π)   0
              0          0      −1 ),  (20.324)

  (  0  −1   0     ( −1   0   0     ( 0  1  0
    −1   0   0   ×    0  −1   0   =   1  0  0
     0   0  −1 )      0   0  −1 )     0  0  1 ),  (20.325)
where the first component of LHS belongs to O (octahedral rotation group) and the
RHS belongs to Td (tetrahedral group) that gives a mirror symmetry. Combining
(20.324) and (20.325), we get the following schematic representation:
  F = diag(1, 1, ⋯, 1, −1),  (20.326)
where the last side represents the coset decomposition (Sect. 16.2). This is essen-
tially the same as (20.321). In terms of the isomorphism, we have
where E is a (n, n) identity matrix. Note that since -E is commutative with any
elements of O(n), the direct-product of (20.329) is allowed.
Of the connected components, that containing the identity element is of particular
importance. This is because as already mentioned in Sect. 20.3.1, the “initial”
condition of the one-parameter group A(t) is set at t = 0 such that
Að0Þ = E: ð20:238Þ
Under this condition, it is important to examine how A(t) evolves with t, starting
from E at t = 0. In this connection we have the following theorems that give good
grounds to further discussion of Sect. 20.2. We describe them without proof.
Interested readers are encouraged to look up literature [9].
Theorem 20.10 Let G be a linear Lie group. Let G0 be a connected component of
the identity element e 2 G. Then, G0 is another linear Lie group and the Lie algebra
of G and G0 is identical.
Theorem 20.10 shows the significance of the connected component of the
identity. From this theorem the Lie algebras of, e.g., oðnÞ and soðnÞ are identical,
namely both of the Lie algebras are given by real skew-symmetric matrices. This is
inherently related to the properties of Lie algebras. The zero matrix (i.e., zero vector)
as an element of Lie algebra corresponds to the identity element of the Lie group.
This is obvious from the relation
e = exp 0,
where e is the identity element of the Lie group and 0 represents the zero matrix.
Basically Lie algebras are suited for describing local properties associated with
infinitesimal transformations around the identity element e. More specifically, the
exponential functions of the real skew-symmetric matrices cannot produce -E.
Considering that one of the connected components of O(3) is SO(3) that contains
the identity element, it is intuitively understandable that oðnÞ and soðnÞ are the same.
Another interesting point is that SO(3) is at once an open set and a closed set (i.e.,
clopen set; see Sect. 6.1). This is relevant to Theorem 6.3 which says that a necessary
and sufficient condition for a subset S of a topological space to be both open and
closed at once is Sb = ∅. Since the determinant of any elements of O(3) is alterna-
tively ±1, it is natural that SO(3) has no boundary; if there were boundary, its
determinant would be zero, in contradiction to SO(3) ⊂ GL(n, ℝ); see Example 20.8.
Theorem 20.11 Let G be a connected linear Lie group. (The Lie group G may be G0
in the sense of Theorem 20.10.) Then, 8g 2 G can be described by
f : I⟶G or f ðI Þ ⊂ G: ð20:330Þ
Then, the function f is called a path, in which f(0) and f(1) are an initial point and
an end point, respectively. If f(0) = f(1) = x0, f is called a loop (i.e., a closed path) at
x0 [14]. If f(t) x0 (0 ≤ t ≤ 1), f is said to be a constant loop.
Definition 20.9 Let f and g be two paths. If f and g can continuously be deformed
from one to the other, they are said to be homotopic.
To be more specific with Definition 20.9, let us define a function h(s, t) that is
continuous with respect to s and t in a region I × I = {(s, t); 0 ≤ s ≤ 1, 0 ≤ t ≤ 1}. If
furthermore h(0, t) = f(t) and h(1, t) = g(t) hold, f and g are homotopic [9].
Definition 20.10 Let G be a connected Lie group. If all the loops at an arbitrary
chosen point 8x 2 G are homotopic to a constant loop, G is said to be simply
connected.
This would somewhat be an abstract concept. Rephrasing the statement in a word,
for G to be simply connected is that loops at any 8x 2 G can continually be
contracted to that point x. A next example helps understand the meaning of being
simply connected.
Example 20.9 Let S2 be a spherical surface of ℝ3 such that
  S² = {x ∈ ℝ³ ; ⟨x|x⟩ = 1}.  (20.331)
Any x is equivalent in virtue of the spherical symmetry of S2. Let us think of any
loop at x. This loop can be contracted to that point x. Hence, S2 is simply connected.
  R₃(α, β, γ) = ( cos α cos β cos γ − sin α sin γ    −cos α cos β sin γ − sin α cos γ    cos α sin β
                 sin α cos β cos γ + cos α sin γ    −sin α cos β sin γ + cos α cos γ    sin α sin β
                 −sin β cos γ                        sin β sin γ                         cos β      ).  (20.332)
Note that in (20.332) we represent the rotation in the moving coordinate system.
Putting β = 0 in (20.332), we get
  R₃(α, 0, γ) = ( cos α cos γ − sin α sin γ              −cos α sin γ − sin α cos γ              0
                 sin α (cos 0) cos γ + cos α sin γ      −sin α (cos 0) sin γ + cos α cos γ      0
                   0                                      0                                     1 )
             = ( cos(α+γ)   −sin(α+γ)   0
                sin(α+γ)    cos(α+γ)   0
                  0            0       1 ).  (20.333)

Similarly, putting β = π in (20.332), we get

  R₃(α, π, γ) = ( −cos(α−γ)   −sin(α−γ)    0
                 −sin(α−γ)    cos(α−γ)    0
                   0            0        −1 ).  (20.335)
  R₃ = ( (cos²α cos²β + sin²α) cos γ + cos²α sin²β         cos α sin α sin²β (1 − cos γ) − cos β sin γ      cos α cos β sin β (1 − cos γ) + sin α sin β sin γ
         cos α sin α sin²β (1 − cos γ) + cos β sin γ       (sin²α cos²β + cos²α) cos γ + sin²α sin²β        sin α cos β sin β (1 − cos γ) − cos α sin β sin γ
         cos α cos β sin β (1 − cos γ) − sin α sin β sin γ  sin α cos β sin β (1 − cos γ) + cos α sin β sin γ  sin²β cos γ + cos²β ).  (20.337)
Note in (20.337) we represent the rotation in the fixed coordinate system. In this
case, again we have arbitrariness with the choice of parameters. Figure 20.7 shows a
geometry of the rotation axis (A) accompanied by a rotation γ; see Fig. 17.18 once
again. In this parameter space, specifying SO(3) the rotation is characterized by
azimuthal angle α, zenithal angle β, and magnitude of rotation γ. The domains of
variability of the parameters (α, β, γ) are
0 ≤ α ≤ 2π, 0 ≤ β ≤ π, 0 ≤ γ ≤ 2π:
Figure 20.7a shows a geometry viewed in parallel to the equator that includes a
zenithal angle β as well as a point P (i.e., a point of intersection between the rotation
axis and geosphere) and its antipodal point P. If β is measured at P, β and γ should be
replaced with π - β and 2π - γ, respectively. Meanwhile, Fig. 20.7b, c depict an
azimuthal angle α viewed from above the north pole (N). If the azimuthal angle α is
measured at the antipodal point P, α should be replaced with either α + π (in the case
of 0 ≤ α ≤ π; see Fig. 20.7b) or α - π (in the case of π < α ≤ 2π; see Fig. 20.7c).
Consequently, we have the following correspondence:
ðα, β, γ Þ $ ðα ± π, π - β, 2π - γ Þ: ð20:338Þ
Thus, we confirm once again that the different sets of parameters give the same
rotation. In other words, two different points in the parameter space correspond to
the same transformation (i.e., rotation). In fact, using these different sets of param-
eters, the matrix form of (20.337) is held unchanged. The confirmation is left to readers.
Because of the aforementioned characteristic of the parameter space, a loop
cannot be contracted to a single point in SO(3). For this reason, SO(3) is not simply
connected.
In contrast to Example 20.10, the mapping from the parameter space S3 to
SU(2) is bijective in the case of SU(2). That is, any two different points of S3 give
different transformations on SU(2) (i.e., injective). For any element of SU(2) it has a
corresponding point in S3 (i.e., surjective). This can be rephrased such that SU(2) and
S3 are isomorphic.
In this context, let us give important concepts. Let (T, τ) and (S, σ) be two topological spaces. Suppose that f : (T, τ) → (S, σ) is a bijective mapping. If in this
case, furthermore, both f and f -1 are continuous mappings, (T, τ) and (S, σ) are said
to be homeomorphic [14]. In our present case, both SU(2) and S3 are homeomorphic.
Apart from being simply connected or not, SU(2) and SO(3) resemble in many
aspects. The resemblance has already shown up in Chap. 3 when we considered the
(orbital) angular momentum and generalized angular momentum. As shown in
(3.30) and (3.69), the commutation relation was virtually the same. It is obvious
when we compare (20.271) and (20.275). That is, in terms of Lie algebras their
structure constants are the same and, eventually, the structure of corresponding Lie
groups is related. This became well understood when we considered the adjoint
representation of the Lie group G on its Lie algebra g. Of the groups whose Lie
algebras are characterized by the same structure constants, the simply connected
group is said to be a universal covering group. For example, among the Lie groups
O(3), SO(3), and SU(2), only SU(2) is simply connected and, hence, is a universal
covering group.
The understanding of the Lie algebra is indispensable to explicitly describing the
elements of the Lie group, especially the connected components of the identity
element e 2 G. Then, we are able to understand the global characteristics of the
Lie group. In this chapter, we have focused on SU(2) and SO(3) as a typical example
of linear Lie groups.
Finally, though formal, we describe a definition of a topological group.
Definition 20.11 [15] Let G be a set. If G satisfies the following conditions, G is
called a topological group.
1. The set G is a group.
2. The set G is a T1-space.
3. Group operations (or calculations) of G are continuous. That is, let P and Q be
two mappings such that
References
1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer,
Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics, Expanded
edn. Shokabo, Tokyo. (in Japanese)
3. Takagi T (2010) Introduction to analysis, Standard edn. Iwanami, Tokyo. (in Japanese)
4. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn.
Academic Press, Waltham
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
6. Rose ME (1995) Elementary theory of angular momentum. Dover, New York
7. Hassani S (2006) Mathematical physics. Springer, New York
8. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press,
Princeton, NJ
9. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo.
(in Japanese)
10. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
11. Satake I-O (1975) Linear algebra (pure and applied mathematics). Marcel Dekker, New York
12. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
13. Steeb W-H (2007) Continuous symmetries, Lie algebras, differential equations and computer
algebra, 2nd edn. World Scientific, Singapore
14. McCarty G (1988) Topology. Dover, New York
15. Murakami S (2004) Foundation of continuous groups, Revised edn. Asakura Publishing,
Tokyo. (in Japanese)
Part V
Introduction to the Quantum Theory of
Fields
matrices (e.g., Hermitian matrix or unitary matrix) because of the presence of the
Lorentz boost. We examine how these issues are dealt with within the framework of
the standard matrix theory. This last part of the book deals with advanced topics of
Lie algebra, with its differential representation stressed. Since the theory of vector
spaces underlies all parts of the final part, several extended concepts of vector spaces
are explained in relation to Part III.
Historically speaking, the Schrödinger equation has obtained citizenship across a
wide range of scientific community including physics, chemistry, materials science,
and even biology. As for the Dirac equation, on the other hand, it seems to have
found the place only in theoretical physics, especially the quantum theory of fields.
Nonetheless, in recent years the Dirac equation often shows up as a fundamental
equation in, e.g., solid-state physics. Under these circumstances, we wish to inves-
tigate the constitution and properties of the Dirac equation as well as the Dirac
operators and spinors on a practical level throughout the whole of this part.
Chapter 21
The Dirac Equation
In this chapter we describe the features of the Dirac equation, including its historical background. The Dirac equation is a quantum theory of the electron that deals with the electron dynamics within the framework of the special theory of relativity. Before Dirac established the theory, several attempts to develop a relativistic quantum-mechanical theory of the electron were made, mainly on the basis of the Klein-Gordon equation. That theory, however, had a serious problem in terms of the probability density. Dirac proposed the equation named after him to dispel the problem. Most importantly, the Dirac equation opened up a novel route to the quantum theory of fields. The quantum theory of fields was first established as quantum electrodynamics (QED), which deals with electrons and photons as quantized fields. The theory later became the foundation of the advanced quantum field theory based on the gauge theory.
quantum field theory based on the gauge theory. In this chapter we study and pursue
the framework of the Dirac equation with emphasis upon the plane wave solutions to
make it a clear starting point for gaining a good understanding of QED. The structure
of the Dirac equation can be well-understood using the polar coordinate representa-
tion. The results will be used in Chap. 24 as well to further develop the theory. The
Dirac equation is originally “relativistic.” To help understand the nature of the Dirac
equation, we add several remarks on the special theory of relativity at the beginning
of the chapter accordingly.
The Klein-Gordon equation is expressed as
\[
\left[\frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2 + \left(\frac{mc}{\hbar}\right)^2\right]\phi(t, \boldsymbol{x}) = 0, \tag{21.1}
\]
where the operator $\nabla^2$ has already appeared in (1.24) and (7.30). More succinctly, we have
\[
\left[\Box + \left(\frac{mc}{\hbar}\right)^2\right]\phi(t, \boldsymbol{x}) = 0, \qquad \Box \equiv \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2.
\]
\[
E^2 = \boldsymbol{p}^2 c^2 + m^2 c^4, \tag{21.2}
\]
where E, p, and m are energy, momentum, and mass of the particle, respectively; see
(1.20) and (1.21). Equation (21.1) can be obtained using the correspondence
between momentum or energy and differential operators expressed in (1.31) or
(1.40).
Nonetheless, the Klein-Gordon equation involved a fundamental difficulty in defining a probability density for the presence of a particle. The argument is as follows: The Schrödinger equation is described as
\[
i\hbar\frac{\partial\psi}{\partial t} = \left(-\frac{\hbar^2}{2m}\nabla^2 + V\right)\psi. \tag{1.45}
\]
Multiplying (1.45) by $\psi^{*}$ from the left, we have
\[
i\hbar\,\psi^{*}\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\psi^{*}\nabla^2\psi + V\psi^{*}\psi. \tag{21.3}
\]
Taking the complex conjugate of (1.45) and multiplying it by $\psi$ from the right, we have
\[
-i\hbar\frac{\partial\psi^{*}}{\partial t}\psi = -\frac{\hbar^2}{2m}\left(\nabla^2\psi^{*}\right)\psi + V\psi^{*}\psi. \tag{21.4}
\]
Subtracting (21.4) from (21.3), we obtain
\[
i\hbar\left(\psi^{*}\frac{\partial\psi}{\partial t} + \frac{\partial\psi^{*}}{\partial t}\psi\right) = \frac{\hbar^2}{2m}\left[\left(\nabla^2\psi^{*}\right)\psi - \psi^{*}\nabla^2\psi\right], \tag{21.5}
\]
that is,
\[
\frac{\partial}{\partial t}\left(\psi^{*}\psi\right) + \frac{\hbar}{2mi}\left[\psi^{*}\nabla^2\psi - \left(\nabla^2\psi^{*}\right)\psi\right] = 0. \tag{21.6}
\]
Equation (21.6) has the form of a continuity equation
\[
\frac{\partial\rho}{\partial t} + \operatorname{div}\boldsymbol{j} = 0. \tag{21.8}
\]
For the Klein-Gordon field $\phi$, the corresponding density and current are given by
\[
\rho(\boldsymbol{x}, t) \equiv \frac{i\hbar}{2m}\left(\phi^{*}\frac{\partial\phi}{\partial t} - \frac{\partial\phi^{*}}{\partial t}\phi\right), \qquad
\boldsymbol{j}(\boldsymbol{x}, t) \equiv \frac{\hbar}{2mi}\left[\phi^{*}\nabla\phi - \left(\nabla\phi^{*}\right)\phi\right]. \tag{21.9}
\]
Unlike the Schrödinger case, the density $\rho$ of (21.9) is not positive definite, which is the aforementioned difficulty.
\[
c = 1, \quad \hbar = 1. \tag{21.10}
\]
We follow this custom (the natural units) from here on. Using the natural units, the Klein-Gordon equation reads
\[
\left(\Box + m^2\right)\phi(t, \boldsymbol{x}) = 0. \tag{21.11}
\]
\[
x = \left(x^{\mu}\right) = \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} = \begin{pmatrix} t \\ \boldsymbol{x} \end{pmatrix} \quad (\mu = 0, 1, 2, 3), \tag{21.12}
\]
where x0 represents the time coordinate and x1, x2, and x3 show the space coordi-
nates. The quantity x is referred to collectively as a space–time coordinate. We
follow the custom as well with the superscript notation of physical quantities. The
implication will be mentioned later (also, see Sect. 24.1). The quantity
xμ (μ = 0, 1, 2, 3) is called a four-vector; this is often represented by a single letter
x. That is, ϕ(x) is meant by ϕ(t, x).
Example 21.1 Suppose that a train is running on a straight rail track with a constant
velocity v toward the positive direction of the x-axis and that a man (M) standing on
firm ground is watching the train. Also, suppose that at the moment when the tail end
of the train is passing in front of him, he and a passenger (P) on the train adjust their
clocks to t = t′ = 0. Here, t is time for the man on firm ground and t′ for the passenger
(Fig. 21.1). Suppose moreover that a light is emitted at the tail end of the train at
t = t′ = 0.
Let O be the inertial frame of reference fixed on the man on firm ground and let O′
be another inertial frame of reference fixed on the train. Also, let the origin of O be
on that man and let that of O′ be on the tail end of the train. Let us assume that the
emitted light is observed at t = t at a position vector x (=xe1 + ye2 + ze3) of O and
that the same light is observed at t′ = t′ at a position vector x0 = x0 e01 þ y0 e02 þ z0 e03
of O′. In the above, e1, e2, e3 and e01 , e02 , e03 are individual basis sets for the Cartesian
coordinates related to O and O′, respectively.
To make the situation clearer, suppose that a laser beam has been emitted at t = 0
and x = 0 in the frame O and that the beam has been emitted at t′ = 0 and x′ = 0 in
the frame O′. After a moment, the same beam hits a ball at a certain place Q. Let Q be
specified as x = x in the frame O and at x′ = x′ in the frame O′. Let time when the
beam hits the ball be t = t in the frame O and t′ = t′ in the frame O′. Naturally, the
man M chooses his standing position for x = 0 (i.e., the origin of O) and the
passenger P chooses the tail end of the train for x′ = 0 (i.e., the origin of O′). In
this situation, the special theory of relativity tells us that the following relationship
must hold:
\[
(ct)^2 - \boldsymbol{x}^2 = (ct')^2 - \boldsymbol{x}'^2 = 0.
\]
Note that the light velocity c is the same in the two frames O and O′ from the requirement of the special theory of relativity. Using the natural units, the above equation is rewritten as
\[
t^2 - \boldsymbol{x}^2 = t'^2 - \boldsymbol{x}'^2 = 0. \tag{21.13}
\]
Equation (21.13) represents an invariant (i.e., zero in this case) with respect to the
choice of space–time coordinates, or inertial frames of reference.
Before taking a step further to think of the implication of (21.13) and generalize
the situation of Example 21.1, let us introduce a mathematical technique. Let B(y, x)
be a bilinear form defined as [3, 4]
\[
B(\boldsymbol{y}, \boldsymbol{x}) \equiv \boldsymbol{y}^{\mathrm{T}}\eta\,\boldsymbol{x} = (y_1 \ \cdots \ y_n)\,\eta\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{21.14}
\]
where
\[
\boldsymbol{x} \equiv \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad \boldsymbol{y} \equiv \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.
\]
Although the coordinates xi and yi (i = 1, ⋯, n) are usually chosen from either real
numbers or complex numbers, in this chapter we assume that both xi and yi are real.
In the Minkowski space we will be dealing with in this chapter, xi and yi (i = 0, 1, 2, 3)
are real numbers. The coordinates x0 and y0 denote the time coordinate and xk and
yk (k = 1, 2, 3) represent the space coordinates. We call xi and yi (i = 0, 1, 2, 3) the
space–time coordinates altogether.
In (21.14) η is said to be the metric tensor that characterizes the vector space
which we are thinking of. In a general case of a n-dimensional vector space, we have
\[
\eta = \begin{pmatrix}
1 & & & & & \\
 & \ddots & & & & \\
 & & 1 & & & \\
 & & & -1 & & \\
 & & & & \ddots & \\
 & & & & & -1
\end{pmatrix}. \tag{21.15}
\]
In the case of the four-dimensional Minkowski space, we have
\[
\eta = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & -1
\end{pmatrix}. \tag{21.16}
\]
The tensor η of (21.16) is called Minkowski metric and the vector space pertinent
to the special theory of relativity is known as a four-dimensional Minkowski space
accompanied by space–time coordinates represented by (21.12). By virtue of the
metric tensor η, we have the following three cases:
\[
B(\boldsymbol{x}, \boldsymbol{x}):\quad
\begin{cases}
< 0, \\
= 0, \\
> 0.
\end{cases} \tag{21.17}
\]
Defining the covariant components by
\[
(x_0 \ x_1 \ x_2 \ x_3) \equiv (x^0 \ x^1 \ x^2 \ x^3)\,\eta,
\]
we have
\[
B(\boldsymbol{x}, \boldsymbol{x}) = (x_0 \ x_1 \ x_2 \ x_3)\begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}. \tag{21.18}
\]
In components,
\[
x_{\nu} = \sum_{\mu=0}^{3} x^{\mu}\eta_{\mu\nu} = \sum_{\mu=0}^{3} x^{\mu}\eta_{\nu\mu}, \tag{21.19}
\]
that is,
\[
x_0 = x^0, \quad x_1 = -x^1, \quad x_2 = -x^2, \quad x_3 = -x^3. \tag{21.20}
\]
Then we obtain
\[
B(\boldsymbol{x}, \boldsymbol{x}) = \sum_{\mu=0}^{3} x_{\mu}x^{\mu} \equiv x_{\mu}x^{\mu}, \tag{21.21}
\]
where the same subscripts and superscripts are supposed to be summed over 0 to
3. Using this summation convention, (21.13) can be rewritten as
\[
x_{\mu}x^{\mu} = x'_{\mu}x'^{\mu} = 0. \tag{21.22}
\]
Notice that η of (21.16) is a symmetric matrix. The inverse matrix of η is the same as η. That is, η⁻¹ = η, or
\[
\eta^{\mu\nu}\eta_{\nu\rho} = \delta^{\mu}_{\ \rho},
\]
where $\delta^{\mu}_{\ \rho}$ is the Kronecker delta. Once again, note that the subscript and superscript ν are summed. The symbols $\eta^{\mu\nu}$ and $\eta_{\nu\rho}$ represent η⁻¹ and η, respectively; that is, $\left(\eta^{\mu\nu}\right) = \left(\eta^{-1}\right)^{\mu\nu} = (\eta)^{\mu\nu}$ and $\left(\eta_{\mu\nu}\right) = (\eta)_{\mu\nu}$. With these notations see (11.38).
What we immediately see from (21.17) is that unlike the inner product, B(x, x)
does not have a positive definite feature. If we put e.g., x0 = x0 = 0, we have B(x,
x) ≤ 0. For this reason, the Minkowski metric η is said to be an indefinite metric or
the Minkowski space is called an indefinite metric space.
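As a concrete illustration of the index manipulations in (21.19)-(21.21), the following is a minimal numerical sketch in Python (the component values of x are merely assumed for the example). It lowers an index with the metric η of (21.16) and evaluates the bilinear form in the two equivalent ways.

```python
import numpy as np

# Minkowski metric of (21.16)
eta = np.diag([1.0, -1.0, -1.0, -1.0])

x_up = np.array([2.0, 1.0, 0.5, -0.3])    # contravariant components x^mu (assumed sample values)
x_down = eta @ x_up                        # covariant components x_mu, cf. (21.19) and (21.20)

B = x_up @ eta @ x_up                      # bilinear form B(x, x) of (21.14)
print(x_down)                              # [ 2.  -1.  -0.5  0.3]
print(np.isclose(B, x_down @ x_up))        # True: the two expressions in (21.21) agree
```

Because η is not positive definite, B(x, x) may come out negative, zero, or positive for nonzero x, in accordance with (21.17).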
Inserting E = Λ⁻¹Λ into (21.18), we rewrite it as
\[
B(\boldsymbol{x}, \boldsymbol{x}) = (x_0 \ x_1 \ x_2 \ x_3)\,\Lambda^{-1}\Lambda\begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}. \tag{21.23}
\]
Writing the transformed contravariant components as
\[
\begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \Lambda\begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}, \tag{21.24}
\]
we obtain
\[
B(\boldsymbol{x}, \boldsymbol{x}) = (x'_0 \ x'_1 \ x'_2 \ x'_3)\begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \sum_{\mu=0}^{3} x'_{\mu}x'^{\mu} = x'_{\mu}x'^{\mu} = B(\boldsymbol{x}', \boldsymbol{x}'). \tag{21.26}
\]
Indeed,
\[
B(\boldsymbol{x}', \boldsymbol{x}') = x'_{\mu}x'^{\mu} = x_{\sigma}\left(\Lambda^{-1}\right)^{\sigma}_{\ \mu}\Lambda^{\mu}_{\ \rho}x^{\rho} = x_{\sigma}\delta^{\sigma}_{\ \rho}x^{\rho} = x_{\sigma}x^{\sigma} = B(\boldsymbol{x}, \boldsymbol{x}).
\]
The covariant components are accordingly transformed as
\[
\begin{pmatrix} x'_0 \\ x'_1 \\ x'_2 \\ x'_3 \end{pmatrix} = \left(\Lambda^{-1}\right)^{\mathrm{T}}\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix}. \tag{21.27}
\]
Remark 21.1 Up to now in this book, we have not thought about the necessity to
distinguish the covariant and contravariant vectors except for a few cases (see Sect.
20.3.2). This is because if we think of a vector, e.g., in ℝn (i.e., the n-dimensional
Euclidean space), the metric called Euclidean metric is represented by an n-dimen-
sional identity matrix En. Instead of dealing with the Lorentz transformation, let us
think of a familiar real orthogonal transformation in ℝ3.
In parallel with (21.14), as a bilinear form we have
\[
B(\boldsymbol{x}, \boldsymbol{x}) = (x^1 \ x^2 \ x^3)\,E_3\begin{pmatrix} x^1 \\ x^2 \\ x^3 \end{pmatrix} = (x^1 \ x^2 \ x^3)\begin{pmatrix} x^1 \\ x^2 \\ x^3 \end{pmatrix},
where E3 is a (3, 3) identity matrix. In this case the bilinear form B(x, x) is identical to
the positive definite inner product hx| xi. With the transformation of a contravariant
vector, we have
\[
\begin{pmatrix} x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = R\begin{pmatrix} x^1 \\ x^2 \\ x^3 \end{pmatrix}. \tag{21.28}
\]
Meanwhile, in accordance with (21.27), the covariant vector transforms as
\[
\begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \end{pmatrix} = \left(R^{-1}\right)^{\mathrm{T}}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \left(R^{\mathrm{T}}\right)^{\mathrm{T}}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = R\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \tag{21.29}
\]
where with the second equality we used R-1 = RT for an orthogonal matrix. From
(21.28) and (21.29), we find that with the orthogonal transformation in ℝ3 we do not
need to distinguish a contravariant vector and a covariant vector. Note, however, that
unless the transformation is orthogonal, the contravariant vector and covariant vector
should be distinguished. Also, we will come back to this issue in Chap. 24.
frame O′ has experienced the same events E and F at x′^μ(E) and x′^μ(F), respectively. In that situation we define the following quantity s:
\[
s \equiv \sqrt{\left[x^0(F) - x^0(E)\right]^2 - \sum_{k=1}^{3}\left[x^k(F) - x^k(E)\right]^2}. \tag{21.30}
\]
The quantity s is said to be a world interval between the events E and F. The world interval between the two events is classified as follows:
\[
s:\quad
\begin{cases}
\text{timelike} & (s:\ \text{real}), \\
\text{spacelike} & (s:\ \text{imaginary}), \\
\text{lightlike, null} & (s:\ \text{zero}).
\end{cases} \tag{21.31}
\]
In terms of the special principle of relativity which demands that physical laws be
the same in every inertial frame of reference, the world interval s must be an
invariant under the Lorentz transformation. The relationship s = 0 is associated
with the principle of constancy of light velocity. Therefore, we can see (21.31) as a
natural extension of the light velocity constancy principle. Historically, this principle
has been established on the basis of the precise physical measurements including
those performed in the Michelson-Morley experiment [5].
Now we are in the position to determine an explicit form of Λ. To make the
argument simple, we revisit Example 21.1.
Example 21.2 We assume that in Fig. 21.1 the frame O′ is moving at a velocity of
v toward the positive direction of the x1-axis of the frame O so that the x1- and x′1-
axes can coincide. In this situation, it suffices to only take account of x1 as a space
coordinate. In other words, we have
\[
x'^2 = x^2 \quad \text{and} \quad x'^3 = x^3. \tag{21.32}
\]
We write the transformation of the remaining coordinates as
\[
\begin{pmatrix} x'^0 \\ x'^1 \end{pmatrix} = \begin{pmatrix} p & q \\ r & u \end{pmatrix}\begin{pmatrix} x^0 \\ x^1 \end{pmatrix}. \tag{21.34}
\]
\[
s^2 = \left[x^0(F) - x^0(E)\right]^2 - \left[x^1(F) - x^1(E)\right]^2 = \left[x^0(F)\right]^2 - \left[x^1(F)\right]^2,
\]
\[
s'^2 = \left[x'^0(F) - x'^0(E)\right]^2 - \left[x'^1(F) - x'^1(E)\right]^2 = \left[x'^0(F)\right]^2 - \left[x'^1(F)\right]^2. \tag{21.35}
\]
The invariance of the world interval requires
\[
s'^2 = \left(x'^0\right)^2 - \left(x'^1\right)^2 = \left(x^0\right)^2 - \left(x^1\right)^2 = s^2. \tag{21.36}
\]
Substituting (21.34) into (21.36) and comparing the coefficients, we obtain
\[
p^2 - r^2 = 1, \quad pq - ru = 0, \quad q^2 - u^2 = -1. \tag{21.37}
\]
Moreover, since the origin of O′ (i.e., x′¹ = 0) moves with velocity v in the frame O, we have
\[
r + uv = 0. \tag{21.39}
\]
Since (21.37) and (21.39) are second-order equations with respect to p, q, r, and u, we do not obtain a unique solution. Solving (21.37) and (21.39), we get
\[
p = \pm\frac{1}{\sqrt{1 - v^2}}, \quad u = \pm p, \quad q = -vp, \quad r = -vu.
\]
However, p, q, r, and u must satisfy the condition
\[
\lim_{v \to 0}\begin{pmatrix} p & q \\ r & u \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]
Hence, the upper (positive) signs must be chosen, and we obtain
\[
x'^0 = \frac{1}{\sqrt{1 - v^2}}\left(x^0 - vx^1\right), \qquad x'^1 = \frac{1}{\sqrt{1 - v^2}}\left(-vx^0 + x^1\right).
\]
Writing $\gamma \equiv 1/\sqrt{1 - v^2}$, these relations are expressed as
\[
\begin{pmatrix} x'^0 \\ x'^1 \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma v \\ -\gamma v & \gamma \end{pmatrix}\begin{pmatrix} x^0 \\ x^1 \end{pmatrix}, \tag{21.40}
\]
\[
\begin{pmatrix} x^0 \\ x^1 \end{pmatrix} = \begin{pmatrix} \gamma & \gamma v \\ \gamma v & \gamma \end{pmatrix}\begin{pmatrix} x'^0 \\ x'^1 \end{pmatrix}. \tag{21.41}
\]
Restoring the x² and x³ coordinates, the full transformation reads
\[
\begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} =
\begin{pmatrix}
\gamma & -\gamma v & 0 & 0 \\
-\gamma v & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}. \tag{21.42}
\]
Introducing the rapidity ω through $\tanh\omega = v$ and using
\[
\cosh\omega = \frac{1}{\sqrt{1 - \tanh^2\omega}}, \tag{21.44}
\]
we have
\[
\cosh\omega = \frac{1}{\sqrt{1 - v^2}} = \gamma, \qquad \sinh\omega = \gamma v = \frac{v}{\sqrt{1 - v^2}}. \tag{21.45}
\]
Accordingly,
\[
\begin{pmatrix}
\gamma & -\gamma v & 0 & 0 \\
-\gamma v & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix} =
\begin{pmatrix}
\cosh\omega & -\sinh\omega & 0 & 0 \\
-\sinh\omega & \cosh\omega & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}. \tag{21.46}
\]
The matrix of (21.46) is Hermitian (or real symmetric) with real positive eigen-
values and the determinant of 1. We will come back to this feature in Chap. 24. If
two successive Lorentz transformations are given by $\tanh\omega_1$ and $\tanh\omega_2$, their multiplication is given by
\[
\Lambda(\omega_1)\Lambda(\omega_2) = \Lambda(\omega_1 + \omega_2). \tag{21.47}
\]
Equation (21.47) obviously shows the closure in terms of the multiplication. The associative law with respect to the multiplication is evident. Thus, the matrices
associative law with respect to the multiplication is evident. Thus, the matrices
represented by (21.46) satisfy the axioms (A1) to (A4) of Sect. 16.1. Hence, a
collection of these matrices forms a group (i.e., the Lorentz group). As clearly
seen in (21.46) and (21.47), the constitution of the Lorentz group is well character-
ized by the variable ω, which is sometimes called “rapidity.” Notice that the matrices
given above are not unitary. These matrices are often referred to as the Lorentz
boosts.
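To illustrate this group property numerically, the following Python sketch (with assumed sample rapidities) composes two boosts of the form (21.46), restricted to the (x⁰, x¹) block, and checks that the rapidities add while the corresponding velocities combine by the relativistic addition law.

```python
import numpy as np

def boost(omega):
    """(x^0, x^1) block of the Lorentz boost (21.46)."""
    c, s = np.cosh(omega), np.sinh(omega)
    return np.array([[c, -s], [-s, c]])

w1, w2 = 0.3, 0.8                            # sample rapidities (assumed values)
print(np.allclose(boost(w1) @ boost(w2), boost(w1 + w2)))       # True: rapidities add

v1, v2 = np.tanh(w1), np.tanh(w2)            # the corresponding velocities
print(np.isclose(np.tanh(w1 + w2), (v1 + v2) / (1 + v1 * v2)))  # True: velocity-addition law
```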
Example 21.3 Here let us briefly think of the Lorentz contraction. Suppose that a
straight stick exists in a certain inertial frame of reference O. The stick may or may
not be moving relative to that frame. Strictly speaking, measuring the length of the stick in that frame consists of (1) measuring and fixing a spatial point $P\,(x^1_P, x^2_P, x^3_P)$ that corresponds to one end of the stick and (2) measuring and fixing another spatial point $Q\,(x^1_Q, x^2_Q, x^3_Q)$ corresponding to the other end. The former and latter opera-
tions correspond to the events E and F, respectively. These operations must be done
at the same time in the frame. Then, the distance between P and Q (P and Q may be
either different or identical) defined as dPQ in the frame is given by a following
absolute value of the world interval:
\[
d_{PQ} \equiv \sqrt{\sum_{k=1}^{3}\left(x^k_Q - x^k_P\right)^2}\,. \tag{21.48}
\]
We define the length of the straight stick as dPQ. We have dPQ ≥ 0; dPQ = 0 if and
only if P and Q are identical. Notice that the world interval between the events E and
F is a pure imaginary from (21.30), because those events happen at the same time.
Bearing in mind the above discussion, suppose that the straight stick is laid along
the x′1-direction with its end in contact with the tail end of the train. Suppose that the
length of the stick is L′ in the frame O′ of Example 21.1. This implies that from
(21.40) x′1 = L′. (Remember that the man standing on firm ground and the passenger
on the train adjusted their watches to t = t′ = 0, at the moment when the tail end of
the train was passing in front of the man.) If we assume that the length of the stick is
read off at t (or x0) = 0 in the frame O of Example 21.1, again from (21.40) we have
\[
L = x^1 = \sqrt{1 - v^2}\,x'^1 = L'\sqrt{1 - v^2} = L'/\gamma, \tag{21.49}
\]
where L is the length of the stick measured in the frame O. This implies that the man on firm ground judges that the stick has been shortened by the factor $\sqrt{1 - v^2}$ (i.e., the Lorentz contraction).
In an opposite way, suppose that a straight stick of its length of L is laid along the
x1-direction with its end at the origin of the frame O. Similarly to the above case,
x1 = L. If the length of the stick is read off at t′ (or x′0) = 0 in the frame O′, from
(21.41), i.e., the inverse matrix to (21.40), this time we have
\[
L' = x'^1 = \sqrt{1 - v^2}\,x^1 = L\sqrt{1 - v^2} = L/\gamma,
\]
where L′ is a length of the stick measured in the frame O′. The Lorentz contraction is
again observed.
Substituting x¹ = L in (21.40) gives x′⁰ ≠ 0. This observation implies that even though the measurements of the two ends of the stick have been made at the same time in the frame O, the simultaneity is lost in the frame O′.
The relativeness of the time dilation can be dealt with in a similar manner. The
discussion is left for readers.
In the above examples, we have shown the brief outline and constitution of the
special theory of relativity. Since the theory is closely related to the Lorentz
transformation, or more specifically the Lorentz group, we wish to survey the
constitution and properties of the Lorentz group in more detail in Chap. 24.
Under the Lorentz transformation $x'^{\mu} = \Lambda^{\mu}_{\ \rho}x^{\rho}$, the differential operator $\partial_{\nu} \equiv \partial/\partial x^{\nu}$ transforms as
\[
\partial_{\nu} \equiv \frac{\partial}{\partial x^{\nu}} = \frac{\partial x'^{\mu}}{\partial x^{\nu}}\frac{\partial}{\partial x'^{\mu}} = \frac{\partial\left(\Lambda^{\mu}_{\ \rho}x^{\rho}\right)}{\partial x^{\nu}}\frac{\partial}{\partial x'^{\mu}} = \Lambda^{\mu}_{\ \rho}\delta^{\rho}_{\ \nu}\frac{\partial}{\partial x'^{\mu}} = \Lambda^{\mu}_{\ \nu}\frac{\partial}{\partial x'^{\mu}} = \Lambda^{\mu}_{\ \nu}\partial'_{\mu}. \tag{21.51}
\]
Multiplying (21.51) by $\left(\Lambda^{-1}\right)^{\nu}_{\ \kappa}$, we have
\[
\partial_{\nu}\left(\Lambda^{-1}\right)^{\nu}_{\ \kappa} = \left(\Lambda^{-1}\right)^{\nu}_{\ \kappa}\frac{\partial}{\partial x^{\nu}} = \Lambda^{\mu}_{\ \nu}\left(\Lambda^{-1}\right)^{\nu}_{\ \kappa}\frac{\partial}{\partial x'^{\mu}} = \delta^{\mu}_{\ \kappa}\frac{\partial}{\partial x'^{\mu}} = \frac{\partial}{\partial x'^{\kappa}} = \partial'_{\kappa}. \tag{21.52}
\]
Defining
\[
A \equiv i\gamma^{\mu}\partial_{\mu},
\]
we rewrite (21.50) as
\[
(A - m)\phi(x) = 0. \tag{21.54}
\]
Alternatively, we may adopt
\[
(-A - m)\phi(x) = 0. \tag{21.56}
\]
Equations (21.54) and (21.56) provide an equally solid footing for the detailed analysis of the Dirac equation. Following the custom, however, we solely use (21.54) for our present purposes. Because the original Klein-Gordon equation (21.11) is a second-order linear differential equation (SOLDE), we have
another linearly independent solution (see Chap. 10). Letting that solution be χ(x),
we have
As in the case of Example 1.1 of Sect. 1.3, we choose exponential functions for
ϕ(x) and χ(x). More specifically, we assume the solutions described by
where
\[
x = \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} = \begin{pmatrix} x^0 \\ \boldsymbol{x} \end{pmatrix}, \qquad
p = \begin{pmatrix} p^0 \\ p^1 \\ p^2 \\ p^3 \end{pmatrix} = \begin{pmatrix} p^0 \\ \boldsymbol{p} \end{pmatrix}.
\]
Also, remark that
\[
px \equiv p_{\mu}x^{\mu} = p^{\mu}x_{\mu}. \tag{21.60}
\]
\[
\gamma^{\mu}\partial_{\mu}\,\gamma^{\nu}\partial_{\nu} = \gamma^{\mu}\gamma^{\nu}\partial_{\mu}\partial_{\nu} = \gamma^{\nu}\gamma^{\mu}\partial_{\nu}\partial_{\mu} = \gamma^{\nu}\gamma^{\mu}\partial_{\mu}\partial_{\nu} \quad (\mu \neq \nu), \tag{21.61}
\]
where the last equality was obtained by exchanging ∂ν for ∂μ. Therefore, the
coefficient of ∂μ∂ν (μ ≠ ν) is
γμγν þ γνγμ:
Taking account of the fact that the cross terms ∂μ∂ν (μ ≠ ν) are not present in the
original Klein-Gordon equation (21.11), we must have
γ μ γ ν þ γ ν γ μ = 0 ðμ ≠ νÞ: ð21:62Þ
To satisfy the relation (21.62), γ^μ must have a matrix form. With regard to the terms of $\partial_{\mu}^{\,2}$, comparing their coefficients with those of (21.11), we get
\[
\left(\gamma^0\right)^2 = 1, \qquad \left(\gamma^1\right)^2 = \left(\gamma^2\right)^2 = \left(\gamma^3\right)^2 = -1. \tag{21.63}
\]
Equations (21.62) and (21.63) are combined into
\[
\gamma^{\mu}\gamma^{\nu} + \gamma^{\nu}\gamma^{\mu} = 2\eta^{\mu\nu}. \tag{21.64}
\]
Introducing the anti-commutator
\[
\{A, B\} \equiv AB + BA,
\]
we obtain
\[
\{\gamma^{\mu}, \gamma^{\nu}\} = 2\eta^{\mu\nu}. \tag{21.66}
\]
The quantities γ μ are represented by (4, 4) matrices and called the gamma
matrices. Among several representations of the gamma matrices, the Dirac repre-
sentation is frequently used. With this representation, we have [1, 2, 6]
\[
\gamma^0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad
\gamma^k = \begin{pmatrix} 0 & \sigma^k \\ -\sigma^k & 0 \end{pmatrix} \quad (k = 1, 2, 3), \tag{21.67}
\]
where each component denotes (2, 2) matrix; 1 shows a (2, 2) identity matrix;
0 shows a (2, 2) zero matrix; σ k (k = 1, 2, 3) are the Pauli spin matrices given in
(20.41). That is, as a full representation of the gamma matrices we have
1 0 0 0
0 1 0 0
γ0 = ,
0 0 - 1 0
0 0 0 -1
0 0 0 1 0 0 0 i
0 0 1 0 0 0 -i 0
γ1 = , γ2 = , ð21:68Þ
0 -1 0 0 0 -i 0 0
-1 0 0 0 i 0 0 0
0 0 -1 0
0 0 0 1
γ3 = :
1 0 0 0
0 -1 0 0
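The defining relation (21.66) is easy to verify numerically. The following Python sketch builds the Dirac representation from the block form (21.67), assuming the standard Pauli matrices for σ^k, and checks the anticommutation relations; it is meant only as a consistency check of the algebra.

```python
import numpy as np

I2, O2 = np.eye(2), np.zeros((2, 2))
# standard Pauli matrices (assumed to coincide with (20.41))
sx = np.array([[0, 1], [1, 0]])
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]])

# Dirac representation built from the block form (21.67)
gamma = [np.block([[I2, O2], [O2, -I2]])] + \
        [np.block([[O2, s], [-s, O2]]) for s in (sx, sy, sz)]

eta = np.diag([1, -1, -1, -1])
for mu in range(4):
    for nu in range(4):
        anti = gamma[mu] @ gamma[nu] + gamma[nu] @ gamma[mu]
        assert np.allclose(anti, 2 * eta[mu, nu] * np.eye(4))
print("{gamma^mu, gamma^nu} = 2 eta^{mu nu} verified")   # the Clifford algebra (21.66)
```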
Let us determine the solutions of the Dirac equation. Replacing ϕ(x) and χ(x) of
(21.58) with those of (21.59), we get
In accordance with (21.68) and (21.69), u( p) and v( p) are (4, 1) matrices, i.e.,
column vectors or strictly speaking spinors to be determined below. If we replaced
p of v( p) with -p, where
\[
p = \begin{pmatrix} p^0 \\ p^1 \\ p^2 \\ p^3 \end{pmatrix} = \begin{pmatrix} E \\ \boldsymbol{p} \end{pmatrix}, \qquad
-p = \begin{pmatrix} -p^0 \\ -p^1 \\ -p^2 \\ -p^3 \end{pmatrix} = \begin{pmatrix} -E \\ -\boldsymbol{p} \end{pmatrix}, \tag{21.70}
\]
we would have
\[
\left(p_{\mu}\gamma^{\mu} - m\right)v(-p) = 0. \tag{21.71}
\]
So far, the relevant discussion looks parallel with that appeared in Example 1.1 of
Sect. 1.3. However, the present case differs from the previous example in the
following aspects: (1) Although boundary conditions (BCs) of Dirichlet type were
imposed in the previous case, we are dealing with a plane wave solution with an
infinite space–time extent. (2) In the previous case (Example 1.1), k could take both
positive and negative numbers. In the present case, however, a negative p0 does not
fit naturally into our intuition, even though the Lorentz covariance permits a negative
number for p0. This is because p0 represents energy and unlike the momentum p, p0
does not have “directionality.”
Instead of considering the components of p as quantities ranging over both negative and positive values without restriction, we single out p⁰ and treat it as a positive parameter (0 < p⁰ < +∞) whose value depends on |p|, while assuming that the components of
\[
\boldsymbol{p} = \begin{pmatrix} p^1 \\ p^2 \\ p^3 \end{pmatrix}
\]
are parameters freely changeable between −∞ and +∞. This treatment seems at first glance to violate the Lorentz invariance, but that is not the case. We will discuss the issue in Chap. 22. Thus, we safely assign p⁰ (>0) and −p⁰ (<0) to u( p) and v( p), respectively.
Meanwhile, as shown in (1.40) we have a relationship between the time coordinate
and energy described by
\[
i\hbar\frac{\partial}{\partial t} = i\frac{\partial}{\partial t} = i\frac{\partial}{\partial x^0} \;\leftrightarrow\; E = p^0. \tag{1.40}
\]
Therefore, once we have defined p0 > 0, ϕ(x) = u( p) e-ipx of (21.58) and (21.59)
must be assigned to the positive-energy state and χ(x) = v( p) eipx of (21.58) and
(21.59) to the negative-energy state. To further specify the solutions of (21.69) we
restate (21.69) as
\[
\left(p_{\mu}\gamma^{\mu} - m\right)u(p, h) = 0, \tag{21.73}
\]
\[
\left(-p_{\mu}\gamma^{\mu} - m\right)v(p, h) = 0, \tag{21.74}
\]
where (21.73) and (21.74) represent the positive-energy state and negative-energy
state, respectively. The index h is called helicity and takes a value either +1 or -1.
Notice that we have abandoned u( p) and v( p) in favor of u(p, h) and v(p, h). We will
give further consideration to the index h in the next section.
p0 - m 0 p3 - p1 - ip2 c0
0 p -m
0
- p þ ip2
1
- p3 c1
= 0,
- p3 p1 þ ip2 - p0 - m 0 c2
p1 - ip2 p3 0 -p -m
0 c3
ð21:75Þ
c01
where c2 is a column vector representation of u(p, h). That is, we have
c3
c
c01
uðp, hÞ = c2
c3 : ð21:76Þ
c
We must have u(p, h) ≢ 0 to obtain a physically meaningful solution with u(p, h).
For this, the determinant of the (4, 4) matrix of (21.75) should be zero. That is,
p0 - m 0 p3 - p1 - ip2
0 p0 - m - p1 þ ip2 - p3
= 0: ð21:77Þ
- p3 p1 þ ip2 - p0 - m 0
p1 - ip2 p3 0 - p0 - m
This is a quartic equation with respect to pμ. The equation leads to a simple form
described by
\[
\left[\left(p^0\right)^2 - \boldsymbol{p}^2 - m^2\right]^2 = 0. \tag{21.78}
\]
Hence, we have
\[
\left(p^0\right)^2 - \boldsymbol{p}^2 - m^2 = 0 \qquad \text{or} \qquad p^0 = \sqrt{\boldsymbol{p}^2 + m^2}. \tag{21.79}
\]
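The reduction of the quartic condition (21.77) to (21.78) can also be checked numerically. The sketch below (standard Dirac representation; assumed sample values of m and p, with p⁰ deliberately taken off-shell) confirms that det(p_μγ^μ − m) = [(p⁰)² − p² − m²]² holds as an algebraic identity.

```python
import numpy as np

I2, O2 = np.eye(2), np.zeros((2, 2))
sx = np.array([[0, 1], [1, 0]]); sy = np.array([[0, -1j], [1j, 0]]); sz = np.array([[1, 0], [0, -1]])
gamma = [np.block([[I2, O2], [O2, -I2]])] + \
        [np.block([[O2, s], [-s, O2]]) for s in (sx, sy, sz)]

m = 1.0
p_vec = np.array([0.4, -0.2, 0.7])     # assumed sample momentum
p0 = 2.5                               # deliberately off-shell; the identity still holds

# p_mu gamma^mu = p^0 gamma^0 - p . gamma
G = p0 * gamma[0] - sum(p_vec[k] * gamma[k + 1] for k in range(3)) - m * np.eye(4)
lhs = np.linalg.det(G)
rhs = (p0**2 - p_vec @ p_vec - m**2)**2
print(np.isclose(lhs.real, rhs), abs(lhs.imag) < 1e-9)   # True True
```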
Remember that p0 represents the positive energy. Note that (21.79) is consistent
with (1.8), (1.9), and (1.20); recall the natural units. The result, however, has already
been implied in (21.2). Thus, we have to find another way to solve (21.75). To this
end, first let us consider (21.75) in a simpler form. That is, putting p1 = p2 = 0 in
(21.75), (21.75) reads as
p0 - m 0 p3 0 c0
0 p0 - m 0 - p3 c1
= 0: ð21:80Þ
- p3 0 - p0 - m 0 c2
0 p3 0 - p0 - m c3
Clearly, the first and second column vectors are linearly independent with each
other. Multiplying the first column by ( p0 + m) and the third column by p3 produces
the same column vectors. This implies that the first and third column vectors are
linearly dependent with each other. In a similar manner, the second and fourth
column vectors are linearly dependent as well. Thus, the rank of the matrix of
(21.80) is two. If p3 = 0 in (21.80), i.e., p = 0, from (21.79) we have
p0 = m (>0). Then, (21.80) can be rewritten as
\[
\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & -2m & 0 \\
0 & 0 & 0 & -2m
\end{pmatrix}
\begin{pmatrix} c^0 \\ c^1 \\ c^2 \\ c^3 \end{pmatrix} = 0. \tag{21.81}
\]
Equation (21.81) gives the solution of an electron at rest. We will come back to
this simple situation in Sect. 24.5. The rank of the matrix of (21.81) is two. In a
general case of (21.75) the rank of the matrix is two as well. This means the presence
of two linearly independent solutions for (21.75). The index h (h = ± 1) of u(p, h)
represents and distinguishes these two solutions. The situation is the same as the
negative-energy solutions of v(p, h) given in (21.74); see Sect. 21.3.3.
Now, (21.75) can be rewritten in a succinct form as
\[
\begin{pmatrix} p^0 - m & -\boldsymbol{\sigma}\cdot\boldsymbol{p} \\ \boldsymbol{\sigma}\cdot\boldsymbol{p} & -p^0 - m \end{pmatrix}
\begin{pmatrix} c^0 \\ c^1 \\ c^2 \\ c^3 \end{pmatrix} = 0. \tag{21.82}
\]
Here we define the operator H by
\[
H \equiv \frac{1}{|\boldsymbol{p}|}\begin{pmatrix} \boldsymbol{\sigma}\cdot\boldsymbol{p} & 0 \\ 0 & \boldsymbol{\sigma}\cdot\boldsymbol{p} \end{pmatrix} \tag{21.83}
\]
and write the four-component spinor in the two-component block form
\[
\mathit{\Chi} \equiv \begin{pmatrix} \chi \\ s\chi \end{pmatrix}, \tag{21.84}
\]
where s is a number to be determined below. We also define
\[
S \equiv \frac{1}{|\boldsymbol{p}|}\boldsymbol{\sigma}\cdot\boldsymbol{p}. \tag{21.85}
\]
The factor σ·p is contained in common in the matrix on the LHS of (21.82) and in H of (21.83). Using the notation of (21.85), we have
\[
H = \begin{pmatrix} S & 0 \\ 0 & S \end{pmatrix} = \frac{1}{|\boldsymbol{p}|}\begin{pmatrix} \boldsymbol{\sigma}\cdot\boldsymbol{p} & 0 \\ 0 & \boldsymbol{\sigma}\cdot\boldsymbol{p} \end{pmatrix}. \tag{21.86}
\]
The (2, 2) matrix S satisfies
\[
S^{\dagger}S = E_2, \qquad \det S = -1. \tag{21.87}
\]
Since the (2, 2) matrix S is both Hermitian and unitary, (21.87) immediately implies that the eigenvalues of S are ±1; see Chaps. 12 and 14. Representing the eigenvalue equation of S as
\[
S\chi = h\chi \quad (h = \pm 1), \tag{21.89}
\]
we obtain
\[
H\mathit{\Chi} = \begin{pmatrix} S & 0 \\ 0 & S \end{pmatrix}\begin{pmatrix} \chi \\ s\chi \end{pmatrix} = \begin{pmatrix} S\chi \\ sS\chi \end{pmatrix} = h\begin{pmatrix} \chi \\ s\chi \end{pmatrix} = h\mathit{\Chi}. \tag{21.90}
\]
Defining
\[
G \equiv \begin{pmatrix} p^0 - m & -\boldsymbol{\sigma}\cdot\boldsymbol{p} \\ \boldsymbol{\sigma}\cdot\boldsymbol{p} & -p^0 - m \end{pmatrix} = p_{\mu}\gamma^{\mu} - m, \tag{21.91}
\]
we have
\[
Gu(p, h) = 0 \tag{21.92}
\]
and
\[
[G, H] = 0. \tag{21.93}
\]
That is, G and H commute with each other. Note that (21.93) results naturally
from the presence of gamma matrices. We remember that two Hermitian matrices
commute if and only if there exists a complete orthonormal set (CONS) of common
eigenfunctions (Theorem 14.14). This is almost true of G and H. Care should be
taken, however, to apply Theorem 14.14 to the present case. This is because G is
neither a normal matrix nor an Hermitian matrix except for special cases. Therefore,
the matrix G is in general impossible to diagonalize by the unitary similarity
transformation (Theorem 14.5). From the point of view of the matrix algebra, it is
of great interest whether or not G can be diagonalized. Moreover, if G could be
diagonalize, it is important to know what kind of non-singular matrix is used for the
similarity transformation (i.e., diagonalization) and what eigenvalues G possesses.
Yet, (21.92) immediately tells us that G possesses an eigenvalue zero. Otherwise, we
21.3 Constitution and Solutions of the Dirac Equation 953
are to have u(p, h) 0. We will come back to this intriguing discussion in Chap. 24
to examine the condition for G to be diagonalized.
Here, we examine what kind of eigenfunctions G and H share as the common
eigenvectors. Since G and H commute, from (21.84) and (21.90) we have
\[
G\mathit{\Chi} = c\mathit{\Chi}, \tag{21.95}
\]
where c is a constant. In the present case c = 0, as the following computation shows:
\[
G\mathit{\Chi} = G\begin{pmatrix} \chi \\ s\chi \end{pmatrix}
= \begin{pmatrix} p^0 - m & -\boldsymbol{\sigma}\cdot\boldsymbol{p} \\ \boldsymbol{\sigma}\cdot\boldsymbol{p} & -p^0 - m \end{pmatrix}\begin{pmatrix} \chi \\ s\chi \end{pmatrix}
= \begin{pmatrix} p^0 - m & -PS \\ PS & -p^0 - m \end{pmatrix}\begin{pmatrix} \chi \\ s\chi \end{pmatrix}
= \begin{pmatrix} \left(p^0 - m\right)\chi - PsS\chi \\ PS\chi - \left(p^0 + m\right)s\chi \end{pmatrix}
= \begin{pmatrix} \left(p^0 - m\right)\chi - Psh\chi \\ Ph\chi - \left(p^0 + m\right)s\chi \end{pmatrix} = 0, \tag{21.96}
\]
where
\[
P \equiv |\boldsymbol{p}|. \tag{21.97}
\]
For (21.96) to be satisfied, namely, for $\begin{pmatrix} \chi \\ s\chi \end{pmatrix}$ to be an eigenvector of G that belongs to the eigenvalue zero, the coefficient of χ in each row of (21.96) must vanish. That is, we must have
\[
p^0 - m - Psh = 0, \qquad Ph - \left(p^0 + m\right)s = 0,
\]
so that
\[
s = \frac{Ph}{p^0 + m} = \frac{p^0 - m}{Ph}, \tag{21.98}
\]
where the two expressions are consistent with each other because
\[
(Ph)^2 = P^2h^2 = P^2 = \boldsymbol{p}^2 = \left(p^0 + m\right)\left(p^0 - m\right) = \left(p^0\right)^2 - m^2,
\]
which is equivalent to (21.79). Thus, we have confirmed that $\mathit{\Chi} = \begin{pmatrix} \chi \\ s\chi \end{pmatrix}$ of (21.84) is certainly an eigenfunction of G, if we choose s as $s = Ph/\left(p^0 + m\right)$. At the same time, we find that $\mathit{\Chi} = \begin{pmatrix} \chi \\ s\chi \end{pmatrix}$ is indeed the common eigenfunction shared by G and H with the definite value of s designated by (21.98). This eigenfunction belongs to the eigenvalue h (h = ±1) of H and the eigenvalue 0 of G. It will be convenient for the purpose of future reference to define a positive number S as
\[
S \equiv \frac{P}{p^0 + m}, \tag{21.99}
\]
so that
\[
\mathit{\Chi} = \begin{pmatrix} \chi \\ hS\chi \end{pmatrix}. \tag{21.100}
\]
Accordingly, the positive-energy solution is expressed as
\[
u(p, h) = N\begin{pmatrix} \chi \\ hS\chi \end{pmatrix}, \tag{21.101}
\]
where N is a normalization constant (see Sect. 21.3.4). Parametrizing the momentum as
\[
p^1 = P\sin\theta\cos\phi, \quad p^2 = P\sin\theta\sin\phi, \quad p^3 = P\cos\theta,
\]
we have
\[
\chi_{-} = \begin{pmatrix} e^{i\phi/2}\cos\dfrac{\theta}{2} \\[6pt] -e^{-i\phi/2}\sin\dfrac{\theta}{2} \end{pmatrix}
\quad\text{and}\quad
\chi_{+} = \begin{pmatrix} e^{i\phi/2}\sin\dfrac{\theta}{2} \\[6pt] e^{-i\phi/2}\cos\dfrac{\theta}{2} \end{pmatrix}, \tag{21.103}
\]
where χ₋ and χ₊ are the eigenfunctions of the operator S of (21.85) belonging to the helicity h = −1 and +1, respectively. Then, as the positive-energy solutions of (21.73), we have
\[
u(p, -1) = N\begin{pmatrix} e^{i\phi/2}\cos\dfrac{\theta}{2} \\[6pt] -e^{-i\phi/2}\sin\dfrac{\theta}{2} \\[6pt] -Se^{i\phi/2}\cos\dfrac{\theta}{2} \\[6pt] Se^{-i\phi/2}\sin\dfrac{\theta}{2} \end{pmatrix}, \qquad
u(p, +1) = N\begin{pmatrix} e^{i\phi/2}\sin\dfrac{\theta}{2} \\[6pt] e^{-i\phi/2}\cos\dfrac{\theta}{2} \\[6pt] Se^{i\phi/2}\sin\dfrac{\theta}{2} \\[6pt] Se^{-i\phi/2}\cos\dfrac{\theta}{2} \end{pmatrix}, \tag{21.104}
\]
where the number S in the lower components is that of (21.99).
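The structure of these solutions can be verified numerically. In the following Python sketch (standard Dirac representation and Pauli matrices assumed; sample values of m and p assumed), the two-spinors are obtained by diagonalizing σ·p/|p| instead of using the closed form (21.103); the spinors u(p, h) are then assembled as in (21.101) with N = √(p⁰ + m) of (21.138), and both Gu = 0 and Hu = hu are checked.

```python
import numpy as np

I2, O2 = np.eye(2), np.zeros((2, 2))
sx = np.array([[0, 1], [1, 0]]); sy = np.array([[0, -1j], [1j, 0]]); sz = np.array([[1, 0], [0, -1]])
gamma = [np.block([[I2, O2], [O2, -I2]])] + \
        [np.block([[O2, s], [-s, O2]]) for s in (sx, sy, sz)]

m = 1.0
p = np.array([0.3, -0.5, 0.4])                # assumed sample momentum
P = np.linalg.norm(p)
p0 = np.sqrt(P**2 + m**2)                     # on-shell positive energy, (21.79)
S_num = P / (p0 + m)                          # the number S of (21.99)

sp = (p[0] * sx + p[1] * sy + p[2] * sz) / P  # the operator S of (21.85)
eigval, eigvec = np.linalg.eigh(sp)           # eigenvalues -1 and +1

G = p0 * gamma[0] - sum(p[k] * gamma[k + 1] for k in range(3)) - m * np.eye(4)
H = np.block([[sp, O2], [O2, sp]])            # helicity operator of (21.83)

for h, chi in zip(np.round(eigval).astype(int), eigvec.T):
    u = np.sqrt(p0 + m) * np.concatenate([chi, h * S_num * chi])   # (21.101) with (21.138)
    print(h, np.allclose(G @ u, 0), np.allclose(H @ u, h * u))     # -1 True True / 1 True True
```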
Using (21.74), with the negative-energy sate v(p, h) we have the Dirac equation
expressed as
- p0 - m - σð - pÞ χ~
- pμ γ μ - m vðp, hÞ = σð - pÞ p0 - m χ
t~
= 0: ð21:105Þ
~=
Χ χ~
: ð21:106Þ
t~χ
~ as
Meanwhile, defining S
~ 1 1
S σ ð- pÞ = σ ð- pÞ = - S ð21:107Þ
j - pj P
~ as
and defining G
~ - pμ γ μ - m,
G ð21:108Þ
we rewrite (21.105) as
~ χ~ =
G - p0 - m - PS ~ χ~
= 0: ð21:109Þ
χ
t~ PS~ p0 - m χ
t~
where we put
~ χ = h~
S~ χ:
- p0 þ m - P ht = 0, ð21:111Þ
hP þ p0 - m t = 0: ð21:112Þ
Then, we get
p0 þ m Ph
t= - =- 0 : ð21:113Þ
Ph p -m
Ph
χ~ p0 þ m χ~ p0 þ m χ
hS~ 1 S~
χ
= p0 þ m = = , ð21:114Þ
χ
t~ Ph - χ~
Ph - χ~ S - h~
χ
where with the last equality the relations h2 = 1 and (21.99) were used. As a
~ the eigenvalue equation (21.89) is
consequence of switching the operator S to S,
replaced with
~ χ = h~
S~ χ = - S~
χ: ð21:115Þ
~ 1
~
H S 0
= -H= - σp 0
, ð21:116Þ
~
0 S j -p j 0 - σp
we get
~ ~χ
~
H S~χ
= S 0 S~χ
= S S~
= Sh~χ
=h S~χ
: ð21:117Þ
- h~χ 0 ~
S - h~χ ~χ
- hS~ - h2 χ~ - h~χ
~ S~χ
vðp, hÞ = N , ð21:118Þ
- h~χ
θ θ
- eiϕ=2 cos - eiϕ=2 sin
χ~þ = θ
2
and χ~ - = 2
θ
, ð21:119Þ
e - iϕ=2 sin - e - iϕ=2 cos
2 2
θ θ
- Seiϕ=2 cos - Seiϕ=2 sin
2 2
- iϕ=2
θ - iϕ=2
θ
Se sin - Se cos
~
vðp, þ1Þ = N 2 ~
, vðp, - 1Þ = N 2 : ð21:120Þ
θ θ
e iϕ=2
cos -e iϕ=2
sin
2 2
θ θ
- e - iϕ=2 sin -e - iϕ=2
cos
2 2
The column vectors of (21.104) and (21.120) expressed as (4, 1) matrices are
called the Dirac spinors. In both the cases, we retain the freedom to choose a phase
factor, because it does not alter the quantum state. Strictly speaking, the factor e±ipx
of (21.59) is a phase factor, but this factor is invariant under the Lorentz transfor-
mation. Ignoring the phase factor, with the second equation of (21.120) we may
choose
θ
Seiϕ=2 sin
2
- iϕ=2
θ
Se cos
~
vðp, - 1Þ = N 2 : ð21:121Þ
θ
eiϕ=2 sin
2
θ
e - iϕ=2 cos
2
We will revisit this issue in relation to the Dirac adjoint and the charge conjuga-
tion (vide infra).
Since the rank of G̃ is two, any negative-energy solution can be expressed as a linear combination of v(p, +1) and v(p, −1), as in the case of the positive-energy solutions. In describing the Dirac spinors, the following concrete example helps clarify the point.

Example 21.4 In Sect. 21.3.1 we mentioned that the parameter p can be chosen freely between −∞ and +∞ in the three-dimensional space. We now wish to examine how the Dirac spinor is described when the momentum is switched from p to −p. Choosing u(p, −1) for the Dirac spinor, for instance, we had
θ
eiϕ=2 cos
2
θ
- e - iϕ=2 sin
uðp, - 1Þ = N 2
: ð21:104Þ
θ
- Seiϕ=2 cos
2
θ
Se - iϕ=2 sin
2
Corresponding to p → - p, we have
see Sect. 20.5.3. Upon the replacement of (20.338), u(p, -1) is converted to
iϕ θ iϕ θ
ð ± iÞe 2 sin e 2 sin
2 2
- iϕ θ - iϕ2 θ
ð ± iÞe 2 cos e cos
2 2
u0 ð - p, þ1Þ = N iϕ θ = ð ± iÞN iϕ θ
ð ± iÞSe 2 sin Se 2 sin
2 2
iϕ θ iϕ θ
ð ± iÞSe - 2 cos Se - 2 cos
2 2
= ð ± iÞuðp, þ1Þ,
ð21:122Þ
where u′ denotes the change in the functional form; the helicity has been switched
from -1 to +1 according to (21.89). With the above switching, we have
With the negative-energy solutions, we obtain a similar result. Then, aside from
the phase factor, the momentum switching produces
χ - $ χ þ; χþ $ χ -; h - $ hþ ,
A = L þ σ: ð21:125Þ
A p = L p þ σ p: ð21:126Þ
If the angular momentum is measured in the direction of motion (i.e., the direction
parallel to p), L p = 0. It is because in this situation we can represent
L p = ðx × pÞ p = 0,
A p = σ p: ð21:127Þ
A p = σ p = c, ð21:128Þ
where c is a constant. That is, the component of the total angular momentum in the
direction of motion is conserved and equal to the component of the spin angular
momentum in the direction of motion.
Getting back to the definition of S, we may define it as
1
S lim σ p: ð21:129Þ
jpj → 0 j pj
In this section, we wish to think of how we can specify the quantum state of the plane
wave of an electron in the relativistic Dirac field. We know from Chaps. 3 and 14 that
it is important to specify a set of mutually commutative physical quantities (i.e.,
Hermitian operators) to characterize the physical system. Normally, it will suffice to
describe the Hamiltonian of the system and seek the operators that commute with the
Hamiltonian.
First, let us think of the one-particle Hamiltonian Hp for the plane wave with the
positive energy. It is given by [2]
H p = γ 0 ðγ p þ mÞ: ð21:130Þ
σ ∙p
Hp = m
σ ∙p -m
: ð21:131Þ
χ
Hence, operating Hp on hSχ
, we have
χ σ ∙p χ mχ þP hhSχ p0 χ χ
Hp hSχ
= m
σ ∙p -m hSχ
= P hχ - mhSχ
= hSp0 χ
= p0 hSχ
: ð21:132Þ
S~χ
Χ0 = - h~χ
, ð21:133Þ
we have
S~χ
- p0 - h~χ
: ð21:134Þ
and (21.118) are the simultaneous eigenstates of the Hamiltonian and Hermitian
helicity operator of the Dirac field. Naturally, these Dirac spinors constitute the
complete set of the solutions of the Dirac equation. In Table 21.1 we summarize the
matrix form of the Hamiltonian and helicity operators relevant to the Dirac field
together with their simultaneous eigenstates and their eigenvalues.
It is worth comparing the Dirac equation with the Schrödinger equation from the
point of view of an eigenvalue problem. An energy eigenvalue equation based on the
Schrödinger equation is usually described as
where H denotes the Hamiltonian. In a hydrogen atom, the total angular momentum
operator L2 is commutative with H; see (3.15). Since L2 (or M2) and Lz (or Mz) have
simultaneously been diagonalized, Lz is commutative with H; see (3.158) and
(3.159) as well. Since all H, L2, and Lz are Hermitian operators, there must exist a
complete orthonormal set of common eigenvectors (Theorem 14.14). Thus, with the
~ ðnÞ
l ðθ, ϕÞRl ðr Þ.
hydrogen atom, the solution ϕ(x) is represented by (3.300), i.e., Y m
As compared with the Schrödinger equation, the eigenvalue equation of the Dirac
equation is given by
\[
Gu(p, h) = 0 \qquad \text{or} \qquad \tilde{G}v(p, h) = 0.
\]
(The explicit matrix forms are summarized in Table 21.1.) With the normalization constants N and Ñ, we have N = Ñ = √(p⁰ + m); see (21.138) and (21.143).
In a general case where p ≠ 0, however, the eigenvectors eligible for the solutions of the Dirac equation are not likely to constitute a CONS. As already mentioned in Sect. 21.3.2, the fact that neither G nor G̃ is Hermitian in the general case makes the problem more complicated. Hence, we must abandon the familiar orthonormalization method based on the inner product in favor of another approach. We develop such a method in the next section accordingly.
The Dirac adjoint $\bar{\psi}$ of a spinor ψ is defined by
\[
\bar{\psi} \equiv \psi^{\dagger}\gamma^0. \tag{21.135}
\]
Using (21.101), we have
\[
\bar{u}(p, h)u(p, h) = |N|^2\left(\chi^{\dagger} \ \ hS\chi^{\dagger}\right)\gamma^0\begin{pmatrix} \chi \\ hS\chi \end{pmatrix}
= |N|^2\left(\chi^{\dagger}\chi - h^2S^2\chi^{\dagger}\chi\right) \tag{21.136}
\]
\[
= |N|^2\left(1 - S^2\right) = |N|^2\left[1 - \left(\frac{P}{p^0 + m}\right)^2\right] = |N|^2\,\frac{2m}{p^0 + m},
\]
where χ is normalized so that χ†χ = 1. Requiring that $\bar{u}(p, h)u(p, h) = 2m$, we get
\[
N = \sqrt{p^0 + m}. \tag{21.138}
\]
Thus, the normalized positive-energy solution is
\[
u(p, h) = \sqrt{p^0 + m}\begin{pmatrix} \chi \\ hS\chi \end{pmatrix}. \tag{21.139}
\]
\[
\chi_{h}^{\dagger}\chi_{h'} = \delta_{hh'}.
\]
With the negative-energy solution, using (21.114) and normalizing the function given in (21.118) in a similar manner, we obtain
\[
v(p, h) = \tilde{N}\begin{pmatrix} S\tilde{\chi} \\ -h\tilde{\chi} \end{pmatrix} = \sqrt{p^0 + m}\begin{pmatrix} S\tilde{\chi} \\ -h\tilde{\chi} \end{pmatrix}.
\]
That is, the normalization constant
\[
\tilde{N} = \sqrt{p^0 + m} \tag{21.143}
\]
for (21.118) is the same as the constant N of (21.138). Note, however, the minus sign
in (21.142). Thus, the normalization of the eigenfunctions differs from that based on
the positive definite inner product (see Theorem 13.2: Gram–Schmidt
orthonormalization Theorem).
With regard to the normalization of the plane wave solutions of the Dirac
equation, we have the following useful relations:
where the summation with h should be taken as both -1 and +1. We will come back
to these important expressions for the normalization in Chap. 24 in relation to the
projection operators.
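The normalization just described can be confirmed numerically. The following Python sketch (standard Dirac representation and Pauli matrices assumed; sample m and p assumed; the two-spinors are obtained by numerical diagonalization) checks ū(p, h)u(p, h) = 2m and the corresponding result v̄(p, h)v(p, h) = −2m for spinors built with the structures of (21.139) and (21.118) and N = Ñ = √(p⁰ + m).

```python
import numpy as np

I2, O2 = np.eye(2), np.zeros((2, 2))
sx = np.array([[0, 1], [1, 0]]); sy = np.array([[0, -1j], [1j, 0]]); sz = np.array([[1, 0], [0, -1]])
gamma0 = np.block([[I2, O2], [O2, -I2]])

m = 1.0
p = np.array([0.3, -0.5, 0.4])                  # assumed sample momentum
P = np.linalg.norm(p); p0 = np.sqrt(P**2 + m**2)
S_num = P / (p0 + m)                            # the number S of (21.99)

sp = (p[0] * sx + p[1] * sy + p[2] * sz) / P
_, eigvec = np.linalg.eigh(sp)                  # columns: helicity eigen-two-spinors

for h, chi in zip((-1, +1), eigvec.T):
    u = np.sqrt(p0 + m) * np.concatenate([chi, h * S_num * chi])   # structure of (21.139)
    v = np.sqrt(p0 + m) * np.concatenate([S_num * chi, -h * chi])  # structure of (21.118)
    ubar_u = (u.conj() @ gamma0 @ u).real       # Dirac-adjoint norm, (21.135)
    vbar_v = (v.conj() @ gamma0 @ v).real
    print(np.isclose(ubar_u, 2 * m), np.isclose(vbar_v, -2 * m))   # True True
```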
We remark the following relationship with respect to the Dirac adjoint. We had
Guðp, hÞ = 0: ð21:92Þ
G{ = pμ ðγ μ Þ{ - m = pμ γ 0 γ μ γ 0 - m, ð21:147Þ
ð γ μ Þ{ = γ 0 γ μ γ 0 : ð21:148Þ
Multiplying both sides of (21.147) by γ 0 from both the left and right, we get
γ 0 G{ γ 0 = pμ γ μ - m = G: ð21:149Þ
That is,
Multiplying both sides of (21.151) by uðp, hÞ from the left and u(p, h′) from the
right, we obtain
uðp, hÞ½ðpν γ ν Þγ μ þ γ μ ðpν γ ν Þuðp, h0 Þ = 2pμ uðp, hÞuðp, h0 Þ = 2pμ 2mδhh0 , ð21:152Þ
where with the last equality we used (21.141). Using (21.92) and (21.150), we get
Likewise, we have
Equation (21.157) immediately follows from the fact that u(±p, h) and v(±p, h)
belong to the different energy eigenvalues p0 and -p0, respectively; see the discus-
sion of Sect. 21.3.4.
As in the case of (21.141) and (21.142), the normalization based upon the Dirac adjoint leaves (21.141) and (21.142) invariant under multiplication by a phase factor c = e^{iθ} (θ: real). That is, instead of u(p, h) and v(p, h) we may freely use cu(p, h) and cv(p, h). Yet, we normally impose a rather strong constraint on the eigenfunctions u(p, h) and v(p, h) (vide infra). Such a constraint is known as the charge conjugation or charge-conjugation transformation, which is a kind of discrete transformation and can be regarded as interchanging particles and antiparticles. The charge-conjugation transformation of a function ψ is defined by
\[
\psi_{C} \equiv i\gamma^2\psi^{*}. \tag{21.158}
\]
Equivalently, we may write
\[
\psi_{C} = C\bar{\psi}^{\mathrm{T}}, \tag{21.159}
\]
where C is given by
\[
C \equiv i\gamma^2\gamma^0. \tag{21.160}
\]
In fact,
\[
\psi_{C} = C\bar{\psi}^{\mathrm{T}} = i\gamma^2\gamma^0\left(\psi^{\dagger}\gamma^0\right)^{\mathrm{T}} = i\gamma^2\gamma^0\left(\gamma^0\right)^{\mathrm{T}}\psi^{*} = i\gamma^2\gamma^0\gamma^0\psi^{*} = i\gamma^2\psi^{*}. \tag{21.161}
\]
ψ 1
ψ 2
ψ = , ð21:162Þ
ψ 3
ψ 4
we obtain
0 0 0 -1 ψ 1 - ψ 4
0 0 1 0 ψ 2 ψ 3
ψ C = iγ 2 ψ = 0 1 0 0 = , ð21:163Þ
ψ 3 ψ 2
-1 0 0 0 ψ 4 - ψ 1
where we used the representation of (21.68) with γ 2. As for the representation of the
matrix C, we get
0 0 0 1
0 0 -1 0
C = iγ 2 γ 0 = 0 1 0 0 : ð21:164Þ
-1 0 0 0
\[
\left(\psi_{C}\right)_{C} = i\gamma^2\left(i\gamma^2\psi^{*}\right)^{*} = i\gamma^2(-i)\left(-\gamma^2\right)\psi = -\left(\gamma^2\right)^2\psi = -\left(-E_4\right)\psi = \psi, \tag{21.165}
\]
Also, we have
iϕ
θ
e 2 sin 2
0 0 0 -1
iϕ
0 0 1 0 e - 2 cos θ
2
iγ 2 uðp, þ1Þ = N 0 1 0 0 iϕ
θ
Se 2 sin 2
-1 0 0 0 iϕ
Se - 2 cos θ
2
θ
- Seiϕ=2 cos
2
- iϕ=2 θ
Se sin
2
= N θ = vðp, þ1Þ,
eiϕ=2 cos
2
θ
- e - iϕ=2 sin
2
iϕ
θ
e 2 cos 2
0 0 0 -1
iϕ
0 0 1 0 - e - 2 sin θ
2
iγ 2 uðp, - 1Þ = N 0 1 0 0 iϕ
θ
- Se 2 cos 2
-1 0 0 0 iϕ
Se - 2 sin θ
2
θ ð21:168Þ
- Seiϕ=2 sin
2
- iϕ=2 θ
- Se cos
2
=N θ = vðp, - 1Þ:
-e iϕ=2
sin
2
θ
- e - iϕ=2 cos
2
Let us confirm that the charge conjugation converts the positive-energy equation into the negative-energy one. Starting from
\[
\left(p_{\mu}\gamma^{\mu} - m\right)u(p, h) = 0
\]
and taking its complex conjugate (only γ² changes sign under complex conjugation in the present representation), we have
\[
\Bigl[p_2\left(-\gamma^2\right) + \sum_{\mu \neq 2}p_{\mu}\gamma^{\mu} - m\Bigr]u(p, h)^{*} = 0,
\]
\[
\Bigl[p_2\left(-\gamma^2\right) + \sum_{\mu \neq 2}p_{\mu}\gamma^{\mu} - m\Bigr]\left(i\gamma^2\right)\left(i\gamma^2\right)u(p, h)^{*} = 0,
\]
where we used $\left(i\gamma^2\right)\left(i\gamma^2\right) = E$. Moving the left factor iγ² through the bracket (it anticommutes with γ^μ for μ ≠ 2 and commutes with γ²) and removing it, we get
\[
\Bigl[-p_2\gamma^2 - \sum_{\mu \neq 2}p_{\mu}\gamma^{\mu} - m\Bigr]C\bar{u}(p, h)^{\mathrm{T}} = 0.
\]
Thus, we obtain
\[
\left(-p_{\mu}\gamma^{\mu} - m\right)v(p, h) = 0. \tag{21.74}
\]
The concepts of the Dirac adjoint and charge conjugation play an essential role in
the QED, and so we will come back to this point in Chaps. 23 and 24 to consider
these concepts in terms of the matrix algebra.
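The statement that charge conjugation sends a positive-energy solution of (21.73) to a solution of the negative-energy equation (21.74), and that double conjugation restores the original spinor as in (21.165), can be checked numerically. The sketch below again assumes the standard Dirac representation and sample values of m and p.

```python
import numpy as np

I2, O2 = np.eye(2), np.zeros((2, 2))
sx = np.array([[0, 1], [1, 0]]); sy = np.array([[0, -1j], [1j, 0]]); sz = np.array([[1, 0], [0, -1]])
gamma = [np.block([[I2, O2], [O2, -I2]])] + \
        [np.block([[O2, s], [-s, O2]]) for s in (sx, sy, sz)]

m = 1.0
p = np.array([0.3, -0.5, 0.4])                     # assumed sample momentum
P = np.linalg.norm(p); p0 = np.sqrt(P**2 + m**2)
S_num = P / (p0 + m)
sp = (p[0] * sx + p[1] * sy + p[2] * sz) / P
_, eigvec = np.linalg.eigh(sp)

G_plus = p0 * gamma[0] - sum(p[k] * gamma[k + 1] for k in range(3)) - m * np.eye(4)
G_minus = -G_plus - 2 * m * np.eye(4)              # i.e. -p_mu gamma^mu - m

for h, chi in zip((-1, +1), eigvec.T):
    u = np.sqrt(p0 + m) * np.concatenate([chi, h * S_num * chi])
    u_C = 1j * gamma[2] @ u.conj()                 # charge conjugation, (21.158)
    print(np.allclose(G_plus @ u, 0), np.allclose(G_minus @ u_C, 0))   # True True

u_CC = 1j * gamma[2] @ (1j * gamma[2] @ u.conj()).conj()
print(np.allclose(u_CC, u))                        # True: double conjugation, cf. (21.165)
```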
terms of the latter aspect, we make the most of various formulas for the gamma matrices (see Chaps. 23 and 24). We summarize several of them in the last part of this section.
We started with the discussion of the Dirac equation by introducing the gamma
matrices (Sect. 21.3.1). The key equation for this is given by
fγ μ , γ ν g = 2ημν : ð21:66Þ
As the frequently used representation of the gamma matrices, we had the Dirac
representation described by
1 0 0 σk
γ0 = , γk = ðk = 1, 2, 3Þ: ð21:67Þ
0 -1 - σk 0
a b c d
e f g h
γk = p q r s , ð21:172Þ
t u v w
γ0γk þ γk γ0 = 0 ðk = 1, 2, 3Þ ð21:173Þ
0 b c d
- b 0 g h
γ k = - γ k{ = - c - g 0 s : ð21:174Þ
- d - h - s 0
2 2
bj2 þ c þ dj2 = 1, b 2 þ gj2 þ h = 1,
2 2
cj2 þ g þ sj2 = 1, d 2 þ hj2 þ s = 1: ð21:175Þ
0 0 0 eiθk
0 0 eiρk 0
γk = 0 - e - iρk 0 0 ðk = 1, 2Þ: ð21:176Þ
- e - iθk 0 0 0
- eiðθ1 - θ2 Þ 0 0 0
iðρ1 - ρ2 Þ
0 -e 0 0
γ1, γ2 =
0 0 - eiðρ2 - ρ1 Þ 0
0 0 0 - eiðθ2 - θ1 Þ
iðθ2 - θ1 Þ
-e 0 0 0
iðρ2 - ρ1 Þ
0 -e 0 0
þ = 0:
0 0 - eiðρ1 - ρ2 Þ 0
0 0 0 - eiðθ1 - θ2 Þ
ð21:177Þ
Hence,
2 cosðθ1 - θ2 Þ = 2 cosðρ1 - ρ2 Þ = 0:
0 0 eiτ 0
0 0 0 eiσ
γ3 = - e - iτ 0 0 0 : ð21:178Þ
0 - e - iσ 0 0
There is arbitrariness for choosing the gamma matrices. Multiplying both sides of
(21.66) by a unitary operator U{ from the left and U from the right and defining
γ μ U { γ μ U, ð21:179Þ
we get
fγ μ , γ ν g = 2ημν : ð21:180Þ
That is, newly defined gamma matrices γ μ satisfy the same condition (21.66)
while retaining unitarity and (anti-)Hermiticity.
The characteristic polynomials of γ^k (k = 1, 2, 3) given by (21.68) are all
\[
\left(\lambda^2 + 1\right)^2 = 0. \tag{21.181}
\]
That is, λ = ±i (each doubly degenerate). Their determinants are all 1. For γ⁰, the characteristic polynomial is
\[
\left(\lambda^2 - 1\right)^2 = 0. \tag{21.182}
\]
Another frequently used matrix is
\[
\gamma^5 \equiv i\gamma^0\gamma^1\gamma^2\gamma^3 = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{pmatrix}. \tag{21.183}
\]
So far, we have assumed that the gamma matrices comprise a fourth-order square
matrix. But it is not self-evident. Now, we tentatively assume that the gamma matrix
is the n-th order square matrix. From (21.62) we have
γ μ γ ν = - γ ν γ μ ðμ ≠ νÞ:
Frequently used contraction formulas include
\[
\gamma_{\lambda}\gamma^{\lambda} = 4E_4, \qquad \gamma_{\lambda}\gamma^{\alpha}\gamma^{\lambda} = -2\gamma^{\alpha}, \tag{21.185}
\]
\[
\gamma_{\lambda}\gamma^{\alpha}\gamma^{\beta}\gamma^{\lambda} = 4\eta^{\alpha\beta}, \qquad \gamma_{\lambda}\gamma^{\alpha}\gamma^{\beta}\gamma^{\rho}\gamma^{\lambda} = -2\gamma^{\rho}\gamma^{\beta}\gamma^{\alpha}, \tag{21.186}
\]
\[
\gamma_{\lambda}\left(A_{\sigma}\gamma^{\sigma}\right)\gamma^{\lambda} = -2\left(A_{\sigma}\gamma^{\sigma}\right), \qquad \gamma_{\lambda}\left(A_{\sigma}\gamma^{\sigma}\right)\left(B_{\sigma}\gamma^{\sigma}\right)\gamma^{\lambda} = 4A_{\sigma}B^{\sigma}, \tag{21.187}
\]
\[
\gamma_{\lambda}\left(A_{\sigma}\gamma^{\sigma}\right)\left(B_{\sigma}\gamma^{\sigma}\right)\left(C_{\sigma}\gamma^{\sigma}\right)\gamma^{\lambda} = -2\left(C_{\sigma}\gamma^{\sigma}\right)\left(B_{\sigma}\gamma^{\sigma}\right)\left(A_{\sigma}\gamma^{\sigma}\right). \tag{21.188}
\]
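These contraction formulas are purely algebraic consequences of (21.66) and can be verified by brute force. The following Python sketch (standard Dirac representation assumed) checks (21.185) and (21.186) over all index values.

```python
import numpy as np

I2, O2 = np.eye(2), np.zeros((2, 2))
sx = np.array([[0, 1], [1, 0]]); sy = np.array([[0, -1j], [1j, 0]]); sz = np.array([[1, 0], [0, -1]])
g = [np.block([[I2, O2], [O2, -I2]])] + \
    [np.block([[O2, s], [-s, O2]]) for s in (sx, sy, sz)]
eta = np.diag([1, -1, -1, -1])
g_low = [eta[l, l] * g[l] for l in range(4)]       # gamma_lambda = eta_{lambda mu} gamma^mu
E4 = np.eye(4)

print(np.allclose(sum(g_low[l] @ g[l] for l in range(4)), 4 * E4))
print(all(np.allclose(sum(g_low[l] @ g[a] @ g[l] for l in range(4)), -2 * g[a])
          for a in range(4)))
print(all(np.allclose(sum(g_low[l] @ g[a] @ g[b] @ g[l] for l in range(4)), 4 * eta[a, b] * E4)
          for a in range(4) for b in range(4)))
print(all(np.allclose(sum(g_low[l] @ g[a] @ g[b] @ g[r] @ g[l] for l in range(4)),
                      -2 * g[r] @ g[b] @ g[a])
          for a in range(4) for b in range(4) for r in range(4)))   # all four lines print True
```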
References
1. Mandl F, Shaw G (2010) Quantum field theory, 2nd edn. Wiley, Chichester
2. Itzykson C, Zuber J-B (2005) Quantum field theory. Dover, New York
3. Satake I-O (1975) Linear algebra (pure and applied mathematics). Marcel Dekker, New York
4. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
5. Møller C (1952) The theory of relativity. Oxford University Press, London
6. Kaku M (1993) Quantum field theory. Oxford University Press, New York
7. Sakamoto M (2014) Quantum field theory. Shokabo, Tokyo. (in Japanese)
8. Sakamoto M (2014). https://www.shokabo.co.jp/author/2511/answer/QFT_answer_all.pdf;
http://www.shokabo.co.jp/author/2511/answer/QFT_answer_all.pdf. (in Japanese)
Chapter 22
Quantization of Fields
play a central role in the quantum theory of fields as a powerful tool to develop the
theory and to perform calculations. We describe a brief outline of the Lagrangian
formalism [1].
For the sake of brevity, suppose that a single particle is moving under the
influence of potential U. Then, the kinetic energy T of the particle is described as
\[
T = \sum_{k=1}^{3}\frac{1}{2}m\dot{x}_k^{\,2} = \sum_{k=1}^{3}\frac{1}{2m}p_k^{\,2}, \tag{22.1}
\]
where m is a mass of the particle; xk and pk (k = 1, 2, 3) represent the x-, y-, and z-
component of the Cartesian coordinate and momentum of the particle in that
direction, respectively. In the Cartesian coordinate, we have
Here, we may modify (22.2) so that we can use any suited variables to specify the
position of the particle instead of the Cartesian coordinate xi. Examples include the
spherical coordinate introduced in Sect. 3.2. Let qk (k = 1, 2, 3) be a set of such
variables, which are said to be generalized coordinates. Suppose that xk is expressed
as a function of qk such that
x1 = x1 ðq1 , q2 , q3 Þ,
x2 = x2 ðq1 , q2 , q3 Þ,
x3 = x3 ðq1 , q2 , q3 Þ:
3
∂xk
x_ k = q_ ðk = 1, 2, 3Þ: ð22:3Þ
l=1
∂ql l
Assuming that x_ k depends upon both q1, q2, q3 and q_ 1 , q_ 2 , q_ 3 , x_ k can be described
by
x_ k = x_ k ðq1 , q2 , q3 , q_ 1 , q_ 2 , q_ 3 Þ: ð22:4Þ
∂_xk ∂x
= k: ð22:5Þ
∂q_ l ∂ql
∂T
pk = m_xk = ðk = 1, 2, 3Þ: ð22:6Þ
∂_xk
Thus, in parallel with (22.6) we can define a generalized momentum (or conjugate
momentum) pk such that
∂T
pk ðk = 1, 2, 3Þ: ð22:8Þ
∂q_ k
3
∂T ∂_xl
pk = ðk = 1, 2, 3Þ: ð22:9Þ
l=1
∂_xl ∂q_ k
3
∂xl
pk = m_xl : ð22:10Þ
l=1
∂qk
3 3
∂xl d ∂xl
p_k = m€xl þ m_xl : ð22:11Þ
l=1
∂qk l = 1 dt ∂qk
Now, suppose that the single particle we are dealing with undergoes an infinites-
imal displacement so that q1, q2, q3 are changed by dq1, dq2, dq3. This causes
xk = xk(q1, q2, q3) to be changed by dxk such that
3
∂xk
dxk = dq : ð22:12Þ
l=1
∂ql l
3
∂xk
δxk = δq : ð22:13Þ
l=1
∂ql l
3
δW = F k δxk : ð22:14Þ
k=1
Note that Fk is the x-, y-, and z-component of the original Cartesian coordinate.
Namely, F can be expressed such that
3
F= F k ek , ð22:15Þ
k=1
3 3 3 3 3
∂xk ∂xk
δW = F k δxk = Fk δq = Fk δql , ð22:16Þ
k=1 k=1 l=1
∂ql l l=1 k=1
∂ql
where δxk in (22.14) was replaced with δqk through (22.13) and the order of
summation was exchanged with respect to l and k. In (22.16), we define Ql as
3
∂xk
Ql Fk : ð22:17Þ
k=1
∂ql
Then, we have
3
δW = Ql δql : ð22:18Þ
l=1
3 3
F k δxk = Qk δqk : ð22:19Þ
k=1 k=1
F l = m€xl : ð22:20Þ
3 3 3
∂xl d ∂xl d ∂xl
p_k = Fl þ m_xl = Qk þ m_xl , ð22:21Þ
l=1
∂qk l = 1 dt ∂qk l=1
dt ∂qk
3 3
∂T ∂T ∂_xl ∂_xl
= = m_xl ðk = 1, 2, 3Þ, ð22:22Þ
∂qk l=1
∂_xl ∂qk l=1
∂qk
∂_xl
where with the second equality we used (22.6). Calculating ∂q by use of (22.3), we
k
obtain
3 3 3
∂_xl ∂xl 2 ∂xl 2 ∂ ∂xl dqm d
= q_ = q_ = =
∂qk m=1
∂qk ∂qm m m=1
∂qm ∂qk m m=1
∂qm ∂qk dt dt
∂xl
:
∂qk
ð22:23Þ
The last equality of (22.23) is obtained from the differentiation of the composite
function xl = xl ðq1 , q2 , q3 , q_ 1 , q_ 2 , q_ 3 Þ with respect to t. Substituting (22.23) for RHS
of (22.22), we get
3
∂T d ∂xl
= m_xl ðk = 1, 2, 3Þ: ð22:24Þ
∂qk l=1
dt ∂qk
Replacing the second term of (22.21) with LHS of (22.24), finally we obtain a
succinct expression described by
∂T
p_k = Qk þ : ð22:25Þ
∂qk
d ∂T ∂T
= Qk þ ðk = 1, 2, 3Þ: ð22:26Þ
dt ∂q_ k ∂qk
Equation (22.26) can further be transformed to a well-known form based upon the
Lagrangian L. Suppose that the force is a conservative force. In that case, Qk can be
described as
∂U
Qk = - , ð22:27Þ
∂qk
d ∂T ∂T ∂U
- þ = 0 ðk = 1, 2, 3Þ: ð22:28Þ
dt ∂q_ k ∂qk ∂qk
∂U
= 0: ð22:29Þ
∂q_ k
Accordingly, we get
d ∂ ðT - U Þ ∂ ðT - U Þ
- = 0: ð22:30Þ
dt ∂q_ k ∂qk
L T - U, ð22:31Þ
we obtain
d ∂L ∂L
- = 0 ðk = 1, 2, 3Þ: ð22:32Þ
dt ∂q_ k ∂qk
under a constraint; for instance, suppose that motion of some particles is restricted to
the surface of a sphere, or some particles are subject to free fall under the force of
gravity. In such cases the freedom decreases. Hence, the freedom of the dynamical
system is normally characterized by a positive integer n. Then, (22.32) is
reformulated as follows:
\[
\frac{d}{dt}\left(\frac{\partial L}{\partial\dot{q}_k}\right) - \frac{\partial L}{\partial q_k} = 0 \quad (k = 1, 2, \cdots, n). \tag{22.33}
\]
Correspondingly, the generalized momentum is expressed as
\[
p_k = \frac{\partial L}{\partial\dot{q}_k} = \frac{\partial T}{\partial\dot{q}_k} \quad (k = 1, 2, \cdots, n). \tag{22.34}
\]
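As a small worked illustration of (22.33) and (22.34), the following SymPy sketch applies Lagrange's equation to a one-dimensional harmonic oscillator with L = (1/2)m q̇² − (1/2)k q²; this system is merely an assumed example, not one treated in the text.

```python
import sympy as sp

t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
q = sp.Function('q')(t)

# Lagrangian L = T - U for an assumed one-dimensional harmonic oscillator
L = sp.Rational(1, 2) * m * sp.diff(q, t)**2 - sp.Rational(1, 2) * k * q**2

# Lagrange's equation (22.33): d/dt (dL/d q_dot) - dL/dq = 0
eq = sp.diff(sp.diff(L, sp.diff(q, t)), t) - sp.diff(L, q)
print(sp.simplify(eq))          # k*q(t) + m*Derivative(q(t), (t, 2)), i.e. m q'' + k q = 0

# generalized (conjugate) momentum of (22.34)
p = sp.diff(L, sp.diff(q, t))
print(p)                        # m*Derivative(q(t), t)
```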
The relations (22.8) and (22.34) make a turning point in our formulation. Through
the medium of (22.7) and (22.33), we express the Lagrangian L as
L = Lðq1 , ⋯, qn , q_ 1 , ⋯, q_ n , t Þ Lðq, q,
_ t Þ, ð22:35Þ
where with the last identity q and q_ represent q1, ⋯, qn and q_ 1 , ⋯, q_ n collectively.
Consequently, pk ðk = 1, 2, ⋯, nÞ can be described as a function of q, q, _ t. In an
_ p, t Þ, but this would break the
opposite way, we could describe L as Lðq, p, t Þ or Lðq,
original functional form of L. Instead, we wish to seek another way of describing a
physical quantity P as a function of q, p, and t, i.e., Pðq, p, t Þ. Such satisfactory
prescription is called the Legendre transformation [1]. The Legendre transformation
is defined by
_ t Þ - pq:
Pðq, p, t Þ = Lðq, q, _ ð22:36Þ
The implication of (22.36) is that the physical quantity P has been described as a
_ t. Notice that Pðq, p, t Þ
function of the independent variables q, p, t instead of q, q,
_ t Þ are connected to each other through p and q.
and Lðq, q, _ The quantities q, p are said
to be canonical variables. That is, the Legendre transformation enables us to describe
P as a function of the canonical variables. The Hamiltonian H ðq, p, t Þ, another
important quantity in the analytical mechanics, is given by the Legendre transfor-
mation and defined as
\[
H(q, p, t) = \sum_{k=1}^{n}p_k\dot{q}_k - L(q, \dot{q}, t). \tag{22.37}
\]
∂T
= pk = 2T k =q_ k : ð22:38Þ
∂q_ k
Hence, we obtain
2T k = pk q_ k :
n n
pk q_ k = 2 T k = 2T, ð22:39Þ
k=1 k=1
n
where T k = T. Hence, the Hamiltonian H ðq, p, t Þ given in (22.37) is described by
k=1
H ðq, p, t Þ = 2T - L = 2T - ðT - U Þ = T þ U: ð22:40Þ
Thus, the Hamiltonian represents the total energy of the dynamical system. The
Legendre transformation is often utilized in thermodynamics. Interested readers are
referred to literature [1].
Along with the Lagrange’s equations of motion, we have another important
guiding principle of the Lagrangian formalism. It is referred to as the principle of
least action or Hamilton’s principle. To formulate the principle properly, we define
an action integral S such that [2]
\[
S(q_1, \cdots, q_n) \equiv \int_{-\infty}^{\infty}dt\,L(q_1, \cdots, q_n, \dot{q}_1, \cdots, \dot{q}_n, t), \tag{22.41}
\]
\[
\delta S(q_1, \cdots, q_n) = 0. \tag{22.42}
\]
The principle of least action says that particles of the dynamical system move in
such a way that the action integral takes an extremum. Equations (22.41) and (22.42)
lead to the following equation:
1
δSðqÞ = dt ½Lðq þ δq, q_ þ δq,
_ t Þ - Lðq, q,
_ t Þ = 0, ð22:43Þ
-1
n
∂L ∂L
Lðq þ δq, q_ þ δq,
_ t Þ = Lðq, q,
_ tÞ þ δqk þ δq_ k : ð22:44Þ
k=1
∂qk ∂q_ k
Noting that
∂L d ∂L d ∂L d ∂L
δq_ k = δq = - δqk þ δqk , ð22:45Þ
∂q_ k dt k ∂q_ k dt ∂q_ k dt ∂q_ k
we get
1 n
∂L d ∂L d ∂L
δSðqÞ = dt δqk - δqk þ δqk
-1 k=1
∂qk dt ∂q_ k dt ∂q_ k
1 n
∂L d ∂L d ∂L
= dt δqk - þ δqk
-1 k=1
∂qk dt ∂q_ k dt ∂q_ k
1 n 1 n
∂L d ∂L d ∂L
= dt δqk - þ dt δqk
-1 k=1
∂qk dt ∂q_ k -1 k=1
dt ∂ q_ k
1 n n 1
∂L d ∂L ∂L
= dt δqk - þ δqk = 0:
-1 k=1
∂qk dt ∂q_ k k=1
∂q_ k -1
1 n
∂L d ∂L
δSðqÞ = dt δqk - = 0: ð22:46Þ
-1 k=1
∂qk dt ∂q_ k
The integrand of (22.46) must vanish so that (22.46) can hold with any arbitrarily
chosen variations δqk. In other words, the Lagrange’s equations of motion must be
represented by
d ∂L ∂L
- = 0 ðk = 1, 2, ⋯, nÞ: ð22:33Þ
dt ∂q_ k ∂qk
the quantum theory of fields, we introduce the related techniques rather systemati-
cally in this section. This is because the field quantization is associated with the
whole space–time. Be that as it may, the major concepts related to the field
quantization have already been shown in Parts I and IV. Here, we further extend
and improve such concepts.
Let us start with the Fourier series expansion. Consider the following SOLDE that
appeared in Example 1.1 of Sect. 1.3:
d 2 yð xÞ
þ λyðxÞ = 0: ð1:61Þ
dx2
We solved (1.61) under the Dirichlet boundary conditions (see Sect. 10.3) of
y(L ) = 0 and y(-L ) = 0 (L > 0). The normalized eigenfunctions were described by
1 π
yð xÞ = cos kx kL = þ mπ ðm = 0, 1, 2, ⋯Þ , ð1:91Þ
L 2
1
yð xÞ = sin kx ½kL = nπ ðn = 1, 2, 3, ⋯Þ, ð1:92Þ
L
1 d2 ΦðϕÞ
- = η: ð3:58Þ
ΦðϕÞ dϕ2
d 2 Φð ϕÞ
þ ηΦðϕÞ = 0 ð0 ≤ ϕ ≤ 2π Þ: ð22:47Þ
dϕ2
1
ΦðϕÞ = p eimϕ ðm = 0, ± 1, ± 2, ⋯Þ, ð3:64Þ
2π
1
F n ðxÞ = p einπx=a ðn = 0, ± 1, ± 2, ⋯; a > 0Þ: ð22:49Þ
2a
Note that if we put a = π in (22.49), (22.49) is of the same form as (3.64). Then,
using (22.49) f(x) can be expanded such that
1
1
f ðxÞ = cðnÞeinπx=a ð22:50Þ
2a n= -1
with
a
c ð nÞ = e - inπx=a f ðxÞ: ð22:51Þ
-a
Let us suppose that a → 1 in (22.50) and (22.51), keeping the BCs the same as
(22.48) to see what happens. Putting
nπ
= k, ð22:52Þ
a
Δnπ
= Δk: ð22:53Þ
a
Δk
Taking Δn = 1, we have 2a
1
= 2π . Then, (22.50) is rewritten as
1 1
Δk 1
f ðxÞ = cðnÞeikx = cðnÞeikx Δk: ð22:54Þ
2π n= -1
2π n= -1
where c(n) in (22.50) was changed to c(k) through (22.52). Multiplying both sides of
(22.55) by e-iqx and integrating it with respect to x, we have
1 1 1
1
e - iqx f ðxÞdx = cðkÞeiðk - qÞx dkdx
-1 2π -1 -1
1 1
1
= cð k Þ eiðk - qÞx dx dk ð22:56Þ
2π -1 -1
1
1
= cðkÞ2πδðk - qÞdk = cðqÞ:
2π -1
\[
f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}c(k)e^{ikx}dk, \tag{22.59}
\]
\[
\int_{-\infty}^{\infty}e^{-iqx}f(x)dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}c(k)\,2\pi\delta(k - q)\,dk = \sqrt{2\pi}\,c(q). \tag{22.60}
\]
P j kihk j = E, ð5:15Þ
k
where E is an identity operator. In (5.15) the number of jki might be either finite or
infinite, but jki was countable. A dynamical system we will be dealing with in this
chapter, however, normally has infinite and uncountable degrees of freedom. To
conform (5.15) to our present purpose, (5.15) should be modified as follows:
1
E= j kihk j dk, ð22:64Þ
-1
Both (5.15) and (22.64) can be regarded as the completeness of the projection
operators (see Sect. 14.1). Multiplying both sides of (22.64) by hxj and jfi from the
left and from the right, respectively, we get
1
hxjf i = hxjk ihkjf idk:
-1
we have
1
f ðxÞ = hxjk icðk Þdk: ð22:66Þ
-1
1
hxjk i = p eikx : ð22:67Þ
2π
Multiplying both sides of (22.68) by hkj and jfi from the left and from the right,
respectively, we get
\[
\langle k|f\rangle = \int_{-\infty}^{\infty}\langle k|x\rangle\langle x|f\rangle\,dx. \tag{22.69}
\]
Thus, we obtain
\[
c(k) = \int_{-\infty}^{\infty}\langle k|x\rangle f(x)\,dx = \int_{-\infty}^{\infty}\langle x|k\rangle^{*}f(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-ikx}f(x)\,dx, \tag{22.70}
\]
where with the second equality we used (13.2). Thus, we reproduced (22.63).
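The transform pair (22.59) and (22.63) can be checked numerically for a concrete function. In the sketch below a Gaussian f(x) = exp(−x²/2) is assumed as the test function; with the symmetric convention its Fourier transform is again exp(−k²/2), and the inverse transform reproduces f.

```python
import numpy as np

x = np.linspace(-20, 20, 4001)
f = np.exp(-x**2 / 2)                      # assumed test function

k = np.linspace(-5, 5, 201)
# c(k) = (1/sqrt(2 pi)) * integral of e^{-ikx} f(x) dx, evaluated by the trapezoidal rule
c = np.trapz(np.exp(-1j * np.outer(k, x)) * f, x, axis=1) / np.sqrt(2 * np.pi)
print(np.allclose(c.real, np.exp(-k**2 / 2), atol=1e-6))    # True: transform of the Gaussian

# inverse transform (22.59) evaluated on a few points reproduces f(x)
x_test = np.linspace(-3, 3, 13)
k_fine = np.linspace(-20, 20, 4001)
c_fine = np.exp(-k_fine**2 / 2)
f_back = np.trapz(np.exp(1j * np.outer(x_test, k_fine)) * c_fine, k_fine, axis=1) / np.sqrt(2 * np.pi)
print(np.allclose(f_back.real, np.exp(-x_test**2 / 2), atol=1e-6))   # True
```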
E 2 = p2 þ m 2 : ð22:71Þ
As already seen in Sect. 21.1, the Klein–Gordon equation was based upon this
relationship. From (22.71), we have
E= ± p2 þ m2 , ð22:72Þ
to which both positive and negative energies are presumed. The quantization of
(neutral) scalar field (or Klein-Gordon field), Dirac field, and electromagnetic field
to be dealt with in this chapter is closely associated with this assumption. At the same
time, this caused problems at the early stage of the quantum field theory along with
the difficulty in the probabilistic interpretation of (21.9). Nevertheless, after Dirac
invented the equation named after himself (i.e., the Dirac equation), the Klein–
Gordon equation turned out to be valid as well as the Dirac equation as a basis
equation describing quantum fields [7].
Historically speaking, the quantum theory of fields was at first established as the
theory that dealt with the dynamical system consisting of photons and electrons and
referred to as quantum electrodynamics (QED). In this connection, the quantization
of scalar field is not necessarily required. In this section, however, we first describe
the quantization of the (neutral) scalar fields based on the Klein–Gordon equation. It
is because the scalar field quantization is not only the simplest example but also a
prototype of the related quantum theories of fields. Hence, we study the quantization
of scalar field in detail accordingly.
We start with the formulation of the Lagrangian density of the free Klein–Gordon
field. The Lagrangian density L is given by [5, 8]
\[
\mathcal{L} = \frac{1}{2}\left[\partial^{\mu}\phi\,\partial_{\mu}\phi - m^2\phi^2\right]. \tag{22.73}
\]
The corresponding action integral is
\[
S(\phi) = \int dx\,\mathcal{L} = \int dx\,\frac{1}{2}\left\{\partial^{\mu}\phi(x)\,\partial_{\mu}\phi(x) - m^2\left[\phi(x)\right]^2\right\}. \tag{22.74}
\]
Note that unlike (22.41), where the integration was taken only over the time coordinate, the integration in (22.74) is taken over the whole space–time. That is,
\[
\int dx \equiv \int_{-\infty}^{\infty}dx^0\int_{-\infty}^{\infty}dx^1\int_{-\infty}^{\infty}dx^2\int_{-\infty}^{\infty}dx^3. \tag{22.75}
\]
δSðϕÞ = δ dx LðxÞ
1 μ μ
= dx ∂μ δϕðxÞ ∂ ϕðxÞ þ ∂μ ϕðxÞ½∂ δϕðxÞ - m2 2ϕðxÞδϕðxÞ
2
1 μ 1 1 1
= dx ≠ μ f½δϕðxÞ∂ ϕðxÞgxμ = - 1 þ dx ≠ μ ∂μ ϕðxÞ½δϕðxÞ xμ = - 1
2 2
1 μ μ
þ dx - ∂μ ∂ ϕðxÞ - ∂ ∂μ ϕðxÞ - m2 2ϕðxÞ δϕðxÞ = 0,
2
ð22:76Þ
where the second to the last equality resulted from the integration by parts; x≠μ
(or x≠μ) in the first and second terms means the triple integral with respect to the
variables x other than xμ (or xμ). We assume that δϕ(x) → 0 at xμ (xμ) → ± 1.
Hence, with the second to the last equality of (22.76), the first and second integrals
vanish. Consequently, for δS(ϕ) = 0 to be satisfied with regard to any arbitrarily
taken δϕ(x), the integrand of the third integral must vanish. This is equivalent to
μ μ
- ∂μ ∂ ϕðxÞ - ∂ ∂μ ϕðxÞ - m2 2ϕðxÞ = 0:
μ
∂μ ∂ ϕðxÞ þ m2 ϕðxÞ = 0: ð22:77Þ
∂L ∂L
∂μ - =0 ð22:79Þ
∂ ∂μ ϕ ∂ϕ
Once the Lagrangian density L has been given with the field to be quantized, the
quantization of fields is carried out as the equal-time commutation relation between
the canonical variables, i.e., generalized coordinate (or canonical coordinate) and
generalized momentum (or canonical momentum). That is, if L is given as
L[ϕ(x), ∂μϕ(x)] in (22.78) as a function of generalized coordinates ϕ(x) as well as
their derivatives ∂μϕ(x), the quantization (or canonical quantization) of the scalar
field is postulated as the equal-time canonical commutation relations
\[
\left[\phi(t, \boldsymbol{x}),\, \pi(t, \boldsymbol{y})\right] = i\delta^3(\boldsymbol{x} - \boldsymbol{y}), \qquad
\left[\phi(t, \boldsymbol{x}),\, \phi(t, \boldsymbol{y})\right] = \left[\pi(t, \boldsymbol{x}),\, \pi(t, \boldsymbol{y})\right] = 0, \tag{22.80}
\]
where the canonical momentum π conjugate to φ is defined by
\[
\pi \equiv \frac{\partial\mathcal{L}}{\partial\dot{\phi}}. \tag{22.81}
\]
Also, we have
\[
\delta^3(\boldsymbol{x} - \boldsymbol{y}) \equiv \delta\left(x^1 - y^1\right)\delta\left(x^2 - y^2\right)\delta\left(x^3 - y^3\right).
\]
If more than one coordinate is responsible for (22.80), that can appropriately be
indexed. In that case, the commutation relations are sometimes represented as a
matrix form (see Sect. 22.4). The first equation of (22.80) is essentially the same as
(1.140) and (1.141), which is rewritten in the natural units (i.e., ħ = 1) as
where E is the identity operator. Thus, we notice that E of (22.82) is replaced with
δ3(x - y) in (22.80). This is because the whole spatial coordinates are responsible for
the field quantization.
994 22 Quantization of Fields
∂L
= ∂ ϕðt, xÞ = ϕ_ ðt, xÞ:
0
π ðt, xÞ = ð22:83Þ
∂ð∂0 ϕÞ
Now, we are in the position to do with our task at hand. We wish to seek Fourier
integral representation of the free Klein–Gordon field ϕ(x). The three-dimensional
representation is given by [8]
d3 k
ϕðt, xÞ = qk ðt Þeikx , ð22:85Þ
3
ð2π Þ
with
Notice that in (22.87) each term has been factorized with respect to k and t so that
the exponential factors contain only t as a variable. The quantity k0 represents the
angular frequency of the harmonic oscillator and varies with |k|. Also, note that the
22.3 Quantization of the Scalar Field [5, 6] 995
angular frequency and energy have the same dimension in the natural units (vide
infra).
Here we assume that ϕ(x) represents the neutral scalar field so that ϕ(x) is
Hermitian. In the present case ϕ(x) is represented by a (1, 1) matrix. Whereas it is
a real c-number before the field quantization, after the field quantization it should be
regarded as a real q-number. Namely, the order of the multiplication must be
considered properly between the q-numbers.
Taking adjoint of (22.85), we have
d3 k
ϕ{ ðxÞ = qk { ðt Þe - ikx : ð22:89Þ
3
ð2π Þ
d3 k
ϕ{ ðxÞ = q - k { ðt Þeikx : ð22:90Þ
3
ð2π Þ
þ1 -1
d3 k ð- 1Þ3 d3 k
qk { ðt Þe - ikx = q - k { ðt Þeikx
-1 ð2π Þ 3 þ1 ð2π Þ 3
þ1
ð22:91Þ
d k
3
{
= q - k ðt Þe : ikx
-1 ð2π Þ3
q - k { ðt Þ = qk ðt Þ: ð22:92Þ
with the first and second relations of (22.94) being identical. Substituting (22.87) for
(22.85), we get
1
d3 k
ϕðt, xÞ = q1 ðkÞe - ik0 t þ q2 ðkÞeik0 t eikx
-1 ð2π Þ 3
1 -1
d3 k ð- 1Þ3 d3 k
= q1 ðkÞe - ik0 t eikx þ q2 ð - kÞeik0 t eið- kÞx
-1 ð2π Þ 3 1 ð2π Þ 3
1 -1
d k
3
ð- 1Þ3 d3 k
= q1 ðkÞe - ik0 t eikx þ q1 { ðkÞeik0 t eið- kÞx
-1 ð2π Þ 3 1 ð2π Þ 3
1 1
d k
3
d k
3
= q1 ðkÞe - ikx þ q1 { ðkÞeikx
-1 ð2π Þ 3 -1 ð2π Þ 3
1
d k
3
= q1 ðkÞe - ikx þ q1 { ðkÞeikx ,
-1 ð2π Þ 3
ð22:95Þ
where with the second equality k of the second term has been replaced with -k; with
the third equality (22.94) was used in the second term. Note also that in (22.95)
kx k0 x0 - kx = k2 þ m2 t - kx: ð22:96Þ
a ð kÞ 2k 0 q1 ðkÞ, ð22:97Þ
we get
d3 k
ϕð xÞ = aðkÞe - ikx þ a{ ðkÞeikx : ð22:98Þ
3
ð2π Þ ð2k0 Þ
p
The origin of the coefficient 2k0 is explained in literature [9] (vide infra).
Equation (22.98) clearly shows that ϕ(x) is Hermitian, that is ϕ{(x) = ϕ(x). We
divide (22.98) into the “positive-frequency” part ϕ+(x) and “negative-frequency”
part ϕ-(x) such that [5]
22.3 Quantization of the Scalar Field [5, 6] 997
Note that the positive- and negative-frequency correspond to the positive- and
negative-energy, respectively. From (22.83) and (22.98) we have
0
d3 k
π ð xÞ = ∂ ϕð xÞ = ϕð xÞ =
ð2π Þ3 ð2k0 Þ
Next, we convert the equal-time commutation relations (22.84) into the commu-
tation relation between a(k) and a{(q). For this purpose, we solve (22.98) and
(22.100) with respect to a(k) and a{(k). To perform the calculation, we make the
most of the Fourier transform developed in Sect. 22.2.2. In particular, the three-
dimensional version of (22.57) is useful. The formula is expressed as
1
ð2π Þ3 δ3 ðkÞ = eikx d 3 x: ð22:101Þ
-1
To apply (22.101) to (22.98) and (22.100), we deform those equations such that
d3 k
ϕðt, xÞ = aðkÞe - ik0 t þ a{ ð- kÞeik0 t eikx , ð22:102Þ
3
ð2π Þ ð2k0 Þ
d3 k
ϕ_ ðt, xÞ = ð- ik 0 Þ aðkÞe - ik0 t - a{ ð- kÞeik0 t eikx : ð22:103Þ
3
ð2π Þ ð2k0 Þ
In (22.102) and (22.103), k has been replaced with -k in their second terms of
RHS. We want to solve these equations in terms of aðkÞe - ik0 t and a{ ð- kÞeik0 t . The
calculation procedures are as follows: (1) Multiply both sides of (22.102) and
(22.103) by e-iqx and integrate them with respect to x by use of (22.101) to get a
factor δ3(k - q). (2) Then, we perform the integration with respect to k to delete k.
As a consequence of the integration, k is replaced with q and k0 = k2 þ m2 is
replaced with q2 þ m2 q0 accordingly. In that way, we convert (22.102) and
(22.103) into the equations with respect to aðqÞe - iq0 t and a{ ðqÞeiq0 t . The result is
expressed as follows:
998 22 Quantization of Fields
d3 x
aðqÞe - iq0 t = i ϕ_ ðt, xÞ - iq0 ϕðt, xÞ e - iqx ,
3
ð2π Þ ð2q0 Þ
ð22:104Þ
d3 x
{
a ðqÞe iq0 t
= -i ϕ_ ðt, xÞ þ iq0 ϕðt, xÞ eiqx :
ð2π Þ3 ð2q0 Þ
Note that in the second equation of (22.104) -q was switched back to q to change
a{(-q) to a{(q).
(3) Next, we calculate the commutation relation between a(k) and a{(q). For this,
the commutator aðkÞe - ik0 t , a{ ðqÞeiq0 t can directly be calculated using the equal-
time commutation relations (22.84). The factors e - ik0 t and eiq0 t are gotten rid of in
the final stage. The relevant calculations are straightforward and greatly simplified
by virtue of the integration of the factor δ3(x - y) that comes from (22.84). The use
of (22.101) also eases computations. As a result, we get succinct equation that is
described by
q0
aðkÞe - ik0 t , a{ ðqÞeiq0 t = e - iðk0 - q0 Þt aðkÞ, a{ ðqÞ = δ3 ðk - qÞ:
k 0 q0
a, a{ = 1 and a, a{ = E: ð2:24Þ
In fact, characteristics of a(k), a{(q), and a, a{ are closely related in that a{(q) and
{
a function as the creation operators and that a(k) and a act as the annihilation
operators. Thus, the above discussion clearly shows that the quantum-mechanical
description of the harmonic oscillators is of fundamental importance in the quantum
field theory. The operator a(k) is sometimes given in a simple form such that [8]
22.3 Quantization of the Scalar Field [5, 6] 999
d3 x $
aðkÞ = i eikx ∂0 ϕ , ð22:106Þ
ð2π Þ3 ð2k 0 Þ
where
$
f ∂0 g f ð∂0 gÞ - ð∂0 f Þg: ð22:107Þ
As already shown in Chap. 2, the use of a(k) and a{(k) enables us to calculate
various physical quantities in simple algebraic forms. In the next section, we give the
calculation processes for the Hamiltonian of the free Klein–Gordon field. The
concept of the Fock space is introduced as well. The results can readily be extended
to the calculations regarding, e.g., the Dirac field and photon field.
In our present case, we have π ðxÞ = ϕ_ ðt, xÞ and from (22.73) we get
1
ℋð xÞ = ½π ðxÞ2 þ ½— ϕðxÞ2 þ ½mϕðxÞ2 : ð22:109Þ
2
1
H= d3 x ℋ = d3 x ½π ðxÞ2 þ ½— ϕðxÞ2 þ ½mϕðxÞ2 : ð22:110Þ
2
1 1
d3 x ϕ_ ðt; xÞ
2
d3 x½π ðt; xÞ2 =
2 2
1 d3 k
= d3 x ð - ik 0 Þ aðkÞe - ik0 t - a{ ð - kÞeik0 t eikx
2 3
ð2π Þ ð2k0 Þ
d q
3
× ð - iq0 Þ aðqÞe - iq0 t - a{ ð - qÞeiq0 t eiqx
3
ð2π Þ ð2q0 Þ
1 d3 kd3 q
=- d3 x k0 q0 aðkÞaðqÞe - iðk0 þq0 Þt
4ð2π Þ3 k 0 q0
1 d 3 kd3 q
=- 3
d3 xeiðkþqÞx k0 q0 aðkÞaðqÞe - iðk0 þq0 Þt ½ {
4ð2π Þ k 0 q0
- aðkÞa{ ð - qÞe - iðk0 - q0 Þt - a{ ð - kÞaðqÞeiðk0 - q0 Þt
1 d3 kd3 q
=- 3
ð2π Þ3 δðk þ qÞk 0 q0 aðkÞaðqÞe - iðk0 þq0 Þt ½{{
4ð2π Þ k 0 q0
- aðkÞa{ ð - qÞe - iðk0 - q0 Þt - a{ ð - kÞaðqÞeiðk0 - q0 Þt
1 d3 k
=- 3
ð2π Þ3 k0 q0 aðkÞað - kÞe - iðk0 þq0 Þt
4ð2π Þ k 0 q0
- aðkÞa{ ðkÞe - iðk0 - q0 Þt - a{ ð - kÞað - kÞeiðk0 - q0 Þt
1 d3 k
= k0 q0 - aðkÞað - kÞe - iðk0 þq0 Þt þ aðkÞa{ ðkÞe - iðk0 - q0 Þt
4 k 0 q0
þ a{ ð- kÞað - kÞeiðk0 - q0 Þt - a{ ð - kÞa{ ðkÞeiðk0 þq0 Þt
1
= d3 kk0 - aðkÞað - kÞe - 2ik0 t þ aðkÞa{ ðkÞ þ a{ ðkÞaðkÞ ½{{{
4
1
d3 x½— ϕðxÞ2
2
1 d3 k 2 ð22:112Þ
= k þ aðkÞað - kÞe - 2ik0 t þ aðkÞa{ ðkÞ þ a{ ðkÞaðkÞ
4 k 0 q0
þ a{ ð - kÞa{ ðkÞe2ik0 t ,
1
d3 x½mϕðxÞ2
2
1 d3 k ð22:113Þ
= m2 þ aðkÞað - kÞe - 2ik0 t þ aðkÞa{ ðkÞ þ a{ ðkÞaðkÞ
4 k 0 q0
þ a{ ð - kÞa{ ðkÞe2ik0 t :
1
d3 x½— ϕðxÞ2 þ ½mϕðxÞ2
2
1 d3 k
= k2 þ m2
4 k 0 q0
× þaðkÞað - kÞe - 2ik0 t þ aðkÞa{ ðkÞ þ a{ ðkÞaðkÞ þ a{ ð - kÞa{ ðkÞe2ik0 t
1
= d3 kk0
4
× þaðkÞað - kÞe - 2ik0 t þ aðkÞa{ ðkÞ þ a{ ðkÞaðkÞ þ a{ ð - kÞa{ ðkÞe2ik0 t ,
ð22:114Þ
1
H = d3 kk 0 aðkÞa{ ðkÞ þ a{ ðkÞaðkÞ
2
ð22:115Þ
1
= d3 k k2 þ m2 aðkÞa{ ðkÞ þ a{ ðkÞaðkÞ ,
2
1
H= d3 kk 0 a{ ðkÞaðkÞ þ a{ ðkÞaðkÞ þ lim δ3 ðeÞ : ð22:116Þ
2 e→0
1 1
lim δ3 ðeÞ = lim eiex d3 x → d3 x:
e→0 ð2π Þ3 e → 0 ð2π Þ3
Therefore, we get
1
H= d 3 kk0 a{ ðkÞaðkÞ þ k0 d3 k d3 x: ð22:117Þ
2ð2π Þ3
we have
1 1
H= d3 kk0 nðkÞ þ d3 kd3 x k : ð22:119Þ
ð2π Þ3 2 0
The second term of (22.119) means a total energy included in the whole phase
space. In other words, the quantity (k0/2) represents the energy density per “volume”
(2π)3 of phase space. Note here that in the natural units the dimension of time and
length is the inverse of mass (mass dimension). The angular frequency and
wavenumber as well as energy and momentum have the same dimension of mass
accordingly. This is represented straight in (22.88).
Representing (2.21) in the natural units, we have
22.3 Quantization of the Scalar Field [5, 6] 1003
1
H = ωa{ a þ ω: ð22:120Þ
2
The analogy between (22.117) and (22.120) is evident. The operator n(k) is
closely related to the number operator introduced in Sect. 2.2; see (2.58). The second
term of (22.117) corresponds to the zero-point energy and has relevance to the
discussion about the harmonic oscillator of Chap. 2. Thus, we appreciate how deeply
the quantization of fields is connected to the treatment of the harmonic oscillators.
In accordance with (2.28) of Sect. 2.2, we define the state of vacuum j0i such that
8
aðkÞ j 0i = 0 for k: ð22:121Þ
1 1
H j 0i = d3 kk0 nðkÞ j 0i þ d3 kd 3 x k j 0i
ð2π Þ3 2 0
ð22:122Þ
1 1
= d3 kd 3 x k j 0i:
ð2π Þ3 2 0
1 1 μ
Pμ d 3 kkμ nðkÞ þ d3 kd3 x k ðμ = 1, 2, 3Þ: ð22:123Þ
ð2π Þ3 2
Notice that from (22.119) and (22.124), if we ignore the second term of (22.119)
and the index μ is extended to 0, we have P0 H. Thus, we reformulate Pμ as a four-
vector such that
Meanwhile, using the creation operators a{(k), let us consider the following
states:
where with the second to the last equality the first term vanishes because from
(22.121) we have a(q) j 0i = 0; with the last equality n terms containing the δ3(q -
ki) (i = 1, ⋯, n) factor survive.
Thus, from (22.118), (22.125), and (22.127) we get
Pμ j k1 , k 2 , ⋯, k n i = d 3 qkμ a{ ðqÞaðqÞ j k1 , k 2 , ⋯, k n i
where with the second to the last equality we used the second equation of (22.105);
with the last equality we used the definition of (22.126). With respect to the energy
of jth state, we have
22.3 Quantization of the Scalar Field [5, 6] 1005
From (22.130), we find that jk1, k2, ⋯, kni is the eigenstate of the four-momentum
operator Pμ (μ = 0, 1, 2, 3) that belongs to the eigenvalue k μ1 þ ⋯ þ k μn . The state
described by jk1, k2, ⋯, kni is called a Fock basis. A Hilbert space spanned by the
Fock basis is called a Fock space. As in the case of Chap. 2, the operator a{(k) is
interpreted as an operator that creates a particle having momentum k.
The calculation procedures of (22.127) naturally lead to the normal product
(or normal-ordered product), where the annihilation operators a(k) always stand to
the right of the creation operators a{(k). This is because if so, any term that contains
a(k) j 0i as a factor vanishes in virtue of (22.121). This situation is reminiscent of
(2.28), where jaψ 0i = 0 corresponds to a(k) j 0i = 0.
We have already mentioned that the Lorentz invariance (or covariance) is not
necessarily clear with the equal-time commutation relation. In this section we
describe characteristics of the invariant delta functions of the scalar field to explicitly
show the Lorentz invariance of these functions. In this section, we deal with the
invariant delta functions of the scalar field in detail, since their usage offers a typical
example as a prototype of the field quantization and, hence, can be applied to the
Dirac field and photon field as well.
The invariant delta function is defined using a commutator as
t
where x and y represent an arbitrary space–time four-coordinate such that x =
x
t0
and y = . Several different invariant delta functions will appear soon. Since
y
their definition and notation differ depending on literature, care should be taken.
Note that although we have first imposed the condition t = t′ on the commutation
relation, it is not the case with the present consideration. Since the invariant delta
function Δ(x - y) is a scalar function as we shall see below, we must have a Lorentz
1006 22 Quantization of Fields
invariant form for it. One of the major purposes of this section is to represent Δ(x -
y) and related functions in the Lorentz invariant form.
Dividing the positive-frequency part and negative-frequency part of ϕ(x) and
ϕ( y) such that [5]
we have
Then, we have
where with the third equality we used the first equation of (22.105). Also, in (22.136)
we used the fact that since k 0 = k2 þ m2 and q0 = q2 þ m2 with k = q, we have
k0 = q0 accordingly (vide supra). We define one of the invariant delta functions, i.e.,
Δ+(x - y) as
22.3 Quantization of the Scalar Field [5, 6] 1007
1 d 3 k - ikðx - yÞ
iΔþ ðx - yÞ ½ϕþ ðxÞ, ϕ - ðyÞ = e : ð22:137Þ
ð2π Þ3 2k0
Exchanging the coordinate x and y in (22.137) and reversing the order of the
operators in the commutator, we get the following expression where another invari-
ant delta function Δ-(x - y) is defined as [5]
From (22.137) and (22.138), we can readily get the following expression
described by
□x þ m2 Δþ ðx - yÞ = □x þ m2 Δ - ðx - yÞ = 0,
ð22:140Þ
□x þ m2 Δðx - yÞ = 0,
δ k 2 - m2 = δ k 0 2 - k2 þ m2
1
= δ k0 - k2 þ m 2 þ δ k 0 þ k2 þ m 2 :
2 k þ m2
2
ð22:142Þ
This relation results from the following property of the δ function [10]. We have
δðx - an Þ
δ½f ðxÞ = , ð22:143Þ
n jf 0 ðan Þj
1
δ x 2 - a2 = ½δðx - aÞ þ δðx þ aÞ ða > 0Þ: ð22:144Þ
2a
δ k 2 - m 2 θ ðk 0 Þ
1
= δ k0 - k2 þ m2 θðk0 Þ þ δ k0 þ k2 þ m2 θðk0 Þ :
2 k þ m2
2
ð22:145Þ
Since the argument in the δ function is positive in the second term of (22.145),
that term vanishes and we are left with
1
δ k 2 - m 2 θ ðk 0 Þ = δ k0 - k2 þ m 2 : ð22:146Þ
2 k þ
2
m2
1
δ k 2 - m2 θðk0 Þf ðk 0 Þ = δ k0 - k2 þ m 2 f ð k 0 Þ
2 k þ
2
m2
1
= δ k0 - k2 þ m 2 f k2 þ m2 ,
2 k þ
2
m2
ð22:147Þ
where with the second equality we used the following relation [11]:
22.3 Quantization of the Scalar Field [5, 6] 1009
dk 0 δ k 2 - m2 θðk0 Þf ðk0 Þ
1
¼ dk0 δ k0 - k2 þ m2 f k2 þ m 2 ð22:149Þ
2 k þ m2
2
1
¼ f k2 þ m2 :
2 k þ m2
2
Multiplying both sides of (22.149) by g(k) and further integrating it with respect
to k, we get
f ðk0 Þ e - ik0 ðx - y0 Þ
gðkÞ eikðx - yÞ :
0
and ð22:151Þ
1 d 3 k - ikðx - yÞ
iΔþ ðx - yÞ = e
ð2π Þ3 2k0
ð22:152Þ
1
= d 4 kδ k2 - m2 θðk 0 Þe - ikðx - yÞ :
ð2π Þ3
Note that with the first equality of (22.152) the factor e-ik(x - y) is given by
1010 22 Quantization of Fields
p
k2 þm2 ðx0 - y0 Þ ikðx - yÞ
e - ikðx - yÞ = e - i e ð22:153Þ
-1
iΔ - ðx - yÞ = dkδ k 2 - m2 θðk 0 Þeikðx - yÞ : ð22:154Þ
ð2π Þ3
iΔðx - yÞ = iΔþ ðx - yÞ þ iΔ - ðx - yÞ
= ½ϕþ ðxÞ, ϕ - ðyÞ þ ½ϕ - ðxÞ, ϕþ ðyÞ
1
= d4 kδ k 2 - m2 θðk0 Þ e - ikðx - yÞ - eikðx - yÞ ð22:155Þ
ð2π Þ3
- 2i
= d4 kδ k 2 - m2 θðk0 Þ sin k ðx - yÞ:
ð2π Þ3
1
iΔðx - yÞ = d 4 kδ k2 - m2 εðk0 Þe - ikðx - yÞ , ð22:156Þ
ð2π Þ3
To show that (22.155) and (22.156) are identical, use the fact that ε(x) is an odd
function with respect to x that is expressed as
Thus, we have shown that all the above functions Δ+(x - y), Δ-(x - y), and
Δ(x - y) are Lorentz invariant. On account of the Lorentz invariance, these functions
are said to be invariant delta functions. Since the invariant delta functions are given
22.3 Quantization of the Scalar Field [5, 6] 1011
= +
as a scalar, they are of special importance in the quantum field theory. This situation
is the same with the Dirac field and photon field as well.
In (22.152) and (22.154), we assume that the integration range spans the whole
real numbers from -1 to +1. As another way to represent the invariant delta
functions in the Lorentz invariant form, we may choose the complex variable for k0.
The contour integration method developed in Chap. 6 is useful for the present
purpose. In Fig. 22.1, we draw several contours to evaluate the complex integrations
with respect to the invariant delta functions. With these contours, the invariant delta
functions are represented as [5]
1 e - ikðx - yÞ
Δ ± ð x - yÞ = d4 k : ð22:159Þ
ð2π Þ4 C± m2 - k 2
m2 - k2 = m2 þ k2 - k20 : ð22:160Þ
e - ikðx - yÞ
IC ± d4 k , ð22:161Þ
C± m2 - k2
we have
1012 22 Quantization of Fields
e - ikðx - yÞ 1 - ikðx - yÞ 1 1
IC ± = d4 k = d4 k e -
C± m2 - k 2 C± 2K k0 þ K k0 - K
ð22:162Þ
1 - ik 0 ðx0 - y0 Þ 1 1
= d3 keikðx - yÞ dk 0 e - :
2K C± k0 þ K k0 - K
1
The function k0 þK is analytic within the complex domain k0 encircled by C+. With
the function k0 -1 K , however, a simple pole is present within the said domain
(at k0 = K ). Therefore, choosing C+ for the contour, only the second term contributes
to the integral with the integration I Cþ . It is given by
1 - ikðx - yÞ 1
I Cþ = d4 k e -
Cþ 2K k0 - K
dk 0 e - ik0 ðx - y0 Þ
1 1
d 3 keikðx - yÞ
0
= -
2K Cþ k0 - K
ð22:163Þ
ikðx - yÞ 1 - ik 0 ðx0 - y0 Þ 1
= d ke
3
2πiRes e -
2K k0 - K k0 = K
- e - iK ðx - y Þ ,
1
d 3 keikðx - yÞ
0 0
= 2πi
2K
where with the third equality we used (6.146) and (6.148). Then, from (22.159) Δ+ is
given by [5]
1 -i 1 - ikðx - yÞ
Δ þ ð x - yÞ = I Cþ = d3 k e , ð22:164Þ
ð2π Þ4 ð2π Þ3 2K
where we have
e - ikðx - yÞ = e - iK ðx - y0 Þ ikðx - yÞ
0
e : ð22:165Þ
1 - ikðx - yÞ 1
IC - = d4 k e
C- 2K k0 þ K
dk 0 e - ik0 ðx - y0 Þ
1 1
d3 keikðx - yÞ
0
=
2K C- k0 þ K
2πiRes e - ik0 ðx - y0 Þ
1 1 ð22:166Þ
d3 keikðx - yÞ
0
=
2K k0 þ K k0 = - K
1 3 - ikðx - yÞ iK ðx0 - y0 Þ
= 2πi d ke e
2K
1 ikðx - yÞ
= 2πi d3 k e ,
2K
where with the third equality we used (6.148) again. Also, note that with the second
to the last equality we used
1 i 1 ikðx - yÞ
Δ - ðx - yÞ = I =
4 C-
d3 k e , ð22:168Þ
ð2π Þ ð2π Þ3 2K
-i 1
Δ ðx - yÞ = d3 k e - ikðx - yÞ - eikðx - yÞ
ð2π Þ3 2K
-1 1
= d3 k sin k ðx - yÞ: ð22:169Þ
ð2π Þ3 K
1 1
½ϕðxÞ, ϕðyÞ = d3 k e - ikðx - yÞ - eikðx - yÞ : ð22:170Þ
ð2π Þ3 2K
This relation can readily be derived from (22.137) and (22.138) as well.
Returning to the notation (22.159), we get
1014 22 Quantization of Fields
1 1 e - ikðx - yÞ e - ikðx - yÞ
Δðx - yÞ = I þ IC - =
4 Cþ
d4 k þ d4 k
ð2π Þ ð2π Þ4 Cþ m2 - k 2 C- m2 - k2
- ik ðx - yÞ
1 e
= d4 k ,
ð2π Þ4 C m2 - k 2
ð22:171Þ
where C is a contour that encircles both C+ and C- as shown in Fig. 22.1. This is
evident from (6.87) and (6.146) that relate the complex integral with residues.
We summarize major properties of the invariant delta functions below. From
(22.169), we have
d3 k
ΔðxÞ = - i½ϕðxÞ, ϕð0Þ = - i e - ikx - eikx : ð22:172Þ
ð2π Þ3 2k 0
d3 k
ΔðxÞjx0 = 0 = - i eikx - e - ikx = 0, ð22:173Þ
ð2π Þ3 2k 0
where with the second equality we used (22.167). Differentiating (22.172) with
respect to x0, we get
: d3 k d3 k
ΔðxÞ = - i ð- ik 0 Þe - ikx - ik 0 eikx = -
ð2π Þ3 2k 0 ð2π Þ3 2
e - ikx þ eikx : ð22:174Þ
: d3 k
ΔðxÞ =- eikx þ e - ikx = - δ3 ðxÞ,
x0 = 0 ð2π Þ3 2
-1 1
ΔðxÞ = d3 k sin kx:
ð2π Þ3 K
Using ðsin kxÞ = k0 cos kx, we obtain
22.3 Quantization of the Scalar Field [5, 6] 1015
: -1
Δ ð xÞ = d3 k cos kx:
ð2π Þ3
Then, we get
:
-1 -1
Δ ð xÞ = d 3 k cosð- k xÞ = d3 k cosðk xÞ = - δ3 ðxÞ:
x0 = 0 ð2π Þ3 ð2π Þ3
Using (22.131) and (22.173), however, the above equation can be rewritten as
A world interval between the events that are labeled by t, x and t, y is imaginary
(i.e., spacelike) as can be seen in (21.31). Since this nature is Lorentz invariant, we
conclude that
This clearly states that the fields at any two points x and y with spacelike
separation commute [5]. In other words, this statement can be translated into that
the physical measurements of the fields at two points with spacelike separation must
not interfere with each other. This is known as the microcausality.
Once the Lorentz invariant functions have been introduced, this directly leads to an
important quantity called the Feynman propagator. It is expressed as a c-number
directly connected to the experimental results that are dealt with in the framework of
the interaction between fields. The Feynman propagator is defined as a vacuum
expectation value of the time-ordered product of ϕ(x)ϕ( y), i.e.,
h0jT½ϕðxÞϕðyÞj0i: ð22:177Þ
With the above quantity, the time-ordered product T[ϕ(x)ϕ( y)] of the neutral
scalar field is defined by
1016 22 Quantization of Fields
T ½ϕðxÞϕðyÞ
d 3 kd3 q
= θ x0 - y0 aðkÞe - ikx þ a{ ðkÞeikx aðqÞe - iqy þ a{ ðqÞeiqy
ð2π Þ3 2 k 0 q0
d3 kd3 q
þ θ y0 - x0 aðqÞe - iqy þ a{ ðqÞeiqy aðkÞe - ikx þ a{ ðkÞeikx :
ð2π Þ3 2 k0 q0
Each integral contains four terms, only one of which survives after taking the
vacuum expectation values h0| ⋯| 0i; use the equations of a(k) j 0i = h0 j a{(k) = 0.
Then, we have
h0jT½ϕðxÞϕðyÞj0i
d3 kd3 q
= θ x0 - y0 e - ikx eiqy 0jaðkÞa{ ðqÞj0
ð2π Þ3 2 k0 q0 ð22:179Þ
d kd q
3 3
þ 3
θ y0 - x0 e - iqy eikx 0jaðqÞa{ ðkÞj0 :
ð2π Þ 2 k0 q0
Hence, we obtain
Similarly, we have
Substituting the above equations for (22.179) and integrating the resulting equa-
tion with respect to q, we get
22.3 Quantization of the Scalar Field [5, 6] 1017
d3 k
h0jT½ϕðxÞϕðyÞj0i =
ð2π Þ3 2k 0
θ x0 - y0 e - ikðx - yÞ þ θ y0 - x0 eikðx - yÞ , ð22:180Þ
ΔF ðx - yÞ h0jT½ϕðxÞϕðyÞj0i: ð22:182Þ
Notice that the definition and notation of the invariant delta functions and
Feynman propagator are different from literature to literature. Caution should be
taken accordingly. In this book we define the Feynman propagator as identical with
the vacuum expectation value of the time-ordered product [8].
Meanwhile, the commutator [ϕ+(x), ϕ-( y)] is a scalar function. Then, we have [5]
iΔþ ðx - yÞ = ½ϕþ ðxÞ, ϕ - ðyÞ = h0j½ϕþ ðxÞ, ϕ - ðyÞj0i = h0jϕþ ðxÞϕ - ðyÞj0i
= h0jϕðxÞϕðyÞj0i:
ð22:183Þ
ΔF ðx - yÞ = i θ x0 - y0 Δþ ðx - yÞ - θ y0 - x0 Δ - ðx - yÞ : ð22:185Þ
-i e - ikðx - yÞ
Δ F ð x - yÞ = d4 k , ð22:186Þ
ð2π Þ4 m2 - k 2 - iE
where E is an infinitesimal positive constant and we will take the limit E → 0 after the
whole calculation. When we dealt with the invariant delta functions in the last
section, the constant E was not included in the denominator. The presence or absence
of E comes from the difference in the integration paths of the complex integral that
are chosen for the evaluation of individual invariant functions. We will make a brief
discussion on this point at the end of this section.
It is convenient to express the Feynman propagator in the momentum space
(momentum space representation). In practice, taking the Fourier transform of
(22.186), we have
-i
= d 4 xΔF ðx - yÞeikðx - yÞ , ð22:187Þ
m2 - k 2 - iE
where we applied the relations (22.55) and (22.58) to (22.186). For the derivation,
use
1
ð2π Þ4 δ4 ðkÞ = eikx d 4 x:
-1
We define ΔF ðk Þ as
-i
ΔF ðkÞ : ð22:188Þ
m2 - k2 - iE
1 1 1
A - : ð22:189Þ
2K k 0 - ð- K þ iEÞ k 0 - ðK - iEÞ
1 2K - 2iE 1 - ðiE=K Þ
A= 2 = 2 : ð22:190Þ
2K K - k 0 - 2iKE þ E
2 2 m - k 2 - 2iKE þ E2
22.3 Quantization of the Scalar Field [5, 6] 1019
ΔF ðx - yÞ
-i 1 - ikðx - yÞ 1 1
= d4 k e - : ð22:191Þ
ð2π Þ4 2K k0 - ð- K þ iEÞ k0 - ðK - iEÞ
þ1
-i 1 - ik0 ðx0 - y0 Þ
ΔF ðx - yÞ ¼ d3 keikðx - yÞ dk 0 e
ð2π Þ4 -1 2K
1 1
× - : ð22:192Þ
k0 - ð- K þ iEÞ k0 - ðK - iEÞ
þ1
1 - ik0 ðx0 - y0 Þ 1 1
I dk 0 e - : ð22:193Þ
-1 2K k0 - ð- K þ iEÞ k0 - ðK - iEÞ
1 1
and
k0 - ð- K þ iEÞ k0 - ðK - iEÞ
tend uniformly to zero with |k0| → 1. This situation allows us to apply Jordan’s
lemma to the complex integration of (22.192); see Sect 6.8, Lemma 6.6. To perform
the complex integration, we have the following two cases.
1020 22 Quantization of Fields
(a)
Γ
i
− +
0
−
= +
(b)
= +
− 0 −
−i
Fig. 22.2 Contours to calculate the complex integrations with respect to the Feynman propagator
ΔF(x - y) of the scalar field. (a) Contour integration with the upper semicircle C. (b) Contour
~
integration with the lower semicircle C
1. Case I:
We choose an upper semicircle C for the contour integration (see Fig. 22.2a).
As in the case of Example 6.4, we define IC as
1 - ik0 ðx0 - y0 Þ 1 1
IC = dk 0 e - : ð22:194Þ
2K k 0 - ð- K þ iEÞ k0 - ðK - iEÞ
C
22.3 Quantization of the Scalar Field [5, 6] 1021
As the function k0 - ð1K - iEÞ is analytic on and inside the contour C, its associated
contour integration vanishes due to the Cauchy’s integral theorem (Theorem
6.10). Hence, we are left with
1 - ik0 ðx0 - y0 Þ 1
IC = dk 0 e : ð22:195Þ
2K k0 - ð- K þ iEÞ
C
e - ik0 ðx - y0 Þ
= e - iRðx - y0 Þ cos θ Rðx0 - y0 Þ sin θ
0 0
e :
Taking the absolute value of both sides for the above equation, we obtain
e - ik0 ðx - y0 Þ
= e - iRðx - y0 Þ cos θ
eRðx - y0 Þ sin θ
= eR ð x - y0 Þ sin θ
0 0 0 0
:
Taking E → 0, we obtain
1022 22 Quantization of Fields
1 iK ðx0 - y0 Þ
I = 2πi e : ð22:197Þ
2K
2. Case II:
We choose a lower semicircle C for the contour integration (see Fig. 22.2b).
As in Case I, we define I as
C
1 - ik0 ðx0 - y0 Þ 1 1
I = dk 0 e - :
C 2K k0 - ð- K þ iEÞ k0 - ðK - iEÞ
C
This time, as the function k0 - ð-1 KþiEÞ is analytic on and inside the contour C, its
associated contour integration vanishes due to the Cauchy’s integral theorem, and
so we are left with
1
1 - ik0 ðx0 - y0 Þ 1
lim I = dk 0 e -
R→1 C -1 2K k0 - ðK - iEÞ
ð22:198Þ
1 - ik0 ðx0 - y0 Þ 1
þ lim dk 0 e - :
R→1 Γ
R
2K k0 - ðK - iEÞ
Using Jordan’s lemma as in Case I and applying (6.146) and (6.148) to the
present case, we get
- e - ik0 ðx - y Þ
1 0 0 1 - iK ðx0 - y0 Þ
I = - 2πi = 2πi e : ð22:199Þ
2K k0 = K 2K
In (22.199), the minus sign before 2πi with the first equality resulted from that
the contour integration was taken clockwise so that k0 = K - iE was contained
inside the closed contour (i.e., the lower semicircle C). Note that in this case we
must have x0 - y0 > 0 for Jordan’s lemma to be applied.
Combining the results of Case I and Case II and taking account of the fact that we
must have x0 - y0 < 0 (x0 - y0 > 0) with Case I (Case II), we get
θ - x0 þ y0 eiK ðx - y Þ þ θ x0 - y0 e - iK ðx - y Þ :
1 0 0 0 0
I = 2πi ð22:200Þ
2K
-i
Δ F ðx - y Þ = d 3 kIe - ikðx - yÞ
ð2π Þ4
d 3 k ikðx - yÞ
θ - x0 þ y0 eiK ðx - y Þ þ θ x0 - y0 e - iK ðx - y Þ
1 0 0 0 0
= e
ð2π Þ3 2K
1 d3 k ik ðx - yÞ
= 3
θ - x0 þ y0 þ θ x0 - y0 e - ikðx - yÞ
ð2π Þ 2K
= h0jT ½ϕðxÞϕðyÞj0i,
ð22:201Þ
where with the second equality we switched the integral variable k to -k in the first
term; the last equality came from (22.180). Thus, we have shown that (22.186)
properly reproduces the definition of the Feynman propagator ΔF(x - y) given by
(22.182).
To calculate the invariant delta functions, we have encountered several examples
of the complex integration. It is worth noting that comparing the contours C of
Fig. 22.2a and C- of Fig. 22.1 their topological features are virtually the same. This
is also the case with the topological relationship between the contours C of
Fig. 22.2b and C+ of Fig. 22.1. That is, the former two contours encircle only
k0 = - K (or k0 = - K + iE), whereas the latter two contours encircle only
k0 = K (or k0 = K - iE).
Comparing (22.186) with (22.159) and (22.171), we realize that the functional
forms of the integrand are virtually the same except for the presence or absence of
the term -iE in the denominator. With the complex integration of k0, simple poles
±K are located inside the contour for the integration regarding (22.159) and
(22.171). With (22.186), however, if the term -iE were lacked, the simple poles
±K would be located on the integration path. To avoid this situation, we “raise” or
“lower” the pole by adding +iE (E > 0) or -iE to its complex coordinate, respectively,
so that we can apply Cauchy’s Integral Theorem (see Theorem 6.10) and Cauchy’s
Integral Formula (Theorem 6.11) to the contour integration [11]. After completing
the relevant calculations, we take E → 0 to evaluate the integral properly; see
(22.197), for example. Nevertheless, we often do without this mathematical manip-
ulation. An example can be seen in Chap. 23.
So far, we have described the quantization procedures of the scalar field. We will
subsequently deal with the quantization of the Dirac field and electromagnetic field
(or photon field). The quantization processes and resulting formulation have differ-
ent faces depending on different types of relativistic fields. Nevertheless, the quan-
tization methods can be integrated into a unified approach. This section links the
contents of the previous sections to those of the subsequent sections.
In Sect. 22.2.2, we had
1024 22 Quantization of Fields
1
E= j kihk j dk, ð22:64Þ
-1
p0
where p = , to which p0 denotes the energy and p represents the three-
p
dimensional momenta of a relativistic particle. We define the quantity p2 as
p2 p μ p μ :
p2 = p0 2 - p2 = m2 or p2 - m2 = 0:
δðx - aÞ δðx þ aÞ
δ x 2 - a2 = þ :
2j aj 2j - aj
Exchanging x and a and using the fact that the δ function is an even function, we
have
δðx - aÞ δðx þ aÞ
δ x2 - a2 = þ ,
2j xj 2j - x j
δ p2 - m2 = δ p0 2 - p2 þ m2
δ p0 - p2 þ m2 δ p0 þ p2 þ m 2
= þ :
2jp0 j 2j - p0 j
1 1 δ p0 - p2 þ m2 δ p0 þ p2 þ m 2
= dp3 dp0 þ j pihp j :
-1 -1 2jp0 j 2j - p0 j
Operating hxj and jΦi on both sides of the above equation from the left and right,
respectively, we get
xEΦ = hxjΦi
1 ð22:204Þ
dp3
= xjp, E p hp, E p jΦi þ hxjp, - E p ihp, - Ep jΦi ,
- 1 2E p
where jxi represents the continuous space–time basis vectors of the Minkowski
space (see Fig. 10.1, which shows a one-dimensional case); jΦi denotes the quantum
state of the relativistic particle (photon, electron, etc.) we are thinking of. According
to (10.90), we express hx| Φi of (22.204) as
ΦðxÞ hxjΦi:
1
x, tjp, ± E p = eipx : ð22:205Þ
3
ð2π Þ
Φ ð xÞ
1
dp3 1
= eipx e - iEp t αþ ðpÞΦþ p, E p þ eiEp t α{- ðpÞΦ - p, - E p ,
- 1 2E p ð2π Þ 3
ð22:206Þ
where α+(p) and α{- ðpÞ are the annihilation and creation operators (i.e., q-numbers),
respectively. In (22.206) we have defined
Even though Φ±(p, ±Ep) contains the representation of spinor, vector, etc., it is a
c-number. Further defining the operators such that [8]
we obtain
22.3 Quantization of the Scalar Field [5, 6] 1027
Φ ð xÞ
1
dp3
= eipx e - iEp t aþ ðpÞΦþ p, E p þ eiEp t a{- ðpÞΦ - p, - E p :
3
-1 ð2π Þ 2E p
ð22:207Þ
Then, we get
1
dp3
ΦðxÞ ¼ eipx e - iEp t aþ ðpÞΦþ p, Ep
-1 ð2π Þ3 2E p
where with the second equality e - ipx = eipx e - iEp t and eipx = e - ipx eiEp t; in the second
term of RHS p was switched to -p. In (22.208), e.g., b{- ðpÞ is often used instead of
a{- ð- pÞ for the sake of distinction and clarity.
The quantization of the relativistic fields is based upon (22.208) in common with
regard to the neutral scalar field, Dirac field, and the photon field. The first term of
(22.208) is related to the positive-energy state that is characterized by a+(p) and
Φ+(p, Ep). The second term, in turn, is associated with the negative-energy state
characterized by a{- ð- pÞ and Φ-(-p, -Ep). In Sect. 22.3.2, we established a basic
equation of (22.98) as a starting point for the quantization of the scalar field. It was
unclear whether (22.98) is Lorentz invariant. Its equivalent expression of (22.208),
however, has been derived from the explicitly invariant equation of (22.203). Thus,
this enables us to adopt (22.208) as a fundamental equation for the quantization of
relativistic fields.
As in (22.208), it is always possible to separate (or decompose) the field operator
Φ(x) into the positive-energy and negative-energy parts. Defining
we have
1028 22 Quantization of Fields
1
dp3
Φþ ðxÞ e - ipx aþ ðpÞΦþ p, E p ,
-1 3
ð2π Þ 2E p
1 ð22:209Þ
dp3
Φ - ð xÞ eipx a{- ð - pÞΦ - - p, - E p :
-1 ð2π Þ3 2E p
Φþ p, E p Φ - - p, - Ep 1:
Thus, we have reproduced (22.98). In the case of the Dirac field, in turn, we have
following correspondence:
where we consider the helicity h as an additional freedom (see Sect. 22.4). Regarding
other specific notations, see the subsequent sections.
As in the case of the scalar field, we start with the Lagrangian density of the Dirac
field. The Lagrangian density L(x) is given by
Then, the action integral S(ψ) follows from (22.210) such that
δSðψ Þ = dxδLðxÞ
= dxδψ iγ μ ∂μ ψ - mψ þ dx ψiγ μ ∂μ - mψ δψ
= dxδψ iγ μ ∂μ ψ - mψ þ ½ψiγ μ δψ 1
-1 þ dx - ∂μ ψiγ μ - mψ δψ
ð22:212Þ
where with the second to the last equality we used integration by parts and δψ was
supposed to vanish at x = ± 1. Since ψ and ψ are regarded as independent
variables, for δS(ψ) = 0 to hold with any arbitrarily chosen δψ and δψ we must have
iγ μ ∂μ - m ψ ðxÞ = 0 ð22:213Þ
and
i ψ ∂μ γ μ þ mψ = 0: ð22:215Þ
Equations (22.214) and (22.215) are called the adjoint Dirac equation. Rewriting
(22.214), we have
1030 22 Quantization of Fields
i∂μ ψ { γ 0 γ μ þ mψ { γ 0 = 0,
# take a transposition and rearrange the terms :
T
iðγ Þ γ 0 ∂μ ψ þ mγ 0 ψ = 0,
μ T
where we used the fact that γ 0 is an Hermitian operator and that γ k (k = 1, 2, 3) are
anti-Hermitian. Rewriting (22.216), finally we recover
iγ μ ∂μ - m ψ ðxÞ = 0: ð22:213Þ
Thus, (22.213) can naturally be derived from (22.214), which is again equivalent
to (21.54).
As in the case of the scalar field, the Hamiltonian density ℋ(x) of the Dirac field
is given by
Since ψ(x) represents a spinor field that is described by a (4, 1) matrix (i.e.,
column vector), π(x) is a (1, 4) matrix (row vector) so that ℋ is a scalar, i.e., a (1, 1)
matrix. If we need to distinguish each component, we put an index, e.g., ψ(x)a and
π(x)b (a, b = 1, 2, 3, 4). When we need to distinguish the time coordinate from the
space coordinates, we describe, e.g., π(x) as π(t, x). We have
∂LðxÞ
π ð xÞ = = ψ ðxÞiγ 0 = ψ { ðxÞγ 0 iγ 0 = iψ { ðxÞ: ð22:218Þ
∂ ½ ∂ 0 ψ ð xÞ
∂LðxÞ
π a ð xÞ = = ψ ðxÞiγ 0 = ψ { ðxÞγ 0 iγ 0 a = i ψ { ðxÞa
∂ ½ ∂ 0 ψ a ð xÞ a
= iψ ðxÞa : ð22:219Þ
Note that in (22.219) ψ {(x) is a (1, 4) matrix and that ψ (x)a is its ath component.
The derivation of (22.219) is based on a so-called compendium quantization method.
Interested readers are referred to literature for further discussion [2, 6, 8]. Using
(22.219), the Hamiltonian density ℋ of (22.217) is expressed as
:
ℋðxÞ = iψ { ðxÞ ψ ðxÞ - ψ ðxÞ iγ μ ∂μ - m ψ ðxÞ
:
= iψ { ðxÞ ψ ðxÞ - ψ { ðxÞγ 0 iγ μ ∂μ ψ ðxÞ þ ψ ðxÞmψ ðxÞ
:
= iψ { ðxÞ ψ ðxÞ - ψ { ðxÞγ 0 iγ μ ∂μ ψ ðxÞ þ ψ ðxÞmψ ðxÞ
3
=- ψ { ðxÞγ 0 iγ k ∂k ψ ðxÞ þ ψ ðxÞmψ ðxÞ
k=1
3
= ψ ð xÞ - iγ k ∂k þ m ψ ðxÞ:
k=1
According to the classical theory (i.e., the theory before the quantum field theory)
based on the one-particle wave function and its probabilistic interpretation, the
integrand of (22.221) can be viewed as an expectation value of the one-particle
energy if ψ(x) is normalized. Then, the classical Schrödinger equation could be
described as
3
- iγ 0 γ k ∂k þ γ 0 m - γ 0 γ 0 i∂0 ψ ðxÞ = 0: ð22:223Þ
k=1
3
- iγ k ∂k þ m - iγ 0 ∂0 ψ ðxÞ = 0:
k=1
iγ μ ∂μ - m ψ ðxÞ = 0: ð22:213Þ
Thus, once again we have recovered the Dirac equation. If Schrödinger had
invented the gamma matrices, he would have discovered the Dirac equation
(22.213).
If we choose ϕ(x) of (21.59) for the solution of the Dirac equation, H is identical
with Hp of (21.131) and gives a positive one-particle energy p0. If, however, χ(x) of
(21.59) is chosen, H is identical with H-p that gives a negative energy -p0, in
agreement with the result of Sect. 21.3.3.
Unlike the classical theory, ψ(x) must be dealt with as a quantum-mechanical
operator. Instead of the commutation relation given in (22.84) with the scalar field,
we use the following equal-time anti-commutation relation (see Sect. 21.3.1)
described by
T
fψ ðt, xÞ, π ðt, yÞg ψ ðt, xÞπ ðt, yÞ þ π ðt, yÞT ½ψ ðt, xÞT : ð22:225Þ
In the second term of (22.225), π(t, y)ψ(t, x) is replaced with π(t, y)T[ψ(t, x)]T and
the components of π(t, y) are placed in front of those of ψ(t, x).
Another word of caution is that both ψ(t, x) and π(t, y) are Grassmann numbers.
Let a and b be Grassmann numbers. Then, we have ab = - ba, a2 = b2 = 0. Let
A = (aij) and B = (bij) be matrices whose matrix elements are Grassmann numbers.
Unlike ordinary c-numbers, we have aijbkl ≠ bklaij. Hence, the standard matrix
multiplication rules must be modified. If numbers we are dealing with are not
Grassmann numbers but ordinary c-numbers, we would simply have
22.4 Quantization of the Dirac Field [5, 6] 1033
T
π ðt, yÞT ½ψ ðt, xÞT = ψ ðt, xÞπ ðt, yÞ:
This equation, however, is not true of the present case where both ψ(t, x) and π(t,
y) are Grassmann numbers. To highlight this feature, let us think of a following
simple example.
Example 22.1 Suppose that we have matrices
a1
a2
A= and B = ðb1 b2 b3 b4 Þ: ð22:226Þ
a3
a4
Then, we have
a 1 b1 a1 b2 a1 b3 a1 b4
a2 b1 a2 b2 a2 b3 a2 b4
AB = ,
a3 b1 a3 b2 a3 b3 a3 b4
a4 b1 a4 b2 a4 b3 a4 b4
ð22:227Þ
b1 a1 b2 a1 b3 a1 b4 a1
T b1 a2 b2 a2 b3 a2 b4 a2
BT AT = :
b1 a3 b2 a3 b3 a3 b4 a3
b1 a4 b2 a4 b3 a4 b4 a4
If all the above numbers are ordinary c-numbers, we have no problem with
AB = (BTAT)T. But, if we are dealing with the Grassmann numbers, we have
AB ≠ (BTAT)T. If we express the anti-commutator between A and B, symbolically
we have
T
fA, Bg = AB þ BT AT
a 1 b1 þ b1 a1 a1 b2 þ b2 a1 a1 b3 þ b3 a1 a 1 b4 þ b4 a1
a2 b1 þ b1 a2 a2 b2 þ b2 a2 a2 b3 þ b3 a2 a 2 b4 þ b4 a2
= :
a3 b1 þ b1 a3 a3 b2 þ b2 a3 a3 b3 þ b3 a3 a 3 b4 þ b4 a3
a4 b1 þ b1 a4 a4 b2 þ b2 a4 a4 b3 þ b3 a4 a 4 b4 þ b4 a4
Thus, this simple example shows that care should be taken when dealing with the
Grassmann numbers, particularly in the case of matrix calculations. □
Getting back to our present issue, we write the first equation of (22.224) in a
matrix form to get
T
ψ1 π1
ψ2 π2
ð π1 π2 π3 π4 Þ þ ψ1 ψ2 ψ3 ψ4
ψ3 π3
ψ4 π4
ð22:228Þ
iδ ðx - yÞ
3
iδ3 ðx - yÞ
= ,
iδ3 ðx - yÞ
iδ3 ðx - yÞ
where with RHS all the off-diagonal elements vanish with the diagonal elements
being iδ3(x - y). That is, (22.228) is a diagonal matrix. Also, notice that using
(22.218), (22.228) can be rewritten as
ψ 1
T
ψ1
ψ2 ψ 2
ð ψ 1 ψ 2 ψ 3 ψ 4 Þ þ ψ1 ψ2 ψ3 ψ4
ψ3 ψ 3
ψ4 ψ 4
ð22:229Þ
δ ð x - yÞ
3
δ 3 ð x - yÞ
= :
δ3 ðx - yÞ
δ3 ðx - yÞ
Another way to show (22.228) is to write down the matrix elements in such a way
that [8]
Namely,
Note that neither α nor β is an index that indicates the four-vector, but that both α
and β are indices of spinors. Then, we specify α, β = 1, 2, 3, 4 instead of α, β = 0,
1, 2, 3.
Equations (22.230) and (22.231) imply the presence of a transposed matrix,
which has appeared in (22.228) and (22.229). Unlike (22.84) describing the com-
mutation relation of the scalar field, the anti-commutation relation for the Dirac field
of (22.224), (22.229), and (22.230) is represented by (4, 4) matrices.
The quantization of the Dirac field can be performed basically in parallel with that of
the scalar field. That is, (A) the operator ψ(x) needs to be Fourier-expanded in an
invariant form. (B) The creation operator and annihilation operator are introduced.
(C) The anti-commutation relation between the creation and annihilation operators
needs to be established. The calculation processes, however, differ from the scalar
field quantization in several points. For instance, (a) ψ(x) does not need to be
Hermitian. (b) The calculations are somewhat time-consuming in that we have to
do (4, 4) matrices calculations. (But they are straightforward.)
A. We start with the Fourier expansion of ψ(x). In Sect. 22.3.6 we had
1
dp3
Φ ð xÞ =
-1 ð2π Þ3 2Ep
In the Dirac field, we must include an additional freedom of the helicity h that
takes a number 1 or -1. Accordingly, Fourier expansion of the identity operator
(22.203) is reformulated as
1
E= δ p2 - m2 j p, hihp, h j dp4 :
- 1h = ± 1
as well as
With respect to the Dirac field, using ψ(x) in place of Φ(x) in the above
expression, we get [8]
ψ ð xÞ
1
dp3
= e - ipx bðp, hÞuðp, hÞ þ eipx d { ðp, hÞvðp, hÞ ,
-1 3
ð2π Þ 2p0 h= ±1
ð22:232Þ
where p0 was used instead of Ep in (22.208). The operators b(p, h) and d{(p, h)
are an annihilation operator of an electron and a creation operator of a positron,
respectively.
The introduction of d{(p, h) needs some explanation. So far, we permitted the
negative-energy state. However, if a particle carrying a negative energy did exist
in the real world, we would face an intolerable situation. To get rid of this big
hurdle, Dirac proposed the hole theory (1930). Even though several aspects of
the hole theory are explained in literature [6, 8], we wish to add a few comments
to the theory. Dirac thought that vacuum consisted of electrons that occupied all
the negative-energy states (strictly speaking, the energy states with an energy of
-m or less where m is an electron mass). If an electron happened to be excited so
as to have a positive energy, a created hole within the negative-energy electrons
would behave as if it were a particle carrying a positive charge (|e| = - e) and a
positive energy with a rest mass m, i.e., a positron. The positron predicted by
Dirac was indeed soon discovered experimentally by Carl Anderson (1932).
Getting back to the present issue, the operator d{(p, h) might well be interpreted
as an annihilation operator of an electron carrying a negative energy. But once
the positron was discovered in the real world, d{(p, h) must be interpreted as the
creation operator of a positron.
Notice again that according to the custom the annihilation operator b(p, h) is a
“coefficient” of the positive-energy state u(p, h) and that the creation operator
d{(p, h) is a coefficient of the negative-energy state v(p, h). The anti-
commutation relations will be introduced because electron is a Fermion particle.
At the same time, the anti-commutation relations facilitate various calculations
(vide infra). Note that unlike the scalar field ϕ(x) described in (22.98), ψ(x) is
22.4 Quantization of the Dirac Field [5, 6] 1037
asymmetric with respect to the creation and annihilation operators, reflecting the
fact that ψ(x) is not Hermitian.
C. Our next task is to translate the canonical anti-commutation relations (22.224)
into the anti-commutation relations between the creation and annihilation oper-
ators. To this end, we wish first to solve (22.232) with respect to b(p, h). The
calculation procedures for this are as follows:
(a) Multiply both sides of (22.232) by e-iqx and integrate them with respect to x
to apply (22.101). As a result, we obtain
1
dx3 e - iqx ψ ðxÞ
-1
ð22:233Þ
ð2π Þ3 - iq0 t 0 0 iq0 t { 0 0
= e bðq, h Þuðq, h Þ þ e d ðq, h Þvðq, h Þ :
2q0 h0 = ± 1
(b) Multiplying both sides of (22.233) by u{(q, h) from the left and using
(21.156) and (21.157), we delete d{(q, h′) from (22.233) to get
1
dx3 e - iqx u{ ðq, hÞψ ðxÞ = ð2π Þ3 2q0 e - iq0 t bðq, hÞ: ð22:234Þ
-1
(c) Next, we determine the anti-commutation relations between b(q, h) and d(q,
h) (and their adjoints). For example,
1 1
dx3 dy3 eipx e - iqy u{ ðp, hÞψ ðxÞψ { ðyÞuðq, h0 Þþψ { ðyÞuðq, h0 Þu{ ðp, hÞψ ðxÞ:
-1 -1
ð22:236Þ
where with the last equality we used (21.156) and (22.229). Inserting (22.238) into
(22.236), we obtain
1 1
2p0 eitðp0 - q0 Þ
bðp, hÞ, b{ ðq, h0 Þ = δhh0 dx3 dy3 eiðqy - pxÞ δ3 ðx - yÞ
ð2π Þ3 2p0 -1 -1
1
2p0 eitðp0 - q0 Þ
= δhh0 dx3 eixðq - pÞ
ð2π Þ3 2p0 -1
22.4 Quantization of the Dirac Field [5, 6] 1039
2p0 eitðp0 - q0 Þ
= δhh0 ð2π Þ3 δ3 ðq - pÞ = δhh0 δ3 ðq - pÞ, ð22:239Þ
ð2π Þ3 2p0
where with the second to the last equality we used (22.101); the last equality was due
to the inclusion of δ3(q - p), which leads to q0 - p0 = 0. Thus, the result of (22.239)
has largely been simplified. We remark that the above calculation procedures offer
an interesting example for a computation of numbers that contain both Grassmann
numbers and ordinary c-numbers.
Other anti-commutation relations can be derived similarly. For example, we
obtain the following relation as in the case of (22.235):
1
1
d { ðq, hÞ = dx3 e - iqx v{ ðq, hÞψ ðxÞ: ð22:240Þ
3
ð2π Þ 2p0 -1
These results can be viewed in parallel with (22.105) in spite of the difference
between the commutation and anti-commutation relations and the difference in
presence or absence of the helicity freedom. Since (22.242) does not contain
space–time coordinates, it is useful for various calculations in relation to the
quantum fields.
The Dirac field is characterized by the antiparticles. The antiparticle of electron is the
positron. In Sect. 21.4 we have introduced the notion of the charge conjugation. It
can be regarded as the particle–antiparticle transformation. Strictly speaking, it is not
until we properly describe the field quantization by introducing the creation and
annihilation operators that the charge conjugation has its place. Nonetheless, it is of
great use in a mathematical approach to introduce the notion of charge conjugation
into the spinor field before being quantized. In fact, the formulation of the charge
1040 22 Quantization of Fields
conjugation developed in Sect. 21.4 can directly be imported along with the creation
and annihilation operators into the field quantization of the antiparticle.
In Sect. 21.4 the charge conjugation of the spinor ψ into ψ C was described as
ψ C = Cψ T = iγ 2 ψ : ð21:161Þ
Borrowing this notation and using (22.232), the antiparticle field ψ anti(x) is
expressed as
ψ anti ðxÞ
1
dp3
= eipx b{ ðp, hÞvðp, hÞ þ e - ipx dðp, hÞuðp, hÞ : ð22:244Þ
-1 3
ð2π Þ 2p0 h = ± 1
Comparing (22.232) and (22.244), we find that ψ(x) and ψ anti(x) can be obtained
merely by exchanging b(p, h) and d(p, h).
The discrimination between the particle and antiparticle is associated with
whether the operator ψ(x) is Hermitian or not. Since in the case of the neutral scalar
field the operator ϕ(x) is Hermitian, we have no discrimination between the particle
and antiparticle.
With the Dirac field, however, the operator ψ(x) is not Hermitian, but the particle
and antiparticle should be distinguished. We may think that destroying the negative-
energy state is equivalent to creating the positive-energy state, or vice versa.
Interested readers are referred to appropriate literature with more detailed discus-
sions on the particle and antiparticle as well as their implications [8, 12].
In this section we introduce the invariant delta functions of the Dirac field. As in the
case of the scalar field, these functions are of great importance in studying the
interaction of quantum fields. The derivation of these functions is parallel with
those of the scalar field.
We have performed the canonical quantization by setting the equal-time anti-
commutation relation (22.224). Combining (22.224) with the Dirac field expansion
22.4 Quantization of the Dirac Field [5, 6] 1041
dp3
ψ þ ðxÞ = e - ipx bðp, hÞuðp, hÞ,
3
ð2π Þ 2p0 h = ± 1
ð22:246Þ
dp3
ψ - ðxÞ = eipx b{ ðp, hÞuðp, hÞ:
ð2π Þ3 2p0 h = ± 1
dp3
ψ - ðxÞ = eipx d{ ðp, hÞvðp, hÞ,
3
ð2π Þ 2p0 h = ± 1
dp3
ψ þ ðxÞ = e - ipx dðp, hÞvðp, hÞ:
3
ð2π Þ 2p0 h = ± 1
fψ þ ðxÞ, ψ - ðxÞg
1 dq3 dp3
= e - iðpx - qyÞ ð22:247Þ
ð2π Þ 3
2p0 2q0 h, h0 = ± 1
× bðp, hÞb{ ðq, h0 Þuðp, hÞuðq, h0 Þ þ b{ ðq, h0 Þbðp, hÞuðq, h0 Þuðp, hÞ :
As implied in Example 22.1, the factor uðq, h0 Þuðp, hÞ in the second term of the
integrand of (22.247) must be read as
1042 22 Quantization of Fields
T
uðq, h0 Þuðp, hÞ = uðq, h0 Þ uðp, hÞT = uðp, hÞuðq, h0 Þ:
T
ð22:248Þ
Notice that since (22.248) does not contain the Grassmann numbers, unlike
(22.227) the second equality of (22.248) holds. Then, (22.247) can be rewritten as
1 dq3 dp3
fψ þ ðxÞ, ψ - ðxÞg = e - iðpx - qyÞ
ð2π Þ3 2p0 2q0 h, h0 = ± 1
× uðp, hÞuðq, hÞ bðp, hÞb{ ðq, h0 Þ þ b{ ðq, h0 Þbðp, hÞ
1 dq3 dp3
= e - iðpx - qyÞ uðp, hÞuðq, h0 Þ bðp, hÞ, b{ ðq, h0 Þ
ð2π Þ3 2p0 2q0 h, h0 = ± 1
1 dq3 dp3
= e - iðpx - qyÞ uðp, hÞuðq, h0 Þδhh0 δ3 ðq - pÞ ½{
ð2π Þ3 2p0 2q0 h, h0 = ± 1
1 dp3
= e - ipðx - yÞ uðp, hÞuðp, hÞ
ð2π Þ3 2p0 h = ± 1
1 dp3 - ipðx - yÞ
= e pμ γ μ þ m ½{{
ð2π Þ3 2p0
1 dp3 - ipðx - yÞ
= iγ μ ∂μ þ m = iγ μ ∂μ þ m iΔþ ðx - yÞ:
x x
e ½{{{
ð2π Þ3 2p0
ð22:249Þ
ð22:250Þ
iγ μ ∂μ þ m iΔ - ðx - yÞ iS - ðx - yÞ = fψ - ðxÞ, ψ þ ðyÞg:
x
Sðx - yÞ Sþ ðx - yÞ þ S - ðx - yÞ,
we obtain [5]
22.4 Quantization of the Dirac Field [5, 6] 1043
fψ ðxÞ, ψ ðyÞg
= fψ þ ðxÞ, ψ þ ðxÞg þ fψ þ ðxÞ, ψ - ðxÞg þ fψ - ðxÞ, ψ þ ðxÞg þ fψ - ðxÞ, ψ - ðxÞg
= fψ þ ðxÞ, ψ - ðxÞg þ fψ - ðxÞ, ψ þ ðxÞg = iSþ ðx - yÞ þ iS - ðx - yÞ
= iSðx - yÞ,
ð22:252Þ
where with the first equality, the first term and fourth term vanish because of
(22.242).
Similarly to the case of the scalar field, the Feynman propagator SF(x - y) is defined
as a vacuum expectation value of the T-product such that
In the case of the Dirac field, the time-ordered product (T-product) T½ψ ðxÞψ ðyÞ is
defined by
Note the minus sign before the second term of (22.254). It comes from the fact
that the Fermion operators ψ(x) and ψ ðyÞ are anti-commutative. We have to pay
attention to the symbolic notation of (22.254) as in the case of the calculation of
fψ þ ðxÞ, ψ - ðxÞg and fψ - ðxÞ, ψ þ ðyÞg. In fact, ψ ðyÞψ ðxÞ must be dealt with as a
(4, 4) matrix as well. Taking the vacuum expectation value of (22.254), we have
SF ð x - y Þ = i θ x 0 - y 0 Sþ ð x - y Þ - θ y 0 - x 0 S - ð x - y Þ : ð22:258Þ
T ½ψ ðxÞψ ðyÞ
1
θ ð x0 - y 0 Þ
= dq3 dp3 e - ipx bðp, hÞuðp, hÞ þ eipx d { ðp, hÞvðp, hÞ
-1 ð2π Þ3 2p0 h= ±1
1
θ ð y0 - x0 Þ
- dq3 dp3 eipy b{ ðq, h0 Þuðq, h0 Þ þ e - iqy dðq, h0 Þvðq, h0 Þ
-1 ð2π Þ3 2p0 0
h = ±1
Therefore, we obtain
1
θ ð y0 - x0 Þ
- dq3 dp3 h0j eipy b{ ðq, h0 Þuðq, h0 Þþ e - iqy d ðq, h0 Þvðq, h0 Þ
-1 ð2π Þ3 2p0 h0 ¼ ± 1
Equation (22.260) comprises two integrals and the integrands of each integral
comprise 2 × 2 = 4 terms. However, we do not need to calculate all of them. This is
because in (22.260) we only have to calculate the integrals with respect to the
following two integrands that lead to non-vanishing integrals:
and
22.4 Quantization of the Dirac Field [5, 6] 1045
bðp, hÞ j 0i = d ðq, h0 Þ j 0i = 0
and
We estimate the above two integrands A and B one by one. First, we rewrite A as
ð22:263Þ
where the second equality comes from (22.239) and the third equality from (22.262).
Paying attention to the fact that uðp, hÞuðq, h0 Þ is not a quantum operator but a (4, 4)
matrix, we have
Notice that h0| 0i = 1, i.e., the vacuum state has been normalized. Thus, we obtain
Similarly, we have
Substituting (22.265) and (22.266) for (22.260), by the aid of (22.267) we get
1 1
θðx0 - y0 Þ θ ðy 0 - x 0 Þ
h0jT ½ψ ðxÞψ ðyÞj0i = dq3 dp3
3
A- dq3 dp3 B
-1 ð2π Þ 2p0 -1 ð2π Þ3 2p0
1
θðx0 - y0 Þ
= dq3 dp3 e - iðpx - qyÞ δhh0 δ3 ðq - pÞuðp, hÞuðq, h0 Þ
-1 ð2π Þ3 2p0 h, h0 = ± 1
1
θðy0 - x0 Þ
- dq3 dp3 e - iðqy - pxÞ δhh0 δ3 ðq - pÞνðp, hÞνðq, h0 Þ:
-1 ð2π Þ3 2p0 h, h0 = ± 1
ð22:268Þ
Performing the integration with respect to q and summation with respect to h′, we
obtain
1
θ ð x0 - y0 Þ
h0jT ½ψ ðxÞψ ðyÞj0i = dp3 3
e - ipðx - yÞ uðp, hÞuðp, hÞ
-1 ð2π Þ 2p0 h = ± 1
1 ð22:269Þ
θ ð y0 - x0 Þ
- dp3 eipðx - yÞ νðp, hÞνðp, hÞ:
-1 ð2π Þ3 2p0 h = ± 1
ð22:270Þ
SF ðx - yÞ = iγ μ ∂μ þ m ΔF ðx - yÞ,
x
ð22:271Þ
where ΔF(x - y) was given by (22.201). With the derivation of (22.271), use
(22.270) and perform the differentiation with respect to the individual coordinate
xμ. We should be careful in treating the differentiation with respect to x0, because it
contains the differentiation of θ(x0 - y0).
We show the related calculations as follows:
22.4 Quantization of the Dirac Field [5, 6] 1047
x
iγ 0 ∂0 ΔF ðx - yÞ
1
iγ 0 dp3 x
= 3
∂0 θ x0 - y0 e - ipðx - yÞ þ θ y0 - x0 eipðx - yÞ
- 1 ð2π Þ 2p0
1
iγ 0 dp3
= 3
δ x0 - y0 e - ipðx - yÞ þ θ x0 - y0 ð - ip0 Þe - ipðx - yÞ
- 1 ð2π Þ 2p0
where with the second to the last equality the first and second integrands canceled
out each other. For this, use the properties of δ functions; i.e., δ(x0 - y0)e-ip(x - y) =
δ(x0 - y0)eip(x - y) and δ(x0 - y0) = δ(y0 - x0) along with
1 ipðx - yÞ 3 1
- 1e d p = - 1 e - ipðx - yÞ d3 p.
Also, we make use of the following relations:
3
x
iγ k ∂k ΔF ðx - yÞ
k=1
3 1
iγ k dp3
∂k θ x0 - y0 e - ipðx - yÞ þ θ y0 - x0 eipðx - yÞ
x
= 3
k=1 -1 ð2π Þ 2p0
3 1
iγ k dp3 ð22:273Þ
= 3
θ x0 - y0 ð - ipk Þe - ipðx - yÞ
- 1 ð2π Þ 2p0
k=1
þ θðy0 - x0 Þðipk Þeipðx - yÞ
1
3
dp3 pk γ k
= 3
θ x0 - y0 e - ipðx - yÞ - θ y0 - x0 eipðx - yÞ :
k=1 - 1 ð2π Þ 2p 0
Adding (22.272) and (22.273) together with mΔF(x - y) and taking account of
(22.201), we acknowledge that (22.271) is identical to (22.270).
We wish to relate SF(x - y) expressed as (22.270) to the momentum representa-
tion (or Fourier integral representation) as discussed in Sect. 22.3.5. We are readily
aware that the difference between (22.270) and (22.201) is with or without the factor
of ( pμγ μ + m) or (-pμγ μ + m). We derive these factors by modifying the functional
form of the integrand of (22.186). As studied in Sect. 22.3.5, (22.201) was based on
the residue obtained by the contour integral with respect to k0.
In Sect. 6.5, we considered a complex function having a pole of order n at z = a
that is represented by f(z) = g(z)/(z - a)n, where g(z) is analytic within a domain we
are thinking of. In Sect. 22.3.5, we thought of a type of a function having a simple
pole given by e-ikz/(z - a) with g(z) = e-ikz. In the present case, we are thinking of a
1048 22 Quantization of Fields
type of a function described by e-ikz(cz + b)/(z - a), where c and b are certain
constants. That is, we have
Therefore, the residue of the present case is given by g(a) = e-ika(ca + b). Taking
account of (22.270), we assume the next analytic function described by
3
gðp0 , pÞ = eipðx - yÞ e - ip0 ðx - y0 Þ
0
p0 γ 0 þ pk γ k þ m
k=1 ð22:274Þ
- ipðx - yÞ μ
=e pμ γ þ m :
3
gð- p0 , pÞ = eipðx - yÞ eip0 ðx - y0 Þ
0
- p0 γ 0 þ pk γ k þ m :
k=1
3
gð - p0 , - pÞ = e - ipðx - yÞ eip0 ðx - y0 Þ
0
- p0 γ 0 - pk γ k þ m
k=1
3
= eipðx - yÞ - p0 γ 0 - pk γ k þ m
k=1
ipðx - yÞ μ
=e - pμ γ þ m :
-i e - ipðx - yÞ pμ γ μ þ m
SF ðx - yÞ = d4 p : ð22:275Þ
ð2π Þ4 m2 - p2 - iE
- i pμ γ μ þ m
S F ð pÞ : ð22:276Þ
m2 - p2 - iE
We will make the most of the above-mentioned Feynman propagator in the next
chapter that deals with the interaction between the quantum fields.
To properly define the Lagrangian density and Hamiltonian density of the electro-
magnetic field, we revisit Part II as a preliminary stage. First, we introduce the
Heaviside-Lorentz units (in combination with the natural units) to simplify the
description of the Maxwell’s equations. It is because the use of the dielectric constant
of vacuum (ε0) and permeability of vacuum (μ0) is often troublesome. To delete
these constants, we redefine the electromagnetic quantities as follows:
p p
E, D → E= ε0 , ε0 D;
p p
H, B → H= μ0 , μ0 B;
p p
ρ, i → ε0 ρ, ε0 i:
div E = ρ,
div B = 0,
∂B ð22:277Þ
rot E þ = 0,
∂t
∂E
rot B - = i:
∂t
In (22.277) μ0ε0 = 1 is implied in accordance with the natural units; see (7.12).
Next, we introduce vector potential and scalar potential to express the electromag-
netic fields. Of the Maxwell’s equations, with the magnetic flux density B we had
div B = 0: ð7:2Þ
for any vector field V. From the above equations, we immediately express B as
B = rot A, ð22:279Þ
where A is a vector field called the vector potential. We know another formula of
vector analysis
rot— Λ 0, ð22:280Þ
where Λ is any scalar field. The confirmation of (22.280) is left for readers.
Therefore, the vector potential A has arbitrariness of choice so that B can remain
unchanged as follows:
∂A
rot E þ = 0: ð22:282Þ
∂t
∂A
E= - - — ϕ, ð22:283Þ
∂t
where ϕ is another scalar field called the scalar potential. The scalar potential ϕ and
vector potential A are said to be electromagnetic potentials as a collective term.
Nonetheless, we have no clear reason to mathematically distinguish ϕ of (22.283)
from Λ in (22.280) or (22.281).
Considering that these potentials ϕ and A have arbitrariness, we wish to set the
functional relationship between them by imposing an appropriate constraint. Such
constraint on the electromagnetic potentials is called the gauge transformation. The
gauge transformation is defined by
Note that A′ and ϕ′ denote the change in the functional form caused by the gauge
transformation. The gauge transformation holds the electric field E and magnetic
flux density B unchanged such that
∂A0 ðt, xÞ
E0 ðt, xÞ = - - — ϕ0 ðt, xÞ
∂t
∂Aðt, xÞ ∂— Λðt, xÞ ∂Λðt, xÞ
=- þ - — ϕðt, xÞ - —
∂t ∂t ∂t
∂Aðt, xÞ
=- - — ϕðt, xÞ,
∂t
where the last equality resulted from the fact that the operating order of ∂/∂t and
grad is exchangeable in the second equality. Also, we have
B0 ðt, xÞ = rot A0 ðt, xÞ = rot Aðt, xÞ - rot ½— Λðt, xÞ = rot Aðt, xÞ:
F 00 F 01 F 02 F 03 0 - E1 - E2 - E3
F 10 F 11 F 12 F 13 E1 0 - B3 B2
F μν =
F 20 F 21 F 22 F 23 E2 B3 0 - B1
F 30 F 31 F 32 F 33 E3 - B2 B1 0
= - F νμ , ð22:285Þ
where the indices 0, 1, 2, and 3 denote the t-, x-, y-, and z-component, respectively.
Note that the field quantities of (22.285) have been taken from (22.277). For later
use, we show the covariant form expressed as
F 00 F 01 F 02 F 03
F 10 F 11 F 12 F 13
F μν =
F 20 F 21 F 22 F 23
F 30 F 31 F 32 F 33
0 E1 E2 E3
- E1 0 - B3 B2
= = - F νμ : ð22:286Þ
- E2 B3 0 - B1
- E3 - B2 B1 0
where (η)ρμ is the Minkowski metric that has appeared in (21.16). We will discuss
the constitution and properties of the tensors in detail in Chap. 24.
Defining the charge–current density four-vector as
ρð xÞ
ρð xÞ i 1 ð xÞ
sν ðxÞ = , ð22:288Þ
i ð xÞ i 2 ð xÞ
i 3 ð xÞ
we get the Maxwell equations described in the Lorentz invariant form as
∂_μ F^{μν}(x) = s^ν(x),   (22.289)
∂^ρ F^{μν}(x) + ∂^μ F^{νρ}(x) + ∂^ν F^{ρμ}(x) = 0,   (22.290)
where x represents the four-vector (t, x)ᵀ. Note that we use the notation (t, x)ᵀ to represent the four-vector and that, e.g., the notation rot A(t, x) is used to show the dependence of a function on independent variables. In (22.290) the indices ρ, μ, and ν are cyclically permuted. These equations are equivalent to (22.277). For instance, in (22.289) the equation described by
∂_μ F^{μ0}(x) = s^0(x)
implies
∂E₁/∂x₁ + ∂E₂/∂x₂ + ∂E₃/∂x₃ = ρ(x),
i.e.,
div E = ρ.
Also, in (22.290) the equation described by
∂^0 F^{12}(x) + ∂^1 F^{20}(x) + ∂^2 F^{01}(x) = 0
means
∂E_y/∂x − ∂E_x/∂y + ∂B_z/∂t = 0   or   (rot E)_z + (∂B/∂t)_z = 0.
That is, the third equation of (22.277) is recovered. The confirmation of the other relations is left for readers. In the above derivations of the Maxwell equations, note that
∂^0 = ∂/∂x₀ = ∂/∂x⁰ = ∂₀,   ∂^k = ∂/∂x_k = −∂/∂x^k = −∂_k   (k = 1, 2, 3).   (22.291)
Operating ∂_ν on both sides of (22.289) and noting the antisymmetry of F^{μν}, we obtain
∂_μ s^μ(x) = 0.   (22.292)
Equation (22.292) is essentially the same as the current continuity equation that
has already appeared in (7.20). Using four-vectors, the gauge transformation is
succinctly described as
A′^μ(x) = A^μ(x) + ∂^μΛ(x),   (22.293)
where A^μ(x) = (ϕ(x), A(x))ᵀ. Equation (22.293) is equivalent to (22.284). Using A^μ(x), we get
F^{μν}(x) = ∂^μ A^ν(x) − ∂^ν A^μ(x).   (22.294)
For instance,
F^{01}(x) = ∂^0 A^1(x) − ∂^1 A^0(x) = ∂A¹(x)/∂t + ∂ϕ(x)/∂x¹ = −E₁,
i.e., the (0, 1) component of (22.285) is recovered.
Other relations can readily be confirmed. The field strength Fμν(x) is invariant
under the gauge transformation (gauge invariance). It can be shown as follows:
F′^{μν}(x) = ∂^μ A′^ν(x) − ∂^ν A′^μ(x)
 = ∂^μ [A^ν(x) + ∂^νΛ(x)] − ∂^ν [A^μ(x) + ∂^μΛ(x)]
 = ∂^μ A^ν(x) + ∂^μ∂^νΛ − ∂^ν A^μ(x) − ∂^ν∂^μΛ = ∂^μ A^ν(x) − ∂^ν A^μ(x)
 = F^{μν}(x),   (22.295)
where with the third equality the operating order of ∂μ and ∂ν can be switched.
Further substituting (22.294) for (22.289), we obtain
∂_μ [∂^μ A^ν(x) − ∂^ν A^μ(x)] = ∂_μ∂^μ A^ν(x) − ∂_μ∂^ν A^μ(x) = s^ν(x).
Exchanging the operating order of ∂_μ and ∂^ν in the second term of the above equation, we get
∂_μ∂^μ A^ν(x) − ∂^ν ∂_μ A^μ(x) = s^ν(x),
i.e.,
□A^ν(x) − ∂^ν ∂_μ A^μ(x) = s^ν(x).   (22.296)
Defining in (22.296)
χ(x) ≡ ∂_μ A^μ(x),   (22.297)
we obtain
□A^ν(x) − ∂^ν χ(x) = s^ν(x).   (22.298)
Now, we are in the position to describe the Lagrangian density to quantize the free
electromagnetic field. Choosing Aμ(x) for the canonical coordinate, we first define
the Lagrangian density L(x) as
L(x) ≡ −(1/4) F_{μν}(x) F^{μν}(x) = −(1/4) [∂_μ A_ν(x) − ∂_ν A_μ(x)] [∂^μ A^ν(x) − ∂^ν A^μ(x)].   (22.299)
Using (22.285) and (22.286) based on the classical electromagnetic fields E(t, x)
and B(t, x), we obtain
L(x) = (1/2) {[E(t, x)]² − [B(t, x)]²}.   (22.300)
π_μ(x) Ȧ^μ(x) = (0  −E₁  −E₂  −E₃)(0, −E₁, −E₂, −E₃)ᵀ = E²,
and hence
ℋ(x) = (1/2) {[E(t, x)]² + [B(t, x)]²}.   (22.302)
L(x) ≡ −(1/4) F_{μν}(x) F^{μν}(x) − (1/2) χ².   (22.303)
Rewriting F_{μν}F^{μν} as
−(1/4) F_{μν} F^{μν} = −(1/2) [∂_μ A_ν ∂^μ A^ν − ∂_μ A_ν ∂^ν A^μ]
 = −(1/2) ∂_μ A_ν ∂^μ A^ν + (1/2) ∂_μ (A_ν ∂^ν A^μ) − (1/2) A_ν ∂^ν χ,
we have
L(x) = −(1/2) ∂_μ A_ν ∂^μ A^ν + (1/2) ∂_μ (A_ν ∂^ν A^μ) − (1/2) A_ν ∂^ν χ − (1/2) χ².   (22.304)
We wish to take the variation with respect to A_ν (i.e., δA_ν) termwise in (22.304). Namely, we write the action integral as
S(A_ν) = ∫ dx L(x) = S₁(A_ν) + S₂(A_ν) + S₃(A_ν) + S₄(A_ν) with S_i(A_ν) = ∫ dx L_i(x),
where L₁(x), ⋯, and L₄(x) correspond to the first, ⋯, and fourth term of (22.304), respectively. We obtain
δS₁(A_ν) = ∫ dx δL₁(x) = ∫ dx (−1/2) [(∂_μ δA_ν) ∂^μ A^ν + ∂_μ A_ν (∂^μ δA^ν)]
 = ∫ dx δA_ν (1/2) [∂_μ∂^μ A^ν + ∂^μ∂_μ A^ν] = ∫ dx δA_ν ∂_μ∂^μ A^ν,   (22.307)
where with the third equality the second term was rewritten so that the variation was
evaluated with δAν. In the derivation of (22.307), we used the integration by parts
and assumed that δAν vanishes at xμ → ± 1 as in the derivation of (22.76). Similarly
evaluating δS2(Aν), δS3(Aν), and δS4(Aν), we have
δS₂(A_ν) = ∫ dx δL₂(x) = −(1/2) ∫ dx δA_ν ∂_μ∂^ν A^μ,
δS₃(A_ν) = ∫ dx δL₃(x) = −(1/2) ∫ dx δA_ν ∂^ν∂_μ A^μ,   (22.308)
δS₄(A_ν) = ∫ dx δL₄(x) = ∫ dx δA_ν ∂^ν∂_μ A^μ.
Hence,
δS(A_ν) = δS₁(A_ν) = ∫ dx δA_ν ∂_μ∂^μ A^ν.   (22.309)
Notice that δS2(Aν), δS3(Aν), and δS4(Aν) canceled out one another. Thus, the
Euler–Lagrange equation obtained from (22.309) reads as
∂_μ∂^μ A^ν(x) = 0   or   □A^ν(x) = 0.   (22.310)
This leads to
∂_ν ∂^ν χ(x) = 0.   (22.313)
Suppose that at some initial time we also have
∂^0 χ(x) = ∂χ(x)/∂t = 0.   (22.314)
Consequently, if we impose χ(x) = 0 at that initial time, from (22.313) we have χ(x) ≡ 0 at all times [12]. The constraint χ(x) = 0 is called the Lorentz condition.
Any gauge in which the Lorentz condition holds is called a Lorentz gauge
[12]. The significance of the Lorentz condition will be mentioned later in relation
to the indefinite metric (Sect. 22.5.5).
Differentiating (22.293) with respect to xμ, we obtain
∂_μ A′^μ(x) = ∂_μ A^μ(x) + ∂_μ∂^μ Λ(x).   (22.316)
Using the Lorentz gauge, from (22.315) and (22.316) we have [12]
∂_μ∂^μ Λ(x) = 0   or   □Λ(x) = 0.   (22.317)
When we originally chose Λ(x) in (22.280), Λ could be any scalar field. We find,
however, that the scalar function Λ(x) must satisfy the condition of (22.317) under
the Lorentz gauge.
Under the Lorentz gauge, in turn, the last two terms of (22.304) vanish. If we
assume that Aν vanishes at x = ± 1, the second term of (22.304) vanishes as well. It
is because the integration of that term can be regarded the extension of the Gauss’s
theorem to the four-dimensional space–time (see Sect. 7.1). Thus, among four terms
of (22.304) only the first term survives as the Lagrangian density L(x) that properly
produces the physically meaningful π 0(x). That is, we get [12]
L(x) = −(1/2) ∂_μ A_ν ∂^μ A^ν.   (22.318)
Thus, the inconvenience of π 0(x) = 0 mentioned earlier has been skillfully gotten
rid of. Notice that RHS of (22.319) has a minus sign, in contrast to (22.83). What
remains to be done is to examine whether or not L(x) is invariant under the gauge
transformation (22.293). We will discuss this issue in Chap. 23.
In the subsequent sections, we perform the canonical quantization of the electro-
magnetic field using the canonical coordinates and canonical momenta.
In Part I and Part II, we discussed the polarization vectors of the electromagnetic
fields within the framework of the classical theory. In this section, we revisit the
related discussion and rebuild the polarization features of the fields from the rela-
tivistic point of view.
Previously, we have mentioned that the electromagnetic waves have two inde-
pendent polarization states as the transverse modes. Since in this section we are
dealing with the electromagnetic fields as the four-vectors, we need to consider two
extra polarization freedoms, i.e., the longitudinal mode and scalar mode. Then, we
distinguish these modes by the polarization vectors ε^μ(k, r) (r = 0, 1, 2, 3),
where the index μ denotes the component of the four-vector (either contravariant or
covariant); k is a wavenumber vector (see Part II); r represents the four linearly
independent polarization states for each k. The modes εμ(k, 1) and εμ(k, 2) are called
transverse polarizations, εμ(k, 3) longitudinal polarization, εμ(k, 0) scalar polariza-
tion. The quantity εμ(k, r) is a complex c-number. Also, note that
k = (k⁰, k)ᵀ   with   k⁰ = k₀ = |k|.   (22.320)
The polarization vectors are chosen so as to satisfy
ε_μ(k, r) ε^μ(k, s)* = −ζ_r δ_rs,   (22.321)
Σ_{r=0}^{3} ζ_r ε^μ(k, r) ε^ν(k, r)* = −η^{μν},   (22.322)
with
ζ₀ = −1,   ζ₁ = ζ₂ = ζ₃ = 1.   (22.323)
In a suitable frame we may take
ε^μ(k, 0) = (1, 0)ᵀ,   ε^μ(k, r) = (0, ε(k, r))ᵀ   (r = 1, 2, 3),   (22.325)
and we write
ε^μ(k, 0) ≡ n^μ.   (22.326)
Also, we have
ε^μ(k, 3) = [k^μ − (kn) n^μ] / √[(kn)² − k²],   (22.328)
where the numerator was obtained by subtracting the timelike component from k = (k⁰, k)ᵀ; related discussion for the three-dimensional version can be seen in Sect. 7.3.
In (22.321) and (22.322), we used complex numbers for εμ(k, s), which is
generally used for the circularly (or elliptically) polarized light (Sect. 7.4). With a
linearly polarized light, we use a real number for εμ(k, s).
All the above discussions relevant to the polarization features of the electromag-
netic fields can be dealt with within the framework of the classical theory of
electromagnetism.
Equation (22.329) corresponds to (22.84) of the scalar field case and is clearly
represented in the tensor form, i.e., a direct product of a contravariant vector Aμ(t, x)
and a covariant vector π ν(t, y). Multiplying both sides of (22.329) by ηνρ, we obtain
In (22.331) and (22.332), the minus sign of RHS comes from (22.319). All the
other commutators of other combinations vanish. (That is, all the other combinations
are commutative.) We have, e.g.,
[A^μ(t, x), A^ν(t, y)] = [Ȧ^μ(t, x), Ȧ^ν(t, y)] = 0.
Also, we have
E = ∫_{−∞}^{∞} Σ_{r=0}^{3} δ(k₀² − k²) | p, r⟩⟨p, r | dp⁴.   (22.333)
Notice that in (22.333) photon is massless. As the Fourier expansion of the field
operator Aμ(x) we obtain [2]
A^μ(x) = ∫ d³k/√[(2π)³(2k₀)] Σ_{r=0}^{3} [a(k, r) ε^μ(k, r) e^{−ikx} + a†(k, r) ε^μ(k, r)* e^{ikx}].   (22.334)
In (22.334), the summation with r is pertinent to a(k, r) [or a{(k, r)] and εμ(k, r)
[or εμ(k, r)]. In comparison with (22.208), in (22.334) we use the following
correspondence:
as well as
Then, as the decomposition of Aμ(x) into the positive-frequency term Aμ+(x) and
negative-frequency term Aμ-(x), we have
with
A^{μ+}(x) ≡ ∫ d³k/√[(2π)³(2k₀)] Σ_{r=0}^{3} a(k, r) ε^μ(k, r) e^{−ikx},
A^{μ−}(x) ≡ ∫ d³k/√[(2π)³(2k₀)] Σ_{r=0}^{3} a†(k, r) ε^μ(k, r)* e^{ikx}.   (22.335)
Note that Aμ(x) is Hermitian, as is the case with the scalar field ϕ(x). With the
covariant quantized electromagnetic field Aμ(x), we have
A_μ(x) = ∫ d³k/√[(2π)³(2k₀)] Σ_{r=0}^{3} [a(k, r) ε_μ(k, r) e^{−ikx} + a†(k, r) ε_μ(k, r)* e^{ikx}] = A_μ^{+}(x) + A_μ^{−}(x),   (22.336)
where Aμ þ ðxÞ and Aμ - ðxÞ are similarly defined as in the case of (22.335).
Now, we define operators Bμ ðkÞ as
B_μ(k) ≡ Σ_{r=0}^{3} a(k, r) ε_μ(k, r),   (22.337)
and correspondingly
B_μ(k)† = Σ_{r=0}^{3} a†(k, r) ε_μ(k, r)*.   (22.338)
Using the operators Bμ ðkÞ and Bμ ðkÞ{ , we may advance discussion of the
quantization in parallel with that of the scalar field. Note the following correspon-
dence between the photon field and scalar field:
Remember once again that both the photon field operator and real scalar field
operator are Hermitian (and bosonic field operators), allowing us to carry out the
parallel discussion on the field quantization. As in the case of (22.104), we have
d3 x μ
Bμ ðkÞe - ik0 t = i A_ ðt, xÞ - ik 0 Aμ ðt, xÞ e - iqx , ð22:339Þ
ð2π Þ3 ð2k 0 Þ
d3 x μ
Bμ ðkÞ{ eik0 t = - i A_ ðt, xÞ þ ik 0 Aμ ðt, xÞ eiqx : ð22:340Þ
ð2π Þ3 ð2k0 Þ
where Bν ðqÞ{ is the covariant form corresponding to Bν ðqÞ{ ; the minus sign of the
head of RHS in (22.341) comes from (22.319). In fact, we have
3 3
Bμ ðkÞ, Bν ðqÞ{ = aðk, r Þεμ ðk, rÞ, a{ ðq, sÞεν ðq, sÞ
r=0 s=0
3 3
= εμ ðk, r Þεν ðq, sÞ aðk, r Þ, a{ ðq, sÞ
r=0 s=0
3 3
= ζ r εμ ðk, r Þεν ðq, sÞ ζ r aðk, r Þ, a{ ðq, sÞ , ð22:342Þ
r=0 s=0
or
3 3
Bμ ðkÞ, Bν ðqÞ{ = ζ r εμ ðk, r Þεν ðq, sÞ δrs δ3 ðk - qÞ
r=0 s=0
3
= ζ r εμ ðk, r Þεν ðq, r Þ δ3 ðk - qÞ
r=0 ð22:341Þ
3
= ζ r εμ ðq, r Þεν ðq, r Þ δ3 ðk - qÞ
r=0
= - δμν δ3 ðk - qÞ
as required. With the second to the last equality of (22.341), we used (10.116), i.e.,
the property of the δ function; with the last equality we used (22.322). It is worth
mentioning that RHS of (22.344) contains ζ r. This fact implies that the quantization
of the radiation field is directly associated with the polarization feature of the
classical electromagnetic field.
In the last section, we had the commutation relation (22.344) with the creation and
annihilation operators as well as the canonical commutation relations (22.331) and
(22.332). These equations differ from the corresponding commutation relations
(22.84) and (22.105) of the scalar field in that (22.331) has a minus sign and that
(22.344) has a factor ζ r. This feature causes problems with the subsequent
discussion.
In relation to this issue, we calculate the Hamiltonian density ℋ(x) of the
electromagnetic field. It is given by
where with the second equality we used (22.318) and (22.319). As in the case of the
scalar field, the Hamiltonian H is given by the triple integral of ℋ(x) such that
H= d3 x ℋðxÞ:
The calculations are similar to those of the scalar field (Sect. 22.3.3). We estimate
the integral termwise. With the first term of (22.345), using (22.334) and (22.336) we
have
μ
- A_ ðxÞA_ μ ðxÞ
3
-1 d3 k
= p aðk, r Þεμ ðk, r Þð - ik 0 Þe - ikx þ a{ ðk, r Þεμ ðk, r Þ ðik 0 Þeikx
2ð2π Þ3 k0 r=0
3
d q3
× p aðq, sÞεμ ðq, sÞð - iq0 Þe - iqx þ a{ ðq, sÞεμ ðq, sÞ ðiq0 Þeiqx
q0 s=0
3
1
= d3 k k0 - aðk, r Þεμ ðk, r Þe - ikx þ a{ ðk, r Þεμ ðk, r Þ eikx
2ð2π Þ3 r=0
3
p
× d 3 q q0 - aðq, sÞεμ ðq, sÞe - iqx þ a{ ðq, sÞεμ ðq, sÞ eiqx :
s=0
ð22:346Þ
½Integrand of ð22:346Þ
3 3
= k 0 q0 - aðk, r Þεμ ðk, r Þa{ ðq, sÞεμ ðq, sÞ e - iðk - qÞx
r=0 s=0
μ
To calculate d3 x - A_ ðxÞA_ μ ðxÞ , we exchange the integration order between
the variable x and variable k (or variable q). That is, performing the integration with
respect to x first, we obtain
d3 x ½Intrgrand of ð22:346Þ
3 3
= k 0 q0 - aðk, r Þεμ ðk, r Þa{ ðq, sÞεμ ðq, sÞ e - iðk0 - q0 Þ ð2π Þ3 δðk - qÞ
r=0 s=0
þaðk, r Þεμ ðk, r Þaðq, sÞεμ ðq, sÞe - iðk0 þq0 Þ ð2π Þ3 δðk þ qÞ
þa{ ðk, r Þεμ ðk, r Þ a{ ðq, sÞεμ ðq, sÞ eiðk0 þq0 Þ ð2π Þ3 δðk þ qÞ :
ð22:347Þ
d 3 q ð22:347Þ
3 3
= ð2π Þ3 k 0 - aðk, r Þεμ ðk, r Þa{ ðk, sÞεμ ðk, sÞ
r=0 s=0
Only when k = - k, the last two terms of (22.348) are physically meaningful. In
that case, however, we have k = 0. But this implies no photon state, and so we
discard these two terms. Compare the present situation with that of (22.111), i.e., the
scalar field case, where k = 0 is physically meaningful. Then, using (22.321) we get
d 3 q ð22:347Þ
3 3
= ð2π Þ3 k 0 - aðk, r Þεμ ðk, sÞð - ζ r δrs Þ - a{ ðk, r Þaðk, sÞð - ζ r δrs Þ
r=0 s=0
3
= ð2π Þ3 k 0 ζ r aðk, r Þa{ ðk, r Þ þ a{ ðk, r Þaðk, r Þ
r=0
3
= ð2π Þ3 k 0 ζ r a{ ðk, r Þaðk, r Þ þ ζ r δ3 ð0Þ þ a{ ðk, r Þaðk, r Þ
r=0
3
= ð2π Þ3 k 0 ζ r 2a{ ðk, r Þaðk, r Þ þ ζ r 2 δ3 ð0Þ
r=0
3
= ð2π Þ3 k 0 ζ r 2a{ ðk, r Þaðk, r Þ þ ð2π Þ3 k0 4δ3 ð0Þ,
r=0
where with the third equality we used (22.344); with the last equality we used
(22.323). Finally, we get
∫ d³x [−Ȧ^μ(x) Ȧ_μ(x)] = ∫ d³k k₀ Σ_{r=0}^{3} ζ_r a†(k, r) a(k, r) + 2 ∫ d³k k₀ δ³(0).   (22.349)
In (22.349) the second term leads to the divergence as discussed in Sect. 22.3.3.
As before, we discard this term; namely the normal order is implied there.
As for the integration of the second term of (22.345), use the Fourier expansion
formulae (22.334) and (22.336) in combination with the δ function formula (22.101).
That term will vanish because of the presence of a factor
k μ kμ = k0 k 0 - jkj2 = 0:
The confirmation is left for readers. Thus, we obtain the Hamiltonian H described
by
H = ∫ d³k k₀ Σ_{r=0}^{3} ζ_r a†(k, r) a(k, r).   (22.350)
Let | 1_{q,s}⟩ ≡ a†(q, s) | 0⟩ denote a one-photon state with momentum q and polarization s. Operating H on it, we have
H | 1_{q,s}⟩ = ∫ d³k k₀ Σ_{r=0}^{3} ζ_r a†(k, r) a(k, r) a†(q, s) | 0⟩
 = ∫ d³k k₀ Σ_{r=0}^{3} ζ_r a†(k, r) [a†(q, s) a(k, r) + ζ_r δ_rs δ³(k − q)] | 0⟩
 = ∫ d³k k₀ Σ_{r=0}^{3} ζ_r² a†(k, r) δ_rs δ³(k − q) | 0⟩
 = ∫ d³k k₀ Σ_{r=0}^{3} δ_rs a†(k, r) δ³(k − q) | 0⟩
 = q₀ a†(q, s) | 0⟩ = q₀ | 1_{q,s}⟩,   (22.352)
where with the second equality we used (22.344); with the third equality we used
a(k, r) j 0i = 0; with the third to the last equality we used (22.323). Thus, (22.352)
certainly shows that the energy of the one-photon state is k0.
As a square of the norm h1q, s| 1q, si [see (13.8)], we have
⟨1_{q,s} | 1_{q,s}⟩ = ⟨0| a(q, s) a†(q, s) |0⟩ = ⟨0| a†(q, s) a(q, s) + ζ_s δ³(q − q) |0⟩
 = ζ_s δ³(0) ⟨0|0⟩ = ζ_s δ³(0),   (22.353)
where with the last equality we assumed that the vacuum state is normalized. With s = 0, we have ζ₀ = −1 and, hence,
⟨1_{q,0} | 1_{q,0}⟩ = −δ³(0) < 0.   (22.354)
This implies that the (square of) norm of a scalar photon is negative. The negative
norm is said to be an indefinite metric. Admittedly, (22.354) violates the rule of inner
product calculations (see Sect. 13.1). In fact, such a metric is intractable, and so it
should be appropriately dealt with. So far as the factor a(k, 0)a{(k, 0) is involved, we
could not avoid inconvenience. We wish to remove it from quantum-mechanical
operators.
At the same time, we have another problem with the Lorentz condition described
by
∂μ Aμ ðxÞ = 0: ð22:355Þ
This problem was resolved by Gupta and Bleuler by replacing the Lorentz
condition (22.355) with a weaker condition. The point is that we divide the
Hermitian operator A^μ(x) into A^{μ+}(x) and A^{μ−}(x) such that
A^μ(x) = A^{μ+}(x) + A^{μ−}(x),
with
A^{μ+}(x) ≡ ∫ d³k/√[(2π)³(2k₀)] Σ_{r=0}^{3} a(k, r) ε^μ(k, r) e^{−ikx},
A^{μ−}(x) ≡ ∫ d³k/√[(2π)³(2k₀)] Σ_{r=0}^{3} a†(k, r) ε^μ(k, r)* e^{ikx},   (22.335)
and to require, instead of (22.355), the weaker condition
∂_μ A^{μ+}(x) |Ψ⟩ = 0,   (22.357)
where jΨi is any physically meaningful state. Taking an adjoint of (22.357), we have
hΨ j ∂μ Aμ - ðxÞ = 0: ð22:358Þ
Accordingly, we obtain
Equation (22.359) implies that the Lorentz condition holds with an “expectation
value.” Substituting Aμ+(x) of (22.335) for (22.357), we get [2]
Σ_{r=0}^{3} k_μ a(k, r) ε^μ(k, r) |Ψ⟩ = 0.   (22.360)
With the above equation, notice that the functions e^{−ikx} in (22.335) are linearly independent with respect to individual k. Recalling the special coordinate frame where k is directed to the positive direction of the z-axis, we have [2]
ε^μ(k, 0) = (1, 0, 0, 0)ᵀ = δ^{(0)}_μ,   ε^μ(k, 1) = (0, 1, 0, 0)ᵀ = δ^{(1)}_μ,
ε^μ(k, 2) = (0, 0, 1, 0)ᵀ = δ^{(2)}_μ,   ε^μ(k, 3) = (0, 0, 0, 1)ᵀ = δ^{(3)}_μ,   (22.361)
and we obtain
Σ_{r=0}^{3} k_μ a(k, r) ε^μ(k, r) = k₀ a(k, 0) δ^{(0)}_0 + k₃ a(k, 3) δ^{(3)}_3 = k₀ [a(k, 0) − a(k, 3)].
With δ^{(0)}_0, δ^{(3)}_3, etc. in (22.361), see (12.175). Then, (22.360) is rewritten as
k₀ [a(k, 0) − a(k, 3)] |Ψ⟩ = 0.
Since k₀ ≠ 0, we have
[a(k, 0) − a(k, 3)] |Ψ⟩ = 0,
and hence ⟨Ψ| a†(k, 0) a(k, 0) |Ψ⟩ = ⟨Ψ| a†(k, 3) a(k, 3) |Ψ⟩ for any physically meaningful state |Ψ⟩.
Meanwhile, sandwiching both sides of (22.350) between ⟨Ψ| and |Ψ⟩, we obtain
⟨Ψ| H |Ψ⟩ = ∫ d³k k₀ Σ_{r=0}^{3} ζ_r ⟨Ψ| a†(k, r) a(k, r) |Ψ⟩.   (22.367)
Because the expectation values of a†(k, 0)a(k, 0) and a†(k, 3)a(k, 3) are equal for a physically meaningful state and ζ₀ = −1, their contributions cancel, and we are left with
⟨Ψ| H |Ψ⟩ = ∫ d³k k₀ Σ_{r=1}^{2} ⟨Ψ| a†(k, r) a(k, r) |Ψ⟩.   (22.368)
Thus, we have successfully gotten rid of the factor a†(k, 0)a(k, 0) together with a†(k, 3)a(k, 3) from the Hamiltonian.
As in the case of both the scalar field and Dirac field, we wish to find the Feynman
propagator of the photon field. First, we calculate [Aμ(x), Aν( y)]. Using (22.334), we
have
3
þa{ ðk, rÞεμ ðk, rÞ eikx g, aðq, sÞεν ðq, sÞe - iqx þ a{ ðq, sÞεν ðq, sÞ eiqx
s=0
3 3
d 3 kd3 q
= εμ ðk, r Þεν ðq, sÞ e - ikx eiqx aðk, r Þ, a{ ðq, sÞ
ð2π Þ3 2 k 0 q0 r=0 s=0
where the second to the last equality we used (22.344) and with the last equality we
used (22.322). Comparing (22.369) with (22.169), we find that (22.369) differs from
(22.169) only by the factor -ημν. Also, notice that photon is massless (m = 0).
Therefore, we have
Once we have gotten the invariant delta function Dμν(x) in the form of (22.369),
we can immediately obtain the Feynman propagator with respect to the photon field.
Defining the Feynman propagator D_F^{μν}(x) as
D_F^{μν}(x) ≡ ⟨0| T[A^μ(x) A^ν(y)] |0⟩,   (22.372)
we obtain
D_F^{μν}(x) = lim_{m→0} [−η^{μν} Δ_F(x − y)].
Explicitly,
D_F^{μν}(x − y) = [−i/(2π)⁴] ∫ d⁴k (−η^{μν}) e^{−ik(x−y)} / (−k² − iε)
 = [−i/(2π)⁴] ∫ d⁴k η^{μν} e^{−ik(x−y)} / (k² + iε).   (22.373)
In momentum space, correspondingly,
D_F^{μν}(k) = −iη^{μν}/(k² + iε) = iη^{μν}/(−k² − iε) = lim_{m→0} [−η^{μν} Δ_F(k)].   (22.374)
2
ημν = - εμ ðk, r Þ εν ðk, r Þ - εμ ðk, 3Þ εν ðk, 3Þ þ εμ ðk, 0Þ εν ðk, 0Þ
r=1
The quantity ημν in (22.374) can be replaced with that of (22.375) and the
μν
resulting DF ðk Þ can be used for the calculations of interaction among the quantum
fields.
Meanwhile, in (22.375) we assume that k2 ≠ 0 in general cases. With the real
photon we have k2 = 0. Then, we get
η^{μν} = −Σ_{r=1}^{2} ε^μ(k, r) ε^ν(k, r)* − [k^μ k^ν − (kn)(k^μ n^ν + n^μ k^ν)] / (kn)².   (22.376)
That is,
Σ_{r=1}^{2} ε^μ(k, r) ε^ν(k, r)* = −η^{μν} − [k^μ k^ν − (kn)(k^μ n^ν + n^μ k^ν)] / (kn)².   (22.377)
Equation (22.377) will be useful in evaluating the matrix elements of the S-matrix
including the Feynman amplitude; see the next chapter, especially Sect. 23.8.
References
1. Goldstein H, Poole C, Safko J (2002) Classical mechanics, 3rd edn. Addison Wesley, San
Francisco
2. Sakamoto M (2014) Quantum field theory. Shokabo, Tokyo. (in Japanese)
3. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York
4. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)
5. Mandl F, Shaw G (2010) Quantum field theory, 2nd edn. Wiley, Chichester
6. Itzykson C, Zuber J-B (2005) Quantum field theory. Dover, New York
7. Kaku M (1993) Quantum field theory. Oxford University Press, New York
8. Kugo C (1989) Quantum theory of gauge field I. Baifukan, Tokyo. (in Japanese)
9. Greiner W, Reinhardt J (1996) Field quantization. Springer, Berlin
10. Boas ML (2006) Mathematical methods in the physical sciences, 3rd edn. Wiley, New York
11. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
12. Schweber SS (1989) An introduction to relativistic quantum field theory. Dover, New York
Chapter 23
Interaction Between Fields
So far, we have focused on the characteristics of the scalar field, Dirac field, and
electromagnetic field that behave as free fields without interaction among one
another. In this chapter, we deal with the interaction between these fields. The
interaction is described as that between the quantized fields, whose important
features have been studied in Chap. 22. The interactions between the fields are
presented as non-linear field equations. It is difficult to obtain analytic solutions of
the non-linear equation, and so suitable approximation must be done. The method
for this is well-established as the perturbation theory that makes the most of the S-
matrix. Although we have dealt with the perturbation method in Chap. 5, the method
we use in the present chapter differs largely from the previous method. Here we
focus on the interaction between the Dirac field (i.e., electron) and the electromag-
netic field (photon). The physical system that comprises electrons and photons has
been fully investigated and established as the quantum electrodynamics (QED). We
take this opportunity to get familiar with this major field of quantum mechanics. In
the last part of the chapter, we deal with the Compton scattering as a typical example
to deepen understanding of the theory [1].
Although in the quantum theory of fields abstract and axiomatic approaches tend to
be taken, we wish to adhere to an intuitive approach to the maximum extent possible.
To that effect, we get back to the Lorentz force we studied in Chaps. 4 and 15. The
Lorentz force and its nature are widely investigated in various areas of natural
science. The Lorentz force F(t) is described by
F(t) = eE(x(t)) + e ẋ(t) × B(x(t)),   (4.88)
where the first term is the electric Lorentz force and the second term represents the magnetic Lorentz force.
We wish to relate the Lorentz force to the Hamiltonian of the interacting physical
system within the framework of the quantum field theory. To this end, the minimal
coupling (or minimal substitution) is an important concept in association with the
covariant derivative. The covariant derivative is defined as [1]
i ∂/∂t → i ∂/∂t − qϕ(t, x),
(1/i) ∇ → (1/i) ∇ − qA(t, x),   (23.1)
where q is a charge of the matter field; ϕ(t, x) and A(t, x) are the electromagnetic
potentials. Using the four-vector notation, we have
D_μ ≡ ∂_μ + iqA_μ(x),   (23.2)
where Dμ defines the covariant derivative, which leads to the interaction between the
electron and photon. This interaction is referred to as a minimal coupling. To see
how the covariant derivative produces the interaction, we replace ∂μ with Dμ in
(22.213) such that
iγ μ Dμ - m ψ ðxÞ = 0: ð23:3Þ
Then, we have
iγ⁰∂₀ψ = [−i Σ_{k=1}^{3} γ^k ∂_k + eγ⁰A₀ + e Σ_{k=1}^{3} γ^k A_k + m] ψ.   (23.5)
Multiplying both sides of (23.5) by γ⁰ from the left, we get
i∂₀ψ = [−i Σ_{k=1}^{3} γ⁰γ^k ∂_k + eγ⁰γ⁰A₀ + e Σ_{k=1}^{3} γ⁰γ^k A_k + mγ⁰] ψ
 = [−i Σ_{k=1}^{3} γ⁰γ^k ∂_k + mγ⁰ + eA₀ + e Σ_{k=1}^{3} γ⁰γ^k A_k] ψ
 = [(1/i) α·∇ + mβ + (eA₀ − e α·A)] ψ,   (23.6)
where α ≡ γ⁰γ with γ ≡ (γ¹, γ², γ³)ᵀ and β ≡ γ⁰. The operator (1/i) α·∇ + mβ is identical with
the one-particle Hamiltonian H~ (of the free Dirac field) defined in (22.222). There-
fore, the operator (eA0 - eα A) in (23.6) is the interaction Hamiltonian Hint that
represents the interaction between the Dirac field and electromagnetic field. Sym-
bolically writing (23.6), we have
~ þ H int ψ ðxÞ,
i∂0 ψ ðxÞ = H ð23:7Þ
where
H̃ = (1/i) α·∇ + mβ   (23.8)
and
H_int = eA₀ − e α·A.   (23.9)
Defining
H = H̃ + H_int,   (23.10)
we get
i∂₀ψ(x) = Hψ(x).   (23.11)
Equations (23.7) and (23.11) show the Schrödinger equation of the Dirac field
that interacts with the electromagnetic field. The interaction Hamiltonian Hint is
directly associated with the Lorentz force F(t) described by (4.88).
Representing E and B using the electromagnetic potential as in (22.279) and
(22.283), we get
F(t) = −e ∇ϕ(x(t)) − e ∂A(x(t))/∂t + e ẋ(t) × rot A(x(t)).   (23.12)
where — A acts only on A (i.e., vector potential). Then, (23.12) is further rewritten as
∂Aðxðt ÞÞ
Fð t Þ = - e — ϕ ð x ð t Þ Þ - þ e— A A xð_t Þ - e xð_t Þ — A:
∂t
P P
- FC ðt Þ dl = e — ϕðxðt ÞÞ - — A A xð_t Þ dl
Q Q
P P
=e dϕðxðt ÞÞ - e d xð_t Þ A
Q Q
P
- FC ðt Þ dl = eϕðPÞ - exð_t Þ AðPÞ: ð23:15Þ
Q
H_int = ∫ d³x ψ†(x) (eA₀ − e α·A) ψ(x) = e ∫ d³x ψ̄(x) γ^μ A_μ(x) ψ(x).   (23.16)
Thus, the interaction Hamiltonian density H_int(x) is given by the integrand of (23.16) as
H_int(x) = e ψ̄(x) γ^μ A_μ(x) ψ(x),   (23.17)
and correspondingly the interaction Lagrangian density is
L_int(x) = −H_int(x) = −e ψ̄(x) γ^μ A_μ(x) ψ(x).   (23.18)
Note that in (23.18) we have derived the Lagrangian density from the Hamilto-
nian density in a way opposite to (22.37). This derivation, however, does not cause a
problem, because regarding the equation of motion of the interacting system we are
considering now, the interaction does not involve derivatives of physical variables
(see Sect. 22.1). Or rather, since the kinetic energy is irrelevant to the interaction,
from (22.31) and (22.40) we have
L(x) = ψ̄(x)(iγ^μ∂_μ − m)ψ(x) − (1/2) ∂_μ A_ν(x) ∂^μ A^ν(x) − e ψ̄(x) γ^μ A_μ(x) ψ(x).   (23.19)
Note that in some textbooks [1] the sign of the third term is reversed. This is due
to the difference in the definition of e (i.e., e < 0 in this book, whereas e > 0 in [1]).
Rewriting (23.19) using the covariant derivative, we get
L(x) = ψ̄(x)[iγ^μ(∂_μ + ieA_μ(x)) − m]ψ(x) − (1/2) ∂_μ A_ν(x) ∂^μ A^ν(x)
 = ψ̄(x)(iγ^μ D_μ − m)ψ(x) − (1/2) ∂_μ A_ν(x) ∂^μ A^ν(x).   (23.20)
where with the last equality we used γ 0γ 0 = 1. Meanwhile, the conserved current J is
represented as
= dx δψ iγ μ ∂μ ψ - mψ - eγ μ Aμ ðxÞψ ðxÞ
or
Equation (23.27) is the Dirac equation with the system where electron and photon
are interacting with each other. As in the case of Sect. 22.4.1, (23.27) and (23.28) are
equivalent. Also, note that (21.150) is the adjoint Dirac equation (without interaction
with photon), if we assume that the solution of the Dirac equation is given by
ψ(x) = u(p, h) e-ipx.
In conjunction with (23.27), we remark the implication of the charge conjugation.
Taking the adjoint of (23.27), we have
- i∂μ ψ { γ μ{ - eψ { A{μ γ μ{ - mψ { = 0,
- i∂μ ψ { γ 0 γ 0 γ μ{ γ 0 - eψ { γ 0 γ 0 Aμ γ μ{ γ 0 - mψ { γ 0 = 0,
- i∂μ ψγ μ - eAμ ψγ μ - mψ = 0,
- i∂μ γ μT ψ T - eAμ γ μT ψ T - mψ T = 0,
i∂μ γ μ ψ C þ eAμ γ μ ψ C - mψ C = 0:
Hence, we get
iγ μ ∂μ þ eγ μ Aμ - m ψ C = 0: ð23:29Þ
Notice that during the above processes that derived (23.29), Aμ commutes with
the gamma matrices γ μ. With the operators ψ C and C, see (21.161) and (21.164).
Comparing (23.27) and (23.29), we find that e is exchanged with -e (-e > 0, i.e.,
positive charge). Thus, (23.29) represents the field equation of the positron.
In Chap. 22, we mentioned several characteristics of the gauge field and the
associated gauge transformation. To give a precise definition of the “gauge field”
and “gauge transformation,” however, is beyond the scope of this book. Yet, we can
get an intuitive concept of these principles at the core of the covariant derivative. In
Chap. 22, we have shown that the field strength Fμν(x) is invariant under the gauge
transformation of (22.293).
Meanwhile, we can express an exponential function (or phase factor) as
eiΘðxÞ , ð23:30Þ
where Θ(x) is a real function of x. The phase factor (23.30) is different from those
that appeared in Part I in that in this chapter the exponent may change depending
upon the space-time coordinate x. Let us consider a function transformation
described by
Then, GðxÞ forms a U(1) group (i.e., continuous transformation group). That is,
(A1) closure and (A2) associative law are evident (see Sect. 16.1) by choosing real
functions Θλ(x) (λ 2 Λ), where Λ can be a finite set or infinite set with different
cardinalities (see Sect. 6.1). (A3) Identity element corresponds to Θ(x) 0.
(A4) Inverse element eiΘ(x) is e-iΘ(x). The functional transformation described by
(23.31) and related to GðxÞ is said to be a local phase transformation. The important
point is that if we select Θ(x) appropriately, we can make the covariant derivative of
f(x) in (23.31) undergo the same local phase transformation as f(x). In other words,
appropriately selecting a functional form as Θ(x), we may have
[D_μ f(x)]′ = e^{iΘ(x)} [D_μ f(x)],   (23.33)
with
Θ(x) = −qΛ(x).
In fact, we have
[D_μ f(x)]′ = [∂_μ + iqA′_μ(x)] f′(x) = ∂_μ f′(x) + iqA′_μ(x) f′(x)
 = ∂_μ [e^{−iqΛ(x)} f(x)] + iq[A_μ(x) + ∂_μΛ(x)] e^{−iqΛ(x)} f(x)
 = e^{−iqΛ(x)} [∂_μ f(x) + iqA_μ(x) f(x)] = e^{−iqΛ(x)} [D_μ f(x)].   (23.34)
In (23.34), the prime (′) denotes either the local phase transformation with the
Dirac field or the gauge transformation with the electromagnetic field. Applying
(23.34) to ψ ðxÞ iγ μ Dμ - m ψ ðxÞ, we get
[ψ̄(x)(iγ^μ D_μ − m)ψ(x)]′ = e^{iqΛ(x)} ψ̄(x) e^{−iqΛ(x)} (iγ^μ D_μ − m) ψ(x)
 = ψ̄(x)(iγ^μ D_μ − m)ψ(x).   (23.35)
Thus, the first term of LðxÞ of (23.20) is invariant under both the gauge transfor-
mation (22.293) and the local phase transformation (23.31). As already seen in
(22.318), the second term of LðxÞ of (23.20) represents the Lagrangian density of
the free electromagnetic field. From (22.316) and (22.317), ∂μAμ(x) is gauge invari-
ant if one uses the Lorentz gauge. It is unclear, however, whether the second term of
LðxÞ in (23.20) is gauge invariant. To examine it, from (22.293) we have
∂_μ A′_ν(x) ∂^μ A′^ν(x) = ∂_μ [A_ν(x) + ∂_νΛ(x)] ∂^μ [A^ν(x) + ∂^νΛ(x)]
 = [∂_μ A_ν(x) + ∂_μ∂_νΛ(x)] [∂^μ A^ν(x) + ∂^μ∂^νΛ(x)]
 = ∂_μ A_ν(x) ∂^μ A^ν(x) + ∂_μ A_ν(x) ∂^μ∂^νΛ(x)
 + ∂_μ∂_νΛ(x) ∂^μ A^ν(x) + ∂_μ∂_νΛ(x) ∂^μ∂^νΛ(x).   (23.36)
Thus, we see that the term ∂μAν(x)∂μAν(x) is not gauge invariant. Yet, this does
not cause a serious problem. Let us think of the action integral S(Aν) of
∂μAν(x)∂μAν(x). As in (22.306), we have
S(A′_ν) = ∫ dx ∂_μ A′_ν(x) ∂^μ A′^ν(x).   (23.37)
Taking the variation with respect to Aν and using integration by parts with the
second and third terms of (23.36), we are left with ∂μ∂μΛ(x) for them on condition
that Aν(x) vanishes at x = ± 1. But, from (22.317), we have
∂_μ∂^μ Λ(x) = 0.   (23.38)
Taking variation with respect to Λ with the last term of (23.36), we also have the
same result of (23.38). Thus, only the first term of (23.36) survives on condition that
Aν(x) vanishes at x = ± 1 and, hence, we conclude that LðxÞ of (23.20) is virtually
unchanged in total by the gauge transformation. That is, LðxÞ is gauge invariant.
In this book, we restrict the gauge field to the electromagnetic vector field Aμ(x).
Correspondingly, we also restrict the gauge transformation to (22.293). By virtue of
the local phase transformation (23.31) and the gauge transformation (22.293), we
can construct a firm theory of QED. More specifically, we can develop the theory
that uniquely determines the interaction between the Dirac field and gauge field (i.e.,
electromagnetic field).
Before closing this section, it is worth mentioning that the gauge fields must be
accompanied by massless particles. Among them, the photon field is the most
important both theoretically and practically. Moreover, the photons are only an
entity that has ever been observed experimentally as a massless free particle.
Otherwise, gluons are believed to exist, accompanied by the vector bosonic field,
but those have never been observed as a massless free particle so far [4, 5]. Also, the
massless free particles must move at a light velocity.
We revisit Example 21.1 where the inertial frame of reference O′ is moving with a
constant velocity v (>0) relative to the frame O. Suppose that the x-axis of O and the
x′-axis of O′ coincide and that the frame O′ is moving in the positive direction of the
x′-axis. Also, suppose that a particle is moving along the x-axis (x′-axis, too) at a
velocity of u and u′ relative to the frame O and O′, respectively. Then, we have [6]
u′ = (u − v)/(1 − uv)   and   u = (u′ + v)/(1 + u′v).   (23.39)
In the previous sections, we have seen how the covariant derivative introduces the
interaction term into the Dirac equation through minimal coupling. Meanwhile, we
also acknowledge that this interaction term results from the variation of the action
integral related to the interaction Lagrangian. As a result, we have gotten the proper
field equation of (23.27). The magnitude of the interaction relative to the free field,
however, is not explicitly represented in (23.27). In this respect, the situation is
different from the case of Chap. 5 where the interaction Hamiltonian was quantita-
tively presented. Thus, we are going to take a different approach in this section. The
point is that the Hamiltonian density Hint ðxÞ is explicitly presented in (23.17). This
provides a strong clue to solving the problem. We adopt an interaction picture to
address a given issue.
Let us assume the time-dependent total Hamiltonian H(t) as follows:
H ðt Þ = H 0 ðt Þ þ H int ðt Þ, ð23:40Þ
To deal with the interaction between the quantum fields systematically, we revisit
Sect. 1.2. In Chap. 1, we solved the Schrödinger equation by the method of the
separation of variables and obtained the solution of (1.48) described by
where we adopted the natural units of ħ = 1. More generally, the solution of (1.47) is
given as
With the exponential function of the operator H(t), especially that given in a
matrix form see Chap. 15. Since H(t) is Hermitian, -iH(t)t is anti-Hermitian. Hence,
exp[-iH(t)t] is unitary; see Sect. 15.2. Thus, (23.42) implies that the state vector
jψ(t, x)i evolves with time through the unitary transformation of exp[-iH(t)t],
starting with | ψ(0, x)i.
Now, we wish to generalize the present situation in some degree. We can think of
the situation as viewing the same vector in reference to a different set of basis vectors
(Sect. 11.4). Suppose that jψ(λ, x)i is an element of a vector space V (or Hilbert
space, more generally) that is spanned by a set of basis vectors E λ ðλ 2 ΛÞ, where Λ
is an infinite set (see Sect. 6.1). Also, let jψ(λ, x)i′ be an element of another vector
space V 0 spanned by a set of basis vectors E 0λ ðλ 2 ΛÞ. In general, V 0 might be
different from the original vector space V , but in this chapter we assume that the
mapping caused by an operator is solely an endomorphism (see Sect. 11.2). Suppose
that a quantum state ψ is described as
Equation (23.43) is given pursuant to (11.13) so that the vector ψ can be dealt
with in a vector space with an infinite dimension. Note, however, if we were strictly
pursuant to (13.32), we would write instead
where jE λ i is the basis set of the inner product space with ψ(λ, x) being a coordinate
of a vector in that space. A reason why we use (23.43) is that we place emphasis on
the coordinate representation; the basis set representation is hardly used in the
present case.
Operating O on the function ψ, we have
where O is an arbitrary Hermitian operator. For a future purpose, we assume that the
operator O does not depend on λ in a special coordinate system (or frame). Equation
(23.44) can be rewritten as
E′_λ ≡ E_λ V(λ),   O′(λ) ≡ V(λ)† O V(λ),   V(λ)† |ψ(λ, x)⟩ ≡ |ψ(λ, x)⟩′,   (23.46)
Equation (23.47) shows how the operator O′(λ) and the state vector jψ(λ, x)i′
change in accordance with the transformation of the basis vectors E λ . We can
develop the theory, however, so that jψ(λ, x)i′ may not depend upon λ in a specific
frame.
To show how it looks in the present case, according to the custom we choose
t (time variable) for λ and choose a unitary operator exp[-iH(t)t] for V(t). Namely,
V ðt Þ = exp½ - iH ðt Þt U ðt Þ: ð23:48Þ
where OS denotes the operator O in the Schrödinger frame. Equation (23.49) can
further be rewritten as
where we define
OH ðt Þ U ðt Þ{ OS U ðt Þ: ð23:53Þ
Equations (23.49) and (23.50) imply the invariance of O(ψ) upon transfer from
the Schrödinger frame to the Heisenberg frame. The situation is in parallel to that of
Sect. 11.4. We can view (23.51) and (23.52) as the unitary transformation between
the two vectors | ψ(t, x)iS and | ψ(0, x)iH. Correspondingly, (23.53) represents the
unitary similarity transformation of the operator O between the two frames.
In (23.53), we assume that OH(t) is a time-dependent operator, but that OS is time-
independent. In particular, if we choose the Hamiltonian H(t) for O,
H H ðt Þ = H H = H S : ð23:54Þ
It is because H(t) and U(t) = exp [-iH(t)t] are commutative. Thus, the Hamil-
tonian is a time-independent quantity, regardless of the specific frames chosen.
Differentiating OH(t) with respect to t in (23.53), we obtain
dO_H(t)/dt = iH(t) e^{iH(t)t} O_S e^{−iH(t)t} + e^{iH(t)t} O_S [−iH(t)] e^{−iH(t)t}
 = iH(t) e^{iH(t)t} O_S e^{−iH(t)t} − i e^{iH(t)t} O_S e^{−iH(t)t} H(t) = −i[O_H(t), H(t)],
where with the second equality, we used the fact that e^{−iH(t)t} and H(t) are commutative. Multiplying both sides of the above equation by i, we get
i dO_H(t)/dt = [O_H(t), H(t)].   (23.56)
where PS is the physical quantity of the Schrödinger picture that corresponds to PH(t)
of the Heisenberg picture. If, in particular, [OS, PS] = c (c is a certain constant),
[OH(t), PH(t)] = c as well. Thus, we have
where q and p are the canonical coordinate and canonical momentum, respectively.
Thus, the unitary similarity transformation described by (23.53) makes the canonical
commutation relation invariant. Such a transformation is called the canonical
transformation.
Now, we get back to the major issue related to the interaction picture. A major
difficulty in solving the interacting field equation rests partly on the fact that H0(t)
and Hint(t) in general are not commutative. Let us consider another transformation of
the state vector and operator different from (23.50). That is, using
U 0 ðt Þ exp½ - iH 0 ðt Þt ð23:59Þ
where we define
OI ðt Þ U 0 ðt Þ{ OS U 0 ðt Þ: ð23:62Þ
In (23.62), OI(t) is a physical quantity of the interaction picture. This time, | ψ(t,
x)iI is time-dependent.
The previous results (23.45) and (23.46) obtained with the Schrödinger picture
and Heisenberg picture basically hold as well between the Schrödinger picture and
the interaction picture. Differentiating | ψ(t, x)iI with respect to t, we get
where with the third equality eiH 0 ðtÞt and H0(t) are commutative. In (23.63), we
defined Hint(t)I as
Also, notice again that in (23.63) e ± iH 0 ðtÞt and Hint(t) are not commutative in
general. Multiplying both sides of (23.63) by i, we obtain
i dO_I(t)/dt = [O_I(t), H₀(t)].   (23.66)
Correspondingly, the state vector obeys i d|ψ(t, x)⟩_I/dt = H_int(t)_I |ψ(t, x)⟩_I, i.e.,
i d|Ψ(t, x)⟩/dt = H_int(t) |Ψ(t, x)⟩,   (23.67)
where the subscript I of | ψ(t, x)iI and Hint(t)I was omitted. Namely, as we work
solely on the interaction picture, we redefine the state vector and Hamiltonian such
that
If we have no interaction, i.e., Hint(t) = 0, we have a state vector | Ψ(t, x)i constant
in time. Suppose that the interaction is “switched on” at a certain time ton and that the
state vector is in | Ψ(ti, x)i at an initial time t = ti well before ton. In that situation, we
define the state vector | Ψ(ti, x)i as
Also, suppose that the interaction is “switched off” at a certain time toff and that
the state vector is in | Ψ(tf, x)i at a final time t = tf well after toff. Then, the state vector
| Ψ(tf, x)i is approximated by
Because of the interaction (or collision), the state | Ψ(1, x)i may include different
final states | fαi (α = 1, 2, ⋯), which may involve different species of particles. If we
assume that | fαi form a complete orthonormal system (CONS), we have [1]
jf α ihf α j = E, ð23:71Þ
α
(i) The state | ii is described by the Fock state [see (22.126)] that is characterized by
the particle species and definite number of those particles with definite momenta
and energies. The particles in | ii are far apart from one another before the
collision such that they do not interact with one another. The particles in | fαi are
also far apart from one another after the collision.
(ii) The states | ii and | fαi are described by the plane wave.
Normally, the interaction is assumed to cause in a very narrow space and short
time. During the interaction, we cannot get detailed information about | Ψ(t, x)i.
Before and after the interaction, the species or number of particles may or may not be
conserved. The plane wave description is particularly suited for the actual experi-
mental situation where accelerated particle beams collide. The bound states, on the
other hand, are neither practical nor appropriate. In what follows we are going to deal
with electrons, positrons, and photons within the framework of QED.
The S-matrix S is defined as an operator that relates | fαi to | ii such that
where with the first near equality, we used (23.70). Then, (23.72) can be rewritten as
Equation (23.74) is a plane wave expansion of | Ψ(1, x)i. If jΨ(tf, x)i and jii are
normalized, we have
Since the transformation with S does not change the norm, S is unitary.
Integrating (23.67) with respect to t from -1 to t, we get
t
j Ψðt, xÞi = j Ψð- 1, xÞi þ ð- iÞ dt 1 H int ðt 1 Þ j Ψðt 1 , xÞi
-1
t
= j ii þ ð- iÞ dt 1 H int ðt 1 Þ j Ψðt 1 , xÞi: ð23:76Þ
-1
The RHS of (23.76) comprises jii and an additional second term. Since we can
solve (23.67) only iteratively, as a next step replacing jΨ(t1, x)i with
t1
j ii þ ð- iÞ H int ðt 2 Þ j Ψðt 2 , xÞi, ð23:77Þ
-1
we obtain
t
j Ψðt, xÞi = j ii þ ð- iÞ dt 1 H int ðt 1 Þ j ii
-1
t t1
þð- iÞ2 dt 1 dt 2 H int ðt 1 ÞH int ðt 2 Þ j Ψðt 2 , xÞi: ð23:78Þ
-1 -1
j Ψðt, xÞi
= j ii
t tn - 1
r
þ n=1
ð- iÞn dt 1 ⋯ dt n H int ðt 1 Þ⋯H int ðt n ÞjΨðt n , xÞi , ð23:79Þ
-1 -1
tr
j ii þ ð- iÞ H int ðt rþ1 Þ j Ψðt rþ1 , xÞi,
-1
we cut off the iteration process in this stage. Hence, jΨ(t, x)i of (23.79) is approx-
imated by
t tn - 1
r
j Ψðt, xÞi ≈ j ii þ n=1
ð- iÞn dt 1 ⋯ dt n H int ðt 1 Þ⋯H int ðt n Þjii
-1 -1
t tn - 1
r
= j ii þ n=1
ð- iÞn dt 1 ⋯ dt n H int ðt 1 Þ⋯H int ðt n Þ j ii
-1 -1
t tn - 1
r
= Eþ n=1
ð- iÞn dt 1 ⋯ dt n H int ðt 1 Þ⋯H int ðt n Þ j ii: ð23:80Þ
-1 -1
j Ψð1, xÞi
1 1
r
≈ Eþ n=1
ð- iÞn dt 1 ⋯ dt n H int ðt 1 Þ⋯H int ðt n Þ j ii: ð23:81Þ
-1 -1
1 1
r
S=E þ n=1
ð- iÞn dt 1 ⋯ dt n H int ðt 1 Þ⋯H int ðt n Þ :
-1 -1
we get
r
S= n=0
SðnÞ : ð23:83Þ
The matrix S(n) is said to be the n-th order S-matrix. In (23.83), the number r can
be as large as one pleases, even though the related S-matrix calculations become
increasingly complicated.
Taking the T-product of (23.82), moreover, we get
S^{(r)} = Σ_{n=0}^{r} [(−i)ⁿ/n!] ∫_{−∞}^{∞} dt₁ ∫_{−∞}^{∞} dt₂ ⋯ ∫_{−∞}^{∞} dt_n T[H_int(t₁)⋯H_int(t_n)].   (23.84)
Note that the factorial in the denominator of (23.84) resulted from taking the
T-product.
Using the Hamiltonian density Hint ðxÞ of (23.17), we rewrite (23.84) as
S^{(r)} = Σ_{n=0}^{r} [(−i)ⁿ/n!] ∫_{−∞}^{∞} dx₁ ∫_{−∞}^{∞} dx₂ ⋯ ∫_{−∞}^{∞} dx_n T[H_int(x₁)⋯H_int(x_n)].   (23.85)
|Ψ(∞, x)⟩ = Σ_n Σ_α ⟨f_α | S^{(n)} | i⟩ | f_α⟩.   (23.86)
By virtue of using Hint ðxÞ, S has been made an explicitly covariant form. Notice
that we must add a factor -1 when we reorder Fermion quantities as in the case of
the T-product. But, in the above we have not put the factor -1, because Hint ðxÞ
contains two Fermion quantities and, hence, exchanging Hint ðxi Þ and
Hint xj xi ≠ xj produces even number permutations of the Fermion quantities.
Then, even permutations produce a factor (-1)2n = 1 where n is a positive integer.
The S-matrix contains the identity operator E and additional infinite terms and can be expressed as an infinite power series of the fine structure constant α, described in the natural units by α = e²/4π (in SI units, α = e²/4πε₀ħc) ≈ 1/137.
In Sects. 22.3 and 23.5, we mentioned the normal-ordered product (N-product) and
time-ordered product (T-product). The practical calculations based on them fre-
quently appear and occupy an important position in the quantum field theory. We
represent them as, for example
where the former represents the N-product and the latter denotes the T-product; A, B,
C, etc. show the physical quantities (i.e., quantum fields). The N-product and
T-product are important not only in the practical calculations but also from the
theoretical point of view. In particular, they are useful to clarify the relativistic
invariance (or covariance) of the mathematical formulae.
We start with the simplest case where two field operators are pertinent. The essence
of both the N-product and T-product clearly shows up in this case and the methods
developed here provide a good starting point for dealing with more complicated
problems. Let A(x) and B(x) be two field operators, where x represents the space-time
coordinates. We choose A(x) and B(x) from among the following operators:
d3 k
ϕð xÞ = aðkÞe - ikx þ a{ ðkÞeikx , ð22:98Þ
3
ð2π Þ ð2k0 Þ
ψ ð xÞ
1
dp3
= h= ±1
bðp, hÞuðp, hÞe - ipx þ d{ ðp, hÞvðp, hÞeipx ,
-1 3
ð2π Þ 2p0
ð22:232Þ
ψ ð xÞ
1
dp3
= h= ±1
dðp, hÞvðp, hÞe - ipx þ b{ ðp, hÞuðp, hÞeipx ,
-1 3
ð2π Þ 2p0
ð23:88Þ
ψ anti ðxÞ
1
dp3
= h= ±1
dðp, hÞuðp, hÞe - ipx þ b{ ðp, hÞvðp, hÞeipx ,
-1 3
ð2π Þ 2p0
ð22:244Þ
ψ anti ðxÞ
1
dp3
= h= ±1
bðp, hÞvðp, hÞe - ipx þ d{ ðp, hÞuðp, hÞeipx ,
-1 3
ð2π Þ 2p0
ð23:89Þ
μ
A ð xÞ
d3 k 3
= r=0
aðk, r Þεμ ðk, r Þe - ikx þ a{ ðk, r Þεμ ðk, r Þ eikx :
3
ð2π Þ ð2k0 Þ
ð22:334Þ
Equations (22.244) and (23.89) represent the operators of a positron. Note that
with respect to (22.98) for the real scalar field and (22.334) for the photon field, the
field operators ϕ(x) and Aμ(x) are Hermitian. The Dirac field operators, on the other
hand, are not Hermitian. For this reason, we need two pairs of operators b(p, h), b{(p,
h); d(p, h), d{(p, h) to describe the Dirac field. We see that it is indeed the case when
we describe the time-ordered products (vide infra). When QED is relevant, the
operators of the Dirac field and photon field are chosen from among ψ(x), ψ ðxÞ,
ψ anti(x), ψ anti ðxÞ, and Aμ(x).
The above field operators share several features in common. (i) They can be
divided into two parts, i.e., the positive-frequency part (containing the factor e-ikx)
and the negative-frequency part (containing the factor eikx) [1]. The positive and
negative frequencies correspond to the positive energy and negative energy, respec-
tively. These quantities result from the fact that the relativistic field theory is
associated with the Klein-Gordon equation (see Chaps. 21 and 22). (ii) For each
operator, the first term, i.e., the positive-frequency part, contains the annihilation
operator and the second term (negative-frequency part) contains the creation oper-
ator. Accordingly, we express the operators as in, e.g., (22.99) and (22.132) such that
[1]
Care should be taken when dealing with ψ ðxÞ [or ψ anti ðxÞ]. It is because when
taking the Dirac adjoint of ψ(x) [or ψ anti(x)], the positive-frequency part and the
negative-frequency part are switched.
Let us get back to the present task. As already mentioned in Sect. 22.3, with
respect to the N-product for, e.g., the scalar field, the annihilation operators a(k)
always stand to the right of the creation operators a{(k). Now, we show the
computation rule of the N-product as follows:
(i) Boson:
Notice that with the second term in the last equality of (23.91), ϕ+(x)ϕ-( y) has
been switched to ϕ-( y)ϕ+(x). Similarly, we obtain
N½ϕðyÞϕðxÞ
= ϕ ðyÞϕ ðxÞ þ ϕ ðxÞϕ ðyÞ þ ϕ - ðyÞϕþ ðxÞ þ ϕ - ðyÞϕ - ðxÞ:
þ þ - þ
ð23:92Þ
N½ϕðyÞϕðxÞ
= ϕþ ðxÞϕþ ðyÞ þ ϕ - ðxÞϕþ ðyÞ þ ϕ - ðyÞϕþ ðxÞ þ ϕ - ðxÞϕ - ðyÞ: ð23:93Þ
In this way, in the Boson field case, the N-product is kept unchanged after
exchanging the operators involved in the N-product. The expressions related to
(23.91) and (23.94) hold with the photon field Aμ(x) as well.
(ii) Fermion (electron, positron, etc.):
Contrast (23.98) with (23.94) with respect to the presence or absence of the minus
sign. If we think of N½ψ ðxÞψ ðyÞ, we can perform the relevant calculations following
(23.95)-(23.98) in the same manner.
Thinking of the constitution of the N-product, it is clear that for both the cases of
Boson and Fermion, the vacuum expectation value of the N-product is zero. Or
rather, the N-product is constructed in such a way that the vacuum expectation value
of the product of field operators may vanish.
We continue dealing with the mathematical expressions of the above example and
examine their implications for both the Boson and Fermion cases.
(i) Boson:
Subtracting N[ϕ(x)ϕ( y)] from ϕ(x)ϕ( y) in (23.91), we have
where with the last equality we used (22.137). Exchanging x and y in (23.99), we
obtain
where with the last equality we used (22.138). Using (23.94), we get
Equations (23.99) and (23.101) show that if we subtract the N-product from the
product of the field operators, i.e., ϕ(x)ϕ( y) or ϕ( y)ϕ(x), we are left with the scalar of
iΔ+(x - y) or -iΔ-(x - y). This implies the relativistic invariance of (23.99) and
(23.101). Meanwhile, sandwiching (23.99) between h0j and | 0i, i.e., taking a
vacuum expectation value of (23.99), we obtain
where with the second equality we used the fact that [ϕ+(x), ϕ-( y)] = iΔ+(x - y) is a
c-number, see (22.137); with the second to the last equality, we used the normalized
condition
h0 j 0i = 1: ð23:104Þ
Thus, (23.105) shows the relativistic invariance based on the N-product and the
vacuum expectation value of ϕ(x)ϕ( y).
We wish to derive a more useful expression on the basis of (23.105). To this end,
we deal with the mathematical expressions including both the N-products and
T-products. Multiplying both sides of (23.99) by θ(x0 - y0), we get
θ y0 - x0 ϕðyÞϕðxÞ - θ y0 - x0 N½ϕðxÞϕðyÞ
= - iθ y0 - x0 Δ - ðx - yÞ: ð23:107Þ
θ x0 - y0 ϕðxÞϕðyÞ þ θ y0 - x0 ϕðyÞϕðxÞ
- θ x0 - y0 þ θ y0 - x0 N½ϕðxÞϕðyÞ
= i θ x0 - y0 Δþ ðx - yÞ - θ y0 - x0 Δ - ðx - yÞ = ΔF ðx - yÞ, ð23:108Þ
where the last equality is due to (22.185). Using the definition of the time-ordered
product (or T-product) of (22.178) and the relation described by (see Chap. 10)
θ x0 - y0 þ θ y0 - x0 1, ð23:109Þ
we obtain
where with the last equality we used (22.182). Comparing (23.110) with (23.105),
we find that in (23.110) T[ϕ(x)ϕ( y)] has replaced ϕ(x)ϕ( y) of (23.105) except for
ϕ(x)ϕ( y) in N[ϕ(x)ϕ( y)]. Putting it another way, subtracting the N-product from the
T-product in (23.110), we are left with the relativistically invariant scalar h0| T
[ϕ(x)ϕ( y)]| 0i.
(ii) Fermion:
The discussion is in parallel with that of the Boson case mentioned above. From a
practical point of view, a product ψ ðxÞψ ðyÞ is the most important. Subtracting
N½ψ ðxÞψ ðyÞ from ψ ðxÞψ ðyÞ in (23.95), we have
Note that with the last equality of (23.111), we used (22.256). In a similar manner,
we have
Subtracting both sides of (23.114) from (23.112) and using the definition of the
time-ordered product of (22.254) for the Dirac field, we get
where we used the following relationship that holds with the Fermion case:
T½ψ ðxÞψ ðyÞ - N½ψ ðxÞψ ðyÞ = SF ðx - yÞ = h0jT½ψ ðxÞψ ðyÞj0i: ð23:117Þ
In the case where both A and B are Fermion operators, A(x) = ψ(x) [or ψ ðxÞ] and
BðyÞ = ψ ðyÞ [or ψ( y)] are intended. In this case, all three terms of (23.118) change
the sign upon exchanging A(x) and B( y). Therefore, (23.118) does not change the
sign upon the same exchange. Meanwhile, in the case where at least one of A and B is
a Boson operator, each term of (23.118) does not change sign upon permutation of
A(x) and B( y). Hence, (23.118) holds as well.
In the case of the photon field, the discussion can be developed in parallel to the
scalar bosonic field, because the photon is among a Boson family. In both the Boson
and Fermion cases, the essence of the perturbation method clearly shows up in the
simple expression (23.118). In fact, (23.118) is the simplest case of the Wick’s
theorem [1, 3]. In the general case, the Wick’s theorem is expressed as [1]
TðABCD⋯WXYZ Þ = NðABCD⋯WXYZ Þ þ N AB
⎵
C YZ
þN AB⋯WX YZ
⎵
þ ⋯: ð23:119Þ
⎵
Equation (23.119) looks complicated, but clearly shows how the Feynman
propagators are produced in coexistence of the T-products and N-products. Glancing
at (23.110) and (23.117), this feature is once again obvious. In RHS of (23.119), the
terms from the second on down are sometimes called the generalized normal
products [1]. We have another rule with them. For instance, we have [1]
N F 1 B1 F 01 F 2 B2 F 02 , ð23:121Þ
where F and F′ are Fermion operators; B and B′ are Boson operators. Indices 1 and
2 designate the space-time coordinates. Suppose that we have a following contrac-
tion within the operators given by
N F 1 B1 F 01 F 2 B2 F 02 = ð- 1Þ2 F 1 F 02 N B1 F 01 F 2 B2
j j j j
= F 1 F 02 N B1 F 01 F 2 B2 : ð23:122Þ
j j
N F 2 B2 F 02 F 1 B1 F 01 = F 02 F 1 ð- 1Þ2 N F 2 B2 B1 F 01 = - F 1 F 02 ð- 1Þ2 N F 2 B2 B1 F 01
j j j j j j
N F 1 B1 F 01 F 2 B2 F 02 = N F 2 B2 F 02 F 1 B1 F 01 : ð23:124Þ
j j j j
Thus, we have obtained a simple but important relationship with the generalized
normal product. We make use of this relationship in the following sections.
So far, we have developed the general discussion on the interacting fields and how to
deal with the interaction using the perturbation methods based upon the S-matrix
method. In this section, we develop the methods to apply the general principle to the
specific practical problems. For this purpose, we make the most of the Feynman
diagrams and associated Feynman rules. In parallel, we represent the mathematical
expressions using a symbol or diagram. One of typical examples is a contraction,
which we mentioned in the previous section. We summarize frequently appearing
contractions below. The contractions represent the Feynman propagators.
Among the above relations, (23.126) and (23.127) are pertinent to QED. When
we examine the interaction between the fields, it is convenient to describe field
operator F such that [1]
F = Fþ þ F - , ð23:128Þ
A_μ(x) = A_μ⁺(x) + A_μ⁻(x).
We have
[A_μ⁺(x)]† = A_μ⁻(x)   and   [A_μ⁻(x)]† = A_μ⁺(x).   (23.131)
Note, however, that neither ψ(x) nor ψ ðxÞ is Hermitian. In conjunction to this fact,
the first factor of (23.130) contains a creation operator ψ - of electron and an
annihilation operator ψ þ of positron. The third factor of (23.130), on the other
hand, includes a creation operator ψ - of positron and an annihilation operator ψ + of
electron. The relevant discussion can be seen in Sect. 23.8.2. According to the
prescription of Sect. 23.5, we wish to calculate matrix elements of (23.86) up to
the second order of the S-matrix.
For simplicity, we assume that the final state obtained after the interaction
between the fields is monochromatic. In other words, the final state of individual
particles is assumed to be described by a single plane wave. For instance, if we are
observing photons and electrons as the final states after the interaction, those photons
and electrons possess definite four-momentum and spin (or polarization). We use
f instead of fα in (23.86) accordingly.
We start with the titled simple cases to show how the S-matrix elements are related to
the elementary processes that are associated with the interaction between fields.
(i) Zeroth-order matrix element hf| S(0)| ii:
We have S(0) = E (identity operator) in this trivial case. We assume
The RHSs of (23.132) imply that an electron with momenta p (or p′) and helicity
h (or h′) has been created in vacuum. That is, the elementary process is described by
e- ⟶ e- ,
= 0j - b{ ðp, hÞbðp0 , h0 Þj0 þ δh0 h δ3 ðp0 - pÞh0j0i = δh0 h δ3 ðp0 - pÞ, ð23:133Þ
where with the last equality, we used (22.262). Equation (23.133) implies that hf|
S(0)| ii vanishes unless we have h′ = h or p′ = p. That is, electrons are only moving
without changing its helicity or momentum. This means that no interaction occurred.
(ii) First-order matrix element hf| S(1)| ii [1]:
We assume that
e- ⟶ e- þ γ, ð23:135Þ
In (23.137), we expand ψ - ðxÞ, Aμ- ðxÞ, and ψ +(x) using (23.88), (22.334), and
(22.232), respectively. That is, we have
1106 23 Interaction Between Fields
1
d 3 w0 0
ψ - ð xÞ = t0 = ± 1
b{ ðw0 , t 0 Þuðw0 , t 0 Þeiw x , ð23:138Þ
-1 ð2π Þ3 2w00
1
d3 l0 2 0
Aμ- ðxÞ = m0 = 1
a{ ðl0 , m0 Þεμ ðl0 , m0 Þeil x , ð23:139Þ
-1 ð2π Þ3 2l00
1
d3 w
ψ þ ð xÞ = t= ±1
bðw, t Þuðw, t Þe - iwx : ð23:140Þ
-1 ð2π Þ 2w0 3
f jSð1Þ ji
1 0 0
= ie d 4 xuðp0 , h0 Þεμ ðk0 , r 0 Þγ μ ζ r0 uðp, hÞeiðp þk - pÞx
9
ð2π Þ 2p0 2p00 2k00
1
= ie uðp0 , h0 Þεμ ðk0 , r 0 Þγ μ ζ r0 uðp, hÞð2π Þ4 δ4 ðp0 þ k0 - pÞ,
ð2π Þ 2p0 2p00 2k00
9
ð23:144Þ
where with the last equality, we used the four-dimensional version of (22.101). The
four-dimensional δ function means that
p00
where p0 = p0 , p= p0
p , etc. From (23.145), εμ ðk0 , r 0 Þ in (23.144) can be replaced
with εμ ðp - p , r 0 Þ. Equation (23.145) implies the conservation of four-momentum
0
p = p0 þ k 0 : ð23:146Þ
p′² = p′₀² − p′² = m²   and   p² = p₀² − p² = m².   (23.147)
Also, from (23.146),
k′² = k′₀² − k′² = ω′² − ω′² = 0,
p² = p′² + 2p′k′ + k′².   (23.148)
This implies p′k′ = 0. It is, however, incompatible with a real physical process [1]. This can be shown as follows: We have
p′k′ = p′₀ω′ − p′·k′.
But, we have
|p′·k′| ≤ |p′| |k′| = |p′| ω′ < p′₀ ω′.
Hence, p′k′ > p′₀ω′ − p′₀ω′ = 0. That is, p′k′ > 0. This is, however, inconsistent with p′k′ = 0, and so the equality p′k′ = 0 is physically unrealistic. The simplest case would be that a single electron is present at rest in the initial state and emits a photon, with the electron recoiling in the final state. Put
p = (m, 0)ᵀ,   p′ = (p′₀, p′)ᵀ,   k′ = (ω′, k′)ᵀ.
Then the four-momentum conservation (23.146) cannot be satisfied, and we conclude
⟨f | S^{(1)} | i⟩ = 0.   (23.149)
Fig. 23.2 Building blocks of the Feynman diagram. The Feynman diagram comprises two
electron- or positron-associated lines (labelled p, p′) and a photon-associated line (labelled k, k′)
that are jointed at a vertex. (a) Incoming photon is labelled k. (b) Outgoing photon is labelled k′
Fig. 23.3 Feynman diagram representing the second-order S-matrix S(2). The diagram is composed
of two building blocks. (a) Feynman diagram containing a single contraction related to electron
field. (b) Feynman diagram showing the electron self-energy. (c) Feynman diagram showing the
photon self-energy (or vacuum polarization). (d) Feynman diagram showing the vacuum bubble
(or vacuum diagram)
Examples that have two contractions are shown in Fig. 23.3b, c, which represent
the "electron self-energy" and "photon self-energy," respectively. The former term causes us to reconsider an electron as an entity surrounded by a "photon cloud." The photon self-energy, on the other hand, is sometimes referred to as the vacuum polarization. This is because the photon field may modify the distribution of virtual electron-positron pairs in such a way that the photon field polarizes the vacuum like a dielectric [1]. The S-matrix computation of the physical processes represented by Fig. 23.3b, c leads to divergent integrals. We need the notion of renormalization to deal with these processes, but we will not go into detail about this issue.
We show another example of the Feynman diagrams in Fig. 23.3d called “vac-
uum diagram” (or vacuum bubble [7]). It lacks external lines, and so no transitions
are thought to occur. Diagrams collected in Fig. 23.3c, d contain two contractions
between the Fermions. In those cases, the associated contraction lines go in opposite
directions. This implies that both types of Fermions (i.e., electron and positron) take
part in the process.
Now, in accordance with the above topological feature of the Feynman diagrams,
we list major S(2) processes [1] as follows:
S_A^(2) = −(e²/2) ∫∫ d⁴x₁ d⁴x₂ N[(ψ̄γ^μA_μψ)_{x₁}(ψ̄γ^νA_νψ)_{x₂}],   (23.151)

S_B^(2) = −(e²/2) ∫∫ d⁴x₁ d⁴x₂ N[(ψ̄γ^μA_μψ)_{x₁}(ψ̄γ^νA_νψ)_{x₂} + (ψ̄γ^μA_μψ)_{x₁}(ψ̄γ^νA_νψ)_{x₂}],   (23.152)

where in the first term of (23.152) ψ(x₁) is contracted with ψ̄(x₂) and in the second term ψ̄(x₁) is contracted with ψ(x₂),

S_C^(2) = −(e²/2) ∫∫ d⁴x₁ d⁴x₂ N[(ψ̄γ^μA_μψ)_{x₁}(ψ̄γ^νA_νψ)_{x₂}],   (23.153)

where A_μ(x₁) is contracted with A_ν(x₂), and

S_D^(2) (one Fermion contraction and one photon contraction),   (23.154)
S_E^(2) (two Fermion contractions),   (23.155)
S_F^(2) (two Fermion contractions and one photon contraction).   (23.156)
Of these equations, S_B^(2) contains a single contraction between the Fermions. We describe the related important aspects in full detail in the next section, with emphasis upon the Compton scattering. Another process containing a single contraction, S_C^(2), includes the Møller scattering and the Bhabha scattering. Interested readers are referred to the literature [1, 3]. The S_D^(2), S_E^(2), and S_F^(2) processes are associated with the electron self-energy, the photon self-energy, and the vacuum bubble, respectively. The first two processes contain two contractions, whereas the last process S_F^(2) contains three contractions. Thus, we find that the second-order S-matrix elements S_k^(2) (k = A, ⋯, F) reflect the major processes of QED.
Among the above equations, (23.151), which lacks a contraction, is of little importance as in the previous case, because it does not reflect real processes. Equation (23.152) contains two contractions between the Dirac fields (ψ̄ and ψ), and the related two terms are equivalent. As discussed in Sect. 23.6.2, especially as represented in (23.124), the corresponding N-products coincide. Thus, the two terms of (23.152) are identical, and so (23.152) can be equivalently expressed as

S_B^(2) = −e² ∫∫ d⁴x₁ d⁴x₂ N[(ψ̄γ^μA_μψ)_{x₁}(ψ̄γ^νA_νψ)_{x₂}],   (23.158)

where ψ(x₁) is contracted with ψ̄(x₂).
Fig. 23.4 Major components of the Feynman diagrams. (a) Internal (or intermediate) electron line
and the corresponding Feynman propagator. (b) Internal (or intermediate) photon line and the
corresponding Feynman propagator. (c) External lines of the initial electron [u(p, h)] and final
electron [uðp, hÞ] from the top. (d) External lines of the initial positron [vðp, hÞ] and final positron
[v(p, h)] from the top. (e) External lines of the initial photon [εμ(k, r)] and final photon [εμ(k, r)] from
the top
relative to Fig. 23.4c. Moreover, Fig. 23.4e shows likewise the external lines of
photons. They are labelled by the polarization εμ(k, r).
To gain better understanding of the interaction between the quantum fields and to get
used to dealing with the Feynman diagrams, we take this opportunity to focus upon a
familiar example of Compton scattering. In Chap. 1, we dealt with it from a classical
point of view (i.e., early-stage quantum theory). Here we revisit this problem to
address the major issue from a point of view of quantum field theory. Even though
the Compton scattering is only an example among various kinds of interactions of the quantum fields, it represents a general feature of the field interactions and, hence, exploring its characteristics in depth enhances our heuristic knowledge.
The elementary process of the Compton scattering is described by
γ þ e- ⟶ γ þ e- , ð23:159Þ
where γ denotes a photon and e- stands for an electron. Using (23.130), a major part
of the interaction Hamiltonian for S_B^(2) of (23.158) is described as

⋯ γ^ν [A_ν^+(x₂) + A_ν^−(x₂)] [ψ^+(x₂) + ψ^−(x₂)]},   (23.160)

or

S_Comp^(2) = S_a + S_b,   (23.161)
and
In (23.162) and (23.163), once again we note that the field operators have been placed in the normal order. In both (23.162) and (23.163), the positive-frequency part ψ⁺ and the negative-frequency part ψ⁻ of the electron field are assigned to the coordinates x₂ and x₁, respectively. The normal order of the photon part is given by

Of these, the first and last terms are responsible for S_a and S_b, respectively. The other two terms make no contribution to S_Comp^(2).
The term A_μ^−(x₁)A_ν^+(x₂) of S_a in (23.162) plays a role in annihilating the initial photon at x₂ and in creating the final photon at x₁. Meanwhile, the term A_ν^−(x₂)A_μ^+(x₁) of S_b in (23.163) plays a role in annihilating the initial photon at x₁ and in creating the final photon at x₂. Thus, comparing (23.162) and (23.163), we find that the coordinates x₁ and x₂ switch their roles in the annihilation and creation of the photon. Therefore, it is of secondary importance which "event" happened prior to the other between the event labelled by A_ν^+(x₂) and that labelled by A_μ^+(x₁). This is the case with A_μ^−(x₁) and A_ν^−(x₂) as well. Such a situation frequently occurs in other kinds of field interactions and their calculations more generally. Normally, we distinguish the states of the quantum fields by momentum and helicity for the electron and by momentum and polarization for the photon.
The related Feynman diagrams and calculations pertinent to the Compton scat-
tering are given and discussed accordingly (vide infra).
ð2Þ
The S-matrix SB of (23.152) represents interaction processes other than the Comp-
ton scattering shown in (23.159) as well. These are described as follows:
γ + e⁺ ⟶ γ + e⁺,   (23.164)

e⁻ + e⁺ ⟶ γ + γ,   (23.165)

γ + γ ⟶ e⁻ + e⁺,   (23.166)

where e⁺ stands for a positron. In the S_B^(2) types of processes, we are given four particles, i.e.,

e⁻, e⁺, γ, γ.

Using the above four particles, we have four different ways of choosing two of them. That is, we have

(γ, e⁻), (γ, e⁺), (e⁻, e⁺), (γ, γ).
The above four combinations are the same as the remainder combinations left
after the choice. The elementary processes represented by (23.164), (23.165), and
(23.166) are called the Compton scattering by positrons, electron-positron pair
annihilation, and electron-positron pair creation, respectively. To distinguish
those processes unambiguously, the interaction Hamiltonian (23.160) including
both the positive frequency-part and negative-frequency part must be written down
in full. In the case of the electron-positron pair creation of (23.166), for instance, the
second-order S-matrix element is described as [1]
In (23.167), field operators are expressed in the normal order and both the
annihilation operators (corresponding to the positive frequency) and creation oper-
ators (corresponding to the negative frequency) are easily distinguished.
We have already shown the calculation procedure of ⟨f|S^(1)|i⟩. Even though it did not describe a real physical process, it helped us understand how the calculations are
performed on the interacting fields. The calculation processes of hf| S(2)| ii are
basically the same as those described above. In the present case, instead of
(23.134) we assume
⟨f|S_Comp^(2)|i⟩ = ⟨f|S_a|i⟩ + ⟨f|S_b|i⟩

with

⟨f|S_a|i⟩ = (−ie)² ζ_r ζ_{r′} / [(2π)⁶ √(2p′₀ 2k′₀ (2p₀)(2k₀))]
× ⟨0| ∫ d⁴x₁ d⁴x₂ ū(p′, h′) [ε_ν(k′, r′)]* γ^ν e^{i(p′+k′)x₁} S_F(x₁ − x₂) e^{−i(p+k)x₂} ε_μ(k, r) γ^μ u(p, h) |0⟩,   (23.169)

where

e^{i(p′+k′)x₁} S_F(x₁ − x₂) e^{−i(p+k)x₂} = ∫ (−i) d⁴q/(2π)⁴ · e^{−ix₁(q−p′−k′)} e^{ix₂(q−p−k)} (q_μγ^μ + m)/(m² − q² − iε).
Performing the integration in (23.169) with respect to x₁ and x₂, we obtain an expression including δ functions of four-momenta such that

⟨f|S_a|i⟩ = (−ie)² ζ_r ζ_{r′} ⟨0|0⟩ / [(2π)⁶ √(2p′₀ 2k′₀ (2p₀)(2k₀))]
× ∫ (−i) d⁴q/(2π)⁴ · (2π)⁸ δ⁴(q − p′ − k′) δ⁴(q − p − k) ū(p′, h′) [ε_ν(k′, r′)]* γ^ν (q_μγ^μ + m)/(m² − q² − iε) ε_μ(k, r) γ^μ u(p, h)

= (2π)⁴ δ⁴(p + k − p′ − k′) / [(2π)⁶ √(2p′₀ 2k′₀ (2p₀)(2k₀))]
× [−e² ζ_r ζ_{r′} ū(p′, h′) [ε_ν(k′, r′)]* γ^ν S̃_F(p + k) ε_μ(k, r) γ^μ u(p, h)].   (23.170)
In (23.170), we used
Since the first factor of (23.171) does not contain the variable q, it can be taken
out from the integrand. The integration of the integrand containing the second factor
produces S~F ðp þ kÞ in (23.170). In (23.170), S~F ðpÞ was defined in Sect. 22.4.4 as
S̃_F(p) = −i (p_μγ^μ + m)/(m² − p² − iε).   (22.276)

Namely, the function S̃_F(p) is a momentum component (or Fourier component) of S_F(x) defined in (22.275). If in (23.170) we think of only the transverse modes for the photon field, the indices r and r′ of ζ_r and ζ_{r′} run over 1 and 2 [1]. In that case, we have ζ₁ = ζ₂ = 1, and so we may neglect the factor ζ_r ζ_{r′} (= 1) accordingly.
On this condition, we define a complex number M_a as

M_a ≡ −e² ū(p′, h′) [ε_ν(k′, r′)]* γ^ν S̃_F(p + k) ε_μ(k, r) γ^μ u(p, h).   (23.172)

The complex number M_a is called the Feynman amplitude (or invariant scattering amplitude) associated with the Feynman diagram of S_a (see Fig. 23.5a).
To compute S̃_F(p + k), we have

S̃_F(p + k) = (−i)[(p_μ + k_μ)γ^μ + m] / [(p + k)² − m² + iε] = (−i)[(p_μ + k_μ)γ^μ + m] / [p² + 2pk + k² − m² + iε]
= (−i)[(p_μ + k_μ)γ^μ + m] / (2pk + iε) = (−i)[(p_μ + k_μ)γ^μ + m] / (2pk),   (23.173)

where we used p² = m², k² = 0, and

(p + k)² ≡ (p_μ + k_μ)(p^μ + k^μ) = p_μp^μ + p_μk^μ + k_μp^μ + k_μk^μ = p_μp^μ + 2p_μk^μ + k_μk^μ = p² + 2pk + k².

Hence, we obtain

⟨f|S_a|i⟩ = (2π)⁴ δ⁴(p′ + k′ − p − k) M_a / [(2π)⁶ √(2p′₀ 2k′₀ (2p₀)(2k₀))].   (23.174)
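As a quick consistency check of the propagator algebra above, the following short numerical sketch (not from the book; it assumes the Dirac representation of the γ matrices and natural units) verifies the Clifford algebra, builds S̃_F(p + k) from the definition (22.276), and confirms that (p + k)² − m² reduces to 2pk for an on-shell electron momentum p and a lightlike photon momentum k.

```python
# A minimal sketch (assumptions: Dirac representation, natural units, arbitrary
# sample momenta). It checks {gamma^mu, gamma^nu} = 2 eta^{mu nu} E4, builds
# S~F(p + k) from (22.276), and verifies (p + k)^2 - m^2 = 2 pk as in (23.173).
import numpy as np

I2, O2 = np.eye(2), np.zeros((2, 2))
sig = [np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]]),
       np.array([[1, 0], [0, -1]], dtype=complex)]
g = [np.block([[I2, O2], [O2, -I2]]).astype(complex)]          # gamma^0
g += [np.block([[O2, s], [-s, O2]]) for s in sig]              # gamma^1,2,3
eta = np.diag([1.0, -1.0, -1.0, -1.0])

ok = all(np.allclose(g[m] @ g[n] + g[n] @ g[m], 2*eta[m, n]*np.eye(4))
         for m in range(4) for n in range(4))
print(ok)                                                       # True

def slash(a):          # a_mu gamma^mu for a contravariant four-vector a
    return sum(eta[m, m]*a[m]*g[m] for m in range(4))

def dot(a, b):         # Minkowski product a.b
    return a[0]*b[0] - a[1:] @ b[1:]

m_e = 1.0
p = np.array([np.hypot(m_e, 0.3), 0.3, 0.0, 0.0])   # on-shell electron, p^2 = m^2
k = np.array([0.5, 0.0, 0.0, 0.5])                  # lightlike photon, k^2 = 0
q = p + k
SF = -1j*(slash(q) + m_e*np.eye(4)) / (m_e**2 - dot(q, q))      # S~F(p + k), (22.276)
print(np.isclose(dot(q, q) - m_e**2, 2*dot(p, k)), SF.shape)    # True (4, 4)
```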
In an analogous way, we obtain

⟨f|S_b|i⟩ = (2π)⁴ δ⁴(p′ + k′ − p − k) / [(2π)⁶ √(2p′₀ 2k′₀ (2p₀)(2k₀))]
× [−e² ζ_r ζ_{r′} ū(p′, h′) ε_ν(k, r) γ^ν S̃_F(p − k′) [ε_μ(k′, r′)]* γ^μ u(p, h)].   (23.175)

Defining

M_b ≡ −e² ū(p′, h′) ε_ν(k, r) γ^ν S̃_F(p − k′) [ε_μ(k′, r′)]* γ^μ u(p, h),   (23.176)

we obtain

⟨f|S_b|i⟩ = (2π)⁴ δ⁴(p′ + k′ − p − k) M_b / [(2π)⁶ √(2p′₀ 2k′₀ (2p₀)(2k₀))].   (23.177)
Consider now a general scattering process

P₁ + P₂ ⟶ P′₁ + P′₂ + ⋯ + P′_N,   (23.178)

where P₁ and P₂ denote the initial particles and P′₁, P′₂, ⋯, P′_N represent the final particles. Then, in accordance with (23.174) and (23.177), the S-matrix element is expressed as

⟨f|S|i⟩ = (2π)⁴ δ⁴(Σ_{f=1}^{N} p′_f − Σ_{i=1}^{2} p_i) · 1/√[(2π)³(2E₁)(2π)³(2E₂)] · Π_{f=1}^{N} 1/√[(2π)³ 2E′_f] · M.   (23.179)

In (23.179), p_i = (E_i, p_i) (i = 1, 2) denotes the four-momenta of the initially colliding two particles and p′_f = (E′_f, p′_f) (f = 1, ⋯, N) represents those of the finally created N particles. Also, in (23.179) we used S and M instead of S_a, M_a, etc., since a general case is intended. Notice that with the Compton scattering N = 2.
The conservation of energy and momentum reads

E₁ + E₂ = Σ_{k=1}^{N} E′_k,   (23.181)

p₁ + p₂ = Σ_{k=1}^{N} p′_k.   (23.182)

From the above equations, we know that E′₁, ⋯, E′_N are not all independent variables. Likewise, p′₁, ⋯, p′_N are not all independent variables.
The cross-section σ for the reaction or scattering is described by (23.183), where v_rel represents the relative velocity between the two colliding particles P₁ and P₂. In (23.183), v_rel is defined by [1]

v_rel = √[(p₁p₂)² − (m₁m₂)²] / (E₁E₂),   (23.184)

where m₁ and m₂ are the rest masses of the colliding particles 1 and 2, respectively. When we have a collinear collision of the two particles, (23.184) reduces to a simple expression. Suppose that in a certain inertial frame of reference Particle 1 and Particle 2 have velocities v₁ and v₂ = av₁ (where a is a real number), respectively. According to (1.9), we define γ₁ and γ₂ such that

γ₁ = 1/√(1 − v₁²),  γ₂ = 1/√(1 − v₂²).

Also, we obtain (23.187). With the above derivation, we used (23.185) and (23.186) along with v₂ = av₁ (collinear collision). Then, we get

v_rel = |v₁(a − 1)| = |v₂ − v₁|.   (23.188)

Thus, in the collinear collision case, the relative velocity is identical with that in the non-relativistic limit. In the particular instance a = 0 (in which Particle 2 is at rest), v_rel = |v₁|. This is usually the case with a laboratory frame where Particle 2 is at rest. In the Compton scattering, Particle 1 is a photon, so that

v_rel = 1.   (23.189)
σ = (2π)⁶ E₁E₂ / {T V √[(p₁p₂)² − (m₁m₂)²]}
× [(2π)⁴ δ⁴(Σ_{f=1}^{N} p′_f − Σ_{i=1}^{2} p_i)]² · 1/[(2π)⁶ 4E₁E₂] · Π_{f=1}^{N} 1/[(2π)³ 2E′_f] · |M|²,   (23.190)

where (2π)⁶ of the numerator in the first factor is related to the normalization of the incident particles 1 and 2 (in the initial state) [8]. In (23.190), we have

[(2π)⁴ δ⁴(Σ_{f=1}^{N} p′_f − Σ_{i=1}^{2} p_i)]² = (2π)⁴ δ⁴(0) · (2π)⁴ δ⁴(Σ_{f=1}^{N} p′_f − Σ_{i=1}^{2} p_i),

where (2π)⁴δ⁴(0) represents the volume T V of the whole space-time. Thus, we get

σ = (2π)⁴ / {4√[(p₁p₂)² − (m₁m₂)²]} · δ⁴(Σ_{f=1}^{N} p′_f − Σ_{i=1}^{2} p_i) · Π_{f=1}^{N} 1/[(2π)³ 2E′_f] · |M|².   (23.193)

Accordingly, the differential cross-section is

dσ = (2π)⁴ / {4√[(p₁p₂)² − (m₁m₂)²]} · δ⁴(Σ_{f=1}^{N} p′_f − Σ_{i=1}^{2} p_i) · Π_{f=1}^{N} d³p′_f/[(2π)³ 2E′_f] · |M|²,   (23.194)

where d³p′_f represents an infinitesimal range of momentum from p′_f to p′_f + dp′_f within which the particle P′_f (f = 1, ⋯, N) is found. Then, σ can be expressed in the form of an integration of (23.194) as

σ = ∫ (2π)⁴ / {4√[(p₁p₂)² − (m₁m₂)²]} · δ⁴(Σ_{f=1}^{N} p′_f − Σ_{i=1}^{2} p_i) · Π_{f=1}^{N} d³p′_f/[(2π)³ 2E′_f] · |M|².   (23.195)
Note that (23.195) is Lorentz invariant. In fact, if in (22.150) we put f(k₀) ≡ 1 and g(k) ≡ 1, we have

∫ d⁴k δ(k² − m²) θ(k₀) = ∫ d³k · 1/[2√(k² + m²)],

so that each factor d³p′_f/(2E′_f) is a Lorentz-invariant volume element. Replacing √[(p₁p₂)² − (m₁m₂)²] with v_rel E₁E₂ by means of (23.184), we obtain

σ = (2π)⁴/(4v_rel E₁E₂) ∫ δ⁴(Σ_{f=1}^{N} p′_f − Σ_{i=1}^{2} p_i) · Π_{f=1}^{N} d³p′_f/[(2π)³ 2E′_f] · |M|².   (23.196)

For the Compton scattering (N = 2), this becomes

dσ = 1/(64π² v_rel E₁E₂E′₁E′₂) · δ⁴(Σ_{f=1}^{2} p′_f − Σ_{i=1}^{2} p_i) · |M|² d³p′₁ d³p′₂.   (23.197)

In what follows, we regard only p′₁ (of the scattered photon) as an independent variable out of p′₁ and p′₂. The choice is at will, but it is both practical and convenient to choose the momentum of the particle in question as the independent variable.
To express the cross-section in terms of the solid angle, we write d³p′₁ = |p′₁|² d|p′₁| dΩ′ and carry out the p′₂ integration in (23.197). We then obtain

dσ = δ(Σ_{f=1}^{2} E′_f − Σ_{i=1}^{2} E_i) · |M|² |p′₁|² d|p′₁| dΩ′ / (64π² E₁E₂E′₁E′₂ v_rel),   (23.199)

where dΩ′ denotes a solid angle element into which the particle P′₁ is scattered. Notice that the p′₂ integration converted δ⁴(Σ_{f=1}^{2} p′_f − Σ_{i=1}^{2} p_i) into δ(Σ_{f=1}^{2} E′_f − Σ_{i=1}^{2} E_i).
Further integration of dσ over |p′₁| produces [1]

dσ̃ = 1/(64π² E₁E₂E′₁E′₂ v_rel) · [∂(E′₁ + E′₂)/∂|p′₁|]⁻¹ |p′₁|² |M|² dΩ′,   (23.200)

where we used the formula

∫ f(x, y) δ[g(x, y)] dx = ∫ f(x, y) δ[g(x, y)] (∂x/∂g)_y dg = { f(x, y) [(∂g/∂x)_y]⁻¹ }_{g=0}.   (23.201)
In general, the Feynman amplitude takes the form

M = c ū(p′, h′) Γ u(p, h),   (23.202)

where c is a complex number and Γ is usually a (4, 4) matrix (tensor) that contains gamma matrices. The amplitude M is a complex number as well, and we are interested in calculating |M|² (a real number). Hence, we have

|M|² = |c|² [ū(p′, h′)Γu(p, h)] [ū(p′, h′)Γu(p, h)]*.   (23.203)

Being a (1, 1) matrix, i.e., a number, the second factor satisfies

[ū(p′, h′)Γu(p, h)]* = [ū(p′, h′)Γu(p, h)]†.   (23.204)
Then, we obtain

[ū(p′, h′)Γu(p, h)]† = u†(p, h) Γ† [ū(p′, h′)]† = u†(p, h) γ⁰γ⁰ Γ† [ū(p′, h′)]† = ū(p, h) γ⁰Γ†γ⁰ u(p′, h′),   (23.205)

where with the second equality we used γ⁰γ⁰ = E; with the third equality, we used [ū(p′, h′)]† = γ⁰u(p′, h′). Defining Γ̃ as

Γ̃ ≡ γ⁰Γ†γ⁰,   (23.206)

we get

|M|² = |c|² ū(p′, h′)Γu(p, h) ū(p, h)Γ̃u(p′, h′).   (23.207)

In (23.207), u(p, h), u(p′, h′), etc. are (4, 1) matrices; Γ and Γ̃ as well as u(p, h)ū(p, h) are (4, 4) matrices. Then, Γu(p, h)ū(p, h)Γ̃u(p′, h′) is a (4, 1) matrix.
Now, we recall (18.70) and (18.71); namely,
Having a look at (18.71), we find (18.70) can be extended to the case where
neither P nor Q is a square matrix but PQ is a square matrix. Thus, (23.207) can be
rewritten as
|M|² = |c|² Tr[Γ u(p, h)ū(p, h) Γ̃ u(p′, h′)ū(p′, h′)],   (23.208)

where Γ u(p, h)ū(p, h) Γ̃ u(p′, h′)ū(p′, h′) is a (4, 4) matrix. Note, however, that taking the trace does not by itself amount to summing or averaging over the spin and photon polarization: in (23.208) the helicity indices h and h′ for the electron remain unsummed, and the photon polarization indices [hidden in Γ and Γ̃ in (23.208)] have not yet been summed either.
The polarization sum and average are completed by taking summation over all the
pertinent indices. This will be done in detail in the next section.
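Before moving on, the identities (23.204)-(23.208) can be confirmed with a short numerical sketch. The code below is not from the book; it uses random 4-component columns in place of genuine Dirac spinors (the identities hold for arbitrary columns) and the Dirac-representation γ⁰.

```python
# A quick numerical check (a sketch with arbitrary random inputs): for columns
# u, u' and a 4x4 matrix Gamma, [u'bar Gamma u]* = ubar Gamma~ u' with
# Gamma~ = g0 Gamma^dagger g0, and |c u'bar Gamma u|^2 = |c|^2 Tr[Gamma u ubar
# Gamma~ u' u'bar], which is the content of (23.204)-(23.208).
import numpy as np

rng = np.random.default_rng(1)
g0 = np.diag([1, 1, -1, -1]).astype(complex)        # Dirac-representation gamma^0

def bar(u):                 # Dirac adjoint as a row: ubar = u^dagger gamma^0
    return u.conj() @ g0

u  = rng.normal(size=4) + 1j*rng.normal(size=4)
up = rng.normal(size=4) + 1j*rng.normal(size=4)      # plays the role of u(p', h')
Gamma = rng.normal(size=(4, 4)) + 1j*rng.normal(size=(4, 4))
Gamma_t = g0 @ Gamma.conj().T @ g0                   # Gamma~ of (23.206)
c = 0.7 - 0.2j

amp = c * (bar(up) @ Gamma @ u)                      # c u'bar Gamma u
rhs = abs(c)**2 * np.trace(Gamma @ np.outer(u, bar(u))
                           @ Gamma_t @ np.outer(up, bar(up)))
print(np.isclose(abs(amp)**2, rhs.real))                         # (23.208): True
print(np.isclose((bar(up) @ Gamma @ u).conj(), bar(u) @ Gamma_t @ up))  # (23.205): True
```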
Now that we have gotten used to the generalities of the scattering theory, we are in a position to carry out the detailed calculations of the Feynman amplitude. In the case of the Compton scattering, the Feynman amplitudes have been given as

M_a ≡ −e² ū(p′, h′) [ε_ν(k′, r′)]* γ^ν S̃_F(p + k) ε_μ(k, r) γ^μ u(p, h),   (23.172)

M_b ≡ −e² ū(p′, h′) ε_ν(k, r) γ^ν S̃_F(p − k′) [ε_μ(k′, r′)]* γ^μ u(p, h).   (23.176)

In the notation of (23.202), the overall constant is

c = −e² ζ_r ζ_{r′}.   (23.209)

The total Feynman amplitude is

M = M_a + M_b.   (23.210)
Then, we have

|M|² = |M_a|² + |M_b|² + M_a M_b* + M_a* M_b.   (23.211)

From here on, we assume that electrons and/or photons are unpolarized in both their initial and final states. In terms of experimental techniques, it is more difficult to detect or determine the polarization (helicity) of electrons than the polarization of photons. To estimate the polarization sum for the photon field, we need another technique related to its gauge transformation (vide infra).

(i) In the case of the electrons, we first average over the initial spins (or helicities) h and sum over the final helicities h′. These procedures yield a positive real number X̃ described by [1]

X̃ = (1/2) Σ_{h=1}^{2} Σ_{h′=1}^{2} |M|².   (23.212)
(ii) As for the photons, we similarly average over the initial polarizations r and sum over the final polarizations r′. These procedures yield another positive real number X defined as [1]

X ≡ (1/2) Σ_{r=1}^{2} Σ_{r′=1}^{2} X̃ = (1/4) Σ_{r=1}^{2} Σ_{r′=1}^{2} Σ_{h=1}^{2} Σ_{h′=1}^{2} |M|².   (23.213)

In accordance with (23.211), we define

X₁ ≡ (1/4) Σ_{r=1}^{2} Σ_{r′=1}^{2} Σ_{h=1}^{2} Σ_{h′=1}^{2} |M_a|²,   (23.214)

X₂ ≡ (1/4) Σ_{r=1}^{2} Σ_{r′=1}^{2} Σ_{h=1}^{2} Σ_{h′=1}^{2} |M_b|²,   (23.215)

X₃ ≡ (1/4) Σ_{r=1}^{2} Σ_{r′=1}^{2} Σ_{h=1}^{2} Σ_{h′=1}^{2} M_a M_b*,   (23.216)

X₄ ≡ (1/4) Σ_{r=1}^{2} Σ_{r′=1}^{2} Σ_{h=1}^{2} Σ_{h′=1}^{2} M_a* M_b,   (23.217)

so that

X = X₁ + X₂ + X₃ + X₄.   (23.218)
Thereby, we get

X₁ = (e⁴/4) Σ_{r=1}^{2} Σ_{r′=1}^{2} Σ_{h=1}^{2} Σ_{h′=1}^{2} Tr[Γ u(p, h)ū(p, h) Γ̃ u(p′, h′)ū(p′, h′)]

= (e⁴/4) Σ_{r,r′,h,h′} Tr[ ([ε_ν(k′, r′)]*γ^ν ((p_μ + k_μ)γ^μ + m) ε_λ(k, r)γ^λ/(2pk)) u(p, h)ū(p, h)
  (γ^ρ[ε_ρ(k, r)]* ((p_μ + k_μ)γ^μ + m) ε_σ(k′, r′)γ^σ/(2pk)) u(p′, h′)ū(p′, h′) ]   [†]

= (e⁴/4) Σ_{r,r′} Tr[ ([ε_ν(k′, r′)]*γ^ν ((p_μ + k_μ)γ^μ + m) ε_λ(k, r)γ^λ/(2pk)) (p_αγ^α + m)
  (γ^ρ[ε_ρ(k, r)]* ((p_μ + k_μ)γ^μ + m) ε_σ(k′, r′)γ^σ/(2pk)) (p′_βγ^β + m) ]

= (e⁴/4) Σ_{r,r′} Tr[ [ε_ν(k′, r′)]* ε_σ(k′, r′) γ^ν ((p_μ + k_μ)γ^μ + m) [ε_ρ(k, r)]* ε_λ(k, r) γ^λ/(2pk)
  (p_αγ^α + m) γ^ρ ((p_μ + k_μ)γ^μ + m) γ^σ/(2pk) (p′_βγ^β + m) ]   [††]

= (e⁴/4) Tr[ (−η_νσ)γ^ν ((p_μ + k_μ)γ^μ + m) (−η_ρλ)γ^λ/(2pk) (p_αγ^α + m) γ^ρ ((p_μ + k_μ)γ^μ + m) γ^σ/(2pk) (p′_βγ^β + m) ].   [†††]   (23.221)
At [††], we used the fact that ε_σ(k′, r′) and [ε_ρ(k, r)]* can freely be moved in (23.221) because they are not operators, but merely complex numbers. At [†††], see the discussion made just below, which rests on the fact that the Feynman amplitude M must be gauge invariant.
Suppose that the vector potential A_μ(x) takes the form [1]

A_μ(x) = α ε_μ(k, r) e^{±ikx},

where α is a constant. Also, suppose that in (22.293) we have [1]

Λ(x) = Λ̃(k) e^{±ikx}.

Then, the gauge-transformed potential reads

A′_μ(x) = A_μ(x) + ∂_μΛ(x) = A_μ(x) ± ik_μ Λ̃(k) e^{±ikx}
= α ε_μ(k, r) e^{±ikx} ± ik_μ Λ̃(k) e^{±ikx} = α [ε_μ(k, r) ± ik_μ Λ̃(k)/α] e^{±ikx}.   (23.222)

That is, the gauge transformation amounts to the replacement

ε′_μ(k, r) = ε_μ(k, r) ± ik_μ Λ̃(k)/α,   (23.223)

[ε′_μ(k, r)]* = [ε_μ(k, r)]* ∓ ik_μ [Λ̃(k)]*/α.   (23.224)

Consequently, we have

Σ_{r=1}^{2} [ε′_μ(k, r)]* ε′_ν(k, r)
= Σ_{r=1}^{2} {[ε_μ(k, r)]* ∓ ik_μ[Λ̃(k)]*/α} {ε_ν(k, r) ± ik_ν Λ̃(k)/α}
= Σ_{r=1}^{2} [ε_μ(k, r)]* ε_ν(k, r)
+ Σ_{r=1}^{2} {±ik_ν [ε_μ(k, r)]* Λ̃(k)/α ∓ ik_μ ε_ν(k, r) [Λ̃(k)]*/α + k_μk_ν |Λ̃(k)/α|²}.   (23.225)
Namely, the gauge transformation of the photon field produces the additional second term in (23.225). Let the (partial) Feynman amplitude X₁ including the factor Σ_{r=1}^{2} [ε′_μ(k, r)]* ε′_ν(k, r) be described by

X₁ = (e⁴/4) Tr[ Σ_{r′=1}^{2} Σ_{r=1}^{2} [ε′_μ(k, r)]* ε′_ν(k, r) M̃^{μν} ]
= (e⁴/4) Tr[ Σ_{r′=1}^{2} ( Σ_{r=1}^{2} [ε_μ(k, r)]* ε_ν(k, r) + Ξ(k_μ, k_ν) ) M̃^{μν} ],

where Ξ(k_μ, k_ν) represents the second term of (23.225). That is, we have

Ξ(k_μ, k_ν) ≡ Σ_{r=1}^{2} {±ik_ν [ε_μ(k, r)]* Λ̃(k)/α ∓ ik_μ ε_ν(k, r) [Λ̃(k)]*/α + k_μk_ν |Λ̃(k)/α|²}.   (23.226)

The factor M̃^{μν} represents the remaining part of X₁. In the present case, the indices μ and ν of M̃^{μν} indicate those of the gamma matrices γ^μ and γ^ν. Since X₁ is gauge invariant, Ξ(k_μ, k_ν) M̃^{μν} must vanish. As Ξ(k_μ, k_ν) is linear with respect to both k_μ and k_ν, it follows that

k_μ M̃^{μν} = k_ν M̃^{μν} = 0.   (23.227)
Meanwhile, we had

Σ_{r=1}^{2} ε_μ(k, r)[ε_ν(k, r)]* = −η_μν − [k_μk_ν − (kn)(k_μn_ν + n_μk_ν)]/(kn)².   (23.228)

Note that, as pointed out previously, (22.377) holds for the real photon field (i.e., an external photon) as in the present case. Then, we obtain

X₁ = (e⁴/4) Tr[ Σ_{r′=1}^{2} ( −η_μν − [k_μk_ν − (kn)(k_μn_ν + n_μk_ν)]/(kn)² ) M̃^{μν} ].   (23.229)

By virtue of (23.227), we have

{[k_μk_ν − (kn)(k_μn_ν + n_μk_ν)]/(kn)²} M̃^{μν} = 0,   (23.230)

with only the first term −η_μν M̃^{μν} contributing to X₁. With regard to the other related factor Σ_{r′=1}^{2} ε_α(k′, r′)[ε_β(k′, r′)]* of (23.221), the above argument holds as well.
Thus, the calculation of X1 comes down to the trace computation of (23.221). The
remaining calculations are straightforward and follow the literature method [1]. We
first calculate the numerator of (23.221). To this end, we define a (4, 4) matrix
Y included as a numerator of (23.221) as [1]
With the last equality of (23.231), we used the following formulae [1]:
γ_λ γ^λ = 4E₄,   (21.185)

γ_λ (A_σγ^σ) γ^λ = −2(A_σγ^σ),   γ_λ (A_σγ^σ)(B_σγ^σ) γ^λ = 4A_σB^σ,   (21.187)

γ_λ (A_σγ^σ)(B_σγ^σ)(C_σγ^σ) γ^λ = −2(C_σγ^σ)(B_σγ^σ)(A_σγ^σ).   (21.188)

Then, we have

X₁ = e⁴ Y / [16(pk)²].   (23.233)

With Y, we obtain

Y = 16[ 2(p_μ + k_μ)p^μ (p_ρ + k_ρ)p′^ρ − (p_μ + k_μ)(p^μ + k^μ) p_ρ p′^ρ + ⋯ ].   (23.234)
Also, we used the fact that the trace of a product of an odd number of gamma matrices vanishes [1]. Calculations of (23.234) are straightforward using three independent scalars, i.e., m², pk, and pk′; here notice that pk, pk′, etc. are abbreviations of

pk ≡ p_μk^μ,  pk′ ≡ p_μk′^μ,  etc.   (23.235)

The four-momentum conservation reads

p + k = p′ + k′.   (23.237)

Taking the square of both sides and using p² = p′² = m², k² = k′² = 0, we get

kp = k′p′.   (23.239)

In the same way, squaring p − k′ = p′ − k gives

pk′ = p′k.   (23.240)

Rewriting (23.237), we have

p − p′ = k′ − k.   (23.241)

Taking the square of both sides of (23.241) again and using k′k = pk − pk′, we get

pp′ = m² + pk − pk′.   (23.242)
Using the above relationship among the scalars, we finally get [1]
jM a j2 / εν ðk0 , r 0 Þ εν ðk0 , r 0 ÞS~F ðp þ k ÞS~F ðp þ kÞεμ ðk, r Þεμ ðk, r Þ :
X1 / εν ðk0 , r Þ εν ðk0 , r ÞS~F ðp þ k ÞS~F ðp þ kÞεμ ðk, r 0 Þεμ ðk, r 0 Þ
= εν ð- k0 , r Þ εν ð- k0 , r ÞS~F ðp þ k ÞS~F ðp þ kÞεμ ð- k, r 0 Þεμ ð- k, r 0 Þ :
Since in the above equation we are thinking of the transverse modes with the
polarization, those modes are held unchanged with respect to the exchange of
k $ - k and k′ $ - k′ as shown in the above equation. Meanwhile, from
(23.176), we have
X 2 / jM b j2
/ εν ðk, r Þεν ðk, r Þ S~F ðp - k0 ÞS~F ðp - k 0 Þεμ ðk0 , r 0 Þ εμ ðk0 , r 0 Þ:
To calculate X₃, we need techniques related to, but slightly different from, those used for the calculation of X₁. The outline is as follows. Similarly to the case of (23.221), we arrive at the following relation described by
X3
ð23:246Þ
m). Of these terms, m is included as either a factor of m2 or m4. Classifying the eight
terms according to the power of m, we get
where with the second and last equalities, we used (21.185); E4 denotes the (4, 4)
identity matrix. Taking the trace of (23.248), we have
m2 γ σ pρ þ kρ γ ρ γ μ ½pα γ α γ σ γ μ = m2 γ σ pρ þ kρ γ ρ pα γ μ γ α γ σ γ μ
= m2 γ σ pρ þ k ρ γ ρ pα ð4ηασ Þ
where with the second equality, we used the first equation of (21.186). Taking the
trace of (23.250), we obtain
where with the first equality, we used (21.189). Other five terms of the second power
of m are calculated in a similar manner. Summing up the traces of these six matrices,
we get
(iii)
m0 : γ σ pρ þ k ρ γ ρ γ μ ½pα γ α γ σ pλ - k0λ γ λ γ μ p0β γ β
= - 2½pα γ α γ μ pρ þ k ρ γ ρ pλ - k 0λ γ λ γ μ p0β γ β
= - 2½ pα γ α γ μ p ρ þ k ρ γ ρ pλ - k 0λ γ λ γ μ p0β γ β
where with the first equality, we used (21.188) with Bσ 1 [or B (1 1 1 1), where
B denotes a covariant four-vector]; with the second to the last equality, we used the
second equation of (21.187). Taking the trace of (23.253), we have
where with the first equality, we used (21.189); the last equality resulted from
(23.237), (23.239), and (23.242). Summing (23.249), (23.252), and (23.254), finally
we get
Summing up all the above results and substituting them into (23.218), we get X of (23.213) as

X = X₁ + X₂ + X₃ + X₄ = (1/4) Σ_{r=1}^{2} Σ_{r′=1}^{2} Σ_{h=1}^{2} Σ_{h′=1}^{2} |M|²

= 2e⁴[m⁴ + m²(pk) + (pk)(pk′)]/(pk)² + 2e⁴[m⁴ − m²(pk′) + (pk)(pk′)]/(pk′)² + ⋯

= 2e⁴{ m⁴[1/(pk) − 1/(pk′)]² + 2m²[1/(pk) − 1/(pk′)] + (pk′)/(pk) + (pk)/(pk′) }.   (23.257)
Once again, notice that (23.257) is unchanged under the exchange between k and −k′.
Our next task is to connect (23.257) to the experimental results associated with the
scattering cross-section that has been expressed as (23.200). In the present situation,
we assume a scattering described by (23.178) as
P₁ + P₂ ⟶ P′₁ + P′₂,   (23.258)

with the four-momenta

p = (E, p),  p′ = (E′, p′),  k = (ω, k),  k′ = (ω′, k′),   (23.259)

where p and k represent the initial states of the electron and photon fields, respectively; p′ and k′ represent their final states. To use (23.200), we express the quantities of (23.259) in the laboratory frame, where a photon beam is incident on target electrons that are nearly stationary. In practice, this is a good approximation. Thereby, p of (23.259) is approximated as

p ≈ (m, 0).
In the laboratory frame, the momentum conservation (with p ≈ 0) gives

p′ = k − k′,   (23.260)

and the energy conservation gives

m + ω ≈ E′ + ω′.   (23.261)

From (23.241) and (23.242), we also have

k′k = pk − pk′.   (23.262)

For the photon four-momenta, we have

ω² = k²,  ω′² = k′²,  i.e.,  ω = |k|,  ω′ = |k′|.   (23.264)
Figure 23.6 shows the geometry of the photon field. We assume that photons are
scattered in a radially symmetric direction with respect to the photon momentum k as
a function of the scattering angle θ. Notice that we conform Figs. 23.6 to 1.1. In this
geometry, we have
This is consistent with (1.16), if (1.16) is written in the natural units. In (23.258),
we may freely choose P′1 (or P′2) from among electron and photon. For a practical
purpose, however, we choose the photon for P′₁ (i.e., the scattered photon) in (23.258). That is, since we are to directly observe the photon scattering in the geometry of the laboratory frame, it will be convenient to measure the energy and momentum of the scattered photon as P′₁ in the final state. Then, P′₂ is the electron accordingly. To apply (23.200) to the experimental results, we need to express the electron energy E′₂ in terms of the final photon energy ω′. We have

E′² = p′² + m² = (k − k′)² + m²,   (23.267)
where with the last equality, we used (23.260). Applying the cosine rule to Fig. 23.6,
we have

E′² = ω² + ω′² − 2ωω′ cos θ + m².

Accordingly, we obtain

∂(E′ + ω′)/∂ω′ = (ω′ − ω cos θ)/√(ω² + ω′² − 2ωω′ cos θ + m²) + 1
= (ω′ − ω cos θ + E′)/E′ = [ω(1 − cos θ) + m]/E′,   (23.271)

where with the last equality, we used (23.261). Using (23.266), furthermore, we get

∂(E′ + ω′)/∂ω′ = mω/(E′ω′).   (23.272)
Using (23.200), the differential cross-section is written as

dσ̃ = ω′²/(64π² mωω′E′v_rel) · [∂(ω′ + E′)/∂ω′]⁻¹ X dΩ′.   (23.273)

With (23.272), this becomes

dσ̃/dΩ′ = ω′²/(64π² mωω′E′v_rel) · (E′ω′)/(mω) · X
= 2e⁴ω′²/(64π² m²ω²) · { m⁴[1/(pk) − 1/(pk′)]² + 2m²[1/(pk) − 1/(pk′)] + (pk′)/(pk) + (pk)/(pk′) }.   (23.274)
The energy of the scattered electron has naturally dropped out. As mentioned earlier, in (23.274) we assumed v_rel = 1 as a good approximation. In the case where the electron is initially at rest, we can use the approximate relation (23.266) as well as pk = mω and pk′ = mω′. On these conditions, we obtain the succinct relation

dσ̃/dΩ′ = (α²/2m²)(ω′/ω)²[ ω/ω′ + ω′/ω − sin²θ ],   (23.275)

where α is the fine structure constant given in (23.87). Moreover, using (23.266), ω′ can be eliminated from (23.275) [1]. Measurements resolving the photon polarization have also been thoroughly pursued, and the corresponding results are well known as the Klein-Nishina formula. The details can be found in the literature [1, 3].
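As an illustration of how (23.275) is used in practice, the following short sketch (not from the book; natural units, electron initially at rest, and ω′ eliminated through the Compton relation of Chap. 1 are assumed) evaluates the unpolarized cross-section and checks its low-energy (Thomson) limit.

```python
# A sketch evaluating (23.275) with omega' = omega / [1 + (omega/m)(1 - cos(theta))].
# In the limit omega << m, the angular integral approaches the Thomson
# cross-section 8*pi*alpha^2/(3 m^2), as expected.
import numpy as np

ALPHA = 1.0 / 137.035999   # fine structure constant, cf. (23.87)
M_E = 1.0                  # electron mass sets the energy scale

def dsigma_dOmega(omega, theta, m=M_E):
    """Unpolarized Compton cross-section d(sigma~)/d(Omega') of (23.275)."""
    omega_p = omega / (1.0 + (omega / m) * (1.0 - np.cos(theta)))
    ratio = omega_p / omega
    return (ALPHA**2 / (2.0 * m**2)) * ratio**2 * (1.0/ratio + ratio - np.sin(theta)**2)

theta = np.linspace(0.0, np.pi, 2001)
sigma = np.trapz(dsigma_dOmega(1e-4, theta) * 2*np.pi*np.sin(theta), theta)
print(sigma, 8*np.pi*ALPHA**2/(3*M_E**2))   # nearly equal in the Thomson limit
```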
23.9 Summary
In this chapter, we have described the features of the interaction among quantum fields, especially within the framework of QED. As a typical example, we detailed the
calculation procedures of the Compton scattering. It is intriguing to point out that
QED adequately explains the various aspects of the Compton scattering, whose
experiments precisely opened up a route to QED.
In this chapter, we focused on the S-matrix expansion up to the second order. Yet, the essence of QED clearly showed up. Namely, we studied how the second-order S-matrix elements S_k^(2) (k = A, ⋯, F) reflect the major processes of QED, including the Compton scattering. We also have a useful index for judging whether the integrals relevant to the S-matrix diverge. The index K is given by [1]

K = 4 − (3/2)f_e − b_e,   (23.276)

where f_e and b_e are the numbers of external Fermion (i.e., electron) lines and external Boson (photon) lines in the Feynman diagram, respectively. The condition K ≥ 0 is necessary for the integral to diverge. In the case of the Compton scattering, f_e = b_e = 2, so that K = −1. The integrals converge accordingly. We have
confirmed that this is really the case. In the cases of the electron self-energy (K = 1)
and photon self-energy (K = 2), the second-order S-matrix integrals diverge [1].
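The index (23.276) is simple enough to tabulate directly; the following minimal sketch merely re-evaluates it for the three processes just mentioned.

```python
# Divergence index K = 4 - (3/2) f_e - b_e of (23.276) for Compton scattering,
# the electron self-energy, and the photon self-energy, in that order.
def K(f_e, b_e):
    return 4 - 1.5 * f_e - b_e

print(K(2, 2), K(2, 0), K(0, 2))   # -1.0 (converges), 1.0 and 2.0 (may diverge)
```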
Even though we did not go into detail on those issues, especially the
renormalization, we gained a glimpse into the essence of QED that offers prolific
aspects on the elementary particle physics.
References
1. Mandl F, Shaw G (2010) Quantum field theory, 2nd edn. Wiley, Chichester
2. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Aca-
demic Press, Waltham
3. Itzykson C, Zuber J-B (2005) Quantum field theory. Dover, New York
4. Kaku M (1993) Quantum field theory. Oxford University Press, New York
5. Peskin ME, Schroeder DV (1995) An introduction to quantum field theory. Westview Press,
Boulder
6. Moller C (1952) The theory of relativity. Oxford University Press, London
7. Klauber RD (2013) Student friendly quantum field theory, 2nd edn. Sandtrove Press, Fairfield
8. Sakamoto M (2020) Quantum field theory II. Shokabo, Tokyo. (in Japanese)
9. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
Chapter 24
Basic Formalism
We have studied several important aspects of the quantum theory of fields and found
that the theory is based in essence upon the theory of relativity. In Chaps. 1 and 21,
we briefly surveyed its essence. In this chapter, we explore the basic formalism and
fundamental skeleton of the theory more systematically in depth and in breadth.
First, we examine the constitution of the Lorentz group from the point of view of Lie
algebra. The Lorentz group comprises two components, one of which is virtually the
same as the rotation group that we have examined in Part IV. Another component, on
the other hand, is called Lorentz boost, which is considerably different from the
former one. In particular, whereas the matrices associated with the spatial rotation
are unitary, the matrices that represent the Lorentz boost are not unitary. In this
chapter, we connect these aspects of the Lorentz group to the matrix representation
of the Dirac equation to clarify the constitution of the Dirac equation. Another point
of great importance rests on the properties of Dirac operators G and G ~ that describe
the Dirac equation (Sect. 21.3). We examine their properties from the point of view
of matrix algebra. In particular, we study how G and G ~ can be diagonalized through
the similarity transformation. Remaining parts of this chapter deal with advanced
topics of the Lorentz group from the point of view of continuous groups and their Lie
algebras.
In Part III, we studied the properties and characteristics of the linear vector spaces.
We have seen that among the vector spaces, the inner product space and its
properties are most widely used and investigated in various areas of natural science,
accompanied by the bra-ket notation ha| bi of inner product due to Dirac (see
Chaps. 1 and 13). The inner product space is characterized by the positive definite
metric associated with the Gram matrix (Sect. 13.2). In this section, we wish to
modify the concept of metric so that it can be applied to wider vector spaces
First, we introduce a direct product (vector) space formed by the product of two
(or more) vector spaces. In Part IV, we introduced a notion of direct product in
relation to groups and their representation. In this chapter, we extend this notion to
the linear vector spaces. We also study the properties of the bilinear mapping that
maps the direct product space to another vector space. First, we have a following
definition:
Definition 24.1 Let B be a mapping described by
B : W × V ⟶ T, ð24:1Þ
where V, W, and T are vector spaces over K; W × V means the direct product of V and
W. Suppose that B(, ) satisfies the following conditions [1, 2]:
where K is a field usually chosen from a complex number field ℂ or real number field
ℝ. Then, the mapping B is called a bilinear mapping.
The range of B, i.e., B(, ) is a subspace of T. This is obvious from the definition
of (24.2) and (11.15). Since in Definition 24.1 the dimension of V and W can be
different, we assume that the dimension of V and W is n and m, respectively. If the
direct product W × V is deemed to be a topological space (Sect. 6.1.2), it is called a
direct product space. (However, we do not need to be too strict with the
terminology.)
Definition 24.1 does not tell us more detailed structure of B(, ) with the excep-
tion of the fact that B(, ) forms a subspace of T. To clarify this point, we set further
conditions on the bilinear mapping B defined by (24.1) and (24.2). The conditions
(B1)-(B3) are given as follows [1, 2]:
(B1) Let V and W be linear vector spaces with dimV = n and dimW = m. If x1, ⋯,
xr 2 V are linearly independent, with y1, ⋯, yr 2 W we have
Σ_{i=1}^{r} B(y_i, x_i) = 0 ⟹ y_i = 0 (1 ≤ i ≤ r),   (24.3)

where r ≤ n.
(B2) Likewise, if y₁, ⋯, y_r ∈ W are linearly independent, with x₁, ⋯, x_r ∈ V we have

Σ_{i=1}^{r} B(y_i, x_i) = 0 ⟹ x_i = 0 (1 ≤ i ≤ r),   (24.4)

where r ≤ m.
(B3) The vector space T is spanned by B(y, x) (x 2 V, y 2 W ).
Regarding Definition 24.1, we have a following important theorem.
Theorem 24.1 [1, 2] Let B be a bilinear mapping described by B(, ) : W × V ⟶ T,
where V, W, and T are vector spaces over K. Let {ei; 1 ≤ i ≤ n} and {fj; 1 ≤ j ≤ m} be
the basis set of V and W, respectively. Then, a necessary and sufficient condition for
the statements (B1)-(B3) to hold is that B( fj, ei) (1 ≤ i ≤ n, 1 ≤ j ≤ m) are the basis
set of T.
Proof
(i) Sufficient condition: Suppose that B( fj, ei) (1 ≤ i ≤ n, 1 ≤ j ≤ m) are the basis set
of T. That is, we have
T = Span{B(f_j, e_i); 1 ≤ i ≤ n, 1 ≤ j ≤ m}.   (24.5)

Let vectors x^(k) ∈ V and y^(k) ∈ W be expressed as

x^(k) = Σ_{i=1}^{n} ξ_i^(k) e_i,  y^(k) = Σ_{j=1}^{m} η_j^(k) f_j  (1 ≤ k ≤ r),   (24.6)

with ξ_i^(k), η_j^(k) ∈ K. We assume that in (24.6) r ≤ max(m, n). Then, we have

Σ_{k=1}^{r} B(y^(k), x^(k)) = Σ_{j=1}^{m} Σ_{i=1}^{n} [Σ_{k=1}^{r} η_j^(k) ξ_i^(k)] B(f_j, e_i).   (24.7)

In (24.7), putting Σ_{k=1}^{r} B(y^(k), x^(k)) = 0, we get

Σ_{k=1}^{r} η_j^(k) ξ_i^(k) = 0.   (24.8)

Rewriting (24.8) for each j in terms of the coordinate columns of x^(1), ⋯, x^(r), we obtain

η_j^(1) (ξ₁^(1), ⋯, ξ_n^(1))ᵀ + η_j^(2) (ξ₁^(2), ⋯, ξ_n^(2))ᵀ + ⋯ + η_j^(r) (ξ₁^(r), ⋯, ξ_n^(r))ᵀ = 0.   (24.10)

If x^(1), ⋯, x^(r) are linearly independent, so are their coordinate columns, and hence

η_j^(1) = η_j^(2) = ⋯ = η_j^(r) = 0.   (24.11)
The above argument ensures the assertion of (B1). In a similar manner, the
statement (B2) is assured. Since B( fj, ei) (1 ≤ i ≤ n, 1 ≤ j ≤ m) are the basis set of
T, (B3) is evidently assured.
(ii) Necessary condition: Suppose that the conditions (B1)- (B3) hold. From the
argument of the sufficient condition, we have (24.5). Therefore, for B( fj,
ei) (1 ≤ i ≤ n, 1 ≤ j ≤ m) to be the basis set of T, we only have to show that
B( fj, ei) are linearly independent. Suppose that we have
ζ B
i,j ij
f j , ei = 0, ζ ij 2 K: ð24:12Þ
i
Bðyi , ei Þ = 0: ð24:13Þ
Bðy, xÞ
η 1 ξ1
⋮
η 1 ξn
η 2 ξ1
= ð f 1 e1 ⋯ f 1 en f 2 e1 ⋯ f 2 en ⋯ f m e1 ⋯ f m en Þ ⋮ :
η 2 ξn
⋮
ηm ξ1
⋮
ηm ξn
ð24:15Þ
Equations (24.14) and (24.15) clearly show that any elements belonging to T can
be expressed as RHS of these equations. Namely, T is spanned by B(y, x) (x 2 V,
y 2 W ). This is what the statement (B3) represents.
Equation (24.15) corresponds to (11.13); in (24.15) mn basis vectors form a row
vector and mn “coordinates” form a corresponding column vector.
The next example will help us understand the implication of the statements (B1)-
(B3) imposed on the bilinear mapping earlier in this section.
Example 24.1 Let V and W be vector spaces with dimension 3 and 2, respectively.
Suppose that xi 2 V (i = 1, 2, 3) and yj 2 W ( j = 1, 2, 3). Let xi and yj be given by
ξ1 ξ1′ ξ″1
x1 = ð e1 e2 e3 Þ ξ2 , x 2 = ð e1 e2 e3 Þ ξ2′ , x3 = ð e1 e2 e3 Þ ξ″2 ;
ξ3 ξ3′ ξ″3
η1 η′ η″1
y1 = ð f 1 f 2 Þ , y2 = ðf 1 f 2 Þ 1′ , y3 = ð f 1 f 2 Þ , ð24:16Þ
η2 η2 η″2
where (e1 e2 e3) and ( f1 f2) are basis sets of V and W, respectively; ξ1, η1, etc. 2 K. For
simplicity, we write
ðe1 e2 e3 Þ e, ðf 1 f 2 Þ f : ð24:17Þ
η1 ξ1
ξ1 η1 ξ2
η1 η1 ξ3
Bðy1 ; x1 Þ = ðf × eÞ × ξ2 ð f × eÞ , ð24:18Þ
η2 η2 ξ1
ξ3 η2 ξ2
η2 ξ3
where we have
ð f × eÞ = ð f 1 e1 f 1 e2 f 1 e3 f 2 e1 f 2 e2 f 2 e3 Þ : ð24:19Þ
Omitting the basis set (f × e), (24.20) is separated into two components such that
Assuming that ξ1, ξ2, and ξ3 (2V ) are linearly independent, (24.21) implies that
η1 = η01 = η001 = η2 = η02 = η002 = 0. Namely, from (24.16) we have
y1 = y2 = y3 = 0: ð24:22Þ
η1 ξ1 þ η1′ ξ1′
η1 ξ2 þ η1′ ξ2′
η1 ξ3 þ η1′ ξ3′
Bðy1 ; x1 Þ þ Bðy2 ; x2 Þ = ðf × eÞ = 0: ð24:23Þ
η2 ξ1 þ η2′ ξ1′
η2 ξ2 þ η2′ ξ2′
η2 ξ3 þ η2′ ξ3′
This time, (24.23) can be separated into three components such that
η1 η01 η1 η01 η1
ξ1 þ ξ01 = 0, ξ2 þ ξ02 = 0, ξ3
η2 η02 η2 η02 η2
η01
þ ξ03 = 0: ð24:24Þ
η02
Assuming that y1 and y2(2W) are linearly independent, in turn, (24.24) implies
that ξ1 = ξ01 = ξ2 = ξ02 = ξ3 = ξ03 = 0. Namely, from (24.16), we have
x1 = x2 = 0: ð24:25Þ
This is an assertion of (B2). The statement (B3) is self-evident; see (24.14) and
(24.15).
In relation to Theorem 24.1, we give another important theorem.
Theorem 24.2 [1–3] Let V and W be vector spaces over K. Then, a vector space
T (over K ) and a bilinear mapping B : W × V ⟶ T that satisfy the above-mentioned
conditions (B1), (B2), and (B3) must exist. Also, let T be an arbitrarily chosen vector
space over K and let B be an arbitrary bilinear mapping expressed as
B : W × V ⟶ T: ð24:26Þ
is uniquely determined.
Proof
(i) Existence: Suppose that T is a nm-dimensional vector space with the basis set
B( fj, ei) (1 ≤ i ≤ n, 1 ≤ j ≤ m), where ei (1 ≤ i ≤ n) and fj (1 ≤ j ≤ m) are the basis
set of V and W, respectively. We assume that V and W are n- and m-dimensional
vector spaces, respectively.
Let x 2 V and y 2 W be expressed as
n m
x= ξ e ,y=
i=1 i i
ηf :
j=1 j j
ð24:28Þ
ρ ζ e
i,j ji ji
= ζ B
i,j ji
f j , ei , ð24:30Þ
m n m n
ρ½Bðy, xÞ = ρ j=1
ηξe
i = 1 j i ji
= j=1
ηξρ
i=1 j i
eji
= ηξB
i,j j i
f j , ei = i,j
B ηj f j , ξi ei = B ηf ,
j j j
ξe
i i i
= Bðy, xÞ,
where with the third equality, we used (24.30) and (24.31); with the fourth and fifth
equalities, we used the bilinearity of B. Thus, the linear mapping ρ defined in
(24.30) satisfies (24.27).
The other way around, any linear mapping that satisfies (24.27) can be described
as
ρ eji = ρ B f j , ei = B f j , ei : ð24:32Þ
Multiplying both sides of (24.32) by ζji and summing over i and j, we recover
(24.30). Thus, the linear mapping ρ is determined uniquely. These complete the
proof.
Despite being apparently craggy and highly abstract, Theorem 24.2 finds extensive
applications in the vector space theory. The implication of Theorem 24.2 is that once
the set (T, B) is determined for the combination of the vector spaces V and W such
that B : W × V ⟶ T, any other set ðT, BÞ can be derived from (T, B). In this respect,
B defined in Theorem 24.2 is referred to as the universal bilinear mapping
[3]. Another important point is that the linear mapping ρ : T ⟶ T defined in
Theorem 24.2 is bijective (see Fig. 11.3) [1, 2]. That is, the inverse mapping ρ-1
exists.
The range of the universal bilinear mapping B is a vector space over K. The vector
space T accompanied by the bilinear mapping B, or the set (T, B) is called a tensor
product (or Kronecker product) of V and W. Thus, we find that the constitution of
tensor product hinges upon Theorem 24.2. The tensor product T is expressed as
T W V: ð24:33Þ
Bðy, xÞ y x, ð24:34Þ
where x 2 V, y 2 W, and y x 2 T.
The notion of tensor product produces a fertile plain for the vector space theory.
In fact, (24.29) can readily be extended so that various bilinear mappings are
included. As a typical example, let us think of linear mappings φ and ψ such that
φ : V ⟶ V 0, ψ : W ⟶ W 0, ð24:35Þ
where V, V′, W, and W′ are vector spaces of the dimension n, n′, m, and m′,
respectively. Also, suppose that we have [1, 2]
T =W V and T W 0 V 0: ð24:36Þ
What we assume in (24.36) is that even though in Theorem 24.2 we have chosen
T as an arbitrary vector space, in (24.36) we choose T as a special vector space (i.e.,
a tensor product W′ V′). From (24.35) and (24.36), using φ and ψ we can construct
a bilinear mapping B : W × V ⟶ W 0 V 0 such that
where ψ( y) 2 W′, φ(x) 2 V′, and ψ ðyÞ φðxÞ 2 T = W 0 V 0 . Then, from Theorem
24.2 and (24.34), a linear mapping ρ : T ⟶ T must uniquely be determined such
that
ρ ψ φ, ð24:39Þ
Equation (24.40) implies that in response to a tensor product of the vector spaces,
we can construct a tensor product of the operators that act on the vector spaces. This
is one of the most practical consequences derived from Theorem 24.2.
The constitution of the tensor products is simple and straightforward. Indeed, we
have already dealt with the tensor products in connection with the group represen-
tation (see Sect. 18.8). Let A and B be matrix representations of φ and ψ, respec-
tively. Following (18.193), the tensor product of B A is expressed as [4]
b11 A ⋯ b1m A
B A= ⋮ ⋱ ⋮ , ð24:41Þ
bm0 1 A ⋯ bm0 m A
where A and B are given by (aij) (1 ≤ i ≤ n′; 1 ≤ j ≤ n) and (bij) (1 ≤ i ≤ m′; 1 ≤ j ≤ m),
respectively. In RHS of (24.41), A represents a (n′, n) full matrix of (aij). Thus, the
direct product of matrix A and B yields a (m′n′, mn) matrix. We write it symbolically
as
The computation rules of (24.41) and (24.42) apply to a column matrix and a row
matrix as well. Suppose that we have
x1 y1
x = ðe1 ⋯ en Þ ⋮ , y = ðf 1 ⋯ f m Þ ⋮ , ð24:43Þ
xn ym
where e1, ⋯, en are the basis vectors of V and f1, ⋯, fm are the basis vectors of W.
Thus, the expression of y x has already appeared in (24.15) as
y ⊗ x = (f₁e₁ ⋯ f₁e_n f₂e₁ ⋯ f₂e_n ⋯ f_me₁ ⋯ f_me_n) (y₁x₁, ⋯, y₁x_n, y₂x₁, ⋯, y₂x_n, ⋯, y_mx₁, ⋯, y_mx_n)ᵀ.   (24.44)
y1 x1
BðyÞ = f 01 ⋯ f 0m0 B ⋮ , AðxÞ = e01 ⋯ e0n0 A ⋮ , ð24:46Þ
ym xn
where e01 , ⋯, e0n0 are the basis vectors of V′ and f 01 , ⋯, f 0m0 are the bas vectors of W′.
Thus, (24.45) can be rewritten as
y1 x1
f 01 ⋯ f 0m0 B ⋮ e01 ⋯ e0n0 A ⋮
ym xn
y1 x1
= f 01 ⋯ f 0m0 e01 ⋯ e0n0 ðB AÞ ⋮ ⋮ : ð24:47Þ
ym xn
y1 x1
⋮
y1 x n
y1 x1 b11 A ⋯ b1m A y2 x1
ðB AÞ ⋮ ⋮ = ⋮ ⋱ ⋮ ⋮ : ð24:48Þ
y2 x n
ym xn bm ′ 1 A ⋯ bm ′ m A
⋮
ym x 1
⋮
ym x n
Notice that since B A operated on the coordinates from the left with the basis set
unaltered, we omitted the basis set f 01 ⋯f 0m0 e01 ⋯e0n0 in (24.48). Thus, we only
had to perform the numerical calculations based on the standard matrix theory as in
RHS of (24.48), where we routinely dealt with a product of a (m′n′, mn) matrix and a
(mn, 1) matrix, i.e., column vector. The resulting column vector is a (m′n′, 1) matrix.
The bilinearity of tensor product of the operators A and B can readily be checked
so that we have
B ðA þ A0 Þ = B A þ B A0 ,
ðB þ B0 Þ A = B A þ B0 A: ð24:49Þ
The bilinearity of the tensor product of column or row matrices can be checked
similarly as well.
Suppose that the dimension of all the vector spaces V, V′, W, and W′ is the same n.
This simplest case is the most important and frequently appears in many practical
applications. We have
a11 ⋯ a1n x1
AðxÞ = ðe1 ⋯ en Þ ⋮ ⋱ ⋮ ⋮ ,
an1 ⋯ ann xn
b11 ⋯ b1n y1
B ð y Þ = ð e1 ⋯ e n Þ ⋮ ⋱ ⋮ ⋮ , ð24:50Þ
bn1 ⋯ bnn yn
x1 y1
x = ð e1 ⋯ en Þ ⋮ , y = ðe1 ⋯ en Þ ⋮ : ð24:51Þ
xn yn
A : V n ⟶ V n, B : V n ⟶ V n, ð24:52Þ
ðB AÞðy xÞ
y1 x1
= ½ðe1 ⋯ en Þ ðe1 ⋯ en Þ½B A ⋮ ⋮ , ð24:55Þ
yn xn
b11 A ⋯ b1n A
B A= ⋮ ⋱ ⋮ : ð24:56Þ
bn1 A ⋯ bnn A
The structure of (24.55) is the same as (24.45) in point of the linear mapping.
Also, (24.56) has a same constitution as (24.41), except that the former gives a (n2,
n2) matrix, whereas the latter is a (m′n′, mn) matrix. In other words, instead of (24.48)
we have a linear mapping B A described by
2 2
B A : Vn ⟶ Vn : ð24:57Þ
In this regard, the tensor product produces an enlarged vector space. As shown
above, we use a term of tensor product to represent a direct product of two operators
(or matrices) besides a direct product of two vector spaces. In relation to the latter
case, we use the same term of tensor product to represent a direct product of row
vectors and that of column vectors (vide supra).
A following example helps us understand how to construct a tensor product.
Then, we get
= ðe1 e1 e1 e2 e2 e1 e2 e2 Þ ×
b11 a11 b11 a12 b12 a11 b12 a12 y 1 x1
b11 a21 b11 a22 b12 a21 b12 a22 y 1 x2
: ð24:59Þ
b21 a11 b21 a12 b22 a11 b22 a12 y 2 x1
b21 a21 b21 a22 b22 a21 b22 a22 y 2 x2
where the rightmost side denotes the column vector after operating B A on y x.
Thus, we obtain
Equation (24.62) has already represented a general feature of the tensor product.
We discuss the transformation properties of tensors later.
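The computational rules just described can be tried out directly with numpy's Kronecker product. The following sketch is not from the book; the matrices and coordinates are arbitrary sample values, and it checks the mixed-product property underlying (24.44)-(24.62), namely that (B ⊗ A)(y ⊗ x) = (By) ⊗ (Ax).

```python
# A numerical illustration of the tensor (Kronecker) product: the coordinate
# column of y (tensor) x is kron(y, x), cf. (24.44), and applying B (tensor) A,
# cf. (24.56), reproduces (B y) (tensor) (A x).
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 2))      # operator acting on V
B = rng.normal(size=(2, 2))      # operator acting on W
x = rng.normal(size=2)           # coordinates of x in V
y = rng.normal(size=2)           # coordinates of y in W

BA = np.kron(B, A)               # the (4, 4) matrix of the tensor product of operators
yx = np.kron(y, x)               # the coordinate column of y (tensor) x

print(np.allclose(BA @ yx, np.kron(B @ y, A @ x)))   # True
print(BA.shape)                                       # (4, 4), an enlarged vector space
```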
In the previous sections, we have studied several forms of the bilinear mapping. In
this section, we introduce a specific form of the bilinear mapping called a
bilinear form.
Definition 24.2 Let V and W be n-dimensional and m-dimensional vector spaces
over K, respectively. Let B be a mapping described by
B : W × V ⟶ K, ð24:63Þ
where W × V is a direct product space (see Sect. 24.1.1); K denotes the field (usually
chosen from among a complex number field ℂ or real number field ℝ). Suppose that
B(, ) satisfies the following conditions [1, 2]:
Then, we have
m n
Bðy, xÞ = j=1
ηξB
i=1 j i
f j , ei : ð24:65Þ
So far, we have introduced the bilinear mapping and tensor product as well as related
notions. The vector spaces, however, were “structureless,” as in the case of the
vector spaces where the inner product was not introduced (see Part III). Now, we
wish to introduce the concept of the dual (vector) space to equip the vector space
with a “metric.”
Definition 24.3 Let V be a vector space over a field K. Then, a collection of arbitrary
homogeneous linear functionals f defined on V that map x (2V ) to K is called a dual
vector space (or dual space) of V.
This simple but abstract definition needs some explanation. The dual space of V is
written as V. With 8f 2 V and 8g 2 V, we define the sum f + g and multiplication
αf (α 2 K ) as
Then, V forms a vector space over K with respect to the operation of (24.66). The
dimensionality of V is the same as that of V [1, 2]. To show this, we make a
following argument: Let 8x 2 V be expressed as
n
x= i=1
ξ i ei ,
fi ðxÞ = ξi : ð24:67Þ
Accordingly, we have
n
x= i=1
fi ðxÞei ði = 1, ⋯, nÞ: ð24:68Þ
Meanwhile, we have
n n
fi ð xÞ = f i k=1
ξ k ek = k=1
ξk fi ðek Þ: ð24:69Þ
A set of f1 , ⋯, fn is a basis set of V and called a dual basis with respect to {e1,
⋯, en}. As we have already seen in Sect. 11.4, we can choose another basis set of V
that consists of n linearly independent vectors. Accordingly, the dual basis that
satisfies (24.70) is sometimes referred to as a standard basis or canonical basis to
distinguish it from other dual bases of V. We have
V = Span f1 , ⋯, fn
in correspondence with
V = Span fe1 , ⋯, en g:
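The canonical dual basis is easy to realize concretely. The short sketch below is not from the book; it assumes an arbitrary basis of ℝ³ whose vectors are stored as the columns of a matrix E, in which case the canonical dual-basis functionals are given by the rows of E⁻¹, and it verifies the defining relation f*_i(e_k) = δ_ik of (24.70).

```python
# Canonical dual basis as coordinate functionals: f*_i(x) = (E^{-1} x)_i,
# where the columns of E are the basis vectors e_k; then f*_i(e_k) = delta_ik.
import numpy as np

E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0]])      # columns e_1, e_2, e_3 form a basis of R^3
E_inv = np.linalg.inv(E)

def f_star(i, x):
    """The i-th canonical dual-basis functional (returns the i-th coordinate of x)."""
    return (E_inv @ x)[i]

print(np.allclose([[f_star(i, E[:, k]) for k in range(3)] for i in range(3)],
                  np.eye(3)))        # True: f*_i(e_k) = delta_ik
```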
Then, there must be an inverse mapping N - 1 to N (see Sect. 11.2). That is, we
have
ek = N - 1 fk ðk = 1, ⋯, nÞ:
with
ei ðyÞ = ηi , ð24:73Þ
where ei 2 V (ei : V ⟶ K ).
As discussed above, we find that the vector spaces V and V stand in their own
right on an equal footing. Whereas residents in V see V as a dream world, those in V
see V as it as well. Nonetheless, the two vector spaces are complementary to each
other, which can clearly be seen from (24.67), (24.68), (24.72), and (24.73). These
situations can be formulated as a recurrence property such that
ðV Þ = V = V: ð24:74Þ
number field) for a future purpose. Then, as in (24.65), B(, ) is uniquely determined
by a (n, n) matrix B fj , ei ð2 ℝÞ, where fj ðj = 1, ⋯, nÞ ð2 V Þ are linearly inde-
pendent in V and form a dual basis (vide supra). The other way around, ek (k = 1,
⋯, n) (2V ) are linearly independent in V. To denote these bilinear forms B(, ), we
use a shorthand notation described by
½j,
where an element of V and that of V are written in the left-side and right-side
“parentheses,” respectively. Also, to clearly specify the nature of a vector species,
we use j ] for an element of V and [ j for that of V as in the case of a vector of the
inner product space (see Chap. 13). Notice, however, that [| ] is different from the
inner product h| i in the sense that although h| i is positive definite, it is not
necessarily true of [| ]. Using the shorthand notation, e.g., (24.70), can be succinctly
expressed as
Because of the isomorphism between V and V, jx] is mapped to [xj by the
bijective mapping and vice versa. We call [xj a dual vector of jx].
Here we briefly compare the bilinear form with the inner product (Chap. 13).
Unlike the inner product space, the basis vectors [xj do not imply the concept of
adjoint with respect to the vector spaces we are thinking of now. Although hx| xi is
positive definite for any jxi 2 V, the positive definiteness is not necessarily true of [x|
x]. Think of a following simple example. In case K = ℂ, we have [x| x] = [ix| -
ix] = - [ix| ix]. Hence, if [x| x] > 0, [ix| ix] < 0. Also, we may have a case where [x|
x] = 0 with jx] ≠ 0 (vide infra). We will come back to this issue later. Hence, it is safe
to say that whereas the inner product buys the positive definiteness at the cost of the
bilinearity, the bilinear form buys the bilinearity at the cost of the positive
definiteness.
The relation represented by (24.70) is somewhat trivial and, hence, it will be
convenient to define another bilinear form B(, ). Meanwhile, in mathematics and
physics, we often deal with invariants before and after the vector (or coordinate)
transformation (see Sect. 24.1.5). For this purpose, we wish to choose a basis set of
V more suited than fk ðk = 1, ⋯, nÞ. Let us define the said basis set of V as
ff1 , ⋯, fn g:
n n
j x = i=1
ξi j ei , x j = i=1
ξi fi j , ξi 2 ℝ, ð24:77Þ
Mjx = ½x j : ð24:79Þ
we obtain
ξ1
½xjx = ξ1 ⋯ ξn M ⋮ : ð24:81Þ
ξn
We rewrite (24.81) as
½xjx = ξi M ij ξj = ξk M kk ξk þ ξl M lm ξm , ð24:82Þ
i, j k l, m; l ≠ m
1
ξl M lm ξm = ξl M lm ξm þ ξm M ml ξl
l, m; l ≠ m
2 l, m; l ≠ m m, l; m ≠ l
1
= ðM þ M ml Þξl ξm :
2 l, m; l ≠ m lm
½xjx = ~ kk ξk þ
ξk M ~ lm ξl ξm :
M
k l, m; l ≠ m
Then, we assume that M is symmetric from the beginning, i.e., fi jej = fj jei
[1, 2]. Thus, M is real symmetric matrix (note K = ℝ), and so it is to be diagonalized
through orthogonal similarity transformation. That is, M can be diagonalized with an
orthogonal matrix P = Pi j . We have
f Pj j
j j 1
e Pi
i i 1
⋯ f Pj j
j j 1
e Pi
i i n
= ⋮ ⋱ ⋮
f Pj j
j j n
e Pi
i i 1
⋯ f Pj j
j j n
e Pi
i i n
where summation with respect to both i and j ranges from 1 to n. In (24.83), we have
e0k = Pðek Þ = i
Pi k j ei , f0k j = Pðfk Þ = j
Pj k fj j , f0j je0i = d j δji , ð24:84Þ
d i1
⋱
d ip
dipþ1
,
⋱
dipþq
where di1 , ⋯, dip are positive; dipþ1 , ⋯, d ipþq are negative; we have p + q = n. What
we have done in the above process is to rearrange the order of the basis vectors
s times so that the positive di1 , ⋯, dip can be arranged in the upper diagonal elements
and the negative dipþ1 , ⋯, d ipþq can be arranged in the lower diagonal elements.
Correspondingly, we obtain
we get
The numbers p and q are intrinsic to the matrix M. Since M is a real symmetric
matrix, M can be diagonalized with the similarity transformation by an orthogonal
matrix. But, since the eigenvalues (given by diagonal elements) are unchanged after
the similarity transformation, it follows that p (i.e., the number of the positive
eigenvalues) and q (the number of the negative eigenvalues) are uniquely determined
by M. The integers p and q are called a signature [1, 2]. The uniqueness of ( p, q) is
known as the Sylvester’s law of inertia [1, 2].
We add as an aside that if M is a real symmetric singular matrix, M must have an
eigenvalue 0. In case the eigenvalue zero is degenerated r-fold, we have p + q + r = n.
In that case, p, q, and r are also uniquely determined and these integers are called a
signature as well. This case is also included as the Sylvester’s law of inertia [1, 2].
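The content of Sylvester's law of inertia can be made concrete with a small numerical experiment. The following sketch is not from the book; the symmetric matrix used is an arbitrary example, and it simply diagonalizes M with an orthogonal matrix, reads off the signature (p, q) from the signs of the eigenvalues, and then applies the rescaling of (24.87) that turns the diagonal entries into ±1.

```python
# Diagonalize a real symmetric M orthogonally, read off the signature (p, q),
# and apply the equivalence transformation that rescales the eigenvalues to +-1.
import numpy as np

M = np.array([[ 2.0,  1.0,  0.0],
              [ 1.0, -1.0,  2.0],
              [ 0.0,  2.0, -3.0]])          # an arbitrary real symmetric "metric"

eigval, P = np.linalg.eigh(M)                # P orthogonal, P^T M P = diag(eigval)
p = int(np.sum(eigval > 0))                  # number of positive eigenvalues
q = int(np.sum(eigval < 0))                  # number of negative eigenvalues
print(p, q)                                  # the signature (p, q)

D = np.diag(1.0 / np.sqrt(np.abs(eigval)))   # rescaling, cf. (24.87)
print(np.round(D @ P.T @ M @ P @ D, 10))     # diagonal of +1's and -1's (order may differ)
```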
Note that (24.87) is not a similarity transformation, but an equivalence transfor-
mation and, hence, that the eigenvalues have been changed [from dj (≠0) to ±1].
Also, notice that unlike the case of the equivalence transformation with an arbitrarily
chosen non-singular matrix mentioned in Sect. 14.5, (24.87) represents an equiva-
lence transformation as a special case. That is, the non-singular operator comprises a
product of an orthogonal matrix PΠ and a diagonal matrix D.
The above-mentioned equivalence transformation also causes the transformation
of the basis set. Using (24.86), we have
Defining
we find that both the basis sets of V and V finally undergo the same transformation
through the equivalence transformation by PΠD such that
with
-1 ξ1
½xjx = ξ1 ⋯ ξn W T W T MW W - 1 ⋮
ξn
T ξ1
= ξ1 ⋯ ξ n W - 1 W T MW W - 1 ⋮ :
ξn
Just as the basis sets are transformed by W, so the coordinates are transformed
according to W-1 = D-1Π-1P-1 = D-1ΠTPT. Then, consecutively we have
k ik
ξ0 = P-1 ξi , ξ0 k = Π-1 ξ0 m ,
k i i
i i im im
~ξik ¼ D-1
ik
ξ0 m ¼
i
± d ik δik im ξ0 m ¼
i
± d i k ξ0 k :
i
ð24:90Þ
im im im
~ξi1 ξ1 ξ1
-1 -1 -1 -1
⋮ ¼W ⋮ ¼D Π P ⋮ : ð24:91Þ
~ξin ξn ξn
Thus, (24.88) and (24.91) afford the transformation of the basis vectors and
corresponding coordinates. These relations can be compared to (11.85).
Now, if the bilinear form [y| x] (x 2 V, y 2 V) described by [1, 2]
ξ1
½yjx = η1 ⋯ ηn M ⋮
ξn
η1 ⋯ ηn M = 0 ⟹ η1 ⋯ ηn = 0:
This implies detM ≠ 0. Namely, M is non-singular. From the condition (ii) which
implies that
ξ1 ξ1
M ⋮ =0 ⟹ ⋮ = 0,
ξn ξn
we get detM ≠ 0 as well. The determinant of the matrix for RHS of (24.87) is ±1.
Thus, the non-degeneracy of the bilinear form is equivalent to that M has only
non-zero eigenvalues. It is worth comparing two vector spaces, the one of which is
an inner product space Vin and the other of which is a vector space Vbi where the
bilinear form is defined.
Let x 2 Vin and y 2 Vin, where Vin is a dual space of Vin. The inner product can
be formulated by a mapping I such that
I : V in × V in ⟶ ℂ:
As for the vector space Vbi, the matrix M defined in (24.80) is a counterpart to the
Gram matrix G. With respect to two sets of basis vectors of Vbi and its dual space V bi ,
we have an expression related to (13.31). That is, we have [1, 2]
Let us consider a case where we construct M as in (24.80) from two sets of basis
vectors of Vbi and V bi given by
f1 j , ⋯, fn j ; ½fi j 2 V bi ði = 1, ⋯, nÞ :
We assume that these two sets are connected to each other through (24.76).
Suppose that we do not know whether these sets are linearly dependent or indepen-
dent. However, if either of the above two sets of vectors is linearly independent, so is
the remainder set. Think of the following relation:
ξ 1 j e1 þ ⋯ þ ξ n j en = 0
# M " M-1
ξ1 ½f1 j þ⋯ þ ξn ½fn j = 0:
In the above relation, M has been defined in (24.76). If je1], ⋯, j en] are linearly
independent, we obtain ξ1 = ⋯ = ξn = 0. Then, it necessarily follows that
½f1 j , ⋯, ½fn j are linearly independent as well, and vice versa. If, the other way
around, either of the above two sets of vectors is linearly dependent, so is the
remainder set. This can be shown in a manner similar to the above. Nonetheless,
even though we construct M using these two basis sets, we do not necessarily have
detM ≠ 0. To assert detM ≠ 0, we need the condition (24.92).
In place of M of (24.80), we use a character M and define (24.87) anew as
1
⋱
1
M : ð24:93Þ
-1
⋱
-1
Meanwhile, we redefine the notation ~e, ~f, and ~ξ as e, f, and ξ, respectively, and
renumber their subscripts (or superscripts) in (24.88) and (24.90) such that
The matrix M is called metric tensor. This terminology is, however, somewhat
misleading, because M is merely a matrix consisting of real numbers. Care should be
taken accordingly. If all the eigenvalues of M are positive, the quadratic form [x| x] is
positive definite (see Sect. 14.5). In that case, the metric tensor M can be expressed
using an inner product such that
1 0 0 0
-1
M= 0
0 0
0
-1
0
0
: ð24:97Þ
0 0 0 -1
The matrix M is identical with η of (21.16) that has already appeared in Chap. 21.
24.1.5 Invariants
The metric tensor M plays a fundamental role in a n-dimensional real vector space. It
is indispensable for studying the structure and properties of the vector space. One of
the most important points is that M has a close relationship to the invariants of the
vector field. First, among various operators that operate on a vector, we wish to
examine what kinds of operators hold [x| x] invariant after their operation on vectors
jx] and [xj belonging to V or V, respectively.
Let the relevant operator be R. Operating R on
n ξ1
jx = i=1
ξi j ei = ðje1 ⋯ jen Þ ⋮ ð2 V Þ,
ξn
we obtain
n
Rjx = j Rx = i,j = 1
Ri j ξj j ei , ð24:98Þ
1166 24 Basic Formalism
n
Rx j = i,j = 1
R i j ξj ½ fi j : ð24:99Þ
Then, we get
n n n
½RxjRx = i,j = 1
R i j ξj k,l = 1
Rk l ξl ½fi jek = i,j,k,l = 1
ξj Ri j M ik Rk l ξl , ð24:100Þ
where with the last equality we used (24.80) and M ik represents (i, k)-element of M
given in (24.93), i.e., M ik = ± δik . Meanwhile, we have
n
½xjx = j,l = 1
ξj M jl ξl : ð24:101Þ
so that [Rx| Rx] - [x| x] = 0 (namely, so that [x| x] can be invariant before and after
the transformation by R). In a matrix form, (24.103) can be rewritten as
RT M R = M : ð24:104Þ
Remind that an equation related to (24.104) has already appeared in Sect. 13.3 in
relation to the Gram matrix and adjoint operator; see (13.81). If real matrices are
chosen for R and G, (13.81) is translated into
RT GR = G:
x1
x = ðe1 ⋯ en Þ ⋮
xn
x1 x1 x~1
= ðe1 ⋯ en ÞPP - 1 ⋮ = e01 ⋯ e0n P - 1 ⋮ = e01 ⋯ e0n ⋮ : ð11:84Þ
xn xn x~n
ξ1 ξ1 ξ01
x = ð e1 ⋯ e n Þ ⋮ = ðe1 ⋯ en ÞR - 1 R ⋮ = e01 ⋯ e0n ⋮ : ð24:105Þ
ξn ξn ξ0n
Note that in (24.105) R was used instead of P-1 in (11.84). This is because in this
section we place emphasis on the coordinate transformation. Thus, as the transfor-
mation of the basis set and its corresponding coordinates of V, we have
n j
e01 ⋯ e0n = ðe1 ⋯ en ÞR - 1 or e0i = e
j=1 j
R - 1 i: ð24:106Þ
Also, we obtain
ξ01 ξ1 n
⋮ =R ⋮ or ξ0i = j=1
R i j ξj : ð24:107Þ
ξ0n ξn
We assume that the transformation of the dual basis fi of V is described such that
n
f0i = j=1
Qi j fj : ð24:108Þ
k k k
f0i je0j = Qi l fl jek R - 1 j
= Qi l R - 1 j
fl jek = Qi l R - 1 δl
j k
l
= Qi l R - 1 j
= fi jej = δij : ð24:110Þ
QR - 1 = E: ð24:111Þ
Also, we have
n n n
½xjx = j,l = 1
ξj fj jel ξl = j,l = 1
ξj T f
k = 1 jk
k
jel ξl
n n n
= j,l,k = 1
ξj T jk fk jel ξl = j,l,k = 1
ξj T jk δkl ξl = j,l = 1
ξj T jl ξl : ð24:113Þ
T jl = M jl i:e:, T =M:
Hence, we have
n
fi = j=1
M ij fj ði = 1, ⋯, nÞ: ð24:114Þ
j n j
M e0i = f0i = R - 1 i M ej = j=1
R - 1 i fj , ð24:118Þ
where with the first equality we used (24.117); with the last equality we used (24.76).
Rewriting (24.118), we get
n j
f0i = f
j=1 j
R - 1 i: ð24:119Þ
n n n k
½xjx = ξξ
l=1 l
l
= ξ δk ξl
k,l = 1 k l
= ξ
i,k,l = 1 k
R-1 i
R i l ξl : ð24:122Þ
n k
ξ0i ξ
k=1 k
R-1 i
ð24:123Þ
n j l
f0i je0k = j,l = 1
fj jel R - 1 i
R-1 k
: ð24:125Þ
n n j l
i,k = 1
Ri s Rk t f0i je0k = i,k,j,l = 1
R - 1 i Ri s R - 1 k
Rk t fj jel
n n
= δj δl
j,l = 1 s t
fj jel = ½fs jet = M st = i,k = 1
Ri s M ik Rk t , ð24:126Þ
where with the second to the last equality we used (24.115); with the last equality we
used (24.103). Putting f0i je0k = M 0ik and comparing the leftmost side and rightmost
side of (24.126), we have
n n
i,k = 1
Ri s M 0ik Rk t = i,k = 1
Ri s M ik Rk t : ð24:127Þ
RT M 0 R = RT M R: ð24:128Þ
M0=M: ð24:129Þ
That is, M is invariant with respect to the basis vector transformation. Hence, M
is called invariant metric tensor. In other words, the transformation R that satisfies
(24.128) and (24.129) makes [x| x] invariant and, hence, plays a major role in the
vector space characterized by the invariant metric tensor M .
As can be seen immediately from (24.93), M is represented as a symmetric matrix
and an inverse matrix of M - 1 is M itself. We denote M - 1 by
ij
M -1 M -1 ðM Þij :
Then, we have
n n
j=1
M ij M jk = δik or k=1
M jk M ki = δij : ð24:130Þ
These notations make various expressions simple. For instance, multiplying both
sides of (24.120) by M ik and summing the result over i we have
n n n n
ξM
i=1 i
ik
i=1 j=1
ξj M ji M ik = j=1
ξj δkj = ξk :
n n n n
fM
i=1 i
il
= i=1 j=1
M li M ij fj = δl fj
j=1 j
= fl : ð24:131Þ
Moreover, we get
n n n n n n
i=1
ξi fi = l=1 i=1
ξi δli fl = l=1 k=1 i=1
ξi M ik M kl fl
n
= ξf
k=1 k
k
, ð24:132Þ
In Sect. 24.1.1, we showed the definition and a tangible example of the bilinear
mapping. Advancing this notion further, we reach the multilinear mapping
[1, 2]. Suppose that a mapping S from a direct product space Vs × ⋯ × V1 to a
vector space T is described by
S : V s × ⋯ × V 1 ⟶ T,
where V1, ⋯, Vs are vector spaces and Vs × ⋯ × V1 is their direct product space. If
with x(i) 2 Vi (i = 1, ⋯, s), S xðsÞ , ⋯, xð1Þ is linear with respect to each x(i) (i = 1,
⋯, s), then S is called a multilinear mapping. As a natural extension of the tensor
product dealt with in Sect. 24.1.2, we can define a tensor product T in relation to the
multilinear mapping. It is expressed as
T = V s ⋯ V 1: ð24:133Þ
~ : V s × ⋯ × V 1 ⟶ T:
S
T = V s ⋯ V 1:
is uniquely determined.
As in the case of the bilinear mapping, we have linear mappings φi (i = 1, ⋯, s)
such that
φi : V i ⟶ V 0i ði = 1, ⋯, sÞ: ð24:135Þ
T = V 0s ⋯ V 01 : ð24:136Þ
ρ φs ⋯ φ1 , ð24:138Þ
we get
φi : V i ⟶ V i ði = 1, ⋯, sÞ: ð24:140Þ
T = V s ⋯ V 1:
φs ⋯ φ1 : T ⟶ T:
Replacing φi with a (n, n) matrix Ai, (24.55) can readily be generalized such that
where e(1) ⋯ e(s) is represented by a (1, ns) row matrix; ξ(s) ⋯ ξ(1) by a
(ns, 1) column matrix. The operator As ⋯ A1 is represented by a (ns, ns) matrix.
As in the case of (11.39), the operation of As ⋯ A1 is described by
That is, the associative law holds with the matrix multiplication.
We have a more important case where some of vector spaces Vi (i = 1, ⋯, s) that
are involved in the tensor product are the same space V and the remaining vector
spaces are its dual space V. The resulting tensor product is described by
T = V s ⋯ V 1 ðV i = V or V ; i = 1, ⋯, nÞ, ð24:142Þ
An element of the tensor space is called a s-th rank tensor. In particular, if all
Vi (i = 1, ⋯, s) are V, the said tensor space is called a s-th rank contravariant tensor
space and the elements in that space are said to be a s-th rank contravariant tensor. If
all Vi (i = 1, ⋯, s) are V, the tensor space is called a s-th rank covariant tensor space
and the elements in that space are said to be a s-th rank covariant tensor accordingly.
As an example, we construct three types of second rank tensors described by
(i)
ðR RÞ xð2Þ xð1Þ = eð2Þ eð1Þ ½R R ξð2Þ ξð1Þ ,
(ii)
T T
R-1 R-1 yð2Þ yð1Þ
T T
= fTð2Þ fTð1Þ R-1 R-1 ηTð2Þ ηTð1Þ ,
(iii)
T T
R-1 R yð2Þ xð1Þ = fTð2Þ eð1Þ R-1 R
Equation (24.143) shows (2, 0)-, (0, 2)-, and (1, 1)-type tensors for (i), (ii), and
(iii) from the top. In (24.143), with x(i) 2 V we denote it by x(i) = e(i) ξ(i) (i = 1, 2);
for y(i) 2 V we express it as yðiÞ = fTðiÞ ηTðiÞ ði = 1, 2Þ. The vectors x(i) and y(i) are
referred to as a contravariant vector and a covariant vector, respectively. Since with a
dual vector of V, a basis set fðiÞ are arranged in a column vector and coordinates are
arranged in a row vector, they are transposed so that their arrangement can be
consistent with that of a vector of V. Concomitantly, R-1 has been transposed as
well. Higher rank tensors can be constructed in a similar manner as in (24.143) noted
above.
The basis sets e(2) e(1), fTð2Þ fTð1Þ , and fTð2Þ eð1Þ are referred to a standard basis
(or canonical basis) of the tensor spaces V V, V V, and V V, respectively.
Then, if (24.142) contains the components of V, we need to use ηTðiÞ for the
coordinate; see (24.132). These basis sets are often disregarded in the computation
process of tensors.
Equation (24.143) are rewritten using tensor components (or “coordinates” of a
tensor) as
24.1 Extended Concepts of Vector Spaces 1175
n
(i) ξ ′ ið2Þ ξ ′ kð1Þ = j,l = 1
Ri l Rk j ξðl 2Þ ξjð1Þ ,
(ii)
n l j
η0ið2Þ η0kð1Þ = j,l = 1
R-1 i
R-1 η η ,
k lð2Þ jð1Þ
(iii)
n l
η0ið2Þ ξ ′ kð1Þ = j,l = 1
R - 1 i Rk j ηlð2Þ ξjð1Þ ,
where, e.g., ξ ′ ið2Þ ξ ′ kð1Þ shows the tensor components after being transformed by R.
With these relations, see (24.62). Also, notice that the invariant tensor fj jel of
(24.125) indicates the transformation property of the tensor components of a second
rank tensor of (0, 2)-type. We have another invariant tensor δij of (1, 1)-type, whose
transformation is described by
l l i
δ0 j = R - 1 j Rik δkl = R - 1 j Ril = RR - 1 j = δij :
i
Note that in the above relation R can be any non-singular operator. The invariant
tensor δij is exactly the same as fi ðek Þ of (24.70). Although the invariant metric tensor
ðM Þij of (24.93) is determined by the inherent structure of individual vector spaces
that form a tensor product, the invariant tensor δij is common to any tensor space,
regardless of the nature of constituent vector spaces.
To show tensor components, we use a character with super/subscripts such as
e.g., Ξij (1≤i, j ≤ n), which are components of a second rank tensor of (2, 0)-type.
The transformation of Ξ is described by
n
Ξ0 =
ik
j,l = 1
Ri l Rk j Ξlj , etc:
We can readily infer the computation rule of the tensor from analogy and
generalization of the tensor product calculations shown in Example 24.2.
More generally, let us consider a (q, r)-type tensor space T. We express an
element T (i.e., a tensor) of T ðT 2 T Þ symbolically as
T = Γ Θ Ψ, ð24:144Þ
where Γ denotes a tensor product of the s basis vectors of V or V; Θ denotes a tensor
product of the s operators R or (R-1)T; Ψ also denotes a tensor product of the
s coordinates ξiðjÞ or ηk(l ) where 1 ≤ j ≤ q, 1 ≤ l ≤ r, 1 ≤ i, k ≤ n (n is a dimension
of the vector spaces V and V). Individual tensor products are further described by
1176 24 Basic Formalism
In (24.145), Γ represents s basis sets gðiÞ ð1 ≤ i ≤ sÞ, which denote either e(i) or
fTðiÞ ð1 ≤ i ≤ sÞ. The number of e(i) is q and that of f TðiÞ is r with q + r = s. With the
tensor product Θ, Q(i) represents R or (R-1)T with the number of R and (R-1)T being
q and r, respectively. The tensor product Ψ denotes s coordinates ζ (i) (1 ≤ i ≤ s),
which represent either ξ(i) or ηTðiÞ . The number of ξ(i) is q and that of ηTðiÞ is r with
q + r = s.
The symbols ξ(i) and ηTðiÞ represent
ξ1ðiÞ η 1ð i Þ
ξðiÞ = ⋮ ð1 ≤ i ≤ qÞ, ηTðiÞ = ⋮ ð1 ≤ i ≤ r Þ,
ξnðiÞ η nð i Þ
ξ ξ ⋯ξ
Ψ T ηððqrÞÞηððrq-- 11ÞÞ⋯ηðð11ÞÞ : ð24:146Þ
ξ ξ ⋯ξ
Ψ0 = T 0 ηððqrÞÞηððrq-- 11ÞÞ⋯ηðð11ÞÞ
α β γ ρσ⋯τ
= R-1 η ðr Þ
R-1 η ð r - 1Þ
⋯ R-1 η ð 1Þ
RξðqÞ ρ Rξðq - 1Þ σ ⋯Rξð1Þ τ T αβ⋯γ : ð24:147Þ
As can be seen above, if the tensor products of the basis vectors Γ are fixed
(normally at the standard basis) in (24.144), the tensor T and their transformation are
determined by the part Θ Ψ in (24.144). Moreover, since the operator R is related to
the invariant metric tensor M through (24.104), the most important factor of
(24.144) is the tensor product Ψ of the coordinates. Hence, it is often the case in
practice with physics and natural science that Ψ described by (24.146) is simply
referred to as a tensor.
Let us think of the transformation property of a following (1, 1)-type tensor Ξji :
24.1 Extended Concepts of Vector Spaces 1177
n l
Ξ ′ ki = j,l = 1
R - 1 i Rk j Ξjl : ð24:148Þ
n n l n n
i=1
Ξ ′ ii = i,j,l = 1
R - 1 i Ri j Ξjl = δl Ξj
j,l = 1 j l
= Ξl
l=1 l
Ξll : ð24:149Þ
Thus, (24.149) represents an invariant quantity that does not depend on the
coordinate transformation. The above computation rule is known as tensor contrac-
tion. In this case, the summation symbol is usually omitted and this rule is referred to
as Einstein summation convention (see Chap. 21). We readily infer from (24.149)
that a (q, r)-type tensor is converted into a (q - 1, r - 1)-type tensor by the
contraction.
As typical examples of the real vector spaces that are often dealt with in mathemat-
ical physics, we have Euclidean space and Minkowski space. Even though the
former is familiar enough even at the elementary school level, we wish to revisit
and summarize its related topics.
The metric tensor of Euclidean space is given by
Hence, from (24.104) the transformation R that holds a bilinear form [x| x]
invariant satisfies the condition
RT ER = RT R = E, ð24:151Þ
where E is the identity operator of Euclidean space. That is, R must be an orthogonal
matrix. From (24.120), we have ξi = ξ j. Also, from (24.123) we have
n k n n
ξ0i R-1
k
ξ
k=1 k i
= ξ
k=1 k
RT i
= ξ Ri
k=1 k k
n
= k=1
Ri k ξk : ð24:152Þ
1 0 0 0
-1
M= 0
0 0
0
-1
0
0
: ð24:97Þ
0 0 0 -1
Minkowski space has several properties different from those of Euclidean space
because of the difference in the metric tensor. First, we make general discussion
concerning the vector spaces, followed by specific topics related to Minkowski space
as contrasted with the properties of Euclidean space. The discussion is based on the
presence of the bilinear form that has already been mentioned in Sect. 24.1.3. First,
we formally give a following definition.
Definition 24.4 [5] Let W be a subspace of a vector space V. Suppose that with
∃
w0 (≠0) 2 W we have
8
½w0 jw = 0 for w 2 W: ð24:154Þ
8
½ujv = 0 for u, v 2 W,
In this context, we have following important proposition and theorems and see
how the present situation can be compared with the previous results obtained in the
inner product space (Sect. 14.1).
Proposition 24.1 [5] Let V be a vector space where a bilinear form [| ] is defined.
Let W be a subspace of V and let W⊥ be an orthogonal complement of W. A
necessary and sufficient condition for W \ W⊥ = {0} to hold is that W is
non-degenerate.
Proof
(i) Sufficient condition: To prove “W is non-degenerate ⟹ W \ W⊥ = {0},” we
prove its contraposition. That is, suppose W \ W⊥ ≠ {0}. Then, we have 0 ≠ ∃w0
which satisfies w0 2 W \ W⊥. From the definition of (24.155) for the orthogonal
complement and the assumption of w0 2 W⊥, we have
V =W W ⊥: ð24:157Þ
Moreover, we have
⊥
ðW ⊥ Þ = W, ð24:158Þ
⊥
dimW = n - r: ð24:159Þ
see (24.94) and (24.115). Notice that the terminology of orthonormal basis was
borrowed from that of the inner product space and that the existence of the ortho-
normal basis can be proven in parallel to the proof of Theorem 13.2 (Gram-Schmidt
orthonormalization theorem) and Theorem 13.3 [4].
1180 24 Basic Formalism
n n
fj jx = i=1
ξi fj jei = i=1
ξi δji ζ i = ξj ζ j : ð24:161Þ
Note that the summation with respect to j was not taken. Multiplying both sides of
(24.161) by ζ j, we have
ξj = ζ j fj jx: ð24:162Þ
Thus, we obtain
n
j x = j=1
ζ j fj jxjej : ð24:163Þ
we obtain
r r r
½w0 jw = ηk ηj
k,j = 1 0
fk jej = ηk ηj δkj ζ j
k,j = 1 0
= ηj ηj ζ j :
j=1 0
ð24:165Þ
r
j x0 = j=1
ζ j fj jxjej , ð24:166Þ
where we used (24.162). Putting |x′′] = |x] - |x′] and operating ½fk j ð1 ≤ k ≤ r Þ on
both sides from the left, we get
n r
fk jx00 = j=1
ζ j fj jx fk jej - j=1
ζ j fj jx fk jej
r r
= j=1
ζ j fj jxδkj ζ j - j=1
ζ j fj jxδkj ζ j
2 2
= ζk fk jx - ζ k ½fk jx = ½fk jx - ½fk jx = 0: ð24:167Þ
Since W = Span {e1, ⋯, er}, (24.168) implies that [x′′|w] = 0 with 8|w] 2 W.
Consequently, from the definition of the orthogonal complement, we have x′′ 2 W⊥.
Thus, we obtain
where |x] 2 V, |x′] 2 W, and x′′ 2 W⊥. Then, from Proposition 24.1 we get
V =W W ⊥: ð24:157Þ
dimW ⊥ = n - r: ð24:159Þ
Also, from the above argument we have W⊥ = Span {er + 1, ⋯, en}. This once
again implies that W⊥ is non-degenerate as well. Then, we have
⊥
V = W⊥ ðW ⊥ Þ : ð24:170Þ
Therefore, we obtain
⊥
dimðW ⊥ Þ = n - ðn - r Þ = r: ð24:171Þ
⊥ 8
ðW ⊥ Þ = jx; ½xjy = 0 for j y 2 W ⊥ : ð24:172Þ
Then, if jx] 2 W, again from the definition such jx] must be contained in (W⊥)⊥.
That is, (W⊥)⊥ ⊃ W [1, 2]. But, from the assumption and (24.171), dim
(W⊥)⊥ = dim W. This implies [1, 2]
⊥
ðW ⊥ Þ = W: ð24:158Þ
V = Wþ W-: ð24:174Þ
p = nþ , q = n - ; nþ þ n - = n:
We have
n0 = minðnþ , n - Þ: ð24:176Þ
Proof From Theorem 24.4, we have W+ \ W- = {0}. From the definition of the
totally degenerate subspace, obviously we have W+ \ W0 = W- \ W0 = {0}. (Oth-
erwise, W+ and W- would be degenerate.) From this, we have
nþ þ n0 ≤ n, n - þ n0 ≤ n: ð24:177Þ
Then, we obtain
c1 E1 þ ⋯ þ cm Em = 0: ð24:181Þ
c2 c
ei1 = ej1 þ e - ej2 þ ⋯ þ m eim - ejm : ð24:182Þ
c1 i2 c1
Then, from the assumption we have n0 = m. Namely, the equality holds with
(24.178) so that (24.176) is established.
If, on the other hand, we assume n- = m ≤ n+, proceeding in a similar way as the
above discussion we reach the same conclusion, i.e., n0 = m. This completes the
proof.
Since we have made general discussion and remarks on the vector spaces where
the bilinear form is defined, we wish to show examples of the Minkowski space 4
and 3 .
Example 24.3 We study several properties of the Minkowski space, taking four-
dimensional space 4 and three-dimensional space 3 as an example. For both the
cases of 4 and 3 , the dimension of a totally degenerate subspace is 1 (Theorem
24.5). In 4 , n+ and n- of (24.176) are 1 and 3, respectively; see (24.97). The space
4 is denoted by
where e0, e1, e2, and e3 are defined according to (24.95) with
1184 24 Basic Formalism
where Span {e0}⊥ = Span{e1, e2, e3}. Since ½f0 je0 = 1, e0 is a timelike vector. As
½f1 þ f2 þ f3 je1 þ e2 þ e3 = - 3, e1 + e2 + e3 is a spacelike vector, etc. Meanwhile,
according to Theorem 24.5, a dimension n0 of the totally degenerate subspace W0 is
n0 = min (n+, n-) = min (1, 3) = 1. Then, for instance, we have
The presence of the totally degenerate subspace W0 has a special meaning in the
theory of special relativity. The collection of all singular vectors forms a light cone.
Since the dimension of W0 is one, it is spanned by a unique lightlike vector (i.e.,
singular vector), e.g., je0] + j e1]. It may be chosen from among any singular vectors.
Let us denote a singular vector by uλ (λ 2 Λ) and the collection of all the singular
vectors that form the light cone by L. Then, we have
L = [λ2Λ uλ : ð24:189Þ
With the above notation, see Sect. 6.1.1. Notice that the set L is not a vector space
(vide infra). The totally degenerate subspace W0 can be described by
W 0 = Span fuλ g,
3
w= i=0
ξ i ei : ð24:190Þ
Then, we obtain
3 3 2 2 2 2
i=0
ξ i ei j k=0
ξ k ek = ξ 0 - ξ1 - ξ2 - ξ3 = 0: ð24:191Þ
That is,
2 2 2 2
ξ1 þ ξ2 þ ξ3 = ξ0 : ð24:192Þ
If we regard the time coordinate (represented by ξ0) as parameter, the light cone
can be seen as an expanding (or shrinking) hypersphere in 4 with elapsing time
(both past and future directions).
In Fig. 24.1a, we depict a geometric structure including the light cone of the
three-dimensional Minkowski space 3 so that the geometric feature can appeal to
eye. There, we have
2 2 2
ξ1 þ ξ2 = ξ0 , ð24:193Þ
where ξ1 and ξ2 represent the coordinate of the x- and y-axis, respectively; ξ0 indi-
cates the time coordinate t. Figure 24.1b shows how to make a right circular cone
contained in the light cone.
The light cone comprises the whole collection of singular vectors uλ (λ 2 Λ).
Geometrically, it consists of a couple of right circular cones that come face-to-face
with each other at the vertices, with the axis (i.e., a line that connects the vertex and
the center of the bottom circle) shared in common by the two right circular cones in
3 (see Fig. 24.1a). A lightlike vector (i.e., singular vector) is represented by a
straight line obliquely extending from the origin to both the future and past directions
in the orthogonal coordinate system. Since individual lightlike vectors form the light
cone, it is a kind of ruled surface. A circle expanding (or shrinking) with time
ξ0 according to (24.193) is a conic section of the right circular cone shaping the
light cone.
In Fig. 24.1a, the light cone comprises a two-dimensional figure, and so one
might well wonder if the light cone is a two-dimensional vector space. Think of,
however, e.g., je0] + j e1] and je0] - j e1] within the light cone. If those vectors were
basis vectors, (| e0]+| e1]) + (| e0]-| e1]) = 2 j e0] would be a vector of the light cone as
well. That is, however, evidently not the case, because the vector je0] is located
1186 24 Basic Formalism
(a) t (b)
| ]
| ] y
| ] O
Fig. 24.1 Geometric structure including the light cone of the three-dimensional Minkowski space
3 . (a) The light cone consists of a couple of right circular cones that come face-to-face with each
other at the vertices, with the axis (i.e., a line that connects the vertex and the center of the bottom
circle) shared in common. A lightlike vector (i.e., singular vector) is represented by an oblique
arrow extending from the origin to both the future and past directions. Individual lightlike vectors
form the light cone as a ruled surface. (b) Simple kit that helps visualize the light cone. To make it,
follow next procedures: (i) Take a thick sheet of paper and cut out two circles to shape them into
p
sectors with a central angle θ = 2π radian ≈ 254:6 ° . (ii) Bring together the two sides of the sector
so that it can be a right circular cone with its vertical angle of a right angle (i.e., 90°). (iii) Bring
together the two right circular cones so that they can be jointed at their vertices with the axis shared
in common to shape the light cone depicted in (a) of the above
outside the light cone. Whereas a vector “inside” the light cone is a timelike, that
“outside” the light cone is spacelike.
We have, e.g.,
where Span {e0}⊥ = Span{e1, e2}. Since ½f0 je0 = 1, e0 is a timelike vector. As
½f1 þ f2 je1 þ e2 = - 2, e1 + e2 is a spacelike vector, etc. Meanwhile, a dimension n0
of the totally degenerate subspace W~0 of 3 is n0 = min (n+, n-) = min (1, 2) = 1.
Then, for instance, as in (24.186), we have
⊥ ⊥
W~0 = Span fe0 þ e1 , e2 g or W~0 = W~0 Span fe2 g: ð24:196Þ
⊥
The orthogonal complement W~0 forms a tangent space to the light cone L. We
⊥
have W~0 \ W~0 = W~0 = j e0 þ j e1 ≠ 0.
As described above, we have overviewed the characteristics of the extended
concept of vector spaces and studied several properties which the inner product
space and Euclidean space lack. Examples included the singular vectors and degen-
erate subspaces as well as their associated properties. We have already seen that the
indefinite metric caused intractability in the field quantization. The presence of the
singular vectors is partly responsible for it.
The theory of vector space is a fertile ground for mathematics and physics.
Indeed, the theory deals with a variety of vector spaces such as an inner product
space (in Chap. 13), Euclidean space, Minkowski space, etc. Besides them a spinor
space is also contained, even though we do not go into detail about it. Interested
readers are referred to the literature with this topic [5]. Yet, the spinor space naturally
makes an appearance in the Dirac equation as a representation space. We will deal
with the Dirac equation from a different angle in the following sections.
In the previous section, we have learned that the transformation R that makes the
bilinear form [| ] invariant must satisfy the condition described by
RT M R = M : ð24:104Þ
At the same time, (24.104) has given a condition for the metric tensor M to be
invariant with respect to the vector (or coordinate) transformation R in a vector space
where [| ] is defined. In the Minkowski space 4 , R of (24.104) is called the
Lorentz transformation and written as Λ according to the custom. Also, M is denoted
by η. Then, we have
η = ΛT ηΛ: ð24:197Þ
ðΛ1 Λ2 ÞT ηΛ1 Λ2 = ΛT2 ΛT1 ηΛ1 Λ2 = ΛT2 ΛT1 ηΛ1 Λ2 = ΛT2 ηΛ2 = η: ð24:199Þ
That is, Λ1Λ2 2 O(3, 1). With the identity operator E, we have
-1
ηðΛ1 Þ - 1 = Λ1- 1 ηðΛ1 Þ - 1 ,
T
η = ΛT1 ð24:201Þ
-1 T
where we used ΛT1 = Λ1- 1 . From (24.199) to (24.201), we find that O(3, 1)
certainly satisfies the definitions of group; see Sect. 16.1. Taking determinant of
(24.197), we have
where with the first equality we used detΛT = det Λ [1, 2]. That is, we obtain
Lorentz groups are classified according to the sign of determinants and the (0, 0)
component (i.e., Λ00 ) of the Lorentz transformation matrix. Using the notation of
tensor elements, (24.197) can be written as
2 3 2
η00 Λ0 0 þ η
i = 1 ii
Λi 0 = η00 ,
namely we have
2 3 2
Λ0 0 =1 þ i=1
Λi 0 : ð24:205Þ
Hence,
24.2 Lorentz Group and Lorentz Transformations 1189
2
Λ0 0 ≥ 1: ð24:206Þ
Λ0 0 ≥ 1 or Λ0 0 ≤ - 1: ð24:207Þ
The Lie algebra of the Lorentz group is represented by a (4, 4) real matrix. We have
an important theorem about the Lie algebra of the Lorentz group.
Theorem 24.6 [8] Let the Lie algebra of the Lorentz group O(3, 1) be denoted as
oð3, 1Þ. Then, we have
Proof Suppose that X 2 oð3, 1Þ. Then, from Definition 20.3, exp(sX) 2 O(3, 1) with
8
s (s: real number). Moreover, from (24.198), we have
X T η þ ηX = 0: ð24:211Þ
1 1 ν 1 sν T ν
ðexp sX ÞT η = exp sX T η = sX T η = X η: ð24:212Þ
ν=0 ν! ν=0 ν!
In (24.212), we have
1 sν 1 sν
ðexp sX ÞT η = ν = 0 ν!
ηð- 1Þν X ν = η ν = 0 ν!
ð- X Þν = η expð- sX Þ
= ηðexp sX Þ - 1 , ð24:214Þ
where with the last equality we used (15.28). Multiplying both sides of (24.214) by
expsX from the right, we obtain
This implies that expsX 2 O(3, 1). This means X 2 oð3, 1Þ. These complete the
proof.
Let us seek a tangible form of a Lie algebra of the Lorentz group. Suppose that
X 2 oð3, 1Þ of (24.208). Assuming that
24.2 Lorentz Group and Lorentz Transformations 1191
a b c d
e f g h
X= p q r s ða w : realÞ ð24:216Þ
t u v w
2a b - e c-p d -t
b-e - 2f - ð g þ qÞ - ð h þ uÞ
c-p - ð g þ qÞ - 2r - ð s þ vÞ = 0: ð24:217Þ
d-t - ð h þ uÞ - ð s þ vÞ - 2w
Then, we obtain
a = f = r = w = 0, b = e, c = p, d = t, g = - q, h = - u, s = - v: ð24:218Þ
0 b c d
X= b
c
0
-g
g
0
h
s
: ð24:219Þ
d -h -s 0
Since we have ten conditions for sixteen elements, we must have six independent
representations. We show a tangible example of these representations as follows:
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 -1 0
0 0 0 -1 , 0 0 0 0 , 0 1 0 0 ;
0 0 1 0 0 -1 0 0 0 0 0 0
0 1 0 0 0 0 1 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 , 1 0 0 0 , 0 0 0 0 : ð24:220Þ
0 0 0 0 0 0 0 0 1 0 0 0
The Lie algebra of the Lorentz group can be expressed as linear combinations of
the above matrices (i.e., basis vectors of this Lie algebra). This Lie algebra is a
six-dimensional vector space V6 accordingly. Note that in (24.220) the upper three
matrices are anti-Hermitian and that the lower three matrices are Hermitian. We
symbolically represent the upper three matrices and the lower three matrices as
respectively, in the order from the left of (24.220). Regarding the upper three
matrices, the (3, 3) principal submatrices excluding the (0, 0) element are identical
with Ax, Ay, and Az in the shown order; see (20.26). In other words, these three
matrices represent the spatial rotation in ℝ3. Moreover, these matrices have been
decomposed into a direct sum such that
where {0} denotes the (0, 0) element of A. The remaining three matrices B are
called Lorentz boosts.
Expressing a magnitude related to A and B as a (a1, a2, a3) and b (b1, b2,
b3), respectively, we succinctly describe any Lorentz transformation Λ as
1 0 0 0
0 cos θ - sin θ 0
expðθA3 Þ = , ð24:224Þ
0 sin θ cos θ 0
0 0 0 1
coshω 0 0 - sinhω
expð - ωB3 Þ = 0 1 0 0 , ð24:225Þ
0 0 1 0
- sinhω 0 0 coshω
where we put b3 = - ω.
In (24.224), we have 0 ≤ θ ≤ 2π, that is, the parameter space θ is bounded. In
(24.225), on the other hand, we have - 1 < ω < 1 where the parameter space ω is
unbounded. The former group is called a compact group, whereas the latter is said to
be a non-compact group accordingly.
In general, with a compact group, it is possible to construct a unitary represen-
tation [8]. For a non-compact group, however, it is not necessarily possible to
construct a unitary representation. In fact, although expðθA3 Þ of (24.224) is unitary
[see (7)’ of Sect. 15.2], expð- ωB3 Þ in (24.225) in not unitary, but Hermitian. These
features of the Lorentz group lead to various intriguing properties (vide infra).
24.2 Lorentz Group and Lorentz Transformations 1193
In the last sections, we have studied how the Lorentz transformation is dealt with
using the bilinear form and Lie algebra in the Minkowski space. In the subsequent
sections, we examine how the Dirac equation is transformed through the medium of
the Lorentz transformation within the framework of standard matrix algebra. To
examine the constitution of the Dirac equation appropriately, we need to deal with
the successive Lorentz transformations. These transformations can be well under-
stood by the idea of the moving coordinate systems (Sect. 17.4.2).
Let us express a four-vector X (i.e., space-time vector) in the Minkowski space as
x01
X = ðje0 je1 je2 je3 Þ x2
x3 , ð24:226Þ
x
where {| e0], | e1], | e2], | e3]} are the basis set of Minkowski space defined in Sect.
x01
24.1.7; x2
x3 are corresponding coordinates. We will use a symbol ei (i = 0, 1, 2, 3)
x
instead of jei], in case confusion is unlikely to happen.
Following Sect. 11.4, we specify the transformation in such a way that the said
transformation keeps the vector X unaltered. In terms of the special theory of
relativity, a vector is associated with a certain space-time point where an event has
taken place. Consequently, the vector X should be the same throughout the inertial
frames of reference that are connected to one another via the Lorentz
transformations.
In what follows, we express a four-vector X using a shorthand notation as
x01
X ex = ðje0 je1 je2 je3 Þ x2
x3 ,
x
where e stands for the basis set of 4 and x denotes the corresponding coordinates
(i.e., a column vector). Let a transformation be expressed as P. We rewrite X as
X = ex = eP - 1 ðPxÞ = e0 x0 , ð24:227Þ
where e and e′ are the set of basis vectors connected by P with e′ = eP-1 and Px = x′.
Once again, we note that in the case where the Lorentz transformation is relevant to
the expression, unlike (11.84) of Sect. 11.4 the coordinate representation is placed
ahead of the basis vector representation (see Sect. 24.1.5). Next, let us think about
the several successive transformations P1, ⋯, Pn. We have
1194 24 Basic Formalism
x0 = Pn ⋯P1 x, ð24:229Þ
We proceed with the calculations from the assumption that an electron is at rest in
the frame O, where the Dirac equation can readily be solved. We perform the
successive Lorentz transformations that start with O and finally reach O′ via Oθ
and Oϕ such that
e = e0 Λϕ Λθ Λb- 1 : ð24:233Þ
In accordance with Fig. 24.2a, Fig. 24.2b shows how the motion of electron looks
from the frame O′, where the electron is moving at a velocity v in the direction
defined by a zenithal angle θ and azimuthal angle ϕ. Correspondingly, from (24.229)
we have
x0 = Λϕ Λθ Λb- 1 x: ð24:234Þ
x00 x01
x01
x02 = Λϕ Λθ Λb- 1 x2
x3 : ð24:235Þ
x03 x
Thus, we find that three successive Lorentz transformations have produced the
total Lorentz transformation Λ described by
Λ = Λϕ Λθ Λb- 1 , ð24:236Þ
where Λb- 1 related to the Lorentz boost is the inverse matrix of (24.225). We
compute the combined Lorentz transformation Λ such that
Λ = Λϕ Λθ Λb- 1 =
1 0 0 0 1 0 0 0 coshω 0 0 sinhω
0 cos ϕ - sin ϕ 0 0 cos θ 0 sin θ 0 1 0 0
0 sin ϕ cos ϕ 0 0 0 1 0 0 0 1 0
0 0 0 1 0 - sin θ 0 cos θ sinhω 0 0 coshω
coshω 0 0 sinhω
cos ϕ sin θsinhω cos ϕ cos θ - sin ϕ cos ϕ sin θcoshω
= : ð24:237Þ
sin ϕ sin θsinhω sin ϕ cos θ cos ϕ sin ϕ sin θcoshω
cos θsinhω - sin θ 0 cos θcoshω
in combination with the inverse operators to those shown in Fig. 24.2a. Then, use
ð1Þ
(17.85) and (17.91) with k = n = 3. Deleting R3 , we get
ð1Þ ð2Þ
Substituting Λϕ, Λθ, and Λb- 1 for R1, R2 , and R3 , respectively, and defining
-1
R3 Λ~b , we obtain
1196 24 Basic Formalism
(a) (b)
Λ Λ Λ
Fig. 24.2 Successive Lorentz transformations and their corresponding coordinate systems. (a)
Lorentz boost (Λb) followed by two spatial rotations (Λθ- 1 and Λϕ- 1 ). The diagram shows the
successive three transformations of the basis vectors. (b) Geometry of an electron motion. The
electron is moving at a velocity v in the direction defined by a zenithal angle θ and azimuthal angle
ϕ in reference to the frame O′
-1 -1
Λ~b = Λϕ Λθ Λb- 1 Λϕ Λθ : ð24:240Þ
Λ u Λϕ Λθ , ð24:241Þ
we get
Λ = Λu Λb- 1 : ð24:242Þ
Also, we have
-1
Λ~b = Λu Λb- 1 Λ{u : ð24:243Þ
-1
The operator Λ~b represents the Lorentz boost viewed in reference to the
frame O′, whereas the operator Λb- 1 is given in reference to Oθ with respect to the
same boost. Both the transformations are connected to each other through the unitary
similarity transformation and belong to the same conjugacy class, and so they have a
-1
same physical meaning. The matrix representation of Λ~b is given by
24.3 Covariant Properties of the Dirac Equation 1197
-1
Λ~b =
coshω cos ϕsinθsinhω sin ϕsinθsinhω cos θsinhω
cos ϕsin ϕsin θ 2 cos ϕcos θ sinθ
cos ϕsinθsinhω cos 2 ϕcos 2 θ þ sin 2 ϕ
× ðcoshω- 1Þ × ðcoshω -1Þ
þcos 2 ϕ sin 2 θcoshω
sin 2 ϕcos 2 θ þ cos 2 ϕ sinϕcos θ sinθ :
sin ϕsinθsinhω cos ϕ sinϕsin 2 θ
þ sin 2 ϕsin 2 θcoshω × ðcoshω -1Þ
× ðcoshω- 1Þ
sin ϕcos θ sinθ
cos θsinhω cos ϕcos θ sinθ sin 2 θ þ cos 2 θcoshω
× ðcoshω- 1Þ
× ðcoshω- 1Þ
ð24:244Þ
χ = 2 þ cosh ω: ð24:245Þ
of Λb- 1 , reflecting the fact that the unitary similarity transformation keeps the trace
unchanged.
Equation (24.242) gives an interesting example of the decomposition of Lorentz
transformation into the unitary part and Hermitian boost part (i.e., polar decompo-
sition). The Lorentz boost is represented by a positive definite Hermitian matrix (or a
real symmetric matrix). We will come back to this issue later (Sect. 24.9.1).
A ðX Þ = 0: ð24:246Þ
c1 ðxÞ
X = ðξ 1 ⋯ ξ n Þ ⋮ , ð24:247Þ
cn ðxÞ
where {ξ1, ⋯, ξn} are the basis functions (or basis vectors) of a representation space
c1 ðxÞ
and ⋮ indicate the corresponding coordinates. In (24.247), we assume that
cn ðxÞ
X is an element of the representation space of dimension n that is related to the
physical equation. Correspondingly, we wright LHS of (24.246) as
c1 ð xÞ
A ðX Þ = ðξ1 ⋯ ξn ÞA ⋮ : ð24:248Þ
cn ð x Þ
c1 ð xÞ
A ðX Þ = ðξ1 ⋯ ξn ÞR - 1 RAR - 1 R ⋮
cn ð x Þ
ðξ ′ 1 ⋯ ξ ′ n Þ = ðξ1 ⋯ ξn ÞR - 1 ð24:250Þ
shows the set of basis functions obtained after the transformation. The relation
represents the corresponding coordinates after being transformed. Also, the operator
A 0 given by
A 0 = RAR - 1 ð24:252Þ
denotes the representation matrix of the operator viewed in reference to ξ01 ⋯ ξ0n .
In general, the collection of the transformations forms a group (i.e., transforma-
tion group). In that case, ξ01 , ⋯, ξ0n spans the representation space. To stress that
24.3 Covariant Properties of the Dirac Equation 1199
R is the representation matrix of the group, we identify it with D(R). That is, from
(18.6) we have
c01 ðxÞ
A ðX Þ = ðξ ′ 1 ⋯ ξ ′ n ÞDðRÞA ½DðRÞ - 1 ⋮ : ð24:254Þ
c0n ðxÞ
ðξ1 ⋯ ξn ÞA = 0 or ðξ ′ 1 ⋯ ξ ′ n ÞA 0 = 0: ð24:255Þ
Alternatively, we write
c1 ð xÞ c01 ðxÞ
A ⋮ =0 or A0 ⋮ = 0, ð24:256Þ
cn ð x Þ c0n ðxÞ
where
Now, we wish to see how the tangible physical equations which we are interested in
undergo the transformation. We start with the Klein-Gordon equation and then
examine the transformation properties of the Dirac equation in detail.
(i) Klein-Gordon equation:
The equation is given by
μ
∂μ ∂ þ m2 ϕðxÞ = 0: ð24:260Þ
μ
A ∂μ ∂ þ m 2 , ð24:261Þ
μ
A 0 = SAS - 1 = S ∂μ ∂ þ m2 S - 1 : ð24:262Þ
μ
S ∂μ ∂ þ m2 S - 1 SϕðxÞ = 0: ð24:263Þ
μ 0 μ 0ρ 0 0ρ 0 0ρ
∂μ ∂ ¼ Λν μ ∂ν Λ - 1 ρ
∂ ¼ δνρ ∂ν ∂ ¼ ∂ρ ∂ : ð24:264Þ
Assuming that S is commutative with both ∂μ and ∂μ, (24.263) can be rewritten as
0 0μ
∂μ ∂ þ m2 SϕðxÞ = 0: ð24:265Þ
0 0μ
∂μ ∂ þ m2 ϕ0 ðx0 Þ = 0: ð24:266Þ
Since ϕ(x) is a scalar function, from the discussion of Sect 19.1 we must have
S 1: ð24:268Þ
iγ μ ∂μ - m ψ ðxÞ = 0: ð22:213Þ
Since we assume the four-dimensional representation space with the Dirac spinor
ψ(x), we describe ψ(x) as
c1 ð xÞ
ψ ð xÞ c2 ð xÞ ð24:269Þ
c34 ðxÞ
c ð xÞ
accordingly. Notice that in (24.269) the superscripts take the integers 1, ⋯, 4 instead
of 0, ⋯, 3. It is because the representation space for the Dirac equation is not the
Minkowski space. Since ψ(x) in (24.269) represents a spinor, the superscripts are not
associated with the space-time. Defining
A iγ μ ∂μ - m, ð24:270Þ
(22.213) is read as
Aψ ðxÞ = 0: ð24:271Þ
ðχ 1 ⋯ χ n ÞAψ ðxÞ = 0:
S iγ μ ∂μ - m S - 1 Sψ ðxÞ = 0: ð24:272Þ
0
S iγ μ Λν μ ∂ν - m S - 1 Sψ ðxÞ = 0: ð24:273Þ
We wish to delete the factor Λν μ from (24.273) to explicitly describe 22.213 in the
Lorentz covariant form. For this, it suffices for the following relation to hold:
0 0
Siγ μ Λν μ ∂ν S - 1 = iγ ν ∂ν : ð24:274Þ
0
In (24.274) we have assumed that S is commutative with ∂ν . Rewriting (24.274),
we get
0
∂ν Λν μ Sγ μ S - 1 - γ ν = 0: ð24:275Þ
That is, to maintain the Lorentz covariant form, again, it suffices for the following
relation to hold [10, 11]:
Λν μ Sγ μ S - 1 = γ ν : ð24:276Þ
where ψ ′(x′) denotes the change in the functional form accompanied by the coordi-
nate transformation. Notice that only in the case where ψ(x) is a scalar function, we
have ψ ′(x′) = ψ(x); see (19.5) and (24.267). Using (24.276), we rewrite (24.273) as
0
iγ μ ∂μ - m ψ 0 ðx0 Þ = 0: ð24:278Þ
Thus, on the condition of (24.276), we have gotten the covariant form of the Dirac
equation (24.278). Defining
A 0 iγ μ ∂ ′ μ - m, ð24:279Þ
A 0 ψ 0 ðx0 Þ = 0, ð24:280Þ
with
A 0 = SAS - 1 : ð24:281Þ
m) that appeared in Sect. 21.3.1 Dirac operators. When we are dealing with the plane
wave solution with the Dirac equation, ∂μ is to be replaced with -ipμ or +ipμ (i.e.,
Fourier components) to give the Dirac operator G or G, ~ respectively.
We also emphasize that (24.277) is contrasted with (24.267), which characterizes
the transformation of the scalar function. Comparing the transformation properties of
the Klein-Gordon equation and the Dirac equation, we realize that the representa-
tion matrix S is closely related to the nature of the representation space. In the
representation space, state functions (i.e., scalar, vector, spinor, etc.) and basis
functions (e.g., basis vectors in a vector space) as well as the operators undergo
the simultaneous transformation by the representation matrix S; see (24.257) and
(24.258). In the next section, we wish to determine a general form of S of the Dirac
equation. To this end, we carry the discussion in parallel with that of Sect. 24.2.
Before going into detail, once again we wish to view solving the Dirac equation
as an eigenvalue problem. From (21.96), (21.107), and (21.109), we immediately get
~ = - 2mE,
GþG ð24:282Þ
~ = - 2mS - 1 ES = - 2mE:
S - 1 GS þ S - 1 GS
~ = - 2mE - D,
S - 1 GS
where D is a diagonal matrix given by D = S - 1 GS. This implies that G ~ can also be
diagonalized using the same matrix S.
Bearing in mind this point, in the next section we are going to seek and construct a
matrix S(Λ) that diagonalizes both G and G. ~
S Λϕ = expðϕρm Þ, ð24:284Þ
i k l
ρm ð- iÞJ kl ; J kl γ , γ ðk, l, m = 1, 2, 3Þ: ð24:285Þ
4
1 σm
J kl = 0
σm
, ð24:286Þ
2 0
where σ m is a Pauli spin matrix and 0 denotes (2, 2) zero matrix. Note that Jkl is given
by a direct sum of σ m. That is,
1
J kl = σ σm : ð24:287Þ
2 m
eiϕ=2 0 0 0
- iϕ=2 0 0
S Λϕ = expðϕρ3 Þ = 0 e : ð24:288Þ
0 0 eiϕ=2 0
0 0 0 e - iϕ=2
With another case where the rotation by θ is made around the y-axis, we have
θ θ
cos sin 0 0
2 2
θ θ
- sin cos 0 0
SðΛθ Þ ¼ expðθρ2 Þ ¼ 2 2 : ð24:289Þ
θ θ
0 0 cos sin
2 2
θ θ
0 0 - sin cos
2 2
i
βk ð- iÞ J 0k ; J 0k γ 0 γ k ðk = 1, 2, 3Þ: ð24:291Þ
2
If we make the Lorentz boost in the direction of the z-axis, S(Λb) is expressed as
θ θ
cos sin 0 0
2 2
eiϕ=2 0 0 0
θ θ
0 e - iϕ=2 0 0 - sin cos 0 0
2 2
0 0 eiϕ=2 0 θ θ
0 0 cos sin
0 0 0 e - iϕ=2 2 2
θ θ
0 0 - sin cos
2 2
ω ω
cosh 0 - sinh 0
2 2
ω ω
0 cosh 0 sinh
2 2
ω ω
- sinh 0 cosh 0
2 2
ω ω
0 sinh 0 cosh
2 2
θ ω θ ω θ ω θ ω
eiϕ=2 cos cosh eiϕ=2 sin cosh -eiϕ=2 cos sinh eiϕ=2 sin sinh
2 2 2 2 2 2 2 2
θ ω θ ω θ ω θ ω
- e -iϕ=2 sin cosh e -iϕ=2 cos cosh e - iϕ=2 sin sinh e - iϕ=2 cos sinh
= 2 2 2 2 2 2 2 2 :
θ ω θ ω θ ω θ ω
- eiϕ=2 cos sinh eiϕ=2 sin sinh eiϕ=2 cos cosh eiϕ=2 sin cosh
2 2 2 2 2 2 2 2
-iϕ=2 θ ω -iϕ=2 θ ω - iϕ=2 θ ω - iϕ=2 θ ω
e sin sinh e cos sinh -e sin cosh e cos cosh
2 2 2 2 2 2 2 2
ð24:293Þ
Also, we have
½SðΛÞ - 1 =
24.5 Transformation Properties of the Dirac Spinors [9] 1207
θ ω θ ω θ ω θ ω
e - iϕ=2 cos cosh - eiϕ=2 sin cosh e - iϕ=2 cos sinh - eiϕ=2 sin sinh
2 2 2 2 2 2 2 2
θ ω θ ω θ ω θ ω
e - iϕ=2 sin cosh eiϕ=2 cos cosh - e - iϕ=2 sin sinh - eiϕ=2 cos sinh
2 2 2 2 2 2 2 2 :
- iϕ=2 θ ω θ ω - iϕ=2 θ ω θ ω
e cos sinh -e iϕ=2
sin sinh e cos cosh -e iϕ=2
sin cosh
2 2 2 2 2 2 2 2
- iϕ=2 θ ω θ ω - iϕ=2 θ ω θ ω
-e sin sinh -e iϕ=2
cos sinh e sin cosh iϕ=2
e cos cosh
2 2 2 2 2 2 2 2
ð24:294Þ
Now, we are in the position to seek a tangible form of the Dirac spinors. Regarding
the plane waves, ψ(x) written in a general form in (24.269) is described by
c0 ð xÞ c0
c1 ð x Þ c1
ψ ð xÞ = = e ± ipx = e ± ipx wðp; hÞ, ð24:295Þ
c2 ð x Þ c2
c3 ð x Þ c3
where w(p, h) represents either u(p, h) or v(p, h) given in Sect. 21.3. Note that w(p, h)
does not depend on x. Since px (pμxμ) in e±ipx is a scalar quantity, it is independent
of the choice of an inertial frame of reference, namely, px is invariant with the
Lorentz transformation. Therefore, it behaves as a constant in terms of the operation
of S(Λ).
Let the x-system be set within the inertial frame of reference O where the electron
is at rest (see Fig. 24.2). Then, we have
In (24.297) we assume that x′-system is set within the inertial frame of reference
′
O where the electron is moving at a velocity of v (see Fig. 24.2). Note that in
p00
(24.297) p′x′ = px = p0x0 = mx0 and p = p0
p
; p 0 = p0 , p0 = p0
.
1208 24 Basic Formalism
In the remaining parts of this section, we use the variables x and p for the frame
O and the variables x′ and p′ for the frame O′. Note that the frames O and O′ are
identical to those appearing in Sect. 24.2.2. First, we wish to seek the solution w(0, h)
of the Dirac equation with an electron at rest. In this case, momentum p of the
electron is zero, i.e., p1 = p2 = p3 = 0. Then, if we find the solution w(0, h) in the
frame O, we are going to get a solution w(p′, h) in the frame O′ such that
Equation (24.298) implies that the helicity is held unchanged by the transforma-
tion S(Λ). Now, the Dirac operator of the rest frame is given by
pμ γ μ - m = p0 γ 0 - m = mγ 0 - m = m γ 0 - 1 A ð24:299Þ
or
- pμ γ μ - m = - p0 γ 0 - m = - mγ 0 - m = m - γ 0 - 1 B: ð24:300Þ
Namely, we have
0 0 0 0
0 0 0 0
A ð24:301Þ
0 0 - 2m 0
0 0 0 - 2m
and
- 2m 0 0 0
0 - 2m 0 0
B : ð24:302Þ
0 0 0 0
0 0 0 0
0 0 0 0 c0
0 0 0 0 c1
= 0: ð24:303Þ
0 0 - 2m 0 c2
0 0 0 - 2m c3
1 0
0 1
u1 ð0; - 1Þ = , u2 ð0; þ1Þ = : ð24:304Þ
0 0
0 0
1 0
- imx0 0 - imx0 1
ψ 1 ð xÞ = e , ψ 2 ðxÞ = e x0 t : ð24:305Þ
0 0
0 0
- 2m 0 0 0 c0
0 - 2m 0 0 c1
= 0: ð24:306Þ
0 0 0 0 c2
0 0 0 0 c3
0 0
0 0
v1 ð0; þ1Þ = , v2 ð0; - 1Þ = ð24:307Þ
1 0
0 1
0 0
0 0 0 0
ψ 3 ðxÞ = eimx , ψ 4 ðxÞ = eimx : ð24:308Þ
1 0
0 1
The above solutions (24.304) can be automatically obtained from the results of
Sect. 21.3 (including the normalized constant as well). That is, using (21.104) and
putting θ = ϕ = 0 as well as S = 0 in (21.104), we have the same results as (24.304).
Note, however, that in (24.307) we did not take account of the charge conjugation
(vide infra). But, using (21.120) and putting θ = ϕ = 0 and S = 0 once again, we
obtain
0 0
0 0
v1 ð0; þ1Þ = , v2 ð0; - 1Þ = : ð24:309Þ
1 0
0 -1
Notice that the latter solution of (24.309) has correctly reflected the charge
conjugation. Therefore, with the negative-energy plane wave solutions, we must use
1210 24 Basic Formalism
0 0
imx0 0 imx0 0
ψ 3 ð xÞ = e , ψ 4 ð xÞ = e : ð24:310Þ
1 0
0 -1
Even though (24.303) and (24.306) and their corresponding solutions (24.304)
and (24.309) look trivial, they provide solid ground for the Dirac equation and its
solution of an electron at rest. In our case, these equations are dealt with in the frame
O and described by x.
In the above discussion, (24.305) and (24.310) give ψ(x) in (24.296). Thus,
substituting (24.304) and (24.309) for w(0, h) of (24.297) and using D(Λ) = S(Λ)
of (24.293), we get at once the solutions ψ ′(x′) described by x′ with respect to the
frame O′. The said solutions are given by
θ ω θ ω
eiϕ=2 cos cosh eiϕ=2 sin cosh
2 2 2 2
θ ω θ ω
- e - iϕ=2 sin cosh e - iϕ=2 cos cosh
0 0 2 2 0 0 2 2
ψ 01 ðx0 Þ = e - ip x , ψ 02 ðx0 Þ = e - ip x ,
θ ω θ ω
- eiϕ=2 cos sinh eiϕ=2 sin sinh
2 2 2 2
θ ω θ ω
e - iϕ=2 sin sinh e - iϕ=2 cos sinh
2 2 2 2
θ ω θ ω
- eiϕ=2 cos sinh -e
iϕ=2
sin sinh
2 2 2 2
θ ω θ ω
e - iϕ=2 sin sinh - e - iϕ=2 cos sinh
0 0 2 2 0 0 2 2
ψ 03 ðx0 Þ = eip x , ψ 04 ðx0 Þ = eip x :
θ ω θ ω
eiϕ=2 cos cosh -e iϕ=2
sin cosh
2 2 2 2
θ ω θ ω
- e - iϕ=2 sin cosh - e - iϕ=2 cos cosh
2 2 2 2
ð24:311Þ
θ
eiϕ=2 cos
2
θ
0 0 ω - e - iϕ=2 sin
ψ 01 ðx0 Þ = e - ip x cosh θ 2 ω : ð24:312Þ
2 - eiϕ=2 cos tanh
2 2
θ ω
e - iϕ=2 sin tanh
2 2
θ
eiϕ=2 cos
2
θ
0 0 p00 þ m - e - iϕ=2 sin
ψ 01 ðx0 Þ = e - ip x 2 : ð24:316Þ
2m θ
-S ′e iϕ=2
cos
2
0 - iϕ=2 θ
Se sin
2
The column vector of (24.316) is exactly the same as the first equation of (21.104)
including the phase factor (but, aside from the normalization factor 1=2m). Other
three equations of (24.311) yield the column vectors same as given in (21.104) and
(21.120) as well.
Thus, following the procedure of (24.277) we have proven that S(Λ) given in
(24.293) produces the proper solutions of the Dirac equation. From (24.304) and
(24.309), we find that individual column vectors of S(Λ) yield four solutions of the
~ (aside from the phase factor). This
Dirac equation, i.e., the eigenvectors of G and G
1212 24 Basic Formalism
suggests that S(Λ) is a diagonalizing matrix of G and G. ~ For the same reason, the
column vectors of S(Λ) are the eigenvectors of H for (21.86) as well. As already
pointed out in Sect. 21.3.2, however, G is not Hermitian. Consequently, on the basis
of Theorem 14.5, the said eigenvectors shared by G and H do not constitute the
orthonormal system. Hence, we conclude S(Λ) is not unitary. This is consistent with
the results of Sect. 24.4; see an explicit matrix form of S(Λ) indicated by (24.293).
That is, the third factor matrix representing the Lorentz boost in LHS of (24.293) is
not unitary.
Now, we represent the positive energy solutions and negative-energy solutions as
Ψ and Ψ ~ altogether, respectively. Operating both sides of (24.282) on Ψ and Ψ
~ from
the left, we get
~ = 0 þ GΨ
GΨ þ GΨ ~ = GΨ
~ = - 2mΨ,
~ þG
GΨ ~Ψ~ = GΨ
~ þ 0 = GΨ
~ = - 2mΨ:
~
From the above equations, we realize that Ψ is an eigen function that belongs to
the eigenvalue 0 of G and, at the same time, an eigen function that belongs to the
eigenvalue -2m of G. ~ In turn, Ψ~ is an eigen function that belongs to the eigenvalue
~
0 of G and, at the same time, an eigen function that belongs to the eigenvalue -2m of
G. Both G and G ~ possess the eigenvalues 0 and -2m with both doubly degenerate.
Thus, the questions raised in Sect. 21.3.2 have been adequately answered.
The point rests upon finding out a proper matrix form of S(Λ). We will further
pursue the structure of S(Λ) in Sect. 24.9 in relation to the polar decomposition of a
matrix.
In the previous section, we have derived the functional form of the Dirac spinor from
that of the Dirac spinor determined in the rest frame of the electron. In this section we
study how the Dirac operator is transformed according to the Lorentz transformation.
In the first half of this section, we examine it in reference to the rest frame in
accordance with the approach taken in the last section. More generally, however,
we need to investigate the transformation property of the Dirac operator in reference
to a moving frame of an electron. We deal with this issue in the latter half of the
section.
we have
θ θ
cos sin 0 0
2 2
eiϕ=2 0 0 0
θ θ
0 e - iϕ=2 0 0 - sin cos 0 0
2 2
SR ¼
0 0 eiϕ=2 0 θ θ
0 0 cos sin
2 2
0 0 0 e - iϕ=2
θ θ
0 0 - sin cos
2 2
θ θ
eiϕ=2 cos eiϕ=2 sin 0 0
2 2
θ θ
- e - iϕ=2 sin e - iϕ=2 cos 0 0
¼ 2 2 : ð24:318Þ
θ θ
0 0 eiϕ=2 cos eiϕ=2 sin
2 2
- iϕ=2 θ - iϕ=2 θ
0 0 -e sin e cos
2 2
θ θ
eiϕ=2 cos eiϕ=2 sin
S~R 2
θ
2
θ , ð24:319Þ
- e - iϕ=2 sin e - iϕ=2 cos
2 2
we have
SR 0
SR = : ð24:320Þ
0 SR
Then, we obtain
-1
0 0 SR
SR ASR- 1 = ð - 2mÞ SR 0 0
-1
0 SR 0 E2 0 SR
= ð- 2mÞ 0
0
0
S~R E 2 S~R
-1 = ð- 2mÞ 0
0
0
E2
= A, ð24:321Þ
1214 24 Basic Formalism
where A is that appeared in (24.301). Consequently, the functional form of the Dirac
equation for the electron at rest is invariant with respect to the spatial rotation.
Corresponding to (24.311), the Dirac spinor is given by e.g.,
θ
eiϕ=2 cos
1 2
00
ψ 01 ðx0 Þ = SR 0
0
= e - imx - e - iϕ=2 sin
θ
, ð24:322Þ
0 2
0
0
θ
eiϕ=2 sin
0 2
00 θ
ψ 02 ðx0 Þ = SR 1
0
= e - imx e - iϕ=2 cos : ð24:323Þ
2
0 0
0
Similarly, we have
0
0 0 θ
00
ψ 03 ðx0 Þ = SR 0
= eimx eiϕ=2 cos
2
, ð24:324Þ
1
0 - iϕ=2
θ
-e sin
2
0
0 0
00 θ
ψ 04 ðx0 Þ = SR 0
0
= eimx - eiϕ=2 sin
2
: ð24:325Þ
-1 - iϕ=2
θ
-e cos
2
θ θ θ
cos2 - eiϕ cos sin 0 0
2 2 2
θ θ θ
jψ 01 ψ 01 j = = jψ 01 ψ 01 j :
2
- e - iϕ cos sin
2 2
sin 2
2
0 0 ð24:326Þ
0 0 0 0
0 0 0 0
4
k=1
jψ 0k ψ 0k j = E 4 , ð24:327Þ
where E4 is a (4, 4) identity matrix. Other requirements with the projection operators
mentioned in Sect. 14.1 are met accordingly (Definition 14.1). Equation (24.327)
represents the completeness relation (see Sect. 18.7).
24.6 Transformation Properties of the Dirac Operators [9] 1215
ω ω
cosh 0 - sinh 0
2 2
ω ω
0 cosh 0 sinh
Sω ¼ 2 2 , ð24:328Þ
ω ω
- sinh 0 cosh 0
2 2
ω ω
0 sinh 0 cosh
2 2
we get
Sω ASω- 1
ω ω
cosh 0 - sinh 0
2 2
ω ω
0 cosh 0 sinh 0 0
¼ ð- 2mÞ 2 2
ω ω
- sinh 0 cosh 0 0 E2
2 2
ω ω
0 sinh 0 cosh
2 2
ω ω
cosh 0 sinh 0
2 2
ω ω
0 cosh 0 - sinh
2 2
×
ω 0 ω 0
sinh cosh
2 2
ω ω
0 - sinh 0 cosh
2 2
1216 24 Basic Formalism
ω ω ω
- sin h2 0 - cosh sinh 0
2 2 2
ω ω ω
0 - sin h2 0 cosh sinh
2 2 2
¼ ð - 2mÞ :
ω ω ω
cosh sinh 0 cos h2 0
2 2 2
ω ω ω
0 - cosh sinh 0 cos h2
2 2 2
ð24:329Þ
p′ 0 - m 0 p′ 3 0
0 p′ - m
0
0 - p′ 3
Sω ASω- 1 = p′ 3 > 0 :
- p′ 3 0 - p′ 0 - m 0
0 p′ 3 0 - p′ 0 - m
ð24:330Þ
This is identical with RHS of (21.80). The Dirac spinor is given by e.g.,
ω
1 cosh
2
0 ′
x′ 0
ψ 1′ x ′ = Sω = e - ip ω
0 - sinh
0 2
0
1
0
′ ′ p′0 þ m j p′3 j
= e - ip x
- ′0 , ð24:331Þ
2m p þm
0
24.6 Transformation Properties of the Dirac Operators [9] 1217
0
0 ω
1 cosh
′ ′ 2
ψ 2′ x ′ = Sω = e - ip x
0 0
0 ω
sinh
2
0
1
′
x′ p′0 þ m
= e - ip 0 : ð24:332Þ
2m ′3
jp j
p′0 þ m
Similarly, we have
ω
0 - sinh
2
0 ′
x′ 0
ψ 3′ x ′ = Sω = eip ω
1 cosh
0 2
0
j p′3 j
-
′
x′ p′0 þ m p′0 þ m
= eip 0 , ð24:333Þ
2m
1
0
0
ω
0 sinh
0 ′ ′ 2
ψ 4′ x ′ = Sω = eip x
0 0
-1
ω
cosh
2
0
j p′3 j
′ ′ p′0 þ m - ′0
= eip x p þm : ð24:334Þ
2m 0
-1
In the case of the Lorentz boost, we see that from (24.322) to (24.325) the
components of upper two rows and lower two rows are intermixed with the two
pairs of Dirac spinors ψ 01 ðx0 Þ, ψ 02 ðx0 Þ and ψ 03 ðx0 Þ, ψ 04 ðx0 Þ. This is because whereas one
is observing the electron at rest, the other is observing the moving electron.
Using (24.301) and (24.302), we have
B = - 2m - A: ð24:335Þ
Therefore, multiplying both sides of (24.335) by Sω from the left and Sω- 1 from
the right and further using (24.330), we get
1218 24 Basic Formalism
Sω BSω- 1 = - 2m - Sω ASω- 1
-p′0 -m 0 - p′3 0
0 -p′0 -m 0 p′3
= : ð24:336Þ
p′3 0 p′0 -m 0
0 -p′3 0 p′0 -m
Equation (24.336) is the operator that gives the negative-energy solution of the
Dirac equation. From (24.293) we see that S(Λ) is decomposed into S(Λ) = SRSω.
Although SR is unitary, Sω is not a unitary matrix nor a normal matrix either.
Accordingly, S is not a unitary matrix as a whole. Using (24.329) in combination
with (24.318), we obtain
ω ω ω iϕ ω ω
-sinh2 0 -cos θ cosh sinh e sin θ cosh sinh
2 2 2 2 2
ω ω ω ω ω
0 -sinh2 e-iϕ sin θ cosh sinh cos θ cosh sinh
2 2 2 2 2 :
ω ω ω ω ω
cos θ cosh sinh -e sin θ cosh sinh
iϕ
cosh2 0
2 2 2 2 2
ω ω ω ω ω
-e-iϕ sin θ cosh sinh -cos θ cosh sinh 0 cosh2
2 2 2 2 2
ð24:337Þ
Using (24.314) and (21.79), (24.337) can be rewritten in the polar coordinate
representation as
SAS - 1
p′ 0 - m 0 j p′j cos θ - j p′j eiϕ sin θ
- iϕ
0 p′ - m
0
- j p′j e sin θ -j p′j cos θ
= :
- j p′j cos θ j p′j eiϕ sin θ - p′ 0 - m 0
j p′j e - iϕ sin θ j p′j cos θ 0 - p′ 0 - m
ð24:338Þ
Converting the polar coordinate into the Cartesian coordinate, we find that
(24.338) is identical with the (4, 4) matrix of (21.75). Namely, we have successfully
gotten
SAS - 1 = G: ð24:339Þ
-1
We remark that when deriving (24.338), we may choose Λ~b of (24.240) for
-1
the Lorentz boost. Using S Λb~ in place of S(Λ) = S(Λϕ)S(Λθ)[S(Λb)]-1,
instead of SAS-1 of (24.338) we get
24.6 Transformation Properties of the Dirac Operators [9] 1219
-1
S Λ~b A S Λ~b ,
which yields the same result as SAS-1 of (24.338). This is evident in light of
(24.321). Corresponding to (24.243), we have
-1
S Λ~b = SðΛu ÞS Λb- 1 S Λ{u = SðΛu ÞS Λb- 1 S Λu- 1
ω ω ω
cosh 0 - cos θ sinh eiϕ sin θ sinh
2 2 2
ω ω ω
0 cosh e - iϕ sin θ sinh cos θ sinh
2 2 2
¼ :
ω ω ω
- cos θ sinh eiϕ sin θ sinh cosh 0
2 2 2
- iϕ ω ω ω
e sin θ sinh cos θ sinh 0 cosh
2 2 2
ð24:340Þ
-1
In (24.340) we obtain the representation S Λ~b as a consequence of viewing
′
the operator S Λb- 1in reference to the frame O . The trace of (24.340) is 4 cosh ω2 ,
which is the same as [S(Λb)]-1, reflecting the fact that the unitary similarity trans-
formation keeps the trace unchanged; see (24.328).
Now, the significance of (24.339) is evident and consistent with (24.281). Tracing
the calculation (24.339) reversely, we find that S - 1 GS = A. That is, G has been
diagonalized to yield A through the similarity transformation with S. The operator G
possesses the eigenvalues of -2m (doubly degenerate) and zero (doubly degener-
ate). We also find that the rank of G is two. From (24.338) and (24.339), we find that
the trace of G is -4m that is identical with the trace of the matrices A and B of
(24.301) and (24.302).
Let us get back to (24.282) once again. Sandwiching both sides of (24.282)
between S-1 and S from the left and right, respectively, we obtain
~ = - 2mS - 1 ES:
S - 1 GS þ S - 1 GS ð24:341Þ
Table 24.2 Eigenvalues and their corresponding Dirac spinors (eigenspinors) of the Dirac
operators
Dirac operatora Dirac spinorb
Inertial frame Inertial frame
O O′ Eigenvalue O O′
A G 0 ψ 1(x), ψ 2(x) Sψ 1(x), Sψ 2(x)
-2m ψ 3(x), ψ 4(x) Sψ 3(x), Sψ 4(x)
B ~
G 0 ψ 3(x), ψ 4(x) Sψ 3(x), Sψ 4(x)
-2m ψ 1(x), ψ 2(x) Sψ 1(x), Sψ 2(x)
a
A and B are taken from (24.301) and (24.302), respectively. G and G ~ are connected to A and
B through similarity transformation by S, respectively; see (24.339) and (24.342)
b
ψ 1(x) and ψ 2(x) are taken from (24.305); ψ 3(x) and ψ 4(x) are taken from (24.310).
~ = - 2mE - A = B
S - 1 GS or ~
SBS - 1 = G: ð24:342Þ
In (24.343) ϕ(x) is chosen from among ψ 1(x) and ψ 2(x) of (24.305) and represents
the positive-energy state of an electron at rest; χ(x) from among ψ 3(x) and ψ 4(x) of
(24.310) and for the negative-energy state of an electron at rest. Using (24.301),
(24.303) is rewritten as
AϕðxÞ = 0: ð24:344Þ
where G = SAS - 1 and ϕ′(x′) = Sϕ(x). Taking a reverse process of the above three
equations, we get
The implication of (24.348) is that once G has been diagonalized with S, ϕ′(x′) is
at once given by operating S on a simple function ϕ(x) that represents the state of an
electron at rest. This means that the Dirac equation has been solved.
With negative-energy state of an electron, starting with (24.306), we reach a
related result described by
~ 0 ð x0 Þ = 0
Gχ ð24:349Þ
with G~ = SBS - 1 and χ ′(x′) = Sχ(x). Evidently, (24.346) and (24.349) correspond to
(21.73) and (21.74), respectively. At the same time, we find that both G and G ~
possess eigenvalues -2m and 0. It is evident from (24.303) and (24.306).
At the end of this section, we remark that in relation to the charge conjugation
mentioned in Sect. 21.5, using the matrix C of (21.164) we obtain following
relations:
As already seen earlier, the Lorentz transformations include the unitary operators
that are related to the spatial rotation and Hermitian operators (or real symmetric
operators) called Lorentz boost. In the discussion made thus far, we focused on the
transformation that included only a single Lorentz boost. In this section, we deal with
the case where the transformation contains complex Lorentz boosts, especially
non-coaxial Lorentz boosts. When the non-coaxial Lorentz boosts are involved,
we encounter a somewhat peculiar feature.
As described in Sect. 24.2.2, the successive Lorentz transformations consisted of
a set of two spatial rotations and a single Lorentz boost. In this section, we further
make a step toward constructing the Lorentz transformations. Considering those
three successive transformations as a unit, we combine two unit transformations so
that the resulting transformation can contain two Lorentz boosts. We have two cases
for this: (i) The boosts are coaxial. (ii) The boosts are non-coaxial. Whereas in the
former case the boost directions are the same when viewed from a certain inertial
frame of reference, in the latter case the boost directions are different.
1222 24 Basic Formalism
As can be seen from Fig. 24.2a, if the rotation angle θ ≠ 0 in the frame Oθ, the
initial Lorentz boost and subsequently occurring boost are non-coaxial. Here, we
assume that the second Lorentz boost takes place along the z′-axis of the frame O′. In
the sense of the non-coaxial boost, the situation is equivalent to a following case: An
observer A1 is watching a moving electron with a constant velocity in the direction
of, say the z-axis in a certain inertial frame of reference F1 where the observer A1
stays at rest. Meanwhile, another observer A2 is moving with a constant velocity in
the direction of, say along the x-axis relative to the frame F1. On this occasion, we
have a question of how the electron is moving in reference to the frame F2 where the
observer A2 stays at rest.
In Sect. 24.2.2 we expressed the successive transformations as
Λ = Λϕ Λθ Λb- 1 : ð24:236Þ
Suppose that (ϕ′, θ′, b′) takes place subsequently to the initial transformation
(ϕ, θ, b).
The overall transformation Ξ is then described such that
Note that in (24.351) we are viewing the transformation in the moving coordinate
systems. That is, in the second transformation Λ′ the boost takes place in the
direction of the z′-axis of the frame O′ (Fig. 24.2a). As a result, the successive
transformations Λ′Λ cause the non-coaxial Lorentz boosts in the case of θ ≠ 0. In a
special case where θ = 0 is assumed for the original transformation Λ, however,
Ξ = Λ′Λ produces co-axial boosts.
Let the general form of the Dirac equation be
Dψ ðxÞ = 0, ð24:352Þ
(24.296) with the latter equation. Operating S(Λ) of (24.293) on (24.352) from the
left, we obtain
D0 ψ 0 ðx0 Þ = 0: ð24:354Þ
-1
SðΛ0 ÞD0 ½SðΛ0 Þ SðΛ0 Þψ 0 ðx0 Þ = 0, ð24:355Þ
-1
where Λ′ was given in (24.350). Once again, defining D00 SðΛ0 ÞD0 ½SðΛ0 Þ and
ψ ′′(x′′) S(Λ′)ψ ′(x′), we have
We assume that after the successive Lorentz transformations Λ and Λ′, we reach
the frame O′′. Also, we have
In the second equation of (24.357), with the last equality we used the fact that
S(Λ) is a representation of the Lorentz group. Repeatedly operating S(Λ), S(Λ′), etc.,
we can obtain a series of successively transformed Dirac spinors and corresponding
Dirac operators. Thus, the successive transformations of the Dirac operators can be
viewed as a similarity transformation as a whole. In what follows, we focus on a case
where two non-coaxial Lorentz boosts take place.
Example 24.4 As a tangible example, we show how the Dirac spinors and Dirac
operators are transformed as a result of the successively occurring Lorentz trans-
formations. Table 24.3 shows a (4, 4) matrix of Ξ = Λ′Λ given in (24.351).
Table 24.4 lists another (4, 4) matrix of S(Ξ) = S(Λ′Λ) = S(Λ′)S(Λ). The matrix
S(Λ) is given in (24.293). The matrix S(Λ′) is expressed as
-1
S Λ ′ = S Λϕ ′ S Λθ ′ S Λb ′ =
′
=2 θ′ ω′ ′ θ′ ω′ ′ θ′ ω′ ′ θ′ ω′
eiϕ cosh
cos eiϕ =2 sin cosh - eiϕ =2
cossinh eiϕ =2 sin sinh
2 2 2 2 2 2 2 2
- iϕ ′ =2 θ′ ω ′ - iϕ ′ =2 θ′ ω′ - iϕ ′ =2 θ′ ω′ - iϕ ′ =2 θ′ ω′
-e sin cosh e cos cosh e sin sinh e cos sinh
2 2 2 2 2 2 2 2 :
iϕ ′ =2 θ′ ω′ iϕ ′ =2 θ′ ω′ iϕ ′ =2 θ′ ω′ iϕ ′ =2 θ′ ω′
-e cos sinh e sin sinh e cos cosh e sin cosh
2 2 2 2 2 2 2 2
′ ′ ′ ′ ′
′ θ ω ′ θ ω′ ′ θ ω ′
′ θ ω′
e - iϕ =2 sin sinh e - iϕ =2 cos sinh - e - iϕ =2 sin cosh e - iϕ =2 cos cosh
2 2 2 2 2 2 2 2
ð24:358Þ
1224
cos ϕ ′ sin θ ′ sinhω ′ coshω cos ϕ ′ cos θ ′ cos ϕ cos θ - cos ϕ ′ cos θ ′ sin ϕ cos ϕ ′ sin θ ′ sinhω ′ sinhω
þ cos ϕ ′ cos θ ′ cos ϕ sin θsinhω - sin ϕ ′ sin ϕ cos θ - sin ϕ ′ cos ϕ þ cos ϕ ′ cos θ ′ cos ϕ sin θcoshω
- sin ϕ ′ sin ϕ sin θsinhω - cos ϕ ′ sin θ ′ sin θcoshω ′ - sin ϕ ′ sin ϕ sin θcoshω
þ cos ϕ ′ sin θ ′ cos θcoshω ′ sinhω þ cos ϕ ′ sin θ ′ cos θcoshω ′ coshω
sin ϕ ′ sin θ ′ sinhω ′ coshω sin ϕ ′ cos θ ′ cos ϕ cos θ - sin ϕ ′ cos θ ′ sin ϕ sin ϕ ′ sin θ ′ sinhω ′ sinhω
þ sin ϕ ′ cos θ ′ cos ϕ sin θsinhω þ cos ϕ ′ sin ϕ cos θ þ cos ϕ ′ cos ϕ þ sin ϕ ′ cos θ ′ cos ϕ sin θcoshω
þ cos ϕ ′ sin ϕ sin θsinhω - sin ϕ ′ sin θ ′ sin θcoshω ′ þ cos ϕ ′ sin ϕ sin θcoshω
þ sin ϕ ′ sin θ ′ cos θcoshω ′ sinhω þ sin ϕ ′ sin θ ′ cos θcoshω ′ coshω
cos θ ′ sinhω ′ coshω - sin θ ′ cos ϕ cos θ sin θ ′ sin ϕ cos θ ′ sinhω ′ sinhω
- sin θ ′ cos ϕ sin θsinhω - cos θ ′ sin θcoshω ′ - sin θ ′ cos ϕ sin θcoshω
þ cos θ ′ cos θcoshω ′ sinhω þ cos θ ′ cos θcoshω ′ coshω
24
Basic Formalism
24.6
0 θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω
- e - iðϕ - ϕÞ=2 sin cos cosh - e - iðϕ - ϕÞ=2 sin sin cosh 0 θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω
2 2 2 2 2 2 e - iðϕ - ϕÞ=2 sin cos sinh e - iðϕ - ϕÞ=2 sin sin sinh
2 2 2 2 2 2
- iðϕ0 þϕÞ=2 θ0 θ ω0 - ω - iðϕ0 þϕÞ=2 θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω
-e cos sin cosh þe cos cos cosh - e - iðϕ þϕÞ=2 cos sin sinh - iðϕ0 þϕÞ=2 θ0 θ ω0 þ ω
2 2 2 2 2 2 2 2 2 þe cos cos sinh
2 2 2
0 θ0 θ ω0 - ω
0 θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω 0
- eiðϕ þϕÞ=2 cos cos sinh - eiðϕ þϕÞ=2 cos sin sinh eiðϕ þϕÞ=2 cos
θ0 θ
cos cosh
ω0 þ ω eiðϕ þϕÞ=2 cos sin cosh
2 2 2 2 2 2 2 2 2 2 2 2
0 θ0 θ ω0 þ ω
θ0 θ ω0 - ω θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω þeiðϕ - ϕÞ=2 sin cos cosh
iðϕ0 - ϕÞ=2 iðϕ0 - ϕÞ=2
-e sin sin sinh þe sin cos sinh - eiðϕ - ϕÞ=2 sin sin cosh 2 2 2
Transformation Properties of the Dirac Operators [9]
2 2 2 2 2 2 2 2 2
0 θ0 θ ω0 þ ω 0
θ0 θ ω0 - ω
0 θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω - e - iðϕ - ϕÞ=2 sin cos cosh - e - iðϕ - ϕÞ=2 sin sin cosh
e - iðϕ - ϕÞ=2 sin cos sinh e - iðϕ - ϕÞ=2 sin sin sinh 2 2 2 2 2 2
2 2 2 2 2 2 0 θ0 θ ω0 - ω 0 θ0 θ ω0 þ ω
0 θ0 θ ω0 - ω 0 θ0 θ ω0 þ ω - e - iðϕ þϕÞ=2 cos sin cosh þe - iðϕ þϕÞ=2 cos cos cosh
- e - iðϕ þϕÞ=2 cos sin sinh þe - iðϕ þϕÞ=2 cos cos sinh 2 2 2 2 2 2
2 2 2 2 2 2
1225
1226
′ ′
cosh2ω2 cosh2ω2 þ sinh2ω2 sinh2ω2
þ12 cos θ sinhω coshω þ cos θcoshω ′ sinhω ð4; 1Þ þ12 cos θsinhω ′ sinhω 0
′ ′
1 - iϕ ′
2e sin θsinhω eiϕ sin 2θ2 - e - iϕ cos 2θ2
′ - ð3; 1Þ
- 12e - iϕ sin θ ′ cos θcoshω ′ sinhω þ sinhω ′ coshω 0 ð3; 3Þ
a
In the above matrix, e.g., (1, 1) means that the (2, 2) matrix element should be replaced with the (1, 1) element
24
Basic Formalism
24.6 Transformation Properties of the Dirac Operators [9] 1227
As shown in Sect. 24.5, the Dirac spinors ψ(x) was expressed in reference to the
rest frame of the electron as
Then, we have
The spinors w(0, h) were given in (24.305) and (24.310) according to the positive-
and negative-energy electron. Thus, individual columns of S(Ξ) in Table 24.4 are
directly associated with the four spinors of either the positive- or negative-energy
electron, as in the case of (24.293). The orthogonal characteristics hold with the
Dirac spinors as in (21.141) and (21.142). Individual matrix elements of S(Ξ) consist
of two complex terms. Note, however, in the case of the co-axial boosts (i.e., θ = 0)
each matrix element comprises a single term. In both the co-axial and non-coaxial
boost cases, individual four columns of S(Λ′Λ) represent the Dirac spinors with
respect to the frame O′′. As mentioned before, the sign for ψ 004 ðx00 Þ should be reversed
relative to the fourth column of S(Λ′Λ) because of the charge conjugation.
Meanwhile, Table 24.5 lists individual matrix elements of a (4, 4) matrix of the
Dirac operator D00 with the positive-energy electron. Using S(Λ′Λ), D00 viewed in
reference to the frame O′′ is described by
where A was given in (24.301). As can be seen in (21.75), the Dirac operator
contains all the information of the rest mass of an electron and the energy-
momentum of a plane wave electron. This is clearly stated in (24.278). In the x′′-
system (i.e., the frame O′′) the Dirac equation is described in a similar way such that
00
iγ μ ∂ μ - m ψ 00 ðx00 Þ = 0:
The point is that in any inertial frame of reference the matrix form of the gamma
matrices is held invariant. Or rather, S(Ξ) has been chosen for the gamma matrices to
be invariant in such a way that
Λνμ Sγ μ S - 1 = γ ν : ð24:276Þ
p00 μ γ μ - m ψ 00 ðx00 Þ = 0,
where p′′μ is a four-momentum in O′′. Hence, even though the matrix form of
Table 24.5 is somewhat complicated, the Dirac equation must uniquely be
represented in any inertial frame of reference.
Thus, from Table 24.5 we get the following four-momenta in the frame O′′
described by
where (m, n) to which 1 ≤ m, n ≤ 4 indicates the (m, n)-element of the matrix given in
Table 24.5. As noted from (21.75), we can determine other matrix elements from the
four elements indicated in Table 24.5, in which we certainly confirm that
(1, 1) + (3, 3) = - 2m. This is in agreement with G of (21.75) and (21.91). Thus,
from (24.361) and Table 24.5 we get
2
p000 - p002 - m2 = 0 or p000 = p002 þ m2 : ð21:79Þ
As seen above, the basic construction of the Dirac equation remains unaltered by
the operation of successive Lorentz transformations including the non-coaxial
boosts.
24.7 Projection Operators and Related Operators for the Dirac Equation 1229
Example 24.5 We show another simple example of the non-coaxial Lorentz boosts.
Suppose that in the moving coordinate systems we have successive non-coaxial
boosts Λ described by
In this section, we examine the constitution of the Dirac equation in terms of the
projection operator. In the next section, related topics are discussed as well. To this
end, we wish to deal with the issue from a somewhat general standpoint. Let G be
expressed as
pE H
G= ðp ≠ qÞ,
-H qE
p2 E þ H 2 ðq - pÞH p2 E þ H 2 ðp - qÞH
G{ G = , GG{ = : ð24:364Þ
ðq - pÞH q2 E þ H 2 ðp - qÞH q2 E þ H 2
A B
þ = E, ð24:365Þ
ð- 2mÞ ð- 2mÞ
where E is the identity operator; i.e., a (4, 4) identity matrix. Operating S from the left
and S-1 from the right on both sides of (24.365), respectively, we get
SAS - 1 SBS - 1
þ = SES - 1 = E or SAS - 1 þ SBS - 1 = ð- 2mÞE: ð24:366Þ
ð- 2mÞ ð- 2mÞ
Operating both sides of (24.366) on χ ′(x′) of (24.349) from the left and taking
account of (24.282) once again, we have
Now, suppose that the general solutions of the plane wave Dirac field Ψ(x′) are
given by
4
Ψðx0 Þ = α ψ 0 ðx0 Þ,
i=1 i i
ð24:369Þ
where ψ 0i ðx0 Þ are those of (24.311) and αi are their suitable coefficients. Then,
operating G ~ from the left on both sides of (24.369), we get
~ ðx0 Þ =
GΨ
2
ð- 2mÞαi ψ 0i ðx0 Þ, ð24:370Þ
i=1
24.7 Projection Operators and Related Operators for the Dirac Equation 1231
4
GΨðx0 Þ = i=3
ð- 2mÞαi ψ 0i ðx0 Þ: ð24:371Þ
Equations (24.372) and (24.373) mean that GΨ~ ðx0 Þ=ð- 2mÞ extracts the positive-
′
energy states from Ψ(x ) including the coefficients and that GΨðx0 Þ=ð- 2mÞ extracts
the negative-energy states from Ψ(x′) including the coefficients as well. These imply
~ ðx0 Þ=ð- 2mÞ and GΨðx0 Þ=ð- 2mÞ act as the projection
that both the operators GΨ
operators (see Sects. 14.1 and 18.7). Defining P and Q as
G~ - pμ γ μ - m pμ γ μ þ m
P = = ,
ð- 2mÞ ð- 2mÞ 2m
G pμ γ μ - m - pμ γ μ þ m
Q = = ð24:374Þ
ð- 2mÞ ð- 2mÞ 2m
P þ Q = E: ð24:375Þ
pμ γ μ þ m p ρ γ ρ þ m pμ γ μ pρ γ ρ þ 2mpρ γ ρ þ m2
P2 = = : ð24:376Þ
ð2mÞð2mÞ ð2mÞð2mÞ
1 1
p μ γ μ pρ γ ρ = pμ pρ γ μ γ ρ = p p ðγ μ γ ρ þ γ ρ γ μ Þ = pμ pρ 2ημρ = pμ pμ
2 μ ρ 2
= p2 = m 2 , ð24:377Þ
where with the third equality we used (21.64). Substituting (24.377) into (24.376),
we rewrite it as
2m pρ γ ρ þ m pμ γ μ þ m
P2 = = = P, ð24:378Þ
ð2mÞð2mÞ 2m
where the last equality came from the first equation of (24.374). Or, using the matrix
algebra we reach the same result at once. That is, we have
1232 24 Basic Formalism
SBS - 1
P= ,
ð- 2mÞ
SBS - 1 SBS - 1 SB2 S - 1 ð- 2mÞSBS - 1 SBS - 1
P2 = = = = = P, ð24:379Þ
ð- 2mÞ ð- 2mÞ ð- 2mÞ2 ð- 2mÞ2 ð- 2mÞ
Q2 = Q: ð24:380Þ
Moreover, we have
PQ = QP = 0: ð24:381Þ
The relations (24.375) through (24.381) imply that P and Q are projection
operators. However, notice that neither P nor Q is Hermitian. This property
originally comes from the involvement of the Lorentz boost term of (24.293). In
this regard, P and Q can be categorized as projection operators sensu lato (Sect.
18.7), even though these operators are idempotent as can be seen from (24.378) and
(24.380). Equation (24.375) represents the completeness relation.
Let us consider the following example.
Example 24.6 We have an interesting and important example in relation to the
projection operators and spectral decomposition dealt with in Sect. 14.3. A projec-
tion operator was defined using an adjoint operator (see Sect. 14.1) such that
ðjwi ihwi jÞ{ = ðhwi jÞ{ ðjwi iÞ{ = jwi ihwi j : ð14:28Þ
To apply this notation to the present case, we define following vectors in relation
to (24.304) and (24.309) as:
Defining
~ A ~ B
P and Q , ð24:385Þ
ð- 2mÞ ð- 2mÞ
we have
24.7 Projection Operators and Related Operators for the Dirac Equation 1233
2 2
P þ Q = E, P = P, Q = Q, PQ = 0: ð24:386Þ
Hence, both P ~ and Q~ are Hermitian projection operators (i.e., projection operators
sensu stricto; see Sect. 18.7). This is self-evident, however, notice that neither A nor
B in (24.385) involves a Lorentz boost. Then, using (24.383) and (24.384), we obtain
1 0 0 0 0 0 0 0
0 0 0 0 þ 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
þ 0 0 0 0 þ 0 0 0 0 = E: ð24:388Þ
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1
1
S σp ð21:85Þ
jpj
and
~ 1
S σ ð- pÞ = - S: ð21:107Þ
j - pj
The above operators were (2, 2) submatrices (Sect. 21.3). The projection opera-
~
~ of (4, 4) full matrix can be defined by use of the direct sum of S or S
tors Σ and Σ
such that
1 S
Σ Eþ 0
S
, ð24:389Þ
2 0
~
~ 1 Eþ
Σ S 0
, ð24:390Þ
2 0 ~
S
2
Σ þ Σ = E, Σ2 = Σ, Σ = Σ, ΣΣ = ΣΣ = 0: ð24:391Þ
~ { = Σ:
Σ{ = Σ and Σ ~ ð24:392Þ
χ
Χ= hSχ
: ð21:100Þ
1 S χ 1 χ 1 Sχ 1 χ 1
ΣΧ = Eþ 0
S
= þ = þ hχ
2 0 hSχ 2 hSχ 2 hSSχ 2 hSχ 2 hShχ
1 χ 1 χ
= þ h : ð24:393Þ
2 hSχ 2 hSχ
χ
If in (24.393) h = 1, we have ΣΧ = hSχ
. If h = - 1, on the other hand,
χ
ΣΧ = 0. Consequently, Σ extracts hSχ
intactly when h = 1, whereas it discards
χ
hSχ
in whole when h = - 1.
For another example of the projection operator, we introduce
~
~h 1 E þ h
Σ S 0
: ð24:394Þ
2 0 ~
S
~ ~χ
~ h Χ0 ¼ 1 E þ h
Σ S 0 S~χ
¼
1 S~χ 1
þ h S S~
2 0 ~
S - h~χ 2 - h~χ 2 ~χ
- hS~
Equation (24.395) looks trivial, but it clearly shows that Χ′ is certainly a proper
negative-energy eigenvector that belongs to the helicity h. Similarly, we define
24.8 Spectral Decomposition of the Dirac Operators 1235
1 S
Σh Eþh 0
S
: ð24:396Þ
2 0
Also, we have
χ Sχ
Σh Χ = Σ h = Χ, Σh Χ0 = Σh = 0, Σh Χ = 0, ð24:397Þ
hSχ - hχ
{
~h = Σ
Σ h 2 = Σh , Σ
2
~h, Σ
~ h Σ h = Σh Σ ~ =Σ
~ h = 0, Σ{ = Σh , Σ ~ h , etc: ð24:398Þ
h h
To the above derivation, use (21.89) and (21.115). It is obvious that both Σh and
~ h are projection operators sensu stricto.
Σ
In the previous section, we have shown how the Dirac operators G and G ~ (with the
eigenvalues -2m and 0 both doubly degenerate) can be diagonalized. In Sect. 14.3
we discussed the spectral decomposition and its condition. That is, we have shown
that a necessary and sufficient condition for an operator to be a normal operator is
that the said operator is expressed as the relation (14.74) called the spectral decom-
position. This is equivalent to the statement of Theorem 14.5. In this case, relevant
projection operators are the projection operators sensu stricto (i.e., Hermitian and
idempotent).
Meanwhile, in Sects. 12.5 and 12.7 we mentioned that similar decomposition can
be done if an operator in question can be diagonalized via similarity transformation
(namely, in the case where the said operator is semi-simple). In that case, Definition
14.1 of the projection operator has been relaxed so that the projection operators
sensu lato can play a role (see Sect. 18.7). The terminology of spectral decomposi-
tion should be taken in a broader meaning accordingly. Bearing in mind the above
discussion, we examine the conditions for the spectral decomposition of the Dirac
operators G and G. ~ Neither G nor G ~ is Hermitian or normal, but both of them are
semi-simple.
Now, operating S and S-1 on both sides of (24.387) from the left and right,
respectively, we have
If the Lorentz boost is absent, the representation matrix S related to the Lorentz
transformation is unitary (i.e., normal). In this case, (24.399) is rewritten as
1236 24 Basic Formalism
Defining
γ 0 S{ γ 0 = S - 1 : ð24:404Þ
Sðj1ih1jþj2ih2j þ j3ih3jþj4ih4jÞγ 0 S{ γ 0 = E:
Note that | 1′i and | 2′i represent two linearly independent solutions ϕ′(x′) of
(24.346) and that | 3′i and | 4′i represent two linearly independent solutions χ ′(x′) of
(24.349).
Using h1| γ 0 = h1| , h2|γ 0 = h2j; h3| γ 0 = - h3| , h4|γ 0 = - h4j, we get
ðj10 ih10 jþj20 ih20 j - j30 ih30 j - j40 ih40 jÞγ 0 = E: ð24:407Þ
Defining
hk 0 jγ 0 hk 0 j, ð24:408Þ
we obtain
Note that (24.408) is equivalent to (21.135) that defines the Dirac adjoint.
Equation (24.409) represents the completeness relation of the Dirac spinors; be
aware of the minus signs of the third and fourth terms. Multiplying P of (24.374)
on both sides of (24.409) from the left and using (24.349) and (24.367), we have
pμ γ μ þ m
P j10 ih10 j þ j20 ih20 j = j10 ih10 j þ j20 ih20 j = P = : ð24:410Þ
2m
In (24.410), we identify j10 ih10 j þ j20 ih20 j with ( pμγ μ + m)/2m. This can readily be
confirmed using the coordinate representation ψ 01 ðx0 Þ and ψ 02 ðx0 Þ of (24.311) for | 1′i
and | 2′i, respectively. Readers are encouraged to confirm this.
Multiplying Q of (24.374) on both sides of (24.409) from the left and using
(24.346) and (24.368), in turn, we obtain
pμ γ μ - m
j30 ih30 j þ j40 ih40 j = - Q = : ð24:412Þ
2m
P þ Q = E: ð24:375Þ
Also, we obtain
1238 24 Basic Formalism
j10 ih10 j j10 ih10 j = j10 ih10 j, j20 ih20 j j20 ih20 j = j20 ih20 j,
Similarly, we have
j30 ih30 j j30 ih30 j = - j30 ih30 j, j40 ih40 j j40 ih40 j = - j40 ih40 j,
For instance, (24.415) can readily be checked using (24.311) and (24.337).
Equation (24.413) can similarly be checked from (24.282).
Looking at (24.413) and (24.415), we find the resemblance between them and
(12.196). In fact, both these cases offer examples of the spectral decomposition (see
Sects. 12.5 and 12.7). Taking account of the fact that eigenvalues of a matrix remain
unchanged after the similarity transformation (Sect. 12.1), we find that the coeffi-
cients -2m for RHS of (24.413) and (24.415) are the eigenvalues (doubly degener-
ate) for the matrix B of (24.302) and A of (24.301).
From (24.414), j10 ih10 j and j20 ih20 j are idempotent operators. Also, from
(24.415), - j30 ih30 j and - j40 ih40 j are idempotent as well. The presence of the two
idempotent operators in (24.413) and (24.415) reflects that the rank of (-pμγ μ - m)
and ( pμγ μ - m) is 2. Notice that neither the operator j30 ih30 j nor j40 ih40 j, on the other
hand, is an idempotent operator but, say, “anti-idempotent” operator. That is, for an
anti-idempotent operator A we have
A2 = - A,
where its eigenvalues are 0 or -1. The operators of both types play an important role
in QED. Meanwhile, we have
{ { {
jk0 ihk 0 j = jk 0 ihk 0 jγ 0 = γ 0 jk 0 ihk 0 j = γ 0 jk0 ihk0 j
Hence, jk0 ihk0 j is not Hermitian. The operator - jk 0 ihk 0 j is not Hermitian, either.
Thus, the operators ± jk0 ihk0 j may be regarded as the projection operators sensu lato
accordingly. The non-Hermiticity of the projection operators results from the fact
that S is not unitary; see the discussion made in Sect. 24.7.
As mentioned earlier in this section, however, we have an exceptional case where
the Lorentz boost is absent; see Sect. 24.6.1. In this case, we can construct the
projection operators sensu stricto; see (24.327), (24.387), and (24.403). As can be
24.8 Spectral Decomposition of the Dirac Operators 1239
seen in these examples, the Dirac operator G is Hermitian. Then, as pointed out in
Sect. 21.3.4, we must have the complete orthonormal set (CONS) of common
eigenfunctions with the operators G and H according to Theorem 14.14. Then, the
CONS constitutes the diagonalizing unitary matrix S(Λ). This can easily be checked
using (21.86), (21.102), (24.319) and (24.320). With H, we obtain
-1 0 0 0
-1 { 0 1 0 0
SR HSR = SR HSR = ,
0 0 -1 0
0 0 0 1
where SR is given by (24.318); the diagonal elements are eigenvalues -1 and 1 (each
doubly degenerate) of H. Individual columns of (24.318) certainly constitute the
CONS. Denoting them from the left as | ji ( j = 1, 2, 3, 4), we have
4
hjjji = δij, j=1
jjihj j = E:
The operator | jihjj is a projection operator sensu stricto. The above results are
essentially the same as those of Example 24.6. With G ð = AÞ, the result is self-
evident.
What about S - 1 HS, then? As already seen in Sect. 21.3.2, G and H commute
with each other; see (21.93). Consequently, we expect to have common eigenvectors
for G and H. In other words, once G has been diagonalized with S as in (24.339), H
may well be diagonalized with S as well. It is indeed the case. That is, we also have
-1 0 0 0
-1 0 1 0 0
S HS = Sω- 1 SR -1
HSR Sω = , ð24:416Þ
0 0 -1 0
0 0 0 1
where Sω is given by (24.328). Thus, we find that the common eigenvectors for G
and H are designated by the individual column vectors of S given in (24.293). As
already shown in Sects. 21.4 and 24.5, these column vectors do not
constitute CONS.
~ given in (21.116), similarly we get
Regarding the operator H
1 0 0 0
-1~ -1 0 -1 0 0
S HS= -S HS = : ð24:417Þ
0 0 1 0
0 0 0 -1
We take the normalization as in (21.141) and (21.142) and define the operators
| u-i, | u+i, etc. such that
1240 24 Basic Formalism
p p p p
2m j 1 ′ j ju - i, 2m 2 ′ juþ i, 2m j3 ′ jvþ i, 2m 4 ′ jv - i;
p p p p
2m h10 j hu - j, 2m h20 j huþ j, 2m h30 j hvþ j, 2m h40 j hv - j,
ð24:418Þ
where the subscripts - and + denote the helix -1 and +1, respectively. Thus,
rewriting (24.410) and (24.412) we get the relations described by
ju - ihu - j ju - ihu - j = 2mju - ihu - j, juþ ihuþ j juþ ihuþ j = 2mjuþ ihuþ j:
jvþ ihvþ j jvþ ihvþ j = - 2mjvþ ihvþ j, jv - ihv - j jv - ihv - j = - 2mjv - ihv - j:
-1 { 0
ψ ′ ðx0 Þ = Sψ ðxÞ = ½Sψ ðxÞ{ γ 0 = ψ { ðxÞS{ γ 0 = ψ ðxÞ γ 0 Sγ
where with the last equality we used (24.404). Multiplying both sides of (24.421) by
the Dirac operator G from the right, we obtain
where with the second equality we used (24.339). Multiplying both sides of (24.421)
~ from the right, we get
by the Dirac operator G
~ = ψ ðxÞS - 1 G
ψ ′ ðx0 ÞG ~ = ψ ðxÞS - 1 SBS - 1 = ψ ðxÞBS - 1 ,
where with the second equality we used (24.342). If we choose ψ 1(x) or ψ 2(x) of
(24.305) for ψ(x), we have ψ ðxÞA = 0. This leads to
ψ ′ ðx0 ÞG = 0, ð24:422Þ
~ = 0:
ψ ′ ð x0 Þ G ð24:423Þ
In this section, we wish to examine the properties of the Lorenz groups in terms of
the Lie groups and Lie algebras. In Chap. 20 we studied the structure and properties
of SU(2) and SO(3). We mentioned that several different Lie groups share the same
Lie algebra. Of those Lie groups, we had a single universal covering group charac-
terized as the simply connected group (Sect. 20.5). As a typical example, only
SU(2) is simply connected as the universal covering group amongst the Lie groups
such as O(3), SO(3), etc., having the same Lie algebra. In this section, we examine
how the Lorentz group is positioned as part of the continuous groups.
1242 24 Basic Formalism
We have already studied properties of Lie algebra of the Lorentz group (Sect 24.2).
We may ask what kind of properties the Lorentz group has in relation to the above-
mentioned characteristics of SU(2) and related continuous groups. In particular, is
there a universal covering group for the Lorentz group? To answer this question, we
need a fundamentally important theorem about the matrix algebra. The theorem is
well-known as the polar decomposition of a matrix.
Theorem 24.7 Polar Decomposition Theorem [15] Any complex non-singular
matrix A is uniquely decomposed into a product such that
A = UP, ð24:424Þ
λ1
λ2
W { A{ AW = Λ , ð24:425Þ
⋱
λn
~ is defined as
where Λ1=2
~ WΛ1=2 W { :
Λ1=2 ð24:429Þ
~
Since the unitary similarity transformation holds the eigenvalue unchanged, Λ1=2
is the positive definite Hermitian operator. Meanwhile, we define a following matrix
V as
-1
~
V A Λ1=2 : ð24:430Þ
-1 { -1 { -1 -1
V {V = ~
Λ1=2 ~
A{ A Λ1=2 = ~
Λ1=2 ~
Λ1=2
2
~
Λ1=2
-1 -1
~
= Λ1=2 Λ~1=2
2
Λ~
1=2
= E, ð24:431Þ
-1 { { -1
where with the second equality we used ~
Λ1=2 = Λ~1=2 and with the
third equality we used the Hermiticity of Λ~1=2 . To show the former relation, suppose
that we have two non-singular matrices Q and R. We have (QR){ = R{Q{. Putting
R = Q-1, we get E{ = E = (Q-1){Q{, namely (Q{)-1 = (Q-1){ [14]. Thus, from
(24.430) it follows that there exist a unitary matrix V and a positive definite
~ which satisfy (24.424) such that
Hermitian matrix Λ1=2
A = V Λ~1=2 : ð24:432Þ
A{ A = P{ U { UP = P2 , ð24:433Þ
where with the second equality we used U{U = E (U: unitary) and P{ = P (P:
Hermitian). Then, from (24.428) and (24.433), we obtain
1244 24 Basic Formalism
P2 = Λ~1=2
2
: ð24:434Þ
~
P2 = Λ1=2
2
= WΛW { : ð24:435Þ
-1
~
U = A Λ1=2 : ð24:439Þ
1=2 1=2
Λð ± Þ Λð ± Þ = Λ, ð24:440Þ
1=2
where Λð ± Þ is defined as
24.9 Connectedness of the Lorentz Group 1245
p
± λ1 p
1=2 ± λ2
Λð ± Þ = : ð24:441Þ
⋱ p
± λn
Among them, however, only Λ1/2 is the positive definite operator. The
one-dimensional polar representation of a non-zero complex number is the polar
coordinate representation expressed as z = eiθr [where the unitary part is described
first and r (>0) corresponds to a positive definite eigenvalue]. Then, we can safely
say that the polar decomposition of a matrix is a multi-dimensional polar represen-
tation of the matrix.
Corollary 24.1 [14] Any complex non-singular matrix A is uniquely decomposed
into a product such that
A = P0 U 0 , ð24:442Þ
λ1′
{ λ2′
U′ AA{ U ′ = Ω , ð24:443Þ
⋱
λn′
þ λ1′
A = Ω~1=2 W:
This means the existence of the polar decomposition of (24.442). The uniqueness
of the polar decomposition can be shown as before. This completes the proof.
Corollary 24.2 [14] Let A be a non-singular matrix. Then, there exist positive
definite Hermitian matrices H1 and H2 as well as a unitary matrix U that satisfy
A = UH 1 = H 2 U: ð24:445Þ
If and only if A is a normal matrix, then we have H1 = H2. (That is, H1 = H2 and
U are commutative.)
Proof From Theorem 24.7, there must exist a positive definite Hermitian matrix H1
and a unitary matrix U that satisfy the relation A = UH1. Rewriting this, we have
A = UH 1 U { U = H~1 U, ð24:446Þ
H 2 U = H~1 U = UH 1 U { U = UH 1 : ð24:447Þ
A{ A = H {1 U { UH 1 = H {1 H 1 = H 1 H 1 :
Also, we have
AA{ = UH 1 H {1 U { = H 1 UU { H {1 = H 1 H {1 = H 1 H 1 , ð24:448Þ
where with the second equality we used UH1 = H1U and its adjoint H {1 U { = U { H {1 .
From the above two equations, A is normal.
Conversely, if A is normal, from (24.447) we have
A{ A = H 21 = U { H {2 H 2 U = U { H 22 U = AA{ = H 22 :
24.9 Connectedness of the Lorentz Group 1247
A = OP, ð24:449Þ
2 1
A= : ð24:450Þ
1 0
We have
5 2
AT A = : ð24:451Þ
2 1
p p 2
3þ2 2 0p 2þ1 0
D = U HU =
T
= p : ð24:452Þ
0 3-2 2 0 2-1
2
Then, we get
p
2þ1 p0
D 1=2
= ð24:453Þ
0 2-1
as well as
1248 24 Basic Formalism
-1
p p
- 1=
V = A D~1=2
2 1 1= p
2 p 2
=
1 0 - 1= 2 3= 2
p p
1=p2 1= p2
= : ð24:455Þ
1= 2 - 1= 2
Thus, we find that V is certainly an orthogonal matrix. The unique polar decom-
position of A is given by
p p p p
= V D~1=2 =
2 1 1=p2 1= p2 3=p2 1=p2
A= : ð24:456Þ
1 0 1= 2 - 1= 2 1= 2 1= 2
1 2
A= : ð24:457Þ
0 1
In this case, the orthogonal matrix and the positive definite symmetric matrix are
not commutative. Finding the other decomposition is left as an exercise for readers.
With respect to the transformation matrix S(Λ) of Sect. 24.4, let us think of a
following example.
24.9 Connectedness of the Lorentz Group 1249
ω ω
eiϕ=2 cosh 0 - eiϕ=2 sinh 0
2 2
ω ω
0 e - iϕ=2 cosh 0 e - iϕ=2 sinh
2 2
ω ω
- eiϕ=2 sinh 0 eiϕ=2 cosh 0
2 2
ω ω
0 e - iϕ=2 sinh 0 e - iϕ=2 cosh
2 2
eiϕ=2 0 0 0
- iϕ=2
0 e 0 0
¼ iϕ=2
0 0 e 0
0 0 0 e - iϕ=2
ω ω
cosh 0 - sinh 0
2 2
ω ω
0 cosh 0 sinh
2 2 : ð24:459Þ
ω ω
- sinh 0 cosh 0
2 2
ω ω
0 sinh 0 cosh
2 2
The LHS of Eq. (24.459) is a normal matrix and the two matrices of RHS are
commutative as expected. As in this example, if an axis of the rotation and a
direction of the Lorentz boost coincide, such successive operations of the rotation
and boost are commutative and, hence, the relevant representation matrix S(Λ) is
normal according to Corollary 24.2.
Theorem 24.7 helps understand that universal covering group exists in the
Lorentz groups. The said group is termed special linear group and abbreviated as
SL(2, ℂ). In Sects. 20.2 and 20.5, we have easily shown that SU(2) is isomorphic to
S3, because we have many constraint conditions imposed upon the parameters of
SU(2). Then, we could readily show that since S3 is simply connected, so is SU(2).
With SL(2, ℂ), on the other hand, the number of constraint conditions is limited; we
have only two conditions resulting from that the determinant of two-dimensional
complex matrices is 1. Therefore, it is not easy to examine whether SL(2, ℂ) is
simply connected. In virtue of Theorem 24.7, however, we can readily prove that
SL(2, ℂ) is indeed simply connected. It will be one of the main subjects of the next
section.
1250 24 Basic Formalism
In this section we examine how SL(2, ℂ) is related to the Lorentz group. We start
with the most general feature of SL(2, ℂ) on the basis of the discussion of Sect. 20.5.
The discussion is based on Definitions 20.8-20.10 (Sect. 20.5) along with the
following definition (Definition 24.5).
Definition 24.5 The special linear group SL(2, ℂ) is defined as
SL(2, ℂ) {A = (aij); i, j = 1, 2, aij 2 ℂ, detA = 1}.
Theorem 24.8 [8] The group SL(2, ℂ) is simply connected.
Proof Let g(t, s) 2 SL(2, ℂ) be an arbitrary continuous matrix function with respect
to t in a region I × I = {(t, s); 0 ≤ t ≤ 1, 0 ≤ s ≤ 1}. Also, let g(t, 1) be an arbitrary loop
in SL(2, ℂ). Then, according to Theorem 24.7, g(t, 1) can uniquely be decomposed
such that
where X ðt Þ 2 slð2, ℂÞ; slð2, ℂÞ is a Lie algebra of SL(2, ℂ). Since P(t, 1) is Hermitian
and s is real, X(t) is Hermitian as well. This can immediately be derived from (15.32).
Since we have
and expsX(t) is analytic, g(t, 1) is homotopic to g(0, t) = U(t); see Definition 20.9.
Because SU(2) is simply connected (see Sect. 20.5.3), U(t) is homotopic to any
constant loop (Definition 20.10). Consequently, from the above statement g(t, 1) is
homotopic to a constant loop. This loop can arbitrarily be chosen and g(t, 1) is an
arbitrary loop in SL(2, ℂ). Therefore, SL(2, ℂ) is simply connected.
During the process of the above proof, an important feature of SL(2, ℂ) has
already shown up. That is, Hermitian operators are contained in slð2, ℂÞ. This feature
shows a striking contrast with the Lie algebra suðnÞ in that suðnÞ comprises anti-
Hermitian operators only.
Theorem 24.9 [8] The Lorentz group O(3, 1) is self-conjugate. That is, if
Λ 2 O(3, 1), then ΛT 2 O(3, 1).
Proof From (24.198), for a group element Λ to constitute the Lorentz group O(3, 1)
we have
24.9 Connectedness of the Lorentz Group 1251
ΛT ηΛ = η:
Multiplying both sides by Λη from the left and Λ-1η from the right, we obtain
Λη ΛT ηΛ Λ - 1 η = Λη η Λ - 1 η:
Using η2 = 1, we get
T
ΛηΛT = η or ΛT ηΛT = η:
Λ = UP: ð24:463Þ
P = exp Y, ð24:464Þ
ΛT Λ = PT U T UP = PT P = P2 , ð24:465Þ
where with the second equality we used the fact that U is an orthogonal matrix; with
the last equality we used the fact that P is symmetric. Then, P2 2 O(3, 1). Therefore,
using ∃ X 2 oð3, 1Þ, P2 can be expressed as
P2 = exp X: ð24:466Þ
T
P2 = P2 = ðexp X ÞT = exp X T , ð24:467Þ
where with the last equality we used (15.31). Comparing (24.466) and (24.467), we
obtain
1252 24 Basic Formalism
Since (24.468) equals the positive definite matrix P2, a logarithmic function
corresponding to (24.468) uniquely exists [8]. Consequently, taking the logarithm
of (24.468) we get
X T = X: ð24:469Þ
Namely, X is real symmetric. As X=2 2 oð3, 1Þ, exp(X/2) 2 O(3, 1). Since P and
P2 are positive definite, exp(X/2) is uniquely determined such that
1=2
expðX=2Þ = P2 = P: ð24:470Þ
±1 0 0 0
0
ð24:472Þ
Λ= g0
g2
g1
g3
; g0 2 ℝ, g1 : ð1, 3Þ matrix, g2 : ð3, 1Þ matrix,
g3 2 glð3, ℝÞ: ð24:473Þ
g0 g1 g0 - g1
ηΛ = = = Λη: ð24:474Þ
- g2 - g3 g2 - g3
g0 g2T g0 g1 g20 0
ΛT ηΛ = =
g1T g3T - g2 - g3 0 - g3T g3
1 0 0 0
0 -1 0 0
= = η: ð24:475Þ
0 0 -1 0
0 0 0 -1
1 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0
E0 E = , E1 = ,
0 0 1 0 0 0 1 0
0 0 0 1 0 0 0 -1
-1 0 0 0 -1 0 0 0
0 1 0 0 0 1 0 0
E2 = , E3 = , ð24:477Þ
0 0 1 0 0 0 1 0
0 0 0 1 0 0 0 -1
respectively.
Proof As in the case of the proof of Theorem 24.8, an arbitrary element Λ of the
Lorentz group O(3, 1) can be represented as
where O(t) is an orthogonal matrix of O(4); P(t, s) = exp sX(t) for which X(t) is a real
symmetric matrix according to Theorem 24.10. Both O(t) and P(t, s) belong to
O(3, 1) (Theorem 24.10). Then, we have O(t) 2 O(3, 1) O(4).
1254 24 Basic Formalism
1 0 0 1 0 0
0
O= 0 1 0 ,O = 0 1 0 : ð24:479Þ
0 0 1 0 0 -1
For this, see (20.326) and (20.327). Taking account of O(1) = {+1, -1} in
(24.471), we have 2 × 2 = 4 connected sets in O(3, 1). These four sets originate
from the combinations of (24.479) with ±1 of O(1) and the individual connected sets
are represented by Ei (i = 0, 1, 2, 3) of (24.477). Suppose that L0 , L1 , L2 , and L3
comprise all the elements that are connected to E0, E1, E2, and E3, respectively.
Then, these Li ði = 0, 1, 2, 3Þ are connected sets and arbitrary elements of O(3, 1) are
contained in one of them. Thus, it suffices to show that Li \ Lj = ∅ði ≠ jÞ:
First, from ΛTηΛ = η we have (detΛ)2 = 1, i.e., detΛ = ± 1. From (24.477) we
have detE0 = det E3 = 1 and detE1 = det E2 = - 1. Then, all the elements Λ
belonging to L0 and L3 have detΛ = 1 and those belonging to L1 and L2 have
detΛ = - 1. From Example 20.8, the elements of L0 cannot be connected to the
elements of L1 or those of L2 . Similarly, the elements of L3 cannot be connected to
the elements of L1 or those of L2 .
Let Λ0 2 L0 and let Λ0 be expressed as Λ0 = O0P0 with the polar decomposition.
The element Λ0 is connected to O0, and so O0 2 L0 from Theorem 24.10. Since L0 is
a collection of the elements that are connected to E0, so is O0 ½2 L0 \ Oð4Þ. The
element E0 can be expressed as E 0 = O ~ fþ1g so that detO ~ = 1 in (24.479) and det
{+1} = 1. Meanwhile, let Λ3 2 L3 be described by Λ3 = O3P3. The element Λ3 is
connected to O3, and so O3 2 L3 . Similarly to the above case, O3 ½2 L3 \ Oð4Þ is
connected to E3, which is expressed as E 3 = O~0 f - 1g so that detO~0 = - 1 in
(24.479) and det{-1} = - 1. Again, from Example 20.8, within a connected set the
sign of the determinant should be constant. Therefore, the elements of L0 cannot be
connected to the elements of L3 . The discussion equally applies to the relationship
between L1 and L2 .
Thus, after all, we get
24.9 Connectedness of the Lorentz Group 1255
Oð3, 1Þ = E0 SO0 ð3, 1Þ þ E 1 SO0 ð3, 1Þ þ E 2 SO0 ð3, 1Þ þ E 3 SO0 ð3, 1Þ: ð24:481Þ
Summarizing the above, we classify the connected components such that [8]:
The set {E0, E1, E2, E3} forms a group, which is isomorphic to the four-group we
dealt with in Sects. 17.1 and 17.2. In parallel with (20.328), we have
1 0 0 0
0 -1 0 0
E~1 ¼
0 0 -1 0
0 0 0 -1
The element E~1 represents the inversion symmetry (i.e., spatial inversion) and is
connected to E1. This can be shown as in the case of Example 20.6. Suppose a
following continuous matrix function:
1 0 0 0
0 cos πt - sin πt 0
f ðt Þ = :
0 sin πt cos πt 0
0 0 0 -1
1256 24 Basic Formalism
Then, f(t) is a continuous matrix function with respect to real t; f(t) represents
product transformations of mirror symmetry with respect to the xy-plane and rotation
about the z-axis. We have f(0) = E1 and f ð1Þ = E~1 . Hence, f(0) and f(1) are connected
to each other in L1 .
The element E1 (or E~1 ) is known as a discrete symmetry along with the element
E2 that represents the time reversal. The matrix C defined in (21.164), on the other
hand, is not a Lorentz transformation. This is known from Theorem 24.11. Namely,
C is unitary, and so if C belonged to O(3, 1), C would be described in a form of
(24.472). But it is not the case. This is associated with the fact that C represents the
exchange between a pair of particle and antiparticle. Note that this is not a space-time
transformation but an “inner” transformation.
As a result of Theorem 24.12 and on the basis of Theorem 20.10, we reach the
following important consequence: The Lie algebras oð3, 1Þ of O(3, 1) and so0 ð3, 1Þ,
of SO0(3, 1) are identical.
With the modulus of Λ00 , see (24.207). The special theory of relativity is based
upon L0 = O0 ð3, 1Þ where neither the spatial inversion nor time reversal is assumed.
As a tangible example, expðθA3 Þ of (24.224) and expð- ωB3 Þ of (24.225) satisfy
this criterion. Equation (24.223) reflects the polar decomposition. The representation
matrix S(Λ) of (24.293) with respect to the transformation of the Dirac spinor gives
another example of the polar decomposition. In this respect, we show some exam-
ples below.
Example 24.11: Lorentz Transformation Matrix In (24.223), expða AÞ and
expðb BÞ represent an orthogonal matrix and a positive definite symmetric matrix,
respectively. From Theorems 24.10 and 24.11, we have
~ of SO0(3, 1)
where SO(1) {1}. Any elements Λ SO(4) can therefore be described
by
1 0 0 0
0
ð24:485Þ
Since all these eigenvalues are positive and det expð- ωB3 Þ = 1, expð- ωB3 Þ is
positive definite (see Sect. 13.2) and the product of the eigenvalues of (24.486) is
1, as can be easily checked. Consequently, (24.223) is a (unique) polar
decomposition.
Example 24.12: Transformation Matrix S(Λ) of Dirac Spinor As already shown,
(24.293) describes the representation matrix S(Λ) with respect to the transformation
of the Dirac spinor caused by the Lorentz transformation. The RHS of (24.293) can
be decomposed such that
Sð Λ Þ ¼
iϕ θ iϕ θ
e 2 cos e 2 sin 0 0
2 2
iϕ θ iϕ θ
- e - 2 sin e - 2 cos 0 0
2 2
iϕ θ iϕ θ
0 0 e 2 cos e 2 sin
2 2
iϕ θ iϕ θ
0 0 - e - 2 sin e - 2 cos
2 2
ω ω
cosh 0 - sinh 0
2 2
ω ω
0 cosh 0 sinh
× 2 2 : ð24:487Þ
ω ω
- sinh 0 cosh 0
2 2
ω ω
0 sinh 0 cosh
2 2
The first factor of (24.487) is unitary and the second factor is a real symmetric
matrix. The eigenvalues of the second factor are positive. These are given by
ω ω ω
cosh þ sinh ðdoubly degenerateÞ, cosh
2 2 2
ω
- sinh ðdoubly degenerateÞ ð24:488Þ
2
with a determinant of 1 (>0). Hence, the second matrix is positive definite. Thus, as
in the case of Example 24.10, S(Λ) gives a unique polar decomposition.
Given the above tangible examples, we wish to generalize the argument a little bit
further. In Theorem 24.7 and Corollary 24.1 of Sect. 24.9.1 we mentioned two ways
of the polar decomposition. What about the inverse Lorentz transformation, then?
The inverse transformation of (24.424) is described by
1258 24 Basic Formalism
A - 1 = P - 1U - 1: ð24:489Þ
λ1 ⋯
U { PU = D ⋮ ⋱ ⋮ , ð24:490Þ
⋯ λn
where λ1, ⋯, λn are real positive numbers. Taking the inverse of (24.490), we obtain
1=λ1 ⋯
-1
U - 1P - 1 U{ = U{P - 1U = ⋮ ⋱ ⋮ : ð24:491Þ
⋯ 1=λn
Meanwhile, we have
{ -1
P-1 = P{ = P - 1: ð24:492Þ
Equations (24.491) and (24.492) imply that P-1 is a positive definite Hermitian
operator. Thus, the above argument ensures that the inverse transformation A-1 of
(24.489) can be dealt with on the same footing as the transformation A. In fact, in
both Examples 24.11 and 24.12, eigenvalues of the positive definite real symmetric
(i.e., Hermitian) matrices are unchanged, or switched to one another (consider e.g.,
exchange between ω and -ω) after the inversion.
We have known from Theorem 24.8 that an arbitrary continuous function g(t,
s) 2 SL(2, ℂ) is homotopic to U(t) 2 SU(2) and concluded that since SU(2) is simply
connected, SL(2, ℂ) is simply connected as well. As for the Lorentz group O(3, 1), on
the other hand, O(3, 1) is homotopic to O(t) 2 O(4), but since O(4) is not simply
connected, we infer that O(3, 1) is not simply connected, either. In the next section,
we investigate the representation of O(3, 1), especially that of the proper
orthochronous Lorentz group SO0(3, 1). We further investigate the connectedness
of the Lorentz group with this special but important case.
Amongst four connected components of the Lorentz group O(3, 1), the component
that is connected to the identity element E0 E does not contain the spatial inversion
or time reversal. This component itself forms a group termed SO0(3, 1) and fre-
quently appears in the elementary particle physics.
24.10 Representation of the Proper Orthochronous Lorentz. . . 1259
where (hij) is a complex (2, 2) Hermitian matrix. That is, V4 is a set of all (2, 2)
Hermitian matrices. The set V4 is a vector space on a real number field ℝ. To show
this, we express an arbitrary element H of V4 as
a c þ id
H= ða; b; c; d : realÞ: ð24:494Þ
c - id b
We put
1 0 0 1 0 i -1 0
σ0 = , σ1 = , σ2 = , σ3 = : ð24:495Þ
0 1 1 0 -i 0 0 1
Notice that σ k (k = 1, 2, 3) are the Pauli spin matrices that appeared in (20.41).
Also, σ k is expressed as
σ k = 2iζ k ðk = 1, 2, 3Þ,
1 1
H= ða þ bÞσ 0 þ ð- a þ bÞσ 3 þ cσ 1 þ dσ 2 : ð24:496Þ
2 2
1 1 1 1
p σ0, p σ1 , p σ2, p σ3: ð24:497Þ
2 2 2 2
x0 - x3 x1 þ ix2
H= x0 ; x1 ; x2 ; x3 : real , ð24:498Þ
x1 - ix2 x0 þ x 3
we have
1260 24 Basic Formalism
H = x0 σ 0 þ x1 σ 1 þ x2 σ 2 þ x3 σ 3 = xμ σ μ : ð24:499Þ
with
q= a
c
b
d
, detq = ad - bc = 1: ð24:501Þ
We have (qHq{){ = qH{q{ = qHq{. That is, qHq{ 2 V4 . Defining this mapping
as φ[q], we write
Also, we have
x0
x1
H ′ = φ½qðH Þ = ðσ 0 σ 1 σ 2 σ 3 Þφ½q
x2
x3
x0
x1
= ðσ 0 φ½q σ 1 φ½q σ 2 φ½q σ 3 φ½qÞ : ð24:505Þ
x2
x3
a b 1 0 a c
σ 0 φ½q = φ½qðσ 0 Þ = qσ 0 q{ =
c d 0 1 b d
jaj2 þjbj2 ac þbd
= a cþb d jcj2 þjdj2
: ð24:506Þ
1 2 1 2
H0 = aj2 þ b 2 þ cj2 þ d σ0 þ cj2 þ d 2 - aj2 - b σ3
2 2
þℜeðac þ bd Þσ 1 þ ℑmðac þ bd Þσ 2 : ð24:507Þ
2 2 2 2
detH = x0 - x1 - x2 - x3 : ð24:508Þ
Equation (24.7) is identical with the bilinear form introduced into the
Minkowski space 4 . We know from (24.500) and (24.501) that the transforma-
tion φ[q] keeps the determinant unchanged. That is, the bilinear form (24.508) is
invariant under this transformation. This implies that φ[q] is associated with the
Lorentz transformation.
To see whether this is really the case, we continue the calculations pertinent to
(24.507). In a similar manner, we obtain related results H′ after the transformations
φ[q](σ 1), φ[q](σ 2), and φ[q](σ 3). The results are shown collectively as [6]
ðσ 0 σ 1 σ 2 σ 3 Þφ½q
= ðσ 0 σ 1 σ 2 σ 3 Þ ×
1 1
jaj2 þ jbj2 þ jcj2 þ jd j2 Reðab þ cd Þ - Imðab þ cd Þ jbj2 þ jdj2 - jaj2 - jcj2
2 2
Reðac þ bd Þ Reðad þ bc Þ - Imðad - bc Þ Reðbd - ac Þ
:
Imðac þ bd Þ Imðad þ bc Þ Reðad - bc Þ Imðbd - ac Þ
1 1
jcj þ jd j2 - jaj2 - jbj2
2
Reðcd - ab Þ Imðab - cd Þ jaj2 þ jd j2 - jbj2 - jcj2
2 2
ð24:509Þ
Example 24.13 We may freely choose the parameters a, b, c, and d under the
restriction of (24.501). Let us choose b = c = 0 and a = eiθ/2 and d = e-iθ/2 in
(24.501). That is,
eiθ=2 0
q= : ð24:510Þ
0 e - iθ=2
1 0 0 0
0 cos θ - sin θ 0
φ½q = : ð24:511Þ
0 sin θ cos θ 0
0 0 0 1
This is just a rotation by θ around the z-axis. Next, we choose a = d = cosh (ω/2)
and b = c = - sinh (ω/2). Namely,
coshðω=2Þ - sinhðω=2Þ
q= : ð24:512Þ
- sinhðω=2Þ coshðω=2Þ
Then, we obtain
coshω - sinhω 0 0
- sinhω coshω 0 0
φ½ q = : ð24:513Þ
0 0 1 0
0 0 0 1
ω ω
Moreover, if we choose b = c = 0 and a = cosh 2 - sinh 2 and d =
cosh ω2 þ sinh ω2 in (24.501), we have
ω ω
cosh - sinh 0
q= 2 2 : ð24:514Þ
ω ω
0 cosh þ sinh
2 2
Then, we get
coshω 0 0 sinhω
φ½ q = 0 1 0 0 : ð24:515Þ
0 0 1 0
sinhω 0 0 coshω
The latter two illustrations represent Lorentz boosts. We find that the (2, 2) matrix
of (24.510) belongs to SU(2) and that the (2, 2) matrices of (24.512) and (24.514) are
positive definite symmetric matrices. In terms of Theorem 24.7 and Corollary 24.3,
24.10 Representation of the Proper Orthochronous Lorentz. . . 1263
these matrices have already been decomposed. If we expressly (or purposely) write
in a decomposed form, we may do so with the above examples such that
eiθ=2 0 1 0
q= , ð24:516Þ
0 e - iθ=2 0 1
1 0 coshðω=2Þ - sinhðω=2Þ
q= : ð24:517Þ
0 1 - sinhðω=2Þ coshðω=2Þ
where F is the kernel and e is the identity element. To show this, it was important to
show the existence of the kernel F . In the present case, we wish to show the
characteristic of the kernel N with respect to the homomorphism mapping φ, namely
group that contains the identity element E0. Then, according to Corollary 24.4, we
have
As already seen in (24.509), φ[q] [q 2 SL(2, ℂ)] has given the matrix represen-
tation of the Lorentz group SO0(3, 1). This seems to imply at first glance that
Now, we have a question of what kind of property φ has (see Sect. 16.4). In terms
of the mapping, if we have
f ð x1 Þ = f ð x 2 Þ ⟹ x1 = x2 , ð24:523Þ
the mapping f is injective (Sect. 11.2). If (24.523) is not true, f is not injective, and so
f is not isomorphic either. Therefore, our question can be translated into whether or
not the following proposition is true of φ:
φ ½ q = φ½ σ 0 ⟹ q = σ0: ð24:524Þ
Meanwhile, φ[σ 0] gives the identity mapping (Sect. 16.4). In fact, we have
Consequently, our question boils down to the question of whether or not we can
find q (≠σ 0) for which φ[q] gives the identity mapping. If we find such q, φ[q] is not
isomorphic, but has a non-trivial kernel. Let us get back to our current issue. We
have
φ½ - σ 0 ðH Þ = ð- σ 0 ÞH - σ 0 { = σ 0 Hσ 0 = H: ð24:526Þ
Then, φ[-σ 0] gives the identity mapping as well; that is, φ[-σ 0] = φ[σ 0]. Equation
(24.24) does not hold with φ accordingly. Therefore, according to Theorem 16.2 φ is
not isomorphic. To fully characterize the representation φ, we must seek element
(s) other than -σ 0 as a candidate for the kernel.
24.10 Representation of the Proper Orthochronous Lorentz. . . 1265
φ½qðH Þ = qHq{ = H:
In a special case where we choose σ 0 (2H ), the following relation must hold:
where the last equality comes from the supposition that φ[q] is contained in the
kernel N . This means that q is unitary. That is, q{ = q-1. Then, for any H 2 V4 we
have
z = x þ iy, x, y 2 V4 , ð24:529Þ
which will be proven in Theorem 24.14 given just below. Then, from both (24.528)
and (24.529), q must be commutative with any (2, 2) complex matrices. Let C be a
set of all the (2, 2) complex matrices described as
Then, since C ⊃ SLð2, ℂÞ, it follows that q is commutative with 8g 2 SL(2, ℂ).
Consequently, according to Schur’s Second Lemma q must be expressed as
q = σ0 or q = - σ0: ð24:531Þ
Thus, we obtain
N = fσ 0 , - σ 0 g: ð24:532Þ
Any other elements belonging to V4 are excluded from the kernel N . This
implies that φ is a two-to-one homomorphic mapping. At the same time, from
Theorem 16.4 (Homomorphism Theorem) we obtain
1266 24 Basic Formalism
z = x þ iy x, y 2 V4 : ð24:534Þ
a þ ib c þ id
z= ða; b; c; d; p; q; r; s 2 ℝÞ: ð24:535Þ
p þ iq r þ is
Then, z is written as
1
z= ½ða þ rÞσ 0 þ ð- a þ r Þσ 3 þ ðc þ pÞσ 1 þ ðd - qÞσ 2
2
i
þ ½ðb þ sÞσ 0 þ ð- b þ sÞσ 3 þ ðd þ qÞσ 1 þ ð- c þ pÞσ 2 , ð24:536Þ
2
Theorem 24.15 [8] Let Z be any arbitrary element of slð2, ℂÞ. Then, z is uniquely
described as
Z = X 0 þ iY 0 ½X 0 , Y 0 2 suð2Þ: ð24:539Þ
ðX 0 - X Þ þ iðY 0 - Y Þ = 0: ð24:540Þ
- ðX 0 - X Þ þ iðY 0 - Y Þ = 0: ð24:541Þ
2iðY 0 - Y Þ = 0 or Y 0 = Y: ð24:542Þ
2ð X 0 - X Þ = 0 or X 0 = X: ð24:543Þ
Equations (24.40) and (24.41) indicate that the representation (24.537) is unique.
These complete the proof.
The fact that an element of slð2, ℂÞ is traceless is shown as follows: From (15.41),
we have
where with the first equality we used (15.41) with A replaced with tA; the last
equality came from Definition 24.5. This implies that TrtA = tTrA = 0. That is,
A is traceless.
Theorem 24.15 gives an outstanding feature to slð2, ℂÞ. That is, Theorem 24.15
tells us that the elements belonging to slð2, ℂÞ can be decomposed into a matrix
belonging to suð2Þ and that belonging to the set i suð2Þ. In turn, the elements of
slð2, ℂÞ can be synthesized from those of suð2Þ. With suð2Þ, an arbitrary vector is
described as a linear combination on a real field ℝ. In fact, let X be a vector on suð2Þ.
Then, we have
ic a - ib
X= ða; b; c 2 ℝÞ: ð24:546Þ
- a - ib - ic
Defining the basis vectors of suð2Þ by the use of the matrices defined in (20.270)
as
0 -i 0 1 i 0
ζ1 = , ζ2 = , ζ3 = , ð24:547Þ
-i 0 -1 0 0 -i
we describe X as
a þ ib c þ id
Z= ða; b; c; d; p; q 2 ℝÞ: ð24:549Þ
p þ iq - a - ib
1 i
Z = ðb - iaÞζ~3 þ ðc - pÞ þ ðd - qÞ ζ~2
2 2
1 i
þ - ðd þ qÞ þ ðc þ pÞ ζ~1 : ð24:550Þ
2 2
Thus, we find that slð2, ℂÞ is a complex vector space. If in general a complex Lie
algebra g has a real Lie subalgebra h (i.e., h ⊂ g) and an arbitrary element Z 2 g is
uniquely expressed as
suð2Þ and slð2, ℂÞ is evident. Note that in the case of slð2, ℂÞ, X is anti-Hermitian
and iY is Hermitian. Defining the operators ~ζ k such that
ζ~k , ζ~l = ~
3
E ζ ,
m = 1 klm m
ð24:553Þ
~ζ , ~ζ = - 3
E ζ , ~ ð24:554Þ
k l m = 1 klm m
ζ~k , ~ζ l =
3
E ~ζ : ð24:555Þ
m = 1 klm m
References
1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
3. Birkhoff G, Mac Lane S (2010) A survey of modern algebra, 4th edn. CRC Press, Boca Raton
4. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York
5. Rosén A (2019) Geometric multivector analysis. Birkhäuser, Cham
6. Hirai T (2001) Linear algebra and representation of groups II. Asakura Publishing, Tokyo.
(in Japanese)
7. Kugo C (1989) Quantum theory of gauge field I. Baifukan, Tokyo. (in Japanese)
8. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo.
(in Japanese)
9. Hotta S (2023) A method for direct solution of the Dirac equation. Bull Kyoto Inst Technol 15:
27–46
10. Itzykson C, Zuber J-B (2005) Quantum field theory. Dover, New York
11. Sakamoto M (2014) Quantum field theory. Shokabo, Tokyo. (in Japanese)
12. Goldstein H, Poole C, Safko J (2002) Classical mechanics, 3rd edn. Addison Wesley, San
Francisco
13. Schweber SS (1989) An introduction to relativistic quantum field theory. Dover, New York
14. Mirsky L (1990) An Introduction to linear algebra. Dover, New York
15. Hassani S (2006) Mathematical physics. Springer, New York
Chapter 25
Advanced Topics of Lie Algebra
Abstract In Chap. 20, we dealt with the Lie groups and Lie algebras within the
framework of the continuous groups. With the Lie groups the operation is multipli-
cative (see Part IV), whereas the operation of the Lie algebras is additive since the
Lie algebra forms a vector space (Part III). It is of importance and interest, however,
that the representation is multiplicative in both the Lie groups and Lie algebras. So
far, we have examined the properties of the representations of the groups (including
the Lie group), but little was mentioned about the Lie algebra. In this chapter, we
present advanced topics of the Lie algebra with emphasis upon its representation.
Among those topics, the differential representation of Lie algebra is particularly
useful. We can form a clear view of the representation theory by contrasting the
representation of the Lie algebra with that of the Lie group. Another topic is the
complexification of the Lie algebra. We illustrate how to extend the scope of
application of Lie algebra through the complexification with actual examples. In
the last part, we revisit the topic of the coupling of angular momenta and compare the
method with the previous approach based on the theory of continuous groups.
25.1.1 Overview
(ii)
σ ðX þ Y Þ = σ ðX Þ þ σ ðY Þ, ð25:2Þ
(iii)
σ ðXY Þ = σ ðX Þσ ðY Þ: ð25:3Þ
σ : g→V
or
g 3 X ° σ ðX Þ 2 V: ð25:4Þ
The relationship (iii) implies that the σ(X) is a homomorphic mapping on g. In the
above definition, if V of (25.4) is identical to g, σ(X) is an endomorphism
(Chap. 11). Next, we think of the mapping σ(X), where X represents a commutator.
We have
where with the second equality we used (25.2); with the third equality we used
(25.3). That is,
(iv)
σ ð½X, Y Þ = ½σ ðX Þ, σ ðY Þ: ð25:5Þ
1
lim ½ρðexp tX Þ - ρðexp 0Þ = σ ðX Þ lim ½exp tσ ðX Þ: ð25:7Þ
t→0 t t→0
where e and E are an identity element of ℊ and ρ(ℊ), respectively. Then, (25.7) can
be rewritten as
1
σ ðX Þ = lim ½ρðexp tX Þ - E : ð25:9Þ
t→0 t
Thus, the relationship between the Lie group ℊ and its representation space L S
[see (18.36)] can be translated into that between the Lie algebra g and the associated
vector space V. In other words, as a whole collection consisting of X forms the Lie
algebra g (i.e., a vector space g), so a whole collection consisting of σ(X) forms the
Lie algebra on the vector space V. We summarize the above statements as a
following definition.
Definition 25.2 Suppose that we have a relationship described by
σ dρ: ð25:10Þ
To specify the representation space that is associated with the Lie group and Lie
algebra, we normally write the combination of the representation σ or ρ and its
associated representation space V as [1]
Notice that the same representation space V has been used in (25.11) for both the
representations σ and ρ relevant to the Lie algebra and Lie group, respectively.
One of the most important representations of the Lie algebras is an adjoint repre-
sentation. In Sect. 20.4.3, we defined the adjoint representation for the Lie group. In
the present section, we deal with the adjoint representation for the Lie algebra. It is
1274 25 Advanced Topics of Lie Algebra
σ ðX Þ ad½X , ð25:12Þ
where we used an angled bracket with RHS in accordance with Sect. 20.4.3. The
operator ad[X] operates on an element Y of the Lie algebra such that
gX ðt Þ = Ad½exp tX : ð25:15Þ
That is, gX is an adjoint representation of the Lie group ℊ on the Lie algebra g.
Notice that we are using (25.15) for ρ(exptX) of (25.6). Then, according to (25.6) and
(25.9), we have
d
ad½X ðY Þ = g0X ð0ÞY = ðexp tX ÞY exp tXÞ - 1 t=0
dt
d
= ½ðexp tX ÞY ðexp - tX Þt = 0 = XY - YX = ½X, Y , ð25:16Þ
dt
where with the third equality we used (15.28) and with the second to the last equality
we used (15.4). This completes the proof.
Equation (25.14) can be seen as a definition of the adjoint representation of the
Lie algebra. We can directly show that the operator ad[X] is indeed a representation
of g as follows: That is, with any arbitrary element Z 2 g we have
where with the third equality we used Jacobi’s identity (20.267). Comparing the first
and last sides of (25.17), we get
25.1 Differential Representation of Lie Algebra [1] 1275
Equation (25.18) holds with any arbitrary element Z 2 g. This implies that
From (25.5), (25.19) satisfies the condition for a representation of the Lie algebra
g.
We know that if X, Y 2 g, ½X, Y 2 g (see Sect. 20.4). Hence, we have a following
relationship described by
or
ad½X : g → g: ð25:21Þ
Also, we have
0 -1 0
ðζ 1 ζ 2 ζ 3 Þad½ζ 3 = ðζ 1 ζ 2 ζ 3 Þ 1 0 0 = ðζ 1 ζ 2 ζ 3 ÞA3 : ð25:25Þ
0 0 0
Similarly, we obtain
1276 25 Advanced Topics of Lie Algebra
0 0 0
ðζ 1 ζ 2 ζ 3 Þad½ζ 1 = ðζ 1 ζ 2 ζ 3 Þ 0 0 -1 = ðζ 1 ζ 2 ζ 3 ÞA1 , ð25:26Þ
0 1 0
0 0 1
ðζ 1 ζ 2 ζ 3 Þad½ζ 2 = ðζ 1 ζ 2 ζ 3 Þ 0 0 0 = ðζ 1 ζ 2 ζ 3 ÞA2 : ð25:27Þ
-1 0 0
The (3, 3) matrices of RHS of (25.25), (25.26), and (25.27) are identical to A3, A1,
and A2 of (20.274), respectively. Equations (25.25)-(25.27) show that the differen-
tial mapping ad transforms the basis vectors ζ1, ζ 2, and ζ 3 of suð2Þ into basis vectors
A1, A2, and A3 of soð3Þ ½ = oð3Þ, respectively (through the medium of ζ 1, ζ 2,
and ζ 3). From the discussion of Sect. 11.4, the transformation ad is bijective.
Namely, suð2Þ and soð3Þ are isomorphic to each other.
In Fig. 25.1 we depict the isomorphism ad between suð2Þ and soð3Þ as
Notice that the mapping between two Lie algebras suð2Þ and soð3Þ represented in
(25.28) is in parallel with the mapping between two Lie groups SU(2) and SO(3) that
is described by
In the present example, we have chosen suð2Þ for V3 ; see Sect. 20.4.3.
In (25.6), replacing ρ, t, X, and σ with Ad, α, ζ 3, and ad, respectively, we get
In fact, comparing (20.28) with (20.295), we find that both sides of the above
equation certainly give a same matrix expressed as
cos α - sin α 0
sin α cos α 0 :
0 0 1
X i , X j = f ij l X l : ð25:31Þ
The term [Xi, [Xj, Xk]] can be rewritten by the use of the structure constants such
that
X i , f jk l X l = f jk l ½X i , X l = f jk l f il m X m :
f jk l f il m
þ f ki l f jl m
þ f ij l f kl m X m = 0:
f jk l f il m
þ f ki l f jl m
þ f ij l f kl m = 0: ð25:33Þ
Taking account of the fact that f ki l is antisymmetric with respect to the indices
k and i, (25.33) is rewritten as
f jk l f il m
- f ik l f jl m
= f ij l f lk m : ð25:34Þ
If in (25.34) we define f_jk^l as
\[
f_{jk}{}^{l} \equiv (\sigma_j)^{l}{}_{k}, \qquad (25.35)
\]
then the operator σj gives a representation of the Lie algebra. Note that in (25.35) (σj)^l_k indicates the (l, k) element of the matrix σj. By the use of (25.35), (25.34) can be rewritten as
\[
(\sigma_i)^{m}{}_{l}\, (\sigma_j)^{l}{}_{k} - (\sigma_j)^{m}{}_{l}\, (\sigma_i)^{l}{}_{k} = f_{ij}{}^{l}\, (\sigma_l)^{m}{}_{k}, \qquad (25.36)
\]
that is, in the matrix notation,
\[
[\sigma_i, \sigma_j] = f_{ij}{}^{l}\, \sigma_l. \qquad (25.37)
\]
Equation (25.37) has virtually the same form as (25.31) and (20.269). Hence, we find that (25.35) gives a representation of the Lie algebra whose structure is represented by (25.37) or (20.269). Using the notation (25.14), we get
\[
\mathrm{ad}[\sigma_i](\sigma_j) = [\sigma_i, \sigma_j] = f_{ij}{}^{l}\, \sigma_l. \qquad (25.38)
\]
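As an illustration of (25.35)-(25.38), the sketch below (an example of ours, not the book's code) takes the structure constants of so(3), f_ij^l = ε_ijl, builds the matrices (σj)^l_k = f_jk^l, and confirms the commutation relation (25.37); the resulting σj coincide with the Aj of (20.274).

```python
# A short check of (25.35)-(25.38) for so(3); index conventions here are illustrative.
import numpy as np
from itertools import permutations

eps = np.zeros((3, 3, 3))
for p, sign in zip(permutations(range(3)), (1, -1, -1, 1, 1, -1)):
    eps[p] = sign                                # Levi-Civita symbol epsilon_ijl

f = eps                                          # f[i, j, l] = f_ij^l for so(3)
sigma = np.array([f[j].T for j in range(3)])     # (sigma_j)[l, k] = f_jk^l, cf. (25.35)

def comm(a, b):
    return a @ b - b @ a

ok = all(
    np.allclose(comm(sigma[i], sigma[j]), np.einsum('l,lmk->mk', f[i, j], sigma))
    for i in range(3) for j in range(3)
)
print(ok)   # True: [sigma_i, sigma_j] = f_ij^l sigma_l, reproducing (25.37)
```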
Example 25.2 concerns the Lie group SL(2, ℂ), whose element q is expressed as
\[
q = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \det q = ad - bc = 1. \qquad (24.501)
\]
In parallel with (25.16), the differential representation dφ operates on an element y of V4 such that
\[
d\varphi[X](y) = f_X'(0)(y)
= \frac{d}{dt}\Big[(\exp tX)\, y\, (\exp tX)^{\dagger}\Big]\Big|_{t=0}
= Xy + yX^{\dagger}.
\]
With the basis vectors σ0, σ1, σ2, and σ3 of V4, we obtain
\[
(\sigma_0\ \sigma_1\ \sigma_2\ \sigma_3)\, d\varphi[\zeta_3]
= (\sigma_0\ \sigma_1\ \sigma_2\ \sigma_3)
\begin{pmatrix} 0&0&0&0 \\ 0&0&-1&0 \\ 0&1&0&0 \\ 0&0&0&0 \end{pmatrix}. \qquad (25.42)
\]
Similarly, we obtain
\[
(\sigma_0\ \sigma_1\ \sigma_2\ \sigma_3)\, d\varphi[\zeta_1]
= (\sigma_0\ \sigma_1\ \sigma_2\ \sigma_3)
\begin{pmatrix} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&-1 \\ 0&0&1&0 \end{pmatrix}, \qquad (25.43)
\]
\[
(\sigma_0\ \sigma_1\ \sigma_2\ \sigma_3)\, d\varphi[\zeta_2]
= (\sigma_0\ \sigma_1\ \sigma_2\ \sigma_3)
\begin{pmatrix} 0&0&0&0 \\ 0&0&0&1 \\ 0&0&0&0 \\ 0&-1&0&0 \end{pmatrix}. \qquad (25.44)
\]
On the basis of Theorem 24.15, we know that sl(2, ℂ) is spanned by six linearly independent basis vectors. That is, we have
\[
\mathfrak{sl}(2, \mathbb{C}) = \mathrm{Span}\{\zeta_1, \zeta_2, \zeta_3, i\zeta_1, i\zeta_2, i\zeta_3\}. \qquad (25.45)
\]
Correspondingly, we obtain
\[
(\sigma_0\ \sigma_1\ \sigma_2\ \sigma_3)\, d\varphi[i\zeta_3]
= (\sigma_0\ \sigma_1\ \sigma_2\ \sigma_3)
\begin{pmatrix} 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 1&0&0&0 \end{pmatrix}, \qquad (25.46)
\]
\[
(\sigma_0\ \sigma_1\ \sigma_2\ \sigma_3)\, d\varphi[i\zeta_1]
= (\sigma_0\ \sigma_1\ \sigma_2\ \sigma_3)
\begin{pmatrix} 0&1&0&0 \\ 1&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix}, \qquad (25.47)
\]
\[
(\sigma_0\ \sigma_1\ \sigma_2\ \sigma_3)\, d\varphi[i\zeta_2]
= (\sigma_0\ \sigma_1\ \sigma_2\ \sigma_3)
\begin{pmatrix} 0&0&1&0 \\ 0&0&0&0 \\ 1&0&0&0 \\ 0&0&0&0 \end{pmatrix}. \qquad (25.48)
\]
Equations (25.42), (25.43), and (25.44) are related to the spatial rotations and (25.46), (25.47), and (25.48) to the Lorentz boosts. All these six (4, 4) real matrices are identical to those given in (24.220). Equations (25.42)-(25.48) show that the differential mapping dφ transforms the basis vectors ζ1, ζ2, ζ3, iζ1, iζ2, iζ3 of sl(2, ℂ) into the basis vectors A = (A1, A2, A3), B = (B1, B2, B3) of so0(3, 1), respectively (through the medium of σ0, σ1, σ2, and σ3). Whereas sl(2, ℂ) consists of complex matrices and is six-dimensional when regarded as a real vector space, so0(3, 1) is a six-dimensional real vector space. Once again, from the discussion of Sect. 11.4 the transformation dφ is bijective. Namely, sl(2, ℂ) and so0(3, 1) [i.e., the Lie algebras corresponding to the Lie groups SL(2, ℂ) and SO0(3, 1)] are isomorphic to each other.
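The computation of Example 25.2 can be mimicked numerically as follows. The sketch assumes the standard Pauli matrices and ζk = -(i/2)σk (illustrative conventions of ours; the book's own sign conventions are fixed in Chaps. 20 and 24) and expands dφ[X](y) = Xy + yX† in the basis (σ0, σ1, σ2, σ3), which reproduces 4 × 4 generators of the type shown in (25.42) and (25.46).

```python
# A numerical sketch of Example 25.2 (illustrative assumptions, not the book's code).
import numpy as np

s0 = np.eye(2, dtype=complex)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
sig = [s0, s1, s2, s3]
zeta = [-0.5j * s for s in (s1, s2, s3)]          # assumed su(2) basis

def dphi(X):
    """Differential of y -> q y q^dagger at q = exp(tX), i.e., y -> X y + y X^dagger."""
    return lambda y: X @ y + y @ X.conj().T

def matrix_of(op):
    """4x4 matrix of op in the basis (sigma_0,...,sigma_3); coefficients via (1/2)Tr."""
    cols = []
    for nu in range(4):
        h = op(sig[nu])
        cols.append([0.5 * np.trace(sig[mu] @ h).real for mu in range(4)])
    return np.array(cols).T

for name, X in [("zeta_3", zeta[2]), ("i*zeta_3", 1j * zeta[2])]:
    print(name, "\n", np.round(matrix_of(dphi(X)), 6))
# The first printed matrix is the rotation generator of (25.42); the second is the
# boost generator of (25.46).
```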
Thus, the implication of Examples 25.1 and 25.2 is evident: the differential representation (e.g., ad and dφ in the present case) allows us to get a clear view of the representation theory of the Lie algebra. In Sect. 24.1, we had a relation of the same form, and we thus recover the same expression as (24.223). Notice that the parameters -ωi (i = 1, 2, 3) have the minus sign. This comes from the conventional definition of the direction of the Lorentz boost.
Example 25.3 In Example 25.2, through the complexification we extended the Lie algebra su(2) = Span{ζ1, ζ2, ζ3} to sl(2, ℂ) spanned as in (25.45). The basis vector iζ3 can be read as
\[
i\zeta_3 = \frac{1}{2} \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}.
\]
Hence, we have
\[
\exp(\omega\, i\zeta_3) =
\begin{pmatrix}
\cosh\dfrac{\omega}{2} - \sinh\dfrac{\omega}{2} & 0 \\
0 & \cosh\dfrac{\omega}{2} + \sinh\dfrac{\omega}{2}
\end{pmatrix}.
\]
Correspondingly, we get
\[
\varphi(\exp \omega\, i\zeta_3) =
\begin{pmatrix}
\cosh\omega & 0 & 0 & \sinh\omega \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\sinh\omega & 0 & 0 & \cosh\omega
\end{pmatrix},
\]
together with
\[
\omega\, d\varphi[i\zeta_3] =
\begin{pmatrix}
0 & 0 & 0 & \omega \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
\omega & 0 & 0 & 0
\end{pmatrix}.
\]
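As a numerical confirmation (a sketch of ours, assuming numpy and scipy are available), exponentiating the boost generator dφ[iζ3] of (25.46) indeed reproduces the Lorentz boost matrix φ(exp ω iζ3) displayed above.

```python
# Checking exp(omega * dphi[i*zeta_3]) = phi(exp omega*i*zeta_3) numerically (sketch).
import numpy as np
from scipy.linalg import expm

omega = 0.7
K3 = np.array([[0, 0, 0, 1],
               [0, 0, 0, 0],
               [0, 0, 0, 0],
               [1, 0, 0, 0]], dtype=float)   # dphi[i*zeta_3], cf. (25.46)

boost = expm(omega * K3)
expected = np.array([[np.cosh(omega), 0, 0, np.sinh(omega)],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [np.sinh(omega), 0, 0, np.cosh(omega)]])
print(np.allclose(boost, expected))   # True
```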
Theorem 25.2 Let ρ be a representation of a Lie group G and let N be the kernel of ρ; i.e.,
\[
N = \{ g \in G;\ \rho(g) = e \}.
\]
(i) Then, we have
\[
G/N \cong \rho(G).
\]
(ii) Let n be the Lie algebra of N. Then, n is the kernel of the homomorphism mapping σ = dρ of g. That is,
\[
\mathfrak{n} = \{ X \in \mathfrak{g};\ \sigma(X) = 0 \}. \qquad (25.51)
\]
Proof (i) We have already proven (i) in the proof of Theorem 16.4 (Homomorphism Theorem). With (ii), the proof is as follows: Regarding ∀X ∈ n and any real number t, exp tX belongs to N, and so we have
\[
\rho(\exp tX) = \exp[t\sigma(X)] = e.
\]
Differentiating this relation with respect to t, we get
\[
0 = \sigma(X)\,[\exp t\sigma(X)].
\]
Taking the limit t → 0, we have 0 = σ(X). That is, n comprises the elements X that satisfy (25.51). This completes the proof.
Now, we come back to the proof of Theorem 24.13. We wish to consider (dφ, V4) in correspondence with (φ, V4). Let us start a discussion about the kernel N of φ. According to Theorem 24.13, it was given as
\[
N = \{ \sigma_0, -\sigma_0 \}. \qquad (24.532)
\]
Meanwhile, using (25.42)-(25.48) we have
\[
d\varphi[a\zeta_1 + b\zeta_2 + c\zeta_3 + d\,i\zeta_1 + e\,i\zeta_2 + f\,i\zeta_3]
= \begin{pmatrix}
0 & d & e & f \\
d & 0 & -c & b \\
e & c & 0 & -a \\
f & -b & a & 0
\end{pmatrix}.
\]
From (25.45), aζ1 + ⋯ + f iζ3 (a, ⋯, f : real) represents all the elements of sl(2, ℂ). For the RHS to be the zero element 0′ of so0(3, 1), we need a = ⋯ = f = 0. This implies that
\[
d\varphi^{-1}(0') = \{ 0 \},
\]
where 0 denotes the zero element of sl(2, ℂ); dφ⁻¹ is the inverse transformation of dφ. The zero element 0′ is represented by a (4, 4) matrix. An explicit form is given by
\[
0' = \begin{pmatrix} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix}.
\]
Hence, the mapping
\[
d\varphi : \mathfrak{sl}(2, \mathbb{C}) \longrightarrow \mathfrak{so}_0(3, 1)
\]
is a bijective mapping (see Sect. 11.2). From Theorem 25.2, with respect to the kernel n of dφ we have
\[
\mathfrak{n} = \{ 0 \}, \qquad 0 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
This relationship is in parallel with that between su(2) and so(3); see Example 25.1. In Fig. 25.2 we depict the isomorphism dφ between sl(2, ℂ) and so0(3, 1).
Meanwhile, Theorem 20.11 guarantees that ∀g ∈ SO0(3, 1) can be described by exponential functions of X1, X2, ⋯, and X6, where X1, X2, ⋯, and X6 can be chosen from among the (4, 4) matrices of, e.g., (25.42), (25.43), (25.44), (25.46), (25.47), and (25.48). With ∀q ∈ SL(2, ℂ), q can be expressed in a corresponding manner. Since from the above discussion there is the bijective mapping (or isomorphism) between sl(2, ℂ) and so0(3, 1), we may identify X1, ⋯, X6 with dφ[ζ1], ⋯, dφ[iζ3], respectively; see (25.42)-(25.48). Thus, we obtain the correspondence between the expressions for q and g.
Figure 25.3 shows the homomorphic mapping φ from SL(2, ℂ) to SO0(3, 1) with the kernel N = {σ0, -σ0}. In correspondence with (25.29) of Example 25.1, we have
\[
(d\varphi, V_4) \Longleftrightarrow (\varphi, V_4). \qquad (25.57)
\]
Note that in Example 25.2 the representation space V4 has been defined as a four-dimensional real vector space given in (24.493).
Comparing Figs. 25.1 and 25.2 with Figs. 25.3 and 20.6, we conclude the
following: (i) The differential representations ad and dφ exhibit the isomorphic
mapping between the Lie algebras, whereas the representations Ad and φ indicate
the homomorphic mapping between the Lie groups. (ii) This enables us to deal with
the relationship between SL(2, ℂ) and SO0(3, 1) and that between SU(2) and SO(3) in
parallel. Also, we are allowed to deal with the relation between sl(2, ℂ) and so0(3, 1)
and that between su(2) and so(3) in parallel. Table 25.1 summarizes the properties
of several (differential) representations associated with the Lie algebras and
corresponding Lie groups.
The Lie algebra is a special kind of vector space. In terms of the vector space, it is
worth taking a second look at the basic concept of Chaps. 11 and 20. In Sect. 25.1,
we have mentioned the properties of the differential representation of the Lie algebra
g on a certain vector space V. That is, we have had
\[
\sigma : \mathfrak{g} \longrightarrow V. \qquad (25.4)
\]
These features not only enrich the properties of Lie groups and Lie algebras but
also make various representations including differential representation easy to use
and broaden the range of applications.
As already seen in Sect. 20.4, so(3) and su(2) have the same structure constants. Getting back to Theorem 11.3 (Dimension Theorem), we consider the linear mapping
\[
M : \mathfrak{su}(2) \longrightarrow \mathfrak{so}(3).
\]
Equations (20.273) and (20.276) clearly show that both su(2) and so(3) are spanned by three basis vectors; namely, the dimension of both su(2) and so(3) is three. Hence, (11.45) should be translated into
\[
\dim \mathfrak{su}(2) = \dim \operatorname{Ker} M + \dim M[\mathfrak{su}(2)] = 3. \qquad (25.58)
\]
Also, we have
\[
\dim M[\mathfrak{su}(2)] = \dim \mathfrak{so}(3) = 3.
\]
This naturally leads to dim Ker M = 0 in (25.58). From Theorem 11.4, in turn, we conclude that M is a bijective mapping, i.e., an isomorphism. In this situation, we expect to have the homomorphism between the corresponding Lie groups. This was indeed the case with SU(2) and SO(3) (Theorem 20.7).
By the same token, in this section we have shown that there is a one-to-one correspondence between sl(2, ℂ) and so0(3, 1). Corresponding to (25.58), we have
\[
\dim \mathfrak{sl}(2, \mathbb{C}) = \dim \operatorname{Ker} d\varphi + \dim d\varphi[\mathfrak{sl}(2, \mathbb{C})] = 6,
\]
where dim Ker dφ = 0 and dim dφ[sl(2, ℂ)] = dim so0(3, 1) = 6.
In this section, we wish to take the concept of complexification a step further. In this
context, we deal with the Cartan-Weyl basis that is frequently used in quantum
mechanics and the quantum theory of fields. In our case, the Cartan-Weyl basis has
already appeared in Sect. 3.5, even though we did not mention that terminology.
25.2.1 Complexification
We focus on the application of the Cartan-Weyl basis to so(3) [or o(3)] so as to widen the vector space so(3) by its "complexification" (see Sect. 24.10). The Lie algebra so(3) is a vector space spanned by three basis vectors whose commutation relation is described by (25.31), where Xi (i = 1, 2, 3) are the basis vectors, which are not commutative with one another. We know that so(3) is isomorphic to su(2). In the case of su(2), the complexification of su(2) enables us to deal with many problems within the framework of sl(2, ℂ).
Here, we construct the Cartan-Weyl basis from the elements of so(3) [or o(3)] by extending so(3) to o(3, ℂ) through the complexification. Note that o(3, ℂ) comprises complex skew-symmetric matrices. In Sects. 20.1 and 20.2 we converted the Hermitian operators to the anti-Hermitian operators belonging to so(3) such that, e.g.,
\[
A_z \equiv -i M_z. \qquad (20.15)
\]
These give a clue to how to address the problem. In an opposite way, we get
\[
i A_z = M_z. \qquad (25.64)
\]
The operators M1, M2, and M3, together with A1, A2, and A3, construct o(3, ℂ). That is, we have
\[
\begin{pmatrix}
0 & a+bi & c+di \\
-(a+bi) & 0 & f+gi \\
-(c+di) & -(f+gi) & 0
\end{pmatrix}
=
\begin{pmatrix}
0 & a & c \\
-a & 0 & f \\
-c & -f & 0
\end{pmatrix}
+
\begin{pmatrix}
0 & bi & di \\
-bi & 0 & gi \\
-di & -gi & 0
\end{pmatrix}
=
\begin{pmatrix}
0 & a & c \\
-a & 0 & f \\
-c & -f & 0
\end{pmatrix}
+ i
\begin{pmatrix}
0 & b & d \\
-b & 0 & g \\
-d & -g & 0
\end{pmatrix}. \qquad (25.66)
\]
In other words, an arbitrary element U of o(3, ℂ) is expressed as
\[
U = V + iW \quad [V, W \in \mathfrak{so}(3)].
\]
This implies that o(3, ℂ) is a complexification of so(3). Or, we may say that so(3) is a real form of o(3, ℂ).
Following custom, we define F+ [∈ o(3, ℂ)] and F− [∈ o(3, ℂ)] from M1 and M2 and choose F+ and F− as the basis vectors of o(3, ℂ) instead of M1 and M2. We further define the operators J(+) and J(−) in accordance with (3.72).
Thus, we have gotten back to the discussion made in Sects. 3.4 and 3.5 and can directly apply it to the present situation. Although J(+), J(−), and M3 are formed by linear combinations of the elements of so(3), these operators are not contained in so(3) but in o(3, ℂ). The vectors J(+), J(−), and M3 form a subspace of o(3, ℂ). We term this subspace W, for which these three vectors constitute a basis set. This basis set is referred to as the Cartan-Weyl basis and we have
\[
[M_3, J^{(+)}] = J^{(+)}, \quad
[M_3, J^{(-)}] = -J^{(-)}, \quad
[J^{(+)}, J^{(-)}] = 2 M_3. \qquad (25.68)
\]
Note that the commutation relations of (25.68) have already appeared in (3.74) in the discussion of the generalized angular momentum.
Using \(\tilde{J}^{(\pm)} \equiv J^{(\pm)}/\sqrt{2}\), we obtain the "normalized" relations expressed as
\[
[M_3, \tilde{J}^{(+)}] = \tilde{J}^{(+)}, \quad
[M_3, \tilde{J}^{(-)}] = -\tilde{J}^{(-)}, \quad
[\tilde{J}^{(+)}, \tilde{J}^{(-)}] = M_3. \qquad (25.69)
\]
From (25.68) and (25.69), we find that W forms a subalgebra of o(3, ℂ). At the same time, from (25.14) the differential representation ad with respect to W diagonalizes ad[M3]. That is, with the three-dimensional representation, using (25.68) or (25.69) we have
\[
(J^{(-)}\ M_3\ J^{(+)})\, \mathrm{ad}[M_3]
= (J^{(-)}\ M_3\ J^{(+)})
\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (25.70)
\]
Hence, we obtain
\[
\mathrm{ad}[M_3] = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (25.71)
\]
In terms of the basis vectors of sl(2, ℂ), the corresponding subspace is
\[
\tilde{W} \equiv \mathrm{Span}\{ i\zeta_1 + \zeta_2,\ i\zeta_3,\ i\zeta_1 - \zeta_2 \}.
\]
With respect to this basis set, we similarly obtain
\[
\mathrm{ad}[i\zeta_3] = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (25.72)
\]
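The construction above can be checked with a few lines of code. The sketch below (ours; it assumes the combinations J(±) = M1 ± iM2, which reproduce (25.68)) starts from the so(3) matrices A1, A2, A3, sets Mk = iAk as in (25.64), and verifies the Cartan-Weyl commutation relations, so that ad[M3] is diagonal with eigenvalues (−1, 0, 1) in the ordered basis (J(−), M3, J(+)), as in (25.70)-(25.71).

```python
# Numerical sketch of the Cartan-Weyl construction (assumed J(+/-) = M1 +/- i M2).
import numpy as np

A1 = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=complex)
A2 = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=complex)
A3 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=complex)
M1, M2, M3 = 1j * A1, 1j * A2, 1j * A3          # cf. (25.64)
Jp, Jm = M1 + 1j * M2, M1 - 1j * M2             # assumed forms of J(+), J(-)

def comm(a, b):
    return a @ b - b @ a

print(np.allclose(comm(M3, Jp), Jp))       # [M3, J(+)] = J(+)
print(np.allclose(comm(M3, Jm), -Jm))      # [M3, J(-)] = -J(-)
print(np.allclose(comm(Jp, Jm), 2 * M3))   # [J(+), J(-)] = 2 M3
# Hence ad[M3] acts on (J(-), M3, J(+)) as diag(-1, 0, 1), reproducing (25.71).
```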
From (25.68) and on the basis of the discussions in Sects. 3.4, 3.5, 20.2, and 20.4, we can readily construct an irreducible representation of W or W̃ on the (2j + 1)-dimensional representation space V, where j is given as a positive integer or a positive half-odd-integer. The number j is said to be the highest weight as defined below.
Definition 25.3 Eigenvalues of the differential representation ad[M3] or ad[iζ3] are called weights of the representation ad. Among the weights, the weight that has the largest real part is called the highest weight.
In (25.71) and (25.72), for instance, the highest weight of M3 or iζ3 is 1. From the discussion of Sects. 3.4 and 3.5, we immediately obtain the matrix representation of the Cartan-Weyl basis of J(−), M3, and J(+) or (iζ1 + ζ2, iζ3, and iζ1 − ζ2).
Replacing l with j in (3.152), (3.153), and (3.158), we get
\[
M_3 = \begin{pmatrix}
-j & & & & & \\
 & -j+1 & & & & \\
 & & -j+2 & & & \\
 & & & \ddots & & \\
 & & & & k-j & \\
 & & & & & \ddots & \\
 & & & & & & j
\end{pmatrix}, \qquad (25.73)
\]
\[
J^{(-)} = \begin{pmatrix}
0 & \sqrt{2j \cdot 1} & & & \\
 & 0 & \sqrt{(2j-1) \cdot 2} & & \\
 & & \ddots & \ddots & \\
 & & & 0 & \sqrt{1 \cdot 2j} \\
 & & & & 0
\end{pmatrix}, \qquad (25.74)
\]
whose general superdiagonal element is \(\sqrt{(2j-k+1)\,k}\) (k = 1, 2, ⋯, 2j), and
\[
J^{(+)} = \begin{pmatrix}
0 & & & & \\
\sqrt{2j \cdot 1} & 0 & & & \\
 & \sqrt{(2j-1) \cdot 2} & 0 & & \\
 & & \ddots & \ddots & \\
 & & & \sqrt{1 \cdot 2j} & 0
\end{pmatrix}, \qquad (25.75)
\]
with the same general element \(\sqrt{(2j-k+1)\,k}\) on the subdiagonal. In terms of the basis vectors |ζ, μ⟩, these matrices are equivalent to
\[
\begin{aligned}
M_3\, |\zeta, \mu\rangle &= \mu\, |\zeta, \mu\rangle && (\mu = j, j-1, \cdots, -j+1, -j),\\
J^{(+)} |\zeta, \mu\rangle &= \sqrt{(j-\mu)(j+\mu+1)}\; |\zeta, \mu+1\rangle && (\mu = j-1, j-2, \cdots, -j),\\
J^{(-)} |\zeta, \mu\rangle &= \sqrt{(j-\mu+1)(j+\mu)}\; |\zeta, \mu-1\rangle && (\mu = j, j-1, \cdots, -j+1).
\end{aligned} \qquad (25.76)
\]
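For a general highest weight j, the matrices (25.73)-(25.75) can be generated directly from (25.76). The following sketch (illustrative code of ours, with our own array indexing) builds M3 and J(±) for several values of j and confirms that they satisfy the Cartan-Weyl relations (25.68).

```python
# Build M3 and J(+/-) for a given j from (25.76) and check (25.68) numerically.
import numpy as np

def cartan_weyl(j):
    dim = int(round(2 * j)) + 1
    mu = np.array([-j + k for k in range(dim)])         # weights -j, ..., j
    M3 = np.diag(mu)
    Jp = np.zeros((dim, dim))
    Jm = np.zeros((dim, dim))
    for k in range(dim - 1):
        c = np.sqrt((j - mu[k]) * (j + mu[k] + 1))      # coefficient of (25.76)
        Jp[k + 1, k] = c                                 # |mu> -> |mu + 1>
        Jm[k, k + 1] = c                                 # |mu + 1> -> |mu>
    return M3, Jp, Jm

def comm(a, b):
    return a @ b - b @ a

for j in (0.5, 1, 1.5):
    M3, Jp, Jm = cartan_weyl(j)
    ok = (np.allclose(comm(M3, Jp), Jp)
          and np.allclose(comm(M3, Jm), -Jm)
          and np.allclose(comm(Jp, Jm), 2 * M3))
    print(j, ok)    # True for every j
```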
Except for (i) of Definition 25.1, the conditions (ii), (iii), and (iv) hold without alteration.
In the context of the complex representation, we have the following general theorem.
Theorem 25.4 [1] Let g be a complex Lie algebra and h be a real form of g (see Sect. 24.10). Let ρ be any representation of h on a vector space V. Moreover, let ρc be a representation of g on V. Suppose that with ρc we have the following relationship:
Proof For the proof, use (25.1), (25.2), and (25.5). The representation ρc is a real linear mapping from g to gl(V). The representation ρc is a complex linear mapping as well. This is because we have
Meanwhile, we have
\[
J = j_1 + j_2,\ j_1 + j_2 - 1,\ \cdots,\ |j_1 - j_2|,
\]
where J indicates the momentum of the coupled system, with j1 and j2 being the momenta of the individual systems. At the same time, J represents the highest weight. Thus, j used in the previous section should be replaced with J.
We show two examples below and compare the results with those of Sect. 20.3.3. In the examples given below, we adopt the following calculation rule. Let O be an operator chosen from among M3, J(+), and J(−) of (25.76). Suppose that the starting equation we wish to deal with is given by
\[
\Xi = \Phi \otimes \Psi,
\]
where Φ ⊗ Ψ is a direct product of the quantum states Φ and Ψ (or physical states, more generally). Then, we have
\[
O(\Xi) = [O(\Phi)] \otimes \Psi + \Phi \otimes [O(\Psi)].
\]
For two coupled systems with j1 = j2 = 1/2, the possible values of the highest weight J are
\[
J = 1, 0.
\]
(i) J = 1
If among the highest weights we choose the largest one, the associated coupled state is always given by the direct product of the individual component states (see Examples 20.1 and 20.2). Thus, we have
\[
|1, 3\rangle = |1/2, 2\rangle_A\, |1/2, 2\rangle_B. \qquad (25.81)
\]
Operating J(−) on both sides of (25.81), we obtain
\[
\sqrt{2}\, |0, 3\rangle = J^{(-)} |1/2, 2\rangle_A\, |1/2, 2\rangle_B + |1/2, 2\rangle_A\, J^{(-)} |1/2, 2\rangle_B. \qquad (25.83)
\]
Hence, we have
\[
|0, 3\rangle = \frac{1}{\sqrt{2}} \big[\, |-1/2, 2\rangle_A\, |1/2, 2\rangle_B + |1/2, 2\rangle_A\, |-1/2, 2\rangle_B \,\big]. \qquad (25.84)
\]
Looking at (25.85), we notice that we can equally start with (25.85) using J(+)
instead of starting with (25.81) using J(-).
(ii) J = 0
In this case, we determine a suitable solution |0, 1⟩ so that it may become orthogonal to |0, 3⟩ given by (25.84). We can readily find the solution such that
\[
|0, 1\rangle = \frac{1}{\sqrt{2}} \big[\, |-1/2, 2\rangle_A\, |1/2, 2\rangle_B - |1/2, 2\rangle_A\, |-1/2, 2\rangle_B \,\big]. \qquad (25.87)
\]
For two coupled systems with j1 = j2 = 1, the possible values of J are
\[
J = 2, 1, 0. \qquad (25.88)
\]
The calculation processes are a little bit longer than in Example 25.4, but the calculation principle is virtually the same.
(i) J = 2
We start with
\[
|2, 5\rangle = |1, 3\rangle_A\, |1, 3\rangle_B.
\]
\[
|1, 5\rangle = \frac{1}{\sqrt{2}} \big( |1, 3\rangle_A |0, 3\rangle_B + |0, 3\rangle_A |1, 3\rangle_B \big), \qquad (25.90)
\]
\[
|0, 5\rangle = \frac{1}{\sqrt{6}} \big( |-1, 3\rangle_A |1, 3\rangle_B + 2\,|0, 3\rangle_A |0, 3\rangle_B + |1, 3\rangle_A |-1, 3\rangle_B \big), \qquad (25.91)
\]
\[
|-1, 5\rangle = \frac{1}{\sqrt{2}} \big( |-1, 3\rangle_A |0, 3\rangle_B + |0, 3\rangle_A |-1, 3\rangle_B \big), \qquad (25.92)
\]
\[
|-2, 5\rangle = |-1, 3\rangle_A |-1, 3\rangle_B. \qquad (25.93)
\]
(ii) J = 1
In this case, we have
\[
|1, 3\rangle = \frac{1}{\sqrt{2}} \big( |1, 3\rangle_A |0, 3\rangle_B - |0, 3\rangle_A |1, 3\rangle_B \big). \qquad (25.94)
\]
Successively operating J(−), we obtain
\[
|0, 3\rangle = \frac{1}{\sqrt{2}} \big( |1, 3\rangle_A |-1, 3\rangle_B - |-1, 3\rangle_A |1, 3\rangle_B \big), \qquad (25.95)
\]
\[
|-1, 3\rangle = \frac{1}{\sqrt{2}} \big( |0, 3\rangle_A |-1, 3\rangle_B - |-1, 3\rangle_A |0, 3\rangle_B \big). \qquad (25.96)
\]
(iii) J = 0
We determine a suitable solution |0, 1⟩ so that it is orthogonal to |0, 5⟩ given by (25.91) to get the desired solution described by
\[
|0, 1\rangle = \frac{1}{\sqrt{3}} \big( |-1, 3\rangle_A |1, 3\rangle_B - |0, 3\rangle_A |0, 3\rangle_B + |1, 3\rangle_A |-1, 3\rangle_B \big). \qquad (25.97)
\]
Notice that if we had merely chosen for |0, 1⟩ a vector orthogonal to |0, 3⟩ of (25.95), we would not have successfully constructed the unitary matrix expressed in (20.232). The above is the whole procedure for finding a set of proper solutions. Of these, the solutions obtained with J = 2 and 0 possess the symmetric representations, whereas those obtained for J = 1 have the antisymmetric representations.
Symbolically writing the above results of the direct product in terms of the direct sum, we have
\[
\mathbf{3} \otimes \mathbf{3} = \mathbf{5} \oplus \mathbf{3} \oplus \mathbf{1}. \qquad (25.98)
\]
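The procedure of Example 25.5 can be reproduced mechanically. The sketch below (ours, assuming only numpy) forms the total lowering operator J(−) ⊗ 1 + 1 ⊗ J(−) on the nine-dimensional product space and, starting from |1, 3⟩A |1, 3⟩B, recovers by successive lowering and normalization the coefficients of, e.g., (25.91).

```python
# Numerical sketch of the 1 x 1 coupling: successive lowering from the top state.
import numpy as np

def spin1_ops():
    mu = np.array([-1.0, 0.0, 1.0])
    M3 = np.diag(mu)
    Jm = np.zeros((3, 3))
    for k in range(2):
        Jm[k, k + 1] = np.sqrt((1 - mu[k]) * (1 + mu[k] + 1))   # cf. (25.76)
    return M3, Jm

M3, Jm = spin1_ops()
I3 = np.eye(3)
Jm_tot = np.kron(Jm, I3) + np.kron(I3, Jm)        # J(-) x 1 + 1 x J(-)

# basis ordering: index 3*a + b with a, b = 0, 1, 2 standing for mu = -1, 0, +1
state = np.zeros(9)
state[3 * 2 + 2] = 1.0                            # |2,5> = |1,3>_A |1,3>_B
for _ in range(2):                                # lower twice to reach M = 0
    state = Jm_tot @ state
    state /= np.linalg.norm(state)
print(np.round(state.reshape(3, 3), 4))
# Nonzero entries are 1/sqrt(6), 2/sqrt(6), 1/sqrt(6), the coefficients of (25.91).
```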
Recall that any square matrix A can be decomposed such that
\[
A = B + iC, \qquad (14.125)
\]
where
\[
B \equiv \frac{1}{2}\big(A + A^{\dagger}\big) \quad \text{and} \quad C \equiv \frac{1}{2i}\big(A - A^{\dagger}\big), \qquad (25.100)
\]
with both B and C being Hermitian. In other words, any matrix A can be decomposed into the sum of a Hermitian matrix and an anti-Hermitian matrix. Instead of decomposing A as above, we may equally decompose A such that
\[
A = B + iC, \qquad (14.125)
\]
where
\[
B \equiv \frac{1}{2}\big(A - A^{\dagger}\big) \quad \text{and} \quad C \equiv \frac{1}{2i}\big(A + A^{\dagger}\big). \qquad (25.101)
\]
In particular, using (25.100) an arbitrary element Z of gl(2, ℂ) is expressed as
\[
Z = B + iC \quad [Z \in \mathrm{gl}(2, \mathbb{C});\ B, C \in V_4].
\]
Viewing the Lie algebra as a vector space, we have the following theorem with respect to V4.
Theorem 25.5 Let the vector space V4 be given by the set of matrices (hij), where (hij) is a complex (2, 2) Hermitian matrix. Then, the Lie algebra gl(2, ℂ) is expressed as a direct sum of V4 and iV4. That is, we have
\[
\mathrm{gl}(2, \mathbb{C}) = V_4 \oplus i V_4. \qquad (25.102)
\]
Proof It is evident that both V4 and iV4 form a vector space [i.e., a subspace of gl(2, ℂ)]. Let z be an arbitrarily chosen element of gl(2, ℂ). We assume that z can be described as
\[
z = \begin{pmatrix} a + ib & c + id \\ p + iq & r + is \end{pmatrix} \quad (a, b, c, d, p, q, r, s \in \mathbb{R}). \qquad (24.535)
\]
Then, z is rewritten as
\[
z = \frac{1}{2}\big[ (a + r)\sigma_0 + (-a + r)\sigma_3 + (c + p)\sigma_1 + (d - q)\sigma_2 \big]
  + \frac{i}{2}\big[ (b + s)\sigma_0 + (-b + s)\sigma_3 + (d + q)\sigma_1 + (-c + p)\sigma_2 \big]. \qquad (24.536)
\]
The element of the first term of (24.536) belongs to V4 and that of the second term of (24.536) belongs to iV4. Thus, z ∈ V4 + iV4. Since z is arbitrary, this implies that
\[
\mathrm{gl}(2, \mathbb{C}) = V_4 + i V_4. \qquad (25.103)
\]
Meanwhile, suppose that an element is contained in both V4 and iV4. Expanding it in terms of σ0, σ1, σ2, and σ3, we find that all the expansion coefficients must vanish; i.e.,
\[
\alpha = \beta = \gamma = \delta = \lambda = \mu = \nu = \xi = 0.
\]
That is, we get
\[
V_4 \cap i V_4 = \{0\}.
\]
Then, any element w of gl(2, ℂ) is uniquely expressed as
\[
w = w_1 + w_2 \quad [w_1 \in V_4,\ w_2 \in i V_4]. \qquad (25.106)
\]
That is, by the definition of the direct sum described by (11.17), V4 + iV4 is identical with V4 ⊕ iV4. Meanwhile, from (25.106) evidently we have
\[
V_4 + i V_4 = V_4 \oplus i V_4. \qquad (25.107)
\]
Using (25.103) and (25.107) along with (25.106), we obtain (25.102). This completes the proof.
Thus, gl(2, ℂ) is a complexification of V4. In a complementary style, V4 is said to be a real form of gl(2, ℂ). Theorem 25.5 is in parallel with the discussion of Sects. 24.9 and 25.2 on the relationship between sl(2, ℂ) and su(2). In the latter case, we say that sl(2, ℂ) is a complexification of su(2) and that su(2) is a real form of sl(2, ℂ). Caution should be exercised in that the real form V4 comprises the Hermitian operators, whereas the real form su(2) consists of the anti-Hermitian operators. We remark that although iV4 is a subspace of gl(2, ℂ), it is not a subalgebra of gl(2, ℂ).
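Theorem 25.5 can be visualized with a short numerical sketch (ours): an arbitrary complex (2, 2) matrix z splits as z = B + iC with both B and C Hermitian, i.e., B ∈ V4 and iC ∈ iV4.

```python
# A two-line numerical illustration of gl(2, C) = V4 + i V4 (a sketch, assuming numpy).
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
B = (z + z.conj().T) / 2          # Hermitian part, an element of V4
C = (z - z.conj().T) / (2j)       # also Hermitian, so iC lies in iV4
print(np.allclose(z, B + 1j * C),
      np.allclose(B, B.conj().T),
      np.allclose(C, C.conj().T))   # True True True
```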
In this book, we chose gl(2, ℂ) and sl(2, ℂ) as typical examples of the complex Lie algebra. Somewhat confusingly, however, these Lie algebras are real Lie algebras as well. This is related to the definition of the representation of the Lie algebra. In (25.1), we designated a as a real number. Suppose that the basis set of the Lie algebra g is {e1, ⋯, ed}, where d is the dimension of g (as a vector space). Then, from (25.1) and (25.2) we obtain
\[
\sigma\!\left( \sum_{k=1}^{d} a_k e_k \right) = \sum_{k=1}^{d} a_k\, \sigma(e_k) \quad (a_k : \text{real}). \qquad (25.108)
\]
In relation to the advanced topics of the Lie algebra given in this chapter, we wish to mention several related items below. In Sect. 15.2 we presented various properties of the exponential functions of matrices. Here, it is worth adding some more important properties of those functions [1].
..., where with the first equality we used Property (3) of (15.31). Differentiating exp(tAᵀ) and exp tA of (25.110) with respect to t and taking the limit t → 0 in the derivative, we obtain Aᵀ = A. In a similar manner, we can show Property (13).
With so0(3, 1) we have six basis vectors (in matrix form) as shown in (24.220). Of these, three vectors are real skew-symmetric (or anti-Hermitian) matrices and the other three vectors are real symmetric (or Hermitian) matrices. Then, their exponential functions are real orthogonal matrices [from Property (6) of Sect. 15.2] and real symmetric (or Hermitian) matrices [from Properties (12) and (13) shown above], respectively. Since the eigenvalues of a Hermitian matrix are real, the eigenvalues of its exponential function are positive from Theorem 15.3; i.e., the exponential function is positive definite. Thus, we naturally have a good example of the polar decomposition of a matrix with respect to the Lorentz transformation matrix; see Theorem 24.7 and Corollary 24.3. The next example illustrates this.
Example 25.6 Returning to the topics dealt with in Sects. 24.1 and 24.2, let us consider the successive Lorentz transformations of the rotation about the z-axis and the subsequent boost in the positive direction of the z-axis. This is a case where the direction of the rotation axis and that of the boost coincide.
The pertinent elements of the Lie algebra are given by
\[
X_r = \begin{pmatrix} 0&0&0&0 \\ 0&0&-1&0 \\ 0&1&0&0 \\ 0&0&0&0 \end{pmatrix}, \quad
X_b = \begin{pmatrix} 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 1&0&0&0 \end{pmatrix}, \qquad (25.111)
\]
where Xr and Xb stand for the rotation and boost, respectively. We express the rotation by θ and the boost by ω as
\[
X_{\theta r} = \begin{pmatrix} 0&0&0&0 \\ 0&0&-\theta&0 \\ 0&\theta&0&0 \\ 0&0&0&0 \end{pmatrix}, \quad
X_{\omega b} = \begin{pmatrix} 0&0&0&-\omega \\ 0&0&0&0 \\ 0&0&0&0 \\ -\omega&0&0&0 \end{pmatrix}. \qquad (25.112)
\]
Notice that Xr and Xb are commutative and so are Xθr and Xωb. Therefore, on the basis of Theorem 15.2 we obtain
\[
\exp(X_{\theta r} + X_{\omega b}) = \exp X_{\theta r}\, \exp X_{\omega b} = \exp X_{\omega b}\, \exp X_{\theta r}. \qquad (25.113)
\]
Meanwhile, we define X as
\[
X \equiv X_{\theta r} + X_{\omega b} =
\begin{pmatrix} 0&0&0&-\omega \\ 0&0&-\theta&0 \\ 0&\theta&0&0 \\ -\omega&0&0&0 \end{pmatrix}. \qquad (25.114)
\]
Borrowing (24.224) and the related results, as the successive Lorentz transformation Λ we get
\[
\Lambda = \exp X =
\begin{pmatrix}
\cosh\omega & 0 & 0 & -\sinh\omega \\
0 & \cos\theta & -\sin\theta & 0 \\
0 & \sin\theta & \cos\theta & 0 \\
-\sinh\omega & 0 & 0 & \cosh\omega
\end{pmatrix}.
\]
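The polar decomposition mentioned above can be confirmed numerically for Example 25.6. The following sketch (ours, assuming numpy and scipy) exponentiates the commuting generators Xθr and Xωb separately and together: the rotation factor is real orthogonal, the boost factor is real symmetric and positive definite, and their product equals Λ = exp X.

```python
# Numerical sketch of Example 25.6: commuting exponentials and polar decomposition.
import numpy as np
from scipy.linalg import expm

theta, omega = 0.6, 0.9
Xr = np.array([[0, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 0]], dtype=float)
Xb = np.array([[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]], dtype=float)
X_rot, X_boost = theta * Xr, -omega * Xb           # cf. (25.112)

R, P = expm(X_rot), expm(X_boost)                  # rotation and boost factors
Lam = expm(X_rot + X_boost)                        # Lambda = exp X of (25.114)

print(np.allclose(Lam, R @ P), np.allclose(Lam, P @ R))        # True True
print(np.allclose(R @ R.T, np.eye(4)))                          # R is real orthogonal
print(np.allclose(P, P.T), np.all(np.linalg.eigvalsh(P) > 0))   # P is symmetric, > 0
```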
In this book we have dealt with classical electromagnetism and quantum mechanics, which are bridged by the quantum field theory. Meanwhile, chemistry-oriented topics such as quantum chemistry were studied within the framework of materials science and molecular science, including device physics. The theory of vector spaces has given us a bird's-eye view of all these topics. The theory of analytic functions and group theory supply us with powerful tools for calculating the various physical quantities and invariants that appear in the mathematical equations. The author is confident that readers will acquire the ability to attack various problems in mathematical physical chemistry.
References