
Shu Hotta

Mathematical Physical Chemistry

Practical and Intuitive Methodology

Third Edition

Takatsuki, Osaka, Japan

ISBN 978-981-99-2511-7 ISBN 978-981-99-2512-4 (eBook)


https://doi.org/10.1007/978-981-99-2512-4

© Springer Nature Singapore Pte Ltd. 2018, 2020, 2023


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
To the memory of
my brother Kei Hotta
and
Emeritus Professor Yûsaku Hamada
Preface to the Third Edition

This book is the third edition of Mathematical Physical Chemistry. Although the
main concept and main theme remain unchanged, this third edition contains
the introductory quantum theory of fields. The major motivation for including the
relevant topics in the present edition is as follows: The Schrödinger equation has
prevailed as a fundamental equation of quantum mechanics in many fields of
natural science up to the present day. The Dirac equation was formulated as a
relativistically covariant expression soon after the Schrödinger equation was
discovered. In contrast to the Schrödinger equation, however, the Dirac equation
is much less prevalent even nowadays.
One of the major reasons for this is that the Dirac equation has been utilized
almost exclusively by elementary particle physicists and related researchers. What
is more, a textbook of elementary particle physics or the quantum theory of fields
cannot help but be thick, because it has to deal with so many topics that require
detailed explanation, starting with, e.g., renormalization and regularization as well
as the underlying gauge theory. Such a situation inevitably leads to a significant
reduction in the description of the fundamental properties of the Dirac equation.
Under these circumstances, the author has aimed to explain the fundamental
properties of the Dirac equation in a plain style so that a scientific audience from
various fields of specialization can readily understand the essence. In particular,
the author describes in detail the transformation properties of the Dirac equation
(or Dirac operator) and its plane wave solutions in terms of standard matrix algebra.
Another motivation for writing the present edition lies in extending the concepts
of vector spaces. The theory of linear vector spaces was developed in the previous
editions, in which the inner product space was the vector space primarily intended.
The Dirac equation was constructed so as to be consistent with the requirement of
the special theory of relativity, which is closely connected to the Minkowski space,
a type of vector space. In relation to the Minkowski space, moreover, various kinds
of vector spaces play a role in the quantum theory of fields. Examples include
tensor spaces and spinor spaces. In particular, tensor spaces are dealt with widely
in different fields of natural science. In this book, therefore, the associated theory
of tensor spaces has been treated rather systematically along with related concepts.
As is the case with the previous editions of this book, the author has placed the
mathematical topics in the last chapter of the individual parts (Parts I–V). In the
present edition, we have added a chapter dealing with advanced topics of Lie
algebra at the end of the book. The description has been improved in several places
so that readers can gain a clear understanding. Errors found in the previous
editions have been corrected. As in the case of the previous editions, readers
benefit from going freely back and forth across the topics of this book as a whole.
The author owes sincere gratitude to the late Emeritus Professor Yûsaku Hamada
of Kyoto Institute of Technology for a course of excellent lectures on applied
mathematics delivered during the author's undergraduate days. The author is
deeply grateful as well to Emeritus Professor Chiaki Tsukamoto of Kyoto Institute
of Technology for his helpful discussions and suggestions on exponential functions
of matrices (Chap. 15). Once again, the author wishes to thank many students for
valuable discussions and Dr. Shin'ichi Koizumi, Springer, for giving him an
opportunity to write this book. Finally, the author is most grateful to his wife,
Kazue Hotta, for continually encouraging him to write the book.

Takatsuki, Osaka, Japan Shu Hotta


March 2023
Preface to the Second Edition

This book is the second edition of Mathematical Physical Chemistry. Mathematics is
a common language of natural science including physics, chemistry, and biology.
Although the words mathematical physics and physical chemistry (or chemical
physics) are commonly used, mathematical physical chemistry sounds rather uncom-
mon. Therefore, it might well be reworded as the mathematical physics for chemists.
The book title could accordingly have been, for instance, "The Mathematics of
Physics and Chemistry," in tribute to the famous book written three-quarters of a
century ago by H. Margenau and G. M. Murphy. Yet, the term mathematical
physical chemistry is expected to be granted citizenship, considering that chemistry
and related interdisciplinary fields such as materials science and molecular science
are becoming increasingly mathematical.
The main concept and main theme remain unchanged, but this second edition
contains the theory of analytic functions and the theory of continuous groups. Both
theories are counted among the most elegant theories of mathematics. The
mathematics of these topics is of a somewhat advanced level, something like a
"sufficient condition" for chemists, whereas that of the first edition may be a
prerequisite (or a necessary condition) for them. Therefore, chemists (or maybe
physicists as well) can creatively use the two editions. In association with these
major additions to the second edition, the author has placed the mathematical
topics (the theory of analytic functions, Green's functions, exponential functions of
matrices, and the theory of continuous groups) in the last chapter of the individual
parts (Parts I–IV).
At the same time, the author has also made several specific revisions, including the
introductory discussion on the perturbation method and variational method, both of
which can be effectively used for gaining approximate solutions of various quantum-
mechanical problems. As another topic, the author has presented the recent progress
on organic lasers. This topic is expected to help develop high-performance light-
emitting devices, one of the important fields of materials science. As in the case of
the first edition, readers benefit from going freely back and forth across the topics
of this book as a whole.


Once again, the author wishes to thank many students for valuable discussions
and Dr. Shin’ichi Koizumi, Springer, for giving him an opportunity to write
this book.

Takatsuki, Osaka, Japan Shu Hotta


October 2019
Preface to the First Edition

The contents of this book are based upon manuscripts prepared by the author for
the undergraduate courses of Kyoto Institute of Technology entitled "Polymer
Nanomaterials Engineering" and "Photonics Physical Chemistry" and for a master's
course lecture of Kyoto Institute of Technology entitled "Solid-State Polymers
Engineering."
This book is intended for graduate and undergraduate students, especially those
who major in chemistry and, at the same time, wish to study mathematical physics.
Readers are supposed to have basic knowledge of analysis and linear algebra.
However, they are not supposed to be familiar with the theory of analytic functions
(i.e., complex analysis), even though it is desirable to have relevant knowledge
about it.
At the beginning, mathematical physics looks daunting to chemists, as used to be
the case with the author himself as a chemist. The book introduces the basic
concepts of mathematical physics to chemists. Unlike other books related to
mathematical physics, this book makes a reasonable selection of material so that
students majoring in chemistry can readily and naturally understand the contents.
In particular, we stress the importance of a practical and intuitive methodology.
We also expect engineers and physicists to benefit from reading this book.
In Parts I and II, the book describes quantum mechanics and electromagnetism.
The relevance between the two is carefully considered. Although quantum mechanics
covers the broad field of modern physics, in Part I we focus on a harmonic oscillator
and a hydrogen(-like) atom. This is because we can study and deal with many of the
fundamental concepts of quantum mechanics within these restricted topics. Moreover,
knowledge acquired from the study of these topics can readily be extended to practical
investigation of, e.g., electronic states and vibration (or vibronic) states of molecular
systems. We describe these topics by both an analytic method (using differential
equations) and an operator approach (using matrix calculations). We believe that the
basic concepts of quantum mechanics can be best understood by contrasting the
analytical and algebraic approaches. For this reason, we give matrix representations
of physical quantities whenever possible. Examples include the energy eigenvalues of a
quantum-mechanical harmonic oscillator and the angular momenta of a hydrogen-like
atom. At the same time, these two physical systems supply us with a good oppor-
tunity to study classical polynomials, e.g., Hermite polynomials, (associated) Legen-
dre polynomials, Laguerre polynomials, and Gegenbauer polynomials and, more
generally, special functions. These topics constitute one of the important branches of
mathematical physics. One of the basic concepts of quantum mechanics is that a
physical quantity is represented by a Hermitian operator or matrix. In this respect,
the algebraic approach gives a good opportunity to get familiar with this concept. We
present tangible examples for this. We also emphasize the importance of the notion
of Hermiticity of a differential operator. We often encounter unitary operators or
unitary transformations alongside the notion of Hermitian operators. We show
several examples of the unitary operators in connection with transformations of
vectors and coordinates.
Part II describes Maxwell's equations and their applications to various phenomena
of electromagnetic waves. These include their propagation, reflection, and transmis-
sion in dielectric media. We restrict ourselves to treating those phenomena in
dielectrics without charge. Yet, we cover a wide range of important topics. In
particular, when two (or more) dielectrics are in contact with each other at a plane
interface, reflection and transmission of light are characterized by various important
parameters such as reflection and transmission coefficients, Brewster angles, and
critical angles. We should have a proper understanding of them not only from the
point of view of basic study, but also to make use of the relevant knowledge in
optical device applications such as waveguides. In contrast to the concept of
electromagnetic waves, light possesses the characteristic of light quanta. We present
a semiclassical and statistical approach to blackbody radiation occurring in a
simplified system in relation to Part I. The physical processes are well characterized
by the notion of two-level atoms. In this context, we outline the dipole radiation
within the framework of the classical theory. We briefly describe how the optical
processes occurring in a confined dielectric medium are related to a laser that is of
great importance in fundamental science and its applications. Many of the basic
equations of physics are described as second-order linear differential equations
(SOLDEs). Different methods have been developed and proposed to seek their
solutions. One of the most important methods is that of Green's functions. We
present the introductory theory of Green's functions accordingly. In this connection,
we rethink the Hermiticity of a differential operator.
In Parts III and IV, we describe algebraic structures of mathematical physics.
Their understanding is useful for the study of quantum mechanics and electromagne-
tism, whose topics are presented in Parts I and II. Part III deals with theories of linear
vector spaces. We focus on the discussion of vectors and their transformations in
finite-dimensional vector spaces. Generally, one can consider vector transformations
among vector spaces of different dimensions. In this book, however, we restrict
ourselves to the case of transformations between vector spaces of the same
dimension, i.e., endomorphisms of the space (Vn → Vn). This is not only because
this is most often the case with many physical applications, but also because the
relevant operator is then represented by a square matrix. Canonical forms of square
matrices hold an important position in algebra. These include triangular matrices
and diagonalizable matrices as well as nilpotent and idempotent matrices. The most
general form is the Jordan canonical form. We present its essential parts in detail,
taking a tangible example. Next to the general discussion, we deal with an inner
product space. Once an inner product is defined between any pair of vectors, the
vector space is given a fruitful structure. An example is the norm (i.e., "length") of a
vector. Also, we gain a clear relationship between Parts III and I. We define various
operators and matrices that are important in physical applications. Examples include
normal operators (or matrices) such as Hermitian operators, projection operators, and
unitary operators. Once again, we emphasize the importance of Hermitian operators.
In particular, two commuting Hermitian matrices share simultaneous eigenvectors
(or eigenstates) and, in this respect, such matrices occupy a special position in
quantum mechanics.
Finally, Part IV describes the essence of group theory and its chemical applica-
tions. Group theory has a broad range of applications in solid-state physics, solid-
state chemistry, molecular science, etc. Nonetheless, the knowledge of group theory
does not seem to have fully spread among chemists. We can discover an adequate
reason for this in a preface to the first edition of "Chemical Applications of Group
Theory" written by F. A. Cotton. It might well be natural that the definitions and
statements of abstract algebra, especially group theory, sound somewhat pretentious
to chemists, even though the definition of a group is quite simple. Therefore, we
present various examples for readers to get used to the notions of group theory. The
notion of mapping is important, as in the case of linear vector spaces. Aside from the
calculation being additive for a vector space and multiplicative for a group, the
fundamental rules of calculation are pretty much the same for the vector space and
the group. We describe the characteristics of symmetry groups in detail, partly
because the related knowledge is useful for molecular orbital (MO) calculations
that are presented in the last section of the book. Representation theory is probably
one of the most daunting notions for chemists. Practically, however, a representa-
tion is just a homomorphism that corresponds to a linear transformation in a vector
space. In this context, a representation is merely denoted by a number or a matrix.
Basis functions of a representation correspond to basis vectors in a vector space.
The grand orthogonality theorem (GOT) is a "nursery bed" of representation theory.
Therefore, readers are encouraged to understand its essence apart from the rigorous
proof of the theorem. In conjunction with Part III, we present a variety of projection
operators. These are very useful for practical applications in, e.g., quantum mechanics
and molecular science. The final parts of the book are devoted to applications of
group theory to problems of physical chemistry, especially those of quantum
chemistry, more specifically molecular orbital calculations. We see how symmetry
considerations, particularly the use of projection operators, save us a lot of labor.
Examples include aromatic hydrocarbons and methane.
The previous paragraphs sum up the contents of this book. Readers may start with
any part and go freely back and forth. This is because the contents of many parts are
interrelated. For example, we stress the importance of the Hermiticity of differential
operators and matrices. Also, projection operators and nilpotent matrices appear in
many parts along with their tangible applications to individual topics. Hence, readers
are recommended to carefully examine and compare the related contents throughout
the book. We believe that readers, especially chemists, benefit from the writing style
of this book, since it is suited to chemists who are good at intuitive understanding.
The author would like to thank many students for their valuable suggestions and
discussions during the lectures. The author also wishes to thank Dr. Shin'ichi
Koizumi, Springer, for giving him an opportunity to write this book.

Kyoto, Japan Shu Hotta


October 2017
Contents

Part I Quantum Mechanics


1 Schrödinger Equation and Its Application . . . . . . . . . . . . . . . . . . . 3
1.1 Early-Stage Quantum Theory . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Schrödinger Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Simple Applications of Schrödinger Equation . . . . . . . . . . . . 14
1.4 Quantum-Mechanical Operators and Matrices . . . . . . . . . . . . 21
1.5 Commutator and Canonical Commutation Relation . . . . . . . . 28
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2 Quantum-Mechanical Harmonic Oscillator . . . . . . . . . . . . . . . . . . 31
2.1 Classical Harmonic Oscillator . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Formulation Based on an Operator Method . . . . . . . . . . . . . . 33
2.3 Matrix Representation of Physical Quantities . . . . . . . . . . . . . 42
2.4 Coordinate Representation of Schrödinger Equation . . . . . . . . 45
2.5 Variance and Uncertainty Principle . . . . . . . . . . . . . . . . . . . . 53
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3 Hydrogen-Like Atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.1 Introductory Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Constitution of Hamiltonian . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.3 Separation of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.4 Generalized Angular Momentum . . . . . . . . . . . . . . . . . . . . . 75
3.5 Orbital Angular Momentum: Operator Approach . . . . . . . . . . 79
3.6 Orbital Angular Momentum: Analytic Approach . . . . . . . . . . 94
3.6.1 Spherical Surface Harmonics and Associated
Legendre Differential Equation . . . . . . . . . . . . . . . . 94
3.6.2 Orthogonality of Associated Legendre Functions . . . . 105
3.7 Radial Wave Functions of Hydrogen-Like Atoms . . . . . . . . . 109
3.7.1 Operator Approach to Radial Wave Functions . . . . . . 110
3.7.2 Normalization of Radial Wave Functions . . . . . . . . . 114


3.7.3 Associated Laguerre Polynomials . . . . . . . . . . . . . . . 118


3.8 Total Wave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4 Optical Transition and Selection Rules . . . . . . . . . . . . . . . . . . . . . 127
4.1 Electric Dipole Transition . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.2 One-Dimensional System . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
4.3 Three-Dimensional System . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.4 Selection Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
4.5 Angular Momentum of Radiation . . . . . . . . . . . . . . . . . . . . . 150
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
5 Approximation Methods of Quantum Mechanics . . . . . . . . . . . . .. 155
5.1 Perturbation Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 155
5.1.1 Quantum State and Energy Level Shift Caused by
Perturbation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.1.2 Several Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.2 Variational Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6 Theory of Analytic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
6.1 Set and Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
6.1.1 Basic Notions and Notations . . . . . . . . . . . . . . . . . . 186
6.1.2 Topological Spaces and Their Building Blocks . . . . . 189
6.1.3 T1-Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
6.1.4 Complex Numbers and Complex Plane . . . . . . . . . . . 202
6.2 Analytic Functions of a Complex Variable . . . . . . . . . . . . . . 205
6.3 Integration of Analytic Functions: Cauchy’s Integral
Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
6.4 Taylor’s Series and Laurent’s Series . . . . . . . . . . . . . . . . . . . 223
6.5 Zeros and Singular Points . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
6.6 Analytic Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
6.7 Calculus of Residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
6.8 Examples of Real Definite Integrals . . . . . . . . . . . . . . . . . . . 236
6.9 Multivalued Functions and Riemann Surfaces . . . . . . . . . . . . 256
6.9.1 Brief Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
6.9.2 Examples of Multivalued Functions . . . . . . . . . . . . . 264
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274

Part II Electromagnetism
7 Maxwell’s Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
7.1 Maxwell’s Equations and Their Characteristics . . . . . . . . . . . 277
7.2 Equation of Wave Motion . . . . . . . . . . . . . . . . . . . . . . . . . . 284
7.3 Polarized Characteristics of Electromagnetic Waves . . . . . . . . 289
7.4 Superposition of Two Electromagnetic Waves . . . . . . . . . . . . 294
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

8 Reflection and Transmission of Electromagnetic Waves


in Dielectric Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
8.1 Electromagnetic Fields at an Interface . . . . . . . . . . . . . . . . . . 305
8.2 Basic Concepts Underlying Phenomena . . . . . . . . . . . . . . . . . 307
8.3 Transverse Electric (TE) Waves and Transverse Magnetic
(TM) Waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
8.4 Energy Transport by Electromagnetic Waves . . . . . . . . . . . . . 319
8.5 Brewster Angles and Critical Angles . . . . . . . . . . . . . . . . . . . 322
8.6 Total Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
8.7 Waveguide Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
8.7.1 TE and TM Waves in a Waveguide . . . . . . . . . . . . . 331
8.7.2 Total Internal Reflection and Evanescent Waves . . . . 338
8.8 Stationary Waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
9 Light Quanta: Radiation and Absorption . . . . . . . . . . . . . . . . . . . 351
9.1 Blackbody Radiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
9.2 Planck’s Law of Radiation and Mode Density
of Electromagnetic Waves . . . . . . . . . . . . . . . . . . . . . . . . . . 353
9.3 Two-Level Atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
9.4 Dipole Radiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
9.5 Lasers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
9.5.1 Brief Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
9.5.2 Organic Lasers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
9.6 Mechanical System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
10 Introductory Green’s Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
10.1 Second-Order Linear Differential Equations (SOLDEs) . . . . . 401
10.2 First-Order Linear Differential Equations (FOLDEs) . . . . . . . 406
10.3 Second-Order Differential Operators . . . . . . . . . . . . . . . . . . . 411
10.4 Green’s Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
10.5 Construction of Green’s Functions . . . . . . . . . . . . . . . . . . . . 424
10.6 Initial-Value Problems (IVPs) . . . . . . . . . . . . . . . . . . . . . . . . 431
10.6.1 General Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 431
10.6.2 Green’s Functions for IVPs . . . . . . . . . . . . . . . . . . . 433
10.6.3 Estimation of Surface Terms . . . . . . . . . . . . . . . . . . 436
10.6.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
10.7 Eigenvalue Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453

Part III Linear Vector Spaces


11 Vectors and Their Transformation . . . . . . . . . . . . . . . . . . . . . . . . 457
11.1 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
11.2 Linear Transformations of Vectors . . . . . . . . . . . . . . . . . . . . 463

11.3 Inverse Matrices and Determinants . . . . . . . . . . . . . . . . . . . . 473


11.4 Basis Vectors and Their Transformations . . . . . . . . . . . . . . . . 478
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
12 Canonical Forms of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
12.1 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . 485
12.2 Eigenspaces and Invariant Subspaces . . . . . . . . . . . . . . . . . . 495
12.3 Generalized Eigenvectors and Nilpotent Matrices . . . . . . . . . . 500
12.4 Idempotent Matrices and Generalized Eigenspaces . . . . . . . . . 505
12.5 Decomposition of Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
12.6 Jordan Canonical Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
12.6.1 Canonical Form of Nilpotent Matrix . . . . . . . . . . . . . 515
12.6.2 Jordan Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
12.6.3 Example of Jordan Canonical Form . . . . . . . . . . . . . 528
12.7 Diagonalizable Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
13 Inner Product Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
13.1 Inner Product and Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
13.2 Gram Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
13.3 Adjoint Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
13.4 Orthonormal Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
14 Hermitian Operators and Unitary Operators . . . . . . . . . . . . . . . . . 571
14.1 Projection Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571
14.2 Normal Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
14.3 Unitary Diagonalization of Matrices . . . . . . . . . . . . . . . . . . . 580
14.4 Hermitian Matrices and Unitary Matrices . . . . . . . . . . . . . . . 588
14.5 Hermitian Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . 592
14.6 Simultaneous Eigenstates and Diagonalization . . . . . . . . . . . . 599
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
15 Exponential Functions of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 607
15.1 Functions of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
15.2 Exponential Functions of Matrices and Their Manipulations . . . 611
15.3 System of Differential Equations . . . . . . . . . . . . . . . . . . . . . . 618
15.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
15.3.2 System of Differential Equations in a Matrix Form:
Resolvent Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 621
15.3.3 Several Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 626
15.4 Motion of a Charged Particle in Polarized Electromagnetic
Wave . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644

Part IV Group Theory and Its Chemical Applications


16 Introductory Group Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
16.1 Definition of Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
16.2 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
16.3 Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
16.4 Isomorphism and Homomorphism . . . . . . . . . . . . . . . . . . . . 653
16.5 Direct-Product Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659
17 Symmetry Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661
17.1 A Variety of Symmetry Operations . . . . . . . . . . . . . . . . . . . . 661
17.2 Successive Symmetry Operations . . . . . . . . . . . . . . . . . . . . . 669
17.3 O and Td Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 680
17.4 Special Orthogonal Group SO(3) . . . . . . . . . . . . . . . . . . . . . 689
17.4.1 Rotation Axis and Rotation Matrix . . . . . . . . . . . . . . 690
17.4.2 Euler Angles and Related Topics . . . . . . . . . . . . . . . 695
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 704
18 Representation Theory of Groups . . . . . . . . . . . . . . . . . . . . . . . . . 705
18.1 Definition of Representation . . . . . . . . . . . . . . . . . . . . . . . . . 705
18.2 Basis Functions of Representation . . . . . . . . . . . . . . . . . . . . . 709
18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT) . . . 717
18.4 Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
18.5 Regular Representation and Group Algebra . . . . . . . . . . . . . . 726
18.6 Classes and Irreducible Representations . . . . . . . . . . . . . . . . . 734
18.7 Projection Operators: Revisited . . . . . . . . . . . . . . . . . . . . . . . 737
18.8 Direct-Product Representation . . . . . . . . . . . . . . . . . . . . . . . 746
18.9 Symmetric Representation and Antisymmetric
Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 750
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 753
19 Applications of Group Theory to Physical Chemistry . . . . . . . . . . 755
19.1 Transformation of Functions . . . . . . . . . . . . . . . . . . . . . . . . . 755
19.2 Method of Molecular Orbitals (MOs) . . . . . . . . . . . . . . . . . . 761
19.3 Calculation Procedures of Molecular Orbitals (MOs) . . . . . . . 768
19.4 MO Calculations Based on π-Electron Approximation . . . . . . 773
19.4.1 Ethylene . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773
19.4.2 Cyclopropenyl Radical . . . . . . . . . . . . . . . . . . . . . . 783
19.4.3 Benzene . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791
19.4.4 Allyl Radical . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 797
19.5 MO Calculations of Methane . . . . . . . . . . . . . . . . . . . . . . . . 805
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 826

20 Theory of Continuous Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827


20.1 Introduction: Operators of Rotation and Infinitesimal
Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827
20.2 Rotation Groups: SU(2) and SO(3) . . . . . . . . . . . . . . . . . . . . 834
20.2.1 Construction of SU(2) Matrices . . . . . . . . . . . . . . . . 837
20.2.2 SU(2) Representation Matrices: Wigner Formula . . . . 840
20.2.3 SO(3) Representation Matrices and Spherical Surface
Harmonics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843
20.2.4 Irreducible Representations of SU(2) and SO(3) . . . . 853
20.2.5 Parameter Space of SO(3) . . . . . . . . . . . . . . . . . . . . 861
20.2.6 Irreducible Characters of SO(3) and Their
Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 868
20.3 Clebsch-Gordan Coefficients of Rotation Groups . . . . . . . . . 873
20.3.1 Direct-Product of SU(2) and Clebsch-Gordan
Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 874
20.3.2 Calculation Procedures of Clebsch-Gordan
Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 880
20.3.3 Examples of Calculation of Clebsch-Gordan
Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 890
20.4 Lie Groups and Lie Algebras . . . . . . . . . . . . . . . . . . . . . . . . 896
20.4.1 Definition of Lie Groups and Lie Algebras:
One-Parameter Groups . . . . . . . . . . . . . . . . . . . . . . 896
20.4.2 Properties of Lie Algebras . . . . . . . . . . . . . . . . . . . . 899
20.4.3 Adjoint Representation of Lie Groups . . . . . . . . . . . 905
20.5 Connectedness of Lie Groups . . . . . . . . . . . . . . . . . . . . . . . . 915
20.5.1 Several Definitions and Examples . . . . . . . . . . . . . . 915
20.5.2 O(3) and SO(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 918
20.5.3 Simply Connected Lie Groups: Local Properties and
Global Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 920
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 927

Part V Introduction to the Quantum Theory of Fields


21 The Dirac Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 931
21.1 Historical Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 931
21.2 Several Remarks on the Special Theory of Relativity . . . . . . . 934
21.2.1 Minkowski Space and Lorentz Transformation . . . . . 934
21.2.2 Event and World Interval . . . . . . . . . . . . . . . . . . . . . 939
21.3 Constitution and Solutions of the Dirac Equation . . . . . . . . . . 944
21.3.1 General Form of the Dirac Equation . . . . . . . . . . . . . 944
21.3.2 Plane Wave Solutions of the Dirac Equation . . . . . . . 949
21.3.3 Negative-Energy Solution of the Dirac Equation . . . . 955
21.3.4 One-Particle Hamiltonian of the Dirac Equation . . . . 960

21.4 Normalization of the Solutions of the Dirac Equation . . . . . . . 964


21.5 Charge Conjugation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 967
21.6 Characteristics of the Gamma Matrices . . . . . . . . . . . . . . . . . 971
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 976
22 Quantization of Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 977
22.1 Lagrangian Formalism of the Fields . . . . . . . . . . . . . . . . . . . 977
22.2 Introductory Fourier Analysis . . . . . . . . . . . . . . . . . . . . . . . . 985
22.2.1 Fourier Series Expansion . . . . . . . . . . . . . . . . . . . . . 986
22.2.2 Fourier Integral Transforms: Fourier Transform and
Inverse Fourier Transform . . . . . . . . . . . . . . . . . . . . 987
22.3 Quantization of the Scalar Field . . . . . . . . . . . . . . . . . . . . . . 991
22.3.1 Lagrangian Density and Action Integral . . . . . . . . . . 991
22.3.2 Equal-Time Commutation Relation and Field
Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 993
22.3.3 Hamiltonian and Fock Space . . . . . . . . . . . . . . . . . . 999
22.3.4 Invariant Delta Functions of the Scalar Field . . . . . . . 1005
22.3.5 Feynman Propagator of the Scalar Field . . . . . . . . . . 1015
22.3.6 General Consideration on the Field Quantization . . . . 1023
22.4 Quantization of the Dirac Field . . . . . . . . . . . . . . . . . . . . . . . 1028
22.4.1 Lagrangian Density and Hamiltonian Density of the
Dirac Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1028
22.4.2 Quantization Procedure of the Dirac Field . . . . . . . . . 1035
22.4.3 Antiparticle: Positron . . . . . . . . . . . . . . . . . . . . . . . 1039
22.4.4 Invariant Delta Functions of the Dirac Field . . . . . . . 1040
22.4.5 Feynman Propagator of the Dirac Field . . . . . . . . . . . 1043
22.5 Quantization of the Electromagnetic Field . . . . . . . . . . . . . . . 1049
22.5.1 Relativistic Formulation of the Electromagnetic
Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1049
22.5.2 Lagrangian Density and Hamiltonian Density
of the Electromagnetic Field . . . . . . . . . . . . . . . . . . 1055
22.5.3 Polarization Vectors of the Electromagnetic
Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1059
22.5.4 Canonical Quantization of the Electromagnetic
Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1061
22.5.5 Hamiltonian and Indefinite Metric . . . . . . . . . . . . . . 1065
22.5.6 Feynman Propagator of Photon Field . . . . . . . . . . . . 1071
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1074
23 Interaction Between Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1075
23.1 Lorentz Force and Minimal Coupling . . . . . . . . . . . . . . . . . . 1075
23.2 Lagrangian and Field Equation of the Interacting Fields . . . . . 1079
23.3 Local Phase Transformation and U(1) Gauge Field . . . . . . . . 1082
23.4 Interaction Picture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1085
23.5 S-Matrix and S-Matrix Expansion . . . . . . . . . . . . . . . . . . . . . 1091

23.6 N-Product and T-Product . . . . . . . . . . . . . . . . . . . . . . . . . . . 1095


23.6.1 Example of Two Field Operators . . . . . . . . . . . . . . . 1095
23.6.2 Calculations Including Both the N-Products and
T-Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1099
23.7 Feynman Rules and Feynman Diagrams in QED . . . . . . . . . . 1103
23.7.1 Zeroth- and First-Order S-Matrix Elements . . . . . . . . 1104
23.7.2 Second-Order S-Matrix Elements and Feynman
Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1108
23.8 Example: Compton Scattering . . . . . . . . . . . . . . . . . . . . . . . 1113
23.8.1 Introductory Remarks . . . . . . . . . . . . . . . . . . . . . . . 1113
23.8.2 Feynman Amplitude and Feynman Diagrams
of Compton Scattering . . . . . . . . . . . . . . . . . . . . . . . 1115
23.8.3 Scattering Cross-Section . . . . . . . . . . . . . . . . . . . . . 1120
23.8.4 Spin and Photon Polarization Sums . . . . . . . . . . . . . 1124
23.8.5 Detailed Calculation Procedures of Feynman
Amplitude . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1126
23.8.6 Experimental Observations . . . . . . . . . . . . . . . . . . . 1136
23.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1139
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1140
24 Basic Formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1141
24.1 Extended Concepts of Vector Spaces . . . . . . . . . . . . . . . . . . 1141
24.1.1 Bilinear Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . 1142
24.1.2 Tensor Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1149
24.1.3 Bilinear Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1155
24.1.4 Dual Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . 1155
24.1.5 Invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1165
24.1.6 Tensor Space and Tensors . . . . . . . . . . . . . . . . . . . . 1171
24.1.7 Euclidean Space and Minkowski Space . . . . . . . . . . 1177
24.2 Lorentz Group and Lorentz Transformations . . . . . . . . . . . . . 1187
24.2.1 Lie Algebra of the Lorentz Group . . . . . . . . . . . . . . 1189
24.2.2 Successive Lorentz Transformations . . . . . . . . . . . . . 1193
24.3 Covariant Properties of the Dirac Equation . . . . . . . . . . . . . . 1197
24.3.1 General Consideration on the Physical Equation . . . . 1197
24.3.2 The Klein-Gordon Equation and the Dirac
Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1200
24.4 Determination of General Form of Matrix S(Λ) . . . . . . . . . . . 1204
24.5 Transformation Properties of the Dirac Spinors . . . . . . . . . . . 1207
24.6 Transformation Properties of the Dirac Operators . . . . . . . . . . 1212
24.6.1 Case I: Single Lorentz Boost . . . . . . . . . . . . . . . . . . 1212
24.6.2 Case II: Non-Coaxial Lorentz Boosts . . . . . . . . . . . . 1221
24.7 Projection Operators and Related Operators for the Dirac
Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1229

24.8 Spectral Decomposition of the Dirac Operators . . . . . . . . . . . 1235


24.9 Connectedness of the Lorentz Group . . . . . . . . . . . . . . . . . . . 1241
24.9.1 Polar Decomposition of a Non-Singular Matrix . . . . . 1242
24.9.2 Special Linear Group: SL(2, ℂ) . . . . . . . . . . . . . . . . 1250
24.10 Representation of the Proper Orthochronous Lorentz Group
SO0(3, 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1258
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1269
25 Advanced Topics of Lie Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . 1271
25.1 Differential Representation of Lie Algebra . . . . . . . . . . . . . . . 1271
25.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1271
25.1.2 Adjoint Representation of Lie Algebra . . . . . . . . . . . 1273
25.1.3 Differential Representation of Lorentz Algebra . . . . . 1278
25.2 Cartan-Weyl Basis of Lie Algebra . . . . . . . . . . . . . . . . . . . . 1287
25.2.1 Complexification . . . . . . . . . . . . . . . . . . . . . . . . . . . 1288
25.2.2 Coupling of Angular Momenta: Revisited . . . . . . . . . 1294
25.3 Decomposition of Lie Algebra . . . . . . . . . . . . . . . . . . . . . . . 1298
25.4 Further Topics of Lie Algebra . . . . . . . . . . . . . . . . . . . . . . . . 1301
25.5 Closing Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1304
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1304

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1305
Part I
Quantum Mechanics

Quantum mechanics is clearly distinguished from classical physics, whose major
pillars are Newtonian mechanics and the electromagnetism established by Maxwell.
Quantum mechanics was first established as a theory of atomic physics that handled
the microscopic world. Later on, quantum mechanics was applied to the macroscopic
world, i.e., the cosmos. The question of how exactly quantum mechanics describes the
natural world and of how far the theory can go remains problematic and is in
dispute to this day.
Such an ultimate question is irrelevant to this monograph. Our major aim is to
study a standard approach to applying the Schrödinger equation to selected topics. The
topics include a particle confined within a potential well, a harmonic oscillator, and
hydrogen-like atoms. Our major task rests on solving the eigenvalue problems of these
topics. To this end, we describe both an analytical method and an algebraic (or operator)
method. Focusing on these topics, we will be able to acquire various methods for
tackling a wide range of quantum-mechanical problems. These problems are usually
posed as an analytical equation (i.e., a differential equation) or an algebraic equation.
A Hamiltonian is constructed analytically or algebraically accordingly. Besides the
Hamiltonian, physical quantities are expressed as differential operators or matrix
operators. In both the analytical and algebraic approaches, the Hermitian property
(or Hermiticity) of an operator or matrix is of crucial importance. This feature
will, therefore, be highlighted not only in this part but also throughout this book,
along with unitary operators and matrices.
Optical transitions and associated selection rules are dealt with in relation to the
above topics. Those subjects are closely related to the electromagnetic phenomena
that are considered in Part II.
Unlike the eigenvalue problems of the above-mentioned topics, it is difficult to
get exact analytical solutions in most quantum-mechanical problems. For this
reason, we need appropriate methods to obtain approximate solutions to various
problems, including the eigenvalue problems. In this context, we deal with the
approximation techniques of the perturbation method and the variational
method.

In the last chapter of this part, we study the theory of analytic functions, one of the
most elegant theories of mathematics. This approach not only helps cultivate a broad
view of pure mathematics, but also leads to the acquisition of practical methodology.
That chapter deals with introductory set theory and topology as well.
Chapter 1
Schrödinger Equation and Its Application

Quantum mechanics is an indispensable research tool of modern natural science that
covers cosmology, atomic physics, molecular science, materials science, and so
forth. The basic concept underlying quantum mechanics rests upon the Schrödinger
equation. The Schrödinger equation is described as a second-order linear differential
equation (SOLDE). The equation is solved analytically accordingly. Alternatively,
equations of quantum mechanics are often described in terms of operators and
matrices, and physical quantities are represented by those operators and matrices.
Normally, they are non-commutative. In particular, the quantum-mechanical formal-
ism requires the canonical commutation relation between the position and momentum
operators. One of the great characteristics of quantum mechanics is that physical
quantities must be Hermitian. This aspect is deeply related to the requirement that
these quantities should be described by real numbers. We deal with Hermiticity
from both an analytical point of view (or coordinate representation) relevant to the
differential equations and an algebraic viewpoint (or matrix representation) associ-
ated with the operators and matrices. Including these topics, we briefly survey the
origin of the Schrödinger equation and consider its implications. To get acquainted
with the quantum-mechanical formalism, we deal with simple examples of the
Schrödinger equation.

1.1 Early-Stage Quantum Theory

The Schrödinger equation is a direct consequence of the discovery of quanta. It
stemmed from the hypothesis of energy quanta propounded by Max Planck (1900).
This hypothesis was further followed by the photon (light quantum) hypothesis
propounded by Albert Einstein (1905). He claimed that light is an aggregation of
light quanta and that an individual quantum carries an energy E expressed as the
Planck constant h multiplied by the frequency of light ν, i.e.,


\[ E = h\nu = \hbar\omega, \tag{1.1} \]

where ħ ≡ h/2π and ω = 2πν. The quantity ω is called the angular frequency, with ν
being the frequency. The quantity ħ is said to be the reduced Planck constant.
Also, Einstein (1917) concluded that the momentum p of a light quantum is identical
to the energy of the light quantum divided by the light velocity in vacuum, c. That
is, we have

\[ p = E/c = \hbar\omega/c = \hbar k, \tag{1.2} \]

where k ≡ 2π/λ (λ is the wavelength of light in vacuum); k is called the wavenumber.
Using vector notation, we have

\[ \mathbf{p} = \hbar\mathbf{k}, \tag{1.3} \]

where k ≡ (2π/λ)n (n: a unit vector in the direction of propagation of light) is said
to be the wavenumber vector.
Meanwhile, Arthur Compton (1923) conducted various experiments in which he
investigated how an incident X-ray beam was scattered by matter (e.g., graphite,
copper, etc.). As a result, Compton found a systematic redshift in the X-ray
wavelengths as a function of the scattering angle of the X-ray beam (Compton effect).
Moreover, he found that the shift in wavelength depended only on the scattering
angle, regardless of the material of the scatterer. The results can be summarized
in a simple equation described as

\[ \Delta\lambda = \frac{h}{m_e c}(1 - \cos\theta), \tag{1.4} \]

where Δλ denotes the shift in wavelength of the scattered beam; m_e is the rest mass
of an electron; θ is the scattering angle of the X-ray beam (see Fig. 1.1). The
quantity h/(m_e c) has a dimension of length and is denoted by λ_e. That is,

\[ \lambda_e \equiv h/m_e c. \tag{1.5} \]

In other words, λ_e is equal to the shift in the wavelength of the scattered beam
obtained when θ = π/2. (The maximum shift, 2λ_e, is obtained when θ = π.) The
quantity λ_e is called the electron Compton wavelength and has an approximate
value of 2.426 × 10^{-12} m.
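Equation (1.5) is easy to cross-check numerically. The following minimal Python sketch (with the CODATA constant values typed in by hand) reproduces the figure quoted above.

```python
# Numerical check of Eq. (1.5): lambda_e = h / (m_e c)
h = 6.62607015e-34      # Planck constant (J s, exact by definition)
m_e = 9.1093837015e-31  # electron rest mass (kg, CODATA 2018)
c = 299792458.0         # speed of light in vacuum (m/s, exact)

lambda_e = h / (m_e * c)
print(f"lambda_e = {lambda_e:.4e} m")  # -> lambda_e = 2.4263e-12 m
```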
Let us derive (1.4) on the basis of conservation of energy and momentum. To this
end, in Fig. 1.1 we assume that an electron is originally at rest. An X-ray beam is
incident on the electron. Then the X-ray is scattered and the electron recoils, as
shown. The energy conservation reads as

\[ \hbar\omega + m_e c^2 = \hbar\omega' + \sqrt{p^2 c^2 + m_e^2 c^4}, \tag{1.6} \]
[Fig. 1.1 Scattering of an X-ray beam by an electron (incident X-ray, rest electron, scattered X-ray, recoiled electron). (a) θ denotes a scattering angle of the X-ray beam. (b) Conservation of momentum]

where ω and ω′ are the initial and final angular frequencies of the X-ray; the second
term of the RHS is the energy of the electron, in which p is the magnitude of its
momentum after recoil. Meanwhile, conservation of the momentum as a vector
quantity reads as

\[ \hbar\mathbf{k} = \hbar\mathbf{k}' + \mathbf{p}, \tag{1.7} \]

where k and k′ are the wavenumber vectors of the X-ray before and after being
scattered; p is the momentum of the electron after recoil. Note that the initial
momentum of the electron is zero, since the electron is originally at rest. Here p is
defined as

\[ \mathbf{p} \equiv m\mathbf{u}, \tag{1.8} \]

where u is the velocity of the electron and m is given by [1]

\[ m = \frac{m_e}{\sqrt{1 - |\mathbf{u}|^2/c^2}}. \tag{1.9} \]

Figure 1.1 shows that -ħk, ħk′, and p form a closed triangle.
From (1.6), we have

\[ \left[m_e c^2 + \hbar(\omega - \omega')\right]^2 = p^2 c^2 + m_e^2 c^4. \tag{1.10} \]

Hence, we get

\[ 2m_e c^2 \hbar(\omega - \omega') + \hbar^2(\omega - \omega')^2 = p^2 c^2. \tag{1.11} \]

From (1.7), we have


\[ p^2 = \hbar^2(\mathbf{k} - \mathbf{k}')^2 = \hbar^2\left(k^2 + k'^2 - 2kk'\cos\theta\right) = \frac{\hbar^2}{c^2}\left(\omega^2 + \omega'^2 - 2\omega\omega'\cos\theta\right), \tag{1.12} \]

where we used the relations ω = ck and ω′ = ck′ with the third equality. Therefore,
we get

\[ p^2 c^2 = \hbar^2\left(\omega^2 + \omega'^2 - 2\omega\omega'\cos\theta\right). \tag{1.13} \]

From (1.11) and (1.13), we have

\[ 2m_e c^2 \hbar(\omega - \omega') + \hbar^2(\omega - \omega')^2 = \hbar^2\left(\omega^2 + \omega'^2 - 2\omega\omega'\cos\theta\right). \tag{1.14} \]

Equation (1.14) is simplified to the following:

\[ 2m_e c^2 \hbar(\omega - \omega') - 2\hbar^2\omega\omega' = -2\hbar^2\omega\omega'\cos\theta. \]

That is,

\[ m_e c^2(\omega - \omega') = \hbar\omega\omega'(1 - \cos\theta). \tag{1.15} \]

Thus, we get

\[ \frac{\omega - \omega'}{\omega\omega'} = \frac{1}{\omega'} - \frac{1}{\omega} = \frac{1}{2\pi c}(\lambda' - \lambda) = \frac{\hbar}{m_e c^2}(1 - \cos\theta), \tag{1.16} \]

where λ and λ′ are the wavelengths of the initial and final X-ray beams, respectively.
Since λ′ - λ = Δλ, we obtain (1.4) from (1.16) accordingly.
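The derivation (1.6)-(1.16) can be verified numerically as well. The sketch below solves (1.15) for ω′ at an arbitrarily chosen wavelength and scattering angle (0.71 Å roughly corresponds to the Mo Kα line used by Compton) and confirms that the resulting shift λ′ - λ reproduces (1.4).

```python
import math

h = 6.62607015e-34; m_e = 9.1093837015e-31; c = 299792458.0
hbar = h / (2 * math.pi)

lam = 7.1e-11                  # incident wavelength (m), arbitrary choice
theta = math.pi / 3            # scattering angle, arbitrary choice
omega = 2 * math.pi * c / lam  # incident angular frequency

# Solve Eq. (1.15), m_e c^2 (omega - omega') = hbar omega omega' (1 - cos theta), for omega'
omega_p = omega * m_e * c**2 / (m_e * c**2 + hbar * omega * (1 - math.cos(theta)))

d_lambda = 2 * math.pi * c * (1 / omega_p - 1 / omega)  # lambda' - lambda
print(d_lambda)                                 # shift from energy/momentum conservation
print((h / (m_e * c)) * (1 - math.cos(theta)))  # Eq. (1.4) gives the same value
```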
We have to mention another important person in the development of quantum
mechanics, Louis-Victor de Broglie (1924). Encouraged by the success of Einstein
and Compton, he propounded the concept of the matter wave, which was referred to
as the de Broglie wave afterward. Namely, de Broglie reversed the relationship of
(1.1) and (1.2) such that

\[ \omega = E/\hbar, \tag{1.17} \]

and

\[ k = p/\hbar \quad \text{or} \quad \lambda = h/p, \tag{1.18} \]

where p equals |p| and λ is the wavelength of a corpuscular beam. This is said to be
the de Broglie wavelength. In (1.18), de Broglie thought that a particle carrying an
energy E and momentum p is accompanied by a wave that is characterized by an
angular frequency ω and wavenumber k (or a wavelength λ = 2π/k). Equation (1.18)
implies that if we are able to determine the wavelength of the corpuscular beam
experimentally, we can determine the magnitude of its momentum accordingly.
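As a tangible illustration of (1.18), the following sketch estimates the de Broglie wavelength of an electron accelerated through an arbitrarily chosen voltage of 100 V, using the non-relativistic relation eV = p²/2m_e. The result, about 1.2 Å, is comparable to interatomic distances, which is why electron beams are suitable for diffraction experiments.

```python
import math

h = 6.62607015e-34; m_e = 9.1093837015e-31; e = 1.602176634e-19

V = 100.0                         # accelerating voltage (V), arbitrary choice
p = math.sqrt(2 * m_e * e * V)    # non-relativistic momentum from eV = p^2 / (2 m_e)
print(f"lambda = {h / p:.3e} m")  # Eq. (1.18): ~1.23e-10 m, i.e., about 1.2 angstroms
```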
In turn, from the squares of both sides of (1.8) and (1.9) we get
\[ u = \frac{p}{m_e\sqrt{1 + (p/m_e c)^2}}. \tag{1.19} \]

This relation represents the velocity of the particles of the corpuscular beam. If we
are dealing with an electron beam, (1.19) gives the velocity of the electron beam. As
a non-relativistic approximation (i.e., p/m_e c ≪ 1), we have
\[ p \approx m_e u. \]

We used a relativistic relation in the second term of the RHS of (1.6), where the
energy E_e of an electron is expressed by
\[ E_e = \sqrt{p^2 c^2 + m_e^2 c^4}. \tag{1.20} \]

In the meantime, deleting u² from (1.8) and (1.9), we have
\[ mc^2 = \sqrt{p^2 c^2 + m_e^2 c^4}. \]

Namely, we get [1]
\[ E_e = mc^2. \tag{1.21} \]

The relation (1.21) is due to Einstein (1905, 1907) and is said to be the equiva-
lence theorem of mass and energy.
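The step of deleting u² from (1.8) and (1.9) can be checked symbolically. A minimal sketch with the sympy library confirms that p²c² + m_e²c⁴ collapses to (mc²)², in accordance with (1.20) and (1.21).

```python
import sympy as sp

u, c, m_e = sp.symbols('u c m_e', positive=True)
m = m_e / sp.sqrt(1 - u**2 / c**2)  # Eq. (1.9)
p = m * u                           # Eq. (1.8)

# p^2 c^2 + m_e^2 c^4 should equal (m c^2)^2 = E_e^2, cf. Eqs. (1.20) and (1.21)
print(sp.simplify(p**2 * c**2 + m_e**2 * c**4 - (m * c**2)**2))  # -> 0
```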
If an electron is accompanied by a matter wave, that wave should be propagated
with a certain phase velocity v_p and group velocity v_g. Thus, using (1.17) and (1.18)
we have
\[ \begin{aligned} v_p &= \omega/k = E_e/p = \sqrt{p^2 c^2 + m_e^2 c^4}\,\big/\,p > c, \\ v_g &= \partial\omega/\partial k = \partial E_e/\partial p = c^2 p\,\big/\,\sqrt{p^2 c^2 + m_e^2 c^4} < c, \\ v_p v_g &= c^2. \end{aligned} \tag{1.22} \]

Notice that in the above expressions, we replaced E of (1.17) with E_e of (1.20).
The group velocity is thought to be the velocity of a wave packet and, hence, the
propagation velocity of a matter wave should be identical to v_g. Thus, v_g is
considered to be the particle velocity as well. In fact, v_g given by (1.22) is identical
to u expressed in (1.19). Therefore, a particle velocity must not exceed c. As for
photons (or light quanta), v_p = v_g = c and, hence, once again we get v_p v_g = c².
We will encounter the last relation of (1.22) in Part II as well.
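The identity of v_g in (1.22) with u in (1.19), as well as the relation v_p v_g = c², can also be confirmed symbolically; the following minimal sympy sketch does both.

```python
import sympy as sp

p, c, m_e = sp.symbols('p c m_e', positive=True)
E_e = sp.sqrt(p**2 * c**2 + m_e**2 * c**4)  # Eq. (1.20)

v_p = E_e / p                                    # phase velocity, Eq. (1.22)
v_g = sp.diff(E_e, p)                            # group velocity dE_e/dp
u = p / (m_e * sp.sqrt(1 + (p / (m_e * c))**2))  # particle velocity, Eq. (1.19)

print(sp.simplify(v_g - u))           # -> 0: the group velocity is the particle velocity
print(sp.simplify(v_p * v_g - c**2))  # -> 0: v_p v_g = c^2
```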
The above discussion is a brief historical outlook of early-stage quantum theory
before Erwin Schrödinger (1926) propounded his equation.

1.2 Schrödinger Equation

First we introduce a wave equation expressed by

\[ \nabla^2\psi = \frac{1}{v^2}\frac{\partial^2\psi}{\partial t^2}, \tag{1.23} \]

where ψ is an arbitrary function of a physical quantity relevant to propagation of a


wave; v is a phase velocity of wave; — 2 called Laplacian (or Laplace operator) is
defined below

2 2 2
∂ ∂ ∂
—2  þ þ : ð1:24Þ
∂x2 ∂y2 ∂z2

One of the special solutions for (1.24) called a plane wave is well-studied and
expressed as

ψ = ψ 0 eiðkx - ωtÞ : ð1:25Þ

In (1.25), x denotes a position vector of a three-dimensional Cartesian coordinate


and is described as

x
x = ðe1 e2 e3 Þ y , ð1:26Þ
z

where e1, e2, and e3 denote basis vectors of an orthonormal base pointing to positive
directions of x-, y-, and z-axes, respectively. Here we make it a rule to represent basis
vectors by a row vector and represent a coordinate or a component of a vector by a
column vector; see Sect. 11.1.
The other way around, now we wish to seek a basic equation whose solution is
described as (1.25). Taking account of (1.1)–(1.3) as well as (1.17) and (1.18), we
rewrite (1.25) as
ψ = ψ₀ e^{(i/ħ)(p·x − Et)},  (1.27)

where we redefine p = (e₁ e₂ e₃)(p_x p_y p_z)ᵀ and E as quantities associated with those of the matter (electron) wave. Taking partial differentiation of (1.27) with respect to x, we obtain

∂ψ/∂x = (i/ħ) p_x ψ₀ e^{(i/ħ)(p·x − Et)} = (i/ħ) p_x ψ.  (1.28)

Rewriting (1.28), we have

(ħ/i) ∂ψ/∂x = p_x ψ.  (1.29)

Similarly we have

(ħ/i) ∂ψ/∂y = p_y ψ and (ħ/i) ∂ψ/∂z = p_z ψ.  (1.30)

Comparing both sides of (1.29), we notice that we may relate the differential operator (ħ/i) ∂/∂x to p_x. From (1.30), similar relationships hold with the y and z components. That is, we have the following correspondences:

(ħ/i) ∂/∂x ↔ p_x, (ħ/i) ∂/∂y ↔ p_y, (ħ/i) ∂/∂z ↔ p_z.  (1.31)

Taking partial differentiation of (1.28) once more,

∂²ψ/∂x² = (i/ħ)² p_x² ψ₀ e^{(i/ħ)(p·x − Et)} = −(1/ħ²) p_x² ψ.  (1.32)

Hence,

−ħ² ∂²ψ/∂x² = p_x² ψ.  (1.33)

Similarly we have

−ħ² ∂²ψ/∂y² = p_y² ψ and −ħ² ∂²ψ/∂z² = p_z² ψ.  (1.34)

As in the above cases, we have

−ħ² ∂²/∂x² ↔ p_x², −ħ² ∂²/∂y² ↔ p_y², −ħ² ∂²/∂z² ↔ p_z².  (1.35)

Summing both sides of (1.33) and (1.34) and then dividing by 2m, we have

−(ħ²/2m) ∇²ψ = (p²/2m) ψ  (1.36)

and the following correspondence

−(ħ²/2m) ∇² ↔ p²/2m,  (1.37)

where m is the mass of a particle.


Meanwhile, taking partial differentiation of (1.27) with respect to t, we obtain

∂ψ/∂t = −(i/ħ) E ψ₀ e^{(i/ħ)(p·x − Et)} = −(i/ħ) E ψ.  (1.38)

That is,

iħ ∂ψ/∂t = Eψ.  (1.39)

As the above, we get the following correspondence:

iħ ∂/∂t ↔ E.  (1.40)

Thus, we have relationships between c-numbers (classical numbers) and q-numbers (quantum numbers, namely, operators) in (1.35) and (1.40). Subtracting (1.36) from (1.39), we get

iħ ∂ψ/∂t + (ħ²/2m) ∇²ψ = [E − (p²/2m)] ψ.  (1.41)

Invoking the relationship on energy

(Total energy) = (Kinetic energy) + (Potential energy),  (1.42)

we have

E = p²/2m + V,  (1.43)

where V is a potential energy. Thus, (1.41) reads as

iħ ∂ψ/∂t + (ħ²/2m) ∇²ψ = Vψ.  (1.44)

Rearranging (1.44), we finally get

[−(ħ²/2m) ∇² + V] ψ = iħ ∂ψ/∂t.  (1.45)

This is the Schrödinger equation, a fundamental equation of quantum mechanics. In (1.45), we define the following Hamiltonian operator H:

H ≡ −(ħ²/2m) ∇² + V.  (1.46)

Then we have a shorthand representation such that

Hψ = iħ ∂ψ/∂t.  (1.47)
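The operator correspondences (1.31) and (1.40), and hence the free-particle case of (1.45), can be checked symbolically. Below is a minimal one-dimensional sketch assuming the sympy library (an assumption of this illustration, not part of the original text):

```python
import sympy as sp

x, t, px, E, hbar, m = sp.symbols('x t p_x E hbar m', positive=True)

# One-dimensional plane wave (1.27) with ψ0 = 1
psi = sp.exp(sp.I * (px * x - E * t) / hbar)

# Momentum correspondence (1.31): (ħ/i) ∂ψ/∂x = p_x ψ
assert sp.simplify((hbar / sp.I) * sp.diff(psi, x) - px * psi) == 0

# Energy correspondence (1.40): iħ ∂ψ/∂t = E ψ
assert sp.simplify(sp.I * hbar * sp.diff(psi, t) - E * psi) == 0

# Free Schrödinger equation (1.45) with V = 0 holds provided E = p_x²/2m, cf. (1.43)
lhs = -(hbar**2 / (2 * m)) * sp.diff(psi, x, 2)
rhs = sp.I * hbar * sp.diff(psi, t)
print(sp.simplify((lhs - rhs).subs(E, px**2 / (2 * m))))  # -> 0
```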

On going from (1.25) to (1.27), we realize that quantities k and ω pertinent to a


field have been converted to quantities p and E related to a particle. At the same time,
whereas x and t represent a whole space-time in (1.25), those in (1.27) are charac-
terized as localized quantities.
From a historical point of view, we have to mention a great achievement
accomplished by Werner Heisenberg (1925) who propounded matrix mechanics.
The matrix mechanics is often contrasted with the wave mechanics Schrödinger
initiated. Schrödinger and Paul Dirac (1926) demonstrated that wave mechanics and
matrix mechanics are mathematically equivalent. Note that the Schrödinger equation
is described as a non-relativistic expression based on (1.43). In fact, the kinetic energy K of a particle is given by [1]

K = m_e c²/√(1 − (u/c)²) − m_e c².

As a non-relativistic approximation, we get

K ≈ m_e c²[1 + (1/2)(u/c)²] − m_e c² = (1/2) m_e u² ≈ p²/2m_e,

where we used p ≈ m_e u again as a non-relativistic approximation; also, we used

1/√(1 − x) ≈ 1 + (1/2)x

when x (>0), corresponding to (u/c)², is sufficiently smaller than 1. This implies that in the above case the group velocity u of a particle is supposed to be well below the light velocity c. Dirac (1928), however, formulated an equation that describes the relativistic quantum mechanics of the electron (the Dirac equation). The detailed description with respect to the Dirac equation will be found in Part V.
In (1.45) ψ varies as a function of x and t. Suppose, however, that a potential V depends only upon x. Then we have

[−(ħ²/2m) ∇² + V(x)] ψ(x, t) = iħ ∂ψ(x, t)/∂t.  (1.48)

Now, let us assume that separation of variables can be done with (1.48) such that

ψ(x, t) = ϕ(x) ξ(t).  (1.49)

Then, we have

[−(ħ²/2m) ∇² + V(x)] ϕ(x) ξ(t) = iħ ∂[ϕ(x) ξ(t)]/∂t.  (1.50)

Accordingly, (1.50) can be recast as

{[−(ħ²/2m) ∇² + V(x)] ϕ(x)}/ϕ(x) = iħ [∂ξ(t)/∂t]/ξ(t).  (1.51)

For (1.51) to hold, we must equate both sides to a constant E. That is, for a certain fixed point x₀ we have

{[−(ħ²/2m) ∇² + V(x₀)] ϕ(x₀)}/ϕ(x₀) = iħ [∂ξ(t)/∂t]/ξ(t),  (1.52)

where ϕ(x₀) in the numerator should be evaluated after operating ∇², while ϕ(x₀) in the denominator is evaluated by simply replacing x in ϕ(x) with x₀. Now, let us define a function Φ(x) such that

Φ(x) ≡ {[−(ħ²/2m) ∇² + V(x)] ϕ(x)}/ϕ(x).  (1.53)

Then, we have

Φ(x₀) = iħ [∂ξ(t)/∂t]/ξ(t).  (1.54)

If RHS of (1.54) varied depending on t, Φ(x₀) would be allowed to have various values, but this must not be the case with our present investigation. Thus, RHS of (1.54) should take a constant value E. For the same reason, LHS of (1.51) should take a constant as well.

Thus, (1.48) or (1.51) should be separated into the following equations:

Hϕ(x) = Eϕ(x),  (1.55)

iħ ∂ξ(t)/∂t = Eξ(t).  (1.56)

Equation (1.56) can readily be solved. Since (1.56) depends on a sole variable t, we have

dξ(t)/ξ(t) = (E/iħ) dt or d ln ξ(t) = (E/iħ) dt.  (1.57)

Integrating (1.57) from zero to t, we get

ln [ξ(t)/ξ(0)] = Et/iħ.  (1.58)

That is,

ξ(t) = ξ(0) exp(−iEt/ħ).  (1.59)

Comparing (1.59) with (1.38), we find that the constant E in (1.55) and (1.56) represents an energy of a particle (electron).

Thus, the next task we want to do is to solve the eigenvalue equation of (1.55). After solving the problem, we get a solution

ψ(x, t) = ϕ(x) exp(−iEt/ħ),  (1.60)

where the constant ξ(0) has been absorbed in ϕ(x). Normally, ϕ(x) is to be normalized after determining the functional form (vide infra).

1.3 Simple Applications of Schrödinger Equation

The Schrödinger equation has been expressed as (1.48). The equation is a second-
order linear differential equation (SOLDE). In particular, our major interest lies in
solving an eigenvalue problem of (1.55). Eigenvalues consist of points in a complex
plane. Those points sometimes form a continuous domain, but we focus on the
eigenvalues that comprise discrete points in the complex plane. Therefore, in our
studies the eigenvalues are countable and numbered as, e.g., λn (n = 1, 2, 3, ⋯). An
example is depicted in Fig. 1.2. Having this common belief as a background, let us
first think of a simple form of SOLDE.
Example 1.1 Let us think of the following differential equation:

d²y(x)/dx² + λy(x) = 0,  (1.61)

where x is a real variable; y may be a complex function of x, with λ possibly being a complex constant as well. Suppose that y(x) is defined within a domain [−L, L] (L > 0). We set boundary conditions (BCs) for (1.61) such that

y(L) = 0 and y(−L) = 0 (L > 0).  (1.62)

Fig. 1.2 Eigenvalues λ_n (n = 1, 2, 3, ⋯) on a complex plane



The BCs of (1.62) are called Dirichlet conditions. We define the following differential operator D described as

D ≡ −d²/dx².  (1.63)

Then rewriting (1.61), we have

Dy(x) = λy(x).  (1.64)

According to a general principle of SOLDEs, such an equation has two linearly independent solutions. In the case of (1.61), we choose exponential functions for those solutions described by

e^{ikx} and e^{−ikx} (k ≠ 0).  (1.65)

This is because the above functions do not change their functional form with respect to differentiation, and we thereby reduce solving a differential equation to solving an algebraic equation among constants (or parameters). In the present case, λ and k are such constants.

The parameter k could be a complex variable, because λ is allowed to take a complex value as well. Linear independence of these functions is ensured by a non-vanishing Wronskian W. That is,

W = det[e^{ikx}  e^{−ikx} ; (e^{ikx})′  (e^{−ikx})′] = det[e^{ikx}  e^{−ikx} ; ik e^{ikx}  −ik e^{−ikx}] = −ik − ik = −2ik.

If k ≠ 0, then W ≠ 0. Therefore, as a general solution, we get

y(x) = a e^{ikx} + b e^{−ikx} (k ≠ 0),  (1.66)

where a and b are (complex) constants. We call the two linearly independent solutions e^{ikx} and e^{−ikx} (k ≠ 0) a fundamental set of solutions of a SOLDE. Inserting (1.66) into (1.61), we have

(λ − k²)(a e^{ikx} + b e^{−ikx}) = 0.  (1.67)

For (1.67) to hold with any x, we must have

λ − k² = 0, i.e., λ = k².  (1.68)

Using BCs (1.62), we have

a e^{ikL} + b e^{−ikL} = 0 and a e^{−ikL} + b e^{ikL} = 0.  (1.69)

Rewriting (1.69) in a matrix form (rows separated by semicolons), we have

[e^{ikL}  e^{−ikL} ; e^{−ikL}  e^{ikL}] [a ; b] = [0 ; 0].  (1.70)

For a and b in (1.70) to have non-vanishing solutions, we must have

det[e^{ikL}  e^{−ikL} ; e^{−ikL}  e^{ikL}] = 0, i.e., e^{2ikL} − e^{−2ikL} = 0.  (1.71)

It is because if the determinant in (1.71) were not zero, we would have a = b = 0 and y(x) ≡ 0. Note that with an eigenvalue problem we must avoid having a solution that is identically zero. Rewriting (1.71), we get

(e^{ikL} + e^{−ikL})(e^{ikL} − e^{−ikL}) = 0.  (1.72)

That is, we have either

e^{ikL} + e^{−ikL} = 0  (1.73)

or

e^{ikL} − e^{−ikL} = 0.  (1.74)

In the case of (1.73), inserting this into (1.69) we have

e^{ikL}(a − b) = 0.  (1.75)

Therefore,

a = b,  (1.76)

where we used the fact that e^{ikL} is a non-vanishing function for any ikL (either real or complex). Similarly, in the case of (1.74), we have

a = −b.  (1.77)

For (1.76), from (1.66) we have

y(x) = a(e^{ikx} + e^{−ikx}) = 2a cos kx.  (1.78)

With (1.77), in turn, we get

y(x) = a(e^{ikx} − e^{−ikx}) = 2ia sin kx.  (1.79)

Thus, we get two linearly independent solutions (1.78) and (1.79). Inserting BCs (1.62) into (1.78), we have

cos kL = 0.  (1.80)

Hence,

kL = π/2 + mπ (m = 0, ±1, ±2, ⋯).  (1.81)

In (1.81), for instance, we have k = π/2L for m = 0 and k = −π/2L for m = −1. Also, we have k = 3π/2L for m = 1 and k = −3π/2L for m = −2. These cases, however, individually give linearly dependent solutions for (1.78). Therefore, to get a set of linearly independent eigenfunctions we may define k as positive. Correspondingly, from (1.68) we get eigenvalues of

λ = (2m + 1)²π²/4L² (m = 0, 1, 2, ⋯).  (1.82)

Also, inserting BCs (1.62) into (1.79), we have

sin kL = 0.  (1.83)

Hence,

kL = nπ (n = 1, 2, 3, ⋯).  (1.84)

From (1.68) we get

λ = n²π²/L² = (2n)²π²/4L² (n = 1, 2, 3, ⋯),  (1.85)

where we chose positive numbers n for the same reason as the above. With the second equality of (1.85), we made the eigenvalues easily comparable to those of (1.82). Figure 1.3 shows the eigenvalues given in both (1.82) and (1.85) in units of π²/4L².

From (1.82) and (1.85), we find that λ is positive definite (or strictly positive), and so from (1.68) we have
Fig. 1.3 Eigenvalues of the differential equation (1.61) under the boundary conditions given by (1.62). The eigenvalues lie at 1, 4, 9, 16, 25, ⋯ in units of π²/4L² on a real axis

k = √λ.  (1.86)

The next step is to normalize the eigenfunctions. This step corresponds to an appropriate choice of the constant a in (1.78) and (1.79) so that we can have

I = ∫_{−L}^{L} y(x)* y(x) dx = ∫_{−L}^{L} |y(x)|² dx = 1.  (1.87)

That is,

I = 4|a|² ∫_{−L}^{L} cos² kx dx = 4|a|² ∫_{−L}^{L} (1/2)(1 + cos 2kx) dx
= 2|a|² [x + (1/2k) sin 2kx]_{−L}^{L} = 4L|a|².  (1.88)

Combining (1.87) and (1.88), we get

|a| = 1/(2√L).  (1.89)

Thus, we have

a = [1/(2√L)] e^{iθ},  (1.90)

where θ is any real number and e^{iθ} is said to be a phase factor. We usually set e^{iθ} ≡ 1. Then, we have a = 1/(2√L). Thus, for the normalized cosine eigenfunctions, we get

y(x) = (1/√L) cos kx [kL = π/2 + mπ (m = 0, 1, 2, ⋯)]  (1.91)

that corresponds to an eigenvalue λ = (2m + 1)²π²/4L² (m = 0, 1, 2, ⋯). For another series of normalized sine functions, similarly we get

y(x) = (1/√L) sin kx [kL = nπ (n = 1, 2, 3, ⋯)]  (1.92)

that corresponds to an eigenvalue λ = (2n)²π²/4L² (n = 1, 2, 3, ⋯).

Notice that arranging λ in ascending order, we have even functions and odd functions alternately as eigenfunctions corresponding to λ. Such a property is said to be parity. We often encounter it in quantum mechanics and related fields. From (1.61) we find that if y(x) is an eigenfunction, so is cy(x). That is, we should bear in mind that the eigenvalue problem is always accompanied by an indeterminate constant and that normalization of an eigenfunction does not mean the uniqueness of the solution (see Chap. 10).
Strictly speaking, we should be careful to assure that (1.81) holds on the basis of (1.80). It is because we have yet the possibility that k is a complex number. To see it, we examine the zeros of a cosine function that is defined in a complex domain. Here the zeros are (complex) numbers at which the function takes zero. That is, if f(z₀) = 0, z₀ is called a zero (i.e., one of the zeros) of f(z). Now we have

cos z ≡ (1/2)(e^{iz} + e^{−iz}); z = x + iy (x, y : real).  (1.93)

Inserting z = x + iy in cos z and rearranging terms, we get

cos z = (1/2)[cos x (e^y + e^{−y}) + i sin x (e^{−y} − e^y)].  (1.94)

For cos z to vanish, both its real and imaginary parts must be zero. Since e^y + e^{−y} > 0 for all real numbers y, we must have cos x = 0 for the real part to vanish; i.e.,

x = π/2 + mπ (m = 0, ±1, ±2, ⋯).  (1.95)

Note in this case that sin x = ±1 (≠0). Therefore, for the imaginary part to vanish, e^{−y} − e^y = 0. That is, we must have y = 0. Consequently, the zeros of cos z are real numbers. In other words, with respect to z₀ that satisfies cos z₀ = 0 we have

z₀ = π/2 + mπ (m = 0, ±1, ±2, ⋯).  (1.96)

The above discussion equally applies to a sine function as well.

Thus, we ensure that k is a non-zero real number. Eigenvalues λ are positive definite from (1.68) accordingly. This conclusion is not fortuitous but a direct consequence of the form of the differential equation we have dealt with, in combination with the BCs we imposed, i.e., the Dirichlet conditions. Detailed discussion will follow in Sects. 1.4, 10.3, and 10.4 in relation to the Hermiticity of a differential operator.

Example 1.2 A particle confined within a potential well. The results obtained in Example 1.1 can immediately be applied to dealing with a particle (electron) in a one-dimensional infinite potential well. In this case, (1.55) reads as

(ħ²/2m) d²ψ(x)/dx² + Eψ(x) = 0,  (1.97)

where m is the mass of a particle and E is an energy of the particle. The potential V is expressed as

V(x) = 0 (−L ≤ x ≤ L); V(x) = ∞ (x < −L or x > L).

Rewriting (1.97), we have

d²ψ(x)/dx² + (2mE/ħ²) ψ(x) = 0  (1.98)

with BCs

ψ(L) = ψ(−L) = 0.  (1.99)

If we replace λ of (1.61) with 2mE/ħ², we can follow the procedures of Example 1.1. That is, we put

E = ħ²λ/2m  (1.100)

with λ = k² in (1.68). For k we use the values of (1.81) and (1.84). Therefore, with energy eigenvalues we get either

E = (ħ²/2m) · (2l + 1)²π²/4L² (l = 0, 1, 2, ⋯),

to which

ψ(x) = (1/√L) cos kx [kL = π/2 + lπ (l = 0, 1, 2, ⋯)]  (1.101)

corresponds, or

E = (ħ²/2m) · (2n)²π²/4L² (n = 1, 2, 3, ⋯),

to which

ψ(x) = (1/√L) sin kx [kL = nπ (n = 1, 2, 3, ⋯)]  (1.102)

corresponds.

Since the particle behaves as a free particle within the potential well (−L ≤ x ≤ L) and p = ħk, we obtain

E = p²/2m = (ħ²/2m) k²,

where

k = (2l + 1)π/2L (l = 0, 1, 2, ⋯) or k = 2nπ/2L (n = 1, 2, 3, ⋯).

The energy E is the kinetic energy of the particle.

Although in (1.97) ψ(x) ≡ 0 trivially holds, such a function may not be regarded as a solution of the eigenvalue problem. In fact, considering that |ψ(x)|² represents the existence probability of a particle, ψ(x) ≡ 0 corresponds to a situation where the particle in question does not exist. Consequently, such a trivial case has physically no meaning.
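The eigenvalues of Example 1.2 are easy to reproduce numerically. The following minimal Python sketch (assuming numpy; units are chosen so that ħ²/2m = 1 and L = 1) discretizes −d²/dx² on (−L, L) with the Dirichlet BCs (1.99) and compares the lowest eigenvalues with the exact values k² = j²π²/4L² ( j = 1, 2, 3, ⋯), i.e., the interleaved cosine and sine levels of (1.101) and (1.102):

```python
import numpy as np

L, N = 1.0, 1000
x = np.linspace(-L, L, N + 2)[1:-1]    # interior grid points
dx = x[1] - x[0]

# Finite-difference matrix for -d²/dx² with Dirichlet BCs (1.99)
main = np.full(N, 2.0 / dx**2)
off = np.full(N - 1, -1.0 / dx**2)
H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

E_num = np.linalg.eigvalsh(H)[:5]
E_exact = np.array([(j * np.pi / (2 * L))**2 for j in range(1, 6)])
print(np.round(E_num, 4))      # numerical eigenvalues
print(np.round(E_exact, 4))    # exact values j²π²/4L²
```

The odd-j and even-j levels correspond to the cosine and sine eigenfunctions, respectively, reproducing the alternating parity noted in Example 1.1.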

1.4 Quantum-Mechanical Operators and Matrices

As represented by (1.55), a quantum-mechanical operator corresponds to a physical quantity. In (1.55), we connect a Hamiltonian operator to an energy (eigenvalue). Let us rephrase the situation as follows:

PΨ = pΨ.  (1.103)

In (1.103), we are viewing P as an operation or measurement on a physical system that is characterized by the quantum state Ψ. Operating P on the physical system (or state), we obtain a physical quantity p relevant to P as a result of the operation (or measurement).

A way to effectively achieve the above is to use a matrix and a vector to represent the operation and the physical state, respectively. Let us glance at a bit of matrix calculation to get used to the quantum-mechanical concept and, hence, to obtain a clear understanding of it. In Part III, we will deal with matrix calculation in detail from the point of view of a general principle. At present, a (2, 2) matrix suffices. Let A be a (2, 2) matrix expressed as

A = [a b ; c d],  (1.104)

where the semicolon separates the rows of the matrix. Let |ψ⟩ be a (2, 1) matrix, i.e., a column vector such that

|ψ⟩ = [e ; f].  (1.105)

Note that operating a (2, 2) matrix on a (2, 1) matrix produces another (2, 1) matrix. Furthermore, we define an adjoint matrix A† such that

A† = [a* c* ; b* d*],  (1.106)

where a* is the complex conjugate of a. That is, A† is the complex conjugate transposed matrix of A. Also, we define an adjoint vector ⟨ψ| or |ψ⟩† such that

⟨ψ| ≡ |ψ⟩† = (e* f*).  (1.107)

In this case, |ψ⟩† also denotes the complex conjugate transpose of |ψ⟩. The notations |ψ⟩ and ⟨ψ| are due to Dirac. He named ⟨ψ| and |φ⟩ a bra vector and a ket vector, respectively. This naming (or equivoque) comes from the fact that ⟨ψ| · |φ⟩ = ⟨ψ|φ⟩ forms a bracket. This is a (1, 2) × (2, 1) = (1, 1) matrix, i.e., a c-number (including a complex number), and ⟨ψ|φ⟩ represents an inner product. These notations are widely used nowadays in the fields of mathematics and physics.

Taking another vector |ξ⟩ = [g ; h] and using a matrix calculation rule, we have

A†|ψ⟩ = |A†ψ⟩ = [a* c* ; b* d*][e ; f] = [a*e + c*f ; b*e + d*f].  (1.108)

According to the definition (1.107), we have

|A†ψ⟩† = ⟨A†ψ| = (ae* + cf*  be* + df*).  (1.109)

Thus, we get

⟨A†ψ|ξ⟩ = (ae* + cf*  be* + df*)[g ; h] = (ag + bh)e* + (cg + dh)f*.  (1.110)

Similarly, we have

⟨ψ|Aξ⟩ = (e* f*)[a b ; c d][g ; h] = (ag + bh)e* + (cg + dh)f*.  (1.111)

Comparing (1.110) and (1.111), we get

⟨A†ψ|ξ⟩ = ⟨ψ|Aξ⟩.  (1.112)

Also, we have

⟨ψ|Aξ⟩ = ⟨Aξ|ψ⟩*.  (1.113)

Replacing A with A† in (1.112), we get

⟨(A†)†ψ|ξ⟩ = ⟨ψ|A†ξ⟩.  (1.114)

From (1.104) and (1.106), obviously we have

(A†)† = A.  (1.115)

Then, from (1.114) and (1.115) we have

⟨Aψ|ξ⟩ = ⟨ψ|A†ξ⟩ = ⟨ξ|Aψ⟩*,  (1.116)

where the second equality comes from (1.113) obtained by exchanging ψ and ξ there. Moreover, we have the following relation:

(AB)† = B†A†.  (1.117)

The proof is left for readers. Using this relation, we have

⟨Aψ| = |Aψ⟩† = [A|ψ⟩]† = |ψ⟩†A† = ⟨ψ|A† = ⟨ψA†|.  (1.118)

Making an inner product by multiplying |ξ⟩ from the right of the leftmost and rightmost sides of (1.118) and using (1.116), we get

⟨Aψ|ξ⟩ = ⟨ψA†|ξ⟩ = ⟨ψ|A†ξ⟩.

This relation may be regarded as the associative law with regard to the symbol "|" of the inner product. This is equivalent to the associative law with regard to matrix multiplication.

The results obtained above can readily be extended to a general case where (n, n) matrices are dealt with.
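Indeed, these identities hold for matrices of any size and are easy to check numerically. The following minimal Python sketch (assuming numpy; the matrices and vectors are randomly generated for illustration) verifies (1.112), (1.113), and (1.117):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def rand_cvec(k):
    return rng.normal(size=k) + 1j * rng.normal(size=k)

A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
psi, xi = rand_cvec(n), rand_cvec(n)

inner = lambda u, v: np.vdot(u, v)    # ⟨u|v⟩; np.vdot conjugates its first argument
dag = lambda M: M.conj().T            # adjoint, cf. (1.106)

print(np.isclose(inner(dag(A) @ psi, xi), inner(psi, A @ xi)))       # (1.112)
print(np.isclose(inner(psi, A @ xi), np.conj(inner(A @ xi, psi))))   # (1.113)
print(np.allclose(dag(A @ B), dag(B) @ dag(A)))                      # (1.117)
```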
Now, let us introduce an Hermitian operator (or matrix) H. When we have

H† = H,  (1.119)

H is called an Hermitian matrix. Then, applying (1.112) to the Hermitian matrix H, we have

⟨H†ψ|ξ⟩ = ⟨ψ|Hξ⟩ = ⟨ψ|H†ξ⟩ or ⟨Hψ|ξ⟩ = ⟨ψ|H†ξ⟩ = ⟨ψ|Hξ⟩.  (1.120)

Also let us introduce a norm of a vector |ψ⟩ such that

‖ψ‖ = √⟨ψ|ψ⟩.  (1.121)

A norm is a natural extension of the notion of the "length" of a vector. The norm ‖ψ‖ is zero, if and only if |ψ⟩ = 0 (zero vector). For, from (1.105) and (1.107), we have

⟨ψ|ψ⟩ = |e|² + |f|².

Therefore, ⟨ψ|ψ⟩ = 0 ⟺ e = f = 0 in (1.105), i.e., |ψ⟩ = 0.

Let us further consider an eigenvalue problem represented by our newly introduced notation. The eigenvalue equation is symbolically written as

H|ψ⟩ = λ|ψ⟩,  (1.122)

where H represents an Hermitian operator and |ψ⟩ is an eigenfunction that belongs to an eigenvalue λ. Operating ⟨ψ| on (1.122) from the left, we have

⟨ψ|H|ψ⟩ = ⟨ψ|λ|ψ⟩ = λ⟨ψ|ψ⟩ = λ,  (1.123)

where we assume that |ψ⟩ is normalized; namely ⟨ψ|ψ⟩ = 1 or ‖ψ‖ = 1. Notice that the symbol "|" in an inner product is of secondary importance. We may disregard this notation, as in the case where a product notation "×" is omitted by denoting ab instead of a × b.

Taking a complex conjugate of (1.123), we have

⟨ψ|Hψ⟩* = λ*.  (1.124)

Using (1.116) and (1.124), we have

λ* = ⟨ψ|Hψ⟩* = ⟨ψ|H†ψ⟩ = ⟨ψ|Hψ⟩ = λ,  (1.125)

where with the third equality we used the definition (1.119) of the Hermitian operator. Hence, the relation λ* = λ obviously shows that any eigenvalue λ is real if H is Hermitian. The relation (1.125) immediately tells us that even though |ψ⟩ is not an eigenfunction, ⟨ψ|Hψ⟩ is real as well, if H is Hermitian. The quantity ⟨ψ|Hψ⟩ is said to be an expectation value. This value is interpreted as the most probable or averaged value of H obtained as a result of operation of H on a physical state |ψ⟩. We sometimes denote the expectation value as

⟨H⟩ ≡ ⟨ψ|Hψ⟩,  (1.126)

where |ψ⟩ is normalized. Unless |ψ⟩ is normalized, it can be normalized on the basis of (1.121) by choosing |Φ⟩ such that

|Φ⟩ = |ψ⟩/‖ψ‖.  (1.127)

Thus, we have an important consequence: if an Hermitian operator has an eigenvalue, it must be real. An expectation value of an Hermitian operator is real as well. The real eigenvalue and expectation value are a prerequisite for a physical quantity.
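As a quick numerical illustration of this consequence (a minimal sketch assuming numpy, not part of the original text): a randomly generated Hermitian matrix has real eigenvalues, and its expectation values are real as well.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = M + M.conj().T                    # H† = H by construction, (1.119)

eigvals = np.linalg.eigvals(H)
print(np.max(np.abs(eigvals.imag)))   # ~1e-15: eigenvalues are real, (1.125)

psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)            # normalization, cf. (1.127)
expect = np.vdot(psi, H @ psi)        # expectation value ⟨H⟩, (1.126)
print(abs(expect.imag))               # ~1e-16: ⟨ψ|Hψ⟩ is real as well
```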
As discussed above, the Hermitian matrices play a central role in quantum
physics. Taking a further step, let us extend the notion of Hermiticity to a function
space.
In Example 1.1, we remarked that we finally reached a solution where λ is a real (and positive) number, even though at the beginning we set no restriction on λ. This is because the SOLDE form (1.61) accompanied by BCs (1.62) is Hermitian, and so the eigenvalues λ are real.

In this context, we give a little bit of further consideration. We define an inner product between two functions as follows:

⟨g|f⟩ ≡ ∫_a^b g(x)* f(x) dx,  (1.128)

where g(x)* is the complex conjugate of g(x); x is a real variable and the integration range can be either bounded or unbounded. If a and b are real definite numbers, [a, b] is the bounded case. With the unbounded case, we have, e.g., (−∞, ∞), (−∞, c), and (c, ∞), etc., where c is a definite number. This notation will appear again in Chap. 10. In (1.128) we view the functions f and g as vectors in a function space, often referred to as a Hilbert space. We assume that any function f is square-integrable; i.e., |f|² is finite. That is,

∫_a^b |f(x)|² dx < ∞.  (1.129)

Using the above definition, let us calculate ⟨g|Df⟩, where D was defined in (1.63). Then, using integration by parts, we have

⟨g|Df⟩ = ∫_a^b g(x)*[−d²f(x)/dx²] dx = −[g* f′]_a^b + ∫_a^b g*′ f′ dx
= −[g* f′]_a^b + [g*′ f]_a^b − ∫_a^b g*″ f dx = [g*′ f − g* f′]_a^b + ⟨Dg|f⟩.  (1.130)

If we have BCs such that

f(b) = f(a) = 0 and g(b) = g(a) = 0 [i.e., g(b)* = g(a)* = 0],  (1.131)

we get

⟨g|Df⟩ = ⟨Dg|f⟩.  (1.132)

In light of (1.120), (1.132) implies that D is Hermitian. In (1.131), notice that the functions f and g satisfy the same BCs; normally, an operator is Hermitian only when it is accompanied by such common BCs. Thus, the Hermiticity of a differential operator is closely related to the BCs of the differential equation.

Next, we consider the following inner product:

⟨f|Df⟩ = −∫_a^b f* f″ dx = −[f* f′]_a^b + ∫_a^b f*′ f′ dx = −[f* f′]_a^b + ∫_a^b |f′|² dx.  (1.133)

Note that the definite integral of (1.133) cannot be negative. There are two possibilities for D to be Hermitian according to different BCs.
1. Dirichlet conditions: f(b) = f(a) = 0. If we could have f′ ≡ 0, ⟨f|Df⟩ would be zero. But in that case f should be constant, and so f(x) ≡ 0 according to the BCs. We must exclude this trivial case. Consequently, to avoid this situation we must have

∫_a^b |f′|² dx > 0 or ⟨f|Df⟩ > 0.  (1.134)

In this case, the operator D is said to be positive definite. Suppose that such a positive-definite operator has an eigenvalue λ. Then, for a corresponding eigenfunction y(x) we have

Dy(x) = λy(x).  (1.135)

In this case, we state that y(x) is an eigenfunction or eigenvector that corresponds (or belongs) to an eigenvalue λ. Taking an inner product of both sides, we have

⟨y|Dy⟩ = ⟨y|λy⟩ = λ⟨y|y⟩ = λ‖y‖² or λ = ⟨y|Dy⟩/‖y‖².  (1.136)

Both ⟨y|Dy⟩ and ‖y‖² are positive and, hence, we have λ > 0. Thus, if D has an eigenvalue, it must be positive. In this case, λ is said to be positive definite as well; see Example 1.1.

2. Neumann conditions: f′(b) = f′(a) = 0. From (1.130), D is Hermitian as well. Unlike condition (1), however, f may be a non-zero constant in this case. Therefore, we are allowed to have

∫_a^b |f′|² dx = 0 or ⟨f|Df⟩ = 0.  (1.137)

For any function, we have

⟨f|Df⟩ ≥ 0.  (1.138)

In this case, the operator D is said to be non-negative (or positive semi-definite). The eigenvalue may be zero from (1.136) and, hence, is called non-negative accordingly.

3. Periodic conditions: f(b) = f(a) and f′(b) = f′(a). We are allowed to have ⟨f|Df⟩ ≥ 0 as in the case of condition (2). Then, the operator and eigenvalues are non-negative.

Thus, in spite of being formally the same operator, that operator behaves differently according to the different BCs. In particular, a differential operator associated with an eigenvalue of zero is of special interest. We will encounter another illustration in Chap. 3.
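The BC dependence can be made concrete with a small numerical experiment. Below is a minimal numpy sketch (grid size and interval are arbitrary choices of this illustration) showing that a finite-difference version of D = −d²/dx² has strictly positive eigenvalues under Dirichlet BCs, while under periodic BCs the smallest eigenvalue is zero, with the constant function as its eigenvector:

```python
import numpy as np

N, h = 200, 1.0 / 200

# Dirichlet BCs: f(a) = f(b) = 0 -> strictly positive spectrum, cf. (1.134)
D_dir = (np.diag(np.full(N, 2.0)) - np.diag(np.ones(N - 1), 1)
         - np.diag(np.ones(N - 1), -1)) / h**2
print(np.linalg.eigvalsh(D_dir)[:3])   # all > 0

# Periodic BCs: f(a) = f(b), f'(a) = f'(b) -> smallest eigenvalue 0, case (3)
D_per = D_dir.copy()
D_per[0, -1] = D_per[-1, 0] = -1.0 / h**2
print(np.linalg.eigvalsh(D_per)[:3])   # first eigenvalue ~ 0 (constant f)
```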

1.5 Commutator and Canonical Commutation Relation

In quantum mechanics it is important whether two operators A and B are commutable. In this context, a commutator between A and B is defined such that

[A, B] ≡ AB − BA.  (1.139)

If [A, B] = 0 (zero matrix), A and B are said to be commutable (or commutative). If [A, B] ≠ 0, A and B are non-commutative. Such relationships between two operators are called commutation relations.

We have the canonical commutation relation as an underlying concept of quantum mechanics. This is defined between a (canonical) coordinate q and a (canonical) momentum p such that

[q, p] = iħ,  (1.140)

where the presence of a unit matrix E is implied. Explicitly writing it, we have

[q, p] = iħE.  (1.141)

The relations (1.140) and (1.141) are called the canonical commutation relation. On the basis of the relation p = (ħ/i) ∂/∂q, a brief proof of this is as follows:

[q, p]|ψ⟩ = (qp − pq)|ψ⟩ = [q (ħ/i)(∂/∂q) − (ħ/i)(∂/∂q) q]|ψ⟩
= q (ħ/i)(∂|ψ⟩/∂q) − (ħ/i) ∂(q|ψ⟩)/∂q
= q (ħ/i)(∂|ψ⟩/∂q) − (ħ/i)(∂q/∂q)|ψ⟩ − q (ħ/i)(∂|ψ⟩/∂q)
= −(ħ/i)|ψ⟩ = iħ|ψ⟩.  (1.142)

Since |ψ⟩ is an arbitrarily chosen vector, we have (1.140).
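This one-line computation can be reproduced symbolically. The following minimal sketch (assuming the sympy library) applies qp − pq to an arbitrary function ψ(q) in the coordinate representation:

```python
import sympy as sp

q = sp.symbols('q', real=True)
hbar = sp.symbols('hbar', positive=True)
psi = sp.Function('psi')(q)          # an arbitrary state in the q-representation

p = lambda f: (hbar / sp.I) * sp.diff(f, q)   # p = (ħ/i) ∂/∂q

commutator = q * p(psi) - p(q * psi)          # (qp - pq)ψ as in (1.142)
print(sp.simplify(commutator))                # -> I*hbar*psi(q)
```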


Using (1.117), we have

[A, B]† = (AB − BA)† = B†A† − A†B†.  (1.143)

If in (1.143) A and B are both Hermitian, we have

[A, B]† = BA − AB = −[A, B].  (1.144)

If we have an operator G such that

G† = −G,  (1.145)

G is said to be anti-Hermitian. Therefore, [A, B] is anti-Hermitian if both A and B are Hermitian. If an anti-Hermitian operator has an eigenvalue, that eigenvalue is zero or pure imaginary. To show this, suppose that

G|ψ⟩ = λ|ψ⟩,  (1.146)

where G is an anti-Hermitian operator and |ψ⟩ has been normalized. As in the case of (1.123) we have

⟨ψ|G|ψ⟩ = λ⟨ψ|ψ⟩ = λ.  (1.147)

Taking a complex conjugate of (1.147), we have

⟨ψ|Gψ⟩* = λ*.  (1.148)

Using (1.116) and (1.145) again, we have

λ* = ⟨ψ|Gψ⟩* = ⟨ψ|G†ψ⟩ = −⟨ψ|Gψ⟩ = −λ.  (1.149)

This shows that λ is zero or pure imaginary.

Therefore, (1.142) can be viewed as an eigenvalue equation in which any physical state |ψ⟩ has a pure imaginary eigenvalue iħ with respect to [q, p]. Note that both q and p are Hermitian (see Sect. 10.2, Example 10.3), and so [q, p] is anti-Hermitian as mentioned above. The canonical commutation relation given by (1.140) is believed to underpin the uncertainty principle.
In quantum mechanics, it is of great importance whether a quantum operator is Hermitian or not. A position operator and a momentum operator, along with an angular momentum operator, are particularly important when we constitute a Hamiltonian. Let f and g be arbitrary functions. Let us consider, e.g., the following inner product with the momentum operator:

⟨g|pf⟩ = ∫_a^b g(x)* (ħ/i)(∂/∂x)[f(x)] dx,  (1.150)

where the domain [a, b] depends on a physical system; this can be either bounded or unbounded. Performing integration by parts, we have

⟨g|pf⟩ = (ħ/i)[g(x)* f(x)]_a^b − ∫_a^b (ħ/i)[∂g(x)*/∂x] f(x) dx
= (ħ/i)[g(b)* f(b) − g(a)* f(a)] + ∫_a^b [(ħ/i)(∂/∂x)g(x)]* f(x) dx.  (1.151)

If we require f(b) = f(a) and g(b) = g(a), the first term vanishes and we get

⟨g|pf⟩ = ∫_a^b [(ħ/i)(∂/∂x)g(x)]* f(x) dx = ⟨pg|f⟩.  (1.152)

Thus, as in the case of (1.120), the momentum operator p is Hermitian. Note that the position operator q of (1.142) is Hermitian as an a priori assumption.

Meanwhile, the z-component of the angular momentum operator L_z is described in polar coordinates as follows:

L_z = (ħ/i) ∂/∂ϕ,  (1.153)

where ϕ is an azimuthal angle varying from 0 to 2π. The notation and implication of L_z will be mentioned in Chap. 3. Similarly as the above, we have

⟨g|L_z f⟩ = (ħ/i)[g(2π)* f(2π) − g(0)* f(0)] + ∫_0^{2π} [(ħ/i)(∂/∂ϕ)g(ϕ)]* f(ϕ) dϕ.  (1.154)

Requiring an arbitrary function f to satisfy the BC f(2π) = f(0), we reach

⟨g|L_z f⟩ = ⟨L_z g|f⟩.  (1.155)

Note that we must have the above BC, because ϕ = 0 and ϕ = 2π are spatially the same point. Thus, we find that L_z is Hermitian as well on this condition.

On the basis of the aforementioned argument, let us proceed to quantum-mechanical studies of a harmonic oscillator. Regarding the angular momentum, we will study its basic properties in Chap. 3.
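As a sanity check, the Hermiticity condition (1.155) can be tested numerically for periodic trial functions. The sketch below (assuming numpy; the trial functions are arbitrary trigonometric polynomials satisfying f(2π) = f(0)) compares ⟨g|L_z f⟩ with ⟨L_z g|f⟩ by the trapezoidal rule:

```python
import numpy as np

phi = np.linspace(0.0, 2.0 * np.pi, 4001)
f = np.exp(2j * phi) + 0.5 * np.exp(-1j * phi)   # periodic: f(2π) = f(0)
g = np.exp(1j * phi) - 0.3 * np.exp(3j * phi)    # periodic as well

hbar = 1.0
Lz = lambda y: (hbar / 1j) * np.gradient(y, phi)  # L_z = (ħ/i) ∂/∂φ, (1.153)

def integrate(y):
    # simple trapezoidal quadrature over [0, 2π]
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(phi))

lhs = integrate(np.conj(g) * Lz(f))   # ⟨g|L_z f⟩
rhs = integrate(np.conj(Lz(g)) * f)   # ⟨L_z g|f⟩
print(abs(lhs - rhs))                 # ~0 to discretization accuracy, cf. (1.155)
```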

Reference

1. Møller C (1952) The theory of relativity. Oxford University Press, London


Chapter 2
Quantum-Mechanical Harmonic Oscillator

Quantum-mechanical treatment of a harmonic oscillator has been a well-studied


topic from the beginning of the history of quantum mechanics. This topic is a
standard subject in classical mechanics as well. In this chapter, first we briefly
survey characteristics of a classical harmonic oscillator. From a quantum-mechanical
point of view, we deal with features of a harmonic oscillator through matrix
representation. We define creation and annihilation operators using position and
momentum operators. A Hamiltonian of the oscillator is described in terms of the
creation and annihilation operators. This enables us to easily determine energy
eigenvalues of the oscillator. As a result, energy eigenvalues are found to be positive
definite. Meanwhile, we express the Schrödinger equation by the coordinate repre-
sentation. We compare the results with those of the matrix representation and show
that the two representations are mathematically equivalent. Thus, the treatment of the
quantum-mechanical harmonic oscillator supplies us with a firm ground for studying
basic concepts of the quantum mechanics.

2.1 Classical Harmonic Oscillator

Classical Newtonian equation of a one-dimensional harmonic oscillator is expressed as

m d²x(t)/dt² = −s x(t),  (2.1)

where m is the mass of an oscillator and s is a spring constant. Putting s/m = ω², we have

d²x(t)/dt² + ω² x(t) = 0.  (2.2)

In (2.2) we set ω positive, namely,

ω = √(s/m),  (2.3)

where ω is called the angular frequency of the oscillator.

If we replace ω² with λ, we have formally the same equation as (1.61). Two linearly independent solutions of (2.2) are the same as before (see Example 1.1); we have e^{iωt} and e^{−iωt} (ω ≠ 0) as such. Note, however, that in Example 1.2 we were dealing with a quantum state related to the existence probability of a particle in a potential well. In (2.2), however, we are examining the position of a harmonic oscillator undergoing the force of a spring. We are thus considering a different situation.

As a general solution we have

x(t) = a e^{iωt} + b e^{−iωt},  (2.4)

where a and b are suitable constants. Let us consider BCs different from those of Example 1.1 or 1.2 this time. That is, we set BCs such that

x(0) = 0 and x′(0) = v₀ (v₀ > 0).  (2.5)

Notice that (2.5) gives initial conditions (ICs). Mathematically, ICs are included in BCs (see Chap. 10). From (2.4) we have

x(0) = a + b = 0 and x′(0) = iω(a − b) = v₀.  (2.6)

Then, we get a = −b = v₀/2iω. Thus, we get a simple harmonic motion as a solution expressed as

x(t) = (v₀/2iω)(e^{iωt} − e^{−iωt}) = (v₀/ω) sin ωt.  (2.7)

From this, we have

E = K + V = (1/2) m v₀².  (2.8)

In particular, if v₀ = 0, then x(t) ≡ 0. This is a solution of (2.1) that has the meaning that the particle is eternally at rest. It is physically acceptable as well. Notice also that unlike Examples 1.1 and 1.2, the solution has been determined uniquely. This is due to the different BCs.
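The unique solution (2.7) is easily confirmed by direct numerical integration of (2.1) under the ICs (2.5). The following minimal sketch assumes numpy and scipy; the parameter values are arbitrary:

```python
import numpy as np
from scipy.integrate import solve_ivp

m, s, v0 = 1.0, 4.0, 2.0            # mass, spring constant, initial velocity
omega = np.sqrt(s / m)              # (2.3)

# Solve m x'' = -s x with x(0) = 0, x'(0) = v0, cf. (2.1) and (2.5)
sol = solve_ivp(lambda t, y: [y[1], -(s / m) * y[0]],
                (0.0, 10.0), [0.0, v0],
                dense_output=True, rtol=1e-10, atol=1e-12)

t = np.linspace(0.0, 10.0, 7)
x_num = sol.sol(t)[0]
x_exact = (v0 / omega) * np.sin(omega * t)   # (2.7)
print(np.max(np.abs(x_num - x_exact)))       # ~1e-9
```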

From a point of view of a mechanical system, the mathematical formulation of the classical harmonic oscillator resembles that of electromagnetic fields confined within a cavity. We return to this point later in Sect. 9.6.

2.2 Formulation Based on an Operator Method

Now let us return to our task of finding quantum-mechanical solutions of a harmonic oscillator. The potential V is given by

V(q) = (1/2) s q² = (1/2) mω²q²,  (2.9)

where q is used for a one-dimensional position coordinate. Then, we have a classical Hamiltonian H expressed as

H = p²/2m + V(q) = p²/2m + (1/2) mω²q².  (2.10)

Following the formulation of Sect. 1.2, the Schrödinger equation as an eigenvalue equation related to the energy E is described as

Hψ(q) = Eψ(q)

or

−(ħ²/2m) ∇²ψ(q) + (1/2) mω²q² ψ(q) = Eψ(q).  (2.11)

This is a SOLDE, and it is well known that such a SOLDE can be solved by a power series expansion method.

In the present studies, however, let us first use an operator method to solve the eigenvalue equation (2.11) of a one-dimensional oscillator. To this end, we use a quantum-mechanical Hamiltonian in which the momentum operator p is explicitly represented. Thus, the Hamiltonian reads as

H = p²/2m + (1/2) mω²q².  (2.12)

Equation (2.12) is formally the same as (2.10). Note, however, that in (2.12) p and q are expressed as quantum-mechanical operators.

As in (1.126), we first examine the expectation value ⟨H⟩ of H. It is given by

⟨H⟩ = ⟨ψ|Hψ⟩ = ⟨ψ|(p²/2m)ψ⟩ + ⟨ψ|(1/2)mω²q²ψ⟩
= (1/2m)⟨p†ψ|pψ⟩ + (1/2)mω²⟨q†ψ|qψ⟩ = (1/2m)⟨pψ|pψ⟩ + (1/2)mω²⟨qψ|qψ⟩
= (1/2m)‖pψ‖² + (1/2)mω²‖qψ‖² ≥ 0,  (2.13)

where again we assumed that |ψ⟩ has been normalized. In (2.13) we used the notation (1.126) and the fact that both q and p are Hermitian. In this situation, ⟨H⟩ takes a non-negative value.

In (2.13), the equality holds if and only if |pψ⟩ = 0 and |qψ⟩ = 0. Let us specify a vector |ψ₀⟩ that satisfies these conditions such that

|pψ₀⟩ = 0 and |qψ₀⟩ = 0.  (2.14)

Multiplying q from the left on the first equation of (2.14) and multiplying p from the left on the second equation, we have

qp|ψ₀⟩ = 0 and pq|ψ₀⟩ = 0.  (2.15)

Subtracting the second equation of (2.15) from the first equation, we get

(qp − pq)|ψ₀⟩ = iħ|ψ₀⟩ = 0,  (2.16)

where with the first equality we used (1.140). Therefore, we would have |ψ₀(q)⟩ ≡ 0. This leads to the relations (2.14). That is, if and only if |ψ₀(q)⟩ ≡ 0, ⟨H⟩ = 0. But, since it has no physical meaning, |ψ₀(q)⟩ ≡ 0 must be rejected as unsuitable for a solution of (2.11). Regarding a physically acceptable solution of (2.13), ⟨H⟩ must take a positive definite value accordingly. Thus, on the basis of the canonical commutation relation, we restrict the range of the expectation values.
Instead of directly dealing with (2.12), it is well known to introduce the following operators [1]:

a ≡ √(mω/2ħ) q + i p/√(2mħω)  (2.17)

and its adjoint (complex conjugate) operator

a† = √(mω/2ħ) q − i p/√(2mħω).  (2.18)

Notice here again that both q and p are Hermitian. Using a matrix representation for (2.17) and (2.18), we have

[a ; a†] = [√(mω/2ħ)  i/√(2mħω) ; √(mω/2ħ)  −i/√(2mħω)] [q ; p].  (2.19)

Then we have

a†a = (2mħω)⁻¹ (mωq − ip)(mωq + ip) = (2mħω)⁻¹ [m²ω²q² + p² + imω(qp − pq)]
= (ħω)⁻¹ [(1/2)mω²q² + p²/2m + (1/2)iω · iħ] = (ħω)⁻¹ [H − (1/2)ħω],  (2.20)

where the second last equality comes from (1.140). Rewriting (2.20), we get

H = ħω a†a + (1/2)ħω.  (2.21)

Similarly we get

H = ħω aa† − (1/2)ħω.  (2.22)

Subtracting (2.22) from (2.21), we have

0 = ħω a†a − ħω aa† + ħω.  (2.23)

That is,

[a, a†] = 1 or [a, a†] = E,  (2.24)

where E represents an identity operator.

Furthermore, using (2.21) we have

[H, a†] = ħω [a†a + 1/2, a†] = ħω (a†aa† − a†a†a) = ħω a†[a, a†] = ħω a†.  (2.25)

Similarly, we get

[H, a] = −ħω a.  (2.26)

Next, let us calculate an expectation value of H. Using a normalized function |ψ⟩, from (2.21) we have

⟨ψ|H|ψ⟩ = ⟨ψ|ħω a†a + (1/2)ħω|ψ⟩ = ħω ⟨ψ|a†a|ψ⟩ + (1/2)ħω ⟨ψ|ψ⟩
= ħω ⟨aψ|aψ⟩ + (1/2)ħω = ħω ‖aψ‖² + (1/2)ħω ≥ (1/2)ħω.  (2.27)

Thus, the expectation value is equal to or larger than (1/2)ħω. This is consistent with the fact that an energy eigenvalue is positive definite as mentioned above. Equation (2.27) also tells us that if we have

|aψ₀⟩ = 0,  (2.28)

we get

⟨ψ₀|H|ψ₀⟩ = (1/2)ħω.  (2.29)

Equation (2.29) means that the smallest expectation value is (1/2)ħω on the condition of (2.28). On the same condition, using (2.21) we have

H|ψ₀⟩ = ħω a†a|ψ₀⟩ + (1/2)ħω|ψ₀⟩ = (1/2)ħω|ψ₀⟩.  (2.30)

Thus, |ψ₀⟩ is an eigenfunction corresponding to an eigenvalue (1/2)ħω ≡ E₀, which is identical with the smallest expectation value of (2.29). Since this is the lowest eigenvalue, |ψ₀⟩ is said to be a ground state. We ensure later that |ψ₀⟩ is certainly an eligible function for a ground state.

The above method is consistent with the variational principle [2], which stipulates that under appropriate BCs the expectation value of a Hamiltonian estimated with any arbitrary function is always larger than or equal to the smallest eigenvalue corresponding to the ground state.
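The variational bound is easy to visualize numerically. The minimal sketch below (assuming numpy; units ħ = m = ω = 1, so the exact ground-state energy is 1/2) evaluates ⟨H⟩ for a family of Gaussian trial functions e^{−αq²}:

```python
import numpy as np

q = np.linspace(-10, 10, 4001)
dq = q[1] - q[0]

def energy_expectation(alpha):
    """<H> for the trial function ψ(q) = exp(-α q²), ħ = m = ω = 1."""
    psi = np.exp(-alpha * q**2)
    norm = np.sum(psi**2) * dq
    dpsi = np.gradient(psi, dq)
    kinetic = 0.5 * np.sum(dpsi**2) * dq         # ∫ (1/2)|ψ'|² dq (by parts)
    potential = 0.5 * np.sum(q**2 * psi**2) * dq
    return (kinetic + potential) / norm

for alpha in (0.2, 0.5, 1.0, 2.0):
    print(f"alpha = {alpha}: <H> = {energy_expectation(alpha):.6f}")
```

Every trial value is ≥ 1/2, and the minimum 1/2 is attained exactly at α = 1/2, i.e., at the Gaussian ground state found below in (2.82).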
Next, let us evaluate the energy eigenvalues of the oscillator. First we have

H|ψ₀⟩ = (1/2)ħω|ψ₀⟩ = E₀|ψ₀⟩.  (2.31)

Operating a† on both sides of (2.31) from the left, we have

a†H|ψ₀⟩ = a†E₀|ψ₀⟩.  (2.32)

Meanwhile, using (2.25), we have

a†H|ψ₀⟩ = (Ha† − ħω a†)|ψ₀⟩.  (2.33)

Equating RHSs of (2.32) and (2.33), we get

Ha†|ψ₀⟩ = (E₀ + ħω) a†|ψ₀⟩.  (2.34)

This implies that a†|ψ₀⟩ belongs to an eigenvalue (E₀ + ħω), which is larger than E₀ as expected. Again multiplying a† on both sides of (2.34) from the left and using (2.25), we get

H(a†)²|ψ₀⟩ = (E₀ + 2ħω)(a†)²|ψ₀⟩.  (2.35)

This implies that (a†)²|ψ₀⟩ belongs to an eigenvalue (E₀ + 2ħω). Thus, repeatedly taking the above procedures, we get

H(a†)ⁿ|ψ₀⟩ = (E₀ + nħω)(a†)ⁿ|ψ₀⟩.  (2.36)

Thus, (a†)ⁿ|ψ₀⟩ belongs to an eigenvalue

Eₙ ≡ E₀ + nħω = (n + 1/2)ħω,  (2.37)

where Eₙ denotes an energy eigenvalue of the n-th excited state. The energy eigenvalues are plotted in Fig. 2.1.

Fig. 2.1 Energy eigenvalues of a quantum-mechanical harmonic oscillator on a real axis

Our next task is to seek normalized eigenvectors of the n-th excited state. Let cₙ be a normalization constant of that state. That is, we have

|ψₙ⟩ = cₙ(a†)ⁿ|ψ₀⟩,  (2.38)

where |ψₙ⟩ is a normalized eigenfunction of the n-th excited state. To determine cₙ, let us calculate a|ψₙ⟩. This includes a factor a(a†)ⁿ. We have

a(a†)ⁿ = (aa† − a†a)(a†)ⁿ⁻¹ + a†a(a†)ⁿ⁻¹
= [a, a†](a†)ⁿ⁻¹ + a†a(a†)ⁿ⁻¹ = (a†)ⁿ⁻¹ + a†a(a†)ⁿ⁻¹
= (a†)ⁿ⁻¹ + a†[a, a†](a†)ⁿ⁻² + (a†)²a(a†)ⁿ⁻²
= 2(a†)ⁿ⁻¹ + (a†)²a(a†)ⁿ⁻²
= 2(a†)ⁿ⁻¹ + (a†)²[a, a†](a†)ⁿ⁻³ + (a†)³a(a†)ⁿ⁻³
= 3(a†)ⁿ⁻¹ + (a†)³a(a†)ⁿ⁻³
= ⋯.  (2.39)

In the above procedures, we used [a, a†] = 1. What is implied in (2.39) is that the coefficient of (a†)ⁿ⁻¹ increases one by one as a is transferred toward the right one position at a time in the second term of RHS. Notice that in the second term a is sandwiched such that (a†)ᵐ a (a†)ⁿ⁻ᵐ (m = 1, 2, ⋯). Finally, we have

a(a†)ⁿ = n(a†)ⁿ⁻¹ + (a†)ⁿ a.  (2.40)

Meanwhile, operating a on both sides of (2.38) from the left and using (2.40), we get

a|ψₙ⟩ = cₙ a(a†)ⁿ|ψ₀⟩ = cₙ [n(a†)ⁿ⁻¹ + (a†)ⁿa]|ψ₀⟩ = cₙ n(a†)ⁿ⁻¹|ψ₀⟩
= n (cₙ/cₙ₋₁) cₙ₋₁(a†)ⁿ⁻¹|ψ₀⟩ = n (cₙ/cₙ₋₁) |ψₙ₋₁⟩,  (2.41)

where the third equality comes from (2.28).


Next, operating a on (2.40) we obtain

a²(a†)ⁿ = n a(a†)ⁿ⁻¹ + a(a†)ⁿ a
= n[(n − 1)(a†)ⁿ⁻² + (a†)ⁿ⁻¹a] + a(a†)ⁿa
= n(n − 1)(a†)ⁿ⁻² + n(a†)ⁿ⁻¹a + a(a†)ⁿa.  (2.42)

Operating another a on (2.42), we get

a³(a†)ⁿ = n(n − 1) a(a†)ⁿ⁻² + n a(a†)ⁿ⁻¹a + a²(a†)ⁿa
= n(n − 1)[(n − 2)(a†)ⁿ⁻³ + (a†)ⁿ⁻²a] + n a(a†)ⁿ⁻¹a + a²(a†)ⁿa
= n(n − 1)(n − 2)(a†)ⁿ⁻³ + n(n − 1)(a†)ⁿ⁻²a + n a(a†)ⁿ⁻¹a + a²(a†)ⁿa
= ⋯.  (2.43)

To generalize the above procedures, operating a on (2.40) m (<n) times we get

aᵐ(a†)ⁿ = n(n − 1)(n − 2)⋯(n − m + 1)(a†)ⁿ⁻ᵐ + f(a, a†) a,  (2.44)

where m < n and f(a, a†) is a polynomial in a† that has powers of a as coefficients. Further operating ⟨ψ₀| and |ψ₀⟩ from the left and right on both sides of (2.44), respectively, we have

⟨ψ₀|aᵐ(a†)ⁿ|ψ₀⟩ = n(n − 1)(n − 2)⋯(n − m + 1)⟨ψ₀|(a†)ⁿ⁻ᵐ|ψ₀⟩ + ⟨ψ₀|f(a, a†)a|ψ₀⟩
= n(n − 1)(n − 2)⋯(n − m + 1)⟨ψ₀|(a†)ⁿ⁻ᵐ|ψ₀⟩.  (2.45)

Note that in (2.45) ⟨ψ₀|f(a, a†)a|ψ₀⟩ vanishes because of (2.28).

Meanwhile, taking the adjoint of (2.28), we have

⟨ψ₀|a† = 0.  (2.46)

Operating a† (n − m − 1) times from the right of LHS of (2.46), we have

⟨ψ₀|(a†)ⁿ⁻ᵐ = 0.

Further operating |ψ₀⟩ from the right of the above equation, we obtain

⟨ψ₀|(a†)ⁿ⁻ᵐ|ψ₀⟩ = 0.

Therefore, from (2.45) we get

⟨ψ₀|aᵐ(a†)ⁿ|ψ₀⟩ = 0.  (2.47)

Taking the adjoint of (2.47) once again, we have

⟨ψ₀|aⁿ(a†)ᵐ|ψ₀⟩ = 0.  (2.48)

Equation (2.48) can be obtained by repeatedly using (1.117). From (2.47) and (2.48), we get

⟨ψ₀|aᵐ(a†)ⁿ|ψ₀⟩ = 0,  (2.49)

when m ≠ n. If m = n, from (2.45) we obtain

⟨ψ₀|aⁿ(a†)ⁿ|ψ₀⟩ = n!⟨ψ₀|ψ₀⟩.  (2.50)

If we assume that |ψ₀⟩ is normalized, i.e., ⟨ψ₀|ψ₀⟩ = 1, (2.50) is expressed as

⟨ψ₀|aⁿ(a†)ⁿ|ψ₀⟩ = n!.  (2.51)

From (2.51), if we put

|ψₙ⟩ = (1/√n!)(a†)ⁿ|ψ₀⟩,  (2.52)

then we have

⟨ψₙ| = (1/√n!)⟨ψ₀|aⁿ.

Thus, from (2.49) and (2.52) we get

⟨ψₘ|ψₙ⟩ = δₘₙ.  (2.53)

At the same time, for cₙ of (2.38) we get

cₙ = 1/√n!.  (2.54)

Notice here that an undetermined phase factor e^{iθ} (θ : real) is intended such that

cₙ = (1/√n!) e^{iθ}.

But e^{iθ} is usually omitted for the sake of simplicity. Thus, we have constructed a series of orthonormal eigenfunctions |ψₙ⟩.
Furthermore, using (2.41) and (2.54) we obtain

a|ψₙ⟩ = √n |ψₙ₋₁⟩.  (2.55)

From (2.36), we get

H|ψₙ⟩ = (E₀ + nħω)|ψₙ⟩.  (2.56)

Meanwhile, from (2.21) we have

H|ψₙ⟩ = (ħω a†a + E₀)|ψₙ⟩.  (2.57)

Equating RHSs of (2.56) and (2.57), we get

a†a|ψₙ⟩ = n|ψₙ⟩.  (2.58)

Thus, we find that an integer n is an eigenvalue of a†a when it is evaluated with respect to |ψₙ⟩. For this reason a†a is called a number operator. Notice that a†a is Hermitian, because we have

(a†a)† = a†(a†)† = a†a,  (2.59)

where we used (1.115) and (1.117).

Moreover, from (2.55) and (2.58),

a†a|ψₙ⟩ = √n a†|ψₙ₋₁⟩ = n|ψₙ⟩.  (2.60)

Thus,

a†|ψₙ₋₁⟩ = √n |ψₙ⟩.  (2.61)

Or, replacing n with n + 1, we get

a†|ψₙ⟩ = √(n + 1) |ψₙ₊₁⟩.  (2.62)

As implied in (2.55) and (2.62), we find that operating a on |ψₙ⟩ lowers the energy level by one and that operating a† on |ψₙ⟩ raises the energy level by one. For this reason, a and a† are said to be an annihilation operator and a creation operator, respectively.

2.3 Matrix Representation of Physical Quantities

Equations (2.55) and (2.62) clearly represent the relationship between an operator and an eigenfunction (or eigenvector). The relationship is characterized by

(Matrix) × (Vector) = (Vector).  (2.63)

Thus, we are now in a position to construct this relation using matrices. From (2.53), we should be able to construct basis vectors using column vectors such that

|ψ₀⟩ = (1, 0, 0, 0, 0, ⋯)ᵀ, |ψ₁⟩ = (0, 1, 0, 0, 0, ⋯)ᵀ, |ψ₂⟩ = (0, 0, 1, 0, 0, ⋯)ᵀ, ⋯.  (2.64)

Notice that these vectors form a vector space of an infinite dimension. The orthonormal relation (2.53) can easily be checked using (2.64). We represent a and a† so that (2.55) and (2.62) can be satisfied. We obtain (rows separated by semicolons)

a = [0 1 0 0 0 ⋯ ; 0 0 √2 0 0 ⋯ ; 0 0 0 √3 0 ⋯ ; 0 0 0 0 2 ⋯ ; 0 0 0 0 0 ⋯ ; ⋮ ⋮ ⋮ ⋮ ⋮ ⋱].  (2.65)

Similarly,

a† = [0 0 0 0 0 ⋯ ; 1 0 0 0 0 ⋯ ; 0 √2 0 0 0 ⋯ ; 0 0 √3 0 0 ⋯ ; 0 0 0 2 0 ⋯ ; ⋮ ⋮ ⋮ ⋮ ⋮ ⋱].  (2.66)

Note that neither a nor a† is Hermitian.



Since the determinant of the matrix of (2.19) is −i/ħ ≠ 0, using its inverse matrix we get

[q ; p] = [√(ħ/2mω)  √(ħ/2mω) ; (1/i)√(mħω/2)  −(1/i)√(mħω/2)] [a ; a†].  (2.67)

That is, we have

q = √(ħ/2mω) (a + a†) and p = (1/i)√(mħω/2) (a − a†).  (2.68)

With the inverse matrix, we will deal with it in Part III. Note that q and p are both Hermitian. Inserting (2.65) and (2.66) into (2.68), we get

q = √(ħ/2mω) [0 1 0 0 0 ⋯ ; 1 0 √2 0 0 ⋯ ; 0 √2 0 √3 0 ⋯ ; 0 0 √3 0 2 ⋯ ; 0 0 0 2 0 ⋯ ; ⋮ ⋮ ⋮ ⋮ ⋮ ⋱],  (2.69)

p = i√(mħω/2) [0 −1 0 0 0 ⋯ ; 1 0 −√2 0 0 ⋯ ; 0 √2 0 −√3 0 ⋯ ; 0 0 √3 0 −2 ⋯ ; 0 0 0 2 0 ⋯ ; ⋮ ⋮ ⋮ ⋮ ⋮ ⋱].  (2.70)

Equations (2.69) and (2.70) obviously show that q and p are Hermitian. We can derive various physical quantities from these equations. For instance, following matrix algebra, the Hamiltonian H can readily be calculated. The result is expressed as

H = p²/2m + (1/2) mω²q² = (ħω/2) [1 0 0 0 0 ⋯ ; 0 3 0 0 0 ⋯ ; 0 0 5 0 0 ⋯ ; 0 0 0 7 0 ⋯ ; 0 0 0 0 9 ⋯ ; ⋮ ⋮ ⋮ ⋮ ⋮ ⋱].  (2.71)

Looking at (2.69)–(2.71), we immediately realize that although neither q nor p is diagonalized, H is diagonalized. The matrix representation of (2.71) is said to be a representation that diagonalizes H. This representation is of great practical use. In fact, using (2.64) we get, e.g.,

H|ψ₂⟩ = (ħω/2)(0, 0, 5, 0, 0, ⋯)ᵀ = (5ħω/2)|ψ₂⟩ = (1/2 + 2) ħω |ψ₂⟩.  (2.72)

This clearly means that the second excited state |ψ₂⟩ has an eigenenergy (1/2 + 2)ħω. More generally, we find that |ψₙ⟩ has an eigenenergy (1/2 + n)ħω, as already shown in (2.36) and (2.56).

Furthermore, let us confirm the canonical commutation relation of (1.140). Using (2.68), we have

qp − pq = (1/i) √(ħ/2mω) √(mħω/2) [(a + a†)(a − a†) − (a − a†)(a + a†)]
= (ħ/2i) · (−2) · (aa† − a†a) = iħ [a, a†] = iħ = iħE,  (2.73)

where with the second last equality we used (2.24), and the identity matrix E of an infinite dimension is given by

E = [1 0 0 0 0 ⋯ ; 0 1 0 0 0 ⋯ ; 0 0 1 0 0 ⋯ ; 0 0 0 1 0 ⋯ ; 0 0 0 0 1 ⋯ ; ⋮ ⋮ ⋮ ⋮ ⋮ ⋱].  (2.74)

Thus, the canonical commutation relation holds with the quantum-mechanical harmonic oscillator. This can be confirmed directly from (2.69) and (2.70). The proof is left for readers as an exercise.
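That exercise is convenient to carry out with truncated matrices. The minimal Python sketch below (assuming numpy; units ħ = m = ω = 1, truncation dimension N = 12 chosen arbitrarily) builds a and a† as in (2.65) and (2.66), forms q, p, and H via (2.68) and (2.12), and checks (2.71) and (2.73):

```python
import numpy as np

N = 12
n = np.arange(N)

a = np.diag(np.sqrt(n[1:]), k=1)     # annihilation operator, (2.65)
ad = a.conj().T                      # creation operator, (2.66)

hbar = m = omega = 1.0
q = np.sqrt(hbar / (2 * m * omega)) * (a + ad)           # (2.68)
p = (1 / 1j) * np.sqrt(m * hbar * omega / 2) * (a - ad)  # (2.68)

H = p @ p / (2 * m) + 0.5 * m * omega**2 * q @ q         # (2.12)
print(np.round(np.diag(H).real, 3))  # 0.5, 1.5, 2.5, ... except the last entry

comm = q @ p - p @ q                 # (2.73)
print(np.allclose(comm[:-1, :-1], 1j * hbar * np.eye(N - 1)))  # True
```

Note that truncation necessarily corrupts the last row and column (the √N matrix element is cut off), so the commutation relation is exact only on the upper-left block; in the full infinite-dimensional space it holds exactly, as (2.73) states.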

2.4 Coordinate Representation of Schrödinger Equation

The Schrödinger equation has been given in (1.55) or (2.11) in a SOLDE form. In contrast to the matrix representation, (1.55) and (2.11) are said to be the coordinate representation of the Schrödinger equation. Now, let us derive a coordinate representation of (2.28) to obtain an analytical solution.

On the basis of (1.31) and (2.17), a is expressed as

a = √(mω/2ħ) q + i p/√(2mħω) = √(mω/2ħ) q + [i/√(2mħω)](ħ/i)(∂/∂q)
= √(mω/2ħ) q + √(ħ/2mω) ∂/∂q.  (2.75)

Thus, (2.28) reads as the following first-order linear differential equation (FOLDE):

[√(mω/2ħ) q + √(ħ/2mω) ∂/∂q] ψ₀(q) = 0  (2.76)

or

√(mω/2ħ) q ψ₀(q) + √(ħ/2mω) ∂ψ₀(q)/∂q = 0.  (2.77)

This is further reduced to

∂ψ₀(q)/∂q + (mω/ħ) q ψ₀(q) = 0.  (2.78)

From this FOLDE form, we anticipate the following solution:

ψ₀(q) = N₀ e^{−αq²},  (2.79)

where N₀ is a normalization constant and α is a constant coefficient. Putting (2.79) into (2.78), we have

[−2αq + (mω/ħ) q] N₀ e^{−αq²} = 0.  (2.80)

Hence, we get

α = mω/2ħ.  (2.81)

Thus,

ψ₀(q) = N₀ e^{−(mω/2ħ)q²}.  (2.82)

The constant N₀ can be determined by the normalization condition given by

∫_{−∞}^{∞} |ψ₀(q)|² dq = 1 or N₀² ∫_{−∞}^{∞} e^{−(mω/ħ)q²} dq = 1.  (2.83)

Recalling the following formula:

∫_{−∞}^{∞} e^{−cq²} dq = √(π/c) (c > 0),  (2.84)

we have

∫_{−∞}^{∞} e^{−(mω/ħ)q²} dq = √(πħ/mω).  (2.85)
To get (2.84), putting I ≡ ∫_{−∞}^{∞} e^{−cq²} dq, we have

I² = ∫_{−∞}^{∞} e^{−cq²} dq ∫_{−∞}^{∞} e^{−cs²} ds = ∬ e^{−c(q² + s²)} dq ds
= ∫₀^∞ e^{−cr²} r dr ∫₀^{2π} dθ = (1/2) ∫₀^∞ e^{−cR} dR ∫₀^{2π} dθ = π/c,  (2.86)

where with the third equality we converted the two-dimensional Cartesian coordinates to polar coordinates; take q = r cos θ, s = r sin θ and convert the infinitesimal area element dq ds to dr · r dθ. With the second last equality of (2.86), we used the variable transformation r² → R. Hence, we get I = √(π/c).

Thus, we get

N₀ = (mω/πħ)^{1/4} and ψ₀(q) = (mω/πħ)^{1/4} e^{−(mω/2ħ)q²}.  (2.87)

Also, we have

a† = √(mω/2ħ) q − i p/√(2mħω) = √(mω/2ħ) q − [i/√(2mħω)](ħ/i)(∂/∂q)
= √(mω/2ħ) q − √(ħ/2mω) ∂/∂q.  (2.88)

From (2.52), we get

ψₙ(q) = (1/√n!)(a†)ⁿ|ψ₀⟩ = (1/√n!)[√(mω/2ħ) q − √(ħ/2mω) ∂/∂q]ⁿ ψ₀(q)
= (1/√n!)(mω/2ħ)^{n/2} [q − (ħ/mω) ∂/∂q]ⁿ ψ₀(q).  (2.89)

Putting

β ≡ √(mω/ħ) and ξ = βq,  (2.90)

we rewrite (2.89) as

ψₙ(q) = ψₙ(ξ/β) = (1/√n!)[mω/2ħβ²]^{n/2} (ξ − ∂/∂ξ)ⁿ ψ₀(q)
= (1/√n!)(1/2)^{n/2} (ξ − ∂/∂ξ)ⁿ ψ₀(q)
= [1/√(2ⁿn!)] (mω/πħ)^{1/4} (ξ − ∂/∂ξ)ⁿ e^{−ξ²/2}.  (2.91)

Comparing (2.81) and (2.90), we have

α = β²/2.  (2.92)

Moreover, putting

Nₙ ≡ [1/√(2ⁿn!)] (mω/πħ)^{1/4} = √[β/(π^{1/2} 2ⁿ n!)],  (2.93)

we get

ψₙ(ξ/β) = Nₙ (ξ − ∂/∂ξ)ⁿ e^{−ξ²/2}.  (2.94)

We have to normalize (2.94) with respect to the variable ξ. Since ψₙ(q) has already been normalized as in (2.53), we have

∫_{−∞}^{∞} |ψₙ(q)|² dq = 1.  (2.95)

Changing the variable q to ξ, we have

(1/β) ∫_{−∞}^{∞} |ψₙ(ξ/β)|² dξ = 1.  (2.96)

Let us define ψ̄ₙ(ξ) as being normalized with respect to ξ. In other words, ψₙ(q) is converted to ψ̄ₙ(ξ) by means of a variable transformation and a concomitant change in the normalization condition. Then, we have

∫_{−∞}^{∞} |ψ̄ₙ(ξ)|² dξ = 1.  (2.97)

Comparing (2.96) and (2.97), if we define ψ̄ₙ(ξ) as

ψ̄ₙ(ξ) ≡ (1/√β) ψₙ(ξ/β),  (2.98)

ψ̄ₙ(ξ) should be a properly normalized function. Thus, we get

ψ̄ₙ(ξ) = N̄ₙ (ξ − ∂/∂ξ)ⁿ e^{−ξ²/2} with N̄ₙ ≡ 1/√(π^{1/2} 2ⁿ n!).  (2.99)

Meanwhile, according to the theory of classical orthogonal polynomials, the Hermite polynomials Hₙ(x) are defined as [3]

Hₙ(x) ≡ (−1)ⁿ e^{x²} (dⁿ/dxⁿ) e^{−x²} (n ≥ 0),  (2.100)

where Hₙ(x) is an n-th order polynomial. We wish to show the following relation on the basis of mathematical induction:

ψ̄ₙ(ξ) = N̄ₙ Hₙ(ξ) e^{−ξ²/2}.  (2.101)

Comparing (2.87), (2.98), and (2.99), we make sure that (2.101) holds with n = 0. When n = 1, from (2.99) we have

ψ̄₁(ξ) = N̄₁ (ξ − ∂/∂ξ) e^{−ξ²/2} = N̄₁ [ξ e^{−ξ²/2} − (−ξ) e^{−ξ²/2}] = N̄₁ · 2ξ e^{−ξ²/2}
= N̄₁ [(−1) e^{ξ²} (d/dξ) e^{−ξ²}] e^{−ξ²/2} = N̄₁ H₁(ξ) e^{−ξ²/2}.  (2.102)

Then, (2.101) holds with n = 1 as well.

Next, from the supposition of mathematical induction we assume that (2.101) holds with n. Then, we have

ψ̄ₙ₊₁(ξ) = N̄ₙ₊₁ (ξ − ∂/∂ξ)ⁿ⁺¹ e^{−ξ²/2} = [1/√(2(n + 1))] (ξ − ∂/∂ξ) [N̄ₙ (ξ − ∂/∂ξ)ⁿ e^{−ξ²/2}]
= [1/√(2(n + 1))] (ξ − ∂/∂ξ) [N̄ₙ Hₙ(ξ) e^{−ξ²/2}]
= [1/√(2(n + 1))] N̄ₙ (ξ − ∂/∂ξ) [(−1)ⁿ e^{ξ²} (dⁿ/dξⁿ)(e^{−ξ²}) e^{−ξ²/2}]
= [1/√(2(n + 1))] N̄ₙ (−1)ⁿ (ξ − ∂/∂ξ) [e^{ξ²/2} (dⁿ/dξⁿ) e^{−ξ²}]
= [1/√(2(n + 1))] N̄ₙ (−1)ⁿ {ξ e^{ξ²/2} (dⁿ/dξⁿ) e^{−ξ²} − (∂/∂ξ)[e^{ξ²/2} (dⁿ/dξⁿ) e^{−ξ²}]}
= [1/√(2(n + 1))] N̄ₙ (−1)ⁿ {ξ e^{ξ²/2} (dⁿ/dξⁿ) e^{−ξ²} − ξ e^{ξ²/2} (dⁿ/dξⁿ) e^{−ξ²} − e^{ξ²/2} (dⁿ⁺¹/dξⁿ⁺¹) e^{−ξ²}}
= [1/√(2(n + 1))] N̄ₙ (−1)ⁿ⁺¹ e^{ξ²/2} (dⁿ⁺¹/dξⁿ⁺¹) e^{−ξ²}
= N̄ₙ₊₁ (−1)ⁿ⁺¹ e^{ξ²} (dⁿ⁺¹/dξⁿ⁺¹)(e^{−ξ²}) e^{−ξ²/2} = N̄ₙ₊₁ Hₙ₊₁(ξ) e^{−ξ²/2}.

This means that (2.101) holds with n + 1 as well. Thus, it follows that (2.101) is true of n that is zero or any positive integer.
The orthogonality relation reads as

∫_{−∞}^{∞} ψ̄ₘ(ξ)* ψ̄ₙ(ξ) dξ = δₘₙ,  (2.103)

where δₘₙ is the Kronecker delta, for which

δₘₙ = 1 (m = n); δₘₙ = 0 (m ≠ n).  (2.104)

Placing (2.98) back into the function form ψₙ(q), we have

ψₙ(q) = √β ψ̄ₙ(βq).  (2.105)

Using (2.101) and explicitly rewriting (2.105), we get



ψₙ(q) = (mω/πħ)^{1/4} [1/√(2ⁿ n!)] Hₙ(√(mω/ħ) q) e^{−(mω/2ħ)q²} (n = 0, 1, 2, ⋯).  (2.106)

Table 2.1 First six Hermite polynomials
H₀(x) = 1
H₁(x) = 2x
H₂(x) = 4x² − 2
H₃(x) = 8x³ − 12x
H₄(x) = 16x⁴ − 48x² + 12
H₅(x) = 32x⁵ − 160x³ + 120x

We tabulate the first several Hermite polynomials Hₙ(x) in Table 2.1, where the index n represents the highest order of the polynomial. In Table 2.1 we see that even functions and odd functions appear alternately (i.e., parity). This is the case with ψₙ(q) as well, because ψₙ(q) is a product of Hₙ(x) and an even function e^{−(mω/2ħ)q²}.

Combining (2.101) and (2.103) as well as (2.99), the orthogonality relation between the ψ̄ₙ(ξ) (n = 0, 1, 2, ⋯) can be described alternatively as [3]

∫_{−∞}^{∞} e^{−ξ²} Hₘ(ξ) Hₙ(ξ) dξ = √π 2ⁿ n! δₘₙ.  (2.107)

Note that Hₘ(ξ) is a real function, and so Hₘ(ξ)* = Hₘ(ξ). The relation (2.107) is well known as the orthogonality of Hermite polynomials with e^{−ξ²} taken as a weight function [3]. Here the weight function is a real and non-negative function within the domain considered [e.g., (−∞, +∞) in the present case] and independent of the indices m and n. We will deal with it again in Sect. 10.4.
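The orthogonality (2.107) is readily confirmed by quadrature. The sketch below assumes numpy, whose numpy.polynomial.hermite module implements the physicists' Hermite polynomials of (2.100); the grid and the chosen index pairs are arbitrary:

```python
import math
import numpy as np
from numpy.polynomial.hermite import Hermite

xi = np.linspace(-10.0, 10.0, 20001)
dxi = xi[1] - xi[0]
w = np.exp(-xi**2)                 # weight function e^{-ξ²}

def H(n):
    # physicists' Hermite polynomial H_n evaluated on the grid
    return Hermite.basis(n)(xi)

for m, n in [(2, 2), (3, 3), (2, 3), (1, 4)]:
    integral = np.sum(w * H(m) * H(n)) * dxi
    exact = math.sqrt(math.pi) * 2**n * math.factorial(n) * (m == n)
    print(m, n, round(integral, 6), round(exact, 6))   # cf. (2.107)
```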
The relation (2.101) and the orthogonality relationship described as (2.107) can more explicitly be understood as follows: From (2.11) we have the Schrödinger equation of a one-dimensional quantum-mechanical harmonic oscillator such that

−(ħ²/2m) d²u(q)/dq² + (1/2) mω²q² u(q) = E u(q).  (2.108)

Changing the variable as in (2.90), we have

−d²u(ξ)/dξ² + ξ² u(ξ) = (2E/ħω) u(ξ).  (2.109)

Defining a dimensionless parameter

λ ≡ 2E/ħω  (2.110)

and also defining a differential operator D such that



D ≡ −d²/dξ² + ξ²,  (2.111)

we have the following eigenvalue equation:

Du(ξ) = λu(ξ).  (2.112)

We further consider a function v(ξ) such that

u(ξ) = v(ξ) e^{−ξ²/2}.  (2.113)

Then, (2.109) is converted as follows:

[−d²v(ξ)/dξ² + 2ξ dv(ξ)/dξ] e^{−ξ²/2} = (λ − 1) v(ξ) e^{−ξ²/2}.  (2.114)

Since e^{−ξ²/2} does not vanish with any ξ, we have

−d²v(ξ)/dξ² + 2ξ dv(ξ)/dξ = (λ − 1) v(ξ).  (2.115)

If we define another differential operator D̃ (distinguished from D above) such that

D̃ ≡ −d²/dξ² + 2ξ d/dξ,  (2.116)

we have another eigenvalue equation

D̃v(ξ) = (λ − 1) v(ξ).  (2.117)

Meanwhile, we have the following well-known differential equation:

d²Hₙ(ξ)/dξ² − 2ξ dHₙ(ξ)/dξ + 2n Hₙ(ξ) = 0.  (2.118)

This equation is said to be the Hermite differential equation. Using (2.116), (2.118) can be recast as an eigenvalue equation such that

D̃Hₙ(ξ) = 2n Hₙ(ξ).  (2.119)

Therefore, comparing (2.115) and (2.118) and putting


λ = 2n + 1,  (2.120)

we get

v(ξ) = c Hₙ(ξ),  (2.121)

where c is an arbitrary constant. Thus, using (2.113), for a solution of (2.109) we get

uₙ(ξ) = c Hₙ(ξ) e^{−ξ²/2},  (2.122)

where the solution u(ξ) is indexed with n. From (2.110), as an energy eigenvalue we have

Eₙ = (n + 1/2) ħω.

Thus, (2.37) is recovered. The normalization constant c of (2.122) can be decided as in (2.106).

As discussed above, the operator representation and the coordinate representation are fully consistent.

2.5 Variance and Uncertainty Principle

Uncertainty principle is one of the most fundamental concepts of quantum mechan-


ics. To think of this conception on the basis of a quantum harmonic oscillator, let us
introduce a variance operator [4]. Let A be a physical quantity and let hAi be an
expectation value as defined in (1.126). We define a variance operator as

ðΔAÞ2 ,

where we have

ΔA  A - hAi: ð2:123Þ

In (2.123), we assume that hAi is obtained by operating A on a certain physical


state jψ i. Then, we have

ðΔAÞ2 = ðA - hAiÞ2 = A2 - 2hAiA þ hAi2 = A2 - hAi2 : ð2:124Þ

If A is Hermitian, ΔA is Hermitian as well. This is because


54 2 Quantum-Mechanical Harmonic Oscillator

ðΔAÞ{ = A{ - hAi = A - hAi = ΔA, ð2:125Þ

where we used the fact that an expectation value of an Hermitian operator is real.
Then, h(ΔA)2i is non-negative as in the case of (2.13). Moreover, if jψi is an
eigenstate of A, h(ΔA)2i = 0. Therefore, h(ΔA)2i represents a measure of how
large measured values are dispersed when A is measured in reference to a quantum
state jψ i. Also, we define a standard deviation δA as

δA  ðΔAÞ2 : ð2:126Þ

We have a following important theorem on a standard deviation δA [4].


Theorem 2.1 Let A and B be Hermitian operators. If A and B satisfy

½A, B = ik ðk : non‐zero real numberÞ, ð2:127Þ

then we have

δA  δB ≥ jkj=2 ð2:128Þ

in reference to any quantum state jψ i.


Proof We have

½ΔA, ΔB = ½A - hψjAjψ i, B - hψjBjψi = ½A, B = ik: ð2:129Þ

In (2.129), we used the fact that hψ| A| ψi and hψ| B| ψi are just real numbers and
those commute with any operator. Next, we calculate a following quantity in relation
to a real number λ:

kðΔA þ iλΔBÞjψik2 = hψjðΔA - iλΔBÞðΔA þ iλΔBÞjψ i


ð2:130Þ
= ψjðΔAÞ2 jψ - kλ þ ψjðΔBÞ2 jψ λ2 ,

where we used the fact that ΔA and ΔB are Hermitian. For the above quadratic form
to hold with any real number λ, we have

ð- k Þ2 - 4 ψjðΔAÞ2 jψ ψjðΔBÞ2 jψ ≤ 0: ð2:131Þ

Thus, (2.128) will follow. This completes the proof.


On the basis of Theorem 2.1, we find that both δA and δB are positive on
condition that (2.127) holds. We have another important theorem.
2.5 Variance and Uncertainty Principle 55

Theorem 2.2 Let A be an Hermitian operator. The necessary and sufficient condi-
tion for a physical state jψ 0i to be an eigenstate of A is δA = 0.
Proof Suppose that jψ 0i is a normalized eigenstate of A that belongs to an eigen-
value a. Then, we have

ψ 0 jA2 ψ 0 = ahψ 0 jAψ 0 i = a2 hψ 0 jψ 0 i = a2 ,


ð2:132Þ
hψ 0 jAψ 0 i2 = ½ahψ 0 jψ 0 i2 = a2 :

From (2.124) and (2.126), we have

ψ 0 jðΔAÞ2 ψ 0 = 0 i:e:, δA = 0: ð2:133Þ

Note that δA is measured in reference to jψ 0i. Conversely, suppose that δA = 0.


Then,

δA = ψ 0 jðΔAÞ2 ψ 0 = hΔAψ 0 jΔAψ 0 i = jjΔAψ 0 jj, ð2:134Þ

where we used the fact that ΔA is Hermitian. From the definition of norm of (1.121),
for δA = 0 to hold, we have

ΔAψ 0 = ðA - hAiÞψ 0 = 0 i:e:, Aψ 0 = hAiψ 0 : ð2:135Þ

This indicates that ψ 0 is an eigenstate of A that belongs to an eigenvalue hAi. This


completes the proof.
Theorem 2.1 implies that (2.127) holds with any physical state jψi. That is, we
must have δA > 0 and δB > 0, if δA and δB are evaluated in reference to any jψi on
condition that (2.127) holds. From Theorem 2.2, in turn, it follows that eigenstates
cannot exist with A or B under the condition of (2.127).
To explicitly show this, we take an inner product using (2.127). That is, with
Hermitian operators A and B, consider the following inner product:

hψj½A, Bjψ i = hψjikjψ i i:e:, hψjAB - BAjψ i = ik, ð2:136Þ

where we assumed that jψi is arbitrarily chosen normalized vector. Suppose now
that jψ 0i is an eigenstate of A that belongs to an eigenvalue a. Then, we have

A j ψ 0 i = a j ψ 0 i: ð2:137Þ

Taking an adjoint of (2.137), we get


56 2 Quantum-Mechanical Harmonic Oscillator

hψ 0 j A{ = hψ 0 j A = hψ 0 j a = ahψ 0 j , ð2:138Þ

where the last equality comes from the fact that A is Hermitian. From (2.138), we
would have

hψ 0 jAB - BAjψ 0 i = hψ 0 jABjψ 0 i - hψ 0 jBAjψ 0 i


= hψ 0 jaBjψ 0 i - hψ 0 jBajψ 0 i = ahψ 0 jBjψ 0 i - ahψ 0 jBjψ 0 i = 0:

This would imply that (2.136) does not hold with jψ 0i, in contradiction to (2.127),
where ik ≠ 0. Namely, we conclude that any physical state cannot be an eigenstate of
A on condition that (2.127) holds. Equation (2.127) is rewritten as

hψjBA - ABjψ i = - ik: ð2:139Þ

Suppose now that jφ0i is an eigenstate of B that belongs to an eigenvalue b. Then,


we can similarly show that any physical state cannot be an eigenstate of B.
Summarizing the above, we restate that once we have a relation [A, B] = ik (k ≠ 0),
their representation matrix does not diagonalize A or B. Or, once we postulate [A,
B] = ik (k ≠ 0), we must abandon an effort to have a representation matrix that
diagonalizes A and B. In the quantum-mechanical formulation of a harmonic oscil-
lator, we have introduced the canonical commutation relation (see Sect. 2.3)
described by [q, p] = iħ; see (1.140) and (2.73). Indeed, neither q nor p is diagonal-
ized as shown in (2.69) or (2.70).
Example 2.1 Taking a quantum harmonic oscillator as an example, we consider
variance of q and p in reference to jψ ni (n = 0, 1, ⋯). We have

ðΔqÞ2 = ψ n jq2 jψ n - hψ n jqjψ n i2 : ð2:140Þ

Using (2.55) and (2.62) as well as (2.68), we get

ħ
hψ n jqjψ n i = ψ ja þ a{ jψ n
2mω n
ð2:141Þ
ħn ħ ð n þ 1Þ
= hψ jψ iþ ψ n jψ nþ1 = 0,
2mω n n - 1 2mω

where the last equality comes from (2.53). We have

ħ 2 ħ 2
q2 = a þ a{ = a2 þ E þ 2a{ a þ a{ , ð2:142Þ
2mω 2mω

where E denotes an identity operator and we used (2.24) along with the following
relation:
2.5 Variance and Uncertainty Principle 57

aa{ = aa{ - a{ a þ a{ a = a, a{ þ a{ a = E þ a{ a: ð2:143Þ

Using (2.55) and (2.62), we have

ħ ħ
ψ n jq2 jψ n = hψ n jψ n i þ 2hψ n ja{ aψ n i = ð2n þ 1Þ, ð2:144Þ
2mω 2mω

where we used (2.60) with the last equality. Thus, we get

ħ
ðΔqÞ2 = ψ n jq2 jψ n - hψ n jqjψ n i2 = ð2n þ 1Þ:
2mω

Following similar procedures to those mentioned above, we get

mħω
hψ n jpjψ n i = 0 and ψ n jp2 jψ n = ð2n þ 1Þ: ð2:145Þ
2

Thus, we get

mħω
ðΔpÞ2 = ψ n jp2 jψ n = ð2n þ 1Þ:
2

Accordingly, we have

ħ ħ
δq  δp = ðΔqÞ2  ðΔpÞ2 = ð2n þ 1Þ ≥ : ð2:146Þ
2 2

The quantity δq  δp is equal to ħ2 for n = 0 and becomes larger with increasing n.


The above example gives a good illustration for Theorem 2.1. Note that putting
A = q and B = p along with k = ħ in Theorem 2.1, from (1.140) we should have

ħ
δq  δp ≥ : ð2:147Þ
2

Thus, the quantum-mechanical harmonic oscillator gives us an interesting exam-


ple with respect to the standard deviation of the position operator and momentum
operator. That is, (2.146) is a special case of (2.147). The general relation
represented by (2.147) is well-known as uncertainty principle.
In relation to the aforementioned argument, we might well wonder if Examples
1.1 and 1.2 have an eigenstate of a fixed momentum. Suppose that we chose for an
eigenstate y(x) = ceikx, where c is a constant. Then, we would have ħi ∂y∂xðxÞ = ħkyðxÞ
and get an eigenvalue ħk for a momentum. Nonetheless, such y(x) does not satisfy
the proper BCs; i.e., y(L) = y(-L ) = 0. This is because eikx never vanishes with any
real numbers of k or x (any complex numbers of k or x, more generally). Thus, we
58 2 Quantum-Mechanical Harmonic Oscillator

cannot obtain a proper solution that has an eigenstate with a fixed momentum in a
confined physical system.

References

1. Messiah A (1999) Quantum mechanics. Dover, New York, NY


2. Stakgold I (1998) Green’s functions and boundary value problems, 2nd edn. Wiley,
New York, NY
3. Lebedev NN (1972) Special functions and their applications. Dover, New York, NY
4. Shimizu A (2003) Upgrade foundation of quantum theory. Saiensu-sha, Tokyo. (in Japanese)
Chapter 3
Hydrogen-Like Atoms

In a history of quantum mechanics, it was first successfully applied to the motion of


an electron in a hydrogen atom along with a harmonic oscillator. Unlike the case of a
one-dimensional harmonic oscillator we dealt with in Chap. 2, however, with a
hydrogen atom we have to consider three-dimensional motion of an electron.
Accordingly, it takes somewhat elaborate calculations to constitute the Hamiltonian.
The calculation procedures themselves, however, are worth following to understand
underlying basic concepts of the quantum mechanics. At the same time, this chapter
is a treasure of special functions. In Chap. 2, we have already encountered one of
them, i.e., Hermite polynomials. Here we will deal with Legendre polynomials,
associated Legendre polynomials, etc. These special functions arise when we deal
with a physical system having, e.g., the spherical symmetry. In a hydrogen atom, an
electron is moving in a spherically-symmetric Coulomb potential field produced by a
proton. This topic provides us with a good opportunity to study various special
functions. The related Schrödinger equation can be separated into an angular part
and a radial part. The solutions of angular parts are characterized by spherical
(surface) harmonics. The (associated) Legendre functions are correlated to them.
The solutions of the radial part are connected to the (associated) Laguerre poly-
nomials. The exact solutions are obtained by the product of the (associated) Legen-
dre functions and (associated) Laguerre polynomials accordingly. Thus, to study the
characteristics of hydrogen-like atoms from the quantum-mechanical perspective is
of fundamental importance.

3.1 Introductory Remarks

The motion of the electron in hydrogen is well-known as a two-particle problem


(or two-body problem) in a central force field. In that case, the coordinate system of
the physical system is separated into the relative coordinates and center-of-mass
coordinates. To be more specific, the coordinate separation is true of the case where

© Springer Nature Singapore Pte Ltd. 2023 59


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_3
60 3 Hydrogen-Like Atoms

two particles are moving under control only by a force field between the two
particles without other external force fields [1].
In the classical mechanics, equation of motion is separated into two equations
related to the relative coordinates and center-of-mass coordinates accordingly. Of
these, a term of the potential field is only included in the equation of motion with
respect to the relative coordinates.
The situation is the same with the quantum mechanics. Namely, the Schrödinger
equation of motion with the relative coordinates is expressed as an eigenvalue
equation that reads as

ħ2 2
- — þ V ðr Þ ψ = Eψ, ð3:1Þ

where μ is a reduced mass of two particles [1], i.e., an electron and a proton; V(r) is a
potential with r being a distance between the electron and proton. In (3.1), we
assume the spherically-symmetric potential; i.e., the potential is expressed only as
a function of the distance r. Moreover, if the potential is coulombic,

ħ2 2 e2
- — - ψ = Eψ, ð3:2Þ
2μ 4πε0 r

where ε0 is permittivity of vacuum and e is an elementary charge.


If we think of hydrogen-like atoms such as He+, Li2+, Be3+, etc., we have an
equation described as

ħ2 2 Ze2
- — - ψ = Eψ, ð3:3Þ
2μ 4πε0 r

where Z is an atomic number and μ is a reduced mass of an electron and a nucleus


pertinent to the atomic (or ionic) species. We start with (3.3) in this chapter.

3.2 Constitution of Hamiltonian

As explicitly described in (3.3), the coulombic potential has a spherical symmetry. In


such a case, it will be convenient to recast (3.3) in a spherical coordinate (or polar
coordinate). As the physical system is of three-dimensional, we have to consider
orbital angular momentum L in Hamiltonian.
We have
3.2 Constitution of Hamiltonian 61

Lx
L = ð e1 e2 e3 Þ Ly , ð3:4Þ
Lz

where e1, e2, and e3 denote orthonormal basis vectors in a three-dimensional


Cartesian space (ℝ3); Lx, Ly, and Lz represent each component of L. The angular
momentum L is expressed in a form of determinant as

e1 e2 e3
L=x×p= x y z ,
px py pz

where x denotes a position vector with respect to the relative coordinates x, y, and z.
That is,

x
x = ðe1 e2 e3 Þ y : ð3:5Þ
z

The quantity p denotes a momentum of an electron (as a particle carrying a


reduced mass μ) with px, py, and pz being their components; p is denoted similarly to
the above.
As for each component of L, we have, e.g.,

Lx = ypz - zpy : ð3:6Þ

To calculate L2, we estimate Lx2, Ly2, and Lz2 separately. We have

Lx 2 = ypz - zpy  ypz - zpy


= ypz ypz - ypz zpy - zpy ypz - zpy zpy
= y2 pz 2 - ypz zpy - zpy ypz þ z2 py 2
= y2 pz 2 - y zpz - iħ py - z ypy - iħ pz þ z2 py 2
= y2 pz 2 þ z2 py 2 - yzpz py - zypy pz þ iħ ypy þ zpz ,

where we have used canonical commutation relation (1.140) in the second to the last
equality. In the above calculations, we used commutability of, e.g., y and pz; z and py.
For example, we have
62 3 Hydrogen-Like Atoms

ħ ∂ ∂ ħ ∂ j ψi ∂ j ψi
pz , y j ψi = y-y j ψi = y -y = 0:
i ∂z ∂z i ∂z ∂z

Since jψi is arbitrarily chosen, this relation implies that pz and y commute. We
obtain similar relations regarding Ly2 and Lz2 as well. Thus, we have

L2 ¼ Lx 2 þ Ly 2 þ Lz 2
¼ y 2 pz 2 þ z 2 py 2 þ z 2 px 2 þ x 2 pz 2 þ x 2 py 2 þ y 2 px 2
þ x2 px 2 - x2 px 2 þ y2 py 2 - y2 py 2 þ z2 pz 2 - z2 pz 2 - yzpz py

þ zypy pz þ zxpx pz þ xzpz px þ xypy px þ yxpx py


þ iħ ypy þ zpz þ zpz þ xpx þ xpx þ ypy

¼ y2 pz 2 þ z 2 py 2 þ z 2 px 2 þ x 2 pz 2 þ x 2 py 2 þ y 2 p x 2 þ x 2 px 2 þ y 2 py 2

þ z2 pz 2 - x2 px 2 þ y2 py 2 þ z2 pz 2 þ yzpz py þ zypy pz þ zxpx pz

þ xzpz px þ xypy px þ yxpx py þ iħ ypy þ zpz þ zpz þ xpx þ xpx þ ypy

¼ r2  p2 - rðr  pÞ  p þ 2iħðr  pÞ:


ð3:7Þ

In (3.7), we are able to ease the calculations by virtue of putting a term (x2px2 -
x px2 + y2py2 - y2py2 + z2pz2 - z2pz2). As a result, for the second term after the
2

second to the last equality we have

- x 2 p x 2 þ y 2 py 2 þ z 2 pz 2 þ yzpz py þ zypy pz þ zxpx pz þ xzpz px

þ xypy px þ yxpx py

¼ - x xpx þ ypy þ zpz px þ y xpx þ ypy þ zpz py

þ z xpx þ ypy þ zpz pz


¼ - rðr  pÞ  p:

The calculations of r2  p2 [the first term of (3.7)] and r  p (in the third term) are
straightforward.
In a spherical coordinate, momentum p is expressed as
3.2 Constitution of Hamiltonian 63

Fig. 3.1 Spherical


coordinate system and (a) z
orthonormal basis set. (a)
Orthonormal basis vectors
e(r), e(θ), and e(ϕ) in ℝ3. (b)
The basis vector e(ϕ) is e(r)
perpendicular to the plane
shaped by the z-axis and a
straight line of y = x tan ϕ
e(φ )

θ e(θ )

O y

(b)
z
e(r)

e(θ )
e(φ )

p = pr eðrÞ þ pθ eðθÞ þ pϕ eðϕÞ , ð3:8Þ

where pr, pθ, and pϕ are components of p; e(r), e(θ), and e(ϕ) are orthonormal basis
vectors of ℝ3 in the direction of increasing r, θ, and ϕ, respectively (see Fig. 3.1). In
Fig. 3.1b, e(ϕ) is perpendicular to the plane shaped by the z-axis and a straight line of
y = x tan ϕ. Notice that the said plane is spanned by e(r) and e(θ). Meanwhile, the
momentum operator is expressed as [2]
64 3 Hydrogen-Like Atoms

ħ
p = —
i
ð3:9Þ
ħ ðrÞ ∂ 1 ∂ 1 ∂
= e þ eðθÞ þ eðϕÞ :
i ∂r r ∂θ r sin θ ∂ϕ

The vector notation of (3.9) corresponds to (1.31). That is, in the Cartesian
coordinate, we have

ħ ħ ∂ ∂ ∂
p= —= e þ e2 þ e3 , ð3:10Þ
i i 1 ∂x ∂y ∂z

where — is said to be nabla (or del), a kind of differential vector operator.


Noting that

r = reðrÞ ,

and using (3.9), we have

ħ ħ ∂
r p=r — =r : ð3:11Þ
i i ∂r

Hence,

2
ħ ∂ ħ ∂ ∂
r ð r  pÞ  p = r r = - ħ2 r 2 2 : ð3:12Þ
i ∂r i ∂r ∂r

Thus, we have

2
∂ ∂ ∂ ∂
L2 = r 2 p2 þ ħ2 r 2 þ 2ħ2 r = r 2 p2 þ ħ2 r2 : ð3:13Þ
∂r 2 ∂r ∂r ∂r

Therefore,

ħ2 ∂ ∂ L2
p2 = - r2 þ 2: ð3:14Þ
r ∂r
2 ∂r r

Notice here that L2 does not contain r (vide infra); i.e., L2 commutes with r2, and
so it can freely be divided by r2. Thus, the Hamiltonian H is represented by
3.2 Constitution of Hamiltonian 65

p2
H = þ V ðr Þ

ð3:15Þ
1 ħ2 ∂ ∂ L2 Ze2
= - 2 r2 þ 2 - :
2μ r ∂r ∂r r 4πε0 r

Thus, the Schrödinger equation can be expressed as

1 ħ2 ∂ ∂ L2 Ze2
- 2 r2 þ 2 - ψ = Eψ: ð3:16Þ
2μ r ∂r ∂r r 4πε0 r

Now, let us describe L2 in a polar coordinate. The calculation procedures are


somewhat lengthy, but straightforward. First we have

x = r sin θ cos ϕ,
y = r sin θ sin ϕ, ð3:17Þ
z = r cos θ,

where we have 0 ≤ θ ≤ π and 0 ≤ ϕ ≤ 2π. Rewriting (3.17) with respect to r, θ, and


ϕ, we get

1=2
r = ðx2 þ y2 þ z2 Þ ,
2 1=2
ð x2 þ y Þ
θ = tan - 1 , ð3:18Þ
z
y
ϕ = tan - 1 :
x

Thus, we have

∂ ∂
Lz = xpy - ypx = - iħ x -y , ð3:19Þ
∂y ∂x
∂ ∂r ∂ ∂θ ∂ ∂ϕ ∂
= þ þ : ð3:20Þ
∂x ∂x ∂r ∂x ∂θ ∂x ∂ϕ

In turn, we have
66 3 Hydrogen-Like Atoms

∂r x
¼ ¼ sin θ cos ϕ,
∂x r
- 1=2
∂θ 1 ð x 2 þ y2 Þ  2x z x
¼  ¼ 
∂x 1 þ ðx þ y Þ=z
2 2 2 2z x2 þ y2 þ z2 ðx2 þ y2 Þ1=2
ð3:21Þ
cos θ cos ϕ
¼ ,
r
∂ϕ 1 1 sin ϕ
¼ y - 2 ¼ - :
∂x 1 þ ðy2 =x2 Þ x r sin θ

In calculating the last two equations of (3.21), we used the differentiation of an


arc tangent function along with a composite function. Namely,

0 1
tan - 1 x = :
1 þ x2

Inserting (3.21) into (3.20), we get

∂ ∂ cos θ cos ϕ ∂ sin ϕ ∂


= sin θ cos ϕ þ - : ð3:22Þ
∂x ∂r r ∂θ r sin θ ∂ϕ

Similarly, we have

∂ ∂ cos θ sin ϕ ∂ cos ϕ ∂


= sin θ sin ϕ þ þ : ð3:23Þ
∂y ∂r r ∂θ r sin θ ∂ϕ

Inserting (3.22) and (3.23) together with (3.17) into (3.19), we get


Lz = - iħ : ð3:24Þ
∂ϕ

In a similar manner, we have

∂ ∂r ∂ ∂θ ∂ ∂ sin θ ∂
= þ = cos θ - :
∂z ∂z ∂r ∂z ∂θ ∂r r ∂θ

Combining this relation with either (3.23) or (3.22), we get

∂ ∂
Lx = iħ sin ϕ þ cot θ cos ϕ , ð3:25Þ
∂θ ∂ϕ
∂ ∂
Ly = - iħ cos ϕ - cot θ sin ϕ : ð3:26Þ
∂θ ∂ϕ

Now, we introduce the following operators:


3.2 Constitution of Hamiltonian 67

LðþÞ  Lx þ iLy and Lð- Þ  Lx - iLy : ð3:27Þ

Then, we have

∂ ∂ ∂ ∂
LðþÞ = ħeiϕ þ i cot θ and Lð- Þ = ħe - iϕ - þ i cot θ : ð3:28Þ
∂θ ∂ϕ ∂θ ∂ϕ

Thus, we get

∂ ∂ ∂ ∂
LðþÞ Lð- Þ ¼ ħ2 eiϕ þ i cot θ e - iϕ - þ icot θ
∂θ ∂ϕ ∂θ ∂ϕ
2 2
∂ 1 ∂ ∂
¼ ħ2 eiϕ e - iϕ - þi - þ i cot θ
∂θ2 sin 2 θ ∂ϕ ∂θ∂ϕ

2 2
∂ ∂ ∂ ∂
þ e -iϕ cot θ - þ i cot θ þ ie - iϕ cot θ - þ i cot θ 2
∂θ ∂ϕ ∂ϕ∂θ ∂ϕ
2 2
∂ ∂ ∂ ∂
¼ - ħ2 þ cot θ þ i þ cot 2 θ 2 :
∂θ 2 ∂θ ∂ϕ ∂ϕ
ð3:29Þ

In the above calculation procedure, we used differentiation of a product function.


For instance, we have

2
∂ ∂ ∂ cot θ ∂ ∂
icot θ =i þ cot θ
∂θ ∂ϕ ∂θ ∂ϕ ∂θ∂ϕ
2
1 ∂ ∂
=i - þ cot θ :
sin θ ∂ϕ
2 ∂θ∂ϕ
2 2
∂ ∂
Note also that ∂θ∂ϕ = ∂ϕ∂θ . This is because we are dealing with continuous and
differentiable functions.
Meanwhile, we have the following commutation relations:

Lx , Ly = iħLz , Ly , Lz = iħLx , and ½Lz , Lx  = iħLy : ð3:30Þ

This can easily be confirmed by requiring canonical commutation relations. The


derivation can routinely be performed, but we show it because the procedures
include several important points. For instance, we have
68 3 Hydrogen-Like Atoms

Lx , Ly = Lx Ly - Ly Lx
= ypz - zpy zpx - xpz - zpx - xpz ypz - zpy
= ypz zpx - ypz xpz - zpy zpx þ zpy xpz
- zpx ypz þ zpx zpy þ xpz ypz - xpz zpy
= ypx pz z - zpx ypz þ zpy xpz - xpz zpy
þ xpz ypz - ypz xpz þ zpx zpy - zpy zpx
= - ypx zpz - pz z þ xpy zpz - pz z = iħ xpy - ypx = iħLz :

In the above calculations, we used the canonical commutation relation as well as


commutability of, e.g., y and px; y and z; px and py. For example, we get

2 2
∂ ∂ ∂ ∂ ∂ j ψi ∂ j ψi
px , py j ψi = - ħ2 - j ψi = - ħ2 - = 0:
∂x ∂y ∂y ∂x ∂x∂y ∂y∂x

In the above equation, we assumed that the order of differentiation with respect to
x and y can be switched. It is because we are dealing with continuous and differen-
tiable normal functions. Thus, px and py commute.
For other important commutation relations, we have

Lx , L2 = 0, Ly , L2 = 0, and Lz , L2 = 0: ð3:31Þ

With the derivation, use

½A, B þ C  = ½A, B þ ½A, C :

The derivation is straightforward and it is left for readers. The relations (3.30) and
(3.31) imply that a simultaneous eigenstate exists for L2 and one of Lx, Ly, and Lz.
This is because L2 commute with them from (3.31), whereas Lz does not commute
with Lx or Ly. The detailed argument about the simultaneous eigenstate can be seen
in Part III.
Thus, we have

LðþÞ Lð- Þ = Lx 2 þ Ly 2 þ i Ly Lx - Lx Ly = Lx 2 þ Ly 2 þ i Ly , Lx
= Lx 2 þ Ly 2 þ ħLz :

Notice here that [Ly, Lx] = - [Lx, Ly] = - iħLz. Hence,

L2 = LðþÞ Lð- Þ þ Lz 2 - ħLz : ð3:32Þ

From (3.24), we have


3.3 Separation of Variables 69

2

L z 2 = - ħ2 : ð3:33Þ
∂ϕ2

Finally, we get

2 2
∂ ∂ 1 ∂
L2 = - ħ2 þ cot θ þ
∂θ 2 ∂θ sin θ ∂ϕ2
2

or

2
1 ∂ ∂ 1 ∂
L2 = - ħ2 sin θ þ : ð3:34Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2

Replacing L2 in (3.15) with that of (3.34), we have

2
ħ2 ∂ ∂ 1 ∂ ∂ 1 ∂
H= - r2 þ sin θ þ
2μr 2 ∂r ∂r sin θ ∂θ ∂θ sin 2 θ ∂ϕ2
Ze2
- : ð3:35Þ
4πε0 r

Thus, the Schrödinger equation of (3.3) takes a following form:

2
ħ2 ∂ ∂ 1 ∂ ∂ 1 ∂ Ze2
- r2 þ sin θ þ - ψ
2μr ∂r
2 ∂r sin θ ∂θ ∂θ sin θ ∂ϕ
2 2 4πε0 r
= Eψ:
ð3:36Þ

3.3 Separation of Variables

If the potential is spherically symmetric (e.g., a Coulomb potential), it is well-known


that the Schrödinger equations of (3.1)–(3.3) can be solved by a method of separa-
tion of variables. More specifically, (3.36) can be separated into two differential
equations one of which only depends on a radial component r and the other of which
depends only upon angular components θ and ϕ.
To apply the method of separation of variables to (3.36), let us first return to
(3.15). Considering that L2 is expressed as (3.34), we assume that L2 has eigenvalues
γ (at any rate if any) and takes eigenfunctions Y(θ, ϕ) (again, if any as well)
corresponding to γ. That is,
70 3 Hydrogen-Like Atoms

L2 Y ðθ, ϕÞ = γY ðθ, ϕÞ, ð3:37Þ

where Y(θ, ϕ) is assumed to be normalized. Meanwhile,

L2 = Lx 2 þ Ly 2 þ Lz 2 : ð3:38Þ

From (3.6), we have

{
Lx { = ypz - zpy = pz { y{ - py { z{ = pz y - py z = ypz - zpy = Lx : ð3:39Þ

Note that pz and y commute, so do py and z. Therefore, Lx is Hermitian, so is Lx2.


More generally if an operator A is Hermitian, so is An (n: a positive integer); readers,
please show it. Likewise, Ly and Lz are Hermitian as well. Thus, L2 is Hermitian, too.
Next, we consider an expectation value of L2, i.e., hL2i. Let jψi be an arbitrary
normalized non-zero vector (or function). Then,

L2  ψjL2 ψ
= ψjLx 2 ψ þ ψjLy 2 ψ þ ψjLz 2 ψ
= Lx { ψjLx ψ þ Ly { ψjLy ψ þ Lz { ψjLz ψ ð3:40Þ
= hLx ψjLx ψ i þ Ly ψjLy ψ þ hLz ψjLz ψ i
2
= jLx ψ jj2 þ Ly ψ þ jjLz ψ jj2 ≥ 0:

Notice that the second last equality comes from that Lx, Ly, and Lz are Hermitian.
An operator that satisfies (3.40) is said to be non-negative (see Sects. 1.4, 2.2, etc.
where we saw the calculation routines). Note also that in (3.40) the equality holds
only when the following relations hold:

j Lx ψi = j Ly ψi = j Lz ψi = 0: ð3:41Þ

On this condition, we have

j L2 ψi = j Lx 2 þ Ly 2 þ Lz 2 ψi = j Lx 2 ψiþ j Ly 2 ψiþ j Lz 2 ψi
ð3:42Þ
= Lx j Lx ψi þ Ly j Ly ψi þ Lz j Lz ψi = 0:

The eigenfunction that satisfies (3.42) and the next relation (3.43) is a simulta-
neous eigenstate of Lx, Ly, Lz, and L2. This could seem to be in contradiction to the
fact that Lz does not commute with Lx or Ly. However, this is an exceptional case. Let
jψ 0i be the eigenfunction that satisfies both (3.41) and (3.42). Then, we have

j Lx ψ 0 i = j Ly ψ 0 i = j Lz ψ 0 i = j L2 ψ 0 i = 0: ð3:43Þ
3.3 Separation of Variables 71

As can be seen from (3.24) to (3.26) along with (3.34), the operators Lx, Ly, Lz,
and L2 are differential operators. Therefore, (3.43) implies that jψ 0i is a constant. We
will come back this point later. In spite of this exceptional situation, it is impossible
that all Lx, Ly, and Lz as well as L2 take a whole set of eigenfunctions as simultaneous
eigenstates. We briefly show this as follows.
In Chap. 2, we mention that if [A, B] = ik, any physical state cannot be an
eigenstate of A or B. The situation is different, on the other hand, if we have a
following case

½A, B = iC, ð3:44Þ

where A, B, and C are Hermitian operators. The relation (3.30) is a typical example
for this. If C j ψi = 0 in (3.44), jψi might well be an eigenstate of A and/or B.
However, if C j ψi = c j ψi (c ≠ 0), jψi cannot be an eigenstate of A or B. This can
readily be shown in a fashion similar to that described in Sect. 2.5. Let us think of,
e.g., [Lx, Ly] = iħLz. Suppose that for ∃ψ 0 we have Lz j ψ 0i = 0. Taking an inner
product using jψ 0i, from (3.30) we have

ψ 0 j Lx Ly - Ly Lx ψ 0 = 0:

In this case, moreover, even if we have jLxψ 0i = 0 and jLyψ 0i = 0, we have no


inconsistency. If, on the other hand, Lz j ψi = m j ψi (m ≠ 0), jψi cannot be an
eigenstate of Lx or Ly as mentioned above. Thus, we should be careful to deal with a
general situation where we have [A, B] = iC.
In the case where [A, B] = 0; AB = BA, namely A and B commute, we have a
different situation. This relation is equivalent to that an operator AB - BA has an
eigenvalue zero for any physical state jψi. Yet, this statement is of less practical use.
Again, regarding details we wish to make a discussion in Sect. 14.6 of Part III.
Returning to (3.40), let us replace ψ with a particular eigenfunction Y(θ, ϕ). Then,
we have

YjL2 Y = hYjγY i = γ hYjY i = γ ≥ 0: ð3:45Þ

Again, if L2 has an eigenvalue, the eigenvalue should be non-negative. Taking


account of the coefficient ħ2 in (3.34), it is convenient to put

γ = ħ2 λ ðλ ≥ 0Þ: ð3:46Þ

On ground that the solution of (3.36) can be described as

ψ ðr, θ, ϕÞ = Rðr ÞY ðθ, ϕÞ, ð3:47Þ

the Schrödinger equation (3.16) can be rewritten as


72 3 Hydrogen-Like Atoms

1 ħ2 ∂ ∂ L2 Ze2
- 2 r2 þ 2 - Rðr ÞY ðθ, ϕÞ = ERðr ÞY ðθ, ϕÞ: ð3:48Þ
2μ r ∂r ∂r r 4πε0 r

That is,

1 ħ2 ∂ ∂Rðr Þ L2 Y ðθ, ϕÞ Ze2


- 2 r2 Y ðθ, ϕÞ þ R ðr Þ - Rðr ÞY ðθ, ϕÞ
2μ r ∂r ∂r r 2 4πε0 r ð3:49Þ
= ERðr ÞY ðθ, ϕÞ:

Recalling (3.37) and (3.46), we have

1 ħ2 ∂ ∂Rðr Þ ħ2 λY ðθ, ϕÞ Ze2


- 2 r2 Y ðθ, ϕÞ þ R ðr Þ - Rðr ÞY ðθ, ϕÞ
2μ r ∂r ∂r r 2 4πε0 r
= ERðr ÞY ðθ, ϕÞ:
ð3:50Þ

Dividing both sides by Y(θ, ϕ), we get a SOLDE of a radial component as

1 ħ2 ∂ ∂Rðr Þ ħ2 λ Ze2
- 2 r2 þ 2 Rðr Þ - Rðr Þ = ERðr Þ: ð3:51Þ
2μ r ∂r ∂r r 4πε0 r

Regarding angular components θ and ϕ, using (3.34), (3.37), and (3.46), we have

2
1 ∂ ∂ 1 ∂
L2 Y ðθ, ϕÞ = - ħ2 sin θ þ Y ðθ, ϕÞ
sin θ ∂θ ∂θ sin θ ∂ϕ2
2

= ħ2 λY ðθ, ϕÞ: ð3:52Þ

Dividing both sides by ħ2, we get

2
1 ∂ ∂ 1 ∂
- sin θ þ Y ðθ, ϕÞ = λY ðθ, ϕÞ: ð3:53Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2

Notice in (3.53) that the angular part of SOLDE does not depend on a specific
form of the potential.
Now, we further assume that (3.53) can be separated into a zenithal angle part θ
and azimuthal angle part ϕ such that

Y ðθ, ϕÞ = ΘðθÞΦðϕÞ: ð3:54Þ

Then we have
3.3 Separation of Variables 73

2
1 ∂ ∂ΘðθÞ 1 ∂ Φ ð ϕÞ
- sin θ Φ ð ϕÞ þ ΘðθÞ = λΘðθÞΦðϕÞ: ð3:55Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2

Multiplying both sides by sin2θ/Θ(θ)Φ(ϕ) and arranging both the sides, we get

2
1 ∂ Φð ϕÞ sin 2 θ 1 ∂ ∂ΘðθÞ
- = sin θ þ λΘðθÞ : ð3:56Þ
ΦðϕÞ ∂ϕ 2 Θ ðθ Þ sin θ ∂θ ∂θ

Since LHS of (3.56) depends only upon ϕ and RHS depends only on θ, we must
have

LHS of ð3:56Þ = RHS of ð3:56Þ = ηðconstantÞ: ð3:57Þ

Thus, we have a following relation of LHS of (3.56):

1 d2 ΦðϕÞ
- = η: ð3:58Þ
ΦðϕÞ dϕ2

d2
Putting D  - dϕ2
, we get

DΦðϕÞ = ηΦðϕÞ: ð3:59Þ

The SOLDEs of (3.58) and (3.59) are formally the same as (1.61) of Sect. 1.3,
where boundary conditions (BCs) are Dirichlet conditions. Unlike (1.61), however,
we have to consider different BCs, i.e., the periodic BCs.
As in Example 1.1, we adopt two linearly independent solutions. That is, we have

eimϕ and e - imϕ ðm ≠ 0Þ:

As their linear combination, we have

ΦðϕÞ = aeimϕ þ be - imϕ : ð3:60Þ

As BCs, we consider Φ(0) = Φ(2π) and Φ′(0) = Φ′(2π); i.e., we have

a þ b = aei2πm þ be - i2πm : ð3:61Þ

Meanwhile, we have

Φ0 ðϕÞ = aimeimϕ - bime - imϕ : ð3:62Þ

Therefore, from BCs we have


74 3 Hydrogen-Like Atoms

aim - bim = aimei2πm - bime - i2πm :

Then,

a - b = aei2πm - be - i2πm : ð3:63Þ

From (3.61) and (3.63), we have

2a 1 - ei2πm = 0 and 2b 1 - e - i2πm = 0:

If a ≠ 0, we must have m = 0, ± 1, ± 2, ⋯. If a = 0, we must have b ≠ 0 to avoid


having Φ(ϕ)  0 as a solution. In that case, we have m = 0, ± 1, ± 2, ⋯ as well.
Thus, it suffices to put

ΦðϕÞ = ceimϕ ðm = 0, ± 1, ± 2, ⋯Þ:

Therefore, as a normalized function ΦðϕÞ, we get

1
ΦðϕÞ = p eimϕ ðm = 0, ± 1, ± 2, ⋯Þ: ð3:64Þ

Inserting it into (3.58), we have

m2 eimϕ = ηeimϕ :

Therefore, we get

η = m2 ðm = 0, ± 1, ± 2, ⋯Þ: ð3:65Þ

From (3.56) and (3.65), we have

1 d dΘðθÞ m2 ΘðθÞ
- sin θ þ = λΘðθÞ ðm = 0, ± 1, ± 2, ⋯Þ: ð3:66Þ
sin θ dθ dθ sin 2 θ
p
In (3.64) putting m = 0 as an eigenvalue, we have ΦðϕÞ = 1= 2π as a
corresponding eigenfunction. Unlike Examples 1.1 and 1.2, this reflects that the
differential operator -d2/dϕ2 accompanied by the periodic BCs is a non-negative
operator that allows an eigenvalue of zero. Yet, we are uncertain of a range of m. To
clarify this point, we consider generalized angular momentum in the next section.
3.4 Generalized Angular Momentum 75

3.4 Generalized Angular Momentum

We obtained commutation relations of (3.30) among individual angular momentum


components Lx, Ly, and Lz. In an opposite way, we may start with (3.30) to define
angular momentum. Such a quantity is called generalized angular momentum.
Let J be a generalized angular momentum as in the case of (3.4) such that

Jx
J = ð e1 e2 e3 Þ Jy : ð3:67Þ
Jz

For the sake of simple notation, let us define J as follows so that we can eliminate
ħ and deal with dimensionless quantities in the present discussion:

J x =ħ Jx
J  J=ħ = ðe1 e2 e3 Þ J y =ħ = ð e1 e 2 e 3 Þ Jy ,
ð3:68Þ
J z =ħ Jz
J2 = J x 2 þ J y 2 þ J z 2 :

Then, we require the following commutation relations:

J x , J y = iJ z , J y , J z = iJ x , and ½J z , J x  = iJ y : ð3:69Þ

Also, we require Jx, Jy, and Jz to be Hermitian. The operator J2 is Hermitian


accordingly. The relations (3.69) lead to

J x , J2 = 0, J y , J2 = 0, and J z , J2 = 0: ð3:70Þ

This can be confirmed as in the case of (3.30).


As noted above, again a simultaneous eigenstate exists for J2 and one of Jx, Jy,
and Jz. According to the convention, we choose J2 and Jz for the simultaneous
eigenstate. Then, designating the eigenstate by jζ, μi, we have

J 2 j ζ, μi = ζ j ζ, μi and J z j ζ, μi = μ j ζ, μi: ð3:71Þ

The implication of (3.71) is that jζ, μi is the simultaneous eigenstate and that μ is
an eigenvalue of Jz which jζ, μi belongs to with ζ being an eigenvalue of J2 which jζ,
μi belongs to as well.
Since Jz and J2 are Hermitian, both μ and ζ are real (see Sect. 1.4). Of these, ζ ≥ 0
as in the case of (3.45). We define the following operators J(+) and J(-) as in the case
of (3.27):
76 3 Hydrogen-Like Atoms

J ðþÞ  J x þ iJ y and J ð- Þ  J x - iJ y : ð3:72Þ

Then, from (3.69) and (3.70), we get

J ðþÞ , J2 = J ð- Þ , J2 = 0: ð3:73Þ

Also, we obtain the following commutation relations:

J z , J ðþÞ = J ðþÞ ; J z , J ð- Þ = - J ð- Þ ; J ðþÞ , J ð- Þ = 2J z : ð3:74Þ

From (3.70) to (3.72), we get

J2 J ðþÞ j ζ, μi = J ðþÞ J2 j μi = ζJ ðþÞ j ζ, μi,


ð3:75Þ
J2 J ð- Þ j ζ, μi = J ð- Þ J2 j ζ, μi = ζJ ð- Þ j ζ, μi:

Equation (3.75) indicates that both J(+) j ζ, μi and J(-) j ζ, μi are eigenvectors of
J that correspond to an eigenvalue ζ.
2

Meanwhile, from (3.74) we get

J z J ðþÞ j ζ, μi = J ðþÞ ðJ z þ 1Þ j ζ, μi = ðμ þ 1ÞJ ðþÞ j ζ, μi,


ð3:76Þ
J z J ð- Þ j ζ, μi = J ð- Þ ðJ z - 1Þ j ζ, μi = ðμ - 1ÞJ ð- Þ j ζ, μi:

The relation (3.76) means that J(+) j ζ, μi is an eigenvector of Jz corresponding to


an eigenvalue (μ + 1), while J(-) j ζ, μi is an eigenvector of Jz corresponding to an
eigenvalue (μ - 1). This implies that J(+) and J(-) function as raising and lowering
operators (or ladder operators) that have been introduced in this chapter. Thus, using
undetermined constants (or phase factors) aμ(+) and aμ(-), we describe

J ðþÞ j ζ, μi = aμ ðþÞ j ζ, μ þ 1i and J ð- Þ j ζ, μi = aμ ð- Þ j ζ, μ - 1i: ð3:77Þ

Next, let us characterize eigenvalues μ. We have

J x 2 þ J y 2 = J2 - J z 2 : ð3:78Þ

Therefore,

J x 2 þ J y 2 j ζ, μi = J 2 - J z 2 j ζ, μi = ζ - μ2 j ζ, μi: ð3:79Þ

Since (Jx2 + Jy2) is a non-negative operator, its eigenvalues are non-negative as


well, as can be seen from (3.40) and (3.45). Then, we have
3.4 Generalized Angular Momentum 77

ζ - μ2 ≥ 0: ð3:80Þ

Thus, for a fixed value of non-negative ζ, μ is bounded both upwards and


downwards. We define then a maximum of μ as j and a minimum of μ as j′.
Consequently, on the basis of (3.77), we have

J ðþÞ j ζ, ji = 0 and J ð- Þ j ζ, j0 i = 0: ð3:81Þ

This is because we have no quantum state corresponding to jζ, j + 1i or jζ, j′ - 1i.


From (3.75) and (3.81), possible numbers of μ are

j, j - 1, j - 2, ⋯, j0 : ð3:82Þ

From (3.69) and (3.72), we get

J ð- Þ J ðþÞ = J2 - J z 2 - J z , J ðþÞ J ð- Þ = J2 - J z 2 þ J z : ð3:83Þ

Operating these operators on jζ, ji or jζ, j′i and using (3.81) we get

J ð- Þ J ðþÞ j ζ, ji = J2 - J z 2 - J z j ζ, ji = ζ - j2 - j j ζ, ji = 0,
ð3:84Þ
J ðþÞ J ð- Þ j ζ, j0 i= J2 - J z 2 þ J z j ζ, j0 i = ζ - j0 þ j0 j ζ, j0 i = 0:
2

Since jζ, ji ≠ 0 and jζ, j′i ≠ 0, we have

ζ - j2 - j = ζ - j0 þ j0 = 0:
2
ð3:85Þ

This means that

ζ = jðj þ 1Þ = j0 ðj0 - 1Þ: ð3:86Þ

Moreover, from (3.86) we get

jðj þ 1Þ - j0 ðj0 - 1Þ = ðj þ j0 Þðj - j0 þ 1Þ = 0: ð3:87Þ

As j ≥ j′, j - j′ + 1 > 0. From (3.87), therefore, we get

j þ j0 = 0 or j = - j0 : ð3:88Þ

Then, we conclude that the minimum of μ is –j. Accordingly, possible values of μ


are
78 3 Hydrogen-Like Atoms

μ = j, j - 1, j - 2, ⋯, - j - 1, - j: ð3:89Þ

That is, the number μ can take is (2j + 1). The relation (3.89) implies that taking a
positive integer k,

j - k = - j or j = k=2: ð3:90Þ

In other words, j is permitted to take a number zero, a positive integer, or a


positive half-integer (or more precisely, half-odd-integer). For instance, if j = 1/2, μ
can be 1/2 or -1/2. When j = 1, μ can be 1, 0, or - 1.
Finally, we have to decide undetermined constants aμ(+) and aμ(-). To this end,
multiplying hζ, μ - 1j on both sides of the second equation of (3.77) from the left, we
have

ζ, μ - 1jJ ð- Þ jζ, μ = aμ ð- Þ hζ, μ - 1jζ, μ - 1i = aμ ð- Þ , ð3:91Þ

where the second equality comes from that jζ, μ - 1i has been normalized; i.e.,
||| ζ, μ - 1i|| = 1. Meanwhile, taking adjoint of both sides of the first equation of
(3.77), we have

{ 
hζ, μ j J ðþÞ = aμ ðþÞ hζ, μ þ 1 j : ð3:92Þ

However, from (3.72) and the fact that Jx and Jy are Hermitian,

{
J ð þÞ = J ð- Þ : ð3:93Þ

Using (3.93) and replacing μ in (3.92) with μ - 1, we get



hζ, μ - 1 j J ð- Þ = aμ - 1 ðþÞ hζ, μ j : ð3:94Þ

Furthermore, multiplying jζ, μi on (3.94) from the right, we have


 
ζ, μ - 1jJ ð- Þ jζ, μ = aμ - 1 ðþÞ hζ, μjζ, μi = aμ - 1 ðþÞ , ð3:95Þ

where again jζ, μi is assumed to be normalized. Comparing (3.91) and (3.95), we get

aμ ð- Þ = aμ - 1 ðþÞ : ð3:96Þ

Taking an inner product regarding the first equation of (3.77) and its adjoint,
3.5 Orbital Angular Momentum: Operator Approach 79

 2
ζ, μjJ ð- Þ J ðþÞ jζ, μ = aμ ðþÞ aμ ðþÞ hζ, μ þ 1jζ, μ þ 1i = aμ ðþÞ : ð3:97Þ

Once again, the second equality of (3.97) results from the normalization of the
vector.
Using (3.83) as well as (3.71) and (3.86), (3.97) can be rewritten as

ζ, μjJ 2 - J z 2 - J z jζ, μ
2
= ζ, μjjðj þ 1Þ - μ2 - μjζ, μ = hζ, μjζ, μiðj - μÞðj þ μ þ 1Þ = aμ ðþÞ :
ð3:98Þ

Thus, we get

aμ ðþÞ = eiδ ðj - μÞðj þ μ þ 1Þ ðδ : an arbitrary real numberÞ, ð3:99Þ

where eiδ is a phase factor. From (3.96) we also get

aμ ð- Þ = e - iδ ðj - μ þ 1Þðj þ μÞ: ð3:100Þ

In (3.99) and (3.100), we routinely put δ = 0 so that aμ(+) and aμ(-) can be positive
numbers. Explicitly rewriting (3.77), we get

J ðþÞ j ζ, μi = ðj - μÞðj þ μ þ 1Þ j ζ, μ þ 1i,


ð3:101Þ
ð- Þ
J j ζ, μi = ðj - μ þ 1Þðj þ μÞ j ζ, μ - 1i,

where j is a fixed given number chosen from among zero, positive integers, and
positive half-integers (or half-odd-integers).
As discussed above, we have derived various properties and relations with respect
to the generalized angular momentum on the basis of (1) the relation (3.30) or (3.69)
and (2) the fact that Jx, Jy, and Jz are Hermitian operators. This notion is very useful
in dealing with various angular momenta of different origins (e.g., orbital angular
momentum, spin angular momentum, etc.) from a unified point of view. In Chap. 20,
we will revisit this issue in more detail.

3.5 Orbital Angular Momentum: Operator Approach

In Sect. 3.4 we have derived various important results on angular momenta on the
basis of the commutation relations (3.69) and the assumption that Jx, Jy, and Jz are
Hermitian. Now, let us return to the discussion on orbital angular momenta we dealt
with in Sects. 3.2 and 3.3. First, we treat the orbital angular momenta via operator
approach. This approach enables us to understand why a quantity j introduced in
80 3 Hydrogen-Like Atoms

Sect. 3.4 takes a value zero or positive integers with the orbital angular momenta. In
the next section (Sect. 3.6) we will deal with the related issues by an analytical
method.
In (3.28) we introduced differential operators L(+) and L(-). According to Sect.
3.4, we define the following operators to eliminate ħ so that we can deal with
dimensionless quantities:

Mx
M  L=ħ = ðe1 e2 e3 Þ My ,
ð3:102Þ
Mz
M 2 = L2 =ħ2 = M x 2 þ M y 2 þ M z 2 :

Hence, we have

M x = Lx =ħ, M y = Ly =ħ, and M z = Lz =ħ: ð3:103Þ

Moreover, we define the following operators [3]:

∂ ∂
M ðþÞ  M x þ iM y = LðþÞ =ħ = eiϕ þ i cot θ , ð3:104Þ
∂θ ∂ϕ
∂ ∂
M ð- Þ  M x - iM y = Lð- Þ =ħ = e - iϕ - þ i cot θ : ð3:105Þ
∂θ ∂ϕ

Then we have

2
1 ∂ ∂ 1 ∂
M2 = - sin θ þ : ð3:106Þ
sin θ ∂θ ∂θ sin 2 θ ∂ϕ2

Here we execute variable transformation such that

ξ = cos θ ð0 ≤ θ ≤ πÞ or sin θ = 1 - ξ2 : ð3:107Þ

Noting, e.g., that

∂ ∂ξ ∂ ∂ ∂ ∂
= = - sin θ =- 1 - ξ2 , sin θ =
∂θ ∂θ ∂ξ ∂ξ ∂ξ ∂θ
∂ ∂
- sin 2 θ = - 1 - ξ2 , ð3:108Þ
∂ξ ∂ξ

we get
3.5 Orbital Angular Momentum: Operator Approach 81

∂ ξ ∂
M ðþÞ = eiϕ - 1 - ξ2 þi ,
∂ξ 1-ξ ∂ϕ 2

∂ ξ ∂ ð3:109Þ
M ð- Þ = e - iϕ 1 - ξ2 þi ,
∂ξ 1 - ξ ∂ϕ2

2
∂ ∂ 1 ∂
M2 = - 1 - ξ2 - :
∂ξ ∂ξ 1 - ξ2 ∂ϕ2

Although we showed in Sect. 3.3 that m = 0, ± 1, ± 2, ⋯, the range of m was


unclear. The relationship between m and λ in (3.66) remains unclear so far as well.
On the basis of a general approach developed in Sect. 3.4, however, we have known
that the eigenvalue μ of the dimensionless z-component angular momentum Jz is
bounded with its maximum and minimum being j and –j, respectively [see (3.89)],
where j can be zero, a positive integer, or a positive half-odd-integer. Concomitantly,
the eigenvalue ζ of J2 equals j( j + 1).
In the present section, let us reconsider the relationship between m and λ in (3.66)
in light of the knowledge obtained in Sect. 3.4. According to the custom, we replace
μ in (3.89) with m to have

m = j, j - 1, j - 2, ⋯, j - 1, - j: ð3:110Þ

At the moment, we assume that m can be a half-odd-integer besides zero or an


integer [3].
Now, let us define notation of Y(θ, ϕ) that appeared in (3.37). This function is
eligible for a simultaneous eigenstate of M2 and Mz and can be indexed with j and
m as in (3.110). Then, let Y(θ, ϕ) be described accordingly as

j ðθ, ϕÞ  Y ðθ, ϕÞ:


Ym ð3:111Þ

From (3.54) and (3.64), we have

j ðθ, ϕÞ / e
Ym :
imϕ

Therefore, we get

∂ mξ
M ðþÞ Y m
j ðθ, ϕÞ = e

- 1 - ξ2 - j ðθ, ϕÞ
Ym
∂ξ 1 - ξ2
ð3:112Þ
mþ1 -m

= -e iϕ
1-ξ 2
1-ξ 2
j ðθ, ϕÞ
Ym ,
∂ξ

where we used the following equations:


82 3 Hydrogen-Like Atoms

-m -m-1 -1
∂ 1
1 - ξ2 = ð - mÞ 1 - ξ2  1 - ξ2 ð - 2ξÞ
∂ξ 2
-m-2
= mξ 1 - ξ2 ,
-m -m-2
ð3:113Þ

1 - ξ2 j ðθ, ϕÞ = mξ
Ym 1 - ξ2 j ðθ, ϕÞ
Ym
∂ξ
-m
∂Y m
j ðθ, ϕÞ
þ 1 - ξ2 :
∂ξ

Similarly, we get

∂ mξ
M ð- Þ Y m
j ðθ, ϕÞ = e
- iϕ
1 - ξ2 - j ðθ, ϕÞ
Ym
∂ξ 1 - ξ2
ð3:114Þ
- mþ1 m

= e - iϕ 1 - ξ2 1 - ξ2 j ðθ, ϕÞ :
Ym
∂ξ

Let us derive the relations where M(+) or M(-) is successively operated on


j ðθ, ϕÞ. In the case of M , using (3.109) we have
(+)
Ym

mþn n
n ∂
M ðþÞ Y m n inϕ
j ðθ, ϕÞ = ð- 1Þ e 1 - ξ2
∂ξn
-m
 1 - ξ2 j ðθ, ϕÞ :
Ym ð3:115Þ

We confirm this relation by mathematical induction. We have (3.112) by


replacing n with 1 in (3.115). Namely, (3.115) holds when n = 1. Next, suppose
that (3.115) holds with n. Then, using the first equation of (3.109) and noting (3.64),
we have
3.5 Orbital Angular Momentum: Operator Approach 83

nþ1 n
M ðþÞ j ðθ, ϕÞ ¼ M
Ym ðþÞ
M ðþÞ Y m
j ðθ, ϕÞ

∂ ξ ∂
¼ eiϕ - 1 - ξ2 þi
∂ξ 1- ξ 2 ∂ϕ

mþn n -m

× ð- 1Þn einϕ 1 -ξ2 1 - ξ2 j ðθ,ϕÞ
Ym
∂ξn
∂ ξðn þ mÞ
¼ ð- 1Þn eiðnþ1Þϕ - 1- ξ2 -
∂ξ 1 - ξ2
mþn n -m

× 1 - ξ2 1 -ξ2 j ðθ, ϕÞ
Ym
∂ξn
mþn -1
¼ ð- 1Þn eiðnþ1Þϕ - 1 - ξ2 ðm þ nÞ 1- ξ2

-1 n -m
1 ∂
× 1 - ξ2 ð - 2ξÞ 1 - ξ2 j ðθ,ϕÞ
Ym
2 ∂ξn
mþn nþ1 -m

- 1- ξ2 1 - ξ2 1- ξ2 j ðθ,ϕÞ
Ym
∂ξnþ1
mþn -m
ξðn þ mÞ ∂
n
- 1 - ξ2 1 - ξ2 j ðθ,ϕÞ
Ym
1 - ξ2 ∂ξn
mþðnþ1Þ nþ1 -m

¼ ð- 1Þnþ1 eiðnþ1Þϕ 1 - ξ2 1 - ξ2 j ðθ, ϕÞ :
Ym
∂ξnþ1
ð3:116Þ

Notice that the first and third terms in the second last equality canceled each other.
Thus, (3.115) certainly holds with (n + 1).
Similarly, we have [3]

- mþn n
n ∂
M ð- Þ Y m
j ðθ, ϕÞ = e
- inϕ
1 - ξ2
∂ξn
m
 1 - ξ2 j ðθ, ϕÞ :
Ym ð3:117Þ

Proof of (3.117) is left for readers.


From the second equation of (3.81) and (3.114) where m is replaced with -j, we
have
84 3 Hydrogen-Like Atoms

jþ1 -j

M ð- Þ Y j- j ðθ, ϕÞ = e - iϕ 1 - ξ2 1 - ξ2 Y j- j ðθ, ϕÞ
∂ξ

= 0: ð3:118Þ

-j
This implies that 1 - ξ2 Y j- j ðθ, ϕÞ is a constant with respect to ξ. We
describe this as

-j
1 - ξ2 Y j- j ðθ, ϕÞ = c ðc : constant with respect to ξÞ: ð3:119Þ

Meanwhile, putting m = - j and n = 2j + 1 in (3.115) and taking account of the


first equation of (3.77) and the first equation of (3.81), we get

jþ1 2jþ1
2jþ1 ∂
M ðþÞ Y j- j ðθ, ϕÞ = ð- 1Þ2jþ1 eið2jþ1Þϕ 1 - ξ2
∂ξ2jþ1
j
 1 - ξ2 Y j- j ðθ, ϕÞ = 0: ð3:120Þ

This means that

j
1 - ξ2 Y j- j ðθ, ϕÞ = ðat most a 2j‐degree polynomial with ξÞ: ð3:121Þ

Replacing Y j- j ðθ, ϕÞ in (3.121) with that of (3.119), we get

j j
j
c 1 - ξ2 1 - ξ2 = c 1 - ξ2

= ðat most a 2j‐degree polynomial with ξÞ: ð3:122Þ

Here if j is a half-odd-integer, c(1 - ξ2)j of (3.122) cannot be a polynomial. If on


the other hand j is zero or a positive integer, c(1 - ξ2)j is certainly a polynomial and,
j
to top it all, a 2j-degree polynomial with respect to ξ; so is 1 - ξ2 Y j- j ðθ, ϕÞ.
According to the custom, henceforth we use l as zero or a positive integer instead
of j. That is,

l ðθ, ϕÞ ðl : zero or a positive integer Þ:


Y ðθ, ϕÞ  Y m ð3:123Þ

At the same time, so far as the orbital angular momentum is concerned, from
(3.71) and (3.86) we can identify ζ in (3.71) with l(l + 1). Namely, we have
3.5 Orbital Angular Momentum: Operator Approach 85

ζ = lðl þ 1Þ:

Concomitantly, m in (3.110) is determined as

m = l, l - 1, l - 2, ⋯1, 0, - 1, ⋯ - l þ 1, - l: ð3:124Þ

Thus, as expected m is zero or a positive or negative integer. Considering (3.37)


and (3.46), ζ is identical with λ in (3.46). Finally, we rewrite (3.66) such that

1 d dΘðθÞ m2 ΘðθÞ
- sin θ þ = lðl þ 1ÞΘðθÞ, ð3:125Þ
sin θ dθ dθ sin 2 θ

where l is equal to zero or positive integers and m is given by (3.124).


On condition of ξ = cos θ (3.107), defining the following function:

l ðξÞ  ΘðθÞ,
Pm ð3:126Þ

and considering (3.109) along with (3.54), we arrive at the next SOLDE described as

d l ðξÞ
dPm m2
1 - ξ2 þ lðl þ 1Þ - Pm ðξÞ = 0: ð3:127Þ
dξ dξ 1 - ξ2 l

The SOLDE of (3.127) is well-known as the associated Legendre differential


equation. The solutions Pm l ðξÞ are called associated Legendre functions.
In the next section, we characterize the said equation and functions by an
analytical method. Before going into details, however, we further seek characteris-
l ðξÞ by the operator approach.
tics of Pm
Adopting the notation of (3.123) and putting m = l in (3.112), we have

lþ1 -l

M ðþÞ Y ll ðθ, ϕÞ = - eiϕ 1 - ξ2 1 - ξ2 Y ll ðθ, ϕÞ : ð3:128Þ
∂ξ

Corresponding to (3.81), we have M ðþÞ Y ll ðθ, ϕÞ = 0. This implies that

-l
1 - ξ2 Y ll ðθ, ϕÞ = c ðc : constant with respect to ξÞ: ð3:129Þ

From (3.64) and (3.107), we get

Y ll ðθ, ϕÞ = κl sin l θeilϕ , ð3:130Þ

where κ l is another constant that depends on l, but is independent of θ and ϕ. Let us


seek κ l by normalization condition. That is,
86 3 Hydrogen-Like Atoms

2π π π
2
dϕ sin θdθ Y ll ðθ, ϕÞ = 2π  jκ l j2 sin 2lþ1 θdθ = 1, ð3:131Þ
0 0 0

where the integration is performed on a unit sphere. Note that an infinitesimal area
element on the unit sphere is represented by sin θdθdϕ.
We evaluate the above integral denoted as
π
I sin 2lþ1 θdθ: ð3:132Þ
0

Using integration by parts,


π
I = ð - cos θÞ ′ sin 2l θdθ
0
π
π
= - cos θ sin 2l θ 0
þ ð cos θÞ  2l  sin 2l - 1 θ cos θdθ ð3:133Þ
0
π π
= 2l sin 2l - 1 θdθ - 2l sin 2lþ1 θdθ:
0 0

Thus, we get a recurrence relation with respect to I (3.132) such that


π
2l
I= sin 2l - 1 θdθ: ð3:134Þ
2l þ 1 0

Repeating the above process, we get

π
2l 2l - 2 2 22lþ1 ðl!Þ2
I=  ⋯ sin θdθ = : ð3:135Þ
2l þ 1 2l - 1 3 0 ð2l þ 1Þ!

Then,

1 ð2l þ 1Þ! eiχ ð2l þ 1Þ!


jκ l j = or κl = ðχ : realÞ, ð3:136Þ
2l l! 4π 2l l! 4π

where eiχ is an undetermined constant (phase factor) that is to be determined below.


Thus we get

eiχ ð2l þ 1Þ! l ilϕ


Y ll ðθ, ϕÞ = sin θe : ð3:137Þ
2l l! 4π

Meanwhile, in the second equation of (3.101) replacing J(-), j, and μ in (3.101)


l ðθ, ϕÞ instead of jζ, μi, we get
with M(-), l, and m, respectively, and using Y m
3.5 Orbital Angular Momentum: Operator Approach 87

-1 1
Ym
l ðθ, ϕÞ = M ð- Þ Y m
l ðθ, ϕÞ: ð3:138Þ
ðl - m þ 1Þðl þ mÞ

Replacing m with l in (3.138), we have

1
Y ll - 1 ðθ, ϕÞ = p M ð- Þ Y ll ðθ, ϕÞ: ð3:139Þ
2l

Operating M(-)(l - m) times in total on Y ll ðθ, ϕÞ of (3.139), we have

1 l-m
l ðθ, ϕÞ =
Ym M ð- Þ Y ll ðθ, ϕÞ
2lð2l - 1Þ⋯ðl þ m þ 1Þ 1  2⋯ðl - mÞ
ðl þ mÞ! l-m
= M ð- Þ Y ll ðθ, ϕÞ:
ð2lÞ!ðl - mÞ!
ð3:140Þ

Putting m = l, n = l - m, and j = l in (3.117), we have

-m l-m
l-m ∂
M ð- Þ Y ll ðθ, ϕÞ = e - iðl - mÞϕ 1 - ξ2
∂ξl - m
l
 1 - ξ2 Y ll ðθ, ϕÞ : ð3:141Þ

l-m l
Further replacing M ð- Þ Y l ðθ, ϕÞ in (3.140) with that of (3.141), we get

-m l-m
ðl þ mÞ! - iðl - mÞϕ ∂
l ðθ, ϕÞ =
Ym e 1 - ξ2
ð2lÞ!ðl - mÞ! ∂ξl - m
l
 1 - ξ2 Y ll ðθ, ϕÞ : ð3:142Þ

Finally, replacing Y ll ðθ, ϕÞ in (3.142) with that of (3.137) and converting θ to ξ,


we arrive at the following equation:

l-m
eiχ ð2l þ 1Þðl þ mÞ! imϕ - m=2 ∂ l
l ðθ, ϕÞ =
Ym e 1 - ξ2 1 - ξ2 : ð3:143Þ
2l l! 4π ðl - mÞ! ∂ξl - m

Now, let us decide eiχ . Putting m = 0 in (3.143), we have


88 3 Hydrogen-Like Atoms

eiχ ð- 1Þl 2l þ 1 ∂
l
l
Y 0l ðθ, ϕÞ = 1 - ξ2 , ð3:144Þ
2l l!ð- 1Þl 4π ∂ξl

where we put (-1)l on both the numerator and denominator. In RHS of (3.144),

ð- 1Þl ∂l l
1 - ξ2  Pl ðξÞ: ð3:145Þ
2l l! ∂ξl

Equation (3.145) is well-known as Rodrigues formula of Legendre polynomials.


We mention characteristics of Legendre polynomials in the next section. Thus,

eiχ 2l þ 1
Y 0l ðθ, ϕÞ = P ðξÞ: ð3:146Þ
ð- 1Þl 4π l

According to the custom [2], we require Y 0l ð0, ϕÞ to be positive. Noting that θ = 0


corresponds to ξ = 1, we have

eiχ 2l þ 1 eiχ 2l þ 1
Y 0l ð0, ϕÞ = Pl ð1Þ = , ð3:147Þ
ð- 1Þl 4π ð- 1Þl 4π

where we used Pl(1) = 1. For this important relation, see Sect. 3.6.1. Also noting that
eiχ
ð- 1Þl
= 1, we must have

eiχ
=1 or eiχ = ð- 1Þl ð3:148Þ
ð- 1Þl

so that Y 0l ð0, ϕÞ can be positive. Thus, (3.143) is rewritten as

l-m
ð- 1Þl ð2l þ 1Þðl þ mÞ! imϕ - m=2 ∂
l ðθ, ϕÞ =
Ym e 1 - ξ2
2l l! 4π ðl - mÞ! ∂ξl - m
l
 1 - ξ2 : ð3:149Þ

In Sect. 3.3, we mentioned that jψ 0i in (3.43) is a constant. In fact, putting


l = m = 0 in (3.149), we have

Y 00 ðθ, ϕÞ = 1=4π: ð3:150Þ

Thus, as a simultaneous eigenstate of all Lx, Ly, Lz, and L2 corresponding to l = 0


and m = 0, we have
3.5 Orbital Angular Momentum: Operator Approach 89

j ψ 0 i  Y 00 ðθ, ϕÞ:

The normalized functions Y m l ðθ, ϕÞ described as (3.149) define simultaneous


eigenfunctions of L2 (or M2) and Lz (or Mz). Those functions are called spherical
surface harmonics and frequently appear in various fields of mathematical physics.
As in the case of Sect. 2.3, matrix representation enables us to intuitively grasp
the relationship between angular momentum operators and their eigenfunctions
(or eigenvectors). Rewriting the relations of (3.101) so that they can meet the present
purpose, we have

M ð- Þ j l, mi = ðl - m þ 1Þðl þ mÞ j l, m - 1i,
ð3:151Þ
ð þÞ
M j l, mi = ðl - mÞðl þ m þ 1Þ j l, m þ 1i,

where we used l instead of ζ [or λ in (3.46)] to designate the eigenstate.


Now, we know that m takes (2l + 1) different values that correspond to each l.
This implies that the operators can be expressed with (2l + 1, 2l + 1) matrices. As
implied in (3.151), M(-) takes the following form:

M ð- Þ
p
0 2l  1
0 ð2l - 1Þ  2
0 ⋱

= 0 ð2l - k þ 1Þ  k
0
⋱ ⋱
p
0 1  2l
0
ð3:152Þ

where diagonal elements are zero and a (k, k + 1) element is ð2l - k þ 1Þ  k. That
is, non-zero elements are positioned just above the zero diagonal elements.
Correspondingly, we have
90 3 Hydrogen-Like Atoms

0
p
2l1 0
ð2l-1Þ2 0
⋱ 0
ð þÞ
M = ð2l-k þ1Þk 0 , ð3:153Þ
⋱ ⋱
0
⋱ 0
p
12l 0

where again diagonal elements are zero and a (k + 1, k) element is ð2l - k þ 1Þ  k.


In this case, non-zero elements are positioned just below the zero diagonal elements.
Notice also that M(-) and M(+) are adjoint to each other and that these notations
correspond to (2.65) and (2.66).
l ðθ, ϕÞ can be represented by a column vector, as in the case of
Basis functions Y m
Sect. 2.3. These are denoted as follows:

1 0
0 1
0 0
j l, - li = , j l, - l þ 1i = , ⋯,
⋮ ⋮
0 0
0 0
0 0
0 0
⋮ ⋮
j l, l - 1i = , j l, li = , ð3:154Þ
0 0
1 0
0 1

where the first number l in jl, - li, jl, - l + 1i, etc. denotes the quantum number
associated with λ = l(l + 1) given in (3.46) and is kept constant; the latter number
denotes m. Note from (3.154) that the column vector whose k-th row is 1 corresponds
to m such that

m = - l þ k - 1: ð3:155Þ

For instance, if k = 1, m = - l; if k = 2l + 1, m = l, etc.


3.5 Orbital Angular Momentum: Operator Approach 91

The operator M(-) converts the column vector whose (k + 1)-th row is 1 to that
whose k-th row is 1. The former column vector corresponds to jl, m + 1i and the
latter corresponding to jl, mi. Therefore, using (3.152), we get the following
representation:

M ð- Þ j l, m þ 1i = ð2l - k þ 1Þ  k j l, mi = ðl - mÞðl þ m þ 1Þ
ð3:156Þ
× j l, mi,

where the second equality is obtained by replacing k with that of (3.155), i.e.,
k = l + m + 1. Changing m to (m - 1), we get the first equation of (3.151). Similarly,
we obtain the second equation of (3.151) as well. That is, we have

M ðþÞ j l, mi = ð2l - k þ 1Þ  k j l, m þ 1i = ðl - mÞðl þ m þ 1Þ


ð3:157Þ
× j l, m þ 1i:

From (3.32) we have

M 2 = M ðþÞ M ð- Þ þ M z 2 - M z :

In the above, M(+)M(-) and Mz are diagonal matrices and, hence, Mz2 and M2 are
diagonal matrices as well such that
92 3 Hydrogen-Like Atoms

-l

-l þ 1

-lþ 2
Mz = ⋱ ,
k-l


l
0

2l  1

ð2l - 1Þ  2

M ðþÞ M ð- Þ = ,

ð2l - k þ 1Þ  k


1  2l
ð3:158Þ

where k - l and (2l - k + 1)  k represent (k + 1, k + 1) elements of Mz and M(+)M(-),


respectively. Therefore, (k + 1, k + 1) element of M2 is calculated as

ð2l - k þ 1Þ  k þ ðk - lÞ2 - ðk - lÞ = lðl þ 1Þ:

As expected, M2 takes a constant value l(l + 1). A matrix representation is shown


in the following equation such that

lðl þ 1Þ
l ð l þ 1Þ

M =
2
lðl þ 1Þ : ð3:159Þ

l ð l þ 1Þ
lðl þ 1Þ

These expressions are useful to understand how the vectors of (3.154) constitute
simultaneous eigenstates of M2 and Mz. In this situation, the matrix representation is
3.5 Orbital Angular Momentum: Operator Approach 93

said to diagonalize both M2 and Mz. In other words, the quantum states represented
by (3.154) are simultaneous eigenstates of M2 and Mz.
The matrices (3.152) and (3.153) that represent M(-) and M(+), respectively, are
said to be ladder operators or raising and lowering operators, because operating
column vectors those operators convert jmi to jm ∓ 1i as mentioned above. The
operators M(-) and M(+) correspond to a and a{ given in (2.65) and (2.66), respec-
tively. All these operators are characterized by that the corresponding matrices have
diagonal elements of zero and that non-vanishing elements are only positioned on
“right above” or “right below” relative to the diagonal elements.
These matrices are a kind of triangle matrices and all their diagonal elements are
zero. The matrices are characteristic of nilpotent matrices. That is, if a suitable power
of a matrix is zero as a matrix, such a matrix is said to be a nilpotent matrix (see Part
III). In the present case, (2l + 1)-th power of M(-) and M(+) becomes zero as a matrix.
The operator M(-) and M(+) can be described by the following shorthand
representations:

M ð- Þ = ð2l - k þ 1Þ  kδkþ1, j ð1 ≤ k ≤ 2lÞ: ð3:160Þ


kj

If l = 0, Mz = M(+)M(-) = M2 = 0. This case corresponds to Y 00 ðθ, ϕÞ = 1=4π


and we do not need the matrix representation. Defining

ak  ð2l - k þ 1Þ  k,

we have, for instance,

2
M ð- Þ = aδ aδ
p k kþ1,p p pþ1,j
= ak akþ1 δkþ2,j
kj ð3:161Þ
= ð2l - k þ 1Þ  k ½2l - ðk þ 1Þ þ 1  ðk þ 1Þδkþ2, j ,

where the summation is non-vanishing only if p = k + 1. The factor δk + 2, j implies


that the elements are shifted by one toward upper right by being squared. Similarly
we have

M ðþÞ = ak - 1 δk,jþ1 ð1 ≤ k ≤ 2l þ 1Þ: ð3:162Þ


kj

In (3.158), M(+)M(-) is represented as follows:


94 3 Hydrogen-Like Atoms

M ðþÞ M ð- Þ = a δ aδ
p k - 1 k,pþ1 p pþ1, j
= ak - 1 aj - 1 δk, j = ðak - 1 Þ2 δk, j
kj
= ½2l - ðk - 1Þ þ 1  ðk - 1Þδk, j = ð2l - k þ 2Þðk - 1Þδk, j :
ð3:163Þ

Notice that although a0 is not defined, δ1, j + 1 = 0 for any j, and so this causes no
inconvenience. Hence, [M(+)M(-)]kj of (3.163) is well-defined with 1 ≤ k ≤ 2l + 1.
Important properties of angular momentum operators examined above are based
upon the fact that those operators are ladder operators and represented by nilpotent
matrices. These characteristics will further be studied in Parts III–V.

3.6 Orbital Angular Momentum: Analytic Approach

In this section, our central task is to solve the associated Legendre differential
equation expressed by (3.127) by an analytical method. Putting m = 0 in (3.127),
we have

d dP0l ðxÞ
1 - x2 þ lðl þ 1ÞP0l ðxÞ = 0, ð3:164Þ
dx dx

where we use a variable x instead of ξ. Equation (3.164) is called Legendre


differential equation and its characteristics and solutions have been widely investi-
gated. Hence, we put

P0l ðxÞ  Pl ðxÞ, ð3:165Þ

where Pl(x) is said to be Legendre polynomials. We first start with Legendre


differential equation and Legendre polynomials.

3.6.1 Spherical Surface Harmonics and Associated Legendre


Differential Equation

Let us think of a following identity according to Byron and Fuller [4]:

d l l
1 - x2 1 - x2 = - 2lx 1 - x2 , ð3:166Þ
dx

where l is a positive integer. We differentiate both sides of (3.166) (l + 1) times. Here


we use the Leibniz rule about differentiation of a product function that is described
by
3.6 Orbital Angular Momentum: Analytic Approach 95

n n!
dn ðuvÞ = dm udn - m v, ð3:167Þ
m=0 m!ðn - mÞ!

where

dm u=dxm  dm u:

The above shorthand notation is due to Byron and Fuller [4]. We use this notation
for simplicity from place to place.
Noting that the third order and higher order differentiations of (1 - x2) vanish in
LHS of (3.166), we have

l
LHS = dlþ1 1 - x2 d 1 - x2
l l
= 1 - x2 dlþ2 1 - x2 - 2ðl þ 1Þxd lþ1 1 - x2
l
- lðl þ 1Þd l 1 - x2 :

Also noting that the second order and higher order differentiations of 2lx vanish in
LHS of (3.166), we have

l
RHS = - dlþ1 2lx 1 - x2
l l
= - 2lxd lþ1 1 - x2 - 2lðl þ 1Þd l 1 - x2 :

Therefore,

l l l
LHS - RHS = 1 - x2 d lþ2 1 - x2 - 2xd lþ1 1 - x2 þ lðl þ 1Þdl 1 - x2 = 0:

We define Pl(x) as

ð- 1Þl dl l
Pl ðxÞ  1 - x2 , ð3:168Þ
2l l! dxl
l
where a constant ð-2l1l!Þ is multiplied according to the custom so that we can explicitly
represent Rodrigues formula of Legendre polynomials. Thus, from (3.164) Pl(x)
defined above satisfies Legendre differential equation. Rewriting it, we get

d 2 Pl ðxÞ dP ðxÞ
1 - x2 2
- 2x l þ lðl þ 1ÞPl ðxÞ = 0: ð3:169Þ
dx dx

Or equivalently, we have
96 3 Hydrogen-Like Atoms

d dPl ðxÞ
1 - x2 þ lðl þ 1ÞPl ðxÞ = 0: ð3:170Þ
dx dx

Returning to (3.127) and using x as a variable, we rewrite (3.127) as

d l ð xÞ
dPm m2
1 - x2 þ l ð l þ 1Þ - Pm ðxÞ = 0, ð3:171Þ
dx dx 1 - x2 l

where l is a non-negative integer and m is an integer that takes following values:

m = l, l - 1, l - 2, ⋯1, 0, - 1, ⋯ - l þ 1, - l:

Deferential equations expressed as

d dyðxÞ
pð x Þ þ cðxÞyðxÞ = 0
dx dx

are of particular importance. We will come back to this point in Sect. 10.3.
Since m can be either positive or negative, from (3.171) we notice that Pm l ðxÞ and
-m
Pl ðxÞ must satisfy the same differential equation (3.171). This implies that Pm l ðxÞ
and Pl- m ðxÞ are connected, i.e., linearly dependent. First, let us assume that m ≥ 0. In
the case of m < 0, we will examine it later soon.
According to Dennery and Krzywicki [5], we assume

P_l^m(x) = κ(1 - x^2)^{m/2}\, C(x),   (3.172)

where κ is a constant. Inserting (3.172) into (3.171) and rearranging the terms, we
obtain

(1 - x^2)\frac{d^2 C}{dx^2} - 2(m + 1)x\frac{dC}{dx} + (l - m)(l + m + 1)C = 0 \quad (0 ≤ m ≤ l).   (3.173)

Recall once again that if m = 0, the associated Legendre differential equation
given by (3.127) and (3.171) is exactly identical to the Legendre differential equation
(3.170). Differentiating (3.170) m times, we get

(1 - x^2)\frac{d^2}{dx^2}\left(\frac{d^m P_l}{dx^m}\right) - 2(m + 1)x\frac{d}{dx}\left(\frac{d^m P_l}{dx^m}\right) + (l - m)(l + m + 1)\frac{d^m P_l}{dx^m} = 0,   (3.174)

where we used the Leibniz rule (3.167). Comparing (3.173)
and (3.174), we find that
C(x) = κ'\, \frac{d^m P_l}{dx^m},

where κ′ is a constant. Inserting this relation into (3.172) and setting κκ′ = 1, we get

P_l^m(x) = (1 - x^2)^{m/2}\, \frac{d^m P_l(x)}{dx^m} \quad (0 ≤ m ≤ l).   (3.175)

Using the Rodrigues formula (3.168), we have

P_l^m(x) \equiv \frac{(-1)^l}{2^l l!}(1 - x^2)^{m/2}\, \frac{d^{l+m}}{dx^{l+m}}(1 - x^2)^l.   (3.176)

Equations (3.175) and (3.176) define the associated Legendre functions. Note,
however, that the functional form differs from one literature source to another [2, 5, 6].
Among the classical orthogonal polynomials, the Gegenbauer polynomials C_n^λ(x) often
appear in the literature. The relevant differential equation is defined by

(1 - x^2)\frac{d^2}{dx^2}C_n^λ(x) - (2λ + 1)x\frac{d}{dx}C_n^λ(x) + n(n + 2λ)C_n^λ(x) = 0 \quad \left(λ > -\frac{1}{2}\right).   (3.177)

Setting n = l - m and λ = m + \frac{1}{2} in (3.177) [5], we have

(1 - x^2)\frac{d^2}{dx^2}C_{l-m}^{m+1/2}(x) - 2(m + 1)x\frac{d}{dx}C_{l-m}^{m+1/2}(x) + (l - m)(l + m + 1)C_{l-m}^{m+1/2}(x) = 0.   (3.178)

Once again comparing (3.174) and (3.178), we obtain

\frac{d^m P_l(x)}{dx^m} = \text{constant} \times C_{l-m}^{m+1/2}(x) \quad (0 ≤ m ≤ l).   (3.179)

Next, let us determine the constant appearing in (3.179). To this end, we consider
the following generating function of the polynomials C_n^λ(x), defined by [7, 8]

(1 - 2tx + t^2)^{-λ} \equiv \sum_{n=0}^{\infty} C_n^λ(x)\, t^n \quad \left(λ > -\frac{1}{2}\right).   (3.180)

To calculate (3.180), let us consider the following expression for x and λ:

(1 + x)^{-λ} = \sum_{m=0}^{\infty} \binom{-λ}{m} x^m,   (3.181)

where λ is an arbitrary real number and we define

\binom{-λ}{m} \equiv \frac{-λ(-λ - 1)(-λ - 2)⋯(-λ - m + 1)}{m!} \quad \text{and} \quad \binom{-λ}{0} \equiv 1.   (3.182)

Notice that (3.181) with (3.182) is an extension of the binomial theorem (the generalized
binomial theorem). Putting -λ = n, we have

\binom{n}{m} = \frac{n!}{(n - m)!\,m!} \quad \text{and} \quad \binom{n}{0} = 1.

We rewrite (3.181) using gamma functions Γ(z) such that

(1 + x)^{-λ} = \sum_{m=0}^{\infty} \frac{Γ(-λ + 1)}{m!\,Γ(-λ - m + 1)}\, x^m,   (3.183)

where Γ(z) is defined by the integral representation

Γ(z) = \int_0^{\infty} e^{-t}\, t^{z-1}\, dt \quad (\Re e\, z > 0).   (3.184)

In (3.184), ℜe denotes the real part of a number z. Changing variables such that
t = u^2, we have

Γ(z) = 2\int_0^{\infty} e^{-u^2}\, u^{2z-1}\, du \quad (\Re e\, z > 0).   (3.185)

Note that the above expression is associated with the following fundamental
property of the gamma functions:

Γ(z + 1) = zΓ(z),   (3.186)

where z is any complex number.
Replacing x with -t(2x - t) and rewriting (3.183), we have

(1 - 2tx + t^2)^{-λ} = \sum_{m=0}^{\infty} \frac{Γ(-λ + 1)}{m!\,Γ(-λ - m + 1)}\, (-t)^m (2x - t)^m.   (3.187)

Assuming that x is a real number belonging to the interval [-1, 1], (3.187) holds
for t satisfying |t| < 1 [8]. The discussion is as follows: When x satisfies the above
condition, solving 1 - 2tx + t^2 = 0 we get solutions t_± such that

t_± = x ± i\sqrt{1 - x^2}.

Defining r as

r \equiv \min\{|t_+|, |t_-|\},

(1 - 2tx + t^2)^{-λ}, regarded as a function of t, is analytic in the disk |t| < r.
Meanwhile, we have

|t_±| = 1.

Thus, (1 - 2tx + t^2)^{-λ} is analytic within the disk |t| < 1 and, hence, it can be
expanded in a Taylor series (see Chap. 6).
Continuing the calculation of (3.187), we have

(1 - 2tx + t^2)^{-λ}
= \sum_{m=0}^{\infty} \frac{Γ(-λ + 1)}{m!\,Γ(-λ - m + 1)}\, (-1)^m t^m \sum_{k=0}^{m} \frac{m!}{k!(m - k)!}\, 2^{m-k} x^{m-k} (-1)^k t^k
= \sum_{m=0}^{\infty} \sum_{k=0}^{m} \frac{(-1)^{m+k}}{k!(m - k)!} \frac{Γ(-λ + 1)}{Γ(-λ - m + 1)}\, 2^{m-k} x^{m-k} t^{m+k}
= \sum_{m=0}^{\infty} \sum_{k=0}^{m} \frac{(-1)^{m+k} (-1)^m Γ(λ + m)}{k!(m - k)!\,Γ(λ)}\, 2^{m-k} x^{m-k} t^{m+k},   (3.188)

where the last equality results from rewriting the gamma functions by means of (3.186).
Replacing (m + k) with n, we get

(1 - 2tx + t^2)^{-λ} = \sum_{n=0}^{\infty} \sum_{k=0}^{[n/2]} \frac{(-1)^k\, 2^{n-2k}\, Γ(λ + n - k)}{k!(n - 2k)!\,Γ(λ)}\, x^{n-2k} t^n,   (3.189)

where [n/2] represents the greatest integer that does not exceed n/2. This expression comes
from the requirement that the order of x must satisfy the following condition:

n - 2k ≥ 0 \quad \text{or} \quad k ≤ n/2.   (3.190)

That is, if n is even, the maximum of k is n/2; if n is odd, the maximum of
k is (n - 1)/2. Comparing (3.180) and (3.189), we get [8]
C_n^λ(x) = \sum_{k=0}^{[n/2]} \frac{(-1)^k\, 2^{n-2k}\, Γ(λ + n - k)}{k!(n - 2k)!\,Γ(λ)}\, x^{n-2k}.   (3.191)

Comparing (3.164) and (3.177) and putting λ = 1/2, we immediately find that the
two differential equations are identical [7]. That is,

C_n^{1/2}(x) = P_n(x).   (3.192)

Hence, we further have

P_n(x) = \sum_{k=0}^{[n/2]} \frac{(-1)^k\, 2^{n-2k}\, Γ\!\left(\frac{1}{2} + n - k\right)}{k!(n - 2k)!\,Γ\!\left(\frac{1}{2}\right)}\, x^{n-2k}.   (3.193)

Using (3.186) once again, we get [8]

P_n(x) = \sum_{k=0}^{[n/2]} \frac{(-1)^k (2n - 2k)!}{2^n\, k!(n - k)!(n - 2k)!}\, x^{n-2k}.   (3.194)
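The explicit power series (3.194) also lends itself to a direct symbolic check; a small
sketch assuming sympy (the helper name legendre_series is ours):

    import sympy as sp

    x = sp.symbols('x')

    def legendre_series(n):
        # Power series (3.194): sum over k = 0 .. [n/2]
        return sum((-1)**k * sp.factorial(2*n - 2*k)
                   / (2**n * sp.factorial(k) * sp.factorial(n - k)
                      * sp.factorial(n - 2*k)) * x**(n - 2*k)
                   for k in range(n // 2 + 1))

    for n in range(6):
        assert sp.expand(legendre_series(n) - sp.legendre(n, x)) == 0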

It is convenient to establish a formula for the gamma function here. In (3.193) we have
n - k > 0, and so let us consider Γ(1/2 + m) (m: a positive integer). Using (3.186), we have

Γ\!\left(\frac{1}{2} + m\right) = \left(m - \frac{1}{2}\right)Γ\!\left(m - \frac{1}{2}\right) = \left(m - \frac{1}{2}\right)\left(m - \frac{3}{2}\right)Γ\!\left(m - \frac{3}{2}\right) = ⋯
= \left(m - \frac{1}{2}\right)\left(m - \frac{3}{2}\right)⋯\frac{1}{2}\,Γ\!\left(\frac{1}{2}\right)
= 2^{-m}(2m - 1)(2m - 3)⋯3 \cdot 1 \cdot Γ\!\left(\frac{1}{2}\right)   (3.195)
= 2^{-m}\frac{(2m - 1)!}{2^{m-1}(m - 1)!}\,Γ\!\left(\frac{1}{2}\right) = 2^{-2m}\frac{(2m)!}{m!}\,Γ\!\left(\frac{1}{2}\right).

Notice that (3.195) still holds even if m = 0. Inserting n - k into m of (3.195), we
get

Γ\!\left(\frac{1}{2} + n - k\right) = 2^{-2(n-k)}\frac{(2n - 2k)!}{(n - k)!}\,Γ\!\left(\frac{1}{2}\right).   (3.196)

Replacing Γ(1/2 + n - k) in (3.193) with the RHS of the above equation, (3.194)
follows. The gamma function Γ(1/2) often appears in mathematical physics. According to
(3.185), we have
Γ\!\left(\frac{1}{2}\right) = 2\int_0^{\infty} e^{-u^2} du = \sqrt{π}.

For the derivation of the above definite integral, see (2.86) of Sect. 2.4. From
(3.184), we also have

Γ(1) = 1.

In relation to the discussion of Sect. 3.5, let us derive an important formula about
the Legendre polynomials. From (3.180) and (3.192), we get

(1 - 2tx + t^2)^{-1/2} \equiv \sum_{n=0}^{\infty} P_n(x)\, t^n.   (3.197)

Assuming |t| < 1, when we put x = 1 in (3.197), we have

(1 - 2t + t^2)^{-1/2} = \frac{1}{1 - t} = \sum_{n=0}^{\infty} t^n = \sum_{n=0}^{\infty} P_n(1)\, t^n.   (3.198)

Comparing the individual coefficients of t^n in (3.198), we get

P_n(1) = 1.

See the related parts of Sect. 3.5.
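The generating function (3.197) and the result P_n(1) = 1 can likewise be confirmed by
expanding (1 - 2tx + t^2)^{-1/2} in t symbolically; a small sketch assuming sympy:

    import sympy as sp

    x, t = sp.symbols('x t')
    gen = (1 - 2*t*x + t**2)**sp.Rational(-1, 2)
    poly = sp.expand(sp.series(gen, t, 0, 5).removeO())

    for n in range(5):
        cn = sp.expand(poly.coeff(t, n))
        assert sp.expand(cn - sp.legendre(n, x)) == 0   # coefficient is P_n(x)
        assert sp.simplify(cn.subs(x, 1) - 1) == 0      # hence P_n(1) = 1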


Now, we are in a position to determine the constant in (3.179). Differentiating
(3.194) m times, we have

\frac{d^m P_l(x)}{dx^m} = \sum_{k=0}^{[(l-m)/2]} \frac{(-1)^k (2l - 2k)!(l - 2k)(l - 2k - 1)⋯(l - 2k - m + 1)}{2^l\, k!(l - k)!(l - 2k)!}\, x^{l-2k-m}
= \sum_{k=0}^{[(l-m)/2]} \frac{(-1)^k (2l - 2k)!}{2^l\, k!(l - k)!(l - 2k - m)!}\, x^{l-2k-m}.   (3.199)

Meanwhile, we have

C_{l-m}^{m+1/2}(x) = \sum_{k=0}^{[(l-m)/2]} \frac{(-1)^k\, 2^{l-2k-m}\, Γ\!\left(l + \frac{1}{2} - k\right)}{k!(l - 2k - m)!\,Γ\!\left(m + \frac{1}{2}\right)}\, x^{l-2k-m}.   (3.200)

Using (3.195) and (3.196), we have

\frac{Γ\!\left(l + \frac{1}{2} - k\right)}{Γ\!\left(m + \frac{1}{2}\right)} = 2^{-2(l-k-m)}\, \frac{(2l - 2k)!}{(l - k)!}\, \frac{m!}{(2m)!}.

Therefore, we get

C_{l-m}^{m+1/2}(x) = \sum_{k=0}^{[(l-m)/2]} \frac{(-1)^k (2l - 2k)!}{2^l\, k!(l - k)!(l - 2k - m)!}\, \frac{2^m Γ(m + 1)}{Γ(2m + 1)}\, x^{l-2k-m},   (3.201)

where we used m! = Γ(m + 1) and (2m)! = Γ(2m + 1). Comparing (3.199) and
(3.201), we get

\frac{d^m P_l(x)}{dx^m} = \frac{Γ(2m + 1)}{2^m Γ(m + 1)}\, C_{l-m}^{m+1/2}(x).   (3.202)

Thus, we find that the constant appearing in (3.179) is Γ(2m + 1)/[2^m Γ(m + 1)].
Putting m = 0 in (3.202), we have P_l(x) = C_l^{1/2}(x); therefore, (3.192) is certainly
recovered. This gives an easy check of (3.202).
Meanwhile, the Rodrigues formula for the Gegenbauer polynomials [5] is given by

C_n^λ(x) = \frac{(-1)^n Γ(n + 2λ)\,Γ\!\left(λ + \frac{1}{2}\right)}{2^n n!\,Γ\!\left(n + λ + \frac{1}{2}\right)Γ(2λ)}\, (1 - x^2)^{-λ+\frac{1}{2}}\, \frac{d^n}{dx^n}(1 - x^2)^{n+λ-\frac{1}{2}}.   (3.203)

Hence, we have

C_{l-m}^{m+1/2}(x) = \frac{(-1)^{l-m} Γ(l + m + 1)\,Γ(m + 1)}{2^{l-m}(l - m)!\,Γ(l + 1)\,Γ(2m + 1)}\, (1 - x^2)^{-m}\, \frac{d^{l-m}}{dx^{l-m}}(1 - x^2)^l.   (3.204)

Inserting (3.204) into (3.202), we have

\frac{d^m P_l(x)}{dx^m} = \frac{(-1)^{l-m} Γ(l + m + 1)}{2^l (l - m)!\,Γ(l + 1)}\, (1 - x^2)^{-m}\, \frac{d^{l-m}}{dx^{l-m}}(1 - x^2)^l
= \frac{(-1)^{l-m} (l + m)!}{2^l l!(l - m)!}\, (1 - x^2)^{-m}\, \frac{d^{l-m}}{dx^{l-m}}(1 - x^2)^l.   (3.205)

Further inserting this into (3.175), we finally get

P_l^m(x) = \frac{(-1)^{l-m} (l + m)!}{2^l l!(l - m)!}\, (1 - x^2)^{-m/2}\, \frac{d^{l-m}}{dx^{l-m}}(1 - x^2)^l.   (3.206)

When m = 0, we have

P_l^0(x) = \frac{(-1)^l}{2^l l!}\, \frac{d^l}{dx^l}(1 - x^2)^l = P_l(x).   (3.207)
Thus, we recover the functional form of the Legendre polynomials. The expression
(3.206) is also meaningful for negative m, provided |m| ≤ l, and permits an extension
of the definition of P_l^m(x) given by (3.175) to negative values of m [5].
Changing m to -m in (3.206), we have

P_l^{-m}(x) = \frac{(-1)^{l+m} (l - m)!}{2^l l!(l + m)!}\, (1 - x^2)^{m/2}\, \frac{d^{l+m}}{dx^{l+m}}(1 - x^2)^l.   (3.208)

Meanwhile, from (3.168) and (3.175),

P_l^m(x) = \frac{(-1)^l}{2^l l!}\, (1 - x^2)^{m/2}\, \frac{d^{l+m}}{dx^{l+m}}(1 - x^2)^l \quad (0 ≤ m ≤ l).   (3.209)

Comparing (3.208) and (3.209), we get

P_l^{-m}(x) = \frac{(-1)^m (l - m)!}{(l + m)!}\, P_l^m(x) \quad (-l ≤ m ≤ l).   (3.210)

Thus, as expected earlier, P_l^m(x) and P_l^{-m}(x) are linearly dependent.
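Relation (3.210) can be checked numerically. Note that scipy's lpmv adopts a sign
convention that differs from (3.175) by the factor (-1)^m (the Condon–Shortley phase);
the relation (3.210) keeps the same form in either convention, as the sketch below assumes:

    import numpy as np
    from scipy.special import lpmv
    from math import factorial

    x = np.linspace(-0.99, 0.99, 7)
    for l in range(1, 5):
        for m in range(1, l + 1):
            lhs = lpmv(-m, l, x)
            rhs = (-1)**m * factorial(l - m) / factorial(l + m) * lpmv(m, l, x)
            assert np.allclose(lhs, rhs)       # relation (3.210)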
Now, we return to (3.149). From (3.206) we have

(1 - x^2)^{-m/2}\, \frac{d^{l-m}}{dx^{l-m}}(1 - x^2)^l = \frac{(-1)^{l-m}\, 2^l l!(l - m)!}{(l + m)!}\, P_l^m(x).   (3.211)

Inserting (3.211) into (3.149) and changing the variable x to ξ, we have

Y_l^m(θ, ϕ) = (-1)^m \sqrt{\frac{(2l + 1)(l - m)!}{4π(l + m)!}}\, P_l^m(ξ)\, e^{imϕ} \quad (ξ = \cos θ;\; 0 ≤ θ ≤ π).   (3.212)

The coefficient (-1)^m appearing in (3.212) is well known as the Condon–Shortley
phase [7]. Another important expression obtained from (3.210) and (3.212) is

Y_l^{-m}(θ, ϕ) = (-1)^m \left[Y_l^m(θ, ϕ)\right]^*.   (3.213)

Since (3.208) and (3.209) involve higher-order differentiations, it would be some-
what inconvenient to find the functional forms directly from them. Here we try to seek a
convenient representation of the spherical harmonics using the familiar cosine and sine
functions. Starting with (3.206) and applying the Leibniz rule there, we have
P_l^m(x) = \frac{(-1)^{l-m}(l + m)!}{2^l l!(l - m)!}\, (1 + x)^{-m/2}(1 - x)^{-m/2}
× \sum_{r=0}^{l-m} \frac{(l - m)!}{r!(l - m - r)!}\, \frac{d^r}{dx^r}(1 + x)^l\, \frac{d^{l-m-r}}{dx^{l-m-r}}(1 - x)^l
= \frac{(-1)^{l-m}(l + m)!}{2^l l!(l - m)!} \sum_{r=0}^{l-m} \frac{(l - m)!}{r!(l - m - r)!}\, \frac{l!\,(1 + x)^{l-r-\frac{m}{2}}}{(l - r)!}\, \frac{(-1)^{l-m-r}\, l!\,(1 - x)^{r+\frac{m}{2}}}{(m + r)!}
= \frac{l!(l + m)!}{2^l} \sum_{r=0}^{l-m} \frac{(-1)^r (1 + x)^{l-r-\frac{m}{2}}(1 - x)^{r+\frac{m}{2}}}{r!(l - m - r)!(l - r)!(m + r)!}.   (3.214)

Putting x = cos θ in (3.214) and using the trigonometric formulae
1 + cos θ = 2cos^2(θ/2) and 1 - cos θ = 2sin^2(θ/2), we have

P_l^m(x) = l!(l + m)! \sum_{r=0}^{l-m} \frac{(-1)^r \cos^{2l-2r-m}\frac{θ}{2}\, \sin^{2r+m}\frac{θ}{2}}{r!(l - m - r)!(l - r)!(m + r)!}.   (3.215)

Inserting this into (3.212), we get

Y_l^m(θ, ϕ) = l!\sqrt{\frac{(2l + 1)(l + m)!(l - m)!}{4π}}\; e^{imϕ} \sum_{r=0}^{l-m} \frac{(-1)^{r+m} \cos^{2l-2r-m}\frac{θ}{2}\, \sin^{2r+m}\frac{θ}{2}}{r!(l - m - r)!(l - r)!(m + r)!}.   (3.216)

The summation domain of r must be determined so that factorials of negative integers
are avoided [6]. That is,
1. If m ≥ 0, 0 ≤ r ≤ l - m; (l - m + 1) terms,
2. If m < 0, |m| ≤ r ≤ l; (l - |m| + 1) terms.
For example, if we choose m = l, putting r = 0 in (3.216) we have

Y_l^l(θ, ϕ) = \sqrt{\frac{(2l + 1)!}{4π}}\; e^{ilϕ}\, \frac{(-1)^l \cos^l\frac{θ}{2}\, \sin^l\frac{θ}{2}}{l!}
= \sqrt{\frac{(2l + 1)!}{4π}}\; e^{ilϕ}\, \frac{(-1)^l \sin^l θ}{2^l l!}.   (3.217)

In particular, we have Y_0^0(θ, ϕ) = \sqrt{1/4π}, which recovers (3.150). When m = -l,
putting r = l in (3.216) we get

Y_l^{-l}(θ, ϕ) = \sqrt{\frac{(2l + 1)!}{4π}}\; e^{-ilϕ}\, \frac{\cos^l\frac{θ}{2}\, \sin^l\frac{θ}{2}}{l!} = \sqrt{\frac{(2l + 1)!}{4π}}\; e^{-ilϕ}\, \frac{\sin^l θ}{2^l l!}.   (3.218)

For instance, choosing l = 3 and m = ±3 and using (3.217) or (3.218), we have

Y_3^3(θ, ϕ) = -\sqrt{\frac{35}{64π}}\, e^{i3ϕ} \sin^3 θ, \qquad Y_3^{-3}(θ, ϕ) = \sqrt{\frac{35}{64π}}\, e^{-i3ϕ} \sin^3 θ.

The minus sign appearing in Y_3^3(θ, ϕ) is due to the Condon–Shortley phase.

For l = 3 and m = 0, moreover, we have

Y_3^0(θ, ϕ) = \sqrt{\frac{7 \cdot 3! \cdot 3!}{4π}}\; 3! \sum_{r=0}^{3} \frac{(-1)^r \cos^{6-2r}\frac{θ}{2}\, \sin^{2r}\frac{θ}{2}}{r!(3 - r)!(3 - r)!\,r!}
= \sqrt{\frac{7}{π}}\; 18\left[\frac{\cos^6\frac{θ}{2}}{0!3!3!0!} - \frac{\cos^4\frac{θ}{2}\sin^2\frac{θ}{2}}{1!2!2!1!} + \frac{\cos^2\frac{θ}{2}\sin^4\frac{θ}{2}}{2!1!1!2!} - \frac{\sin^6\frac{θ}{2}}{3!0!0!3!}\right]
= \sqrt{\frac{7}{π}}\; 18\left[\frac{\cos^6\frac{θ}{2} - \sin^6\frac{θ}{2}}{36} + \frac{\cos^2\frac{θ}{2}\sin^2\frac{θ}{2}\left(\sin^2\frac{θ}{2} - \cos^2\frac{θ}{2}\right)}{4}\right]
= \sqrt{\frac{7}{π}}\left(\frac{5}{4}\cos^3 θ - \frac{3}{4}\cos θ\right),

where in the last equality we used formulae of elementary algebra and trigonometric
functions. At the same time, we get

Y_3^0(0, ϕ) = \sqrt{\frac{7}{4π}}.

This is consistent with (3.147) in that Y_3^0(0, ϕ) is positive.
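These closed forms can be cross-checked against a standard library implementation.
The sketch below assumes scipy's sph_harm (which also adopts the Condon–Shortley phase,
takes the azimuthal angle before the polar angle, and is deprecated in very recent scipy
in favor of sph_harm_y):

    import numpy as np
    from scipy.special import sph_harm

    theta, phi = 0.7, 1.3                # polar and azimuthal test angles

    # Y_3^0 from the closed form derived above
    y30 = np.sqrt(7/np.pi) * (1.25*np.cos(theta)**3 - 0.75*np.cos(theta))
    assert np.isclose(sph_harm(0, 3, phi, theta).real, y30)

    # Y_3^3 with the Condon-Shortley minus sign
    y33 = -np.sqrt(35/(64*np.pi)) * np.exp(3j*phi) * np.sin(theta)**3
    assert np.isclose(sph_harm(3, 3, phi, theta), y33)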

3.6.2 Orthogonality of Associated Legendre Functions

Orthogonality relations between functions are important. Here we deal with them for the
associated Legendre functions.
Replacing m with (m - 1) in (3.174) and using the shorthand notation introduced before,
we have

(1 - x^2)\, d^{m+1}P_l - 2mx\, d^m P_l + (l + m)(l - m + 1)\, d^{m-1}P_l = 0.   (3.219)

Multiplying both sides by (1 - x^2)^{m-1}, we have

(1 - x^2)^m\, d^{m+1}P_l - 2mx(1 - x^2)^{m-1}\, d^m P_l + (l + m)(l - m + 1)(1 - x^2)^{m-1}\, d^{m-1}P_l = 0.

Rewriting the above equation, we get

d\left[(1 - x^2)^m\, d^m P_l\right] = -(l + m)(l - m + 1)(1 - x^2)^{m-1}\, d^{m-1}P_l.   (3.220)

Now, let us define f(m) as follows:

f(m) \equiv \int_{-1}^{1} (1 - x^2)^m\, (d^m P_l)(d^m P_{l'})\, dx \quad (0 ≤ m ≤ l, l').   (3.221)

Rewriting (3.221) as follows and integrating by parts, we have

f(m) = \int_{-1}^{1} (1 - x^2)^m\, \left[d(d^{m-1}P_l)\right]\, d^m P_{l'}\, dx
= \left[(d^{m-1}P_l)(1 - x^2)^m\, d^m P_{l'}\right]_{-1}^{1} - \int_{-1}^{1} (d^{m-1}P_l)\, d\left[(1 - x^2)^m\, d^m P_{l'}\right] dx
= -\int_{-1}^{1} (d^{m-1}P_l)\, d\left[(1 - x^2)^m\, d^m P_{l'}\right] dx
= \int_{-1}^{1} (d^{m-1}P_l)\, (l' + m)(l' - m + 1)(1 - x^2)^{m-1}\, d^{m-1}P_{l'}\, dx
= (l' + m)(l' - m + 1)\, f(m - 1),   (3.222)

where at the second equality the first term vanishes and at the second last
equality we used (3.220). Equation (3.222) gives a recurrence formula for
f(m). Further performing the calculation, we get
f(m) = (l' + m)(l' + m - 1)(l' - m + 2)(l' - m + 1)\, f(m - 2)
= ⋯
= (l' + m)(l' + m - 1)⋯(l' + 1) \cdot l' ⋯(l' - m + 2)(l' - m + 1)\, f(0)
= \frac{(l' + m)!}{(l' - m)!}\, f(0),   (3.223)

where we have

f(0) = \int_{-1}^{1} P_l(x)\, P_{l'}(x)\, dx.   (3.224)

Note that in (3.223) the coefficient of f(0) comprises 2m factors.
In (3.224), P_l(x) and P_{l'}(x) are the Legendre polynomials defined in (3.165). Then,
using (3.168) we have

f(0) = \frac{(-1)^l (-1)^{l'}}{2^l l!\, 2^{l'} l'!} \int_{-1}^{1} \left[d^l(1 - x^2)^l\right]\left[d^{l'}(1 - x^2)^{l'}\right] dx.   (3.225)

To evaluate (3.224), we have two cases: (1) l ≠ l′ and (2) l = l′. In the first
case, assuming that l > l′ and integrating by parts, we have

I = \int_{-1}^{1} \left[d^l(1 - x^2)^l\right]\left[d^{l'}(1 - x^2)^{l'}\right] dx
= \left[d^{l-1}(1 - x^2)^l \cdot d^{l'}(1 - x^2)^{l'}\right]_{-1}^{1}
- \int_{-1}^{1} \left[d^{l-1}(1 - x^2)^l\right]\left[d^{l'+1}(1 - x^2)^{l'}\right] dx.   (3.226)

In the above, we find that the first term vanishes because it contains (1 - x^2) as a
factor. Integrating (3.226) another l′ times in the same way, we get

I = (-1)^{l'+1} \int_{-1}^{1} \left[d^{l-l'-1}(1 - x^2)^l\right]\left[d^{2l'+1}(1 - x^2)^{l'}\right] dx.   (3.227)

In (3.227) we have 0 ≤ l - l′ - 1 ≤ 2l, and so d^{l-l'-1}(1 - x^2)^l does not vanish;
however, (1 - x^2)^{l'} is an at most 2l′-degree polynomial, and hence d^{2l'+1}(1 - x^2)^{l'}
vanishes. Therefore,

f(0) = 0.   (3.228)

If l < l′, exchanging the roles of P_l(x) and P_{l'}(x) in (3.224), we get f(0) = 0 as well.
In the second case of l = l′, we evaluate the following integral:

I = \int_{-1}^{1} \left[d^l(1 - x^2)^l\right]^2 dx.   (3.229)

Similarly integrating (3.229) by parts l times, we have

I = (-1)^l \int_{-1}^{1} (1 - x^2)^l\, d^{2l}(1 - x^2)^l\, dx = (-1)^{2l}(2l)! \int_{-1}^{1} (1 - x^2)^l\, dx.   (3.230)

In (3.230), changing x to cos θ, we have

\int_{-1}^{1} (1 - x^2)^l\, dx = \int_0^{π} \sin^{2l+1} θ\, dθ.   (3.231)

We have already estimated this integral in (3.132), obtaining the value
2^{2l+1}(l!)^2/(2l + 1)! in (3.135). Therefore,

f(0) = \frac{(-1)^{2l}(2l)!}{2^{2l}(l!)^2}\, \frac{2^{2l+1}(l!)^2}{(2l + 1)!} = \frac{2}{2l + 1}.   (3.232)

Thus, we get

f(m) = \frac{(l + m)!}{(l - m)!}\, f(0) = \frac{(l + m)!}{(l - m)!}\, \frac{2}{2l + 1}.   (3.233)

From (3.228) and (3.233), we have

\int_{-1}^{1} P_l^m(x)\, P_{l'}^m(x)\, dx = \frac{(l + m)!}{(l - m)!}\, \frac{2}{2l + 1}\, δ_{ll'}.   (3.234)

Accordingly, putting

\tilde{P}_l^m(x) \equiv \sqrt{\frac{(2l + 1)(l - m)!}{2(l + m)!}}\; P_l^m(x),   (3.235)

we get

\int_{-1}^{1} \tilde{P}_l^m(x)\, \tilde{P}_{l'}^m(x)\, dx = δ_{ll'}.   (3.236)

The normalized Legendre polynomials immediately follow. They are given by

\tilde{P}_l(x) \equiv \sqrt{\frac{2l + 1}{2}}\; P_l(x) = \frac{(-1)^l}{2^l l!}\sqrt{\frac{2l + 1}{2}}\; \frac{d^l}{dx^l}(1 - x^2)^l.   (3.237)

Combining the normalized function (3.235) with \frac{1}{\sqrt{2π}} e^{imϕ}, we recover

Y_l^m(θ, ϕ) = \sqrt{\frac{(2l + 1)(l - m)!}{4π(l + m)!}}\; P_l^m(x)\, e^{imϕ} \quad (x = \cos θ;\; 0 ≤ θ ≤ π).   (3.238)

Notice in (3.238), however, that we could not determine the Condon–Shortley phase
(-1)^m in this way; see (3.212).
Since P_l^m(x) and P_l^{-m}(x) are linearly dependent as noted in (3.210), the set of the
associated Legendre functions alone cannot define a complete orthonormal system.
In fact, we have

\int_{-1}^{1} P_l^m(x)\, P_l^{-m}(x)\, dx = \frac{(-1)^m (l - m)!}{(l + m)!}\, \frac{(l + m)!}{(l - m)!}\, \frac{2}{2l + 1} = \frac{2(-1)^m}{2l + 1}.   (3.239)

This means that P_l^m(x) and P_l^{-m}(x) are not orthogonal. Thus, we need the factor
e^{imϕ} to constitute the complete orthonormal system. In other words,

\int_0^{2π} dϕ \int_{-1}^{1} d(\cos θ)\, \left[Y_{l'}^{m'}(θ, ϕ)\right]^* Y_l^m(θ, ϕ) = δ_{ll'}\, δ_{mm'}.   (3.240)
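The orthogonality (3.236) is easy to confirm by Gauss–Legendre quadrature. In the
sketch below, norm_P is our own helper for the normalized functions (3.235), built on
scipy's lpmv (whose extra (-1)^m cancels in the product):

    import numpy as np
    from scipy.special import lpmv
    from math import factorial

    def norm_P(l, m, x):
        # Normalized associated Legendre function (3.235)
        c = np.sqrt((2*l + 1) * factorial(l - m) / (2 * factorial(l + m)))
        return c * lpmv(m, l, x)

    x, w = np.polynomial.legendre.leggauss(60)   # nodes/weights on [-1, 1]
    m = 1
    for l in (1, 2, 3):
        for lp in (1, 2, 3):
            val = np.sum(w * norm_P(l, m, x) * norm_P(lp, m, x))
            assert np.isclose(val, 1.0 if l == lp else 0.0, atol=1e-12)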

3.7 Radial Wave Functions of Hydrogen-Like Atoms

In Sect. 3.1 we constructed the Hamiltonian of hydrogen-like atoms. If the physical
system is characterized by a central force field, the method of separation of
variables into the angular part (θ, ϕ) and the radial part (r) can successfully be applied
to the problem, and that method allows us to deal with the Schrödinger equation for each part
separately. The spherical surface harmonics play a central role in dealing with the
differential equations related to the angular part. We studied important properties of
the special functions such as the Legendre polynomials and associated Legendre func-
tions, independent of the nature of the specific central force field, be it a Coulomb
potential or a Yukawa potential. With the Schrödinger equation pertinent to the
radial part, on the other hand, the characteristics differ depending on the nature of the
individual force field. Of these, the differential equation associated with the Cou-
lomb potential gives exact (or analytical) solutions. It is well known that second-
order differential equations are often solved by an operator representation method.
Examples include its application to a quantum-mechanical harmonic oscillator and
the angular momenta of a particle placed in a central force field. Nonetheless, the
corresponding approach to the radial equation for the electron has been less popular
to date. The initial approach, however, was made by Sunakawa [3]. The purpose of
this section rests upon further improvement of that approach.

3.7.1 Operator Approach to Radial Wave Functions [3]

In Sect. 3.2, the separation of variables led to the radial part of the Schrödinger
equation, described as

-\frac{ħ^2}{2μ}\frac{1}{r^2}\frac{∂}{∂r}\left(r^2 \frac{∂R(r)}{∂r}\right) + \frac{ħ^2 λ}{2μr^2} R(r) - \frac{Ze^2}{4πε_0 r} R(r) = ER(r).   (3.51)

We identified λ with l(l + 1); see Sect. 3.5. Thus, rewriting (3.51) and indexing
R(r) with l, we have

-\frac{ħ^2}{2μr^2}\frac{d}{dr}\left(r^2 \frac{dR_l(r)}{dr}\right) + \left[\frac{ħ^2 l(l + 1)}{2μr^2} - \frac{Ze^2}{4πε_0 r}\right] R_l(r) = ER_l(r),   (3.241)

where R_l(r) is the radial wave function parametrized with l; μ, Z, ε_0, and E denote the
reduced mass of the hydrogen-like atom, the atomic number, the permittivity of vacuum, and the
eigenvalue of energy, respectively. Otherwise we follow conventions.
Now, we are in a position to solve (3.241). As in the case of the quantum-
mechanical harmonic oscillator of Chap. 2 and that of the angular momentum
operator of the previous section, we present the operator formalism for dealing with the radial wave functions of
hydrogen-like atoms. The essential point is that the radial wave functions
can be derived by successively operating lowering operators on the radial wave
function having the maximum allowed orbital angular momentum quantum number.
The results agree with the conventional coordinate representation method based
upon a power series expansion that is related to the associated Laguerre polynomials.
Sunakawa [3] introduced the following differential equation by suitable trans-
formations of the variable, parameter, and function:

-\frac{d^2 ψ_l(ρ)}{dρ^2} + \left[\frac{l(l + 1)}{ρ^2} - \frac{2}{ρ}\right] ψ_l(ρ) = ℰ ψ_l(ρ),   (3.242)

where ρ = Zr/a, ℰ = \frac{2μ}{ħ^2}\left(\frac{a}{Z}\right)^2 E, and ψ_l(ρ) = ρR_l(r), with a (= 4πε_0ħ^2/μe^2) being the Bohr
radius of the hydrogen-like atom. Note that ρ and ℰ are dimensionless quantities. The
related calculations are as follows: We have
\frac{dR_l}{dr} = \frac{d(ψ_l/ρ)}{dρ}\frac{dρ}{dr} = \left(\frac{dψ_l}{dρ}\frac{1}{ρ} - \frac{ψ_l}{ρ^2}\right)\frac{Z}{a}.

Thus, we get

r^2 \frac{dR_l}{dr} = \left(\frac{dψ_l}{dρ}ρ - ψ_l\right)\frac{a}{Z}, \qquad \frac{d}{dr}\left(r^2 \frac{dR_l}{dr}\right) = \frac{d^2 ψ_l(ρ)}{dρ^2}\,ρ.

Using the above relations, we arrive at (3.242).
Here we define the following operators:

b_l \equiv \frac{d}{dρ} + \frac{l}{ρ} - \frac{1}{l}.   (3.243)

Hence,

b_l^{†} = -\frac{d}{dρ} + \frac{l}{ρ} - \frac{1}{l},   (3.244)

where the operator b_l^{†} is the adjoint operator of b_l. Notice that these definitions are
different from those of Sunakawa [3]. The operator d/dρ (≡ A) is formally an anti-
Hermitian operator. We have mentioned such an operator in Sect. 1.5. The remaining
terms of (3.243) and (3.244), l/ρ - 1/l, are Hermitian; we define them collectively as H. Thus,
we foresee that b_l and b_l^{†} can be denoted as follows:

b_l = A + H \quad \text{and} \quad b_l^{†} = -A + H.

These representations are analogous to those appearing in the operator formalism
of the quantum-mechanical harmonic oscillator. Special care, however, should be
taken in dealing with the operators b_l and b_l^{†}. First, we should carefully examine
whether d/dρ is in fact an anti-Hermitian operator. This is because, for d/dρ to be anti-
Hermitian, the solution ψ_l(ρ) must satisfy boundary conditions such that
ψ_l(ρ) vanishes or takes the same value at the endpoints ρ → 0 and ∞. Second, the
coordinate system we have chosen is not a Cartesian but the polar (spher-
ical) coordinate system, and so ρ is defined only on the domain ρ > 0. We will come back to
this point later.
Let us proceed with the calculations. We have

b_l b_l^{†} = \left(\frac{d}{dρ} + \frac{l}{ρ} - \frac{1}{l}\right)\left(-\frac{d}{dρ} + \frac{l}{ρ} - \frac{1}{l}\right)
= -\frac{d^2}{dρ^2} + \frac{d}{dρ}\left(\frac{l}{ρ} - \frac{1}{l}\right) - \left(\frac{l}{ρ} - \frac{1}{l}\right)\frac{d}{dρ} + \frac{l^2}{ρ^2} - \frac{2}{ρ} + \frac{1}{l^2}   (3.245)
= -\frac{d^2}{dρ^2} - \frac{l}{ρ^2} + \frac{l^2}{ρ^2} - \frac{2}{ρ} + \frac{1}{l^2} = -\frac{d^2}{dρ^2} + \frac{l(l - 1)}{ρ^2} - \frac{2}{ρ} + \frac{1}{l^2}.

Also, we have

b_l^{†} b_l = -\frac{d^2}{dρ^2} + \frac{l(l + 1)}{ρ^2} - \frac{2}{ρ} + \frac{1}{l^2}.   (3.246)

We further define an operator H^{(l)} as follows:

H^{(l)} \equiv -\frac{d^2}{dρ^2} + \frac{l(l + 1)}{ρ^2} - \frac{2}{ρ}.

Then, from (3.243) and (3.244) as well as (3.245) and (3.246) we have

H^{(l)} = b_{l+1} b_{l+1}^{†} + ε^{(l)} \quad (l ≥ 0),   (3.247)

where ε^{(l)} \equiv -\frac{1}{(l+1)^2}. Alternatively,

H^{(l)} = b_l^{†} b_l + ε^{(l-1)} \quad (l ≥ 1).   (3.248)

If we put l = n - 1 in (3.247), with n being a fixed given integer larger than l, we
obtain

H^{(n-1)} = b_n b_n^{†} + ε^{(n-1)}.   (3.249)

We evaluate the following inner product of both sides of (3.249):

\langle χ | H^{(n-1)} | χ \rangle = \langle χ | b_n b_n^{†} | χ \rangle + ε^{(n-1)} \langle χ | χ \rangle = \langle b_n^{†} χ | b_n^{†} χ \rangle + ε^{(n-1)} \langle χ | χ \rangle
= \left\| b_n^{†} | χ \rangle \right\|^2 + ε^{(n-1)} \langle χ | χ \rangle ≥ ε^{(n-1)}.   (3.250)

Here we assume that χ is normalized (i.e., \langle χ | χ \rangle = 1). On the basis of the
variational principle [9], the above expected value must take the minimum ε^{(n-1)} in order
that χ can be an eigenfunction. To satisfy this condition, we have
b_n^{†} | χ \rangle = 0.   (3.251)

In fact, if (3.251) holds, from (3.249) we have

H^{(n-1)} χ = ε^{(n-1)} χ.   (3.252)

We define such a function as follows:

ψ_{n-1}^{(n)} \equiv χ.   (3.253)

From (3.247) and (3.248), we have the following relationship:

H^{(l)} b_{l+1} = b_{l+1} H^{(l+1)} \quad (l ≥ 0).   (3.254)

Meanwhile, we define the functions shown in the following:

ψ_{n-s}^{(n)} \equiv b_{n-s+1} b_{n-s+2} ⋯ b_{n-1}\, ψ_{n-1}^{(n)} \quad (2 ≤ s ≤ n).   (3.255)

With these functions, (s - 1) operators have been operated on ψ_{n-1}^{(n)}. Note that if
s were 1, no operation of b_l would take place. Thus, we find that b_l acts upon the
l-state to produce the (l - 1)-state. That is, b_l acts as an annihilation operator. For the
sake of convenience we express

H^{(n,s)} \equiv H^{(n-s)}.   (3.256)

Using this notation and (3.254), we have

H^{(n,s)} ψ_{n-s}^{(n)} = H^{(n,s)} b_{n-s+1} b_{n-s+2} ⋯ b_{n-1}\, ψ_{n-1}^{(n)}
= b_{n-s+1} H^{(n,s-1)} b_{n-s+2} ⋯ b_{n-1}\, ψ_{n-1}^{(n)}
= b_{n-s+1} b_{n-s+2} H^{(n,s-2)} ⋯ b_{n-1}\, ψ_{n-1}^{(n)}
= ⋯
= b_{n-s+1} b_{n-s+2} ⋯ H^{(n,2)} b_{n-1}\, ψ_{n-1}^{(n)}   (3.257)
= b_{n-s+1} b_{n-s+2} ⋯ b_{n-1} H^{(n,1)}\, ψ_{n-1}^{(n)}
= b_{n-s+1} b_{n-s+2} ⋯ b_{n-1}\, ε^{(n-1)} ψ_{n-1}^{(n)}
= ε^{(n-1)} b_{n-s+1} b_{n-s+2} ⋯ b_{n-1}\, ψ_{n-1}^{(n)}
= ε^{(n-1)} ψ_{n-s}^{(n)}.

Thus, all n functions ψ_{n-s}^{(n)} (1 ≤ s ≤ n) belong to the same eigenvalue ε^{(n-1)}.
Notice that the eigenenergy E_n corresponding to ε^{(n-1)} is given by
E_n = -\frac{ħ^2}{2μ}\left(\frac{Z}{a}\right)^2 \frac{1}{n^2}.   (3.258)

If we define l \equiv n - s and take account of (3.252), all n functions ψ_l^{(n)} (l = 0,
1, 2, ⋯, n - 1) belong to the same eigenvalue ε^{(n-1)}.
The quantum state ψ_l^{(n)} is associated with the operator H^{(l)}. Thus, the solutions of
(3.242) are given by the functions ψ_l^{(n)} parametrized with n and l, on condition that
(3.251) holds. As explicitly indicated in (3.255) and (3.257), b_l lowers the parameter
l by one, from l to l - 1, when it operates on ψ_l^{(n)}. The operator b_0 cannot be defined,
as is evident from (3.243), and so the lowest value of l must be zero. Operators such
as b_l are known as ladder operators (lowering operators, or annihilation operators in the
present case). The implication is that the successive operations of b_l on ψ_{n-1}^{(n)}
produce various parameters l as a subscript, down to zero, while retaining the same
integer parameter n as a superscript.

3.7.2 Normalization of Radial Wave Functions [10]

Next we seek the normalized eigenfunctions. The coordinate representation of (3.251) reads

-\frac{dψ_{n-1}^{(n)}}{dρ} + \left(\frac{n}{ρ} - \frac{1}{n}\right) ψ_{n-1}^{(n)} = 0.   (3.259)

The solution can be obtained as

ψ_{n-1}^{(n)} = c_n\, ρ^n e^{-ρ/n},   (3.260)

where c_n is a normalization constant. This can be determined as follows:

\int_0^{\infty} \left|ψ_{n-1}^{(n)}\right|^2 dρ = 1.   (3.261)

Namely,

|c_n|^2 \int_0^{\infty} ρ^{2n} e^{-2ρ/n}\, dρ = 1.   (3.262)
Consider the following definite integral:

\int_0^{\infty} e^{-2ρξ}\, dρ = \frac{1}{2ξ}.

Differentiating the above integral 2n times with respect to ξ gives

\int_0^{\infty} ρ^{2n} e^{-2ρξ}\, dρ = \left(\frac{1}{2}\right)^{2n+1} (2n)!\, ξ^{-(2n+1)}.   (3.263)

Substituting 1/n for ξ, we obtain

\int_0^{\infty} ρ^{2n} e^{-2ρ/n}\, dρ = \left(\frac{1}{2}\right)^{2n+1} (2n)!\, n^{2n+1}.   (3.264)

Hence,

c_n = \left(\frac{2}{n}\right)^{n+\frac{1}{2}} \Big/ \sqrt{(2n)!}.   (3.265)
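The normalization (3.261)–(3.265) admits a quick numerical check; a minimal sketch
assuming scipy:

    import numpy as np
    from scipy.integrate import quad
    from math import factorial

    for n in (1, 2, 3):
        cn = (2/n)**(n + 0.5) / np.sqrt(factorial(2*n))   # (3.265)
        val, _ = quad(lambda r: (cn * r**n * np.exp(-r/n))**2, 0, np.inf)
        assert np.isclose(val, 1.0)                       # (3.261)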

To further normalize the other wave functions, we calculate the following inner
product:

\langle ψ_l^{(n)} | ψ_l^{(n)} \rangle = \langle ψ_{n-1}^{(n)}\, |\, b_{n-1}^{†} ⋯ b_{l+2}^{†} b_{l+1}^{†}\, b_{l+1} b_{l+2} ⋯ b_{n-1}\, |\, ψ_{n-1}^{(n)} \rangle.   (3.266)

From (3.247) and (3.248), we have

b_l^{†} b_l + ε^{(l-1)} = b_{l+1} b_{l+1}^{†} + ε^{(l)} \quad (l ≥ 1).   (3.267)

Applying (3.267) to (3.266) repeatedly and considering (3.251), we reach the
following relationship:

\langle ψ_l^{(n)} | ψ_l^{(n)} \rangle = \left[ε^{(n-1)} - ε^{(n-2)}\right]\left[ε^{(n-1)} - ε^{(n-3)}\right] ⋯ \left[ε^{(n-1)} - ε^{(l)}\right] \langle ψ_{n-1}^{(n)} | ψ_{n-1}^{(n)} \rangle.   (3.268)

To show this, we use mathematical induction. We have already normalized ψ_{n-1}^{(n)}
in (3.261). Next, we calculate \langle ψ_{n-2}^{(n)} | ψ_{n-2}^{(n)} \rangle such that
\langle ψ_{n-2}^{(n)} | ψ_{n-2}^{(n)} \rangle = \langle b_{n-1} ψ_{n-1}^{(n)} | b_{n-1} ψ_{n-1}^{(n)} \rangle = \langle ψ_{n-1}^{(n)} | b_{n-1}^{†} b_{n-1} | ψ_{n-1}^{(n)} \rangle
= \langle ψ_{n-1}^{(n)} | b_n b_n^{†} + ε^{(n-1)} - ε^{(n-2)} | ψ_{n-1}^{(n)} \rangle
= \langle ψ_{n-1}^{(n)} | b_n b_n^{†} | ψ_{n-1}^{(n)} \rangle + \left[ε^{(n-1)} - ε^{(n-2)}\right] \langle ψ_{n-1}^{(n)} | ψ_{n-1}^{(n)} \rangle
= \left[ε^{(n-1)} - ε^{(n-2)}\right] \langle ψ_{n-1}^{(n)} | ψ_{n-1}^{(n)} \rangle,   (3.269)

where at the third equality we used (3.267) with l = n - 1, and at the last equality
we used (3.251). Therefore, (3.268) holds with l = n - 2. Then, it suffices to show
that if (3.268) holds for \langle ψ_{l+1}^{(n)} | ψ_{l+1}^{(n)} \rangle, it holds for \langle ψ_l^{(n)} | ψ_l^{(n)} \rangle as well.
Let us calculate \langle ψ_l^{(n)} | ψ_l^{(n)} \rangle, starting with (3.266), as follows:

\langle ψ_l^{(n)} | ψ_l^{(n)} \rangle = \langle ψ_{n-1}^{(n)} | b_{n-1}^{†} ⋯ b_{l+2}^{†} b_{l+1}^{†}\, b_{l+1} b_{l+2} ⋯ b_{n-1} | ψ_{n-1}^{(n)} \rangle
= \langle ψ_{n-1}^{(n)} | b_{n-1}^{†} ⋯ b_{l+2}^{†} \left[b_{l+2} b_{l+2}^{†} + ε^{(l+1)} - ε^{(l)}\right] b_{l+2} ⋯ b_{n-1} | ψ_{n-1}^{(n)} \rangle
= \langle ψ_{n-1}^{(n)} | b_{n-1}^{†} ⋯ b_{l+2}^{†} b_{l+2}\, b_{l+2}^{†} b_{l+2} ⋯ b_{n-1} | ψ_{n-1}^{(n)} \rangle
+ \left[ε^{(l+1)} - ε^{(l)}\right] \langle ψ_{n-1}^{(n)} | b_{n-1}^{†} ⋯ b_{l+2}^{†}\, b_{l+2} ⋯ b_{n-1} | ψ_{n-1}^{(n)} \rangle
= \langle ψ_{n-1}^{(n)} | b_{n-1}^{†} ⋯ b_{l+2}^{†} b_{l+2}\, b_{l+2}^{†} b_{l+2} ⋯ b_{n-1} | ψ_{n-1}^{(n)} \rangle
+ \left[ε^{(l+1)} - ε^{(l)}\right] \langle ψ_{l+1}^{(n)} | ψ_{l+1}^{(n)} \rangle.   (3.270)

In the next step, using b_{l+2}^{†} b_{l+2} = b_{l+3} b_{l+3}^{†} + ε^{(l+2)} - ε^{(l+1)}, we have

\langle ψ_l^{(n)} | ψ_l^{(n)} \rangle = \langle ψ_{n-1}^{(n)} | b_{n-1}^{†} ⋯ b_{l+2}^{†} b_{l+2}\, b_{l+3} b_{l+3}^{†} b_{l+3} ⋯ b_{n-1} | ψ_{n-1}^{(n)} \rangle
+ \left[ε^{(l+1)} - ε^{(l)}\right] \langle ψ_{l+1}^{(n)} | ψ_{l+1}^{(n)} \rangle + \left[ε^{(l+2)} - ε^{(l+1)}\right] \langle ψ_{l+1}^{(n)} | ψ_{l+1}^{(n)} \rangle.   (3.271)

Thus, we find that in the first term the index number of b_{l+3}^{†} has been increased by
one, with the operator itself transferred toward the right side. On the other hand, we notice that
between the second and third terms, ε^{(l+1)} \langle ψ_{l+1}^{(n)} | ψ_{l+1}^{(n)} \rangle cancels out. Repeating the above
processes, we reach the following expression:
\langle ψ_l^{(n)} | ψ_l^{(n)} \rangle = \langle ψ_{n-1}^{(n)} | b_{n-1}^{†} ⋯ b_{l+2}^{†} b_{l+2} ⋯ b_{n-1}\, b_n b_n^{†} | ψ_{n-1}^{(n)} \rangle
+ \left[ε^{(n-1)} - ε^{(n-2)}\right] \langle ψ_{l+1}^{(n)} | ψ_{l+1}^{(n)} \rangle + \left[ε^{(n-2)} - ε^{(n-3)}\right] \langle ψ_{l+1}^{(n)} | ψ_{l+1}^{(n)} \rangle + ⋯
+ \left[ε^{(l+2)} - ε^{(l+1)}\right] \langle ψ_{l+1}^{(n)} | ψ_{l+1}^{(n)} \rangle + \left[ε^{(l+1)} - ε^{(l)}\right] \langle ψ_{l+1}^{(n)} | ψ_{l+1}^{(n)} \rangle
= \left[ε^{(n-1)} - ε^{(l)}\right] \langle ψ_{l+1}^{(n)} | ψ_{l+1}^{(n)} \rangle.   (3.272)

In (3.272), the first term of the RHS vanishes because of (3.251); the subsequent
terms produce \langle ψ_{l+1}^{(n)} | ψ_{l+1}^{(n)} \rangle, whose coefficients cancel one another except
for [ε^{(n-1)} - ε^{(l)}].
Meanwhile, from the assumption of the mathematical induction we have

\langle ψ_{l+1}^{(n)} | ψ_{l+1}^{(n)} \rangle = \left[ε^{(n-1)} - ε^{(n-2)}\right]\left[ε^{(n-1)} - ε^{(n-3)}\right] ⋯ \left[ε^{(n-1)} - ε^{(l+1)}\right] \langle ψ_{n-1}^{(n)} | ψ_{n-1}^{(n)} \rangle.

Inserting this equation into (3.272), we arrive at (3.268). In other words, we have
shown that if (3.268) holds for l + 1, it holds for l as well. This
completes the proof that (3.268) is true of \langle ψ_l^{(n)} | ψ_l^{(n)} \rangle with l down to 0.
The normalized wave functions ψ_l^{(n)} are expressed from (3.255) as

ψ_l^{(n)} = κ(n, l)^{-\frac{1}{2}}\, b_{l+1} b_{l+2} ⋯ b_{n-1}\, ψ_{n-1}^{(n)},   (3.273)

where κ(n, l) is defined such that

κ(n, l) \equiv \left[ε^{(n-1)} - ε^{(n-2)}\right]\left[ε^{(n-1)} - ε^{(n-3)}\right] ⋯ \left[ε^{(n-1)} - ε^{(l)}\right],   (3.274)

with l ≤ n - 2. More explicitly, we get

κ(n, l) = \frac{(2n - 1)!(n - l - 1)!(l!)^2}{(n + l)!(n!)^2 (n^{n-l-2})^2}.   (3.275)

In particular, from (3.265) we have

ψ_{n-1}^{(n)} = \left(\frac{2}{n}\right)^{n+\frac{1}{2}} \frac{1}{\sqrt{(2n)!}}\, ρ^n e^{-ρ/n}.   (3.276)

From (3.272), we define the following operator (the overbar distinguishing the normalized
lowering operator):

\bar{b}_l \equiv \left[ε^{(n-1)} - ε^{(l-1)}\right]^{-\frac{1}{2}} b_l.   (3.277)

Then (3.273) becomes

ψ_l^{(n)} = \bar{b}_{l+1} \bar{b}_{l+2} ⋯ \bar{b}_{n-1}\, ψ_{n-1}^{(n)}.   (3.278)
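The chain (3.278) can be traced symbolically: starting from ψ_{n-1}^{(n)} and repeatedly
applying the normalized lowering operator b̄_l of (3.277) must reproduce, step by step,
the closed forms Φ_l^{(n)} of (3.279) established in the next subsection. A sympy sketch
for the test case n = 3 (the helper name phi is ours):

    import sympy as sp

    rho = sp.symbols('rho', positive=True)
    n = 3                                   # principal quantum number (test case)

    def phi(n, l):
        # Closed form (3.279), using sympy's associated Laguerre polynomial
        c = sp.Rational(2, n)**(l + sp.Rational(3, 2)) \
            * sp.sqrt(sp.factorial(n - l - 1) / (2*n*sp.factorial(n + l)))
        return c * sp.exp(-rho/n) * rho**(l + 1) \
            * sp.assoc_laguerre(n - l - 1, 2*l + 1, 2*rho/n)

    psi = phi(n, n - 1)                     # equals (3.276): c_n rho^n e^(-rho/n)
    for l in range(n - 1, 0, -1):
        norm = sp.sqrt(sp.Rational(1, l**2) - sp.Rational(1, n**2))
        psi = (sp.diff(psi, rho) + (sp.S(l)/rho - sp.Rational(1, l))*psi) / norm
        assert sp.simplify(psi - phi(n, l - 1)) == 0   # reproduces (3.291)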

3.7.3 Associated Laguerre Polynomials

It is of great importance to compare the functions ψ_l^{(n)} with the conventional wave
functions that are expressed using the associated Laguerre polynomials. For this purpose
we define the following functions Φ_l^{(n)}(ρ):

Φ_l^{(n)}(ρ) \equiv \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \sqrt{\frac{(n - l - 1)!}{2n(n + l)!}}\; e^{-\frac{ρ}{n}}\, ρ^{l+1}\, L_{n-l-1}^{2l+1}\!\left(\frac{2ρ}{n}\right).   (3.279)

The associated Laguerre polynomials are described as

L_n^{ν}(x) = \frac{1}{n!}\, x^{-ν} e^{x}\, \frac{d^n}{dx^n}\left(x^{n+ν} e^{-x}\right) \quad (ν > -1).   (3.280)

In the form of a power series expansion, the polynomials are expressed for integer
k ≥ 0 as

L_n^{k}(x) = \sum_{m=0}^{n} \frac{(-1)^m (n + k)!}{(n - m)!(k + m)!\,m!}\, x^m.   (3.281)

Notice that the "Laguerre polynomials" L_n(x) are defined as

L_n(x) \equiv L_n^{0}(x).

Hence, instead of (3.280) and (3.281), the Rodrigues formula and power series
expansion of L_n(x) are given by [2, 5]

L_n(x) = \frac{1}{n!}\, e^{x}\, \frac{d^n}{dx^n}\left(x^n e^{-x}\right), \qquad L_n(x) = \sum_{m=0}^{n} \frac{(-1)^m n!}{(n - m)!(m!)^2}\, x^m.
The function Φ_l^{(n)}(ρ) contains the multiplication factors e^{-ρ/n} and ρ^{l+1}. The function
L_{n-l-1}^{2l+1}(2ρ/n) is a polynomial in ρ of highest order ρ^{n-l-1}. Therefore,
Φ_l^{(n)}(ρ) consists of a summation of terms containing e^{-ρ/n} ρ^t, where t is an integer equal
to 1 or larger. Consequently, Φ_l^{(n)}(ρ) → 0 both when ρ → 0 and when ρ → ∞ (vide supra).
Thus, we have confirmed that Φ_l^{(n)}(ρ) certainly satisfies the proper boundary conditions
mentioned earlier and, hence, the operator d/dρ is indeed anti-Hermitian. To show this more
explicitly, we define D \equiv \frac{d}{dρ}. The inner product between arbitrarily chosen functions
f and g is

\langle f | Dg \rangle \equiv \int_0^{\infty} f^*\, Dg\, dρ = [f^* g]_0^{\infty} - \int_0^{\infty} (Df^*)\, g\, dρ
= [f^* g]_0^{\infty} + \langle -Df | g \rangle,   (3.282)

where f^* is the complex conjugate of f. Meanwhile, from (1.112) we have

\langle f | Dg \rangle = \langle D^{†} f | g \rangle.   (3.283)

Therefore, if the functions f and g vanish at ρ → 0 and ρ → ∞, equating (3.282) and
(3.283) gives \langle D^{†} f | g \rangle = \langle -Df | g \rangle. Since D is a real operator, we thus get

D^{†} = -D.

This means that D is anti-Hermitian. The functions Φ_l^{(n)}(ρ) we are dealing with
certainly satisfy the required boundary conditions. The operator H^{(l)} appearing in
(3.247) and (3.248) is Hermitian accordingly. This is because

b_l^{†} b_l = (-A + H)(A + H) = H^2 - A^2 - AH + HA,   (3.284)

\left(b_l^{†} b_l\right)^{†} = H^{†2} - A^{†2} - H^{†}A^{†} + A^{†}H^{†}
= H^2 - (-A)^2 - H(-A) + (-A)H = H^2 - A^2 + HA - AH = b_l^{†} b_l.   (3.285)

The Hermiticity holds for b_l b_l^{†} as well. Thus, the eigenvalues and the eigenstates
(or wave functions) belonging to them are physically meaningful.
Next, consider the following operation:
\bar{b}_l Φ_l^{(n)}(ρ) = \left[ε^{(n-1)} - ε^{(l-1)}\right]^{-\frac{1}{2}} \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \sqrt{\frac{(n - l - 1)!}{2n(n + l)!}} \left(\frac{d}{dρ} + \frac{l}{ρ} - \frac{1}{l}\right) e^{-\frac{ρ}{n}}\, ρ^{l+1}\, L_{n-l-1}^{2l+1}\!\left(\frac{2ρ}{n}\right),   (3.286)

where

\left[ε^{(n-1)} - ε^{(l-1)}\right]^{-\frac{1}{2}} = \frac{nl}{\sqrt{(n + l)(n - l)}}.   (3.287)

Rewriting L_{n-l-1}^{2l+1}(2ρ/n) in the power series expansion form using (3.280) and
rearranging the result, we obtain

\bar{b}_l Φ_l^{(n)}(ρ) = \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \frac{nl}{\sqrt{(n + l)(n - l)}} \frac{1}{\sqrt{2n(n + l)!(n - l - 1)!}} \left(\frac{d}{dρ} + \frac{l}{ρ} - \frac{1}{l}\right) e^{\frac{ρ}{n}}\, ρ^{-l}\, \frac{d^{n-l-1}}{dρ^{n-l-1}}\left(ρ^{n+l} e^{-\frac{2ρ}{n}}\right)
= \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \frac{nl}{\sqrt{(n + l)(n - l)}} \frac{1}{\sqrt{2n(n + l)!(n - l - 1)!}} \left(\frac{d}{dρ} + \frac{l}{ρ} - \frac{1}{l}\right)
× e^{-ρ/n} \sum_{m=0}^{n-l-1} \frac{(-1)^m (n + l)!}{(2l + m + 1)!}\, ρ^{l+m+1} \left(\frac{2}{n}\right)^m \frac{(n - l - 1)!}{m!(n - l - m - 1)!},   (3.288)

where we used the well-known Leibniz rule for the higher-order differentiation of a product
function, i.e., for \frac{d^{n-l-1}}{dρ^{n-l-1}}\left(ρ^{n+l} e^{-\frac{2ρ}{n}}\right). To perform the further calculation, notice that d/dρ does
not change the functional form of e^{-ρ/n}, whereas it lowers the order of ρ^{l+m+1} by one;
the operation of l/ρ likewise lowers the order of ρ^{l+m+1} by one. The factor \frac{n+l}{2l}
in the following equation (3.289) results from these calculation processes. Consid-
ering these characteristics of the operator, we get

\bar{b}_l Φ_l^{(n)}(ρ) = \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \frac{n + l}{2l} \frac{nl}{\sqrt{(n + l)(n - l)}} \frac{1}{\sqrt{2n(n + l)!(n - l - 1)!}}\; e^{-ρ/n}\, ρ^l
× \left\{\sum_{m=0}^{n-l-1} \frac{(-1)^{m+1}(n + l)!}{(2l + m + 1)!} \frac{(n - l - 1)!}{m!(n - l - m - 1)!} \left(\frac{2}{n}\right)^{m+1} ρ^{m+1}
+ 2l \sum_{m=0}^{n-l-1} \frac{(-1)^m (n + l - 1)!}{(2l + m)!} \frac{(n - l - 1)!}{m!(n - l - m - 1)!} \left(\frac{2}{n}\right)^m ρ^m\right\}.   (3.289)

In (3.289), the calculation of the part {⋯} on the RHS is
somewhat complicated, so we describe the outline of the calculation procedure
below. We have
{⋯} of the RHS of (3.289)

= \sum_{m=1}^{n-l} \frac{(-1)^m (n + l)!}{(2l + m)!} \frac{(n - l - 1)!}{(m - 1)!(n - l - m)!} \left(\frac{2ρ}{n}\right)^m
+ 2l \sum_{m=0}^{n-l-1} \frac{(-1)^m (n + l - 1)!}{(2l + m)!} \frac{(n - l - 1)!}{m!(n - l - m - 1)!} \left(\frac{2ρ}{n}\right)^m

= \sum_{m=1}^{n-l-1} \frac{(-1)^m (n - l - 1)!(n + l - 1)!}{(2l + m)!} \left(\frac{2ρ}{n}\right)^m \left[\frac{n + l}{(m - 1)!(n - l - m)!} + \frac{2l}{m!(n - l - m - 1)!}\right]
+ (-1)^{n-l} \left(\frac{2ρ}{n}\right)^{n-l} + \frac{(n + l - 1)!}{(2l - 1)!}

= \sum_{m=1}^{n-l-1} \frac{(-1)^m (n - l - 1)!(n + l - 1)!}{(2l + m)!} \left(\frac{2ρ}{n}\right)^m \frac{(2l + m)(n - l)}{m!(n - l - m)!}
+ (-1)^{n-l} \left(\frac{2ρ}{n}\right)^{n-l} + \frac{(n + l - 1)!}{(2l - 1)!}

= \sum_{m=1}^{n-l-1} \frac{(-1)^m (n - l)!(n + l - 1)!}{(2l + m - 1)!(n - l - m)!\,m!} \left(\frac{2ρ}{n}\right)^m + (-1)^{n-l} \left(\frac{2ρ}{n}\right)^{n-l} + \frac{(n + l - 1)!}{(2l - 1)!}

= (n - l)! \sum_{m=0}^{n-l} \frac{(-1)^m (n + l - 1)!}{(2l + m - 1)!(n - l - m)!\,m!} \left(\frac{2ρ}{n}\right)^m = (n - l)!\, L_{n-l}^{2l-1}\!\left(\frac{2ρ}{n}\right).   (3.290)

Notice that at the second equality of (3.290) the summation has been divided into
three parts, i.e., 1 ≤ m ≤ n - l - 1, m = n - l (the highest-order term), and m = 0
(the lowest-order term). Note that at the second last equality of (3.290) the
highest-order term (m = n - l) and the lowest-order term (i.e., a constant) have been
absorbed into a single expression, namely an associated Laguerre polynomial. Corre-
spondingly, at the second last equality of (3.290) the summation range is extended
to 0 ≤ m ≤ n - l.
Summarizing the above results, we get

\bar{b}_l Φ_l^{(n)}(ρ) = \left(\frac{2}{n}\right)^{l+\frac{3}{2}} \frac{n + l}{2l} \frac{nl(n - l)!}{\sqrt{(n + l)(n - l)}} \frac{1}{\sqrt{2n(n + l)!(n - l - 1)!}}\; e^{-ρ/n}\, ρ^l\, L_{n-l}^{2l-1}\!\left(\frac{2ρ}{n}\right)
= \left(\frac{2}{n}\right)^{l+\frac{1}{2}} \sqrt{\frac{(n - l)!}{2n(n + l - 1)!}}\; e^{-ρ/n}\, ρ^l\, L_{n-l}^{2l-1}\!\left(\frac{2ρ}{n}\right)
= \left(\frac{2}{n}\right)^{(l-1)+\frac{3}{2}} \sqrt{\frac{[n - (l - 1) - 1]!}{2n[n + (l - 1)]!}}\; e^{-\frac{ρ}{n}}\, ρ^{(l-1)+1}\, L_{n-(l-1)-1}^{2(l-1)+1}\!\left(\frac{2ρ}{n}\right)
\equiv Φ_{l-1}^{(n)}(ρ).   (3.291)

Thus, we find that Φ_l^{(n)}(ρ) behaves exactly like ψ_l^{(n)}. Moreover, if we replace
l in (3.279) with n - 1, we find

Φ_{n-1}^{(n)}(ρ) = ψ_{n-1}^{(n)}.   (3.292)

Operating \bar{b}_{n-1} on both sides of (3.292),

Φ_{n-2}^{(n)}(ρ) = ψ_{n-2}^{(n)}.   (3.293)

Likewise, successively operating \bar{b}_l (1 ≤ l ≤ n - 1),

Φ_l^{(n)}(ρ) = ψ_l^{(n)}(ρ)   (3.294)

for all allowed values of l (i.e., 0 ≤ l ≤ n - 1). This permits us to identify

Φ_l^{(n)}(ρ) \equiv ψ_l^{(n)}(ρ).   (3.295)

Consequently, it is clear that the parameter n introduced in (3.249) is identical to the
principal quantum number and that the parameter l (0 ≤ l ≤ n - 1) is the orbital
angular momentum quantum number (or azimuthal quantum number). The functions
Φ_l^{(n)}(ρ) and ψ_l^{(n)}(ρ) are identical up to the constant c_n expressed in (3.265). Note,
however, that a complex constant with an absolute value of 1 (a phase factor) remains
undetermined, as is always the case with an eigenvalue problem.
The radial wave functions are derived from the following relationship, as
described earlier:

R_l^{(n)}(r) = ψ_l^{(n)}/ρ.   (3.296)

To normalize R_l^{(n)}(r), we have to calculate the following integral:

\int_0^{\infty} \left|R_l^{(n)}(r)\right|^2 r^2\, dr = \int_0^{\infty} \frac{1}{ρ^2}\left|ψ_l^{(n)}\right|^2 \left(\frac{a}{Z}\right)^2 ρ^2\, \frac{a}{Z}\, dρ = \left(\frac{a}{Z}\right)^3 \int_0^{\infty} \left|ψ_l^{(n)}\right|^2 dρ = \left(\frac{a}{Z}\right)^3.   (3.297)

Accordingly, we choose the following functions \tilde{R}_l^{(n)}(r) as the normalized radial
wave functions:

\tilde{R}_l^{(n)}(r) = \sqrt{(Z/a)^3}\; R_l^{(n)}(r).   (3.298)

Substituting (3.296) into (3.298) and taking account of (3.279) and (3.280), we
obtain

\tilde{R}_l^{(n)}(r) = \sqrt{\left(\frac{2Z}{an}\right)^3 \frac{(n - l - 1)!}{2n(n + l)!}}\; \left(\frac{2Zr}{an}\right)^l \exp\left(-\frac{Zr}{an}\right) L_{n-l-1}^{2l+1}\!\left(\frac{2Zr}{an}\right).   (3.299)

Equation (3.299) is exactly the same as the normalized radial wave function that
can be obtained as the solution of (3.241) through the power series expansion. All
these functions belong to the same eigenenergy E_n = -\frac{ħ^2}{2μ}\left(\frac{Z}{a}\right)^2 \frac{1}{n^2}; see (3.258).
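A numerical check that (3.299) is indeed normalized in the sense of (3.297) is
straightforward; a minimal sketch in units Z = a = 1, assuming scipy's eval_genlaguerre:

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import eval_genlaguerre
    from math import factorial

    def R(n, l, r):
        # Radial function (3.299) with Z = a = 1
        c = np.sqrt((2/n)**3 * factorial(n - l - 1) / (2*n*factorial(n + l)))
        x = 2*r/n
        return c * x**l * np.exp(-x/2) * eval_genlaguerre(n - l - 1, 2*l + 1, x)

    for n in (1, 2, 3):
        for l in range(n):
            val, _ = quad(lambda r: R(n, l, r)**2 * r**2, 0, np.inf)
            assert np.isclose(val, 1.0)      # (3.297) with Z = a = 1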
Returning to the quantum states sharing the same eigenenergy, we have n such
states that belong to the principal quantum number n, with varying azimuthal
quantum number l (0 ≤ l ≤ n - 1). Meanwhile, from (3.158), (3.159), and
(3.171), 2l + 1 quantum states possess the eigenvalue l(l + 1) of the quantity
M^2. As already mentioned in Sects. 3.5 and 3.6, the integer m takes the 2l + 1
different values

m = l, l - 1, l - 2, ⋯, 1, 0, -1, ⋯, -l + 1, -l.

The quantum states of a hydrogen-like atom are characterized by a set of integers
(m, l, n). On the basis of the above discussion, the number of those sharing the same
energy (i.e., the principal quantum number n) is

\sum_{l=0}^{n-1} (2l + 1) = n^2.

Regarding this situation, we say that the quantum states belonging to the principal
quantum number n, or the energy eigenvalue E_n = -\frac{ħ^2}{2μ}\left(\frac{Z}{a}\right)^2 \frac{1}{n^2}, are degenerate n^2-
fold. Note that we have ignored the freedom of the spin state.
In summary of this section, we have developed the operator formalism for dealing
with the radial wave functions of hydrogen-like atoms and seen how the operator
formalism features the radial wave functions. The essential point rests upon the fact that
the radial wave functions can be derived by successively operating the lowering
operators \bar{b}_l on ψ_{n-1}^{(n)}, which is parametrized with a principal quantum number n and an
orbital angular momentum quantum number l = n - 1. This is clearly represented by
(3.278). The results agree with the conventional coordinate representation method
based upon the power series expansion that leads to the associated Laguerre polyno-
mials. Thus, the operator formalism is again found to be powerful in explicitly
representing the mathematical constitution of quantum-mechanical systems.

3.8 Total Wave Functions

Since we have obtained the angular wave functions and the radial wave functions, we
describe the normalized total wave functions Λ_{l,m}^{(n)} of hydrogen-like atoms as a product
of the angular part and the radial part such that

Λ_{l,m}^{(n)} = Y_l^m(θ, ϕ)\, \tilde{R}_l^{(n)}(r).   (3.300)

Let us seek several tangible functional forms for hydrogen (Z = 1), including the
angular and radial parts. For example, we have

ϕ(1s) \equiv Y_0^0(θ, ϕ)\, \tilde{R}_0^{(1)}(r) = \sqrt{\frac{1}{4π}}\, a^{-3/2}\, \frac{ψ_0^{(1)}}{ρ} = \sqrt{\frac{1}{π}}\, a^{-3/2}\, e^{-r/a},   (3.301)

where we used (3.276) and (3.295).
For ϕ(2s), using (3.277) and (3.278) we have

ϕ(2s) \equiv Y_0^0(θ, ϕ)\, \tilde{R}_0^{(2)}(r) = \frac{1}{4\sqrt{2π}}\, a^{-\frac{3}{2}}\, e^{-\frac{r}{2a}} \left(2 - \frac{r}{a}\right).   (3.302)

For ϕ(2p_z), in turn, we express it as

ϕ(2p_z) \equiv Y_1^0(θ, ϕ)\, \tilde{R}_1^{(2)}(r) = \sqrt{\frac{3}{4π}}\,(\cos θ)\, \frac{1}{2\sqrt{6}}\, a^{-\frac{3}{2}}\, \frac{r}{a}\, e^{-\frac{r}{2a}}
= \frac{1}{4\sqrt{2π}}\, a^{-\frac{3}{2}}\, \frac{r}{a}\, e^{-\frac{r}{2a}} \cos θ = \frac{1}{4\sqrt{2π}}\, a^{-\frac{5}{2}}\, e^{-\frac{r}{2a}}\, r\,\frac{z}{r} = \frac{1}{4\sqrt{2π}}\, a^{-\frac{5}{2}}\, e^{-\frac{r}{2a}}\, z.   (3.303)

For ϕ(2p_{x+iy}), using (3.217) we get

ϕ(2p_{x+iy}) \equiv Y_1^1(θ, ϕ)\, \tilde{R}_1^{(2)}(r) = -\frac{1}{8\sqrt{π}}\, a^{-\frac{3}{2}}\, \frac{r}{a}\, e^{-\frac{r}{2a}} \sin θ\, e^{iϕ}
= -\frac{1}{8\sqrt{π}}\, a^{-\frac{5}{2}}\, e^{-\frac{r}{2a}}\, r\,\frac{x + iy}{r} = -\frac{1}{8\sqrt{π}}\, a^{-\frac{5}{2}}\, e^{-\frac{r}{2a}}\, (x + iy).   (3.304)

In (3.304), the minus sign comes from the Condon–Shortley phase. Furthermore,
we have

ϕ(2p_{x-iy}) \equiv Y_1^{-1}(θ, ϕ)\, \tilde{R}_1^{(2)}(r) = \frac{1}{8\sqrt{π}}\, a^{-\frac{3}{2}}\, \frac{r}{a}\, e^{-\frac{r}{2a}} \sin θ\, e^{-iϕ}
= \frac{1}{8\sqrt{π}}\, a^{-\frac{5}{2}}\, e^{-\frac{r}{2a}}\, r\,\frac{x - iy}{r} = \frac{1}{8\sqrt{π}}\, a^{-\frac{5}{2}}\, e^{-\frac{r}{2a}}\, (x - iy).   (3.305)

Notice that the above notations ϕ(2p_{x+iy}) and ϕ(2p_{x-iy}) differ from the custom
that uses, e.g., ϕ(2p_x) and ϕ(2p_y). We will come back to this point in Sect. 4.3.

References

1. Schiff LI (1955) Quantum mechanics, 2nd edn. McGraw-Hill, New York, NY


2. Arfken GB (1970) Mathematical methods for physicists, 2nd edn. Academic Press,
Waltham, MA
3. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)
4. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York, NY
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York, NY
6. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering,
3rd edn. Cambridge University Press, Cambridge
7. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn.
Academic Press, Waltham, MA
8. Lebedev NN (1972) Special functions and their applications. Dover, New York, NY
9. Stakgold I (1998) Green’s functions and boundary value problems, 2nd edn. Wiley,
New York, NY
10. Hotta S (2017) Operator representations for radial wave functions of hydrogen-like atoms. Bull
Kyoto Inst Technol 9:1–12
Chapter 4
Optical Transition and Selection Rules

In Sect. 1.2 we presented the Schrödinger equation as a function of the space coordinates
and time. In subsequent sections, we dealt with the time-independent eigenvalue
problems of a harmonic oscillator and hydrogen-like atoms. This implies that the
physical system is isolated from the outside world and that there is no interaction
between the outside world and the physical system we are considering. By virtue of
such an interaction, however, the system may acquire or lose energy, momentum, angular
momentum, etc. As a consequence of the interaction, the system changes its quan-
tum state as well. Such a change is said to be a transition. If the interaction takes
place as an optical process, we are dealing with an optical transition. Of the various
optical transitions, the electric dipole transition is common and the most important.
In this chapter, we study the optical transitions of a particle confined in a potential
well, a harmonic oscillator, and a hydrogen atom using a semiclassical approach. The
question of whether a transition is allowed or forbidden is of great importance.
We have selection rules to judge this.

4.1 Electric Dipole Transition

We have a time-dependent Schrödinger equation described as

Hψ = iħ\frac{∂ψ}{∂t}.   (1.47)

Using the method of separation of variables, we obtained the two equations
expressed below:

Hϕ(x) = Eϕ(x),   (1.55)

iħ\frac{∂ξ(t)}{∂t} = Eξ(t).   (1.56)

Equation (1.55) is an eigenvalue equation for the energy, and (1.56) is an equation in
time. So far we have focused our attention upon (1.55), taking as examples a particle confined
within a potential well, a one-dimensional harmonic oscillator, and hydrogen-like
atoms. In this chapter we deal with the time-evolved Schrödinger
equation and its relevance to optical transitions. The optical transition takes
place according to selection rules. We mention their significance as well.
We showed that after solving the eigenvalue equation, the Schrödinger equation
is expressed as

ψ(x, t) = ϕ(x) \exp(-iEt/ħ).   (1.60)

The probability density of the system (i.e., normally a particle such as an electron,
a harmonic oscillator, etc.) residing at a certain place x at a certain time t is expressed
as

ψ^*(x, t)\, ψ(x, t).

If the Schrödinger equation is described in the form of separated variables as in the
case of (1.60), the exponential factors including t cancel out and we have

ψ^*(x, t)\, ψ(x, t) = ϕ^*(x)\, ϕ(x).   (4.1)

This means that the probability density of the system depends only on the spatial
coordinate and is constant in time. Such a state is said to be a stationary state. That is,
the system continues residing in the quantum state described by ϕ(x) and remains
unchanged independent of time.
Next, we consider a linear combination of functions described by (1.60). That is,
we have

ψ(x, t) = c_1 ϕ_1(x) \exp(-iE_1 t/ħ) + c_2 ϕ_2(x) \exp(-iE_2 t/ħ),   (4.2)

where the first term is pertinent to state 1 and the second term to state 2, and c_1 and
c_2 are complex constants with respect to the spatial coordinates but may be weakly
time-dependent. The state described by (4.2) is called a coherent state. The proba-
bility distribution of that state is described as

ψ^*(x, t)\, ψ(x, t) = |c_1|^2 |ϕ_1|^2 + |c_2|^2 |ϕ_2|^2 + c_1^* c_2\, ϕ_1^* ϕ_2\, e^{-iωt} + c_2^* c_1\, ϕ_2^* ϕ_1\, e^{iωt},   (4.3)

where ω is expressed as
ω = (E_2 - E_1)/ħ.   (4.4)

This equation shows that the probability density of the system undergoes a
sinusoidal oscillation with time. The angular frequency equals the energy difference
between the two states divided by the reduced Planck constant. If the system is a
charged particle such as an electron or proton, the sinusoidal oscillation is accom-
panied by an oscillating electromagnetic field. Thus, the coherent state is associated
with the optical transition from one state to another, when the transition is related to
the charged particle.
Optical transitions result from various causes. Of these, the electric dipole
transition yields the largest transition probability, and the dipole approximation is
often chosen to represent the transition probability. From the point of view of optical
measurements, the electric dipole transition gives the strongest absorption or emis-
sion spectral lines. The matrix element of the electric dipole, more specifically the
square of the absolute value of the matrix element, is a measure of the optical
transition probability. Labeling the quantum states as a, b, etc. and describing the
corresponding state vectors as |a⟩, |b⟩, etc., the matrix element P_{ba} is given by

P_{ba} \equiv \langle b | ε_e \cdot P | a \rangle,   (4.5)

where ε_e is the unit polarization vector of the electric field of an electromagnetic wave
(i.e., light). Equation (4.5) describes the optical transition that takes place as a result
of the interaction between the electrons and the radiation field in such a way that the
interaction causes electrons in the system to change the state from |a⟩ to |b⟩. That
interaction is represented by ε_e · P. The quantum states |a⟩ and |b⟩ are referred to as
the initial state and final state, respectively.
The quantity P is the electric dipole moment of the system, which is defined as

P \equiv e \sum_j x_j,   (4.6)

where e is the elementary charge (e < 0) and x_j is the position vector of the j-th electron.
A detailed description of ε_e and P can be seen in Chap. 7. The quantity P_{ba} is said to be
the transition dipole moment, or more precisely, the transition electric dipole moment with
respect to the states |a⟩ and |b⟩. We assume that the optical transition occurs from the
quantum state |a⟩ to another state |b⟩. Since P_{ba} is generally a complex number,
|P_{ba}|^2 represents the transition probability.
If we adopt the coordinate representation, (4.5) is expressed by

P_{ba} = \int ϕ_b^*\, ε_e \cdot P\, ϕ_a\, dτ,   (4.7)

where τ denotes the integration range over the space.


4.2 One-Dimensional System

Let us apply the aforementioned general description to the individual cases of
Chaps. 1–3.

Example 4.1 A particle confined in a square-well potential. This example was
treated in Chap. 1. As before, we assume that a particle (i.e., an electron) is confined in a
one-dimensional system [-L ≤ x ≤ L (L > 0)].
We consider the optical transition from the ground state ϕ_1(x) to the first excited
state ϕ_2(x). Here, we put L = π/2 for convenience. Then, the normalized coherent
state ψ(x) is described as

ψ(x, t) = \frac{1}{\sqrt{2}}\left[ϕ_1(x) \exp(-iE_1 t/ħ) + ϕ_2(x) \exp(-iE_2 t/ħ)\right],   (4.8)

where we put c_1 = c_2 = \frac{1}{\sqrt{2}} in (4.2). In (4.8), we have

ϕ_1(x) = \sqrt{\frac{2}{π}} \cos x \quad \text{and} \quad ϕ_2(x) = \sqrt{\frac{2}{π}} \sin 2x.   (4.9)

Following (4.3), we have the following real function, called the probability distribu-
tion density:

ψ^*(x, t)\, ψ(x, t) = \frac{1}{π}\left[\cos^2 x + \sin^2 2x + (\sin 3x + \sin x) \cos ωt\right],   (4.10)

where ω is given by (4.4) as

ω = 3ħ/2m,   (4.11)

where m is the mass of an electron. Rewriting (4.10), we have

ψ^*(x, t)\, ψ(x, t) = \frac{1}{π}\left[1 + \frac{1}{2}(\cos 2x - \cos 4x) + (\sin 3x + \sin x) \cos ωt\right].   (4.12)

Integrating (4.12) over [-π/2, π/2], only the first term gives a
non-vanishing contribution, equal to 1, as anticipated (because of the normalization).
Putting t = 0 and integrating (4.12) over the positive domain [0, π/2], we have

\int_0^{π/2} ψ^*(x, 0)\, ψ(x, 0)\, dx = \frac{1}{2} + \frac{4}{3π} ≈ 0.924.   (4.13)

Similarly, integrating (4.12) over the negative domain [-π/2, 0], we have


Fig. 4.1 Probability distribution density ψ^*(x, t)ψ(x, t) of a particle confined in a square-well
potential. The solid curve and the broken curve represent the density at t = 0 and at t = π/ω
(i.e., a half period), respectively. [Figure omitted; the abscissa runs from -π/2 to π/2.]

\int_{-π/2}^{0} ψ^*(x, 0)\, ψ(x, 0)\, dx = \frac{1}{2} - \frac{4}{3π} ≈ 0.076.   (4.14)

Thus, 92% of the total charge (as a probability density) is concentrated in the
positive domain. Differentiation of ψ^*(x, 0)ψ(x, 0) gives five extremals, including
both edges. Of these, the major maximum is located at 0.635 radian, which corresponds to
about 40% of π/2. This can be a measure of the transition moment. Figure 4.1
demonstrates these results (see the solid curve). Meanwhile, putting t = π/ω (i.e., a half
period), we plot ψ^*(x, π/ω)ψ(x, π/ω). The result shows that this graph is obtained by
folding back the solid curve of Fig. 4.1 with respect to the ordinate axis. Thus, we
find that the charge (or the probability density) exerts a sinusoidal oscillation with the
angular frequency 3ħ/2m along the x-axis around the origin.
Let e_1 be a unit vector in the positive direction of the x-axis. Then, the electric
dipole P of the system is

P = ex = exe_1,   (4.15)

where x is the position vector of the electron. Let us define the matrix element of the
electric dipole transition as

P_{21} \equiv \langle ϕ_2(x) | e_1 \cdot P | ϕ_1(x) \rangle = \langle ϕ_2(x) | ex | ϕ_1(x) \rangle.   (4.16)

Notice that we only have to consider the case where the polarization of light is parallel to the
x-axis. With the coordinate representation, we have
P_{21} = \int_{-π/2}^{π/2} ϕ_2(x)\, ex\, ϕ_1(x)\, dx = \int_{-π/2}^{π/2} \sqrt{\frac{2}{π}}(\sin 2x)\, ex\, \sqrt{\frac{2}{π}}\cos x\, dx
= \frac{2e}{π} \int_{-π/2}^{π/2} x \cos x \sin 2x\, dx = \frac{e}{π} \int_{-π/2}^{π/2} x(\sin x + \sin 3x)\, dx = \frac{16e}{9π},   (4.17)

where we used a trigonometric formula and integration by parts. The factor 16/9π in
(4.17) is about 36% of π/2. This number is in pretty good agreement with the 40% that was
estimated above from the major maximum of ψ^*(x, 0)ψ(x, 0).
Note that the transition moment vanishes if the two states associated with the
transition have the same parity. In other words, if both states are described by sine
functions or both by cosine functions, the integral vanishes.
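The value 16e/9π in (4.17) is easily reproduced by numerical quadrature (e factored
out; a minimal sketch assuming scipy):

    import numpy as np
    from scipy.integrate import quad

    phi1 = lambda x: np.sqrt(2/np.pi) * np.cos(x)     # ground state (4.9)
    phi2 = lambda x: np.sqrt(2/np.pi) * np.sin(2*x)   # first excited state (4.9)

    val, _ = quad(lambda x: phi2(x) * x * phi1(x), -np.pi/2, np.pi/2)
    assert np.isclose(val, 16/(9*np.pi))   # about 0.566, i.e. ~36% of pi/2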
Example 4.2 One-dimensional harmonic oscillator. Second, let us think of an
optical transition involving the harmonic oscillator that we dealt with in Chap. 2. We
denote the state of the oscillator as |n⟩ in place of |ψ_n⟩ (n = 0, 1, 2, ⋯) of Chap. 2.
Then, the general expression (4.5) can be written as

P_{kl} = \langle k | ε_e \cdot P | l \rangle.   (4.18)

Since we are considering a single one-dimensional oscillator, we have

ε_e = q̂ \quad \text{and} \quad P = eq,   (4.19)

where q̂ is a unit vector in the positive direction of the coordinate q. Therefore,
similar to the above, we have

ε_e \cdot P = eq.   (4.20)

That is,

P_{kl} = e\langle k | q | l \rangle.   (4.21)

Since q is an Hermitian operator, we have

P_{kl} = e\langle l | q^{†} | k \rangle^* = e\langle l | q | k \rangle^* = P_{lk}^*,   (4.22)

where we used (1.116).

Using (2.68), we have

P_{kl} = e\sqrt{\frac{ħ}{2mω}}\, \langle k | a + a^{†} | l \rangle = e\sqrt{\frac{ħ}{2mω}}\, \left[\langle k | a | l \rangle + \langle k | a^{†} | l \rangle\right].   (4.23)

Taking the adjoint of (2.62) and modifying the notation, we have

\langle k | a = \sqrt{k + 1}\, \langle k + 1 |.   (4.24)

Using (2.62) once again, we get

P_{kl} = e\sqrt{\frac{ħ}{2mω}}\, \left[\sqrt{k + 1}\, \langle k + 1 | l \rangle + \sqrt{l + 1}\, \langle k | l + 1 \rangle\right].   (4.25)

Using the orthonormal conditions between the state vectors, we have

P_{kl} = e\sqrt{\frac{ħ}{2mω}}\, \left[\sqrt{k + 1}\, δ_{k+1,l} + \sqrt{l + 1}\, δ_{k,l+1}\right].   (4.26)

Exchanging k and l in the above, we get

P_{kl} = P_{lk}.

The matrix element P_{kl} is symmetric with respect to the indices k and l. Notice that
the first term does not vanish only when k + 1 = l, and the second term does not vanish
only when k = l + 1. Therefore, we get

P_{k,k+1} = e\sqrt{\frac{ħ(k + 1)}{2mω}} \quad \text{and} \quad P_{l+1,l} = e\sqrt{\frac{ħ(l + 1)}{2mω}}.   (4.27)

Meanwhile, we find that the transition matrix P is expressed as

P = eq = e\sqrt{\frac{ħ}{2mω}}\left(a + a^{†}\right) = e\sqrt{\frac{ħ}{2mω}}
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & ⋯ \\
1 & 0 & \sqrt{2} & 0 & 0 & ⋯ \\
0 & \sqrt{2} & 0 & \sqrt{3} & 0 & ⋯ \\
0 & 0 & \sqrt{3} & 0 & 2 & ⋯ \\
0 & 0 & 0 & 2 & 0 & ⋯ \\
⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋱
\end{pmatrix},   (4.28)

where we used (2.68). Note that a real Hermitian matrix is a symmetric matrix.
Practically, a fast way to construct the transition matrix (4.28) is to use (2.65) and
(2.66). It is an intuitively obvious and straightforward task. A glance at the
matrix form immediately tells us that the transition matrix elements are
non-vanishing only at the (k, k + 1) and (k + 1, k) positions. Whereas the (k, k + 1)
element represents the transition from the k-th excited state to the (k - 1)-th excited state
accompanied by photoemission, the (k + 1, k) element implies the transition from the
(k - 1)-th excited state to the k-th excited state accompanied by photoabsorption. The
two transitions give the same transition moment. Note that the zeroth excited state
means the ground state; see (2.64) for the basis vector representations.
We should be careful about the "addresses" of the matrix accordingly. For example,
P_{0,1} in (4.27) represents the (1, 2) element of the matrix (4.28); P_{2,1} stands for the (3, 2)
element.
Suppose that we seek the transition dipole moments using the coordinate represen-
tation. Then, we need to use (2.106) and perform definite integration. For instance,
we have

e\int_{-\infty}^{\infty} ψ_0(q)\, q\, ψ_1(q)\, dq,

which corresponds to the (1, 2) element of (4.28). Indeed, the above integral gives
e\sqrt{\frac{ħ}{2mω}}. The confirmation is left to the readers. Nonetheless, seeking a definite
integral of a product of higher excited-state wave functions becomes increasingly
troublesome. In this respect, the operator method described above provides us with a
much better insight into complicated calculations.
Equations (4.26)–(4.28) imply that the electric dipole transition is allowed to
occur only when the quantum number changes by one. Notice also that the transition
takes place between an even function and an odd function; see Table 2.1 and (2.101).
Such a condition or restriction on the optical transition is called a selection rule. The
former equation of (4.27) shows that the transition takes place from the upper state to
the lower state accompanied by photon emission. The latter equation, on the
other hand, shows that the transition takes place from the lower state to the upper state
accompanied by photon absorption.
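The construction of (4.28) from the ladder operators, recommended above, takes only a
few lines numerically; the sketch below works in units where e√(ħ/2mω) = 1:

    import numpy as np

    N = 6
    # annihilation operator: a|n> = sqrt(n)|n-1>, i.e. sqrt(1..N-1) on the superdiagonal
    a = np.diag(np.sqrt(np.arange(1, N)), k=1)
    q = a + a.T          # q is proportional to a + a-dagger, cf. (2.68)
    print(np.round(q, 3))  # nonzero only at the (k, k+1) and (k+1, k) positions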

4.3 Three-Dimensional System

The hydrogen-like atoms give us a typical example. Since we have fully investigated
the quantum states of those atoms, we make the most of the related results.

Example 4.3 An electron in a hydrogen atom. Unlike the one-dimensional sys-
tem, we have to take account of the angular momentum in the three-dimensional
system. We have already obtained the explicit wave functions. Here we focus on the 1s and
2p states of hydrogen. For their normalized states, we have

ϕ(1s) = \sqrt{\frac{1}{πa^3}}\, e^{-r/a},   (4.29)

ϕ(2p_z) = \frac{1}{4}\sqrt{\frac{1}{2πa^3}}\, \frac{r}{a}\, e^{-r/2a} \cos θ,   (4.30)

ϕ(2p_{x+iy}) = -\frac{1}{8}\sqrt{\frac{1}{πa^3}}\, \frac{r}{a}\, e^{-r/2a} \sin θ\, e^{iϕ},   (4.31)

ϕ(2p_{x-iy}) = \frac{1}{8}\sqrt{\frac{1}{πa^3}}\, \frac{r}{a}\, e^{-r/2a} \sin θ\, e^{-iϕ},   (4.32)

where a denotes the Bohr radius of hydrogen. Note that the minus sign of ϕ(2p_{x+iy}) is
due to the Condon–Shortley phase. Even though the transition probability is pro-
portional to the square of the matrix element, so that the phase factor cancels out, we
describe the state vector faithfully. The energy eigenvalues are

E(1s) = -\frac{ħ^2}{2μa^2}, \qquad E(2p_z) = E(2p_{x+iy}) = E(2p_{x-iy}) = -\frac{ħ^2}{8μa^2},   (4.33)

where μ is the reduced mass of hydrogen. Note that the latter three states are
degenerate.
First, we consider the transition between the ϕ(1s) and ϕ(2p_z) states. Suppose that the
normalized coherent state is described as

ψ(x, t) = \frac{1}{\sqrt{2}}\left[ϕ(1s) \exp\left(-\frac{iE(1s)t}{ħ}\right) + ϕ(2p_z) \exp\left(-\frac{iE(2p_z)t}{ħ}\right)\right].   (4.34)

As before, we have

ψ^*(x, t)\, ψ(x, t) = |ψ(x, t)|^2 = \frac{1}{2}\left\{[ϕ(1s)]^2 + [ϕ(2p_z)]^2 + 2ϕ(1s)ϕ(2p_z) \cos ωt\right\},   (4.35)

where ω is given by

ω = [E(2p_z) - E(1s)]/ħ = 3ħ/8μa^2.   (4.36)

In virtue of the third term of (4.35), which contains a cos ωt factor, the charge
distribution undergoes a sinusoidal oscillation along the z-axis with the angular
frequency described by (4.36). For instance, cos ωt gives a factor +1 to (4.35) when
t = 0, whereas it gives a factor -1 when ωt = π, i.e., t = 8πμa^2/3ħ.
Integrating (4.35), we have
\int ψ^*(x, t)\, ψ(x, t)\, dτ = \int |ψ(x, t)|^2\, dτ
= \frac{1}{2}\int \left\{[ϕ(1s)]^2 + [ϕ(2p_z)]^2\right\} dτ + \cos ωt \int ϕ(1s)ϕ(2p_z)\, dτ
= \frac{1}{2} + \frac{1}{2} = 1,

where we used the normalized functional forms of ϕ(1s) and ϕ(2p_z) together with
their orthogonality. Note that both of the functions are real.
Next, we calculate the matrix element. For simplicity, we denote the matrix
element simply as P(ε_e), designating only the unit polarization vector ε_e. Then,
we have

Pðεe Þ = ϕð1sÞjεe  Pjϕ 2pz , ð4:37Þ

where

x
P = ex = eðe1 e2 e3 Þ y : ð4:38Þ
z

We have three possibilities of choosing εe out of e1, e2, and e3. Choosing e3, we
have

ðe Þ
Pz,jp3 i = e ϕð1sÞjzjϕ 2pz
p
z
1 π 2π
e 27 2
= p r 4 e - 3r=2a dr cos 2 θ sin θdθ dϕ = ea ≈ 0:745ea:
4 2πa4 0 0 0 35
ð4:39Þ

ðe Þ
In (4.39), we express the matrix element as Pz,jp3 i to indicate the z-component of
z
position vector and to explicitly show that ϕ(2pz) state is responsible for the
transition. In (4.39), we used z = r cos θ. We also used a radial part integration
such that

1
2a 5
r 4 e - 3r=2a dr = 24 :
0 3
4.3 Three-Dimensional System 137

Also, we changed a variable cos θ → t to perform the integration with respect to θ.


We see that a “leverage” length of the transition moment is comparable to Bohr
radius a.
ðe Þ
With the notation Pz,jp3 i , we need some explanation for consistency with the latter
z
description. Equation (4.39) represents the transition from jϕ(2pz)i to jϕ(1s)i that is
accompanied by the photon emission. Thus, jpzi in the notation means that jϕ(2pz)i
is the initial state. In the notation, in turn, (e3) denotes the polarization vector and
z represents the electric dipole.
In the case of photon absorption where the transition occurs from jϕ(1s)i to
jϕ(2pz)i, we use the following notation:

ðe Þ
Pz,hp3 j = e ϕ 2pz jzjϕð1sÞ : ð4:40Þ
z

Since all the functions related to the integration are real, we have

ðe Þ ðe Þ
Pz,jp3 i = Pz,hp3 j ,
z z

where hpzj means that ϕ(2pz) is designated as the final state. Meanwhile, if we choose
e1 for εe to evaluate the matrix element Px, we have

ðe Þ
Px,jp1 i = e ϕð1sÞjxjϕ 2pz
z

e 1 π 2π ð4:41Þ
= p r 4 e - 3r=2a dr sin 2 θ cos θdθ cos ϕdϕ = 0,
4 2πa4 0 0 0

where cos ϕ comes from x = r sin θ cos ϕ and an integration of cos ϕ gives zero. In a
similar manner, we have

ðe Þ
Py,jp2 i = e ϕð1sÞjyjϕ 2pz = 0: ð4:42Þ
z

Next, we estimate the matrix elements associated with 2px and 2py. For this
purpose, it is convenient to introduce the following complex coordinates by a unitary
transformation:
138 4 Optical Transition and Selection Rules

1 1 1 i
p p 0 p p 0
x 2 2 2 2 x
ð e1 e2 e3 Þ y = ð e1 e2 e3 Þ -p
i
p
i
0 p
1
-p
i
0 y
2 2 2 2
z z
0 0 1 0 0 1
1
p ðx þ iyÞ
2
1 1
= p ðe1 - ie2 Þ p ðe1 þ ie2 Þ e3 1 ,
2 2 p ðx - iyÞ
2
z
ð4:43Þ

where a unitary transformation is represented by a unitary matrix defined as

U { U = UU { = E: ð4:44Þ

We will investigate details of the unitary transformation and matrix in Parts III
and IV.
We define e+ and e- as follows [1]:

1 1
eþ  p ðe1 þ ie2 Þ and e -  p ðe1 - ie2 Þ, ð4:45Þ
2 2

where complex vectors e+ and e- represent the left-circularly polarized light and
right-circularly polarized light that carry an angular momentum ħ and -ħ, respec-
tively. We will revisit the characteristics and implication of these complex vectors in
Sect. 7.4.
We have

1
x p ðx þ iyÞ
2
ð e1 e2 e3 Þ y = ð e - eþ e 3 Þ 1 : ð4:46Þ
p ðx - iyÞ
z 2
z

Note that e+, e-, and e3 are orthonormal. That is,

heþ jeþ i = 1, heþ je - i = 0, etc: ð4:47Þ

In this situation, e+, e-, and e3 are said to form an orthonormal basis in a three-
dimensional complex vector space (see Sect. 11.4).
Now, choosing e+ for εe, we have [2]
4.3 Three-Dimensional System 139

ðe Þ 1
Px -þ iy,jp  e ϕð1sÞjp ðx - iyÞjϕ 2pxþiy , ð4:48Þ
þi
2

where jp+i is a shorthand notation of ϕ(2px + iy) that is designated as the initial state;
x - iy represents a complex electric dipole. Equation (4.48) represents an optical
process in which an electron causes transition from ϕ(2px + iy) to ϕ(1s) to lose an
angular momentum ħ, whereas the radiation field gains that angular momentum to
ðe Þ
conserve a total angular momentum ħ. The notation Px -þ iy,jp i reflects this situation.
þ
Using the coordinate representation, we rewrite (4.48) as

1 π 2π
ðe Þ e
Px -þ iy,jp i = - p r 4 e - 3r=2a dr sin 3 θdθ e - iϕ eiϕ dϕ =
þ
8 2πa4 0 0 0
p
27 2
- ea, ð4:49Þ
35

where we used

x - iy = r sin θe - iϕ : ð4:50Þ

In the definite integral of (4.49), e-iϕ comes from x - iy, while eiϕ comes from
ϕ(2px + iy). Note that from (3.24) eiϕ is an eigenfunction corresponding to an angular
momentum eigenvalue ħ. Notice that in (4.49) exponents e-iϕ and eiϕ cancel out and
that an azimuthal integral is non-vanishing.
If we choose e- for εe, we have

ðe - Þ 1
Pxþiy,jp = e ϕð1sÞjp ðx þ iyÞjϕ 2pxþiy ð4:51Þ
þi
2
1 π 2π
e
= p r 4 e - 3r=2a dr sin 3 θdθ e2iϕ dϕ = 0, ð4:52Þ
8 2πa4 0 0 0

where we used

x þ iy = r sin θeiϕ :

With (4.52), a factor e2iϕ results from the product ϕ(2px + iy)(x + iy), which
renders the integral (4.51) vanishing. Note that the only difference between (4.49)
and (4.52) is about the integration of ϕ factor. For the same reason, if we choose e3
for εe, the matrix element vanishes. Thus, with the ϕ(2px + iy)-related matrix element,
ðe Þ
only Px -þ iy,jp i survives. Similarly, with the ϕ(2px - iy)-related matrix element, only
þ
ðe - Þ
Pxþiy,jp survives. Notice that jp-i is a shorthand notation of ϕ(2px - iy) that is
-i
designated as the initial state. That is, we have, e.g.,
140 4 Optical Transition and Selection Rules

p
ðe - Þ 1 27 2
Pxþiy,jp = e ϕð1sÞjp ðx þ iyÞjϕ 2px - iy = 5 ea,
-i
2 3
ð4:53Þ
ðe Þ 1
Px -þ iy,jp i = e ϕð1sÞjp ðx - iyÞjϕ 2px - iy = 0:
-
2

Taking complex conjugate of (4.48), we have


p
 1 27 2
ðe Þ
Px -þ iy,jp = e ϕ 2pxþiy jp ðx þ iyÞjϕð1sÞ = - ea: ð4:54Þ
þi
2 35

ðe Þ
Here recall (1.116) and (x - iy){ = x + iy. Also note that since Px -þ iy,jp is real,
þi
ðe Þ 
Px -þ iy,jp i is real as well so that we have
þ

ðe Þ  ðe Þ ðe Þ
Px -þ iy,jp = Px -þ iy,jp i = Pxþiy,hp
-
j, ð4:55Þ
þi þ þ

where hp+j means that ϕ(2px + iy) is designated as the final state.
Comparing (4.48) and (4.55), we notice that the polarization vector has been
switched from e+ to e- with the allowed transition, even though the matrix element
remains the same. This can be explained as follows: In (4.48) the photon emission is
occurring, while the electron is causing a transition from ϕ(2px + iy) to ϕ(1s). As a
result, the radiation field has gained an angular momentum by ħ during the process in
which the electron has lost an angular momentum ħ. In other words, ħ is transferred
from the electron to the radiation field and this process results in the generation of
left-circularly polarized light in the radiation field.
In (4.54), on the other hand, the reversed process takes place. That is, the photon
absorption is occurring in such a way that the electron is excited from ϕ(1s) to
ϕ(2px + iy). After this process has been completed, the electron has gained an angular
momentum by ħ, whereas the radiation field has lost an angular momentum by ħ. As
a result, the positive angular momentum ħ is transferred to the electron from the
radiation field that involves left-circularly polarized light. This can be translated into
the statement that the radiation field has gained an angular momentum by -ħ. This is
equivalent to the generation of right-circularly polarized light (characterized by e-)
in the radiation field. In other words, the electron gains the angular momentum by ħ
to compensate the change in the radiation field.
The implication of the first equation of (4.53) can be interpreted in a similar
manner. Also we have
4.3 Three-Dimensional System 141

ðe Þ  ðe - Þ ðe Þ 1
-
Pxþiy,jp = Pxþiy,jp = Px -þ iy,hp j = e ϕ 2px - iy jp ðx - iyÞjϕð1sÞ
-i -i -
2
p
27 2
= ea:
35

In the above relation, hp-j implies that ϕ(2px - iy) is present as the final state.
Notice that the inner products of (4.49) and (4.53) are real, even though neither x + iy
ðe Þ ðe - Þ
nor x - iy is Hermitian. Also note that Px -þ iy,jp i of (4.49) and Pxþiy,jp of (4.53)
þ -i
have the same absolute value with minus and plus signs, respectively. The minus
sign of (4.49) comes from the Condon–Shortley phase. The difference, however, is
2
- ðe Þ
not essential, because the transition probability is proportional to Pxþiy,jp or
-i
2
ðe Þ
Px -þ iy,jp . Some literature [3, 4] uses -(x + iy) instead of x + iy. This is simply
þi

because of the inclusion of the Condon–Shortley phase; see (3.304).


Let us think of the coherent state that is composed of ϕ(1s) and ϕ(2px + iy) or
ϕ(2px - iy). Choosing ϕ(2px + iy), the state ψ(x, t) can be given by

1
ψ ðx, t Þ = p
2
 ϕð1sÞ exp - iE ð1sÞt=ħ þϕ 2pxþiy exp - iE 2pxþiy t=ħ , ð4:56Þ

where ϕ(1s) is described by (3.301) and ϕ(2px + iy) is expressed as (3.304). Then we
have

ψ  ðx, tÞψ ðx,t Þ ¼ jψ ðx, t Þj2


1
þ ϕð1sÞℜe 2pxþiy eiðϕ - ωtÞ þ e - iðϕ- ωtÞ
2
¼ jϕð1sÞj2 þ ϕ 2pxþiy
2
1 2
¼ jϕð1sÞj2 þ ϕ 2pxþiy þ 2ϕð1sÞℜe 2pxþiy cosðϕ -ωt Þ ,
2
ð4:57Þ

where using ℜe 2pxþiy , we denote ϕ(2px + iy) as follows:

ϕ 2pxþiy  ℜe 2pxþiy eiϕ : ð4:58Þ

That is, ℜe 2pxþiy represents a real part of ϕ(2px + iy) that depends only on r and
θ. For example, from (3.304) we have

1 3 r
ℜe 2pxþiy = - p a - 2 e - 2a sin θ:
r

8 π a
142 4 Optical Transition and Selection Rules

The third term of (4.57) implies that the existence probability density of an
electron represented by |ψ(x, t)|2 is rotating counterclockwise around the z-axis
with an angular frequency of ω. Similarly, in the case of ϕ(2px - iy), the existence
probability density of an electron is rotating clockwise around the z-axis with an
angular frequency of ω.
Integrating (4.57), we have

ψ  ðx, t Þψ ðx, t Þdτ = jψ ðx, t Þj2 dτ

1 2
= jϕð1sÞj2 þ ϕ 2pxþiy dτ
2
1 π 2π
þ r 2 dr sin θdθϕð1sÞℜe 2pxþiy dϕ cosðϕ - ωt Þ
0 0 0

1 1
= þ = 1,
2 2

where we used normalized functional forms of ϕ(1s) and ϕ(2px + iy); the last term
vanishes because


dϕ cosðϕ - ωt Þ = 0:
0

This is easily shown by suitable variable transformation.


In relation to the above discussion, we often use real numbers to describe wave
functions. For this purpose, we use the following unitary transformation to transform
the orthonormal basis of e±imϕ to cos mϕ and sin mϕ. That is, we have

1 1 ð- 1Þm imϕ 1 - imϕ


p cos mϕ p sin mϕ = p e p e
π π 2π 2π
ð- 1Þm ð- 1Þm i
p - p
 2 2 , ð4:59Þ
1 i
p p
2 2

where we assume that m is positive so that we can appropriately take into account the
Condon–Shortley phase. Alternatively, we describe it via unitary transformation as
follows:
4.3 Three-Dimensional System 143

ð- 1Þm imϕ 1 - imϕ 1 1


p e p e = p cos mϕ p sin mϕ
2π 2π π π
ð- 1Þm 1
p p
2 2
 : ð4:60Þ
ð- 1Þm i i
p -p
2 2

In this regard, we have to be careful about normalization constants; for trigono-


metric functions the constant should be p1π, whereas for the exponential representa-
tion, the constant is p12π . At the same time, trigonometric functions are expressed as a
linear combination of eimϕ and e-imϕ, and so if we use the trigonometric functions,
information of a magnetic quantum number is lost.
ðnÞ ðnÞ
In Sect. 3.7, we showed normalized functions Λl,m = Y m
l ðθ, ϕÞRl ðr Þ of the
±imϕ ðnÞ
l ðθ, ϕÞ is proportional to e
hydrogen-like atom. Noting that Y m , Λl,m can be
described using cos mϕ and sin mϕ for the basis vectors. We denote two linearly
ðnÞ ðnÞ
independent vectors by Λl, cos mϕ and Λl, sin mϕ . Then, these vectors are expressed as

ð- 1Þm ð- 1Þm i
p - p
ðnÞ ðnÞ ðnÞ ðnÞ 2 2
Λl, cos mϕ Λl, sin mϕ = Λl,m Λl, - m , ð4:61Þ
1 i
p p
2 2

where we again assume that m is positive. In chemistry and materials science, we


ðnÞ ðnÞ
normally use real functions of Λl, cos mϕ and Λl, sin mϕ . In particular, we use the
ð2Þ ð2Þ
notations of, e.g., ϕ(2px) and ϕ(2py) instead of Λ1, cos ϕ and Λ1, sin ϕ , respectively. In
that case, we explicitly have a following form:

1 i
-p p
ð2Þ ð2Þ 2 2
ϕð2px Þ ϕ 2py = Λ1,1 Λ1, - 1
1 i
p p
2 2
1 i
-p p
2 2
= ϕ 2pxþiy ϕ 2px - iy
1 i
p p
2 2
1 - 32 r - 2a 1 3 r
p a - 2 e - 2a sin θ sin ϕ :
r r
= p a e sin θ cos ϕ
4 2π a 4 2π a
ð4:62Þ

Thus, the Condon–Shortley phase factor has been removed.


144 4 Optical Transition and Selection Rules

Using this expression, we calculate matrix elements of the electric dipole transi-
tion. We have

ðe Þ
Px,jp1 i = ehϕð1sÞjxjϕð2px Þi
p
x

e 1 π 2π
27 2 ð4:63Þ
4 - 3r=2a
= p r e dr sin θdθ
3
cos ϕdϕ =
2
ea:
4 2πa4 0 0 0 35

Thus, we obtained the same result as (4.49) apart from the minus sign. Since a
square of an absolute value of the transition moment plays a role, the minus sign is
again of secondary importance. With Pðye2 Þ , similarly we have

ðe Þ
Py,jp2 i = e ϕð1sÞjyjϕ 2py
y
1 π 2π p ð4:64Þ
e 27 2
= p r 4 e - 3r=2a dr sin 3 θdθ sin 2 ϕdϕ = ea:
4 2πa4 0 0 0 35

Comparing (4.39), (4.63), and (4.64), we have


p
ðe Þ ðe Þ ðe Þ 27 2
Pz,jp3 i = Px,jp1 i = Py,jp2 i = ea:
z x y
35
ðe Þ ðe Þ ðe Þ
In the case of Pz,jp3 i , Px,jp1 i , and Py,jp2 i , the optical transition is said to be polarized
z x y

along the z-, x-, and y-axis, respectively, and so linearly polarized lights are relevant.
Note moreover that operators z, x, and y in (4.39), (4.63), and (4.64) are Hermitian
and that ϕ(2pz), ϕ(2px), and ϕ(2py) are real functions.

4.4 Selection Rules

In a three-dimensional system such as hydrogen-like atoms, quantum states of


particles (i.e., electrons) are characterized by three quantum numbers: principal
quantum numbers, orbital angular momentum quantum numbers (or azimuthal
quantum numbers), and magnetic quantum numbers. In this section, we examine
the selection rules for the electric dipole approximation.
Of the three quantum numbers mentioned above, angular momentum quantum
numbers are denoted by l and magnetic quantum numbers by m. First, we examine
the conditions on m. With the angular momentum operator L and its corresponding
operator M, we get the following commutation relations:
4.4 Selection Rules 145

½M z , x = iy, M y , z = ix, ½M x , y = iz;


ð4:65Þ
½M z , iy = x, M y , ix = z, ½M x , iz = y; etc:

Notice that in the upper line the indices change cyclic like (z, x, y), whereas in the
lower line they change anti-cyclic such as (z, y, x). The proof of (4.65) is left for the
reader. Thus, we have, e.g.,

½M z , x þ iy = x þ iy, ½M z , x - iy = - ðx - iyÞ, etc: ð4:66Þ

Putting

Qþ  x þ iy and Q -  x - iy,

we have

½M z , Qþ  = Qþ , ½M z , Q -  = - Q - : ð4:67Þ

Taking an inner product of both sides of (4.67), we have

hm0 j½M z , Qþ jmi = hm0 jM z Qþ - Qþ M z jmi = m0 hm0 jQþ jmi - mhm0 jQþ jmi
= hm0 jQþ jmi,
ð4:68Þ

where the quantum state jmi is identical to jl, mi in (3.151). Here we need no
information about l, and so it is omitted. Thus, we have, e.g., Mz j mi = m j mi.
Taking its adjoint, we have hm j Mz{ = hm j Mz = mhmj, where Mz is Hermitian.
These results lead to (4.68). From (4.68), we get

ðm0 - m - 1Þhm0 jQþ jmi = 0: ð4:69Þ

Therefore, for the matrix element hm′| Q+| mi not to vanish, we must have

m0 - m - 1 = 0 or Δm = 1 ðΔm  m0 - mÞ:

This represents the selection rule with respect to the coordinate Q+.
Similarly, we get

ðm0 - m þ 1Þhm0 jQ - jmi = 0: ð4:70Þ

In this case, for the matrix element hm′| Q-| mi not to vanish, we have
146 4 Optical Transition and Selection Rules

m0 - m þ 1 = 0 or Δm = - 1:

To derive (4.70), we can alternatively use the following: Taking the adjoint of
(4.69), we have

ðm0 - m - 1ÞhmjQ - jm0 i = 0:

Exchanging m′ and m, we have

ðm - m0 - 1Þhm0 jQ - jmi = 0 or ðm0 - m þ 1Þhm0 jQ - jmi = 0:

Thus, (4.70) is recovered.


Meanwhile, we have a commutation relation

½M z , z = 0: ð4:71Þ

Similarly, taking an inner product of both sides of (4.71), we have

ðm0 - mÞhm0 jzjmi = 0:

Therefore, for the matrix element hm′| z| mi not to vanish, we must have

m0 - m = 0 or Δm = 0: ð4:72Þ

These results are fully consistent with Example 4.3 of Sect. 4.3. That is, if
circularly polarized light takes part in the optical transition, Δm = ± 1. For instance,
using the present notation, we rewrite (4.48) as
p
1 1 27 2
ϕð1sÞjp ðx - iyÞjϕ 2pxþiy = p h0jQ - j1i = - 5 a:
2 2 3

If linearly polarized light is related to the optical transition, we have Δm = 0.


Next, we examine the conditions on l. To this end, we calculate a following
commutator [5]:
4.4 Selection Rules 147

M2, z = Mx2 þ My2 þ Mz2, z = Mx2, z þ My2, z


= M x ðM x z - zM x Þ þ M x zM x - zM x 2
þM y M y z - zM y þ M y zM y - zM y 2
= M x ½M x , z þ ½M x , zM x þ M y M y , z þ M y , z M y ð4:73Þ
= i M y x þ xM y - M x y - yM x
= i M x y - yM x - M y x þ xM y þ 2M y x - 2M x y
= i 2iz þ 2M y x - 2M x y = 2i M y x - M x y þ iz :

In the above calculations, (1) we used [Mz, z] = 0 (with the second equality);
(2) RHS was modified so that the commutation relations can be used (the third
equality); (3) we used -Mxy = Mxy - 2Mxy and Myx = - Myx + 2Myx so that we can
use (4.65) (the second to the last equality). Moreover, using (4.65), (4.73) can be
written as

M 2 , z = 2i xM y - M x y = 2i M y x - yM x :

Similar results on the commutator can be obtained with [M2, x] and [M2, y]. For
further use, we give alternative relations such that

M 2 , x = 2i yM z - M y z = 2i M z y - zM y ,
ð4:74Þ
M 2 , y = 2iðzM x - M z xÞ = 2iðM x z - xM z Þ:

Using (4.73), we calculate another commutator such that

M2, M2, z ¼ 2i M 2 , M y x - M 2 , M x y þ i M 2 , z
¼ 2i M y M 2 , x - M x M 2 , y þ i M 2 , z
¼ 2i 2iM y yM z - M y z - 2iM x ðM x z - xM z Þ þ i M 2 , z

¼ - 2 2 Mxx þ Myy þ Mzz Mz - 2 Mx2 þ My2 þ Mz2 z þ M2z

- zM 2

¼ 2 M 2 z þ zM 2 :
ð4:75Þ

In the above calculations, (1) we used [M2, My] = [M2, Mx] = 0 (with the second
equality); (2) we used (4.74) (the third equality); (3) RHS was modified so that we
can use the relation M ⊥ x from the definition of the angular momentum operator,
i.e., Mxx + Myy + Mzz = 0 (the second to the last equality). We used [Mz, z] = 0 as
well. Similar results are obtained with x and y. That is, we have
148 4 Optical Transition and Selection Rules

M 2 , M 2 , x = 2 M 2 x þ xM 2 , ð4:76Þ

M 2 , M 2 , y = 2 M 2 y þ yM 2 : ð4:77Þ

Rewriting, e.g., (4.75), we have

M 4 z - 2M 2 zM 2 þ zM 4 = 2 M 2 z þ zM 2 : ð4:78Þ

Using the relation (4.78) and taking inner products of both sides, we get, e.g.,

l0 jM 4 z - 2M 2 zM 2 þ zM 4 jl = l0 j2 M 2 z þ zM 2 jl : ð4:79Þ

That is,

l0 jM 4 z - 2M 2 zM 2 þ zM 4 jl - l0 j2 M 2 z þ zM 2 jl = 0:

Considering that both terms of LHS contain a factor hl′| z| li in common, we have

l0 ðl0 þ 1Þ - 2l0 lðl0 þ 1Þðl þ 1Þ þ l2 ðl þ 1Þ2 - 2l0 ðl0 þ 1Þ - 2lðl þ 1Þ


2 2

× hl0 jzjli = 0, ð4:80Þ

where the quantum state jli is identical to jl, mi in (3.151) with m omitted.
To factorize the first factor of LHS of (4.80), we view it as a quartic equation with
respect to l′. Replacing l′ with -l, we find that the first factor vanishes, and so the first
factor should have a factor (l′ + l ). Then, we factorize the first factor of LHS of (4.80)
such that
[the first factor of LHS of (4.80)]

= ðl0 þ lÞ ðl0 - lÞ þ 2ðl0 þ lÞ l0 - l0 l þ l2 - 2l0 lðl0 þ lÞ - 2ðl0 þ lÞ - ðl0 þ lÞ


2 2 2 2

= ðl0 þ lÞ ðl0 þ lÞ l0 - lÞ2 þ 2 l0 - l0 l þ l2 - 2l0 l - 2 - ðl0 þ lÞ


2

= ðl0 þ lÞ ðl0 þ lÞ l0 - lÞ2 þ 2 l0 - 2l0 l þ l2 - ðl0 þ l þ 2Þ


2

= ðl0 þ lÞ ðl0 þ lÞðl0 - lÞ þ 2 l0 - lÞ2 - ðl0 þ l þ 2Þ


2

= ðl0 þ lÞ ðl0 - lÞ ½ðl0 þ lÞ þ 2 - ðl0 þ l þ 2Þ


2

= ðl0 þ lÞðl0 þ l þ 2Þðl0 - l þ 1Þðl0 - l - 1Þ: ð4:81Þ

Thus rewriting (4.80), we get


4.4 Selection Rules 149

ðl0 þ lÞðl0 þ l þ 2Þðl0 - l þ 1Þðl0 - l - 1Þhl0 jzjli = 0: ð4:82Þ

We have similar relations with respect to hl′| x| li and hl′| y| li because of (4.76) and
(4.77). For the electric dipole transition to be allowed, among hl′| x| li, hl′| y| li, and
hl′| z| li, at least one term must be non-vanishing. For this, at least one of the four
factors of (4.81) should be zero. Since l′ + l + 2 > 0, this factor is excluded.
For l′ + l to vanish, we should have l′ = l = 0; notice that both l′ and l are
nonnegative integers. We must then examine this condition. This condition is
equivalent to that the sphericalpharmonics related to the angular variables θ and ϕ
take the form of Y 00 ðθ, ϕÞ = 1= 4π , i.e., a constant. Therefore, the θ-related integral
for the matrix element hl′| z| li only consists of a following factor:
π π
1 1
cos θ sin θdθ = sin 2θdθ = - ½cos 2θπ0 = 0,
0 2 0 4

where cos θ comes from a polar coordinate z = r cos θ and sinθ is due to an
infinitesimal volume of space, i.e., r2 sin θdrdθdϕ. Thus, we find that hl′| z| li
vanishes on condition that l′ = l = 0. As a polar coordinate representation,
x = r sin θ cos ϕ and y = r sin θ sin ϕ, and so the ϕ-related integrals hl′| x| li and
hl′| y| li vanish as well. That is,

2π 2π
cos ϕdϕ = sin ϕdϕ = 0:
0 0

Therefore, the matrix elements relevant to l′ = l = 0 vanish with all the


coordinates, i.e., we have

hl0 jxjli = hl0 jyjli = hl0 jzjli = 0: ð4:83Þ

Consequently, we exclude (l′ + l )-factor as well, when we consider a condition of


the allowed transition. Thus, regarding the condition that should be satisfied with the
allowed transition, from (4.82) we get

l0 - l þ 1 = 0 or l0 - l - 1 = 0: ð4:84Þ

Or defining Δl  l′ - l, we get

Δl = ± 1: ð4:85Þ

Thus, for the transition to be allowed, the azimuthal quantum number must
change by one.
150 4 Optical Transition and Selection Rules

4.5 Angular Momentum of Radiation [6]

In Sect. 4.3 we mentioned circularly polarized light. If the circularly polarized light
acts on an electron, what can we anticipate? Here we deal with this problem within a
framework of a semiclassical theory.
Let E and H be an electric and magnetic field of a left-circularly polarized light,
respectively. They are expressed as

1
E = p E0 ðe1 þ ie2 Þ exp iðkz - ωt Þ, ð4:86Þ
2
1 1 E0
H = p H 0 ðe2 - ie1 Þ exp iðkz - ωt Þ = p ðe2 - ie1 Þ exp iðkz - ωt Þ: ð4:87Þ
2 2 μv

Here we assume that the light is propagating in the direction of the positive z-axis.
The electric and magnetic fields described by (4.86) and (4.87) represent the left-
circularly polarized light. A synchronized motion of an electron is expected, if the
electron exerts a circular motion in such a way that the motion direction of the
electron is always perpendicular to the electric field and parallel to the magnetic field
(see Fig. 4.2). In this situation, magnetic Lorentz force does not affect the electron
motion.
Here, the Lorentz force F(t) is described by


Fðt Þ = eEðxðt ÞÞ þ e xðt Þ × Bðxðt ÞÞ, ð4:88Þ

where the first term is electric Lorentz force and the second term represents the
magnetic Lorentz force.

Fig. 4.2 Synchronized y


motion of an electron under
a left-circularly polarized
light
electron motion

electron
O E x
4.5 Angular Momentum of Radiation [6] 151

The quantity B called magnetic flux density is related to H as B = μ0H, where


μ0 is permeability of vacuum; see (7.10) and (7.11) of Sect. 7.1. In (4.88) E and B are
measured at a position where the electron is situated at a certain time t. Equation
(4.88) universally describes the motion of a charged particle in the presence of
electromagnetic fields. We consider another related example in Chap. 15.
Equation (4.86) can be rewritten as

1
E = p E 0 ½e1 cosðkz - ωt Þ - e2 sinðkz - ωt Þ
2
ð4:89Þ
1
þi p E 0 ½e2 cosðkz - ωt Þ þ e1 sinðkz - ωt Þ:
2

Suppose that the electron exerts the circular motion in a region narrow enough
around the origin and that the said electron motion is confined within the xy-plane
that is perpendicular to the light propagation direction. Then, we can assume that
z ≈ 0 in (4.89). Ignoring kz in (4.89) accordingly and taking a real part, we have

1
E = p E 0 ðe1 cos ωt þ e2 sin ωt Þ:
2

Thus, a force F exerting the electron is described by

F = eE, ð4:90Þ

where e is an elementary charge (e < 0). Accordingly, an equation of motion of the


electron is approximated such that

x = eE,
m€ ð4:91Þ

where m is a mass of an electron and x is a position vector of the electron. With


individual components of the coordinate, we have

1 1
m€x = p eE0 cos ωt and m€y = p eE0 sin ωt: ð4:92Þ
2 2

Integrating (4.92) two times, we get

eE
mx = - p 0 cos ωt þ Ct þ D, ð4:93Þ
2ω2

where C and D are integration constants. Setting xð0Þ = - peE 0


2mω2
and x′(0) = 0, we
have C = D = 0. Similarly, we have
152 4 Optical Transition and Selection Rules

eE
my = - p 0 sin ωt þ C 0 t þ D0 , ð4:94Þ
2ω2

where C′ and D′ are integration constants. Setting y(0) = 0 and y0 ð0Þ = - peE 0 ,
2mω
we
have C′ = D′ = 0. Thus, making t a parameter, we get

2
eE 0
x2 þ y 2 = p : ð4:95Þ
2mω2

This implies that the electron is exerting a counterclockwise circular motion with
a radius - peE 0
2mω2
under the influence of the electric field. This is consistent with a
motion of an electron in the coherent state of ϕ(1s) and ϕ(2px + iy) as expressed in
(4.57).
An angular momentum the electron has acquired is

eE 0 meE e2 E 0 2
L = x × p = xpy - ypx = - p -p 0 = : ð4:96Þ
2mω2 2mω 2mω3

Identifying this with ħ, we have

e2 E 0 2
= ħ: ð4:97Þ
2mω3

In terms of energy, we have

e2 E 0 2
= ħω: ð4:98Þ
2mω2

Assuming a wavelength of the light is 600 nm, we need a left-circularly polarized


light whose electric field is about 1.5 × 1010 [V/m].
A radius α of a circular motion of the electron is given by

eE 0
α= p : ð4:99Þ
2mω2

Under the same condition as the above, α is estimated to be ~2 Å.

References

1. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York, NY


2. Fowles GR (1989) Introduction to modern optics, 2nd edn. Dover, New York, NY
References 153

3. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Aca-
demic Press, Waltham, MA
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)
6. Rossi B (1957) Optics. Addison-Wesley, Reading, MA
Chapter 5
Approximation Methods of Quantum
Mechanics

In Chaps. 1–3 we have focused on solving eigenvalue equations with respect to a


particle confined within a one-dimensional potential well or a harmonic oscillator
along with an electron of a hydrogen-like atom. In each example we obtained exact
analytical solutions with the quantum-mechanical states and corresponding eigen-
values (energy, angular momentum, etc.). In most cases of quantum-mechanical
problems, however, we are not able to get such analytical solutions or accurately
determine the corresponding eigenvalues. Under these circumstances, we need
appropriate approximation methods of those problems. Among those methods, the
perturbation method and variational method are widely used.
In terms of usefulness, we provide several examples concerning physical systems
that have already appeared in Chaps. 1–3. In this chapter, we examine how these
physical systems change their quantum states and corresponding energy eigenvalues
as a result of undergoing influence from the external field. We assume that the
change results from the application of external electric field. For simplicity, we focus
on the change in eigenenergy and corresponding eigenstate with respect to the
non-degenerate quantum state. As specific cases in these examples, we happen to
be able to get perturbed physical quantities accurately. Including such cases, for later
purposes we take in advance important concepts of a complete orthonormal system
(CONS) and projection operator.

5.1 Perturbation Method

In Chaps. 1–3, we considered a situation where no external field is exerted on a


physical system. In Chap. 4, however, we studied the optical transition that takes
place as a consequence of the interaction between the physical system and electro-
magnetic wave. In this chapter, we wish to examine how the physical system
changes its quantum state, when an external electric field is applied to the system.
We usually assume that the external field is weak and the corresponding change in

© Springer Nature Singapore Pte Ltd. 2023 155


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_5
156 5 Approximation Methods of Quantum Mechanics

the quantum state or energy is small. The applied external field causes the change in
Hamiltonian of a quantum system and results in the change in that quantum state
usually accompanied by the energy change of the system. We describe this process
as a following eigenvalue equation:

H j Ψi i = E i j Ψi i, ð5:1Þ

where H represents the total Hamiltonian, Ei is an energy eigenvalue, and Ψi is an


eigenstate corresponding to Ei. Equation (5.1) implies that there should be a series of
eigenvalues and corresponding eigenvectors belonging to those eigenvalues. Strictly
speaking, however, we can get such combinations of the eigenstate and eigenvalue
only by solving (5.1) analytically. This is some kind of circular argument. Yet, at
present we do not have to go into detail about that issue. Instead we advance to
discussion of the approximation methods from a practical point of view.
We assume that H is divided into two terms such that

H = H 0 þ λV, ð5:2Þ

where H0 is the Hamiltonian of the unperturbed system without the external field;
V is an additional Hamiltonian due to the applied external field, or interaction
between the physical system and the external field. The second term includes a
parameter λ that can be modulated according to the change in the strength of the
applied field. The parameter λ should be real so that H of (5.2) can be Hermitian. The
parameter λ can be the applied field itself or can merely be a dimensionless parameter
that can be set at λ = 1 after finishing the calculation. We will come back to this point
later in examples.
In (5.2) we assume that the following equation holds:

ð0Þ
H 0 j ii = E i j ii, ð5:3Þ

where jii is the i-th quantum state. We designate j0i as the ground state. We assume
the excited states jii (i = 1, 2, ⋯) to be numbered in order of increasing energies.
These functions (or vectors) jii (including j0i) can be a quantum state that appeared
as various eigenfunctions in the previous chapters. Of these, two or more functions
may have the same eigenenergy (i.e., degenerate). If we are to deal with such
degenerate states, those states jii have to be distinguished by, e.g., jidi (d = 1, 2,
⋯, s), where s denotes the degeneracy (or degree of degeneracy). For simplicity,
however, in this chapter we are going to examine only the change in energy and
physical states with respect to non-degenerate states. We disregard the issue on
notation of the degenerate states accordingly. We pay attention to each individual
case later, when necessary.
Regarding a one-dimensional physical system, e.g., a particle confined within
potential well and quantum-mechanical harmonic oscillator, for example, all the
quantum states are non-degenerate as we have already seen in Chaps. 1 and 2. As a
5.1 Perturbation Method 157

special case we also consider the ground state of a hydrogen-like atom. This is
because that state is non-degenerate, and so the problem can be dealt with in parallel
to the cases of the one-dimensional physical systems. We assume that the quantum
states jii (i = 0, 1, 2, ⋯) constitute the orthonormal eigenvectors such that

hjjii = δji : ð5:4Þ

Remember that the notation has already appeared in (2.53); see Chap. 13 as well.

5.1.1 Quantum State and Energy Level Shift Caused by


Perturbation

One of the most important applications of the perturbation method is to evaluate the
shift in quantum state and energy level caused by the applied external field. To this
end, we expand both the quantum state and energy as a power series of λ. That is, in
(5.1) we expand jΨii and Ei such that

ð1Þ ð2Þ ð3Þ


j Ψi i = j ii þ λ j ϕi þ λ 2 j ϕi þ λ 3 j ϕi þ ⋯, ð5:5Þ
ð0Þ ð1Þ ð2Þ ð3Þ
Ei = E i þ λEi þ λ2 Ei þ λ3 E i þ ⋯, ð5:6Þ

ð1Þ ð2Þ ð3Þ


where j ϕi i, j ϕi i, j ϕi i, etc. are chosen as correction terms for the state jii that
is associated with the unperturbed system. Once again, we assume that the state
jii (i = 0, 1, 2, ⋯) represents the non-degenerate normalized eigenvector that
ð0Þ
belongs to the eigenenergy Ei of the unperturbed state. Unknown state vectors
ð1Þ ð2Þ ð3Þ ð1Þ ð2Þ ð3Þ
j ϕi i, j ϕi i, j ϕi i, etc. as well as E i , E i , Ei , etc. are to be determined after
the calculation procedures carried out from now. These states jΨii and energies Ei
ð0Þ
result from the perturbation term λV and represent the deviation from jii and E i of
the unperturbed system.
With the normalization condition we impose the following condition upon jΨii
such that

hijΨi i = 1: ð5:7Þ

This condition, however, does not necessarily mean that jΨii has been normalized
(vide infra). From (5.4) and (5.7) we have

ð1Þ ð2Þ
ijϕi = ijϕi = ⋯ = 0: ð5:8Þ

Let us calculate hΨi| Ψii on condition of (5.7) and (5.8). From (5.5) we have
158 5 Approximation Methods of Quantum Mechanics

ð1Þ ð1Þ
hΨi jΨi i = hijii þ λ ijϕi þ hϕi jii
ð2Þ ð2Þ ð1Þ ð1Þ
þλ2 ijϕi þ hϕi jii þ hϕi jϕi i þ ⋯
ð1Þ ð1Þ ð1Þ ð1Þ
= hijii þ λ2 ϕi jϕi þ ⋯ ≈ 1 þ λ2 ϕi jϕi ,

where we used (5.4) and (5.8). Thus, hΨi| Ψii is normalized if we ignore the factor of
λ 2.
Now, inserting (5.5) and (5.6) into (5.1) and comparing the same power factors
with respect to λ, we obtain

ð0Þ
E i - H 0 j ii = 0, ð5:9Þ

ð0Þ ð1Þ ð1Þ


E i - H 0 j ϕi þ Ei j ii = V j ii, ð5:10Þ

ð0Þ ð2Þ ð1Þ ð1Þ ð2Þ ð1Þ


E i - H 0 j ϕi þ Ei j ϕi þ Ei j ii = V j ϕi : ð5:11Þ

Equation (5.9) is identical with (5.3). Operating hjj on both sides of (5.10) from
the left, we get

ð0Þ ð1Þ ð1Þ ð0Þ ð0Þ ð1Þ ð1Þ


jj E i - H 0 jϕi þ jjEi ji = jj E i - Ej jϕi þ Ei hjjii
ð0Þ ð0Þ ð1Þ ð1Þ
= Ei - Ej jjϕi þ E i δji = hjjVjii:
ð5:12Þ

Putting j = i on (5.12), we have

ð1Þ
E i = hijVjii: ð5:13Þ

This represents the first-order correction term with the energy eigenvalue.
Meanwhile, assuming j ≠ i on (5.12), we have

ð1Þ hjjVjii
jjϕi = ðj ≠ iÞ, ð5:14Þ
ð0Þ ð0Þ
Ei - Ej

ð0Þ ð0Þ
where we used E i ≠ E j on the assumption that the quantum state jii that belongs
ð0Þ
to eigenenergy Ei is non-degenerate. Here, we emphasize that the eigenstates that
ð0Þ
correspond to Ej may or may not be degenerate. In other words, the quantum state
that we are making an issue of with respect to the perturbation is non-degenerate. In
this context, we do not question whether or not other quantum states [represented by
jji in (5.14)] are degenerate.
5.1 Perturbation Method 159

Here we postulate that the eigenvectors, i.e., jii (i = 0, 1, 2, ⋯) of H0 form a


complete orthonormal system (CONS) such that

P k
j kihk j = E, ð5:15Þ

where jkihkj is said to be a projection operator and E is an identity operator. As


remarked just above, (5.15) holds regardless of whether the eigenstates are degen-
erate. The word of complete system implies that any vector (or function) can be
expanded by a linear combination of the basis vectors of the said system. The formal
definition of the projection operator can be seen in Chaps. 14 and 18.
Thus, we have

ð1Þ ð1Þ ð1Þ ð1Þ


j ϕi i = E j ϕ i i = k
j ki kjϕi = k≠i
j ki kjϕi
hkjVjii ð5:16Þ
= k≠i
j ki ,
ð0Þ ð0Þ
Ei - Ek

where with the third equality we used (5.8) and with the last equality we used (5.14).
Operating hij from the left on (5.16) and using (5.4), we recover (5.8). Hence, using
(5.5) the approximated quantum state to the first order of λ is described by

ð1Þ hkjVjii
j Ψi i ≈ j ii þ λ j ϕi i = j ii þ λ k≠i
j ki : ð5:17Þ
ð0Þ ð0Þ
Ei - Ek

For a while, let us think of a case where we deal with the change in the
eigenenergy and corresponding eigenstate with respect to the degenerate quantum
state. In that case, regarding the state jii in question on (5.12) we have ∃ j ji ( j ≠ i)
ð0Þ ð0Þ
that satisfies E i = E j . Therefore, we would have hj| V| ii = 0 from (5.12). Gener-
ally, it is not the case, however, and so we need a relation different from (5.12) to
deal with the degenerate case. However, once again we do not get into details about
this issue in this book.
ð2Þ
Next let us seek the second-order correction term E i of the eigenenergy.
Operating hjj on both sides of (5.11) from the left, we get

ð0Þ ð0Þ ð2Þ ð1Þ ð1Þ ð2Þ ð1Þ


Ei - Ej jjϕi þ Ei jjϕi þ E i δji = jjVjϕi :

Putting j = i in the above relation as before, we have

ð2Þ ð1Þ
Ei = ijVjϕi : ð5:18Þ

Notice that we used (5.8) to derive (5.18). Using (5.16) furthermore, we get
160 5 Approximation Methods of Quantum Mechanics

ð2Þ hkjVjii  hkjVjii


Ei = k≠i
hijVjki = k≠i
kjV { ji
ð0Þ ð0Þ ð0Þ ð0Þ
Ei - Ek Ei - Ek
ð5:19Þ
hkjVjii 1
= k≠i
hkjVjii = k≠i
j hk j V jiij2 ,
ð0Þ ð0Þ ð0Þ ð0Þ
Ei - Ek Ei - Ek

where with the second equality we used (1.116) in combination with (1.118) and
with the third equality we used the fact that V is an Hermitian operator; i.e., V{ = V.
The state jki is sometimes called an intermediate state.
Considering the expression of (5.16), we find that the approximated state of j
ð1Þ
ii þ λ j ϕi i in (5.5) is not an eigenstate of energy, because the approximated state
contains a linear combination of different eigenstates jki that have eigenenergies
ð0Þ
different from E i of jii. Substituting (5.13) and (5.19) for (5.6), with the energy
correction terms up to the second order we have

ð0Þ ð1Þ ð2Þ ð0Þ 1


E i ≈ Ei þ λE i þ λ2 Ei = Ei þ λhijVjii þ λ2 k≠i
j hk j V jiij2 :
ð0Þ ð0Þ
Ei - Ek
ð5:20Þ

If we think of the ground state j0i for jii, we get

ð0Þ ð1Þ ð2Þ


E0 ≈ E 0 þ λE0 þ λ2 E0
j hk j V j0ij2 : ð5:21Þ
ð0Þ 1
= E 0 þ λh0jVj0i þ λ2 k≠0 ð0Þ ð0Þ
E0 - Ek

ð0Þ ð0Þ
Since E 0 < E k ðk = 1, 2, ⋯Þ for any jki, the second-order correction term is
always negative. This contributes to the energy stabilization.

5.1.2 Several Examples

To deepen the understanding of the perturbation method, we show several examples.


We focus on the change in the eigenenergy and corresponding eigenstate with
respect to specific quantum states. We assume that the change is caused by the
applied electric field as a perturbation.
Example 5.1 Suppose that the charged particle is confined within a
one-dimensional potential well. This problem has already appeared in Example
1.2. Now let us consider how an energy of a particle carrying a charge e is shifted
by the applied electric field. This situation could experimentally be achieved using a
5.1 Perturbation Method 161

“nano capacitor” where, e.g., an electron is confined within the capacitor while
applying a voltage between the two electrodes.
We adopt the coordinate system the same as that of Example 1.2. That is, suppose
that we apply an electric field F between the electrodes that are positioned at x = ± L
and that the electron is confined between the two electrodes. Then, the Hamiltonian
of the system is described as

ħ2 d 2
H= - - eFx, ð5:22Þ
2m dx2

where m is a mass of the particle. Note that we may choose -eF for the parameter λ
of Sect. 5.1.1. Then, the coordinate representation of the Schrödinger equation is
given by

ħ2 d 2 ψ i ð xÞ
- - eFxψ i ðxÞ = E i ψ i ðxÞ, ð5:23Þ
2m dx2

where Ei is the energy of the i-th state from the bottom (i.e., the ground state); ψ i is its
corresponding eigenstate. We wish to seek the perturbation energy up to the second
order and the quantum state up to the first order. We are particularly interested in the
change in the ground state j0i. Notice that the ground state is obtained in (1.101) by
putting l = 0.
According to (5.21) we have

ð0Þ 1
E 0 ≈ E 0 þ λh0jVj0i þ λ2 k≠0
j hk j V j0ij2 , ð5:24Þ
ð0Þ ð0Þ
E0 - Ek

where V = - eFx. Since the field F is thought to be an adjustable parameter, in


(5.24) we may replace λ with -eF (vide supra). Then, in (5.24) V is replaced by x in
turn. In this way, (5.24) can be rewritten as

ð0Þ 1
E0 ≈ E0 - eF h0jxj0i þ ð- eF Þ2 k≠0
j hk j xj0ij2 : ð5:25Þ
ð0Þ ð0Þ
E0 - Ek

Considering that j0i is an even function [i.e., a cosine function in (1.101)] with
respect to x and that x is an odd function, we find that the second term vanishes.
Notice that the explicit coordinate representation of j0i is

1 π
j 0i = cos x, ð5:26Þ
L 2L
162 5 Approximation Methods of Quantum Mechanics


Fig. 5.1 Notation of the
quantum states that
distinguish cosine and sine ( ) ℏ
functions. = |2 ⟩ ≡ |4⟩
ð0Þ
E k ðk = 0, 1, 2, ⋯Þ
represents energy
eigenvalues of the
unperturbed states

( ) ℏ
= |2 ⟩ ≡ |3⟩

( ) ℏ
= |1 ⟩ ≡ |2⟩

( ) ℏ
= |1 ⟩ ≡ |1⟩
( ) ℏ
= |0 ⟩ ≡ |0⟩

π
which is obtained by putting l = 0 in j lcos i  1
L cos 2L þ lπL x ðl = 0, 1, 2, ⋯Þ in
(1.101).
By the same token, hk| x| 0i in the third term of RHS of (5.25) vanishes if jki
denotes a cosine function in (1.101). If, however, jki denotes a sine function, hk| x|
0i does not vanish. To distinguish the sine functions from cosine functions, we
denote

1 nπ
j nsin i  sin x ðn = 1, 2, 3, ⋯Þ ð5:27Þ
L L

that is identical with (1.102); see Fig. 5.1 with the notation of the quantum states.
ð0Þ ð0Þ 2 2
ħ2 π ħ2
 ð2n4LÞ 2π ðn = 1, 2, 3, ⋯Þ in (5.25), we
2
Now, putting E0 = 2m  4L 2 and E 2n - 1 = 2m
rewrite it as
5.1 Perturbation Method 163

ð0Þ 1 1
E0 ≈ E0 - eF h0jxj0i þ ð- eF Þ2 k=1
j hk j xj0ij2
ð0Þ ð0Þ
E0 - Ek
ð0Þ 1 1
= E 0 - eF h0jxj0i þ ð- eF Þ2 n=1
j hnsin j xj0ij2 ,
ð0Þ ð0Þ
E0 - E2n - 1

where with the second equality n denotes the number appearing in (5.27) and jnsini
ð0Þ
belongs to E2n - 1 . Referring to, e.g., (4.17) with regard to the calculation of hk| x| 0i,
we have arrived at the following equation described by

ħ2 π 2 322  8  mL4 1 n2
E0 ≈  2 - ð- eF Þ2 : ð5:28Þ
2m 4L π 6 ħ2 n=1
ð4n2 - 1Þ5

The series of the second term in (5.28) rapidly converges. Defining the first
N partial sum of the series as

N n2
Sð N Þ  n=1
,
ð4n2 - 1Þ 5

we obtain

Sð1Þ = 1=243 ≈ 0:004115226 and Sð6Þ ≈ Sð1000Þ ≈ 0:004120685

as well as

½Sð1000Þ - Sð1Þ
= 0:001324653:
Sð1000Þ

That is, only the first term of S(N ) occupies ~99.9% of the infinite series. Thus,
ð2Þ
the stabilization energy λ2 E0 due to the perturbation is satisfactorily given by

ð2Þ 322  8  mL4


λ2 E 0 ≈ ðeF Þ2 , ð5:29Þ
243π 6 ħ2
ð2Þ
where the notation λ2 E 0 is due to (5.6). Meanwhile, the corrected term of the
quantum state is given by
164 5 Approximation Methods of Quantum Mechanics

ð1Þ hkjxj0i
j Ψ0 i ≈ j 0i þ λ j ϕ0 ≈ j 0i - eF k≠0
j ki
ð0Þ ð0Þ
E0 - Ek

32  8  mL3 1 ð- 1Þn - 1 n
= j 0i þ eF j nsin i : ð5:30Þ
π 4 ħ2 n=1
ð4n2 - 1Þ3

For a reason similar to the above, the approximation is satisfactory, if we adopt


only n = 1 in the second term of RHS of (5.30). That is, we obtain

32  8  mL3
j Ψ0 i ≈ j 0i þ eF j 1sin i: ð5:31Þ
27π 4 ħ2

Thus, we have roughly estimated the stabilization energy and the corresponding
quantum state as in (5.29) and (5.31), respectively. It may well be worth estimating
rough numbers of L and F in the actual situation (or perhaps in an actual “nano
device”). The estimation is left for readers.
Example 5.2 [1] In Chap. 2 we dealt with a quantum-mechanical harmonic oscil-
lator. Here let us suppose that a charged particle (with its charge e) is performing
harmonic oscillation under an applied electric field. First we consider the coordinate
representation of Schrödinger equation. Without the external field, the Schrödinger
equation as an eigenvalue equation reads as

ħ2 d 2 uð qÞ 1
- þ mω2 q2 uðqÞ = EuðqÞ: ð2:108Þ
2m dq2 2

Under the applied electric field, the perturbation is expressed as -eFq and, hence,
the equation can be written as

ħ2 d 2 u ð q Þ 1
- þ mω2 q2 uðqÞ - eF ðqÞquðqÞ = EuðqÞ: ð5:32Þ
2m dq2 2

For simplicity we assume that the electric field is uniform independent of q so that
we can deal with it as a constant. Then, (5.32) can be rewritten as

ħ2 d 2 uð qÞ 1 eF 2
1 eF 2
- þ mω2 q - uðqÞ = E þ mω2 uðqÞ: ð5:33Þ
2m dq2 2 mω2 2 mω2

Changing the variable such that

eF
Q  q- , ð5:34Þ
mω2

we have
5.1 Perturbation Method 165

ħ2 d 2 uð qÞ 1 1 eF 2
- þ mω2 Q2 uðqÞ = E þ mω2 uðqÞ: ð5:35Þ
2m dQ2 2 2 mω2

Taking account of the change in functional form that results from the variable
transformation, we rewrite (5.35) as

ħ2 d 2 u ð Q Þ 1 1 eF 2
- þ mω2 Q2 uðQÞ = E þ mω2 uðQÞ, ð5:36Þ
2m dQ2 2 2 mω2

where

eF
uðqÞ = u Q þ  uðQÞ: ð5:37Þ
mω2

Defining E as

2
1 eF e2 F 2
E  E þ mω2 =E þ , ð5:38Þ
2 mω2 2mω2

we get

ħ2 d 2 u ð Q Þ 1
- þ mω2 Q2 uðQÞ = EuðQÞ: ð5:39Þ
2m dQ2 2

Thus, we recover exactly the same form as (2.108). Equation (5.39) can be solved
analytically as already shown in Sect. 2.4. From (2.110) and (2.120), we have

2E
λ , λ = 2n þ 1:
ħω

Then, we have

ħω ħωð2n þ 1Þ
E= λ= : ð5:40Þ
2 2

From (5.38), we get

e2 F 2 1 e2 F 2
E=E- 2
= ħω n þ - : ð5:41Þ
2mω 2 2mω2

This implies that the energy stabilization due to the applied electric field is
e2 F 2
- 2mω2 . Note that the system is always stabilized regardless of the sign of e.
166 5 Approximation Methods of Quantum Mechanics

Next, we wish to consider the problem on the basis of the matrix (or operator)
representation. Let us evaluate the perturbation energy using (5.20). Conforming the
notation of (5.20) to the present case, we have

1
E n ≈ Eðn0Þ - eF hnjqjni þ ð- eF Þ2 k≠n
j hk j qjnij2 , ð5:42Þ
ð0Þ
Eðn0Þ - Ek

where Eðn0Þ is given by

1
E ðn0Þ = ħω n þ : ð5:43Þ
2

Taking account of (2.55), (2.68), and (2.62), the second term of (5.42) vanishes.
With the third term, using (2.68) as well as (2.55) and (2.61) we have

ħ ħ 
jhkjqjnij2 = kja þ a{ jn kja þ a{ jn
2mω 2mω
ħ p p 2
= nhkjn - 1i þ n þ 1hkjn þ 1i
2mω
ħ p p
= nδk,n - 1 þ n þ 1δk,nþ1
2 ð5:44Þ
2mω
ħ
= nδk,n - 1 þ ðn þ 1Þδk,nþ1 þ 2δk,n - 1 δk,nþ1 nð n þ 1 Þ
2mω
ħ
= ½nδ þ ðn þ 1Þδk,nþ1 :
2mω k,n - 1

Notice that with the last equality of (5.44) there is no k that satisfies k = n -
1 = n + 1 at once. Also note that hk| q| ni is a real number. Hence, as the third term of
(5.42) we get

1 1 ħ
jhkjqjnij2 =  ½nδ þ ðn þ 1Þδk,nþ1 
k≠n ð0Þ
Enð0Þ - E k
k≠n ħωðn - k Þ 2mω k,n - 1

1 n nþ1 1
= þ =- :
2mω2 n - ðn - 1Þ n - ðn þ 1Þ 2mω2
ð5:45Þ

Thus, from (5.42) we obtain

1 e2 F 2
E n ≈ ħω n þ - : ð5:46Þ
2 2mω2

Consequently, we find that (5.46) obtained by the perturbation method is consis-


tent with (5.41) that was obtained as the exact solution. As already pointed out in
5.1 Perturbation Method 167

Sect. 5.1.1, however, (5.46) does not represent an eigenenergy. In fact, using (5.17)
together with (4.26) and (4.28), we have

ð1Þ hk jV j0i
jΨ0 i ≈ j0i þ λ j ϕ0 ¼ j0i þ λ k≠0
j ki
ð0Þ ð0Þ
E0 - Ek
h1jV j0i
¼ j0i þ λ j 1i
ð0Þ
E0 - E1
ð0Þ ð5:47Þ

h1jxj0i e2 F 2
¼ j0i - eF j 1i ¼ j0i þ j 1i:
ð0Þ
E0 - E1
ð0Þ 2mω3 ħ

Thus, jΨ0i is not an eigenstate because jΨ0i contains both j0i and j1i that have
different eigenenergies. Hence, jΨ0i does not possess a fixed eigenenergy. Notice
also that in (5.47) the factors hk| x| 0i vanish except for h1| x| 0i; see Chaps. 2 and 4.
To think of this point further, let us come back to the coordinate representation of
Schrödinger equation. From (5.39) and (2.106), we have

1
mω 1 mω
Q e - 2ħ Q
4 mω 2
un ðQÞ = Hn ðn = 0, 1, 2, ⋯Þ,
ħ π 2n n!
1
2 ħ


where H n ħ Q is the Hermite polynomial of the n-th order. Using (5.34) and
(5.37), we replace Q with q - eF
mω2 to obtain the following form described by

eF
un ðqÞ = un q -
mω2
1 2
mω 1 mω eF
e - 2ħ q-
4 mω eF
= Hn q- mω2 ðn = 0, 1, 2, ⋯Þ:
ħ π 2 2n n!
1
ħ mω2
ð5:48Þ

Since ψ n(q) of (2.106) forms the (CONS) [2], un(q) should be expanded using
ψ n(q). That is, as a solution of (5.32) we get

u n ð qÞ = c ψ ðqÞ,
k nk k
ð5:49Þ

where a set of cnk are appropriate coefficients. Since the functions ψ k(q) are
non-degenerate, un(q) expressed by (5.49) lacks a definite eigenenergy.
Example 5.3 [1] In the previous examples we studied the perturbation in
one-dimensional systems, where all the energy eigenstates are non-degenerate.
Here we deal with the change in the energy and quantum states of a hydrogen
atom (i.e., a three-dimensional system). Since the energy eigenstates are generally
degenerate (see Chap. 3), for simplicity we consider only the ground state that is
168 5 Approximation Methods of Quantum Mechanics

non-degenerate. As in the previous two cases, we assume that the perturbation is


caused by the applied electric field. As the simplest example, we deal with the
properties including the polarizability of the hydrogen atom in its ground state. The
polarizability of atom, molecule, etc. is one of the most fundamental properties in
materials science.
We assume that the electric field is applied along the direction of the z-axis; in this
case the perturbation term V is expressed as

V = - eFz, ð5:50Þ

where e is a charge of an electron (e < 0). Then, the total Hamiltonian H is described
by

H = H 0 þ V, ð5:51Þ
2
ħ2 ∂ ∂ 1 ∂ ∂ 1 ∂
H0 = - r2 þ sin θ þ
2μr 2 ∂r ∂r sin θ ∂θ ∂θ sin 2 θ ∂ϕ2
e2
- , ð5:52Þ
4πε0 r

where H0 is identical with the Hamiltonian H of (3.35), in which Z = 1.


In this example, we discuss topics on the polarization and energy shift (Stark
effect) caused by the applied electric field [1, 3].
1. Polarizability of a hydrogen atom in the ground state
As in (5.5) and (5.16), the first-order perturbed state is described by

hkjVj0i
j Ψ0 i ≈ j 0i þ k≠0
j ki = j 0i - eF k≠0
ð0Þ ð0Þ
E0 - Ek
hkjzj0i
j ki , ð5:53Þ
ð0Þ ð0Þ
E0 - Ek

where j0i denotes the ground state expressed as (3.301). Since the quantum states
jki represent all the quantum states of the hydrogen atom as implied in (5.3), in
the second term of RHS in (5.53) we are considering all those states except for
ð0Þ ħ2
j0i. The eigenenergy E 0 is identical with E 1 = - 2μa 2 obtained from (3.258),

where n = 1. As already noted, we simply numbered the states jki in order of


ð0Þ ð0Þ
increasing energies. In the case of k ≠ j (k, j ≠ 0), we may have either Ek = E j
ð0Þ ð0Þ
or E k ≠ E j accordingly.
Using (5.51), let us calculate the polarizability of a hydrogen atom in a ground
state. In Sect. 4.1 we gave a definition of the electric dipole moment such that
5.1 Perturbation Method 169

Pe x,
j j
ð4:6Þ

where xj is a position vector of the j-th charged particle. Placing the proton of
hydrogen at the origin, by use of the notation (3.5) P is expressed as

Px ex
P = ðe1 e2 e3 Þ Py = ex = ðe1 e2 e3 Þ ey :
Pz ez

Hence, the z-component of the electric dipole moment Pz of electron is given


by

Pz = ez: ð5:54Þ

The quantum-mechanical analogue of (5.54) is given by

hPz i = ehΨ0 jzjΨ0 i, ð5:55Þ

where jΨ0i is taken from (5.53). Substituting (5.53) for (5.55), we get

hk jzj0i hk jzj0i
hΨ0 jzjΨ0 i ¼ h0j - eF k≠0
hkj z j0i - eF k≠0
jki
ð0Þ ð0Þ ð0Þ ð0Þ
E0 - Ek E0 - Ek
hk jzj0i
¼ h0jzj0i - eF k≠0
h0jzjk i
ð0Þ ð0Þ
E0 - Ek
hk jzj0i
- eF k≠0
0 z{ k
ð0Þ ð0Þ
E0 - Ek
jhk jzj0ij2
þ ðeF Þ2 k≠0
hkjzjk i 2
:
ð0Þ ð0Þ
E0 - Ek
ð5:56Þ

In (5.56) readers are referred to the computation rule of inner product


described in Chap. 1 or in Chap. 13 for further details. Since z is Hermitian
(i.e., z{ = z), the second and third terms of RHS of (5.56) equal. Neglecting the
second-order perturbation factor on the fourth term, we have
170 5 Approximation Methods of Quantum Mechanics

jhkjzj0ij2
hΨ0 jzjΨ0 i ≈ h0jzj0i - 2eF k≠0
:
ð0Þ ð0Þ
E0 - Ek

Thus, from (5.55) we obtain

jhkjzj0ij2
hPz i ≈ eh0jzj0i - 2e2 F k≠0
: ð5:57Þ
ð0Þ ð0Þ
E0 - Ek

Using (3.301), the coordinate representation of h0| z| 0i is described by


1
a-3
h0jzj0i = ze - 2r=a dxdydz
π -1
1 π
a-3 2π
= r 3 e - 2r=a dr dϕ cos θ sin θdθ ð5:58Þ
π 0 0 0
1 π
a-3 2π
= r 3 e - 2r=a dr dϕ sin 2θdθ = 0,
2π 0 0 0

π
where we used 0 sin 2θdθ = 0. Then, we have

jhkjzj0ij2
hPz i ≈ - 2e2 F k≠0
: ð5:59Þ
ð0Þ ð0Þ
E0 - Ek

On the basis of classical electromagnetism, we define the polarizability α as [4]

α  hPz i=F: ð5:60Þ

Here we have an important relationship between the polarizability and electric


dipole moment. That is,

ðpolarizabilityÞ × ðelectric fieldÞ = ðelectric dipole momentÞ:

From (5.59) and (5.60) we have

jhkjzj0ij2
α = - 2e2 k≠0
: ð5:61Þ
ð0Þ ð0Þ
E0 - Ek

At the first glance, to evaluate (5.61) appears to be formidable, but making the
most of the fact that the total wave functions of a hydrogen atom form the CONS,
the evaluation is straightforward. The discussion is as follows:
First, let us seek an operator G that satisfies [1]
5.1 Perturbation Method 171

z j 0i = ðGH 0 - H 0 GÞ j 0i: ð5:62Þ

To this end, we take account of all the quantum states jki of a hydrogen atom
(including j0i) that are solutions (or eigenstates of energy) of (3.36). In Sect. 3.8
we have obtained the total wave functions described by

ðnÞ ðnÞ
Λl,m = Y m
l ðθ, ϕÞRl ðr Þ, ð3:300Þ

ðnÞ
where Λl,m is expressed as a product of spherical surface harmonics and radial
wave functions. The latter functions are described by associated Laguerre func-
tions. It is well-known [2] that the spherical surface harmonics constitute the
CONS on a unit sphere and that associated Laguerre functions form the CONS in
the real one-dimensional space. Thus, their product expressed as (3.300), i.e.,
aforementioned collection of jki constitutes the CONS in the real three-
dimensional space. This is equivalent to

j
j jihj j = E, ð5:63Þ

where E is an identity operator.


The implications of (5.63) are as follows: Take any function jf(r)i and operate
(5.63) from the left. Then, we get

Ef ðrÞ = f ðrÞ = k
j kihkjf ðrÞi = f
k k
j ki: ð5:64Þ

In other words, (5.64) implies that any function f(r) can be expanded into a
series of jki. The coefficients fk are defined as

f k  hkjf ðrÞi: ð5:65Þ

Those are the so-called Fourier coefficients. Related discussion is given in


Sect. 10.4 as well as Chaps. 14, 18, 20, and 22.
We further impose the conditions on the operator G defined in (5.62).
(1) G commutes with r (=|r|). (2) G has a functional form described by
G = G(r, θ). (3) G does not contain a differential operator. On those conditions,
we have ∂G/∂ϕ = 0. From (3.301), we have ∂ j 0i/∂ϕ = 0. Thus, we get

ðGH 0 - H 0 GÞ j 0i
ħ2 1 ∂ ∂ 1 ∂ ∂ 1 1 ∂ ∂
=- G 2 r2 - 2 r2 G- 2 sin θ G j 0i:
2μ r ∂r ∂r r ∂r ∂r r sin θ ∂θ ∂θ
ð5:66Þ
172 5 Approximation Methods of Quantum Mechanics

This calculation is somewhat complicated, and so we calculate (5.66)


termwise. With the second term of (5.66), we have

1 ∂ ∂ 1 ∂ ∂G 1 ∂ ∂ j 0i
- r2 Gj0 =- 2 r2 j0 - r2 G
r2 ∂r ∂r r ∂r ∂r r 2 ∂r ∂r
2
1 ∂G ∂ G ∂G ∂ j 0i G ∂ ∂ j 0i ∂G ∂ j 0i
=-  2r j0 - j0 - - 2 r2 -
r2 ∂r ∂r2 ∂r ∂r r ∂r ∂r ∂r ∂r
2
2 ∂G ∂ G ∂G ∂ j 0i G ∂ ∂ j 0i
=- j0 - j 0 -2 - 2 r2
r ∂r ∂r 2 ∂r ∂r r ∂r ∂r
2
2 ∂G ∂ G 2 ∂G G ∂ ∂ j 0i
=- j0 - j0 þ j0 - 2 r2 ,
r ∂r ∂r 2 a ∂r r ∂r ∂r
ð5:67Þ

where with the last equality we used (3.301), namely j0i / e-r/a, where a is Bohr
radius (Sect. 3.7.1). The last term of (5.67) cancels the first term of (5.66). Then,
we get

ħ2 1 1 ∂G
ðGH 0 - H 0 GÞj0i ¼ 2 - 0
2μ r a ∂r

2
∂ G 1 1 ∂ ∂G ð5:68Þ
þ 0 þ 2 sin θ 0i
∂r2 r sin θ ∂θ ∂θ
2
ħ2 1 1 ∂G ∂ G 1 1 ∂ ∂G
¼ 2 - þ 2 þ 2 sin θ j0i:
2μ r a ∂r ∂r r sin θ ∂θ ∂θ

From (5.62) we obtain

hrjzj0i = hrjðGH 0 - H 0 GÞj0i ð5:69Þ

so that we can have the coordinate representation. As a result, we get

ħ2
r cos θϕð1sÞ =

2
1 1 ∂G ∂ G 1 1 ∂ ∂G
 2 - þ 2 þ 2 sin θ ϕð1sÞ, ð5:70Þ
r a ∂r ∂r r sin θ ∂θ ∂θ

where ϕ(1s) is given by (3.301). Dividing (5.70) by ϕ(1s) and rearranging it, we
have
5.1 Perturbation Method 173

2
∂ G 1 1 ∂G 1 1 ∂ ∂G 2μ
þ2 - þ sin θ = 2 r cos θ: ð5:71Þ
∂r 2 r a ∂r r 2 sin θ ∂θ ∂θ ħ

Now, we assume that G has a functional form of

Gðr, θÞ = gðr Þ cos θ: ð5:72Þ

Inserting (5.72) into (5.71), we have

d2 g 1 1 dg 2g 2μ
þ2 - - 2 = 2 r: ð5:73Þ
dr 2 r a dr r ħ

Equation (5.73) has a regular singular point at the origin and resembles an
Euler equation whose general form of a homogeneous equation is described by
[5]

d 2 g a dg b
þ þ g = 0, ð5:74Þ
dr 2 r dr r2

where a and b are arbitrary constants. In particular, if we have a following


differential equation described by

d2 g a dg a
þ - 2 g = 0, ð5:75Þ
dr 2 r dr r

we immediately see that one of the particular solutions is g = r. However, (5.73)


differs from the general second-order Euler equation by the presence of - 2a dgdr .
Then, let us assume that a particular solution g has a form of

g = pr 2 þ qr: ð5:76Þ

Inserting (5.76) into (5.73), we get

4pr 2q 2μ
4p - - = 2 r: ð5:77Þ
a a ħ

Comparing coefficient of r and constant term, we obtain

aμ a2 μ
p= - and q= - : ð5:78Þ
2ħ2 ħ2

Hence, we get
174 5 Approximation Methods of Quantum Mechanics

aμ r
gð r Þ = - þ a r: ð5:79Þ
ħ2 2

Accordingly, we have

aμ r aμ r
Gðr, θÞ = gðr Þ cos θ = - þ a r cos θ = - 2 þ a z: ð5:80Þ
ħ2 2 ħ 2

This is a coordinate representation of G(r, θ).


Returning back to (5.62) and operating hjj from the left on its both sides, we
have

hjjzj0i = hjjðGH 0 - H 0 GÞj0i = hjjGH 0 j0i - hjjH 0 Gj0i


ð0Þ ð0Þ ð5:81Þ
= E0 - Ej hjjGj0i:

Notice here that

ð0Þ
H 0 j 0i = E0 j 0i ð5:82Þ

and that

{ {
ð0Þ ð0Þ
hj j H 0 = H 0 { jji = ½H 0 jji{ = Ej jji = E j hj j , ð5:83Þ

where we used a computation rule of (1.118) and with the second equality we
used the fact that H0 is Hermitian, i.e., H0{ = H0. Rewriting (5.81), we have

hjjzj0i
hjjGj0i = ð0Þ ð0Þ
: ð5:84Þ
E0 - Ej

Multiplying h0|z|ji on both sides of (5.84) and summing over j (≠0), we obtain

h0jzjjihjjzj0i
j≠0
h0jzjjihjjGj0i = j≠0
: ð5:85Þ
ð0Þ ð0Þ
E0 - Ej

Adding h0| z| 0ih0| G| 0i on both sides of (5.85), we have

h0jzjjihjjzj0i
h0jzjjihjjGj0i = j≠0
þ h0jzj0ih0jGj0i: ð5:86Þ
j ð0Þ ð0Þ
E0 - Ej

But, from (5.58) h0| z| 0i = 0. Moreover, using the completeness of jji


described by (5.63) for LHS of (5.86), we get
5.1 Perturbation Method 175

h0jzjjihjjzj0i
h0jzGj0i = h0jzjjihjjGj0i = j≠0
j ð0Þ ð0Þ
E0 - Ej
ð5:87Þ
jhjjzj0ij2 α
= j≠0
= - 2,
ð0Þ ð0Þ 2e
E0 - Ej

where with the second equality we used (5.84) and with the last equality we used
(5.61). Also, we used the fact that z is Hermitian.
Meanwhile, using (5.80), LHS of (5.87) is expressed as

aμ r aμ a2 μ
h0jzGj0i = - 0jz þ a zj0 = - h0jzrzj0 i - 0jz2 j0 : ð5:88Þ
ħ2 2 2ħ2 ħ2

Noting that j0i is spherically symmetric and that z and r are commutative,
(5.88) can be rewritten as

aμ a2 μ
h0jzGj0i = - 0jr 3
j0 - 0jr 2 j0 , ð5:89Þ
6ħ2 3ħ2

where we used h0| x2| 0i = h0| y2| 0i = h0| z2| 0i = h0| r2| 0i/3.
Returning back to the coordinate representation and taking account of vari-
ables change from the Cartesian coordinate to the polar coordinate as in (5.58),
we get

aμ 4π 120a6 a2 μ 4π 3a5 aμ 5a3 a2 μ 4a3


h0jzGj0i = - - = - - 2
6ħ2 πa3 64 3ħ2 πa3 4 ħ2 4 ħ 4
aμ 9a3
=- : ð5:90Þ
ħ2 4

To perform the definite integral calculations of (5.89) and obtain the result of
(5.90), modify (3.263) and use the formula described below. That is, differentiate
1
1
e - rξ dr =
0 ξ

four (or five) times with respect to ξ and replace ξ with 2/a to get the result of
(5.90) appropriately.
Using (5.87) and (5.90), as the polarizability α we finally get
176 5 Approximation Methods of Quantum Mechanics

aμ 9a3 e2 aμ 9a3 9a3


α = - 2e2 h0jzGj0i = - 2e2  - = = 4πε 0
ħ2 4 ħ2 2 2
= 18πε0 a3 , ð5:91Þ

where we used

a = 4πε0 ħ2 =μe2 : ð5:92Þ

See Sect. 3.7.1 for this relation.


2. Stark effect of a hydrogen atom in a ground state
The energy shift with the ground state of a hydrogen atom up to the second
order is given by

ð0Þ 1
E 0 ≈ E 0 - eF h0jzj0i þ ð- eF Þ2 j≠0
j j j zj0ij2 : ð5:93Þ
ð0Þ ð0Þ
E0 - Ej

Using (5.58) and (5.87) obtained above, we readily get

ð0Þ αe2 F 2 ð0Þ αF 2 ð0Þ


E0 ≈ E0 - = E 0 - = E0 - 9πε0 a3 F 2 : ð5:94Þ
2e2 2

Experimentally, the energy shift by -9πε0a3F2 is well-known as the Stark


shift.
In the above examples, we have seen how energy levels and related properties are
changed by the applied electric field. The energies of the system that result from the
applied external field should not be considered as an energy eigenvalue but should be
thought to be an expectation value.
The perturbation theory has a wide application in physical problems. Examples
include the evaluation of transition probability between quantum states. The theory
also deals with scattering of particles. Including the treatment of the degenerate case,
interested readers are referred to appropriate literature [1, 3].

5.2 Variational Method

Another approximation method is a variational method. This method also has a wide
range of applications in mathematical physics.
A major application of the variational method lies in seeking an eigenvalue that is
appropriately approximated. Suppose that we have an Hermitian differential opera-
tor D that satisfies an eigenvalue equation
5.2 Variational Method 177

Dyn = λn yn ðn = 1, 2, 3, ⋯Þ, ð5:95Þ

where λn is a real eigenvalue and yn is a corresponding eigenfunction. We have


already encountered one of the typical differential operators and eigenvalue equa-
tions in (1.63) and (1.64). In (5.95) we assume that

λ1 ≤ λ2 ≤ λ3 ≤ ⋯, ð5:96Þ

where corresponding eigenstates may be degenerate.


We also assume that a collection {yn; n = 1, 2, 3, ⋯} constitutes the CONS. Now
suppose that we have a function u that satisfies the boundary conditions (BCs) the
same as those for yn that is a solution of (5.95). Then, we have

hyn jDui = hDyn jui = hλn yn jui = λn hyn jui = λn hyn jui: ð5:97Þ

In (5.97) we used (1.132) and the computation rule of the inner product (see Sect.
13.1). Also, we used the fact that any eigenvalue of an Hermitian operator is real
(Sect. 1.4). From the assumption that the functions {yn} constitute the CONS, u can
be expanded in a series described by

u= c
j j
j yj i ð5:98Þ

with

cj  yj ju : ð5:99Þ

This relation is in parallel with (5.64). Similarly, Du can be expanded such that

Du = d
j j
j yj i = j
yj jDu j yj i = λ
j j
yj ju j yj i = λc
j j j
j yj i, ð5:100Þ

where dj is an expansion coefficient and with the third equality we used (5.95) and
with the last equality we used (5.99). Comparing the individual coefficient of jyji of
(5.100), we get

d j = λ j cj : ð5:101Þ

Thus, we obtain

Du = λc
j j j
j yj i: ð5:102Þ

Comparing (5.98) and (5.102), we find that we may termwise operate D on (5.98).
Meanwhile, taking an inner product hu| Dui again termwise with (5.100), we get
178 5 Approximation Methods of Quantum Mechanics

λj cj cj ¼
2
hujDui ¼ u j
λ j cj j yj ¼ j
λj cj ujyj ¼ j j
λ j cj
2 2
≥ j
λ1 cj ¼ λ1 j
cj , ð5:103Þ

where with the third equality we used (5.99) in combination with



ujyj = yj ju = cj . With hu| ui, we have

c c
2
hujui = c hy j
j j j
c jy i
j j j
= j j j
yj jyj = j
cj , ð5:104Þ

where with the last equality we used the fact that the functions {yn} are normalized.
Comparing once again (5.103) and (5.104), we finally get

hujDui
hujDui ≥ λ1 hujui or λ1 ≤ : ð5:105Þ
hujui

If we choose y1 for u, we have an equality in (5.105). Thus, we reach the


following important theorem.
Theorem 5.1 Suppose that we have a linear differential equation described by

Dy = λy, ð5:106Þ

where D is a suitable Hermitian differential operator. Suppose also that under


appropriate boundary conditions (BCs), (5.106) has a series of (real) eigenvalues.
Then, the smallest eigenvalue is equal to the minimum of hu| Dui/hu| ui.
Using this powerful theorem, we are able to find a suitably approximated smallest
eigenvalue. We have simple examples next. As before, let us focus on the change in
the energies and quantum states caused by the applied electric field for a particle in a
one-dimensional potential well and a harmonic oscillator. The latter problem will be
left for readers as an exercise. As a specific case, for simplicity, we consider only the
ground state and first excited state to choose a trial function. First, we deal with the
problem in common with both the cases. Then, we discuss the feature individually.
As a general case, suppose that H is the Hamiltonian of the system (either a
particle in a one-dimensional potential well or a harmonic oscillator). We calculate
an expectation value of energy hHi when the system undergoes an influence of the
electric field. The Hamiltonian is described by

H = H 0 - eFx, ð5:107Þ

where H0 is the Hamiltonian without the external field F. We suppose that the
quantum state jui is expressed as
5.2 Variational Method 179

j ui ≈ j 0i þ a j 1i, ð5:108Þ

where j0i and j1i denote the ground state and first excited state, respectively; a is an
unknow parameter to be decided after the analysis based on the variational method.
In the present case, jui is a trial function. Then, we have

hujHui
hH i  : ð5:109Þ
hujui

With the numerator, we have

hujHui = 0jþa 1j H 0 - eFx j0 þ aj1


= h0jH 0 j0i þ ah0jH 0 j1i - eF h0jxj0i - eFah0jxj1i
þa h1jH 0 j0 i þ a ah1jH 0 j1i - eFa h1jxj0i - eFa ah1jxj1i
= h0jH 0 j0i - eFah0jxj1i þ a ah1jH 0 j1i - eFa h1jxj0i: ð5:110Þ

Note that in (5.110) four terms vanish because of the symmetry requirement of the
symmetric potential well and harmonic oscillator with respect to the origin as well as
because of the orthogonality between the states j0i and j1i. Here x is Hermitian and
we assume that the coordinate representation of j0i and j1i is real as in (1.101) and
(1.102) or in (2.106). Then, we have

h1jxj0i = h1jxj0i = 0jx{ j1 = h0jxj1i:

Also, assuming that a is a real number, we rewrite (5.110) as

hujHui = h0jH 0 j0i - 2eFah0jxj1i þ a2 h1jH 0 j1i: ð5:111Þ

As for the denominator of (5.109), we have

hujui = 0jþa 1j j0 þ aj1 = 1 þ a2 , ð5:112Þ

where we used the normalized conditions h0| 0i = h1| 1i = 1 and the orthogonal
conditions h0| 1i = h1| 0i = 0. Thus, we get

h0jH 0 j0i - 2eFah0jxj1i þ a2 h1jH 0 j1i


hH i = : ð5:113Þ
1 þ a2

Here, defining the ground state energy and first excited energy as E0 and E1,
respectively, we put
180 5 Approximation Methods of Quantum Mechanics

H 0 j 0i = E0 j 0i and H 0 j 1i = E 1 j 1i, ð5:114Þ

where E0 ≠ E1. As already shown in Chaps. 1 and 2, both the systems have
non-degenerate energy eigenstates. Then, we have

E0 - 2eFah0jxj1i þ a2 E 1
hH i = : ð5:115Þ
1 þ a2

Now we wish to determine the condition where hHi has an extremum. That is, we
are seeking a that satisfies

∂hH i
= 0: ð5:116Þ
∂a

Calculating LHS of (5.116) by use of (5.115), we have

∂hH i 2eF h0jxj1ia2 þ 2ðE 1 - E0 Þa - 2eF h0jxj1i


= : ð5:117Þ
∂a ð 1 þ a2 Þ 2

For ∂∂a
hH i
to be zero, we must have

eF h0jxj1ia2 þ ðE1 - E 0 Þa - eF h0jxj1i = 0: ð5:118Þ

Then, solving the quadratic equation (5.118), we obtain

2
E 0 - E 1 ± ðE 1 - E 0 Þ 1 þ ½2eF h0jxj1i
ðE - E Þ2
1 0
a= : ð5:119Þ
2eF h0jxj1i

Taking the plus sign in the numerator of (5.119) and using

p Δ
1 þ Δ≈1 þ ð5:120Þ
2

for small Δ, we obtain

eF h0jxj1i
a≈ : ð5:121Þ
E1 - E0

Thus, with the optimized state of (5.108) we get


5.2 Variational Method 181

eF h0jxj1i
j ui ≈ j 0i þ j 1i: ð5:122Þ
E1 - E0

If one compares (5.122) with (5.17), one should recognize that these two equa-
tions are the same within the first-order approximation. Notice that putting i = 0 and
k = 1 in (5.17) and replacing V with -eFx, we have the expression similar to (5.122).
Notice once again that h0| x| 1i = h1| x| 0i as x is an Hermitian operator and that we
are considering real inner products.
Inserting (5.121) into (5.115) and approximating

1
≈ 1 - a2 , ð5:123Þ
1 þ a2

we can estimate hHi. The estimation depends upon the nature of the system we
choose. The next examples deal with these specific characteristics.
Example 5.4 We adopt the same problem as Example 5.1. That is, we consider how
an energy of a particle carrying a charge e confined within a one-dimensional
potential well is changed by the applied electric field. Here we deal with it using
the variational method.
We deal with the same Hamiltonian as in the case of Example 5.1. It is described
as

ħ2 d 2
H= - - eFx: ð5:22Þ
2m dx2

By putting

ħ2 d 2
H0  - , ð5:124Þ
2m dx2

we rewrite the Hamiltonian as

H = H 0 - eFx:

Expressing the ground state as j0i and the first excited state as j1i, we adopt a trial
function jui as

j ui = j 0i þ a j 1i: ð5:125Þ

In the coordinate representation, we have


182 5 Approximation Methods of Quantum Mechanics

1 π
j 0i = cos x ð5:26Þ
L 2L

and

1 π
j 1i = sin x: ð5:126Þ
L L

From Example 1.2 of Sect. 1.3, we have

ħ2 π 2
h0jH 0 j0i = E 0 =  ,
2m 4L2
ð5:127Þ
ħ2 4π 2
h1jH 0 j1i = E 1 =  = 4E 0 :
2m 4L2

Inserting these results into (5.121), we immediately get

h0jxj1i
a ≈ eF : ð5:128Þ
3E0

As in Example 5.1 of Sect. 5.1, we have

32L
h0jxj1i = : ð5:129Þ
9π 2

For this calculation, use the coordinate representation of (5.26) and (5.126). Thus,
we get

32  8  mL3
a ≈ eF : ð5:130Þ
27π 4 ħ2

Meanwhile, from (5.125) we obtain

h0jxj1i
j ui ≈ j 0i þ eF j 1i: ð5:131Þ
3E0

The resulting jui in (5.131) is the same as (5.31), which was obtained by the first-
order approximation of (5.30). Inserting (5.130) into (5.113) and approximating the
denominator as before such that

1
≈ 1 - a2 , ð5:123Þ
1 þ a2

and further using (5.130) for a, we get


References 183

322  8  mL4
hH i ≈ E 0 - ðeF Þ2 : ð5:132Þ
243π 6 ħ2

Once again, this is the same result as that of (5.28) and (5.29) where only n = 1 is
taken into account.
Example 5.5 We adopt the same problem as Example 5.2. The calculation pro-
cedures are almost the same as those of Example 5.2. We use

ħ
h0jqj1i = and E 1 - E 0 = ħω: ð5:133Þ
2mω

The detailed procedures are left for readers as an exercise. We should get the same
results as those obtained in Example 5.2.
As discussed in the above five simple examples, we showed the calculation
procedures of the perturbation method and variational method. These methods not
only supply us with suitable approximation techniques, but also provide physical
and mathematical insight in many fields of natural science.

References

1. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)


2. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York, NY
3. Schiff LI (1955) Quantum mechanics, 2nd edn. McGraw-Hill, New York, NY
4. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York, NY
5. Coddington EA (1989) An introduction to ordinary differential equations. Dover, New York, NY
Chapter 6
Theory of Analytic Functions

Theory of analytic functions is one of the major fields of modern mathematics. Its
application covers a broad range of topics of natural science. A complex function
f(z), or a function that takes a complex number z as a variable, has various properties
that often differ from those of functions that take a real number x as a variable. In
particular, the analytic functions hold a paramount position in the complex analysis.
In this chapter we explore various features of the analytic functions accordingly.
From a practical point of view, the theory of analytic functions is very frequently
utilized for the calculation of real definite integrals. For this reason, we describe the
related topics together with tangible examples.
The complex plane (or Gaussian plane) can be dealt with as a topological space
where the metric (or distance function) is defined. Since the complex plane has a
two-dimensional extension, we can readily imagine and investigate its topological
feature. Set theory allows us to make an axiomatic approach along with the topology.
Therefore, we introduce basic notions and building blocks of the set theory and
topology.

6.1 Set and Topology

A complex number z is usually expressed as

z = x þ iy, ð6:1Þ

where x and y are the real numbers and i is an imaginary unit. Graphically, the
number z is indicated as a point in a complex plane where the real axis is drawn as
abscissa and the imaginary axis is depicted as ordinate (see Fig. 6.1). Since the
complex plane has a two-dimensional extension, we can readily imagine the domain
of variability of z and make a graphical object for it on the complex plane. A disk-
like diagram that is enclosed with a closed curve C is frequently dealt with in the

© Springer Nature Singapore Pte Ltd. 2023 185


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_6
186 6 Theory of Analytic Functions

Fig. 6.1 Complex plane


where a complex number
z is depicted

0 1

theory of analytic functions. Such a diagram provides a good subject for study of the
set theory and topology. Before developing the theory of complex numbers and
analytic functions, we briefly mention the basic notions as well as notations and
meanings of sets and topology.

6.1.1 Basic Notions and Notations

A set comprises numbers (either real or complex) or mathematical objects, more


generally. The latter include, e.g., vectors, their transformation, etc., as we shall see
various illustrations in Parts III and IV. The set A is described such that

A = f1, 2, 3, ⋯, 10g or B = fx; a < x < b, a, b 2 ℝg, ð6:2Þ

where A contains ten elements in the former case and uncountably infinite elements
are contained in the latter set B. In the latter case (sometimes in the former case as
well), the elements are usually called points.
When we speak of the sets, a universal set [1] is implied in the background. This
set contains all the sets in consideration of the study and is usually clearly defined in
the context of the discussion. For example, with the real analysis the universal set is
ℝ and with the complex analysis it is ℂ (an entire complex plane). In the latter case a
two-dimensional complex plane (see Fig. 6.1) is frequently used as the universal set.
We study various characteristics of sets (or subsets) that are contained in the
universal set and use a Venn diagram to represent them. Figure 6.2a illustrates an
example of Venn diagram that shows a subset A. Naturally, every set contained in the
universal set is its subset. As is often the case, the universal set U is omitted in
Fig. 6.2a accordingly.
6.1 Set and Topology 187

Fig. 6.2 Venn diagrams.


(a) An example of the Venn (a)
diagram. (b) A subset A and
its complement Ac. U shows
the universal set

(b)

To show a (sub)set A we usually depict it with a closed curve. If an element a is


contained in A, we write

a2A

and depict it as a point inside the closed curve. To show that b is not contained in A,
we write

b2
= A:

In that case, we depict b outside the closed curve of Fig. 6.2a. To show that a set
A is contained in another set B, we write

A ⊂ B: ð6:3Þ
188 6 Theory of Analytic Functions

Fig. 6.3 Two different


subsets A and B in a
universal set U. The diagram
shows the sum (A [ B),
intersection (A \ B), and −
differences (A - B and B -
A) ˆ

If (6.3) holds, A is called a subset of B. If, however, the set B is contained in A, B is


a subset of A and we write it as B ⊂ A. When both the relations A ⊂ B and B ⊂ A
simultaneously hold, we describe it as

A = B: ð6:4Þ

Equation (6.4) indicates the equality of A and B as (sub)sets. The simultaneous


establishment of A ⊂ B and B ⊂ A is often used to prove (6.4).
We need to explicitly show a set that has no element. The relevant set is called an
empty set and denoted by ∅. Examples are given as follows:

x; x2 < 0, x 2 ℝ = ∅, fx; x ≠ xg = ∅, x 2
= ∅, etc:

The subset A may have different cardinal numbers (or cardinalities) depending on
the nature of A. To explicitly show this, elements are sometimes indexed as, e.g.,
ai (i = 1, 2, ⋯) or aλ (λ 2 Λ), where Λ can be a finite set or infinite set with different
cardinalities; (6.2) is an example. A complement (or complementary set) of A is
defined as a difference U - A and denoted by Ac(U - A). That is, the set Ac is said
to be a complement of A with respect to U and indicated with a shaded area in
Fig. 6.2b.
Let A and B be two different subsets in U (Fig. 6.3). Figure 6.3 represents a sum
(or union, or union of sets) A [ B, an intersection A \ B, and differences A - B and
B - A. More specifically, A - B is defined as a subset of A to which B is subtracted
from A and referred to as a set difference of A relative to B. We have

A [ B = ðA - BÞ [ ðB - AÞ [ ðA \ BÞ: ð6:5Þ

If A has no element in common with B, A - B = A and B - A = B with A \ B = ∅.


Then, (6.5) trivially holds. Notice also that A [ B is the smallest set that contains both
A and B. In (6.5) the elements must not be doubly counted. Thus, we have
6.1 Set and Topology 189

A [ A = A:

The well-known de Morgan’s law can be understood graphically using Fig. 6.3.

ðA [ BÞc = Ac \ Bc and ðA \ BÞc = Ac [ Bc : ð6:6Þ

This can be shown as follows: With 8u 2 U we have

u 2 ðA [ BÞc ⟺u 2
= A [ B⟺u 2
= A and u 2
=B

⟺u 2 Ac and u 2 Bc ⟺u 2 Ac \ Bc :

Furthermore, we have

u 2 ðA \ BÞc ⟺u 2
= A \ B⟺u 2
= A or u 2
=B

⟺u 2 Ac or u 2 Bc ⟺u 2 Ac [ Bc :

We have other important relations such as the distributive law described by

ðA [ BÞ \ C = ðA \ C Þ [ ðB \ C Þ, ð6:7Þ

ðA \ BÞ [ C = ðA [ C Þ \ ðB [ C Þ: ð6:8Þ

The confirmation of (6.7) and (6.8) is left for readers. The Venn diagrams are
intuitively acceptable, but we need more rigorous discussion to characterize and
analyze various aspects of sets (vide infra).

6.1.2 Topological Spaces and Their Building Blocks

On the basis of the aforementioned discussion, we define a topology and topological


space next. Let T be a universal set. Let τ be a collection of subsets of T. If the
collection τ satisfies the following axioms, τ is called a topology on T. The relevant
axioms are as follows:

ðO1Þ T 2 τ and ∅ 2 τ, where ∅ is an empty set:


190 6 Theory of Analytic Functions

ðO2Þ If O1 , O2 , ⋯, On 2 τ, O1 \ O2 ⋯ \ On 2 τ:

ðO3Þ If Oλ ðλ 2 ΛÞ 2 τ, [λ2Λ Oλ 2 τ:

In the above axioms, T is called an underlying set. The coupled object of T and τ
is said to be a topological space and denoted by (T, τ). The members (i.e., subsets) of
τ are defined as open sets. (If we do not assume the topological space, the underlying
set is equivalent to the universal set. Henceforth, however, we do not have to be too
strict with the difference in this kind of terminology.)
Let (T, τ) be a topological space with a subset O′ ⊂ T. Let τ′ be defined such that

τ0  fO0 \ O; O 2 τg:

Then, (O′, τ′) can be regarded as a topological space. In this case τ′ is called a
relative topology for O′ [2, 3] and O′ is referred to as a subspace of T. The
topological space (O′, τ′) shares the same topological feature as that of (T, τ). Notice
that O′ does not necessarily need to be an open set of T. However, if O′ is an open set
of T, then any open set belonging to O′ is an open set of T as well. This is evident
from the definition of the relative topology and Axiom (O2).
The above-mentioned Axioms (O1) to (O3) sound somewhat pretentious and the
definition of the open sets seems to be “descent from heaven.” Nonetheless, these
axioms and terminology turn out useful soon. Once a topological space (T, τ) is
given, subsets of various types arise and a variety of relationships among them ensue
as well. Note that the above axioms mention nothing about a set that is different from
an open set. Let S be a subset (maybe an open set or maybe not) of T and let us think
of the properties of S.
Definition 6.1 Let (T, τ) be a topological space and S be an open set of τ; i.e., S 2 τ.
Then, a subset defined as a complement of S in (T, τ); namely, Sc = T - S is called a
closed set in (T, τ).
Replacing S with Sc in Definition 6.1, we have (Sc)c = S = T - Sc. The above
definition may be rephrased as follows: Let (T, τ) be a topological space and let A be
a subset of T. Then, if Ac = T - A 2 τ, then A is a closed set in (T, τ).
In parallel with the axioms (O1) to (O3), we have the following axioms related to
the closed sets of (T, τ): Let ~τ be a collection of all the closed sets of (T, τ).

ðC1Þ T 2 ~τ and ∅ 2 ~τ, where ∅ is an empty set:

ðC2Þ If C 1 , C 2 , ⋯, C n 2 ~τ, C1 [ C 2 ⋯ [ C n 2 ~τ:

ðC3Þ If C λ ðλ 2 ΛÞ 2 ~τ, \λ2Λ C λ 2 ~τ:


6.1 Set and Topology 191

These axioms are obvious from those of (O1) to (O3) along with de Morgan’s law
(6.6).
Next, we classify and characterize a variety of elements (or points) and subsets as
well as relationships among them in terms of the open sets and closed sets. We make
the most of these notions and properties to study various aspects of topological
spaces and their structures. In what follows, we assume the presence of a topological
space (T, τ).

(a) Neighborhoods [4]

Definition 6.2 Let (T, τ) be a topological space and S be a subset of T. If S contains


an open set ∃O that contains an element (or point) a 2 T, i.e.,

a 2 O ⊂ S,

then S is called a neighborhood of a.


Figure 6.4 shows a Venn diagram that represents a neighborhood of a. This simple
definition occupies an important position in the study of the topological spaces and
the theory of analytic functions.

(b) Interior and Closure [4]

With the notion of neighborhoods at the core, we shall see further characterization of
sets and elements of the topological spaces.
Definition 6.3 Let x be a point of T and S be a subset of T. If S is a neighborhood of
x, x is said to be in the interior of S. In this case, x is called an interior point of S. The
interior of S is denoted by S∘.
The interior as a technical term can be understood as follows: Suppose that x 2
= S.
Then, by Definition 6.2 S is not a neighborhood of x. By Definition 6.3, this leads to
= S∘. This statement can be translated into x 2 Sc ) x 2 (S∘)c. That is, Sc ⊂ (S∘)c.
x2
This is equivalent to

Fig. 6.4 Venn diagram that


represents a neighborhood
S of a. S contains an open set
O that contains a
192 6 Theory of Analytic Functions

S∘ ⊂ S: ð6:9Þ

Definition 6.4 Let x be a point of T and S be a subset of T. If for any neighborhood


8
N of x we have N \ S ≠ ∅, x is said to be in the closure of S. In this case, x is called
an adherent point of S. The closure of S is denoted by S.
According to the above definition, with the adherent point x of S we explicitly
write x 2 S. Think of negation of the statement of Definition 6.4. That is, with some
neighborhood ∃N of x we have (i) N \ S = ∅ , x 2 = S. Meanwhile,
(ii) N \ S = ∅ ) x 2 = S. Combining (i) and (ii), we have x 2
= S ) x2 = S. Similarly
to the above argument about the interior, we get

S ⊂ S: ð6:10Þ

Combining (6.9) and (6.10), we get

S∘ ⊂ S ⊂ S: ð6:11Þ

In relation to the interior and closure, we have the following important lemmas.
Lemma 6.1 Let S be a subset of T and O be any open set contained in S. Then, we
have O ⊂ S∘.
Proof Suppose that with an arbitrary point x, x 2 O. Since we have x 2 O ⊂ S, S is a
neighborhood of x (due to Definition 6.2). Then, from Definition 6.3 we have x 2 S∘.
Then, we have x 2 O ) x 2 S∘. This implies O ⊂ S∘. This completes the proof.
Lemma 6.2 Let S be a subset of T. If x 2 S∘, then there is some open set ∃O that
satisfies x 2 O ⊂ S.
Proof Suppose that with a point x, x 2 S∘. Then by Definition 6.3, S is a neighbor-
hood of x. Meanwhile, by Definition 6.2 there is some open set ∃O that satisfies
x 2 O ⊂ S. This completes the proof.
Lemmas 6.1 and 6.2 teach us further implications. From Lemma 6.1 and (6.11)
we have

O ⊂ S∘ ⊂ S ð6:12Þ

with an arbitrary open set 8O contained in S∘. From Lemma 6.2 we have
x 2 S∘ ) x 2 O; i.e., S∘ ⊂ O with some open set ∃O containing S∘. Expressing
~ we have S∘ ⊂ O.
this specific O as O, ~ Meanwhile, using Lemma 6.1 once again we
must get

~ ⊂ S∘ ⊂ S:
O ð6:13Þ
6.1 Set and Topology 193

(a) (b)

° ̅

Fig. 6.5 (a) Largest open set S∘ contained in S. O denotes an open set. (b) Smallest closed set S
containing S. C denotes a closed set

That is, we have S∘ ⊂ O~ and O~ ⊂ S∘ at once. This implies that with this specific
~ ~ ∘
open set O, we get O = S . That is,

~ = S∘ ⊂ S:
O ð6:14Þ

Relation (6.14) obviously shows that S∘ is an open set and moreover that S∘ is the
largest open set contained in S. Figure 6.5a depicts this relation. In this context we
have a following important theorem.
Theorem 6.1 Let S be a subset of T. The subset S is open if and only if S = S∘.
Proof From (6.14) S∘ is open. Therefore, if S = S∘, then S is open. Conversely,
suppose that S is open. In that event S itself is an open set contained in S. Meanwhile,
S∘ has been characterized as the largest open set contained in S. Consequently, we
have S ⊂ S∘. At the same time, from (6.11) we have S∘ ⊂ S. Thus, we get S = S∘.
This completes the proof.
In contrast to Lemmas 6.1 and 6.2, we have the following lemmas with respect to
closures.
Lemma 6.3 Let S be a subset of T and C be any closed set containing S. Then, we
have S ⊂ C.
Proof Since C is a closed set, Cc is an open set. Suppose that with a point x, x 2
= C.
Then, x 2 Cc. Since C ⊃ S, Cc ⊂ Sc. This implies Cc \ S = ∅. As x 2 Cc ⊂ Cc, by
Definition 6.2 Cc is a neighborhood of x. Then, by Definition 6.4 x is not in S; i.e.,
c c
x2= S. This statement can be translated into x 2 C c ) x 2 S ; i.e., C c ⊂ S . That
is, we get S ⊂ C.
Lemma 6.4 Let S be a subset of T and suppose that a point x does not belong to S;
i.e., x 2 = C for some closed set ∃C containing S.
= S. Then, x 2
= S, then by Definition 6.4 we have some neighborhood ∃N and some
Proof If x 2

open set O contained in that N such that x 2 O ⊂ N and N \ S = ∅. Therefore, we
194 6 Theory of Analytic Functions

must have O \ S = ∅. Let C = Oc. Then C is a closed set with C ⊃ S. Since


x 2 O, x 2
= C. This completes the proof.
From Lemmas 6.3 and 6.4 we obtain further implications. From Lemma 6.3 and
(6.11) we have

S⊂S⊂C ð6:15Þ

with an arbitrary closed set 8C containing S. From Lemma 6.4 we have x 2


~
S ) x 2 C c ; i.e., C ⊂ S with ∃C containing S. Expressing this specific C as C,
c

~
we have C ⊂ S. Meanwhile, using Lemma 6.3 once again we must get

~
S ⊂ S ⊂ C: ð6:16Þ

~ ⊂ S and S ⊂ C
That is, we have C ~ at once. This implies that with this specific
~ we get C
closed set C, ~ = S. Hence, we get

~
S ⊂ S = C: ð6:17Þ

Relation (6.17) obviously shows that S is a closed set and moreover that S is the
smallest closed set containing S. Figure 6.5b depicts this relation. In this context we
have a following important theorem.
Theorem 6.2 Let S be a subset of T. The subset S is closed if and only if S = S.
Proof The proof can be done analogously to that of Theorem 6.1. From (6.17) S is
closed. Therefore, if S = S, then S is closed. Conversely, suppose that S is closed. In
that event S itself is a closed set containing S. Meanwhile, S has been characterized as
the smallest closed set containing S. Then, we have S ⊃ S. At the same time, from
(6.11) we have S ⊃ S. Thus, we get S = S. This completes the proof.

(c) Boundary [4]

Definition 6.5 Let (T, τ) be a topological space and S be a subset of T. If a point


x 2 T is in both the closure of S and the closure of the complement of S, i.e., Sc , x is
said to be in the boundary of S. In this case, x is called a boundary point of S. The
boundary of S is denoted by Sb.
By this definition Sb can be expressed as

Sb = S \ Sc : ð6:18Þ

Replacing S with Sc, we get


6.1 Set and Topology 195

ðSc Þb = Sc \ ðSc Þc = Sc \ S: ð6:19Þ

Thus, we have

Sb = ð S c Þ b : ð6:20Þ

This means that S and Sc have the same boundary. Since both S and Sc are closed
sets and the boundary is defined as their intersection, from Axiom (C3) the boundary
is a closed set. From Definition 6.1 a closed set is defined as a complement of an
open set, and vice versa. Thus, the open set and closed set do not make a confron-
tation concept but a complementation concept. In this respect we have the following
lemma and theorem.
Lemma 6.5 Let S be an arbitrary subset of T. Then, we have

S c = ð S∘ Þ c :

Proof Since S∘ ⊂ S, (S∘)c ⊃ Sc. As S∘ is open, (S∘)c is closed. Hence, ðS∘ Þc ⊃ Sc ,


because Sc is the smallest closed set containing Sc. Next let C be an arbitrary closed
set that contains Sc. Then Cc is an open set that is contained in S, and so we have
Cc ⊂ S∘, because S∘ is the largest open set contained in S. Therefore, C ⊃ (S∘)c.
Then, if we choose Sc for C, we must have Sc ⊃ (S∘)c. Combining this relation with
ðS∘ Þc ⊃ Sc obtained above, we get Sc = ðS∘ Þc .
Theorem 6.3[4] A necessary and sufficient condition for S to be both open and
closed at once is Sb = ∅.
Proof
1. Necessary condition: Let S be open and closed. Then, S = S and S = S∘. Suppose
that Sb ≠ ∅. Then, from Definition 6.5 we would have ∃ a 2 S \ Sc for some a.
Meanwhile, from Lemma 6.5 we have Sc = ðS∘ Þc . This leads to a 2 S \ ðS∘ Þc .
However, from the assumption we would have a 2 S \ Sc. We have no such a,
however. Thus, using the proof by contradiction we must have Sb = ∅.
2. Sufficient condition: Suppose that Sb = ∅. Starting with S ⊃ S∘ and S ≠ S∘, let us
assume that there is a such that a 2 S and a 2 = S∘. That is, we have
∘ c ∘ c ∘ c
a 2 ðS Þ ) a 2 S \ ðS Þ ⊂ S \ ðS Þ = S \ S = S , in contradiction to the sup-
c b

position. Thus, we must not have such a, implying that S = S∘. Next, starting with
S ⊃ S and S ≠ S, we assume that there is a such that a 2 S and a 2 = S. That is, we
have a 2 Sc ) a 2 S \ Sc ⊂ S \ Sc = Sb , in contradiction to the supposition.
Thus, we must not have such a, implying that S = S. Combining this relation
with S = S∘ obtained above, we get S = S = S∘ . In other words, S is both open and
closed at once. These complete the proof.
196 6 Theory of Analytic Functions

The above-mentioned set is sometimes referred to as a closed–open set or a


clopen set as a portmanteau word. An interesting example can be seen in Chap. 20
in relation to topological groups (or continuous groups).

(d) Accumulation Points and Isolated Points

We have another important class of elements and sets that include accumulation
points and isolated points.
Definition 6.6 Let (T, τ) be a topological space and S be a subset of T. Suppose that
we have a point p 2 T. If for any neighborhood N of p we have N \ (S - {p}) ≠ ∅,
then p is called an accumulation point of S. A set comprising all the accumulation
points of S is said to be a derived set of S and denoted by Sd.
Note that from Definition 6.4 we have

N \ ðS - fpgÞ ≠ ∅ , p 2 S - fpg, ð6:21Þ

where N is an arbitrary neighborhood of p.


Definition 6.7 Let S be a subset of T. Suppose that we have a point p 2 S. If for
some neighborhood N of p we have N \ (S - {p}) = ∅, then p is called an isolated
point of S. A set comprising only isolated points of S is said to be a discrete set
(or subset) of S and we denote it by Sdsc.
Note that if p is an isolated point, N \ (S - {p}) = N \ S - N \ {p} = N \ S -
{p} = ∅. That is, {p} = N \ S. Therefore, any isolated points of S are contained in S.
Contrary to this, the accumulation points are not necessarily contained in S.
Comparing Definitions 6.6 and 6.7, we notice that the latter definition is obtained
by negation of the statement of the former definition. This implies that any points of
a set are classified into mutually exclusive two alternatives, i.e., the accumulation
points or isolated points.
We divide Sd into a direct sum of two sets such that

S d = ð S [ S c Þ \ S d = Sd \ S [ S d \ S c , ð6:22Þ

where with the second equality we used (6.7).


Now, we define

þ -
Sd  Sd \ S and Sd  Sd \ S c :

Then, (6.22) means that Sd is divided into two sets consisting of the accumulation
points, i.e., (Sd)+ that belongs to S and (Sd)- that does not belong to S. Thus, as
another direct sum we have
6.1 Set and Topology 197

þ
S = Sd \ S [ Sdsc = Sd [ Sdsc : ð6:23Þ

Equation (6.23) represents the direct sum consisting of a part of the derived set
and the whole discrete set. Here we have the following theorem.
Theorem 6.4 Let S be a subset of T. Then we have a following equation:

S = Sd [ Sdsc Sd \ Sdsc = ∅ : ð6:24Þ

Proof Let us assume that p is an accumulation point of S. Then, we have


p 2 S - fpg ⊂ S. Meanwhile, Sdsc ⊂ S ⊂ S. This implies that S contains both Sd
and Sdsc. That is, we have

S ⊃ Sd [ Sdsc :

It is natural to ask how to deal with other points that do not belong to Sd [ Sdsc.
Taking account of the above discussion on the accumulation points and isolated
points, however, such remainder points should again be classified into accumulation
and isolated points. Since all the isolated points are contained in S, they can be taken
into S.
Suppose that among the accumulation points, some points p are not contained in
S; i.e., p 2
= S. In that case, we have S - {p} = S, namely S - fpg = S. Thus,
p 2 S - fpg , p 2 S. From (6.21), in turn, this implies that in case p is not
contained in S, for p to be an accumulation point of S is equivalent to that p is an
adherent point of S. Then, those points p can be taken into S as well. Thus, finally, we
get (6.24). From Definitions 6.6 and 6.7, obviously we have Sd \ Sdsc = ∅. This
completes the proof.
Rewriting (6.24), we get

- þ -
S = Sd [ Sd [ Sdsc = Sd [ S: ð6:25Þ

We have other important theorems.


Theorem 6.5 Let S be a subset of T. Then we have a following relation:

S = Sd [ S: ð6:26Þ

Proof Let us assume that p 2 S. Then, we have the following two cases: (i) if p 2 S,
we have trivially p 2 S ⊂ Sd [ S. (ii) Suppose p 2
= S. Since p 2 S, from Definition 6.4
any neighborhood of p contains a point of S (other than p). Then, from Definition
6.6, p is an accumulation point of S. Thus, we have p 2 Sd ⊂ Sd [ S and, hence, get
S ⊂ Sd [ S. Conversely, suppose that p 2 Sd [ S. Then, (i) if p 2 S, obviously we
have p 2 S. (ii) Suppose p 2 Sd. Then, from Definition 6.6 any neighborhood of
198 6 Theory of Analytic Functions

Fig. 6.6 Relationship


among S, S, Sd, and Sdsc.
̅
Regarding the symbols and
notations, see text

( )

( )

̅ = ∪ = ∪ ,
= ( ) ∪( )

p contains a point of S (other than p). Thus, from Definition 6.4, p 2 S. Taking into
account the above two cases, we get Sd [ S ⊂ S. Combining this with S ⊂ Sd [ S
obtained above, we get (6.26). This completes the proof.
Theorem 6.6 Let S be a subset of T. A necessary and sufficient condition for S be a
closed set is that S contains all the accumulation points of S.
Proof From Theorem 6.2, that S is a closed set is equivalent to S = S. From
Theorem 6.5 this is equivalent to S = Sd [ S. Obviously, this is equivalent to
Sd ⊂ S. In other words, S contains all the accumulation points of S. This completes
the proof.
As another proof, taking account of (6.25), we have
-
Sd = ∅ ⟺ S = S ⟺S is a closed set ðfrom Theorem 6:2Þ :

This relation is equivalent to the statement that S contains all the accumulation
points of S.
In Fig. 6.6 we show the relationship among S, S, Sd, and Sdsc. From Fig. 6.6, we
can readily show that

Sdsc = S - Sd = S - Sd :

We have a further example of direct sum decomposition. Using Lemma 6.5, we


have

S b = S \ Sc = S \ ð S ∘ Þ c = S - S∘ : ð6:27Þ

Since S ⊃ S∘ , we get
6.1 Set and Topology 199

S = S∘ [ Sb ; S∘ \ Sb = ∅:

Notice that (6.27) is a succinct expression of Theorem 6.3. That is,

Sb = ∅ ⟺ S = S∘ ⟺ S is a clopen set:

(e) Connectedness

In the precedent several paragraphs we studied various building blocks and their
properties of the topological space and sets contained in the space. In this paragraph,
we briefly mention the relationship between the sets. The connectedness is an
important concept in the topology. The concept of the connectedness is associated
with the subspaces defined earlier in this section. In terms of the connectedness, a
topological space can be categorized into two main types; a subspace of the one type
is connected and that of the other is disconnected.
Let A be a subspace of T. Intuitively, for A to be connected is defined as follows:
Let z1 and z2 be any two points of A. If z1 and z2 can be joined by a continuous line
that is contained entirely within A (see Fig. 6.7a), then A is said to be connected. Of
the connected subspaces, suppose that the subspace A has the property that all the
points inside any closed path (i.e., continuous closed line) C within A contain only
the points that belong to A (Fig. 6.7b). Then, that subspace A is said to be simply
connected. A subspace that is not simply connected but connected is said to be
multiply connected. A typical example for the latter is a torus. If a subspace is given
as a union of two (or more) disjoint non-empty (open) sets, that subspace is said to be
disconnected. Notice that if two (or more) sets have no element in common, those
sets are said to be disjoint. Fig. 6.7c shows an example for which a disconnected
subspace A is a union of two disjoint sets B and C.
Interesting examples can be seen in Chap. 20 in relation to the connectedness.

6.1.3 T1-Space

We can “enter” a variety of topologies τ into a set T to get a topological space (T, τ).
We have two extremes of them. One is an indiscrete topology (or trivial topology)
and the other is a discrete topology [2]. For the former the topology is characterized
as

τ = f∅, T g ð6:28Þ

and the latter case is


200 6 Theory of Analytic Functions

(a) (b)

(c)

= ∪

Fig. 6.7 Connectedness of subspaces. (a) Connected subspace A. Any two points z1 and z2 of A can
be connected by a continuous line that is contained entirely within A. (b) Simply connected
subspace A. All the points inside any closed path C within A contain only the points that belong
to A. (c) Disconnected subspace A that comprises two disjoint sets B and C

τ = f∅, all the subsets of T, T g: ð6:29Þ

With τ all the subsets are open and complements to the individual subsets are
closed by Definition 6.1. However, such closed subsets are contained in τ, and so
they are again open. Thus, all the subsets are clopen sets. Practically speaking, the
aforementioned two extremes are of less interest and value. For this reason, we need
some moderate separation conditions with topological spaces. These conditions are
well-established as the separation axioms. We will not get into details about the
discussion [2], but we mention the first separation axiom (Fréchet axiom) that
produces the T1-space.
Definition 6.8 Let (T, τ) be a topological space. Suppose that with respect to
8
x, y (x ≠ y) 2 T there is a neighborhood N in such a way that

x 2 N and y 2
= N:

The topological space that satisfies the above separation condition is called a
T1-space.
We mention two important theorems of the T1-space.
6.1 Set and Topology 201

Theorem 6.7 A necessary and sufficient condition for (T, τ) to be a T1-space is that
for each point x 2 T, {x} is a closed set of T.
Proof
1. Necessary condition: Let (T, τ) be a T1-space. Choose any pair of elements x, y
(x ≠ y) 2 T. Then, from Definition 6.8 there is a neighborhood ∃N of y (≠x) that
does not contain x. Then, we have y 2 N and N \ {x} = ∅. From Definition 6.4
this implies that y 2 = fxg. Thus, we have y ≠ x ) y 2 = fxg. This means that
y 2 fxg ) y = x. Namely, fxg ⊂ fxg, but from (6.11) we have fxg ⊂ fxg.
Thus, fxg = fxg. From Theorem 6.2 this shows that {x} is a closed set of T.
2. Sufficient condition: Suppose that {x} is a closed set of T. Choose any x and
y such that y ≠ x. Since {x} is a closed set, N = T - {x} is an open set that contains
y; i.e., y 2 T - {x} and x 2
= T - {x}. Since y 2 T - {x} ⊂ T - {x} where T - {x}
is an open set, from Definition 6.2 N = T - {x} is a neighborhood of y, and
moreover N does not contain x. Then, from Definition 6.8 (T, τ) is a T1-space.
These complete the proof.
A set comprising a unit element is called a singleton or a unit set.
Theorem 6.8 Let S be a subset of a T1-space. Then Sd is a closed set in the T1-space.

Proof Let p be a point of S. Suppose p 2 Sd . Suppose, furthermore, p 2


= Sd. Then,

from Definition 6.6 there is some neighborhood N of p such that N \ (S - {p}) = ∅.
Since p 2 Sd , from Definition 6.4 we must have N \ Sd ≠ ∅ at once for this special
neighborhood N of p. Then, as we have p 2 = Sd, we can take a point q such that
q 2 N \ S and q ≠ p. Meanwhile, we may choose an open set N for the above
d

neighborhood of p (i.e., an open neighborhood), because p 2 N (=N∘) ⊂ N. Since


N \ (T - {p}) = N - {p} is an open set from Theorem 6.7 as well as Axiom
(O2) and Definition 6.1, N - {p} is an open neighborhood of q which does not
contain p. Namely, we have q 2 N - {p}[=(N - {p})∘] ⊂ N - {p}. Note that
q 2 N - {p} and p 2 = N - {p}, in agreement with Definition 6.8. As q 2 Sd, again
from Definition 6.6 we have

ðN - fpgÞ \ ðS - fqgÞ ≠ ∅: ð6:30Þ

This implies that N \ (S - {q}) ≠ ∅ as well. However, it is in contradiction to the


original relation of N \ (S - {p}) = ∅, which is obtained from p 2
= Sd. Then, we must
have p 2 Sd. Thus, from the original supposition we get Sd ⊂ Sd . Meanwhile, we
have Sd ⊃ Sd from (6.11). We have Sd = Sd accordingly. From Theorem 6.2, this
means that Sd is a closed set. Hence, we have proven the theorem.
The above two theorems are intuitively acceptable. For, in a one-dimensional
Euclidean space ℝ we express a singleton {x} as [x, x] and a closed interval as [x,
y] (x ≠ y). With the latter, [x, y] might well be expressed as (x, y)d. Such subsets are
well-known as closed sets. According to the separation axioms, various topological
202 6 Theory of Analytic Functions

spaces can be obtained by imposing stronger constraints upon the T1-space [2]. A
typical example for this is a metric space. In this context, the T1-space has acquired
primitive notions of metric that can be shared with the metric space. Definitions of
the metric (or distance function) and metric space will briefly be summarized in
Chap. 13 in reference to an inner product space.
In the above discussion, we have described a brief outline of the set theory and
topology. We use the results in this chapter and in Part IV as well.

6.1.4 Complex Numbers and Complex Plane

Returning to (6.1), we further investigate how to represent a complex number. In


Fig. 6.8 we redraw a complex plane and a complex number on it. The complex plane
is a natural extension of a real two-dimensional orthogonal coordinate plane (Car-
tesian coordinate plane) that graphically represents a pair of real numbers x and y as
(x, y). In the complex plane we represent a complex number as

z = x þ iy ð6:1Þ

by designating x as an abscissa on the real axis and y as an ordinate on the imaginary


axis. The two axes are orthogonal to each other.
An absolute value (or modulus) of z is defined as a real non-negative number

j zj = x 2 þ y2 : ð6:31Þ

Analogously to a real two-dimensional polar coordinate, we can introduce in the


complex plane a non-negative radial coordinate r and an angular coordinate θ

Fig. 6.8 Complex plane


where z is expressed in a
polar form using a
non-negative radial
coordinate r and an angular
coordinate θ

i
x
0 1
6.1 Set and Topology 203

(Fig. 6.8). In the complex analysis the angular coordinate is called an argument and
denoted by arg z so as to be

argz = θ þ 2πn; n = 0, ± 1, ± 2, ⋯:

In Fig. 6.8, x and y are given by

x = r cos θ and y = r sin θ:

Thus, we have a polar form of z expressed as

z = r ðcos θ þ i sin θÞ: ð6:32Þ

Using the well-known Euler’s formula (or Euler’s identity)

eiθ = cos θ þ i sin θ, ð6:33Þ

we get

z = reiθ : ð6:34Þ

From (6.32) and (6.34), we have

zn = r n einθ = r n ðcos θ þ i sin θÞn = r n ðcos nθ þ i sin nθÞ, ð6:35Þ

where the last equality comes from replacing θ with nθ in (6.33). Comparing both
sides with the last equality of (6.35), we have

ðcos θ þ i sin θÞn = cos nθ þ i sin nθ: ð6:36Þ

Equation (6.36) is called the de Moivre’s theorem. This relation holds with
n being integers including zero along with rational numbers including negative
numbers. Euler’s formula (6.33) immediately leads to the following important
formulae:

1 iθ 1 iθ
cos θ = e þ e - iθ , sin θ = e - e - iθ : ð6:37Þ
2 2i

Notice that although in (6.32) and (6.33) we assumed that θ is a real number,
(6.33) and (6.37) hold with any complex number θ. Note also that (6.33) results from
(6.37) and the following definition of power series expansion of the exponential
function
204 6 Theory of Analytic Functions

1 zk
ez  k = 0 k!
, ð6:38Þ

where z is any complex number [5]. In fact, from (6.37) and (6.38) we have the
following familiar expressions of power series expansion of the cosine and sine
functions

1 z2k 1 z2kþ1
cos z = ð- 1Þk , sin z = ð- 1Þk :
k=0 ð2kÞ! k=0 ð2k þ 1Þ!

These expressions frequently appear in this chapter.


Next, let us think of a pair of complex numbers z and w with w expressed as
w = u + iv. Then, we have

z - w = ðx - uÞ þ iðy - vÞ:

Using (6.31), we get

jz - wj = ð x - uÞ 2 þ ð y - v Þ 2 : ð6:39Þ

Equation (6.39) represents the “distance” between z and w. If we define a


following function ρ(z, w) such that

ρðz, wÞ  jz - wj,

ρ(z, w) defines a metric (or distance function); see Chap. 13. Thus, the metric gives a
distance between any arbitrarily chosen pair of elements z, w 2 ℂ and (ℂ, ρ) can be
dealt with as a metric space. This makes it easy to view the complex plane as a
topological space. Discussions about the set theory and topology already developed
earlier in this chapter basically hold.
In the theory of analytic functions, a subset A depicted in Fig. 6.7 can be regarded
as a part of the complex plane. Since the complex plane ℂ itself represents a
topological space, the subset A may be viewed as a subspace of ℂ. A complex
function f(z) of a complex variable z of (6.1) is usually defined in a connected open
set called a region [5]. The said region is of practical use among various subspaces
and can be an entire domain of ℂ or a subset of ℂ. The connectedness of the region
can be considered in a manner similar to that described in Sect. 6.1.2.
6.2 Analytic Functions of a Complex Variable 205

6.2 Analytic Functions of a Complex Variable

Since the complex number and complex plane have been well-characterized in the
previous section, we deal with the complex functions of a complex variable in the
complex plane.
Taking the complex conjugate of (6.1), we have

z = x - iy: ð6:40Þ

Combining (6.1) and (6.40) in a matrix form, we get

ð z z Þ = ð x y Þ 1
i
1
-i
: ð6:41Þ

1 1
The matrix i -i
is non-singular, and so the inverse matrix exists (see Sect.
11.3) and (x y) can be described as

1 -i
ð x yÞ = ð z z  Þ 1
ð6:42Þ
2 1 i

or with individual components we have

1 i
x= ðz þ z Þ and y = - ðz - z Þ, ð6:43Þ
2 2

which can be understood as the “vector synthesis” (see Fig. 6.9).


Equations (6.41) and (6.42) imply that two sets of variables (x y) and (z z) can
be dealt with on an equal basis. According to the custom, however, we prioritize the
notation using (x y) over that of (z z). Hence, the complex function f is described as

f ðx, yÞ = uðx, yÞ þ ivðx, yÞ, ð6:44Þ

where both u(x, y) and v(x, y) are the real functions of real variables x and y.
Here, let us pause for a second to consider the differentiation of a function.
Suppose that a mathematical function is defined on a real domain (i.e., a real number
line). Consider whether that function is differentiable at a certain point x0 of the real
number line. On this occasion, we can approach x0 only from two directions, i.e.,
from the side of x < x0 (from the left) or from the side of x0 < x (from the right); see
Fig. 6.10a. Meanwhile, suppose that a mathematical function is defined on a
complex domain (i.e., a complex plane). Also consider whether the function is
differentiable at a certain point z0 of the complex plane. In this case, we can approach
z0 from continuously varying directions; see Fig. 6.10b where only four directions
are depicted.
In this context let us think of a simple example.
206 6 Theory of Analytic Functions





∗ ∗
− − +
0



− +

Fig. 6.9 Vector synthesis of z and z

Example 6.1 Let f(x, y) be a function described by

f ðx, yÞ = 2x þ iy = z þ x: ð6:45Þ

As remarked above, substituting (6.43) for (6.45), we could have

1 i 1
hðz, z Þ = 2  ðz þ z Þ þ i - ðz - z Þ = z þ z þ ðz - z Þ
2 2 2

1
= ð3z þ z Þ,
2

where f(x, y) and h(z, z) denote different functional forms. The derivative (6.45)
varies depending on a way z0 is approached. For instance, think of

df f ð0 þ zÞ - f ð0Þ 2x þ iy 2x2 þ y2 - ixy


= lim = lim = lim :
dz 0 z → 0 z x → 0, y → 0 x þ iy x → 0, y → 0 x2 þ y2

Suppose that the differentiation is taken along a straight line in a complex plane
represented by iy = (ik)x (k, x, y : real). Then, we have
6.2 Analytic Functions of a Complex Variable 207

Fig. 6.10 Ways a number


(real or complex) is
(a)
approached. (a) Two ways a
number x0 is approached in a
real number line.
(b) Various ways a number x
z0 is approached in a
complex plane. Here, only 0
four ways are indicated

(b)
z

0 1

df 2x2 þ k2 x2 - ikx2 2 þ k 2 - ik
= lim = lim
dz 0 x → 0, y → 0 x2 þ k x2
2 x → 0, y → 0 1 þ k2
2 þ k2 - ik
= : ð6:46Þ
1 þ k2
df
However, this means that dz 0 takes varying values depending upon k. Namely,
df
dz 0cannot uniquely be defined but depends on different ways to approach the origin
of the complex plane. Thus, we find that the derivative takes different values
depending on straight lines along which the differentiation is taken. This means
that f(x, y) is not differentiable or analytic at z = 0.
Meanwhile, think of g(z) expressed as

gðzÞ = x þ iy = z: ð6:47Þ

In this case, we get


208 6 Theory of Analytic Functions

dg gð0 þ zÞ - gð0Þ z-0


= lim = lim = 1:
dz 0 z → 0 z z→0 z

As a result, the derivative takes the same value 1, regardless of what straight lines
the differentiation is taken along.
Though simple, the above example gives us a heuristic method. As before, let f(z)
be a function described as (6.44) such that

f ðzÞ = f ðx, yÞ = uðx, yÞ þ ivðx, yÞ, ð6:48Þ

where both u(x, y) and v(x, y) possess the first-order partial derivatives with respect to
x and y. Then we have a derivative df(z)/dz such that

df ðzÞ f ðz þ ΔzÞ - f ðzÞ


= lim
dz Δz → 0 Δz

uðx þ Δx, y þ ΔyÞ - uðx, yÞ þ i½vðx þ Δx, y þ ΔyÞ - vðx, yÞ


= lim :
Δx → 0, Δy → 0 Δx þ iΔy
ð6:49Þ

We wish to seek the condition that dfdzðzÞ gives the same result regardless of the
order of taking the limit of Δx → 0 andΔy → 0. That is, taking the limit of Δy → 0
first, we have

df ðzÞ uðx þ Δx, yÞ - uðx, yÞ þ i½vðx þ Δx, yÞ - vðx, yÞ ∂uðx, yÞ


= lim =
dz Δx → 0 Δx ∂x
∂vðx, yÞ
þi : ð6:50Þ
∂x

Next, taking the limit of Δx → 0 first, we get

df ðzÞ uðx, y þ ΔyÞ - uðx, yÞ þ i½vðx, y þ ΔyÞ - vðx, yÞ


= lim =
dz Δy → 0 iΔy
∂uðx, yÞ ∂vðx, yÞ
-i þ : ð6:51Þ
∂y ∂y

Consequently, by equating the real and imaginary parts of (6.50) and (6.51) we
must have
6.2 Analytic Functions of a Complex Variable 209

∂uðx, yÞ ∂vðx, yÞ
= , ð6:52Þ
∂x ∂y

∂vðx, yÞ ∂uðx, yÞ
=- : ð6:53Þ
∂x ∂y

These relationships (6.52) and (6.53) are called the Cauchy-Riemann conditions.
Differentiating (6.52) with respect to x and (6.53) with respect to y and further
subtracting one from the other, we get

2 2
∂ uðx, yÞ ∂ uðx, yÞ
2
þ 2
= 0: ð6:54Þ
∂x ∂y

Similarly we have

2 2
∂ vðx, yÞ ∂ vðx, yÞ
2
þ 2
= 0: ð6:55Þ
∂x ∂y

From the above discussion, we draw several important implications.


1. From (6.50), we have

df ðzÞ ∂f
= : ð6:56Þ
dz ∂x

2. Also from (6.51) we obtain

df ðzÞ ∂f
= -i : ð6:57Þ
dz ∂y

Meanwhile, we have

∂ ∂z ∂ ∂z ∂ ∂ ∂
= þ = þ : ð6:58Þ
∂x ∂x ∂z ∂x ∂z ∂z ∂z

Also, we get

∂ ∂z ∂ ∂z ∂ ∂ ∂
= þ =i - : ð6:59Þ
∂y ∂y ∂z ∂y ∂z ∂z ∂z

As in the case of Example 6.1, we rewrite f(z) as


210 6 Theory of Analytic Functions

f ðzÞ  F ðz, z Þ, ð6:60Þ

where the change in the functional form is due to the variables transformation.
Hence, from (6.56) and using (6.58) we have

df ðzÞ ∂F ðz, z Þ ∂ ∂ ∂F ðz, z Þ ∂F ðz, z Þ


= = þ  F ðz, z Þ = þ : ð6:61Þ
dz ∂x ∂z ∂z ∂z ∂z

Similarly, we get

df ðzÞ ∂F ðz, z Þ ∂ ∂ ∂F ðz, z Þ ∂F ðz, z Þ


= -i = -  F ðz, z Þ = - : ð6:62Þ
dz ∂y ∂z ∂z ∂z ∂z

Equating (6.61) and (6.62), we obtain

∂F ðz, z Þ
= 0: ð6:63Þ
∂z

This clearly shows that if F(z, z) is differentiable with respect to z, F(z, z) does
not depend on z, but only depends upon z. Thus, we may write the differentiable
function as

F ðz, z Þ  f ðzÞ:

This is an outstanding characteristic of the complex function that is differentiable


with respect to the complex variable z.
Taking the contraposition of the above statement, we say that if F(z, z) depends
on z, it is not differentiable with respect to z. Example 6.1 is one of the illustrations.
On the other way around, we consider what can happen if the Cauchy-Riemann
conditions are satisfied [5]. Let f(z) be a complex function described by

f ðzÞ = uðx, yÞ þ ivðx, yÞ, ð6:48Þ

where u(x, y) and v(x, y) satisfy the Cauchy-Riemann conditions and possess con-
tinuous first-order partial derivatives with respect to x and y in some region of ℂ.
Then, we have

∂uðx, yÞ ∂uðx, yÞ
uðx þ Δx, y þ ΔyÞ - uðx, yÞ = Δx þ Δy þ ε1 Δx þ δ1 Δy, ð6:64Þ
∂x ∂y
6.2 Analytic Functions of a Complex Variable 211

∂vðx, yÞ ∂vðx, yÞ
vðx þ Δx, y þ ΔyÞ - vðx, yÞ = Δx þ Δy þ ε2 Δx þ δ2 Δy, ð6:65Þ
∂x ∂y

where four positive numbers ε1, ε2, δ1, and δ2 can be made arbitrarily small as Δx and
Δy tend to be zero. The relations (6.64) and (6.65) result from the continuity of the
first-order partial derivatives of u(x, y) and v(x, y).
Then, we have

f ðz þ ΔzÞ - f ðzÞ
Δz

uðx þ Δx, y þ ΔyÞ - uðx, yÞ i½vðx þ Δx, y þ ΔyÞ - vðx, yÞ


= þ
Δz Δz

Δx ∂uðx, yÞ ∂vðx, yÞ Δy ∂uðx, yÞ ∂vðx, yÞ Δx Δy


= þi þ þi þ ðε þ iε2 Þ þ
Δz ∂x ∂x Δz ∂y ∂y Δz 1 Δz
 ðδ1 þ iδ2 Þ

Δx ∂uðx, yÞ ∂vðx, yÞ Δy ∂vðx, yÞ ∂uðx, yÞ Δx


= þi þ - þi þ ðε þ iε2 Þ
Δz ∂x ∂x Δz ∂x ∂x Δz 1
Δy
þ ðδ þ iδ2 Þ
Δz 1

Δx þ iΔy ∂uðx, yÞ ðΔx þ iΔyÞ ∂vðx, yÞ Δx Δy


=  þ i þ ðε þ iε2 Þ þ ðδ þ iδ2 Þ
Δz ∂x Δz ∂x Δz 1 Δz 1

∂uðx, yÞ ∂vðx, yÞ Δx Δy
= þi þ ðε þ iε2 Þ þ ðδ þ iδ2 Þ, ð6:66Þ
∂x ∂x Δz 1 Δz 1

where with the third equality we used the Cauchy-Riemann conditions (6.52) and
(6.53). Accordingly, we have

f ðz þ ΔzÞ - f ðzÞ ∂uðx, yÞ ∂vðx, yÞ Δx Δy


- þi = ðε þ iε2 Þ þ
Δz ∂x ∂x Δz 1 Δz
 ðδ1 þ iδ2 Þ: ð6:67Þ

Taking the absolute values of both sides of (6.67), we get


212 6 Theory of Analytic Functions

f ðz þ ΔzÞ - f ðzÞ ∂uðx, yÞ ∂vðx, yÞ Δx


- þi ≤ ðε þ iε2 Þ
Δz ∂x ∂x Δz 1
Δy
þ ðδ þ iδ2 Þ , ð6:68Þ
Δz 1
jΔxj Δy jΔyj
where Δx
Δz =p ≤ 1 and Δz =p ≤ 1. Taking the limit of
ðΔxÞ2 þðΔyÞ2 ðΔxÞ2 þðΔyÞ2
Δz → 0, by assumption both the terms of RHS of (6.68) approach zero. Thus,

f ðz þ ΔzÞ - f ðzÞ ∂uðx, yÞ ∂vðx, yÞ ∂f


lim = þi = : ð6:69Þ
Δz → 0 Δz ∂x ∂x ∂x

Alternatively, using partial derivatives of u(x, y) and v(x, y) with respect to y we


can rewrite (6.69) as

f ðz þ ΔzÞ - f ðzÞ ∂uðx, yÞ ∂vðx, yÞ ∂f


lim = -i þi = -i : ð6:70Þ
Δz → 0 Δz ∂y ∂y ∂y

Notice that (6.69) and (6.70) are equivalent to the Cauchy-Riemann conditions. From (6.56) and (6.57), the relations (6.69) and (6.70) imply that the differentiability of $f(z)$ requires the first-order partial derivatives of $u(x,y)$ and $v(x,y)$ with respect to $x$ and $y$ to exist. Meanwhile, once $f(z)$ is found to be differentiable (or analytic), its higher-order derivatives must be analytic as well (vide infra). This requires, in turn, that the above first-order partial derivatives should be continuous. (Note that analyticity naturally leads to continuity.) Thus, we obtain the following theorem.

Theorem 6.9 Let $f(z)$ be a complex function of a complex variable $z$ such that $f(z) = u(x,y) + iv(x,y)$, where $x$ and $y$ are real variables. Then, a necessary and sufficient condition for $f(z)$ to be differentiable is that the first-order partial derivatives of $u(x,y)$ and $v(x,y)$ with respect to $x$ and $y$ exist and are continuous and that $u(x,y)$ and $v(x,y)$ satisfy the Cauchy-Riemann conditions.
Now, we formally give definitions of the differentiability and analyticity of a
complex function of complex variables.
Definition 6.9 Let z0 be a given point of the complex plane ℂ. Let f(z) be defined in
a neighborhood containing z0. If f(z) is single-valued and differentiable at all points
of this neighborhood containing z0, f(z) is said to be analytic at z0. A point at which
f(z) is analytic is called a regular point of f(z). A point at which f(z) is not analytic is
called a singular point of f(z).
We emphasize that the above definition of analyticity at some point $z_0$ requires the single-valuedness and differentiability at all the points of a neighborhood containing $z_0$. This can be understood from Fig. 6.10b and Example 6.1. The analyticity at a point is deeply connected to how and in which direction we take the limiting process. Thus, we need detailed information about the neighborhood of the point in question to determine whether the function is analytic at that point.
The next definition is associated with a global characteristic of the analytic
function.
Definition 6.10 Let R be a region contained in the complex plane $\mathbb{C}$. Let $f(z)$ be a complex function defined in R. If $f(z)$ is analytic at all points of R, $f(z)$ is said to be analytic in R. In this case R is called a domain of analyticity. If the domain of analyticity is the entire complex plane $\mathbb{C}$, the function is called an entire function.
To understand various characteristics of the analytic functions, it is indispensable
to introduce Cauchy’s integral formula. In the next section we deal with the
integration of complex functions.

6.3 Integration of Analytic Functions: Cauchy's Integral Formula

We can define complex integration as a natural extension of the Riemann integral with respect to a real variable. Suppose that there is a curve $C$ in the complex plane. Both ends of the curve are fixed and located at $z_a$ and $z_b$. Suppose also that individual points of the curve $C$ are described using a parameter $t$ such that

$$z = z(t) = x(t) + iy(t) \quad (t_0 \equiv t_a \le t \le t_n \equiv t_b), \qquad (6.71)$$

where $z_0 \equiv z_a = z(t_a)$ and $z_n \equiv z_b = z(t_b)$ together with $z_i = z(t_i)$ $(0 \le i \le n)$. For this parametrization, we assume that $C$ is subdivided into $n$ pieces designated by $z_0, z_1, \cdots, z_n$.
Let $f(z)$ be a complex function defined in a region containing $C$. In this situation, let us assume the following summation $S_n$:

$$S_n = \sum_{i=1}^{n} f(\zeta_i)(z_i - z_{i-1}), \qquad (6.72)$$

where $\zeta_i$ lies between $z_{i-1}$ and $z_i$ $(1 \le i \le n)$. Taking the limit $n \to \infty$ and concomitantly $|z_i - z_{i-1}| \to 0$, we have

$$\lim_{n \to \infty} S_n = \lim_{n \to \infty} \sum_{i=1}^{n} f(\zeta_i)(z_i - z_{i-1}). \qquad (6.73)$$

If the constant limit exists independent of the choice of $z_i$ $(1 \le i \le n-1)$ and $\zeta_i$ $(1 \le i \le n)$, this limit is called a contour integral of $f(z)$ along the curve $C$. We write it as
it as
$$I = \lim_{n \to \infty} S_n \equiv \int_C f(z)\,dz = \int_{z_a}^{z_b} f(z)\,dz. \qquad (6.74)$$

Using (6.48) and $z_i = x_i + iy_i$, we rewrite (6.72) as

$$S_n = \sum_{i=1}^{n} [u(\zeta_i)(x_i - x_{i-1}) - v(\zeta_i)(y_i - y_{i-1})] + \sum_{i=1}^{n} i[v(\zeta_i)(x_i - x_{i-1}) + u(\zeta_i)(y_i - y_{i-1})].$$

Taking the limit of $n \to \infty$ as well as $|x_i - x_{i-1}| \to 0$ and $|y_i - y_{i-1}| \to 0$ $(0 \le i \le n)$, we have

$$I = \int_C [u(x,y)\,dx - v(x,y)\,dy] + i\int_C [v(x,y)\,dx + u(x,y)\,dy]. \qquad (6.75)$$

Further rewriting (6.75), we get

$$\begin{aligned}
I &= \int_C \left[u(x,y)\frac{dx}{dt} - v(x,y)\frac{dy}{dt}\right]dt + i\int_C \left[v(x,y)\frac{dx}{dt} + u(x,y)\frac{dy}{dt}\right]dt \\
&= \int_C [u(x,y) + iv(x,y)]\frac{dx}{dt}\,dt + i\int_C [u(x,y) + iv(x,y)]\frac{dy}{dt}\,dt \\
&= \int_C [u(x,y) + iv(x,y)]\left[\frac{dx}{dt} + i\frac{dy}{dt}\right]dt = \int_C [u(x,y) + iv(x,y)]\frac{dz}{dt}\,dt \\
&= \int_C f(z)\frac{dz}{dt}\,dt = \int_{t_a}^{t_b} f(z)\frac{dz}{dt}\,dt = \int_{z_a}^{z_b} f(z)\,dz. \qquad (6.76)
\end{aligned}$$
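The parametrized form (6.76) translates directly into a numerical scheme. Below is a minimal sketch (our own illustration, assuming NumPy; the helper name contour_integral is not from the text): the contour integral is approximated by the trapezoidal rule applied to $f(z(t))\,z'(t)$ on the unit circle, reproducing $\oint z^2\,dz = 0$ and $\oint dz/z = 2\pi i$ (the latter anticipating (6.92)).

    import numpy as np

    def contour_integral(f, z, zdot, t):
        # Trapezoidal approximation of the integral of f(z(t)) z'(t) dt, cf. (6.76)
        return np.trapezoid(f(z(t)) * zdot(t), t)   # np.trapz on older NumPy

    t = np.linspace(0.0, 2.0 * np.pi, 4001)
    z = lambda t: np.exp(1j * t)            # unit circle
    zdot = lambda t: 1j * np.exp(1j * t)    # dz/dt

    print(contour_integral(lambda w: w**2, z, zdot, t))     # ~ 0 (analytic integrand)
    print(contour_integral(lambda w: 1.0 / w, z, zdot, t))  # ~ 2*pi*1j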

Notice that the contour integral $I$ of (6.74), (6.75), or (6.76) depends in general on the path connecting the points $z_a$ and $z_b$. If so, (6.76) is merely of secondary importance. However, if $f(z)$ can be described by the derivative of another function, the situation becomes totally different. Suppose that we have

$$f(z) = \frac{d\tilde{F}(z)}{dz}. \qquad (6.77)$$

If $\tilde{F}(z)$ is analytic, $f(z)$ is analytic as well. It is because a derivative of an analytic function is again analytic (vide infra). Then, we have

$$f(z)\frac{dz}{dt} = \frac{d\tilde{F}(z)}{dz}\frac{dz}{dt} = \frac{d\tilde{F}[z(t)]}{dt}. \qquad (6.78)$$

Inserting (6.78) into (6.76), we get


Fig. 6.11 Region R and several contours for integration. Regarding the symbols and notations, see text

$$\int_C f(z)\frac{dz}{dt}\,dt = \int_{z_a}^{z_b} f(z)\,dz = \int_{t_a}^{t_b} \frac{d\tilde{F}[z(t)]}{dt}\,dt = \tilde{F}(z_b) - \tilde{F}(z_a). \qquad (6.79)$$

In this case $\tilde{F}(z)$ is called a primitive function of $f(z)$. It is obvious from (6.79) that

$$\int_{z_a}^{z_b} f(z)\,dz = -\int_{z_b}^{z_a} f(z)\,dz. \qquad (6.80)$$

This implies that the contour integral I does not depend on the path C connecting
the points za and zb. We must be careful, however, about a situation where there
would be a singular point zs of f(z) in a region R . Then, f(z) is not defined at zs. If one
chooses C′′ for a contour of the return path in such a way that zs is on C′′ (see
Fig. 6.11), RHS of (6.80) cannot exist, nor can (6.80) hold. To avoid that situation,
we have to “expel” such singularity from the region R so that R can wholly be
contained in the domain of analyticity of f(z) and that f(z) can be analytic in the entire
region R .
Moreover, we must choose R so that the whole region inside any closed paths
within R can contain only the points that belong to the domain of analyticity of f(z).
That is, the region R must be simply connected in terms of the analyticity of f(z) (see
Sect. 6.1). For a region R to be simply connected ensures that (6.80) holds with any
pair of points za and zb in R , so far as the integration path with respect to (6.80) is
contained in R .
From (6.80), we further get

$$\int_{z_a}^{z_b} f(z)\,dz + \int_{z_b}^{z_a} f(z)\,dz = 0. \qquad (6.81)$$

Since the integral does not depend on the paths, we can take a path from $z_b$ to $z_a$ along a curve $C'$ (see Fig. 6.11). Thus, we have

$$\int_C f(z)\,dz + \int_{C'} f(z)\,dz = \int_{C+C'} f(z)\,dz = 0, \qquad (6.82)$$

where $C + C'$ constitutes a closed curve $\tilde{C}$ as shown. In this way, in accordance with (6.81) we get

$$\oint_{\tilde{C}} f(z)\,dz = 0, \qquad (6.83)$$

where $\tilde{C}$ is followed counter-clockwise according to the custom. In (6.83) any closed path $\tilde{C}$ can be chosen for the contour within the simply connected region R. Hence, we reach the following important theorem.
Theorem 6.10: Cauchy's Integral Theorem [5] Let R be a simply connected region and let $C$ be an arbitrary closed curve within R. Let $f(z)$ be analytic on $C$ and within the whole region enclosed by $C$. Then, we have

$$\oint_C f(z)\,dz = 0. \qquad (6.84)$$

Fig. 6.12 Region R and contour $C$ for integration. (a) A singular point at $z_s$ is contained within R. (b) The singular point at $z_s$ is not contained within $\tilde{R}$ so that $f(z)$ can be analytic in a simply connected region $\tilde{R}$. Regarding the symbols and notations, see text

There is a variation in the description of Cauchy's integral theorem. For instance, let $f(z)$ be analytic in R except for a singular point at $z_s$; see Fig. 6.12a. Then, the domain of analyticity of $f(z)$ is not identical to R, but is defined as $R - \{z_s\}$. That is, the domain of analyticity of $f(z)$ is no longer simply connected and, hence, (6.84) is not generally true. In such a case, we may deform the integration path of $f(z)$ so that its domain of analyticity can be simply connected and (6.84) can hold. For example, we take $\tilde{R}$ so that it can be surrounded by the curves $C$ and $\tilde{\Gamma}$ as well as the lines $L_1$ and $L_2$ (Fig. 6.12b). As a result, $z_s$ has been expelled from $\tilde{R}$ and, at the same time, $\tilde{R}$ becomes simply connected. Then, as a tangible form of (6.84) we get

$$\oint_{C+\tilde{\Gamma}+L_1+L_2} f(z)\,dz = 0, \qquad (6.85)$$

where $\tilde{\Gamma}$ denotes the clockwise integration (see Fig. 6.12b) and the lines $L_1$ and $L_2$ are brought as close to each other as possible. In (6.85) the integrations along $L_1$ and $L_2$ cancel, because they are taken in reverse directions. Thus, we have

$$\oint_{C+\tilde{\Gamma}} f(z)\,dz = 0 \quad \text{or} \quad \oint_C f(z)\,dz = \oint_{\Gamma} f(z)\,dz, \qquad (6.86)$$

where $\Gamma$ stands for the counter-clockwise integration. Equation (6.86) is valid as well when there are more singular points. In that case, instead of (6.86) we have

$$\oint_C f(z)\,dz = \sum_i \oint_{\Gamma_i} f(z)\,dz, \qquad (6.87)$$

where $\Gamma_i$ is taken so that it can encircle the individual singular points. Reflecting the nature of the singularity, we get different results with (6.87). We will come back to this point later.
On the basis of Cauchy's integral theorem, we show an integral representation of an analytic function that is well known as Cauchy's integral formula.

Theorem 6.11: Cauchy's Integral Formula Let $f(z)$ be analytic in a simply connected region R. Let $C$ be an arbitrary closed curve within R that encircles $z$. Then, we have

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta, \qquad (6.88)$$

where the contour integration along $C$ is taken in the counter-clockwise direction.

Proof Let $\Gamma$ be a circle of radius $\rho$ in the complex plane so that $z$ can be the center of the circle (see Fig. 6.13). Then, $f(\zeta)/(\zeta - z)$ has a singularity at $\zeta = z$. Therefore, we must evaluate (6.88) using (6.86). From (6.86), we have

$$\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta = \oint_{\Gamma} \frac{f(\zeta)}{\zeta - z}\,d\zeta = f(z)\oint_{\Gamma} \frac{d\zeta}{\zeta - z} + \oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta. \qquad (6.89)$$

Fig. 6.13 Simply connected region R encircled by a closed contour $C$ in a complex plane $\zeta$. A circle $\Gamma$ of radius $\rho$ centered at $z$ is contained within R. The contour integration is taken in the counter-clockwise direction along $C$ or $\Gamma$

An arbitrary point $\zeta$ on the circle $\Gamma$ is described as

$$\zeta = z + \rho e^{i\theta}. \qquad (6.90)$$

Then, taking infinitesimal quantities of (6.90), we have

$$d\zeta = \rho e^{i\theta}\,i\,d\theta = (\zeta - z)\,i\,d\theta. \qquad (6.91)$$

Inserting (6.91) into (6.89), we get

$$\oint_{\Gamma} \frac{d\zeta}{\zeta - z} = \int_0^{2\pi} i\,d\theta = 2\pi i. \qquad (6.92)$$

Rewriting (6.89), we obtain

$$\oint_{\Gamma} \frac{f(\zeta)}{\zeta - z}\,d\zeta = 2\pi i f(z) + \oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta. \qquad (6.93)$$

Since $f(z)$ is analytic, $f(z)$ is uniformly continuous [6]. Therefore, if we make $\rho$ small enough, with an arbitrary positive number $\varepsilon$ we have $|f(\zeta) - f(z)| < \varepsilon$ for $|\zeta - z| = \rho$. Considering this situation, we have

$$\begin{aligned}
\left|\frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta - f(z)\right| &= \left|\frac{1}{2\pi i}\oint_{\Gamma} \frac{f(\zeta)}{\zeta - z}\,d\zeta - f(z)\right| = \left|\frac{1}{2\pi i}\oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta\right| \\
&\le \frac{1}{2\pi}\oint_{\Gamma} \left|\frac{f(\zeta) - f(z)}{\zeta - z}\right| |d\zeta| < \frac{\varepsilon}{2\pi}\oint_{\Gamma} \frac{|d\zeta|}{|\zeta - z|} = \frac{\varepsilon}{2\pi}\int_0^{2\pi} d\theta = \varepsilon. \qquad (6.94)
\end{aligned}$$

Therefore, we get (6.88) in the limit of $\rho \to 0$. This completes the proof.
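As a plausibility check, (6.88) can be verified numerically by discretizing the contour integral as in (6.76). The following minimal sketch (our own illustration, assuming NumPy; all names are our own choices) recovers $f(z) = e^z$ at an interior point from its values on the unit circle.

    import numpy as np

    theta = np.linspace(0.0, 2.0 * np.pi, 4001)
    zeta = np.exp(1j * theta)              # unit circle, counter-clockwise
    dzeta = 1j * np.exp(1j * theta)        # d(zeta)/d(theta)

    z0 = 0.3 - 0.2j                        # interior point
    f = np.exp                             # entire function f(zeta) = e^zeta

    # Cauchy's integral formula (6.88), trapezoidal rule in theta
    cauchy = np.trapezoid(f(zeta) / (zeta - z0) * dzeta, theta) / (2j * np.pi)
    print(cauchy, np.exp(z0))              # the two values agree to high accuracy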


In the above proof, the domain of analyticity of $\frac{f(\zeta)}{\zeta - z}$ (with respect to $\zeta$) is given by $R \cap [\mathbb{C} - \{z\}] = R - \{z\}$. Since both R and $\mathbb{C} - \{z\}$ are open sets (i.e., $\mathbb{C} - \{z\} = \{z\}^c$ is the complementary set of the closed set $\{z\}$ and, hence, an open set), $R - \{z\}$ is an open set as well; see Sect. 6.1. Notice that $R - \{z\}$ is not simply connected. For this reason, we had to evaluate (6.88) using (6.89).
Theorem 6.10 (Cauchy’s integral theorem) and Theorem 6.11 (Cauchy’s integral
formula) play a crucial role in the theory of analytic functions.
To derive (6.94), we can equally use Darboux's inequality [5]. This inequality is intuitively obvious and frequently used in the theory of analytic functions.
Theorem 6.12: Darboux's Inequality [5] Let $f(z)$ be a function for which $|f(z)|$ is bounded on $C$. Here $C$ is a piecewise continuous path in the complex plane. Then, with the integral $I$ described by

$$I = \int_C f(z)\,dz,$$

we have

$$\left|\int_C f(z)\,dz\right| \le \max|f| \cdot L, \qquad (6.95)$$

where L represents the arc length of the curve C for the contour integration.
Proof As discussed earlier in this section, the integral is the limit as $n \to \infty$ of the sum described by

$$S_n = \sum_{i=1}^{n} f(\zeta_i)(z_i - z_{i-1}) \qquad (6.72)$$

with

$$I = \lim_{n \to \infty} S_n = \lim_{n \to \infty} \sum_{i=1}^{n} f(\zeta_i)(z_i - z_{i-1}). \qquad (6.73)$$

Denoting the maximum modulus of $f(z)$ on $C$ by $\max|f|$, we have

$$|S_n| \le \sum_{i=1}^{n} |f(\zeta_i)| \cdot |z_i - z_{i-1}| \le \max|f| \sum_{i=1}^{n} |z_i - z_{i-1}|. \qquad (6.96)$$

The sum $\sum_{i=1}^{n} |z_i - z_{i-1}|$ on the RHS of inequality (6.96) is the length of a polygon inscribed in the curve $C$. It is shorter than the arc length $L$ of the curve $C$. Hence, for all $n$ we have

$$|S_n| \le \max|f| \cdot L.$$
As $n \to \infty$, $|S_n| = |I| = \left|\int_C f(z)\,dz\right| \le \max|f| \cdot L$. This completes the proof.

Applying Darboux's inequality to (6.94), we get

$$\left|\oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta\right| = \left|\oint_{\Gamma} \frac{f(\zeta) - f(z)}{\rho e^{i\theta}}\,d\zeta\right| \le \max\left|\frac{f(\zeta) - f(z)}{\rho e^{i\theta}}\right| \cdot 2\pi\rho < 2\pi\varepsilon.$$

That is,

$$\left|\frac{1}{2\pi i}\oint_{\Gamma} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta\right| < \varepsilon. \qquad (6.97)$$

So far, we have dealt with differentiation and integration as different mathematical manipulations. However, Cauchy's integral formula (or Cauchy's integral expression) enables us to relate and unify these two manipulations. We have the following proposition for this.

Proposition 6.1 [5] Let $C$ be a piecewise continuous curve of finite length. (The curve $C$ may or may not be closed.) Let $f(z)$ be a continuous function. The contour integration of $f(\zeta)/(\zeta - z)$ gives $\tilde{f}(z)$ such that

$$\tilde{f}(z) = \frac{1}{2\pi i}\int_C \frac{f(\zeta)}{\zeta - z}\,d\zeta. \qquad (6.98)$$

Then, $\tilde{f}(z)$ is analytic at any point $z$ that does not lie on $C$.


Proof We consider the following expression:

$$\Delta \equiv \left|\frac{\tilde{f}(z + \Delta z) - \tilde{f}(z)}{\Delta z} - \frac{1}{2\pi i}\int_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta\right|, \qquad (6.99)$$

where $\Delta$ is a real non-negative number. Describing (6.99) by use of (6.98) for $\tilde{f}(z + \Delta z)$ and $\tilde{f}(z)$, we get

$$\Delta = \frac{|\Delta z|}{2\pi}\left|\int_C \frac{f(\zeta)}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta\right|. \qquad (6.100)$$

To obtain (6.100), in combination with (6.98) we have calculated (6.99) as follows:

$$\Delta = \left|\frac{1}{2\pi i\,\Delta z}\int_C \frac{f(\zeta)(\zeta - z)^2 - f(\zeta)(\zeta - z - \Delta z)(\zeta - z) - f(\zeta)\,\Delta z\,(\zeta - z - \Delta z)}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta\right| = \left|\frac{1}{2\pi i\,\Delta z}\int_C \frac{f(\zeta)(\Delta z)^2}{(\zeta - z - \Delta z)(\zeta - z)^2}\,d\zeta\right|.$$

Since $z$ is not on $C$, the integrand of (6.100) is bounded. Then, as $\Delta z \to 0$, $|\Delta z| \to 0$ and $\Delta \to 0$ accordingly. From (6.99), this ensures the differentiability of $\tilde{f}(z)$. Then, by definition of the differentiation, $\frac{d\tilde{f}(z)}{dz}$ is given by

$$\frac{d\tilde{f}(z)}{dz} = \frac{1}{2\pi i}\int_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta. \qquad (6.101)$$

As $f(\zeta)$ is continuous, $\tilde{f}(z)$ is single-valued. Thus, $\tilde{f}(z)$ is found to be analytic at any point $z$ that does not lie on $C$. This completes the proof.
If in (6.101) we take $C$ as a closed contour that encircles $z$, from (6.98) we have

$$\tilde{f}(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta. \qquad (6.102)$$

Moreover, if $f(z)$ is analytic in a simply connected region that contains $C$, from Theorem 6.11 we must have

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta. \qquad (6.103)$$

Comparing (6.102) with (6.103) and taking account of the single-valuedness of $f(z)$, we must have

$$f(z) \equiv \tilde{f}(z). \qquad (6.104)$$

Then, from (6.101) we get

$$\frac{df(z)}{dz} = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta. \qquad (6.105)$$

An analogous result holds for the $n$-th derivative of $f(z)$ such that [5]

$$\frac{d^n f(z)}{dz^n} = \frac{n!}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^{n+1}}\,d\zeta \quad (n: \text{zero or positive integers}). \qquad (6.106)$$

Equation (6.106) implies that an analytic function is infinitely differentiable and that the derivatives of all orders of an analytic function are again analytic. These prominent properties arise partly from the aforementioned stringent requirement on the differentiability of a function of a complex variable.

So far, we have not assumed the continuity of $df(z)/dz$. However, once we have established (6.106), it assures the presence of, e.g., $d^2f(z)/dz^2$ and, hence, the continuity of $df(z)/dz$. The same is true of $d^nf(z)/dz^n$ for any $n$ (zero or positive integers).
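Relation (6.106) can also be checked numerically. Here is a minimal sketch (our own illustration, assuming NumPy; the helper name nth_derivative is not from the text): the $n$-th derivative of $e^z$ at a point is recovered from the contour integral (6.106) over a circle around that point.

    import numpy as np
    from math import factorial

    def nth_derivative(f, z0, n, radius=1.0, m=4001):
        # n-th derivative via the contour-integral formula (6.106),
        # C: circle of the given radius centered at z0
        theta = np.linspace(0.0, 2.0 * np.pi, m)
        zeta = z0 + radius * np.exp(1j * theta)
        dzeta = 1j * radius * np.exp(1j * theta)
        integral = np.trapezoid(f(zeta) / (zeta - z0) ** (n + 1) * dzeta, theta)
        return factorial(n) * integral / (2j * np.pi)

    z0 = 0.5 + 0.1j
    for n in range(4):
        print(n, nth_derivative(np.exp, z0, n), np.exp(z0))  # all ~ e^{z0}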
The following theorem is important and intriguing in connection with analytic functions.
Theorem 6.13: Cauchy–Liouville Theorem A bounded entire function must be a
constant.
Proof Using (6.106), we consider the first derivative of an entire function $f(z)$ described as

$$\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta.$$

Since $f(z)$ is an entire function, we can arbitrarily choose a large enough circle of radius $R$ centered at $z$ for the closed contour $C$. On the circle, we have

$$\zeta = z + Re^{i\theta},$$

where $\theta$ is a real number changing from 0 to $2\pi$. Then, the above equation can be rewritten as

$$\frac{df}{dz} = \frac{1}{2\pi i}\int_0^{2\pi} \frac{f(\zeta)}{(Re^{i\theta})^2}\,iRe^{i\theta}\,d\theta = \frac{1}{2\pi R}\int_0^{2\pi} \frac{f(\zeta)}{e^{i\theta}}\,d\theta. \qquad (6.107)$$

Taking the absolute value of both sides, we have

$$\left|\frac{df}{dz}\right| \le \frac{1}{2\pi R}\int_0^{2\pi} |f(\zeta)|\,d\theta \le \frac{1}{2\pi R}\int_0^{2\pi} M\,d\theta = \frac{M}{R}, \qquad (6.108)$$

where $M$ is the maximum of $|f(\zeta)|$. As $R \to \infty$, the bound $M/R$ tends to zero. Since $df/dz$ does not depend on $R$, this implies that $df/dz = 0$ at every $z$ and, hence, that $f(z)$ is constant. This completes the proof.
At first glance, the Cauchy–Liouville theorem looks astonishing from the viewpoint of real analysis. It is because we are familiar with, e.g., $-1 \le \sin x \le 1$ for any real number $x$. Note that $\sin x$ is bounded in the real domain. In fact, in Sect. 8.6 we will show from (8.98) and (8.99) that, when $a \to \pm\infty$ ($a$: real, $a \neq 0$), $\sin\left(\frac{\pi}{2} + ia\right)$ takes real values with $\sin\left(\frac{\pi}{2} + ia\right) \to \infty$. This simple example clearly shows that $\sin\phi$ is an unbounded entire function in the complex domain. As a very familiar example of a bounded entire function, we show

$$f(z) = \cos^2 z + \sin^2 z \equiv 1, \qquad (6.109)$$

which is defined in the entire complex plane.
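A one-line numerical check of this unboundedness (our own sketch, assuming NumPy): $\sin(\pi/2 + ia) = \cosh a$, which grows without bound as $a$ increases.

    import numpy as np

    for a in [0.0, 1.0, 5.0, 10.0]:
        val = np.sin(np.pi / 2 + 1j * a)   # complex sine
        print(a, val, np.cosh(a))          # val is real and equals cosh(a)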

6.4 Taylor’s Series and Laurent’s Series

Using Cauchy's integral formula, we study Taylor's series and Laurent's series in relation to the power series expansion of an analytic function. Taylor's series and Laurent's series are fundamental tools for studying various properties of complex functions in the theory of analytic functions. First, let us examine the Taylor's series of an analytic function.

Theorem 6.14 [6] Any analytic function $f(z)$ can be expressed as a uniformly convergent power series around an arbitrary regular point of the function in the domain of analyticity R.
Proof Let us assume that we have a circle $C$ of radius $r$ centered at $a$ within R. Choose $z$ inside $C$ with $|z - a| = \rho$ ($\rho < r$) and consider the contour integration of $f(z)$ along $C$ (Fig. 6.14). Then, we have

$$\frac{1}{\zeta - z} = \frac{1}{\zeta - a - (z - a)} = \frac{1}{\zeta - a}\cdot\frac{1}{1 - \frac{z-a}{\zeta-a}} = \frac{1}{\zeta - a} + \frac{z - a}{(\zeta - a)^2} + \frac{(z - a)^2}{(\zeta - a)^3} + \cdots, \qquad (6.110)$$

Fig. 6.14 Diagram to explain Taylor's expansion that is assumed to be performed around $a$. A circle $C$ of radius $r$ centered at $a$ is inside R. Regarding other symbols and notations, see text
where $\zeta$ is an arbitrary point on the circle $C$. To derive (6.110), we used the following formula:

$$\frac{1}{1 - x} = \sum_{n=0}^{\infty} x^n,$$

where $x = \frac{z-a}{\zeta-a}$. Since $\left|\frac{z-a}{\zeta-a}\right| = \frac{\rho}{r} < 1$, the geometric series of (6.110) converges. Suppose that $f(\zeta)$ is analytic on $C$. Then, we have a finite positive number $M$ on $C$ such that

$$|f(\zeta)| < M. \qquad (6.111)$$

Using (6.110), we have

$$\frac{f(\zeta)}{\zeta - z} = \frac{f(\zeta)}{\zeta - a} + \frac{(z - a)f(\zeta)}{(\zeta - a)^2} + \frac{(z - a)^2 f(\zeta)}{(\zeta - a)^3} + \cdots. \qquad (6.112)$$

Hence, combining (6.111) and (6.112), we get

$$\left|\sum_{\nu=n}^{\infty} \frac{(z - a)^{\nu} f(\zeta)}{(\zeta - a)^{\nu+1}}\right| \le \frac{M}{r}\sum_{\nu=n}^{\infty}\left(\frac{\rho}{r}\right)^{\nu} = \frac{M}{r}\cdot\frac{\left(\frac{\rho}{r}\right)^n}{1 - \frac{\rho}{r}} = \frac{M}{r - \rho}\left(\frac{\rho}{r}\right)^n. \qquad (6.113)$$

The RHS of (6.113) tends to zero as $n \to \infty$. Therefore, (6.112) is uniformly convergent with respect to $\zeta$ on $C$. In the above calculations, when we express the infinite series of the RHS in (6.112) as $\Sigma$ and the partial sum of its first $n$ terms as $\Sigma_n$, the LHS of (6.113) can be expressed as $\Sigma - \Sigma_n\ (\equiv P_n)$. Since $|P_n| \to 0$ with $n \to \infty$, this certainly shows that the series $\Sigma$ is uniformly convergent [6].

Consequently, we can perform termwise integration [6] of (6.112) on $C$ and subsequently divide the result by $2\pi i$ to get

$$\frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta = \sum_{n=0}^{\infty} \frac{(z - a)^n}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta. \qquad (6.114)$$

From Theorem 6.11, the LHS of (6.114) equals $f(z)$. Putting

$$A_n \equiv \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta, \qquad (6.115)$$

we get

$$f(z) = \sum_{n=0}^{\infty} A_n (z - a)^n. \qquad (6.116)$$

This completes the proof.


In the above proof, from (6.106) we have

$$\frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^{n+1}}\,d\zeta = \frac{1}{n!}\frac{d^n f(z)}{dz^n}. \qquad (6.117)$$

Hence, combining (6.117) with (6.115), we get

$$A_n = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta = \frac{1}{n!}\left.\frac{d^n f(z)}{dz^n}\right|_{z=a} = \frac{1}{n!}\frac{d^n f(a)}{dz^n}. \qquad (6.118)$$

Thus, (6.116) can be rewritten as

$$f(z) = \sum_{n=0}^{\infty} \frac{1}{n!}\frac{d^n f(a)}{dz^n}(z - a)^n.$$

This is the same form as that obtained with respect to a real variable.
Equation (6.116) is a uniformly convergent power series called a Taylor’s series
or Taylor’s expansion of f(z) with respect to z = a with An being its coefficient. In
Theorem 6.14 we have assumed that a union of a circle C and its inside is simply
connected. This topological environment is virtually the same as that of Fig. 6.13. In
Theorem 6.11 (Cauchy’s integral formula) we have shown the integral representa-
tion of an analytic function. On the basis of Cauchy’s integral formula, Theorem
6.14 demonstrates that any analytic function is given a tangible functional form.
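Before turning to the examples, note that the coefficient formula (6.115) lends itself to a direct numerical test. The sketch below (our own, assuming NumPy) computes $A_n$ for $f(z) = 1/z$ around $a = 1$ on a small circle and compares with the closed form $(-1)^n/a^{n+1}$ derived in Example 6.2 below.

    import numpy as np

    a, radius = 1.0, 0.5                   # expansion point and contour radius (< |a|)
    theta = np.linspace(0.0, 2.0 * np.pi, 4001)
    zeta = a + radius * np.exp(1j * theta)
    dzeta = 1j * radius * np.exp(1j * theta)

    f = lambda z: 1.0 / z
    for n in range(5):
        # Taylor coefficient A_n by the contour integral (6.115)
        A_n = np.trapezoid(f(zeta) / (zeta - a) ** (n + 1) * dzeta, theta) / (2j * np.pi)
        print(n, A_n, (-1.0) ** n / a ** (n + 1))   # the two columns agree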
Example 6.2 Let us think of the function $f(z) = 1/z$. This function is analytic except for $z = 0$. Therefore, it can be expanded in a Taylor's series in the region $\mathbb{C} - \{0\}$. We consider the expansion around $z = a$ ($a \neq 0$). Using (6.115), we get the expansion coefficients expressed as

$$A_n = \frac{1}{2\pi i}\oint_C \frac{1}{z(z - a)^{n+1}}\,dz,$$

where $C$ is a closed curve that does not contain $z = 0$ on $C$ or in its inside. Therefore, $1/z$ is analytic in the simply connected region encircled by $C$ so that the point $z = 0$ may not be contained inside $C$ (Fig. 6.15).
Then, from (6.106) we get

Fig. 6.15 Diagram used for calculating the Taylor's series of $f(z) = 1/z$ around $z = a$ ($a \neq 0$). A contour $C$ encircles the point $a$

$$\frac{1}{2\pi i}\oint_C \frac{1}{z(z - a)^{n+1}}\,dz = \frac{1}{n!}\left.\frac{d^n(1/z)}{dz^n}\right|_{z=a} = \frac{1}{n!}(-1)^n n!\,\frac{1}{a^{n+1}} = \frac{(-1)^n}{a^{n+1}}.$$

Hence, from (6.116) we have

$$f(z) = \frac{1}{z} = \sum_{n=0}^{\infty} \frac{(-1)^n}{a^{n+1}}(z - a)^n = \frac{1}{a}\sum_{n=0}^{\infty}\left(1 - \frac{z}{a}\right)^n. \qquad (6.119)$$

The series uniformly converges within a convergence circle called a "ball" [7] (vide infra) whose convergence radius $r$ is estimated to be

$$\frac{1}{r} = \limsup_{n \to \infty} \sqrt[n]{\left|\frac{(-1)^n}{a^{n+1}}\right|} = \lim_{n \to \infty}\left(\frac{1}{|a|}\right)^{\frac{n+1}{n}} = \frac{1}{|a|}. \qquad (6.120)$$

That is, $r = |a|$. In (6.120) "$\limsup_{n\to\infty}$" stands for the superior limit and is sometimes called "limes superior" [6]. The estimation is due to the Cauchy-Hadamard theorem. Readers interested in the theorem and related concepts are referred to appropriate literature [6].

The concept of the convergence circle and radius is very important for defining the analyticity of a function. Formally, the convergence radius is defined as the radius of a convergence circle in a metric space, typically $\mathbb{R}^2$ and $\mathbb{C}$. In $\mathbb{C}$ a region R within the convergence circle of convergence radius $r$ (i.e., a real positive number) centered at $z_0$ is denoted by

$$R = \{z;\; z,\ {}^{\exists}z_0 \in \mathbb{C},\ \rho(z, z_0) < r\}.$$
Fig. 6.16 Diagram to explain Laurent's expansion that is performed around $a$. The circle $\Gamma$ containing the point $z$ lies in the annular region bounded by the circles $C_1$ and $C_2$

The region R is an open set by definition (see Sect. 6.1), and the Taylor's series defined in R is convergent. On the other hand, the Taylor's series may or may not be convergent at a point $z$ that is on the convergence circle. In Example 6.2, $f(z) = 1/z$ is not defined at $z = 0$, even though $z\ (=0)$ is on the convergence circle. At other points on the convergence circle, however, the Taylor's series is convergent.

Note that a region $B_n$ in $\mathbb{R}^n$ similarly defined as

$$B_n = \{x;\; x,\ {}^{\exists}x_0 \in \mathbb{R}^n,\ \rho(x, x_0) < r\}$$

is sometimes called a ball or open ball. This concept can readily be extended to any metric space.
Next, we examine Laurent's series of an analytic function.

Theorem 6.15 [6] Let $f(z)$ be analytic in a region R except for a point $a$. Then, $f(z)$ can be described by a uniformly convergent power series within the region $R - \{a\}$.

Proof Let us assume that we have a circle $C_1$ of radius $r_1$ centered at $a$ and another circle $C_2$ of radius $r_2$ centered at $a$, both within R. Moreover, let another circle $\Gamma$ be centered at $z\ (\neq a)$ within R so that $z$ can be outside of $C_2$; see Fig. 6.16. In this situation we consider the contour integration along $C_1$, $C_2$, and $\Gamma$. Using (6.87), we have
$$\frac{1}{2\pi i}\oint_{\Gamma} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \frac{1}{2\pi i}\oint_{C_1} \frac{f(\zeta)}{\zeta - z}\,d\zeta - \frac{1}{2\pi i}\oint_{C_2} \frac{f(\zeta)}{\zeta - z}\,d\zeta. \qquad (6.121)$$

From Theorem 6.11, we have

$$\text{LHS of } (6.121) = f(z).$$

Notice that the region containing $\Gamma$ and its inside forms a simply connected region in terms of the analyticity of $f(z)$. With the first term of the RHS of (6.121), from Theorem 6.14 we have

$$\frac{1}{2\pi i}\oint_{C_1} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \sum_{n=0}^{\infty} A_n (z - a)^n \qquad (6.122)$$

with

$$A_n = \frac{1}{2\pi i}\oint_{C_1} \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta \quad (n = 0, 1, 2, \cdots). \qquad (6.123)$$

In fact, if $\zeta$ lies on the circle $C_1$, the Taylor's series is uniformly convergent as in the case of the proof of Theorem 6.14.

With the second term of the RHS of (6.121), however, the situation is different in such a way that $z$ lies outside the circle $C_2$. In this case, from Fig. 6.16 we have

$$\left|\frac{\zeta - a}{z - a}\right| = \frac{r_2}{\rho} < 1. \qquad (6.124)$$

Then, a geometric series described by

$$\frac{1}{\zeta - z} = \frac{1}{\zeta - a - (z - a)} = \frac{-1}{z - a}\left[1 + \frac{\zeta - a}{z - a} + \frac{(\zeta - a)^2}{(z - a)^2} + \cdots\right] \qquad (6.125)$$

is uniformly convergent with respect to $\zeta$ on $C_2$. Accordingly, again we can perform termwise integration [6] of (6.125) on $C_2$. Hence, we get

$$-\oint_{C_2} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \frac{1}{z - a}\oint_{C_2} f(\zeta)\,d\zeta + \frac{1}{(z - a)^2}\oint_{C_2} f(\zeta)(\zeta - a)\,d\zeta + \frac{1}{(z - a)^3}\oint_{C_2} f(\zeta)(\zeta - a)^2\,d\zeta + \cdots. \qquad (6.126)$$

That is, we have

$$-\frac{1}{2\pi i}\oint_{C_2} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \sum_{n=1}^{\infty} A_{-n}(z - a)^{-n}, \qquad (6.127)$$

where

$$A_{-n} = \frac{1}{2\pi i}\oint_{C_2} f(\zeta)(\zeta - a)^{n-1}\,d\zeta \quad (n = 1, 2, 3, \cdots). \qquad (6.128)$$

Consequently, from (6.121), with $z$ lying in the annular region bounded by $C_1$ and $C_2$, we get

$$f(z) = \sum_{n=-\infty}^{\infty} A_n (z - a)^n. \qquad (6.129)$$

In (6.129), the coefficients $A_n$ are given by (6.123) or (6.128) according to the plus (including zero) or minus sign of $n$. These complete the proof.
Equation (6.129) is a uniformly convergent power series called a Laurent's series or Laurent's expansion of $f(z)$ with respect to $z = a$ with $A_{-n}$ being its coefficient. Note that the above annular region bounded by $C_1$ and $C_2$ is not simply connected, but multiply connected.

We add that if the integration path $\tilde{C}$ is taken in the annular region formed by $C_1$ and $C_2$ so that it can be sandwiched in between them, the values of the integrals expressed as (6.123) and (6.128) remain unchanged in virtue of (6.87). That is, instead of (6.123) and (6.128) we may have the following unified formula

$$A_n = \frac{1}{2\pi i}\oint_{\tilde{C}} \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta \quad (n = 0, \pm 1, \pm 2, \cdots) \qquad (6.130)$$

that represents the coefficients $A_n$ of the power series

$$f(z) = \sum_{n=-\infty}^{\infty} A_n (z - a)^n. \qquad (6.129)$$

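The unified formula (6.130) is easy to test numerically, since a single circle in the annulus yields all the coefficients. A minimal sketch (our own, assuming NumPy) for $f(z) = e^z/z^2$ around $a = 0$, whose Laurent coefficients are $A_n = 1/(n+2)!$ for $n \ge -2$ and zero otherwise:

    import numpy as np
    from math import factorial

    theta = np.linspace(0.0, 2.0 * np.pi, 4001)
    zeta = np.exp(1j * theta)              # unit circle inside the annulus
    dzeta = 1j * np.exp(1j * theta)

    f = lambda z: np.exp(z) / z**2
    for n in range(-3, 3):
        # Laurent coefficient A_n from the unified formula (6.130)
        A_n = np.trapezoid(f(zeta) / zeta ** (n + 1) * dzeta, theta) / (2j * np.pi)
        exact = 1.0 / factorial(n + 2) if n >= -2 else 0.0
        print(n, A_n.real, exact)          # imaginary parts are ~ 0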
6.5 Zeros and Singular Points

We described the fundamental theorems of Taylor's expansion and Laurent's expansion. Next, we explain the important concepts of zeros and singular points.

Definition 6.11 Let $f(z)$ be analytic in a region R. If $f(z)$ vanishes at a point $z = a$, the point is called a zero of $f(z)$. When we have

$$f(a) = \left.\frac{df(z)}{dz}\right|_{z=a} = \left.\frac{d^2 f(z)}{dz^2}\right|_{z=a} = \cdots = \left.\frac{d^{n-1} f(z)}{dz^{n-1}}\right|_{z=a} = 0, \qquad (6.131)$$

but

$$\left.\frac{d^n f(z)}{dz^n}\right|_{z=a} \neq 0, \qquad (6.132)$$

the function is said to have a zero of order $n$ at $z = a$.

If (6.131) and (6.132) hold, the first $n$ coefficients in the Taylor's series of the analytic function $f(z)$ at $z = a$ vanish. Hence, we have

$$f(z) = A_n(z - a)^n + A_{n+1}(z - a)^{n+1} + \cdots = (z - a)^n \sum_{k=0}^{\infty} A_{n+k}(z - a)^k = (z - a)^n h(z),$$

where we define $h(z)$ as

$$h(z) \equiv \sum_{k=0}^{\infty} A_{n+k}(z - a)^k. \qquad (6.133)$$

Then, $h(z)$ is analytic and non-vanishing at $z = a$. From the analyticity, $h(z)$ must be continuous at $z = a$ and differ from zero in some finite neighborhood of $z = a$. Consequently, within that neighborhood $f(z)$ has no zero other than $z = a$. Therefore, if the set of zeros had an accumulation point at $z = a$, any neighborhood of $z = a$ would contain another zero, in contradiction to the above assumption; to avoid this contradiction, the analytic function must satisfy $f(z) \equiv 0$ throughout the region R. Taking the contraposition of this statement, if an analytic function is not identically zero [i.e., $f(z) \not\equiv 0$], the zeros of that function are isolated; see the relevant discussion of Sect. 6.1.2.
Meanwhile, the Laurent's series (6.129) can be rewritten as

$$f(z) = \sum_{k=0}^{\infty} A_k(z - a)^k + \sum_{k=1}^{\infty} A_{-k}\frac{1}{(z - a)^k} \qquad (6.134)$$

so that the constitution of the Laurent's series can explicitly be recognized. The second term is called the singular part (or principal part) of $f(z)$ at $z = a$. The singularity is classified as follows:

(1) Removable singularity:
If (6.134) lacks the singular part (i.e., $A_{-k} = 0$ for $k = 1, 2, \cdots$), the singular point is said to be a removable singularity. In this case, we have

$$f(z) = \sum_{k=0}^{\infty} A_k(z - a)^k. \qquad (6.135)$$

If in (6.135) we suitably define

$$f(a) \equiv A_0,$$

$f(z)$ can be regarded as analytic. Examples include the following case where the sine function is expanded in a Taylor's series:

$$\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots + \frac{(-1)^{n-1} z^{2n-1}}{(2n-1)!} + \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n-1} z^{2n-1}}{(2n-1)!}. \qquad (6.136)$$

Hence, if we define

$$f(z) \equiv \frac{\sin z}{z} \quad \text{and} \quad f(0) \equiv \lim_{z \to 0} \frac{\sin z}{z} = 1, \qquad (6.137)$$

$z = 0$ is a removable singularity of $f(z)$.

(2) Poles:
Suppose that in (6.134) we have a certain positive integer $n$ ($n \ge 1$) such that

$$A_{-n} \neq 0 \quad \text{but} \quad A_{-(n+1)} = A_{-(n+2)} = \cdots = 0. \qquad (6.138)$$

Then, the function $f(z)$ is said to have a pole of order $n$ at $z = a$. If, in particular, $n = 1$ in (6.138), i.e.,

$$A_{-1} \neq 0 \quad \text{but} \quad A_{-2} = A_{-3} = \cdots = 0, \qquad (6.139)$$

the function $f(z)$ is said to have a simple pole. This special case is important in the calculation of various integrals (vide infra).
In the general case of (6.138), (6.134) can be rewritten as

$$f(z) = \frac{g(z)}{(z - a)^n}, \qquad (6.140)$$

where $g(z)$ defined as

$$g(z) \equiv \sum_{k=0}^{\infty} A_{k-n}(z - a)^k \qquad (6.141)$$

is analytic and non-vanishing at $z = a$, i.e., $A_{-n} \neq 0$. Explicitly writing (6.140), we have

$$f(z) = \sum_{k=0}^{\infty} A_k(z - a)^k + \sum_{k=1}^{n} A_{-k}\frac{1}{(z - a)^k}.$$

(3) Essential singularity:
If the singular part of (6.134) comprises an infinite series, the point $z = a$ is called an essential singularity. In that case, $f(z)$ exhibits complicated behavior near or at $z = a$. Interested readers are referred to suitable literature [5].
A function f(z) that is analytic in a region within ℂ except for a set of points where
the function has poles is called a meromorphic function in the said region. The above
definition of meromorphic functions is true of the above Cases (1) and (2), but it is
not true of Case (3). Henceforth, we will be dealing with the meromorphic functions.
In the above discussion of Cases (1) and (2), we often deal with a function f(z) that
has a single isolated pole at z = a. This implies that f(z) is analytic within a certain
neighborhood Na of z = a but is not analytic at z = a. More specifically, f(z) is
analytic in a region Na - {a}. Even though we consider more than one isolated pole,
the situation is essentially the same. Suppose that there is another isolated pole at
z = b. In that case, again take a certain neighborhood Nb of z = b (≠a) and f(z) is
analytic in a region Nb - {b}.
Readers may well wonder why we have to discuss this seemingly trifling issue. Nevertheless, think of the situation where a set of poles has an accumulation point. Any neighborhood of the accumulation point contains another pole where the function is not analytic. This contradicts the fact that $f(z)$ is analytic in a region, e.g., $N_a - \{a\}$. Thus, if such a function were present, it would be intractable to deal with.

6.6 Analytic Continuation

When we discussed Cauchy's integral formula (Theorem 6.11), we saw that if a function is analytic in a certain region of $\mathbb{C}$ and on a curve $C$ that encircles the region, the values of the function within the region are determined once the values of the function on $C$ are given. We have the following theorem for this.
Theorem 6.16 Let $f_1(z)$ and $f_2(z)$ be two functions that are analytic within a region R. Suppose that the two functions coincide in a set chosen from among (i) a neighborhood of a point $z \in R$, or (ii) a segment of a curve lying in R, or (iii) a set of points containing an accumulation point belonging to R. Then, the two functions $f_1(z)$ and $f_2(z)$ coincide throughout R.

Proof Suppose that $f_1(z)$ and $f_2(z)$ coincide on the above set chosen from the three. Then, since $f_1(z) - f_2(z) = 0$ on that set, the set comprises the zeros of $f_1(z) - f_2(z)$. This implies that the zeros are not isolated in R. Thus, as evidenced by the discussion of Sect. 6.5, we conclude that $f_1(z) - f_2(z) \equiv 0$ throughout R. That is, $f_1(z) \equiv f_2(z)$. This completes the proof.
Fig. 6.17 Four convergence circles $C_1$, $C_2$, $C_3$, and $C_4$ for $1/z$. These convergence circles are centered at $1$, $i$, $-1$, or $-i$

The theorem can be understood as follows. Two different analytic functions


cannot coincide in the set chosen from the above three (or more succinctly, a set
that contains an accumulation point). In other words, the behavior of an analytic
function in R is uniquely determined by the limited information of the subset
belonging to R .
The above statement is reminiscent of the pre-established harmony and, hence, is occasionally spoken of as if it were mysterious. Instead, we wish to give a simple example.
Example 6.3 In Example 6.2 we described a tangible illustration of the Taylor's expansion of the function $1/z$ as follows:

$$\frac{1}{z} = \sum_{n=0}^{\infty} \frac{(-1)^n}{a^{n+1}}(z - a)^n = \frac{1}{a}\sum_{n=0}^{\infty}\left(1 - \frac{z}{a}\right)^n. \qquad (6.119)$$

If we choose $1$, $i$, $-1$, or $-i$ for $a$, we can draw four convergence circles as depicted in Fig. 6.17. Let each function described by (6.119) be $f_1(z)$, $f_i(z)$, $f_{-1}(z)$, or $f_{-i}(z)$. Let R be a region that consists of the outer periphery and the inside of the four circles $C_1$, $C_2$, $C_3$, and $C_4$ (i.e., convergence circles of radius 1). The said region R is colored pale green in Fig. 6.17. There are four petals as shown.

We can view Fig. 6.17 as follows. For instance, $f_1(z)$ and $f_i(z)$ are defined in $C_1$ and its inside and $C_2$ and its inside, respectively, excluding $z = 0$. We have $f_1(z) = f_i(z)$ within the petal $P_1$ (the overlapped part as shown). Consequently, from the statement of Theorem 6.16, $f_1(z) \equiv f_i(z)$ throughout the region encircled by $C_1$ and $C_2$ (excluding $z = 0$). Repeating this procedure, we get

$$f_1(z) \equiv f_i(z) \equiv f_{-1}(z) \equiv f_{-i}(z).$$

Thus, we obtain the same functional entity throughout R except for the origin ($z = 0$). Further continuing similar procedures, we finally reach the entire complex plane except for the origin, i.e., $\mathbb{C} - \{0\}$, regarding the domain of analyticity of $1/z$.

We further compare the above results with those for an analytic function defined in the real domain. The Taylor's expansion of $f(x) = 1/x$ ($x$: real with $x \neq 0$) around $a$ ($a \neq 0$) reads as

$$f(x) = \frac{1}{x} = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n}{a^{n+1}}(x - a)^n. \qquad (6.142)$$

This is exactly the same form as that of (6.119) aside from the notation of the argument.
The aforementioned procedure that determines the behavior of an analytic func-
tion outside the region where that function has been originally defined is called
analytic continuation or analytic prolongation. Comparing (6.119) and (6.142), we
can see that (6.119) is a consequence of the analytic continuation of 1/x from the real
domain to the complex domain.

6.7 Calculus of Residues

The theorem of residues and its application to the calculus of various integrals are among the central themes of the theory of analytic functions.

Cauchy's integral theorem (Theorem 6.10) tells us that if $f(z)$ is analytic in a simply connected region R, we have

$$\oint_C f(z)\,dz = 0, \qquad (6.84)$$

where $C$ is a closed curve within R. As we have already seen, this is not necessarily the case if $f(z)$ has singular points in R. Meanwhile, if we look at (6.130), we become aware of a simple but important fact. That is, setting $n = -1$, we have

$$A_{-1} = \frac{1}{2\pi i}\oint_C f(\zeta)\,d\zeta \quad \text{or} \quad \oint_C f(\zeta)\,d\zeta = 2\pi i A_{-1}. \qquad (6.143)$$

This implies that if we can obtain the Laurent's series of $f(z)$ with respect to a singularity located at $z = a$ and encircled by $C$, we can immediately estimate $\oint_C f(\zeta)\,d\zeta$ of (6.143) from the coefficient of the term $(z - a)^{-1}$, i.e., $A_{-1}$. This fact further
implies that even though $f(z)$ has singularities, $\oint_C f(\zeta)\,d\zeta$ may happen to vanish. Thus, whether (6.84) vanishes depends on the nature of $f(z)$ and its singularities. To examine it, we define a residue of $f(z)$ at $z = a$ (that is encircled by $C$) as

$$\operatorname{Res} f(a) \equiv \frac{1}{2\pi i}\oint_C f(z)\,dz \quad \text{or} \quad \oint_C f(z)\,dz = 2\pi i \operatorname{Res} f(a), \qquad (6.144)$$

where $\operatorname{Res} f(a)$ denotes the residue of $f(z)$ at $z = a$. From (6.143) and (6.144), we have

$$A_{-1} = \operatorname{Res} f(a). \qquad (6.145)$$

In a formal sense, the point $a$ does not have to be a singular point. If it is a regular point of $f(z)$, we have $\operatorname{Res} f(a) = 0$ trivially. If there is more than one singularity, at $z = a_j$, we have

$$\oint_C f(z)\,dz = 2\pi i \sum_j \operatorname{Res} f(a_j). \qquad (6.146)$$

Notice that we assume isolated singularities in (6.144) and (6.146). These equations are associated with Cases (1) and (2) dealt with in Sect. 6.5. Within this framework, we wish to evaluate the residue of $f(z)$ at a pole of order $n$ located at $z = a$. In Sect. 6.5, we had $f(z)$ described by

$$f(z) = \frac{g(z)}{(z - a)^n}, \qquad (6.140)$$

where $g(z)$ is analytic and non-vanishing at $z = a$. Inserting (6.140) into (6.144), we have

$$\operatorname{Res} f(a) = \frac{1}{2\pi i}\oint_C \frac{g(\zeta)}{(\zeta - a)^n}\,d\zeta = \frac{1}{(n-1)!}\left.\frac{d^{n-1} g(z)}{dz^{n-1}}\right|_{z=a} = \frac{1}{(n-1)!}\left.\frac{d^{n-1}[f(z)(z - a)^n]}{dz^{n-1}}\right|_{z=a}, \qquad (6.147)$$

where with the second equality we used (6.106).

In the special but important case where $f(z)$ has a simple pole at $z = a$, from (6.147) we get

$$\operatorname{Res} f(a) = f(z)(z - a)|_{z=a}. \qquad (6.148)$$

Setting $n = 1$ in (6.147) in combination with (6.140), we obtain

$$\operatorname{Res} f(a) = \frac{1}{2\pi i}\oint_C f(z)\,dz = \frac{1}{2\pi i}\oint_C \frac{g(z)}{z - a}\,dz = f(z)(z - a)|_{z=a} = g(z)|_{z=a}.$$

This is nothing but Cauchy's integral formula (6.88) applied to $g(z)$.

Summarizing the above discussion, we can calculate the residue of $f(z)$ at a pole of order $n$ located at $z = a$ using one of the following alternatives, i.e., (i) using (6.147) or (ii) picking up the coefficient $A_{-1}$ in the Laurent's series described by (6.129).
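Both routes can be exercised numerically. A minimal sketch (our own, assuming NumPy): for $f(z) = \cos z/z^3$, which has a pole of order 3 at $z = 0$, formula (6.147) gives $\operatorname{Res} f(0) = \frac{1}{2!}\frac{d^2\cos z}{dz^2}\big|_{z=0} = -\frac{1}{2}$, and the contour integral of (6.144) reproduces the same value.

    import numpy as np

    # Route (ii): Res f(0) as the contour integral (6.144) on a small circle
    theta = np.linspace(0.0, 2.0 * np.pi, 4001)
    z = 0.5 * np.exp(1j * theta)
    dz = 0.5j * np.exp(1j * theta)
    f = lambda w: np.cos(w) / w**3
    res = np.trapezoid(f(z) * dz, theta) / (2j * np.pi)
    print(res)   # ~ -0.5, matching (1/2!) d^2(cos z)/dz^2 at z = 0 from (6.147)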
In Sect. 6.3 we mentioned that the closed contour integral (6.86) or (6.87) depends on the nature of the singularity. Here we have reached a simple criterion for evaluating that integral. In fact, suppose that we have a Laurent's expansion such that

$$f(z) = \sum_{n=-\infty}^{\infty} A_n (z - a)^n. \qquad (6.129)$$

Then, let us calculate the contour integral of $f(z)$ on a closed circle $\Gamma$ of radius $\rho$ centered at $z = a$. We have

$$I \equiv \oint_{\Gamma} f(\zeta)\,d\zeta = \sum_{n=-\infty}^{\infty} A_n\oint_{\Gamma} (\zeta - a)^n\,d\zeta = \sum_{n=-\infty}^{\infty} iA_n \rho^{n+1}\int_0^{2\pi} e^{i(n+1)\theta}\,d\theta. \qquad (6.149)$$

The above integral vanishes except for $n = -1$. With $n = -1$ we get

$$I = 2\pi i A_{-1}.$$

Thus, we recover (6.145) in combination with (6.144). To get a Laurent's series of $f(z)$, however, is not necessarily easy. In that case, we estimate a residue by (6.147). Tangible examples can be seen in the next section.

6.8 Examples of Real Definite Integrals

The calculus of residues is of great practical importance. For instance, it is directly


associated with the calculations of definite integrals. In particular, even if one finds
the real integration to be hard to perform in order to solve a related problem, it is
often easy to solve it using complex integration, especially the calculus of residues.
Here we study several examples.
Fig. 6.18 Contour for the integration of $1/(1+z^2)$ that appears in Example 6.4. One may equally choose the upper semicircle (denoted by $\Gamma_R$) and the lower semicircle (denoted by $\tilde{\Gamma}_R$) for the contour integration

Example 6.4 Let us consider the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{dx}{1 + x^2}. \qquad (6.150)$$

To this end, we convert the real variable $x$ to the complex variable $z$ and evaluate a contour integral $I_C$ (Fig. 6.18) described by

$$I_C = \int_{-R}^{R} \frac{dz}{1 + z^2} + \int_{\Gamma_R} \frac{dz}{1 + z^2}, \qquad (6.151)$$

where $R$ is a sufficiently large real positive number ($R \gg 1$); $\Gamma_R$ denotes the upper semicircle; $I_C$ stands for the contour integral along the closed curve $C$ that comprises the interval $[-R, R]$ and the upper semicircle $\Gamma_R$. Simple poles of order 1 are located at $z = \pm i$ as shown. There is only one simple pole, at $z = i$, within the upper semicircle of Fig. 6.18. From (6.146), therefore, we have

$$I_C = \oint_C f(z)\,dz = 2\pi i \operatorname{Res} f(i), \qquad (6.152)$$

where $f(z) \equiv \frac{1}{1+z^2}$. Using (6.148), the residue of $f(z)$ at $z = i$ can readily be estimated to be

$$\operatorname{Res} f(i) = f(z)(z - i)|_{z=i} = \frac{1}{2i}. \qquad (6.153)$$

To estimate the integral (6.150), we must calculate the second term of (6.151). To this end, we change the variable $z$ such that

$$z = Re^{i\theta}, \qquad (6.154)$$

where $\theta$ is an argument. Then, using Darboux's inequality (6.95), we get

$$\begin{aligned}
\left|\int_{\Gamma_R} \frac{dz}{1 + z^2}\right| &= \left|\int_0^{\pi} \frac{iRe^{i\theta}\,d\theta}{R^2 e^{2i\theta} + 1}\right| \le \int_0^{\pi} \left|\frac{iRe^{i\theta}}{R^2 e^{2i\theta} + 1}\right| d\theta = \int_0^{\pi} \frac{R\,d\theta}{\sqrt{(R^2 e^{2i\theta} + 1)(R^2 e^{-2i\theta} + 1)}} \\
&= R\int_0^{\pi} \frac{d\theta}{\sqrt{R^4 + 2R^2\cos 2\theta + 1}} \le R\,\frac{\pi}{R^2\left(1 - \frac{1}{R^2}\right)} = \frac{\pi}{R\left(1 - \frac{1}{R^2}\right)}.
\end{aligned}$$

Taking $R \to \infty$, we have

$$\int_{\Gamma_R} \frac{dz}{1 + z^2} \to 0. \qquad (6.155)$$

Consequently, when $R \to \infty$ we have

$$\lim_{R \to \infty} I_C = \int_{-\infty}^{\infty} \frac{dx}{1 + x^2} + \lim_{R \to \infty}\int_{\Gamma_R} \frac{dz}{1 + z^2} = \int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = I = 2\pi i \operatorname{Res} f(i) = \pi, \qquad (6.156)$$

where with the first equality we placed the variable back to $x$; with the last equality we used (6.152) and (6.153). Equation (6.156) gives the answer to (6.150). Notice in general that if a meromorphic function is given as a quotient of two polynomials, an integral of the type of (6.155) tends to zero as $R \to \infty$ in the case where the degree of the denominator of that function is at least two units higher than the degree of the numerator [8].

We may equally choose another contour $\tilde{C}$ (the closed curve comprising the interval $[-R, R]$ and the lower semicircle $\tilde{\Gamma}_R$) for the integration (Fig. 6.18). In that case, we have

$$\lim_{R \to \infty} I_{\tilde{C}} = \int_{\infty}^{-\infty} \frac{dx}{1 + x^2} + \lim_{R \to \infty}\int_{\tilde{\Gamma}_R} \frac{dz}{1 + z^2} = \int_{\infty}^{-\infty} \frac{dx}{1 + x^2} = 2\pi i \operatorname{Res} f(-i) = -\pi. \qquad (6.157)$$

Note that in the above equation the contour integral is taken in the counter-clockwise direction and, hence, the real definite integral has been taken from $\infty$ to $-\infty$. Notice also that
$$\operatorname{Res} f(-i) = f(z)(z + i)|_{z=-i} = -\frac{1}{2i},$$

because $f(z)$ has a simple pole at $z = -i$ in the lower semicircle (Fig. 6.18), for which the residue has been taken. From (6.157), we get the same result as before such that

$$\int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = \pi.$$

Alternatively, we can use the method of partial fraction decomposition. In that case, as the integrand we have

$$\frac{1}{1 + z^2} = \frac{1}{2i}\left[\frac{1}{z - i} - \frac{1}{z + i}\right]. \qquad (6.158)$$

The residue of $\frac{1}{1+z^2}$ at $z = i$ is $\frac{1}{2i}$ [i.e., the coefficient of $\frac{1}{z-i}$ in (6.158)], giving the same result as (6.153).
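As a cross-check of Example 6.4 (our own sketch, assuming SciPy), direct numerical quadrature of (6.150) reproduces the residue result $\pi$:

    import numpy as np
    from scipy.integrate import quad

    val, err = quad(lambda x: 1.0 / (1.0 + x**2), -np.inf, np.inf)
    print(val, np.pi)   # 3.141592653589793 vs pi; quad handles the infinite range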
Example 6.5 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{dx}{(1 + x^2)^3}. \qquad (6.159)$$

As in the case of Example 6.4, we evaluate a contour integration described by

$$I_C = \oint_C \frac{dz}{(1 + z^2)^3} = \int_{-R}^{R} \frac{dz}{(1 + z^2)^3} + \int_{\Gamma_R} \frac{dz}{(1 + z^2)^3}, \qquad (6.160)$$

where $C$ stands for the closed curve comprising the interval $[-R, R]$ and the upper semicircle $\Gamma_R$ (see Fig. 6.18).

The function $f(z) \equiv \frac{1}{(1+z^2)^3}$ has two isolated poles at $z = \pm i$. In the present case, however, the poles are of order 3. Hence, we use (6.147) to estimate the residue. The inside of the upper semicircle contains the pole of order 3 at $z = i$. Therefore, we have

$$\operatorname{Res} f(i) = \frac{1}{2\pi i}\oint_C f(z)\,dz = \frac{1}{2\pi i}\oint_C \frac{g(z)}{(z - i)^3}\,dz = \frac{1}{2!}\left.\frac{d^2 g(z)}{dz^2}\right|_{z=i}, \qquad (6.161)$$

where

$$g(z) \equiv \frac{1}{(z + i)^3} \quad \text{and} \quad f(z) = \frac{g(z)}{(z - i)^3}.$$

Hence, we get

$$\frac{1}{2!}\left.\frac{d^2 g(z)}{dz^2}\right|_{z=i} = \frac{1}{2}(-3)(-4)(z + i)^{-5}\Big|_{z=i} = \frac{1}{2}(-3)(-4)(2i)^{-5} = \frac{3}{16i} = \operatorname{Res} f(i). \qquad (6.162)$$

For a reason similar to that of Example 6.4, the second term of (6.160) vanishes when $R \to \infty$. Then, we obtain

$$\lim_{R \to \infty} I_C = \int_{-\infty}^{\infty} \frac{dz}{(1 + z^2)^3} + \lim_{R \to \infty}\int_{\Gamma_R} \frac{dz}{(1 + z^2)^3} = 2\pi i \operatorname{Res} f(i) = 2\pi i \times \frac{3}{16i} = \frac{3\pi}{8}.$$

That is,

$$\int_{-\infty}^{\infty} \frac{dz}{(1 + z^2)^3} = \int_{-\infty}^{\infty} \frac{dx}{(1 + x^2)^3} = I = \frac{3\pi}{8}.$$

If we are able to perform the Laurent's expansion of $f(z)$, we can immediately evaluate the residue using (6.145). To this end, let us utilize the binomial expansion formula (or generalized binomial theorem). We have

$$f(z) = \frac{1}{(1 + z^2)^3} = \frac{1}{(z + i)^3(z - i)^3}.$$

Changing the variable $z - i = \zeta$, we have

$$\begin{aligned}
f(z) = \tilde{f}(\zeta) &= \frac{1}{\zeta^3(\zeta + 2i)^3} = \frac{1}{\zeta^3}\cdot\frac{1}{(2i)^3}\left(1 + \frac{\zeta}{2i}\right)^{-3} \\
&= \frac{1}{\zeta^3}\cdot\frac{1}{(2i)^3}\left[1 + (-3)\frac{\zeta}{2i} + \frac{(-3)(-4)}{2!}\left(\frac{\zeta}{2i}\right)^2 + \frac{(-3)(-4)(-5)}{3!}\left(\frac{\zeta}{2i}\right)^3 + \cdots\right] \\
&= -\frac{1}{8i}\left[\frac{1}{\zeta^3} - \frac{3}{2i\zeta^2} - \frac{3}{2\zeta} + \frac{5}{4i} + \cdots\right], \qquad (6.163)
\end{aligned}$$

where $\cdots$ of the above equation denotes a power series in $\zeta$. Accompanied by the variable transformation from $z$ to $\zeta$, the functional form $f$ has been changed to $\tilde{f}$. Getting back to the original functional form, we have

$$f(z) = -\frac{1}{8i}\left[\frac{1}{(z - i)^3} - \frac{3}{2i(z - i)^2} - \frac{3}{2(z - i)} + \frac{5}{4i} + \cdots\right]. \qquad (6.164)$$

Equation (6.164) is a Laurent's expansion of $f(z)$. From (6.164) we see that the coefficient of $\frac{1}{z-i}$ in $f(z)$ is $\frac{3}{16i}$, in agreement with (6.145) and (6.162). To obtain the answer of (6.159), once again we may equally choose the contour $\tilde{C}$ (Fig. 6.18) and get the same result as in the case of Example 6.4. In that case, $f(z)$ can be expanded into a Laurent's series around $z = -i$. Readers are encouraged to check it.
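A numerical counterpart of Example 6.5 (our own sketch, assuming NumPy/SciPy): quadrature of (6.159) and the Laurent coefficient $A_{-1}$ extracted via (6.130) agree with $3\pi/8$ and $3/(16i)$, respectively.

    import numpy as np
    from scipy.integrate import quad

    # Real quadrature of (6.159)
    val, _ = quad(lambda x: 1.0 / (1.0 + x**2) ** 3, -np.inf, np.inf)
    print(val, 3.0 * np.pi / 8.0)

    # A_{-1} of f around z = i, via (6.130) on the small circle |z - i| = 1/2
    theta = np.linspace(0.0, 2.0 * np.pi, 4001)
    z = 1j + 0.5 * np.exp(1j * theta)
    dz = 0.5j * np.exp(1j * theta)
    A_m1 = np.trapezoid(dz / (1.0 + z**2) ** 3, theta) / (2j * np.pi)
    print(A_m1, 3.0 / (16.0 * 1j))   # both ~ -0.1875j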
We make a few remarks about the (generalized) binomial expansion formula. We described it in (3.181) of Sect. 3.6.1 such that

$$(1 + x)^{-\lambda} = \sum_{m=0}^{\infty} \binom{-\lambda}{m} x^m, \qquad (3.181)$$

where $\lambda$ is an arbitrary real number. In (3.181), we assumed $x$ to be a real number with $|x| < 1$. However, (6.163) suggests that (3.181) holds with $x$ that can be a complex number with $|x| < 1$. Moreover, $\lambda$ in (3.181) is allowed to be any complex number. Then, on those conditions $(1 + x)^{-\lambda}$ is analytic because $1 + x \neq 0$, and so $(1 + x)^{-\lambda}$ must have a convergent Taylor's series. By the same token, the factor $\left(1 + \frac{\zeta}{2i}\right)^{-3}$ of (6.163) yields a convergent Taylor's series around $\zeta = 0$. In virtue of the factor $\frac{1}{\zeta^3}$, (6.163) gives a convergent Laurent's series around $\zeta = 0$.

Returning to the present issue, we rewrite (3.181) as a Taylor's series such that

$$(1 + z)^{-\lambda} = \sum_{m=0}^{\infty} \binom{-\lambda}{m} z^m, \qquad (6.165)$$

where $\lambda$ is any complex number, with $z$ also being a complex number of $|z| < 1$. We may view (6.165) as a consequence of the analytic continuation, and (6.163) has indeed been dealt with as such.
Fig. 6.19 Graphical form of $f(x) \equiv \frac{1}{1+x^3}$. The modulus of $f(x)$ tends to infinity at $x = -1$. Inflection points are present at $x = 0$ and $\sqrt[3]{1/2}\ (\approx 0.7937)$

Example 6.6 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{dx}{1 + x^3}. \qquad (6.166)$$

We put

$$f(x) \equiv \frac{1}{1 + x^3}. \qquad (6.167)$$

Figure 6.19 depicts a graphical form of $f(x)$. The modulus of $f(x)$ tends to infinity at $x = -1$. Inflection points are present at $x = 0$ and $\sqrt[3]{1/2}\ (\approx 0.7937)$.

Now, in the former two examples, the isolated singular points (i.e., poles) existed only in the inside of a closed contour. In the present case, however, a pole is located on the real axis. Bearing in mind this situation, we wish to estimate the integral (6.166). Rewriting (6.166), we have

$$I = \int_{-\infty}^{\infty} \frac{dx}{(1 + x)(x^2 - x + 1)} \quad \text{or} \quad I = \int_{-\infty}^{\infty} \frac{dz}{(1 + z)(z^2 - z + 1)}.$$
Fig. 6.20 Contour for the integration of $\frac{1}{1+z^3}$ that appears in Example 6.6

The polynomial of the denominator, i.e., $(1 + z)(z^2 - z + 1)$, has a real root at $z = -1$ and complex roots at $z = e^{\pm i\pi/3} = \frac{1 \pm \sqrt{3}i}{2}$. Therefore,

$$f(z) \equiv \frac{1}{(1 + z)(z^2 - z + 1)}$$

has three simple poles at $z = -1$ along with $z = e^{\pm i\pi/3}$. This time, we have a contour $C$ depicted in Fig. 6.20, where the contour integration $I_C$ is performed in such a way that

$$I_C = \int_{-R}^{-1-r} \frac{dz}{(1 + z)(z^2 - z + 1)} + \int_{\Gamma_{-1}} \frac{dz}{(1 + z)(z^2 - z + 1)} + \int_{-1+r}^{R} \frac{dz}{(1 + z)(z^2 - z + 1)} + \int_{\Gamma_R} \frac{dz}{(1 + z)(z^2 - z + 1)}, \qquad (6.168)$$

where $\Gamma_{-1}$ stands for a small semicircle of radius $r$ around $z = -1$ as shown. Of the complex roots $z = e^{\pm i\pi/3}$, only $z = e^{i\pi/3}$ is responsible for the contour integration (see Fig. 6.20).

Here we define a principal value $P$ of the integral such that
$$P\int_{-\infty}^{\infty} \frac{dx}{(1 + x)(x^2 - x + 1)} \equiv \lim_{r \to 0}\left[\int_{-\infty}^{-1-r} \frac{dx}{(1 + x)(x^2 - x + 1)} + \int_{-1+r}^{\infty} \frac{dx}{(1 + x)(x^2 - x + 1)}\right], \qquad (6.169)$$

where $r$ is an arbitrary real positive number. As shown in this example, the principal value is a convenient device to traverse the poles on the contour and gives a correct answer in the contour integral. At the same time, letting $R$ go to infinity, we obtain

$$\lim_{R \to \infty} I_C = P\int_{-\infty}^{\infty} \frac{dx}{(1 + x)(x^2 - x + 1)} + \int_{\Gamma_{-1}} \frac{dz}{(1 + z)(z^2 - z + 1)} + \int_{\Gamma_{\infty}} \frac{dz}{(1 + z)(z^2 - z + 1)}. \qquad (6.170)$$

Meanwhile, the contour integral $I_C$ is given by (6.146) such that

$$\lim_{R \to \infty} I_C = \oint_C f(z)\,dz = 2\pi i \sum_j \operatorname{Res} f(a_j), \qquad (6.146)$$

where in the present case $f(z) \equiv \frac{1}{(1+z)(z^2-z+1)}$ and we have only one simple pole at $a_j = e^{i\pi/3}$ within the contour $C$. We are going to get the answer by combining (6.170) and (6.146) such that

$$I \equiv P\int_{-\infty}^{\infty} \frac{dx}{(1 + x)(x^2 - x + 1)} = 2\pi i \operatorname{Res} f\left(e^{i\pi/3}\right) - \int_{\Gamma_{-1}} \frac{dz}{(1 + z)(z^2 - z + 1)} - \int_{\Gamma_{\infty}} \frac{dz}{(1 + z)(z^2 - z + 1)}. \qquad (6.171)$$

The third term of the RHS of (6.171) tends to zero as in the case of Examples 6.4 and 6.5. Therefore, if we can estimate the second term as well as the residues properly, the integral (6.166) can be adequately determined. Note that in this case the integral is meant as the principal value. To estimate the integral of the second term, we make use of the fact that, defining $g(z)$ as
$$g(z) \equiv \frac{1}{z^2 - z + 1}, \qquad (6.172)$$

$g(z)$ is analytic at and around $z = -1$. Hence, $g(z)$ can be expanded in a Taylor's series around $z = -1$ such that

$$g(z) = A_0 + \sum_{n=1}^{\infty} A_n(z + 1)^n, \qquad (6.173)$$

where $A_0 \neq 0$. In fact,

$$A_0 = g(-1) = 1/3. \qquad (6.174)$$

Then, we have

$$\int_{\Gamma_{-1}} \frac{dz}{(1 + z)(z^2 - z + 1)} = \int_{\Gamma_{-1}} \frac{g(z)\,dz}{1 + z} = A_0\int_{\Gamma_{-1}} \frac{dz}{1 + z} + \sum_{n=1}^{\infty} A_n\int_{\Gamma_{-1}} (z + 1)^{n-1}\,dz. \qquad (6.175)$$

Changing the variable $z$ to the polar form in the RHS such that

$$z = -1 + re^{i\theta}, \qquad (6.176)$$

with the first term of the RHS we have

$$A_0\int_{\Gamma_{-1}} \frac{ire^{i\theta}\,d\theta}{re^{i\theta}} = A_0\int_{\pi}^{0} i\,d\theta = -A_0\int_{0}^{\pi} i\,d\theta = -A_0 i\pi. \qquad (6.177)$$

Note that in (6.177) the contour integration along $\Gamma_{-1}$ was performed in the decreasing direction of the argument $\theta$ from $\pi$ to 0. Thus, (6.175) is rewritten as

$$\int_{\Gamma_{-1}} \frac{dz}{(1 + z)(z^2 - z + 1)} = -A_0 i\pi + \sum_{n=1}^{\infty} A_n i\int_{\pi}^{0} r^n e^{in\theta}\,d\theta, \qquad (6.178)$$

where with the last equality we exchanged the order of the summation and integration. The second term of the RHS of (6.178) vanishes when $r \to 0$. Thus, (6.171) is further rewritten as

$$I = \lim_{r \to 0} I = 2\pi i \operatorname{Res} f\left(e^{i\pi/3}\right) + A_0 i\pi. \qquad (6.179)$$

We have only one simple pole of $f(z)$, at $z = e^{i\pi/3}$, within the semicircular contour $C$. Then, using (6.148), we have
$$\operatorname{Res} f\left(e^{i\pi/3}\right) = f(z)\left(z - e^{i\pi/3}\right)\Big|_{z=e^{i\pi/3}} = \frac{1}{\left(1 + e^{i\pi/3}\right)\left(e^{i\pi/3} - e^{-i\pi/3}\right)} = -\frac{1}{6}\left(1 + \sqrt{3}i\right). \qquad (6.180)$$

The above calculus is straightforward, but (6.37) can be conveniently used. Thus, finally we get

$$I = \frac{\pi i}{3}\left[-\left(1 + \sqrt{3}i\right)\right] + \frac{\pi i}{3} = \frac{\sqrt{3}\pi}{3} = \frac{\pi}{\sqrt{3}}. \qquad (6.181)$$

That is,

$$I \approx 1.8138. \qquad (6.182)$$

To avoid any ambiguity of the principal value, we properly define it as [8]

$$P\int_{-\infty}^{\infty} f(x)\,dx \equiv \lim_{R \to \infty}\int_{-R}^{R} f(x)\,dx.$$

The integral can also be calculated using a lower semicircle including the simple pole located at $z = e^{-i\pi/3} = \frac{1 - \sqrt{3}i}{2}$. That gives the same answer as (6.181). The calculations are left for readers as an exercise.
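A numerical sketch of Example 6.6 (our own, assuming SciPy; the window boundaries are arbitrary choices): SciPy's quad supports principal-value integration across the pole at $x = -1$ via the weight='cauchy' option on a finite interval, and the regular tails are handled by ordinary quadrature; the total approaches $\pi/\sqrt{3} \approx 1.8138$.

    import numpy as np
    from scipy.integrate import quad

    g = lambda x: 1.0 / (x**2 - x + 1.0)   # 1/(1+x^3) = g(x)/(x+1)

    # Principal value across x = -1 on [-10, 8]: weight='cauchy' integrates
    # g(x)/(x - wvar) in the principal-value sense
    pv_core = quad(g, -10.0, 8.0, weight='cauchy', wvar=-1.0)[0]
    # Ordinary tails, where the integrand 1/(1+x^3) is regular
    tail = quad(lambda x: 1.0 / (1.0 + x**3), 8.0, np.inf)[0] \
         + quad(lambda x: 1.0 / (1.0 + x**3), -np.inf, -10.0)[0]
    print(pv_core + tail, np.pi / np.sqrt(3.0))   # ~ 1.8138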
Example 6.7 [9, 10] It will be interesting to compare the result of Example 6.6 with that of the following real definite integral:

$$I = \int_{0}^{\infty} \frac{dx}{1 + x^3}. \qquad (6.183)$$

Fig. 6.21 Sector of a circle for the contour integration of $\frac{1}{1+z^3}$ that appears in Example 6.7. (a) The integration range is $[0, \infty)$. (b) The integration range is $(-\infty, 0]$

To evaluate (6.183), it is convenient to use a sector of a circle for the contour (see Fig. 6.21a). In this case, the contour integral $I_C$ estimated along the sector is described by

$$I_C = \int_{0}^{R} \frac{dz}{1 + z^3} + \int_{\Gamma_R} \frac{dz}{1 + z^3} + \int_{L} \frac{dz}{1 + z^3}. \qquad (6.184)$$

In the third integral of (6.184), we take $\arg z = \theta$ (constant) so that with $z = re^{i\theta}$ we may have $dz = e^{i\theta}\,dr$. Then, that integral is given by

$$\int_{L} \frac{dz}{1 + z^3} = \int_{L} \frac{e^{i\theta}\,dr}{1 + r^3 e^{3i\theta}}.$$

Setting $3i\theta = 2i\pi$, namely $\theta = 2\pi/3$, we get

$$\int_{L} \frac{e^{i\theta}\,dr}{1 + r^3 e^{3i\theta}} = e^{2\pi i/3}\int_{R}^{0} \frac{dr}{1 + r^3} = -e^{2\pi i/3}\int_{0}^{R} \frac{dr}{1 + r^3}.$$
Thus, taking $R \to \infty$ in (6.184) we have

$$\lim_{R \to \infty} I_C = \int_{0}^{\infty} \frac{dx}{1 + x^3} + \int_{\Gamma_{\infty}} \frac{dz}{1 + z^3} - e^{2\pi i/3}\int_{0}^{\infty} \frac{dr}{1 + r^3}. \qquad (6.185)$$

The second term of (6.185) vanishes as before. Changing the variable $r \to x$, we obtain

$$[\text{RHS of } (6.185)] = \left(1 - e^{2\pi i/3}\right)\int_{0}^{\infty} \frac{dx}{1 + x^3} = \left(1 - e^{2\pi i/3}\right) I. \qquad (6.186)$$

Meanwhile, considering that $f(z)$ has a simple pole at $z = e^{i\pi/3}$ within the contour $C$, i.e., inside the sector (see Fig. 6.21a), we have

$$[\text{LHS of } (6.185)] = 2\pi i \operatorname{Res} f\left(e^{i\pi/3}\right), \qquad (6.187)$$

where $f(x)$ was given in (6.167) of Example 6.6. Then, equating (6.186) with (6.187), we have
$$I = 2\pi i \operatorname{Res} f\left(e^{i\pi/3}\right)\Big/\left(1 - e^{2\pi i/3}\right) = 2\pi i\Big/\left[\left(1 + e^{i\pi/3}\right)\left(e^{i\pi/3} - e^{-i\pi/3}\right)\left(1 - e^{2\pi i/3}\right)\right].$$

To calculate $\operatorname{Res} f(e^{i\pi/3})$, once again we used (6.148) at $z = e^{i\pi/3}$; see (6.180). Noting that $(1 + e^{i\pi/3})(1 - e^{2\pi i/3}) = 3$ and $e^{i\pi/3} - e^{-i\pi/3} = 2i\sin(\pi/3) = \sqrt{3}i$, we get

$$I = 2\pi i\big/\left(3\sqrt{3}i\right) = \frac{2\pi}{3\sqrt{3}}. \qquad (6.188)$$

That is, $I \approx 1.2092$. This number is two-thirds of (6.182).

We further extend the above calculation method to the evaluation of another real definite integral. For instance, we have the following integral described by

$$\tilde{I} = P\int_{-\infty}^{0} \frac{dx}{1 + x^3}. \qquad (6.189)$$

To evaluate this integral, we define the contour integral $I_{C'}$ as shown in Fig. 6.21b, which is expressed as

$$I_{C'} = P\int_{-R}^{0} \frac{dz}{1 + z^3} + \int_{\Gamma_{-1}} \frac{dz}{(1 + z)(z^2 - z + 1)} + \int_{L'} \frac{dz}{1 + z^3} + \int_{\Gamma_R} \frac{dz}{1 + z^3}. \qquad (6.190)$$

Notice that combining $I_{C'}$ of (6.190) with $I_C$ of (6.184) makes up the contour integration of (6.168). Proceeding as before and noting that there is no singularity inside the contour $C'$, we have

$$\lim_{R \to \infty} I_{C'} = 0 = P\int_{-\infty}^{0} \frac{dz}{1 + z^3} - \frac{\pi i}{3} + e^{2\pi i/3}\int_{0}^{\infty} \frac{dr}{1 + r^3} = P\int_{-\infty}^{0} \frac{dz}{1 + z^3} - \frac{\pi}{3\sqrt{3}},$$

where we used (6.188) and (6.178) in combination with (6.174). Thus, we get

$$\tilde{I} = \int_{-\infty}^{0} \frac{dx}{1 + x^3} = \frac{\pi}{3\sqrt{3}}. \qquad (6.191)$$

As a matter of course, the result of (6.191) can immediately be obtained by subtracting (6.188) from (6.181).
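Numerically (our own sketch, assuming SciPy; the split point -10 is an arbitrary choice), the last two results can be checked at once: $\int_0^\infty dx/(1+x^3) = 2\pi/(3\sqrt{3})$, and the principal value over $(-\infty, 0]$ equals $\pi/(3\sqrt{3})$, their sum recovering (6.181).

    import numpy as np
    from scipy.integrate import quad

    f = lambda x: 1.0 / (1.0 + x**3)
    right = quad(f, 0.0, np.inf)[0]
    print(right, 2.0 * np.pi / (3.0 * np.sqrt(3.0)))        # (6.188)

    # PV over (-inf, 0]: cauchy weight around the pole x = -1, plus regular tail
    g = lambda x: 1.0 / (x**2 - x + 1.0)
    pv_left = quad(g, -10.0, 0.0, weight='cauchy', wvar=-1.0)[0] \
            + quad(f, -np.inf, -10.0)[0]
    print(pv_left, np.pi / (3.0 * np.sqrt(3.0)))            # (6.191)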
We frequently encounter real definite integrals involving trigonometric functions. In such cases, to make a valid estimation of the integrals it is desirable to use complex exponential functions. Jordan's lemma is a powerful tool for this. The proof follows the literature [5].

Lemma 6.6: Jordan's Lemma [5] Let $\Gamma_R$ be a semicircle of radius $R$ centered at the origin in the upper half of the complex plane. Let $f(z)$ be an analytic function that tends uniformly to zero as $|z| \to \infty$ when $\arg z$ lies in the interval $0 \le \arg z \le \pi$. Then, with a non-negative real number $\alpha$, we have

$$\lim_{R \to \infty}\int_{\Gamma_R} e^{i\alpha\zeta} f(\zeta)\,d\zeta = 0.$$

Proof Using polar coordinates, $\zeta$ is expressed as

$$\zeta = Re^{i\theta} = R(\cos\theta + i\sin\theta).$$

Then, the integral is denoted by

$$I_R \equiv \int_{\Gamma_R} e^{i\alpha\zeta} f(\zeta)\,d\zeta = iR\int_0^{\pi} f\left(Re^{i\theta}\right) e^{i\alpha R(\cos\theta + i\sin\theta)} e^{i\theta}\,d\theta = iR\int_0^{\pi} f\left(Re^{i\theta}\right) e^{i\alpha R\cos\theta - \alpha R\sin\theta + i\theta}\,d\theta.$$

By assumption $f(\zeta)$ tends uniformly to zero as $|\zeta| \to \infty$, and so we have

$$\left|f\left(Re^{i\theta}\right)\right| < \varepsilon(R),$$

where $\varepsilon(R)$ is a certain positive number that depends only on $R$ and tends to zero as $|\zeta| \to \infty$. Therefore, we have

$$|I_R| < \varepsilon(R)\,R\int_0^{\pi} e^{-\alpha R\sin\theta}\,d\theta = 2\varepsilon(R)\,R\int_0^{\pi/2} e^{-\alpha R\sin\theta}\,d\theta,$$

where the last equality results from the symmetry of $\sin\theta$ about $\pi/2$. Meanwhile, we have

$$\sin\theta \ge 2\theta/\pi \quad \text{when} \quad 0 \le \theta \le \pi/2.$$

Hence, we have

$$|I_R| < 2\varepsilon(R)\,R\int_0^{\pi/2} e^{-2\alpha R\theta/\pi}\,d\theta = \frac{\pi\varepsilon(R)}{\alpha}\left(1 - e^{-\alpha R}\right).$$

Therefore, we get

$$\lim_{R \to \infty} I_R = 0.$$

These complete the proof.


Alternatively, for $f(z)$ that is analytic and tends uniformly to zero as $|z| \to \infty$ when $\arg z$ lies in the interval $\pi \le \arg z \le 2\pi$, we similarly have

$$\lim_{R \to \infty}\int_{\Gamma_R} e^{-i\alpha\zeta} f(\zeta)\,d\zeta = 0,$$

where $\alpha$ is a certain non-negative real number. In this case, we assume that the semicircle $\Gamma_R$ of radius $R$ centered at the origin lies in the lower half of the complex plane.
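The decay asserted by Jordan's lemma is visible numerically. A minimal sketch (our own, assuming NumPy) evaluates $\left|\int_{\Gamma_R} e^{i\zeta}\,\zeta^{-1}\,d\zeta\right|$ on upper semicircles of growing radius; the modulus falls off with $R$, consistent with the bound $\pi\varepsilon(R)/\alpha$ with $\varepsilon(R) = 1/R$.

    import numpy as np

    theta = np.linspace(0.0, np.pi, 40001)
    for R in [1.0, 10.0, 100.0]:
        zeta = R * np.exp(1j * theta)
        dzeta = 1j * R * np.exp(1j * theta)
        I_R = np.trapezoid(np.exp(1j * zeta) / zeta * dzeta, theta)
        print(R, abs(I_R))   # magnitudes shrink roughly like 1/R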
Using Jordan's lemma, we now present several examples.
Example 6.8 Evaluate the following real definite integral:

$$I = \int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx. \qquad (6.192)$$

Rewriting (6.192) as

$$I = \int_{-\infty}^{\infty} \frac{\sin z}{z}\,dz = \frac{1}{2i}\int_{-\infty}^{\infty} \frac{e^{iz} - e^{-iz}}{z}\,dz, \qquad (6.193)$$

we consider the following contour integral:

$$I_C = P\int_{-R}^{R} \frac{e^{iz}}{z}\,dz + \int_{\Gamma_0} \frac{e^{iz}}{z}\,dz + \int_{\Gamma_R} \frac{e^{iz}}{z}\,dz, \qquad (6.194)$$

where $\Gamma_0$ represents an infinitesimally small semicircle around the origin and $\Gamma_R$ denotes the outer semicircle of radius $R$ centered at the origin (see Fig. 6.22). The contour $C$ consists of the real axis, $\Gamma_0$, and $\Gamma_R$ as shown.

The third term of (6.194) vanishes when $R \to \infty$ due to Jordan's lemma. With the second term, $e^{iz}$ is analytic in the whole complex plane (i.e., it is an entire function) and, hence, can be expanded in a Taylor's series around any complex number. Expanding it around the origin, we have

$$e^{iz} = \sum_{n=0}^{\infty} \frac{(iz)^n}{n!} = 1 + \sum_{n=1}^{\infty} \frac{(iz)^n}{n!}. \qquad (6.195)$$

As already mentioned in (6.173) through (6.178), among the terms of the RHS in (6.195) only the first term, i.e., 1, contributes to the integral of the second term of the RHS in (6.194). Thus, as in (6.178) we get
Fig. 6.22 Contour for the integration of $e^{iz}/z$ that appears in Example 6.8. The same contour will appear in Example 6.9 as well

$$\int_{\Gamma_0} \frac{e^{iz}}{z}\,dz = -1 \times i\pi = -i\pi. \qquad (6.196)$$

Notice that in (6.196) the argument decreases from $\pi$ to 0 during the contour integration. Within the contour $C$, there is no singular point, and so $I_C = 0$ due to Theorem 6.10 (Cauchy's integral theorem). Thus, from (6.194) we obtain

$$P\int_{-\infty}^{\infty} \frac{e^{iz}}{z}\,dz = -\int_{\Gamma_0} \frac{e^{iz}}{z}\,dz = i\pi. \qquad (6.197)$$
Exchanging the variable z → - z in (6.197), we have

-1 1
e - iz e - iz
P - dz = P dz = iπ: ð6:198Þ
1 -z -1 - z

Summing both sides of (6.197) and (6.198), we get


1 1
eiz e - iz
P dz þ P dz = 2iπ: ð6:199Þ
-1 z -1 - z

Rewriting (6.199), we have

1 1
ðeiz - e - iz Þ sin z
P dz = 2iP dz = 2iπ: ð6:200Þ
-1 z -1 z

That is, the answer is


252 6 Theory of Analytic Functions

1 1
sin z sin x
P dz = P dx = I = π: ð6:201Þ
-1 z -1 x

Notice that as mentioned in (6.137) z = 0 is a removable singularity of (sinz)/z,


and so the symbol P is superfluous in (6.201).
Example 6.9 Evaluate the following real definite integral:

1
sin 2 x
I= dx: ð6:202Þ
-1 x2

From the trigonometric theorem, we have

1
sin 2 x = ð1 - cos 2xÞ: ð6:203Þ
2

Then, the integrand of (6.202) with the variable changed is rewritten using (6.37)
as

sin 2 z 1 1 - e2iz 1 - e - 2iz


= þ : ð6:204Þ
z2 4 z2 z2

Expanding e2iz as before, we have

1 ð2izÞn 1 ð2izÞn
e2iz = n = 0 n!
= 1 þ 2iz þ n = 2 n!
: ð6:205Þ

Then, we get

1 ð2izÞn
1 - e2iz = - 2iz - n = 2 n!
,

1 - e2iz 2i 1 ð2iÞn zn - 2
=- - n=2
: ð6:206Þ
z2 z n!

As in Example 6.8, we wish to use (6.206) to estimate the following contour


integral IC:
1
1 - e2iz 1 - e2iz 1 - e2iz
lim I C = P dz þ dz þ dz, ð6:207Þ
R→1 -1 z2 Γ0 z2 Γ1 z2

where the contour for the integration is the same as that of Fig. 6.22.
With the second integral of (6.207), as before, only the first term -2i/z of (6.206)
contributes to the integral. The result is given as
6.8 Examples of Real Definite Integrals 253

0
1 - e2iz 1
dz = - 2i dz = - 2i idθ = - 2π: ð6:208Þ
Γ0 z2 Γ0 z π

With the third integral of (6.207), we have

1 - e2iz 1 e2iz
dz = dz - dz: ð6:209Þ
Γ1 z2 Γ1 z2
Γ1 z
2

The first term of RHS of (6.209) vanishes similarly to the integral of (6.155) that
appeared in Example 6.4. The second term of RHS vanishes as well because of the
Jordan’s lemma. Within the contour C of (6.207), again there is no singular point,
and so lim I C = 0. Hence, from (6.207) we have
R→1

1
1 - e2iz 1 - e2iz
P dz = - dz = 2π: ð6:210Þ
-1 z2 Γ0 z2

Exchanging the variable z → - z in (6.210) as before, we have

-1 1
1 - e - 2iz 1 - e - 2iz
P - dz = P dz = 2π: ð6:211Þ
1 ð- zÞ2 -1 z2

Summing both sides of (6.210) and (6.211) in combination with (6.204), we get
an answer described by

1
sin 2 z
dz = π: ð6:212Þ
-1 z2

Again, the symbol P is superfluous in (6.210) and (6.211).


Example 6.10 Evaluate the following real definite integral:
1
cos x
I= dx ða ≠ bÞ: ð6:213Þ
-1 ð x - a Þ ð x - bÞ

We rewrite the denominator of (6.213) using the method of partial fraction


decomposition such that

1 1 1 1
= - : ð6:214Þ
ðx - aÞðx - bÞ a - b x - a x - b

Then, (6.213) is described by


254 6 Theory of Analytic Functions

Fig. 6.23 Contour for the


integration of ðz - acos z (a)
Þðz - bÞ that
appears in Example 6.10. Γ
(a) Along the upper contour
C. (b) Along the lower
contour C′
Γ
− 0

(b)

− 0
Γ

1
1 cos z cos z
I= - dz
a-b -1 z-a z-b
1
1 eiz þ e - iz eiz þ e - iz
= - dz: ð6:215Þ
2 ð a - bÞ -1 z-a z-b

We wish to consider the integration by dividing the integrand and calculate each
term individually. First, we estimate
1
eiz
dz: ð6:216Þ
-1 z - a

To apply the Jordan’s lemma to the integration of (6.216), we use the upper
contour C that consists of the real axis, Γa, and ΓR; see Fig. 6.23a. As usual, we
consider a following integral:
6.8 Examples of Real Definite Integrals 255

1
eiz eiz eiz
lim I C = P dz þ dz þ dz, ð6:217Þ
R→1 -1 z - a Γa z - a Γ1 z - a

where Γa represents an infinitesimally small semicircle around z = a. Notice that


(6.217) is obtained when we are considering R → 1 in Fig. 6.23a. The third term of
(6.217) vanishes due to the Jordan’s lemma. With the second term of (6.217), we
change variable z - a → z and rewrite it as

eiz eiðzþaÞ eiz


dz = dz = eia dz = - eia iπ, ð6:218Þ
Γa z - a Γ0 z Γ0 z

where with the last equality we used (6.196). There is no singular point within the
contour C, and so lim I C = 0. Then, we have
R→1

1
eiz eiz
P dz = - dz = eia iπ: ð6:219Þ
-1 - a
z Γa - a
z

Next, we consider a following integral that appears in (6.215):


1
e - iz
dz: ð6:220Þ
-1 z - a

To apply the Jordan’s lemma to the integration of (6.220), we use the lower
contour C′ that consists of the real axis, Γ~a , and Γ~R (Figure 6.23b). This time, the
integral to be evaluated is given by
1
e - iz e - iz e - iz
lim I C0 = - P dz þ dz þ dz ð6:221Þ
R→1 -1 - a
z Γ~a - a
z Γ~1 - a
z

so that we can trace the contour C′ counter-clockwise. Notice that the minus sign is
present in front of the principal value. The third term of (6.221) vanishes due to the
Jordan’s lemma. Evaluating the second term of (6.221) similarly just above, we get

e - iz e - iðzþaÞ e - iz
dz = dz = e - ia dz = - e - ia iπ: ð6:222Þ
Γ~a z - a Γ~0 z Γ~0 z

Hence, we obtain
1
e - iz e - iz
P dz = dz = - e - ia iπ: ð6:223Þ
-1 z - a Γ~a z - a

Notice that in (6.222) the argument is again decreasing from 0 → - π during the
contour integration. Summing (6.219) and (6.223), we get
256 6 Theory of Analytic Functions

1 1 1
eiz e - iz eiz þ e - iz
P dz þ P dz = P dz
-1 z - a -1 z - a -1 z-a

= eia iπ - e - ia iπ = iπ eia - e - ia = iπ  2i sin a = - 2π sin a: ð6:224Þ

In a similar manner, we also get


1
eiz þ e - iz
P dz = - 2π sin b:
-1 z-b

Thus, the final answer is

1
1 eiz þ e - iz eiz þ e - iz 2π ðsin b - sin aÞ
I= - dz =
2ð a - bÞ -1 z-a z-b 2ða - bÞ
π ðsin b - sin aÞ
= :
a-b

Example 6.11 Evaluate the following real definite integral:


1
cos x
I= dx: ð6:225Þ
-1 1 þ x2

The calculation is left for readers. Hints: (i) Use cosx = (eix + e-ix)/2 and Jordan’s
lemma. (ii) Use the contour as shown in Fig. 6.18. Besides the examples listed
above, a variety of integral calculations can be seen in literature [9–12].

6.9 Multivalued Functions and Riemann Surfaces

The last topics of the theory of analytic functions are related to multivalued func-
tions. Riemann surfaces play a central role in developing the theory of the
multivalued functions.

6.9.1 Brief Outline

We mention a brief outline for notions of multivalued functions and Riemann


surfaces. These notions are indispensable for investigating properties of irrational
functions and logarithmic functions and performing related calculations, especially
integrations with respect to those functions.
6.9 Multivalued Functions and Riemann Surfaces 257

Definition 6.12 Let n be a positive integer and let z be an arbitrary complex number.
Suppose z can be written as

z = wn : ð6:226Þ

Then, with respect to w it is denoted by

w = z1=n ð6:227Þ

and w is said to be a n-th root of z.


We have expressly shown this simple thing. It is because if we think of real z, z1/n is
uniquely determined only when n is odd regardless of whether z is positive or
negative. In case we think of even n, (i) if z is positive, we have two n-th root of
±z1/n. (ii) If z is negative, however, we have no n-th root. Meanwhile, if we think of
complex z, we have a totally different situation. That is, regardless of whether n is odd
or even, we always have n different z1/n for any given complex number z.
As a tangible example, let us think of n-th roots of ±1 for n = 2 and 3. These
are depicted graphically in Fig. 6.24. We can readily understand the statement
of the above paragraph. In Fig. 6.24a, if z = - 1 for n = 2, we have either
w = (-1)1/2 = (eiπ)1/2 = i or w = (-1)1/2 = (e-iπ)1/2 = - i with double root. In
Fig. 6.24b, if z = - 1 for n = 3, we have w = (-1)1/3 = - 1; w = (-1)1/3 = (eiπ)1/3 =
eiπ/3; w = (-1)1/3 = (e-iπ)1/3 = e-iπ/3 with triple root. Moreover, we have 11/2 = 1; - 1.
Also, we have 11/3 = 1; e2iπ/3; e-2iπ/3.
Let us extend the above preliminary discussion to the analytic functions. Suppose
we are given a following polar form of z such that

z = r ðcos θ þ i sin θÞ ð6:32Þ

and

w = ρðcos φ þ i sin φÞ:

Then, we have

z = wn = ρn ðcos φ þ i sin φÞn = ρn ðcos nφ þ i sin nφÞ, ð6:228Þ

where with the last equality we used de Moivre’s theorem (6.36). Comparing the real
and imaginary parts of (6.32) and (6.228), we have

r = ρn or ρ = r 1=n ð6:229Þ

and
258 6 Theory of Analytic Functions

(a) 1 /
(−1) /

−1 1

(b) 1 /
(−1) /
/
/

1 −1

/
/

Fig. 6.24 Multiple roots in the complex plane. (a) Square roots of 1 and -1. (b) Cubic roots of
1 and -1

θ = nφ þ 2kπ ðk = 0, ± 1, ± 2, ⋯Þ: ð6:230Þ

In (6.229), we define both r and ρ as being positive so that positive ρ can be


uniquely determined for any positive integer n (i.e., either even or odd). Recall once
again that if n is even, we have two n-th roots for ρ (i.e., both positive and negative)
with given positive r. Of these two roots, we discard the negative ρ. Rewriting
(6.230), we have
6.9 Multivalued Functions and Riemann Surfaces 259

1
φ= ðθ - 2kπ Þ ðk = 0, ± 1, ± 2, ⋯Þ:
n

Further rewriting (6.227), we get

1 1 1 1
wk ðθÞ = zn = r n cos ðθ - 2kπ Þ þ i sin ðθ - 2kπ Þ
n n

ðk = 0, 1, 2, ⋯, n - 1Þ, ð6:231Þ

where wk(θ) is said to be a branch. Note that we have

wk ðθ - 2nπ Þ = wk ðθÞ: ð6:232Þ

That is, wk(θ) is a periodic function of the period 2nπ. The number of the total
branches is n; the index k of wk(θ) denotes these different branches. In particular,
when k = 0, we have

1 1 1 1
w0 ðθÞ = zn = r n cos θ þ i sin θ : ð6:233Þ
n n

Comparing (6.231) and (6.233), we obtain

wk ðθÞ = w0 ðθ - 2kπ Þ: ð6:234Þ

From (6.234), we find that wk(θ) is obtained by shifting (or rotating) w0(θ) by 2kπ
toward the positive direction of θ. The branch w0(θ) is called a principal branch of
the n-th root of z. The value of w0(θ) is called the principal value of that branch. The
implication of the presence of the branches is as follows: Suppose that the branches
besides w0(θ) would be absent. Then, from (6.233) we have

1 1 2π 2π 1
w0 ð2π Þ = zn = r n cos þ i sin ≠ w0 ð0Þ = r n :
n n

This causes inconvenience, because θ = 0 and θ = 2π are assumed to be identical


in the complex plane. Hence, the single-valuedness would be broken with w0(θ).
However, in virtue of the presence of w1(θ), we have

w1 ð2π Þ = w0 ð2π - 2π Þ = w0 ð0Þ:

This means that the value of w0(0) is recovered and, hence, the single-valuedness
remains intact. After another cycle around the origin, similarly we have
260 6 Theory of Analytic Functions

w2 ð4π Þ = w0 ð4π - 4π Þ = w0 ð0Þ:

Thus, in succession we get

wk ð2kπ Þ = w0 ð2kπ - 2kπ Þ = w0 ð0Þ:

For k = n, from (6.231) we have

1 1 1 1
wn ðθÞ = zn = r n cos ðθ - 2nπ Þ þ i sin ðθ - 2nπ Þ
n n

1 1
= r 1=n cos θ - 2π þ i sin θ - 2π
n n

1 1 1
= r n cos θ þ i sin θ = w0 ðθÞ:
n n

Thus, we have no more new function. At the same time, the single-valuedness of
wk(θ) (k = 0, 1, 2, ⋯, n - 1) remains intact during these processes. To summarize the
above discussion, if we have n planes (or sheets) as the complex planes and allocate
them to w0(θ), w1(θ), ⋯, wn - 1(θ) individually, we have a single-valued function
w(θ) as a whole throughout these planes. In a word, the following relation represents
the situation:

w 0 ðθ Þ for Plane 0,
w 1 ðθ Þ for Plane 1,
wð θ Þ =
⋯ ⋯
wn - 1 ðθÞ for Plane n - 1:

This is the essence of the Riemann surface. The superposition of these n planes is
called a Riemann surface and each plane is said to be a Riemann sheet [of the
function w(θ)]. Each single-valued function w0(θ), w1(θ), ⋯, wn - 1(θ) defined on
each Riemann sheet is called a branch of w(θ). In the above discussion, the origin is
called a branch point of w(z).
For simplicity, let us think of the function w(z) described by
p
wðzÞ  z1=2 = z:

In this case, for z = reiθ (0 ≤ θ ≤ 2π) we have two different values w0 and w1
given by
6.9 Multivalued Functions and Riemann Surfaces 261

w0 ðθÞ = r 1=2 eiθ=2 ,

w1 ðθÞ = r 1=2 eiðθ - 2π Þ=2 = w0 ðθ - 2π Þ = - w0 ðθÞ: ð6:235Þ

Then, we have

w0 ðθÞ for Plane 0,


wðθÞ = ð6:236Þ
w1 ðθÞ for Plane 1:

In this case, w(z) is called a “two-valued” function of z and the functions w0 and
w1 are said to be branches of w(z). Suppose that z makes a counter-clockwise circuit
around the origin, starting from, e.g., a real positive number z = z0 to come full circle
to the original point z = z0. Then, the argument of z has been increased by 2π. In this
situation the arguments of the individual branches w0 and w1 are increased by π.
Accordingly, w0 is switched to w1 and w1 is switched to w0. This situation can be
understood more clearly from (6.232) and (6.234). That is, putting n = 2 in (6.232),
we have

wk ðθ - 4π Þ = wk ðθÞ ðk = 0, 1Þ: ð6:237Þ

Meanwhile, putting k = 1 in (6.234) we get

w1 ðθÞ = w0 ðθ - 2π Þ: ð6:238Þ

Replacing θ with θ + 2π in (6.238), we have w1(θ + 2π) = w0(θ). Also replacing θ


with θ + 4π in (6.237), we have w1(θ + 4π) = w1(θ) = w0(θ - 2π) = w0(θ + 2π),
where with the last equality we replaced θ with θ + 2π in (6.237). Rewriting the
above, we get

w1 ðθ þ 2π Þ = w0 ðθÞ and w0 ðθ þ 2π Þ = w1 ðθÞ:

That is, adding 2π to θ, w0 and w1 are switched to each other.


Let us further think of the two-valued function of w(z) = z1/2. Strictly speaking, an
analytic function cannot be a two-valued function. This is because if so, the
continuity and differentiability will be lost from the function and, hence, the
analyticity would be broken. Then, we must make a suitable device to avoid
it. Such a device is called a Riemann surface.
Let us make a kit of the Riemann surface following Fig. 6.25. (i) Take a sheet of
paper so that it can represent a complex plane and cut it with scissors along the real
axis that starts from the origin so that the cut (or slit) can be made toward the real
positive direction. This cut is called a branch cut with the origin being a branch
point. Let us call this sheet Plane I. (ii) Take another sheet of paper and call it Plane
II. Also make a cut in Plane II in exactly the same way as that for Plane I; see
262 6 Theory of Analytic Functions

(b)
(a) Plane I placed on top of
Plane II

Plane I Plane II

Fig. 6.25 Simple kit that helps visualize the Riemann surface. To make it, follow next procedures:
(a) Take two sheets of paper (Planes I and II) and make slits (indicated by dashed lines) as shown.
Next, put Plane I on top of Plane II so that the two slits (i.e., branch cuts) can fit in line. (b) Tape
together the downside of the cut of Plane I and the foreside of the cut of Plane II. Then, also tape
together the downside of the cut of Plane II and the foreside of the cut of Plane I

Fig. 6.25a for the processes (i) and (ii). (iii) Next, put Plane I on top of Plane II so that
the two branch cuts can fit in line. (iv) Tape together the downside of the cut of Plane
I and the foreside of the cut of Plane II (Fig. 6.25b). (v) Then, also tape together the
downside of the cut of Plane II and the foreside of the cut of Plane I (see Fig. 6.25b
once again).
Thus, what we see is that starting from, e.g., a real positive number z = z0 of Plane
I and coming full circle to the original point z = z0, then we cross the cut to enter
Plane II located underneath Plane I. After another cycle within Plane II, we come
back to Plane I again by crossing the cut. After all, we come back to the original
Plane I after two cycles on the combined planes. This combined plane is called the
Riemann surface. In this way, Planes I and II correspond to different branches w0 and
w1, respectively, so that each branch can be single-valued. In other words, w(z) = z1/2
is a single-valued function that is defined on the whole Riemann surface.
There are a variety of Riemann surfaces according to the nature of the complex
functions. Another example of the multivalued function is expressed as

wðzÞ = ðz - aÞðz - bÞ,

where two branch points are located at z = a and z = b. As before, we assume that

z - a = r a eiθa and z - b = r b eiθb :

Then, we have
6.9 Multivalued Functions and Riemann Surfaces 263

Fig. 6.26 Graphically


decided argument θa, which
is an angle between a line
connecting z and a and
another line drawn in
parallel with the real axis

p
wðθa , θb Þ = z1=2 = r a r b eiðθa þθb Þ=2 : ð6:239Þ

In the above, e.g., the argument θa can be decided graphically as shown in


Fig. 6.26, where θa is an angle between a line connecting z and a and another line
drawn in parallel with the real axis. We can choose w(θa, θb) of (6.239) for the
principal branch and define this as
p
w 0 ðθ a , θ b Þ  r a r b eiðθa þθb Þ=2 : ð6:240Þ

Also, we define w1(θa, θb) as


p
w 1 ðθ a , θ b Þ  r a r b eiðθa þθb - 2π Þ=2 : ð6:241Þ

Then, as in (6.235) we have

w1 ðθa , θb Þ = w0 ðθa - 2π, θb Þ = w0 ðθa , θb - 2π Þ = - w0 ðθa , θb Þ: ð6:242Þ

From (6.242), we find that adding 2π to θa or θb, w0 and w1 are switched to each
other as before. We also find that after the variable z has come full circle around one
of a and b, w0(θa, θb) changes sign as in the case of (6.235).
264 6 Theory of Analytic Functions

We also have

w0 ðθa - 2π, θb - 2π Þ = w0 ðθa , θb Þ,

w1 ðθa - 2π, θb - 2π Þ = w1 ðθa , θb Þ: ð6:243Þ

From (6.243), however, after the variable z has come full circle around both a and
b, both w0(θa, θb) and w1(θa, θb) keep the original value. This is also the case where
the variable z comes full circle without encircling a or b. These behaviors imply that
(i) if a contour encircles one of z = a and z = b, w0(θa, θb) and w1(θa, θb) change sign
and switch to each other. Thus, w0(θa, θb) and w1(θa, θb) form branches. On the other
hand, (ii) if a contour encircles both z = a and z = b, w0(θa, θb) and w1(θa, θb) remain
intact. (iii) If a contour encircles neither z = a nor z = b, w0(θa, θb) and w1(θa, θb)
remain intact as well.
These three cases (i), (ii), and (iii) are depicted in Fig. 6.27a, b, and c, respec-
tively. In Fig. 6.27c after z comes full circle, argz relative to z = a returns to the
original value keeping θ1 ≤ arg z ≤ θ2. This is similarly the case with argz relative to
z = b. Correspondingly, the branch cut(s) are depicted with doubled broken line(s),
e.g., as shown in Fig. 6.28. Note that in the cases (ii) and (iii) the branch cut can be
chosen so that the circle (or contour) may not cross the branch cut. Although a bit
complicated, the Riemann surface can be shaped accordingly. Other choices of the
branch cuts are shown in Sect. 6.9.2 (vide infra).
When we consider logarithm functions, we have to deal with rather complicated
situation. For polar coordinate representation, we have z = reiθ. That is,

ln z = ln r þ i arg z = lnjzj þ i arg z, ð6:244Þ

where

arg z = θ þ 2πn ðn = 0, ± 1, ± 2, ⋯Þ: ð6:245Þ

If n = 0 is chosen, ln z is said to be the principal value. With the logarithm


functions, the Riemann surface comprises infinite planes each of which corresponds
to the individual branch whose argument is given by (6.245). The point z = 0 is
p
called a logarithmic branch point. Meanwhile, the branch point for, e.g., wðzÞ = z
is said to be an algebraic branch point.

6.9.2 Examples of Multivalued Functions

When we deal with a function having a branch point, we must remind that unless the
contour crosses the branch cut (either algebraic or logarithmic), a function we are
6.9 Multivalued Functions and Riemann Surfaces 265

(b)
(a)

0 0

(c)

0
Fig. 6.27 Geometrical relationship between contour C and two branch points (located at z = a and
z = b) for ðz - aÞðz - bÞ. (a) The contour C encircles only z = a. (b) C encircles both z = a and
z = b. (c) C encircles neither z = a nor z = b

(a) (b)

0 0
Fig. 6.28 Branch cut(s) for ðz - aÞðz - bÞ. (a) A branch cut is shown by a line connecting z = a
and z = b. (b) Branch cuts are shown by a line connecting z = a and z = 1 and another line
connecting z = b and z = 1. In both (a) and (b) the branch cut(s) are depicted with doubled broken
line(s)
266 6 Theory of Analytic Functions

Fig. 6.29 Branch cut


(shown with a doubled
broken line) and contour for
Γ
a-1
the integration of z1þz

−1
Γ
0

thinking of is held single-valued and analytic (if that function is originally analytic)
and can be differentiated or integrated in a normal way. We give some examples
for this.
Example 6.12 [6] Evaluate the following real definite integral:
1
xa - 1
I= dx ð0 < a < 1Þ: ð6:246Þ
0 1þx

We rewrite (6.246) as
1 a-1
z
I= dz: ð6:247Þ
0 1þz

We define the integrand of (6.247) as

za - 1
f ðzÞ  ,
1þz

where za - 1 can be further rewritten as

za - 1 = eða - 1Þ ln z :

The function f(z) has a branch point at z = 0 and a simple pole at z = - 1. Bearing
in mind this situation, we consider a contour for integration (see Fig. 6.29). In
Fig. 6.29 we depict the branch cut with a doubled broken line. Lines PQ and Q′P′
are contour lines located over and under the branch cut, respectively. We assume that
6.9 Multivalued Functions and Riemann Surfaces 267

the lines PQ and Q′P′ are illimitably close to the real axis. Thus, starting from the
point P the contour integration IC is described by

za - 1 za - 1 za - 1 za - 1
IC = dz þ dz þ dz þ dz, ð6:248Þ
PQ 1 þ z ΓR 1 þ z Q0 P0 1 þ z Γ0 1 þ z

where ΓR and Γ0 denote the outer large circle and inner small circle of their radius
R (≫1) and r (≪1), respectively. Note that with the contour integration ΓR is traced
counter-clockwise but Γ0 is traced clockwise (see Fig. 6.29). Since the simple pole is
present at z = - 1, the related residue Res f(-1) is

a-1
Res f ð- 1Þ = za - 1 z= -1
= ð- 1Þa - 1 = eiπ = - eaiπ :

Then, we have

I C = 2πi Res f ð- 1Þ = - 2πi eaiπ : ð6:249Þ

When R → 1 and r → 0, we have

za - 1 Ra - 1 za - 1 ra - 1
dz <  2πR → 0 and dz <  2πr → 0: ð6:250Þ
ΓR 1 þ z R-1 Γ0 1 þ z 1-r

Thus, the second and fourth terms of (6.248) vanish.


Meanwhile, we take the principal value of lnz of (6.244). Since the lines PQ and
Q′P′ are very close to the real axis, using (6.245) with n = 0 we can put θ = 0 on PQ
and θ = 2π on Q′P′. Then, we have

za - 1 eða - 1Þ ln z eða - 1Þ lnjzj


dz = dz = dz: ð6:251Þ
PQ 1 þ z PQ 1 þ z PQ 1þz

Moreover, we have

za - 1 eða - 1Þ ln z eða - 1Þðlnjzjþ2πiÞ


dz = dz = dz
Q0 P0 1 þ z Q0 P0 1 þ z Q0 P0 1þz

eða - 1Þ lnjzj
= eða - 1Þ2πi dz: ð6:252Þ
Q0 P0 1þz

Notice that the argument underneath the branch cut (i.e., the line Q′P′) is
increased by 2π relative to that over the branch cut.
Considering (6.249)–(6.252) and taking the limit of R → 1 and r → 0, (6.248) is
rewritten by
268 6 Theory of Analytic Functions

Fig. 6.30 Branch cut


(shown with a doubled
broken line) and contour for
the integration of
p 1 ða < bÞ. Only
ðz - aÞðz - bÞ
the argument θ1 is depicted.
With θ2 see text

1 ða - 1Þ lnjzj 0 ða - 1Þ lnjzj
e e
- 2πi eaπi = dz þ eða - 1Þ2πi dz
0 1þz 1 1þz

1 ða - 1Þ lnjzj 1 a-1
e z
= 1 - eða - 1Þ2πi dz = 1 - eða - 1Þ2πi dz: ð6:253Þ
0 1þz 0 1þz

In (6.253) we have assumed z to be real and positive (see Fig. 6.29), and so we
have e(a - 1) ln |z| = e(a - 1) ln z = za - 1. Therefore, from (6.253) we get
1 a-1
z 2πi eaπi 2πi eaπi
dz = - =- : ð6:254Þ
0 1þz 1-e ð a - 1 Þ2πi 1 - e2πai

Using (6.37) for (6.254) and performing simple trigonometric calculations (after
multiplying both the numerator and denominator by 1 - e-2πai), we finally get
1 a-1 1
z xa - 1 π
dz = dx = I = :
0 1þz 0 1þx sin aπ

Example 6.13 [13] Estimate the following real definite integral:

b
1
I= dx ða, b : real with a < bÞ: ð6:255Þ
a ðx - aÞðb - xÞ

To this end, we wish to evaluate the following integral described by

1
IC = dz, ð6:256Þ
C ð z - aÞ ð z - bÞ

where we define the integrand of (6.256) as

1
f ðzÞ  :
ðz - aÞðz - bÞ

The total contour C comprises Ca, PQ, Cb, and Q′P′ as depicted in Fig. 6.30. The
function f(z) has two branch points at z = a and z = b; otherwise f(z) is analytic. We
6.9 Multivalued Functions and Riemann Surfaces 269

draw a branch cut with a doubled broken line as shown. Since the contour C does not
cross the branch cut but encircle it, we can evaluate the integral in a normal manner.
Starting from the point P′, (6.256) can be expressed by

1 1 1
IC = dz þ dz þ dz
Ca ðz - aÞðz - bÞ PQ ð z - aÞ ð z - bÞ Cb ðz - aÞðz - bÞ

1
þ dz: ð6:257Þ
QP0 0 ð z - aÞ ð z - bÞ

We assume that the lines PQ and Q′P′ are illimitably close to the real axis. This
time, Ca and Cb are both traced counter-clockwise. Putting

z - a = r1 eiθ1 and z - b = r 2 eiθ2 ,

we have

1
f ðzÞ = p e - iðθ1 þθ2 Þ=2 : ð6:258Þ
r1 r2

Let Ca and Cb be a small circle of radius ε centered at z = a and z = b,


respectively. When we are evaluating the integral on Cb, we can put

z - b = εeiθ2 ð- π ≤ θ2 ≤ π Þ; i:e:, r 2 = ε:

We also have r1 ≈ b - a; in Fig. 6.30 we assume b > a. Hence, we have

π π p
1 iε ε
dz = p e - iðθ1 þθ2 Þ=2 dθ2 ≤ p dθ2 :
Cb ðz - aÞðz - bÞ -π r1 ε -π r1

Therefore, we get

1 1
lim dz = 0 or lim dz = 0: ð6:259Þ
ε→0 Cb ð z - aÞ ð z - bÞ ε→0 Cb ð z - aÞ ð z - bÞ

In a similar manner, we have

1
lim dz = 0:
ε→0 C
a ð z - aÞ ð z - bÞ

Meanwhile, on the line Q′P′ we have


270 6 Theory of Analytic Functions

z - a = r 1 eiθ1 and z - b = r 2 eiθ2 : ð6:260Þ

In (6.260) we have r1 = x - a, θ1 = 0 and r2 = b - x, θ2 = π; see Fig. 6.30. Also,


we have dz = dx. Hence, we have

1 a
e - iπ=2 a
i
dz = dx = - dx
0 0
QP ðz - aÞðz - bÞ b ð x - aÞ ð b - xÞ b ð x - aÞ ð b - xÞ

b
1
=i dx: ð6:261Þ
a ðx - aÞðb - xÞ

On the line PQ, in turn, we have r1 = x - a and θ1 = 2π. This is because when
going from P′ to P, the argument is increased by 2π; see Fig. 6.30 once again. Also,
we have r2 = b - x, θ2 = π, and dz = dx. Notice that the argument θ2 remains
unchanged. Then, we get

1 b
e - 3iπ=2
dz = dx
PQ ð z - aÞ ð z - bÞ a ðx - aÞðb - xÞ
b
1
=i dx: ð6:262Þ
a ðx - aÞðb - xÞ

Inserting these results into (6.257), we obtain

b
1 1
IC = dz = 2i dx: ð6:263Þ
C ð z - aÞ ð z - bÞ a ð x - aÞ ð b - xÞ

The calculation is yet to be finished, however, because (6.263) is not a real


definite integral. Then, let us try another contour integration. We choose a contour
C of a radius R and centered at the origin. We assume that R is large enough this time.
Meanwhile, putting

1
z= , ð6:264Þ
w

we have

1 dw
ð z - a Þ ð z - bÞ = ð1 - awÞð1 - bwÞ with dz = - : ð6:265Þ
w w2

Thus, we have
6.9 Multivalued Functions and Riemann Surfaces 271

(a) (b)

1/ 0< < <0<


1/

0 1/ 1/ 0
1/ 1/

Fig. 6.31 Branch cuts (shown with doubled broken lines) and contour in the w-plane for the
integration of p 1
. (a) Case of 0 < a < b. (b) Case of a < 0 < b. Notice that in both
w ð1 - awÞð1 - bwÞ
cases the contour C′ is traced clockwise (see text)

1 1
IC = dz = - dw, ð6:266Þ
C ð z - aÞ ð z - bÞ C0 w ð1 - awÞð1 - bwÞ

where the closed contour C′ of diameter 1/R comes full circle clockwise around the
origin of the w-plane (see Fig. 6.31). This is because as in the z-plane the contour C is
traced viewing the point z = 1 on the right, the contour C′ is traced viewing the
corresponding point w = 0 on the right. The above clockwise cycle results from such
a situation. The integrand of (6.266) has a simple pole at w = 0.
Regarding other singularities of the integrand, there are two branch points at
w = 1/a and w = 1/b. Then, we appropriately choose R ≫ 1 (or 1/R ≪ 1) so that C′
does not cut across the branch cut. To meet this condition, we have two choices.
(i) The branch cut connects w = 1/a and w = 1/b. (ii) Another choice of the branch
cut is that it connects w = 1/a and w = - 1 along with w = 1/b and w = 1 (see
Fig. 6.31). If we have 0 < a < b, we can choose a branch cut as that depicted in
Fig. 6.31a corresponding to Fig. 6.28a. In case a < 0 < b, the branch cut can be that
shown in Fig. 6.31b corresponding to Fig. 6.28b. (When a < b < 0, the branch cut
geometry may be that similar to Fig. 6.31a.)
Consequently, in both the above cases of (i) and (ii) there is only a simple pole at
w = 0 without any other singularities inside the contour C′. Therefore, according to
(6.148) we have a contour integration described by

1 1
dw = - 2πi Res
C 0 w ð1 - awÞð1 - bwÞ w ð1 - awÞð1 - bwÞ w=0

1
= - 2πi  = - 2πi, ð6:267Þ
ð1 - awÞð1 - bwÞ w=0

where the above minus sign is due to the clockwise rotation of the contour integra-
tion. Hence, from (6.266) we get
272 6 Theory of Analytic Functions

1
IC = dz = 2πi: ð6:268Þ
C ð z - aÞ ð z - bÞ

Equating (6.263) and (6.268), we finally obtain the answer described by


b
a
p 1
dx = π:
ðx - aÞðb - xÞ

Example 6.14 [14] In Chap. 3 we mentioned the analyticity of the function


1 - 2tx þ t 2 : ð6:269Þ

In Chap. 3, we assumed that x is a real number belonging to an interval [-1, 1].


Under this condition, we define f(t) in a complex domain as

f ðt Þ  1 - 2tx þ t 2 : ð6:270Þ

We rewrite (6.270) as
p p
f ð t Þ = t - x þ i 1 - x2 t - x - i 1 - x2 : ð6:271Þ

p
With f(t) = 0 we have two roots at t ± = x ± i 1 - x2 , and so |t±| = 1 (see Sect. 3.
6.1).
When λ = 1/2, we are dealing with essentially the same problem as that of
Example 6.13. For instance, when x = 0, the branch points are located at t = ± i.
p p
When x = 1= 2, the branch points are positioned at t = ð1 ± iÞ= 2. These values of
t form the branch points. In Fig. 6.32 we depict these branch points and
corresponding branch cuts. In this situation, the function

- 1=2
hðt Þ  1 - 2tx þ t 2 ð6:272Þ

is analytic within a circle jt j < 1 of the complex plane t [14] (Fig. 6.32). Therefore,
we can freely expand h(t) into a Taylor’s series at any point within that circle. If we
expand h(t) around t = 0, we expect to have a following Taylor’s series:
1
hðt Þ = c ðxÞt
n=0 n
n
: ð6:273Þ

We can readily decide cn(x) using (6.115) such that

1 hð t Þ
cn ð xÞ = nþ1
dt, ð6:274Þ
2πi Ct

where the contour C can be chosen so that C can encircle t = 0 in a region jt j < 1.
Let us make a substitution such that [14]
6.9 Multivalued Functions and Riemann Surfaces 273

Fig. 6.32 Branch points


and corresponding branch
cuts for (1 - 2tx + t2)-1/2.
For instance, when x = 0,
the branch points are located
p i t
at t = ± i. When x = 1= 2,
the branch points are (1 + )/ 2
p
positioned at t = ð1 ± iÞ= 2

−1 0 1
(1 − )/ 2
−i

1
1 - ut = 1 - 2tx þ t 2 2 : ð6:275Þ

Then we have

2ð u - xÞ
t= and dt = 2ð1 - ut Þdu= u2 - 1 with u ≠ ± 1: ð6:276Þ
u2 - 1

Using these relations, we rewrite (6.274) as


n n
1 ð u 2 - 1Þ 1 ð- 1Þn ð1 - u2 Þ
cn ðxÞ = nþ1
du = nþ1
du, ð6:277Þ
2πi C 0 2 ð u - xÞ 2πi C 0 2 ðu - xÞ
n n

where the contour C′ encircles u = x. Here using (6.118), we readily get

n n
1 ð- 1Þn ð1 - u2 Þ ð- 1Þn dn ð1 - u2 Þ
cn ð xÞ = du =  Pn ðxÞ: ð6:278Þ
2πi C 0 2 ð u - xÞ
n nþ1 2n n! dun
u=x

The functions Pn(x) are Legendre polynomials that have already appeared in
(3.145) and (3.207) of Sects. 3.5 and 3.6. In this manner, the above argument directly
confirms that the Legendre polynomials of (3.194) derived from the generating
function (3.180) or defined as Rodrigues formula of (3.145) are identical in the
complex domain. Originally, (3.145) and (3.207) were defined in the real domain.
The identical representations in both the domains real and complex are the conse-
quence of the analytic continuation.
Substituting x = ± 1 for (6.277), we get
274 6 Theory of Analytic Functions

n
1 ð- 1Þn ð1 - u2 Þ 1 1 ðu þ 1Þn 1
c n ð 1Þ = du = n  du = n  ðu þ 1Þn
C0 u - 1
nþ1
2πi C 0 2 ð u - 1Þ 2 2πi 2
n
u=1

=1 ð6:279Þ

and

n
1 ð- 1Þn ð1 - u2 Þ 1 1 ð u - 1Þ n 1
cn ð- 1Þ = du = n  du = n  ðu - 1Þn
2πi C 2 ðu þ 1Þ
0 n nþ1 2 2πi C 0 u þ 1 2
u= -1

= ð- 1Þn : ð6:280Þ

Notice that in (6.279) and (6.280), the integrand has a simple pole at u = 1 and
u = - 1, respectively. Hence, we used (6.144) and (6.148) with the contour
integration. The contour C′ is taken so that it can encircle u = 1 in the former case
and u = - 1 in the latter. Hence, we have important relations.

P n ð 1Þ = 1 and Pn ð- 1Þ = ð- 1Þn :

Summarizing this chapter, we have explored the theory and applications of the
analytic functions on the complex domain. We have studied how the theory of
analytic functions produces fruitful results.

References

1. Stoll RR (1979) Set theory and logic. Dover, New York


2. Willard S (2004) General topology. Dover, New York
3. McCarty G (1988) Topology. Dover, New York
4. Mendelson B (1990) Introduction to topology, 3rd edn. Dover, New York
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Takagi T (2010) Introduction to analysis, Standard edn. Iwanami, Tokyo. (in Japanese)
7. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
8. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York
9. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn.
Academic Press, Waltham
10. Hassani S (2006) Mathematical physics. Springer, New York
11. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering,
3rd edn. Cambridge University Press, Cambridge
12. Boas ML (2006) Mathematical methods in the physical sciences, 3rd edn. Wiley, New York
13. Omote M (1988) Complex functions. Iwanami, Tokyo. (in Japanese)
14. Lebedev NN (1972) Special functions and their applications. Dover, New York
Part II
Electromagnetism

Electromagnetism is one of the pillars of modern physics, even though it belongs to


classical physics along with Newtonian mechanics. Maxwell’s equations describe
and interpret almost all electromagnetic phenomena and are basis equations of
classical physics together with Newton equation. Although after the discovery of
relativistic theory Newton equation needed to be modified, Maxwell’s equations did
not virtually have to be changed. In this part we treat the characteristics of electro-
magnetic waves that are directly derived from Maxwell’s equations.
Electromagnetism becomes connected to quantum mechanics, especially when
we deal with emission and absorption of light. In fact, the experiments performed in
connection with blackbody radiation led to the discovery of light quanta and
establishment of quantum mechanics. These accounts are not only of particular
interest from a historical point of view but of great importance in understanding
modern physics. To understand the propagation of electromagnetic waves in a
dielectric medium is important from a basic aspect of electromagnetism. Moreover,
it is deeply connected to optical applications including optical devices such as
waveguides and lasers. In light of recent progress of the laser devices, we provide
a tangible example of organic lasers as newly occurring laser devices.
The motion of particles as well as spatial and temporal change in, e.g., electro-
magnetic fields are very often described in terms of differential equations. We
describe introductory methods of Green’s functions in order to solve those differ-
ential equations.
Chapter 7
Maxwell’s Equations

Maxwell’s equations consist of four first-order partial differential equations. First we


deal with basic properties of Maxwell’s equations. Next we show how equations of
electromagnetic wave motion are derived from Maxwell’s equations along vector
analysis. It is important to realize that the generation of the electromagnetic wave is a
direct consequence of the interplay between the electric field and magnetic field that
both change with time. We deal with behaviors of electromagnetic waves in dielec-
tric media where no true charge exists. At a first glance, this restriction seems to
narrow a range of application of principles of electromagnetism. In practice, how-
ever, such a situation is universalistic; topics cover a wide range of electromagnetic
phenomena, e.g., light propagation in dielectrics including water, glass, polymers,
etc. Polarized properties characterize the electromagnetic waves. These include
linear, circular, and elliptic polarizations. The characteristics are important both
from a fundamental aspect and from the point of view of optical applications.

7.1 Maxwell’s Equations and Their Characteristics

In this chapter, we first represent Maxwell’s equations as vector forms. The equa-
tions are represented as a differential form that is consistent with a viewpoint based
on “action trough medium.” The equation of wave motion (or wave equation) is
naturally derived from these equations.
Maxwell’s equations of electromagnetism are expressed as follows:

div D = ρ, ð7:1Þ

div B = 0, ð7:2Þ

© Springer Nature Singapore Pte Ltd. 2023 277


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_7
278 7 Maxwell’s Equations

∂B
rot E þ = 0, ð7:3Þ
∂t

∂D
rot H - = i: ð7:4Þ
∂t

In (7.3), RHS denotes a zero vector. Let us take some time to get acquainted with
physical quantities with their dimension as well as basic ideas and concepts along
with definitions of electromagnetism.
m2 = m2] (or electric displacement);
The quantity D is called electric flux density [As C

ρ is electric charge density [m3 ]. We describe vector quantities V as in (3.4):


C

Vx
V = ð e1 e 2 e 3 Þ Vy : ð7:5Þ
Vz

The notation div denotes a differential operator such that

∂V x ∂V y ∂V z
div V  þ þ : ð7:6Þ
∂x ∂y ∂z

Thus, the div operator converts a vector to a scalar. The quantities D and the
electric field E [V/m] are associated with the following expression:

D = εE, ð7:7Þ
2
where ε [Nm
C
2] is called a dielectric constant (or permittivity) of the dielectric medium.

The dimension can be understood from the following Coulomb’s law that describes a
force exerted between two charges:

1 QQ0
F= , ð7:8Þ
4πε r 2

where F is the force; Q and Q′ are electric charges of the two charges; r is a distance
between the two charges. Equation (7.1) represents Gauss’ law of electrostatics.
The electric charge of 1 Coulomb (1 [C]) is defined as follows: Suppose that two
point charges having the same electric charge are placed in vacuum 1 m apart. In that
situation, if a force F between the two charges is

F = ~c2 =107 ½N,

where ~c is a light velocity, then we define the electric charge which each point charge
possesses as 1 [C]. Here, note that ~c is a dimensionless number related to the light
velocity in vacuum that is measured in a unit of [m/s]. Notice also that the light
velocity c in vacuum is defined as
7.1 Maxwell’s Equations and Their Characteristics 279

c = 299 792 458 ½m=s ðexact numberÞ:

Thus, we have

~c = 299 792 458 ðdimensionless numberÞ:

Vacuum is a kind of dielectric media. From (7.8), its dielectric constant ε0 is


defined by

107 C2 C2
ε0 = 2 2
≈ 8:854 x 10 - 12 : ð7:9Þ
4π~c Nm Nm2

Meanwhile, 1 s (second) has been defined from a certain spectral line emitted
from 133Cs. Thus, 1 m (meter) is defined by

ðdistance along which light is propagated in vacuum during 1 sÞ=299 792 458:

The quantity B is called magnetic flux density [Vs


m2 = m2 ]. Equation (7.2) repre-
Wb

sents Gauss’s law of magnetostatics. In contrast to (7.1), RHS of (7.2) is zero. This
corresponds to the fact that although a true charge exists, a true “magnetic charge”
does not exist. (More precisely, such charge has not been detected so far.) The
quantity B is connected with magnetic field H by the following relation:

B = μH, ð7:10Þ

where μ [AN2 ] is said to be permeability (or magnetic permeability). Thus, H has a


dimension [m A
]. Permeability of vacuum μ0 is defined by

4π N
μ0 = : ð7:11Þ
107 A2

Also we have

μ0 ε0 = 1=c2 : ð7:12Þ

We will encounter this relation later again. Since the quantities c and ε0 have a
defined magnitude, so does μ0 from (7.12).
We often make an issue of a relative magnitude of dielectric constant and
magnetic permeability of a dielectric medium. That is, relative permittivity εr and
relative permeability μr are defined as follows:

εr  ε=ε0 and μr  μ=μ0 : ð7:13Þ


280 7 Maxwell’s Equations

Note that both εr and μr are dimensionless quantities. Those magnitudes are equal
to 1 (in the case of vacuum) or larger than 1 (with any other dielectric media).
Equations (7.3) and (7.4) deal with the change in electric and magnetic fields with
time. Of these, (7.3) represents Faraday’s law of electromagnetic induction due to
Michael Faraday (1831). He found that when a permanent magnet was thrust into or
out of a closed circuit, the transient current flowed. Moreover, that experiment
implied that even without the closed circuit, an electric field was generated around
the space that changed the position relative to the permanent magnet. Equation (7.3)
is easier to understand if it is rewritten as follows:

∂B
rot E = - : ð7:14Þ
∂t

That is, the electric field E is generated in such a way that the induced electric
field (or induced current) tends to lessen the change in magnetic flux (or magnetic
field) produced by the permanent magnet (Lenz’s law). The minus sign in RHS
indicates that effect.
The rot operator appearing in (7.3) and (7.4) is defined by
e1 e2 e3
∂ ∂ ∂
rot V = — × V =
∂x ∂y ∂z
Vx Vy Vz

∂V z ∂V y ∂V x ∂V z ∂V y ∂V x
= - e1 þ - e2 þ - e3 : ð7:15Þ
∂y ∂z ∂z ∂x ∂x ∂y

The operator — has already appeared in (3.9). This operator transforms a vector to
a vector. Let us think of the meaning of the rot operator. Suppose that there is a
vector field that varies with time and spatial positions. Suppose also at some instant
the spatial distribution of the field varies as shown in Fig. 7.1, where a spiral vector
field V is present. For a z-component of rot V around the origin, we have

∂V y ∂V x ðV 1 Þy - ðV 3 Þy ðV 2 Þx - ðV 4 Þx
ðrot V Þz = - = lim - :
∂x ∂y Δx⟶0, Δy⟶0 Δx Δy

In the case of Fig. 7.1, (V1)y - (V3)y > 0 and (V2)x - (V4)x < 0 and, hence, we find
∂V
that rot V has a positive z-component. If Vz = 0 and ∂zy = ∂V ∂z
x
= 0, we find from
∂V
(7.15) that rot V possesses only the z-component. The equation ∂zy = ∂V ∂z
x
=0
implies that the vector field V is uniform in the direction of the z-axis. Thus, under
the above conditions the spiral vector field V is accompanied by the rot V vector field
that is directed toward the upper side of the plane of paper (i.e., the positive direction
of the z-axis).
Equation (7.4) can be rewritten as
7.1 Maxwell’s Equations and Their Characteristics 281

Fig. 7.1 Schematic


representation of a spiral
y
vector field V that yields
rot V
(0, y/2)
V2 V1

( x/2, 0) O x
( x/2, 0)
z

V3 V4
(0, y/2)

∂D
rot H = i þ : ð7:16Þ
∂t

Notice that ∂D
∂t
has the same dimension as A
m2 ] and is called displacement current.
Without this term, we have

rot H = i: ð7:17Þ

This relation is well known as Ampère’s law or Ampère’s circuital law (Andr-
é-Marie Ampère: 1827), which determines a magnetic field yielded by a stationary
current. Again with the aid of Fig. 7.1, (7.17) implies that the current given by i
produces spiral magnetic field.
Now, let us think of a change in amount of charges with time in a part of three-
dimensional closed space V surrounded by a closed surface S. It is given by

d ∂ρ
ρdV = dV = - i  ndS = - div idV, ð7:18Þ
dt V V ∂t S V

where n is an outward-directed normal unit vector; with the last equality we used
Gauss’s theorem. The Gauss’s theorem is described by

div idV = i  ndS:


V S
282 7 Maxwell’s Equations

Fig. 7.2 Intuitive diagram


that explains the Gauss’s
theorem

Figure 7.2 shows an intuitive diagram that explains the Gauss’s theorem. The
diagram shows a cross-section of the closed space V surrounded by a surface S. In
this case, imagine a cube or a hexahedron as V. The periphery is the cross-section of
the closed surface accordingly. Arrows in the diagram schematically represent div i
on individual fragments; only those of the center infinitesimal fragment are shown
with solid lines. The arrows of adjacent fragments cancel out each other and only the
components on the periphery are non-vanishing. Thus, the volume integration of
div i is converted to the surface integration of i. Readers are referred to appropriate
literature with the vector integration [1].
Consequently, from (7.18) we have

∂ρ
þ div i dV = 0: ð7:19Þ
V ∂t

Since V is arbitrarily chosen, we get

∂ρ
þ div i = 0: ð7:20Þ
∂t

The relation (7.20) is called a current continuity equation. This relation represents
the law of conservation of charge.
Meanwhile, taking div of both sides of (7.17), we have
7.1 Maxwell’s Equations and Their Characteristics 283

div rot H = div i: ð7:21Þ

The LHS of (7.21) reads

∂ ∂H z ∂H y ∂ ∂H x ∂H z ∂ ∂H y ∂H x
div rot H = - þ - þ -
∂x ∂y ∂z ∂y ∂z ∂x ∂z ∂x ∂y

2 2 2 2 2 2
∂ ∂ ∂ ∂ ∂ ∂
= - Hz þ - Hx þ - Hy
∂x∂y ∂y∂x ∂y∂z ∂z∂y ∂z∂x ∂x∂z

= 0: ð7:22Þ
2 2
∂ Hz
With the last equality of (7.22), we used the fact that if, e.g., ∂x∂y
and ∂∂y∂x
Hz
are
2 2
∂ Hz ∂ Hz
continuous and differentiable in a certain domain (x, y), = ∂x∂y ∂y∂x
. That is, we
assume “ordinary” functions for Hz, Hx, and Hy. Thus from (7.21), we have

div i = 0: ð7:23Þ

From (7.20), we also have

∂ρðx, t Þ
= 0, ð7:24Þ
∂t

where we explicitly show that ρ depends upon both x and t. Note that x is a position
vector described as (3.5). Therefore, (7.24) shows that ρ(x, t) is temporally constant
at a position x, consistent with the stationary current.
Nevertheless, we encounter a problem when ρ(x, t) is temporally varying. In other
words, (7.17) goes against the charge conservation law, when ρ(x, t) is temporally
varying. It was James Clerk Maxwell (1861–1862) that solved the problem by
introducing a concept of the displacement current. In fact, taking div of both sides
of (7.16), we have

∂div D ∂ρ
div rot H = div i þ = div i þ = 0, ð7:25Þ
∂t ∂t

where with the first equality we exchanged the order of differentiations with respect
to t and x; with the second equality we used (7.1). The last equality of (7.25) results
from (7.20). In other words, in virtue of the term of ∂D
∂t
, (7.4) is consistent with the
charge conservation law. Thus, the set of Maxwell’s Eqs. (7.1) to (7.4) supply us
with well-established base in natural science up until the present.
284 7 Maxwell’s Equations

Although the set of these equations describe spatial and temporal changes in
electric and magnetic fields in vacuum and matter including metal, in Part II we
confine ourselves to the changes in the electric and magnetic fields in a uniform
dielectric medium.

7.2 Equation of Wave Motion

If we further confine ourselves to the case where neither electric charge nor electric
current is present in a uniform dielectric medium, we can readily obtain equations of
wave motion regarding the electric and magnetic fields. That is,

div D = 0, ð7:26Þ

div B = 0, ð7:27Þ

∂B
rot E þ = 0, ð7:28Þ
∂t

∂D
rot H - = 0: ð7:29Þ
∂t

The relations (7.27) and (7.28) are identical to (7.2) and (7.3), respectively.
Let us start with a formula of vector analysis. First, we introduce a grad operator.
We have

∂f ∂f ∂f
grad f = — f = e1 þ e 2 þ e 3 :
∂x ∂y ∂z

That is, the grad operator transforms a scalar to a vector. We have a following
formula:

rot rot V = grad div V - — 2 V: ð7:30Þ

The operator — 2 has already appeared in (1.24). To show (7.30), we compare a x-


component of both sides of (7.30). That is,
7.2 Equation of Wave Motion 285

∂ ∂V y ∂V x ∂ ∂V x ∂V z
½rot rot V x = - - - , ð7:31Þ
∂y ∂x ∂y ∂z ∂z ∂x

∂ ∂V x ∂V y ∂V z
2 2 2
∂ Vx ∂ Vx ∂ Vx
grad div V - — 2 V x = þ þ - - -
∂x ∂x ∂y ∂z ∂x
2
∂y
2
∂z
2

∂ ∂V y ∂V z
2 2
∂ Vx ∂ Vx
= þ - - : ð7:32Þ
∂x ∂y ∂z ∂y
2
∂z
2

2 2
∂ V ∂ Vy 2 2
Again assuming that ∂y∂xy = ∂x∂y
and ∂∂z∂x
Vz
= ∂ Vz
∂x∂z
, we have

½rot rot V x = grad div V - — 2 V x : ð7:33Þ

Regarding y- and z-components, we have similar relations as well. Thus (7.30)


holds.
Taking rot of both sides of (7.28), we have

2
∂B ∂rotB ∂ E
rot rot E þ rot = grad div E - — 2 E þ = - — 2 E þ με 2
∂t ∂t ∂t
= 0, ð7:34Þ

where with the first equality we used (7.30) and for the second equality we used
(7.7), (7.10), and (7.29). Thus we have

2
∂ E
— 2 E = με 2
: ð7:35Þ
∂t

Similarly, from (7.29) we get

2
∂ H
— 2 H = με 2
: ð7:36Þ
∂t

Equations (7.35) and (7.36) are called the equations of wave motions for the
electric and magnetic fields.
To consider implications of these equations, let us think of for simplicity a
following equation in a one-dimensional space.
286 7 Maxwell’s Equations

2 2
∂ yðx, t Þ 1 ∂ yðx, t Þ
2
= , ð7:37Þ
∂x v2 ∂t
2

where y is an arbitrary scalar function that depends on x and t; v is a constant. Let f(x,
t) and g(x, t) be arbitrarily chosen functions. Then, f(x - vt) and g(x + vt) are two
solutions of (7.37). In fact, putting X = x - vt, we have

2 2
∂f ∂f ∂X ∂f ∂ f ∂ ∂f ∂X ∂ f
= = , = = , ð7:38Þ
∂x ∂X ∂x ∂X ∂x2 ∂X ∂X ∂x ∂X
2

2 2
∂f ∂f ∂X ∂f ∂ f ∂ ∂f ∂X ∂ f
= = ð- vÞ , = ð- vÞ = ð- vÞ2 : ð7:39Þ
∂t ∂X ∂t ∂X ∂t 2 ∂X ∂X ∂t ∂X
2

Deleting ∂2f/∂X2 from the second equation of (7.38) and the second equation of
(7.39), we recover

2 2
∂ f 1 ∂ f
2
= 2 2: ð7:40Þ
∂x v ∂t

Similarly, we get

2 2
∂ g 1 ∂ g
2
= 2 2
: ð7:41Þ
∂x v ∂t

Therefore, as a general solution y(x, t) we can take a superposition of f(x, t) and


g(x, t). That is,

yðx, t Þ = f ðx - vt Þ þ gðx þ vt Þ: ð7:42Þ

The implication of (7.42) is as follows: (i) The function f(x - vt) can be obtained
by parallel translation of f(x) by vt in a positive direction of x-axis. In other words,
f(x - vt) is obtained by translating f(x) by v in a unit of time in a positive direction of
x-axis, or the function represented by f(x) is translated at a rate of v with its form
unchanged in time. (ii) The function g(x + vt), however, is translated at a rate of -v
with its form unchanged in time as well. (iii) Thus, y(x, t) of (7.42) represents two
“waves,” i.e., a forward wave and a backward wave. Propagation velocity of the two
waves is jvj accordingly. Usually we choose a positive number for v and v is called a
phase velocity.
Comparing (7.35) and (7.36) with (7.41), we have
7.2 Equation of Wave Motion 287

με = 1=v2 : ð7:43Þ

In particular, in a vacuum we recover (7.12).


Notice that f and g can take any functional form and, hence, they are not
necessarily a periodic wave. Yet, what we are mostly concerned with is a periodic
wave such as sinusoidal waves. Thus, we arrive at a following functional form:

f ðx - vt Þ = Aeiðx - vtÞ , ð7:44Þ

where A is said to be an amplitude of the wave. The constant A usually takes a


positive number, but it may take a complex number including a negative number. An
exponent of (7.44) contains a number having a dimension [m]. To make it a
dimensionless number, we multiply (x - vt) by a wavenumber k that has been
introduced in (1.2) and (1.3).
That is, we have

~f ½k ðx - vt Þ = Aeikðx - vtÞ = Aeiðkx - kvtÞ = Aeiðkx - ωtÞ , ð7:45Þ

where ~f shows the change in the functional from according to the variable transfor-
mation. In (7.45), we have


kv = kλν = λν = 2πν = ω, ð7:46Þ
λ

where ν and ω are said to be frequency and angular frequency, respectively. For a
three-dimensional wave f of a scalar function, we have a following form:

f = Aeiðkx - ωtÞ = A½cosðk  x - ωt Þ þ i sinðk  x - ωt Þ, ð7:47Þ

where k is said to be a wavenumber vector. In (7.47),

k2 = k2 = k2x þ k2y þ k 2z : ð7:48Þ

Equation (7.47) is virtually identical to (1.25). When we deal with a problem of


classical electromagnetism, we usually take a real part of the results after relevant
calculations.
Suppose that (7.47) is a solution of a following wave equation:

2
1 ∂ f
— 2f = : ð7:49Þ
v2 ∂t 2

Substituting LHS of (7.47) for f of (7.49), we have


288 7 Maxwell’s Equations

1
A - k2 eiðkx - ωtÞ = A - ω2 eiðkx - ωtÞ : ð7:50Þ
v2

Comparing both sides of (7.50), we get

k 2 v 2 = ω2 or kv = ω: ð7:51Þ

Thus, we recover (7.46).


Here we introduce a unit vector n as in (1.3) whose direction parallels that of
propagation of wave such that

2π nx
k = kn = n, n = ðe1 e2 e3 Þ ny , ð7:52Þ
λ nz

where nx, ny, and nz define direction cosines. Then (7.47) can be rewritten as

f = Aeiðknx - ωtÞ , ð7:53Þ

where an exponent is called a phase. Suppose that the phase is fixed at zero. That is,

kn  x - ωt = 0: ð7:54Þ

Since ω = kv, we have

n  x - vt = 0: ð7:55Þ

Equation (7.55) defines a plane in a three-dimensional space and is called a


Hesse’s normal form. Figure 7.3 schematically represents a plane wave of the field
f. The field has the same phase on a plane P. A solid arrow x represents an arbitrary
position vector on the plane and n is a unit vector perpendicular to the plane P (i.e.,
parallel to a normal of the plane P). The quantity vt defines a length of a perpendic-
ular that connects the origin O and plane P (i.e., the length of the perpendicular from
the origin and a foot of the perpendicular) at a given time t.
In other words, (7.54) determines a plane in such a way that the wave f has the
same phase (zero) at a given time t at position vectors x on the plane determined by
(7.54) or (7.55). That plane is moving in the direction of n at a phase velocity v. From
this situation, a wave f described by (7.53) is called a plane wave.
A refractive index n of a dielectric media is an important index that characterizes
its dielectric properties. It is defined as
p
n  c=v = με=μ0 ε0 = μr εr : ð7:56Þ

In a non-magnetic substance such as glass and polymer materials, we can assume


that μr ≈ 1. Thus, we get an approximate expression as follows:
7.3 Polarized Characteristics of Electromagnetic Waves 289

Fig. 7.3 Schematic


representation of a plane
z
wave. The field has the same
phase on a plane P. A solid
arrow x represents an
arbitrary position vector on
the plane and n is a unit
vector perpendicular to the
plane P (i.e., parallel to a
x n
normal of the plane P). The
quantity v is a phase velocity
of the plane wave

vt P
y
O

x
p
n ≈ εr : ð7:57Þ

7.3 Polarized Characteristics of Electromagnetic Waves

As in (7.47), we assume a similar form for a solution of (7.35) such that

E = E0 eiðkx - ωtÞ = E0 eiðknx - ωtÞ , ð7:58Þ

where with the second equality we used (7.52). Similarly, we have

H = H 0 eiðkx - ωtÞ = H 0 eiðknx - ωtÞ : ð7:59Þ

In (7.58) and (7.59), E0 and H0 are constant vectors and may take a complex
magnitude. Substituting (7.58) for (7.28) and using (7.10) as well as (7.43) and
(7.46), we have

ikn × E0 eiðknx - ωtÞ = - ð- iωÞμH 0 eiðknx - ωtÞ = ivkμH 0 eiðknx - ωtÞ


290 7 Maxwell’s Equations

Fig. 7.4 Mutual geometry


of E and H for an
z
electromagnetic plane wave
in P. E and H have the same
phase on P at an arbitrary
given time. The unit vector
n is perpendicular to P
P
E n

H y
O

= ik μ=εH 0 eiðknx - ωtÞ :

Comparing coefficients of the exponential functions of the first and last sides,
we get

H 0 = n × E0 = μ=ε : ð7:60Þ

Similarly, substituting (7.59) for (7.29) and using (7.7), we get

E0 = μ=ε H 0 × n: ð7:61Þ

From (7.26) and (7.27), we have

n  E0 = n  H 0 = 0: ð7:62Þ

This indicates that E and H are both perpendicular to n, i.e., the propagation
direction of the electromagnetic wave. Thus, the electromagnetic wave is character-
ized by a transverse wave. The fields E and H have the same phase on P at an
arbitrary given time. Taking account of (7.60)–(7.62), E, H, and n are mutually
perpendicular to one another. We depict a geometry of E, H, and n for the
electromagnetic plane wave in Fig. 7.4, where a plane P is perpendicular to n.
We find that (7.60) is not independent of (7.61). In fact, taking an outer product
from the right with respect to both sides of (7.60), we have
7.3 Polarized Characteristics of Electromagnetic Waves 291

H 0 × n = n × E0 × n= μ=ε = E0 = μ=ε ,

where we used (7.61) with the second equality. Thus, we get

n × E0 × n = E0 : ð7:63Þ

Meanwhile, vector analysis tells us that [1]

C × ðA × BÞ = AðB  CÞ - BðC  AÞ:

In the above, putting B = C = n, we have

n × ðA × nÞ = Aðn  nÞ - nðn  AÞ:

That is, we have

A = nðn  AÞ þ n × ðA × nÞ:

This relation means that A can be decomposed into a component parallel to n and
that perpendicular to n. Equation (7.63) shows that E0 has no component parallel to
n. This is another confirmation that E is perpendicular to n.
In (7.60) and (7.61), μ=ε has a dimension [Ω]. Make sure that this can be
confirmed by (7.9) and (7.11). Hence, μ=ε is said to be characteristic impedance
[2]. We denote it by

Z μ=ε: ð7:64Þ

Thus, we have

H 0 = ðn × E0 Þ=Z:

In vacuum we have

Z0 = μ0 =ε0 ≈ 376:7 ½Ω:

For the electromagnetic wave to be the transverse wave means that neither E nor
H has component along n. Choosing the positive direction of the z-axis for n and
ignoring components related to partial differentiation with respect to x and y (i.e., the
component related to ∂/∂x and ∂/∂y), we rewrite (7.28) and (7.29) for individual
Cartesian coordinates as
292 7 Maxwell’s Equations

∂E y ∂Bx ∂Dy ∂H x
- þ =0 or - þ με = 0,
∂z ∂t ∂z ∂t

∂E x ∂By ∂Dx ∂H y
þ =0 or þ με = 0,
∂z ∂t ∂z ∂t

∂H y ∂Dx ∂By ∂x
- - =0 or - - με = 0,
∂z ∂t ∂z ∂t

∂H x ∂Dy ∂Bx ∂E y
- =0 or - με = 0: ð7:65Þ
∂z ∂t ∂z ∂t

We differentiate the first equation of (7.65) with respect to z to get

2
∂ Ey 2
∂ Bx
- þ = 0:
∂z
2 ∂z∂t

Also differentiating the fourth equation of (7.65) with respect to t and multiplying
both sides by -μ, we have

2
2
∂ Hx ∂ Dy
-μ þμ = 0:
∂t∂z ∂t
2

Summing both sides of the above equations and arranging terms, we get

2 2
∂ Ey ∂ Ey
2
= με 2
:
∂z ∂t

In a similar manner, we have

2 2
∂ Ex ∂ E
2
= με 2 x :
∂z ∂t

Similarly, for the magnetic field, we also get

2 2
2
∂ Hx
2
∂ Hx ∂ Hy ∂ Hy
2
= με 2
and 2
= με 2
:
∂z ∂t ∂z ∂t

From the above relations, we have two plane electromagnetic waves polarized
either the x-axis or y-axis.
7.3 Polarized Characteristics of Electromagnetic Waves 293

What is implied in the above description is that as solutions of (7.35) and (7.36)
we have a plane wave characterized by a specific direction defined by E0 and H0.
This implies that if we observe the electromagnetic wave at a fixed point, both E and
H oscillate along the mutually perpendicular directions E0 and H0. Hence, we say
that the “electric wave” is polarized in the direction E0 and that the “magnetic wave”
is polarized in the direction H0.
To characterize the polarization of the electromagnetic wave, we introduce the following unit polarization vectors εe and εm (with indices e and m related to the electric and magnetic fields, respectively) [3]:

\[ \boldsymbol{\varepsilon}_e = \mathbf{E}_0/E_0 \quad\text{and}\quad \boldsymbol{\varepsilon}_m = \mathbf{H}_0/H_0; \qquad H_0 = E_0/Z, \tag{7.66} \]

where E0 and H0 are said to be amplitudes and may again be complex. We have

\[ \boldsymbol{\varepsilon}_e\times\boldsymbol{\varepsilon}_m = \mathbf{n}. \tag{7.67} \]

We call εe and εm a unit polarization vector of the electric field and magnetic field,
respectively. As noted above, εe, εm, and n constitute a right-handed system in this
order and are mutually perpendicular to one another.
The phase of E in the plane wave (7.58) and that of H in (7.59) are individually
the same on all the points of P. From the wave equations of (7.35) and (7.36),
however, it is unclear whether E and H have the same phase. Suppose that E and H
would have a different phase such that

\[ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)} \]

and

\[ \mathbf{H} = \mathbf{H}_0 e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t+\delta)} = \mathbf{H}_0 e^{i\delta}e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)} = \tilde{\mathbf{H}}_0 e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)}, \]

where \(\tilde{\mathbf{H}}_0\ (=\mathbf{H}_0 e^{i\delta})\) is complex and the phase factor e^{iδ} in the exponent is unknown. This factor, however, can be set at zero. To show this, let us make a qualitative discussion using Fig. 7.5. Figure 7.5 shows the electric field of the plane wave at some instant as a function of the phase Φ. Suppose that Φ is taken in the direction of n in Fig. 7.4. Then, from (7.53) we have

\[ \Phi = k\mathbf{n}\cdot\mathbf{x} - \omega t = k\mathbf{n}\cdot(\rho\mathbf{n}) - \omega t = k\rho - \omega t, \]

where ρ is the distance from the origin. Also we have

\[ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{x}-\omega t)} = E_0\boldsymbol{\varepsilon}_e e^{i(k\rho-\omega t)} \quad (E_0 > 0). \]

Suppose furthermore that the phase is measured at t = 0 and that the electric field
is measured along the εe direction. Then, we have
[Fig. 7.5: Electric field of a plane wave at some instant as a function of phase. The sinusoidal curve representing the electric field is shifted with time from left to right with its form unchanged. Two phase points P1 and P2 are marked on the phase axis.]

\[ \Phi = k\rho \quad\text{and}\quad E = |\mathbf{E}| = E_0 e^{ik\rho}. \tag{7.68} \]

In Fig. 7.5, the real part of E is shown with E = E0 at ρ = 0. The sinusoidal curve representing the electric field in Fig. 7.5 is shifted with time from left to right with its form unchanged.
In this situation, the electric field is to be strengthened with a negative value at a
phase P1 with time according to (7.68), whereas at another phase P2 the electric field
is to be strengthened with a positive value with time. As a result, in a region tucked
between P1 and P2 a spiral magnetic field is generated toward the εm direction, i.e.,
upper side of a plane of paper. The magnitude of the magnetic field is expected to be
maximized at a center point of P1P2 (i.e., the origin of the coordinate system) where
the electric field is maximized as well. Thus, we conclude that E and H have the
same phase. It is important to realize that the generation of the electromagnetic wave
is a direct consequence of the interplay between the electric field and magnetic field
that both change with time. It is truly based upon the nature of the electric and
magnetic fields that are clearly represented in Maxwell’s equations.
Next, we consider the situation where two electromagnetic waves are propagated in the same direction but with a different phase. Notice again that we are considering an electromagnetic wave that is propagated in a uniform and infinite dielectric medium without boundary conditions (BCs).

7.4 Superposition of Two Electromagnetic Waves

Let E1 and E2 be two electric waves described such that

\[ \mathbf{E}_1 = E_1\mathbf{e}_1 e^{i(kz-\omega t)} \quad\text{and}\quad \mathbf{E}_2 = E_2\mathbf{e}_2 e^{i(kz-\omega t+\delta)}, \]

where E1 (>0) and E2 (>0) are amplitudes and e1 and e2 represent unit polarization vectors in the directions of the positive x- and y-axes, respectively; we assume that the two waves are being propagated in the direction of the positive z-axis; δ is a phase difference. The total electric field E is described as the superposition of E1 and E2 such that

\[ \mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2 = E_1\mathbf{e}_1 e^{i(kz-\omega t)} + E_2\mathbf{e}_2 e^{i(kz-\omega t+\delta)}. \tag{7.69} \]

Note that we usually discuss the polarization characteristics of an electromagnetic wave only by considering the electric waves. We emphasize that an electric wave and the concomitant magnetic wave share the same phase in a uniform and infinite dielectric medium. A reason why the electric wave represents an electromagnetic wave is partly because optical application is mostly made in non-magnetic substances such as glass, water, plastics, and most semiconductors.

Let us view the temporal change of E at a fixed point x = 0, i.e., x = y = z = 0. Then, taking the real part of (7.69), the x- and y-components of E, i.e., Ex and Ey, are expressed as

\[ E_x = E_1\cos(-\omega t) \quad\text{and}\quad E_y = E_2\cos(-\omega t+\delta). \tag{7.70} \]

First, let us briefly think of the case where δ = 0. Eliminating t, we have

\[ E_y = \frac{E_2}{E_1}E_x. \tag{7.71} \]

This is an equation of a straight line. The resulting electric field E is accordingly called a linearly polarized light. That is, when we are observing the electric field of the relevant light at the origin, the field is oscillating along the straight line described by (7.71) with the origin at the center of the oscillating field. If δ = π, we have

\[ E_y = -\frac{E_2}{E_1}E_x. \]

This gives a straight line as well. Therefore, if we wish to seek the relationship between Ex and Ey, it suffices to examine it as a function of δ in the region -π/2 ≤ δ ≤ π/2.
(i) Case I: E1 ≠ E2.
Let us consider the case where δ ≠ 0 in (7.70). Rewriting the second equation of (7.70) and inserting the first equation into it so that we can eliminate t, we have

\[ E_y = E_2(\cos\omega t\cos\delta + \sin\omega t\sin\delta) = E_2\left[\frac{E_x}{E_1}\cos\delta \pm \sin\delta\sqrt{1-\frac{E_x^2}{E_1^2}}\right]. \]

Rearranging terms of the above equation, we have

\[ \frac{E_y}{E_2} - (\cos\delta)\frac{E_x}{E_1} = \pm(\sin\delta)\sqrt{1-\frac{E_x^2}{E_1^2}}. \tag{7.72} \]

Squaring both sides of (7.72) and arranging the equation, we get

\[ \frac{E_x^2}{E_1^2\sin^2\delta} - \frac{2(\cos\delta)E_xE_y}{E_1E_2\sin^2\delta} + \frac{E_y^2}{E_2^2\sin^2\delta} = 1. \tag{7.73} \]

Using a matrix form, we have

\[ (E_x\ E_y)\begin{pmatrix} \dfrac{1}{E_1^2\sin^2\delta} & -\dfrac{\cos\delta}{E_1E_2\sin^2\delta} \\[3mm] -\dfrac{\cos\delta}{E_1E_2\sin^2\delta} & \dfrac{1}{E_2^2\sin^2\delta} \end{pmatrix}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = 1. \tag{7.74} \]

Note that the above matrix is real symmetric. In that case, to examine the properties of the matrix we calculate its determinant along with the principal minors. A principal minor means a minor with respect to a diagonal element. In this case, the two principal minors are \(1/(E_1^2\sin^2\delta)\) and \(1/(E_2^2\sin^2\delta)\). Also we have

\[ \begin{vmatrix} \dfrac{1}{E_1^2\sin^2\delta} & -\dfrac{\cos\delta}{E_1E_2\sin^2\delta} \\[3mm] -\dfrac{\cos\delta}{E_1E_2\sin^2\delta} & \dfrac{1}{E_2^2\sin^2\delta} \end{vmatrix} = \frac{1}{E_1^2E_2^2\sin^2\delta}. \tag{7.75} \]

Evidently, the two principal minors as well as the determinant are all positive (δ ≠ 0). In this case, the (2, 2) matrix of (7.74) is said to be positive definite. The related discussion will be given in Part III. The positive definiteness means that in the quadratic form described by (7.74), the LHS takes a positive value for any real numbers Ex and Ey except the unique case Ex = Ey = 0, which renders the LHS zero. The positive definiteness of a matrix ensures the existence of positive eigenvalues of the said matrix.

Let us consider a real symmetric (2, 2) matrix that has positive principal minors and a positive determinant in a general case. Let such a matrix M be

\[ M = \begin{pmatrix} a & c \\ c & b \end{pmatrix}, \]

where a, b > 0 and det M > 0; i.e., ab - c² > 0. Let the corresponding quadratic form be Q. Then, we have

\[ Q = (x\ y)\begin{pmatrix} a & c \\ c & b \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = ax^2 + 2cxy + by^2 = a\left[\left(x+\frac{cy}{a}\right)^2 - \frac{c^2y^2}{a^2} + \frac{aby^2}{a^2}\right] = a\left[\left(x+\frac{cy}{a}\right)^2 + (ab-c^2)\frac{y^2}{a^2}\right]. \]

Thus, Q ≥ 0 for any real numbers x and y. We seek the condition under which Q = 0. We readily find that with M having the above properties, only x = y = 0 makes Q = 0. Thus, M is positive definite. We will deal with this issue from a more general standpoint in Part III.
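A small numerical sketch of this criterion may help; the numbers a, b, c below are assumed merely for illustration and satisfy a, b > 0 and ab - c² > 0:

```python
# A sketch of positive definiteness for M = [[a, c], [c, b]] with positive
# principal minors and positive determinant (sample numbers assumed).
import numpy as np

a, b, c = 2.0, 3.0, 1.5                   # assumed; a, b > 0 and a*b - c**2 > 0
M = np.array([[a, c], [c, b]])

print(np.linalg.eigvalsh(M))              # both eigenvalues are positive
rng = np.random.default_rng(0)
for x, y in rng.normal(size=(5, 2)):      # a few random (x, y) sample points
    Q = a * x**2 + 2 * c * x * y + b * y**2
    print(Q > 0)                          # -> True whenever (x, y) != (0, 0)
```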
In general, it is pretty complicated to seek the eigenvalues and corresponding eigenvectors in the above case. Yet, we can extract important information from (7.74). The eigenvalues λ of the matrix that appeared in (7.74) are estimated as follows:

\[ \lambda = \frac{E_1^2 + E_2^2 \pm \sqrt{(E_1^2+E_2^2)^2 - 4E_1^2E_2^2\sin^2\delta}}{2E_1^2E_2^2\sin^2\delta}. \tag{7.76} \]

Notice that λ in (7.76) represents two different positive eigenvalues. This is because the inside of the square root can be rewritten as

\[ (E_1^2 - E_2^2)^2 + 4E_1^2E_2^2\cos^2\delta > 0 \quad (\delta \neq \pm\pi/2). \]

Also we have

\[ E_1^2 + E_2^2 > \sqrt{(E_1^2+E_2^2)^2 - 4E_1^2E_2^2\sin^2\delta}. \]

These clearly show that the quadratic form of (7.74) gives an ellipse (i.e., elliptically polarized light). Because of the presence of the second term on the LHS of (7.73), both the major and minor axes of the ellipse are tilted away from the x- and y-axes.
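The closed form (7.76) can be cross-checked against a direct numerical diagonalization; the amplitudes and phase below are assumed sample values:

```python
# Cross-check of (7.76): eigenvalues of the matrix in (7.74) from the closed
# form versus direct diagonalization (E1, E2, delta assumed for illustration).
import numpy as np

E1, E2, delta = 2.0, 1.0, 0.6
s2 = np.sin(delta)**2
M = np.array([[1/(E1**2*s2),              -np.cos(delta)/(E1*E2*s2)],
              [-np.cos(delta)/(E1*E2*s2),  1/(E2**2*s2)]])

root = np.sqrt((E1**2 + E2**2)**2 - 4*E1**2*E2**2*s2)
lam = (E1**2 + E2**2 + np.array([root, -root])) / (2*E1**2*E2**2*s2)  # (7.76)
print(np.allclose(sorted(lam), sorted(np.linalg.eigvalsh(M))))        # -> True
```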
Let us inspect the ellipse described by (7.74). Inserting Ex = E1, obtained at t = 0 in (7.70), into (7.73) and solving the quadratic equation with respect to Ey, we get Ey as a double root such that

\[ E_y = E_2\cos\delta. \]

Similarly, putting Ey = E2 in (7.73), we have

\[ E_x = E_1\cos\delta. \]

These results show that the ellipse described by (7.73) or (7.74) is internally tangent to a rectangle as depicted in Fig. 7.6a. Equation (7.69) shows that the electromagnetic wave is propagated toward the positive direction of the z-axis. Therefore, in Fig. 7.6a we are peeking into the oncoming wave from the bottom of the plane of paper at a certain position of z = constant. We set the constant at 0. Then,
we find that at t = 0 the electric field is represented by the point P (Ex = E1, Ey = E2 cos δ); see Fig. 7.6a. From (7.70), if δ > 0, P traces the ellipse counter-clockwise. It reaches the maximum point of Ey = E2 at t = δ/ω. Since the trace of the electric field forms an ellipse as in Fig. 7.6, the associated light is said to be an elliptically polarized light. If δ < 0 in (7.70), however, P traces the ellipse clockwise.

[Fig. 7.6: Trace of the electric field of an elliptically polarized light. (a) The trace is internally tangent to a rectangle of 2E1 × 2E2. In the case of δ > 0, starting from P at t = 0, the coordinate point representing the electric field traces the ellipse counter-clockwise with time. (b) The trace of an elliptically polarized light for δ = π/2.]

In the special case of δ = π/2, the second term of (7.73) vanishes and we have the simple form

\[ \frac{E_x^2}{E_1^2} + \frac{E_y^2}{E_2^2} = 1. \tag{7.77} \]

Thus, the principal axes of the ellipse coincide with the x- and y-axes. On the basis
of (7.70), we see from Fig. 7.6b that starting from P at t = 0, again the coordinate
point representing the electric field traces the ellipse counter-clockwise with time;
see the curved arrow of Fig. 7.6b. If δ < 0, the coordinate point traces the ellipse
clockwise with time.
(ii) Case II: E1 = E2.

Now, let us consider a simple but important case. When E1 = E2, (7.73) is simplified to

\[ E_x^2 - 2\cos\delta\,E_xE_y + E_y^2 = E_1^2\sin^2\delta. \tag{7.78} \]

Using a matrix form, we have

\[ (E_x\ E_y)\begin{pmatrix} 1 & -\cos\delta \\ -\cos\delta & 1 \end{pmatrix}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = E_1^2\sin^2\delta. \tag{7.79} \]

We obtain the eigenvalues λ of the matrix of (7.79) such that

\[ \lambda = 1 \pm |\cos\delta|. \tag{7.80} \]

Setting -π/2 ≤ δ ≤ π/2, we have

\[ \lambda = 1 \pm \cos\delta. \tag{7.81} \]

The corresponding normalized eigenvectors v1 and v2 (as column vectors) are

\[ \mathbf{v}_1 = \begin{pmatrix} \tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} \end{pmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{pmatrix} \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} \end{pmatrix}. \tag{7.82} \]

Thus, we have a diagonalizing unitary matrix P such that

\[ P = \begin{pmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{pmatrix}. \tag{7.83} \]

Defining the matrix appearing in (7.79) as A such that

\[ A = \begin{pmatrix} 1 & -\cos\delta \\ -\cos\delta & 1 \end{pmatrix}, \tag{7.84} \]

we obtain

\[ P^{-1}AP = \begin{pmatrix} 1+\cos\delta & 0 \\ 0 & 1-\cos\delta \end{pmatrix}. \tag{7.85} \]

Notice that eigenvalues (1 + cos δ) and (1 - cos δ) are both positive as expected.
Rewriting (7.79), we have
[Fig. 7.7: Relationship between the basis vectors (e1 e2) and (e1′ e2′) in the case of E1 = E2; see text.]

\[ (E_x\ E_y)PP^{-1}\begin{pmatrix} 1 & -\cos\delta \\ -\cos\delta & 1 \end{pmatrix}PP^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = (E_x\ E_y)P\begin{pmatrix} 1+\cos\delta & 0 \\ 0 & 1-\cos\delta \end{pmatrix}P^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = E_1^2\sin^2\delta. \tag{7.86} \]

Here, let us define new coordinates such that

\[ \begin{pmatrix} u \\ v \end{pmatrix} \equiv P^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix}. \tag{7.87} \]

This coordinate transformation corresponds to the transformation of the basis vectors (e1 e2) such that

\[ (\mathbf{e}_1\ \mathbf{e}_2)\begin{pmatrix} E_x \\ E_y \end{pmatrix} = (\mathbf{e}_1\ \mathbf{e}_2)PP^{-1}\begin{pmatrix} E_x \\ E_y \end{pmatrix} = (\mathbf{e}_1'\ \mathbf{e}_2')\begin{pmatrix} u \\ v \end{pmatrix}, \tag{7.88} \]

where the new basis vectors (e1′ e2′) are given by

\[ (\mathbf{e}_1'\ \mathbf{e}_2') = (\mathbf{e}_1\ \mathbf{e}_2)P = \left(\frac{1}{\sqrt{2}}\mathbf{e}_1 - \frac{1}{\sqrt{2}}\mathbf{e}_2 \quad \frac{1}{\sqrt{2}}\mathbf{e}_1 + \frac{1}{\sqrt{2}}\mathbf{e}_2\right). \tag{7.89} \]

The coordinate system is depicted in Fig. 7.7 along with the basis vectors. The relevant discussion will again appear in Part III.
Substituting (7.87) into (7.86) and rearranging terms, we get

\[ \frac{u^2}{E_1^2(1-\cos\delta)} + \frac{v^2}{E_1^2(1+\cos\delta)} = 1. \tag{7.90} \]

Equation (7.90) indicates that the major and minor axes are \(E_1\sqrt{1+\cos\delta}\) and \(E_1\sqrt{1-\cos\delta}\), respectively. When δ = ±π/2, (7.90) becomes

\[ \frac{u^2}{E_1^2} + \frac{v^2}{E_1^2} = 1. \tag{7.91} \]

This represents a circle. For this reason, the wave described by (7.91) is called a circularly polarized light. In (7.90) where δ ≠ ±π/2, the wave is said to be an elliptically polarized light. Thus, we have linearly, elliptically, and circularly polarized lights depending on the magnitude of δ.
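The diagonalization (7.85) and the resulting principal-axis form (7.90) can be illustrated numerically; the phase difference δ and amplitude E1 below are assumed values:

```python
# A sketch of the E1 = E2 case: diagonalizing the matrix A of (7.84) with P
# of (7.83) reproduces diag(1 + cos(delta), 1 - cos(delta)) as in (7.85).
import numpy as np

delta, E1 = np.pi / 3, 1.0                # assumed phase difference and amplitude
A = np.array([[1.0, -np.cos(delta)], [-np.cos(delta), 1.0]])
P = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2.0)

D = np.linalg.inv(P) @ A @ P              # Eq. (7.85)
print(np.round(D, 12))                    # -> [[1.5, 0], [0, 0.5]] for delta = pi/3

# semi-axes of the ellipse (7.90) along the u and v directions
print(E1*np.sqrt(1 - np.cos(delta)), E1*np.sqrt(1 + np.cos(delta)))
```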
Let us closely examine the characteristics of the elliptically and circularly polarized lights in the case of E1 = E2. When t = 0, from (7.70) we have

\[ E_x = E_1 \quad\text{and}\quad E_y = E_1\cos\delta. \tag{7.92} \]

This coordinate point corresponds to A1, whose Ex coordinate is E1 (see Fig. 7.8a). In the case of Δt = δ/2ω, Ex = Ey = E1 cos(±δ/2). This point corresponds to A2 in Fig. 7.8a. We have

\[ \sqrt{E_x^2 + E_y^2} = \sqrt{2}\,E_1\cos(\delta/2) = E_1\sqrt{1+\cos\delta}. \tag{7.93} \]

This is equal to the major axis, as anticipated. With t = Δt,

\[ E_x = E_1\cos(-\omega\Delta t) \quad\text{and}\quad E_y = E_1\cos(-\omega\Delta t+\delta). \tag{7.94} \]

Notice that Ey takes a maximum E1 when Δt = δ/ω. Consequently, if δ takes a positive value, Ey takes the maximum E1 for a positive Δt, as is similarly the case with Fig. 7.6a. At that time, Ex = E1 cos(-δ) < E1. This point corresponds to A3 in Fig. 7.8a. As a result, the electric field traces the ellipse counter-clockwise with time, as in the case of Fig. 7.6. If δ takes a negative value, however, the field traces the ellipse clockwise.

If δ = ±π/2, in (7.94) we have

\[ E_x = E_1\cos(-\omega t) \quad\text{and}\quad E_y = E_1\cos\left(-\omega t \pm \frac{\pi}{2}\right). \tag{7.95} \]

[Fig. 7.8: Polarized features of light in the case of E1 = E2. (a) If δ > 0, the electric field traces an ellipse from A1 via A2 to A3 (see text). (b) If δ = π/2, the electric field traces a circle from C1 via C2 to C3 (left-circularly polarized light).]

We examine the case of δ = π/2 first. In this case, when t = 0, Ex = E1 and Ey = 0 (Point C1 in Fig. 7.8b). If ωt = π/4, Ex = Ey = E1/√2 (Point C2 in Fig. 7.8b). In turn, if ωt = π/2, Ex = 0 and Ey = E1 (Point C3). Again the electric field traces the circle

counter-clockwise. In this situation, we see the light from above the z-axis. In other
words, we are viewing the light against the direction of its propagation. The wave is
said to be left-circularly polarized and have positive helicity. In contrast, when
δ = - π/2, starting from Point C1 the electric field traces the circle clockwise.
That light is said to be right-circularly polarized and have negative helicity.
With the left-circularly polarized light, (7.69) can be rewritten as

\[ \mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2 = E_1(\mathbf{e}_1 + i\mathbf{e}_2)e^{i(kz-\omega t)}. \tag{7.96} \]

Therefore, a complex vector (e1 + ie2) characterizes the left-circular polarization. On the other hand, (e1 - ie2) characterizes the right-circular polarization. To normalize them, it is convenient to use the following vectors as in the case of Sect. 4.3 [3]:

\[ \mathbf{e}_+ \equiv \frac{1}{\sqrt{2}}(\mathbf{e}_1 + i\mathbf{e}_2) \quad\text{and}\quad \mathbf{e}_- \equiv \frac{1}{\sqrt{2}}(\mathbf{e}_1 - i\mathbf{e}_2). \tag{4.45} \]

In the case of δ = 0, we have a linearly polarized light. For this, the points A1, A2, and A3 coalesce into a point on the straight line Ey = Ex.
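The circular-polarization basis (4.45) is orthonormal under the complex (Hermitian) inner product, as the short sketch below confirms:

```python
# Orthonormality of the circular-polarization basis vectors of (4.45).
import numpy as np

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
e_plus = (e1 + 1j * e2) / np.sqrt(2.0)   # left-circular polarization
e_minus = (e1 - 1j * e2) / np.sqrt(2.0)  # right-circular polarization

print(np.vdot(e_plus, e_plus))           # -> (1+0j): normalized
print(np.vdot(e_plus, e_minus))          # -> 0j: mutually orthogonal
```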

References

1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
2. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
3. Jackson JD (1999) Classical electrodynamics, 3rd edn. Wiley, New York
Chapter 8
Reflection and Transmission of Electromagnetic Waves in Dielectric Media

In Chap. 7, we considered the propagation of electromagnetic waves in an infinite uniform dielectric medium. In this chapter, we think of a situation where two (or more) dielectrics are in contact with each other at a plane interface. When two
(or more) dielectrics are in contact with each other at a plane interface. When two
dielectric media adjoin each other with an interface, propagating electromagnetic
waves are partly reflected by the interface and partly transmitted beyond the inter-
face. We deal with these phenomena in terms of characteristic impedance of the
dielectric media. In the case of an oblique incidence of a wave, we categorize it into a
transverse electric (TE) wave and transverse magnetic (TM) wave. If a thin plate of a
dielectric is sandwiched by a couple of metal sheets, the electromagnetic wave is
confined within the dielectric. In this case, the propagating mode of the wave differs
from that of a wave propagating in a free space (i.e., a space filled by a three-
dimensionally infinite dielectric medium). If a thin plate of a dielectric having a large
refractive index is sandwiched by a couple of dielectrics with a smaller refractive
index, the electromagnetic wave is also confined within the dielectric with a larger
index. In this case, we have to take account of the total reflection that causes a phase
change upon reflection. We deal with such specific modes of the electromagnetic
wave propagation. These phenomena are treated both from a basic aspect and from a
point of view of device application. The relevant devices are called waveguides in
optics.

8.1 Electromagnetic Fields at an Interface

We start with examining a condition of an electromagnetic field at the plane interface. Suppose that two semi-infinite dielectric media D1 and D2 are in contact with each other at a plane interface. Let us take a small rectangle S that straddles the interface (see Fig. 8.1). Taking a surface integral of both sides of (7.28) over the strip, we have


[Fig. 8.1: A small rectangle S that straddles an interface formed by two semi-infinite dielectric media D1 and D2. Let a curve C be a closed loop surrounding the rectangle S. A unit vector n is directed to a normal of S. Unit vectors t1 and t2 are directed along a tangential line of the interface plane.]

[Fig. 8.2: Diagram that intuitively explains Stokes' theorem. In the diagram a surface S is encircled by a closed curve C. An infinitesimal portion of C is denoted by dl. The surface S is pertinent to the surface integration. A spiral vector field E is present on and near S.]

\[ \int_S \operatorname{rot}\mathbf{E}\cdot\mathbf{n}\,dS + \int_S \frac{\partial\mathbf{B}}{\partial t}\cdot\mathbf{n}\,dS = 0, \tag{8.1} \]

where n is a unit vector directed to a normal of S as shown. Applying Stokes' theorem to the first term of (8.1), we get

\[ \oint_C \mathbf{E}\cdot d\mathbf{l} + \frac{\partial\mathbf{B}}{\partial t}\cdot\mathbf{n}\,\Delta l\,\Delta h = 0. \tag{8.2} \]

With the line integral of the first term, C is a closed loop surrounding the rectangle
S and dl = tdl, where t is a unit vector directed toward the tangential direction of
C (see t1 and t2 in Fig. 8.1). The line integration is performed such that C is followed
counterclockwise in the direction of t.
Figure 8.2 gives an intuitive diagram that explains Stokes’ theorem. The diagram
shows an overview of a surface S encircled by a closed curve C. Suppose that we
have a spiral vector field E represented by arrowed circles as shown. In that case,
rot E is directed toward the upper side of the plane of paper in the individual
fragments. A summation of rot E · n dS forms a surface integral covering S.
Meanwhile, the arrows of adjacent fragments cancel out each other and only the
components on the periphery (i.e., the curve C) are non-vanishing (see Fig. 8.2).
Thus, the surface integral of rot E is equivalent to the line integral of E. Accordingly, we get Stokes' theorem described by [1]

\[ \int_S \operatorname{rot}\mathbf{E}\cdot\mathbf{n}\,dS = \oint_C \mathbf{E}\cdot d\mathbf{l}. \tag{8.3} \]

Returning back to Fig. 8.1 and taking Δh ⟶ 0, we have (∂B/∂t)·n Δl Δh ⟶ 0. Then, the second term of (8.2) vanishes and we get

\[ \oint_C \mathbf{E}\cdot d\mathbf{l} = 0. \]

This implies that

\[ \Delta l\,(\mathbf{E}_1\cdot\mathbf{t}_1 + \mathbf{E}_2\cdot\mathbf{t}_2) = 0, \]

where E1 and E2 represent the electric field in the dielectrics D1 and D2 close to the
interface, respectively. Considering t2 = - t1 and putting t1 = t, we get

\[ (\mathbf{E}_1 - \mathbf{E}_2)\cdot\mathbf{t} = 0, \tag{8.4} \]

where t represents a unit vector in the direction of a tangential line of the interface
plane. Equation (8.4) means that the tangential components of the electric field are
continuous on both sides of the interface. We obtain a similar result with the
magnetic field. This can be shown by taking a surface integral of both sides of
(7.29) as well. As a result, we get

\[ (\mathbf{H}_1 - \mathbf{H}_2)\cdot\mathbf{t} = 0, \tag{8.5} \]

where H1 and H2 represent the magnetic field in D1 and D2 close to the interface,
respectively. Hence, from (8.5) the tangential components of the magnetic field are
continuous on both sides of the interface as well.

8.2 Basic Concepts Underlying Phenomena

When an electromagnetic wave is incident upon an interface of dielectrics, its reflection and transmission (refraction) take place at the interface. We address a
question of how the nature of the dielectrics studied in the previous section is
associated with the optical phenomena. When we deal with the problem, we assume
non-absorbing media. Notice that the complex wavenumber vector is responsible for
an absorbing medium along with a complex index of refraction. Nonetheless, our
308 8 Reflection and Transmission of Electromagnetic Waves in Dielectric Media

approach is useful to discuss related problems in the absorbing media. Characteristic impedance plays a key role in the reflection and transmission of light.
We represent a field (either electric or magnetic) of the incident, reflected, and
transmitted (or refracted) waves by Fi, Fr, and Ft, respectively. We call a dielectric
of the incidence side (and, hence, reflection side) D1 and another dielectric of the
transmission side D2. The fields are described by

\[ \mathbf{F}_i = F_i\boldsymbol{\varepsilon}_i e^{i(\mathbf{k}_i\cdot\mathbf{x}-\omega t)}, \tag{8.6} \]
\[ \mathbf{F}_r = F_r\boldsymbol{\varepsilon}_r e^{i(\mathbf{k}_r\cdot\mathbf{x}-\omega t)}, \tag{8.7} \]
\[ \mathbf{F}_t = F_t\boldsymbol{\varepsilon}_t e^{i(\mathbf{k}_t\cdot\mathbf{x}-\omega t)}, \tag{8.8} \]

where Fi, Fr, and Ft denote an amplitude of the field; εi, εr, and εt represent a unit
vector of polarization direction, i.e., the direction along which the field oscillates;
and ki, kr, and kt are wavenumber vectors such that ki ⊥ εi, kr ⊥ εr, and kt ⊥ εt. These
wavenumber vectors represent the propagation directions of individual waves. In
(8.6)–(8.8), indices of i, r, and t stand for incidence, reflection, and transmission,
respectively.
Let xs be an arbitrary position vector at the interface between the dielectrics. Also,
let t be a unit vector paralleling the interface. Thus, tangential components of the
field are described as

\[ F_{it} = F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i)e^{i(\mathbf{k}_i\cdot\mathbf{x}_s-\omega t)}, \tag{8.9} \]
\[ F_{rt} = F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r)e^{i(\mathbf{k}_r\cdot\mathbf{x}_s-\omega t)}, \tag{8.10} \]
\[ F_{tt} = F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t)e^{i(\mathbf{k}_t\cdot\mathbf{x}_s-\omega t)}. \tag{8.11} \]

Note that F_{it} and F_{rt} represent the field in D1 just close to the interface and that F_{tt} denotes the field in D2 just close to the interface. Thus, in light of (8.4) and (8.5), we have

\[ F_{it} + F_{rt} = F_{tt}. \tag{8.12} \]

Notice that (8.12) holds with any position xs and any time t.
Let us think of an elementary calculation of exponential functions or exponential polynomials and the relationship between the individual coefficients and exponents. With respect to two functions e^{ikx} and e^{ik′x}, we have two alternatives according to the value the Wronskian takes. Here, the Wronskian W is expressed as
\[ W = \begin{vmatrix} e^{ikx} & e^{ik'x} \\ \left(e^{ikx}\right)' & \left(e^{ik'x}\right)' \end{vmatrix} = -i(k-k')e^{i(k+k')x}. \tag{8.13} \]

(i) W ≠ 0 if and only if k ≠ k′. In this case, e^{ikx} and e^{ik′x} are said to be linearly independent. That is, on the condition of k ≠ k′, for any x we have

\[ ae^{ikx} + be^{ik'x} = 0 \iff a = b = 0. \tag{8.14} \]

(ii) W = 0 if k = k′. In that case, we have

\[ ae^{ikx} + be^{ik'x} = (a+b)e^{ikx} = 0 \iff a+b = 0. \]

Notice that e^{ikx} never vanishes for any x. To conclude, if we think of an equation of an exponential polynomial

\[ ae^{ikx} + be^{ik'x} = 0, \]

we have two alternatives regarding the coefficients. One is the trivial case of a = b = 0 and the other is a + b = 0.
Next, with respect to e^{ik_1x}, e^{ik_2x}, and e^{ik_3x}, similarly we have

\[ W = \begin{vmatrix} e^{ik_1x} & e^{ik_2x} & e^{ik_3x} \\ \left(e^{ik_1x}\right)' & \left(e^{ik_2x}\right)' & \left(e^{ik_3x}\right)' \\ \left(e^{ik_1x}\right)'' & \left(e^{ik_2x}\right)'' & \left(e^{ik_3x}\right)'' \end{vmatrix} = -i(k_1-k_2)(k_2-k_3)(k_3-k_1)e^{i(k_1+k_2+k_3)x}, \tag{8.15} \]

where W ≠ 0 if and only if k1 ≠ k2, k2 ≠ k3, and k3 ≠ k1. That is, on this condition for any x we have

\[ ae^{ik_1x} + be^{ik_2x} + ce^{ik_3x} = 0 \iff a = b = c = 0. \tag{8.16} \]

If the three exponential functions are linearly dependent, at least two of k1, k2, and k3 are equal to each other, and vice versa. On this condition, again consider the following equation of an exponential polynomial:

\[ ae^{ik_1x} + be^{ik_2x} + ce^{ik_3x} = 0. \tag{8.17} \]

Without loss of generality, we assume that k1 = k2. Then, we have

\[ ae^{ik_1x} + be^{ik_2x} + ce^{ik_3x} = (a+b)e^{ik_1x} + ce^{ik_3x} = 0. \]

If k1 ≠ k3, we must have

\[ a+b = 0 \quad\text{and}\quad c = 0. \tag{8.18} \]

If, on the other hand, k1 = k3, i.e., k1 = k2 = k3, we have

\[ ae^{ik_1x} + be^{ik_2x} + ce^{ik_3x} = (a+b+c)e^{ik_1x} = 0. \]

That is, we have

\[ a+b+c = 0. \tag{8.19} \]

Consequently, we must have k1 = k2 = k3 so that we can get three nonzero coefficients a, b, and c.
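The Wronskian (8.15) can be verified numerically; the wavenumbers and sample point below are assumed values:

```python
# Numerical check of (8.15): the 3x3 Wronskian of exp(i k_j x) equals
# -i (k1-k2)(k2-k3)(k3-k1) exp(i (k1+k2+k3) x). Sample values assumed.
import numpy as np

k1, k2, k3, x = 1.0, 2.5, -0.7, 0.3
f = lambda k, p: (1j * k)**p * np.exp(1j * k * x)   # p-th derivative of e^{ikx}

W = np.linalg.det(np.array([[f(k, p) for k in (k1, k2, k3)] for p in range(3)]))
W_formula = -1j * (k1-k2) * (k2-k3) * (k3-k1) * np.exp(1j * (k1+k2+k3) * x)
print(np.allclose(W, W_formula))                    # -> True
```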
Returning to (8.12), its full description is

\[ F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i)e^{i(\mathbf{k}_i\cdot\mathbf{x}_s-\omega t)} + F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r)e^{i(\mathbf{k}_r\cdot\mathbf{x}_s-\omega t)} - F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t)e^{i(\mathbf{k}_t\cdot\mathbf{x}_s-\omega t)} = 0. \tag{8.20} \]

Again, (8.20) must hold with any position xs and any time t. Meanwhile, for
(8.20) to have a physical meaning, we should have

\[ F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i) \neq 0, \quad F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r) \neq 0, \quad\text{and}\quad F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t) \neq 0. \tag{8.21} \]

On the basis of the above consideration, we must have the following two relations:

\[ \mathbf{k}_i\cdot\mathbf{x}_s - \omega t = \mathbf{k}_r\cdot\mathbf{x}_s - \omega t = \mathbf{k}_t\cdot\mathbf{x}_s - \omega t \]

or

\[ \mathbf{k}_i\cdot\mathbf{x}_s = \mathbf{k}_r\cdot\mathbf{x}_s = \mathbf{k}_t\cdot\mathbf{x}_s, \tag{8.22} \]

and

\[ F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i) + F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r) - F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t) = 0 \]

or

\[ F_i(\mathbf{t}\cdot\boldsymbol{\varepsilon}_i) + F_r(\mathbf{t}\cdot\boldsymbol{\varepsilon}_r) = F_t(\mathbf{t}\cdot\boldsymbol{\varepsilon}_t). \tag{8.23} \]

In this way, we are able to obtain a relation among amplitudes of the fields of
incidence, reflection, and transmission. Notice that we get both the relations between
exponents and coefficients at once.
First, let us consider (8.22). Suppose that the incident light (ki) is propagated in a
dielectric medium D1 in parallel to the zx-plane and that the interface is the xy-plane
(see Fig. 8.3). Also suppose that at the interface the light is reflected partly back to
D1 and transmitted (or refracted) partly into another dielectric medium D2. In
[Fig. 8.3: Geometry of the incident, reflected, and transmitted lights. We assume that the light is incident from a dielectric medium D1 toward another medium D2. The wavenumber vectors ki, kr, and kt represent the incident, reflected, and transmitted (or refracted) lights with angles θ, θ′, and ϕ, respectively. Note here that we did not assume the equality of θ and θ′ (see text).]

Fig. 8.3, ki, kr, and kt represent the incident, reflected, and transmitted lights that make angles θ, θ′, and ϕ with the z-axis, respectively. Then we have

\[ \mathbf{k}_i = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} k_i\sin\theta \\ 0 \\ -k_i\cos\theta \end{pmatrix}, \tag{8.24} \]

\[ \mathbf{x}_s = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} x \\ y \\ 0 \end{pmatrix}, \tag{8.25} \]

where θ is said to be an incidence angle. A plane formed by ki and a normal to the interface is called a plane of incidence (or incidence plane). In Fig. 8.3, the zx-plane forms the incidence plane. From (8.24) and (8.25), we have

\[ \mathbf{k}_i\cdot\mathbf{x}_s = k_ix\sin\theta, \tag{8.26} \]

\[ \mathbf{k}_r\cdot\mathbf{x}_s = k_{rx}x + k_{ry}y, \tag{8.27} \]

\[ \mathbf{k}_t\cdot\mathbf{x}_s = k_{tx}x + k_{ty}y, \tag{8.28} \]

where ki = |ki|; krx and kry are the x and y components of kr; similarly, ktx and kty are the x and y components of kt.
Since (8.22) holds with any x and y, we have
\[ k_i\sin\theta = k_{rx} = k_{tx}, \tag{8.29} \]

\[ k_{ry} = k_{ty} = 0. \tag{8.30} \]

From (8.30) neither kr nor kt has a y component. This means that ki, kr, and kt are
coplanar. That is, the incident, reflected, and transmitted waves are all parallel to the
zx-plane. Notice that at the beginning we did not assume the coplanarity of those
waves. We did not assume the equality of θ and θ′ either (vide infra). From (8.29)
and Fig. 8.3, however, we have

\[ k_i\sin\theta = k_r\sin\theta' = k_t\sin\phi, \tag{8.31} \]

where kr = |kr| and kt = |kt|; θ′ and ϕ are said to be a reflection angle and a refraction angle, respectively. Thus, the end points of ki, kr, and kt are connected on a straight line that parallels the z-axis. Figure 8.3 clearly shows it.
Now, we suppose that a wavelength of the electromagnetic wave in D1 is λ1 and
that in D2 is λ2. Since the incident light and reflected light are propagated in D1,
we have

\[ k_i = k_r = 2\pi/\lambda_1. \tag{8.32} \]

From (8.31) and (8.32), we get

\[ \sin\theta = \sin\theta'. \tag{8.33} \]

Therefore, we have either θ = θ′ or θ′ = π - θ. Since 0 < θ, θ′ < π/2, we have

\[ \theta = \theta'. \tag{8.34} \]

Then, returning back to (8.31), we have

\[ k_i\sin\theta = k_r\sin\theta = k_t\sin\phi. \tag{8.35} \]

This implies that the components tangential to the interface of ki, kr, and kt are the same.

Meanwhile, we have

\[ k_t = 2\pi/\lambda_2. \tag{8.36} \]

Also we have
\[ c = \lambda_0\nu, \quad v_1 = \lambda_1\nu, \quad v_2 = \lambda_2\nu, \tag{8.37} \]

where v1 and v2 are the phase velocities of light in D1 and D2, respectively. Since ν is common to D1 and D2, we have

\[ c/\lambda_0 = v_1/\lambda_1 = v_2/\lambda_2 \]

or

\[ c/v_1 = \lambda_0/\lambda_1 = n_1, \quad c/v_2 = \lambda_0/\lambda_2 = n_2, \tag{8.38} \]

where λ0 is the wavelength in vacuum and n1 and n2 are the refractive indices of D1 and D2, respectively. Combining (8.35) with (8.32), (8.36), and (8.38), we have several relations such that

\[ \frac{\sin\theta}{\sin\phi} = \frac{k_t}{k_i} = \frac{\lambda_1}{\lambda_2} = \frac{n_2}{n_1}\ (\equiv n), \tag{8.39} \]

where n is said to be the relative refractive index of D2 relative to D1. The relation (8.39) is called Snell's law. Notice that (8.39) reflects the kinematic aspect of light and that this characteristic comes from the exponents of (8.20).
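As a simple numerical illustration of (8.39) (the indices and incidence angle below are assumed, not taken from the text):

```python
# A sketch of Snell's law (8.39): refraction angle for light passing from
# water into glass (indices and incidence angle assumed for illustration).
import math

n1, n2 = 1.33, 1.52                         # assumed refractive indices
theta = math.radians(40.0)                  # assumed incidence angle

phi = math.asin(math.sin(theta) * n1 / n2)  # sin(theta)/sin(phi) = n2/n1
print(math.degrees(phi))                    # -> about 34.2 degrees
```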

8.3 Transverse Electric (TE) Waves and Transverse Magnetic (TM) Waves

On the basis of the above argument, we are now in the position to determine the
relations among amplitudes of the electromagnetic fields of waves of incidence,
reflection, and transmission. Notice that since we are dealing with non-absorbing
media, the relevant amplitudes are real (i.e., positive or negative). In other words, when the phase is retained upon reflection, we have a positive amplitude due to e^{i0} = 1. When the phase is reversed upon reflection, on the other hand, we will be treating a negative amplitude due to e^{iπ} = -1. Nevertheless, when we consider the total reflection, we deal with a complex amplitude (vide infra).
We start with the discussion of the vertical incidence of an electromagnetic wave
before the general oblique incidence. In Fig. 8.4a, we depict electric fields E and
magnetic fields H obtained at a certain moment near the interface. We index, e.g., Ei,
for the incident field. There we define unit polarization vectors of the electric field εi,
εr, and εt as identical to be e1 (a unit vector in the direction of the x-axis). In (8.6), we
also define Fi (both electric and magnetic fields) as positive.
We have two cases about a geometry of the fields (see Fig. 8.4). The first case is
that all Ei, Er, and Et are directed in the same direction (i.e., the positive direction of
the x-axis); see Fig. 8.4a. Another case is that although Ei and Et are directed in the same direction, Er is reversed (Fig. 8.4b). In this case, we define Er as negative. Notice that Ei and Et are always directed in the same direction and that Er is directed either in the same direction or in the opposite direction according to the nature of the dielectrics. The situation will be discussed soon.

[Fig. 8.4: Geometry of the electromagnetic fields near the interface between dielectric media D1 and D2 in the case of vertical incidence. (a) All Ei, Er, and Et are directed in the same direction e1 (i.e., a unit vector in the positive direction of the x-axis). (b) Although Ei and Et are directed in the same direction, Er is reversed. In this case, we define Er as negative.]
Meanwhile, unit polarization vectors of the magnetic fields are determined by
(7.67) for the incident, reflected, and transmitted waves. In Fig. 8.4, the magnetic
fields are polarized along the y-axis (i.e., perpendicular to the plane of paper). The
magnetic fields Hi and Ht are always directed to the same direction as in the case of
the electric fields. On the other hand, if the phase of Er is conserved, the direction of
Hr is reversed and vice versa. This converse relationship with respect to the electric
and magnetic fields results solely from the requirement that E, H, and the propaga-
tion unit vector n of light must constitute a right-handed system in this order. Notice
that n is reversed upon reflection.
Next, let us consider an oblique incidence. With the oblique incidence, electro-
magnetic waves are classified into two special categories, i.e., transverse electric
(TE) waves (or modes) or transverse magnetic (TM) waves (or modes). The TE wave
is characterized by the electric field that is perpendicular to the incidence plane,
whereas the TM wave is characterized by the magnetic field that is perpendicular to
the incidence plane. Here the incidence plane is a plane that is formed by the
propagation direction of the incident light and the normal to the interface of the
two dielectrics. Since E, H, and n form a right-handed system, in the TE wave, H lies
on the incidence plane. For the same reason, in the TM wave, E lies on the incidence
plane.
In a general case where a field is polarized in an arbitrary direction, that field can
be formed by superimposing two fields corresponding to the TE and TM waves. In
other words, if we take an arbitrary field E, it can be decomposed into a component
having a unit polarization vector directed perpendicular to the incidence plane and
another component having the polarization vector that lies on the incidence plane.
These two components are orthogonal to each other.
[Fig. 8.5: Geometry of the electromagnetic fields near the interface between dielectric media D1 and D2 in the case of oblique incidence of a TE wave. The electric field E is polarized along the y-axis (i.e., perpendicular to the plane of paper) with H polarized in the zx-plane. Unit polarization vectors εi, εr, and εt are given for H with red arrows. To avoid complication, neither Hi nor Ht is shown.]

Example 8.1 TE Wave In Fig. 8.5 we depict the geometry of oblique incidence of a
TE wave. The xy-plane defines the interface of the two dielectrics and t of (8.9) lies
on that plane. The zx-plane defines the incidence plane. In this case, E is polarized
along the y-axis with H polarized in the zx-plane. That is, regarding E, we choose
polarization direction εi, εr, and εt of the electric field as e2 (a unit vector toward the
positive direction of the y-axis that is perpendicular to the plane of paper). In Fig. 8.5,
the polarization direction of the electric field is denoted by a symbol ⨂. Therefore,
we have

\[ \mathbf{e}_2\cdot\boldsymbol{\varepsilon}_i = \mathbf{e}_2\cdot\boldsymbol{\varepsilon}_r = \mathbf{e}_2\cdot\boldsymbol{\varepsilon}_t = 1. \tag{8.40} \]

For H we define the direction of the unit polarization vectors εi, εr, and εt so that their direction cosines relative to the x-axis can be positive; see Fig. 8.5. Choosing e1 (a unit vector in the direction of the x-axis) for t of (8.9) with regard to H, we have

\[ \mathbf{e}_1\cdot\boldsymbol{\varepsilon}_i = \cos\theta, \quad \mathbf{e}_1\cdot\boldsymbol{\varepsilon}_r = \cos\theta, \quad\text{and}\quad \mathbf{e}_1\cdot\boldsymbol{\varepsilon}_t = \cos\phi. \tag{8.41} \]

Note that we have used the same symbols εi, εr, and εt to represent the unit
polarization vectors of both the electric field E and magnetic field H. In Fig. 8.5, the
unit polarization vectors of H are depicted with red arrows.
According as Hr is directed to the same direction as εr or the opposite direction to
εr, the amplitude is defined as positive or negative, as in the case of the vertical
incidence. In Fig. 8.5, we depict the case where the amplitude Hr is negative. That is,
the phase of the magnetic field is reversed upon reflection and, hence, Hr is in an
opposite direction to εr in Fig. 8.5. Note that Hi and Ht are in the same direction as εi
and εt, respectively. To avoid complication, neither Hi nor Ht is shown in Fig. 8.5.
Taking account of the above note of caution, we apply (8.23) to both E and H to get the following relations:

\[ E_i + E_r = E_t, \tag{8.42} \]

\[ H_i\cos\theta + H_r\cos\theta = H_t\cos\phi. \tag{8.43} \]

To derive the above equations, we choose t = e2 with E and t = e1 with H for (8.23). Because of the abovementioned converse relationship between E and H, we have

\[ E_rH_r < 0. \tag{8.44} \]

Suppose that we carry out an experiment to determine the six amplitudes in (8.42) and (8.43). Out of those quantities, we can freely choose and fix Ei. Then, we have five unknown amplitudes, i.e., Er, Et, Hi, Hr, and Ht. Thus, we need three more relations to determine them. Here information about the characteristic impedance Z is useful. It was defined in (7.64). From (8.6) to (8.8) as well as (8.44), we get

\[ Z_1 = \sqrt{\mu_1/\varepsilon_1} = E_i/H_i = -E_r/H_r, \tag{8.45} \]

\[ Z_2 = \sqrt{\mu_2/\varepsilon_2} = E_t/H_t, \tag{8.46} \]

where ε1 and μ1 are the permittivity and permeability of D1, respectively, and ε2 and μ2 are the permittivity and permeability of D2, respectively. As an example, we have

\[ \mathbf{H}_i = \mathbf{n}\times\mathbf{E}_i/Z_1 = \mathbf{n}\times E_i\boldsymbol{\varepsilon}_{i,e}e^{i(\mathbf{k}_i\cdot\mathbf{x}-\omega t)}/Z_1 = E_i\boldsymbol{\varepsilon}_{i,m}e^{i(\mathbf{k}_i\cdot\mathbf{x}-\omega t)}/Z_1 = H_i\boldsymbol{\varepsilon}_{i,m}e^{i(\mathbf{k}_i\cdot\mathbf{x}-\omega t)}, \tag{8.47} \]

where we distinguish the polarization vectors of the electric field (εi,e) and the magnetic field (εi,m). For the notation and geometry of the fields Ei and Hi, see Fig. 7.4. Comparing the coefficients of the last relation of (8.47), we get

\[ E_i/Z_1 = H_i. \tag{8.48} \]

On the basis of (8.42)–(8.46), we are able to decide Er, Et, Hi, Hr, and Ht.
What we wish to determine, however, is a ratio among those quantities. To this
end, dividing (8.42) and (8.43) by Ei (>0), we define the following quantities:
\[ R_E^\perp \equiv E_r/E_i \quad\text{and}\quad T_E^\perp \equiv E_t/E_i, \tag{8.49} \]

where R_E^⊥ and T_E^⊥ are said to be the reflection coefficient and transmission coefficient of the electric field, respectively; the symbol ⊥ means a quantity of the TE wave (i.e., the electric field oscillating vertically with respect to the incidence plane). Thus, rewriting (8.42) and (8.43) using R_E^⊥ and T_E^⊥, we have

\[ R_E^\perp - T_E^\perp = -1, \qquad R_E^\perp\frac{\cos\theta}{Z_1} + T_E^\perp\frac{\cos\phi}{Z_2} = \frac{\cos\theta}{Z_1}. \tag{8.50} \]

Using Cramer’s rule of matrix algebra, we have a solution such that

-1 -1
cos θ cos ϕ
Z1 Z2 Z cos θ - Z 1 cos ϕ
R⊥
E = = 2 , ð8:51Þ
1 -1 Z 2 cos θ þ Z 1 cos ϕ
cos θ cos ϕ
Z1 Z2

1 -1
cos θ cos θ
Z1 Z1 2Z 2 cos θ
T⊥
E = = : ð8:52Þ
1 -1 Z 2 cos θ þ Z 1 cos ϕ
cos θ cos ϕ
Z1 Z2

Similarly, defining

\[ R_H^\perp \equiv H_r/H_i \quad\text{and}\quad T_H^\perp \equiv H_t/H_i, \tag{8.53} \]

where R_H^⊥ and T_H^⊥ are said to be the reflection coefficient and transmission coefficient of the magnetic field, respectively, we get

\[ R_H^\perp = \frac{Z_1\cos\phi - Z_2\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}, \tag{8.54} \]

\[ T_H^\perp = \frac{2Z_1\cos\theta}{Z_2\cos\theta + Z_1\cos\phi}. \tag{8.55} \]

In this case, rewrite (8.42) as a relation among Hi, Hr, and Ht using (8.45) and
(8.46). Derivation of (8.54) and (8.55) is left for readers. Notice also that
Table 8.1 Reflection and transmission coefficients of TE and TM waves

⊥ incidence (TE):
  R_E^⊥ = (Z_2 cos θ - Z_1 cos ϕ)/(Z_2 cos θ + Z_1 cos ϕ)
  R_H^⊥ = (Z_1 cos ϕ - Z_2 cos θ)/(Z_2 cos θ + Z_1 cos ϕ) = -R_E^⊥
  T_E^⊥ = 2Z_2 cos θ/(Z_2 cos θ + Z_1 cos ϕ)
  T_H^⊥ = 2Z_1 cos θ/(Z_2 cos θ + Z_1 cos ϕ)
  -R_E^⊥ R_H^⊥ + T_E^⊥ T_H^⊥ (cos ϕ/cos θ) = 1  (R^⊥ + T^⊥ = 1)

∥ incidence (TM):
  R_E^∥ = (Z_2 cos ϕ - Z_1 cos θ)/(Z_1 cos θ + Z_2 cos ϕ)
  R_H^∥ = (Z_1 cos θ - Z_2 cos ϕ)/(Z_1 cos θ + Z_2 cos ϕ) = -R_E^∥
  T_E^∥ = 2Z_2 cos θ/(Z_1 cos θ + Z_2 cos ϕ)
  T_H^∥ = 2Z_1 cos θ/(Z_1 cos θ + Z_2 cos ϕ)
  -R_E^∥ R_H^∥ + T_E^∥ T_H^∥ (cos ϕ/cos θ) = 1  (R^∥ + T^∥ = 1)

\[ R_H^\perp = -R_E^\perp. \tag{8.56} \]

This relation can easily be derived from (8.45).


Example 8.2 TM Wave In a manner similar to that described above, we obtain information about the TM wave. Switching the roles of E and H, we assume that H is polarized along the y-axis with E polarized in the zx-plane. Following the aforementioned procedures, we have

\[ E_i\cos\theta + E_r\cos\theta = E_t\cos\phi, \tag{8.57} \]

\[ H_i + H_r = H_t. \tag{8.58} \]

From (8.57) and (8.58), similarly we get

\[ R_E^\parallel = \frac{Z_2\cos\phi - Z_1\cos\theta}{Z_1\cos\theta + Z_2\cos\phi}, \tag{8.59} \]

\[ T_E^\parallel = \frac{2Z_2\cos\theta}{Z_1\cos\theta + Z_2\cos\phi}. \tag{8.60} \]

Also, we get

\[ R_H^\parallel = \frac{Z_1\cos\theta - Z_2\cos\phi}{Z_1\cos\theta + Z_2\cos\phi} = -R_E^\parallel, \tag{8.61} \]

\[ T_H^\parallel = \frac{2Z_1\cos\theta}{Z_1\cos\theta + Z_2\cos\phi}. \tag{8.62} \]

The symbol ∥ in (8.59)-(8.62) means a quantity of the TM wave.

In Table 8.1, we list the important coefficients in relation to the reflection and transmission of electromagnetic waves along with their relationships.
In Examples 8.1 and 8.2, we have examined how the reflection and transmission coefficients vary as a function of the characteristic impedance as well as the incidence and refraction angles. Meanwhile, in a nonmagnetic substance, a refractive index n can be approximated as

\[ n \approx \sqrt{\varepsilon_r}, \tag{7.57} \]

assuming that μr ≈ 1. In this case, we have

\[ Z = \sqrt{\mu/\varepsilon} = \sqrt{\mu_r\mu_0/\varepsilon_r\varepsilon_0} \approx Z_0/\sqrt{\varepsilon_r} \approx Z_0/n. \]

Using this relation, we can readily rewrite the reflection and transmission coefficients as functions of the refractive indices of the dielectrics. The derivation is left for the readers.
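As a numerical sketch of this rewriting (the indices, the angle, and the use of Z ≈ Z0/n are all assumptions for illustration), the TE coefficients reproduce the energy relation R⊥ + T⊥ = 1 listed in Table 8.1:

```python
# A sketch of the TE coefficients written with Z ~ Z0/n (mu_r ~ 1 assumed):
# air-to-glass incidence, verifying the relation R + T = 1 of Table 8.1.
import math

n1, n2 = 1.0, 1.5                          # assumed: air to glass
theta = math.radians(30.0)                 # assumed incidence angle
phi = math.asin(n1 * math.sin(theta) / n2) # Snell's law (8.39)
Z1, Z2 = 1.0 / n1, 1.0 / n2                # impedances in units of Z0

RE = (Z2*math.cos(theta) - Z1*math.cos(phi)) / (Z2*math.cos(theta) + Z1*math.cos(phi))
TE = 2*Z2*math.cos(theta) / (Z2*math.cos(theta) + Z1*math.cos(phi))
TH = 2*Z1*math.cos(theta) / (Z2*math.cos(theta) + Z1*math.cos(phi))

R = RE**2                                      # reflectance, using R_H = -R_E
T = TE * TH * math.cos(phi) / math.cos(theta)  # transmittance
print(R + T)                                   # -> 1.0 (up to rounding)
```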

8.4 Energy Transport by Electromagnetic Waves

Returning to (7.58) and (7.59), let us consider energy transport in a dielectric medium by electromagnetic waves. Let us describe the electric (E) and magnetic (H) fields of the electromagnetic waves in a uniform and infinite dielectric medium such that

\[ \mathbf{E} = E\boldsymbol{\varepsilon}_e e^{i(k\mathbf{n}\cdot\mathbf{x}-\omega t)}, \tag{8.63} \]

\[ \mathbf{H} = H\boldsymbol{\varepsilon}_m e^{i(k\mathbf{n}\cdot\mathbf{x}-\omega t)}, \tag{8.64} \]

where εe and εm are unit polarization vectors; we assume that both E and H are positive. Notice again that εe, εm, and n constitute a right-handed system in this order.

The energy transport is characterized by a Poynting vector S that is described by

\[ \mathbf{S} = \mathbf{E}\times\mathbf{H}. \tag{8.65} \]

Since E and H have the dimensions [V/m] and [A/m], respectively, S has the dimension [W/m²]. Hence, S represents an energy flow per unit time and per unit area with respect to the propagation direction. For simplicity, let us assume that the electromagnetic wave is propagating toward the z-direction. Then we have

\[ \mathbf{E} = E\boldsymbol{\varepsilon}_e e^{i(kz-\omega t)}, \tag{8.66} \]

\[ \mathbf{H} = H\boldsymbol{\varepsilon}_m e^{i(kz-\omega t)}. \tag{8.67} \]

To seek the time-averaged energy flow toward the z-direction, it suffices to multiply the real parts of (8.66) and (8.67) and integrate the product over a period T at the point z = 0. Thus, the time-averaged Poynting vector S̄ is given by

\[ \bar{\mathbf{S}} = \mathbf{e}_3\frac{EH}{T}\int_0^T\cos^2\omega t\,dt, \tag{8.68} \]

where T = 1/ν = 2π/ω. Using the trigonometric formula

\[ \cos^2\omega t = \frac{1}{2}(1+\cos 2\omega t), \tag{8.69} \]

the integration can easily be performed. Thus, we get

\[ \bar{\mathbf{S}} = \frac{1}{2}EH\mathbf{e}_3. \tag{8.70} \]

Equivalently, we have

\[ \bar{\mathbf{S}} = \frac{1}{2}\mathbf{E}\times\mathbf{H}. \tag{8.71} \]
2

Meanwhile, an energy density W is given by

\[ W = \frac{1}{2}(\mathbf{E}\cdot\mathbf{D} + \mathbf{H}\cdot\mathbf{B}), \tag{8.72} \]

where the first and second terms are pertinent to the electric and magnetic fields, respectively. Note in (8.72) that the dimension of E · D is [V/m × C/m²] = [J/m³] and that the dimension of H · B is [A/m × V s/m²] = [W s/m³] = [J/m³]. Using (7.7) and (7.10), we have

\[ W = \frac{1}{2}\left(\varepsilon E^2 + \mu H^2\right). \tag{8.73} \]

As in the above case, estimating the time-averaged energy density W̄, we get

\[ \bar{W} = \frac{1}{2}\left(\frac{1}{2}\varepsilon E^2 + \frac{1}{2}\mu H^2\right) = \frac{1}{4}\varepsilon E^2 + \frac{1}{4}\mu H^2. \tag{8.74} \]

We also get this relation by integrating (8.73) over a wavelength λ at the time t = 0. Using (7.60) and (7.61), we have

\[ \varepsilon E^2 = \mu H^2. \tag{8.75} \]

This implies that the energy density resulting from the electric field and that due to the magnetic field have the same value. Thus, rewriting (8.74) we have

\[ \bar{W} = \frac{1}{2}\varepsilon E^2 = \frac{1}{2}\mu H^2. \tag{8.76} \]

Moreover, using (7.43), we have for the impedance

\[ Z = E/H = \sqrt{\mu/\varepsilon} = \mu v \quad\text{or}\quad E = \mu vH. \tag{8.77} \]

Using this relation along with (8.75), we get

\[ \bar{\mathbf{S}} = \frac{1}{2}v\varepsilon E^2\mathbf{e}_3 = \frac{1}{2}v\mu H^2\mathbf{e}_3. \tag{8.78} \]

Thus, we have various relations among the amplitudes of electromagnetic waves and related physical quantities together with the constants of the dielectrics.
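For instance, the following short sketch (field amplitude and permittivity assumed) checks that the time-averaged Poynting magnitude (8.78) equals the phase velocity times the time-averaged energy density (8.76):

```python
# A sketch with assumed values: |S_bar| = v * W_bar, combining (8.76) and (8.78).
import math

eps_r, E0 = 2.25, 100.0                   # assumed: glass-like dielectric, E0 in V/m
eps = eps_r * 8.8541878128e-12
mu = 4e-7 * math.pi                       # mu_r ~ 1 assumed

v = 1.0 / math.sqrt(mu * eps)             # phase velocity
S_avg = 0.5 * v * eps * E0**2             # Eq. (8.78), [W/m^2]
W_avg = 0.5 * eps * E0**2                 # Eq. (8.76), [J/m^3]
print(math.isclose(S_avg, v * W_avg))     # -> True
```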
Returning to Examples 8.1 and 8.2, let us further investigate the reflection and transmission properties of the electromagnetic waves. From (8.51) to (8.55) as well as (8.59) to (8.62), we get in both the cases of TE and TM waves

\[ -R_E^\perp R_H^\perp + T_E^\perp T_H^\perp\frac{\cos\phi}{\cos\theta} = 1, \tag{8.79} \]

\[ -R_E^\parallel R_H^\parallel + T_E^\parallel T_H^\parallel\frac{\cos\phi}{\cos\theta} = 1. \tag{8.80} \]

In both the TE and TM cases, we define the reflectance R and transmittance T such that

\[ R \equiv -R_ER_H = R_E^2 = 2|\bar{\mathbf{S}}_r|\big/2|\bar{\mathbf{S}}_i| = |\bar{\mathbf{S}}_r|\big/|\bar{\mathbf{S}}_i|, \tag{8.81} \]

where S̄r and S̄i are the time-averaged Poynting vectors of the reflected and incident waves, respectively.

Also, we have

\[ T \equiv T_ET_H\frac{\cos\phi}{\cos\theta} = \frac{2|\bar{\mathbf{S}}_t|\cos\phi}{2|\bar{\mathbf{S}}_i|\cos\theta} = \frac{|\bar{\mathbf{S}}_t|\cos\phi}{|\bar{\mathbf{S}}_i|\cos\theta}, \tag{8.82} \]

where S̄t is the time-averaged Poynting vector of the transmitted wave. Thus, we have

\[ R + T = 1. \tag{8.83} \]

[Fig. 8.6: Luminous flux near the interface.]

The relation (8.83) represents the energy conservation. The factor cos ϕ/cos θ can be understood from Fig. 8.6, which depicts a luminous flux near the interface. Suppose that we have an incident wave with an irradiance I [W/m²] whose incidence plane is the zx-plane. Notice that I has the same dimension as a Poynting vector.

Here let us think of the luminous flux that is getting through a unit area (i.e., a unit length square) perpendicular to the propagation direction of the light. Then, this flux illuminates an area on the interface of a unit length (in the y-direction) multiplied by a length of 1/cos θ (in the x-direction). That is, the luminous flux has been widened (or squeezed) by cos ϕ/cos θ times after getting through the interface. The irradiance has been weakened (or strengthened) accordingly (see Fig. 8.6). Thus, to take a balance of income and outgo with respect to the luminous flux before and after getting through the interface, the transmission irradiance must be multiplied by the factor cos ϕ/cos θ.

8.5 Brewster Angles and Critical Angles

In this section and the subsequent sections, we deal with nonmagnetic substances as dielectrics; namely, we assume μr ≈ 1. In that case, as mentioned in Sect. 8.3, we rewrite, e.g., (8.51) and (8.59) as

\[ R_E^\perp = \frac{\cos\theta - n\cos\phi}{\cos\theta + n\cos\phi}, \tag{8.84} \]

\[ R_E^\parallel = \frac{\cos\phi - n\cos\theta}{\cos\phi + n\cos\theta}, \tag{8.85} \]

where n (= n2/n1) is the relative refractive index of D2 relative to D1. Let us think of a condition under which R_E^⊥ = 0 or R_E^∥ = 0.

First, we consider (8.84). We have

\[ \text{[numerator of (8.84)]} = \cos\theta - n\cos\phi = \cos\theta - \frac{\sin\theta}{\sin\phi}\cos\phi = \frac{\sin\phi\cos\theta - \sin\theta\cos\phi}{\sin\phi} = \frac{\sin(\phi-\theta)}{\sin\phi}, \tag{8.86} \]

where with the second equality we used Snell's law; the last equality is due to a trigonometric formula. Since we assume 0 < θ < π/2 and 0 < ϕ < π/2, we have -π/2 < ϕ - θ < π/2. Therefore, sin(ϕ - θ) = 0 if and only if ϕ - θ = 0. Namely, only when ϕ = θ could R_E^⊥ vanish. For different dielectrics having different refractive indices, we have ϕ = θ only if ϕ = θ = 0 (i.e., vertical incidence). But, in that case we obtain

\[ \lim_{\phi\to 0,\,\theta\to 0}\frac{\sin(\phi-\theta)}{\sin\phi} = \frac{0}{0}. \]
This is a limit of indeterminate form. From (8.84), however, we have

\[ R_E^\perp = \frac{1-n}{1+n}, \tag{8.87} \]

for ϕ = θ = 0. This implies that R_E^⊥ does not vanish at ϕ = θ = 0. Thus, R_E^⊥ never vanishes for any θ or ϕ. Note that under this condition we naturally have

\[ R_E^\parallel = \frac{1-n}{1+n}. \]

This is because with ϕ = θ = 0 there is no physical difference between the TE and TM waves.
In turn, let us examine (8.85) similarly for the case of the TM wave. We have

\[ \text{[numerator of (8.85)]} = \cos\phi - n\cos\theta = \cos\phi - \frac{\sin\theta}{\sin\phi}\cos\theta = \frac{\sin\phi\cos\phi - \sin\theta\cos\theta}{\sin\phi} = \frac{\sin(\phi-\theta)\cos(\phi+\theta)}{\sin\phi}. \tag{8.88} \]

With the last equality of (8.88), we used a trigonometric formula. From (8.86) we know that sin(ϕ - θ)/sin ϕ does not vanish. Therefore, for R_E^∥ to vanish, we need cos(ϕ + θ) = 0. Since 0 < ϕ + θ < π, cos(ϕ + θ) = 0 if and only if

\[ \phi + \theta = \pi/2. \tag{8.89} \]

In other words, for the particular angles θ = θB and ϕ = ϕB that satisfy

\[ \phi_B + \theta_B = \pi/2, \tag{8.90} \]

we have R_E^∥ = 0, i.e., we do not observe a reflected wave. The particular angle θB is said to be the Brewster angle. For θB we have

\[ \sin\phi_B = \sin\left(\frac{\pi}{2} - \theta_B\right) = \cos\theta_B, \]

\[ n = \sin\theta_B/\sin\phi_B = \sin\theta_B/\cos\theta_B = \tan\theta_B \quad\text{or}\quad \theta_B = \tan^{-1}n, \]

\[ \phi_B = \tan^{-1}n^{-1}. \tag{8.91} \]

Suppose that we have a parallel plate consisting of a dielectric D2 of a refractive index n2 sandwiched between another dielectric D1 of a refractive index n1 (Fig. 8.7). Let θB be the Brewster angle when the TM wave is incident from D1 to D2. In the above discussion, we defined the relative refractive index n of D2 relative to D1 as n = n2/n1; recall (8.39). The other way around, suppose that the TM wave is incident from D2 to D1. Then, the relative refractive index of D1 relative to D2 is n1/n2 = n⁻¹. In this situation, another Brewster angle (from D2 to D1), defined as θ̃B, is given by

\[ \tilde{\theta}_B = \tan^{-1}n^{-1}. \tag{8.92} \]

This number is, however, identical to ϕB in (8.91). Thus, we have

\[ \tilde{\theta}_B = \phi_B. \tag{8.93} \]

Thus, regarding the TM wave that is propagating in D2 after getting through the interface and is to get back to D1, θ̃B = ϕB is again the Brewster angle. In this way, the said TM wave is propagated from D1 to D2 and then gets back from D2 to D1 without being reflected by the two interfaces. This conspicuous feature is often utilized for an optical device.
[Fig. 8.7: Diagram that explains the Brewster angle. Suppose that a parallel plate consisting of a dielectric D2 of a refractive index n2 is sandwiched between another dielectric D1 of a refractive index n1. The incidence angle θB represents the Brewster angle observed when the TM wave is incident from D1 to D2. ϕB is another Brewster angle that is observed when the TM wave is getting back from D2 to D1.]

If an electromagnetic wave is incident from a dielectric of a higher refractive index to that of a lower index, the total reflection takes place. This is equally the case with both TE and TM waves. For the total reflection to take place, θ should be larger than the critical angle θc that is defined by

\[ \theta_c = \sin^{-1}n. \tag{8.94} \]

This is because at θc, from Snell's law we have

\[ \frac{\sin\theta_c}{\sin\frac{\pi}{2}} = \sin\theta_c = \frac{n_2}{n_1}\ (\equiv n). \tag{8.95} \]

From (8.95), we have

\[ \tan\theta_c = n\big/\sqrt{1-n^2} > n = \tan\theta_B. \]

In the case of the TM wave, therefore, we find that

\[ \theta_c > \theta_B. \tag{8.96} \]

The critical angle is always larger than the Brewster angle with TM waves.
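A short numerical sketch (glass-to-air values assumed) compares the two angles via (8.91) and (8.94):

```python
# Brewster angle (8.91) and critical angle (8.94) for an assumed glass-to-air
# interface, confirming theta_c > theta_B for the TM wave.
import math

n1, n2 = 1.5, 1.0                          # assumed: higher to lower index
n = n2 / n1                                # relative refractive index

theta_B = math.degrees(math.atan(n))       # Brewster angle
theta_c = math.degrees(math.asin(n))       # critical angle
print(theta_B, theta_c, theta_c > theta_B) # -> about 33.7, 41.8, True
```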

8.6 Total Reflection

In Sect. 8.2 we saw that Snell’s law results from the kinematical requirement. For
this reason, we may consider it as a universal relation that can be extended to
complex refraction angles. In fact, for Snell’s law to hold with θ > θc, we must have

\[ \sin\phi > 1. \tag{8.97} \]

This requires us to extend ϕ to a complex domain. Putting

\[ \phi = \frac{\pi}{2} + ia \quad (a:\ \text{real},\ a \neq 0), \tag{8.98} \]

we have

\[ \sin\phi \equiv \frac{1}{2i}\left(e^{i\phi} - e^{-i\phi}\right) = \frac{1}{2}\left(e^{-a} + e^{a}\right) > 1, \tag{8.99} \]

\[ \cos\phi \equiv \frac{1}{2}\left(e^{i\phi} + e^{-i\phi}\right) = \frac{i}{2}\left(e^{-a} - e^{a}\right). \tag{8.100} \]

Thus, cos ϕ is pure imaginary.

Now, let us consider a transmitted wave whose electric field is described as

\[ \mathbf{E}_t = E\boldsymbol{\varepsilon}_t e^{i(\mathbf{k}_t\cdot\mathbf{x}-\omega t)}, \tag{8.101} \]

where εt is the unit polarization vector and kt is the wavenumber vector of the transmitted wave. Suppose that the incidence plane is the zx-plane. Then, we have

\[ \mathbf{k}_t\cdot\mathbf{x} = k_{tx}x + k_{tz}z = xk_t\sin\phi + zk_t\cos\phi, \tag{8.102} \]

where ktx and ktz are the x and z components of kt, respectively; kt = |kt|. Putting

\[ \cos\phi = ib \quad (b:\ \text{real},\ b \neq 0), \tag{8.103} \]

we have

\[ \mathbf{E}_t = E\boldsymbol{\varepsilon}_t e^{i(xk_t\sin\phi + ibzk_t - \omega t)} = E\boldsymbol{\varepsilon}_t e^{i(xk_t\sin\phi - \omega t)}e^{-bzk_t}. \tag{8.104} \]

With the total reflection, we must have

\[ z \longrightarrow \infty \implies e^{-bzk_t} \longrightarrow 0. \tag{8.105} \]

To meet this requirement, we have

\[ b > 0. \tag{8.106} \]

Meanwhile, we have

\[ \cos^2\phi = 1 - \sin^2\phi = 1 - \frac{\sin^2\theta}{n^2} = \frac{n^2 - \sin^2\theta}{n^2}, \tag{8.107} \]

where notice that n < 1, because we are dealing with the incidence of light from a medium with a higher refractive index into a medium with a lower index. When we consider the total reflection, the numerator of (8.107) is negative, and so we have two choices such that

\[ \cos\phi = \pm i\,\frac{\sqrt{\sin^2\theta - n^2}}{n}. \tag{8.108} \]

From (8.103) and (8.106), we get

\[ \cos\phi = i\,\frac{\sqrt{\sin^2\theta - n^2}}{n}. \tag{8.109} \]
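Equations (8.104) and (8.109) imply an exponentially decaying (evanescent) field beyond the interface. The sketch below (geometry, indices, and vacuum wavelength all assumed) estimates the 1/e decay length:

```python
# Evanescent decay of (8.104): 1/e penetration depth 1/(b k_t) computed from
# (8.109). Glass-to-air incidence at 60 deg and a 600 nm vacuum wavelength
# are assumed for illustration.
import math

n1, n2, lam0 = 1.5, 1.0, 600e-9
n = n2 / n1                                    # relative index (< 1)
theta = math.radians(60.0)                     # beyond theta_c ~ 41.8 deg

k_t = 2 * math.pi * n2 / lam0                  # wavenumber in the second medium
b = math.sqrt(math.sin(theta)**2 - n**2) / n   # cos(phi) = i b, Eq. (8.109)
print(1.0 / (b * k_t))                         # -> about 1.2e-7 m
```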

Hence, inserting (8.109) into (8.84), we have for the TE wave

\[ R_E^\perp = \frac{\cos\theta - i\sqrt{\sin^2\theta - n^2}}{\cos\theta + i\sqrt{\sin^2\theta - n^2}}. \tag{8.110} \]

Then, we have

\[ R^\perp = -R_E^\perp R_H^\perp = R_E^\perp\left(R_E^\perp\right)^* = \frac{\cos\theta - i\sqrt{\sin^2\theta - n^2}}{\cos\theta + i\sqrt{\sin^2\theta - n^2}}\cdot\frac{\cos\theta + i\sqrt{\sin^2\theta - n^2}}{\cos\theta - i\sqrt{\sin^2\theta - n^2}} = 1. \tag{8.111} \]

As for the TM wave, substituting (8.109) into (8.85), we have

\[ R_E^\parallel = \frac{-n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}}. \tag{8.112} \]

In this case, we also get

\[ R^\parallel = \frac{-n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta + i\sqrt{\sin^2\theta - n^2}}\cdot\frac{-n^2\cos\theta - i\sqrt{\sin^2\theta - n^2}}{n^2\cos\theta - i\sqrt{\sin^2\theta - n^2}} = 1. \tag{8.113} \]

The relations (8.111) and (8.113) ensure that the energy flow gets back to a higher
refractive index medium.
Thus, the total reflection is characterized by the complex reflection coefficient
expressed as (8.110) and (8.112) as well as a reflectance of 1. From (8.110) and

(8.112), we can estimate the change in phase of the electromagnetic wave that takes place by virtue of the total reflection. For this purpose, we put

\[ R_E^\perp \equiv e^{i\alpha} \quad\text{and}\quad R_E^\parallel \equiv e^{i\beta}. \tag{8.114} \]

Rewriting (8.110), we have

\[ R_E^\perp = \frac{\cos^2\theta - \left(\sin^2\theta - n^2\right) - 2i\cos\theta\sqrt{\sin^2\theta - n^2}}{1 - n^2}. \tag{8.115} \]

At the critical angle θc, from (8.95) we have

\[ \sin\theta_c = n. \tag{8.116} \]

Therefore, we have

\[ 1 - n^2 = \cos^2\theta_c. \tag{8.117} \]

Then, as expected, we get

\[ R_E^\perp\big|_{\theta=\theta_c} = 1. \tag{8.118} \]

Note, however, that at θ = π/2 (i.e., grazing incidence), we have

\[ R_E^\perp\big|_{\theta=\pi/2} = -1. \tag{8.119} \]

From (8.115), the argument α in the complex plane is given by

\[ \tan\alpha = -\frac{2\cos\theta\sqrt{\sin^2\theta - n^2}}{\cos^2\theta - \left(\sin^2\theta - n^2\right)}. \tag{8.120} \]

The argument α defines the phase shift upon the total reflection. Considering (8.115) and (8.118), we have

\[ \alpha\big|_{\theta=\theta_c} = 0. \]

Since 1 - n² > 0 (i.e., n < 1) and in the total reflection region sin²θ - n² > 0, the imaginary part of R_E^⊥ is negative for any θ in that region (i.e., from θc to π/2). On the other hand, the real part of R_E^⊥ varies from 1 to -1, as is evidenced by (8.118) and (8.119). At θ0 that satisfies the following condition:
[Fig. 8.8: Phase shift α defined in a complex plane for the total reflection of the TE wave. The number n denotes the relative refractive index of D2 relative to D1. At the critical angle θc, α = 0.]

\[ \sin\theta_0 = \sqrt{\frac{1+n^2}{2}}, \tag{8.121} \]

the real part is zero. Thus, the phase shift α varies from 0 to -π as indicated in
Fig. 8.8. Comparing (8.121) with (8.116) and taking into account n < 1, we have

\[ \theta_c < \theta_0 < \pi/2. \]

Similarly, we estimate the phase change for the TM wave. Rewriting (8.112), we have

\[ R_E^\parallel = \frac{-n^4\cos^2\theta + \left(\sin^2\theta - n^2\right) + 2in^2\cos\theta\sqrt{\sin^2\theta - n^2}}{n^4\cos^2\theta + \sin^2\theta - n^2} = \frac{-n^4\cos^2\theta + \left(\sin^2\theta - n^2\right) + 2in^2\cos\theta\sqrt{\sin^2\theta - n^2}}{(1-n^2)\left(\sin^2\theta - n^2\cos^2\theta\right)}. \tag{8.122} \]

Then, we have

\[ R_E^\parallel\big|_{\theta=\theta_c} = -1. \tag{8.123} \]

Also at θ = π/2 (i.e., grazing incidence), we have

\[ R_E^\parallel\big|_{\theta=\pi/2} = 1. \tag{8.124} \]

From (8.122), the argument β is given by

\[ \tan\beta = \frac{2n^2\cos\theta\sqrt{\sin^2\theta - n^2}}{-n^4\cos^2\theta + \left(\sin^2\theta - n^2\right)}. \tag{8.125} \]

Considering (8.122) and (8.123), we have

\[ \beta\big|_{\theta=\theta_c} = \pi. \tag{8.126} \]

In the total reflection region, we have

\[ \sin^2\theta - n^2\cos^2\theta > n^2 - n^2\cos^2\theta = n^2\left(1-\cos^2\theta\right) > 0. \tag{8.127} \]

Therefore, the denominator of (8.122) is positive and, hence, the imaginary part of R_E^∥ is positive as well for any θ in that region (i.e., from θc to π/2). From (8.123) and (8.124), on the other hand, the real part of R_E^∥ in (8.122) varies from -1 to 1. At θ̃0 that satisfies the following condition:

\[ \cos\tilde{\theta}_0 = \sqrt{\frac{1-n^2}{1+n^4}}, \tag{8.128} \]

the real part of R_E^∥ is zero. Once again, we have

\[ \theta_c < \tilde{\theta}_0 < \pi/2. \tag{8.129} \]

Thus, the phase β varies from π to 0 as depicted in Fig. 8.9.
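The phase shifts α and β, together with the unit reflectance, can be evaluated directly from (8.110) and (8.112); the relative index and incidence angle below are assumed sample values:

```python
# Total-reflection phase shifts from (8.110) and (8.112), with |R| = 1 checked
# numerically (relative index and incidence angle assumed).
import cmath, math

n = 1.0 / 1.5                              # assumed glass-to-air relative index
theta = math.radians(60.0)                 # beyond the critical angle
c, s = math.cos(theta), math.sqrt(math.sin(theta)**2 - n**2)

R_TE = (c - 1j*s) / (c + 1j*s)             # Eq. (8.110)
R_TM = (-n**2*c + 1j*s) / (n**2*c + 1j*s)  # Eq. (8.112)

print(abs(R_TE), abs(R_TM))                  # -> 1.0, 1.0 (total reflection)
print(cmath.phase(R_TE), cmath.phase(R_TM))  # alpha in (-pi, 0), beta in (0, pi)
```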


In this section, we mentioned somewhat peculiar features of complex trigono-
metric functions such as sinϕ > 1 in light of real functions. As a matter of course, the
complex angle ϕ should be determined experimentally from (8.109). In this context
readers are referred to Chap. 6 that dealt with the theory of analytic functions [2].
[Fig. 8.9: Phase shift β for the total reflection of the TM wave. At the critical angle θc, β = π.]

8.7 Waveguide Applications

There are many optical devices based upon light propagation. Among them, wave-
guide devices utilize the total reflection. We explain their operation principle.
Suppose that we have a thin plate (usually called a slab) comprising a dielectric medium that spreads infinitely in two dimensions and that the plate is sandwiched with another dielectric (or maybe air or vacuum) or metal. In this situation electromagnetic waves are confined within the slab. Moreover, only under restricted conditions are those waves allowed to propagate parallel to the slab plane. Such electromagnetic waves are usually called propagation modes or simply "modes." An optical device thus designed is called a waveguide. These modes are characterized by repeated total reflection during the propagation. Another mode is an evanescent mode. Because of the total reflection, the energy transport is not allowed to take place perpendicular to the interface of the two dielectrics. For this reason, the evanescent mode is thought to propagate parallel to the interface very close to it.

8.7.1 TE and TM Waves in a Waveguide

In a waveguide configuration, propagating waves are classified into TE and TM


modes. The quality of the materials that constitute a waveguide largely governs the propagation modes within the waveguide.

Fig. 8.10 Cross-section of a slab waveguide comprising a dielectric medium. (a) A waveguide is
sandwiched between a couple of metal layers. (b) A waveguide is sandwiched between a couple of
layers consisting of another dielectric called clad layer. The sandwiched layer is called core layer

Figure 8.10 depicts a cross-section of a slab waveguide. We assume that the electromagnetic wave propagates in the positive direction of the z-axis and that the waveguide spreads infinitely along the z- and x-axes. Suppose that the waveguide is spatially confined along the y-axis. Let the thickness of the waveguide be d. From the point of view of the material that shapes a waveguide, waveguides are classified into two types. (i) Electromagnetic waves are completely confined
within the waveguide layer. This case typically happens when a dielectric forming
the waveguide is sandwiched between a couple of metal layers (Fig. 8.10a). This is
because the electromagnetic wave is not allowed to exist or propagate inside the
metal.
(ii) Electromagnetic waves are not completely confined within the waveguide.
This case happens when the dielectric of the waveguide is sandwiched by a couple of
other dielectrics. We distinguish this case as the total internal reflection from the
above case (i). We will further describe it in Sect. 8.7.2. For the total internal
reflection to take place, the refractive index of the waveguide must be higher than
those of other dielectrics (Fig. 8.10b). The dielectric of the waveguide is called core
layer and the other dielectric is called clad layer. In this case, electromagnetic waves are allowed to propagate inside the clad layer, even though the region is confined very close to the interface between the clad and core layers. Such electromagnetic waves are called evanescent waves. According to these two cases (i) and (ii),
we have different conditions under which the allowed modes can exist.
Now, let us return to Maxwell’s equations. We have introduced the equations of
wave motion (7.35) and (7.36) from Maxwell’s equations (7.28) and (7.29) along
with (7.7) and (7.10). One of their simplest solutions is a plane wave described by
(7.53). The plane wave is characterized by the fact that it has the same phase on an infinitely spreading plane perpendicular to the propagation direction (characterized by a wavenumber vector k). In a waveguide, however, the electromagnetic field is
confined with respect to the direction parallel to the normal to the slab plane (i.e., the
direction of the y-axis in Fig. 8.10). Consequently, the electromagnetic field can no
longer have the same phase in that direction. Yet, as solutions of the equations of wave motion, we can have a solution that has the same phase along the direction of the x-axis. Bearing in mind such a situation, let us think of Maxwell's equations in relation to the equations of wave motion.
Ignoring components related to partial differentiation with respect to x (i.e., the
component related to ∂/∂x) and rewriting (7.28) and (7.29) for individual Cartesian
coordinates, we have [3]

∂Ez/∂y − ∂Ey/∂z + ∂Bx/∂t = 0,  (8.130)

∂Ex/∂z + ∂By/∂t = 0,  (8.131)

−∂Ex/∂y + ∂Bz/∂t = 0,  (8.132)

∂Hz/∂y − ∂Hy/∂z − ∂Dx/∂t = 0,  (8.133)

∂Hx/∂z − ∂Dy/∂t = 0,  (8.134)

−∂Hx/∂y − ∂Dz/∂t = 0.  (8.135)

Of the above equations, we collect those pertinent to Ex and differentiate (8.131), (8.132), and (8.133) with respect to z, y, and t, respectively, to get

∂²Ex/∂z² + ∂²By/∂z∂t = 0,  (8.136)

∂²Ex/∂y² − ∂²Bz/∂y∂t = 0,  (8.137)

∂²Hz/∂t∂y − ∂²Hy/∂t∂z − ∂²Dx/∂t² = 0.  (8.138)

Multiplying (8.138) by μ and further adding (8.136) and (8.137) to it and using (7.7), we get

∂²Ex/∂y² + ∂²Ex/∂z² = με ∂²Ex/∂t².  (8.139)

This is a two-dimensional equation of wave motion. In a similar manner, from (8.130), (8.134), and (8.135), we have for the magnetic field

∂²Hx/∂y² + ∂²Hx/∂z² = με ∂²Hx/∂t².  (8.140)

Equations (8.139) and (8.140) are two-dimensional wave equations with respect to the y- and z-coordinates. Along the direction of the x-axis, a propagating wave has the same phase. Suppose that we have plane wave solutions for them as in the case of (7.58) and (7.59). Then, we have

E = E0 e^{i(k·x − ωt)} = E0 e^{i(kn·x − ωt)},
H = H0 e^{i(k·x − ωt)} = H0 e^{i(kn·x − ωt)}.  (8.141)

Note that a plane wave expressed by (7.58) and (7.59) is propagated uniformly in a dielectric medium. In a waveguide, on the other hand, the electromagnetic waves undergo repeated (total) reflections from the two boundaries positioned on either side of the waveguide while being propagated.
In a three-dimensional version, the wavenumber vector has three components kx, ky, and kz as expressed in (7.48). In (8.141), in turn, k has y- and z-components such that

k² = |k|² = ky² + kz².  (8.142)

Equations (8.139) and (8.140) can be rewritten as

∂²Ex/∂(±y)² + ∂²Ex/∂(±z)² = με ∂²Ex/∂(±t)²,  ∂²Hx/∂(±y)² + ∂²Hx/∂(±z)² = με ∂²Hx/∂(±t)².

Accordingly, we have four wavenumber vector components


Fig. 8.11 Four possible propagation directions k of an electromagnetic wave in a slab waveguide

Fig. 8.12 Geometries and propagation of the electromagnetic waves in a slab waveguide

ky = ±|ky| and kz = ±|kz|.

Figure 8.11 indicates this situation, where an electromagnetic wave can be propagated within a slab waveguide in any one of four possible directions of k. In this section, we assume that the electromagnetic wave is propagated toward the positive direction of the z-axis, and so we define kz as positive. On the other hand, ky can be either positive or negative. Thus, we get

kz = k sin θ and ky = ±k cos θ.  (8.143)

Figure 8.12 shows the geometries of the electromagnetic waves within the slab waveguide. The slab plane is parallel to the zx-plane. Let the positions of the two interfaces of the slab waveguide be

y = 0 and y = d.  (8.144)

That is, we assume that the thickness of the waveguide is d.


Since (8.139) describes a wave equation for only one component Ex, (8.139) is
suited for representing a TE wave. In (8.140), in turn, a wave equation is given only
for Hx, and hence, it is suited for representing a TM wave. With the TE wave, the electric field oscillates parallel to the slab plane and perpendicular to the propagation direction. With the TM wave, in turn, the magnetic field oscillates parallel to the slab plane and perpendicular to the propagation direction. In a general case, electromagnetic
waves in a slab waveguide are formed by superposition of TE and TM waves. Notice
that Fig. 8.12 is applicable to both TE and TM waves.
Let us further proceed with the waveguide analysis. The electric field E within the waveguide is described by superposition of the incident and reflected waves. Using the first equation of (8.141) and (8.143), we have

E(z, y) = εe E+ e^{i(kz sin θ + ky cos θ − ωt)} + εe′ E− e^{i(kz sin θ − ky cos θ − ωt)},  (8.145)

where E+ (E−) and εe (εe′) represent the amplitude and unit polarization vector of the incident (reflected) wave, respectively. The vector εe (or εe′) is defined in (7.67).
Equation (8.145) is common to both the cases of TE and TM waves. From now on, we consider the TE mode case. Suppose that the slab waveguide is sandwiched between a couple of metal sheets of high conductance. Since the electric field must be absent inside the metal, the electric field at the interface must be zero owing to the continuity condition for the tangential component of the electric field. Thus, we require that the following condition be met by (8.145):

t · E(z, 0) = 0 = t · εe E+ e^{i(kz sin θ − ωt)} + t · εe′ E− e^{i(kz sin θ − ωt)} = (t · εe E+ + t · εe′ E−) e^{i(kz sin θ − ωt)}.  (8.146)

Therefore, since e^{i(kz sin θ − ωt)} never vanishes, we have

t · εe E+ + t · εe′ E− = 0,  (8.147)

where t is a tangential unit vector at the interface.
Since E is polarized along the x-axis, setting εe = εe′ = e1 and taking t as e1, we get

E+ + E− = 0.

This means that the reflection coefficient of the electric field is −1. Denoting E+ = −E− ≡ E0 (>0), we have

E = e1E0 [e^{i(kz sin θ + ky cos θ − ωt)} − e^{i(kz sin θ − ky cos θ − ωt)}] = e1E0 (e^{iky cos θ} − e^{−iky cos θ}) e^{i(kz sin θ − ωt)} = 2i e1E0 sin(ky cos θ) e^{i(kz sin θ − ωt)}.  (8.148)

Requiring the electric field to vanish at the other interface, y = d, we have

E(z, d) = 0 = 2i e1E0 sin(kd cos θ) e^{i(kz sin θ − ωt)}.

Note that in terms of the boundary conditions, we are thinking of Dirichlet conditions (see Sects. 1.3 and 10.3). In this case, we have nodes for the electric field at the interface between the metal and the dielectric. For this condition to be satisfied, we must have

kd cos θ = lπ (l = 1, 2, ⋯).  (8.149)

From (8.149), we have the following condition for l:

l ≤ kd/π.  (8.150)

Meanwhile, we have

k = nk0,  (8.151)

where n is the refractive index of the dielectric that shapes the slab waveguide and the quantity k0 is the wavenumber of the electromagnetic wave in vacuum. The index n is given by

n = c/v,  (8.152)

where c and v are the light velocities in vacuum and in the dielectric medium, respectively. Here v is meant as the velocity in an infinitely spreading dielectric. Thus, θ is allowed to take several (or more) values depending upon k, d, and l.
Since in the z-direction no specific boundary conditions are imposed, we have propagating modes in that direction characterized by a propagation constant (vide infra). Looking at (8.148), we notice that k sin θ plays the role of a wavenumber in a free space. For this reason, a quantity β defined as

β = k sin θ = nk0 sin θ (β > 0)  (8.153)

is said to be a propagation constant. In (8.153), k0 is the wavenumber in vacuum. From (8.149) and (8.153), we get

β = (k² − l²π²/d²)^{1/2}.  (8.154)

Thus, the allowed TE waves indexed by l are called TE modes and represented as TEl. The phase velocity vp is given by

vp = ω/β.  (8.155)

Meanwhile, the group velocity vg is given by

vg = dω/dβ = (dβ/dω)^{−1}.  (8.156)

Using (8.154) and noting that k² = ω²/v², we get

vg = v²β/ω.  (8.157)

Thus, we have

vp vg = v².  (8.158)

Note that in (1.22) of Sect. 1.1, we saw a relationship similar to (8.158).


The characteristics of TM waves can be analyzed in a similar manner by exam-
ining the magnetic field Hx. In that case, the reflection coefficient of the magnetic
field is +1 and we have antinodes for the magnetic field at the interface. Concom-
itantly, we adopt Neumann conditions as the boundary conditions (see Sects. 1.3 and
8.3). Regardless of the difference in the boundary conditions, however, discussion
including (8.149)–(8.158) applies to the analysis of TM waves. Once Hx is deter-
mined, Ey and Ez can be determined as well from (8.134) and (8.135).
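As a quick numerical check of the relations (8.149)–(8.158), the following minimal Python sketch lists the allowed TEl modes of a metal-clad slab and verifies vp vg = v² for each mode. All parameter values (wavelength, refractive index, thickness) are illustrative assumptions, not values taken from the text.

```python
import numpy as np

c = 2.998e8            # light velocity in vacuum [m/s]
lam0 = 1.55e-6         # vacuum wavelength [m] (assumed)
n = 1.5                # refractive index of the slab (assumed)
d = 5.0e-6             # slab thickness [m] (assumed)

k0 = 2 * np.pi / lam0  # vacuum wavenumber
k = n * k0             # wavenumber in the dielectric, (8.151)
omega = c * k0         # angular frequency
v = c / n              # light velocity in the unbounded dielectric, (8.152)

l_max = int(k * d / np.pi)                      # condition (8.150)
for l in range(1, l_max + 1):
    beta = np.sqrt(k**2 - (l * np.pi / d)**2)   # propagation constant, (8.154)
    vp = omega / beta                           # phase velocity, (8.155)
    vg = v**2 * beta / omega                    # group velocity, (8.157)
    assert np.isclose(vp * vg, v**2)            # relation (8.158)
    print(f"TE{l}: beta = {beta:.4e} 1/m, vp = {vp:.4e} m/s, vg = {vg:.4e} m/s")
```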

8.7.2 Total Internal Reflection and Evanescent Waves

If a slab waveguide shaped by a dielectric is sandwiched between a couple of layers of another dielectric (Fig. 8.10b), the situation differs from that of the metal waveguide (Fig. 8.10a) we encountered in Sect. 8.7.1. Suppose in Fig. 8.10b that the former dielectric D1 of a refractive index n1 is sandwiched by the latter dielectric D2 of a refractive index n2.
Fig. 8.13 Cross-section of the waveguide where the light is propagated in the direction of k. A dielectric fills a semi-infinite space situated below NXY. We suppose another virtual plane N′X′Y′ that is parallel with NXY. We need the plane N′X′Y′ to estimate an optical path difference (or phase difference)

Suppose that an electromagnetic wave is being propagated from D1 toward D2.


Then, we must have

n1 > n2  (8.159)

so that the total internal reflection can take place at the interface of D1 and D2. In this
case, the dielectrics D1 and D2 act as a core layer and a clad layer, respectively.
The biggest difference between the present waveguide and the previous one is
that unlike the previous case, the total internal reflection occurs in the present case.
Concomitantly, an evanescent wave is present in the clad layer very close to the
interface.
First, let us estimate the conditions that must be satisfied so that an electromagnetic wave can be propagated within a waveguide. Figure 8.13 depicts a cross-section of
the waveguide where the light is propagated in the direction of k. In Fig. 8.13,
suppose that we have a normal N to the plane of paper at P. Then, N and a straight
line XY shape a plane NXY. Also suppose that a dielectric fills a semi-infinite space
situated below NXY. Further suppose that there is another virtual plane N′X′Y′ that
is parallel with NXY as shown. Here N′ is parallel to N. The separation of the two
parallel planes is d. We need the virtual plane N′X′Y′ just to estimate an optical path
difference (or phase difference, more specifically) between two waves, i.e., a prop-
agating wave and a reflected wave.
Let n be a unit vector in the direction of k, i.e.,

n = k/|k| = k/k.  (8.160)

Then the electromagnetic wave is described as

E = E0 e^{i(k·x − ωt)} = E0 e^{i(kn·x − ωt)}.  (8.161)

Suppose that we take a coordinate system such that

x = rn + su + tv,  (8.162)

where u and v represent unit vectors in the directions perpendicular to n. Then (8.161) can be expressed by

E = E0 e^{i(kr − ωt)}.  (8.163)

Suppose that the wave is propagated starting from a point A to P and reflected at P. Then, the wave is further propagated to B and reflected again to reach Q. The wave front is originally at AB and finally at PQ. Thus, the Z-shaped optical path length APBQ is equal to a separation between A′B′ and PQ. Notice that the separation between A′B′ and P′Q′ is taken so that it is equal to that between AB and PQ. The geometry of Fig. 8.13 implies that two waves starting from AB and A′B′ at the same time reach PQ at the same time.
We find that the separation between AB and A′B′ is

2d cos θ.

Let us tentatively call these waves Wave-AB and Wave-A′B′ and describe their electric fields as EAB and EA′B′, respectively. Then, we denote

EAB = E0 e^{i(kr − ωt)},  (8.164)

EA′B′ = E0 e^{i[k(r + 2d cos θ) − ωt]},  (8.165)

where k is the wavenumber in the dielectric. Note that since EA′B′ lags behind EAB, a plus sign appears in the first term of the exponent. Therefore, the phase difference between the two waves is

2kd cos θ.  (8.166)

Now, let us come back to the actual geometry of the waveguide. That is, the core
layer of thickness d is sandwiched by a couple of clad layers (Fig. 8.10b). In this
situation, the wave EAB experiences the total internal reflection two times, which we
ignored in the above discussion of the metal waveguide. Since the total internal
reflection causes a complex phase shift, we have to take account of this effect. The
phase shift was defined as α of (8.114) for a TE mode and β for a TM mode. Notice
that in Fig. 8.13 the electric field oscillates perpendicularly to the plane of paper with
the TE mode, whereas it oscillates in parallel with the plane of paper with the TM
mode. For both cases the electric field oscillates perpendicularly to n. Consequently,
the phase shift due to these reflections has to be added to (8.166). Thus, for the phase
commensuration to be obtained, the following condition must be satisfied:

2kd cos θ + 2δ = 2lπ (l = 0, 1, 2, ⋯),  (8.167)

where δ is either δTE or δTM defined below according to the case of the TE wave and TM wave, respectively. For a practical purpose, (8.167) is dealt with by a numerical calculation, e.g., to design an optical waveguide.
Unlike (8.149), what is most important about (8.167) is that the condition l = 0 is permitted because δ < 0 (see just below).
For convenience and according to the custom, we adopt a phase shift notation other than that defined in (8.114). With the TE mode, the phase is retained upon reflection at the critical angle, and so we identify α with an additional component δTE. In the TM case, on the other hand, the phase is reversed upon reflection at the critical angle (i.e., a π shift occurs). Since this π shift has been incorporated into β, it suffices to consider only an additional component δTM. That is, we have

δTE ≡ α and δTM ≡ β − π.  (8.168)

We rewrite (8.110) as

R⊥E = e^{iα} = e^{iδTE} = ae^{−iσ}/(ae^{iσ}) = e^{−2iσ}

and

δTE = −2σ (σ > 0),  (8.169)

where we have

ae^{iσ} = cos θ + i√(sin²θ − n²).  (8.170)

Therefore,

tan σ = −tan(δTE/2) = √(sin²θ − n²)/cos θ.  (8.171)

Meanwhile, rewriting (8.112) we have

R∥E = e^{iβ} = e^{i(δTM + π)} = −be^{−iτ}/(be^{iτ}) = −e^{−2iτ} = e^{i(−2τ + π)},  (8.172)

where

be^{iτ} = n²cos θ + i√(sin²θ − n²).  (8.173)

Note that the minus sign with the second last equality in (8.172) is due to the phase reversal upon reflection. From (8.172), we may put

β = −2τ + π.

Comparing this with the second equation of (8.168), we get

δTM = −2τ (τ > 0).  (8.174)

Consequently, we get

tan τ = −tan(δTM/2) = √(sin²θ − n²)/(n²cos θ).  (8.175)

Finally, the additional phase changes δTE and δTM upon the total reflection are given by [4]

δTE = −2 tan⁻¹[√(sin²θ − n²)/cos θ] and δTM = −2 tan⁻¹[√(sin²θ − n²)/(n²cos θ)].  (8.176)

We emphasize that in (8.176) both δTE and δTM are negative quantities. This
phase shift has to be included in (8.167) as a negative quantity δ.
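As an illustration of the numerical treatment mentioned above, the following minimal sketch solves the phase-matching condition (8.167) for a TE wave, using (8.176) for δTE. The core/clad indices, wavelength, and thickness are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq

n1, n2 = 1.50, 1.45           # core and clad refractive indices (assumed)
n = n2 / n1                   # relative refractive index (< 1)
lam0 = 1.55e-6                # vacuum wavelength [m] (assumed)
d = 5.0e-6                    # core thickness [m] (assumed)
k = 2 * np.pi * n1 / lam0     # wavenumber in the core

def delta_TE(theta):
    # additional phase change upon total internal reflection, (8.176)
    return -2 * np.arctan(np.sqrt(np.sin(theta)**2 - n**2) / np.cos(theta))

def f(theta, l):
    # a zero of f corresponds to the condition (8.167)
    return 2 * k * d * np.cos(theta) + 2 * delta_TE(theta) - 2 * l * np.pi

theta_c = np.arcsin(n)        # critical angle, (8.116)
eps = 1e-9
l = 0
while f(theta_c + eps, l) > 0:           # f < 0 at grazing incidence
    theta_l = brentq(f, theta_c + eps, np.pi / 2 - eps, args=(l,))
    print(f"l = {l}: theta = {np.degrees(theta_l):.3f} deg")
    l += 1
```

Note that, consistently with the remark above, the mode l = 0 is obtained because δTE < 0.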
At first glance, (8.176) seems to differ largely from (8.120) and (8.125). Nevertheless, noting the trigonometric formula

tan 2x = 2 tan x/(1 − tan²x)

and remembering that δTM in (8.168) includes π arising from the phase reversal, we find that the two relations are virtually identical.
Evanescent waves are drawing large attention in the fields of basic physics and applied device physics. If the total internal reflection is absent, ϕ is real. Under the total internal reflection, however, cos ϕ is pure imaginary; we write cos ϕ = ib (b > 0). The electric field of the evanescent wave is described as

Et = Eεt e^{i(kt z sin ϕ + kt y cos ϕ − ωt)} = Eεt e^{i(kt z sin ϕ + i kt yb − ωt)} = Eεt e^{i(kt z sin ϕ − ωt)} e^{−kt yb}.  (8.177)

In (8.177), the unit polarization vector εt is either perpendicular to the plane of paper of Fig. 8.13 (the TE case) or parallel to it (the TM case). Notice that the coordinate system is different from that of (8.104). The quantity kt sin ϕ is the propagation constant. Let vp(s) and vp(e) be the phase velocities of the electromagnetic wave in the slab waveguide (i.e., core layer) and of the evanescent wave in the clad layer, respectively. Then, in virtue of Snell's law, we have

v1 < vp(s) = v1/sin θ = ω/(k sin θ) = ω/(kt sin ϕ) = vp(e) = v2/sin ϕ < v2,  (8.178)

where v1 and v2 are the light velocities in a free space filled by the dielectrics D1 and D2, respectively. For this, we used the relation described as

ω = v1k = v2kt.  (8.179)

We also used Snell's law with the third equality of (8.178). Notice that sin ϕ > 1 in the evanescent region and that kt sin ϕ is the propagation constant in the clad layer. Also note that vp(s) is equal to vp(e) and that these phase velocities lie between the two velocities of the free space. Thus, the evanescent waves must be present, accompanying the propagating waves that undergo the total internal reflections in a slab waveguide.
As remarked in relation to (8.105), the electric field of evanescent waves decays exponentially with increasing distance from the interface [i.e., with increasing y in (8.177)]. This implies that the evanescent waves exist only in the region of the clad layer very close to the interface between the core and clad layers.
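To get a feel for how thin this evanescent region is, the following minimal sketch estimates the 1/e penetration depth from the decay factor e^{−kt yb} in (8.177), writing cos ϕ = ib and using Snell's law n1 sin θ = n2 sin ϕ. All parameter values are illustrative assumptions.

```python
import numpy as np

n1, n2 = 1.50, 1.45            # core and clad indices (assumed)
lam0 = 1.55e-6                 # vacuum wavelength [m] (assumed)
k0 = 2 * np.pi / lam0
kt = n2 * k0                   # wavenumber in the clad layer

theta_c = np.arcsin(n2 / n1)   # critical angle
for theta_deg in (76.0, 80.0, 85.0):
    theta = np.radians(theta_deg)
    assert theta > theta_c                 # total internal reflection region
    sin_phi = n1 * np.sin(theta) / n2      # Snell's law; sin(phi) > 1 here
    b = np.sqrt(sin_phi**2 - 1.0)          # from cos(phi) = i*b (b > 0)
    depth = 1.0 / (kt * b)                 # e^{-kt*y*b} = 1/e at y = depth
    print(f"theta = {theta_deg:.0f} deg: penetration depth = {depth*1e6:.2f} um")
```

The depth comes out on the order of a wavelength, confirming that the evanescent field hugs the interface.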

8.8 Stationary Waves

So far, we have been dealing with propagating waves either in a free space or in a
waveguide. If the dielectric shaping the waveguide is confined in another direction,
the propagating waves show specific properties. Examples include optical fibers.
In this section we consider a situation where the electromagnetic wave is prop-
agating in a dielectric medium and reflected by a “wall” formed by metal or another
dielectric. In such a situation, the original wave (i.e., a forward wave) causes
interference with the backward wave, and a stationary wave is formed as a conse-
quence of the interference.
344 8 Reflection and Transmission of Electromagnetic Waves in Dielectric Media

To approach this issue, we deal with superposition of two waves that have
different phases and different amplitudes. To generalize the problem, let ψ 1 and
ψ 2 be two cosine functions described as

ψ1 = a1 cos b1 and ψ2 = a2 cos b2.  (8.180)

Their addition is expressed as

ψ = ψ1 + ψ2 = a1 cos b1 + a2 cos b2.  (8.181)

Here we wish to unify (8.181) as a single cosine (or sine) function. To this end, we modify a description of ψ2 such that

ψ2 = a2 cos[b1 + (b2 − b1)] = a2[cos b1 cos(b2 − b1) − sin b1 sin(b2 − b1)] = a2[cos b1 cos(b1 − b2) + sin b1 sin(b1 − b2)].  (8.182)

Then, the addition is described by

ψ = ψ1 + ψ2 = [a1 + a2 cos(b1 − b2)] cos b1 + a2 sin(b1 − b2) sin b1.  (8.183)

Putting R such that

R = √{[a1 + a2 cos(b1 − b2)]² + a2² sin²(b1 − b2)} = √[a1² + a2² + 2a1a2 cos(b1 − b2)],  (8.184)

we get

ψ = R cos(b1 − θ),  (8.185)

where θ is expressed by

Fig. 8.14 Geometric diagram in relation to the superposition of two waves having different amplitudes (a1 and a2) and different phases (b1 and b2)

tan θ = a2 sin(b1 − b2)/[a1 + a2 cos(b1 − b2)].  (8.186)

Figure 8.14 represents a geometric diagram in relation to the superposition of two


waves having different amplitudes (a1 and a2) and different phases (b1 and b2) [5].
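The decomposition (8.184)–(8.186) can be verified numerically. Below is a minimal sketch; the amplitudes follow Fig. 8.14, and the random phases are arbitrary test values:

```python
import numpy as np

rng = np.random.default_rng(0)
a1, a2 = 1.0, 0.5
for _ in range(5):
    b1, b2 = rng.uniform(0, 2 * np.pi, size=2)
    psi = a1 * np.cos(b1) + a2 * np.cos(b2)                     # (8.181)
    R = np.sqrt(a1**2 + a2**2 + 2 * a1 * a2 * np.cos(b1 - b2))  # (8.184)
    theta = np.arctan2(a2 * np.sin(b1 - b2),
                       a1 + a2 * np.cos(b1 - b2))               # (8.186)
    assert np.isclose(psi, R * np.cos(b1 - theta))              # (8.185)
print("The superposition formula (8.185) checks out.")
```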
To apply (8.185) to the superposition of two electromagnetic waves that are propagating forward and backward to collide head-on with each other, we change the variables such that

b1 = kx − ωt and b2 = −kx − ωt,  (8.187)

where the former equation represents a forward wave, whereas the latter a backward wave. Then we have

b1 − b2 = 2kx.  (8.188)

Equation (8.185) is rewritten as

ψ(x, t) = R cos(kx − ωt − θ),  (8.189)

with

Fig. 8.15 Superposition of two sinusoidal waves. In (8.189) and (8.190), we put a1 = 1, a2 = 0.5, with (i) t = 0; (ii) t = T/4; (iii) t = T/2; (iv) t = 3T/4. ψ(x, t) is plotted as a function of phase kx

R = √(a1² + a2² + 2a1a2 cos 2kx)  (8.190)

and

tan θ = a2 sin 2kx/(a1 + a2 cos 2kx).  (8.191)

Equation (8.189) looks simple, but since both R and θ vary as a function of x, the situation is somewhat complicated unlike a simple sinusoidal wave. Nonetheless, when x takes a special value, (8.189) is expressed by a simple function form. For example, at t = 0,

ψ(x, 0) = (a1 + a2) cos kx.

This corresponds to (i) of Fig. 8.15. If t = T/2 (where T is a period, i.e., T = 2π/ω), we have

ψ(x, T/2) = −(a1 + a2) cos kx.

This corresponds to (iii) of Fig. 8.15. But, the waves described by (ii) or (iv) do not have a simple function form.
We characterize Fig. 8.15 below. If we have

2kx = nπ or x = nλ/4  (8.192)

with λ being a wavelength, then θ = 0 or π, and so θ can be eliminated. This situation occurs at every quarter wavelength. Let us put t = 0 and examine what the superposed wave looks like. For instance, putting x = 0, x = λ/4, and x = λ/2, we have

ψ(0, 0) = |a1 + a2|, ψ(λ/4, 0) = ψ(3λ/4, 0) = 0, ψ(λ/2, 0) = −|a1 + a2|,  (8.193)

respectively. Notice that in Fig. 8.15 we took a1, a2 > 0. At another instant t = T/4, we have similarly

ψ(0, T/4) = 0, ψ(λ/4, T/4) = |a1 − a2|, ψ(λ/2, T/4) = 0, ψ(3λ/4, T/4) = −|a1 − a2|.  (8.194)

Thus, the waves that vary with time are characterized by two drum-shaped envelopes that have extremals |a1 + a2| and |a1 − a2| or those −|a1 + a2| and −|a1 − a2|. An important implication of Fig. 8.15 is that no node is present in the superposed wave. In other words, there is no position x0 at which ψ(x0, t) = 0 for all t. From the aspect of energy transport of electromagnetic waves, if |a1| > |a2| (where a1 and a2 represent the amplitudes of the forward and backward waves, respectively), the net energy flow takes place in the travelling direction of the forward wave. If, on the other hand, |a1| < |a2|, the net energy flow takes place in the travelling direction of the backward wave. In this respect, think of Poynting vectors.
In case |a1| = |a2|, the situation is particularly simple. No net energy flow takes place in this case. Correspondingly, we observe nodes. Such waves are called stationary waves. Let us consider this simple situation for an electromagnetic wave that is incident perpendicularly to the interface between two dielectrics (one of them may be a metal).
Returning to (7.58) and (7.66), we describe two electromagnetic waves that are propagating in the positive and negative direction of the z-axis such that

E1 = E1εe e^{i(kz − ωt)} and E2 = E2εe e^{i(−kz − ωt)},  (8.195)

where εe is a unit polarization vector arbitrarily fixed so that it can be parallel to the interface, i.e., wall (i.e., perpendicular to the z-axis). The situation is depicted in Fig. 8.16. Notice that in Fig. 8.16 E1 represents the forward wave (i.e., incident wave) and E2 the backward wave (i.e., wave reflected at the interface). Thus, a superposed wave is described as

E = E1 + E2.  (8.196)

Taking account of the reflection of an electromagnetic wave perpendicularly incident on a wall, let us consider the following two cases:
(i) Syn-phase:
The phase of the electric field is retained upon reflection. We assume that E1 = E2 (>0). Then, we have

Fig. 8.16 Superposition of electric fields of forward (or incident) wave E1 and backward (or reflected) wave E2

E = E1εe [e^{i(kz − ωt)} + e^{i(−kz − ωt)}] = E1εe e^{−iωt}(e^{ikz} + e^{−ikz}) = 2E1εe e^{−iωt} cos kz.  (8.197)

In (8.197), we put z = 0 at the interface for convenience. Taking a real part of (8.197), we have

E = 2E1εe cos ωt cos kz.  (8.198)

Note that in (8.198) variables z and t have been separated. This implies that we have a stationary wave. For this case to be realized, the characteristic impedance of the dielectric on the incident wave side should be sufficiently smaller than that of the other side; see (8.51) and (8.59). In other words, the dielectric constant of the incident side should be large enough. We have nodes at positions that satisfy

−kz = π/2 + mπ (m = 0, 1, 2, ⋯) or −z = λ/4 + (m/2)λ.  (8.199)

Note that we are thinking of the stationary wave in the region of z < 0. Equation (8.199) indicates that nodes are formed at a quarter wavelength from the interface and every half wavelength from it. The node means the position where no electric field is present.
Meanwhile, antinodes are observed at positions

−kz = mπ (m = 0, 1, 2, ⋯) or −z = (m/2)λ.

Thus, the nodes and antinodes alternate with every quarter wavelength.
(ii) Antiphase:
The phase of the electric field is reversed. We assume that E1 = −E2 (>0). Then, we have

E = E1εe [e^{i(kz − ωt)} − e^{i(−kz − ωt)}] = E1εe e^{−iωt}(e^{ikz} − e^{−ikz}) = 2iE1εe e^{−iωt} sin kz.  (8.200)

Taking a real part of (8.200), we have

E = 2E1εe sin ωt sin kz.  (8.201)

In (8.201) variables z and t have been separated as well. For this case to be realized, the characteristic impedance of the dielectric on the incident wave side should be sufficiently larger than that of the other side. In other words, the dielectric constant of the incident side should be small enough. Practically, this situation can easily be attained by choosing a metal of high reflectance for the wall material. We have nodes at positions that satisfy

−kz = mπ (m = 0, 1, 2, ⋯) or −z = (m/2)λ.  (8.202)

The nodes are formed at the interface and every half wavelength from it. As in the case of the syn-phase, the antinodes take place at positions shifted by a quarter wavelength relative to the nodes.
If there is another interface at, say, −z = L (>0), the wave goes back and forth many times. If the absolute value of the reflection coefficient at the interface is high enough (i.e., close to unity), attenuation of the wave is negligible. For both the syn-phase and antiphase cases, we must have

kL = mπ (m = 1, 2, ⋯) or L = (m/2)λ  (8.203)

so that stationary waves can be sustained stably. The stationary waves indexed with a
positive integer m in (8.203) are said to be longitudinal modes. Note that the index
l that features the transverse modes in the waveguide can be zero or a positive
integer. Unlike the previous case of (8.167) in Sect. 8.7, the index m of (8.203) is not
allowed to be zero.
If L in (8.203) is large enough compared to λ, m is a pretty large number as well,
and what is more, many different numbers of m are allowed between given
350 8 Reflection and Transmission of Electromagnetic Waves in Dielectric Media

interfaces. In other words, stationary waves of many different wavelengths are


allowed to be present at once between the given interfaces. In such a case, the
stationary waves are referred to as a longitudinal multimode.
For a practical purpose, an optical device having interfaces that stabilize the stationary waves is said to be a resonator. Various geometries and constitutions of the resonator are proposed in combination with various dielectrics including semiconductors. Related discussion can be seen in Chap. 9.

References

1. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
2. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
3. Smith FG, King TA, Wilkins D (2007) Optics and photonics, 2nd edn. Wiley, Chichester
4. Born M, Wolf E (2005) Principles of optics, 7th edn. Cambridge University Press, Cambridge
5. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
Chapter 9
Light Quanta: Radiation and Absorption

So far we discussed propagation of light and its reflection and transmission


(or refraction) at an interface of dielectric media. We described characteristics of
light from the point of view of an electromagnetic wave. In this chapter, we describe
properties of light in relation to quantum mechanics. To this end, we start with
Planck’s law of radiation that successfully reproduced experimental results related to
a blackbody radiation. Before this law had been established, Rayleigh–Jeans law
failed to explain the experimental results at a high frequency region of radiation (the
ultraviolet catastrophe). The Planck’s law of radiation led to the discovery of light
quanta. Einstein interpreted Planck’s law of radiation on the basis of a model of
two-level atoms. This model includes the so-called Einstein A and B coefficients that
are important in optics applications, especially lasers. We derive these coefficients
from a classical point of view based on a dipole oscillation. We also consider a close
relationship between electromagnetic waves confined in a cavity and a motion of a
harmonic oscillator.

9.1 Blackbody Radiation

Historically, the relevant theory was first propounded by Max Planck and then
Albert Einstein as briefly discussed in Chap. 1. The theory was developed on the
basis of the experiments called cavity radiation or blackbody radiation. Here,
however, we wish to derive Planck’s law of radiation on the assumption of the
existence of quantum harmonic oscillators.
As discussed in Chap. 2, the ground state of a quantum harmonic oscillator has an
energy ℏω/2. Therefore, we measure energies of the oscillator in reference to that
state. Let N0 be the number of oscillators (i.e., light quanta) present in the ground
state. Then, according to the Boltzmann distribution law, the number of oscillators of the first excited state N1 is


N1 = N0 e^{−ℏω/kBT},  (9.1)

where kB is the Boltzmann constant and T is the absolute temperature. Let Nj be the number of oscillators of the j-th excited state. Then we have

Nj = N0 e^{−jℏω/kBT}.  (9.2)

Let N be the total number of oscillators in the system. Then, we get

N = N0 + N0 e^{−ℏω/kBT} + ⋯ + N0 e^{−jℏω/kBT} + ⋯ = N0 Σ_{j=0}^∞ e^{−jℏω/kBT}.  (9.3)

Let E be the total energy of the oscillator system in reference to the ground state. That is, we put the ground-state energy at zero. Then we have

E = 0·N0 + N0 ℏω e^{−ℏω/kBT} + ⋯ + N0 jℏω e^{−jℏω/kBT} + ⋯ = N0 Σ_{j=0}^∞ jℏω e^{−jℏω/kBT}.  (9.4)

Therefore, the average energy of the oscillators Ē is given by

Ē = E/N = ℏω [Σ_{j=0}^∞ j e^{−jℏω/kBT}]/[Σ_{j=0}^∞ e^{−jℏω/kBT}].  (9.5)

Putting x ≡ e^{−ℏω/kBT} [1], we have

Ē = ℏω [Σ_{j=0}^∞ j x^j]/[Σ_{j=0}^∞ x^j].  (9.6)

Since x < 1, we have

Σ_{j=0}^∞ j x^j = x Σ_{j=0}^∞ j x^{j−1} = x (d/dx) Σ_{j=0}^∞ x^j = x (d/dx)[1/(1 − x)] = x/(1 − x)²,  (9.7)

Σ_{j=0}^∞ x^j = 1/(1 − x).  (9.8)

Then, we get

Ē = ℏωx/(1 − x) = ℏω e^{−ℏω/kBT}/(1 − e^{−ℏω/kBT}) = ℏω/(e^{ℏω/kBT} − 1).  (9.9)

The function

1/(e^{ℏω/kBT} − 1)  (9.10)

is a form of Bose–Einstein distribution functions; more specifically, it is called the Bose–Einstein distribution function for photons today. If (ℏω/kBT) ≪ 1, we have

e^{ℏω/kBT} ≈ 1 + (ℏω/kBT).

Therefore, we get

Ē ≈ kBT.  (9.11)

Thus, the relation (9.9) asymptotically agrees with the classical theory. In other words, according to the classical law of equipartition of energy, an energy of kBT/2 is distributed to each of the two degrees of freedom of motion, i.e., the kinetic energy and the potential energy of a harmonic oscillator.
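The classical limit (9.11) is also easy to see numerically. Below is a minimal sketch; the temperature and frequencies are illustrative assumptions:

```python
import numpy as np

kB = 1.380649e-23      # Boltzmann constant [J/K]
hbar = 1.054572e-34    # reduced Planck constant [J s]
T = 300.0              # absolute temperature [K] (assumed)

for omega in (1e11, 1e13, 1e15):        # angular frequencies [rad/s] (assumed)
    x = hbar * omega / (kB * T)
    E_avg = hbar * omega / np.expm1(x)  # average energy (9.9); expm1(x) = e^x - 1
    print(f"hbar*omega/kBT = {x:9.3e}: E_avg/(kBT) = {E_avg/(kB*T):.4e}")
# For hbar*omega/kBT << 1 the ratio tends to 1, recovering (9.11);
# for hbar*omega/kBT >> 1 the average energy is exponentially suppressed.
```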

9.2 Planck’s Law of Radiation and Mode Density


of Electromagnetic Waves

Researchers at the time tried to find the relationship between the energy density inside the cavity and the (angular) frequency of radiation. To reach the relationship, let
us introduce a concept of mode density of electromagnetic waves related to the
blackbody radiation. We define the mode density D(ω) as the number of modes of
electromagnetic waves per unit volume per unit angular frequency. We refer to the
electromagnetic waves having allowed specific angular frequencies and polarization
as modes. These modes must be described as linearly independent functions.
Determination of the mode density is related to boundary conditions (BCs)
imposed on a physical system. We already dealt with this problem in Chaps. 2, 3,
and 8. These BCs often appear when we find solutions of a differential equation. Let
us consider the following wave equation:

∂²ψ/∂x² = (1/v²) ∂²ψ/∂t².  (9.12)

According to the method of separation of variables, we put


ψ(x, t) = X(x)T(t).  (9.13)

Substituting (9.13) into (9.12) and dividing both sides by X(x)T(t), we have

(1/X) d²X/dx² = (1/v²)(1/T) d²T/dt² = −k²,  (9.14)

where k is an undetermined (possibly complex) constant. For the x component, we get

d²X/dx² + k²X = 0.  (9.15)

Remember that k is supposed to be a complex number for the moment (see Example 1.1). Modifying Example 1.1 a little bit such that (9.15) is posed in a domain [0, L] and imposing the Dirichlet conditions such that

X(0) = X(L) = 0,  (9.16)

we find a solution of

X(x) = a sin kx,  (9.17)

where a is a constant. The constant k can be determined to satisfy the BCs; i.e.,

kL = mπ or k = mπ/L (m = 1, 2, ⋯).  (9.18)

Thus, we get real numbers for k. Then, we have a solution

T(t) = b sin kvt = b sin ωt.  (9.19)

The overall solution is then

ψ(x, t) = c sin kx sin ωt.  (9.20)

This solution has already appeared in Chap. 8 as a stationary solution. The readers are encouraged to derive these results.
In a three-dimensional case, we have a wave equation

∂²ψ/∂x² + ∂²ψ/∂y² + ∂²ψ/∂z² = (1/v²) ∂²ψ/∂t².  (9.21)

In this case, we also assume

ψ(x, t) = X(x)Y(y)Z(z)T(t).  (9.22)

Similarly we get

(1/X) d²X/dx² + (1/Y) d²Y/dy² + (1/Z) d²Z/dz² = (1/v²)(1/T) d²T/dt² = −k².  (9.23)

Putting

(1/X) d²X/dx² = −kx², (1/Y) d²Y/dy² = −ky², (1/Z) d²Z/dz² = −kz²,  (9.24)

we have

kx² + ky² + kz² = k².  (9.25)

Then, we get a stationary wave solution as in the one-dimensional case such that

ψ(x, t) = c sin kx x sin ky y sin kz z sin ωt.  (9.26)

The BCs to be satisfied with X(x), Y(y), and Z(z) are

kx L = mx π, ky L = my π, kz L = mz π (mx, my, mz = 1, 2, ⋯).  (9.27)

Returning to the main issue, let us deal with the mode density. Think of a cube of each side of L that is placed as shown in Fig. 9.1. Calculating k, we have

k = √(kx² + ky² + kz²) = (π/L)√(mx² + my² + mz²) = ω/c,  (9.28)

where we assumed that the inside of a cavity is vacuum and, hence, the propagation velocity of light is c. Rewriting (9.28), we have

√(mx² + my² + mz²) = Lω/(πc).  (9.29)

The numbers mx, my, and mz represent allowable modes in the cavity; the set (mx, my, mz) specifies an individual mode. Note that mx, my, and mz are all positive integers. If, for instance, −mx were allowed, this would produce sin(−kx x) = −sin kx x; but this function is linearly dependent on sin kx x. Then, a mode indexed by −mx should not be regarded as an independent mode. Given ω, each set (mx, my, mz) that satisfies (9.29) corresponds to a mode.
Therefore, the number of modes that satisfies the following expression

Fig. 9.1 Cube of each side of L. We use this simple model to estimate mode density

√(mx² + my² + mz²) ≤ Lω/(πc)  (9.30)

represents those corresponding to angular frequencies equal to or less than a given ω. Each mode has a one-to-one correspondence with the lattice point indexed by (mx, my, mz). Accordingly, if mx, my, mz ≫ 1, the number of allowed modes approximately equals one-eighth of the volume of a sphere having a radius of Lω/(πc). Let NL be the number of modes whose angular frequencies are equal to or less than ω. Recalling that there are two independent modes having the same index (mx, my, mz) but mutually orthogonal polarizations, we have

NL = (4π/3)(Lω/πc)³ · (1/8) · 2 = L³ω³/(3π²c³).  (9.31)

Consequently, the mode density D(ω) is expressed as

D(ω)dω = (1/L³)(dNL/dω)dω = [ω²/(π²c³)]dω,  (9.32)

where D(ω)dω represents the number of modes per unit volume whose angular frequencies range between ω and ω + dω.
Now, we introduce a function ρ(ω) as an energy density per unit angular frequency. Then, combining D(ω) with (9.9), we get

ρ(ω) = D(ω) ℏω/(e^{ℏω/kBT} − 1) = [ℏω³/(π²c³)] · 1/(e^{ℏω/kBT} − 1).  (9.33)

The relation (9.33) is called Planck's law of radiation. Notice that ρ(ω) has a dimension of [J m⁻³ s].
To solve (9.15) under the Dirichlet conditions (9.16) is pertinent to analyzing an
electric field within a cavity surrounded by a metal husk, because the electric field
must be absent at an interface between the cavity and metal. The problem, however,
can equivalently be solved using the magnetic field. This is because at the interface the reflection coefficients of the electric field and the magnetic field have mutually reversed signs (see
Chap. 8). Thus, given an equation for the magnetic field, we may use the Neumann
condition. This condition requires differential coefficients to vanish at the boundary
(i.e., the interface between the cavity and metal). In a similar manner to the above,
we get

ψ(x, t) = c cos kx cos ωt.  (9.34)

By imposing BCs, again we have (9.18) that leads to the same result as the above.
We may also impose the periodic BCs. This type of equation has already been treated in Chap. 3. In that case we have a solution of

e^{ikx} and e^{−ikx}.

The BCs demand that e⁰ = 1 = e^{ikL}. That is,

kL = 2πm (m = 0, ±1, ±2, ⋯).  (9.35)

Notice that e^{ikx} and e^{−ikx} are linearly independent and, hence, a minus sign for m is permitted. Correspondingly, we have

NL = (4π/3)(Lω/2πc)³ · 2 = L³ω³/(3π²c³).  (9.36)

In other words, here we have to consider the whole volume of a sphere with half the radius of the previous case. Thus, we reach the same conclusion as before.
If the average energy of an oscillator were described by (9.11), we would obtain the following description of ρ(ω) such that

ρ(ω) = D(ω)kBT = [ω²/(π²c³)]kBT.  (9.37)

This relation is well known as the Rayleigh–Jeans law, but (9.37) disagreed with experimental results in that, according to the Rayleigh–Jeans law, ρ(ω) diverges toward infinity as ω goes to infinity. The discrepancy between the theory and experimental
results was referred to as “ultraviolet catastrophe.” Planck’s law of radiation


described by (9.33), however, reproduces the experimental results well.
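The contrast between (9.33) and (9.37) can be made concrete with a minimal numerical sketch; the temperature is an illustrative assumption:

```python
import numpy as np

kB, hbar, c = 1.380649e-23, 1.054572e-34, 2.998e8
T = 5000.0                               # temperature [K] (assumed)

def rho_planck(omega):                   # Planck's law, (9.33)
    return hbar * omega**3 / (np.pi**2 * c**3 * np.expm1(hbar * omega / (kB * T)))

def rho_rj(omega):                       # Rayleigh-Jeans law, (9.37)
    return omega**2 * kB * T / (np.pi**2 * c**3)

for omega in (1e13, 1e14, 1e15, 1e16):   # angular frequencies [rad/s]
    print(f"omega = {omega:.0e}: Planck = {rho_planck(omega):.3e}, "
          f"Rayleigh-Jeans = {rho_rj(omega):.3e}  [J m^-3 s]")
# The two laws agree at low omega, but the Rayleigh-Jeans density grows as
# omega**2 without bound (the ultraviolet catastrophe) while Planck's law decays.
```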

9.3 Two-Level Atoms

Although Planck established Planck's law of radiation, researchers at that time hesitated to profess the existence of light quanta. It was Einstein who derived Planck's law by assuming two-level atoms in which light quanta play a role.
hesitated in professing the existence of light quanta. It was Einstein that derived
Planck’s law by assuming two-level atoms in which light quanta play a role.
His assumption comprises the following three postulates: (1) The physical system
to be addressed comprises the so-called hypothetical two-level atoms that have only
two energy levels. If two-level atoms absorb a light quantum, a ground-state electron
is excited up to a higher level (i.e., the stimulated absorption). (2) The higher-level
electron may spontaneously lose its energy and return back to the ground state (the
spontaneous emission). (3) The higher-level electron may also lose its energy and
return back to the ground state. Unlike (2), however, the excited electron has to be
stimulated by being irradiated by light quanta having an energy corresponding to the
energy difference between the ground state and excited state (the stimulated emis-
sion). Figure 9.2 schematically depicts the optical processes of the Einstein model.
Under those postulates, Einstein dealt with the problem probabilistically. Sup-
pose that the ground state and excited state have energies E1 and E2. Einstein
assumed that light quanta having an energy equaling E2 - E1 take part in all the
above three transition processes. He also propounded the idea that a light quantum has an energy that is proportional to its (angular) frequency. That is, he thought that
the following relation should hold:

Fig. 9.2 Optical processes of Einstein two-level atom model

ℏω21 = E2 − E1,  (9.38)

where ω21 is an angular frequency of light that takes part in the optical transitions. For the time being, let us follow Einstein's postulates.
1. Stimulated absorption: This process is simply called "absorption." Let Wa [s⁻¹] be the transition probability that the electron absorbs a light quantum to be excited to the excited state. Wa is described as

Wa = N1 B21 ρ(ω21),  (9.39)

where N1 is the number of atoms occupying the ground state; B21 is a proportional constant; ρ(ω21) is due to (9.33). Note that in (9.39) we used ω21 instead of ω in (9.33). The coefficient B21 is called Einstein B coefficient; more specifically, one of the Einstein B coefficients. Namely, B21 is pertinent to the transition from the ground state to the excited state.
2. Emission processes: The processes include both the spontaneous and stimulated emissions. Let We [s⁻¹] be the transition probability that the electron emits a light quantum and returns to the ground state. We is described as

We = N2 B12 ρ(ω21) + N2 A12,  (9.40)

where N2 is the number of atoms occupying the excited state; B12 and A12 are proportional constants. The coefficient A12 is called Einstein A coefficient relevant to the spontaneous emission. The coefficient B12 is associated with the stimulated emission and is also called Einstein B coefficient together with B21. Here, B12 is pertinent to the transition from the excited state to the ground state.
Now, we have

B12 = B21.  (9.41)

The reasoning for this is as follows: The coefficients B12 and B21 are proportional to the matrix elements pertinent to the optical transition. Let T be an operator associated with the transition. Then, a matrix element is described using an inner product notation of Chap. 1 by

B21 = ⟨ψ2|T|ψ1⟩,  (9.42)

where ψ1 and ψ2 are initial and final states of the system in relation to the optical transition. As a good approximation, we use er for T (dipole approximation), where e is an elementary charge and r is a position operator (see Chap. 1). If (9.42) represents the absorption process (i.e., the transition from the ground state to excited state), the corresponding emission process should be described as a reversed process by

B12 = ⟨ψ1|T|ψ2⟩.  (9.43)

Notice that in (9.43) ψ2 and ψ1 are initial and final states.
Taking complex conjugate of (9.42), we have

B21* = ⟨ψ1|T†|ψ2⟩,  (9.44)

where T† is the operator adjoint to T (see Chap. 1). With an Hermitian operator H, from Sect. 1.4 we have

H† = H.  (1.119)

Since T is also Hermitian, we have

T† = T.  (9.45)

Thus, we get

B21* = B12.  (9.46)

But, as in the cases of Sects. 4.2 and 4.3, ψ1 and ψ2 can be represented as real functions. Then, we have

B21* = B21 = B12.

That is, we assume that the matrix B is real symmetric. In the case of two-level atoms, as a matrix form we get

B = ( 0    B12
      B12  0  ).  (9.47)

Compare (9.47) with (4.28).
Now, in the thermal equilibrium, we have

We = Wa.  (9.48)

That is,

N2 B21 ρ(ω21) + N2 A12 = N1 B21 ρ(ω21),  (9.49)

where we used (9.41) for the LHS. Assuming the Boltzmann distribution law, we get

N2/N1 = exp[−(E2 − E1)/kBT].  (9.50)

Here if moreover we assume (9.38), we get

N2/N1 = exp(−ℏω21/kBT).  (9.51)

Combining (9.49) and (9.51), we have

exp(−ℏω21/kBT) = B21 ρ(ω21)/[B21 ρ(ω21) + A12].  (9.52)

Solving (9.52) with respect to ρ(ω21), we finally get

ρ(ω21) = (A12/B21) · exp(−ℏω21/kBT)/[1 − exp(−ℏω21/kBT)] = (A12/B21) · 1/[exp(ℏω21/kBT) − 1].  (9.53)

Assuming that

A12/B21 = ℏω21³/(π²c³),  (9.54)

we have

ρ(ω21) = [ℏω21³/(π²c³)] · 1/[exp(ℏω21/kBT) − 1].  (9.55)

This is none other than Planck’s law of radiation.
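The algebra leading from (9.49) to (9.55) can be cross-checked numerically. Below is a minimal sketch; the temperature, transition frequency, and B21 are illustrative assumptions (B21 drops out of the final result):

```python
import numpy as np

kB, hbar, c = 1.380649e-23, 1.054572e-34, 2.998e8
T = 1000.0                     # temperature [K] (assumed)
omega21 = 2.0e14               # transition angular frequency [rad/s] (assumed)
B21 = 1.0e20                   # Einstein B coefficient (arbitrary magnitude)
A12 = hbar * omega21**3 / (np.pi**2 * c**3) * B21    # ratio (9.54)

N1 = 1.0
N2 = N1 * np.exp(-hbar * omega21 / (kB * T))         # Boltzmann factor, (9.51)

# Solve the equilibrium condition (9.49) for rho:
rho = N2 * A12 / (B21 * (N1 - N2))
rho_planck = hbar * omega21**3 / (np.pi**2 * c**3
                                  * np.expm1(hbar * omega21 / (kB * T)))
assert np.isclose(rho, rho_planck)                   # reproduces (9.55)
print(f"rho(omega21) = {rho:.6e} J m^-3 s")
```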

9.4 Dipole Radiation

In (9.54) we only know the ratio of A12 to B21. To have a good knowledge of these
Einstein coefficients, we briefly examine a mechanism of the dipole radiation. The
electromagnetic radiation results from an accelerated motion of a dipole.
A dipole moment p(t) is defined as a function of time t by

p(t) = ∫ x′ρ(x′, t)dx′,  (9.56)

where x′ is a position vector in a Cartesian coordinate; the integral is taken over the whole three-dimensional space; ρ is a charge density appearing in (7.1). If we

Fig. 9.3 Dipole moment viewed from the frame O or O′

consider a system comprising point charges, integration can immediately be carried out to yield

p(t) = Σi qi xi,  (9.57)

where qi is a charge of each point charge i and xi is a position vector of the point charge i. From (9.56) and (9.57), we find that p(t) depends on how we set up the coordinate system. However, if the total charge of the system is zero, p(t) does not depend on the coordinate system. Let p(t) and p′(t) be the dipole moment viewed from the frame O and O′, respectively (see Fig. 9.3). Then we have

p′(t) = Σi qi xi′ = Σi qi (x0 + xi) = (Σi qi) x0 + Σi qi xi = Σi qi xi = p(t).  (9.58)

Notice that with the third equality the first term vanishes because the total charge is zero. The system comprising two point charges that have opposite charges (±q) is particularly simple but very important. In that case we have

p(t) = qx1 + (−q)x2 = q(x1 − x2) = qx̃.  (9.59)

Here we assume that q > 0 according to the custom and, hence, x̃ is a vector directed from the minus point charge to the plus charge.
Figure 9.4 displays geometry of an oscillating dipole and electromagnetic radia-
tion from it. Figure 9.4a depicts the dipole. It is placed at the origin of the coordinate
Fig. 9.4 Electromagnetic radiation from an accelerated motion of a dipole. (a) A dipole placed at
the origin of the coordinate system is executing harmonic oscillation along the z-direction around an
equilibrium position. (b) Electromagnetic radiation from a dipole in a wave zone. εe and εm are unit
polarization vectors of the electric field and magnetic field, respectively. εe, εm, and n form a right-
handed system

system and assumed to be of an atomic or molecular scale in extension; we regard a


center of the dipole as the origin. Figure 9.4b represents a large-scale geometry of the
dipole and surrounding space of it. For the electromagnetic radiation, an accelerated
motion of the dipole is of primary importance. The electromagnetic fields produced
by p̈(t) vary as the inverse of r, where r is a macroscopic distance between the dipole and the observation point. Namely, r is much larger than the dipole size.
There are other electromagnetic fields that result from the dipole moment. The fields result from p(t) and ṗ(t). Strictly speaking, we have to include those quantities that are responsible for the electromagnetic fields associated with the dipole radiation. Nevertheless, the fields produced by p(t) and ṗ(t) vary as a function of the inverse cube and inverse square of r, respectively. Therefore, the surface integral of the square of the fields associated with p(t) and ṗ(t) asymptotically reaches zero for large enough r with respect to a sphere enveloping the dipole. Regarding p̈(t), however, the surface integral of the square of the fields remains finite even for large enough r. For this reason, we refer to the spatial region where the contribution of p̈(t) does not vanish as a wave zone.
Suppose that a dipole placed at the origin of the coordinate system is executing
harmonic oscillation along the z-direction around an equilibrium position (see
Fig. 9.4). Motion of two charges having plus and minus signs is described by

z+ = z0 e3 + a e^{iωt} e3 (z0, a > 0),  (9.60)

z− = −z0 e3 − a e^{iωt} e3,  (9.61)

where z+ and z− are position vectors of the plus charge and minus charge, respectively; z0 and −z0 are equilibrium positions of each charge; a is an amplitude of the harmonic oscillation; ω is an angular frequency of the oscillation. Then, accelerations of the charges are given by

a+ ≡ z̈+ = −aω² e^{iωt} e3,  (9.62)

a− ≡ z̈− = aω² e^{iωt} e3.  (9.63)

Meanwhile, we have

p(t) = qz+ + (−q)z− = q(z+ − z−) (q > 0).  (9.64)

Therefore,

p̈(t) = q(z̈+ − z̈−) = −2qaω² e^{iωt} e3.  (9.65)

The quantity p̈(t), i.e., the second derivative of p(t) with respect to time, produces the electric field described by [2]

E = p̈/(4πε0c²r) = −qaω² e^{iωt} e3/(2πε0c²r),  (9.66)

where r is a distance between the dipole and the observation point. In (9.66) we ignored terms proportional to the inverse square and the inverse cube of r for the aforementioned reason.
As described in (9.66), the strength of the radiation electric field in the wave zone measured at a point away from the oscillating dipole is proportional to a component of the acceleration vector of the dipole [i.e., p̈(t)]. The radiation electric field lies in the direction perpendicular to the line connecting the observation point and the position of the dipole (Fig. 9.4). Let εe be a unit polarization vector of the electric field in that direction and let E⊥ be the radiation electric field. Then, we have

E⊥ = −qaω² e^{iωt} (e3·εe)εe/(2πε0c²r) = −qaω² e^{iωt} εe sin θ/(2πε0c²r).  (9.67)

As shown in Sect. 7.3, (εe·e3)εe in (9.67) "extracts" from e3 a vector component parallel to εe. Such an operation is said to be a projection of a vector. The related discussion can be seen in Part III.
It takes a time of r/c for the emitted light from the charge to arrive at the
observation point. Consequently, the acceleration of the charge has to be measured
at the time when the radiation leaves the charge. Let t be the instant when the electric
field is measured at the measuring point. Then, it follows that the radiation leaves the
charge at a time of t - r/c. Thus, the electric field relevant to the radiation that can be
observed far enough away from the oscillating charge is described as [2]
E⊥(x, t) = −[qaω² e^{iω(t − r/c)} sin θ/(2πε0c²r)] εe.  (9.68)

The radiation electric field must necessarily be accompanied by a magnetic field. Writing the radiation magnetic field as H⊥(x, t), we have [3]

H⊥(x, t) = −(1/cμ0)[qaω² e^{iω(t − r/c)} sin θ/(2πε0c²r)] n × εe = −[qaω² e^{iω(t − r/c)} sin θ/(2πcr)] n × εe
  = −[qaω² e^{iω(t − r/c)} sin θ/(2πcr)] εm,  (9.69)

where n represents a unit vector in the direction parallel to the line connecting the observation point and the dipole. The εm is a unit polarization vector of the magnetic field as defined by (7.67). From the above, we see that the radiation electromagnetic waves in the wave zone are transverse waves that show the same properties as those of electromagnetic waves in a free space.
Now, let us evaluate a time-averaged energy flux from an oscillating dipole. Using (8.71), we have

S̄(θ) = (1/2) E × H* = (1/2)[qaω² e^{iω(t − r/c)} sin θ/(2πε0c²r)][qaω² e^{−iω(t − r/c)} sin θ/(2πcr)] n
  = [ω⁴ sin²θ/(8π²ε0c³r²)](qa)² n.  (9.70)

If we are thinking of an atom or a molecule in which the dipole consists of an


electron and a positive charge that compensates it, q is replaced with -e (e < 0).
Then (9.70) reads as

ω4 sin 2 θ
SðθÞ = ðeaÞ2 n: ð9:71Þ
8π 2 ε0 c3 r 2

Let us relate the above argument to Einstein A and B coefficients. Since we are
dealing with an isolated dipole, we might well suppose that the radiation comes from
the spontaneous emission. Let P be a total power of emission from the oscillating
dipole that gets through a sphere of radius r. Then we have
366 9 Light Quanta: Radiation and Absorption

2π π
P = SðθÞ  ndS = dϕ SðθÞr 2 sin θdθ
0
2π π
0
ð9:72Þ
ω4
= 2 3 ðeaÞ2 dϕ sin θdθ: 3
8π ε0 c 0 0

π
Changing cosθ to t, the integral I  0 sin 3 θdθ can be converted into

1
I= 1 - t 2 dt = 4=3: ð9:73Þ
-1

Thus, we have

ω4
P= ðeaÞ2 : ð9:74Þ
3πε0 c3

A probability of the spontaneous emission is given by N2A12. Since we are


dealing with a single dipole, N2 can be put 1. Accordingly, an expected power of
emission is A12ℏω21. Replacing ω in (9.74) with ω21 in (9.55) and equating A12ℏω21
to P, we get

ω21 3
A12 = ðeaÞ2 : ð9:75Þ
3πε0 c3 ℏ

From (9.54), we also get

π
B12 = ðeaÞ2 : ð9:76Þ
3ε0 ℏ2

In order to relate these results to those of quantum mechanics, we may replace a2


in the above expressions with a square of an absolute value of the matrix elements of
the position operator r. That is, representing | 1i and | 2i as the quantum states of the
ground and excited states of a two-level atom, we define h1| r| 2i as the matrix
element of the position operator. Relating |h1| r| 2i|2 to a2, we get

$$A_{12} = \frac{\omega_{21}^{3}e^{2}}{3\pi\varepsilon_0 c^{3}\hbar}\left|\langle 1|\mathbf{r}|2\rangle\right|^{2} \quad\text{and}\quad B_{12} = \frac{\pi e^{2}}{3\varepsilon_0\hbar^{2}}\left|\langle 1|\mathbf{r}|2\rangle\right|^{2}. \tag{9.77}$$

From (9.77), we have



$$B_{12} = \frac{\pi e^{2}}{3\varepsilon_0\hbar^{2}}\langle 1|\mathbf{r}|2\rangle\langle 1|\mathbf{r}|2\rangle^{*} = \frac{\pi e^{2}}{3\varepsilon_0\hbar^{2}}\langle 1|\mathbf{r}|2\rangle\langle 2|\mathbf{r}^{\dagger}|1\rangle = \frac{\pi e^{2}}{3\varepsilon_0\hbar^{2}}\langle 1|\mathbf{r}|2\rangle\langle 2|\mathbf{r}|1\rangle, \tag{9.78}$$

where with the second equality we used (1.116); the last equality follows from the fact that r is Hermitian. Meanwhile, we have

$$B_{21} = \frac{\pi e^{2}}{3\varepsilon_0\hbar^{2}}\left|\langle 2|\mathbf{r}|1\rangle\right|^{2} = \frac{\pi e^{2}}{3\varepsilon_0\hbar^{2}}\langle 2|\mathbf{r}|1\rangle\langle 1|\mathbf{r}^{\dagger}|2\rangle = \frac{\pi e^{2}}{3\varepsilon_0\hbar^{2}}\langle 2|\mathbf{r}|1\rangle\langle 1|\mathbf{r}|2\rangle. \tag{9.79}$$

Hence, we recover (9.41).
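For a feel of the magnitudes involved, (9.75) can be evaluated numerically. In the sketch below the dipole length a ≈ 1 Å and the 500-nm transition wavelength are assumptions chosen merely for illustration; they are not taken from the text.

```python
# Order-of-magnitude evaluation of the Einstein A coefficient (9.75).
import numpy as np

hbar = 1.054571817e-34    # J s
eps0 = 8.8541878128e-12   # F m^-1
c = 2.99792458e8          # m s^-1
e = 1.602176634e-19       # C (magnitude of the electronic charge)

lam = 500e-9                     # assumed transition wavelength (m)
omega21 = 2 * np.pi * c / lam    # transition angular frequency (s^-1)
a = 1e-10                        # assumed dipole length (m), about 1 Angstrom

A12 = omega21**3 * (e * a)**2 / (3 * np.pi * eps0 * c**3 * hbar)  # (9.75)
print(f"A12 ~ {A12:.2e} s^-1, radiative lifetime 1/A12 ~ {1/A12:.2e} s")
# Gives A12 of the order of 10^7-10^8 s^-1, i.e., a lifetime of tens of
# nanoseconds, the typical order for allowed optical transitions.
```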

9.5 Lasers

9.5.1 Brief Outlook

A concept of the two-level atoms proposed by Einstein is universal and independent


of materials and can be utilized for some particular purposes. Actually, in later years
many researchers tried to validate that concept and verified its validity. After basic research in the 1950s and 1960s, these fundamentals were put into practical use in various optical devices. Typical examples are masers and lasers, abbreviations for "microwave amplification by stimulated emission of radiation" and "light amplification by
stimulated emission of radiation.” Of these, lasers are common and particularly
important nowadays. On the basis of universality of the concept, a lot of materials
including semiconductors and dyes are used in gaseous, liquid, and solid states.
Let us consider a rectangular parallelepiped of a laser medium with a length L and
cross-section area S (Fig. 9.5). Suppose there are N two-level atoms in the rectan-
gular parallelepiped such that

Fig. 9.5 Rectangular parallelepiped of a laser medium with a length L and a cross-section area S (not shown). I(x) denotes irradiance at a point x from the origin

$$N = N_1 + N_2, \tag{9.80}$$

where N1 and N2 represent the number of atoms occupying the ground and excited
states, respectively. Suppose that light propagates from the left of the rectangular parallelepiped and enters it. Then, we expect that three processes occur simultaneously: one is stimulated absorption and the others are stimulated emission and spontaneous emission. After these processes, an increment dE in photon energy of the
total system (i.e., the rectangular parallelepiped) during dt is described by

$$dE = \{N_2[B_{21}\rho(\omega_{21}) + A_{12}] - N_1B_{21}\rho(\omega_{21})\}\hbar\omega_{21}\,dt. \tag{9.81}$$

In light of (9.39) and (9.40), the dimensionless quantity dE/ℏω21 represents the net number of photon emission events that have occurred during dt.
Since in lasers the stimulated emission is dominant, we shall forget about the
spontaneous emission and rewrite (9.81) as

$$dE = \{N_2B_{21}\rho(\omega_{21}) - N_1B_{21}\rho(\omega_{21})\}\hbar\omega_{21}\,dt = B_{21}\rho(\omega_{21})(N_2 - N_1)\hbar\omega_{21}\,dt. \tag{9.82}$$

Under thermal equilibrium, we have N2 < N1 on the basis of the Boltzmann distribution law, and so dE < 0; the photon energy therefore decreases. For light amplification to take place, we must have the following condition:

$$N_2 > N_1. \tag{9.83}$$

This energy distribution is called inverted distribution or population inversion.


Thus, the laser oscillation is a typical non-equilibrium phenomenon. To produce the
population inversion, we need an external exciting source using an electrical or
optical device.
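A one-line numerical estimate makes it clear why (9.83) can never be met in thermal equilibrium. In the sketch below, the 500-nm transition and the chosen temperatures are illustrative assumptions only.

```python
# Boltzmann estimate of N2/N1 for a two-level atom in thermal equilibrium.
import numpy as np

hbar = 1.054571817e-34   # J s
kB = 1.380649e-23        # J K^-1
c = 2.99792458e8         # m s^-1

omega21 = 2 * np.pi * c / 500e-9        # assumed 500-nm transition
for T in (300.0, 3000.0, 30000.0):      # illustrative temperatures (K)
    print(f"T = {T:7.0f} K : N2/N1 = {np.exp(-hbar * omega21 / (kB * T)):.3e}")
# The ratio never exceeds 1, so the inversion N2 > N1 of (9.83) requires
# external pumping rather than heating.
```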
The essence of lasers rests upon the fact that stimulated emission produces a
photon that possesses a wavenumber vector (k) and a polarization (ε) both the same
as those of an original photon. For this reason, the laser light is monochromatic and
highly directional. To understand the fundamental mechanism underlying the related
phenomena, interested readers are encouraged to seek appropriate literature of
quantum theory of light for further reading [4].
To make a discussion simple and straightforward, let us assume that the light is
incident parallel to the long axis of the rectangular parallelepiped. Then, the stimu-
lated emission produces light to be propagated in the same direction. As a result, an
irradiance I measured in that direction is described as

$$I = \frac{E}{SL}\,c'. \tag{9.84}$$

Note that the light velocity in the laser medium c′ is given by



$$c' = c/n, \tag{9.85}$$

where n is the refractive index of the laser medium. Taking the differential of both sides of (9.84), we have

$$dI = \frac{c'}{SL}\,dE = \frac{c'}{SL}\,B_{21}\rho(\omega_{21})(N_2 - N_1)\hbar\omega_{21}\,dt = B_{21}\rho(\omega_{21})\tilde{N}\hbar\omega_{21}c'\,dt, \tag{9.86}$$

where $\tilde{N} = (N_2 - N_1)/SL$ denotes a "net" density of atoms that occupy the excited
state.
The energy density ρ(ω21) can be written as

$$\rho(\omega_{21}) = I(\omega_{21})g(\omega_{21})/c', \tag{9.87}$$

where I(ω21) [J s⁻¹ m⁻²] represents an intensity of radiation; g(ω21) is a gain function [s]. The gain function is a measure that shows how favorably (or unfavorably) the transition takes place at the said angular frequency ω21. This is normalized in the emission range such that
$$\int_0^{\infty}g(\omega)\,d\omega = 1.$$

The quantity I(ω21) is an energy flux that gets through per unit area per unit time.
This flux corresponds to an energy contained in a long and thin rectangular paral-
lelepiped of a length c′ and a unit cross-section area. To obtain ρ(ω21), I(ω21) should be divided by c′ in (9.87) accordingly. Using (9.87) and replacing c′dt with a distance dx and I(ω21) with I(x) as a function of x, we rewrite (9.86) as

$$dI(x) = \frac{B_{21}g(\omega_{21})\tilde{N}\hbar\omega_{21}}{c'}\,I(x)\,dx. \tag{9.88}$$

Dividing (9.88) by I(x) and integrating both sides, we have

$$\int_{I_0}^{I}\frac{dI(x)}{I(x)} = \int_{I_0}^{I}d\,\ln I(x) = \frac{B_{21}g(\omega_{21})\tilde{N}\hbar\omega_{21}}{c'}\int_{0}^{x}dx, \tag{9.89}$$

where I0 is an irradiance of light at an instant when the light is entering the laser
medium from the left. Thus, we get

$$I(x) = I_0\exp\left[\frac{B_{21}g(\omega_{21})\tilde{N}\hbar\omega_{21}}{c'}\,x\right]. \tag{9.90}$$

Equation (9.90) shows that the irradiance of the laser light is augmented exponentially along its path. Denoting the exponent in (9.90) as G

$$G \equiv \frac{B_{21}g(\omega_{21})\tilde{N}\hbar\omega_{21}}{c'}, \tag{9.91}$$

we get

$$I(x) = I_0\exp(Gx).$$

The constant G is called a gain constant. This is an index that indicates the laser
performance. Large values of B21, g(ω21), and Ñ yield a high performance of the laser.
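The exponential amplification of (9.90) and the gain constant G of (9.91) can be made concrete by a short numerical sketch. All numbers below (B21, g(ω21), Ñ, and the refractive index) are assumed, plausible-order values for illustration; they are not measured data.

```python
# Illustrative evaluation of the gain constant (9.91) and of I(x) = I0 exp(Gx).
import numpy as np

hbar = 1.054571817e-34   # J s
omega21 = 3.5e15         # assumed transition angular frequency (s^-1)
B21 = 1.0e21             # assumed Einstein B coefficient (m^3 J^-1 s^-2)
g = 1.0e-13              # assumed gain-function value at omega21 (s)
N_tilde = 1.0e22         # assumed net excited-state density (m^-3)
c_prime = 3.0e8 / 1.7    # c' = c/n of (9.85), with n = 1.7 assumed

G = B21 * g * N_tilde * hbar * omega21 / c_prime   # gain constant (9.91)
print(f"G = {G:.2e} m^-1")
for x in (0.0, 0.5e-3, 1.0e-3):                    # propagation length (m)
    print(f"x = {x*1e3:3.1f} mm : I/I0 = {np.exp(G * x):6.2f}")   # (9.90)
```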
In Sects. 8.8 and 9.2, we sought conditions for electromagnetic waves to cause
constructive interference. In a one-dimensional dielectric medium, the condition is
described as

$$kL = m\pi \quad\text{or}\quad m\lambda = 2L \quad (m = 1, 2, \cdots), \tag{9.92}$$

where k and λ denote a wavenumber and wavelength in the dielectric medium,


respectively. Indexing k and λ with m that represents a mode, we have

$$k_mL = m\pi \quad\text{or}\quad m\lambda_m = 2L \quad (m = 1, 2, \cdots). \tag{9.93}$$

This condition can be expressed in different ways such that

$$\omega_m = 2\pi\nu_m = 2\pi c'/\lambda_m = 2\pi c/n\lambda_m = m\pi c/nL. \tag{9.94}$$

It is often the case that, if the laser medium is a long and thin rod, rectangular parallelepiped, etc., sharply resolved and regularly spaced spectral lines are observed in the emission spectrum. These lines are said to be a longitudinal multimode.
The separation between two neighboring emission lines is referred to as the free
spectral range [2]. If adjacent emission lines are clearly resolved so that the free
spectral range can easily be recognized, we can derive useful information from the
laser oscillation spectra (vide infra).
Rewriting (9.94) as, e.g.,

$$\omega_m n = \frac{\pi c}{L}\,m, \tag{9.95}$$

and taking differential (or variation) of both sides, we get



$$n\,\delta\omega_m + \omega_m\,\delta n = n\,\delta\omega_m + \omega_m\frac{\delta n}{\delta\omega_m}\,\delta\omega_m = \left(n + \omega_m\frac{\delta n}{\delta\omega_m}\right)\delta\omega_m = \frac{\pi c}{L}\,\delta m. \tag{9.96}$$

Therefore, we get

$$\delta\omega_m = \frac{\pi c}{L}\left(n + \omega_m\frac{\delta n}{\delta\omega_m}\right)^{-1}\delta m. \tag{9.97}$$

Equation (9.97) premises the wavelength dispersion of a refractive index of a


laser medium. Here, the wavelength dispersion means that the refractive index varies
as a function of wavelengths of light in a matter. The laser materials often have a
considerably large dispersion and relevant information is indispensable.
From (9.97), we find that

$$n_g \equiv n + \omega_m\frac{\delta n}{\delta\omega_m} \tag{9.98}$$

plays a role of refractive index when the laser material has a wavelength dispersion.
The quantity ng is said to be a group refractive index (or group index). Thus, (9.97) is
rewritten as

$$\delta\omega = \frac{\pi c}{Ln_g}\,\delta m, \tag{9.99}$$

where we omitted the index m of ωm. When we need to distinguish the refractive
index n clearly from the group refractive index, we refer to n as a phase refractive
index. Rewriting (9.98) as a relation of continuous quantities and using differenti-
ation instead of variation, we have [2]

$$n_g = n + \omega\frac{dn}{d\omega} \quad\text{or}\quad n_g = n - \lambda\frac{dn}{d\lambda}. \tag{9.100}$$

To derive the second equation of (9.100), we used the following relations. Namely, taking a variation of λω = 2πc, we have

$$\omega\,d\lambda + \lambda\,d\omega = 0 \quad\text{or}\quad \frac{\omega}{d\omega} = -\,\frac{\lambda}{d\lambda}.$$

Several formulae or relational equations have been proposed to describe the wavelength dispersion. One of the famous and useful formulas among them is Sellmeier's dispersion formula [5], which can, as an example, be described as

$$n = \sqrt{A + \frac{B}{1 - \left(\frac{C}{\lambda}\right)^{2}}}, \tag{9.101}$$

where A, B, and C are appropriate constants with A and B being dimensionless and
C having a dimension [m]. In an actual case, it would be difficult to determine
n analytically. However, if we are able to obtain well-resolved spectra, δm can be set to 1 and δωm can be determined from the free spectral range. Expressing the latter as δωFSR, from (9.99) we have

$$\delta\omega_{\mathrm{FSR}} = \frac{\pi c}{Ln_g} \quad\text{or}\quad n_g = \frac{\pi c}{L(\delta\omega_{\mathrm{FSR}})}. \tag{9.102}$$

Thus, one can determine ng as a function of wavelengths.
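As a minimal numerical sketch of this procedure (with an assumed cavity length and an assumed mode spacing, not measured values), the group index follows directly from (9.102) once the free spectral range is converted from a wavelength spacing to an angular-frequency spacing.

```python
# Group index from a measured free spectral range via (9.102).
import numpy as np

c = 2.99792458e8     # m s^-1
L = 300e-6           # assumed cavity (crystal) length (m)
lam = 530e-9         # wavelength around which the modes are observed (m)
dlam_FSR = 0.08e-9   # assumed mode spacing read off the spectrum (m)

# |d(omega)| = 2*pi*c*|d(lambda)|/lambda^2 converts the spacing to frequency.
domega_FSR = 2 * np.pi * c * dlam_FSR / lam**2
ng = np.pi * c / (L * domega_FSR)   # second equation of (9.102)
print(f"n_g = {ng:.2f}")            # ~5.9 with the numbers assumed above
```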

9.5.2 Organic Lasers

By virtue of prominent features of lasers and related phenomena, researchers and


engineers have been developing and proposing up to now various device structures
and their operation principles in the research field of device physics. Of these, we
have organic lasers as newly occurring laser devices. Organic crystals possess the
well-defined structure-property relationship and some of them exhibit peculiar
light-emitting features. Therefore, those crystals are suited for studying their lasing
property. In this section we study the light-emitting properties (especially the lasing
property) of the organic crystals in relation to the wavelength dispersion of refractive
index of materials. From the point of view of device physics, another key issue lies in
designing an efficient diffraction grating or resonator (see Sect. 8.8). We address
these fundamental issues from as broad a viewpoint as possible so that we may
acquire systematic knowledge about optical device applications of materials of
different compositions either inorganic or organic.
In the following tangible examples, we further investigate specific aspects of the
light-emitting properties of the organic crystals to pursue fundamental properties of
the organic light-emitting materials. The emission properties are examined in a
device configuration of (1) thin rectangular parallelepiped or (2) slab waveguide
equipped with a diffraction grating. In the former case, we investigate the longitu-
dinal multimode of emissions that arise from an optical resonator shaped by a pair of
parallel crystal faces (see Example 9.1). In the latter case, we study the transverse
mode of the emissions in addition to the longitudinal mode (Examples 9.2 and 9.3).
Example 9.1 [6] We wish to show and interpret optical emission data of the organic
crystals within a framework of the interference theory of electromagnetic waves
mentioned in the previous chapter. As an example, Fig. 9.6 [6] displays a broadband
emission spectrum obtained under a weak optical excitation regime (using a mercury
lamp) with a crystal consisting of an organic semiconductor AC′7. The optical


Fig. 9.6 Broadband emission spectra of an organic semiconductor crystal AC′7. (a) Full spectrum.
(b) Enlarged profile of the spectrum around 530 nm. (Reproduced from Yamao T, Okuda Y,
Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer
single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing.
https://doi.org/10.1063/1.3634117)

Fig. 9.7 Laser oscillation spectrum of an organic semiconductor crystal AC′7. (Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117)

measurements were carried out using an AC′7 crystal that had a pair of parallel
crystal faces which worked as an optical resonator called Fabry-Pérot resonator. It
caused the interference modulation of emissions and, as a result, produced a periodic
modulation that was superposed onto the broadband emission. It is clearly seen in
Fig. 9.6b.
As another example, Fig. 9.7 [6] displays a laser oscillation spectrum obtained
under a strong excitation regime (using a laser beam) with the same AC′7 crystal as
the above. The structural formula of AC′7 is shown in Fig. 9.8 together with other

Fig. 9.8 Structural formulae of several organic semiconductors BP1T, AC5, and AC′7

related organic semiconductors. The spectrum again exhibits the periodic modula-
tion accompanied by the longitudinal multimode. Sharply resolved emission lines
are clearly noted with a regular spacing in the spectrum. If we compare the regular
spacing with that of Fig. 9.6b, we immediately recognize that the regular spacing is virtually the same for the two spectra in spite of a considerably large difference between their overall spectral profiles. In terms of the interference, both the broadband spectra and laser emission lines gave the same free spectral range [6].
Thus, both the broadband emission spectra and laser oscillation spectra are
equally useful to study the wavelength dispersion of refractive index of the crystal.
Choosing a suitable empirical formula of the wavelength dispersion, we can com-
pare it with experimentally obtained results. For example, if we choose, e.g., (9.101)
for the empirical formula, what we do next is to determine the constants A, B, and C
of (9.101) by comparing the computation results with actual emission data.
If well-resolved interference modulation spectra can be obtained, this enables one
to immediately get a precise value of ng from (9.102). In turn, it leads to information
about the wavelength dispersion of n through (9.100) and (9.101). Bearing in mind
this situation, Yamao et al. obtained the following expression [6]:

Table 9.1 Optimized constants A, B, and C for Sellmeier's dispersion formula (9.101) with several organic semiconductor crystals^a

Material   A     B      C (nm)
BP1T       5.7   1.04   397
AC5        3.9   1.44   402
AC′7       6.0   1.06   452

^a Reproduced from Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing. https://doi.org/10.1063/1.3634117

$$n_g = \frac{A\left[1 - \left(\frac{C}{\lambda}\right)^{2}\right]^{2} + B}{\left[1 - \left(\frac{C}{\lambda}\right)^{2}\right]^{3/2}\sqrt{A\left[1 - \left(\frac{C}{\lambda}\right)^{2}\right] + B}}. \tag{9.103}$$

Notice that (9.103) is obtained by inserting (9.101) into (9.100) and expressing ng
as a function of λ. Determining the optimum constants A, B, and C, we obtain a set of constants that yields a reliable dispersion formula in (9.101). Numerical calculations can be utilized effectively. The procedures are as follows: (1) Tentatively choosing probable numbers for A, B, and C in (9.103), ng can be expressed as a function of λ. (2) The resulting fitting curve is then compared with the ng data experimentally decided from (9.102). After this procedure, one can choose another set of A, B, and C and again compare the fitting curve with the experimental data. (3) This procedure can be repeated many times through iterative numerical computations of (9.103) using different sets of A, B, and C (see the sketch below).
Thus, we should be able to adjust and determine better and better combinations of A, B, and C so that the refined function (9.103) can reproduce the experimental results as precisely as one pleases. At the same time, we can determine the most
reliable combination of A, B, and C. Optimized constants A, B, and C of (9.101) are
listed in Table 9.1 [6]. Using these numbers A, B, and C, the dispersion of the phase
refractive indices n can be calculated using (9.101) as a function of wavelengths λ.
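The iterative optimization described above lends itself to a standard least-squares fit. The sketch below illustrates the idea with scipy; the (λ, ng) data points are fabricated for this illustration only and do not reproduce the published measurements.

```python
# Fitting the Sellmeier constants A, B, C of (9.101)/(9.103) to (lambda, ng) data.
import numpy as np
from scipy.optimize import curve_fit

def ng_model(lam, A, B, C):
    """Group index (9.103); lam and C must share the same length unit."""
    u = 1.0 - (C / lam) ** 2
    return (A * u**2 + B) / (u**1.5 * np.sqrt(A * u + B))

# Hypothetical group-index data (wavelengths in nm), fabricated for this sketch:
lam_data = np.array([500.0, 520.0, 540.0, 560.0, 580.0, 600.0])
ng_data = np.array([5.46, 4.68, 4.17, 3.83, 3.58, 3.39])

popt, _ = curve_fit(ng_model, lam_data, ng_data, p0=(4.0, 1.0, 380.0),
                    bounds=([0.0, 0.0, 100.0], [10.0, 5.0, 480.0]))
A, B, C = popt
print(f"A = {A:.2f}, B = {B:.2f}, C = {C:.0f} nm")
# The phase refractive index then follows from (9.101):
n = np.sqrt(A + B / (1.0 - (C / lam_data) ** 2))
print(np.round(n, 3))
```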
Figure 9.9 [6] shows several examples of the wavelength dispersion of ng and n for
the organic semiconductor crystals. Once again, it is intriguing to see that ng
associated with the laser oscillation line sits on the dispersion curve determined
from the broadband emission lines (Fig. 9.9a).
The formulae (9.101) and (9.103) along with associated procedures to determine
the constants A, B, and C are expected to apply widely to various laser and light-
emitting materials consisting of semiconducting inorganic and organic materials.
Example 9.2 [7] If we wish to construct a laser device, it will be highly desired to
equip the laser material with a suitable diffraction grating or resonator [8, 9]. Suppose
a situation where we choose a thin crystal for an optically active material equipped
with the diffraction grating. In that case, we are to deal with a device consisting of a
crystal slab waveguide. We have already encountered such a device configuration in
Sect. 8.7 where the transverse modes must be considered on the basis of (8.167). We

Fig. 9.9 Examples of the wavelength dispersion of refractive indices for several organic semicon-
ductor crystals. (a) Dispersion curves of the group indices (ng). The optimized solid curves are
plotted using (9.103). The data plotted by filled circles (BP1T), filled triangles (AC5), and filled
diamonds (AC′7) were obtained by the broadband interference modulation of emissions. An open
circle, an open triangle, and open diamonds show the group indices estimated from the mode
intervals of the multimode laser oscillations. (b) Dispersion curves of the phase refractive indices
(n). These data are plotted using the Sellmeier’s dispersion formula (9.101) with the optimized
parameters A, B, and C (listed in Table 9.1). (Reproduced from Yamao T, Okuda Y, Makino Y,
Hotta S (2011) Dispersion of the refractive indices of thiophene/phenylene co-oligomer single
crystals. J Appl Phys 110(5): 053113/7 pages [6], with the permission of AIP Publishing. https://
doi.org/10.1063/1.3634117)

should also take account of the longitudinal modes that arise from the presence of the
diffraction grating. As a result, allowed modes in the optical device are indexed by,
e.g., TMml, TEml, where m and l denote the longitudinal modes and transverse
modes, respectively (see Sects. 8.7 and 8.8).
Besides the information about the dispersion of phase refractive index, we need
numerical data of the propagation constant and its dispersion. The propagation
constant β has appeared in Sect. 8.7.1 and is defined as

$$\beta = k\sin\theta = nk_0\sin\theta, \tag{8.153}$$

where θ is identical to the angle θ that was defined in Fig. 8.12. The
propagation constant β (>0) characterizes the light propagation within a slab
waveguide where the transverse modes of light are responsible.
As the phase refractive index n has a dispersion, the propagation constant β has a
dispersion as well. In optics, n sin θ is often referred to as an effective index. We
denote it by

$$n_{\mathrm{eff}} \equiv n\sin\theta \quad\text{or}\quad \beta = k_0 n_{\mathrm{eff}}. \tag{9.104}$$

As 0 ≤ θ ≤ π/2, we have n sin θ ≤ n. In general, it is going to be difficult to obtain


an analytical description or solution for the dispersion of both the phase refractive index
and effective index. As in Example 9.1, we usually obtain the relevant data by the
numerical calculations.
Under these circumstances, to establish a design principle of high-performance
organic light-emitting devices Yamao et al. [7] have developed a light-emitting
device where an organic crystal of P6T (see structural formula of Fig. 9.10a) was
placed onto a diffraction grating (Fig. 9.10b [7]). The compound P6T is known as
one of a family of thiophene/phenylene co-oligomers (TPCOs) together with BP1T,
AC5, AC′7, etc. that appear in Fig. 9.8 [10, 11].
The diffraction grating was engraved using a focused ion beam (FIB) apparatus
on an aluminum-doped zinc oxide (AZO) substrate. The resulting substrate was then
laminated with the thin crystal of P6T (Fig. 9.10c). Using those devices, Yamao et al.
excited the P6T crystal with ultraviolet light of a mercury lamp and collected the
emissions from the crystal and analyzed those emission spectra (vide infra). Detailed
analysis of those spectra has revealed that there is a close relationship between the
peak location of emission lines and emission direction. Thus, to analyze the angle-
dependent emission spectra turns out to be a powerful tool to accurately determine
the dispersion of propagation constant of the laser material along with light-emitting
materials more generally.
In a light-emitting device, we must consider a problem of out-coupling of light.
Seeking efficient out-coupling of light is equivalent to solving equations of electro-
magnetic wave motion inside and outside the slab crystal under appropriate bound-
ary conditions. The boundary conditions are given such that tangential components
of electromagnetic fields are continuous across the interface formed by the slab
crystal and air or a device substrate. Moreover, if a diffraction grating is present in an
optical device (including the laser), the diffraction grating influences emission
characteristics of the device. In other words, regarding the out-coupling of light
we must impose the following condition on the device such that [7]

$$\boldsymbol{\beta} - m\mathbf{K} = (\mathbf{k}_0\cdot\tilde{\mathbf{e}})\,\tilde{\mathbf{e}}, \tag{9.105}$$




Fig. 9.10 Construction of an organic light-emitting device. (a) Structural formula of P6T. (b)
Diffraction grating of an organic device. The purple arrow indicates the direction of grating
wavevector K. (Reproduced from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y,
Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance organic single-crystal
light-emitting devices. J Appl Phys 123(23): 235501/13 pages [7], with the permission of AIP
Publishing. https://doi.org/10.1063/1.5030486). (c) Cross-section of the device consisting of P6T
crystal/AZO substrate

Fig. 9.11 Geometrical relationship among the light propagation in crystal (indicated with β), the propagation outside the crystal (k0), and the grating wavevector (K). This geometry represents the phase matching between the emission inside the device (organic slab crystal) and the out-coupled emission (i.e., the emission outside the device); see text

where β is the propagation vector with |β| ≡ β that appeared in (8.153). The quantity β is called a propagation constant. In (9.105), furthermore, m is the diffraction order, which is a natural number (i.e., a positive integer); k0 denotes the wavenumber vector of an emission in vacuum with

$$|\mathbf{k}_0| \equiv k_0 = 2\pi/\lambda_p,$$

where λp is the emission peak wavelength in vacuum. Note that k0 is almost identical
to the wavenumber of the emission in air (i.e., the emission to be detected). The unit
vector ẽ that is oriented in the direction parallel to β − mK within the crystal plane is
defined as

$$\tilde{\mathbf{e}} = \frac{\boldsymbol{\beta} - m\mathbf{K}}{|\boldsymbol{\beta} - m\mathbf{K}|}. \tag{9.106}$$

In the above equation, K is the grating wavevector defined as

$$|\mathbf{K}| \equiv K = 2\pi/\Lambda,$$

where Λ is the grating period and the direction of K is perpendicular to the grating
grooves; that direction is indicated in Fig. 9.10b with a purple arrow. Equation
(9.105) can be visualized in Fig. 9.11 as geometrical relationship among the light
propagation in crystal (indicated with β), the propagation outside the crystal (k0), and
the grating wavevector (K).
If we are dealing with the emission parallel to the substrate plane, i.e., grazing
emission, (9.105) can be reduced to

$$\boldsymbol{\beta} - m\mathbf{K} = \mathbf{k}_0. \tag{9.107}$$

Equations (9.105) and (9.107) represent the relationship between β and k0, i.e.,
the phase matching conditions between the emission inside the device (organic slab
crystal) and the out-coupled emission, namely the emission outside the device
(in air).
Thus, the boundary conditions can be restated as (1) the tangential component
continuity of electromagnetic fields at both the planes of interface (see Sect. 8.1) and
(2) the phase matching of the electromagnetic fields both inside and outside the
laser medium. We examine these two factors below.
1. Tangential component continuity of electromagnetic fields (waveguide
condition):
Let us suppose that we are dealing with a dielectric medium with anisotropic
dielectric constant, but isotropic magnetic permeability. In this situation, Max-
well’s equations must be formulated so that we can deal with the electromagnetic
properties in an anisotropic medium. Such substances are widely available and
organic crystals are counted as typical examples. Among those crystals, P6T
crystallizes in the monoclinic system as in many other cases of organic crystals
[10, 11]. In this case, the permittivity tensor (or electric permittivity tensor) ε is
written as [7]

$$\varepsilon = \begin{pmatrix} \varepsilon_{aa} & 0 & \varepsilon_{ac^*} \\ 0 & \varepsilon_{bb} & 0 \\ \varepsilon_{c^*a} & 0 & \varepsilon_{c^*c^*} \end{pmatrix}. \tag{9.108}$$

Notice that in (9.108) the permittivity tensor is described in the orthogonal coordinates of the abc*-system, where a and b coincide with the crystallographic a- and b-axes of P6T, with the c*-axis being perpendicular to the ab-plane. Note that in many organic crystals the c-axis itself is not perpendicular to the ab-plane. Meanwhile, we assume that the magnetic permeability μ of P6T is isotropic and identical to that of vacuum. That is, in a tensor form we have

$$\mu = \begin{pmatrix} \mu_0 & 0 & 0 \\ 0 & \mu_0 & 0 \\ 0 & 0 & \mu_0 \end{pmatrix}, \tag{9.109}$$

where μ0 is the magnetic permeability of vacuum.


Then, the Maxwell’s equations in a matrix form read as

$$\mathrm{rot}\,\mathbf{E} = -\,\mu_0\frac{\partial\mathbf{H}}{\partial t}, \tag{9.110}$$

Fig. 9.12 Laboratory coordinate system (the ξηζ-system) for the device experiments. Regarding the symbols and notations, see text

$$\mathrm{rot}\,\mathbf{H} = \varepsilon_0\begin{pmatrix} \varepsilon_{aa} & 0 & \varepsilon_{ac^*} \\ 0 & \varepsilon_{bb} & 0 \\ \varepsilon_{c^*a} & 0 & \varepsilon_{c^*c^*} \end{pmatrix}\frac{\partial\mathbf{E}}{\partial t}. \tag{9.111}$$

In (9.111) the equation is described in the abc*-system. However, it is desirable to describe the Maxwell's equations in the laboratory coordinate system (see
Fig. 9.12) so that we can readily visualize the light propagation in an anisotropic
crystal such as P6T. Figure 9.12 depicts the ξηζ-system, where the ξ-axis is in the
direction of the light propagation within the crystal, namely the ξ-axis parallels β
in (9.107). We assume that the ξηζ-system forms an orthogonal coordinate.
Here we define the dielectric constant ellipsoid ~ε described by

$$(\xi\ \ \eta\ \ \zeta)\,\tilde{\varepsilon}\begin{pmatrix} \xi \\ \eta \\ \zeta \end{pmatrix} = 1, \tag{9.112}$$

which represents an ellipsoid in the ξηζ-system. If one cuts the ellipsoid by a


plane that includes the origin of the coordinate system and is perpendicular to the
ξ-axis, its cross-section is an ellipse. In this situation, one can choose the η- and ζ-
axes for the principal axis of the ellipse. This implies that when we put ξ = 0, we
must have

$$(0\ \ \eta\ \ \zeta)\,\tilde{\varepsilon}\begin{pmatrix} 0 \\ \eta \\ \zeta \end{pmatrix} = \varepsilon_{\eta\eta}\eta^{2} + \varepsilon_{\zeta\zeta}\zeta^{2} = 1. \tag{9.113}$$

In other words, we must have ~ε in the form of

$$\tilde{\varepsilon} = \begin{pmatrix} \varepsilon_{\xi\xi} & \varepsilon_{\xi\eta} & \varepsilon_{\xi\zeta} \\ \varepsilon_{\eta\xi} & \varepsilon_{\eta\eta} & 0 \\ \varepsilon_{\zeta\xi} & 0 & \varepsilon_{\zeta\zeta} \end{pmatrix},$$

where ε̃ is expressed in the ξηζ-system. In terms of the matrix algebra, the principal submatrix with respect to the η- and ζ-components should be diagonalized (see Sect. 11.3). On this condition, the electric flux density of the electromagnetic wave is polarized in the η- and ζ-directions. Half the length of each principal axis, i.e., $1/\sqrt{\varepsilon_{\eta\eta}}$ or $1/\sqrt{\varepsilon_{\zeta\zeta}}$, represents the anisotropic refractive indices.
Suppose that the ξηζ-system is reached by two successive rotations starting
from the abc*-system in such a way that we first perform the rotation by α around the c*-axis, which is followed by another rotation by δ around the ξ-axis (Fig. 9.12).
Then we have [7]

$$\begin{pmatrix} \varepsilon_{\xi\xi} & \varepsilon_{\xi\eta} & \varepsilon_{\xi\zeta} \\ \varepsilon_{\eta\xi} & \varepsilon_{\eta\eta} & 0 \\ \varepsilon_{\zeta\xi} & 0 & \varepsilon_{\zeta\zeta} \end{pmatrix} = R_\xi^{-1}(\delta)\,R_{c^*}^{-1}(\alpha)\begin{pmatrix} \varepsilon_{aa} & 0 & \varepsilon_{ac^*} \\ 0 & \varepsilon_{bb} & 0 \\ \varepsilon_{c^*a} & 0 & \varepsilon_{c^*c^*} \end{pmatrix}R_{c^*}(\alpha)\,R_\xi(\delta), \tag{9.114}$$

where Rc*(α) and Rξ(δ) stand for the first and second rotation matrices of the
above, respectively. In (9.114) we have

$$R_{c^*}(\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad R_\xi(\delta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\delta & -\sin\delta \\ 0 & \sin\delta & \cos\delta \end{pmatrix}. \tag{9.115}$$

With the matrix presentations and related coordinate transformations for


(9.114) and (9.115), see Sects. 11.4, 17.1, and 17.4.2. Note that the rotation
angle δ cannot be decided independently of α. In the literature [7], furthermore, the quantity εξη = εηξ was ignored as being small compared to the other components. As a result, (9.111) is converted into the following equation:

$$\mathrm{rot}\,\mathbf{H} = \varepsilon_0\begin{pmatrix} \varepsilon_{\xi\xi} & 0 & \varepsilon_{\xi\zeta} \\ 0 & \varepsilon_{\eta\eta} & 0 \\ \varepsilon_{\zeta\xi} & 0 & \varepsilon_{\zeta\zeta} \end{pmatrix}\frac{\partial\mathbf{E}}{\partial t}. \tag{9.116}$$

Notice here that the electromagnetic quantities H and E are measured in the
ξηζ-coordinate system.
Meanwhile, the electromagnetic waves that are propagating within the slab
crystal are described as

$$\Phi_\nu\exp[i(\beta\xi - \omega t)], \tag{9.117}$$

where Φν stands for either an electric field or a magnetic field with ν chosen from
ν = 1, 2, 3 representing each component of the ξηζ-system. Inserting (9.117) into
(9.116) and separating the resulting equation into ξ, η, and ζ components, we
obtain six equations with respect to six components of H and E. Of these, we are
particularly interested in Eξ and Eζ as well as Hη, because we assume that the
electromagnetic wave is propagated as a TM mode [7].
Using the set of the above six equations, with Hη in the P6T crystal we get the
following second-order linear differential equation (SOLDE):

$$\frac{d^{2}H_\eta}{d\zeta^{2}} + 2in_{\mathrm{eff}}k_0\frac{\varepsilon_{\xi\zeta}}{\varepsilon_{\zeta\zeta}}\frac{dH_\eta}{d\zeta} + \left(\varepsilon_{\xi\xi} - \frac{\varepsilon_{\xi\zeta}^{2}}{\varepsilon_{\zeta\zeta}} - \frac{\varepsilon_{\xi\xi}}{\varepsilon_{\zeta\zeta}}n_{\mathrm{eff}}^{2}\right)k_0^{2}H_\eta = 0, \tag{9.118}$$

where neff = n sin θ in (8.153) is the effective index. Assuming as a solution of


(9.118)

$$H_\eta = H_{\mathrm{cryst}}\,e^{-i\kappa\zeta}\cos(k\zeta + \phi_s) \quad (\kappa \neq 0,\ k \neq 0,\ H_{\mathrm{cryst}},\ \phi_s:\ \text{constants}) \tag{9.119}$$

and inserting (9.119) into (9.118), we get the following type of equation:

$$\mathrm{Z}e^{-i\kappa\zeta}\cos(k\zeta + \phi_s) + \Omega e^{-i\kappa\zeta}\sin(k\zeta + \phi_s) = 0. \tag{9.120}$$

In (9.119) Hcryst represents the magnetic field within the P6T crystal. The
constants Ζ and Ω in (9.120) can be described using κ and k together with the
constant coefficients of dHη/dζ and Hη of SOLDE (9.118). Meanwhile, we have

$$\begin{vmatrix} e^{-i\kappa\zeta}\cos(k\zeta+\phi_s) & e^{-i\kappa\zeta}\sin(k\zeta+\phi_s) \\ \left[e^{-i\kappa\zeta}\cos(k\zeta+\phi_s)\right]' & \left[e^{-i\kappa\zeta}\sin(k\zeta+\phi_s)\right]' \end{vmatrix} = k\,e^{-2i\kappa\zeta} \neq 0,$$

where the differentiation is performed with respect to ζ. This shows that e^{−iκζ} cos(kζ + ϕs) and e^{−iκζ} sin(kζ + ϕs) are linearly independent. This in turn implies that

$$\mathrm{Z} = \Omega = 0 \tag{9.121}$$

in (9.120). Thus, from (9.121) we can determine κ and k such that

$$\kappa = \beta\frac{\varepsilon_{\xi\zeta}}{\varepsilon_{\zeta\zeta}} = \frac{\varepsilon_{\xi\zeta}}{\varepsilon_{\zeta\zeta}}\,n_{\mathrm{eff}}k_0, \qquad k^{2} = \frac{1}{\varepsilon_{\zeta\zeta}^{2}}\left(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2}\right)\left(\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^{2}\right)k_0^{2}. \tag{9.122}$$

Further using the aforementioned six equations to eliminate Eζ, we obtain



$$E_\xi = i\,\frac{\varepsilon_{\zeta\zeta}}{\omega\varepsilon_0\left(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2}\right)}\,kH_{\mathrm{cryst}}\,e^{-i\kappa\zeta}\sin(k\zeta + \phi_s). \tag{9.123}$$

Equations (9.119) and (9.123) define the electromagnetic fields as a function


of ζ within the P6T slab crystal.
Meanwhile, the fields in the AZO substrate and air can readily be determined. Assuming that both substances are isotropic, we have εξζ = 0 and εξξ = εζζ ≈ ñ² and, hence, we get

$$\frac{d^{2}H_\eta}{d\zeta^{2}} + \left(\tilde{n}^{2} - n_{\mathrm{eff}}^{2}\right)k_0^{2}H_\eta = 0, \tag{9.124}$$

where ñ is the refractive index of either the substrate or air. Since

$$1 < \tilde{n} \le n_{\mathrm{eff}} \le n, \tag{9.125}$$

where n is the phase refractive index of the P6T crystal, we have ñ² − n_eff² < 0. Defining a quantity

$$\gamma \equiv \sqrt{n_{\mathrm{eff}}^{2} - \tilde{n}^{2}}\,k_0 \tag{9.126}$$

for the substrate (i.e., AZO) and air, from (9.124) we express the electromagnetic
fields as

$$H_\eta = \tilde{H}e^{\gamma\zeta}, \tag{9.127}$$

where H̃ denotes the magnetic field relevant to the substrate or air. Considering that ζ < 0 for the substrate and ζ > d/cos δ in air (see Fig. 9.13), with the magnetic field we have

$$H_\eta = \tilde{H}e^{\gamma\zeta} \quad\text{and}\quad H_\eta = \tilde{H}e^{-\gamma\left(\zeta - \frac{d}{\cos\delta}\right)} \tag{9.128}$$

for the substrate and air, respectively. Notice that Hη → 0 when ζ → −∞ for the substrate and ζ → +∞ for air. Correspondingly, with the electric field
we get

$$E_\xi = -\,i\,\frac{\gamma}{\omega\varepsilon_0\tilde{n}^{2}}\,\tilde{H}e^{\gamma\zeta} \quad\text{and}\quad E_\xi = i\,\frac{\gamma}{\omega\varepsilon_0\tilde{n}^{2}}\,\tilde{H}e^{-\gamma\left(\zeta - \frac{d}{\cos\delta}\right)} \tag{9.129}$$

for the substrate and air, respectively.


The device geometry is characterized by the fact that the P6T slab crystal is sandwiched between air and the device substrate so as to form a three-layered (air/crystal/substrate) structure (Fig. 9.10c). Therefore, we impose boundary

Fig. 9.13 Geometry of the cross-section of air/P6T crystal/AZO substrate. The angle δ is identical with that of Fig. 9.12. The points P and P′ are located at the crystal/substrate interface and the air/crystal interface, respectively. The distance between P and P′ is d/cos δ

conditions on Hη in (9.119) and (9.128) along with Eξ in (9.123) and (9.129) in


such a way that their tangential components are continuous across the interfaces
between the crystal and the substrate and between the crystal and air.
Figure 9.13 represents the geometry of the cross-section of air/P6T crystal/
AZO substrate. From (9.119) and (9.128), (a) the tangential continuity condition
for the magnetic field at the crystal/substrate interface is described as

$$H_{\mathrm{cryst}}\,e^{-i\kappa\cdot 0}\cos(k\cdot 0 + \phi_s)(\cos\delta) = \tilde{H}e^{\gamma\cdot 0}(\cos\delta), \tag{9.130}$$

where the factor of cosδ comes from the requirement of the tangential continuity
condition. (b) The tangential continuity condition for the electric field at the same
interface reads as

$$i\,\frac{\varepsilon_{\zeta\zeta}}{\omega\varepsilon_0\left(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2}\right)}\,kH_{\mathrm{cryst}}\,e^{-i\kappa\cdot 0}\sin(k\cdot 0 + \phi_s) = -\,i\,\frac{\gamma}{\omega\varepsilon_0\tilde{n}^{2}}\,\tilde{H}e^{\gamma\cdot 0}. \tag{9.131}$$

Thus, dividing both sides of (9.131) by both sides of (9.130), we get

$$\frac{\varepsilon_{\zeta\zeta}}{\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2}}\,k\tan\phi_s = -\,\frac{\gamma}{\tilde{n}^{2}} \quad\text{or}\quad \frac{\varepsilon_{\zeta\zeta}}{\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2}}\,k\tan(-\phi_s) = \frac{\gamma}{\tilde{n}^{2}}, \tag{9.132}$$

where γ and ñ are substrate-related quantities; see (9.125) and (9.126).
Another boundary condition at the air/crystal interface can be obtained in a
similar manner. Notice that in that case ζ = d/ cos δ; see Fig. 9.13. As a result,
we have

$$\frac{\varepsilon_{\zeta\zeta}}{\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2}}\,k\tan\left(\frac{kd}{\cos\delta} + \phi_s\right) = \frac{\gamma}{\tilde{n}^{2}}, \tag{9.133}$$

where γ and ñ are air-related quantities. The quantity ϕs can be eliminated by considering the arc tangents of (9.132) and (9.133).
In combination with (9.122), we finally get the following eigenvalue equation
for TM modes:

$$\frac{k_0 d}{\varepsilon_{\zeta\zeta}\cos\delta}\sqrt{\left(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2}\right)\left(\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^{2}\right)} = l\pi + \tan^{-1}\left[\frac{1}{n_{\mathrm{air}}^{2}}\sqrt{\frac{\left(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2}\right)\left(n_{\mathrm{eff}}^{2} - n_{\mathrm{air}}^{2}\right)}{\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^{2}}}\right] + \tan^{-1}\left[\frac{1}{n_{\mathrm{sub}}^{2}}\sqrt{\frac{\left(\varepsilon_{\xi\xi}\varepsilon_{\zeta\zeta} - \varepsilon_{\xi\zeta}^{2}\right)\left(n_{\mathrm{eff}}^{2} - n_{\mathrm{sub}}^{2}\right)}{\varepsilon_{\zeta\zeta} - n_{\mathrm{eff}}^{2}}}\right], \tag{9.134}$$

where nair (=1) is the refractive index of air, nsub is that of the AZO substrate equipped with a diffraction grating, d is the crystal thickness, and l is the order of the transverse mode. The refractive index nsub is estimated as the average of the refractive indices of air and AZO weighted by the volume fractions they occupy in the diffraction grating.
Regarding the index l, the condition l ≥ 0 (i.e., zero or a positive integer) must
be satisfied. Note that l = 0 is allowed because the total internal reflection (Sects.
8.5 and 8.6) is responsible in the crystal waveguide. Thus, neff can be determined
as a function of k0 with d and l being parameters. To solve (9.134), iterative
numerical computation is needed. The detailed calculation procedures for this can
be seen in the literature [7] and supplementary material therein.
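To indicate how such a computation may be organized, the sketch below solves (9.134) for neff by one-dimensional root finding. It is a simplified illustration, not the procedure of Ref. [7]: all parameters (permittivity components, thickness, δ, and the substrate index) are assumed placeholder values, and a quasi-isotropic case is chosen so that a single bracketing interval suffices.

```python
# Root finding for the TM eigenvalue equation (9.134).
import numpy as np
from scipy.optimize import brentq

def tm_residual(neff, k0, d, exx, ezz, exz, nair, nsub, delta, l):
    """LHS minus RHS of (9.134); a zero crossing marks an allowed n_eff."""
    det = exx * ezz - exz**2
    lhs = k0 * d * np.sqrt(det * (ezz - neff**2)) / (ezz * np.cos(delta))
    t_air = np.arctan(np.sqrt(det * (neff**2 - nair**2) / (ezz - neff**2)) / nair**2)
    t_sub = np.arctan(np.sqrt(det * (neff**2 - nsub**2) / (ezz - neff**2)) / nsub**2)
    return lhs - (l * np.pi + t_air + t_sub)

# Assumed placeholder parameters (not the published P6T values):
k0 = 2 * np.pi / 600e-9      # vacuum wavenumber at 600 nm
d, delta = 400e-9, 0.0       # crystal thickness; delta = 0 for simplicity
exx = ezz = 2.9**2           # quasi-isotropic test case
exz, nair, nsub, l = 0.0, 1.0, 1.6, 0

neff = brentq(tm_residual, nsub + 1e-6, np.sqrt(ezz) - 1e-6,
              args=(k0, d, exx, ezz, exz, nair, nsub, delta, l))
print(f"TM (l = {l}) mode: n_eff = {neff:.3f}")
```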
Note that (9.134) includes the well-known eigenvalue equation for a wave-
guide that comprises an isotropic core dielectric medium sandwiched by a couple
of clad layers having the same refractive index (symmetric waveguide) [12]. In
that case, in fact, the second and third terms on the RHS of (9.134) are the same and
their sum is identical with -δTM of (8.176) in Sect. 8.7.2. To confirm it, in (9.134)
use εξξ = εζζ ≈ n² [see (7.57)] and εξζ = 0 together with the relative refractive
index given by (8.39).

2. Phase matching of the electromagnetic fields (longitudinal mode requisition):


Once we have obtained the eigenvalue equation described by (9.134), we will be able to solve the problem from the given crystal thickness d. Even though the order
of the transverse mode l is yet unknown, l is a discrete value (i.e., zero or a


Fig. 9.14 Two cases of the light propagation in the device geometry of slab waveguide. (a) The projections of the vectors β and k0 onto K are parallel to each other. (b) The projections of β and k0 onto K are antiparallel. For the sake of simplicity, the projection of β onto K is parallel to K in both cases. The number m is assumed to be a positive integer

positive integer) so that the computation of (9.134) can be simplified. Namely, inserting probable values of l into (9.134), we can greatly save time and effort in getting a solution.
Meanwhile, the condition of the phase matching is equivalent to determining
the index m related to the longitudinal mode. To determine m, we return to (9.107). Figure 9.14 features the following two cases with regard to the light propagation represented by (9.107) in the device geometry of the slab waveguide. Case (1): The projections of the vectors β and k0 onto K are parallel to each other (see Fig. 9.14a). Case (2): The projections of β and k0 onto K are antiparallel (Fig. 9.14b).
The angle θ of Fig. 9.14 is defined as that made by k0 and K. Suppose geometric
transformations that convert the vectors β, k0, and K to those having the opposite
direction. Then, the mutual geometric arrangement of these vectors remains
unchanged together with the form of (9.107) before and after the transformations.
Concomitantly, the angle θ is switched to θ - π. As a result, if we set θ as
0 ≤ θ ≤ π/2, we have -π ≤ θ - π ≤ - π/2.
Meanwhile, let us think of the reflection with respect to the plane that contains
the vector K and is perpendicular to the plane of paper. In that case, we are
uncertain as to whether β undergoes the same reflection according to the said
reflection operation. It is because the dielectric constant ellipsoid does not
necessarily possess the mirror symmetry with respect to that plane in an aniso-
tropic medium. We need the information about the emission data taken at the
angle -θ accordingly. Similarly, setting -θ as 0 ≤ - θ ≤ π/2 (or -π/2 ≤ θ ≤ 0),
we have -3π/2 ≤ θ - π ≤ - π. Combining the above results, the ranges of θ and
θ - π cover the emission angles [-3π/2, π/2]. It is equivalent to the whole range
[0, 2π]. Thus, the angle-dependent measurements covering -π/2 ≤ θ ≤ π/2
provide us with sufficient data to be analyzed.
Now, care should be taken to determine β. This is because from an experi-
mental perspective k0 can be decided accurately but arbitrariness is involved with
β. Defining β ≡ |β|, k0 ≡ |k0|, and K ≡ |K|, we examine the relationship among

magnitudes of β, k0, and K according to the above two cases. Using an elementary
trigonometric formula, we describe the results as follows:

$$\beta^{2} = k_0^{2} + m^{2}K^{2} - 2mKk_0\cos(\pi - \theta) = k_0^{2} + m^{2}K^{2} + 2mKk_0\cos\theta,$$

or

$$\beta^{2} = k_0^{2} + m^{2}K^{2} - 2mKk_0\cos\theta, \tag{9.135}$$

where we assume that m is a positive integer. In (9.135) the former and latter
equations correspond to Case (1) and Case (2), respectively (see Fig. 9.14).
Replacing β with β = k0neff and taking the variation of the resulting equation
with regard to k0 and θ, we get

$$\frac{\delta k_0}{\delta\theta} = \frac{\mp\,mKk_0\sin\theta}{k_0\left(n_{\mathrm{eff}}^{2} - 1\right) \mp mK\cos\theta}, \tag{9.136}$$

where the signs - and + are associated with Case (1) and Case (2), respectively.
In (9.136) we did not take the variation of neff, because it was assumed to be
weakly dependent on θ compared to k0. Note that since k0 was not allowed to take
continuous numbers, we used the sign of variation δ instead of the sign of
differentiation d or ∂. The denominator of (9.136) is positive (readers, check
it), and so δk0/δθ is negative or positive with 0 ≤ θ ≤ π/2 for Case (1) and Case
(2), respectively. The quantity δk0/δθ is positive or negative with -π/2 ≤ θ ≤ 0 for
Case (1) and Case (2), respectively. For both Case (1) and Case (2), δk0/δθ = 0 at
θ = 0, namely, k0 (or the emission wavelength) takes an extremum at θ = 0. In
terms of the emission wavelength, Case (1) and Case (2) are related to the redshift
and blueshift in the spectra, respectively. These shifts can be detected and
inspected closely by the angle-dependent emission spectra (vide infra).
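The redshift and blueshift can be made explicit by solving (9.135) for k0 with β = neff k0. The short sketch below uses assumed values of neff, m, and Λ chosen only to illustrate the trends; in particular, neff is taken to be angle-independent, in line with the approximation made for (9.136).

```python
# Emission wavelength versus detection angle from (9.135) with beta = neff*k0.
import numpy as np

neff = 2.5        # assumed (angle-independent) effective index
Lambda = 360e-9   # assumed grating period (m)
K = 2 * np.pi / Lambda

def k0(theta, m, case):
    """Positive root of (9.135); case = 1 [redshift] or 2 [blueshift]."""
    s = 1.0 if case == 1 else -1.0
    c = np.cos(theta)
    return m * K * (s * c + np.sqrt(c**2 + neff**2 - 1.0)) / (neff**2 - 1.0)

for deg in (0, 15, 30):
    th = np.radians(deg)
    lam1 = 2e9 * np.pi / k0(th, 1, 1)   # Case (1) with m = 1, in nm
    lam2 = 2e9 * np.pi / k0(th, 2, 2)   # Case (2) with m = 2, in nm
    print(f"theta = {deg:2d} deg : Case (1) {lam1:5.1f} nm, Case (2) {lam2:5.1f} nm")
# Case (1) moves to longer and Case (2) to shorter wavelengths as |theta|
# grows, with extrema at theta = 0, as stated above.
```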
Meanwhile, taking inner products (see Chap. 13) of both sides of (9.107),
we have

$$\langle\boldsymbol{\beta} - m\mathbf{K}\,|\,\boldsymbol{\beta} - m\mathbf{K}\rangle = \langle\mathbf{k}_0\,|\,\mathbf{k}_0\rangle.$$

That is, we get

$$\beta^{2} + m^{2}K^{2} - 2mK\beta\cos\varphi = k_0^{2}, \tag{9.137}$$

where φ is the angle between mK and β (see Fig. 9.11). Precisely deciding φ is accompanied by experimental ambiguity, however. To avoid the ambiguity, therefore, it is preferable to deal with a case where all the vectors β, K, and k0 are (anti)parallel to ẽ in (9.105). From a practical point of view, moreover, it is most desirable to make a device so that it can strongly emit light in the direction parallel to the grating wavevector.


Fig. 9.15 Schematic diagram for the relationship among k0, β, and K. This geometry is obtained by
substituting φ = 0 for (9.137). (a) Redshifted case. (b) Blueshifted case. (c) Bragg’s condition case.
All these diagrams are depicted as a cross section of the devices cut along the K direction.
(Reproduced from Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at
an intended wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl
Phys 127(5): 053102/6 pages [13], with the permission of AIP Publishing. https://doi.org/10.1063/
1.5129425)

In that case, we have φ = 0 and (9.137) is reduced to

$$(\beta - mK)^{2} = k_0^{2} \quad\text{or}\quad \beta - mK = \pm\,k_0.$$

Then, this gives us two choices such that

$$(1)\quad \beta = k_0 + mK = n_{\mathrm{eff}}k_0,\qquad k_0 = \frac{2\pi}{\lambda_p} = \frac{mK}{n_{\mathrm{eff}} - 1},$$
$$(2)\quad -\beta = k_0 - mK = -\,n_{\mathrm{eff}}k_0,\qquad k_0 = \frac{2\pi}{\lambda_p} = \frac{mK}{n_{\mathrm{eff}} + 1}. \tag{9.138}$$

In (9.138), all the physical quantities are assumed to be positive. According to the
above two cases, we have the following corresponding geometry such that
1. β and K are parallel to k0,
2. β and K are antiparallel to k0.
The above geometry is depicted in Fig. 9.15 [13]. Figure 9.15 includes a case of
the Bragg’s condition as well (Fig. 9.15c). Notice that Fig. 9.15a and Fig. 9.15b are
special cases of the configurations depicted in Fig. 9.14a and Fig. 9.14b,
respectively.

Further rewriting (9.138) to combine the two equations, we get

$$\lambda_p = \frac{2\pi\left(n_{\mathrm{eff}} \mp 1\right)}{mK} = \frac{\left(n_{\mathrm{eff}} \mp 1\right)\Lambda}{m}, \tag{9.139}$$

where the sign ∓ corresponds to the aforementioned Case (1) and Case (2), respec-
tively; we used the relation of K = 2π/Λ. Equation (9.139) is useful for us to guess
the magnitude of m. This is because, since m changes stepwise (e.g., m = 1, 2, 3, etc.), we have only to examine a limited number of m so that neff may satisfy (9.125), as sketched below.
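A minimal sketch of this screening (with assumed numbers for Λ, the observed peak wavelength, and the bounds standing in for (9.125)) is as follows.

```python
# Screening the diffraction order m via (9.139): n_eff = m*lam_p/Lambda +/- 1.
Lambda = 360e-9            # assumed grating period (m)
lam_p = 600e-9             # assumed peak wavelength observed at theta = 0 (m)
n_low, n_high = 1.6, 3.0   # assumed bounds playing the role of (9.125)

for m in (1, 2, 3, 4):
    for sign, case in ((+1, "Case (1), redshifted"), (-1, "Case (2), blueshifted")):
        neff = m * lam_p / Lambda + sign   # inverted (9.139)
        if n_low <= neff <= n_high:
            print(f"m = {m}: n_eff = {neff:.2f}  [{case}]")
# Only a few (m, case) pairs survive; here m = 1 [Case (1)] and m = 2 [Case (2)].
```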
When the laser oscillation is responsible, (9.138) and (9.139) should be replaced
with the following relation that represents the Bragg’s condition [14]:

$$2\beta = mK \tag{9.140}$$

or

$$\lambda_p = \frac{2n_{\mathrm{eff}}\Lambda}{m}. \tag{9.141}$$

Once we find out a suited m, (9.107) enables us to seek the relationship between
λp and neff experimentally.

As described above, we have explained two major factors of the out-coupling of


emission. In the above discussion, we assumed that the TM modes dominate.
Combining (9.107) and (9.134), we successfully assign individual emissions to the
specific TMml modes, in which the indices m and l are longitudinal and transverse
modes, respectively. Note that m ≥ 1 and l ≥ 0. Whereas the index m is associated
with the presence of diffraction grating, l is pertinent to the waveguide condition.
In general, it is quite difficult to precisely predict the phase refractive index and
effective index of anisotropic dielectric materials such as organic crystals. This
situation leads to the difficulty in developing optical devices based on the organic
crystals. In that respect, (9.134) is highly useful in designing high-performance
optical devices. At the same time, the angle-dependent emission spectroscopy offers
various information necessary to construct the device.
As a typical example, Fig. 9.16a [7] shows the angle-dependent emission spectra
of a P6T crystal. For this, the grazing emission was intended. There are two series of
progressions in which the emission peak locations are either redshifted or blueshifted
with increasing angles |θ|; the angle θ is defined as that made between the two vectors K
and k0; see Figs. 9.14 and 9.16b as well as (9.135). Also, these progressions exhibit
the extrema at θ = 0 as expected. Thus, Fig. 9.16 obviously demonstrates that the
redshifted and blueshifted progressions result from the device geometry represented
by Fig. 9.14a [corresponding to Case (1)] and Fig. 9.14b [Case (2)], respectively.
The first and second equations of (9.138) are associated with the emission spectra
recorded at θ = 0 in a series of the redshifted and blueshifted progressions,
respectively. That is, the index neff was determined from (9.138) by replacing λp
with the peak wavelength observed at θ = 0 of Fig. 9.16a and replacing m with the


Fig. 9.16 Angle-dependent emission spectra of a P6T crystal. (a) Emission spectra. (Reproduced
from Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018)
Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys
123(23): 235501/13 pages [7], with the permission of AIP Publishing. https://doi.org/10.1063/1.
5030486). (b) Geometrical relationship between k0 and K. The grazing emissions (k0) were
collected at various angles θ made with the grating wavevector direction (K)

most suitable integer m (vide supra). In such a manner, individual experimental


emission data were assigned to a specific TMml mode. Namely, the TMml mode
assignment has been completed by combining (k0, l) that was determined from
(9.134) and (k0, m) decided by (9.138). Notice that the same number m is held
unchanged within the same series of the redshifted or blueshifted progression (i.e., −90° ≤ θ ≤ 90° in Fig. 9.16a). This allows us to experimentally determine the
dispersion of effective indices within the same progression (either redshifted or
blueshifted) using (9.107).
The emission data experimentally obtained and assigned above are displayed in
Fig. 9.17 [7]. The observed emission peaks are indicated with open circles/triangles
or closed circles/triangles for blueshifted or redshifted peaks, respectively. In Fig. 9.17, the TM20 (blueshifted) and TM10 (redshifted) modes are seamlessly jointed so as to form an upper group of emission peaks. A lower group of emission peaks is assigned to the TM21 (blueshifted) or TM11 (redshifted) modes.
emission data, once again, the relevant modes are seamlessly jointed. Notice that
these modes may have multiple values of neff according to different εξξ, εζζ, and εξζ
that vary with the different propagation directions of light (i.e., the ξ-axis) within the
slab crystal waveguide plane.
The color solid curves represent the results obtained by purely numerical com-
putations based on (9.134). Nonetheless, the dielectric constant ellipsoid elements
εξξ, εζζ, and εξζ reflect the direction of the wavenumber vector k0 through (9.107)

Fig. 9.17 Wavelength dispersion of the effective indices neff for the P6T crystal waveguide. The
open and closed symbols show the data pertinent to the blueshifted and redshifted peaks, respec-
tively. These data were calculated from either (9.105) or (9.107). The colored solid curves represent
the dispersions of the effective refractive indices computed from (9.134). The numbers and
characters (m, l, s) denote the diffraction order (m), order of transverse mode (l ), and shift direction
(s) that distinguishes between blueshift (b) and redshift (r). A black solid curve drawn at the upper
right indicates the wavelength dispersion of the phase refractive indices (n) of the P6T crystal
related to the one principal axis [15]. (Reproduced from Yamao T, Higashihara S, Yamashita S,
Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018) Design principle of high-performance
organic single-crystal light-emitting devices. J Appl Phys 123(23): 235501/13 pages [7], with the
permission of AIP Publishing. https://doi.org/10.1063/1.5030486)

and, hence, (9.134) is associated with the experimental data in that sense. As clearly
recognized from Fig. 9.17, agreement between the experiments and numerical
computations is satisfactory. These curves are compared with a black solid curve
drawn at the upper right that indicates the wavelength dispersion of the phase
refractive indices (n) of the P6T crystal related to the one principal axis [15].
In a practical sense, it will be difficult to make a device that causes the laser oscillation at an intended wavelength using an anisotropic crystal, as in the case of the present example. This is mainly because of the difficulty in predicting neff of the anisotropic crystal. From (9.134), however, we can readily determine the wavelength dispersion
of neff of the crystal. From the Bragg’s condition (9.141), in turn, Λ can precisely be
designed for a given index m and suitable λp (at which neff can be computed with a
crystal of its thickness d ). Naturally, it is sensible to set λp at a wavelength where
high optical gain (or emission gain) is anticipated [16].
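In other words, the design reduces to one line of arithmetic once neff at the target wavelength is known. The sketch below inverts (9.141); the value of neff used here is an assumed input that would, in practice, be computed from (9.134) (cf. Example 9.3 and Table 9.2 below).

```python
# Grating period from the Bragg condition (9.141): Lambda = m*lam_p/(2*n_eff).
lam_p = 569.0e-9   # target lasing wavelength (m)
m = 3              # chosen diffraction order
neff = 2.65        # assumed effective index at lam_p (from solving (9.134))

Lambda = m * lam_p / (2.0 * neff)
print(f"Lambda = {Lambda * 1e9:.1f} nm")   # ~322 nm with these assumed inputs
```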
Example 9.3 [13] Thus, everything has been arranged for one to construct a high-
performance optical device, especially a laser. Using a single crystal of BP3T (see
Fig. 9.18a for the structural formula), Inada and co-workers have developed a device
that causes the laser oscillation at an intended wavelength [13].
To this end, the BP3T crystal was placed on a silicon oxide substrate and
subsequently spin-coated with a photoresist film. The photoresist film was then

Fig. 9.18 Wavelength dispersions of the effective/phase refractive indices for the BP3T crystal
waveguide. (a) Structural formula of BP3T. (b) Dispersion data of the device D1 (see Table 9.2).
Colored solid curves (either green or red) are obtained from (9.134). The data of green circles and
red triangles are obtained through (9.107) using the emission directions and the emission line
locations. The laser oscillation line is indicated with a blue square as a TM30 mode occurring at
568.1 nm. The black solid curve drawn at the upper right indicates the dispersion of the phase
refractive index of the BP3T crystal pertinent to the one principal axis [15]. (Reproduced from
Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended wavelength
from a distributed feedback structure on anisotropic organic crystals. J Appl Phys 127(5): 053102/6
pages [13], with the permission of AIP Publishing. https://doi.org/10.1063/1.5129425)

engraved with a one-dimensional diffraction grating so that the a-axis of the BP3T
crystal was parallel to the grating wavevector. With this device geometry, the angle-
dependent spectroscopy was carried out in a weak excitation regime to determine the
wavelength dependence of the effective index. At the same time, strong excitation
measurement was performed to analyze the laser oscillation data.

Table 9.2 Laser characteristics of BP3T crystal devices^a

Device   Crystal thickness d (nm)   Grating period Λ (nm)   Actual lasing wavelength λa (nm)   Intended emission wavelength λi (nm)   Effective index neff at λa   Validity V^b
D1       428                        319.8                   568.1                              569.0                                  2.65^c, 2.66^d               0.994
D2       295                        349.2                   569.6                              569.0                                  2.48^c, 2.45^d               0.988

^a Reproduced from Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl Phys 127(5): 053102/6 pages [13], with the permission of AIP Publishing. https://doi.org/10.1063/1.5129425
^b See (9.142) in the text
^c Estimated from neff(WG) in (9.134)
^d Estimated from neff(PM) in (9.141)

The BP3T crystal exhibited a high emission gain around 570 nm [17]. Accord-
ingly, setting the laser oscillation wavelength at 569.0 nm, Inada and co-workers
made the BP3T crystal devices with different crystal thicknesses. As an example,
using a crystal of 428 nm in thickness (device D1; Table 9.2), on the basis of (9.134)
they estimated the effective index neff at 569.0 nm in the direction of the a-axis of the
BP3T crystal. Meanwhile, using (9.141) and assuming m to be 3 in (9.141), the
grating period was decided with the actual period of the resulting device being
319.8 nm (see Table 9.2).
The results of the emission measurements for the device D1 are displayed in
Fig. 9.18b. The figure shows the dispersion of the effective indices neff of the BP3T
crystal. Once again, the TM20 (blueshifted) and TM10 (redshifted) modes are
seamlessly jointed. As in the case of Fig. 9.9a, the laser oscillation line shows up
as a TM30 mode among other modes. That is, the effective index neff of the laser
oscillation line sits on the dispersion curve consisting of other emission lines arising
from the weak excitation.
Figure 9.19 shows the distinct single-mode laser oscillation line occurring at
568.1 nm (Table 9.2). This clearly indicates that the laser oscillation has been
achieved at a location very close to the designed wavelength (at 569.0 nm). Compare
the spectrum of Fig. 9.19 with that of Fig. 9.7. Manifestation of the single-mode line
is evident and should be contrasted with the longitudinal multimode laser oscillation
of Fig. 9.7.
To quantify the validity of (9.134), the following quantity V is useful [13]:

$$V \equiv 1 - \left|\frac{n_{\mathrm{eff}}(\mathrm{WG}) - n_{\mathrm{eff}}(\mathrm{PM})}{n_{\mathrm{eff}}(\mathrm{WG})}\right|, \tag{9.142}$$

where V stands for validity of (9.134); neff (WG) was estimated from the waveguide
condition of (9.134) in combination with the emission data obtained with the device;
neff (PM) was obtained from the phase matching condition described by (9.107),

Fig. 9.19 Single-mode laser oscillation spectrum of an organic semiconductor crystal BP3T (the device D1), measured at several excitation pulse energies. A sharply resolved laser oscillation line is noted at 568.1 nm. (Reproduced from Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl Phys 127(5): 053102/6 pages [13], with the permission of AIP Publishing. https://doi.org/10.1063/1.5129425)

(9.139), or (9.141). We have 0 ≤ V ≤ 1, and a value of V close to 1 means


that the validity of (9.134) is high (see Table 9.2). Note that the validity estimation
by means of (9.142) can equally be applied to both the laser oscillation and non-laser
emission lines.
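With the rounded effective indices of Table 9.2, (9.142) is evaluated in two lines; the slight deviation from the tabulated V for D1 merely reflects the two-decimal rounding of neff used here.

```python
# Validity measure (9.142) evaluated with the rounded n_eff values of Table 9.2.
for device, n_wg, n_pm in (("D1", 2.65, 2.66), ("D2", 2.48, 2.45)):
    V = 1.0 - abs(n_wg - n_pm) / n_wg
    print(f"{device}: V = {V:.3f}")   # close to 1, i.e., (9.134) holds well
```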
In a word, as long as the thickness of a crystal becomes available along with
permittivity tensor of the crystal, one can readily construct a single-mode laser that
oscillates at an intended wavelength.
As discussed above in detail, Examples 9.1, 9.2, and 9.3 enable us to design a
high-performance laser device using an organic crystal that is characterized by a
pretty complicated crystal structure associated with anisotropic refractive indices.
From a practical point of view, a laser device is often combined with a diffraction
grating. Since organic semiconductors are less robust as compared to inorganic
semiconductors, special care should be taken not to damage an organic semicon-
ductor crystal when equipping it with the diffraction grating via rigorous semicon-
ductor processing. In that respect, Inada and co-workers further developed an
effective method [18]. The method is based upon the focused ion beam lithography
in combination with subsequent plasma etching. They skillfully minimized the damage from which an organic crystal might otherwise suffer.
The above-mentioned design principle enables one to construct effective laser
devices that consist of light-emitting materials either organic or inorganic more
widely. At the same time, these examples are expected to provide an effective
methodology in interdisciplinary fields encompassing solid-state physics and solid-
state chemistry as well as device physics.

9.6 Mechanical System

As outlined above, two-level atoms have distinct characteristics in connection with


lasers. Electromagnetic waves similarly confined within a one-dimensional cavity
exhibit related properties and above all have many features in common with a
harmonic oscillator [19].
We have already described several features and properties of the harmonic oscillator (Chap. 2). Meanwhile, we have briefly discussed the formation of electromagnetic stationary waves (Chap. 8). There are several resemblances and correspondences between the harmonic oscillator and electromagnetic stationary waves when we view them as mechanical systems. The point is that in a harmonic oscillator the position and momentum are in quadrature; i.e., their phase difference is π/2. For the electromagnetic stationary waves, the electric field and magnetic field are in quadrature as well.
In Chap. 8, we examined the conditions under which stationary waves are formed. In a dielectric medium both sides of which are equipped with metal layers (or mirrors), the electric field is described as

$$\mathbf{E} = 2E_1\mathbf{e}_1\sin\omega t\sin kz. \tag{8.201}$$

In (8.201), we assumed that forward and backward electromagnetic waves are propagating in the direction of the z-axis. Here, we assume that the interfaces (or walls) are positioned at z = 0 and z = L. Within the domain [0, L], the two waves form a stationary wave. Since this expression assumed two waves, the electromagnetic energy was doubled. To normalize the energy so that a single wave is contained, the amplitude E_1 of (8.201) should be divided by √2. Therefore, we think of the following description for E:

$$\mathbf{E} = \sqrt{2}E\mathbf{e}_1\sin\omega t\sin kz \quad \text{or} \quad E_x = \sqrt{2}E\sin\omega t\sin kz,$$

where we took the polarization vector along the direction of the x-axis. At the same time, we omitted the index from the amplitude. Thus, from the second equation of (7.65) we have

$$\frac{\partial H_y}{\partial t} = -\frac{1}{\mu}\frac{\partial E_x}{\partial z} = -\frac{\sqrt{2}Ek}{\mu}\sin\omega t\cos kz. \tag{9.143}$$

Note that this equation appeared as (8.131) as well. Integrating both sides of (9.143), we get

$$H_y = \frac{\sqrt{2}Ek}{\mu\omega}\cos\omega t\cos kz.$$

Using the relation ω = vk, we have

$$H_y = \frac{\sqrt{2}E}{\mu v}\cos\omega t\cos kz + C,$$

where v is the light velocity in the dielectric medium and C is an integration constant. Removing C (i.e., setting C = 0) and putting

$$H \equiv \frac{E}{\mu v},$$

we have

$$H_y = \sqrt{2}H\cos\omega t\cos kz.$$

Using a vector expression, we have

$$\mathbf{H} = \sqrt{2}H\mathbf{e}_2\cos\omega t\cos kz.$$

Thus, E (∥ e_1), H (∥ e_2), and n (∥ e_3) form a right-handed system in this order. As noted in Sect. 8.8, at the interface (or wall) the electric field and magnetic field form nodes and antinodes, respectively. Namely, the two fields are in quadrature.
Let us calculate the electromagnetic energy of the dielectric medium within a cavity. In the present case, the cavity means the dielectric sandwiched between a pair of metal layers. We have

$$W = W_e + W_m,$$

where W is the total electromagnetic energy; W_e and W_m are the electric and magnetic energies, respectively. Let the length of the cavity be L. Then the energies per unit cross-sectional area are described as

$$W_e = \frac{\varepsilon}{2}\int_0^L E_x^2\,dz \quad \text{and} \quad W_m = \frac{\mu}{2}\int_0^L H_y^2\,dz.$$

Performing the integration, we get

$$W_e = \frac{\varepsilon}{2}LE^2\sin^2\omega t, \qquad W_m = \frac{\mu}{2}LH^2\cos^2\omega t = \frac{\mu}{2}L\left(\frac{E}{\mu v}\right)^2\cos^2\omega t = \frac{\varepsilon}{2}LE^2\cos^2\omega t, \tag{9.144}$$

where we used 1/v² = εμ with the last equality. Thus, we have



$$W = \frac{\varepsilon}{2}LE^2 = \frac{\mu}{2}LH^2. \tag{9.145}$$

Representing the energies per unit volume as W̃_e, W̃_m, and W̃, we have

$$\tilde{W}_e = \frac{\varepsilon}{2}E^2\sin^2\omega t, \quad \tilde{W}_m = \frac{\varepsilon}{2}E^2\cos^2\omega t, \quad \tilde{W} = \frac{\varepsilon}{2}E^2 = \frac{\mu}{2}H^2. \tag{9.146}$$

In Chap. 2, we treated motion of a harmonic oscillator. There, we had

$$x(t) = \frac{v_0}{\omega}\sin\omega t = x_0\sin\omega t. \tag{2.7}$$

Here, we have defined the amplitude of the harmonic oscillation as x_0 (>0) given by

$$x_0 \equiv \frac{v_0}{\omega}.$$

Then, the momentum is described as

$$p(t) = m\dot{x}(t) = m\omega x_0\cos\omega t.$$

Defining

$$p_0 \equiv m\omega x_0,$$

we have

$$p(t) = p_0\cos\omega t.$$

Let the kinetic energy and potential energy of the oscillator be K and V, respectively. Then, we have

$$K = \frac{1}{2m}p(t)^2 = \frac{1}{2m}p_0^2\cos^2\omega t, \quad V = \frac{1}{2}m\omega^2x(t)^2 = \frac{1}{2}m\omega^2x_0^2\sin^2\omega t, \quad W = K + V = \frac{1}{2m}p_0^2 = \frac{1}{2}mv_0^2 = \frac{1}{2}m\omega^2x_0^2. \tag{9.147}$$

Comparing (9.146) and (9.147), we recognize the following relationship in energy between the electromagnetic fields and harmonic oscillator motion [19]:

$$\sqrt{m}\,\omega x_0 \longleftrightarrow \sqrt{\varepsilon}\,E \quad \text{and} \quad p_0/\sqrt{m} \longleftrightarrow \sqrt{\mu}\,H. \tag{9.148}$$

Thus, there is an elegant correspondence between the dynamics of electromagnetic fields in a cavity and the motion of a harmonic oscillator. In fact, quantum electrodynamics (QED) is based upon the treatment of the quantum harmonic oscillator introduced in Chap. 2. Various aspects of QED will be studied in Part V along with the Dirac equation and its handling.

References

1. Moore WJ (1955) Physical chemistry, 3rd edn. Prentice-Hall, Englewood Cliffs


2. Smith FG, King TA, Wilkins D (2007) Optics and photonics, 2nd edn. Wiley, Chichester
3. Sunakawa S (1965) Theoretical electromagnetism. Kinokuniya, Tokyo (in Japanese)
4. Loudon R (2000) The quantum theory of light, 3rd edn. Oxford University Press, Oxford
5. Born M, Wolf E (2005) Principles of optics, 7th edn. Cambridge University Press, Cambridge
6. Yamao T, Okuda Y, Makino Y, Hotta S (2011) Dispersion of the refractive indices of
thiophene/phenylene co-oligomer single crystals. J Appl Phys 110(5):053113
7. Yamao T, Higashihara S, Yamashita S, Sano H, Inada Y, Yamashita K, Ura S, Hotta S (2018)
Design principle of high-performance organic single-crystal light-emitting devices. J Appl Phys
123(23):235501
8. Siegman AE (1986) Lasers. University Science Books, Sausalito
9. Palmer C (2005) Diffraction grating handbook, 6th edn. Newport Corporation, New York
10. Hotta S, Yamao T (2011) The thiophene/phenylene co-oligomers: exotic molecular semicon-
ductors integrating high-performance electronic and optical functionalities. J Mater Chem
21(5):1295–1304
11. Yamao T, Nishimoto Y, Terasaki K, Akagami H, Katagiri T, Hotta S, Goto M, Azumi R,
Inoue M, Ichikawa M, Taniguchi Y (2010) Single crystal growth and charge transport proper-
ties of an alternating co-oligomer composed of thiophene and phenylene rings. Jpn J Appl Phys
49(4S):04DK20
12. Yariv A, Yeh P (2003) Optical waves in crystals: propagation and control of laser radiation.
Wiley, New York
13. Inada Y, Kawata Y, Kawai T, Hotta S, Yamao T (2020) Laser oscillation at an intended
wavelength from a distributed feedback structure on anisotropic organic crystals. J Appl Phys
127(5):053102
14. Pain HJ (2005) The physics of vibrations and waves, 6th edn. Wiley, Chichester
15. Sakurai Y, Hayashi W, Yamao T, Hotta S (2014) Phase refractive index dispersions of organic
oligomer crystals with different molecular alignments. Jpn J Appl Phys 53(2S):02BB01
16. Yamao T, Yamamoto K, Hotta S (2009) A high optical-gain organic crystal comprising
thiophene/phenylene co-oligomer nanomolecules. J Nanosci Nanotechnol 9(4):2582–2585
17. Bisri SZ, Takenobu T, Yomogida Y, Shimotani H, Yamao T, Hotta S, Iwasa Y (2009) High
mobility and luminescent efficiency in organic single-crystal light-emitting transistors. Adv
Funct Mater 19:1728–1735
18. Inada Y, Yamashita S, Murakami S, Takahashi K, Yamao T, Hotta S (2021) Direct fabrication
of a diffraction grating onto organic oligomer crystals by focused ion beam lithography
followed by plasma etching. Jpn J Appl Phys 60(12):120901
19. Fox M (2006) Quantum optics. Oxford University Press, Oxford
Chapter 10
Introductory Green’s Functions

In this chapter, we deal with various properties and characteristics of differential


equations, especially first-order linear differential equations (FOLDEs) and
second-order linear differential equations (SOLDEs). These differential equations
are characterized by differential operators and boundary conditions (BCs). Of these,
differential operators appearing in SOLDEs are particularly important. Under appro-
priate conditions, the said operators can be converted to Hermitian operators. The
SOLDEs associated with classical orthogonal polynomials play a central role in
many fields of mathematical physics including quantum mechanics and electromag-
netism. We study the general principle of SOLDEs in relation to several specific
SOLDEs we have studied in Part I and examine general features of an eigenvalue
problem and an initial-value problem (IVP). In this context, Green’s functions
provide a powerful tool for solving SOLDEs. For a practical purpose, we deal
with actual construction of Green’s functions. In Sect. 8.8, we dealt with steady-
state characteristics of electromagnetic waves in dielectrics in terms of propagation,
reflection, and transmission. When we consider transient characteristics of electro-
magnetic and optical phenomena, we often need to deal with SOLDEs having
constant coefficients. This is well known in connection with a motion of a damped
harmonic oscillator. In the latter part of this chapter, we treat the initial-value
problem of a SOLDE of this type.

10.1 Second-Order Linear Differential Equations


(SOLDEs)

An n-th order linear differential equation has the following general form:

$$a_n(x)\frac{d^nu}{dx^n} + a_{n-1}(x)\frac{d^{n-1}u}{dx^{n-1}} + \cdots + a_1(x)\frac{du}{dx} + a_0(x)u = d(x). \tag{10.1}$$

Table 10.1 Characteristics of SOLDEs

                      Type I        Type II        Type III       Type IV
Equation              Homogeneous   Homogeneous    Inhomogeneous  Inhomogeneous
Boundary conditions   Homogeneous   Inhomogeneous  Homogeneous    Inhomogeneous

If d(x) = 0, the differential equation is said to be homogeneous; otherwise, it is


called inhomogeneous. Equation (10.1) is a linear function of u and its derivatives.
Likewise, we have a SOLDE such that

$$a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = d(x). \tag{10.2}$$

In (10.2) we assume that the variable x is real. The equation can be solved under
appropriate boundary conditions (BCs). A general form of BCs is described as

$$B_1(u) = \alpha_1u(a) + \beta_1\left.\frac{du}{dx}\right|_{x=a} + \gamma_1u(b) + \delta_1\left.\frac{du}{dx}\right|_{x=b} = \sigma_1, \tag{10.3}$$

$$B_2(u) = \alpha_2u(a) + \beta_2\left.\frac{du}{dx}\right|_{x=a} + \gamma_2u(b) + \delta_2\left.\frac{du}{dx}\right|_{x=b} = \sigma_2, \tag{10.4}$$

where α_1, β_1, γ_1, δ_1, σ_1, etc. are real constants; u(x) is defined in an interval [a, b], where a and b can be infinity (i.e., ±∞). The LHSs of B_1(u) and B_2(u) are referred to as boundary functionals [1, 2]. If σ_1 = σ_2 = 0, the BCs are called homogeneous; otherwise, the BCs are said to be inhomogeneous. In combination with the inhomogeneous equation expressed as (10.2), Table 10.1 summarizes the characteristics of SOLDEs. We have four types of SOLDEs according to the homogeneity and inhomogeneity of equations and BCs.
Even though SOLDEs are mathematically tractable, it is not necessarily easy to solve them, depending upon the nature of a(x), b(x), and c(x) of (10.2). Nonetheless, if those coefficients are constants, the equation can readily be solved. We will deal with SOLDEs of that type in detail later. Suppose that we find two linearly independent solutions u_1(x) and u_2(x) of the following homogeneous equation:

$$a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = 0. \tag{10.5}$$

Then, any solution u(x) of (10.5) can be expressed as their linear combination
such that

$$u(x) = c_1u_1(x) + c_2u_2(x), \tag{10.6}$$

where c1 and c2 are some arbitrary constants.


In general, suppose that there are arbitrarily chosen n functions, i.e., f_1(x), f_2(x), ⋯, f_n(x). Suppose the following equation with those functions:

$$a_1f_1(x) + a_2f_2(x) + \cdots + a_nf_n(x) = 0, \tag{10.7}$$

where a1, a2, ⋯, and an are constants. If a1 = a2 = ⋯ = an = 0, (10.7) always holds.


In this case, (10.7) is said to be a trivial linear relation. If f1(x), f2(x), ⋯, and fn(x)
satisfy a non-trivial linear relation, f1(x), f2(x), ⋯, and fn(x) are said to be linearly
dependent. That is, the non-trivial expression means that in (10.7) at least one of a1,
a2, ⋯, and an is non-zero. Suppose that an ≠ 0. Then, from (10.7), fn(x) is
expressed as

$$f_n(x) = -\frac{a_1}{a_n}f_1(x) - \frac{a_2}{a_n}f_2(x) - \cdots - \frac{a_{n-1}}{a_n}f_{n-1}(x). \tag{10.8}$$

If f1(x), f2(x), ⋯, and fn(x) are not linearly dependent, they are called linearly
independent. In other words, the statement that f1(x), f2(x), ⋯, and fn(x) are linearly
independent is equivalent to that (10.7) holds if and only if a1 = a2 = ⋯ = an = 0.
We will have relevant discussion in Part III.
Now suppose that with the above two linearly independent functions u1(x) and
u2(x), we have

$$a_1u_1(x) + a_2u_2(x) = 0. \tag{10.9}$$

Differentiating (10.9), we have

$$a_1\frac{du_1(x)}{dx} + a_2\frac{du_2(x)}{dx} = 0. \tag{10.10}$$

Expressing (10.9) and (10.10) in a matrix form, we get

$$\begin{pmatrix} u_1(x) & u_2(x) \\ du_1(x)/dx & du_2(x)/dx \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = 0. \tag{10.11}$$

Thus, that u1(x) and u2(x) are linearly independent is equivalent to that the
following expression holds:

$$\begin{vmatrix} u_1(x) & u_2(x) \\ du_1(x)/dx & du_2(x)/dx \end{vmatrix} \equiv W(u_1, u_2) \neq 0, \tag{10.12}$$

where W(u_1, u_2) is called the Wronskian of u_1(x) and u_2(x). In fact, if W(u_1, u_2) = 0, then we have

$$u_1\frac{du_2}{dx} - u_2\frac{du_1}{dx} = 0. \tag{10.13}$$

This implies that there is a functional relationship between u_1 and u_2. In fact, if we can express u_2 = u_2(u_1(x)), then du_2/dx = (du_2/du_1)(du_1/dx). That is,

$$u_1\frac{du_2}{dx} = u_1\frac{du_2}{du_1}\frac{du_1}{dx} = u_2\frac{du_1}{dx} \quad \text{or} \quad u_1\frac{du_2}{du_1} = u_2 \quad \text{or} \quad \frac{du_2}{u_2} = \frac{du_1}{u_1}, \tag{10.14}$$

where the second equality of the first equation comes from (10.13). The third equation can easily be integrated to yield

$$\ln\frac{u_2}{u_1} = c \quad \text{or} \quad \frac{u_2}{u_1} = e^c \quad \text{or} \quad u_2 = e^cu_1. \tag{10.15}$$

Equation (10.15) shows that u_1(x) and u_2(x) are linearly dependent. It is easy to show that if u_1(x) and u_2(x) are linearly dependent, W(u_1, u_2) = 0. Thus, we have the following statement:

Two functions are linearly dependent ⟺ W(u_1, u_2) = 0.

Then, as the contraposition of this statement we have

Two functions are linearly independent ⟺ W(u_1, u_2) ≠ 0.
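This criterion is easy to experiment with numerically. The following sympy sketch computes the Wronskian for an assumed pair of functions (the examples chosen here are for illustration only):

```python
# Testing linear independence via the Wronskian (10.12).
import sympy as sp

x = sp.symbols('x')
u1, u2 = sp.exp(sp.I * x), sp.exp(-sp.I * x)  # a fundamental set of u'' + u = 0

print(sp.simplify(sp.wronskian([u1, u2], x)))  # -> -2*I (nonzero: independent)
print(sp.simplify(sp.wronskian([sp.sin(x), 3 * sp.sin(x)], x)))  # -> 0 (dependent)
```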
Conversely, suppose that we have another solution u3(x) for (10.5) besides u1(x)
and u2(x). Then, we have

$$a(x)\frac{d^2u_1}{dx^2} + b(x)\frac{du_1}{dx} + c(x)u_1 = 0, \quad a(x)\frac{d^2u_2}{dx^2} + b(x)\frac{du_2}{dx} + c(x)u_2 = 0, \quad a(x)\frac{d^2u_3}{dx^2} + b(x)\frac{du_3}{dx} + c(x)u_3 = 0. \tag{10.16}$$

Again rewriting (10.16) in a matrix form, we have

$$\begin{pmatrix} d^2u_1/dx^2 & du_1/dx & u_1 \\ d^2u_2/dx^2 & du_2/dx & u_2 \\ d^2u_3/dx^2 & du_3/dx & u_3 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix} = 0. \tag{10.17}$$

A necessary and sufficient condition to obtain a non-trivial solution (i.e., a


solution besides a = b = c = 0) is that [1]

$$\begin{vmatrix} d^2u_1/dx^2 & du_1/dx & u_1 \\ d^2u_2/dx^2 & du_2/dx & u_2 \\ d^2u_3/dx^2 & du_3/dx & u_3 \end{vmatrix} = 0. \tag{10.18}$$

Note here that

$$\begin{vmatrix} d^2u_1/dx^2 & du_1/dx & u_1 \\ d^2u_2/dx^2 & du_2/dx & u_2 \\ d^2u_3/dx^2 & du_3/dx & u_3 \end{vmatrix} = -\begin{vmatrix} u_1 & u_2 & u_3 \\ du_1/dx & du_2/dx & du_3/dx \\ d^2u_1/dx^2 & d^2u_2/dx^2 & d^2u_3/dx^2 \end{vmatrix} \equiv -W(u_1, u_2, u_3), \tag{10.19}$$

where W(u1, u2, u3) is Wronskian of u1(x), u2(x), and u3(x). In the above relation, we
used the fact that a determinant of a matrix is identical to that of its transposed matrix
and that a determinant of a matrix changes the sign after permutation of row vectors.
To be short, a necessary and sufficient condition to get a non-trivial solution is that
W(u1, u2, u3) vanishes.
This implies that u1(x), u2(x), and u3(x) are linearly dependent. However, we have
assumed that u1(x) and u2(x) are linearly independent, and so (10.18) and (10.19)
mean that u3(x) must be described as a linear combination of u1(x) and u2(x). That is,
we have no third linearly independent solution. Consequently, the general solution
of (10.5) must be given by (10.6). In this sense, u1(x) and u2(x) are said to be a
fundamental set of solutions of (10.5).
Next, let us consider the inhomogeneous equation (10.2). Suppose that u_p(x) is a particular solution of (10.2). Let us think of the following function v(x) such that

$$u(x) = v(x) + u_p(x). \tag{10.20}$$

Substituting (10.20) into (10.2), we have

$$a(x)\frac{d^2v}{dx^2} + b(x)\frac{dv}{dx} + c(x)v + a(x)\frac{d^2u_p}{dx^2} + b(x)\frac{du_p}{dx} + c(x)u_p = d(x). \tag{10.21}$$

Therefore, we have

$$a(x)\frac{d^2v}{dx^2} + b(x)\frac{dv}{dx} + c(x)v = 0.$$

But v(x) can be described by a linear combination of u_1(x) and u_2(x) as in the case of (10.6). Hence, the general solution of (10.2) should be expressed as

$$u(x) = c_1u_1(x) + c_2u_2(x) + u_p(x), \tag{10.22}$$

where c1 and c2 are arbitrary (complex) constants.

10.2 First-Order Linear Differential Equations (FOLDEs)

In a discussion that follows, first-order linear differential equations (FOLDEs)


supply us with useful information. A general form of FOLDEs is expressed as

$$a(x)\frac{du}{dx} + b(x)u = d(x) \quad [a(x) \neq 0]. \tag{10.23}$$

An associated boundary condition is given by a boundary functional B(u) such that

$$B(u) = \alpha u(a) + \beta u(b) = \sigma, \tag{10.24}$$

where α, β, and σ are real constants; u(x) is defined in an interval [a, b]. If in (10.23) d(x) ≡ 0, (10.23) can readily be integrated to yield a solution. Let us multiply both sides of (10.23) by w(x). Then we have

$$w(x)a(x)\frac{du}{dx} + w(x)b(x)u = w(x)d(x). \tag{10.25}$$

Here, we define p(x) as follows:

$$p(x) \equiv w(x)a(x), \tag{10.26}$$

where w(x) is called a weight function. As mentioned in Sect. 2.4, the weight
function is a real and non-negative function within the domain considered. Here
we suppose that

$$\frac{dp(x)}{dx} = w(x)b(x). \tag{10.27}$$

Then, (10.25) can be rewritten as

$$\frac{d}{dx}[p(x)u] = w(x)d(x). \tag{10.28}$$

Thus, we can immediately integrate (10.28) to obtain a solution

$$u = \frac{1}{p(x)}\left[\int^xw(x')d(x')dx' + C\right], \tag{10.29}$$

where C is an arbitrary integration constant.


To seek w(x), from (10.26) and (10.27) we have

$$p' = (wa)' = wb = wa\cdot\frac{b}{a}. \tag{10.30}$$

This can easily be integrated for wa to be expressed as

$$wa = C'\exp\left(\int\frac{b}{a}dx\right) \quad \text{or} \quad w = \frac{C'}{a}\exp\left(\int\frac{b}{a}dx\right), \tag{10.31}$$

where C′ is an arbitrary integration constant. The quantity C′/a must be non-negative so that w can be non-negative.
Example 10.1 Let us think of the following FOLDE within an interval [a, b], i.e., a ≤ x ≤ b:

$$\frac{du}{dx} + xu = x. \tag{10.32}$$

A boundary condition is set such that

$$u(a) = \sigma. \tag{10.33}$$

Notice that (10.33) is obtained by setting α = 1 and β = 0 in (10.24). Following


the above argument, we obtain a solution described as

$$u(x) = \frac{1}{p(x)}\left[\int_a^xw(x')d(x')dx' + u(a)p(a)\right]. \tag{10.34}$$

Also, we have

$$p(x) = w(x) = \exp\left(\int_a^xx'dx'\right) = \exp\left[\frac{1}{2}\left(x^2 - a^2\right)\right]. \tag{10.35}$$

The integration of RHS of (10.34) can be performed as follows:


$$\int_a^xw(x')d(x')dx' = \int_a^xx'\exp\left[\frac{1}{2}\left(x'^2 - a^2\right)\right]dx' = \exp\left(-\frac{a^2}{2}\right)\int_{a^2/2}^{x^2/2}e^t\,dt = \exp\left(-\frac{a^2}{2}\right)\left[\exp\left(\frac{x^2}{2}\right) - \exp\left(\frac{a^2}{2}\right)\right] = \exp\left[\frac{1}{2}\left(x^2 - a^2\right)\right] - 1 = p(x) - 1, \tag{10.36}$$

where with the second equality we used integration by substitution, putting x′²/2 ⟶ t. Considering (10.33) and putting p(a) = 1 in (10.34), (10.34) is rewritten as

$$u(x) = \frac{1}{p(x)}[p(x) - 1 + \sigma] = 1 + \frac{\sigma - 1}{p(x)} = 1 + (\sigma - 1)\exp\left[\frac{1}{2}\left(a^2 - x^2\right)\right], \tag{10.37}$$
2

where with the last equality we used (10.35).


Notice that when we put σ = 1 in (10.37), we get u(x) ≡ 1. In fact, u(x) ≡ 1 is certainly a solution of (10.32) and satisfies the BC of (10.33). Uniqueness of the solution is thus guaranteed.
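As a cross-check, the closed form (10.37) can be compared with a general-purpose ODE solver. This is a minimal sympy sketch, assuming nothing beyond the FOLDE (10.32) and the BC (10.33):

```python
# Verify (10.37): solve du/dx + x*u = x with u(a) = sigma and compare.
import sympy as sp

x, a, sigma = sp.symbols('x a sigma')
u = sp.Function('u')

sol = sp.dsolve(sp.Eq(u(x).diff(x) + x * u(x), x), u(x), ics={u(a): sigma})
closed_form = 1 + (sigma - 1) * sp.exp((a**2 - x**2) / 2)  # Eq. (10.37)

print(sp.simplify(sol.rhs - closed_form))  # -> 0
```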
In the next two examples, we make general discussions.
Example 10.2 Let us consider the following differential operator:

$$L_x = \frac{d}{dx}. \tag{10.38}$$

We think of the following identity obtained by integration by parts:

$$\int_a^bdx\,\varphi^*\frac{d\psi}{dx} + \int_a^bdx\left(\frac{d\varphi}{dx}\right)^*\psi = [\varphi^*\psi]_a^b. \tag{10.39}$$

Rewriting this, we get

$$\int_a^bdx\,\varphi^*\frac{d\psi}{dx} - \int_a^bdx\left(-\frac{d\varphi}{dx}\right)^*\psi = [\varphi^*\psi]_a^b. \tag{10.40}$$

Looking at (10.40), we notice that LHS comprises a difference between two integrals, while RHS, referred to as a boundary term (or surface term), does not contain an integral. Recalling the expression (1.128) and defining

$$\tilde{L}_x \equiv -\frac{d}{dx}$$

in (10.40), we have

$$\langle\varphi|L_x\psi\rangle - \langle\tilde{L}_x\varphi|\psi\rangle = [\varphi^*\psi]_a^b. \tag{10.41}$$

Here, RHS of (10.41) needs to vanish so that we can have

$$\langle\varphi|L_x\psi\rangle = \langle\tilde{L}_x\varphi|\psi\rangle. \tag{10.42}$$

Meanwhile, adopting the expression (1.112) with respect to an adjoint operator, we have the following expression such that

$$\langle\varphi|L_x\psi\rangle = \langle L_x^\dagger\varphi|\psi\rangle. \tag{10.43}$$

Comparing (10.42) and (10.43) and considering that φ and ψ are arbitrary functions, we have L̃_x = L_x†. Thus, as an operator adjoint to L_x we get

$$L_x^\dagger = \tilde{L}_x = -\frac{d}{dx} = -L_x.$$

Notice that only if the surface term vanishes can the adjoint operator L_x† appropriately be defined. We will encounter a similar expression again in Part III.
We add that if with an operator A we have a relation described by

$$A^\dagger = -A, \tag{10.44}$$

the operator A is said to be anti-Hermitian. We have already encountered such an


operator in Sects. 1.5 and 3.7.

Let us then examine on what condition the surface term vanishes. The RHS of (10.40) and (10.41) is given by

$$\varphi^*(b)\psi(b) - \varphi^*(a)\psi(a).$$

For this term to vanish, we should have

$$\varphi^*(b)\psi(b) = \varphi^*(a)\psi(a) \quad \text{or} \quad \frac{\varphi^*(b)}{\varphi^*(a)} = \frac{\psi(a)}{\psi(b)}.$$

If ψ(b) = 2ψ(a), then we should have φ(b) = φ(a)/2 for the surface term to vanish. Recalling (10.24), the above conditions are expressed as

$$B(\psi) = 2\psi(a) - \psi(b) = 0, \tag{10.45}$$

$$B'(\varphi) = \frac{1}{2}\varphi(a) - \varphi(b) = 0. \tag{10.46}$$

The boundary functional B′(φ) is said to be adjoint to B(ψ). The two boundary
functionals are admittedly different. If, however, we set ψ(b) = ψ(a), then we should
have φ(b) = φ(a) for the surface term to vanish. That is

$$B(\psi) = \psi(a) - \psi(b) = 0, \tag{10.47}$$

$$B'(\varphi) = \varphi(a) - \varphi(b) = 0. \tag{10.48}$$

Thus, the two functionals are identical and ψ and φ satisfy homogeneous BCs
with respect to these functionals.
As discussed above, a FOLDE is characterized by its differential operator as well
as a BC (or boundary functional). This is similarly the case with SOLDEs as well.
Example 10.3 Next, let us consider the following differential operator:

$$L_x = \frac{1}{i}\frac{d}{dx}. \tag{10.49}$$

As in the case of Example 10.2, we have

$$\int_a^bdx\,\varphi^*\frac{1}{i}\frac{d\psi}{dx} - \int_a^bdx\left(\frac{1}{i}\frac{d\varphi}{dx}\right)^*\psi = \frac{1}{i}[\varphi^*\psi]_a^b. \tag{10.50}$$

Also rewriting (10.50) using an inner product notation, we get



$$\langle\varphi|L_x\psi\rangle - \langle L_x\varphi|\psi\rangle = \frac{1}{i}[\varphi^*\psi]_a^b. \tag{10.51}$$

Apart from the factor 1/i, RHS of (10.51) is again given by

$$\varphi^*(b)\psi(b) - \varphi^*(a)\psi(a).$$

Repeating a discussion similar to Example 10.2, when the surface term vanishes,
we get

$$\langle\varphi|L_x\psi\rangle = \langle L_x\varphi|\psi\rangle. \tag{10.52}$$

Comparing (10.43) and (10.52), we have

$$\langle L_x\varphi|\psi\rangle = \langle L_x^\dagger\varphi|\psi\rangle. \tag{10.53}$$

Again considering that φ and ψ are arbitrary functions, we get

$$L_x^\dagger = L_x. \tag{10.54}$$

As in (10.54), if the differential operator is identical to its adjoint operator, such


an operator is called self-adjoint. On the basis of (1.119), Lx would apparently be
Hermitian. However, we have to be careful to assure that Lx is Hermitian. For a
differential operator to be Hermitian, (1) the said operator must be self-adjoint.
(2) The two boundary functionals adjoint to each other must be identical. In other
words, ψ and φ must satisfy the same homogeneous BCs with respect to these
functionals. In this example, we must have the same boundary functionals as those
described by (10.47) and (10.48). If and only if the conditions (1) and (2) are
satisfied, the operator is said to be Hermitian. This may seem a somewhat formal statement. Nonetheless, these conditions must likewise be satisfied for second-order differential operators so that these operators can be Hermitian. In fact, the SOLDEs we studied in Part I are essentially dealt with within the framework of the aforementioned formalism.
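The Hermiticity of (10.49) under matching homogeneous BCs can also be probed symbolically. A minimal sympy sketch, with sample periodic functions on [0, 2π] chosen purely for illustration:

```python
# Check <phi|L psi> = <L phi|psi> for L = (1/i) d/dx under periodic BCs.
import sympy as sp

x = sp.symbols('x', real=True)
a, b = 0, 2 * sp.pi
phi, psi = sp.exp(2 * sp.I * x), sp.sin(x)  # both satisfy f(a) = f(b)

L = lambda f: f.diff(x) / sp.I

lhs = sp.integrate(sp.conjugate(phi) * L(psi), (x, a, b))
rhs = sp.integrate(sp.conjugate(L(phi)) * psi, (x, a, b))
print(sp.simplify(lhs - rhs))  # -> 0, consistent with (10.52)
```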

10.3 Second-Order Differential Operators

The second-order differential operators are the most common operators in mathematical physics and are frequently treated there. The general differential operators are described as

$$L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x), \tag{10.55}$$

where a(x), b(x), and c(x) can in general be complex functions of a real variable x.
Let us think of the following identities [1]:

$$v^*\left(a\frac{d^2u}{dx^2}\right) - u\frac{d^2(av^*)}{dx^2} = \frac{d}{dx}\left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx}\right], \quad v^*\left(b\frac{du}{dx}\right) + u\frac{d(bv^*)}{dx} = \frac{d}{dx}[buv^*], \quad v^*cu - ucv^* = 0. \tag{10.56}$$

Summing both sides of (10.56), we have an identity

$$v^*\left(a\frac{d^2u}{dx^2} + b\frac{du}{dx} + cu\right) - u\left[\frac{d^2(av^*)}{dx^2} - \frac{d(bv^*)}{dx} + cv^*\right] = \frac{d}{dx}\left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx}\right] + \frac{d}{dx}[buv^*]. \tag{10.57}$$

Hence, following the expressions of Sect. 10.2, we define L_x† such that

$$\left(L_x^\dagger v\right)^* \equiv \frac{d^2(av^*)}{dx^2} - \frac{d(bv^*)}{dx} + cv^*. \tag{10.58}$$

Taking a complex conjugate of both sides, we get

$$L_x^\dagger v = \frac{d^2(a^*v)}{dx^2} - \frac{d(b^*v)}{dx} + c^*v. \tag{10.59}$$

Considering the differentiation of a product function, we have as L_x†

$$L_x^\dagger = a^*\frac{d^2}{dx^2} + \left(2\frac{da^*}{dx} - b^*\right)\frac{d}{dx} + \left(\frac{d^2a^*}{dx^2} - \frac{db^*}{dx} + c^*\right). \tag{10.60}$$

Rewriting (10.57) by means of (10.55) and (10.59), we have

$$v^*(L_xu) - \left(L_x^\dagger v\right)^*u = \frac{d}{dx}\left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]. \tag{10.61}$$

Assuming that the relevant SOLDE is defined in [r, s] and integrating (10.61)
within that interval, we get

$$\int_r^sdx\left[v^*(L_xu) - \left(L_x^\dagger v\right)^*u\right] = \left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]_r^s. \tag{10.62}$$

Using the definition of an inner product described in (1.128) and rewriting (10.62), we have

$$\langle v|L_xu\rangle - \langle L_x^\dagger v|u\rangle = \left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]_r^s.$$

Here if RHS of the above (i.e., the surface term of the above expression) vanishes, we get

$$\langle v|L_xu\rangle = \langle L_x^\dagger v|u\rangle.$$

We find that this notation is consistent with (1.112). Bearing in mind this situation, let us seek a condition under which the differential operator L_x is Hermitian. Suppose here that a(x), b(x), and c(x) are all real and that

$$\frac{da(x)}{dx} = b(x). \tag{10.63}$$

Then, instead of (10.60), we have

$$L_x^\dagger = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x) = L_x.$$

Thus, we are successful in constituting a self-adjoint operator L_x. In that case, (10.62) can be rewritten as

$$\int_r^sdx\{v^*(L_xu) - [L_xv]^*u\} = \left[a\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]_r^s. \tag{10.64}$$

Notice that b(x) is eliminated from (10.64). If RHS of (10.64) vanishes, we get

$$\langle v|L_xu\rangle = \langle L_xv|u\rangle.$$

This notation is consistent with (1.119), and the Hermiticity of L_x becomes well-defined.
If we do not have the condition da(x)/dx = b(x), how can we deal with the problem? The answer is that following the procedures in Sect. 10.2, we can convert L_x to a self-adjoint operator by multiplying L_x by a weight function w(x) introduced in (10.26), (10.27), and (10.31). Replacing a(x), b(x), and c(x) with w(x)a(x), w(x)b(x), and w(x)c(x), respectively, in the identity (10.57), we rewrite (10.57) as

$$v^*\left(aw\frac{d^2u}{dx^2} + bw\frac{du}{dx} + cwu\right) - u\left[\frac{d^2(awv^*)}{dx^2} - \frac{d(bwv^*)}{dx} + cwv^*\right] = \frac{d}{dx}\left[awv^*\frac{du}{dx} - u\frac{d(awv^*)}{dx}\right] + \frac{d}{dx}[bwuv^*]. \tag{10.65}$$

Let us calculate the bracketed part of the second term of LHS of (10.65). Using the conditions of (10.26) and (10.27), we have

$$\begin{aligned}
\frac{d^2(awv^*)}{dx^2} - \frac{d(bwv^*)}{dx} + cwv^* &= \left[(aw)'v^* + aw(v^*)'\right]' - \left[(bw)'v^* + bw(v^*)'\right] + cwv^*\\
&= \left[bwv^* + aw(v^*)'\right]' - (bw)'v^* - bw(v^*)' + cwv^*\\
&= (bw)'v^* + bw(v^*)' + (aw)'(v^*)' + aw(v^*)'' - (bw)'v^* - bw(v^*)' + cwv^*\\
&= (aw)'(v^*)' + aw(v^*)'' + cwv^* = bw(v^*)' + aw(v^*)'' + cwv^*\\
&= w\left[a(v^*)'' + b(v^*)' + cv^*\right] = w\left(a^*v'' + b^*v' + c^*v\right)^*\\
&= w\left(av'' + bv' + cv\right)^*. \tag{10.66}
\end{aligned}$$

The second last equality of (10.66) is based on the assumption that a(x), b(x), and c(x) are real functions. Meanwhile, for RHS of (10.65) we have

$$\begin{aligned}
\frac{d}{dx}\left[awv^*\frac{du}{dx} - u\frac{d(awv^*)}{dx}\right] + \frac{d}{dx}[bwuv^*] &= \frac{d}{dx}\left[awv^*\frac{du}{dx} - u(aw)'v^* - uaw\frac{dv^*}{dx}\right] + \frac{d}{dx}[bwuv^*]\\
&= \frac{d}{dx}\left[awv^*\frac{du}{dx} - ubwv^* - uaw\frac{dv^*}{dx}\right] + \frac{d}{dx}[bwuv^*]\\
&= \frac{d}{dx}\left[aw\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right] = \frac{d}{dx}\left[p\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]. \tag{10.67}
\end{aligned}$$

With the last equality of (10.67), we used (10.26). Using (10.66) and (10.67), we rewrite (10.65) once again as

$$v^*w\left(a\frac{d^2u}{dx^2} + b\frac{du}{dx} + cu\right) - uw\left(a\frac{d^2v}{dx^2} + b\frac{dv}{dx} + cv\right)^* = \frac{d}{dx}\left[p\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]. \tag{10.68}$$

Then, integrating (10.68) from r to s, we finally get



$$\int_r^sdx\,w(x)\{v^*(L_xu) - [L_xv]^*u\} = \left[p\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]_r^s. \tag{10.69}$$

The relations (10.69) along with (10.62) are called the generalized Green’s
identity.
We emphasize that as far as the coefficients a(x), b(x), and c(x) in (10.55) are real
functions, the associated differential operator Lx can be converted to a self-adjoint
form following the procedures of (10.66) and (10.67).
In the above, LHS of the original homogeneous differential equation (10.5) is
rewritten as

$$a(x)w(x)\frac{d^2u}{dx^2} + b(x)w(x)\frac{du}{dx} + c(x)w(x)u = \frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + cw(x)u.$$

Rewriting this, we have

$$L_xu = \frac{1}{w(x)}\frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + cu \quad [w(x) > 0], \tag{10.70}$$

where we have

$$p(x) = a(x)w(x) \quad \text{and} \quad \frac{dp(x)}{dx} = b(x)w(x). \tag{10.71}$$

The latter equation of (10.71) corresponds to (10.63) if we assume w(x) ≡ 1. When the differential operator L_x is defined as (10.70), L_x is said to be self-adjoint with respect to the weight function w(x).
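The conversion prescription of (10.26)-(10.31) is mechanical and lends itself to symbolic computation. The following sympy sketch carries it out for an assumed pair of coefficients (a = 1, b = -2x, the Hermite case, chosen for illustration only):

```python
# Convert L = a d^2/dx^2 + b d/dx + c to self-adjoint form via (10.26)-(10.31).
import sympy as sp

x = sp.symbols('x')
a_, b_ = sp.Integer(1), -2 * x  # assumed coefficients (Hermite's equation)

w = sp.exp(sp.integrate(b_ / a_, x)) / a_  # w = (C'/a) exp(int b/a dx), C' = 1
p = sp.simplify(a_ * w)                    # p = a*w, per (10.26)

print(p)                                # -> exp(-x**2)
print(sp.simplify(p.diff(x) - b_ * w))  # -> 0, i.e., p' = b*w as in (10.71)
```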
Now we examine boundary functionals. The homogeneous adjoint boundary
functionals are described as follows:

$$B_1^\dagger(v) = \alpha_1v^*(r) + \beta_1\left.\frac{dv^*}{dx}\right|_{x=r} + \gamma_1v^*(s) + \delta_1\left.\frac{dv^*}{dx}\right|_{x=s} = 0, \tag{10.72}$$

$$B_2^\dagger(v) = \alpha_2v^*(r) + \beta_2\left.\frac{dv^*}{dx}\right|_{x=r} + \gamma_2v^*(s) + \delta_2\left.\frac{dv^*}{dx}\right|_{x=s} = 0. \tag{10.73}$$

In (10.3) putting α1 = 1 and β1 = γ 1 = δ1 = 0, we have

$$B_1(u) = u(r) = \sigma_1. \tag{10.74}$$

Also putting γ_2 = 1 and α_2 = β_2 = δ_2 = 0, we have

$$B_2(u) = u(s) = \sigma_2. \tag{10.75}$$

Further putting

$$\sigma_1 = \sigma_2 = 0, \tag{10.76}$$

we also get homogeneous BCs of

$$B_1(u) = B_2(u) = 0; \quad \text{i.e.,} \quad u(r) = u(s) = 0. \tag{10.77}$$

For RHS of (10.69) to vanish, it suffices to define B_1†(v) and B_2†(v) such that

$$B_1^\dagger(v) = v^*(r) \quad \text{and} \quad B_2^\dagger(v) = v^*(s). \tag{10.78}$$

Then, the homogeneous adjoint BCs read as

$$v^*(r) = v^*(s) = 0; \quad \text{i.e.,} \quad v(r) = v(s) = 0. \tag{10.79}$$

In this manner, we can readily construct the homogeneous adjoint BCs the same
as those of (10.77) so that Lx can be Hermitian.
We list several prescriptions of typical BCs below.

$$\begin{aligned}
&(1)\ u(r) = u(s) = 0 \quad \text{(Dirichlet conditions)},\\
&(2)\ \left.\frac{du}{dx}\right|_{x=r} = \left.\frac{du}{dx}\right|_{x=s} = 0 \quad \text{(Neumann conditions)},\\
&(3)\ u(r) = u(s)\ \text{and}\ \left.\frac{du}{dx}\right|_{x=r} = \left.\frac{du}{dx}\right|_{x=s} \quad \text{(periodic conditions)}. \tag{10.80}
\end{aligned}$$

Yet, care should be taken when handling RHS of (10.69), i.e., the surface terms. It
is because conditions (1)–(3) are not necessary but sufficient conditions for the
surface terms to vanish. Such conditions are not limited to them. Meanwhile, we
often have to deal with the non-vanishing surface terms. In that case, we have to start
with (10.62) instead of (10.69).
In Sect. 10.2, we mentioned the definition of Hermiticity of the differential
operator in such a way that the said operator is self-adjoint and that homogeneous
BCs and homogeneous adjoint BCs are the same. In light of the above argument,
however, we may relax the conditions for a differential operator to be Hermitian.
This is particularly the case when p(x) = a(x)w(x) in (10.69) vanishes at both the
endpoints. We will encounter such a situation in Sect. 10.7.
10.4 Green’s Functions 417

10.4 Green’s Functions

Having finished the above discussions, let us proceed to the study of Green's functions for SOLDEs. Though kept to a minimum, a bit of formalism has to be mentioned.
Given Lx defined by (10.55), let us assume

$$L_xu(x) = d(x) \tag{10.81}$$

under homogeneous BCs with an inhomogeneous term d(x) being an arbitrary


function. We also assume that (10.81) is well-defined in a domain [r, s]. The
numbers r and s can be infinity. Suppose simultaneously that we have

$$L_x^\dagger v(x) = h(x) \tag{10.82}$$

under homogeneous adjoint BCs [1, 2] with an inhomogeneous term h(x) being an
arbitrary function as well.
Let us describe the above relations as

$$L|u\rangle = |d\rangle \quad \text{and} \quad L^\dagger|v\rangle = |h\rangle. \tag{10.83}$$

Suppose that there is an inverse operator L^{-1} ≡ G such that

$$GL = LG = E, \tag{10.84}$$

where E is an identity operator. Operating with G on (10.83), we have

$$GL|u\rangle = E|u\rangle = |u\rangle = G|d\rangle. \tag{10.85}$$

This implies that (10.81) has been solved and the solution is given by G|d⟩. Since
an inverse operation to differentiation is integration, G is expected to be an integral
operator.
We have

$$\langle x|LG|y\rangle = L_x\langle x|G|y\rangle = L_xG(x, y). \tag{10.86}$$

Meanwhile, using (10.84) we get

$$\langle x|LG|y\rangle = \langle x|E|y\rangle = \langle x|y\rangle. \tag{10.87}$$

Using a weight function w(x), we generalize the inner product of (1.128) such that

$$\langle g|f\rangle \equiv \int_r^sw(x)g(x)^*f(x)dx. \tag{10.88}$$

[Fig. 10.1 Function |f⟩ and its coordinate representation f(x)]

As we expand an arbitrary vector using basis vectors, we "expand" an arbitrary function |f⟩ using basis vectors |x⟩. Here, we are treating real numbers as if they formed continuous, innumerable basis vectors on a real number line (see Fig. 10.1). Thus, we could expand |f⟩ in terms of |x⟩ such that

$$|f\rangle = \int_r^sdx\,w(x)f(x)|x\rangle. \tag{10.89}$$

In (10.89) we considered f(x) as if it were an expansion coefficient. The following notation would be reasonable accordingly:

$$f(x) \equiv \langle x|f\rangle. \tag{10.90}$$

In (10.90), f(x) can be viewed as the coordinate representation of |f⟩. Thus, from (10.89) we get

$$\langle x'|f\rangle = f(x') = \int_r^sdx\,w(x)f(x)\langle x'|x\rangle. \tag{10.91}$$

Alternatively, we have

$$f(x') = \int_r^sdx\,f(x)\delta(x - x'). \tag{10.92}$$

This comes from a property of the δ function [1] described as

$$\int_r^sdx\,f(x)\delta(x) = f(0). \tag{10.93}$$

Comparing (10.91) and (10.92), we have


10.4 Green’s Functions 419

$$w(x)\langle x'|x\rangle = \delta(x - x') \quad \text{or} \quad \langle x'|x\rangle = \frac{\delta(x - x')}{w(x)} = \frac{\delta(x' - x)}{w(x)}. \tag{10.94}$$

Thus, comparing (10.86) and (10.87) and using (10.94), we get

$$L_xG(x, y) = \frac{\delta(x - y)}{w(x)}. \tag{10.95}$$

In a similar manner, we also have

$$L_x^\dagger g(x, y) = \frac{\delta(x - y)}{w(x)}. \tag{10.96}$$

To arrive at (10.96), we start the discussion by assuming an operator (L_x†)^{-1} such that (L_x†)^{-1} ≡ g with gL† = L†g = E.
The function G(x, y) is called a Green’s function and g(x, y) is said to be an adjoint
Green’s function. Handling of Green’s functions and adjoint Green’s functions is
based upon (10.95) and (10.96), respectively. As (10.81) is defined in a domain
r ≤ x ≤ s, (10.95) and (10.96) are defined in a domain r ≤ x ≤ s and r ≤ y ≤ s. Notice
that except for the point x = y we have

$$L_xG(x, y) = 0 \quad \text{and} \quad L_x^\dagger g(x, y) = 0. \tag{10.97}$$

That is, G(x, y) and g(x, y) satisfy the homogeneous equation with respect to the
variable x. Accordingly, we require G(x, y) and g(x, y) to satisfy the same homoge-
neous BCs with respect to the variable x as those imposed upon u(x) and v(x) of
(10.81) and (10.82), respectively [1].
The relation (10.88) can be obtained as follows: Operating with ⟨g| on (10.89), we have

$$\langle g|f\rangle = \int_r^sdx\,w(x)f(x)\langle g|x\rangle = \int_r^sw(x)g(x)^*f(x)dx, \tag{10.98}$$

where for the last equality we used

$$\langle g|x\rangle = \langle x|g\rangle^* = g(x)^*. \tag{10.99}$$

For this, see (1.113) where A is replaced with an identity operator E with regard to
a complex conjugate of an inner product of two vectors. Also see (13.2) of Sect.
13.1.
If in (10.69) the surface term (i.e., RHS) vanishes under appropriate conditions
e.g., (10.80), we have

$$\int_r^sdx\,w(x)\left[v^*(L_xu) - \left(L_x^\dagger v\right)^*u\right] = 0, \tag{10.100}$$

which is called Green’s identity. Since (10.100) is derived from identities (10.56),
(10.100) is an identity as well (as a terminology of Green’s identity shows).
Therefore, (10.100) must hold with any functions u and v so far as they satisfy
homogeneous BCs. Thus, replacing v in (10.100) with g(x, y) and using (10.96)
together with (10.81), we have
$$\begin{aligned}
&\int_r^sdx\,w(x)\left\{g^*(x, y)[L_xu(x)] - \left[L_x^\dagger g(x, y)\right]^*u(x)\right\}\\
&= \int_r^sdx\,w(x)\left[g^*(x, y)d(x) - \frac{\delta(x - y)}{w(x)}u(x)\right]\\
&= \int_r^sdx\,w(x)g^*(x, y)d(x) - u(y) = 0, \tag{10.101}
\end{aligned}$$

where with the second last equality we used a property of the δ function. Also notice that δ(x - y)/w(x) is a real function. Rewriting (10.101), we get

$$u(y) = \int_r^sdx\,w(x)g^*(x, y)d(x). \tag{10.102}$$

Similarly, replacing u in (10.100) with G(x, y) and using (10.95) together with (10.82), we have

$$v(y) = \int_r^sdx\,w(x)G^*(x, y)h(x). \tag{10.103}$$

Replacing u and v in (10.100) with G(x, q) and g(x, t), respectively, we have
$$\int_r^sdx\,w(x)\left\{g^*(x, t)[L_xG(x, q)] - \left[L_x^\dagger g(x, t)\right]^*G(x, q)\right\} = 0. \tag{10.104}$$

Notice that we have chosen q and t for the second argument y in (10.95) and (10.96), respectively. Inserting (10.95) and (10.96) into the above equation after changing arguments, we have

$$\int_r^sdx\,w(x)\left[g^*(x, t)\frac{\delta(x - q)}{w(x)} - \frac{\delta(x - t)}{w(x)}G(x, q)\right] = 0. \tag{10.105}$$

Thus, we get
10.4 Green’s Functions 421

$$g^*(q, t) = G(t, q) \quad \text{or} \quad g(q, t) = G^*(t, q). \tag{10.106}$$

This implies that G(t, q) must satisfy the adjoint BCs with respect to the second
argument q. Inserting (10.106) into (10.102), we get
$$u(y) = \int_r^sdx\,w(x)G(y, x)d(x). \tag{10.107}$$

Or exchanging the arguments x and y, we have

$$u(x) = \int_r^sdy\,w(y)G(x, y)d(y). \tag{10.108}$$

Similarly, substituting (10.106) into (10.103) we get

$$v(y) = \int_r^sdx\,w(x)g(y, x)h(x). \tag{10.109}$$

Or, we have

$$v(x) = \int_r^sdy\,w(y)g(x, y)h(y). \tag{10.110}$$

Equations (10.107)-(10.110) clearly show that the homogeneous equations [given by putting d(x) = h(x) = 0] have only the trivial solutions u(x) ≡ 0 and v(x) ≡ 0 under homogeneous BCs. Note that this is always the case when we are able to construct a Green's function. This in turn implies that we can construct a Green's function if the differential operator is accompanied by initial conditions. Conversely, if the homogeneous equation has a non-trivial solution under homogeneous BCs, (10.107)-(10.110) will not work.
If the differential operator L in (10.81) is Hermitian, according to the associated remarks of Sect. 10.2 we must have L_x = L_x†, and u(x) and v(x) of (10.81) and (10.82) must satisfy the same homogeneous BCs. Consequently, in the case of a Hermitian operator we should have

$$G(x, y) = g(x, y). \tag{10.111}$$

From (10.106) and (10.111), if the operator is Hermitian we get

$$G(x, y) = G^*(y, x). \tag{10.112}$$

In Sect. 10.3 we assumed that the coefficients a(x), b(x), and c(x) are real to assure that L_x is Hermitian [1]. On this condition G(x, y) is real as well (vide infra). Then we have

$$G(x, y) = G(y, x). \tag{10.113}$$

That is, G(x, y) is real symmetric with respect to the arguments x and y.
To be able to apply Green’s functions to practical use, we will have to estimate a
behavior of the Green’s function near x = y. This is because in light of (10.95) and
(10.96) there is a “jump” at x = y.
When we deal with a case where a self-adjoint operator is relevant, using a
function p(x) of (10.69) we have

$$\frac{a(x)}{p(x)}\frac{\partial}{\partial x}\left[p\frac{\partial G}{\partial x}\right] + c(x)G = \frac{\delta(x - y)}{w(x)}. \tag{10.114}$$

Multiplying both sides by p(x)/a(x), we have

$$\frac{\partial}{\partial x}\left[p\frac{\partial G}{\partial x}\right] = \frac{p(x)}{a(x)}\frac{\delta(x - y)}{w(x)} - \frac{p(x)c(x)}{a(x)}G(x, y). \tag{10.115}$$

Using a property of the δ function expressed by

$$f(x)\delta(x) = f(0)\delta(x) \quad \text{or} \quad f(x)\delta(x - y) = f(y)\delta(x - y), \tag{10.116}$$

we have

$$\frac{\partial}{\partial x}\left[p\frac{\partial G}{\partial x}\right] = \frac{p(y)}{a(y)}\frac{\delta(x - y)}{w(y)} - \frac{p(x)c(x)}{a(x)}G(x, y). \tag{10.117}$$

Integrating (10.117) with respect to x, we get

$$p\frac{\partial G(x, y)}{\partial x} = \frac{p(y)}{a(y)w(y)}\theta(x - y) - \int_r^xdt\,\frac{p(t)c(t)}{a(t)}G(t, y) + C, \tag{10.118}$$

where C is a constant. The function θ(x - y) is defined by

$$\theta(x) = \begin{cases} 1 & (x > 0), \\ 0 & (x < 0). \end{cases} \tag{10.119}$$

Note that we have

$$\frac{d\theta(x)}{dx} = \delta(x). \tag{10.120}$$

The function θ(x) is called a step function or Heaviside step function. In RHS of
(10.118) the first term has a discontinuity at x = y because of θ(x - y), whereas the
second term is continuous with respect to y. Thus, we have
10.4 Green’s Functions 423

$$\lim_{\varepsilon\to+0}\left[p(y + \varepsilon)\left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y+\varepsilon} - p(y - \varepsilon)\left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y-\varepsilon}\right] = \lim_{\varepsilon\to+0}\frac{p(y)}{a(y)w(y)}[\theta(+\varepsilon) - \theta(-\varepsilon)] = \frac{p(y)}{a(y)w(y)}. \tag{10.121}$$

Since p(y) is continuous with respect to the argument y, this factor drops off and we get

$$\lim_{\varepsilon\to+0}\left[\left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y+\varepsilon} - \left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y-\varepsilon}\right] = \frac{1}{a(y)w(y)}. \tag{10.122}$$

Thus, ∂G(x, y)/∂x has a discontinuity at x = y of magnitude 1/[a(y)w(y)].
Since RHS of (10.122) is continuous with respect to the argument y, integrating
(10.122) again with respect to x, we find that G(x, y) is continuous at x = y. These
properties of G(x, y) are useful to calculate Green’s functions in practical use. We
will encounter several examples in next sections.
Suppose that there are two Green's functions that satisfy the same homogeneous BCs. Let G(x, y) and G̃(x, y) be such functions. Then, we must have

$$L_xG(x, y) = \frac{\delta(x - y)}{w(x)} \quad \text{and} \quad L_x\tilde{G}(x, y) = \frac{\delta(x - y)}{w(x)}. \tag{10.123}$$

Subtracting both sides of (10.123), we have

$$L_x\left[\tilde{G}(x, y) - G(x, y)\right] = 0. \tag{10.124}$$

In virtue of the linearity of BCs, G(x, y) - G̃(x, y) must satisfy the same homogeneous BCs as well. But (10.124) is a homogeneous equation, and so we must have a trivial solution from the aforementioned constructability of the Green's function such that

$$G(x, y) - \tilde{G}(x, y) \equiv 0 \quad \text{or} \quad G(x, y) \equiv \tilde{G}(x, y). \tag{10.125}$$

This obviously indicates that a Green's function should be unique.
We have assumed in Sect. 10.3 that the coefficients a(x), b(x), and c(x) are real. Therefore, taking the complex conjugate of (10.95) we have

$$L_xG(x, y)^* = \frac{\delta(x - y)}{w(x)}. \tag{10.126}$$

Notice here that both δ(x - y) and w(x) are real functions. Subtracting (10.95)
from (10.126), we have

$$L_x[G(x, y)^* - G(x, y)] = 0.$$

Again, from the uniqueness of the Green's function, we get G(x, y)* = G(x, y); i.e., G(x, y) is real accordingly. This is independent of the specific structure of L_x. In other words, so far as we are dealing with real coefficients a(x), b(x), and c(x), G(x, y) is real whether or not L_x is self-adjoint.

10.5 Construction of Green’s Functions

So far we dealt with homogeneous boundary conditions (BCs) with respect to a


differential equation

$$a(x)\frac{d^2u}{dx^2} + b(x)\frac{du}{dx} + c(x)u = d(x), \tag{10.2}$$

where the coefficients a(x), b(x), and c(x) are real. In this case, if d(x) ≡ 0 in (10.108), namely, the SOLDE is a homogeneous equation, we have a solution u(x) ≡ 0 under homogeneous boundary conditions on the basis of (10.108). If, however, we have inhomogeneous boundary conditions (BCs), additional terms appear on RHS of (10.108) in both the cases of homogeneous and inhomogeneous equations. In this section, we examine how we can deal with this problem.
Following the remarks made in Sect. 10.3, we start with (10.62) or (10.69). If we
deal with a self-adjoint or Hermitian operator, we can apply (10.69) to the problem.
In a more general case where the operator is not self-adjoint, (10.62) is useful. In this
respect, in Sect. 10.6 we have a good opportunity for this.
In Sect. 10.3, we mentioned that we may relax the definition of Hermiticity of the
differential operator in the case where the surface term vanishes. Meanwhile, we
should bear in mind that the Green’s functions and adjoint Green’s functions are
constructed using homogeneous BCs regardless of whether we are concerned with a
homogeneous equation or inhomogeneous equation. Thus, even if the surface terms
do not vanish, we may regard the differential operator as Hermitian. This is because
we deal with essentially the same Green’s function to solve a problem with both the
cases of homogeneous equation and inhomogeneous equations (vide infra). Notice
also that whether or not RHS vanishes, we are to use the same Green’s function
[1]. In this sense, we do not have to be too strict with the definition of Hermiticity.
Now, suppose that for a differential (self-adjoint) operator Lx we are given
s
$$\int_r^sdx\,w(x)\{v^*(L_xu) - [L_xv]^*u\} = \left[p(x)\left(v^*\frac{du}{dx} - u\frac{dv^*}{dx}\right)\right]_r^s, \tag{10.69}$$
10.5 Construction of Green’s Functions 425

where w(x) > 0 in a domain [r, s] and p(x) is a real function. Note that since (10.69) is
an identity, we may choose G(x, y) for v with an appropriate choice of w(x). Then
from (10.95) and (10.112), we have

$$\begin{aligned}
\int_r^sdx\,w(x)\left\{G(y, x)d(x) - \left[\frac{\delta(x - y)}{w(x)}\right]^*u(x)\right\} &= \int_r^sdx\,w(x)\left[G(y, x)d(x) - \frac{\delta(x - y)}{w(x)}u(x)\right]\\
&= \left[p(x)\left(G(y, x)\frac{du(x)}{dx} - u(x)\frac{\partial G(y, x)}{\partial x}\right)\right]_{x=r}^{x=s}. \tag{10.127}
\end{aligned}$$

Note in (10.127) we used the fact that both δ(x - y) and w(x) are real.
Using a property of the δ function, we get
$$u(y) = \int_r^sdx\,w(x)G(y, x)d(x) - \left[p(x)\left(G(y, x)\frac{du(x)}{dx} - u(x)\frac{\partial G(y, x)}{\partial x}\right)\right]_{x=r}^{x=s}. \tag{10.128}$$

When the differential operator Lx can be made Hermitian under an appropriate


condition of (10.71), as a real Green’s function we have

$$G(x, y) = G(y, x). \tag{10.113}$$

The function G(x, y) satisfies homogeneous BCs. Hence, if we assume, e.g., the
Dirichlet BCs [see (10.80)], we have

$$G(r, y) = G(s, y) = 0. \tag{10.129}$$

Using the symmetric property of G(x, y) with respect to arguments x and y, from
(10.129) we get

$$G(y, r) = G(y, s) = 0. \tag{10.130}$$

Thus, the first term of the surface terms of (10.128) is eliminated to yield
$$u(y) = \int_r^sdx\,w(x)G(y, x)d(x) + \left[p(x)u(x)\frac{\partial G(y, x)}{\partial x}\right]_{x=r}^{x=s}.$$

Exchanging the arguments x and y, we get



$$u(x) = \int_r^sdy\,w(y)G(x, y)d(y) + \left[p(y)u(y)\frac{\partial G(x, y)}{\partial y}\right]_{y=r}^{y=s}. \tag{10.131}$$

Then, (1) substituting surface terms of u(s) and u(r) that are associated with the
inhomogeneous BCs described as

$$B_1(u) = \sigma_1 \quad \text{and} \quad B_2(u) = \sigma_2 \tag{10.132}$$

and (2) calculating ∂G(x, y)/∂y|_{y=r} and ∂G(x, y)/∂y|_{y=s}, we will be able to obtain a unique solution. Notice that even though we have formally the same differential operators,
we get different Green’s functions depending upon different BCs. We see tangible
examples later.
On the basis of the general discussion of Sect. 10.4 and this section, we are in the
position to construct the Green’s functions. Except for the points of x = y, the
Green’s function G(x, y) must satisfy the following differential equation:

$$L_xG(x, y) = 0, \tag{10.133}$$

where Lx is given by

$$L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x). \tag{10.55}$$

The differential equation L_xu = d(x) is defined within an interval [r, s], where r may be -∞ and s may be +∞.
From now on, we regard a(x), b(x), and c(x) as real functions. From (10.133), we
expect the Green’s function to be described as a linear combination of a fundamental
set of solutions u1(x) and u2(x). Here the fundamental set of solutions are given by
two linearly independent solutions of a homogeneous equation Lxu = 0. Then we
should be able to express G(x, y) as a combination of F1(x, y) and F2(x, y) that are
described as

$$F_1(x, y) = c_1u_1(x) + c_2u_2(x) \quad \text{for} \quad r \le x < y, \qquad F_2(x, y) = d_1u_1(x) + d_2u_2(x) \quad \text{for} \quad y < x \le s, \tag{10.134}$$

where c1, c2, d1, and d2 are arbitrary (complex) constants to be determined later.
These constants are given as a function of y. The combination has to be made
such that

$$G(x, y) = \begin{cases} F_1(x, y) & \text{for } r \le x < y, \\ F_2(x, y) & \text{for } y < x \le s. \end{cases} \tag{10.135}$$

Thus using θ(x) function defined as (10.119), we describe G(x, y) as


10.5 Construction of Green’s Functions 427

$$G(x, y) = F_1(x, y)\theta(y - x) + F_2(x, y)\theta(x - y). \tag{10.136}$$

Notice that F1(x, y) and F2(x, y) are “ordinary” functions and that G(x, y) is not,
because G(x, y) contains the θ(x) function.
If we have

$$F_2(x, y) = F_1(y, x), \tag{10.137}$$

then

$$G(x, y) = F_1(x, y)\theta(y - x) + F_1(y, x)\theta(x - y). \tag{10.138}$$

Hence, we get

$$G(x, y) = G(y, x). \tag{10.139}$$

From (10.113), Lx is Hermitian. Suppose that F1(x, y) = (x - r)(y - s) and F2(x,


y) = (x - s)(y - r). Then, (10.137) is satisfied and, hence, if we can construct the
Green’s function from F1(x, y) and F2(x, y), Lx should be Hermitian. However, if we
had, e.g., F1(x, y) = x - r and F2(x, y) = y - s, G(x, y) ≠ G(y, x), and so Lx would not
be Hermitian.
The Green’s functions must satisfy the homogeneous BCs. That is,

$$B_1(G) = B_2(G) = 0. \tag{10.140}$$

Also, we require continuity condition of G(x, y) at x = y and discontinuity


condition of ∂G(x, y)/∂x at x = y described by (10.122). Thus, we have four
conditions including BCs and continuity and discontinuity conditions to be satisfied
by G(x, y). Thus, we can determine four constants c1, c2, d1, and d2 by the four
conditions.
Now, let us inspect further details about the Green’s functions by an example.
Example 10.4 Let us consider the following differential equation:

$$\frac{d^2u}{dx^2} + u = 1. \tag{10.141}$$

We assume that the domain of the argument x is [0, L]. We set boundary conditions such that

$$u(0) = \sigma_1 \quad \text{and} \quad u(L) = \sigma_2. \tag{10.142}$$

Thus, if at least one of σ 1 and σ 2 is not zero, we are dealing with an inhomoge-
neous differential equation under inhomogeneous BCs.
Next, let us seek conditions that the Green’s function satisfies. We also seek a
fundamental set of solutions of a homogeneous equation described by

$$\frac{d^2u}{dx^2} + u = 0. \tag{10.143}$$

This is obtained by putting a = c = 1 and b = 0 in a general form of (10.5) with a


weight function being unity. The differential equation (10.143) is therefore self-
adjoint according to the argument of Sect. 10.3. A fundamental set of solutions are
given by

$$e^{ix} \quad \text{and} \quad e^{-ix}.$$

Then, we have

$$F_1(x, y) = c_1e^{ix} + c_2e^{-ix} \quad \text{for} \quad 0 \le x < y, \qquad F_2(x, y) = d_1e^{ix} + d_2e^{-ix} \quad \text{for} \quad y < x \le L. \tag{10.144}$$

The functions F1(x, y) and F2(x, y) must satisfy the following BCs such that

$$F_1(0, y) = c_1 + c_2 = 0 \quad \text{and} \quad F_2(L, y) = d_1e^{iL} + d_2e^{-iL} = 0. \tag{10.145}$$

Thus, we have

$$F_1(x, y) = c_1\left(e^{ix} - e^{-ix}\right), \quad F_2(x, y) = d_1\left(e^{ix} - e^{2iL}e^{-ix}\right). \tag{10.146}$$

Therefore, at x = y we have

$$c_1\left(e^{iy} - e^{-iy}\right) = d_1\left(e^{iy} - e^{2iL}e^{-iy}\right). \tag{10.147}$$

Discontinuity condition of (10.122) is equivalent to

$$\left.\frac{\partial F_2(x, y)}{\partial x}\right|_{x=y} - \left.\frac{\partial F_1(x, y)}{\partial x}\right|_{x=y} = 1. \tag{10.148}$$

This is because both F1(x, y) and F2(x, y) are ordinary functions and supposed to
be differentiable at any x. The relation (10.148) then reads as

$$id_1\left(e^{iy} + e^{2iL}e^{-iy}\right) - ic_1\left(e^{iy} + e^{-iy}\right) = 1. \tag{10.149}$$

From (10.147) and (10.149), using Cramer’s rule we have


10.5 Construction of Green’s Functions 429

$$c_1 = \frac{\begin{vmatrix} 0 & -e^{iy} + e^{2iL}e^{-iy} \\ i & -e^{iy} - e^{2iL}e^{-iy} \end{vmatrix}}{\begin{vmatrix} e^{iy} - e^{-iy} & -e^{iy} + e^{2iL}e^{-iy} \\ e^{iy} + e^{-iy} & -e^{iy} - e^{2iL}e^{-iy} \end{vmatrix}} = \frac{i\left(e^{iy} - e^{2iL}e^{-iy}\right)}{2\left(1 - e^{2iL}\right)}, \tag{10.150}$$

$$d_1 = \frac{\begin{vmatrix} e^{iy} - e^{-iy} & 0 \\ e^{iy} + e^{-iy} & i \end{vmatrix}}{\begin{vmatrix} e^{iy} - e^{-iy} & -e^{iy} + e^{2iL}e^{-iy} \\ e^{iy} + e^{-iy} & -e^{iy} - e^{2iL}e^{-iy} \end{vmatrix}} = \frac{i\left(e^{iy} - e^{-iy}\right)}{2\left(1 - e^{2iL}\right)}. \tag{10.151}$$

Substituting these parameters into (10.146), we get

$$F_1(x, y) = \frac{\sin x\left(e^{2iL}e^{-iy} - e^{iy}\right)}{1 - e^{2iL}}, \quad F_2(x, y) = \frac{\sin y\left(e^{2iL}e^{-ix} - e^{ix}\right)}{1 - e^{2iL}}. \tag{10.152}$$

Making a denominator real, we have

$$F_1(x, y) = \frac{\sin x[\cos(y - 2L) - \cos y]}{2\sin^2L}, \quad F_2(x, y) = \frac{\sin y[\cos(x - 2L) - \cos x]}{2\sin^2L}. \tag{10.153}$$

Using the θ(x) function, we get

$$G(x, y) = \frac{\sin x[\cos(y - 2L) - \cos y]}{2\sin^2L}\theta(y - x) + \frac{\sin y[\cos(x - 2L) - \cos x]}{2\sin^2L}\theta(x - y). \tag{10.154}$$

Thus, G(x, y) = G(y, x) as expected. Notice, however, that if L = nπ (n = 1, 2, ⋯)


the Green’s function cannot be defined as an ordinary function even if x ≠ y. We
return to this point later.
The solution for (10.141) under the homogeneous BCs is then described as

L
uð x Þ = dyGðx, yÞ
0
cosðx - 2LÞ - cos x x
sin x L
= sin ydy þ ½cosðy - 2LÞ - cos ydy:
2 sin 2 L 0 2 sin 2 L x
ð10:155Þ

This can readily be integrated to yield the solution for the inhomogeneous equation such that

$$u(x) = \frac{\cos(x - 2L) - \cos x - 2\sin L\sin x - \cos 2L + 1}{2\sin^2L} = \frac{\cos(x - 2L) - \cos x - 2\sin L\sin x + 2\sin^2L}{2\sin^2L}. \tag{10.156}$$

Next, let us consider the surface term. This is given by the second term of
(10.131). We get

$$\left.\frac{\partial F_1(x, y)}{\partial y}\right|_{y=L} = \frac{\sin x}{\sin L}, \quad \left.\frac{\partial F_2(x, y)}{\partial y}\right|_{y=0} = \frac{\cos(x - 2L) - \cos x}{2\sin^2L}. \tag{10.157}$$

Therefore, with the inhomogeneous BCs we have the following solution for the
inhomogeneous equation:

$$u(x) = \frac{\cos(x - 2L) - \cos x - 2\sin L\sin x + 2\sin^2L}{2\sin^2L} + \frac{2\sigma_2\sin L\sin x + \sigma_1[\cos x - \cos(x - 2L)]}{2\sin^2L}, \tag{10.158}$$

where the second term is the surface term. If σ 1 = σ 2 = 1, we have

$$u(x) \equiv 1.$$

Looking at (10.141), we find that u(x) ≡ 1 is certainly a solution of (10.141) with the inhomogeneous BCs of σ_1 = σ_2 = 1. The uniqueness of the solution then ensures that u(x) ≡ 1 is the sole solution under the said BCs.
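The closed form (10.158) can also be verified symbolically. This is a minimal sympy sketch assuming nothing beyond (10.141), (10.142), and (10.158):

```python
# Verify that (10.158) satisfies u'' + u = 1 with u(0) = sigma1, u(L) = sigma2.
import sympy as sp

x, L, s1, s2 = sp.symbols('x L sigma_1 sigma_2')

u = (sp.cos(x - 2*L) - sp.cos(x) - 2*sp.sin(L)*sp.sin(x) + 2*sp.sin(L)**2
     + 2*s2*sp.sin(L)*sp.sin(x) + s1*(sp.cos(x) - sp.cos(x - 2*L))
     ) / (2*sp.sin(L)**2)                # Eq. (10.158)

print(sp.simplify(u.diff(x, 2) + u - 1))  # -> 0, i.e., (10.141) holds
print(sp.simplify(u.subs(x, 0) - s1))     # -> 0, BC at x = 0
print(sp.simplify(u.subs(x, L) - s2))     # -> 0, BC at x = L
```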
From (10.154), we find that G(x, y) has a singularity at L = nπ (n : integers). This
is associated with the fact that a homogeneous equation (10.143) has a non-trivial
solution, e.g., u(x) = sin x under homogeneous BCs u(0) = u(L ) = 0. The present
situation is essentially the same as that of Example 1.1 of Sect. 1.3. In other words,
when λ = 1 in (1.61), the form of a differential equation is identical to (10.143) with
virtually the same Dirichlet conditions. The point is that (10.143) can be viewed as a
homogeneous equation and, at the same time, as an eigenvalue equation. In such a
case, a Green’s function approach will fail.

10.6 Initial-Value Problems (IVPs)

10.6.1 General Remarks

The IVPs frequently appear in mathematical physics. The relevant conditions are
dealt with as BCs in the theory of differential equations. With boundary functionals
B1(u) and B2(u) of (10.3) and (10.4), setting α1 = β2 = 1 and other coefficients as
zero, we get

$$B_1(u) = u(p) = \sigma_1 \quad \text{and} \quad B_2(u) = \left.\frac{du}{dx}\right|_{x=p} = \sigma_2. \tag{10.159}$$

In the above, note that we choose [r, s] for a domain of x. The points r and s can be
infinity as before. Any point p within the domain [r, s] may be designated as a special
point on which the BCs (10.159) are imposed. The initial conditions are particularly
prominent among BCs because they are set at a single point of the argument; such special conditions are usually called initial conditions. In this section,
we investigate fundamental characteristics of IVPs.
Suppose that we have

$$u(p) = \left.\frac{du}{dx}\right|_{x=p} = 0, \tag{10.160}$$

with homogeneous BCs. Given a differential operator Lx defined as (10.55), i.e.,

$$L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x), \tag{10.55}$$

let a fundamental set of solutions be u1(x) and u2(x) for

$$L_xu(x) = 0. \tag{10.161}$$

A general solution u(x) for (10.161) is given by a linear combination of u1(x) and
u2(x) such that

$$u(x) = c_1u_1(x) + c_2u_2(x), \tag{10.162}$$

where c1 and c2 are arbitrary (complex) constants. Suppose that we have homoge-
neous BCs expressed by (10.160). Then, we have

$$u(p) = c_1u_1(p) + c_2u_2(p) = 0, \quad u'(p) = c_1u_1'(p) + c_2u_2'(p) = 0.$$

Rewriting it in a matrix form, we have

$$\begin{pmatrix} u_1(p) & u_2(p) \\ u_1'(p) & u_2'(p) \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0.$$

Since the matrix comprises the Wronskian of the fundamental set of solutions u_1(x) and u_2(x), its determinant never vanishes at any point p. That is, we have

$$\begin{vmatrix} u_1(p) & u_2(p) \\ u_1'(p) & u_2'(p) \end{vmatrix} \neq 0. \tag{10.163}$$

Then, we necessarily have c_1 = c_2 = 0. From (10.162), we have a trivial solution

$$u(x) \equiv 0$$

under the initial conditions as homogeneous BCs. Thus, as already discussed a


Green’s function can always be constructed for IVPs.
To seek the Green’s functions for IVPs, we return back to the generalized Green’s
identity described as

$$\int_r^sdx\left[v^*(L_xu) - \left(L_x^\dagger v\right)^*u\right] = \left[av^*\frac{du}{dx} - u\frac{d(av^*)}{dx} + buv^*\right]_r^s. \tag{10.62}$$

For the surface term (RHS) to vanish, for homogeneous BCs we have, e.g.,

$$u(s) = \left.\frac{du}{dx}\right|_{x=s} = 0 \quad \text{and} \quad v(r) = \left.\frac{dv}{dx}\right|_{x=r} = 0,$$

for the two sets of BCs adjoint to each other. Obviously, these are not identical
simply because the former is determined at s and the latter is determined at a different
point r. For this reason, the operator Lx is not Hermitian, even though it is formally
self-adjoint. In such a case, we would rather use Lx directly than construct a self-
adjoint operator because we cannot make the operator Hermitian either way.
Hence, unlike the preceding sections we do not need a weight function w(x). Or, we may regard w(x) ≡ 1. Then, we reconsider the conditions which the Green's functions should satisfy. On the basis of the general consideration of Sect. 10.4, especially (10.86), (10.87), and (10.94), we have [2]

$$L_xG(x, y) = \langle x|y\rangle = \delta(x - y). \tag{10.164}$$

Therefore, we have

$$\frac{\partial^2G(x, y)}{\partial x^2} + \frac{b(x)}{a(x)}\frac{\partial G(x, y)}{\partial x} + \frac{c(x)}{a(x)}G(x, y) = \frac{\delta(x - y)}{a(x)}.$$

Integrating or integrating by parts the above equation, we get



$$\frac{\partial G(x, y)}{\partial x} + \frac{b(x)}{a(x)}G(x, y) - \int_{x_0}^x\left[\frac{b(\xi)}{a(\xi)}\right]'G(\xi, y)d\xi + \int_{x_0}^x\frac{c(\xi)}{a(\xi)}G(\xi, y)d\xi = \frac{\theta(x - y)}{a(y)}.$$

Noting that the functions other than ∂G(x, y)/∂x and θ(x - y)/a(y) are continuous, as before we have

$$\lim_{\varepsilon\to+0}\left[\left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y+\varepsilon} - \left.\frac{\partial G(x, y)}{\partial x}\right|_{x=y-\varepsilon}\right] = \frac{1}{a(y)}. \tag{10.165}$$

10.6.2 Green’s Functions for IVPs

From a practical point of view, we may set r = 0 in (10.62). Then, we can choose a
domain for [0, s] (for s > 0) or [s, 0] (for s < 0) with (10.2). For simplicity, we use
x instead of s. We consider two cases of x > 0 and x < 0.
1. Case I (x > 0): Let u1(x) and u2(x) be a fundamental set of solutions. We define
F1(x, y) and F2(x, y) as before such that

$$F_1(x, y) = c_1u_1(x) + c_2u_2(x) \quad \text{for} \quad 0 \le x < y, \qquad F_2(x, y) = d_1u_1(x) + d_2u_2(x) \quad \text{for} \quad 0 < y < x. \tag{10.166}$$

As before, we set

$$G(x, y) = \begin{cases} F_1(x, y) & \text{for } 0 \le x < y, \\ F_2(x, y) & \text{for } 0 < y < x. \end{cases}$$

Homogeneous BCs are defined as

$$u(0) = 0 \quad \text{and} \quad u'(0) = 0.$$

Correspondingly, we have

$$F_1(0, y) = 0 \quad \text{and} \quad F_1'(0, y) = 0.$$

This is translated into

$$c_1u_1(0) + c_2u_2(0) = 0 \quad \text{and} \quad c_1u_1'(0) + c_2u_2'(0) = 0.$$

In a matrix form, we get

$$\begin{pmatrix} u_1(0) & u_2(0) \\ u_1'(0) & u_2'(0) \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = 0.$$

As mentioned above, since u_1(x) and u_2(x) are a fundamental set of solutions, we have c_1 = c_2 = 0. Hence, we get

$$F_1(x, y) = 0.$$

From the continuity and discontinuity conditions (10.165) imposed upon the Green's functions, we have

$$d_1u_1(y) + d_2u_2(y) = 0 \quad \text{and} \quad d_1u_1'(y) + d_2u_2'(y) = 1/a(y). \tag{10.167}$$

As before, we get

$$d_1 = -\frac{u_2(y)}{a(y)W(u_1(y), u_2(y))} \quad \text{and} \quad d_2 = \frac{u_1(y)}{a(y)W(u_1(y), u_2(y))},$$

where W(u_1(y), u_2(y)) is the Wronskian of u_1(y) and u_2(y). Thus, we get

$$F_2(x, y) = \frac{u_2(x)u_1(y) - u_1(x)u_2(y)}{a(y)W(u_1(y), u_2(y))}. \tag{10.168}$$

2. Case II (x < 0): Next, we think of the case as below:

$$F_1(x, y) = c_1u_1(x) + c_2u_2(x) \quad \text{for} \quad y < x \le 0, \qquad F_2(x, y) = d_1u_1(x) + d_2u_2(x) \quad \text{for} \quad x < y < 0. \tag{10.169}$$

Proceeding similarly as the above, we have c_1 = c_2 = 0. Also, we get

$$F_2(x, y) = \frac{u_1(x)u_2(y) - u_2(x)u_1(y)}{a(y)W(u_1(y), u_2(y))}. \tag{10.170}$$

Here notice that the sign is reversed in (10.170) relative to (10.168). This is because for the discontinuity condition, instead of (10.167) we have to have

$$d_1u_1'(y) + d_2u_2'(y) = -1/a(y).$$

This results from the fact that magnitude relationship between the arguments
x and y has been reversed in (10.169) relative to (10.166).

[Fig. 10.2 Graph of the function Θ(x, y). Θ(x, y) = 1 or -1 in hatched areas; otherwise, Θ(x, y) = 0]

Summarizing the above argument, (10.168) is obtained in the domain


0 ≤ y < x; (10.170) is obtained in the domain x < y < 0. Noting this characteristic,
we define a function such that

Θðx, yÞ  θðx - yÞθðyÞ - θðy - xÞθð- yÞ: ð10:171Þ

Notice that

Θðx, yÞ = - Θð- x, - yÞ:

That is, Θ(x, y) is antisymmetric with respect to the origin.


Figure 10.2 shows a feature of Θ(x, y). If the “initial point” is taken at x = a, we
can use Θ(x - a, y - a) instead; see Fig. 10.3. The function is described as

Θðx - a, y - aÞ = θðx - yÞθðy - aÞ - θðy - xÞθða - yÞ:

Note that Θ(x - a, y - a) can be obtained by shifting Θ(x, y) toward the positive
direction of the x- and y-axes by a (a can be either positive or negative; in Fig. 10.3
we assume a > 0). Using the Θ(x, y) function, the Green’s function is described as

u2 ðxÞu1 ðyÞ - u1 ðxÞu2 ðyÞ


Gðx, yÞ = Θðx, yÞ: ð10:172Þ
aðyÞW ðu1 ðyÞ, u2 ðyÞÞ

Defining a function F such that


436 10 Introductory Green’s Functions

Fig. 10.3 Graph of a


function Θ(x - a, y - a).
x
We assume a > 0

( − , − )
O
y

u2 ðxÞu1 ðyÞ - u1 ðxÞu2 ðyÞ


F ðx, yÞ  , ð10:173Þ
aðyÞW ðu1 ðyÞ, u2 ðyÞÞ

we have

Gðx, yÞ = F ðx, yÞΘðx, yÞ: ð10:174Þ

Notice that

Θðx, yÞ ≠ Θðy, xÞ and Gðx, yÞ ≠ Gðy, xÞ: ð10:175Þ

It is therefore obvious that the differential operator is not Hermitian.

10.6.3 Estimation of Surface Terms

To include the surface term of the inhomogeneous case, we use (10.62).

d ðav Þ
s s
 du
dx v ðLx uÞ - Lx { v u = av -u þ buv : ð10:62Þ
r dx dx r

As before, we set r = 0 in (10.62). Also, we classify (10.62) into two cases


according as s > 0 or s < 0.
10.6 Initial-Value Problems (IVPs) 437

x
Fig. 10.4 Domain of G(y,
x). Areas for which G(y, x)
does not vanish are hatched

O y

1. Case I (x > 0, y > 0): (10.62) reads as

1 1
 du d ðav Þ
dx v ðLx uÞ - Lx { v u = av -u þ buv : ð10:176Þ
0 dx dx 0

This time, inserting

gðx, yÞ = G ðy, xÞ = Gðy, xÞ ð10:177Þ

into v of (10.176) and arranging terms, we have


1
uðyÞ = dxGðy, xÞd ðxÞ
0
1
duðxÞ daðxÞ ∂Gðy, xÞ
- aðxÞGðy, xÞ - uðxÞ Gðy, xÞ - uðxÞaðxÞ þ bðxÞuðxÞGðy, xÞ :
dx dx ∂x x=0
ð10:178Þ

Note that in the above we used Lx{g(x, y) = δ(x - y). In Fig. 10.4, we depict a
domain of G(y, x) in which G(y, x) does not vanish. Notice that we get the domain
by folding back that of Θ(x, y) (see Fig. 10.2) relative to a straight line y = x.
ðy, xÞ
Thus, we find that G(y, x) vanishes at x > y. So does ∂G∂x ; see Fig. 10.4.
Namely, the second term of RHS of (10.178) vanishes at x = 1. In other words,
g(x, y) and G(y, x) must satisfy the adjoint BCs; i.e.,
438 10 Introductory Green’s Functions

gð1, yÞ = Gðy, 1Þ = 0: ð10:179Þ

At the same time, the upper limit of integration range of (10.178) can be set at
y. Noting the above, we have
y
½the first term of ð10:178Þ = dxGðy, xÞd ðxÞ: ð10:180Þ
0

Also with the second term of (10.178), we get

½the second term of ð10:178Þ =


duðxÞ daðxÞ ∂Gðy, xÞ
þ aðxÞGðy, xÞ - uðxÞ Gðy, xÞ - uðxÞaðxÞ þ bðxÞuðxÞGðy, xÞ
dx dx ∂x x=0
duð0Þ dað0Þ ∂Gðy, xÞ
= að0ÞGðy, 0Þ - uð0Þ Gðy, 0Þ - uð0Það0Þ þ bð0Þuð0ÞGðy, 0Þ:
dx dx ∂x x=0
ð10:181Þ

If we substitute inhomogeneous BCs

du
u ð 0Þ = σ 1 and = σ2 , ð10:182Þ
dx x=0

for (10.181) along with other appropriate values, we should be able to get a
unique solution as

y
dað0Þ
uð y Þ = dxGðy, xÞdðxÞ þ σ 2 að0Þ - σ 1 þ σ 1 bð0Þ Gðy, 0Þ
0 dx
ð10:183Þ
∂Gðy, xÞ
- σ 1 a ð 0Þ :
∂x x=0

Exchanging arguments x and y, we get

x
dað0Þ
uð x Þ = dyGðx, yÞdðyÞ þ σ 2 að0Þ - σ 1 þ σ 1 bð0Þ Gðx, 0Þ
0 dy
ð10:184Þ
∂Gðx, yÞ
- σ 1 a ð 0Þ :
∂y y=0

Here, we consider that Θ(x, y) = 1 in this region and use (10.174). Meanwhile,
from (10.174), we have
10.6 Initial-Value Problems (IVPs) 439

∂Gðx, yÞ ∂F ðx, yÞ ∂Θðx, yÞ


= Θðx, yÞ þ F ðx, yÞ: ð10:185Þ
∂y ∂y ∂y

In the second term,

∂Θðx, yÞ ∂θðx - yÞ ∂θðyÞ ∂θðy - xÞ ∂θð- yÞ


= θðyÞ þ θðx - yÞ - θð - yÞ - θðy - xÞ
∂y ∂y ∂y ∂y ∂y
= - δðx - yÞθðyÞ þ θðx - yÞδðyÞ - δðy - xÞθð - yÞ þ θðy - xÞδð - yÞ
= - δðx - yÞ½θðyÞ þ θð - yÞ þ ½θðx - yÞ þ θðy - xÞδðyÞ
= - δðx - yÞ½θðyÞ þ θð - yÞ þ ½θðxÞ þ θð - xÞδðyÞ = - δðx - yÞ þ δðyÞ,
ð10:186Þ

where we used

θðxÞ þ θð- xÞ  1 ð10:187Þ

as well as

f ðyÞδðyÞ = f ð0ÞδðyÞ ð10:188Þ

and

δð- yÞ = δðyÞ: ð10:189Þ

However, the function -δ(x - y) + δ( y) is of secondary importance. It is


because in (10.184) we may choose [ε, x - ε] (ε > 0) for the domain y and put
ε → + 0 after the integration and other calculations related to the surface terms.
Therefore, -δ(x - y) + δ( y) in (10.186) virtually vanishes.
Thus, we can express (10.185) as

∂Gðx, yÞ ∂F ðx, yÞ ∂F ðx, yÞ


= Θðx, yÞ = :
∂y ∂y ∂y

Then, finally we reach

x
dað0Þ
uðxÞ = dyF ðx, yÞdðyÞ þ σ 2 að0Þ - σ 1 þ σ 1 bð0Þ F ðx, 0Þ
0 dy
ð10:190Þ
∂F ðx, yÞ
- σ 1 að0Þ :
∂y y=0

2. Case II (x < 0, y < 0): Similarly as the above, (10.62) reads as


440 10 Introductory Green’s Functions

0
duðxÞ daðxÞ
uð y Þ ¼ dxGðy, xÞd ðxÞ - aðxÞGðy, xÞ - uð x Þ Gðy, xÞ
-1 dx dx
0
∂Gðy, xÞ
- uðxÞaðxÞ þ bðxÞuðxÞGðy, xÞ :
∂x
x¼ - 1

Similarly as mentioned above, the lower limit of integration range is y.


ð y, xÞ
Considering both G(y, x) and ∂G∂x vanish at x < y (see Fig. 10.4), we have

0
uðyÞ = dxGðy, xÞdðxÞ
y
duðxÞ daðxÞ ∂Gðy, xÞ
- aðxÞGðy, xÞ - uð x Þ Gðy,xÞ - uðxÞaðxÞ þ bðxÞuðxÞGðy,xÞ
dx dx ∂x x= 0
y
dað0Þ
=- dxGðy,xÞdðxÞ - σ 2 að0Þ - σ 1 þ σ 1 bð0Þ Gðy,0Þ
0 dx
∂Gðy,xÞ
þσ 1 að0Þ :
∂x x=0
ð10:191Þ

Comparing (10.191) with (10.183), we recognize that the sign of RHS of


(10.191) has been reversed relative to RHS of (10.183). This is also the case after
exchanging arguments x and y. Note, however, Θ(x, y) = - 1 in the present case.
As a result, two minus signs cancel and (10.191) takes exactly the same expres-
sion as (10.183). Proceeding with calculations similarly, for both Cases I and II
we arrive at a unified solution represented by (10.190) throughout a domain (-
1, +1).

10.6.4 Examples

To deepen understanding of Green’s functions, we deal with tangible examples of


the IVP below.
Example 10.5 Let us consider a following inhomogeneous differential equation:

d2 u
þ u = 1: ð10:192Þ
dx2

Note that (10.192) is formally the same differential equation of (10.141). We may
encounter (10.192) when we are observing a motion of a charged harmonic oscillator
10.6 Initial-Value Problems (IVPs) 441

that is placed under a static electric field. We assume that a domain of the argument
x is a whole range of real numbers. We set boundary conditions such that

u ð 0Þ = σ 1 and u0 ð0Þ = σ 2 : ð10:193Þ

As in the case of Example 10.4, a fundamental set of solutions are given by

eix and e - ix : ð10:194Þ

Therefore, following (10.173), we get

F ðx, yÞ = sinðx - yÞ: ð10:195Þ

Also following (10.190) we have

x
uð x Þ = dy sinðx - yÞ þ σ 2 sin x - σ 1 ½ - cosðx - yÞ
0 y=0 ð10:196Þ
= 1 - cos x þ σ 2 sin x þ σ 1 cos x:

In particular, if we choose σ 1 = 1 and σ 2 = 0, we have

uðxÞ  1: ð10:197Þ

This also ensures that this is a unique solution under the inhomogeneous BCs
described as σ 1 = 1 and σ 2 = 0.
Example 10.6 Damped Oscillator If a harmonic oscillator undergoes friction, the
oscillator exerts damped oscillation. Such an oscillator is said to be a damped
oscillator. The damped oscillator is often dealt with when we think of bound
electrons in a dielectric medium that undergo an effect of a dynamic external field
varying with time. This is the case when the electron is placed in an alternating
electric field or an electromagnetic wave.
An equation of motion of the damped oscillator is described as

d2 u du
m þ r þ ku = dðt Þ, ð10:198Þ
dt 2 dt

where m is a mass of an electron; t denotes time; r is a damping constant; k is a spring


constant of the damped oscillator; d(t) represents an external force field (generally
time-dependent). To seek a fundamental set of solutions of a homogeneous equation
described as
442 10 Introductory Green’s Functions

d2 u du
m þ r þ ku = 0,
dt 2 dt

putting

uðt Þ = eiρt ð10:199Þ

and inserting it to (10.198), we have

r k iρt
- ρ2 þ iρ þ e = 0:
m m

Since eiρt does not vanish, we have

r k
- ρ2 þ iρ þ = 0: ð10:200Þ
m m

We call this equation a characteristic quadratic equation. We have three cases for
the solution of a quadratic equation of (10.200). Solving (10.200) with respect to ρ,
we get

ir r2 k
ρ= ∓ - þ : ð10:201Þ
2m 4m2 m

Regarding (10.201), we have following three cases for the roots ρ: (1) ρ has two
r2
different pure imaginary roots; i.e., - 4m 2 þ m < 0 (an over damping). (2) ρ has
k

r2
double root 2m ir
; - 4m 2 þ m = 0 (a critical damping). (3) ρ has two complex roots;
k
2
- 4mr
2 þ m > 0 (a weak damping). Of these, Case (3) is characterized by an oscil-
k

lating solution and has many applications in mathematical physics. For the Cases
(1) and (2), however, we do not have an oscillating solution.
Case (1):
The characteristic roots are given by

ir r2 k
ρ= ∓i - :
2m 4m2 m

Therefore, we have a fundamental set of solutions described by

rt r2 k
uðt Þ = exp - exp ∓ 2
- t :
2m 4m m

Then, a general solution is given by


10.6 Initial-Value Problems (IVPs) 443

rt r2 k r2 k
uðt Þ = exp - aexp - t þ b exp - - t :
2m 4m2 m 4m2 m

Case (2):
The characteristic roots are given by

ir
ρ= :
2m

Therefore, one of the solutions is

rt
u1 ðt Þ = exp - :
2m

Another solution u2(t) is given by

∂u1 ðt Þ rt
u2 ðt Þ = c = c0 t exp - ,
∂ðiρÞ 2m

where c and c′ are appropriate constants. Thus, general solution is given by

rt rt
uðt Þ = a exp - þ bt exp - :
2m 2m

The most important and interesting feature emerges as a “damped oscillator” in


the next Case (3) in many fields of natural science. We are particularly interested in
this case.
Case (3):
Suppose that the damping is relatively weak such that the characteristic equation
has two complex roots. Let us examine further details of this case following the
prescriptions of IVPs. We divide (10.198) by m for the sake of easy handling of the
differential equation such that

d2 u r du k 1
þ þ u = d ðt Þ:
dt 2 m dt m m

Putting

r2 k
ω - þ , ð10:202Þ
4m2 m

we get a fundamental set of solutions described as


444 10 Introductory Green’s Functions

rt
uðt Þ = exp - expð ∓ iωt Þ: ð10:203Þ
2m

Given BCs, following (10.172) we get as a Green’s function

u2 ðt Þu1 ðτÞ - u1 ðt Þu2 ðτÞ


Gðt, τÞ = Θðt, τÞ
W ðu1 ðτÞ, u2 ðτÞÞ
ð10:204Þ
1
= e - 2mðt - τÞ sin ωðt - τÞΘðt, τÞ,
r

where u1 ðt Þ = exp - 2m
rt
expðiωt Þ and u2 ðt Þ = exp - 2m
rt
expð- iωt Þ.
We examine whether G(t, τ) is eligible for the Green’s function as follows:

dG 1 r
e - 2mðt - τÞ sin ωðt - τÞ þ ωe - 2mðt - τÞ cos ωðt - τÞ Θðt, τÞ
r r
= -
dt ω 2m
1
þ e - 2mðt - τÞ sin ωðt - τÞ½δðt - τÞθðτÞ þ δðτ - t Þθð - τÞ:
r

ω
ð10:205Þ

The second term of (10.205) vanishes because sinω(t - τ)δ(t - τ) = 0  δ(t -


τ) = 0. Thus,

d2 G 1 r 2 r
e - 2mðt - τÞ sin ωðt - τÞ - ωe - 2mðt - τÞ cos ωðt - τÞ
r r
¼ -
dt 2 ω 2m m

1 r
- ω2 e - 2mðt - τÞ sin ωðt - τÞ Θðt, τÞ þ e - 2mðt - τÞ sin ωðt - τÞ
r r
-
ω 2m

þ ωe - 2mðt - τÞ cos ωðt - τÞ × ½δðt - τÞθðτÞ þ δðτ - t Þθð - τÞ:


r

ð10:206Þ

In the last term using the property of the δ function and θ function, we get δ(t - τ).
That is, the following part in (10.206) is calculated such that

e - 2mðt - τÞ cos ωðt - τÞ½δðt - τÞθðτÞ þ δðτ - t Þθð - τÞ


r

= e - 2m0 cosðω  0Þfδðt - τÞ½θðτÞ þ θð - τÞg


r

= δðt - τÞ½θðτÞ þ θð - τÞ = δðt - τÞ:

Thus, rearranging (10.206), we get


10.6 Initial-Value Problems (IVPs) 445

d2 G 1 r 2
þ ω2 e - 2mðt - τÞ sin ωðt - τÞ
r
¼ δ ðt - τ Þ þ - -
dt 2 ω 2m

r r
e - 2mðt - τÞ sin ωðt - τÞ þ ωe - 2mðt - τÞ cos ωðt - τÞ
r r
- - Θðt, τÞ
m 2m
k r dG
¼ δ ðt - τ Þ - G- ,
m m dt
ð10:207Þ

where we used (10.202) and (10.204) for the last equality. Rearranging (10.207)
once again, we have

d 2 G r dG k
þ þ G = δðt - τÞ: ð10:208Þ
dt 2 m dt m

Defining the following operator

d2 r d k
Lt  þ þ , ð10:209Þ
dt 2 m dt m

we get

Lt G = δðt - τÞ: ð10:210Þ

Note that this expression is consistent with (10.164). Thus, we find that (10.210)
satisfies the condition (10.123) of the Green’s function, where the weight function is
identified with unity.
Now, suppose that a sinusoidally changing external field eiΩt influences the
motion of the damped oscillator. Here we assume that an amplitude of the external
field is unity. Then, we have

d2 u r du k 1
þ þ u = eiΩt : ð10:211Þ
dt 2 m dt m m

Thus, as a solution of the homogeneous boundary conditions [i.e., uð0Þ =


_
uð0Þ = 0] we get

t
1
e - 2mðt - τÞ eiΩt sin ωðt - τÞdτ,
r
uðt Þ = ð10:212Þ
mω 0

where t is an arbitrary positive or negative time. Equation (10.212) shows that with
the real part we have an external field cosΩt and that with the imaginary part we have
an external field sinΩt. To calculate (10.212) we use
446 10 Introductory Green’s Functions

1 iωðt - τÞ
sin ωðt - τÞ = e - e - iωðt - τÞ : ð10:213Þ
2i

Then, the equation can readily be solved by integration of exponential functions,


even though we have to do somewhat lengthy (but straightforward) calculations.
Thus for the real part (i.e., the external field is cosΩt), we get a solution

1 r 1 r 2 1 2
Cuðt Þ ¼ Ω sin Ωt þ cos Ωt - Ω - ω2 cos Ωt ð10:214Þ
m m m 2m m
1 - 2mr t r 1 2
þ e Ω2 - ω2 cos ωt - Ω þ ω2 sin ωt
m 2m ω

r 2 r 3 1
- cos ωt - sin ωt ,
2m 2m ω

where C is a constant [i.e., a constant denominator of u(t)] expressed as

2 r2 r 4
C = Ω 2 - ω2 þ Ω2 þ ω 2 þ : ð10:215Þ
2m2 2m

For the imaginary part (i.e., the external field is sinΩt) we get

1 2 1 r 1 r 2
Cuðt Þ = - Ω - ω2 sin Ωt - Ω cos Ωt þ sin Ωt
m m m m 2m
ð10:216Þ
1 Ω 2 r r 2Ω
þ e - 2mt
r
Ω - ω2 sin ωt þ Ω cos ωt þ sin ωt :
m ω m 2m ω

In Fig. 10.5 we show an example that depicts the positions of a damped oscillator
as a function of time. In Fig. 10.5a, an amplitude of an envelope gradually dimin-
ishes with time. An enlarged diagram near the origin (Fig. 10.5b) clearly reflects the
initial conditions uð0Þ = uð_0Þ = 0. In Fig. 10.5, we put m = 1 [kg], Ω = 1 [1/s],
ω = 0.94 [1/s], and r = 0.006 [kg/s]. In the above calculations, if r/m is small enough
(i.e., damping is small enough), the third order and fourth order of r/m may be
ignored and the approximation is precise enough.
In the case of inhomogeneous BCs, given σ 1 = u(0) and σ 2 = uð_0Þ we can decide
additional surface term S(t) such that

r 1 - 2mr t
sin ωt þ σ 1 e - 2mt cos ωt:
r
Sð t Þ = σ 2 þ σ 1 e ð10:217Þ
2m ω

The term S(t) corresponds to the second and third terms of (10.190). Thus, from
(10.212) and (10.217),
10.6 Initial-Value Problems (IVPs) 447

(a)
0.6

0.4

0.2
Position (m)

-0.2

-0.4
Phase: 0
-0.6
–446 Time (s) 754

(b) 0.6

0.4

0.2
Position (m)

-0.2

Phase: 0
-0.4

-0.6
–110 Time (s) 110

Fig. 10.5 Example of a damped oscillation as a function of t. The data are taken from (10.214). (a)
Overall profile. (b) Profile enlarged near the origin

uð t Þ þ Sð t Þ

gives a unique solution for the SOLDE with inhomogeneous BCs. Notice that S(t)
does not depend on the external field.
In the above calculations, use
448 10 Introductory Green’s Functions

1 - 2mr ðt - τÞ
F ðt, τÞ = e sin ωðt - τÞ
ω

together with a(t)  1, b(t)  r/m, d(t)  eiΩt/m with respect to (10.190). The
conformation of (10.217) is left for readers.

10.7 Eigenvalue Problems

We often encounter eigenvalue problems in mathematical physics. Of these, those


related to Hermitian differential operators have particularly interesting and important
features. The eigenvalue problems we have considered in Part I are typical illustra-
tions. Here we investigate general properties of the eigenvalue problems.
Returning to the case of homogeneous BCs, we consider a following homoge-
neous SOLDE:

d2 u du
að x Þ þ bðxÞ þ cðxÞu = 0: ð10:5Þ
dx2 dx

Defining a following differential operator Lx such that

d2 d
Lx = aðxÞ þ bðxÞ þ cðxÞ, ð10:55Þ
dx2 dx

we have a homogeneous equation

Lx uðxÞ = 0: ð10:218Þ

Putting a constant -λ instead of c(x), we have

d2 u du
að x Þ þ bðxÞ - λu = 0: ð10:219Þ
dx2 dx

If we define a differential operator Lx such that

d2 d
Lx = a ð x Þ þ bðxÞ - λ, ð10:220Þ
dx2 dx

we have a homogeneous equation

Lx u = 0 ð10:221Þ

to express (10.219). Instead, if we define a differential operator Lx such that


10.7 Eigenvalue Problems 449

d2 d
Lx = a ð x Þ þ bðxÞ ,
dx2 dx

we have the same homogeneous equation

Lx u = λu ð10:222Þ

to express (10.219).
Equations (10.221) and (10.222) are essentially the same except that the expres-
sion is different. The expression using (10.222) is familiar to us as an eigenvalue
equation. The difference between (10.5) and (10.219) is that whereas c(x) in (10.5) is
a given fixed function, λ in (10.219) is constant, but may be varied according to the
solution of u(x). One of the most essential properties of the eigenvalue problem that
is posed in the form of (10.222) is that its solution is not uniquely determined as
already studied in various cases of Part I.
Remember that the methods based upon the Green’s function are valid for a
problem to which a homogeneous differential equation has a trivial solution (i.e.,
identically zero) under homogeneous BCs. In contrast to this situation, even though
the eigenvalue problem is basically posed as a homogeneous equation under homo-
geneous BCs, non-trivial solutions are expected to be obtained. In this respect, we
have seen that in Part I we rejected a trivial solution (i.e., identically zero) because of
no physical meaning.
As exemplified in Part I, the eigenvalue problems that appear in mathematical
physics are closely connected to the Hermiticity of (differential) operators. This is
because in many cases an eigenvalue is required to be real. We have already
examined how we can convert a differential operator to the self-adjoint form. That
is, if we define p(x) as in (10.26), we have the self-adjoint operator as described in
(10.70). As a symbolic description, we have

d du
wðxÞLx u = pð x Þ þ cðxÞwðxÞu: ð10:223Þ
dx dx

In the same way, multiplying both sides of (10.222) by w(x), we get

wðxÞLx u = λwðxÞu: ð10:224Þ

For instance, Hermite differential equation that has already appeared as (2.118) in
Sects. 2.3 is described as

d2 u du
- 2x þ 2nu = 0: ð10:225Þ
dx2 dx

If we express (10.225) as in (10.224), multiplying e - x on both sides of (10.225)


2

we have
450 10 Introductory Green’s Functions

d 2 du
e-x þ 2ne - x u = 0:
2
ð10:226Þ
dx dx

Notice that the differential operator has been converted to a self-adjoint form
according to (10.31) that defines a real and positive weight function e - x in the
2

present case. The domain of the Hermite differential equation is (-1, +1) at the
endpoints (i.e., ±1) of which the surface term of RHS of (10.69) approaches zero
sufficiently rapidly in virtue of e - x .
2

In Sects. 3.4, and 3.5, in turn, we dealt with the (associated) Legendre differential
equation (3.127) for which the relevant differential operator is self-adjoint. The
surface term corresponding to (10.69) vanishes. It is because (1 - ξ2) vanishes at
the endpoints ξ = cos θ = ± 1 (i.e., θ = 0 or π) from (3.107). Thus, the Hermiticity
is automatically ensured for the (associated) Legendre differential equation as well
as Hermite differential equation. In those cases, even though the differential equa-
tions do not satisfy any particular BCs, the Hermiticity is yet ensured.
In the theory of differential equations, the aforementioned properties of the
Hermitian operators have been fully investigated as the so-called Strum-Liouville
system (or problem) in the form of a homogeneous differential equation. The related
differential equations are connected to classical orthogonal polynomials having
personal names such as Hermite, Laguerre, Jacobi, Gegenbauer, Legendre,
Tchebichef, etc. These equations frequently appear in quantum mechanics and
electromagnetism as typical examples of Strum-Liouville system. They can be
converted to differential equations by multiplying an original form by a weight
function. The resulting equations can be expressed as

d dY ðxÞ
aðxÞwðxÞ n þ λn wðxÞY n ðxÞ = 0, ð10:227Þ
dx dx

where Yn(x) is a collective representation of classical orthogonal polynomials.


Equation (10.226) is an example. Conventionally, a following form is adopted
instead of (10.227):

1 d dY ðxÞ
aðxÞwðxÞ n þ λn Y n ðxÞ = 0, ð10:228Þ
wðxÞ dx dx

where we put (aw)′ = bw. That is, the differential equation is originally described as

d2 Y n ðxÞ dY ðxÞ
að x Þ þ bðxÞ n þ λn Y n ðxÞ = 0: ð10:229Þ
dx2 dx

In the case of Hermite polynomials, for instance, a(x) = 1 and wðxÞ = e - x . Since
2

we have ðawÞ0 = - 2xe - x = bw, we can put b = - 2x. Examples including this
2

case are tabulated in Table 10.2. The eigenvalues λn are associated with real numbers
10.7
Eigenvalue Problems

Table 10.2 Classical polynomials and their related SOLDEs


Name of the polynomial SOLDE form Weight function: w(x) Domain
2
Hermite: Hn(x) d2 d e-x (-1, +1)
dx2 H n ðxÞ - 2x dx H n ðxÞ þ 2nH n ðxÞ = 0
Laguerre: Lνn ðxÞ d2 ν d ν xνe-x (ν > - 1) [0, +1)
x dx2 Ln ðxÞ þ ðν þ 1 - xÞ dx Ln ðxÞ þ nLνn ðxÞ = 0
2
Gegenbauer: C λn ðxÞ d λ d λ λ λ - 12 [-1, +1]
ð1 - x2 Þ dx 2 C n ðxÞ - ð2λ þ 1Þx dx C n ðxÞ þ nðn þ 2λÞC n ðxÞ = 0 ð 1 - x2 Þ
λ > - 12
2
Legendre: Pn(x) d d 1 [-1, +1]
ð1 - x2 Þ dx 2 Pn ðxÞ - 2x dx Pn ðxÞ þ nðn þ 1ÞPn ðxÞ = 0
451
452 10 Introductory Green’s Functions

that characterize the individual physical systems. The related fields have wide
applications in many branches of natural science.
After having converted the operator to the self-adjoint form; i.e., Lx = Lx{, instead
of (10.100) we have
s
dxwðxÞfv ðLx uÞ - ½Lx v ug = 0: ð10:230Þ
r

Rewriting it, we get


s s
dxv ½wðxÞLx u = dx½wðxÞLx v u: ð10:231Þ
r r

If we use an inner product notation described by (10.88), we get

hvjLx ui = hLx vjui: ð10:232Þ

Here let us think of two eigenfunctions ψ i and ψ j that belong to an eigenvalue λi


and λj, respectively. That is,

wðxÞLx ψ i = λi wðxÞψ i and wðxÞLx ψ j = λj wðxÞψ j : ð10:233Þ

Inserting ψ i and ψ j into u and v, respectively, in (10.232), we have

ψ j jLx ψ i = ψ j jλi ψ i = λi ψ j jψ i = λj  ψ j jψ i = λj ψ j jψ i
= Lx ψ j jψ i : ð10:234Þ

With the second and third equalities, we have used a rule of the inner product (see
Parts I and III). Therefore, we get

λi - λj  ψ j jψ i = 0: ð10:235Þ

Putting i = j in (10.235), we get

ðλi - λi  Þhψ i jψ i i = 0: ð10:236Þ

An inner product hψ i| ψ ii vanishes if and only if jψ ii  0; see inner product


calculation rules of Sect. 13.1. However, j ψ ii  0 is not acceptable as a physical
state. Therefore, we must have hψ i| ψ ii ≠ 0. Thus, we get

λi - λi  = 0 or λi = λi  : ð10:237Þ
References 453

The relation (10.237) obviously indicates that λi is real; i.e., we find that eigen-
values of an Hermitian operator are real. If λi ≠ λj = λj, from (10.235) we get

ψ j jψ i = 0: ð10:238Þ

That is, | ψ ji and | ψ ii are orthogonal to each other.


We often encounter related orthogonality relationship between vectors and func-
tions. We saw several cases in Part I and will see other cases in Part III.

References

1. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York


2. Stakgold I (1998) Green’s functions and boundary value problems, 2nd edn. Wiley, New York
Part III
Linear Vector Spaces

In this part, we treat vectors and their transformations in linear vector spaces so that
we can address various aspects of mathematical physics systematically but intui-
tively. We outline the general principles of linear vector spaces mostly from an
algebraic point of view. Starting with abstract definition and description of vectors,
we deal with their transformation in a vector space using a matrix. An inner product
is a central concept in the theory of a linear vector space so that two vectors can be
associated with each other to yield a scalar. Unlike many books of linear algebra and
linear vector spaces, however, we describe canonical forms of matrices before
considering the inner product. This is because we can treat the topics in light of
the abstract theory of matrices and vector space without a concept of the inner
product. Of the canonical forms of matrices, Jordan canonical form is of paramount
importance. We study how it is constructed providing a tangible example.
In relation to the inner product space, normal operators such as Hermitian
operators and unitary operators frequently appear in quantum mechanics and elec-
tromagnetism. From a general aspect, we revisit the theory of Hermitian operators
that often appeared in both Parts I and II.
The last part deals with exponential functions of matrices. The relevant topics can
be counted as one of the important branches of applied mathematics. Also, the
exponential functions of matrices play an essential role in the theory of continuous
groups that will be dealt with in Parts IV and V.
Chapter 11
Vectors and Their Transformation

In this chapter, we deal with the theory of finite-dimensional linear vector spaces.
Such vector spaces are spanned by a finite number of linearly independent vectors,
namely, basis vectors. In conjunction with developing an abstract concept and
theory, we mention a notion of mapping among mathematical elements. A linear
transformation of a vector is a special kind of mapping. In particular, we focus on
endomorphism within a n-dimensional vector space Vn. Here, the endomorphism is
defined as a linear transformation: Vn → Vn. The endomorphism is represented by a
(n, n) square matrix. This is most often the case with physical and chemical
applications, when we deal with matrix algebra. In this book we focus on this type
of transformation.
A non-singular matrix plays an important role in the endomorphism. In this
connection, we consider its inverse matrix and determinant. All these fundamental
concepts supply us with a sufficient basis for better understanding of the theory of
the linear vector spaces. Through these processes, we should be able to get
acquainted with connection between algebraic and analytical approaches and gain
a broad perspective on various aspects of mathematical physics and related fields.

11.1 Vectors

From both fundamental and practical points of view, it is desirable to define linear
vector spaces in an abstract way. Suppose that V is a set of elements denoted by a, b,
c, etc. called vectors. The set V is a linear vector space (or simply a vector space), if a
sum a + b 2 V is defined for any pair of vectors a and b and the elements of V satisfy
the following mathematical relations:

ða þ bÞ þ c = a þ ðb þ cÞ, ð11:1Þ

© Springer Nature Singapore Pte Ltd. 2023 457


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_11
458 11 Vectors and Their Transformation

a þ b = b þ a, ð11:2Þ

a þ 0 = a, ð11:3Þ

a þ ð- aÞ = 0: ð11:4Þ

For the above 0 is called the zero vector. Furthermore, for a 2 V, ca 2 V is defined
(c is a complex number called a scalar) and we assume the following relations among
vectors (a and b) and scalars (c and d ):

ðcd Þa = cðdaÞ, ð11:5Þ

1a = a, ð11:6Þ

cða þ bÞ = ca þ cb, ð11:7Þ

ðc þ dÞa = ca þ da: ð11:8Þ

On the basis of the above relations, we can construct the following expression
called a linear combination:

c1 a1 þ c 2 a2 þ ⋯ þ c n an :

If this linear combination is equated to zero, we obtain

c1 a1 þ c2 a2 þ ⋯ þ cn an = 0: ð11:9Þ

If (11.9) holds only in the case where every ci = 0 (1 ≤ i ≤ n), the vectors a1, a2, ⋯,
an are said to be linearly independent. In this case the relation represented by (11.9)
is said to be trivial. If the relation is non-trivial (i.e., ∃ci ≠ 0), those vectors are said to
be linearly dependent.
If in the vector space V the maximum number of linearly independent vectors is
n, V is said to be an n-dimensional vector space and sometimes denoted by Vn. In this
case any vector x of Vn is expressed uniquely as a linear combination of linearly
independent vectors such that

x = x 1 a1 þ x 2 a2 þ ⋯ þ x n an : ð11:10Þ

Suppose x is denoted by
11.1 Vectors 459

x = x01 a1 þ x02 a2 þ ⋯ þ x0n an : ð11:11Þ

Subtracting both sides of (11.11) from (11.10), we obtain

0 = x1 - x01 a1 þ x2 - x02 a2 þ ⋯ þ xn - x0n an : ð11:12Þ

Linear independence of the vectors a1, a2, ⋯, an implies


xn - x0n = 0; i:e:, xn = x0n ð1 ≤ i ≤ nÞ. These n linearly independent vectors are referred
to as basis vectors. A vector space that has a finite number of basis vectors is called
finite-dimensional; otherwise, it is infinite-dimensional.
Alternatively, we express (11.10) as

x1
x = ða1 ⋯an Þ ⋮ : ð11:13Þ
xn

x1
A set of coordinates ⋮ is called a column vector (or a numerical vector) that
xn
indicates an “address” of the vector x with respect to the basis vectors a1, a2, ⋯, an.
These vectors are represented as a row vector in (11.13). Any vector in Vn can be
expressed as a linear combination of the basis vectors and, hence, we say that Vn is
spanned by a1, a2, ⋯, an. This is represented as

V n = Spanfa1 , a2 , ⋯, an g: ð11:14Þ

Let us think of a subset W of Vn (i.e., W ⊂ Vn). If the following relations hold for
W, W is said to be a (linear) subspace of Vn.

a, b 2 W ) a þ b 2 W,
ð11:15Þ
a 2 W ) ca 2 W:

These two relations ensure that the relations of (11.1)-(11.8) hold for W as well.
The dimension of W is equal to or smaller than n. For instance, W = Span{a1, a2, ⋯,
ar} (r ≤ n) is a subspace of Vn. If r = n, W = Vn. Suppose that there are two
subspaces W1 = Span{a1} and W2 = Span{a2}. Note that in this case W1 [ W2 is not
a subspace, because W1 [ W2 does not contain a1 + a2. However, a set U defined by

U = x = x 1 þ x2 ; 8 x1 2 W 1 , 8 x2 2 W 2 ð11:16Þ

is a subspace of Vn. We denote this subspace by W1 + W2.


To show this is in fact a subspace, suppose that x, y 2 W1 + W2. Then, we may
express x = x1 + x2 and y = y1 + y2, where x1, y1 2 W1; x2, y2 2 W2. We have
x + y = (x1 + y1) + (x2 + y2), where x1 + y1 2 W1 and x2 + y2 2 W2 because both W1
and W2 are subspaces. Therefore, x + y 2 W1 + W2. Meanwhile, with any scalar
460 11 Vectors and Their Transformation

c, cx = cx1 + cx2 2 W1 + W2. By definition (11.15), W1 + W2 is a subspace


accordingly. Suppose here x1 2 W1. Then, x1 = x1 + 0 2 W1 + W2. Then,
W1 ⊂ W1 + W2. Similarly, we have W2 ⊂ W1 + W2. Thus, W1 + W2 contains both
W1 and W2. Conversely, let W be an arbitrary subspace that contains both W1 and W2.
Then, we have 8x1 2 W1 ⊂ W and 8x2 2 W2 ⊂ W and, hence, we have x1 + x2 2 W by
definition (11.15). But, from (11.16) W1 + W2 = {x1 + x2; 8x1 2 W1, 8x2 2 W2}.
Hence, W1 + W2 ⊂ W. Consequently, any subspace necessarily contains W1 + W2.
This implies that W1 + W2 is the smallest subspace that contains both W1 and W2.
Example 11.1 Consider a three-dimensional Cartesian space ℝ3 (Fig. 11.1). We
regard the xy-plane and yz-plane as a subspace W1 and W2, respectively, and
! ! !
ℝ3 = W1 + W2. In Fig. 11.1a, a vector OB (in ℝ3) is expressed as OA þ AB (i.e.,
!
a sum of a vector in W1 and that in W2). Alternatively, the same vector OB can be
! !
expressed as OA0 þ A0 B . Conversely, we can designate a subspace in a different
way; i.e., in Fig. 11.1b the z-axis is chosen for a subspace W3 instead of W2. We have
!
ℝ3 = W1 + W3 as well. In this case, however, OB is uniquely expressed as
! ! !
OB = OP þ PB . Notice that in Fig. 11.1a W1 \ W2 = Span{e2}, where e2 is a
unit vector in the positive direction of the y-axis. In Fig. 11.1b, however, we have
W1 \ W3 = {0}.
We can generalize this example to the following theorem.
Theorem 11.1 Let W1 and W2 be subspaces of V and V = W1 + W2. Then a vector x
in V is uniquely expressed as

x = x 1 þ x2 ; x 1 2 W 1 , x 2 2 W 2 ,

if and only if W1 \ W2 = {0}.


Proof Suppose W1 \ W2 = {0} and x = x1 þ x2 = x01 þ x02 , x1 , x01 2 W 1 , x2 , x02 2 W 2 .
Then x1 - x01 = x02 - x2 . LHS belongs to W1 and RHS belongs to W2. Both sides
belong to W1 \ W2 accordingly. Hence, from the supposition both the sides should be
equal to a zero vector. Therefore, x1 = x01 , x2 = x02 . This implies that x is expressed
uniquely as x = x1 + x2. Conversely, suppose the vector representation (x = x1 + x2)
is unique and x 2 W1 \ W2. Then, x = x + 0 = 0 + x; x,0 2 W1 and x,0 2 W2.
Uniqueness of the representation implies that x = 0. Consequently, W1 \ W2 = {0} fol-
lows. This completes the proof.
In case W1 \ W2 = {0}, V = W1 + W2 is said to be a direct sum of W1 and W2 or
we say that V is decomposed into a direct sum of W1 and W2. We symbolically
denote this by

V = W 1 ⨁W 2 : ð11:17Þ

In this case, the following equality holds:


11.1 Vectors 461

(a)
z z

B B

O O
y y

A A’
x x

(b) z

2
y

3
x
Fig. 11.1 Decomposition of a vector in a three-dimensional Cartesian space ℝ3 into two subspaces.
(a) ℝ3 = W1 + W2; W1 \ W2 = Span{e2}, where e2 is a unit vector in the positive direction of the y-
axis. (b) ℝ3 = W1 + W3; W1 \ W3 = {0}

dim V = dimW 1 þ dimW 2 , ð11:18Þ

where “dim” stands for dimension of the vector space considered. To prove (11.18),
we suppose that V is a n-dimensional vector space and that W1 and W2 are spanned
by r1 and r2 linearly independent vectors, respectively, such that
462 11 Vectors and Their Transformation

ð1Þ ð1Þ ð2Þ ð2Þ


W 1 = Span e1 , e2 , ⋯, eðr11 Þ and W 2 = Span e1 , e2 , ⋯, eðr22 Þ : ð11:19Þ

This is equivalent to that dimension of W1 and W2 is r1 and r2, respectively. If


V = W1 + W2 (here we do not assume that the summation is a direct sum), we have

ð1Þ ð1Þ ð2Þ ð2Þ


V = Span e1 , e2 , ⋯, erð11 Þ , e1 , e2 , ⋯, eðr22 Þ : ð11:20Þ

Then, we have n ≤ r1 + r2. This is almost trivial. Suppose r1 + r2 < n. Then, these
(r1 + r2) vectors cannot span V, but we need additional vectors for all vectors
including the additional vectors to span V. Thus, we should have n ≤ r1 + r2
accordingly. That is,

dim V ≤ dimW 1 þ dimW 2 : ð11:21Þ

ð2Þ
Now, let us assume that V = W1 ⨁ W2. Then, ei ð1 ≤ i ≤ r 2 Þ must be linearly
ð1Þ ð1Þ ð2Þ
independent of e1 , e2 , ⋯, erð11 Þ . If not, ei could be described as a linear combina-
ð1Þ ð1Þ ð2Þ ð2Þ
tion of e1 , e2 , ⋯, eðr11 Þ . But this would imply that ei 2 W 1 , i.e., ei 2 W 1 \ W 2 ,
in contradiction to that we have V = W1 ⨁ W2. It is because W1 \ W2 = {0} by
ð1Þ ð2Þ ð2Þ
assumption. Likewise, ej ð1 ≤ j ≤ r 1 Þ is linearly independent of e1 , e2 , ⋯, eðr22 Þ .
ð1Þ ð1Þ ð2Þ ð2Þ
Hence, e1 , e2 , ⋯, eðr11 Þ , e1 , e2 , ⋯, eðr22 Þ must be linearly independent. Thus,
n ≥ r1 + r2. This is because in the vector space V we may well have additional
vector(s) that are independent of the above (r1 + r2) vectors. Meanwhile, n ≤ r1 + r2
from the above. Consequently, we must have n = r1 + r2. Thus, we have proven that

V = W 1 ⨁W 2 ⟹ dim V = dimW 1 þ dimW 2 :

Conversely, suppose n = r1 + r2. Then, any vector x in V is expressed uniquely as

ð1Þ ð2Þ
x = a1 e1 þ ⋯ þ ar1 erð11 Þ þ b1 e1 þ ⋯ þ br2 eðr22 Þ : ð11:22Þ

The vector described by the first term is contained in W1 and that described by the
second term in W2. Both the terms are again expressed uniquely. Therefore, we get
V = W1 ⨁ W2. This is a proof of

dim V = dimW 1 þ dimW 2 ⟹ V = W 1 ⨁W 2 :

The above statements are summarized as the following theorem:


Theorem 11.2 Let V be a vector space and let W1 and W2 be subspaces of V. Also
suppose V = W1 + W2. Then, the following relation holds:
11.2 Linear Transformations of Vectors 463

dim V ≤ dimW 1 þ dimW 2 : ð11:21Þ

Furthermore, we have

dim V = dimW 1 þ dimW 2 , ð11:23Þ

if and only if V = W1 ⨁ W2.


Theorem 11.2 is readily extended to the case where there are three or more
subspaces. That is, having W1, W2, ⋯, Wm so that V = W1 + W2⋯ + Wm, we obtain
the following relation:

dim V ≤ dimW 1 þ dimW 2 þ ⋯ þ dimW m : ð11:24Þ

The equality of (11.24) holds if and only if V = W1 ⨁ W2 ⨁ ⋯ ⨁ Wm.


In light of Theorem 11.2, Example 11.1 says that
3 = dim ℝ3 < 2 + 2 = dim W1 + dim W2. But dim ℝ3 = 2 + 1 = dim W1 + dim W3.
Therefore, we have ℝ3 = W1 ⨁ W3.

11.2 Linear Transformations of Vectors

In the previous section we introduced vectors and their calculation rules in a linear
vector space. It is natural and convenient to relate a vector to another vector, as a
function f relates a number (either real or complex) to another number such that
y = f(x), where x and y are certain two numbers. A linear transformation from the
vector space V to another vector space W is a mapping A : V → W such that

Aðca þ dbÞ = cAðaÞ þ dAðbÞ: ð11:25Þ

We will briefly discuss the concepts of mapping at the end of this section.
It is convenient to define addition of the linear transformations. It is defined as

ðA þ BÞa = Aa þ Ba, ð11:26Þ

where a is any vector in V.


Since (11.25) is a broad but abstract definition, we begin with a well-known
simple example of rotation of a vector within a xy-plane (Fig. 11.2). We denote an
arbitrary position vector x in the xy-plane by
464 11 Vectors and Their Transformation

Fig. 11.2 Rotation of a


vector x within a xy-plane

T
T

x = xe1 þ ye2
x ð11:27Þ
= ð e1 e2 Þ ,
y

where e1 and e2 are unit basis vectors in the xy-plane and x and y are coordinates of
the vector x in reference to e1 and e2. The expression (11.27) is consistent with
(11.13). The rotation represented in Fig. 11.2 is an example of a linear transforma-
tion. We call this rotation R. According to the definition

Rðxe1 þ ye2 Þ = RðxÞ = xRðe1 Þ þ yRðe2 Þ: ð11:28Þ

Putting

Rðxe1 þ ye2 Þ = x0 , Rðe1 Þ = e01 and Rðe2 Þ = e02 ,

we have

x0 = xe01 þ ye02
x ð11:29Þ
= ð e01 e02 Þ :
y

From Fig. 11.2 we readily obtain

e01 = e1 cos θ þ e2 sin θ,


ð11:30Þ
e02 = - e1 sin θ þ e2 cos θ:

Using a matrix representation,


11.2 Linear Transformations of Vectors 465

cos θ - sin θ
e1′ e2′ = ðe1 e2 Þ : ð11:31Þ
sin θ cos θ

Substituting (11.30) into (11.29), we obtain

x0 = ðxcos θ - y sin θÞe1 þ ðxsin θ þ y cos θÞe2 : ð11:32Þ

Meanwhile, x′ can be expressed relative to the original basis vectors e1 and e2.

x0 = x0 e1 þ y 0 e2 : ð11:33Þ

Comparing (11.32) and (11.33), uniqueness of the representation ensures that

x0 = x cos θ - y sin θ,
ð11:34Þ
y0 = x sin θ þ y cos θ:

Using a matrix representation once again,

x′ cos θ - sin θ x
= : ð11:35Þ
y′ sin θ cos θ y

Further combining (11.29) and (11.31), we get

cos θ - sin θ x
x ′ = RðxÞ = ðe1 e2 Þ : ð11:36Þ
sin θ cos θ y

The above example demonstrates that the linear transformation R has a (2, 2)
matrix representation shown in (11.36). Moreover, this example obviously shows
that if a vector is expressed as a linear combination of the basis vectors, the
“coordinates” (represented by a column vector) can be transformed as well by the
same matrix.
Regarding an abstract n-dimensional linear vector space Vn, the linear vector
transformation A is given by

a11 ⋯ a1n x1
A ð xÞ = ð e1 ⋯ en Þ ⋮ ⋱ ⋮ ⋮ , ð11:37Þ
an1 ⋯ ann xn

where e1, e2, ⋯, and en are basis vectors and x1, x2, ⋯, and xn are the corresponding
n
coordinates of a vector x = xi ei . We assume that the transformation is a mapping
i=1
A: Vn → Vn (i.e., endomorphism). In this case the transformation is represented by a
(n, n) matrix. Note that the matrix operates on the basis vectors from the right and
that it operates on the coordinates (i.e., a column vector) from the left. In (11.37) we
often omit a parenthesis to simply write Ax.
466 11 Vectors and Their Transformation

Here we mention matrix notation for later convenience. We often identify a linear
vector transformation with its representation matrix and denote both transformation
and matrix by A. On this occasion, we write

a11 ⋯ a1n
A= ⋮ ⋱ ⋮ , A = ðAÞij = aij , etc:, ð11:38Þ
an1 ⋯ ann

where with the second expression (A)ij and (aij) represent the matrix A itself; for we
~ etc. The notation (11.38)
frequently use indexed matrix notations such as A-1, A{, A,
can conveniently be used in such cases. Note moreover that aij represents the matrix
A as well.
Equation (11.37) has duality such that the matrix A operates either on the basis
vectors or coordinates. This can explicitly be written as

a11 ⋯ a1n x1
A ð xÞ = ð e1 ⋯ e n Þ ⋮ ⋱ ⋮ ⋮
an1 ⋯ ann xn
ð11:39Þ
a11 ⋯ a1n x1
= ð e1 ⋯ en Þ ⋮ ⋱ ⋮ ⋮ :
an1 ⋯ ann xn

That is, we assume the associative law with the above expression. Making
summation representation, we have
n
x1 l = 1 a1l xl
n n
AðxÞ = ea
k = 1 k k1
⋯ ea
k = 1 k kn
⋮ = ð e 1 ⋯ en Þ ⋮
n
xn l = 1 anl xl
n n
= k=1
ea x
l = 1 k kl l
n n n n
= l=1
ea
k = 1 k kl
xl = k=1
a x
l = 1 kl l
ek :

That is, the above equation can be viewed in either of two ways, i.e., coordinate
transformation with fix vectors or vector transformation with fixed coordinates.
Also, (11.37) can formally be written as

x1 x1
AðxÞ = ðe1 ⋯ en ÞA ⋮ = ðe1 A ⋯ en AÞ ⋮ , ð11:40Þ
xn xn
11.2 Linear Transformations of Vectors 467

where we assumed that the distributive law holds with operation of A on (e1⋯en).
Meanwhile, if in (11.37) we put xi = 1, xj = 0 ( j ≠ i), we get
n
A ð ei Þ = ea :
k = 1 k ki

Therefore, (11.39) can be rewritten as

x1
AðxÞ = ðAðe1 Þ ⋯ Aðen ÞÞ ⋮ : ð11:41Þ
xn

Since xi (1 ≤ i ≤ n) can arbitrarily be chosen, comparing (11.40) and (11.41)


we have

ei A = Aðei Þ ð1 ≤ i ≤ nÞ: ð11:42Þ

The matrix representation is unique in reference to the same basis vectors.


Suppose that there is another matrix representation of the transformation A such that

a011 ⋯ a01n x1
AðxÞ = ðe1 ⋯ en Þ ⋮ ⋱ ⋮ ⋮ : ð11:43Þ
a0n1 ⋯ a0nn xn

Subtracting (11.43) from (11.37), we obtain

a11 ⋯ a1n x1 a011 ⋯ a01n x1


ðe1 ⋯ en Þ ⋮ ⋱ ⋮ ⋮ - ⋮ ⋱ ⋮ ⋮
an1 ⋯ ann xn a0n1 ⋯ a0nn xn
n
k=1 a1k - a01k xk
= ð e1 ⋯ en Þ ⋮ = 0:
n
k=1 ank - a0nk xk
n
On the basis of the linear dependence of the basis vectors, aik - a0ik xk =
k=1
0 ð1 ≤ i ≤ nÞ: This relationship holds for any arbitrarily and independently chosen
complex numbers xi (1 ≤ i ≤ n). Therefore, we must have aik = a0ik ð1 ≤ i, k ≤ nÞ,
meaning that the matrix representation of A is unique with regard to fixed basis
vectors.
Nonetheless, if a set of vectors e1, e2, ⋯, and en does not constitute basis vectors
(i.e., those vectors are linearly dependent), the aforementioned uniqueness of the
matrix representation loses its meaning. For instance, in V2 take vectors e1 and e2
such that e1 = e2 (i.e., the two vectors are linearly dependent) and let the
468 11 Vectors and Their Transformation

transformation matrix be B = 1
0
0
2
. This means that e1 should be e1 after the
transformation. At the same time, the vector e2 (=e1) should be converted to
2e2 (=2e1). It is impossible except for the case of e1 = e2 = 0. The above matrix
B in its own right is an object of matrix algebra, of course.
Putting a = b = 0 and c = d = 1 in the definition of (11.25) for the linear
transformation, we obtain

Að0Þ = Að0Þ þ Að0Þ:

Combining this relation with (11.4) gives

Að0Þ = 0: ð11:44Þ

Then do we have a vector u ≠ 0 for which A(u) = 0? An answer is yes. This is


1 0
because if a (2, 2) matrix of 0 0
is chosen for R, we get a linear transformation
such that

e01 e02 = ðe1 e2 Þ 1


0
0
0
= ðe1 0Þ:

That is, we have R(e2) = 0.


In general, vectors x (2V ) satisfying A(x) = 0 form a subspace in a vector space
V. This is because A(x) = 0 and A(y) = 0 ⟹ A(x + y) = A(x) + A(y) = 0,
A(cx) = cA(x) = 0. We call this subspace of a maximum dimension a null-space
and represent it as Ker A, where Ker stands for “kernel.” In other words,
Ker A = A-1(0). Note that this symbolic notation does not ensure the existence of
the inverse transformation A-1 (vide infra) but represents a set comprising elements
x that satisfy A(x) = 0. In the above example Ker R = Span{e2}.
Let us introduce frequently used terminology and notation. First, suppose that
with a vector space Vn an endomorphism A of Vn exists. That is, we have

AðV n Þ  AðxÞ; 8 x 2 V n , AðxÞ 2 V n :

Then, A(Vn) forms a subspace of Vn. In fact, for any x, y 2 Vn, we have A(x),
A(y) 2 A(Vn) ⟹ A(x) + A(y) = A(x + y) 2 A(Vn) and cA(x) = A(cx) 2 A(Vn). Since
A is an endomorphism, obviously A(Vn) ⊂ Vn. Hence, A(Vn) is a subspace of Vn. The
subspace A(Vn) is said to be an image of transformation A and sometimes denoted by
Im A.
The number of dimA(Vn) is said to be a rank of the linear transformation A. That
is, we write
11.2 Linear Transformations of Vectors 469

dimAðV n Þ = rank A:

Also the number of dim Ker A is said to be a nullity of the linear transformation A.
That is, we have

dim Ker A = nullity A:

With respect to the two subspaces A(Vn) and Ker A, we show an important
theorem (the so-called dimension theorem) below.
Theorem 11.3 Dimension Theorem Let Vn be a linear vector space of dimension
n. Also let A be an endomorphism: Vn → Vn. Then we have

dim V n = dimAðV n Þ þ dim Ker A: ð11:45Þ

Equation (11.45) can be written succinctly as

dim V n = rank A þ nullity A:

Proof Let e1, e2, ⋯, and en be basis vectors of Vn. First, assume

Aðe1 Þ = Aðe2 Þ = ⋯ = Aðen Þ = 0:

n n
This implies that nullity A = n. Then, A xi ei = xi Aðei Þ = 0. Since xi is
i=1 i=1
8
arbitrarily chosen, the expression means that A(x) = 0 for x 2 Vn. This implies
A = 0. That is, rank A = 0. Thus, (11.45) certainly holds.
n
To proceed with proof of the theorem, we think of a linear combination ci ei .
i=1
Next, assume that Ker A = Span{e1, e2, ⋯, eν} (ν < n); dim Ker A = ν. After A is
n
operated on the above linear combination, we are left with ci Aðei Þ. We put
i = νþ1

n n
c Aðei Þ =
i=1 i
c Aðei Þ = 0:
i = νþ1 i
ð11:46Þ

Suppose that the (n - ν) vectors A(ei) (ν + 1 ≤ i ≤ n) are linearly dependent. Then


without loss of generality we can assume cν + 1 ≠ 0. Dividing (11.46) by cν + 1 we
obtain
470 11 Vectors and Their Transformation

cνþ2 c
Aðeνþ1 Þ þ Aðeνþ2 Þ þ ⋯ þ n Aðen Þ = 0,
cνþ1 cνþ1
c cn
Aðeνþ1 Þ þ A νþ2 eνþ2 þ ⋯ þ A e = 0,
cνþ1 cνþ1 n
cνþ2 c
A eνþ1 þ e þ ⋯ þ n en = 0:
cνþ1 νþ2 cνþ1

Meanwhile, the (ν + 1) vectors e1 , e2 , ⋯, eν , and eνþ1 þ ccνþ2


νþ1
eνþ2 þ ⋯ þ cνþ1
cn
en
are linearly independent, because e1, e2, ⋯, and en are basis vectors of V . This
n

would imply that the dimension of Ker A is ν + 1, but this is in contradiction to


Ker A = Span{e1, e2, ⋯, eν}. Thus, the (n - ν) vectors A(ei) (ν + 1 ≤ i ≤ n) should be
linearly independent.
Let Vn - ν be described as

V n - ν = SpanfAðeνþ1 Þ, Aðeνþ2 Þ, ⋯, Aðen Þg: ð11:47Þ

Then, Vn - ν is a subspace of Vn, and so dimA(Vn) ≥ n - ν = dim Vn - ν


.
Meanwhile, from (11.46) we have

n
A ce
i = νþ1 i i
= 0:

From the above discussion, however, this relation holds if and only if
cν + 1 = ⋯ = cn = 0. This implies that Ker A \ Vn - ν = {0}. Then, we have

V n - ν þ Ker A = V n - ν Ker A:

Furthermore, from Theorem 11.2 we have

dim V n - ν Ker A = dimV n - ν þ dim Ker A = ðn - νÞ þ ν = n = dimV n :

Thus, we must have dimA(Vn) = n - ν = dim Vn - ν. Since Vn - ν is a subspace of


V and Vn - ν ⊂ A(Vn) from (11.47), Vn - ν = A(Vn).
n

To conclude, we get

dimAðV n Þ þ dim Ker A = dimV n ,


ð11:48Þ
V n = A ðV n Þ Ker A:

This completes the proof.


Comparing Theorem 11.3 with Theorem 11.2, we find that Theorem 11.3 is a
special case of Theorem 11.2. Equations (11.45) and (11.48) play an important role
11.2 Linear Transformations of Vectors 471

in the theory of linear vector space. We add that Theorem 11.3 holds with a linear
transformation that connects two vector spaces with different dimensions [1–3].
As an exercise, we have a following example:
Example 11.2 Let e1, e2, e3, e4 be basis vectors of V4. Let A be an endomorphism of
V4 and described by

1 0 1 0
A= 0
1
1
0
1
1
0
0 :
0 1 1 0

We have

1 0 1 0
0 1 1 0
ð e1 e 2 e 3 e4 Þ = ð e1 þ e3 e2 þ e4 e1 þ e2 þ e3 þ e4 0 Þ:
1 0 1 0
0 1 1 0

That is,

A ð e1 Þ = e 1 þ e 3 , Aðe2 Þ = e2 þ e4 , A ð e3 Þ = e 1 þ e 2 þ e3 þ e4 ,
Aðe4 Þ = 0:

We have

Að- e1 - e2 þ e3 Þ = - Aðe1 Þ - Aðe2 Þ þ Aðe3 Þ = 0:

Then, we find

A V 4 = Spanfe1 þ e3 , e2 þ e4 g, Ker A = Spanf - e1 - e2 þ e3 , e4 g: ð11:49Þ

For any x 2 Vn, using scalar ci (1 ≤ i ≤ 4), we have

x = c 1 e1 þ c 2 e2 þ c3 e3 þ c4 e4
1 1
= ðc1 þ c3 Þðe1 þ e3 Þ þ ð2c2 þ c3 - c1 Þðe2 þ e4 Þ ð11:50Þ
2 2
1 1
þ ðc3 - c1 Þð - e1 - e2 þ e3 Þ þ ðc1 - 2c2 - c3 þ 2c4 Þe4 :
2 2

Thus, x has been uniquely represented as (11.50) with respect to basis vectors
(e1 + e3), (e2 + e4), (-e1 - e2 + e3), and e4. The linear independence of these vectors
can easily be checked by equating (11.50) with zero. We also confirm that
472 11 Vectors and Their Transformation

Fig. 11.3 Concept of


mapping from a set X to
injective
another set Y

surjective

bijective: injective + surjective

V4 = A V4 Ker A:

In the present case, (11.45) is read as

dim V 4 = dimA V 4 þ dim Ker A = 2 þ 2 = 4:

Linear transformation is a kind of mapping. Figure 11.3 depicts the concept of


mapping. Suppose two sets of X and Y. The mapping f is a correspondence between
an element x (2X) and y (2Y ). The set f(X) (⊂Y ) is said to be a range of f. (1) The
mapping f is injective: If x1 ≠ x2 ⟹ f(x1) ≠ f(x2). (2) The mapping is surjective:
f(X) = Y. For 8y 2 Y corresponding element(s) x 2 X exist(s). (3) The mapping is
bijective: If the mapping f is both injective and surjective, it is said to be bijective
(or reversible mapping or invertible mapping). A mapping that is not invertible is
said to be a non-invertible mapping.
If the mapping f is bijective, a unique element ∃x 2 X exists for 8y 2 Y such that
f(x) = y. In terms of solving an equation, we say that with any given y we can find a
unique solution x to the equation f(x) = y. In this case x is said to be an
inverse element to y and this is denoted by x = f -1( y). The mapping f -1 is called
an inverse mapping. If the linear transformation is relevant, the mapping is said to be
an inverse transformation.
Here we focus on a case where both X and Y form a vector space and the mapping
is an endomorphism. Regarding the linear transformation A: Vn → Vn (i.e., an
endomorphism of Vn), we have a following important theorem:
Theorem 11.4 Let A: Vn → Vn be an endomorphism of Vn. A necessary and
sufficient condition for the existence of an inverse transformation to A (i.e., A-1) is
A-1(0) = {0}.
11.3 Inverse Matrices and Determinants 473

Proof Suppose A-1(0) = {0}. Then A(x1) = A(x2) ⟺ A(x1 - x2) = 0 ⟺ x1 -


x2 = 0; i.e., x1 = x2. This implies that the transformation A is injective. Other way
round, suppose that A is injective. If A-1(0) ≠ {0}, there should be b (≠0) with which
A(b) = 0. This is, however, in contradiction to that A is injective. Then we must have
A-1(0) = {0}. Thus, A-1(0) = {0} ⟺ A is injective.
Meanwhile, A-1(0) = {0} ⟺ dim A-1(0) = 0 ⟺ dim A(Vn) = n (due to
Theorem 11.3); i.e., A(Vn) = Vn. Then A-1(0) = {0} ⟺ A is surjective. Combining
this with the above-mentioned statement, we have A-1(0) = {0} ⟺ A is bijective.
This statement is equivalent to that an inverse transformation exists. This completes
the proof.
In the proof of Theorem 11.4, to show that A-1(0) = {0} ⟺ A is surjective, we
have used the dimension theorem (Theorem 11.3), for which the relevant vector
space is finite (i.e., n-dimensional). In other words, that A is surjective is equivalent
to that A is injective with a finite-dimensional vector space and vice versa. To
conclude, so far as we are thinking of the endomorphism of a finite-dimensional
vector space, if we can show it is either injective or surjective, the other necessarily
follows and, hence, the mapping is bijective.

11.3 Inverse Matrices and Determinants

The existence of the inverse transformation plays a particularly important role in the
theory of linear vector spaces. The inverse transformation is a linear transformation.
Let x1 = A-1(y1), x2 = A-1(y2). Also, we have A(c1x1 + c2x2) = c1A(x1) + c2A(x2) =
c1y1 + c2y2. Thus, c1x1 + c2x2 = A-1(c1y1 + c2y2) = c1A-1(y1) + c2A-1(y2), showing
that A-1 is a linear transformation. As already mentioned, a matrix that represents a
linear transformation A is uniquely determined with respect to fixed basis vectors.
This should be the case with A-1 accordingly. We have an important theorem
for this.
Theorem 11.5 [4] The necessary and sufficient condition for the matrix A-1 that
represents the inverse transformation to A to exist is that detA ≠ 0 (“det” means a
determinant). Here the matrix A represents the linear transformation A. The matrix
A-1 is uniquely determined and given by

A-1 ij
= ð- 1Þiþj ðM Þji =ðdetAÞ, ð11:51Þ

where (M)ij is the minor with respect to the element Aij.


Proof First, we suppose that the matrix A-1 exists so that it satisfies the following
relation:
474 11 Vectors and Their Transformation

n
k=1
A-1 ik
ðAÞkj = δij ð1 ≤ i, j ≤ nÞ: ð11:52Þ

On this condition, suppose that detA = 0. From the properties of determinants,


this implies that one of the columns of A (let it be the m-th column) can be expressed
as a linear combination of the other columns of A such that

Akm = A c:
j ≠ m kj j
ð11:53Þ

Putting i = m in (11.52), multiplying by cj, and summing over j ≠ m, we get


n
k=1
A-1 mk
A c
j ≠ m kj j
= δ c
j ≠ m mj j
= 0: ð11:54Þ

From (11.52) and (11.53), however, we obtain

n
k=1
A-1 mk
A c
j ≠ m kj j
= A-1 ik
ðAÞkm = 1: ð11:55Þ
i=m

There is the inconsistency between (11.54) and (11.55). The inconsistency


resulted from the supposition that the matrix A-1 exists. Therefore, we conclude
that if detA = 0, A-1 does not exist. Taking contraposition to the above statement, we
say that if A-1 exists, detA ≠ 0. Suppose next that detA ≠ 0. In this case, on the basis
of the well-established result, a unique A-1 exists and it is given by (11.51). This
completes the proof.
Summarizing the characteristics of the endomorphism within a finite-dimensional
vector space, we have

injective ⟺ surjective ⟺ bijective ⟺ detA ≠ 0:

Let a matrix A be

a11 ⋯ a1n
A= ⋮ ⋱ ⋮ : ð11:56Þ
an1 ⋯ ann

The determinant of a matrix A is denoted by detA or by

a11 ⋯ a1n
⋮ ⋱ ⋮ :
an1 ⋯ ann

Here, the determinant is defined as


11.3 Inverse Matrices and Determinants 475

det A  1 2 ⋯ n εðσ Þa1i1 a2i2 ⋯anin , ð11:57Þ


σ=
i1 i2 ⋯ in

where σ means permutation among 1, 2, . . ., n and ε(σ) denotes a sign of + (in the
case of even permutations) or - (for odd permutations).
We deal with triangle matrices for future discussion. It is denoted by


a11
:
T= ⋱ , ð11:58Þ
:
0 ann

where an asterisk (*) means that upper right off-diagonal elements can take any
complex numbers (including zero). A large zero shows that all the lower left
off-diagonal elements are zero. Its determinant is given by

det T = a11 a22 ⋯ann : ð11:59Þ

In fact, focusing on anin we notice that only if in = n, anin does not vanish. Then
we get

det A  1 2 ⋯ n-1 εðσ Þa1i1 a2i2 an - 1in - 1 ann :


σ=
i1 i2 ⋯ in - 1

Repeating this process, we finally obtain (11.59).


The endomorphic linear transformation can be described succinctly as

AðxÞ = y, ð11:60Þ
n n
where we have vectors such that x = xi ei and y = yi ei . In reference to the
i=1 i=1
same set of basis vectors (e1 ⋯ en) and using a matrix representation, we have

a11 ⋯ a1n x1 y1
AðxÞ = ðe1 ⋯ en Þ ⋮ ⋱ ⋮ ⋮ = ð e1 ⋯ en Þ ⋮ : ð11:61Þ
an1 ⋯ ann xn yn

From the unique representation of a vector in reference to the basis vectors,


(11.61) is simply expressed as
476 11 Vectors and Their Transformation

a11 ⋯ a1n x1 y1
⋮ ⋱ ⋮ ⋮ = ⋮ : ð11:62Þ
an1 ⋯ ann xn yn

With a shorthand notation, we have


n
yi = a x
k = 1 ik k
ð1 ≤ i ≤ nÞ: ð11:63Þ

From the above discussion, for the linear transformation A to be bijective, det
A ≠ 0. In terms of the system of linear equations, we say that for (11.62) to have a
x1 y1
unique solution ⋮ for a given ⋮ , we must have detA ≠ 0. Conversely,
xn yn
detA = 0 is equivalent to that (11.62) has indefinite solutions or has no solution. As
far as the matrix algebra is concerned, (11.61) is symbolically described by omitting
a parenthesis as

Ax = y: ð11:64Þ

However, when the vector transformation is explicitly taken into account, the full
representation of (11.61) should be borne in mind.
The relations (11.60) and (11.64) can be considered as a set of simultaneous
equations. A necessary and sufficient condition for (11.64) to have a unique solution
x for a given y is det A ≠ 0. In that case, the solution x of (11.64) can be symbolically
expressed as

x = A - 1 y, ð11:65Þ

where A-1 represents an inverse matrix of A.


Here, we have a good place to supplement somewhat confusing but important
concepts of matrix algebra in relation to the inverse matrix. Let A be a (n, n) matrix
expressed as

a11 ⋯ a1n
A= ⋮ ⋱ ⋮ : ð11:56Þ
an1 ⋯ ann

We often need a matrix “derived” from A of (11.56) so that we may obtain (m, m)
matrix with m < n. Let us call such a matrix derived from A a submatrix A. Among
various types of submatrices A, we think of a special type of matrix Aði, jÞ that is
defined as a submatrix obtained by striking out the i-th row and j-th column. Then,
Aði, jÞ is a (n - 1, n - 1) submatrix.
We call detAði, jÞ a minor with respect to aij. Moreover, we define a cofactor Δij as

Δij  ð- 1Þiþj detAði, jÞ: ð11:66Þ


11.3 Inverse Matrices and Determinants 477

In (11.66), Δij is said to be a cofactor of aij; see Theorem 11.5. A transposed


matrix of a matrix consisting of Δij is sometimes referred to as an adjugate matrix of
A. In relation to Theorem 11.5, if A is non-singular, the adjugate matrix of A is given
by A-1(detA). In that case, the adjugate matrix of A is non-singular as well.
In a particular case, suppose that we strike out the i-th row and i-th column. Then,
Aði, iÞ is called a principal submatrix and detAði, iÞ is said to be a principal minor
with respect to aii, i.e., a diagonal element of A. In this case, we have Δii  detAði, iÞ.
If we strike out several pairs of i-th rows and i-th columns, we can make (m, m)
matrices (m < n), which are also called principal submatrices (of m-th order).
Determinants of the resulting matrices are said to be principal minors of m-th
order. Then, aii (1 ≤ i ≤ n) itself is a principal minor as well.
Related discussion and examples have already appeared in Sect. 7.4. Other
examples will appear in Sects. 12.2 and 14.5.
Example 11.3 Think of three-dimensional rotation by θ in ℝ3 around the z-axis.
The relevant transformation matrix is

cos θ - sin θ 0
R= sin θ cos θ 0 :
0 0 1

As detR = 1 ≠ 0, the transformation is bijective. This means that for 8y 2 ℝ3 there


is always a corresponding x 2 ℝ3. This x can be found by solving Rx = y; i.e.,
x = R-1y. Putting x = xe1 + ye2 + ze3 and y = x′e1 + y′e2 + z′e3, a matrix
representation is given by

x x′ cos θ sin θ 0 x′
-1
y =R y′ = - sin θ cos θ 0 y′ :
z z′ 0 0 1 z′

Thus, x can be obtained by rotating y by -θ.


Example 11.4 Think of a following matrix that represents a linear transformation P:

1 0 0
P= 0 1 0 :
0 0 0

This matrix transforms a vector x = xe1 + ye2 + ze3 into y = xe1 + ye2 as follows:

1 0 0 x x
0 1 0 y = y : ð11:67Þ
0 0 0 z 0
478 11 Vectors and Their Transformation

z
Fig. 11.4 Example of an
endomorphism P: ℝ3 → ℝ3

y
O

Here, we are thinking of an endomorphism P: ℝ3 → ℝ3. Geometrically, it can be


viewed as in Fig. 11.4. Let us think of (11.67) from a point of view of solving a
system of linear equations and newly consider the next equation. In other words, we
are thinking of finding x, y, and z with given a, b, and c in the following equation:

1 0 0 x a
0 1 0 y = b : ð11:68Þ
0 0 0 z c

If c = 0, we find a solution of x = a, y = b, but z can be any (complex) number; we


have thus indefinite solutions. If c ≠ 0, however, we have no solution. The former
situation reflects the fact that the transformation represented by P is not injective.
Meanwhile, the latter reflects the fact that the transformation is not surjective.
Remember that as detP = 0, the transformation is not injective or surjective.

11.4 Basis Vectors and Their Transformations

In the previous sections we show that a vector is uniquely represented as a column


vector in reference to a set of the fixed basis vectors. The representation, however,
will be changed under a different set of basis vectors.
11.4 Basis Vectors and Their Transformations 479

First let us think of a linear transformation of a set of basis vectors e1, e2, ⋯,
and en. The transformation matrix A representing a linear transformation A is defined
as follows:

a11 ⋯ a1n
A= ⋮ ⋱ ⋮ :
an1 ⋯ ann

Notice here that we often denote both a linear transformation and its
corresponding matrix by the same character. After the transformation, suppose that
the resulting vectors are given by e01 , e02 , and e0n . This is explicitly described as

a11 ⋯ a1n
ðe1 ⋯ en Þ ⋮ ⋱ ⋮ = e1′ ⋯ en′ : ð11:69Þ
an1 ⋯ ann

With a shorthand notation, we have


n
e0i = a e
k = 1 ki k
ð1 ≤ i ≤ nÞ: ð11:70Þ

Care should be taken not to confuse (11.70) with (11.63). Here, a set of vectors
e01 , e02 , and e0n may or may not be linearly independent. Let us operate both sides of
x1
(11.70) from the left on ⋮ and equate both the sides to zero. That is,
xn

a11 ⋯ a1n x1 x1
ðe1 ⋯ en Þ ⋮ ⋱ ⋮ ⋮ = e1′ ⋯ en′ ⋮ = 0:
an1 ⋯ ann xn xn

Since e1, e2, ⋯, and en are the basis vectors, we get

a11 ⋯ a1n x1
⋮ ⋱ ⋮ ⋮ = 0: ð11:71Þ
an1 ⋯ ann xn

x1
Meanwhile, we must have ⋮ = 0 so that e01 , e02 , and e0n can be linearly
xn
independent (i.e., so as to be a set of basis vectors). But this means that (11.71)
has such a unique (and trivial) solution and, hence, detA ≠ 0. If conversely detA ≠ 0,
(11.71) has a unique trivial solution and e01 , e02 , and e0n are linearly independent. Thus,
a necessary and sufficient condition for e01 , e02 , and e0n to be a set of basis vectors is
detA ≠ 0. If detA = 0, (11.71) has indefinite solutions (including a trivial solution)
and e01 , e02 , and e0n are linearly dependent and vice versa. In case detA ≠ 0, an inverse
matrix A-1 exists, and so we have
480 11 Vectors and Their Transformation

ðe1 ⋯ en Þ = e01 ⋯ e0n A - 1 : ð11:72Þ

In the previous steps we see how the linear transformation (and the corresponding
matrix representation) converts a set of basis vectors (e1⋯ en) to another set of basis
vectors e01 ⋯ e0n . Is this possible then to find a suitable transformation between two
sets of arbitrarily chosen basis vectors? The answer is yes. This is because any vector
can be expressed uniquely by a linear combination of any set of basis vectors. A whole
array of such linear combinations uniquely define a transformation matrix between the
two sets of basis vectors as expressed in (11.69) and (11.72). The matrix has non-zero
determinant and has an inverse matrix. A concept of the transformation between basis
vectors is important and very often used in various fields of natural science.
Example 11.5 We revisit Example 11.2. The relation (11.49) tells us that the basis
vectors of A(V4) and those of Ker A span V4 in total. Therefore, in light of the above
argument, there should be a linear transformation R between the two sets of vectors,
i.e., e1, e2, e3, e4 and e1 + e3, e2 + e4, - e1 - e2 + e3, e4. Moreover, the matrix
R associated with the linear transformation must be non-singular (i.e., detR ≠ 0). In
fact, we find that R is expressed as

1 0 -1 0
0 1 -1 0
R= :
1 0 1 0
0 1 0 1

This is because we have a following relation between the two sets of basis
vectors:

1 0 -1 0
0 1 -1 0
ð e 1 e2 e3 e4 Þ = ð e 1 þ e3 e 2 þ e4 - e 1 - e 2 þ e3 e 4 Þ :
1 0 1 0
0 1 0 1

We have det R = 2 ≠ 0 as expected.


Next, let us consider successive linear transformations of vectors. Again we
assume that the transformations are endomorphism:Vn → Vn. We have to take into
account transformations of basis vectors along with the targeted vectors. First we
choose a transformation by a non-singular matrix (having a non-zero determinant)
for the subsequent transformation to have a unique matrix representation (vide
supra). The vector transformation by P is expressed as

p11 ⋯ p1n x1
PðxÞ = ðe1 ⋯ en Þ ⋮ ⋱ ⋮ ⋮ , ð11:73Þ
pn1 ⋯ pnn xn
11.4 Basis Vectors and Their Transformations 481

where the non-singular matrix P represents the transformation P. Notice here that the
transformation and its matrix are represented by the same P. As mentioned in Sect.
11.2, the matrix P can be operated either from the right on the basis vectors or from
the left on the column vector. We explicitly write

p11 ⋯ p1n x1
PðxÞ = ðe1 ⋯ en Þ ⋮ ⋱ ⋮ ⋮ ,
pn1 ⋯ pnn xn
ð11:74Þ
x1
= e01 ⋯ e0n ⋮ ,
xn

where e01 ⋯ e0n = (e1⋯ en)P [here P is the non-singular matrix defined in (11.73)].
Alternatively, we have

p11 ⋯ p1n x1
PðxÞ = ðe1 ⋯ en Þ ⋮ ⋱ ⋮ ⋮
pn1 ⋯ pnn xn
ð11:75Þ
x01
= ðe1 ⋯ en Þ ⋮ ,
x0n

where

x01 p11 ⋯ p1n x1


⋮ = ⋮ ⋱ ⋮ ⋮ : ð11:76Þ
x0n pn1 ⋯ pnn xn

Equation (11.75) gives a column vector representation regarding the vector that
has been obtained by the transformation P and is viewed in reference to the basis
vectors (e1⋯en). Combining (11.74) and (11.75), we get

x1 x01
PðxÞ = e01 ⋯ e0n ⋮ = ðe1 ⋯ en Þ ⋮ : ð11:77Þ
xn x0n

We further make another linear transformation A : Vn → Vn. In this case a


corresponding matrix A may be non-singular (i.e., detA ≠ 0) or singular (detA = 0).
We have to distinguish the matrix representations of the two cases. It is because the
matrix representations are uniquely defined in reference to an individual set of basis
vectors; see (11.37) and (11.43). Let us denote the matrices AO and A′ with respect to
482 11 Vectors and Their Transformation

the basis vectors (e1⋯ en) and e01 ⋯ e0n , respectively. Then, A[P(x)] can be
described in two different ways as follows:

x1 x01
A½PðxÞ = e01 ⋯ e0n A0 ⋮ = ðe1 ⋯ en ÞAO ⋮ : ð11:78Þ
xn x0n

This can be rewritten in reference to a linearly independent set of vectors e1, ⋯,


en as

x1 x1
A½PðxÞ = ½ðe1 ⋯ en ÞPA0  ⋮ = ð e1 ⋯ en Þ A O P ⋮ : ð11:79Þ
xn xn

n
As (11.79) is described for a vector x = xi ei arbitrarily chosen in Vn, we get
i=1

PA0 = AO P: ð11:80Þ

Since P is non-singular, we finally obtain

A0 = P - 1 AO P: ð11:81Þ

We can see (11.79) from a point of view of successive linear transformations.


When the subsequent operation is viewed in reference to the basis vectors e01 , ⋯, e0n
newly reached by the precedent transformation, we make it a rule to write the
relevant subsequent operator A′ from the right. In the case where the subsequent
operation is viewed in reference to the original basis vectors, however, we write the
subsequent operator AO from the left. Further discussion and examples can be seen in
Part IV.
We see (11.81) in a different manner. Suppose we have

x1
AðxÞ = ðe1 ⋯ en ÞAO ⋮ : ð11:82Þ
xn

Note that since the transformation A has been performed in reference to the basis
vectors (e1⋯en), AO should be used for the matrix representation. This is rewritten as

x1
AðxÞ = ðe1 ⋯ en ÞPP - 1 AO PP - 1 ⋮ : ð11:83Þ
xn

Meanwhile, any vector x in Vn can be written as


11.4 Basis Vectors and Their Transformations 483

x1
x = ð e1 ⋯ en Þ ⋮
xn
x1 x1 x1
-1
= ðe1 ⋯ en ÞPP ⋮ = e01 ⋯ e0n P -1
⋮ = e01 ⋯ e0n ⋮ :
xn xn xn
ð11:84Þ

In (11.84) we put

x1 x~1
ðe1 ⋯ en ÞP = e01 ⋯ e0n , P-1 ⋮ = ⋮ : ð11:85Þ
xn x~n

Equation (11.84) gives a column vector representation regarding the same vector
x viewed in reference to the basis set of vectors e1, ⋯, en or e01 , ⋯, e0n . Equation
(11.85) should not be confused with (11.76). That is, (11.76) relates the two
x1 x01
coordinates ⋮ and ⋮ of different vectors before and after the transformation
xn x0n
viewed in reference to the same set of basis vectors. The relation (11.85), however,
x1 x~1
relates two coordinates ⋮ and ⋮ of the same vector viewed in reference to
xn x~n
different set of basis vectors. Thus, from (11.83) we have

x~1
AðxÞ = e01 ⋯ e0n P - 1 AO P ⋮ : ð11:86Þ
x~n

Meanwhile, viewing the transformation A in reference to the basis vectors


e01 ⋯e0n , we have

x~1
AðxÞ = e01 ⋯ e0n A0 ⋮ : ð11:87Þ
x~n

Equating (11.86) and (11.87),

A0 = P - 1 AO P: ð11:88Þ

Thus, (11.81) is recovered.


The relations expressed by (11.81) and (11.88) are called a similarity transfor-
mation on A. The matrices A0 and A′ are said to be similar to each other. As
484 11 Vectors and Their Transformation

mentioned earlier, if A0 (and hence A′) is non-singular, the linear transformation


A produces a set of basis vectors other than e01 , ⋯, e0n , say e001 , ⋯, e00n . We write this
symbolically as

ðe1 ⋯en ÞPA = e01 ⋯e0n A = e001 ⋯e00n : ð11:89Þ

Therefore, such A defines successive transformations of the basis vectors in


conjunction with P defined in (11.73). The successive transformations and resulting
basis vectors supply us with important applications in the field of group theory.
Topics will be dealt with in Parts IV and V.

References

1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
Chapter 12
Canonical Forms of Matrices

In Sect. 11.4 we saw that the transformation matrices are altered depending on the
basis vectors we choose. Then a following question arises. Can we convert a
(transformation) matrix to as simple a form as possible by similarity transforma-
tion(s)? In Sect. 11.4 we have also shown that if we have two sets of basis vectors in
a linear vector space Vn we can always find a non-singular transformation matrix
between the two. In conjunction with the transformation of the basis vectors, the
matrix undergoes similarity transformation. It is our task in this chapter to find a
simple form or a specific form (i.e., canonical form) of a matrix as a result of the
similarity transformation. For this purpose, we should first find eigenvalue(s) and
corresponding eigenvector(s) of the matrix. Depending upon the nature of matrices,
we get various canonical forms of matrices such as a triangle matrix and a diagonal
matrix. Regarding any form of matrices, we can treat these matrices under a unified
form called the Jordan canonical form.

12.1 Eigenvalues and Eigenvectors

An eigenvalue problem is one of the important subjects of the theory of linear vector
spaces. Let A be a linear transformation on Vn. The resulting matrix gets to several
different kinds of canonical forms of matrices. A typical example is a diagonal
matrix. To reach a satisfactory answer, we start with a so-called eigenvalue problem.
Suppose that after the transformation of x we have

AðxÞ = αx, ð12:1Þ

where α is a certain (complex) number. Then we say that α is an eigenvalue and that
x is an eigenvector that corresponds to the eigenvalue α. Using a notation of (11.37)
of Sect. 11.2, we have

© Springer Nature Singapore Pte Ltd. 2023 485


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_12
486 12 Canonical Forms of Matrices

a11 ⋯ a1n x1 x1
AðxÞ = ðe1 ⋯ en Þ ⋮ ⋱ ⋮ ⋮ = αðe1 ⋯ en Þ ⋮ :
an1 ⋯ ann xn xn

From linear dependence of e1, e2, ⋯, and en, we simply write

a11 ⋯ a1n x1 x1
⋮ ⋱ ⋮ ⋮ =α ⋮ :
an1 ⋯ ann xn xn

x1
If we identify x with ⋮ at fixed basis vectors (e1⋯en), we may naturally
xn
rewrite (12.1) as

Ax = αx: ð12:2Þ

If x1 and x2 belong to the eigenvalue α, so do x1 + x2 and cx1 (c is an appropriate


complex number). Therefore, all the eigenvectors belonging to the eigenvalue α
along with 0 (a zero vector) form a subspace of A (within Vn) corresponding to the
eigenvalue α.
Strictly speaking, we should use terminologies such as a “proper” (or ordinary)
eigenvalue, eigenvector, eigenspace, etc. to distinguish them from a “generalized”
eigenvalue, eigenvector, eigenspace, etc. We will return to this point later. Further
rewriting (12.2) we have

ðA - αE Þx = 0, ð12:3Þ

where E is a (n, n) unit matrix. Equations (12.2) and (12.3) are said to be an
eigenvalue equation (or eigenequation). In (12.2) or (12.3) x = 0 always holds
(as a trivial solution). Consequently, for x ≠ 0 to be a solution we must have

jA - αE j = 0: ð12:4Þ

In (12.4), |A - αE| stands for det(A - αE).


Now let us define the following polynomial:

x - a11 ⋯ - a1n
f A ðxÞ = jxE - Aj = ⋮ ⋱ ⋮ : ð12:5Þ
- an1 ⋯ x - ann

A necessary and sufficient condition for α to be an eigenvalue is that α is a root of


fA(x) = 0. The function fA(x) is said to be a characteristic polynomial and we call
fA(x) = 0 a characteristic equation. This is an n-th order polynomial. Putting
12.1 Eigenvalues and Eigenvectors 487

f A ðxÞ = xn þ a1 xn - 1 þ ⋯ þ an , ð12:6Þ

we have

a1 = - ða11 þ a22 þ ⋯ þ ann Þ  - TrA, ð12:7Þ

where Tr stands for “trace” that is a summation of diagonal elements. Moreover,

an = ð- 1Þn j A j : ð12:8Þ

The characteristic equation fA(x) = 0 has n roots including possible multiple roots.
Let those roots be α1, ⋯, αn (some of them may be identical). Then we have
n
f A ð xÞ = i=1
ðx - αi Þ: ð12:9Þ

Furthermore, according to relations between roots and coefficients we get

α1 þ ⋯ þ αn = - a1 = TrA, ð12:10Þ

α1 ⋯αn = ð- 1Þn an = jAj: ð12:11Þ

The characteristic equation fA(x) is invariant under a similarity transformation.


In fact,

f P - 1 AP ðxÞ = xE - P - 1 AP = j P - 1 ðxE - AÞP j


ð12:12Þ
= jPj - 1 jxE - AjjPj = jxE - Aj = f A ðxÞ:

This leads to invariance of the trace under a similarity transformation. That is,

Tr P - 1 AP = TrA: ð12:13Þ

Let us think of a following triangle matrix:


a11
:
T= : ð11:58Þ

0 ann

The matrix of this type is thought to be one of canonical forms of matrices. Its
characteristic equation fT(x) is
488 12 Canonical Forms of Matrices

 
x - a11

f T ðxÞ = jxE - T j = 0 ⋱ : ð12:14Þ
0 0 x - ann

Therefore, we get
n
f T ð xÞ = i=1
ðx - aii Þ, ð12:15Þ

where we used (11.59). From (11.58) and (12.15), we find that eigenvalues of a
triangle matrix are given by its diagonal elements accordingly. Our immediate task
will be to examine whether and how a given matrix is converted to a triangle matrix
through a similarity transformation. A following theorem is important.
Theorem 12.1 Every (n, n) square matrix can be converted to a triangle matrix by
similarity transformation [1].
Proof A triangle matrix is either an “upper” triangle matrix [to which all the lower
left off-diagonal elements are zero; see (11.58)] or a “lower” triangle matrix
(to which all the upper right off-diagonal elements are zero). In the present case
we show the proof for the upper triangle matrix. Regarding the lower triangle matrix
the theorem is proven in a similar manner.
The proof is due to mathematical induction. First we show that the theorem is true
of a (2, 2) matrix. Suppose that one of the eigenvalues of A2 is α1 and that its
corresponding eigenvector is x1. Then we have

A2 x1 = α1 x1 , ð12:16Þ

where we assume that x1 represents a column vector. Let a non-singular matrix be

P1 = ðx1 p1 Þ, ð12:17Þ

where p1 is another column vector chosen in such a way that x1 and p1 are linearly
independent so that P1 can be a non-singular matrix. Then, we have

P1- 1 A2 P1 = P1- 1 A2 x1 P1- 1 A2 p1 : ð12:18Þ

Meanwhile,

x1 = ð x 1 p 1 Þ 1
0
= P1 1
0
: ð12:19Þ

Hence, we have
12.1 Eigenvalues and Eigenvectors 489

α1
P1- 1 A2 x1 = α1 P1- 1 x1 = α1 P1- 1 P1 1
0
= α1 1
0
= 0
: ð12:20Þ

Thus (12.18) can be rewritten as

α1 
P1- 1 A2 P1 = 0 
: ð12:21Þ

This shows that Theorem 12.1 is true of a (2, 2) matrix A2.


Now let us show that Theorem 12.1 holds in a general case, i.e., for a (n, n) square
matrix An. Let αn be one of the eigenvalues of An. On the basis of the argument of the
(2, 2) matrix case, suppose that after a suitable similarity transformation by a
non-singular matrix P~ we have

-1
A~n = P
~ ~
An P: ð12:22Þ

Then, we can describe A~n as

αn x1 ⋯ xn - 1
0
An = : ð12:23Þ
⋮ An - 1
0

In (12.23), αn is one of the eigenvalues of An. To show that (12.23) is valid, we


~ such that
have a similar argument as in the case of a (2, 2) matrix. That is, we set P

P = ð an p1 p2 ⋯ pn - 1 Þ,

where an is an eigenvector corresponding to αn and P~ is a non-singular matrix


formed by n linearly independent column vectors an, p1, p2, ⋯and pn - 1. Then
we have

-1
An = P An P
-1 -1 -1 -1 ð12:24Þ
= P An an P A n p1 P An p2 ⋯ P A n pn - 1 :

The vector an can be expressed as


490 12 Canonical Forms of Matrices

1 1
0 0
an = ð an p1 p2 ⋯ pn - 1 Þ 0 =P 0 :
⋮ ⋮
0 0

Therefore, we have

1 1 αn
-1 -1 -1
~
P ~
An an = P ~
αn an = P ~
αn P
0
0 = αn
0
0 =
0
0 :
⋮ ⋮ ⋮
0 0 0

Thus, from (12.24) we see that one can express a matrix form of A~n as (12.23).
By a hypothesis of the mathematical induction, we assume that there exists a (n -
1, n - 1) non-singular square matrix Pn - 1 and an upper triangle matrix Δn - 1
such that

Pn--11 An - 1 Pn - 1 = Δn - 1 : ð12:25Þ

Let us define a following matrix:

d 0 ⋯ 0
0
Pn = ,
⋮ Pn - 1
0

where d ≠ 0. The Pn is (n, n) non-singular square matrix; remember that detPn =


d(det Pn - 1) ≠ 0. Operating Pn on A~n from the right, we have
12.1 Eigenvalues and Eigenvectors 491

αn x1 ⋯ xn - 1 d 0 ⋯ 0
0 0
⋮ An - 1 ⋮ Pn - 1
0 0

αn d xTn - 1 Pn - 1
0
= , ð12:26Þ
⋮ An - 1 Pn - 1
0

x1
where xTn - 1 is a transpose of a column vector ⋮ . Therefore, xTn - 1 Pn - 1 is a (1,
xn - 1
n - 1) matrix (i.e., a row vector). Meanwhile, we have

d 0 0 0 αn xTn - 1 Pn - 1 =d
0 0
⋮ Pn - 1 ⋮ Δn - 1
0 0

αn d xTn - 1 Pn - 1
0
= : ð12:27Þ
⋮ Pn - 1 Δn - 1
0

From the assumption of (12.25), we have

LHS of ð12:26Þ = LHS of ð12:27Þ: ð12:28Þ

That is,

A~n Pn = Pn Δn : ð12:29Þ

In (12.29) we define Δn such that


492 12 Canonical Forms of Matrices

αn xTn - 1 Pn - 1 =d
0
Δn = , ð12:30Þ
⋮ Δn - 1
0

which has appeared in LHS of (12.27). As Δn - 1 is a triangle matrix from the


assumption, Δn is a triangle matrix as well. Combining (12.22) and (12.29), we
finally get

-1
~ n
Δn = PP ~ n:
An PP ð12:31Þ

~ n is a non-singular matrix, and so PP


Notice that PP ~ - 1 = E. Hence,
~ n Pn - 1 P
~ - 1 = PP
Pn - 1 P ~ n - 1 . The equation obviously shows that An has been converted
to a triangle matrix Δn. This completes the proof.
Equation (12.31) implies that eigenvectors are disposed on diagonal positions of a
triangle matrix. Triangle matrices can further undergo a similarity transformation.
Example 12.1 Let us think of a following triangle matrix A:

A= 2
0
1
1
: ð12:32Þ

Eigenvalues of A are 2 and 1. Remember that diagonal elements of a triangle


1
matrix give eigenvalues. According to (12.20), a vector 0
can be chosen for an
eigenvector (as a column vector representation) corresponding to the eigenvalue
-1
2. Another eigenvector can be chosen to be 1
. This is because for an eigenvalue
1, we obtain

1 1
x=0
0 0

as an eigenvalue equation (A - E)x = 0. Therefore, with an eigenvector c1


c2
-1
corresponding to the eigenvalue 1, we get c1 + c2 = 0. Therefore, we have 1
as
-1
a simple form of the eigenvector. Hence, putting P = 1
0 1
, similarity transfor-
mation is carried out such that
12.1 Eigenvalues and Eigenvectors 493

1 1 2 1 1 -1 2 0
P - 1 AP = = : ð12:33Þ
0 1 0 1 0 1 0 1

This is a simple example of matrix diagonalization.


Regarding the eigenvalue/eigenvector problems, we have another important
theorem.
Theorem 12.2 Eigenvectors corresponding to different eigenvalues of A are line-
arly independent.
Proof We prove the theorem by mathematical induction. Let α1 and α2 be two
different eigenvalues of a matrix A and let a1 and a2 be eigenvectors corresponding
to α1 and α2, respectively.
Let us think of a following equation:

c1 a1 þ c2 a2 = 0: ð12:34Þ

Suppose that a1 and a2 are linearly dependent. Then, without loss of generality we
can put c1 ≠ 0. Accordingly, we get

c2
a1 = - a : ð12:35Þ
c1 2

Operating A from the left of (12.35) we have

c2
α 1 a1 = - α a = α2 a1 : ð12:36Þ
c1 2 2

With the second equality we have used (12.35). From (12.36) we have

ðα1 - α2 Þa1 = 0: ð12:37Þ

As α1 ≠ α2, α1 - α2 ≠ 0. This implies a1 = 0, in contradiction to that a1 is a


eigenvector. Thus, a1 and a2 must be linearly independent.
Next we assume that Theorem 12.2 is true of the case where we have (n - 1)
eigenvalues α1, α2, ⋯, and αn - 1 that are different from one another and
corresponding eigenvectors a1, a2, ⋯, and an - 1 are linearly independent. Let us
think of a following equation:

c1 a1 þ c2 a2 þ ⋯ þ cn - 1 an - 1 þ cn an = 0, ð12:38Þ

where an is an eigenvector corresponding to an eigenvalue αn. Suppose here that a1,


a2, ⋯, and an are linearly dependent. If cn = 0, we have
494 12 Canonical Forms of Matrices

c1 a1 þ c2 a2 þ ⋯ þ cn - 1 an - 1 = 0: ð12:39Þ

But, from the linear independence of a1, a2, ⋯, and an - 1 we have


c1 = c2 = ⋯ = cn - 1 = 0. Thus, it follows that all the eigenvectors a1, a2, ⋯,
and an are linearly independent. However, this is in contradiction to the assumption.
We should therefore have cn ≠ 0. Accordingly we get

c1 c c
an = - a þ 2 a þ ⋯ þ n - 1 an - 1 : ð12:40Þ
cn 1 cn 2 cn

Operating A from the left of (12.38) again, we have

c1 c c
αn an = - α a þ 2 α a þ ⋯ þ n - 1 αn - 1 an - 1 : ð12:41Þ
cn 1 1 cn 2 2 cn

Here we think of two cases of (1) αn ≠ 0 and (2) αn = 0.


1. Case I: αn ≠ 0.
Multiplying both sides of (12.40) by αn we have

c1 c c
αn an = - α a þ 2 α a þ ⋯ þ n - 1 αn an - 1 : ð12:42Þ
cn n 1 cn n 2 cn

Subtracting (12.42) from (12.41) we get

c1 c c
0= ðα - α1 Þa1 þ 2 ðαn - α2 Þa2 þ ⋯ þ n - 1 ðαn - αn - 1 Þan - 1 : ð12:43Þ
cn n cn cn

Since we assume that eigenvalue is different from one another, αn ≠ α1,


αn ≠ α2, ⋯, αn ≠ αn - 1. This implies that c1 = c2 = ⋯ = cn - 1 = 0. From
(12.40) we have an = 0. This is, however, in contradiction to that an is an
eigenvector. This means that our original supposition that a1, a2, ⋯, and an are
linearly dependent was wrong. Thus, the eigenvectors a1, a2, ⋯, and an should be
linearly independent.
2. Case II: αn = 0.
Suppose again that a1, a2, ⋯, and an are linearly dependent. Since as before
cn ≠ 0, we get (12.40) and (12.41) once again. Putting αn = 0 in (12.41) we have

c1 c c
0= - α a þ 2 α a þ ⋯ þ n - 1 αn - 1 an - 1 : ð12:44Þ
cn 1 1 cn 2 2 cn

Since eigenvalues are different, we should have α1 ≠ 0, α2 ≠ 0, ⋯, and


αn - 1 ≠ 0. Then, considering that a1, a2, ⋯, and an - 1 are linearly independent,
for (12.44) to hold we must have c1 = c2 = ⋯ = cn - 1 = 0. But, from (12.40) we
have an = 0, again in contradiction to that an is an eigenvector. Thus, the
12.2 Eigenspaces and Invariant Subspaces 495

eigenvectors a1, a2, ⋯, and an should be linearly independent. These complete


the proof.

12.2 Eigenspaces and Invariant Subspaces

Equations (12.21), (12.30), and (12.31) show that if we adopt an eigenvector as one
of the basis vectors, the matrix representation of the linear transformation A in
reference to such basis vectors is obtained so that the leftmost column is zero except
for the left top corner on which an eigenvalue is positioned. (Note that if the said
eigenvector is zero, the leftmost column is a zero vector.) Meanwhile, neither
Theorem 12.1 nor Theorem 12.2 tells about multiplicity of eigenvalues. If the
eigenvalues have multiplicity, we have to think about different aspects. This is a
major issue of this section.
Let us start with a discussion of invariant subspaces. Let A be a (n, n) square
matrix. If a subspace W in Vn is characterized by

x 2 W ⟹ Ax 2 W,

W is said to be invariant with respect to A (or simply A-invariant) or an invariant


subspace in Vn = Span{a1, a2, ⋯, an}. Suppose that x is an eigenvector of A and that
its corresponding eigenvalue is α. Then, Span{x} is an invariant subspace of Vn. It is
because A(cx) = cAx = cαx = α(cx) and cx is again an eigenvector belonging to α.
Suppose that dimW = m (m ≤ n) and that W = Span{a1, a2, ⋯, am}. If W is A-
invariant, A causes a linear transformation within W. In that case, expressing A in
reference to a1, a2, ⋯, am, am + 1, ⋯, an, we have

ð a1 a2 ⋯ am amþ1 ⋯ an ÞA

Am
= ð a1 a2 ⋯ am amþ1 ⋯ an Þ 
, ð12:45Þ
0

where Am is a (m, m) square matrix and “zero” denotes a (n - m, m) zero matrix.


Notice that the transformation A makes the remaining (n - m) basis vectors am + 1,
am + 2, ⋯, and an in Vn be converted to a linear combination of a1, a2, ⋯, and an. The
triangle matrix Δn given in (12.30) and (12.31) is an example to which Am is a (1, 1)
matrix (i.e., simply a complex number).
Let us examine properties of the A-invariant subspace still further. Let a be any
vector in Vn and think of following (n + 1) vectors [2]:

a, Aa, A2 a, ⋯, An a:
496 12 Canonical Forms of Matrices

These vectors are linearly dependent since there are at most n linearly indepen-
dent vectors in Vn. These vectors span a subspace in Vn. Let us call this subspace M;
i.e.,

M  Span a, Aa, A2 a, ⋯, An a :

Consider the following equation:

c0 a þ c1 Aa þ c2 A2 a þ ⋯ þ cn An a = 0: ð12:46Þ

Not all ci (0 ≤ i ≤ n) are zero, because the vectors are linearly dependent. Suppose
that cn ≠ 0. Then, from (12.46) we have

1
An a = - c a þ c1 Aa þ c2 A2 a þ ⋯ þ cn - 1 An - 1 a :
cn 0

Operating A on the above equation from the left, we have

1
Anþ1 a = - c Aa þ c1 A2 a þ c2 A3 a þ ⋯ þ cn - 1 An a :
cn 0

Thus, An + 1a is contained in M. That is, we have An + 1a 2 Span{a, Aa, A2a, ⋯,


A a}. Next, suppose that cn = 0. Then, at least one of ci (0 ≤ i ≤ n - 1) is not zero.
n

Suppose that cn - 1 ≠ 0. From (12.46), we have

1
An - 1 a = - c a þ c1 Aa þ c2 A2 a þ ⋯ þ cn - 2 An - 2 a :
cn - 1 0

Operating A2 on the above equation from the left, we have

1
Anþ1 a = - c A2 a þ c1 A3 a þ c2 A4 a þ ⋯ þ cn - 2 An a :
cn - 1 0

Again, An + 1a is contained in M. Repeating similar procedures, we reach

c0 a þ c1 Aa = 0:

If c1 = 0, then we must have c0 ≠ 0. If so, a = 0. This is impossible, however,


because we should have a ≠ 0. Then, we have c1 ≠ 0 and, hence,

c0
Aa = - a:
c1

Operating once again An on the above equation from the left, we have
12.2 Eigenspaces and Invariant Subspaces 497

c0 n
Anþ1 a = - A a:
c1

Thus, once again An + 1a is contained in M.


In the above discussion, we get AM ⊂ M. Further operating
A, A2M ⊂ AM ⊂ M, A3M ⊂ A2M ⊂ AM ⊂ M, ⋯. Then we have

a, Aa, A2 a, ⋯, An a, Anþ1 a, ⋯ 2 Span a, Aa, A2 a, ⋯, An a :

That is, M is an A-invariant subspace. We also have

m  dimM ≤ dimV n = n:

There are m basis vectors in M, and so representing A in a matrix form in reference


to the n basis vectors of Vn including these m vectors, we have


A= Am
0 
: ð12:47Þ

Note again that in (12.45) Vn is spanned by the m basis vectors in M together with
other (n - m) linearly independent vectors.
We happen to encounter a situation where two subspaces W1 and W2 are at once
A-invariant. Here we can take basis vectors a1, a2, ⋯, and am for W1 and am + 1,
am + 2, ⋯, and an for W2. In reference to such a1, a2, ⋯, and an as basis vector of Vn,
we have

A= Am
0
0
An - m
, ð12:48Þ

where An - m is a (n - m, n - m) square matrix and “zero” denotes either a (n - m,


m) or (m, n - m) zero matrix. Alternatively, we denote

A = Am An - m :

In this situation, the matrix A is said to be reducible.


As stated above, A causes a linear transformation within both W1 and W2. In other
words, Am and An - m cause a linear transformation within W1 and W2 in reference to
a1, a2, ⋯, am and am + 1, am + 2, ⋯, an, respectrively. In this case Vn can be
represented as a direct sum of W1 and W2 such that

Vn = W1 W2
ð12:49Þ
= Spanfa1 , a2 , ⋯, am g Spanfamþ1 , amþ2 , ⋯, an g:
498 12 Canonical Forms of Matrices

This is because Span{a1, a2, ⋯, am} \ Span{am + 1, am + 2, ⋯, an} = {0}. In fact,


if the two subspaces possess x ( ≠ 0) in common, a1, a2, ⋯, an become linearly
dependent, in contradiction.
The vector space Vn may well further be decomposed into subspaces with a lower
dimension. For further discussion we need a following theorem and a concept of a
generalized eigenvector and generalized eigenspace.
Theorem 12.3 Hamilton–Cayley Theorem [3, 4] Let fA(x) be the characteristic
polynomial pertinent to a linear transformation A: Vn → Vn. Then fA(A)(x) = 0 for
8
x 2 Vn. That is, Ker fA(A) = Vn.
Proof To prove the theorem we use the following well-known property of deter-
minants. Namely, let A be a (n, n) square matrix expressed as

a11 ⋯ a1n
A= ⋮ ⋱ ⋮ :
an1 ⋯ ann

~ be the cofactor matrix of A, namely


Let A

Δ11 ⋯ Δ1n
~=
A ⋮ ⋱ ⋮ ,
Δn1 ⋯ Δnn

where Δij is a cofactor of aij. Then

~ T A = AA
A ~ T = j A j E, ð12:50Þ

~ T is a transposed matrix of A.
where A ~ We now apply (12.50) to the characteristic
polynomial to get a following equation expressed as

T T
xE - A ðxE - AÞ = ðxE - AÞ xE - A = jxE - AjE = f A ðAÞE, ð12:51Þ

where xE - A is the cofactor matrix of xE - A. Let the cofactor of the (i, j)-element
of (xE - A) be Δij. Note in this case that Δij is an at most (n - 1)-th order polynomial
of x. Let us put accordingly

Δij = bij,0 xn - 1 þ bij,1 xn - 2 þ ⋯ þ bij,n - 1 : ð12:52Þ

Also, we put Bk = (bij, k). Then we have


12.2 Eigenspaces and Invariant Subspaces 499

Δ11 ⋯ Δ1n
xE - A = ⋮ ⋱ ⋮
Δn1 ⋯ Δnn
n-1
b11,0 x þ ⋯ þ b11,n - 1 ⋯ b1n,0 xn - 1 þ ⋯ þ b1n,n - 1
= ⋮ ⋱ ⋮
bn1,0 xn - 1 þ ⋯ þ bn1,n - 1 ⋯ bnn,0 xn - 1 þ ⋯ þ bnn,n - 1
b11,0 ⋯ b1n,0 b11,0 ⋯ b1n,n - 1
= xn - 1 ⋮ ⋱ ⋮ þ⋯þ ⋮ ⋱ ⋮
bn1,0 ⋯ bnn,0 bn1,0 ⋯ bnn,n - 1
n-1 n-2
=x B0 þ x B1 þ ⋯ þ Bn - 1 :
ð12:53Þ

Thus, we get

jxE - AjE = ðxE - AÞ xn - 1 BT0 þ xn - 2 BT1 þ ⋯ þ BTn - 1 : ð12:54Þ

Replacing x with A and considering that BTl ðl = 0, ⋯, n - 1Þ is commutative with


A, we have

f A ðAÞ = ðA - AÞ An - 1 BT0 þ An - 2 BT1 þ ⋯ þ BTn - 1 = 0: ð12:55Þ

This means that fA(A)(x) = 0 for 8x 2 Vn. These complete the proof.
In relation to Hamilton–Cayley theorem, we consider an important concept of a
minimal polynomial.
Definition 12.1 Let f(x) be a polynomial of x with scalar coefficients such that
f(A) = 0, where A is a (n, n) matrix. Among f(x), a lowest-order polynomial with the
highest-order coefficient of one is said to be a minimal polynomial. We denote it by
φA(x); i.e., φA(A) = 0.
We have an important property for this. Namely, a minimal polynomial φA(x) is a
divisor of f(A). In fact, suppose that we have

f ðxÞ = gðxÞφA ðxÞ þ hðxÞ:

Inserting A into x, we have

f ðAÞ = gðAÞφA ðAÞ þ hðAÞ = hðAÞ = 0:


500 12 Canonical Forms of Matrices

From the above equation, h(x) should be a polynomial whose order is lower than
that of φA(A). But the presence of such h(x) is in contradiction to the definition of the
minimal polynomial. This implies that h(x)  0. Thus, we get

f ðxÞ = gðxÞφA ðxÞ:

That is, φA(x) must be a divisor of f(A).


Suppose that φ0A ðxÞ is another minimal polynomial. If the order of φ0A ðxÞ is lower
than that of φA(x), we can choose φ0A ðxÞ for a minimal polynomial from the
beginning. Thus we assume that φA(x) and φ0A ðxÞ are of the same order. We have

f ðxÞ = gðxÞφA ðxÞ = g0 ðxÞφ0A ðxÞ:

Then, we have

φA ðxÞ=φ0A ðxÞ = g0 ðxÞ=gðxÞ = c, ð12:56Þ

where c is a constant, because φA(x) and φ0A ðxÞ are of the same order. But c should be
one, since the highest-order coefficient of the minimal polynomial is one. Thus,
φA(x) is uniquely defined.

12.3 Generalized Eigenvectors and Nilpotent Matrices

Equation (12.4) ensures that an eigenvalue is accompanied by an eigenvector.


Therefore, if a matrix A : Vn → Vn has different n eigenvalues without multiple
roots, the vector space Vn is decomposed to a direct sum of one-dimensional sub-
spaces spanned by those individual linearly independent eigenvectors (see discus-
sion of Sect 12.1). Thus, we have

Vn = W1 W2 ⋯ Wn
ð12:57Þ
= Spanfa1 g Spanfa2 g ⋯ Spanfan g,

where ai (1 ≤ i ≤ n) are eigenvectors corresponding to different n eigenvalues. The


situation, however, is not always simple. Even though a matrix has eigenvalues of
multiple roots, we have yet a simple case as shown in a next example.
Example 12.2 Let us think of a following matrix A : V3 → V3 described by

2 0 0
A= 0 2 0 : ð12:58Þ
0 0 2
12.3 Generalized Eigenvectors and Nilpotent Matrices 501

The matrix has a triple root 2. As can easily be seen below, individual eigenvec-
tors a1, a2, and a3 form basis vectors of each invariant subspace, indicating that V3
can be decomposed to a direct sum of the three invariant subspaces as in (12.57) such
that

2 0 0
ð a1 a2 a3 Þ 0 2 0 = ð 2a1 2a2 2a3 Þ:
0 0 2

Let us think of another simple example.


Example 12.3 Let us think of a linear transformation A : V2 → V2 expressed as

3 1 x1
AðxÞ = ð a1 a2 Þ , ð12:59Þ
0 3 x2

where a1 and a2 are basis vectors and for 8x 2 V2,x = x1a1 + x2a2. This example has a
double root 3. We have

3 1
ð a1 a2 Þ = ð 3a1 a1 þ 3a2 Þ: ð12:60Þ
0 3

Thus, Span {a1} is A-invariant, but this is not the case with Span{a2}. This
implies that a1 is an eigenvector corresponding to an eigenvalue 3 but a2 is not.
Detailed discussion about matrices of this kind follows below.
Nilpotent matrices play an important role in matrix algebra. These matrices are
defined as follows.
Definition 12.2 Let N be a linear transformation in a vector space Vn. Suppose that
we have

Nk = 0 and N k - 1 ≠ 0, ð12:61Þ

where N is a (n, n) square matrix and k (≤2) is a certain natural number. Then, N is
said to be a nilpotent matrix of an order k or a k-th order nilpotent matrix. If (12.61)
holds with k = 1, N is a zero matrix.
Nilpotent matrices have the following properties:
1. Eigenvalues of a nilpotent matrix are zero. Let N be a k-th order nilpotent matrix.
Suppose that
502 12 Canonical Forms of Matrices

Nx = αx, ð12:62Þ

where α is an eigenvalue and x (≠0) is its corresponding eigenvector. Operating


N (k - 1) more times from the left of both sides of (12.62), we have

N k x = αk x: ð12:63Þ

Meanwhile, Nk = 0 by definition, and so αk = 0, namely α = 0.


Conversely, suppose that eigenvalues of a (n, n) matrix N are zero. From
Theorem 12.1, via a suitable similarity transformation with P we have

~
P - 1 NP = N,

~ is a triangle matrix. Then, using (12.12) and (12.15) we have


where N

f P - 1 NP ðxÞ = f N~ ðxÞ = f N ðxÞ = xn :

From Theorem 12.3, we have

f N ðN Þ = N n = 0:

Namely, N is a nilpotent matrix. In a trivial case, we have N = 0 (zero matrix).


By Definition 12.2, we have Nk = 0 with a k-th nilpotent (n, n) matrix N. Then,
its minimal polynomial is φN(x) = xk (k ≤ n).
2. A nilpotent matrix N is not diagonalizable (except for a zero matrix). Suppose that
N is diagonalizable. Then N can be diagonalized by a non-singular matrix P such
that

P - 1 NP = 0:

The above equation holds, because N only takes eigenvalues of zero. Operat-
ing P from the left of the above equation and P-1 from the right, we have

N = 0:

This means that N would be a zero matrix, in contradiction. Thus, a nilpotent


matrix N is not diagonalizable.
12.3 Generalized Eigenvectors and Nilpotent Matrices 503

Example 12.4 Let N be a matrix of a following form:

N= 0
0
1
0
:

Then, we can easily check that N2 = 0. Therefore, N is a nilpotent matrix of a


second order. Note that N is an upper triangle matrix, and so eigenvalues are given
by diagonal elements. In the present case the eigenvalue is zero (as a double root),
consistent with the aforementioned property.
With a nilpotent matrix of an order k, we have at least one vector x such that
Nk - 1x ≠ 0. Here we add that a zero transformation A (or matrix) can be defined as

8
A = 0 , Ax = 0 for x 2 V n: ð12:64Þ

Taking contraposition of this, we have


A ≠ 0 , Ax ≠ 0 for x 2 V n:

In relation to a nilpotent matrix, we have a following important theorem.


Theorem 12.4 If N is a k-th order nilpotent matrix, then for ∃x 2 V we have
following linearly independent k vectors:

x, Nx N 2 x, ⋯, N k - 1 x:

Proof Let us think of a following equation:

k-1
cN
i=0 i
i
x = 0: ð12:65Þ

Multiplying (12.65) by Nk - 1 from the left and using Nk = 0, we get

c0 N k - 1 x = 0: ð12:66Þ

This implies that c0 = 0. Thus, we are left with

k-1
cN
i=1 i
i
x = 0: ð12:67Þ

Next, multiplying (12.67) by Nk - 2 from the left, we get similarly

c1 N k - 1 x = 0, ð12:68Þ

implying that c1 = 0. Continuing this procedure, we finally get ci = 0 (0 ≤ i ≤ k - 1).


This completes the proof.
504 12 Canonical Forms of Matrices

This immediately tells us that for a k-th order nilpotent matrix N : Vn → Vn, we
should have k ≤ n. This is because the number of linearly independent vectors is at
most n.
In Example 12.4, N causes a transformation of basis vectors in V2 such that

0 1
ð e1 e2 ÞN = ð e1 e2 Þ =ð0 e1 Þ:
0 0

That is, Ne2 = e1. Then, linearly independent two vectors e2 and Ne2 correspond
to the case of Theorem 12.4.
So far we have shown simple cases where matrices can be diagonalized via
similarity transformation. This is equivalent to that the relevant vector space is
decomposed to a direct sum of (invariant) subspaces. Nonetheless, if a characteristic
polynomial of the matrix has multiple root(s), it remains uncertain whether the vector
space is decomposed to such a direct sum. To answer this question, we need a
following lemma.
Lemma 12.1 Let f1(x), f2(x), ⋯, and fs(x) be polynomials without a common factor.
Then we have s polynomials M1(x), M2(x), ⋯, Ms(x) that satisfy the following
relation:

M 1 ðxÞf 1 ðxÞ þ M 2 ðxÞf 2 ðxÞ þ ⋯ þ M s ðxÞf s ðxÞ = 1: ð12:69Þ

Proof Let Mi(x) (1 ≤ i ≤ s) be arbitrarily chosen polynomials and deal with a set of
g(x) that can be expressed as

gðxÞ = M 1 ðxÞf 1 ðxÞ þ M 2 ðxÞf 2 ðxÞ þ ⋯ þ M s ðxÞf s ðxÞ: ð12:70Þ

Let a whole set of g(x) be H. Then H has the following two properties:
1. g1 ðxÞ, g2 ðxÞ 2 H ⟹ g1 ðxÞ þ g2 ðxÞ 2 H,
2. gðxÞ 2 H, M(x): an arbitrarily chosen polynomial ⟹ M ðxÞgðxÞ 2 H.

Now let the lowest-order polynomial out of the set H be g0(x). Then 8 gðxÞð2 HÞ
are a multiple of g0(x). Suppose that dividing g(x) by g0(x), we have

gðxÞ = M ðxÞg0 ðxÞ þ hðxÞ, ð12:71Þ

where h(x) is a certain polynomial. Since gðxÞ, g0 ðxÞ 2 H, we have hðxÞ 2 H from
the above properties (1) and (2). If h(x) ≠ 0, the order of h(x) is lower than that of
g0(x) from (12.71). This is, however, in contradiction to the definition of g0(x).
Therefore, h(x) = 0. Thus,

gðxÞ = M ðxÞg0 ðxÞ: ð12:72Þ


12.4 Idempotent Matrices and Generalized Eigenspaces 505

This implies that H is identical to a whole set of polynomials comprising multiples


of g0(x). In particular, polynomials f i ðxÞ ð1 ≤ i ≤ sÞ 2 H. To show this, put Mi(x) = 1
with other Mj(x) = 0 ( j ≠ i). Hence, the polynomial g0(x) should be a common factor
of fi(x). Meanwhile, g0 ðxÞ 2 H, and so by virtue of (12.70) we have

g0 ðxÞ = M 1 ðxÞf 1 ðxÞ þ M 2 ðxÞf 2 ðxÞ þ ⋯ þ M s ðxÞf s ðxÞ: ð12:73Þ

On assumption the greatest common factor of f1(x), f2(x), ⋯, and fs(x) is 1. This
implies that g0(x) = 1. Thus, we finally get (12.69) and complete the proof.

12.4 Idempotent Matrices and Generalized Eigenspaces

In Sect. 12.1 we have shown that eigenvectors corresponding to different eigen-


values of A are linearly independent. Also if those eigenvalues do not possess
multiple roots, the vector space comprises a direct sum of the subspaces of
corresponding eigenvectors. However, how do we have to treat a situation where
eigenvalues possess multiple roots? Even in this case there is at least one eigenvector
that corresponds to the eigenvalue. To adequately address the question, we need a
concept of generalized eigenvectors and generalized eigenspaces.
For this purpose we extend and generalize the concept of eigenvectors. For a
certain natural number l, if a vector x (2Vn) satisfies a following relation, x is said to
be a generalized eigenvector of rank l that corresponds to an eigenvalue α.

ðA - αEÞl x = 0, ðA - αEÞl - 1 x ≠ 0:

Thus (12.3) implies that an eigenvector of (12.3) may be said to be a generalized


eigenvector of rank 1. When we need to distinguish it from generalized eigenvectors
of rank l (l ≥ 2), we call it a “proper” eigenvector. Thus far we have only been
concerned with the proper eigenvectors. We have a following theorem related to the
generalized eigenvectors.
In this section, idempotent operators play a role. The definition is simple.
Definition 12.3 An operator A is said to be idempotent if A2 = A.
From this simple definition, we can draw several important pieces of information.
Let A be an idempotent operator that operates on Vn. Let x be an arbitrary vector in
Vn. Then, A2x = A(Ax) = Ax. That is,

AðAx - xÞ = 0:

Then, we have
506 12 Canonical Forms of Matrices

Ax = x or Ax = 0: ð12:74Þ

From Theorem 11.3 (dimension theorem), for a general case of linear operators
we had

V n = A ðV n Þ Ker A,

where

AðV n Þ = fAx; x 2 V n g, Ker A = fx; x 2 V n , Ax = 0g:

Consequently, if A is an idempotent operator, we must have

AðV n Þ = fx; x 2 V n , Ax = xg, Ker A = fx; x 2 V n , Ax = 0g: ð12:75Þ

Thus, we find that an idempotent operator A decomposes Vn into a direct sum of


A(Vn) to which Ax = x and Ker A to which Ax = 0. Conversely, if there exists an
operator A that satisfies (12.75), such A must be an idempotent operator.
Meanwhile, we have

ðE - AÞ2 = E - 2A þ A2 = E - 2A þ A = E - A:

Hence, E - A is an idempotent matrix as well. Moreover, we have

AðE - AÞ = ðE - AÞA = 0:

Putting E - A  B and following a procedure similar to the above

Bx - x = ðE - AÞx - x = x - Ax - x = - Ax:

Therefore, B(Vn) = {x; x 2 Vn, Bx = x} is identical to Ker A. Writing

W A = fx; x 2 V n , Ax = xg, W A = fx; x 2 V n , Ax = 0g , ð12:76Þ

we get

W A = AV n , W A = ðE - AÞV n = BV n = V n - AV n = Ker A:

That is,

V n = W A þ W A:

Suppose that ∃ u 2 W A \ W A . Then, from (12.76) Au = u and Au = 0, namely


u = 0, and so V is a direct sum of WA and W A . That is,
12.4 Idempotent Matrices and Generalized Eigenspaces 507

Vn = WA W A:

Notice that if we consider an identity operator E as an idempotent operator, we are


thinking of a trivial case. That is, Vn = Vn {0}.
The result can immediately be extended to the case where more idempotent
operators take part in the vector space. Let us define operators such that

A1 þ A2 þ ⋯ þ As = E,

where E is a (n, n) identity matrix. Also

Ai Aj = Ai δij :

Moreover, we define Wi such that

W i = Ai V n = fx; x 2 V n , Ai x = xg: ð12:77Þ

Then

Vn = W1 W2 ⋯ W s: ð12:78Þ

In fact, suppose that ∃x( 2 Vn) 2 Wi, Wj (i ≠ j). Then Aix = x = Ajx. Operating Aj
( j ≠ i) from the left, AjAix = Ajx = AjAjx. That is, 0 = x = Ajx, implying that
Wi \ Wj = {0}. Meanwhile,

V n = ðA1 þ A2 þ ⋯ þ As ÞV n = A1 V n þ A2 V n þ ⋯ þ As V n
ð12:79Þ
= W 1 þ W 2 þ ⋯ þ W s:

As Wi \ Wj = {0} (i ≠ j), (12.79) is a direct sum. Thus, (12.78) will follow.


Example 12.5 Think of the following transformation A:

1 0 0 0
0 1 0 0
ð e1 e2 e3 e4 ÞA = ð e1 e2 e3 e4 Þ = ð e1 e2 0 0 Þ:
0 0 0 0
0 0 0 0

4
Put x = i = 1 xi ei . Then, we have

4 2
AðxÞ = x Aðei Þ =
i=1 i
xe
i=1 i i
= W A,
508 12 Canonical Forms of Matrices

4
ðE - AÞðxÞ = x - AðxÞ = xe
i=3 i i
= W A:

In the above,

0 0 0 0
0 0 0 0
ð e1 e2 e3 e4 ÞðE - AÞ = ð e1 e2 e3 e4 Þ = ð 0 0 e3 e 4 Þ :
0 0 1 0
0 0 0 1

Thus, we have

W A = Spanfe1 , e2 g, W A = Spanfe3 , e4 g, V4 = WA W A:

The properties of idempotent matrices can easily be checked. It is left for readers.
Using the aforementioned idempotent operators, let us introduce the following
theorem.
Theorem 12.5 [3, 4] Let A be a linear transformation Vn → Vn. Suppose that a
vector x (2Vn) satisfies the following relation:

ðA - αE Þl x = 0, ð12:80Þ

where l is an enough large natural number. Then a set comprising x forms an A-


invariant subspace that corresponds to an eigenvalue α. Let α1, α2, ⋯, αs be
eigenvalues of A different from one another.
Then Vn is decomposed to a direct sum of the A-invariant subspaces that corre-
spond individually to α1, α2, ⋯, αs. This is succinctly expressed as follows:

V n = W α1 W α2 ⋯ W αs : ð12:81Þ

~ αi ð1 ≤ i ≤ sÞ is given by
Here W

~ αi = x; x 2 V n , ðA - αi E Þli x = 0 ,
W ð12:82Þ

~ α i = ni .
where li is an enough large natural number. If multiplicity of αi is ni, dim W
Proof Let us define the aforementioned A-invariant subspaces as W ~ α k ð 1 ≤ k ≤ sÞ.
Let fA(x) be a characteristic polynomial of A. Factorizing fA(x) into a product of
powers of first-degree polynomials, we have
12.4 Idempotent Matrices and Generalized Eigenspaces 509

s
f A ð xÞ = i=1
ðx - αi Þni , ð12:83Þ

where ni is a multiplicity of αi. Let us put


s nj
f i ðxÞ = f A ðxÞ=ðx - αi Þni = j≠i
x - αj : ð12:84Þ

Then f1(x), f2(x), ⋯, and fs(x) do not have a common factor. Consequently,
Lemma 12.1 tells us that there are polynomials M1(x), M2(x), ⋯, and Ms(x) such that

M 1 ðxÞf 1 ðxÞ þ ⋯ þ M s ðxÞf s ðxÞ = 1: ð12:85Þ

Replacing x with a matrix A, we get

M 1 ðAÞf 1 ðAÞ þ ⋯ þ M s ðAÞf s ðAÞ = E: ð12:86Þ

Or defining Mi(A)fi(A)  Ai, we have

A1 þ A2 þ ⋯ þ As = E, ð12:87Þ

where E is a (n, n) identity matrix. Moreover we have

Ai Aj = Ai δij : ð12:88Þ

In fact, if i ≠ j,

Ai Aj = M i ðAÞf i ðAÞM j ðAÞf j ðAÞ = M i ðAÞM j ðAÞf i ðAÞf j ðAÞ


s s
= M i ðAÞM j ðAÞ k≠i
ðA - αk Þnk l≠j
ðA - αl Þnl ð12:89Þ
s
= M i ðAÞM j ðAÞf A ðAÞ k ≠ i,j
ðA - αk Þnk = 0:

The second equality results from the fact that Mj(A) and fi(A) are commutable
since both are polynomials of A. The last equality follows from Hamilton–Cayley
theorem. On the basis of (12.87) and (12.89),

Ai = Ai E = Ai ðA1 þ A2 þ ⋯ þ As Þ = A2i : ð12:90Þ

Thus, we find that Ai is an idempotent matrix.


Next, let us show that using Ai determined above, AiVn is identical to
~ αi ð1 ≤ i ≤ sÞ. To this end, we define Wi such that
W

W i = Ai V n = fx; x 2 V n , Ai x = xg: ð12:91Þ

We have
510 12 Canonical Forms of Matrices

ðx - αi Þni f i ðxÞ = f A ðxÞ: ð12:92Þ

Therefore, from Hamilton–Cayley theorem we have

ðA - αi E Þni f i ðAÞ = 0: ð12:93Þ

Operating Mi(A) from the left and from the fact that Mi(A) commutes with
ðA - αi E Þni , we get

ðA - αi EÞni Ai = 0, ð12:94Þ

where we used Mi(A)fi(A) = Ai. Operating both sides of this equation on Vn,
furthermore, we have

ðA - αi E Þni Ai V n = 0:

This means that

~ αi ð1 ≤ i ≤ sÞ:
Ai V n ⊂ W ð12:95Þ

Conversely, suppose that x 2 W ~ αi . Then (A - αE)lx = 0 holds for a certain


natural number l. If Mi(x)fi(x) were divided out by x - αi, LHS of (12.85) would be
divided out by x - αi as well, leading to the contradiction. Thus, it follows that (x -
αi)l and Mi(x)fi(x) do not have a common factor. Consequently, Lemma 12.1 ensures
that we have polynomials M(x) and N(x) such that

M ðxÞðx - αi Þl þ N ðxÞM i ðxÞf i ðxÞ = 1

and, hence,

M ðAÞðA - αi EÞl þ N ðAÞM i ðAÞf i ðAÞ = E: ð12:96Þ

Operating both sides of (12.96) on x, we get

M ðAÞðA - αi E Þl x þ N ðAÞM i ðAÞf i ðAÞx = N ðAÞAi x = x: ð12:97Þ

Notice that the first term of (12.97) vanishes from (12.82).


As Ai is a polynomial of A, it commutes with N(A). Hence, we have

x = Ai ½N ðAÞx 2 Ai V n : ð12:98Þ

Thus, we get
12.4 Idempotent Matrices and Generalized Eigenspaces 511

~ αi ⊂ Ai V n ð1 ≤ i ≤ sÞ:
W ð12:99Þ

From (12.95) and (12.99), we conclude that

~ αi = Ai V n ð1 ≤ i ≤ sÞ:
W ð12:100Þ

~ αi that is defined as (12.82).


In other words, Wi defined as (12.91) is identical to W
Thus, we have

Vn = W1 W2 ⋯ Ws

or

~ α1
Vn = W ~ α2
W ⋯ ~ αs :
W ð12:101Þ

This completes the former half of the proof. With the latter half, the proof is as
follows.
Suppose that dim W~ αi = n0i . In parallel to the decomposition of Vn to the direct
sum of (12.81), A can be reduced as

Að1Þ ⋯ 0
A ⋮ ⋱ ⋮ , ð12:102Þ
0 ⋯ AðsÞ

where A(i) (1 ≤ i ≤ s) is a (n0i , n0i ) matrix and a symbol  indicates that A has been
transformed by suitable similarity transformation. The matrix A(i) represents a linear
transformation that A causes to W ~ αi . We denote a n0i order identity matrix by E n0 .
i
Equation (12.82) implies that the matrix represented by

N i = AðiÞ - αi En0i ð12:103Þ

is a nilpotent matrix. The order of a nilpotent matrix is at most n0i (vide supra) and,
hence, li can be n0i . With Ni we have

f N i ðxÞ = xE n0i - N i = xE n0i - AðiÞ - αi E n0i = xE n0i - AðiÞ þ αi E n0i


0
ð12:104Þ
= ðx þ αi ÞE n0i - AðiÞ = xni :

The last equality is because eigenvalues of a nilpotent matrix are all zero.
Meanwhile,
512 12 Canonical Forms of Matrices

0
f AðiÞ ðxÞ = xE n0i - AðiÞ = f N i ðx - αi Þ = ðx - αi Þni : ð12:105Þ

Equation (12.105) implies that


s s 0 s
f A ð xÞ = f ðiÞ ðxÞ =
i=1 A i=1
ðx - αi Þni = i=1
ðx - αi Þni : ð12:106Þ

The last equality comes from (12.83). Thus, n0i = ni . At the same time, we may
equate li in (12.82) to ni. These procedures complete the proof.
Theorem 12.1 shows that any square matrix can be converted to a (upper) triangle
matrix by a similarity transformation. Theorem 12.5 demonstrates that the matrix can
further be segmented according to individual eigenvalues. Considering Theorem
12.1 again, A(i) (1 ≤ i ≤ s) can be described as an upper triangle matrix by

αi ⋯ 
AðiÞ  ⋮ ⋱ ⋮ : ð12:107Þ
0 ⋯ αi

Therefore, denoting N(i) such that

0 ⋯ 
N ðiÞ = AðiÞ - αi E ni = ⋮ ⋱ ⋮ , ð12:108Þ
0 ⋯ 0

we find that N(i) is nilpotent. This is because all the eigenvalues of N(i) are zero. From
(12.108), we have
μi
N ðiÞ = 0 ðμi ≤ ni Þ:

12.5 Decomposition of Matrix

To investigate canonical forms of matrices, it would be convenient if a matrix can be


decomposed into appropriate forms. To this end, the following definition is
important.
Definition 12.4 A matrix similar to a diagonal matrix is said to be semi-simple.
In the above definition, if a matrix is related to another matrix by similarity
transformation, those matrices are said to be similar to each other. When we have
two matrices A and A′, we express it by A~A′ as stated above. This relation satisfies
the equivalence law. That is,
(1) A~A, (2) A~A′ ) A′~A, (3) A~A′, A′~A″ ) A~A″.
12.5 Decomposition of Matrix 513

Readers, check this. We have a following important theorem with the matrix
decomposition.
Theorem 12.6 [3, 4] Any (n, n) square matrix A is expressed uniquely as

A = S þ N, ð12:109Þ

where S is semi-simple and N is nilpotent; S and N are commutable, i.e., SN = NS.


Furthermore, S and N are polynomials of A with scalar coefficients.
Proof Using (12.86) and (12.87), we write
s
S = α1 A1 þ ⋯ þ αs As = α M ðAÞf i ðAÞ:
i=1 i i
ð12:110Þ

Then, (12.110) is a polynomial of A. From Theorem 12.1 and Theorem 12.5,


A(i) (1 ≤ i ≤ s) in (12.102) is characterized by that A(i) is a triangle matrix whose
eigenvalues αi (1 ≤ i ≤ s) are positioned on diagonal positions and that the order of
A(i) is identical to the multiplicity of αi. Since Ai (1 ≤ i ≤ s) is an idempotent matrix, it
should be diagonalized through similarity transformation (see Sect. 12.7). In fact,
corresponding to (12.102), S is transformed via similarity transformation into

α 1 E n1 ⋯ 0
S ⋮ ⋱ ⋮ , ð12:111Þ
0 ⋯ α s E ns

where E ni ð1 ≤ i ≤ sÞ is an identity matrix of an order ni that is identical to the


multiplicity of αi. This expression is equivalent to, e.g.,

E n1 ⋯ 0
A1  ⋮ ⋱ ⋮
0 ⋯ 0

in (12.110). Thus, S is obviously semi-simple (i.e., diagonalizable). Putting N = A -


S, N is described after the above transformation as

N ð 1Þ ⋯ 0
N ⋮ ⋱ ⋮ , N ðiÞ = AðiÞ - αi E ni : ð12:112Þ
0 ⋯ N ðsÞ

Since each N(i) is nilpotent as stated in Sect. 12.4, N is nilpotent as well. Also
(12.112) is a polynomial of A as in the case of S. Therefore, S and N are commutable.
To prove the uniqueness of the decomposition, we show the following:
1. Let S and S′ be commutable semi-simple matrices. Then, those matrices are
simultaneously diagonalized. That is, with a certain non-singular matrix
P, P-1SP and P-1S′P are diagonalized at once. Hence, S ± S′ is semi-simple
as well.
514 12 Canonical Forms of Matrices

2. Let N and N′ be commutable nilpotent matrices. Then, N ± N′ is nilpotent as well.


3. A matrix both semi-simple and nilpotent is zero matrix.
1. Let different eigenvalues of S be α1, ⋯, αs. Then, since S is semi-simple, a vector
space Vn is decomposed into a direct sum of eigenspaces W αi ð1 ≤ i ≤ sÞ. That is,
we have

V n = W α1 ⋯ W αs :

Since S and S′ are commutable, with ∃ x 2 W αi we have


SS x = S′Sx = S′(αix) = αiS′x. Hence, we have S0 x 2 W αi . Namely, W αi is S′-

invariant. Therefore, if we adopt the basis vectors {a1, ⋯, an} with respect to the
direct sum decomposition, we get

α1 E n1 ⋯ 0 S01 ⋯ 0
S ⋮ ⋱ ⋮ , S0  ⋮ ⋱ ⋮ :
0 ⋯ αs E ns 0 ⋯ S0s

Since S′ is semi-simple, S0i ð1 ≤ i ≤ sÞ must be semi-simple as well. Here, let


{e1, ⋯, en} be original basis vectors before the basis vector transformation and let
P be a representation matrix of the said transformation. Then, we have

ð e1 ⋯ en ÞP = ð a1 ⋯ an Þ:

Thus, we get

α1 E n1 ⋯ 0 S01 ⋯ 0
P - 1 SP = ⋮ ⋱ ⋮ , P - 1 S0 P = ⋮ ⋱ ⋮ :
0 ⋯ αs E ns 0 ⋯ S0s

This means that both P-1SP and P-1S′P are diagonal. That is, P-1SP ±
P S P = P-1(S ± S′)P is diagonal, indicating that S ± S′ is semi-simple as well.
-1 ′
0
2. Suppose that Nν = 0 and N 0ν = 0. From the assumption, N and N′ are commut-
able. Consequently, using binomial theorem we have

m!
ðN ± N 0 Þ = N m ± mN m - 1 N 0 þ ⋯ þ ð ± 1Þi N m - iN0 þ ⋯
m i
i!ðm - iÞ! ð12:113Þ
þð ± 1Þm N 0 :
m

Putting m = ν + ν′ - 1, if i ≥ ν′, N′i = 0 from the supposition. If i < ν′, we have


m - i > m - ν′ = ν + ν′ - 1 - ν′ = ν - 1; i.e., m - i ≥ ν. Therefore, Nm - i = 0.
Consequently, we have Nm - iN′i = 0 with any i in (12.113). Thus, we get
(N ± N′)m = 0, indicating that N ± N′ is nilpotent.
3. Let S be a semi-simple and nilpotent matrix. We describe S as
12.6 Jordan Canonical Form 515

α1 ⋯ 0
S ⋮ ⋱ ⋮ , ð12:114Þ
0 ⋯ αn

where some of αi (1 ≤ i ≤ n) may be identical. Since S is nilpotent, all αi (1 ≤ i ≤ n)


is zero. We have then S~0; i.e., S = 0 accordingly.
Now, suppose that a matrix A is decomposed differently from (12.109). That
is, we have

A = S þ N = S0 þ N 0 or S - S0 = N 0 - N: ð12:115Þ

From the assumption, S′ and N′ are commutable. Moreover, since S, S′, N, and

N are described by a polynomial of A, they are commutable with one another.
Hence, from (1) and (2) along with the second equation of (12.115), S - S′ and
N′ - N are both semi-simple and nilpotent at once. Consequently, from (3) S -
S′ = N′ - N = 0. Thus, we finally get S = S′ and N = N′. That is, the
decomposition is unique.
These complete the proof.
Theorem 12.6 implies that the matrix decomposition of (12.109) is unique. This
matrix decomposition is said to be Jordan decomposition. On the basis of Theorem
12.6, we investigate Jordan canonical forms of matrices in the next section.

12.6 Jordan Canonical Form

Once the vector space Vn has been decomposed to a direct sum of generalized
eigenspaces with the matrix reduced in parallel, we are able to deal with individual
~ αi ð1 ≤ i ≤ sÞ and the corresponding A(i) (1 ≤ i ≤ s) given in (12.102)
eigenspaces W
separately.

12.6.1 Canonical Form of Nilpotent Matrix

To avoid complication of notation, we think of a following example where we


assume a (n, n) matrix that operates on Vn: Let the nilpotent matrix be N: Suppose
that Nν - 1 ≠ 0 and Nν = 0 (1 ≤ ν ≤ n). To avoid complex notation, we focus on one
of the eigenspaces W ~ αi ð1 ≤ i ≤ sÞ and regard it as n-dimensional vector space Vn.
If ν = 1, the nilpotent matrix N is a zero matrix. Notice that since a characteristic
polynomial is described by fN(x) = xn, fN(N ) = Nn = 0 from Hamilton–Cayley
theorem. Let W(i) be given such that
516 12 Canonical Forms of Matrices

W ðiÞ = x; x 2 V n , N i x = 0, N i - 1 x ≠ 0 ði ≥ 1Þ ; W ð0Þ  f0g:

Then we have

V n = W ðνÞ ⊃ W ðν - 1Þ ⊃ ⋯ ⊃ W ð1Þ ⊃ W ð0Þ  f0g: ð12:116Þ

Note that when ν = 1, we have trivially Vn = W(ν) ⊃ W(0)  {0}. Let us put dim
W = mi, mi - mi - 1 = ri (1 ≤ i ≤ ν), m0  0. Then we can add rν linearly
(i)

independent vectors a1 , a2 , ⋯, and arν to the basis vectors of W(ν - 1) so that those rν
vectors can be basis vectors of W(ν). Unless N = 0 (i.e., zero matrix), we must have at
least one such vector; from the supposition with ∃x ≠ 0 we have Nν - 1x ≠ 0, and so
x2= W(ν - 1). At least one such vector x is present and it is eligible for a basis vector of
W(ν). Hence, rν ≥ 1 and we have

W ðνÞ = Spanfa1 , a2 , ⋯, arν g W ðν - 1Þ : ð12:117Þ

Note that (12.117) is expressed as a direct sum. Meanwhile,


Na1 , Na2 , ⋯, Narν 2 W ðν - 1Þ . In fact, suppose that x 2 W(ν), i.e., Nνx = Nν - 1(Nx)
= 0. That is, Nx 2 W(ν - 1).
According to a similar reasoning made above, we have

SpanfNa1 , Na2 , ⋯, Narν g \ W ðν - 2Þ = f0g:

Moreover, these rν vectors Na1 , Na2 , ⋯, and Narν are linearly independent.
Suppose that

c1 Na1 þ c2 Na2 þ ⋯ þ crν Narν = 0: ð12:118Þ

Operating Nν - 2 from the left, we have

N ν - 1 ðc1 a1 þ c2 a2 þ ⋯ þ crν arν Þ = 0:

This would imply that c1 a1 þ c2 a2 þ ⋯ þ crν arν 2 W ðν - 1Þ . On the basis of the


above argument, however, we must have c1 = c2 = ⋯ = crν = 0. In other words, if

ci (1 ≤ i ≤ rν) were non-zero, we would have ai 2 W(ν - 1), in contradiction. From
(12.118) this means linear independence of Na1 , Na2 , ⋯, and Narν .
As W(ν - 1) ⊃ W(ν - 2), we may well have additional linearly independent vectors
within the basis vectors of W(ν - 1). Let those vectors be arν þ1 , ⋯, arν - 1 . Here we
assume that the number of such vectors is rν - 1 - rν. We have rν - 1 - rν ≥ 0
accordingly. In this way we can construct basis vectors of W(ν - 1) by including
arν þ1 , ⋯, and arν - 1 along with Na1 , Na2 , ⋯, and Narν . As a result, we get
12.6 Jordan Canonical Form 517

W ðν - 1Þ = SpanfNa1 , ⋯, Narν , arν þ1 , ⋯, arν - 1 g W ðν - 2Þ :

We can repeat these processes to construct W(ν - 2) such that

W ðν - 2Þ = Span N 2 a1 , ⋯, N 2 arν , Narν þ1 , ⋯, Narν - 1 , arν - 1 þ1 , ⋯, arν - 2


W ðν - 3Þ :

For W(i), furthermore, we have

W ðiÞ =
Span N ν - i a1 , ⋯, N ν - i arν , N ν - i - 1 arν þ1 , ⋯, N ν - i - 1 arν - 1 , ⋯, ariþ1 þ1 , ⋯, ari
W ði - 1Þ :
ð12:119Þ

Further repeating the procedures, we exhaust all the n basis vectors of


W(ν) = Vn. These vectors are given as follows:

N k ariþ1 þ1 , ⋯, N k ari ð1 ≤ i ≤ ν; 0 ≤ k ≤ i - 1Þ:

At the same time we have

0  rνþ1 < 1 ≤ r ν ≤ r ν - 1 ≤ ⋯ ≤ r1 : ð12:120Þ

Table 12.1 [3, 4] shows the resulting structure of these basis vectors pertinent to
Jordan blocks. In Table 12.1, if laterally counting basis vectors, from the top we have
rν, rν - 1, ⋯ , r1 vectors. Their sum is n. This is the same number as that vertically
counted. The dimension n of the vector space Vn is thus given by
ν ν
n= r
i=1 i
= i=1
iðr i - r iþ1 Þ: ð12:121Þ

Let us examine the structure of Table 12.1 more closely. More specifically, let us
inspect the i-layered structures of (ri - ri + 1) vectors. Picking up a vector from
among ariþ1 þ1 , ⋯, ari , we call it aρ. Then, we get following set of vectors aρ, Naρ
N2aρ, ⋯, Ni - 1aρ in the i-layered structure. These i vectors are displayed “vertically”
in Table 12.1. These vectors are linearly independent (see Theorem 12.4) and form
an i-dimensional N-invariant subspace; i.e., Span{Ni - 1aρ, Ni - 2aρ, ⋯, Naρ, aρ},
where ri + 1 + 1 ≤ ρ ≤ ri. Matrix representation of the linear transformation N with
respect to the set of these i vectors is [3, 4]
518

Table 12.1 Jordan blocks and structure of basis vectors for a nilpotent matrixa
rν#
rν← a1 , ⋯, arν rν - 1 - rν#
rν - 1← Na1 , ⋯, Narν arν þ1 , ⋯, arν - 1
⋯ N 2 a1 , ⋯, N 2 arν Narν þ1 , ⋯, Narν - 1
⋯ ⋯ ⋯ ⋯ ri - ri + 1#
ri← N ν - i a1 , ⋯, N ν - i arν N ν - i - 1 arν þ1 , ⋯, N ν - i - 1 arν - 1 ⋯ ariþ1 þ1 , ⋯, ari
ri - 1← N ν - i - 1 a1 , ⋯, N ν - i - 1 arν N ν - i - 2 arν þ1 , ⋯, N ν - i - 2 arν - 1 ⋯ Nariþ1 þ1 , ⋯, Nari
⋯ ⋯ ⋯ ⋯ ⋯ r2 - r3#
r2← N ν - 2 a1 , ⋯, N ν - 2 arν N ν - 3 arν þ1 , ⋯, N ν - 3 arν - 1 ⋯ N i - 2 ariþ1 þ1 , ⋯, N i - 2 ari ⋯ ar3 þ1 , ⋯, ar2 r1 - r2#
ν-1 ν-1 ν-2 ν-2 i-1 i-1
r1← N a1 , ⋯, N ar ν N arν þ1 , ⋯, N ar ν - 1 ⋯ N ariþ1 þ1 , ⋯, N ari ⋯ Nar3 þ1 , ⋯, Nar2 ar2 þ1 , ⋯, ar1
12

ν
n= k = 1 rk νrν (ν - 1)(rν - 1 - rν) ⋯ i(ri - ri + 1) ⋯ 2(r2 - r3) r1 - r2
a
Adapted from Satake I (1974) Linear algebra (Mathematics Library 1: in Japanese) [4], with the permission of Shokabo Co., Ltd., Tokyo
Canonical Forms of Matrices
12.6 Jordan Canonical Form 519

N i - 1 aρ N i - 2 aρ ⋯Naρ aρ N
0 1
0 1
0 ⋱
i-1 i-2
ð12:122Þ
= N aρ N aρ ⋯Naρ aρ ⋮ ⋱ ⋱ ⋮ :
0 1
⋯ 0 1
0

These (i, i) matrices of (12.122) are called i-th order Jordan blocks. Equation
(12.122) clearly shows that Ni - 1aρ is a proper eigenvector and that we have i - 1
generalized eigenvectors Ni - 2aρ, ⋯, Naρ, aρ.
Notice that the number of those Jordan blocks is (ri - ri + 1). Let us expressly
define this number as [3, 4]

J i = r i - r iþ1 , ð12:123Þ

where Ji is the number of the i-th order Jordan blocks. The total number of Jordan
blocks within a whole vector space Vn = W(ν) is
ν ν
J
i=1 i
= i=1
ðr i - r iþ1 Þ = r 1 : ð12:124Þ

Recalling the dimension theorem mentioned in (11.45), we have

dim V n = dim Ker N i þ dimN i ðV n Þ = dim Ker N i þ rankN i : ð12:125Þ

Meanwhile, since W(i) = Ker Ni, dim W(i) = mi = dim Ker Ni. From (12.116),
m0  0. Then (11.45) is now read as

dim V n = mi þ rankN i : ð12:126Þ

That is

n = mi þ rankN i ð12:127Þ

or

mi = n - rankN i : ð12:128Þ

Meanwhile, from Table 12.1 [3, 4] we have


520 12 Canonical Forms of Matrices

(b)

(a) ⋯

, , ⋯,
= = = ⋯= 1

Fig. 12.1 Examples of the structure of Jordan blocks. (a) r1 = n. (b) r1 = r2 = ⋯ = rn = 1

i i-1
dimW ðiÞ = mi = r ,
k=1 k
dimW ði - 1Þ = mi - 1 = r :
k=1 k
ð12:129Þ

Hence, we have

r i = mi - mi - 1 : ð12:130Þ

Then we get [3, 4]

J i = r i - r iþ1 = ðmi - mi - 1 Þ - ðmiþ1 - mi Þ = 2mi - mi - 1 - miþ1


= 2 n - rankN i - n - rankN i - 1 - n - rankN iþ1 ð12:131Þ
i-1
= rankN þ rankN iþ1
- 2rankN :i

The number Ji is therefore defined uniquely by N. The total number of Jordan


blocks r1 is also computed using (12.128) and (12.130) as

r 1 = m1 - m0 = m1 = n - rankN = dim Ker N, ð12:132Þ

where the last equality arises from the dimension theorem expressed as (11.45).
In Table 12.1 [3, 4], moreover, we have two extreme cases. That is, if ν = 1 in
(12.116), i.e., N = 0, from (12.132) we have r1 = n and r2 = ⋯ = rn = 0; see
Fig. 12.1a. Also we confirm n = νi = 1 r i in (12.121). In this case, all the eigenvectors
are proper eigenvectors with multiplicity of n and we have n first-order Jordan
blocks. The other is the case of ν = n. In that case, we have
12.6 Jordan Canonical Form 521

r 1 = r 2 = ⋯ = rn = 1: ð12:133Þ

In the latter case, we also have n = νi = 1 r i in (12.121). We have only one proper
eigenvector and (n - 1) generalized eigenvectors; see Fig. 12.1b. From (12.132), we
have this special case where we have only one n-th order Jordan block, when
rankN = n - 1.

12.6.2 Jordan Blocks

Let us think of (12.81) on the basis of (12.102). Picking up A(i) from (12.102) and
considering (12.108), we put

N i = AðiÞ - αi Eni ð1 ≤ i ≤ sÞ, ð12:134Þ

where Eni denotes (ni, ni) identity matrix. We express a nilpotent matrix as Ni as
before. In (12.134) the number ni corresponds to n in Vn of Sect. 12.6.1. As Ni is a (ni,
ni) matrix, we have

N i ni = 0:

Here we are speaking of ν-th order nilpotent matrices Ni such that Niν - 1 ≠ 0 and
ν
Ni = 0 (1 ≤ ν ≤ ni). We can deal with Ni in a manner fully consistent with the theory
we developed in Sect. 12.6.1. Each A(i) comprises one or more Jordan blocks A(κ)
that is expressed as

AðκÞ = N κi þ αi E κi ð1 ≤ κi ≤ ni Þ, ð12:135Þ

where A(κ) denotes the κ-th Jordan block in A(i). In A(κ), N κi and Eκi are nilpotent
(κi, κ i)-matrix and (κ i, κi) identity matrix, respectively. In N κi , κi zeros are displayed
on the principal diagonal and entries of 1 are positioned on the matrix element next
above the principal diagonal. All other entries are zero; see, e.g., a matrix of
(12.122). As in the Sect. 12.6.1, the number κi is called a dimension of the Jordan
block. Thus, A(i) of (12.102) can further be reduced to segmented matrices A(κ). Our
next task is to find out how many Jordan blocks are contained in individual A(i) and
what is the dimension of those Jordan blocks.
Corresponding to (12.122), the matrix representation of the linear transformation
by A(κ) with respect to the set of κ i vectors is
522 12 Canonical Forms of Matrices

AðκÞ - αi E κi κi - 1
aσ AðκÞ - αi E κi κi - 2
aσ ⋯ aσ AðκÞ - αi Eκi

= AðκÞ - αi E κi κi - 1
aσ AðκÞ - αi Eκi κi - 2
aσ ⋯ aσ
0 1
0 1
0 ⋱ ð12:136Þ

× ⋮ ⋱ ⋱ ⋮ :
0 1
⋯ 0 1
0

A vector aσ stands for a vector associated with the κ-th Jordan block of A(i). From
(12.136) we obtain

κi - 1
AðκÞ - αi E κi AðκÞ - αi Eκi aσ = 0: ð12:137Þ

Namely,

κi - 1 κi - 1
AðκÞ AðκÞ - αi E κi aσ = αi AðκÞ - αi E κi aσ : ð12:138Þ

κi - 1
This shows that AðκÞ - αi Eκi aσ is a proper eigenvector of A(κ) that corre-
sponds to an eigenvalue αi. Conversely, aσ is a generalized eigenvector of rank κi.
There are another (κi - 2) generalized eigenvectors of
μ
AðκÞ - αi E κi aσ ð1 ≤ μ ≤ κi - 2Þ. In total, there are κ i eigenvectors [a proper eigen-
vector and (κ i - 1) generalized eigenvectors]. Also we see that the sole proper
eigenvector can be found for each Jordan block.
In reference to these κi eigenvectors as the basis vectors, the (κi, κi)-matrix A(κ)
(i.e., a Jordan block) is expressed as
12.6 Jordan Canonical Form 523

αi 1
αi 1
αi ⋱
ðκ Þ
A = ⋮ ⋱ ⋱ ⋮ : ð12:139Þ
αi 1
⋯ αi 1
αi

A (ni, ni)-matrix A(i) of (12.102) pertinent to an eigenvalue αi contains a direct


sum of Jordan blocks whose dimension ranges from 1 to ni. The largest possible
number of Jordan blocks of dimension d (that satisfies n2i þ 1 ≤ d ≤ ni , where [μ]
denotes a largest integer that does not exceed μ) is at most one.
An example depicted below is a matrix A( p) (1 ≤ p ≤ s) that explicitly includes
two one-dimensional Jordan blocks, a (3, 3) three-dimensional Jordan block, and a
(5, 5) five-dimensional Jordan block:

αp 0
αp 0
αp 1 ...
αp 1
⋮ αp 0 ⋮
AðpÞ  :
αp 1
αp 1
... αp 1
αp 1
αp

where A( p) is a (10, 10) upper triangle matrix in which an eigenvalue αp is displayed


on the principal diagonal with entries 0 or 1 on the matrix element next above the
principal diagonal with all other entries being zero.
Theorem 12.1 shows that every (n, n) square matrix can be converted to a triangle
matrix by suitable similarity transformation. Diagonal elements give eigenvalues.
Furthermore, Theorem 12.5 ensures that A can be reduced to generalized
eigenspaces W ~ αi (1 ≤ i ≤ s) according to individual eigenvalues. Suppose, for
example, that after a suitable similarity transformation a full matrix A is
represented as
524 12 Canonical Forms of Matrices

α1
 
α2

α2
α2

  
αi
A  
, ð12:140Þ
αi

αi
αi


αs
αs

where

A = Að1Þ ⨁Að2Þ ⨁⋯⨁AðiÞ ⨁⋯⨁AðsÞ : ð12:141Þ

In (12.141), A(1) is a (1, 1) matrix (i.e., simply a number); A(2) is a (3, 3) matrix; ⋯
A is a (4, 4) matrix; ⋯ A(s) is a (2, 2) matrix.
(i)

The above matrix form allows us to further deal with segmented triangle matrices
separately. In the case of (12.140) we may use a following matrix for similarity
transformation:

1
1
1
1

p11 p12 p13 p14
P= ,
p21 p22 p23 p24
p31 p32 p33 p34
p41 p42 p43 p44

1
1

where a (4, 4) matrix P given by


12.6 Jordan Canonical Form 525

p11 p12 p13 p14


p21 p22 p23 p24
P=
p31 p32 p33 p34
p41 p42 p43 p44

is a non-singular matrix. The matrix P is to be operated on A(i) so that we can


separately perform the similarity transformation with respect to a (4, 4) nilpotent
matrix A(i) - αiE4 following the procedures mentioned Sect. 12.6.1.
Thus, only an αi-associated segment can be treated with other segments left
unchanged. In a similar fashion, we can consecutively deal with matrix segments
related to other eigenvalues. In a practical case, however, it is more convenient to
seek different eigenvalues and corresponding (generalized) eigenvectors at once and
convert the matrix to Jordan canonical form. To make a guess about the structure of a
matrix, however, the following argument will be useful. Let us think of an example
after that.
Using (12.140), we have

A - αi E 
α1 - αi
 
α2 - αi

α2 - αi
α2 - αi

  
0
:
 
0

0
0


αs - αi
αs - αi

Here we are treating the (n, n) matrix A on Vn. Note that a matrix A - αiE is not
nilpotent as a whole. Suppose that the multiplicity of αi is ni; in (12.140) ni = 4.
Since eigenvalues α1, α2, ⋯, and αs take different values from one another, α1 - αi,
α2 - αi, ⋯, and αs - αi ≠ 0. In a triangle matrix diagonal elements give eigenvalues
and, hence, α1 - αi, α2 - αi, ⋯, and αs - αi are non-zero eigenvalues of A - αiE. We
rewrite (12.140) as
526 12 Canonical Forms of Matrices

M ð1Þ
M ð2Þ

A - αi E  , ð12:142Þ
M ðiÞ

M ðsÞ

 
α2 - αi
where M ð1Þ = ðα1 - αi Þ, M ð2Þ = α2 - αi 
, ⋯, M ðiÞ =
α2 - αi
  
0
  
0 αs - αi
 , ⋯, M ðsÞ = :
0 αs - αi
0
Thus, M( p) ( p ≠ i) is a non-singular matrix and M(i) is a nilpotent matrix. Note that
μ -1 μ
if we can find μi such that M ðiÞ i ≠ 0 and M ðiÞ i = 0, with a minimal polynomial
φM ðiÞ ðxÞ for M(i), we have φM ðiÞ ðxÞ = xμi . Consequently, we get
μi
M ð1Þ
μi
M ð2Þ

ð A - α i E Þ μi  : ð12:143Þ
0

μi
M ðsÞ
μ
In (12.143) diagonal elements of non-singular triangle matrices M ðpÞ i ðp ≠ iÞ
μ
are αp - αi i ð ≠ 0Þ. Thus, we have a “perforated” matrix ðA - αi EÞμi , where
ðiÞ μi
M = 0 in (12.143). Putting

s
ΦA ð x Þ  i=1
ðx - αi Þμi ,

we get
12.6 Jordan Canonical Form 527

s
ΦA ðAÞ  i=1
ðA - αi E Þμi = 0:

A polynomial ΦA(x) gives a minimal polynomial for fA(x).


From the above argument, we can choose μi for li in (12.82). Meanwhile, M(i) in
(12.142) is identical to Ni in (12.134). Rewriting (12.134), we get

AðiÞ = M ðiÞ þ αi Eni : ð12:144Þ


k
Let us think of matrices AðiÞ - αi Eni and (A - αiE)k (k ≤ μi). From (11.45),
we find

dim V n = n = dim Ker ðA - αi EÞk þ rank ðA - αi E Þk ð12:145Þ

and

k k
~ αi = ni = dim Ker AðiÞ - αi E ni
dim W þ rank AðiÞ - αi Eni , ð12:146Þ

where rank (A - αiE)k = dim (A - αiE)k(Vn) and


ðiÞ k ðiÞ k
~
rank A - αi Eni = dim A - αi E ni W αi .
Noting that

k
dim Ker ðA - αi EÞk = dim Ker AðiÞ - αi E ni , ð12:147Þ

we get

k
n - rank ðA - αi E Þk = ni - rank AðiÞ - αi Eni : ð12:148Þ

This notable property comes from the non-singularity of [M( p)]k ( p ≠ i, k: a


positive integer); i.e., all the eigenvalues of [M( p)]k are non-zero. In particular, as
μ
rank AðiÞ - αi Eni i = 0, from (12.148) we have

rankðA - αi E Þl = n - ni ðl ≥ μi Þ: ð12:149Þ

Meanwhile, putting k = 1 in (12.148) and using (12.132) we get

dim Ker AðiÞ - αi Eni = ni - rank AðiÞ - αi Eni = n - rank ðA - αi EÞ


ð12:150Þ
ðiÞ
= dim Ker ðA - α i E Þ = r 1 :

ðiÞ
The value r 1 gives the number of Jordan blocks with an eigenvalue αi.
528 12 Canonical Forms of Matrices

Moreover, we must consider a following situation. We know how the matrix A(i)
in (12.134) is reduced to Jordan blocks of lower dimension. To get detailed infor-
mation about it, however, we have to get the information about (generalized)
eigenvalues corresponding to eigenvalues other than αi. In this context,
Eq. (12.148) is useful. Equation (12.131) tells how the number of Jordan blocks in
a nilpotent matrix is determined. If we can get this knowledge before we have found
out all the (generalized) eigenvectors, it will be easier to address the problem. Let us
rewrite (12.131) as

J ðqiÞ = r q - r qþ1
q-1 qþ1 q
= rank AðiÞ - αi E ni þ rank AðiÞ - αi E ni - 2rank AðiÞ - αi E ni ,
ð12:151Þ

where we define J ðqiÞ as the number of the q-th order Jordan blocks within A(i). Note
that these blocks are expressed as (q, q) matrices. Meanwhile, using (12.148) J ðqiÞ is
expressed as

J ðqiÞ = rank ðA - αi E Þq - 1 þ rank ðA - αi E Þqþ1 - 2rank ðA - αi E Þq : ð12:152Þ

This relation can be obtained by replacing k in (12.148) with q - 1, q + 1, and q,


respectively and deleting n and ni from these three relations. This enables us to gain
access to a whole structure of the linear transformation represented by the (n, n)
matrix A without reducing it to subspaces.
To enrich our understanding of Jordan canonical forms, a following tangible
example will be beneficial.

12.6.3 Example of Jordan Canonical Form

Let us think of a following matrix A:

1 -1 0 0
A¼ 1 3 0 0 : ð12:153Þ
0 0 2 0
-3 -1 -2 4

The characteristic equation fA(x) is given by


12.6 Jordan Canonical Form 529

x-1 1 0 0
-1 x-3 0 0
f A ð xÞ =
0 0 x-2 0 ð12:154Þ

3 1 2 x-4
3
= ð x - 4Þ ð x - 2Þ :

Equating (12.154) to zero, we get an eigenvalue 4 as a simple root and an


eigenvalue 2 as a triple root. The vector space V4 is then decomposed to two
invariant subspaces. The first is a one-dimensional kernel (or null-space) of the
transformation (A - 4E) and the other is a three-dimensional kernel of the transfor-
mation (A - 2E)3. We have to seek eigenvectors that span these invariant subspaces
with individual eigenvalues of x.
1. Case I (x = 4):
An eigenvector belonging to the first invariant subspace must satisfy a proper
eigenvalue equation since the eigenvalue 4 is simple root. This equation is
expressed as

ðA - 4EÞx = 0:

This reads in a matrix form as

-3 -1 0 0 c1
1 -1 0 0 c2
= 0: ð12:155Þ
0 0 -2 0 c3
-3 -1 -2 0 c4

This is equivalent to a following set of four equations:

- 3c1 - c2 = 0,
c1 - c2 = 0,
- 2c3 = 0,
- 3c1 - c2 - 2c3 = 0:

These are equivalent to that c3 = 0 and c1 = c2 = - 3c1. Therefore,


c1 = c2 = c3 = 0 with an arbitrarily chosen number of c4, which is chosen as
ð4Þ
1 as usual. Hence, designating the proper eigenvector as e1 , its column vector
representation is
530 12 Canonical Forms of Matrices

0
ð4Þ 0
e1 = 0 :
1

A (4, 4) matrix in (12.155) representing (A - 4E) has a rank 3. The number of


Jordan blocks for an eigenvalue 4 is given by (12.150) as

ð4Þ
r 1 = 4 - rankðA - 4E Þ = 1: ð12:156Þ

In this case, the Jordan block is naturally one-dimensional. In fact, using


(12.152) we have

ð4Þ
J 1 = rank ðA - 4E Þ0 þ rank ðA - 4E Þ2 - 2rank ðA - 4E Þ
ð12:157Þ
= 4 þ 3 - 2 x 3 = 1:

ð4Þ
In (12.157), J 1 gives the number of the first-order Jordan blocks for an
eigenvalue 4. We used

8 4 0 0
ðA - 4EÞ =2 -4 0 0 0 ð12:158Þ
0 0 4 0
8 4 4 0

and confirmed that rank (A - 4E)2 = 3.


2. Case II (x = 2):
The eigenvalue 2 has a triple root. Therefore, we must examine how the
invariant subspaces can further be decomposed to subspaces of lower dimension.
To this end we first start with a secular equation expressed as

ðA - 2EÞx = 0: ð12:159Þ

The matrix representation is

-1 -1 0 0 c1
1 1 0 0 c2
0 0 0 0 = 0:
c3
-3 -1 -2 2 c4

This is equivalent to a following set of two equations:

c1 þ c2 = 0,
- 3c1 - c2 - 2c3 þ 2c4 = 0:

From the above, we can put c1 = c2 = 0 and c3 = c4 (=1). The equations allow
the existence of another proper eigenvector. For this we have c1 = - c2 = 1,
12.6 Jordan Canonical Form 531

c3 = 0, and c4 = 1. Thus, for the two proper eigenvectors corresponding to an


eigenvalue 2, we get

0 1
ð2Þ
e1 = 0 ,
ð2Þ
e2 = -1 :
1 0
1 1

A dimension of the invariant subspace corresponding to an eigenvalue 2 is three


(due to the triple root) and, hence, there should be one generalized eigenvector. To
determine it, we examine a following matrix equation:

ðA - 2E Þ2 x = 0: ð12:160Þ

The matrix representation is

0 0 0 0 c1
0 0 0 0 c2 = 0:
0 0 0 0 c3
-4 0 -4 4 c4

That is,

- c1 - c3 þ c4 = 0: ð12:161Þ

Furthermore, we have

0 0 0 0
ðA - 2E Þ = 3 0 0 0 0 : ð12:162Þ
0 0 0 0
-8 0 -8 8

Moreover, rank (A - 2E)l(=1) remains unchanged for l ≥ 2 as expected from


(12.149).
It will be convenient to examine a structure of the invariant subspace. For this
ð2Þ
purpose, we seek the number of Jordan blocks r 1 and their order. Using (12.150),
we have

ð2Þ
r 1 = 4 - rankðA - 2EÞ = 4 - 2 = 2: ð12:163Þ

The number of first-order Jordan blocks is

ð2Þ
J1 = rank ðA - 2E Þ0 þ rank ðA - 2E Þ2 - 2rank ðA - 2EÞ
ð12:164Þ
= 4 þ 1 - 2 × 2 = 1:

In turn, the number of second-order Jordan blocks is


532 12 Canonical Forms of Matrices

Fig. 12.2 Structure of


Jordan blocks of a matrix Jordan blocks
shown in (12.170)

( )

( ) ( ) ( )

Eigenvalue: 4 Eigenvalue: 2

ð2Þ
J 2 = rank ðA - 2E Þ þ rank ðA - 2EÞ3 - 2rank ðA - 2E Þ2
ð12:165Þ
= 2 þ 1 - 2 × 1 = 1:

ð2Þ ð2Þ
In the above, J 1 and J 2 are obtained from (12.152). Thus, Fig. 12.2 gives a
constitution of Jordan blocks for eigenvalues 4 and 2. The overall number of Jordan
blocks is three; the number of the first-order and second-order Jordan blocks is two
and one, respectively.
ð2Þ ð2Þ
The proper eigenvector e1 is related to J 1 of (12.164). A set of the proper
ð2Þ ð2Þ
eigenvector e2 and the corresponding generalized eigenvector g2 is pertinent to
ð2Þ ð2Þ
J 2 of (12.165). We must have the generalized eigenvector g2 in such a way that

ð2Þ ð2Þ ð2Þ


ðA - 2E Þ2 g2 = ðA - 2E Þ ðA - 2EÞg2 = ðA - 2E Þe2 = 0: ð12:166Þ

From (12.161), we can put c1 = c3 = c4 = 0 and c2 = - 1. Thus, the matrix


ð2Þ
representation of the generalized eigenvector g2 is

0
ð2Þ
g2 = -1 : ð12:167Þ
0
0

ð2Þ ð2Þ
We stress here that e1 is not eligible for a proper pair with g2 in J2. It is because
from (12.166) we have

ð2Þ ð2Þ ð2Þ ð2Þ


ðA - 2EÞg2 = e2 , ðA - 2E Þg2 ≠ e1 : ð12:168Þ

Thus, we have determined a set of (generalized) eigenvectors. The matrix repre-


sentation R for the basis vectors transformation is given by
12.6 Jordan Canonical Form 533

0 0 1 0
0 0 -1 -1 ð4Þ ð2Þ ð2Þ ð2Þ
R= e1 e 1 e 2 g2 , ð12:169Þ
0 1 0 0
1 1 1 0

ð4Þ ð2Þ ð2Þ


where the symbol ~ denotes the column vector representation; e1 , e1 , and e2
ð2Þ
represent proper eigenvectors and g2 is a generalized eigenvector. Performing
similarity transformation using this R, we get a following Jordan canonical form:

-1 0 -1 1 1 -1 0 0 0 0 1 0
0 0 1 0 1 3 0 0 0 0-1-1
R - 1 AR ¼
1 0 0 0 0 0 2 0 0 1 0 0
-1-1 0 0 -3-1-2 4 1 1 1 0
4 0 0 0
0 2 0 0
¼ :
0 0 2 1
0 0 0 2
ð12:170Þ

The structure of Jordan blocks is shown in Fig. 12.2. Notice here that the trace of
A remains unchanged before and after similarity transformation.
Next, we consider column vector representations. According to (11.37), let us
view the matrix A as a linear transformation over V4. Then A is given by

x1
x2
AðxÞ = ð e1 e2 e3 e4 ÞA
x3
x4
1 -1 0 0 ð12:171Þ
x1
1 3 0 0 x2
= ð e1 e2 e3 e4 Þ ,
0 0 2 0
x3
-3 -1 -2 4 x4

where e1, e2, e3, and e4 are basis vectors and x1, x2, x3, and x4 are corresponding
n
coordinates of a vector x = xi ei 2 V 4 . We rewrite (12.171) as
i=1
534 12 Canonical Forms of Matrices

x1
x2
AðxÞ = ð e1 e2 e3 e4 ÞRR - 1 ARR - 1
x3
x4
4 0 0 0 x01 ð12:172Þ

0 2 0 0 x02
= eð14Þ ð2Þ
e1
ð2Þ
e2
ð2Þ
g2 ,
0 0 2 1
x03
0 0 0 2 x04

where we have

ð4Þ ð2Þ ð2Þ ð2Þ


e1 e 1 e 2 g2 = ð e1 e2 e3 e4 ÞR,
x01 x1
x02 x2 ð12:173Þ
= R-1 :
x03 x3
x04 x4

After (11.84), we have

x1 x1
x2 x2
x = ð e1 e2 e3 e4 Þ = ð e1 e2 e3 e4 ÞRR - 1
x3 x3
x4 x4
x01 ð12:174Þ

x02
= eð14Þ e1
ð2Þ ð2Þ
e2 g2
ð2Þ
:
x03
x04

As for Vn in general, let us put R = ( p)ij and R-1 = (q)ij. Also represent j-th
(generalized) eigenvectors by a column vector and denote them by p( j ). There we
display individual (generalized) eigenvectors in the order of (e(1) e(2)⋯e( j )⋯e(n)),
where e( j ) (1 ≤ j ≤ n) denotes either a proper eigenvector or a generalized
eigenvector according to (12.174). Each p( j ) is represented in reference to original
basis vectors (e1⋯en). Then we have
12.6 Jordan Canonical Form 535

n ðjÞ ðjÞ
R - 1 pðjÞ = q p
k = 1 ik k
= δi , ð12:175Þ

ðjÞ
where δi denotes a column vector to which only the j-th row is 1, otherwise 0. Thus,
ðjÞ
a column vector δi is an “address” of e( j ) in reference to (e(1) e(2)⋯e( j )⋯e(n)) taken
as basis vectors.
In our present case, in fact, we have

-1 0 -1 1 0 1
0 0 1 0 0 0
R - 1 pð1Þ = = ,
1 0 0 0 0 0
-1 -1 0 0 1 0
-1 0 -1 1 0 0
0 0 1 0 0 1
R - 1 pð2Þ = = ,
1 0 0 0 1 0
-1 -1 0 0 1 0
ð12:176Þ
-1 0 -1 1 1 0
0 0 1 0 -1 0
R - 1 pð3Þ = = ,
1 0 0 0 0 1
-1 -1 0 0 1 0
-1 0 -1 1 0 0
0 0 1 0 -1 0
R - 1 pð4Þ = = :
1 0 0 0 0 0
-1 -1 0 0 0 1

ð4Þ
In (12.176) p(1) is a column vector representation of e1 ; p(2) is a column vector
ð2Þ
representation of e1 , and so on.
A minimal polynomial ΦA(x) is expressed as

Φ A ð xÞ = ð x - 4Þ ð x - 2Þ 2 :

Readers can easily make sure of it.


We remark that a non-singular matrix R pertinent to the similarity transformation
is not uniquely determined, but we have arbitrariness. In (12.169), for example, if we
ð2Þ ð2Þ ð2Þ ð2Þ
adopt - g2 instead of g2 , we should adopt - e2 instead of e2 accordingly.
Thus, instead of R in (12.169) we may choose
536 12 Canonical Forms of Matrices

0 0 -1 0
′ 0 0 1 1
R = : ð12:177Þ
0 1 0 0
1 1 -1 0

In this case, we also get the same Jordan canonical form as before. That is,

4 0 0 0
-1 0 2 0 0
R0 AR0 = :
0 0 2 1
0 0 0 2

Suppose that we choose R″ such that

0 1 0 0

0 -1 -1 0
ðe1 e2 e3 e4 ÞR00 = ðe1 e2 e3 e4 Þ
1 0 0 0
1 1 0 1
ð2Þ ð2Þ ð2Þ ð4Þ
= e1 e2 g2 e1 :

In this case, we have

2 0 0 0
-1 0 2 1 0
R00 AR00 = :
0 0 2 0
0 0 0 4

Note that we get a different disposition of the matrix elements from that of
(12.172).
Next, we decompose A into a semi-simple matrix and a nilpotent matrix. In
(12.172), we had

4 0 0 0
0 2 0 0
R - 1 AR = :
0 0 2 1
0 0 0 2

Defining
12.6 Jordan Canonical Form 537

4 0 0 0 0 0 0 0
0 2 0 0 0 0 0 0
S= and N= ,
0 0 2 0 0 0 0 1
0 0 0 2 0 0 0 0

we have

R - 1 AR = S þ N i:e:, A = RðS þ N ÞR - 1 = RSR - 1 þ RNR - 1 :

Performing the above matrix calculations and putting ~S = RSR - 1 and


~
N = RNR - 1 , we get

A=~ ~
SþN

with

2 0 0 0 -1 -1 0 0
~ 0 2 0 0 ~ = 1 1 0 0
S= and N :
0 0 2 0 0 0 0 0
-2 0 -2 4 -1 -1 0 0

That is, we have

1 -1 0 0 2 0 0 0
1 3 0 0 0 2 0 0
A= =
0 0 2 0 0 0 2 0
-3 -1 -2 4 -2 0 -2 4
-1 -1 0 0
1 1 0 0
þ : ð12:178Þ
0 0 0 0
-1 -1 0 0

Even though matrix forms S and N differ depending on the choice of different
matrix forms of similarity transformation R (namely, R′, R′′, etc. represented above),
the decomposition (12.178) is unique. That is, ~ S and N~ are uniquely determined,
~ ~
once a matrix A is given. The matrices S and N are commutable. The confirmation is
left for readers as an exercise.
We present another simple example. Let A be a matrix described as

0 4
A= :
-1 4

Eigenvalues of A are 2 as a double root. According to routine, we have an


ð2Þ
eigenvector e1 as a column vector, e.g.,
538 12 Canonical Forms of Matrices

ð2Þ 2
e1 = :
1

ð2Þ
Another eigenvector is a generalized eigenvector g1 of rank 2. This can be
decided such that

ð2Þ -2 ð2Þ ð2Þ


ðA - 2E Þg1 = -1
4
2
g1 = e 1 :

As an option, we get

ð2Þ
g1 = 1
1
:

Thus, we can choose R for a diagonalizing matrix together with an inverse matrix
R-1 such that

-1
R= 2
1
1
1
, and R-1 = 1
-1 2
:

Therefore, with a Jordan canonical form we have

R - 1 AR = 2
0
1
2
: ð12:179Þ

As before, putting

S= 2
0
0
2
and N= 0
0
1
0
,

we have

R - 1 AR = S þ N, i:e:, A = RðS þ N ÞR - 1 = RSR - 1 þ RNR - 1 :

Putting ~ ~ = RNR - 1 , we get


S = RSR - 1 and N

A=~ ~
SþN

with

2 0 -2 4
S= and N= : ð12:180Þ
0 2 -1 2

We may also choose R′ for a diagonalizing matrix together with an inverse matrix
R′ instead of R and R-1, respectively, such that we have, e.g.,
-1
12.7 Diagonalizable Matrices 539

2 -3 -1 -1 3
R′ = and R′ = :
1 -1 -1 2

Using these matrices, we get exactly the same Jordan canonical form and the
matrix decomposition as (12.179) and (12.180). Thus, again we find that the matrix
decomposition is unique.
Another simple example is a lower triangle matrix

2 0 0
A= -2 1 0 :
0 0 1

Following now familiar procedures, as a diagonalizing matrix we have, e.g.,

1 0 0 1 0 0
R= -2 1 0 and R - 1 = 2 1 0 :
0 0 1 0 0 1

Then, we get

2 0 0
R - 1 AR = S = 0 1 0 :
0 0 1

Therefore, the “decomposition” is

2 0 0 0 0 0
A = RSR - 1 = -2 1 0 þ 0 0 0 ,
0 0 1 0 0 0

where the first term is a semi-simple matrix and the second is a nilpotent matrix (i.e.,
zero matrix). Thus, the decomposition is once again unique.

12.7 Diagonalizable Matrices

Among canonical forms of matrices, the simplest form is a diagonalizable matrix.


Here we define a diagonalizable matrix as a matrix that can be converted to that
whose off-diagonal elements are zero. In Sect. 12.5 we have investigated different
properties of the matrices. In this section we examine basic properties of diagonal-
izable matrices.
In Sect 12.6.1 we have shown that Span{Ni - 1aρ, Ni - 2aρ, ⋯Naρ, aρ} forms a N-
invariant subspace of a dimension i, where aρ satisfies the relations Niaρ = 0 and
Ni - 1aρ ≠ 0 (ri + 1 + 1 ≤ j ≤ ri) as in (12.122). Of these vectors, only Ni - 1aρ is a sole
540 12 Canonical Forms of Matrices

proper eigenvector that is accompanied by (i - 1) generalized eigenvectors. Note


that only the proper eigenvector can construct one-dimensional N-invariant subspace
by itself. This is because regarding other generalized eigenvectors g (here g stands
for all of generalized eigenvectors), Ng (≠0) and g are linearly independent. Note
that with a proper eigenvector e, we have Ne = 0. A corresponding Jordan block is
represented by a matrix as given in (12.122) in reference to the basis vectors
comprising these i eigenvectors.
Therefore, if a (n, n) matrix A has only proper eigenvectors, all Jordan blocks are
one-dimensional. This means that A is diagonalizable. That A has only proper
eigenvectors is equivalent to that those eigenvectors form a one-dimensional sub-
space and that Vn is a direct sum of the subspaces spanned by individual proper
eigenvectors. In other words, if Vn is a direct sum of subspaces (i.e., eigenspaces)
spanned by individual proper eigenvectors of A, A is diagonalizable.
Next, suppose that A is diagonalizable. Then, after an appropriate similarity
transformation with a non-singular matrix P, A has a following form:

α1

α1
α2

P - 1 AP = : ð12:181Þ
α2

αs

αs

In this case, let us examine what form a minimal polynomial φA(x) for A takes. A
characteristic polynomial fA(x) for A is invariant through similarity transformation,
so is φA(x). That is,

φP - 1 AP ðxÞ = φA ðxÞ: ð12:182Þ

From (12.181), we find that A - αiE (1 ≤ i ≤ s) has a “perforated” form such as


(12.143) with the diagonalized form unchanged. Then we have

ðA - α1 EÞðA - α2 E Þ⋯ðA - αs E Þ = 0: ð12:183Þ

This is because a product of matrices having only diagonal elements is merely a


product of individual diagonal elements. Meanwhile, in virtue of Hamilton-Cayley
theorem,
s
f A ðAÞ = i=1
ðA - αi EÞni = 0:

Rewriting this expression, we have


12.7 Diagonalizable Matrices 541

ðA - α1 E Þn1 ðA - α2 EÞn2 ⋯ðA - αs EÞns = 0: ð12:184Þ

In light of (12.183), this implies that a minima polynomial φA(x) is expressed as

φA ðxÞ = ðx - α1 Þðx - α2 Þ⋯ðx - αs Þ: ð12:185Þ

Surely φA(x) of (12.185) has a lowest-order polynomial among those satisfying


f(A) = 0 and is a divisor of fA(x). Also φA(x) has a highest-order coefficient of
1. Thus, φA(x) should be a minimal polynomial of A and we conclude that φA(x) does
not have a multiple root.
Then let us think how Vn is characterized in case φA(x) does not have a multiple
root. This is equivalent to that φA(x) is described by (12.185). To see this, suppose
that we have two matrices A and B and let BVn = W. We wish to use the following
relation [3, 4]:

rankðABÞ = dimABV n = dimAW = dimW - dim A - 1 ð0Þ \ W


≥ dimW - dim A - 1 ð0Þ = dimBV n - ðn - dimAV n Þ ð12:186Þ
= rank A þ rank B - n:

In (12.186), the third equality comes from the fact that the domain of A is
restricted to W. Concomitantly, A-1(0) is restricted to A-1(0) \ W as well; notice
that A-1(0) \ W is a subspace. Considering these situations, we use a relation
corresponding to that of (11.45). The fourth equality is due to the dimension theorem
of (11.45). Applying (12.186) to (12.183) successively, we have

0 = rank ½ðA - α1 E ÞðA - α2 E Þ⋯ðA - αs EÞ


≥ rank ðA - α1 E Þ þ rank ½ðA - α2 E Þ⋯ðA - αs E Þ - n
≥ rank ðA - α1 E Þ þ rank ðA - α2 EÞ þ rank ½ðA - α3 EÞ⋯ðA - αs E Þ - 2n
≥⋯
≥ rank ðA - α1 E Þ þ ⋯ þ rank ðA - αs E Þ - ðs - 1Þn
s
= i=1
rank ½ðA - αi E Þ - n þ n:

Finally we get
s
i=1
rank ½n - ðA - αi EÞ ≥ n: ð12:187Þ

As rank ½n - ðA - αi E Þ = dimW αi , we have


s
i=1
dimW αi ≥ n: ð12:188Þ

Meanwhile, we have
542 12 Canonical Forms of Matrices

V n ⊃ W α1 W α2 ⋯ W αs ,
s ð12:189Þ
n ≥ dim W α1 W α2 ⋯ W αs = i=1
dimW αi :

The equality results from the property of a direct sum. From (12.188) and
(12.189),
s
i=1
dimW αi = n: ð12:190Þ

Hence,

V n = W α1 W α2 ⋯ W αs : ð12:191Þ

Thus, we have proven that if the minimal polynomial does not have a multiple
root, Vn is decomposed into direct sum of eigenspaces as in (12.191).
If in turn Vn is decomposed into direct sum of eigenspaces as in (12.191), A can be
diagonalized by a similarity transformation. The proof is as follows: Suppose that
(12.191) holds. Then, we can take only eigenvectors for the basis vectors of Vn.
Suppose that dimW αi = ni . Then, we can take vectors
i-1 i
ak j = 1 nj þ 1 ≤ k ≤ j = 1 nj so that ak can be the basis vectors of W αi . In
reference to this basis set, we describe a vector x 2 Vn such that

x1
x = ð a1 a 2 ⋯ a n Þ x2 :

xn
Operating A on x, we get
x1 x1
x2 x2
AðxÞ = ð a1 a2 ⋯ an ÞA = ð a1 A a2 A ⋯ an A Þ
⋮ ⋮
xn xn
x1
x2
= ð α1 a1 α2 a2 ⋯ αn an Þ

xn
α1 x1
α2 x2
= ð a1 a2 ⋯ an Þ ,
⋱ ⋮

αn xn
ð12:192Þ
12.7 Diagonalizable Matrices 543

where with the second equality we used the notation (11.40); with the third equality
some of αi (1 ≤ i ≤ n) may be identical; ai is an eigenvector that corresponds to an
eigenvalue αi. Suppose that a1, a2, ⋯, and an are obtained by transforming an
“original” basis set e1, e2, ⋯, and en by R. Then, we have

ð0Þ
α1 x1
ð0Þ
α2 x2
AðxÞ = ð e1 e2 ⋯ en Þ R R-1 :
⋱ ⋮
αn xðn0Þ

We denote the transformation A with respect to a basis set e1, e2, ⋯, and en by A0;
see (11.82) with the notation. Then, we have

ð0Þ
x1
ð0Þ
x2
AðxÞ = ð e1 e2 ⋯ en Þ A0 :

xnð0Þ

Therefore, we get

α1
α2
R - 1 A0 R = : ð12:193Þ

αn

Thus, A is similar to a diagonal matrix as represented in (12.192) and (12.193).


It is obvious to show a minimal polynomial of a diagonalizable matrix has no
multiple roots. The proof is left for readers.
Summarizing the above arguments, we have a following theorem:
Theorem 12.7 [3, 4] The following three statements related to A are equivalent:
1. The matrix A is similar to a diagonal matrix (i.e., semi-simple).
2. The minimal polynomial φA(x) does not have a multiple root.
3. The vector space Vn is decomposed into a direct sum of eigenspaces.
In Example 12.1 we showed the diagonalization of a matrix. There A has two
different eigenvalues. Since with a (n, n) matrix having n different eigenvalues its
characteristic polynomial does not have a multiple root, the minimal polynomial
necessarily has no multiple roots. The above theorem therefore ensures that a matrix
having no multiple roots must be diagonalizable.
544 12 Canonical Forms of Matrices

Another consequence of this theorem is that an idempotent matrix is diagonaliz-


able. The matrix is characterized by A2 = A. Then A(A - E) = 0. Taking its
determinant, (detA)[det(A - E)] = 0. Therefore, we have either detA = 0 or det
(A - E) = 0. Hence, eigenvalues of A are zero or 1. Think of f(x) = x(x - 1). As
f(A) = 0, f(x) should be a minimal polynomial. It has no multiple roots, and so the
matrix is diagonalizable.
Example 12.6 Let us revisit Example 12.1, where we dealt with

A= 2
0
1
1
: ð12:32Þ

From (12.33), fA(x) = (x - 2)(x - 1). Note that f A ðxÞ = f P - 1 AP ðxÞ. Let us treat a
problem according to Theorem 12.5. Also we use the notation of (12.85). Given
f1(x) = x - 1 and f2(x) = x - 2, let us decide M1(x) and M2(x) such that these can
satisfy

M 1 ðxÞf 1 ðxÞ þ M 2 ðxÞf 2 ðxÞ = 1: ð12:194Þ

We find M1(x) = 1 and M2(x) = - 1. Thus, using the notation of Theorem 12.5,
Sect. 12.4 we have

1 1
A1 = M 1 ðAÞf 1 ðAÞ = A - E = ,
0 0
0 -1
A2 = M 2 ðAÞf 2 ðAÞ = - A þ 2E = :
0 1

We also have

A1 þ A2 = E, Ai Aj = Aj Ai = Ai δij : ð12:195Þ

Thus, we find that A1 and A2 are idempotent matrices. As both A1 and A2 are
expressed by a polynomial A, they are commutative with A.
We find that A is represented by

A = α1 A1 þ α2 A2 , ð12:196Þ

where α1 and α2 denote eigenvalues 2 and 1, respectively. Thus choosing proper


eigenvectors for basis vectors, we have decomposed a vector space Vn into a direct
sum of invariant subspaces comprising the proper eigenvectors. Concomitantly, A is
represented as in (12.196). The relevant decomposition is always possible for a
diagonalizable matrix.
Thus, idempotent matrices play an important role in the linear vector spaces.
12.7 Diagonalizable Matrices 545

Example 12.7 Let us think of a following matrix.

1 0 1
A= 0 1 1 : ð12:197Þ
0 0 0

This is a triangle matrix, and so diagonal elements give eigenvalues. We have an


eigenvalue 1 of double root and that 0 of simple root. The matrix can be diagonalized
using P such that

1 0 1 1 0 1 1 0 -1
~ = P - 1 AP =
A 0 1 1 0 1 1 0 1 -1
0 0 1 0 0 0 0 0 1
1 0 0
= 0 1 0 : ð12:198Þ
0 0 0

~ = A.
As can be checked easily, A ~ We also have
2

0 0 0
~=
E-A 0 0 0 , ð12:199Þ
0 0 1

where E - A~ 2=E-A ~ holds as well. Moreover, A


~ E-A
~ = E-A
~ A
~ = 0. Thus
~ and E - A
A ~ behave like A1 and A2 of (12.195).

Next, suppose that x 2 Vn is expressed as a linear combination of basis vectors a1,


a2, ⋯, and an. Then

x = c1 a1 þ c 2 a2 þ ⋯ þ c n - 1 an - 1 þ c n an : ð12:200Þ

Here let us define a following linear transformation P(k) such that P(k) “extracts”
the k-th component of x. That is,

n n n ½k 
PðkÞ ðxÞ = PðkÞ ca
j=1 j j
= j=1
p ca
i = 1 ij j i
= c k ak , ð12:201Þ

½k 
where pij is the matrix representation of P(k). In fact, suppose that there is another
arbitrarily chosen vector x such that

y = d 1 a1 þ d 2 a2 þ ⋯ þ d n - 1 an - 1 þ d n an : ð12:202Þ

Then we have
546 12 Canonical Forms of Matrices

PðkÞ ðax þ byÞ = ðack þ bdk Þak = ack ak þ bd k ak = aPðkÞ ðxÞ þ bPðkÞ ðyÞ: ð12:203Þ

Thus P(k) is a linear transformation. In (12.201), for the third equality to hold, we
should have

½k  ðk Þ
pij = δi δjðkÞ , ð12:204Þ

ðjÞ
where δi has been defined in (12.175). Meanwhile, δjðkÞ denotes a row vector to
ðk Þ
which only the k-th column is 1, otherwise 0. Note that δi represents a (n, 1) matrix
ðk Þ
and that δðj kÞ denotes a (1, n) matrix. Therefore, δi δjðkÞ represents a (n, n) matrix
(k)
whose (k, k) element is 1, otherwise 0. Thus, P (x) is denoted by

0

x1
0
x2
PðkÞ ðxÞ = ða1 ⋯ an Þ 1 = x k ak , ð12:205Þ

0
xn

0

where only the (k, k) element is 1, otherwise 0. Then P(k)[P(k)(x)] = P(k)(x). That is

2
PðkÞ = PðkÞ : ð12:206Þ

Also P(k)[P(l )(x)] = 0 if k ≠ l. Meanwhile, we have P(1)(x) + ⋯ + P(n)(x) = x.


Hence, P(1)(x) + ⋯ + P(n)(x) = [P(1) + ⋯ + P(n)](x) = x. Since this relation holds
with any x 2 Vn, we get

Pð1Þ þ ⋯ þ PðnÞ = E: ð12:207Þ

As shown above, an idempotent matrix such as P(k) always exists. In particular, if


the basis vectors comprise only proper eigenvectors, the decomposition as expressed
in (12.196) is possible. In that case, it is described as

A = α1 A1 þ ⋯ þ αn An , ð12:208Þ

where α1, ⋯, and αn are eigenvalues (some of which may be identical) and A1, ⋯,
and An are idempotent matrices such as those represented by (12.205).
In fact, the above situation is equivalent to that A is semi-simple. This implies that
A can be decomposed into a following form described by
12.7 Diagonalizable Matrices 547

R - 1 AR = α1 Pð1Þ þ ⋯ þ αn PðnÞ , ð12:209Þ

where R is a non-singular operator. Rewriting (12.209), we have

A = α1 RPð1Þ R - 1 þ ⋯ þ αn RPðnÞ R - 1 : ð12:210Þ

Putting A~k  RPðkÞ R - 1 ð1 ≤ k ≤ nÞ, we obtain

A = α1 A~1 þ ⋯ þ αn A~n : ð12:211Þ

We can readily show that

A~k A~l = δkl : ð12:212Þ

That is, A~k ð1 ≤ k ≤ nÞ is an idempotent matrix.


Yet, we have to be careful to construct idempotent matrices according to a
formalism described in Theorem 12.5. It is because we often encounter a situation
where different matrices give an identical characteristic polynomial. We briefly
mention this in the next example.
Example 12.8 Let us think about following two matrices:

3 0 0 3 0 0
A= 0 2 0 , B= 0 2 1 : ð12:213Þ
0 0 2 0 0 2

Then, following Theorem 12.5, Sect. 12.4, we have

f A ðxÞ = f B ðxÞ = ðx - 3Þðx - 2Þ2 , ð12:214Þ

with eigenvalues α1 = 3 and α2 = 2. Also we have f1(x) = (x - 2)2 and f2(x) = x - 3.


Following the procedures of (12.85) and (12.86), we obtain

M 1 ð xÞ = x - 2 and M 2 ðxÞ = - x2 þ 3x - 3: ð12:215Þ

Therefore, we have

M 1 ðxÞf 1 ðxÞ = ðx - 2Þ3 , M 2 ðxÞf 2 ðxÞ = ðx - 3Þ - x2 þ 3x - 3 : ð12:216Þ

Hence, we get

A1  M 1 ðAÞf 1 ðAÞ = ðA - 2E Þ3 ,
ð12:217Þ
A2  M 2 ðAÞf 2 ðAÞ = ðA - 3E Þ - A2 þ 3A - 3E :

Similarly, we get B1 and B2 by replacing A with B in (12.217). Thus, we have


548 12 Canonical Forms of Matrices

1 0 0 0 0 0
A1 = B1 = 0 0 0 , A2 = B2 = 0 1 0 : ð12:218Þ
0 0 0 0 0 1

Notice that we get the same idempotent matrix of (12.218), even though the
matrix forms of A and B differ. Also we have

A1 þ A2 = B1 þ B2 = E: ð12:219Þ

Then, we have

A = ðA1 þ A2 ÞA = A1 A þ A2 A, B = ðB1 þ B2 ÞB = B1 B þ B2 B: ð12:220Þ

Nonetheless, although A = 3A1 + 2A2 holds, B ≠ 3B1 + 2B2. That is, the
decomposition of the form of (12.208) is not true of B. The decomposition of this
kind is possible only with diagonalizable matrices.
In summary, a (n, n) matrix with s (1 ≤ s ≤ n) different eigenvalues has at least
s proper eigenvectors. (Note that a diagonalizable matrix has n proper eigenvectors.)
In the case of s < n, the matrix has multiple root(s) and may have generalized
eigenvectors. If the matrix has a generalized eigenvector of rank ν, the matrix is
accompanied by (ν - 1) generalized eigenvectors along with a sole proper eigen-
vector. Those vectors form an invariant subspace along with the proper eigenvector
(s). In total, such n (generalized) eigenvectors span a whole vector space Vn.
With the eigenvalue equation A(x) = αx, we have an indefinite but non-trivial
solution x ≠ 0 for only restricted numbers α (i.e., eigenvalues) in a complex plane.
However, we have a unique but trivial solution x = 0 for complex numbers α other
than eigenvalues. This is characteristic of the eigenvalue problem.

References

1. Mirsky L (1990) An Introduction to linear algebra. Dover, New York


2. Hassani S (2006) Mathematical physics. Springer, New York
3. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
4. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
Chapter 13
Inner Product Space

Thus far we have treated the theory of linear vector spaces. The vector spaces,
however, were somewhat “structureless,” and so it will be desirable to introduce a
concept of metric or measure into the linear vector spaces. We call a linear vector
space where the inner product is defined an inner product space. In virtue of a
concept of the inner product, the linear vector space is given a variety of structures.
For instance, introduction of the inner product to the linear vector space immediately
leads to the definition of adjoint operators and Gram matrices.
Above all, the concept of inner product can readily be extended to a functional
space and facilitate understanding of, e.g., orthogonalization of functions, as was
exemplified in Parts I and II. Moreover, definition of the inner product allows us to
relate matrix operators and differential operators. In particular, it is a key issue to
understand logical structure of quantum mechanics. This can easily be understood
from the fact that Paul Dirac, who was known as one of the prominent founders of
quantum mechanics, invented bra and ket vectors to represent an inner product.

13.1 Inner Product and Metric

Inner product relates two vectors to a complex number. To do this, we introduce the
notation jai and hbj to represent the vectors. This notation is due to Dirac and widely
used in physics and mathematical physics. Usually jai and hbj are called a “ket”
vector and a “bra” vector, respectively, again due to Dirac. Alternatively, we may
call haj an adjoint vector of jai. Or we denote ha j  j ai{. The symbol “{” (dagger)
means that for a matrix its transposed matrix should be taken with complex conju-
{
gate matrix elements. That is, aij = aji . If we represent a full matrix, we have

© Springer Nature Singapore Pte Ltd. 2023 549


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_13
550 13 Inner Product Space

a11 ⋯ a1n a11 ⋯ an1


A= ⋮ ⋱ ⋮ , A{ = ⋮ ⋱ ⋮ : ð13:1Þ
an1 ⋯ ann a1n ⋯ ann

We call A{ an adjoint matrix or adjoint to A; see (1.106). The operation of


transposition and complex conjugate is commutable. A further remark will be
made below after showing the definition of the inner product. The symbols jai and
hbj represent vectors and, hence, we do not need to use bold characters to show that
those are vectors.
The definition of the inner product is as follows:

hbjai = hajbi , ð13:2Þ


ha j ðβjbi þ γ jciÞ = βhajbi þ γ hajci, ð13:3Þ
hajai ≥ 0: ð13:4Þ

In (13.4), equality holds if and only if jai = 0. Note here that two vectors are said
to be orthogonal to each other if their inner product vanishes, i.e., hb| ai = ha| bi = 0.
In particular, if a vector jai 2 Vn is orthogonal to all the vectors in Vn, i.e., hx| ai = 0
for 8jxi 2 Vn, then jai = 0. This is because if we choose jai for jxi, we have ha|
ai = 0. This means that jai = 0. We call a linear vector space to which the inner
product is defined an inner product space.
We can create another structure to a vector space. An example is a metric
(or distance function). Suppose that there is an arbitrary set Q. If a real
non-negative number ρ(a, b) is defined as follows with any arbitrary elements a, b,
c 2 Q, the set Q is called a metric space [1]:

ρða, bÞ = ρðb, aÞ, ð13:5Þ


8
ρða, bÞ ≥ 0 for a, b; ρða, bÞ = 0 if and only if a = b, ð13:6Þ

ρða, bÞ þ ρðb, cÞ ≥ ρða, cÞ: ð13:7Þ

In our study a vector space is chosen for the set Q. Here let us define a norm for
each vector a. The norm is defined as

kak = hajai: ð13:8Þ

If we define ρ(a, b)  ka - bk, ka - bk satisfies the definitions of metric.


Equations (13.5) and (13.6) are obvious. For (13.7) let us consider a vector | ci as
| ci = j ai - xhb| ai j bi with real x. Since hc| ci ≥ 0, we have
13.1 Inner Product and Metric 551

x2 hajbihbjaihbjbi - 2xhajbihbjai þ hajai ≥ 0

or

x2 jhajbij2 hbjbi - 2xjhajbij2 þ hajai ≥ 0: ð13:9Þ

The inequality (13.9) related to the quadratic equation in x with real coefficients
requires the inequality such that

hajaihbjbi ≥ hajbihbjai = jhajbij2 : ð13:10Þ

That is,

hajai ∙ hbjbi ≥ jhajbij: ð13:11Þ

Namely,

jjajj ∙ jjbjj ≥ jhajbij ≥ Rehajbi: ð13:12Þ

The relations (13.11) and (13.12) are known as Cauchy–Schwarz inequality.


Meanwhile, we have

jja þ bjj2 = ha þ bja þ bi = jjajj2 þ jjbjj2 þ 2Rehajbi, ð13:13Þ


ðjjajj þ jjbjjÞ2 = jjajj2 þ jjbjj2 þ 2jjajj ∙ jjbjj: ð13:14Þ

Comparing (13.13) and (13.14) and using (13.12), we have

jjajj þ jjbjj ≥ jja þ bjj: ð13:15Þ

The inequality (13.15) is known as the triangle inequality. In (13.15) replacing


a → a - b and b → b - c, we get

jja - bjj þ jjb - cjj ≥ jja - cjj: ð13:16Þ

Thus, (13.16) is equivalent to (13.7). At the same time, the norm defined in (13.8)
may be regarded as a “length” of a vector a.
As β| bi + γ| ci represents a vector, we use a shorthand notation for it as

j βb þ γci  βjbi þ γjci: ð13:17Þ

According to the definition (13.3)


552 13 Inner Product Space

ha j βb þ γci = βhajbi þ γ hajci: ð13:18Þ

Also from (13.2),

hβb þ γc j ai = ha j βb þ γci = ½βhajbi þ γ hajci = β hbjai þ γ  hcjai: ð13:19Þ

Comparing the leftmost side and rightmost side of (13.19) and considering | ai
can arbitrarily be chosen, we get

hβb þ γcj = β hbj þ γ  hcj: ð13:20Þ

Therefore, when we take out a scalar from a bra vector, it should be converted to a
complex conjugate. When the scalar is taken out from a ket vector, however, it is
unaffected by definition (13.3). To show this, we have jαai = α j ai. Taking its
adjoint, hαa| =αha j .
We can view (13.17) as a linear transformation in a vector space. In other words,
if we regard | ∙i as a linear transformation of a vector a 2 Vn to j ai 2 V~n , | ∙i is that of
Vn to V~n . Conversely, h∙j could not be regarded as a linear transformation of a 2 Vn to
0
ha j2 V~n . Sometimes the said transformation is referred to as “antilinear” or
“sesquilinear.” From the point of view of formalism, the inner product can be viewed
0 0
as an operation: V~n × V~n → ℂ. The vector space V~n is said to be a dual vector space
(or dual space) of V~n . The notion of the dual space possesses important position in
the theory of vector space. We will come back to the definition and properties of the
dual space in Part V (Chap. 24).
Let us consider jxi = j yi in an inner product space V~n . Then jxi - j yi = 0. That
is, jx - yi = 0. Therefore, we have x = y, or j0i = 0. This means that the linear
transformation | ∙i converts 0 2 Vn to j 0i 2 V~n . This is a characteristic of a linear
transformation represented in (11.44). Similarly we have h0 j = 0. However, we do
not have to get into further details about the notation in this book. Also if we have to
specify a vector space, we simply do so by designating it as Vn.

13.2 Gram Matrices

Once we have defined an inner product between any pair of vectors jai and jbi of a
vector space Vn, we can define and calculate various quantities related to inner
products. As an example, let ja1i, ⋯, j ani and jb1i, ⋯, j bni be two sets of vectors
in Vn. The vectors ja1i, ⋯, j ani may or may not be linearly independent. This is true
of jb1i, ⋯, j bni. Let us think of a following matrix M defined as below:
13.2 Gram Matrices 553

ha1 j ha1 jb1 i ⋯ ha1 jbn i


M= ⋮ ðjb1 i⋯jbn iÞ = ⋮ ⋱ ⋮ : ð13:21Þ
han j han jb1 i ⋯ han jbn i

We assume the following cases.


1. Suppose that in (13.21) | b1i, ⋯, | bni are linearly dependent. Then, without loss
of generality we can put

jb1 i = c2 jb2 i þ c3 jb3 i þ ⋯ þ cn jbn i: ð13:22Þ

Then we have

c2 ha1 jb2 iþc3 ha1 jb3 iþ⋯þcn ha1 jbn i ha1 jb2 i ⋯ ha1 jbn i
M= ⋮ ⋮ ⋱ ⋮ : ð13:23Þ
c2 han jb2 iþc3 han jb3 iþ⋯þcn han jbn i han jb2 i ⋯ han jbn i

Multiplying the second column,⋯, and the n-th column by (-c2),⋯, and (-cn),
respectively, and adding them to the first column, we get

0 ha1 jb2 i ⋯ ha1 jbn i


~=
M ⋮ ⋮ ⋱ : ð13:24Þ
0 han jb2 i ⋯ han jbn i

~ was obtained after giving the above operations to M. We


Note that in (13.24), M
~
have det M = 0. Since the above operations keep the determinant of a matrix
unchanged, again we get det M = 0.
2. Suppose in turn that in (13.21) | a1i, ⋯, | ani are linearly dependent. In that case,
again, without loss of generality we can put

ja1 i = d2 ja2 i þ d 3 ja3 i þ ⋯ þ dn jan i: ð13:25Þ

Focusing attention on individual rows and taking a similar procedure described


above, we have

0 ⋯ 0
~ = ha2 jb1 i ⋯ ha2 jbn i
M : ð13:26Þ
⋮ ⋱ ⋮
han jb1 i ⋯ han jbn i

Again, det M = 0.
Next, let us examine the case where det M = 0. In that case, n column vectors of
M in (13.21) are linearly dependent. Without loss of generality, we suppose that the
first column is expressed as a linear combination of the other (n - 1) columns such
that
554 13 Inner Product Space

ha1 jb1 i ha1 jb2 i ha1 jbn i


⋮ = c2 ⋮ þ ⋯ þ cn ⋮ : ð13:27Þ
han jb1 i han jb2 i han jbn i

Rewriting this, we have

ha1 jb1 - c2 b2 - ⋯ - cn bn i 0
⋮ = ⋮ : ð13:28Þ
han jb1 - c2 b2 - ⋯ - cn bn i 0

Multiplying the first row,⋯, and the n-th row of (13.28) by an appropriate
complex number p1 ,⋯, and pn , respectively, we have

hp1 a1 jb1 - c2 b2 - ⋯ - cn bn i ¼ 0,
⋯,
hpn an jb1 - c2 b2 - ⋯ - cn bn i ¼ 0:

Adding all the above, we get

hp1 a1 þ ⋯ þ pn an jb1 - c2 b2 - ⋯ - cn bn i = 0:

Now, suppose that ja1i, ⋯, j ani are the basis vectors. Then, jp1a1 + ⋯ + pnani
represents any vectors in a vector space. This implies that jb1 - c2b2 - ⋯ -
cnbni = 0; for this see remarks after (13.4). That is,

j b1 i = c2 j b2 i þ ⋯ þ cn j bn i: ð13:29Þ

Thus, jb1i, ⋯, j bni are linearly dependent.


Meanwhile, det M = 0 implies that n row vectors of M in (13.21) are linearly
dependent. In that case, performing similar calculations to the above, we can readily
show that if jb1i, ⋯, j bni are the basis vectors, ja1i, ⋯, j ani are linearly dependent.
We summarize the above discussion by a following statement: suppose that we
have two sets of vectors ja1i, ⋯, j ani and jb1i, ⋯, j bni.

At least a set of vectors are linearly dependent: ⟺ detM = 0: ð13:30Þ


Both the sets of vectors are linearly independent: ⟺ detM ≠ 0: ð13:31Þ

The latter statement is obtained by considering contraposition of the former


statement.
We restate the above in a following theorem:
Theorem 13.1 Let ja1i, ⋯, j ani and jb1i, ⋯, j bni be two sets of vectors defined in
a vector space Vn. A necessary and sufficient condition for both these sets of vectors
to be linearly independent is that with a matrix M defined as
13.2 Gram Matrices 555

ha1 j ha1 jb1 i ⋯ ha1 jbn i


M= ⋮ ðjb1 i⋯jbn iÞ = ⋮ ⋱ ⋮ ,
han j han jb1 i ⋯ han jbn i

we must have detM ≠ 0. ■


Next we consider a norm of a vector expressed in reference to a set of basis
vectors je1i, ⋯, j eni of Vn. Let us express a vector jxi in an inner product space as
follows as in the case of (11.10) and (11.13):

j xi = x1 j e1 i þ x2 j e2 i þ ⋯ þ xn j en i
= j x1 e1 þ x 2 e2 þ ⋯ þ xn en i
x1
= ðj e1 i⋯jen iÞ ⋮ : ð13:32Þ
xn

A bra vector hxj is then denoted by

he1 j
hx j = x1 ⋯ xn ⋮ : ð13:33Þ
hen j

Thus, we have an inner product described as

he1 j x1
hxjxi = x1 ⋯ xn ⋮ ðje1 i ⋯ jen iÞ ⋮
hen j xn

he1 je1 i ⋯ he1 jen i x1


= x1 ⋯ xn ⋮ ⋱ ⋮ ⋮ : ð13:34Þ
hen je1 i ⋯ hen jen i xn

Here the matrix G expressed as follows is called a Gram matrix [2–4]:

he1 je1 i ⋯ he1 jen i


G= ⋮ ⋱ ⋮ : ð13:35Þ
hen je1 i ⋯ hen jen i

As hej| eii = hei| eji, we have G = G{. From Theorem 13.1, detG ≠ 0. With a
shorthand notation, we write (G)ij = (hei| eji). As already mentioned in Sect. 1.4, if
for a matrix H we have a relation described by

H = H{, ð1:119Þ

it is said to be an Hermitian matrix or a self-adjoint matrix. We often say that such a


matrix is Hermitian.
556 13 Inner Product Space

Since the Gram matrices frequently appear in matrix algebra and play a role, their
properties are worth examining. Since G is an Hermitian matrix, it can be diagonal-
ized through similarity transformation using a unitary matrix. We will give its proof
later (see Sect. 14.3).
Let us deal with (13.34) further. We have

he1 je1 i ⋯ he1 jen i x1


hxjxi = x1 ⋯ xn UU { ⋮ ⋱ ⋮ UU { ⋮ , ð13:36Þ
hen je1 i ⋯ hen jen i xn

where U is defined as UU{ = U{U = E (or U-1 = U{). Such a matrix U is called a
unitary matrix. We represent a matrix form of U as

u11 ⋯ u1n u11 ⋯ un1


U= ⋮ ⋱ ⋮ , U{ = ⋮ ⋱ ⋮ : ð13:37Þ
un1 ⋯ unn u1n ⋯ unn

Here, putting

x1 ξ1
U{ ⋮ = ⋮
xn ξn

or equivalently
n
x u
k = 1 k ki
 ξi ð13:38Þ

and taking its adjoint such that

x1 ⋯ xn U = ξ1 ⋯ ξn

or equivalently
n
x u
k = 1 k ki
= ξi ,

we have

he1 je1 i ⋯ he1 jen i ξ1


hxjxi = ξ1 ⋯ ξn U { ⋮ ⋱ ⋮ U ⋮ : ð13:39Þ
hen je1 i ⋯ hen jen i ξn

We assume that the Gram matrix is diagonalized by a similarity transformation by


U. After being diagonalized, similarly to (12.192) and (12.193) the Gram matrix has
a following form G′:
13.2 Gram Matrices 557

λ1 ⋯ 0
U - 1 GU = U { GU = G0 = ⋮ ⋱ ⋮ : ð13:40Þ
0 ⋯ λn

Thus, we get

λ1 ⋯ 0 ξ1
hxjxi = ξ1 ⋯ ξn ⋮ ⋱ ⋮ ⋮ = λ1 jξ1 j2 þ ⋯ þ λn jξn j2 : ð13:41Þ
0 ⋯ λn ξn

From the relation (13.4), hx| xi ≥ 0. This implies that in (13.41) we have

λi ≥ 0 ð1 ≤ i ≤ nÞ: ð13:42Þ

To show this, suppose that for ∃λi, λi < 0. Suppose also that for ∃ξi, ξi ≠ 0. Then,
0

with ξi we have hx| xi = λi|ξi|2 < 0, in contradiction.

0
Since we have detG ≠ 0, taking a determinant of (13.40) we have

n
det G0 = det U { GU = det U { UG = detEdetG = detG = λi ≠ 0: ð13:43Þ
i=1

Combining (13.42) and (13.43), all the eigenvalues λi are positive; i.e.,

λi > 0 ð1 ≤ i ≤ nÞ: ð13:44Þ

The norm hx| xi = 0, if and only if ξ1 = ξ2 = ⋯ = ξn = 0 which corresponds to


x1 = x2 = ⋯ = xn = 0 from (13.38).
For further study, we generalize the aforementioned feature a little further. That
is, if je1i, ⋯, and j eni are linearly dependent, from Theorem 13.1 we get det G = 0.
This implies that there is at least one eigenvalue λi = 0 (1 ≤ i ≤ n) and that for a
0

vector ξi with ∃ξi ≠ 0 we have hx| xi = 0. (Suppose that in V2 we have linearly

0
dependent vectors je1i and je2i = j - e1i; see Example 13.2 below.)
Let H be an Hermitian matrix [i.e., (H*)ᵢⱼ = (H)ⱼᵢ]. If we have φ as a function of complex variables x₁, ⋯, xₙ such that

\[ \phi(x_1, \cdots, x_n) = \sum_{i,j=1}^{n} x_i^{*}\,(H)_{ij}\,x_j, \tag{13.45} \]

where (H)ᵢⱼ is a matrix element of an Hermitian matrix, then φ(x₁, ⋯, xₙ) is said to be an Hermitian quadratic form. Suppose that φ(x₁, ⋯, xₙ) satisfies φ(x₁, ⋯, xₙ) = 0 if and only if x₁ = x₂ = ⋯ = xₙ = 0, and otherwise φ(x₁, ⋯, xₙ) > 0 for any other set of xᵢ (1 ≤ i ≤ n). Then, the said Hermitian quadratic form is called positive definite and we write

\[ H > 0. \tag{13.46} \]

If φ(x₁, ⋯, xₙ) ≥ 0 for any xᵢ (1 ≤ i ≤ n) and φ(x₁, ⋯, xₙ) = 0 for at least one set of (x₁, ⋯, xₙ) in which some xᵢ ≠ 0, φ(x₁, ⋯, xₙ) is said to be positive semi-definite or non-negative. In that case, we write

\[ H \ge 0. \tag{13.47} \]

From the above argument, a Gram matrix comprising linearly independent vectors is positive definite, whereas that comprising linearly dependent vectors is non-negative. On the basis of the above argument including (13.36)–(13.44), we have

\[ H > 0 \iff \lambda_i > 0 \ (1 \le i \le n), \ \det H > 0; \]
\[ H \ge 0 \iff \lambda_i \ge 0 \ \text{with at least one} \ \lambda_i = 0 \ (1 \le i \le n), \ \det H = 0. \tag{13.48} \]

Notice here that the eigenvalues λᵢ remain unchanged after (unitary) similarity transformation. Namely, the eigenvalues are inherent to H.
We have already encountered several examples of positive definite and non-negative operators. A typical example of the former case is the Hamiltonian of the quantum-mechanical harmonic oscillator (see Chap. 2). In this case, the energy eigenvalues are all positive (i.e., positive definite). The orbital angular momenta L² of hydrogen-like atoms, however, are non-negative operators and, hence, an eigenvalue of zero is permitted.
Alternatively, the Gram matrix is defined as B†B, where B is any (n, n) matrix. If we take an orthonormal basis |η₁⟩, |η₂⟩, ⋯, |ηₙ⟩, each |eᵢ⟩ can be expressed as

\[ |e_i\rangle = \sum_{j=1}^{n} b_{ji}|\eta_j\rangle, \]

so that

\[ \langle e_k|e_i\rangle = \sum_{j=1}^{n}\sum_{l=1}^{n} b_{lk}^{*} b_{ji}\langle \eta_l|\eta_j\rangle = \sum_{j=1}^{n}\sum_{l=1}^{n} b_{lk}^{*} b_{ji}\delta_{lj} = \sum_{j=1}^{n} b_{jk}^{*} b_{ji} = \sum_{j=1}^{n}\left(B^{\dagger}\right)_{kj}(B)_{ji} = \left(B^{\dagger}B\right)_{ki}. \tag{13.49} \]

For the second equality of (13.49), we used the orthonormal condition ⟨ηᵢ|ηⱼ⟩ = δᵢⱼ. Thus, the Gram matrix G defined in (13.35) can be regarded as identical to B†B.
In Sect. 11.4 we have dealt with a linear transformation of a set of basis vectors e₁, e₂, ⋯, and eₙ by a matrix A defined in (11.69) and examined whether the transformed vectors e′₁, e′₂, ⋯, and e′ₙ are linearly independent. As a result, a necessary and sufficient condition for e′₁, e′₂, ⋯, and e′ₙ to be linearly independent (i.e., to be a set of basis vectors) is detA ≠ 0. Thus, we notice that B plays the same role as A of (11.69) and, hence, det B ≠ 0 if and only if the set of vectors |e₁⟩, |e₂⟩, ⋯, and |eₙ⟩ defined in (13.32) is linearly independent. By the same token as the above, we conclude that the eigenvalues of B†B are all positive if and only if det B ≠ 0 (i.e., B is non-singular). Alternatively, if B is singular, detB†B = det B† det B = 0. In that case at least one of the eigenvalues of B†B must be zero. The Gram matrices appearing in (13.35) are frequently dealt with in the field of mathematical physics in conjunction with quadratic forms. Further topics can be seen in the next chapter.
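As a numerical aside, the relation G = B†B and the eigenvalue criterion above are easily illustrated with a few lines of Python and NumPy (a sketch; the matrices B below are arbitrary illustrative choices, not taken from the text):

```python
import numpy as np

# Non-singular B: its columns are linearly independent,
# so G = B^dagger B is positive definite (all eigenvalues > 0).
B = np.array([[1.0, 1.0], [0.0, 1.0j]])
G = B.conj().T @ B
print(np.linalg.eigvalsh(G))       # all eigenvalues positive

# Singular B: the second column is a multiple of the first,
# so at least one eigenvalue of G must be zero.
B_sing = np.array([[1.0, -1.0], [2.0, -2.0]])
G_sing = B_sing.conj().T @ B_sing
print(np.linalg.eigvalsh(G_sing))  # one eigenvalue is (numerically) zero
```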
Example 13.1 Let us take two vectors |ε₁⟩ and |ε₂⟩ that are expressed as

\[ |\varepsilon_1\rangle = |e_1\rangle + |e_2\rangle, \quad |\varepsilon_2\rangle = |e_1\rangle + i|e_2\rangle. \tag{13.50} \]

Here we have ⟨eᵢ|eⱼ⟩ = δᵢⱼ (1 ≤ i, j ≤ 2). Then we have a Gram matrix expressed as

\[ G = \begin{pmatrix} \langle \varepsilon_1|\varepsilon_1\rangle & \langle \varepsilon_1|\varepsilon_2\rangle \\ \langle \varepsilon_2|\varepsilon_1\rangle & \langle \varepsilon_2|\varepsilon_2\rangle \end{pmatrix} = \begin{pmatrix} 2 & 1+i \\ 1-i & 2 \end{pmatrix}. \tag{13.51} \]

Principal minors of G are |2| = 2 > 0 and

\[ \begin{vmatrix} 2 & 1+i \\ 1-i & 2 \end{vmatrix} = 4 - (1+i)(1-i) = 2 > 0. \]

Therefore, according to Theorem 14.12, G > 0 (vide infra).
Let us diagonalize the matrix G. To this end, we find roots of the characteristic equation. That is,

\[ \det(G - \lambda E) = \begin{vmatrix} 2-\lambda & 1+i \\ 1-i & 2-\lambda \end{vmatrix} = 0, \quad \lambda^2 - 4\lambda + 2 = 0. \tag{13.52} \]

We have λ = 2 ± √2. Then as a diagonalizing unitary matrix U we get

\[ U = \begin{pmatrix} \dfrac{1+i}{2} & \dfrac{1+i}{2} \\ \dfrac{\sqrt{2}}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix}. \tag{13.53} \]

Thus, we get

\[ U^{\dagger}GU = \begin{pmatrix} \dfrac{1-i}{2} & \dfrac{\sqrt{2}}{2} \\ \dfrac{1-i}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix}\begin{pmatrix} 2 & 1+i \\ 1-i & 2 \end{pmatrix}\begin{pmatrix} \dfrac{1+i}{2} & \dfrac{1+i}{2} \\ \dfrac{\sqrt{2}}{2} & -\dfrac{\sqrt{2}}{2} \end{pmatrix} = \begin{pmatrix} 2+\sqrt{2} & 0 \\ 0 & 2-\sqrt{2} \end{pmatrix}. \tag{13.54} \]

The eigenvalues 2 + √2 and 2 − √2 are real and positive, as expected. That is, the Gram matrix is positive definite. □
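The diagonalization (13.54) can be confirmed numerically; a minimal sketch in Python with NumPy, with G and U copied from (13.51) and (13.53):

```python
import numpy as np

G = np.array([[2.0, 1.0 + 1.0j],
              [1.0 - 1.0j, 2.0]])
U = np.array([[(1 + 1j) / 2, (1 + 1j) / 2],
              [np.sqrt(2) / 2, -np.sqrt(2) / 2]])

# U must be unitary: U^dagger U = E.
print(np.allclose(U.conj().T @ U, np.eye(2)))   # True
# U^dagger G U reproduces diag(2 + sqrt(2), 2 - sqrt(2)) of (13.54).
print(np.round(U.conj().T @ G @ U, 10))
```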

Example 13.2 Let |e₁⟩ and |e₂⟩ (= −|e₁⟩) be two vectors. Then, we have a Gram matrix expressed as

\[ G = \begin{pmatrix} \langle e_1|e_1\rangle & \langle e_1|e_2\rangle \\ \langle e_2|e_1\rangle & \langle e_2|e_2\rangle \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}. \tag{13.55} \]

Similarly to the case of Example 13.1, we have as an eigenvalue equation

\[ \det(G - \lambda E) = \begin{vmatrix} 1-\lambda & -1 \\ -1 & 1-\lambda \end{vmatrix} = 0, \quad \lambda^2 - 2\lambda = 0. \tag{13.56} \]

We have λ = 2 or 0. As a diagonalizing unitary matrix U we get

\[ U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}. \tag{13.57} \]

Thus, we get

\[ U^{\dagger}GU = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & -\dfrac{1}{\sqrt{2}} \\ \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}. \tag{13.58} \]

As expected, we have a diagonal matrix, one of the eigenvalues for which is zero.
That is, the Gram matrix is non-negative.
In the present case, let us think of the following Hermitian quadratic form:

\[ \phi(x_1, x_2) = \sum_{i,j=1}^{2} x_i^{*}\,(G)_{ij}\,x_j = (x_1^{*}\ x_2^{*})\,UU^{\dagger}GUU^{\dagger}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (\xi_1^{*}\ \xi_2^{*})\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} = 2|\xi_1|^2. \]

Then, if we take \(\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} = \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix}\) with ξ₂ ≠ 0 as a column vector, we get φ(x₁, x₂) = 0. With this type of column vector, we have

\[ U^{\dagger}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ \xi_2 \end{pmatrix}, \quad \text{i.e.,} \quad \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = U\begin{pmatrix} 0 \\ \xi_2 \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} 0 \\ \xi_2 \end{pmatrix} = \frac{\xi_2}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \]

where ξ₂ is an arbitrary complex number. Thus, even for \(\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \ne \begin{pmatrix} 0 \\ 0 \end{pmatrix}\) we may have φ(x₁, x₂) = 0. To be more specific, if we take, e.g., \(\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}\) or \(\begin{pmatrix} 1 \\ 2 \end{pmatrix}\), we get

\[ \phi(1, 1) = (1\ 1)\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1\ 1)\begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0, \]

\[ \phi(1, 2) = (1\ 2)\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = (1\ 2)\begin{pmatrix} -1 \\ 1 \end{pmatrix} = 1 > 0. \]
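The two values φ(1, 1) = 0 and φ(1, 2) = 1 are easily reproduced numerically; a sketch in the same style as before:

```python
import numpy as np

G = np.array([[1.0, -1.0], [-1.0, 1.0]])

def phi(x):
    """Hermitian quadratic form phi(x) = x^dagger G x, cf. (13.45)."""
    x = np.asarray(x, dtype=complex)
    return (x.conj() @ G @ x).real

print(phi([1, 1]))   # 0.0: a non-zero vector annihilated by the form
print(phi([1, 2]))   # 1.0: positive, as computed in the text
```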

13.3 Adjoint Operators

A linear transformation A is similarly defined as before, and A transforms |x⟩ of (13.32) such that

\[ A(|x\rangle) = (|e_1\rangle \cdots |e_n\rangle)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \tag{13.59} \]

where (aᵢⱼ) is a matrix representation of A. Defining A(|x⟩) = |A(x)⟩ and ⟨(x)A†| = (⟨x|)A† = [A(|x⟩)]† to be consistent with (1.117), we have

\[ \langle (x)A^{\dagger}| = (x_1^{*} \cdots x_n^{*})\begin{pmatrix} a_{11}^{*} & \cdots & a_{n1}^{*} \\ \vdots & \ddots & \vdots \\ a_{1n}^{*} & \cdots & a_{nn}^{*} \end{pmatrix}\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}. \tag{13.60} \]

Therefore, putting \(|y\rangle = \sum_{i=1}^{n} y_i|e_i\rangle\), we have

\[ \langle (x)A^{\dagger}|y\rangle = (x_1^{*} \cdots x_n^{*})\begin{pmatrix} a_{11}^{*} & \cdots & a_{n1}^{*} \\ \vdots & \ddots & \vdots \\ a_{1n}^{*} & \cdots & a_{nn}^{*} \end{pmatrix}\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}(|e_1\rangle \cdots |e_n\rangle)\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = (x_1^{*} \cdots x_n^{*})\,A^{\dagger}G\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \tag{13.61} \]

Meanwhile, we get

\[ \langle y|A(x)\rangle = (y_1^{*} \cdots y_n^{*})\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}(|e_1\rangle \cdots |e_n\rangle)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (y_1^{*} \cdots y_n^{*})\,GA\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.62} \]

Hence, taking the complex conjugate of (13.62), we have

\[ \overline{\langle y|A(x)\rangle} = (y_1 \cdots y_n)\,G^{*}A^{*}\begin{pmatrix} x_1^{*} \\ \vdots \\ x_n^{*} \end{pmatrix} = (y_1 \cdots y_n)\,G^{T}\left(A^{\dagger}\right)^{T}\begin{pmatrix} x_1^{*} \\ \vdots \\ x_n^{*} \end{pmatrix}. \tag{13.63} \]

With the second equality, we used G* = (G†)^T = G^T (note that G is Hermitian) and A* = (A†)^T. A complex conjugate matrix A* is defined as

\[ A^{*} \equiv \begin{pmatrix} a_{11}^{*} & \cdots & a_{1n}^{*} \\ \vdots & \ddots & \vdots \\ a_{n1}^{*} & \cdots & a_{nn}^{*} \end{pmatrix}. \]

Comparing (13.61) and (13.63), we find that one is the transposed matrix of the other. Also note that (AB)^T = B^T A^T, (ABC)^T = C^T B^T A^T, etc. Since an inner product can be viewed as a (1, 1) matrix, two mutually transposed (1, 1) matrices are identical. Hence, we get

\[ \langle (x)A^{\dagger}|y\rangle = \overline{\langle y|A(x)\rangle} = \langle A(x)|y\rangle, \tag{13.64} \]

where the second equality is due to (13.2).


The other way around, we may use (13.64) as the definition of the adjoint operator of a linear transformation A. In fact, on the basis of (13.64) we have

\[ \sum_{i,j,k} x_i^{*}\left(A^{\dagger}\right)_{ik}(G)_{kj}\,y_j = \sum_{i,j,k} y_j\left(G^{*}\right)_{jk}\left(A^{*}\right)_{ki}x_i^{*}, \]

i.e.,

\[ \sum_{i,j,k} x_i^{*}y_j\left[\left(A^{\dagger}\right)_{ik}(G)_{kj} - \left(G^{*}\right)_{jk}\left(A^{*}\right)_{ki}\right] = \sum_{i,j,k} x_i^{*}y_j\left[\left(A^{\dagger}\right)_{ik} - \left(A^{*}\right)_{ki}\right](G)_{kj} = 0. \tag{13.65} \]

With the third equality of (13.65), we used (G*)ⱼₖ = (G)ₖⱼ, i.e., G† = G (Hermitian matrix). Thanks to the freedom in the choice of basis vectors as well as of xᵢ and yᵢ, we must have

\[ \left(A^{\dagger}\right)_{ik} = \left(A^{*}\right)_{ki}. \tag{13.66} \]

Adopting the matrix representation of (13.59) for A, we get [1]

\[ \left(A^{\dagger}\right)_{ik} = a_{ki}^{*}. \tag{13.67} \]

Thus, we confirm that the adjoint operator A† is represented by the complex conjugate transposed matrix of A, in accordance with (13.1).
Taking the complex conjugate of (13.64), we have

\[ \overline{\langle (x)A^{\dagger}|y\rangle} = \langle y|\left(A^{\dagger}\right)^{\dagger}(x)\rangle = \langle y|A(x)\rangle. \]

Comparing both sides of the second equality, we get

\[ \left(A^{\dagger}\right)^{\dagger} = A. \tag{13.68} \]

In Chap. 11, we have seen how a matrix representation of a linear transformation A is changed from A₀ to A′ by the basis vectors transformation. We have

\[ A' = P^{-1}A_0P. \tag{11.88} \]

In a similar manner, taking the adjoint of (11.88) we get

\[ (A')^{\dagger} = P^{\dagger}A^{\dagger}\left(P^{-1}\right)^{\dagger} = P^{\dagger}A^{\dagger}\left(P^{\dagger}\right)^{-1}. \tag{13.69} \]

In (13.69), we denote the adjoint operator before the basis vectors transformation simply by A† to avoid complicated notation. We also have

\[ \left(A^{\dagger}\right)' = (A')^{\dagger}. \tag{13.70} \]

Meanwhile, suppose that (A†)′ = Q⁻¹A†Q. Then, from (13.69) and (13.70) we have

\[ P^{\dagger} = Q^{-1}. \]

Next let us perform a calculation as below:

\[ \langle (\alpha u + \beta v)A^{\dagger}|y\rangle = \overline{\langle y|A(\alpha u + \beta v)\rangle} = \alpha^{*}\overline{\langle y|A(u)\rangle} + \beta^{*}\overline{\langle y|A(v)\rangle} = \alpha^{*}\langle (u)A^{\dagger}|y\rangle + \beta^{*}\langle (v)A^{\dagger}|y\rangle = \langle \alpha(u)A^{\dagger} + \beta(v)A^{\dagger}|y\rangle. \]

As y is an element arbitrarily chosen from the relevant vector space, we have

\[ \langle (\alpha u + \beta v)A^{\dagger}| = \langle \alpha(u)A^{\dagger} + \beta(v)A^{\dagger}|, \tag{13.71} \]

or

\[ (\alpha u + \beta v)A^{\dagger} = \alpha(u)A^{\dagger} + \beta(v)A^{\dagger}. \tag{13.72} \]

Equation (13.71) states the equality of two vectors in an inner product space on both sides, whereas (13.72) states that in a vector space where the inner product is not defined. In either case, both (13.71) and (13.72) show that A† is indeed a linear transformation. In fact, the matrix representation of (13.66) and (13.67) is independent of the concept of the inner product.
Suppose that there are two (or more) adjoint operators B and C that correspond to A. Then, from (13.64) we have

\[ \langle (x)B|y\rangle = \langle (x)C|y\rangle = \overline{\langle y|A(x)\rangle}. \tag{13.73} \]

Also we have

\[ \langle (x)B - (x)C|y\rangle = \langle (x)(B - C)|y\rangle = 0. \tag{13.74} \]

As x and y are arbitrarily chosen elements, we get B = C, indicating the uniqueness of the adjoint operator.
It is of importance to examine how the norm of a vector is changed by the linear transformation. To this end, let us perform a calculation as below:

\[ \langle (x)A^{\dagger}|A(x)\rangle = (x_1^{*} \cdots x_n^{*})\begin{pmatrix} a_{11}^{*} & \cdots & a_{n1}^{*} \\ \vdots & \ddots & \vdots \\ a_{1n}^{*} & \cdots & a_{nn}^{*} \end{pmatrix}\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}(|e_1\rangle \cdots |e_n\rangle)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (x_1^{*} \cdots x_n^{*})\,A^{\dagger}GA\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.75} \]

Equation (13.75) gives the norm of a vector after its transformation.
We may have a case where the norm is conserved before and after the transformation. Actually, comparing (13.34) and (13.75), we notice that if A†GA = G, then ⟨x|x⟩ = ⟨(x)A†|A(x)⟩. Let us consider the following example of this.
Example 13.3 Let us take two mutually orthogonal vectors |e₁⟩ and |e₂⟩ with ‖e₂‖ = 2‖e₁‖ as basis vectors in the xy-plane (Fig. 13.1). We assume that |e₁⟩ is a unit vector. Hence, the set of vectors |e₁⟩ and |e₂⟩ does not constitute an orthonormal basis. Let |x⟩ be an arbitrary position vector expressed as

\[ |x\rangle = (|e_1\rangle\ |e_2\rangle)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{13.76} \]

Fig. 13.1 Basis vectors |e₁⟩ and |e₂⟩ in the xy-plane and their linear transformation by R

Then we have

\[ \langle x|x\rangle = (x_1\ x_2)\begin{pmatrix} \langle e_1| \\ \langle e_2| \end{pmatrix}(|e_1\rangle\ |e_2\rangle)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (x_1\ x_2)\begin{pmatrix} \langle e_1|e_1\rangle & \langle e_1|e_2\rangle \\ \langle e_2|e_1\rangle & \langle e_2|e_2\rangle \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (x_1\ x_2)\begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1^2 + 4x_2^2. \tag{13.77} \]

Next let us think of the following linear transformation R whose matrix representation is given by

\[ R = \begin{pmatrix} \cos\theta & -2\sin\theta \\ (\sin\theta)/2 & \cos\theta \end{pmatrix}. \tag{13.78} \]

The transformation matrix R is geometrically represented in Fig. 13.1. Following (11.36) we have

\[ R(|x\rangle) = (|e_1\rangle\ |e_2\rangle)\begin{pmatrix} \cos\theta & -2\sin\theta \\ (\sin\theta)/2 & \cos\theta \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{13.79} \]

As a result of the transformation R, the basis vectors |e₁⟩ and |e₂⟩ are transformed into |e′₁⟩ and |e′₂⟩, respectively, as in Fig. 13.1 such that

\[ (|e_1'\rangle\ |e_2'\rangle) = (|e_1\rangle\ |e_2\rangle)\begin{pmatrix} \cos\theta & -2\sin\theta \\ (\sin\theta)/2 & \cos\theta \end{pmatrix}. \]

Taking an inner product of (13.79), we have

\[ \langle (x)R^{\dagger}|R(x)\rangle = (x_1\ x_2)\begin{pmatrix} \cos\theta & (\sin\theta)/2 \\ -2\sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}\begin{pmatrix} \cos\theta & -2\sin\theta \\ (\sin\theta)/2 & \cos\theta \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = (x_1\ x_2)\begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1^2 + 4x_2^2. \tag{13.80} \]

Putting

\[ G = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}, \]

we have

\[ R^{\dagger}GR = G. \tag{13.81} \]

Comparing (13.77) and (13.80), we notice that the norm of |x⟩ remains unchanged after the transformation R. This means that R is virtually a unitary transformation. The somewhat unfamiliar matrix form of R resulted from the choice of basis vectors other than an orthonormal basis. □
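Relation (13.81) can be checked numerically for a few angles θ; a sketch, with R and G copied from (13.78) and the text above:

```python
import numpy as np

G = np.array([[1.0, 0.0], [0.0, 4.0]])

def R(theta):
    return np.array([[np.cos(theta), -2 * np.sin(theta)],
                     [np.sin(theta) / 2, np.cos(theta)]])

for theta in (0.3, 1.0, 2.5):
    Rm = R(theta)
    # R^dagger G R = G: the norm <x|x> = x1^2 + 4 x2^2 is conserved.
    print(np.allclose(Rm.T @ G @ Rm, G))   # True for every theta
```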

13.4 Orthonormal Basis

Now we introduce an orthonormal basis, the simplest and most important basis set in an inner product space. If we choose the orthonormal basis so that ⟨eᵢ|eⱼ⟩ = δᵢⱼ, the Gram matrix is G = E. Thus, R†GR = G reads as R†R = E. In that case a linear transformation is represented by a unitary matrix, and it conserves the norm of a vector and the inner product of two arbitrary vectors.
So far we assumed that an adjoint operator A† operates only on a row vector from the right, as is evident from (13.61). At the same time, A operates only on a column vector from the left as in (13.62). To render the notation of (13.61) and (13.62) consistent with the associative law, we would have to examine the commutability of A† with G. In this context, the choice of the orthonormal basis enables us to get through such a troublesome situation and largely eases matrix calculation. Thus,

\[ \langle (x)A^{\dagger}|A(x)\rangle = (x_1^{*} \cdots x_n^{*})\begin{pmatrix} a_{11}^{*} & \cdots & a_{n1}^{*} \\ \vdots & \ddots & \vdots \\ a_{1n}^{*} & \cdots & a_{nn}^{*} \end{pmatrix}\begin{pmatrix} \langle e_1| \\ \vdots \\ \langle e_n| \end{pmatrix}(|e_1\rangle \cdots |e_n\rangle)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (x_1^{*} \cdots x_n^{*})\,A^{\dagger}EA\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (x_1^{*} \cdots x_n^{*})\,A^{\dagger}A\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.82} \]

At the same time, we adopt a simple notation as below instead of (13.75):

\[ \langle xA^{\dagger}|Ax\rangle = (x_1^{*} \cdots x_n^{*})\,A^{\dagger}A\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.83} \]

This notation is now consistent with the associative law. Note that A† and A operate on either a column or a row vector. We can also do without the symbol “|” in (13.83) and express it as

\[ \langle xA^{\dagger}Ax\rangle = \langle xA^{\dagger}|Ax\rangle. \tag{13.84} \]

Thus, we can freely operate A† and A from both the left and the right. By the same token, we rewrite (13.62) as

\[ \langle y|Ax\rangle = \langle yA|x\rangle = (y_1^{*} \cdots y_n^{*})\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (y_1^{*} \cdots y_n^{*})\,A\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{13.85} \]

Here, notice that a vector |x⟩ is represented by a column vector with respect to the orthonormal basis. Using (13.64), we have

\[ \overline{\langle y|Ax\rangle} = \langle xA^{\dagger}|y\rangle = (x_1^{*} \cdots x_n^{*})\,A^{\dagger}\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = (x_1^{*} \cdots x_n^{*})\begin{pmatrix} a_{11}^{*} & \cdots & a_{n1}^{*} \\ \vdots & \ddots & \vdots \\ a_{1n}^{*} & \cdots & a_{nn}^{*} \end{pmatrix}\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \tag{13.86} \]

If in (13.83) we put |y⟩ = |Ax⟩, then ⟨xA†|Ax⟩ = ⟨y|y⟩ ≥ 0. Thus, we define a norm of |Ax⟩ as

\[ \|Ax\| = \sqrt{\langle xA^{\dagger}|Ax\rangle}. \tag{13.87} \]

Now we are in a position to construct an orthonormal basis in Vⁿ using n linearly independent vectors |i⟩ (1 ≤ i ≤ n). The following theorem is well-known as the Gram–Schmidt orthonormalization.

Theorem 13.2 (Gram–Schmidt Orthonormalization Theorem) [5] Suppose that there is a set of linearly independent vectors |i⟩ (1 ≤ i ≤ n) in Vⁿ. Then one can construct an orthonormal basis |eᵢ⟩ (1 ≤ i ≤ n) so that ⟨eᵢ|eⱼ⟩ = δᵢⱼ (1 ≤ i, j ≤ n) and each vector |eᵢ⟩ is a linear combination of the vectors |i⟩.

Proof First let us take |1⟩. This can be normalized such that

\[ |e_1\rangle = \frac{|1\rangle}{\sqrt{\langle 1|1\rangle}}, \quad \langle e_1|e_1\rangle = 1. \tag{13.88} \]

Next let us take |2⟩ and then make the following vector:

\[ |e_2\rangle = \frac{1}{L_2}\left[\,|2\rangle - \langle e_1|2\rangle|e_1\rangle\,\right], \tag{13.89} \]

where L₂ is a normalization constant such that ⟨e₂|e₂⟩ = 1. Note that |e₂⟩ cannot be a zero vector. This is because if |e₂⟩ were a zero vector, |2⟩ and |e₁⟩ (or |1⟩) would be linearly dependent, in contradiction to the assumption. We have ⟨e₁|e₂⟩ = 0. Thus, |e₁⟩ and |e₂⟩ are orthonormal.

After this, the proof is based upon mathematical induction. Suppose that the theorem is true of (n − 1) vectors. That is, let |eᵢ⟩ (1 ≤ i ≤ n − 1) be such that ⟨eᵢ|eⱼ⟩ = δᵢⱼ (1 ≤ i, j ≤ n − 1) and each vector |eᵢ⟩ is a linear combination of the vectors |i⟩. Meanwhile, let us define

\[ |\tilde{n}\rangle \equiv |n\rangle - \sum_{j=1}^{n-1}\langle e_j|n\rangle|e_j\rangle. \tag{13.90} \]

Again, the vector |ñ⟩ cannot be a zero vector, as asserted above. We have

\[ \langle e_k|\tilde{n}\rangle = \langle e_k|n\rangle - \sum_{j=1}^{n-1}\langle e_j|n\rangle\langle e_k|e_j\rangle = \langle e_k|n\rangle - \sum_{j=1}^{n-1}\langle e_j|n\rangle\delta_{kj} = 0, \tag{13.91} \]

where 1 ≤ k ≤ n − 1. The second equality comes from the assumption of the induction. The vector |ñ⟩ can always be normalized such that

\[ |e_n\rangle = \frac{|\tilde{n}\rangle}{\sqrt{\langle \tilde{n}|\tilde{n}\rangle}}, \quad \langle e_n|e_n\rangle = 1. \tag{13.92} \]

Thus, the theorem is proven. ■

We note that in (13.92) a phase factor e^{iθ} (θ: an arbitrarily chosen real number) can be added such that

\[ |e_n\rangle = \frac{e^{i\theta}|\tilde{n}\rangle}{\sqrt{\langle \tilde{n}|\tilde{n}\rangle}}. \tag{13.93} \]
To prove Theorem 13.2, we have used the following simple but important
theorem.
Theorem 13.3 Let us have any n vectors |i⟩ ≠ 0 (1 ≤ i ≤ n) in Vⁿ and let these vectors |i⟩ be orthogonal to one another. Then the vectors |i⟩ are linearly independent.

Proof Let us think of the following equation:

\[ c_1|1\rangle + c_2|2\rangle + \cdots + c_n|n\rangle = 0. \tag{13.94} \]

Multiplying (13.94) by ⟨i| from the left and considering the orthogonality among the vectors, we have

\[ c_i\langle i|i\rangle = 0. \tag{13.95} \]

Since ⟨i|i⟩ ≠ 0, cᵢ = 0. The above is true of any cᵢ and |i⟩. Then c₁ = c₂ = ⋯ = cₙ = 0. Thus, (13.94) implies that |1⟩, |2⟩, ⋯, |n⟩ are linearly independent. ■
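The constructive proof of Theorem 13.2 translates directly into an algorithm. Below is a minimal Gram–Schmidt sketch for column vectors in ℂⁿ (Python with NumPy; the input vectors are arbitrary illustrative choices):

```python
import numpy as np

def gram_schmidt(vecs):
    """Orthonormalize linearly independent vectors, as in (13.88)-(13.92)."""
    basis = []
    for v in vecs:
        w = v.astype(complex)
        for e in basis:
            w = w - np.vdot(e, w) * e     # subtract <e_j|n>|e_j> as in (13.90)
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            raise ValueError("input vectors are linearly dependent")
        basis.append(w / norm)            # normalize as in (13.92)
    return np.array(basis)

E = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                  np.array([1.0, 0.0, 1.0]),
                  np.array([0.0, 1.0, 1.0])])
print(np.allclose(E @ E.conj().T, np.eye(3)))   # True: <e_i|e_j> = delta_ij
```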
It is worth reviewing Theorem 13.2 from the point of view of the similarity transformation. In Sect. 13.2, we have seen that a Gram matrix comprising linearly independent vectors is a positive definite Hermitian matrix. In that case, we had

\[ U^{-1}GU = U^{\dagger}GU = G' = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}, \tag{13.40} \]

where λᵢ > 0 (1 ≤ i ≤ n). Now, let us define the following matrix:

\[ D = \begin{pmatrix} 1/\sqrt{\lambda_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/\sqrt{\lambda_n} \end{pmatrix}. \tag{13.96} \]

Then, we have

\[ D^{T}U^{\dagger}GUD = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}, \tag{13.97} \]

where RHS of (13.97) is the (n, n) identity matrix. Thus, we find that (13.97) is equivalent to what Theorem 13.2 implies.

If the vector space we are thinking of is a real vector space, the corresponding Gram matrix is real as well and U of (13.97) is an orthogonal matrix. Then, instead of (13.97) we have

\[ D^{T}U^{T}GUD = (UD)^{T}G\,UD = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}. \tag{13.98} \]

Defining UD ≡ P, we have

\[ P^{T}GP = E_n, \tag{13.99} \]

where Eₙ stands for the (n, n) identity matrix and P is a real non-singular matrix. The transformation of G represented by (13.99) is said to be an equivalence transformation (see Sect. 14.5). Equation (13.99) gives a special case where Theorem 14.11 holds; also see Sect. 14.5.
The real vector space characterized by a Gram matrix En is called Euclidean
space. Its related topics are discussed in Parts IV and V. In Part V, in particular, we
will investigate several properties of vector spaces other than the (real) inner product
space in relation to the quantum theory of fields.

References

1. Hassani S (2006) Mathematical physics. Springer, New York


2. Satake I-O (1975) Linear algebra (pure and applied mathematics). Marcel Dekker, New York
3. Satake I (1974) Linear algebra. Shokabo, Tokyo
4. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd
edn. Cambridge University Press, Cambridge
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
Chapter 14
Hermitian Operators and Unitary
Operators

Hermitian operators and unitary operators are quite often encountered in mathemat-
ical physics and, in particular, quantum physics. In this chapter we investigate their
basic properties. Both Hermitian operators and unitary operators fall under the
category of normal operators. The normal matrices are characterized by an important
fact that those matrices can be diagonalized by a unitary matrix. Moreover,
Hermitian matrices always possess real eigenvalues. This fact largely eases mathe-
matical treatment of quantum mechanics. In relation to these topics, in this chapter
we investigate projection operators systematically. We find their important applica-
tion to physicochemical problems in Part IV. We further investigate Hermitian
quadratic forms and real symmetric quadratic forms as an important branch of matrix
algebra. In connection with this topic, positive definiteness and the non-negative property of a matrix are important concepts. This characteristic is readily applicable to the theory of differential operators, thus rendering this chapter closely related to basic concepts of quantum physics.

14.1 Projection Operators

In Chap. 12 we considered the decomposition of a vector space into a direct sum of invariant subspaces. We also mentioned properties of idempotent operators. Moreover, we have shown how an orthonormal basis can be constructed from a set of linearly independent vectors. In this section an orthonormal basis set is implied as basis vectors in an inner product space Vⁿ.

Let us start with the concept of an orthogonal complement. Let W be a subspace of Vⁿ. Let us think of a set of vectors |x⟩ such that

\[ \{\,|x\rangle;\ \langle x|y\rangle = 0 \ \text{for}\ \forall|y\rangle \in W\,\}. \tag{14.1} \]

We name this set W⊥ and call it an orthogonal complement of W. The set W⊥ forms a subspace of Vⁿ. In fact, if |a⟩, |b⟩ ∈ W⊥, then ⟨a|y⟩ = 0 and ⟨b|y⟩ = 0. Since (⟨a| + ⟨b|)|y⟩ = ⟨a|y⟩ + ⟨b|y⟩ = 0, we have |a⟩ + |b⟩ ∈ W⊥. Also ⟨αa|y⟩ = α*⟨a|y⟩ = 0. Hence, |αa⟩ = α|a⟩ ∈ W⊥. Then, W⊥ is a subspace of Vⁿ.
Theorem 14.1 [1, 2] Let W be a subspace and W⊥ its orthogonal complement in Vⁿ. Then,

\[ V^n = W \oplus W^{\perp}. \tag{14.2} \]

Proof Suppose that an orthonormal basis comprising |e₁⟩, |e₂⟩, ⋯, |eₙ⟩ spans Vⁿ:

\[ V^n = \mathrm{Span}\{|e_1\rangle, |e_2\rangle, \cdots, |e_n\rangle\}. \]

Of the orthonormal basis, let |e₁⟩, |e₂⟩, ⋯, |e_r⟩ (r < n) span W. Let an arbitrarily chosen vector from Vⁿ be |x⟩. Then we have

\[ |x\rangle = x_1|e_1\rangle + x_2|e_2\rangle + \cdots + x_n|e_n\rangle = \sum_{i=1}^{n} x_i|e_i\rangle. \tag{14.3} \]

Multiplying ⟨eⱼ| on (14.3) from the left, we have

\[ \langle e_j|x\rangle = \sum_{i=1}^{n} x_i\langle e_j|e_i\rangle = \sum_{i=1}^{n} x_i\delta_{ji} = x_j. \tag{14.4} \]

That is,

\[ |x\rangle = \sum_{i=1}^{n}\langle e_i|x\rangle|e_i\rangle. \tag{14.5} \]

Meanwhile, put

\[ |x'\rangle = \sum_{i=1}^{r}\langle e_i|x\rangle|e_i\rangle. \tag{14.6} \]

Then we have |x′⟩ ∈ W. Also putting |x′′⟩ = |x⟩ − |x′⟩ and multiplying ⟨eᵢ| (1 ≤ i ≤ r) on it from the left, we get

\[ \langle e_i|x''\rangle = \langle e_i|x\rangle - \langle e_i|x'\rangle = \langle e_i|x\rangle - \langle e_i|x\rangle = 0. \tag{14.7} \]

Taking account of W = Span{|e₁⟩, |e₂⟩, ⋯, |e_r⟩}, from the definition of the orthogonal complement we get |x′′⟩ ∈ W⊥. That is, for ∀|x⟩ ∈ Vⁿ

\[ |x\rangle = |x'\rangle + |x''\rangle. \tag{14.8} \]

This means that Vⁿ = W + W⊥. Meanwhile, we have W ∩ W⊥ = {0}. In fact, suppose that |x⟩ ∈ W ∩ W⊥. Then ⟨x|x⟩ = 0 because of the orthogonality. However, this implies that |x⟩ = 0. Consequently, from Theorem 11.1 we have Vⁿ = W ⊕ W⊥. This completes the proof. ■
The consequence of Theorem 14.1 is that the dimension of W⊥ is (n − r); see Theorem 11.2. In other words, we have

\[ \dim V^n = n = \dim W + \dim W^{\perp}. \]

Moreover, the contents of Theorem 14.1 can readily be generalized to more subspaces such that

\[ V^n = W_1 \oplus W_2 \oplus \cdots \oplus W_r, \tag{14.9} \]

where W₁, W₂, ⋯, and W_r (r ≤ n) are mutually orthogonal complements. In this case ∀|x⟩ ∈ Vⁿ can be expressed uniquely as the sum of individual vectors |w₁⟩, |w₂⟩, ⋯, and |w_r⟩ of each subspace; i.e.,

\[ |x\rangle = |w_1\rangle + |w_2\rangle + \cdots + |w_r\rangle = |w_1 + w_2 + \cdots + w_r\rangle. \tag{14.10} \]

Let us define the following operators similarly to the case of (12.201):

\[ P_i(|x\rangle) = |w_i\rangle \quad (1 \le i \le r). \tag{14.11} \]

Thus, the operator Pᵢ extracts a vector |wᵢ⟩ in the subspace Wᵢ. Then we have

\[ (P_1 + P_2 + \cdots + P_r)(|x\rangle) = P_1|x\rangle + P_2|x\rangle + \cdots + P_r|x\rangle = |w_1\rangle + |w_2\rangle + \cdots + |w_r\rangle = |x\rangle. \tag{14.12} \]

Since |x⟩ is an arbitrarily chosen vector, we get

\[ P_1 + P_2 + \cdots + P_r = E. \tag{14.13} \]

Moreover,

\[ P_i[P_i(|x\rangle)] = P_i(|w_i\rangle) = |w_i\rangle \quad (1 \le i \le r). \tag{14.14} \]

Therefore, from (14.11) and (14.14) we have

\[ P_i[P_i(|x\rangle)] = P_i(|x\rangle). \tag{14.15} \]

The vector |x⟩ is arbitrarily chosen, and so we get

\[ P_i{}^2 = P_i. \tag{14.16} \]

Choose another arbitrary vector |y⟩ ∈ Vⁿ such that

\[ |y\rangle = |u_1\rangle + |u_2\rangle + \cdots + |u_r\rangle = |u_1 + u_2 + \cdots + u_r\rangle. \tag{14.17} \]

Then, we have

\[ \langle x|P_iy\rangle = \langle w_1 + w_2 + \cdots + w_r|P_i|u_1 + u_2 + \cdots + u_r\rangle = \langle w_1 + w_2 + \cdots + w_r|u_i\rangle = \langle w_i|u_i\rangle. \tag{14.18} \]

With the last equality, we used the mutual orthogonality of the subspaces. Meanwhile, we have

\[ \overline{\langle y|P_ix\rangle} = \overline{\langle u_1 + u_2 + \cdots + u_r|P_i|w_1 + w_2 + \cdots + w_r\rangle} = \overline{\langle u_1 + u_2 + \cdots + u_r|w_i\rangle} = \overline{\langle u_i|w_i\rangle} = \langle w_i|u_i\rangle. \tag{14.19} \]

Comparing (14.18) and (14.19), we get

\[ \langle x|P_iy\rangle = \overline{\langle y|P_ix\rangle} = \langle x|P_i{}^{\dagger}y\rangle, \tag{14.20} \]

where we used (13.64) with the second equality. Since |x⟩ and |y⟩ are arbitrarily chosen, we get

\[ P_i{}^{\dagger} = P_i. \tag{14.21} \]

Equation (14.21) shows that Pᵢ is Hermitian.


The above discussion parallels that made in Sect. 12.4 with an idempotent operator. We have the following definition of a projection operator.
Definition 14.1 An operator P is said to be a projection operator if P² = P and P† = P. That is, an idempotent and Hermitian operator is a projection operator.
As described above, a projection operator is characterized by (14.16) and (14.21).
An idempotent operator does not premise the presence of an inner product space, but
we only need a direct sum of subspaces. In contrast, if we deal with the projection
operator, we are thinking of orthogonal complements as subspaces and their direct
sum. The projection operator can adequately be defined in an inner product vector
space having an orthonormal basis.
From (14.13) we have

\[ (P_1 + P_2 + \cdots + P_r)(P_1 + P_2 + \cdots + P_r) = \sum_{i=1}^{r} P_i{}^2 + \sum_{i \ne j} P_iP_j = \sum_{i=1}^{r} P_i + \sum_{i \ne j} P_iP_j = E + \sum_{i \ne j} P_iP_j = E. \tag{14.22} \]

In (14.22) we used (14.13) and (14.16). Therefore, we get

\[ \sum_{i \ne j} P_iP_j = 0. \]

In particular, we have

\[ P_iP_j = 0, \quad P_jP_i = 0 \quad (i \ne j). \tag{14.23} \]

In fact, we have

\[ P_iP_j(|x\rangle) = P_i(|w_j\rangle) = 0 \quad (i \ne j,\ 1 \le i, j \le n). \tag{14.24} \]

The second equality comes from Wᵢ ∩ Wⱼ = {0}. Notice that in (14.24), the indices i and j are interchangeable. Again, |x⟩ is arbitrarily chosen, and so (14.23) holds. Combining (14.16) and (14.23), we write

\[ P_iP_j = \delta_{ij}P_i. \tag{14.25} \]

In virtue of the relation (14.23), Pᵢ + Pⱼ (i ≠ j) is a projection operator as well [3]. In fact, we have

\[ \left(P_i + P_j\right)^2 = P_i{}^2 + P_iP_j + P_jP_i + P_j{}^2 = P_i{}^2 + P_j{}^2 = P_i + P_j, \]

where the second equality comes from (14.23). Also we have

\[ \left(P_i + P_j\right)^{\dagger} = P_i{}^{\dagger} + P_j{}^{\dagger} = P_i + P_j. \]

The following notation is often used:

\[ P_i = \frac{|w_i\rangle\langle w_i|}{\|w_i\| \cdot \|w_i\|}. \tag{14.26} \]

Then we have

\[ P_i|x\rangle = \frac{|w_i\rangle\langle w_i|}{\|w_i\| \cdot \|w_i\|}\left(|w_1\rangle + |w_2\rangle + \cdots + |w_r\rangle\right) = \left[|w_i\rangle\left(\langle w_i|w_1\rangle + \langle w_i|w_2\rangle + \cdots + \langle w_i|w_r\rangle\right)\right]/\langle w_i|w_i\rangle = \left[|w_i\rangle\langle w_i|w_i\rangle\right]/\langle w_i|w_i\rangle = |w_i\rangle. \]

Furthermore, we have

\[ P_i{}^2|x\rangle = P_i|w_i\rangle = |w_i\rangle = P_i|x\rangle. \tag{14.27} \]

Equation (14.16) is recovered accordingly. Meanwhile,

\[ \left(|w_i\rangle\langle w_i|\right)^{\dagger} = \left(\langle w_i|\right)^{\dagger}\left(|w_i\rangle\right)^{\dagger} = |w_i\rangle\langle w_i|. \tag{14.28} \]

Hence, we recover

\[ P_i{}^{\dagger} = P_i. \tag{14.21} \]

In (14.28) we used (AB)† = B†A†. In fact, we have

\[ \langle x|B^{\dagger}A^{\dagger}|y\rangle = \langle xB^{\dagger}|A^{\dagger}y\rangle = \overline{\langle yA|Bx\rangle} = \overline{\langle y|AB|x\rangle} = \langle x|(AB)^{\dagger}|y\rangle. \tag{14.29} \]

With the second equality of (14.29), we used (13.86), where A is replaced with B and |y⟩ is replaced with A†|y⟩. Since (14.29) holds for arbitrarily chosen vectors |x⟩ and |y⟩, comparing the first and last sides of (14.29) we have

\[ (AB)^{\dagger} = B^{\dagger}A^{\dagger}. \tag{14.30} \]

We can express (14.29) alternatively as follows:

\[ \langle xB^{\dagger}|A^{\dagger}y\rangle = \overline{\langle y\left(A^{\dagger}\right)^{\dagger}|B^{\dagger}x\rangle} = \overline{\langle yA|Bx\rangle} = \overline{\langle yABx\rangle}, \tag{14.31} \]

where with the second equality we used (13.68). Also recall the remarks made after (13.83) with the expressions of (14.29) and (14.31). Other notations can be adopted.
We can view a projection operator under a more strict condition. Related operators can be defined as well. As in (14.26), let us define an operator such that

\[ \tilde{P}_k = |e_k\rangle\langle e_k|, \tag{14.32} \]

where |eₖ⟩ (k = 1, ⋯, n) are the orthonormal basis set of Vⁿ. Operating P̃ₖ on |x⟩ = x₁|e₁⟩ + x₂|e₂⟩ + ⋯ + xₙ|eₙ⟩ from the left, we get

\[ \tilde{P}_k|x\rangle = |e_k\rangle\langle e_k|\left(\sum_{j=1}^{n} x_j|e_j\rangle\right) = |e_k\rangle\sum_{j=1}^{n} x_j\langle e_k|e_j\rangle = |e_k\rangle\sum_{j=1}^{n} x_j\delta_{kj} = x_k|e_k\rangle. \]

Thus, we find that P̃ₖ plays the same role as P^{(k)} defined in (12.201). Represented by a matrix, P̃ₖ has the same structure as that denoted in (12.205). Evidently,

\[ \tilde{P}_k{}^{\dagger} = \tilde{P}_k, \quad \tilde{P}_k{}^2 = \tilde{P}_k, \quad \tilde{P}_1 + \tilde{P}_2 + \cdots + \tilde{P}_n = E. \tag{14.33} \]

Now let us modify P^{(k)} in (12.201). There \(\left(P^{(k)}\right)_{ij} = \delta_i^{(k)}\delta_j^{(k)}\), where only the (k, k) element is 1, otherwise 0, in the (n, n) matrix. We define a matrix \(\left(P^{(k)}_{(m)}\right)_{ij} = \delta_i^{(k)}\delta_j^{(m)}\). A full matrix representation for it is

\[ P^{(k)}_{(m)} = \begin{pmatrix} 0 & & & & \\ & \ddots & & 1 & \\ & & 0 & & \\ & & & \ddots & \\ & & & & 0 \end{pmatrix}, \tag{14.34} \]

where only the (k, m) element is 1, otherwise 0.


In the example of (14.34), P^{(k)}_{(m)} is an upper triangle matrix (k < m). Therefore, its eigenvalues are all zero, and so P^{(k)}_{(m)} is a nilpotent matrix. If k > m, the matrix is a lower triangle matrix and nilpotent as well. Such a matrix is not Hermitian (nor a projection operator), as can immediately be seen from the matrix form of (14.34). Because of the properties of nilpotent matrices mentioned in Sect. 12.3, P^{(k)}_{(m)} (k ≠ m) is not diagonalizable either.
Various relations can be extracted. As an example, we have

\[ \left(P^{(k)}_{(m)}P^{(l)}_{(n)}\right)_{ij} = \sum_q \delta_i^{(k)}\delta_q^{(m)}\,\delta_q^{(l)}\delta_j^{(n)} = \delta_i^{(k)}\delta_{ml}\delta_j^{(n)} = \delta_{ml}\left(P^{(k)}_{(n)}\right)_{ij}. \tag{14.35} \]

Note that P^{(k)}_{(k)} ≡ P^{(k)} defined in (12.201). From (14.35), moreover, we have

\[ P^{(k)}_{(m)}P^{(m)}_{(n)} = P^{(k)}_{(n)}, \quad P^{(n)}_{(m)}P^{(m)}_{(n)} = P^{(n)}_{(n)}, \quad P^{(m)}_{(n)}P^{(n)}_{(m)} = P^{(m)}_{(m)}, \]
\[ P^{(k)}_{(m)}P^{(k)}_{(m)} = \delta_{mk}P^{(k)}_{(m)}, \quad P^{(m)}_{(m)}P^{(m)}_{(m)} = \left(P^{(m)}_{(m)}\right)^2 = P^{(m)}_{(m)}, \ \text{etc.} \]

These relations remain unchanged after the unitary similarity transformation by U. For instance, taking the first equation of the above, we have

\[ U^{\dagger}P^{(k)}_{(m)}P^{(m)}_{(n)}U = \left(U^{\dagger}P^{(k)}_{(m)}U\right)\left(U^{\dagger}P^{(m)}_{(n)}U\right) = U^{\dagger}P^{(k)}_{(n)}U. \]

Among these operators, only P^{(k)}_{(k)} is eligible for a projection operator. We will encounter further examples in Parts IV and V.
Replacing A in (13.62) with P^{(k)}_{(k)}, we obtain

\[ \langle y|P^{(k)}_{(k)}(x)\rangle = (y_1^{*} \cdots y_n^{*})\,G\,P^{(k)}_{(k)}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{14.36} \]

Within the framework of an orthonormal basis where G = E, the representation is largely simplified to be

\[ \langle y|P^{(k)}_{(k)}(x)\rangle = (y_1^{*} \cdots y_n^{*})\,P^{(k)}_{(k)}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = y_k^{*}x_k. \tag{14.37} \]
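The defining relations (14.33) of the operators P̃ₖ = |eₖ⟩⟨eₖ| are outer-product identities and can be verified directly. A short sketch (the orthonormal basis is generated here from the columns of a random unitary matrix, an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
# Columns of a unitary matrix form an orthonormal basis {|e_k>} of V^n.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))

P = [np.outer(Q[:, k], Q[:, k].conj()) for k in range(3)]   # |e_k><e_k|

for Pk in P:
    assert np.allclose(Pk @ Pk, Pk)        # idempotent: P_k^2 = P_k
    assert np.allclose(Pk.conj().T, Pk)    # Hermitian: P_k^dagger = P_k
assert np.allclose(sum(P), np.eye(3))       # completeness: sum_k P_k = E
print("all relations of (14.33) hold")
```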

14.2 Normal Operators

There is a large group of operators called normal operators that play an important role in mathematical physics, especially quantum physics. A normal operator is defined as an operator on an inner product space that commutes with its adjoint operator. That is, let A be a normal operator. Then, we have

\[ AA^{\dagger} = A^{\dagger}A. \tag{14.38} \]

The normal operators include an Hermitian operator H defined as H† = H as well as a unitary operator U defined as UU† = U†U = E.
In this condition let us estimate the norm of |A†x⟩ together with |Ax⟩ defined by (13.87). If A is a normal operator,

\[ \|A^{\dagger}x\| = \sqrt{\langle x\left(A^{\dagger}\right)^{\dagger}|A^{\dagger}x\rangle} = \sqrt{\langle xAA^{\dagger}x\rangle} = \sqrt{\langle xA^{\dagger}Ax\rangle} = \|Ax\|. \tag{14.39} \]

The other way around, suppose that ‖Ax‖ = ‖A†x‖. Then, since ‖Ax‖² = ‖A†x‖², we have ⟨xA†Ax⟩ = ⟨xAA†x⟩. That is, ⟨x|A†A − AA†|x⟩ = 0 for an arbitrarily chosen vector |x⟩. To assert A†A − AA† = 0, i.e., A†A = AA†, on the assumption that ⟨x|A†A − AA†|x⟩ = 0, we need the following theorems:
Theorem 14.2 [4] A linear transformation A on an inner product space is the zero transformation if and only if ⟨y|Ax⟩ = 0 for any vectors |x⟩ and |y⟩.
Proof If A = 0, then ⟨y|Ax⟩ = ⟨y|0⟩ = 0. This is because in (13.3), putting β = 1 = −γ and |b⟩ = |c⟩, we get ⟨a|0⟩ = 0. Conversely, suppose that ⟨y|Ax⟩ = 0 for any vectors |x⟩ and |y⟩. Then, putting |y⟩ = |Ax⟩, we have ⟨xA†|Ax⟩ = 0, i.e., ⟨y|y⟩ = 0. This implies that |y⟩ = |Ax⟩ = 0. For |Ax⟩ = 0 to hold for any |x⟩ we must have A = 0. Note here that if A is a singular matrix, |Ax⟩ = 0 for some vectors |x⟩. However, even though A is singular, for |Ax⟩ = 0 to hold for any |x⟩, A = 0. This completes the proof. ■
We have another important theorem under a further restricted condition.
Theorem 14.3 [4] A linear transformation A on an inner product space is the zero transformation if and only if ⟨x|Ax⟩ = 0 for any vector |x⟩.
Proof As in the case of Theorem 14.2, the necessary condition is trivial. To prove the sufficient condition, let us consider the following:

\[ \langle x+y|A(x+y)\rangle = \langle x|Ax\rangle + \langle y|Ay\rangle + \langle x|Ay\rangle + \langle y|Ax\rangle, \]
\[ \text{i.e.,} \quad \langle x|Ay\rangle + \langle y|Ax\rangle = \langle x+y|A(x+y)\rangle - \langle x|Ax\rangle - \langle y|Ay\rangle. \tag{14.40} \]

From the assumption that ⟨x|Ax⟩ = 0 with any vectors |x⟩, we have

\[ \langle x|Ay\rangle + \langle y|Ax\rangle = 0. \tag{14.41} \]

Meanwhile, replacing |y⟩ by |iy⟩ in (14.41), we get

\[ \langle x|Aiy\rangle + \langle iy|Ax\rangle = i\left[\langle x|Ay\rangle - \langle y|Ax\rangle\right] = 0. \tag{14.42} \]

That is,

\[ \langle x|Ay\rangle - \langle y|Ax\rangle = 0. \tag{14.43} \]

Summing both sides of (14.41) and (14.43), we get

\[ \langle x|Ay\rangle = 0. \tag{14.44} \]

Theorem 14.2 then means that A = 0, indicating that the sufficient condition holds. This completes the proof. ■
Thus, returning to the remarks made after (14.39), we establish the following theorem:

Theorem 14.4 A necessary and sufficient condition for a linear transformation A on an inner product space to be a normal operator is that the relation

\[ \|A^{\dagger}x\| = \|Ax\| \tag{14.45} \]

holds.
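Theorem 14.4 suggests a simple numerical test of normality. The sketch below compares ‖A†x‖ with ‖Ax‖ for a Hermitian matrix (normal) and for a nilpotent matrix (not normal); both matrices are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=2) + 1j * rng.normal(size=2)

A_normal = np.array([[2.0, 1 - 1j], [1 + 1j, 3.0]])   # Hermitian, hence normal
A_not = np.array([[0.0, 1.0], [0.0, 0.0]])            # nilpotent, not normal

for A in (A_normal, A_not):
    n1 = np.linalg.norm(A @ x)             # ||Ax||
    n2 = np.linalg.norm(A.conj().T @ x)    # ||A^dagger x||
    print(np.isclose(n1, n2))              # True, then False
```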

14.3 Unitary Diagonalization of Matrices

A normal operator has a distinct property: it can be diagonalized through a similarity transformation using a unitary matrix. The transformation is said to be a unitary similarity transformation. Let us prove the following theorem.
Theorem 14.5 [5] A necessary and sufficient condition for a matrix A to be
diagonalized by unitary similarity transformation is that the matrix A is a normal
matrix.
Proof To prove the necessary condition, suppose that A can be diagonalized by a unitary matrix U. That is,

\[ U^{\dagger}AU = D, \quad \text{i.e.,} \quad A = UDU^{\dagger} \ \text{and} \ A^{\dagger} = UD^{\dagger}U^{\dagger}, \tag{14.46} \]

where D is a diagonal matrix. Then

\[ AA^{\dagger} = UDU^{\dagger}UD^{\dagger}U^{\dagger} = UDD^{\dagger}U^{\dagger} = UD^{\dagger}DU^{\dagger} = UD^{\dagger}U^{\dagger} \cdot UDU^{\dagger} = A^{\dagger}A. \tag{14.47} \]

For the third equality, we used DD† = D†D (i.e., D and D† are commutable). This shows that A is a normal matrix.
To prove the sufficient condition, let us show that a normal matrix can be
diagonalized by unitary similarity transformation. The proof is due to mathematical
induction, as is the case with Theorem 12.1.
First we show that the theorem is true of a (2, 2) matrix. Suppose that one of the eigenvalues of A₂ is α₁ and that its corresponding eigenvector is |x₁⟩. Following the procedures of the proof for Theorem 12.1 and remembering the Gram–Schmidt orthonormalization theorem, we can construct a unitary matrix U₁ such that

\[ U_1 = (|x_1\rangle\ |p_1\rangle), \tag{14.48} \]

where |x₁⟩ represents a column vector and |p₁⟩ is another arbitrarily determined column vector. Then we can convert A₂ to a triangle matrix such that

\[ \tilde{A}_2 \equiv U_1{}^{\dagger}A_2U_1 = \begin{pmatrix} \alpha_1 & x \\ 0 & y \end{pmatrix}. \tag{14.49} \]

Then we have

\[ \tilde{A}_2\tilde{A}_2{}^{\dagger} = U_1{}^{\dagger}A_2U_1\left(U_1{}^{\dagger}A_2U_1\right)^{\dagger} = U_1{}^{\dagger}A_2U_1U_1{}^{\dagger}A_2{}^{\dagger}U_1 = U_1{}^{\dagger}A_2A_2{}^{\dagger}U_1 = U_1{}^{\dagger}A_2{}^{\dagger}A_2U_1 = U_1{}^{\dagger}A_2{}^{\dagger}U_1U_1{}^{\dagger}A_2U_1 = \tilde{A}_2{}^{\dagger}\tilde{A}_2. \tag{14.50} \]

With the fourth equality, we used the supposition that A₂ is a normal matrix. Equation (14.50) means that Ã₂ defined in (14.49) is a normal operator. Via simple matrix calculations, we have

\[ \tilde{A}_2\tilde{A}_2{}^{\dagger} = \begin{pmatrix} |\alpha_1|^2 + |x|^2 & xy^{*} \\ x^{*}y & |y|^2 \end{pmatrix}, \quad \tilde{A}_2{}^{\dagger}\tilde{A}_2 = \begin{pmatrix} |\alpha_1|^2 & \alpha_1{}^{*}x \\ \alpha_1x^{*} & |x|^2 + |y|^2 \end{pmatrix}. \tag{14.51} \]

For (14.50) to hold, we must have x = 0 in (14.51). Accordingly, we get

\[ \tilde{A}_2 = \begin{pmatrix} \alpha_1 & 0 \\ 0 & y \end{pmatrix}. \tag{14.52} \]

This implies that a normal matrix A₂ has been diagonalized by the unitary similarity transformation.
Now let us examine the general case where we consider a (n, n) square normal matrix Aₙ. Let αₙ be one of the eigenvalues of Aₙ. On the basis of the argument of the (2, 2) matrix case, after a suitable similarity transformation by a unitary matrix Ũ we first have

\[ \tilde{A}_n = \tilde{U}^{\dagger}A_n\tilde{U}, \tag{14.53} \]

where we can put

\[ \tilde{A}_n = \begin{pmatrix} \alpha_n & \mathbf{x}^{T} \\ \mathbf{0} & B \end{pmatrix}, \tag{14.54} \]

where x is a column vector of order (n − 1), 0 is a zero column vector of order (n − 1), and B is a (n − 1, n − 1) matrix. Then we have
\[ \tilde{A}_n{}^{\dagger} = \begin{pmatrix} \alpha_n{}^{*} & \mathbf{0}^{T} \\ \mathbf{x}^{*} & B^{\dagger} \end{pmatrix}, \tag{14.55} \]

where x* denotes the complex conjugate column vector of x. Performing matrix calculations, we have

\[ \tilde{A}_n\tilde{A}_n{}^{\dagger} = \begin{pmatrix} |\alpha_n|^2 + \mathbf{x}^{T}\mathbf{x}^{*} & \mathbf{x}^{T}B^{\dagger} \\ B\mathbf{x}^{*} & BB^{\dagger} \end{pmatrix}, \quad \tilde{A}_n{}^{\dagger}\tilde{A}_n = \begin{pmatrix} |\alpha_n|^2 & \alpha_n{}^{*}\mathbf{x}^{T} \\ \alpha_n\mathbf{x}^{*} & \mathbf{x}^{*}\mathbf{x}^{T} + B^{\dagger}B \end{pmatrix}. \tag{14.56} \]

For ÃₙÃₙ† = Ãₙ†Ãₙ to hold with (14.56), we must have x = 0. Thus we get

\[ \tilde{A}_n = \begin{pmatrix} \alpha_n & \mathbf{0}^{T} \\ \mathbf{0} & B \end{pmatrix}. \tag{14.57} \]

Since Ãₙ is a normal matrix, so is B. According to mathematical induction, let us assume that the theorem holds with a (n − 1, n − 1) matrix, i.e., B. Then, also from the assumption there exist a unitary matrix C and a diagonal matrix D, both of order (n − 1), such that BC = CD. Hence,

\[ \begin{pmatrix} \alpha_n & \mathbf{0}^{T} \\ \mathbf{0} & B \end{pmatrix}\begin{pmatrix} 1 & \mathbf{0}^{T} \\ \mathbf{0} & C \end{pmatrix} = \begin{pmatrix} 1 & \mathbf{0}^{T} \\ \mathbf{0} & C \end{pmatrix}\begin{pmatrix} \alpha_n & \mathbf{0}^{T} \\ \mathbf{0} & D \end{pmatrix}. \tag{14.58} \]

Here, putting

\[ \tilde{C}_n = \begin{pmatrix} 1 & \mathbf{0}^{T} \\ \mathbf{0} & C \end{pmatrix} \quad \text{and} \quad \tilde{D}_n = \begin{pmatrix} \alpha_n & \mathbf{0}^{T} \\ \mathbf{0} & D \end{pmatrix}, \tag{14.59} \]

we get

\[ \tilde{A}_n\tilde{C}_n = \tilde{C}_n\tilde{D}_n. \tag{14.60} \]

As C̃ₙ is a (n, n) unitary matrix, C̃ₙ†C̃ₙ = C̃ₙC̃ₙ† = E. Hence, C̃ₙ†ÃₙC̃ₙ = D̃ₙ. Thus, from (14.53) finally we get

\[ \tilde{C}_n{}^{\dagger}\tilde{U}^{\dagger}A_n\tilde{U}\tilde{C}_n = \tilde{D}_n. \tag{14.61} \]

Putting ŨC̃ₙ = V, with V being another unitary operator,

\[ V^{\dagger}A_nV = \tilde{D}_n. \tag{14.62} \]

Obviously, from (14.59) D̃ₙ is a diagonal matrix.
These complete the proof. ■


A direct consequence of Theorem 14.5 is that with any normal matrix we can find
a set of orthonormal eigenvectors corresponding to individual eigenvalues whether
or not those are degenerate.
In Sect. 12.2 we dealt with a decomposition of a linear vector space and relevant
reduction of an operator when discussing canonical forms of matrices. In this context
Theorem 14.5 gives a simple and clear criterion for this. Equation (14.57) implies
that a (n - 1, n - 1) submatrix B can further be reduced to matrices having lower
dimensions. Considering that a diagonal matrix is a special case of triangle matrices,
a normal matrix that has been diagonalized by the unitary similarity transformation
gives eigenvalues by its diagonal elements.
From a point of view of the aforementioned aspect, let us consider the character-
istics of normal matrices, starting with the discussion about the invariant subspaces.
We have a following important theorem.
Theorem 14.6 Let A be a normal matrix and let one of its eigenvalues be α. Let Wα be an eigenspace corresponding to α. Then, Wα is both A-invariant and A†-invariant. Also Wα⊥ is both A-invariant and A†-invariant.
Proof Theorem 14.5 ensures that a normal matrix is diagonalized by unitary similarity transformation. Therefore, we deal with only “proper” eigenvalues and eigenvectors here. First we show that if a subspace W is A-invariant, then its orthogonal complement W⊥ is A†-invariant. In fact, suppose that |x⟩ ∈ W and |x′⟩ ∈ W⊥. Then, from (13.64) and (13.86) we have

\[ \overline{\langle x'|Ax\rangle} = 0 = \langle xA^{\dagger}|x'\rangle = \langle x|A^{\dagger}x'\rangle. \tag{14.63} \]

The first equality comes from the fact that |x⟩ ∈ W ⟹ A|x⟩ (=|Ax⟩) ∈ W, as W is A-invariant. From the last equality of (14.63), we have

\[ A^{\dagger}|x'\rangle = |A^{\dagger}x'\rangle \in W^{\perp}. \tag{14.64} \]

That is, W⊥ is A†-invariant.

Next suppose that |x⟩ ∈ Wα. Then we have

\[ AA^{\dagger}|x\rangle = A^{\dagger}A|x\rangle = A^{\dagger}(\alpha|x\rangle) = \alpha A^{\dagger}|x\rangle. \tag{14.65} \]

Therefore, A†|x⟩ ∈ Wα. This means that Wα is A†-invariant. From the above remark, Wα⊥ is (A†)†-invariant, and so A-invariant accordingly. This completes the proof. ■
From Theorem 14.5, we know that the resulting diagonal matrix D̃ₙ in (14.62) has a form with n eigenvalues (αₙ), some of which may be multiple roots, arranged in the diagonal elements. After diagonalizing the matrix, those eigenvalues can be sorted according to the different eigenvalues α₁, α₂, ⋯, and αₛ. This can also be done by unitary similarity transformation. The relevant unitary matrix U is represented as

\[ U = \begin{pmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & 0 & \cdots & 1 & & \\ & & \vdots & \ddots & \vdots & & \\ & & 1 & \cdots & 0 & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{pmatrix}, \tag{14.66} \]

where except the (i, j) and (j, i) elements equal to 1, all the off-diagonal elements are zero. If operated from the left, U exchanges the i-th and j-th rows of the matrix. If operated from the right, U exchanges the i-th and j-th columns of the matrix. Note that U is at once unitary and Hermitian with eigenvalues 1 or −1. Note also that U² = E. This is because exchanging two columns (or two rows) two times produces the identity transformation.
Thus, performing such unitary similarity transformations an appropriate number of times, we get

\[ D_n = \mathrm{diag}(\alpha_1, \cdots, \alpha_1,\ \alpha_2, \cdots, \alpha_2,\ \cdots,\ \alpha_s, \cdots, \alpha_s). \tag{14.67} \]

The matrix is identical to that represented in (12.181).


In parallel, Vn is decomposed to mutually orthogonal subspaces associated with
different eigenvalues α1, α2, ⋯, and αs such that

V n = W α1 W α2 ⋯ W αs : ð14:68Þ

This expression is formally identical to that represented in (12.191). Note,


however, that in (12.181) orthogonal subspaces are not implied. At the same time,
An is reduced to

Að1Þ

An Að2Þ , ð14:69Þ
⋮ ⋱ ⋮
⋯ AðsÞ

according to the different eigenvalues.


14.3 Unitary Diagonalization of Matrices 585

A normal operator has other distinct properties. The following theorems are good examples.
Theorem 14.7 Let A be a normal operator on Vⁿ. Then |x⟩ is an eigenvector of A with an eigenvalue α if and only if |x⟩ is an eigenvector of A† with an eigenvalue α*.

Proof We apply (14.45) for the proof. Both (A − αE)† = A† − α*E and (A − αE) are normal, since A is normal. Consequently, we have ‖(A − αE)x‖ = 0 if and only if ‖(A† − α*E)x‖ = 0. Since only the zero vector has a zero norm, we get (A − αE)|x⟩ = 0 if and only if (A† − α*E)|x⟩ = 0.
This completes the proof. ■
Theorem 14.8 Let A be a normal operator on Vⁿ. Then, eigenvectors corresponding to different eigenvalues are mutually orthogonal.

Proof Let A be a normal operator on Vⁿ. Let |u⟩ be an eigenvector corresponding to an eigenvalue α and |v⟩ an eigenvector corresponding to an eigenvalue β with α ≠ β. Then we have

\[ \alpha\langle v|u\rangle = \langle v|\alpha u\rangle = \langle v|Au\rangle = \overline{\langle u|A^{\dagger}v\rangle} = \overline{\langle u|\beta^{*}v\rangle} = \langle \beta^{*}v|u\rangle = \beta\langle v|u\rangle, \tag{14.70} \]

where with the fourth equality we used Theorem 14.7. Then we get

\[ (\alpha - \beta)\langle v|u\rangle = 0. \]

Since α − β ≠ 0, ⟨v|u⟩ = 0. Namely, the eigenvectors |u⟩ and |v⟩ are mutually orthogonal. This completes the proof. ■
In (12.208) we mentioned the decomposition of diagonalizable matrices. As for normal matrices, we have a related matrix decomposition. Let A be a normal operator. Then, according to Theorem 14.5, A can be diagonalized and expressed as (14.67). This is equivalently expressed as the following succinct relation. That is, if we choose U as a diagonalizing unitary matrix, we have

\[ U^{\dagger}AU = \alpha_1P_1 + \alpha_2P_2 + \cdots + \alpha_sP_s, \tag{14.71} \]

where α₁, α₂, ⋯, and αₛ are different eigenvalues (possibly degenerate) of A; Pₗ (1 ≤ l ≤ s) is described such that, e.g.,

\[ P_1 = \begin{pmatrix} E_{n_1} & & & \\ & 0_{n_2} & & \\ & & \ddots & \\ & & & 0_{n_s} \end{pmatrix}, \tag{14.72} \]
where E_{n₁} stands for a (n₁, n₁) identity matrix with n₁ corresponding to the multiplicity of α₁. A matrix represented by 0_{n₂} is a (n₂, n₂) zero matrix, and so forth. This expression is in accordance with (14.69). From the matrix form (14.72), obviously Pₗ (1 ≤ l ≤ s) is a projection operator. Thus, operating U and U† on (14.71) from the left and the right, respectively, we obtain

\[ A = \alpha_1UP_1U^{\dagger} + \alpha_2UP_2U^{\dagger} + \cdots + \alpha_sUP_sU^{\dagger}. \tag{14.73} \]

Defining P̃ₗ ≡ UPₗU† (1 ≤ l ≤ s), we have

\[ A = \alpha_1\tilde{P}_1 + \alpha_2\tilde{P}_2 + \cdots + \alpha_s\tilde{P}_s. \tag{14.74} \]

In (14.74), we can easily check that each P̃ₗ is a projection operator, with α₁, α₂, ⋯, and αₛ being different eigenvalues of A. If one of the eigenvalues is zero, in (14.74) we may disregard the term related to the zero eigenvalue. For example, if α₁ = 0 in (14.74), we get

\[ A = \alpha_2\tilde{P}_2 + \cdots + \alpha_s\tilde{P}_s. \]

In case, moreover, αₗ (1 ≤ l ≤ s) is degenerate, we may express P̃ₗ through operators P̃ₗ^μ (1 ≤ μ ≤ mₗ), where mₗ is the multiplicity of αₗ. In that case, we may write

\[ \tilde{P}_l = \tilde{P}_l^{\,1} + \cdots + \tilde{P}_l^{\,m_l}. \tag{14.75} \]

Also, for k ≠ l we have

\[ \tilde{P}_k\tilde{P}_l = UP_kU^{\dagger}UP_lU^{\dagger} = UP_kEP_lU^{\dagger} = UP_kP_lU^{\dagger} = 0 \quad (1 \le k, l \le s). \]

The last equality comes from (14.23). Similarly, we have

\[ \tilde{P}_l\tilde{P}_k = 0. \]

Thus, we have

\[ \tilde{P}_k\tilde{P}_l = \delta_{kl}\tilde{P}_k. \]

If the operator is decomposed as in the case of (14.75), we can express

\[ \tilde{P}_l^{\,\mu}\tilde{P}_l^{\,\nu} = \delta_{\mu\nu}\tilde{P}_l^{\,\mu} \quad (1 \le \mu, \nu \le m_l). \]

Conversely, if an operator A is expressed by (14.74), that operator is a normal operator. In fact, we have

\[ A^{\dagger}A = \left(\sum_i \alpha_i^{*}\tilde{P}_i\right)\left(\sum_j \alpha_j\tilde{P}_j\right) = \sum_{i,j}\alpha_i^{*}\alpha_j\tilde{P}_i\tilde{P}_j = \sum_{i,j}\alpha_i^{*}\alpha_j\delta_{ij}\tilde{P}_i = \sum_i |\alpha_i|^2\tilde{P}_i, \]
\[ AA^{\dagger} = \left(\sum_j \alpha_j\tilde{P}_j\right)\left(\sum_i \alpha_i^{*}\tilde{P}_i\right) = \sum_{i,j}\alpha_i^{*}\alpha_j\tilde{P}_j\tilde{P}_i = \sum_{i,j}\alpha_i^{*}\alpha_j\delta_{ji}\tilde{P}_j = \sum_i |\alpha_i|^2\tilde{P}_i. \tag{14.76} \]

Hence, A†A = AA†. If the projection operators are further decomposed as in the case of (14.75), we have an expression related to (14.76). Thus, a necessary and sufficient condition for an operator to be a normal operator is that the said operator is expressed as (14.74). The relation (14.74) is well-known as the spectral decomposition theorem. Thus, the spectral decomposition theorem is equivalent to Theorem 14.5.
In Sect. 12.7, we have already encountered the matrix decomposition (12.211), similar to that of (14.74). In this context, (12.211) may well be referred to as the spectral decomposition in a broader sense. Notice, however, that in (12.211) Ãₖ (1 ≤ k ≤ n) is not necessarily Hermitian. In Part V, we will encounter interesting examples of matrix decomposition such as the spectral decomposition.
The relations (12.208) and (14.74) are virtually the same, aside from the fact that whereas (14.74) premises an inner product space, (12.208) does not premise it. Correspondingly, whereas the related operators are called projection operators in the case of (14.74), those operators are said to be idempotent operators for (12.208).
Example 14.1 Let us think of the Gram matrix of Example 13.1, as shown below:

\[ G = \begin{pmatrix} 2 & 1+i \\ 1-i & 2 \end{pmatrix}. \tag{13.51} \]

After a unitary similarity transformation, we got

\[ U^{\dagger}GU = \begin{pmatrix} 2+\sqrt{2} & 0 \\ 0 & 2-\sqrt{2} \end{pmatrix}. \tag{13.54} \]

Putting G̃ = U†GU and rewriting (13.54), we have

\[ \tilde{G} = \begin{pmatrix} 2+\sqrt{2} & 0 \\ 0 & 2-\sqrt{2} \end{pmatrix} = \left(2+\sqrt{2}\right)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + \left(2-\sqrt{2}\right)\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}. \]

By a back calculation of UG̃U† = G, we get

\[ G = \left(2+\sqrt{2}\right)\begin{pmatrix} \dfrac{1}{2} & \dfrac{\sqrt{2}(1+i)}{4} \\ \dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2} \end{pmatrix} + \left(2-\sqrt{2}\right)\begin{pmatrix} \dfrac{1}{2} & -\dfrac{\sqrt{2}(1+i)}{4} \\ -\dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2} \end{pmatrix}. \tag{14.77} \]

Putting the eigenvalues α₁ = 2 + √2 and α₂ = 2 − √2 along with

\[ A_1 = \begin{pmatrix} \dfrac{1}{2} & \dfrac{\sqrt{2}(1+i)}{4} \\ \dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2} \end{pmatrix}, \quad A_2 = \begin{pmatrix} \dfrac{1}{2} & -\dfrac{\sqrt{2}(1+i)}{4} \\ -\dfrac{\sqrt{2}(1-i)}{4} & \dfrac{1}{2} \end{pmatrix}, \tag{14.78} \]

we get

\[ G = \alpha_1A_1 + \alpha_2A_2. \tag{14.79} \]

In the above, A₁ and A₂ are projection operators. In fact, as anticipated we have

\[ A_1{}^2 = A_1, \quad A_2{}^2 = A_2, \quad A_1A_2 = A_2A_1 = 0, \quad A_1 + A_2 = E. \tag{14.80} \]

Moreover, (14.78) obviously shows that both A₁ and A₂ are Hermitian. Thus, (14.77) and (14.79) are an example of spectral decomposition. The decomposition is unique. □
Example 14.1 can be dealt with in parallel to Example 12.5. In Example 12.5, however, an inner product space is not implied, and so we used an idempotent matrix instead of a projection operator. Note that, as can be seen in Example 12.5, that idempotent matrix was not Hermitian.
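The spectral decomposition of Example 14.1 can also be produced directly from the normalized eigenvectors of G as |vₗ⟩⟨vₗ|; the following sketch confirms (14.79) and (14.80) numerically:

```python
import numpy as np

G = np.array([[2.0, 1 + 1j], [1 - 1j, 2.0]])
lam, V = np.linalg.eigh(G)    # eigenvalues ascending: 2 - sqrt(2), 2 + sqrt(2)

# Spectral decomposition G = sum_l alpha_l |v_l><v_l|, as in (14.74).
P = [np.outer(V[:, k], V[:, k].conj()) for k in range(2)]
assert np.allclose(G, lam[0] * P[0] + lam[1] * P[1])    # (14.79)
assert np.allclose(P[0] @ P[1], np.zeros((2, 2)))       # A1 A2 = 0
assert np.allclose(P[0] + P[1], np.eye(2))              # A1 + A2 = E
print("spectral decomposition verified")
```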

14.4 Hermitian Matrices and Unitary Matrices

Of the normal matrices, Hermitian matrices and unitary matrices play a crucial role both in fundamental and applied science. Let us think of several topics and examples.
In quantum physics, one frequently treats the expectation value of an operator. In general, such an operator is Hermitian, more strictly an observable. Moreover, a vector in an inner product space is interpreted as a state in a Hilbert space. Suppose that there is a linear operator A (i.e., an observable that represents a physical quantity) that has discrete (or countable) eigenvalues α₁, α₂, ⋯. The number of the eigenvalues may be finite or infinite, but here we assume a finite number; i.e., let us suppose that we have eigenvalues α₁, α₂, ⋯, and αₛ, consistent with our previous discussion.
In quantum physics, we have the well-known Born probability rule. The rule says the following: Suppose that we carry out a physical measurement of A with respect to a physical state |u⟩. Here we assume that |u⟩ has been normalized. Then, the probability ℘ₗ that A takes αₗ (1 ≤ l ≤ s) is given by

\[ \wp_l = \|\tilde{P}_lu\|^2, \tag{14.81} \]

where P̃ₗ is a projection operator that projects |u⟩ onto the eigenspace W_{αₗ} spanned by |αₗ, k⟩. Here k (1 ≤ k ≤ nₗ) reflects the multiplicity nₗ of the eigenvalue αₗ. Hence, we express the nₗ-dimensional eigenspace W_{αₗ} as

\[ W_{\alpha_l} = \mathrm{Span}\{|\alpha_l, 1\rangle, |\alpha_l, 2\rangle, \cdots, |\alpha_l, n_l\rangle\}. \tag{14.82} \]
Now, we define an expectation value ⟨A⟩ of A such that

\[ \langle A\rangle \equiv \sum_{l=1}^{s}\alpha_l\wp_l. \tag{14.83} \]

From (14.81) we have

\[ \wp_l = \|\tilde{P}_lu\|^2 = \langle u\tilde{P}_l{}^{\dagger}|\tilde{P}_lu\rangle = \langle u\tilde{P}_l|\tilde{P}_lu\rangle = \langle u\tilde{P}_l{}^2u\rangle = \langle u\tilde{P}_lu\rangle. \tag{14.84} \]

For the third equality we used the fact that P̃ₗ is Hermitian; for the last equality we used P̃ₗ² = P̃ₗ. Summing (14.84) over the index l, we have

\[ \sum_l \wp_l = \sum_l \langle u\tilde{P}_lu\rangle = \left\langle u\left(\sum_l \tilde{P}_l\right)u\right\rangle = \langle uEu\rangle = \langle u|u\rangle = 1, \]

where with the third equality we used (14.33).


Meanwhile, from the spectral decomposition theorem, we have

\[ A = \alpha_1\tilde{P}_1 + \alpha_2\tilde{P}_2 + \cdots + \alpha_s\tilde{P}_s. \tag{14.74} \]

Operating ⟨u| and |u⟩ on both sides of (14.74), we get

\[ \langle u|A|u\rangle = \alpha_1\langle u|\tilde{P}_1|u\rangle + \alpha_2\langle u|\tilde{P}_2|u\rangle + \cdots + \alpha_s\langle u|\tilde{P}_s|u\rangle = \alpha_1\wp_1 + \alpha_2\wp_2 + \cdots + \alpha_s\wp_s. \tag{14.85} \]

Equating (14.83) and (14.85), we have

\[ \langle A\rangle = \langle u|A|u\rangle. \tag{14.86} \]
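Relation (14.86) lends itself to a direct numerical check: compute the Born probabilities ℘ₗ = ‖P̃ₗu‖² from the eigenprojectors and compare Σₗαₗ℘ₗ with ⟨u|A|u⟩. A sketch under arbitrary illustrative choices of A and |u⟩:

```python
import numpy as np

A = np.array([[1.0, 1j], [-1j, 2.0]])   # an Hermitian "observable"
u = np.array([1.0, 1.0 + 1j])
u = u / np.linalg.norm(u)                # normalized state |u>

alpha, V = np.linalg.eigh(A)
p = [np.linalg.norm(np.outer(V[:, l], V[:, l].conj()) @ u) ** 2
     for l in range(2)]                  # Born probabilities (14.81)

print(np.isclose(sum(p), 1.0))                              # probabilities sum to 1
print(np.isclose(sum(a * w for a, w in zip(alpha, p)),
                 (u.conj() @ A @ u).real))                   # <A> = <u|A|u>, (14.86)
```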

In quantum physics, a real number is required for the expectation value of an observable (i.e., a physical quantity). To warrant this, we have the following theorems.

Theorem 14.9 A linear transformation A on an inner product space is Hermitian if and only if ⟨u|A|u⟩ is real for all |u⟩ of that inner product space.

Proof If A = A†, then ⟨u|A|u⟩* = ⟨u|A†|u⟩ = ⟨u|A|u⟩. Therefore, ⟨u|A|u⟩ is real. Conversely, if ⟨u|A|u⟩ is real for all |u⟩, we have

\[ \langle u|A|u\rangle = \overline{\langle u|A|u\rangle} = \langle u|A^{\dagger}|u\rangle. \]

Hence,

\[ \langle u|A - A^{\dagger}|u\rangle = 0. \tag{14.87} \]

From Theorem 14.3, we get A − A† = 0, i.e., A = A†. This completes the proof. ■

Theorem 14.10 [4] The eigenvalues of an Hermitian operator A are real.

Proof Let α be an eigenvalue of A and let |u⟩ be a corresponding eigenvector. Then, A|u⟩ = α|u⟩. Operating ⟨u| from the left, we have ⟨u|A|u⟩ = α⟨u|u⟩ = α‖u‖². Thus,

\[ \alpha = \langle u|A|u\rangle / \|u\|^2. \tag{14.88} \]

Since A is Hermitian, ⟨u|A|u⟩ is real. Then, the eigenvalue α is real as well. This completes the proof. ■
Unitary matrices have the following conspicuous features: (1) An inner product is held invariant under a unitary transformation. Suppose that |x′⟩ = U|x⟩ and |y′⟩ = U|y⟩, where U is a unitary operator. Then ⟨y′|x′⟩ = ⟨yU†|Ux⟩ = ⟨y|x⟩. A norm of any vector is held invariant under a unitary transformation as well. This is easily checked by replacing |y⟩ with |x⟩ in the above. (2) Let U be a unitary matrix and suppose that λ is an eigenvalue with |λ⟩ its corresponding eigenvector. Then we have

\[ |\lambda\rangle = U^{\dagger}U|\lambda\rangle = U^{\dagger}(\lambda|\lambda\rangle) = \lambda U^{\dagger}|\lambda\rangle = \lambda\lambda^{*}|\lambda\rangle, \tag{14.89} \]

where with the last equality we used Theorem 14.7. Thus

\[ (1 - \lambda\lambda^{*})|\lambda\rangle = 0. \tag{14.90} \]

As |λ⟩ ≠ 0 is assumed, 1 − λλ* = 0. That is,

\[ \lambda\lambda^{*} = |\lambda|^2 = 1. \tag{14.91} \]


Thus, the eigenvalues of a unitary matrix have unit absolute value. Let those eigenvalues be λₖ = e^{iθₖ} (k = 1, 2, ⋯; θₖ: real). Then, from (12.11) we have

\[ \det U = \prod_k \lambda_k = e^{i(\theta_1 + \theta_2 + \cdots)} \quad \text{and} \quad |\det U| = \prod_k |\lambda_k| = \prod_k \left|e^{i\theta_k}\right| = 1. \]

That is, any unitary matrix has a determinant of unit absolute value.
Example 14.2 Let us think of the following unitary matrix R:

\[ R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \tag{14.92} \]

The characteristic equation is

\[ \begin{vmatrix} \cos\theta - \lambda & -\sin\theta \\ \sin\theta & \cos\theta - \lambda \end{vmatrix} = \lambda^2 - 2\lambda\cos\theta + 1 = 0. \tag{14.93} \]

Solving (14.93), we have

\[ \lambda = \cos\theta \pm \sqrt{\cos^2\theta - 1} = \cos\theta \pm i|\sin\theta|. \]

1. θ = 0: This is a trivial case. The matrix R has automatically been diagonalized to be an identity matrix. Eigenvalues are 1 (double root).
2. θ = π: This is a trivial case. Eigenvalues are −1 (double root).
3. θ ≠ 0, π: Let us think of 0 < θ < π. Then λ = cos θ ± i sin θ. As a diagonalizing unitary matrix U, we get

\[ U = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{i}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \end{pmatrix}, \quad U^{\dagger} = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \\ \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} \end{pmatrix}. \]

Therefore, we have

\[ U^{\dagger}RU = \begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \\ \dfrac{1}{\sqrt{2}} & -\dfrac{i}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} \dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}} \\ -\dfrac{i}{\sqrt{2}} & \dfrac{i}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix}. \tag{14.94} \]

A trace of the resulting matrix is 2 cos θ. In the case of π < θ < 2π, we get a diagonal matrix similarly. The confirmation is left for readers. □
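A quick numerical confirmation of (14.94); a sketch, with U taken from the text and an arbitrary angle θ:

```python
import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
U = np.array([[1, 1], [-1j, 1j]]) / np.sqrt(2)

D = U.conj().T @ R @ U
# Diagonal elements are exp(+i theta) and exp(-i theta); trace = 2 cos(theta).
print(np.allclose(D, np.diag([np.exp(1j * theta), np.exp(-1j * theta)])))  # True
print(np.isclose(np.trace(D).real, 2 * np.cos(theta)))                      # True
```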

14.5 Hermitian Quadratic Forms

The Hermitian quadratic forms appeared in, e.g., (13.34) and (13.83) in relation to Gram matrices in Sect. 13.2. The Hermitian quadratic forms have wide applications in the fields of mathematical physics and materials science.

Let H be an Hermitian operator and |x⟩ a vector in an inner product space Vⁿ. We define the Hermitian quadratic form in an arbitrary orthonormal basis as follows:

\[ \langle x|H|x\rangle \equiv (x_1^{*} \cdots x_n^{*})\,H\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \sum_{i,j} x_i^{*}(H)_{ij}x_j, \]

where |x⟩ is represented as a column vector, as already mentioned in Sect. 13.4. Let us start with the unitary diagonalization of (13.40), where a Gram matrix is a kind of Hermitian matrix. Following similar procedures as in the case of (13.36), we obtain a diagonal matrix and an inner product such that

\[ \langle x|H|x\rangle = (\xi_1^{*} \cdots \xi_n^{*})\begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}\begin{pmatrix} \xi_1 \\ \vdots \\ \xi_n \end{pmatrix} = \lambda_1|\xi_1|^2 + \cdots + \lambda_n|\xi_n|^2. \tag{14.95} \]

Notice, however, that a Gram matrix comprising basis vectors (which are linearly independent) is positive definite. Remember that if a Gram matrix is constructed as B†B (Hermitian as well), then according to whether B is non-singular or singular, B†B is either positive definite or non-negative. The Hermitian matrix we are dealing with here, in general, does not necessarily possess the positive definite or non-negative feature. Yet, remember that ⟨x|H|x⟩ and the eigenvalues λ₁, ⋯, λₙ are real.
Positive definiteness of matrices is an important concept in relation to the Hermitian (and real) quadratic forms (see Sect. 13.2). In particular, in the case where all the matrix elements are real and |x⟩ is defined in a real domain, we are dealing with the quadratic form with respect to a real symmetric matrix. In the case of the real quadratic forms, we sometimes adopt the following notation [1, 2]:

\[ A[\mathbf{x}] \equiv \mathbf{x}^{T}A\mathbf{x} = \sum_{i,j=1}^{n} a_{ij}x_ix_j; \quad \mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \]

where A = (aᵢⱼ) is a real symmetric matrix and xᵢ (1 ≤ i ≤ n) are real numbers.


Here, we wish to show that the positive definiteness is invariant under a transformation P^TAP, where P is non-singular. In other words, if A > 0, we must have

\[ A' = P^{T}AP > 0. \tag{14.96} \]

In (14.96) the transformation A → A′ is said to be an equivalence transformation (or equivalent transformation) of A by P. Also, if A and A′ satisfy (14.96), A and A′ are said to be equivalent and we write [1, 2]

\[ A' \simeq A. \]

Note that the equivalence transformation satisfies the equivalence law; readers, check it. To prove the above proposition stated by (14.96), we take in advance the notion of the polar decomposition of a matrix (see Sect. 24.9.1). The point is that any real non-singular matrix P is uniquely decomposed into a product P = OS, where O is an orthogonal matrix and S is a positive definite real symmetric matrix (see Corollary 24.3).

Then, we have

\[ A' = (OS)^{T}A(OS) = S\left(O^{T}AO\right)S = S\left(O^{-1}AO\right)S. \]

Defining Ã ≡ O⁻¹AO, we have Ã > 0, because the similarity transformation of A by O keeps its eigenvalues unchanged. Next, suppose that S is diagonalized through an orthogonal matrix Õ; this is always possible on the basis of Theorem 14.5. Then, we have

\[ \tilde{O}^{T}A'\tilde{O} = \left(\tilde{O}^{T}S\tilde{O}\right)\left(\tilde{O}^{T}\tilde{A}\tilde{O}\right)\left(\tilde{O}^{T}S\tilde{O}\right) = D\bar{A}D, \tag{14.97} \]

where D is a diagonal matrix given by

\[ D = \tilde{O}^{T}S\tilde{O} = \begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{pmatrix} \]

with dᵢ > 0 (i = 1, ⋯, n). Also, we have Ā = Õ^TÃÕ. Since Ã > 0, we have Ā > 0.
Meanwhile, let us think of the inner product ⟨x|DĀD|x⟩, where x is an arbitrary real vector in Vⁿ. We have

\[ \langle x|D\bar{A}D|x\rangle = (x_1 \cdots x_n)\,D\bar{A}D\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \left[(x_1 \cdots x_n)D\right]\bar{A}\left[D\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right] = (x_1d_1 \cdots x_nd_n)\,\bar{A}\begin{pmatrix} x_1d_1 \\ \vdots \\ x_nd_n \end{pmatrix}. \tag{14.98} \]

Suppose here that (x₁ ⋯ xₙ) ≠ 0; that is, at least one of the xᵢ (i = 1, ⋯, n) is a non-zero (real) number. Then, at least one of the xᵢdᵢ (i = 1, ⋯, n) is a non-zero (real) number as well. Since Ā > 0, the rightmost side of (14.98) is positive. This implies that ⟨x|DĀD|x⟩ is positive with arbitrarily chosen (x₁ ⋯ xₙ) ≠ 0. Namely, DĀD > 0.
Thus, from (14.97) and (13.48) we have A′ > 0, because a similarity transformation of A′ by Õ keeps its eigenvalues unchanged.
Thus, we have established the next important theorem.
Theorem 14.11 [1, 2] Let A be a real symmetric matrix with A > 0. Then, we have P^TAP > 0, where P is a real non-singular matrix.
Since A is a real symmetric matrix, we must have a suitable orthogonal matrix Ô that diagonalizes A such that

\[ \hat{O}^{T}A\hat{O} = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{pmatrix}, \]

where λᵢ (1 ≤ i ≤ n) are the eigenvalues of A. Since A > 0 from the assumption, we should have λᵢ > 0 for all i (1 ≤ i ≤ n) from (13.48). Then, we have

\[ \det\left(\hat{O}^{T}A\hat{O}\right) = \det\hat{O}^{T}(\det A)\det\hat{O} = (\pm 1)\det A(\pm 1) = \det A = \prod_{i=1}^{n}\lambda_i. \]

Notice that we distinguish the equivalence transformation from the similarity transformation P⁻¹AP. Nonetheless, if we choose an orthogonal matrix O for P, the two transformations are the same because O^T = O⁻¹.
We often encounter real quadratic forms in the field of electromagnetism. A typical example is the trace of electromagnetic fields observed with elliptically or circularly polarized light (see Sect. 7.3). The permittivity tensor of anisotropic media such as crystals (either inorganic or organic) is another example. A tangible example of this appeared in Sect. 9.5.2 with an anisotropic organic crystal.
Regarding the real quadratic forms, we have the following important theorem (Theorem 14.12). To prove Theorem 14.12, we need Theorem 14.11.
Theorem 14.12 [1, 2] Let A = (aᵢⱼ) be a n-dimensional real symmetric matrix. Let A^{(k)} be the k-dimensional principal submatrices described by

\[ A^{(k)} = \begin{pmatrix} a_{i_1i_1} & a_{i_1i_2} & \cdots & a_{i_1i_k} \\ a_{i_2i_1} & a_{i_2i_2} & \cdots & a_{i_2i_k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{i_ki_1} & a_{i_ki_2} & \cdots & a_{i_ki_k} \end{pmatrix} \quad (1 \le i_1 < i_2 < \cdots < i_k \le n), \]

where the principal submatrices mean matrices made by striking out rows and columns that intersect on diagonal elements. Then, we have

\[ A > 0 \iff \det A^{(k)} > 0 \quad (1 \le k \le n). \tag{14.99} \]

Proof First, suppose that A > 0. Then, in the quadratic form A[x], equating (n − k) variables xₗ = 0 (l ≠ i₁, i₂, ⋯, i_k), we obtain a quadratic form

\[ \sum_{\mu,\nu=1}^{k} a_{i_\mu i_\nu}x_{i_\mu}x_{i_\nu}. \]

Since A > 0, this (partial) quadratic form is positive definite as well; i.e., we have

\[ A^{(k)} > 0. \]

Therefore, we get

\[ \det A^{(k)} > 0. \]

This is due to (13.48). Notice that detA^{(k)} is said to be a principal minor. Thus, we have proven ⟹ of (14.99).
To prove ⟸, in turn, we use mathematical induction. If n = 1, we have a trivial case; i.e., A is merely a real positive number. Suppose that ⟸ is true of n − 1. Then, we have A^{(n−1)} > 0 by supposition. Thus, it follows that it will suffice to show A > 0 on condition that A^{(n−1)} > 0 and, in addition, detA > 0. Let A be a n-dimensional real symmetric non-singular matrix such that

\[ A = \begin{pmatrix} A^{(n-1)} & \mathbf{a} \\ \mathbf{a}^{T} & a_n \end{pmatrix}, \]

where A^{(n−1)} is a symmetric matrix and non-singular as well. We define P such that

\[ P = \begin{pmatrix} E & \left(A^{(n-1)}\right)^{-1}\mathbf{a} \\ \mathbf{0}^{T} & 1 \end{pmatrix}, \]

where E is a (n − 1, n − 1) unit matrix. Notice that detP = detE · 1 = 1 ≠ 0, indicating that P is non-singular.
596 14 Hermitian Operators and Unitary Operators

E 0
PT = -1 :
aT Aðn - 1Þ 1

For this expression, consider a non-singular matrix S. Then, we have SS-1 = E.


Taking its transposition, we have (S-1)TST = E. Therefore, if S is a symmetric matrix
(S-1)TS = E; i.e., (S-1)T = S-1. Hence, an inverse matrix of a symmetric matrix is
symmetric as well. Then, for a symmetric matrix A(n - 1) we have

-1 T -1 T -1
aT Aðn - 1Þ = Aðn - 1Þ = Aðn - 1Þ
T
aT a:

Therefore, A can be expressed as

Aðn - 1Þ 0
A = PT -1 P:
0 an - Aðn - 1Þ ½a

-1
E 0 Aðn - 1Þ 0 E Aðn - 1Þ a :
= ðn - 1Þ - 1 ðn - 1Þ - 1
aT A 1 0 an - A ½ a 0 1
ð14:100Þ

Now, taking a determinant of (14.100), we have

-1
detA = detPT detAðn - 1Þ an - Aðn - 1Þ ½a detP
-1
= detAðn - 1Þ an - Aðn - 1Þ ½ a :

By supposition, we have detA(n - 1) > 0 and detA > 0. Hence, we have

-1
an - Aðn - 1Þ ½a > 0:

-1 xðn - 1Þ
Putting a~n  an - Aðn - 1Þ ½a and x  xn
, we get

Aðn - 1Þ 0
½x = Aðn - 1Þ xðn - 1Þ þ an xn 2 :
0 an

Since A(n - 1) > 0 and a~n > 0, we have

~  Aðn - 1Þ 0
A > 0:
0 an

Meanwhile, A is expressed as
14.5 Hermitian Quadratic Forms 597

~ or A ≃ A:
A = PT AP ~

From (14.96), we have A > 0. These complete the proof.


We also have a related theorem (the following Theorem 14.13) for an Hermitian
quadratic form.

Theorem 14.13 Let A = (aij) be a n-dimensional Hermitian matrix. Let A~ðkÞ be k-


dimensional principal submatrices. Then, we have

A > 0 ⟺ detA~ðkÞ > 0 ð1 ≤ k ≤ nÞ,

where A~ðkÞ is described as

a11 ⋯ a1k
A~ðkÞ = ⋮ ⋱ ⋮ :
ak1 ⋯ akk

The proof is left for readers.


Example 14.3 Let us consider a following Hermitian (real symmetric) matrix and
corresponding Hermitian quadratic form.

5 2 5 2 x1
H= , hxjHjxi = ðx1 x2 Þ : ð14:101Þ
2 1 2 1 x2

Principal minors of H are 5 (>0) and 1 (>0) and detH = 5 - 4 = 1 (>0).


Therefore, from Theorem 14.12 we have H > 0. A characteristic equation gives the
following eigenvalues; i.e.,
p p
λ1 = 3 þ 2 2, λ2 = 3 - 2 2:

Both the eigenvalues are positive as anticipated. As a diagonalizing matrix R, we


get
p p
1- 2
R= 1þ 2
1 1
:

To obtain a unitary matrix, we have to seek norms of column vectors.


p
Corresponding to λ1 and λ2, we estimate their norms to be 4 þ 2 2 and
p
4 - 2 2, respectively. Using them, as a unitary matrix U we get
598 14 Hermitian Operators and Unitary Operators

1 1
p - p
U= 4-2 2 4þ2 2 : ð14:102Þ
1 1
p p
4þ2 2 4-2 2

Thus, performing the matrix diagonalization, we obtain a diagonal matrix D such


that
p
{ 3þ2 2 0p
D = U HU = : ð14:103Þ
0 3-2 2

Let us view the above unitary diagonalization in terms of coordinate transforma-


tion. Using the above matrix U and changing (14.101) as in (13.36),

5 2 x
hxjHjxi = ðx1 x2 ÞUU { UU { 1
2 1 x2
p
3þ2 2 0p x
= ðx1 x2 ÞU U{ 1 : ð14:104Þ
0 3-2 2 x2

Making an argument analogous to that of Sect. 13.2 and using similar notation,
we have

x~1
h~xjH 0 j~xi = ð x~1 x~2 ÞH 0 x~2
= ðx1 x2 ÞUU { HUU { x1
x2
= hxjHjxi: ð14:105Þ

That is, we have

x1 x1
= U{ : ð14:106Þ
x2 x2

Or, taking an adjoint of (14.106), we have equivalently ð x~1 x~2 Þ = ðx1 x2 ÞU.
x~1 x1
Notice here that x~2
and x2
are real and that U is real (i.e., orthogonal matrix).
Likewise,

H 0 = U { HU: ð14:107Þ

x1 ~x1
Thus, it follows that x2
and ~x2
are different column vector representations
of the identical vector that is viewed in reference to two different sets of orthonormal
bases (i.e., different coordinate systems).
From (14.102), as an approximation we have
14.6 Simultaneous Eigenstates and Diagonalization 599

Fig. 14.1 Ellipse obtained


through diagonalization of
an Hermitian quadratic
form. The angle θ is about
22.5°

0:92 - 0:38 cos 22:5 ° - sin 22:5 °


Uffi ffi : ð14:108Þ
0:38 0:92 sin 22:5 ° cos 22:5 °

Equating hx| H| xi to a constant, we get a hypersurface in a plane. Choosing 1 for a


constant, we get an equation of hypersurface (i.e., an ellipse) as a function of x~1 and
x~2 such that

x~1 2 x~2 2
2
þ 2
= 1: ð14:109Þ
1p 1p
3þ2 2 3-2 2

Figure 14.1 depicts the ellipse. □

14.6 Simultaneous Eigenstates and Diagonalization

In quantum physics, a concept of simultaneous eigenstate is important and has been


briefly mentioned in Sect. 3.3. To rephrase this concept, suppose that there are two
operators B and C and ask whether the two operators (or more) possess a common set
of eigenvectors. The question is boiled down to whether the two operators commute.
To address this question, the following theorem is important:
600 14 Hermitian Operators and Unitary Operators

Theorem 14.14 [4] Two Hermitian matrices B and C commute if and only if there
exists a complete orthonormal set of common eigenvectors.
Proof Suppose that there exists a complete orthonormal set of common eigenvec-
tors {jxi; ki (1 ≤ k ≤ mi)} that span the linear vector space, where i and mi are a
positive integer and that jxii corresponds to an eigenvalue bi of B and ci of C. Note
that if mi is 1, we say that the spectrum is non-degenerate and that if mi is equal to
two or more, the spectrum is said to be degenerate. Then we have

Bðjxi iÞ = bi j xi i, C ðjxi iÞ = ci j xi i: ð14:110Þ

Therefore, BC(| xii) = B(ci| xii) = ciB(| xii) = cibi j xii. Similarly, CB(| xii) = cibi j xii.
Consequently, (BC - CB)(| xii) = 0 for any jxii. As all the set of jxii span the vector
space, BC - CB = 0, namely BC = CB.
In turn, assume that BC = CB and that B(| xii) = bi j xii, where jxii are
orthonormal. Then, we have

CBðjxi iÞ = bi C ðjxi iÞ ⟹ B½Cðjxi iÞ = bi C ðjxi iÞ: ð14:111Þ

This implies that C(| xii) is an eigenvector of B corresponding to the eigenvalue bi.
We have two cases.
1. The spectrum is non-degenerate: The spectrum is said to be non-degenerate if
only one eigenvector belongs to an eigenvalue. In other words, multiplicity of bi
is one. Then, C(| xii) must be equal to some constant times jxii; i.e., C(| xii) = ci j xii.
That is, jxii is an eigenvector of C corresponding to an eigenvalue ci. That is, jxii
is a common eigenvector to B and C. This completes the proof for the
non-degenerate case.
2. The spectrum is degenerate: The spectrum is said to be degenerate if two or more
eigenvectors belong to an eigenvalue. Multiplicity of bi is two or more; here
ð1Þ ð2Þ ðmÞ
suppose that the multiplicity is m (m ≥ 2). Let j xi i, j xi i, ⋯, and j xi i
be linearly independent vectors and belong to the eigenvalue bi of B. Then, from
the assumption we have m eigenvectors

ð1Þ ð2Þ ðmÞ


C jxi i , C jxi i , ⋯, C jxi i

that belong to an eigenvalue bi of B. This means that individual


ðμÞ
C jxi i ð1 ≤ μ ≤ mÞ are described by linear combination of
ð1Þ ð2Þ ðmÞ
j xi i, j xi i, ⋯, and j xi i. What we want to prove is to show that suitable
linear combination of these m vectors constitutes an eigenvector corresponding to an
eigenvalue cμ of C. Here, to avoid complexity we denote the multiplicity by
m instead of the above-mentioned mi.
ð μÞ
The vectors C jxi i ð1 ≤ μ ≤ mÞ can be described as
14.6 Simultaneous Eigenstates and Diagonalization 601

ð1Þ m ðjÞ ðmÞ m ðjÞ


C jxi i = α
j = 1 j1
j xi i, ⋯, C jxi i = α
j = 1 jm
j xi i: ð14:112Þ

Using full matrix representation, we have

γ1
m ðk Þ ð1Þ ðmÞ
C γ jx i = jxi i⋯jxi i C
k=1 k i

γm
α11 ⋯ α1m γ1
ð1Þ ðmÞ
= jxi i⋯jxi i ⋮ ⋱ ⋮ ⋮ : ð14:113Þ
αm1 ⋯ αmm γm

In (14.113), we adopt the notation of (11.37). Since (αij) is a matrix representation


of an Hermitian operator C, (αij) is Hermitian as well. More specifically, if we take an
ðlÞ
inner product of a vector expressed in (14.112) with j xi i, then we have

ðlÞ ðk Þ ðlÞ m ðjÞ m ðlÞ ðjÞ m


xi jCxi = xi j α x
j = 1 jk i
= α
j = 1 jk
xi jxi = α δ
j = 1 jk lj

= αlk ð1 ≤ k, l ≤ mÞ, ð14:114Þ

where the third equality comes from the orthonormality of the basis vectors.
Meanwhile,
 
ðlÞ ðk Þ ðk Þ ðlÞ ðk Þ ðlÞ
xi jCxi = xi jC{ xi = xi jCxi = αkl  , ð14:115Þ

where the second equality comes from the Hermiticity of C. From (14.114) and
(14.115), we get

αlk = αkl  ð1 ≤ k, l ≤ mÞ: ð14:116Þ

This indicates that (αij) is in fact Hermitian.


We are seeking the condition under which linear combinations of the eigenvec-
ðk Þ
tors j xi ið1 ≤ k ≤ mÞ for B are simultaneously eigenvectors of C. If the linear
ðk Þ
combination m k = 1 γ k j xi i is to be an eigenvector of C, we must have

m ðk Þ m ðjÞ
C γ jx
k=1 k i
=c γ jx
j=1 j i
: ð14:117Þ

Considering (14.113), we have

α11 ⋯ α1m γ1 γ1
ð1Þ ðmÞ ð1Þ ðmÞ
jxi i⋯jxi i ⋮ ⋱ ⋮ ⋮ = jxi i⋯jxi i c ⋮ : ð14:118Þ
αm1 ⋯ αmm γm γm
602 14 Hermitian Operators and Unitary Operators

ðjÞ
The vectors j xi ið1 ≤ j ≤ mÞ span an invariant subspace (i.e., an eigenspace
corresponding to an eigenvalue of bi). Let us call this subspace Wm. Consequently,
ðjÞ
in (14.118) we can equate the scalar coefficients of individual j xi ið1 ≤ j ≤ mÞ.
Then, we get

α11 ⋯ α1m γ1 γ1
⋮ ⋱ ⋮ ⋮ =c ⋮ : ð14:119Þ
αm1 ⋯ αmm γm γm

This is nothing other than an eigenvalue equation. Since (αij) is an Hermitian


matrix, there should be m eigenvalues cμ some of which may be identical (the
degenerate case). Moreover, we can always decide m orthonormal column vectors
by solving (14.119). We denote them by γ (μ) (1 ≤ μ ≤ m) that belong to cμ. Rewriting
(14.119), we get

ðμÞ ð μÞ
α11 ⋯ α1m γ1 γ1
⋮ ⋱ ⋮ ⋮ = cμ ⋮ : ð14:120Þ
αm1 ⋯ αmm γ ðmμÞ γ ðmμÞ

Equation (14.120) implies that we can construct a (m, m) unitary matrix from the
m orthonormal column vectors γ (μ). Using the said unitary matrix, we will be able to
diagonalize (αij) according to Theorem 14.5.
Having determined m eigenvectors γ (μ) (1 ≤ μ ≤ m), we can construct a set of
eigenvectors such that

ð μÞ m ð μÞ ðk Þ
yi = γ
k=1 k
xi : ð14:121Þ

ð μÞ
Finally let us confirm that j yi ið1 ≤ μ ≤ mÞ in fact constitute an orthonormal
basis. To show this, we have

ðνÞ ð μÞ m ðνÞ ðk Þ m ðμÞ ðlÞ


yi jyi = γ xi j
k=1 k
γ xi
l=1 l

m m ðνÞ  ðμÞ ðk Þ ðlÞ m m ðνÞ  ðμÞ


= k=1 l=1
γk γl xi jxi = k=1 l=1
γk γ l δkl

m ðνÞ ðμÞ
= k=1
γk γ k = δνμ : ð14:122Þ

The last equality comes from the fact that a matrix comprising m orthonormal
ðμÞ
column vectors γ (μ) (1 ≤ μ ≤ m) forms a unitary matrix. Thus, j yi ið1 ≤ μ ≤ mÞ
certainly constitute an orthonormal basis.
The above completes the proof.
14.6 Simultaneous Eigenstates and Diagonalization 603

Theorem 14.14 can be restated as follows: Two Hermitian matrices B and C can
be simultaneously diagonalized by a unitary similarity transformation. As mentioned
above, we can construct a unitary matrix U such that

ð1Þ ðmÞ
γ1 ⋯ γ1
U= ⋮ ⋱ ⋮ :
γ ðm1Þ ⋯ γ ðmmÞ

Then, using U, matrices B and C are diagonalized such that

bi c1
bi c2
U { BU = , U { CU = : ð14:123Þ
⋱ ⋱
bi cm

Note that both B and C are represented in an invariant subspace Wm.


As we have already seen in Part I that dealt with an eigenvalue problem of a
hydrogen-like atom, squared angular momentum (L2) and z-component of angular
momentum (Lz) possess a mutual set of eigenvectors and, hence, their eigenvalues
are determined at once. Related matrix representations to (14.123) were given in
(3.159). Conversely, this was not the case with a set of operators Lx, and Ly, and Lz;
see (3.30). Yet, we pointed out the exceptional case where these three operators
along with L2 take an eigenvalue zero in common which an eigenstate Y 00 ðθ, ϕÞ 
1=4π corresponds to. Nonetheless, no complete orthonormal set of common
eigenvectors exists with the set of operators Lx, and Ly, and Lz. This fact is equivalent
to that these three operators are non-commutative among them. In contrast, L2 and Lz
share a complete orthonormal set of common eigenvectors and, hence, are
commutable.
ð1Þ ð2Þ ðmÞ
Notice that C jxi i , C jxi i , ⋯, and C jxi i are not necessarily line-
arly independent (see Sect. 11.4). Suppose that among m eigenvalues cμ (1 ≤ μ ≤ m),
some cμ = 0. Then, detC = 0 according to (13.48) and (14.123). This means that C is
ð1Þ ð2Þ ðmÞ
singular. In that case, C jxi i , C jxi i , ⋯, and C jxi i are linearly
dependent. In Sect. 3.3, in fact, we had j Lz Y 00 ðθ, ϕÞ = j L2 Y 00 ðθ, ϕÞ = 0. But this
special situation does not affect the proof of Theorem 14.14.
We know that any matrix A can be decomposed such that

1 1
A= A þ A{ þ i A - A{ , ð14:124Þ
2 2i

where we put B  12 A þ A{ and C  2i1 A - A{ ; both B and C are Hermitian. That


is, any matrix A can be decomposed to two Hermitian matrices in such a way that
604 14 Hermitian Operators and Unitary Operators

A = B þ iC: ð14:125Þ

Note here that B and C commute if and only if A and A{ commute, that is A is a
normal matrix. In fact, from (14.125) we get

AA{ - A{ A = 2iðCB - BC Þ: ð14:126Þ

From (14.126), if and only if B and C commute, AA{ - A{A = 0, i.e., AA{ = A{A.
This indicates that A is a normal matrix.
Taking account of the above context along with Theorem 14.14, we see that we
can construct the diagonalizing matrix of A from the complete orthonormal set of
eigenvectors common to B and C. This immediately implies that the diagonalizing
matrix is unitary. In other words, with respect to a normal operator A on Vn a
diagonalizing unitary matrix U must exist. On top of it, such matrix
U diagonalizes the Hermitian matrices B and C at once. Then, we have

U { AU = U { BU þ iU { CU
λ1 ⋯ μ1 ⋯ λ1 þ iμ1 ⋯
= ⋮ ⋱ ⋮ þi ⋮ ⋱ ⋮ = ⋮ ⋱ ⋮ ,
⋯ λn ⋯ μn ⋯ λn þ iμn
ð14:127Þ

where real eigenvalues λ1, ⋯, λn of B as well as those μ1, ⋯, μn of C are possibly


degenerate. The complex eigenvalues λ1 + iμ1, ⋯, λn + iμn are possibly degenerate
accordingly. Then, (14.127) can be rewritten as

ω1

ω1
ω2

U { AU = , ð14:128Þ
ω2

ωs

ωs

where ωl (1 ≤ l ≤ s ≤ n) denotes different complex eigenvalues possibly with several


of them degenerate. Equation (14.128) is equivalent to (14.71). Considering (14.74)
and (14.128) represents the spectral decomposition.
Following the argument made in Sect. 14.3, especially including (14.71)–(14.74),
we acknowledge that the above discussion is equivalent to the implication of
Theorem 14.5. Moreover, (14.12) implies that even with eigenvectors belonging to
References 605

a degenerate eigenvalue, we can choose mutually orthogonal (unit) vectors. Thus,


we recognize that Theorems 14.5, 14.8, and 14.14 are mutually interrelated. These
theorems and related concepts can be interpreted in terms of the projection operators
and spectral decomposition. We will survey the related aspects in a broader context
from a quantum-mechanical point of view later (see Chap. 24).
In Sects. 14.1 and 14.3, we mentioned the spectral decomposition. There, we
showed a case where projection operators commute with one another; see (14.23).
Thus, in light of Theorem 14.14, those projection operators can be diagonalized at
once to be expressed as, e.g., (14.72). This is a conspicuous feature of the projection
operators.

References

1. Satake I-O (1975) Linear algebra (pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Byron FW, Fuller RW (1992) Mathematics of classical and quantum physics. Dover, New York
5. Mirsky L (1990) An introduction to linear algebra. Dover, New York
Chapter 15
Exponential Functions of Matrices

In Chap. 12, we dealt with a function of matrices. In this chapter we study several
important definitions and characteristics of functions of matrices. If elements of
matrices consist of analytic functions of a real variable, such matrices are of
particular importance. For instance, differentiation can naturally be defined with
the functions of matrices. Of these, exponential functions of matrices are widely used
in various fields of mathematical physics. These functions frequently appear in a
system of differential equations. In Chap. 10, we showed that SOLDEs with suitable
BCs can be solved using Green’s functions. In the present chapter, in parallel, we
show a solving method based on resolvent matrices. The exponential functions of
matrices have broad applications in group theory that we will study in Part IV. In
preparation for it, we study how the collection of matrices forms a linear vector
space. In accordance with Chap. 13, we introduce basic notions of inner product and
norm to the matrices.

15.1 Functions of Matrices

We consider a matrix in which individual components are differentiable with respect


to a real variable such that

Aðt Þ = aij ðt Þ : ð15:1Þ

We define the differentiation as

dAðt Þ
A0 ðt Þ =  a0ij ðt Þ : ð15:2Þ
dt

© Springer Nature Singapore Pte Ltd. 2023 607


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_15
608 15 Exponential Functions of Matrices

We have

½Aðt Þ þ Bðt Þ0 = A0 ðt Þ þ B0 ðt Þ, ð15:3Þ


½Aðt ÞBðt Þ0 = A0 ðt ÞBðt Þ þ Aðt ÞB0 ðt Þ: ð15:4Þ

To get (15.4), putting A(t)B(t)  C(t), we have

cij ðt Þ = aik ðt Þbkj ðt Þ, ð15:5Þ


k

where C(t) = (cij(t)), B(t) = (bij(t)). Differentiating (15.5), we get

c0ij ðt Þ = a0ik ðt Þbkj ðt Þ þ aik ðt Þb0kj ðt Þ, ð15:6Þ


k k

namely, we have C′(t) = A′(t)B(t) + A(t)B′(t), i.e., (15.4) holds.


Analytic functions are usually expanded as an infinite power series. As an
analogue, a function of a matrix is expected to be expanded as such. Among various
functions of matrices, exponential functions are particularly important and have a
wide field of applications. As in the case of a real or complex number x, the
exponential function of matrix A can be defined as

1 2 1
exp A  E þ A þ A þ ⋯ þ Aν þ ⋯, ð15:7Þ
2! ν!

where A is a (n, n) square matrix and E is a (n, n) identity matrix. Note that E is used
instead of a number 1 that appears in the power series expansion of expx described as

1 2 1
exp x  1 þ x þ x þ ⋯ þ xν þ ⋯:
2! ν!

Here we define the convergence of a series of matrix as in the following


definition:
Definition 15.1 Let A0, A1, ⋯, Aν, ⋯ [Aν  (aij, ν) (1 ≤ i, j ≤ n)] be a sequence of
matrices. Let aij, 0, aij, 1, ⋯, aij, ν, ⋯ be the corresponding sequence of each matrix
component. Let us define a series of matrices as A ~  ~aij = A0 þ A1 þ ⋯ þ Aν þ
⋯ and a series of individual matrix components as
aij  aij,0 þ aij,1 þ ⋯ þ aij,ν þ ⋯. Then, if each ~aij converges, we say that A
~ ~
converges.
Regarding the matrix notation in Definition 15.1, readers are referred to (11.38).
Now, let us show that the power series of (15.7) converges [1, 2].
15.1 Functions of Matrices 609

Let A be a (n, n) square matrix. Putting A = (aij) and max j aij j = M and
i, j
ðνÞ ðνÞ
defining A  aij , where aij is defined as a (i, j) component of Aν, we get
ν

ðνÞ
aij ≤ nν M ν ð1 ≤ i, j ≤ nÞ: ð15:8Þ

ð0Þ
Note that A0 = E. Hence, aij = δij . Then, if ν = 0, (15.8) routinely holds. When
ν = 1, (15.8) obviously holds as well from the definition of M and assuming n ≥ 2;
we are considering (n, n) matrices with n ≥ 2. We wish to show that (15.8) holds with
any ν (≥2) using mathematical induction. Suppose that (15.8) holds with ν = ν - 1.
That is,

ðν - 1Þ
aij ≤ nν - 1 M ν - 1 ð1 ≤ i, j ≤ nÞ:

Then, we have

ðνÞ n n ðν - 1Þ
ðAν Þij  aij = Aν - 1 A ij
= k=1
Aν - 1 ik
ðAÞkj = a
k = 1 ik
akj : ð15:9Þ

Thus, we get

ðνÞ n ðν - 1Þ n ðν - 1Þ
aij = a
k = 1 ik
akj ≤ k=1
aik  j akj j ≤ n  nν - 1 M ν - 1  M

= nν M ν : ð15:10Þ

Consequently (15.8) holds with ν = ν.


Meanwhile, we have a next relation such that

1 1 ν ν
enM = nM : ð15:11Þ
ν=0 ν!

The series (15.11) certainly converges; note that nM is not a matrix but a number.
From (15.10) we obtain

1 1 ðνÞ 1 1 ν ν
a ≤ n M = enM :
ν=0 ν! ij ν=0 ν!
ð νÞ
This implies that 1 1
ν = 0 ν! aij converges (i.e., absolute convergence [3]). Thus,
from Definition 15.1, (15.7) converges.
To further examine the exponential functions of matrices, in the following
proposition, we show that a collection of matrices forms a linear vector space.
610 15 Exponential Functions of Matrices

Proposition 15.1 Let M be a collection of all (n, n) square matrices. Then, M forms
a linear vector space.
Proof Let A = aij 2 M and B = bij 2 M be (n, n) square matrices. Then,
αA = (αaij) and A + B = (aij + bij) are again (n, n) square matrices, and so we
have αA 2 M and A þ B 2 M . Thus, M forms a linear vector space.
2
In Proposition 15.1, M forms a n2-th order vector space V n . To define an inner
ðk Þ
product in this space, we introduce j PðlÞ ðk, l = 1, 2, ⋯, nÞ as a basis set of n2
vectors. Now, let A = (aij) be a (n, n) square matrices. To explicitly show that A is a
vector in the inner product space, we express it as

n ðk Þ
j Ai = a
k,l = 1 kl
j PðlÞ :

Then, an adjoint matrix of jAi is written as

n ðk Þ
hA jj Ai{ = a
k,l = 1 kl

PðlÞ j , ð15:12Þ

ðk Þ ðk Þ
where PðlÞ j is an adjoint matrix of j PðlÞ . With the notations in (15.12), see Sects.
ðk Þ
1.4 and 13.3; for PðlÞ see Sect. 14.1. Meanwhile, let B = (bij) be another (n, n) square
matrix denoted by

n ðsÞ
j Bi = b
s,t = 1 st
j PðtÞ :

ðk Þ
Here, let us define an orthonormal basis set j PðlÞ and an inner product between
vectors described in a form of matrices such that

ðk Þ ðsÞ
PðlÞ jPðtÞ  δks δlt :

Then, from (15.12) the inner product between A and B is given by

n ðk Þ ðsÞ n
hAjBi  a b
k,l,s,t = 1 kl st
PðlÞ jPðtÞ = a b δ δ
k,l,s,t = 1 kl st ks lt
n
= a b :
k,l = 1 kl kl
ð15:13Þ

The relation (15.13) leads to a norm already introduced in (13.8). The norm of a
matrix is defined as
15.2 Exponential Functions of Matrices and Their Manipulations 611

n 2
kAk  hAjAi = i,j = 1
aij : ð15:14Þ

We have the following theorem regarding two matrices.


Theorem 15.1 Let A = (aij) and B = (bij) be two square matrices. Then, we have a
following inequality:

kABk ≤ kAk  kBk: ð15:15Þ

Proof Let C = AB. Then, from Cauchy–Schwarz inequality (Sect. 13.1), we get

2 2 2
kC k2 = cij = a b
k ik kj
≤ jaik j2 blj
i, j i, j i, j k l

2
= i,k
jaik j2 blj = kAk2  kBk2 : ð15:16Þ
j, l

This leads to (15.15)

15.2 Exponential Functions of Matrices and Their


Manipulations

Using Theorem 15.1, we explore important characteristics of exponential functions


of matrices. For this we have a next theorem.
Theorem 15.2 [4] Let A and B be mutually commutative matrices, i.e., AB = BA.
Then, we have

expðA þ BÞ = exp A exp B = exp B exp A: ð15:17Þ

Proof Since A and B are mutually commutative, (A + B)n can be expanded through
the binomial theorem such that

1 n Ak Bn - k
ðA þ BÞn = k = 0 k! ðn - k Þ!
: ð15:18Þ
n!

Meanwhile, with an arbitrary integer m, we have


612 15 Exponential Functions of Matrices

Fig. 15.1 Number of


possible combinations (k, l).
Such combinations are
indicated with green points.
The number of the green −1
points of upper left triangle −2
is m(m + 1)/2. That for lower
right triangle is m(m + 1)/2 ⋯⋯⋯
as well. The total number of
the green points is m(m + 1)
accordingly. Adapted from +1 ⋯
Yamanouchi T, Sugiura M
(1960) Introduction to
continuous groups (New
Mathematics Series 18: in


Japanese) [4], with the

⋯⋯⋯
permission of Baifukan Co.,
Ltd., Tokyo
0
0 +1

2m 1 m An m Bn
n = 0 n!
ðA þ B Þn = n = 0 n! n = 0 n!
þ Rm , ð15:19Þ

where Rm has a form of Ak Bl =k!l!, where the summation is taken over all
ðk , lÞ
combinations of (k, l ) that satisfy max(k, l) ≥ m + 1 and k + l ≤ 2m. The number of all
the combinations (k, l ) is m(m + 1) accordingly; see Fig. 15.1 [4]. We remark that if
in (15.19) we put m = 0, (15.19) trivially holds with Rm = 0 (i.e., zero matrix). The
matrix Rm, however, changes with increasing m, and so we must evaluate it at
m → 1. Putting max(| |A|| , | |B|| , 1) = C and using Theorem 15.1, we get

jjAjjk jjBjjl
jjRm jj ≤ : ð15:20Þ
k!l!
ðk, lÞ

In (15.20) min(k, l) = 0, and so k ! l ! ≥ (m + 1)!. Since ||A||k||B||l ≤ Ck + l ≤ C2m,


we have

jjAjjk jjBjjl mðm þ 1ÞC 2m C 2m


≤ = , ð15:21Þ
k!l! ðm þ 1Þ! ðm - 1Þ!
ðk , lÞ

where m(m + 1) in the numerator comes from the number of combinations (k, l ) as
pointed out above. From Stirling’s interpolation formula [5], we have
15.2 Exponential Functions of Matrices and Their Manipulations 613

p
Γðz þ 1Þ ≈ 2πe - z zzþ2 :
1
ð15:22Þ

Replacing z with an integer m - 1, we get


p
ΓðmÞ = ðm - 1Þ! ≈ 2π e - ðm - 1Þ ðm - 1Þm - 2 :
1
ð15:23Þ

Then, we have

C2m em - 1 C 2m
½RHS of ð15:21Þ = ≈ p
ðm - 1Þ! 2π ðm - 1Þm - 2
1

m - 12
C eC 2
=p ð15:24Þ
2πe m - 1

and

m - 12
C eC 2
lim p = 0: ð15:25Þ
m → 1 2πe m - 1

Thus, from (15.20) and (15.21), we get

lim kRm k = 0:
m→1
ð15:26Þ

Remember that kRmk = 0 if and only if Rm = 0 (zero matrix); see (13.4) and
(13.8) in Sect. 13.1. Finally, taking limit (m → 1) of both sides of (15.19), we
obtain

2m 1 m An m Bn
lim n = 0 n!
ðA þ BÞn = lim n = 0 n! n = 0 n!
þ Rm :
m→1 m→1

Considering (15.26), we get

2m 1 m An m Bn
lim n=0
ðA þ BÞn = lim n=0 n=0
: ð15:27Þ
m→1 n! m→1 n! n!

This implies that (15.17) holds. It is obvious that if A and B commute, then expA
and expB commute. That is, expA exp B = exp B exp A. These complete the proof.
From Theorem 15.2, we immediately get the following property.

ð1Þ ðexp AÞ - 1 = expð- AÞ: ð15:28Þ

To show this, it suffices to find that A and -A commute. That is, A(-A) = -
A2 = (-A)A. Replacing B with -A in (15.17), we have
614 15 Exponential Functions of Matrices

expðA - AÞ = exp 0 = E = exp A expð- AÞ = expð- AÞ exp A: ð15:29Þ

This leads to (15.28). Notice here that exp0 = E from (15.7). We further have
important properties of exponential functions of matrices.

ð2Þ P - 1 ðexp AÞP = exp P - 1 AP : ð15:30Þ

ð3Þ exp AT = ðexp AÞT : ð15:31Þ

ð4Þ exp A{ = ðexp AÞ{ : ð15:32Þ

To show (2) we have


n
P - 1 AP = P - 1 AP P - 1 AP ⋯ P - 1 AP

= P - 1 A PP - 1 A PP - 1 ⋯ PP - 1 AP

= P - 1 AEAE⋯EAP = P - 1 An P: ð15:33Þ

Applying (15.33)–(15.7), we have

2 1 n 1
exp P - 1 AP = E þ P - 1 AP þ P - 1 AP þ ⋯ þ P - 1 AP þ⋯
2! n!

A2 An
= P - 1 EP þ P - 1 AP þ P - 1 P þ ⋯ þ P-1 P þ ⋯
2! n!

A2 An
= P-1 E þ A þ þ ⋯ þ þ ⋯ P = P - 1 ðexp AÞP: ð15:34Þ
2! n!

With (3) and (4), use the following equations:

= ðA n Þ{ :
n
= ðAn ÞT and A{
n
AT ð15:35Þ

The confirmation of (15.31) and (15.32) is left for readers. For future purpose, we
describe other important theorem and properties of the exponential functions of
matrices.
Theorem 15.3 Let a1, a2, ⋯, an be eigenvalues of a (n, n) square matrix A. Then,
expa1, exp a2, ⋯, exp an are eigenvalues of expA.
Proof From Theorem 12.1 we know that every (n, n) square matrix can be
converted to a triangle matrix by similarity transformation. Also, we know that
eigenvalues of a triangle matrix are given by its diagonal elements (Sect. 12.1). Let
P be a non-singular matrix used for such a similarity transformation. Then, we have
15.2 Exponential Functions of Matrices and Their Manipulations 615


a1
P - 1 AP ¼ ⋱ : ð15:36Þ
0 an

Meanwhile, let T be a triangle matrix described by


t 11
T¼ ⋱ : ð15:37Þ
0 t nn

Then, from a property of the triangle matrix, we have


tm
11
Tm ¼ ⋱ :
0 tm
nn

Applying this property to (15.36), we get


am
1
m
P - 1 AP ¼ ⋱ ¼ P - 1 Am P, ð15:38Þ
0 am
n

where the last equality is due to (15.33). Summing (15.38) over m following (15.34),
we obtain another triangle matrix expressed as

P - 1 ðexp AÞP ¼ ð exp a1  ⋱0 exp an Þ: ð15:39Þ

Once again considering that eigenvalues of a triangle matrix are given by its
diagonal elements and that the eigenvalues are invariant under a similarity transfor-
mation, (15.39) implies that expa1, exp a2, ⋯, exp an are eigenvalues of expA.
These complete the proof.
A question of whether the eigenvalues are a proper eigenvalue or a generalized
eigenvalue is irrelevant to Theorem 15.3. In other words, the eigenvalue may be a
proper one or a generalized one (Sect. 12.6). That depends solely upon the nature of
A in question.
Theorem 15.3 immediately leads to a next important relation. Taking a determi-
nant of (15.39), we get

det P - 1 ðexp AÞP = detP - 1 detðexp AÞdetP = detðexp AÞ


= ðexp a1 Þðexp a2 Þ⋯ðexp an Þ = expða1 þ a2 þ ⋯ þ an Þ
= expðTrAÞ: ð15:40Þ
616 15 Exponential Functions of Matrices

To derive (15.40), refer to (11.59) and (12.10) as well. Since (15.40) represents an
important property of the exponential functions of matrices, we restate it separately
as a following formula [4]:

ð5Þ detðexp AÞ = expðTrAÞ: ð15:41Þ

We have examined general aspects until now. Nonetheless, we have not yet
studied the relationship between the exponential functions of matrices and specific
types of matrices so far. Regarding the applications of exponential functions of
matrices to mathematical physics, especially with the transformations of functions
and vectors, we frequently encounter orthogonal matrices and unitary matrices. For
this purpose, we wish to examine the following properties:
(6) Let A be a real skew-symmetric matrix. Then, expA is a real orthogonal matrix.
(7) Let A be an anti-Hermitian matrix. Then, expA is a unitary matrix.
To show (6), we note that a skew-symmetric matrix is described as

AT = - A or AT þ A = 0: ð15:42Þ

Then, we have ATA = - A2 = AAT and, hence, A and AT commute. Therefore,


from Theorem 15.2, we have

exp A þ AT = exp 0 = E = exp A exp AT = exp Aðexp AÞT , ð15:43Þ

where the last equality comes from Property (3) of (15.31). Equation (15.43) implies
that expA is a real orthogonal matrix.
Since an anti-Hermitian matrix A is expressed as

A{ = - A or A{ þ A = 0, ð15:44Þ

we have A{A = - A2 = AA{, and so A and A{ commute. Using (15.32), we have

exp A þ A{ = exp 0 = E = exp A exp A{ = exp Aðexp AÞ{ , ð15:45Þ

where the last equality comes from Property (4) of (15.32). This implies that expA is
a unitary matrix.
Regarding Properties (6) and (7), if A is replaced with tA (t : real), the relations
(15.42)–(15.45) are held unchanged. Then, Properties (6) and (7) are rewritten as
(6)′ Let A be a real skew-symmetric matrix. Then, exptA (t : real) is a real
orthogonal matrix.
(7)′ Let A be an anti-Hermitian matrix. Then, exptA (t : real) is a unitary matrix.
Now, let a function F(t) be expressed as F(t) = exp tx with real numbers t and x.
Then, we have a well-known formula
15.2 Exponential Functions of Matrices and Their Manipulations 617

d
F ðt Þ = xF ðt Þ: ð15:46Þ
dt

Next, we extend (15.46) to the exponential functions of matrices. Let us define the
differentiation of an exponential function of a matrix with respect to a real parameter
t. Then, in concert with (15.46), we have a following important theorem.
Theorem 15.4 [1, 2] Let F(t)  exp tA with t being a real number and A being a
matrix that does not depend on t. Then, we have

d
F ðt Þ = AF ðt Þ: ð15:47Þ
dt

In (15.47) we assume that individual matrix components of F(t) are differentiable


with respect to t and that the differentiation of a matrix is defined as in (15.2).
Proof We have

F ðt þ Δt Þ = expðt þ Δt ÞA = expðtA þ ΔtAÞ = ðexp tAÞðexpΔtAÞ


= F ðt ÞF ðΔt Þ = ðexpΔtAÞðexp tAÞ = F ðΔt ÞF ðt Þ, ð15:48Þ

where we considered that tA and ΔtA are commutative and used Theorem 15.2.
Therefore, we get

1 1 1 1 1
½F ðt þ Δt Þ - F ðt Þ = ½F ðΔt Þ - E F ðt Þ = ðΔtAÞν - E F ðt Þ
Δt Δt Δt ν=0 ν!
1 1
= ðΔt Þν - 1 Aν F ðt Þ: ð15:49Þ
ν=1 ν!

Taking the limit of both sides of (15.49), we get

1 d
lim ½F ðt þ Δt Þ - F ðt Þ  F ðt Þ = AF ðt Þ:
Δt → 0 Δt dt

Notice that in (15.49) only the first term (i.e., ν = 1) of RHS is non-vanishing with
respect to Δt → 0. Then, (15.47) will follow. This completes the proof.
On the basis of the power series expansion of the exponential function of a matrix
defined in (15.7), A and F(t) are commutative. Therefore, instead of (15.47) we may
write

d
F ðt Þ = F ðt ÞA: ð15:50Þ
dt

Using Theorem 15.4, we show the following important properties. These are
converse propositions to Properties (6) and (7).
618 15 Exponential Functions of Matrices

(8) Let exptA be a real orthogonal matrix with any real number t. Then, A is a real
skew-symmetric matrix.
(9) Let exptA be a unitary matrix with any real number t. Then, A is an anti-
Hermitian matrix.
To show (8), from the assumption with any real number t, we have

exp tAðexp tAÞT = exp tA exp tAT = E: ð15:51Þ

Differentiating (15.51) with respect to t by use of (15.4), we have

ðAexp tAÞ exp tAT þ ðexp tAÞAT exp tAT = 0: ð15:52Þ

Since (15.52) must hold with any real number, (15.52) must hold with t → 0 as
well. On this condition, from (15.52) we get A + AT = 0. That is, A is a real skew-
symmetric matrix.
In the case of (9), similarly we have

exp tAðexp tAÞ{ = exp tA exp tA{ = E: ð15:53Þ

Differentiating (15.53) with respect to t and taking t → 0, we get A + A{ = 0. That


is, A is an anti-Hermitian matrix.
Properties (6)–(9) including Properties (6)′ and (7)′ are frequently used later in
Chap. 20 in relation to Lie groups and Lie algebras.

15.3 System of Differential Equations

In this section we describe important applications of the exponential functions of


matrices to systems of differential equations.

15.3.1 Introduction

In Chap. 10 we have dealt with SOLDEs using Green’s functions. In the present
section, we examine the properties of systems of differential equations. There is a
close relationship between the two topics. First we briefly outline it.
We describe an example of the system of differential equations. First, let us think
of the following general equation:

xð_t Þ = aðt Þxðt Þ þ bðt Þyðt Þ, ð15:54Þ


15.3 System of Differential Equations 619

yð_t Þ = cðt Þxðt Þ þ dðt Þyðt Þ, ð15:55Þ

where x(t) and y(t) vary as a function of t; a(t), b(t), c(t) and d(t) are coefficients.
Differentiating (15.54) with respect to t, we get

xð€t Þ = að_t Þxðt Þ þ aðt Þxð_t Þ þ bð_t Þyðt Þ þ bðt Þyð_t Þ

= að_t Þxðt Þ þ aðt Þxð_t Þ þ bð_t Þyðt Þ þ bðt Þ½ cðt Þxðt Þ þ d ðt Þyðt Þ

= að_t Þ þ bðt Þcðt Þ xðt Þ þ aðt Þxð_t Þ þ bð_t Þ þ bðt Þdðt Þ yðt Þ, ð15:56Þ

where with the second equality we used (15.55). To delete y(t) from (15.56), we
multiply b(t) on both sides of (15.56) and multiply bð_t Þ þ bðt Þdðt Þ on both sides of
(15.54) and further subtract the latter equation from the former. As a result, we
obtain

b€x - b_ þ ða þ dÞb x_ þ a b_ þ bd - bða_ þ bcÞ x = 0: ð15:57Þ

Upon the above derivation, we assume b ≠ 0. Solving (15.57) and substituting the
resulting solution for (15.54), we can get a solution for y(t). If, on the other hand,
b = 0, we have xð_t Þ = aðt Þxðt Þ. This is a simple FOLDE and can readily be integrated
to yield

t
x = C exp aðt 0 Þdt 0 , ð15:58Þ

where C is an integration constant.


Meanwhile, differentiating (15.55) with respect to t, we have

_ þ d y_ = c_ x þ cax þ dy
€y = c_ x þ c_x þ dy _ þ dy_
t
= ðc_ þ caÞ C exp aðt 0 Þdt 0 _ þ d y,_
þ dy

where we used (15.58) as well as x_ = ax. Thus, we are going to solve a following
inhomogeneous SOLDE:

t
_ = ðc_ þ caÞC exp
€y - d y_ - dy aðt 0 Þdt 0 : ð15:59Þ

In Sect. 10.2 we have shown how we are able to solve an inhomogeneous FOLDE
using a weight function. For a later purpose, let us consider a next FOLDE assuming
a(x)  1 in (10.23):
620 15 Exponential Functions of Matrices

xð_t Þ þ pðt Þx = qðt Þ: ð15:60Þ

As another method to solve (15.60), we examine the method of variation of


constants. The homogeneous equation corresponding to (15.60) is

xð_t Þ þ pðt Þx = 0: ð15:61Þ

It can be solved as just before such that

t
xðt Þ = C exp - pðt 0 Þdt 0  uðt Þ: ð15:62Þ

Now we assume that a solution of (15.60) can be sought by putting

xðt Þ = k ðt Þuðt Þ, ð15:63Þ

where we assume that the functional form u(t) remains unchanged in the inhomo-
geneous equation (15.60) and that instead the “constant” k(t) may change as a
function of t. Inserting (15.63) into (15.60), we have

_ þ ku_ þ pku = ku
x_ þ px = ku _ þ kðu_ þ puÞ = ku
_ þ k ð- pu þ puÞ
_ = q,
= ku ð15:64Þ

where with the third equality we used (15.61) and (15.62) to get u_ = - pu. In this
way, (15.64) can easily be integrated to give

t
qð t 0 Þ 0
k ðt Þ = dt þ C,
uð t 0 Þ

where C is an integration constant. Thus, as a solution of (15.60), we have

t
qðt 0 Þ 0
xðt Þ = k ðt Þuðt Þ = uðt Þ dt þ Cuðt Þ: ð15:65Þ
uðt 0 Þ

Comparing (15.65) with (10.29), we notice that these two expressions are related.
In other words, the method of variation of constants in (15.65) is essentially the same
as that using the weight function in (10.29).
Another important point to bear in mind is that in Chap. 10 we have solved
SOLDEs of constant coefficients using the method of Green’s functions. In that case,
we have separately estimated a homogeneous term and inhomogeneous term (i.e.,
surface term). In the present section, we are going to seek a method corresponding to
that using Green’s functions. In the above discussion, we see that a system of
differential equations with two unknows can be translated into a SOLDE. Then,
15.3 System of Differential Equations 621

we expect such a system of differential equations of constant coefficients to be


solved somehow or other. We will hereinafter describe it in detail giving examples.

15.3.2 System of Differential Equations in a Matrix Form:


Resolvent Matrix

The equations of (15.54) and (15.55) are unified in a homogeneous single matrix
equation such that

að t Þ cð t Þ
xð_t Þ yð_t Þ = ðxðt Þ yðt ÞÞ : ð15:66Þ
bð t Þ d ðt Þ

In (15.66), x(t) and y(t) and their derivatives represent the functions (of t), and so
we describe them as row matrices. With this expression of equation, see Sect. 11.2
and (11.37) therein. Another expression is a transposition of (15.66) described by

xð_t Þ að t Þ bð t Þ xð t Þ
= : ð15:67Þ
yð_t Þ cð t Þ d ð t Þ yð t Þ

Notice that the coefficient matrix has been transposed in (15.67) accordingly. We
are most interested in an equation of constant coefficient, and hence, we rewrite
(15.66) once again explicitly as

a c
xð_t Þ yð_t Þ = ðxðt Þ yðt ÞÞ ð15:68Þ
b d

or

d a c
ð xð t Þ yð t Þ Þ = ð xð t Þ yð t Þ Þ : ð15:69Þ
dt b d

Putting F(t)  (x(t) y(t)), we have

dF ðt Þ a c
= F ðt Þ : ð15:70Þ
dt b d

This is exactly the same as (15.50) if we put


622 15 Exponential Functions of Matrices

a c
A= : ð15:71Þ
b d

Then, we get

F ðt Þ = exp tA: ð15:72Þ

As already shown in (15.7) of Sect. 15.1, an exponential function of a matrix, i.e.,


exptA, can be defined as

1 1
exp tA  E þ tA þ ðtAÞ2 þ ⋯ þ ðtAÞν þ ⋯: ð15:73Þ
2! ν!

From Property (5) of (15.41), we have det(expA) ≠ 0. This is because no matter


what number (either real or complex) TrA may take, exp t(TrA) never vanishes. That
is, F(t) = exp A is non-singular, and so an inverse matrix of expA must exist. This is
the case with exptA as well; see equation below:

ð5Þ0 detðexp tAÞ = exp TrðtAÞ = exp t ðTrAÞ: ð15:74Þ

Note that if A (or tA) is a (n, n) matrix, F(t) of (15.72) is a (n, n) matrix as well.
Since in that general case F(t) is non-singular, it consists of n linearly independent
column vectors, i.e., (n, 1) matrices; see Chap. 11. Then, F(t) is symbolically
described as

F ðt Þ = ðx1 ðt Þ x2 ðt Þ ⋯ xn ðt ÞÞ,

where xi(t) (1 ≤ i ≤ n) represents a i-th column vector. Note that since F(0) = E, xi(0) has
its i-th row component of 1, otherwise 0. Since F(t) is non-singular, n column vectors
xi(t) (1 ≤ i ≤ n) are linearly independent. In the case of, e.g., (15.66), A is a (2, 2)
matrix, which implies that x(t) and y(t) represent two linearly independent column
vectors, i.e., (2, 1) matrices. Furthermore, if we look at the constitution of (15.70), x(t)
and y(t) represent two linearly independent solutions of (15.70). We will come back to
this point later.
On the basis of the method of variation of constants, we deal with inhomogeneous
equations and describe below the procedures for solving the system of equations
with two unknowns. But, they can be readily conformed to the general case where
the equations with n unknowns are dealt with.
Now, we rewrite (15.68) symbolically such that

xð_t Þ = xðt ÞA, ð15:75Þ

where x(t)  (x(t) y(t)), xð_t Þ  xð_t Þ yð_t Þ , and A  a


b
c
d
. Then, the inhomo-
geneous equation can be described as
15.3 System of Differential Equations 623

xð_t Þ = xðt ÞA þ bðt Þ, ð15:76Þ

where we define the inhomogeneous term as b  (p q). This is equivalent to

a c
xð_t Þ yð_t Þ = ðxðt Þ yðt ÞÞ þ ðp qÞ: ð15:77Þ
b d

Using the same solution F(t) that appears in the homogeneous equation, we
assume that the solution of the inhomogeneous equation is described by

xðt Þ = kðt ÞF ðt Þ, ð15:78Þ

where k(t) is a variable “constant.” Replacing x(t) in (15.76) with x(t) = k(t)F(t) of
(15.78), we obtain

_ þ kF_ = kF
x_ = kF _ þ kFA = kFA þ b, ð15:79Þ

where with the second equality we used F_ = FA in (15.50). Then, we have

_ = b:
kF ð15:80Þ

This can be integrated so that we may have

t
k= ds bðsÞF ðsÞ - 1 : ð15:81Þ

As previously noted, F(s)-1 exists because F(s) is non-singular. Thus, as a


solution we get

t
xðt Þ = kðt ÞF ðt Þ = ds bðsÞF ðsÞ - 1 F ðt Þ : ð15:82Þ
t0

Here we define

Rðs, t Þ  F ðsÞ - 1 F ðt Þ: ð15:83Þ

The matrix Rðs, t Þ is said to be a resolvent matrix [6]. This matrix plays an
essential role in solving the system of differential equations both homogeneous and
inhomogeneous with either homogeneous or inhomogeneous boundary conditions
(BCs). Rewriting (15.82), we get
624 15 Exponential Functions of Matrices

t
xðt Þ = kðt ÞF ðt Þ = ds½bðsÞRðs, t Þ : ð15:84Þ
t0

We summarize principal characteristics of the resolvent matrix.

ð1Þ Rðs, sÞ = F ðsÞ - 1 F ðsÞ = E, ð15:85Þ

where E is a (2, 2) unit matrix, i.e., E = 1


0
0
1
for an equation having two
unknowns.

-1
ð2Þ Rðs, t Þ - 1 = F ðsÞ - 1 F ðt Þ = F ðt Þ - 1 F ðsÞ = Rðt, sÞ: ð15:86Þ

ð3Þ Rðs, t ÞRðt, uÞ = F ðsÞ - 1 F ðt ÞF ðt Þ - 1 F ðuÞ = F ðsÞ - 1 F ðuÞ = Rðs, uÞ: ð15:87Þ

It is easy to include a term in solution related to the inhomogeneous BCs. As


already seen in (10.33) of Sect. 10.2, a boundary condition for the FOLDEs is set
such that, e.g.,

uðaÞ = σ, ð15:88Þ

where u(t) is a solution of a FOLDE. In that case, the BC can be translated into the
“initial” condition. In the present case, we describe the BCs as

xðt 0 Þ  x0  ðσ τÞ, ð15:89Þ

where σ  x(t0) and τ  y(t0). Then, the said term associated with the inhomoge-
neous BCs is expected to be written as

xðt 0 ÞRðt 0 , t Þ:

In fact, since xðt 0 ÞRðt 0 , t 0 Þ = xðt 0 ÞE = xðt 0 Þ, this satisfies the BCs. Then, the full
expression of the solution of the inhomogeneous equation that takes account of the
BCs is assumed to be expressed as

t
xðt Þ = xðt 0 ÞRðt 0 , t Þ þ ds½bðsÞRðs, t Þ: ð15:90Þ
t0

Let us confirm that (15.90) certainly gives a proper solution of (15.76). First we
give a formula on the differentiation of
15.3 System of Differential Equations 625

t
d
Qðt Þ = Pðt, sÞds, ð15:91Þ
dt t0

where P(t, s) stands for a (1, 2) matrix whose general form is described by (q(t, s)
r(t, s)), in which q(t, s) and r(t, s) are functions of t and s. We rewrite (15.91) as

tþΔ t
1
Qðt Þ = lim Pðt þ Δ, sÞds - Pðt, sÞds
Δ→0 Δ t0 t0
tþΔ t t
1
= lim Pðt þ Δ, sÞds - Pðt þ Δ, sÞds þ Pðt þ Δ, sÞds
Δ→0 Δ t0 t0 t0
t
- Pðt, sÞds
t0
tþΔ t t
1
= lim Pðt þ Δ, sÞds þ Pðt þ Δ, sÞds - Pðt, sÞds
Δ→0 Δ t t0 t0
t t
1 1
≈ lim Pðt, t ÞΔ þ lim Pðt þ Δ, sÞds - Pðt, sÞds
Δ→0 Δ Δ→0 Δ t0 t0
t
∂Pðt, sÞ
= Pðt, t Þ þ ds: ð15:92Þ
t0 ∂t

Replacing P(t, s) of (15.92) with bðsÞRðs, t Þ, we have

d t t
∂Rðs, t Þ
bðsÞRðs, t Þds = bðt ÞRðt, t Þ þ bðsÞ ds
dt t0 t0 ∂t
t
∂Rðs, t Þ
= bð t Þ þ bð s Þ ds, ð15:93Þ
t0 ∂t

where with the last equality we used (15.85). Considering (15.83) and (15.93) and
differentiating (15.90) with respect to t, we get

t
xð_t Þ = xðt0 ÞF ðt 0 Þ - 1 F ð_t Þ þ ds bðsÞF ðsÞ - 1 F ð_t Þ þ bðt Þ
t0
t
= xðt 0 ÞF ðt 0 Þ - 1 F ðt ÞA þ ds bðsÞF ðsÞ - 1 F ðt ÞA þ bðt Þ
t0
t
= xðt 0 ÞRðt 0 , t ÞA þ ds½bðsÞRðs, t ÞA  þ bðt Þ
t0
626 15 Exponential Functions of Matrices

t
= xðt 0 ÞRðt 0 , t Þ þ ds½bðsÞRðs, t Þ A þ bðt Þ
t0

= xðt ÞA þ bðt Þ, ð15:94Þ

where with the second equality we used (15.70) and with the last equality we used
(15.90). Thus, we have certainly recovered the original differential equation of
(15.76). Consequently, (15.90) is the proper solution for the given inhomogeneous
equation with inhomogeneous BCs.
The above discussion and formulation equally apply to the general case of the
differential equation with n unknowns, even though the calculation procedures
become increasingly complicated with the increasing number of unknowns.

15.3.3 Several Examples

To deepen our understanding of the essence and characteristics of the system of


differential equations, we deal with several examples regarding the equations of two
unknowns with constant coefficients. The constant matrices A of specific types (e.g.,
anti-Hermitian matrices and skew-symmetric matrices) have a wide field of appli-
cations in mathematical physics, especially in Lie groups and Lie algebras (see
Chap. 20). In the subsequent examples, however, we deal with various types of
matrices.
Example 15.1 Solve the following equation

xð_t Þ = x þ 2y þ 1,

yð_t Þ = 2x þ y þ 2, ð15:95Þ

under the BCs x(0) = c and y(0) = d.


In a matrix form, (15.95) can be written as

1 2
xð_t Þ yð_t Þ = ðx yÞ þ ð1 2Þ: ð15:96Þ
2 1

The homogeneous equation corresponding to (15.96) is expressed as

1 2
xð_t Þ yð_t Þ = ðx yÞ : ð15:97Þ
2 1

As discussed in Sect. 15.3.2, this can readily be solved to produce a solution F(t)
such that
15.3 System of Differential Equations 627

F ðt Þ = exp tA, ð15:98Þ

where A = 1
2
2
1
. Since A is a real symmetric (i.e., Hermitian) matrix, it should be
diagonalized by the unitary similarity transformation (see Sect. 14.3). For the
purpose of getting an exponential function of a matrix, it is easier to estimate
exptD (where D is a diagonal matrix) than to directly calculate exptA. That is, we
rewrite (15.97) as

xð_t Þ yð_t Þ = ðx yÞUU { 1


2
2
1
UU { , ð15:99Þ

where U is a unitary matrix, i.e., U{ = U-1.


Following routine calculation procedures (as in, e.g., Example 14.3), we readily
get a following diagonal matrix and the corresponding unitary matrix for the
diagonalization. That is, we obtain

1 1
p p
U= 2 2
1 1
-p p
2 2

and

1 2 -1 0
D = U -1 U = U - 1 AU = : ð15:100Þ
2 1 0 3

Or we have

A = UDU - 1 : ð15:101Þ

Hence, we get

F ðt Þ = exp tA = exp tUDU - 1 = U ðexp tDÞU - 1 , ð15:102Þ

where with the last equality we used (15.30). From Theorem 15.3, we have

e-t 0
exptD = : ð15:103Þ
0 e3t

In turn, we get
628 15 Exponential Functions of Matrices

1 1 1 1
p p p -p
2 2 e-t 0 2 2
F ðt Þ = 1 1 0 e3t 1 1
-p p p p
2 2 2 2
1 e - t þ e3t - e - t þ e3t
= : ð15:104Þ
2 - e - t þ e3t e - t þ e3t

With an inverse matrix, we have

1 e - 3t þ et e - 3t - e3t
F ðt Þ - 1 = : ð15:105Þ
2 e - 3t - e3t e - 3t þ et

Therefore, as a resolvent matrix, we get

Rðs; t Þ = F ðsÞ - 1 F ðt Þ
1 e - ðt - sÞ þ e3ðt - sÞ - e - ðt - sÞ þ e3ðt - sÞ
= : ð15:106Þ
2 - e - ðt - sÞ þ e3ðt - sÞ e - ðt - sÞ þ e3ðt - sÞ

Thus, as a solution of (15.95), we obtain

t
xðt Þ = xð0ÞRð0, t Þ þ ds½bðsÞRðs, t Þ, ð15:107Þ
0

where x(0) = (x(0) y(0)) = (c d) represents the BCs and b(s) = (1 2) comes from
(15.96). This can easily be calculated so that we can have

xð0ÞRð0, t Þ
1 1
= ðc - dÞe - t þ ðc þ dÞe3t ðd - cÞe - t þ ðc þ dÞe3t ð15:108Þ
2 2

and

t
1 -t 1 -t
ds½bðsÞRðs, t Þ = e þ e3t - 1 - e - e3t : ð15:109Þ
0 2 2

Note that the first component of (15.108) and (15.109) represents the
x component of the solution for (15.95) and that the second component represents
the y component. From (15.108), we have

xð0ÞRð0, 0Þ = xð0ÞE = xð0Þ = ðc dÞ: ð15:110Þ

Thus, we find that the BCs are certainly satisfied. Putting t = 0 in (15.109), we
have
15.3 System of Differential Equations 629

0
ds½bðsÞRðs, 0Þ = 0, ð15:111Þ
0

confirming that the second term of (15.107) vanishes at t = 0.


The summation of (15.108) and (15.109) gives an overall solution of (15.107).
Separating individual components of the solution, we describe them as

1
xð t Þ = ðc - d þ 1Þe - t þ ðc þ d þ 1Þe3t - 2 ,
2
1
yðt Þ = ðd - c - 1Þe - t þ ðc þ d þ 1Þe3t : ð15:112Þ
2

Moreover, differentiating (15.112) with respect to t, we recover the original form


of (15.95). The confirmation is left for readers.
Remarks on Example 15.1: We become aware of several points in Example 15.1.
1. The resolvent matrix given by (15.106) can be obtained by replacing t of (15.104)
with t - s. This is one of important characteristics of the resolvent matrix that is
derived from an exponential function of a constant matrix.
To see it, if A is a constant matrix, tA and -sA commute. By the definition (15.7)
of the exponential function, we have

1 2 2 1
t A þ ⋯ þ t ν Aν þ ⋯,
exp tA = E þ tA þ ð15:113Þ
2! ν!
1 1
expð- sAÞ = E þ ð- sAÞ þ ð- sÞ2 A2 þ ⋯ þ ð- sÞν Aν þ ⋯: ð15:114Þ
2! ν!

Both (15.113) and (15.114) are polynomials of a constant matrix A and, hence,
exptA and exp(-sA) commute as well. Then, from Theorem 15.2 and Property (1) of
(15.28), we get

expðtA - sAÞ = ðexp tAÞ½expð- sAÞ = ½expð- sAÞðexp tAÞ


= expð- sA þ tAÞ = exp½ðt - sÞA: ð15:115Þ

That is, using (15.28), (15.72), and (15.83), we get

Rðs, t Þ = F ðsÞ - 1 F ðt Þ = exp ðsAÞ - 1 expðtAÞ = expð- sAÞ expðtAÞ


= exp½ðt - sÞA = F ðt - sÞ: ð15:116Þ

This implies that once we get exptA, we can safely obtain the resolvent matrix by
automatically replacing t of (15.72) with t - s. Also, exptA and exp(-sA) are
commutative. In the present case, therefore, by exchanging the order of products
of F(s)-1 and F(t) in (15.106), we have the same result as (15.106) such that
630 15 Exponential Functions of Matrices

Rðs; t Þ = F ðt ÞF ðsÞ - 1
1 e - ðt - sÞ þ e3ðt - sÞ - e - ðt - sÞ þ e3ðt - sÞ
= : ð15:117Þ
2 - e - ðt - sÞ þ e3ðt - sÞ e - ðt - sÞ þ e3ðt - sÞ

2. The resolvent matrix of the present example is real symmetric. This is because if
A is real symmetric, i.e., AT = A, we have

n
ðAn ÞT = AT = An , ð15:118Þ

that is, An is real symmetric as well. From (15.7), expA is real symmetric accord-
ingly. In a similar manner, if A is Hermitian, expA is also Hermitian.
3. Let us think of a case where A is not a constant matrix but varies as a function of
t. In that case, we have

aðt Þ bðt Þ
A ðt Þ = : ð15:119Þ
cð t Þ d ð t Þ

Also, we have

að s Þ bð s Þ
AðsÞ = : ð15:120Þ
cðsÞ d ðsÞ

We can easily check that in a general case

Aðt ÞAðsÞ ≠ AðsÞAðt Þ, ð15:121Þ

namely, A(t) and A(s) are not generally commutative. The matrix tA(t) is not
commutative with -sA(s), either. In turn, (15.115) does not hold either. Thus, we
need an elaborated treatment for this.
Example 15.2 Find the resolvent matrix of the following homogeneous equation:

xð_t Þ = 0,

yð_t Þ = x þ y: ð15:122Þ

In a matrix form, (15.122) can be written as

xð_t Þ yð_t Þ = ðx yÞ 0
0
1
1
: ð15:123Þ
15.3 System of Differential Equations 631

The matrix A = 0
0
1
1
is characterized by an idempotent matrix (see Sect. 12.4).
As in the case of Example 15.1, we get

xð_t Þ yð_t Þ = ðx yÞPP - 1 0


0
1
1
PP - 1 , ð15:124Þ

-1
where we choose P = 1
0
1
1
and, hence, P - 1 = 1
0 1
. Notice that in this
example since A was not a symmetric matrix, we did not use a unitary matrix with the
similarity transformation. As a diagonal matrix D, we have

D = P-1 0
0
1
1
P = P - 1 AP = 0
0
0
1
: ð15:125Þ

Or we have

A = PDP - 1 : ð15:126Þ

Hence, we get

F ðt Þ = exp tA = exp tPDP - 1 = Pðexp tDÞP - 1 , ð15:127Þ

where we used (15.30) again. From Theorem 15.3, we have

1 0
exptD = : ð15:128Þ
0 et

In turn, we get

1 1 1 0 1 -1 1 - 1 þ et
F ðt Þ = = : ð15:129Þ
0 1 0 et 0 1 0 et

With an inverse matrix, we have

1 e-t - 1
F ðt Þ - 1 = : ð15:130Þ
0 e-t

Therefore, as a resolvent matrix, we get

1 - 1 þ et - s
Rðs; t Þ = F ðsÞ - 1 F ðt Þ = : ð15:131Þ
0 et - s

Using the resolvent matrix, we can include the inhomogeneous term along with
BCs (or initial conditions). Once again, by exchanging the order of products of F(s)-1
632 15 Exponential Functions of Matrices

and F(t) in (15.131), we have the same resolvent matrix. This can readily be checked.
Moreover, we get Rðs, t Þ = F ðt - sÞ.
Example 15.3 Solve the following equation

xð_t Þ = x þ c,

yð_t Þ = x þ y þ d, ð15:132Þ

under the BCs x(0) = σ and y(0) = τ. In a matrix form, (15.132) can be written as

xð_t Þ yð_t Þ = ðx yÞ 1
0
1
1
þ ðc d Þ: ð15:133Þ

The matrix of a type of M = 1


0
1
1
has been fully investigated in Sect. 12.6 and
characterized as a matrix that cannot be diagonalized. In such a case, we directly
calculate exptM. Here, remember that product of triangle matrices of the same kind
(i.e., an upper triangle matrix or a lower triangle matrix; see Sect. 12.1) is a triangle
matrix of the same kind.
We have

M2 = 1
0
1
1
1
0
1
1
= 1
0
2
1
, M3 = 1
0
2
1
1
0
1
1
= 1
0
3
1
, ⋯:

Repeating this, we get

Mn = 1
0
n
1
: ð15:134Þ

Thus, we have

1 2 2 1
exp tM = Rð0, t Þ = E þ tM þ t M þ ⋯ þ tν M ν þ ⋯
2! ν!
1 2 1
= 1 0
þt 1 1
þ t 1 2
þ ⋯ þ t ν 10 1ν þ ⋯
0 1 0 1 2! 0 1 ν!
1 1
et t 1 þ t þ t2 þ ⋯ þ tν - 1 þ ⋯ et tet
= 2! ðν - 1Þ! = : ð15:135Þ
0 et
0 et

With the resolvent matrix, we get


15.3 System of Differential Equations 633

et - s ðt - sÞet - s
Rðs; t Þ = expð - sM ÞexptM = : ð15:136Þ
0 et - s

Using (15.107), we get

t
xðt Þ = ðσ τÞRð0, t Þ þ ds½ðc dÞRðs, t Þ: ð15:137Þ
0

Thus, we obtain the solution described by

xðt Þ = ðc þ σ Þet - c,
yðt Þ = ðc þ σ Þtet þ ðd - c þ τÞet þ c - d: ð15:138Þ

From (15.138), we find that the original inhomogeneous equation (15.132) and
BCs of x(0) = σ and y(0) = τ have been recovered.
Example 15.4 Find the resolvent matrix of the following homogeneous equation:

xð_t Þ = 0,

yð_t Þ = x: ð15:139Þ

In a matrix form, (15.139) can be written as

xð_t Þ yð_t Þ = ðx yÞ 0
0
1
0
: ð15:140Þ

The matrix N = 0
0
1
0
is characterized by a nilpotent matrix (Sect. 12.3). Since
2
N and matrices of higher order of N vanish, we have

exp tN = E þ tN = 1
0
t
1
: ð15:141Þ

We have a corresponding resolvent matrix such that

t-s
Rðs, t Þ = 1
0 1
: ð15:142Þ

Using the resolvent matrix, we can include the inhomogeneous term along
with BCs.
Example 15.5 As a trivial case, let us consider a following homogeneous equation:
634 15 Exponential Functions of Matrices

xð_t Þ yð_t Þ = ðx yÞ a
0
0
b
, ð15:143Þ

where one of a and b can be zero or both of them can be zero. Even though (15.143)
merely apposes two FOLDEs, we can deal with such cases in an automatic manner.
From Theorem 15.3, as eigenvalues of expA where A = a0 0b , we have ea and eb.
That is, we have

eat
F ðt Þ = exp tA = 0
0
ebt
: ð15:144Þ

In particular, if a = b (including a = b = 0), we merely have the same FOLDE


twice changing the arguments x with y.
Readers may well wonder why we have to argue such a trivial case expressly.
That is because the theory we developed in this chapter holds widely and we are able
to make a clear forecast about a constitution of the solution for differential equations
of a first order. For instance, (15.144) immediately tells us that a fundamental
solution for

xð_t Þ = axðt Þ ð15:145Þ

is eat. We only have to perform routine calculations with a counterpart of x(t) [or y(t)]
of (x(t) y(t)) using (15.107) to solve an inhomogeneous differential equation
under BCs.
Returning back to Example 15.1, we had

xð_t Þ yð_t Þ = ðx yÞ 1
2
2
1
þ ð1 2Þ: ð15:96Þ

This equation can be converted to

1 2
xð_t Þ yð_t Þ U = ðx yÞUU - 1 U þ ð1 2ÞU, ð15:146Þ
2 1

p1 p1 p1 p1
where U = 2 2
. Then, ðx yÞU = ðx yÞ 2 2
. Defining (X Y)
- p12 p1
2
- p12 p1
2
as

1 1
p p
ðX Y Þ  ðx yÞU = ðx yÞ 2 2 , ð15:147Þ
1 1
-p p
2 2

we have
15.3 System of Differential Equations 635

-1 0
X ð_t Þ Y ð_t Þ = ðX Y Þ þ ð1 2ÞU: ð15:148Þ
0 3

Then, according to (15.90), we get a solution

t
~ ðt 0 , t Þ þ
X ðt Þ = X ðt 0 ÞR ds ~bðsÞR
~ ðs, t Þ , ð15:149Þ
t0

where X(t0) = (X(t0) Y(t0)) and ~ ~ ðs, t Þ is given


bðsÞ = bðsÞU. The resolvent matrix R
by

~ ðs; t Þ = e - ðt - sÞ 0
R : ð15:150Þ
0 e3ðt - sÞ

To convert (X Y) to (x y), operating U-1 on both sides of (15.149) from the


right, we obtain

t
~ ðt 0 , t ÞU - 1 þ
X ðt ÞU - 1 = X ðt 0 ÞU - 1 U R ds ~ ~ ðs, t Þ U - 1 , ð15:151Þ
bðsÞU - 1 U R
t0

where with the first and second terms we insert U-1U = E. Then, we get

t
~ ðt 0 , t ÞU - 1 þ
x ð t Þ = xð t 0 Þ U R ~ ðs, t ÞU - 1 :
dsbðsÞ U R ð15:152Þ
t0

~ ðs, t ÞU - 1 to have
We calculate U R

~ ðs; t ÞU - 1 = 1 1 1 e - ðt - sÞ 0 1 -1
UR
2 -1 1 0 e3ðt - sÞ 1 1
1 e - ðt - sÞ þ e3ðt - sÞ - e - ðt - sÞ þ e3ðt - sÞ
= = Rðs; t Þ: ð15:153Þ
2 - e - ðt - sÞ þ e3ðt - sÞ e - ðt - sÞ þ e3ðt - sÞ

Thus, we recover (15.90), i.e., we have

t
xðt Þ = xðt 0 ÞRðt 0 , t Þ þ ds½bðsÞRðs, t Þ: ð15:90Þ
t0

Equation (15.153) shows that Rðs, t Þ and R ~ ðs, t Þ are connected to each other
through the unitary similarity transformation such that
636 15 Exponential Functions of Matrices

~ ðs, t Þ = U - 1 Rðs, t ÞU = U { Rðs, t ÞU:


R ð15:154Þ

Example 15.6 Let us consider the following system of differential equations:

0 -ω
xð_t Þ yð_t Þ = ðx yÞ þ ða 0Þ: ð15:155Þ
ω 0

0 -ω
The matrix is characterized by an anti-Hermitian operator. This type
ω 0
of operator frequently appears in Lie groups and Lie algebras that we will deal with
in Chap. 20. Here we define an operator D as

$$D = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \quad \text{or} \quad \omega D = \begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix}. \tag{15.156}$$

The calculation procedures will be seen in Chap. 20, and so we only show the
result here. That is, we have

$$\exp t\omega D = \begin{pmatrix} \cos\omega t & -\sin\omega t \\ \sin\omega t & \cos\omega t \end{pmatrix}. \tag{15.157}$$

We seek the resolvent matrix $R(s,t)$ such that

$$R(s,t) = \begin{pmatrix} \cos\omega(t-s) & -\sin\omega(t-s) \\ \sin\omega(t-s) & \cos\omega(t-s) \end{pmatrix}. \tag{15.158}$$

The implication of (15.158) is simple; the composition of a rotation through $\omega t$ and the inverse rotation through $-\omega s$ is described by $\omega(t-s)$. Physically, (15.155) represents the motion of a point mass under an external field.
We routinely obtain the above solution using (15.90). If, for example, we solve (15.155) under the BCs for which the point mass is placed at rest at the origin of the xy-plane at t = 0, we get a solution described by

$$x = \frac{a}{\omega}\sin\omega t, \qquad y = \frac{a}{\omega}(\cos\omega t - 1). \tag{15.159}$$

Eliminating t from (15.159), we get



Fig. 15.2 Circular motion of a point mass

$$x^2 + \left(y + \frac{a}{\omega}\right)^2 = \left(\frac{a}{\omega}\right)^2. \tag{15.160}$$

In other words, the point mass performs a circular motion (see Fig. 15.2).
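Both (15.157) and the solution (15.159) lend themselves to a quick numerical check. The sketch below assumes NumPy/SciPy; the values of ω and a are illustrative, and x(0) = y(0) = 0 as in the text.

```python
# Quick check of (15.157) and (15.159).
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

omega, a = 2.0, 1.5
D = np.array([[0.0, -1.0],
              [1.0,  0.0]])

t = 0.8
Rot = np.array([[np.cos(omega * t), -np.sin(omega * t)],
                [np.sin(omega * t),  np.cos(omega * t)]])
assert np.allclose(expm(t * omega * D), Rot)          # (15.157)

# (x' y') = (x y)(omega D) + (a 0): row-vector convention of the text
rhs = lambda t, v: v @ (omega * D) + np.array([a, 0.0])
sol = solve_ivp(rhs, (0.0, t), [0.0, 0.0], rtol=1e-10)
x_exact = (a / omega) * np.sin(omega * t)
y_exact = (a / omega) * (np.cos(omega * t) - 1.0)
print(sol.y[:, -1], (x_exact, y_exact))               # should agree: (15.159)
```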
In Example 15.1 we mentioned that if the matrix A that defines the system of differential equations varies as a function of t, the solution method based upon $\exp tA$ does not work. Yet, we have an important case where A is not a constant matrix. Let us consider the next illustration.

15.4 Motion of a Charged Particle in Polarized Electromagnetic Wave

In Sect. 4.5 we dealt with the motion of an electron under circularly polarized light. Here we consider the motion of a charged particle in a linearly polarized electromagnetic wave.
Suppose that a charged particle is placed in vacuum. Suppose also that an electromagnetic wave (light) linearly polarized along the x-direction propagates in the positive direction of the z-axis. Unlike the case of Sect. 4.5, we have to take account of the influence of both the electric field and magnetic field components of the Lorentz force.
From (7.58) we have

$$\boldsymbol{E} = \boldsymbol{E}_0 e^{i(kz-\omega t)} = E_0\boldsymbol{e}_1 e^{i(kz-\omega t)}, \tag{15.161}$$

where $\boldsymbol{e}_1$, $\boldsymbol{e}_2$, and $\boldsymbol{e}_3$ are the unit vectors of the x-, y-, and z-directions. (The latter two unit vectors appear just below.) If the extent of motion of the charged particle around the origin is narrow enough compared to the wavelength of the electromagnetic field, we can ignore kz in the exponent (see Sect. 4.5). Then, we have

$$\boldsymbol{E} \approx E_0\boldsymbol{e}_1 e^{-i\omega t}. \tag{15.162}$$

Taking the real part of (15.162), we get

$$\boldsymbol{E} = E_0\boldsymbol{e}_1\cos\omega t. \tag{15.163}$$

Meanwhile, from (7.59) we have

$$\boldsymbol{H} = \boldsymbol{H}_0 e^{i(kz-\omega t)} \approx \boldsymbol{H}_0\cos\omega t. \tag{15.164}$$

From (7.60) furthermore, we get

$$\boldsymbol{H}_0 = \boldsymbol{e}_3 \times \frac{\boldsymbol{E}_0}{\sqrt{\mu_0/\varepsilon_0}} = \boldsymbol{e}_3 \times \frac{E_0\boldsymbol{e}_1}{\mu_0 c} = \frac{E_0\boldsymbol{e}_2}{\mu_0 c}. \tag{15.165}$$

Thus, as the magnetic flux density, we have

$$\boldsymbol{B} = \mu_0\boldsymbol{H} \approx \mu_0\boldsymbol{H}_0\cos\omega t = \frac{E_0\boldsymbol{e}_2\cos\omega t}{c}. \tag{15.166}$$

The Lorentz force F exerted on the charged particle is then described by

$$\boldsymbol{F} = e\boldsymbol{E} + e\dot{\boldsymbol{x}} \times \boldsymbol{B}, \tag{15.167}$$

where $\boldsymbol{x}\ (= x\boldsymbol{e}_1 + y\boldsymbol{e}_2 + z\boldsymbol{e}_3)$ is the position vector of the charged particle having a charge e. In (15.167) the first term represents the electric Lorentz force and the second term the magnetic Lorentz force; see Sect. 4.5. Replacing $\boldsymbol{E}$ and $\boldsymbol{B}$ in (15.167) with those in (15.163) and (15.166), we get

$$\boldsymbol{F} = \left(1 - \frac{\dot{z}}{c}\right)eE_0\boldsymbol{e}_1\cos\omega t + \frac{eE_0}{c}\dot{x}\boldsymbol{e}_3\cos\omega t = m\ddot{x}\boldsymbol{e}_1 + m\ddot{y}\boldsymbol{e}_2 + m\ddot{z}\boldsymbol{e}_3, \tag{15.168}$$

where m is the mass of the charged particle. Comparing the individual vector components, we have

$$m\ddot{x} = \left(1 - \frac{\dot{z}}{c}\right)eE_0\cos\omega t, \qquad m\ddot{y} = 0, \qquad m\ddot{z} = \frac{eE_0}{c}\dot{x}\cos\omega t. \tag{15.169}$$

Note that the Lorentz force is not exerted in the direction of the y-axis. Putting $a \equiv eE_0/m$ and $b \equiv eE_0/(mc)$, we get

$$\ddot{x} = a\cos\omega t - b\dot{z}\cos\omega t, \qquad \ddot{z} = b\dot{x}\cos\omega t. \tag{15.170}$$

Further putting $\dot{x} = \xi$ and $\dot{z} = \zeta$, we obtain

$$\dot{\xi} = -b\zeta\cos\omega t + a\cos\omega t, \qquad \dot{\zeta} = b\xi\cos\omega t. \tag{15.171}$$

Writing (15.171) in a matrix form, we obtain the following system of inhomogeneous differential equations:

$$(\dot{\xi}(t)\ \ \dot{\zeta}(t)) = (\xi\ \ \zeta)\begin{pmatrix} 0 & b\cos\omega t \\ -b\cos\omega t & 0 \end{pmatrix} + (a\cos\omega t\ \ 0), \tag{15.172}$$

where the inhomogeneous term is given by $(a\cos\omega t\ \ 0)$. First, we think of the homogeneous equation described by

$$(\dot{\xi}(t)\ \ \dot{\zeta}(t)) = (\xi\ \ \zeta)\begin{pmatrix} 0 & b\cos\omega t \\ -b\cos\omega t & 0 \end{pmatrix}. \tag{15.173}$$

Meanwhile, we consider the following equation:

$$\frac{d}{dt}\begin{pmatrix} \cos f(t) & \sin f(t) \\ -\sin f(t) & \cos f(t) \end{pmatrix} = \begin{pmatrix} -\dot{f}(t)\sin f(t) & \dot{f}(t)\cos f(t) \\ -\dot{f}(t)\cos f(t) & -\dot{f}(t)\sin f(t) \end{pmatrix} = \begin{pmatrix} \cos f(t) & \sin f(t) \\ -\sin f(t) & \cos f(t) \end{pmatrix}\begin{pmatrix} 0 & \dot{f}(t) \\ -\dot{f}(t) & 0 \end{pmatrix}. \tag{15.174}$$

Closely inspecting (15.173) and (15.174), we find that if we can choose f(t) so that it satisfies

$$\dot{f}(t) = b\cos\omega t, \tag{15.175}$$

we might well be able to solve (15.172). In that case, moreover, we expect that the two linearly independent column vectors

$$\begin{pmatrix} \cos f(t) \\ -\sin f(t) \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \sin f(t) \\ \cos f(t) \end{pmatrix} \tag{15.176}$$

give a fundamental set of solutions of (15.172). A function f(t) that satisfies (15.175)
can be given by

$$f(t) = \frac{b}{\omega}\sin\omega t = \widetilde{b}\sin\omega t, \tag{15.177}$$

where $\widetilde{b} \equiv b/\omega$. Notice that at t = 0 the two column vectors of (15.176) give

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

if we choose $f(t) = \widetilde{b}\sin\omega t$ for f(t) in (15.176).
Thus, defining F(t) as

$$F(t) \equiv \begin{pmatrix} \cos(\widetilde{b}\sin\omega t) & \sin(\widetilde{b}\sin\omega t) \\ -\sin(\widetilde{b}\sin\omega t) & \cos(\widetilde{b}\sin\omega t) \end{pmatrix} \tag{15.178}$$

and taking account of (15.174), we recast the homogeneous equation (15.173) as

$$\frac{dF(t)}{dt} = F(t)A, \qquad A \equiv \begin{pmatrix} 0 & b\cos\omega t \\ -b\cos\omega t & 0 \end{pmatrix}. \tag{15.179}$$

Equation (15.179) is formally the same as (15.47), even though F(t) is not given in the form of $\exp tA$. This enables us to apply the general scheme mentioned in Sect. 15.3.2 to the present case. That is, we are going to address the given problem using the method of variation of constants such that

$$x(t) = k(t)F(t), \tag{15.78}$$

where we define $x(t) \equiv (\xi(t)\ \zeta(t))$ and $k(t) \equiv (u(t)\ v(t))$ as the "variable constants." Hence, using (15.80)–(15.90), we should be able to find the solution of (15.172) described by

$$x(t) = x(t_0)R(t_0,t) + \int_{t_0}^{t} ds\left[b(s)R(s,t)\right], \tag{15.90}$$

where $b(s) \equiv (a\cos\omega s\ \ 0)$, i.e., the inhomogeneous term in (15.172) evaluated at s. The resolvent matrix $R(s,t)$ is expressed as

$$R(s,t) = F(s)^{-1}F(t) = \begin{pmatrix} \cos[f(t)-f(s)] & \sin[f(t)-f(s)] \\ -\sin[f(t)-f(s)] & \cos[f(t)-f(s)] \end{pmatrix}, \tag{15.180}$$

where f(t) is given by (15.177). For $F(s)^{-1}$, we simply take the transposed matrix of (15.178), because it is an orthogonal matrix. Thus, we obtain

$$F(s)^{-1} = \begin{pmatrix} \cos(\widetilde{b}\sin\omega s) & -\sin(\widetilde{b}\sin\omega s) \\ \sin(\widetilde{b}\sin\omega s) & \cos(\widetilde{b}\sin\omega s) \end{pmatrix}.$$

The resolvent matrix (15.180) possesses the properties described in (15.85), (15.86), and (15.87). These can readily be checked. Also, we become aware that F(t) of (15.178) represents a rotation matrix in ℝ²; see Sect. 11.2. Therefore, F(t) and F(s)⁻¹ are commutative. Notice, however, that we do not have a resolvent matrix of the simple form that appears in (15.116). In fact, from (15.178) we find $R(s,t) \neq F(t-s)$. This is because the matrix A given in (15.179) is not constant but depends on t.
Rewriting (15.90) as

$$x(t) = x(0)R(0,t) + \int_{0}^{t} ds\left[(a\cos\omega s\ \ 0)R(s,t)\right] \tag{15.181}$$

and setting BCs as x(0) = (σ τ), with the individual components, we finally obtain

$$\xi(t) = \sigma\cos(\widetilde{b}\sin\omega t) - \tau\sin(\widetilde{b}\sin\omega t) + \frac{a}{b}\sin(\widetilde{b}\sin\omega t),$$
$$\zeta(t) = \sigma\sin(\widetilde{b}\sin\omega t) + \tau\cos(\widetilde{b}\sin\omega t) - \frac{a}{b}\cos(\widetilde{b}\sin\omega t) + \frac{a}{b}. \tag{15.182}$$

Note that differentiating both sides of (15.182) with respect to t, we recover the original equation (15.172) to be solved. Further setting ξ(0) = σ = ζ(0) = τ = 0 as BCs, we have

$$\xi(t) = \frac{a}{b}\sin(\widetilde{b}\sin\omega t), \qquad \zeta(t) = \frac{a}{b}\left[1 - \cos(\widetilde{b}\sin\omega t)\right]. \tag{15.183}$$
b

Fig. 15.3 Velocities ξ and ζ of a charged particle in the x- and z-directions, respectively. The charged particle executes a periodic motion under the influence of a linearly polarized electromagnetic wave

Notice that the above BCs correspond to the situation where the charged particle is initially placed at the origin at rest (i.e., with zero velocity). Eliminating the argument t, we get

$$\xi^2 + \left(\zeta - \frac{a}{b}\right)^2 = \left(\frac{a}{b}\right)^2. \tag{15.184}$$

Figure 15.3 depicts the particle velocities ξ and ζ in the x- and z-directions, respectively. It is interesting that although the velocity in the x-direction switches between negative and positive, that in the z-direction is kept nonnegative. This implies that the particle oscillates along the x-direction but continually drifts in the z-direction while oscillating. In fact, if we integrate ζ(t), we have a continually drifting term $\frac{a}{b}t$.
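A direct numerical integration of (15.171) reproduces the closed-form velocities (15.183), as the following sketch indicates; it assumes SciPy, and the parameter values of a, b, and ω are illustrative.

```python
# Numerical cross-check of (15.183) against direct integration of (15.171).
import numpy as np
from scipy.integrate import solve_ivp

a, b, omega = 1.0, 0.8, 2.0
bt = b / omega                       # b-tilde of (15.177)

def rhs(t, v):                       # (15.171) with v = (xi, zeta)
    xi, zeta = v
    c = np.cos(omega * t)
    return [-b * zeta * c + a * c, b * xi * c]

T = 5.0
sol = solve_ivp(rhs, (0.0, T), [0.0, 0.0], rtol=1e-10, atol=1e-12)

f = bt * np.sin(omega * T)
xi_exact = (a / b) * np.sin(f)                 # (15.183)
zeta_exact = (a / b) * (1.0 - np.cos(f))
print(sol.y[:, -1], (xi_exact, zeta_exact))    # should agree
```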
Figure 15.4 shows the Lorentz force F as a function of time in the case where the charge of the particle is positive. Again, the electric Lorentz force (represented by $\boldsymbol{E}$) switches from positive to negative in the x-direction, but the magnetic Lorentz force (represented by $\dot{\boldsymbol{x}} \times \boldsymbol{B}$) remains nonnegative in the z-direction at all times.
To precisely analyze the positional change of the particle, we need to integrate (15.182) once again. This requires a detailed numerical calculation. For this purpose the following relation is useful [7]:
$$\cos(x\sin t) = \sum_{m=-\infty}^{\infty} J_m(x)\cos mt, \qquad \sin(x\sin t) = \sum_{m=-\infty}^{\infty} J_m(x)\sin mt, \tag{15.185}$$

where the functions J_m(x) are called the Bessel functions; they satisfy the following equation:

Fig. 15.4 Lorentz force as a function of time. In (a) and (b), the phase of the electromagnetic fields E and B is reversed. As a result, the electric Lorentz force (represented by $\boldsymbol{E}$) switches from positive to negative in the x-direction, but the magnetic Lorentz force (represented by $\dot{\boldsymbol{x}} \times \boldsymbol{B}$) remains nonnegative all the time in the z-direction

$$\frac{d^2 J_n(x)}{dx^2} + \frac{1}{x}\frac{dJ_n(x)}{dx} + \left(1 - \frac{n^2}{x^2}\right)J_n(x) = 0. \tag{15.186}$$

Although we do not examine the properties of the Bessel functions in this book, the related topics are dealt with in detail in the literature [5, 7–9]. The Bessel functions are widely investigated as one of the special functions in many branches of mathematical physics.
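For reference, the expansion (15.185) can be verified numerically with the Bessel functions implemented in SciPy; in the sketch below the truncation order M and the sample values of x and t are illustrative.

```python
# Numerical check of the expansion (15.185).
import numpy as np
from scipy.special import jv     # Bessel function of the first kind J_m

x, t, M = 1.3, 0.7, 40
m = np.arange(-M, M + 1)

cos_series = np.sum(jv(m, x) * np.cos(m * t))
sin_series = np.sum(jv(m, x) * np.sin(m * t))

print(cos_series, np.cos(x * np.sin(t)))   # should agree
print(sin_series, np.sin(x * np.sin(t)))   # should agree
```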
To conclude this section, we have described an example in which the matrix A of (15.179) that characterizes the system of differential equations is not a constant matrix but depends on the parameter t. Unlike the case of a constant matrix A, it

may well be difficult to find a fundamental set of solutions. Once we find the fundamental set of solutions, however, it is always possible to construct the resolvent matrix and solve the problem using it. In this respect, we have already encountered a similar situation in Chap. 10, where Green's functions were constructed from a fundamental set of solutions.
In this section, a fundamental set of solutions could be found and, as a consequence, we could construct the resolvent matrix. It is, however, not always the case that we can recast the system of differential equations (15.66) in the form of (15.179). Nonetheless, whenever we succeed in finding the fundamental set of solutions, we can construct a resolvent matrix and, hence, solve the problem. This is a generic and powerful tool.

References

1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo
3. Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill, New York
4. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo
5. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
6. Inami T (1998) Ordinary differential equations. Iwanami, Tokyo
7. Riley KF, Hobson MP, Bence SJ (2006) Mathematical methods for physics and engineering, 3rd edn. Cambridge University Press, Cambridge
8. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
9. Hassani S (2006) Mathematical physics. Springer, New York
Part IV
Group Theory and Its Chemical Applications

The universe comprises space and matter. These two mutually stipulate their modality of existence. We often comprehend various related aspects as a manifestation of symmetry. In this part we deal with the symmetry from the point of view of group theory, outlining and emphasizing chemical applications of the methods of mathematical physics. This part supplies us with an introductory description of group theory, which forms an important field of both pure and applied mathematics. Starting with the definition of groups, we cover a variety of related topics. Of these, symmetry groups are familiar to chemists, because they deal with a variety of matter and molecules that are characterized by different types of symmetry. The symmetry group is a kind of finite group and is also called a point group. Meanwhile, we have various infinite groups, of which the rotation group is a typical example. We also mention an introductory theory of the rotation group SO(3) that deals with important topics such as the Euler angles, and we treat successive coordinate transformations.
Next, we describe the representation theory of groups. Schur's lemmas and the related grand orthogonality theorem underlie the representation theory of groups. In parallel, characters and irreducible representations are important concepts that support the representation theory. We present various representations, e.g., the regular representation, direct-product representation, and symmetric and antisymmetric representations. These have wide applications in the fields of quantum mechanics, quantum chemistry, and so forth.
On the basis of the above topics, we highlight quantum chemical applications of group theory in relation to the method of molecular orbitals. As tangible examples, we adopt aromatic molecules and methane.
The last part deals with the theory of continuous groups. The relevant theory has wide applications in many fields of both pure and applied physics and chemistry. We highlight the topics of SU(2) and SO(3) that very often appear there. Tangible examples help in understanding the essence.
Chapter 16
Introductory Group Theory

A group comprises mathematical elements that satisfy four simple definitions. A great variety of groups exists under these simple definitions, which makes group theory a distinctive field of mathematics. To get familiar with various concepts of groups, we first show several tangible examples. Group elements can be numbers (both real and complex) and matrices. More abstract mathematical elements can be included as well; examples include transformations, operations, etc., as already studied in previous parts. Once those mathematical elements form a group, they share several common notions such as classes, subgroups, and direct-product groups. In this context, readers are encouraged to think of the different kinds of groups that are familiar to them. Mapping is an important concept, as in the case of vector spaces. In particular, isomorphism and homomorphism frequently appear in group theory. These concepts are closely related to the representation theory, which is an important pillar of group theory.

16.1 Definition of Groups

In contrast to a broad range of applications, the definition of the group is simple. Let ℊ be a set of elements $g_\nu$, where ν is an index either countable (e.g., integers) or uncountable (e.g., real numbers) and the number of elements may be finite or infinite. We denote this by ℊ = {g_ν}. If the group is a finite group, we express it as

$$\mathcal{G} = \{g_1, g_2, \cdots, g_n\}, \tag{16.1}$$

where n is said to be the order of the group.
The definition of the group comprises the following four axioms with respect to a well-defined "multiplication" rule between any pair of elements. The multiplication


is denoted by a symbol “⋄” below. Note that the symbol ⋄ implies an ordinary
multiplication, an ordinary addition, etc.
(A1) If a and b are any elements of the set ℊ, then so is a ⋄ b. (We sometimes say
that the set is “closed” regarding the multiplication.)
(A2) Multiplication is associative, i.e., a ⋄ (b ⋄ c) = (a ⋄ b) ⋄ c.
(A3) The set ℊ contains an element e called the identity element such that we have a ⋄ e = e ⋄ a = a with any element a of ℊ.
(A4) For any a of ℊ, we have an element b such that a ⋄ b = b ⋄ a = e. The element b is said to be the inverse element of a. We denote $b \equiv a^{-1}$.
In the above definitions, we assume that the commutative law does not necessar-
ily hold, that is, a ⋄ b ≠ b ⋄ a. In that case the group ℊ is said to be a
non-commutative group. However, we have a case where the commutative law
holds, i.e., a ⋄ b = b ⋄ a. If so, the group ℊ is called a commutative group or an
Abelian group.
Let us think of some examples of groups. Henceforth, we follow the convention
and write ab to express a ⋄ b.
Example 16.1 We present several examples of groups below. Examples (1)–(4) are
simple, but Example (5) is general.

1. ℊ = {1, -1}. The group ℊ makes a group with respect to the multiplication. This
is an example of a finite group.
2. ℊ = {⋯, -3, -2, -1, 0, 1, 2, 3, ⋯}. The set ℊ makes a group with respect to addition. This is an infinite group. For instance, take a (>0) and form a + 1, (a + 1) + 1, [(a + 1) + 1] + 1, ⋯ again and again. Since no finite subset is closed under addition, the group must be infinite.
3. Let us start with a matrix $a = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. Then, the inverse $a^{-1} = a^3 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. We have $a^2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$. Its inverse is $a^2$ itself. These four elements make a group. That is, ℊ = {e, a, a², a³}. This is an example of cyclic groups.
4. ℊ = {1}. It is a most trivial case, but sometimes the trivial case is very important
as well. We will come back later to this point.
5. Let us think of a more general case. In Chap. 11, we discussed endomorphism on
a vector space and showed the necessary and sufficient condition for the existence
of an inverse transformation. In this relation, consider a set that comprises
matrices such that

$$GL(n, \mathbb{C}) \equiv \{A = (a_{ij});\ i, j = 1, 2, \cdots, n,\ a_{ij} \in \mathbb{C},\ \det A \neq 0\}.$$

This may be either a finite group or an infinite group. The former can be a
symmetry group and the latter can be a rotation group. This group is characterized
by a set of invertible and endomorphic linear transformations over a vector space Vn
and called a linear transformation group or a general linear group and denoted by
GL(n, ℂ), GL(Vn), GL(V ), etc. The relevant transformations are bijective. We can

Table 16.1 Multiplication table of ℊ = {e, a, b ≡ a², c ≡ a³}

ℊ | e  a  b  c
--+-----------
e | e  a  b  c
a | a  b  c  e
b | b  c  e  a
c | c  e  a  b

readily make sure that axioms (A1) to (A4) are satisfied with GL(n, ℂ). Here, the vector space can be ℂⁿ or a function space.
The structure of a finite group is tabulated in a multiplication table. This is made up such that the group elements are arranged in the first row and the first column and the intersection of an element g_i in the row and g_j in the column is designated as the product g_i ⋄ g_j. Choosing the above (3) as an example, we make its multiplication table (see Table 16.1). There we define a² = b and a³ = c.
Having a look at Table 16.1, we notice that in the individual rows and columns
each group element appears once and only once. This is well known as a
rearrangement theorem.
Theorem 16.1 Rearrangement theorem [1] In each row or each column in the
group multiplication table, individual group elements appear once and only once.
From this, each row and each column list merely rearranged group elements.
Proof Let a set ℊ = {g₁ ≡ e, g₂, ⋯, gₙ} be a group. Arbitrarily choosing any element h from ℊ and multiplying the individual elements by h, we obtain a set H = {hg₁, hg₂, ⋯, hgₙ}. Then, all the group elements of ℊ appear in H once and only once. Choosing any group element g_i, let us multiply g_i by h⁻¹ to get h⁻¹g_i. Since h⁻¹g_i must be a certain element g_k of ℊ, we put h⁻¹g_i = g_k. Multiplying both sides by h, we have g_i = hg_k. Therefore, we are able to find this very element hg_k in H, i.e., g_i in H. This implies that the element g_i necessarily appears in H. Suppose in turn that g_i appears more than once. Then, we must have g_i = hg_k = hg_l (k ≠ l). Multiplying the relation by h⁻¹, we would get h⁻¹g_i = g_k = g_l, in contradiction to the supposition. This means that g_i appears in H once and only once. This confirms that the theorem is true of each row of the group multiplication table.
A similar argument applies with a set H′ = {g₁h, g₂h, ⋯, gₙh}. This confirms in turn that the theorem is true of each column of the group multiplication table. These complete the proof.
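The rearrangement theorem can also be confirmed by brute force for the group of Table 16.1. In the sketch below (plain Python), the cyclic group is encoded by the exponents 0–3 with the product i ⋄ j = (i + j) mod 4, which is an isomorphic stand-in for {e, a, a², a³}.

```python
# Brute-force illustration of Theorem 16.1 for the cyclic group of order 4.
n = 4
G = range(n)
mult = lambda i, j: (i + j) % n       # group product in the exponent encoding

for h in G:
    row = {mult(h, g) for g in G}     # the row labeled h
    col = {mult(g, h) for g in G}     # the column labeled h
    assert row == set(G) and col == set(G)   # each is a rearrangement of G
print("rearrangement theorem verified for the cyclic group of order", n)
```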

16.2 Subgroups

As we think of subspaces in a linear vector space, we have subgroups in a group. The


definition of the subgroup is that a subset H of a group makes a group with respect to
the multiplication ⋄ that is defined for the group ℊ. The identity element makes a

group by itself. Both {e} and ℊ are subgroups as well. We often call subgroups other
than {e} and ℊ “proper” subgroups.
A necessary and sufficient condition for the subset H to be a subgroup is the
following:

(1) $h_i, h_j \in H \Longrightarrow h_i \diamond h_j \in H$,
(2) $h \in H \Longrightarrow h^{-1} \in H$.

If H is a subgroup of ℊ, it is obvious that the relations (1) and (2) hold. Conversely, if (1) and (2) hold, H is a subgroup. In fact, (1) ensures the aforementioned relation (A1). Since H is a subset of ℊ, this guarantees the associative law (A2). The relation (2) ensures (A4). Finally, in virtue of (1) and (2), $h \diamond h^{-1} = e$ is contained in H; this implies that (A3) is satisfied. Thus, H is a subgroup, because H satisfies the axioms (A1) to (A4). Of the above examples, (3) has a subgroup H = {e, a²}.
It is important to decompose a set into subsets that do not mutually contain an element (except for a special element) among them. We saw this in Part III when we decomposed a linear vector space into subspaces. In that case the said special element was the zero vector. Here let us consider a related question in its similar aspects.
Let H = {h₁ ≡ e, h₂, ⋯, h_s} be a subgroup of ℊ. Also let us consider aH where a ∈ ℊ and a ∉ H. Suppose that aH is a subset of ℊ such that aH = {ah₁, ah₂, ⋯, ah_s}. Then, we have another subset H + aH. If H contains s elements, so does aH. In fact, if it were not the case, namely, if ah_i = ah_j (i ≠ j), multiplying both sides by a⁻¹, we would have h_i = h_j, in contradiction. Next, let us take b such that b ∉ H and b ∉ aH and make up bH and H + aH + bH successively. Our question is whether these procedures decompose ℊ into subsets mutually exclusive and collectively exhaustive.
Suppose that we can succeed in such a decomposition and get

$$\mathcal{G} = g_1H + g_2H + \cdots + g_kH, \tag{16.2}$$

where g1, g2, ⋯, gk are mutually different elements with g1 being the identity e. In
that case (16.2) is said to be the left coset decomposition of ℊ by H . Similarly, right
coset decomposition can be done to give

$$\mathcal{G} = Hg_1 + Hg_2 + \cdots + Hg_k. \tag{16.3}$$

In general, however,

$$g_kH \neq Hg_k \quad \text{or} \quad g_kHg_k^{-1} \neq H. \tag{16.4}$$

Taking the case of left cosets as an example, let us examine whether different cosets mutually contain a common element. Suppose that g_iH and g_jH mutually contain a common element. Then, that element would be expressed as g_ih_p = g_jh_q (1 ≤ i, j ≤ k; 1 ≤ p, q ≤ s). Thus, we have $g_i(h_ph_q^{-1}) = g_j$. Since H is a subgroup of ℊ, $h_ph_q^{-1} \in H$. This implies that $g_j \in g_iH$. It is in contradiction to the definition of left cosets. Thus, we conclude that different cosets do not mutually contain a common element.
Suppose that the orders of ℊ and H are n and s, respectively. The k different cosets each comprise s elements, and different cosets do not mutually possess a common element; hence, we must have

$$n = sk, \tag{16.5}$$

where k is called the index of H. We will have many examples afterward.
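As a first example of (16.2) and (16.5), the following sketch (plain Python) decomposes the cyclic group of Example 16.1(3) by its subgroup H = {e, a²}; the encoding by exponents is the same as before.

```python
# Coset decomposition of the cyclic group {e, a, a^2, a^3} by H = {e, a^2}.
n = 4
G = set(range(n))
H = {0, 2}                          # exponents of e and a^2
mult = lambda i, j: (i + j) % n

cosets = {frozenset(mult(g, h) for h in H) for g in G}
print(cosets)                       # {{0, 2}, {1, 3}}: two disjoint cosets
k = len(cosets)                     # index of H
assert n == len(H) * k              # (16.5): n = s k
```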

16.3 Classes

Another method to decompose a group into subsets is conjugacy classes. A conjugate element is defined as follows: Let a be an element arbitrarily chosen from a group. Then an element gag⁻¹ is called a conjugate element or conjugate to a. If c is conjugate to b and b is conjugate to a, then c is conjugate to a. This is because

$$c = g'bg'^{-1},\ b = gag^{-1} \Longrightarrow c = g'gag^{-1}g'^{-1} = (g'g)a(g'g)^{-1}. \tag{16.6}$$

In the above a set containing a and all the elements conjugate to a is said to be a
(conjugate) class of a. Denoting this set by ∁a, we have

∁a = a, g2 ag2- 1 , g3 ag3- 1 , ⋯, gn agn- 1 : ð16:7Þ

In ∁a the same element may appear repeatedly. It is obvious that in every group
the identity element e forms a class by itself. That is,

$$∁_e = \{e\}. \tag{16.8}$$

As in the case of the decomposition of a group into (left or right) cosets, we can decompose a group into classes. If the group elements are not exhausted by a set comprising ∁_e and ∁_a, let us take b such that b ≠ e and b ∉ ∁_a and make ∁_b similarly to (16.7). Repeating this procedure, we should be able to decompose a group into classes. In fact, if the group elements have not yet been exhausted after these procedures, take a remaining element z and make a class. If the only remaining element at this moment is z, then z makes a class by itself (as in the case of e). Notice that for an Abelian group every element makes a class by itself. Thus, with a finite group, we have a decomposition such that

$$\mathcal{G} = ∁_e + ∁_a + ∁_b + \cdots + ∁_z. \tag{16.9}$$

To show that (16.9) is really a decomposition, suppose that, for instance, a set ∁_a ∩ ∁_b is not an empty set and that x ∈ ∁_a ∩ ∁_b. Then we must have α and β that satisfy the following relation: x = αaα⁻¹ = βbβ⁻¹, i.e., b = β⁻¹αaα⁻¹β = (β⁻¹α)a(β⁻¹α)⁻¹. This implies that b has already been included in ∁_a, in contradiction to the supposition. Thus, (16.9) is in fact a decomposition of ℊ into a finite number of classes.
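For a group in which the classes are nontrivial, one needs a non-Abelian example. The sketch below (plain Python) computes the class decomposition (16.9) of the symmetric group S₃, encoded as permutation tuples and chosen here purely for illustration.

```python
# Class decomposition (16.9) of S3; permutations p satisfy (p*q)(i) = p[q[i]].
from itertools import permutations

G = list(permutations(range(3)))
mult = lambda p, q: tuple(p[q[i]] for i in range(3))
inv = lambda p: tuple(p.index(i) for i in range(3))

classes = []
remaining = set(G)
while remaining:
    a = next(iter(remaining))
    cls = {mult(mult(g, a), inv(g)) for g in G}   # all g a g^{-1}
    classes.append(cls)
    remaining -= cls
print(classes)   # the identity, the three transpositions, the two 3-cycles
```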
In the above we thought of a class conjugate to a single element. This notion can be extended to a class conjugate to a subgroup. Let H be a subgroup of ℊ. Let g be an element of ℊ. Let us now consider a set H′ = gHg⁻¹. The set H′ is a subgroup of ℊ and is called a conjugate subgroup.
In fact, let h_i and h_j be any two elements of H, that is, let gh_ig⁻¹ and gh_jg⁻¹ be any two elements of H′. Then, we have

$$(gh_ig^{-1})(gh_jg^{-1}) = gh_ih_jg^{-1} = gh_kg^{-1}, \tag{16.10}$$

where $h_k = h_ih_j \in H$. Hence, $gh_kg^{-1} \in H'$. Meanwhile, $(gh_ig^{-1})^{-1} = gh_i^{-1}g^{-1} \in H'$. Thus, conditions (1) and (2) of Sect. 16.2 are satisfied with H′. Therefore, H′ is a subgroup of ℊ. The subgroup H′ has the same order as H. This is because, for any two different elements h_i and h_j, gh_ig⁻¹ ≠ gh_jg⁻¹.
If for ∀g ∈ ℊ and a subgroup H we have the following equality

$$g^{-1}Hg = H, \tag{16.11}$$

such a subgroup H is said to be an invariant subgroup. If (16.11) holds, H should be a sum of classes (reader, please show this). A set comprising only the identity, i.e., {e}, forms a class. Therefore, if H is a proper subgroup, H must contain two or more classes. The relation (16.11) can be rewritten as

$$gH = Hg. \tag{16.12}$$

This implies that the left coset is identical to the right coset. Thus, as far as we are
dealing with a coset pertinent to an invariant subgroup, we do not have to distinguish
left and right cosets.
Now let us consider anew the (left) coset decomposition of ℊ by an invariant subgroup H,

$$\mathcal{G} = g_1H + g_2H + \cdots + g_kH, \tag{16.13}$$

where we have H = {h₁ ≡ e, h₂, ⋯, h_s}. Then the multiplication of two elements that belong to the cosets g_iH and g_jH is expressed as

$$(g_ih_l)(g_jh_m) = g_i(g_jg_j^{-1})h_lg_jh_m = g_ig_j(g_j^{-1}h_lg_j)h_m = g_ig_jh_ph_m = g_ig_jh_q, \tag{16.14}$$

where the third equality comes from (16.11). That is, we should have ∃h_p such that $g_j^{-1}h_lg_j = h_p$, and $h_ph_m = h_q$. In (16.14), $h_\alpha \in H$ (α stands for l, m, p, q, etc. with 1 ≤ α ≤ s). Note that $g_ig_jh_q \in g_ig_jH$. Accordingly, a product of elements belonging to g_iH and g_jH belongs to g_ig_jH. We rewrite (16.14) as a relation between the sets:

$$(g_iH)(g_jH) = g_ig_jH. \tag{16.15}$$

Viewing the LHS of (16.15) as a product of two cosets, we find that the said product is a coset as well. This implies that the collection of the cosets forms a group. Such a group that possesses cosets as elements is said to be a factor group or quotient group. In this context, the multiplication is a product of cosets. We denote the factor group by

ℊ/H.

The identity element of this factor group is H. This is because in (16.15), putting g_i = e, we get H(g_jH) = g_jH. Alternatively, putting g_j = e, we have (g_iH)H = g_iH. In (16.15), moreover, putting $g_j = g_i^{-1}$, we get

$$(g_iH)(g_i^{-1}H) = g_ig_i^{-1}H = H. \tag{16.16}$$

Hence, $(g_iH)^{-1} = g_i^{-1}H$. That is, the inverse element of g_iH is $g_i^{-1}H$.

16.4 Isomorphism and Homomorphism

As in the case of the linear vector space, we consider the mapping between group
elements. Of these, the notion of isomorphism and homomorphism is important.
Definition 16.1 Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups and let a
mapping ℊ → ℊ′ exist. Suppose that there is a one-to-one correspondence (i.e.,
injective mapping)

$$x \leftrightarrow x', \quad y \leftrightarrow y', \quad \cdots$$

between the elements such that xy = z implies that x′y′ = z′ and vice versa.
Meanwhile, any element in ℊ′ must be the image of some element of ℊ. That is,
the mapping is surjective as well and, hence, the mapping is bijective. Then, the two

groups ℊ and ℊ′ are said to be isomorphic. The relevant mapping is called an isomorphism. We symbolically denote this relation by

$$\mathcal{G} \cong \mathcal{G}'.$$

Note that the aforementioned groups can be either a finite group or an infinite
group. We did not designate identity elements. Suppose that x is the identity e. Then,
from the relations xy = z and x′y′ = z′, we have

$$ey = z = y, \quad x'y' = z' = y'. \tag{16.17}$$

Then, we get

$$x' = e', \quad \text{i.e.,} \quad e \leftrightarrow e'. \tag{16.18}$$

Also let us put y = x⁻¹. Then,

$$xx^{-1} = z = e, \quad x'(x^{-1})' = e', \quad x'y' = z' = e'. \tag{16.19}$$

Comparing the second and third equations of (16.19), we get

$$y' = (x')^{-1} = (x^{-1})'. \tag{16.20}$$

The bijective character mentioned above can be somewhat loosened in such a way that the one-to-one correspondence is replaced with an n-to-one correspondence. We have the following definition.
Definition 16.2 Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups and let a
mapping ℊ → ℊ′ exist. Also let a mapping ρ: ℊ → ℊ′ exist such that with any
arbitrarily chosen two elements, the following relation holds:

$$\rho(x)\rho(y) = \rho(xy). \tag{16.21}$$

Then, the two groups ℊ and ℊ′ are said to be homomorphic. The relevant
mapping is called homomorphism. We symbolically denote this relation by

$$\mathcal{G} \sim \mathcal{G}'.$$

In this case, we have

$$\rho(e)\rho(e) = \rho(ee) = \rho(e), \quad \text{i.e.,} \quad \rho(e) = e',$$

where e′ is an identity element of ℊ′. Also, we have



$$\rho(x)\rho(x^{-1}) = \rho(xx^{-1}) = \rho(e) = e'.$$

Therefore,

$$[\rho(x)]^{-1} = \rho(x^{-1}).$$

The two groups can be either finite or infinite. Note that in the above the mapping is not necessarily injective. The mapping may or may not be surjective. Regarding the identity and inverse elements, we have the same relations as (16.18) and (16.20). From Definitions 16.1 and 16.2, we say that a bijective homomorphism is an isomorphism.
Let us introduce an important notion, the kernel of a mapping. In this regard, we have the following definition:
Definition 16.3 Let ℊ = {e, x, y, ⋯} and ℊ′ = {e′, x′, y′, ⋯} be groups and
let e and e′ be the identity elements. Suppose that there exists a homomorphic
mapping ρ: ℊ → ℊ′. Also let F be a subset of ℊ such that

$$\rho(F) = e'. \tag{16.22}$$

Then, F is said to be the kernel of ρ.
Regarding the kernel, we have the following important theorems.
Theorem 16.2 Let ℊ = {e, x, y, ⋯} and ℊ′ = {e′, x′, y′, ⋯} be groups, where e and e′ are the identity elements. A necessary and sufficient condition for a surjective and homomorphic mapping ρ: ℊ → ℊ′ to be isomorphic is that the kernel F = {e}.
Proof We assume that F = {e}. Suppose that ρ(x) = ρ(y). Then, we have

$$\rho(x)[\rho(y)]^{-1} = \rho(x)\rho(y^{-1}) = \rho(xy^{-1}) = e'. \tag{16.23}$$

The first and second equalities result from the homomorphism of ρ. Since F = {e}, xy⁻¹ = e, i.e., x = y. Therefore, ρ is injective (i.e., a one-to-one correspondence). As ρ is surjective from the assumption, ρ is bijective. The mapping ρ is isomorphic accordingly.
Conversely, suppose that ρ is isomorphic. Also suppose that ρ(x) = e′ for some x ∈ ℊ. From (16.18), ρ(e) = e′. We have ρ(x) = ρ(e) = e′ ⟹ x = e due to the isomorphism of ρ (i.e., one-to-one correspondence). This implies F = {e}. This completes the proof.
We become aware of a close relationship between Theorem 16.2 and the linear transformation versus its kernel already mentioned in Sect. 11.2 of Part III. Figure 16.1 shows this relationship. Figure 16.1a represents the homomorphic mapping ρ: ℊ → ℊ′ between two groups, whereas Fig. 16.1b shows a linear transformation (endomorphism) A: Vⁿ → Vⁿ in a vector space Vⁿ.

Fig. 16.1 Mapping in a group and a vector space. (a) Homomorphic mapping ρ: ℊ → ℊ′ between two groups; when bijective, ρ is an isomorphism. (b) Linear transformation (endomorphism) A: Vⁿ → Vⁿ in a vector space Vⁿ; when bijective, A is invertible

Theorem 16.3 Suppose that there exists a homomorphic mapping ρ: ℊ → ℊ′,


where ℊ and ℊ′ are groups. Then, a kernel F of ρ is an invariant subgroup of ℊ.
Proof Let k_i and k_j be any two arbitrarily chosen elements of F. Then,

$$\rho(k_i) = \rho(k_j) = e', \tag{16.24}$$

where e′ is the identity element of ℊ′. From (16.21) we have

$$\rho(k_ik_j) = \rho(k_i)\rho(k_j) = e'e' = e'. \tag{16.25}$$

Therefore, $k_ik_j \in F$. Meanwhile, from (16.20) we have

$$\rho(k_i^{-1}) = [\rho(k_i)]^{-1} = (e')^{-1} = e'. \tag{16.26}$$

Then, $k_i^{-1} \in F$. Thus, F is a subgroup of ℊ.
Next, for ∀g ∈ ℊ we have

$$\rho(gk_ig^{-1}) = \rho(g)\rho(k_i)\rho(g^{-1}) = \rho(g)e'\rho(g^{-1}) = e'. \tag{16.27}$$

Accordingly, we have $gk_ig^{-1} \in F$. Thus, $gFg^{-1} \subset F$. Since g is chosen arbitrarily, replacing it with g⁻¹, we have $g^{-1}Fg \subset F$. Multiplying by g and g⁻¹ on both sides from the left and right, respectively, we get $F \subset gFg^{-1}$. Consequently, we get

$$gFg^{-1} = F. \tag{16.28}$$

This implies that the kernel F of ρ is an invariant subgroup of ℊ.



Theorem 16.4 Homomorphism Theorem Let ℊ = {x, y, ⋯} and ℊ′ = {x′, y′, ⋯} be groups and let a homomorphic (and surjective) mapping ρ: ℊ → ℊ′ exist. Also let F be the kernel of ρ. Let us define a surjective mapping $\widetilde{\rho}: \mathcal{G}/F \to \mathcal{G}'$ such that

$$\widetilde{\rho}(g_iF) = \rho(g_i). \tag{16.29}$$

Then, $\widetilde{\rho}$ is an isomorphic mapping.
Proof From (16.15) and (16.21), it is obvious that $\widetilde{\rho}$ is homomorphic. The confirmation is left to readers. Let g_iF and g_jF be two different cosets. Suppose here that ρ(g_i) = ρ(g_j). Then we have

$$\rho(g_i^{-1}g_j) = \rho(g_i^{-1})\rho(g_j) = [\rho(g_i)]^{-1}\rho(g_j) = [\rho(g_i)]^{-1}\rho(g_i) = e'. \tag{16.30}$$

This implies that $g_i^{-1}g_j \in F$. That is, we would have $g_j \in g_iF$. This is in contradiction to the definition of a coset. Thus, we should have ρ(g_i) ≠ ρ(g_j). In other words, the different cosets g_iF and g_jF are mapped into different elements ρ(g_i) and ρ(g_j) in ℊ′. That is, $\widetilde{\rho}$ is isomorphic; i.e., $\mathcal{G}/F \cong \mathcal{G}'$.
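The homomorphism theorem can be made concrete with a toy mapping. In the sketch below (plain Python), ρ sends the cyclic group {e, a, a², a³} (again encoded by exponents) onto the group {1, -1} of Example 16.1(1); the kernel turns out to be {e, a²}, and the two cosets map one-to-one onto {1, -1}.

```python
# Toy instance of the homomorphism theorem: rho(a^m) = (-1)^m.
n = 4
G = range(n)
mult = lambda i, j: (i + j) % n
rho = lambda g: (-1) ** g

# rho is homomorphic, and its kernel is F = {0, 2} (i.e., {e, a^2})
assert all(rho(mult(x, y)) == rho(x) * rho(y) for x in G for y in G)
F = {g for g in G if rho(g) == 1}
print(F)                                        # {0, 2}

# each coset gF is mapped onto a single element of {1, -1}: G/F ≅ G'
cosets = {frozenset(mult(g, f) for f in F) for g in G}
images = {frozenset(rho(x) for x in c) for c in cosets}
print(images)                                   # {{1}, {-1}}
```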

16.5 Direct-Product Groups

So far we have investigated basic properties of groups. In Sect. 16.4 we examined factor groups. The homomorphism theorem shows that the factor group is characterized by division. In the case of a finite group, the order of the group is reduced. In this section, we study the opposite operation, i.e., properties of the direct product of groups, or direct-product groups.
Let H = {h₁ ≡ e, h₂, ⋯, h_m} and H′ = {h′₁ ≡ e, h′₂, ⋯, h′ₙ} be groups of order m and n, respectively. Suppose that (1) ∀h_i (1 ≤ i ≤ m) and ∀h′_j (1 ≤ j ≤ n) commute, i.e., $h_ih'_j = h'_jh_i$, and that (2) H ∩ H′ = {e}. Under these conditions let us construct a set ℊ such that

$$\mathcal{G} = \{h_1h'_1 \equiv e,\ h_ih'_j\ (1 \leq i \leq m,\ 1 \leq j \leq n)\}. \tag{16.31}$$

In other words, ℊ is a set comprising mn elements $h_ih'_j$. A product of elements is defined as

$$(h_ih'_j)(h_kh'_l) = h_ih_kh'_jh'_l = h_ph'_q, \tag{16.32}$$



where $h_p = h_ih_k$ and $h'_q = h'_jh'_l$. The identity element is ee = e; that is, $e(h_ih'_j) = (h_ih'_j)e$. The inverse element is $(h_ih'_j)^{-1} = h_j'^{-1}h_i^{-1} = h_i^{-1}h_j'^{-1}$. The associative law is obvious from $h_ih'_j = h'_jh_i$.
Thus, ℊ forms a group. This is said to be a direct product of groups, or a direct-product group. The groups H and H′ are called direct factors of ℊ. In this case, we succinctly represent

$$\mathcal{G} = H \otimes H'.$$

In the above, the condition (2) is equivalent to the statement that ∀g ∈ ℊ is uniquely represented as

$$g = hh';\quad h \in H,\ h' \in H'. \tag{16.33}$$

In fact, suppose that H ∩ H′ = {e} and that g can be represented in two ways such that

$$g = h_1h'_1 = h_2h'_2;\quad h_1, h_2 \in H,\ h'_1, h'_2 \in H'. \tag{16.34}$$

Then, we have

$$h_2^{-1}h_1 = h'_2(h'_1)^{-1};\quad h_2^{-1}h_1 \in H,\ h'_2(h'_1)^{-1} \in H'. \tag{16.35}$$

From the supposition, we get

$$h_2^{-1}h_1 = h'_2(h'_1)^{-1} = e. \tag{16.36}$$

That is, h₂ = h₁ and h′₂ = h′₁. This means that the representation is unique.
Conversely, suppose that the representation is unique and that x ∈ H ∩ H′. Then we must have

$$x = xe = ex. \tag{16.37}$$

Thanks to the uniqueness of the representation, x = e. This implies H ∩ H′ = {e}.


Now suppose h ∈ H. Then for ∀g ∈ ℊ, putting $g = h_\nu h'_\mu$, we have

$$ghg^{-1} = h_\nu h'_\mu h(h'_\mu)^{-1}h_\nu^{-1} = h_\nu h\,h'_\mu(h'_\mu)^{-1}h_\nu^{-1} = h_\nu hh_\nu^{-1} \in H. \tag{16.38}$$

Then we have $gHg^{-1} \subset H$. Similarly to the proof of Theorem 16.3, we get

$$gHg^{-1} = H. \tag{16.39}$$

This shows that H is an invariant subgroup of ℊ. Similarly, H 0 is an invariant


subgroup as well.
Regarding the unique representation of the group element of a direct-product
group, we become again aware of the close relationship between the direct product
and direct sum that was mentioned earlier in Part III.

Reference

1. Cotton FA (1990) Chemical applications of group theory, 3rd edn. Wiley, New York
Chapter 17
Symmetry Groups

We have many opportunities to observe symmetry, both macroscopic and microscopic, in the natural world. First, we need to formulate the symmetry appropriately. For this purpose, we must regard various symmetry operations as mathematical elements and classify these operations under several categories. In Part III we examined various properties of vectors and their transformations. We also showed that the vector transformation can be viewed as a coordinate transformation. On these topics, we focused upon abstract concepts in various ways. On another front, however, we have not paid attention to specific geometric objects, especially molecules. In this chapter, we study the symmetry of these concrete objects. For this, it will be indispensable to correctly understand a variety of symmetry operations. At the same time, we deal with the vector and coordinate transformations as group elements. Among such transformations, rotations occupy a central place in group theory and related fields of mathematics. Regarding the three-dimensional Euclidean space, SO(3) is particularly important. This is characterized by an infinite group, in contrast to the various symmetry groups (or point groups) we investigate in the former parts of this chapter.

17.1 A Variety of Symmetry Operations

To understand various aspects of symmetry operations, it is convenient and essential


to consider a general point that is fixed in a three-dimensional Euclidean space and to
examine how this point is transformed in the space. In parallel to the description in
Part III, we express the coordinate of the general point P as


$$P = \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{17.1}$$

Note that P may be on, within, or outside a geometric object or molecule that we are dealing with. The relevant position vector $\boldsymbol{x}$ for P is expressed as

$$\boldsymbol{x} = x\boldsymbol{e}_1 + y\boldsymbol{e}_2 + z\boldsymbol{e}_3 = (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} x \\ y \\ z \end{pmatrix}, \tag{17.2}$$

where $\boldsymbol{e}_1$, $\boldsymbol{e}_2$, and $\boldsymbol{e}_3$ denote the orthonormal basis vectors pointing to the positive directions of the x-, y-, and z-axes, respectively. Similarly we denote a linear transformation A by

$$A(\boldsymbol{x}) = (\boldsymbol{e}_1\ \boldsymbol{e}_2\ \boldsymbol{e}_3)\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{17.3}$$

Among various linear transformations that are represented by matrices, orthogo-


nal transformations are the simplest and most widely used. We use orthogonal
matrices to represent the orthogonal transformations accordingly.
Let us think of a movement or translation of a geometric object and an operation that causes such a movement. First suppose that the geometric object is fixed on a coordinate system. Then the object is moved (or translated) to another place. If before and after such a movement (or translation) one cannot tell whether the object has been moved, we say that the object possesses a "symmetry" in a certain sense. Thus, we have to specify this symmetry. In that context, group theory deals with the symmetry and defines it clearly.
To tell whether the object has been moved (to another place), we usually distinguish it by a change in (1) positional relationship and (2) attribute or property. To make the situation simple, let us consider the following example:
Example 17.1 We have two round disks, i.e., Disk A and Disk B. Suppose that Disk
A is a solid white disk, whereas Disk B is partly painted black (see Fig. 17.1). In
Fig. 17.1 we are thinking of a rotation of an object (e.g., round disk) around an axis
standing on its center (C) and stretching perpendicularly to the object plane.
If an arbitrarily chosen positional vector fixed on the object before the rotation is
moved to another position that was not originally occupied by the object, then we
recognize that the object has certainly been moved. For instance, imagine that a
round disk having a through-hole located aside from center is rotating. What about
the case where that position was originally occupied by the object, then? We have
two possibilities. The first alternative is that we cannot recognize that the object has
been moved. The second one is that we can yet recognize that the object has been

Fig. 17.1 Rotation of an object (Disk A or Disk B) around its center C. (a) Case where we cannot recognize that the object has been moved. (b) Case where we can recognize that the object has been moved because of its attribute (i.e., because the round disk is partly painted black)

moved. According to Fig. 17.1a, b, we have the former case and the latter case, respectively. In the latter case, we have recognized the movement of the object by its attribute, i.e., by the fact that the object is partly painted black. □
In the above example, we do not have to be rigorous. We have a clear intuitive criterion for judging whether a geometric object has been moved. From now on, we assume that the geometric character of an object pertains to both its positional relationship and attribute. We define the equivalent (or indistinguishable) disposition of an object and the operation that yields such an equivalent disposition as follows:
Definition 17.1
1. Symmetry operation: A geometric operation that produces an equivalent
(or indistinguishable) disposition of an object.
2. Equivalent (or indistinguishable) disposition: Suppose that regarding a geometric
operation of an object, we cannot recognize that the object has been moved before
and after that geometric operation. In that case, the original disposition of the
object and the resulting disposition reached after the geometric operation are
referred to as an equivalent disposition. The relevant geometric operation is the
symmetry operation. ▽
Here we should clearly distinguish translation (i.e., parallel displacement) from the abovementioned symmetry operations. This is because for a geometric object to possess the translation symmetry, the object must be infinite in extent, typically an infinite crystal lattice. The relevant discipline is widely studied as the space group and has a broad class of applications in physics and chemistry. However, we will not deal with the space group or associated topics, but focus our attention upon symmetry groups in this book.
In the above example, the rotation is a symmetry operation with Fig. 17.1a, but the said geometric operation is not a symmetry operation with Fig. 17.1b.
Let us further inspect properties of the symmetry operation. Let us consider a set H consisting of symmetry operations. Let a and b be any two symmetry operations of H. Then, (1) a ⋄ b is a symmetry operation as well. (2) Multiplication of successive symmetry operations a, b, c is associative, i.e., a ⋄ (b ⋄ c) = (a ⋄ b) ⋄ c. (3) The set H contains an element e called the identity element such that we have a ⋄ e = e ⋄ a = a with any element a of H. Operating "nothing" should be e. If the

rotation is relevant, a 2π rotation is thought of as e. These are intuitively acceptable. (4) For any a of H, we have an element b such that a ⋄ b = b ⋄ a = e. The element b is said to be the inverse element of a. We denote it by $b \equiv a^{-1}$. The inverse element corresponds to an operation that brings the disposition of a geometric object back to the original disposition. Thus, H forms a group. We call an H satisfying the above criteria a symmetry group. A symmetry group is called a point group as well. This is because a point group comprises symmetry operations of geometric objects as group elements, and those objects have at least one fixed point after the relevant symmetry operation. The name of a point group comes from this fact.
As mentioned above, a symmetry operation is best characterized by a (3, 3) orthogonal matrix. In Example 17.1, e.g., the π rotation is represented by an orthogonal matrix A such that

$$A = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.4}$$

This operation represents a π rotation around the z-axis. Let us think of another symmetry operation described by

$$B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.5}$$

This produces a mirror symmetry with respect to the xy-plane. Then, we have

$$C \equiv AB = BA = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.6}$$

The operation C shows an inversion about the origin. Thus, A, B, and C along
with an identity element E form a group. Here E is expressed as

$$E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.7}$$

The above group is represented by four three-dimensional diagonal matrices


whose elements are 1 or -1. Therefore, it is evident that an inverse element of
A, B, and C is A, B, and C itself, respectively. The said group is commutative
(Abelian) and said to be a four-group (or Klein four-group, more specifically) [1].
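The group character of {E, A, B, C} is easy to verify with the explicit matrices (17.4)–(17.7). The following sketch assumes NumPy.

```python
# Check that {E, A, B, C} closes into an Abelian (Klein four-) group.
import numpy as np

E = np.eye(3)
A = np.diag([-1.0, -1.0, 1.0])    # pi rotation around z, (17.4)
B = np.diag([1.0, 1.0, -1.0])     # mirror in the xy-plane, (17.5)
C = np.diag([-1.0, -1.0, -1.0])   # inversion, (17.6)
group = [E, A, B, C]

def member(M):
    return any(np.allclose(M, X) for X in group)

for X in group:
    for Y in group:
        assert member(X @ Y)                  # closure (A1)
        assert np.allclose(X @ Y, Y @ X)      # Abelian
    assert np.allclose(X @ X, E)              # each element is its own inverse
print("Klein four-group confirmed")
```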
Meanwhile, we have a number of non-commutative groups. From the point of view of the matrix structure, non-commutativity comes from off-diagonal elements of the matrix. A typical example is a rotation matrix with a rotation angle different from zero

Fig. 17.2 Rotation by θ around the z-axis

or nπ (n: integer). For later use, let us obtain a matrix form that expresses a θ rotation around the z-axis. Figure 17.2 depicts a graphical illustration for this.
The matrix R has the following form:

$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.8}$$

Note that R has been reduced. This implies that the three-dimensional Euclidean
space is decomposed into a two-dimensional subspace (xy-plane) and a
one-dimensional subspace (z-axis). The xy-coordinates are not mixed with the z-
component after the rotation R. Note, however, that if the rotation axis is oblique
against the xy-plane, this is not the case. We will come back to this point later.
Taking only the xy-coordinates in Fig. 17.3, we make a calculation. Using an
addition theorem of trigonometric functions, we get

$$x' = r\cos(\theta + \alpha) = r(\cos\alpha\cos\theta - \sin\alpha\sin\theta) = x\cos\theta - y\sin\theta, \tag{17.9}$$
$$y' = r\sin(\theta + \alpha) = r(\sin\alpha\cos\theta + \cos\alpha\sin\theta) = y\cos\theta + x\sin\theta, \tag{17.10}$$

where we used x = r cos α and y = r sin α. Combining (17.9) and (17.10), as a matrix
form, we get

Fig. 17.3 Transformation of the xy-coordinates by a θ rotation

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}. \tag{17.11}$$

The matrix on the RHS of (17.11) is the same as (11.31) and represents a transformation matrix of a rotation angle θ within the xy-plane. Whereas in Chap. 11 we considered this from the point of view of the transformation of basis vectors, here we deal with the transformations of coordinates in the fixed coordinate system. If θ ≠ nπ (n: integer), the off-diagonal elements do not vanish. Including the z-component, we get (17.8).
Let us summarize symmetry operations and their (3, 3) matrix representations.
The coordinates before and after a symmetry operation are expressed as

x x0
y and y0 , respectively:
z z0

1. Identity transformation:
To leave a geometric object or a coordinate system unchanged (or unmoved).
By convention, we denote it by a capital letter E. It is represented by a (3, 3)
identity matrix.
2. Rotation symmetry around a rotation axis:
Here a “proper” rotation is intended. We denote a rotation by a rotation axis
and its magnitude (i.e., rotation angle). Thus, we have

$$R_{z\theta} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad R_{y\phi} = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix}, \quad R_{x\varphi} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}. \tag{17.12}$$

With $R_{y\phi}$ we first consider the following coordinate transformation:

$$\begin{pmatrix} z' \\ x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} z \\ x \\ y \end{pmatrix}. \tag{17.13}$$

This can easily be visualized in Fig. 17.4; consider that the cyclic permutation of $(x\ y\ z)^T$ produces $(z\ x\ y)^T$. Shuffling the order of coordinates, we get

$$\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{17.14}$$

By convention of the symmetry groups, the following notation Cₙ is used to denote a rotation. The subscript n of Cₙ represents the order of the rotation axis. The order means the largest number n such that the rotation through 2π/n gives an equivalent configuration. Successive rotations of m times (m < n) are denoted by

$$C_n^m.$$

If m = n, the successive rotations produce an equivalent configuration the same as at the beginning, i.e., $C_n^n = E$. The rotation angles θ, φ, etc. used above are restricted to 2πm/n accordingly.
3. Mirror symmetry with respect to a plane of mirror symmetry:
We denote a mirror symmetry by a mirror symmetry plane (e.g., xy-plane, yz-
plane, etc.). We have

$$M_{xy} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \quad M_{yz} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad M_{zx} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.15}$$

Fig. 17.4 Rotation by ϕ around the y-axis

The mirror symmetry is usually denoted by σ v, σ h, and σ d, whose subscripts stand


for “vertical,” “horizontal,” and “dihedral,” respectively. Among these symmetry
operations, σ v and σ d include a rotation axis in the symmetry plane, while σ h is
perpendicular to the rotation axis if such an axis exists. Notice that a group belonging
to Cs symmetry possesses only E and σ h. Although σ h can exist by itself as a mirror
symmetry, neither σ v nor σ d can exist as a mirror symmetry by itself. We will come
back to this point later.
4. Inversion symmetry with respect to a center of inversion:
We specify an inversion center if necessary, e.g., an origin of a coordinate
system O. The operation of inversion is denoted by

$$I_O = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.16}$$

Note that, as is obvious from the matrix form, $I_O$ is commutable with any other symmetry operation. Note also that $I_O$ can be expressed as successive symmetry operations, or a product of symmetry operations. For instance, we have

$$I_O = R_{z\pi}M_{xy} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.17}$$

Note that Rzπ and Mxy are commutable, i.e., RzπMxy = MxyRzπ .
5. Improper rotation:
This is a combination of a proper rotation and a reflection by a mirror
symmetry plane. That is, rotation around an axis is performed first and then
reflection is carried out by a mirror plane that is perpendicular to the rotation
axis. For instance, an improper rotation is expressed as

$$M_{xy}R_{z\theta} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.18}$$

As mentioned just above, the inversion symmetry I can be viewed as an improper


rotation. Note that in this case the reflection and rotation operations are commutable.
However, we will follow a conventional custom that considers the inversion sym-
metry as an independent symmetry operation. Readers may well wonder why we
need to consider the improper rotation. The answer is simple; it solely rests upon the
axiom (A1) of the group theory. A group must be closed with respect to the
multiplication.
The improper rotations are usually denoted by Sn. A subscript n again stands for
an order of rotation.

17.2 Successive Symmetry Operations

Let us now consider successive reflections in different planes and successive rotations about different axes [2]. Figure 17.5a displays two reflections with respect to the planes σ and σ̃, both perpendicular to the xy-plane. The said planes make a dihedral angle θ, with their intersection line identical to the z-axis. Also, the plane σ is identical with the zx-plane. Suppose that an arrow lies on the xy-plane perpendicularly to the zx-plane. As in (17.15), the operation σ is represented as

Fig. 17.5 Two successive reflections about two planes σ and σ̃ that make an angle θ. (a) Reflections σ and σ̃ with respect to the two planes. (b) Successive operations of σ and σ̃ in this order. The combined operation is denoted by σ̃σ. The operations result in a 2θ rotation around the z-axis. (c) Successive operations of σ̃ and σ in this order. The combined operation is denoted by σσ̃. The operations result in a -2θ rotation around the z-axis

$$\sigma = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.19}$$

To determine a matrix representation of σ̃, we calculate the matrix similarly to the above case. As a result we have

$$\widetilde{\sigma} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ \sin 2\theta & -\cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.20}$$

Notice that this matrix representation is referred to the original xyz-coordinate system; see the discussion of Sect. 11.4. Hence, we describe the successive transformations σ followed by σ̃ as

$$\widetilde{\sigma}\sigma = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.21}$$

The expression (17.21) means that the multiplication should be done first by σ and then by σ̃; see Fig. 17.5b. Note that σ and σ̃ are conjugate to each other (Sect. 16.3). In this case, the combined operations produce a 2θ rotation around the z-axis. If, on the other hand, the multiplication is made first by σ̃ and then by σ, we have a -2θ rotation around the z-axis; see Fig. 17.5c. As a matrix representation, we have

$$\sigma\widetilde{\sigma} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ -\sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.22}$$

Thus, successive operations of reflection by the planes that make a dihedral angle θ yield a rotation ±2θ around the z-axis (i.e., the intersection line of σ and σ̃). The operation σσ̃ is an inverse to σ̃σ. That is,

$$(\sigma\widetilde{\sigma})(\widetilde{\sigma}\sigma) = E. \tag{17.23}$$

We have $\det\widetilde{\sigma}\sigma = \det\sigma\widetilde{\sigma} = 1$.
Meanwhile, putting

$$R_{2\theta} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{17.24}$$

we have

$$\widetilde{\sigma}\sigma = R_{2\theta} \quad \text{or} \quad \widetilde{\sigma} = R_{2\theta}\sigma. \tag{17.25}$$

This implies the following: Suppose a plane σ and a straight line on it. Also suppose that one first makes a reflection about σ and then makes a 2θ rotation around the said straight line. Then, the resulting transformation is equivalent to a reflection about σ̃ that makes an angle θ with σ. At the same time, the said straight line is the intersection line of σ and σ̃. Note that the dihedral angle between the two planes σ and σ̃ is half the angle of the rotation. Thus, any two of the symmetry operations related to (17.25) are mutually dependent; any two of them produce the third symmetry operation.
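The relations (17.20)–(17.25) can be confirmed numerically; the sketch below assumes NumPy, and θ is an arbitrary dihedral angle.

```python
# Numerical confirmation of (17.20)-(17.25).
import numpy as np

theta = 0.37

def Rz(phi):          # rotation by phi around the z-axis, cf. (17.8)
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

sigma = np.diag([1.0, -1.0, 1.0])              # (17.19), the zx-plane
sigma_t = Rz(theta) @ sigma @ Rz(-theta)       # (17.20)

assert np.allclose(sigma_t @ sigma, Rz(2.0 * theta))    # (17.21), (17.25)
assert np.allclose(sigma @ sigma_t, Rz(-2.0 * theta))   # (17.22)
assert np.allclose((sigma @ sigma_t) @ (sigma_t @ sigma), np.eye(3))  # (17.23)
```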
In the above illustration, we did not take account of the presence of a symmetry axis. If the aforementioned axis is a symmetry axis Cₙ, we must have

$$2\theta = 2\pi/n \quad \text{or} \quad n = \pi/\theta. \tag{17.26}$$

From a symmetry requirement, there should be n planes of mirror symmetry in combination with the Cₙ axis. Moreover, the intersection line of these n mirror symmetry planes should coincide with that Cₙ axis. This can be seen in various Cₙᵥ groups.
Next we consider other successive symmetry operations. Suppose that there are two C₂ axes (C_xπ and C_aπ as shown) that intersect at an angle θ (Fig. 17.6). Of these, C_xπ is identical to the x-axis and the other (C_aπ) lies on the xy-plane, making an angle θ with C_xπ. Following procedures similar to the above, we have matrix representations of the successive C₂ operations in reference to the xyz-system such that

Fig. 17.6 Two successive π rotations around two C₂ axes, C_xπ and C_aπ, that intersect at an angle θ. Of these, C_xπ is identical to the x-axis (not shown)

$$C_{x\pi} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix},$$
$$C_{a\pi} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ \sin 2\theta & -\cos 2\theta & 0 \\ 0 & 0 & -1 \end{pmatrix}, \tag{17.27}$$

where Caπ can be calculated similarly to (17.20). Again we get

$$C_{a\pi}C_{x\pi} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad C_{x\pi}C_{a\pi} = \begin{pmatrix} \cos 2\theta & \sin 2\theta & 0 \\ -\sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.28}$$

Notice that $(C_{x\pi}C_{a\pi})(C_{a\pi}C_{x\pi}) = E$. Once again putting



$$R_{2\theta} = \begin{pmatrix} \cos 2\theta & -\sin 2\theta & 0 \\ \sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{17.24}$$

we have [1, 2]

$$C_{a\pi}C_{x\pi} = R_{2\theta} \quad \text{or} \quad C_{a\pi} = R_{2\theta}C_{x\pi}. \tag{17.29}$$

Note that in the above two illustrations of successive symmetry operations, both the relevant operators have been represented in reference to the original xyz-system. For this reason, the latter operation was multiplied from the left in (17.28).
From a symmetry requirement, once again the aforementioned C2 axes must be
present in combination with the Cn axis. Moreover, those C2 axes should be
perpendicular to the Cn axis. This can be seen in various Dn groups.
Another illustration of successive symmetry operations is an improper rotation. If
the rotation angle is π, this causes an inversion symmetry. In this illustration,
reflection, rotation, and inversion symmetries coexist. A C2h symmetry is a typical
example.
Equations (17.21), (17.22), and (17.28) demonstrate the same relation. Namely, two successive mirror symmetry operations about a couple of planes and two successive π rotations about a couple of C2 axes cause the same effect with regard to the geometric transformation. In this relation, we emphasize that two successive reflection operations make the determinant of the relevant matrices equal to 1. These aspects cause an interesting effect, and we will briefly discuss it in relation to the O and Td groups.
If, furthermore, the abovementioned mirror symmetry planes and C2 axes coexist, the symmetry planes coincide with or bisect the C2 axes, and vice versa. If this were not the case, another mirror plane or C2 axis would be generated from the symmetry requirement, and the newly generated plane or axis would be coupled with the original plane or axis. From the above argument, these processes would again produce another Cn axis. That must be prohibited.
Next, suppose that a Cn axis intersects obliquely with a plane of mirror symmetry.
A rotation of 2π/n around such an axis produces another mirror symmetry plane.
This newly generated plane intersects with the original mirror plane and produces a
different Cn axis according to the above discussion. Thus, in this situation a mirror
symmetry plane cannot coexist with a sole rotation axis. In a geometric object with
higher symmetry such as Oh, however, several mirror symmetry planes can coexist
with several rotation axes in such a way that the axes intersect with the mirror planes
obliquely. In the case where the Cn axis intersects a mirror symmetry plane perpendicularly, that plane can coexist with a sole rotation axis (see Fig. 17.7). This is actually the
case with a geometric object having a C2h symmetry. The mirror symmetry plane is
denoted by σ h.
Now, let us examine simple examples of molecules and geometric figures as well
as associated symmetry operations.

Fig. 17.7 Mirror symmetry plane σh and a sole rotation axis perpendicular to it

Fig. 17.8 Chemical structural formulae and point groups of (a) thiophene, (b) bithiophene, (c) biphenyl, and (d) naphthalene

Example 17.2 Figure 17.8 shows the chemical structural formulae of thiophene, bithiophene, biphenyl, and naphthalene. These molecules belong to C2v, C2h, D2, and D2h, respectively. Note that these symbols are normally used to denote specific point groups. Notice also that in biphenyl the two benzene rings are twisted relative to the molecular axis. As an example, a multiplication table is shown in Table 17.1 for the C2v group. Table 17.1 clearly demonstrates that the group constitution of C2v differs from that of the group appearing in Example 16.1 (3), even though the order is four in both cases. Similar tables can be constructed for C2h and D2; this is left as an exercise for readers. We will find that the multiplication tables of these groups have the same structure and that C2v, C2h, and D2 are all isomorphic to one another as four-groups. Table 17.2 gives the matrix representation of the symmetry operations for C2v.

Table 17.1 Multiplication table of C2v

C2v      | E    C2(z)  σv(zx)  σ′v(yz)
---------+-----------------------------
E        | E    C2     σv      σ′v
C2(z)    | C2   E      σ′v     σv
σv(zx)   | σv   σ′v    E       C2
σ′v(yz)  | σ′v  σv     C2      E

Table 17.2 Matrix representation of symmetry operations for C2v

$$E = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},\quad C_2(z) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix},\quad \sigma_v(zx) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix},\quad \sigma'_v(yz) = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Table 17.3 Multiplication table of D2h

D2h      | E    C2(z)  σv(zx)  σ′v(yz)  i    σ″v(xy)  C′2(y)  C″2(x)
---------+-----------------------------------------------------------
E        | E    C2     σv      σ′v      i    σ″v      C′2     C″2
C2(z)    | C2   E      σ′v     σv       σ″v  i        C″2     C′2
σv(zx)   | σv   σ′v    E       C2       C′2  C″2      i       σ″v
σ′v(yz)  | σ′v  σv     C2      E        C″2  C′2      σ″v     i
i        | i    σ″v    C′2     C″2      E    C2       σv      σ′v
σ″v(xy)  | σ″v  i      C″2     C′2      C2   E        σ′v     σv
C′2(y)   | C′2  C″2    i       σ″v      σv   σ′v      E       C2
C″2(x)   | C″2  C′2    σ″v     i        σ′v  σv       C2      E

The representation is defined as the transformation, by the symmetry operations, of a set of basis vectors (x y z) in ℝ³.
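As an aside, the multiplication table (Table 17.1) can be generated directly from the four matrices of Table 17.2. The following minimal numpy sketch (an illustration added here, not from the original text) does so:

```python
import numpy as np

# The four diagonal matrices of Table 17.2
ops = {
    "E":   np.diag([ 1,  1, 1]),
    "C2":  np.diag([-1, -1, 1]),
    "sv":  np.diag([ 1, -1, 1]),   # sigma_v(zx)
    "sv'": np.diag([-1,  1, 1]),   # sigma_v'(yz)
}

def name(M):
    """Return the label of the operation whose matrix equals M."""
    return next(k for k, v in ops.items() if np.array_equal(v, M))

for a in ops:
    print(a, "->", [name(ops[a] @ ops[b]) for b in ops])
# Each printed row reproduces the corresponding row of Table 17.1.
```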
Meanwhile, Table 17.3 gives the multiplication table of D2h. We recognize that the multiplication table of C2v appears in the upper-left and lower-right blocks. If we suitably rearrange the order of the group elements, we can make another multiplication table so that, e.g., C2h appears in the upper-left and lower-right blocks. As in the case of Table 17.2, Table 17.4 summarizes the matrix representation of the symmetry operations for D2h. There are eight group elements, i.e., an identity, an inversion, three mutually perpendicular C2 axes, and three mutually perpendicular planes of mirror symmetry (σ). Here, we consider the possibility of constructing subgroups of D2h. The order of a subgroup must be a divisor of eight, and so let us list subgroups whose order is four and examine how many such subgroups exist. We have ₈C₄ = 70 combinations, but those allowed are restricted by the requirement of forming a group. This is because every group must contain the identity element, and so the number allowed is no greater than ₇C₃ = 35.
(1) In light of the aforementioned discussion, two C2 axes mutually intersecting at π/2 yield another C2 axis around the normal to the plane defined by the intersecting axes. Thus, three C2 axes have been chosen and a D2 symmetry results. In this case, we have only one choice. (2) In the case of C2v, two planes mutually intersecting at π/2 yield a C2 axis around their line of intersection.

Table 17.4 Matrix representation of symmetry operations for D2h

All eight matrices are diagonal: E = diag(1, 1, 1), C2(z) = diag(-1, -1, 1), C2(y) = diag(-1, 1, -1), C2(x) = diag(1, -1, -1), i = diag(-1, -1, -1), σ(xy) = diag(1, 1, -1), σ(zx) = diag(1, -1, 1), σ(yz) = diag(-1, 1, 1).

Table 17.5 Choice of symmetry operations for construction of subgroups of D2h

Subgroup | E | C2(z) | σ | i | Choice
D2       | 1 | 3     | 0 | 0 | 1
C2v      | 1 | 1     | 2 | 0 | 3
C2h      | 1 | 1     | 1 | 1 | 3

There are three possibilities of choosing two planes out of three (i.e., ₃C₂ = 3). (3) If we choose the inversion i along with, e.g., one of the three C2 axes, a σ necessarily results. This is also the case when we first combine a σ with i to obtain a C2 axis. We have three possibilities (i.e., ₃C₁ = 3) as well. Thus, we have only seven choices for constructing subgroups of D2h of order four. This is summarized in Table 17.5.
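The counting above can be verified mechanically. The small sketch below (added for illustration; it encodes each D2h element as the triple of its diagonal entries from Table 17.4) enumerates all order-4 subsets containing E that are closed under multiplication:

```python
from itertools import combinations

D2h = {
    "E":   ( 1,  1,  1), "C2z": (-1, -1,  1), "C2y": (-1, 1, -1),
    "C2x": ( 1, -1, -1), "i":   (-1, -1, -1), "sxy": ( 1, 1, -1),
    "szx": ( 1, -1,  1), "syz": (-1,  1,  1),
}

def closed(names):
    """True if the set of matrices is closed under (elementwise) products."""
    mats = {D2h[n] for n in names}
    return all(tuple(a * b for a, b in zip(m, p)) in mats
               for m in mats for p in mats)

subs = [s for s in combinations(D2h, 4) if "E" in s and closed(s)]
for s in subs:
    print(s)
print(len(subs))   # -> 7, in accordance with Table 17.5
```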
An inverse of any element is that element itself. Therefore, starting from any of the above subgroups, if one chooses any element out of the remaining four elements and combines it with the identity, one can construct a subgroup Cs, C2, or Ci of order 2. Since all those subgroups of order 4 and 2 are commutative with D2h, these subgroups are invariant subgroups. Thus, in terms of a direct-product group, D2h can be expressed with various direct factors. Conversely, we can construct factor groups from coset decomposition. For instance, we have

$$D_{2h}/C_{2v} \cong C_s, \quad D_{2h}/C_{2v} \cong C_2, \quad D_{2h}/C_{2v} \cong C_i. \tag{17.30}$$

In turn, we express direct-product groups as, e.g.,

$$D_{2h} = C_{2v} \otimes C_s, \quad D_{2h} = C_{2v} \otimes C_2, \quad D_{2h} = C_{2v} \otimes C_i. \tag{17.31}$$


Example 17.3 Figure 17.9 shows an equilateral triangle placed on the xy-plane of a three-dimensional Cartesian coordinate system. Orthonormal basis vectors e1 and e2 are designated as shown. As a chemical species, we have, e.g., boron trifluoride (BF3). In the molecule, a boron atom is positioned at the molecular center with three fluorine atoms located at the vertices of the equilateral triangle. The boron atom and fluorine atoms form a planar molecule (Fig. 17.10).
A symmetry group belonging to D3h comprises 12 symmetry operations such that

$$D_{3h} = \{E, C_3, C'_3, C_2, C'_2, C''_2, \sigma_h, S_3, S'_3, \sigma_v, \sigma'_v, \sigma''_v\}, \tag{17.32}$$

where symmetry operations of the same species but distinct operation are distinguished by a prime or double prime. When we represent these operations by matrices, it is straightforward in most cases. For instance, a matrix for σv is given by Myz of (17.15). However, we need some matrix calculations for C′2, C″2, σ′v, and σ″v.
To determine a matrix representation of, e.g., σ′v in reference to the xyz-coordinate system with orthonormal basis vectors e1 and e2, we consider the x′y′z′-coordinate

Fig. 17.9 Equilateral triangle placed on the xy-plane. Several symmetry operations of D3h are shown. We consider the successive operations (i), (ii), and (iii), involving ±π/6 rotations and the reflection Σv, to represent σ′v in reference to the xyz-coordinate system (see text)

Fig. 17.10 Boron trifluoride (BF3) belonging to the D3h point group

system with orthonormal basis vectors e′1 and e′2 (see Fig. 17.9). A transformation matrix between the two sets of basis vectors is represented by

$$(\mathbf{e}'_1\ \mathbf{e}'_2\ \mathbf{e}'_3) = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)R_{z\frac{\pi}{6}} = (\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3)\begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.33}$$

This representation corresponds to (11.69). Let Σv be a reflection with respect to the z′x′-plane. This is the same operation as σ′v. However, the matrix representation is different. This is because Σv is represented in reference to the x′y′z′-system, while σ′v is in reference to the xyz-system. The matrix representation of Σv is simple and expressed as

$$\Sigma_v = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.34}$$

Referring to (11.80), we have

$$R_{z\frac{\pi}{6}}\Sigma_v = \sigma'_v R_{z\frac{\pi}{6}} \quad \text{or} \quad \sigma'_v = R_{z\frac{\pi}{6}}\Sigma_v \left(R_{z\frac{\pi}{6}}\right)^{-1}. \tag{17.35}$$

Thus, we see that in the first equation of (17.35) the order of the multiplications is reversed according to whether the latter operation is expressed in the xyz-system or in the x′y′z′-system. As a full matrix representation, we get

$$\sigma'_v = \begin{pmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \frac{\sqrt{3}}{2} & \frac{1}{2} & 0 \\ -\frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.36}$$

Notice that this matrix representation is referred to the original xyz-coordinate system, as before. Graphically, (17.35) corresponds to the multiplication of the symmetry operations performed in the order of (i) a -π/6 rotation, (ii) a reflection (denoted by Σv), and (iii) a π/6 rotation (Fig. 17.9). The associated matrices are multiplied from the left.
Similarly, with C′2 we have

$$C'_2 = \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{17.37}$$

The matrix form of (17.37) can be decided in a manner similar to the above, according to the three successive operations shown in Fig. 17.9. In terms of classes, σv, σ′v, and σ″v form one conjugacy class, and C2, C′2, and C″2 form another conjugacy class. With regard to the reflection and rotation, we have $\det \sigma'_v = -1$ and $\det C'_2 = 1$, respectively. □
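A short numerical cross-check of (17.35)-(17.37) can be written as follows (a sketch added here, not from the original text; it relies only on numpy):

```python
import numpy as np

def Rz(phi):
    """Rotation by phi around the z-axis."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

Sigma_v = np.diag([1.0, -1.0, 1.0])                    # (17.34)
sigma_v1 = Rz(np.pi / 6) @ Sigma_v @ Rz(np.pi / 6).T   # (17.35)

print(np.round(sigma_v1, 4))
# [[ 0.5     0.866   0.    ]
#  [ 0.866  -0.5     0.    ]
#  [ 0.      0.      1.    ]]      ... the matrix of (17.36)
print(np.isclose(np.linalg.det(sigma_v1), -1.0))       # det = -1: a reflection
```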

17.3 O and Td Groups

As a geometric object (or a molecule) comes to have higher symmetry, we have to deal with more symmetry operations and the relationships between them. As an example, we consider the O and Td groups. Both groups have 24 symmetry operations and are isomorphic.
Let us think of the group O first. We start with considering rotations of π/2 around the x-, y-, and z-axes. The matrices representing these rotations are obtained from (17.12) to give

$$R_{x\frac{\pi}{2}} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix},\quad R_{y\frac{\pi}{2}} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix},\quad R_{z\frac{\pi}{2}} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.38}$$

We continue multiplications of these matrices so that the matrices make up a complete set (i.e., a closure). Counting over those matrices, we have 24 of them, and they form a group termed O. The group O is a pure rotation group. Here, a pure rotation group is defined as a group whose elements comprise only proper rotations (with determinant 1). An example is shown in Fig. 17.11, where the individual vertices of the cube carry three arrows so that the cube possesses no mirror symmetries or inversion symmetry. The group O has five conjugacy classes. Figure 17.12 summarizes them. Geometric characteristics of the individual classes are sketched as well. These classes are categorized by the trace of the matrix. This is

Fig. 17.11 Cube whose individual vertices have three arrows for the cube not to possess mirror symmetries. This object belongs to a point group O called a pure rotation group

Fig. 17.12 Point group O and its five conjugacy classes, categorized by the trace χ of the representation matrices: the identity E (χ = 3), six C4 rotations (χ = 1), eight C3 rotations (χ = 0), three C2 rotations around the coordinate axes (χ = -1), and six C2 rotations around the bisecting axes (χ = -1). Geometric characteristics of individual classes are briefly sketched

because the trace is kept unchanged by a similarity transformation. (Remember that elements of the same conjugacy class are connected with each other by a similarity transformation.) Having a look at the sketches, we notice that each operation switches the basis vectors e1, e2, and e3, i.e., the x-, y-, and z-axes. Therefore, the presence of diagonal elements (either 1 or -1) implies that the matrix takes the corresponding basis vector(s) as an eigenvector with respect to the rotation. The corresponding eigenvalue(s) are either 1 or -1 accordingly. This is expected from the fact that the matrix is an orthogonal matrix (i.e., unitary). The trace, namely a summation of the diagonal elements, is closely related to the geometric feature of the operation.
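The closure construction of O stated above is easy to reproduce numerically. The sketch below (an added illustration, assuming only numpy) generates the group from the three generators of (17.38) and tallies the traces, which reproduce the five classes of Fig. 17.12:

```python
import numpy as np

Rx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])
Ry = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])   # generators, (17.38)

group = {tuple(np.eye(3, dtype=int).ravel())}
frontier = list(group)
while frontier:                     # orbit of E under left multiplication
    nxt = []
    for g in frontier:
        for R in (Rx, Ry, Rz):
            h = tuple((R @ np.array(g).reshape(3, 3)).ravel())
            if h not in group:
                group.add(h)
                nxt.append(h)
    frontier = nxt

print(len(group))                   # -> 24
traces = {}
for g in group:
    t = g[0] + g[4] + g[8]          # diagonal entries of the flattened matrix
    traces[t] = traces.get(t, 0) + 1
print(traces)                       # -> {3: 1, 1: 6, 0: 8, -1: 9}
```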
The operations of a π rotation around the x-, y-, and z-axes and those of a π rotation around an axis bisecting any two of the three axes have a trace -1. The former operations take all the basis vectors as eigenvectors; that is, all the diagonal elements are non-vanishing. With the latter operations, however, only one diagonal element is -1. This feature comes from the fact that the bisected axes are switched by the rotation, whereas the remaining axis is reversed by the rotation.
Another characteristic is the generation of eight rotation axes that trisect the x-, y-, and z-axes, more specifically a solid angle π/2 formed by the x-, y-, and z-axes. Since the rotation switches all the x-, y-, and z-axes, the trace is zero. At the same time, we find that this operation belongs to C3. This operation is generated by two successive π/2 rotations around two mutually orthogonal axes. To inspect this situation more closely, we consider a conjugacy class of π/2 rotations that belongs to the C4 symmetry and includes six elements, i.e., $R_{x\frac{\pi}{2}}, R_{y\frac{\pi}{2}}, R_{z\frac{\pi}{2}}, R_{\bar{x}\frac{\pi}{2}}, R_{\bar{y}\frac{\pi}{2}}$, and $R_{\bar{z}\frac{\pi}{2}}$. With

these notations, e.g., $R_{x\frac{\pi}{2}}$ stands for a π/2 counterclockwise rotation around the x-axis, and $R_{\bar{x}\frac{\pi}{2}}$ denotes a π/2 counterclockwise rotation around the -x-axis. Consequently, $R_{\bar{x}\frac{\pi}{2}}$ implies a π/2 clockwise rotation around the x-axis and, hence, an inverse element of $R_{x\frac{\pi}{2}}$. Namely, we have

$$R_{\bar{x}\frac{\pi}{2}} = \left(R_{x\frac{\pi}{2}}\right)^{-1}. \tag{17.39}$$

Now let us consider two successive rotations. This is denoted by the multiplication of the matrices that represent the related rotations. For instance, the multiplication of $R_{x\frac{\pi}{2}}$ and $R'_{y\frac{\pi}{2}}$ produces the following:

$$R_{xyz\frac{2\pi}{3}} = R_{x\frac{\pi}{2}}R'_{y\frac{\pi}{2}} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \tag{17.40}$$

In (17.40), we define $R_{xyz\frac{2\pi}{3}}$ as a 2π/3 counterclockwise rotation around an axis that trisects the x-, y-, and z-axes. The prime of $R'_{y\frac{\pi}{2}}$ means that the operation is carried out in reference to the new coordinate system reached by the previous operation $R_{x\frac{\pi}{2}}$. For this reason, $R'_{y\frac{\pi}{2}}$ is operated (i.e., multiplied) from the right in (17.40). Compare this with the remark made just after (17.29). Changing the order of $R_{x\frac{\pi}{2}}$ and $R'_{y\frac{\pi}{2}}$, we have

$$R_{xy\bar{z}\frac{2\pi}{3}} = R_{y\frac{\pi}{2}}R'_{x\frac{\pi}{2}} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{pmatrix}, \tag{17.41}$$

where $R_{xy\bar{z}\frac{2\pi}{3}}$ is a 2π/3 counterclockwise rotation around an axis that trisects the x-, y-, and -z-axes. Notice that we used $R'_{x\frac{\pi}{2}}$ this time, because it was performed after $R_{y\frac{\pi}{2}}$. Thus, we notice that there are eight related operations that trisect the eight octants of the coordinate system. These operations are further categorized into four sets, in each of which the two elements are inverse elements of each other. For instance, we have

$$R_{\bar{x}\bar{y}\bar{z}\frac{2\pi}{3}} = \left(R_{xyz\frac{2\pi}{3}}\right)^{-1}. \tag{17.42}$$

Notice that a 2π/3 counterclockwise rotation around an axis that trisects the -x-, -y-, and -z-axes is equivalent to a 2π/3 clockwise rotation around an axis that trisects the x-, y-, and z-axes. Also, we have $R_{\bar{x}\bar{y}z\frac{2\pi}{3}} = \left(R_{xy\bar{z}\frac{2\pi}{3}}\right)^{-1}$, etc. Moreover, we have "cyclic" relations such as

$$R_{x\frac{\pi}{2}}R'_{y\frac{\pi}{2}} = R_{y\frac{\pi}{2}}R'_{z\frac{\pi}{2}} = R_{z\frac{\pi}{2}}R'_{x\frac{\pi}{2}} = R_{xyz\frac{2\pi}{3}}. \tag{17.43}$$

Returning to Sect. 11.4, we had

$$A[P(x)] = \left[(\mathbf{e}_1 \cdots \mathbf{e}_n)PA'\right]\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (\mathbf{e}_1 \cdots \mathbf{e}_n)\left[A_O P\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right]. \tag{11.79}$$

The implication of (11.79) is that the LHS is related to the transformation of the basis vectors while retaining the coordinates $(x_1\ \cdots\ x_n)^T$ and that the transformation matrices should be operated on the basis vectors from the right. Meanwhile, the RHS describes the transformation of the coordinates while retaining the basis vectors. In that case, the transformation matrices should be operated on the coordinates from the left. Thus, the order of operator multiplication is reversed. Following (11.80), we describe

$$R_{x\frac{\pi}{2}}R'_{y\frac{\pi}{2}} = R_O R_{x\frac{\pi}{2}}, \quad \text{i.e.,} \quad R_O = R_{x\frac{\pi}{2}}R'_{y\frac{\pi}{2}}\left(R_{x\frac{\pi}{2}}\right)^{-1}, \tag{17.44}$$

where $R_O$ is viewed in reference to the original (or fixed) coordinate system and is conjugate to $R'_{y\frac{\pi}{2}}$. Thus, we have

$$R_O = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.45}$$

Note that (17.45) is identical to the matrix representation of a π/2 rotation around the z-axis. This is evident from the fact that the y-axis is converted to the original z-axis by $R_{x\frac{\pi}{2}}$; readers, imagine it.
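The conjugation in (17.44)-(17.45) can be confirmed in a couple of lines (a sketch added here; the transpose is used for the inverse since the matrices are orthogonal):

```python
import numpy as np

Rx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])   # R_{x pi/2}
Ry = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])   # numerically equal to R'_{y pi/2}

RO = Rx @ Ry @ Rx.T                                  # (17.44)
print(RO)
# [[ 0 -1  0]
#  [ 1  0  0]
#  [ 0  0  1]]   ... R_{z pi/2}, in accordance with (17.45)
```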
We have two conjugacy classes of π rotations (the C2 symmetry). One of them includes six elements, i.e., $R_{xy\pi}, R_{yz\pi}, R_{zx\pi}, R_{\bar{x}y\pi}, R_{\bar{y}z\pi}$, and $R_{\bar{z}x\pi}$. For these notations, a subscript, e.g., xy, stands for an axis that bisects the angle formed by the x- and y-axes; a subscript $\bar{x}y$ denotes an axis bisecting the angle formed by the -x- and y-axes. The other class includes three elements, i.e., $R_{x\pi}, R_{y\pi}$, and $R_{z\pi}$.
As for Rxπ, Ryπ, and Rzπ, a combination of these operations should yield a C2
rotation axis as discussed in Sect. 17.2. Of these three rotation axes, in fact, any two

produce a C2 rotation around the remaining axis, as is the case with naphthalene
belonging to the D2h symmetry (see Sect. 17.2).
Regarding the class comprising the six π-rotation elements, a combination of, e.g., $R_{xy\pi}$ and $R_{\bar{x}y\pi}$, crossing each other at a right angle, causes a related effect. For the other combinations, the two C2 axes intersect each other at π/3; see Fig. 17.6 and put θ = π/3 there. In this respect, elementary analytic geometry teaches the positional relationship among planes and straight lines. The argument is as follows: A plane determined by three points $(x_1, y_1, z_1)^T$, $(x_2, y_2, z_2)^T$, and $(x_3, y_3, z_3)^T$ that do not sit on a line is expressed by the following equation:

$$\begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix} = 0. \tag{17.46}$$

Substituting $(0, 0, 0)^T$, $(1, 1, 0)^T$, and $(0, 1, 1)^T$ for $(x_1, y_1, z_1)^T$, $(x_2, y_2, z_2)^T$, and $(x_3, y_3, z_3)^T$ in (17.46), respectively, we have

$$x - y + z = 0. \tag{17.47}$$

Taking account of direction cosines and using Hesse's normal form, we get

$$\frac{1}{\sqrt{3}}(x - y + z) = 0, \tag{17.48}$$

where the normal to the plane expressed in (17.48) has direction cosines of $\frac{1}{\sqrt{3}}$, $-\frac{1}{\sqrt{3}}$, and $\frac{1}{\sqrt{3}}$ in relation to the x-, y-, and z-axes, respectively.
Therefore, the normal is given by a straight line connecting the origin and $(1, -1, 1)^T$. In other words, a line connecting the origin and a corner of a cube is the normal to the plane described by (17.48). That plane is formed by two intersecting lines, i.e., the rotation axes of C2 and C′2 (see Fig. 17.13, which depicts a cube whose sides have length 2). These axes make an angle π/3; this can easily be checked by taking an inner product between $(1/\sqrt{2},\ 1/\sqrt{2},\ 0)^T$ and $(0,\ 1/\sqrt{2},\ 1/\sqrt{2})^T$. These column vectors are the direction cosines of C2 and C′2, respectively. On the basis of the discussion of Sect. 17.2, we must have a rotation axis of C3. That is, this axis trisects a solid angle π/2 shaped by three intersecting sides.

Fig. 17.13 Rotation axes of C2 and C′2 (marked red) along with another rotation axis C3 (marked blue) in a point group O. The marked vertices are at (0, 1, 1), (1, -1, 1), and (1, 1, 0)

Fig. 17.14 Simple kit (Sheets 1-3) that helps visualize the positional relationship among planes and straight lines in three-dimensional space. To make it, follow the next procedures: (1) Take three thick sheets of paper and make slits (dashed lines) as shown. (2) Insert Sheet 2 into Sheet 1 so that the two sheets make a right angle. (3) Insert Sheet 3 into the combined Sheets 1 and 2

It is sometimes hard to visualize or envisage the positional relationship among


planes and straight lines in three-dimensional space. It will therefore be useful to
make a simple kit to help visualize it. Figure 17.14 gives an illustration.
Another typical example having 24 group elements is Td. A molecule of methane belongs to this symmetry. Table 17.6 collects the relevant symmetry operations and their (3, 3) matrix representations. As in the case of Fig. 17.12, the matrices show how a set of vectors (x y z) is transformed according to the symmetry operations. Comparing it with Fig. 17.12, we immediately recognize that a close relationship exists between Td and O and that these point groups share notable characteristics.
(1) Both Td and O consist of five conjugacy classes, each of which contains the same number of symmetry species. (2) Both Td and O contain a pure rotation group T as a subgroup. The subgroup T consists of the 12 group elements E, 8C3, and 3C2. The other remaining 12 group elements of Td are symmetry species related to reflection: S4 and σd. The elements 6S4 and 6σd correspond to 6C4 and 6C2 of O, respectively. That is, successive operations of S4 cause effects similar to those of C4 of O. Meanwhile, successive operations of σd are related to those of 6C2 of O.
Table 17.6 Symmetry operations and their matrix representation of Td

E:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

8C3:
$$\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & 0 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & 0 & -1 \\ -1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & -1 \\ 1 & 0 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix}$$

3C2:
$$\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix},\; \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix},\; \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$$

6S4:
$$\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix},\; \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & 0 & 1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ 1 & 0 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix},\; \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}$$

6σd:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},\; \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},\; \begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Fig. 17.15 Regular tetrahedron inscribed in a cube. As an example of symmetry operations, we can choose three pairs of planes of σd (six planes in total) given by the equations x = ±y, y = ±z, and z = ±x

Let us imagine in Fig. 17.15 that a regular tetrahedron is inscribed in a cube. As an example of symmetry operations, suppose that three pairs of planes of σd (six planes in total) are given by the equations x = ±y, y = ±z, and z = ±x. Their Hesse's normal forms are represented as

$$\frac{1}{\sqrt{2}}(x \pm y) = 0, \tag{17.49}$$
$$\frac{1}{\sqrt{2}}(y \pm z) = 0, \tag{17.50}$$
$$\frac{1}{\sqrt{2}}(z \pm x) = 0. \tag{17.51}$$

Then, a dihedral angle α between two of the planes is given by

$$\cos\alpha = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = \frac{1}{2} \tag{17.52}$$

or

$$\cos\alpha = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = 0. \tag{17.53}$$

That is, α = π/3 or α = π/2. On the basis of the discussion of Sect. 17.2, the intersection of the two planes must be a rotation axis of C3 or C2. Once again, in the case of C3, the intersection is a straight line connecting the origin and a vertex of the cube. This can readily be verified as follows: For instance, the two planes given by x = y and y = z make an angle π/3 and produce an intersection line x = y = z. This line, in turn, connects the origin and a vertex of the cube.
If we choose, e.g., the two planes x = ±y from the above, these planes make a right angle and their intersection must be a C2 axis. The three C2 axes thus obtained coincide with the x-, y-, and z-axes. In this light, σd functions similarly to 6C2 of O in that their combinations produce 8C3 or 3C2. Thus, the constitution and operation of Td and O are closely related.
Let us more closely inspect the structure and constitution of O and Td. First, we construct a mapping ρ between the group elements of O and Td such that

$$\rho: g \in O \longleftrightarrow g' \in T_d, \quad \text{if } g \in T,$$
$$\rho: g \in O \longleftrightarrow -(g')^{-1} \in T_d, \quad \text{if } g \notin T.$$

In the above relation, the minus sign indicates that the inverse representation matrix R must be replaced with -R. Then, ρ is an isomorphic mapping. In fact, comparing Fig. 17.12 and Table 17.6, ρ gives identical matrix representations for O and Td. For example, taking the first matrix of S4 of Td, we have

$$-\left[\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}\right]^{-1} = -\begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}.$$

The resulting matrix is identical to the first matrix of C4 of O. Thus, we find that Td and O are isomorphic to each other.
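The mapping ρ can be spot-checked numerically; the sketch below (added for illustration) reproduces the worked example above, using the transpose as the inverse of an orthogonal matrix:

```python
import numpy as np

S4_first = np.array([[-1, 0, 0], [0, 0, -1], [0, 1, 0]])   # first 6S4 matrix, Table 17.6
C4_first = np.array([[ 1, 0, 0], [0, 0, -1], [0, 1, 0]])   # first 6C4 matrix of O

# rho maps g' (not in T) to minus its inverse
print(np.array_equal(-S4_first.T, C4_first))   # -> True
```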
Both O and Td consist of 24 group elements and are isomorphic to the symmetric group S4; do not confuse this with the same symbol S4 used for a group element of Td. The subgroup T consists of the three conjugacy classes E, 8C3, and 3C2. Since T is constructed only of entire classes, it is an invariant subgroup; in this respect, see the discussion of Sect. 16.3. The groups O, Td, and T, along with Th and Oh, form the cubic groups [2]. In Table 17.7, we list these cubic groups together with their names and orders. Of these, O is the pure rotation subgroup of Oh, and T is the pure rotation subgroup of Td and Th.

Table 17.7 Several cubic groups and their characteristics

Notation | Group name           | Order | Remark
T        | Tetrahedral rotation | 12    | Subgroup of Th, Td
Th, Td   | Tetrahedral          | 24    |
O        | Octahedral rotation  | 24    | Subgroup of Oh
Oh       | Octahedral           | 48    |

Symmetry groups are related to permutations of n elements (or objects or numbers). The permutation has already appeared in (11.57), when we defined the determinant of a matrix. There it was written as

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix},$$

where σ means a permutation of the numbers 1, 2, ⋯, n. Therefore, the corresponding symmetric group has n! group elements (i.e., the different ways of rearrangement).
Although we do not dwell on symmetric groups much, we state the following important theorem related to finite groups without proof. Interested readers are referred to the literature [1].

Theorem 17.1 (Cayley's Theorem [1]) Every finite group ℊ of order n is isomorphic to a subgroup (possibly the whole group) of the symmetric group Sn. ■
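As a small illustration of Cayley's theorem (an added sketch, not from the original text), the four-group C2v of Example 17.2 can be realized as permutations via its own multiplication table:

```python
# Encode C2v = {E, C2, sv, sv'} by the diagonal entries of Table 17.2
elems = [(1, 1, 1), (-1, -1, 1), (1, -1, 1), (-1, 1, 1)]

for g in elems:
    perm = [elems.index(tuple(a * b for a, b in zip(g, h))) for h in elems]
    print(g, "->", perm)
# Each group element yields a permutation of {0, 1, 2, 3}; the map
# g -> perm is an isomorphism onto an order-4 subgroup of S4.
```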

17.4 Special Orthogonal Group SO(3)

In Parts III and IV thus far, we have dealt with a broad class of linear transformations. The related groups are finite groups. Here we will describe characteristics of the special orthogonal group SO(3), a kind of infinite group. The group SO(3) represents rotations in the three-dimensional Euclidean space ℝ³. Rotations are made around an axis (a line through the origin) with the origin fixed. A rotation is defined by an azimuth of direction and a magnitude (angle) of rotation. The azimuth of direction is defined by two parameters, and the magnitude is defined by one parameter, a rotation angle. Hence, a rotation is defined by three independent parameters. Since those parameters are continuously variable, SO(3) is one of the continuous groups.
Two rotations result in another rotation with the origin again fixed. A reverse rotation is unambiguously defined. An identity transformation is naturally defined. An associative law holds as well. Thus, the relevant rotations form a group, i.e., SO(3). A rotation is represented by a real (3, 3) matrix whose determinant is 1. A matrix representation is uniquely determined once an orthonormal basis is set in ℝ³. Any rotation is represented by a rotation matrix accordingly. The rotation matrix R is defined by

$$R^T R = R R^T = E, \tag{17.54}$$

where R is a real matrix with det R = 1. Notice that we exclude the case where det R = -1. Matrices that satisfy (17.54) with det R = ±1 are referred to as orthogonal matrices, which cause orthogonal transformations. Correspondingly, orthogonal groups (represented by orthogonal matrices) contain rotation groups as a special case. In other words, the orthogonal groups contain the rotation groups as a

subgroup. The orthogonal group in ℝ³, denoted O(3), contains SO(3) as a subgroup. By the same token, orthogonal matrices contain rotation matrices as a special case.
In Sect. 17.2 we treated reflections and improper rotations, with the determinant of their matrices being -1. In this section these transformations are excluded and only rotations are dealt with. We focus on the geometric characteristics of the rotation groups. Readers are referred to a more detailed representation theory of SO(3) in the appropriate literature [1].

17.4.1 Rotation Axis and Rotation Matrix

In this section we denote a vector by, e.g., |x⟩. We start with showing that any rotation is accompanied by a unique rotation axis. The rotation axis is defined as follows: Suppose that there is a rigid body with some point within the body fixed. Here, the said rigid body may be one of infinite extent. Then the rigid body exerts a rotation. The rotation axis is a line on which every point is unmoved during the rotation. As a matter of course, the identity matrix E has three linearly independent rotation axes. (Practically, this represents no rotation.)

Theorem 17.2 Any rotation matrix R is accompanied by at least one rotation axis. Unless the rotation matrix is identity, the rotation matrix is accompanied by one and only one rotation axis.
Proof As R is an orthogonal matrix of determinant 1, so are $R^T$ and $R^{-1}$. Then we have

$$(R - E)^T = R^T - E = R^{-1} - E. \tag{17.55}$$

Hence, we get

$$\det(R - E) = \det(R^T - E) = \det(R^{-1} - E) = \det\left[R^{-1}(E - R)\right] = \det(R^{-1})\det(E - R) = \det(E - R) = -\det(R - E). \tag{17.56}$$

Note here that for any (3, 3) matrix A of ℝ³,

$$\det A = -(-1)^3 \det A = -\det(-A).$$

This equality holds in ℝⁿ (n: odd); in ℝⁿ (n: even), by contrast, we have det A = det(-A), and (17.56) would then result in the trivial equation 0 = 0. Therefore, the discussion made below only applies to ℝⁿ (n: odd). Thus, from (17.56) we have

$$\det(R - E) = 0. \tag{17.57}$$

This implies that for ∃|x₀⟩ ≠ 0

$$(R - E)|x_0\rangle = 0. \tag{17.58}$$

Therefore, we get

$$R(a|x_0\rangle) = a|x_0\rangle, \tag{17.59}$$

where a is an arbitrarily chosen real number. In this case an eigenvalue of R is 1, to which the eigenvector a|x₀⟩ corresponds. Thus, as a rotation axis, we have a straight line expressed as

$$l = \mathrm{Span}\{a|x_0\rangle;\ a \in \mathbb{R}\}. \tag{17.60}$$

This proves the presence of a rotation axis.
Next, suppose that there are two (or more) rotation axes. The presence of two rotation axes naturally implies that there are two linearly independent vectors (i.e., two straight lines that mutually intersect at the fixed point). Suppose that such vectors are |u⟩ and |v⟩. Then we have

$$(R - E)|u\rangle = 0, \tag{17.61}$$
$$(R - E)|v\rangle = 0. \tag{17.62}$$

Let us consider a vector ∀|y⟩ chosen from Span{|u⟩, |v⟩}, where we define

$$\mathrm{Span}\{|u\rangle, |v\rangle\} \equiv \{a|u\rangle + b|v\rangle;\ a, b \in \mathbb{R}\}. \tag{17.63}$$

That is, Span{|u⟩, |v⟩} represents a plane P formed by the two mutually intersecting straight lines. Then we have

$$|y\rangle = s|u\rangle + t|v\rangle, \tag{17.64}$$

where s and t are some real numbers. Operating R - E on (17.64), we have

$$(R - E)|y\rangle = (R - E)(s|u\rangle + t|v\rangle) = s(R - E)|u\rangle + t(R - E)|v\rangle = 0. \tag{17.65}$$

This indicates that any vector in P can be an eigenvector of R, implying that an infinite number of rotation axes exist.
Now, take another vector |w⟩ that is perpendicular to the plane P (see Fig. 17.16). Let us consider an inner product ⟨y|Rw⟩. Since

Fig. 17.16 Plane P formed by two mutually intersecting straight lines represented by |u⟩ and |v⟩. Another vector |w⟩ is perpendicular to the plane P

$$R|u\rangle = |u\rangle, \quad \langle u|R^{\dagger} = \langle u|R^T = \langle u|. \tag{17.66}$$

Similarly, we have

$$\langle v|R^T = \langle v|. \tag{17.67}$$

Therefore, using the relation (17.64), we get

$$\langle y|R^T = \langle y|. \tag{17.68}$$

Here we are dealing with real numbers and, hence, we have

$$R^{\dagger} = R^T. \tag{17.69}$$

Now we have

$$\langle y|Rw\rangle = \langle yR^T|Rw\rangle = \langle y|R^T Rw\rangle = \langle y|Ew\rangle = \langle y|w\rangle = 0. \tag{17.70}$$

In (17.70) the second equality comes from the associative law; the third is due to (17.54). The last equality comes from the fact that |w⟩ is perpendicular to P.
From (17.70) we have

$$\langle y|(R - E)w\rangle = 0. \tag{17.71}$$



However, we should be careful not to conclude immediately from (17.71) that (R - E)|w⟩ = 0, i.e., R|w⟩ = |w⟩. This is because in (17.70) |y⟩ does not represent all vectors in ℝ³, but merely represents all the vectors in Span{|u⟩, |v⟩}. Nonetheless, both |w⟩ and |Rw⟩ are perpendicular to P, and so

$$|Rw\rangle = a|w\rangle, \tag{17.72}$$

where a is some real number. From (17.72), we have

$$\langle wR^{\dagger}|Rw\rangle = \langle wR^T|Rw\rangle = \langle w|w\rangle = |a|^2\langle w|w\rangle, \quad \text{i.e.,} \quad a = \pm 1,$$

where the first equality comes from the fact that R is an orthogonal matrix. Since every vector in P is an eigenvector of R with eigenvalue 1 and det R = 1, we must have a = 1. Thus, from (17.72), this time around we have |Rw⟩ = |w⟩; that is,

$$(R - E)|w\rangle = 0. \tag{17.73}$$

Equations (17.65) and (17.73) imply that for any vector |x⟩ arbitrarily chosen from ℝ³, we have

$$(R - E)|x\rangle = 0. \tag{17.74}$$

Consequently, we get

$$R - E = 0 \quad \text{or} \quad R = E. \tag{17.75}$$

The above procedures represented by (17.61)-(17.75) indicate that the presence of two rotation axes necessarily requires the transformation matrix to be identity. This implies that all the vectors in ℝ³ are eigenvectors of the rotation matrix.
Taking the contraposition of the above, unless the rotation matrix is identity, the relevant rotation cannot have two rotation axes. Meanwhile, the proof of the former half ensures the presence of at least one rotation axis. Hence, any rotation is characterized by a unique rotation axis except for the identity transformation. This completes the proof. ■
An immediate consequence of Theorem 17.2 is that the rotation matrix has an eigenvalue 1, to which an eigenvector representing the rotation axis corresponds. This statement includes the trivial fact that all the eigenvalues of the identity matrix are 1. In Sect. 14.4, we calculated the eigenvalues of a two-dimensional rotation matrix. The eigenvalues were e^{iθ} and e^{-iθ}, where θ is a rotation angle.
Let us consider the rotation matrices that we dealt with in Sect. 17.1. The matrix representing the rotation around the z-axis by a rotation angle θ is expressed by

$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.8}$$

In Sect. 14.4 we treated the diagonalization of a rotation matrix. As R is reduced, the diagonalization can be performed in a manner essentially the same as (14.92). That is, as a diagonalizing unitary matrix, we have

$$U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad U^{\dagger} = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.76}$$

As a result of the unitary similarity transformation, we get

$$U^{\dagger}RU = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} e^{i\theta} & 0 & 0 \\ 0 & e^{-i\theta} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.77}$$

Thus, the eigenvalues are 1, e^{iθ}, and e^{-iθ}. The eigenvalue 1 results from the existence of the unique rotation axis. When θ = 0, (17.77) gives an identity matrix with all the eigenvalues 1, as expected. When θ = π, the eigenvalues are -1, -1, and 1. The eigenvalue 1 is again associated with the unique rotation axis. The (unitary) similarity transformation keeps the trace unchanged; that is, the trace χ is

$$\chi = 1 + 2\cos\theta. \tag{17.78}$$

As R is a normal operator, spectral decomposition can be done as in the case of Example 14.1. Here we only show the result below:

$$R = e^{i\theta}\begin{pmatrix} \frac{1}{2} & \frac{i}{2} & 0 \\ -\frac{i}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 0 \end{pmatrix} + e^{-i\theta}\begin{pmatrix} \frac{1}{2} & -\frac{i}{2} & 0 \\ \frac{i}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

The three matrices in the above equation are projection operators.
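A brief numerical confirmation of (17.76)-(17.78) and of the spectral decomposition may be written as follows (an added sketch, assuming numpy):

```python
import numpy as np

theta = 0.7
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

U = np.array([[1, 1, 0],
              [-1j, 1j, 0],
              [0, 0, np.sqrt(2)]]) / np.sqrt(2)          # (17.76)

print(np.round(np.diag(U.conj().T @ R @ U), 6))          # e^{i theta}, e^{-i theta}, 1
print(np.isclose(np.trace(R), 1 + 2 * np.cos(theta)))    # (17.78)

# Projection operators of the spectral decomposition
P1 = np.array([[0.5, 0.5j, 0], [-0.5j, 0.5, 0], [0, 0, 0]])
P2 = P1.conj()
P3 = np.diag([0, 0, 1.0])
print(np.allclose(np.exp(1j*theta)*P1 + np.exp(-1j*theta)*P2 + P3, R))
```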



17.4.2 Euler Angles and Related Topics

Euler angles are well known and have been used in various fields of science. We wish to connect the above discussion with the Euler angles.
In Part III we dealt with successive linear transformations. This can be extended to the case of three or more successive transformations. Suppose that we have three successive transformations R1, R2, and R3 and that the coordinate system (a three-dimensional orthonormal basis) is transformed as O ⟶ I ⟶ II ⟶ III accordingly. The symbol O stands for the original coordinate system, and I, II, and III represent the successively transformed systems.
In the discussion that follows, let us denote the transformations by R′2, R′3, R″3, etc. in reference to the transformed coordinate systems. For example, R′3 means that the third transformation is viewed from the system I; the transformation R″3 indicates that the third transformation is viewed from the system II. That is, the number of primes denotes the number of the coordinate system, to distinguish the systems I and II. Let R2 (without prime) stand for the second transformation viewed from the system O. Meanwhile, we have

$$R_1 R'_2 = R_2 R_1. \tag{17.79}$$

This notation is in parallel with (11.80). Similarly, we have

$$R'_2 R''_3 = R'_3 R'_2 \quad \text{and} \quad R_1 R'_3 = R_3 R_1. \tag{17.80}$$

Therefore, we get [3]

$$R_1 R'_2 R''_3 = R_1 R'_3 R'_2 = R_3 R_1 R'_2 = R_3 R_2 R_1. \tag{17.81}$$

Also, combining (17.79) and (17.80), we have

$$R''_3 = (R_2 R_1)^{-1} R_3 (R_2 R_1). \tag{17.82}$$

Let us call R′2, R″3, etc. transformations on a "moving" coordinate system (i.e., the systems I, II, III, ⋯). On the other hand, we call R1, R2, etc. transformations on a "fixed" system (i.e., the original coordinate system O). Thus, (17.81) shows that the multiplication order is reversed with respect to the moving system and the fixed system [3].
For practical purposes, it would be enough to consider three successive transformations. Let us think, however, of a general case where n successive transformations are involved (n denotes a positive integer). For the purpose of succinct notation, let us define the linear transformations and the relevant coordinate systems as those in Fig. 17.17. The diagram shows the transformation of the basis vectors. We define the following orthogonal transformation $R_j^{(i)}$:

Fig. 17.17 Successive orthogonal transformations and the relevant coordinate systems O ⟶ I ⟶ II ⟶ III ⟶ ⋯

$$R_j^{(i)} \ (0 \le i < j \le n), \quad \text{and} \quad R_i \equiv R_i^{(0)}, \tag{17.83}$$

where $R_j^{(i)}$ is defined as the transformation $R_j$ described in reference to the coordinate system i; $R_i^{(0)}$ means that $R_i$ is referred to the original coordinate system (i.e., the fixed coordinate system). Then we have

$$R_{i-1}^{(i-2)} R_k^{(i-1)} = R_k^{(i-2)} R_{i-1}^{(i-2)} \quad (k > i - 1). \tag{17.84}$$

Particularly, when i = 3,

$$R_2^{(1)} R_k^{(2)} = R_k^{(1)} R_2^{(1)} \quad (k > 2). \tag{17.85}$$

For i = 2 we have

$$R_1 R_k^{(1)} = R_k R_1 \quad (k > 1). \tag{17.86}$$

We define the n successive transformations on a moving coordinate system as $\tilde{R}_n$ such that

$$\tilde{R}_n \equiv R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)} R_n^{(n-1)}. \tag{17.87}$$

Applying (17.84) to $R_{n-1}^{(n-2)} R_n^{(n-1)}$ and rewriting (17.87), we have

$$\tilde{R}_n = R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_n^{(n-2)} R_{n-1}^{(n-2)}. \tag{17.88}$$

Applying (17.84) again to $R_{n-2}^{(n-3)} R_n^{(n-2)}$, we get

$$\tilde{R}_n = R_1 R_2^{(1)} R_3^{(2)} \cdots R_n^{(n-3)} R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}. \tag{17.89}$$

Proceeding similarly, we have

$$\tilde{R}_n = R_1 R_n^{(1)} R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)} = R_n R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}, \tag{17.90}$$

where with the last equality we used (17.86). In this case, we have

$$R_1 R_n^{(1)} = R_n R_1. \tag{17.91}$$

To reach the RHS of (17.90), we applied (17.84) (n - 1) times in total. Then we repeat the above procedures with respect to $R_{n-1}^{(n-2)}$ another (n - 2) times to get

$$\tilde{R}_n = R_n R_{n-1} R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)}. \tag{17.92}$$

Further proceeding similarly, we finally get

$$\tilde{R}_n = R_n R_{n-1} R_{n-2} \cdots R_3 R_2 R_1. \tag{17.93}$$

In total, we have applied the permutation of (17.84) n(n - 1)/2 times. When n is
2, n(n - 1)/2 = 1. This is the case with (17.79). If n is 3, n(n - 1)/2 = 3. This is the
case with (17.81). Thus, (17.93) once again confirms that the multiplication order is
reversed with respect to the moving system and fixed system.
Meanwhile, we define $\tilde{P}$ as

$$\tilde{P} \equiv R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}. \tag{17.94}$$

Alternately, we describe

$$\tilde{P} = R_{n-1} R_{n-2} \cdots R_3 R_2 R_1. \tag{17.95}$$

Then, from (17.87) and (17.90), we get

$$\tilde{R}_n = \tilde{P} R_n^{(n-1)} = R_n \tilde{P}. \tag{17.96}$$

Equivalently, we have

$$R_n = \tilde{P} R_n^{(n-1)} \tilde{P}^{-1} \tag{17.97}$$

or

$$R_n^{(n-1)} = \tilde{P}^{-1} R_n \tilde{P}. \tag{17.98}$$

Moreover, we have

$$\tilde{P}^T = \left[R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}\right]^T = \left[R_{n-1}^{(n-2)}\right]^T\left[R_{n-2}^{(n-3)}\right]^T \cdots \left[R_2^{(1)}\right]^T R_1^T = \left[R_{n-1}^{(n-2)}\right]^{-1}\left[R_{n-2}^{(n-3)}\right]^{-1} \cdots \left[R_2^{(1)}\right]^{-1} R_1^{-1} = \left[R_1 R_2^{(1)} R_3^{(2)} \cdots R_{n-2}^{(n-3)} R_{n-1}^{(n-2)}\right]^{-1} = \tilde{P}^{-1}. \tag{17.99}$$

The third equality of (17.99) comes from the fact that the matrices $R_{n-1}^{(n-2)}, R_{n-2}^{(n-3)}, \ldots, R_2^{(1)}$, and $R_1$ are orthogonal. Then, we get

$$\tilde{P}^T \tilde{P} = \tilde{P}\tilde{P}^T = E. \tag{17.100}$$

Thus, $\tilde{P}$ is an orthogonal matrix.
From the point of view of practical application, (17.97) and (17.98) are very useful. This is because $R_n$ and $R_n^{(n-1)}$ are conjugate to each other. Consequently, $R_n$ has the same eigenvalues and trace as $R_n^{(n-1)}$. In light of (11.81), we see that (17.97) and (17.98) relate $R_n$ (i.e., viewed in reference to the original coordinate system) to $R_n^{(n-1)}$ [i.e., the same transformation viewed in reference to the coordinate system reached after the (n - 1) transformations]. Since the transformation $R_n^{(n-1)}$ is usually described in a simple form, matrix calculations to compute $R_n$ can readily be done. Now let us consider an example.
Example 17.4 (Successive rotations) A typical illustration of three successive transformations in moving coordinate systems is well known in association with Euler angles. It contains the following three steps:
1. Rotation by α around the z-axis in the original coordinate system (O)
2. Rotation by β around the y′-axis in the transferred coordinate system (I)
3. Rotation by γ around the z″-axis (the same as the z′-axis) in the transferred coordinate system (II)
The above three steps are represented by the matrices $R_{z\alpha}$, $R'_{y'\beta}$, and $R''_{z''\gamma}$ in (17.12). That is, as a total transformation $\tilde{R}_3$, we have

$$\tilde{R}_3 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}\begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}. \tag{17.101}$$

This matrix corresponds to (17.87) with n = 3. The angles α, β, and γ in (17.101) are called Euler angles, and their domains are usually taken as 0 ≤ α ≤ 2π, 0 ≤ β ≤ π, 0 ≤ γ ≤ 2π. The matrix (17.101) is widely used in quantum mechanics and related fields of natural science. The matrix notation, however, differs from literature to literature, and so care should be taken [3-5]. Using the notation of (17.87), we have

$$R_1 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{17.102}$$
$$R_2^{(1)} = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}, \tag{17.103}$$
$$R_3^{(2)} = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.104}$$

From (17.94) we also get

$$\tilde{P} \equiv R_1 R_2^{(1)} = \begin{pmatrix} \cos\alpha\cos\beta & -\sin\alpha & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta & \cos\alpha & \sin\alpha\sin\beta \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}. \tag{17.105}$$

Corresponding to (17.97), we have

$$R_3 = \tilde{P} R_3^{(2)} \tilde{P}^{-1} = R_1 R_2^{(1)} R_3^{(2)} \left[R_2^{(1)}\right]^{-1} R_1^{-1}. \tag{17.106}$$

Now matrix calculations are readily performed such that



$$R_3 = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}\begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \times \begin{pmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{pmatrix}\begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

$$= \begin{pmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta & \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta \end{pmatrix}. \tag{17.107}$$

Notice that in (17.107) we have a trace χ described as

$$\chi = 1 + 2\cos\gamma. \tag{17.108}$$

The trace is the same as that of $R_3^{(2)}$, as expected from (17.106) and (17.97). Equation (17.107) apparently seems complicated but has a simple and well-defined meaning. The rotation represented by $R_3^{(2)}$ is characterized by a rotation by γ around the z″-axis. Figure 17.18 represents the orientation of the z″-axis viewed in reference to the original xyz-system. That is, the z″-axis (identical to the rotation axis A) is designated by an azimuthal angle α and a zenithal angle β as shown. The operation R3 is represented by a rotation by γ around the axis A in the xyz-system. The angles α, β, and γ coincide with the Euler angles designated with the same independent parameters α, β, and γ.
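A compact way to obtain (17.107) numerically is to build R3 by the conjugation (17.106). The sketch below (added here, assuming only numpy) also checks the trace relation (17.108):

```python
import numpy as np

def Rz(p):
    c, s = np.cos(p), np.sin(p)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

def Ry(p):
    c, s = np.cos(p), np.sin(p)
    return np.array([[c, 0, s], [0, 1.0, 0], [-s, 0, c]])

alpha, beta, gamma = 0.4, 1.1, 2.0
P = Rz(alpha) @ Ry(beta)              # P-tilde of (17.105)
R3 = P @ Rz(gamma) @ P.T              # (17.106); P.T = P^{-1} (orthogonal)

print(np.isclose(np.trace(R3), 1 + 2 * np.cos(gamma)))   # (17.108)
```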
From (17.77) and (17.107), a diagonalizing matrix for $R_3$ is $\tilde{P}U$. That is,

$$U^{\dagger}\tilde{P}^{-1} R_3 \tilde{P} U = U^{\dagger} R_3^{(2)} U = \begin{pmatrix} e^{i\gamma} & 0 & 0 \\ 0 & e^{-i\gamma} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{17.109}$$

Note that as $\tilde{P}$ is a real matrix, we have

$$\tilde{P}^{\dagger} = \tilde{P}^T = \tilde{P}^{-1} \tag{17.110}$$
and

Fig. 17.18 Rotation γ around the rotation axis A. The orientation of A is defined by the angles α and β as shown

$$\tilde{P}U = \begin{pmatrix} \cos\alpha\cos\beta & -\sin\alpha & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta & \cos\alpha & \sin\alpha\sin\beta \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} \frac{1}{\sqrt{2}}\cos\alpha\cos\beta + \frac{i}{\sqrt{2}}\sin\alpha & \frac{1}{\sqrt{2}}\cos\alpha\cos\beta - \frac{i}{\sqrt{2}}\sin\alpha & \cos\alpha\sin\beta \\ \frac{1}{\sqrt{2}}\sin\alpha\cos\beta - \frac{i}{\sqrt{2}}\cos\alpha & \frac{1}{\sqrt{2}}\sin\alpha\cos\beta + \frac{i}{\sqrt{2}}\cos\alpha & \sin\alpha\sin\beta \\ -\frac{1}{\sqrt{2}}\sin\beta & -\frac{1}{\sqrt{2}}\sin\beta & \cos\beta \end{pmatrix}. \tag{17.111}$$

A vector representing the rotation axis corresponds to the eigenvalue 1. The direction cosines of the x-, y-, and z-components of the rotation axis A are cosα sinβ, sinα sinβ, and cosβ (see Fig. 3.1), respectively, when viewed in reference to the original xyz-coordinate system. This can directly be shown as follows: The characteristic equation of R3 is expressed as

$$|R_3 - \lambda E| = 0. \tag{17.112}$$

Using (17.107), we have

$$|R_3 - \lambda E| = \begin{vmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta - \lambda & \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta - \lambda & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta - \lambda \end{vmatrix}. \tag{17.113}$$

When λ = 1, we must have the direction cosines of the rotation axis as an eigenvector. That is, we get

$$\begin{pmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta - 1 & \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1 - \cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta - 1 & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\alpha\cos\beta\sin\beta(1 - \cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1 - \cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta - 1 \end{pmatrix}\begin{pmatrix} \cos\alpha\sin\beta \\ \sin\alpha\sin\beta \\ \cos\beta \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \tag{17.114}$$

The above matrix calculations certainly verify that (17.114) holds. The confir-
mation is left as an exercise for readers.
As an application of (17.107) to an illustrative example, let us consider a 2π/3 rotation around an axis trisecting a solid angle π/2 formed by the x-, y-, and z-axes (see Fig. 17.19 and Sect. 17.3). Then we have

$$\cos\alpha = \sin\alpha = 1/\sqrt{2}, \quad \cos\beta = 1/\sqrt{3}, \quad \sin\beta = \sqrt{2}/\sqrt{3},$$
$$\cos\gamma = -1/2, \quad \sin\gamma = \sqrt{3}/2. \tag{17.115}$$

Substituting (17.115) into (17.107), we get



Fig. 17.19 Rotation axis of C3 that permutates the basis vectors. (a) The C3 axis is a straight line that connects the origin and each vertex of a cube. (b) Cube viewed along the C3 axis that connects the origin and a vertex

$$R_3 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \tag{17.116}$$

If we write the linear transformation R3 following (11.37), we get

$$R_3(|x\rangle) = (|e_1\rangle\ |e_2\rangle\ |e_3\rangle)\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \tag{17.117}$$

where |e₁⟩, |e₂⟩, and |e₃⟩ represent unit basis vectors in the directions of the x-, y-, and z-axes, respectively. We have |x⟩ = x₁|e₁⟩ + x₂|e₂⟩ + x₃|e₃⟩. Thus, we get

$$R_3(|x\rangle) = (|e_2\rangle\ |e_3\rangle\ |e_1\rangle)\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}. \tag{17.118}$$

This implies a cyclic permutation of the basis vectors, which is well characterized by Fig. 17.19. Alternatively, (17.117) can be expressed in terms of the column-vector (i.e., coordinate) transformation as

$$R_3(|x\rangle) = (|e_1\rangle\ |e_2\rangle\ |e_3\rangle)\begin{pmatrix} x_3 \\ x_1 \\ x_2 \end{pmatrix}. \tag{17.119}$$

Care should be taken as to which linear transformation is intended: the basis vector transformation or the coordinate transformation.
As mentioned above, we have compared the geometric features on the moving coordinate system and the fixed coordinate system. The features apparently seem to differ at a glance but are essentially the same. □

References

1. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
2. Cotton FA (1990) Chemical applications of group theory, 3rd edn. Wiley, New York
3. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press,
Princeton
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Academic Press, Waltham
Chapter 18
Representation Theory of Groups

Representation theory is an important pillar of group theory. As we shall soon see, the word "representation" and its definition sound a bit daunting. To be short, however, we need "numbers" or "matrices" to do a mathematical calculation. Therefore, we may think of a representation as merely numbers or matrices. Individual representations have their dimension. If that dimension is one, we treat a representation as a number (real or complex). If the dimension is two or more, we are going to deal with matrices; in the case of n dimensions, it is an (n, n) square matrix. In this chapter we focus on the representation theory of finite groups. In this case, we have an important theorem stating that a representation of any finite group can be converted to a unitary representation by a similarity transformation. That is, the group elements of a finite group are represented by unitary matrices. According to the dimension of the representation, we have the same number of basis vectors. Bearing these things firmly in mind, we can pretty easily understand this important notion of representation.

18.1 Definition of Representation

In Sect. 16.4 we dealt with various aspects of the mapping between group elements. Of these, we studied fundamental properties of isomorphism and homomorphism. In this section we introduce the notion of representation of groups and study it. If we deal with a finite group consisting of n elements, we describe it as ℊ = {g₁ ≡ e, g₂, ⋯, gₙ} as in the case of Sect. 16.1.

Definition 18.1 Let ℊ = {gν} be a group comprising elements gν. Suppose that a (d, d) matrix D(gν) is given for each group element gν. Suppose also that, in correspondence with


$$g_\mu g_\nu = g_\rho, \tag{18.1}$$

$$D(g_\mu)D(g_\nu) = D(g_\rho) \tag{18.2}$$

holds. Then a set L consisting of the D(gν), that is,

$$L = \{D(g_\nu)\},$$

is said to be a representation.
Although the definition seems somewhat daunting, the representation is, as already seen, merely a homomorphism.
We call the individual matrices D(gν) representation matrices. The dimension d of the matrices is said to be the dimension of the representation as well. In correspondence with gνe = gν, we have

$$D(g_\nu)D(e) = D(g_\nu). \tag{18.3}$$

Therefore, we get

$$D(e) = E, \tag{18.4}$$

where E is an identity matrix of dimension d. Also, in correspondence with $g_\nu g_\nu^{-1} = g_\nu^{-1}g_\nu = e$, we have

$$D(g_\nu)D(g_\nu^{-1}) = D(g_\nu^{-1})D(g_\nu) = D(e) = E. \tag{18.5}$$

That is,

$$D(g_\nu^{-1}) = [D(g_\nu)]^{-1}. \tag{18.6}$$

Namely, an inverse matrix corresponds to an inverse element. If the representation has one-to-one correspondence, the representation is said to be faithful. In the case of n-to-one correspondence, the representation is homomorphic. In particular, representing all the group elements by 1 [as a (1, 1) matrix] is called an "identity representation."
Illustrative examples of representations have already appeared in Sects. 17.2 and 17.3. In those cases the representations were faithful. For instance, in Tables 17.2, 17.4, and 17.6 as well as Fig. 17.12, the number of representation matrices is the same as that of the group elements. In Sects. 17.2 and 17.3, in most cases we used real orthogonal matrices. Such matrices are included among unitary matrices. A representation using unitary matrices is said to be a "unitary representation." In fact, a representation of a finite group can always be converted to a unitary representation.

Theorem 18.1 [1, 2] A representation of any finite group can be converted to a unitary representation by a similarity transformation.
Proof Let ℊ = {g₁, g₂, ⋯, gₙ} be a finite group of order n. Let D(gν) be a representation matrix of gν of dimension d. Here we suppose that D(gν) is not unitary, but non-singular. Using the D(gν), we construct the following matrix H such that

$$H = \sum_{i=1}^{n} D(g_i)^{\dagger}D(g_i). \tag{18.7}$$

Then, for an arbitrarily chosen group element gⱼ, we have

$$D(g_j)^{\dagger}HD(g_j) = \sum_{i=1}^{n} D(g_j)^{\dagger}D(g_i)^{\dagger}D(g_i)D(g_j) = \sum_{i=1}^{n} \left[D(g_i)D(g_j)\right]^{\dagger}\left[D(g_i)D(g_j)\right] = \sum_{i=1}^{n} D(g_i g_j)^{\dagger}D(g_i g_j) = \sum_{k=1}^{n} D(g_k)^{\dagger}D(g_k) = H, \tag{18.8}$$

where the third equality comes from the homomorphism of the representation and the second last equality is due to the rearrangement theorem (see Sect. 16.1).
Note that each matrix D(gᵢ)†D(gᵢ) is an Hermitian Gram matrix constructed from a non-singular matrix and that H is a summation of such Gram matrices. Consequently, on the basis of the argument of Sect. 13.2, H is positive definite and all the eigenvalues of H are positive. Then, using an appropriate unitary matrix U, we can get a diagonal matrix Λ such that

$$U^{\dagger}HU = \Lambda. \tag{18.9}$$

Here, the diagonal matrix Λ is given by

$$\Lambda = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_d \end{pmatrix}, \tag{18.10}$$

with λᵢ > 0 (1 ≤ i ≤ d). We define Λ^{1/2} such that

$$\Lambda^{1/2} = \begin{pmatrix} \sqrt{\lambda_1} & & & \\ & \sqrt{\lambda_2} & & \\ & & \ddots & \\ & & & \sqrt{\lambda_d} \end{pmatrix}, \tag{18.11}$$

$$\left(\Lambda^{1/2}\right)^{-1} = \begin{pmatrix} \sqrt{\lambda_1}^{\,-1} & & & \\ & \sqrt{\lambda_2}^{\,-1} & & \\ & & \ddots & \\ & & & \sqrt{\lambda_d}^{\,-1} \end{pmatrix}. \tag{18.12}$$

Notice that both Λ^{1/2} and (Λ^{1/2})^{-1} are non-singular. Furthermore, we define a matrix V such that

$$V = U\left(\Lambda^{1/2}\right)^{-1}. \tag{18.13}$$

Then, multiplying both sides of (18.8) by V⁻¹ from the left and by V from the right and inserting VV⁻¹ = E in between, we get

$$V^{-1}D(g_j)^{\dagger}VV^{-1}HVV^{-1}D(g_j)V = V^{-1}HV. \tag{18.14}$$

Meanwhile, we have

$$V^{-1}HV = \Lambda^{1/2}U^{\dagger}HU\left(\Lambda^{1/2}\right)^{-1} = \Lambda^{1/2}\Lambda\left(\Lambda^{1/2}\right)^{-1} = \Lambda. \tag{18.15}$$

With the second equality of (18.15), we used (18.9). Inserting (18.15) into (18.14), we get

$$V^{-1}D(g_j)^{\dagger}V\Lambda V^{-1}D(g_j)V = \Lambda. \tag{18.16}$$

Multiplying both sides of (18.16) by Λ⁻¹ from the left, we get

$$\Lambda^{-1}V^{-1}D(g_j)^{\dagger}(V\Lambda) \cdot V^{-1}D(g_j)V = E. \tag{18.17}$$

Using (18.13), we have

$$\Lambda^{-1}V^{-1} = (V\Lambda)^{-1} = \left[U\Lambda^{1/2}\right]^{-1} = \left(\Lambda^{1/2}\right)^{-1}U^{\dagger}.$$

Meanwhile, taking the adjoint of (18.13) and noting (18.12), we get

$$V^{\dagger} = \left[\left(\Lambda^{1/2}\right)^{-1}\right]^{\dagger}U^{\dagger} = \left(\Lambda^{1/2}\right)^{-1}U^{\dagger}.$$

Using (18.13) once again, we have

$$V\Lambda = U\Lambda^{1/2}.$$

Also, using (18.13), we have

$$\left(V^{-1}\right)^{\dagger} = \left[\Lambda^{1/2}U^{\dagger}\right]^{\dagger} = U\left(\Lambda^{1/2}\right)^{\dagger} = U\Lambda^{1/2},$$

where we used the unitarity of U and the Hermiticity of Λ^{1/2} indicated in (18.11). Using the above relations and rewriting (18.17), finally we get

$$V^{\dagger}D(g_j)^{\dagger}\left(V^{-1}\right)^{\dagger} \cdot V^{-1}D(g_j)V = E. \tag{18.18}$$

Defining $\tilde{D}(g_j)$ as follows:

$$\tilde{D}(g_j) \equiv V^{-1}D(g_j)V, \tag{18.19}$$

and taking the adjoint of both sides of (18.19), we get

$$V^{\dagger}D(g_j)^{\dagger}\left(V^{-1}\right)^{\dagger} = \tilde{D}(g_j)^{\dagger}. \tag{18.20}$$

Then, from (18.18) we have

$$\tilde{D}(g_j)^{\dagger}\tilde{D}(g_j) = E. \tag{18.21}$$

Equation (18.21) implies that a representation of any finite group can be converted to a unitary representation by the similarity transformation of (18.19). This completes the proof. ■

18.2 Basis Functions of Representation

In Part III we considered linear transformations of vectors. In that case we defined vectors as abstract elements for which the operation laws (11.1)-(11.8) hold. We assumed that the operation is addition. In this part, so far we have dealt with vectors mostly in ℝ³. Therefore, the vectors naturally possess geometric features. In this section, we extend the notion of vectors so that they can be treated within a wider scope. More specifically, we include mathematical functions, treated in analysis, as vectors.
To this end, let us think of a basis of a representation. According to the dimension d of the representation, we have d basis vectors. Here we adopt d linearly independent basis vectors for the representation.
710 18 Representation Theory of Groups

Let ψ 1, ψ 2, ⋯, and ψ d be linearly independent vectors in a vector space Vd. Let


ℊ = {g1, g2, ⋯, gn} be a finite group of order n. Here we assume that gi
(1 ≤ i ≤ n) is a linear transformation (or operator) such as a symmetry operation
dealt with in Chap. 17. Suppose that the following relation holds with
gi 2 ℊ (1 ≤ i ≤ n):

d
gi ð ψ ν Þ = ψ μ Dμν ðgi Þ ð1 ≤ ν ≤ dÞ: ð18:22Þ
μ=1

Here we followed the notation of (11.37) that represented a linear transformation


of a vector. Rewriting (18.22) more explicitly, we have

D11 ðgi Þ ⋯ D1d ðgi Þ


gi ðψ ν Þ = ðψ 1 ψ 2 ⋯ ψ d Þ ⋮ ⋱ ⋮ : ð18:23Þ
Dd1 ðgi Þ ⋯ Ddd ðgi Þ

Comparing (18.23) with (11.37), we notice that ψ 1, ψ 2, ⋯, and ψ d act as vectors.


The corresponding coordinates of the vectors (or a column vector), i.e., x1, x2, ⋯,
and xd, have been omitted.
Now let us make sure that a set L consisting of {D(g1), D(g2), ⋯, D(gn)} forms a
representation. Operating gj on (18.22), we have

d d
gj ½gi ðψ ν Þ = gj gi ðψ ν Þ = gj ψ μ Dμν ðgi Þ = gj ψ μ Dμν ðgi Þ
μ=1 μ=1

d d
d
= gj ψ μ Dμν ðgi Þ = λ=1
ψ λ Dλμ gj Dμν ðgi Þ
μ=1 μ=1

d
= λ=1
ψ λ D gj Dðgi Þ λν
: ð18:24Þ

Putting gjgi = gk according to multiplication of group elements, we get

d
gk ð ψ ν Þ = λ=1
ψ λ D gj Dðgi Þ λν
: ð18:25Þ

Meanwhile, replacing gi in (18.22) with gk, we have

d
gk ð ψ ν Þ = ψ λ Dλν ðgk Þ: ð18:26Þ
μ=1

Comparing (18.25) and (18.26) and considering the uniqueness of vector repre-
sentation based on linear independence of the basis vectors (see Sect. 11.1), we get
18.2 Basis Functions of Representation 711

D gj D ð gi Þ λν
= Dλν ðgk Þ  ½Dðgk Þλν , ð18:27Þ

where the last identity follows the notation (11.38) of Sect. 11.2. Hence, we have

D gj Dðgi Þ = Dðgk Þ: ð18:28Þ

Thus, the set L consisting of {D(g1), D(g2), ⋯, D(gn)} is certainly a representation


of the group ℊ = {g1, g2, ⋯, gn}. In such a case, the set B consisting of linearly
independent d functions (i.e., vectors)

B = fψ 1 , ψ 2 , ⋯, ψ d g ð18:29Þ

is said to be a basis functions of the representation D. The number d equals the


dimension of representation. Correspondingly, the representation matrix is a (d, d )
square matrix. As remarked above, the correspondence between the elements of L
and ℊ is not necessarily one-to-one (isomorphic), but may be n-to-one
(homomorphic).
Definition 18.2 Let D and D′ be two representations of ℊ = {g1, g2, ⋯, gn}.
Suppose that these representations are related to each other by similarity transfor-
mation such that

D0 ðgi Þ = T - 1 Dðgi ÞT ð1 ≤ i ≤ nÞ, ð18:30Þ

where T is a non-singular matrix. Then, D and D′ are said to be equivalent


representations, or simply equivalent. If the representations are not equivalent,
they are called inequivalent.
Suppose that B = fψ 1 , ψ 2 , ⋯, ψ d g is a basis of a representation D of ℊ = {g1,
g2, ⋯, gn}. Then we have (18.22). Let T be a non-singular matrix. Using T, we
want to transform a basis of the representation D from B to a new set
B 0 = ψ 01 , ψ 02 , ⋯, ψ 0d . Individual elements of the new basis are expressed as

d
ψ 0ν = ψ λ T λν : ð18:31Þ
λ=1

Since T is non-singular, this ensures that ψ 01 , ψ 02 , ⋯, and ψ 0d are linearly inde-


pendent and, hence, that B 0 forms another basis set of D (see discussion of Sect.
11.4). Thus, we can describe ψ μ (1 ≤ μ ≤ n) in terms of ψ 0ν . That is,

d
ψμ = ψ 0λ T - 1 λμ
: ð18:32Þ
λ=1

Operating gi on both sides of (18.31), we have


712 18 Representation Theory of Groups

d d
d
gi ψ 0ν = gi ðψ λ ÞT λν = λ=1
ψ μ Dμλ ðgi ÞT λν
λ=1 μ=1

d d
d d
= λ=1
ψ 0κ T - 1 κμ
Dμλ ðgi ÞT λν = κ=1
ψ 0κ T - 1 Dðgi ÞT κν
: ð18:33Þ
μ=1 κ=1

Let D′ be a representation of ℊ in reference to B 0 = ψ 01 , ψ 02 , ⋯, ψ 0d . Then


we have

d
gi ψ 0ν = κ=1
ψ 0κ D ′ κν :

Hence, (18.30) follows in virtue of the linear independence of


ψ 01 , ψ 02 , ⋯, and ψ 0d . Thus, we see that the transformation of basis vectors via
(18.31) causes similarity transformation between representation matrices. This is
in parallel with (11.81) and (11.88).
Below we show several important notions as a definition. We assume that a group
is a finite group.
Definition 18.3 Let D and D ~ be two representations of ℊ = {g1, g2, ⋯, gn}. Let
~ ðgi Þ be two representation matrices of gi (1 ≤ i ≤ n). If we construct Δ(gi)
D(gi) and D
such that

Dðgi Þ
Δ ð gi Þ = ~ ð gi Þ , ð18:34Þ
D

then Δ(gi) is a representation as well. The representation Δ(gi) is said to be a direct


sum of D(gi) and D ~ ðgi Þ. We denote it by

Δðgi Þ = Dðgi Þ ~ ðgi Þ:


D ð18:35Þ

~ ðgi Þ.
A dimension Δ(gi) is a sum of that of D(gi) and that of D
Definition 18.4 Let D be a representation of ℊ and let D(gi) be a representation
matrix of a group element gi. If we can convert D(gi) to a block matrix such as
(18.34) by its similarity transformation, D is said to be a reducible representation, or
completely reducible. If the representation is not reducible, it is irreducible.
A reducible representation can be decomposed (or reduced) to a direct sum of
plural irreducible representations. Such an operation is called reduction.
Definition 18.5 Let ℊ = {g1, g2, ⋯, gn} be a group and let V be a vector space.
Then, we denote a representation D of ℊ operating on V by
18.2 Basis Functions of Representation 713

D : ℊ → GLðV Þ:

We call V a representation space (or carrier space) L S of D [3, 4].


From Definition 18.5, the dimension of representation is identical with the
dimension of V. Suppose that there is a subspace W in V. If W is D(gi)-invariant
(see Sect. 12.2) with all 8gi 2 ℊ, W is said to be an invariant subspace of V. Here, we
say that W is D(gi)-invariant if we have | xi 2 W ⟹ D(gi)| xi 2 W; see Sect. 12.2.
Notice that | xi may well represent a function ψ ν of (18.22). In this context, we have a
following important theorem.
Theorem 18.2 [3] Let D: ℊ → GL(V ) be a unitary representation over V. Let W be a
D(gi)-invariant subspace of V, where gi (1 ≤ i ≤ n) 2 ℊ = {g1, g2, ⋯, gn}. Then,
W⊥ is a D(gi)-invariant subspace of V as well.
Proof Suppose that | ai 2 W⊥ and let | bi 2 W. Then, from (13.86) we have
  
hbjDðgi Þjai = ajDðgi Þ{ jb = aj½Dðgi Þ - 1 jb = ajD gi - 1 jb , ð18:36Þ

where with the second equality we used the fact that D(gi) is a unitary matrix; also,
see (18.6). But, since W is D(gi)-invariant from the supposition, jD(gi-1)| bi 2 W.
Here notice that gi-1 must be identical to ∃gk (1 ≤ k ≤ n) 2 ℊ. Therefore, we
should have

ajD gi - 1 jb = 0:

Hence, from (18.36) we get



0 = ajD gi - 1 jb = hbjDðgi Þjai:

This implies that jD(gi)| ai 2 W⊥. In turn, this means that W⊥ is a D(gi)-invariant
subspace of V. This completes the proof.
From Theorem 14.1, we have V = W ⨁ W⊥. Correspondingly, as in the case of
(12.102), D(gi) is reduced as follows:

DðW Þ ðgi Þ
Dðgi Þ = ⊥ ,
DðW Þ ðgi Þ

where D(W )(gi) and DðW Þ ðgi Þ are representation matrices associated with subspaces
W and W⊥, respectively. This is the converse of the additive representations that
appeared in Definition 18.3. From (11.17) and (11.18), we have

dimV = dimW þ dimW ⊥ : ð18:37Þ


714 18 Representation Theory of Groups

If V is a d-dimensional vector space, V is spanned by d linearly independent basis


vectors (or functions) ψ μ (1 ≤ μ ≤ d ). Suppose that the dimension W and W⊥ is d(W )

and dðW Þ , respectively. Then, W and W⊥ are spanned by d(W ) linearly independent

vectors and dðW Þ linearly independent vectors, respectively. The subspaces W and
W⊥ may well further be decomposed into orthogonal complements [i.e., D(gi)-
invariant subspaces of V]. In this situation, in general, we write

Dðgi Þ = Dð1Þ ðgi Þ Dð2Þ ðgi Þ ⋯ DðωÞ ðgi Þ: ð18:38Þ

Thus, unitary representation is completely reducible (Definition 18.4), if it is


reducible at any rate. This is one of the conspicuous features of the unitary repre-
sentation. This can readily be understood during the course of the proof of Theorem
18.2. We will develop further detailed discussion in Sect. 18.4.
If the aforementioned decomposition cannot be done, D is irreducible. Therefore,
it will be of great importance to examine a dimension of irreducible representations
contained in (18.38). This is closely related to the choice of basis vectors and
properties of the invariant subspaces. Notice that the same irreducible representation
may repeatedly occur in (18.38).
Before advancing to the next section, however, let us think of an example to get
used to abstract concepts. This is at the same time for the purpose of taking the
contents of Chap. 19 in advance.
Example 18.1 Figure 18.1 shows structural formulae including resonance struc-
tures for allyl radical. Thanks to the resonance structures, the allyl radical belongs to
C2v; see the multiplication table in Table 17.1. The molecule lies on the yz-plane and
a line connecting C1 and a bonded H is the C2 axis (see Fig. 18.1). The C2 axis is
identical to the z-axis. The molecule has mirror symmetries with respect to the yz-
and zx-planes. We denote π-orbitals of C1, C2, and C3 by ϕ1, ϕ2, and ϕ3, respec-
tively. We suppose that these orbitals extend toward the x-direction with a positive
sign in the upper side and a negative sign in the lower side relative to the plane of
paper. Notice that we follow custom of the group theory notation with the coordinate
setting in Fig. 18.1.
We consider an inner product space V3 spanned by ϕ1, ϕ2, and ϕ3. To explicitly
show that these are vectors of the inner product space, we express them as jϕ1i, jϕ2i,
and jϕ3i in this example. Then, according to (11.19) of Sect. 11.1, we write

V 3 = Spanfjϕ1 i, jϕ2 i, jϕ3 ig:

The vector space V3 is a representation space pertinent to a representation D of the


present example. Also, in parallel to (13.32) of Sect. 13.2, we express an arbitrary
vector jψi 2 V3 as
18.2 Basis Functions of Representation 715

H H

H C1 H H C1 H
C3 C2 C3 C2

H H H H

y
x

Fig. 18.1 Allyl radical and its resonance structure. The molecule is placed on the yz-plane. The z-
axis is identical with a straight line connecting C1 and H (i.e., C2 axis)

jψi = c1 jϕ1 i þ c2 jϕ2 i þ c3 jϕ3 i = j c1 ϕ1 þ c2 ϕ2 þ c3 ϕ3 i

c1
= ðjϕ1 i jϕ2 i jϕ3 iÞ c2 :
c3

Let us now operate a group element of C2v. For example, choosing C2(z), we have

-1 0 0 c1
C 2 ðzÞðjψiÞ = ðjϕ1 i jϕ2 i jϕ3 iÞ 0 0 -1 c2 : ð18:39Þ
0 -1 0 c3

Thus, C2(z) is represented by a (3, 3) matrix. Other group elements are


represented similarly. These results are collected in Table 18.1, where each matrix
is given with respect to jϕ1i, jϕ2i, and jϕ3i as basis vectors. Notice that the matrix
representations differ from those of Table 17.2, where we chose jxi, j yi, and j zi for
the basis vectors. From Table 18.1, we immediately see that the representation
matrices are reduced to an upper (1, 1) diagonal matrix (i.e., just a number) and a
lower (2, 2) diagonal matrix.
In Table 18.1, we find that all the matrices are Hermitian (as well as unitary).
Since C2v is an Abelian group (i.e., commutative), in light of Theorem 14.14, we
should be able to diagonalize these matrices by a single unitary similarity transfor-
mation all at once. In fact, E and σ 0v ðyzÞ are invariant with respect to unitary similarity
transformation, and so we only have to diagonalize C2(z) and σ v(zx) at once. As a
characteristic equation of the above (3, 3) matrix of (18.39), we have
716 18 Representation Theory of Groups

Table 18.1 Matrix representation of symmetry operations given with respect to jϕ1i, jϕ2i, and jϕ3i
E C2(z) σ v(zx) σ 0v ðyzÞ
1 0 0 -1 0 0 1 0 0 -1 0 0
0 1 0 0 0 -1 0 0 1 0 -1 0
0 0 1 0 -1 0 0 1 0 0 0 -1

-λ-1 0 0
0 -λ - 1 = 0:
0 -1 -λ

Solving the above equation, we get λ = 1 or λ = - 1 (as a double root). Also as a


diagonalizing unitary matrix U, we get

1 0 0
1 1
0 p p
U= 2 2 = U{: ð18:40Þ
1 1
0 p -p
2 2

Thus, (18.39) can be rewritten as

-1 0 0 c1
C 2 ðzÞðjψiÞ = ðjϕ1 i jϕ2 i jϕ3 iÞUU { 0 0 -1 UU { c2
0 -1 0 c3

c1
1
1 1 -1 0 0 p ð c2 þ c 3 Þ
= jϕ1 i p jϕ2 þ ϕ3 i p jϕ2 - ϕ3 i 0 -1 0 2 :
2 2 0 0 1 1
p ð c2 - c3 Þ
2

The diagonalization of the representation matrix σ v(zx) using the same U as the
above is left for the readers as an exercise. Table 18.2 shows the results of the
diagonalized representations with regard to the vectors
j ϕ1 i, p12 j ϕ2 þ ϕ3 i, and p12 j ϕ2 - ϕ3 i. Notice that traces of the matrices remain
unchanged in Tables 18.1 and 18.2, i.e., before and after the unitary similarity
transformation. From Table 18.2, we find that the vectors are eigenfunctions of the
individual symmetry operations. Each diagonal element is a corresponding eigen-
value of those operations. Of the three vectors, jϕ1i and p12 j ϕ2 þ ϕ3 i have the same
eigenvalues with respect to individual symmetry operations. With symmetry oper-
ations C2 and σ v(zx), another vector p12 j ϕ2 - ϕ3 i has an eigenvalue of a sign
opposite to that of the former two vectors. Thus, we find that we have arrived at
“symmetry-adapted” vectors by taking linear combination of original vectors.
18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT) 717

Table 18.2 Matrix representation of symmetry operations given with respect to jϕ1i,
p1 ðjϕþ ϕ3 iÞ, and p12 ðjϕ2 - ϕ3 iÞ
2 2

E C2(z) σ v(zx) σ 0v ðyzÞ


1 0 0 -1 0 0 1 0 0 -1 0 0
0 1 0 0 -1 0 0 1 0 0 -1 0
0 0 1 0 0 1 0 0 -1 0 0 -1

Returning back to Table 17.2, we see that the representation matrices have already
been diagonalized. In terms of the representation space, we constructed the repre-
sentation matrices with respect to x, y, and z as symmetry-adapted basis vectors. In
the next chapter, we make the most of such vectors constructed by the symmetry-
adapted linear combination.
Meanwhile, allocating appropriate numbers to coefficients c1, c2, and c3, we
represent any vector in V3 spanned by jϕ1i, jϕ2i, and jϕ3i. In the next chapter, we
deal with molecular orbital (MO) calculations. Including the present case of the allyl
radical, we solve the energy eigenvalue problem by appropriately determining those
coefficients (i.e., eigenvectors) and corresponding energy eigenvalues in a represen-
tation space. A dimension of the representation space depends on the number of
molecular orbitals. The representation space is decomposed into several or more (but
finite) orthogonal complements according to (18.38). We will come back to the
present example in Sect. 19.4.4 and further investigate the problems there.
As can be seen in the above example, the representation space accommodates
various types of vectors, e.g., mathematical functions in the present case. If the
representation space is decomposed into invariant subspaces, we can choose appro-
priate basis vectors for each subspace, the number of which is equal to the dimen-
sionality of each irreducible representation [4]. In this context, the situation is related
to that of Part III where we have studied how a linear vector space is decomposed
into direct sum of invariant subspaces that are spanned by associated basis vectors.
In particular, it will be of great importance to construct mutually orthogonal
symmetry-adapted vectors through linear combination of original basis vectors in
each subspace associated with the irreducible representation. We further study these
important subjects in the following several sections.

18.3 Schur’s Lemmas and Grand Orthogonality Theorem


(GOT)

Pivotal notions of the representation theory of finite groups rest upon Schur’s
lemmas (first lemma and second lemma).
Schur’s First Lemma [1, 2]
Let D and D ~ be two irreducible representations of ℊ. Let dimensions of represen-
~ be m and n, respectively. Suppose that with 8g 2 ℊ the following
tations of D and D
relation holds:
718 18 Representation Theory of Groups

~ ðgÞ,
DðgÞM = M D ð18:41Þ

where M is a (m, n) matrix. Then we must have


Case (1): M = 0 or
Case (2): M is a square matrix (i.e., m = n) with detM ≠ 0.
~ are inequivalent. In Case (2), on the
In Case (1), the representations of D and D
~
other hand, D and D are equivalent.
Proof
(a) First, suppose that m > n. Let B = fψ 1 , ψ 2 , ⋯, ψ m g be a basis set of the
representation space related to D. Then we have

m
gðψ ν Þ = ψ μ Dμν ðgÞ ð1 ≤ ν ≤ mÞ:
μ=1

Next we form a linear combination of ψ 1, ψ 2, ⋯, and ψ m such that

m
ϕν = ψ μ M μν ð1 ≤ ν ≤ nÞ: ð18:42Þ
μ=1

Operating g on both sides of (18.42), we have

m
m m
gðϕν Þ = g ψ μ M μν = μ=1 λ=1
ψ λ Dλμ ðgÞ M μν
μ=1

=
m
ψλ
m
Dλμ ðgÞM μν =
m
ψλ
n
~ μν ðgÞ
M λμ D
λ=1 μ=1 λ=1 μ=1

=
n m
~ μν ðgÞ =
ψ λ M λμ D
n
~
ϕ D ðgÞ: ð18:43Þ
μ=1 λ=1 μ = 1 μ μν

With the fourth equality of (18.43), we used (18.41). Therefore, B~ =


~
fϕ1 , ϕ2 , ⋯, ϕn g constitutes a basis set of the representation space for D.
If m > n, we would have been able to construct a basis of the representation
D using n (<m) functions of ϕν (1 ≤ ν ≤ n) that are a linear combination of
ψ μ (1 ≤ μ ≤ m). In that case, we would have obtained a representation of a smaller
dimension n for D ~ by taking a linear combination of ψ 1, ψ 2, ⋯, and ψ m. As n < m
by supposition, this implies that as in (18.34), D would be decomposed into
representations whose dimension is smaller than m. It is in contradiction to the
supposition that D is irreducible. To avoid this contradiction, we must have M = 0.
This makes (18.41) trivially hold.
(b) Next we consider a case of m < n. Taking a complex conjugate of (18.41) and
exchanging both sides, we get
18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT) 719

~ ðgÞ{ M { = M { DðgÞ{ :
D ð18:44Þ

Then, from (18.21) and (18.28), we have

DðgÞ{ DðgÞ = E and DðgÞD g - 1 = DðeÞ: ð18:45Þ

Therefore, we get

D g - 1 = DðgÞ - 1 = DðgÞ{ :

Similarly, we have

D ~ ð gÞ - 1 = D
~ g-1 = D ~ ð gÞ { :

Thus, (18.44) is rewritten as

~ g - 1 M{ = M{D g - 1 :
D ð18:46Þ

The number of rows of M{ is larger than that of columns. This time, dimension
n of D~ ðg - 1 Þ is larger than m of D(g-1). A group element g designates an arbitrary
element of ℊ, so does g-1. Thus, (18.46) can be treated in parallel with (18.41)
where m > n with a (m, n) matrix M. Figure 18.2 graphically shows the magnitude
relationship in dimensions of representation between representation matrices and M.
Thus, in parallel with the above argument of (a), we exclude the case where m < n as
well.
(c) We consider the third case of m = n. Similarly as before we make a linear
combination of vectors contained in the basis set B = fψ 1 , ψ 2 , ⋯, ψ m g such that

m
ϕν = ψ μ M μν ð1 ≤ ν ≤ mÞ, ð18:47Þ
μ=1

where M is a (m, m) square matrix. If detM = 0, then ϕ1, ϕ2, ⋯, and ϕm are linearly
dependent (see Sect. 11.4). With the number of linearly independent vectors p, we
have p < m accordingly. As in (18.43) again, this implies that we would have
obtained a representation of a smaller dimension p for D, ~ in contradiction to the
~
supposition that D is irreducible. To avoid this contradiction, we must have detM ≠ 0.
These complete the proof.
Schur’s Second Lemma [1, 2]
Let D be a representation of ℊ. Suppose that with 8g 2 ℊ we have
720 18 Representation Theory of Groups

(a): >

( ) × = ( )
×
(m, m) (m, n) (m, n) (n, n)

(b): <

( ) × = × ( )
(n, n) (n, m) (n, m) (m, m)

~ ðgÞ]
Fig. 18.2 Magnitude relationship between dimensions of representation matrices [D(g) and D
and M. (a) m > n. (b) m < n. The diagram is based on (18.41) and (18.46) of the text

DðgÞM = MDðgÞ: ð18:48Þ

Then, if D is irreducible, M = cE with c being an arbitrary complex number,


where E is an identity matrix.
Proof Let c be an arbitrarily chosen complex number. From (18.48) we have

DðgÞðM - cEÞ = ðM - cE ÞDðgÞ: ð18:49Þ

If D is irreducible, Schur’s first lemma implies that we must have either (1) M -
cE = 0 or (2) det(M - cE) ≠ 0. A matrix M has at least one proper eigenvalue λ (Sect.
12.1), and so choosing λ for c in (18.49), we have det(M - λE) = 0. Consequently,
only former case is allowed. That is, we have

M = cE: ð18:50Þ

This completes the proof.


Schur’s lemmas lead to important orthogonality theorem that plays a fundamental
role in many scientific fields. The orthogonality theorem includes that of matrices
and their traces (or characters).
Theorem 18.3 Grand Orthogonality Theorem (GOT) [5] Let D(1), D(2), ⋯ be
all inequivalent irreducible representations of a group ℊ = {g1, g2, ⋯, gn} of order
18.3 Schur’s Lemmas and Grand Orthogonality Theorem (GOT) 721

n. Let D(α) and D(β) be two irreducible representations chosen from among D(1), D(2),
⋯. Then, regarding their matrix representations, we have the following relationship:

ðαÞ ðβ Þ n
Dij ðgÞ Dkl ðgÞ = δ δ δ , ð18:51Þ
g dα αβ ik jl

where Σg means that the summation should be taken over all n group elements and dα
denotes a dimension of the representation D(α). The symbol δαβ means that δαβ = 1
when D(α) and D(β) are equivalent and that δαβ = 0 when D(α) and D(β) are
inequivalent.
Proof First we prove the case where D(α) = D(β). For the sake of simple expression,
we omit a superscript and denote D(α) simply by D. Let us construct a matrix
A such that

A= g
DðgÞXD g - 1 , ð18:52Þ

where X is an arbitrary matrix. Hence,

-1
Dðg0 ÞA = g
Dðg0 ÞDðgÞXD g - 1 = g
Dðg0 ÞDðgÞXD g - 1 D g0 D ð g0 Þ

-1
= g
Dðg0 ÞDðgÞXD g - 1 g0 D ð g0 Þ

-1
= g
Dðg0 gÞXD ðg0 gÞ Dðg0 Þ: ð18:53Þ

Thanks to the rearrangement theorem, for fixed g′ the element g′g runs through all
the group elements as g does so. Therefore, we have

-1
g
Dðg0 gÞXD ðg0 gÞ = g
DðgÞXD g - 1 = A: ð18:54Þ

Thus,

DðgÞA = ADðgÞ: ð18:55Þ

According to Schur’s second lemma, we have

A = λE: ð18:56Þ

ðlÞ
The value of a constant λ depends upon the choice of X. Let X be δi δjðmÞ where all
the matrix elements are zero except for the (l, m)-component that takes 1 (Sects. 12.5
and 12.6). Thus from (18.53) we have
722 18 Representation Theory of Groups

Dip ðgÞδðplÞ δqðmÞ Dqj g - 1 = g


Dil ðgÞDmj g - 1 = λlm δij , ð18:57Þ
g, p, q

where λlm is a constant to be determined. Using the fact that the representation is
unitary, from (18.57) we have

g
Dil ðgÞDjm ðgÞ = λlm δij : ð18:58Þ

Next, we wish to determine coefficients λlm. To this end, setting i = j and


summing over i in (18.57), we get for LHS

g i
Dil ðgÞDmi g - 1 = g
D g - 1 DðgÞ ml
= g
D g - 1g ml

= g
½DðeÞml = δ
g ml
= nδml , ð18:59Þ

where n is equal to the order of group. As for RHS, we have

λ δ
i lm ii
= λlm d, ð18:60Þ

where d is equal to a dimension of D. From (18.59) and (18.60), we get

n
λlm d = nδlm or λlm = δ : ð18:61Þ
d lm

Therefore, from (18.58)

n
Dil ðgÞDjm ðgÞ = δ δ : ð18:62Þ
g d lm ij

Specifying a species of the irreducible representation, we get

ðαÞ ðαÞ n
Dil ðgÞDjm ðgÞ = δ δ , ð18:63Þ
g dα lm ij

where dα is a dimension of D(α).


Next, we examine the relationship between two inequivalent irreducible repre-
sentations. Let D(α) and D(β) be such representations with dimensions dα and dβ,
respectively. Let us construct a matrix B such that

B= g
DðαÞ ðgÞXDðβÞ g - 1 , ð18:64Þ

where X is again an arbitrary matrix. Hence,


18.4 Characters 723

DðαÞ ðg0 ÞB = g
DðαÞ ðg0 ÞDðαÞ ðgÞXDðβÞ g - 1

-1
= g
DðαÞ ðg0 ÞDðαÞ ðgÞXDðβÞ g - 1 DðβÞ g0 DðβÞ ðg0 Þ

-1
= g
DðαÞ ðg0 gÞXDðβÞ ðg0 gÞ DðβÞ ðg0 Þ = BDðβÞ ðg0 Þ: ð18:65Þ

According to Schur’s first lemma, we have

B = 0: ð18:66Þ

ðlÞ
Putting X = δi δjðmÞ as before and rewriting (18.64), we get

ðαÞ ðβ Þ
g
Dil ðgÞDjm ðgÞ = 0: ð18:67Þ

Combining (18.63) and (18.67), we get (18.51). These procedures complete the
proof.

18.4 Characters

Representation matrices of a group are square matrices. In Part III we examined


properties of a trace, i.e., a sum of diagonal elements of a square matrix. In group
theory the trace is called a character.
Definition 18.6 Let D be a (matrix) representation of a group ℊ = {g1, g2, ⋯,
gn}. The sum of diagonal elements χ(g) is defined as follows:

d
χ ðgÞ  TrDðgÞ = i=1
Dii ðgÞ, ð18:68Þ

where g stands for group elements g1, g2, ⋯, and gn, Tr stands for “trace,” and d is
a dimension of the representation D. Let ∁ be a set defined as

∁ = fχ ðg1 Þ, χ ðg2 Þ, ⋯, χ ðgn Þg: ð18:69Þ

Then, the set ∁ is called a character of D. A character of an irreducible represen-


tation is said to be an irreducible character.
Let us describe several properties of the character or trace.
1. A character of the identity element χ(e) is equal to a dimension d of a represen-
tation. This is because the identity element is given by a unit matrix.
2. Let P and Q be two square matrices. Then, we have
724 18 Representation Theory of Groups

TrðPQÞ = TrðQPÞ: ð18:70Þ

This is because

ðPQÞii = Pij Qji = Qji Pij = ðQPÞjj : ð18:71Þ


i i j j i j

Putting Q = SP-1 in (18.70), we get

Tr PSP - 1 = Tr SP - 1 P = TrðSÞ: ð18:72Þ

Therefore, we have the following property:


3. Characters of group elements that are conjugate to each other are equal. If gi and
gj are conjugate, these elements are connected by means of a suitable element
g such that

ggi g - 1 = gj : ð18:73Þ

Accordingly, a representation matrix is expressed as

DðgÞDðgi ÞD g - 1 = DðgÞDðgi Þ½DðgÞ - 1 = D gj : ð18:74Þ

Taking a trace of both sides of (18.74), from (18.72) we have

χ ðgi Þ = χ gj : ð18:75Þ

4. Any two equivalent representations have the same trace. This immediately
follows from (18.30).
There are several orthogonality theorem about a trace. Among them, a following
theorem is well known.
Theorem 18.4 A trace of irreducible representations satisfies the following orthog-
onality relation:

g
χ ðαÞ ðgÞ χ ðβÞ ðgÞ = nδαβ , ð18:76Þ

where χ (α) and χ (β) are traces of irreducible representations D(α) and D(β),
respectively.
Proof In (18.51) putting i = j and k = l in both sides and summing over all i and k,
we have
18.4 Characters 725

ðαÞ ðβÞ n n n
Dii ðgÞ Dkk ðgÞ = δαβ δik δik = δαβ δik = δαβ dα
g i, k i, k
d α i, k
d α d α

= nδαβ : ð18:77Þ

From (18.68) and (18.77), we get (18.76). This completes the proof.
Since a character is identical with group elements belonging to the same
conjugacy class Kl, we may write it as χ(Kl) and rewrite a summation of (18.76) as
a summation of the conjugacy classes. Thus, we have
nc
l=1
χ ðαÞ ðK l Þ χ ðβÞ ðK l Þk l = nδαβ , ð18:78Þ

where nc denotes the number of conjugacy classes in a group and kl indicates the
number of group elements contained in a class Kl.
We have seen a case where a representation matrix can be reduced to two
(or more) block matrices as in (18.34). Also as already seen in Part III, the block
matrix decomposition takes place with normal matrices (including unitary matrices).
The character is often used to examine a constitution of a reducible representation or
reducible matrix.
Alternatively, if a unitary matrix (a normal matrix, more widely) is decomposed
into block matrices, we say that the unitary matrix comprises a direct sum of those
block matrices. In physics and chemistry dealing with atoms, molecules, crystals,
etc., we very often encounter such a situation. Extending (18.35), the relation can
generally be summarized as

Dðgi Þ = Dð1Þ ðgi Þ Dð2Þ ðgi Þ ⋯ DðωÞ ðgi Þ, ð18:79Þ

where D(gi) is a reducible representation for a group element gi and D(1)(gi), D(2)(gi),
⋯, and D(ω)(gi) are irreducible representations in a group. The notation D(ω)(gi)
means that D(ω)(gi) may be equivalent (or identical) to D(1)(gi), D(2)(gi), etc. or may
be inequivalent to them. More specifically, the same irreducible representations may
well appear several times.
To make the above situation clear, we usually use a following equation instead:

ðαÞ
Dðgi Þ = q D
α α
ðgi Þ, ð18:80Þ

where qα is zero or a positive integer and D(α) is different types of irreducible


representations. If the same D(α) repeatedly appears in the direct sum, then qα
specifies how many times D(α) appears in the direct sum. Unless D(α) appears, qα
is zero.
Bearing the above in mind, we take a trace of (18.80). Then we have
726 18 Representation Theory of Groups

ðαÞ
χ ð gÞ = q χ
α α
ðgÞ, ð18:81Þ

where χ(g) is a character of the reducible representation for a group element g, where
we omitted a subscript i indicating the i-th group element gi (1 ≤ i ≤ n). To find qα,
let us multiply both sides of (18.81) by χ (α)(g) and take summation over group
elements. That is,

g
χ ð α Þ ð gÞ  χ ð gÞ = q
β β
χ ðαÞ ðgÞ χ ðβÞ ðgÞ = qβ nδαβ = qα n, ð18:82Þ
g β

where we used (18.76) with the second equality. Thus, we get

1
qα = χ ðαÞ ðgÞ χ ðgÞ: ð18:83Þ
n g

The integer qα explicitly gives the number of appearance of D(α)(g) that appears in
a reducible representation D(g). The expression pertinent to the classes is

1 
qα = χ ðαÞ K j χ K j kj , ð18:84Þ
n j

where Kj and kj denote the j-th class of the group and the number of elements
belonging to Kj, respectively.

18.5 Regular Representation and Group Algebra

Now, the readers may wonder how many different irreducible representations exist
for a group. To answer this question, let us introduce a special representation of the
regular representation.
Definition 18.7 Let ℊ = {g1, g2, ⋯, gn} be a group. Let us define a (n, n) square
matrix D(R)(gν) for an arbitrary group element gν (1 ≤ ν ≤ n) such that

DðRÞ ðgν Þ = δ gi- 1 gν gj ð1 ≤ i, j ≤ nÞ, ð18:85Þ


ij

where

1 for gν = e ði:e:; identityÞ,


δ ð gν Þ = ð18:86Þ
0 for gν ≠ e:

Let us consider a set


18.5 Regular Representation and Group Algebra 727

R = DðRÞ ðg1 Þ, DðRÞ ðg2 Þ, ⋯, DðRÞ ðgn Þ : ð18:87Þ

Then, the set R is said to be a regular representation of the group ℊ.


In fact, R is a representation. This is confirmed as follows: In (18.85), if
gi- 1 gν gj = e, δ gi- 1 gν gj = 1. This occurs, when gνgj = gi (A). Meanwhile, let us
consider a situation where gj- 1 gμ gk = e. This occurs when gμgk = gj (B). Replacing
gj in (A) with that in (B), we have

gν g μ g k = g i : ð18:88Þ

That is,

gi- 1 gν gμ gk = e: ð18:89Þ

If we choose gi and gν, gj is uniquely decided from (A). If gμ is separately chosen,


then gk is uniquely decide from (B) as well, because gj has already been uniquely
decided. Thus, performing the following matrix calculations, we get

DðRÞ ðgν Þ DðRÞ gμ = δ gi- 1 gν gj δ gj- 1 gμ gk = δ gi- 1 gν gμ gk


ij jk
j j

= D ð RÞ g ν g μ : ð18:90Þ
ik

Rewriting (18.90) in a matrix product form, we get

DðRÞ ðgν ÞDðRÞ gμ = DðRÞ gν gμ : ð18:91Þ

Thus, D(R) is certainly a representation.


To further confirm this, let us think of an example.
Example 18.2 Let us consider a thiophene molecule that we have already examined
in Sect. 17.2. In a multiplication table, we arrange E, C2, σ v , σ v0 in a first column and
their inverse element E, C2, σ v , σ v0 in a first row. In this case, the inverse element is
the same as original element itself. Paying attention, e.g., to C2, we allocate the
number 1 on the place where C2 appears and the number 0 otherwise. Then, that
matrix is a regular representation of C2; see Table 18.3. Thus as D(R)(C2), we get
728 18 Representation Theory of Groups

Table 18.3 How to make a


C2v E-1 C2(z)-1 σ v(zx)-1 σ 0v ðyzÞ - 1
regular representation of C2v
E E C2 σv σ 0v
C2(z) C2 E σ 0v σv
σ v(zx) σv σ 0v E C2
σ 0v ðyzÞ σ 0v σv C2 E

0 1 0 0
1 0 0 0
DðRÞ ðC 2 Þ = : ð18:92Þ
0 0 0 1
0 0 1 0

As evidenced in (18.92), the rearrangement theorem ensures that the number


1 appears once and only once in each column and each row in such a way that
individual column and row vectors become linearly independent. Thus, at the same
time, we confirm that the matrix is unitary.
Another characteristic of the regular representation is that the identity is
represented by an identity matrix. In this example, we have

1 0 0 0
ð RÞ 0 1 0 0
D ðE Þ = : ð18:93Þ
0 0 1 0
0 0 0 1

For the other symmetry operations, we have

0 0 1 0 0 0 0 1
ð RÞ 0 0 0 1 ð RÞ 0 0 1 0
D ðσ v Þ = ,D ðσ v ′ Þ = :
1 0 0 0 0 1 0 0
0 1 0 0 1 0 0 0

Let χ (R)(gν) be a character of the regular representation. Then, according to the


definition of (18.85),

n for gν = e,
n
χ ðRÞ ðgν Þ = i=1
δ gi- 1 gν gi = ð18:94Þ
0 for gν ≠ e:

As can be seen from (18.92), the regular representation is reducible because the
matrix is decomposed into block matrices. Therefore, the representation can be
reduced to a direct sum of irreducible representations such that

DðRÞ = qα DðαÞ , ð18:95Þ


α
18.5 Regular Representation and Group Algebra 729

Table 18.4 Character table C2v E C2(z) σ v(zx) σ 0v ðyzÞ


of C2v
A1 1 1 1 1 z; x2, y2, z2
A2 1 1 -1 -1 xy
B1 1 -1 1 -1 x; zx
B2 1 -1 -1 1 y; yz

where qα is a positive integer or zero and D(α) is an irreducible representation. Then,


from (18.81), we have

χ ðRÞ ðgν Þ = q χ
α α
ðαÞ
ðgν Þ, ð18:96Þ

where χ (α) is a trace of the irreducible representation D(α). Using (18.83) and (18.94),

1 1 ðαÞ  ðRÞ 1
qα = χ ðαÞ ðgÞ χ ðRÞ ðgÞ = χ ðeÞ χ ðeÞ = χ ðαÞ ðeÞ n = χ ðαÞ ðeÞ
n g n n
= dα : ð18:97Þ

Note that a dimension dα of the representation D(α) is equal to a trace of its identity
matrix. Also notice from (18.95) and (18.97) that D(R) contains every irreducible
representation D(α) dα times. To show it more clearly, Table 18.4 gives a character
table of C2v. The regular representation matrices are given in Example 18.2. For this,
we have

DðRÞ ðC2v Þ = A1 þ A2 þ B1 þ B2 : ð18:98Þ

This relation obviously indicates that all the irreducible representations of C2v are
contained one time (that is equal to the dimension of representation of C2v).
To examine general properties of the irreducible representations, we return to
(18.94) and (18.96). Replacing qα with dα there, we get

n for gν = e,
d χ ðαÞ ðgν Þ =
α α
ð18:99Þ
0 for gν ≠ e:

In particular, when gν = e, again we have χ (α)(e) = dα (dα is a real number!).


That is,

d
α α
2
= n: ð18:100Þ

This is very important relation in the representation theory of a finite group in that
(18.100) sets an upper limit to the number of irreducible representations and their
dimensions. That number cannot exceed the order of a group.
730 18 Representation Theory of Groups

In (18.76) and (18.78), we have shown the orthogonality relationship between


traces. We have another important orthogonality relationship between them. To
prove this theorem, we need a notion of group algebra [5]. The argument is as
follows:
(a) Let us think of a set comprising group elements expressed as

ℵ= ag g, ð18:101Þ
g

where g is a group element of a group ℊ = {g1, g2, ⋯, gn}, ag is an arbitrarily


chosen complex number, and Σg means that summation should be taken over group
elements. Let ℵ′ be another set similarly defined as (18.101). That is,

ℵ0 = a0g0 g0 : ð18:102Þ
g0

Then we can define a following sum:

ℵ þ ℵ0 = ag g þ a0g0 g0 = ag g þ a0g g = ag þ a0g g: ð18:103Þ


g g0 g g

Also, we get

ℵ ∙ ℵ0 = ag g a0g0 g0 = ag gg0 a0g0


g g0 g0 g

-1 -1
= ag gg0 a0g0 - 1 = agg0 gg0 g0 a0g0 - 1
g0 - 1 g g0 - 1 gg0

= agg0 g a0g0 - 1 = agg0 a0g0 - 1 g


0 -1
g gg0 gg0 g0 -1

= agg0 a0g0 - 1 g, ð18:104Þ


g g0

where we used the rearrangement theorem and suitable exchange of group elements.
Thus, we see that the above-defined set ℵ is closed under summations (i.e., linear
combinations) and multiplications. A set closed under summations and multiplica-
tions is said to be an algebra. If the set forms a group, the said set is called a group
algebra.
18.5 Regular Representation and Group Algebra 731

So far we have treated calculations as a multiplication between two elements


g and g′, i.e., g ⋄ g′ (see Sect. 16.1). Now we start regarding the calculations as
summation as well. In that case, group elements act as basis vectors in a vector space.
Bearing this in mind, let us further define specific group algebra.
(b) Let Ki be the i-th conjugacy class of a group ℊ = {g1, g2, ⋯, gn}. Also let Ki
be such that

ðiÞ ðiÞ ðiÞ


K i = A1 , A2 , ⋯, Aki , ð18:105Þ

where ki is the number of elements belonging to Ki. Now think of a set


gKig-1 (8g 2 ℊ). Then, due to the definition of a class, we have

gK i g - 1 ⊂ K i : ð18:106Þ

Multiplying g-1 from the left and g from the right of both sides, we get
Ki ⊂ g-1Kig. Since g is arbitrarily chosen, replacing g with g-1, we have

K i ⊂ gK i g - 1 : ð18:107Þ

Therefore, we get

gK i g - 1 = K i : ð18:108Þ

ðiÞ ðiÞ
Meanwhile, for AðαiÞ , Aβ 2 K i ; AðαiÞ ≠ Aβ (1 ≤ α, β ≤ ki), we have

ðiÞ
gAαðiÞ g - 1 ≠ gAβ g - 1 8
g2ℊ : ð18:109Þ

ðiÞ
This is because if the equality holds with (18.109), we have AðαiÞ = Aβ , in
contradiction.
(c) Let K be a set collecting several classes and described as

K= ai K i , ð18:110Þ
i

where ai is a positive integer or zero. Thanks to (18.105) and (18.110), we have

gKg - 1 = K: ð18:111Þ
732 18 Representation Theory of Groups

Conversely, if a group algebra K satisfies (18.111), K can be expressed as a sum


of classes such as (18.110). Here suppose that K is not expressed by (18.110), but
described by

K= ai K i þ Q, ð18:112Þ
i

where Q is an “incomplete” set that does not form a class. Then,

gKg - 1 = g ai K i þ Q g - 1 = g ai K i g - 1 þ gQg - 1 = ai K i
i i i

þ gQg - 1

=K = ai K i þ Q, ð18:113Þ
i

where the equality before the last comes from (18.111). Thus, from (18.113) we get

gQg - 1 = Q 8
g2ℊ : ð18:114Þ

By definition of the classes, this implies that Q must form a “complete” class, in
contradiction to the supposition. Thus, (18.110) holds.
In Sect. 16.3, we described several characteristics of an invariant subgroup. As
readily seen from (16.11) and (18.111), any invariant subgroup consists of two or
more entire classes. Conversely, if a group comprises entire classes, it must be an
invariant subgroup.
(d) Let us think of a product of classes. Let Ki and Kj be sets described as (18.105).
We define a product KiKj as a set containing products
ðjÞ
AðαiÞ Aβ 1 ≤ α ≤ ki , 1 ≤ β ≤ kj . That is,

ki kj ðiÞ
KiKj = l=1
A AðmjÞ :
m=1 l
ð18:115Þ

Multiplying (18.115) by g (8g 2 ℊ) from the left and by g-1 from the right, we get

gK i K j g - 1 = ðgK i gÞ g - 1 K j g - 1 = K i K j : ð18:116Þ

From the above discussion, we get


18.5 Regular Representation and Group Algebra 733

KiKj = cijl K l , ð18:117Þ


l

where cijl is a positive integer or zero. In fact, when we take gKiKjg-1, we merely
permute the terms in (18.117).
(e) If two group elements of the group ℊ = {g1, g2, ⋯, gn} are conjugate to each
other, their inverse elements are conjugate to each other as well. In fact, suppose
that for gμ, gν 2 Ki, we have

gμ = ggν g - 1 : ð18:118Þ

Then, taking the inverse of (18.118), we get

gμ - 1 = ggν - 1 g - 1 : ð18:119Þ

Thus, given a class Ki, there exists another class K i0 that consists of the inverses of
the elements of Ki. If gμ ≠ gν, gμ-1 ≠ gν-1. Therefore, Ki and K i0 are of the same
order. Suppose that the number of elements contained in Ki is ki and that in K i0 is ki0 .
Then, we have

k i = k i0 : ð18:120Þ

If K j ≠ K i0 (i.e., Kj is not identical with K i0 as a set), KiKj does not contain e. In


fact, suppose that for gρ 2 Ki there were a group element b 2 Kj such that bgρ = e.
Then, we would have

b = gρ - 1 and b 2 K i0 : ð18:121Þ

This would mean that b 2 K j K i0 , implying that K j = K i0 by definition of class.


It is in contradiction to K j ≠ K i0 . Consequently, KiKj does not contain e. Taking the
product of the classes Ki and K i0 , we obtain the identity e precisely ki times.
Rewriting (18.117), we have

K i K j = cij1 K 1 þ cijl K l , ð18:122Þ


l≠1

where K1 = {e}. As mentioned below, we are most interested in the first term in
(18.122).
In (18.122), if K j = K i0 , we have

cij1 = ki : ð18:123Þ

On the other hand, if K j ≠ K i0 , cij1 = 0. Summarizing the above arguments, we get


734 18 Representation Theory of Groups

ki for K j = K i ′ ,
cij1 = ð18:124Þ
0 for K j ≠ K i ′ :

Or, symbolically we write

cij1 = ki δji0: ð18:125Þ

18.6 Classes and Irreducible Representations

After the aforementioned considerations, we have the following theorem:


Theorem 18.5 Let ℊ = {g1, g2, ⋯, gn} be a group. A trace of irreducible
representations satisfies the following orthogonality relation:

nr  n
χ ðαÞ ðK i Þχ ðαÞ K j = δ , ð18:126Þ
α=1 ki ij

where summation α is taken over all inequivalent nr irreducible representations, Ki


and Kj indicate conjugacy classes, and ki denotes the number of elements contained
in the i-th class Ki.
Proof Rewriting (18.108), we have

8
gK i = K i g g2ℊ : ð18:127Þ

Since a homomorphic correspondence holds between a group element and its


representation matrix, a similar correspondence holds as well with (18.127). Let K i
be a sum of ki matrices of the α-th irreducible representation D(α) and let K i be
expressed as

Ki = DðαÞ : ð18:128Þ
g2K i

Note that in (18.128) D(α) functions as a linear transformation with respect to a


group algebra. From (18.127), we have

DðαÞ K i = K i DðαÞ 8
g2ℊ : ð18:129Þ

Since D(α) is an irreducible representation, K i must be expressed on the basis of


Schur’s second lemma as
18.6 Classes and Irreducible Representations 735

K i = λE: ð18:130Þ

To determine λ, we take a trace of both sides of (18.130). Then, from (18.128) and
(18.130), we get

ki χ ðαÞ ðK i Þ = λdα : ð18:131Þ

Thus, we get

k i ðαÞ
Ki = χ ðK i ÞE: ð18:132Þ

Next, corresponding to (18.122), we have

KiKj = cijl K l : ð18:133Þ


l

Replacing K l in (18.133) with that of (18.132), we get

k i kj χ ðαÞ ðK i Þχ ðαÞ K j = dα cijl k l χ ðαÞ ðK l Þ: ð18:134Þ


l

Returning to (18.99) and rewriting it, we have

ðαÞ
d χ
α α
ðK i Þ = nδi1 , ð18:135Þ

where again we have K1 = {e}. With respect to (18.134), we sum over all the
irreducible representations α. Then we have

ki k j χ ðαÞ ðK i Þχ ðαÞ K j = cijl k l d α χ ðαÞ ðK l Þ = cijl kl nδl1


α l α l

= cij1 n: ð18:136Þ

In (18.136) we remark that k1 = 1, meaning that the number of group elements


that K1 = {e} contains is 1. Rewriting (18.136), we get

ki kj χ ðαÞ ðK i Þχ ðαÞ K j = cij1 n = k i nδji0 , ð18:137Þ


α

where we used (18.125) with the last equality. Moreover, using


736 18 Representation Theory of Groups

χ ðαÞ ðK i0 Þ = χ ðαÞ ðK i Þ , ð18:138Þ

we get


ki kj χ ðαÞ ðK i Þχ ðαÞ K j = cij1 n = k i nδji : ð18:139Þ
α

Rewriting (18.139), we finally get

nr  n
χ ðαÞ ðK i Þχ ðαÞ K j = δ : ð18:126Þ
α=1 k i ij

These complete the proof.


Equations (18.78) and (18.126) are well known as orthogonality relations. So far,
we have no idea about the mutual relationship between the two numbers nc and nr in
magnitude. In (18.78), let us consider a following set S

S= k 1 χ ðαÞ ðK 1 Þ, k2 χ ðαÞ ðK 2 Þ, ⋯, knc χ ðαÞ ðK nc Þ : ð18:140Þ

Viewing individual components in S as coordinates of a nc-dimensional vector,


(18.78) can be considered as an inner product as expressed using (complex) coor-
dinates of the two vectors. At the same time, (18.78) represents an orthogonal
relationship between the vectors. Since we can obtain at most nc mutually orthogonal
(i.e., linearly independent) vectors in a nc-dimensional space, for the number (nr) of
such vectors, we have

nr ≤ n c : ð18:141Þ

Here nr is equal to the number of different α, i.e., the number of irreducible


representations.
Meanwhile, in (18.126) we consider a following set S′

S0 = χ ð1Þ ðK i Þ, χ ð2Þ ðK i Þ, ⋯, χ ðnr Þ ðK i Þ : ð18:142Þ

Similarly, individual components in S′ can be considered as coordinates of a nr-


dimensional vector. Again, (18.126) implies the orthogonality relation among vec-
tors. Therefore, as for the number (nc) of mutually orthogonal vectors, we have

nc ≤ n r : ð18:143Þ

Thus, from (18.141) and (18.143), we finally reach a simple but very important
conclusion about the relationship between nc and nr such that
18.7 Projection Operators: Revisited 737

nr = nc : ð18:144Þ

That is, the number (nr) of inequivalent irreducible representations is equal to that
(nc) of conjugacy classes of the group. An immediate and important consequence of
(18.144) together with (18.100) is that the representation of an Abelian group is
one-dimensional. This is because in the Abelian group individual group elements
constitute a conjugacy class. We have nc = n accordingly.

18.7 Projection Operators: Revisited

In Sect. 18.2 we have described how basis vectors (or functions) are transformed by
a symmetry operation. In that case the symmetry operation is performed by a group
element that belongs to a transformation group. More specifically, given a group
ℊ = {g1, g2, ⋯, gn} and a set of basis vectors ψ 1, ψ 2, ⋯, and ψ d, the basis vectors
are transformed by gi 2 ℊ (1 ≤ i ≤ n) such that

d
gi ð ψ ν Þ = ψ μ Dμν ðgi Þ: ð18:22Þ
μ=1

We may well ask then how we can construct such basis vectors. In this section we
address this question. A central concept about this is a projection operator. We have
already studied the definition and basic properties of the projection operators. In this
section, we deal with them bearing in mind that we apply the group theory to
molecular science, especially to quantum chemical calculations.
In Sect. 18.6, we examined the permissible number of irreducible representations.
We have reached a conclusion that the number (nr) of inequivalent irreducible
representations is equal to that (nc) of conjugacy classes of the group. According
to this conclusion, we modify the above Eq. (18.22) such that


ðαÞ ðαÞ ðαÞ
g ψi = ψ j Dji ðgÞ, ð18:145Þ
j=1

where α and dα denote the α-th irreducible representation and its dimension,
respectively; the subscript i is omitted from gi for simplicity. This naturally leads
ðαÞ ðβ Þ
to the next question of how the basis vectors ψ j are related to ψ j that are basis
vectors as well, but belong to a different irreducible representation β.
Suppose that we have an arbitrarily chosen function f. Then, f is assumed to
contain various components of different irreducible representations. Thus, let us
assume that f is decomposed into the component such that
738 18 Representation Theory of Groups

f= α
cðαÞ ψ ðmαÞ ,
m m
ð18:146Þ

ðαÞ
where cm is a coefficient of the expansion and ψ ðmαÞ is the m-th component of α-th
irreducible representation.
Now, let us define the following operator:

ðαÞ dα ðαÞ
PlðmÞ  Dlm ðgÞ g, ð18:147Þ
n g

where Σg means that the summation should be taken over all n group elements g and
ðαÞ
dα denotes a dimension of the representation D(α). Operating PlðmÞ on f, we have

ðαÞ dα ðαÞ dα ðαÞ ðνÞ ðνÞ


PlðmÞ f = Dlm ðgÞ gf = Dlm ðgÞ g ν
c ψk
k k
n g n g

dα ðαÞ ð νÞ ðνÞ
= Dlm ðgÞ ν
c gψ k
k k
n g


dα ðαÞ ðνÞ ð νÞ ðνÞ
= Dlm ðgÞ ν
c
k k
ψ j Djk ðgÞ
n g j=1


dα ðνÞ ðνÞ ðαÞ ðνÞ
= ν
c
k k
ψj Dlm ðgÞ Djk ðgÞ
n j=1 g


dα ðνÞ ðνÞ n ðαÞ ðαÞ
= c ψj δ δ δ = cm ψl , ð18:148Þ
n ν k k
j=1
d α αν lj mk

where we used (18.51) for the equality before the last.


ðαÞ ðαÞ
Thus, an implication of the operator PlðmÞ is that if f contains the component ψ m ,
ðαÞ
i.e., cðmαÞ ≠ 0, PlðmÞ plays a role in extracting the component cm
ðαÞ ðαÞ
ψ m from f and then
ðαÞ
converting it to cðmαÞ ψ l . If the ψ m
ðαÞ
component is not contained in f, that means from
ðαÞ
(18.148) PlðmÞ f = 0. In that case we choose a more suitable function for f. In this
context, we will investigate an example of a quantum chemical calculation later.
ðαÞ
In the above case, let us call PlðmÞ a projection operator sensu lato by relaxing
Definition 14.1 of the projection operator. In Sect. 14.1, we dealt with several aspects
of the projection operators. There we have mentioned that a projection operator
should be idempotent and Hermitian in a rigorous sense (Definition 14.1). To
address another important aspect of the projection operator, let us prove an important
relation in a following theorem.
18.7 Projection Operators: Revisited 739

ðαÞ ðβÞ
Theorem 18.6 [1, 2] Let PlðmÞ and PsðtÞ be projection operators defined in (18.147).
Then, the following equation holds:

ðαÞ ðβÞ ðαÞ


PlðmÞ PsðtÞ = δαβ δms PlðtÞ : ð18:149Þ

Proof We have

ðαÞ ðβ Þ dα ðαÞ dβ ðβÞ 


PlðmÞ PsðtÞ = Dlm ðgÞ g Dst ðg0 Þ g0
n g
n g0

dα ðαÞ dβ ðβ Þ 
= Dlm ðgÞ g Dst g - 1 g0 g - 1 g0
n g n g0

dα dβ ðαÞ ðβ Þ 
= Dlm ðgÞ Dst g - 1 g0 gg - 1 g0
n n g g0

{ 
dα dβ ðαÞ
= Dlm ðgÞ DðβÞ ðgÞ DðβÞ ðg0 Þ eg0
n n g g0 st

dα dβ 
ðαÞ
= Dlm ðgÞ DðβÞ ðgÞks  DðβÞ ðg0 Þkt g0
n n g g0
k

dα dβ ðαÞ 
= Dlm ðgÞ DðβÞ ðgÞks DðβÞ ðg0 Þkt g0
n n k
g0 g

dα dβ n  dβ 
= δ δ δ DðβÞ ðg0 Þkt g0 = δαβ δms DðβÞ ðg0 Þlt g0
n n k
g0
dα αβ lk ms n g0

dβ  ðαÞ
= δαβ δms DðβÞ ðg0 Þlt g0 = δαβ δms PlðtÞ : ð18:150Þ
n g0

This completes the proof.


In the above proof, we used the rearrangement theorem and grand orthogonality
theorem (GOT) as well as the homomorphism and unitarity of the representation
matrices.
ðαÞ ðαÞ
Comparing (18.146) and (18.148), we notice that the term cm ψ m is not extracted
ðαÞ ðα Þ ðαÞ
entirely, but cm ψ l is given instead. This is due to the linearity of PlðmÞ . Nonethe-
less, this is somewhat inconvenient for a practical purpose. To overcome this
inconvenience, we modify (18.149). In (18.149) putting s = m, we have
740 18 Representation Theory of Groups

ðαÞ ðβ Þ ðαÞ
PlðmÞ PmðtÞ = δαβ PlðtÞ : ð18:151Þ

We further modify the relation. Putting β = α, we have

ðαÞ ðαÞ ðαÞ


PlðmÞ PmðtÞ = PlðtÞ : ð18:152Þ

Putting t = l furthermore,

ðαÞ ðαÞ ðαÞ


PlðmÞ PmðlÞ = PlðlÞ : ð18:153Þ

In particular, putting m = l moreover, we get

2
ðαÞ ðαÞ ðαÞ ðαÞ
PlðlÞ PlðlÞ = PlðlÞ = PlðlÞ : ð18:154Þ

In fact, putting m = l in (18.148), we get

ðαÞ ðαÞ ðαÞ


PlðlÞ f = cl ψ l : ð18:155Þ

ðαÞ ðαÞ
This means that the term cl ψ l has been extracted entirely.
Moreover, in (18.149) putting β = α, s = l, and t = m, we have

ðαÞ ðαÞ ðαÞ 2 ðαÞ


PlðmÞ PlðmÞ = PlðmÞ = δml PlðmÞ :

ðαÞ
Therefore, for PlðmÞ to be an idempotent operator, we must have m = l. That is, of
ðαÞ ðαÞ
various operators PlðmÞ , only PlðlÞ is eligible for an idempotent operator.
ðαÞ
Meanwhile, fully describing PlðlÞ we have

ðαÞ dα ðαÞ
PlðlÞ = Dll ðgÞ g: ð18:156Þ
n g

Taking complex conjugate transposition (i.e., adjoint) of (18.156), we have

{ dα dα {
ðαÞ ðαÞ ðαÞ
PlðlÞ = Dll ðgÞg{ = Dll g-1 g-1
n g
n
g-1

dα ðαÞ ðαÞ
= Dll ðgÞ g = PlðlÞ , ð18:157Þ
n g
18.7 Projection Operators: Revisited 741

where we used unitarity of g (with the third equality) and equivalence of summation
over g and g-1. Notice that the notation of g{ is less common. This should be
interpreted as meaning that g{ operates on a vector constituting a representation
space. Thus, notation g{ implies that g{ is equivalent to its unitary representation
ðαÞ
matrix D(α)(g). Note also that Dll is not a matrix but a (complex) number. Namely,
ðαÞ 
in (18.156) Dll ðgÞ is a coefficient of an operator g.
ðαÞ
Equations (18.154) and (18.157) establish that PlðlÞ is a projection operator in a
rigorous sense; namely, it is an idempotent and Hermitian operator (see Definition
ðαÞ
14.1). Let us call PlðlÞ a projection operator sensu stricto accordingly. Also, we notice
that cðmαÞ ψ m
ðαÞ
is entirely extracted from an arbitrary function f including a coefficient.
This situation resembles that of (12.205).
ðαÞ
Regarding PlðmÞ ðl ≠ mÞ, on the other hand, we have

{ dα dα {
ðαÞ ðαÞ ðαÞ
PlðmÞ = Dlm ðgÞg{ = Dlm g - 1 g - 1
n g
n
g-1

dα ðαÞ ðαÞ
= Dml ðgÞ g = PmðlÞ : ð18:158Þ
n g

ðαÞ
Hence, PlðmÞ is not Hermitian. We have many other related operators and
equations. For instance,

ðαÞ ðαÞ { ðαÞ { ðαÞ { ðαÞ ðαÞ


PlðmÞ PmðlÞ = PmðlÞ PlðmÞ = PlðmÞ PmðlÞ : ð18:159Þ

ðαÞ ðαÞ
Therefore, PlðmÞ PmðlÞ is Hermitian, recovering the relation (18.153).
In (18.149) putting m = l and t = s, we get

ðαÞ ðβÞ ðαÞ


PlðlÞ PsðsÞ = δαβ δls PlðsÞ : ð18:160Þ

As in the case of (18.146), we assume that h is described as

h= α
d ðαÞ ϕðmαÞ ,
m m
ð18:161Þ

where ϕðmαÞ is transformed in the same manner as that for ψ m


ðαÞ
in (18.146), but linearly
ðαÞ
independent of ψ m . Namely, we have


ðαÞ ðαÞ ðαÞ
g ϕi = ϕj Dji ðgÞ: ð18:162Þ
j=1

Tangible examples can be seen in Chap. 19.


742 18 Representation Theory of Groups

ðαÞ
Operating PlðlÞ on both sides of (18.155), we have

ðαÞ 2 ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ


PlðlÞ f = PlðlÞ cl ψ l = PlðlÞ f = cl ψ l , ð18:163Þ

where with the second equality we used (18.154). That is, we have

ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ


PlðlÞ cl ψ l = cl ψ l : ð18:164Þ

ðαÞ ðαÞ
This equation means that cl ψ l is an eigenfunction corresponding to an
ðαÞ ðαÞ ðαÞ
eigenvalue 1 of PlðlÞ . In other words, once cl ψ l is extracted from f, it belongs to
the “position” l of an irreducible representation α. Furthermore, for some constants
c and d as well as the functions f and h that appeared in (18.146) and (18.161), we
consider a following equation:

ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ
PlðlÞ ccl ψ l þ ddl ϕl = cPlðlÞ cl ψ l þ dPlðlÞ d l ϕl
ðαÞ ðαÞ ðαÞ ðαÞ
= ccl ψ l þ ddl ϕl , ð18:165Þ

where with the last equality we used (18.164).


ðαÞ ðαÞ ðαÞ ðαÞ
This means that an arbitrary linear combination of cl ψ l and d l ϕl again
ðαÞ ðαÞ
belongs to the position l of an irreducible representation α. If ψ l and ϕl are
linearly independent, we can construct two orthonormal basis vectors following
Theorem 13.2 (Gram–Schmidt orthonormalization theorem). If there are other
ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ
linearly independent vectors pl ξl , ql φl , ⋯, where ξl , φl , etc. belong to
ðαÞ ðαÞ ðαÞ ðαÞ
the position l of an irreducible representation α, then ccl ψ l þ ddl ϕl þ
ðαÞ ðαÞ ðαÞ ðαÞ
ppl ξl þ qql φl þ ⋯ again belongs to the position l of an irreducible represen-
tation α. Thus, we can construct orthonormal basis vectors of the representation
space according to Theorem 13.2.
Regarding arbitrary functions f and h as vectors and using (18.160), we make an
inner product of (18.160) such that

ðαÞ ðβÞ ðαÞ ðαÞ ðβÞ ðαÞ ðαÞ


hjPlðlÞ PsðsÞ jf = hjPlðlÞ PlðlÞ PsðsÞ jf = hjPlðlÞ δαβ δls PlðsÞ jf

ðαÞ ðαÞ
= δαβ δls hPlðlÞ jPlðsÞ jf , ð18:166Þ

where with the first equality we used (18.154). Meanwhile, from (18.148) we have

ðαÞ ðαÞ
PlðsÞ j f i = cðsαÞ ψ l : ð18:167Þ

Also using (18.155), we get


18.7 Projection Operators: Revisited 743

ðαÞ ðαÞ ðαÞ


PlðlÞ hi = d l ϕl : ð18:168Þ

Taking adjoint of (18.168), we have

{ 
ðαÞ ðαÞ ðαÞ ðαÞ
hh j PlðlÞ = h PlðlÞ = d l ϕl , ð18:169Þ

where we used (18.157). The relation (18.169) is due to the notation of Sect. 13.3.
Substituting (18.167) and (18.168) for (18.166),

ðαÞ ðβÞ ðαÞ ðαÞ ðαÞ
hjPlðlÞ PsðsÞ jf = δαβ δls dl cðsαÞ ϕl jψ l :

Meanwhile, (18.166) can also be described as



ðαÞ ðβÞ ðαÞ ðαÞ ðαÞ ðαÞ
hjPlðlÞ PsðsÞ jf = dl ϕl jcðsβÞ ψ ðsβÞ = dl cðsβÞ ϕl jψ sðβÞ : ð18:170Þ

For (18.166) and (18.170) to be identical, we must have

ðαÞ  ðαÞ ðαÞ ðαÞ ðαÞ  ðβÞ ðαÞ


δαβ δls dl cs ϕl jψ l = dl cs ϕl jψ ðsβÞ :

Deleting coefficients, we get

ðαÞ ðαÞ ðαÞ


ϕl jψ ðsβÞ = δαβ δls ϕl jψ l : ð18:171Þ

The relation (18.171) is frequently used to estimate whether definite integrals


vanish. Functional forms depend upon actual problems we encounter in various
situations. We will deal with this problem in Chap. 19 in relation to, e.g., discussion
on optical transition and evaluation of overlap integrals.
To evaluate (18.170), if α ≠ β or l ≠ s, we get simply

ðαÞ ðβÞ
hjPlðlÞ PsðsÞ jf = 0 ðα ≠ β or l ≠ sÞ: ð18:172Þ

ðβÞ ðαÞ
That is, under a given condition α ≠ β or l ≠ s, PsðsÞ j f i and PlðlÞ j hi are
orthogonal (see Theorem 13.3 of Sect. 13.4). The relation clearly indicates that
functions belonging to different irreducible representations (α ≠ β) are mutually
orthogonal. Even though the functions belong to the same irreducible representation,
the functions are orthogonal if they are allocated to the different “place” as a basis
ðαÞ
vector designated by l, s, etc. Here the place means the index j of ψ j in (18.146) or
ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ
ϕj in (18.161) that designates the “order” of ψ j within ψ 1 , ψ 2 , ⋯, ψ dα or ϕj
744 18 Representation Theory of Groups

ðαÞ ðαÞ ðαÞ


within ϕ1 , ϕ2 , ⋯, ϕdα . This takes place if the representation is multidimensional
(i.e., dimensionality: dα).
In (18.160) putting α = β, we get

ðαÞ ðαÞ ðαÞ


PlðlÞ PsðsÞ = δls PlðsÞ :

In the above relation, furthermore, unless l = s we have

ðαÞ ðαÞ
PlðlÞ PsðsÞ = 0:

ðαÞ ðαÞ
Therefore, on the basis of the discussion of Sect. 14.1, PlðlÞ þ PsðsÞ is a projection
ðαÞ ðαÞ
operator as well in the case of l ≠ s. Notice, however, that if l = s, PlðlÞ þ PsðsÞ is not a
projection operator. Readers can readily show it.
Moreover, let us define P(α) as below:

dα ðαÞ
PðαÞ  P :
l = 1 lðlÞ
ð18:173Þ

Similarly, P(α) is again a projection operator as well.


From (18.156), P(α) in (18.173) can be rewritten as

dα dα 
dα ðαÞ
PðαÞ = l=1
Dll ðgÞ g = χ ðαÞ ðgÞ g: ð18:174Þ
n g n g

Returning to (18.155) and taking summation over l there, we have

dα ðαÞ dα ðαÞ ðαÞ


P f
l = 1 lðlÞ
= c ψl :
l=1 l

Using (18.173), we get

dα ðαÞ ðαÞ
PðαÞ f = c ψl :
l=1 l
ð18:175Þ

Thus, an operator P(α) has a clear meaning. That is, P(α) plays a role in extracting
all the vectors (or functions) that belong to the irreducible representation α including
their coefficients. Defining those functions as

dα ðαÞ ðαÞ
ψ ðαÞ  c ψl ,
l=1 l
ð18:176Þ

we succinctly rewrite (18.175) as


18.7 Projection Operators: Revisited 745

PðαÞ f = ψ ðαÞ : ð18:177Þ

In turn, let us calculate P(β)P(α). This can be done with (18.174) as follows:

dβ  dα 
PðβÞ PðαÞ = χ ðβÞ ðgÞ g χ ðαÞ ðg0 Þ g0 : ð18:178Þ
n g n g0

To carry out the calculation, (1) first we replace g′ with g-1g′ and rewrite the
summation over g′ as that over g-1g′. (2) Using the homomorphism and unitarity of
the representation, we rewrite [χ (α)(g′)] as
 
χ ðαÞ ðg0 Þ = i,j
DðαÞ ðgÞ DðαÞ ðg0 Þ : ð18:179Þ
ji ji

Rewriting (18.178) we have

dβ dα  
PðβÞ PðαÞ = DðβÞ ðgÞ DðαÞ ðgÞ DðαÞ ðg0 Þ g0
n n g k , i, j kk ji
g0
ji

dβ dα n  dα 
= δ δ δ DðαÞ ðg0 Þ g0 = DðαÞ ðg0 Þ g0
n n k, i, j d β αβ kj ki g0
ji n k g0
kk

dα 
= δαβ χ ðαÞ ðg0 Þ g0 = δαβ PðαÞ : ð18:180Þ
n g0

These relationships are anticipated from the fact that both P(α) and P(β) are
projection operators.
As in the case of (18.160) and (18.180) is useful to evaluate an inner product of
functions. Again taking an inner product of (18.180) with arbitrary functions f and g,
we have

gjPðβÞ PðαÞ jf = gjδαβ PðαÞ jf = δαβ gjPðαÞ jf : ð18:181Þ

If we define P such that


nr
P α=1
PðαÞ , ð18:182Þ

then P is once again a projection operator (see Sect. 14.1). Now taking summation in
(18.177) over all the irreducible representations α, we have
746 18 Representation Theory of Groups

nr nr
α=1
PðαÞ f = α=1
ψ ðαÞ = f : ð18:183Þ

The function f has been arbitrarily taken and, hence, we get

P = E: ð18:184Þ

The relation described in (18.182)–(18.184) is called completeness relation (see


Sect. 5.1).
As mentioned in Example 18.1, the concept of the representation space is not only
important but also very useful for addressing various problems of physics and
chemistry. For instance, regarding molecular orbital methods to be dealt with in
Chap. 19, we consider a representation space whose dimension is equal to the
number of electrons of a molecule about which we wish to know, e.g., energy
eigenvalues of those electrons. In that case, the dimension of representation space
is equal to the number of molecular orbitals. According to a symmetry species of the
molecule, the representation matrix is decomposed into a direct sum of invariant
eigenspaces relevant to individual irreducible representations. If the basis vectors
belong to different irreducible representation, such vectors are orthogonal to each
other in virtue of (18.171). Even though those vectors belong to the same irreducible
representation, they are orthogonal if they are allocated to a different place. How-
ever, it is often the case that the vectors belong to the same place of the same
irreducible representation. Then, it is always possible according to Theorem 13.2 to
make them mutually orthogonal by taking their linear combination. In such a way,
we can construct an orthonormal basis set throughout the representation space. In
fact, the method is a powerful tool for solving an energy eigenvalue problem and for
determining associated eigenfunctions (or molecular orbitals).

18.8 Direct-Product Representation

In Sect. 16.5 we have studied basic properties of direct-product groups. Correspond-


ingly, we examine in this section properties of direct-product representation. This
notion is very useful to investigate optical transitions in molecular systems and
selection rules relevant to those transitions.
Let D(α) and D(β) be two different irreducible representations whose dimensions
are dα and dβ, respectively. Then, operating a group element g on the basis functions,
we have


ðαÞ
gðψ i Þ = ψ k Dki ðgÞ ð1 ≤ i ≤ dα Þ, ð18:185Þ
k=1
18.8 Direct-Product Representation 747


ðβ Þ
g ϕj = ϕl Dlj ðgÞ 1 ≤ j ≤ d β , ð18:186Þ
l=1

where ψ k (1 ≤ k ≤ dα) and ϕl (1 ≤ l ≤ dβ) are basis functions of D(α) and D(β),
respectively. We can construct dαdβ new basis vectors using ψ kϕl. These functions
are transformed according to g such that

ðαÞ ðβ Þ
g ψ i ϕj = gðψ i Þg ϕj = k
ψ k Dki ðgÞ ϕ D ð gÞ
l l lj

ðαÞ ðβÞ
= k l
ψ k ϕl Dki ðgÞDlj ðgÞ: ð18:187Þ

Here let us define the following matrix [D(α × β)(g)]kl, ij such that

ðαÞ ðβÞ
Dðα × βÞ ðgÞ  Dki ðgÞDlj ðgÞ: ð18:188Þ
kl,ij

Then we have

g ψ i ϕj = k l
ψ k ϕl Dðα × βÞ ðgÞ : ð18:189Þ
kl,ij

The notation using double scripts is somewhat complicated. We notice, however,


that in (18.188) the order of subscript of kilj is converted to kl, ij, i.e., the subscripts
i and l have been interchanged.
We write D(α) and D(β) in explicit forms as follows:

ðαÞ ðαÞ ðβ Þ ðβÞ


d 1 ,1 ⋯ d 1, d α d 1 ,1 ⋯ d 1, dβ
ðαÞ ðβÞ
D ð gÞ = ⋮ ⋱ ⋮ , D ð gÞ = ⋮ ⋱ ⋮ : ð18:190Þ
ðαÞ ðαÞ ðβ Þ ðβÞ
d d α ,1 ⋯ d dα , dα d d β ,1 ⋯ d dβ , dβ

Thus,

ðαÞ ðαÞ
d1,1 DðβÞ ðgÞ ⋯ d1, dα DðβÞ ðgÞ
Dðα × βÞ ðgÞ = DðαÞ ðgÞ  DðβÞ ðgÞ = ⋮ ⋱ ⋮ : ð18:191Þ
ðαÞ ðαÞ
d dα ,1 DðβÞ ðgÞ ⋯ ddα , dα DðβÞ ðgÞ

To get familiar with the double-scripted notation, we describe a case of (2, 2)


matrices. Denoting
748 18 Representation Theory of Groups

a11 a12 b11 b12


DðαÞ ðgÞ = and DðβÞ ðgÞ = , ð18:192Þ
a21 a22 b21 b22

we get

Dðα × βÞ ðgÞ = DðαÞ ðgÞ  DðβÞ ðgÞ


a11 b11 a11 b12 a12 b11 a12 b12
a11 b21 a11 b22 a12 b21 a12 b22
= : ð18:193Þ
a21 b11 a21 b12 a22 b11 a22 b12
a21 b21 a21 b22 a22 b21 a22 b22

Corresponding to (18.22) and (18.189) describes the transformation of ψ iϕj


regarding the double script. At the same time, a set {ψ iϕj; (1 ≤ i ≤ dα, 1 ≤ j ≤ dβ)} is
a basis of D(α × β). In fact,

ðαÞ ðβ Þ
Dðα × βÞ ðgg0 Þ = Dki ðgg0 ÞDlj ðgg0 Þ
kl,ij

ðαÞ ðαÞ 0 ðβ Þ ðβ Þ 0
= μ Dkμ ðgÞDμi ðg Þ ν Dlν ðgÞDνj ðg Þ]

ðαÞ ðβÞ ðαÞ ðβÞ


= μ ν
Dkμ ðgÞDlν ðgÞDμi ðg0 ÞDνj ðg0 Þ

= μ ν
Dðα × βÞ ðgÞ Dðα × βÞ ðg0 Þ : ð18:194Þ
kl,μν μν,ij

Equation (18.194) shows that the rule of matrix calculation with respect to the
double subscript is satisfied. Consequently, we get

Dðα × βÞ ðgg0 Þ = Dðα × βÞ ðgÞDðα × βÞ ðg0 Þ: ð18:195Þ

The relation (18.195) indicates that D(α × β) is certainly a representation of a


group. This representation is said to be a direct-product representation.
A character of the direct-product representation is given by putting k = i and l = j
in (18.188). That is,

ðαÞ ðβÞ
Dðα × βÞ ðgÞ = Dii ðgÞDjj ðgÞ: ð18:196Þ
ij,ij

Denoting
18.8 Direct-Product Representation 749

χ ð α × β Þ ð gÞ  i,j
Dðα × βÞ ðgÞ , ð18:197Þ
ij,ij

we have

χ ðα × βÞ ðgÞ = χ ðαÞ ðgÞχ ðβÞ ðgÞ: ð18:198Þ


× β)
Even though D(α) and D(β) are both irreducible, D(α is not necessarily
irreducible. Suppose that

DðαÞ ðgÞ  DðβÞ ðgÞ = qD


γ γ
ðγ Þ
ðgÞ, ð18:199Þ

where qγ is given by (18.83). Then, we have

1 1
qγ = χ ð γ Þ ð gÞ  χ ð α × β Þ ð gÞ = χ ðγÞ ðgÞ χ ðαÞ ðgÞχ ðβÞ ðgÞ, ð18:200Þ
n g n g

where n is an order of the group. This relation is often used to perform quantum
mechanical or chemical calculations, especially to evaluate optical transitions of
matter including atoms and molecular systems. This is also useful to examine
whether a definite integral of a product of functions (or an inner product of vectors)
vanishes.
In Sect. 16.5, we investigated definition and properties of direct-product groups.
Similarly to the case of the direct-product representation, we consider a representa-
tion of the direct-product groups. Let us consider two groups ℊ and H and assume
that a direct-product group ℊ  H is defined (Sect. 16.5). Also let D(α) and D(β) be
dα- and dβ-dimensional representations of groups ℊ and H , respectively. Further-
more, let us define a matrix D(α × β)(ab) as in (18.188) such that

ðαÞ ðβÞ
Dðα × βÞ ðabÞ  Dki ðaÞDlj ðbÞ, ð18:201Þ
kl,ij

where a and b are arbitrarily chosen from ℊ and H , respectively, and ab 2 ℊ  H .


Then a set comprising D(α × β)(ab) forms a representation of ℊ  H . (Readers,
please verify it.) A dimension of D(α × β) is dαdβ. The character is given in (18.69),
and so in the present case, by putting i = k and j = l in (18.201), we get

ðαÞ ðβ Þ
χ ðα × βÞ ðabÞ = k l
Dðα × βÞ ðabÞ = k l
Dkk ðaÞDll ðbÞ
kl,kl

= χ ðαÞ ðaÞχ ðβÞ ðbÞ: ð18:202Þ

Equation (18.202) resembles (18.198). Hence, we should be careful not to


confuse them. In (18.198), we were thinking of a direct-product representation
within a sole group ℊ. In (18.202), however, we are considering a representation
750 18 Representation Theory of Groups

of the direct-product group comprising two different groups. In fact, even though a
character is computed with respect to a sole group element g in (18.198), in (18.202)
we evaluate a character regarding two group elements a and b chosen from different
groups ℊ and H , respectively.

18.9 Symmetric Representation and Antisymmetric


Representation

As mentioned in Sect. 18.2, we have viewed a group element g as a linear transfor-


mation over a vector space V. There we dealt with widely chosen functions as
vectors. In this chapter we introduce other useful ideas so that we can apply them
to molecular science and atomic physics.
In the previous section, we defined direct-product representation. In
D(α × β)(g) = D(α)(g)  D(β)(g), we can freely consider a case where D(α)(g) = D(β)(g).
Then we have

ðαÞ ðαÞ
g ψ i ϕj = gðψ i Þg ϕj = k l
ψ k ϕl Dki ðgÞDlj ðgÞ: ð18:203Þ

Regarding a product function ψ jϕi, we can get a similar equation such that

ðαÞ ðαÞ
g ψ j ϕi = g ψ j gð ϕi Þ = k l
ψ k ϕl Dkj ðgÞDli ðgÞ: ð18:204Þ

On the basis of the linearity of the relations (18.203) and (18.204), let us construct
a linear combination of the product functions. That is,

g ψ i ϕj ± ψ j ϕi = g ψ i ϕj ± g ψ j ϕi
ðαÞ ðαÞ ðαÞ ðαÞ
= k l
ψ k ϕl Dki ðgÞDlj ðgÞ ± k l
ψ k ϕl Dkj ðgÞDli ðgÞ
ðαÞ ðαÞ
= k l
ðψ k ϕl ± ψ l ϕk ÞDki ðgÞDlj ðgÞ: ð18:205Þ

Here, defining Ψ ij± as

Ψ ij±  ψ i ϕj ± ψ j ϕi , ð18:206Þ

we rewrite (18.205) as

1 ðαÞ ðαÞ ðαÞ ðαÞ


gΨ ij± = Ψ kl± D ðgÞDlj ðgÞ ± Dli ðgÞDkj ðgÞ : ð18:207Þ
k l 2 ki
18.9 Symmetric Representation and Antisymmetric Representation 751

Notice that Ψ þ -
ij and Ψ ij in (18.206) are symmetric and antisymmetric with
respect to the interchange of subscript i and j, respectively. That is,

Ψ ij± = ± Ψ ji± : ð18:208Þ

We may naturally ask how we can constitute representation (matrices) with


(18.207). To answer this question, we have to carry out calculations by replacing
g with gg′ in (18.207) and following the procedures of Sect. 18.2. Thus, we have

1 ðαÞ 0 ðαÞ 0 ðαÞ ðαÞ


gg0 Ψ ij± = Ψ kl± D ðgg ÞDlj ðgg Þ ± Dli ðgg0 ÞDkj ðgg0 Þ
k l 2 ki
1 ðαÞ ðαÞ ðαÞ ðαÞ
= Ψ kl± μ
Dkμ ðgÞDμi ðg0 Þ ν
Dlν ðgÞDνj ðg0 Þ
k l 2
ðαÞ ðαÞ 0 ðαÞ ðαÞ 0
± μ Dlμ ðgÞDμi ðg Þ ν Dkν ðgÞDνj ðg Þ

1 ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ ðαÞ


= Ψ kl± μ ν
Dkμ ðgÞDlν ðgÞ ± Dlμ ðgÞDkν ðgÞ Dμi ðg0 ÞDνj ðg0 Þ
k l 2

1 ð αÞ ð αÞ
= Ψ kl± μ ν
Dðα × αÞ ðgÞ ± Dðα × αÞ ðgÞ Dμi ðg0 ÞDνj ðg0 Þ ,
k l 2 kl,μν lk,μν

ð18:209Þ

where the last equality follows from the definition of a direct-product representation
(18.188). We notice that the terms of [D(α × α)(gg′)]kl, μν ± [D(α × α)(gg′)]lk, μν in
(18.209) are symmetric and antisymmetric with respect to the interchange of sub-
scripts k and l together with subscripts μ and ν, respectively. Comparing both sides of
(18.209), we see that this must be the case with i and j as well. Then, the last factor of
(18.209) should be rewritten as:

ðαÞ ðαÞ 1 ðαÞ 0 ðαÞ 0 ðαÞ ðαÞ


Dμi ðg0 ÞDνj ðg0 Þ = D ðg ÞDνj ðg Þ ± Dνi ðg0 ÞDμj ðg0 Þ :
2 μi

Now, we define the following notations D[α × α](g) and D{α × α}(g) represented by

1
D½α × α ðgÞ  Dðα × αÞ ðgÞ þ Dðα × αÞ ðgÞ
kl,μν 2 kl,μν lk,μν

1 ðαÞ ðαÞ ðαÞ ðαÞ


= D ðgÞDlν ðgÞ þ Dlμ ðgÞDkν ðgÞ ð18:210Þ
2 kμ

and
752 18 Representation Theory of Groups

1
Dfα × αg ðgÞ kl,μν
 Dðα × αÞ ðgÞ - Dðα × αÞ ðgÞ
2 kl,μν lk,μν

1 ðαÞ ðαÞ ðαÞ ðαÞ


= D ðgÞDlν ðgÞ - Dlμ ðgÞDkν ðgÞ : ð18:211Þ
2 kμ
ðαÞ ðαÞ
Meanwhile, using Dμi ðg0 ÞDνj ðg0 Þ = Dðα × αÞ ðg0 Þ μν,ij and considering the
exchange of summation with respect to the subscripts μ and ν, we can also define
D[α × α](g′) and D{α × α}(g) according to the symmetric and antisymmetric cases,
respectively. Thus for the symmetric case, we have

gg0 Ψ þ
ij = k l μ ν
Ψþ
kl D
½α × α
ð gÞ D½α × α ðg0 Þ
kl,μν μν,ij

= k l
Ψþ
kl D
½α × α
ðgÞD½α × α ðg0 Þ : ð18:212Þ
kl,ij

Using (18.210) and (18.207) can be rewritten as

gΨ þ
ij = k l
Ψþ
kl D
½α × α
ð gÞ : ð18:213Þ
kl,ij

Then we have

gg0 Ψ þ
ij = k l
Ψþ
kl D
½α × α
ðgg0 Þ : ð18:214Þ
kl,ij

Comparing (18.212) and (18.214), we finally get

D½α × α ðgg0 Þ = D½α × α ðgÞD½α × α ðg0 Þ: ð18:215Þ

Similarly, for the antisymmetric case, we have

gΨ ij- = k l
Ψ kl- Dfα × αg ðgÞ kl,ij
, ð18:216Þ

gg ′ Ψ ij- = k l
Ψ kl- Dfα × αg ðgÞDfα × αg ðg0 Þ kl,ij
: ð18:217Þ

Hence, we obtain

Dfα × αg ðgg0 Þ = Dfα × αg ðgÞDfα × αg ðg0 Þ: ð18:218Þ

Thus, both D[α × α](g) and D{α × α}(g) produce well-defined representations.
Letting dimension of the representation α be dα, we have dα(dα + 1)/2 functions
belonging to the symmetric representation and dα(dα - 1)/2 functions belonging to
References 753

the antisymmetric representation. With the two-dimensional representation, for


instance, functions belonging to the symmetric representation are

ψ 1 ϕ1 , ψ 1 ϕ2 þ ψ 2 ϕ1 , and ψ 2 ϕ2 : ð18:219Þ

A function belonging to the antisymmetric representation is

ψ 1 ϕ2 - ψ 2 ϕ1 : ð18:220Þ

Note that these vectors have not yet been normalized.


From (18.210) and (18.211), we can readily get useful expressions with charac-
ters of symmetric and antisymmetric representations. In (18.210) putting μ = k and
ν = l and summing over k and l,

χ ½ α × α  ð gÞ = k l
D½α × α ðgÞ D½α × α ðgÞ
kl,kl kl,kl

1 ðαÞ ðαÞ ðαÞ ðαÞ


= Dkk ðgÞDll ðgÞ þ Dlk ðgÞDkl ðgÞ
2 k l

1 2
= χ ð α Þ ð gÞ þ χ ðαÞ g2 : ð18:221Þ
2

Similarly we have

1 2
χ fα × αg ðgÞ = χ ðαÞ ðgÞ - χ ðαÞ g2 : ð18:222Þ
2

References

1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer,
Berlin
2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics. Shokabo,
Tokyo
3. Hassani S (2006) Mathematical physics. Springer, New York
4. Chen JQ, Ping J, Wang F (2002) Group representation theory for physicists, 2nd edn. World
Scientific, Singapore
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
Chapter 19
Applications of Group Theory to Physical
Chemistry

On the basis of studies of group theory, now in this chapter we apply the knowledge
to the molecular orbital (MO) calculations (or quantum chemical calculations). As
tangible examples, we adopt aromatic hydrocarbons (ethylene, cyclopropenyl radi-
cal, benzene, and ally radical) and methane. The approach is based upon a method of
linear combination of atomic orbitals (LCAO). To seek an appropriate LCAO MO,
we make the most of a method based on a symmetry-adapted linear combination
(SALC). To use projection operators is a powerful tool for this purpose. For the sake
of correct understanding, it is desired to consider transformation of functions. To this
end, we first show several examples.
In the process of carrying out MO calculations, we encounter a secular equation
as an eigenvalue equation. Using a SALC eases the calculations of the secular
equation. Molecular science relies largely on spectroscopic measurements and
researchers need to assign individual spectral lines to a specific transition between
the relevant molecular states. Representation theory works well in this situation.
Thus, the group theory finds a perfect fit with its applications in the molecular
science.

19.1 Transformation of Functions

Before showing individual examples, let us consider the transformation of functions


(or vectors) by the symmetry operation. Here we consider scalar functions.
For example, let us suppose an arbitrary scalar function f(x, y) on a Cartesian xy-
coordinate. Figure 19.1 shows a contour map of f(x, y) = constant. Then suppose that
the map is rotated around the origin. More specifically, the positon vector r0 fixed on
a “summit” [i.e., a point that gives a maximal value of f(x, y)] undergoes a symmetry
operation, namely, rotation around the origin. As a result, r0 is transformed to r00 .
Here, we assume that a “mountain” represented by f(x, y) is a rigid body before and

© Springer Nature Singapore Pte Ltd. 2023 755


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_19
756 19 Applications of Group Theory to Physical Chemistry

Fig. 19.1 Contour map of y


f(x, y) and f ′(x′, y′). The
function f ′(x′, y′) is obtained
by rotating a map of f(x, y)
around the z-axis
,
,

O x

after the transformation. Concomitantly, a general point r (see Sect. 17.1) is


transformed to r′ in exactly the same way as r0.
A new function f ′ gives a new contour map after the rotation. Let us describe f ′ as

f 0  OR f : ð19:1Þ

The RHS of (19.1) means that f ′ is produced as a result of operating the rotation
R on f. We describe the position vector after being transformed as r00 . Following the
notation of Sect. 11.1, r00 and r′ are expressed as

r00 = Rðr0 Þ and r0 = RðrÞ: ð19:2Þ

The matrix representation for R is given by e.g., (11.35). In (19.2) we have

x00
r0 = ðe1 e2 Þ x0
y0
and r00 = ðe1 e2 Þ y00
, ð19:3Þ

where e1 and e2 are orthonormal basis vectors in the xy-plane. Also we have

x0
r = ðe 1 e2 Þ x
y
and r 0 = ð e1 e2 Þ y0
: ð19:4Þ

Meanwhile, the following equation must hold:

f 0 ðx0 , y0 Þ = f ðx, yÞ: ð19:5Þ

Or, using (19.1) we have


19.1 Transformation of Functions 757

OR f ðx0 , y0 Þ = f ðx, yÞ: ð19:6Þ

The above argument can be extended to a three-dimensional (or higher dimen-


sional) space. In that case, similarly we have

OR f ðx0 , y0 , z0 Þ = f ðx, y, zÞ, OR f ðr0 Þ = f ðrÞ, or f 0 ðr0 Þ = f ðrÞ, ð19:7Þ

where

x x0
r = ð e1 e 2 e 3 Þ y and r 0 = ð e1 e2 e3 Þ y0 : ð19:8Þ
z z0

The last relation of (19.7) comes from (19.1). Using (19.2) and (19.3), we rewrite
(19.7) as

OR f ½RðrÞ = f ðrÞ: ð19:9Þ

Replacing r with R-1(r), we get

OR f R R - 1 ðrÞ = f R - 1 ðrÞ : ð19:10Þ

That is,

OR f ðrÞ = f R - 1 ðrÞ : ð19:11Þ

More succinctly, (19.11) may be rewritten as

Rf ðrÞ = f R - 1 ðrÞ : ð19:12Þ

Comparing (19.11) with (19.12) and considering (19.1), we have

f 0  OR f  Rf : ð19:13Þ

To gain a good understanding of the function transformation, let us think of some


examples.
Example 19.1
Let f(x, y) be a function described by

f ðx, yÞ = ðx - aÞ2 þ ðy - bÞ2 ; a, b > 0: ð19:14Þ

A contour is shown in Fig. 19.2. We consider a π/2 rotation around the z-axis.
Then, f(x, y) is transformed into Rf(x, y) = f ′(x, y) such that
758 19 Applications of Group Theory to Physical Chemistry

Fig. 19.2 Contour map of y, x'


f(x, y) = (x - a)2 + (y - b)2
and f ′(x′, y′) = (x′ + b)2 +
(y′ - a)2 before and after a π/2
rotation around the z-axis , ,
( , )
(− , )

y' O x
z

Rf ðx, yÞ = f 0 ðx, yÞ = ðx þ bÞ2 þ ðy - aÞ2 : ð19:15Þ

We also have

r 0 = ð e1 e2 Þ a
b
,

-1 -b
r00 = ðe1 e2 Þ 0
1 0
a
b
= ð e1 e2 Þ a
,

where we define R as

-1
R= 0
1 0
:

Similarly,

-y
r = ð e1 e 2 Þ x
y
and r0 = ðe1 e2 Þ x
:

From (19.15), we have

f 0 ðx0 , y0 Þ = ðx0 þ bÞ þ ðy0 - aÞ = ð- y þ bÞ2 þ ðx - aÞ2 = f ðx, yÞ:


2 2
ð19:16Þ

This ensures that (19.5) holds. The implication of (19.16) combined with (19.5) is
that a view of f ′(x′, y′) from (-b, a) is the same as that of f(x, y) from (a, b). Imagine
that if we are standing at (-b, a) of f ′(x′, y′), we are in the bottom of the “valley” of
f ′(x′, y′). Exactly in the same manner, if we are standing at (a, b) of f(x, y), we are in
19.1 Transformation of Functions 759

y
, ,
(− , 0) ( , 0)

O
z x

Fig. 19.3 Contour map of f ðrÞ = e - 2½ðx - aÞ þy þz  þ e - 2½ðxþaÞ þy þz  ða > 0Þ. The function form
2 2 2 2 2 2

remains unchanged by a reflection with respect to the yz-plane

the bottom of the valley of f(x, y) as well. Notice that for both f(x, y) and f ′(x′, y′), (a,
b) and (-b, a) are the lowest point, respectively.
Meanwhile, we have

R-1 = 0
-1
1
0
, R - 1 ðrÞ = ðe1 e2 Þ 0
-1
1
0
x
y
= ð e1 e2 Þ y
-x
:

Then we have

f R - 1 ðrÞ = ½ðyÞ - a2 þ ½ð- xÞ - b2 = ðx þ bÞ2 þ ðy - aÞ2 = Rf ðrÞ: ð19:17Þ

Thus, (19.12) certainly holds.


Example 19.2
Let f(r) and g(r) be functions described by

f ðrÞ = e - 2½ðx - aÞ þy þz2 


þ e - 2½ðxþaÞ þy þz2 
2 2 2 2
ða > 0Þ, ð19:18Þ

gðrÞ = e - 2½ðx - aÞ þy þz2 


- e - 2½ðxþaÞ þy2 þz2 
2 2 2
ða > 0Þ: ð19:19Þ

Figure 19.3 shows an outline of the contour of f(r). We consider a following


symmetry operation in a three-dimensional coordinate system:
760 19 Applications of Group Theory to Physical Chemistry

(a) (b)

0
‒a 0 a ‒a 0 a

Fig. 19.4 Plots of f ðrÞ = e - 2½ðx - aÞ þy þz  þ e - 2½ðxþaÞ þy þz  and gðrÞ = e - 2½ðx - aÞ þy þz  -


2 2 2 2 2 2 2 2 2

- 2½ðxþaÞ2 þy2 þz2 


e ða > 0Þ as a function of x. The plots have been taken along the x-axis. (a) f(r).
(b) g(r)

-1 0 0 -1 0 0
R= 0 1 0 and R-1 = 0 1 0 : ð19:20Þ
0 0 1 0 0 1

This represents a reflection with respect to the yz-plane. Then we have

f R - 1 ðrÞ = Rf ðrÞ = f ðrÞ, ð19:21Þ

g R - 1 ðrÞ = RgðrÞ = - gðrÞ: ð19:22Þ

Plotting f(r) and g(r) as a function of x on the x-axis, we depict results in Fig. 19.4.
Looking at (19.21) and (19.22), we find that f(r) and g(r) are solutions of an
eigenvalue equation for an operator R. Corresponding eigenvalues are 1 and -1 for
f(r) and g(r), respectively. In particular, f(r) holds the functional form after the
transformation R. In this case, f(r) is said to be invariant with the transformation R.
Moreover, f(r) is invariant with the following eight transformations:

±1 0 0
R= 0 ±1 0 : ð19:23Þ
0 0 ±1

These transformations form a group that is isomorphic to D2h. Therefore, f(r) is


eligible for a basis function of the totally symmetric representation Ag of D2h. Notice
that f(r) is invariant as well with a rotation of an arbitrary angle around the x-axis. On
the other hand, g(r) belongs to B3u.
19.2 Method of Molecular Orbitals (MOs) 761

19.2 Method of Molecular Orbitals (MOs)

Bearing in mind these arguments, we examine several examples of quantum chem-


ical calculations. Our approach is based upon the molecular orbital theory. The
theory assumes the existence of molecular orbitals (MOs) in a molecule, as the
notion of atomic orbitals has been well-established in atomic physics. Furthermore,
we assume that the molecular orbitals comprise a linear combination of atomic
orbitals (LCAO).
This notion is equivalent to that individual electrons in a molecule are indepen-
dently moving in a potential field produced by nuclei and other electrons. In other
words, we assume that each electron is moving along an MO that is extended over
the whole molecule. Electronic state in the molecule is formed by different MOs of
various energies that are occupied by electrons. As in the case of an atom, an MO
ψ i(r) occupied by an electron is described as

ħ2 2
Hψ i ðrÞ  - — þ V ðrÞ ψ i ðrÞ = E i ψ i ðrÞ, ð19:24Þ
2m

where H is Hamiltonian of a molecule; m is a mass of an electron (note that we do not


use a reduced mass μ here); r is a position vector of the electron; ∇2 is the Laplacian
(Laplace operator); V(r) is a potential of the molecule at r; Ei is an energy of the
electron occupying ψ i and said to be a molecular orbital energy. We assume that V(r)
possesses a symmetry the same as that of the molecule.
Let ℊ be a symmetry group and let a group element arbitrarily chosen from
among ℊ be R. Suppose that an arbitrary position vector r is moved to another
position r′. This transformation is expressed as (19.2). Since V(r) has the same
symmetry as the molecule, an electron “feels” the same potential field at r′ as that
at r. That is,

V ðrÞ = V 0 ðr0 Þ = V ðr0 Þ: ð19:25Þ

Or we have

V ðrÞψ ðrÞ = V ðr0 Þψ 0 ðr0 Þ, ð19:26Þ

where ψ is an arbitrary function. Defining

V ðrÞψ ðrÞ  ½Vψ ðrÞ = Vψ ðrÞ, ð19:27Þ

and recalling (19.1) and (19.7), we get

½RV ψ ðr0 Þ = R½Vψ ðr0 Þ = V 0 ψ 0 ðr0 Þ = V 0 ðr0 Þψ 0 ðr0 Þ = V ðr0 Þψ 0 ðr0 Þ


762 19 Applications of Group Theory to Physical Chemistry

= V ðr0 ÞRψ ðr0 Þ = VRψ ðr0 Þ = ½VRψ ðr0 Þ: ð19:28Þ

Comparing the first and last sides and remembering that ψ is an arbitrary function,
we get

RV = VR: ð19:29Þ

Next, let us examine the symmetry of the Laplacian — 2. The Laplacian is defined
in Sect. 1.2 as

2 2 2
∂ ∂ ∂
—2  2
þ 2þ 2, ð1:24Þ
∂x ∂y ∂z

where x, y, and z denote the Cartesian coordinates. Let S be an orthogonal matrix that
transforms the xyz-coordinate system to x′y′z′-coordinate system. We suppose that
S is expressed as

s11 s12 s13 x′ s11 s12 s13 x


S= s21 s22 s23 and y′ = s21 s22 s23 y : ð19:30Þ
s31 s32 s33 z′ s31 s32 s33 z

Since S is an orthonormal matrix, we have

x s11 s21 s31 x′


y = s12 s22 s32 y′ : ð19:31Þ
z s13 s23 s33 z′

The equation is due to

SST = ST S = E, ð19:32Þ

where E is a unit matrix. Then we have

∂ ∂x ∂ ∂y ∂ ∂z ∂ ∂ ∂ ∂
= þ þ = s11 þ s12 þ s13 : ð19:33Þ
∂x0 ∂x0 ∂x ∂x0 ∂y ∂x0 ∂z ∂x ∂y ∂z

Partially differentiating (19.33), we have

∂ ∂ ∂ ∂ ∂ ∂ ∂
= s11 0 þ s12 0 þ s13 0
∂x 0 2 ∂x ∂x ∂x ∂y ∂x ∂z
19.2 Method of Molecular Orbitals (MOs) 763

∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂
= s11 s11 þ s12 þ s13 þ s12 s11 þ s12 þ s13
∂x ∂y ∂z ∂x ∂x ∂y ∂z ∂y

∂ ∂ ∂ ∂
þs13 s11 þ s12 þ s13
∂x ∂y ∂z ∂z

2 2 2
∂ ∂ ∂
= s11 2 þ s11 s12 þ s11 s13
∂x
2 ∂y∂x ∂z∂x

2 2 2
∂ ∂ ∂
þs12 s11 þ s12 2 2 þ s12 s13
∂x∂y ∂y ∂z∂y

2 2 2
∂ ∂ ∂
þs13 s11 þ s13 s12 þ s13 2 2 : ð19:34Þ
∂x∂z ∂y∂z ∂z
∂ ∂
Calculating terms of ∂y0 2
and ∂z0 2
, we have similar results. Then, collecting all
those 27 terms, we get, e.g.,

2 2
∂ ∂
s11 2 þ s21 2 þ s31 2 2
= 2, ð19:35Þ
∂x ∂x
2 2

where we used an orthogonal relationship of S. Cross terms of , ∂ ,
∂x∂y ∂y∂x
etc. all
vanish. Consequently, we get

2 2 2
∂ ∂ ∂ ∂ ∂ ∂
þ þ = þ þ : ð19:36Þ
∂x0 2 ∂y0 2 ∂z0 2 ∂x2 ∂y2 ∂z2

Defining — ′2 as

∂ ∂ ∂
—0 
2
þ 02 þ 02 , ð19:37Þ
∂x 0 2
∂y ∂z

we have

— 0 = — 2:
2
ð19:38Þ

Notice that (19.38) holds with not only the symmetry operation of the molecule,
but also any rotation operation in ℝ3 (see Sect. 17.4).
764 19 Applications of Group Theory to Physical Chemistry

As in the case of (19.28), we have

R— 2 ψ ðr0 Þ = R — 2 ψ ðr0 Þ = — 0 ψ 0 ðr0 Þ = — 2 ψ 0 ðr0 Þ


2

= — 2 Rψ ðr0 Þ = — 2 R ψ ðr0 Þ: ð19:39Þ

Consequently, we get

R— 2 = — 2 R: ð19:40Þ

Adding both sides of (19.29) and (19.40), we get

R — 2 þ V = — 2 þ V R: ð19:41Þ

From (19.24), we have

RH = HR: ð19:42Þ

Thus, we confirm that the Hamiltonian H commutes with any symmetry operation
R. In other words, H is invariant with the coordinate transformation relevant to the
symmetry operation.
Now we consider matrix representation of (19.42). Also let D be an irreducible
representation of the symmetry group ℊ. Then, we have

DðgÞH = HDðgÞ g 2 ℊ :

Let {ψ 1, ψ 2, ⋯, ψ d} be a set of basis vectors that span a representation space L S


associated with D. Then, on the basis of Schur’s Second Lemma of Sect. 18.3,
(19.42) immediately leads to an important conclusion that if H is represented by a
matrix, we must have

H = λE, ð19:43Þ

where λ is an arbitrary complex constant. That is, H is represented such that

λ ⋯ 0
H= ⋮ ⋱ ⋮ , ð19:44Þ
0 ⋯ λ

where the above matrix is (d, d ) square diagonal matrix. This implies that H is
reduced to
19.2 Method of Molecular Orbitals (MOs) 765

H = λDð0Þ ⋯ λDð0Þ ,

where D(0) denotes the totally symmetric representation; notice that it is given by
1 for any symmetry operation. Thus, the commutability of an operator with all
symmetry operation is equivalent to that the operator belongs to the totally symmet-
ric representation. Since H is Hermitian, λ should be real.
Operating both sides of (19.42) on {ψ 1, ψ 2, ⋯, ψ d} from the right, we have

ðψ 1 ψ 2 ⋯ ψ d ÞRH = ðψ 1 ψ 2 ⋯ ψ d ÞHR

λ ⋯
λ
= ðψ 1 ψ 2 ⋯ ψ d Þ ⋮ ⋱ ⋮ R = ðλψ 1 λψ 2 ⋯ λψ d ÞR
⋯ λ

= λðψ 1 ψ 2 ⋯ ψ d ÞR: ð19:45Þ

In particular, putting R = E, we get

ðψ 1 ψ 2 ⋯ ψ d ÞH = λðψ 1 ψ 2 ⋯ ψ d Þ:

Simplifying this equation, we have

ψ i H = λψ i ð1 ≤ i ≤ dÞ: ð19:46Þ

Thus, λ is found to be an energy eigenvalue of H and ψ i (1 ≤ i ≤ d ) are


eigenfunctions belonging to the eigenvalue λ.
Meanwhile, using

d
Rðψ i Þ = k=1
ψ k Dki ðRÞ,

we rewrite (19.45) as

ðψ 1 ψ 2 ⋯ ψ d ÞRH = ðψ 1 R ψ 2 R ⋯ ψ d RÞH

d d d
= k=1
ψ k Dk1 ðRÞ k=1
ψ k Dk2 ðRÞ ⋯ k=1
ψ k Dkd ðRÞ H

d d d
=λ k=1
ψ k Dk1 ðRÞ k=1
ψ k Dk2 ðRÞ ⋯ k=1
ψ k Dkd ðRÞ ,

where with the second equality we used the relation (11.42); i.e.,
766 19 Applications of Group Theory to Physical Chemistry

ψ i R = Rðψ i Þ:

Thus, we get

d d
k=1
ψ k Dki ðRÞH = λ k=1
ψ k Dki ðRÞ ð1 ≤ i ≤ dÞ: ð19:47Þ

The relations (19.45)–(19.47) imply that ψ 1, ψ 2, ⋯, and ψ d as well as their linear


combinations using a representation matrix D(R) are eigenfunctions that belong to
the same eigenvalue λ. That is, ψ 1, ψ 2, ⋯, and ψ d are said to be degenerate with a
multiplication d.
After the remarks of Theorem 14.5, we can construct an orthonormal basis set
fψ~ 1 , ψ~ 2 , ⋯, ψ~ d g as eigenfunctions. These vectors can be constructed via linear
combinations of ψ 1, ψ 2, ⋯, and ψ d. After transforming the vectors ψ~ i ð1 ≤ i ≤ dÞ
by R, we get

d d d d
k=1
ψ~ k Dki ðRÞj l=1
ψ~ l Dlj ðRÞ = k=1 l=1
Dki ðRÞ Dlj ðRÞhψ~ k j~
ψ li

d d
= k=1
Dki ðRÞ Dkj ðRÞ = k=1
D{ ðRÞ ik ½DðRÞkj = D{ ðRÞDðRÞ ij
= δij :

With the last equality, we used unitarity of D(R). Thus, the orthonormal basis set
is retained after unitary transformation of fψ~ 1 , ψ~ 2 , ⋯, ψ~ d g.
In the above discussion, we have assumed that ψ 1, ψ 2, ⋯, and ψ d belong to a
certain irreducible representation D(ν). Since ψ~ 1 , ψ~ 2 , ⋯, and ψ~ d consist of their linear
combinations, ψ~ 1 , ψ~ 2 , ⋯, and ψ~ d belong to D(ν) as well. Also, we have assumed that
ψ~ 1 , ψ~ 2 , ⋯, and ψ~ d form an orthonormal basis and, hence, according to Theorem 11.3
these vectors constitute basis vectors that belong to D(ν). In particular, functions
derived using projection operators share these characteristics. In fact, the above
principles underlie molecular orbital calculations dealt with in Sects. 19.4 and
19.5. We will go into more details in subsequent sections.
Bearing the aforementioned argument in mind, we make the most of the relation
expressed by (18.171) to evaluate inner products of functions (or vectors). In this
connection, we often need to calculate matrix elements of an operator. One of the
most typical examples is matrix elements of Hamiltonian. In the field of molecular
science we have to estimate an overlap integral, Coulomb integral, resonance
integral, etc. Other examples include transition matrix elements pertinent to, e.g.,
electric dipole transition. To this end, we deal with direct-product representation (see
Sects. 18.7 and 18.8). Let O(γ) be an Hermitian operator belonging to the γ-th
irreducible representation. Let us think of a following inner product:
19.2 Method of Molecular Orbitals (MOs) 767

ðβ Þ
ϕl jOðγÞ ψ ðsαÞ , ð19:48Þ

ðβ Þ
where ψ ðsαÞ and ϕl are the s-th component of α-th irreducible representation and the
l-th component of β-th irreducible representation, respectively; see (18.146) and
(18.161).
(i) Let us think of first the case where O(γ) is Hamiltonian H. Suppose that ψ ðsαÞ
belongs to an eigenvalue λ of H. Then, we have

ðαÞ ðαÞ
H j ψ l i = λ j ψ l i:

In that case, with (19.48) we get

ðβ Þ ðβÞ
ϕl jHψ ðsαÞ = λ ϕl jψ ðsαÞ : ð19:49Þ

Having the benefit of (18.171), (19.49) can be rewritten as

ðβÞ ðαÞ ðαÞ


ϕl jHψ ðsαÞ = λδαβ δls ϕl jψ l : ð19:50Þ

At the first glance this equation seems trivial, but the equation gives us a powerful
tool to save us a lot of troublesome calculations. In fact, if we encounter a series of
inner product calculations (i.e., definite integrals), we ignore many of them. In
(19.50), the matrix elements of Hamiltonian do not vanish only if α = β and l = s.
ðαÞ ðαÞ
That is, we only have to estimate ϕl jψ l . Otherwise the inner products vanish.
ðαÞ ðαÞ
The functional forms of ϕl and ψ l are determined depending upon individual
practical problems. We will show tangible examples later (Sects. 19.4 and 19.5).
(ii) Next, let us consider matrix elements of the optical transition. In this case, we
are thinking of transition probability between quantum states. Assuming the dipole
ðβ Þ
approximation, we use εe  P(γ) for O(γ) in ϕl jOðγÞ ψ ðsαÞ of (19.48). The quantity
P(γ) represents an electric dipole moment associated with the position vector.
Normally, we can readily find it in a character table, in which we examine which
irreducible representation γ the position vector components x, y, and z indicated in a
rightmost column correspond to. Table 18.4 is an example. If we take a unit
polarization vector εe in parallel to the position vector component that the character
ðβ Þ
table designates, εe  P(γ) is non-vanishing. Suppose that ψ ðsαÞ and ϕl are an initial
state and final state, respectively. At the first glance, we would wish to use (18.200)
and count how many times a representation β occurs for a direct-product represen-
tation D(γ × α). It is often the case, however, where β does not belong to an irreducible
representation.
Even in such a case, if either α or β belongs to a totally symmetric representation,
the handling will be easier. This situation corresponds to that an initial electronic
configuration or a final electronic configuration forms a closed shell an electronic
768 19 Applications of Group Theory to Physical Chemistry

state of which is totally symmetric [1]. The former occurs when we consider the
optical absorption that takes place, e.g., in a molecule of a ground electronic state.
The latter corresponds to an optical emission that ends up with a ground state. Let us
ðγ Þ
consider the former case. Since Oj is Hermitian, we rewrite (19.48) as

ðβÞ ðγ Þ ðγ Þ ðβÞ ðγ Þ ðβ Þ 
ϕl jOj ψ ðsαÞ = Oj ϕl jψ ðsαÞ = ψ ðsαÞ jOj ϕl , ð19:51Þ

where we assume that ψ ðsαÞ is a ground state having a closed shell electronic
configuration. Therefore, ψ ðsαÞ belongs to a totally symmetric irreducible represen-
tation. For (19.48) not to vanish, therefore, we may alternatively state that it is
necessary for

Dðγ × βÞ = DðγÞ  DðβÞ

to contain D(α) belonging to a totally symmetric representation. Note that in group


theory we usually write D(γ) × D(β) instead of D(γ)  D(β); see (18.191).
ðβÞ
If ϕl belongs to a reducible representation, from (18.80) we have

DðβÞ = q D
ω ω
ðωÞ
,

where D(ω) belongs to an irreducible representation ω. Then, we get

Dðγ Þ × DðβÞ = q D
ω ω
ðγ Þ
× DðωÞ :

Thus, applying (18.199) and (18.200), we can obtain a direct sum of irreducible
representations. After that, we examine whether an irreducible representation α is
contained in D(γ) × D(β). Since the totally symmetric representation is
one-dimensional, s = 1 in (19.51).

19.3 Calculation Procedures of Molecular Orbitals (MOs)

We describe a brief outline of the MO method based on LCAO (LCAOMO).


Suppose that a molecule consists of n atomic orbitals and that each MO ψ i (1 ≤ i ≤ n)
comprises a linear combination of those n atomic orbitals ϕk (1 ≤ k ≤ n). That is,
we have
n
ψi = c ϕ
k = 1 ki k
ð1 ≤ i ≤ nÞ, ð19:52Þ

where cki are complex coefficients. We assume that ϕk are normalized. That is,
19.3 Calculation Procedures of Molecular Orbitals (MOs) 769

hϕk jϕk i  ϕk ϕk dτ = 1, ð19:53Þ

where dτ implies that an integration should be taken over ℝ3. The notation hg| fi
means an inner product defined in Sect. 13.1. The inner product is usually defined by
a definite integral of gf whose integration range covers a part or all of ℝ3 depending
on a constitution of a physical system.
First we try to solve Schrödinger equation given as an eigenvalue equation. The
said equation is described as

Hψ = λψ or ðH - λψ Þ = 0: ð19:54Þ

Replacing ψ with (19.52), we have


n
c ðH
k=1 k
- λÞϕk = 0 ð1 ≤ k ≤ nÞ, ð19:55Þ

where the subscript i in (19.52) has been omitted for simplicity. Multiplying ϕj from
the left and integrating over whole ℝ3, we have

n
c
k=1 k
ϕj ðH - λÞϕk dτ = 0: ð19:56Þ

Rewriting (19.56), we get

n
c
k=1 k
ϕj Hϕk - λϕj ϕk dτ = 0: ð19:57Þ

Here let us define following quantities:

H jk = ϕj Hϕk dτ and Sjk = ϕj ϕk dτ, ð19:58Þ

where Sii = 1 (1 ≤ i ≤ n) due to a normalized function of ϕi. Then we get


n
c
k=1 k
H jk - λSjk = 0: ð19:59Þ

Rewriting (19.59) in a matrix form, we get

H 11 - λ ⋯ H 1n - λS1n c1
⋮ ⋱ ⋮ ⋮ = 0: ð19:60Þ
H n1 - λSn1 ⋯ H nn - λ cn

Suppose that by solving (19.59) or (19.60), we get λi (1 ≤ i ≤ n), some of which


may be identical (i.e., the degenerate case), and obtain n different column
770 19 Applications of Group Theory to Physical Chemistry

eigenvectors corresponding to n eigenvalues λi. In light of (12.4) of Sect. 12.1, the


following condition must be satisfied for this to get eigenvectors for which not all ck
is zero:

det H jk - λSjk = 0

or

H 11 - λ ⋯ H 1n - λS1n
⋮ ⋱ ⋮ = 0: ð19:61Þ
H n1 - λSn1 ⋯ H nn - λ

Equation (19.61) is called a secular equation. This equation is pertinent to a


determinant of an order n, and so we are expected to get n roots for this, some of
which are identical (i.e., a degenerate case).
In the above discussion, it is useful to introduce the following notation:

ϕj jHϕk  H jk = ϕj Hϕk dτ and ϕj jϕk  Sjk = ϕj ϕk dτ: ð19:62Þ

The above notation has already been introduced in Sect. 1.4. Equation (19.62)
certainly satisfies the definition of the inner product described in (13.2)–(13.4);
readers, please check it. On the basis of (13.64), we have

hyjHxi = xH { jy : ð19:63Þ

Since H is Hermitian, H{ = H. That is,

hyjHxi = hxHjyi: ð19:64Þ

To solve (19.61) with an enough large number n is usually formidable. However,


if we can find appropriate conditions, (19.61) can pretty easily be solved. An
essential point rests upon how we deal with off-diagonal elements Hjk and Sjk. If
we are able to appropriately choose basis vectors so that we can get

H jk = 0 and Sjk = 0 ðj ≠ k Þ, ð19:65Þ

the secular equation is reduced to a simple form

~ 11 - λ~S11
H
~ 22 - λ~S22
H

= 0, ð19:66Þ
~ nn - λ~Snn
H

where all the off-diagonal elements are zero and an eigenvalue λ is given by
19.3 Calculation Procedures of Molecular Orbitals (MOs) 771

~ ii =~
λi = H Sii : ð19:67Þ

This means that the eigenvalue equation has automatically been solved.
The best way to achieve this is to choose basis functions (vectors) such that the
functions conform to the symmetry which the molecule belongs to. That is, we
“shuffle” the atomic orbitals as follows:
n
ξi = d ϕ
k = 1 ki k
ð1 ≤ i ≤ nÞ, ð19:68Þ

where ξi are new functions chosen instead of ψ i of (19.52). That is, we construct a
linear combination of the atomic orbitals that belongs to individual irreducible
representation. The said linear combination is called symmetry-adapted linear com-
bination (SALC). We remark that all atomic orbitals are not necessarily included in
the SALC. Then we have

~ jk =
H ξj Hξk dτ and ~
Sjk = ξj ξk dτ: ð19:69Þ

Thus, instead of (19.61), we have a new secular equation of

~ jk - λ~
det H Sjk = 0: ð19:70Þ

Since the Hamiltonian H is totally symmetric, in terms of the direct-product


representation Hξk in (19.69) belongs to an irreducible representation which ξk
belongs to. This can intuitively be understood. However, to assert this, use (18.76)
and (18.200) along with the fact that characters of the totally symmetric representa-
tion are 1.
In light of (18.171) and (18.181), if ξk and ξj belong to different irreducible
~ jk and ~
representations, H Sjk both vanish at once. From (18.172), at the same time, ξk
and ξj are orthogonal to each other. The most ideal situation is to get (19.66) with all
the off-diagonal elements vanishing. Regarding the diagonal elements of (19.70), we
always have the direct product of the same representation and, hence, the integrals
(19.69) do not vanish. Thus, we get a powerful guideline for the evaluation of
(19.70) in an as simplest as possible form.
If the SALC orbitals ξj and ξk belong to the same irreducible representation, the
relevant matrix elements are generally non-vanishing. Even in that case, however,
according to Theorem 13.2 we can construct a set of orthonormal vectors by taking
appropriate linear combination of ξj and ξk. The resulting vectors naturally belong to
the same irreducible representation. This can be done by solving the secular equation
as described below. On top of it, Hermiticity of the Hamiltonian ensures that those
vectors can be rendered orthogonal (see Theorem 14.5). If n electrons are present in a
molecule, we are to deal with n SALC orbitals which we view as vectors accord-
ingly. In terms of representation theory, these vectors span a representation space
where the vectors undergo symmetry operations. Thus, we can construct
772 19 Applications of Group Theory to Physical Chemistry

orthonormal basis vectors belonging to various irreducible representations through-


out the representation space.
To address our problems, we take following procedures: (i) First we have to
determine a symmetry species (i.e., point group) of a molecule. (ii) Next, we pick up
atomic orbitals contained in a molecule and examine how those orbitals are
transformed by symmetry operations. (iii) We examine how those atomic orbitals
are transformed according to symmetry operations. Since the symmetry operations
are represented by a (n, n) unitary matrix, we can readily decide a trace of the matrix.
Generally, that matrix representation is reducible, and so we reduce the representa-
tion according to the procedures of (18.81)–(18.83). Thus, we are able to determine
how many irreducible representations are contained in the original reducible repre-
sentation. (iv) After having determined the irreducible representations, we construct
SALCs and constitute a secular equation using them. (v) Solving the secular
equation, we determine molecular orbital energies and decide functional forms of
the corresponding MOs. (vi) We examine physicochemical properties such as optical
transition within a molecule.
In the procedure (iii) of the above, if the same irreducible representations appear
more than once, we have to solve a secular equation of an order of two or more. Even
in that case, we can render a set of resulting eigenvectors orthogonal to one another
during the process of solving a problem.
To construct the above-mentioned SALC, we make the most of projection
operators that are defined in Sect. 14.1. In (18.147) putting m = l, we have

ðαÞ dα ðαÞ
PlðlÞ = Dll ðgÞ g: ð19:71Þ
n g

Or we can choose a projection operator P(α) described as

dα dα 
dα ðαÞ
PðαÞ = D ðgÞ g =
l = 1 ll
χ ðαÞ ðgÞ g: ð18:174Þ
n g n g

ðαÞ
In the one-dimensional representation, PlðlÞ and P(α) are identical. As expressed in
(18.155) and (18.175), these projection operators act on an arbitrary function and
extract specific component(s) pertinent to a specific irreducible representation of the
point group which the molecule belongs to.
ðαÞ
At first glance the definition of PlðlÞ and P(α) looks daunting, but the use of
character tables relieves a calculation task. In particular, all the irreducible represen-
tations are one-dimensional (i.e., just a number!) with Abelian groups as mentioned
in Sect. 18.6. For this, notice that individual group elements form a class by
themselves. Therefore, utilization of character tables becomes easier for Abelian
groups. Even though we encounter a case where a dimension of representation is
ðαÞ
more than one (i.e., the case of non-commutative groups), Dll can be determined
without much difficulty.
19.4 MO Calculations Based on π-Electron Approximation 773

19.4 MO Calculations Based on π-Electron Approximation

On the basis of the general argument developed in Sects. 19.2 and 19.3, we preform
molecular orbital calculations of individual molecules. First, we apply group theory
to the molecular orbital calculations about aromatic hydrocarbons such as ethylene,
cyclopropenyl radical and cation as well as benzene and allyl radical. In these cases,
in addition to adoption of the molecular orbital theory, we adopt the so-called
π-electron approximation. With the first three molecules, we will not have to
“solve” a secular equation. However, for allyl radical we deal with two SALC
orbitals belonging to the same irreducible representations and, hence, the final
molecular orbitals must be obtained by solving the secular equation.

19.4.1 Ethylene

We start with one of the simplest examples, ethylene. Ethylene is a planar molecule
and belongs to D2h symmetry (see Sect. 17.2). In the molecule two π-electrons
extend vertically to the molecular plane toward upper and lower directions
(Fig. 19.5). The molecular plane forms a node to atomic 2pz orbitals; that is those
atomic orbitals change a sign relative to the molecular plane (i.e., the xy-plane).
In Fig. 19.5, two pz atomic orbitals of carbon are denoted by ϕ1 and ϕ2. We
should be able to construct basis vectors using ϕ1 and ϕ2. Corresponding to the two
atomic orbitals, we are dealing with a two-dimensional vector space. Let us consider
how ϕ1 and ϕ2 are transformed by a symmetry operation. First we examine an
operation C2(z). This operation exchanges ϕ1 and ϕ2. That is, we have

H z H
y y
C C
H H x
x
Fig. 19.5 Ethylene molecule placed on the xyz-coordinate system. Two pz atomic orbitals of carbon
are denoted by ϕ1 and ϕ2. The atomic orbitals change a sign relative to the molecular plane (i.e., the
xy-plane)
774 19 Applications of Group Theory to Physical Chemistry

C 2 ð z Þ ð ϕ1 Þ = ϕ2 ð19:72Þ

and

C 2 ð z Þ ð ϕ2 Þ = ϕ 1 : ð19:73Þ

Equations (19.72) and (19.73) can be combined into a following equation:

ðϕ1 ϕ2 ÞC2 ðzÞ = ðϕ2 ϕ1 Þ: ð19:74Þ

Thus, using a matrix representation, we have

C 2 ðzÞ = 0
1
1
0
: ð19:75Þ

In Sect. 17.2, we had

cos θ – sin θ 0
Rzθ = sin θ cos θ 0 ð19:76Þ
0 0 1

or

-1 0 0
C2 ðzÞ = 0 -1 0 : ð19:77Þ
0 0 1

Whereas (19.76) and (19.77) show the transformation of a position vector in ℝ3,
(19.75) represents the transformation of functions in a two-dimensional vector space.
A vector space composed of functions may be finite-dimensional or infinite-
dimensional. We have already encountered the latter case in Part I where we dealt
with the quantum mechanics of a harmonic oscillator. Such a function space is often
referred to as a Hilbert space. An essence of (19.75) is characterized by that a trace
(or character) of the matrix is zero.
Let us consider another transformation C2( y). In this case the situation is different
from the above case in that ϕ1 is converted to -ϕ1 by C2( y) and that ϕ2 is converted
to -ϕ2. Notice again that the molecular plane forms a node to atomic p-orbitals.
Thus, C2( y) is represented by

-1
C 2 ð yÞ = 0
0
-1
: ð19:78Þ

The trace of the matrix C2( y) is -2. In this way, choosing ϕ1 and ϕ2 for the basis
functions for the representation of D2h we can determine the characters for individual
symmetry transformations of atomic 2pz orbitals in ethylene belonging to D2h. We
collect the results in Table 19.1.
19.4 MO Calculations Based on π-Electron Approximation 775

Table 19.1 Characters for individual symmetry transformations of atomic 2pz orbitals in ethylene
D2h E C2(z) C2( y) C2(x) i σ(xy) σ(zx) σ(yz)
Γ 2 0 -2 0 0 -2 0 2

Table 19.2 Character table of D2h


D2h E C2(z) C2( y) C2(x) i σ(xy) σ(zx) σ(yz)
Ag 1 1 1 1 1 1 1 1 x2, y2, z2
B1g 1 1 -1 -1 1 1 -1 -1 xy
B2g 1 -1 1 -1 1 -1 1 -1 zx
B3g 1 -1 -1 1 1 -1 -1 1 yz
Au 1 1 1 1 -1 -1 -1 -1
B1u 1 1 -1 -1 -1 -1 1 1 z
B2u 1 -1 1 -1 -1 1 -1 1 y
B3u 1 -1 -1 1 -1 1 1 -1 x

Next, we examine what kind of irreducible representations is contained in our


present representation of ethylene. To do this, we need a character table of D2h (see
Table 19.2). If a specific kind of irreducible representation is contained, then we
want to examine how many times that specific representation takes place. Equation
(18.83) is very useful for this purpose. In the present case, n = 8 in (18.83). Also
taking into account (18.79) and (18.80), we get

Γ = B1u B3g , ð19:79Þ

where Γ is a reducible representation for a set consisting of two pz atomic orbitals of


ethylene. Equation (19.79) clearly shows that Γ is a direct sum of two irreducible
representations B1u and B3g that belong to D2h. In instead of (19.79), we usually
simply express the direct product in quantum chemical notation as

Γ = B1u þ B3g : ð19:80Þ

As a next step, we are going to find an appropriate basis function belonging to two
irreducible representations of (19.80). For this purpose, we use projection operators
expressed as

dα 
PðαÞ = χ ðαÞ ðgÞ g: ð18:174Þ
n g

Taking ϕ1, for instance, we apply (18.174) to ϕ1. That is,

1 
PðB1u Þ ϕ1 = χ ðB1u Þ ðgÞ gϕ1
8 g
776 19 Applications of Group Theory to Physical Chemistry

Table 19.3 Character table C2 E C2


of C2
A 1 1 z; x2, y2, z2, xy
B 1 -1 x, y; yz, zx

1
= ½1 ∙ ϕ1 þ 1 ∙ ϕ2 þ ð- 1Þð- ϕ1 Þ þ ð- 1Þð- ϕ2 Þ þ ð- 1Þð- ϕ2 Þ
8
1
þð- 1Þð- ϕ1 Þ þ 1 ∙ ϕ2 þ 1 ∙ ϕ1  = ðϕ1 þ ϕ2 Þ: ð19:81Þ
2

Also with the B3g, we apply (18.174) to ϕ1 and get


PðB3g Þ ϕ1 = χ ðB3g Þ ðgÞ gϕ1
1
8 g

1
= ½1 ∙ ϕ1 þ ð- 1Þϕ2 þ ð- 1Þð- ϕ1 Þ þ 1ð- ϕ2 Þ þ 1ð- ϕ2 Þ þ ð- 1Þð- ϕ1 Þ
8
1
þð- 1Þϕ2 þ 1 ∙ ϕ1  = ðϕ - ϕ2 Þ: ð19:82Þ
2 1

Thus, after going through routine but sure procedures, we have reached appro-
priate basis functions that belong to each irreducible representation.
As mentioned earlier (see Table 17.5), D2h can be expressed as a direct-product
group and C2 group is contained in D2h as a subgroup. We can use it for the present
analysis. Suppose now that we have a group ℊ and that H is a subgroup of ℊ. Let
g be an arbitrary element of ℊ and let D(g) be a representation of g. Meanwhile, let
h be an arbitrary element of H . Then with 8 h 2 H , a collection of D(h) is a
representation of H . We write this relation as

D#H: ð19:83Þ

This representation is called a subduced representation of D to H . Table 19.3


shows a character table of irreducible representations of C2. In the present case, we
are thinking of C2(z) as C2; see Table 17.5 and Fig. 19.5. Then we have

B1u # C 2 = A and B3g # C 2 = B: ð19:84Þ

The expression of (19.84) is called a compatibility relation. Note that in (19.84)


C2 is not a symmetry operation, but means a subgroup of D2h. Thus, (19.81) is
reduced to

1  1 1
PðAÞ ϕ1 = χ ðAÞ ðgÞ gϕ1 = ½1 ∙ ϕ1 þ 1 ∙ ϕ2  = ðϕ1 þ ϕ2 Þ: ð19:85Þ
2 g 2 2

Also, we have
19.4 MO Calculations Based on π-Electron Approximation 777

1  1 1
PðBÞ ϕ1 = χ ðBÞ ðgÞ gϕ1 = ½1 ∙ ϕ1 þ ð- 1Þϕ2  = ðϕ1 - ϕ2 Þ: ð19:86Þ
2 g 2 2

The relations of (19.85) and (19.86) are essentially the same as (19.81) and
(19.82), respectively.
We can easily construct a character table (see Table 19.3). There should be two
irreducible representations. Regarding the totally symmetric representation, we
allocate 1 to each symmetry operation. For another representation we allocate 1 to
an identity element E and -1 to an element C2 so that the row and column of the
character table are orthogonal to each other.
Readers might well ask why we bother to make circuitous approaches to reaching
predictable results such as (19.81) and (19.82) or (19.85) and (19.86). This question
seems natural when we are dealing with a case where the number of basis vectors
(i.e., a dimension of the vector space) is small, typically 2 as in the present case. With
increasing dimension of the vector space, however, to seek and determine appropri-
ate SALCs become increasingly complicated and difficult. Under such circum-
stances, a projection operator is an indispensable tool to address the problems.
We have a two-dimensional secular equation to be solved such that

~ 11 - λS~11
H ~ 12 - λS~12
H
~ 21 - λ~S21 ~ 22 - λ~S22
= 0: ð19:87Þ
H H

In the above argument, 12 ðϕ1 þ ϕ2 Þ belongs to B1u and 12 ðϕ1 - ϕ2 Þ belongs to B3g.
Since they belong to different irreducible representations, we have

~ 12 = ~
H S12 = 0: ð19:88Þ

Thus, the secular equation (19.87) is reduced to

~ 11 - λ~S11
H
0 H
0
~ 22 - λ~S22 = 0: ð19:89Þ

As expected, (19.89) has automatically been solved to give a solution

~ 11 =~
λ1 = H S11 ~ 22 =~S22 :
and λ2 = H ð19:90Þ

The next step is to determine the energy eigenvalue of the molecule. Note here
that a role of SALCs is to determine a suitable irreducible representation that
corresponds to a “direction” of a vector. As the coefficient keeps the direction of a
vector unaltered, it would be of secondary importance. The final form of normalized
MOs can be decided last. That procedure includes the normalization of a vector.
Thus, we tentatively choose following functions for SALCs, i.e.,
778 19 Applications of Group Theory to Physical Chemistry

ξ 1 = ϕ1 þ ϕ2 and ξ 2 = ϕ 1 - ϕ2 :

Then, we have

~ 11 =
H ξ1 Hξ1 dτ = ðϕ1 þ ϕ2 Þ H ðϕ1 þ ϕ2 Þdτ

= ϕ1 Hϕ1 dτ þ ϕ1 Hϕ2 dτ þ ϕ2 Hϕ1 dτ þ ϕ2 Hϕ2 dτ

= H 11 þ H 12 þ H 21 þ H 22 = H 11 þ 2H 12 þ H 22 : ð19:91Þ

Similarly, we have

~ 22 =
H ξ2 Hξ2 dτ = H 11 - 2H 12 þ H 22 : ð19:92Þ

The last equality comes from the fact that we have chosen real functions for ϕ1
and ϕ2 as studied in Part I. Moreover, we have

H 11  ϕ1 Hϕ1 dτ = ϕ1 Hϕ1 dτ = ϕ2 Hϕ2 dτ = H 22 ,

H 12  ϕ1 Hϕ2 dτ = ϕ1 Hϕ2 dτ = ϕ2 Hϕ1 dτ = ϕ2 Hϕ1 dτ

= H 21 : ð19:93Þ

The first equation comes from the fact that both H11 and H22 are calculated using
the same 2pz atomic orbital of carbon. The second equation results from the fact that
H is Hermitian. Notice that both ϕ1 and ϕ2 are real functions. Following the
convention, we denote

α  H 11 = H 22 and β  H 12 , ð19:94Þ

where α is called Coulomb integral and β is said to be resonance integral. Then,


we have

~ 11 = 2ðα þ βÞ:
H ð19:95Þ

In a similar manner, we get


19.4 MO Calculations Based on π-Electron Approximation 779

~ 22 = 2ðα - βÞ:
H ð19:96Þ

Meanwhile, we have

~
S11 = hξ1 jξ1 i = ðϕ1 þ ϕ2 Þ ðϕ1 þ ϕ2 Þdτ = ðϕ1 þ ϕ2 Þ2 dτ

= ϕ21 dτ þ ϕ22 dτ þ 2 ϕ1 ϕ2 dτ = 2 þ 2 ϕ1 ϕ2 dτ, ð19:97Þ

where we used the fact that ϕ1 and ϕ2 have been normalized. Also following the
convention, we denote

S ϕ1 ϕ2 dτ = S12 , ð19:98Þ

where S is called overlap integral. Thus, we have

~
S11 = 2ð1 þ SÞ: ð19:99Þ

Similarly, we get

~
S22 = 2ð1 - SÞ: ð19:100Þ

Substituting (19.95) and (19.96) along with (19.99) and (19.100) for (19.90), we
get as the energy eigenvalue

αþβ α-β
λ1 = and λ2 = : ð19:101Þ
1þS 1-S

From (19.97), we get

jjξ1 jj = hξ1 jξ1 i = 2ð1 þ SÞ: ð19:102Þ

Thus, for one of MOs corresponding to an energy eigenvalue λ1, we get

j ξ1 i ϕ1 þ ϕ 2
Ψ1 = = : ð19:103Þ
ξ1 2ð1 þ SÞ

Also, we have
780 19 Applications of Group Theory to Physical Chemistry

jjξ2 jj = hξ2 jξ2 i = 2ð1 - SÞ: ð19:104Þ

For another MO corresponding to an energy eigenvalue λ2, we get

j ξ2 i ϕ1 - ϕ 2
Ψ2 = = : ð19:105Þ
ξ2 2ð 1 - SÞ

Note that both normalized MOs and energy eigenvalues depend upon whether we
ignore an overlap integral, as being the case with the simplest Hückel approximation.
Nonetheless, MO functional forms (in this case either ϕ1 + ϕ2 or ϕ1 - ϕ2) remain the
same regardless of the approximation levels about the overlap integral. Regarding
quantitative evaluation of α, β, and S, we will briefly mention it later.
Once we have decided symmetry of MO (or an irreducible representation which
the orbital belongs to) and its energy eigenvalue, we will be in a position to examine
various physicochemical properties of the molecule. One of them is an optical
transition within a molecule, particularly electric dipole transition. In most cases,
the most important transition is that occurring among the highest-occupied molec-
ular orbital (HOMO) and lowest-unoccupied molecular orbital (LUMO). In the case
of ethylene, those levels are depicted in Fig. 19.6. In a ground state (i.e., the most
stable state), two electrons are positioned in a B1u state (HOMO). An excited state is
assigned to B3g (LUMO). The ground state that consists only of fully occupied MOs
belongs to a totally symmetric representation. In the case of ethylene, the ground
state belongs to Ag accordingly.
If a photon is absorbed by a molecule, that molecule is excited by an energy of the
photon. In ethylene, this process takes place by exciting an electron from B1u to B3g.
The resulting electronic state ends up as an electron remaining in B1u and another
excited to B3g (a final state). Thus, the representation of the final sate electronic
configuration (denoted by Γ f) is described as

Γ f = B1u × B3g : ð19:106Þ

That is, the final excited state is expressed as a direct product of the states
associated with the optical transition. To determine the symmetry of the final sate,
we use (18.198) and (18.200). If Γ in (19.106) is reducible, using (18.200) we can
determine the number of times that individual representations take place. Calculating
χ ðB1u Þ ðgÞχ ðB3g Þ ðgÞ for each group element, we can readily get the result. Using a
character table, we have

Γ f = B2u : ð19:107Þ

Thus, we find that the transition is Ag ⟶ B2u, where Ag is called an initial state
electronic configuration and B2u is called a final state electronic configuration. This
transition is characterized by an electric dipole transition moment operator P. Here
19.4 MO Calculations Based on π-Electron Approximation 781

= ( )

= ( )

Fig. 19.6 HOMO and LUMO energy levels and their assignments of ethylene

we have an important physical quantity of transition matrix element. This quantity Tfi


is approximated by

T fi  Θf jεe  PjΘi , ð19:108Þ

where εe is a unit polarization vector of the electric field; Θi and Θf are the initial state
and final sate electronic configuration, respectively. The description (19.108) is in
parallel with (4.5). Notice that in (4.5) we dealt with a single electron system such as
a particle confined in a square-well potential, a sole one-dimensional harmonic
oscillator, and an electron in a hydrogen atom.
In the present case, however, we are dealing with a two-electron system, ethylene.
Consequently, we cannot describe Θf by a simple wave function, but must use a
more elaborated function. Nonetheless, when we discuss the optical transition of a
molecule, it is often the case that when we study optical absorption or optical
emission we first wish to know whether such a phenomenon truly takes place. In
such a case, qualitative prediction for this is of great importance. This can be done by
judging whether the integral (19.108) vanishes. If the integral does not vanish, the
relevant transition is said to be allowed. If, however, the integral vanishes, the
transition is called forbidden. In this context, a systematic approach based on
group theory is a powerful tool for this.
Let us consider optical absorption of ethylene. With the transition matrix element
for this, we have

T fi = Θf ðB2u Þ jεe  PjΘi ðAg Þ : ð19:109Þ


782 19 Applications of Group Theory to Physical Chemistry

Again, note that a closed-shell electronic configuration belong to a totally sym-


metric representation [1]. Suppose that the position vector x belongs to an irreducible
representation η. A necessary condition for (19.109) not to vanish is that DðηÞ × DðAg Þ
contains the irreducible representation DðB2u Þ . In the present case, all the representa-
tions are one-dimensional, and so we can use χ (ω) instead of D(ω), where ω shows an
arbitrary irreducible representation of D2h. This procedure is straightforward as seen
in Sect. 19.2.
However, if the character is real (it is the case with many symmetry groups and
with the point group D2h as well), the situation will be easier. Suppose in general that
we are examining whether a following matrix element vanishes:

ðβÞ ðαÞ
M fi = Φf jOðγÞ jΦi , ð19:110Þ

where α, β, and γ stand for irreducible representations and O is an appropriate


operator. In this case, (18.200) can be rewritten as

1 1
qα = χ ð α Þ ð gÞ  χ ð γ × β Þ ð gÞ = χ ðαÞ ðgÞχ ðγ × βÞ ðgÞ
n g n g

1 1
= χ ðαÞ ðgÞχ ðγ Þ ðgÞχ ðβÞ ðgÞ = χ ðγÞ ðgÞχ ðαÞ ðgÞχ ðβÞ ðgÞ
n g n g

1 1
= χ ðγ Þ ðgÞχ ðα × βÞ ðgÞ = χ ðγÞ ðgÞ χ ðα × βÞ ðgÞ = qγ : ð19:111Þ
n g n g

Consequently, the number of times that D(γ) appears in D(α × β) is identical to the
number of times that D(α) appears in D(γ × β). Thus, it suffices to examine whether
D(γ × β) contains D(α). In other words, we only have to examine whether qα ≠ 0 in
(19.111).
Thus, applying (19.111)–(19.109), we examine whether DðB2u × Ag Þ contains the
irreducible representation D(η) that is related to x. We easily get

B2u = B2u × Ag : ð19:112Þ

Therefore, if εe  P (or x) belongs to B2u, the transition is allowed. Consulting the


character table, we find that y belongs to B2u. In this case, in fact (19.111) reads as

1
qB2u = ½1 × 1 þ ð- 1Þ × ð- 1Þ þ 1 × 1 þ ð- 1Þ × ð- 1Þ
8
19.4 MO Calculations Based on π-Electron Approximation 783

þð- 1Þ × ð- 1Þ þ 1 × 1 þ ð- 1Þ × ð- 1Þ þ 1 × 1 = 1:

Equivalently, we simply write

B2u × B2u = Ag :

This means that if a light polarized along the y-axis is incident, i.e., εe is parallel to
the y-axis, the transition is allowed. In that situation, ethylene is said to be polarized
along the y-axis or polarized in the direction of the y-axis. As a molecular axis is
parallel to the y-axis, ethylene is polarized in the direction of the molecular axis. This
is often the case with aromatic molecules having a well-defined molecular long axis
such as ethylene.
We would examine whether ethylene is polarized along, e.g., the x-axis. From a
character table of D2h (see Table 19.2), x belongs to B3u. In that case, using (19.111)
we have

1
qB3u = ½1 × 1 þ ð- 1Þ × ð- 1Þ þ 1 × ð- 1Þ þ ð- 1Þ × 1
8

þð- 1Þ × ð- 1Þ þ 1 × 1 þ ð- 1Þ × 1 þ 1 × ð- 1Þ = 0:

This implies that B3u is not contained in B1u × B3g (=B2u).


The above results on the optical transitions are quite obvious. Once we get used to
using a character table, quick estimation will be done.

19.4.2 Cyclopropenyl Radical [1]

Let us think of another example, cyclopropenyl radical that has three resonant
structures (Fig. 19.7). It is a planar molecule and three carbon atoms form an
equilateral triangle. Hence, the molecule belongs to D3h symmetry (see Sect.
17.2). In the molecule three π-electrons extend vertically to the molecular plane
toward upper and lower directions. Suppose that cyclopropenyl radical is placed on
the xy-plane. Three pz atomic orbitals of carbons located at vertices of an equilateral
triangle are denoted by ϕ1, ϕ2 and ϕ3 in Fig. 19.8. The orbitals are numbered
clockwise so that the calculations can be consistent with the conventional notation
of a character table (vide infra). We assume that these π-orbitals take positive and
negative signs on the upper and lower sides of the plane of paper, respectively, with a
nodal plane lying on the xy-plane. The situation is similar to that of ethylene and the
problem can be treated in parallel to the case of ethylene.
As in the case of ethylene, we can choose ϕ1, ϕ2 and ϕ3 as real functions. We
construct basis vectors using these vectors. What we want to do to address the
problem is as follows: (i) We examine how ϕ1, ϕ2 and ϕ3 are transformed by the
784 19 Applications of Group Theory to Physical Chemistry

⟷ ⟷ ⋅
⋅ ⋅
Fig. 19.7 Three resonant structures of cyclopropenyl radical

Fig. 19.8 Three pz atomic


orbitals of carbons for y
cyclopropenyl radical that is
placed on the xy-plane. The
carbon atoms are located at
vertices of an equilateral
triangle. The atomic orbitals
are denoted by ϕ1, ϕ2, and
ϕ3

O x
z

symmetry operations of D3h. According to the analysis, we can determine what


irreducible representations SALCs should be assigned to. (ii) On the basis of
knowledge obtained in (i), we construct proper MOs.
In Table 19.4 we list a character table of D3h along with symmetry species. First
we examine traces (characters) of representation matrices. Similarly in the case of
ethylene, a subgroup C3 of D3h plays an essential role (vide infra). This subgroup
contains three group elements such that

C 3 = E, C3 , C 23 :

In the above, we use the same notation for the group name and group element, and
so we should be careful not to confuse them.
They are transformed as follows:

C3 ðzÞðϕ1 Þ = ϕ3 , C 3 ðzÞðϕ2 Þ = ϕ1 , and C 3 ðzÞðϕ3 Þ = ϕ2 ; ð19:113Þ


19.4 MO Calculations Based on π-Electron Approximation 785

Table 19.4 Character table D3h E 2C3 3C2 σh 2S3 3σ v


of D3h
A01 1 1 1 1 1 1 x2 + y2, z2
A02 1 1 -1 1 1 -1
E′ 2 -1 0 2 -1 0 (x, y); (x2 - y2, xy)
A001 1 1 1 -1 -1 -1
A002 1 1 -1 -1 -1 1 z
E′′ 2 -1 0 -2 1 0 (yz, zx)

C 23 ðzÞðϕ1 Þ = ϕ2 , C 23 ðzÞðϕ2 Þ = ϕ3 , and C 23 ðzÞðϕ3 Þ = ϕ1 : ð19:114Þ

Equation (19.113) can be combined into a following form:

ðϕ1 ϕ2 ϕ3 ÞC3 ðzÞ = ðϕ3 ϕ1 ϕ2 Þ: ð19:115Þ

Using a matrix representation, we have

0 1 0
C 3 ðzÞ = 0 0 1 : ð19:116Þ
1 0 0

In turn, (19.114) is expressed as

0 0 1
C 23 ðzÞ = 1 0 0 : ð19:117Þ
0 1 0

Both traces of (19.116) and (19.117) are zero.


Similarly let us check the representation matrices of other symmetry species. Of
these, e.g., for C2 related to the y-axis (see Fig. 19.8) we have

ðϕ1 ϕ2 ϕ3 ÞC 2 = ð- ϕ1 - ϕ3 - ϕ2 Þ: ð19:118Þ

Therefore,

-1 0 0
C2 = 0 0 -1 : ð19:119Þ
0 -1 0

We have a trace -1 accordingly. Regarding σ h, we have

-1 0 0
ðϕ1 ϕ2 ϕ3 Þσ h = ð- ϕ1 - ϕ2 - ϕ3 Þ and σ h = 0 -1 0 : ð19:120Þ
0 0 -1
786 19 Applications of Group Theory to Physical Chemistry

Table 19.5 Characters for individual symmetry transformations of 2pz orbitals in cyclopropenyl
radical
D3h E 2C3 3C2 σh 2S3 3σ v
Γ 3 0 -1 -3 0 1

Table 19.6 Character table C3 E C3 C 23 ε = exp (i2π/3)


of C3
A 1 1 1 z; x2 + y2, z2
E 1 ε ε (x, y); (x2 - y2, xy), (yz, zx)
1 ε ε

In this way we can determine the trace for individual symmetry transformations
of basis functions ϕ1, ϕ2, and ϕ3. We collect the results of characters of a reducible
representation Γ in Table 19.5. It can be reduced to a summation of irreducible
representations according to the procedures given in (18.81)–(18.83) and using a
character table of D3h (Table 19.4). As a result, we get

Γ = A2 00 þ E00 : ð19:121Þ

As in the case of ethylene, we make the best use of the information of a subgroup
C3 of D3h. Let us consider a subduced representation of D of D3h to C3. For this, in
Table 19.6 we show a character table of irreducible representations of C3. We can
readily construct this character table. There should be three irreducible representa-
tions. Regarding the totally symmetric representation, we allocate 1 to each sym-
metry operation. Hence, for other representations we allocate 1 to an identity element
E and two other triple roots of 1, i.e., ε and ε [where ε = exp (i2π/3)] to an element
C3 and C32 as shown so that the row and column vectors of the character table are
orthogonal to each other.
Returning to the construction of the subduced representation, we have

A2 00 # C 3 = A and E 00 # C 3 = 2E: ð19:122Þ

Then, corresponding to (18.147) we have

1  1
PðAÞ ϕ1 = χ ðAÞ ðgÞ gϕ1 = ½1 ∙ ϕ1 þ 1 ∙ ϕ2 þ 1 ∙ ϕ3 
3 g 3

1
= ðϕ1 þ ϕ2 þ ϕ3 Þ: ð19:123Þ
3

Also, we have
19.4 MO Calculations Based on π-Electron Approximation 787


P½E  ϕ1 = χ ½E  ðgÞ gϕ1 = ½1 ∙ ϕ1 þ ε ϕ3 þ ðε Þ ϕ2 
ð 1Þ 1 ð 1Þ 1
3 g 3

1
= ðϕ þ εϕ2 þ ε ϕ3 Þ: ð19:124Þ
3 1

Also, we get


P½E  ϕ1 = χ ½E  ðgÞ gϕ1 = ½1 ∙ ϕ1 þ ðε Þ ϕ3 þ ε ϕ2 
ð 2Þ 1 ð 2Þ 1
3 g
3

1
= ðϕ þ ε ϕ2 þ εϕ3 Þ: ð19:125Þ
3 1

Here is the best place to mention an eigenvalue of a symmetry operator. Let us


designate SALCs as follows:

ξ1 = ϕ1 þ ϕ2 þ ϕ3 , ξ2 = ϕ1 þ εϕ2 þ ε ϕ3 , and ξ3 = ϕ1 þ ε ϕ2 þ εϕ3 : ð19:126Þ

Let us choose C3 for a symmetry operator. Then we have

C 3 ð ξ 1 Þ = C 3 ð ϕ1 þ ϕ 2 þ ϕ3 Þ = C 3 ϕ1 þ C 3 ϕ2 þ C 3 ϕ3 = ϕ3 þ ϕ1
þ ϕ2 = ξ 1 , ð19:127Þ

where for the second equality we used the fact that C3 is a linear operator. That is,
regarding a SALC ξ1, an eigenvalue of C3 is 1.
Similarly, we have

C 3 ðξ2 Þ = C3 ðϕ1 þ εϕ2 þ ε ϕ3 Þ = C 3 ϕ1 þ εC 3 ϕ2 þ ε C 3 ϕ3

= ϕ3 þ εϕ1 þ ε ϕ2 = εðϕ1 þ εϕ2 þ ε ϕ3 Þ = εξ2 : ð19:128Þ

Furthermore, we get

C3 ðξ3 Þ = ε ξ3 : ð19:129Þ

Thus, we find that regarding SALCs ξ2 and ξ3, eigenvalues of C3 are ε and ε,
respectively. These pieces of information imply that if we appropriately choose
proper functions for basis vectors, a character of a symmetry operation for a
one-dimensional representation is identical to an eigenvalue of the said symmetry
operation (see Table 19.6).
788 19 Applications of Group Theory to Physical Chemistry

Regarding the last parts of the calculations, we follow the procedures described in
the case of ethylene. Using the above functions ξ1, ξ2, and ξ3, we construct the
secular equation such that

~ 11 - λ~S11
H
~ 22 - λ~S22
H = 0: ð19:130Þ
~ 33 - λ~S33
H

Since we have obtained three SALCs that are assigned to individual irreducible
representations A, E(1), and E(2), these SALCs span the representation space V3. This
makes off-diagonal elements of the secular equation vanish and it is simplified as in
(19.130). Here, we have

~ 11 =
H ξ1 Hξ1 dτ = ðϕ1 þ ϕ2 þ ϕ3 Þ H ðϕ1 þ ϕ2 þ ϕ3 Þdτ

= 3ðα þ 2βÞ, ð19:131Þ

where we used the same α and β as defined in (19.94). Strictly speaking, α and β
appearing in (19.131) should be slightly different from those of (19.94), because a
Hamiltonian is different. This approximation, however, would be enough for the
present studies. In a similar manner, we get.

~ 22 =
H ~ 33 =
ξ2 Hξ2 dτ = 3ðα - βÞ and H ξ3 Hξ3 dτ = 3ðα - βÞ: ð19:132Þ

Meanwhile, we have

~
S11 = hξ1 jξ1 i = ðϕ1 þ ϕ2 þ ϕ3 Þ ðϕ1 þ ϕ2 þ ϕ3 Þdτ = ðϕ1 þ ϕ2 þ ϕ3 Þ2 dτ

= 3ð1 þ 2SÞ: ð19:133Þ

Similarly, we get

S22 = ~
~ S33 = 3ð1 - SÞ: ð19:134Þ

Readers are urged to verify (19.133) and (19.134).


Substituting (19.131) through (19.134) for (19.130), we get as the energy
eigenvalue
19.4 MO Calculations Based on π-Electron Approximation 789

α þ 2β α-β α-β
λ1 = ,λ = , and λ3 = : ð19:135Þ
1 þ 2S 2 1 - S 1-S

Notice that two MOs belonging to E′′ have the same energy. These MOs are said
to be energetically degenerate. This situation is characteristic of a two-dimensional
representation. Actually, even though the group C3 has only one-dimensional
representations (because it is an Abelian group), the two complex conjugate repre-
sentations labelled E behave as if they were a two-dimensional representation
[2]. We will again encounter the same situation in a next example, benzene.
From (19.126), we get

jjξ1 jj = ξ1 jξ1 = 3ð1 þ 2SÞ: ð19:136Þ

Thus, as one of MOs corresponding to an energy eigenvalue λ1; i.e., Ψ 1, we get

j ξ1 i ϕ1 þ ϕ2 þ ϕ 3
Ψ1 = = : ð19:137Þ
ξ1 3ð1 þ 2SÞ

Also, we have

ξ2 = hξ2 jξ2 i = 3ð1 - SÞ and ξ3 = hξ3 jξ3 i = 3ð1 - SÞ: ð19:138Þ

Thus, for another MO corresponding to an energy eigenvalue λ2, we get

j ξ2 i ϕ1 þ εϕ2 þ ε ϕ3
Ψ2 = = : ð19:139Þ
ξ2 3ð 1 - SÞ

Also, with a MO corresponding to λ3 (=λ2), we have

j ξ3 i ϕ1 þ ε ϕ2 þ εϕ3
Ψ3 = = : ð19:140Þ
ξ3 3ð 1 - SÞ

Equations (19.139) and (19.140) include complex numbers, and so it is inconve-


nient to computer analysis. In that case, we can convert it to real numbers. In Part III,
we examined properties of unitary transformations. Since the unitary transformation
keeps a norm of a vector unchanged, this is suited to our present purpose. This can be
done using a following unitary matrix U:
790 19 Applications of Group Theory to Physical Chemistry

1 i
p - p
2 2
U= : ð19:141Þ
1 i
p p
2 2

Then we have

1 i
ðΨ 2 Ψ 3 ÞU = p ðΨ 2 þ Ψ 3 Þ p ð- Ψ 2 þ Ψ 3 Þ : ð19:142Þ
2 2

Thus, defining Ψ~2 and Ψ~3 as

1 i
Ψ~2 = p ðΨ 2 þ Ψ 3 Þ and Ψ~3 = p ð- Ψ 2 þ Ψ 3 Þ, ð19:143Þ
2 2

we get

1
Ψ~2 = ½2ϕ1 þ ðε þ ε Þϕ2 þ ðε þ εÞϕ3 
6 ð 1 - SÞ

1
= ð2ϕ1 - ϕ2 - ϕ3 Þ: ð19:144Þ
6ð 1 - SÞ

Also, we have

i i p p
Ψ~3 = ½ðε - εÞϕ2 þ ðε - ε Þϕ3  = - i 3 ϕ2 þ i 3 ϕ 3
6ð 1 - S Þ 6ð 1 - SÞ

1
= ðϕ2 - ϕ3 Þ: ð19:145Þ
2 ð 1 - SÞ

Thus, we have successfully converted complex functions to real functions. In the


above unitary transformation, notice that a norm of the vectors remains unchanged
before and after the unitary transformation.
As cyclopropenyl radical has three π electrons, two occupy the lowest energy
level of A′′. Another electron occupies a level E′′. Since this level possesses an
energy higher than α, the electron occupying this level is anticipated to be unstable.
Under such a circumstance, a molecule tends to lose the said electron so as to be a
cation. Following the argument given in the previous case of ethylene, it is easy to
make sure that the allowed transition of the cyclopropenyl radical takes place when
19.4 MO Calculations Based on π-Electron Approximation 791

the light is polarized parallel to the molecular plane (i.e., the xy-plane in Fig. 19.8).
The proof is left for readers as an exercise. This polarizing feature is typical of planar
molecules with high molecular symmetry.

19.4.3 Benzene

Benzene has structural formula which is shown in Fig. 19.9. It is a planar molecule
and six carbon atoms form a regular hexagon. Hence, the molecule belongs to D6h
symmetry. In the molecule six π-electrons extend vertically to the molecular plane
toward upper and lower directions as in the case of ethylene and cyclopropenyl
radical. This is a standard illustration of quantum chemistry and dealt with in many
textbooks. As in the case of ethylene and cyclopropenyl radical, the problem can be
treated similarly.
As before, six equivalent pz atomic orbitals of carbon are denoted by ϕ1 to ϕ6 in
Fig. 19.10. These vectors or their linear combinations span a six-dimensional
representation space. We construct basis vectors using these vectors. Following
the previous procedures, we construct proper SALC orbitals along with MOs.
Similarly as before, a subgroup C6 of D6h plays an essential role. This subgroup
contains six group elements such that

C 6 = E, C 6 , C 3 , C2 , C 23 , C 56 : ð19:146Þ

Taking C6(z) as an example, we have

ðϕ1 ϕ2 ϕ3 ϕ4 ϕ5 ϕ6 ÞC 6 ðzÞ = ðϕ6 ϕ1 ϕ2 ϕ3 ϕ4 ϕ5 Þ: ð19:147Þ

Using a matrix representation, we have

Fig. 19.9 Structural


formula of benzene. It
belongs to D6h symmetry
792 19 Applications of Group Theory to Physical Chemistry

Fig. 19.10 Six equivalent


pz atomic orbitals of carbon
y
of benzene

O x
z

Table 19.7 Characters for individual symmetry transformations of 2pz orbitals in benzene
D6h E 2C6 2C3 C2 3C 02 3C 002 i 2S3 2S6 σh 3σ d 3σ v
Γ 6 0 0 0 -2 0 0 0 0 -6 0 2

0 1 0 0 0 0
0 0 1 0 0 0
C6 ðzÞ = 0
0
0
0
0
0
1
0
0
1
0
0
: ð19:148Þ
0 0 0 0 0 1
1 0 0 0 0 0

Once again, we can determine the trace for individual symmetry transformations
belonging to D6h. We collect the results in Table 19.7. The representation is
reducible and this is reduced as follows using a character table of D6h
(Table 19.8). As a result, we get

Γ = A2u þ B2g þ E1g þ E 2u : ð19:149Þ

As a subduced representation of D of D6h to C6, we have

A2u # C 6 = A, B2g # C 6 = B, E 1g # C 6 = 2E1 , E 2u # C 6 = 2E2 : ð19:150Þ

Here, we used Table 19.9 that shows a character table of irreducible representa-
tions of C6. Following the previous procedures, as SALCs we have
19.4 MO Calculations Based on π-Electron Approximation 793

Table 19.8 Character table of D6h


D6h E 2C6 2C3 C2 3C 02 3C 002 i 2S3 2S6 σh 3σ d 3σ v
A1g 1 1 1 1 1 1 1 1 1 1 1 1 x2 + y2, z2
A2g 1 1 1 1 -1 -1 1 1 1 1 -1 -1
B1g 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1
B2g 1 -1 1 -1 -1 1 1 -1 1 -1 -1 1
E1g 2 1 -1 -2 0 0 2 1 -1 -2 0 0 (yz, zx)
E2g 2 -1 -1 2 0 0 2 -1 -1 2 0 0 (x2 - y2, xy)
A1u 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1
A2u 1 1 1 1 -1 -1 -1 -1 -1 -1 1 1 z
B1u 1 -1 1 -1 1 -1 -1 1 -1 1 -1 1
B2u 1 -1 1 -1 -1 1 -1 1 -1 1 1 -1
E1u 2 1 -1 -2 0 0 -2 -1 1 2 0 0 (x, y)
E2u 2 -1 -1 2 0 0 -2 1 1 -2 0 0

Table 19.9 Character table of C6


C6 E C6 C3 C2 C 23 C 56 ε = exp (iπ/3)
A 1 1 1 1 1 1 z; x2 + y2, z2
B 1 -1 1 -1 1 -1
E1 1 ε - ε -1 -ε ε (x, y); (yz, zx)
1 ε -ε -1 - ε ε
E2 1 - ε -ε 1 - ε -ε (x2 - y2, xy)
1 -ε -ε 
1 -ε - ε

6PðAÞ ϕ1  ξ1 = ϕ1 þ ϕ2 þ ϕ3 þ ϕ4 þ ϕ5 þ ϕ6 ,

6PðBÞ ϕ1  ξ2 = ϕ1 - ϕ2 þ ϕ3 - ϕ4 þ ϕ5 - ϕ6 ,

P½E1  ϕ1  ξ3 = ϕ1 þ εϕ2 - ε ϕ3 - ϕ4 - εϕ5 þ ε ϕ6 ,


ð 1Þ

P½E1  ϕ1  ξ4 = ϕ1 þ ε ϕ2 - εϕ3 - ϕ4 - ε ϕ5 þ εϕ6 ,


ð 2Þ

P½E2  ϕ1  ξ5 = ϕ1 - ε ϕ2 - εϕ3 þ ϕ4 - ε ϕ5 - εϕ6 ,


ð 1Þ

P½E2  ϕ1  ξ6 = ϕ1 - εϕ2 - ε ϕ3 þ ϕ4 - εϕ5 - ε ϕ6 ,


ð 2Þ
ð19:151Þ

where ε = exp (iπ/3).


794 19 Applications of Group Theory to Physical Chemistry

Correspondingly, we have a diagonal secular equation of a sixth-order such that

~ 11 - λ~
H S11

~ 22 - λ~S22
H

~ 33 - λ~S33
H
¼ 0: ð19:152Þ
~ 44 - λ~S44
H
~ 55 - λ~S55
H
~ 66 - λ~S66
H

Here, we have, for example,

~ 11 =
H ξ1 Hξ1 dτ = hξ1 jHξ1 i = 6ðα þ 2β þ 2β0 þ β00 Þ: ð19:153Þ

In (19.153), we used the same α and β defined in (19.94) as in the case of


cyclopropenyl radical. That is, α is a Coulomb integral and β is a resonance integral
between two adjacent 2 pz orbital of carbon. Meanwhile, β′ is a resonance integral
between orbitals of “meta” positions such as ϕ1 and ϕ3. A quantity β′′ is a resonance
integral between orbitals of “para” positions such as ϕ1 and ϕ4. It is unfamiliar to
include such kind resonance integrals of β′ and β′′ at a simple π-electron approxi-
mation level. To ignore such resonance integrals is because of a practical purpose to
simplify the calculations. However, we have no reason to exclude them. Or rather,
the use of appropriate SALCs makes it feasible to include β′ and β′′.
In a similar manner, we get

~ 22 =
H ξ2 Hξ2 dτ = hξ2 jHξ2 i = 6ðα - 2β þ 2β0 - β00 Þ,

~ 33 = H
H ~ 44 = 6ðα þ β - β0 - β00 Þ,

~ 55 = H
H ~ 66 = 6ðα - β - β0 þ β00 Þ: ð19:154Þ

Meanwhile, we have

~
S11 = hξ1 jξ1 i = 6ð1 þ 2S þ 2S0 þ S00 Þ,

~
S22 = hξ2 jξ2 i = 6ð1 - 2S þ 2S0 - S00 Þ,

~
S33 = ~
S44 = 6ð1 þ S - S0 - S00 Þ,
19.4 MO Calculations Based on π-Electron Approximation 795

~
S55 = ~
S66 = 6ð1 - S - S0 þ S00 Þ, ð19:155Þ

where S, S′, and S′′ are overlap integrals between the ortho, meta, and para positions,
respectively. Substituting (19.153) through (19.155) for (19.152), the energy eigen-
values are readily obtained as

α þ 2β þ 2β0 þ β00 α - 2β þ 2β0 - β00 α þ β - β0 - β00


λ1 = 0 00 , λ2 = 0 00 , λ3 = λ4 = ,
1 þ 2S þ 2S þ S 1 - 2S þ 2S - S 1 þ S - S0 - S00

α - β - β0 þ β00
λ5 = λ6 = : ð19:156Þ
1 - S - S0 þ S00

Notice that two MOs ξ3 and ξ4 as well as ξ5 and ξ6 are degenerate. As can be seen,
λ3 and λ4 are doubly degenerate. So are λ5 and λ6.
From (19.151), we get, for instance,

jjξ1 jj = hξ1 jξ1 i = 6ð1 þ 2S þ 2S0 þ S00 Þ: ð19:157Þ

Thus, for one of the normalized MOs corresponding to an energy eigenvalue λ1,
we get

j ξ1 i ϕ1 þ ϕ2 þ ϕ3 þ ϕ 4 þ ϕ5 þ ϕ6
Ψ1 = = : ð19:158Þ
ξ1 6ð1 þ 2S þ 2S0 þ S00 Þ

Following the previous examples, we have other normalized MOs. That is,
we have

j ξ2 i ϕ1 - ϕ2 þ ϕ3 - ϕ4 þ ϕ5 - ϕ6
Ψ2 = = ,
ξ2 6ð1 - 2S þ 2S0 - S00 Þ

j ξ3 i ϕ1 þ εϕ2 - ε ϕ3 - ϕ4 - εϕ5 þ ε ϕ6
Ψ3 = = ,
ξ3 6ð1 þ S - S0 - S00 Þ

j ξ4 i ϕ1 þ ε ϕ2 - εϕ3 - ϕ4 - ε ϕ5 þ εϕ6
Ψ4 = = ,
ξ4 6ð1 þ S - S0 - S00 Þ
796 19 Applications of Group Theory to Physical Chemistry

Fig. 19.11 Energy diagram ( )


and MO assignments of
benzene. Energy
eigenvalues λ1 to λ6 are = ( )
given in (19.156)

= ( )

( )

j ξ5 i ϕ1 - ε ϕ2 - εϕ3 þ ϕ4 - ε ϕ5 - εϕ6
Ψ5 = = ,
ξ5 6ð1 - S - S0 þ S00 Þ

j ξ6 i ϕ1 - εϕ2 - ε ϕ3 þ ϕ4 - εϕ5 - ε ϕ6
Ψ6 = = : ð19:159Þ
ξ6 6ð1 - S - S0 þ S00 Þ

The eigenfunction Ψ i (1 ≤ i ≤ 6) corresponds to the eigenvalue λi. As in the case


of cyclopropenyl radical, Ψ 3 and Ψ 4 can be transformed to Ψ~3 and Ψ~4 , respectively,
through a unitary matrix of (19.141). Thus, we get

2ϕ þ ϕ2 - ϕ3 - 2ϕ4 - ϕ5 þ ϕ6
Ψ~3 = 1 ,
12ð1 þ S - S0 - S00 Þ

ϕ2 þ ϕ3 - ϕ5 - ϕ6
Ψ~4 = p : ð19:160Þ
2 1 þ S - S0 - S00

Similarly, transforming Ψ 5 and Ψ 6 to Ψ~5 and Ψ~6 , respectively, we have

2ϕ - ϕ2 - ϕ3 þ 2ϕ4 - ϕ5 - ϕ6
Ψ~5 = 1 ,
12ð1 - S - S0 þ S00 Þ

ϕ2 - ϕ3 þ ϕ5 - ϕ6
Ψ~6 = p : ð19:161Þ
2 1 - S - S0 þ S00

Figure 19.11 shows an energy diagram and MO assignments of benzene.


19.4 MO Calculations Based on π-Electron Approximation 797

A major optical transition takes place among HOMO (E1g) and LUMO (E2u)
levels. In the case of optical absorption, an initial electronic configuration is assigned
to the totally symmetric representation A1g and the symmetry of the final electronic
configuration is described as

Γ = E1g × E2u :

Therefore, a transition matrix element is expressed as

ΦðE1g × E2u Þ jεe  PjΦðA1g Þ , ð19:162Þ

where ΦðA1g Þ stands for the totally symmetric ground state electronic configuration;
ΦðE1g × E2u Þ denotes an excited state electronic configuration represented by a direct-
product representation. This representation is reducible and expressed as a direct
sum of irreducible representations such that

Γ = B1u þ B2u þ E 1u : ð19:163Þ

Notice that unlike ethylene a direct-product representation associated with the


final state is reducible.
To examine whether (19.162) is non-vanishing, as in the case of (19.109) we
estimate whether a direct-product representation
A1g × E1g × E2u = E1g × E2u = B1u + B2u + E1u contains an irreducible representation
which εe  P belongs to. Consulting a character table of D6h, we find that x and
y belong to an irreducible representation E1u and that z belongs to A2u. Since the
direct sum Γ in (19.163) contains E1u, benzene is expected to be polarized along both
x- and y-axes (see Fig. 19.10). Since (19.163) does not contain A2u, the transition
along the z-axis is forbidden. Accordingly, the transition takes place when the light is
polarized parallel to the molecular plane (i.e., the xy-plane). This is a common
feature among planar aromatic molecules including benzene and cyclopropenyl
radical. On the other hand, A2u is not contained in Γ, and so we do not expect the
optical transition to occur in the direction of the z-axis.

19.4.4 Allyl Radical [1]

We revisit the allyl radical and perform its MO calculations. As already noted,
Tables 18.1 and 18.2 of Example 18.1 collected representation matrices of individual
symmetry operations in reference to the basis vectors comprising three atomic
orbitals of allyl radical. As usual, we examine traces (or characters) of those
matrices. Table 19.10 collects them. The representation is readily reduced according
to the character table of C2v (see Table 18.4) so that we have
798 19 Applications of Group Theory to Physical Chemistry

Table 19.10 Characters for individual symmetry transformations of 2pz orbitals in allyl radical
C2v E C2(z) σ v(zx) σ 0v ðyzÞ
Γ 3 -1 1 -3

Γ = A2 þ 2B1 : ð19:164Þ

We have two SALC orbitals that belong to the same irreducible representation of
B1. As noted in Sect. 19.3, the orbitals obtained by a linear combination of these two
SALCs belong to B1 as well. Such a linear combination is given by a unitary
transformation. In the present case, it is convenient to transform the basis vectors
two times. The first transformation is carried out to get SALCs and the second one
will be done in the process of solving a secular equation using the SALCs. Sche-
matically showing the procedures, we have

ϕ1 Ψ1 Φ1
ϕ2 → Ψ2 → Φ2 ,
ϕ3 Ψ3 Φ3

where ϕ1, ϕ2, ϕ3 show the original atomic orbitals; Ψ 1, Ψ 2, Ψ 3 the SALCs; Φ1, Φ2,
Φ3 the final Mos. Thus, the three sets of vectors are connected through unitary
transformations.
Starting with ϕ1 and following the previous cases, we have, e.g.,

1 
PðB1 Þ ϕ1 = χ ðB1 Þ ðgÞ gϕ1 = ϕ1 :
4 g

Also starting with ϕ2, we have

1  1
PðB1 Þ ϕ2 = χ ðB1 Þ ðgÞ gϕ2 = ðϕ2 þ ϕ3 Þ:
4 g 2

Meanwhile, we get

PðA2 Þ ϕ1 = 0,

1  1
PðA2 Þ ϕ2 = χ ðA2 Þ ðgÞ gϕ2 = ðϕ - ϕ3 Þ:
4 g 2 2

Thus, we recovered the results of Example 18.1. Notice that ϕ1 does not partic-
ipate in A2, but take part in B1 by itself.
Normalized SALCs are given as follows:
19.4 MO Calculations Based on π-Electron Approximation 799

Ψ 1 = ϕ1 , Ψ 2 = ðϕ2 þ ϕ3 Þ= 2ð1 þ S0 Þ, Ψ 3 = ðϕ2 - ϕ3 Þ= 2ð1 - S0 Þ,

where we define S′  ϕ2ϕ3dτ. If Ψ 1, Ψ 2, and Ψ 3 belonged to different irreducible


representations (as in the cases of previous three examples of ethylene,
cyclopropenyl radical, and benzene), the secular equation would be fully reduced
to a form of (19.66). In the present case, however, Ψ 1 and Ψ 2 belong to the same
irreducible representation B1, making the situation a bit complicated. Nonetheless,
we can use (19.61) and the secular equation is “partially” reduced.
Defining

H jk = Ψ j HΨ k dτ and Sjk = Ψ j Ψ k dτ,

we have a following secular equation the same form as (19.61):

det H jk - λSjk = 0:

More specifically, we have

2
α-λ ðβ - SλÞ 0
1 þ S′
α þ β′
2 -λ 0 = 0,
ðβ - SλÞ 1 þ S′
1 þ S′
α-β′
0 0 -λ
1 - S′

where α, β, and S are similarly defined as (19.94) and (19.98). The quantities β′ is
defined as

β0  ϕ2 Hϕ3 dτ:

Thus, the secular equation is separated into the following two:

2
α-λ ðβ - SλÞ
1 þ S′ α-β′
=0 and - λ = 0: ð19:165Þ
2 α þ β′ 1 - S′
ðβ - SλÞ -λ
1 þ S′ 1 þ S′

The second equation immediately gives


800 19 Applications of Group Theory to Physical Chemistry

α - β0
λ= :
1 - S0

The first equation of (19.165) is somewhat complicated, and so we adopt the next
approximation. That is,

S0 = β0 = 0: ð19:166Þ

This approximation is justified, because two carbon atoms C2 and C3 are pretty
remote, and so the interaction between them is likely to be weak enough. Thus, we
rewrite the first equation of (19.165) as
p
p α-λ 2ðβ - SλÞ
= 0: ð19:167Þ
2ðβ - SλÞ α-λ

Moreover, we assume that since S is a small quantity compared to 1, a square term


of S2 ≪ 1. Hence, we ignore S2.
Using the approximation of (19.166) and assuming S2 ≈ 0, from (19.167) we
have a following quadratic equation:

λ2 - 2ðα - 2βSÞλ þ α2 - 2β2 = 0:

Solving this equation, we have

p 2αS p αS
λ = α - 2βS ± 2β 1- ≈ α - 2βS ± 2β 1 - ,
β β

where the last approximation is based on

p 1
1-x≈1- x
2

with a small quantity x. Thus, we get


p p p p
λL ≈ α þ 2β 1 - 2S and λH ≈ α - 2β 1þ 2S , ð19:168Þ

where λL < λH. To determine the corresponding eigenfunctions Φ (i.e., MOs), we use
a linear combination of two SALCs Ψ 1 and Ψ 2. That is, putting

Φ = c1 Ψ 1 þ c2 Ψ 2 ð19:169Þ

and from (19.167), we obtain


19.4 MO Calculations Based on π-Electron Approximation 801

p
ðα - λÞc1 þ 2ðβ - SλÞc2 = 0:

Thus, with λ = λL we get c1 = c2 and for λ = λH we get c1 = - c2. Consequently,


as a normalized eigenfunction ΦL corresponding to λL we get

1 1 p - 1=2 p
ΦL = p ð Ψ 1 þ Ψ 2 Þ = 1 þ 2S 2 ϕ1 þ ϕ2 þ ϕ3 : ð19:170Þ
2 2

Likewise, as a normalized eigenfunction ΦH corresponding to λH we get

1 1 p - 1=2 p
ΦH = p ð- Ψ 1 þ Ψ 2 Þ = 1 - 2S - 2ϕ1 þ ϕ2 þ ϕ3 : ð19:171Þ
2 2

As another eigenvalue λ0 and corresponding eigenfunction Φ0, we have

1
λ0 ≈ α and Φ0 = p ðϕ2 - ϕ3 Þ, ð19:172Þ
2

where we have λL < λ0 < λH. The eigenfunction Φ0 does not participate in chemical
bonding and, hence, is said to be a non-bonding orbital.
It is worth noting that eigenfunctions ΦL, Φ0, and ΦH have the same function
forms as those obtained by the simple Hückel theory that ignores the overlap
integrals S. It is because the interaction between ϕ2 and ϕ3 that construct SALCs
of Ψ 2 and Ψ 3 is weak.
Notice that within the framework of the simple Hückel theory, two sets of basis
vectors j ϕ1 i, p12 j ϕ2 þ ϕ3 i, p12 j ϕ2 - ϕ3 i and jΦLi, jΦHi, jΦ0i are connected by a
following unitary matrix V:

1 1
ðjΦL i jΦH i jΦ0 iÞ ¼ j ϕ1 i p j ϕ2 þ ϕ 3 i p j ϕ2 - ϕ 3 i V
2 2
1 1
p - p 0
2 2
1 1
¼ j ϕ1 i p j ϕ2 þ ϕ 3 i p j ϕ2 - ϕ 3 i 1 1 : ð19:173Þ
2 2 p p 0
2 2
0 0 1

Although both SALCs Ψ 1 and Ψ 2 belong to the same irreducible representation


B1, they are not orthogonal. As can be seen from (19.170) and (19.171), however, we
find that ΦL and ΦH that are sought by solving the secular equation (19.167) have
been mutually orthogonal. Starting from jϕ1i, jϕ2i, and jϕ3i of Example 18.1, we
reached jΦLi, jΦHi, and jΦ0i via two-step unitary transformations (18.40) and
(19.173). The combined unitary transformations W = UV are unitary again. That
is, we have
802 19 Applications of Group Theory to Physical Chemistry

(a) (b) (c)


( ) − 2 )(1 + 2

( )

( ) + 2 )(1 − 2

Fig. 19.12 Electronic configurations and symmetry species of individual eigenstates along with
their corresponding energy eigenvalues for the allyl cation. (a) Ground state. (b) First excited state.
(c) Second excited state

Fig. 19.13 Geometry and


position vectors of carbon
C1
atoms of the allyl cation.
The origin is located at the
center of a line segment r1
connecting C2 and C3
(r2 + r3 = 0) C3 C2
r3 O r2

1 1
p -p 0
2 2
1 1 1
j ϕ1 i j ϕ2 i j ϕ3 i UV = j ϕ1 i j ϕ2 i j ϕ3 i p
2 2 2
1 1 1
-p
2 2 2

= ðj ΦL i j ΦH i jΦ0 iÞ: ð19:174Þ

The optical transition of allyl radical represents general features of the optical
transitions of molecules. To make a story simple, let us consider a case of allyl
cation. Figure 19.12 shows the electronic configurations together with symmetry
species of individual eigenstates and their corresponding energy eigenvalues for the
allyl cation. In Fig. 19.13, we redraw its geometry where the origin is located at the
center of a line segment connecting C2 and C3 (r2 + r3 = 0). Major optical transitions
(or optical absorption) are ΦL → Φ0 and ΦL → ΦH.
(i) ΦL → Φ0: In this case, following (19.109) the transition matrix element Tfi is
described by

T fi = Θf ðB1 × A2 Þ jεe  PjΘi ðA1 Þ : ð19:175Þ


19.4 MO Calculations Based on π-Electron Approximation 803

In the above equation, we designate the irreducible representation of eigenstates


according to (19.164). Therefore, the symmetry of the final electronic configuration
is described as

Γ = B1 × A2 = B2 :

Since a direct product of the initial electronic configuration [Θi ðA1 Þ ] and the final
configuration Θf ðB1 × A2 Þ is B1 × A2 × A1 = B2. Then, if εe  P belongs to the same
irreducible representation B2, the associated optical transition should be allowed.
Consulting a character table of C2v, we find that y belongs to B2. Thus, the allowed
optical transition is polarized along the y-direction (see Fig. 18.1).
(ii) ΦL → ΦH: In parallel with the above case, Tfi is described by

T fi = Θf ðB1 × B1 Þ jεe  PjΘi ðA1 Þ : ð19:176Þ

Thus, the transition is characterized by

A1 → A1 ,

where the former A1 indicates the electronic ground state and the latter A1 indicates
the excited state given by B1 × B1 = A1. The direct product of them is simply
described as B1 × B1 × A1 = A1. Consulting a character table again, we find that
z belongs to A1. This implies that the allowed transition is polarized along the z-
direction (see Fig. 18.1).
Next, we investigate the above optical transition in a semi-quantitative manner. In
principle, the transition matrix element Tfi should be estimated from (19.175) and
(19.176) that use electronic configurations of two-electron system. Nonetheless, a
formulation of (4.7) of Sect. 4.1 based upon one-electron states well serves our
present purpose. For ΦL → Φ0 transition, we have

T fi ðΦL → Φ0 Þ = Φ0  ðεe  PÞΦL dτ = eεe  Φ0  rΦL dτ

1 p 1
= eεe  2ϕ1 þ ϕ2 þ ϕ3 r p ðϕ2 - ϕ3 Þ dτ
2 2

eε p p
= pe  2ϕ1 rϕ2 - 2ϕ1 rϕ3 þ ϕ2 rϕ2 þ ϕ3 rϕ2 - ϕ3 rϕ3 dτ
2 2
804 19 Applications of Group Theory to Physical Chemistry


≈ pe  ðϕ2 rϕ2 - ϕ3 rϕ3 Þ dτ
2 2

eε eε eε
≈ pe  r2 jϕ2 j2 dτ - r3 jϕ3 j2 dτ = pe  ðr2 - r3 Þ = p e  r2 ,
2 2 2 2 2

where with the first near equality we ignored integrals ϕirϕjdτ (i ≠ j); with the last
near equality r ≈ r2 or r3. For these approximations, we assumed that an electron
density is very high near C2 or C3 with ignorable density at a place remote from
them. Choosing εe for the direction of r2, we get

e
T fi ðΦL → Φ0 Þ ≈ p jr2 j:
2

With ΦL → ΦH transition, similarly we have

eεe eε
T fi ðΦL → ΦH Þ = ΦH  εe  PΦL dτ ≈  ðr3 - 2r1 þ r2 Þ = - e  r1 :
4 2

Choosing εe for the direction of r1, we get

e
T fi ðΦL → ΦH Þ ≈ jr j:
2 1

Transition
p probability is proportional to a square of an absolute value of Tfi. Using
j r2 j ≈ 3 j r1 j, we have

2 e2 3e2 2
T fi ðΦL → Φ0 Þ ≈ jr2 j2 = jr j ,
2 2 1

2 e2
T fi ðΦL → ΦH Þ ≈ jr j2 :
4 1

Thus, we obtain

2 2
T fi ðΦL → Φ0 Þ ≈ 6 T fi ðΦL → ΦH Þ : ð19:177Þ

Thus, the transition probability of ΦL → Φ0 is about six times that for ΦL → ΦH.
Note that in the above simple estimation we ignore an overlap integral S.
From the above discussion, we conclude that (i) the ΦL → Φ0 transition is
polarized along the r2 direction (i.e., the molecular long axis) and that the ΦL → ΦH
transition is polarized along the r1 direction (i.e., the molecular short axis).
(ii) Transition probability of ΦL → Φ0 is about six times that of ΦL → ΦH. Note
19.5 MO Calculations of Methane 805

that the polarized characteristics are consistent with those obtained from the discus-
sion based on the group theory. The conclusion reached by the semi-quantitative
estimation of a simple molecule of allyl cation well typifies the general optical
features of more complicated molecules having a well-defined molecular long axis
such as polyenes.

19.5 MO Calculations of Methane

So far, we investigated MO calculations of aromatic molecules based upon π-


electron approximation. These are a homogeneous system that has the same quality
of electrons. Here we deal with methane that includes a carbon and surrounding four
hydrogens. These hydrogen atoms form a regular tetrahedron with the carbon atom
positioned at a center of the tetrahedron. It is therefore considered as a heterogeneous
system. The calculation principle, however, is consistent, namely, we make the most
of projection operators and construct appropriate SALCs of methane.
We deal with four 1s electrons of hydrogen along with two 2s electrons and two
2p electrons of carbon. Regarding basis functions of carbon, however, we consider
2s atomic orbital and three 2p orbitals (i.e., 2px, 2py, 2pz orbitals). That is, we deal
with eight atomic orbitals all together. These are depicted in Fig. 19.14. The
dimension of the vector space (i.e., representation space) is eight accordingly.

Fig. 19.14 Four 1 s atomic


orbitals of hydrogen and a
z
2pz orbital of carbon. The
former orbitals are
represented by H1 to H4. 2px
and 2py orbitals of carbon
are omitted for simplicity

O
y

x
806 19 Applications of Group Theory to Physical Chemistry

As before, we wish to determine irreducible representations which individual


MOs belong to. As already mentioned in Sect. 17.3, there are 24 symmetry opera-
tions in a point group Td which methane belongs to (see Table 17.6). According to
the symmetry operations, we decide transformation matrices related to each opera-
tion. For example, Cxyz
3 transforms basis functions as follows:

H 1 H 2 H 3 H 4 C2s C2px C2py C2pz C xyz


3

= H 1 H 3 H 4 H 2 C2s C2py C2pz C2px ,

where by the above notations we denoted atomic species and molecular orbitals.
Hence, as a matrix representation we have

1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
C 3xyz = , ð19:178Þ
0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0

where Cxyz
3 is the same operation as Rxyz2π
3
appeared in Sect. 17.3. Therefore,

χ Cxyz
3 = 2:

As another example σ yz
d , we have

H 1 H 2 H 3 H 4 C2s C2px C2py C2pz σ yz


d

= H 1 H 2 H 4 H 3 C2s C2px C2pz C2py ,

where σ yz
d represents a mirror symmetry with respect to the plane that includes the x-
axis and bisects the angle formed by the y- and z-axes. Thus, we have

1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
σ yz
d = : ð19:179Þ
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0

Then, we have
19.5 MO Calculations of Methane 807

Table 19.11 Characters for individual symmetry transformations of hydrogen 1s orbitals and
carbon 2s and 2p orbitals in methane
Td E 8C3 3C2 6S4 6σ d
Γ 8 2 0 0 4

χ σ yz
d = 4:

Taking some more examples, for Cz2 we get

H 1 H 2 H 3 H 4 C2s C2px C2py C2pz Cz2

= H 4 H 3 H 2 H 1 C2s - C2px - C2py C2pz ,

where Cz2 means a rotation by π around the z-axis. Also, we have

0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
C 2z = , ð19:180Þ
0 0 0 0 1 0 0 0
0 0 0 0 0 -1 0 0
0 0 0 0 0 0 -1 0
0 0 0 0 0 0 0 1

χ C z2 = 0:


With S42 (i.e., an improper rotation by π2 around the z-axis), we get


H 1 H 2 H 3 H 4 C2s C2px C2py C2pz S42

= H 3 H 1 H 4 H 2 C2s C2py - C2px - C2pz ,

0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0
1 0 0 0 0 0 0 0
zπ 0 0 1 0 0 0 0 0
S4 2 = , ð19:181Þ
0 0 0 0 1 0 0 0
0 0 0 0 0 0 -1 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 -1
808 19 Applications of Group Theory to Physical Chemistry


χ S42 = 0:

As for the identity matrix E, we have

χ ðE Þ = 8:

Thus, Table 19.11 collects characters of individual symmetry transformations


with respect to hydrogen 1s and carbon 2s and 2p orbitals in methane. From the
above examples, we notice that all the symmetry operators R reduce an eight-
dimensional representation space V8 to subspaces of Span{H1, H2, H3, H4} and
Span{C2s, C2px, C2py, C2pz}. In terms of a notation of Sect. 12.2, we have

V 8 = SpanfH 1 , H 2 , H 3 , H 4 g Span C2s, C2px , C2py , C2pz : ð19:182Þ

In other words, V8 is decomposed into the above two R-invariant subspaces (see
Part III); one is the hydrogen-related subspace and the other is the carbon-related
subspace.
Correspondingly, a representation D comprising the above representation matri-
ces should be reduced to a direct sum of irreducible representations D(α). That is, we
should have

ðαÞ
D= q D
α α
,

where qα is a positive integer or zero. Here we are thinking of decomposition of (8, 8)


matrices such as (19.178) into submatrices. We estimate qα using (18.83). That is,

1
qα = χ ðαÞ ðgÞ χ ðgÞ: ð18:83Þ
n g

With an irreducible representation A1 of Td, for instance, we have

1 1
qA1 = χ ðA1 Þ ðgÞ χ ðgÞ = ð1  8 þ 1  8  2 þ 1  6  4Þ = 2:
24 g 24

As for A2, we have

1
qA 2 = ½1  8 þ 1  8  2 þ ð- 1Þ  6  4 = 0:
24

Regarding T2, we get

1
qT 2 = ð3  8 þ 1  6  4Þ = 2:
24

For other irreducible representations of Td, we get qα = 0. Consequently, we have


19.5 MO Calculations of Methane 809

D = 2DðA1 Þ þ 2DðT 2 Þ : ð19:183Þ

Evidently, both for the hydrogen-related representation D(H ) and carbon-related


representation D(C), we have individually

DðH Þ = DðA1 Þ þ DðT 2 Þ and DðCÞ = DðA1 Þ þ DðT 2 Þ , ð19:184Þ

where D = D(H ) + D(C).


In fact, taking (19.180), for example, in a subspace Span{H1, H2, H3, H4}, the
operator Cz2 is expressed as

0 0 0 1
0 0 1 0
C 2z = :
0 1 0 0
1 0 0 0

Following the routine procedures based on characteristic polynomials, we get


eigenvalues +1 (as a double root) and -1 (a double root as well). Unitary similarity
transformation using a unitary matrix P expressed as

1 1 1 1
2 2 2 2
1 1 1 1
- -
P= 2 2 2 2
1 1 1 1
- -
2 2 2 2
1 1 1 1
- -
2 2 2 2

yields a diagonal matrix described by

1 0 0 0
-1 0 -1 0 0
P C 2z P = :
0 0 -1 0
0 0 0 1

Note that this diagonal matrix is identical to submatrix of (19.180) for Span{C2s,

C2px, C2py, C2pz}. Similarly, S42 of (19.181) gives eigenvalues ±1 and ±i for both

the subspaces. Notice that since S42 is unitary, its eigenvalues take a complex number
with an absolute value of 1. Since these symmetry operation matrices are unitary,
these matrices must be diagonalized according to Theorem 14.5. Using unitary
matrices whose column vectors are chosen from eigenvectors of the matrices,
diagonal elements are identical to eigenvalues including their multiplicity. Writing
representation matrices of symmetry operations for the hydrogen-associated sub-
space and carbon-associated subspace as H and C, we find that H and C have the
810 19 Applications of Group Theory to Physical Chemistry

same eigenvalues in common. Notice here that different types of transformation



matrices (e.g., C z2 , S42 , etc.) give a different set of eigenvalues. Via unitary similarity
transformation using unitary matrices P and Q, we get

-1
P - 1 HP = Q - 1 CQ or PQ - 1 H PQ - 1 = C:

Namely, H and C are similar, i.e., the representation is equivalent. Thus, recalling
Schur’s First Lemma, (19.184) results.
Our next task is to construct SALCs, i.e., proper basis vectors using projection
operators. From the above, we anticipate that the MOs comprise a linear combina-
tion of the hydrogen-associated SALC and carbon-associated SALC that belongs to
the same irreducible representation (i.e., A1 or T2). For this purpose, we first find
proper SALCs using projection operators described as

ðαÞ dα ðαÞ
PlðlÞ = Dll ðgÞ g: ð18:156Þ
n g

We apply this operator to Span{H1, H2, H3, H4}. Taking, e.g., H1 and operating
both sides of (18.156) on H1, as a basis vector corresponding to a one-dimensional
representation of A1 we have

ðA Þ d A1 ðA Þ 1
P1ð11Þ H 1 = D111 ðgÞ gH 1 = ½ð1  H 1 Þ þ ð1  H 1 þ 1  H 1 þ 1  H 3
n g 24

þ1  H 4 þ 1  H 3 þ 1  H 2 þ 1  H 4 þ 1  H 2 Þ þ ð1  H 4 þ 1  H 3 þ 1  H 2 Þ
þð1  H 3 þ 1  H 2 þ 1  H 2 þ 1  H 4 þ 1  H 4 þ 1  H 3 Þ
þð1  H 1 þ 1  H 4 þ 1  H 1 þ 1  H 2 þ 1  H 1 þ 1  H 3 Þ:
1
= ð6  H 1 þ 6  H 2 þ 6  H 3 þ 6  H 4 Þ
24
1
= ðH 1 þ H 2 þ H 3 þ H 4 Þ: ð19:185Þ
4

The case of C2s is simple, because all the symmetry operations convert C2s to
itself. That is, we have

ðA Þ 1
P1ð11Þ C2s = ð24  C2sÞ = C2s: ð19:186Þ
24

Regarding C2p, taking C2px, for instance, we have


19.5 MO Calculations of Methane 811

Table 19.12 Character table Td E 8C3 3C2 6S4 6σ d


of Td
A1 1 1 1 1 1 x2 + y2 + z2
A2 1 1 1 -1 -1
E 2 -1 2 0 0 (2z2 - x2 - y2, x2 - y2)
T1 3 0 -1 1 -1
T2 3 0 -1 -1 1 (x, y, z); (xy, yz, zx)

ðA Þ 1
P1ð11Þ C2px = ð1  C2px Þ þ 1  C2py þ 1  C2pz þ 1  - C2py
24

þ1  - C2pz þ 1  - C2py þ 1  C2pz þ 1  C2py þ 1  - C2pz 

þ½ð1  C2px þ 1  ð- C2px Þ þ 1  ð- C2px Þ þ ½1  ð- C2px Þ

þ1  ð- C2px Þ þ 1  - C2pz þ 1  C2pz þ 1  C2py þ 1  - C2py 

þ 1  C2py þ 1  C2px þ 1  C2pz þ 1  - C2py þ 1  C2px þ 1  - C2pz g

= 0:

The calculation is somewhat tedious, but it is natural that since C2s is spherically
symmetric, it belongs to the totally symmetric representation. Conversely, it is
natural to think that C2px is totally unlikely to contain a totally symmetric represen-
tation. This is also the case with C2py and C2pz.
Table 19.12 shows the character table of Td in which the three-dimensional
irreducible representation T2 is spanned by basis vectors (x y z). Since in
Table 17.6 each (3, 3) matrix is given in reference to the vectors (x y z), it can
directly be utilized to represent T2. More specifically, we can directly choose the
ðT Þ
diagonal elements (1, 1), (2, 2), and (3, 3) of the individual (3, 3) matrices for D112 ,
ðT 2 Þ ðT 2 Þ
D22 , and D33 elements of the projection operators, respectively. Thus, we can
construct SALCs using projection operators explained in Sect. 18.7. For example,
using H1 we obtain SALCs that belong to T2 such that

ðT Þ 3 ðT Þ 3
P1ð12Þ H 1 = D112 ðgÞ gH 1 = fð 1  H 1 Þ
24 g 24
812 19 Applications of Group Theory to Physical Chemistry

þ½ð- 1Þ  H 4 þ ð- 1Þ  H 3 þ 1  H 2 

þ½0  ð- H 3 Þ þ 0  ð- H 2 Þ þ 0  ð- H 2 Þ þ 0  ð- H 4 Þ þ ð- 1Þ  H 4 þ ð- 1Þ  H 3 

þð0  H 1 þ 0  H 4 þ 1  H 1 þ 1  H 2 þ 0  H 1 þ 0  H 3 Þg

3
= ½2  H 1 þ 2  H 2 þ ð- 2Þ  H 3 þ ð- 2Þ  H 4 
24

1
= ðH þ H 2 - H 3 - H 4 Þ: ð19:187Þ
4 1

Similarly, we have

ðT Þ 1
P2ð22Þ H 1 = ðH - H 2 þ H 3 - H 4 Þ, ð19:188Þ
4 1

ðT Þ 1
P3ð32Þ H 1 = ðH 1 - H 2 - H 3 þ H 4 Þ: ð19:189Þ
4
ðT Þ
Now we can easily guess that P1ð12Þ C2px solely contains C2px. Likewise,
ðT Þ ðT Þ
P2ð22Þ C2py and P3ð32Þ C2pz contain only C2py and C2pz, respectively. In fact, we
obtain what we anticipate. That is,

ðT Þ ðT Þ ðT Þ
P1ð12Þ C2px = C2px , P2ð22Þ C2py = C2py , P3ð32Þ C2pz = C2pz : ð19:190Þ

This gives a good example to illustrate the general concept of projection operators
and related calculations of inner products discussed in Sect. 18.7. For example, we
ðT Þ ðT Þ
have a non-vanishing inner product of P1ð12Þ H 1 jP1ð12Þ C2px and an inner product
ðT Þ ðT Þ
of, e.g., P2ð22Þ H 1 jP1ð12Þ C2px is zero. This significantly reduces efforts to solve a
ðT Þ ðT Þ
secular equation (vide infra). Notice that functions P1ð12Þ H 1 , P1ð12Þ C2px , etc. are
ðT Þ ðT Þ
linearly independent of one another. Recall also that P1ð12Þ H 1 and P1ð12Þ C2px
ðαÞ ðαÞ
correspond to ϕl and ψ l in (18.171), respectively. That is, the former functions
are linearly independent, while they belong to the same place “1” of the same three-
dimensional irreducible representation T2.
Equations (19.187)–(19.190) seem intuitively obvious. This is because if we
draw a molecular geometry of methane (see Fig. 19.14), we can immediately
recognize the relationship between the directionality in ℝ3 and “directionality” of
19.5 MO Calculations of Methane 813

SALCs represented by sings of hydrogen atomic orbitals (or C2px, C2py, and C2pz of
carbon).
As stated above, we have successfully obtained SALCs relevant to methane.
Therefore, our next task is to solve an eigenvalue problem and construct appropriate
MOs. To do this, let us first normalize the SALCs. We assume that carbon-based
SALCs have already been normalized as well-studied atomic orbitals. We suppose
that all the functions are real. For the hydrogen-based SALCs

hH 1 þ H 2 þ H 3 þ H 4 jH 1 þ H 2 þ H 3 þ H 4 i = 4hH 1 jH 1 i þ 12hH 1 jH 2 i

= 4 þ 12SHH = 4 1 þ 3SHH , ð19:191Þ

where the second equality comes from the fact that jH1i, i.e., a 1 s atomic orbital is
normalized. We define hH1| H2i = SHH, i.e., an overlap integral between two
adjacent hydrogen atoms. Note also that hHi| Hji (1 ≤ i, j ≤ 4; i ≠ j) is the same
because of the symmetry requirement.
Thus, as a normalized SALC we have

j H1 þ H2 þ H3 þ H4i
H ðA1 Þ  : ð19:192Þ
2 1 þ 3SHH

Defining a denominator as c = 2 1 þ 3SHH , we have

H ðA1 Þ = j H 1 þ H 2 þ H 3 þ H 4 i=c: ð19:193Þ

Also, we have

hH 1 þ H 2 - H 3 - H 4 jH 1 þ H 2 - H 3 - H 4 i = 4hH 1 jH 1 i - 4hH 1 jH 2 i

= 4 - 4SHH = 4 1 - SHH : ð19:194Þ

Thus, we have

ðT 2 Þ j H1 þ H2 - H3 - H4i
H1  p : ð19:195Þ
2 1 - SHH
p
Also defining a denominator as d = 2 1 - SHH , we have
814 19 Applications of Group Theory to Physical Chemistry

ðT 2 Þ
H1 j H 1 þ H 2 - H 3 - H 4 i=d:

Similarly, we define other hydrogen-based SALCs as

ðT 2 Þ
H2 j H 1 - H 2 þ H 3 - H 4 i=d,

ðT 2 Þ
H3 j H 1 - H 2 - H 3 þ H 4 i=d:

The next step is to construct MOs using the above SALCs. To this end, we make a
linear combination using SALCs belonging to the same irreducible representations.
In the case of A1, we choose H ðA1 Þ and C2s. Naturally, we anticipate two linear
combinations of

a1 H ðA1 Þ þ b1 C2s, ð19:196Þ

where a1 and b1 are arbitrary constants. On the basis of the discussions of projection
operators in Sect. 18.7, both the above two linear combinations belong to A1 as well.
ðT Þ ðT Þ ðT Þ
Similarly, according to the projection operators P1ð12Þ , P2ð22Þ , and P3ð32Þ , we make three
sets of linear combinations

ðT Þ ðT Þ ðT Þ
q1 P1ð12Þ H 1 þ r 1 C2px ; q2 P2ð22Þ H 1 þ r 2 C2py ; q3 P3ð32Þ H 1 þ r 3 C2pz , ð19:197Þ

where q1, r1, etc. are arbitrary constants. These three sets of linear combinations
belong to individual “addresses” 1, 2, and 3 of T2. What we have to do is to
determine coefficients of the above MOs and to normalize them by solving the
secular equations. With two different energy eigenvalues, we get two orthogonal
(i.e., linearly independent) MOs for the individual four sets of linear combinations of
(19.196) and (19.197). Thus, total eight linear combinations constitute MOs of
methane.
In light of (19.183), the secular equation can be reduced as follows according to
the representations A1 and T2. There we have changed the order of entries in the
equation so that we can deal with the equation easily. Then we have

H 11 - λ H 12 - λS12

H 21 - λS21 H 22 - λ

G11 - λ G12 - λT 12

G21 - λT 21 G22 - λ
F 11 - λ F 12 - λV 12
F 21 - λV 21 F 22 - λ
K 11 - λ K 12 - λW 12
K 21 - λW 21 K 22 - λ

¼ 0, ð19:198Þ
19.5 MO Calculations of Methane 815

where off-diagonal elements are all zero except for those explicitly described. This is
because of the symmetry requirement (see Sect. 19.2). Thus, in (19.198) the secular
equation is decomposed into four (2, 2) blocks. The first block is pertinent to A1 of a
hydrogen-based component and carbon-based component from the left, respectively.
Lower three blocks are pertinent to T2 of hydrogen-based and carbon-based compo-
ðT Þ ðT Þ ðT Þ
nents from the left, respectively, in order of P1ð12Þ , P2ð22Þ , and P3ð32Þ SALCs from the
top. The notations follow those of (19.59).
We compute these equations. The calculations are equivalent to solving the
following four two-dimensional secular equations:

H 11 - λ H 12 - λS12
H 21 - λS21 H 22 - λ
= 0,

G11 - λ G12 - λT 12
G21 - λT 21 G22 - λ
= 0,

F 11 - λ F 12 - λV 12
F 21 - λV 21 F 22 - λ
= 0,

K 11 - λ K 12 - λW 12
K 21 - λW 21 K 22 - λ
= 0: ð19:199Þ

Notice that these four secular equations are the same as a (2, 2) determinant of
(19.59) in the form of a secular equation. Note, at the same time, that while (19.59)
did not assume SALCs, (19.199) takes account of SALCs. That is, (19.199)
expresses a secular equation with respect to two SALCs that belong to the same
irreducible representation.
The first equation of (19.199) reads as

1 - S12 2 λ2 - ðH 11 þ H 22 - 2H 12 S12 Þλ þ H 11 H 22 - H 12 2 = 0: ð19:200Þ

In (19.200) we define quantities as follows:

S12  H ðA1 Þ C2sdτ

hH 1 þ H 2 þ H 3 þ H 4 jC2si 2hH 1 jC2si


 H ðA1 Þ jC2s = =
2 1 þ 3S HH
1 þ 3SHH
816 19 Applications of Group Theory to Physical Chemistry

2SCH
A1
= : ð19:201Þ
1 þ 3SHH

In (19.201), an overlap integral between hydrogen atomic orbitals and C2s is


identical from the symmetry requirement and it is defined as

A1  hH 1 jC2si:
SCH ð19:202Þ

Also in (19.200), other quantities are defined as follows:

αH þ 3βHH
H 11  H ðA1 Þ jHH ðA1 Þ = , ð19:203Þ
1 þ 3SHH

H 22  hC2sjHC2si, ð19:204Þ

hH 1 þ H 2 þ H 3 þ H 4 jHC2si 2hH 1 jHC2si 2βCH


H 12  H ðA1 Þ j HC2s = = = A1
, ð19:205Þ
2 1 þ 3S HH
1 þ 3S HH
1 þ 3SHH

where H is a Hamiltonian of a methane molecule. In (19.203) and (19.205),


moreover, we define the quantities as

αH  hH 1 jHH 1 i, βHH  hH 1 jHH 2 i, βCH


A1  hH 1 jHC2si: ð19:206Þ

The quantity of H11 is a “Coulomb” integral of the hydrogen-based SALC that


involves four hydrogen atoms. Solving the first equation of (19.199), we get

H 11 þH 22 -2H 12 S12 ± ðH 11 -H 22 Þ2 þ4 H 12 2 þH 11 H 22 S12 2 -H 12 S12 ðH 11 þH 22 Þ


λ¼ :
2 1-S12 2
ð19:207Þ

Similarly, we obtain related solutions for the latter three eigenvalue equations of
(19.199). With the second equation of (19.199), for instance, we have

1 - T 12 2 λ2 - ðG11 þ G22 - 2G12 T 12 Þλ þ G11 G22 - G12 2 = 0: ð19:208Þ

Solving this, we get


19.5 MO Calculations of Methane 817

G11 þG22 -2G12 T 12 ± ðG11 -G22 Þ2 þ4 G12 2 þG11 G22 T 12 2 -G12 T 12 ðG11 þG22 Þ
λ¼ :
2 1-T 12 2
ð19:209Þ

In (19.208) and (19.209) we define these quantities as follows:

ðT Þ ðT Þ hH 1 þ H 2 - H 3 - H 4 jC2px i
T 12  H 1 2 C2px dτ  H 1 2 jC2px = p
2 1 - SHH

2hH jC2px i 2SCH


= p 1 = p T2 , ð19:210Þ
1 - SHH 1 - SHH

ðT Þ ðT 2 Þ αH - βHH
G11  H 1 2 jHH 1 = , ð19:211Þ
1 - SHH

G22  hC2px jHC2px i, ð19:212Þ

ðT 2 Þ hH 1 þ H 2 - H 3 - H 4 jHC2px i
G12  H 1 j HC2px = p
2 1 - SHH
2hH 1 jHC2px i 2βCH
= p = p T2 : ð19:213Þ
1 - SHH 1 - SHH

In the above equations, we have further defined integrals such that

T 2  hH 1 jC2px i, βT 2  hH 1 jHC2px i:
SCH ð19:214Þ
CH

In (19.210), an overlap integral T12 between four hydrogen atomic orbitals and
C2px is identical again from the symmetry requirement. That is, the integrals of
(19.213) are additive regarding a product of plus components of a hydrogen atomic
orbital of (19.187) and C2px as well as another product of minus components of a
hydrogen atomic orbital of (19.187) and C2px; see Fig. 19.14. Notice that C2px has a
node on yz-plane.
The third and fourth equations of (19.199) give exactly the same eigenvalues as
that given in (19.209). This is obvious from the fact that all the latter three equations
of (19.199) are associated with a irreducible representation T2. The corresponding
three MOs are triply degenerate.
818 19 Applications of Group Theory to Physical Chemistry

Fig. 19.15 Configuration


of electron (e) and hydrogen e
nuclei (A and B). rA and rB
denote a separation between
the electron and A and that
between the electron and B,
respectively. R denotes a
separation between A and B

A B

In (19.207) and (19.209), the plus sign gives a higher orbital energy and the minus
sign gives a lower energy. Equations (19.207) and (19.209), however, look some-
what complicated. To simplify the situation, (i) in, e.g., (19.207) let us consider a
case where jH11 j ≫ j H22j or jH11 j ≪ j H22j. In that case, (H11 - H22)2 dominates
inside a square root, and so ignoring -2H12S12 and -S122 we have λ ≈ H11 or
λ ≈ H22. Inserting these values into (19.199), we have either Ψ ≈ H ðA1 Þ or Ψ ≈ C2s,
where Ψ is a resulting MO. This implies that no interaction would arise between
H ðA1 Þ and C2s. (ii) If, however, H11 = H22, we would get

H 11 - H 12 S12 ± j H 12 - H 11 S12 j
λ= : ð19:215Þ
1 - S12 2

As H12 - H11S12 is positive or negative, we have a following alternative:


11 þH 12
Case I: H12 - H11S12 > 0. We have λH = H1þS 12
and λL = H111--SH1212 , where
λ H > λ L.
Case II: H12 - H11S12 < 0. We have λH = H111--SH1212 and λL = H1þS 11 þH 12
12
, where
λ H > λ L.
In the above cases, we would anticipate maximum orbital mixing between H ðA1 Þ
and C2s. (iii) If H11 and H22 are moderately different in between the above (i) and
(ii), the orbital mixing is likely to be moderate. This is an actual situation. With
(19.209) we have eigenvalues expressed similarly to those of (19.207). Therefore,
classifications related to the above Cases I and II hold with G12 - G11T12, F12 -
F11V12, and K12 - K11W12 as well.
In spite of simplicity of (19.215), the quantities H11, H12, and S12 are hard to
calculate. In general cases including the present example (i.e., methane), the diffi-
culty in calculating these quantities results essentially from the fact that we are
dealing with a many-particle interactions that include electron repulsion. Nonethe-
less, for the simplest case of a hydrogen molecular ion Hþ 2 the estimation is feasible
[3]. Let us estimate H12 - H11S12 quantitatively according to Atkins and
Friedman [3].
19.5 MO Calculations of Methane 819

The Hamiltonian of Hþ
2 is described as

ħ2 2 e2 e2 e2
H= - — - - þ , ð19:216Þ
2m 4πε0 r A 4πε0 r B 4πε0 R

where m is an electron rest mass and other symbols are defined in Fig. 19.15; the last
term represents the repulsive interaction between the two hydrogen nuclei. To
estimate the quantities in (19.215), it is convenient to use dimensionless ellipsoidal
μ
coordinates ν such that [3].
ϕ

rA þ rB rA - rB
μ= and ν= : ð19:217Þ
R R

The quantity ϕ is an azimuthal angle around the molecular axis (i.e., the straight
line connecting the two nuclei). Then, we have

ħ2 2 e2 e2
H 11 = hAjHjAi = Aj - — - jA þ Aj - jA
2m 4πε0 r A 4πε0 r B
e2
þ Aj jA
4πε0 R

e2 1 e2
= E 1s - Aj jA þ , ð19:218Þ
4πε0 rB 4πε0 R

where E1s is the same as that given in (3.258) with Z = n = 1 and μ replaced with
ħ2
m (i.e., E1s = - 2ma 2 ). Using a coordinate representation of (3.301), we have

1 - 3=2 - rA =a
j Ai = a e : ð19:219Þ
π

Moreover considering (19.217), we have

1 1 1
Aj jA = 3 dτe - 2rA =a :
rB πa rB

Converting Cartesian coordinates to ellipsoidal coordinates [3] such that

3 2π 1 1
R
dτ = dϕ dμ dν μ2 - ν2 ,
2 0 1 -1

we have
820 19 Applications of Group Theory to Physical Chemistry

1
1 R3 1
e - ðμþνÞR=a
Aj jA =  2π dμ dν μ2 - ν2
2 ðμ - ν Þ
rB 8πa3 R
1 -1

1 1
R2
= dμ dνðμ þ νÞe - ðμþνÞR=a : ð19:220Þ
2a3 1 -1

1
þ νÞe - ðμþνÞR=a , we obtain
1
Putting I = 1 dμ - 1 dνðμ

1 1 1 1
I= μe - μR=a dμ e - νR=a dν þ e - μR=a dμ νe - νR=a dν: ð19:221Þ
1 -1 1 -1

The above definite integrals can readily be calculated using the methods
described in Sect. 3.7.2; see, e.g., (3.262) and (3.263). For instance, we have

1
1
e - cx 1
e - cν dν = = ðec - e - c Þ: ð19:222Þ
-1 -c -1
c

Differentiating (19.222) with respect to the parameter c, we get

1
1
νe - cν dν = ½ð1 - cÞec - ð1 þ cÞe - c : ð19:223Þ
-1 c2

In the present case, c is to be replaced with R/a. Other calculations of the definite
integrals are left for readers. Thus, we obtain

a3 R
I= 2 - 2 1 þ e - 2R=a :
R3 a

In turn, we have

1 R2 1 R
Aj jA = 3  I = 1 - 1 þ e - 2R=a :
rB 2a R a

Introducing a symbol j0 according to Atkins and Friedman given by [3].

e2
j0  ,
4πε0

we finally obtain
19.5 MO Calculations of Methane 821

j0 R j
1 - 1 þ e- a þ 0 :
2R
H 11 = E 1s - ð19:224Þ
R a R

The quantity S12 appearing in (19.199) can be obtained as follows:

2π 1 1
R3
S12 = hBjAi = dϕ dμ dν μ2 - ν2 e - μR=a , ð19:225Þ
8πa3 0 1 -1

1 - 3=2 - rB =a
where we used j Bi = πa e and (19.217). Noting that

1 1 1
2 - μR=a
dμ dν μ2 - ν2 e - μR=a = dμ 2μ2 - e
1 -1 1 3

and following procedures similar to those described above, we get

2
R 1 R
S12 = 1 þ þ e - R=a : ð19:226Þ
a 3 a

In turn, for H12 in (19.199) we have

ħ2 2 e2 e2
H 12 = hAjHjBi = Aj - — - jB þ Aj - jB
2m 4πε0 r B 4πε0 r A
e2
þ Aj jB
4πε0 R

e2 1 e2
= E 1s hAjBi - Aj jB þ hAjBi: ð19:227Þ
4πε0 rA 4πε0 R

ħ 2
Note that in (19.227) jBi is an eigenfunction belonging to E1s = - 2ma 2 that is an
ħ2 e2
eigenvalue of an operator - 2m — - 4πε0 rB . From (3.258), we estimate E1s to be
2

13.61 eV. Using (19.217) and converting Cartesian coordinates to ellipsoidal coor-
dinates once again, we get

1 1 R
Aj jB = 1 þ e - R=a :
rA a a

Thus, we have
822 19 Applications of Group Theory to Physical Chemistry

j0 j R
S - 0 1 þ e - a:
R
H 12 = E1s þ ð19:228Þ
R 12 a a

We are now in the position to evaluate H12 - H11S12 in (19.215). According to


Atkins and Friedman [3], we define the following notations:

1 1
j0  j0 Aj jA and k0  j0 Aj jB :
rB rA

Then, we have

H 12 - H 11 S12 = - k0 þ j0 S12 : ð19:229Þ

The calculation of this quantity is straightforward. The result is

H 12 - H 11 S12

2
1 2R - R=a j0 R R 1 R
= j0 - e - 1þ 1þ þ e - 3R=a
R 3a2 R a a 3 a

j0 a 2R - R=a j0 2
a R 1 R
= - e - 1þ 1þ þ e - 3R=a : ð19:230Þ
a R 3a a R a 3 a

In (19.230) we notice that whereas the second term is always negative, the first
term may be negative or positive depending upon R. We could not tell a priori
whether H12 - H11S12 is negative accordingly. If we had R ≪ a, (19.230) would
become positive.
Let us then make a quantitative estimation. The Bohr radius a is about 52.9 pm
(using an electron rest mass). As an experimental result, R is approximately 106 pm
[3]. Hence, for a Hþ2 ion we have

R=a ≈ 2:0:

Using this number, we get

H 12 - H 11 S12 ≈ - 0:13j0 =a < 0:

We estimate H12 - H11S12 to be ~ - 3.5 eV. Therefore, from (19.215) we get


19.5 MO Calculations of Methane 823

H 11 þ H 12 H 11 - H 12
λL = and λH = , ð19:231Þ
1 þ S12 1 - S12

where λL and λH indicate the lower and higher energy eigenvalues, respectively.
Namely, Case II in the above is more likely. Correspondingly, for MOs we have

j Aiþ j Bi j Ai - j Bi
ΨL = and ΨH = , ð19:232Þ
2ð1 þ S12 Þ 2ð1 - S12 Þ

where Ψ L and Ψ H belong to λL and λH, respectively. The results are virtually the
same as those given in (19.101)–(19.105) of Sect. 19.1. In (19.101) and Fig. 19.6,
however, we merely assumed that λ1 = αþβ α-β
1þS is lower than λ2 = 1 - S. Here, we have
confirmed that this is truly the case. A chemical bond is formed in such a way that an
electron is distributed as much as possible along the molecular axis (see Fig. 19.4 for
a schematic) and that in this configuration a minimized orbital energy is achieved.
In our present case, H11 in (19.203) and H22 in (19.204) should differ. This is also
the case with G11 in (19.211) and G22 in (19.212). Here we return back to (19.199).
Suppose that we get a MO by solving, e.g., the first secular equation of (19.199)
such that

Ψ = a1 H ðA1 Þ þ b1 C2s:

From the secular equation, we get

H 12 - λS12
a1 = - b , ð19:233Þ
H 11 - λ 1

where S12 and H12 were defined as in (19.201) and (19.205), respectively. A
~ is described by
normalized MO Ψ

~= a1 H ðA1 Þ þ b1 C2s
Ψ : ð19:234Þ
a1 2 þ b1 2 þ 2a1 b1 S12

Thus, according to two different energy eigenvalues λ we will get two linearly
independent MOs. Other three secular equations are dealt with similarly.
From the energetical consideration of Hþ 2 we infer that a1b1 > 0 with a bonding
MO and a1b1 < 0 with an anti-bonding MO. Meanwhile, since both H ðA1 Þ and C2s
belong to the irreducible representation A1, so does Ψ~ on the basis of the discussion
on the projection operators of Sect. 18.7. Thus, we should be able to construct proper
MOs that belong to A1. Similarly, we get proper MOs belonging to T2 by solving
other three secular equations of (19.199). In this case, three bonding MOs are triply
degenerate, so are three anti-bonding MOs. All these six MOs belong to the
irreducible representation T2. Thus, we can get a complete set of MOs for methane.
These eight MOs span the representation space V8.
824 19 Applications of Group Theory to Physical Chemistry

Fig. 19.16 Probable energy


diagram and MO symmetry
species of methane.
(Adapted from http://www.
science.oregonstate.edu/
~gablek/CH334/Chapter1/
methane_MOs.htm with
kind permission of Professor
Kevin P. Gable)
C
4H

CH4

To precisely determine the energy levels, we need to take more elaborate


approaches to approximate and calculate various parameters that appear in
(19.199). At the same time, we need to perform detailed experiments including
spectroscopic measurements and interpret those results carefully [4]. Taking account
of these situations, Fig. 19.16 [5] displays as an example of MO calculations that
give a probable energy diagram and MO symmetry species of methane. The diagram
comprises a ground state bonding a1 state and its corresponding anti-bonding state
a1 along with triply degenerate bonding t2 states, and their corresponding anti-
bonding state t 2 .
We emphasize that the said elaborate approaches ensue truly from the “paper-
and-pencil” methods based upon group theory. The group theory thus supplies us
with a powerful tool and clear guideline for addressing various quantum chemical
problems, a few of which we are introducing as an example in this book.
Finally, let us examine the optical transition of methane. In this case, we have to
consider electronic configurations of the initial and final states. If we are dealing with
optical absorption, the initial state is the ground state A1 that is described by the
totally symmetric representation. The final state, however, will be an excited state,
which is described by a direct-product representation related to the two states that are
associated with the optical transition. The matrix element is expressed as

Θðα × βÞ jεe  PjΘðA1 Þ , ð19:235Þ

where ΘðA1 Þ stands for an electronic configuration of the totally symmetric ground
state; Θ(α × β) denotes an electronic configuration of an excited state represented by a
direct-product representation pertinent to irreducible representations α and β; P is an
electric dipole operator and εe is a unit polarization vector of the electric field. From a
19.5 MO Calculations of Methane 825

character table for Td, we find that εe  P belongs to the irreducible representation T2
(see Table 19.12).
The ground state electronic configuration is A1 (totally symmetric). It is
denoted by

a21 t 22 t ′ 22 t″22 ,

where three T2 states are distinguished by a prime and double prime. For possible
configuration of excited states, we have

a21 t 2 t ′ 22 t″22 t 2 ðA1 → T 2 × T 2 Þ, a21 t 2 t ′ 22 t″22 a1 ðA1 → T 2 × A1 = T 2 Þ,

a1 t 22 t ′ 22 t″22 t 2 ðA1 → A1 × T 2 = T 2 Þ, a1 t 22 t ′ 22 t″22 a1 ðA1 → A1 × A1 = A1 Þ: ð19:236Þ

In (19.236), we have optical excitations of t 2 → t 2 , t 2 → a1 , a1 → t 2 , and a1 → a1 ,


respectively. Consequently, the excited states are denoted by T2 × T2, T2, T2, and A1,
respectively. Since εe  P is Hermitian as seen in (19.51) of Sect. 19.2, we have

Θðα × βÞ jεe  PjΘðA1 Þ = ΘðA1 Þ jεe  PjΘðα × βÞ : ð19:237Þ

Therefore, according to the general theory of Sect. 19.2, we need to examine


whether T2 × D(α) × D(β) contains A1 to judge whether the optical transition is
allowed. As mentioned above, D(α) × D(β) is chosen from among T2 × T2, T2, T2,
and A1. The results are given as follows:

T 2 × T 2 × T 2 = 3T 1 þ 4T 2 þ 2E þ A2 þ A1 , ð19:238Þ

T 2 × T 2 = T 1 þ T 2 þ E þ A1 , ð19:239Þ

T 2 × A1 = T 2 : ð19:240Þ

Admittedly, (19.238) and (19.239) contain A1, and so the transition is allowed. As
for (19.240), however, the transition is forbidden because it does not contain A1. In
light of the character table of Td (Table 19.12), we find that the allowed transitions
(i.e., t 2 → t 2 , t 2 → a1 , and a1 → t 2) equally take place in the direction polarized along
the x-, y-, z-axes. This is often the case with molecules having higher symmetries
such as methane. The transition a1 → a1 , however, is forbidden.
In the above argument including the energetic consideration, we could not tell
magnitude relationship of MO energies or photon energies associated with the
826 19 Applications of Group Theory to Physical Chemistry

optical transition. Once again, this requires more accurate calculations and experi-
ments. Yet, the discussion we have developed gives us strong guiding principles in
the investigation of molecular science.

References

1. Cotton FA (1990) Chemical applications of group theory, 3rd edn. Wiley, New York
2. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
3. Atkins P, Friedman R (2005) Molecular quantum mechanics, 4th edn. Oxford University Press,
Oxford
4. Anslyn EV, Dougherty DA (2006) Modern physical organic chemistry. University Science
Books, Sausalito
5. Gable KP (2014) Molecular orbitals: methane. http://www.science.oregonstate.edu/~gablek/
CH334/Chapter1/methane_MOs.htm
Chapter 20
Theory of Continuous Groups

In Chap. 16, we classified the groups into finite groups and infinite groups. We
focused mostly on finite groups and their representations in the precedent chapters.
Of the infinite groups, continuous groups and their properties have been widely
investigated. In this chapter we think of the continuous groups as a collective term
that include Lie groups and topological groups. The continuous groups are also
viewed as the transformation group. Aside from the strict definition, we will see the
continuous group as a natural extension of the rotation group or SO(3) that we
studied briefly in Chap. 17. Here we reconstruct SO(3) on the basis of the notion of
infinitesimal rotation. We make the most of the exponential functions of matrices the
theory of which we have explored in Chap. 15. In this context, we study the basic
properties and representations of the special unitary group SU(2) that has close
relevance to SO(3). Thus, we focus our attention on the representation theory of
SU(2) and SO(3). The results are associated with the (generalized) angular momenta
which we dealt with in Chap. 3. In particular, we show that the spherical surface
harmonics constitute the basis functions of the representation space of SO(3).
Finally, we study various important properties of SU(2) and SO(3) within a
framework of Lie groups and Lie algebras. The last sections comprise the description
of abstract ideas, but those are useful for us to comprehend the constitution of the
group theory from a higher perspective.

20.1 Introduction: Operators of Rotation and Infinitesimal


Rotation

To characterize the continuous transformation appropriately, we have to describe an


infinitesimal transformation of rotation. This is because a rotation with a finite angle
is attained by an infinite number of infinitesimal rotations.

© Springer Nature Singapore Pte Ltd. 2023 827


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_20
828 20 Theory of Continuous Groups

First let us consider how a function Ψ is transformed by the rotation (see Sect.
19.1). Here we express Ψ by

c1
Ψ = ðψ 1 ψ 2 ⋯ ψ d Þ c2

, ð20:1Þ
cd

where {ψ 1, ⋯, ψ d} span the representation space relevant to the rotation; d is a


dimension of the representation space; c1, c2, ⋯ cd are coefficients (or coordinates).
Henceforth, we make it a rule to denote a vector in a representation space as in
(20.1). This is a natural extension of (11.13). We assume ψ ν (ν = 1, 2, ⋯, d ) as a
vector component and ψ ν is normally expressed as a function of a position vector x
in a real three-dimensional space (see Chap. 18). Considering for simplicity a
one-dimensional representation space (i.e., d = 1) and omitting the index ν, as a
transformation of ψ we have

Rθ ψ ðxÞ = ψ Rθ- 1 ðxÞ , ð20:2Þ

where the index θ indicates the rotation angle in a real space whose dimension is two
or three. Note that the dimensions of the real space and representation space may or
may not be the same. Equation (20.2) is essentially the same as (19.11).
As a three-dimensional matrix representation of rotation, we have, e.g.,

cos θ – sin θ 0
Rθ = sin θ cos θ 0 : ð20:3Þ
0 0 1

The matrix Rθ represents a rotation around the z-axis in ℝ3; see Fig. 11.2 and
(11.31). Borrowing the notation of (17.3) and (17.12), we have

cos θ – sin θ 0 x
R θ ð xÞ = ð e1 e 2 e 3 Þ sin θ cos θ 0 y : ð20:4Þ
0 0 1 z

Therefore, as an inverse operator we get

cos θ sin θ 0 x
Rθ- 1 ðxÞ = ðe1 e2 e3 Þ - sin θ cos θ 0 y : ð20:5Þ
0 0 1 z

Let us think of an infinitesimal rotation. Taking a first-order quantity of an


infinitesimal angle θ, (20.5) can be approximated as
20.1 Introduction: Operators of Rotation and Infinitesimal Rotation 829

1 θ 0 x x þ θy
Rθ- 1 ðxÞ ≈ ðe1 e2 e3 Þ - θ 1 0 y = ðe1 e2 e3 Þ y - θx
0 0 1 z z ð20:6Þ
= ðx þ θyÞe1 þ ðy - θxÞe2 þ ze3 :

Putting

ψ ðxÞ = ψ ðxe1 þ ye2 þ ze3 Þ  ψ ðx, y, zÞ,

we have

Rθ ψ ðxÞ = ψ Rθ- 1 ðxÞ ð20:2Þ

or

Rθ ψ ðx, y, zÞ ≈ ψ ðx þ θy, y - θx, zÞ


∂ψ ∂ψ
= ψ ðx, y, zÞ þ θy þ ð - θxÞ
∂x ∂y
∂ ∂
= ψ ðx, y, zÞ - iθð - iÞ x -y ψ ðx, y, zÞ:
∂y ∂x

Now, using a dimensionless operator Mz (=Lz/ħ) such that

∂ ∂
Mz  - i x -y , ð20:7Þ
∂y ∂x

we get

Rθ ψ ðx, y, zÞ ≈ ψ ðx, y, zÞ - iθMz ψ ðx, y, zÞ = ð1 - iθMz Þψ ðx, y, zÞ: ð20:8Þ

Note that the operator Mz has already appeared in Sect. 3.5 [also see (3.19), Sect.
3.2], but in this section we denote it by Mz to distinguish it from the previous
notation Mz. This is because Mz and Mz are connected through a unitary similarity
transformation (vide infra).
From (20.8), we express an infinitesimal rotation θ around the z-axis as

Rθ = 1 - iθMz : ð20:9Þ

Next, let us think of a rotation of a finite angle θ. We have


830 20 Theory of Continuous Groups

n
Rθ = Rθ=n , ð20:10Þ

where RHS of (20.10) implies that the rotation Rθ of a finite angle θ is attained by
n successive infinitesimal rotations of Rθ/n with a large enough n. Taking a limit
n → 1 [1, 2], we get
n
n θ
Rθ = lim Rθ=n = lim 1 - i Mz = expð- iθMz Þ: ð20:11Þ
n→1 n→1 n

Recall the following definition of an exponential function other than (15.7) [3]:

x n
ex  lim 1 þ : ð20:12Þ
n→1 n

Note that as in the case of (15.7), (20.12) holds when x represents a matrix. Thus,
if a dimension of the representation space d is two or more, (20.9) should be read as

Rθ = E - iθMz , ð20:13Þ

where E is a (d, d) identity matrix; Mz is represented by a (d, d) square matrix


accordingly. As already shown in Chap. 15, an exponential function of a matrix is
well-defined. Since Mz is Hermitian, -iMz is anti-Hermitian (see Chaps. 2 and 3).
Thus, using (20.11) Rθ can be expressed as

Rθ = expðθAz Þ, ð20:14Þ

with

Az  - iMz : ð20:15Þ

Operators of this type play an essential role in continuous groups. This is because
in (20.14) the operator Az represents a Lie algebra, which in turn generates a Lie
group. The Lie group is categorized as one of the continuous groups along with a
topological group. In this chapter we further explore various characteristics of
SO(3) and SU(2), both of which are Lie groups and have been fully investigated
from various aspects. The Lie algebras frequently appear in the theory of continuous
groups in combination with the Lie groups. Brief outline of the theory of Lie groups
and Lie algebras will be given at the end of this chapter.
Let us further consider implications of (20.9) and (20.13). From those equations
we have

Mz = ð1 - Rθ Þ=iθ ð20:16Þ

or
20.1 Introduction: Operators of Rotation and Infinitesimal Rotation 831

Mz = ðE - Rθ Þ=iθ: ð20:17Þ

As a very trivial case, we might well be able to think of a one-dimensional real


space. Let the space extend along the z-axis and a unit vector of that space be e3.
Then, with any rotation Rθ around the z-axis makes e3 unchanged (or invariant). That
is, we have

R θ ð e3 Þ = e 3 :

With the three-dimensional case, we have

1 0 0 1 -θ 0
E= 0 1 0 and Rθ = θ 1 0 : ð20:18Þ
0 0 1 0 0 1

Therefore, from (20.17) we get

0 -i 0
Mz = i 0 0 : ð20:19Þ
0 0 0

Next, viewing x, y, and z as vector components (i.e., functions), we think of the


following equation:

0 -i 0 c1
Mz ðΧÞ = ðx y zÞ i 0 0 c2 : ð20:20Þ
0 0 0 c3

The notation of (20.20) denotes the vector transformation in the representation


space, in accordance with (11.37). In (20.20) we consider (x y z) as basis vectors as in
the case of (e1 e2 e3) of (17.2) and (17.3). In other words, we are thinking of a vector
transformation in a three-dimensional representation space spanned by a set of basis
vectors (x y z). In this case, Mz represented by (20.19) operates on a vector (i.e., a
function) Χ expressed by

c1
Χ = ðx y zÞ c2 , ð20:21Þ
c3
832 20 Theory of Continuous Groups

Fig. 20.1 Function f(x, y) =


ay (a > 0). The function ay = , = ( > 0)
is drawn green. Only a part
of f(x, y) > 0 is shown

where c1, c2, and c3 are coefficients (or coordinates). Again, this notation is a natural
extension of (11.13). We emphasize once more that (x y z) does not represent
coordinates but functions. As an example, we depict a function f(x, y) = ay
(a > 0) in Fig. 20.1. The function ay (drawn green in Fig. 20.1) behaves as a vector
with respect to a linear transformation like a basis vector e2 of the Cartesian
coordinate. The “directionality” as a vector can easily be visualized with the function
ay.
Since Mz of (20.19) is Hermitian, it must be diagonalized by unitary similarity
transformation (Sect. 14.3). We find a diagonalizing unitary matrix P with respect to
Mz such that

1 1
-p p 0
2 2
P= i i : ð20:22Þ
-p -p 0
2 2
0 0 1

Using (20.22), we rewrite (20.20) as


20.1 Introduction: Operators of Rotation and Infinitesimal Rotation 833

0 -i 0 c1
Mz ðX Þ = ðx y zÞP  P - 1 i 0 0 P  P-1 c2
0 0 0 c3
1 1
-p p 0
2 2 1 0 0
= ðx y zÞ i i 0 -1 0
-p -p 0
2 2 0 0 0
0 0 1
1 i
-p p 0
2 2 c1
1 i 1 i
× 1 i c2 = - p x- p y p x- p y z
p p 0 2 2 2 2
2 2 c3
0 0 1
1 i
1 0 0 - p c1 þ p c2
2 2
× 0 -1 0 1 i : ð20:23Þ
p c 1 þ p c2
0 0 0 2 2
c3

Here we define M z such that

1 0 0
Mz  P -1
Mz P = 0 -1 0 : ð20:24Þ
0 0 0

Equation (20.24) was obtained by putting l = 1 and shuffling the rows and
columns of (3.158) in Chap. 3. That is, choosing a (real) unitary matrix R3 given by

0 0 1
R3 = 1 0 0 , ð17:116Þ
0 1 0

we have

-1 0 0
-1
R3- 1 M z R3 = R3- 1 P - 1 Mz PR3 = ðPR3 Þ Mz ðPR3 Þ = 0 0 0 = Mz:
0 0 1
834 20 Theory of Continuous Groups

Hence, Mz that appeared as a diagonal matrix in (3.158) has been obtained by a


unitary similarity transformation of Mz using a unitary operator PR3.
Comparing (20.20) and (20.23), we obtain useful information. If we choose (x y z)
for basis vectors, we could not have a representation that diagonalizes Mz. If,
1 i 1 i
however, we choose - p x - p y p x - p y z for the basis vectors,
2 2 2 2
we get a representation that diagonalizes Mz. These basis vectors are sometimes
referred to as a spherical basis. If we carefully look at the latter basis vectors (i.e.,
spherical basis), we notice that these vectors have the same transformation properties
as Y 11 Y 1- 1 Y 01 . In fact, using (3.216) and converting the spherical coordinates to
Cartesian coordinates in (3.216), we get [4]

3 1 1 1
Y 11 Y 1- 1 Y 01 = - p ðx þ iyÞ p ðx - iyÞ z , ð20:25Þ
4π r 2 2

where r is a radial coordinate (i.e., a positive number) that appears in the polar
coordinate (r, θ). Note that the factor r contained in Y 11 Y 1- 1 Y 01 is invariant
with respect to the rotation.
We can generalize the results and conform them to the discussion of related
characteristics in representation spaces of higher dimensions. The spherical surface
harmonics possess an important position for this (vide infra).
From Sect. 3.5, (3.150) we have

M z Y 00 = 0,

where Y 00 ðθ, ϕÞ = 1=4π. The above relation obviously shows that Y 00 ðθ, ϕÞ belongs
to an eigenvalue of zero of Mz. This is consistent with the earlier comments made on
the one-dimensional real space.

20.2 Rotation Groups: SU(2) and SO(3)

Bearing in mind the aforementioned background knowledge, we further develop the


theory of the rotation groups SU(2) and SO(3) and explore in detail various proper-
ties of them. First, we deal with the topics through conventional approaches. Later
we will describe them systematically.
Following the argument developed above, we wish to relate an anti-Hermitian
operator to a rotation operator of a finite angle θ represented by (17.12). We express
Hermitian operators related to the x- and y-axes as
20.2 Rotation Groups: SU(2) and SO(3) 835

0 0 0 0 0 i
Mx = 0 0 -i and My = 0 0 0 :
0 i 0 -i 0 0

Then, anti-Hermitian operators corresponding to (20.15) are described by

0 -1 0 0 0 0
Az = 1 0 0 , Ax = 0 0 -1 ,
0 0 0 0 1 0
0 0 1
Ay = 0 0 0 , ð20:26Þ
-1 0 0

where Ax  - iMx and Ay  - iMy. These are characterized by real skew-


symmetric matrices.
Using (15.7), we calculate exp(θAz) such that

1 1 1 1
expðθAz Þ = E þ θAz þ ðθAz Þ2 þ ðθAz Þ3 þ ðθAz Þ4 þ ðθAz Þ5
2! 3! 4! 5!
þ ⋯: ð20:27Þ

We note that

-1 0 0 1 0 0
A2z = 0 -1 0 = A6z , A4z = 0 1 0 = A8z etc:;
0 0 0 0 0 0
0 1 0 0 -1 0
A3z = -1 0 0 = A7z , A5z = 1 0 0 = A9z :
0 0 0 0 0 0

Thus, we get
836 20 Theory of Continuous Groups

expðθAz Þ
1 1 1 1
= E þ ðθAz Þ2 þ ðθAz Þ4 þ ⋯ þ θAz þ ðθAz Þ3 þ ðθAz Þ5 þ ⋯
2! 4! 3! 5!
1 1
1 - θ2 þ θ4 - ⋯ 0 0
2! 4!
= 1 1
0 1 - θ 2 þ θ4 - ⋯ 0
2! 4!
0 0 1
1 1
0 - θ þ θ3 - θ5 þ ⋯ 0
3! 5!
þ 1 1 ð20:28Þ
θ - θ 3 þ θ5 þ ⋯ 0 0
3! 5!
0 0 0
cos θ 0 0 0 - sin θ 0
= 0 cos θ 0 þ sin θ 0 0
0 0 1 0 0 0
cos θ - sin θ 0
= sin θ cos θ 0 :
0 0 1

Hence, we recover the matrix form of (17.12). Similarly, we get

1 0 0
expðφAx Þ = 0 cos φ - sin φ , ð20:29Þ
0 sin φ cos φ
cos ϕ 0 sin ϕ
exp ϕAy = 0 1 0 : ð20:30Þ
- sin ϕ 0 cos ϕ

Of the above operators, using (20.28) and (20.30) we recover (17.101) related to
the Euler angles such that

R3 = expðαAz Þ exp βAy expðγAz Þ: ð20:31Þ

Using exponential functions of matrices, (20.31) gives a transformation matrix


containing the Euler angles and supplies us with a direct method to be able to make a
matrix representation of SO(3).
20.2 Rotation Groups: SU(2) and SO(3) 837

Although (20.31) is useful for the matrix representation in the real space, its
extension to abstract representation spaces would somewhat be limited. In this
respect, the method developed in SU(2) can widely be used to produce representa-
tion matrices for a representation space of an arbitrary dimension. Let us start with
constructing those representation matrices using (2, 2) complex matrices.

20.2.1 Construction of SU(2) Matrices

In Sect. 16.1 we mentioned a general linear group GL(n, ℂ). There are many sub-
groups of GL(n, ℂ). Among them, SU(n), in particular SU(2), plays a central role in
theoretical physics and chemistry. In this section we examine how to construct
SU(2) matrices.
Definition 20.1 A group consisting of (n, n) unitary matrices with a determinant 1 is
called a special unitary group of degree n and denoted by SU(n).
In Chap. 14 we mentioned that the “absolute value” of determinant of a unitary
matrix is 1. In Definition 20.1, however, there is an additional constraint in that we
are dealing with unitary matrices whose determinant is 1.
The SU(2) matrices U have the following general form:

a b
U= , ð20:32Þ
c d

where a, b, c, and d are complex numbers. According to Definition 20.1 we try to


seek conditions for these numbers. The unitarity condition UU{ = U{U = E reads as

ac þ bd
2
a b a c aj2 þ b
=
c d b d a c þ b d cj2 þ d
2

2
ð20:33Þ
a c a b aj2 þ c a b þ c d 1 0
= = = :
b d c d ab þ cd  bj2 þ d
2 0 1

Then we have

ðiÞ jaj2 þ jbj2 = 1, ðiiÞ jcj2 þ jdj2 = 1, ðiiiÞ jaj2 þ jcj2 = 1,


ðivÞ jbj2 þ jdj2 = 1, ðvÞ a c þ b d = 0, ðviÞ a b þ c d = 0,
ðviiÞ ad - bc = 1:

Of the above conditions, (vii) comes from detU = 1. Condition (iv) can be
eliminated by conditions (i)-(iii). From (v) and (vii), using Cramer’s rule we have
838 20 Theory of Continuous Groups

0 b a 0
1 a - b -b 1 a
c=   = = - b , d=   =
a b jaj2 þjbj2 a b jaj2 þjbj2
-b a -b a
= a : ð20:34Þ

From (20.34) we see that (i)–(iv) are equivalent and that (v) and (vi) are redun-
dant. Thus, as a general form of U we get

a b
U= with aj2 þ bj2 = 1: ð20:35Þ
- b a

As a result, we have four freedoms with a and b with a restricting condition of


|a| 2+|b|2 = 1. That is, we finally get three freedoms with choice of parameters.
Putting

a = p þ iq and b = r þ is ðp, q, r, s : realÞ,

we have

p2 þ q2 þ r 2 þ s2 = 1: ð20:36Þ

Therefore, the restricted condition of (20.35) is equivalent to that p, q, r, and s are


regarded as coordinates that are positioned on a unit sphere of ℝ4.
From (20.35), we have

a -b
U - 1 = U{ =  :
b a

Also putting

a1 b1 a2 b2
U1 = and U2 = , ð20:37Þ
- b1 a1 - b2 a2

we get

a1 a2 - b1 b2 a1 b2 þ b1 a2
U1U2 = : ð20:38Þ
- a2 b1 - a1 b2 - b1 b2 þ a1 a2

Thus, both U-1 and U1U2 satisfy criteria (20.35) and, hence, the unitary matrices
having a matrix form of (20.35) constitute a group; i.e., SU(2).
20.2 Rotation Groups: SU(2) and SO(3) 839

Next we wish to connect the SU(2) matrices to the matrices that represent the
angular momentum. In Sect. 3.4 we developed the theory of generalized angular
momentum. In the case of j = 1/2 we can obtain accurate information about the spin
states. Using (3.101) as a starting point and defining the state belonging to the spin
1 0
angular momentum μ = - 1/2 and μ = 1/2 as and , respectively, we
0 1
obtain

0 0 0 1
J ð þÞ = and J ð- Þ = : ð20:39Þ
1 0 0 0

1 0
Note that andare in accordance with (2.64) as a column vector
0 1
representation. Using (3.72), we have

1 0 1 1 0 i 1 -1 0
Jx = , Jy = , Jz = : ð20:40Þ
2 1 0 2 -i 0 2 0 1

1 1
In (20.40) Jz was given so that we may have J z =- 1
2 and
0 0
0 0
Jz = 1
2 . Deleting the coefficient 12 from (20.40), we get Pauli spin matrices
1 1
such that

0 1 0 i -1 0
σx = , σy = , σz = : ð20:41Þ
1 0 -i 0 0 1

Multiplying -i on (20.40), we have anti-Hermitian operators described by

1 0 -i 1 0 1 1 i 0
ζx = , ζy = , ζz = : ð20:42Þ
2 -i 0 2 -1 0 2 0 -i

Notice that these are traceless anti-Hermitian matrices.


Now let us seek rotation matrices in a manner related to that deriving (20.31) but
at a more general level. From Property (7)' of Chap. 15 (see Sect. 15.2), we know that

expðtAÞ

with an anti-Hermitian operator A combined with a real number t produces a unitary


matrix. Let us apply this to the anti-Hermitian matrices described by (20.42).
Using ζz of (20.42), we calculate exp(αζ z) as in (20.27). We show only the results
described by
840 20 Theory of Continuous Groups

α α
α α cos þ i sin 0
expðαζ z Þ = E  cos - iσ z sin = 2 2
α α
2 2 0 cos - i sin ð20:43Þ
2 2
iα=2
e 0
= :
0 e - iα=2

Readers are recommended to derive this. Using ζ y and applying similar calcula-
tion procedures to the above, we also get

β β
cos sin
exp βζ y = 2 2 : ð20:44Þ
β β
- sin cos
2 2

The representation matrices describe “complex rotations” and (20.43) and (20.44)
ð1=2Þ
correspond to (20.28) and (20.30), respectively. We define Dα,β,γ as the representa-
tion matrix that describes the successive rotations α, β, and γ and corresponds to R~3
in (17.101) and (20.31) including Euler angles. As a result, we have

ð1=2Þ
Dα,β,γ = expðαζz Þ exp βζ y expðγζ z Þ
β β
e2ðαþγÞ cos e2ðα - γÞ sin
i i

= 2 2 , ð20:45Þ
β β
- e2ðγ - αÞ sin e - 2ðαþγÞ cos
i i

2 2
ð1=2Þ
where the superscript 1/2 of Dα,β,γ corresponds to the non-negative generalized
angular momentum j (=1/2). Equation (20.45) can be obtained by putting a =
e2ðαþγÞ cos β2 and b = e2ðα - γÞ sin β2 in the matrix U of (20.35). Using this general
i i

SU(2) representation matrix, we construct representation matrices of higher orders.

20.2.2 SU(2) Representation Matrices: Wigner Formula

Suppose that we have two functions (or vectors) v and u. We regard v and u as
linearly independent vectors that undergo a unitary transformation by a unitary
matrix U of (20.35). That is, we have

a b
v ′ u ′ = ðv uÞU = ðv uÞ , ð20:46Þ
- b a

where (v u) are transformed into (v′ u′). Namely, we have


20.2 Rotation Groups: SU(2) and SO(3) 841

v0 = av - b u, u0 = bv þ a u:

Let us further define according to Hamermesh [5] an orthonormal set of 2j + 1


functions such that

u jþm v j - m
fm = ðm = - j; - j þ 1; ⋯; j - 1; jÞ, ð20:47Þ
ðj þ mÞ!ðj - mÞ!

where j is an integer or a half-odd-integer. Equation (20.47) implies that 2j + 1


monomials fm of 2j-degree with respect to v and u constitute the orthonormal basis.
Related discussion can be seen in Sect. 20.3.2. The description that follows includes
contents related to generalized angular momenta (see Sect. 3.4).
We examine how these functions fm (or vectors) are transformed by (20.45). We
denote this transformation by Rða, bÞ. That is, we have

ðjÞ
Rða; bÞðf m Þ  f m ′ Dm ′ m ða; bÞ, ð20:48Þ
m′

ðjÞ
where Dm0 m ða, bÞ denotes matrix elements with respect to the transformation.
Replacing u and v with u′ and v′, respectively, and using the binomial theorem we get

1
Rða, bÞf m ¼ ðbv þ a uÞjþm ðav - b uÞj - m
ðj þ mÞ!ðj - mÞ!
1
¼ μ,ν
ðbv þ a uÞjþm ðav - b uÞj - m
ðj þ mÞ!ðj - mÞ!
1 ðjþmÞ!ðj-mÞ!
¼ ða uÞjþm-μ ðbvÞμ ð-b uÞj-m-ν ðavÞν
μ,ν ðjþmÞ!ðj-mÞ! ðjþm-μÞ!μ!ðj-m-νÞ!ν!

ðj þ mÞ!ðj - mÞ!
¼ ða Þjþm - μ bμ ð- b Þj - m - ν aν u2j - μ - ν vμþν :
ðj þ m - μÞ!μ!ðj - m - νÞ!ν!
μ,ν

ð20:49Þ
Here, to eliminate ν, putting

2j - μ - ν = j þ m0 , μ þ ν = j - m0 ,
we get

ð j þ mÞ!ð j - mÞ!
ℜða, bÞf m ¼
μ, m 0
ð j þ m - μ Þ!μ! ðm0 - m þ μÞ!ð j - m0 - μÞ!
0
ða Þ jþm - μ bμ ð- b Þm - mþμ a j - m
0
- μ jþm0 j - m0
u v : ð20:50Þ

Expressing f m0 as
842 20 Theory of Continuous Groups

0 0
u jþm v j - m
f m0 = ðm0 = - j, - j þ 1, ⋯, j - 1, jÞ, ð20:51Þ
ð j þ m0 Þ!ð j - m0 Þ!

we obtain

ℜða,bÞð f m Þ
ð j þ mÞ!ð j-mÞ!ð j þ m0 Þ!ð j-m0 Þ! 0
ða Þ jþm-μ bμ ð-b Þm -mþμ a j-m - μ :
0
= f m0
μ, m 0
ð j þ m-μÞ!μ!ðm0 -m þ μÞ!ð j-m0 -μÞ!
ð20:52Þ

Equating (20.52) with (20.48), as the (m′, m) component of D( j )(a, b) we get

ð jÞ
Dm0 m ða, bÞ
ð j þ mÞ!ð j - mÞ!ð j þ m0 Þ!ð j - m0 Þ! 0
ða Þ jþm - μ bμ ð- b Þm - mþμ aj - m - μ :
0
=
μ ð j þ m - μÞ!μ!ðm0 - m þ μÞ!ð j - m0 - μÞ!
ð20:53Þ

In (20.53), factorials of negative integers must be avoided; see (3.216). Finally,


replacing a (or a) and b (or b) with e2ðαþγÞ cos β2 [or e - 2ðαþγÞ cos β2] and e2ðα - γÞ sin β2
i i i

[or e2ðγ - αÞ sin β2] by use of (20.45), respectively, we get


i

0
ð jÞ ð- 1Þm - mþμ ð j þ mÞ!ð j - mÞ!ð j þ m0 Þ!ð j - m0 Þ!
Dm0 m ðα, β, γ Þ =
μ ð j þ m - μÞ!μ!ðm0 - m þ μÞ!ð j - m0 - μÞ! ð20:54Þ
0 0
- iðαm0 þγmÞ β 2jþm - m - 2μ β m - mþ2μ
× e cos 2 sin 2 :

In (20.54), the notation D( j )(a, b) has been replaced with D( j )(α, β, γ), where α, β,
and γ represent Euler angles that appeared in (17.101). Equation (20.54) is called
Wigner formula [5–7].
ðjÞ
To confirm that Dm0 m is indeed unitary, we carry out the following calculations.
From (20.47) we have

juj2ð jþmÞ jvj2ð j - mÞ


j j j
ju jþm v j - m j
2
jf m j2 = = :
m= -j m= -j
ð j þ mÞ!ð j - mÞ! m= -j
ð j þ mÞ!ð j - mÞ!

Meanwhile, we have
20.2 Rotation Groups: SU(2) and SO(3) 843

ð2jÞ!juj2ð2 j - kÞ jvj2k ð2jÞ!juj2ð jþmÞ jvj2ð j - mÞ


2j 2j j
juj2 þ jvj2 = = ,
k=0
ð2j - kÞ!k! m= -j
ð j þ mÞ!ð j - mÞ!

where with the last equality k was replaced with j - m. Comparing the above two
equations, we get

j 2j
1
jf m j2 = juj2 þ jvj2 : ð20:55Þ
m= -j
ð2jÞ!

Viewing (20.46) as variables transformation and taking adjoint of (20.46), we


have

v0 v
= U{ , ð20:56Þ
u0 u

a b
where we defined U = (i.e., a unitary matrix). Operating (20.56) on

- b a
both sides of (20.46) from the right, we get

v0 v v
ð v0 u0 Þ = ðv uÞUU { = ð v uÞ :
u0 u u

That is, we have a quadratic form described by

ju0 j þ jv0 j ¼ juj2 þ jvj2 :


2 2
ð20:57Þ
j 2
From (20.55) and (20.57), we conclude that m = - j jf m j (i.e., the length of
ðjÞ
vector) is invariant under the transformation Dm0 m . This implies that the transforma-
ðjÞ
tion is unitary and, hence, that Dm0 m is a unitary matrix [5].

20.2.3 SO(3) Representation Matrices and Spherical Surface


Harmonics

Once representation matrices D( j )(α, β, γ) have been obtained, we are able to get
further useful information. We immediately notice that (20.54) is related to (3.216)
that is explicitly described by trigonometric functions. Putting m = 0 in (20.54) we
have
844 20 Theory of Continuous Groups

ðjÞ ðjÞ
Dm0 0 ðα, β, γ Þ = Dm0 0 ðα, βÞ
0
2j - m0 - 2μ m0 þ2μ
ð- 1Þm þμ j! ðj þ m0 Þ!ðj - m0 Þ! - iαm0 β β
= e cos sin :
μ ðj - μÞ!μ!ðm0 þ μÞ!ðj - m0 - μÞ! 2 2
ð20:58Þ

Note that (20.58) does not depend on the variable γ, because m = 0 leads to
eimγ = ei  0  γ = e0 = 1 in (20.54). Notice also that the event of m = 0 in (20.54)
never happens with the case where j is a half-odd-integer but happens with the case
where j is a half-odd-integer but happens with the case where j is an integer l. In fact,
D(l )(α, β, γ) is a (2l + 1)-degree representation of SO(3). Later in this section, we will
give a full matrix form in the case of l = 1. Replacing m′ with m in (20.58), we get

ðjÞ
Dm0 ðα, β, γ Þ
2j - m - 2μ
ð- 1Þmþμ j! ðj þ mÞ!ðj - mÞ! - iαm β β
mþ2μ
= e cos sin :
μ ðj - μÞ!μ!ðm þ μÞ!ðj - m - μÞ! 2 2
ð20:59Þ

For (20.59) to be consistent with (3.216), by changing notations of variables and


replacing j with an integer l we further rewrite (20.59) to get

ðlÞ
Dm0 ðϕ, θÞ
2l - 2r - m
ð- 1Þmþr l! ðl þ mÞ!ðl - mÞ! - imϕ θ θ
2rþm
= e cos sin :
r r!ðl - m - r Þ!ðl - r Þ!ðm þ r Þ! 2 2
ð20:60Þ

Notice that in (20.60) we deleted the variable γ as it was redundant. Comparing


(20.60) with (3.216), we get [5]

ðlÞ  4π m
Dm0 ðϕ, θÞ = Y ðθ, ϕÞ: ð20:61Þ
2l þ 1 l

From a general point of view, let us return to a discussion of SU(2) and introduce
a following important theorem in relation to the basis functions of SU(2).
Theorem 20.1 [1] Let {ψ -j, ψ -j + 1, ⋯, ψ j} be a set of (2j + 1) functions. Let D( j )
be a representation of the special unitary group SU(2). Suppose that we have a
following expression for ψ m, k (m = - j, ⋯, j) such that
20.2 Rotation Groups: SU(2) and SO(3) 845

Fig. 20.2 Coordinate


systems O, I, and II and their
( , , )
I
transformation. Regarding
the symbols and notations,
see text

( , , )
II


ðjÞ
ψ m,k ðα, β, γ Þ = Dmk ðα, β, γ Þ , ð20:62Þ

where α, β, and γ are Euler angles that appeared in (17.101). Then, the above
functions are basis functions of the representation of D( j ).
Proof In Fig. 20.2 we show coordinate systems O, I, and II. Let P, Q, R be operators
of transformation (i.e., rotation) between the coordinate system as shown. The
transformation Q is defined as the coordinate transformation that transforms I to
II. Note that their inverse transformations, e.g., Q-1 exist. In. Fig. 20.2, P and R are
specified as P(α0, β0, γ 0) and R(α, β, γ), respectively. Then, we have [1, 2]

Q - 1 R = P: ð20:63Þ

According to the expression (19.12), we have

ðjÞ 
Qψ m,k ðα, β, γ Þ = ψ m,k Q - 1 ðα, β, γ Þ = ψ m,k ðα0 , β0 , γ 0 Þ = Dmk Q - 1 R ,

where the last equality comes from (20.62) and (20.63); (α, β, γ) and (α0, β0, γ 0) stand
for the transformations R(α, β, γ) and P(α0, β0, γ 0), respectively. The matrix element
ðjÞ
Dmk Q - 1 R can be written as


ðjÞ ðjÞ ðjÞ
Dmk Q - 1 R = Dðmn

Q - 1 Dnk ðRÞ = Dðnm

ðQÞ Dnk ðRÞ,
n n
846 20 Theory of Continuous Groups

where with the last equality we used the unitarity of the representation matrix D( j ).
Taking complex conjugate of the above expression, we get

ðjÞ
Qψ m,k ðα, β, γ Þ = ψ m,k ðα0 , β0 , γ 0 Þ = Dðnm

ðQÞ Dnk ðRÞ
n


ðjÞ
= Dðnm

ðQÞ Dnk ðα, β, γ Þ = ψ n,k ðα, β, γ ÞDðnm

ðQÞ, ð20:64Þ
n n

where with the last equality we used the supposition (20.62). Equation (20.64)
implies that ψ m, k (m = - j, ⋯, 0, ⋯, j) are basis functions of the representation
of D( j ). This completes the proof.
If we restrict Theorem 20.1 to SO(3), we get an important result. Replacing j with
an integer l and putting k = 0 in (20.62) and (20.64), we get

Qψ m,0 ðα, β, γ Þ = ψ m,0 ðα0 , β0 , γ 0 Þ = ψ n,0 ðα, β, γ ÞDðnm



ðQÞ,
n

ðlÞ 
ψ m,0 ðα, β, γ Þ = Dm0 ðα, β, γ Þ : ð20:65Þ

This expression shows that the complex conjugates of individual components for
the “center” column vector of the representation matrix give the basis functions of
the representation of D(l ). Removing γ as it is redundant and by the aid of (20.61), we
get

4π m
ψ m,0 ðα, βÞ  Y ðβ, αÞ:
2l þ 1 l

Thus, in combination with (20.61), Theorem 20.1 immediately indicates that the
spherical surface harmonics Y m l ðθ, ϕÞ ðm = - l, ⋯, 0, ⋯, lÞ span the representation
space with respect to D(l ). That is, we have

l ðβ, αÞ =
QY m Y nl ðβ, αÞDðnm

ðQÞ: ð20:66Þ
n

Now, let us get back to (20.63). From (20.63), we have

Q = RP - 1 : ð20:67Þ

Since P represents an arbitrary transformation, we may choose any (orthogonal)


coordinate system I in ℝ3. Meanwhile, a transformation which transforms I into II is
necessarily present and given by Q that is expressed as (20.67); see (11.72) and the
relevant argument of Sect. 11.4. Consequently, Q represents an arbitrary transfor-
mation in ℝ3 in turn.
20.2 Rotation Groups: SU(2) and SO(3) 847

Now, in (20.67) putting P = E we get Q  R(α, β, γ). Then, replacing Q with R in


(20.66) as a special case we have

l ðβ, αÞ =
Rðα, β, γ ÞY m Y nl ðβ, αÞDðnm

ðα, β, γ Þ: ð20:68Þ
n

The LHS of (20.68) is associated with three successive transformations. To


describe it appropriately, let us recall (19.12) again. We have

Rf ðrÞ = f R - 1 ðrÞ : ð19:12Þ

Now, we are thinking of the case where R in (19.12) is described by R(α, β, γ) or


R3 = Rzα Ry0 0 β Rz0000 γ in (17.101). That is, we have

Rðα, β, γ Þf ðrÞ = f Rðα, β, γ Þ - 1 ðrÞ :

Considering the constitution of R3 of (17.101), we readily get [5]

Rðα, β, γ Þ - 1 = Rð- γ, - β, - αÞ:

Thus, we have

Rðα, β, γ Þf ðrÞ = f ½Rð- γ, - β, - αÞðrÞ: ð20:69Þ

Applying (20.69) to (20.68), we get the successive (inverse) transformations of


the functions such that

l ðβ, αÞ → Y l ðβ, α - γ Þ → Y l ðβ - β, α - γ Þ → Y l ðβ - β, α - γ - αÞ:


Ym m m m

The last entry in the above is [5]

l ðβ - β, α - γ - αÞ = Y l ð0, - γ Þ:
Ym m

Then, from (20.68) we get

l ð0, - γ Þ =
Ym Y nl ðβ, αÞDðnm

ðα, β, γ Þ: ð20:70Þ
n

Meanwhile, from (20.61) we have


848 20 Theory of Continuous Groups

2l þ 1 ðlÞ 
l ð0, - γ Þ =
Ym

Dm0 ð- γ, 0Þ :

Returning to (20.60), if we would have sin 2θ factor in (20.60) or in (3.216), the


ðlÞ
relevant terms vanish with Dm0 ð- γ, 0Þ except for special cases where 2r + m = 0 in
(3.216) or (20.60). But, to avoid having a negative integer inside the factorial in the
denominator of (20.60), must have r + m ≥ 0. Since r ≥ 0, we must have
2r + m = (r + m) + r ≥ 0 as well. In other words, only if r = m = 0 in (20.60),
ðlÞ
we have 2r + m = 0, for which Dm0 ð- γ, 0Þ does not vanish. Using these conditions
in (20.60), (20.60) is greatly simplified to be

ðlÞ
Dm0 ð- γ, 0Þ = δm0 :

From (20.61) we get [5]

2l þ 1
l ð0, - γ Þ =
Ym
4π 0m
δ : ð20:71Þ

Notice that this expression is consistent with (3.147) and (3.148). Meanwhile,
replacing ϕ and θ in (20.61) with α and β, respectively, multiplying the resulting
ðlÞ
equation by Dmk ðα, β, γ Þ, and further summing over m, we get

4π m ðlÞ ðlÞ  ðlÞ


Y ðβ, αÞDmk ðα, β, γ Þ = Dm0 ðα, β, γ Þ Dmk ðα, β, γ Þ = δ0k
m 2l þ 1 l m
ð20:72Þ
4π k
= Y ð0, - γ Þ,
2l þ 1 l

where with the second equality we used the unitarity of the matrices Dðnm

ðα, β, γ Þ (see
Sect. 20.2.2) and with the last equality we used (20.70) multiplied by the constant

2lþ1. At the same time, we recovered (20.71) comparing the last two sides of
(20.72).
Rewriting (20.48) in a (2j + 1, 2j + 1) matrix form, we get

f - j f - jþ1 ⋯ f j - 1 f j ℜða, bÞ
D - j, - j ⋯ D - j,j
ð20:73Þ
= f - j f - jþ1 ⋯ f j - 1 f j ⋮ ⋱ ⋮ ,
Dj, - j ⋯ Dj,j

where fm (m = - j, -j + 1, ⋯, j - 1, j) is described as in (20.47). Though abstract,


the matrix on RHS of (20.73) gives (2j + 1)-dimensional representation D( j ) of
20.2 Rotation Groups: SU(2) and SO(3) 849

(20.54). If j is an integer l, we can choose Y m l ðβ, αÞ for individual fm (i.e., basis


functions).
To get used to the abstract representation theory in continuous groups, we wish to
think of a following trivial case: Putting j = 0 in (20.54), (20.54) becomes trivial, but
still valid. In that case, we have m′ = m = 0 so that the inside of the factorial can be
non-negative. In turn, we must have μ = 0 as well. Thus, we have

ð0Þ
D00 ðα, β, γ Þ = 1:

Or, inserting l = m = 0 into (20.61), we have

ð0Þ  p
D00 ðϕ, θÞ = 1 = 4π Y 00 ðθ, ϕÞ, i:e:, Y 00 ðθ, ϕÞ = 1=4π:

Thus, we recover (3.150).


Next, let us reproduce three-dimensional representation D(1) using (20.54). A full
matrix form is expressed as

D - 1, - 1 D - 1,0 D - 1,1
ð1Þ
D = D0, - 1 D0,0 D0,1
D1, - 1 D1,0 D1,1
β 1 iα β
eiðαþγÞ cos 2 p e sin β eiðα - γÞ sin 2 ð20:74Þ
2 2 2
1 1
= - p eiγ sin β cos β p e - iγ sin β :
2 2
β 1 β
eiðγ - αÞ sin 2 - p e - iα sin β e - iðαþγÞ cos 2
2 2 2

Equation (20.74) is a special case of (20.73), in which j = 1. Note that (20.74) is a


unitary matrix. The confirmation is left for readers. The complex conjugate of the
column vector m = 0 in (20.61) with l = 1 is described by

1
p e - iα sin β
ð1Þ  2
Dm0 ðϕ; θÞ = cos β ð20:75Þ
1 iα
- p e sin β
2

that is related to the spherical surface harmonics (see Sect. 3.6); also see p. 760,
Table 15.4 of Ref. [4].
Now, (20.25), (20.61), (20.65), and (20.73) provide a clue to relating D(1) to the
rotation matrix of ℝ3, i.e., R~3 of (17.101) that includes Euler angles. The argument is
as follows: As already shown in (20.61) and (20.65), we have related the basis
850 20 Theory of Continuous Groups

vectors ( f-1 f0 f1) of D(1) to the spherical harmonics Y 1- 1 Y 01 Y 11 . Denoting the


spherical basis by

~ 1  p1 ðx - iyÞ,
f- f~0  z,
1
f~1  p ð- x - iyÞ, ð20:76Þ
2 2

we have

1 1
p 0 - p
2 2
~ 1 f~0 f~1 = ðx z yÞU = ðx z yÞ
f- , ð20:77Þ
0 1 0
i i
- p - p
2 0 2

where we define a (unitary) matrix U as

1 1
p 0 - p
2 2
U 0 1 0 :
i i
- p - p
2 0 2

Equation (20.77) describes the relationship among two set of functions


~ 1 f~0 f~1 and (x z y). We should not regard (x z y) as coordinate in ℝ3 as
f-
mentioned in Sect. 20.1. The (3, 3) matrix of (20.77) is essentially the same as P of
(20.22). Following (20.73) and taking account of (20.25), we express
~ 1 0 f~0 0 f~1 0 as
f-

~ 1 0 f~0 0 f~1 0  f -
f- ~ 1 f~0 f~1 Dð1Þ :
~ 1 f~0 f~1 Rða, bÞ = f - ð20:78Þ

~ 1 f~0 f~1 forms the basis vectors of D(1) as


This means that a set of functions f -
well. Meanwhile, the relation (20.77) holds after the transformation, because (20.77)
is independent of the choice of specific coordinate systems. That is, we have

~ 1 0 f~0 0 f~1 0 = ðx0 z0 y0 Þ U:


f-

Then, from (20.78) we get


20.2 Rotation Groups: SU(2) and SO(3) 851

~ 1 0 f~0 0 f~1 0 U - 1 = f -
ð x0 z 0 y0 Þ = f - ~ 1 f~0 f~1 Dð1Þ U - 1

= ðx z yÞUDð1Þ U - 1 : ð20:79Þ

Defining a unitary matrix V as

1 0 0
V= 0 0 1
0 1 0

and operating V on both sides of (20.79) from the right, we get

ðx0 z0 y0 ÞV = ðx z yÞV  V - 1 UDð1Þ U - 1 V:

That is, we have

-1
ð x0 y0 z 0 Þ = ð x y z Þ U - 1 V Dð1Þ U - 1 V: ð20:80Þ

Note that (x′ z′ y′)V = (x′ y′ z′) and (x z y)V = (x y z); that is, V exchanges the order
of z and y in (x z y).
Meanwhile, regarding (x y z) as the basis vectors in ℝ3 we have

ðx0 y0 z0 Þ = ðx y zÞ R , ð20:81Þ

where R represents a (3, 3) orthogonal matrix. Equation (20.81) represents a rotation


in SO(3). Since, x, y, and z are linearly independent, comparing (20.80) and (20.81)
we get

-1 {
R  U - 1V Dð1Þ U - 1 V = U - 1 V Dð1Þ U - 1 V: ð20:82Þ

β β
More explicitly, using a = e2ðαþγÞ cos and b = e2ðα - γÞ sin
i i
2 2 as in the case of
(20.45), we describe R as

Reða2 - b2 Þ - Imða2 þb2 Þ 2ReðabÞ


R= Imða2 - b2 Þ Reða2 þb2 Þ 2ImðabÞ : ð20:83Þ
- 2Reðab Þ 2Imðab Þ jaj2 - jbj2

In (20.83), Re and Im denote a real part and an imaginary part of a complex


number, respectively.
Comparing (20.83) with (17.101), we find out that R is identical to R~3 of
(17.101). Equations (20.82) and (20.83) clearly show that D(1) and R  R~3 are
connected to each other through the unitary similarity transformation, even though
these matrices differ in that the former matrix is unitary and the latter is a real
orthogonal matrix. That is, D(1) and R~3 are equivalent in terms of representation; see
852 20 Theory of Continuous Groups

Schur’s First Lemma of Sect. 18.3. Thus, D(1) is certainly a representation matrix of
SO(3). The trace of D(1) and R~3 is given by

TrDð1Þ = Tr R~3 = cosðα þ γ Þð1 þ cos βÞ þ cos β:

Let us continue the discussion still further. We think of the quantity (x/r y/r z/r),
where r is radial coordinate of (20.25). Operating (20.82) on (x/r y/r z/r) from the
right, we have

ðx=r y=r z=r ÞR = ðx=r y=r z=r ÞV - 1 UDð1Þ U - 1 V


= f - 1 =r f 0 =r f 1 =r Dð1Þ U - 1 V:

From (20.25), (20.61), (20.75), and (20.76), we get

~ 1 =r f~0 =r f~1 =r = p1 e - iα sin β cos β - p1 eiα sin β :


f- ð20:84Þ
2 2

Using (20.74) and (20.84), we obtain

~ 1 =r f~0 =r f~1 =r Dð1Þ = ð0 1 0Þ:


f-

This is a tangible example of (20.72). Then, we have

~ 1 =r f~0 =r f~1 =r Dð1Þ U - 1 V = ðx=r y=r z=r ÞR = ð0 0 1Þ:


f- ð20:85Þ

For the confirmation of (20.85), use the spherical coordinate representation (3.17)
to denote x, y, and z in the above equation. This seems to be merely a confirmation of
the unitarity of representation matrices; see (20.72). Nonetheless, we emphasize that
the simple relation (20.85) holds only with a special case where the parameters α and
β in D(1) (or R ) are identical with the angular components of spherical coordinates
associated with the Cartesian coordinates x, y, and z. Equation (20.85), however,
does not hold in a general case where the parameters α and β are different from those
angular components. In other words, (20.68) that describes the special case where
Q  R(α, β, γ) leads to (20.85); see Fig. 20.2. Equation (20.66), however, is used for
the general case where Q represents an arbitrary transformation in ℝ3.
Regarding further detailed discussion on the topics, readers are referred to the
literature [1, 2, 5, 6].
20.2 Rotation Groups: SU(2) and SO(3) 853

20.2.4 Irreducible Representations of SU(2) and SO(3)

In this section we further explore characteristics of the representation of SU(2) and


SO(3) and show the irreducibility of the representation matrices of SU(2) and SO(3).
To prove the irreducibility of the SU(2) representation matrices D( j )(α, β, γ), we
need to use Schur’s lemmas already explained in Sect. 18.3. At present, our task is to
find a suitable matrix A that commutes with D( j )(α, β, γ). Since a general form of
D( j )(α, β, γ) has a complicated structure as shown in (20.54), it would be a formida-
ble task to directly deal with D( j )(α, β, γ) to verify its irreducibility. To gain clear
understanding, we impose a restriction on D( j )(α, β, γ) so that we can readily perform
related calculations [5]. To this end, in (20.54) we consider a special case where
β = 0; i.e., D( j )(α, 0, γ). The point is that if we can find a matrix A which commutes
with D( j )(α, 0, γ), the matrix that commutes with D( j )(α, β, γ) of a general form must
be of the type A as well (if such matrix does exist).
We wish to seek conditions under which D( j )(α, 0, γ) does not vanish. In (20.54)
the exponential term never vanishes. When β = 0, cos β2 ð= 1Þ does not vanish
0
β m - mþ2μ
either. For the factor sin 2 not to vanish, we must have

m0 - m þ 2μ = 0: ð20:86Þ

In the denominator of (20.54), to avoid a negative number in factorials we must


have m′ - m + μ ≥ 0 and μ ≥ 0. If μ > 0, m′ - m + 2μ = (m′ - m + μ) + μ > 0 as well.
Therefore, for (20.86) to hold, we must have μ = 0. From (20.86), in turn, this means
m′ - m = 0, i.e., m′ = m. Then, we have a greatly simplified expression with the
ðjÞ
matrix elements Dm0 m ðα, 0, γ Þ such that

ðjÞ 0
Dm0 m ðα, 0, γ Þ = δm0 m e - iðαm þγmÞ : ð20:87Þ

This implies that D( j )(α, 0, γ) is diagonal. Suppose that we have a matrix


A = ðAm0 m Þ. If A commutes with D( j )(α, 0, γ), it must be diagonal from (20.87).
That is, A should have a form Am0 m = am δm0 m . This can readily be confirmed using
simple matrix algebra. The confirmation is left for readers. We seek further condi-
tions for A to commute with D( j ) that has a general form of D( j )(α, β, γ) (β ≠ 0). For
this purpose, we consider the ( j, m) component of D( j )(α, β, γ). Putting m′ = j in
(20.54), we obtain ( j - m′ - μ) ! = (-μ)! in the denominator. Then, for (20.54) to
be meaningful, we must have μ = 0. Hence, we get

ðjÞ
Djm ðα, β, γ Þ
jþm j-m
ð2jÞ! β β
= ð- 1Þj - m e - iðαjþγmÞ cos sin : ð20:88Þ
ðj þ mÞ!ðj - mÞ! 2 2
854 20 Theory of Continuous Groups

With respect to (20.88), we think of the following three cases: (i) m ≠ j, m ≠ - j;


ðjÞ
(ii) m = j; (iii) m = - j. For the case (i), Djm ðα, β, γ Þ vanishes only when β = 0 or π.
ðjÞ
For the case (ii), it vanishes only when β = π; in the case (iii), Djm ðα, β, γ Þ vanishes
ðjÞ
only when β = 0. The matrix element Djm ðα, β, γ Þ does not vanish other than the
above three cases.
Taking account of the above consideration, we continue our discussion. If
A commutes with D( j )(α, β, γ), then with respect to the ( j, m) component of
AD( j )(α, β, γ) we have

ðjÞ ðjÞ ðjÞ


ADðjÞ = A D
k jk km
= aδ D
k k jk km
= aj Djm : ð20:89Þ
jm

Meanwhile, we obtain

ðjÞ ðjÞ ðjÞ


DðjÞ A = k
Djk Akm = k
Djk am δkm = am Djm : ð20:90Þ
jm

From (20.89) and (20.90), for A to commute with D( j ) we must have

ðjÞ
Djm ðα, β, γ Þ aj - am = 0: ð20:91Þ

ðjÞ
Considering that the matrix elements Djm ðα, β, γ Þ in (20.88) do not vanish in a
general case for α, β, γ, from (20.91) we must have aj - am = 0 for m that satisfies
1 ≤ m ≤ j - 1. When m = j, we have a trivial equation 0 = 0. This implies that aj
cannot be determined. Thus, putting am = aj  a ≠ 0 (1 ≤ m ≤ j), we have

Am0 m = aδm0 m ,

where a is an arbitrary complex constant. Then, from Schur’s Second Lemma we


conclude that D( j )(α, β, γ) must be irreducible.
We should be careful, however, about the application of Schur’s Second Lemma.
This is because Schur’s Second Lemma described in Sect. 18.3 says the following:
A representation D is irreducible ⟹ The matrix M that is commutative with all D(g)
is limited to a form of M = cE.
Nonetheless, we are uncertain of truth or falsehood of the converse proposition.
Fortunately, however, the converse proposition is true if the representation is unitary.
We prove the converse proposition by its contraposition such that
A representation D is not irreducible (i.e., reducible) ⟹ Of the matrices that are
commutative with all D(g), we can find at least one matrix M of the type other
than M = cE.
20.2 Rotation Groups: SU(2) and SO(3) 855

In this context, Theorem 18.2 (Sect. 18.2) as well as (18.38) teaches us that the
unitary representation D(g) is completely reducible so that we can describe it as a
direct sum such that

DðgÞ = Dð1Þ ðgÞ Dð2Þ ðgÞ ⋯ DðωÞ ðgÞ, ð20:92Þ

where g is any group element. Then, we can choose a following matrix A for a linear
transformation that is commutative with D(g):

A = αð1Þ E 1 αð2Þ E 2 ⋯ αðωÞ Eω , ð20:93Þ

where α(1), ⋯, α(ω) are complex constants that may be different from one another;
E1, ⋯, Eω are unit matrices having the same dimension as D(1)(g), ⋯, D(ω)(g),
respectively. Then, even though A is not a type of A = αδm0 m , A commutes with D(g)
for any g.
Here we rephrase Schur’s Second Lemma as the following theorem.
Theorem 20.2 Let D be a unitary representation of a group G. Suppose that with
8
g 2 G we have

DðgÞM = MDðgÞ: ð18:48Þ

A necessary and sufficient condition for the said unitary representation to be


irreducible is that linear transformations commutative with D(g)(8g 2 G) are limited
to those of a form described by A = αδm0 m , where α is a (complex) constant.
Next, we examine the irreducibility of the SO(3) representation matrices. We
develop the discussion in conjunction with explanation about the important proper-
ties of the representation matrices of SU(2) as well as SO(3). We rewrite (20.54)
when j = l (l : zero or positive integers) to relate it to the rotation in ℝ3. Namely, we
have
0

ðlÞ ð- 1Þm - mþμ ðl þ mÞ!ðl - mÞ!ðl þ m0 Þ!ðl - m0 Þ!


D ðα, β, γ Þ =
m0 m μ ðl þ m - μÞ!μ!ðm0 - m þ μÞ!ðl - m0 - μÞ!
2lþm - m0 - 2μ m0 - mþ2μ
- iðαm0 þγmÞ β β
×e cos sin :
2 2
ð20:94Þ

Meanwhile, as mentioned in Sect. 20.2.3 the transformation R~3 (appearing in


Sect. 17.4.2) described by
856 20 Theory of Continuous Groups

R~3 = Rzα R ′ y0 β R″z00 γ

may be regarded as R(α, β, γ) of (20.63). Consequently, the representation matrix


D(l )(α, β, γ) is described by

ðlÞ
DðlÞ ðα, β, γ Þ  DðlÞ R~3 = DðαlÞ ðRzα ÞDβ R0y0 β DðγlÞ R ′ ′ z00 γ : ð20:95Þ

Equation (20.95) is based upon the definition of representation (18.2). This


ð1=2Þ
notation is in parallel with (20.45) in which Dα,β,γ is described by the product of
three exponential functions of matrices each of which was associated with the
rotation characterized by Euler angles of (17.101). Denoting
ðlÞ ðlÞ ðlÞ 0 ðlÞ ðlÞ ðlÞ
Dα ðRzα Þ = D ðα, 0, 0Þ, Dβ Ry0 β = D ð0, β, 0Þ, and Dγ R ′ ′ z γ = D ð0, 0, γ Þ,
00

we have

DðlÞ ðα, β, γ Þ = DðlÞ ðα, 0, 0ÞDðlÞ ð0, β, 0ÞDðlÞ ð0, 0, γ Þ, ð20:96Þ

where with each representation two out of three Euler angles are zero.
In light of (20.94), we estimate each factor of (20.96). By the same token as
before, putting β = γ = 0 in (20.94) let us examine on what conditions
m0 - mþ2μ
sin 02 survives. For this factor to survive, we must have m′ -
m + 2μ = 0 as in (20.87). Then, following the argument as before, we should have
μ = 0 and m′ = m. Thus, D(l )(α, 0, 0) must be a diagonal matrix. This is the case with
D(l )(0, 0, γ) as well.
Equation (3.216) implies that the functions Y m l ðβ, αÞ ðm = - l, ⋯, 0, ⋯, lÞ are
eigenfunctions with regard to the rotation about the z-axis. In other words, we have

- imα m
l ðθ, ϕÞ = Y l ðθ, ϕ - αÞ = e
Rzα Y m Y l ðθ, ϕÞ:
m

l ðθ, ϕÞ can be described as


This is because from (3.216) Y m

l ðθ, ϕÞ = F ðθ Þe
Ym imϕ
,

where F(θ) is a function of θ. Hence, we have

imðϕ - αÞ
l ðθ, ϕ - αÞ = F ðθ Þe
Ym = e - imα F ðθÞeimϕ = e - imα Y m
l ðθ, ϕÞ:

Therefore, with a full matrix representation we get


20.2 Rotation Groups: SU(2) and SO(3) 857

Y l- l Y l- lþ1 ⋯Y 0l ⋯Y ll Rzα
e - ið- lÞα

e - ið- lþ1Þα
= Y l- l Y l- lþ1 ⋯ Y 0l ⋯Y ll

e - ilα ð20:97Þ
ilα
e

eiðl - 1Þα
= Y l- l Y l- lþ1 ⋯Y 0l ⋯Y ll :

e - ilα

With a shorthand notation, we have

DðlÞ ðα, 0, 0Þ = e - imα δm0 m ðm = - l, ⋯, lÞ: ð20:98Þ

From (20.96), with the (m′, m) matrix elements of D(l )(α, β, γ) we get

DðlÞ ðα, β, γ Þ
m0 m

= DðlÞ ðα, 0, 0Þ DðlÞ ð0, β, 0Þ DðlÞ ð0, 0, γ Þ


s, t m0 s st tm
ð20:99Þ
= e - isα δm0 s DðlÞ ð0, β, 0Þ e - imγ δtm
s, t st

0
= e - im α DðlÞ ð0, β, 0Þ e - imγ :
m0 m

Meanwhile, from (20.94) we have

DðlÞ ð0, β, 0Þ
m0 m
0
2lþm - m0 - 2μ m0 - mþ2μ
ð- 1Þm - mþμ ðl þ mÞ!ðl - mÞ!ðl þ m0 Þ!ðl - m0 Þ! β β
¼ cos sin :
μ ðl þ m - μÞ!μ!ðm0 - m þ μÞ!ðl - m0 - μÞ! 2 2
ð20:100Þ

Thus, we find that [DðlÞ ðα, β, γ Þm0 m of (20.99) has been factorized into three
factors. Equation (20.94) has such an implication. The same characteristic is shared
with the SU(2) representation matrices (20.54) more generally. This can readily be
understood from the aforementioned discussion.
Taking D(1) of (20.74) as an example, we have
858 20 Theory of Continuous Groups

Dð1Þ ðα, β, γ Þ ¼ Dð1Þ ðα, 0, 0ÞDð1Þ ð0, β, 0ÞDð1Þ ð0, 0, γ Þ


β 1 β
cos 2 p sin β sin 2
iα 2 2 2
e 0 0 eiγ 0 0
1 1
¼ 0 1 0 - p sin β cos β p sin β 0 1 0
2 2
0 0 e - iα β 1 β 0 0 e - iγ
sin 2 - p sin β cos 2
2 2 2
β 1 β
eiðαþγ Þ cos 2 p e sin β

eiðα - γ Þ sin 2
2 2 2
1 1
¼ - p eiγ sin β cos β p e - iγ sin β :
2 2
β 1 - β
eiðγ - αÞ sin 2
- p e sin β e - iðαþγÞ cos 2

2 2 2
ð20:74Þ

In this way, we have reproduced the result of (20.74).


Once D(l )(α, β, γ) has been factorized as in (20.94), we can readily examine
whether the representation is irreducible. To know what kinds of matrices commute
with D(l )(α, β, γ), it suffices to examine whether the matrices commute with individ-
ual factors. The argument is due to Hamermesh [5].
(i) Let A be a matrix having a dimension the same as that of D(l )(α, β, γ),
namely A is a (2l + 1, 2l + 1) square matrix with a form of A = (aij). First, we examine
the commutativity of D(l )(α, 0, 0) and A. With the matrix elements we have

ADðlÞ ðα, 0, 0Þ m0 m
= am0 k DðlÞ ðα, 0, 0Þ km
= am0 k e - imα δkm
k k ð20:101Þ
- imα
= am 0 m e ,

where the second equality comes from (20.98). Also, we have

0
DðlÞ ðα, 0, 0ÞA = DðlÞ ðα, 0, 0Þ akm = am0 m e - im α : ð20:102Þ
m0 m m0 k
k

From (20.101) and (20.102), for a condition of commutativity we have

0
am0 m e - imα - e - im α = 0:

If m′ ≠ m, we should have am0 m = 0. If m′ = m, am0 m does not have to vanish but


may take an arbitrary complex number. Hence, A is a diagonal matrix, namely

A = am δm0 m , ð20:103Þ

where am is an arbitrary complex number.


20.2 Rotation Groups: SU(2) and SO(3) 859

(ii) Next we examine the commutativity of D(l )(0, β, 0) and A of


(20.103). Using (20.66), we have

l ðβ, αÞ =
QY m Y nl ðβ, αÞDðnm

ðQÞ: ð20:66Þ
n

Choosing R(0, θ, 0) for Q and putting α = 0 in (20.66), we have

l ðβ, 0Þ =
Rð0, θ, 0ÞY m Y nl ðβ, 0ÞDðnm

ð0, θ, 0Þ: ð20:104Þ
n

Meanwhile, from (20.69) we get

-1
l ðβ, 0Þ = Y l Rð0, θ, 0Þ
Rð0, θ, 0ÞY m m
ðβ, 0Þ = Y m
l ½Rð0, - θ, 0Þðβ, 0Þ
= Y l ðβ - θ, 0Þ:
m

ð20:105Þ

Combining (20.104) and (20.105) we get

l ðβ - θ, 0Þ =
Ym Y nl ðβ, 0ÞDðnm

ð0, θ, 0Þ: ð20:106Þ
n

Further putting β = 0 in (20.106), we have

l ð- θ, 0Þ =
Ym Y nl ð0, 0ÞDðnm

ð0, θ, 0Þ: ð20:107Þ
n

Taking account of (20.71), RHS does not vanish in (20.107) only when n = 0. On
this condition, we obtain

ðlÞ
l ð- θ, 0Þ = Y l ð0, 0ÞD0m ð0, θ, 0Þ:
Ym ð20:108Þ
0

Meanwhile, from (20.71) with m = 0, we have

2l þ 1
Y 0l ð0, 0Þ = : ð20:109Þ

Inserting this into (20.108), we have

2l þ 1 ðlÞ
l ð- θ, 0Þ =
Ym D ð0, θ, 0Þ:
4π 0m
ð20:110Þ

Replacing θ with -θ in (20.110), we get


860 20 Theory of Continuous Groups

ðlÞ 4π m
D0m ð0, - θ, 0Þ = Y ðθ, 0Þ: ð20:111Þ
2l þ 1 l

Compare (20.111) with (20.61) and confirm (20.111) using (20.74) with l = 1.
Viewing (20.111) as a “row vector,” (20.111) with l = 1 can be expressed as

ð1Þ ð1Þ ð1Þ


D0, - 1 ð0, - θ, 0Þ D0,0 ð0, - θ, 0Þ D0,1 ð0, - θ, 0Þ

4π - 1 1
= Y ðθ, 0Þ Y 01 ðθ, 0Þ Y 11 ðθ, 0Þ = p sin θ cos θ - p1 sin θ :
3 1 2 2

The above relation should be compared with (20.74).


Getting back to our topic, with the matrix elements we have

ADðlÞ ð0, θ, 0Þ = ak δ0k DðlÞ ð0, θ, 0Þ = a0 DðlÞ ð0, θ, 0Þ


0m km 0m
k

and

DðlÞ ð0, θ, 0ÞA = DðlÞ ð0, θ, 0Þ am δkm = DðlÞ ð0, θ, 0Þ am ,


0m 0k 0m
k

where we assume A = am δm0 m in (20.103). Since from (20.108) [D(l )(0, θ, 0)]0m does
not vanish other than specific values of θ, for A and D(l )(0, θ, 0) to commute we must
have

a m = a0

for all m = - l, - l + 1, ⋯, 0, ⋯, l - 1, l. Then, from (20.103) we get

A = a0 δ m 0 m , ð20:112Þ

where a0 is an arbitrary complex number. Thus, a matrix that commutes with both
D(l )(α, 0, 0) and D(l )(0, β, 0) must be of a form of (20.112). The same discussion
holds with D(l )(α, β, γ) of (20.94) as a whole with regard to the commutativity. On
the basis of Schur’s Second Lemma (Sect. 18.3), we conclude that the representation
D(l )(α, β, γ) is again irreducible. This is one of the prominent features of SO(3) as
well as SU(2).
As can be seen clearly in (20.97), the dimension of representation matrix
D(l )(α, β, γ) is 2l + 1. Suppose that there is another representation matrix
0 0
Dðl Þ ðα, β, γ Þ ðl0 ≠ lÞ. Then, D(l )(α, β, γ) and Dðl Þ ðα, β, γ Þ are inequivalent (see Sect.
18.3). Meanwhile, the unitary representation of SO(3) is completely reducible and,
hence, any reducible unitary representation D can be described by
20.2 Rotation Groups: SU(2) and SO(3) 861

D= al DðlÞ ðα, β, γ Þ, ð20:113Þ


l

where al is zero or a positive integer. If al ≥ 2, this means that the same represen-
tations D(l )(α, β, γ) repeatedly appear. Equation (20.113) provides a tangible exam-
ple of (20.92) expressed as

DðgÞ = Dð1Þ ðgÞ Dð2Þ ðgÞ ⋯ DðωÞ ðgÞ ð20:92Þ

and applies to the representation matrices of SU(2) more generally. We will encoun-
ter related discussion in Sect. 20.3 in connection with the direct-product
representation.

20.2.5 Parameter Space of SO(3)

As already discussed in Sect. 17.4.2, we need three angles α, β, and γ to specify the
rotation in ℝ3. Their domains are usually taken as follows:

0 ≤ α ≤ 2π, 0 ≤ β ≤ π, 0 ≤ γ ≤ 2π:

The domains defined above are referred to as a parameter space.


Yet, there are different implications depending upon choosing different coordi-
nate systems. The first choice is a moving coordinate system where α, β, and γ are
Euler angles (see Sect. 17.4.2). The second choice is a fixed coordinate system where
α is an azimuthal angle, β is a zenithal angle, and γ is a rotation angle. Notice that in
the former case α, β, and γ represent equivalent rotation angles. In the latter case,
however, although α and β define the orientation of a rotation axis, γ is designated as
a rotation angle. In the present section we further discuss characteristics of the
parameter space.
First, we study several characteristics of (3, 3) real orthogonal matrices. (i) The
(3, 3) real orthogonal matrices have eigenvalues 1, eiγ , and e-iγ (0 ≤ γ ≤ π). This is a
direct consequence of (17.77) and invariance of the characteristic equation of the
matrix. The rotation axis is associated with an eigenvector that belongs to the
eigenvalue 1.
Let A = (aij) (1 ≤ i, j ≤ 3) be a (3, 3) real orthogonal matrix. Let u be an
eigenvector belonging to the eigenvalue 1. We suppose that when we are thinking
of the rotation on the fixed coordinate system, u is given by a column vector such
that
862 20 Theory of Continuous Groups

u1
u= u2 :
u3

Then, we have

A u = 1u = u: ð20:114Þ

Operating AT on both sides of (20.114), we get

AT A u = Eu = u = AT u, ð20:115Þ

where we used the property of an orthogonal matrix; i.e., ATA = E. From (20.114)
and (20.115), we have

A - AT u = 0: ð20:116Þ

Writing (20.116) in matrix components, we have [5]

a11 a12 a13 a11 a21 a31 u1


a21 a22 a23 - a12 a22 a32 u2 = 0: ð20:117Þ
a31 a32 a33 a13 a23 a33 u3

That is,

ða12 - a21 Þu2 þ ða13 - a31 Þu3 = 0,


ða21 - a12 Þu1 þ ða23 - a32 Þu3 = 0, ð20:118Þ
ða31 - a13 Þu1 þ ða32 - a23 Þu2 = 0:

Solving (20.118), we get

u1 ∶u2 ∶u3 = ða32 - a23 Þ∶ða13 - a31 Þ∶ða21 - a12 Þ: ð20:119Þ

u1
If we normalize u, i.e., u12 + u22 + u32 = 1, u2 gives direction cosines of the
u3
eigenvector, namely the rotation axis. Equation (20.119) applies to any orthogonal
matrix. A simple example is (20.3); a more complicated example can be seen in
(17.107). Confirmation is left for readers. We use this property soon below.
Let us consider infinitesimal transformations of rotation. As already implied in
(20.6), an infinitesimal rotation θ around the z-axis is described by

1 -θ 0
Rzθ = θ 1 0 : ð20:120Þ
0 0 1

Now, we newly introduce infinitesimal rotation operators as below:


20.2 Rotation Groups: SU(2) and SO(3) 863

Fig. 20.3 Infinitesimal


rotations ξ, η, and ζ around
z
the x-, y-, and z-axes,
respectively

y
O

1 0 0 1 0 η
Rxξ = 0 1 -ξ , Ryη = 0 1 0 ,
0 ξ 1 -η 0 1

1 -ζ 0
Rzζ = ζ 1 0 : ð20:121Þ
0 0 1

These operators represent infinitesimal rotations ξ, η, and ζ around the x-, y-, and
z-axes, respectively (see Fig. 20.3). Note that these operators commute one another
to the first order of infinitesimal quantities ξ, η, and ζ. That is, we have

Rxξ Ryη = Ryη Rxξ , Ryη Rzζ = Rzζ Ryη , Rxξ Ryη Rzζ = Rzζ Ryη Rxξ , etc:

with, e.g.,

1 -ζ η
Rxξ Ryη Rzζ = ζ 1 -ξ ð20:122Þ
-η ξ 1

to the first order. Notice that the order of operations Rxξ, Ryη, and Rzζ is disregarded.
Next, let us consider a successive transformation with a finite rotation angle ω that
follows the infinitesimal rotations. Readers may well ask why and for what purpose
we need to make such elaborate calculations. It is partly because when we were
dealing with finite groups in the previous chapters, group elements were finite and
864 20 Theory of Continuous Groups

relevant calculations were straightforward accordingly. But now we are thinking of


continuous groups that possess an infinite number of group elements and, hence, we
have to consider a “density” of group elements. In this respect, we are now thinking
of how the density of group elements in the parameter space is changed according to
the rotation of a finite angle ω. The results are used for various group calculations
including orthogonalization of characters.
Getting back to our subject, we further operate a finite rotation of an angle ω
around the z-axis subsequent to the aforementioned infinitesimal rotations. The
discussion developed below is due to Hamermesh [5]. We denote this rotation by
Rω. Since we are dealing with a spherically symmetric system, any rotation axis can
equivalently be chosen without loss of generality. Notice also that the rotations of the
same rotation angle belong to the same conjugacy class [see (17.106) and
Fig. 17.18]. Defining RxξRyηRzζ  RΔ of (20.122), the rotation R that combines RΔ
and subsequently occurring Rω is described by

1 -ζ η cos ω – sin ω 0
R = RΔ Rω = ζ 1 -ξ sin ω cos ω 0
-η ξ 1 0 0 1
ð20:123Þ
cos ω - ζ sin ω - sin ω - ζ cos ω η
= sin ω þ ζ cos ω cos ω - ζ sin ω - ξ :
ξ sin ω - η cos ω ξ cos ω þ η sin ω 1

Hence, we have

R32 - R23 = ξ cos ω þ η sin ω - ð - ξÞ = ξð1 þ cos ωÞ þ η sin ω,


R13 - R31 = η - ðξ sin ω - η cos ωÞ = ηð1 þ cos ωÞ - ξ sin ω,
R21 - R12 = sin ω þ ζ cos ω - ð - sin ω - ζ cos ωÞ = 2ðsin ω þ ζ cos ωÞ:
ð20:124Þ

Since (20.124) gives a relative directional ratio of the rotation axis, we should
normalize a vector whose components are given by (20.124) to get direction cosines
of the rotation axis. Note that the direction of the rotation axis related to R of
(20.123) should be close to Rω and that only R12 - R21 in (20.124) has a term
(i.e., 2 sin ω) lacking infinitesimal quantities ξ, η, ζ. Therefore, it suffices to
normalize R12 - R21 to seek the direction cosines to the first order. Hence, dividing
(20.124) by 2 sin ω, as components of the direction cosines we get

ξð1 þ cos ωÞ η ηð1 þ cos ωÞ ξ


þ , - , 1: ð20:125Þ
2 sin ω 2 2 sin ω 2

Meanwhile, combining (17.78) and (20.123), we can find a rotation angle ω′ for
R in (20.123). The trace χ of (20.123) is written as

χ = 1 þ 2ðcos ω - ζ sin ωÞ: ð20:126Þ

From (17.78) we have


20.2 Rotation Groups: SU(2) and SO(3) 865

χ = 1 þ 2 cos ω0 : ð20:127Þ

Equating (20.126) and (20.127), we obtain

cos ω0 = cos ω - ζ sin ω: ð20:128Þ

Approximating (20.128), we get

1 02 1
1- ω ≈ 1 - ω2 - ζω: ð20:129Þ
2 2

From (20.129), we have a following approximate expression:

ðω0 þ ωÞðω0 - ωÞ ≈ 2ωðω0 - ωÞ ≈ 2ζω:

Hence, we obtain

ω0 ≈ ω þ ζ: ð20:130Þ

Combining (20.125) and (20.130), we get their product to the first order such that

ξð1 þ cos ωÞ η ηð1 þ cos ωÞ ξ


ω þ ,ω - , ω þ ζ: ð20:131Þ
2 sin ω 2 2 sin ω 2

Since these quantities are products of individual direction cosines and the rotation
angle ω + ζ, we introduce variables ~x, ~y, and ~z as the x-, y-, and z-related quantities,
respectively. Namely, we have

ξð1 þ cos ωÞ η ηð1 þ cos ωÞ ξ


~x = ω þ , ~y = ω - , ~z = ω þ ζ: ð20:132Þ
2 sin ω 2 2 sin ω 2

To evaluate how the density of group elements is changed as a function of the


rotation angle ω, we are interested in variations of ~x, ~y, and ~z that depend on ξ, η, and
ζ. To this end, we calculate the Jacobian J as
866 20 Theory of Continuous Groups

∂~x ∂~x ∂~x ωð1 þ cos ωÞ ω


∂ξ ∂η ∂ζ 2 sin ω 2
0
∂ð~x, ~y, ~zÞ ∂~y ∂~y ∂~y
J= = = ω ωð1 þ cos ωÞ
∂ðξ, η, ζ Þ ∂ξ ∂η ∂ζ - 0
2 2 sin ω
∂~z ∂~z ∂~z
0 0 1
∂ξ ∂η ∂ζ
2
ω2 ð1 þ cos ωÞ ω2
= þ1 = ,
4 sin ω
2
4 sin 2 ω2
ð20:133Þ

where we used formulae of trigonometric functions with the last equality. Note that
(20.133) does not depend on the ± signs of ω. This is because J represents the
relative volume density ratio between two parameter spaces of the ~x~y~z -system and
ξηζ-system and this ratio is solely determined by the modulus of rotation angle.
Taking ω → 0, we have
0
ω2 ð ω2 Þ 1 2ω
lim J = lim ω = ωlim 0 = lim
ω→0 ω→0
4 sin 2 → 0 sin 2 ω 4 ω→0 ω 1 ω
2 2 2 sin  cos
2 2 2
0
ω ð ωÞ 1
= lim = lim 0 = lim = 1:
ω → 0 sin ω ω → 0 ð sin ωÞ ω → 0 cos ω

Thus, the relative volume density ratio in the limit of ω → 0 is 1, as expected. Let
dV = d~xd~yd~z and dΠ = dξdηdζ be volume elements of each coordinate system. Then
we have

dV = JdΠ: ð20:134Þ

We assume that

dV ρξηζ
J= = ,
dΠ ρ~x~y~z

where ρ~x~y~z and ρξηζ are a “number density” of group elements in each coordinate
system. We suppose that the infinitesimal rotations in the ξηζ-coordinate are
converted by a finite rotation Rω of (20.123) into the ~x~y~z -coordinate. In this way,
J can be viewed as an “expansion” coefficient as a function of rotation angle jωj.
Hence, we have
20.2 Rotation Groups: SU(2) and SO(3) 867

ρξηζ
dV = dΠ or ρ~x~y~z dV = ρξηζ dΠ: ð20:135Þ
ρ~x~y~z

Equation (20.135) implies that total (infinite) number of group elements that are
contained in the group SO(3) is invariant with respect to the transformation.
Let us calculate the total volume of SO(3). This can be measured such that

1
dΠ = dV: ð20:136Þ
J

Converting dV to the polar coordinate representation, we get

π
π 2π
4 sin 2 ω~2 2
Π½SOð3Þ = dΠ = ~
dω dθ dϕ ~ sin θ = 8π 2 ,
ω ð20:137Þ
0 0 ~2
ω
0

where we denote the total volume of SO(3) by Π[SO(3)]. Note that ω in (20.133) is
replaced with ω ~ j ω j so that the radial coordinate can be positive. As already
noted, (20.133) does not depend on the ± signs of ω. In other words, among the
angles α, β, and ω (usually γ is used instead of ω; see Sect. 17.4.2), ω is taken as -
π ≤ ω < π (instead of 0 ≤ ω < 2π) so that it can be conformed to the radial
coordinate.
We utilize (20.137) for calculation of various functions. Let f(ϕ, θ, ω) be an
arbitrary function on a parameter space. We can readily estimate a “mean value”
of f(ϕ, θ, ω) on that space. That is, we have

f ðϕ, θ, ωÞdΠ
f ðϕ, θ, ωÞ  , ð20:138Þ

where f ðϕ, θ, ωÞ represents a mean value of the function and it is given by

π
π 2π
~
ω
f ðϕ, θ, ωÞ = 4 ~
dω dθ dϕf ðϕ, θ, ωÞ sin 2 sin θ = dΠ: ð20:139Þ
0 0 2
0

There would be some inconvenience to use (20.139) because of the mixture of


variables ω and ω~ . Yet, that causes no problem if f(ϕ, θ, ω) is an even function with
respect to ω (see Sect. 20.2.6).
868 20 Theory of Continuous Groups

20.2.6 Irreducible Characters of SO(3) and Their


Orthogonality

To evaluate (20.139), let us get back to the irreducible representations D(l )(α, β, γ) of
Sect. 20.2.3. Also, we recall how successive coordinate transformations produce
changes in the orthogonal matrices of transformation (Sect. 17.4.2). Since the
parameters α, β, and γ uniquely specify individual rotation, different sets of α, β,
and γ (in this section we use ω instead of γ) cause different irreducible representa-
tions. A corresponding rotation matrix is given by (17.101). In fact, the said matrix is
a representation of the rotation. Keeping these points in mind, we discuss irreducible
representations and their characters particularly in relation to the orthogonality of the
irreducible characters.
Figure 20.4 depicts a geometrical arrangement of two rotation axes accompanied
by the same rotation angle ω. Let Rω and R0ω be such two rotations around the
rotation axis A and A′, respectively. Let Q be another rotation that transforms the
rotation axis A to A′. Then we have [1, 2]

R0ω = QRω Q - 1 : ð20:140Þ

This implies that Rω and R0ω belong to the same conjugacy class. To consider the
situation more clearly, let us view the two rotations Rω and R0ω from two different
coordinate systems, say some xyz-coordinate system and another x′y′z′-coordinate
system. Suppose also that A coincides with the z-axis and that A′ coincides with the
z′-axis. Meanwhile, if we describe Rω in reference to the xyz-system and R0ω in
reference to the x′y′z′-system, the representation of these rotations must be identical
in reference to the individual coordinate systems. Let this representation matrix be
Rω .
(i) If we describe Rω and R0ω with respect to the xyz-system, we have

-1
Rω = Rω , R0ω = Q - 1 Rω Q - 1 = QRω Q - 1 : ð20:141Þ

Namely, we reproduce

Fig. 20.4 Geometrical


arrangement of two rotation
axes A and A′ accompanied
by the same rotation angle ω
20.2 Rotation Groups: SU(2) and SO(3) 869

R0ω = QRω Q - 1 : ð20:140Þ

(ii) If we describe Rω and R0ω with respect to the x′y′z′-system, we have

R0ω = Rω , Rω = Q - 1 Rω Q: ð20:142Þ

Hence, again we reproduce (20.140). That is, the relation (20.140) is independent
of specific choices of the coordinate system.
Taking the representation indexed by l of Sect. 20.2.4 with respect to (20.140), we
have

DðlÞ R0ω = DðlÞ QRω Q - 1 = DðlÞ ðQÞDðlÞ ðRω ÞDðlÞ Q - 1


-1 ð20:143Þ
= DðlÞ ðQÞDðlÞ ðRω Þ DðlÞ ðQÞ ,

where with the last equality we used (18.6). Operating D(l )(Q) on (20.143) from the
right, we get

DðlÞ R0ω DðlÞ ðQÞ = DðlÞ ðQÞDðlÞ ðRω Þ: ð20:144Þ

Since both DðlÞ R0ω and D(l )(Rω) are irreducible, from (18.41) of Schur’s First
Lemma DðlÞ R0ω and D(l )(Rω) are equivalent. An infinite number of rotations
determined by the orientation of the rotation axis A that is accompanied by an
azimuthal angle α (0 ≤ α < 2π) and a zenithal angle β (0 ≤ β < π) (see
Fig. 17.18) form a conjugacy class with each specific ω shared. Thus, we have
classified various representations D(l )(Rω) according to ω and l. Meanwhile, we may
identify D(l )(Rω) with D(l )(α, β, γ) of Sect. 20.2.4.
In Sect. 20.2.2 we know that the spherical surface harmonics
l θ, ϕÞ ðm = - l, ⋯, 0, ⋯, lÞ constitute basis functions of D
ð (l )
Ym and span the repre-
sentation space. A dimension of the matrix (or the representation space) is 2l + 1.
0
Therefore, D(l ) and Dðl Þ for different l and l′ have a different dimension. From
0
(18.41) of Schur’s First Lemma, such D(l ) and Dðl Þ are inequivalent.
Returning to (20.139), let us evaluate a mean value of f ðϕ, θ, ωÞ. Let the trace of
DðlÞ R0ω and D(l )(Rω) be χ ðlÞ R0ω and χ (l )(Rω), respectively. Remembering (12.13),
the trace is invariant under a similarity transformation, and so χ ðlÞ R0ω = χ ðlÞ ðRω Þ.
Then, we put

χ ðlÞ ðωÞ  χ ðlÞ ðRω Þ = χ ðlÞ R0ω : ð20:145Þ

Consequently, it suffices to evaluate the trace χ (l )(ω) using D(l )(α, 0, 0) of (20.96)
whose representation matrix is given by (20.97). We have
870 20 Theory of Continuous Groups

l
e - ilω 1 - eiωð2lþ1Þ
χ ðlÞ ðωÞ = eimω =
m= -l
1 - eiω
- ilω
e ð1 - e - iω Þ 1 - eiωð2lþ1Þ
= , ð20:146Þ
ð1 - eiω Þð1 - e - iω Þ

where

½numerator of ð20:146Þ
= e - ilω þ eilω - eiωðlþ1Þ - e - iωðlþ1Þ = 2½cos lω - cosðl þ 1Þω
1 ω
= 4 sin l þ ω sin
2 2

and

ω
½denominator of ð20:146Þ = 2ð1 - cos ωÞ = 4 sin 2 :
2

Thus, we get

sin l þ 12 ω
χ ðlÞ ðωÞ = : ð20:147Þ
sin ω2

To adapt (18.76) to the present formulation, we rewrite it as

1
χ ðαÞ ðgÞ χ ðβÞ ðgÞ = δαβ : ð20:148Þ
n g

A summation over a finite number of group element n in (20.148) should be read


as integration in the case of the continuous groups. Instead of a finite number of
irreducible representations in a finite group, we are thinking of an infinite number of
irreducible representations with continuous groups. In (20.139) the denominator
dΠ (=8π 2) corresponds to n of (20.148). If f(ϕ, θ, ω) or f(α, β, ω) is an even
function with respect to ω (-π ≤ ω < π) as in the case of (20.147), the numerator
of (20.139) can be expressed in a form of

π π 2π ω
4 sin 2
f ðα, β, ωÞdΠ = dω dβ dαf ðα, β, ωÞ 2 ω2 sin β
ω2
π
0
π
0

0
ð20:149Þ
ω
= 4 dω dβ dαf ðα, β, ωÞ sin 2
sin β:
2
0 0 0
20.2 Rotation Groups: SU(2) and SO(3) 871

If, moreover, f(α, β, ω) depends only on ω, again as in the case of the character
described by (20.147), the calculation is further simplified such that

π
ω
f ðωÞdΠ = 16π dωf ðωÞ sin 2 : ð20:150Þ
2
0

0 
Replacing f(ω) with χ ðl Þ ðωÞ χ ðlÞ ðωÞ, we have

π
0  0  ω
χ ðl Þ ðωÞ χ ðlÞ ðωÞdΠ = 16π dω χ ðl Þ ðωÞ χ ðlÞ ðωÞ sin 2
2
0
π
sin l0 þ 12 ω sin l þ 12 ω ω
= 16π dω sin 2
sin ω2 sin ω2 2
0
π
1 1
= 16π dω sin l0 þ ω sin l þ ω
2 2
0
π

= 8π dω½cosðl0 - lÞω - cosðl0 þ l þ 1Þω: ð20:151Þ


0

Since l′ + l + 1 > 0, the integral of cos(l′ + l + 1)ω term vanishes. Only if l′ -


l = 0, the integral of cos(l′ - l )ω term does not vanish but takes a value of π.
Therefore, we have

0 
χ ðl Þ ðωÞ χ ðlÞ ðωÞdΠ = 8π 2 δl0 l : ð20:152Þ

0 
Finally, with χ ðl Þ ðωÞ χ ðlÞ ðωÞ defined as in (20.139), we get

0 
χ ðl Þ ðωÞ χ ðlÞ ðωÞ = δl0 l : ð20:153Þ

This relation gives the orthogonalization condition in concert with (20.148)


obtained in the case of a finite group.
In Sect. 18.6 we have shown that the number of inequivalent irreducible repre-
sentations of a finite group is well-defined and given by the number of conjugacy
classes of that group. This is clearly demonstrated in (18.144). In the case of the
continuous groups, however, the situation is somewhat complicated and we are
uncertain of the number of the inequivalent irreducible representations. We address
this problem as follows: Suppose that Ω(α, β, ω) would be an irreducible
872 20 Theory of Continuous Groups

representation that is inequivalent to individual D(l )(α, β, ω) (l = 0, 1, 2, ⋯). Let K


(ω) be a character of Ω(α, β, ω). Defining f(ω)  [χ (l )(ω)]K(ω), (20.150) reads as

π
ðlÞ
  ω
χ ðωÞ KðωÞdΠ = 16π dω χ ðlÞ ðωÞ KðωÞ sin 2 : ð20:154Þ
2
0

The orthogonalization condition (20.153) demands that the integral of (20.154)


vanishes. Similarly, we would have

π
ðlþ1Þ
  ω
χ ðωÞ KðωÞdΠ = 16π dω χ ðlþ1Þ ðωÞ KðωÞ sin 2 , ð20:155Þ
2
0

which must vanish as well. Subtracting RHS of (20.154) from RHS of (20.155) and
taking account of the fact that χ (l )(ω) is real [see (20.147)], we have

π
ω
dω χ ðlþ1Þ ðωÞ - χ ðlÞ ðωÞ KðωÞ sin 2 = 0: ð20:156Þ
2
0

Invoking a trigonometric formula of

aþb a-b
sin a - sin b = 2 cos sin
2 2

and applying it to (20.147), (20.156) can be rewritten as

π
ω
2 dω½cosðl þ 1ÞωKðωÞ sin 2 = 0 ðl = 0, 1, 2, ⋯Þ: ð20:157Þ
2
0

Putting l = 0 in LHS of (20.154) and from the assumption that the integral of
(20.154) vanishes, we also have

π
ω
dωKðωÞ sin 2 = 0, ð20:158Þ
2
0

where we used χ (0)(ω) = 1. Then, (20.157) and (20.158) are combined to give
20.3 Clebsch–Gordan Coefficients of Rotation Groups 873

π
ω
dωðcos lωÞKðωÞ sin 2 = 0 ðl = 0, 1, 2, ⋯Þ: ð20:159Þ
2
0

This implies that all the Fourier cosine coefficients of KðωÞ sin 2 ω2 vanish.
Considering that coslω (l = 0, 1, 2, ⋯) forms a complete orthonormal system in
the region [0, π], we must have

ω
KðωÞ sin 2  0: ð20:160Þ
2

Requiring K(ω) to be continuous, (20.160) is equivalent to K(ω)  0. This


implies that there is no other inequivalent irreducible representation than
D(l )(α, β, ω) (l = 0, 1, 2, ⋯). In other words, the representations
D(l )(α, β, ω) (l = 0, 1, 2, ⋯) constitute a complete set of irreducible representations.
The spherical surface harmonics Y m l ðθ, ϕÞ ðm = - l, ⋯, 0, ⋯, lÞ constitute basis
functions of D(l ) and span the representation space of D(l )(α, β, ω) accordingly.
In the above discussion, we assumed that K(ω) is an even function with respect to
ω as in the case of χ (l )(ω) in (20.147). Hence, KðωÞ sin 2 ω2 is an even function as
well. Therefore, we assumed that KðωÞ sin 2 ω2 can be expanded to the Fourier cosine
series.

20.3 Clebsch-Gordan Coefficients of Rotation Groups

In Sect. 18.8, we studied direct-product representations. We know that even though


D(α) and D(β) are both irreducible, D(α × β) is not necessarily irreducible. This is a
common feature for both finite and infinite groups. Moreover, the reducible unitary
representation can be expressed as a direct sum of irreducible representations such
that

DðαÞ ðgÞ  DðβÞ ðgÞ = qD


γ γ
ðγ Þ
ðgÞ: ð18:199Þ

In the infinite groups, typically the continuous groups, this situation often appears
when we deal with the addition of two angular momenta. Examples include the
addition of two (or more) orbital angular momenta and that of the orbital angular
momentum and spin angular momentum [6, 8].
Meanwhile, there is a set of basis vectors that spans the representation space with
respect to individual irreducible representations. There, we have a following ques-
tion: What are the basis vectors that span the total reduced representation space
described by (18.199)? An answer for this question is that these vectors must be
constructed by the basis vectors relevant to the individual irreducible representa-
tions. The Clebsch-Gordan coefficients appear as coefficients with respect to a
874 20 Theory of Continuous Groups

linear combination of the basis vectors associated with those irreducible


representations.
In a word, to find the Clebsch-Gordan coefficients is equivalent to the calcula-
tion of proper coefficients of the basis functions that span the representation space of
the direct-product groups. The relevant continuous groups which we are interested in
are SU(2) and SO(3).

20.3.1 Direct-Product of SU(2) and Clebsch-Gordan


Coefficients

In Sect. 20.2.6 we have explained the orthogonality of irreducible characters of


SO(3) by deriving (20.153). It takes advanced theory to show the orthogonality of
irreducible characters of SU(2). Fortunately, however, we have an expression similar
to (20.153) with SU(2) as well. Readers are encouraged to look up appropriate
literature for this [9]. On the basis of this important expression, we further develop
the representation theory of the continuous groups. Within this framework, we
calculate the Clebsch-Gordan coefficients.
As in the case of the previous section, the character of the irreducible represen-
tation D( j )(α, β, γ) of SU(2) is described only as a function of a rotation angle.
Similarly to (20.147), in SU(2) we get

j
sin j þ 12 ω
χ ðjÞ ðωÞ = eimω = , ð20:161Þ
m= -j
sin ω2

where χ ( j )(ω) is an irreducible character of D( j )(α, β, γ). It is because we can align


the rotation axis in the direction of, e.g., the z-axis and the representation matrix of a
rotation angle ω is typified by D( j )(ω, 0, 0) or D( j )(0, 0, ω). Or, we can evenly choose
any direction of the rotation axis at will.
We are interested in calculating angular momenta of a coupled system,
i.e. addition of angular momenta of that system [5, 6]. The word of “coupled system”
needs some explanation. This is because on the one hand we may deal with, e.g., a
sum of an orbital angular momentum and a spin angular momentum of a “single”
electron, but on the other hand we might calculate, e.g., a sum of two angular
momenta of “two” electrons. In either case, we will deal with the total angular
momenta of the coupled system.
Now, let us consider a direct-product group for this kind of problem. Suppose that
one system is characterized by a quantum number j1 and the other by j2. Here these
numbers are supposed to be those of (3.89), namely the highest positive number of
the generalized angular momentum in the z-direction. That is, the representation
matrices of rotation for systems 1 and 2 are assumed to be Dðj1 Þ ðω, 0, 0Þ and
Dðj2 Þ ðω, 0, 0Þ, respectively. Then, we are to deal with the direct-product representa-
tion Dðj1 Þ  Dðj2 Þ (see Sect. 18.8). As in (18.193), we describe
20.3 Clebsch–Gordan Coefficients of Rotation Groups 875

Dðj1 × j2 Þ ðωÞ = Dðj1 Þ ðωÞ  Dðj2 Þ ðωÞ: ð20:162Þ

In (20.162), the group is represented by a rotation angle ω, more strictly the


rotation operation of an angle ω. The quantum numbers j1 and j2 mean the irreduc-
ible representations of the rotations for systems 1 and 2, respectively. According to
(20.161), we have

χ ðj1 × j2 Þ ðωÞ = χ ðj1 Þ ðωÞχ ðj2 Þ ðωÞ,

where χ ðj1 × j2 Þ ðωÞ gives a character of the direct-product representation; see (18.198).
Furthermore, we have [5]

j1 j2
χ ðj1 × j2 Þ ðωÞ = eim1 ω eim2 ω
m1 = - j1 m2 = - j2
j1 j2
= eiðm1 þm2 Þω ð20:163Þ
m1 = - j1 m2 = - j2
j1 þj2 J j1 þj2
= eiMω = χ ðJ Þ ðωÞ,
J = jj1 - j2 j M = - J J = jj1 - j2 j

where a positive number J can be chosen from among

J = jj1 - j2 j, jj1 - j2 j þ 1, ⋯, j1 þ j2 : ð20:164Þ

Rewriting (20.163) more explicitly, we have

χ ðj1 × j2 Þ ðωÞ = χ ðjj1 - j2 jÞ ðωÞ þ χ ðjj1 - j2 jþ1Þ ðωÞ þ ⋯ þ χ ðj1 þj2 Þ ðωÞ: ð20:165Þ

If both j1 and j2 are integers, from (20.165) we can immediately derive an


important result. In light of (18.199) and (18.200), we multiply χ (k)(ω) on both
sides of (20.165) and then integrate it to get

χ ðkÞ ðωÞ χ ðj1 × j2 Þ ðωÞdΠ

= χ ðkÞ ðωÞ χ ðjj1 - j2 jÞ ðωÞdΠ þ χ ðkÞ ðωÞ χ ðjj1 - j2 jþ1Þ ðωÞdΠ þ ⋯


ð20:166Þ
þ χ ðkÞ ðωÞ χ ðj1 þj2 Þ ðωÞdΠ

= 8π 2 δk,jj1 - j2 j þ δk,jj1 - j2 jþ1 þ ⋯ þ δk,j1 þj2 ,

where k is zero or an integer and with the last equality we used (20.152). Dividing
both sides by dΠ, we get
876 20 Theory of Continuous Groups

χ ðkÞ ðωÞ χ ðj1 × j2 Þ ðωÞ = δk,jj1 - j2 j þ δk,jj1 - j2 jþ1 þ ⋯ þ δk,j1 þj2 , ð20:167Þ

where we used (20.153). Equation (20.166) implies that only if k is identical to one
out of jj1 - j2j, jj1 - j2 j + 1, ⋯, j1 + j2, χ ðkÞ ðωÞ χ ðj1 × j2 Þ ðωÞ does not vanish. If, for
example, we choose jj1 - j2j for k in (20.167), we have

χ ðjj1 - j2 jÞ ðωÞ χ ðj1 × j2 Þ ðωÞ = δjj1 - j2 j,jj1 - j2 j = 1: ð20:168Þ

This relation corresponds to (18.200) where a finite group is relevant. Note that
the volume of SO(3), i.e., dΠ = 8π 2 corresponds to the order n of a finite group in
(20.148). Considering (18.199) and (18.200) once again, it follows that in the above
case Dðjj1 - j2 jÞ takes place once and only once in the direct-product representation
Dðj1 × j2 Þ . Thus, we obtain the following important relation:

j1 þj2
Dðj1 × j2 Þ = Dðj1 Þ ðωÞ  Dðj2 Þ ðωÞ = J = jj1 - j2 j
DðJ Þ : ð20:169Þ

Any group (finite or infinite) is said to be simply reducible, if each irreducible


representation takes place at most once when the direct-product representation of
that group in question is reduced. Equation (20.169) clearly shows that SO(3) is
simply reducible. With respect to SU(2) we also have the same expression of
(20.169) [9].
In Sect. 18.8 we examined the direct-product representation of two (irreducible)
representations. Let D(μ)(g) and D(ν)(g) be two different irreducible representations
of the group G. Here G may be either a finite or infinite group. Note that in Sect. 18.8
we have focused on the finite groups and that now we are thinking of an infinite
group, typically SU(2) and SO(3). Let nμ and nν be a dimension of the representation
space of the irreducible representations μ and ν, respectively. We assume that each
representation space is spanned by following basis functions:

ðμÞ ðνÞ
ψj μ = 1, ⋯, nμ and ϕl ðν = 1, ⋯, nν Þ:

We know from Sect. 18.8 that nμnν new basis functions can be constructed using
ð μÞ ð ν Þ
the functions that are described by ψ j ϕl . Our next task will be to classify these
nμnν functions into those belonging to different irreducible representations. For this,
we need relation of (18.199). According to the custom, we rewrite it as follows:

DðμÞ ðωÞ  DðνÞ ðωÞ = γ


ðμνγ ÞDðγÞ ðωÞ, ð20:170Þ

where ω denotes a group element (i.e., rotation angle); (μνγ) is identical with qγ of
RHS of (18.199) and shows how many times the same representation γ takes place in
the direct-product representations. Naturally, we have
20.3 Clebsch–Gordan Coefficients of Rotation Groups 877

ðμνγ Þ = ðνμγ Þ:

Then, we get

ðμνγ Þnγ = nμ nν :
γ

Meanwhile, we should be able to constitute the functions that have transformation


properties the same as those of the irreducible representation γ. In other words, these
ðγ τ Þ
functions should make the basis functions belonging to γ. Let the function be Ψs γ .
Then, we have

ðγ τ Þ ð μÞ ð ν Þ
Ψs γ = μj, νljγ τγ s ψ j ϕl , ð20:171Þ
j, l

where τγ = 1, ⋯, (μνγ) and s stands for the number that designates the basis
functions of the irreducible representation γ; i.e., s = 1, ⋯, nγ . (We often designate
the basis functions with a negative integer. In that case, we choose that negative
integer for s; see Examples 20.1 and 20.2 shown later.) If γ takes place more than
once, we label γ as γ τγ . The coefficients with respect to the above linear combination

μj, νljγ τγ s

are called Clebsch-Gordan coefficients. Readers might well be bewildered by the


notations of abstract algebra, but Examples of Sect. 20.3.3 should greatly relieve
them of anxiety about it.
ðγ τ Þ
As we shall see later in this chapter, the functions Ψs γ are going to be
ðμÞ ðνÞ
normalized along with the product functions ψ j ϕl . The Clebsch-Gordan coeffi-
cients must form a unitary matrix accordingly. Notice that a transformation between
two orthonormal basis sets is unitary (see Chap. 14). According as there are nμnν
product functions as the basis vectors, the Clebsch-Gordan coefficients are associ-
ated with (nμnν, nμnν) square matrices. The said coefficients can be defined with any
group either finite or infinite. Or rather, difficulty in determining the Clebsch-
Gordan coefficients arises when τγ is more than one. This is because we should
take account of linear combinations described by

ðγ τ Þ
c γ τ γ Ψs γ
τγ

that belong to the irreducible representation γ. This would cause arbitrariness and
ambiguity [5]. Nevertheless, we do not need to get into further discussion about this
878 20 Theory of Continuous Groups

problem. From now on, we focus on the Clebsch-Gordan coefficients of SU(2) and
SO(3), a typical simply reducible group, namely τγ is at most one.
For SU(2) and SO(3) to be simply reducible saves us a lot of labor. The argument
for this is as follows: Equation (20.169) shows how the dimension of (2j1 + 1)
(2j2 + 1) for the representation space with the direct product of two representations is
redistributed to individual irreducible representations that take place only once. The
arithmetic based on this fact is as follows:

ð2j1 þ 1Þð2j2 þ 1Þ
1
= 2ðj1 - j2 Þð2j2 þ 1Þ þ 2  2j ð2j þ 1Þ þ ð2j2 þ 1Þ
2 2 2
= 2ðj1 - j2 Þð2j2 þ 1Þ þ 2ð0 þ 1 þ 2 þ ⋯ þ 2j2 Þ þ ð2j2 þ 1Þ ð20:172Þ
= ½2ðj1 - j2 Þ þ 0 þ 2½ðj1 - j2 Þ þ 1 þ 2½ðj1 - j2 Þ þ 2 þ ⋯
þ2½ðj1 - j2 Þ þ 2j2  þ ð2j2 þ 1Þ
= ½2ðj1 - j2 Þ þ 1 þ f2½ðj1 - j2 Þ þ 1 þ 1g þ ⋯ þ ½2ðj1 þ j2 Þ þ 1,

where with the second equality, we used the formula 1 þ 2 þ ⋯ þ n = 12 nðn þ 1Þ;
resulting numbers of 0, 1, 2, ⋯, and 2j2 are distributed to 2j2 + 1 terms of the
subsequent RHS of (20.172) each; the third term of 2j2 + 1 is distributed to 2j2 + 1
terms each as well on the rightmost hand side. Equation (20.172) is symmetric with
j1 and j2, but if we assume that j1 ≥ j2, each term of the rightmost hand side of
(20.172) is positive. Therefore, for convenience we assume j1 ≥ j2 without loss of
generality.
To clarify the situation, we list several tables (Tables 20.1, 20.2, and 20.3).
Tables 20.1 and 20.2 show specific cases, but Table 20.3 represents a general
case. In these tables the topmost layer has a diagram indicated by . This
diagram corresponds to the case of J = j j1 - j2j and contains different 2|j1 -
j2| + 1 quantum states. The lower layers include diagrams indicated by . These
diagrams correspond to the case of J = |j1 - j2| + 1, ⋯, j1 + j2. For each J, 2J + 1
functions (or quantum states) are included and labeled according to different num-
bers M  m1 + m2 = - J, - J + 1, ⋯, J. The rightmost column displays J = |j1 - j2|,
|j1 - j2| + 1, ⋯, j1 + j2 from the topmost row through the bottommost row.
Tables 20.1, 20.2, and 20.3 comprise 2j2 + 1 layers, where total (2j1 + 1)(2j2 + 1)
functions that form basis vectors of Dðj1 Þ ðωÞ  Dðj2 Þ ðωÞ are redistributed according
to the M number. The same numbers that appear from the topmost row through the
bottommost row are marked red and connected with dotted aqua lines. These lines
form a parallelogram together with the top and bottom horizontal lines as shown.
The upper and lower sides of the parallelogram are 2 j j1 - j2j in width. The
parallelogram gets flattened with decreasing jj1 - j2j. If j1 = j2, the parallelogram
coalesces into a line (Table 20.1). Regarding the notations M and J, see Sects. 20.3.2
and 20.3.3 (vide infra).
Bearing in mind the above remarks, we focus on the coupling of two angular
momenta as a typical example. Within this framework we deal with the Clebsch-
Gordan coefficients with regard to SU(2). We develop the discussion according to
20.3 Clebsch–Gordan Coefficients of Rotation Groups 879

Table 20.1 Summation of angular momenta. J = 0,


1, or 2; M  - J, - J + 1, ⋯, J

1
−1 0 1 (= 1)
2
−1 −2 −1 0
0 −1 0 1
1 (= 2) 0 1 2

Regarding a dotted aqua line, see text

Table 20.2 Summation of angular momenta. J = 1, 2, or 3; M  - J, - J + 1,


⋯, J

1
−2 −1 0 1 2 (= 1)
2
−1 −3 −2 −1 0 1
0 −2 −1 0 1 2
1 (= 2) −1 0 1 2 3

Table 20.3 Summation of angular momenta in a general case

1
−1 − 1+1 ⋯ 2 2 − 1 ⋯ 1 −2 2 ⋯ 1 −1 1
2
− 2 −1− 2 −1− 2 +1 ⋯ 2 − 1 ⋯ 1 −3 2 ⋯ 1 − 2 −1 1 − 2
− 2+1 −1− 2 +1 −1− 2 +2 ⋯ 2 − 1 +1 ⋯ 1 −3 2+1 ⋯ 1 − 2 1 − 2 +1
⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ ⋯
2 −1 −1+ 2 −1 −1+ 2 ⋯ 32− 1 −1 ⋯ 1 − 2 −1 ⋯ 1 + 2 −2 1 + 2 −1
2 −1+ 2 −1+ 2 +1 ⋯ 3 2 − 1 ⋯ 1 − 2 ⋯ 1 + 2 −1 1 + 2

We suppose j1 > 2j2. J = |j1 - j2|, ⋯, j1 + j2; M  - J, - J + 1, ⋯, J

literature [5]. The relevant approach helps address a more complicated situation
where the coupling of three or more angular momenta including j - j coupling and
L - S coupling is responsible [6].
From (20.47) we know that fm forms a basis set that spans a (2j + 1)-dimensional
representation space associated with D( j ) such that
880 20 Theory of Continuous Groups

u jþm v j - m
fm = ðm = - j; - j þ 1; ⋯; j - 1; jÞ: ð20:47Þ
ðj þ mÞ!ðj - mÞ!

The functions fm are transformed according to Rða, bÞ so that we have

ðjÞ
Rða, bÞðf m Þ  f m0 Dm0 m ða, bÞ, ð20:48Þ
m0

where j can be either an integer or a half-odd-integer. We henceforth work on fm of


(20.47).

20.3.2 Calculation Procedures of Clebsch-Gordan


Coefficients

In the previous section we have described how systematically the quantum states are
made up of two angular momenta. We express those states in terms of a linear
combination of the basis functions related to the direct-product representations of
(20.169). To determine those coefficients, we follow calculation procedures due to
Hamermesh [5]. The calculation procedures are rather lengthy, and so we summarize
each item separately.
1. Invariant quantities AJ and BJ:
We start with seeking invariant quantities under the unitary transformation U of
(20.35) expressed as

U= a
- b
b
a
with jaj2 þ jbj2 = 1: ð20:35Þ

As in (20.46) let us consider a combination of variables u  (u2 u1) and v  (v2 v1)
that are transformed such that

u02 u01 = ðu2 u1 Þ a


- b
b
a
, v02 v01 = ðv2 v1 Þ a
- b
b
a
: ð20:173Þ

For a shorthand notation we have

u0 = um, v0 = vm, ð20:174Þ

where we define u′, v′, m as


20.3 Clebsch–Gordan Coefficients of Rotation Groups 881

u0  u02 u01 , v0  v02 v01 , m  a


- b
b
a
: ð20:175Þ

As in (20.47) we also define the following functions as

u1 j1 þm1 u2 j1 - m1
ψ jm1 1 = ðm1 = - j1 , - j1 þ 1, ⋯, j1 - 1, j1 Þ,
ð j1 þ m1 Þ!ð j1 - m1 Þ!
v1 j2 þm2 v2 j2 - m2
ϕjm2 2 = ðm2 = - j2 ; - j2 þ 1; ⋯; j2 - 1; j2 Þ: ð20:176Þ
ð j2 þ m2 Þ!ð j2 - m2 Þ!

Meanwhile, we also consider a combination of variables x  (x2 x1) that are


transformed as

a b
x02 x01 = ðx2 x1 Þ -b a
: ð20:177Þ

For a shorthand notation we have

x0 = xm , ð20:178Þ

where we have

a b
x0  x02 x01 , x  ðx2 x1 Þ, m = -b a
: ð20:179Þ

The unitary matrix m is said to be a complex conjugate matrix of m. We say that


in (20.174) u and v are transformed covariantly and that in (20.178) x is transformed
contravariantly. Defining a unitary operator g as

-1
g 0
1 0
ð20:180Þ

and operating g on both sides of (20.174), we have

u0 g = u gg{ mg = ug g{ mg = ugm : ð20:181Þ

Rewriting (20.181) as

u0 g = ðugÞm , ð20:182Þ

we find that ug is transformed contravariantly. Taking transposition of (20.178), we


have
882 20 Theory of Continuous Groups

x0 = ðm ÞT xT = m{ xT :
T

Then, we get

u0 x0 = ðumÞ m{ xT = u mm{ xT = uxT = u2 x2 þ u1 x1 = uxT ,


T
ð20:183Þ

where with the third equality we used the unitarity of m. This implies that uxT is an
invariant quantity under the unitary transformation by m. We have

v0 x0 = vxT ,
T
ð20:184Þ

meaning that vxT is an invariant as well.


In a similar manner, we have

u01 v02 - u02 v01 = v0 ðu0 gÞT = v0 gT u0 T = vmgT mT uT = vgT uT = vðugÞT


0 1 u2 ð20:185Þ
= ð v2 v1 Þ = u1 v 2 - u2 v 1 :
-1 0 u1

Thus, v(ug)T (or u1v2 - u2v1) is another invariant under the unitary transformation
by m. In the above discussion we can view (20.183)–(20.185) as a quadratic form of
two variables (see Sect. 14.5).
Using these invariants, we define other important invariants AJ and BJ as follows
[5]:

AJ  ðu1 v2 - u2 v1 Þj1 þj2 - J ðu2 x2 þ u1 x1 Þj1 - j2 þJ ðv2 x2 þ v1 x1 Þj2 - j1 þJ , ð20:186Þ


BJ  ðu2 x2 þ u1 x1 Þ2J : ð20:187Þ

The polynomial AJ is of degree 2j1 with respect to the covariant variables u1 and
u2 and is of degree 2j2 with v1 and v2. The degree of the contravariant variables x1
and x2 is 2J. Expanding AJ in powers of x1 and x2, we get

J
AJ = M = -J
W JM X JM , ð20:188Þ

where X JM is defined as

x1 JþM x2 J - M
X JM  ðM = - J, - J þ 1, ⋯, J - 1, J Þ ð20:189Þ
ðJ þ M Þ!ðJ - M Þ!

and coefficients W JM are polynomials of u1, u2, v1, and v2. The coefficients W JM will
be determined soon in combination with X JM .
Meanwhile, we have
20.3 Clebsch–Gordan Coefficients of Rotation Groups 883

2J
ð2J Þ!
BJ = ðu1 x1 Þk ðu2 x2 Þ2J - k
k=0
k! ð 2J - k Þ!
J
1
= ð2J Þ! ðu1 x1 ÞJþM ðu2 x2 ÞJ - M
M= -J
ðJ þ M Þ!ð J - M Þ!
ð20:190Þ
J
1
= ð2J Þ! u JþM u2 J - M x1 JþM x2 J - M
M= -J
ðJ þ M Þ!ðJ - M Þ! 1
J
= ð2J Þ! ΦJM X JM ,
M= -J

where ΦMJ is defined as

u1 JþM u2 J - M
ΦMJ  ðM = - J, - J þ 1, ⋯, J - 1, J Þ: ð20:191Þ
ðJ þ M Þ!ðJ - M Þ!

Implications of the above calculations of abstract algebra are as follows: In Sect.


ðjÞ
20.2.2 we determined the representation matrices Dm0 m of SU(2) using (20.47) and
(20.48). We see that the functions fm (m = - j, -j + 1, ⋯, j - 1, j) of (20.47) are
transformed as basis vectors by the unitary transformation D( j ). Meanwhile, the
functions ΦJM ðM = - J, - J þ 1, ⋯, J - 1, J Þ of (20.191) are transformed as the
basis vectors by the unitary transformation D(J ). Here let us compare (20.188) and
(20.190). Then, we see that whereas ΦJM associates X JM with the invariant BJ, W JM
associates X JM with the invariant AJ. Thus, W JM plays the same role as ΦJM and, hence,
we expect that W JM is eligible for the basis vectors for D(J ) as well.
2. Binomial expansion of AJ:
To calculate W JM , using the binomial theorem we expand AJ such that [5]

j1 þj2 - J j1 þ j2 - J
ðu1 v2 - u2 v1 Þj1 þj2 - J = ð- 1Þλ ðu1 v2 Þj1 þj2 - J - λ ðu2 v1 Þλ ,
λ=0 λ
j1 - j2 þJ j1 - j2 þ J
ðu2 x2 þ u1 x1 Þj1 - j2 þJ = ðu1 x1 Þj1 - j2 þJ - μ ðu2 x2 Þμ ,
μ=0 μ
j1 - j2 þJ j2 - j1 þ J
ðv2 x2 þ v1 x1 Þj2 - j1 þJ = ðv1 x1 Þj2 - j1 þJ - ν ðv2 x2 Þν :
μ=0 ν
ð20:192Þ

Therefore, we have
884 20 Theory of Continuous Groups

j1 þ j2 - J j1 - j2 þ J j2 - j1 þ J
AJ = ð- 1Þλ
λ, μ , ν λ μ ν ð20:193Þ
× u1 2j1 - λ - μ u2 λþμ v1 j2 - j1 þJ - νþλ v2 j1 þj2 - J - λþν x1 2J - μ - ν x2 μþν :

Introducing new summation variables m1 = j1 - λ - μ and m2 = J - j1 + λ - ν,


we obtain

j1 þ j2 - J j1 - j2 þ J j2 - j1 þ J
AJ = ð- 1Þλ
λ, μ , ν λ j1 - λ - m1 j2 - λ þ m 2 ð20:194Þ
j1 þm1 j1 - m1
× u1 u2 v1 j2 þm2 v2 j2 - m2 x1 Jþm1 þm2 x2 J - m1 - m2 ,

j2 - j1 þJ
where to derive j2 - λþm2
in RHS we used

a
b
= a
a-b
: ð20:195Þ

Further using (20.176) and (20.189), we modify AJ such that

ðj1 þ j2 - J Þ!ðj1 - j2 þ J Þ!ðj2 - j1 þ J Þ!


AJ = ð - 1Þλ
m1 ,m2 , λ λ!ðj1 þ j2 - J - λÞ!ðj1 - λ - m1 Þ!
½ðj1 þ m1 Þ!ðj1 - m1 Þ!ðj2 þ m2 Þ!ðj2 - m2 Þ!ðJ þ m1 þ m2 Þ!ðJ - m1 - m2 Þ!1=2 j1 j2 J
× ψ m1 ϕm2 X m1 þm2 :
ðJ - j2 þ λ þ m1 Þ!ðj2 - λ þ m2 Þ!ðJ - j1 þ λ - m2 Þ!

Setting m1 + m2 = M, we have

ðj1 þ j2 - J Þ!ðj1 - j2 þ J Þ!ðj2 - j1 þ J Þ!


AJ = ð - 1Þλ
m 1 , m2 , λ λ!ðj1 þ j2 - J - λÞ!ðj1 - λ - m1 Þ!

½ðj1 þ m1 Þ!ðj1 - m1 Þ!ðj2 þ m2 Þ!ðj2 - m2 Þ!ðJ þ M Þ!ðJ - M Þ!1=2 j1 j2 J


× ψ m 1 ϕm 2 X M :
ðJ - j2 þ λ þ m1 Þ!ðj2 - λ þ m2 Þ!ðJ - j1 þ λ - m2 Þ!
ð20:196Þ

In this way, we get W JM for (20.188) expressed as

W MJ = ðj1 þ j2 - J Þ!ðj1 - j2 þ J Þ!ðj2 - j1 þ J Þ!


× C mJ 1 , m2 ψ mj11 ϕmj22 , ð20:197Þ
m1 , m2 , m1 þm2 = M

where we define C Jm1 ,m2 as


20.3 Clebsch–Gordan Coefficients of Rotation Groups 885

ð-1Þλ ½ðj1 þ m1 Þ!ðj1 - m1 Þ!ðj2 þ m2 Þ!ðj2 - m2 Þ!ðJ þ M Þ!ðJ - M Þ!1=2


CJm1 ,m2  :
λ
λ!ðj1 þ j2 - J -λÞ!ðj1 -λ - m1 Þ!ðJ - j2 þ λ þ m1 Þ!ðj2 - λ þ m2 Þ!ðJ -j1 þ λ - m2 Þ!
ð20:198Þ

From (20.197), we assume that the functions


W JM ðM = - J, - J þ 1, ⋯, J - 1, J Þ are suited for forming basis vectors of D(J ).
Then, any basis set ΛJM should be connected to W JM via unitary transformation such
that

ΛJM = UW JM ,

where U is a unitary matrix. Then, we get W JM = U { ΛJM . What we have to do is only


to normalize W JM . If we carefully look at the functional form of (20.197), we become
aware that we only have to adjust the first factor of (20.197) that is a function of
only J.
Hence, as the proper normalized functions we expect to have

ΨJM = CðJ ÞW JM = CðJ ÞU{ ΛJM , ð20:199Þ

where C(J ) is a constant that depends only on J with given numbers j1 and j2. An
implication of (20.199) is that we can get suitably normalized functions ΨJM using
arbitrary basis vectors ΛJM .
Putting

ρJ  C ðJ Þ ðj1 þ j2 - J Þ!ðj1 - j2 þ J Þ!ðj2 - j1 þ J Þ!, ð20:200Þ

we get

ΨJM = ρJ C Jm1 ,m2 ψ jm1 1 ϕjm2 2 : ð20:201Þ


m1 , m2 , m1 þm2 = M

The combined states are completely decided by J and M. If we fix J at a certain


number, (20.201) is expressed as a linear combination of different functions ψ jm1 1 ϕjm2 2
with varying m1 and m2 but fixed m1 + m2 = M. Returning back to Tables 20.1, 20.2,
and 20.3, the same M can be found in at most 2j2 + 1 places. This implies that ΨJM of
(20.201) has at most 2j2 + 1 terms on condition that j1 ≥ j2. In (20.201) the Clebsch-
Gordan coefficients are given by

ρJ CJm1 , m2 :

The descriptions of the Clebsch-Gordan coefficients are different from literature


to literature [5, 6]. We adopt the description due to Hamermesh [5] such that
886 20 Theory of Continuous Groups

ðj1 m1 j2 m2 jJM Þ  ρJ CJm1 ,m2 ,

ΨJM = ðj1 m1 j2 m2 jJM Þψ jm1 1 ϕjm2 2 : ð20:202Þ


m1 , m2 , m1 þm2 = M

3. Normalization of ΨJM :
Normalization condition for ΨJM of (20.201) or (20.202) is given by

2
jρJ j2 C Jm1 ,m2 = 1: ð20:203Þ
m1 , m2 , m1 þm2 = M

To obtain (20.203), we assume the following normalization condition for the


basis vectors ψ jm1 1 ϕjm2 2 that span the representation space. That is, for an appropriate
pair of complex variables z1 and z2, we define their inner product as follows [9]:

-l k m-k
zl1 zm
2 jz1 z2 = l!ðm - lÞ! k!ðm - k Þ!δlk

i.e.,

-l -k
zl1 zm zk1 zm
2
j 2
= δlk : ð20:204Þ
l!ðm - lÞ! k!ðm - kÞ!

Equation (20.204) implies that m + 1 monomials of m-degree with respect to z1


and z2 constitute the orthonormal basis.
Since ρJ defined in (20.200) is independent of M, we can evaluate it by choosing
M conveniently in (20.203). Setting M (=m1 + m2) = J and substituting it in J -
j1 + λ - m2 of (20.198), we obtain (m1 + m2) - j1 + λ - m2 = λ + m1 - j1. So far as
we are dealing with integers, an inside of the factorial must be non-negative, and so
we have

λ þ m 1 - j1 ≥ 0 or λ ≥ j1 - m1 : ð20:205Þ

Meanwhile, looking at another factorial ( j1 - λ - m1)!, we get

λ ≤ j1 - m 1 : ð20:206Þ

From the above equations, we obtain only one choice for λ such that [5]

λ = j1 - m 1 : ð20:207Þ

Notice that j1 - m1 is an integer, whichever j1 takes out of an integer or a half-odd-


integer. Then, we get
20.3 Clebsch–Gordan Coefficients of Rotation Groups 887

ð2J Þ! ðj1 þ m1 Þ!ðj2 þ m2 Þ!


CJm1 ,m2 = ð- 1Þj1 - m1 
ðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ! ðj1 - m1 Þ!ðj2 - m2 Þ!

ð2J Þ! j1 þ m1 j2 þ m 2
= ð- 1Þj1 - m1  :
ðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ! j2 - m2 j1 - m 1
ð20:208Þ

j1 þ m1 j2 þ m2
To confirm (20.208), calculate and use, e.g.,
j2 - m2 j1 - m1

j1 - j2 þ m1 þ m2 = j1 - j2 þ M = j1 - j2 þ J: ð20:209Þ

Thus, we get

2
C Jm1 ,m2
m1 , m2 , m1 þm2 = M

j1 þ m 1 j2 þ m2 ð20:210Þ
ð2J Þ!
= :
ðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ! m , m , m þm = M j2 - m2 j1 - m1
1 2 1 2

To evaluate the sum, we use the following relations [4, 5, 10]:

ΓðzÞΓð1 - zÞ = π=ðsin πzÞ: ð20:211Þ

Replacing z with x + 1 in (20.211), we have

Γðx þ 1ÞΓð- xÞ = π=½sin π ðx þ 1Þ: ð20:212Þ

Meanwhile, using gamma functions, we have

x x! Γ ð x þ 1Þ Γðx þ 1ÞΓð - xÞ Γðy - xÞ


= = = 
y y!ðx - yÞ! y!Γðx - y þ 1Þ Γðx - y þ 1ÞΓðy - xÞ y!Γð - xÞ
π sin π ðx - y þ 1Þ Γðy - xÞ sin π ðx - y þ 1Þ Γðy - xÞ
=   = 
sin π ðx þ 1Þ π y!Γð - xÞ sin π ðx þ 1Þ y!Γð - xÞ
Γðy - xÞ
= ð - 1Þ y  :
y!Γð - xÞ
ð20:213Þ

Replacing x in (20.213) with y - x - 1, we have


888 20 Theory of Continuous Groups

y-x-1 Γðx þ 1Þ
= ð- 1Þy  = ð- 1Þy  x
,
y y!Γðx - y þ 1Þ y

where with the last equality we used (20.213). That is, we get

y-x-1
x
y
= ð- 1Þy y
: ð20:214Þ

Applying (20.214) to (20.210), we get

2
C Jm1 ,m2
m1 , m2 , m1 þm2 ¼M
ð2J Þ!
¼
ðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ!
j2 - m2 - j1 - m1 - 1
× ð- 1Þj2 - m2 ð- 1Þj1 - m1
m1 , m2 , m1 þm2 ¼M j2 - m2
j1 - m1 - j2 - m2 - 1

j1 - m 1

ð- 1Þj1 þj2 - J ð2J Þ! j2 - j1 - J - 1 j1 - j2 - J - 1


¼ :
ðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ! m , m , m þm j2 - m2 j1 - m 1
1 2 1 2 ¼M

ð20:215Þ

Meanwhile, from the binomial theorem, we have

r α s β r s αþβ
ð1 þ xÞr ð1 þ xÞs = x x = x
α
α β
β α, β
α β
ð20:216Þ
rþs rþs γ
= ð 1 þ xÞ = x:
γ
γ

Comparing the coefficients of the γ-th power monomials of the last three sides,
we get

rþs
γ
= r
α
s
β
: ð20:217Þ
α, β, αþβ = γ

Applying (20.217) to (20.215) and employing (20.214) once again, we obtain


20.3 Clebsch–Gordan Coefficients of Rotation Groups 889

2
C Jm1 ,m2
m1 , m2 , m1 þm2 ¼M

ð- 1Þj1 þj2 - J ð2J Þ! - 2J - 2


¼
ðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ! j1 þ j2 - J
ð- 1Þ j1 þj2 - J
ð- 1Þ j1 þj2 - J
ð2J Þ! j1 þ j2 þ J þ 1
¼
ðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ! j1 þ j2 - J ð20:218Þ
ð2J Þ! j1 þ j2 þ J þ 1
¼
ðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ! j1 þ j2 - J
ð2J Þ! ðj1 þ j2 þ J þ 1Þ!
¼ 
ðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ! ðj1 þ j2 - J Þ!ð2J þ 1Þ!
ðj1 þ j2 þ J þ 1Þ!
¼ :
ð2J þ 1ÞðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ!ðj1 þ j2 - J Þ!

Inserting (20.218) into (20.203), as a positive number ρJ we have

ð2J þ 1ÞðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ!ðj1 þ j2 - J Þ!


ρJ = : ð20:219Þ
ðj1 þ j2 þ J þ 1Þ!

From (20.200), we find

2J þ 1
C ðJ Þ = : ð20:220Þ
ðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ!ðj1 þ j2 - J Þ!ðj1 þ j2 þ J þ 1Þ!

Thus, as the Clebsch-Gordan coefficients ( j1m1j2m2| JM), at last we get

ðj1 m1 j2 m2 jJM Þ  ρJ CJm1 ,m2

ð2J þ 1ÞðJ - j2 þ j1 Þ!ðJ - j1 þ j2 Þ!ðj1 þ j2 - J Þ!


¼
ðj1 þ j2 þ J þ 1Þ!

ð- 1Þλ ½ðj1 þ m1 Þ!ðj1 - m1 Þ!ðj2 þ m2 Þ!ðj2 - m2 Þ!ðJ þ M Þ!ðJ - M Þ!1=2


× :
λ
λ!ðj1 þ j2 - J - λÞ!ðj1 - λ - m1 Þ!ðJ - j2 þ λ þ m1 Þ!ðj2 - λ þ m2 Þ!ðJ - j1 þ λ - m2 Þ!
ð20:221Þ

During the course of the above calculations, we encountered, e.g., Γ(-x) and the
factorial involving a negative integer. Under ordinary circumstances, however, such
things must be avoided. Nonetheless, it would be convenient for practical use, if we
properly recognize that we first choose a number, e.g., -x close to (but not identical
with) a negative integer and finally take the limit of -x at a negative integer after the
related calculations have been finished.
890 20 Theory of Continuous Groups

Choosing, e.g., M (=m1 + m2) = - J in (20.203) instead of setting M = J in the


above, we will see that only the factor of λ = j2 + m2 in (20.198) survives. Yet we get
the same result as the above. The confirmation is left for readers.

20.3.3 Examples of Calculation of Clebsch-Gordan


Coefficients

Simple examples mentioned below help us understand the calculation procedures of


the Clebsch-Gordan coefficients and their implications.
Example 20.1 As a simplest example, we examine a case of D(1/2)  D(1/2). At the
same time, it is an example of the symmetric and antisymmetric representations
discussed in Sect. 18.9. Taking two sets of complex valuables u  (u2 u1) and
v  (v2 v1) and rewriting them according to (20.176), we have

1=2 1=2 1=2 1=2


ψ - 1=2 = u2 , ψ 1=2 = u1 , ϕ - 1=2 = v2 , ϕ1=2 = v1 ,

so that we can get the basis functions of the direct-product representation described
by

ðu2 v2 u1 v2 u2 v1 u1 v1 Þ:

Notice that we have four functions above; i.e., (2j1 + 1)(2j2 + 1) = 4, j1 = j2 = 1/2.
Using (20.221) we can determine the proper basis functions with respect to ΨJM of
(20.202). For example, for Ψ1- 1 we have only one choice to take m1 = m2 = - 12 to
get m1 + m2 = M = - 1. In (20.221) we have no other choice but to take λ = 0. In
this way, we have

1=2 1=2
Ψ1- 1 = u2 v2 = ψ - 1=2 ϕ - 1=2 :

Similarly, we get

1=2 1=2
Ψ11 = u1 v1 = ψ 1=2 ϕ1=2 :

With two other functions, using (20.221) we obtain

1 1 1=2 1=2 1=2 1=2


Ψ10 = p ðu2 v1 þ u1 v2 Þ = p ψ - 1=2 ϕ1=2 þ ψ 1=2 ϕ - 1=2 ,
2 2
20.3 Clebsch–Gordan Coefficients of Rotation Groups 891

1 1 1=2 1=2 1=2 1=2


Ψ00 = p ðu1 v2 - u2 v1 Þ = p ψ 1=2 ϕ - 1=2 - ψ - 1=2 ϕ1=2 :
2 2

We put the above results in a simple matrix form representing the basis vector
transformation such that

1 0 0 0
1 1
0 p 0 p
2 2
ð u2 v 2 u1 v 2 u1 v 1 u2 v 1 Þ = Ψ1- 1 Ψ10 Ψ11 Ψ00 : ð20:222Þ
0 0 1 0
1 1
p -p
0 2 0 2

In this way, (20.222) shows how the basis functions that possess a proper
irreducible representation and span the direct-product representation space are
constructed using the original product functions. The Clebsch-Gordan coefficients
play a role as a linker (i.e., a unitary operator) of those two sets of functions. In this
respect these coefficients resemble those appearing in the symmetry-adapted linear
combinations (SALCs) mentioned in Sects. 18.2 and 19.3.
Expressing the unitary transformation according to (20.173) and choosing
Ψ - 1 Ψ10 Ψ11 Ψ00 as the basis vectors, we describe the transformation in a similar
1

manner to (20.48) such that

Rða, bÞ Ψ1- 1 Ψ10 Ψ11 Ψ00 = Ψ1- 1 Ψ10 Ψ11 Ψ00 Dð1=2Þ  Dð1=2Þ : ð20:223Þ

Then, as the matrix representation of D(1/2)  D(1/2) we get

2
p 2
pa  2ab

pb  0

- 2ab aa -
p  bb 2a b 0
Dð1=2Þ  Dð1=2Þ =  2  : ð20:224Þ
ðb Þ - 2a b ð a Þ 2 0
0 0 0 1

Notice that (20.224) is a block matrix, namely (20.224) has been reduced according
to the symmetric and antisymmetric representations. Symbolically writing (20.223)
and (20.224), we have

Dð1=2Þ  Dð1=2Þ = Dð1Þ ⨁Dð0Þ :

The block matrix of (20.224) is unitary, and so the (3, 3) submatrix and (1, 1)
submatrix (i.e., the number of 1) are unitary accordingly. To show it, use the
conditions (20.35) and (20.173). The confirmation is left for readers. The (3, 3)
submatrix is a symmetric representation, whose representation space Ψ1- 1 Ψ10 Ψ11
892 20 Theory of Continuous Groups

span. Note that these functions are symmetric with the exchange m1 $ m2. Or, we
may think that these functions are symmetric with the exchange ψ $ ϕ. Though
trivial, the basis function Ψ00 spans the antisymmetric representation space. The
corresponding submatrix is merely the number 1. This is directly related to the fact
that v(ug)T (or u1v2 - u2v1) of (20.185) is an invariant. The function Ψ00 changes the
sign (i.e., antisymmetric) with the exchange m1 $ m2 or ψ $ ϕ.
Once again, readers might well ask why we need to work out such an elaborate
means to pick up a small pebble. With increasing dimension of the vector space,
however, to seek an appropriate set of proper basis functions becomes increasingly
as difficult as lifting a huge rock. Though simple, a next example gives us a feel for
it. Under such situations, a projection operator is an indispensable tool to address the
problems.
Example 20.2 We examine a direct product of D(1)  D(1), where j1 = j2 = 1. This
is another example of the symmetric and antisymmetric representations. Taking two
sets of complex valuables u  (u2 u1) and v  (v2 v1) and rewriting them according to
(20.176) again, we have nine, i.e., (2j1 + 1)(2j2 + 1) product functions expressed as

ψ 1- 1 ϕ1- 1 , ψ 10 ϕ1- 1 , ψ 11 ϕ1- 1 , ψ 1- 1 ϕ10 , ψ 10 ϕ10 , ψ 11 ϕ10 , ψ 1- 1 ϕ11 , ψ 10 ϕ11 , ψ 11 ϕ11 : ð20:225Þ

These product functions form the basis functions of the direct-product represen-
tation. Consequently, we will be able to construct proper eigenfunctions by means of
linear combinations of these functions. This method is again related to that based on
SALCs discussed in Chap. 19. In the present case, it is accomplished by finding the
Clebsch-Gordan coefficients described by (20.221).
As implied in Table 20.1, we need to determine the individual coefficients of nine
functions of (20.225) with respect to the following functions that are constituted by
linear combinations of the above nine functions:

Ψ2- 2 , Ψ2- 1 , Ψ20 , Ψ21 , Ψ22 , Ψ1- 1 , Ψ10 , Ψ11 , Ψ00 :

(i) With the first five functions, we have $J = j_1 + j_2 = 2$. In this case, the determination procedures of the coefficients are pretty much simplified. Substituting $J = j_1 + j_2$ into the second factor of the denominator of (20.221), we get $(-\lambda)!$ as well as $\lambda!$ in the first factor of the denominator. Therefore, for the arguments of the factorials not to be negative we must have $\lambda = 0$. Consequently, we have [5]

$$\rho_J = \frac{(2j_1)!(2j_2)!}{(2J)!},$$
$$C^{J}_{m_1,m_2} = \left[\frac{(J+M)!(J-M)!}{(j_1+m_1)!(j_1-m_1)!(j_2+m_2)!(j_2-m_2)!}\right]^{1/2}.$$

With $\Psi_2^{\pm 2}$ we have only one choice for $\rho_J$ and $C^{J}_{m_1,m_2}$. That is, we have $m_1 = \pm j_1$ and $m_2 = \pm j_2$. Then, $\Psi_2^{\pm 2} = \psi_1^{\pm 1}\phi_1^{\pm 1}$. In the case of $M = -1$, we have two choices in (20.221); i.e., $m_1 = -1, m_2 = 0$ and $m_1 = 0, m_2 = -1$. Then, we get

$$\Psi_2^{-1} = \frac{1}{\sqrt{2}}\left(\psi_1^{-1}\phi_1^{0} + \psi_1^{0}\phi_1^{-1}\right). \tag{20.226}$$

In the case of $M = 1$, similarly we have two choices in (20.221); i.e., $m_1 = 1, m_2 = 0$ and $m_1 = 0, m_2 = 1$. We get

$$\Psi_2^{1} = \frac{1}{\sqrt{2}}\left(\psi_1^{1}\phi_1^{0} + \psi_1^{0}\phi_1^{1}\right). \tag{20.227}$$

Moreover, in the case of $M = 0$, we have three choices in (20.221); i.e., $m_1 = -1, m_2 = 1$ and $m_1 = 0, m_2 = 0$ along with $m_1 = 1, m_2 = -1$ (see Table 20.1). As a result, we get

$$\Psi_2^{0} = \frac{1}{\sqrt{6}}\left(\psi_1^{-1}\phi_1^{1} + 2\psi_1^{0}\phi_1^{0} + \psi_1^{1}\phi_1^{-1}\right). \tag{20.228}$$

(ii) With $J = 1$, i.e., $\Psi_1^{-1}$, $\Psi_1^{0}$, and $\Psi_1^{1}$, we start with (20.221). Noting the factors $\lambda!$ and $(1-\lambda)!$ in the first two factors of the denominator, we have $\lambda = 0$ or $\lambda = 1$. In the case of $M = -1$, we have only one choice of $m_1 = 0, m_2 = -1$ for $\lambda = 0$. Similarly, $m_1 = -1, m_2 = 0$ for $\lambda = 1$. Hence, we get

$$\Psi_1^{-1} = \frac{1}{\sqrt{2}}\left(\psi_1^{0}\phi_1^{-1} - \psi_1^{-1}\phi_1^{0}\right). \tag{20.229}$$

In a similar manner, for $M = 0$ and $M = 1$, respectively, we have

$$\Psi_1^{0} = \frac{1}{\sqrt{2}}\left(\psi_1^{1}\phi_1^{-1} - \psi_1^{-1}\phi_1^{1}\right), \tag{20.230}$$
$$\Psi_1^{1} = \frac{1}{\sqrt{2}}\left(\psi_1^{1}\phi_1^{0} - \psi_1^{0}\phi_1^{1}\right). \tag{20.231}$$

(iii) With $J = 0$ (i.e., $\Psi_0^{0}$), we have $J = j_1 - j_2\ (= 0)$. In this case, we have $J - j_1 + \lambda - m_2 = -j_2 + \lambda - m_2$ in the denominator of (20.221) [5]. Since this factor is inside the factorial, we have $-j_2 + \lambda - m_2 \geq 0$. Also, we have $j_2 - \lambda + m_2 \geq 0$ in the denominator of (20.221). Hence, we have only one choice of $\lambda = j_2 + m_2$. Therefore, we get

$$\rho_J = \frac{(2J+1)!(2j_2)!}{(2j_1+1)!},$$
$$C^{J}_{m_1,m_2} = (-1)^{j_2+m_2}\left[\frac{(j_1+m_1)!(j_1-m_1)!}{(J+M)!(J-M)!(j_2+m_2)!(j_2-m_2)!}\right]^{1/2}.$$

In the above, we have three choices of $m_1 = -1, m_2 = 1$; $m_1 = m_2 = 0$; $m_1 = 1, m_2 = -1$. As a result, we get

$$\Psi_0^{0} = \frac{1}{\sqrt{3}}\left(\psi_1^{-1}\phi_1^{1} - \psi_1^{0}\phi_1^{0} + \psi_1^{1}\phi_1^{-1}\right).$$
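These coefficients can be cross-checked with SymPy's Clebsch-Gordan class; a short sketch (SymPy follows the Condon-Shortley phase convention, which agrees with the signs above):

```python
from sympy import sqrt, Rational
from sympy.physics.quantum.cg import CG

# Coefficients of Psi_0^0: <1, m1; 1, -m1 | 0, 0> for m1 = -1, 0, 1
coeffs = [CG(1, m1, 1, -m1, 0, 0).doit() for m1 in (-1, 0, 1)]
assert coeffs == [1/sqrt(3), -1/sqrt(3), 1/sqrt(3)]

# Likewise, the middle coefficient of Psi_2^0 in (20.228) is 2/sqrt(6):
assert CG(1, 0, 1, 0, 2, 0).doit() == sqrt(Rational(2, 3))
```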

Summarizing the above results, we have constructed the proper eigenfunctions with respect to the combined angular momenta using the basis functions of the direct-product representation. An example of the relevant matrix representations is described as follows:

$$\left(\psi_1^{-1}\phi_1^{-1}\ \ \psi_1^{0}\phi_1^{-1}\ \ \psi_1^{1}\phi_1^{-1}\ \ \psi_1^{0}\phi_1^{1}\ \ \psi_1^{1}\phi_1^{1}\ \ \psi_1^{-1}\phi_1^{0}\ \ \psi_1^{-1}\phi_1^{1}\ \ \psi_1^{1}\phi_1^{0}\ \ \psi_1^{0}\phi_1^{0}\right)$$
$$\times\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 \\
0 & 0 & \frac{1}{\sqrt{6}} & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{3}} \\
0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0 & 0 & 0 \\
0 & 0 & \frac{1}{\sqrt{6}} & 0 & 0 & 0 & -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{3}} \\
0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 & 0 & 0 & \frac{1}{\sqrt{2}} & 0 \\
0 & 0 & \frac{2}{\sqrt{6}} & 0 & 0 & 0 & 0 & 0 & -\frac{1}{\sqrt{3}}
\end{pmatrix} \tag{20.232}$$
$$= \left(\Psi_2^{-2}\ \Psi_2^{-1}\ \Psi_2^{0}\ \Psi_2^{1}\ \Psi_2^{2}\ \Psi_1^{-1}\ \Psi_1^{0}\ \Psi_1^{1}\ \Psi_0^{0}\right).$$

In (20.232) the matrix elements represent the Clebsch-Gordan coefficients, and their combination forms a unitary matrix. We have arbitrariness in the disposition of the elements of both the row and column vectors. In other words, we have arbitrariness with respect to the unitary similarity transformation of the matrix, but a unitary matrix that has undergone a unitary similarity transformation is again a unitary matrix (see Chap. 14). To show that the determinant of (20.232) is $-1$, use the expansion of the determinant together with the fact that when (a multiple of) a certain column (or row) is added to another column (or row), the determinant is unchanged (see Sect. 11.3).
We find that the functions belonging to J = 2 and J = 0 are the basis functions of
the symmetric representations. Meanwhile, those belonging to J = 1 are the basis
functions of the antisymmetric representations (see Sect. 18.9). Expressing the
unitary transformation of this example according to (20.173) and choosing $(\Psi_2^{-2}\ \Psi_2^{-1}\ \Psi_2^{0}\ \Psi_2^{1}\ \Psi_2^{2}\ \Psi_1^{-1}\ \Psi_1^{0}\ \Psi_1^{1}\ \Psi_0^{0})$ as the basis vectors, we describe the transformation as

$$\Re(a, b)\left(\Psi_2^{-2}\ \Psi_2^{-1}\ \Psi_2^{0}\ \Psi_2^{1}\ \Psi_2^{2}\ \Psi_1^{-1}\ \Psi_1^{0}\ \Psi_1^{1}\ \Psi_0^{0}\right) = \left(\Psi_2^{-2}\ \Psi_2^{-1}\ \Psi_2^{0}\ \Psi_2^{1}\ \Psi_2^{2}\ \Psi_1^{-1}\ \Psi_1^{0}\ \Psi_1^{1}\ \Psi_0^{0}\right) D^{(1)} \otimes D^{(1)}. \tag{20.233}$$

Notice again that the representation matrix of $D^{(1)} \otimes D^{(1)}$ can be converted (or reduced) to block unitary matrices according to the symmetric and antisymmetric representations such that

$$D^{(1)} \otimes D^{(1)} = D^{(2)} \oplus D^{(1)} \oplus D^{(0)}, \tag{20.234}$$

where D(2) and D(0) are the symmetric representations and D(1) is the antisymmetric
representation. Recall that SO(3) is simply reducible. To show (20.234), we need
somewhat lengthy calculations, but the computation procedures are straightforward.
A set of basis functions $\Psi_2^{-2}, \Psi_2^{-1}, \Psi_2^{0}, \Psi_2^{1}, \Psi_2^{2}$ spans a five-dimensional symmetric representation space. In turn, $\Psi_0^{0}$ spans a one-dimensional symmetric representation space. Another set of basis functions $\Psi_1^{-1}, \Psi_1^{0}, \Psi_1^{1}$ spans a three-dimensional antisymmetric representation space.
We add that seeking some special Clebsch-Gordan coefficients is easy. For example, we can readily find them in the case of $J = j_1 + j_2 = M$ or $J = j_1 + j_2 = -M$. The final result of the normalized basis functions can be

$$\Psi_{j_1+j_2}^{\,j_1+j_2} = \psi_{j_1}^{\,j_1}\phi_{j_2}^{\,j_2}, \qquad \Psi_{j_1+j_2}^{-(j_1+j_2)} = \psi_{j_1}^{-j_1}\phi_{j_2}^{-j_2}.$$

That is, among the related coefficients, only

$$(j_1, m_1 = j_1, j_2, m_2 = j_2 \mid J = j_1 + j_2, M = J)$$

and

$$(j_1, m_1 = -j_1, j_2, m_2 = -j_2 \mid J = j_1 + j_2, M = -J)$$

survive and take a value of 1; see the corresponding parts of the matrix elements in (20.221). This can readily be seen in Tables 20.1, 20.2, and 20.3.

20.4 Lie Groups and Lie Algebras

In Sect. 20.2 we developed a practical approach to dealing with various aspects and characteristics of SU(2) and SO(3). In this section we introduce the elegant theory of Lie groups and Lie algebras that enables us to study SU(2) and SO(3) systematically.

20.4.1 Definition of Lie Groups and Lie Algebras: One-Parameter Groups

We start with the following discussion. Let A(t) be a non-singular matrix whose elements vary as functions of a real number t; that is, we think of the matrix itself as a function of t. If under such conditions A(t) constitutes a group, A(t) is said to be a one-parameter group. In particular, we are interested in a situation where we have

$$A(s + t) = A(s)A(t). \tag{20.235}$$

We further get

$$A(s)A(t) = A(s + t) = A(t + s) = A(t)A(s). \tag{20.236}$$

Therefore, any pair of A(t) and A(s) is commutative. Putting $t = s = 0$ in (20.236), we have

$$A(0) = A(0)A(0). \tag{20.237}$$

Since A(0) is non-singular, $A(0)^{-1}$ must exist. Then, multiplying both sides of (20.237) by $A(0)^{-1}$, we obtain

$$A(0) = E, \tag{20.238}$$

where E is an identity matrix. Also, we have

$$A(t)A(-t) = A(0) = E, \tag{20.239}$$

meaning that

$$A(-t) = A(t)^{-1}.$$
Differentiating both sides of (20.235) with respect to s at s = 0, we get



$$A'(t) = A'(0)A(t). \tag{20.240}$$

Defining

$$X \equiv A'(0), \tag{20.241}$$

we obtain

$$A'(t) = XA(t). \tag{20.242}$$

Integrating (20.242), we have

$$A(t) = \exp(tX) \tag{20.243}$$

under the initial condition A(0) = E. Thus, any one-parameter group A(t) is described by (20.243) on condition that A(t) is represented by (20.235).
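A minimal numerical sketch (ours, using SciPy's matrix exponential) illustrating these one-parameter group relations:

```python
import numpy as np
from scipy.linalg import expm

# Sketch (not from the book): a one-parameter group generated by a matrix X.
X = np.array([[0.0, -1.0], [1.0, 0.0]])   # skew-symmetric, so A(t) lies in SO(2)

def A(t):
    return expm(t * X)

s, t = 0.3, 1.1
assert np.allclose(A(s + t), A(s) @ A(t))        # (20.235)
assert np.allclose(A(0.0), np.eye(2))            # (20.238)
assert np.allclose(A(-t), np.linalg.inv(A(t)))   # A(-t) = A(t)^{-1}

# X = A'(0), approximated by a central finite difference:
h = 1e-6
assert np.allclose((A(h) - A(-h)) / (2 * h), X, atol=1e-8)
```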
In Sect. 20.1 we mentioned the Lie algebra in relation to (20.14). In this context (20.243) shows how the Lie algebra is related to the one-parameter group. The notion of a continuous group is thus deeply connected to the one-parameter group. Equation (20.45) of Sect. 20.2 is an example of a product of one-parameter groups. Bearing in mind the above situation, we describe the definitions of the Lie group and Lie algebra.
Definition 20.2 [9] Let G be a subgroup of GL(n, ℂ). Suppose that $A_k$ ($k = 1, 2, \cdots$) $\in G$ and that $\lim_{k\to\infty} A_k = A\ [\in GL(n, \mathbb{C})]$. If $A \in G$, then G is said to be a linear Lie group of dimension n.
We usually call a linear Lie group simply a Lie group. Note that GL(n, ℂ) has appeared in Sect. 16.1. The definition may well bewilder chemists, as it reads as though the definition itself, including the term Lie group, were written in the air. Nevertheless, the next example should again relieve them of needless anxiety.
Example 20.3 Let G be a group and let $A_k \in G$ with $\det A_k = 1$ such that

$$A_k = \begin{pmatrix} \cos\theta_k & -\sin\theta_k \\ \sin\theta_k & \cos\theta_k \end{pmatrix} \quad (\theta_k:\ \text{real number}). \tag{20.244}$$

Suppose that $\lim_{k\to\infty}\theta_k = \theta$; then we have

$$\lim_{k\to\infty} A_k = \lim_{k\to\infty}\begin{pmatrix} \cos\theta_k & -\sin\theta_k \\ \sin\theta_k & \cos\theta_k \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \equiv A. \tag{20.245}$$

Certainly, we have $A \in G$, because $\det A_k = \det A = 1$. Such a group is called SO(2).



In a word, a Lie group is a group whose components consist of continuous and differentiable (or analytic) functions. In that sense the groups such as SU(2) and SO(3) that we have already encountered are Lie groups.
Next, the following is in turn a definition of a Lie algebra.
Definition 20.3 [9] Let G be a linear Lie group. Then the set of all matrices X such that, for every real number t,

$$\exp(tX) \in G, \tag{20.246}$$

is called the Lie algebra of the Lie group G. We denote the Lie algebra of the Lie group G by $\mathfrak{g}$.
The relation (20.246) includes the case where X is a (1, 1) matrix (i.e., merely a
complex number) as the trivial case. Most importantly, (20.246) is directly
connected to (20.243) and Definition 20.3 shows the direct relevance between the
Lie group and Lie algebra. The two theories of Lie group and Lie algebra underlie
continuous groups.
The Lie algebra $\mathfrak{g}$ has the following properties:
(i) If $X \in \mathfrak{g}$, then $aX \in \mathfrak{g}$ as well, where a is any real number.
(ii) If $X, Y \in \mathfrak{g}$, then $X + Y \in \mathfrak{g}$ as well.
(iii) If $X, Y \in \mathfrak{g}$, then $[X, Y]\ (\equiv XY - YX) \in \mathfrak{g}$ as well.
In fact, exp[t(aX)] = exp [(ta)X] and ta is a real number so that we have (i).
Regarding (ii) we should examine individual properties of X. As already shown in
Property (7)′ of Chap. 15 (Sect. 15.2), e.g., a unitary matrix of exp(tX) is associated
with an anti-Hermitian matrix of X; we are interested in this particular case. In the
case of SU(2) X is a traceless anti-Hermitian matrix and as to SO(3) X is a real skew-
symmetric matrix. In the former case, for example, traceless anti-Hermitian matrices
X and Y can be expressed as

$$X = \begin{pmatrix} ic & -a + ib \\ a + ib & -ic \end{pmatrix}, \qquad Y = \begin{pmatrix} ih & -f + ig \\ f + ig & -ih \end{pmatrix}, \tag{20.247}$$

where a, b, c, etc. are real. Then, we have

$$X + Y = \begin{pmatrix} i(c + h) & -(a + f) + i(b + g) \\ (a + f) + i(b + g) & -i(c + h) \end{pmatrix}. \tag{20.248}$$

Thus, X + Y is a traceless anti-Hermitian matrix as well. Property (iii) needs some explanation. Suppose $X, Y \in \mathfrak{g}$. Then, with any real numbers s and t, we have $\exp(sX), \exp(tY), \exp(-sX), \exp(-tY) \in G$ by Definition 20.3. Then, their product $\exp(sX)\exp(tY)\exp(-sX)\exp(-tY) \in G$ as well. Taking an infinitesimal transformation of this product of four factors within the first order of s and t, we have

$$\begin{aligned}
\exp(sX)\exp(tY)&\exp(-sX)\exp(-tY) \\
&\approx (1 + sX)(1 + tY)(1 - sX)(1 - tY) \\
&= (1 + tY + sX + stXY)(1 - tY - sX + stXY) \\
&\approx (1 - tY - sX + stXY) + (tY - stYX) + (sX - stXY) + stXY \\
&= 1 + st(XY - YX).
\end{aligned} \tag{20.249}$$

Notice that we ignored the terms having $s^2$, $t^2$, $st^2$, $s^2t$, and $s^2t^2$ as a coefficient. Defining

$$[X, Y] \equiv XY - YX, \tag{20.250}$$

we have

$$\exp(sX)\exp(tY)\exp(-sX)\exp(-tY) \approx 1 + st[X, Y] \approx \exp(st[X, Y]). \tag{20.251}$$

Since the LHS represents an element of G and st is a real number, by Definition 20.3 we have $[X, Y] \in \mathfrak{g}$. The quantity [X, Y] is called a commutator. The commutator and its definition appeared in (1.139).
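The closure of the commutator and the approximation (20.251) can be checked numerically; a sketch (random su(2) elements and helper names are our own):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def random_su2():
    # traceless anti-Hermitian 2x2 matrix, in the form of (20.247)
    a, b, c = rng.normal(size=3)
    return np.array([[1j*c, -a + 1j*b], [a + 1j*b, -1j*c]])

X, Y = random_su2(), random_su2()
K = X @ Y - Y @ X                       # the commutator [X, Y]
assert np.allclose(K.conj().T, -K)      # anti-Hermitian again
assert abs(np.trace(K)) < 1e-12         # and traceless: [X, Y] is in su(2)

s = t = 1e-4                            # group commutator vs exp(st[X, Y]), (20.251)
lhs = expm(s*X) @ expm(t*Y) @ expm(-s*X) @ expm(-t*Y)
assert np.allclose(lhs, expm(s*t*K), atol=1e-10)
```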
From properties (i) and (ii), the Lie algebra $\mathfrak{g}$ forms a vector space (see Chap. 11). An element of $\mathfrak{g}$ is usually expressed by a matrix. The zero vector corresponds to the zero matrix (see Proposition 15.1). As already seen in Chap. 15, we define an inner product between any $A, B \in \mathfrak{g}$ such that

$$\langle A \mid B\rangle \equiv \sum_{i,j} a_{ij}^{*} b_{ij}. \tag{20.252}$$

Thus, $\mathfrak{g}$ constitutes an inner product (vector) space.

20.4.2 Properties of Lie Algebras

Let us further continue our discussion of Lie algebras by thinking of the following proposition about the inner product.
Proposition 20.1 Let f(t) and g(t) be continuous and differentiable functions with
respect to a real number t. Then, we have

$$\frac{d}{dt}\langle f(t) \mid g(t)\rangle = \left\langle \frac{df(t)}{dt} \,\middle|\, g(t)\right\rangle + \left\langle f(t) \,\middle|\, \frac{dg(t)}{dt}\right\rangle. \tag{20.253}$$

Proof We calculate the following quantity:

$$\begin{aligned}
\langle f(t+\Delta) \mid g(t+\Delta)\rangle - \langle f(t) \mid g(t)\rangle
&= \langle f(t+\Delta) \mid g(t+\Delta)\rangle - \langle f(t) \mid g(t+\Delta)\rangle \\
&\quad + \langle f(t) \mid g(t+\Delta)\rangle - \langle f(t) \mid g(t)\rangle \\
&= \langle f(t+\Delta) - f(t) \mid g(t+\Delta)\rangle + \langle f(t) \mid g(t+\Delta) - g(t)\rangle.
\end{aligned} \tag{20.254}$$

Dividing both sides of (20.254) by a real number Δ and taking the limit as below, we have

$$\begin{aligned}
\lim_{\Delta\to 0}\frac{1}{\Delta}&\left[\langle f(t+\Delta) \mid g(t+\Delta)\rangle - \langle f(t) \mid g(t)\rangle\right] \\
&= \lim_{\Delta\to 0}\left[\left\langle \frac{f(t+\Delta)-f(t)}{\Delta} \,\middle|\, g(t+\Delta)\right\rangle + \left\langle f(t) \,\middle|\, \frac{g(t+\Delta)-g(t)}{\Delta}\right\rangle\right], \tag{20.255}
\end{aligned}$$

where we used the calculation rules for inner products of (13.3) and (13.20). Then, we get (20.253). This completes the proof.
Now, let us apply (20.253) to a unitary operator A(t) that we dealt with in the previous section. Replacing f(t) and g(t) in (20.253) with $\psi A(t)^{\dagger}$ and $A(t)\chi$, respectively, we have

$$\frac{d}{dt}\left\langle \psi A(t)^{\dagger} \mid A(t)\chi\right\rangle = \left\langle \psi\frac{dA(t)^{\dagger}}{dt} \,\middle|\, A(t)\chi\right\rangle + \left\langle \psi A(t)^{\dagger} \,\middle|\, \frac{dA(t)}{dt}\chi\right\rangle, \tag{20.256}$$

where ψ and χ are arbitrarily chosen vectors that do not depend on t. Then, we have

$$\text{LHS of (20.256)} = \frac{d}{dt}\left\langle \psi \mid A(t)^{\dagger}A(t) \mid \chi\right\rangle = \frac{d}{dt}\langle \psi \mid E \mid \chi\rangle = \frac{d}{dt}\langle \psi \mid \chi\rangle = 0, \tag{20.257}$$

where with the second equality we used the unitarity of A(t). Taking the limit $t \to 0$ in (20.256), we have

$$\lim_{t\to 0}\left[\text{RHS of (20.256)}\right] = \left\langle \psi\frac{dA(0)^{\dagger}}{dt} \,\middle|\, A(0)\chi\right\rangle + \left\langle \psi A(0)^{\dagger} \,\middle|\, \frac{dA(0)}{dt}\chi\right\rangle.$$

Meanwhile, taking the limit $t \to 0$ in the following relation, we have

$$\left.\frac{dA(t)^{\dagger}}{dt}\right|_{t\to 0} = \left.\left[\frac{dA(t)}{dt}\right]^{\dagger}\right|_{t\to 0} = [A'(0)]^{\dagger},$$

where we assumed that the operations of taking the adjoint and letting $t \to 0$ are commutable. Then, we get

$$\begin{aligned}
\lim_{t\to 0}\left[\text{RHS of (20.256)}\right] &= \left\langle \psi [A'(0)]^{\dagger} \mid A(0)\chi\right\rangle + \left\langle \psi A(0)^{\dagger} \,\middle|\, \frac{dA(0)}{dt}\chi\right\rangle \\
&= \langle \psi X^{\dagger} \mid \chi\rangle + \langle \psi \mid X\chi\rangle = \left\langle \psi\left(X^{\dagger} + X\right) \mid \chi\right\rangle,
\end{aligned} \tag{20.258}$$

where we used $X \equiv A'(0)$ of (20.241) and $A(0) = A(0)^{\dagger} = E$. From (20.257) and (20.258), we have

$$0 = \left\langle \psi\left(X^{\dagger} + X\right) \mid \chi\right\rangle.$$

Since ψ and χ are arbitrary, from Theorem 14.2 we get

$$X^{\dagger} + X \equiv 0, \quad \text{i.e.,} \quad X^{\dagger} = -X. \tag{20.259}$$

This indicates that X is an anti-Hermitian operator. Let X be expressed as $(x_{ij})$. Then, from (20.259) $x_{ij} = -x_{ji}^{*}$. As for the diagonal elements, $x_{ii} = -x_{ii}^{*}$; i.e., $x_{ii} + x_{ii}^{*} = 0$. Hence, each $x_{ii}$ is a pure imaginary number or zero. We have the following theorem accordingly.
Theorem 20.3 Let A(t) be a one-parameter unitary group with A(0) = E that is described by $A(t) = \exp(tX)$ and satisfies (20.235). Then, $X \equiv A'(0)$ is an anti-Hermitian operator.
Differentiating both sides of A(t) = exp (tX) with respect to t and using (15.47) of
Theorem 15.4 [11, 12], we have

$$A'(t) = X\exp(tX) = XA(t) = A(t)X. \tag{20.260}$$

Putting t = 0 in (20.260), we recover X = A′(0). If we require A(t) to be unitary, from Theorem 20.3 again X should be anti-Hermitian. Thus, once we have a one-parameter group in the form of a unitary operator $A(t) = \exp(tX)$, the exponent can be separated into a product of a parameter t and a t-independent constant anti-Hermitian operator X. In fact, all the one-parameter groups that have appeared in Sect. 20.2 are of the type $\exp(tX)$.
Conversely, let us consider what type of operator exp(tX) would be if X is anti-
Hermitian. Here we assume that X is an (n, n) matrix. With an arbitrary real number t we have

$$[\exp(tX)][\exp(tX)]^{\dagger} = [\exp(tX)]\left[\exp\left(tX^{\dagger}\right)\right] = [\exp(tX)][\exp(-tX)] = \exp(tX - tX) = \exp 0 = E, \tag{20.261}$$

where we used (15.32) and Theorem 15.2 as well as the assumption that X is anti-Hermitian. Note that $\exp(tX)$ and $\exp(-tX)$ are commutative; see (15.29) of Sect. 15.2. Equation (20.261) implies that $\exp(tX)$ is unitary, i.e., $\exp(tX) \in U(n)$. The

notation U(n) means a unitary group. Hence, $X \in \mathfrak{u}(n)$, where $\mathfrak{u}(n)$ means the Lie algebra corresponding to U(n).
Next, let us seek the Lie algebra that corresponds to a special unitary group SU(n). By Definition 20.3, it is obvious that since U(n) ⊃ SU(n), we have $\mathfrak{u}(n) \supset \mathfrak{su}(n)$, where $\mathfrak{su}(n)$ means the Lie algebra corresponding to SU(n). It suffices to find the condition under which $\det[\exp(tX)] = 1$ holds with any real number t and elements $X \in \mathfrak{u}(n)$. From (15.41) [9], we have

$$\det[\exp(tX)] = \exp\,\mathrm{Tr}(tX) = \exp\left(t\,[\mathrm{Tr}(X)]\right) = 1, \tag{20.262}$$

where Tr stands for the trace (see Chap. 12). This is equivalent to the condition that

$$t\,[\mathrm{Tr}(X)] = 2m i\pi \quad (m:\ \text{zero or an integer}) \tag{20.263}$$

holds with any real t. For this, we must have Tr(X) = 0 with m = 0. This implies that $\mathfrak{su}(n)$ comprises the anti-Hermitian matrices whose trace is zero (i.e., traceless). Consequently, the Lie algebra $\mathfrak{su}(n)$ is certainly a subset (or subspace) of $\mathfrak{u}(n)$, which consists of anti-Hermitian matrices.
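A short sketch (ours) confirming that a traceless anti-Hermitian X indeed generates unitary matrices of unit determinant:

```python
import numpy as np
from scipy.linalg import expm

# Sketch: a traceless anti-Hermitian X generates a curve inside SU(n).
rng = np.random.default_rng(2)
H = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = (H + H.conj().T) / 2                 # Hermitian
H -= np.trace(H) / 3 * np.eye(3)         # made traceless
X = 1j * H                               # traceless anti-Hermitian

for t in (0.5, 2.0):
    U = expm(t * X)
    assert np.allclose(U @ U.conj().T, np.eye(3))   # unitary, cf. (20.261)
    assert np.isclose(np.linalg.det(U), 1.0)        # det = exp(t Tr X) = 1, (20.262)
```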
In relation to (20.256) let us think of a real orthogonal matrix A(t). If A(t) is real and orthogonal, (20.256) can be rewritten as

$$\frac{d}{dt}\left\langle \psi A(t)^{T} \mid A(t)\chi\right\rangle = \left\langle \psi\frac{dA(t)^{T}}{dt} \,\middle|\, A(t)\chi\right\rangle + \left\langle \psi A(t)^{T} \,\middle|\, \frac{dA(t)}{dt}\chi\right\rangle. \tag{20.264}$$

Following the procedure similar to the case of (20.259), we get

$$X^{T} + X \equiv 0, \quad \text{i.e.,} \quad X^{T} = -X. \tag{20.265}$$

In this case we obtain a skew-symmetric matrix. All its diagonal elements are zero and, hence, the skew-symmetric matrix is traceless. Other typical Lie groups are an orthogonal group O(n) and a special orthogonal group SO(n); the corresponding Lie algebras are denoted by $\mathfrak{o}(n)$ and $\mathfrak{so}(n)$, respectively. Note that both $\mathfrak{o}(n)$ and $\mathfrak{so}(n)$ consist of skew-symmetric matrices.
Conversely, let us consider what type of operator $\exp(tX)$ would be if X is a skew-symmetric matrix. Here we assume that X is an (n, n) matrix. With an arbitrary real number t we have

$$[\exp(tX)][\exp(tX)]^{T} = [\exp(tX)]\left[\exp\left(tX^{T}\right)\right] = [\exp(tX)][\exp(-tX)] = \exp(tX - tX) = \exp 0 = E, \tag{20.266}$$

where we used (15.31). This implies that $\exp(tX)$ is an orthogonal matrix, i.e., $\exp(tX) \in O(n)$. Hence, $X \in \mathfrak{o}(n)$. Notice that $\exp(tX)$ and $\exp(-tX)$ are commutative.
Summarizing the above, we have the following theorem.

Theorem 20.4 The n-th order Lie algebra $\mathfrak{u}(n)$ corresponding to the Lie group U(n) (i.e., the unitary group) consists of all the anti-Hermitian (n, n) matrices. The n-th order Lie algebra $\mathfrak{su}(n)$ corresponding to the Lie group SU(n) (i.e., the special unitary group) comprises all the traceless anti-Hermitian (n, n) matrices. The n-th order Lie algebras $\mathfrak{o}(n)$ and $\mathfrak{so}(n)$ corresponding to the Lie groups O(n) and SO(n), respectively, consist of all the real skew-symmetric (n, n) matrices.
Notice that if A and B are anti-Hermitian matrices, so are A + B and cA (c: real number). This is true of skew-symmetric matrices as well. Hence, $\mathfrak{u}(n)$, $\mathfrak{su}(n)$, $\mathfrak{o}(n)$, and $\mathfrak{so}(n)$ each form a linear vector space.
In Sect. 20.4.1 we introduced the commutator. Commutators are ubiquitous in quantum mechanics, especially as commutation relations (see Part I). Their major properties are as follows:

$$\begin{aligned}
&(\text{i})\ [X + Y, Z] = [X, Z] + [Y, Z], \\
&(\text{ii})\ [aX, Y] = a[X, Y] \quad (a \in \mathbb{R}), \\
&(\text{iii})\ [X, Y] = -[Y, X], \\
&(\text{iv})\ [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0.
\end{aligned} \tag{20.267}$$

The last equation of (20.267) is well-known as Jacobi’s identity. Readers are


encouraged to check it.
Since the Lie algebra forms a linear vector space of a finite dimension, we denote
its basis vectors by X1, X2, ⋯, Xd. Then, we have

$$\mathfrak{g}(d) = \mathrm{Span}\{X_1, X_2, \cdots, X_d\}, \tag{20.268}$$

where $\mathfrak{g}(d)$ stands for a Lie algebra of dimension d. As their commutators belong to $\mathfrak{g}(d)$ as well, with an arbitrary pair of $X_i$ and $X_j$ ($i, j = 1, 2, \cdots, d$) we have

$$[X_i, X_j] = \sum_{k=1}^{d} f_{ijk} X_k, \tag{20.269}$$

where the set of real coefficients $f_{ijk}$ is said to be the structure constants, which define the structure of the Lie algebra.
Example 20.4 In (20.42) we write $\zeta_1 \equiv \zeta_x$, $\zeta_2 \equiv \zeta_y$, and $\zeta_3 \equiv \zeta_z$. Rewriting it explicitly, we have

$$\zeta_1 = \frac{1}{2}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}, \quad \zeta_2 = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad \zeta_3 = \frac{1}{2}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}. \tag{20.270}$$

Then, we get

$$[\zeta_i, \zeta_j] = \sum_{k=1}^{3} \varepsilon_{ijk}\,\zeta_k, \tag{20.271}$$

where $\varepsilon_{ijk}$ is called the Levi-Civita symbol [4] and is defined by

$$\varepsilon_{ijk} = \begin{cases} +1 & (i, j, k) = (1, 2, 3), (2, 3, 1), (3, 1, 2); \\ -1 & (i, j, k) = (3, 2, 1), (1, 3, 2), (2, 1, 3); \\ 0 & \text{otherwise}. \end{cases} \tag{20.272}$$

Notice that in (20.272) if (i, j, k) represents an even permutation of (1, 2, 3), $\varepsilon_{ijk} = 1$, but if (i, j, k) is an odd permutation of (1, 2, 3), $\varepsilon_{ijk} = -1$. Otherwise, e.g., for $i = j$, $j = k$, or $k = i$, we have $\varepsilon_{ijk} = 0$. The relation (20.271) is essentially the same as (3.30) and (3.69) of Chap. 3. We have

$$\mathfrak{su}(2) = \mathrm{Span}\{\zeta_1, \zeta_2, \zeta_3\}. \tag{20.273}$$

Equation (20.270) shows that the trace of $\zeta_i$ ($i = 1, 2, 3$) is zero.
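The commutation relation (20.271) can be verified directly; a sketch (ours, with the matrix entries as reconstructed in (20.270)):

```python
import numpy as np

# The su(2) basis (20.270): zeta_1, zeta_2, zeta_3
z1 = 0.5 * np.array([[0, -1j], [-1j, 0]])
z2 = 0.5 * np.array([[0,  1 ], [-1,  0]], dtype=complex)
z3 = 0.5 * np.array([[1j, 0 ], [0, -1j]])
zeta = [z1, z2, z3]

eps = np.zeros((3, 3, 3))       # Levi-Civita symbol (20.272), 0-indexed
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[j, i, k] = 1.0, -1.0

for i in range(3):
    for j in range(3):
        comm = zeta[i] @ zeta[j] - zeta[j] @ zeta[i]
        rhs = sum(eps[i, j, k] * zeta[k] for k in range(3))
        assert np.allclose(comm, rhs)   # (20.271)
```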


Example 20.5 In (20.26) we write $A_1 \equiv A_x$, $A_2 \equiv A_y$, and $A_3 \equiv A_z$. Rewriting it, we have

$$A_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{20.274}$$

Again, we have

$$[A_i, A_j] = \sum_{k=1}^{3} \varepsilon_{ijk}\,A_k. \tag{20.275}$$

This is of the same form as (20.271). Namely,

$$\mathfrak{so}(3) = \mathfrak{o}(3) = \mathrm{Span}\{A_1, A_2, A_3\}. \tag{20.276}$$

Equation (20.274) clearly shows that the diagonal elements of $A_i$ ($i = 1, 2, 3$) are zero.
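To connect this Lie algebra with its Lie group concretely, $\exp(\theta A_3)$ reproduces the rotation about the z-axis, cf. (20.322) below; a sketch (ours):

```python
import numpy as np
from scipy.linalg import expm

A3 = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 0.0]])       # the generator A_3 of (20.274)

theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
assert np.allclose(expm(theta * A3), Rz)   # exp(theta*A3) is a z-rotation
```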


From Examples 20.4 and 20.5, suð2Þ and soð3Þ ½or oð3Þ structurally resemble
each other. This fact is directly related to similarity between SU(2) and SO(3). Note
also that the dimensionality of suð2Þ and soð3Þ is the same in terms of linear vector
spaces.
In Chap. 3, we deal with the generalized angular momentum using the relation
(3.30). Using (20.15), we can readily obtain the relation the same as (20.271) and
(20.275). That is, from (3.30) defining the following anti-Hermitian operators as
20.4 Lie Groups and Lie Algebras 905

iJ iJ y ~
J~x  - x , J~y  - , J  - iJ z =ħ,
ħ ħ z

we get

J~l , J~m = ~
3
E J ,
k = 1 lmn n

where l, m, n stand for x, y, z. The derivation is left for readers.

20.4.3 Adjoint Representation of Lie Groups

In (20.252) of Sect. 20.4.1 we defined an inner product on $\mathfrak{g}$ as

$$\langle A \mid B\rangle \equiv \sum_{i,j} a_{ij}^{*} b_{ij}. \tag{20.252}$$

We can readily check that the definition (20.252) satisfies the defining properties of the inner product, (13.2), (13.3), and (13.4). In fact, we have

$$\langle B \mid A\rangle = \sum_{i,j} b_{ij}^{*} a_{ij} = \left(\sum_{i,j} a_{ij}^{*} b_{ij}\right)^{*} = \langle A \mid B\rangle^{*}. \tag{20.277}$$

Equation (13.3) is obvious from the calculation rules of a matrix. With (13.4), we get

$$\langle A \mid A\rangle = \mathrm{Tr}\left(A^{\dagger}A\right) = \sum_{i,j} \left|a_{ij}\right|^2 \geq 0, \tag{20.278}$$

where we have $A = (a_{ij})$ and the equality holds if and only if all $a_{ij} = 0$, i.e., A = 0. Thus, $\langle A \mid A\rangle$ gives a positive definite inner product on $\mathfrak{g}$.
From (20.278), we may equally define an inner product on $\mathfrak{g}$ as

$$\langle A \mid B\rangle = \mathrm{Tr}\left(A^{\dagger}B\right). \tag{20.279}$$

In fact, we have

$$\mathrm{Tr}\left(A^{\dagger}B\right) = \sum_{i,j}\left(A^{\dagger}\right)_{ij}(B)_{ji} = \sum_{i,j} a_{ji}^{*} b_{ji} \equiv \langle A \mid B\rangle.$$

In another way, we can readily compare (20.279) with (20.252) using, e.g., a (2, 2) matrix and find that both equations give the same result. It is left for readers as an exercise.

From Theorem 20.4, $\mathfrak{u}(n)$ corresponding to the unitary group U(n) consists of all the anti-Hermitian (n, n) matrices. With these matrices, we have

$$\langle B \mid A\rangle = \mathrm{Tr}\left(B^{\dagger}A\right) = \sum_{i,j} b_{ji}^{*} a_{ji} = \sum_{i,j}\left(-b_{ij}\right)\left(-a_{ij}^{*}\right) = \sum_{i,j} a_{ij}^{*} b_{ij} = \langle A \mid B\rangle. \tag{20.280}$$

Then, comparing (20.277) and (20.280) we have

$$\langle A \mid B\rangle^{*} = \langle A \mid B\rangle.$$

This means that $\langle A \mid B\rangle$ is real. For example, using $\mathfrak{u}(2)$, let us evaluate an inner product. We denote arbitrary $X_1, X_2 \in \mathfrak{u}(2)$ by

$$X_1 = \begin{pmatrix} ia & c + id \\ -c + id & ib \end{pmatrix}, \qquad X_2 = \begin{pmatrix} ip & r + is \\ -r + is & iq \end{pmatrix},$$

where a, b, c, d; p, q, r, s are real. Then, according to the calculation rule of the inner product on the Lie algebra, we have

$$\langle X_1 \mid X_2\rangle = ap + 2(cr + ds) + bq,$$

which gives a real inner product. Notice that this is the case with $\mathfrak{su}(2)$ as well.
Next, let us think of the following inner product:

$$\left\langle gXg^{-1} \mid gYg^{-1}\right\rangle = \mathrm{Tr}\left[\left(gXg^{-1}\right)^{\dagger}gYg^{-1}\right], \tag{20.281}$$

where g is any non-singular matrix.
Bearing in mind that we are particularly interested in the continuous groups U(n) [or its subgroup SU(n)] and O(n) [or its subgroup SO(n)], we assume that g is an element of those groups and is represented by a unitary matrix (including an orthogonal matrix); i.e., $g^{-1} = g^{\dagger}$ (or $g^{T}$). Then, if g is a unitary matrix, we have

$$\mathrm{Tr}\left[\left(gXg^{-1}\right)^{\dagger}gYg^{-1}\right] = \mathrm{Tr}\left[\left(gXg^{\dagger}\right)^{\dagger}gYg^{\dagger}\right] = \mathrm{Tr}\left(gX^{\dagger}g^{\dagger}gYg^{\dagger}\right) = \mathrm{Tr}\left(gX^{\dagger}Yg^{\dagger}\right) = \mathrm{Tr}\left(X^{\dagger}Y\right) = \langle X \mid Y\rangle, \tag{20.282}$$

where with the second last equality we used (12.13), namely the invariance of the trace under a (unitary) similarity transformation. If g is represented by a real orthogonal matrix, instead of (20.282) we have

$$\mathrm{Tr}\left[\left(gXg^{-1}\right)^{\dagger}gYg^{-1}\right] = \mathrm{Tr}\left[\left(gXg^{T}\right)^{\dagger}gYg^{T}\right] = \mathrm{Tr}\left(\bar{g}X^{\dagger}g^{\dagger}gYg^{T}\right) = \mathrm{Tr}\left(\bar{g}X^{\dagger}Yg^{T}\right) = \mathrm{Tr}\left(X^{\dagger}Y\right) = \langle X \mid Y\rangle, \tag{20.283}$$

where $\bar{g}$ is the complex conjugate matrix of g (for a real orthogonal g, $\bar{g} = g$, and $g^{T}\bar{g} = E$). Combining (20.281) with (20.282) or (20.283), we have

$$\left\langle gXg^{-1} \mid gYg^{-1}\right\rangle = \langle X \mid Y\rangle. \tag{20.284}$$

This relation clearly shows that the real inner product of (20.284) remains unchanged under the operation

$$X \longmapsto gXg^{-1},$$

where X is an anti-Hermitian (or skew-symmetric) matrix and g is a unitary (or orthogonal) matrix.
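A quick numerical check of the invariance (20.284); the random matrices and helper names are our own:

```python
import numpy as np

rng = np.random.default_rng(3)

def inner(A, B):
    return np.trace(A.conj().T @ B)     # the inner product (20.279)

# a random unitary g (QR factor of a complex matrix) and anti-Hermitian X, Y
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
g, _ = np.linalg.qr(M)
Z1 = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
Z2 = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
X, Y = Z1 - Z1.conj().T, Z2 - Z2.conj().T

gi = np.linalg.inv(g)
assert np.isclose(inner(g @ X @ gi, g @ Y @ gi), inner(X, Y))   # (20.284)
```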
Now, we give the following definition and related theorems concerning both the Lie groups and Lie algebras.

Definition 20.4 Let G be a Lie group chosen from U(n) [including SU(n)] and O(n) [including SO(n)]. Let $\mathfrak{g}$ be the Lie algebra corresponding to G. Let $g \in G$ and $X \in \mathfrak{g}$. We define the following transformation on $\mathfrak{g}$ such that

$$\mathrm{Ad}[g](X) \equiv gXg^{-1}. \tag{20.285}$$

Then, Ad[g] is said to be an adjoint representation of G on $\mathfrak{g}$. We write the relation (20.285) as

$$\mathrm{Ad}[g]: \mathfrak{g} \to \mathfrak{g}.$$

From Definition 20.4, $\mathrm{Ad}[g](X) \in \mathfrak{g}$. The operator Ad[g] is a kind of mapping (i.e., endomorphism) discussed in Sect. 11.2. Notice that both g and X are represented by (n, n) matrices. The matrix g is either a unitary matrix or an orthogonal matrix. The matrix X is either an anti-Hermitian matrix or a skew-symmetric matrix.
Theorem 20.5 Let g be an element of a Lie group G and $\mathfrak{g}$ be the Lie algebra of G. Then, Ad[g] is a linear transformation on $\mathfrak{g}$.

Proof Let $X, Y \in \mathfrak{g}$. Then, we have

$$\mathrm{Ad}[g](aX + bY) \equiv g(aX + bY)g^{-1} = a\left(gXg^{-1}\right) + b\left(gYg^{-1}\right) = a\,\mathrm{Ad}[g](X) + b\,\mathrm{Ad}[g](Y). \tag{20.286}$$

Thus, we find that Ad[g] is a linear transformation. By Definition 20.3, $\exp(tX) \in G$ with an arbitrary real number t. Meanwhile, we have

$$\exp\{t\,\mathrm{Ad}[g](X)\} = \exp\left(tgXg^{-1}\right) = \exp\left(g\,tX\,g^{-1}\right) = g\exp(tX)g^{-1},$$

where with the last equality we used (15.30).
Then, if $g \in G$, $g\exp(tX)g^{-1} \in G$. Again from Definition 20.3, $\mathrm{Ad}[g](X) \in \mathfrak{g}$. That is, Ad[g] is a linear transformation on $\mathfrak{g}$. This completes the proof.
Notice that in Theorem 20.5, G may be any linear Lie group chosen from among the subgroups of GL(n, ℂ). The Lie algebra corresponding to GL(n, ℂ) is denoted by $\mathfrak{gl}(n, \mathbb{C})$. The following theorem shows that Ad[g] is a representation.

Theorem 20.6 Let g be an element of a Lie group G. Then, $g \longmapsto \mathrm{Ad}[g]$ is a representation of G on $\mathfrak{g}$.

Proof From (20.285), we have

$$\mathrm{Ad}[g_1g_2](X) \equiv (g_1g_2)X(g_1g_2)^{-1} = g_1g_2Xg_2^{-1}g_1^{-1} = g_1\,\mathrm{Ad}[g_2](X)\,g_1^{-1} = \mathrm{Ad}[g_1]\left(\mathrm{Ad}[g_2](X)\right) = \left(\mathrm{Ad}[g_1]\,\mathrm{Ad}[g_2]\right)(X). \tag{20.287}$$

Comparing the first and last sides of (20.287), we get

$$\mathrm{Ad}[g_1g_2] = \mathrm{Ad}[g_1]\,\mathrm{Ad}[g_2].$$

That is, $g \longmapsto \mathrm{Ad}[g]$ is a representation of G on $\mathfrak{g}$.


By virtue of Theorem 20.6, we call $g \longmapsto \mathrm{Ad}[g]$ an adjoint representation of G on $\mathfrak{g}$. Once again, we remark that

$$X \longmapsto \mathrm{Ad}[g](X) \equiv X' \quad \text{or} \quad \mathrm{Ad}[g]: \mathfrak{g} \to \mathfrak{g} \tag{20.288}$$

is a linear transformation $\mathfrak{g} \to \mathfrak{g}$, namely an endomorphism that operates on a vector space $\mathfrak{g}$ (see Sect. 11.2). Figure 20.5 schematically depicts it. The notation of the adjoint representation Ad is somewhat confusing. That is, (20.288) shows that Ad[g] is a linear transformation $\mathfrak{g} \to \mathfrak{g}$ (i.e., an endomorphism). Conversely, Ad is thought of as a mapping G → G′, where G and G′ mean two groups. Namely, we express it symbolically as [9]

[Fig. 20.5: Linear transformation $\mathrm{Ad}[g]: \mathfrak{g} \to \mathfrak{g}$ (i.e., endomorphism). Ad[g] is an endomorphism that operates on a vector space $\mathfrak{g}$]

$$g \longmapsto \mathrm{Ad}[g] \quad \text{or} \quad \mathrm{Ad}: G \to G'. \tag{20.289}$$

The G and G′ may or may not be identical. Examples can be seen in, e.g., (20.294) and (20.302) later.
As mentioned above, if $\mathfrak{g}$ is either $\mathfrak{u}(n)$ or $\mathfrak{o}(n)$, Ad[g] is an orthogonal transformation on $\mathfrak{g}$. The representation of Ad[g] is real accordingly.
An immediate implication of this is that the transformation of the basis vectors of $\mathfrak{g}$ has a connection with the corresponding Lie groups such as U(n) and O(n). Moreover, it is well known and well studied that there is a close relationship between Ad[SU(2)] and SO(3). First, we wish to seek basis vectors of $\mathfrak{su}(2)$. A general form of $X \in \mathfrak{su}(2)$ is the following traceless anti-Hermitian matrix:

$$X = \begin{pmatrix} ic & -a + ib \\ a + ib & -ic \end{pmatrix},$$

where a, b, and c are real. We have encountered this type of matrix in (20.42) and (20.270). Using the inner product described in (20.252), we determine an orthonormal basis set of $\mathfrak{su}(2)$ such that

$$e_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}, \quad e_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad e_3 = \frac{1}{\sqrt{2}}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \tag{20.290}$$

where we have $\langle e_i \mid e_j\rangle = \delta_{ij}$ ($i, j = 1, 2, 3$).


Using the basis set of $\mathfrak{su}(2)$ given by (20.290) for the representation space, we wish to seek a tangible representation matrix of Ad[g] [$g \in SU(2)$]. To construct Ad[g], we choose $g_\alpha, g_\beta \in SU(2)$ given by

$$g_\alpha \equiv \begin{pmatrix} e^{i\alpha/2} & 0 \\ 0 & e^{-i\alpha/2} \end{pmatrix} \quad \text{and} \quad g_\beta \equiv \begin{pmatrix} \cos\frac{\beta}{2} & \sin\frac{\beta}{2} \\ -\sin\frac{\beta}{2} & \cos\frac{\beta}{2} \end{pmatrix}.$$

The elements $g_\alpha$ and $g_\beta$ have appeared in (20.43) and (20.44). Then, we have [9]

$$\mathrm{Ad}[g_\alpha](e_1) = g_\alpha e_1 g_\alpha^{-1} = \frac{1}{\sqrt{2}}\begin{pmatrix} e^{i\alpha/2} & 0 \\ 0 & e^{-i\alpha/2} \end{pmatrix}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}\begin{pmatrix} e^{-i\alpha/2} & 0 \\ 0 & e^{i\alpha/2} \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & \sin\alpha - i\cos\alpha \\ -\sin\alpha - i\cos\alpha & 0 \end{pmatrix} = e_1\cos\alpha + e_2\sin\alpha. \tag{20.291}$$

Similarly, we get

$$\mathrm{Ad}[g_\alpha](e_2) = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & \cos\alpha + i\sin\alpha \\ -\cos\alpha + i\sin\alpha & 0 \end{pmatrix} = -e_1\sin\alpha + e_2\cos\alpha, \tag{20.292}$$

$$\mathrm{Ad}[g_\alpha](e_3) = \frac{1}{\sqrt{2}}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} = e_3. \tag{20.293}$$

Notice that we obtained (20.291)-(20.293) as products of three matrices, of which the two outer factors are diagonal. Thus, we obtain

$$(e_1\ e_2\ e_3)\,\mathrm{Ad}[g_\alpha] = (e_1\ e_2\ e_3)\begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.294}$$

From (20.294), as a real representation matrix we get

$$\mathrm{Ad}[g_\alpha] = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.295}$$

Calculating $(e_1\ e_2\ e_3)\,\mathrm{Ad}[g_\beta]$ likewise, we get

$$\mathrm{Ad}[g_\beta] = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}. \tag{20.296}$$

Moreover, with

$$g_\gamma = \begin{pmatrix} e^{i\gamma/2} & 0 \\ 0 & e^{-i\gamma/2} \end{pmatrix}, \tag{20.297}$$

we have

$$\mathrm{Ad}[g_\gamma] = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.298}$$

Finally, let us reproduce (17.101) by calculating

$$\mathrm{Ad}[g_\alpha]\,\mathrm{Ad}[g_\beta]\,\mathrm{Ad}[g_\gamma] = \mathrm{Ad}[g_\alpha g_\beta g_\gamma], \tag{20.299}$$

where we used the fact that $g_\omega \longmapsto \mathrm{Ad}[g_\omega]$ ($\omega = \alpha, \beta, \gamma$) is a representation of SU(2) on $\mathfrak{su}(2)$. To obtain an explicit form of the representation matrix of Ad[g] [$g \in SU(2)$], we use

$$g_{\alpha\beta\gamma} \equiv g_\alpha g_\beta g_\gamma = \begin{pmatrix} e^{\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} & e^{\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2} \\ -e^{\frac{i}{2}(\gamma-\alpha)}\sin\frac{\beta}{2} & e^{-\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} \end{pmatrix}. \tag{20.300}$$

Note that the matrix of (20.300) is identical with (20.45). Using (20.300), we calculate $\mathrm{Ad}[g_{\alpha\beta\gamma}](e_1)$ such that

$$\begin{aligned}
\mathrm{Ad}[g_{\alpha\beta\gamma}](e_1) &= g_{\alpha\beta\gamma}\,e_1\,g_{\alpha\beta\gamma}^{-1} \\
&= \frac{1}{\sqrt{2}}\begin{pmatrix} e^{\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} & e^{\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2} \\ -e^{\frac{i}{2}(\gamma-\alpha)}\sin\frac{\beta}{2} & e^{-\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} \end{pmatrix}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}\begin{pmatrix} e^{-\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} & -e^{\frac{i}{2}(\alpha-\gamma)}\sin\frac{\beta}{2} \\ e^{\frac{i}{2}(\gamma-\alpha)}\sin\frac{\beta}{2} & e^{\frac{i}{2}(\alpha+\gamma)}\cos\frac{\beta}{2} \end{pmatrix} \\
&= \frac{1}{\sqrt{2}}\begin{pmatrix} -i\sin\beta\cos\gamma & -i(\cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma) + (\sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma) \\ -i(\cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma) - (\sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma) & i\sin\beta\cos\gamma \end{pmatrix} \\
&= (e_1\ e_2\ e_3)\begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma \\ -\sin\beta\cos\gamma \end{pmatrix}.
\end{aligned} \tag{20.301}$$

Similarly calculating $\mathrm{Ad}[g_{\alpha\beta\gamma}](e_2)$ and $\mathrm{Ad}[g_{\alpha\beta\gamma}](e_3)$ and combining the results with (20.301), we obtain

$$(e_1\ e_2\ e_3)\,\mathrm{Ad}[g_{\alpha\beta\gamma}] = (e_1\ e_2\ e_3)\begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}. \tag{20.302}$$

The matrix on the RHS of (20.302) is the same as (17.101), which contains the Euler angles. The calculations are somewhat lengthy but straightforward. As can be seen, e.g., in (20.294) and (20.302), we may symbolically express the above adjoint representation as [9]

$$\mathrm{Ad}: SU(2) \to SO(3) \tag{20.303}$$

in parallel to (20.289). Equation (20.303) shows the mapping between the different groups SU(2) and SO(3). From (20.302) we may identify $\mathrm{Ad}[g_{\alpha\beta\gamma}]$ with the SO(3) matrix (17.101) and write

Ad gαβγ ¼
cos α cos β cos γ - sin α sin γ - cos α cos β sin γ - sin α cos γ cos α sin β
sin α cos β cos γ þ cos α sin γ - sin α cos β sin γ þ cos α cos γ sin α sin β :
- sin β cos γ sin β sin γ cos β
ð20:304Þ

In the above discussion, we make several remarks. (i) g 2 SU(2) is described by


(2, 2) complex matrices, but Ad[SU(2)] is expressed by (3, 3) real orthogonal
matrices. Thus, the dimension of the matrices is different. (ii) Notice again that
(e1 e2 e3) of (20.302) is an orthonormal basis set of suð2Þ. Hence, (20.302) can be
viewed as the orthogonal transformation of (e1 e2 e3). That is, we may regard Ad
[SU(2)] as all the matrix representations of SO(3). From (12.64) and (20.302), we
write [9]

Ad½SU ð2Þ - SOð3Þ = 0 or Ad½SU ð2Þ = SOð3Þ:
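The adjoint representation can also be computed numerically from its definition; the following sketch (ours, with the basis as given in (20.290)) reproduces (20.295) and illustrates the two-to-one character Ad[g] = Ad[-g]:

```python
import numpy as np

# Orthonormal basis e1, e2, e3 of su(2), cf. (20.290)
r = 1/np.sqrt(2)
e = [r*np.array([[0, -1j], [-1j, 0]]),
     r*np.array([[0,  1 ], [-1,  0]], dtype=complex),
     r*np.array([[1j, 0 ], [0, -1j]])]

def Ad(g):
    """3x3 real matrix of X -> g X g^{-1} in the basis e1, e2, e3."""
    gi = np.linalg.inv(g)
    # matrix elements <e_i | g e_j g^{-1}> with the trace inner product (20.279)
    return np.array([[np.trace(e[i].conj().T @ g @ e[j] @ gi).real
                      for j in range(3)] for i in range(3)])

alpha = 0.8
g_a = np.array([[np.exp(1j*alpha/2), 0], [0, np.exp(-1j*alpha/2)]])
R = Ad(g_a)
Rz = np.array([[np.cos(alpha), -np.sin(alpha), 0],
               [np.sin(alpha),  np.cos(alpha), 0],
               [0, 0, 1]])
assert np.allclose(R, Rz)              # reproduces (20.295)
assert np.allclose(Ad(-g_a), R)        # Ad[g] = Ad[-g]: kernel {e, -e}
assert np.isclose(np.linalg.det(R), 1.0)   # Ad[g] lies in SO(3)
```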

Regarding the adjoint representation of G, we have the following important


theorem.
Theorem 20.7 The adjoint representation Ad[SU(2)] is a surjective homomorphism of SU(2) onto SO(3). The kernel F of the representation is $\{e, -e\} \subset SU(2)$. With any rotation $R \in SO(3)$, two elements $\pm g \in SU(2)$ satisfy $\mathrm{Ad}[g] = \mathrm{Ad}[-g] = R$.
Proof We have shown by (20.302) that the adjoint representation Ad[SU(2)] is a surjective mapping of SU(2) onto SO(3). Since Ad[g] is a representation, namely a homomorphism mapping from SU(2) to SO(3), we seek its kernel on the basis of Theorem 16.3 (Sect. 16.4). Let h be an element of the kernel of SU(2) such that

$$\mathrm{Ad}[h] = E \in SO(3), \tag{20.305}$$

where E is the identity transformation of SO(3).
Operating (20.305) on $e_3 \in \mathfrak{su}(2)$, we have the following expression:

$$\mathrm{Ad}[h](e_3) = he_3h^{-1} = E(e_3) = e_3 \quad \text{or} \quad he_3 = e_3h,$$

where $e_3$ is given by (20.290). Expressing h as $\begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}$, we get

$$\begin{pmatrix} ih_{11} & -ih_{12} \\ ih_{21} & -ih_{22} \end{pmatrix} = \begin{pmatrix} ih_{11} & ih_{12} \\ -ih_{21} & -ih_{22} \end{pmatrix}.$$

From the above relation, we must have $h_{12} = h_{21} = 0$. As $h \in SU(2)$, $\det h = 1$. Hence, for h we choose $h = \begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix}$, where $a \neq 0$. Moreover, considering $he_2 = e_2h$, we get

Fig. 20.6 Homomorphism


mapping Ad: SU(2) ⟶ SO
(2) (3)
(3) with a kernel F = {e, -
e}. E is the identity Ad
transformation of SO(3)

a-1
0
- a-1
a
0
= 0
-a 0
: ð20:306Þ

This implies that $a = a^{-1}$, or $a = \pm 1$. That is, $h = \pm e$, where e denotes the identity element of SU(2). (Note that we get no further conditions from $he_1 = e_1h$.) Thus, we obtain

$$F = \{e, -e\}. \tag{20.307}$$

In fact, with respect to $X \in \mathfrak{su}(2)$ we have

$$\mathrm{Ad}[\pm e](X) = (\pm e)X(\pm e)^{-1} = (\pm e)X(\pm e) = X. \tag{20.308}$$

Therefore, we certainly get

$$\mathrm{Ad}[\pm e] = E. \tag{20.309}$$

Meanwhile, suppose that with $g_1, g_2 \in SU(2)$ we have $\mathrm{Ad}[g_1] = \mathrm{Ad}[g_2]$. Then, we have

$$\mathrm{Ad}\left[g_1g_2^{-1}\right] = \mathrm{Ad}[g_1]\,\mathrm{Ad}\left[g_2^{-1}\right] = \mathrm{Ad}[g_2]\,\mathrm{Ad}\left[g_2^{-1}\right] = \mathrm{Ad}\left[g_2g_2^{-1}\right] = \mathrm{Ad}[e] = E. \tag{20.310}$$

Therefore, from (20.307) and (20.309) we get

$$g_1g_2^{-1} = \pm e \quad \text{or} \quad g_1 = \pm g_2. \tag{20.311}$$

Conversely, we have

$$\mathrm{Ad}[-g] = \mathrm{Ad}[-e]\,\mathrm{Ad}[g] = \mathrm{Ad}[g], \tag{20.312}$$

where with the first equality we used Theorem 20.6 and with the second equality we used (20.309). The relations (20.311) and (20.312) imply that with any $g \in SU(2)$ there are exactly two elements ±g that satisfy $\mathrm{Ad}[g] = \mathrm{Ad}[-g]$. These complete the proof.
In the above proof of Theorem 20.7, (20.309) is a simple, but important expres-
sion. In view of Theorem 16.4 (Homomorphism theorem), we summarize the above
discussion as follows: Let Ad be a homomorphism mapping such that

$$\mathrm{Ad}: SU(2) \longrightarrow SO(3) \tag{20.303}$$

with a kernel $F = \{e, -e\}$. The relation is schematically represented in Fig. 20.6. Notice the resemblance between Figs. 20.6 and 16.1a.
Correspondingly, from Theorem 16.4 of Sect. 16.4 we have an isomorphic mapping $\widetilde{\mathrm{Ad}}$ such that

$$\widetilde{\mathrm{Ad}}: SU(2)/F \longrightarrow SO(3), \tag{20.313}$$

where $F = \{e, -e\}$. Symbolically writing (20.313) as in Sect. 16.4, we have $SU(2)/F \cong SO(3)$. Notice that F is an invariant subgroup of SU(2). Using the general form (20.300) of SU(2) and the general form (20.302) of SO(3), we can readily show $\mathrm{Ad}[g_{\alpha\beta\gamma}] = \mathrm{Ad}[-g_{\alpha\beta\gamma}]$. That is, from (20.300) we have

$$g_{\alpha+2\pi,\,\beta+2\pi,\,\gamma+2\pi} = -g_{\alpha\beta\gamma}. \tag{20.314}$$

Both of the above two elements $g_{\alpha\beta\gamma}$ and $-g_{\alpha\beta\gamma}$ produce the identical orthogonal matrix given on the RHS of (20.302).
To associate the above results of abstract algebra with the tangible matrix form obtained earlier, we rewrite (20.54) by applying the simple trigonometric formula

$$\sin\beta = 2\sin\frac{\beta}{2}\cos\frac{\beta}{2}.$$

That is, we get

$$\begin{aligned}
D^{(j)}_{m'm}(\alpha,\beta,\gamma) &= \sum_{\mu}\frac{(-1)^{m'-m+\mu}\left[(j+m)!(j-m)!(j+m')!(j-m')!\right]^{1/2}}{(j+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(j-m'-\mu)!} \\
&\qquad\times e^{-i(\alpha m'+\gamma m)}\left(\cos\frac{\beta}{2}\right)^{2j+m-m'-2\mu}\left(\sin\frac{\beta}{2}\right)^{m'-m+2\mu} \\
&= \sum_{\mu}\frac{(-1)^{m'-m+\mu}\left[(j+m)!(j-m)!(j+m')!(j-m')!\right]^{1/2}}{(j+m-\mu)!\,\mu!\,(m'-m+\mu)!\,(j-m'-\mu)!}\,2^{m-m'-2\mu} \\
&\qquad\times e^{-i(\alpha m'+\gamma m)}(\sin\beta)^{m'-m+2\mu}\left(\cos\frac{\beta}{2}\right)^{2j+2(m-m'-2\mu)}.
\end{aligned} \tag{20.315}$$

From (20.315), we clearly see that (i) if j is zero or a positive integer, m and m′ are integers as well. Therefore, (20.315) is a periodic function of period 2π with respect to α, β, and γ. But (ii) if j is a half-odd-integer, so are m and m′. Then, (20.315) is a periodic function of period 4π. Note that in the case of (ii) 2j = 2n + 1 ($n = 0, 1, 2, \cdots$). Therefore, in (20.315) we have

$$\left(\cos\frac{\beta}{2}\right)^{2j+2(m-m'-2\mu)} = \left(\cos\frac{\beta}{2}\right)^{2(n+m-m'-2\mu)}\cdot\cos\frac{\beta}{2}. \tag{20.316}$$

As studied in Sect. 20.2.3, in the former case (i) the spherical surface harmonics span the representation space of $D^{(l)}$ ($l = 0, 1, 2, \cdots$). The simplest case of this is $D^{(0)} = 1$: a (1, 1) identity matrix, i.e., merely the number 1. The second simplest case was given by (20.74), which describes $D^{(1)}$. Meanwhile, the simplest case of (ii) is $D^{(1/2)}$, whose matrix representation was given by $D^{(1/2)}_{\alpha,\beta,\gamma}$ in (20.45) or by $g_{\alpha\beta\gamma}$ in (20.300). With $\pm g_{\alpha\beta\gamma}$, both $\mathrm{Ad}[\pm g_{\alpha\beta\gamma}]$ produce the real orthogonal (3, 3) matrix expressed as (20.302). This representation matrix naturally belongs to SO(3).
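The 2π sign flip for half-odd j can be checked symbolically with SymPy's Wigner-D function; a sketch (we assume SymPy's z-y-z Euler-angle convention, though the periodicity in α is convention-independent):

```python
from sympy import Rational, pi, symbols, simplify
from sympy.physics.quantum.spin import Rotation

alpha, beta, gamma = symbols('alpha beta gamma', real=True)
j = Rational(1, 2)
D  = Rotation.D(j, j, j, alpha, beta, gamma).doit()
D2 = Rotation.D(j, j, j, alpha + 2*pi, beta, gamma).doit()
D4 = Rotation.D(j, j, j, alpha + 4*pi, beta, gamma).doit()
assert simplify(D2 + D) == 0   # a 2*pi shift flips the sign for half-odd j
assert simplify(D4 - D) == 0   # the true period is 4*pi
```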

20.5 Connectedness of Lie Groups

As already discussed in Chap. 6, connectedness, which reflects topological characteristics, is an important concept for Lie groups. If we think of the analytic functions on the complex plane ℂ, the connectedness is easily envisaged geometrically. To deal with the connectedness in general Lie groups, however, we need abstract concepts. In this section we take a down-to-earth approach as much as possible.

20.5.1 Several Definitions and Examples

To discuss this topic, let us give several definitions and examples.


Definition 20.5 Suppose that A is a subset of GL(n, ℂ) and that we have any pair of elements $a, b \in A \subset GL(n, \mathbb{C})$. Meanwhile, let f(t) be a continuous function defined on the interval 0 ≤ t ≤ 1 that takes values within A; that is, $f(t) \in A$ for all t. If f(0) = a and f(1) = b, then a and b are said to be connected within A.
Major parts of the discussion that follows are based mostly upon literature [9]. A
simple example for the above is given below.
Example 20.6 Let $a = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $b = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ be elements of SO(2). Let f(t) be described by

$$f(t) = \begin{pmatrix} \cos\pi t & -\sin\pi t \\ \sin\pi t & \cos\pi t \end{pmatrix}.$$

Then, f(t) is a continuous matrix function for all real numbers t. We have f(0) = a and f(1) = b and, hence, a and b are connected to each other within SO(2).
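A sketch (ours) tracing this path numerically and checking that it stays inside SO(2) along the way:

```python
import numpy as np

def f(t):   # the continuous path of Example 20.6
    return np.array([[np.cos(np.pi*t), -np.sin(np.pi*t)],
                     [np.sin(np.pi*t),  np.cos(np.pi*t)]])

a, b = np.eye(2), -np.eye(2)
assert np.allclose(f(0.0), a) and np.allclose(f(1.0), b)
for t in np.linspace(0, 1, 11):     # every intermediate point lies in SO(2)
    Ft = f(t)
    assert np.allclose(Ft @ Ft.T, np.eye(2)) and np.isclose(np.linalg.det(Ft), 1)
```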

For a and b to be connected within A is denoted by a ~ b. The symbol ~ then satisfies the equivalence relation. That is, we have the following proposition.

Proposition 20.2 Let $a, b, c \in A \subset GL(n, \mathbb{C})$. Then, we have the following three relations: (i) a ~ a. (ii) If a ~ b, then b ~ a. (iii) If a ~ b and b ~ c, then a ~ c.
Proof (i) Let f(t) = a (0 ≤ t ≤ 1). Then, f(0) = f(1) = a so that we have a ~ a. (ii) Let f(t) be a continuous curve that connects a and b such that f(0) = a and f(1) = b (0 ≤ t ≤ 1), and so a ~ b. Then $g(t) \equiv f(1-t)$ is also a continuous curve within A, with g(0) = b and g(1) = a. This implies b ~ a. (iii) Let f(t) and g(t) be two continuous curves that connect a with b and b with c, respectively. Meanwhile, we define h(t) as

$$h(t) = \begin{cases} f(2t) & (0 \leq t \leq 1/2), \\ g(2t-1) & (1/2 \leq t \leq 1), \end{cases}$$

so that h(t) is a third continuous curve. Then, h(1/2) = f(1) = g(0). From the supposition we have h(0) = f(0) = a, h(1/2) = f(1) = g(0) = b, and h(1) = g(1) = c. This implies that if a ~ b and b ~ c, then we have a ~ c.
Let us give another related definition.

Definition 20.6 Let A be a subset of GL(n, ℂ). Let a point $a \in A$. Then, the collection of points that are connected to a within A is said to be the connected component of a and is denoted by C(a).
From Proposition 20.2 (i), $a \in C(a)$. Let C(a) and C(b) be two connected components. We have two alternatives with C(a) and C(b): (I) C(a) ∩ C(b) = ∅; (II) C(a) = C(b). Indeed, suppose $c \in C(a) \cap C(b)$. Then, it follows that c ~ a and c ~ b. From Proposition 20.2 (ii) and (iii), we have a ~ b. Hence, we get C(a) = C(b). Thus, the subset A is a direct sum of several connected components such that

$$A = C(a) \cup C(b) \cup \cdots \cup C(z) \quad \text{with} \quad C(p) \cap C(q) = \emptyset\ (p \neq q), \tag{20.317}$$

where p, q are taken from among a, b, ⋯, z. In particular, the connected component


containing e (i.e., the identity element) is of major importance.
Definition 20.7 Let A be a subset of GL(n, ℂ). If any pair of elements $a, b\ (\in A)$ is connected within A, then A is connected, or A is called a connected set.
A connected set is nothing but a set that consists of a single connected component. With a connected set, we have the following important theorem.
Theorem 20.8 An image of a connected set A obtained by a continuous mapping
f is a connected set as well.
Proof The proof is almost self-evident. Let f: A ⟶ B be a continuous mapping from a connected set A to B, and suppose f(A) = B. Take any $b, b' \in B$; then there exist $a, a' \in A$ that satisfy f(a) = b and f(a′) = b′. From Definition 20.5 there is a continuous function g(t) (0 ≤ t ≤ 1) that satisfies g(0) = a and g(1) = a′.
Now, define a function h(t) ≡ f[g(t)]. Then, h(t) is a continuous mapping with h(0) = f[g(0)] = f(a) = b and h(1) = f[g(1)] = f(a′) = b′. Again from Definition 20.5, h(t) is a continuous curve that connects b and b′ within B. Consequently, from Definition 20.7, B is a connected set as well. This completes the proof.
Applications of Theorem 20.8 are given below as examples.
Example 20.7 Let f(x) be a real continuous function defined on a subset A ⊂ GL(n, ℝ), where $x \in A$ with A being a connected set. Suppose that for a pair of $a, b \in A$, f(a) = p and f(b) = q (p, q: real) with p < q. Then, f(x) takes as its value every real number on the interval [p, q]. In other words, we have f(A) ⊃ [p, q]. This is well known as the intermediate value theorem.

Example 20.8 [9] Suppose that A is a connected subset of GL(n, ℝ). Then, with any pair of elements $a, b \in A \subset GL(n, \mathbb{R})$, we must have a continuous function g(t) (0 ≤ t ≤ 1) such that g(0) = a and g(1) = b. Meanwhile, suppose that the determinant is defined for every element of A as a real continuous function det g(t).
Now, suppose that det a = p < 0 and det b = q > 0 with p < q. Then, from Theorem 20.8 we would have det(A) ⊃ [p, q]. Consequently, we would have some $x \in A$ such that det(x) = 0. But this contradicts A ⊂ GL(n, ℝ), because every element of GL(n, ℝ) has a nonzero determinant. Thus, within a connected set the sign of the determinant must be constant, provided the determinant is defined as a real continuous function.
In relation to the discussion of the connectedness, we have the following general
theorem.
Theorem 20.9 [9] Let $G_0$ be the connected component within a linear Lie group G that contains the identity element e. Then, $G_0$ is an invariant subgroup of G. A connected component C(g) containing $g \in G$ is identical with $gG_0 = G_0g$.

Proof Let $a, b \in G_0$. Since $G_0 = C(e)$, from the supposition there are continuous curves f(t) and g(t) within $G_0$ that connect a with e and b with e, respectively, such that f(0) = a, f(1) = e and g(0) = b, g(1) = e. Meanwhile, let $h(t) \equiv f(t)[g(t)]^{-1}$. Then, h(t) is another continuous curve. We have $h(0) = f(0)[g(0)]^{-1} = ab^{-1}$ and $h(1) = f(1)[g(1)]^{-1} = e\cdot e^{-1} = e$. That is, $ab^{-1}$ and e are connected, which indicates $ab^{-1} \in G_0$. This implies that $G_0$ is a subgroup of G.
Next, with any $g \in G$ and any $a \in G_0$ we have f(t) in the same sense as above. Defining k(t) as $k(t) = g f(t) g^{-1}$, k(t) is a continuous curve within G with $k(0) = gag^{-1}$ and $k(1) = geg^{-1} = e$. Hence, we have $gag^{-1} \in G_0$. Since a is an arbitrary point of $G_0$, we have $gG_0g^{-1} \subset G_0$ accordingly. Similarly, g is an arbitrary point of G, and so replacing g with $g^{-1}$ in the above, we get $g^{-1}G_0g \subset G_0$. Operating with g and $g^{-1}$ from the left and right, respectively, we obtain $G_0 \subset gG_0g^{-1}$. Combining this with the above

relation $gG_0g^{-1} \subset G_0$, we obtain $gG_0g^{-1} = G_0$ or $gG_0 = G_0g$. This means that $G_0$ is an invariant subgroup of G; see Sect. 16.3.
Taking any $a \in G_0$ and f(t) in the same sense as above again, $l(t) = g f(t)$ is a continuous curve within G with l(0) = ga and l(1) = ge = g. Then, ga and g are connected within G. That is, $ga \in C(g)$. Since a is an arbitrary point of $G_0$, we have $gG_0 \subset C(g)$. Meanwhile, choosing any $p \in C(g)$, we set a continuous curve m(t) (0 ≤ t ≤ 1) that connects p with g within G. This implies that m(0) = p and m(1) = g.
Also setting a continuous curve $n(t) = g^{-1}m(t)$, we have $n(0) = g^{-1}m(0) = g^{-1}p$ and $n(1) = g^{-1}m(1) = g^{-1}g = e$. This means that $g^{-1}p \in G_0$. Multiplying by g on both sides, we get $p \in gG_0$. Since p is arbitrarily chosen from C(g), we get $C(g) \subset gG_0$. Combining this with the above relation $gG_0 \subset C(g)$, we obtain $C(g) = gG_0 = G_0g$. These complete the proof.
In the above proof, we add the following statement. In Sect. 16.2, as the necessary and sufficient conditions for a subset to be a subgroup, we described: (1) $h_i, h_j \in H \Longrightarrow h_i \diamond h_j \in H$; (2) $h \in H \Longrightarrow h^{-1} \in H$. Conversely, start from the combined condition $h_i \diamond h_j^{-1} \in H$: putting $h_j = h_i$ gives $h_i \diamond h_i^{-1} = e \in H$; putting $h_i = e$ gives $e \diamond h_j^{-1} = h_j^{-1} \in H$; and replacing $h_j$ with $h_j^{-1}$ gives $h_i \diamond (h_j^{-1})^{-1} = h_i \diamond h_j \in H$. Thus, H satisfies the axioms (A1), (A3), and (A4) of Sect. 16.1 and, hence, forms a (sub)group. In other words, the conditions (1) and (2) are combined into

$$h_i \in H,\ h_j \in H \Longrightarrow h_i \diamond h_j^{-1} \in H.$$
The above general theorem can immediately be applied to an important class of


Lie groups SO(n). We discuss this topic below.

20.5.2 O(3) and SO(3)

In Chap. 17 we dealt with finite groups related to O(3) and SO(3), i.e., their
subgroups. In this section, we examine several important properties in terms of Lie
groups.
The groups O(3) and SO(3) are characterized by their determinant detO(3) = ± 1
and detSO(3) = 1. Example 20.8 implies that in O(3) there should be two different
connected components according to detO(3) = ± 1. Also Theorem 20.9 tells us that
a connected component C(E) is an invariant subgroup G0 in O(3), where E is the
identity element of O(3). Obviously, $G_0$ is SO(3). In turn, another connected component is SO(3)ᶜ, where SO(3)ᶜ denotes the complementary set of SO(3); for this notation, see Sect. 6.1. Remembering (20.317), we have

Oð3Þ = SOð3Þ [ SOð3Þc : ð20:318Þ

This can be rewritten as a direct sum such that

Oð3Þ = C ðEÞ [ Cð- E Þ with C ðEÞ \ C ð- EÞ = ∅, ð20:319Þ

-1 0 0
where E denotes a (3, 3) unit matrix and - E = 0 -1 0 . In the above,
0 0 -1
SO(3) is identical to C(E) with SO(3) itself being a connected set; SO(3)c is identical
with C(-E) that is another connected set.
Another description of O(3) is given by [13]

Oð3Þ = SOð3Þ  fE, - Eg, ð20:320Þ

where the symbol  denotes a direct-product group (Sect. 16.5). An alternative


description for this is

Oð3Þ = SOð3Þ [ fð- EÞSOð3Þg, ð20:321Þ

where the direct sum is implied.


In Sect. 20.2.6, we discussed two rotations around different rotation axes with the same rotation angle. These rotations belong to the same conjugacy class; see (20.140). With Q as another rotation that transforms the rotation axis of $R_\omega$ into that of $R'_\omega$, the two rotations $R_\omega$ and $R'_\omega$ having the same rotation angle ω are associated with each other through the following relation:

$$R'_\omega = QR_\omega Q^{-1}. \tag{20.140}$$

Notice that this relation is independent of the choice of a specific coordinate system. In Fig. 20.4 let us choose the z-axis as the rotation axis A with respect to the rotation $R_\omega$. Then, the rotation matrix $R_\omega$ is expressed in reference to the Cartesian coordinate system as

$$R_\omega = \begin{pmatrix} \cos\omega & -\sin\omega & 0 \\ \sin\omega & \cos\omega & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.322}$$

Meanwhile, multiplying both sides of (20.140) by -E, we have

$$-R'_\omega = Q(-R_\omega)Q^{-1}. \tag{20.323}$$

Note that -E is commutative with any Q (or $Q^{-1}$). Also, notice that from (20.323) $-R'_\omega$ and $-R_\omega$ are again connected via a unitary similarity transformation. From (20.322), we have

$$-R_\omega = \begin{pmatrix} -\cos\omega & \sin\omega & 0 \\ -\sin\omega & -\cos\omega & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} \cos(\omega+\pi) & -\sin(\omega+\pi) & 0 \\ \sin(\omega+\pi) & \cos(\omega+\pi) & 0 \\ 0 & 0 & -1 \end{pmatrix}, \tag{20.324}$$

where $-R_\omega$ represents an improper rotation of an angle ω + π (see Sect. 17.1).
Referring to Fig. 17.12 and Table 17.6 (Sect. 17.3), as another tangible example we further get

$$\begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{20.325}$$

where the first factor of the LHS belongs to O (the octahedral rotation group) and the RHS belongs to $T_d$ (the tetrahedral group), giving a mirror symmetry. Combining (20.324) and (20.325), we get the following schematic representation:

$$(\text{Rotation}) \times (\text{Inversion}) = (\text{Improper rotation})\ \text{or}\ (\text{Mirror symmetry}).$$

This is a finite group version of (20.320).

20.5.3 Simply Connected Lie Groups: Local Properties and Global Properties

As a final topic of Lie groups, we outline the characteristic of simply connected


groups.
In (20.318), (20.319), and (20.321) we showed three different decompositions of O(3). This can be generalized to O(n). That is, instead of -E we may take F described by [9]

$$F = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & -1 \end{pmatrix}, \tag{20.326}$$

where det F = -1. Then, we have

$$O(n) = SO(n) \cup SO(n)^{c} = C(E) \cup C(F) = SO(n) + F\,SO(n), \tag{20.327}$$

where the last side represents the coset decomposition (Sect. 16.2). This is essentially the same as (20.321). In terms of the isomorphism, we have

$$O(n)/SO(n) \cong \{E, F\}, \tag{20.328}$$

where {E, F} is a subgroup of O(n).
If in particular n is odd (n ≥ 3), we can choose -E instead of F, because det(-E) = -1. Then, similarly to (20.320) we have

$$O(n) = SO(n) \otimes \{E, -E\} \quad (n:\ \text{odd}), \tag{20.329}$$

where E is the (n, n) identity matrix. Note that since -E is commutative with any elements of O(n), the direct product of (20.329) is allowed.
Of the connected components, the one containing the identity element is of particular importance. This is because, as already mentioned in Sect. 20.4.1, the "initial" condition of the one-parameter group A(t) is set at t = 0 such that

$$A(0) = E. \tag{20.238}$$

Under this condition, it is important to examine how A(t) evolves with t, starting
from E at t = 0. In this connection we have the following theorems that give good
grounds to further discussion of Sect. 20.2. We describe them without proof.
Interested readers are encouraged to look up literature [9].
Theorem 20.10 Let G be a linear Lie group. Let $G_0$ be the connected component of the identity element $e \in G$. Then, $G_0$ is itself a linear Lie group, and the Lie algebras of G and $G_0$ are identical.

Theorem 20.10 shows the significance of the connected component of the identity. From this theorem the Lie algebras of, e.g., O(n) and SO(n) are identical; namely, both $\mathfrak{o}(n)$ and $\mathfrak{so}(n)$ are given by the real skew-symmetric matrices. This is inherently related to the properties of Lie algebras. The zero matrix (i.e., the zero vector) as an element of a Lie algebra corresponds to the identity element of the Lie group. This is obvious from the relation

$$e = \exp 0,$$

where e is the identity element of the Lie group and 0 represents the zero matrix. Basically, Lie algebras are suited for describing local properties associated with infinitesimal transformations around the identity element e. More specifically, the exponential functions of the real skew-symmetric matrices cannot produce -E. Considering that the connected component of O(3) containing the identity element is SO(3), it is intuitively understandable that $\mathfrak{o}(n)$ and $\mathfrak{so}(n)$ are the same.
Another interesting point is that SO(3) is at once an open set and a closed set (i.e., a clopen set; see Sect. 6.1). This is relevant to Theorem 6.3, which says that a necessary and sufficient condition for a subset S of a topological space to be both open and closed at once is $S^{b} = \emptyset$. Since the determinant of any element of O(3) is either +1 or -1, it is natural that SO(3) has no boundary; if there were a boundary, the determinant there would be zero, in contradiction to SO(3) ⊂ GL(n, ℝ); see Example 20.8.

Theorem 20.11 Let G be a connected linear Lie group. (The Lie group G may be $G_0$ in the sense of Theorem 20.10.) Then, any $g \in G$ can be described by

$$g = (\exp t_1X_1)(\exp t_2X_2)\cdots(\exp t_dX_d),$$

where $X_i$ ($i = 1, 2, \cdots, d$) is a basis set of the Lie algebra corresponding to G and $t_i$ ($i = 1, 2, \cdots, d$) are suitably chosen real numbers.
Originally, the notion of the Lie algebra was introduced to deal with infinitesimal transformations near the identity element. Yet, Theorem 20.11 says that any transformation (or element) of a connected linear Lie group can be described by a finite number of elements of the corresponding Lie algebra. That is, the Lie algebra determines the global characteristics of the connected Lie group.
Strengthening the condition of connectedness, we need to deal with simply
connected groups. In this context we have the following definitions.
Definition 20.8 Let G be a linear Lie group and let I = [0, 1] be an interval. Let f be a continuous function such that

$$f: I \longrightarrow G \quad \text{or} \quad f(I) \subset G. \tag{20.330}$$

Then, the function f is called a path, in which f(0) and f(1) are the initial point and the end point, respectively. If $f(0) = f(1) = x_0$, f is called a loop (i.e., a closed path) at $x_0$ [14]. If $f(t) \equiv x_0$ (0 ≤ t ≤ 1), f is said to be a constant loop.
Definition 20.9 Let f and g be two paths. If f and g can continuously be deformed
from one to the other, they are said to be homotopic.
To be more specific with Definition 20.9, let us define a function h(s, t) that is
continuous with respect to s and t in a region I × I = {(s, t); 0 ≤ s ≤ 1, 0 ≤ t ≤ 1}. If
furthermore h(0, t) = f(t) and h(1, t) = g(t) hold, f and g are homotopic [9].
Definition 20.10 Let G be a connected Lie group. If all the loops at an arbitrarily chosen point $x \in G$ are homotopic to a constant loop, G is said to be simply connected.

This may seem an abstract concept. Rephrased in a word, for G to be simply connected means that loops at any $x \in G$ can continuously be contracted to that point x. The next example helps one understand the meaning of being simply connected.
Example 20.9 Let $S^2$ be the spherical surface in $\mathbb{R}^3$ such that

$$S^2 = \left\{x \in \mathbb{R}^3;\ \langle x \mid x\rangle = 1\right\}. \tag{20.331}$$

Any x is equivalent by virtue of the spherical symmetry of $S^2$. Let us think of any loop at x. This loop can be contracted to that point x. Hence, $S^2$ is simply connected.
By the same token, a spherical hypersurface (or hypersphere) given by $S^n = \{x \in \mathbb{R}^{n+1};\ \langle x \mid x\rangle = 1\}$ is simply connected. Thus, from (20.36) SU(2) is simply connected. Note that the parameter space of SU(2) is $S^3$.
Example 20.10 In Sect. 20.5.2 we showed that SO(3) is a connected set. But it is not simply connected. To see this, we return to Sect. 17.4.2, which dealt with the three-dimensional rotation matrices. Rewriting $\tilde{R}_3$ in (17.101) as $\tilde{R}_3(\alpha, \beta, \gamma)$, we have

$$\tilde{R}_3(\alpha, \beta, \gamma) = \begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}. \tag{20.332}$$

Note that in (20.332) we represent the rotation in the moving coordinate system. Putting β = 0 in (20.332), we get

$$\tilde{R}_3(\alpha, 0, \gamma) = \begin{pmatrix} \cos\alpha\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\sin\gamma - \sin\alpha\cos\gamma & 0 \\ \sin\alpha\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\sin\gamma + \cos\alpha\cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos(\alpha+\gamma) & -\sin(\alpha+\gamma) & 0 \\ \sin(\alpha+\gamma) & \cos(\alpha+\gamma) & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{20.333}$$

If in (20.333) we replace α with $\alpha + \phi_0$ and γ with $\gamma - \phi_0$, we obtain the same result as (20.333). That is, different sets of parameters give the same result (i.e., the same rotation) such that

$$(\alpha, 0, \gamma) \longleftrightarrow (\alpha + \phi_0, 0, \gamma - \phi_0). \tag{20.334}$$

Putting β = π in (20.332), we get

$$\tilde{R}_3(\alpha, \pi, \gamma) = \begin{pmatrix} -\cos(\alpha-\gamma) & -\sin(\alpha-\gamma) & 0 \\ -\sin(\alpha-\gamma) & \cos(\alpha-\gamma) & 0 \\ 0 & 0 & -1 \end{pmatrix}. \tag{20.335}$$

If, in turn, we replace α with $\alpha + \phi_0$ and γ with $\gamma + \phi_0$ in (20.335), we obtain the same result as (20.335). Again, different sets of parameters give the same result (i.e., the same rotation) such that

$$(\alpha, \pi, \gamma) \longleftrightarrow (\alpha + \phi_0, \pi, \gamma + \phi_0). \tag{20.336}$$

Meanwhile, $R_3$ of (17.107) is expressed as

$$R_3 = \begin{pmatrix} (\cos^2\alpha\cos^2\beta + \sin^2\alpha)\cos\gamma + \cos^2\alpha\sin^2\beta & \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) - \cos\beta\sin\gamma & \cos\alpha\cos\beta\sin\beta(1-\cos\gamma) + \sin\alpha\sin\beta\sin\gamma \\ \cos\alpha\sin\alpha\sin^2\beta(1-\cos\gamma) + \cos\beta\sin\gamma & (\sin^2\alpha\cos^2\beta + \cos^2\alpha)\cos\gamma + \sin^2\alpha\sin^2\beta & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) - \cos\alpha\sin\beta\sin\gamma \\ \cos\beta\sin\beta\cos\alpha(1-\cos\gamma) - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta\sin\beta(1-\cos\gamma) + \cos\alpha\sin\beta\sin\gamma & \sin^2\beta\cos\gamma + \cos^2\beta \end{pmatrix}. \tag{20.337}$$

Note that in (20.337) we represent the rotation in the fixed coordinate system. In this case, again we have arbitrariness in the choice of parameters. Figure 20.7 shows the geometry of the rotation axis (A) accompanied by a rotation γ; see Fig. 17.18 once again. In this parameter space specifying SO(3), the rotation is characterized by the azimuthal angle α, the zenithal angle β, and the magnitude of rotation γ. The domains of variability of the parameters (α, β, γ) are

$$0 \leq \alpha \leq 2\pi, \quad 0 \leq \beta \leq \pi, \quad 0 \leq \gamma \leq 2\pi.$$

Figure 20.7a shows the geometry viewed in parallel to the equator; it includes the zenithal angle β as well as a point P (i.e., the point of intersection between the rotation axis and the sphere) and its antipodal point $\tilde{P}$. If β is measured at $\tilde{P}$, β and γ should be replaced with π - β and 2π - γ, respectively. Meanwhile, Fig. 20.7b, c depict the azimuthal angle α viewed from above the north pole (N). If the azimuthal angle α is measured at the antipodal point $\tilde{P}$, α should be replaced with either α + π (in the case of 0 ≤ α ≤ π; see Fig. 20.7b) or α - π (in the case of π < α ≤ 2π; see Fig. 20.7c).
Consequently, we have the following correspondence:

ðα, β, γ Þ $ ðα ± π, π - β, 2π - γ Þ: ð20:338Þ

Thus, we confirm once again that the different sets of parameters give the same
rotation. In other words, two different points in the parameter space correspond to
the same transformation (i.e., rotation). In fact, using these different sets of param-
eters, the matrix form of (20.337) is held unchanged. The conformation is left for
readers.
Because of the aforementioned characteristic of the parameter space, a loop
cannot be contracted to a single point in SO(3). For this reason, SO(3) is not simply
connected.
In contrast to Example 20.10, the mapping from the parameter space S3 to
SU(2) is bijective in the case of SU(2). That is, any two different points of S3 give
different transformations on SU(2) (i.e., injective). For any element of SU(2) it has a
corresponding point in S3 (i.e., surjective). This can be rephrased such that SU(2) and
S3 are isomorphic.
In this context, let us give important concepts. Let (T, τ) and (S, σ) be two
topological spaces. Suppose that f : (T, τ)  (S, σ) is a bijective mapping. If in this
20.5 Connectedness of Lie Groups 925

Fig. 20.7 Geometry of the (a)


rotation axis (A)
accompanied by a rotation γ.
(a) Geometry viewed in N
parallel to the equator. (b),
(c) Geometry viewed from
above the north pole (N). If
the azimuthal angle α is
measured at the antipodal − Equator
~ α should be
point P,
replaced with (b) α + π
(in the case of 0 ≤ α ≤ π) or
(c) α - π (in the case of π <
α ≤ 2π)
2 −
S

(b)

(c)

N
926 20 Theory of Continuous Groups

case, furthermore, both f and f -1 are continuous mappings, (T, τ) and (S, σ) are said
to be homeomorphic [14]. In our present case, both SU(2) and S3 are homeomorphic.
Apart from being simply connected or not, SU(2) and SO(3) resemble in many
aspects. The resemblance has already shown up in Chap. 3 when we considered the
(orbital) angular momentum and generalized angular momentum. As shown in
(3.30) and (3.69), the commutation relation was virtually the same. It is obvious
when we compare (20.271) and (20.275). That is, in terms of Lie algebras their
structure constants are the same and, eventually, the structure of corresponding Lie
groups is related. This became well understood when we considered the adjoint
representation of the Lie group G on its Lie algebra g. Of the groups whose Lie
algebras are characterized by the same structure constants, the simply connected
group is said to be a universal covering group. For example, among the Lie groups
O(3), SO(3), and SU(2), only SU(2) is simply connected and, hence, is a universal
covering group.
The understanding of the Lie algebra is indispensable to explicitly describing the
elements of the Lie group, especially the connected components of the identity
element e 2 G. Then, we are able to understand the global characteristics of the
Lie group. In this chapter, we have focused on SU(2) and SO(3) as a typical example
of linear Lie groups.
Finally, though formal, we describe a definition of a topological group.
Definition 20.11 [15] Let G be a set. If G satisfies the following conditions, G is
called a topological group.
1. The set G is a group.
2. The set G is a T1-space.
3. Group operations (or calculations) of G are continuous. That is, let P and Q be
two mappings such that

P : G × G → G; Pðg, hÞ = gh ðg, h 2 GÞ, ð20:339Þ


Q : G → G; QðgÞ = g - 1 ðg 2 GÞ: ð20:340Þ

Then, both the mappings P and Q are continuous.


By Definition 20.11, a topological group combines the structures of group and
topological space. Usually, the continuous group is a collective term of Lie groups
and topological groups. The continuous groups are very frequently dealt with in
various fields of natural science and mathematical physics.

References

1. Inui T, Tanabe Y, Onodera Y (1990) Group theory and its applications in physics. Springer,
Berlin
References 927

2. Inui T, Tanabe Y, Onodera Y (1980) Group theory and its applications in physics, Expanded
edn. Shokabo, Tokyo. (in Japanese)
3. Takagi T (2010) Introduction to analysis, Standard edn. Iwanami, Tokyo. (in Japanese)
4. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn.
Academic Press, Waltham
5. Hamermesh M (1989) Group theory and its application to physical problems. Dover, New York
6. Rose ME (1995) Elementary theory of angular momentum. Dover, New York
7. Hassani S (2006) Mathematical physics. Springer, New York
8. Edmonds AR (1957) Angular momentum in quantum mechanics. Princeton University Press,
Princeton, NJ
9. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo.
(in Japanese)
10. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
11. Satake I-O (1975) Linear algebra (pure and applied mathematics). Marcel Dekker, New York
12. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
13. Steeb W-H (2007) Continuous symmetries, Lie algebras, differential equations and computer
algebra, 2nd edn. World Scientific, Singapore
14. McCarty G (1988) Topology. Dover, New York
15. Murakami S (2004) Foundation of continuous groups, Revised edn. Asakura Publishing,
Tokyo. (in Japanese)
Part V
Introduction to the Quantum Theory of
Fields

In Part I we described the fundamental feature of quantum mechanics. The theory


based upon the Schrödinger equation was inconsistent with the theory of relativity,
however. To make the theory consistent with the special theory of relativity, Dirac
developed his theory that led to the discovery of the Dirac equation, named after
himself. The success by Dirac was followed by the development of the quantum
theory of fields.
In this part, we study the basic properties and constitution of the Dirac equation.
We examine various properties of the plane wave solutions of the Dirac equation.
The quantum theory of fields was first established as quantum electrodynamics
(QED), in which the Dirac field and electromagnetic field (or photon field) play
the leading role. With the electromagnetic field, we highlight the coexistence of
classical theory and quantum theory. Before the quantization of the Dirac field and
photon field, however, we start with the canonical quantization of the scalar field
(or the Klein–Gordon field) because of simplicity and accessibility as well as the
wide applicability to other fields. After studying the quantization of fields, we
approach several features of the interaction between the fields, with emphasis upon
the practical aspect. The interactions between the fields are presented as nonlinear
field equations. Since it is difficult to obtain analytic solutions of the nonlinear
equation, suitable approximation must be done. The method for this is well
established as the perturbation theory that makes the most of the S-matrix. In this
book, we approach the problem by calculating the interaction process in lowest order
of the perturbation theory. As a tangible example, we take up Compton scattering
and study its characteristics in detail.
In the latter half, we examine basic formalism of the Lorentz group from the point
of view of the Lie groups and Lie algebras. In particular, we explore in detail the
Lorentz transformation properties of the Dirac equation and its plane wave solutions.
We stress the importance and practicability of the matrix representation of the Dirac
operators and Dirac spinors with respect to the Dirac equation. We study their
transformation properties in terms of the matrix algebra that is connected to the
Lorentz group. The Dirac operators associated with the Lorentz transformation are
somewhat intractable, however, since the operators are not represented by normal
930 Part V Introduction to the Quantum Theory of Fields

matrices (e.g., Hermitian matrix or unitary matrix) because of the presence of the
Lorentz boost. We examine how these issues are dealt with within the framework of
the standard matrix theory. This last part of the book deals with advanced topics of
Lie algebra, with its differential representation stressed. Since the theory of vector
spaces underlies all parts of the final part, several extended concepts of vector spaces
are explained in relation to Part III.
Historically speaking, the Schrödinger equation has obtained citizenship across a
wide range of scientific community including physics, chemistry, materials science,
and even biology. As for the Dirac equation, on the other hand, it seems to have
found the place only in theoretical physics, especially the quantum theory of fields.
Nonetheless, in recent years the Dirac equation often shows up as a fundamental
equation in, e.g., solid-state physics. Under these circumstances, we wish to inves-
tigate the constitution and properties of the Dirac equation as well as the Dirac
operators and spinors on a practical level throughout the whole of this part.
Chapter 21
The Dirac Equation

In this chapter we describe the feature of the Dirac equation including its historical
background. The Dirac equation is a quantum theory of electron that deals with the
electron dynamics within the framework of special theory of relativity. Before the
establishment of the theory achieved by Dirac, several attempts to develop the
relativistic quantum-mechanical theory of electron were made mainly on the basis
of the Klein-Gordon equation. The theory, however, had a serious problem in terms
of the probability density. Dirac proposed the Dirac equation, named after himself, to
dispel the problem. Most importantly, the Dirac equation opened up a novel route to
the quantum theory of fields. The quantum theory of fields has first been established
as the quantum electrodynamics (QED) that deals with electron and photons as
quantized fields. The theory has later become the foundation of the advanced
quantum field theory based on the gauge theory. In this chapter we study and pursue
the framework of the Dirac equation with emphasis upon the plane wave solutions to
make it a clear starting point for gaining a good understanding of QED. The structure
of the Dirac equation can be well-understood using the polar coordinate representa-
tion. The results will be used in Chap. 24 as well to further develop the theory. The
Dirac equation is originally “relativistic.” To help understand the nature of the Dirac
equation, we add several remarks on the special theory of relativity at the beginning
of the chapter accordingly.

21.1 Historical Background

The success in establishing the atomic-scale mechanics by the Schrödinger equation


prompted many researchers to pursue relativistic formulation of the quantum
mechanics. As a first approach for this, the Klein-Gordon equation (occasionally
referred to as Klein-Gordon-Fock equation) was proposed [1, 2]. It is expressed as

© Springer Nature Singapore Pte Ltd. 2023 931


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_21
932 21 The Dirac Equation

2
1 ∂ mc 2
-—2 þ ϕðt, xÞ = 0, ð21:1Þ
c2 ∂t 2 ħ

where the operator — 2 has already appeared in (1.24) and (7.30). More succinctly we
have

mc 2
□þ ϕðt, xÞ = 0,
ħ

where □ is d’Alembertian (or d’Alembert operator) defined by

2
1 ∂
□ - — 2:
c2 ∂t 2

The Klein-Gordon equation is a relativistic wave equation for a quantum-


mechanical free particle. This equation is directly associated with the equivalence
theorem of mass and energy. In other words, it is based on the following relationship
among energy, momentum, and mass of the particle:

E 2 = p 2 c2 þ m 2 c4 , ð21:2Þ

where E, p, and m are energy, momentum, and mass of the particle, respectively; see
(1.20) and (1.21). Equation (21.1) can be obtained using the correspondence
between momentum or energy and differential operators expressed in (1.31) or
(1.40).
Nonetheless, the Klein-Gordon equation included a fundamental difficulty in
explaining a probability density for the presence of particle. The argument is as
follows: The Schrödinger equation is described as

∂ψ ħ2 2
iħ = - — þ V ψ: ð1:45Þ
∂t 2m

Multiplying both sides of (1.45) by ψ  from the left, we have

∂ψ ħ2  2
iħψ  =- ψ — ψ þ Vψ  ψ: ð21:3Þ
∂t 2m

Taking complex conjugate of (1.45) and multiplying ψ from the right, we have

∂ψ  ħ2
- iħ ψ= - — 2 ψ  ψ þ Vψ  ψ: ð21:4Þ
∂t 2m

Subtracting (21.4) from (21.3), we get


21.1 Historical Background 933

∂ψ ∂ψ  ħ2
iħ ψ  þ ψ = — 2ψ  ψ - ψ — 2ψ : ð21:5Þ
∂t ∂t 2m

Further rewriting (21.5), we have

∂  ħ
ðψ ψ Þ þ ψ  — 2 ψ - — 2 ψ  ψ = 0: ð21:6Þ
∂t 2mi

Here, we define ρ and j as

ρðx, t Þ  ψ  ðx, t Þψ ðx, t Þ,


ħ
jðx, t Þ  ½ψ  — ψ - ð— ψ  Þψ , ð21:7Þ
2mi

where x is a position vector and — is nabla [defined in (3.10)]. Recasting (21.6) by


use of (21.7), we get a continuity equation expressed as

∂ρ
þ div j = 0: ð21:8Þ
∂t

If we deem ρ to be a probability density of the particle and j to be its flow, (21.8)


represents the conservation law of the probability density; see (7.20). Notice that the
quantity ρ(x, t) defined in (21.7) is positive (or zero) anywhere at any time.
Meanwhile, we can similarly define ρ and j with the Klein-Gordon equation of
(21.1) such that

iħ ∂ϕ ∂ϕ
ρðx, t Þ  ϕ - ϕ ,
2m ∂t ∂t
ħ
jðx, t Þ  ½ϕ — ϕ - ð— ϕ Þϕ ð21:9Þ
2mi

to derive a continuity equation similar to (21.8). A problem lies, however, in


regarding ρ as the probability density. It is because ρ may be either positive or
negative. To see this, put, e.g., ϕ = f + ig (where f and g are real functions) and
calculate the first equation of (21.9). Thus, the conservation of the probability
density could not be guaranteed.
Under such circumstances, it was Dirac that settled the problem (1928); see Sect.
1.2. He proposed the Dirac equation and his theory was counted as one of the most
successful theories among those describing the equation of motion of a quantum-
mechanical particle (i.e., electron) within the framework of the special theory of
relativity.
934 21 The Dirac Equation

21.2 Several Remarks on the Special Theory of Relativity

21.2.1 Minkowski Space and Lorentz Transformation

The special theory of relativity is based on the principle of constancy of light


velocity. We give several brief remarks on the theory together with examples.
Regarding the mathematical equations dealt with in the quantum theory of fields,
we normally define natural units to represent physical quantities by putting

c = 1, ħ = 1: ð21:10Þ

We follow the custom from here on. Using the natural units, as the Klein-
Gordon equation we have

□ þ m2 ϕðt, xÞ = 0: ð21:11Þ

Also, time t and space coordinates x are denoted altogether by

x01
x = xμ = t
x
= x2
x3 ðμ = 0, 1, 2, 3Þ ð21:12Þ
x

where x0 represents the time coordinate and x1, x2, and x3 show the space coordi-
nates. The quantity x is referred to collectively as a space–time coordinate. We
follow the custom as well with the superscript notation of physical quantities. The
implication will be mentioned later (also, see Sect. 24.1). The quantity
xμ (μ = 0, 1, 2, 3) is called a four-vector; this is often represented by a single letter
x. That is, ϕ(x) is meant by ϕ(t, x).
Example 21.1 Suppose that a train is running on a straight rail track with a constant
velocity v toward the positive direction of the x-axis and that a man (M) standing on
firm ground is watching the train. Also, suppose that at the moment when the tail end
of the train is passing in front of him, he and a passenger (P) on the train adjust their
clocks to t = t′ = 0. Here, t is time for the man on firm ground and t′ for the passenger
(Fig. 21.1). Suppose moreover that a light is emitted at the tail end of the train at
t = t′ = 0.
Let O be the inertial frame of reference fixed on the man on firm ground and let O′
be another inertial frame of reference fixed on the train. Also, let the origin of O be
on that man and let that of O′ be on the tail end of the train. Let us assume that the
emitted light is observed at t = t at a position vector x (=xe1 + ye2 + ze3) of O and
that the same light is observed at t′ = t′ at a position vector x0 = x0 e01 þ y0 e02 þ z0 e03
of O′. In the above, e1, e2, e3 and e01 , e02 , e03 are individual basis sets for the Cartesian
coordinates related to O and O′, respectively.
To make the situation clearer, suppose that a laser beam has been emitted at t = 0
and x = 0 in the frame O and that the beam has been emitted at t′ = 0 and x′ = 0 in
21.2 Several Remarks on the Special Theory of Relativity 935

Fig. 21.1 Two inertial


frames of reference O and ( , )
O'. In O a man (M) is
standing on firm ground and
in O' a passenger (P) is ( , ′)
standing at the tail end of the
train. The train is running on
a straight rail track with a (P)
constant velocity v in
reference to O toward the
positive direction of the x-
axis. The man and passenger
are observing how a light
(e.g., a laser beam) is
emitted and that light hits a
(M)
ball after a moment at a
certain place Q (see text)

the frame O′. After a moment, the same beam hits a ball at a certain place Q. Let Q be
specified as x = x in the frame O and at x′ = x′ in the frame O′. Let time when the
beam hits the ball be t = t in the frame O and t′ = t′ in the frame O′. Naturally, the
man M chooses his standing position for x = 0 (i.e., the origin of O) and the
passenger P chooses the tail end of the train for x′ = 0 (i.e., the origin of O′). In
this situation, the special theory of relativity tells us that the following relationship
must hold:

ðct Þ2 - x2 = ðct 0 Þ - x0 = 0:
2 2

Note that the light velocity c is the same with the two frames of O and O′ from the
requirement of the special theory of relativity. Using the natural units, the above
equation is rewritten as

t 2 - x2 = t 0 - x0 = 0:
2 2
ð21:13Þ

Equation (21.13) represents an invariant (i.e., zero in this case) with respect to the
choice of space–time coordinates, or inertial frames of reference.
Before taking a step further to think of the implication of (21.13) and generalize
the situation of Example 21.1, let us introduce a mathematical technique. Let B(y, x)
be a bilinear form defined as [3, 4]

x1
Bðy, xÞ  yT ηx = y1 ⋯ yn η ⋮ , ð21:14Þ
xn

where x and y represent a set of coordinates of vectors in a n-dimensional vector


space. A common basis set is assumed for x and y. The coordinates x and y are
defined as a column vector described by
936 21 The Dirac Equation

x1 y1
x ⋮ ,y  ⋮ :
xn yn

Although the coordinates xi and yi (i = 1, ⋯, n) are usually chosen from either real
numbers or complex numbers, in this chapter we assume that both xi and yi are real.
In the Minkowski space we will be dealing with in this chapter, xi and yi (i = 0, 1, 2, 3)
are real numbers. The coordinates x0 and y0 denote the time coordinate and xk and
yk (k = 1, 2, 3) represent the space coordinates. We call xi and yi (i = 0, 1, 2, 3) the
space–time coordinates altogether.
In (21.14) η is said to be the metric tensor that characterizes the vector space
which we are thinking of. In a general case of a n-dimensional vector space, we have

1

1
η= , ð21:15Þ
-1

-1

where η is a (n, n) matrix having p integers of 1 and q integers of -1 ( p + q = n) with


diagonal elements; all the off-diagonal elements are zero. For this notation, see Sect.
24.1 as well.
In the present specific case, we have

1 0 0 0
0 -1 0 0
η= : ð21:16Þ
0 0 -1 0
0 0 0 -1

The tensor η of (21.16) is called Minkowski metric and the vector space pertinent
to the special theory of relativity is known as a four-dimensional Minkowski space
accompanied by space–time coordinates represented by (21.12). By virtue of the
metric tensor η, we have the following three cases:

<0
Bðx, xÞ =0 : ð21:17Þ
>0

We define a new row vector such that

ðx0 x1 x2 x3 Þ  x0 x1 x2 x3 η:

Then, the bilinear form B(x, x) given by (21.14) can be rewritten as


21.2 Several Remarks on the Special Theory of Relativity 937

x01
Bðx, xÞ = ðx0 x1 x2 x3 Þ x2
x3 : ð21:18Þ
x

With respect to individual components xν, we have

3 3
xν = xμ ημν = xμ ηνμ , ð21:19Þ
μ=0 μ=0

where xμ is called a contravariant (column) vector and xν a covariant (row) vector.


These kinds of vectors have already appeared in Sect. 20.3.2 and will be dealt with
later in detail in Sect. 24.1 in relation to tensor and tensor space. Then, we have

x0 = x 0 , x 1 = - x 1 , x 2 = - x 2 , x 3 = - x 3 : ð21:20Þ

Adopting Einstein summation convention, we express (21.18) as

3
Bðx, xÞ = xμ xμ  xμ xμ , ð21:21Þ
μ=0

where the same subscripts and superscripts are supposed to be summed over 0 to
3. Using this summation convention, (21.13) can be rewritten as
μ
xμ xμ = x0μ x0 = 0: ð21:22Þ

Notice that η of (21.16) is symmetric matrix. The inverse matrix of η is the same
as η. That is, η-1 = η or

ημν ηνρ = δμρ ,

where δμρ is Kronecker delta. Once again, note that the subscript and superscript ν are
summed. The symbols ημν and ηνρ represent η-1 and η, respectively. That is,
(ημν) = (η)μν = (η-1)μν and (ημν) = (η)μν; with these notations see (11.38).
What we immediately see from (21.17) is that unlike the inner product, B(x, x)
does not have a positive definite feature. If we put e.g., x0 = x0 = 0, we have B(x,
x) ≤ 0. For this reason, the Minkowski metric η is said to be an indefinite metric or
the Minkowski space is called an indefinite metric space.
Inserting E = Λ-1Λ in (21.18), we rewrite it as
938 21 The Dirac Equation

x01
Bðx, xÞ = ðx0 x1 x2 x3 ÞΛ - 1 Λ x2
x3 , ð21:23Þ
x

where Λ is a non-singular matrix. Suppose in (21.23) that the contravariant vector is


transformed such that

x00 x01
x01 =Λ x2 : ð21:24Þ
x02 x3
x03 x

Suppose, in turn, that with the transformation of a covariant vector we have

x00 x01 x02 x03 = ðx0 x1 x2 x3 ÞΛ - 1 : ð21:25Þ

Then, (21.23) can further be rewritten as

x0 0 3
x00 12 μ μ
Bðx, xÞ = x00 x01 x02 x03 x = x0μ x0 = x0μ x0 = Bðx0 , x0 Þ: ð21:26Þ
x0 3 μ=0

Equation (21.26) implies that if we demand the contravariant vector


xμ (μ = 0, 1, 2, 3) and the covariant vector xμ be transformed according to (21.24)
and (21.25), respectively, the bilinear form B(x, x) is held invariant before and after
the transformation caused by Λ. In fact, using (21.24) and (21.25) we have

μ σ
Bðx0 , x0 Þ = x0μ x0 = xσ Λ- 1 μ
Λμρ xρ = xσ δσρ xρ = xσ xσ = Bðx, xÞ:

In this respect, (21.22) is a special case of (21.26). Or rather, the transformation Λ


is decided so that B(x, x) can be held invariant. A linear transformation Λ that holds
B(x, x) invariant on the Minkowski space is called Lorentz transformation. A tangi-
ble form of the matrix Λ will be decided below.
To explicitly show how the covariant vector is transformed as a column vector,
taking the transposition of (21.25) we get

x00 x0
x01 -1 T
¼ Λ x1
: ð21:27Þ
x02 x2
x03 x3

Equation (21.27) indicates that a covariant vector is transformed by a matrix (Λ-1)T.


We have (Λ-1)T = (ΛT)-1. To confirm this, taking transposition of ΛΛ-1 = E, we get
(Λ-1)TΛT = E. Multiplying both sides of this equation by (ΛT)-1 from the right, we
obtain (Λ-1)T = (ΛT)-1.
Regarding the discussion of the bilinear form, we will come back to the issue in
Chap. 24.
21.2 Several Remarks on the Special Theory of Relativity 939

Remark 21.1 Up to now in this book, we have not thought about the necessity to
distinguish the covariant and contravariant vectors except for a few cases (see Sect.
20.3.2). This is because if we think of a vector, e.g., in ℝn (i.e., the n-dimensional
Euclidean space), the metric called Euclidean metric is represented by an n-dimen-
sional identity matrix En. Instead of dealing with the Lorentz transformation, let us
think of a familiar real orthogonal transformation in ℝ3.
In parallel with (21.14), as a bilinear form we have

x1 x1
Bðx, xÞ = x1 x2 x3 E3 x2 = x1 x 2 x 3 x2 ,
x3 x3

where E3 is a (3, 3) identity matrix. In this case the bilinear form B(x, x) is identical to
the positive definite inner product hx| xi. With the transformation of a contravariant
vector, we have

x0 1 x1
x0 2 =R x2 : ð21:28Þ
x0 3 x3

With a covariant vector, in turn, similarly to the case of (21.27) we have

x01 T x1 x1 x1
= R-1
T
x02 x2 = RT x2 =R x2 , ð21:29Þ
x03 x3 x3 x3

where with the second equality we used R-1 = RT for an orthogonal matrix. From
(21.28) and (21.29), we find that with the orthogonal transformation in ℝ3 we do not
need to distinguish a contravariant vector and a covariant vector. Note, however, that
unless the transformation is orthogonal, the contravariant vector and covariant vector
should be distinguished. Also, we will come back to this issue in Chap. 24.

21.2.2 Event and World Interval

If we need to specify space–time points where physical phenomena happen, we call


such physical phenomena “events.” Suppose, for instance, that we are examining a
motion of a baseball game and that we are going to decide when and where the ball is
hit with a bat. In that situation the phenomenon of batting the ball is called an event.
At the same time, we can specify the space–time coordinates where (or when) the
ball is hit in an arbitrarily chosen inertial frame of reference. Bearing in mind such a
situation, suppose in Example 21.1 that two events E and F have happened. Also,
suppose that in Fig. 21.1 an observer M on the frame O has experienced the events
E and F at xμ (E) and xμ (F), respectively. Meanwhile, another observer P on the
940 21 The Dirac Equation

frame O′ has experienced the same events E and F at x′μ (E) and x′μ (F), respectively.
In that situation we define a following quantity s.

3
s ½ x0 ð F Þ - x 0 ð E Þ  2 - k=1
½xk ðF Þ - xk ðE Þ2 : ð21:30Þ

The quantity s is said to be a world interval between the events E and F. The
world interval between the two events is classified as follows:

timelike ðs : realÞ
s= spacelike ðs : imaginaryÞ ð21:31Þ
lightlike, null ðs : zeroÞ:

In terms of the special principle of relativity which demands that physical laws be
the same in every inertial frame of reference, the world interval s must be an
invariant under the Lorentz transformation. The relationship s = 0 is associated
with the principle of constancy of light velocity. Therefore, we can see (21.31) as a
natural extension of the light velocity constancy principle. Historically, this principle
has been established on the basis of the precise physical measurements including
those performed in the Michelson-Morley experiment [5].
Now we are in the position to determine an explicit form of Λ. To make the
argument simple, we revisit Example 21.1.
Example 21.2 We assume that in Fig. 21.1 the frame O′ is moving at a velocity of
v toward the positive direction of the x1-axis of the frame O so that the x1- and x′1-
axes can coincide. In this situation, it suffices to only take account of x1 as a space
coordinate. In other words, we have

x0 = x 2 x0 = x3 :
2 3
and ð21:32Þ

Then, we assume a following linear transformation [5]:

x0 = px0 þ qx1 , x0 = rx0 þ ux1 :


0 1
ð21:33Þ

Writing (21.33) in a matrix form, we get

x0 0 x0
x0 1
= p
r
q
u x1
: ð21:34Þ

Corresponding to (21.30), we have

2 2 2 2
s2 = x 0 ð F Þ - x 0 ð E Þ - x1 ð F Þ - x1 ð E Þ = x 0 ð F Þ - x1 ð F Þ ,
21.2 Several Remarks on the Special Theory of Relativity 941

2 2 2 2
s 0 = x0 ð F Þ - x 0 ð E Þ - x0 ðF Þ - x0 ðE Þ = x0 ð F Þ - x0 ðF Þ : ð21:35Þ
2 0 0 1 1 0 1

In (21.35) we assume x0(E) = x1(E) = 0 and x′0(E) = x′1(E) = 0; i.e., as discussed


earlier we regard the event E as the fact that the tail end of the train is just passing in
front of the man standing on firm ground. Deleting F from (21.35) and on the basis of
the Lorentz invariance of s, we must have

0 2 1 2 2 2
s02 = x0 - x0 = x0 - x1 = s2 : ð21:36Þ

The coefficients p, q, r, and u of (21.33) can be determined from the requirement


that (21.36) holds with any x0 and x1. Inserting x′0 and x′1 of RHSs of (21.33) into
(21.36) and comparing the coefficients of xixk (i, k = 0, 1; i ≤ k) in (21.36), we get [5]

p2 - r 2 = 1, pq - ru = 0, q2 - u2 = - 1: ð21:37Þ

To decide four parameters, we need another condition different from (21.37).


Since we assume that the frame O′ is moving at the velocity of v to the x1-direction of
the frame O, the origin of O′ (i.e., x′1 = 0) should correspond to x1 = vx0 of O.
Namely, from (21.33) we have

0 = rx0 þ uvx0 = ðr þ uvÞx0 : ð21:38Þ

As (21.38) must hold with any x0, we have

r þ uv = 0: ð21:39Þ

Since (21.37) and (21.39) are second-order equations with respect to p, q, r, and
u, we do not obtain a unique solution. Solving (21.37) and (21.39), we get
p
p = ± 1= 1 - v2 , u = ± p, q = - vp, and r = - vu.
However, p, q, r, and u must satisfy the following condition such that

lim p q
= 1 0
:
v→0 r u 0 1

Then, we must have


p p
p = u = 1= 1 - v2 , r = q = - v= 1 - v2 :

Inserting these four parameters into (21.33), we rewrite (21.33) as


942 21 The Dirac Equation

1 1
x0 = p x0 = p
0 1
x0 - vx1 , - vx0 þ x1 :
1 - v2 1 - v2

Using a single parameter γ given by


p
γ  1= 1 - v2 ,

(21.34) can be rewritten as

x0 0 γ - γv x0
x0 1
= - γv γ x1
: ð21:40Þ

With the reversed form, we obtain

x0 γ γv x0 0
x1
= γv γ x0 1
: ð21:41Þ

Taking account of (21.32), a full (4, 4) matrix form is

0
x′ γ - γv 0 0 x0
x′
1
- γv γ 0 0 x1
2 = : ð21:42Þ
x′ 0 0 1 0 x2
3
x′ 0 0 0 1 x3

The (4, 4) matrix of (21.42) is a simple example of the Lorentz transformations.


These transformations form a group that is called the Lorentz group. If v = 0, we
have γ = 1. Then, the Lorentz transformation given by (21.42) becomes the identity
matrix (i.e., the identity element of the Lorentz group). The inverse element of the
matrix of (21.42) is given by its inverse matrix. The matrix given by (21.42) can
conveniently be expressed as follows: Putting

tanh ω  v ð- 1 < ω < 1 ⟺ - 1 < v < 1Þ ð21:43Þ

and using the formulae of the hyperbolic functions

1
cosh ω = p , ð21:44Þ
1 - tan h2 ω

we have

1 p
cosh ω = p = γ, sinh ω = γv = v= 1 - v2 : ð21:45Þ
1 - v2

Thus, (21.42) can be rewritten as


21.2 Several Remarks on the Special Theory of Relativity 943

γ - γv 0 0 cosh ω - sinh ω 0 0
- γv γ 0 0 - sinh ω cosh ω 0 0
0 0 1 0 = 0 0 1 0 : ð21:46Þ
0 0 0 1 0 0 0 1

The matrix of (21.46) is Hermitian (or real symmetric) with real positive eigen-
values and the determinant of 1. We will come back to this feature in Chap. 24. If
two successive Lorentz transformations are given by tanhω1 and tanhω2, their
multiplication is given by

cosh ω2 - sinh ω2 0 0 cosh ω1 - sinh ω1 0 0


- sinh ω2 cosh ω2 0 0 - sinh ω1 cosh ω1 0 0
0 0 1 0 0 0 1 0
0 0 0 1 0 0 0 1
coshð ω2 þ ω1 Þ - sinhð ω2 þ ω1 Þ 0 0
- sinhð ω2 þ ω1 Þ coshð ω2 þ ω1 Þ 0 0
= 1 0 :
0 0
0 0 0 1
ð21:47Þ

Equation (21.47) obviously shows the closure in terms of the multiplication. The
associative law with respect to the multiplication is evident. Thus, the matrices
represented by (21.46) satisfy the axioms (A1) to (A4) of Sect. 16.1. Hence, a
collection of these matrices forms a group (i.e., the Lorentz group). As clearly
seen in (21.46) and (21.47), the constitution of the Lorentz group is well character-
ized by the variable ω, which is sometimes called “rapidity.” Notice that the matrices
given above are not unitary. These matrices are often referred to as the Lorentz
boosts.
Example 21.3 Here let us briefly think of the Lorentz contraction. Suppose that a
straight stick exists in a certain inertial frame of reference O. The stick may or may
not be moving in reference to that frame. Being strictly formal, measuring a length of
the stick in that frame comprises (1) to measure and fix a spatial point P x1P , x2P , x3P
that corresponds to one end of the stick and (2) to measure and fix another spatial
point Q x1Q , x2Q , x3Q corresponding to the other end. The former and latter opera-
tions correspond to the events E and F, respectively. These operations must be done
at the same time in the frame. Then, the distance between P and Q (P and Q may be
either different or identical) defined as dPQ in the frame is given by a following
absolute value of the world interval:

3 2
d PQ  - k=1
xkQ - xkP : ð21:48Þ
944 21 The Dirac Equation

We define the length of the straight stick as dPQ. We have dPQ ≥ 0; dPQ = 0 if and
only if P and Q are identical. Notice that the world interval between the events E and
F is a pure imaginary from (21.30), because those events happen at the same time.
Bearing in mind the above discussion, suppose that the straight stick is laid along
the x′1-direction with its end in contact with the tail end of the train. Suppose that the
length of the stick is L′ in the frame O′ of Example 21.1. This implies that from
(21.40) x′1 = L′. (Remember that the man standing on firm ground and the passenger
on the train adjusted their watches to t = t′ = 0, at the moment when the tail end of
the train was passing in front of the man.) If we assume that the length of the stick is
read off at t (or x0) = 0 in the frame O of Example 21.1, again from (21.40) we have
p p
L = x1 = 1 - v2 x0 = L0 1 - v2 = L0 =γ,
1
ð21:49Þ

where L is a length of the stick measured in the frame O. Thisp implies that the man on
firm ground judges that the stick has been shortened by 1 - v2 (i.e., the Lorentz
contraction).
In an opposite way, suppose that a straight stick of its length of L is laid along the
x1-direction with its end at the origin of the frame O. Similarly to the above case,
x1 = L. If the length of the stick is read off at t′ (or x′0) = 0 in the frame O′, from
(21.41), i.e., the inverse matrix to (21.40), this time we have
p p
L0 = x0 = 1 - v2 x1 = L 1 - v2 = L=γ,
1

where L′ is a length of the stick measured in the frame O′. The Lorentz contraction is
again observed.
Substituting x1 = L in (21.40) gives x′0 ≠ 0. This observation implies that even
though measurements of either end of the stick have been done at the same time in
the frame O, the coinstantaneity is lost with the frame O′.
The relativeness of the time dilation can be dealt with in a similar manner. The
discussion is left for readers.
In the above examples, we have shown the brief outline and constitution of the
special theory of relativity. Since the theory is closely related to the Lorentz
transformation, or more specifically the Lorentz group, we wish to survey the
constitution and properties of the Lorentz group in more detail in Chap. 24.

21.3 Constitution and Solutions of the Dirac Equation

21.3.1 General Form of the Dirac Equation

To address the difficulty in the probabilistic interpretation of (21.9) mentioned in


Sect. 21.1, Dirac postulated that the Klein-Gordon Eq. (21.11) could be
“factorized” in such a way that
21.3 Constitution and Solutions of the Dirac Equation 945

□ þ m2 ϕðxÞ = - iγ μ ∂μ - m ðiγ ν ∂ν - mÞϕðxÞ = 0, ð21:50Þ

where γ μ denotes a certain constant mathematical quantity and ∂μ is a shorthand


notation of ∂x∂μ . The quantity ∂μ behaves as a covariant vector (see Sect. 21.2).
Remember that x represents the four-vector xμ. Once again note that the same
subscripts and superscripts should be summed. To show that ∂μ is a covariant
vector, we have

μ ρ
∂ ∂x0μ ∂ ∂Λ ρ x ∂ ∂ ∂ 0
∂ν  ν
= ν 0μ
= ν 0μ
= Λμρ δρν 0μ = Λμν 0μ = Λμν ∂μ : ð21:51Þ
∂x ∂x ∂x ∂x ∂x ∂x ∂x
ν
Multiplying (21.51) by Λ - 1 κ
, we have

ν ∂ ν ∂ ν ∂ ∂ 0
∂ν Λ- 1 κ
= Λ- 1 κ
= Λμν Λ- 1 κ
= δμκ = = ∂κ : ð21:52Þ
∂xν ∂x0μ ∂x0μ ∂x0κ

Equation (21.52) shows that ∂μ is transformed as in (21.25). That is, ∂μ behaves


as a covariant vector.
Meanwhile, defining

A  iγ μ ∂μ ,

we rewrite (21.50) as

ð- A - mÞðA - mÞϕðxÞ = 0: ð21:53Þ

This expression leads to a following equation:

ðA - mÞϕðxÞ = 0: ð21:54Þ

Since (-A - m) and (A - m) are commutative, we have an alternative expression


such that

ðA - mÞð- A - mÞϕðxÞ = 0: ð21:55Þ

This naturally produces

ð- A - mÞϕðxÞ = 0: ð21:56Þ

Equations (21.54) and (21.56) both share a solid footing for detailed analysis of
the Dirac equation. According to the custom, however, we solely use (21.54) for our
present purposes. As a consequence of the fact that the original Klein-Gordon
equation (21.11) is a second-order linear differential equation (SOLDE), we have
946 21 The Dirac Equation

another linearly independent solution (see Chap. 10). Letting that solution be χ(x),
we have

ðA - mÞχ ðxÞ = 0: ð21:57Þ

We give a full description of the Dirac equation as

iγ μ ∂μ - m ϕðxÞ = 0 and iγ μ ∂μ - m χ ðxÞ = 0: ð21:58Þ

As in the case of Example 1.1 of Sect. 1.3, we choose exponential functions for
ϕ(x) and χ(x). More specifically, we assume the solutions described by

ϕðxÞ = uðpÞe - ipx and χ ðxÞ = vðpÞeipx , ð21:59Þ

x0 p0
where x = x12
x3
, p= p12
p3
. Also, remark that
x p

px = pμ xμ = pμ xμ : ð21:60Þ

Notice that in (21.59) neither u( p) nor v( p) contains space–time coordinates. That


is, u( p) and v( p) behave as a coefficient of the exponential function.
To determine the form of γ μ in (21.50), we consider the cross term of
γ ∂μγ ν∂ν (μ ≠ ν). We have
μ

γ μ ∂μ γ ν ∂ν = γ μ γ ν ∂μ ∂ν = γ ν γ μ ∂ν ∂μ = γ ν γ μ ∂μ ∂ν ðμ ≠ νÞ, ð21:61Þ

where the last equality was obtained by exchanging ∂ν for ∂μ. Therefore, the
coefficient of ∂μ∂ν (μ ≠ ν) is

γμγν þ γνγμ:

Taking account of the fact that the cross terms ∂μ∂ν (μ ≠ ν) are not present in the
original Klein-Gordon equation (21.11), we must have

γ μ γ ν þ γ ν γ μ = 0 ðμ ≠ νÞ: ð21:62Þ

To satisfy the relation (21.62), γ μ must have a matrix form. With regard to the
2
terms of ∂μ , comparing their coefficients with those of (21.11), we get

2 2 2 2
γ0 = 1, γ 1 = γ2 = γ3 = - 1: ð21:63Þ

Combining (21.62) and (21.63), we have


21.3 Constitution and Solutions of the Dirac Equation 947

γ μ γ ν þ γ ν γ μ = 2ημν : ð21:64Þ

Introducing an anti-commutator

fA, Bg  AB þ AB, ð21:65Þ

we obtain

fγ μ , γ ν g = 2ημν : ð21:66Þ

The quantities γ μ are represented by (4, 4) matrices and called the gamma
matrices. Among several representations of the gamma matrices, the Dirac repre-
sentation is frequently used. With this representation, we have [1, 2, 6]

σk
γ0 = 1
0
0
-1
, γk = 0
- σk 0
ðk = 1, 2, 3Þ, ð21:67Þ

where each component denotes (2, 2) matrix; 1 shows a (2, 2) identity matrix;
0 shows a (2, 2) zero matrix; σ k (k = 1, 2, 3) are the Pauli spin matrices given in
(20.41). That is, as a full representation of the gamma matrices we have

1 0 0 0
0 1 0 0
γ0 = ,
0 0 - 1 0
0 0 0 -1
0 0 0 1 0 0 0 i
0 0 1 0 0 0 -i 0
γ1 = , γ2 = , ð21:68Þ
0 -1 0 0 0 -i 0 0
-1 0 0 0 i 0 0 0
0 0 -1 0
0 0 0 1
γ3 = :
1 0 0 0
0 -1 0 0

Let us determine the solutions of the Dirac equation. Replacing ϕ(x) and χ(x) of
(21.58) with those of (21.59), we get

pμ γ μ - m uðpÞ = 0 and - pμ γ μ - m vðpÞ = 0: ð21:69Þ

In accordance with (21.68) and (21.69), u( p) and v( p) are (4, 1) matrices, i.e.,
column vectors or strictly speaking spinors to be determined below. If we replaced
p of v( p) with -p, where
948 21 The Dirac Equation

p01 - p01
-E - p2
p= E
p
= p2
p3 , -p= -p
= - p3 , ð21:70Þ
p -p

we would have

pμ γ μ - m vð- pÞ = 0: ð21:71Þ

Comparing (21.71) with the first equation of (21.69), we have

vð- pÞ = cuðpÞ ðc : constantÞ: ð21:72Þ

So far, the relevant discussion looks parallel with that appeared in Example 1.1 of
Sect. 1.3. However, the present case differs from the previous example in the
following aspects: (1) Although boundary conditions (BCs) of Dirichlet type were
imposed in the previous case, we are dealing with a plane wave solution with an
infinite space–time extent. (2) In the previous case (Example 1.1), k could take both
positive and negative numbers. In the present case, however, a negative p0 does not
fit naturally into our intuition, even though the Lorentz covariance permits a negative
number for p0. This is because p0 represents energy and unlike the momentum p, p0
does not have “directionality.”
Instead of considering p as the quantities changeable between -1 and +1, we
single out p0 and deal with it as a positive parameter (0 < p0 < + 1) with varying
p1
values depending on jpj, while assuming that p = p2 are parameters freely
p3
changeable between -1 and +1. This treatment seems at first glance to violate
the Lorentz invariance, but it is not the case. We will discuss the issue in Chap. 22.
Thus, we safely assign p0 (>0) and -p0 (<0) to u( p) and v( p), respectively.
Meanwhile, as shown in (1.40) we have a relationship between the time coordinate
and energy described by

∂ ∂ ∂
iħ = i = i 0 $ E  p0 : ð1:40Þ
∂t ∂t ∂x

Therefore, once we have defined p0 > 0, ϕ(x) = u( p) e-ipx of (21.58) and (21.59)
must be assigned to the positive-energy state and χ(x) = v( p) eipx of (21.58) and
(21.59) to the negative-energy state. To further specify the solutions of (21.69) we
restate (21.69) as

pμ γ μ - m uðp, hÞ = 0, ð21:73Þ
μ
- pμ γ - m vðp, hÞ = 0, ð21:74Þ
21.3 Constitution and Solutions of the Dirac Equation 949

where (21.73) and (21.74) represent the positive-energy state and negative-energy
state, respectively. The index h is called helicity and takes a value either +1 or -1.
Notice that we have abandoned u( p) and v( p) in favor of u(p, h) and v(p, h). We will
give further consideration to the index h in the next section.

21.3.2 Plane Wave Solutions of the Dirac Equation

The Dirac equation is of special importance as a relativistic equation. The Dirac


equation has a wide range of applications in physics including atomic physics that
deals with the properties of hydrogen-like atoms (see Chap. 3) as one of the major
subjects [2]. Among the applications of the Dirac equations, however, its plane wave
solutions are most frequently dealt with for a reason mentioned below: In the field of
high-energy physics, particles (e.g., electrons) need to be accelerated close to light
velocity and in that case relativistic effects become increasingly dominant. What is
more, the motion of particles is well approximated by a plane wave besides colli-
sions of the particles that take place during a very short period of time and in a very
small extent of space. In this book, we solely deal with the plane wave solutions
accordingly.
First, we examine the properties for solutions of u(p, h) of (21.73). Using the
gamma matrices given in (21.68), the matrix equation to be solved is described as

p0 - m 0 p3 - p1 - ip2 c0
0 p -m
0
- p þ ip2
1
- p3 c1
= 0,
- p3 p1 þ ip2 - p0 - m 0 c2
p1 - ip2 p3 0 -p -m
0 c3
ð21:75Þ

c01
where c2 is a column vector representation of u(p, h). That is, we have
c3
c

c01
uðp, hÞ = c2
c3 : ð21:76Þ
c

We must have u(p, h) ≢ 0 to obtain a physically meaningful solution with u(p, h).
For this, the determinant of the (4, 4) matrix of (21.75) should be zero. That is,
950 21 The Dirac Equation

p0 - m 0 p3 - p1 - ip2
0 p0 - m - p1 þ ip2 - p3
= 0: ð21:77Þ
- p3 p1 þ ip2 - p0 - m 0
p1 - ip2 p3 0 - p0 - m

This is a quartic equation with respect to pμ. The equation leads to a simple form
described by

2 2
p0 - p2 - m 2 = 0: ð21:78Þ

Hence, we have

2
p0 - p2 - m2 = 0 or p0 = p2 þ m2 : ð21:79Þ

Remember that p0 represents the positive energy. Note that (21.79) is consistent
with (1.8), (1.9), and (1.20); recall the natural units. The result, however, has already
been implied in (21.2). Thus, we have to find another way to solve (21.75). To this
end, first let us consider (21.75) in a simpler form. That is, putting p1 = p2 = 0 in
(21.75), (21.75) reads as

p0 - m 0 p3 0 c0
0 p0 - m 0 - p3 c1
= 0: ð21:80Þ
- p3 0 - p0 - m 0 c2
0 p3 0 - p0 - m c3

Clearly, the first and second column vectors are linearly independent with each
other. Multiplying the first column by ( p0 + m) and the third column by p3 produces
the same column vectors. This implies that the first and third column vectors are
linearly dependent with each other. In a similar manner, the second and fourth
column vectors are linearly dependent as well. Thus, the rank of the matrix of
(21.80) is two. If p3 = 0 in (21.80), i.e., p = 0, from (21.79) we have
p0 = m (>0). Then, (21.80) can be rewritten as

0 0 0 0 c0
0 0 0 0 c1
- 2m = 0: ð21:81Þ
0 0 0 c2
0 0 0 - 2m c3

Equation (21.81) gives the solution of an electron at rest. We will come back to
this simple situation in Sect. 24.5. The rank of the matrix of (21.81) is two. In a
general case of (21.75) the rank of the matrix is two as well. This means the presence
of two linearly independent solutions for (21.75). The index h (h = ± 1) of u(p, h)
21.3 Constitution and Solutions of the Dirac Equation 951

represents and distinguishes these two solutions. The situation is the same as the
negative-energy solutions of v(p, h) given in (21.74); see Sect. 21.3.3.
Now, (21.75) can be rewritten in a succinct form as

c01
p0 - m - σp
σp - p0 - m
c2
c3 = 0: ð21:82Þ
c

Here, we introduce another (4, 4) matrix

1 σp
H 0
: ð21:83Þ
jpj 0 σp

Since σ  p is Hermitian, H is Hermitian as well. Since an Hermitian matrix can be


diagonalized by unitary similarity transformation (see Theorem 14.5 of Sect. 14.3),
so can both σ  p and H be. Let χ be a (2, 1) matrix, i.e., a column vector. As an
eigenfunction of (21.83), we assume a following type of (4, 1) matrix:

χ
Χ sχ
, ð21:84Þ

where s is a parameter to be determined below. To confirm whether this assumption


is valid, let us examine the properties of the following (2, 2) submatrix S, which is
described by

1
S σ  p: ð21:85Þ
jpj

The factor σ  p is contained in common in the matrix of LHS for (21.82) and H of
(21.83).
Using the notation of (21.85), we have

S 1 σp
H= 0
= 0
: ð21:86Þ
0 S jpj 0 σp

We can readily check the unitarity of S, i.e.,

S{ S = E 2 ,

where E2 is a (2, 2) identity matrix. Also, we have

detS = - 1: ð21:87Þ

Since the (2, 2) matrix S is both Hermitian and unitary, (21.87) immediately
implies that the eigenvalues of S are ±1; see Chaps. 12 and 14. Representing the
952 21 The Dirac Equation

two linearly independent functions (i.e., eigenfunctions belonging to the eigenvalues


of S of ±1) as χ - and χ +, we have

Sχ - = ð- 1Þχ - and Sχ þ = ðþ1Þχ þ ð21:88Þ

where χ - belongs to the eigenvalue -1 and χ + belongs to the eigenvalue +1 with


respect to the positive-energy solutions.
When we do not specify the eigenvalue, we write

Sχ = hχ ðh = ± 1Þ, ð21:89Þ

where h is said to be a helicity. Thus, we get

S χ Sχ χ
HΧ = 0
0
S sχ
= sSχ
=h sχ
= hΧ: ð21:90Þ

That is, we certainly find that Χ of (21.84) is an eigenvector of H that belongs to


an eigenvalue h.
Meanwhile, defining the matrix of (21.82) as

p0 - m - σp
G σp - p0 - m
= pμ γ μ - m, ð21:91Þ

we have

Guðp, hÞ = 0 ð21:92Þ

as the equation to be solved. In (21.92) u(p, h) indicates a (4, 1) column vector of


(21.76) or (21.82). We find

½G, H = 0: ð21:93Þ

That is, G and H commute with each other. Note that (21.93) results naturally
from the presence of gamma matrices. We remember that two Hermitian matrices
commute if and only if there exists a complete orthonormal set (CONS) of common
eigenfunctions (Theorem 14.14). This is almost true of G and H. Care should be
taken, however, to apply Theorem 14.14 to the present case. This is because G is
neither a normal matrix nor an Hermitian matrix except for special cases. Therefore,
the matrix G is in general impossible to diagonalize by the unitary similarity
transformation (Theorem 14.5). From the point of view of the matrix algebra, it is
of great interest whether or not G can be diagonalized. Moreover, if G could be
diagonalize, it is important to know what kind of non-singular matrix is used for the
similarity transformation (i.e., diagonalization) and what eigenvalues G possesses.
Yet, (21.92) immediately tells us that G possesses an eigenvalue zero. Otherwise, we
21.3 Constitution and Solutions of the Dirac Equation 953

are to have u(p, h)  0. We will come back to this intriguing discussion in Chap. 24
to examine the condition for G to be diagonalized.
Here, we examine what kind of eigenfunctions G and H share as the common
eigenvectors. Since G and H commute, from (21.84) and (21.90) we have

GHΧ = HGΧ = hGΧ: ð21:94Þ

This means that GΧ is an eigenvector of H. Let us tentatively consider that GΧ


can be described by

GΧ = cΧ, ð21:95Þ

where c is a constant (certainly including zero). This implies that Χ is an


eigenfunction of G and that from (21.90) and (21.95) G and H share a common
eigenfunction Χ.
Using the above results and notations, let us examine the conditions under which
χ χ

is a solution of (21.75). Replacing the column vector of (21.75) with sχ
, we
have

χ p0 - m -σ  p χ p0 - m - PS χ
GΧ = G = =
sχ σp -p -m 0
sχ PS -p -m 0 sχ
ðp0 - mÞχ - P sSχ ðp0 - mÞχ - P shχ
= = = 0,
P Sχ - ðp0 þ mÞsχ P hχ - ðp0 þ mÞsχ
ð21:96Þ

where we define a positive number P such that

P j p j : ð21:97Þ

χ
For (21.96) to be satisfied, namely, for sχ
to be an eigenvector of G that
belongs to the eigenvalue zero, the coefficient of χ of each row for (21.96) must
vanish. That is, we must have

p0 - m - P sh = 0,

P h - p0 þ m s = 0:

Then, if we postulate the following relations:

Ph p0 - m
s= = , ð21:98Þ
p0 þ m Ph

(21.96) will be satisfied. Equation (21.98) implies that


954 21 The Dirac Equation

2
ð P h Þ 2 = P 2 h 2 = P 2 = p 2 = p0 þ m p 0 - m = p 0 - m2 :

χ
However, this is equivalent to (21.79). Thus, we have confirmed that Χ = sχ
of
(21.84) is certainly an eigenfunction of G, if we choose s as s = P h=ðp þ mÞ. At the 0

χ
same time, we find that Χ = sχ
is indeed the common eigenfunction shared by G
and H with a definite value s designated by (21.98). This eigenfunction belongs to
the eigenvalue h (h = ± 1) of H and the eigenvalue 0 of G. It will be convenient for a
purpose of future reference to define a positive number S as

P
S ð21:99Þ
p0 þ m

so that we have s = hS. The eigenfunction of (21.96) is then expressed as

χ
Χ= hSχ
: ð21:100Þ

Hence, as u(p, h) given in (21.76) we obtain

χ
uðp, hÞ = N hSχ
, ð21:101Þ

where N is a normalization constant that will be determined later in Sect. 21.4.


To solve the Dirac equation in an actual form, we wish to choose the polar
coordinate representation (see Sect. 3.2). For this, we put

p1 = P sin θ cos ϕ,
p2 = P sin θ sin ϕ,
p3 = P cos θ:

Then, we have

- cos θ eiϕ sin θ


S= e - iϕ sin θ cos θ
: ð21:102Þ

Note that detS = - 1. Solving the eigenvalue equation (21.89) according to


Sects. 12.1 and 14.3, we get normalized eigenfunctions described by

θ θ
eiϕ=2 cos eiϕ=2 sin
χ- = 2
θ
and χ þ = 2
θ
ð21:103Þ
- iϕ=2 - iϕ=2
-e sin e cos
2 2
21.3 Constitution and Solutions of the Dirac Equation 955

where χ - and χ + are the eigenfunctions of the operator S belonging to the helicity
h = - 1 and +1, respectively. Then, as the positive-energy solution of (21.73) we
have

θ θ
eiϕ=2 cos eiϕ=2 sin
2 2
- iϕ=2
θ - iϕ=2
θ
-e sin e cos
uðp, - 1Þ = N 2
θ
, uðp, þ1Þ = N 2
θ
: ð21:104Þ
- Se iϕ=2
cos Se iϕ=2
sin
2 2
- iϕ=2
θ - iϕ=2
θ
Se sin Se cos
2 2

Since the rank of G is two, any positive-energy solution can be expressed as a


linear combination of u(p, -1) and u(p, +1).

21.3.3 Negative-Energy Solution of the Dirac Equation

Using (21.74), with the negative-energy sate v(p, h) we have the Dirac equation
expressed as

- p0 - m - σð - pÞ χ~
- pμ γ μ - m vðp, hÞ = σð - pÞ p0 - m χ
t~
= 0: ð21:105Þ

In accordance with (21.84) we put

~=
Χ χ~
: ð21:106Þ
t~χ

~ as
Meanwhile, defining S

~  1 1
S σ  ð- pÞ = σ  ð- pÞ = - S ð21:107Þ
j - pj P

~ as
and defining G

~  - pμ γ μ - m,
G ð21:108Þ

we rewrite (21.105) as
956 21 The Dirac Equation

~ χ~ =
G - p0 - m - PS ~ χ~
= 0: ð21:109Þ
χ
t~ PS~ p0 - m χ
t~

As in the case of (21.96), we get

~ χ~ = - ðp0 þmÞχ~ - P tS~ - ðp0 þmÞχ~ - P th~


~χ χ
G = = 0, ð21:110Þ
χ
t~ ~ χ þðp0 - mÞt~
P S~ χ χ þðp0 - mÞt~
P h~ χ

where we put

~ χ = h~
S~ χ:

For (21.110) to be satisfied, we have

- p0 þ m - P ht = 0, ð21:111Þ

hP þ p0 - m t = 0: ð21:112Þ

Then, we get

p0 þ m Ph
t= - =- 0 : ð21:113Þ
Ph p -m

Again, (21.113) is equivalent to (21.79). Hence, (21.106) is represented as

Ph
χ~ p0 þ m χ~ p0 þ m χ
hS~ 1 S~
χ
= p0 þ m = = , ð21:114Þ
χ
t~ Ph - χ~
Ph - χ~ S - h~
χ

where with the last equality the relations h2 = 1 and (21.99) were used. As a
~ the eigenvalue equation (21.89) is
consequence of switching the operator S to S,
replaced with

~ χ = h~
S~ χ = - S~
χ: ð21:115Þ

~ as in the case of (21.86) such that


Defining H

~ 1
~
H S 0
= -H= - σp 0
, ð21:116Þ
~
0 S j -p j 0 - σp

we get
21.3 Constitution and Solutions of the Dirac Equation 957

~ ~χ
~
H S~χ
= S 0 S~χ
= S S~
= Sh~χ
=h S~χ
: ð21:117Þ
- h~χ 0 ~
S - h~χ ~χ
- hS~ - h2 χ~ - h~χ

From (21.114) and (21.117), as the negative-energy solution v(p, h) of (21.105)


we get

~ S~χ
vðp, hÞ = N , ð21:118Þ
- h~χ

where N~ is a normalization constant to be determined later.


From (21.115), the negative-energy state corresponding to the helicity eigenvalue
+h of H~ should be translated into the positive-energy state corresponding to the
eigenvalue -h of H. Borrowing the results of (21.103), with the normalized
eigenfunctions we have

θ θ
- eiϕ=2 cos - eiϕ=2 sin
χ~þ = θ
2
and χ~ - = 2
θ
, ð21:119Þ
e - iϕ=2 sin - e - iϕ=2 cos
2 2

where χ~þ belongs to the helicity eigenvalue h = + 1 of S~ and χ~ - belongs to the


helicity h = - 1 of S ~ with respect to the negative-energy solutions. Thus, using
(21.114) we obtain

θ θ
- Seiϕ=2 cos - Seiϕ=2 sin
2 2
- iϕ=2
θ - iϕ=2
θ
Se sin - Se cos
~
vðp, þ1Þ = N 2 ~
, vðp, - 1Þ = N 2 : ð21:120Þ
θ θ
e iϕ=2
cos -e iϕ=2
sin
2 2
θ θ
- e - iϕ=2 sin -e - iϕ=2
cos
2 2

The column vectors of (21.104) and (21.120) expressed as (4, 1) matrices are
called the Dirac spinors. In both the cases, we retain the freedom to choose a phase
factor, because it does not alter the quantum state. Strictly speaking, the factor e±ipx
of (21.59) is a phase factor, but this factor is invariant under the Lorentz transfor-
mation. Ignoring the phase factor, with the second equation of (21.120) we may
choose
958 21 The Dirac Equation

θ
Seiϕ=2 sin
2
- iϕ=2
θ
Se cos
~
vðp, - 1Þ = N 2 : ð21:121Þ
θ
eiϕ=2 sin
2
θ
e - iϕ=2 cos
2

We will revisit this issue in relation to the Dirac adjoint and the charge conjuga-
tion (vide infra).
Since the rank of G ~ is two, any negative-energy solution can be expressed as a
linear combination of v(p, +1) and v(p, -1) as in the case of the positive-energy
solutions. In the description of the Dirac spinor, a following tangible example helps
further realize what is the point.
Example 21.4 In Sect. 21.3.1 we have mentioned that the parameter p can be freely
chosen between -1 and +1 in the three-dimensional space. Then, we wish to
examine how the Dirac spinor can be described by switching the momentum from p
to -p. Choosing u(p, -1) for the Dirac spinor, for instance, we had

θ
eiϕ=2 cos
2
θ
- e - iϕ=2 sin
uðp, - 1Þ = N 2
: ð21:104Þ
θ
- Seiϕ=2 cos
2
θ
Se - iϕ=2 sin
2

Corresponding to p → - p, we have

ðϕ, θÞ → ðϕ ± π, π - θÞ; ð20:338Þ

see Sect. 20.5.3. Upon the replacement of (20.338), u(p, -1) is converted to
21.3 Constitution and Solutions of the Dirac Equation 959

iϕ θ iϕ θ
ð ± iÞe 2 sin e 2 sin
2 2
- iϕ θ - iϕ2 θ
ð ± iÞe 2 cos e cos
2 2
u0 ð - p, þ1Þ = N iϕ θ = ð ± iÞN iϕ θ
ð ± iÞSe 2 sin Se 2 sin
2 2
iϕ θ iϕ θ
ð ± iÞSe - 2 cos Se - 2 cos
2 2
= ð ± iÞuðp, þ1Þ,
ð21:122Þ

where u′ denotes the change in the functional form; the helicity has been switched
from -1 to +1 according to (21.89). With the above switching, we have

uðp, - 1Þ → ð ± iÞuðp, 1Þ ð21:123Þ

with the energy unaltered. For u(p, +1), similarly we have

uðp, þ1Þ → ð ± iÞuðp, - 1Þ: ð21:124Þ

With the negative-energy solutions, we obtain a similar result. Then, aside from
the phase factor, the momentum switching produces

χ - $ χ þ; χþ $ χ -; h - $ hþ ,

where h- and h+ stand for the helicity -1 and +1, respectively.


Although in the non-relativistic quantum mechanics (i.e., the Schrödinger equa-
tion) the helicity was not an important dynamical variable, it plays an essential role
as a conserved quantity in the relativistic Dirac equation. In a quantum-mechanical
system, the total angular momentum (A) containing orbital angular momentum (L)
and spin angular momentum (σ) is a conserved quantity. That is, we have

A = L þ σ: ð21:125Þ

Taking an inner product of (21.125) with p, we obtain

A  p = L  p þ σ  p: ð21:126Þ

If the angular momentum is measured in the direction of motion (i.e., the direction
parallel to p), L  p = 0. It is because in this situation we can represent

L  p = ðx × pÞ  p = 0,

where x is a position vector. Thus, we get


960 21 The Dirac Equation

A  p = σ  p: ð21:127Þ

Since we are dealing with the free field, we have

A  p = σ  p = c, ð21:128Þ

where c is a constant. That is, the component of the total angular momentum in the
direction of motion is conserved and equal to the component of the spin angular
momentum in the direction of motion.
Getting back to the definition of S, we may define it as

1
S  lim σ  p: ð21:129Þ
jpj → 0 j pj

Equation (21.129) implies that S can be defined even with an infinitesimally


small amount of |p|. We remark that we can observe the motion of the electron from
various inertial frames of reference. Depending upon the choice of the inertial frame,
we may have varying p (both of varying directions and magnitudes). In particular,
small changes in p may lead to a positive or a negative with σ  p and the sudden
jump in the helicity between ±1 accordingly. This seems somewhat unnatural. Or
rather, it may be natural to assume that an electron inherently carries spin regardless
of its movement form (either moving or resting).
Even though in (21.129) S seemed to be described as an indeterminate form of
limit at p → 0, it was defined in (21.102) actually, regardless of the magnitude of
momentum jpj. Or rather, it is preferable to assume that the angles ϕ and θ represent
the geometric transformation of the basis vectors. We will come back to this point
later in Chap. 24.
Readers may well wonder whether we defined an operator σ0 σ0 instead of H
defined in (21.83). Such an operator, however, would be non-commutative with Hp
defined just below (see Sect. 21.3.4). This aspect warrants that the projection of the
spin operator σ onto the direction of motion should be taken into account. For this
reason, from here on, we will not distinguish the spin operator from the helicity
operator H and H~ when we are thinking of the plane wave solutions of the Dirac
equation.

21.3.4 One-Particle Hamiltonian of the Dirac Equation

In this section, we wish to think of how we can specify the quantum state of the plane
wave of an electron in the relativistic Dirac field. We know from Chaps. 3 and 14 that
it is important to specify a set of mutually commutative physical quantities (i.e.,
Hermitian operators) to characterize the physical system. Normally, it will suffice to
21.3 Constitution and Solutions of the Dirac Equation 961

describe the Hamiltonian of the system and seek the operators that commute with the
Hamiltonian.
First, let us think of the one-particle Hamiltonian Hp for the plane wave with the
positive energy. It is given by [2]

H p = γ 0 ðγ  p þ mÞ: ð21:130Þ

The derivation of (21.130) will be performed in Chap. 22 in relation to the


quantization of fields. As a matrix representation, we have

σ ∙p
Hp = m
σ ∙p -m
: ð21:131Þ

χ
Hence, operating Hp on hSχ
, we have

χ σ ∙p χ mχ þP hhSχ p0 χ χ
Hp hSχ
= m
σ ∙p -m hSχ
= P hχ - mhSχ
= hSp0 χ
= p0 hSχ
: ð21:132Þ

For the negative-energy solution, in turn, similarly operating H-p on

S~χ
Χ0 = - h~χ
, ð21:133Þ

we have

S~χ -σ ∙p S~χ χ - P hh~


χ - Sp0 χ~
H -p - h~χ
= m
-σ ∙p -m - h~χ
= mS~
P hS~χ þmh~χ
= hp0 χ~
=

S~χ
- p0 - h~χ
: ð21:134Þ

Thus, as expected we have reproduced a positive energy p0 and negative energy


-p0, to which u(p, h) and v(p, h) belong to, respectively. The spectra of Hp and H-p
are characterized by the continuous spectra as shown in Fig. 21.2. Note that energies
E ranging - m < E < m are forbidden.
We can readily check that Hp and H are commutative with each other. So are H-p
~ Thus, we find that the Dirac spinors u(p, h) and v(p, h) determined in (21.101)
and H.

Fig. 21.2 Continuous


spectra of Hp and H-p.
= +
Energies E ranging -
m < E < m are forbidden −

− 0 ℰ
962 21 The Dirac Equation

and (21.118) are the simultaneous eigenstates of the Hamiltonian and Hermitian
helicity operator of the Dirac field. Naturally, these Dirac spinors constitute the
complete set of the solutions of the Dirac equation. In Table 21.1 we summarize the
matrix form of the Hamiltonian and helicity operators relevant to the Dirac field
together with their simultaneous eigenstates and their eigenvalues.
It is worth comparing the Dirac equation with the Schrödinger equation from the
point of view of an eigenvalue problem. An energy eigenvalue equation based on the
Schrödinger equation is usually described as

HϕðxÞ = EϕðxÞ, ð1:55Þ

where H denotes the Hamiltonian. In a hydrogen atom, the total angular momentum
operator L2 is commutative with H; see (3.15). Since L2 (or M2) and Lz (or Mz) have
simultaneously been diagonalized, Lz is commutative with H; see (3.158) and
(3.159) as well. Since all H, L2, and Lz are Hermitian operators, there must exist a
complete orthonormal set of common eigenvectors (Theorem 14.14). Thus, with the
~ ðnÞ
l ðθ, ϕÞRl ðr Þ.
hydrogen atom, the solution ϕ(x) is represented by (3.300), i.e., Y m
As compared with the Schrödinger equation, the eigenvalue equation of the Dirac
equation is given by

Guðp, hÞ = 0 or ~ ðp, hÞ = 0:
Gv

In other words, we are seeking an eigenfunction u(p, h) or v(p, h) that belongs to an


eigenvalue of zero.
Since Hp and H (or H-p and H) ~ are commutative Hermitian operators, according
to Theorem 14.14 there must exist a CONS of common eigenvectors as well.
Nonetheless, H±p is not commutative with G or G ~ in general. Therefore, the said
CONS cannot be a solution of the Dirac equation as an exceptional case, however,
H±p is commutative with G and G ~ if p = 0. (We have already encountered a similar
case in Sect. 3.3 where Lx, Ly, and Lz are commutative with one another if the orbital
angular momentum L = 0, even though it is not the case with L ≠ 0.) In that
exceptional case where p = 0, both G and G ~ are Hermitian (i.e., real diagonalized
metrices), and so we have a CONS of common eigenvectors that are solutions of the
Dirac equation.
The fact that H±p (p ≠ 0) is not commutative with G or G ~ needs further
explanation. Consider an eigenvalue problem GΩ = ωΩ (or GΩ ~ ~ =ω ~ ). Then,

both the positive-energy solution and negative-energy solution of the Dirac equation
can be a solution of the above two types of eigenvalue problems. That is, both Ω and
~ take both the positive-energy and negative-energy solutions as an eigenfunction.
Ω
Nonetheless, Hp (H-p) takes only a positive (negative) energy as an eigenvalue.
Correspondingly, Hp (H-p) takes only the positive-energy (negative-energy) solu-
tion as the eigenfunction. For this reason, H±p (p ≠ 0) is not commutative with G or
~ For further discussion, see Chap. 24.
G.
21.3

Table 21.1 Hamiltonian and helicity operators of the Dirac field

Hamiltonian Helicity operator Eigenvalue


Dirac spinor Hp, H-p ℌ, ℌ Eigenstatea Energy Helicity
u(p, h) m σp σp 0 χ p0 h
Hp = ℌ = j1pj N
σp -m 0 σp hSχ
v(p, h) m 0 h
Constitution and Solutions of the Dirac Equation

-σ  p -σ  p Sχ -p0
H -p = ℌ = j -1 pj N
-σ  p -m 0 -σ p - hχ
a
With the normalization constants N and N, we have N = N = p0 þ m; see (21.138) and (21.143)
963
964 21 The Dirac Equation

In a general case where p ≠ 0, however, the eigenvectors eligible for the solution
of the Dirac equation are not likely to constitute the CONS. As already mentioned in
~ is Hermitian in the general case makes the
Sect. 21.3.2, the fact that neither G nor G
problem more complicated. Hence, we must abandon the familiar
orthonormalization method based on the inner product in favor of another way.
We develop a different method in the next section accordingly.

21.4 Normalization of the Solutions of the Dirac Equation

First, we introduce the Dirac adjoint ψ such that

ψ  ψ {γ0: ð21:135Þ

Using this notation, we have

χ χ
uðp, hÞ uðp, hÞ = N  χ { hSχ { γ 0 N = jN j2 χ { - hSχ {
hSχ hSχ
{ 2 2 {
2
= jN j χ  χ - h S χ  χ , ð21:136Þ
2
P 2m
= jN j 2 1 - S 2 = jN j2 1 - = jN j2 ,
p0 þ m p0 þ m

where χ represents either χ + or χ - and χ {  χ = 1 in either case; we have h2 = 1; with


2
the last equality we used the relation ðp0 Þ = P 2 þ m2 . The number N is the normal-
ization constant of (21.101). According to the custom, we wish to get a normalized
eigenfunction u(p, h) such that

uðp, hÞ uðp, hÞ = 2m: ð21:137Þ

This implies that |N|2 = p0 + m. Choosing N to be real and positive, we have

N= p0 þ m: ð21:138Þ

Then, from (21.101) we get

χ
uðp, hÞ = p0 þ m hSχ
: ð21:139Þ

With the eigenfunctions having different helicity h and h′ (h ≠ h′), we have


21.4 Normalization of the Solutions of the Dirac Equation 965

uðp, hÞ uðp, h0 Þ = 0 ðh ≠ h0 Þ, ð21:140Þ

because χ {h  χ h0 = 0, namely, χ h and χ h0 ðh ≠ h0 Þ are orthogonal. Notice that χ h or χ h0


denotes either χ - or χ +. Combined with the relation χ {  χ = 1, we have

χ {h  χ h0 = δhh0 :

Further combining (21.137) and (21.140), we get

uðp, hÞ uðp, h0 Þ = 2mδhh0 : ð21:141Þ

With the negative-energy solution, using (21.114) and similarly normalizing the
function given in (21.118) we obtain

vðp, hÞ vðp, h0 Þ = - 2mδhh0 , ð21:142Þ

where v(p, h) is expressed as

~
vðp, hÞ = N S~χ
= p0 þ m S~χ
:
- h~χ - h~χ

That is, the normalization constant given by

~=
N p0 þ m ð21:143Þ

for (21.118) is the same as the constant N of (21.138). Note, however, the minus sign
in (21.142). Thus, the normalization of the eigenfunctions differs from that based on
the positive definite inner product (see Theorem 13.2: Gram–Schmidt
orthonormalization Theorem).
With regard to the normalization of the plane wave solutions of the Dirac
equation, we have the following useful relations:

uðp, hÞuðp, hÞ = pμ γ μ þ m, ð21:144Þ


h = - 1, þ1

vðp, hÞvðp, hÞ = pμ γ μ - m, ð21:145Þ


h = þ1, - 1

where the summation with h should be taken as both -1 and +1. We will come back
to these important expressions for the normalization in Chap. 24 in relation to the
projection operators.
We remark the following relationship with respect to the Dirac adjoint. We had
966 21 The Dirac Equation

Guðp, hÞ = 0: ð21:92Þ

Taking the adjoint of this expression, we have

u{ ðp, hÞG{ = u{ ðp, hÞγ 0 γ 0 G{ = uðp, hÞγ 0 G{ = 0: ð21:146Þ

In (21.91) we had G = pμ γ μ - m. Taking the adjoint of both sides and taking


account of the fact that pμ and m are real, we have

G{ = pμ ðγ μ Þ{ - m = pμ γ 0 γ μ γ 0 - m, ð21:147Þ

where notice that [1]

ð γ μ Þ{ = γ 0 γ μ γ 0 : ð21:148Þ

Multiplying both sides of (21.147) by γ 0 from both the left and right, we get

γ 0 G{ γ 0 = pμ γ μ - m = G: ð21:149Þ

Moreover, multiplying (21.146) by γ 0 from the right, we get

uðp, hÞγ 0 G{ γ 0 = uðp, hÞG = 0, ð21:150Þ

where with the first equality we used (21.149).


Meanwhile, we have

ðpν γ ν Þγ μ þ γ μ ðpν γ ν Þ = pν ðγ ν γ μ þ γ μ γ ν Þ = pν  2ηνμ = 2pμ :

That is,

ðpν γ ν Þγ μ þ γ μ ðpν γ ν Þ = 2pμ : ð21:151Þ

Multiplying both sides of (21.151) by uðp, hÞ from the left and u(p, h′) from the
right, we obtain

uðp, hÞ½ðpν γ ν Þγ μ þ γ μ ðpν γ ν Þuðp, h0 Þ = 2pμ uðp, hÞuðp, h0 Þ = 2pμ 2mδhh0 , ð21:152Þ

where with the last equality we used (21.141). Using (21.92) and (21.150), we get

LHS of ð21:152Þ = uðp, hÞðmγ μ þ γ μ mÞuðp, h0 Þ = 2muðp, hÞγ μ uðp, h0 Þ: ð21:153Þ

Comparing RHS of (21.152) and (21.153), we obtain


21.5 Charge Conjugation 967

uðp, hÞγ μ uðp, h0 Þ = 2pμ δhh0 : ð21:154Þ

Likewise, we have

vðp, hÞγ μ vðp, h0 Þ = 2pμ δhh0 : ð21:155Þ

In particular, putting μ = 0 in (21.154) and (21.155) we have

uγ 0 uðp, h0 Þ = ½uðp, hÞ{ uðp, h0 Þ = 2p0 δhh0 ,

vγ 0 vðp, h0 Þ = ½vðp, hÞ{ vðp, h0 Þ = 2p0 δhh0 : ð21:156Þ

With respect to the other normalization relations, we have, e.g.,

u{ ðp, hÞvð- p, h0 Þ = v{ ðp, hÞuð- p, h0 Þ = 0: ð21:157Þ

Equation (21.157) immediately follows from the fact that u(±p, h) and v(±p, h)
belong to the different energy eigenvalues p0 and -p0, respectively; see the discus-
sion of Sect. 21.3.4.

21.5 Charge Conjugation

As in the case of (21.141) and (21.142), the normalization based upon the Dirac
adjoint keeps (21.141) and (21.142) invariant by the multiplication of a phase factor
c = eiθ (θ : real). That is, instead of u(p, h) and v(p, h) we may freely use cu(p, h) and
cv(p, h). Yet, we normally impose a pretty strong constraint on the eigenfunctions
u(p, h) and v(p, h) (vide infra). Such a constraint is known as the charge conjugation
or charge-conjugation transformation, which is a kind of discrete transformation and
thought of as the particle-antiparticle transduction. The charge-conjugation trans-
formation of a function ψ is defined by

ψ C  iγ 2 ψ  : ð21:158Þ

This transformation is defined as the interchange between a particle and an


antiparticle. The notion of antiparticle is characteristic of the quantum theory of
fields. If ψ represents the quantum state of the particle, ψ C corresponds to that of the
antiparticle and vice versa. If ψ is associated with the charge Q (=1), ψ C carries the
charge of -Q (= - 1).
The charge conjugation is related to the Dirac adjoint such that
968 21 The Dirac Equation

ψ C = Cψ T , ð21:159Þ

where C is given by

C  iγ 2 γ 0 : ð21:160Þ

That is, we have

T T
ψ C = Cψ T = iγ 2 γ 0 γ 0 ψ{ = iγ 2 γ 0 γ 0 ψ  = iγ 2 ψ  ð21:161Þ

to recover the definition of the charge conjugation of (21.158). Writing ψ  as a


column vector representation such that

ψ 1
ψ 2
ψ = , ð21:162Þ
ψ 3
ψ 4

we obtain

0 0 0 -1 ψ 1 - ψ 4
0 0 1 0 ψ 2 ψ 3
ψ C = iγ 2 ψ  = 0 1 0 0 = , ð21:163Þ
ψ 3 ψ 2
-1 0 0 0 ψ 4 - ψ 1

where we used the representation of (21.68) with γ 2. As for the representation of the
matrix C, we get

0 0 0 1
0 0 -1 0
C = iγ 2 γ 0 = 0 1 0 0 : ð21:164Þ

-1 0 0 0

The matrix C is anti-Hermitian with eigenvalues of ±i (each, double root). Also,


we have C = - C-1. As anticipated, successive charge-conjugation transformation
of ψ gives
21.5 Charge Conjugation 969


= iγ 2 ð- iÞ - γ 2 ðψ  Þ = - γ 2
2
ψ C = iγ 2 iγ 2 ψ 
C
ψ=
- ð- E 4 Þψ = ψ, ð21:165Þ

where with the second to the last equality, we used (21.63).


We apply the notion of charge conjugation to u(p, h) and v(p, h) of (21.104) and
(21.120) so that we may obtain

vðp, hÞ = iγ 2 uðp, hÞ : ð21:166Þ

Also, we have

uðp, hÞ = iγ 2 vðp, hÞ : ð21:167Þ

Using the column vector representation for (21.166), we get

iϕ 
θ
e 2 sin 2
0 0 0 -1 

0 0 1 0 e - 2 cos θ
2
iγ 2 uðp, þ1Þ = N 0 1 0 0 iϕ 
θ
Se 2 sin 2
-1 0 0 0 iϕ 
Se - 2 cos θ
2
θ
- Seiϕ=2 cos
2
- iϕ=2 θ
Se sin
2
= N θ = vðp, þ1Þ,
eiϕ=2 cos
2
θ
- e - iϕ=2 sin
2
970 21 The Dirac Equation

iϕ 
θ
e 2 cos 2
0 0 0 -1 

0 0 1 0 - e - 2 sin θ
2
iγ 2 uðp, - 1Þ = N 0 1 0 0 iϕ 
θ
- Se 2 cos 2
-1 0 0 0 iϕ 
Se - 2 sin θ
2
θ ð21:168Þ
- Seiϕ=2 sin
2
- iϕ=2 θ
- Se cos
2
=N θ = vðp, - 1Þ:
-e iϕ=2
sin
2
θ
- e - iϕ=2 cos
2

In (21.168), we used N = N. Notice that (21.168) is exactly the same as the


second equation of (21.120) including the phase factor. This feature is essential
when we deal with the quantization of the Dirac field (see Chap. 22). Also, note that
the charge conjugation holds the helicity invariant. As already mentioned, the charge
conjugation transforms a particle carrying the charge Q (= + 1) into its antiparticle
Q (= - 1) or vice versa. Therefore, the charge conjugation does not cause the change
in the movement form of a particle but is responsible for the inner transformation
between particle and antiparticle. Then, it is natural that the charge conjugation holds
the helicity invariant.
We also derive the charge-conjugation transformation in a general form as
follows:

pμ γ μ - m uðp, hÞ = 0,

# take complex conjugate:

p2 - γ 2 þ pμ γ μ - m uðp, hÞ = 0,
μ≠2

# insert iγ 2iγ 2 = E4 [(4, 4) identity matrix], before u(p, h):

p2 - γ 2 þ pμ γ μ - m iγ 2 iγ 2 uðp, hÞ = 0,
μ≠2

# use Cuðp, hÞT  iγ 2 uðp, hÞ :


21.6 Characteristics of the Gamma Matrices 971

p2 - γ 2 iγ 2 þ pμ γ μ iγ 2 - miγ 2 Cuðp, hÞT = 0,


μ≠2

# use γ μγ 2 = - γ 2γ μ (μ ≠ 2) and (γ 2)2 = - E4:

ip2 - iγ 2 pμ γ μ - miγ 2 Cuðp, hÞT = 0,


μ≠2

# multiply iγ 2 on the above expression from the left:

- p2 γ 2 - pμ γ μ - m Cuðp, hÞT = 0:
μ≠2

Thus, we obtain

- pμ γ μ - m Cuðp, hÞT = 0: ð21:169Þ

Meanwhile, as already seen, we have had

- pμ γ μ - m vðp, hÞ = 0: ð21:74Þ

Comparing (21.169) and (21.74), we get

vðp, hÞ = Cuðp, hÞT : ð21:170Þ

In an opposite way, we have

uðp, hÞ = Cvðp, hÞT : ð21:171Þ

The concepts of the Dirac adjoint and charge conjugation play an essential role in
the QED, and so we will come back to this point in Chaps. 23 and 24 to consider
these concepts in terms of the matrix algebra.

21.6 Characteristics of the Gamma Matrices

In this section we summarize important expressions and relations of the gamma


matrices. The gamma matrices were constructed using the Pauli spin matrices as in
(21.67) and their tangible forms were given by (21.68). The gamma matrices are
inherently coupled to the Lorentz transformation. The gamma matrices combine the
basic importance and practical feature in the quantum theory of fields accordingly. In
972 21 The Dirac Equation

terms of the latter aspect, we make the most of various formula of the gamma
matrices (see Chaps. 23 and 24). We summarize several of them in the last part of
this section.
We started with the discussion of the Dirac equation by introducing the gamma
matrices (Sect. 21.3.1). The key equation for this is given by

fγ μ , γ ν g = 2ημν : ð21:66Þ

As the frequently used representation of the gamma matrices, we had the Dirac
representation described by

1 0 0 σk
γ0 = , γk = ðk = 1, 2, 3Þ: ð21:67Þ
0 -1 - σk 0

In (21.67), γ 0 is an Hermitian operator and γ k (k = 1, 2, 3) is anti-Hermitian, while


both γ 0 and γ k are unitary. While sticking to the representation of γ 0 and (anti-)
Hermitian property of these operators, we wish to deal with the representation of
gamma matrices in a little bit broader context. Suppose that γ k (k = 1, 2, 3) is
represented by

a b c d
e f g h
γk = p q r s , ð21:172Þ

t u v w

where a, b, ⋯, w are arbitrary complex numbers. Taking account of

γ0γk þ γk γ0 = 0 ðk = 1, 2, 3Þ ð21:173Þ

and the fact that γ k is anti-Hermitian, (21.172) is reduced to

0 b c d
- b 0 g h
γ k = - γ k{ = - c - g 0 s : ð21:174Þ

- d - h - s 0

From the unitarity of γ k, we get

cg þ dh = 0, - bg þ ds = 0, bh - cs = 0,


cb þ hs = 0, db - gs = 0, dc þ hg = 0;
21.6 Characteristics of the Gamma Matrices 973

2 2
bj2 þ c þ dj2 = 1, b 2 þ gj2 þ h = 1,
2 2
cj2 þ g þ sj2 = 1, d 2 þ hj2 þ s = 1: ð21:175Þ

Putting b = c = h = s = 0 in (21.175), we have |d| 2=|g|2 = 1. Then, using θk and


ρk (k = 1, 2) as real parameters, we obtain

0 0 0 eiθk
0 0 eiρk 0
γk = 0 - e - iρk 0 0 ðk = 1, 2Þ: ð21:176Þ

- e - iθk 0 0 0

From (21.66), we have

- eiðθ1 - θ2 Þ 0 0 0
iðρ1 - ρ2 Þ
0 -e 0 0
γ1, γ2 =
0 0 - eiðρ2 - ρ1 Þ 0

0 0 0 - eiðθ2 - θ1 Þ
iðθ2 - θ1 Þ
-e 0 0 0
iðρ2 - ρ1 Þ
0 -e 0 0
þ = 0:
0 0 - eiðρ1 - ρ2 Þ 0

0 0 0 - eiðθ1 - θ2 Þ
ð21:177Þ

Hence,

2 cosðθ1 - θ2 Þ = 2 cosðρ1 - ρ2 Þ = 0:

This is equivalent to θ1 - θ2 = ± π/2 and ρ1 - ρ2 = ± π/2. Setting θ1 = ρ1 = 0


and θ2 = - ρ2 = π/2 in (21.176), we get γ 1 and γ 2 represented in (21.68). Similarly,
putting b = d = g = s = 0 in (21.174), we have |c| 2=|h|2 = 1 from (21.175). Using τ
and σ as real parameters, we obtain γ 3 described by

0 0 eiτ 0
0 0 0 eiσ
γ3 = - e - iτ 0 0 0 : ð21:178Þ

0 - e - iσ 0 0

Setting τ = π and σ = 0, we get γ 3 represented in (21.68).


974 21 The Dirac Equation

There is arbitrariness for choosing the gamma matrices. Multiplying both sides of
(21.66) by a unitary operator U{ from the left and U from the right and defining

γ μ  U { γ μ U, ð21:179Þ

we get

fγ μ , γ ν g = 2ημν : ð21:180Þ

That is, newly defined gamma matrices γ μ satisfy the same condition (21.66)
while retaining unitarity and (anti-)Hermiticity.
The characteristic polynomials of γ k (k = 1, 2, 3) given by (21.68) are all

2
λ2 þ 1 = 0: ð21:181Þ

That is, λ = ± i (each, doubly degenerate). Their determinants are all 1. With γ 0,
its characteristic polynomial is

2
λ2 - 1 = 0: ð21:182Þ

Hence, λ = ± 1 (each, doubly degenerate). Its determinant is 1.


We have another gamma matrix, γ 5, that is linearly independent of
μ
γ (μ = 0, 1, 2, 3). The matrix is defined as

0 0 1 0
0 0 0 1
γ 5  iγ 0 γ 1 γ 2 γ 3 = 1 0 0 0 : ð21:183Þ

0 1 0 0

The matrix γ 5 is Hermitian and unitary with eigenvalues λ = ± 1 (each, doubly


degenerate) and γ 5 is anti-commutative with γ μ (μ = 0, 1, 2, 3). The determinant of γ 5
is 1. In Table 21.2, we summarize major characteristics of the gamma matrices
including γ 5.

Table 21.2 Major characteristics of the gamma matrices


Matrix Characteristic Eigenvalue Determinant
γ0 Hermitian, unitary 1
±1 (each doubly degenerate)
γ i (i = 1, 2, 3) anti-Hermitian, unitary ±i (each doubly degenerate) 1
γ 5  iγ 0γ 1γ 2γ 3 Hermitian, unitary 1
±1 (each doubly degenerate)
21.6 Characteristics of the Gamma Matrices 975

So far, we have assumed that the gamma matrices comprise a fourth-order square
matrix. But it is not self-evident. Now, we tentatively assume that the gamma matrix
is the n-th order square matrix. From (21.62) we have

γ μ γ ν = - γ ν γ μ ðμ ≠ νÞ:

Taking the determinant of the above equation, we have

detγ μ detγ ν = ð- 1Þn detγ ν detγ μ : ð21:184Þ

Thus, we obtain (-1)n = 1. Namely, n must be an even number. In (21.184) we


know that detγ μ ≠ 0 from (21.63). We acknowledge that the linearly independent
5
matrices are E (identity matrix), γ μ (μ = 0, 1, 2, 3, 5), and 10 = products
2
consisting of two different gamma matrices. Therefore, the number of linearly
independent matrices is 16. Hence, n of (21.184) is at least 4, because we have
42 (=16) matrix elements for a (4, 4) matrix. In fact, it suffices to choose a (4, 4)
matrix for the gamma matrix. The detailed proof for n = 4 is given in literature and
website [7, 8].
Other than the characteristics of the gamma matrices summarized in Table 21.2,
various properties of the gamma matrices are summarized or tabulated in many
textbooks. We list some typical examples of them below. Among them, the contrac-
tions and trace calculations are frequently used. Useful applications can be seen in
subsequent chapters.
(i) Contractions: [1]

γ λ γ λ = 4E4 , γ λ γα γ λ = - 2γ α , ð21:185Þ
γ λ γ α γ β γ λ = 4ηαβ , γ λ γ α γ β γ ρ γ λ = - 2γ ρ γ β γ α , ð21:186Þ
γ λ ðAσ γ σ Þγ λ = - 2ðAσ γ σ Þ, γ λ ðAσ γ σ ÞðBσ γ σ Þγ λ = 4Aσ Bσ , ð21:187Þ
γ λ ðAσ γ σ ÞðBσ γ σ ÞðCσ γ σ Þγ λ = - 2ðC σ γ σ ÞðBσ γ σ ÞðAσ γ σ Þ: ð21:188Þ

Note that in the above, E4 denotes the (4, 4) identity matrix.


(ii) Traces: [1]

Tr½ðAσ γ σ ÞðBσ γ σ Þ = 4Aσ Bσ , ð21:189Þ


σ σ σ σ
Tr½ðAσ γ ÞðBσ γ ÞðC σ γ ÞðDσ γ Þ
ð21:190Þ
= 4½ðAσ Bσ ÞðC σ Dσ Þ - ðAσ C σ ÞðBσ Dσ Þ þ ðAσ Dσ ÞðBσ Cσ Þ:
976 21 The Dirac Equation

References

1. Mandl F, Shaw G (2010) Quantum field theory, 2nd edn. Wiley, Chichester
2. Itzykson C, Zuber J-B (2005) Quantum field theory. Dover, New York
3. Satake I-O (1975) Linear algebra (pure and applied mathematics). Marcel Dekker, New York
4. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
5. Møller C (1952) The theory of relativity. Oxford University Press, London
6. Kaku M (1993) Quantum field theory. Oxford University Press, New York
7. Sakamoto M (2014) Quantum field theory. Shokabo, Tokyo. (in Japanese)
8. Sakamoto M (2014). https://www.shokabo.co.jp/author/2511/answer/QFT_answer_all.pdf;
http://www.shokabo.co.jp/author/2511/answer/QFT_answer_all.pdf. (in Japanese)
Chapter 22
Quantization of Fields

In Chap. 1 we thought the canonical commutation relation as the key concept of


quantum mechanics. The canonical commutation relation is based on the commuta-
tion relation between the canonical coordinate and canonical momentum. These
quantities are represented as generalized coordinates and generalized momenta so
that these fundamental variables can conform to different coordinate systems. In this
chapter, we reformulate the canonical commutation relation as the guiding principle
of the quantization of fields. This will be done within the framework of the
Lagrangian formalism. In this chapter we deal with the scalar field, Dirac field,
and photon field (electromagnetic field). The commutation relation is first
represented as the equal-time commutation relation, as the Lagrangian formalism
singles out the time coordinate, so does the equal-time commutation relation. This
looks at first glance inconsistent with the Lorentz invariance (or covariance). How-
ever, this superficial inconsistency is worked out by introducing the invariant delta
functions. At the same time, by translating the canonical commutation relation in the
coordinate space into that in the momentum space, the calculations are eased with
respect to the field quantities.
The covariant nature is indispensable to any physical theory that is based on the
theory of relativity. In that sense, the invariant delta functions provide us with a good
example. The combination of the invariant delta function and time-ordered product
yields the Feynman propagator that is fully necessary to evaluate the interaction
between the fields. These procedures are directly associated with the construction of
the quantum electrodynamics (QED).

22.1 Lagrangian Formalism of the Fields [1]

Analytical mechanics is based upon the introduction of scalar quantities (e.g.,


Lagrangian and Hamiltonian) that are independent of the choice of individual
coordinate system. Of these quantities, Lagrangian and the associated action integral

© Springer Nature Singapore Pte Ltd. 2023 977


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_22
978 22 Quantization of Fields

play a central role in the quantum theory of fields as a powerful tool to develop the
theory and to perform calculations. We describe a brief outline of the Lagrangian
formalism [1].
For the sake of brevity, suppose that a single particle is moving under the
influence of potential U. Then, the kinetic energy T of the particle is described as

3 3
1 2 1 2
T= m_x = p, ð22:1Þ
k=1
2 k k=1
2m k

where m is a mass of the particle; xk and pk (k = 1, 2, 3) represent the x-, y-, and z-
component of the Cartesian coordinate and momentum of the particle in that
direction, respectively. In the Cartesian coordinate, we have

pk = m_xk ðk = 1, 2, 3Þ: ð22:2Þ

Here, we may modify (22.2) so that we can use any suited variables to specify the
position of the particle instead of the Cartesian coordinate xi. Examples include the
spherical coordinate introduced in Sect. 3.2. Let qk (k = 1, 2, 3) be a set of such
variables, which are said to be generalized coordinates. Suppose that xk is expressed
as a function of qk such that

x1 = x1 ðq1 , q2 , q3 Þ,
x2 = x2 ðq1 , q2 , q3 Þ,
x3 = x3 ðq1 , q2 , q3 Þ:

Differentiating xk with respect to t (i.e., time), we get

3
∂xk
x_ k = q_ ðk = 1, 2, 3Þ: ð22:3Þ
l=1
∂ql l

Assuming that x_ k depends upon both q1, q2, q3 and q_ 1 , q_ 2 , q_ 3 , x_ k can be described
by

x_ k = x_ k ðq1 , q2 , q3 , q_ 1 , q_ 2 , q_ 3 Þ: ð22:4Þ

Then, from (22.3) we obtain

∂_xk ∂x
= k: ð22:5Þ
∂q_ l ∂ql

Comparing (22.1) and (22.2), we may express pk as


22.1 Lagrangian Formalism of the Fields [1] 979

∂T
pk = m_xk = ðk = 1, 2, 3Þ: ð22:6Þ
∂_xk

Since in (22.1) T is described by x_ k ðk = 1, 2, 3Þ, T can be described by


q1 , q2 , q3 , q_ 1 , q_ 2 , q_ 3 as well through (22.4) such that

T = T ðx_ 1 , x_ 2 , x_ 3 Þ = T ðq1 , q2 , q3 , q_ 1 , q_ 2 , q_ 3 Þ: ð22:7Þ

Thus, in parallel with (22.6) we can define a generalized momentum (or conjugate
momentum) pk such that

∂T
pk  ðk = 1, 2, 3Þ: ð22:8Þ
∂q_ k

Taking account of (22.4) and applying the differentiation of a composite function


to (22.7), we have

3
∂T ∂_xl
pk = ðk = 1, 2, 3Þ: ð22:9Þ
l=1
∂_xl ∂q_ k

Using (22.5) and (22.6), we obtain

3
∂xl
pk = m_xl : ð22:10Þ
l=1
∂qk

Further differentiating (22.10) with respect to t, we get

3 3
∂xl d ∂xl
p_k = m€xl þ m_xl : ð22:11Þ
l=1
∂qk l = 1 dt ∂qk

Now, suppose that the single particle we are dealing with undergoes an infinites-
imal displacement so that q1, q2, q3 are changed by dq1, dq2, dq3. This causes
xk = xk(q1, q2, q3) to be changed by dxk such that

3
∂xk
dxk = dq : ð22:12Þ
l=1
∂ql l

If we think of the virtual displacement δxk [1], we rewrite (22.12) as


980 22 Quantization of Fields

3
∂xk
δxk = δq : ð22:13Þ
l=1
∂ql l

Suppose that the particle is subject to a force F, which exerts an infinitesimal


virtual work δW to the particle described by

3
δW = F k δxk : ð22:14Þ
k=1

Note that Fk is the x-, y-, and z-component of the original Cartesian coordinate.
Namely, F can be expressed such that

3
F= F k ek , ð22:15Þ
k=1

where ek (k = 1, 2, 3) are orthonormal basis vectors of the Cartesian coordinate that


appeared in Sect. 3.2. Also, notice that we have used the word virtual work and
symbol δW in correspondence with the virtual displacement δxk.
Measuring δW in reference to the generalized coordinate, we get

3 3 3 3 3
∂xk ∂xk
δW = F k δxk = Fk δq = Fk δql , ð22:16Þ
k=1 k=1 l=1
∂ql l l=1 k=1
∂ql

where δxk in (22.14) was replaced with δqk through (22.13) and the order of
summation was exchanged with respect to l and k. In (22.16), we define Ql as

3
∂xk
Ql  Fk : ð22:17Þ
k=1
∂ql

Then, we have

3
δW = Ql δql : ð22:18Þ
l=1

Comparing (22.14) and (22.18), we have

3 3
F k δxk = Qk δqk : ð22:19Þ
k=1 k=1
22.1 Lagrangian Formalism of the Fields [1] 981

Thus, we notice that Qk represented in reference to the generalized coordinate


system corresponds to Fk represented in reference to the original Cartesian coordi-
nate system. Hence, Qk is said to be a generalized force.
Describing the Newton’s equation of motion in reference to the Cartesian coor-
dinate, we obtain

F l = m€xl : ð22:20Þ

Substituting (22.20) for (22.11), we have

3 3 3
∂xl d ∂xl d ∂xl
p_k = Fl þ m_xl = Qk þ m_xl , ð22:21Þ
l=1
∂qk l = 1 dt ∂qk l=1
dt ∂qk

where with the second equality we used (22.17).


Once again, taking account of (22.4) and applying the differentiation of a
composite function to (22.7), we have

3 3
∂T ∂T ∂_xl ∂_xl
= = m_xl ðk = 1, 2, 3Þ, ð22:22Þ
∂qk l=1
∂_xl ∂qk l=1
∂qk

∂_xl
where with the second equality we used (22.6). Calculating ∂q by use of (22.3), we
k
obtain

3 3 3
∂_xl ∂xl 2 ∂xl 2 ∂ ∂xl dqm d
= q_ = q_ = =
∂qk m=1
∂qk ∂qm m m=1
∂qm ∂qk m m=1
∂qm ∂qk dt dt

∂xl
 :
∂qk
ð22:23Þ

The last equality of (22.23) is obtained from the differentiation of the composite
function xl = xl ðq1 , q2 , q3 , q_ 1 , q_ 2 , q_ 3 Þ with respect to t. Substituting (22.23) for RHS
of (22.22), we get

3
∂T d ∂xl
= m_xl ðk = 1, 2, 3Þ: ð22:24Þ
∂qk l=1
dt ∂qk

Replacing the second term of (22.21) with LHS of (22.24), finally we obtain a
succinct expression described by
982 22 Quantization of Fields

∂T
p_k = Qk þ : ð22:25Þ
∂qk

Using (22.8) that defined the generalized momentum, (22.25) is rewritten as

d ∂T ∂T
= Qk þ ðk = 1, 2, 3Þ: ð22:26Þ
dt ∂q_ k ∂qk

Equation (22.26) can further be transformed to a well-known form based upon the
Lagrangian L. Suppose that the force is a conservative force. In that case, Qk can be
described as

∂U
Qk = - , ð22:27Þ
∂qk

where U is potential. Then, (22.26) is further rewritten as

d ∂T ∂T ∂U
- þ = 0 ðk = 1, 2, 3Þ: ð22:28Þ
dt ∂q_ k ∂qk ∂qk

We assume that U is independent of q_ k ðk = 1, 2, 3Þ. That is, we have

∂U
= 0: ð22:29Þ
∂q_ k

Accordingly, we get

d ∂ ðT - U Þ ∂ ðT - U Þ
- = 0: ð22:30Þ
dt ∂q_ k ∂qk

Defining the Lagrangian L as

L  T - U, ð22:31Þ

we obtain

d ∂L ∂L
- = 0 ðk = 1, 2, 3Þ: ð22:32Þ
dt ∂q_ k ∂qk

Equation (22.32) is called the Lagrange’s equations of motion (or Euler–


Lagrange equation) and utilized in various fields of physics and natural science.
Although we have assumed a single particle system so far, the number of particles
does not need to be limited. If a dynamical system we are thinking of contains
n particles, the freedom of that system is 3n. However, some part of them may be put
22.1 Lagrangian Formalism of the Fields [1] 983

under a constraint; for instance, suppose that motion of some particles is restricted to
the surface of a sphere, or some particles are subject to free fall under the force of
gravity. In such cases the freedom decreases. Hence, the freedom of the dynamical
system is normally characterized by a positive integer n. Then, (22.32) is
reformulated as follows:

d ∂L ∂L
- = 0 ðk = 1, 2, ⋯, nÞ: ð22:33Þ
dt ∂q_ k ∂qk

We remark that on the condition of (22.29), (22.8) can be rewritten as

∂L ∂T
pk  = ðk = 1, 2, ⋯, nÞ: ð22:34Þ
∂q_ k ∂q_ k

The relations (22.8) and (22.34) make a turning point in our formulation. Through
the medium of (22.7) and (22.33), we express the Lagrangian L as

L = Lðq1 , ⋯, qn , q_ 1 , ⋯, q_ n , t Þ  Lðq, q,
_ t Þ, ð22:35Þ

where with the last identity q and q_ represent q1, ⋯, qn and q_ 1 , ⋯, q_ n collectively.
Consequently, pk ðk = 1, 2, ⋯, nÞ can be described as a function of q, q, _ t. In an
_ p, t Þ, but this would break the
opposite way, we could describe L as Lðq, p, t Þ or Lðq,
original functional form of L. Instead, we wish to seek another way of describing a
physical quantity P as a function of q, p, and t, i.e., Pðq, p, t Þ. Such satisfactory
prescription is called the Legendre transformation [1]. The Legendre transformation
is defined by

_ t Þ - pq:
Pðq, p, t Þ = Lðq, q, _ ð22:36Þ

The implication of (22.36) is that the physical quantity P has been described as a
_ t. Notice that Pðq, p, t Þ
function of the independent variables q, p, t instead of q, q,
_ t Þ are connected to each other through p and q.
and Lðq, q, _ The quantities q, p are said
to be canonical variables. That is, the Legendre transformation enables us to describe
P as a function of the canonical variables. The Hamiltonian H ðq, p, t Þ, another
important quantity in the analytical mechanics, is given by the Legendre transfor-
mation and defined as

n
H ðq, p, t Þ = _ t Þ:
pk q_ k - Lðq, q, ð22:37Þ
k=1

If the kinetic energy T is given by a second-order homogeneous polynomial with


respect to q_ k ðk = 1, 2, ⋯, nÞ as in (22.1), from (22.34) we have
984 22 Quantization of Fields

∂T
= pk = 2T k =q_ k : ð22:38Þ
∂q_ k

Hence, we obtain

2T k = pk q_ k :

Taking the summation of this expression, we get

n n
pk q_ k = 2 T k = 2T, ð22:39Þ
k=1 k=1

n
where T k = T. Hence, the Hamiltonian H ðq, p, t Þ given in (22.37) is described by
k=1

H ðq, p, t Þ = 2T - L = 2T - ðT - U Þ = T þ U: ð22:40Þ

Thus, the Hamiltonian represents the total energy of the dynamical system. The
Legendre transformation is often utilized in thermodynamics. Interested readers are
referred to literature [1].
Along with the Lagrange’s equations of motion, we have another important
guiding principle of the Lagrangian formalism. It is referred to as the principle of
least action or Hamilton’s principle. To formulate the principle properly, we define
an action integral S such that [2]
1
Sðq1 , ⋯, qn Þ  dtLðq1 , ⋯, qn , q_ 1 , ⋯, q_ n , t Þ: ð22:41Þ
-1

The principle of least action is simply described as

δSðq1 , ⋯, qn Þ = 0: ð22:42Þ

The principle of least action says that particles of the dynamical system move in
such a way that the action integral takes an extremum. Equations (22.41) and (22.42)
lead to the following equation:
1
δSðqÞ = dt ½Lðq þ δq, q_ þ δq,
_ t Þ - Lðq, q,
_ t Þ = 0, ð22:43Þ
-1

where q and q_ represent q1, ⋯, qn and q_ 1 , ⋯, q_ n all in one, respectively. We have


22.2 Introductory Fourier Analysis [3] 985

n
∂L ∂L
Lðq þ δq, q_ þ δq,
_ t Þ = Lðq, q,
_ tÞ þ δqk þ δq_ k : ð22:44Þ
k=1
∂qk ∂q_ k

Noting that

∂L d ∂L d ∂L d ∂L
δq_ k = δq = - δqk þ δqk , ð22:45Þ
∂q_ k dt k ∂q_ k dt ∂q_ k dt ∂q_ k

we get

1 n
∂L d ∂L d ∂L
δSðqÞ = dt δqk - δqk þ δqk
-1 k=1
∂qk dt ∂q_ k dt ∂q_ k
1 n
∂L d ∂L d ∂L
= dt δqk - þ δqk
-1 k=1
∂qk dt ∂q_ k dt ∂q_ k
1 n 1 n
∂L d ∂L d ∂L
= dt δqk - þ dt δqk
-1 k=1
∂qk dt ∂q_ k -1 k=1
dt ∂ q_ k
1 n n 1
∂L d ∂L ∂L
= dt δqk - þ δqk = 0:
-1 k=1
∂qk dt ∂q_ k k=1
∂q_ k -1

Assuming that infinitesimal variations vanish at t = ± 1, we obtain

1 n
∂L d ∂L
δSðqÞ = dt δqk - = 0: ð22:46Þ
-1 k=1
∂qk dt ∂q_ k

The integrand of (22.46) must vanish so that (22.46) can hold with any arbitrarily
chosen variations δqk. In other words, the Lagrange’s equations of motion must be
represented by

d ∂L ∂L
- = 0 ðk = 1, 2, ⋯, nÞ: ð22:33Þ
dt ∂q_ k ∂qk

In what follows, we will make the most of the Lagrangian formalism.

22.2 Introductory Fourier Analysis [3]

Fourier analysis is one of the most important mathematical techniques. It contains


the Fourier series expansion, Fourier transform, inverse Fourier transform, etc. We
dealt with these techniques in this book from place to place (e.g., in Chaps. 5 and 20).
Since the Fourier analysis, especially the Fourier transform is indispensable tool in
986 22 Quantization of Fields

the quantum theory of fields, we introduce the related techniques rather systemati-
cally in this section. This is because the field quantization is associated with the
whole space–time. Be that as it may, the major concepts related to the field
quantization have already been shown in Parts I and IV. Here, we further extend
and improve such concepts.

22.2.1 Fourier Series Expansion

Let us start with the Fourier series expansion. Consider the following SOLDE that
appeared in Example 1.1 of Sect. 1.3:

d 2 yð xÞ
þ λyðxÞ = 0: ð1:61Þ
dx2

We solved (1.61) under the Dirichlet boundary conditions (see Sect. 10.3) of
y(L ) = 0 and y(-L ) = 0 (L > 0). The normalized eigenfunctions were described by

1 π
yð xÞ = cos kx kL = þ mπ ðm = 0, 1, 2, ⋯Þ , ð1:91Þ
L 2

with λ = (2m + 1)2π 2/4L2 (m = 0, 1, 2, ⋯) or

1
yð xÞ = sin kx ½kL = nπ ðn = 1, 2, 3, ⋯Þ, ð1:92Þ
L

with λ = (2n)2π 2/4L2 (n = 1, 2, 3, ⋯).


Meanwhile, in Sect. 3.3 we dealt with essentially the same SOLDE

1 d2 ΦðϕÞ
- = η: ð3:58Þ
ΦðϕÞ dϕ2

Rewriting (3.58), we get

d 2 Φð ϕÞ
þ ηΦðϕÞ = 0 ð0 ≤ ϕ ≤ 2π Þ: ð22:47Þ
dϕ2

Under the periodic boundary conditions (BCs) of Φ(0) = Φ(2π) and


Φ′(0) = Φ′(2π), we obtained the eigenfunctions described by
22.2 Introductory Fourier Analysis [3] 987

1
ΦðϕÞ = p eimϕ ðm = 0, ± 1, ± 2, ⋯Þ, ð3:64Þ

with η = m2 (m = 0, ±1, ±2, ⋯).


The above two examples show how the different BCs (boundary conditions)
produced different eigenfunctions that corresponded to the different eigenvalues,
even though we were considering the same differential equation.When we deal with
the problems of the quantum field theory, the periodic BCs are more likely to be
encountered than the Dirichlet BCs. If we admit that with the quantum-mechanical
wave functions ψ, |ψ|2 represents the existence probability (of either particle or
field), it is natural to assume that |ψ(x)|2 → 0 at x → ± 1, where x denotes the
space–time coordinate. In light of the periodic BCs, we may safely say that ψ(x)
vanishes “smoothly” at x → ± 1.
Having the aforementioned background knowledge, let us consider the Fourier
series expansion of a function f(x) that satisfies the periodic BCs specified by

f ð- aÞ = f ðaÞ and f 0 ð- aÞ = f 0 ðaÞ ða > 0Þ: ð22:48Þ

With a system of eigenfunctions F n(x), we use

1
F n ðxÞ = p einπx=a ðn = 0, ± 1, ± 2, ⋯; a > 0Þ: ð22:49Þ
2a

Note that if we put a = π in (22.49), (22.49) is of the same form as (3.64). Then,
using (22.49) f(x) can be expanded such that

1
1
f ðxÞ = cðnÞeinπx=a ð22:50Þ
2a n= -1

with
a
c ð nÞ = e - inπx=a f ðxÞ: ð22:51Þ
-a

In the next paragraph, we further explore the characteristics and implications of


(22.50) and (22.51).

22.2.2 Fourier Integral Transforms: Fourier Transform


and Inverse Fourier Transform

Let us suppose that a → 1 in (22.50) and (22.51), keeping the BCs the same as
(22.48) to see what happens. Putting
988 22 Quantization of Fields


= k, ð22:52Þ
a

we think of its difference

Δnπ
= Δk: ð22:53Þ
a
Δk
Taking Δn = 1, we have 2a
1
= 2π . Then, (22.50) is rewritten as

1 1
Δk 1
f ðxÞ = cðnÞeikx = cðnÞeikx Δk: ð22:54Þ
2π n= -1
2π n= -1

We see from (22.53) that as a → 1, Δk → 0. Then, (22.54) can further be


converted into
1
1
f ðxÞ = cðkÞeikx dk, ð22:55Þ
2π -1

where c(n) in (22.50) was changed to c(k) through (22.52). Multiplying both sides of
(22.55) by e-iqx and integrating it with respect to x, we have
1 1 1
1
e - iqx f ðxÞdx = cðkÞeiðk - qÞx dkdx
-1 2π -1 -1
1 1
1
= cð k Þ eiðk - qÞx dx dk ð22:56Þ
2π -1 -1
1
1
= cðkÞ2πδðk - qÞdk = cðqÞ:
2π -1

In (22.56) we used a property of the δ function described by


1
1
δ ðk Þ = eikx dx: ð22:57Þ
2π -1

Changing the argument q to k in (22.56), we get


1
cð k Þ = e - ikx f ðxÞdx: ð22:58Þ
-1

The mathematical manipulation that transforms (22.55) into (22.58) is referred to


as the Fourier integral transform. The transformation from (22.58) to (22.55) is said
to be the inverse Fourier transform. These transforms are called the Fourier integral
transforms as a collective term.
Rewriting (22.55) as
22.2 Introductory Fourier Analysis [3] 989

1
1
f ðxÞ = p cðkÞeikx dk, ð22:59Þ
2π -1

where cðkÞ  p12π cðkÞ, similarly we get

p
1
- iqx 2π 1 p
e f ðxÞdx = cðk Þ2πδðk - qÞdk = 2π cðqÞ: ð22:60Þ
-1 2π -1

That is, we have


1
1
cðqÞ = p e - iqx f ðxÞdx: ð22:61Þ
2π -1
p
In (22.59) and (22.61), both f(x) and cðqÞ share 1= 2π as a coefficient of the
integral in common. Writing cðqÞ as c(q) anew, we obtain
1
1
f ðxÞ = p cðkÞeikx dk, ð22:62Þ
2π -1
1
1
cðk Þ = p e - ikx f ðxÞdx: ð22:63Þ
2π -1

In what follows, we make it a rule to represent the Fourier integral transforms as


(22.62) and (22.63), when we deal with the transformation of the canonical coordi-
nates and canonical momenta. Note, however, that when we calculate the invariant
delta functions and Feynman propagator, we use the Fourier transform pairs of
(22.55) and (22.58). Though non-essential, care should be taken accordingly.
In Sect. 5.1 we mentioned that the eigenvectors jki (k = 0, 1, 2, ⋯) (i.e., solutions
of an eigenvalue equation) form a complete orthonormal system (CONS) such that

P j kihk j = E, ð5:15Þ
k

where E is an identity operator. In (5.15) the number of jki might be either finite or
infinite, but jki was countable. A dynamical system we will be dealing with in this
chapter, however, normally has infinite and uncountable degrees of freedom. To
conform (5.15) to our present purpose, (5.15) should be modified as follows:
1
E= j kihk j dk, ð22:64Þ
-1

where E is the identity operator and k is a suitable continuous parameter.


990 22 Quantization of Fields

Both (5.15) and (22.64) can be regarded as the completeness of the projection
operators (see Sect. 14.1). Multiplying both sides of (22.64) by hxj and jfi from the
left and from the right, respectively, we get
1
hxjf i = hxjk ihkjf idk:
-1

Using the relation introduced in Sect. 10.4 as

f ðxÞ  hxjf i ð10:90Þ

and further defining

cðkÞ  hkjf i, ð22:65Þ

we have
1
f ðxÞ = hxjk icðk Þdk: ð22:66Þ
-1

Comparing (22.62) and (22.66), we obtain the following relationship [4]:

1
hxjk i = p eikx : ð22:67Þ

As in the case of (22.64), we have


1
E= j xihx j dx: ð22:68Þ
-1

Multiplying both sides of (22.68) by hkj and jfi from the left and from the right,
respectively, we get
1
hkjf i = hkjxihxjf idx: ð22:69Þ
-1

Thus, we obtain
1 1 1
1
cðk Þ = hkjxif ðxÞdx = hxjk i f ðxÞdx = p e - ikx f ðxÞdx
-1 -1 - 1 2π
1 ð22:70Þ
1 - ikx
=p e f ðxÞdx,
2π -1

where with the second equality we used (13.2). Thus, we reproduced (22.63).
22.3 Quantization of the Scalar Field [5, 6] 991

22.3 Quantization of the Scalar Field [5, 6]

22.3.1 Lagrangian Density and Action Integral

Going back to Sects. 1.1 and 21.1, the energy–momentum–mass relationship of a


particle is represented in the natural units as

E 2 = p2 þ m 2 : ð22:71Þ

As already seen in Sect. 21.1, the Klein–Gordon equation was based upon this
relationship. From (22.71), we have

E= ± p2 þ m2 , ð22:72Þ

to which both positive and negative energies are presumed. The quantization of
(neutral) scalar field (or Klein-Gordon field), Dirac field, and electromagnetic field
to be dealt with in this chapter is closely associated with this assumption. At the same
time, this caused problems at the early stage of the quantum field theory along with
the difficulty in the probabilistic interpretation of (21.9). Nevertheless, after Dirac
invented the equation named after himself (i.e., the Dirac equation), the Klein–
Gordon equation turned out to be valid as well as the Dirac equation as a basis
equation describing quantum fields [7].
Historically speaking, the quantum theory of fields was at first established as the
theory that dealt with the dynamical system consisting of photons and electrons and
referred to as quantum electrodynamics (QED). In this connection, the quantization
of scalar field is not necessarily required. In this section, however, we first describe
the quantization of the (neutral) scalar fields based on the Klein–Gordon equation. It
is because the scalar field quantization is not only the simplest example but also a
prototype of the related quantum theories of fields. Hence, we study the quantization
of scalar field in detail accordingly.
We start with the formulation of the Lagrangian density of the free Klein–Gordon
field. The Lagrangian density L is given by [5, 8]

1 μ
L= ∂ ϕ∂ ϕ - m2 ϕ2 : ð22:73Þ
2 μ

The action integral S(ϕ) follows from (22.73) such that

1 μ
Sð ϕÞ = dx L = dx ∂ ϕðxÞ∂ ϕðxÞ - m2 ½ϕðxÞ2 : ð22:74Þ
2 μ

Note that unlike (22.41) where the integration was taken only with the time
coordinate, the integration in (22.74) has been taken over the whole space–time.
That is
992 22 Quantization of Fields

1 1 1 1
dx  dx0 dx1 dx2 dx3 : ð22:75Þ
-1 -1 -1 -1

Taking variation with respect to ϕ of S(ϕ), we have

δSðϕÞ = δ dx LðxÞ

1 μ μ
= dx ∂μ δϕðxÞ ∂ ϕðxÞ þ ∂μ ϕðxÞ½∂ δϕðxÞ - m2  2ϕðxÞδϕðxÞ
2
1 μ 1 1 1
= dx ≠ μ f½δϕðxÞ∂ ϕðxÞgxμ = - 1 þ dx ≠ μ ∂μ ϕðxÞ½δϕðxÞ xμ = - 1
2 2
1 μ μ
þ dx - ∂μ ∂ ϕðxÞ - ∂ ∂μ ϕðxÞ - m2  2ϕðxÞ δϕðxÞ = 0,
2
ð22:76Þ

where the second to the last equality resulted from the integration by parts; x≠μ
(or x≠μ) in the first and second terms means the triple integral with respect to the
variables x other than xμ (or xμ). We assume that δϕ(x) → 0 at xμ (xμ) → ± 1.
Hence, with the second to the last equality of (22.76), the first and second integrals
vanish. Consequently, for δS(ϕ) = 0 to be satisfied with regard to any arbitrarily
taken δϕ(x), the integrand of the third integral must vanish. This is equivalent to

μ μ
- ∂μ ∂ ϕðxÞ - ∂ ∂μ ϕðxÞ - m2  2ϕðxÞ = 0:

That is, we obtain

μ
∂μ ∂ ϕðxÞ þ m2 ϕðxÞ = 0: ð22:77Þ

Note that in the above equation ∂μ∂μϕ(x) = ∂μ∂μϕ(x). Equation (22.77) is


identical with (21.11). Thus, we find that the Klein–Gordon equation has followed
from the principle of least action. In other words, on the basis of that principle we
obtain an equation of motion of the quantum field from the action integral described
by

Sð ϕÞ = dxL ϕðxÞ, ∂μ ϕðxÞ : ð22:78Þ

Namely, the Euler–Lagrange equation described by [2]

∂L ∂L
∂μ - =0 ð22:79Þ
∂ ∂μ ϕ ∂ϕ
22.3 Quantization of the Scalar Field [5, 6] 993

follows from δS(ϕ) = 0. The Euler–Lagrange equation can be regarded as a


relativistic extension of the Lagrange’s equations of motion (22.33). The derivation
of (22.79) is left for the readers [2].
The implication of (22.78) and (22.79) is that once L[ϕ(x), ∂μϕ(x)] is properly
formulated with a relativistic field, the equation of motion for that field can ade-
quately be derived from the principle of least action. The Lagrangian formalism of
the Dirac field and electromagnetic field will be given in Sects. 22.4 and 22.5,
respectively.

22.3.2 Equal-Time Commutation Relation and Field


Quantization

Once the Lagrangian density L has been given with the field to be quantized, the
quantization of fields is carried out as the equal-time commutation relation between
the canonical variables, i.e., generalized coordinate (or canonical coordinate) and
generalized momentum (or canonical momentum). That is, if L is given as
L[ϕ(x), ∂μϕ(x)] in (22.78) as a function of generalized coordinates ϕ(x) as well as
their derivatives ∂μϕ(x), the quantization (or canonical quantization) of the scalar
field is postulated as the canonical commutation relations expressed as

½ϕðt, xÞ, π ðt, yÞ = iδ3 ðx - yÞ,


ð22:80Þ
½ϕðt, xÞ, ϕðt, yÞ = ½π ðt, xÞ, π ðt, yÞ = 0:

In (22.80), generalized momentum π(x) is defined in analogy with (22.34) as

∂L
π : ð22:81Þ
∂ϕ_

Also, we have

δ 3 ð x - yÞ  δ x 1 - y 1 δ x 2 - y2 δ x3 - y3 :

If more than one coordinate is responsible for (22.80), that can appropriately be
indexed. In that case, the commutation relations are sometimes represented as a
matrix form (see Sect. 22.4). The first equation of (22.80) is essentially the same as
(1.140) and (1.141), which is rewritten in the natural units (i.e., ħ = 1) as

½q, p = iE, ð22:82Þ

where E is the identity operator. Thus, we notice that E of (22.82) is replaced with
δ3(x - y) in (22.80). This is because the whole spatial coordinates are responsible for
the field quantization.
994 22 Quantization of Fields

Equation (22.80) is often said to be the equal-time commutation relation. As is so


named, the equal-time commutation relation favors the time coordinate over the
other spatial coordinate. For this reason, the Lorentz covariance seems unclear, but
the covariance manifests itself later, especially when we deal with, e.g., the invariant
delta functions and the Feynman propagator.
With the free Klein–Gordon field, from (22.73) the canonical momentum π(t, x) is
described as

∂L
= ∂ ϕðt, xÞ = ϕ_ ðt, xÞ:
0
π ðt, xÞ = ð22:83Þ
∂ð∂0 ϕÞ

Using (22.83), the equal-time commutation relation is written as

½ϕðt, xÞ, π ðt, yÞ = ϕðt, xÞ, ϕ_ ðt, yÞ = iδ3 ðx - yÞ,


ð22:84Þ
½ϕðt, xÞ, ϕðt, yÞ = ϕ_ ðt, xÞ, ϕ_ ðt, yÞ = 0:

Now, we are in the position to do with our task at hand. We wish to seek Fourier
integral representation of the free Klein–Gordon field ϕ(x). The three-dimensional
representation is given by [8]

d3 k
ϕðt, xÞ = qk ðt Þeikx , ð22:85Þ
3
ð2π Þ

where d3k  dk1dk2dk3. Notice that (22.85) is the three-dimensional version of


(22.62). The Fourier integral representation of (22.85) differs from that appeared
in Sect. 22.2.2 in that the expansion coefficient qk(t) contains the time coordinate as a
parameter. Inserting (22.85) into the Klein–Gordon equation (22.77), we obtain
::
qk ðt Þ þ k2 þ m2 qk ðt Þ = 0: ð22:86Þ

This is the equation of motion of a harmonic oscillator. With a general solution of


(22.86), as in the case of (2.4) of Sect. 2.1 we have

qk ðt Þ = q1 ðkÞe - ik0 t þ q2 ðkÞeik0 t , ð22:87Þ

with

k0  k2 þ m2 ð > 0Þ: ð22:88Þ

Notice that in (22.87) each term has been factorized with respect to k and t so that
the exponential factors contain only t as a variable. The quantity k0 represents the
angular frequency of the harmonic oscillator and varies with |k|. Also, note that the
22.3 Quantization of the Scalar Field [5, 6] 995

angular frequency and energy have the same dimension in the natural units (vide
infra).
Here we assume that ϕ(x) represents the neutral scalar field so that ϕ(x) is
Hermitian. In the present case ϕ(x) is represented by a (1, 1) matrix. Whereas it is
a real c-number before the field quantization, after the field quantization it should be
regarded as a real q-number. Namely, the order of the multiplication must be
considered properly between the q-numbers.
Taking adjoint of (22.85), we have

d3 k
ϕ{ ðxÞ = qk { ðt Þe - ikx : ð22:89Þ
3
ð2π Þ

Replacing k with -k in (22.89), we get

d3 k
ϕ{ ðxÞ = q - k { ðt Þeikx : ð22:90Þ
3
ð2π Þ

For (22.90) we used

þ1 -1
d3 k ð- 1Þ3 d3 k
qk { ðt Þe - ikx = q - k { ðt Þeikx
-1 ð2π Þ 3 þ1 ð2π Þ 3

þ1
ð22:91Þ
d k
3
{
= q - k ðt Þe : ikx
-1 ð2π Þ3

In (22.91), the integration range covers a domain from -1 to +1 with each


component of k. Since the domain is symmetric relative to the origin, it suffices for
the integral calculation of (22.91) to switch k to -k in the integrand.
For ϕ(x) to be Hermitian, that is, for ϕ{(x) = ϕ(x) to hold, comparing (22.85) and
(22.90) we must have

q - k { ðt Þ = qk ðt Þ: ð22:92Þ

Meanwhile, from (22.87) we have

q - k { ðt Þ = q1 { ð- kÞeik0 t þ q2 { ð- kÞe - ik0 t : ð22:93Þ

Comparing once again (22.87) and (22.93), we obtain

q1 { ð- kÞ = q2 ðkÞ and q2 { ð- kÞ = q1 ðkÞ, ð22:94Þ


996 22 Quantization of Fields

with the first and second relations of (22.94) being identical. Substituting (22.87) for
(22.85), we get

1
d3 k
ϕðt, xÞ = q1 ðkÞe - ik0 t þ q2 ðkÞeik0 t eikx
-1 ð2π Þ 3

1 -1
d3 k ð- 1Þ3 d3 k
= q1 ðkÞe - ik0 t eikx þ q2 ð - kÞeik0 t eið- kÞx
-1 ð2π Þ 3 1 ð2π Þ 3

1 -1
d k
3
ð- 1Þ3 d3 k
= q1 ðkÞe - ik0 t eikx þ q1 { ðkÞeik0 t eið- kÞx
-1 ð2π Þ 3 1 ð2π Þ 3

1 1
d k
3
d k
3
= q1 ðkÞe - ikx þ q1 { ðkÞeikx
-1 ð2π Þ 3 -1 ð2π Þ 3

1
d k
3
= q1 ðkÞe - ikx þ q1 { ðkÞeikx ,
-1 ð2π Þ 3

ð22:95Þ

where with the second equality k of the second term has been replaced with -k; with
the third equality (22.94) was used in the second term. Note also that in (22.95)

kx  k0 x0 - kx = k2 þ m2 t - kx: ð22:96Þ

Defining a(k) as [8]

a ð kÞ  2k 0 q1 ðkÞ, ð22:97Þ

we get

d3 k
ϕð xÞ = aðkÞe - ikx þ a{ ðkÞeikx : ð22:98Þ
3
ð2π Þ ð2k0 Þ
p
The origin of the coefficient 2k0 is explained in literature [9] (vide infra).
Equation (22.98) clearly shows that ϕ(x) is Hermitian, that is ϕ{(x) = ϕ(x). We
divide (22.98) into the “positive-frequency” part ϕ+(x) and “negative-frequency”
part ϕ-(x) such that [5]
22.3 Quantization of the Scalar Field [5, 6] 997

ϕðxÞ = ϕþ ðxÞ þ ϕ - ðxÞ,


d3 k
ϕþ ð xÞ  aðkÞe - ikx ,
3
ð2π Þ ð2k0 Þ ð22:99Þ
d k
3
ϕ - ð xÞ  a{ ðkÞeikx :
3
ð2π Þ ð2k0 Þ

Note that the positive- and negative-frequency correspond to the positive- and
negative-energy, respectively. From (22.83) and (22.98) we have

0
 d3 k
π ð xÞ = ∂ ϕð xÞ = ϕð xÞ =
ð2π Þ3 ð2k0 Þ

 - ik 0 aðkÞe - ikx þ ik 0 a{ ðkÞeikx : ð22:100Þ

Next, we convert the equal-time commutation relations (22.84) into the commu-
tation relation between a(k) and a{(q). For this purpose, we solve (22.98) and
(22.100) with respect to a(k) and a{(k). To perform the calculation, we make the
most of the Fourier transform developed in Sect. 22.2.2. In particular, the three-
dimensional version of (22.57) is useful. The formula is expressed as
1
ð2π Þ3 δ3 ðkÞ = eikx d 3 x: ð22:101Þ
-1

To apply (22.101) to (22.98) and (22.100), we deform those equations such that

d3 k
ϕðt, xÞ = aðkÞe - ik0 t þ a{ ð- kÞeik0 t eikx , ð22:102Þ
3
ð2π Þ ð2k0 Þ

d3 k
ϕ_ ðt, xÞ = ð- ik 0 Þ aðkÞe - ik0 t - a{ ð- kÞeik0 t eikx : ð22:103Þ
3
ð2π Þ ð2k0 Þ

In (22.102) and (22.103), k has been replaced with -k in their second terms of
RHS. We want to solve these equations in terms of aðkÞe - ik0 t and a{ ð- kÞeik0 t . The
calculation procedures are as follows: (1) Multiply both sides of (22.102) and
(22.103) by e-iqx and integrate them with respect to x by use of (22.101) to get a
factor δ3(k - q). (2) Then, we perform the integration with respect to k to delete k.
As a consequence of the integration, k is replaced with q and k0 = k2 þ m2 is
replaced with q2 þ m2  q0 accordingly. In that way, we convert (22.102) and
(22.103) into the equations with respect to aðqÞe - iq0 t and a{ ðqÞeiq0 t . The result is
expressed as follows:
998 22 Quantization of Fields

d3 x
aðqÞe - iq0 t = i ϕ_ ðt, xÞ - iq0 ϕðt, xÞ e - iqx ,
3
ð2π Þ ð2q0 Þ
ð22:104Þ
d3 x
{
a ðqÞe iq0 t
= -i ϕ_ ðt, xÞ þ iq0 ϕðt, xÞ eiqx :
ð2π Þ3 ð2q0 Þ

Note that in the second equation of (22.104) -q was switched back to q to change
a{(-q) to a{(q).
(3) Next, we calculate the commutation relation between a(k) and a{(q). For this,
the commutator aðkÞe - ik0 t , a{ ðqÞeiq0 t can directly be calculated using the equal-
time commutation relations (22.84). The factors e - ik0 t and eiq0 t are gotten rid of in
the final stage. The relevant calculations are straightforward and greatly simplified
by virtue of the integration of the factor δ3(x - y) that comes from (22.84). The use
of (22.101) also eases computations. As a result, we get succinct equation that is
described by

q0
aðkÞe - ik0 t , a{ ðqÞeiq0 t = e - iðk0 - q0 Þt aðkÞ, a{ ðqÞ = δ3 ðk - qÞ:
k 0 q0

Since e - iðk0 - q0 Þt never vanishes, we have [a(k), a{(q)] = 0 if k ≠ q. If k = q, we


obtain k0 = k2 þ m2 = q2 þ m2 = q0 . Consequently, [a(k), a{(q)] = δ3(k - q).
Meanwhile, we have [a(k), a(q)] = [a{(k), a{(q)] = 0 with any value of k and q.
Summarizing the above results, with the commutation relations among the
annihilation and creation operators we get

aðkÞ, a{ ðqÞ = δ3 ðk - qÞ,


ð22:105Þ
½aðkÞ, aðqÞ = a{ ðkÞ, a{ ðqÞ = 0:
p
Note that the factor 2k0 of (22.97) has been introduced so that (22.105) can be
expressed in the simplest form possible. Equation (22.105) is free from the inclusion
of space–time coordinates and, hence, conveniently used for various calculations,
e.g., those of interaction between the quantum fields. Looking at (22.105), we find
that we have again reached a result similar to that obtained in Sect. 2.2 expressed as

a, a{ = 1 and a, a{ = E: ð2:24Þ

In fact, characteristics of a(k), a{(q), and a, a{ are closely related in that a{(q) and
{
a function as the creation operators and that a(k) and a act as the annihilation
operators. Thus, the above discussion clearly shows that the quantum-mechanical
description of the harmonic oscillators is of fundamental importance in the quantum
field theory. The operator a(k) is sometimes given in a simple form such that [8]
22.3 Quantization of the Scalar Field [5, 6] 999

d3 x $
aðkÞ = i eikx ∂0 ϕ , ð22:106Þ
ð2π Þ3 ð2k 0 Þ

where

$
f ∂0 g  f ð∂0 gÞ - ð∂0 f Þg: ð22:107Þ

As already shown in Chap. 2, the use of a(k) and a{(k) enables us to calculate
various physical quantities in simple algebraic forms. In the next section, we give the
calculation processes for the Hamiltonian of the free Klein–Gordon field. The
concept of the Fock space is introduced as well. The results can readily be extended
to the calculations regarding, e.g., the Dirac field and photon field.
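As a concrete illustration of this connection, the following numpy sketch (our own construction, not from the text) builds the familiar truncated matrix representation of a single harmonic-oscillator mode and confirms that $[a, a^{\dagger}] = 1$ holds away from the truncation edge and that $\omega a^{\dagger}a + \omega/2$ reproduces the ladder spectrum that will reappear in (22.120).

```python
import numpy as np

N = 8                                    # truncation dimension of the Fock space
n = np.arange(N)
a = np.diag(np.sqrt(n[1:]), k=1)         # annihilation operator: a|n> = sqrt(n)|n-1>
adag = a.conj().T                        # creation operator

comm = a @ adag - adag @ a               # [a, a^dagger]
print(np.allclose(comm[:-1, :-1], np.eye(N - 1)))   # True except at the truncation edge

omega = 2.0
H = omega * (adag @ a) + 0.5 * omega * np.eye(N)    # H = omega(a^dagger a + 1/2)
print(np.diag(H))                        # omega*(n + 1/2) for n = 0, 1, 2, ...
```

The truncation artifact in the last diagonal entry of the commutator is the finite-dimensional stand-in for the fact that $[a, a^{\dagger}] = 1$ has no faithful finite-matrix representation.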

22.3.3 Hamiltonian and Fock Space

The Hamiltonian is described in a form analogous to that of the classical theory of analytical mechanics. It is given as the spatial integration of the Hamiltonian density ℋ defined as

$$
\mathcal{H}(x) = \pi(x)\dot{\phi}(x) - \mathcal{L}\left[\phi(x); \partial_\mu\phi(x)\right]. \tag{22.108}
$$

In our present case, we have $\pi(x) = \dot{\phi}(t,\mathbf{x})$ and from (22.73) we get

$$
\mathcal{H}(x) = \frac{1}{2}\left\{[\pi(x)]^2 + [\nabla\phi(x)]^2 + [m\phi(x)]^2\right\}. \tag{22.109}
$$

The Hamiltonian H is then given by

$$
H = \int d^3x\, \mathcal{H} = \frac{1}{2}\int d^3x\, \left\{[\pi(x)]^2 + [\nabla\phi(x)]^2 + [m\phi(x)]^2\right\}. \tag{22.110}
$$
2

The first term of the above integral is

$$
\begin{aligned}
\frac{1}{2}\int d^3x\, [\pi(t,\mathbf{x})]^2 &= \frac{1}{2}\int d^3x\, \big[\dot{\phi}(t,\mathbf{x})\big]^2 \\
&= \frac{1}{2}\int d^3x \int \frac{d^3k}{\sqrt{(2\pi)^3(2k_0)}}(-ik_0)\left[a(\mathbf{k})e^{-ik_0t} - a^{\dagger}(-\mathbf{k})e^{ik_0t}\right]e^{i\mathbf{k}\cdot\mathbf{x}} \\
&\qquad\qquad \times \int \frac{d^3q}{\sqrt{(2\pi)^3(2q_0)}}(-iq_0)\left[a(\mathbf{q})e^{-iq_0t} - a^{\dagger}(-\mathbf{q})e^{iq_0t}\right]e^{i\mathbf{q}\cdot\mathbf{x}} \\
&= -\frac{1}{4(2\pi)^3}\int d^3x \int \frac{d^3k\, d^3q}{\sqrt{k_0q_0}}\, k_0q_0 \big[a(\mathbf{k})a(\mathbf{q})e^{-i(k_0+q_0)t} - a(\mathbf{k})a^{\dagger}(-\mathbf{q})e^{-i(k_0-q_0)t} \\
&\qquad\qquad - a^{\dagger}(-\mathbf{k})a(\mathbf{q})e^{i(k_0-q_0)t} + a^{\dagger}(-\mathbf{k})a^{\dagger}(-\mathbf{q})e^{i(k_0+q_0)t}\big]\, e^{i(\mathbf{k}+\mathbf{q})\cdot\mathbf{x}} \\
&= -\frac{1}{4(2\pi)^3}\int \frac{d^3k\, d^3q}{\sqrt{k_0q_0}}\left[\int d^3x\, e^{i(\mathbf{k}+\mathbf{q})\cdot\mathbf{x}}\right] k_0q_0 \big[\cdots\big] \quad [\dagger] \\
&= -\frac{1}{4(2\pi)^3}\int \frac{d^3k\, d^3q}{\sqrt{k_0q_0}}\, (2\pi)^3\delta^3(\mathbf{k}+\mathbf{q})\, k_0q_0 \big[\cdots\big] \quad [\dagger\dagger] \\
&= -\frac{1}{4}\int \frac{d^3k}{\sqrt{k_0q_0}}\, k_0q_0 \big[a(\mathbf{k})a(-\mathbf{k})e^{-i(k_0+q_0)t} - a(\mathbf{k})a^{\dagger}(\mathbf{k})e^{-i(k_0-q_0)t} \\
&\qquad\qquad - a^{\dagger}(-\mathbf{k})a(-\mathbf{k})e^{i(k_0-q_0)t} + a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{i(k_0+q_0)t}\big] \\
&= \frac{1}{4}\int \frac{d^3k}{\sqrt{k_0q_0}}\, k_0q_0 \big[-a(\mathbf{k})a(-\mathbf{k})e^{-i(k_0+q_0)t} + a(\mathbf{k})a^{\dagger}(\mathbf{k})e^{-i(k_0-q_0)t} \\
&\qquad\qquad + a^{\dagger}(-\mathbf{k})a(-\mathbf{k})e^{i(k_0-q_0)t} - a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{i(k_0+q_0)t}\big] \\
&= \frac{1}{4}\int d^3k\, k_0 \big[-a(\mathbf{k})a(-\mathbf{k})e^{-2ik_0t} + a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k}) \quad [\dagger\dagger\dagger] \\
&\qquad\qquad - a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{2ik_0t}\big],
\end{aligned} \tag{22.111}
$$

where $[\cdots]$ abbreviates the same four operator terms as in the preceding line.


In the above somewhat lengthy calculations, we have several points. Namely, at [†] we exchanged the order of the integrations $\int d^3x$ and $\int d^3k\, d^3q$ with respect to the factor $e^{i(\mathbf{k}+\mathbf{q})\cdot\mathbf{x}}$. This exchange allowed us to use (22.101) at [††]. At [†††] we switched $-\mathbf{k}$ to $\mathbf{k}$ with regard to the factor $a^{\dagger}(-\mathbf{k})a(-\mathbf{k})$. Also, we used $k_0 = q_0$: since $k_0 = \sqrt{\mathbf{k}^2+m^2}$ and $q_0 = \sqrt{\mathbf{q}^2+m^2}$, we have $k_0 = q_0$ in the case of $\mathbf{k} + \mathbf{q} = 0$, which comes from the factor $\delta^3(\mathbf{k}+\mathbf{q})$ appearing in [††].
The second and third terms of (22.110) can readily be calculated in a similar
manner. We describe the results as follows:

$$
\frac{1}{2}\int d^3x\, [\nabla\phi(x)]^2 = \frac{1}{4}\int \frac{d^3k}{\sqrt{k_0q_0}}\, \mathbf{k}^2 \big[+a(\mathbf{k})a(-\mathbf{k})e^{-2ik_0t} + a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k}) + a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{2ik_0t}\big], \tag{22.112}
$$

$$
\frac{1}{2}\int d^3x\, [m\phi(x)]^2 = \frac{1}{4}\int \frac{d^3k}{\sqrt{k_0q_0}}\, m^2 \big[+a(\mathbf{k})a(-\mathbf{k})e^{-2ik_0t} + a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k}) + a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{2ik_0t}\big]. \tag{22.113}
$$

Combining (22.112) and (22.113), we obtain

$$
\begin{aligned}
\frac{1}{2}\int d^3x \left\{[\nabla\phi(x)]^2 + [m\phi(x)]^2\right\} &= \frac{1}{4}\int \frac{d^3k}{\sqrt{k_0q_0}}\, \left(\mathbf{k}^2 + m^2\right) \\
&\quad \times \big[+a(\mathbf{k})a(-\mathbf{k})e^{-2ik_0t} + a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k}) + a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{2ik_0t}\big] \\
&= \frac{1}{4}\int d^3k\, k_0 \big[+a(\mathbf{k})a(-\mathbf{k})e^{-2ik_0t} + a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k}) + a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{2ik_0t}\big],
\end{aligned} \tag{22.114}
$$

where we used $\mathbf{k}^2 + m^2 = k_0^2 = q_0^2$. The integrals including the terms $\mp a(\mathbf{k})a(-\mathbf{k})e^{-2ik_0t}$ and $\mp a^{\dagger}(-\mathbf{k})a^{\dagger}(\mathbf{k})e^{2ik_0t}$ in (22.111) and (22.114) are canceled out. Taking account of these results, we finally get the Hamiltonian in a simple form described by
$$
\begin{aligned}
H &= \frac{1}{2}\int d^3k\, k_0 \left[a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k})\right] \\
&= \frac{1}{2}\int d^3k\, \sqrt{\mathbf{k}^2+m^2}\left[a(\mathbf{k})a^{\dagger}(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k})\right],
\end{aligned} \tag{22.115}
$$

where we explicitly described the dependence of the integrand on k.

Let us continue the discussion further. Using the commutation relation (22.105), we have

$$
\lim_{\boldsymbol{\epsilon}\to 0}\left[a(\mathbf{k}), a^{\dagger}(\mathbf{k}-\boldsymbol{\epsilon})\right] = \lim_{\boldsymbol{\epsilon}\to 0}\delta^3(\boldsymbol{\epsilon}),
$$

where $\boldsymbol{\epsilon}$ is an infinitesimal three-dimensional real vector. Then, we have

$$
H = \frac{1}{2}\int d^3k\, k_0 \left[a^{\dagger}(\mathbf{k})a(\mathbf{k}) + a^{\dagger}(\mathbf{k})a(\mathbf{k}) + \lim_{\boldsymbol{\epsilon}\to 0}\delta^3(\boldsymbol{\epsilon})\right]. \tag{22.116}
$$

From (22.101), we obtain

$$
\lim_{\boldsymbol{\epsilon}\to 0}\delta^3(\boldsymbol{\epsilon}) = \lim_{\boldsymbol{\epsilon}\to 0}\frac{1}{(2\pi)^3}\int e^{i\boldsymbol{\epsilon}\cdot\mathbf{x}}\, d^3x \to \frac{1}{(2\pi)^3}\int d^3x.
$$

Therefore, we get

$$
H = \int d^3k\, k_0\, a^{\dagger}(\mathbf{k})a(\mathbf{k}) + \frac{1}{2(2\pi)^3}\int k_0\, d^3k \int d^3x. \tag{22.117}
$$

Defining the number density operator $n(\mathbf{k})$ as

$$
n(\mathbf{k}) \equiv a^{\dagger}(\mathbf{k})a(\mathbf{k}), \tag{22.118}
$$

we have

$$
H = \int d^3k\, k_0\, n(\mathbf{k}) + \frac{1}{(2\pi)^3}\int d^3k\, d^3x\, \frac{1}{2}k_0. \tag{22.119}
$$

The second term of (22.119) means a total energy included in the whole phase space. In other words, the quantity $k_0/2$ represents the energy density per "volume" $(2\pi)^3$ of the phase space. Note here that in the natural units the dimension of time and length is the inverse of mass (mass dimension). The angular frequency and wavenumber, as well as energy and momentum, have the same dimension of mass accordingly. This is represented directly in (22.88).
Representing (2.21) in the natural units, we have
$$
H = \omega a^{\dagger}a + \frac{1}{2}\omega. \tag{22.120}
$$

The analogy between (22.117) and (22.120) is evident. The operator $n(\mathbf{k})$ is closely related to the number operator introduced in Sect. 2.2; see (2.58). The second term of (22.117) corresponds to the zero-point energy and has relevance to the discussion about the harmonic oscillator of Chap. 2. Thus, we appreciate how deeply the quantization of fields is connected to the treatment of the harmonic oscillators.
In accordance with (2.28) of Sect. 2.2, we define the state of vacuum $|0\rangle$ such that

$$
a(\mathbf{k})\,|0\rangle = 0 \quad \text{for } {}^{\forall}\mathbf{k}. \tag{22.121}
$$

Operating (22.119) on $|0\rangle$, we have

$$
\begin{aligned}
H\,|0\rangle &= \int d^3k\, k_0\, n(\mathbf{k})\,|0\rangle + \frac{1}{(2\pi)^3}\int d^3k\, d^3x\, \frac{1}{2}k_0\,|0\rangle \\
&= \frac{1}{(2\pi)^3}\int d^3k\, d^3x\, \frac{1}{2}k_0\,|0\rangle.
\end{aligned} \tag{22.122}
$$

That is, $|0\rangle$ seems to be an "eigenstate" of H that belongs to an "eigenvalue" of $\frac{1}{(2\pi)^3}\int d^3k\, d^3x\, \frac{1}{2}k_0$. Since this value diverges to infinity, it is usually ignored and we plainly state that the energy of $|0\rangle$ (i.e., the vacuum) is zero. To discuss the implication of (22.119) and (22.122) is beyond the scope of this book and, hence, we do not go into further details about this issue.
In contrast to the above argument, the following momentum operator $P^\mu$ has a well-defined meaning:

$$
P^\mu \equiv \int d^3k\, k^\mu n(\mathbf{k}) + \frac{1}{(2\pi)^3}\int d^3k\, d^3x\, \frac{1}{2}k^\mu \quad (\mu = 1, 2, 3). \tag{22.123}
$$

The second term vanishes, because $k^\mu$ ($\mu = 1, 2, 3$) is an antisymmetric (odd) function with respect to $k^\mu$ ($\mu = 1, 2, 3$). Then, we get

$$
P^\mu = \int d^3k\, k^\mu n(\mathbf{k}) \quad (\mu = 1, 2, 3). \tag{22.124}
$$

Notice from (22.119) and (22.124) that if we ignore the second term of (22.119) and the index μ is extended to 0, we have $P^0 \equiv H$. Thus, we reformulate $P^\mu$ as a four-vector such that

$$
P^\mu = \int d^3k\, k^\mu n(\mathbf{k}) \quad (\mu = 0, 1, 2, 3). \tag{22.125}
$$


Meanwhile, using the creation operators $a^{\dagger}(\mathbf{k})$, let us consider the following states:

$$
|k_1, k_2, \cdots, k_n\rangle \equiv a^{\dagger}(\mathbf{k}_1)a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle, \tag{22.126}
$$

where the freedom of the dynamical system is specified by an integer i (i = 1, ⋯, n); $k_i$ denotes the four-momentum of a particle that is present in the ith state.
Operating a{(q)a(q) on both sides of (22.126) from the left, we have

a{ ðqÞaðqÞ j k1 , k2 , ⋯kn i = a{ ðqÞaðqÞa{ ðk1 Þa{ ðk2 Þ⋯a{ ðkn Þ j 0i


= a{ ðqÞ a{ ðk1 ÞaðqÞ þ δ3 ðq - k1 Þ a{ ðk2 Þ⋯a{ ðkn Þ j 0i
= a{ ðqÞa{ ðk1 ÞaðqÞa{ ðk2 Þ⋯a{ ðkn Þ j 0i þ δ3 ðq - k1 Þa{ ðqÞa{ ðk2 Þ⋯a{ ðkn Þ j 0i
= a{ ðqÞa{ ðk1 Þ a{ ðk2 ÞaðqÞ þ δ3 ðq - k2 Þ ⋯a{ ðkn Þ j 0i
þδ3 ðq - k1 Þa{ ðqÞa{ ðk2 Þ⋯a{ ðkn Þ j 0i
= a{ ðqÞa{ ðk1 Þa{ ðk2 ÞaðqÞ⋯a{ ðkn Þ j 0i þ δ3 ðq - k2 Þa{ ðqÞa{ ðk1 Þ⋯a{ ðkn Þ j 0i
þδ3 ðq - k1 Þa{ ðqÞa{ ðk2 Þ⋯a{ ðkn Þ j 0i
=⋯
= a{ ðqÞa{ ðk1 Þa{ ðk2 Þ⋯a{ ðkn ÞaðqÞ j 0i þ δ3 ðq - kn Þa{ ðqÞa{ ðk1 Þ⋯a{ ðkn - 1 Þ j 0i
þ⋯ þ δ3 ðq - k1 Þa{ ðqÞa{ ðk2 Þ⋯a{ ðkn Þ j 0i
= δ3 ðq - kn Þa{ ðqÞa{ ðk1 Þ⋯a{ ðkn - 1 Þ j 0i
þ⋯ þ δ3 ðq - k1 Þa{ ðqÞa{ ðk2 Þ⋯a{ ðkn Þ j 0i,
ð22:127Þ

where with the second to the last equality the first term vanishes because from (22.121) we have $a(\mathbf{q})|0\rangle = 0$; with the last equality, n terms containing the $\delta^3(\mathbf{q}-\mathbf{k}_i)$ (i = 1, ⋯, n) factor survive.
Thus, from (22.118), (22.125), and (22.127) we get

$$
\begin{aligned}
P^\mu\,|k_1, k_2, \cdots, k_n\rangle &= \int d^3q\, q^\mu\, a^{\dagger}(\mathbf{q})a(\mathbf{q})\,|k_1, k_2, \cdots, k_n\rangle \\
&= \int d^3q\, q^\mu \left[\delta^3(\mathbf{q}-\mathbf{k}_n)a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_1)\cdots a^{\dagger}(\mathbf{k}_{n-1})\,|0\rangle + \cdots + \delta^3(\mathbf{q}-\mathbf{k}_1)a^{\dagger}(\mathbf{q})a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle\right] \\
&= k_n^\mu\, a^{\dagger}(\mathbf{k}_n)a^{\dagger}(\mathbf{k}_1)\cdots a^{\dagger}(\mathbf{k}_{n-1})\,|0\rangle + \cdots + k_1^\mu\, a^{\dagger}(\mathbf{k}_1)a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle \\
&= \left(k_1^\mu + \cdots + k_n^\mu\right)a^{\dagger}(\mathbf{k}_1)a^{\dagger}(\mathbf{k}_2)\cdots a^{\dagger}(\mathbf{k}_n)\,|0\rangle \\
&= \left(k_1^\mu + \cdots + k_n^\mu\right)|k_1, k_2, \cdots, k_n\rangle,
\end{aligned} \tag{22.128}
$$

where with the second to the last equality we used the second equation of (22.105); with the last equality we used the definition of (22.126). With respect to the energy of the jth state, we have

$$
k_j^0 \equiv \sqrt{\mathbf{k}_j^2 + m^2}\;\; (>0). \tag{22.129}
$$

Therefore, once $\mathbf{k}_j$ ( j = 1, ⋯, n) is determined from (22.128), $k_j^0$ is also determined from (22.129) and, hence, $k_j^\mu$ (μ = 0, 1, 2, 3) is determined accordingly. Thus, we obtain [8]

$$
P^\mu\,|k_1, k_2, \cdots, k_n\rangle = \left(k_1^\mu + \cdots + k_n^\mu\right)|k_1, k_2, \cdots, k_n\rangle \quad (\mu = 0, 1, 2, 3). \tag{22.130}
$$

From (22.130), we find that $|k_1, k_2, \cdots, k_n\rangle$ is the eigenstate of the four-momentum operator $P^\mu$ (μ = 0, 1, 2, 3) that belongs to the eigenvalue $k_1^\mu + \cdots + k_n^\mu$. The state described by $|k_1, k_2, \cdots, k_n\rangle$ is called a Fock basis. A Hilbert space spanned by the Fock basis is called a Fock space. As in the case of Chap. 2, the operator $a^{\dagger}(\mathbf{k})$ is interpreted as an operator that creates a particle having momentum k.
The calculation procedures of (22.127) naturally lead to the normal product (or normal-ordered product), where the annihilation operators $a(\mathbf{k})$ always stand to the right of the creation operators $a^{\dagger}(\mathbf{k})$. This is because, if so, any term that contains $a(\mathbf{k})|0\rangle$ as a factor vanishes by virtue of (22.121). This situation is reminiscent of (2.28), where $a|\psi_0\rangle = 0$ corresponds to $a(\mathbf{k})|0\rangle = 0$.
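The additivity of the eigenvalues in (22.130) can be mimicked on a computer by replacing the continuum of modes with two discrete modes. In the following numpy sketch (an illustration under that drastic simplification, with our own variable names and test momenta), each mode carries a truncated ladder algebra, the discrete analogue of $P$ is $\sum_k k\, n(k)$, and the two-particle state $a^{\dagger}(k_1)a^{\dagger}(k_2)|0\rangle$ is verified to be an eigenstate with eigenvalue $k_1 + k_2$.

```python
import numpy as np

N = 4                                     # per-mode Fock truncation
amode = np.diag(np.sqrt(np.arange(1, N)), k=1)
I = np.eye(N)

# Two discrete modes with momenta k1, k2 (bosonic: plain tensor products)
a1 = np.kron(amode, I)
a2 = np.kron(I, amode)
k1, k2 = 0.7, 1.3                         # hypothetical test momenta
P = k1 * (a1.conj().T @ a1) + k2 * (a2.conj().T @ a2)   # discrete analogue of (22.125)

vac = np.zeros(N * N); vac[0] = 1.0       # |0>, cf. (22.121)
state = a1.conj().T @ (a2.conj().T @ vac)               # |k1, k2>, cf. (22.126)
print(np.allclose(P @ state, (k1 + k2) * state))        # True: eigenvalue k1 + k2
```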

22.3.4 Invariant Delta Functions of the Scalar Field

We have already mentioned that the Lorentz invariance (or covariance) is not
necessarily clear with the equal-time commutation relation. In this section we
describe characteristics of the invariant delta functions of the scalar field to explicitly
show the Lorentz invariance of these functions. In this section, we deal with the
invariant delta functions of the scalar field in detail, since their usage offers a typical
example as a prototype of the field quantization and, hence, can be applied to the
Dirac field and photon field as well.
The invariant delta function is defined using a commutator as

$$
i\Delta(x-y) \equiv [\phi(x), \phi(y)], \tag{22.131}
$$

where x and y represent arbitrary space-time four-coordinates such that $x = \begin{pmatrix} t \\ \mathbf{x} \end{pmatrix}$ and $y = \begin{pmatrix} t' \\ \mathbf{y} \end{pmatrix}$. Several different invariant delta functions will appear soon. Since their definition and notation differ from one source to another, care should be taken. Note that although we first imposed the condition t = t′ on the commutation relation, that is not the case with the present consideration. Since the invariant delta function Δ(x − y) is a scalar function, as we shall see below, we must have a Lorentz invariant form for it. One of the major purposes of this section is to represent Δ(x − y) and related functions in the Lorentz invariant form.
Dividing ϕ(x) and ϕ( y) into the positive-frequency and negative-frequency parts such that [5]

$$
\phi(x) = \phi_+(x) + \phi_-(x), \qquad \phi(y) = \phi_+(y) + \phi_-(y), \tag{22.132}
$$

we have

$$
[\phi(x), \phi(y)] = [\phi_+(x), \phi_+(y)] + [\phi_+(x), \phi_-(y)] + [\phi_-(x), \phi_+(y)] + [\phi_-(x), \phi_-(y)]. \tag{22.133}
$$

From the expansion formulae of (22.99) as well as the second equation of (22.105), we find that

$$
[\phi_+(x), \phi_+(y)] = [\phi_-(x), \phi_-(y)] = 0. \tag{22.134}
$$

Then, we have

$$
[\phi(x), \phi(y)] = [\phi_+(x), \phi_-(y)] + [\phi_-(x), \phi_+(y)]. \tag{22.135}
$$

Next, applying the first equation of (22.105) to the commutator consisting of (22.99), we obtain

$$
\begin{aligned}
[\phi_+(x), \phi_-(y)] &= \int \frac{d^3k\, d^3q}{(2\pi)^3\, 2\sqrt{k_0q_0}} \left[a(\mathbf{k})e^{-ikx}, a^{\dagger}(\mathbf{q})e^{iqy}\right] \\
&= \int \frac{d^3k\, d^3q}{(2\pi)^3\, 2\sqrt{k_0q_0}}\, e^{-i(kx-qy)}\left[a(\mathbf{k}), a^{\dagger}(\mathbf{q})\right] = \int \frac{d^3k\, d^3q}{(2\pi)^3\, 2\sqrt{k_0q_0}}\, e^{-i(kx-qy)}\,\delta^3(\mathbf{k}-\mathbf{q}) \\
&= \frac{1}{(2\pi)^3}\int \frac{d^3k}{2k_0}\, e^{-ik(x-y)},
\end{aligned} \tag{22.136}
$$

where with the third equality we used the first equation of (22.105). Also, in (22.136) we used the fact that since $k_0 = \sqrt{\mathbf{k}^2+m^2}$ and $q_0 = \sqrt{\mathbf{q}^2+m^2}$ with $\mathbf{k} = \mathbf{q}$, we have $k_0 = q_0$ accordingly (vide supra). We define one of the invariant delta functions, i.e., Δ₊(x − y), as

$$
i\Delta_+(x-y) \equiv [\phi_+(x), \phi_-(y)] = \frac{1}{(2\pi)^3}\int \frac{d^3k}{2k_0}\, e^{-ik(x-y)}. \tag{22.137}
$$

Exchanging the coordinates x and y in (22.137) and reversing the order of the operators in the commutator, we get the following expression, where another invariant delta function Δ₋(x − y) is defined as [5]

$$
i\Delta_-(x-y) \equiv [\phi_-(x), \phi_+(y)] = -i\Delta_+(y-x) = -\frac{1}{(2\pi)^3}\int \frac{d^3k}{2k_0}\, e^{-ik(y-x)}. \tag{22.138}
$$

From (22.131), (22.135), (22.137), and (22.138), we have

$$
\Delta(x-y) = \Delta_+(x-y) + \Delta_-(x-y). \tag{22.139}
$$

From (22.137) and (22.138), we can readily get the following expressions described by

$$
\left(\Box_x + m^2\right)\Delta_+(x-y) = \left(\Box_x + m^2\right)\Delta_-(x-y) = 0, \qquad \left(\Box_x + m^2\right)\Delta(x-y) = 0, \tag{22.140}
$$

where □x means that the differentiation should be performed with respect to x.


Note that upon defining the invariant delta functions, the notations and equalities vary from one source to another. In this book, to avoid the complexity of the notation for "contraction," we have chosen (22.137) and (22.138) for the definitions of the invariant delta functions. The contraction is frequently used for the perturbation calculation, which will appear later in Chap. 23.
The contraction is a convenient notation to describe a time-ordered product (or T-product) that is defined as $\langle 0|T[\phi(x)\phi(y)]|0\rangle$ (see Sect. 22.3.5). The time-ordered product is represented by the following symbol:

$$
\langle 0|T[\phi(x)\phi(y)]|0\rangle \equiv \underbrace{\phi(x)\phi(y)}, \tag{22.141}
$$

where the bracket drawn beneath ϕ(x)ϕ( y) denotes the contraction. Equation (22.141) is referred to as the Feynman propagator, and the discussion about it will be made in the next section.
To show that (22.137) is explicitly Lorentz invariant, we wish to convert the integral with respect to $\mathbf{k}$ into the integral with respect to $k = \begin{pmatrix} k_0 \\ \mathbf{k} \end{pmatrix}$. For this purpose, we use the δ function. We have

$$
\begin{aligned}
\delta\left(k^2 - m^2\right) &= \delta\left(k_0^2 - \left(\mathbf{k}^2 + m^2\right)\right) \\
&= \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\left[\delta\left(k_0 - \sqrt{\mathbf{k}^2+m^2}\right) + \delta\left(k_0 + \sqrt{\mathbf{k}^2+m^2}\right)\right]. 
\end{aligned} \tag{22.142}
$$

This relation results from the following property of the δ function [10]. We have

$$
\delta[f(x)] = \sum_n \frac{\delta(x-a_n)}{|f'(a_n)|}, \tag{22.143}
$$

where $f(a_n) = 0$ and $f'(a_n) \neq 0$. As a special case, we obtain [4]

$$
\delta\left(x^2 - a^2\right) = \frac{1}{2a}\left[\delta(x-a) + \delta(x+a)\right] \quad (a > 0). \tag{22.144}
$$

Replacing x with $k_0$ and a with $\sqrt{\mathbf{k}^2+m^2}$, we get (22.142).
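The decomposition (22.144) can be checked symbolically; sympy should be able to integrate Dirac delta distributions with nonlinear arguments. The sketch below (ours) confirms that $\int f(x)\,\delta(x^2-a^2)\,dx = [f(a)+f(-a)]/2a$ for concrete test choices a = 2 and f(x) = x², which are our own assumptions for illustration.

```python
import sympy as sp

x = sp.symbols('x', real=True)
a = 2                                        # concrete positive value of a (test choice)
f = x**2                                     # test function (our choice)

# Left-hand side: integrate f(x) against the composed delta function
lhs = sp.integrate(f * sp.DiracDelta(x**2 - a**2), (x, -sp.oo, sp.oo))
# Right-hand side predicted by (22.144)
rhs = (f.subs(x, a) + f.subs(x, -a)) / (2*a)
print(lhs, rhs, sp.simplify(lhs - rhs) == 0)  # expected: 2 2 True
```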


Taking account of $k_0 > 0$ in (22.136), we wish to pick up the positive value in (22.142). Multiplying both sides of (22.142) by $\theta(k_0)$, we get

$$
\delta\left(k^2 - m^2\right)\theta(k_0) = \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\left[\delta\left(k_0 - \sqrt{\mathbf{k}^2+m^2}\right)\theta(k_0) + \delta\left(k_0 + \sqrt{\mathbf{k}^2+m^2}\right)\theta(k_0)\right]. \tag{22.145}
$$

Since the argument in the δ function is positive in the second term of (22.145), that term vanishes and we are left with

$$
\delta\left(k^2 - m^2\right)\theta(k_0) = \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\,\delta\left(k_0 - \sqrt{\mathbf{k}^2+m^2}\right). \tag{22.146}
$$

Operating an arbitrary function $f(k_0)$ on both sides of (22.146), we obtain

$$
\begin{aligned}
\delta\left(k^2 - m^2\right)\theta(k_0)f(k_0) &= \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\,\delta\left(k_0 - \sqrt{\mathbf{k}^2+m^2}\right)f(k_0) \\
&= \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\,\delta\left(k_0 - \sqrt{\mathbf{k}^2+m^2}\right)f\left(\sqrt{\mathbf{k}^2+m^2}\right),
\end{aligned} \tag{22.147}
$$

where with the second equality we used the following relation [11]:

$$
f(x)\,\delta(x-a) = f(a)\,\delta(x-a). \tag{22.148}
$$

Integrating (22.147) with respect to $k_0$, we obtain

$$
\begin{aligned}
\int dk_0\, \delta\left(k^2-m^2\right)\theta(k_0)f(k_0) &= \int dk_0\, \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\,\delta\left(k_0 - \sqrt{\mathbf{k}^2+m^2}\right)f\left(\sqrt{\mathbf{k}^2+m^2}\right) \\
&= \frac{1}{2\sqrt{\mathbf{k}^2+m^2}}\, f\left(\sqrt{\mathbf{k}^2+m^2}\right).
\end{aligned} \tag{22.149}
$$

Multiplying both sides of (22.149) by $g(\mathbf{k})$ and further integrating with respect to $\mathbf{k}$, we get

$$
\int d^3k\, dk_0\, \delta\left(k^2-m^2\right)\theta(k_0)f(k_0)g(\mathbf{k}) = \int d^4k\, \delta\left(k^2-m^2\right)\theta(k_0)f(k_0)g(\mathbf{k}) = \int d^3k\, \frac{f\left(\sqrt{\mathbf{k}^2+m^2}\right)}{2\sqrt{\mathbf{k}^2+m^2}}\, g(\mathbf{k}). \tag{22.150}
$$

We wish to use (22.150) to show that (22.137) is explicitly Lorentz invariant. To this end, we put

$$
f(k_0) \equiv e^{-ik_0\left(x^0 - y^0\right)} \quad \text{and} \quad g(\mathbf{k}) \equiv e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}. \tag{22.151}
$$

Under the condition of $k_0 = \sqrt{\mathbf{k}^2+m^2}$, in RHS of (22.150) we have

$$
\frac{f\left(\sqrt{\mathbf{k}^2+m^2}\right)}{2\sqrt{\mathbf{k}^2+m^2}}\, g(\mathbf{k}) = \frac{1}{2k_0}\, e^{-ik_0\left(x^0-y^0\right)}e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})} = \frac{1}{2k_0}\, e^{-ik(x-y)}.
$$

Applying (22.150) to (22.137), we get

$$
i\Delta_+(x-y) = \frac{1}{(2\pi)^3}\int \frac{d^3k}{2k_0}\, e^{-ik(x-y)} = \frac{1}{(2\pi)^3}\int d^4k\, \delta\left(k^2-m^2\right)\theta(k_0)\, e^{-ik(x-y)}. \tag{22.152}
$$

Note that with the first equality of (22.152) the factor $e^{-ik(x-y)}$ is given by

$$
e^{-ik(x-y)} = e^{-i\sqrt{\mathbf{k}^2+m^2}\,\left(x^0-y^0\right)}\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})} \tag{22.153}
$$

with $k_0$ replaced with $\sqrt{\mathbf{k}^2+m^2}$. With the second equality of (22.152), however, $k_0$ is changed to an explicit integration variable. Also, notice that in spite of the above change in $k_0$, the function $e^{-ik(x-y)}$ is represented in the same form throughout (22.152). The important point is that the positive number $\sqrt{\mathbf{k}^2+m^2}$ does not appear explicitly in LHS of (22.145), but that number is absorbed into the Lorentz invariant form of $\delta(k^2-m^2)\theta(k_0)$. Thus, (22.137) has successfully been converted into the explicitly Lorentz invariant form.
In a similar way, for Δ₋(x − y) we obtain another Lorentz invariant form described by

$$
i\Delta_-(x-y) = \frac{-1}{(2\pi)^3}\int d^4k\, \delta\left(k^2-m^2\right)\theta(k_0)\, e^{ik(x-y)}. \tag{22.154}
$$

From (22.137) to (22.139) as well as (22.152) and (22.154), we have

$$
\begin{aligned}
i\Delta(x-y) &= i\Delta_+(x-y) + i\Delta_-(x-y) = [\phi_+(x), \phi_-(y)] + [\phi_-(x), \phi_+(y)] \\
&= \frac{1}{(2\pi)^3}\int d^4k\, \delta\left(k^2-m^2\right)\theta(k_0)\left[e^{-ik(x-y)} - e^{ik(x-y)}\right] \\
&= \frac{-2i}{(2\pi)^3}\int d^4k\, \delta\left(k^2-m^2\right)\theta(k_0)\, \sin k(x-y).
\end{aligned} \tag{22.155}
$$

Note that as a representation equivalent to (22.155), we have

$$
i\Delta(x-y) = \frac{1}{(2\pi)^3}\int d^4k\, \delta\left(k^2-m^2\right)\varepsilon(k_0)\, e^{-ik(x-y)}, \tag{22.156}
$$

where ε(x) is defined by

$$
\varepsilon(x) \equiv x/|x|, \quad \varepsilon(0) \equiv 0 \quad (x: \text{real}). \tag{22.157}
$$

To show that (22.155) and (22.156) are identical, use the fact that ε(x) is an odd function with respect to x that is expressed as

$$
\varepsilon(x) = \theta(x) - \theta(-x). \tag{22.158}
$$

Thus, we have shown that all the above functions Δ₊(x − y), Δ₋(x − y), and Δ(x − y) are Lorentz invariant. On account of the Lorentz invariance, these functions are said to be invariant delta functions. Since the invariant delta functions are given as a scalar, they are of special importance in the quantum field theory. This situation is the same with the Dirac field and photon field as well.

[Fig. 22.1 Contours to evaluate the complex integrations with respect to the invariant delta functions Δ±(x − y) and Δ(x − y) of the scalar field. The contours C₊, C₋, and C correspond to Δ₊(x − y), Δ₋(x − y), and Δ(x − y), respectively.]
In (22.152) and (22.154), we assume that the integration range spans the whole real numbers from −∞ to +∞. As another way to represent the invariant delta functions in the Lorentz invariant form, we may choose a complex variable for $k_0$.
The contour integration method developed in Chap. 6 is useful for the present
purpose. In Fig. 22.1, we draw several contours to evaluate the complex integrations
with respect to the invariant delta functions. With these contours, the invariant delta
functions are represented as [5]

$$
\Delta_\pm(x-y) = \frac{1}{(2\pi)^4}\int_{C_\pm} d^4k\, \frac{e^{-ik(x-y)}}{m^2 - k^2}. \tag{22.159}
$$

To examine whether (22.159) is equivalent to (22.137) or (22.138), we rewrite the denominator of (22.159) as

$$
m^2 - k^2 = m^2 + \mathbf{k}^2 - k_0^2. \tag{22.160}
$$

Further defining a positive number $K \equiv \sqrt{\mathbf{k}^2+m^2}$ in accordance with (22.149) and defining the integral $I_{C_\pm}$ as

$$
I_{C_\pm} \equiv \int_{C_\pm} d^4k\, \frac{e^{-ik(x-y)}}{m^2 - k^2}, \tag{22.161}
$$

we have

$$
\begin{aligned}
I_{C_\pm} &= \int_{C_\pm} d^4k\, \frac{e^{-ik(x-y)}}{m^2-k^2} = \int_{C_\pm} d^4k\, \frac{1}{2K}\, e^{-ik(x-y)}\left[\frac{1}{k_0+K} - \frac{1}{k_0-K}\right] \\
&= \int d^3k\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\, \frac{1}{2K}\int_{C_\pm} dk_0\, e^{-ik_0\left(x^0-y^0\right)}\left[\frac{1}{k_0+K} - \frac{1}{k_0-K}\right].
\end{aligned} \tag{22.162}
$$

The function $\frac{1}{k_0+K}$ is analytic within the complex $k_0$ domain encircled by $C_+$. With the function $\frac{1}{k_0-K}$, however, a simple pole is present within the said domain (at $k_0 = K$). Therefore, choosing $C_+$ for the contour, only the second term contributes to the integral $I_{C_+}$. It is given by

$$
\begin{aligned}
I_{C_+} &= \int_{C_+} d^4k\, \frac{1}{2K}\, e^{-ik(x-y)}\left[-\frac{1}{k_0-K}\right] \\
&= \int d^3k\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\, \frac{1}{2K}\int_{C_+} dk_0\, e^{-ik_0\left(x^0-y^0\right)}\left[-\frac{1}{k_0-K}\right] \\
&= \int d^3k\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\, \frac{1}{2K}\, 2\pi i\, \mathrm{Res}\left[e^{-ik_0\left(x^0-y^0\right)}\left(-\frac{1}{k_0-K}\right)\right]_{k_0=K} \\
&= 2\pi i \int d^3k\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\, \frac{1}{2K}\left[-e^{-iK\left(x^0-y^0\right)}\right],
\end{aligned} \tag{22.163}
$$

where with the third equality we used (6.146) and (6.148). Then, from (22.159) Δ₊ is given by [5]

$$
\Delta_+(x-y) = \frac{1}{(2\pi)^4}\, I_{C_+} = \frac{-i}{(2\pi)^3}\int d^3k\, \frac{1}{2K}\, e^{-ik(x-y)}, \tag{22.164}
$$

where we have

$$
e^{-ik(x-y)} = e^{-iK\left(x^0-y^0\right)}\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}. \tag{22.165}
$$

Thus, Δ₊(x − y) of (22.159) is obviously identical to (22.137) if $C_+$ is chosen for the contour. Notice that in (22.164) $k_0$ is no longer an integration variable, but a real positive number $K = \sqrt{\mathbf{k}^2+m^2}$ as in (22.165), independent of $k_0$. In turn, the function $\frac{1}{k_0-K}$ is analytic within the complex domain encircled by $C_-$. With the function $\frac{1}{k_0+K}$, a simple pole is present within the said domain. Therefore, choosing $C_-$ for the contour, only the first term of RHS of (22.162) contributes to the integral $I_{C_-}$. It is given by
$$
\begin{aligned}
I_{C_-} &= \int_{C_-} d^4k\, \frac{1}{2K}\, e^{-ik(x-y)}\, \frac{1}{k_0+K} \\
&= \int d^3k\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\, \frac{1}{2K}\int_{C_-} dk_0\, e^{-ik_0\left(x^0-y^0\right)}\, \frac{1}{k_0+K} \\
&= \int d^3k\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\, \frac{1}{2K}\, 2\pi i\, \mathrm{Res}\left[e^{-ik_0\left(x^0-y^0\right)}\, \frac{1}{k_0+K}\right]_{k_0=-K} \\
&= 2\pi i \int d^3k\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\, \frac{1}{2K}\, e^{iK\left(x^0-y^0\right)} \\
&= 2\pi i \int d^3k\, \frac{1}{2K}\, e^{ik(x-y)},
\end{aligned} \tag{22.166}
$$

where with the third equality we used (6.148) again. Also, note that with the second to the last equality we used

$$
\int d^3k\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})} = \int d^3k\, e^{-i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}. \tag{22.167}
$$

Then, Δ₋(x − y) is given by

$$
\Delta_-(x-y) = \frac{1}{(2\pi)^4}\, I_{C_-} = \frac{i}{(2\pi)^3}\int d^3k\, \frac{1}{2K}\, e^{ik(x-y)}, \tag{22.168}
$$

where $e^{ik(x-y)} = e^{iK\left(x^0-y^0\right)}e^{-i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}$. This shows that Δ₋(x − y) of (22.159) is identical with (22.138) if $C_-$ is chosen for the contour.


Then, from (22.139), (22.164), and (22.168), we have [5]

$$
\begin{aligned}
\Delta(x-y) &= \frac{-i}{(2\pi)^3}\int d^3k\, \frac{1}{2K}\left[e^{-ik(x-y)} - e^{ik(x-y)}\right] \\
&= \frac{-1}{(2\pi)^3}\int d^3k\, \frac{1}{K}\, \sin k(x-y). 
\end{aligned} \tag{22.169}
$$

Or from (22.131), we equivalently have

$$
[\phi(x), \phi(y)] = \frac{1}{(2\pi)^3}\int d^3k\, \frac{1}{2K}\left[e^{-ik(x-y)} - e^{ik(x-y)}\right]. \tag{22.170}
$$

This relation can readily be derived from (22.137) and (22.138) as well.
Returning to the notation (22.159), we get
$$
\begin{aligned}
\Delta(x-y) &= \frac{1}{(2\pi)^4}\left(I_{C_+} + I_{C_-}\right) = \frac{1}{(2\pi)^4}\left[\int_{C_+} d^4k\, \frac{e^{-ik(x-y)}}{m^2-k^2} + \int_{C_-} d^4k\, \frac{e^{-ik(x-y)}}{m^2-k^2}\right] \\
&= \frac{1}{(2\pi)^4}\int_C d^4k\, \frac{e^{-ik(x-y)}}{m^2-k^2},
\end{aligned} \tag{22.171}
$$

where C is a contour that encircles both C+ and C- as shown in Fig. 22.1. This is
evident from (6.87) and (6.146) that relate the complex integral with residues.
We summarize the major properties of the invariant delta functions below. From (22.169), we have

$$
\Delta(x) = -i[\phi(x), \phi(0)] = -i\int \frac{d^3k}{(2\pi)^3\, 2k_0}\left[e^{-ikx} - e^{ikx}\right]. \tag{22.172}
$$

Therefore, Δ(x) = −Δ(−x) (i.e., an odd function). Putting $x^0 = 0$ in (22.172), we have

$$
\Delta(x)\big|_{x^0=0} = -i\int \frac{d^3k}{(2\pi)^3\, 2k_0}\left[e^{i\mathbf{k}\cdot\mathbf{x}} - e^{-i\mathbf{k}\cdot\mathbf{x}}\right] = 0, \tag{22.173}
$$

where with the second equality we used (22.167). Differentiating (22.172) with respect to $x^0$, we get

$$
\dot{\Delta}(x) = -i\int \frac{d^3k}{(2\pi)^3\, 2k_0}\left[(-ik_0)e^{-ikx} - ik_0\, e^{ikx}\right] = -\int \frac{d^3k}{(2\pi)^3\, 2}\left[e^{-ikx} + e^{ikx}\right]. \tag{22.174}
$$

Putting $x^0 = 0$ in (22.174), we have

$$
\dot{\Delta}(x)\big|_{x^0=0} = -\int \frac{d^3k}{(2\pi)^3\, 2}\left[e^{i\mathbf{k}\cdot\mathbf{x}} + e^{-i\mathbf{k}\cdot\mathbf{x}}\right] = -\delta^3(\mathbf{x}),
$$

where we used (22.101). As an alternative way, from (22.169) we have

$$
\Delta(x) = \frac{-1}{(2\pi)^3}\int d^3k\, \frac{1}{K}\, \sin kx.
$$

Using $(\sin kx)^{\boldsymbol{\cdot}} = k_0 \cos kx$, we obtain

$$
\dot{\Delta}(x) = \frac{-1}{(2\pi)^3}\int d^3k\, \cos kx.
$$

Then, we get

$$
\dot{\Delta}(x)\big|_{x^0=0} = \frac{-1}{(2\pi)^3}\int d^3k\, \cos(-\mathbf{k}\cdot\mathbf{x}) = \frac{-1}{(2\pi)^3}\int d^3k\, \cos(\mathbf{k}\cdot\mathbf{x}) = -\delta^3(\mathbf{x}).
$$
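The last relation has a transparent lattice analogue: on a one-dimensional ring of N sites the integral $-\frac{1}{(2\pi)^3}\int d^3k\, \cos(\mathbf{k}\cdot\mathbf{x})$ becomes $-\frac{1}{N}\sum_k \cos(2\pi k n/N)$, which equals $-\delta_{n0}$. A short numpy check (our own discretized toy model, not from the text):

```python
import numpy as np

N = 16
n = np.arange(N)                       # lattice sites
k = 2 * np.pi * np.arange(N) / N       # discrete momenta on the ring

# -(1/N) * sum_k cos(k n): lattice analogue of -int d^3k cos(k.x)/(2 pi)^3
delta_dot = -np.array([np.mean(np.cos(k * ni)) for ni in n])
print(np.allclose(delta_dot, -(n == 0).astype(float)))   # True: equals -delta_{n,0}
```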

Getting back to the equal-time commutation relation, we had

$$
[\phi(t,\mathbf{x}), \phi(t,\mathbf{y})] = 0. \tag{22.84}
$$

Using (22.131) and (22.173), however, the above equation can be rewritten as

$$
[\phi(t,\mathbf{x}), \phi(t,\mathbf{y})] = i\Delta(0, \mathbf{x}-\mathbf{y}) = 0. \tag{22.175}
$$

A world interval between the events that are labeled by (t, x) and (t, y) is imaginary (i.e., spacelike), as can be seen in (21.31). Since this nature is Lorentz invariant, we conclude that

$$
[\phi(x), \phi(y)] = 0 \quad \text{for} \quad (x-y)^2 < 0. \tag{22.176}
$$

This clearly states that the fields at any two points x and y with spacelike separation commute [5]. In other words, physical measurements of the fields at two points with spacelike separation cannot interfere with each other. This is known as the microcausality.

22.3.5 Feynman Propagator of the Scalar Field

Once the Lorentz invariant functions have been introduced, this directly leads to an important quantity called the Feynman propagator. It is expressed as a c-number directly connected to the experimental results that are dealt with in the framework of the interaction between fields. The Feynman propagator is defined as a vacuum expectation value of the time-ordered product of ϕ(x)ϕ( y), i.e.,

$$
\langle 0|T[\phi(x)\phi(y)]|0\rangle. \tag{22.177}
$$

With the above quantity, the time-ordered product T[ϕ(x)ϕ( y)] of the neutral scalar field is defined by

$$
T[\phi(x)\phi(y)] \equiv \theta\left(x^0 - y^0\right)\phi(x)\phi(y) + \theta\left(y^0 - x^0\right)\phi(y)\phi(x). \tag{22.178}
$$

The time-ordered product (abbreviated as T-product) frequently appears in the quantum field theory together with the normal-ordered product (abbreviated as N-product). Their tangible usage for practical calculations will be seen in the next chapter. Using (22.102), we have

$$
\begin{aligned}
T[\phi(x)\phi(y)] &= \theta\left(x^0-y^0\right)\int \frac{d^3k\, d^3q}{(2\pi)^3\, 2\sqrt{k_0q_0}}\left[a(\mathbf{k})e^{-ikx} + a^{\dagger}(\mathbf{k})e^{ikx}\right]\left[a(\mathbf{q})e^{-iqy} + a^{\dagger}(\mathbf{q})e^{iqy}\right] \\
&\quad + \theta\left(y^0-x^0\right)\int \frac{d^3k\, d^3q}{(2\pi)^3\, 2\sqrt{k_0q_0}}\left[a(\mathbf{q})e^{-iqy} + a^{\dagger}(\mathbf{q})e^{iqy}\right]\left[a(\mathbf{k})e^{-ikx} + a^{\dagger}(\mathbf{k})e^{ikx}\right].
\end{aligned}
$$

Each integral contains four terms, only one of which survives after taking the vacuum expectation values $\langle 0|\cdots|0\rangle$; use the equations $a(\mathbf{k})\,|0\rangle = \langle 0|\, a^{\dagger}(\mathbf{k}) = 0$. Then, we have

$$
\begin{aligned}
\langle 0|T[\phi(x)\phi(y)]|0\rangle &= \int \frac{d^3k\, d^3q}{(2\pi)^3\, 2\sqrt{k_0q_0}}\, \theta\left(x^0-y^0\right)e^{-ikx}e^{iqy}\,\langle 0|a(\mathbf{k})a^{\dagger}(\mathbf{q})|0\rangle \\
&\quad + \int \frac{d^3k\, d^3q}{(2\pi)^3\, 2\sqrt{k_0q_0}}\, \theta\left(y^0-x^0\right)e^{-iqy}e^{ikx}\,\langle 0|a(\mathbf{q})a^{\dagger}(\mathbf{k})|0\rangle.
\end{aligned} \tag{22.179}
$$

Using the first equation of (22.105), we have

$$
a(\mathbf{k})a^{\dagger}(\mathbf{q}) = a^{\dagger}(\mathbf{q})a(\mathbf{k}) + \delta^3(\mathbf{k}-\mathbf{q}), \qquad a(\mathbf{q})a^{\dagger}(\mathbf{k}) = a^{\dagger}(\mathbf{k})a(\mathbf{q}) + \delta^3(\mathbf{q}-\mathbf{k}).
$$

Hence, we obtain

$$
\langle 0|a(\mathbf{k})a^{\dagger}(\mathbf{q})|0\rangle = \langle 0|a^{\dagger}(\mathbf{q})a(\mathbf{k})|0\rangle + \langle 0|\delta^3(\mathbf{k}-\mathbf{q})|0\rangle = \delta^3(\mathbf{k}-\mathbf{q})\,\langle 0|0\rangle = \delta^3(\mathbf{k}-\mathbf{q}).
$$

Similarly, we have

$$
\langle 0|a(\mathbf{q})a^{\dagger}(\mathbf{k})|0\rangle = \delta^3(\mathbf{q}-\mathbf{k}).
$$

Substituting the above equations into (22.179) and integrating the resulting equation with respect to q, we get

$$
\langle 0|T[\phi(x)\phi(y)]|0\rangle = \int \frac{d^3k}{(2\pi)^3\, 2k_0}\left[\theta\left(x^0-y^0\right)e^{-ik(x-y)} + \theta\left(y^0-x^0\right)e^{ik(x-y)}\right], \tag{22.180}
$$

where, e.g., $e^{-ik(x-y)}$ is represented by

$$
e^{-ik(x-y)} = e^{-ik_0\left(x^0-y^0\right)+i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}; \quad k_0 = \sqrt{\mathbf{k}^2+m^2}. \tag{22.181}
$$

We call (22.180) the Feynman propagator and symbolically define it as [8]

$$
\Delta_F(x-y) \equiv \langle 0|T[\phi(x)\phi(y)]|0\rangle. \tag{22.182}
$$

Notice that the definition and notation of the invariant delta functions and the Feynman propagator differ from one source to another. Caution should be taken accordingly. In this book we define the Feynman propagator as identical with the vacuum expectation value of the time-ordered product [8].
Meanwhile, the commutator [ϕ₊(x), ϕ₋( y)] is a scalar function. Then, we have [5]

$$
i\Delta_+(x-y) = [\phi_+(x), \phi_-(y)] = \langle 0|[\phi_+(x), \phi_-(y)]|0\rangle = \langle 0|\phi_+(x)\phi_-(y)|0\rangle = \langle 0|\phi(x)\phi(y)|0\rangle. \tag{22.183}
$$

This relation is particularly important, because the invariant delta function Δ₊(x − y) is not only a scalar, but also given by a vacuum expectation value of the product function ϕ(x)ϕ( y). This aspect will become clearer when we evaluate the Feynman propagator. Switching the arguments x and y and using (22.138), we have

$$
i\Delta_+(y-x) = -i\Delta_-(x-y) = \langle 0|\phi(y)\phi(x)|0\rangle. \tag{22.184}
$$

Using (22.178) and (22.182), we get [5]

$$
\Delta_F(x-y) = i\left[\theta\left(x^0-y^0\right)\Delta_+(x-y) - \theta\left(y^0-x^0\right)\Delta_-(x-y)\right]. \tag{22.185}
$$

The Feynman propagator plays an important role in evaluating the interactions


between various fields, especially through the perturbation calculations. As in the
case of (22.155), we wish somehow to describe (22.180) in the explicitly Lorentz
invariant form. To do this, normally starting with (22.180), we arrive at the Lorentz
invariant Fourier integral representation. In the present case, however, in an opposite
manner we start with the Fourier integral representation of the time-ordered product
in the four-momentum space and show how the said representation naturally leads to
(22.180).
The Fourier integral representation of the Feynman propagator is described in an invariant form as [8]

$$
\Delta_F(x-y) = \frac{-i}{(2\pi)^4}\int d^4k\, \frac{e^{-ik(x-y)}}{m^2 - k^2 - i\epsilon}, \tag{22.186}
$$

where ε is an infinitesimal positive constant and we will take the limit ε → 0 after the whole calculation. When we dealt with the invariant delta functions in the last section, the constant ε was not included in the denominator. The presence or absence of ε comes from the difference in the integration paths of the complex integral that are chosen for the evaluation of the individual invariant functions. We will make a brief discussion on this point at the end of this section.
It is convenient to express the Feynman propagator in the momentum space (momentum space representation). In practice, taking the Fourier transform of (22.186), we have

$$
\frac{-i}{m^2 - k^2 - i\epsilon} = \int d^4x\, \Delta_F(x-y)\, e^{ik(x-y)}, \tag{22.187}
$$

where we applied the relations (22.55) and (22.58) to (22.186). For the derivation, use

$$
(2\pi)^4 \delta^4(k) = \int_{-\infty}^{\infty} e^{ikx}\, d^4x.
$$

We define $\widetilde{\Delta}_F(k)$ as

$$
\widetilde{\Delta}_F(k) \equiv \frac{-i}{m^2 - k^2 - i\epsilon}. \tag{22.188}
$$

The momentum space representation $\widetilde{\Delta}_F(k)$ is frequently used for dealing with problems of the interaction between the fields (see Chap. 23).
We modify (22.186) as follows for the sake of easy handling. Defining a positive constant K as $K \equiv \sqrt{\mathbf{k}^2+m^2}$ as before, we think of an equation

$$
A \equiv \frac{1}{2K}\left[\frac{1}{k_0 - (-K + i\epsilon)} - \frac{1}{k_0 - (K - i\epsilon)}\right]. \tag{22.189}
$$

After some calculations, (22.189) is deformed to

$$
A = \frac{1}{2K}\cdot\frac{2K - 2i\epsilon}{K^2 - k_0^2 - 2iK\epsilon + \epsilon^2} = \frac{1 - (i\epsilon/K)}{m^2 - k^2 - 2iK\epsilon + \epsilon^2}. \tag{22.190}
$$
In (22.190), 2Kε in the denominator is an infinitesimal positive constant as well; we neglect ε² as usual. The term −iε/K in the numerator does not affect the subsequent calculations and, hence, can be discarded from the beginning. Thus, (22.190) can be regarded as identical to $1/(m^2 - k^2 - i\epsilon)$. Then, instead of calculating (22.186), we are going to evaluate

$$
\Delta_F(x-y) = \frac{-i}{(2\pi)^4}\int d^4k\, \frac{1}{2K}\, e^{-ik(x-y)}\left[\frac{1}{k_0 - (-K+i\epsilon)} - \frac{1}{k_0 - (K-i\epsilon)}\right]. \tag{22.191}
$$

Rewriting (22.191), we have

$$
\Delta_F(x-y) = \frac{-i}{(2\pi)^4}\int d^3k\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\int_{-\infty}^{+\infty} dk_0\, \frac{1}{2K}\, e^{-ik_0\left(x^0-y^0\right)}\left[\frac{1}{k_0 - (-K+i\epsilon)} - \frac{1}{k_0 - (K-i\epsilon)}\right]. \tag{22.192}
$$

With the above relation, we define I as

$$
I \equiv \int_{-\infty}^{+\infty} dk_0\, \frac{1}{2K}\, e^{-ik_0\left(x^0-y^0\right)}\left[\frac{1}{k_0 - (-K+i\epsilon)} - \frac{1}{k_0 - (K-i\epsilon)}\right]. \tag{22.193}
$$

Since complex numbers are contained in the denominator of the integrand of (22.192), we wish to convert I to a complex integration with respect to $k_0$ (see Chap. 6). From (22.192) we immediately see that the integrand has simple poles at $k_0 = \pm(K - i\epsilon)$. The point is that the presence of the simple poles allows us to use (6.146) described by

$$
\oint_C f(z)\, dz = 2\pi i \sum_j \mathrm{Res}\, f\left(a_j\right). \tag{6.146}
$$

Another point is that the functions

$$
\frac{1}{k_0 - (-K+i\epsilon)} \quad \text{and} \quad \frac{1}{k_0 - (K-i\epsilon)}
$$

tend uniformly to zero as $|k_0| \to \infty$. This situation allows us to apply Jordan's lemma to the complex integration of (22.192); see Sect. 6.8, Lemma 6.6. To perform the complex integration, we have the following two cases.
[Fig. 22.2 Contours to calculate the complex integrations with respect to the Feynman propagator ΔF(x − y) of the scalar field. (a) Contour integration with the upper semicircle C. (b) Contour integration with the lower semicircle $\widetilde{C}$.]

1. Case I:
We choose an upper semicircle C for the contour integration (see Fig. 22.2a). As in the case of Example 6.4, we define $I_C$ as

$$
I_C = \oint_C dk_0\, \frac{1}{2K}\, e^{-ik_0\left(x^0-y^0\right)}\left[\frac{1}{k_0 - (-K+i\epsilon)} - \frac{1}{k_0 - (K-i\epsilon)}\right]. \tag{22.194}
$$
As the function $\frac{1}{k_0 - (K-i\epsilon)}$ is analytic on and inside the contour C, its associated contour integration vanishes due to Cauchy's integral theorem (Theorem 6.10). Hence, we are left with

$$
I_C = \oint_C dk_0\, \frac{1}{2K}\, e^{-ik_0\left(x^0-y^0\right)}\, \frac{1}{k_0 - (-K+i\epsilon)}. \tag{22.195}
$$

Making the upper semicircle $\Gamma_R$ large enough, we get

$$
\lim_{R\to\infty} I_C = \int_{-\infty}^{\infty} dk_0\, \frac{1}{2K}\, e^{-ik_0\left(x^0-y^0\right)}\, \frac{1}{k_0 - (-K+i\epsilon)} + \lim_{R\to\infty}\int_{\Gamma_R} dk_0\, \frac{1}{2K}\, e^{-ik_0\left(x^0-y^0\right)}\, \frac{1}{k_0 - (-K+i\epsilon)}. \tag{22.196}
$$

In (22.196), the contour integration is taken counter-clockwise so that $k_0 = -K + i\epsilon$ can be contained inside the closed contour (i.e., the upper semicircle C). For the contour integral along C to have a physical meaning (i.e., for Jordan's lemma to be applied to the present case), we must have $x^0 - y^0 < 0$. For this, put $k_0 = Re^{i\theta}$ (R: real positive; 0 < θ < π) so that we may get

$$
e^{-ik_0\left(x^0-y^0\right)} = e^{-iR\left(x^0-y^0\right)\cos\theta}\, e^{R\left(x^0-y^0\right)\sin\theta}.
$$

Taking the absolute value of both sides of the above equation, we obtain

$$
\left|e^{-ik_0\left(x^0-y^0\right)}\right| = \left|e^{-iR\left(x^0-y^0\right)\cos\theta}\right| \cdot \left|e^{R\left(x^0-y^0\right)\sin\theta}\right| = e^{R\left(x^0-y^0\right)\sin\theta}.
$$

We have $e^{R\left(x^0-y^0\right)\sin\theta} \to 0$ with $R \to \infty$. Since the function $\frac{1}{k_0-(-K+i\epsilon)}$ is analytic on $\Gamma_R$ and tends uniformly to zero as $R \to \infty$, the second term of (22.196) vanishes according to Lemma 6.6 (Jordan's lemma). Thus, we obtain

$$
\lim_{R\to\infty} I_C = \int_{-\infty}^{\infty} dk_0\, \frac{1}{2K}\, e^{-ik_0\left(x^0-y^0\right)}\, \frac{1}{k_0 - (-K+i\epsilon)} = I.
$$

Applying (6.146) and (6.148) to the present case, we get

$$
I = 2\pi i\, \frac{1}{2K}\left[e^{-ik_0\left(x^0-y^0\right)}\right]_{k_0 = -K+i\epsilon} = 2\pi i\, \frac{1}{2K}\, e^{(iK+\epsilon)\left(x^0-y^0\right)}.
$$

Taking ε → 0, we obtain

$$
I = 2\pi i\, \frac{1}{2K}\, e^{iK\left(x^0-y^0\right)}. \tag{22.197}
$$

2. Case II:
We choose a lower semicircle $\widetilde{C}$ for the contour integration (see Fig. 22.2b). As in Case I, we define $I_{\widetilde{C}}$ as

$$
I_{\widetilde{C}} = \oint_{\widetilde{C}} dk_0\, \frac{1}{2K}\, e^{-ik_0\left(x^0-y^0\right)}\left[\frac{1}{k_0 - (-K+i\epsilon)} - \frac{1}{k_0 - (K-i\epsilon)}\right].
$$

This time, as the function $\frac{1}{k_0 - (-K+i\epsilon)}$ is analytic on and inside the contour $\widetilde{C}$, its associated contour integration vanishes due to Cauchy's integral theorem, and so we are left with

$$
\lim_{R\to\infty} I_{\widetilde{C}} = \int_{-\infty}^{\infty} dk_0\, \frac{1}{2K}\, e^{-ik_0\left(x^0-y^0\right)}\left[-\frac{1}{k_0 - (K-i\epsilon)}\right] + \lim_{R\to\infty}\int_{\Gamma_R} dk_0\, \frac{1}{2K}\, e^{-ik_0\left(x^0-y^0\right)}\left[-\frac{1}{k_0 - (K-i\epsilon)}\right]. \tag{22.198}
$$

Using Jordan’s lemma as in Case I and applying (6.146) and (6.148) to the
present case, we get

- e - ik0 ðx - y Þ
1 0 0 1 - iK ðx0 - y0 Þ
I = - 2πi = 2πi e : ð22:199Þ
2K k0 = K 2K

In (22.199), the minus sign before 2πi with the first equality resulted from the fact that the contour integration was taken clockwise so that $k_0 = K - i\epsilon$ was contained inside the closed contour (i.e., the lower semicircle $\widetilde{C}$). Note that in this case we must have $x^0 - y^0 > 0$ for Jordan's lemma to be applied.
Combining the results of Case I and Case II and taking account of the fact that we must have $x^0 - y^0 < 0$ ($x^0 - y^0 > 0$) with Case I (Case II), we get

$$
I = 2\pi i\, \frac{1}{2K}\left[\theta\left(-x^0+y^0\right)e^{iK\left(x^0-y^0\right)} + \theta\left(x^0-y^0\right)e^{-iK\left(x^0-y^0\right)}\right]. \tag{22.200}
$$

Substituting (22.200) into (22.192), we get

$$
\begin{aligned}
\Delta_F(x-y) &= \frac{-i}{(2\pi)^4}\int d^3k\, I\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})} \\
&= \frac{1}{(2\pi)^3}\int \frac{d^3k}{2K}\, e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}\left[\theta\left(-x^0+y^0\right)e^{iK\left(x^0-y^0\right)} + \theta\left(x^0-y^0\right)e^{-iK\left(x^0-y^0\right)}\right] \\
&= \frac{1}{(2\pi)^3}\int \frac{d^3k}{2K}\left[\theta\left(-x^0+y^0\right)e^{ik(x-y)} + \theta\left(x^0-y^0\right)e^{-ik(x-y)}\right] \\
&= \langle 0|T[\phi(x)\phi(y)]|0\rangle,
\end{aligned} \tag{22.201}
$$

where we switched the integration variable k to −k in the first term of the integrand; the last equality comes from (22.180). Thus, we have shown that (22.186) properly reproduces the definition of the Feynman propagator ΔF(x − y) given by (22.182).
To calculate the invariant delta functions, we have encountered several examples of the complex integration. It is worth noting that, comparing the contour C of Fig. 22.2a and the contour C₋ of Fig. 22.1, their topological features are virtually the same. This is also the case with the topological relationship between the contour $\widetilde{C}$ of Fig. 22.2b and C₊ of Fig. 22.1. That is, the former two contours encircle only $k_0 = -K$ (or $k_0 = -K + i\epsilon$), whereas the latter two contours encircle only $k_0 = K$ (or $k_0 = K - i\epsilon$).
Comparing (22.186) with (22.159) and (22.171), we realize that the functional forms of the integrand are virtually the same except for the presence or absence of the term −iε in the denominator. With the complex integration over $k_0$, the simple poles ±K are located inside the contour for the integration regarding (22.159) and (22.171). With (22.186), however, if the term −iε were lacking, the simple poles ±K would be located on the integration path. To avoid this situation, we "raise" or "lower" the pole by adding +iε (ε > 0) or −iε to its complex coordinate, respectively, so that we can apply Cauchy's integral theorem (see Theorem 6.10) and Cauchy's integral formula (Theorem 6.11) to the contour integration [11]. After completing the relevant calculations, we take ε → 0 to evaluate the integral properly; see (22.197), for example. Nevertheless, we often do without this mathematical manipulation. An example can be seen in Chap. 23.
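The effect of the −iε prescription can also be checked numerically without any contour deformation: one integrates (22.193) along the real $k_0$ axis with a small but finite ε and compares with the closed form (22.197). The scipy sketch below does this under our own test values (K = 1.5, $x^0 - y^0 = -2$, ε = 10⁻³, all hypothetical); the residual difference shrinks as ε → 0.

```python
import numpy as np
from scipy.integrate import quad

K, tau, eps = 1.5, -2.0, 1e-3          # tau = x0 - y0 < 0 (Case I); eps > 0 small

def integrand(k0):
    # integrand of (22.193) with the poles shifted to -K + i*eps and K - i*eps
    return (np.exp(-1j * k0 * tau) / (2 * K)
            * (1.0 / (k0 + K - 1j * eps) - 1.0 / (k0 - K + 1j * eps)))

# quad works on real-valued functions, so integrate real and imaginary parts
re = quad(lambda k0: integrand(k0).real, -300, 300, points=[-K, K], limit=800)[0]
im = quad(lambda k0: integrand(k0).imag, -300, 300, points=[-K, K], limit=800)[0]
I_exact = 2j * np.pi * np.exp(1j * K * tau) / (2 * K)    # closed form (22.197)
print(abs((re + 1j * im) - I_exact))    # small; vanishes as eps -> 0 and range -> inf
```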

22.3.6 General Consideration on the Field Quantization

So far, we have described the quantization procedures of the scalar field. We will
subsequently deal with the quantization of the Dirac field and electromagnetic field
(or photon field). The quantization processes and resulting formulation have differ-
ent faces depending on different types of relativistic fields. Nevertheless, the quan-
tization methods can be integrated into a unified approach. This section links the
contents of the previous sections to those of the subsequent sections.
In Sect. 22.2.2, we had
$$
E = \int_{-\infty}^{\infty} |k\rangle\langle k|\, dk, \tag{22.64}
$$

where E is the identity operator. We extend the variable k of (22.64) to the momentum four-vector p. That is, we have

$$
E = \int_{-\infty}^{\infty} |p\rangle\langle p|\, dp^4, \tag{22.202}
$$

where $p = \begin{pmatrix} p^0 \\ \mathbf{p} \end{pmatrix}$, in which $p^0$ denotes the energy and p represents the three-dimensional momenta of a relativistic particle. We define the quantity p² as

$$
p^2 \equiv p^\mu p_\mu.
$$

Then, from (21.79) we have

$$
p^2 = \left(p^0\right)^2 - \mathbf{p}^2 = m^2 \quad \text{or} \quad p^2 - m^2 = 0.
$$

Taking account of this constraint properly, we modify (22.202) in an explicitly invariant form in such a way that [7]

$$
E = \int_{-\infty}^{\infty} \delta\left(p^2 - m^2\right) |p\rangle\langle p|\, dp^4 = \int_{-\infty}^{\infty} dp^3 \int_{-\infty}^{\infty} dp^0\, \delta\left(\left(p^0\right)^2 - \left(\mathbf{p}^2 + m^2\right)\right) |p\rangle\langle p|. \tag{22.203}
$$
-1 -1

To further modify (22.203), using (22.143) with $f(x) = x^2 - a^2$ we obtain

$$
\delta\left(x^2 - a^2\right) = \frac{\delta(x-a)}{2|a|} + \frac{\delta(x+a)}{2|-a|}.
$$

Exchanging x and a and using the fact that the δ function is an even function, we have

$$
\delta\left(x^2 - a^2\right) = \frac{\delta(x-a)}{2|x|} + \frac{\delta(x+a)}{2|-x|},
$$

where x is regarded as the variable. Replacing x with $p^0$ and a with $\sqrt{\mathbf{p}^2+m^2}$, we get
$$
\delta\left(p^2 - m^2\right) = \delta\left(\left(p^0\right)^2 - \left(\mathbf{p}^2+m^2\right)\right) = \frac{\delta\left(p^0 - \sqrt{\mathbf{p}^2+m^2}\right)}{2|p^0|} + \frac{\delta\left(p^0 + \sqrt{\mathbf{p}^2+m^2}\right)}{2|-p^0|}.
$$

Inserting this relation into (22.203), we obtain

$$
E = \int_{-\infty}^{\infty} \delta\left(p^2-m^2\right)|p\rangle\langle p|\, dp^4 = \int_{-\infty}^{\infty} dp^3 \int_{-\infty}^{\infty} dp^0 \left[\frac{\delta\left(p^0 - \sqrt{\mathbf{p}^2+m^2}\right)}{2|p^0|} + \frac{\delta\left(p^0 + \sqrt{\mathbf{p}^2+m^2}\right)}{2|-p^0|}\right]|p\rangle\langle p|.
$$

Defining $|p\rangle \equiv |\mathbf{p}, p^0\rangle$ along with a real positive number $E_p \equiv \sqrt{\mathbf{p}^2+m^2}$ and using a property of the δ function, we get

$$
E = \int_{-\infty}^{\infty} dp^3\, \frac{1}{2E_p}\left[\,|\mathbf{p}, E_p\rangle\langle\mathbf{p}, E_p| + |\mathbf{p}, -E_p\rangle\langle\mathbf{p}, -E_p|\,\right].
$$

Operating ⟨x| and |Φ⟩ on both sides of the above equation from the left and right, respectively, we get

$$
\langle x|E|\Phi\rangle = \langle x|\Phi\rangle = \int_{-\infty}^{\infty} \frac{dp^3}{2E_p}\left[\langle x|\mathbf{p}, E_p\rangle\langle\mathbf{p}, E_p|\Phi\rangle + \langle x|\mathbf{p}, -E_p\rangle\langle\mathbf{p}, -E_p|\Phi\rangle\right], \tag{22.204}
$$

where |x⟩ represents the continuous space-time basis vectors of the Minkowski space (see Fig. 10.1, which shows a one-dimensional case); |Φ⟩ denotes the quantum state of the relativistic particle (photon, electron, etc.) we are thinking of. According to (10.90), we express ⟨x|Φ⟩ of (22.204) as

$$
\Phi(x) \equiv \langle x|\Phi\rangle.
$$

Also defining $|x\rangle \equiv |\mathbf{x}, t\rangle$, we rewrite (22.204) as

$$
\Phi(x) = \int_{-\infty}^{\infty} \frac{dp^3}{2E_p}\left[\langle \mathbf{x}, t|\mathbf{p}, E_p\rangle\langle\mathbf{p}, E_p|\Phi\rangle + \langle \mathbf{x}, t|\mathbf{p}, -E_p\rangle\langle\mathbf{p}, -E_p|\Phi\rangle\right].
$$

Using (22.67), we have


$$
\langle \mathbf{x}, t|\mathbf{p}, \pm E_p\rangle = \frac{1}{\sqrt{(2\pi)^3}}\, e^{i\mathbf{p}\cdot\mathbf{x}}. \tag{22.205}
$$

In the above equation, neither t nor $E_p$ can be regarded as a Fourier component because of the presence of δ( p² − m²) in the integrand of (22.203). In other words, if we choose p as a freely changeable parameter over the range (−∞, +∞), we cannot deal with $p^0$ as an independent parameter on account of the factor δ( p² − m²) but have to regard it as a dependent parameter $p^0 = \pm\sqrt{\mathbf{p}^2+m^2}$.
The factor ⟨p, ±Ep|Φ⟩ describes the quantum state of the free particle in the positive-energy state and negative-energy state, respectively. The former (latter) state contains $e^{-iE_pt}$ ($e^{iE_pt}$) as a factor accordingly, independent of the particle species. The factor ⟨p, ±Ep|Φ⟩ reflects the nature of the individual particle species. As already seen in Chap. 21, for example, ⟨p, ±Ep|Φ⟩ denotes a spinor with the Dirac field. As for the photon field, ⟨p, ±Ep|Φ⟩ represents an electromagnetic four-vector, as we will see later. More importantly, ⟨p, ±Ep|Φ⟩ contains a q-number, i.e., an operator. Normally, in the quantum field theory, the positive-energy (or negative-energy) state is accompanied by the annihilation (or creation) operator. The factor ⟨p, ±Ep|Φ⟩ must properly be "normalized."
Including the aforementioned characteristics, Φ(x) is described as

$$
\Phi(x) = \int_{-\infty}^{\infty} \frac{dp^3}{2E_p}\, \frac{1}{\sqrt{(2\pi)^3}}\, e^{i\mathbf{p}\cdot\mathbf{x}}\left[e^{-iE_pt}\alpha_+(\mathbf{p})\Phi_+\left(\mathbf{p}, E_p\right) + e^{iE_pt}\alpha_-^{\dagger}(\mathbf{p})\Phi_-\left(\mathbf{p}, -E_p\right)\right], \tag{22.206}
$$

where α₊(p) and $\alpha_-^{\dagger}(\mathbf{p})$ are the annihilation and creation operators (i.e., q-numbers), respectively. In (22.206) we have defined

$$
\langle\mathbf{p}, E_p|\Phi\rangle \equiv \alpha_+(\mathbf{p})\Phi_+\left(\mathbf{p}, E_p\right) \quad \text{and} \quad \langle\mathbf{p}, -E_p|\Phi\rangle \equiv \alpha_-^{\dagger}(\mathbf{p})\Phi_-\left(\mathbf{p}, -E_p\right).
$$

Even though Φ±(p, ±Ep) contains the representation of a spinor, vector, etc., it is a c-number. Further defining the operators such that [8]

$$
a_+(\mathbf{p}) \equiv \sqrt{2E_p}\,\alpha_+(\mathbf{p}) \quad \text{and} \quad a_-^{\dagger}(\mathbf{p}) \equiv \sqrt{2E_p}\,\alpha_-^{\dagger}(\mathbf{p}),
$$

we obtain
$$
\Phi(x) = \int_{-\infty}^{\infty} \frac{dp^3}{\sqrt{(2\pi)^3\, 2E_p}}\, e^{i\mathbf{p}\cdot\mathbf{x}}\left[e^{-iE_pt}a_+(\mathbf{p})\Phi_+\left(\mathbf{p}, E_p\right) + e^{iE_pt}a_-^{\dagger}(\mathbf{p})\Phi_-\left(\mathbf{p}, -E_p\right)\right]. \tag{22.207}
$$

Then, we get

$$
\begin{aligned}
\Phi(x) &= \int_{-\infty}^{\infty} \frac{dp^3}{\sqrt{(2\pi)^3\, 2E_p}}\left[e^{i\mathbf{p}\cdot\mathbf{x}}e^{-iE_pt}a_+(\mathbf{p})\Phi_+\left(\mathbf{p}, E_p\right) + e^{-i\mathbf{p}\cdot\mathbf{x}}e^{iE_pt}a_-^{\dagger}(-\mathbf{p})\Phi_-\left(-\mathbf{p}, -E_p\right)\right] \\
&= \int_{-\infty}^{\infty} \frac{dp^3}{\sqrt{(2\pi)^3\, 2E_p}}\left[e^{-ipx}a_+(\mathbf{p})\Phi_+\left(\mathbf{p}, E_p\right) + e^{ipx}a_-^{\dagger}(-\mathbf{p})\Phi_-\left(-\mathbf{p}, -E_p\right)\right],
\end{aligned} \tag{22.208}
$$

where with the second equality $e^{-ipx} = e^{i\mathbf{p}\cdot\mathbf{x}}e^{-iE_pt}$ and $e^{ipx} = e^{-i\mathbf{p}\cdot\mathbf{x}}e^{iE_pt}$; in the second term of RHS, p was switched to −p. In (22.208), e.g., $b_-^{\dagger}(\mathbf{p})$ is often used instead of $a_-^{\dagger}(-\mathbf{p})$ for the sake of distinction and clarity.
The quantization of the relativistic fields is based upon (22.208) in common with regard to the neutral scalar field, the Dirac field, and the photon field. The first term of (22.208) is related to the positive-energy state that is characterized by $a_+(\mathbf{p})$ and $\Phi_+(\mathbf{p}, E_p)$. The second term, in turn, is associated with the negative-energy state characterized by $a_-^{\dagger}(-\mathbf{p})$ and $\Phi_-(-\mathbf{p}, -E_p)$. In Sect. 22.3.2, we established a basic equation (22.98) as a starting point for the quantization of the scalar field. It was unclear whether (22.98) is Lorentz invariant. Its equivalent expression (22.208), however, has been derived from the explicitly invariant equation (22.203). Thus, this enables us to adopt (22.208) as a fundamental equation for the quantization of relativistic fields.
As in (22.208), it is always possible to separate (or decompose) the field operator Φ(x) into the positive-energy and negative-energy parts. Defining

$$
\Phi(x) = \Phi_+(x) + \Phi_-(x),
$$

we have
$$
\begin{aligned}
\Phi_+(x) &\equiv \int_{-\infty}^{\infty} \frac{dp^3}{\sqrt{(2\pi)^3\, 2E_p}}\, e^{-ipx}\, a_+(\mathbf{p})\Phi_+\left(\mathbf{p}, E_p\right), \\
\Phi_-(x) &\equiv \int_{-\infty}^{\infty} \frac{dp^3}{\sqrt{(2\pi)^3\, 2E_p}}\, e^{ipx}\, a_-^{\dagger}(-\mathbf{p})\Phi_-\left(-\mathbf{p}, -E_p\right).
\end{aligned} \tag{22.209}
$$

Using these equations, we obtain various useful representations regarding, e.g., commutation relations, the invariant delta functions, and Feynman propagators (see Sects. 22.3.4 and 22.3.5 as well as the subsequent Sects. 22.4 and 22.5).
In the case of the neutral scalar field (or Klein-Gordon field), we do not have to distinguish $a_\pm(\pm\mathbf{p})$ but only have to take account of $a(\mathbf{p})$ [and $a^{\dagger}(\mathbf{p})$]. Also in the neutral scalar case, as a normalization condition to be satisfied in (22.208) we may put

$$
\Phi_+\left(\mathbf{p}, E_p\right) \equiv \Phi_-\left(-\mathbf{p}, -E_p\right) \equiv 1.
$$

Thus, we have reproduced (22.98). In the case of the Dirac field, in turn, we have the following correspondence:

$$
\Phi_+\left(\mathbf{p}, E_p\right) \leftrightarrow u(\mathbf{p}, h) \quad \text{and} \quad \Phi_-\left(-\mathbf{p}, -E_p\right) \leftrightarrow v(\mathbf{p}, h),
$$

where we consider the helicity h as an additional freedom (see Sect. 22.4). Regarding other specific notations, see the subsequent sections.

22.4 Quantization of the Dirac Field [5, 6]

22.4.1 Lagrangian Density and Hamiltonian Density of the Dirac Field

As in the case of the scalar field, we start with the Lagrangian density of the Dirac field. The Lagrangian density L(x) is given by

$$
\mathcal{L}(x) \equiv \overline{\psi}(x)\left(i\gamma^\mu\partial_\mu - m\right)\psi(x). \tag{22.210}
$$

Then, the action integral S(ψ) follows from (22.210) such that

$$
S(\psi) = \int dx\, \mathcal{L}(x) = \int dx\, \left[\overline{\psi}(x)\left(i\gamma^\mu\partial_\mu - m\right)\psi(x)\right]. \tag{22.211}
$$

Taking the variation of S(ψ) with respect to ψ as before, we have

$$
\begin{aligned}
\delta S(\psi) &= \int dx\, \delta\mathcal{L}(x) \\
&= \int dx\, \left[\delta\overline{\psi}\, i\gamma^\mu\partial_\mu\psi + \overline{\psi}\, i\gamma^\mu\partial_\mu\delta\psi - m\,\delta\overline{\psi}\,\psi - m\overline{\psi}\,\delta\psi\right] \\
&= \int dx\, \delta\overline{\psi}\left(i\gamma^\mu\partial_\mu\psi - m\psi\right) + \int dx\, \left(\overline{\psi}\, i\gamma^\mu\partial_\mu - m\overline{\psi}\right)\delta\psi \\
&= \int dx\, \delta\overline{\psi}\left(i\gamma^\mu\partial_\mu\psi - m\psi\right) + \left[\overline{\psi}\, i\gamma^\mu\,\delta\psi\right]_{-\infty}^{\infty} + \int dx\, \left(-\partial_\mu\overline{\psi}\, i\gamma^\mu - m\overline{\psi}\right)\delta\psi \\
&= \int dx\, \delta\overline{\psi}\left(i\gamma^\mu\partial_\mu\psi - m\psi\right) - \int dx\, \left(i\partial_\mu\overline{\psi}\,\gamma^\mu + m\overline{\psi}\right)\delta\psi,
\end{aligned} \tag{22.212}
$$

where with the second to the last equality we used integration by parts and δψ was supposed to vanish at x = ±∞. Since ψ and ψ̄ are regarded as independent variables, for δS(ψ) = 0 to hold with any arbitrarily chosen δψ and δψ̄ we must have

$$
\left(i\gamma^\mu\partial_\mu - m\right)\psi(x) = 0 \tag{22.213}
$$

and

$$
i\partial_\mu\overline{\psi}(x)\gamma^\mu + m\overline{\psi}(x) = 0. \tag{22.214}
$$

Instead of (22.214), we sometimes write [6]

$$
i\,\overline{\psi}\,\overleftarrow{\partial_\mu}\,\gamma^\mu + m\overline{\psi} = 0. \tag{22.215}
$$

Equations (22.214) and (22.215) are called the adjoint Dirac equation. Rewriting (22.214), we have
$$
i\partial_\mu\psi^{\dagger}\gamma^0\gamma^\mu + m\psi^{\dagger}\gamma^0 = 0.
$$

Taking the transposition and rearranging the terms:

$$
i\left(\gamma^\mu\right)^T\left(\gamma^0\right)^T\partial_\mu\psi^* + m\left(\gamma^0\right)^T\psi^* = 0.
$$

Using the transposition of $\gamma^0\gamma^\mu = 2\eta^{0\mu} - \gamma^\mu\gamma^0$ (21.66):

$$
i\left[2\eta^{0\mu} - \left(\gamma^0\right)^T\left(\gamma^\mu\right)^T\right]\partial_\mu\psi^* + m\left(\gamma^0\right)^T\psi^* = 0.
$$

Multiplying both sides by $\left(\gamma^0\right)^T$ from the left and using $\left(\gamma^0\right)^T\left(\gamma^0\right)^T = E$ (the identity matrix):

$$
i\left[2\eta^{0\mu}\left(\gamma^0\right)^T - \left(\gamma^\mu\right)^T\right]\partial_\mu\psi^* + m\psi^* = i\left(\gamma^0\right)^T\partial_0\psi^* - i\sum_{k=1}^{3}\left(\gamma^k\right)^T\partial_k\psi^* + m\psi^* = 0. \tag{22.216}
$$

Taking the complex conjugate of the above:

$$
-i\left[\gamma^0\partial_0\psi + \sum_{k=1}^{3}\gamma^k\partial_k\psi\right] + m\psi = 0,
$$

where we used the fact that γ⁰ is an Hermitian operator and that γᵏ (k = 1, 2, 3) are anti-Hermitian. Rewriting (22.216), we finally recover

$$
\left(i\gamma^\mu\partial_\mu - m\right)\psi(x) = 0. \tag{22.213}
$$

Thus, (22.213) can naturally be derived from (22.214), which is again equivalent
to (21.54).
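The Hermiticity properties invoked above are easy to verify with an explicit representation. The numpy sketch below (ours) builds the gamma matrices in the standard Dirac representation, with $\gamma^0$ block-diagonal and $\gamma^k$ built from the Pauli matrices, and checks $\{\gamma^\mu, \gamma^\nu\} = 2\eta^{\mu\nu}E$, $\gamma^{0\dagger} = \gamma^0$, and $\gamma^{k\dagger} = -\gamma^k$.

```python
import numpy as np

s = [np.array([[0, 1], [1, 0]]),            # Pauli matrices sigma_1, sigma_2, sigma_3
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]])]
I2, Z2 = np.eye(2), np.zeros((2, 2))

g0 = np.block([[I2, Z2], [Z2, -I2]])        # gamma^0 in the Dirac representation
g = [np.block([[Z2, sk], [-sk, Z2]]) for sk in s]   # gamma^k, k = 1, 2, 3
gamma = [g0] + g
eta = np.diag([1, -1, -1, -1])              # Minkowski metric eta^{mu nu}

ok = all(np.allclose(gamma[m] @ gamma[n] + gamma[n] @ gamma[m],
                     2 * eta[m, n] * np.eye(4))
         for m in range(4) for n in range(4))
print(ok)                                                        # True
print(np.allclose(g0.conj().T, g0),                              # gamma^0 Hermitian
      all(np.allclose(gk.conj().T, -gk) for gk in g))            # gamma^k anti-Hermitian
```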
As in the case of the scalar field, the Hamiltonian density ℋ(x) of the Dirac field is given by

$$
\mathcal{H}(x) = \pi(x)\dot{\psi}(x) - \mathcal{L}\left[\psi(x), \partial_\mu\psi(x)\right]. \tag{22.217}
$$

Since ψ(x) represents a spinor field that is described by a (4, 1) matrix (i.e., a column vector), π(x) is a (1, 4) matrix (row vector) so that ℋ is a scalar, i.e., a (1, 1) matrix. If we need to distinguish each component, we put an index, e.g., ψ(x)ₐ and π(x)_b (a, b = 1, 2, 3, 4). When we need to distinguish the time coordinate from the space coordinates, we describe, e.g., π(x) as π(t, x). We have

$$
\pi(x) = \frac{\partial\mathcal{L}(x)}{\partial[\partial_0\psi(x)]} = \overline{\psi}(x)\, i\gamma^0 = \psi^{\dagger}(x)\gamma^0 i\gamma^0 = i\psi^{\dagger}(x). \tag{22.218}
$$

The representation of (22.218) is somewhat symbolic, because ψ(x) involved as the differentiation variable is a matrix. Therefore, we use the following expression to specify the individual components of ψ(x) and π(x):

$$
\pi_a(x) = \frac{\partial\mathcal{L}(x)}{\partial[\partial_0\psi_a(x)]} = \left[\overline{\psi}(x)\, i\gamma^0\right]_a = \left[\psi^{\dagger}(x)\gamma^0 i\gamma^0\right]_a = i\left[\psi^{\dagger}(x)\right]_a = i\psi^*(x)_a. \tag{22.219}
$$

Note that in (22.219) ψ†(x) is a (1, 4) matrix and that ψ*(x)ₐ is its ath component. The derivation of (22.219) is based on a so-called compendium quantization method. Interested readers are referred to the literature for further discussion [2, 6, 8]. Using (22.219), the Hamiltonian density ℋ of (22.217) is expressed as
$$
\begin{aligned}
\mathcal{H}(x) &= i\psi^{\dagger}(x)\dot{\psi}(x) - \overline{\psi}(x)\left(i\gamma^\mu\partial_\mu - m\right)\psi(x) \\
&= i\psi^{\dagger}(x)\dot{\psi}(x) - \psi^{\dagger}(x)\gamma^0 i\gamma^\mu\partial_\mu\psi(x) + \overline{\psi}(x)m\psi(x) \\
&= -\sum_{k=1}^{3}\psi^{\dagger}(x)\gamma^0 i\gamma^k\partial_k\psi(x) + \overline{\psi}(x)m\psi(x) \\
&= \overline{\psi}(x)\left(-i\sum_{k=1}^{3}\gamma^k\partial_k + m\right)\psi(x).
\end{aligned}
$$

Defining $\boldsymbol{\alpha} \equiv \gamma^0\boldsymbol{\gamma}$ [where $\boldsymbol{\gamma} \equiv (\gamma^1, \gamma^2, \gamma^3)$] and $\beta \equiv \gamma^0$ according to the custom, we have

$$
\mathcal{H} = \psi^{\dagger}(x)\left(-i\boldsymbol{\alpha}\cdot\nabla + m\beta\right)\psi(x). \tag{22.220}
$$

Hence, for the Hamiltonian H we get

$$
H = \int d^3x\, \mathcal{H} = \int d^3x\, \psi^{\dagger}(x)\left(-i\boldsymbol{\alpha}\cdot\nabla + m\beta\right)\psi(x). \tag{22.221}
$$

According to the classical theory (i.e., the theory before the quantum field theory) based on the one-particle wave function and its probabilistic interpretation, the integrand of (22.221) can be viewed as an expectation value of the one-particle energy if ψ(x) is normalized. Then, the classical Schrödinger equation could be described as

$$
H\psi(x) = \left(-i\boldsymbol{\alpha}\cdot\nabla + m\beta\right)\psi(x) = i\partial_0\psi(x), \tag{22.222}
$$

where H represents the one-particle Hamiltonian. Then, (22.222) is written as
$$
\left(-i\sum_{k=1}^{3}\gamma^0\gamma^k\partial_k + \gamma^0 m - \gamma^0\gamma^0 i\partial_0\right)\psi(x) = 0. \tag{22.223}
$$

Multiplying both sides of (22.223) by γ⁰ from the left, we obtain

$$
\left(-i\sum_{k=1}^{3}\gamma^k\partial_k + m - i\gamma^0\partial_0\right)\psi(x) = 0.
$$

Rearranging the terms, we get

$$
\left(i\gamma^\mu\partial_\mu - m\right)\psi(x) = 0. \tag{22.213}
$$

Thus, once again we have recovered the Dirac equation. If Schrödinger had
invented the gamma matrices, he would have discovered the Dirac equation
(22.213).
If we choose ϕ(x) of (21.59) for the solution of the Dirac equation, H is identical
with Hp of (21.131) and gives a positive one-particle energy p0. If, however, χ(x) of
(21.59) is chosen, H is identical with H-p that gives a negative energy -p0, in
agreement with the result of Sect. 21.3.3.
Unlike the classical theory, ψ(x) must be dealt with as a quantum-mechanical operator. Instead of the commutation relation given in (22.84) with the scalar field, we use the following equal-time anti-commutation relations (see Sect. 21.3.1) described by

$$
\begin{aligned}
\{\psi(t,\mathbf{x}), \pi(t,\mathbf{y})\} &= \left\{\psi(t,\mathbf{x}), i\psi^{\dagger}(t,\mathbf{y})\right\} = i\delta^3(\mathbf{x}-\mathbf{y}), \\
\left\{\psi(t,\mathbf{x}), \psi^T(t,\mathbf{y})\right\} &= \left\{\pi^T(t,\mathbf{x}), \pi(t,\mathbf{y})\right\} = 0.
\end{aligned} \tag{22.224}
$$

The expression of (22.224) is somewhat symbolic. Dealing with the anti-commutators of (22.224) requires caution. It is because, although ψ(t, x)π(t, y) is a (4, 4) matrix, π(t, y)ψ(t, x) is a (1, 1) matrix. To avoid this inconsistency, we define

$$
\{\psi(t,\mathbf{x}), \pi(t,\mathbf{y})\} \equiv \psi(t,\mathbf{x})\pi(t,\mathbf{y}) + \left[\pi(t,\mathbf{y})^T\left[\psi(t,\mathbf{x})\right]^T\right]^T. \tag{22.225}
$$

In the second term of (22.225), π(t, y)ψ(t, x) is replaced with π(t, y)ᵀ[ψ(t, x)]ᵀ and the components of π(t, y) are placed in front of those of ψ(t, x).
Another word of caution is that both ψ(t, x) and π(t, y) are Grassmann numbers. Let a and b be Grassmann numbers. Then, we have ab = −ba and a² = b² = 0. Let A = (a_{ij}) and B = (b_{ij}) be matrices whose matrix elements are Grassmann numbers. Unlike ordinary c-numbers, we have $a_{ij}b_{kl} \neq b_{kl}a_{ij}$. Hence, the standard matrix multiplication rules must be modified. If the numbers we are dealing with were not Grassmann numbers but ordinary c-numbers, we would simply have
$$
\left[\pi(t,\mathbf{y})^T\left[\psi(t,\mathbf{x})\right]^T\right]^T = \psi(t,\mathbf{x})\pi(t,\mathbf{y}).
$$

This equation, however, is not true of the present case where both ψ(t, x) and π(t, y) are Grassmann numbers. To highlight this feature, let us think of the following simple example.
Example 22.1 Suppose that we have matrices

$$
A = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_1 & b_2 & b_3 & b_4 \end{pmatrix}. \tag{22.226}
$$

Then, we have

$$
AB = \begin{pmatrix}
a_1b_1 & a_1b_2 & a_1b_3 & a_1b_4 \\
a_2b_1 & a_2b_2 & a_2b_3 & a_2b_4 \\
a_3b_1 & a_3b_2 & a_3b_3 & a_3b_4 \\
a_4b_1 & a_4b_2 & a_4b_3 & a_4b_4
\end{pmatrix}, \qquad
\left(B^TA^T\right)^T = \begin{pmatrix}
b_1a_1 & b_2a_1 & b_3a_1 & b_4a_1 \\
b_1a_2 & b_2a_2 & b_3a_2 & b_4a_2 \\
b_1a_3 & b_2a_3 & b_3a_3 & b_4a_3 \\
b_1a_4 & b_2a_4 & b_3a_4 & b_4a_4
\end{pmatrix}. \tag{22.227}
$$

If all the above numbers are ordinary c-numbers, we have no problem with AB = (BᵀAᵀ)ᵀ. But if we are dealing with the Grassmann numbers, we have AB ≠ (BᵀAᵀ)ᵀ. If we express the anti-commutator between A and B, symbolically we have

$$
\{A, B\} = AB + \left(B^TA^T\right)^T = \begin{pmatrix}
a_1b_1 + b_1a_1 & a_1b_2 + b_2a_1 & a_1b_3 + b_3a_1 & a_1b_4 + b_4a_1 \\
a_2b_1 + b_1a_2 & a_2b_2 + b_2a_2 & a_2b_3 + b_3a_2 & a_2b_4 + b_4a_2 \\
a_3b_1 + b_1a_3 & a_3b_2 + b_2a_3 & a_3b_3 + b_3a_3 & a_3b_4 + b_4a_3 \\
a_4b_1 + b_1a_4 & a_4b_2 + b_2a_4 & a_4b_3 + b_3a_4 & a_4b_4 + b_4a_4
\end{pmatrix}.
$$

What we have done in the above calculation is to compute $a_ib_k + b_ka_i$ (i, k = 1, 2, 3, 4) for the individual matrix elements.
Thus, this simple example shows that care should be taken when dealing with the Grassmann numbers, particularly in the case of matrix calculations. □
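The bookkeeping of Example 22.1 can be reproduced with sympy by declaring the entries $a_i$, $b_k$ as noncommutative symbols. This keeps $a_ib_k$ and $b_ka_i$ distinct, which is all the example needs (it does not impose the full Grassmann rule $a_ib_k = -b_ka_i$). The variable names are our own.

```python
import sympy as sp

a = sp.symbols('a1:5', commutative=False)    # a1..a4, entries of the column A
b = sp.symbols('b1:5', commutative=False)    # b1..b4, entries of the row B

A = sp.Matrix(4, 1, list(a))
B = sp.Matrix(1, 4, list(b))

anti = A * B + (B.T * A.T).T                 # {A, B} as defined in (22.225)
ok = all(sp.expand(anti[i, k] - (a[i]*b[k] + b[k]*a[i])) == 0
         for i in range(4) for k in range(4))
print(ok)                                    # True: ({A, B})_{ik} = a_i b_k + b_k a_i
```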
Getting back to our present issue, we write the first equation of (22.224) in a matrix form to get

$$
\begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{pmatrix}\begin{pmatrix} \pi_1 & \pi_2 & \pi_3 & \pi_4 \end{pmatrix} + \left[\begin{pmatrix} \pi_1 \\ \pi_2 \\ \pi_3 \\ \pi_4 \end{pmatrix}\begin{pmatrix} \psi_1 & \psi_2 & \psi_3 & \psi_4 \end{pmatrix}\right]^T = \begin{pmatrix} i\delta^3(\mathbf{x}-\mathbf{y}) & & & \\ & i\delta^3(\mathbf{x}-\mathbf{y}) & & \\ & & i\delta^3(\mathbf{x}-\mathbf{y}) & \\ & & & i\delta^3(\mathbf{x}-\mathbf{y}) \end{pmatrix}, \tag{22.228}
$$

where with RHS all the off-diagonal elements vanish, with the diagonal elements being $i\delta^3(\mathbf{x}-\mathbf{y})$. That is, (22.228) is a diagonal matrix. Also, notice that using (22.218), (22.228) can be rewritten as

$$
\begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{pmatrix}\begin{pmatrix} \psi_1^* & \psi_2^* & \psi_3^* & \psi_4^* \end{pmatrix} + \left[\begin{pmatrix} \psi_1^* \\ \psi_2^* \\ \psi_3^* \\ \psi_4^* \end{pmatrix}\begin{pmatrix} \psi_1 & \psi_2 & \psi_3 & \psi_4 \end{pmatrix}\right]^T = \begin{pmatrix} \delta^3(\mathbf{x}-\mathbf{y}) & & & \\ & \delta^3(\mathbf{x}-\mathbf{y}) & & \\ & & \delta^3(\mathbf{x}-\mathbf{y}) & \\ & & & \delta^3(\mathbf{x}-\mathbf{y}) \end{pmatrix}. \tag{22.229}
$$

Another way to show (22.228) is to write down the matrix elements in such a way that [8]

$$
\left\{\psi_\alpha(t,\mathbf{x}), \pi_\beta(t,\mathbf{y})\right\} = i\delta_{\alpha\beta}\,\delta^3(\mathbf{x}-\mathbf{y}) \quad (\alpha, \beta = 1, 2, 3, 4). \tag{22.230}
$$

Namely,

$$
\psi_\alpha(t,\mathbf{x})\pi_\beta(t,\mathbf{y}) + \pi_\beta(t,\mathbf{y})\psi_\alpha(t,\mathbf{x}) = i\delta_{\alpha\beta}\,\delta^3(\mathbf{x}-\mathbf{y}). \tag{22.231}
$$

Note that neither α nor β is an index that indicates the four-vector; both α and β are indices of spinors. Then, we specify α, β = 1, 2, 3, 4 instead of α, β = 0, 1, 2, 3.
Equations (22.230) and (22.231) imply the presence of a transposed matrix, which has appeared in (22.228) and (22.229). Unlike (22.84) describing the commutation relation of the scalar field, the anti-commutation relation for the Dirac field of (22.224), (22.229), and (22.230) is represented by (4, 4) matrices.

22.4.2 Quantization Procedure of the Dirac Field

The quantization of the Dirac field can be performed basically in parallel with that of the scalar field. That is, (A) the operator ψ(x) needs to be Fourier-expanded in an invariant form. (B) The creation operator and annihilation operator are introduced. (C) The anti-commutation relation between the creation and annihilation operators needs to be established. The calculation processes, however, differ from the scalar field quantization in several points. For instance, (a) ψ(x) does not need to be Hermitian. (b) The calculations are somewhat time-consuming in that we have to do (4, 4) matrix calculations. (But they are straightforward.)

A. We start with the Fourier expansion of ψ(x). In Sect. 22.3.6 we had

$$
\Phi(x) = \int_{-\infty}^{\infty} \frac{dp^3}{\sqrt{(2\pi)^3\, 2E_p}}\left[e^{-ipx}a_+(\mathbf{p})\Phi_+\left(\mathbf{p}, E_p\right) + e^{ipx}a_-^{\dagger}(-\mathbf{p})\Phi_-\left(-\mathbf{p}, -E_p\right)\right]. \tag{22.208}
$$

In the Dirac field, we must include an additional freedom of the helicity h that takes a number 1 or −1. Accordingly, the Fourier expansion of the identity operator (22.203) is reformulated as

$$
E = \int_{-\infty}^{\infty} \sum_{h = \pm 1} \delta\left(p^2 - m^2\right)|p, h\rangle\langle p, h|\, dp^4.
$$

Hence, (22.208) is rewritten as

$$
\Phi(x) = \int_{-\infty}^{\infty} \frac{dp^3}{\sqrt{(2\pi)^3\, 2E_p}}\sum_{h = \pm 1}\left[e^{-ipx}a_+(\mathbf{p}, h)\Phi_+\left(\mathbf{p}, E_p, h\right) + e^{ipx}a_-^{\dagger}(-\mathbf{p}, h)\Phi_-\left(-\mathbf{p}, -E_p, h\right)\right].
$$

B. To this relation, we wish to introduce the creation operator and annihilation operator. To this end, we rewrite the field quantity Φ(x) using the following expressions given by [8]

$$
a_+(\mathbf{p}, h) \leftrightarrow b(\mathbf{p}, h) \quad \text{and} \quad a_-^{\dagger}(-\mathbf{p}, h) \leftrightarrow d^{\dagger}(\mathbf{p}, h)
$$

as well as

$$
\Phi_+\left(\mathbf{p}, E_p, h\right) \leftrightarrow u(\mathbf{p}, h) \quad \text{and} \quad \Phi_-\left(-\mathbf{p}, -E_p, h\right) \leftrightarrow v(\mathbf{p}, h).
$$

With respect to the Dirac field, using ψ(x) in place of Φ(x) in the above expression, we get [8]

$$
\psi(x) = \int_{-\infty}^{\infty} \frac{dp^3}{\sqrt{(2\pi)^3\, 2p_0}}\sum_{h = \pm 1}\left[e^{-ipx}b(\mathbf{p}, h)u(\mathbf{p}, h) + e^{ipx}d^{\dagger}(\mathbf{p}, h)v(\mathbf{p}, h)\right], \tag{22.232}
$$

where $p_0$ was used instead of $E_p$ in (22.208). The operators b(p, h) and d†(p, h) are an annihilation operator of an electron and a creation operator of a positron, respectively.
The introduction of d†(p, h) needs some explanation. So far, we have permitted the negative-energy state. However, if a particle carrying a negative energy did exist in the real world, we would face an intolerable situation. To get rid of this big hurdle, Dirac proposed the hole theory (1930). Even though several aspects of the hole theory are explained in the literature [6, 8], we wish to add a few comments to the theory. Dirac thought that the vacuum consisted of electrons that occupied all the negative-energy states (strictly speaking, the energy states with an energy of −m or less, where m is the electron mass). If an electron happened to be excited so as to have a positive energy, the hole created within the negative-energy electrons would behave as if it were a particle carrying a positive charge (|e| = −e) and a positive energy with a rest mass m, i.e., a positron. The positron predicted by Dirac was indeed soon discovered experimentally by Carl Anderson (1932). Getting back to the present issue, the operator d†(p, h) might well be interpreted as an annihilation operator of an electron carrying a negative energy. But once the positron was discovered in the real world, d†(p, h) must be interpreted as the creation operator of a positron.
Notice again that according to the custom the annihilation operator b(p, h) is a "coefficient" of the positive-energy state u(p, h) and that the creation operator d†(p, h) is a coefficient of the negative-energy state v(p, h). The anti-commutation relations will be introduced because the electron is a fermion. At the same time, the anti-commutation relations facilitate various calculations (vide infra). Note that unlike the scalar field ϕ(x) described in (22.98), ψ(x) is asymmetric with respect to the creation and annihilation operators, reflecting the fact that ψ(x) is not Hermitian.
C. Our next task is to translate the canonical anti-commutation relations (22.224) into the anti-commutation relations between the creation and annihilation operators. To this end, we wish first to solve (22.232) with respect to b(p, h). The calculation procedures for this are as follows:
(a) Multiply both sides of (22.232) by $e^{-iqx}$ and integrate them with respect to x to apply (22.101). As a result, we obtain

$$
\int_{-\infty}^{\infty} dx^3\, e^{-iqx}\,\psi(x) = \sqrt{\frac{(2\pi)^3}{2q_0}}\sum_{h' = \pm 1}\left[e^{-iq_0t}b(\mathbf{q}, h')u(\mathbf{q}, h') + e^{iq_0t}d^{\dagger}(\mathbf{q}, h')v(\mathbf{q}, h')\right]. \tag{22.233}
$$

(b) Multiplying both sides of (22.233) by u†(q, h) from the left and using (21.156) and (21.157), we delete d†(q, h′) from (22.233) to get

$$
\int_{-\infty}^{\infty} dx^3\, e^{-iqx}\, u^{\dagger}(\mathbf{q}, h)\psi(x) = \sqrt{(2\pi)^3\, 2q_0}\, e^{-iq_0t}\, b(\mathbf{q}, h). \tag{22.234}
$$

Rewriting (22.234), we have

$$
b(\mathbf{q}, h) = \frac{1}{\sqrt{(2\pi)^3\, 2q_0}}\int_{-\infty}^{\infty} dx^3\, e^{iqx}\, u^{\dagger}(\mathbf{q}, h)\psi(x). \tag{22.235}
$$

(c) Next, we determine the anti-commutation relations between b(q, h) and d(q,
h) (and their adjoints). For example,

bðp, hÞ, b{ ðq, h0 Þ


1 1
1
= dx3 eipx u{ ðp, hÞψ ðxÞ dy3 e - iqy ψ { ðyÞuðq, h0 Þ
ð2π Þ3 2p0 -1 -1
1 1
þ dy3 e - iqy ψ { ðyÞuðq, h0 Þ dx3 eipx u{ ðp, hÞψ ðxÞ
-1 -1
1
= ×
ð2π Þ3 2p0
1038 22 Quantization of Fields

1 1
dx3 dy3 eipx e - iqy u{ ðp, hÞψ ðxÞψ { ðyÞuðq, h0 Þþψ { ðyÞuðq, h0 Þu{ ðp, hÞψ ðxÞ:
-1 -1
ð22:236Þ

The RHS is a (1, 1) matrix containing Grassmann numbers (ψ)α,


{
(ψ )β (α, β = 1, 2, 3, 4). The quantity (u)α, however, is an ordinary c-number. Note
that the matrix multiplication (1, 4) × (4, 1) × (1, 4) × (4, 1) produces a (1, 1) matrix.
Matrix calculations are somewhat tedious, but after termwise estimation of (22.236)
we are left with a simple form by virtue of the anti-commutation relation (22.224).
The calculation processes are as follows: Regarding u{(p, h)ψ(x)ψ {( y)u(q, h′), its full
description is given by

u{ ðp, hÞψ ðxÞψ { ðyÞuðq, h0 Þ


ψ 1 ð xÞ
ψ 2 ð xÞ
= ð u1 ðp, hÞ u2 ðp, hÞ u3 ðp, hÞ u4 ðp, hÞ Þ
ψ 3 ð xÞ
ψ 4 ð xÞ ð22:237Þ
0
u1 ðq, h Þ
u2 ðq, h0 Þ
× ð ψ 1 ð yÞ  ψ 2 ð yÞ  ψ 3 ð yÞ  ψ 4 ðyÞ Þ :
u3 ðq, h0 Þ
u4 ðq, h0 Þ

Meanwhile, ψ {( y)u(q, h′)u{(p, h)ψ(x) is similarly expressed as in (22.237). After


the calculations, we get

u{ ðp, hÞψ ðxÞψ { ðyÞuðq, h0 Þ þ ψ { ðyÞuðq, h0 Þu{ ðp, hÞψ ðxÞ


4
ð22:238Þ
= uα uα δ3 ðx - yÞ = 2p0 δhh0 δ3 ðx - yÞ,
α=1

where with the last equality we used (21.156) and (22.229). Inserting (22.238) into
(22.236), we obtain

1 1
2p0 eitðp0 - q0 Þ
bðp, hÞ, b{ ðq, h0 Þ = δhh0 dx3 dy3 eiðqy - pxÞ δ3 ðx - yÞ
ð2π Þ3 2p0 -1 -1
1
2p0 eitðp0 - q0 Þ
= δhh0 dx3 eixðq - pÞ
ð2π Þ3 2p0 -1
22.4 Quantization of the Dirac Field [5, 6] 1039

2p0 eitðp0 - q0 Þ
= δhh0 ð2π Þ3 δ3 ðq - pÞ = δhh0 δ3 ðq - pÞ, ð22:239Þ
ð2π Þ3 2p0

where with the second to the last equality we used (22.101); the last equality was due
to the inclusion of δ3(q - p), which leads to q0 - p0 = 0. Thus, the result of (22.239)
has largely been simplified. We remark that the above calculation procedures offer
an interesting example for a computation of numbers that contain both Grassmann
numbers and ordinary c-numbers.
Other anti-commutation relations can be derived similarly. For example, we
obtain the following relation as in the case of (22.235):
1
1
d { ðq, hÞ = dx3 e - iqx v{ ðq, hÞψ ðxÞ: ð22:240Þ
3
ð2π Þ 2p0 -1

Also, proceeding similarly to (22.239), we have

d ðp, hÞ, d{ ðq, h0 Þ = δhh0 δ3 ðq - pÞ: ð22:241Þ

With other combinations of the b-related operators and d-related operators, we


have the following anti-commutation relations such that

fbðp, hÞ, bðq, h0 Þg = fbðp, hÞ, dðq, h0 Þg = bðp, hÞ, d { ðq, h0 Þ


= b{ ðp, hÞ, b{ ðq, h0 Þ = b{ ðp, hÞ, dðq, h0 Þ = b{ ðp, hÞ, d { ðq, h0 Þ ð22:242Þ
0 { { 0
= fdðp, hÞ, d ðq, h Þg = d ðp, hÞ, d ðq, h Þ = 0:

These results can be viewed in parallel with (22.105) in spite of the difference
between the commutation and anti-commutation relations and the difference in
presence or absence of the helicity freedom. Since (22.242) does not contain
space–time coordinates, it is useful for various calculations in relation to the
quantum fields.

22.4.3 Antiparticle: Positron

The Dirac field is characterized by the antiparticles. The antiparticle of electron is the
positron. In Sect. 21.4 we have introduced the notion of the charge conjugation. It
can be regarded as the particle–antiparticle transformation. Strictly speaking, it is not
until we properly describe the field quantization by introducing the creation and
annihilation operators that the charge conjugation has its place. Nonetheless, it is of
great use in a mathematical approach to introduce the notion of charge conjugation
into the spinor field before being quantized. In fact, the formulation of the charge
1040 22 Quantization of Fields

conjugation developed in Sect. 21.4 can directly be imported along with the creation
and annihilation operators into the field quantization of the antiparticle.
In Sect. 21.4 the charge conjugation of the spinor ψ into ψ C was described as

ψ C = Cψ T = iγ 2 ψ  : ð21:161Þ

Borrowing this notation and using (22.232), the antiparticle field ψ anti(x) is
expressed as

ψ anti ðxÞ = ψ C ðxÞ = iγ 2 ψ 


1
dp3
= eipx b{ ðp, hÞiγ 2 u ðp, hÞ þ e - ipx dðp, hÞiγ 2 v ðp, hÞ :
-1 3
ð2π Þ 2p0 h = ± 1
ð22:243Þ

Further using (21.166) and (21.167), we get

ψ anti ðxÞ
1
dp3
= eipx b{ ðp, hÞvðp, hÞ þ e - ipx dðp, hÞuðp, hÞ : ð22:244Þ
-1 3
ð2π Þ 2p0 h = ± 1

Comparing (22.232) and (22.244), we find that ψ(x) and ψ anti(x) can be obtained
merely by exchanging b(p, h) and d(p, h).
The discrimination between the particle and antiparticle is associated with
whether the operator ψ(x) is Hermitian or not. Since in the case of the neutral scalar
field the operator ϕ(x) is Hermitian, we have no discrimination between the particle
and antiparticle.
With the Dirac field, however, the operator ψ(x) is not Hermitian, but the particle
and antiparticle should be distinguished. We may think that destroying the negative-
energy state is equivalent to creating the positive-energy state, or vice versa.
Interested readers are referred to appropriate literature with more detailed discus-
sions on the particle and antiparticle as well as their implications [8, 12].

22.4.4 Invariant Delta Functions of the Dirac Field

In this section we introduce the invariant delta functions of the Dirac field. As in the
case of the scalar field, these functions are of great importance in studying the
interaction of quantum fields. The derivation of these functions is parallel with
those of the scalar field.
We have performed the canonical quantization by setting the equal-time anti-
commutation relation (22.224). Combining (22.224) with the Dirac field expansion
22.4 Quantization of the Dirac Field [5, 6] 1041

described by (22.232) and (22.244), we introduced the anti-commutation relations


between the creation and annihilation operators expressed as (22.239), (22.241), and
(22.242). Further combining the Dirac field expansion with the anti-commutation
relations between the creation and annihilation operators, we wish to seek the
covariant anti-commutation relations between the Dirac fields. Note that unlike the
equal-time anti-commutation relation (22.224) the covariant anti-commutation rela-
tions do not contain the (equal) time variable explicitly.
We obtain a following covariant relations at once such that

fψ ðxÞ, ψ ðyÞg = fψ ðxÞ, ψ ðyÞg = 0: ð22:245Þ

With this expression see Example 22.1.


To explore other covariant anti-commutation relations of the Dirac field, we
decompose the field operators into the positive-frequency and negative-frequency
parts as in the case of the scalar field (Sect. 22.3.4). For example, from (22.232) we
have

dp3
ψ þ ðxÞ = e - ipx bðp, hÞuðp, hÞ,
3
ð2π Þ 2p0 h = ± 1
ð22:246Þ
dp3
ψ - ðxÞ = eipx b{ ðp, hÞuðp, hÞ:
ð2π Þ3 2p0 h = ± 1

Similarly, a pair of functions ψ -(x) and ψ þ ðxÞ is obtained such that

dp3
ψ - ðxÞ = eipx d{ ðp, hÞvðp, hÞ,
3
ð2π Þ 2p0 h = ± 1
dp3
ψ þ ðxÞ = e - ipx dðp, hÞvðp, hÞ:
3
ð2π Þ 2p0 h = ± 1

From (22.246) we get

fψ þ ðxÞ, ψ - ðxÞg
1 dq3 dp3
= e - iðpx - qyÞ ð22:247Þ
ð2π Þ 3
2p0  2q0 h, h0 = ± 1
× bðp, hÞb{ ðq, h0 Þuðp, hÞuðq, h0 Þ þ b{ ðq, h0 Þbðp, hÞuðq, h0 Þuðp, hÞ :

As implied in Example 22.1, the factor uðq, h0 Þuðp, hÞ in the second term of the
integrand of (22.247) must be read as
1042 22 Quantization of Fields

T
uðq, h0 Þuðp, hÞ = uðq, h0 Þ uðp, hÞT = uðp, hÞuðq, h0 Þ:
T
ð22:248Þ

Notice that since (22.248) does not contain the Grassmann numbers, unlike
(22.227) the second equality of (22.248) holds. Then, (22.247) can be rewritten as

1 dq3 dp3
fψ þ ðxÞ, ψ - ðxÞg = e - iðpx - qyÞ
ð2π Þ3 2p0  2q0 h, h0 = ± 1
× uðp, hÞuðq, hÞ bðp, hÞb{ ðq, h0 Þ þ b{ ðq, h0 Þbðp, hÞ
1 dq3 dp3
= e - iðpx - qyÞ uðp, hÞuðq, h0 Þ bðp, hÞ, b{ ðq, h0 Þ
ð2π Þ3 2p0  2q0 h, h0 = ± 1
1 dq3 dp3
= e - iðpx - qyÞ uðp, hÞuðq, h0 Þδhh0 δ3 ðq - pÞ ½{
ð2π Þ3 2p0  2q0 h, h0 = ± 1
1 dp3
= e - ipðx - yÞ uðp, hÞuðp, hÞ
ð2π Þ3 2p0 h = ± 1
1 dp3 - ipðx - yÞ
= e pμ γ μ þ m ½{{
ð2π Þ3 2p0
1 dp3 - ipðx - yÞ
= iγ μ ∂μ þ m = iγ μ ∂μ þ m iΔþ ðx - yÞ:
x x
e ½{{{
ð2π Þ3 2p0
ð22:249Þ

At [{] in (22.249) we used (22.239); at [{{] we used (21.144); at [{ { {] we used


x
(22.164). In (22.249) ∂μ means the partial differentiation with respect to x.
Similarly calculating fψ - ðxÞ, ψ þ ðyÞg, we obtain

fψ - ðxÞ, ψ þ ðyÞg = iγ μ ∂μ þ m iΔ - ðx - yÞ,


x

where Δ-(x - y) is identical with that appearing in (22.168). Also, we define

iγ μ ∂μ þ m iΔþ ðx - yÞ  iSþ ðx - yÞ = fψ þ ðxÞ, ψ - ðxÞg,


x

ð22:250Þ
iγ μ ∂μ þ m iΔ - ðx - yÞ  iS - ðx - yÞ = fψ - ðxÞ, ψ þ ðyÞg:
x

Summing two equations of (22.250), using (22.139), and further defining

Sðx - yÞ  Sþ ðx - yÞ þ S - ðx - yÞ,

we obtain [5]
22.4 Quantization of the Dirac Field [5, 6] 1043

iγ μ ∂μ þ m iΔðx - yÞ  iSðx - yÞ:


x
ð22:251Þ

Meanwhile, we have [5]

fψ ðxÞ, ψ ðyÞg
= fψ þ ðxÞ, ψ þ ðxÞg þ fψ þ ðxÞ, ψ - ðxÞg þ fψ - ðxÞ, ψ þ ðxÞg þ fψ - ðxÞ, ψ - ðxÞg
= fψ þ ðxÞ, ψ - ðxÞg þ fψ - ðxÞ, ψ þ ðxÞg = iSþ ðx - yÞ þ iS - ðx - yÞ
= iSðx - yÞ,
ð22:252Þ

where with the first equality, the first term and fourth term vanish because of
(22.242).

22.4.5 Feynman Propagator of the Dirac Field

Similarly to the case of the scalar field, the Feynman propagator SF(x - y) is defined
as a vacuum expectation value of the T-product such that

SF ðx - yÞ  h0jT½ψ ðxÞψ ðyÞj0i: ð22:253Þ

In the case of the Dirac field, the time-ordered product (T-product) T½ψ ðxÞψ ðyÞ is
defined by

T½ψ ðxÞψ ðyÞ  θ x0 - y0 ψ ðxÞψ ðyÞ - θ y0 - x0 ψ ðyÞψ ðxÞ: ð22:254Þ

Note the minus sign before the second term of (22.254). It comes from the fact
that the Fermion operators ψ(x) and ψ ðyÞ are anti-commutative. We have to pay
attention to the symbolic notation of (22.254) as in the case of the calculation of
fψ þ ðxÞ, ψ - ðxÞg and fψ - ðxÞ, ψ þ ðyÞg. In fact, ψ ðyÞψ ðxÞ must be dealt with as a
(4, 4) matrix as well. Taking the vacuum expectation value of (22.254), we have

SF ðx - yÞ = h0jT½ψ ðxÞψ ðyÞj0i


ð22:255Þ
= θ x0 - y0 h0jψ ðxÞψ ðyÞj0i - θ y0 - x0 h0jψ ðyÞψ ðxÞj0i:

Similarly to the scalar field case of (22.183), we have

h0jψ ðxÞψ ðyÞj0i = fψ þ ðxÞ, ψ - ðyÞg = iSþ ðx - yÞ, ð22:256Þ


h0jψ ðyÞψ ðxÞj0i = fψ - ðxÞ, ψ þ ðyÞg = iS - ðx - yÞ: ð22:257Þ

Substituting (22.256) and (22.257) for (22.255), we get


1044 22 Quantization of Fields

SF ð x - y Þ = i θ x 0 - y 0 Sþ ð x - y Þ - θ y 0 - x 0 S - ð x - y Þ : ð22:258Þ

Then, from (22.232), we have

T ½ψ ðxÞψ ðyÞ
1
θ ð x0 - y 0 Þ
= dq3 dp3 e - ipx bðp, hÞuðp, hÞ þ eipx d { ðp, hÞvðp, hÞ
-1 ð2π Þ3 2p0 h= ±1

× eipy b{ ðq, h0 Þuðq, h0 Þ þ e - iqy dðq, h0 Þvðq, h0 Þ


0
h = ±1

1
θ ð y0 - x0 Þ
- dq3 dp3 eipy b{ ðq, h0 Þuðq, h0 Þ þ e - iqy dðq, h0 Þvðq, h0 Þ
-1 ð2π Þ3 2p0 0
h = ±1

× e - ipx bðp, hÞuðp, hÞ þ eipx d{ ðp, hÞvðp, hÞ : ð22:259Þ


h= ±1

Therefore, we obtain

h0jT ½ψ ðxÞψ ðyÞj0i


1
θ ð x0 - y0 Þ
¼ dq3 dp3 h0j e - ipx bðp, hÞuðp, hÞþ eipx d{ ðp, hÞvðp, hÞ
-1 ð2π Þ3 2p0 h¼ ± 1

× eipy b{ ðq, h0 Þuðq, h0 Þþ e - iqy dðq, h0 Þvðq, h0 Þ j0


0
h ¼±1

1
θ ð y0 - x0 Þ
- dq3 dp3 h0j eipy b{ ðq, h0 Þuðq, h0 Þþ e - iqy d ðq, h0 Þvðq, h0 Þ
-1 ð2π Þ3 2p0 h0 ¼ ± 1

× e - ipx bðp, hÞuðp, hÞþ eipx d{ ðp, hÞvðp, hÞ j0 : ð22:260Þ


h¼ ± 1

Equation (22.260) comprises two integrals and the integrands of each integral
comprise 2 × 2 = 4 terms. However, we do not need to calculate all of them. This is
because in (22.260) we only have to calculate the integrals with respect to the
following two integrands that lead to non-vanishing integrals:

A e - iðpx - qyÞ 0jbðp, hÞuðp, hÞ  b{ ðq, h0 Þuðq, h0 Þj0


h, h0 = ± 1

and
22.4 Quantization of the Dirac Field [5, 6] 1045

B e - iðqy - pxÞ 0jd ðq, h0 Þvðq, h0 Þ  d{ ðp, hÞvðp, hÞj0 : ð22:261Þ


h, h0 = ± 1

Other terms vanish similarly as in the case of (22.121). It is because we have

bðp, hÞ j 0i = d ðq, h0 Þ j 0i = 0

and

h0 j b{ ðq, h0 Þ = h0 j d{ ðp, hÞ = 0: ð22:262Þ

We estimate the above two integrands A and B one by one. First, we rewrite A as

A e - iðpx - qyÞ 0jbðp, hÞuðp, hÞ  b{ ðq, h0 Þuðq, h0 Þj0


0
h, h = ± 1

= e - iðpx - qyÞ 0j δhh0 δ3 ðq - pÞ - b{ ðq, h0 Þbðp, hÞ uðp, hÞuðq, h0 Þj0


0
h, h = ± 1

= e - iðpx - qyÞ 0jδhh0 δ3 ðq - pÞuðp, hÞuðq, h0 Þj0


0
h, h = ± 1

= e - iðpx - qyÞ δhh0 δ3 ðq - pÞh0juðp, hÞuðq, h0 Þj0i,


0
h, h = ± 1

ð22:263Þ

where the second equality comes from (22.239) and the third equality from (22.262).
Paying attention to the fact that uðp, hÞuðq, h0 Þ is not a quantum operator but a (4, 4)
matrix, we have

h0juðp, hÞuðq, h0 Þj0i = uðp, hÞuðq, h0 Þh0j0i = uðp, hÞuðq, h0 Þ: ð22:264Þ

Notice that h0| 0i = 1, i.e., the vacuum state has been normalized. Thus, we obtain

A= e - iðpx - qyÞ δhh0 δ3 ðq - pÞuðp, hÞuðq, h0 Þ: ð22:265Þ


0
h, h = ± 1

Similarly, we have

B= e - iðqy - pxÞ δhh0 δ3 ðq - pÞvðq, h0 Þvðp, hÞ: ð22:266Þ


0
h, h = ± 1

Neither A nor B is a Grassmann number but ordinary number. Hence, according to


the manipulation as noted earlier in (22.248), we obtain
1046 22 Quantization of Fields

vðq, h0 Þvðp, hÞ = vðp, hÞvðq, h0 Þ: ð22:267Þ

Substituting (22.265) and (22.266) for (22.260), by the aid of (22.267) we get
1 1
θðx0 - y0 Þ θ ðy 0 - x 0 Þ
h0jT ½ψ ðxÞψ ðyÞj0i = dq3 dp3
3
A- dq3 dp3 B
-1 ð2π Þ 2p0 -1 ð2π Þ3 2p0
1
θðx0 - y0 Þ
= dq3 dp3 e - iðpx - qyÞ δhh0 δ3 ðq - pÞuðp, hÞuðq, h0 Þ
-1 ð2π Þ3 2p0 h, h0 = ± 1
1
θðy0 - x0 Þ
- dq3 dp3 e - iðqy - pxÞ δhh0 δ3 ðq - pÞνðp, hÞνðq, h0 Þ:
-1 ð2π Þ3 2p0 h, h0 = ± 1
ð22:268Þ

Performing the integration with respect to q and summation with respect to h′, we
obtain

1
θ ð x0 - y0 Þ
h0jT ½ψ ðxÞψ ðyÞj0i = dp3 3
e - ipðx - yÞ uðp, hÞuðp, hÞ
-1 ð2π Þ 2p0 h = ± 1
1 ð22:269Þ
θ ð y0 - x0 Þ
- dp3 eipðx - yÞ νðp, hÞνðp, hÞ:
-1 ð2π Þ3 2p0 h = ± 1

Using (21.144) and (21.145), finally we obtain

SF ðx - yÞ = h0jT ½ψ ðxÞψ ðyÞj0i


1
dp3
= 3
θ x0 - y0 e - ipðx - yÞ pμ γ μ þ m - θ y0 - x0 eipðx - yÞ pμ γ μ - m
- 1 ð2π Þ 2p0
1
dp3
= 3
θ x0 - y0 e - ipðx - yÞ pμ γ μ þ m þ θ y0 - x0 eipðx- yÞ - pμ γ μ þ m :
- 1 ð2π Þ 2p0

ð22:270Þ

We get another representation of SF(x - y) in a way analogous to (22.251). It is


described by

SF ðx - yÞ = iγ μ ∂μ þ m ΔF ðx - yÞ,
x
ð22:271Þ

where ΔF(x - y) was given by (22.201). With the derivation of (22.271), use
(22.270) and perform the differentiation with respect to the individual coordinate
xμ. We should be careful in treating the differentiation with respect to x0, because it
contains the differentiation of θ(x0 - y0).
We show the related calculations as follows:
22.4 Quantization of the Dirac Field [5, 6] 1047

x
iγ 0 ∂0 ΔF ðx - yÞ
1
iγ 0 dp3 x
= 3
∂0 θ x0 - y0 e - ipðx - yÞ þ θ y0 - x0 eipðx - yÞ
- 1 ð2π Þ 2p0
1
iγ 0 dp3
= 3
δ x0 - y0 e - ipðx - yÞ þ θ x0 - y0 ð - ip0 Þe - ipðx - yÞ
- 1 ð2π Þ 2p0

- δðy0 - x0 Þeipðx - yÞ þ θðy0 - x0 Þðip0 Þeipðx - yÞ ð22:272Þ


1
iγ 0 dp3
= δ x0 - y0 e - ipðx - yÞ - δ y0 - x0 eipðx - yÞ
-1 ð2π Þ3 2p0
þ θðx0 - y0 Þð - ip0 Þe - ipðx - yÞ þ θðy0 - x0 Þðip0 Þeipðx - yÞ
1
dp3 p0 γ 0
= 3
θ x0 - y0 e - ipðx - yÞ - θ y0 - x0 eipðx - yÞ ,
-1 ð2π Þ 2p0

where with the second to the last equality the first and second integrands canceled
out each other. For this, use the properties of δ functions; i.e., δ(x0 - y0)e-ip(x - y) =
δ(x0 - y0)eip(x - y) and δ(x0 - y0) = δ(y0 - x0) along with
1 ipðx - yÞ 3 1
- 1e d p = - 1 e - ipðx - yÞ d3 p.
Also, we make use of the following relations:

3
x
iγ k ∂k ΔF ðx - yÞ
k=1
3 1
iγ k dp3
∂k θ x0 - y0 e - ipðx - yÞ þ θ y0 - x0 eipðx - yÞ
x
= 3
k=1 -1 ð2π Þ 2p0
3 1
iγ k dp3 ð22:273Þ
= 3
θ x0 - y0 ð - ipk Þe - ipðx - yÞ
- 1 ð2π Þ 2p0
k=1
þ θðy0 - x0 Þðipk Þeipðx - yÞ
1
3
dp3 pk γ k
= 3
θ x0 - y0 e - ipðx - yÞ - θ y0 - x0 eipðx - yÞ :
k=1 - 1 ð2π Þ 2p 0

Adding (22.272) and (22.273) together with mΔF(x - y) and taking account of
(22.201), we acknowledge that (22.271) is identical to (22.270).
We wish to relate SF(x - y) expressed as (22.270) to the momentum representa-
tion (or Fourier integral representation) as discussed in Sect. 22.3.5. We are readily
aware that the difference between (22.270) and (22.201) is with or without the factor
of ( pμγ μ + m) or (-pμγ μ + m). We derive these factors by modifying the functional
form of the integrand of (22.186). As studied in Sect. 22.3.5, (22.201) was based on
the residue obtained by the contour integral with respect to k0.
In Sect. 6.5, we considered a complex function having a pole of order n at z = a
that is represented by f(z) = g(z)/(z - a)n, where g(z) is analytic within a domain we
are thinking of. In Sect. 22.3.5, we thought of a type of a function having a simple
pole given by e-ikz/(z - a) with g(z) = e-ikz. In the present case, we are thinking of a
1048 22 Quantization of Fields

type of a function described by e-ikz(cz + b)/(z - a), where c and b are certain
constants. That is, we have

gðzÞ = e - ikz ðcz þ bÞ:

From (6.148), we have

Res f ðaÞ = f ðzÞðz - aÞjz = a = gðaÞ:

Therefore, the residue of the present case is given by g(a) = e-ika(ca + b). Taking
account of (22.270), we assume the next analytic function described by

3
gðp0 , pÞ = eipðx - yÞ e - ip0 ðx - y0 Þ
0
p0 γ 0 þ pk γ k þ m
k=1 ð22:274Þ
- ipðx - yÞ μ
=e pμ γ þ m :

Since in (22.274) eip(x - y) and 3k = 1 pk γ k are constants with respect to p0 along


with m, the p dependence is denoted by g( p0, p) in (22.274). The function g( p0)
consists of a product of an exponential function and a linear function of p0 and is an
entire function (Definition 6.10). According to the case where the pole is located at
p0 = ∓ p0, we obtain g(∓p0), which corresponds to Case I (-p0) and Case II (+p0) of
Sect. 22.3.5, respectively.
With Case I, the contour integration produces a residue g(-p0, p) described by

3
gð- p0 , pÞ = eipðx - yÞ eip0 ðx - y0 Þ
0
- p0 γ 0 þ pk γ k þ m :
k=1

As in the scalar field case, switching p to -p effects no change in an ensuing


definite integral. Then, with Case I we obtain a function given by

3
gð - p0 , - pÞ = e - ipðx - yÞ eip0 ðx - y0 Þ
0
- p0 γ 0 - pk γ k þ m
k=1
3
= eipðx - yÞ - p0 γ 0 - pk γ k þ m
k=1
ipðx - yÞ μ
=e - pμ γ þ m :

Comparing (22.201) and (22.270), we find that instead of e-ik(x - y) in (22.186),


g( p0, p) = e-ip(x - y)( pμγ μ + m) produces a proper Feynman propagator of the Dirac
field SF(x - y). It is described by
22.5 Quantization of the Electromagnetic Field [5–7] 1049

-i e - ipðx - yÞ pμ γ μ þ m
SF ðx - yÞ = d4 p : ð22:275Þ
ð2π Þ4 m2 - p2 - iE

Equation (22.275) is well known as the Lorentz invariant representation of the


Feynman propagator for the Dirac field. The momentum space representation SF ðpÞ
(or Fourier counterpart of the coordinate representation) of the Feynman propagator
is given in accordance with (22.188) as

- i pμ γ μ þ m
S F ð pÞ  : ð22:276Þ
m2 - p2 - iE

We will make the most of the above-mentioned Feynman propagator in the next
chapter that deals with the interaction between the quantum fields.

22.5 Quantization of the Electromagnetic Field [5–7]

In Part II we studied the classical electromagnetic theory based on the Maxwell’s


equations, as in the case of the scalar field and the Dirac field, we deal with the
quantization of the electromagnetic field (or photon field). Also, in this case the
relativistic description of the electromagnetic field is essential. The quantized theory
of both the Dirac field and electromagnetic field directly leads to the construction of
the quantum electrodynamics (QED).

22.5.1 Relativistic Formulation of the Electromagnetic Field

To properly define the Lagrangian density and Hamiltonian density of the electro-
magnetic field, we revisit Part II as a preliminary stage. First, we introduce the
Heaviside-Lorentz units (in combination with the natural units) to simplify the
description of the Maxwell’s equations. It is because the use of the dielectric constant
of vacuum (ε0) and permeability of vacuum (μ0) is often troublesome. To delete
these constants, we redefine the electromagnetic quantities as follows:
p p
E, D → E= ε0 , ε0 D;
p p
H, B → H= μ0 , μ0 B;
p p
ρ, i → ε0 ρ, ε0 i:

As a result, Maxwell’s equations of electromagnetism (7.1)-(7.4) are recast as


follows:
1050 22 Quantization of Fields

div E = ρ,
div B = 0,
∂B ð22:277Þ
rot E þ = 0,
∂t
∂E
rot B - = i:
∂t

In essence, we have redefined the electromagnetic quantities such that

D=E and B = H: ð22:278Þ

That is, D = ε0E, e.g., the relationship (7.7) in vacuum is rewritten as


p p p
ε0 D = ε0 ðE= ε0 Þ = ε0 E or D = E:

In (22.277) μ0ε0 = 1 is implied in accordance with the natural units; see (7.12).
Next, we introduce vector potential and scalar potential to express the electromag-
netic fields. Of the Maxwell’s equations, with the magnetic flux density B we had

div B = 0: ð7:2Þ

We knew a formula of vector analysis described by

div rot V = 0, ð7:22Þ

for any vector field V. From the above equations, we immediately express B as

B = rot A, ð22:279Þ

where A is a vector field called the vector potential. We know another formula of
vector analysis

rot— Λ  0, ð22:280Þ

where Λ is any scalar field. The confirmation of (22.280) is left for readers.
Therefore, the vector potential A has arbitrariness of choice so that B can remain
unchanged as follows:

A0 = A - — Λ and B = rot A0 : ð22:281Þ

Meanwhile, replacing B in the third equation of (22.277) with (22.279) and


exchanging the operating order of the differential operators ∂/∂t and rot, we obtain
22.5 Quantization of the Electromagnetic Field [5–7] 1051

∂A
rot E þ = 0: ð22:282Þ
∂t

Using (22.280) we express E as

∂A
E= - - — ϕ, ð22:283Þ
∂t

where ϕ is another scalar field called the scalar potential. The scalar potential ϕ and
vector potential A are said to be electromagnetic potentials as a collective term.
Nonetheless, we have no clear reason to mathematically distinguish ϕ of (22.283)
from Λ in (22.280) or (22.281).
Considering that these potentials ϕ and A have arbitrariness, we wish to set the
functional relationship between them by imposing an appropriate constraint. Such
constraint on the electromagnetic potentials is called the gauge transformation. The
gauge transformation is defined by

A0 ðt, xÞ  Aðt, xÞ - — Λðt, xÞ,


∂Λðt, xÞ ð22:284Þ
ϕ0 ðt, xÞ  ϕðt, xÞ þ :
∂t

Note that A′ and ϕ′ denote the change in the functional form caused by the gauge
transformation. The gauge transformation holds the electric field E and magnetic
flux density B unchanged such that

∂A0 ðt, xÞ
E0 ðt, xÞ = - - — ϕ0 ðt, xÞ
∂t
∂Aðt, xÞ ∂— Λðt, xÞ ∂Λðt, xÞ
=- þ - — ϕðt, xÞ - —
∂t ∂t ∂t
∂Aðt, xÞ
=- - — ϕðt, xÞ,
∂t

where the last equality resulted from the fact that the operating order of ∂/∂t and
grad is exchangeable in the second equality. Also, we have

B0 ðt, xÞ = rot A0 ðt, xÞ = rot Aðt, xÞ - rot ½— Λðt, xÞ = rot Aðt, xÞ:

Having the above-mentioned background knowledge, we recast the Maxwell’s


equations in the relativistic formulation. For this, we introduce the following anti-
symmetric field tensor Fμν (in a contravariant tensor form) expressed as
1052 22 Quantization of Fields

F 00 F 01 F 02 F 03 0 - E1 - E2 - E3
F 10 F 11 F 12 F 13 E1 0 - B3 B2
F μν  =
F 20 F 21 F 22 F 23 E2 B3 0 - B1
F 30 F 31 F 32 F 33 E3 - B2 B1 0
= - F νμ , ð22:285Þ

where the indices 0, 1, 2, and 3 denote the t-, x-, y-, and z-component, respectively.
Note that the field quantities of (22.285) have been taken from (22.277). For later
use, we show the covariant form expressed as

F 00 F 01 F 02 F 03
F 10 F 11 F 12 F 13
F μν =
F 20 F 21 F 22 F 23
F 30 F 31 F 32 F 33
0 E1 E2 E3
- E1 0 - B3 B2
= = - F νμ : ð22:286Þ
- E2 B3 0 - B1
- E3 - B2 B1 0

This expression is obtained from

F μν = F ρσ ηρμ ησν , ð22:287Þ

where (η)ρμ is the Minkowski metric that has appeared in (21.16). We will discuss
the constitution and properties of the tensors in detail in Chap. 24.
Defining the charge–current density four-vector as

ρð xÞ
ρð xÞ i 1 ð xÞ
sν ðxÞ  = , ð22:288Þ
i ð xÞ i 2 ð xÞ
i 3 ð xÞ

we get the Maxwell’s equations that are described in the Lorentz invariant form as

∂μ F μν ðxÞ = sν ðxÞ, ð22:289Þ


ρ μν μ ν ρμ
∂ F ðxÞ þ ∂ F νρ ðxÞ þ ∂ F ðxÞ = 0, ð22:290Þ
22.5 Quantization of the Electromagnetic Field [5–7] 1053

t t
where x represents the four-vector . Note that we use the notation to
x x
represent the four-vector and that, e.g., the notation rot A(t, x) is used to show the
dependence of a function on independent variables. In (22.290) the indices ρ, μ, and
ν are cyclically permuted. These equations are equivalent to (22.277). For instance,
in (22.289) the equation described by

∂μ F μ0 ðxÞ = ∂0 F 00 ðxÞ þ ∂1 F 10 ðxÞ þ ∂2 F 20 ðxÞ þ ∂3 F 30 ðxÞ = s0 ðxÞ

implies

∂E 1 ∂E2 ∂E 3
þ þ = ρðxÞ:
∂x1 ∂x2 ∂x3

Rewriting the above equation using (t, x), we have

div E = ρ,

i.e., the first equation of (22.277). As an example of (22.290), an equation given by

0 1 2
∂ F 12 ðxÞ þ ∂ F 20 ðxÞ þ ∂ F 01 ðxÞ = 0

means

∂B3 ∂E 2 ∂E1 ∂E 2 ∂E1 ∂B3


- - þ =0 or - þ = 0:
∂x0 ∂x1 ∂x2 ∂x1 ∂x2 ∂x0

Also, rewriting the above equation using (t, x) we get

∂E y ∂E x ∂Bz ∂B
- þ =0 or ðrot EÞz þ = 0:
∂x ∂y ∂t ∂t z

That is, the third equation of (22.277) is recovered. The confirmation of other
relations is left for readers. In the above derivations of Maxwell’s equations, note
that

0 k
∂ = ∂=∂x0 = ∂=∂x0 = ∂0 , ∂ = ∂=∂xk = - ∂=∂xk =
- ∂k ðk = 1, 2, 3Þ: ð22:291Þ

Since Fμν is the antisymmetric tensor, from (22.289) we have

∂μ sμ ðxÞ = 0: ð22:292Þ
1054 22 Quantization of Fields

Equation (22.292) is essentially the same as the current continuity equation that
has already appeared in (7.20). Using four-vectors, the gauge transformation is
succinctly described as
μ μ
A0 ðxÞ = Aμ ðxÞ þ ∂ ΛðxÞ, ð22:293Þ

ϕð xÞ
where Aμ ðxÞ = . Equation (22.293) is equivalent to (22.284). Using Aμ(x),
Að x Þ
we get
μ ν
F μν ðxÞ = ∂ Aν ðxÞ - ∂ Aμ ðxÞ: ð22:294Þ

For example, we have

0 1
F 01 ðxÞ = ∂ A1 ðxÞ - ∂ A0 ðxÞ,

i.e.,

∂A1 ðxÞ ∂A0 ðxÞ ∂A


- E1 ðxÞ = þ or ðE Þx = - - ð — ϕÞ x :
∂x0 ∂x1 ∂t x

Other relations can readily be confirmed. The field strength Fμν(x) is invariant
under the gauge transformation (gauge invariance). It can be shown as follows:
μν μ ν ν μ
F 0 ðxÞ = ∂ A0 ðxÞ - ∂ A0 ðxÞ
μ ν ν μ
= ∂ ½Aν ðxÞ þ ∂ ΛðxÞ - ∂ ½Aμ ðxÞ þ ∂ ΛðxÞ
μ μ ν ν ν μ μ ν
= ∂ Aν ðxÞ þ ∂ ∂ Λ - ∂ Aμ ðxÞ - ∂ ∂ Λ = ∂ Aν ðxÞ - ∂ Aμ ðxÞ
= F μν ðxÞ,
ð22:295Þ

where with the third equality the operating order of ∂μ and ∂ν can be switched.
Further substituting (22.294) for (22.289), we obtain
μ ν μ ν
∂μ ½∂ Aν ðxÞ - ∂ Aμ ðxÞ = ∂μ ∂ Aν ðxÞ - ∂μ ∂ Aμ ðxÞ = sν ðxÞ:

Exchanging the operating order of ∂μ and ∂ν with the first equality of the above
equation, we get
μ ν
∂μ ∂ Aν ðxÞ - ∂ ∂μ Aμ ðxÞ = sν ðxÞ:

Or using ∂μ∂μ = □ we have


22.5 Quantization of the Electromagnetic Field [5–7] 1055

ν
□Aν ðxÞ - ∂ ∂μ Aμ ðxÞ = sν ðxÞ: ð22:296Þ

Defining in (22.296)

χ ðxÞ  ∂μ Aμ ðxÞ, ð22:297Þ

we obtain
ν
□Aν ðxÞ - ∂ χ ðxÞ = sν ðxÞ: ð22:298Þ

The quantity Aμ(x) is said to be a four-vector potential and a relativistic field


described by such four-vector is called a gauge field. The electromagnetic field is a
typical gauge field.

22.5.2 Lagrangian Density and Hamiltonian Density


of the Electromagnetic Field

Now, we are in the position to describe the Lagrangian density to quantize the free
electromagnetic field. Choosing Aμ(x) for the canonical coordinate, we first define
the Lagrangian density L(x) as

1 1
Lð xÞ  - F ðxÞF μν ðxÞ = - ∂μ Aν ðxÞ - ∂ν Aμ ðxÞ
4 μν 4
μ ν
 ½∂ Aν ðxÞ - ∂ Aμ ðxÞ: ð22:299Þ

Using (22.285) and (22.286) based on the classical electromagnetic fields E(t, x)
and B(t, x), we obtain

1
Lð xÞ = ½Eðt, xÞ2 - ½Bðt, xÞ2 : ð22:300Þ
2

From (22.299), as the corresponding canonical momentum π μ(x) we have



π μ ðxÞ  ∂LðxÞ=∂ Aμ ðxÞ = F μ0 ðxÞ: ð22:301Þ

Hamiltonian density ℋ(x) is then described by



ℋðxÞ = π μ ðxÞ Aμ ðxÞ - LðxÞ:

The first term is given by


1056 22 Quantization of Fields

0
μ
- E1
π μ ðxÞA_ ðxÞ = 0 - E 1 - E 2 - E 3 = E2 ,
- E2
- E3

where we used (22.286) to denote π μ(x); we also used (22.283), to which ϕ  0 is


imposed. The condition of ϕ  0 is due to the Coulomb gauge that is often used to
describe the radiation in vacuum [5]. From (22.300), we get

1
ℋðxÞ = ½Eðt, xÞ2 þ ½Bðt, xÞ2 : ð22:302Þ
2

Although (22.300) and (22.302) seem reasonable, (22.301) has an inconvenience.


It is because we could not define π 0(x), i.e., π 0(x) = F00(x)  0. To circumvent this
inconvenience, we need to redefine the Lagrangian density. We develop the discus-
sion in accordance with the literature [12]. The point is to use a scalar function χ(x)
of (22.297). That is, we set the Lagrangian density such that [12]

1 1
Lð xÞ  - F ðxÞF μν ðxÞ - χ 2 : ð22:303Þ
4 μν 2

Rewriting FμνFμν as

1 1 μ ν
- F F μν = - ∂μ Aν ∂ Aν - ∂μ Aν ∂ Aμ
4 μν 2
1 μ 1 1
= - ∂μ Aν ∂ Aν þ ∂μ ðAν ∂ν Aμ Þ - Aν ∂ν χ,
2 2 2

we have

1 μ 1 1 1
Lð xÞ  - ∂ A ∂ Aν þ ∂μ ðAν ∂ν Aμ Þ - Aν ∂ν χ - χ 2 : ð22:304Þ
2 μ ν 2 2 2

We wish to take the variation with respect to Aν (i.e., δAν) termwise in (22.304).
Namely, we have

LðxÞ = L1 ðxÞ þ L2 ðxÞ þ L3 ðxÞ þ L4 ðxÞ, ð22:305Þ

where L1(x), ⋯, and L4(x) correspond to the first, ⋯, and fourth term of (22.304),
respectively. We obtain
22.5 Quantization of the Electromagnetic Field [5–7] 1057

SðAν Þ = dx LðxÞ = dxL1 ðxÞ þ ⋯ þ dx L4 ðxÞ = S1 þ ⋯ þ S4 ,

δSðAν Þ = δ dx LðxÞ = δ dxL1 ðxÞ þ ⋯ þ δ dx L4 ðxÞ ,

= δS1 ðAν Þ þ ⋯ þ δS4 ðAν Þ,


ð22:306Þ

where Sk(Aν) = dxLk(x) and δSk(Aν) = δ dxLk(x) (k = 1, 2, 3, 4). We evaluate δSk


one by one. With δS1 we get

δS1 ðAν Þ = δ dxL1 ðxÞ

1 μ μ
= dx - ∂μ δAν ∂ Aν þ ∂μ Aν ð∂ δAν Þ
2
ð22:307Þ
1 μ μ
= dx - ∂μ δAν ∂ Aν þ ∂μ Aν ð∂ δAν Þ
2
1 μ μ μ
= dxδAν ∂μ ∂ Aν þ ∂ ∂μ Aν = dxδAν ∂μ ∂ Aν ,
2

where with the third equality the second term was rewritten so that the variation was
evaluated with δAν. In the derivation of (22.307), we used the integration by parts
and assumed that δAν vanishes at xμ → ± 1 as in the derivation of (22.76). Similarly
evaluating δS2(Aν), δS3(Aν), and δS4(Aν), we have

1 ν
δS2 ðAν Þ = dxδL2 ðxÞ = - dxδAν ∂μ ∂ Aμ ,
2
1 ν
δS3 ðAν Þ = dxδL3 ðxÞ = - dxδAν ∂ ∂μ Aμ , ð22:308Þ
2
ν
δS4 ðAν Þ = dxδL4 ðxÞ = dxδAν ∂ ∂μ Aμ :

Summing δS1(Aν), δS2(Aν), δS3(Aν), and δS4(Aν), we obtain a simple result


expressed as

μ
δSðAν Þ = δS1 ðAν Þ = dxδAν ∂μ ∂ Aν : ð22:309Þ

Notice that δS2(Aν), δS3(Aν), and δS4(Aν) canceled out one another. Thus, the
Euler–Lagrange equation obtained from (22.309) reads as
1058 22 Quantization of Fields

μ
∂ μ ∂ A ν ð xÞ = 0 or □Aν ðxÞ = 0: ð22:310Þ

For (22.310) to be equivalent to the Maxwell’s equations, we need another


condition [12]. We seek the condition below.
With a free electromagnetic field without the charge–current density, considering
(22.297) and (22.298) on condition that sν(x) = 0 we have
μ ν μ ν
∂μ ∂ Aν ðxÞ - ∂ ∂μ Aμ ðxÞ = ∂μ ∂ Aν ðxÞ - ∂ χ ðxÞ = 0: ð22:311Þ

Then, for (22.310) to be consistent with (22.311) we must have


ν
∂ χ ðxÞ = 0: ð22:312Þ

This leads to
ν
∂ν ∂ χ ðxÞ = 0: ð22:313Þ

Putting ν = 0 in (22.312), we have

0
∂ χ ðxÞ = ∂χ ðxÞ=∂t = 0: ð22:314Þ

Consequently, if we impose

χ ðxÞ = ∂μ Aμ ðxÞ = 0 at t = 0, ð22:315Þ

from (22.313) we have χ(x)  0 at all the times [12]. The constraint χ(x) = 0 is called
the Lorentz condition.
Any gauge in which the Lorentz condition holds is called a Lorentz gauge
[12]. The significance of the Lorentz condition will be mentioned later in relation
to the indefinite metric (Sect. 22.5.5).
Differentiating (22.293) with respect to xμ, we obtain
μ μ
∂μ A0 ðxÞ = ∂μ Aμ ðxÞ þ ∂μ ∂ ΛðxÞ: ð22:316Þ

Using the Lorentz gauge, from (22.315) and (22.316) we have [12]
μ
∂ μ ∂ Λ ð xÞ = 0 or □ΛðxÞ = 0: ð22:317Þ

When we originally chose Λ(x) in (22.280), Λ could be any scalar field. We find,
however, that the scalar function Λ(x) must satisfy the condition of (22.317) under
the Lorentz gauge.
Under the Lorentz gauge, in turn, the last two terms of (22.304) vanish. If we
assume that Aν vanishes at x = ± 1, the second term of (22.304) vanishes as well. It
is because the integration of that term can be regarded the extension of the Gauss’s
22.5 Quantization of the Electromagnetic Field [5–7] 1059

theorem to the four-dimensional space–time (see Sect. 7.1). Thus, among four terms
of (22.304) only the first term survives as the Lagrangian density L(x) that properly
produces the physically meaningful π 0(x). That is, we get [12]

1 μ
Lð xÞ = - ∂ A ∂ Aν : ð22:318Þ
2 μ ν

The Lagrangian density described by (22.318) is equivalent to that of (22.303) or


(22.304) on which the Lorentz condition is imposed and for which the field Aν is
supposed to vanish at x = ± 1. Using (22.318), as the canonical momentum π μ(x)
we obtain [12]
μ
π μ ðxÞ = ∂LðxÞ=∂A_ μ ðxÞ = - A_ ðxÞ: ð22:319Þ

Thus, the inconvenience of π 0(x) = 0 mentioned earlier has been skillfully gotten
rid of. Notice that RHS of (22.319) has a minus sign, in contrast to (22.83). What
remains to be done is to examine whether or not L(x) is invariant under the gauge
transformation (22.293). We will discuss this issue in Chap. 23.
In the subsequent sections, we perform the canonical quantization of the electro-
magnetic field using the canonical coordinates and canonical momenta.

22.5.3 Polarization Vectors of the Electromagnetic Field [5]

In Part I and Part II, we discussed the polarization vectors of the electromagnetic
fields within the framework of the classical theory. In this section, we revisit the
related discussion and rebuild the polarization features of the fields from the rela-
tivistic point of view.
Previously, we have mentioned that the electromagnetic waves have two inde-
pendent polarization states as the transverse modes. Since in this section we are
dealing with the electromagnetic fields as the four-vectors, we need to consider two
extra polarization freedoms, i.e., the longitudinal mode and scalar mode. Then, we
distinguish these modes as the polarization vectors such as

εμ ðk, r Þ or εμ ðk, r Þ ðμ, r = 0, 1, 2, 3Þ,

where the index μ denotes the component of the four-vector (either contravariant or
covariant); k is a wavenumber vector (see Part II); r represents the four linearly
independent polarization states for each k. The modes εμ(k, 1) and εμ(k, 2) are called
transverse polarizations, εμ(k, 3) longitudinal polarization, εμ(k, 0) scalar polariza-
tion. The quantity εμ(k, r) is a complex c-number. Also, note that
1060 22 Quantization of Fields

k0
k= with k0 = k0 = jkj: ð22:320Þ
k

Equation (22.320) is a consequence of the fact that photon is massless. The


orthonormality and completeness relations are read as [5]

εμ ðk, r Þ εμ ðk, sÞ = - ζ r δrs ðr, s = 0, 1, 2, 3Þ, ð22:321Þ


3 3
ζ r εμ ðk, r Þ εν ðk, r Þ = - ημν or ζ r εμ ðk, r Þ εν ðk, r Þ = - δμν ð22:322Þ
r=0 r=0

with

ζ 0 = - 1, ζ 1 = ζ 2 = ζ 3 = 1: ð22:323Þ

Equation (22.321) means that

εμ ðk, 0Þ εμ ðk, 0Þ = 1 > 0, εμ ðk, r Þ εμ ðk, r Þ = - 1 < 0 ðr = 1, 2, 3Þ: ð22:324Þ

If εμ(k, r) is real, εμ(k, 0) is a timelike vector and that εμ(k, r) (r = 1, 2, 3) are


spacelike vectors in the Minkowski space. We show other important relations in a
special frame where k is directed to the positive direction of the z-axis. We have

1 0
εμ ðk, 0Þ = , εμ ðk, r Þ = ðr = 1, 2, 3Þ, ð22:325Þ
0 εðk, r Þ

where ε(k, r) denotes a usual three-dimensional vector; to specify it we use a Gothic


letter ε. If we wish to represent εμ(k, 0) in a general frame, we denote it by a covariant
notation [5]

εμ ðk, 0Þ  nμ : ð22:326Þ

Also, we have

k  εðk, r Þ = 0 ðr = 1, 2Þ, εðk, 3Þ = k=jkj,


ð22:327Þ
εðk, r Þ  εðk, sÞ = δrs ðr, s = 1, 2, 3Þ:

Denoting εμ(k, 3) by a covariant form, we have [5]


22.5 Quantization of the Electromagnetic Field [5–7] 1061

k μ - ðknÞnμ
εμ ðk, 3Þ = , ð22:328Þ
ðknÞ2 - k 2

where the numerator was obtained by subtracting the timelike component from
k0
k= ; related discussion for the three-dimensional version can be seen in
k
Sect. 7.3.
In (22.321) and (22.322), we used complex numbers for εμ(k, s), which is
generally used for the circularly (or elliptically) polarized light (Sect. 7.4). With a
linearly polarized light, we use a real number for εμ(k, s).
All the above discussions relevant to the polarization features of the electromag-
netic fields can be dealt with within the framework of the classical theory of
electromagnetism.

22.5.4 Canonical Quantization of the Electromagnetic Field

The quantization of the electromagnetic field is inherently represented in a tensor


form. This differs from those for the scalar field and the Dirac field. This feature is
highly important with both the classical and quantum electromagnetism. As in the
case of the scalar field, the canonical quantization is translated into the commutation
relations between the creation and annihilation operators. Unlike the scalar field and
Dirac field, however, the classical polarization vectors of photon are incorporated
into the commutation relations.
The equal-time canonical commutation relation is described as

½Aμ ðt, xÞ, π ν ðt, yÞ = iδμν δ3 ðx - yÞ: ð22:329Þ

Equation (22.329) corresponds to (22.84) of the scalar field case and is clearly
represented in the tensor form, i.e., a direct product of a contravariant vector Aμ(t, x)
and a covariant vector π ν(t, y). Multiplying both sides of (22.329) by ηνρ, we obtain

½Aμ ðt, xÞ, π ρ ðt, yÞ = iημρ δ3 ðx - yÞ: ð22:330Þ

This is a tensor described as a direct product of two contravariant vectors. Using


(22.319), we get

Aμ ðt, xÞ, A_ ν ðt, yÞ = - iδμν δ3 ðx - yÞ, ð22:331Þ


ρ
Aμ ðt, xÞ, A_ ðt, yÞ = - iημρ δ3 ðx - yÞ: ð22:332Þ
1062 22 Quantization of Fields

In (22.331) and (22.332), the minus sign of RHS comes from (22.319). All the
other commutators of other combinations vanish. (That is, all the other combinations
are commutative.) We have, e.g.,

μ ν
½Aμ ðt, xÞ, Aν ðt, yÞ = A_ ðt, xÞ, A_ ðt, yÞ = 0:

We perform the quantization of the electromagnetic field in accordance with


(22.98) of the scalar field (Sect. 22.3.2). Taking account of the polarization index
r (=0, ⋯, 3), we expand the identity operator E such that

1 3
E= δ k 0 2 - k2 j p, rihp, r j dp4 : ð22:333Þ
-1 r=0

Notice that in (22.333) photon is massless. As the Fourier expansion of the field
operator Aμ(x) we obtain [2]

d3 k
Aμ ðxÞ =
ð2π Þ3 ð2k 0 Þ
3
× aðk, r Þεμ ðk, r Þe - ikx þ a{ ðk, r Þεμ ðk, r Þ eikx : ð22:334Þ
r=0

In (22.334), the summation with r is pertinent to a(k, r) [or a{(k, r)] and εμ(k, r)
[or εμ(k, r)]. In comparison with (22.208), in (22.334) we use the following
correspondence:

aþ ðpÞ $ aðk, r Þ and a{- ð- pÞ $ a{ ðk, r Þ

as well as

Φþ p, E p $ εμ ðk, r Þ and Φ - - p, - E p $ εμ ðk, r Þ :

Then, as the decomposition of Aμ(x) into the positive-frequency term Aμ+(x) and
negative-frequency term Aμ-(x), we have

Aμ ðxÞ = Aμþ ðxÞ þ Aμ - ðxÞ

with
22.5 Quantization of the Electromagnetic Field [5–7] 1063

3
d3 k
Aμþ ðxÞ  aðk, r Þεμ ðk, r Þe - ikx ,
3
ð2π Þ ð2k 0 Þ r=0
ð22:335Þ
3
μ- d3 k { μ  ikx
A ð xÞ  a ðk, r Þε ðk, r Þ e :
ð2π Þ3 ð2k 0 Þ r=0

Note that Aμ(x) is Hermitian, as is the case with the scalar field ϕ(x). With the
covariant quantized electromagnetic field Aμ(x), we have

3
d3 k
Aμ ð x Þ =
ð2π Þ3 ð2k0 Þ r=0

 aðk, r Þεμ ðk, r Þe - ikx þ a{ ðk, r Þεμ ðk, r Þ eikx : ð22:336Þ

The operator Aμ(x) is also divided into

Aμ ðxÞ = Aμ þ ðxÞ þ Aμ - ðxÞ,

where Aμ þ ðxÞ and Aμ - ðxÞ are similarly defined as in the case of (22.335).
Now, we define operators Bμ ðkÞ as

3
Bμ ð kÞ  aðk, r Þεμ ðk, r Þ: ð22:337Þ
r=0

Taking the adjoint of (22.337), we get

3
Bμ ð kÞ { = a{ ðk, r Þεμ ðk, r Þ : ð22:338Þ
r=0

Using the operators Bμ ðkÞ and Bμ ðkÞ{ , we may advance discussion of the
quantization in parallel with that of the scalar field. Note the following correspon-
dence between the photon field and scalar field:

Bμ ðkÞ $ aðkÞ, Bμ ðkÞ{ $ a{ ðkÞ:

Remember once again that both the photon field operator and real scalar field
operator are Hermitian (and bosonic field operators), allowing us to carry out the
parallel discussion on the field quantization. As in the case of (22.104), we have
1064 22 Quantization of Fields

d3 x μ
Bμ ðkÞe - ik0 t = i A_ ðt, xÞ - ik 0 Aμ ðt, xÞ e - iqx , ð22:339Þ
ð2π Þ3 ð2k 0 Þ

d3 x μ
Bμ ðkÞ{ eik0 t = - i A_ ðt, xÞ þ ik 0 Aμ ðt, xÞ eiqx : ð22:340Þ
ð2π Þ3 ð2k0 Þ

In light of the quantization of the scalar field represented by (22.105), we


acknowledge that the following relationship holds:

Bμ ðkÞ, Bν ðqÞ{ = - δμν δ3 ðk - qÞ, ð22:341Þ

where Bν ðqÞ{ is the covariant form corresponding to Bν ðqÞ{ ; the minus sign of the
head of RHS in (22.341) comes from (22.319). In fact, we have

3 3
Bμ ðkÞ, Bν ðqÞ{ = aðk, r Þεμ ðk, rÞ, a{ ðq, sÞεν ðq, sÞ
r=0 s=0
3 3
= εμ ðk, r Þεν ðq, sÞ aðk, r Þ, a{ ðq, sÞ
r=0 s=0
3 3
= ζ r εμ ðk, r Þεν ðq, sÞ ζ r aðk, r Þ, a{ ðq, sÞ , ð22:342Þ
r=0 s=0

where with the last equality we used ζ r2 = 1.


For (22.341) and (22.342) to be identical, taking account of (22.322) we must
have [5]

ζ r aðk, r Þ, a{ ðq, sÞ = δrs δ3 ðk - qÞ ð22:343Þ

or

aðk, r Þ, a{ ðq, sÞ = ζ r δrs δ3 ðk - qÞ: ð22:344Þ

Also, as another commutation relation we have

½aðk, r Þ, aðq, sÞ = a{ ðk, r Þ, a{ ðq, sÞ = 0:

Then, (22.342) becomes


22.5 Quantization of the Electromagnetic Field [5–7] 1065

3 3
Bμ ðkÞ, Bν ðqÞ{ = ζ r εμ ðk, r Þεν ðq, sÞ δrs δ3 ðk - qÞ
r=0 s=0
3
= ζ r εμ ðk, r Þεν ðq, r Þ δ3 ðk - qÞ
r=0 ð22:341Þ
3
= ζ r εμ ðq, r Þεν ðq, r Þ δ3 ðk - qÞ
r=0
= - δμν δ3 ðk - qÞ

as required. With the second to the last equality of (22.341), we used (10.116), i.e.,
the property of the δ function; with the last equality we used (22.322). It is worth
mentioning that RHS of (22.344) contains ζ r. This fact implies that the quantization
of the radiation field is directly associated with the polarization feature of the
classical electromagnetic field.

22.5.5 Hamiltonian and Indefinite Metric

In the last section, we had the commutation relation (22.344) with the creation and
annihilation operators as well as the canonical commutation relations (22.331) and
(22.332). These equations differ from the corresponding commutation relations
(22.84) and (22.105) of the scalar field in that (22.331) has a minus sign and that
(22.344) has a factor ζ r. This feature causes problems with the subsequent
discussion.
In relation to this issue, we calculate the Hamiltonian density ℋ(x) of the
electromagnetic field. It is given by

ℋðxÞ = π μ ðxÞA_ μ ðxÞ - LðxÞ


μ 1 ð22:345Þ
= - A_ ðxÞA_ μ ðxÞ þ ∂μ Aν ∂ Aν ,
μ
2

where with the second equality we used (22.318) and (22.319). As in the case of the
scalar field, the Hamiltonian H is given by the triple integral of ℋ(x) such that

H= d3 x ℋðxÞ:

The calculations are similar to those of the scalar field (Sect. 22.3.3). We estimate
the integral termwise. With the first term of (22.345), using (22.334) and (22.336) we
have
1066 22 Quantization of Fields

μ
- A_ ðxÞA_ μ ðxÞ
3
-1 d3 k
= p aðk, r Þεμ ðk, r Þð - ik 0 Þe - ikx þ a{ ðk, r Þεμ ðk, r Þ ðik 0 Þeikx
2ð2π Þ3 k0 r=0
3
d q3
× p aðq, sÞεμ ðq, sÞð - iq0 Þe - iqx þ a{ ðq, sÞεμ ðq, sÞ ðiq0 Þeiqx
q0 s=0
3
1
= d3 k k0 - aðk, r Þεμ ðk, r Þe - ikx þ a{ ðk, r Þεμ ðk, r Þ eikx
2ð2π Þ3 r=0
3
p
× d 3 q q0 - aðq, sÞεμ ðq, sÞe - iqx þ a{ ðq, sÞεμ ðq, sÞ eiqx :
s=0
ð22:346Þ

The integrand of (22.346) is given by

½Integrand of ð22:346Þ
3 3
= k 0 q0 - aðk, r Þεμ ðk, r Þa{ ðq, sÞεμ ðq, sÞ e - iðk - qÞx
r=0 s=0

- a{ ðk, r Þεμ ðk, r Þ aðq, sÞεμ ðq, sÞeiðk - qÞx


þaðk, r Þεμ ðk, r Þaðq, sÞεμ ðq, sÞe - iðkþqÞx
þa{ ðk, r Þεμ ðk, r Þ a{ ðq, sÞεμ ðq, sÞ eiðkþqÞx :

μ
To calculate d3 x - A_ ðxÞA_ μ ðxÞ , we exchange the integration order between
the variable x and variable k (or variable q). That is, performing the integration with
respect to x first, we obtain

d3 x ½Intrgrand of ð22:346Þ
3 3
= k 0 q0 - aðk, r Þεμ ðk, r Þa{ ðq, sÞεμ ðq, sÞ e - iðk0 - q0 Þ ð2π Þ3 δðk - qÞ
r=0 s=0

- a{ ðk, r Þεμ ðk, r Þ aðq, sÞεμ ðq, sÞeiðk0 - q0 Þ ð2π Þ3 δðk - qÞ

þaðk, r Þεμ ðk, r Þaðq, sÞεμ ðq, sÞe - iðk0 þq0 Þ ð2π Þ3 δðk þ qÞ

þa{ ðk, r Þεμ ðk, r Þ a{ ðq, sÞεμ ðq, sÞ eiðk0 þq0 Þ ð2π Þ3 δðk þ qÞ :
ð22:347Þ

Further performing the integration of (22.347) with respect to q, we obtain


22.5 Quantization of the Electromagnetic Field [5–7] 1067

d 3 q ð22:347Þ
3 3
= ð2π Þ3 k 0 - aðk, r Þεμ ðk, r Þa{ ðk, sÞεμ ðk, sÞ
r=0 s=0

- a{ ðk, r Þεμ ðk, r Þ aðk, sÞεμ ðk, sÞ

þaðk, r Þεμ ðk, r Það - k, sÞεμ ð - k, sÞe - 2ik0

þa{ ðk, r Þεμ ðk, r Þ a{ ð - k, sÞεμ ð- k, sÞ e2ik0 : ð22:348Þ

Only when k = - k, the last two terms of (22.348) are physically meaningful. In
that case, however, we have k = 0. But this implies no photon state, and so we
discard these two terms. Compare the present situation with that of (22.111), i.e., the
scalar field case, where k = 0 is physically meaningful. Then, using (22.321) we get

d 3 q ð22:347Þ
3 3
= ð2π Þ3 k 0 - aðk, r Þεμ ðk, sÞð - ζ r δrs Þ - a{ ðk, r Þaðk, sÞð - ζ r δrs Þ
r=0 s=0
3
= ð2π Þ3 k 0 ζ r aðk, r Þa{ ðk, r Þ þ a{ ðk, r Þaðk, r Þ
r=0
3
= ð2π Þ3 k 0 ζ r a{ ðk, r Þaðk, r Þ þ ζ r δ3 ð0Þ þ a{ ðk, r Þaðk, r Þ
r=0
3
= ð2π Þ3 k 0 ζ r 2a{ ðk, r Þaðk, r Þ þ ζ r 2 δ3 ð0Þ
r=0
3
= ð2π Þ3 k 0 ζ r 2a{ ðk, r Þaðk, r Þ þ ð2π Þ3 k0 4δ3 ð0Þ,
r=0

where with the third equality we used (22.344); with the last equality we used
(22.323). Finally, we get

μ
d3 x - A_ ðxÞA_ μ ðxÞ
3 ð22:349Þ
{
= d kk 0
3
ζ r a ðk, r Þaðk, r Þ þ 2 d kk0 δ ð0Þ:
3 3

r=0

In (22.349) the second term leads to the divergence as discussed in Sect. 22.3.3.
As before, we discard this term; namely the normal order is implied there.
1068 22 Quantization of Fields

As for the integration of the second term of (22.345), use the Fourier expansion
formulae (22.334) and (22.336) in combination with the δ function formula (22.101).
That term will vanish because of the presence of a factor

k μ kμ = k0 k 0 - jkj2 = 0:

The confirmation is left for readers. Thus, we obtain the Hamiltonian H described
by

3
H= d 3 kk 0 ζ r a{ ðk, r Þaðk, r Þ: ð22:350Þ
r=0

In (22.350) the presence of a factor ζ0 = - 1 causes a problem. It is because


ζ 0 leads to a negative integrand in (22.350). The argument is as follows: Defining the
one-photon state j1q, si as [5]

j 1q,s i  a{ ðq, sÞ j 0i, ð22:351Þ

where j0i denotes the vacuum, its energy is

3
H j 1q,s i = d 3 kk0 ζ r a{ ðk, r Þaðk, r Þa{ ðq, sÞ j 0i
r=0
3
= d 3 kk0 ζ r a{ ðk, r Þ a{ ðq, sÞaðk, r Þ þ ζ r δrs δ3 ðk - qÞ j 0i
r=0
3
= d 3 kk0 ζ r a{ ðk, r Þζ r δrs δ3 ðk - qÞ j 0i
r=0

= k0 ðζ s Þ2 a{ ðq, sÞ j 0i = k0 a{ ðq, sÞ j 0i = k 0 j 1q,s i,


ð22:352Þ

where with the second equality we used (22.344); with the third equality we used
a(k, r) j 0i = 0; with the third to the last equality we used (22.323). Thus, (22.352)
certainly shows that the energy of the one-photon state is k0.
As a square of the norm h1q, s| 1q, si [see (13.8)], we have

1q,s j1q,s = 0jaðq, sÞa{ ðq, sÞj0 = 0ja{ ðq, sÞaðq, sÞ þ ζ s δ3 ðq - qÞj0
ð22:353Þ
= ζ s δ3 ð0Þh0j0i = ζs δ3 ð0Þ,

where with the last equality we assumed that the vacuum state is normalized. With
s = 0, we have ζ 0 = - 1 and, hence,
22.5 Quantization of the Electromagnetic Field [5–7] 1069

1q,0 j1q,0 = - δ3 ð0Þ: ð22:354Þ

This implies that the (square of) norm of a scalar photon is negative. The negative
norm is said to be an indefinite metric. Admittedly, (22.354) violates the rule of inner
product calculations (see Sect. 13.1). In fact, such a metric is intractable, and so it
should be appropriately dealt with. So far as the factor a(k, 0)a{(k, 0) is involved, we
could not avoid inconvenience. We wish to remove it from quantum-mechanical
operators.
At the same time, we have another problem with the Lorentz condition described
by

∂μ Aμ ðxÞ = 0: ð22:355Þ

The Lorentz condition is incompatible with the equal-time canonical commuta-


tion relation. That is, we have [2]

A0 ðt, xÞ, ∂μ Aμ ðt, xÞ = A0 ðt, xÞ, ∂0 A0 ðt, xÞ þ A0 ðt, xÞ, ∂k Ak ðt, xÞ


ð22:356Þ
= A0 ðt, xÞ, - π 0 ðt, xÞ = - iδ3 ðx - yÞ ≠ 0:

This problem was resolved by Gupta and Bleuler by replacing the Lorentz
condition (22.355) with a weaker condition. The point is that we divide the
Hermitian operator Aμ(x) into Aμ+(x) and Aμ-(x) such that

Aμ ðxÞ = Aμþ ðxÞ þ Aμ - ðxÞ

with

3
d3 k
Aμþ ðxÞ  aðk, r Þεμ ðk, r Þe - ikx ,
3
ð2π Þ ð2k 0 Þ r=0
ð22:335Þ
3
μ- d3 k { μ  ikx
A ð xÞ  a ðk, r Þε ðk, r Þ e :
ð2π Þ3 ð2k 0 Þ r=0

We also require Aμ+(x) to be

∂μ Aμþ ðxÞ j Ψi = 0, ð22:357Þ

where jΨi is any physically meaningful state. Taking an adjoint of (22.357), we have

hΨ j ∂μ Aμ - ðxÞ = 0: ð22:358Þ

Accordingly, we obtain
1070 22 Quantization of Fields

Ψ ∂μ Aμ ðxÞ Ψ = Ψ ∂μ Aμ - ðxÞ þ ∂μ Aμþ ðxÞ Ψ = 0: ð22:359Þ

Equation (22.359) implies that the Lorentz condition holds with an “expectation
value.” Substituting Aμ+(x) of (22.335) for (22.357), we get [2]

3
k μ aðk, r Þεμ ðk, r Þ j Ψi = 0: ð22:360Þ
r=0

With the above equation, notice that the functions e-ikx in (22.335) are linearly
independent with respect to individual k. Recalling the special coordinate frame
where k is directed to the positive direction of the z-axis, we have [2]

kμ = ðk0 0 0 - k0 Þ ðk 0 > 0Þ,

where k1 = k2 = 0 and k3 = - k0.


Also, designating εμ(k, r) as [2]

1 0
0 1
ð0Þ ð1Þ
ε0 ðk, 0Þ =  δ0 , ε1 ðk, 1Þ =  δ1 ,
0 0

0 0
ð22:361Þ
0 0
0 0
ð2Þ ð3Þ
ε2 ðk, 2Þ =  δ2 , ε3 ðk, 3Þ =  δ3 ,
1 0

0 1

we obtain

3 3
ð0Þ ð3Þ
kμ aðk, r Þεμ ðk, r Þ = k 0 aðk, r Þδ0 þ k 3 aðk, r Þδ3
r=0 r=0
= k0 ½aðk, 0Þ - aðk, 3Þ:

ð0Þ ð3Þ
With δ0 , δ3 , etc. in (22.361), see (12.175). Then, (22.360) is rewritten as

k 0 ½aðk, 0Þ - aðk, 3Þ j Ψi = 0: ð22:362Þ

Since k0 ≠ 0, we have
22.5 Quantization of the Electromagnetic Field [5–7] 1071

½aðk, 0Þ - aðk, 3Þ j Ψi = 0: ð22:363Þ

Taking the adjoint of (22.363), we have

hΨ j a{ ðk, 0Þ - a{ ðk, 3Þ = 0: ð22:364Þ

Multiplying both sides of (22.363) by hΨ j a{(k, 0) or hΨ j a{(k, 3) from the left


and multiplying both sides of (22.364) by a(k, 0) j Ψi or a(k, 3) j Ψi from the right,
and then summing both sides of all these four equations, we get

Ψ a{ ðk, 0Þaðk, 0Þ - a{ ðk, 3Þaðk, 3Þ Ψ = 0: ð22:365Þ

Further multiplying both sides of (22.365) by k0 and performing the triple


integration with respect to k, we have

0= d3 kk 0 Ψ a{ ðk, 0Þaðk, 0Þ - a{ ðk, 3Þaðk, 3Þ Ψ : ð22:366Þ

Meanwhile, sandwiching both sides of (22.350) between hΨj and jΨi, we obtain

3
hΨjH jΨi = d3 kk0 Ψ ζ r a{ ðk, r Þaðk, r Þ Ψ : ð22:367Þ
r=0

Finally, summing both sides of (22.366) and (22.367), we get

2
hΨjH jΨi = d3 kk 0 Ψ a{ ðk, r Þaðk, r Þ Ψ : ð22:368Þ
r=1

Thus, we have successfully gotten rid of the factor a{(k, 0)a(k, 0) together with
{
a (k, 3)a(k, 3) from the Hamiltonian.

22.5.6 Feynman Propagator of Photon Field

As in the case of both the scalar field and Dirac field, we wish to find the Feynman
propagator of the photon field. First, we calculate [Aμ(x), Aν( y)]. Using (22.334), we
have
1072 22 Quantization of Fields

½Aμ ðxÞ, Aν ðyÞ


3
d3 kd3 q
= faðk, r Þεμ ðk, rÞe - ikx
ð2π Þ3 2 k0 q0 r=0

3
þa{ ðk, rÞεμ ðk, rÞ eikx g, aðq, sÞεν ðq, sÞe - iqx þ a{ ðq, sÞεν ðq, sÞ eiqx
s=0

3 3
d 3 kd3 q
= εμ ðk, r Þεν ðq, sÞ e - ikx eiqx aðk, r Þ, a{ ðq, sÞ
ð2π Þ3 2 k 0 q0 r=0 s=0

þεμ ðk, r Þ εν ðq, sÞeikx e - iqx a{ ðk, r Þ, aðq, sÞ


3
d3 k
= 3
ζ r εμ ðk, r Þεν ðk, r Þ e - iðkx - qyÞ - ζ r εμ ðk, r Þ εν ðk, r Þeiðkx - qyÞ
ð2π Þ 2 k0 q0 r=0

d3 kð- ημν Þ - ikðx - yÞ


= e - eikðx - yÞ , ð22:369Þ
ð2π Þ3 2k 0

where the second to the last equality we used (22.344) and with the last equality we
used (22.322). Comparing (22.369) with (22.169), we find that (22.369) differs from
(22.169) only by the factor -ημν. Also, notice that photon is massless (m = 0).
Therefore, we have

lim K = lim k2 þ m2 = jkj = k0 ðk 0 > 0Þ: ð22:370Þ


m→0 m→0

Thus, k0 corresponds to K of (22.169). In conjunction with (22.131), we define


the invariant delta function Dμν(x - y) such that [5, 6]

iDμν ðx - yÞ  ½Aμ ðxÞ, Aν ðyÞ = lim ½ - iημν Δðx - yÞ: ð22:371Þ


m→0

Once we have gotten the invariant delta function Dμν(x) in the form of (22.369),
we can immediately obtain the Feynman propagator with respect to the photon field.
Defining the Feynman propagator DμνF ðxÞ as

Dμν μ ν
F ðxÞ  h0jT½A ðxÞA ðyÞj0i ð22:372Þ

and taking account of (22.369), we get

Dμν μν
F ðxÞ = lim ½ - η ΔF ðx - yÞ:
m→0

Writing it explicitly, we have


22.5 Quantization of the Electromagnetic Field [5–7] 1073

-i e - ikðx - yÞ
Dμν
F ð x - yÞ = d4 kð- ημν Þ
ð2π Þ4 - k2 - iE
-i ημν e - ikðx - yÞ
= d4 k : ð22:373Þ
ð2π Þ4 k2 þ iE

Using the momentum space representation, we have

μν - iημν iημν
DF ðk Þ = = = lim - ημν ΔF ðkÞ : ð22:374Þ
k þ iE
2
- k 2 - iE m → 0

Equation (22.374) can further be deformed using (22.322), (22.323), (22.326),


and (22.328). That is, (22.322) is rewritten as

2
ημν = - εμ ðk, r Þ εν ðk, r Þ - εμ ðk, 3Þ εν ðk, 3Þ þ εμ ðk, 0Þ εν ðk, 0Þ
r=1

½kμ - ðknÞnμ ½k ν - ðknÞnν 


2
=- εμ ðk, r Þ εν ðk, r Þ - þ nμ nν ð22:375Þ
r=1 ðknÞ2 - k2
½kμ kν - ðknÞðkμ nν þ nμ k ν Þ þ nμ nν k 2
2
=- εμ ðk, r Þ εν ðk, r Þ - :
r=1 ðknÞ2 - k 2

The quantity ημν in (22.374) can be replaced with that of (22.375) and the
μν
resulting DF ðk Þ can be used for the calculations of interaction among the quantum
fields.
Meanwhile, in (22.375) we assume that k2 ≠ 0 in general cases. With the real
photon we have k2 = 0. Then, we get

kμ k ν - ðknÞðkμ nν þ nμ kν Þ
2
ημν = - εμ ðk, r Þ εν ðk, r Þ - : ð22:376Þ
r=1 ðknÞ2

Equation (22.376) is rewritten as

2 k μ kν - ðknÞðkμ nν þ nμ k ν Þ
r=1
εμ ðk, r Þ εν ðk, r Þ = - ημν - : ð22:377Þ
ðknÞ2

Equation (22.377) will be useful in evaluating the matrix elements of the S-matrix
including the Feynman amplitude; see the next chapter, especially Sect. 23.8.
1074 22 Quantization of Fields

References

1. Goldstein H, Poole C, Safko J (2002) Classical mechanics, 3rd edn. Addison Wesley, San
Francisco
2. Sakamoto M (2014) Quantum field theory. Shokabo, Tokyo. (in Japanese)
3. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York
4. Sunakawa S (1991) Quantum mechanics. Iwanami, Tokyo. (in Japanese)
5. Mandl F, Shaw G (2010) Quantum field theory, 2nd edn. Wiley, Chichester
6. Itzykson C, Zuber J-B (2005) Quantum field theory. Dover, New York
7. Kaku M (1993) Quantum field theory. Oxford University Press, New York
8. Kugo C (1989) Quantum theory of gauge field I. Baifukan, Tokyo. (in Japanese)
9. Greiner W, Reinhardt J (1996) Field quantization. Springer, Berlin
10. Boas ML (2006) Mathematical methods in the physical sciences, 3rd edn. Wiley, New York
11. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
12. Schweber SS (1989) An introduction to relativistic quantum field theory. Dover, New York
Chapter 23
Interaction Between Fields

So far, we have focused on the characteristics of the scalar field, Dirac field, and
electromagnetic field that behave as free fields without interaction among one
another. In this chapter, we deal with the interaction between these fields. The
interaction is described as that between the quantized fields, whose important
features have been studied in Chap. 22. The interactions between the fields are
presented as non-linear field equations. It is difficult to obtain analytic solutions of
the non-linear equation, and so suitable approximation must be done. The method
for this is well-established as the perturbation theory that makes the most of the S-
matrix. Although we have dealt with the perturbation method in Chap. 5, the method
we use in the present chapter differs largely from the previous method. Here we
focus on the interaction between the Dirac field (i.e., electron) and the electromag-
netic field (photon). The physical system that comprises electrons and photons has
been fully investigated and established as the quantum electrodynamics (QED). We
take this opportunity to get familiar with this major field of quantum mechanics. In
the last part of the chapter, we deal with the Compton scattering as a typical example
to deepen understanding of the theory [1].

23.1 Lorentz Force and Minimal Coupling

Although in the quantum theory of fields abstract and axiomatic approaches tend to
be taken, we wish to adhere to an intuitive approach to the maximum extent possible.
To that effect, we get back to the Lorentz force we studied in Chaps. 4 and 15. The
Lorentz force and its nature are widely investigated in various areas of natural
science. The Lorentz force F(t) is described by

© Springer Nature Singapore Pte Ltd. 2023 1075


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_23
1076 23 Interaction Between Fields

Fðt Þ = eEðxðt ÞÞ þ exð_t Þ × Bðxðt ÞÞ, ð4:88Þ

where the first term is electric Lorentz force and the second term represents the
magnetic Lorentz force.
We wish to relate the Lorentz force to the Hamiltonian of the interacting physical
system within the framework of the quantum field theory. To this end, the minimal
coupling (or minimal substitution) is an important concept in association with the
covariant derivative. The covariant derivative is defined as [1]

∂ ∂
i → i - qϕðt, xÞ,
∂t ∂t
1 1
— → — - qAðt, xÞ, ð23:1Þ
i i

where q is a charge of the matter field; ϕ(t, x) and A(t, x) are the electromagnetic
potentials. Using the four-vector notation, we have

∂μ → Dμ  ∂μ þ iqAμ ðxÞ, ð23:2Þ

where Dμ defines the covariant derivative, which leads to the interaction between the
electron and photon. This interaction is referred to as a minimal coupling. To see
how the covariant derivative produces the interaction, we replace ∂μ with Dμ in
(22.213) such that

$$\left(i\gamma^\mu D_\mu - m\right)\psi(x) = 0. \tag{23.3}$$

Then, we have

$$\left[i\gamma^\mu\left(\partial_\mu + ieA_\mu\right) - m\right]\psi(x) = 0, \tag{23.4}$$

where e is the elementary charge (e < 0), as defined in Chap. 4. Rewriting (23.4), we get

$$i\gamma^0\partial_0\psi = \left(-i\sum_{k=1}^{3}\gamma^k\partial_k + e\gamma^0 A_0 + e\sum_{k=1}^{3}\gamma^k A_k + m\right)\psi. \tag{23.5}$$

Multiplying both sides of (23.5) by $\gamma^0$, we obtain

$$
\begin{aligned}
i\partial_0\psi &= \left(-i\sum_{k=1}^{3}\gamma^0\gamma^k\partial_k + e\gamma^0\gamma^0 A_0 + e\sum_{k=1}^{3}\gamma^0\gamma^k A_k + m\gamma^0\right)\psi \\
&= \left(-i\sum_{k=1}^{3}\gamma^0\gamma^k\partial_k + m\gamma^0 + eA_0 + e\sum_{k=1}^{3}\gamma^0\gamma^k A_k\right)\psi \\
&= \left[\frac{1}{i}\,\boldsymbol{\alpha}\cdot\nabla + m\beta + \left(eA_0 - e\,\boldsymbol{\alpha}\cdot\mathbf{A}\right)\right]\psi, 
\end{aligned} \tag{23.6}
$$
i

where $\boldsymbol{\alpha} \equiv \gamma^0\boldsymbol{\gamma}$ with $\boldsymbol{\gamma} = \begin{pmatrix} \gamma^1 \\ \gamma^2 \\ \gamma^3 \end{pmatrix}$ and $\beta \equiv \gamma^0$. The operator $\frac{1}{i}\boldsymbol{\alpha}\cdot\nabla + m\beta$ is identical with the one-particle Hamiltonian $\tilde{H}$ (of the free Dirac field) defined in (22.222). Therefore, the operator $\left(eA_0 - e\,\boldsymbol{\alpha}\cdot\mathbf{A}\right)$ in (23.6) is the interaction Hamiltonian $H_{\rm int}$ that represents the interaction between the Dirac field and the electromagnetic field. Writing (23.6) symbolically, we have

$$i\partial_0\psi(x) = \left(\tilde{H} + H_{\rm int}\right)\psi(x), \tag{23.7}$$

where

$$\tilde{H} = \frac{1}{i}\,\boldsymbol{\alpha}\cdot\nabla + m\beta \tag{23.8}$$

and

$$H_{\rm int} = eA_0 - e\,\boldsymbol{\alpha}\cdot\mathbf{A}. \tag{23.9}$$

Defining the one-particle Hamiltonian H as

$$H = \tilde{H} + H_{\rm int}, \tag{23.10}$$

we get

$$i\partial_0\psi(x) = H\psi(x). \tag{23.11}$$

Equations (23.7) and (23.11) show the Schrödinger equation of the Dirac field
that interacts with the electromagnetic field. The interaction Hamiltonian Hint is
directly associated with the Lorentz force F(t) described by (4.88).
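As a quick consistency check on the operator content of (23.6)-(23.10), the following numerical sketch (not taken from the book; it assumes the standard Dirac representation of the gamma matrices) verifies that $\boldsymbol{\alpha} \equiv \gamma^0\boldsymbol{\gamma}$ and $\beta \equiv \gamma^0$ are Hermitian and satisfy $\{\alpha^i, \alpha^j\} = 2\delta_{ij}$, $\{\alpha^i, \beta\} = 0$, and $\beta^2 = 1$, which is what makes $\tilde{H}$ and $H_{\rm int}$ Hermitian operators.

```python
import numpy as np

# Pauli matrices and the 4x4 gamma matrices in the Dirac representation
s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]
I2, Z2 = np.eye(2), np.zeros((2, 2))

beta = np.block([[I2, Z2], [Z2, -I2]]).astype(complex)   # gamma^0
gamma = [np.block([[Z2, sk], [-sk, Z2]]) for sk in s]    # gamma^1..gamma^3
alpha = [beta @ gk for gk in gamma]                      # alpha^k = gamma^0 gamma^k

def hermitian(M):
    return np.allclose(M, M.conj().T)

assert hermitian(beta) and all(hermitian(a) for a in alpha)

# Dirac algebra: {alpha_i, alpha_j} = 2 delta_ij, {alpha_i, beta} = 0, beta^2 = 1
for i in range(3):
    for j in range(3):
        anti = alpha[i] @ alpha[j] + alpha[j] @ alpha[i]
        assert np.allclose(anti, 2.0 * (i == j) * np.eye(4))
    assert np.allclose(alpha[i] @ beta + beta @ alpha[i], np.zeros((4, 4)))
assert np.allclose(beta @ beta, np.eye(4))
print("alpha and beta are Hermitian and satisfy the Dirac algebra")
```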
Representing E and B using the electromagnetic potential as in (22.279) and
(22.283), we get

$$\mathbf{F}(t) = -e\nabla\phi(\mathbf{x}(t)) - e\frac{\partial\mathbf{A}(\mathbf{x}(t))}{\partial t} + e\,\dot{\mathbf{x}}(t)\times\mathrm{rot}\,\mathbf{A}(\mathbf{x}(t)). \tag{23.12}$$

The second term of (23.12) can be rewritten as [2]

$$e\,\dot{\mathbf{x}}(t)\times\mathrm{rot}\,\mathbf{A} = e\nabla_A\left(\mathbf{A}\cdot\dot{\mathbf{x}}(t)\right) - e\left(\dot{\mathbf{x}}(t)\cdot\nabla\right)\mathbf{A}, \tag{23.13}$$

where $\nabla_A$ acts only on $\mathbf{A}$ (i.e., the vector potential). Then, (23.12) is further rewritten as
$$\mathbf{F}(t) = -e\nabla\phi(\mathbf{x}(t)) - e\frac{\partial\mathbf{A}(\mathbf{x}(t))}{\partial t} + e\nabla_A\left(\mathbf{A}\cdot\dot{\mathbf{x}}(t)\right) - e\left(\dot{\mathbf{x}}(t)\cdot\nabla\right)\mathbf{A}.$$
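The vector identity (23.13) can be checked symbolically. The sketch below (an illustration only, not the book's derivation) verifies componentwise that $\dot{\mathbf{x}}\times\mathrm{rot}\,\mathbf{A} = \nabla(\mathbf{A}\cdot\dot{\mathbf{x}}) - (\dot{\mathbf{x}}\cdot\nabla)\mathbf{A}$ when $\dot{\mathbf{x}}$ is treated as a constant vector, which is precisely the situation in which $\nabla_A$ reduces to the ordinary gradient acting on $\mathbf{A}$ alone.

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
v = sp.symbols('v1 v2 v3', real=True)                   # constant velocity xdot
A = [sp.Function(f'A{i}')(x, y, z) for i in (1, 2, 3)]  # vector potential
X = (x, y, z)

def curl(F):
    return [sp.diff(F[2], X[1]) - sp.diff(F[1], X[2]),
            sp.diff(F[0], X[2]) - sp.diff(F[2], X[0]),
            sp.diff(F[1], X[0]) - sp.diff(F[0], X[1])]

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

lhs = cross(v, curl(A))                                  # xdot x rot A
grad_Av = [sp.diff(sum(ai*vi for ai, vi in zip(A, v)), xi) for xi in X]
v_grad_A = [sum(vi*sp.diff(Ak, xi) for vi, xi in zip(v, X)) for Ak in A]
rhs = [g - d for g, d in zip(grad_Av, v_grad_A)]         # grad(A.v) - (v.grad)A

assert all(sp.simplify(l - r) == 0 for l, r in zip(lhs, rhs))
print("xdot x rot A = grad(A.xdot) - (xdot.grad)A verified")
```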

Of these terms, the conservative-force part $\mathbf{F}_C(t)$ is expressed as

$$\mathbf{F}_C(t) = -e\nabla\phi(\mathbf{x}(t)) + e\nabla_A\left(\mathbf{A}\cdot\dot{\mathbf{x}}(t)\right). \tag{23.14}$$

Integrating (23.14) between two spatial points Q and P, we obtain

$$
\begin{aligned}
-\int_Q^P \mathbf{F}_C(t)\cdot d\mathbf{l} &= e\int_Q^P\left[\nabla\phi(\mathbf{x}(t)) - \nabla_A\left(\mathbf{A}\cdot\dot{\mathbf{x}}(t)\right)\right]\cdot d\mathbf{l} \\
&= e\int_Q^P d\phi(\mathbf{x}(t)) - e\int_Q^P d\left(\dot{\mathbf{x}}(t)\cdot\mathbf{A}\right) \\
&= e\left[\phi(P) - \phi(Q)\right] - e\left[\dot{\mathbf{x}}(t)\cdot\mathbf{A}(P) - \dot{\mathbf{x}}(t)\cdot\mathbf{A}(Q)\right], 
\end{aligned}
$$

where $d\mathbf{l}$ denotes the line element of the integral; we used $\nabla\phi\cdot d\mathbf{l} = d\phi$ and $\nabla_A\left(\mathbf{A}\cdot\dot{\mathbf{x}}(t)\right)\cdot d\mathbf{l} = d\left(\dot{\mathbf{x}}(t)\cdot\mathbf{A}\right)$. Taking $Q \to -\infty$ and setting $\phi(-\infty) = \mathbf{A}(-\infty) = 0$, we get

$$-\int_Q^P \mathbf{F}_C(t)\cdot d\mathbf{l} = e\phi(P) - e\,\dot{\mathbf{x}}(t)\cdot\mathbf{A}(P). \tag{23.15}$$

Comparing (23.15) with (23.6), we find that $\boldsymbol{\alpha}$ corresponds to the classical velocity operator $\dot{\mathbf{x}}(t)$ [3]. Thus, $H_{\rm int}$ of (23.7) corresponds to the work that the Lorentz force exerts on an electron. Notice, however, that $\boldsymbol{\alpha}$ of (23.6) does not carry the notion of a particle velocity, because $\nabla_A$ of (23.14) represents a derivative with respect to the vector potential. Hence, the minimal coupling introduced through the covariant derivative defined in (23.2) properly reflects the interaction between the Dirac field and the electromagnetic field.
If the interaction is sufficiently weak, namely if $H_{\rm int}$ is sufficiently small compared to $\tilde{H}$, we can apply the perturbation method we developed in Chap. 5. That is indeed the case with the problems that follow. Nevertheless, the approaches described in this chapter differ largely from those of Chap. 5. After we present the relevant methods, which partly include somewhat abstract notions, we address practical problems such as the Compton scattering, which we dealt with in Chap. 1.

23.2 Lagrangian and Field Equation of the Interacting Fields

We continue the aforementioned discussion still further. To represent $H_{\rm int}$ in a scalar form using the four-vector potentials, we rewrite (23.9) as

$$H_{\rm int} = e\gamma^0\gamma^\mu A_\mu.$$

Assuming that ψ(x) of (23.11) is normalized, we have the expectation value $\langle H_{\rm int}\rangle$ such that

$$
\begin{aligned}
\langle H_{\rm int}(t)\rangle &= \langle\psi(x)|H_{\rm int}|\psi(x)\rangle \\
&= \int d^3x\, \psi^\dagger(x)\, e\gamma^0\gamma^\mu A_\mu(x)\,\psi(x) \\
&= \int d^3x\, e\bar{\psi}(x)\gamma^\mu A_\mu(x)\psi(x). 
\end{aligned} \tag{23.16}
$$

Thus, the interaction Hamiltonian density $\mathcal{H}_{\rm int}(x)$ is given by the integrand of (23.16) as

$$\mathcal{H}_{\rm int}(x) \equiv e\bar{\psi}(x)\gamma^\mu A_\mu(x)\psi(x). \tag{23.17}$$

Then, the interaction Lagrangian density $\mathcal{L}_{\rm int}(x)$ is given by

$$\mathcal{L}_{\rm int}(x) = -\mathcal{H}_{\rm int}(x) = -e\bar{\psi}(x)\gamma^\mu A_\mu(x)\psi(x). \tag{23.18}$$

Note that in (23.18) we have derived the Lagrangian density from the Hamiltonian density in a way opposite to (22.37). This derivation, however, does not cause a problem, because in the equation of motion of the interacting system we are now considering, the interaction does not involve derivatives of the physical variables (see Sect. 22.1). Or rather, since the kinetic energy is irrelevant to the interaction, from (22.31) and (22.40) we have

$$\mathcal{L}_{\rm int}(x) = -\mathcal{U}_{\rm int}(x) = -\mathcal{H}_{\rm int}(x),$$

where $\mathcal{U}_{\rm int}(x)$ denotes the potential energy density.


On the basis of the above discussion and (22.210) and (22.318), as well as (23.18), the Lagrangian density $\mathcal{L}(x)$ of the interacting fields (we assume the interaction between the Dirac field and the electromagnetic field) is given by

$$
\begin{aligned}
\mathcal{L}(x) &= \mathcal{L}_0(x) + \mathcal{L}_{\rm int}(x) \\
&= \bar{\psi}(x)\left(i\gamma^\mu\partial_\mu - m\right)\psi(x) - \frac{1}{2}\partial_\mu A_\nu(x)\partial^\mu A^\nu(x) - e\bar{\psi}(x)\gamma^\mu A_\mu(x)\psi(x), 
\end{aligned} \tag{23.19}
$$

where $\mathcal{L}_0(x)$ denotes the free-field part.

Note that in some textbooks [1] the sign of the third term is reversed. This is due to the difference in the definition of e (i.e., e < 0 in this book, whereas e > 0 in [1]). Rewriting (23.19) using the covariant derivative, we get

$$
\begin{aligned}
\mathcal{L}(x) &= \bar{\psi}(x)\left[i\gamma^\mu\left(\partial_\mu + ieA_\mu(x)\right) - m\right]\psi(x) - \frac{1}{2}\partial_\mu A_\nu(x)\partial^\mu A^\nu(x) \\
&= \bar{\psi}(x)\left(i\gamma^\mu D_\mu - m\right)\psi(x) - \frac{1}{2}\partial_\mu A_\nu(x)\partial^\mu A^\nu(x). 
\end{aligned} \tag{23.20}
$$

The quantum mechanics featured by the Lagrangian density of (23.19) or (23.20) is called quantum electrodynamics (QED). From now on, we focus on QED and describe its characteristics, including the fundamental principles and practical problems. For future use, we introduce the charge-current density operator $s^\mu(x)$ as follows [1]:

$$s^\mu(x) \equiv q\bar{\psi}(x)\gamma^\mu\psi(x). \tag{23.21}$$

Notice that in the case of the electron, q = e (< 0). Using $s^\mu(x)$, we have

$$\mathcal{L}_{\rm int}(x) = -s^\mu(x)A_\mu(x). \tag{23.22}$$

The conserved charge Q is described by

$$Q = \int d^3x\, s^0(x) = q\int d^3x\, \bar{\psi}(x)\gamma^0\psi(x) = q\int d^3x\, \psi^\dagger(x)\gamma^0\gamma^0\psi(x) = q\int d^3x\, \psi^\dagger(x)\psi(x), \tag{23.23}$$

where with the last equality we used $\gamma^0\gamma^0 = 1$. Meanwhile, the conserved current J is represented componentwise as

$$J^k = \int d^3x\, s^k(x) = q\int d^3x\, \bar{\psi}(x)\gamma^k\psi(x) = q\int d^3x\, \psi^\dagger(x)\gamma^0\gamma^k\psi(x) = q\int d^3x\, \psi^\dagger(x)\alpha^k\psi(x), \tag{23.24}$$

where we used $\boldsymbol{\alpha} \equiv \gamma^0\boldsymbol{\gamma}$.


To confirm the validity of (23.19), we take the action integral following (22.211)
and (22.212). The calculation procedures are almost the same as before. The only
difference is about the additional term with respect to Lint ðxÞ. That is, we have
23.2 Lagrangian and Field Equation of the Interacting Fields 1081

δLint ðxÞ = δψ ðxÞ - eγ μ Aμ ðxÞψ ðxÞ þ - eψ ðxÞγ μ Aμ ðxÞ δψ ðxÞ:

In response to (22.212), we obtain

δSðϕÞ = dxLðxÞ = dx½L ðxÞ þ Lint ðxÞ

= dx δψ iγ μ ∂μ ψ - mψ - eγ μ Aμ ðxÞψ ðxÞ

- dx i∂μ ψγ μ þ mψ þ eψ ðxÞγ μ Aμ ðxÞ δψg: ð23:25Þ

On the conditions of δψ = 0, we get

iγ μ ∂μ þ ieAμ ðxÞ - m ψ ðxÞ = 0 ð23:26Þ

or

iγ μ ∂μ - eγ μ Aμ ðxÞ - m ψ ðxÞ = 0: ð23:27Þ

Meanwhile, from δψ = 0, we obtain the adjoint Dirac equation described by

i∂μ ψγ μ þ eψ ðxÞγ μ Aμ ðxÞ þ mψ = 0: ð23:28Þ

Equation (23.27) is the Dirac equation for the system in which electron and photon interact with each other. As in the case of Sect. 22.4.1, (23.27) and (23.28) are equivalent. Also, note that (21.150) is the adjoint Dirac equation (without interaction with the photon) if we assume that the solution of the Dirac equation is given by $\psi(x) = u(\mathbf{p},h)\,e^{-ipx}$.
In conjunction with (23.27), we remark on the implication of charge conjugation. Taking the adjoint of (23.27), we have

$$-i\partial_\mu\psi^\dagger\gamma^{\mu\dagger} - e\psi^\dagger A_\mu^\dagger\gamma^{\mu\dagger} - m\psi^\dagger = 0.$$

Multiplying the above by $\gamma^0$ from the right and inserting $\gamma^0\gamma^0 = 1$, we get

$$-i\partial_\mu\psi^\dagger\gamma^0\gamma^0\gamma^{\mu\dagger}\gamma^0 - e\psi^\dagger\gamma^0\gamma^0 A_\mu\gamma^{\mu\dagger}\gamma^0 - m\psi^\dagger\gamma^0 = 0.$$

Using $\bar{\psi} = \psi^\dagger\gamma^0$, $\gamma^0\gamma^{\mu\dagger}\gamma^0 = \gamma^\mu$, and $A_\mu^\dagger = A_\mu$ (Hermitian), this becomes

$$-i\partial_\mu\bar{\psi}\gamma^\mu - eA_\mu\bar{\psi}\gamma^\mu - m\bar{\psi} = 0.$$

Taking the transpose of the above yields

$$-i\partial_\mu\gamma^{\mu T}\bar{\psi}^T - eA_\mu\gamma^{\mu T}\bar{\psi}^T - m\bar{\psi}^T = 0.$$

Multiplying this by C from the left and inserting $C^{-1}C = 1$, we get

$$-i\partial_\mu C\gamma^{\mu T}C^{-1}C\bar{\psi}^T - eA_\mu C\gamma^{\mu T}C^{-1}C\bar{\psi}^T - mC\bar{\psi}^T = 0.$$

Using $C\gamma^{\mu T}C^{-1} = -\gamma^\mu$ and $C\bar{\psi}^T = \psi_C$, we arrive at

$$i\partial_\mu\gamma^\mu\psi_C + eA_\mu\gamma^\mu\psi_C - m\psi_C = 0.$$

Hence, we get

$$\left[i\gamma^\mu\partial_\mu + e\gamma^\mu A_\mu - m\right]\psi_C = 0. \tag{23.29}$$

Notice that throughout the above processes that derived (23.29), $A_\mu$ commutes with the gamma matrices $\gamma^\mu$. For the operators $\psi_C$ and C, see (21.161) and (21.164). Comparing (23.27) and (23.29), we find that e has been exchanged with −e (−e > 0, i.e., a positive charge). Thus, (23.29) represents the field equation of the positron.
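The identity $C\gamma^{\mu T}C^{-1} = -\gamma^\mu$ used in the derivation above can be verified numerically. The sketch below is a minimal check that assumes the Dirac representation and the common choice $C = i\gamma^2\gamma^0$; the book's convention for C [see (21.164)] may differ by a phase, which does not affect the identity.

```python
import numpy as np

# Gamma matrices in the Dirac representation
s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]
I2, Z2 = np.eye(2), np.zeros((2, 2))
g = [np.block([[I2, Z2], [Z2, -I2]]).astype(complex)]     # gamma^0
g += [np.block([[Z2, sk], [-sk, Z2]]) for sk in s]        # gamma^1..gamma^3

C = 1j * g[2] @ g[0]      # a common choice of the charge-conjugation matrix
Cinv = np.linalg.inv(C)

for mu in range(4):
    assert np.allclose(C @ g[mu].T @ Cinv, -g[mu])
print("C gamma^{mu T} C^{-1} = -gamma^mu holds for mu = 0,...,3")
```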

23.3 Local Phase Transformation and U(1) Gauge Field

In Chap. 22, we mentioned several characteristics of the gauge field and the
associated gauge transformation. To give a precise definition of the “gauge field”
and “gauge transformation,” however, is beyond the scope of this book. Yet, we can
get an intuitive concept of these principles at the core of the covariant derivative. In
Chap. 22, we have shown that the field strength Fμν(x) is invariant under the gauge
transformation of (22.293).
Meanwhile, we can express an exponential function (or phase factor) as

$$e^{i\Theta(x)}, \tag{23.30}$$

where Θ(x) is a real function of x. The phase factor (23.30) differs from those that appeared in Part I in that here the exponent may change depending upon the space-time coordinate x. Let us consider a function transformation described by

$$f'(x) = e^{i\Theta(x)}f(x). \tag{23.31}$$

Also, let us consider a set $\mathcal{G}(x)$ defined by

$$\mathcal{G}(x) \equiv \left\{e^{i\Theta(x)};\ \Theta(x):\ \text{real function of } x\right\}. \tag{23.32}$$

Then, $\mathcal{G}(x)$ forms a U(1) group (i.e., a continuous transformation group). That is, (A1) closure and (A2) the associative law are evident (see Sect. 16.1) by choosing real functions $\Theta_\lambda(x)\ (\lambda \in \Lambda)$, where Λ can be a finite set or an infinite set with different cardinalities (see Sect. 6.1). (A3) The identity element corresponds to $\Theta(x) \equiv 0$. (A4) The inverse element of $e^{i\Theta(x)}$ is $e^{-i\Theta(x)}$. The functional transformation described by (23.31) and related to $\mathcal{G}(x)$ is said to be a local phase transformation. The important point is that if we select Θ(x) appropriately, we can make the covariant derivative of f(x) in (23.31) undergo the same local phase transformation as f(x). In other words, by appropriately selecting a functional form of Θ(x), we may have

$$\left[D_\mu f(x)\right]' = e^{i\Theta(x)}\left[D_\mu f(x)\right], \tag{23.33}$$

where the prime (′) denotes the local phase transformation.


Meanwhile, in Chap. 22 we chose a function Λ(x) for the gauge transformation such that

$$A'_\mu(x) = A_\mu(x) + \partial_\mu\Lambda(x). \tag{22.293}$$

We wish to connect Θ(x) of (23.31) with Λ(x). The proper choice is to fix Θ(x) at

$$\Theta(x) = -q\Lambda(x).$$

We show this as follows:

$$
\begin{aligned}
\left[D_\mu f(x)\right]' &= \left[\partial_\mu + iqA'_\mu(x)\right]f'(x) = \partial_\mu f'(x) + iqA'_\mu(x)f'(x) \\
&= \partial_\mu\left[e^{-iq\Lambda(x)}f(x)\right] + iq\left[A_\mu(x) + \partial_\mu\Lambda(x)\right]e^{-iq\Lambda(x)}f(x) \\
&= -iq\,e^{-iq\Lambda(x)}\left[\partial_\mu\Lambda(x)\right]f(x) + e^{-iq\Lambda(x)}\partial_\mu f(x) \\
&\qquad + iqA_\mu(x)e^{-iq\Lambda(x)}f(x) + iq\left[\partial_\mu\Lambda(x)\right]e^{-iq\Lambda(x)}f(x) \\
&= e^{-iq\Lambda(x)}\left[\partial_\mu + iqA_\mu(x)\right]f(x) \\
&= e^{-iq\Lambda(x)}\left[D_\mu f(x)\right]. 
\end{aligned} \tag{23.34}
$$

In (23.34), the prime (′) denotes either the local phase transformation of the Dirac field or the gauge transformation of the electromagnetic field. Applying (23.34) to $\bar{\psi}(x)\left(i\gamma^\mu D_\mu - m\right)\psi(x)$, we get

$$\left[\bar{\psi}(x)\left(i\gamma^\mu D_\mu - m\right)\psi(x)\right]' = e^{iq\Lambda(x)}\bar{\psi}(x)\,e^{-iq\Lambda(x)}\left(i\gamma^\mu D_\mu - m\right)\psi(x) = \bar{\psi}(x)\left(i\gamma^\mu D_\mu - m\right)\psi(x). \tag{23.35}$$
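The cancellation that produces (23.34) can be reproduced with a computer algebra system. The sketch below is an illustration restricted to a single coordinate; the symbols f, A, and Lambda are generic placeholders rather than the full four-component fields.

```python
import sympy as sp

x, q = sp.symbols('x q', real=True)
f = sp.Function('f')(x)             # matter field (one component)
A = sp.Function('A')(x)             # one component of the potential A_mu
Lam = sp.Function('Lambda')(x)      # gauge function Lambda(x)

def D(F, pot):
    # covariant derivative D_mu = partial_mu + i q A_mu, cf. (23.2)
    return sp.diff(F, x) + sp.I * q * pot * F

f_t = sp.exp(-sp.I * q * Lam) * f   # f' with Theta(x) = -q Lambda(x), (23.31)
A_t = A + sp.diff(Lam, x)           # gauge-transformed potential, (22.293)

lhs = D(f_t, A_t)                   # (D f)' evaluated in the transformed gauge
rhs = sp.exp(-sp.I * q * Lam) * D(f, A)

assert sp.simplify(sp.expand(lhs - rhs)) == 0
print("(D f)' = exp(-i q Lambda) (D f): verified")
```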

Thus, the first term of LðxÞ of (23.20) is invariant under both the gauge transfor-
mation (22.293) and the local phase transformation (23.31). As already seen in
(22.318), the second term of LðxÞ of (23.20) represents the Lagrangian density of
the free electromagnetic field. From (22.316) and (22.317), ∂μAμ(x) is gauge invari-
ant if one uses the Lorentz gauge. It is unclear, however, whether the second term of
$\mathcal{L}(x)$ in (23.20) is gauge invariant. To examine it, from (22.293) we have

$$
\begin{aligned}
\partial_\mu A'_\nu(x)\partial^\mu A'^\nu(x) &= \partial_\mu\left[A_\nu + \partial_\nu\Lambda(x)\right]\partial^\mu\left[A^\nu(x) + \partial^\nu\Lambda(x)\right] \\
&= \left[\partial_\mu A_\nu(x) + \partial_\mu\partial_\nu\Lambda(x)\right]\left[\partial^\mu A^\nu(x) + \partial^\mu\partial^\nu\Lambda(x)\right] \\
&= \partial_\mu A_\nu(x)\partial^\mu A^\nu(x) + \partial_\mu A_\nu(x)\partial^\mu\partial^\nu\Lambda(x) \\
&\qquad + \partial_\mu\partial_\nu\Lambda(x)\partial^\mu A^\nu(x) + \partial_\mu\partial_\nu\Lambda(x)\partial^\mu\partial^\nu\Lambda(x). 
\end{aligned} \tag{23.36}
$$

Thus, we see that the term $\partial_\mu A_\nu(x)\partial^\mu A^\nu(x)$ is not gauge invariant. Yet, this does not cause a serious problem. Let us think of the action integral $S(A^\nu)$ of $\partial_\mu A_\nu(x)\partial^\mu A^\nu(x)$. As in (22.306), we have

$$S(A'^\nu) = \int dx\, \partial_\mu A'_\nu(x)\partial^\mu A'^\nu(x). \tag{23.37}$$

Taking the variation with respect to $A^\nu$ and using integration by parts on the second and third terms of (23.36), we are left with $\partial_\mu\partial^\mu\Lambda(x)$ for them, on condition that $A^\nu(x)$ vanishes at $x = \pm\infty$. But, from (22.317), we have

$$\partial_\mu\partial^\mu\Lambda(x) = 0. \tag{23.38}$$

Taking the variation with respect to Λ of the last term of (23.36), we also obtain the same result (23.38). Thus, only the first term of (23.36) survives on condition that $A^\nu(x)$ vanishes at $x = \pm\infty$; hence, we conclude that $\mathcal{L}(x)$ of (23.20) is virtually unchanged in total by the gauge transformation. That is, $\mathcal{L}(x)$ is gauge invariant.
In this book, we restrict the gauge field to the electromagnetic vector field Aμ(x).
Correspondingly, we also restrict the gauge transformation to (22.293). By virtue of
the local phase transformation (23.31) and the gauge transformation (22.293), we
can construct a firm theory of QED. More specifically, we can develop the theory
that uniquely determines the interaction between the Dirac field and gauge field (i.e.,
electromagnetic field).
Before closing this section, it is worth mentioning that the gauge fields must be accompanied by massless particles. Among them, the photon field is the most important, both theoretically and practically. Moreover, the photon is the only entity that has ever been observed experimentally as a massless free particle. Gluons are also believed to exist, accompanied by a vector bosonic field, but they have never been observed as massless free particles so far [4, 5]. Also, massless free particles must move at the light velocity.
We revisit Example 21.1 where the inertial frame of reference O′ is moving with a
constant velocity v (>0) relative to the frame O. Suppose that the x-axis of O and the
x′-axis of O′ coincide and that the frame O′ is moving in the positive direction of the
x′-axis. Also, suppose that a particle is moving along the x-axis (x′-axis, too) at a
velocity of u and u′ relative to the frame O and O′, respectively. Then, we have [6]

$$u' = \frac{u - v}{1 - uv} \quad\text{and}\quad u = \frac{u' + v}{1 + u'v}. \tag{23.39}$$

If we put u′ = 1, we have u = 1, and vice versa. This simple relation clearly


demonstrates that if the velocity of a particle is 1 (i.e., light velocity) with respect
to an arbitrarily chosen inertial frame of reference, the velocity of that particle must
be 1 with respect to any other inertial frames of reference as well. This is the direct
confirmation of the principle of constancy of light velocity and an outstanding
feature of the gauge field.
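Equation (23.39) is easy to check numerically. The snippet below (a trivial illustration in natural units with c = 1) confirms that u′ = 1 always yields u = 1 regardless of the relative velocity v, whereas sub-light velocities always compose to sub-light velocities.

```python
def add_velocity(u_prime, v):
    # Relativistic velocity addition, (23.39), in units where c = 1
    return (u_prime + v) / (1 + u_prime * v)

for v in (0.1, 0.5, 0.99, -0.7):
    print(v, add_velocity(1.0, v))   # always exactly 1.0: light stays light

print(add_velocity(0.9, 0.9))        # 0.9945..., still below the light velocity
```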

23.4 Interaction Picture

In the previous sections, we have seen how the covariant derivative introduces the
interaction term into the Dirac equation through minimal coupling. Meanwhile, we
also acknowledge that this interaction term results from the variation of the action
integral related to the interaction Lagrangian. As a result, we have gotten the proper
field equation of (23.27). The magnitude of the interaction relative to the free field,
however, is not explicitly represented in (23.27). In this respect, the situation is
different from the case of Chap. 5 where the interaction Hamiltonian was quantita-
tively presented. Thus, we are going to take a different approach in this section. The
point is that the Hamiltonian density Hint ðxÞ is explicitly presented in (23.17). This
provides a strong clue to solving the problem. We adopt an interaction picture to
address a given issue.
Let us assume the time-dependent total Hamiltonian H(t) as follows:

$$H(t) = H_0(t) + H_{\rm int}(t), \tag{23.40}$$

where $H_0(t)$ is the free-field Hamiltonian and $H_{\rm int}(t)$ is the interaction Hamiltonian. Each of H(t), $H_0(t)$, and $H_{\rm int}(t)$ may or may not be time-dependent, but we designate them as, e.g., H(t), where t denotes time. From (23.17), as $H_{\rm int}(t)$ we have
$$H_{\rm int}(t) = \int d^3x\, \mathcal{H}_{\rm int}(x) = \int d^3x\, e\bar{\psi}(x)\gamma^\mu A_\mu(x)\psi(x). \tag{23.41}$$

To deal with the interaction between the quantum fields systematically, we revisit
Sect. 1.2. In Chap. 1, we solved the Schrödinger equation by the method of the
separation of variables and obtained the solution of (1.48) described by

$$\psi(x,t) = \exp(-iEt)\,\phi(x), \tag{1.60}$$

where we adopted the natural units of ħ = 1. More generally, the solution of (1.47) is
given as

$$|\psi(t,\mathbf{x})\rangle = \exp[-iH(t)t]\,|\psi(0,\mathbf{x})\rangle. \tag{23.42}$$

For the exponential function of the operator H(t), especially one given in matrix form, see Chap. 15. Since H(t) is Hermitian, −iH(t)t is anti-Hermitian. Hence, exp[−iH(t)t] is unitary; see Sect. 15.2. Thus, (23.42) implies that the state vector $|\psi(t,\mathbf{x})\rangle$ evolves with time through the unitary transformation exp[−iH(t)t], starting with $|\psi(0,\mathbf{x})\rangle$.
Now, we wish to generalize the present situation to some degree. We can think of the situation as viewing the same vector in reference to a different set of basis vectors (Sect. 11.4). Suppose that $|\psi(\lambda,\mathbf{x})\rangle$ is an element of a vector space $V$ (or a Hilbert space, more generally) that is spanned by a set of basis vectors $\mathcal{E}_\lambda\ (\lambda \in \Lambda)$, where Λ is an infinite set (see Sect. 6.1). Also, let $|\psi(\lambda,\mathbf{x})\rangle'$ be an element of another vector space $V'$ spanned by a set of basis vectors $\mathcal{E}'_\lambda\ (\lambda \in \Lambda)$. In general, $V'$ might be different from the original vector space $V$, but in this chapter we assume that the mapping caused by an operator is solely an endomorphism (see Sect. 11.2). Suppose that a quantum state ψ is described as

$$\psi = \int d\lambda\, \mathcal{E}_\lambda\, |\psi(\lambda,\mathbf{x})\rangle. \tag{23.43}$$

Equation (23.43) is given pursuant to (11.13) so that the vector ψ can be dealt with in a vector space of infinite dimension. Note, however, that if we were strictly pursuant to (13.32), we would write instead

$$|\psi\rangle = \int d\lambda\, |\mathcal{E}_\lambda\rangle\, \psi(\lambda,\mathbf{x}),$$

where $|\mathcal{E}_\lambda\rangle$ is the basis set of the inner product space, with $\psi(\lambda,\mathbf{x})$ being a coordinate of a vector in that space. A reason why we use (23.43) is that we place emphasis on the coordinate representation; the basis-set representation is hardly used in the present case.
Operating O on the function ψ, we have

$$O(\psi) = \int d\lambda\, \mathcal{E}_\lambda\, O\,|\psi(\lambda,\mathbf{x})\rangle, \tag{23.44}$$

where O is an arbitrary Hermitian operator. For a future purpose, we assume that the operator O does not depend on λ in a special coordinate system (or frame). Equation (23.44) can be rewritten as

$$O(\psi) = \int d\lambda\, \mathcal{E}_\lambda V(\lambda)V(\lambda)^\dagger O V(\lambda)V(\lambda)^\dagger\,|\psi(\lambda,\mathbf{x})\rangle, \tag{23.45}$$

where V(λ) is a unitary operator. According to Theorem 14.5, as O is Hermitian, O can be diagonalized via a unitary similarity transformation. But in (23.45), we do not necessarily assume the diagonalization. Putting

$$\mathcal{E}'_\lambda \equiv \mathcal{E}_\lambda V(\lambda), \quad O'(\lambda) \equiv V(\lambda)^\dagger O V(\lambda), \quad |\psi(\lambda,\mathbf{x})\rangle' \equiv V(\lambda)^\dagger|\psi(\lambda,\mathbf{x})\rangle, \tag{23.46}$$

we rewrite (23.45) such that

$$O(\psi) = \int d\lambda\, \mathcal{E}'_\lambda\, O'(\lambda)\,|\psi(\lambda,\mathbf{x})\rangle'. \tag{23.47}$$

Equation (23.47) shows how the operator O′(λ) and the state vector $|\psi(\lambda,\mathbf{x})\rangle'$ change in accordance with the transformation of the basis vectors $\mathcal{E}_\lambda$. We can develop the theory, however, so that $|\psi(\lambda,\mathbf{x})\rangle'$ may not depend upon λ in a specific frame.
To show how it looks in the present case, according to custom we choose t (the time variable) for λ and choose the unitary operator exp[−iH(t)t] for V(t). Namely,

$$V(t) = \exp[-iH(t)t] \equiv U(t). \tag{23.48}$$

A time-independent state vector $|\psi(0,\mathbf{x})\rangle$ of (23.42) is said to be a state vector of the Heisenberg picture (or Heisenberg representation). In other words, we can choose the Heisenberg picture so that the state vector $|\psi(t,\mathbf{x})\rangle'$ may not depend upon t. The other way around, we may choose a special frame where the operator O does not depend on t. That frame is referred to as the Schrödinger frame, where the time-dependent state vector $|\psi(t,\mathbf{x})\rangle$ described by (23.42) is called a state vector of the Schrödinger picture (or Schrödinger representation). Accordingly, we designate a state vector in the Heisenberg frame as $|\ \rangle_H$ and that in the Schrödinger frame as $|\ \rangle_S$. Replacing λ with t, $|\psi(\lambda,\mathbf{x})\rangle$ with $|\psi(t,\mathbf{x})\rangle_S$, and O with $O_S$ in (23.44), we rewrite it as
it as
$$O(\psi) = \int dt\, \mathcal{E}_t\, O_S\,|\psi(t,\mathbf{x})\rangle_S, \tag{23.49}$$

where $O_S$ denotes the operator O in the Schrödinger frame. Equation (23.49) can further be rewritten as

$$
\begin{aligned}
O(\psi) &= \int dt\, \mathcal{E}_t U(t)U(t)^\dagger O_S U(t)U(t)^\dagger\,|\psi(t,\mathbf{x})\rangle_S \\
&= \int dt\, \mathcal{E}'_t\left[U(t)^\dagger O_S U(t)\right]|\psi(t,\mathbf{x})\rangle'_S \\
&= \int dt\, \mathcal{E}'_t\, O_H(t)\,|\psi(0,\mathbf{x})\rangle_H, 
\end{aligned} \tag{23.50}
$$

where we define

$$|\psi(t,\mathbf{x})\rangle'_S \equiv U(t)^\dagger|\psi(t,\mathbf{x})\rangle_S, \tag{23.51}$$

$$\mathcal{E}'_t \equiv \mathcal{E}_t U(t),$$

$$|\psi(0,\mathbf{x})\rangle_H \equiv |\psi(t,\mathbf{x})\rangle'_S = U(t)^\dagger|\psi(t,\mathbf{x})\rangle_S, \tag{23.52}$$

$$O_H(t) \equiv U(t)^\dagger O_S U(t). \tag{23.53}$$

Putting t = 0 in the above equations, we have

$$|\psi(0,\mathbf{x})\rangle_H = |\psi(0,\mathbf{x})\rangle_S.$$

Equations (23.49) and (23.50) imply the invariance of O(ψ) upon transfer from
the Schrödinger frame to the Heisenberg frame. The situation is in parallel to that of
Sect. 11.4. We can view (23.51) and (23.52) as the unitary transformation between
the two vectors | ψ(t, x)iS and | ψ(0, x)iH. Correspondingly, (23.53) represents the
unitary similarity transformation of the operator O between the two frames.
In (23.53), we assume that $O_H(t)$ is a time-dependent operator, but that $O_S$ is time-independent. In particular, if we choose the Hamiltonian H(t) for O,

$$H_H(t) = H_H = H_S. \tag{23.54}$$

It is because H(t) and U(t) = exp[−iH(t)t] are commutative. Thus, the Hamiltonian is a time-independent quantity, regardless of the specific frame chosen. Differentiating $O_H(t)$ with respect to t in (23.53), we obtain
$$
\begin{aligned}
\frac{dO_H(t)}{dt} &= iH(t)e^{iH(t)t}O_S e^{-iH(t)t} + e^{iH(t)t}O_S\left[-iH(t)\right]e^{-iH(t)t} \\
&= iH(t)e^{iH(t)t}O_S e^{-iH(t)t} - ie^{iH(t)t}O_S e^{-iH(t)t}H(t) \\
&= i\left[H(t)e^{iH(t)t}O_S e^{-iH(t)t} - e^{iH(t)t}O_S e^{-iH(t)t}H(t)\right] \\
&= i\left[H(t)O_H(t) - O_H(t)H(t)\right] = i[H(t), O_H(t)], 
\end{aligned} \tag{23.55}
$$

where with the second equality we used the fact that $e^{-iH(t)t}$ and H(t) are commutative. Multiplying both sides of (23.55) by i, we get

$$i\frac{dO_H(t)}{dt} = [O_H(t), H(t)]. \tag{23.56}$$

Equation (23.56) is well known as the Heisenberg equation of motion. Let us think of a commutator $[O_H(t), P_H(t)]$, where $P_H(t)$ is another physical quantity in the Heisenberg picture. Then, we have

$$[O_H(t), P_H(t)] = U(t)^\dagger[O_S, P_S]U(t), \tag{23.57}$$

where $P_S$ is the physical quantity of the Schrödinger picture that corresponds to $P_H(t)$ of the Heisenberg picture. If, in particular, $[O_S, P_S] = c$ (c is a certain constant), then $[O_H(t), P_H(t)] = c$ as well. Thus, we have

$$[q_H(t), p_H(t)] = [q_S, p_S] = i, \tag{23.58}$$

where q and p are the canonical coordinate and canonical momentum, respectively. Thus, the unitary similarity transformation described by (23.53) makes the canonical commutation relation invariant. Such a transformation is called the canonical transformation.
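The invariance expressed by (23.57) and (23.58) can be illustrated numerically on a truncated Fock space. The sketch below is an illustration only: the Hamiltonian is an arbitrary toy choice, and the truncation spoils [q, p] = i in the last basis state, which is a finite-size artifact rather than physics.

```python
import numpy as np

N = 12                                            # truncated oscillator basis
a = np.diag(np.sqrt(np.arange(1, N)), k=1)        # annihilation operator
q = (a + a.T) / np.sqrt(2)                        # canonical coordinate
p = (a - a.T) / (1j * np.sqrt(2))                 # canonical momentum

H = a.T @ a + 0.3 * np.linalg.matrix_power(q, 4)  # toy Hermitian Hamiltonian
w, V = np.linalg.eigh(H)
t = 0.7
U = V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T # U(t) = exp(-iHt), cf. (23.48)

def comm(X, Y):
    return X @ Y - Y @ X

qH = U.conj().T @ q @ U                # Heisenberg-picture operators, (23.53)
pH = U.conj().T @ p @ U

# [q_H, p_H] equals U^dagger [q, p] U exactly, cf. (23.57)
assert np.allclose(comm(qH, pH), U.conj().T @ comm(q, p) @ U)

# [q, p] = i on the retained states; the last diagonal entry is the cutoff artifact
print(np.round(np.diag(comm(q, p)).imag, 6))
```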
Now, we get back to the major issue related to the interaction picture. A major difficulty in solving the interacting field equation rests partly on the fact that $H_0(t)$ and $H_{\rm int}(t)$ are in general not commutative. Let us consider another transformation of the state vector and operator, different from (23.50). That is, using

$$U_0(t) \equiv \exp[-iH_0(t)t] \tag{23.59}$$

instead of U(t), we have expressions related to (23.51)-(23.53). Namely, we have

$$O(\psi) = \int dt\, \mathcal{E}^I_t\, O_I(t)\,|\psi(t,\mathbf{x})\rangle_I, \tag{23.60}$$

where we define
$$|\psi(t,\mathbf{x})\rangle_I \equiv U_0(t)^\dagger|\psi(t,\mathbf{x})\rangle_S = U_0(t)^\dagger U(t)|\psi(0,\mathbf{x})\rangle_H, \tag{23.61}$$

$$\mathcal{E}^I_t \equiv \mathcal{E}_t U_0(t),$$

$$O_I(t) \equiv U_0(t)^\dagger O_S U_0(t). \tag{23.62}$$

In (23.62), $O_I(t)$ is a physical quantity of the interaction picture. This time, $|\psi(t,\mathbf{x})\rangle_I$ is time-dependent.
The previous results (23.45) and (23.46) obtained with the Schrödinger picture and Heisenberg picture basically hold as well between the Schrödinger picture and the interaction picture. Differentiating $|\psi(t,\mathbf{x})\rangle_I$ with respect to t, we get

$$
\begin{aligned}
\frac{d|\psi(t,\mathbf{x})\rangle_I}{dt} &= iH_0(t)e^{iH_0(t)t}|\psi(t,\mathbf{x})\rangle_S + e^{iH_0(t)t}\left[-iH(t)\right]|\psi(t,\mathbf{x})\rangle_S \\
&= iH_0(t)e^{iH_0(t)t}|\psi(t,\mathbf{x})\rangle_S - ie^{iH_0(t)t}\left[H_0(t) + H_{\rm int}(t)\right]|\psi(t,\mathbf{x})\rangle_S \\
&= iH_0(t)e^{iH_0(t)t}|\psi(t,\mathbf{x})\rangle_S - iH_0(t)e^{iH_0(t)t}|\psi(t,\mathbf{x})\rangle_S \\
&\qquad - ie^{iH_0(t)t}H_{\rm int}(t)|\psi(t,\mathbf{x})\rangle_S \\
&= -ie^{iH_0(t)t}H_{\rm int}(t)e^{-iH_0(t)t}|\psi(t,\mathbf{x})\rangle_I \\
&= -iH_{\rm int}(t)_I|\psi(t,\mathbf{x})\rangle_I, 
\end{aligned} \tag{23.63}
$$

where with the third equality we used the fact that $e^{iH_0(t)t}$ and $H_0(t)$ are commutative. In (23.63), we defined $H_{\rm int}(t)_I$ as

$$H_{\rm int}(t)_I \equiv e^{iH_0(t)t}H_{\rm int}(t)e^{-iH_0(t)t} = U_0(t)^\dagger H_{\rm int}(t)U_0(t). \tag{23.64}$$

Also, notice again that in (23.63) $e^{\pm iH_0(t)t}$ and $H_{\rm int}(t)$ are not commutative in general. Multiplying both sides of (23.63) by i, we obtain

$$i\frac{d|\psi(t,\mathbf{x})\rangle_I}{dt} = H_{\rm int}(t)_I|\psi(t,\mathbf{x})\rangle_I. \tag{23.65}$$

Similarly to (23.56), we have

$$i\frac{dO_I(t)}{dt} = [O_I(t), H_0(t)]. \tag{23.66}$$

The close relationship between (23.66) and (23.56) is evident. We schematically depict the conversion among the three pictures of Schrödinger, Heisenberg, and interaction in Fig. 23.1. Equation (23.65) formally resembles the time-dependent Schrödinger equation (1.47) and provides a starting point for addressing a key question of the interacting fields.
Fig. 23.1 State vector $|\psi(t,\mathbf{x})\rangle_I$ of the interaction picture in comparison with those of the Schrödinger picture $|\psi(t,\mathbf{x})\rangle_S$ and the Heisenberg picture $|\psi(0,\mathbf{x})\rangle_H$

23.5 S-Matrix and S-Matrix Expansion

Since (23.65) is a non-linear equation, it is impossible in general to obtain an analytic solution. Instead, we will approach the problem by an iteration method. To this end, the S-matrix expansion plays a central role.
From here on, we exclusively adopt the interaction picture that gave (23.65), and so we rewrite (23.65) as follows:

$$i\frac{d|\Psi(t,\mathbf{x})\rangle}{dt} = H_{\rm int}(t)|\Psi(t,\mathbf{x})\rangle, \tag{23.67}$$

where the subscript I of $|\psi(t,\mathbf{x})\rangle_I$ and $H_{\rm int}(t)_I$ has been omitted. Namely, as we work solely in the interaction picture, we redefine the state vector and Hamiltonian such that

$$|\Psi(t,\mathbf{x})\rangle \equiv |\psi(t,\mathbf{x})\rangle_I \quad\text{and}\quad H_{\rm int}(t) \equiv H_{\rm int}(t)_I. \tag{23.68}$$

If we have no interaction, i.e., $H_{\rm int}(t) = 0$, the state vector $|\Psi(t,\mathbf{x})\rangle$ is constant in time. Suppose that the interaction is "switched on" at a certain time $t_{\rm on}$ and that the state vector is $|\Psi(t_i,\mathbf{x})\rangle$ at an initial time $t = t_i$ well before $t_{\rm on}$. In that situation, we define the state vector $|\Psi(t_i,\mathbf{x})\rangle$ as

$$|\Psi(t_i,\mathbf{x})\rangle \equiv |i\rangle \approx |\Psi(-\infty,\mathbf{x})\rangle. \tag{23.69}$$

Also, suppose that the interaction is "switched off" at a certain time $t_{\rm off}$ and that the state vector is $|\Psi(t_f,\mathbf{x})\rangle$ at a final time $t = t_f$ well after $t_{\rm off}$. Then, the state vector $|\Psi(t_f,\mathbf{x})\rangle$ is approximated by

$$|\Psi(t_f,\mathbf{x})\rangle \approx |\Psi(\infty,\mathbf{x})\rangle. \tag{23.70}$$

Because of the interaction (or collision), the state $|\Psi(\infty,\mathbf{x})\rangle$ may include different final states $|f_\alpha\rangle\ (\alpha = 1, 2, \cdots)$, which may involve different species of particles. If we assume that the $|f_\alpha\rangle$ form a complete orthonormal system (CONS), we have [1]

$$\sum_\alpha |f_\alpha\rangle\langle f_\alpha| = E, \tag{23.71}$$

where E is the identity operator. Then, we have

$$|\Psi(\infty,\mathbf{x})\rangle = \sum_\alpha |f_\alpha\rangle\langle f_\alpha|\Psi(\infty,\mathbf{x})\rangle. \tag{23.72}$$

(i) The state $|i\rangle$ is described by the Fock state [see (22.126)] that is characterized by the particle species and a definite number of those particles with definite momenta and energies. The particles in $|i\rangle$ are far apart from one another before the collision, such that they do not interact with one another. The particles in $|f_\alpha\rangle$ are also far apart from one another after the collision.
(ii) The states $|i\rangle$ and $|f_\alpha\rangle$ are described by plane waves.
Normally, the interaction is assumed to occur in a very narrow region of space and within a short time. During the interaction, we cannot obtain detailed information about $|\Psi(t,\mathbf{x})\rangle$. Before and after the interaction, the species or number of particles may or may not be conserved. The plane-wave description is particularly suited to the actual experimental situation where accelerated particle beams collide. Bound states, on the other hand, are neither practical nor appropriate here. In what follows, we are going to deal with electrons, positrons, and photons within the framework of QED.
The S-matrix S is defined as an operator that relates $|f_\alpha\rangle$ to $|i\rangle$ such that

$$\langle f_\alpha|\Psi(t_f,\mathbf{x})\rangle \approx \langle f_\alpha|\Psi(\infty,\mathbf{x})\rangle = \langle f_\alpha|S|i\rangle \equiv S_{f_\alpha,i} = \langle f_\alpha|S|\Psi(-\infty,\mathbf{x})\rangle, \tag{23.73}$$

where with the first near equality we used (23.70). Then, (23.72) can be rewritten as

$$|\Psi(\infty,\mathbf{x})\rangle = \sum_\alpha S_{f_\alpha,i}\,|f_\alpha\rangle. \tag{23.74}$$

Equation (23.74) is a plane-wave expansion of $|\Psi(\infty,\mathbf{x})\rangle$. If $|\Psi(t_f,\mathbf{x})\rangle$ and $|i\rangle$ are normalized, we have

$$\langle\Psi(t_f,\mathbf{x})|\Psi(t_f,\mathbf{x})\rangle = \langle i|i\rangle = 1 = \langle i|S^\dagger S|i\rangle. \tag{23.75}$$

Since the transformation with S does not change the norm, S is unitary.
Integrating (23.67) with respect to t from −∞ to t, we get

$$|\Psi(t,\mathbf{x})\rangle = |\Psi(-\infty,\mathbf{x})\rangle + (-i)\int_{-\infty}^t dt_1\, H_{\rm int}(t_1)|\Psi(t_1,\mathbf{x})\rangle = |i\rangle + (-i)\int_{-\infty}^t dt_1\, H_{\rm int}(t_1)|\Psi(t_1,\mathbf{x})\rangle. \tag{23.76}$$

The RHS of (23.76) comprises $|i\rangle$ and an additional second term. Since we can solve (23.67) only iteratively, as a next step, replacing $|\Psi(t_1,\mathbf{x})\rangle$ with

$$|i\rangle + (-i)\int_{-\infty}^{t_1} dt_2\, H_{\rm int}(t_2)|\Psi(t_2,\mathbf{x})\rangle, \tag{23.77}$$

we obtain

$$|\Psi(t,\mathbf{x})\rangle = |i\rangle + (-i)\int_{-\infty}^t dt_1\, H_{\rm int}(t_1)|i\rangle + (-i)^2\int_{-\infty}^t dt_1\int_{-\infty}^{t_1} dt_2\, H_{\rm int}(t_1)H_{\rm int}(t_2)|\Psi(t_2,\mathbf{x})\rangle. \tag{23.78}$$

Repeating the iteration, we get

j Ψðt, xÞi
= j ii
t tn - 1
r
þ n=1
ð- iÞn dt 1 ⋯ dt n H int ðt 1 Þ⋯H int ðt n ÞjΨðt n , xÞi , ð23:79Þ
-1 -1

where t0  t when n = 1. Although we could further replace jΨ(tr, x)i with

tr
j ii þ ð- iÞ H int ðt rþ1 Þ j Ψðt rþ1 , xÞi,
-1

we cut off the iteration process in this stage. Hence, jΨ(t, x)i of (23.79) is approx-
imated by

t tn - 1
r
j Ψðt, xÞi ≈ j ii þ n=1
ð- iÞn dt 1 ⋯ dt n H int ðt 1 Þ⋯H int ðt n Þjii
-1 -1
t tn - 1
r
= j ii þ n=1
ð- iÞn dt 1 ⋯ dt n H int ðt 1 Þ⋯H int ðt n Þ j ii
-1 -1
t tn - 1
r
= Eþ n=1
ð- iÞn dt 1 ⋯ dt n H int ðt 1 Þ⋯H int ðt n Þ j ii: ð23:80Þ
-1 -1

Taking $t \to \infty$ (so that $t_1, \cdots, t_n$ range over all times) in (23.80), we have

$$|\Psi(\infty,\mathbf{x})\rangle \approx \left[E + \sum_{n=1}^r (-i)^n\int_{-\infty}^\infty dt_1\cdots\int_{-\infty}^{t_{n-1}} dt_n\, H_{\rm int}(t_1)\cdots H_{\rm int}(t_n)\right]|i\rangle. \tag{23.81}$$

Taking account of (23.73), we have

$$|\Psi(\infty,\mathbf{x})\rangle = S|i\rangle.$$

Then, from (23.81) we obtain

$$S = E + \sum_{n=1}^r (-i)^n\int_{-\infty}^\infty dt_1\cdots\int_{-\infty}^{t_{n-1}} dt_n\, H_{\rm int}(t_1)\cdots H_{\rm int}(t_n).$$

Defining $S^{(0)} \equiv E$ and

$$S^{(n)} \equiv (-i)^n\int_{-\infty}^\infty dt_1\cdots\int_{-\infty}^{t_{n-1}} dt_n\, H_{\rm int}(t_1)\cdots H_{\rm int}(t_n) \quad (r \ge n \ge 1), \tag{23.82}$$

we get

$$S = \sum_{n=0}^r S^{(n)}. \tag{23.83}$$

The matrix $S^{(n)}$ is said to be the n-th order S-matrix. In (23.83), the number r can be taken as large as one pleases, even though the related S-matrix calculations become increasingly complicated.
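To see the iterative solution (23.76)-(23.83) at work, one can replace the field-theoretic $H_{\rm int}(t)$ by a 2 × 2 matrix. The sketch below is a toy model, not QED: a Gaussian switching function mimics the adiabatic switching-on and switching-off of the interaction, the nested integrals defining $S^{(1)}, \cdots, S^{(4)}$ are evaluated on a grid, and the truncated sum is compared with the exact time-ordered evolution.

```python
import numpy as np

sx = np.array([[0.0, 1.0], [1.0, 0.0]])
g, tau = 0.2, 1.5

def H_int(t):
    # weak interaction-picture Hamiltonian, "switched on" around t = 0
    return g * np.exp(-(t / tau) ** 2) * sx

T, M = 8.0, 4000
ts, dt = np.linspace(-T, T, M, retstep=True)

# "Exact" S-matrix: time-ordered product of infinitesimal evolution operators
S_exact = np.eye(2, dtype=complex)
for t in ts:
    S_exact = (np.eye(2) - 1j * dt * H_int(t)) @ S_exact  # later times on the left

# S^(n)(t) = (-i) * integral_{-inf}^{t} dt' H_int(t') S^(n-1)(t'), with S^(0) = E
S_sum = np.eye(2, dtype=complex)
prev = np.array([np.eye(2, dtype=complex) for _ in ts])
for n in range(1, 5):                                     # orders 1..4
    cur = np.zeros_like(prev)
    acc = np.zeros((2, 2), dtype=complex)
    for k, t in enumerate(ts):
        acc = acc + (-1j) * dt * H_int(t) @ prev[k]
        cur[k] = acc
    prev = cur
    S_sum = S_sum + cur[-1]          # S^(n) is the value as t -> +infinity

print(np.abs(S_exact - S_sum).max())  # small; shrinks as more orders are kept
```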
Taking the T-product of (23.82), moreover, we get

$$S = \sum_{n=0}^r \frac{(-i)^n}{n!}\int_{-\infty}^\infty dt_1\int_{-\infty}^\infty dt_2\cdots\int_{-\infty}^\infty dt_n\, \mathrm{T}\left[H_{\rm int}(t_1)\cdots H_{\rm int}(t_n)\right]. \tag{23.84}$$

Note that the factorial in the denominator of (23.84) resulted from taking the T-product. Using the Hamiltonian density $\mathcal{H}_{\rm int}(x)$ of (23.17), we rewrite (23.84) as

$$S = \sum_{n=0}^r \frac{(-i)^n}{n!}\int dx_1\int dx_2\cdots\int dx_n\, \mathrm{T}\left[\mathcal{H}_{\rm int}(x_1)\cdots\mathcal{H}_{\rm int}(x_n)\right]. \tag{23.85}$$

Combining (23.83) with (23.73) and (23.74), we get

$$|\Psi(\infty,\mathbf{x})\rangle = \sum_n\sum_\alpha \langle f_\alpha|S^{(n)}|i\rangle\,|f_\alpha\rangle. \tag{23.86}$$
By virtue of using $\mathcal{H}_{\rm int}(x)$, S has been cast in an explicitly covariant form. Notice that we must add a factor −1 when we reorder Fermion quantities, as in the case of the T-product. In the above, however, we have not put in the factor −1, because $\mathcal{H}_{\rm int}(x)$ contains two Fermion quantities and, hence, exchanging $\mathcal{H}_{\rm int}(x_i)$ and $\mathcal{H}_{\rm int}(x_j)\ (x_i \ne x_j)$ produces an even number of permutations of the Fermion quantities. Even permutations produce a factor $(-1)^{2n} = 1$, where n is a positive integer.
The S-matrix contains the identity operator E and infinitely many additional terms and can be expressed as an infinite power series in the fine structure constant α, described in the natural units (with the SI expression in parentheses) by

$$\alpha \equiv e^2/4\pi\ \left(= e^2/4\pi\varepsilon_0\hbar c\right) \approx 7.29735\times 10^{-3}. \tag{23.87}$$
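As a numerical aside, the value quoted in (23.87) can be reproduced from the SI expression with CODATA-style constants (the constants below are external inputs, not taken from the book):

```python
import math

e = 1.602176634e-19        # elementary charge magnitude [C]
eps0 = 8.8541878128e-12    # vacuum permittivity [F/m]
hbar = 1.054571817e-34     # reduced Planck constant [J s]
c = 2.99792458e8           # light velocity [m/s]

alpha = e**2 / (4 * math.pi * eps0 * hbar * c)
print(alpha, 1 / alpha)    # ~7.2974e-3 and ~137.036
```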

23.6 N-Product and T-Product

In Sects. 22.3 and 23.5, we mentioned the normal-ordered product (N-product) and the time-ordered product (T-product). Practical calculations based on them appear frequently and occupy an important position in the quantum field theory. We represent them as, for example,

N(AB) and T(ABC⋯),

where the former represents the N-product and the latter denotes the T-product; A, B, C, etc. denote the physical quantities (i.e., quantum fields). The N-product and T-product are important not only in the practical calculations but also from the theoretical point of view. In particular, they are useful for clarifying the relativistic invariance (or covariance) of the mathematical formulae.

23.6.1 Example of Two Field Operators

We start with the simplest case where two field operators are pertinent. The essence
of both the N-product and T-product clearly shows up in this case and the methods
developed here provide a good starting point for dealing with more complicated
problems. Let A(x) and B(x) be two field operators, where x represents the space-time
coordinates. We choose A(x) and B(x) from among the following operators:
$$\phi(x) = \int \frac{d^3k}{\sqrt{(2\pi)^3(2k_0)}}\left[a(\mathbf{k})e^{-ikx} + a^\dagger(\mathbf{k})e^{ikx}\right], \tag{22.98}$$

$$\psi(x) = \sum_{h=\pm 1}\int_{-\infty}^\infty \frac{d^3p}{\sqrt{(2\pi)^3\,2p_0}}\left[b(\mathbf{p},h)u(\mathbf{p},h)e^{-ipx} + d^\dagger(\mathbf{p},h)v(\mathbf{p},h)e^{ipx}\right], \tag{22.232}$$

$$\bar{\psi}(x) = \sum_{h=\pm 1}\int_{-\infty}^\infty \frac{d^3p}{\sqrt{(2\pi)^3\,2p_0}}\left[d(\mathbf{p},h)\bar{v}(\mathbf{p},h)e^{-ipx} + b^\dagger(\mathbf{p},h)\bar{u}(\mathbf{p},h)e^{ipx}\right], \tag{23.88}$$

$$\psi_{\rm anti}(x) = \sum_{h=\pm 1}\int_{-\infty}^\infty \frac{d^3p}{\sqrt{(2\pi)^3\,2p_0}}\left[d(\mathbf{p},h)u(\mathbf{p},h)e^{-ipx} + b^\dagger(\mathbf{p},h)v(\mathbf{p},h)e^{ipx}\right], \tag{22.244}$$

$$\bar{\psi}_{\rm anti}(x) = \sum_{h=\pm 1}\int_{-\infty}^\infty \frac{d^3p}{\sqrt{(2\pi)^3\,2p_0}}\left[b(\mathbf{p},h)\bar{v}(\mathbf{p},h)e^{-ipx} + d^\dagger(\mathbf{p},h)\bar{u}(\mathbf{p},h)e^{ipx}\right], \tag{23.89}$$

$$A^\mu(x) = \int \frac{d^3k}{\sqrt{(2\pi)^3(2k_0)}}\sum_{r=0}^3\left[a(\mathbf{k},r)\varepsilon^\mu(\mathbf{k},r)e^{-ikx} + a^\dagger(\mathbf{k},r)\varepsilon^\mu(\mathbf{k},r)^*e^{ikx}\right]. \tag{22.334}$$

Equations (22.244) and (23.89) represent the operators of a positron. Note that with respect to (22.98) for the real scalar field and (22.334) for the photon field, the field operators φ(x) and $A^\mu(x)$ are Hermitian. The Dirac field operators, on the other hand, are not Hermitian. For this reason, we need two pairs of operators b(p, h), b†(p, h) and d(p, h), d†(p, h) to describe the Dirac field. We will see that this is indeed the case when we describe the time-ordered products (vide infra). When QED is relevant, the operators of the Dirac field and photon field are chosen from among ψ(x), $\bar{\psi}(x)$, $\psi_{\rm anti}(x)$, $\bar{\psi}_{\rm anti}(x)$, and $A^\mu(x)$.
The above field operators share several features in common. (i) They can be divided into two parts, i.e., the positive-frequency part (containing the factor $e^{-ikx}$) and the negative-frequency part (containing the factor $e^{ikx}$) [1]. The positive and negative frequencies correspond to the positive energy and negative energy, respectively. These quantities result from the fact that the relativistic field theory is associated with the Klein-Gordon equation (see Chaps. 21 and 22). (ii) For each operator, the first term, i.e., the positive-frequency part, contains the annihilation operator, and the second term (the negative-frequency part) contains the creation operator. Accordingly, we express the operators as in, e.g., (22.99) and (22.132) such that [1]

$$\phi(x) = \phi^+(x) + \phi^-(x). \tag{23.90}$$

Care should be taken when dealing with $\bar{\psi}(x)$ [or $\bar{\psi}_{\rm anti}(x)$]. It is because when taking the Dirac adjoint of ψ(x) [or $\psi_{\rm anti}(x)$], the positive-frequency part and the negative-frequency part are switched.
Let us get back to the present task. As already mentioned in Sect. 22.3, with respect to the N-product for, e.g., the scalar field, the annihilation operators a(k) always stand to the right of the creation operators a†(k). Now, we show the computation rule of the N-product as follows:
(i) Boson:

$$
\begin{aligned}
\mathrm{N}[\phi(x)\phi(y)] &= \mathrm{N}\{[\phi^+(x) + \phi^-(x)][\phi^+(y) + \phi^-(y)]\} \\
&= \mathrm{N}[\phi^+(x)\phi^+(y) + \phi^+(x)\phi^-(y) + \phi^-(x)\phi^+(y) + \phi^-(x)\phi^-(y)] \\
&= \phi^+(x)\phi^+(y) + \phi^-(y)\phi^+(x) + \phi^-(x)\phi^+(y) + \phi^-(x)\phi^-(y). 
\end{aligned} \tag{23.91}
$$

Notice that in the second term in the last equality of (23.91), $\phi^+(x)\phi^-(y)$ has been switched to $\phi^-(y)\phi^+(x)$. Similarly, we obtain

$$\mathrm{N}[\phi(y)\phi(x)] = \phi^+(y)\phi^+(x) + \phi^-(x)\phi^+(y) + \phi^-(y)\phi^+(x) + \phi^-(y)\phi^-(x). \tag{23.92}$$

Applying (22.134) to (23.92), we get

$$\mathrm{N}[\phi(y)\phi(x)] = \phi^+(x)\phi^+(y) + \phi^-(x)\phi^+(y) + \phi^-(y)\phi^+(x) + \phi^-(x)\phi^-(y). \tag{23.93}$$

Comparing (23.91) and (23.93), we have

$$\mathrm{N}[\phi(x)\phi(y)] = \mathrm{N}[\phi(y)\phi(x)]. \tag{23.94}$$


In this way, in the Boson field case, the N-product is kept unchanged after exchanging the operators involved in it. The expressions related to (23.91) and (23.94) hold with the photon field $A^\mu(x)$ as well.
(ii) Fermion (electron, positron, etc.):

$$
\begin{aligned}
\mathrm{N}[\psi(x)\bar{\psi}(y)] &= \mathrm{N}\{[\psi^+(x) + \psi^-(x)][\bar{\psi}^+(y) + \bar{\psi}^-(y)]\} \\
&= \mathrm{N}[\psi^+(x)\bar{\psi}^+(y) + \psi^+(x)\bar{\psi}^-(y) + \psi^-(x)\bar{\psi}^+(y) + \psi^-(x)\bar{\psi}^-(y)] \\
&= \psi^+(x)\bar{\psi}^+(y) - \bar{\psi}^-(y)\psi^+(x) + \psi^-(x)\bar{\psi}^+(y) + \psi^-(x)\bar{\psi}^-(y). 
\end{aligned} \tag{23.95}
$$

The expression related to (23.92) holds with the Dirac fields $\bar{\psi}(x)$, $\psi_{\rm anti}(x)$, and $\bar{\psi}_{\rm anti}(x)$ as well. In the above calculations, the positive-frequency part $\psi^+(x)$ always stands to the right of the negative-frequency part $\psi^-(x)$, as in the Boson case. When the multiplication order is switched, however, the sign of the relevant term is reversed (+ to − and − to +) in the Fermion case. The reversal of the sign with respect to $+\psi^+(x)\bar{\psi}^-(y)$ and $-\bar{\psi}^-(y)\psi^+(x)$ results from the relations (22.246) and the anti-commutation relations (22.242).
Similarly, we have

$$\mathrm{N}[\bar{\psi}(y)\psi(x)] = \bar{\psi}^+(y)\psi^+(x) - \psi^-(x)\bar{\psi}^+(y) + \bar{\psi}^-(y)\psi^+(x) + \bar{\psi}^-(y)\psi^-(x). \tag{23.96}$$

Applying (22.245) to (23.96), we get

$$\mathrm{N}[\bar{\psi}(y)\psi(x)] = -\psi^+(x)\bar{\psi}^+(y) - \psi^-(x)\bar{\psi}^+(y) + \bar{\psi}^-(y)\psi^+(x) - \psi^-(x)\bar{\psi}^-(y). \tag{23.97}$$

Comparing (23.95) and (23.97), we have

$$\mathrm{N}[\bar{\psi}(y)\psi(x)] = -\mathrm{N}[\psi(x)\bar{\psi}(y)]. \tag{23.98}$$

Contrast (23.98) with (23.94) with respect to the presence or absence of the minus sign. If we think of $\mathrm{N}[\bar{\psi}(x)\psi(y)]$, we can perform the relevant calculations following (23.95)-(23.98) in the same manner.
Thinking of the constitution of the N-product, it is clear that for both the Boson and Fermion cases, the vacuum expectation value of the N-product is zero. Or rather, the N-product is constructed in such a way that the vacuum expectation value of the product of field operators may vanish.
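This defining property, a vanishing vacuum expectation value for the N-product but not for the ordinary product, can be made concrete with a single bosonic mode. The sketch below is a schematic single-mode illustration on a truncated Fock space (not the full field of (22.98)); it also exhibits the c-number difference between the two products that anticipates (23.99).

```python
import numpy as np

N = 10
a = np.diag(np.sqrt(np.arange(1, N)), k=1)   # annihilation operator
ad = a.T                                     # creation operator
vac = np.zeros(N); vac[0] = 1.0              # vacuum state |0>

phi = a + ad                                 # schematic field operator
phi2 = phi @ phi                             # ordinary product
N_phi2 = ad @ ad + 2 * ad @ a + a @ a        # normal-ordered counterpart

print(vac @ phi2 @ vac)    # 1.0: the ordinary product has a vacuum expectation
print(vac @ N_phi2 @ vac)  # 0.0: the N-product's vacuum expectation vanishes

# The difference is the c-number [a, a^dagger] = 1 (up to the truncation edge):
print(np.round(np.diag(phi2 - N_phi2), 6))
```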
23.6.2 Calculations Including Both the N-Products and T-Products

We continue dealing with the mathematical expressions of the above example and examine their implications for both the Boson and Fermion cases.
(i) Boson:
Subtracting N[φ(x)φ(y)] from φ(x)φ(y) in (23.91), we have

$$\phi(x)\phi(y) - \mathrm{N}[\phi(x)\phi(y)] = [\phi^+(x), \phi^-(y)] = i\Delta^+(x-y), \tag{23.99}$$

where with the last equality we used (22.137). Exchanging x and y in (23.99), we obtain

$$\phi(y)\phi(x) - \mathrm{N}[\phi(y)\phi(x)] = [\phi^+(y), \phi^-(x)] = i\Delta^+(y-x) = -i\Delta^-(x-y), \tag{23.100}$$

where with the last equality we used (22.138). Using (23.94), we get

$$\phi(y)\phi(x) - \mathrm{N}[\phi(x)\phi(y)] = -i\Delta^-(x-y). \tag{23.101}$$

Equations (23.99) and (23.101) show that if we subtract the N-product from the product of the field operators, i.e., φ(x)φ(y) or φ(y)φ(x), we are left with the scalar $i\Delta^+(x-y)$ or $-i\Delta^-(x-y)$. This implies the relativistic invariance of (23.99) and (23.101). Meanwhile, sandwiching (23.99) between ⟨0| and |0⟩, i.e., taking the vacuum expectation value of (23.99), we obtain

$$\langle 0|\phi(x)\phi(y)|0\rangle - \langle 0|\mathrm{N}[\phi(x)\phi(y)]|0\rangle = \langle 0|[\phi^+(x), \phi^-(y)]|0\rangle. \tag{23.102}$$

Taking account of the fact that $\langle 0|\mathrm{N}[\phi(x)\phi(y)]|0\rangle$ vanishes, we get

$$\langle 0|\phi(x)\phi(y)|0\rangle = \langle 0|[\phi^+(x), \phi^-(y)]|0\rangle = [\phi^+(x), \phi^-(y)]\langle 0|0\rangle = [\phi^+(x), \phi^-(y)] = i\Delta^+(x-y), \tag{23.103}$$

where with the second equality we used the fact that $[\phi^+(x), \phi^-(y)] = i\Delta^+(x-y)$ is a c-number, see (22.137); with the second-to-last equality, we used the normalization condition

$$\langle 0|0\rangle = 1. \tag{23.104}$$

Then, using (23.103), we rewrite (23.99) as [1]


$$\phi(x)\phi(y) - \mathrm{N}[\phi(x)\phi(y)] = \langle 0|\phi(x)\phi(y)|0\rangle. \tag{23.105}$$

Thus, (23.105) shows the relativistic invariance based on the N-product and the vacuum expectation value of φ(x)φ(y).
We wish to derive a more useful expression on the basis of (23.105). To this end, we deal with the mathematical expressions including both the N-products and T-products. Multiplying both sides of (23.99) by $\theta(x^0 - y^0)$, we get

$$\theta(x^0-y^0)\phi(x)\phi(y) - \theta(x^0-y^0)\mathrm{N}[\phi(x)\phi(y)] = i\theta(x^0-y^0)\Delta^+(x-y). \tag{23.106}$$

Multiplying both sides of (23.101) by $\theta(y^0 - x^0)$, we obtain

$$\theta(y^0-x^0)\phi(y)\phi(x) - \theta(y^0-x^0)\mathrm{N}[\phi(x)\phi(y)] = -i\theta(y^0-x^0)\Delta^-(x-y). \tag{23.107}$$

Adding both sides of (23.106) and (23.107), we get

$$
\begin{aligned}
&\theta(x^0-y^0)\phi(x)\phi(y) + \theta(y^0-x^0)\phi(y)\phi(x) \\
&\qquad - \left[\theta(x^0-y^0) + \theta(y^0-x^0)\right]\mathrm{N}[\phi(x)\phi(y)] \\
&\quad = i\left[\theta(x^0-y^0)\Delta^+(x-y) - \theta(y^0-x^0)\Delta^-(x-y)\right] = \Delta_F(x-y), 
\end{aligned} \tag{23.108}
$$

where the last equality is due to (22.185). Using the definition of the time-ordered product (or T-product) of (22.178) and the relation described by (see Chap. 10)

$$\theta(x^0-y^0) + \theta(y^0-x^0) \equiv 1, \tag{23.109}$$

we obtain

$$\mathrm{T}[\phi(x)\phi(y)] - \mathrm{N}[\phi(x)\phi(y)] = \Delta_F(x-y) = \langle 0|\mathrm{T}[\phi(x)\phi(y)]|0\rangle, \tag{23.110}$$

where with the last equality we used (22.182). Comparing (23.110) with (23.105), we find that in (23.110) T[φ(x)φ(y)] has replaced φ(x)φ(y) of (23.105) except for φ(x)φ(y) in N[φ(x)φ(y)]. Putting it another way, subtracting the N-product from the T-product in (23.110), we are left with the relativistically invariant scalar $\langle 0|\mathrm{T}[\phi(x)\phi(y)]|0\rangle$.
(ii) Fermion:
The discussion parallels that of the Boson case mentioned above. From a practical point of view, the product $\psi(x)\bar{\psi}(y)$ is the most important. Subtracting $\mathrm{N}[\psi(x)\bar{\psi}(y)]$ from $\psi(x)\bar{\psi}(y)$ in (23.95), we have

$$\psi(x)\bar{\psi}(y) - \mathrm{N}[\psi(x)\bar{\psi}(y)] = \{\psi^+(x), \bar{\psi}^-(y)\} = iS^+(x-y). \tag{23.111}$$

Multiplying both sides by $\theta(x^0 - y^0)$ gives

$$\theta(x^0-y^0)\psi(x)\bar{\psi}(y) - \theta(x^0-y^0)\mathrm{N}[\psi(x)\bar{\psi}(y)] = i\theta(x^0-y^0)S^+(x-y). \tag{23.112}$$

Note that with the last equality of (23.111), we used (22.256). In a similar manner, we have

$$\bar{\psi}(y)\psi(x) - \mathrm{N}[\bar{\psi}(y)\psi(x)] = \{\bar{\psi}^+(y), \psi^-(x)\} = \{\psi^-(x), \bar{\psi}^+(y)\} = iS^-(x-y). \tag{23.113}$$

Multiplying both sides by $\theta(y^0 - x^0)$ gives

$$\theta(y^0-x^0)\bar{\psi}(y)\psi(x) - \theta(y^0-x^0)\mathrm{N}[\bar{\psi}(y)\psi(x)] = i\theta(y^0-x^0)S^-(x-y). \tag{23.114}$$

Subtracting both sides of (23.114) from (23.112) and using the definition of the time-ordered product (22.254) for the Dirac field, we get

$$
\begin{aligned}
&\mathrm{T}[\psi(x)\bar{\psi}(y)] - \left[\theta(x^0-y^0) + \theta(y^0-x^0)\right]\mathrm{N}[\psi(x)\bar{\psi}(y)] \\
&\quad = \mathrm{T}[\psi(x)\bar{\psi}(y)] - \mathrm{N}[\psi(x)\bar{\psi}(y)] \\
&\quad = i\left[\theta(x^0-y^0)S^+(x-y) - \theta(y^0-x^0)S^-(x-y)\right], 
\end{aligned} \tag{23.115}
$$

where we used the following relationship that holds in the Fermion case:

$$\mathrm{N}[\bar{\psi}(y)\psi(x)] = -\mathrm{N}[\psi(x)\bar{\psi}(y)]. \tag{23.116}$$

Using (22.258), we rewrite (23.115) as

$$\mathrm{T}[\psi(x)\bar{\psi}(y)] - \mathrm{N}[\psi(x)\bar{\psi}(y)] = S_F(x-y) = \langle 0|\mathrm{T}[\psi(x)\bar{\psi}(y)]|0\rangle. \tag{23.117}$$

Again, we have obtained the expression (23.117), related to (23.110). Collectively writing these equations, we obtain [1]

$$\mathrm{T}[A(x)B(y)] - \mathrm{N}[A(x)B(y)] = \langle 0|\mathrm{T}[A(x)B(y)]|0\rangle. \tag{23.118}$$

In the case where both A and B are Fermion operators, $A(x) = \psi(x)$ [or $\bar{\psi}(x)$] and $B(y) = \bar{\psi}(y)$ [or ψ(y)] are intended. In this case, all three terms of (23.118) change sign upon exchanging A(x) and B(y). Therefore, (23.118) does not change upon the same exchange. Meanwhile, in the case where at least one of A and B is a Boson operator, each term of (23.118) does not change sign upon permutation of A(x) and B(y). Hence, (23.118) holds as well.
In the case of the photon field, the discussion can be developed in parallel to the scalar bosonic field, because the photon belongs to the Boson family. In both the Boson and Fermion cases, the essence of the perturbation method clearly shows up in the simple expression (23.118). In fact, (23.118) is the simplest case of Wick's theorem [1, 3]. In the general case, Wick's theorem is expressed as [1]

$$
\begin{aligned}
\mathrm{T}(ABCD\cdots WXYZ) &= \mathrm{N}(ABCD\cdots WXYZ) \\
&\quad + \mathrm{N}(A^{\bullet}B^{\bullet}C\cdots YZ) + \mathrm{N}(A^{\bullet}BC^{\bullet}\cdots YZ) + \cdots + \mathrm{N}(A^{\bullet}BC\cdots YZ^{\bullet}) \\
&\quad + \mathrm{N}(A^{\bullet}B^{\circ}C^{\circ}D^{\bullet}\cdots YZ) + \cdots \\
&\quad + \mathrm{N}(A^{\bullet}B^{\circ}\cdots W^{\circ}X^{\star}Y^{\star}Z^{\bullet}) + \cdots, 
\end{aligned} \tag{23.119}
$$

where each contracted pair of operators is indicated by attaching matching superscript marks (•, ∘, ⋆) to the two operators concerned, in place of contraction lines: the first row contains no contraction, the second row contains one contraction taken over every possible pair, the third row contains two contractions, and so on. Equation (23.119) looks complicated but clearly shows how the Feynman propagators are produced in the coexistence of the T-products and N-products. Glancing at (23.110) and (23.117), this feature is once again obvious. On the RHS of (23.119), the terms from the second on down are sometimes called the generalized normal products [1]. We have another rule for them. For instance, we have [1]

$$\mathrm{N}(A^{\bullet}BC^{\bullet}\cdots YZ) = (-1)^P\, A^{\bullet}C^{\bullet}\,\mathrm{N}(B\cdots YZ), \tag{23.120}$$

where $A^{\bullet}C^{\bullet}$ denotes the contraction of A and C, and P is the number of interchanges of neighboring Fermion operators required to change the order (ABC⋯YZ) to (ACB⋯YZ). In this case, the interchange takes place between B and C. If both B and C are Fermion operators, then P = 1; otherwise, P = 0.
Bearing in mind the future application (e.g., the Compton scattering), let us think of the following example given as

$$\mathrm{N}\left(F_1B_1F_1'F_2B_2F_2'\right), \tag{23.121}$$

where the F and F′ are Fermion operators and the B are Boson operators. The indices 1 and 2 designate the space-time coordinates. Suppose that we have the following contraction within the operators, given by

$$\mathrm{N}\left(F_1^{\bullet}B_1F_1'F_2B_2F_2'^{\,\bullet}\right) = (-1)^2 F_1^{\bullet}F_2'^{\,\bullet}\,\mathrm{N}\left(B_1F_1'F_2B_2\right) = F_1^{\bullet}F_2'^{\,\bullet}\,\mathrm{N}\left(B_1F_1'F_2B_2\right). \tag{23.122}$$

Meanwhile, suppose that we have

$$
\begin{aligned}
\mathrm{N}\left(F_2B_2F_2'^{\,\bullet}F_1^{\bullet}B_1F_1'\right) &= F_2'^{\,\bullet}F_1^{\bullet}(-1)^2\,\mathrm{N}\left(F_2B_2B_1F_1'\right) = -F_1^{\bullet}F_2'^{\,\bullet}(-1)^2\,\mathrm{N}\left(F_2B_2B_1F_1'\right) \\
&= -F_1^{\bullet}F_2'^{\,\bullet}(-1)^2(-1)\,\mathrm{N}\left(B_1F_1'F_2B_2\right) = F_1^{\bullet}F_2'^{\,\bullet}\,\mathrm{N}\left(B_1F_1'F_2B_2\right). 
\end{aligned} \tag{23.123}
$$

Comparing (23.122) and (23.123), we get

$$\mathrm{N}\left(F_1^{\bullet}B_1F_1'F_2B_2F_2'^{\,\bullet}\right) = \mathrm{N}\left(F_2B_2F_2'^{\,\bullet}F_1^{\bullet}B_1F_1'\right). \tag{23.124}$$

Thus, we have obtained a simple but important relationship for the generalized normal product. We make use of this relationship in the following sections.

23.7 Feynman Rules and Feynman Diagrams in QED

So far, we have developed the general discussion on the interacting fields and on how to deal with the interaction using the perturbation methods based upon the S-matrix. In this section, we develop methods to apply the general principle to specific practical problems. For this purpose, we make the most of the Feynman diagrams and the associated Feynman rules. In parallel, we represent the mathematical expressions using symbols or diagrams. One typical example is a contraction, which we mentioned in the previous section. We summarize the frequently appearing contractions below. The contractions represent the Feynman propagators.

$$\phi^{\bullet}(x)\phi^{\bullet}(y) = \Delta_F(x-y), \tag{23.125}$$

$$\psi^{\bullet}(x)\bar{\psi}^{\bullet}(y) = -\bar{\psi}^{\bullet}(y)\psi^{\bullet}(x) = S_F(x-y), \tag{23.126}$$

$$A_\mu^{\bullet}(x)A_\nu^{\bullet}(y) = D_{F\mu\nu}(x-y). \tag{23.127}$$

Among the above relations, (23.126) and (23.127) are pertinent to QED. When we examine the interaction between the fields, it is convenient to describe a field operator F such that [1]

$$F = F^+ + F^-, \tag{23.128}$$

where $F^+$ and $F^-$ represent the positive-frequency part and the negative-frequency part, respectively (Sects. 22.3.6 and 23.6.1). Regarding a product of more than one operator F, G, etc., we similarly denote
$$FG\cdots = (F^+ + F^-)(G^+ + G^-)\cdots. \tag{23.129}$$

In our present case of the Hamiltonian density of the electromagnetic interaction, we have

$$\mathcal{H}_{\rm int}(x) \equiv e\left[\bar{\psi}^+(x) + \bar{\psi}^-(x)\right]\gamma^\mu\left[A_\mu^+(x) + A_\mu^-(x)\right]\left[\psi^+(x) + \psi^-(x)\right]. \tag{23.130}$$

The operator $A_\mu(x)$ defined by (22.336) is an Hermitian operator with

$$A_\mu(x) = A_\mu^+(x) + A_\mu^-(x).$$

We have

$$\left[A_\mu^+(x)\right]^\dagger = A_\mu^-(x) \quad\text{and}\quad \left[A_\mu^-(x)\right]^\dagger = A_\mu^+(x). \tag{23.131}$$

Note, however, that neither ψ(x) nor $\bar{\psi}(x)$ is Hermitian. In conjunction with this fact, the first factor of (23.130) contains a creation operator $\bar{\psi}^-$ of the electron and an annihilation operator $\bar{\psi}^+$ of the positron. The third factor of (23.130), on the other hand, includes a creation operator $\psi^-$ of the positron and an annihilation operator $\psi^+$ of the electron. The relevant discussion can be found in Sect. 23.8.2. According to the prescription of Sect. 23.5, we wish to calculate matrix elements of (23.86) up to the second order of the S-matrix.
For simplicity, we assume that the final state obtained after the interaction
between the fields is monochromatic. In other words, the final state of individual
particles is assumed to be described by a single plane wave. For instance, if we are
observing photons and electrons as the final states after the interaction, those photons
and electrons possess definite four-momentum and spin (or polarization). We use
f instead of fα in (23.86) accordingly.

23.7.1 Zeroth- and First-Order S-Matrix Elements

We start with the titled simple cases to show how the S-matrix elements are related to
the elementary processes that are associated with the interaction between fields.
(i) Zeroth-order matrix element hf| S(0)| ii:
We have S(0) = E (identity operator) in this trivial case. We assume

j ii = b{ ðp, hÞj0i and j f i = b{ ðp0 , h0 Þj0 : ð23:132Þ


The RHSs of (23.132) imply that an electron with momentum p (or p′) and helicity h (or h′) has been created in the vacuum. That is, the elementary process is described by

$$e^- \longrightarrow e^-,$$

where $e^-$ denotes an electron. With this process we have

$$
\begin{aligned}
\langle f|S^{(0)}|i\rangle &= \langle f|E|i\rangle = \langle f|i\rangle = \langle 0|b(\mathbf{p}',h')b^\dagger(\mathbf{p},h)|0\rangle \\
&= \langle 0|\left[-b^\dagger(\mathbf{p},h)b(\mathbf{p}',h') + \delta_{h'h}\delta^3(\mathbf{p}'-\mathbf{p})\right]|0\rangle \\
&= \langle 0|{-b^\dagger(\mathbf{p},h)b(\mathbf{p}',h')}|0\rangle + \delta_{h'h}\delta^3(\mathbf{p}'-\mathbf{p})\langle 0|0\rangle = \delta_{h'h}\delta^3(\mathbf{p}'-\mathbf{p}), 
\end{aligned} \tag{23.133}
$$

where with the second equality we used (22.262). Equation (23.133) implies that $\langle f|S^{(0)}|i\rangle$ vanishes unless h′ = h and p′ = p. That is, the electron simply keeps moving without changing its helicity or momentum. This means that no interaction has occurred.
(ii) First-order matrix element $\langle f|S^{(1)}|i\rangle$ [1]:
We assume that

$$|i\rangle = b^\dagger(\mathbf{p},h)|0\rangle \quad\text{and}\quad |f\rangle = a^\dagger(\mathbf{k}',r')b^\dagger(\mathbf{p}',h')|0\rangle. \tag{23.134}$$

The corresponding elementary process would be

$$e^- \longrightarrow e^- + \gamma, \tag{23.135}$$

where $e^-$ and γ denote an electron and a photon, respectively. An electron in the initial state $|i\rangle$ in the vacuum is converted into an electron and a photon in the final state $|f\rangle$ of (23.135). We pick up the effective normal product from the interaction Hamiltonian (23.17) [1]. Then, we have

$$\mathrm{N}\left[e\bar{\psi}(x)\gamma^\mu A_\mu(x)\psi(x)\right] = e\bar{\psi}^-(x)\gamma^\mu A_\mu^-(x)\psi^+(x). \tag{23.136}$$

As the matrix element $\langle f|S^{(1)}|i\rangle$, we have

$$\langle f|S^{(1)}|i\rangle = ie\,\langle 0|\,b(\mathbf{p}',h')a(\mathbf{k}',r')\left[\int d^4x\, \bar{\psi}^-(x)\gamma^\mu A_\mu^-(x)\psi^+(x)\right]b^\dagger(\mathbf{p},h)\,|0\rangle. \tag{23.137}$$

In (23.137), we expand $\bar{\psi}^-(x)$, $A_\mu^-(x)$, and $\psi^+(x)$ using (23.88), (22.334), and (22.232), respectively. That is, we have

$$\bar{\psi}^-(x) = \sum_{t'=\pm 1}\int_{-\infty}^\infty \frac{d^3w'}{\sqrt{(2\pi)^3\,2w'_0}}\, b^\dagger(\mathbf{w}',t')\bar{u}(\mathbf{w}',t')e^{iw'x}, \tag{23.138}$$

$$A_\mu^-(x) = \sum_{m'=1}^2\int_{-\infty}^\infty \frac{d^3l'}{\sqrt{(2\pi)^3\,2l'_0}}\, a^\dagger(\mathbf{l}',m')\varepsilon_\mu(\mathbf{l}',m')^* e^{il'x}, \tag{23.139}$$

$$\psi^+(x) = \sum_{t=\pm 1}\int_{-\infty}^\infty \frac{d^3w}{\sqrt{(2\pi)^3\,2w_0}}\, b(\mathbf{w},t)u(\mathbf{w},t)e^{-iwx}. \tag{23.140}$$

In (23.139), we took only m′ = 1, 2 (i.e., the transverse modes) for the photon polarization [1]. Using (22.239), we have

$$b(\mathbf{p}',h')b^\dagger(\mathbf{w}',t') = -b^\dagger(\mathbf{w}',t')b(\mathbf{p}',h') + \delta_{h't'}\delta^3(\mathbf{p}'-\mathbf{w}'), \tag{23.141}$$

$$b(\mathbf{w},t)b^\dagger(\mathbf{p},h) = -b^\dagger(\mathbf{p},h)b(\mathbf{w},t) + \delta_{th}\delta^3(\mathbf{w}-\mathbf{p}). \tag{23.142}$$

Also, using (22.344), we have

$$a(\mathbf{k}',r')a^\dagger(\mathbf{l}',m') = a^\dagger(\mathbf{l}',m')a(\mathbf{k}',r') + \zeta_{r'}\delta_{r'm'}\delta^3(\mathbf{k}'-\mathbf{l}'). \tag{23.143}$$

Further using (23.141)-(23.143) and performing the momentum-space integration with respect to $d^3w'$, $d^3w$, and $d^3l'$, we get

$$
\begin{aligned}
\langle f|S^{(1)}|i\rangle &= ie\,\frac{1}{\sqrt{(2\pi)^9\cdot 2p_0\cdot 2p'_0\cdot 2k'_0}}\int d^4x\, \bar{u}(\mathbf{p}',h')\varepsilon_\mu(\mathbf{k}',r')^*\gamma^\mu\zeta_{r'}u(\mathbf{p},h)\,e^{i(p'+k'-p)x} \\
&= ie\,\frac{1}{\sqrt{(2\pi)^9\cdot 2p_0\cdot 2p'_0\cdot 2k'_0}}\,\bar{u}(\mathbf{p}',h')\varepsilon_\mu(\mathbf{k}',r')^*\gamma^\mu\zeta_{r'}u(\mathbf{p},h)\,(2\pi)^4\delta^4(p'+k'-p), 
\end{aligned} \tag{23.144}
$$

where with the last equality we used the four-dimensional version of (22.101). The four-dimensional δ function means that

$$\delta^4(p'+k'-p) = \delta\!\left(p'_0+k'_0-p_0\right)\delta^3(\mathbf{p}'+\mathbf{k}'-\mathbf{p}), \tag{23.145}$$

where $p' = \begin{pmatrix} p'_0 \\ \mathbf{p}' \end{pmatrix}$, $p = \begin{pmatrix} p_0 \\ \mathbf{p} \end{pmatrix}$, etc. From (23.145), $\varepsilon_\mu(\mathbf{k}',r')$ in (23.144) can be replaced with $\varepsilon_\mu(\mathbf{p}-\mathbf{p}',r')$. Equation (23.145) implies the conservation of four-momentum (or energy-momentum). That is, we have

$$p = p' + k'. \tag{23.146}$$

For the electron, we have

$$p'^2 = p_0'^2 - \mathbf{p}'^2 = m^2 \quad\text{and}\quad p^2 = p_0^2 - \mathbf{p}^2 = m^2. \tag{23.147}$$

For the photon, we get

$$k'^2 = k_0'^2 - \mathbf{k}'^2 = \omega'^2 - \omega'^2 = 0, \tag{23.148}$$

where ω′ is the angular frequency of the emitted photon. But, from (23.146), we have

$$p^2 = p'^2 + 2p'k' + k'^2.$$

This implies p′k′ = 0. It is, however, incompatible with a real physical process [1]. This can be shown as follows: We have

$$p'k' = p'_0\omega' - \mathbf{p}'\cdot\mathbf{k}'.$$

But we have

$$\mathbf{p}'\cdot\mathbf{k}' \le |\mathbf{p}'|\,|\mathbf{k}'| = \sqrt{p_0'^2 - m^2}\,|\mathbf{k}'| < p'_0\omega'.$$

Hence, $p'k' > p'_0\omega' - p'_0\omega' = 0$; that is, p′k′ > 0. This is, however, inconsistent with p′k′ = 0, and so the equality p′k′ = 0 is physically unrealizable. The simplest case would be that a single electron is present at rest in the initial state and emits a photon, with the electron recoiling in the final state. Put

$$p = \begin{pmatrix} m \\ \mathbf{0} \end{pmatrix}, \quad p' = \begin{pmatrix} p'_0 \\ \mathbf{p}' \end{pmatrix}, \quad k' = \begin{pmatrix} \omega' \\ \mathbf{k}' \end{pmatrix}.$$

Then, seek the value of p′k′ on the basis of the energy-momentum conservation. It is left to the reader.
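The contradiction can also be seen numerically. The sketch below (an illustrative check, with m = 1 and randomly chosen photon momenta, in the rest frame of the initial electron) shows that p′k′ is strictly positive and that the energy balance fails whenever the photon carries momentum.

```python
import numpy as np

m = 1.0
rng = np.random.default_rng(0)

def minkowski(a, b):
    return a[0] * b[0] - a[1:] @ b[1:]

for _ in range(5):
    k3 = rng.normal(size=3)                  # trial photon 3-momentum
    w = np.linalg.norm(k3)                   # omega' = |k'| for a massless photon
    p3 = -k3                                 # 3-momentum conservation: p' = -k'
    E = np.sqrt(m**2 + p3 @ p3)              # on-shell electron energy
    p_prime = np.array([E, *p3])
    k_prime = np.array([w, *k3])
    # p'k' would have to vanish, but it is strictly positive; likewise the
    # final energy E + omega' always exceeds the initial rest energy m.
    print(minkowski(p_prime, k_prime), E + w - m)
```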
As evidenced from the above illustration, we must have

$$\langle f|S^{(1)}|i\rangle = 0. \tag{23.149}$$

23.7.2 Second-Order S-Matrix Elements and Feynman Diagrams

As already examined above, we had to write down a fairly large number of equations to examine even the simplest cases. It is the Feynman diagram that facilitates understanding of the physical meaning of individual processes. A building block comprises three lines jointed at a vertex (see Fig. 23.2); i.e., two electron- or positron-associated lines and a photon-associated line. These lines correspond to the interaction Hamiltonian (23.17) related to the fields. Figure 23.2 is topologically equivalent to the first-order S-matrix S(1) discussed in the last section.
The diagram shows an incoming (or initial) electron (represented by a straight line labelled with p) and an outgoing (final) electron (labelled with p′). A photon line (shown as a wavy line labelled with k or k′) intersects with the electron lines at their vertex. The Feynman diagrams comprise such building blocks, and the number of the blocks contained in a diagram is determined by the order of the S-matrix. In general, a Feynman diagram consisting of n blocks is related to an n-th order S-matrix S(n). We show several kinds of Feynman diagrams related to S(2) below. Other diagrams representing S(2) processes can readily be constructed using two building blocks, one of which is depicted in Fig. 23.2. The associated Feynman rules will be summarized at the end of this section.
We show examples of the second-order S-matrix S(2). Figure 23.3 summarizes several topologically different diagrams. Individual diagrams correspond to different scattering processes. One of the important features that have not appeared in the previous section is that the matrix elements ⟨f|S(2)|i⟩ contain the contraction(s). In

(a) (b)

Fig. 23.2 Building blocks of the Feynman diagram. The Feynman diagram comprises two
electron- or positron-associated lines (labelled p, p′) and a photon-associated line (labelled k, k′)
that are jointed at a vertex. (a) Incoming photon is labelled k. (b) Outgoing photon is labelled k′

Fig. 23.3 Feynman diagrams representing the second-order S-matrix S(2). Each diagram is composed of two building blocks. (a) Feynman diagram containing a single contraction related to the electron field. (b) Feynman diagram showing the electron self-energy. (c) Feynman diagram showing the photon self-energy (or vacuum polarization). (d) Feynman diagram showing the vacuum bubble (or vacuum diagram)

QED, each contraction corresponds to the Feynman propagator described by $S_F(x-y)$ or $D_F(x-y)$. On the basis of (23.85), we explicitly show S(2) such that

$$
\begin{aligned}
S^{(2)} &= -\frac{1}{2}\int_{-\infty}^\infty dx_1\int_{-\infty}^\infty dx_2\, \mathrm{T}\left[\mathcal{H}_{\rm int}(x_1)\mathcal{H}_{\rm int}(x_2)\right] \\
&= -\frac{1}{2}e^2\int_{-\infty}^\infty dx_1\int_{-\infty}^\infty dx_2\, \mathrm{T}\left[\left(\bar{\psi}(x_1)\gamma^\mu A_\mu(x_1)\psi(x_1)\right)\left(\bar{\psi}(x_2)\gamma^\nu A_\nu(x_2)\psi(x_2)\right)\right]. 
\end{aligned} \tag{23.150}
$$

From (23.150), up to three contractions may be present in S(2). As can be seen in Fig. 23.3, the patterns and number of the contractions determine the topological features of the Feynman diagrams. In other words, the contractions play an essential role in determining how the building blocks are jointed in the diagram (see Fig. 23.3).
Before discussing the several features of an example (i.e., the Compton scattering) at large in the next section, we wish to briefly mention the characteristics of the Feynman diagrams that have contraction(s). Figure 23.3a shows a diagram having a single contraction. The initial electron and photon are drawn on the left side and the final electron and photon are depicted on the right side, with the contraction with respect to the electron shown at the center.
Examples that have two contractions are shown in Fig. 23.3b, c, which represent the "electron self-energy" and "photon self-energy," respectively. The former term causes us to reconsider an electron as an entity that is surrounded by a "photon cloud." The photon self-energy, on the other hand, is sometimes referred to as the vacuum polarization. It is because the photon field might modify the distribution of virtual electron-positron pairs in such a way that the photon field polarizes the vacuum like a dielectric [1]. The S-matrix computation of the physical processes represented by Fig. 23.3b, c leads to divergent integrals. We need the notion of renormalization to deal with these processes, but we will not go into detail about this issue.
We show another example of the Feynman diagrams in Fig. 23.3d, called the "vacuum diagram" (or vacuum bubble [7]). It lacks external lines, and so no transitions are thought to occur. The diagrams collected in Fig. 23.3c, d contain two contractions between the Fermions. In those cases, the associated contraction lines go in opposite directions. This implies that both types of Fermions (i.e., electron and positron) take part in the process.
Now, in accordance with the above topological feature of the Feynman diagrams,
we list major S(2) processes [1] as follows:

ð2Þ e2
SA = - dx1 dx2 N ψγ μ Aμ ψ x1
ðψγ ν Aν ψ Þx2 , ð23:151Þ
2
ð2Þ
SB =
e2
- dx1 dx2 N ðψγ μ Aμ ψÞx1 ðψ γ ν Aν ψÞx  þ ½ðψγ μ Aμ ψÞx1 ðψγ ν Aν ψÞx ,
2 j j 2 j j 2

ð23:152Þ
ð2Þ e2
SC = - dx1 dx2 N ðψγ μ Aμ ψÞx1 ðψγ ν Aν ψÞx , ð23:153Þ
2 j j 2

ð23:154Þ

ð23:155Þ

ð23:156Þ

ð2Þ
Of these equations, SB contains a single contraction between the Fermions. We
describe related important aspects in full detail in the next section with emphasis
23.7 Feynman Rules and Feynman Diagrams in QED 1111

ð2Þ
upon the Compton scattering. Another process SC containing a single contraction
includes the Mϕller scattering and Bhabha scattering. Interested readers are referred
ð2Þ ð2Þ ð2Þ
to the literature [1, 3]. The SD , SE , and SF processes are associated with the
electron self-energy, photon self-energy, and vacuum bubble, respectively. The first
ð2Þ
two processes contain two contractions, whereas the last process SF contains three
contractions. Thus, we find that the second-order S-matrix elements
ð2Þ
Sk ðk = A, ⋯, FÞ reflect major processes of QED.
Among the above equations, (23.151) lacking a contraction is of little importance
as in the previous case, because it does not reflect real processes. Equation (23.152)
contains two contractions between the Dirac fields (ψ and ψ) and related two terms
are equivalent. As discussed in Sect. 23.6.2, especially as represented in (23.124),
we have a following N-product expressed as

N ðψγ μ Aμ ψÞx1 ðψγ ν Aν ψÞx2 = N ðψγ μ Aμ ψÞx2 ðψ γ ν Aν ψÞx1 : ð23:157Þ


j j j j

Notice that (i) exchanging two factors ψγ μ Aμ ψ x1 and ψγ μ Aμ ψ x2 causes even


permutations of the Fermion operators (ψ and ψ). But, the even permutations do not
change the N-products as shown in (23.120). (ii) Since (23.157) needs to be
integrated with respect to x1 and x2, these variables can freely be exchanged.
Hence, exchanging the integration variables x1 and x2 in RHS of (23.157), we obtain

N ðψγ μ Aμ ψÞx2 ðψ γ ν Aν ψÞx1 = N ðψγ μ Aμ ψÞx1 ðψ γ ν Aν ψÞx2 :


j j j j

Thus, two terms of (23.152) are identical, and so (23.152) can be equivalently
expressed as

ð2Þ
SB = - e2 dx1 dx2 N ðψγ μ Aμ ψÞx1 ðψγ ν Aν ψÞx2 : ð23:158Þ
j j

As already noted, the contractions represent the Feynman propagators. The


contraction in (23.158) corresponds to SF(x - y) described by (23.126) and contains
two uncontracted Fermion operators and two uncontracted photon operators. These
uncontracted operators are referred to as external particles (associated with the
external lines in the Feynman diagrams) and responsible for absorbing or creating
the particles (i.e., electrons or photons). Consequently, (23.158) contributes to many
real physical processes (see Sect. 23.8.2).
In parallel with the discussion made so far, in Fig. 23.4 we depict several major
“components” of the diagram along with their meaning. A set of line drawings
accompanied by their physical meaning is usually referred to as Feynman rules. The
words “initial” and “final” used in the physical meaning should be taken as formal.
In the above discussion, the initial states are associated with the momenta p and k,
whereas the final states are pertinent to p′ and k′. In this respect, the words initial and
1112 23 Interaction Between Fields

(a)

+
=
− +

(b)


=
+

(c) (d)

( , ℎ) – ( , ℎ)

( , ℎ) ( , ℎ)

(e)

( , )

( , )

Fig. 23.4 Major components of the Feynman diagrams. (a) Internal (or intermediate) electron line
and the corresponding Feynman propagator. (b) Internal (or intermediate) photon line and the
corresponding Feynman propagator. (c) External lines of the initial electron [u(p, h)] and final
electron [uðp, hÞ] from the top. (d) External lines of the initial positron [vðp, hÞ] and final positron
[v(p, h)] from the top. (e) External lines of the initial photon [εμ(k, r)] and final photon [εμ(k, r)] from
the top

final are of secondary importance. What is important is a measured (or measurable)


quantity of p, k, p′, and k′.
Figure 23.4a, b shows the internal (or intermediate) electron and photon lines
labelled by momentum p and k, respectively. Figure 23.4c indicates the external lines
of the initial electron [u(p, h)] and final electron [uðp, hÞ] from the top. In Fig. 23.4d,
we depict similarly the external lines of initial positron [vðp, hÞ] and final positron
[v(p, h)] from the top. Note that the direction of the arrows of Fig. 23.4d is reversed
23.8 Example: Compton Scattering [1] 1113

relative to Fig. 23.4c. Moreover, Fig. 23.4e shows likewise the external lines of
photons. They are labelled by the polarization εμ(k, r).

23.8 Example: Compton Scattering [1]

23.8.1 Introductory Remarks

To gain better understanding of the interaction between the quantum fields and to get
used to dealing with the Feynman diagrams, we take this opportunity to focus upon a
familiar example of Compton scattering. In Chap. 1, we dealt with it from a classical
point of view (i.e., early-stage quantum theory). Here we revisit this problem to
address the major issue from a point of view of quantum field theory. Even though
the Compton scattering is only an example among various kinds of interactions of
the quantum fields, it represents a general feature of the field interactions and, hence,
to explore its characteristics in depth enhances heuristic knowledge.
The elementary process of the Compton scattering is described by

γ þ e- ⟶ γ þ e- , ð23:159Þ

where γ denotes a photon and e- stands for an electron. Using (23.130), a major part
ð2Þ
of the interaction Hamiltonian for SB of (23.158) is described as

e2 N ½ψ þ ðx1 Þ þ ψ - ðx1 Þγ μ Aþ -


μ ðx1 Þ þ Aμ ðx1 Þ SF ðx1 - x2 Þ

× γ ν Aþ - þ -
ν ðx2 Þ þ Aν ðx2 Þ ½ψ ðx2 Þ þ ψ ðx2 Þg: ð23:160Þ

Equation (23.160) represents many elementary processes including the Compton


scattering (see Sect. 23.8.2). Of the operators in (23.160), ψ - ðx1 Þ and ψ +(x2) are
responsible for the creation [b{(p, h)] and annihilation [b(p, h)] of electron, respec-
tively. The operators ψ -(x2) and ψ þ ðx1 Þ are associated with the creation [d{(p, h)]
and annihilation [d(p, h)] of positron, respectively. The operators Aμ- and Aþ μ are
pertinent to the creation and annihilation of photon, respectively. We need to pick
appropriate operators up according to individual elementary processes and arrange
them in the normal order. In the case of the Compton scattering, for example, the
order of operators is either

ψ - ðx1 Þγ μ Aμ- ðx1 Þγ ν Aþ þ


ν ðx2 Þψ ðx2 Þ

or
1114 23 Interaction Between Fields

ψ - ðx1 Þγ μ Aν- ðx2 Þγ ν Aþ þ


μ ðx1 Þψ ðx2 Þ:

Regarding the photon operators, either Aþ þ


μ ðx1 Þ or Aν ðx2 Þ is thought to annihilate
- -
the “initial” photon. Either Aμ ðx1 Þ or Aν ðx2 Þ creates the “final” photon accordingly.
Note the “initial” and “final” with the double quotation marks written above.
Because of the presence of time-ordered products, the words of initial and final are
somewhat pointless. Nonetheless, we used them according to the custom. The
custom merely arises in the notation hf| S(2)| ii. That is, in this notation the “initial
state” i stands to the right of the “final state” f.
ð2Þ
Defining the S(2) relevant to the Compton scattering as SComp , we have [1]

ð2Þ
SComp = Sa þ Sb , ð23:161Þ

where Sa and Sb are given by

Sa = - e2 d4 x1 d 4 x2 ψ - ðx1 Þγ μ SF ðx1 - x2 Þγ ν Aμ- ðx1 ÞAþ þ


ν ðx2 Þψ ðx2 Þ ð23:162Þ

and

S b = - e2 d4 x1 d 4 x2 ψ - ðx1 Þγ μ SF ðx1 - x2 Þγ ν Aν- ðx2 ÞAþ þ


μ ðx1 Þψ ðx2 Þ: ð23:163Þ

In (23.162) and (23.163), once again we note that the field operators have been
placed in the normal order. In both (23.162) and (23.163), the positive-frequency
part ψ + and negative-frequency part ψ - of the electron field are assigned to the
coordinate x2 and x1, respectively. The normal order of the photon part is given by

Aμ- ðx1 ÞAþ - - þ þ - þ


ν ðx2 Þ, Aμ ðx1 ÞAν ðx2 Þ, Aμ ðx1 ÞAν ðx2 Þ, and Aν ðx2 ÞAμ ðx1 Þ:

Of these, the first and last terms are responsible for Sa and Sb, respectively. Other
ð2Þ
two terms produce no contribution to SComp .
- þ
The term Aμ ðx1 ÞAν ðx2 Þ of Sa in (23.162) plays a role in annihilating the initial
photon at x2 and in creating the final photon at x1. Meanwhile, the term
Aν- ðx2 ÞAþ
μ ðx1 Þ of Sb in (23.163) plays a role in annihilating the initial photon at x1
and in creating the final photon at x2. Thus, comparing (23.162) and (23.163), we
find that the coordinates x1 and x2 switch their role in the annihilation and creation of
photon. Therefore, it is of secondary importance which “event” has happened prior
to another between the event labelled by Aþ þ
ν ðx2 Þ and that labelled by Aμ ðx1 Þ. This is
the case with Aμ- ðx1 Þ and Aν- ðx2 Þ as well. Such a situation frequently occurs in other
kinds of field interactions and their calculations more generally. Normally, we
23.8 Example: Compton Scattering [1] 1115

distinguish the state of the quantum field by momentum and helicity with electron
and momentum and polarization with photon.
The related Feynman diagrams and calculations pertinent to the Compton scat-
tering are given and discussed accordingly (vide infra).

23.8.2 Feynman Amplitude and Feynman Diagrams


of Compton Scattering

ð2Þ
The S-matrix SB of (23.152) represents interaction processes other than the Comp-
ton scattering shown in (23.159) as well. These are described as follows:

γ þ eþ ⟶ γ þ eþ , ð23:164Þ
- þ
e þ e ⟶ γ þ γ, ð23:165Þ
γ þ γ ⟶ e- þ eþ , ð23:166Þ

ð2Þ
where e+ stands for a positron. In the SB types of processes, we are given four
particles, i.e.,

e - , eþ , γ, γ:

Using the above four particles, we have four different ways of choosing two of
them. That is, we have

γ, e - , γ, eþ , e - , eþ , γ, γ:

The above four combinations are the same as the remainder combinations left
after the choice. The elementary processes represented by (23.164), (23.165), and
(23.166) are called the Compton scattering by positrons, electron-positron pair
annihilation, and electron-positron pair creation, respectively. To distinguish
those processes unambiguously, the interaction Hamiltonian (23.160) including
both the positive frequency-part and negative-frequency part must be written down
in full. In the case of the electron-positron pair creation of (23.166), for instance, the
second-order S-matrix element is described as [1]

S ð 2 Þ = - e2 d4 x1 d4 x2 ψ - ðx1 Þγ μ SF ðx1 - x2 Þγ ν ψ - ðx2 ÞAþ þ


μ ðx1 ÞAν ðx2 Þ: ð23:167Þ

In (23.167), field operators are expressed in the normal order and both the
annihilation operators (corresponding to the positive frequency) and creation oper-
ators (corresponding to the negative frequency) are easily distinguished.
1116 23 Interaction Between Fields

We have already shown the calculation procedure of hf| S(1)| ii. Even though it did
not describe a real physical process, it helped understand how the calculations are
performed on the interacting fields. The calculation processes of hf| S(2)| ii are
basically the same as those described above. In the present case, instead of
(23.134) we assume

j ii = a{ ðk, r Þb{ ðp, hÞj0i and j f i = a{ ðk0 , r 0 Þb{ ðp0 , h0 Þj0 :

Then, using e.g., (23.161), we have

ð2Þ
f jSComp ji = hf jSa jii þ hf jSb jii

with

hf jSa jii = h0 j bðp0 , h0 Þaðk0 , r 0 ÞSa a{ ðk, r Þb{ ðp, hÞj0i,


hf jSb jii = h0 j bðp0 , h0 Þaðk0 , r 0 ÞSb a{ ðk, r Þb{ ðp, hÞj0i: ð23:168Þ

Substituting (23.162) or (23.163) for Sa or Sb in (23.168), respectively, we carry


out the calculations as in Sect. 23.7.1. The calculation procedures are summarized as
follows: (i) Expand the field operators into the positive- or negative-frequency parts
in the three-dimensional momentum space as in Sects. 22.4 and 22.5. (ii) Use the
(anti)commutation relation regarding the creation or annihilation operators of photon
and electron as in (23.141)-(23.143). (iii) Perform the integration subsequently with
respect to the momentum space and further perform the integration with respect to x1
and x2 of SF(x1 - x2) that produces δ functions of four-momenta. Then, (iv) further
integration of the resulting SF(x1 - x2) in terms of the four-momenta converts it into
the Fourier counterpart of S~F ðpÞ denoted by (22.276). Thus, the complicated
integration can be reduced to simple algebraic computation.
The above summary would rather hint to us an overwhelming number of com-
putations. As the proverb goes, however, “an attempt is sometimes easier than
expected.” We show the major parts of the calculations for hf| Sa| ii as below. Similar
techniques can be applied to the computation of hf| Sb| ii.
The S-matrix element is described as

ζ r ζ r0
hf jSa jii = ð- ieÞ2
ð2π Þ 6
2p00 2k00 ð2p0 Þð2k 0 Þ


× 0j d4 x1 d4 x2 uðp0 , h0 Þεν ðk0 , r 0 Þ γ ν SF ðx1 - x2 Þεμ ðk, r Þγ μ uðp, hÞ j0 ,

ð23:169Þ

where ζ r and ζ r0 come from (22.344). In (23.169) SF(x1 - x2) is expressed as


23.8 Example: Compton Scattering [1] 1117

- ix ðq - p 0
- k 0 Þ ix2 ðq - p - kÞ
- id 4 q e 1 e qμ γ μ þ m
SF ð x 1 - x 2 Þ = :
ð2π Þ4 m2 - q2 - iE

Performing the integration of SF(x1 - x2) in (23.169) with respect to x1 and x2, we
obtain an expression including δ functions of four-momenta such that

ζ r ζ r0 h0j0i
hf jSa jii = ð- ieÞ2 ×
ð2π Þ6 2p00 2k 00 ð2p0 Þð2k0 Þ

0 0 4 μ
-id 4 q δ ðq - p -k Þδ ðq- p - k Þ qμ γ þ m
4

ð2π Þ8 uðp0 , h0 Þεν ðk0 ,r 0 Þ γ ν εμ ðk, rÞγ μ uðp,hÞ
ð2π Þ4 m2 - q2 -iE

ζ r ζ r0 δ4 ðp þ k - p0 - k 0 Þ
= ð- ieÞ2 ð2π Þ4
ð2π Þ2 2p00 2k00 ð2p0 Þð2k0 Þ

 ð- iÞ pμ þ kμ γ μ þ m
× uðp0 , h0 Þεν ðk0 , r 0 Þ γ ν εμ ðk, r Þγ μ uðp, hÞ
m 2 - ðp þ k Þ2

ð2π Þ4 δ4 ðp þ k - p0 - k 0 Þ
=
ð2π Þ6 2p00 2k00 ð2p0 Þð2k0 Þ

× - e2 ζ r ζ r0 uðp0 , h0 Þεν ðk0 , r 0 Þ γ ν S~F ðp þ kÞεμ ðk, r Þγ μ uðp, hÞ : ð23:170Þ

In (23.170), we used

δ4 ðq - p0 - k0 Þδ4 ðq - p - kÞ = δ4 ðp þ k - p0 - k 0 Þδ4 ðq - p - k Þ: ð23:171Þ

Since the first factor of (23.171) does not contain the variable q, it can be taken
out from the integrand. The integration of the integrand containing the second factor
produces S~F ðp þ kÞ in (23.170). In (23.170), S~F ðpÞ was defined in Sect. 22.4.4 as

- i pμ γ μ þ m
S~F ðpÞ = : ð22:276Þ
m2 - p2 - iE

Namely, the function S~F ðpÞ is a momentum component (or Fourier component) of
SF(x) defined in (22.275). If in (23.170) we think of only the transverse modes for the
photon field, the indices r and r′ of ζ r and ζ r0 run over 1 and 2 [1]. In that case, we
have ζ 1 = ζ 2 = 1, and so we may neglect the factor ζ r ζ r0 ð= 1Þ accordingly.
On this condition, we define a complex number M a as
1118 23 Interaction Between Fields

Fig. 23.5 Feynman


diagrams representing the
(a)
second-order S-matrix of the
Compton scattering. (a)
Diagram of Sa. (b) Diagram
of Sb

= +

(b)

= −


M a  - e2 uðp0 , h0 Þεν ðk0 , r 0 Þ γ ν S~F ðp þ kÞεμ ðk, r Þγ μ uðp, hÞ: ð23:172Þ

The complex number M a is called the Feynman amplitude (or invariant scattering
amplitude) associated with the Feynman diagram of Sa (see Fig. 23.5a).
To compute S~F ðp þ k Þ, we have

ð- iÞ pμ γ μ þ m ð- iÞ pμ γ μ þ m
S~F ðp þ kÞ = =
2
ðp þ k Þ - m2 þ iE p2 þ 2pk þ k2 - m2 þ iE
ð- iÞ pμ γ μ þ m ð- iÞ pμ γ μ þ m
= = , ð23:173Þ
2pk þ iE 2pk

where in the denominator we used p2 - m2 = k2 = 0. In (23.173), we dealt with the


“square” ( p + k)2 as if both p and k were ordinary numbers. Here, it means
23.8 Example: Compton Scattering [1] 1119

ð p þ k Þ 2  pμ þ k μ ð p μ þ k μ Þ = p μ p μ þ p μ k μ þ k μ p μ þ k μ k μ

= pμ pμ þ 2pμ k μ þ k μ kμ = p2 þ 2pk þ k2 ,

where we define pk  pμkμ. Since it is a convenient notation, it will be used


henceforth. Since we had pk ≠ 0 in the present case, the denominator was not an
infinitesimal quantity and, hence, we deleted iE as redundant. Thus, we get

1
hf jSa jii = ð2π Þ4 δ4 ðp0 þ k 0 - p - k Þ M a : ð23:174Þ
ð2π Þ 6
2p00 2k00 ð2p0 Þð2k 0 Þ

In a similar manner, with hf| Sb| ii, we have

1
hf jSb jii = ð2π Þ4 δ4 ðp0 þ k0 - p - k Þ
ð2π Þ6 2p00 2k00 ð2p0 Þð2k0 Þ

× - e2 ζ r ζ r0 uðp0 , h0 Þεν ðk, r Þγ ν S~F ðp - k0 Þεμ ðk0 , r 0 Þ γ μ uðp, hÞ : ð23:175Þ

Also, defining a complex number M b as


M b  - e2 uðp0 , h0 Þεν ðk, r Þγ ν S~F ðp - k 0 Þεμ ðk0 , r 0 Þ γ μ uðp, hÞ, ð23:176Þ

we obtain

1
hf jSb jii = ð2π Þ4 δ4 ðp0 þ k 0 - p - k Þ M b : ð23:177Þ
ð2π Þ 6
2p00 2k00 ð2p0 Þð2k 0 Þ

In (23.174) and (23.177), the presence of δ4( p′ + k′ - p - k) reflects the four-


momentum conservation before and after the Compton scattering process. Fig-
ure 23.5 represents the Feynman diagrams that correspond to Sa (Fig. 23.5a) and
Sb (Fig. 23.5b) related to the Compton scattering. Both the diagrams consist of two
building blocks depicted in Fig. 23.2. Even though Fig. 23.5a, b shares a topological
feature the same as that of Fig. 23.3a, the connection of the two building blocks has
been reversed with Sa and Sb. From the discussions of the present and previous
sections, the implications of these Feynman diagrams (Fig. 23.5a, b) are evident.
Equations (23.174) and (23.177) represent a general feature of the field interac-
tions. The factor (2π)4 present in both the numerator and denominator could be
reduced, but we leave it without change. It is because (23.174) and (23.177) can
readily be generalized in the case where initial two particles and N final particles take
part in the overall reaction.
In the general two-particle collision, we assume that the reaction is represented by
1120 23 Interaction Between Fields

P1 þ P2 ⟶ P0 1 þ P0 2 þ ⋯ þ P0 N, ð23:178Þ

where P1 and P2 denote the initial particles and P′1, P′2, ⋯P′N represent the final
particles. Then, in accordance with (23.174) and (23.177), the S-matrix element is
expressed as

hf jSjii
N 2 1
= ð2π Þ4 δ4 p0
f =1 f
- p
i=1 i
ð2π Þ3 ð2E 1 Þð2π Þ3 ð2E 2 Þ

N 1
 f =1
M: ð23:179Þ
ð2π Þ 2E 0f
3

In (23.179), pi = Ei
pi
ði = 1, 2Þ denotes four-momenta of the initially colliding
E 0f
two particles and p0f = p0f
ðf = 1, ⋯, N Þ represents those of the finally created
N particles. Also, in (23.179) we used S and M instead of Sa, M a , etc., since a general
case is intended. Notice that with the Compton scattering N = 2.

23.8.3 Scattering Cross-Section [1]

To connect the general formalism to experimental observations, we wish to relate the


S-matrix elements to the transition probability that measures the strength of the
interaction between the fields. The relevant discussion is first based on the general
expression (23.179), and then, we focus our attention to the Compton scattering.
Let w be the transition probability per unit time and unit volume. This is given by

w = jhf jSjiij2 =TV, ð23:180Þ

where T is a time during which the scattering experiment is being done; V is a


volume related to the experiment. Usually, TV is taken as the whole space-time,
because an elementary particle reaction takes place in an extremely narrow space-
time domain.
Conservation of the energy and momentum with respect to (23.179) is read as

N
E1 þ E2 = E0 ,
k=1 k
ð23:181Þ
23.8 Example: Compton Scattering [1] 1121

N
p1 þ p2 = p0 :
k=1 k
ð23:182Þ

From the above equations, we know that all E 01 , ⋯, E 0N are not independent
valuables. All p01 , ⋯, p0N are not independent valuables, either.
The cross-section σ for the reaction or scattering is described by

σ = ð2π Þ6 w=vrel , ð23:183Þ

where vrel represents the relative velocity between two colliding particles P1 and P2.
In (23.183), vrel is defined by [1]

vrel  ðp1 p2 Þ2 - ðm1 m2 Þ2 =E 1 E 2 , ð23:184Þ

where m1 and m2 are the rest masses of the colliding particles 1 and 2, respectively.
When we have a colinear collision of the two particles, (23.184) is reduced to a
simple expression. Suppose that at a certain inertial frame of reference Particle 1 and
Particle 2 have a velocity v1 and v2 = av1 (where a is a real number), respectively.
According to (1.9), we define γ 1 and γ 2 such that

γ i  1= 1 - jvi j2 ði = 1, 2Þ: ð23:185Þ

Then, from (1.21), we have

E i = mi γ i and pi = mi γ i vi ði = 1, 2Þ: ð23:186Þ

Also, we obtain

ðp1 p2 Þ2 - ðm1 m2 Þ2 = ðm1 m2 γ 1 γ 2 - p1  p2 Þ2 - ðm1 m2 Þ2


4 1
= ðm1 m2 γ 1 γ 2 Þ2 1 - 2a v1 j2 þ a2 v1 -
γ12γ22

= ðm1 m2 γ 1 γ 2 Þ2 jv1 j2 ða - 1Þ2 : ð23:187Þ

With the above derivation, we used (23.185) and (23.186) along with v2 = av1
(colinear collision). Then, we get

vrel = j v1 ða - 1Þ j = j v2 - v1 j : ð23:188Þ

Thus, in the colinear collision case, we have had a relative velocity identical to
that in a non-relativistic limit. In a particular instance of a = 0 (to which Particle 2 is
at rest), vrel = j v1j. This is usually the case with a laboratory frame where Particle
1122 23 Interaction Between Fields

1 is caused to collide against Particle 2 at rest. If Particle 1 is photon, in a laboratory


frame we have a particularly simple expression of

vrel = 1: ð23:189Þ

Substituting (23.180) and (23.184) for (23.183), we obtain

ð2π Þ6 E1 E2
σ= 
TV
ð p1 p2 Þ 2 - ð m 1 m 2 Þ 2
1 N 2 2
 6
ð2π Þ4 δ4 p0
f =1 f
- p
i=1 i
ð2π Þ  4E1 E2
1
× jM j2 , ð23:190Þ
f ð2π Þ 2E 0f
3

where (2π)6 of the numerator in the first factor is related to the normalization of
incident particles 1 and 2 (in the initial state) [8]. In (23.190), we have

N 2 2 N 2
ð2π Þ4 δ4 p0
f =1 f
- p
i=1 i
= ð2π Þ4 δ4 ð0Þ  ð2π Þ4 δ4 p0
f =1 f
- p
i=1 i
,

where we used a property of a δ function expressed as [9]

f ðxÞδðxÞ = f ð0ÞδðxÞ: ð23:191Þ

But, in analogy with (22.57), we obtain


1 1 1
1 1
δ ð 0Þ = ei0x dx = dx or ð2π Þ4 δ4 ð0Þ = d4 x ≈ TV, ð23:192Þ
2π -1 2π -1 -1

where (2π)4δ4(0) represents the volume of the whole space-time. Thus, we get

ð2π Þ4 N 2
σ= δ4 p0
f =1 f
- p
i=1 i
4 ðp1 p2 Þ2 - ðm1 m2 Þ2

N 1
 f =1
jM j2 : ð23:193Þ
ð2π Þ 2E 0f
3

The cross-section is usually described in terms of differential cross-section dσ


such that
23.8 Example: Compton Scattering [1] 1123

ð2π Þ4 N 2
dσ = δ4 p0
f =1 f
- p
i=1 i
4 ðp1 p2 Þ2 - ðm1 m2 Þ2

N dp0f
 f =1
jM j2 , ð23:194Þ
ð2π Þ3 2E0f

where dp0f represents an infinitesimal range of momentum from p0f to p0f þ dp0f for
which the particle P′f ( f = 1, ⋯, N ) is found. Then, σ can be expressed in a form of
integration of (23.194) as

ð2π Þ4 N 2
σ= δ4 p0
f =1 f
- p
i=1 i
4 ðp1 p2 Þ2 - ðm1 m2 Þ2

N dp0f
 f =1
jM j2 : ð23:195Þ
ð2π Þ 3
2E0f

Note that (23.195) is Lorentz invariant. In fact, if in (22.150) we put f(k0)  1 and
g(k)  1, we have

1
d4 kδ k2 - m2 θðk 0 Þ = d3 k :
2 k þ m2
2

As an equivalent form of (23.195), we obtain

ð2π Þ4 4 N 2 N dp0f
σ= δ p0 -
f =1 f i=1
p f =1
jM j2 : ð23:196Þ
4vrel E 1 E 2 ð2π Þ3 2E 0f

In particular, if N = 2 (to which the Compton scattering is relevant), we obtain

1 2 2
dσ = δ4 p0 - p jM j2 d3 p01 d3 p02 : ð23:197Þ
64π 2 vrel E 1 E 2 E 01 E 02 f =1 f i=1 i

Now, we wish to perform practical calculations of the Compton scattering after


Mandl and Shaw [1]. Because of the momentum conservation described by

p02 = p1 þ p2 - p01 , ð23:198Þ

we regard only p01 (of the scattered photon) as an independent variable out of p01 and
p02 . The choice is at will, but it is both practical and convenient to choose the
momentum of the particle in question for the independent variable. To express the
1124 23 Interaction Between Fields

differential cross-section dσ as a function of the scattering angle of the particle, we


convert the Cartesian coordinate representation into the polar coordinate represen-
tation with respect to the momentum space p01 in (23.197). Then, we integrate
(23.197) with respect to p02 to get

2 2 1
dσ = δ4 E0 - E jM j2 p01 d p01 dΩ0 , ð23:199Þ
2
f =1 f i=1 i 64π 2 E 1 E 2 E 01 E 02 vrel

where dΩ′ denotes a solid angle element, to which the particle P′1 is scattered into.
Notice that the p02 integration converted δ4 2 0
f = 1 pf -
2
i = 1 pi into
2 0 2
δ4 f = 1 Ef - i = 1 Ei .
Further integration of dσ over j p01 j produces [1]

-1
1 ∂ E 01 þ E02
p01 jM j2 p01 dΩ0 , ð23:200Þ
2
dσd = d~
σ=
64π E 1 E 2 E 01 E 02 vrel
2
∂ p01

where d~ σ means the differential cross-section obtained after the integration of dσ


over j p01 j. To derive (23.200), we used the following formula of δ function [1]

∂x
f ðx, yÞδ½gðx, yÞdx = f ðx, yÞδ½gðx, yÞ dg
∂g y

-1
∂g
= f ðx, yÞ : ð23:201Þ
∂x y
g=0

We make use of (23.200) in Sect. 23.8.6.

23.8.4 Spin and Photon Polarization Sums

We often encounter a situation where a Feynman amplitude M takes a form [1]

M = cuðp0 , h0 ÞΓuðp, hÞ, ð23:202Þ

where c is a complex number and Γ is usually a (4, 4) matrix (tensor) that contains
gamma matrices. The amplitude M is a complex number as well and we are
interested in calculating jM j2 (a real number).
Hence, we have
23.8 Example: Compton Scattering [1] 1125


jM j2 = jcj2 ½uðp0 , h0 ÞΓuðp, hÞ½uðp0 , h0 ÞΓuðp, hÞ : ð23:203Þ

Viewing uðp0 , h0 ÞΓuðp, hÞ as a (1, 1) matrix, we have

 {
½uðp0 , h0 ÞΓuðp, hÞ = ½uðp0 , h0 ÞΓuðp, hÞ : ð23:204Þ

Then, we obtain

jM j2 = jcj2 ½uðp0 , h0 ÞΓuðp, hÞ u{ ðp, hÞΓ{ u{ ðp0 , h0 Þ

= jcj2 ½uðp0 , h0 ÞΓuðp, hÞ u{ ðp, hÞγ 0 γ 0 Γ{ γ 0 γ 0 u{ ðp0 , h0 Þ

= jcj2 ½uðp0 , h0 ÞΓuðp, hÞ uðp, hÞ γ 0 Γ{ γ 0 γ 0 γ 0 uðp0 , h0 Þ

= jcj2 ½uðp0 , h0 ÞΓuðp, hÞuðp, hÞ γ 0 Γ{ γ 0 uðp0 , h0 Þ, ð23:205Þ

where with the second equality we used γ 0γ 0 = E; with the third equality, we used
~ as
u{ = γ 0 u. Defining Γ

~  γ 0 Γ{ γ 0 ,
Γ ð23:206Þ

we get

~ ðp0 , h0 Þ:
jM j2 = jcj2 uðp0 , h0 ÞΓuðp, hÞuðp, hÞΓu ð23:207Þ

In (23.207), u(p, h), u(p′, h′), etc. are (4, 1) matrices; Γ and Γ ~ as well as
~ 0 0
uðp, hÞuðp, hÞ are (4, 4) matrices. Then, Γuðp, hÞuðp, hÞΓuðp , h Þ is a (4, 1) matrix.
Now, we recall (18.70) and (18.71); namely,

TrðPQÞ = TrðQPÞ, ð18:70Þ


ðPQÞii = Pij Qji = Qji Pij = ðQPÞjj : ð18:71Þ
i i j j i j

Having a look at (18.71), we find (18.70) can be extended to the case where
neither P nor Q is a square matrix but PQ is a square matrix. Thus, (23.207) can be
rewritten as

~ ðp0 , h0 Þuðp0 , h0 Þ ,
jM j2 = jcj2  Tr Γuðp, hÞuðp, hÞΓu ð23:208Þ

where Γuðp, hÞuðp, hÞΓu ~ ðp0 , h0 Þuðp0 , h0 Þ is a (4, 4) matrix. Note, however, that
taking traces does not mean sum or average of the spin and photon polarization. It is
because in (23.208) the helicity indices h or h′ for electron remain without being
1126 23 Interaction Between Fields

summed. The photon polarization indices [hidden in Γ and Γ ~ for (23.208)] have not
yet been summed either.
The polarization sum and average are completed by taking summation over all the
pertinent indices. This will be done in detail in the next section.

23.8.5 Detailed Calculation Procedures of Feynman


Amplitude [1]

Now that we have gotten used to the generalities of the scattering theory, we are in
the position to carry out the detailed calculations of the Feynman amplitude. In the
case of the Compton scattering, the Feynman amplitude M a has been given as


M a  - e2 uðp0 , h0 Þεν ðk0 , r 0 Þ γ ν S~F ðp þ k Þεμ ðk, r Þγ μ uðp, hÞ, ð23:172Þ

M b  - e2 uðp0 , h0 Þεν ðk, r Þγ ν S~F ðp - k 0 Þεμ ðk0 , r 0 Þ γ μ uðp, hÞ: ð23:176Þ

Comparing (23.172) and (23.176) with (23.202), we find that c in (23.202) is


given by

c = - e2 ζ r ζ r 0 : ð23:209Þ

Note that ζ r ζ r0 = 1 ðr, r 0 = 1, 2Þ. The total Feynman amplitude M is given by

M = M a þ M b: ð23:210Þ

Then, we have

jMj2 = jM a j2 þ jM b j2 þ M a M b  þ M a  M b : ð23:211Þ

From here on, we assume that electrons and/or photons are unpolarized in both
their initial and final states. In terms of the experimental techniques, it is more
difficult to detect or determine the polarization of helicity of electrons than the
polarization of photons. To estimate the polarization sum with the photon field, we
need another technique in relation to its gauge transformation (vide infra).
(i) In the case of the electrons, we first average over initial spins (or helicities) h and
sum over final helicities h′. These procedures yield a positive real number X ~ that
is described by [1]
23.8 Example: Compton Scattering [1] 1127

~= 1
X
2 2
jMj2 : ð23:212Þ
2 h=1 h0 = 1

(ii) As for the photons, we similarly average over initial polarizations r and sum
over final polarizations r′. These procedures yield another positive real number
X defined as [1]

1 2 2
~= 1 2 2 2 2
X r=1 r0 = 1
X r=1 r0 = 1 h=1 h0 = 1
jMj2 , ð23:213Þ
2 4

where jMj2 was given in (23.211). We further define X1 to X4 as

1 2 2 2 2
X1  r=1 r0 = 1 h=1 h0 = 1
jM a j2 , ð23:214Þ
4
1 2 2 2 2
X2  r=1 r0 = 1 h=1 h0 = 1
jM b j2 , ð23:215Þ
4
1 2 2 2 2
X3  r=1 r0 = 1 h=1 h0 = 1
M aM b, ð23:216Þ
4
1 2 2 2 2
X4  r=1 r0 = 1 h=1 h0 = 1
M aM b: ð23:217Þ
4

As a number X we want to compute, we have

X = X1 þ X2 þ X3 þ X4 : ð23:218Þ

We calculate four terms of (23.218) one by one. To compute jM a j2 and jM b j2 , we


follow (23.214) and (23.215), respectively. We think of M a M b  and M a  M b
afterward. First, with jM a j2 we have

εν ðk0 , r 0 Þ γ ν pμ þ k μ γ μ þ m εμ ðk, r Þγ μ
Γ= : ð23:219Þ
2pk

Then, from (23.206), we obtain

ðγ 0 γ μ γ 0 Þεμ ðk, r Þ γ 0 pμ þ kμ γ μ þ m γ 0 ðγ 0 γ ν γ 0 Þεν ðk0 , r 0 Þ 0


~ = γ0
Γ γ
2pk
γ μ εμ ðk, r Þ pμ þ kμ γ μ þ m γ ν εν ðk0 , r 0 Þ
= : ð23:220Þ
2pk

Thereby, we get
1128 23 Interaction Between Fields

e4 2 2 2 2
~ ðp0 , h0 Þuðp0 , h0 Þ
X1 = r=1 r0 = 1 h=1 h0 = 1
Tr Γuðp, hÞuðp, hÞΓu
4

e4 εν ðk0 , r 0 Þ γ ν pμ þ k μ γ μ þ m ελ ðk, r Þγ λ
= r,r 0 ,h,h0
Tr uðp, hÞuðp, hÞ
4 2pk

γ ρ ερ ðk, r Þ pμ þ k μ γ μ þ m εσ ðk0 , r 0 Þγ σ
× uðp0 , h0 Þuðp0 , h0 Þ ½ {
2pk

e4 εν ðk0 , r 0 Þ γ ν pμ þ kμ γ μ þ m ελ ðk, r Þγ λ
= r,r 0
Tr ð pα γ α þ m Þ
4 2pk

γ ρ ερ ðk, r Þ pμ þ kμ γ μ þ m εσ ðk0 , r 0 Þγ σ 0 β
× pβ γ þ m
2pk

e4 εν ðk0 , r 0 Þ εσ ðk0 , r 0 Þγ ν pμ þ kμ γ μ þ m ερ ðk, r Þ ελ ðk, r Þγ λ
= r,r0
Tr
4 2pk

ðpα γ α þ mÞγ ρ pμ þ kμ γ μ þ m γ σ p0β γ β þ m


× ½{{
2pk

e4 ð- ηνσ Þγ ν pμ þ kμ γ μ þ m - ηρλ γ λ
= Tr
4 2pk

ðpα γ α þ mÞγ ρ pμ þ kμ γ μ þ m γ σ p0β γ β þ m


× ½{{{
2pk

e4 ð- γ σ Þ pμ þ k μ γ μ þ m - γ ρ ðpα γ α þ mÞγ ρ pμ þ k μ γ μ þ m γ σ p0β γ β þ m


= Tr
4 ð2pkÞ2

e4 γ σ pμ þ kμ γ μ þ m γ ρ ðpα γ α þ mÞγ ρ pμ þ kμ γ μ þ m γ σ p0β γ β þ m


= Tr :
4 ð2pk Þ2
ð23:221Þ

At [{] of (23.221), we used

uðp, hÞuðp, hÞ = pμ γ μ þ m and h0


uðp0 , h0 Þuðp0 , h0 Þ = p0β γ β þ m: ð21:144Þ
h

At [{{], we used the fact that εσ (k′, r′) and ερ ðk, r Þ can freely be moved in
(23.221) because they are not an operator, but merely a complex number. At
23.8 Example: Compton Scattering [1] 1129

[{ { {], see discussion made just below based on the fact that the Feynman amplitude
M must have the gauge invariance.
Suppose that the vector potential Aμ(x) takes a form [1]

Aμ ðxÞ = αεμ ðk, r Þe ± ikx ,

where α is a constant.
Also, suppose that in (22.293) we have [1]

~ ðk Þe ± ikx :
Λ ð xÞ = α Λ

Then, from (22.293), Aμ(x) should be transformed such that

~ ðk Þe ± ikx
A0μ ðxÞ = Aμ ðxÞ þ ∂μ ΛðxÞ = Aμ ðxÞ ± ik μ Λ
~ ðk Þe ± ikx = α εμ ðk, r Þ ± ik μ Λ
= αεμ ðk, r Þe ± ikx ± ik μ Λ ~ ðkÞ=α e ± ikx : ð23:222Þ

Therefore, corresponding to the gauge transformation of Aμ(x), εμ(k, r) is


transformed such that [1]

~ ðkÞ=α:
ε0μ ðk, r Þ = εμ ðk, r Þ ± ik μ Λ ð23:223Þ

Taking complex conjugate of (23.223), we have

~ ðkÞ =α :
ε0μ ðk, r Þ = εμ ðk, r Þ ∓ ik μ Λ ð23:224Þ

Using (23.223) and (23.224), we get

2
ε0 ðk, r Þ εν 0 ðk, r Þ
r=1 μ

=
2
~ ðk Þ =α εν ðk, r Þ ± ik ν Λ
εμ ðk, r Þ ∓ ik μ Λ ~ ðkÞ=α
r=1
2 
= ε ðk, r Þ
r=1 μ
εν ðk, r Þ

þ
2
~ ðkÞ=α ∓ ik μ εν ðk, r ÞΛ
± ik ν εμ ðk, r Þ Λ ~ ðkÞ=α
~ ðkÞ =α þ k μ kν Λ 2
:
r=1

ð23:225Þ

Namely, the gauge transformation of the photon field produces the additional
second term in (23.225). Let the (partial) Feynman amplitude X1 including the factor
2  0
r = 1 εμ ′ ðk, r Þ εν ðk, r Þ be described by
1130 23 Interaction Between Fields

e4 μν
M~
2 2
X1 = Tr r0 = 1
ε0 ðk, r Þ ε0ν ðk, r Þ
r=1 μ
4
e4 μν
þ Ξ k μ , kν M~
2 2
= Tr r0 = 1
ε ðk, r Þ εν ðk, r Þ
r=1 μ
,
4

where Ξ(kμ, kν) represents the second term of (23.225). That is, we have

Ξ kμ , kν


2
~ ðk Þ=α ∓ ik μ εν ðk, r ÞΛ
± ik ν εμ ðk, r Þ Λ ~ ðkÞ=α
~ ðk Þ =α þ kμ kν Λ 2
:
r=1

ð23:226Þ
μν
The factor M~ represents a remaining part of X1 . In the present case, the
μν
indices μ and ν of M ~ indicate those of the gamma matrices γ μ and γ ν. Since X1 is
μν
gauge invariant, Ξ kμ , k ν M~ must vanish. As Ξ(kμ, kν) is linear with respect to
both kμ and kν, it follows that
μν μν
k μ M~ = k ν M~ = 0: ð23:227Þ

Meanwhile, we had

2  k μ kν - ðknÞ k μ nν þ nμ k ν
ε ðk, r Þ
r=1 μ
εν ðk, r Þ = - ημν - : ð23:228Þ
ðknÞ2

Note that as pointed out previously, (22.377) holds with the real photon field (i.e.,
an external photon) as in the present case. Then, we obtain

e4 kμ kν - ðknÞ kμ nν þ nμ kν μν
M~
2
X1 = Tr r0 = 1
- ημν - 2
: ð23:229Þ
4 ðknÞ

But, from the reason mentioned just above, we have

kμ k ν - ðknÞ kμ nν þ nμ kν μν
M~ = 0, ð23:230Þ
ðknÞ2
μν
with only the first term - ημν M~ contributing to X1 . With regard to another
2  0 0 0 0
related factor 0 ε
r =1 α ð k , r Þε β ð k , r Þ of (23.221), the above argument holds
as well.
Thus, the calculation of X1 comes down to the trace computation of (23.221). The
remaining calculations are straightforward and follow the literature method [1]. We
first calculate the numerator of (23.221). To this end, we define a (4, 4) matrix
Y included as a numerator of (23.221) as [1]
23.8 Example: Compton Scattering [1] 1131

Y  γ σ pμ þ k μ γ μ þ m γ ρ ðpα γ α þ mÞγ ρ ðpλ þ k λ Þγ λ þ m γ σ

# use γ ρ( pαγ α)γ ρ = - 2( pαγ α) and γ ργ ρ = 4:

= γ σ pμ þ kμ γ μ þ m ð- 2pα γ α þ 4mÞ ðpλ þ kλ Þγ λ þ m γ σ

= γ σ - 2 pμ þ k μ γ μ ðpα γ α Þðpλ þ k λ Þγ λ - 2m pμ þ kμ γ μ ðpα γ α Þ

þ 4m2 pμ þ kμ γ μ - 2mðpα γ α Þðpλ þ kλ Þγ λ


2
þ4m pμ þ kμ

- 2m2 ðpα γ α Þ þ 4m2 ðpλ þ kλ Þγ λ þ 4m3 γ σ


= 4ðpλ þ kλ Þγ λ ðpα γ α Þ pμ þ k μ γ μ - 16m pμ þ kμ pμ

- 16m2 pμ þ kμ γ μ þ 4m2 ðpα γ α Þ þ 16m3 :


2
þ16m pμ þ kμ ð23:231Þ

With the last equality of (23.231), we used the following formulae [1]:

γ λ γ λ = 4E 4 , ð21:185Þ
σ λ σ σ σ λ σ
γ λ ðAσ γ Þγ = - 2ðAσ γ Þ, γ λ ðAσ γ ÞðBσ γ Þγ = 4Aσ B , ð21:187Þ
γ λ ðAσ γ σ ÞðBσ γ σ ÞðCσ γ σ Þγ λ = - 2ðC σ γ σ ÞðBσ γ σ ÞðAσ γ σ Þ: ð21:188Þ

In (21.187) and (21.188), Aα, Bα, and Cα are arbitrary four-vectors.


The spin and polarization average and sum have already been taken in the process
of deriving (23.218), an example of which is given as (23.221). Hence, what remains
to be done is only the computation of the trace. To this end, we define another real
number Y as [1]

Y  TrY p0β γ β þ m : ð23:232Þ

Then, from (23.221), we have

X1 = e4 Y=16ðpk Þ2 : ð23:233Þ

With Y, we obtain
ρ ρ
Y = 16 2 pμ þ kμ pμ pρ þ k ρ p0 - pμ þ kμ ðpμ þ kμ Þpρ p0

þm2 - 4pμ ðpμ þ kμ Þ þ 4 pμ þ kμ ðpμ þ kμ Þ

þm2 pρ p0ρ - 4 pμ þ kμ p0μ þ 4m4 g: ð23:234Þ

In deriving (23.234), we used the following formulae [1]:


1132 23 Interaction Between Fields

Tr½ðAσ γ σ ÞðBσ γ σ Þ = 4Aσ Bσ , ð21:189Þ


σ σ σ σ
Tr½ðAσ γ ÞðBσ γ ÞðC σ γ ÞðDσ γ Þ
= 4½ðAσ Bσ ÞðC σ Dσ Þ - ðAσ C σ ÞðBσ Dσ Þ þ ðAσ Dσ ÞðBσ Cσ Þ: ð21:190Þ

Also, we used the fact that the trace of an odd number product of gamma matrices
vanishes [1]. Calculations of (23.234) are straightforward using three independent
scalars, i.e., m2, pk, and pk′; here notice that pk, pk′, etc. are the abbreviations of

pk  pμ kμ , pk 0  pμ k 0μ , etc: ð23:235Þ

We note, for instance,

ðp þ kÞp = p2 þ kp = m2 þ kp: ð23:236Þ

The conservation of four-momentum is expressed as

p þ k = p0 þ k 0 : ð23:237Þ

Taking the square of both sides of (23.237), we have

p2 þ 2kp þ k2 = p02 þ 2k0 p0 þ k 02 : ð23:238Þ

Using p2 = p′2 = m2 and k2 = k′2 = 0, we get

kp = k 0 p0 : ð23:239Þ

From (23.237), we obtain another relation

pk 0 = p0 k: ð23:240Þ

Rewriting (23.237), in turn, we obtain a relationship

p - p0 = k0 - k: ð23:241Þ

Taking the square of both sides of (23.241) again and using k′k = pk - pk′, we get

pp0 = m2 þ pk - pk 0 : ð23:242Þ

Using the above relationship among the scalars, we finally get [1]

Y = 32 m4 þ m2 ðpk Þ þ ðpk Þðpk 0 Þ : ð23:243Þ

Substituting (23.243) for (23.233), we obtain


23.8 Example: Compton Scattering [1] 1133

2e4 ½m4 þ m2 ðpk Þ þ ðpk Þðpk 0 Þ


X1 = : ð23:244Þ
ðpk Þ2

Next, we compute X2 of (23.215). From (23.172), with the c-numbers of the


photon field, we have

 
jM a j2 / εν ðk0 , r 0 Þ εν ðk0 , r 0 ÞS~F ðp þ k ÞS~F ðp þ kÞεμ ðk, r Þεμ ðk, r Þ :

Since X1 is obtained as a result of averaging the indices r and r′, X1 is unaltered


by exchanging the indices r and r′. Hence, we obtain

  
X1 / εν ðk0 , r Þ εν ðk0 , r ÞS~F ðp þ k ÞS~F ðp þ kÞεμ ðk, r 0 Þεμ ðk, r 0 Þ
  
= εν ð- k0 , r Þ εν ð- k0 , r ÞS~F ðp þ k ÞS~F ðp þ kÞεμ ð- k, r 0 Þεμ ð- k, r 0 Þ :

Since in the above equation we are thinking of the transverse modes with the
polarization, those modes are held unchanged with respect to the exchange of
k $ - k and k′ $ - k′ as shown in the above equation. Meanwhile, from
(23.176), we have

X 2 / jM b j2
 
/ εν ðk, r Þεν ðk, r Þ S~F ðp - k0 ÞS~F ðp - k 0 Þεμ ðk0 , r 0 Þ εμ ðk0 , r 0 Þ:

Comparing the above expressions of X1 and X2 , we notice that X2 can be


obtained merely by switching the four-momentum k $ - k′. Then, changing
k $ - k′ in (23.244), we immediately get

2e4 ½m4 - m2 ðpk 0 Þ þ ðpk Þðpk 0 Þ


X2 = : ð23:245Þ
ðpk 0 Þ
2

To calculate X3 , we need related but slightly different techniques from that used
for the calculation of X1 . The outline for this is as follows. As similarly to the case of
(23.221), we arrive at the following relation described by

X3

e4 γ σ pρ þ kρ γ ρ þ m γ μ ðpα γ α þ mÞγ σ pρ - k 0ρ γ ρ þ m γ μ p0β γ β þ m


= Tr :
4 ð2pk Þð- 2pk 0 Þ

ð23:246Þ

Decomposing the numerator of (23.246), we have non-vanishing eight terms that


comprise gamma matrices and four-vectors as well as scalars (i.e., an electron mass
1134 23 Interaction Between Fields

m). Of these terms, m is included as either a factor of m2 or m4. Classifying the eight
terms according to the power of m, we get

m4 : one term, m2 : six terms, m0 : one term: ð23:247Þ

We evaluate (23.247) termwise below.


(i) m4 : m4 γ σ γ μ γ σ γ μ

m4 γ σ γ μ γ σ γ μ = m4 ðγ σ γ μ γ σ Þγ μ = m4 ð- 2γ μ Þγ μ = - 2m4 γ μ γ μ = - 8m4 E 4 , ð23:248Þ

where with the second and last equalities, we used (21.185); E4 denotes the (4, 4)
identity matrix. Taking the trace of (23.248), we have

Tr - 8m4 E 4 = - 32m4 : ð23:249Þ

Notice that the trace of E4 is 4.


(ii) m2: Six terms, one of which is e.g., m2γ σ [( pρ + kρ)γ ρ]γ μ[pαγ α]γ σ γ μ

m2 γ σ pρ þ kρ γ ρ γ μ ½pα γ α γ σ γ μ = m2 γ σ pρ þ kρ γ ρ  pα γ μ γ α γ σ γ μ
= m2 γ σ pρ þ k ρ γ ρ  pα ð4ηασ Þ

= m2 γ σ pρ þ k ρ γ ρ  ð4pσ Þ = 4m2 γ σ pσ pρ þ k ρ γ ρ , ð23:250Þ

where with the second equality, we used the first equation of (21.186). Taking the
trace of (23.250), we obtain

Tr 4m2 γ σ pσ pρ þ kρ γ ρ = 16m2 pðp þ kÞ = 16m2 m2 þ ðpk Þ , ð23:251Þ

where with the first equality, we used (21.189). Other five terms of the second power
of m are calculated in a similar manner. Summing up the traces of these six matrices,
we get

96m4 þ 48m2 ðpk Þ - 48m2 ðpk 0 Þ: ð23:252Þ


23.8 Example: Compton Scattering [1] 1135

(iii)
m0 : γ σ pρ þ k ρ γ ρ γ μ ½pα γ α γ σ pλ - k0λ γ λ γ μ p0β γ β

γ σ pρ þ k ρ γ ρ γ μ ½pα γ α γ σ pλ - k0λ γ λ γ μ p0β γ β

= - 2½pα γ α γ μ pρ þ k ρ γ ρ  pλ - k 0λ γ λ γ μ p0β γ β

= - 2½ pα γ α   γ μ p ρ þ k ρ γ ρ pλ - k 0λ γ λ γ μ  p0β γ β

= - 2½pα γ α   4ðp þ kÞðp - k0 Þ  p0β γ β

= - 8ðp þ kÞðp - k 0 Þ½pα γ α  p0β γ β , ð23:253Þ

where with the first equality, we used (21.188) with Bσ  1 [or B  (1 1 1 1), where
B denotes a covariant four-vector]; with the second to the last equality, we used the
second equation of (21.187). Taking the trace of (23.253), we have

Tr - 8ðp þ kÞðp - k0 Þ½pα γ α  p0β γ β

= - 32ðp þ kÞðp - k0 Þpp0


= - 32m2 m2 þ ðpk Þ - ðpk 0 Þ , ð23:254Þ

where with the first equality, we used (21.189); the last equality resulted from
(23.237), (23.239), and (23.242). Summing (23.249), (23.252), and (23.254), finally
we get

e4 32m4 þ 16m2 ðpk Þ - 16m2 ðpk 0 Þ


X3 =  =
4 ð2pk Þð- 2pk0 Þ
e4 ½2m4 þ m2 ðpk Þ - m2 ðpk 0 Þ
- : ð23:255Þ
ðpk Þðpk 0 Þ

From (23.216) and (23.217), at once we see that X4 is obtained as a complex


conjugate of X3 . Also, we realize that (23.255) remains unchanged with respect to
the exchange of k with -k′. Since X3 is real, we obtain

e4 ½2m4 þ m2 ðpk Þ - m2 ðpk 0 Þ


X3 = X4 = - : ð23:256Þ
ðpk Þðpk 0 Þ

Summing up all the above results and substituting them for (23.218), we get X of
(23.213) as
1136 23 Interaction Between Fields

1 2 2 2 2
X = X1 þ X2 þ X3 þ X4 = r=1 r0 = 1 h=1 h0 = 1
j Mj 2
4
2e4 ½m4 þ m2 ðpk Þ þ ðpk Þðpk0 Þ 2e4 ½m4 - m2 ðpk 0 Þ þ ðpk Þðpk 0 Þ
= þ
ðpk Þ2 ðpk 0 Þ
2

e4 ½2m4 þ m2 ðpk Þ - m2 ðpk 0 Þ


-2×
ðpk Þðpk 0 Þ

pk 0 pk
2
1 1 1 1
= 2e4 m4 - þ 2m2 - þ þ : ð23:257Þ
pk pk 0 pk pk 0 pk pk0

Once again, notice that (23.257) is unchanged with the exchange between k and -
k′ .

23.8.6 Experimental Observations

Our next task is to connect (23.257) to the experimental results associated with the
scattering cross-section that has been expressed as (23.200). In the present situation,
we assume a scattering described by (23.178) as

P1 þ P2⟶P0 1 þ P0 2: ð23:258Þ

It is the case where N = 2 in (23.178). In relation to an evaluation of the cross-


section of Compton scattering, we frequently encounter an actual case where
electrons and/or photons are unpolarized or their polarization is not detected with
their initial or final states (i.e., before and after the collision experiments). In this
section, we deal with such a case in the general manner possible.
We denote the momentum four-vectors by

E E0 ω ω0
p , p0  ,k  , k0  , ð23:259Þ
p p0 k k0

where p and k represent the initial state of the electron and photon field, respectively;
p′ and k′ represent the final state of the electron and photon field, respectively. To use
(23.200), we express the quantities of (23.259) in the laboratory frame where a
photon beam is incident to the targeted electrons that are nearly stationary. In
practice, this is a good approximation. Thereby, p of (23.259) is approximated as

p≈ m
0
:

Then, the momentum conservation described by p + k = p′ + k′ is translated into


23.8 Example: Compton Scattering [1] 1137

Fig. 23.6 Geometry of the


photon field. With the
meaning of the notation
see text

= −

p0 = k - k0 : ð23:260Þ

The energy conservation described by E + ω = E′ + ω′ is approximated as

m þ ω ≈ E 0 þ ω0 : ð23:261Þ

Meanwhile, the relation

k0 k = pk - pk 0 ð23:262Þ

directly derived from the four-momentum conservation is approximated by

k0 k ≈ mω - mω0 = mðω - ω0 Þ: ð23:263Þ

With the photon field, we have

ω2 = k2 , ω0 = k0 , ω = jkj, ω0 = jk0 j:
2 2
ð23:264Þ

Figure 23.6 shows the geometry of the photon field. We assume that photons are
scattered in a radially symmetric direction with respect to the photon momentum k as
a function of the scattering angle θ. Notice that we conform Figs. 23.6 to 1.1. In this
geometry, we have

k0 k = ω0 ω - k0  k = ω0 ω - ω0 ω cos θ = ω0 ωð1 - cos θÞ: ð23:265Þ

Combining (23.263) and (23.265), we obtain

mðω - ω0 Þ = ωω0 ð1 - cos θÞ: ð23:266Þ

This is consistent with (1.16), if (1.16) is written in the natural units. In (23.258),
we may freely choose P′1 (or P′2) from among electron and photon. For a practical
1138 23 Interaction Between Fields

purpose, however, we choose photon for P′1 (i.e., scattered photon) in (23.258). That
is, since we are to directly observe the photon scattering in the geometry of the
laboratory frame, it will be convenient to measure the energy and momentum of the
scattered photon as P′1 in the final state. Then, P′2 is electron accordingly. To apply
(23.200) to the experimental results, we need to express the electron energy E 02 in
terms of the final photon energy ω′. We have

E 0 = p0 þ m2 = ðk - k0 Þ þ m2 ,
2 2 2
ð23:267Þ

where with the last equality, we used (23.260). Applying the cosine rule to Fig. 23.6,
we have

ðk - k0 Þ = jkj2 þ jk0 j - 2jkj  jk0 j cos θ = ω2 þ ω0 - 2ωω0 cos θ:


2 2 2
ð23:268Þ

Then, we obtain

E0 = ω2 þ ω0 2 - 2ωω0 cos θ þ m2 : ð23:269Þ

Differentiating (23.269) with respect to ω′, we get

∂E0 2ω0 - 2ω cos θ


= : ð23:270Þ
∂ω0 2 ω2 þ ω0 2 - 2ωω0 cos θ þ m2

Accordingly, we obtain

∂ðE 0 þ ω0 Þ ω0 - ω cos θ
= þ1
∂ω0 ω2 þ ω0 2 - 2ωω0 cos θ þ m2
ω0 - ω cos θ þ E 0 ωð1 - cos θÞ þ m
= = , ð23:271Þ
E0 E0

where with the last equality, we used (23.261). Using (23.266), furthermore, we get

∂ ðE 0 þ ω 0 Þ mω
= 0 0: ð23:272Þ
∂ω0 Eω

Now, (23.200) is read as

-1
ω0 2 ∂ðω0 þ E 0 Þ
σ=
d~ X dΩ0 : ð23:273Þ
64π 2 mωω0 E 0 vrel ∂ω0

Substituting (23.189), (23.257), and (23.272) for (23.273), we get


23.9 Summary 1139

σ
d~ ω0 2 E 0 ω0
0 = 2 0 0  X
dΩ 64π mωω E vrel mω

pk 0 pk
2
2e4 ω0 2 1 1 1 1
= m4 - þ 2m2 - þ þ : ð23:274Þ
64π 2 m2 ω2 pk pk 0 pk pk 0 pk pk 0

The energy of the scattered electron was naturally deleted. As mentioned earlier,
in (23.274) we assumed vrel = 1 as a good approximation. In the case where the
electron is initially at rest, we can use the approximate relations (23.266) as well as
pk = mω and pk′ = mω′. On these conditions, we obtain a succinct relation described
by

2
σ
d~ α 2 ω0 ω ω0
= þ - sin2 θ , ð23:275Þ
dΩ0 2m2 ω ω0 ω

where α is the fine structure constant given in (23.87). Moreover, using (23.266) ω′
can be eliminated from (23.275) [1].
The polarized measurements of the photon have thoroughly been pursued and the
results are well-known as the Klein-Nishina formula. The details can be seen in the
literature [1, 3].

23.9 Summary

In this chapter, we have described the features of the interaction among quantum
field, especially within the framework of QED. As a typical example, we detailed the
calculation procedures of the Compton scattering. It is intriguing to point out that
QED adequately explains the various aspects of the Compton scattering, whose
experiments precisely opened up a route to QED.
In this chapter, we focused on the S-matrix expansion of the second order. Yet,
the essence of QED clearly showed up. Namely, we studied how the second-order S-
ð2Þ
matrix elements Sk ðk = A, ⋯, FÞ reflected major processes of QED including the
Compton scattering. We have a useful index about whether the integrals relevant to
the S-matrix diverge. The index K is given by [1]

K = 4 - ð3=2Þf e - be , ð23:276Þ

where fe and be are the number of the external Fermion (i.e., electron) lines and
external Boson (photon) lines in the Feynman diagram, respectively. That K ≥ 0 is a
necessary condition for the integral to diverge. In the case of the Compton scattering,
fe = be = 2, then we have K = - 1. The integrals converge accordingly. We have
confirmed that this is really the case. In the cases of the electron self-energy (K = 1)
and photon self-energy (K = 2), the second-order S-matrix integrals diverge [1].
1140 23 Interaction Between Fields

Even though we did not go into detail on those issues, especially the
renormalization, we gained a glimpse into the essence of QED that offers prolific
aspects on the elementary particle physics.

References

1. Mandl F, Shaw G (2010) Quantum field theory, 2nd edn. Wiley, Chichester
2. Arfken GB, Weber HJ, Harris FE (2013) Mathematical methods for physicists, 7th edn. Aca-
demic Press, Waltham
3. Itzykson C, Zuber J-B (2005) Quantum field theory. Dover, New York
4. Kaku M (1993) Quantum field theory. Oxford University Press, New York
5. Peskin ME, Schroeder DV (1995) An introduction to quantum field theory. Westview Press,
Boulder
6. Moller C (1952) The theory of relativity. Oxford University Press, London
7. Klauber RD (2013) Student friendly quantum field theory, 2nd edn. Sandtrove Press, Fairfield
8. Sakamoto M (2020) Quantum field theory II. Shokabo, Tokyo. (in Japanese)
9. Dennery P, Krzywicki A (1996) Mathematics for physicists. Dover, New York
Chapter 24
Basic Formalism

We have studied several important aspects of the quantum theory of fields and found
that the theory is based in essence upon the theory of relativity. In Chaps. 1 and 21,
we briefly surveyed its essence. In this chapter, we explore the basic formalism and
fundamental skeleton of the theory more systematically in depth and in breadth.
First, we examine the constitution of the Lorentz group from the point of view of Lie
algebra. The Lorentz group comprises two components, one of which is virtually the
same as the rotation group that we have examined in Part IV. Another component, on
the other hand, is called Lorentz boost, which is considerably different from the
former one. In particular, whereas the matrices associated with the spatial rotation
are unitary, the matrices that represent the Lorentz boost are not unitary. In this
chapter, we connect these aspects of the Lorentz group to the matrix representation
of the Dirac equation to clarify the constitution of the Dirac equation. Another point
of great importance rests on the properties of Dirac operators G and G ~ that describe
the Dirac equation (Sect. 21.3). We examine their properties from the point of view
of matrix algebra. In particular, we study how G and G ~ can be diagonalized through
the similarity transformation. Remaining parts of this chapter deal with advanced
topics of the Lorentz group from the point of view of continuous groups and their Lie
algebras.

24.1 Extended Concepts of Vector Spaces

In Part III, we studied the properties and characteristics of the linear vector spaces.
We have seen that among the vector spaces, the inner product space and its
properties are most widely used and investigated in various areas of natural science,
accompanied by the bra-ket notation ha| bi of inner product due to Dirac (see
Chaps. 1 and 13). The inner product space is characterized by the positive definite
metric associated with the Gram matrix (Sect. 13.2). In this section, we wish to
modify the concept of metric so that it can be applied to wider vector spaces

© Springer Nature Singapore Pte Ltd. 2023 1141


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_24
1142 24 Basic Formalism

including the Minkowski space (Chap. 21). In parallel, we formally introduce


tensors and related concepts as a typical example of the extended vector spaces.

24.1.1 Bilinear Mapping

First, we introduce a direct product (vector) space formed by the product of two
(or more) vector spaces. In Part IV, we introduced a notion of direct product in
relation to groups and their representation. In this chapter, we extend this notion to
the linear vector spaces. We also study the properties of the bilinear mapping that
maps the direct product space to another vector space. First, we have a following
definition:
Definition 24.1 Let B be a mapping described by

B : W × V ⟶ T, ð24:1Þ

where V, W, and T are vector spaces over K; W × V means the direct product of V and
W. Suppose that B(, ) satisfies the following conditions [1, 2]:

Bðy þ y0 , xÞ = Bðy, xÞ þ Bðy0 , xÞ,


Bðy, x þ x0 Þ = Bðy, xÞ þ Bðy, x0 Þ,
Bðay, xÞ = Bðy, axÞ = aBðy, xÞ x, x0 2 V; y, y0 2 W; a 2 K, ð24:2Þ

where K is a field usually chosen from a complex number field ℂ or real number field
ℝ. Then, the mapping B is called a bilinear mapping.
The range of B, i.e., B(, ) is a subspace of T. This is obvious from the definition
of (24.2) and (11.15). Since in Definition 24.1 the dimension of V and W can be
different, we assume that the dimension of V and W is n and m, respectively. If the
direct product W × V is deemed to be a topological space (Sect. 6.1.2), it is called a
direct product space. (However, we do not need to be too strict with the
terminology.)
Definition 24.1 does not tell us more detailed structure of B(, ) with the excep-
tion of the fact that B(, ) forms a subspace of T. To clarify this point, we set further
conditions on the bilinear mapping B defined by (24.1) and (24.2). The conditions
(B1)-(B3) are given as follows [1, 2]:
(B1) Let V and W be linear vector spaces with dimV = n and dimW = m. If x1, ⋯,
xr 2 V are linearly independent, with y1, ⋯, yr 2 W we have
r
i=1
Bðyi , xi Þ = 0 ⟹ yi = 0 ð1 ≤ i ≤ r Þ, ð24:3Þ

where r ≤ n.
24.1 Extended Concepts of Vector Spaces 1143

(B2) If y1, ⋯, yr 2 W are linearly independent, for x1, ⋯, xr 2 V, we have


r
i=1
Bðyi , xi Þ = 0 ⟹ xi = 0 ð1 ≤ i ≤ r Þ, ð24:4Þ

where r ≤ m.
(B3) The vector space T is spanned by B(y, x) (x 2 V, y 2 W ).
Regarding Definition 24.1, we have a following important theorem.
Theorem 24.1 [1, 2] Let B be a bilinear mapping described by B(, ) : W × V ⟶ T,
where V, W, and T are vector spaces over K. Let {ei; 1 ≤ i ≤ n} and {fj; 1 ≤ j ≤ m} be
the basis set of V and W, respectively. Then, a necessary and sufficient condition for
the statements (B1)-(B3) to hold is that B( fj, ei) (1 ≤ i ≤ n, 1 ≤ j ≤ m) are the basis
set of T.
Proof
(i) Sufficient condition: Suppose that B( fj, ei) (1 ≤ i ≤ n, 1 ≤ j ≤ m) are the basis set
of T. That is, we have

T = Span B f j , ei ; 1 ≤ i ≤ n, 1 ≤ j ≤ m : ð24:5Þ

From Definition 24.1 and from the assumption that ei (1 ≤ i ≤ n) and fj (1 ≤ j ≤ m)


are the basis set of V and W, regarding 8x 2 V, 8y 2 W, B(y, x) is described by a linear
combination of B( fj, ei). Since that linear combination may take any number (over
K ) as a coefficient of B( fj, ei), the said B(y, x) spans T. It is the assertion of (B3).
Suppose that with x(k) (2V ) and y(k) (2W ) we have

n ðk Þ m ðk Þ
xðkÞ = ξ ei , yðkÞ
i=1 i
= η fj
j=1 j
ð1 ≤ k ≤ r Þ, ð24:6Þ

ðk Þ ðk Þ
with ξi , ηj 2 K. We assume that in (24.6) r ≤ max (m, n). Then, we have

r m n r ðk Þ ðkÞ
k=1
B yð k Þ , xð k Þ = j=1 i=1
η ξi
k=1 j
B f j , ei : ð24:7Þ

In (24.7) putting r
k = 1B yðkÞ , xðkÞ = 0, we get

r ðk Þ ðk Þ
η ξi
k=1 j
= 0: ð24:8Þ

It is because B( fj, ei) (1 ≤ i ≤ n, 1 ≤ j ≤ m) are the basis set of T and, hence,


linearly independent. Equation (24.8) holds with any pair of j and i. Therefore, with a
fixed j, from (24.8) we obtain
1144 24 Basic Formalism

ð1Þ ð1Þ ð2Þ ð2Þ ðr Þ ðr Þ


ηj ξ1 þ ηj ξ1 þ ⋯ þ ηj ξ1 = 0,
ð1Þ ð1Þ ð2Þ ð2Þ ðr Þ ðr Þ
ηj ξ2 þ ηj ξ2 þ ⋯ þ ηj ξ2 = 0,

ð1Þ ð2Þ ðr Þ
ηj ξðn1Þ þ ηj ξðn2Þ þ ⋯ þ ηj ξðnrÞ = 0: ð24:9Þ

Rewriting (24.9) in a form of column vectors, we get

ð1Þ ð2Þ ðr Þ
ξ1 ξ1 ξ1
ð1Þ ð2Þ ðr Þ
ηj ⋮ þ ηj ⋮ þ⋯þ ηj ⋮ = 0: ð24:10Þ
ξðn1Þ ξðn2Þ ξðnrÞ

If x(1), ⋯, x(r) 2 V are linearly independent, the corresponding column vectors of


(24.10) are linearly independent as well with the fixed basis set of B( fj, ei). Conse-
quently, we have

ð1Þ ð2Þ ðr Þ
ηj = ηj = ⋯ = ηj = 0: ð24:11Þ

Since (24.11) holds with all j (1 ≤ j ≤ m), from (24.6) we obtain

yð1Þ = yð2Þ = ⋯ = yðrÞ = 0:

The above argument ensures the assertion of (B1). In a similar manner, the
statement (B2) is assured. Since B( fj, ei) (1 ≤ i ≤ n, 1 ≤ j ≤ m) are the basis set of
T, (B3) is evidently assured.
(ii) Necessary condition: Suppose that the conditions (B1)- (B3) hold. From the
argument of the sufficient condition, we have (24.5). Therefore, for B( fj,
ei) (1 ≤ i ≤ n, 1 ≤ j ≤ m) to be the basis set of T, we only have to show that
B( fj, ei) are linearly independent. Suppose that we have

ζ B
i,j ij
f j , ei = 0, ζ ij 2 K: ð24:12Þ

Putting yi = ∑jζ ijfj, from (24.12) we obtain

i
Bðyi , ei Þ = 0: ð24:13Þ

Since {ei; 1 ≤ i ≤ n} are linearly independent, from (B1) yi = 0. Since {fj; 1 ≤ j ≤ m}


are the basis set of W, from ∑jζ ijfj = 0 we must have ζ ij = 0 (1 ≤ i ≤ n, 1 ≤ j ≤ m).
Thus, from (24.12) B( fj, ei) are linearly independent. These complete the proof.
24.1 Extended Concepts of Vector Spaces 1145

Earlier we have mentioned that B(, ) is a subspace of the vector space T.


Practically, however, we may safely identify the range of B with the whole space
of T. In other words, we may as well regard the subspace B(, ) as T itself. Thus, the
dimension of T can be regarded as mn.
Abbreviating B( fj, ei) as fj  ei and omitting superscript k in (24.7), we have
m n
Bðy, xÞ = j=1 i=1
f j  ei η j ξ i : ð24:14Þ

Further rewriting (24.14), we obtain

Bðy, xÞ

η 1 ξ1

η 1 ξn
η 2 ξ1
= ð f 1  e1 ⋯ f 1  en f 2  e1 ⋯ f 2  en ⋯ f m  e1 ⋯ f m  en Þ ⋮ :
η 2 ξn

ηm ξ1

ηm ξn
ð24:15Þ

Equations (24.14) and (24.15) clearly show that any elements belonging to T can
be expressed as RHS of these equations. Namely, T is spanned by B(y, x) (x 2 V,
y 2 W ). This is what the statement (B3) represents.
Equation (24.15) corresponds to (11.13); in (24.15) mn basis vectors form a row
vector and mn “coordinates” form a corresponding column vector.
The next example will help us understand the implication of the statements (B1)-
(B3) imposed on the bilinear mapping earlier in this section.
Example 24.1 Let V and W be vector spaces with dimension 3 and 2, respectively.
Suppose that xi 2 V (i = 1, 2, 3) and yj 2 W ( j = 1, 2, 3). Let xi and yj be given by

ξ1 ξ1′ ξ″1
x1 = ð e1 e2 e3 Þ ξ2 , x 2 = ð e1 e2 e3 Þ ξ2′ , x3 = ð e1 e2 e3 Þ ξ″2 ;
ξ3 ξ3′ ξ″3
η1 η′ η″1
y1 = ð f 1 f 2 Þ , y2 = ðf 1 f 2 Þ 1′ , y3 = ð f 1 f 2 Þ , ð24:16Þ
η2 η2 η″2

where (e1 e2 e3) and ( f1 f2) are basis sets of V and W, respectively; ξ1, η1, etc. 2 K. For
simplicity, we write
1146 24 Basic Formalism

ðe1 e2 e3 Þ  e, ðf 1 f 2 Þ  f : ð24:17Þ

We define the bilinear mapping B(y, x) as, e.g.,

η1 ξ1
ξ1 η1 ξ2
η1 η1 ξ3
Bðy1 ; x1 Þ = ðf × eÞ × ξ2  ð f × eÞ , ð24:18Þ
η2 η2 ξ1
ξ3 η2 ξ2
η2 ξ3

where we have

ð f × eÞ = ð f 1  e1 f 1  e2 f 1  e3 f 2  e1 f 2  e2 f 2  e3 Þ : ð24:19Þ

The RHS of (24.19) is a row vector comprising the direct products of


fjek ( j = 1, 2; k = 1, 2, 3).
Fixing the basis sets of V and W, we represent the vectors as numerical vectors
(or column vectors) [1, 2]. Then, the first equation of the bilinear mapping in (24.3)
of (B1) can be read as

η1 ξ1 þ η1′ ξ1′ þ η″1 ξ″1


η1 ξ2 þ η1′ ξ2′ þ η″1 ξ″2
η1 ξ3 þ η1′ ξ3′ þ η″1 ξ″3
Bðy1 ; x1 Þ þ Bðy2 ; x2 Þ þ Bðy3 ; x3 Þ = ðf × eÞ = 0: ð24:20Þ
η2 ξ1 þ η2′ ξ1′ þ η″2 ξ″1
η2 ξ2 þ η2′ ξ2′ þ η″2 ξ″2
η2 ξ3 þ η2′ ξ3′ þ η″2 ξ″3

Omitting the basis set (f × e), (24.20) is separated into two components such that

ξ1 ξ01 ξ001 ξ1 ξ01


η1 ξ2 þ η01 ξ02 þ η001 ξ002 = 0, η2 ξ2 þ η02 ξ02
ξ3 ξ03 ξ003 ξ3 ξ03
ξ001
þ η002 ξ002 = 0: ð24:21Þ
ξ003

Assuming that ξ1, ξ2, and ξ3 (2V ) are linearly independent, (24.21) implies that
η1 = η01 = η001 = η2 = η02 = η002 = 0. Namely, from (24.16) we have

y1 = y2 = y3 = 0: ð24:22Þ

This is an assertion of (B1). Meanwhile, think of the following equation:


24.1 Extended Concepts of Vector Spaces 1147

η1 ξ1 þ η1′ ξ1′
η1 ξ2 þ η1′ ξ2′
η1 ξ3 þ η1′ ξ3′
Bðy1 ; x1 Þ þ Bðy2 ; x2 Þ = ðf × eÞ = 0: ð24:23Þ
η2 ξ1 þ η2′ ξ1′
η2 ξ2 þ η2′ ξ2′
η2 ξ3 þ η2′ ξ3′

This time, (24.23) can be separated into three components such that

η1 η01 η1 η01 η1
ξ1 þ ξ01 = 0, ξ2 þ ξ02 = 0, ξ3
η2 η02 η2 η02 η2
η01
þ ξ03 = 0: ð24:24Þ
η02

Assuming that y1 and y2(2W) are linearly independent, in turn, (24.24) implies
that ξ1 = ξ01 = ξ2 = ξ02 = ξ3 = ξ03 = 0. Namely, from (24.16), we have

x1 = x2 = 0: ð24:25Þ

This is an assertion of (B2). The statement (B3) is self-evident; see (24.14) and
(24.15).
In relation to Theorem 24.1, we give another important theorem.
Theorem 24.2 [1–3] Let V and W be vector spaces over K. Then, a vector space
T (over K ) and a bilinear mapping B : W × V ⟶ T that satisfy the above-mentioned
conditions (B1), (B2), and (B3) must exist. Also, let T be an arbitrarily chosen vector
space over K and let B be an arbitrary bilinear mapping expressed as

B : W × V ⟶ T: ð24:26Þ

Then, a linear mapping ρ : T ⟶ T that satisfies the condition described by

Bðy, xÞ = ρ½Bðy, xÞ; x 2 V, y 2 W ð24:27Þ

is uniquely determined.
Proof
(i) Existence: Suppose that T is a nm-dimensional vector space with the basis set
B( fj, ei) (1 ≤ i ≤ n, 1 ≤ j ≤ m), where ei (1 ≤ i ≤ n) and fj (1 ≤ j ≤ m) are the basis
set of V and W, respectively. We assume that V and W are n- and m-dimensional
vector spaces, respectively.
Let x 2 V and y 2 W be expressed as
1148 24 Basic Formalism

n m
x= ξ e ,y=
i=1 i i
ηf :
j=1 j j
ð24:28Þ

We define the bilinear mapping B(y, x) as


m n
Bðy, xÞ = j=1
ηξe
i = 1 j i ji
ð24:29Þ

with eji  B f j , ei , so that eij 2 T ð1 ≤ i ≤ n, 1 ≤ j ≤ mÞ is the basis set of T. Then, we


have B : W × V ⟶ T. On top of it, a set of (T, B) satisfies the conditions (B1)-
(B3) according to Theorem 24.1. Thus, existence of the bilinear mapping of
B required by the statement of the theorem must necessarily exist.
(ii) Uniqueness: Suppose that an arbitrary bilinear mapping B : W × V ⟶ T is
given. Given such B and T, we define a linear mapping ρ : T ⟶ T that is
described by

ρ ζ e
i,j ji ji
= ζ B
i,j ji
f j , ei , ð24:30Þ

where ζ ji is an arbitrary number over K. Choosing ζ ji = 1 when j = k, i = l, and


otherwise ζ ji = 0, we have

ρðekl Þ = Bðf k , el Þ: ð24:31Þ

Meanwhile, we have shown the existence of the bilinear mapping B(y, x)


described by (24.29). Operating ρ on both sides of (24.29), we obtain

m n m n
ρ½Bðy, xÞ = ρ j=1
ηξe
i = 1 j i ji
= j=1
ηξρ
i=1 j i
eji

= ηξB
i,j j i
f j , ei = i,j
B ηj f j , ξi ei = B ηf ,
j j j
ξe
i i i
= Bðy, xÞ,

where with the third equality, we used (24.30) and (24.31); with the fourth and fifth
equalities, we used the bilinearity of B. Thus, the linear mapping ρ defined in
(24.30) satisfies (24.27).
The other way around, any linear mapping that satisfies (24.27) can be described
as

ρ eji = ρ B f j , ei = B f j , ei : ð24:32Þ

Multiplying both sides of (24.32) by ζji and summing over i and j, we recover
(24.30). Thus, the linear mapping ρ is determined uniquely. These complete the
proof.
24.1 Extended Concepts of Vector Spaces 1149

24.1.2 Tensor Product

Despite being apparently craggy and highly abstract, Theorem 24.2 finds extensive
applications in the vector space theory. The implication of Theorem 24.2 is that once
the set (T, B) is determined for the combination of the vector spaces V and W such
that B : W × V ⟶ T, any other set ðT, BÞ can be derived from (T, B). In this respect,
B defined in Theorem 24.2 is referred to as the universal bilinear mapping
[3]. Another important point is that the linear mapping ρ : T ⟶ T defined in
Theorem 24.2 is bijective (see Fig. 11.3) [1, 2]. That is, the inverse mapping ρ-1
exists.
The range of the universal bilinear mapping B is a vector space over K. The vector
space T accompanied by the bilinear mapping B, or the set (T, B) is called a tensor
product (or Kronecker product) of V and W. Thus, we find that the constitution of
tensor product hinges upon Theorem 24.2. The tensor product T is expressed as

T  W  V: ð24:33Þ

Also, B(y, x) is described as

Bðy, xÞ  y  x, ð24:34Þ

where x 2 V, y 2 W, and y  x 2 T.
The notion of tensor product produces a fertile plain for the vector space theory.
In fact, (24.29) can readily be extended so that various bilinear mappings are
included. As a typical example, let us think of linear mappings φ and ψ such that

φ : V ⟶ V 0, ψ : W ⟶ W 0, ð24:35Þ

where V, V′, W, and W′ are vector spaces of the dimension n, n′, m, and m′,
respectively. Also, suppose that we have [1, 2]

T =W  V and T  W 0  V 0: ð24:36Þ

What we assume in (24.36) is that even though in Theorem 24.2 we have chosen
T as an arbitrary vector space, in (24.36) we choose T as a special vector space (i.e.,
a tensor product W′  V′). From (24.35) and (24.36), using φ and ψ we can construct
a bilinear mapping B : W × V ⟶ W 0  V 0 such that

Bðy, xÞ = ψ ðyÞ  φðxÞ, ð24:37Þ

where ψ( y) 2 W′, φ(x) 2 V′, and ψ ðyÞ  φðxÞ 2 T = W 0  V 0 . Then, from Theorem
24.2 and (24.34), a linear mapping ρ : T ⟶ T must uniquely be determined such
that
1150 24 Basic Formalism

ρðy  xÞ = ψ ðyÞ  φðxÞ: ð24:38Þ

Also, defining this linear mapping ρ as

ρ  ψ  φ, ð24:39Þ

(24.38) can be rewritten as [1, 2]

ðψ  φÞðy  xÞ = ψ ðyÞ  φðxÞ: ð24:40Þ

Equation (24.40) implies that in response to a tensor product of the vector spaces,
we can construct a tensor product of the operators that act on the vector spaces. This
is one of the most practical consequences derived from Theorem 24.2.
The constitution of the tensor products is simple and straightforward. Indeed, we
have already dealt with the tensor products in connection with the group represen-
tation (see Sect. 18.8). Let A and B be matrix representations of φ and ψ, respec-
tively. Following (18.193), the tensor product of B  A is expressed as [4]

b11 A ⋯ b1m A
B  A= ⋮ ⋱ ⋮ , ð24:41Þ
bm0 1 A ⋯ bm0 m A

where A and B are given by (aij) (1 ≤ i ≤ n′; 1 ≤ j ≤ n) and (bij) (1 ≤ i ≤ m′; 1 ≤ j ≤ m),
respectively. In RHS of (24.41), A represents a (n′, n) full matrix of (aij). Thus, the
direct product of matrix A and B yields a (m′n′, mn) matrix. We write it symbolically
as

½ðm0 , mÞ matrix  ½ðn0 , nÞ matrix = ½ðm0 n0 , mnÞ matrix: ð24:42Þ

The computation rules of (24.41) and (24.42) apply to a column matrix and a row
matrix as well. Suppose that we have

x1 y1
x = ðe1 ⋯ en Þ ⋮ , y = ðf 1 ⋯ f m Þ ⋮ , ð24:43Þ
xn ym

where e1, ⋯, en are the basis vectors of V and f1, ⋯, fm are the basis vectors of W.
Thus, the expression of y  x has already appeared in (24.15) as
24.1 Extended Concepts of Vector Spaces 1151

y  x=
y1 x1

y1 xn
y2 x1
ð f 1  e1 ⋯ f 1  en f 2  e 1 ⋯ f 2  en ⋯ f m  e1 ⋯ f m  en Þ ⋮ :
y2 xn

y m x1

y m xn
ð24:44Þ

In (24.44), the basis set is fp  eq (1 ≤ p ≤ m, 1 ≤ q ≤ n) and their corresponding


coordinates are ypxq (1 ≤ p ≤ m, 1 ≤ q ≤ n). Replacing ψ  φ in (24.40) with B  A,
we obtain

ðB  AÞðy  xÞ = BðyÞ  AðxÞ: ð24:45Þ

Following the expression of (11.37), we have

y1 x1
BðyÞ = f 01 ⋯ f 0m0 B ⋮ , AðxÞ = e01 ⋯ e0n0 A ⋮ , ð24:46Þ
ym xn

where e01 , ⋯, e0n0 are the basis vectors of V′ and f 01 , ⋯, f 0m0 are the bas vectors of W′.
Thus, (24.45) can be rewritten as

y1 x1
f 01 ⋯ f 0m0 B ⋮  e01 ⋯ e0n0 A ⋮
ym xn

y1 x1
= f 01 ⋯ f 0m0  e01 ⋯ e0n0 ðB  AÞ ⋮  ⋮ : ð24:47Þ
ym xn

Using (24.41) and (24.44), we get


1152 24 Basic Formalism

y1 x1

y1 x n
y1 x1 b11 A ⋯ b1m A y2 x1
ðB  AÞ ⋮  ⋮ = ⋮ ⋱ ⋮ ⋮ : ð24:48Þ
y2 x n
ym xn bm ′ 1 A ⋯ bm ′ m A

ym x 1

ym x n

Notice that since B  A operated on the coordinates from the left with the basis set
unaltered, we omitted the basis set f 01 ⋯f 0m0  e01 ⋯e0n0 in (24.48). Thus, we only
had to perform the numerical calculations based on the standard matrix theory as in
RHS of (24.48), where we routinely dealt with a product of a (m′n′, mn) matrix and a
(mn, 1) matrix, i.e., column vector. The resulting column vector is a (m′n′, 1) matrix.
The bilinearity of tensor product of the operators A and B can readily be checked
so that we have

B  ðA þ A0 Þ = B  A þ B  A0 ,
ðB þ B0 Þ  A = B  A þ B0  A: ð24:49Þ

The bilinearity of the tensor product of column or row matrices can be checked
similarly as well.
Suppose that the dimension of all the vector spaces V, V′, W, and W′ is the same n.
This simplest case is the most important and frequently appears in many practical
applications. We have

a11 ⋯ a1n x1
AðxÞ = ðe1 ⋯ en Þ ⋮ ⋱ ⋮ ⋮ ,
an1 ⋯ ann xn

b11 ⋯ b1n y1
B ð y Þ = ð e1 ⋯ e n Þ ⋮ ⋱ ⋮ ⋮ , ð24:50Þ
bn1 ⋯ bnn yn

where x and y are given by

x1 y1
x = ð e1 ⋯ en Þ ⋮ , y = ðe1 ⋯ en Þ ⋮ : ð24:51Þ
xn yn

Now, we are thinking of the endomorphism


24.1 Extended Concepts of Vector Spaces 1153

A : V n ⟶ V n, B : V n ⟶ V n, ð24:52Þ

where Vn is a vector space of the dimension n; A and B are given by

a11 ⋯ a1n b11 ⋯ b1n


A= ⋮ ⋱ ⋮ ,B= ⋮ ⋱ ⋮ : ð24:53Þ
an1 ⋯ ann bn1 ⋯ bnn

Then, replacing φ and ψ of (24.40) with A and B as before, respectively, we


obtain

ðB  AÞðy  xÞ = BðyÞ  AðxÞ: ð24:54Þ

Following (11.37) once again, we rewrite (24.54) such that

ðB  AÞðy  xÞ
y1 x1
= ½ðe1 ⋯ en Þ  ðe1 ⋯ en Þ½B  A ⋮  ⋮ , ð24:55Þ
yn xn

where B  A is given by a (n2, n2) matrix expressed as

b11 A ⋯ b1n A
B  A= ⋮ ⋱ ⋮ : ð24:56Þ
bn1 A ⋯ bnn A

The structure of (24.55) is the same as (24.45) in point of the linear mapping.
Also, (24.56) has a same constitution as (24.41), except that the former gives a (n2,
n2) matrix, whereas the latter is a (m′n′, mn) matrix. In other words, instead of (24.48)
we have a linear mapping B  A described by
2 2
B  A : Vn ⟶ Vn : ð24:57Þ

In this regard, the tensor product produces an enlarged vector space. As shown
above, we use a term of tensor product to represent a direct product of two operators
(or matrices) besides a direct product of two vector spaces. In relation to the latter
case, we use the same term of tensor product to represent a direct product of row
vectors and that of column vectors (vide supra).
A following example helps us understand how to construct a tensor product.

Example 24.2 We have x = ðe1 e2 Þ x1


x2
and y = ðe1 e2 Þ y1
y2
. Also, let A and B be
1154 24 Basic Formalism

a11 a12 b11 b12


A= ,B= : ð24:58Þ
a21 a22 b21 b22

Then, we get

ðB  AÞðy  xÞ = ½ðe1 e2 Þ  ðe1 e2 Þ½B  A y1


y2
 x1
x2

= ðe1  e1 e1  e2 e2  e1 e2  e2 Þ ×
b11 a11 b11 a12 b12 a11 b12 a12 y 1 x1
b11 a21 b11 a22 b12 a21 b12 a22 y 1 x2
: ð24:59Þ
b21 a11 b21 a12 b22 a11 b22 a12 y 2 x1
b21 a21 b21 a22 b22 a21 b22 a22 y 2 x2

Omitting the basis set, we have

ðb11 y1 þ b12 y2 Þða11 x1 þ a12 x2 Þ


ðb21 y2 þ b22 y2 Þða11 x1 þ a12 x2 Þ
RHS of ð24:59Þ =
ðb11 y1 þ b12 y2 Þða21 x1 þ a22 x2 Þ
ðb21 y2 þ b22 y2 Þða21 x1 þ a22 x2 Þ
b a yx
l, j 1l 1 j l j y1′ x1′
b a yx
l, j 1l 2 j l j y1′ x2′
=  , ð24:60Þ
b a yx y2′ x1′
l, j 2l 1 j l j
y2′ x2′
b a yx
l, j 2l 2 j l j

where the rightmost side denotes the column vector after operating B  A on y  x.
Thus, we obtain

y0i x0k = bil akj yl xj : ð24:61Þ


j, l

Defining Yik  yi xk , we get

Y0ik = bil akj Ylj : ð24:62Þ


j, l

Equation (24.62) has already represented a general feature of the tensor product.
We discuss the transformation properties of tensors later.
24.1 Extended Concepts of Vector Spaces 1155

24.1.3 Bilinear Form

In the previous sections, we have studied several forms of the bilinear mapping. In
this section, we introduce a specific form of the bilinear mapping called a
bilinear form.
Definition 24.2 Let V and W be n-dimensional and m-dimensional vector spaces
over K, respectively. Let B be a mapping described by

B : W × V ⟶ K, ð24:63Þ

where W × V is a direct product space (see Sect. 24.1.1); K denotes the field (usually
chosen from among a complex number field ℂ or real number field ℝ). Suppose that
B(, ) satisfies the following conditions [1, 2]:

Bðy þ y0 , xÞ = Bðy, xÞ þ Bðy0 , xÞ,


Bðy, x þ x0 Þ = Bðy, xÞ þ Bðy, x0 Þ,
Bðay, xÞ = Bðy, axÞ = aBðy, xÞ, x, x0 2 V; y, y0 2 W; a 2 K: ð24:64Þ

Then, the mapping B is called a bilinear form.


Equation (24.64) is virtually the same as (24.2). Note, however, that although in
(24.2) the range of B is a vector space, in (24.64) that of B is K (i.e., simply a
number).
Let x 2 V and y 2 W be expressed as
n m
x= ξ e ,y=
i=1 i i
ηf :
j=1 j j
ð24:28Þ

Then, we have
m n
Bðy, xÞ = j=1
ηξB
i=1 j i
f j , ei : ð24:65Þ

Thus, B(, ) is uniquely determined by a (m, n) matrix B( fj, ei) (2K ).

24.1.4 Dual Vector Space

So far, we have introduced the bilinear mapping and tensor product as well as related
notions. The vector spaces, however, were “structureless,” as in the case of the
vector spaces where the inner product was not introduced (see Part III). Now, we
wish to introduce the concept of the dual (vector) space to equip the vector space
with a “metric.”
1156 24 Basic Formalism

Definition 24.3 Let V be a vector space over a field K. Then, a collection of arbitrary
homogeneous linear functionals f defined on V that map x (2V ) to K is called a dual
vector space (or dual space) of V.
This simple but abstract definition needs some explanation. The dual space of V is
written as V. With 8f 2 V and 8g 2 V, we define the sum f + g and multiplication
αf (α 2 K ) as

ðf þ gÞðxÞ  f ðxÞ þ gðxÞ,


ðαf ÞðxÞ  α  f ðxÞ, x 2 V: ð24:66Þ

Then, V forms a vector space over K with respect to the operation of (24.66). The
dimensionality of V is the same as that of V [1, 2]. To show this, we make a
following argument: Let 8x 2 V be expressed as
n
x= i=1
ξ i ei ,

where ξi 2 K and ei (i = 1, ⋯, n) is a basis set of a n-dimensional vector space V.


Also, for a future purpose, we denote the “coordinate” that corresponds to ei by a
character with a superscript such as ξi. The element x 2 V is uniquely represented by
a set {ξ1, ⋯, ξn} (see Sect. 11.1). Let fi 2 V  fi : V ⟶ K be defined as

fi ðxÞ = ξi : ð24:67Þ

Accordingly, we have
n
x= i=1
fi ðxÞei ði = 1, ⋯, nÞ: ð24:68Þ

Meanwhile, we have

n n
fi ð xÞ = f i k=1
ξ k ek = k=1
ξk fi ðek Þ: ð24:69Þ

For (24.67) and (24.69) to be consistent, we get

fi ðek Þ = δik : ð24:70Þ

A set of f1 , ⋯, fn is a basis set of V and called a dual basis with respect to {e1,
⋯, en}. As we have already seen in Sect. 11.4, we can choose another basis set of V
that consists of n linearly independent vectors. Accordingly, the dual basis that
satisfies (24.70) is sometimes referred to as a standard basis or canonical basis to
distinguish it from other dual bases of V. We have
24.1 Extended Concepts of Vector Spaces 1157

V  = Span f1 , ⋯, fn

in correspondence with

V = Span fe1 , ⋯, en g:

Thus, dimV = n. We have V ffi V (i.e., isomorphic) accordingly [1, 2]. In this


case, we must have a bijective mapping (or isomorphism) between V and V,
especially between {e1, ⋯, en} and f1 , ⋯, fn (Sects. 11.2, 11.4, and 16.4). We
define the bijective mapping N : V ⟶ V  as

Nðek Þ = fk ðk = 1, ⋯, nÞ: ð24:71Þ

Then, there must be an inverse mapping N - 1 to N (see Sect. 11.2). That is, we
have

ek = N - 1 fk ðk = 1, ⋯, nÞ:

Now, let y (2V) be expressed as


n
y= ηf
i=1 i
i
,

where ηi 2 K. As in the case of (24.68), we have


n
y= e ðyÞf
i=1 i
i
ði = 1, ⋯, nÞ, ð24:72Þ

with

ei ðyÞ = ηi , ð24:73Þ

where ei 2 V (ei : V ⟶ K ).
As discussed above, we find that the vector spaces V and V stand in their own
right on an equal footing. Whereas residents in V see V as a dream world, those in V
see V as it as well. Nonetheless, the two vector spaces are complementary to each
other, which can clearly be seen from (24.67), (24.68), (24.72), and (24.73). These
situations can be formulated as a recurrence property such that

ðV  Þ = V  = V: ð24:74Þ

Notice, however, that (24.74) is not necessarily true of an infinite dimensional


vector space [1, 2].
A bilinear form is naturally introduced as in (24.64) to the direct product V × V.
Although the field K can arbitrarily be chosen, from here on we assume K = ℝ (a real
1158 24 Basic Formalism

number field) for a future purpose. Then, as in (24.65), B(, ) is uniquely determined
by a (n, n) matrix B fj , ei ð2 ℝÞ, where fj ðj = 1, ⋯, nÞ ð2 V  Þ are linearly inde-
pendent in V and form a dual basis (vide supra). The other way around, ek (k = 1,
⋯, n) (2V ) are linearly independent in V. To denote these bilinear forms B(, ), we
use a shorthand notation described by

½j,

where an element of V and that of V are written in the left-side and right-side
“parentheses,” respectively. Also, to clearly specify the nature of a vector species,
we use j  ] for an element of V and [  j for that of V as in the case of a vector of the
inner product space (see Chap. 13). Notice, however, that [| ] is different from the
inner product h| i in the sense that although h| i is positive definite, it is not
necessarily true of [| ]. Using the shorthand notation, e.g., (24.70), can be succinctly
expressed as

fi jek = δik : ð24:75Þ

Because of the isomorphism between V and V, jx] is mapped to [xj by the
bijective mapping and vice versa. We call [xj a dual vector of jx].
Here we briefly compare the bilinear form with the inner product (Chap. 13).
Unlike the inner product space, the basis vectors [xj do not imply the concept of
adjoint with respect to the vector spaces we are thinking of now. Although hx| xi is
positive definite for any jxi 2 V, the positive definiteness is not necessarily true of [x|
x]. Think of a following simple example. In case K = ℂ, we have [x| x] = [ix| -
ix] = - [ix| ix]. Hence, if [x| x] > 0, [ix| ix] < 0. Also, we may have a case where [x|
x] = 0 with jx] ≠ 0 (vide infra). We will come back to this issue later. Hence, it is safe
to say that whereas the inner product buys the positive definiteness at the cost of the
bilinearity, the bilinear form buys the bilinearity at the cost of the positive
definiteness.
The relation represented by (24.70) is somewhat trivial and, hence, it will be
convenient to define another bilinear form B(, ). Meanwhile, in mathematics and
physics, we often deal with invariants before and after the vector (or coordinate)
transformation (see Sect. 24.1.5). For this purpose, we wish to choose a basis set of
V more suited than fk ðk = 1, ⋯, nÞ. Let us define the said basis set of V as

ff1 , ⋯, fn g:

Also, we define a linear mapping M such that

Mðek Þ = fk ðk = 1, ⋯, nÞ, ð24:76Þ

where M : V ⟶ V  is a bijective mapping. Expressing vectors |x] (2V ) and


[x| (2V) as
24.1 Extended Concepts of Vector Spaces 1159

n n
j x = i=1
ξi j ei , x j = i=1
ξi fi j , ξi 2 ℝ, ð24:77Þ

respectively, and using a matrix representation, we express [x| x] as

½f1 je1  ⋯ ½f1 jen  ξ1


½xjx = ξ1 ⋯ξn ⋮ ⋱ ⋮ ⋮ : ð24:78Þ
½fn je1  ⋯ ½fn jen  ξn

Note that from the definition of M, we may write

Mjx = ½x j : ð24:79Þ

Defining a matrix M as [1, 2]

½f1 je1  ⋯ ½f1 jen 


M ⋮ ⋱ ⋮ , ð24:80Þ
½fn je1  ⋯ ½fn jen 

we obtain

ξ1
½xjx = ξ1 ⋯ ξn M ⋮ : ð24:81Þ
ξn

We rewrite (24.81) as

½xjx = ξi M ij ξj = ξk M kk ξk þ ξl M lm ξm , ð24:82Þ
i, j k l, m; l ≠ m

where we put M = (M )ij. The second term of (24.82) can be written as

1
ξl M lm ξm = ξl M lm ξm þ ξm M ml ξl
l, m; l ≠ m
2 l, m; l ≠ m m, l; m ≠ l

1
= ðM þ M ml Þξl ξm :
2 l, m; l ≠ m lm

~ ij  1 M ij þ M ji , we get a symmetric quadratic form


Therefore, defining M 2
described by

½xjx = ~ kk ξk þ
ξk M ~ lm ξl ξm :
M
k l, m; l ≠ m
1160 24 Basic Formalism

Then, we assume that M is symmetric from the beginning, i.e., fi jej = fj jei
[1, 2]. Thus, M is real symmetric matrix (note K = ℝ), and so it is to be diagonalized
through orthogonal similarity transformation. That is, M can be diagonalized with an
orthogonal matrix P = Pi j . We have

P1 1 ⋯ Pn 1 ½f1 je1  ⋯ ½f1 jen  P1 1 ⋯ P1 n


PT MP = ⋮ ⋱ ⋮ ⋮ ⋱ ⋮ ⋮ ⋱ ⋮
P1 n ⋯ Pn n ½fn je1  ⋯ ½fn jen  Pn 1 ⋯ Pn n

f Pj j
j j 1
e Pi
i i 1
⋯ f Pj j
j j 1
e Pi
i i n

= ⋮ ⋱ ⋮

f Pj j
j j n
e Pi
i i 1
⋯ f Pj j
j j n
e Pi
i i n

½f01 je01  ⋯ ½f01 je0n  d1 ⋯ 0


= ⋮ ⋱ ⋮ = ⋮ ⋱ ⋮ , ð24:83Þ
½f0n je01  ⋯ ½f0n je0n  0 ⋯ dn

where summation with respect to both i and j ranges from 1 to n. In (24.83), we have

e0k = Pðek Þ = i
Pi k j ei , f0k j = Pðfk Þ = j
Pj k fj j , f0j je0i = d j δji , ð24:84Þ

where dj ( j = 1, ⋯, n) are real eigenvalues of M. Equation (24.84) shows that both


the basis sets of jei] and fj j ði, j = 1, ⋯, nÞ of V and V, respectively, undergo the
orthogonal transformation by P so that it can be converted into new basis sets j e0i 
and fj0 j. The number dj takes a positive value or a negative value.
Further, interchanging the basis set jei] and similarly exchanging fj j, the
resulting diagonalized matrix of (24.83) is converted through orthogonal transfor-
mation Π = Q1⋯Qs into

d i1

d ip
dipþ1
,

dipþq

where di1 , ⋯, dip are positive; dipþ1 , ⋯, d ipþq are negative; we have p + q = n. What
we have done in the above process is to rearrange the order of the basis vectors
s times so that the positive di1 , ⋯, dip can be arranged in the upper diagonal elements
and the negative dipþ1 , ⋯, d ipþq can be arranged in the lower diagonal elements.
Correspondingly, we obtain
24.1 Extended Concepts of Vector Spaces 1161

je01 ⋯je0n Π = je0i1 ⋯je0ipþq , f01 ⋯ f0n Π = f0i1 ⋯ f0ipþq : ð24:85Þ

Assuming a following non-singular matrix D expressed as [1, 2]


p
1= d i1
⋱ p
D=
1= d ip p , ð24:86Þ
1= - d ipþ1
⋱ p
1= - dipþq

we get

DT ΠT PT MPΠD = ðPΠDÞT MPΠD


1

1
= : ð24:87Þ
-1

-1

The numbers p and q are intrinsic to the matrix M. Since M is a real symmetric
matrix, M can be diagonalized with the similarity transformation by an orthogonal
matrix. But, since the eigenvalues (given by diagonal elements) are unchanged after
the similarity transformation, it follows that p (i.e., the number of the positive
eigenvalues) and q (the number of the negative eigenvalues) are uniquely determined
by M. The integers p and q are called a signature [1, 2]. The uniqueness of ( p, q) is
known as the Sylvester’s law of inertia [1, 2].
We add as an aside that if M is a real symmetric singular matrix, M must have an
eigenvalue 0. In case the eigenvalue zero is degenerated r-fold, we have p + q + r = n.
In that case, p, q, and r are also uniquely determined and these integers are called a
signature as well. This case is also included as the Sylvester’s law of inertia [1, 2].
Note that (24.87) is not a similarity transformation, but an equivalence transfor-
mation and, hence, that the eigenvalues have been changed [from dj (≠0) to ±1].
Also, notice that unlike the case of the equivalence transformation with an arbitrarily
chosen non-singular matrix mentioned in Sect. 14.5, (24.87) represents an equiva-
lence transformation as a special case. That is, the non-singular operator comprises a
product of an orthogonal matrix PΠ and a diagonal matrix D.
The above-mentioned equivalence transformation also causes the transformation
of the basis set. Using (24.86), we have

je0i1 ⋯je0ipþq  D = je0i1 = d i1 ⋯je0ipþq = - d ipþq ,

f0i1 j⋯ f0ipþq j D = f0i1 j= di1 ⋯ f0ipþq j= - d ipþq :

Defining
1162 24 Basic Formalism

j ~eik  j e0ik = ± dik and ~fik j f0ik j = ± d ik ðk = 1, ⋯, n = p þ qÞ,

we find that both the basis sets of V and V finally undergo the same transformation
through the equivalence transformation by PΠD such that

ðj~ei1 ⋯j~ein Þ = ðje1 ⋯jen ÞPΠD,


~f ⋯ ~f = ð½f1 j ⋯½fn jÞPΠD ð24:88Þ
i1 in

with

~fj j~ei = ± δji i, j take a number out of i1 , ⋯, ip , ipþ1 , ⋯, ipþq : ð24:89Þ

Putting W  PΠD, from (24.81) we have

-1 ξ1
½xjx = ξ1 ⋯ ξn W T  W T MW  W - 1 ⋮
ξn

T ξ1
= ξ1 ⋯ ξ n W - 1  W T MW  W - 1 ⋮ :
ξn

Just as the basis sets are transformed by W, so the coordinates are transformed
according to W-1 = D-1Π-1P-1 = D-1ΠTPT. Then, consecutively we have

k ik
ξ0 = P-1 ξi , ξ0 k = Π-1 ξ0 m ,
k i i
i i im im

~ξik ¼ D-1
ik
ξ0 m ¼
i
± d ik δik im ξ0 m ¼
i
± d i k ξ0 k :
i
ð24:90Þ
im im im

Collectively writing (24.90), we obtain

~ξi1 ξ1 ξ1
-1 -1 -1 -1
⋮ ¼W ⋮ ¼D Π P ⋮ : ð24:91Þ
~ξin ξn ξn

Thus, (24.88) and (24.91) afford the transformation of the basis vectors and
corresponding coordinates. These relations can be compared to (11.85).
Now, if the bilinear form [y| x] (x 2 V, y 2 V) described by [1, 2]

ξ1
½yjx = η1 ⋯ ηn M ⋮
ξn
24.1 Extended Concepts of Vector Spaces 1163

satisfies the following conditions, the bilinear form is said to be non-degenerate.


That is, (i) if for 8x 2 V we have [y| x] = 0, then we must have y = 0. Also, (ii) if for
8
y 2 V we have [y| x] = 0, then we must have x = 0. The above condition (i) is
translated into

η1 ⋯ ηn M = 0 ⟹ η1 ⋯ ηn = 0:

This implies detM ≠ 0. Namely, M is non-singular. From the condition (ii) which
implies that

ξ1 ξ1
M ⋮ =0 ⟹ ⋮ = 0,
ξn ξn

we get detM ≠ 0 as well. The determinant of the matrix for RHS of (24.87) is ±1.
Thus, the non-degeneracy of the bilinear form is equivalent to that M has only
non-zero eigenvalues. It is worth comparing two vector spaces, the one of which is
an inner product space Vin and the other of which is a vector space Vbi where the
bilinear form is defined.
Let x 2 Vin and y 2 Vin, where Vin is a dual space of Vin. The inner product can
be formulated by a mapping I such that

I : V in  × V in ⟶ ℂ:

In that case, we have described a resulting complex number of the above


expression as an inner product hy| xi and we have defined the computation rule of
the inner product as (13.2)-(13.4). Regarding two sets of vectors of Vin, we have
defined a Gram matrix G as (13.35) in Sect. 13.2. There, we have shown that

Both the sets of vectors of V in are linearly independent: ⟺ detG ≠ 0: ð13:31Þ

As for the vector space Vbi, the matrix M defined in (24.80) is a counterpart to the
Gram matrix G. With respect to two sets of basis vectors of Vbi and its dual space V bi ,
we have an expression related to (13.31). That is, we have [1, 2]

The bilinear form B given by B : V bi × V bi ⟶ K ðK = ℂ or ℝÞ


is non degenerate: ⟺ det M ≠ 0: ð24:92Þ

Let us consider a case where we construct M as in (24.80) from two sets of basis
vectors of Vbi and V bi given by

fje1 , ⋯, jen ; jei  2 V bi ði = 1, ⋯, nÞg,


1164 24 Basic Formalism

f1 j , ⋯, fn j ; ½fi j 2 V bi ði = 1, ⋯, nÞ :

We assume that these two sets are connected to each other through (24.76).
Suppose that we do not know whether these sets are linearly dependent or indepen-
dent. However, if either of the above two sets of vectors is linearly independent, so is
the remainder set. Think of the following relation:

ξ 1 j e1  þ ⋯ þ ξ n j en  = 0
# M " M-1
ξ1 ½f1 j þ⋯ þ ξn ½fn j = 0:

In the above relation, M has been defined in (24.76). If je1], ⋯, j en] are linearly
independent, we obtain ξ1 = ⋯ = ξn = 0. Then, it necessarily follows that
½f1 j , ⋯, ½fn j are linearly independent as well, and vice versa. If, the other way
around, either of the above two sets of vectors is linearly dependent, so is the
remainder set. This can be shown in a manner similar to the above. Nonetheless,
even though we construct M using these two basis sets, we do not necessarily have
detM ≠ 0. To assert detM ≠ 0, we need the condition (24.92).
In place of M of (24.80), we use a character M and define (24.87) anew as

1

1
M  : ð24:93Þ
-1

-1

Equation (24.93) can be rewritten as

ðM Þij = - δij ζ i = - δij ζ j ; ζ 1 = , ⋯, = ζ p = - ζ pþ1 = , ⋯, = - ζ pþq =


- 1: ð24:94Þ

Meanwhile, we redefine the notation ~e, ~f, and ~ξ as e, f, and ξ, respectively, and
renumber their subscripts (or superscripts) in (24.88) and (24.90) such that

fe1 , ⋯, en g ← fei1 , ⋯, ein g, ff1 , ⋯, fn g ← fi1 , ⋯, fin ,


i1 in
ξ1 , ⋯, ξn ⟵ ξ , ⋯, ξ : ð24:95Þ

Then, from (24.89) and (24.95), M can be rewritten as

ðM Þji = fj jei = ± δji ð1 ≤ i, j ≤ nÞ:


24.1 Extended Concepts of Vector Spaces 1165

The matrix M is called metric tensor. This terminology is, however, somewhat
misleading, because M is merely a matrix consisting of real numbers. Care should be
taken accordingly. If all the eigenvalues of M are positive, the quadratic form [x| x] is
positive definite (see Sect. 14.5). In that case, the metric tensor M can be expressed
using an inner product such that

ðM Þji = fj jei = fj jei = δji ð1 ≤ i, j ≤ nÞ: ð24:96Þ

This is ensured by Theorem 13.2 (Gram–Schmidt orthonormalization Theorem)


and features the Euclidean space. That space is characterized by the Euclidean metric
given by (24.96). If, on the other hand, M contains both positive and negative
eigenvalues, [x| x] can be positive or negative or zero. In the latter case, [x| x] is
said to be an indefinite (Hermitian) quadratic form. As a typical example for this, we
have the four-dimensional Minkowski space where the metric tensor M is
represented by

1 0 0 0
-1
M= 0
0 0
0
-1
0
0
: ð24:97Þ
0 0 0 -1

The matrix M is identical with η of (21.16) that has already appeared in Chap. 21.

24.1.5 Invariants

The metric tensor M plays a fundamental role in a n-dimensional real vector space. It
is indispensable for studying the structure and properties of the vector space. One of
the most important points is that M has a close relationship to the invariants of the
vector field. First, among various operators that operate on a vector, we wish to
examine what kinds of operators hold [x| x] invariant after their operation on vectors
jx] and [xj belonging to V or V, respectively.
Let the relevant operator be R. Operating R on

n ξ1
jx = i=1
ξi j ei  = ðje1  ⋯ jen Þ ⋮ ð2 V Þ,
ξn

we obtain
n
Rjx = j Rx = i,j = 1
Ri j ξj j ei , ð24:98Þ
1166 24 Basic Formalism

where ei has been defined in (24.95). Operating M of (24.76) on both sides of


(24.98), we obtain

n
Rx j = i,j = 1
R i j ξj ½ fi j : ð24:99Þ

Then, we get

n n n
½RxjRx = i,j = 1
R i j ξj k,l = 1
Rk l ξl ½fi jek  = i,j,k,l = 1
ξj Ri j M ik Rk l ξl , ð24:100Þ

where with the last equality we used (24.80) and M ik represents (i, k)-element of M
given in (24.93), i.e., M ik = ± δik . Meanwhile, we have
n
½xjx = j,l = 1
ξj M jl ξl : ð24:101Þ

From (24.100) and (24.101), we have


n
½RxjRx - ½xjx = i,j,k,l = 1
ξj Ri j M ik Rk l - M jl ξl : ð24:102Þ

Since ξ j and ξl can arbitrarily be chosen from ℝ, we must have


n
i,k = 1
Ri j M ik Rk l - M jl = 0 ð24:103Þ

so that [Rx| Rx] - [x| x] = 0 (namely, so that [x| x] can be invariant before and after
the transformation by R). In a matrix form, (24.103) can be rewritten as

RT M R = M : ð24:104Þ

Remind that an equation related to (24.104) has already appeared in Sect. 13.3 in
relation to the Gram matrix and adjoint operator; see (13.81). If real matrices are
chosen for R and G, (13.81) is translated into

RT GR = G:

Hence, this expression is virtually the same as (24.104).


Even though (24.103) and (24.104) are equivalent, (24.103) clearly shows a
covariant characteristic (vide infra). The transformation R plays a fundamental role
in a tensor equation, i.e., an equation which contains tensor(s). In what follows, the
transformation of the tensors and tensor equation along with related invariants will
form the central theme.
First, we think about the transformation of the basis vectors and its corresponding
coordinates. In Sect. 11.4, we described the transformation as
24.1 Extended Concepts of Vector Spaces 1167

x1
x = ðe1 ⋯ en Þ ⋮
xn

x1 x1 x~1
= ðe1 ⋯ en ÞPP - 1 ⋮ = e01 ⋯ e0n P - 1 ⋮ = e01 ⋯ e0n ⋮ : ð11:84Þ
xn xn x~n

Here, we rewrite this relationship as

ξ1 ξ1 ξ01
x = ð e1 ⋯ e n Þ ⋮ = ðe1 ⋯ en ÞR - 1  R ⋮ = e01 ⋯ e0n ⋮ : ð24:105Þ
ξn ξn ξ0n

Note that in (24.105) R was used instead of P-1 in (11.84). This is because in this
section we place emphasis on the coordinate transformation. Thus, as the transfor-
mation of the basis set and its corresponding coordinates of V, we have

n j
e01 ⋯ e0n = ðe1 ⋯ en ÞR - 1 or e0i = e
j=1 j
R - 1 i: ð24:106Þ

Also, we obtain

ξ01 ξ1 n
⋮ =R ⋮ or ξ0i = j=1
R i j ξj : ð24:107Þ
ξ0n ξn

We assume that the transformation of the dual basis fi of V is described such that
n
f0i = j=1
Qi j fj : ð24:108Þ

We wish to decide Q so that it can satisfy (24.70)

fi jej = f0i je0j = δij : ð24:109Þ

Inserting e0j of (24.106) and f0i of (24.108) into (24.109), we get

k k k
f0i je0j = Qi l fl jek R - 1 j
= Qi l R - 1 j
fl jek = Qi l R - 1 δl
j k

l
= Qi l R - 1 j
= fi jej = δij : ð24:110Þ

Representing (24.110) in a matrix form, we obtain


1168 24 Basic Formalism

QR - 1 = E: ð24:111Þ

Hence, we get Q = R. Thus, we find that fi undergoes the same transformation as


ξ . It is said to be a contravariant transformation.
j

Since both N of (24.71) and M of (24.76) are bijective mappings and f1 , ⋯, fn


and ff1 , ⋯, fn g are two basis sets of V, we must have a non-singular matrix that
transforms each other (Sect. 11.4). Let T be the said non-singular matrix. Then, we
have
n
fi = T f
j = 1 ij
j
ði = 1, ⋯, nÞ: ð24:112Þ

Also, we have

n n n
½xjx = j,l = 1
ξj fj jel ξl = j,l = 1
ξj T f
k = 1 jk
k
jel ξl
n n n
= j,l,k = 1
ξj T jk fk jel ξl = j,l,k = 1
ξj T jk δkl ξl = j,l = 1
ξj T jl ξl : ð24:113Þ

Comparing (24.113) with (24.101) and seeing that ξ j ( j = 1, ⋯, n) can arbitrarily


be chosen, we get

T jl = M jl i:e:, T =M:

Hence, we have
n
fi = j=1
M ij fj ði = 1, ⋯, nÞ: ð24:114Þ

Making a bilinear form between ek and both sides of (24.114), we obtain


n n
½fi jek  = j=1
M ij fj jek = j=1
M ij δjk = M ik , ð24:115Þ

where with the second equality we used (24.109).


Next, we investigate the transformation property of fi . Let R be a transformation
in a vector space V. Operating R-1 on both sides of (24.76), we obtain

R - 1 Mðek Þ = R - 1 fk ðk = 1, ⋯, nÞ: ð24:116Þ

Assuming that R-1 and M are commutative, we have

M R - 1 ek = M e0k = R - 1 fk = f0k : ð24:117Þ

Operating M on (24.106), we obtain


24.1 Extended Concepts of Vector Spaces 1169

j n j
M e0i = f0i = R - 1 i M ej = j=1
R - 1 i fj , ð24:118Þ

where with the first equality we used (24.117); with the last equality we used (24.76).
Rewriting (24.118), we get

n j
f0i = f
j=1 j
R - 1 i: ð24:119Þ

Namely, fj exhibits the same transformation property as ej.


We examine the coordinate transformation in a similar manner. To this end, let us
define a following quantity ξi such that
n
ξl  j=1
ξj M jl : ð24:120Þ

Using the definition (24.120), (24.101) is rewritten as


n
½xjx = ξξ
l=1 l
l
: ð24:121Þ

Further rewriting (24.121), we get

n n n k
½xjx = ξξ
l=1 l
l
= ξ δk ξl
k,l = 1 k l
= ξ
i,k,l = 1 k
R-1 i
R i l ξl : ð24:122Þ

Defining the transformation of ξi as

n k
ξ0i  ξ
k=1 k
R-1 i
ð24:123Þ

and using (24.107), we obtain


n
½xjx = ξ0 ξ0i :
i=1 i
ð24:124Þ

Consequently, nl= 1 ξl ξl is an invariant quantity with regard to the transformation


by R. Thus, (24.106), (24.107), (24.108), (24.119), and (24.123) give the transfor-
mation properties of the basis vectors and their corresponding coordinates.
Next, we investigate the transformation property of M of (24.93). Using (24.106)
and (24.119), we have

n j l
f0i je0k = j,l = 1
fj jel R - 1 i
R-1 k
: ð24:125Þ

Multiplying both sides by Ri s Rk t and summing over i and k, we obtain


1170 24 Basic Formalism

n n j l
i,k = 1
Ri s Rk t f0i je0k = i,k,j,l = 1
R - 1 i Ri s R - 1 k
Rk t fj jel
n n
= δj δl
j,l = 1 s t
fj jel = ½fs jet  = M st = i,k = 1
Ri s M ik Rk t , ð24:126Þ

where with the second to the last equality we used (24.115); with the last equality we
used (24.103). Putting f0i je0k = M 0ik and comparing the leftmost side and rightmost
side of (24.126), we have
n n
i,k = 1
Ri s M 0ik Rk t = i,k = 1
Ri s M ik Rk t : ð24:127Þ

Rewriting (24.127) in a matrix form, we obtain

RT M 0 R = RT M R: ð24:128Þ

Since R is a non-singular matrix, we get

M0=M: ð24:129Þ

That is, M is invariant with respect to the basis vector transformation. Hence, M
is called invariant metric tensor. In other words, the transformation R that satisfies
(24.128) and (24.129) makes [x| x] invariant and, hence, plays a major role in the
vector space characterized by the invariant metric tensor M .
As can be seen immediately from (24.93), M is represented as a symmetric matrix
and an inverse matrix of M - 1 is M itself. We denote M - 1 by

ij
M -1  M -1  ðM Þij :

Then, we have
n n
j=1
M ij M jk = δik or k=1
M jk M ki = δij : ð24:130Þ

These notations make various expressions simple. For instance, multiplying both
sides of (24.120) by M ik and summing the result over i we have
n n n n
ξM
i=1 i
ik
 i=1 j=1
ξj M ji M ik = j=1
ξj δkj = ξk :

Multiplying both sides of (24.114) by M il = M li and summing over i, we


obtain
24.1 Extended Concepts of Vector Spaces 1171

n n n n
fM
i=1 i
il
= i=1 j=1
M li M ij fj = δl fj
j=1 j
= fl : ð24:131Þ

Moreover, we get
n n n n n n
i=1
ξi fi = l=1 i=1
ξi δli fl = l=1 k=1 i=1
ξi M ik M kl fl
n
= ξf
k=1 k
k
, ð24:132Þ

where with the second and last equalities we used (24.130).


In this way, we can readily move the indices up and down. With regard to the
basis vector transformation of e.g., (24.114), we can carry out the matrix calculation
in a similar way. These manipulations are useful in relation to derivation of the
invariants.

24.1.6 Tensor Space and Tensors

In Sect. 24.1.1, we showed the definition and a tangible example of the bilinear
mapping. Advancing this notion further, we reach the multilinear mapping
[1, 2]. Suppose that a mapping S from a direct product space Vs × ⋯ × V1 to a
vector space T is described by

S : V s × ⋯ × V 1 ⟶ T,

where V1, ⋯, Vs are vector spaces and Vs × ⋯ × V1 is their direct product space. If
with x(i) 2 Vi (i = 1, ⋯, s), S xðsÞ , ⋯, xð1Þ is linear with respect to each x(i) (i = 1,
⋯, s), then S is called a multilinear mapping. As a natural extension of the tensor
product dealt with in Sect. 24.1.2, we can define a tensor product T in relation to the
multilinear mapping. It is expressed as

T = V s  ⋯  V 1: ð24:133Þ

Then, in accordance with (24.34), we have

S xðsÞ , ⋯, xð1Þ = xðsÞ  ⋯  xð1Þ :

Also, as a natural extension of Theorem 24.2, we have a following theorem.


Although the proof based on the mathematical induction can be seen in literature
[1, 2], we skip it and show the theorem here without proof.
Theorem 24.3 [1, 2] Let Vi (i = 1, ⋯, s) be vector spaces over K. Let T be an
~ be an arbitrary multilinear mapping expressed as
arbitrary vector space. Let S
1172 24 Basic Formalism

~ : V s × ⋯ × V 1 ⟶ T:
S

Also, a tensor product is defined such that

T = V s  ⋯  V 1:

Then, a linear mapping ρ : V s  ⋯  V 1 ⟶ T that satisfies the condition


described by

~ xðsÞ , ⋯, xð1Þ = ρ xðsÞ  ⋯  xð1Þ ; xðiÞ 2 V i ði = 1, ⋯, sÞ


S ð24:134Þ

is uniquely determined.
As in the case of the bilinear mapping, we have linear mappings φi (i = 1, ⋯, s)
such that

φi : V i ⟶ V 0i ði = 1, ⋯, sÞ: ð24:135Þ

Also, suppose that we have

T = V 0s  ⋯  V 01 : ð24:136Þ

Notice that (24.136) is an extension of (24.36). Then, from (24.133) to (24.136),


we can construct a multilinear mapping S~ : V s × ⋯ × V 1 ⟶T as before such that

~ xðsÞ , ⋯, xð1Þ = φs xðsÞ  ⋯  φ1 xð1Þ :


S ð24:137Þ

Also defining a linear mapping ρ : T⟶T such that

ρ  φs  ⋯  φ1 , ð24:138Þ

we get

ðφs  ⋯  φ1 Þ xðsÞ  ⋯  xð1Þ = φs xðsÞ  ⋯  φ1 xð1Þ : ð24:139Þ

We wish to consider the endomorphism with respect to φi (i = 1, ⋯, s) within a


same vector space Vi as in the case of Sect. 24.1.2. Then, we have

φi : V i ⟶ V i ði = 1, ⋯, sÞ: ð24:140Þ

In this case, we have


24.1 Extended Concepts of Vector Spaces 1173

T = V s  ⋯  V 1:

Accordingly, from (24.133), we get a following endomorphism with the


multilinear mapping φs  ⋯  φ1:

φs  ⋯  φ1 : T ⟶ T:

Replacing φi with a (n, n) matrix Ai, (24.55) can readily be generalized such that

ðAs  ⋯  A1 Þ xðsÞ  ⋯  xð1Þ

= eð1Þ  ⋯  eðsÞ ½As  ⋯  A1  ξðsÞ  ⋯  ξð1Þ , ð24:141Þ

where e(i) (i = 1, ⋯, s) denotes e1ðiÞ ⋯ enðiÞ ; ξ(i) (i = 1, ⋯, s) is an abbreviation


ξ1ðiÞ
for ⋮ . Also, in (24.141), As  ⋯  A1 operates on a vector e(i)  ξ(i) 2 T.
ξðniÞ
In the above, e(i)  ξ(i) is a symbolic expression meant by

eðiÞ  ξðiÞ = eð1Þ  ⋯  eðsÞ ξðsÞ  ⋯  ξð1Þ ,

where e(1)  ⋯  e(s) is represented by a (1, ns) row matrix; ξ(s)  ⋯  ξ(1) by a
(ns, 1) column matrix. The operator As  ⋯  A1 is represented by a (ns, ns) matrix.
As in the case of (11.39), the operation of As  ⋯  A1 is described by

ðAs  ⋯  A1 Þ xðsÞ  ⋯  xð1Þ

= eð1Þ  ⋯  eðsÞ ½As  ⋯  A1  ξðsÞ  ⋯  ξð1Þ

= eð1Þ  ⋯  eðsÞ ½As  ⋯  A1  ξðsÞ  ⋯  ξð1Þ :

That is, the associative law holds with the matrix multiplication.
We have a more important case where some of vector spaces Vi (i = 1, ⋯, s) that
are involved in the tensor product are the same space V and the remaining vector
spaces are its dual space V. The resulting tensor product is described by

T = V s  ⋯  V 1 ðV i = V or V  ; i = 1, ⋯, nÞ, ð24:142Þ

where T is a tensor product consisting of s vector spaces in total; V is a dual space of


V. The tensor product described by (24.142) is called a tensor space. If the number of
vector spaces V and that of V contained in the tensor space are q and r (q + r = s),
respectively, that tensor space is said to be a (q, r)-type tensor space or a q-th order
contravariant-r-th order covariant tensor space.
1174 24 Basic Formalism

An element of the tensor space is called a s-th rank tensor. In particular, if all
Vi (i = 1, ⋯, s) are V, the said tensor space is called a s-th rank contravariant tensor
space and the elements in that space are said to be a s-th rank contravariant tensor. If
all Vi (i = 1, ⋯, s) are V, the tensor space is called a s-th rank covariant tensor space
and the elements in that space are said to be a s-th rank covariant tensor accordingly.
As an example, we construct three types of second rank tensors described by
(i)
ðR  RÞ xð2Þ  xð1Þ = eð2Þ  eð1Þ ½R  R ξð2Þ  ξð1Þ ,

(ii)
T T
R-1  R-1 yð2Þ  yð1Þ
T T
= fTð2Þ  fTð1Þ R-1  R-1 ηTð2Þ  ηTð1Þ ,

(iii)
T T
R-1  R yð2Þ  xð1Þ = fTð2Þ  eð1Þ R-1 R

ηTð2Þ  ξð1Þ : ð24:143Þ

Equation (24.143) shows (2, 0)-, (0, 2)-, and (1, 1)-type tensors for (i), (ii), and
(iii) from the top. In (24.143), with x(i) 2 V we denote it by x(i) = e(i)  ξ(i) (i = 1, 2);
for y(i) 2 V we express it as yðiÞ = fTðiÞ  ηTðiÞ ði = 1, 2Þ. The vectors x(i) and y(i) are
referred to as a contravariant vector and a covariant vector, respectively. Since with a
dual vector of V, a basis set fðiÞ are arranged in a column vector and coordinates are
arranged in a row vector, they are transposed so that their arrangement can be
consistent with that of a vector of V. Concomitantly, R-1 has been transposed as
well. Higher rank tensors can be constructed in a similar manner as in (24.143) noted
above.
The basis sets e(2)  e(1), fTð2Þ  fTð1Þ , and fTð2Þ  eð1Þ are referred to a standard basis
(or canonical basis) of the tensor spaces V  V, V  V, and V  V, respectively.
Then, if (24.142) contains the components of V, we need to use ηTðiÞ for the
coordinate; see (24.132). These basis sets are often disregarded in the computation
process of tensors.
Equation (24.143) are rewritten using tensor components (or “coordinates” of a
tensor) as
24.1 Extended Concepts of Vector Spaces 1175
n
(i) ξ ′ ið2Þ ξ ′ kð1Þ = j,l = 1
Ri l Rk j ξðl 2Þ ξjð1Þ ,

(ii)
n l j
η0ið2Þ η0kð1Þ = j,l = 1
R-1 i
R-1 η η ,
k lð2Þ jð1Þ

(iii)
n l
η0ið2Þ ξ ′ kð1Þ = j,l = 1
R - 1 i Rk j ηlð2Þ ξjð1Þ ,

where, e.g., ξ ′ ið2Þ ξ ′ kð1Þ shows the tensor components after being transformed by R.
With these relations, see (24.62). Also, notice that the invariant tensor fj jel of
(24.125) indicates the transformation property of the tensor components of a second
rank tensor of (0, 2)-type. We have another invariant tensor δij of (1, 1)-type, whose
transformation is described by

l l i
δ0 j = R - 1 j Rik δkl = R - 1 j Ril = RR - 1 j = δij :
i

Note that in the above relation R can be any non-singular operator. The invariant
tensor δij is exactly the same as fi ðek Þ of (24.70). Although the invariant metric tensor
ðM Þij of (24.93) is determined by the inherent structure of individual vector spaces
that form a tensor product, the invariant tensor δij is common to any tensor space,
regardless of the nature of constituent vector spaces.
To show tensor components, we use a character with super/subscripts such as
e.g., Ξij (1≤i, j ≤ n), which are components of a second rank tensor of (2, 0)-type.
The transformation of Ξ is described by
n
Ξ0 =
ik
j,l = 1
Ri l Rk j Ξlj , etc:

We can readily infer the computation rule of the tensor from analogy and
generalization of the tensor product calculations shown in Example 24.2.
More generally, let us consider a (q, r)-type tensor space T. We express an
element T (i.e., a tensor) of T ðT 2 T Þ symbolically as

T = Γ  Θ  Ψ, ð24:144Þ

where Γ denotes a tensor product of the s basis vectors of V or V; Θ denotes a tensor
product of the s operators R or (R-1)T; Ψ also denotes a tensor product of the
s coordinates ξiðjÞ or ηk(l ) where 1 ≤ j ≤ q, 1 ≤ l ≤ r, 1 ≤ i, k ≤ n (n is a dimension
of the vector spaces V and V). Individual tensor products are further described by
1176 24 Basic Formalism

Γ = gðsÞ  gðs - 1Þ  ⋯  gð1Þ ,


Θ = QðsÞ  Qðs - 1Þ  ⋯  Qð1Þ ,
Ψ = ζ ðsÞ  ζ ðs - 1Þ  ⋯  ζ ð1Þ : ð24:145Þ

In (24.145), Γ represents s basis sets gðiÞ ð1 ≤ i ≤ sÞ, which denote either e(i) or
fTðiÞ ð1 ≤ i ≤ sÞ. The number of e(i) is q and that of f TðiÞ is r with q + r = s. With the
tensor product Θ, Q(i) represents R or (R-1)T with the number of R and (R-1)T being
q and r, respectively. The tensor product Ψ denotes s coordinates ζ (i) (1 ≤ i ≤ s),
which represent either ξ(i) or ηTðiÞ . The number of ξ(i) is q and that of ηTðiÞ is r with
q + r = s.
The symbols ξ(i) and ηTðiÞ represent

ξ1ðiÞ η 1ð i Þ
ξðiÞ = ⋮ ð1 ≤ i ≤ qÞ, ηTðiÞ = ⋮ ð1 ≤ i ≤ r Þ,
ξnðiÞ η nð i Þ

respectively. Thus, Ψ is described by the tensor products of column vectors. Con-


sequently, Ψ has a dimension of nq + r = ns.
In Sect. 22.5, we have introduced the field tensor Fμν, i.e., a (2, 0)-type tensor
which is represented by a matrix. To express a physical quantity in a tensor form, Ψ
is usually described as

ξ ξ ⋯ξ
Ψ  T ηððqrÞÞηððrq-- 11ÞÞ⋯ηðð11ÞÞ : ð24:146Þ

Then, Ψ is transformed by R such that

ξ ξ ⋯ξ
Ψ0 = T 0 ηððqrÞÞηððrq-- 11ÞÞ⋯ηðð11ÞÞ
α β γ ρσ⋯τ
= R-1 η ðr Þ
R-1 η ð r - 1Þ
⋯ R-1 η ð 1Þ
RξðqÞ ρ Rξðq - 1Þ σ ⋯Rξð1Þ τ T αβ⋯γ : ð24:147Þ

As can be seen above, if the tensor products of the basis vectors Γ are fixed
(normally at the standard basis) in (24.144), the tensor T and their transformation are
determined by the part Θ  Ψ in (24.144). Moreover, since the operator R is related to
the invariant metric tensor M through (24.104), the most important factor of
(24.144) is the tensor product Ψ of the coordinates. Hence, it is often the case in
practice with physics and natural science that Ψ described by (24.146) is simply
referred to as a tensor.
Let us think of the transformation property of a following (1, 1)-type tensor Ξji :
24.1 Extended Concepts of Vector Spaces 1177

n l
Ξ ′ ki = j,l = 1
R - 1 i Rk j Ξjl : ð24:148Þ

Putting k = i in (24.148) and taking the sum over i, we obtain

n n l n n
i=1
Ξ ′ ii = i,j,l = 1
R - 1 i Ri j Ξjl = δl Ξj
j,l = 1 j l
= Ξl
l=1 l
 Ξll : ð24:149Þ

Thus, (24.149) represents an invariant quantity that does not depend on the
coordinate transformation. The above computation rule is known as tensor contrac-
tion. In this case, the summation symbol is usually omitted and this rule is referred to
as Einstein summation convention (see Chap. 21). We readily infer from (24.149)
that a (q, r)-type tensor is converted into a (q - 1, r - 1)-type tensor by the
contraction.

24.1.7 Euclidean Space and Minkowski Space

As typical examples of the real vector spaces that are often dealt with in mathemat-
ical physics, we have Euclidean space and Minkowski space. Even though the
former is familiar enough even at the elementary school level, we wish to revisit
and summarize its related topics.
The metric tensor of Euclidean space is given by

M = ðM Þij = δij : ð24:150Þ

Hence, from (24.104) the transformation R that holds a bilinear form [x| x]
invariant satisfies the condition

RT ER = RT R = E, ð24:151Þ

where E is the identity operator of Euclidean space. That is, R must be an orthogonal
matrix. From (24.120), we have ξi = ξ j. Also, from (24.123) we have

n k n n
ξ0i  R-1
k
ξ
k=1 k i
= ξ
k=1 k
RT i
= ξ Ri
k=1 k k
n
= k=1
Ri k ξk : ð24:152Þ

From (24.107), ξi is the same transformation property as ξ′i. Therefore, we do not


have to distinguish a contravariant vector and covariant vector in Euclidean space.
The bilinear form [x| x] is identical with the positive definite inner product hx| xi in a
real vector space.
In the n-dimensional Minkowski space, the metric tensor ðM Þij = fi jej is
represented by
1178 24 Basic Formalism

fi jej = - δij ζ i = - δij ζ j = fj jei ; ζ 0 = - ζ k ðk = 1, ⋯, n - 1Þ = - 1: ð24:153Þ

In the four-dimensional case, M is expressed as

1 0 0 0
-1
M= 0
0 0
0
-1
0
0
: ð24:97Þ
0 0 0 -1

Minkowski space has several properties different from those of Euclidean space
because of the difference in the metric tensor. First, we make general discussion
concerning the vector spaces, followed by specific topics related to Minkowski space
as contrasted with the properties of Euclidean space. The discussion is based on the
presence of the bilinear form that has already been mentioned in Sect. 24.1.3. First,
we formally give a following definition.
Definition 24.4 [5] Let W be a subspace of a vector space V. Suppose that with

w0 (≠0) 2 W we have

8
½w0 jw = 0 for w 2 W: ð24:154Þ

Then, the subspace W is said to be degenerate. Otherwise, W is called non-


degenerate. If we have

8
½ujv = 0 for u, v 2 W,

W is called totally degenerate.


The statement that the subspaces are degenerate, non-degenerate, or totally
degenerate in the above definition is clear from the definition of subspace (Sect.
11.1) and [| ]. The existence of degenerate and totally degenerate subspaces results
from the presence of a singular vector, i.e., a vector u (≠0) that satisfies [u| u] = 0.
Note that any subspace of Euclidean space is non-degenerate and that no singular
vector is present in Euclidean space except for zero vector. If it was not the case,
from (24.154) with ∃w0 (≠0) 2 W we would have [w0| w0] = 0. It is in contradiction
to the positive definite inner product of Euclidean space.
In Sect. 14.1, we showed a direct sum decomposition of a vector space into its
subspaces. The concept of the direct sum decomposition plays an essential role in
this section as well. The definition of an orthogonal complement stated in (14.1),
however, needs to be modified in such a way that

W ⊥  jx; ½xjy = 0 for 8 jy 2 W : ð24:155Þ

In that case, W⊥ is said to be an orthogonal complement of W.


24.1 Extended Concepts of Vector Spaces 1179

In this context, we have following important proposition and theorems and see
how the present situation can be compared with the previous results obtained in the
inner product space (Sect. 14.1).
Proposition 24.1 [5] Let V be a vector space where a bilinear form [| ] is defined.
Let W be a subspace of V and let W⊥ be an orthogonal complement of W. A
necessary and sufficient condition for W \ W⊥ = {0} to hold is that W is
non-degenerate.
Proof
(i) Sufficient condition: To prove “W is non-degenerate ⟹ W \ W⊥ = {0},” we
prove its contraposition. That is, suppose W \ W⊥ ≠ {0}. Then, we have 0 ≠ ∃w0
which satisfies w0 2 W \ W⊥. From the definition of (24.155) for the orthogonal
complement and the assumption of w0 2 W⊥, we have

½w0 jw = 0 for 0 ≠ w0 2 W, 8 w 2 W: ð24:156Þ

But, (24.156) implies that W is degenerate.


(ii) Necessary condition: Let us suppose that when W \ W⊥ = {0}, W is degenerate.
Then, from the definition of (24.154), (24.156) holds with W. But, from the
definition of the orthogonal complement, (24.156) implies w0 2 W⊥. That is, we
have 0 ≠ w0 2 W \ W⊥. This is, however, in contradiction to W \ W⊥ = {0}.
Then, W must be non-degenerate. These complete the proof.
Theorem 24.4 [5] Let V be a vector space with a dimension n where a bilinear form
[| ] is defined. Let W be a subspace of V with a dimension r. If W is non-degenerate,
we have

V =W W ⊥: ð24:157Þ

Moreover, we have


ðW ⊥ Þ = W, ð24:158Þ

dimW = n - r: ð24:159Þ

Proof If W is non-degenerate, Theorem 14.1 can be employed. As already seen, we


can always construct the orthonormal basis such that [5]

fi jej = δij ζ i = δij ζ j ζ j = ± 1 ; ð24:160Þ

see (24.94) and (24.115). Notice that the terminology of orthonormal basis was
borrowed from that of the inner product space and that the existence of the ortho-
normal basis can be proven in parallel to the proof of Theorem 13.2 (Gram-Schmidt
orthonormalization theorem) and Theorem 13.3 [4].
1180 24 Basic Formalism

Suppose that with 8|x] (2V ), |x] is described as j x = n


i = 1ξ
i
j ei . Then,

n n
fj jx = i=1
ξi fj jei = i=1
ξi δji ζ i = ξj ζ j : ð24:161Þ

Note that the summation with respect to j was not taken. Multiplying both sides of
(24.161) by ζ j, we have

ξj = ζ j fj jx: ð24:162Þ

Thus, we obtain

n
j x = j=1
ζ j fj jxjej : ð24:163Þ

Since W is a non-degenerate subspace, it is always possible to choose the


orthonormal basis for it. Then, we can make {e1, ⋯, er} the basis set of W. In fact,
with ∃|w0] 2 W and 8w 2 W, describing
r r
j w0  = ηk
k=1 0
j ek , j w = j=1
ηj j ej , ð24:164Þ

we obtain
r r r
½w0 jw = ηk ηj
k,j = 1 0
fk jej = ηk ηj δkj ζ j
k,j = 1 0
= ηj ηj ζ j :
j=1 0
ð24:165Þ

For [w0| w] = 0 to hold with 8w 2 W, i.e., with 8η j 2 ℝ ( j = 1, ⋯, r), we must


have ηj0 ζ j = 0. As ζ j = ± 1 (≠0), we must have ηj0 = 0 ðj = 1, ⋯, r Þ. That is, |w0] = 0.
This means that W is non-degenerate. Thus, using the orthonormal basis we repre-
sent |x′] 2 W as

r
j x0  = j=1
ζ j fj jxjej , ð24:166Þ

where we used (24.162). Putting |x′′] = |x] - |x′] and operating ½fk j ð1 ≤ k ≤ r Þ on
both sides from the left, we get

n r
fk jx00  = j=1
ζ j fj jx fk jej - j=1
ζ j fj jx fk jej
r r
= j=1
ζ j fj jxδkj ζ j - j=1
ζ j fj jxδkj ζ j
2 2
= ζk fk jx - ζ k ½fk jx = ½fk jx - ½fk jx = 0: ð24:167Þ

Note in (24.167) fk jej = 0 with r + 1 ≤ j ≤ n. Rewriting (24.167), we have


24.1 Extended Concepts of Vector Spaces 1181

½x00 jek  = 0 ð1 ≤ k ≤ r Þ: ð24:168Þ

Since W = Span {e1, ⋯, er}, (24.168) implies that [x′′|w] = 0 with 8|w] 2 W.
Consequently, from the definition of the orthogonal complement, we have x′′ 2 W⊥.
Thus, we obtain

jx = jx0  þ jx00 , ð24:169Þ

where |x] 2 V, |x′] 2 W, and x′′ 2 W⊥. Then, from Proposition 24.1 we get

V =W W ⊥: ð24:157Þ

From Theorem 11.2, we immediately get

dimW ⊥ = n - r: ð24:159Þ

Also, from the above argument we have W⊥ = Span {er + 1, ⋯, en}. This once
again implies that W⊥ is non-degenerate as well. Then, we have


V = W⊥ ðW ⊥ Þ : ð24:170Þ

Therefore, we obtain


dimðW ⊥ Þ = n - ðn - r Þ = r: ð24:171Þ

From the definition for the orthogonal complement of (24.155), we have

⊥ 8
ðW ⊥ Þ = jx; ½xjy = 0 for j y 2 W ⊥ : ð24:172Þ

Then, if jx] 2 W, again from the definition such jx] must be contained in (W⊥)⊥.
That is, (W⊥)⊥ ⊃ W [1, 2]. But, from the assumption and (24.171), dim
(W⊥)⊥ = dim W. This implies [1, 2]


ðW ⊥ Þ = W: ð24:158Þ

These complete the proof.


Getting back to (24.93), we have two non-degenerate subspaces of V where
their basis vectors comprise |ei] (i = 1, ⋯, p) to which ½fk jei  = δki or those comprise
|ej] ( j = p + 1, ⋯, p + q = n) to which fk ej = - δkj . Let us define two such
subspaces as
1182 24 Basic Formalism

W þ = Span e1 , ⋯, ep , W - = Span epþ1 , ⋯, epþq : ð24:173Þ

Then, from Theorem 24.4 we have

V = Wþ W-: ð24:174Þ

In (24.173) we redefine p and q as

p = nþ , q = n - ; nþ þ n - = n:

We have

dimV = dimW þ þ dimW - = nþ þ n - = n: ð24:175Þ

We name W_+ and W_- the Euclidean subspace and the anti-Euclidean subspace, respectively. With respect to W_+ and W_-, we have the following theorem.

Theorem 24.5 [5] Let V be a vector space with a dimension n. Let W_+ and W_- be the Euclidean subspace and the anti-Euclidean subspace of V with the dimensions n_+ and n_-, respectively (n_+ + n_- = n). Then, there must be a totally degenerate subspace W_0 with a dimension n_0 given by

n_0 = min(n_+, n_-).    (24.176)

Proof From Theorem 24.4, we have W_+ ∩ W_- = {0}. From the definition of the totally degenerate subspace, obviously we have W_+ ∩ W_0 = W_- ∩ W_0 = {0}. (Otherwise, W_+ and W_- would be degenerate.) From this, we have

n_+ + n_0 ≤ n,  n_- + n_0 ≤ n.    (24.177)

Then, we obtain

n_0 ≤ min(n - n_+, n - n_-) = min(n_-, n_+) ≡ m.    (24.178)

Suppose n_+ = m ≤ n_-. Then, we can choose two sets of orthonormal bases {e_{i_1}, ⋯, e_{i_m}}, for which [f_{i_k} | e_{i_l}] = δ_{i_k i_l}, and {e_{j_1}, ⋯, e_{j_m}}, for which [f_{j_k} | e_{j_l}] = -δ_{j_k j_l}. Let us think of the following set W given by

W = {e_{i_1} - e_{j_1}, ⋯, e_{i_m} - e_{j_m}}.    (24.179)

Defining e_{i_s} - e_{j_s} ≡ E_s and f_{i_s} - f_{j_s} ≡ F_s (s = 1, ⋯, m), we have

[F_s | E_t] = 0 (s, t = 1, ⋯, m).    (24.180)

Let us consider the following equation:

c^1 E_1 + ⋯ + c^m E_m = 0.    (24.181)

Suppose that E_1, ⋯, E_m are linearly dependent. Then, without loss of generality we assume c^1 ≠ 0. Dividing both sides of (24.181) by c^1 and expressing (24.181) in terms of e_{i_1}, we obtain

e_{i_1} = e_{j_1} + (c^2/c^1)(e_{i_2} - e_{j_2}) + ⋯ + (c^m/c^1)(e_{i_m} - e_{j_m}).    (24.182)

Equation (24.182) shows that e_{i_1} is described by a linear combination of (2m - 1) linearly independent vectors, but this contradicts the fact that the 2m vectors e_{i_1}, ⋯, e_{i_m}; e_{j_1}, ⋯, e_{j_m} are linearly independent. Thus, E_1, ⋯, E_m must be linearly independent. Hence, Span{E_1, ⋯, E_m} forms an m-dimensional vector space. Equation (24.180) implies that it is a totally degenerate subspace of V with a dimension of m. This immediately means that

W_0 = Span{e_{i_1} - e_{j_1}, ⋯, e_{i_m} - e_{j_m}}.    (24.183)

Then, from the assumption we have n_0 = m. Namely, the equality holds in (24.178), so that (24.176) is established.

If, on the other hand, we assume n_- = m ≤ n_+, proceeding in a similar way to the above discussion we reach the same conclusion, i.e., n_0 = m. These complete the proof.

Since we have made general discussion and remarks on the vector spaces where the bilinear form is defined, we wish to show examples with the Minkowski spaces M^4 and M^3.

Example 24.3 We study several properties of the Minkowski space, taking the four-dimensional space M^4 and the three-dimensional space M^3 as examples. For both M^4 and M^3, the dimension of a totally degenerate subspace is 1 (Theorem 24.5). In M^4, n_+ and n_- of (24.176) are 1 and 3, respectively; see (24.97). The space M^4 is denoted by

M^4 = Span{e_0, e_1, e_2, e_3},    (24.184)

where e_0, e_1, e_2, and e_3 are defined according to (24.95) with

[f_i | e_j] = -δ_{ij} ζ^i = -δ_{ij} ζ^j = [f_j | e_i];  ζ^0 = -ζ^k (k = 1, ⋯, 3) = -1.    (24.185)

With the notation ζ^k (k = 0, ⋯, 3), see (22.323).


According to Theorem 24.4, we have, e.g., the following orthogonal decomposition:

M^4 = Span{e_0} ⊕ Span{e_0}^⊥,

where Span{e_0}^⊥ = Span{e_1, e_2, e_3}. Since [f_0 | e_0] = 1, e_0 is a timelike vector. As [f_1 + f_2 + f_3 | e_1 + e_2 + e_3] = -3, e_1 + e_2 + e_3 is a spacelike vector, etc. Meanwhile, according to Theorem 24.5, the dimension n_0 of the totally degenerate subspace W_0 is n_0 = min(n_+, n_-) = min(1, 3) = 1. Then, for instance, we have

W_0 = Span{|e_0] + |e_1]}.    (24.186)

Proposition 24.1 tells us that it is impossible to get an orthogonal decomposition using a degenerate subspace. In fact, as an orthogonal complement to W_0 we have

W_0^⊥ = Span{e_0 + e_1, e_2, e_3}.    (24.187)

Then, W_0 ∩ W_0^⊥ = W_0 ∋ |e_0] + |e_1] ≠ 0. Note that W_0^⊥ can alternatively be expressed as [5]

W_0^⊥ = W_0 ⊕ Span{e_2, e_3}.    (24.188)

The presence of the totally degenerate subspace W_0 has a special meaning in the theory of special relativity. The collection of all singular vectors forms a light cone. Since the dimension of W_0 is one, it is spanned by a single lightlike vector (i.e., singular vector), e.g., |e_0] + |e_1]. It may be chosen from among any of the singular vectors. Let us denote a singular vector by u_λ (λ ∈ Λ) and the collection of all the singular vectors that form the light cone by L. Then, we have

L = ∪_{λ∈Λ} u_λ.    (24.189)

With the above notation, see Sect. 6.1.1. Notice that the set L is not a vector space (vide infra). The totally degenerate subspace W_0 can be described by

W_0 = Span{u_λ},

where u_λ is any single singular vector.
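The singular character of, e.g., |e_0] + |e_1] is easily checked numerically. The sketch below (Python with NumPy; the helper name bilin is ours) evaluates the bilinear form [x | y] in the orthonormal basis of M^4:

    import numpy as np

    eta = np.diag([1.0, -1.0, -1.0, -1.0])   # metric matrix of M^4

    def bilin(x, y):
        # bilinear form [x|y] expressed in the orthonormal basis
        return x @ eta @ y

    e0, e1, e2 = np.eye(4)[0], np.eye(4)[1], np.eye(4)[2]
    u = e0 + e1                              # a lightlike (singular) vector spanning W_0

    print(bilin(e0, e0))   #  1.0: e0 is timelike
    print(bilin(u, u))     #  0.0: u lies on the light cone L
    print(bilin(u, e2))    #  0.0: u is orthogonal to e2 as well, cf. (24.187)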


We wish to consider the coordinate representation of the light cone in the orthogonal coordinate system. In the previous chapters, by the orthogonal coordinate system we meant a two- or three-dimensional Euclidean space, i.e., a real inner product space accompanied by an orthonormal basis. In this chapter, however, we use the terminologies of orthogonal coordinate system and orthonormal basis if the basis vectors satisfy the relations of (24.153). Let the vector w ∈ L be expressed as

w = Σ_{i=0}^3 ξ^i e_i.    (24.190)

Then, we obtain

[Σ_{i=0}^3 ξ^i e_i | Σ_{k=0}^3 ξ^k e_k] = (ξ^0)^2 - (ξ^1)^2 - (ξ^2)^2 - (ξ^3)^2 = 0.    (24.191)

That is,

(ξ^1)^2 + (ξ^2)^2 + (ξ^3)^2 = (ξ^0)^2.    (24.192)

If we regard the time coordinate (represented by ξ^0) as a parameter, the light cone can be seen as an expanding (or shrinking) hypersphere in M^4 with elapsing time (in both the past and future directions).

In Fig. 24.1a, we depict a geometric structure including the light cone of the three-dimensional Minkowski space M^3 so that the geometric feature can appeal to the eye. There, we have

(ξ^1)^2 + (ξ^2)^2 = (ξ^0)^2,    (24.193)

where ξ^1 and ξ^2 represent the coordinates of the x- and y-axis, respectively; ξ^0 indicates the time coordinate t. Figure 24.1b shows how to make a right circular cone contained in the light cone.
The light cone comprises the whole collection of singular vectors u_λ (λ ∈ Λ). Geometrically, it consists of a couple of right circular cones that come face-to-face with each other at the vertices, with the axis (i.e., a line that connects the vertex and the center of the bottom circle) shared in common by the two right circular cones in M^3 (see Fig. 24.1a). A lightlike vector (i.e., singular vector) is represented by a straight line obliquely extending from the origin to both the future and past directions in the orthogonal coordinate system. Since the individual lightlike vectors form the light cone, it is a kind of ruled surface. A circle expanding (or shrinking) with time ξ^0 according to (24.193) is a conic section of the right circular cone shaping the light cone.

In Fig. 24.1a, the light cone comprises a two-dimensional figure, and so one might well wonder if the light cone is a two-dimensional vector space. Think of, however, e.g., |e_0] + |e_1] and |e_0] - |e_1] within the light cone. If those vectors were basis vectors, (|e_0] + |e_1]) + (|e_0] - |e_1]) = 2|e_0] would be a vector of the light cone as well. That is, however, evidently not the case, because the vector |e_0] is located


Fig. 24.1 Geometric structure including the light cone of the three-dimensional Minkowski space M^3. (a) The light cone consists of a couple of right circular cones that come face-to-face with each other at the vertices, with the axis (i.e., a line that connects the vertex and the center of the bottom circle) shared in common. A lightlike vector (i.e., singular vector) is represented by an oblique arrow extending from the origin to both the future and past directions. Individual lightlike vectors form the light cone as a ruled surface. (b) Simple kit that helps visualize the light cone. To make it, follow the next procedures: (i) Take a thick sheet of paper and cut out two circles to shape them into sectors with a central angle θ = √2 π radian ≈ 254.6°. (ii) Bring together the two sides of each sector so that it becomes a right circular cone with a vertical angle of a right angle (i.e., 90°). (iii) Bring together the two right circular cones so that they are jointed at their vertices with the axis shared in common to shape the light cone depicted in (a) above

outside the light cone. Whereas a vector "inside" the light cone is timelike, one "outside" the light cone is spacelike.
We have, e.g.,

M^3 = Span{e_0} ⊕ Span{e_0}^⊥,    (24.194)

where Span{e_0}^⊥ = Span{e_1, e_2}. Since [f_0 | e_0] = 1, e_0 is a timelike vector. As [f_1 + f_2 | e_1 + e_2] = -2, e_1 + e_2 is a spacelike vector, etc. Meanwhile, the dimension n_0 of the totally degenerate subspace W̃_0 of M^3 is n_0 = min(n_+, n_-) = min(1, 2) = 1. Then, for instance, as in (24.186), we have

W̃_0 = Span{|e_0] + |e_1]}.    (24.195)

Also, as an orthogonal complement to W̃_0, we obtain

W̃_0^⊥ = Span{e_0 + e_1, e_2} or W̃_0^⊥ = W̃_0 ⊕ Span{e_2}.    (24.196)

The orthogonal complement W̃_0^⊥ forms a tangent space to the light cone L. We have W̃_0 ∩ W̃_0^⊥ = W̃_0 ∋ |e_0] + |e_1] ≠ 0.
As described above, we have overviewed the characteristics of the extended concept of vector spaces and studied several properties that the inner product space and Euclidean space lack. Examples included the singular vectors and degenerate subspaces as well as their associated properties. We have already seen that the indefinite metric caused intractability in the field quantization. The presence of the singular vectors is partly responsible for it.

The theory of vector spaces is a fertile ground for mathematics and physics. Indeed, the theory deals with a variety of vector spaces such as an inner product space (Chap. 13), Euclidean space, Minkowski space, etc. Besides them, a spinor space is also contained, even though we do not go into detail about it. Interested readers are referred to the literature on this topic [5]. Yet, the spinor space naturally makes an appearance in the Dirac equation as a representation space. We will deal with the Dirac equation from a different angle in the following sections.

24.2 Lorentz Group and Lorentz Transformations

In the previous section, we have learned that the transformation R that makes the bilinear form [ | ] invariant must satisfy the condition described by

R^T M R = M.    (24.104)

At the same time, (24.104) has given a condition for the metric tensor M to be invariant with respect to the vector (or coordinate) transformation R in a vector space where [ | ] is defined. In the Minkowski space M^4, R of (24.104) is called the Lorentz transformation and written as Λ according to the custom. Also, M is denoted by η. Then, we have

η = Λ^T η Λ.    (24.197)

Equation (24.197) implies that the matrix η is determined independently of the choice of the inertial frame of reference. Or rather, the mathematical formulation of the theory of special relativity has been established in such a way that η can be invariant in any inertial frame of reference. This has had its origin in the invariance of the world interval of the events (see Sect. 21.2.2).

As already mentioned in Sect. 21.2, the Lorentz group consists of all the Lorentz transformations Λ. The Lorentz group is usually denoted by O(3, 1). On the basis of (24.197), we define O(3, 1) as a set that satisfies the following relation described by
O(3, 1) = {Λ ∈ GL(4, ℝ); Λ^T η Λ = η}.    (24.198)

Suppose that Λ_1, Λ_2 ∈ O(3, 1). Then, we have

(Λ_1 Λ_2)^T η Λ_1 Λ_2 = Λ_2^T Λ_1^T η Λ_1 Λ_2 = Λ_2^T (Λ_1^T η Λ_1) Λ_2 = Λ_2^T η Λ_2 = η.    (24.199)

That is, Λ_1 Λ_2 ∈ O(3, 1). With the identity operator E, we have

E^T η E = EηE = ηEE = η.    (24.200)

Multiplying both sides of Λ_1^T η Λ_1 = η by (Λ_1^T)^{-1} from the left and (Λ_1)^{-1} from the right, we obtain

η = (Λ_1^T)^{-1} η (Λ_1)^{-1} = (Λ_1^{-1})^T η (Λ_1)^{-1},    (24.201)

where we used (Λ_1^T)^{-1} = (Λ_1^{-1})^T. From (24.199) to (24.201), we find that O(3, 1) certainly satisfies the definitions of a group; see Sect. 16.1. Taking the determinant of (24.197), we have

det Λ^T det η det Λ = (det Λ)(-1)(det Λ) = det η = -1,    (24.202)

where with the first equality we used det Λ^T = det Λ [1, 2]. That is, we obtain

(det Λ)^2 = 1 or det Λ = ±1.    (24.203)

Lorentz groups are classified according to the sign of the determinant and the (0, 0) component (i.e., Λ^0_0) of the Lorentz transformation matrix. Using the notation of tensor elements, (24.197) can be written as

η_{ρσ} = η_{μν} Λ^μ_ρ Λ^ν_σ.    (24.204)

In (24.204) putting ρ = σ = 0, we get

η_{00} (Λ^0_0)^2 + Σ_{i=1}^3 η_{ii} (Λ^i_0)^2 = η_{00},

namely we have

(Λ^0_0)^2 = 1 + Σ_{i=1}^3 (Λ^i_0)^2.    (24.205)

Hence,
(Λ^0_0)^2 ≥ 1.    (24.206)

Therefore, we obtain either

Λ^0_0 ≥ 1 or Λ^0_0 ≤ -1.    (24.207)

Notice that in (24.207) the case of Λ^0_0 ≥ 1 is not accompanied by the time reversal, whereas the case of Λ^0_0 ≤ -1 is accompanied by the time reversal. Combining (24.203) and (24.207), we get four combinations for the Lorentz groups, and these combinations correspond to the individual connected components of the groups (see Chap. 20). Table 24.1 summarizes them [6-8]. The connected component that contains the identity element is an invariant subgroup of O(3, 1); see Theorem 20.9. This component is termed the proper orthochronous Lorentz group and denoted by SO_0(3, 1).

Table 24.1 Four connected components of the Lorentz group^a

Connected component   Nomenclature                  Λ^0_0        det Λ
L_0 ≡ SO_0(3, 1)      Proper orthochronous          Λ^0_0 ≥ 1    +1
L_1                   Improper orthochronous        Λ^0_0 ≥ 1    -1
L_2                   Improper anti-orthochronous   Λ^0_0 ≤ -1   -1
L_3                   Proper anti-orthochronous     Λ^0_0 ≤ -1   +1

^a With the notations L_0, L_1, L_2, and L_3, see Sect. 24.9.2
The notation differs from literature to literature, and so care should be taken accordingly. In this book, to denote the proper orthochronous Lorentz group, we use SO_0(3, 1) according to Hirai [6], in which the index 0 means the orthochronous transformation and S stands for the proper transformation (i.e., pure rotation without spatial inversion). In Table 24.1, we distinguish the improper Lorentz transformation from the proper transformation according to whether the determinant of the transformation matrix Λ is -1 or +1 [7]. We will come back to the discussion about the construction of the Lorentz group in Sect. 24.9.2 in terms of the connectedness.
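The defining relation (24.198) and the classification of Table 24.1 can be examined numerically. The following sketch (Python with NumPy; the four sample matrices are ours) tests Λ^T η Λ = η and reads off det Λ and Λ^0_0 for one representative of each connected component:

    import numpy as np

    eta = np.diag([1.0, -1.0, -1.0, -1.0])

    def in_O31(L):
        # membership test (24.198): Lambda^T eta Lambda = eta
        return np.allclose(L.T @ eta @ L, eta)

    w = 0.7
    boost = np.eye(4)                        # a pure boost: proper orthochronous
    boost[[0, 0, 3, 3], [0, 3, 0, 3]] = [np.cosh(w), np.sinh(w), np.sinh(w), np.cosh(w)]
    P = np.diag([1.0, -1.0, -1.0, -1.0])     # spatial inversion: improper orthochronous
    T = np.diag([-1.0, 1.0, 1.0, 1.0])       # time reversal: improper anti-orthochronous
    PT = P @ T                               # proper anti-orthochronous

    for L in (boost, P, T, PT):
        print(in_O31(L), round(np.linalg.det(L)), L[0, 0])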

24.2.1 Lie Algebra of the Lorentz Group

The Lie algebra of the Lorentz group is represented by a (4, 4) real matrix. We have an important theorem about the Lie algebra of the Lorentz group.

Theorem 24.6 [8] Let the Lie algebra of the Lorentz group O(3, 1) be denoted by o(3, 1). Then, we have

o(3, 1) = {X; X^T η + ηX = 0, X ∈ gl(4, ℝ)}.    (24.208)



Proof Suppose that X ∈ o(3, 1). Then, from Definition 20.3, exp(sX) ∈ O(3, 1) with ∀s (s: real number). Moreover, from (24.198), we have

(exp sX)^T η (exp sX) = η.    (24.209)

Differentiating (24.209) with respect to s, we obtain

X^T (exp sX)^T η (exp sX) + (exp sX)^T η X (exp sX) = 0.    (24.210)

Taking the limit s → 0, we get

X^T η + ηX = 0.    (24.211)

Conversely, suppose that X satisfies (24.211). Then, using (15.31), we obtain

(exp sX)^T η = [exp(sX^T)] η = [Σ_{ν=0}^∞ (1/ν!)(sX^T)^ν] η = Σ_{ν=0}^∞ (s^ν/ν!)(X^T)^ν η.    (24.212)

In (24.212), we have

(X^T)^ν η = (X^T)^{ν-1} X^T η = (X^T)^{ν-1}(-ηX) = (X^T)^{ν-2} X^T(-ηX)
          = (X^T)^{ν-2} η(-1)^2 X^2 = ⋯ = η(-1)^ν X^ν.    (24.213)

Substituting (24.213) into (24.212), we get

(exp sX)^T η = Σ_{ν=0}^∞ (s^ν/ν!) η(-1)^ν X^ν = η Σ_{ν=0}^∞ (s^ν/ν!)(-X)^ν = η exp(-sX)
             = η (exp sX)^{-1},    (24.214)

where with the last equality we used (15.28). Multiplying both sides of (24.214) by exp sX from the right, we obtain

(exp sX)^T η (exp sX) = η.    (24.215)

This implies that exp sX ∈ O(3, 1). This means X ∈ o(3, 1). These complete the proof.
Let us seek a tangible form of a Lie algebra of the Lorentz group. Suppose that X ∈ o(3, 1) of (24.208). Assuming that

X = ( a  b  c  d
      e  f  g  h
      p  q  r  s
      t  u  v  w )  (a, ⋯, w: real)    (24.216)

and substituting X of (24.216) into (24.208), we get

( 2a     b - e     c - p     d - t
  b - e  -2f       -(g + q)  -(h + u)
  c - p  -(g + q)  -2r       -(s + v)
  d - t  -(h + u)  -(s + v)  -2w      ) = 0.    (24.217)

Then, we obtain

a = f = r = w = 0, b = e, c = p, d = t, g = -q, h = -u, s = -v.    (24.218)

The resulting form of X is

X = ( 0   b   c   d
      b   0   g   h
      c  -g   0   s
      d  -h  -s   0 ).    (24.219)

Since we have ten conditions for sixteen elements, we must have six independent representations. We show a tangible example of these representations as follows:

( 0  0  0  0     ( 0  0  0  0     ( 0  0  0  0
  0  0  0  0       0  0  0  1       0  0 -1  0
  0  0  0 -1  ,    0  0  0  0  ,    0  1  0  0  ;
  0  0  1  0 )     0 -1  0  0 )     0  0  0  0 )

( 0  1  0  0     ( 0  0  1  0     ( 0  0  0  1
  1  0  0  0       0  0  0  0       0  0  0  0
  0  0  0  0  ,    1  0  0  0  ,    0  0  0  0  .    (24.220)
  0  0  0  0 )     0  0  0  0 )     1  0  0  0 )
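One can check directly that each of the six matrices of (24.220) satisfies the condition X^T η + ηX = 0 of (24.208). A minimal sketch (Python with NumPy; the names A1, ..., B3 anticipate (24.221) below):

    import numpy as np

    eta = np.diag([1.0, -1.0, -1.0, -1.0])

    A1 = np.zeros((4, 4)); A1[2, 3], A1[3, 2] = -1, 1   # spatial rotation generators
    A2 = np.zeros((4, 4)); A2[1, 3], A2[3, 1] = 1, -1
    A3 = np.zeros((4, 4)); A3[1, 2], A3[2, 1] = -1, 1
    B1 = np.zeros((4, 4)); B1[0, 1], B1[1, 0] = 1, 1    # Lorentz boost generators
    B2 = np.zeros((4, 4)); B2[0, 2], B2[2, 0] = 1, 1
    B3 = np.zeros((4, 4)); B3[0, 3], B3[3, 0] = 1, 1

    for X in (A1, A2, A3, B1, B2, B3):
        print(np.allclose(X.T @ eta + eta @ X, 0))      # True for all six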

The Lie algebra of the Lorentz group can be expressed as linear combinations of the above matrices (i.e., basis vectors of this Lie algebra). This Lie algebra is accordingly a six-dimensional vector space V^6. Note that in (24.220) the upper three matrices are anti-Hermitian and the lower three matrices are Hermitian. We symbolically represent the upper three matrices and the lower three matrices as

A ≡ (A_1, A_2, A_3) and B ≡ (B_1, B_2, B_3),    (24.221)


respectively, in the order from the left of (24.220). Regarding the upper three matrices, the (3, 3) principal submatrices excluding the (0, 0) element are identical with A_x, A_y, and A_z in the shown order; see (20.26). In other words, these three matrices represent the spatial rotation in ℝ^3. Moreover, these matrices have been decomposed into a direct sum such that

A_1 = {0} ⊕ A_x,  A_2 = {0} ⊕ A_y,  A_3 = {0} ⊕ A_z,    (24.222)

where {0} denotes the (0, 0) element of A. The remaining three matrices B are called Lorentz boosts.

Expressing the magnitudes related to A and B as a ≡ (a_1, a_2, a_3) and b ≡ (b_1, b_2, b_3), respectively, we succinctly describe any Lorentz transformation Λ as

Λ = exp(a · A) exp(b · B).    (24.223)

The former factor of (24.223) is related to a spatial rotation and represented by a unitary matrix (see Sect. 15.2). The latter factor of (24.223) is pertinent to the Lorentz boost and represented by a Hermitian matrix (Sect. 25.3). Since the Lorentz transformation is represented by a real matrix, a matrix describing the Lorentz boost is symmetric.

For example, with the former factor, we obtain, e.g.,

exp(θA_3) = ( 1  0      0       0
              0  cos θ  -sin θ  0
              0  sin θ  cos θ   0
              0  0      0       1 ),    (24.224)

where we put a_3 = θ. With the Lorentz boost, we have, e.g.,

exp(-ωB_3) = ( cosh ω   0  0  -sinh ω
               0        1  0  0
               0        0  1  0
               -sinh ω  0  0  cosh ω ),    (24.225)

where we put b_3 = -ω.

In (24.224), we have 0 ≤ θ ≤ 2π; that is, the parameter space of θ is bounded. In (24.225), on the other hand, we have -∞ < ω < ∞, where the parameter space of ω is unbounded. The former group is called a compact group, whereas the latter is said to be a non-compact group accordingly.

In general, with a compact group it is possible to construct a unitary representation [8]. For a non-compact group, however, it is not necessarily possible to construct a unitary representation. In fact, although exp(θA_3) of (24.224) is unitary [see (7)' of Sect. 15.2], exp(-ωB_3) in (24.225) is not unitary, but Hermitian. These features of the Lorentz group lead to various intriguing properties (vide infra).
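To see the generators at work, one may exponentiate them numerically and compare with the closed forms (24.224) and (24.225). A sketch assuming SciPy is available (A3 and B3 are the corresponding matrices of (24.220); the sample angles are ours):

    import numpy as np
    from scipy.linalg import expm

    A3 = np.zeros((4, 4)); A3[1, 2], A3[2, 1] = -1.0, 1.0   # rotation about the z-axis
    B3 = np.zeros((4, 4)); B3[0, 3], B3[3, 0] = 1.0, 1.0    # boost along the z-axis

    theta, omega = 0.4, 1.2
    R = expm(theta * A3)                     # reproduces (24.224)
    L = expm(-omega * B3)                    # reproduces (24.225)

    print(np.allclose(R @ R.T, np.eye(4)))   # True: the rotation is orthogonal (real unitary)
    print(np.allclose(L, L.T))               # True: the boost is symmetric (Hermitian)
    print(np.allclose(L @ L.T, np.eye(4)))   # False: the boost is not unitary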

24.2.2 Successive Lorentz Transformations [9]

In the last sections, we have studied how the Lorentz transformation is dealt with
using the bilinear form and Lie algebra in the Minkowski space. In the subsequent
sections, we examine how the Dirac equation is transformed through the medium of
the Lorentz transformation within the framework of standard matrix algebra. To
examine the constitution of the Dirac equation appropriately, we need to deal with
the successive Lorentz transformations. These transformations can be well under-
stood by the idea of the moving coordinate systems (Sect. 17.4.2).
Let us express a four-vector X (i.e., space-time vector) in the Minkowski space as

X = (|e_0] |e_1] |e_2] |e_3]) (x^0, x^1, x^2, x^3)^T,    (24.226)

where {|e_0], |e_1], |e_2], |e_3]} is the basis set of the Minkowski space defined in Sect. 24.1.7; (x^0, x^1, x^2, x^3)^T are the corresponding coordinates. We will use the symbol e_i (i = 0, 1, 2, 3) instead of |e_i] in case confusion is unlikely to happen.
Following Sect. 11.4, we specify the transformation in such a way that the said
transformation keeps the vector X unaltered. In terms of the special theory of
relativity, a vector is associated with a certain space-time point where an event has
taken place. Consequently, the vector X should be the same throughout the inertial
frames of reference that are connected to one another via the Lorentz
transformations.
In what follows, we express a four-vector X using the shorthand notation

X ≡ ex = (|e_0] |e_1] |e_2] |e_3]) (x^0, x^1, x^2, x^3)^T,

where e stands for the basis set of M^4 and x denotes the corresponding coordinates (i.e., a column vector). Let a transformation be expressed as P. We rewrite X as

X = ex = eP^{-1}(Px) = e′x′,    (24.227)

where e and e′ are the sets of basis vectors connected by P, with e′ = eP^{-1} and Px = x′. Once again, we note that in the case where the Lorentz transformation is relevant to the expression, unlike (11.84) of Sect. 11.4 the coordinate representation is placed ahead of the basis vector representation (see Sect. 24.1.5). Next, let us think about several successive transformations P_1, ⋯, P_n. We have

ex = eP_1^{-1}⋯P_n^{-1}(P_n⋯P_1 x) = e′(P_n⋯P_1 x) = e′x′.    (24.228)

From the last equality, we obtain

x′ = P_n⋯P_1 x,    (24.229)

because e′ is the basis set of vectors.


Now, we wish to apply the above general point of view to our present case. Let us think of three successive Lorentz transformations. Of these, two transformations are spatial rotations and the remaining one is a Lorentz boost. We have four pertinent inertial frames of reference. We denote them by O, O_θ, O_ϕ, and O′. Also, we distinguish the associated basis vectors and corresponding coordinates by

e, x (O);  θ, x_θ (O_θ);  ϕ, x_ϕ (O_ϕ);  e′, x′ (O′).    (24.230)

We proceed with the calculations from the assumption that an electron is at rest in
the frame O, where the Dirac equation can readily be solved. We perform the
successive Lorentz transformations that start with O and finally reach O′ via Oθ
and Oϕ such that

O (e) → O_θ (θ) → O_ϕ (ϕ) → O′ (e′).    (24.231)

Then, the successive transformation of the basis vectors is described by

eΛ_b Λ_θ^{-1} Λ_ϕ^{-1} = e′,    (24.232)

where the individual transformations are given in the moving coordinate system. Figure 24.2a depicts these transformations similarly to the case of Fig. 17.17. Equation (24.232) can be rewritten as

e = e′ Λ_ϕ Λ_θ Λ_b^{-1}.    (24.233)

In accordance with Fig. 24.2a, Fig. 24.2b shows how the motion of the electron looks from the frame O′, where the electron is moving at a velocity v in the direction defined by a zenithal angle θ and an azimuthal angle ϕ. Correspondingly, from (24.229) we have

x′ = Λ_ϕ Λ_θ Λ_b^{-1} x.    (24.234)

As the full description of (24.234), we get

(x′^0, x′^1, x′^2, x′^3)^T = Λ_ϕ Λ_θ Λ_b^{-1} (x^0, x^1, x^2, x^3)^T.    (24.235)

Thus, we find that the three successive Lorentz transformations have produced the total Lorentz transformation Λ described by

Λ = Λ_ϕ Λ_θ Λ_b^{-1},    (24.236)

where Λ_b^{-1} related to the Lorentz boost is the inverse matrix of (24.225). We compute the combined Lorentz transformation Λ such that

Λ = Λ_ϕ Λ_θ Λ_b^{-1}
= ( 1  0      0       0
    0  cos ϕ  -sin ϕ  0
    0  sin ϕ  cos ϕ   0
    0  0      0       1 )
× ( 1  0       0  0
    0  cos θ   0  sin θ
    0  0       1  0
    0  -sin θ  0  cos θ )
× ( cosh ω  0  0  sinh ω
    0       1  0  0
    0       0  1  0
    sinh ω  0  0  cosh ω )

= ( cosh ω              0             0       sinh ω
    cos ϕ sin θ sinh ω  cos ϕ cos θ   -sin ϕ  cos ϕ sin θ cosh ω
    sin ϕ sin θ sinh ω  sin ϕ cos θ   cos ϕ   sin ϕ sin θ cosh ω
    cos θ sinh ω        -sin θ        0       cos θ cosh ω ).    (24.237)

Equation (24.237) describes a general form of the Lorentz transformation with respect to the polar coordinate representation (see Fig. 24.2b). Notice that the transformation Λ expressed as (24.237) satisfies the criterion of (24.197). The confirmation is left for the readers.
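The confirmation can also be carried out numerically. The following sketch (Python with NumPy; the sample angles are ours) assembles Λ from the three factors of (24.237) and tests the criterion (24.197):

    import numpy as np

    phi, theta, omega = 0.3, 0.8, 1.1
    eta = np.diag([1.0, -1.0, -1.0, -1.0])

    L_phi = np.eye(4)
    L_phi[1:3, 1:3] = [[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]]
    L_theta = np.eye(4)
    L_theta[[1, 1, 3, 3], [1, 3, 1, 3]] = [np.cos(theta), np.sin(theta), -np.sin(theta), np.cos(theta)]
    Lb_inv = np.eye(4)
    Lb_inv[[0, 0, 3, 3], [0, 3, 0, 3]] = [np.cosh(omega), np.sinh(omega), np.sinh(omega), np.cosh(omega)]

    Lam = L_phi @ L_theta @ Lb_inv               # the total transformation (24.236)
    print(np.allclose(Lam.T @ eta @ Lam, eta))   # True: (24.197) holds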
The Lorentz boost Λ_b^{-1} has been defined in reference to the frame O. It is worth examining how the said boost is described in reference to the frame O′. For this purpose, we make the most of the general method that was developed in Sect. 17.4.2. It is more convenient to take the reverse procedure to (24.231) so that the general method can directly be applied to the present case. That is, think of the following transformations

O′ (e′) → O_ϕ (ϕ) → O_θ (θ) → O (e)    (24.238)

in combination with the inverse operators to those shown in Fig. 24.2a. Then, use (17.85) and (17.91) with k = n = 3. Deleting R_3^{(1)}, we get

R_1 R_2^{(1)} R_3^{(2)} (R_1 R_2^{(1)})^{-1} = R_3.    (24.239)

Substituting Λ_ϕ, Λ_θ, and Λ_b^{-1} for R_1, R_2^{(1)}, and R_3^{(2)}, respectively, and defining R_3 ≡ Λ̃_b^{-1}, we obtain

Fig. 24.2 Successive Lorentz transformations and their corresponding coordinate systems. (a) Lorentz boost (Λ_b) followed by two spatial rotations (Λ_θ^{-1} and Λ_ϕ^{-1}). The diagram shows the successive three transformations of the basis vectors. (b) Geometry of the electron motion. The electron is moving at a velocity v in the direction defined by a zenithal angle θ and an azimuthal angle ϕ in reference to the frame O′

Λ̃_b^{-1} = Λ_ϕ Λ_θ Λ_b^{-1} (Λ_ϕ Λ_θ)^{-1}.    (24.240)

Further defining the combined rotation-related unitary transformation Λ_u in the Minkowski space as

Λ_u ≡ Λ_ϕ Λ_θ,    (24.241)

we get

Λ = Λ_u Λ_b^{-1}.    (24.242)

Also, we have

Λ̃_b^{-1} = Λ_u Λ_b^{-1} Λ_u^†.    (24.243)

The operator Λ̃_b^{-1} represents the Lorentz boost viewed in reference to the frame O′, whereas the operator Λ_b^{-1} is given in reference to O_θ with respect to the same boost. Both transformations are connected to each other through the unitary similarity transformation and belong to the same conjugacy class, and so they have the same physical meaning. The matrix representation of Λ̃_b^{-1} is given by
Λ̃_b^{-1} =
( cosh ω              cos ϕ sin θ sinh ω                         sin ϕ sin θ sinh ω                         cos θ sinh ω
  cos ϕ sin θ sinh ω  cos²ϕ cos²θ + sin²ϕ + cos²ϕ sin²θ cosh ω   cos ϕ sin ϕ sin²θ (cosh ω - 1)             cos ϕ cos θ sin θ (cosh ω - 1)
  sin ϕ sin θ sinh ω  cos ϕ sin ϕ sin²θ (cosh ω - 1)             sin²ϕ cos²θ + cos²ϕ + sin²ϕ sin²θ cosh ω   sin ϕ cos θ sin θ (cosh ω - 1)
  cos θ sinh ω        cos ϕ cos θ sin θ (cosh ω - 1)             sin ϕ cos θ sin θ (cosh ω - 1)             sin²θ + cos²θ cosh ω ).    (24.244)

The trace χ of (24.244) equals

χ = 2 + 2 cosh ω,    (24.245)

which is the same as that of Λ_b^{-1}, reflecting the fact that the unitary similarity transformation keeps the trace unchanged.
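A short numerical sketch (Python with NumPy, rebuilding the factor matrices used in the previous sketch) confirms (24.240), the trace value (24.245), and a couple of entries of (24.244):

    import numpy as np

    phi, theta, omega = 0.3, 0.8, 1.1
    L_phi = np.eye(4)
    L_phi[1:3, 1:3] = [[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]]
    L_theta = np.eye(4)
    L_theta[[1, 1, 3, 3], [1, 3, 1, 3]] = [np.cos(theta), np.sin(theta), -np.sin(theta), np.cos(theta)]
    Lb_inv = np.eye(4)
    Lb_inv[[0, 0, 3, 3], [0, 3, 0, 3]] = [np.cosh(omega), np.sinh(omega), np.sinh(omega), np.cosh(omega)]

    Lu = L_phi @ L_theta                            # (24.241)
    Lb_tilde_inv = Lu @ Lb_inv @ np.linalg.inv(Lu)  # (24.240), equivalently (24.243)

    print(np.isclose(np.trace(Lb_tilde_inv), 2 + 2*np.cosh(omega)))        # (24.245)
    print(np.isclose(Lb_tilde_inv[0, 0], np.cosh(omega)))                  # (0, 0) entry of (24.244)
    print(np.isclose(Lb_tilde_inv[0, 3], np.cos(theta)*np.sinh(omega)))    # (0, 3) entry of (24.244)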
Equation (24.242) gives an interesting example of the decomposition of the Lorentz transformation into the unitary part and the Hermitian boost part (i.e., polar decomposition). The Lorentz boost is represented by a positive definite Hermitian matrix (or a real symmetric matrix). We will come back to this issue later (Sect. 24.9.1).

24.3 Covariant Properties of the Dirac Equation

24.3.1 General Consideration on the Physical Equation

It is of great importance and interest to examine how the Dirac equation is transformed under the Lorentz transformation. Before addressing this question, we wish to see how physical equations (including the Schrödinger equation, the Klein-Gordon equation, and the Dirac equation) undergo the transformation. In general, a physical equation comprises a state function X (scalar, vector, spinor, etc.) and an operator A that operates on the state function X. A general form of the physical equation is given as

A(X) = 0.    (24.246)

According to Sect. 11.2, we write X as


X = (ξ_1 ⋯ ξ_n) (c^1(x), ⋯, c^n(x))^T,    (24.247)

where {ξ_1, ⋯, ξ_n} are the basis functions (or basis vectors) of a representation space and (c^1(x), ⋯, c^n(x))^T indicates the corresponding coordinates. In (24.247), we assume that X is an element of the representation space of dimension n that is related to the physical equation. Correspondingly, we write the LHS of (24.246) as

A(X) = (ξ_1 ⋯ ξ_n) A (c^1(x), ⋯, c^n(x))^T.    (24.248)

Also, according to Sect. 11.4, we rewrite (24.248) such that

A(X) = (ξ_1 ⋯ ξ_n) R^{-1} RAR^{-1} R (c^1(x), ⋯, c^n(x))^T
     = (ξ′_1 ⋯ ξ′_n) RAR^{-1} (c′^1(x), ⋯, c′^n(x))^T = (ξ′_1 ⋯ ξ′_n) A′ (c′^1(x), ⋯, c′^n(x))^T,    (24.249)

where R represents a transformation matrix in a general form; it can be, e.g., an orthogonal transformation, a Lorentz transformation, etc. Notice again that unlike (11.84) of Sect. 11.4 (see Sect. 24.2.2), the coordinate representation and transformation are placed ahead of those of the basis vectors.

In (24.249), the relation expressed as

(ξ′_1 ⋯ ξ′_n) = (ξ_1 ⋯ ξ_n) R^{-1}    (24.250)

shows the set of basis functions obtained after the transformation. The relation

(c′^1(x), ⋯, c′^n(x))^T = R (c^1(x), ⋯, c^n(x))^T    (24.251)

represents the corresponding coordinates after being transformed. Also, the operator A′ given by

A′ = RAR^{-1}    (24.252)

denotes the representation matrix of the operator viewed in reference to ξ′_1, ⋯, ξ′_n. In general, the collection of the transformations forms a group (i.e., a transformation group). In that case, ξ′_1, ⋯, ξ′_n spans the representation space. To stress that

R is the representation matrix of the group, we identify it with D(R). That is, from (18.6) we have

D(R) ≡ R and [D(R)]^{-1} = D(R^{-1}).    (24.253)

Then, (24.249) is rewritten as

A(X) = (ξ′_1 ⋯ ξ′_n) D(R) A [D(R)]^{-1} (c′^1(x), ⋯, c′^n(x))^T.    (24.254)

Instead of the full representation of (24.248) and (24.249), we normally write

(ξ_1 ⋯ ξ_n) A = 0 or (ξ′_1 ⋯ ξ′_n) A′ = 0.    (24.255)

Alternatively, we write

A (c^1(x), ⋯, c^n(x))^T = 0 or A′ (c′^1(x), ⋯, c′^n(x))^T = 0,    (24.256)

where

A′ = D(R) A [D(R)]^{-1},    (24.257)

(c′^1(x), ⋯, c′^n(x))^T = D(R) (c^1(x), ⋯, c^n(x))^T.    (24.258)

Equation (24.256) can be viewed as an eigenvalue problem with respect to the operator A with its corresponding "eigenvector" of an (n, 1) matrix. At least one of the eigenvalues is zero (possibly degenerate). The second equation of (24.256) indicates the transformation of the operator A and its eigenvectors, which are transformed according to (24.257) and (24.258), respectively. Their transformation is accompanied by the transformation of the basis functions (or vectors) {ξ_1, ⋯, ξ_n}, even though the basis functions (or vectors) are not shown explicitly.

In this section, we solely use (24.256) to conform to the custom. The relevant formalism may well be referred to as the "coordinate representation." According to the custom, when the formalism is associated with the Lorentz transformation, we express the representation D(Λ) as

D(Λ) ≡ S(Λ),    (24.259)

where Λ indicates a Lorentz transformation; S(Λ) denotes a representation of the Lorentz group. The notation (Λ) is usually omitted, and we simply write S to mean S(Λ).

24.3.2 The Klein-Gordon Equation and the Dirac Equation

Now, we wish to see how the tangible physical equations in which we are interested undergo the transformation. We start with the Klein-Gordon equation and then examine the transformation properties of the Dirac equation in detail.

(i) Klein-Gordon equation:
The equation is given by

(∂_μ ∂^μ + m^2) ϕ(x) = 0.    (24.260)

Using (24.252), we have

A ≡ ∂_μ ∂^μ + m^2,    (24.261)

A′ = SAS^{-1} = S(∂_μ ∂^μ + m^2) S^{-1}.    (24.262)

Then, (24.256) can be translated into

S(∂_μ ∂^μ + m^2) S^{-1} · Sϕ(x) = 0.    (24.263)

Meanwhile, using (21.25) and (21.51) we obtain

∂_μ ∂^μ = Λ^ν_μ ∂′_ν (Λ^{-1})^μ_ρ ∂′^ρ = δ^ν_ρ ∂′_ν ∂′^ρ = ∂′_ρ ∂′^ρ.    (24.264)

Assuming that S is commutative with both ∂_μ and ∂^μ, (24.263) can be rewritten as

(∂′_μ ∂′^μ + m^2) Sϕ(x) = 0.    (24.265)

For (24.265) to be covariant with respect to the Lorentz transformation, we must have

(∂′_μ ∂′^μ + m^2) ϕ′(x′) = 0.    (24.266)

Since ϕ(x) is a scalar function, from the discussion of Sect. 19.1 we must have

ϕ(x) = ϕ′(x′) = Sϕ(x).    (24.267)

Comparing the first and last sides of (24.267), we get



S ≡ 1.    (24.268)

Thus, the Klein-Gordon equation gives a trivial but important one-dimensional representation with the scalar field.

(ii) Dirac equation:
In the x-system, the Dirac equation has been described as

(iγ^μ ∂_μ - m) ψ(x) = 0.    (22.213)

Since we assume the four-dimensional representation space with the Dirac spinor ψ(x), we accordingly describe ψ(x) as

ψ(x) ≡ (c^1(x), c^2(x), c^3(x), c^4(x))^T.    (24.269)

Notice that in (24.269) the superscripts take the integers 1, ⋯, 4 instead of 0, ⋯, 3. This is because the representation space for the Dirac equation is not the Minkowski space. Since ψ(x) in (24.269) represents a spinor, the superscripts are not associated with the space-time. Defining

A ≡ iγ^μ ∂_μ - m,    (24.270)

(22.213) is read as

Aψ(x) = 0.    (24.271)

If we were faithful to the notation of (24.248), we would write

(χ_1 ⋯ χ_n) Aψ(x) = 0.

At the moment, however, we only pay attention to the coordinate representation in the representation space. In this case, we may regard the column vector ψ(x), i.e., the spinor, as the "eigenvector." The situation is virtually the same as that where we formulated the quantum-mechanical harmonic oscillator in terms of the matrix representation (see Sect. 2.3).

The general form (24.256) is reduced to

S(iγ^μ ∂_μ - m) S^{-1} · Sψ(x) = 0.    (24.272)

Using (21.51), (24.272) is rewritten as


S(iγ^μ Λ^ν_μ ∂′_ν - m) S^{-1} · Sψ(x) = 0.    (24.273)

We wish to delete the factor Λ^ν_μ from (24.273) to explicitly describe (22.213) in the Lorentz covariant form. For this, it suffices for the following relation to hold:

S iγ^μ Λ^ν_μ ∂′_ν S^{-1} = iγ^ν ∂′_ν.    (24.274)

In (24.274) we have assumed that S is commutative with ∂′_ν. Rewriting (24.274), we get

∂′_ν (Λ^ν_μ S γ^μ S^{-1} - γ^ν) = 0.    (24.275)

That is, to maintain the Lorentz covariant form, again, it suffices for the following relation to hold [10, 11]:

Λ^ν_μ S γ^μ S^{-1} = γ^ν.    (24.276)

Notice that in (24.276) neither S nor S^{-1} is in general commutative with γ^μ. Meanwhile, we define the Dirac spinor observed in the x′-system as [10, 11]

ψ′(x′) ≡ Sψ(x) = Sψ(Λ^{-1}x′),    (24.277)

where ψ′(x′) denotes the change in the functional form accompanied by the coordinate transformation. Notice that only in the case where ψ(x) is a scalar function do we have ψ′(x′) = ψ(x); see (19.5) and (24.267). Using (24.276), we rewrite (24.273) as

(iγ^μ ∂′_μ - m) ψ′(x′) = 0.    (24.278)

Thus, on the condition of (24.276), we have obtained the covariant form of the Dirac equation (24.278). Defining

A′ ≡ iγ^μ ∂′_μ - m,    (24.279)

from (24.273), (24.274), and (24.278), we get

A′ ψ′(x′) = 0,    (24.280)

with

A′ = SAS^{-1}.    (24.281)

Equation (24.281) results from (24.257).



In this way, (24.281) decides the transformation property of the operator A. Needless to say, our next task is to determine a tangible form of S so that (24.276) and (24.281) can hold at once. To do this, (24.276) will provide a good criterion by which to judge whether we have chosen a proper matrix form for S. These tasks will be performed in Sect. 24.4.

The series of equations (24.277)-(24.281) can be viewed as the transformation of the Dirac operators in connection with the Lorentz transformation and is of fundamental importance in solving the Dirac equation. That is, once we successfully find an appropriate form of the matrix S in combination with a solution ψ(x) of the Dirac equation in the x-system, from (24.277) we can immediately determine ψ′(x′), i.e., a solution of the Dirac equation in the x′-system.

To discuss the transformation of the Dirac equation, in this book we call G defined in (21.91) and G̃ of (21.108) Dirac operators, together with their differential form A ≡ iγ^μ ∂_μ - m. Note that normally we call the differential operator (iγ^μ ∂_μ - m) that appeared in Sect. 21.3.1 the Dirac operator. When we are dealing with the plane wave solutions of the Dirac equation, ∂_μ is to be replaced with -ip_μ or +ip_μ (i.e., Fourier components) to give the Dirac operator G or G̃, respectively.

We also emphasize that (24.277) is contrasted with (24.267), which characterizes the transformation of the scalar function. Comparing the transformation properties of the Klein-Gordon equation and the Dirac equation, we realize that the representation matrix S is closely related to the nature of the representation space. In the representation space, the state functions (i.e., scalar, vector, spinor, etc.) and basis functions (e.g., basis vectors in a vector space) as well as the operators undergo the simultaneous transformation by the representation matrix S; see (24.257) and (24.258). In the next section, we wish to determine a general form of S for the Dirac equation. To this end, we carry out the discussion in parallel with that of Sect. 24.2.
Before going into detail, once again we wish to view solving the Dirac equation as an eigenvalue problem. From (21.96), (21.107), and (21.109), we immediately get

G + G̃ = -2mE,    (24.282)

where E is the (4, 4) identity matrix. Since neither G nor G̃ is a normal operator (i.e., neither Hermitian nor unitary), it is impossible to diagonalize G and G̃ through the unitary similarity transformation (see Theorem 14.5). It would therefore be intractable to construct the diagonalizing matrix of the Dirac operators G and G̃, even though we can find their eigenvalues. Instead, if we are somehow able to find a suitable diagonalizing matrix S(Λ), we can at once solve the Dirac equation. Let S be such a matrix that diagonalizes G. Then, from (24.282) we have

S^{-1} G S + S^{-1} G̃ S = -2m S^{-1} E S = -2mE.

Rewriting this equation, we obtain



S^{-1} G̃ S = -2mE - D,

where D is a diagonal matrix given by D = S^{-1} G S. This implies that G̃ can also be diagonalized using the same matrix S.

Bearing this point in mind, in the next section we are going to seek and construct a matrix S(Λ) that diagonalizes both G and G̃.

24.4 Determination of General Form of Matrix S(Λ)

In the previous section, we saw that if the representation matrix S(Λ) is properly chosen, S(iγ^μ ∂_μ - m) S^{-1} and Sψ(x) appropriately represent the Dirac operator and Dirac spinor, respectively. Naturally, then, we are viewing the Dirac operator and spinor in reference to the basis vectors after being transformed. The condition for S(Λ) to satisfy is given by (24.276).

The matrices S(Λ) of (24.259) and their properties have been fully investigated and are available up to now [10, 11]. We will make the most of those results to decide a tangible form of S(Λ) for the Dirac equation. Since S(Λ) is a representation matrix, from (24.236) we immediately calculate S(Λ) such that

S(Λ) = S(Λ_ϕ) S(Λ_θ) S(Λ_b^{-1}) = S(Λ_ϕ) S(Λ_θ) [S(Λ_b)]^{-1}.    (24.283)

The matrix S(Λ_ϕ) produced by the spatial rotation is given as [11]

S(Λ_ϕ) = exp(ϕρ_m),    (24.284)

where ϕ is a rotation angle and ρ_m is given by

ρ_m ≡ (-i) J_{kl};  J_{kl} ≡ (i/4)[γ^k, γ^l]  (k, l, m = 1, 2, 3).    (24.285)

In (24.285) γ^k is a gamma matrix and k, l, m change cyclically. We have

J_{kl} = (1/2) ( σ_m  0
                 0    σ_m ),    (24.286)

where σ_m is a Pauli spin matrix and 0 denotes the (2, 2) zero matrix. Note that J_{kl} is given by a direct sum of σ_m. That is,

J_{kl} = (1/2)(σ_m ⊕ σ_m).    (24.287)

When the rotation by ϕ is made around the z-axis, S(Λ_ϕ) is given by

S(Λ_ϕ) = exp(ϕρ_3) = ( e^{iϕ/2}  0          0         0
                       0         e^{-iϕ/2}  0         0
                       0         0          e^{iϕ/2}  0
                       0         0          0         e^{-iϕ/2} ).    (24.288)

With another case where the rotation by θ is made around the y-axis, we have

S(Λ_θ) = exp(θρ_2) = ( cos(θ/2)   sin(θ/2)  0          0
                       -sin(θ/2)  cos(θ/2)  0          0
                       0          0         cos(θ/2)   sin(θ/2)
                       0          0         -sin(θ/2)  cos(θ/2) ).    (24.289)

Meanwhile, an example of the transformation matrices S(Λ_b) related to the Lorentz boost is expressed as

S(Λ_b) = exp(-ωβ_k),    (24.290)

where ω represents the magnitude of the boost; β_k is given by

β_k ≡ (-i) J_{0k};  J_{0k} ≡ (i/2) γ^0 γ^k  (k = 1, 2, 3).    (24.291)

If we make the Lorentz boost in the direction of the z-axis, S(Λ_b) is expressed as

S(Λ_b) = exp(-ωβ_3) = ( cosh(ω/2)  0           sinh(ω/2)  0
                        0          cosh(ω/2)   0          -sinh(ω/2)
                        sinh(ω/2)  0           cosh(ω/2)  0
                        0          -sinh(ω/2)  0          cosh(ω/2) ).    (24.292)

As in the case of the Lorentz transformation matrices, the generator ρ_m (m = 1, 2, 3), which is associated with the spatial rotation, is anti-Hermitian. Hence, exp(ϕρ_3) and exp(θρ_2) are unitary. On the other hand, β_k (k = 1, 2, 3), associated with the Lorentz boost, is Hermitian. The matrix exp(ωβ_3) is Hermitian as well; for these see Sects. 15.2 and 25.4. From (24.288), (24.289), and (24.292) we get
S(Λ) = S(Λ_ϕ) S(Λ_θ) [S(Λ_b)]^{-1} = exp(ϕρ_3) exp(θρ_2) exp(ωβ_3)

= ( e^{iϕ/2}  0          0         0
    0         e^{-iϕ/2}  0         0
    0         0          e^{iϕ/2}  0
    0         0          0         e^{-iϕ/2} )
× ( cos(θ/2)   sin(θ/2)  0          0
    -sin(θ/2)  cos(θ/2)  0          0
    0          0         cos(θ/2)   sin(θ/2)
    0          0         -sin(θ/2)  cos(θ/2) )
× ( cosh(ω/2)   0          -sinh(ω/2)  0
    0           cosh(ω/2)  0           sinh(ω/2)
    -sinh(ω/2)  0          cosh(ω/2)   0
    0           sinh(ω/2)  0           cosh(ω/2) )

= ( e^{iϕ/2}cos(θ/2)cosh(ω/2)    e^{iϕ/2}sin(θ/2)cosh(ω/2)    -e^{iϕ/2}cos(θ/2)sinh(ω/2)   e^{iϕ/2}sin(θ/2)sinh(ω/2)
    -e^{-iϕ/2}sin(θ/2)cosh(ω/2)  e^{-iϕ/2}cos(θ/2)cosh(ω/2)   e^{-iϕ/2}sin(θ/2)sinh(ω/2)   e^{-iϕ/2}cos(θ/2)sinh(ω/2)
    -e^{iϕ/2}cos(θ/2)sinh(ω/2)   e^{iϕ/2}sin(θ/2)sinh(ω/2)    e^{iϕ/2}cos(θ/2)cosh(ω/2)    e^{iϕ/2}sin(θ/2)cosh(ω/2)
    e^{-iϕ/2}sin(θ/2)sinh(ω/2)   e^{-iϕ/2}cos(θ/2)sinh(ω/2)   -e^{-iϕ/2}sin(θ/2)cosh(ω/2)  e^{-iϕ/2}cos(θ/2)cosh(ω/2) ).    (24.293)

Also, we have

[S(Λ)]^{-1}
= ( e^{-iϕ/2}cos(θ/2)cosh(ω/2)   -e^{iϕ/2}sin(θ/2)cosh(ω/2)  e^{-iϕ/2}cos(θ/2)sinh(ω/2)   -e^{iϕ/2}sin(θ/2)sinh(ω/2)
    e^{-iϕ/2}sin(θ/2)cosh(ω/2)   e^{iϕ/2}cos(θ/2)cosh(ω/2)   -e^{-iϕ/2}sin(θ/2)sinh(ω/2)  -e^{iϕ/2}cos(θ/2)sinh(ω/2)
    e^{-iϕ/2}cos(θ/2)sinh(ω/2)   -e^{iϕ/2}sin(θ/2)sinh(ω/2)  e^{-iϕ/2}cos(θ/2)cosh(ω/2)   -e^{iϕ/2}sin(θ/2)cosh(ω/2)
    -e^{-iϕ/2}sin(θ/2)sinh(ω/2)  -e^{iϕ/2}cos(θ/2)sinh(ω/2)  e^{-iϕ/2}sin(θ/2)cosh(ω/2)   e^{iϕ/2}cos(θ/2)cosh(ω/2) ).    (24.294)

Equation (24.293) gives a general form of S(Λ). The construction processes of (24.293) somewhat resemble those already taken to form the unitary transformation matrix of SU(2) in (20.45) or those used to construct the transformation matrix of (17.101). Because of the presence of [S(Λ_b)]^{-1} in (24.293), however, S(Λ) of (24.293) is not unitary. This feature plays an essential role in what follows.
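It may be instructive to assemble S(Λ) numerically from (24.288), (24.289), and the inverse of (24.292). The sketch below (Python with NumPy; the sample angles are ours) confirms that the rotation part is unitary while S(Λ) as a whole is not:

    import numpy as np

    phi, theta, omega = 0.3, 0.8, 1.1
    c, s = np.cos(theta/2), np.sin(theta/2)
    ch, sh = np.cosh(omega/2), np.sinh(omega/2)

    S_phi = np.diag(np.exp([1j*phi/2, -1j*phi/2, 1j*phi/2, -1j*phi/2]))   # (24.288)
    S_theta = np.kron(np.eye(2), [[c, s], [-s, c]])                       # (24.289)
    Sb_inv = np.array([[ch, 0, -sh, 0],
                       [0, ch, 0, sh],
                       [-sh, 0, ch, 0],
                       [0, sh, 0, ch]])                                   # [S(Lambda_b)]^{-1}

    S = S_phi @ S_theta @ Sb_inv                                          # (24.293)
    SR = S_phi @ S_theta                                                  # cf. (24.317)

    print(np.allclose(SR @ SR.conj().T, np.eye(4)))   # True: the rotation factor is unitary
    print(np.allclose(S @ S.conj().T, np.eye(4)))     # False: S(Lambda) is not unitary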

24.5 Transformation Properties of the Dirac Spinors [9]

Now, we are in the position to seek a tangible form of the Dirac spinors. Regarding the plane waves, ψ(x), written in a general form in (24.269), is described by

ψ(x) = (c^0(x), c^1(x), c^2(x), c^3(x))^T = e^{±ipx} (c^0, c^1, c^2, c^3)^T = e^{±ipx} w(p, h),    (24.295)

where w(p, h) represents either u(p, h) or v(p, h) given in Sect. 21.3. Note that w(p, h) does not depend on x. Since px (≡ p_μ x^μ) in e^{±ipx} is a scalar quantity, it is independent of the choice of an inertial frame of reference; namely, px is invariant under the Lorentz transformation. Therefore, it behaves as a constant in terms of the operation of S(Λ).
Let the x-system be set within the inertial frame of reference O where the electron is at rest (see Fig. 24.2). Then, we have

ψ(x) = e^{±ip_0 x^0} w(0, h) = e^{±imx^0} w(0, h).    (24.296)

Meanwhile, (24.277) is rewritten as

ψ′(x′) = e^{±ip′x′} S(Λ) w(0, h) = e^{±ip′x′} w(p′, h).    (24.297)

In (24.297) we assume that the x′-system is set within the inertial frame of reference O′ where the electron is moving at a velocity of v (see Fig. 24.2). Note that in (24.297) p′x′ = px = p_0 x^0 = mx^0 and p′ = (p′^0, p′), while in the rest frame p = (p^0, 0) with p^0 = m.

In the remaining parts of this section, we use the variables x and p for the frame O and the variables x′ and p′ for the frame O′. Note that the frames O and O′ are identical to those appearing in Sect. 24.2.2. First, we wish to seek the solution w(0, h) of the Dirac equation with an electron at rest. In this case, the momentum p of the electron is zero, i.e., p^1 = p^2 = p^3 = 0. Then, if we find the solution w(0, h) in the frame O, we are going to get a solution w(p′, h) in the frame O′ such that

S(Λ) w(0, h) = w(p′, h).    (24.298)

Equation (24.298) implies that the helicity is held unchanged by the transformation S(Λ). Now, the Dirac operator of the rest frame is given by

p_μ γ^μ - m = p_0 γ^0 - m = mγ^0 - m = m(γ^0 - 1) ≡ A    (24.299)

or

-p_μ γ^μ - m = -p_0 γ^0 - m = -mγ^0 - m = m(-γ^0 - 1) ≡ B.    (24.300)

Namely, we have

A = ( 0  0  0    0
      0  0  0    0
      0  0  -2m  0
      0  0  0    -2m )    (24.301)

and

B = ( -2m  0    0  0
      0    -2m  0  0
      0    0    0  0
      0    0    0  0 ).    (24.302)

The operators A and B correspond to the positive-energy solution and the negative-energy solution, respectively. For the former case, we have

( 0  0  0    0
  0  0  0    0
  0  0  -2m  0
  0  0  0    -2m ) (c^0, c^1, c^2, c^3)^T = 0.    (24.303)

In (24.303) we have c^2 = c^3 = 0, while c^0 and c^1 can be chosen freely. Then, with the two independent solutions, we get
the two independent solutions, we get

u_1(0, -1) = (1, 0, 0, 0)^T,  u_2(0, +1) = (0, 1, 0, 0)^T.    (24.304)

With the plane wave solutions, we have

ψ_1(x) = e^{-imx^0} (1, 0, 0, 0)^T,  ψ_2(x) = e^{-imx^0} (0, 1, 0, 0)^T  (x^0 ≡ t).    (24.305)

With the negative-energy solution, on the other hand, we have

( -2m  0    0  0
  0    -2m  0  0
  0    0    0  0
  0    0    0  0 ) (c^0, c^1, c^2, c^3)^T = 0.    (24.306)

Therefore, as another two independent solutions we obtain

v_1(0, +1) = (0, 0, 1, 0)^T,  v_2(0, -1) = (0, 0, 0, 1)^T    (24.307)

with the plane wave solutions described by

ψ_3(x) = e^{imx^0} (0, 0, 1, 0)^T,  ψ_4(x) = e^{imx^0} (0, 0, 0, 1)^T.    (24.308)

The above solutions (24.304) can be automatically obtained from the results of Sect. 21.3 (including the normalization constant as well). That is, using (21.104) and putting θ = ϕ = 0 as well as S = 0 in (21.104), we have the same results as (24.304). Note, however, that in (24.307) we did not take account of the charge conjugation (vide infra). But, using (21.120) and putting θ = ϕ = 0 and S = 0 once again, we obtain

v_1(0, +1) = (0, 0, 1, 0)^T,  v_2(0, -1) = (0, 0, 0, -1)^T.    (24.309)

Notice that the latter solution of (24.309) has correctly reflected the charge
conjugation. Therefore, with the negative-energy plane wave solutions, we must use

ψ_3(x) = e^{imx^0} (0, 0, 1, 0)^T,  ψ_4(x) = e^{imx^0} (0, 0, 0, -1)^T.    (24.310)

Even though (24.303) and (24.306) and their corresponding solutions (24.304) and (24.309) look trivial, they provide solid ground for the Dirac equation and its solution for an electron at rest. In our case, these equations are dealt with in the frame O and described by x.

In the above discussion, (24.305) and (24.310) give ψ(x) in (24.296). Thus, substituting (24.304) and (24.309) for w(0, h) of (24.297) and using D(Λ) = S(Λ) of (24.293), we get at once the solutions ψ′(x′) described by x′ with respect to the frame O′. The said solutions are given by

ψ′_1(x′) = e^{-ip′x′} ( e^{iϕ/2} cos(θ/2) cosh(ω/2)
                        -e^{-iϕ/2} sin(θ/2) cosh(ω/2)
                        -e^{iϕ/2} cos(θ/2) sinh(ω/2)
                        e^{-iϕ/2} sin(θ/2) sinh(ω/2) ),

ψ′_2(x′) = e^{-ip′x′} ( e^{iϕ/2} sin(θ/2) cosh(ω/2)
                        e^{-iϕ/2} cos(θ/2) cosh(ω/2)
                        e^{iϕ/2} sin(θ/2) sinh(ω/2)
                        e^{-iϕ/2} cos(θ/2) sinh(ω/2) ),

ψ′_3(x′) = e^{ip′x′} ( -e^{iϕ/2} cos(θ/2) sinh(ω/2)
                       e^{-iϕ/2} sin(θ/2) sinh(ω/2)
                       e^{iϕ/2} cos(θ/2) cosh(ω/2)
                       -e^{-iϕ/2} sin(θ/2) cosh(ω/2) ),

ψ′_4(x′) = e^{ip′x′} ( -e^{iϕ/2} sin(θ/2) sinh(ω/2)
                       -e^{-iϕ/2} cos(θ/2) sinh(ω/2)
                       -e^{iϕ/2} sin(θ/2) cosh(ω/2)
                       -e^{-iϕ/2} cos(θ/2) cosh(ω/2) ).    (24.311)

The individual column vectors of (24.311) correspond to those of (24.293), with the sign for ψ′_4(x′) reversed relative to the last column of (24.293). This is because of the charge conjugation. Note that both the positive-energy solution and the negative-energy solution are transformed by the same representation matrix S. This is evident from (24.297).
Further rewriting (24.311), we obtain, e.g.,

ψ′_1(x′) = e^{-ip′x′} cosh(ω/2) ( e^{iϕ/2} cos(θ/2)
                                  -e^{-iϕ/2} sin(θ/2)
                                  -e^{iϕ/2} cos(θ/2) tanh(ω/2)
                                  e^{-iϕ/2} sin(θ/2) tanh(ω/2) ).    (24.312)

Notice that in (24.312) we normalized ψ′_1(x′) to unity [instead of to 2m as used in, e.g., (21.141)]. Here we wish to associate the hyperbolic functions with the physical quantities of the Dirac fields. From (1.8), (1.9), and (1.21) along with (21.45), we have [12]

p′^0 = mγ = m cosh ω and p′ = mγv.    (24.313)

As important formulae of the hyperbolic functions (associated with the physical quantities), we have

cosh(ω/2) = √[(1 + cosh ω)/2] = √[(p′^0 + m)/2m],  sinh(ω/2) = √[(-1 + cosh ω)/2] = √[(p′^0 - m)/2m],    (24.314)

tanh(ω/2) = sinh(ω/2)/cosh(ω/2) = √[(p′^0 - m)/(p′^0 + m)] = √[(p′^0)^2 - m^2]/(p′^0 + m) = P′/(p′^0 + m) = S′,    (24.315)

where P′ ≡ |p′| and S′ ≡ P′/(p′^0 + m) [see (21.97) and (21.99)]. With the second to the last equality of (24.315), we used (21.79) and (21.97); with the last equality we used (21.99). Then, we rewrite (24.312) as

ψ′_1(x′) = e^{-ip′x′} √[(p′^0 + m)/2m] ( e^{iϕ/2} cos(θ/2)
                                         -e^{-iϕ/2} sin(θ/2)
                                         -S′ e^{iϕ/2} cos(θ/2)
                                         S′ e^{-iϕ/2} sin(θ/2) ).    (24.316)

The column vector of (24.316) is exactly the same as that of the first equation of (21.104), including the phase factor (but aside from the normalization factor 1/√(2m)). The other three equations of (24.311) yield the same column vectors as given in (21.104) and (21.120) as well.
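The hyperbolic relations (24.313)-(24.315) are readily verified numerically (a Python sketch; the values of m and ω are arbitrary samples of ours):

    import numpy as np

    m, omega = 1.0, 1.3
    p0 = m * np.cosh(omega)          # p'^0 = m cosh(omega), (24.313)
    P = m * np.sinh(omega)           # P' = |p'| = m*gamma*v

    print(np.isclose(np.cosh(omega/2), np.sqrt((p0 + m)/(2*m))))   # (24.314)
    print(np.isclose(np.sinh(omega/2), np.sqrt((p0 - m)/(2*m))))   # (24.314)
    print(np.isclose(np.tanh(omega/2), P/(p0 + m)))                # (24.315): S' = P'/(p'^0 + m)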
Thus, following the procedure of (24.277), we have proven that S(Λ) given in (24.293) produces the proper solutions of the Dirac equation. From (24.304) and (24.309), we find that the individual column vectors of S(Λ) yield four solutions of the Dirac equation, i.e., the eigenvectors of G and G̃ (aside from the phase factor). This

suggests that S(Λ) is a diagonalizing matrix of G and G̃. For the same reason, the column vectors of S(Λ) are the eigenvectors of H for (21.86) as well. As already pointed out in Sect. 21.3.2, however, G is not Hermitian. Consequently, on the basis of Theorem 14.5, the said eigenvectors shared by G and H do not constitute an orthonormal system. Hence, we conclude that S(Λ) is not unitary. This is consistent with the results of Sect. 24.4; see the explicit matrix form of S(Λ) indicated by (24.293). That is, the third factor matrix representing the Lorentz boost in the LHS of (24.293) is not unitary.
Now, we represent the positive-energy solutions and the negative-energy solutions altogether as Ψ and Ψ̃, respectively. Operating both sides of (24.282) on Ψ and Ψ̃ from the left, we get

GΨ + G̃Ψ = 0 + G̃Ψ = G̃Ψ = -2mΨ,
GΨ̃ + G̃Ψ̃ = GΨ̃ + 0 = GΨ̃ = -2mΨ̃.

From the above equations, we realize that Ψ is an eigenfunction that belongs to the eigenvalue 0 of G and, at the same time, an eigenfunction that belongs to the eigenvalue -2m of G̃. In turn, Ψ̃ is an eigenfunction that belongs to the eigenvalue 0 of G̃ and, at the same time, an eigenfunction that belongs to the eigenvalue -2m of G. Both G and G̃ possess the eigenvalues 0 and -2m, both doubly degenerate.
Thus, the questions raised in Sect. 21.3.2 have been adequately answered.
The point rests upon finding out a proper matrix form of S(Λ). We will further
pursue the structure of S(Λ) in Sect. 24.9 in relation to the polar decomposition of a
matrix.

24.6 Transformation Properties of the Dirac Operators [9]

In the previous section, we have derived the functional form of the Dirac spinor from
that of the Dirac spinor determined in the rest frame of the electron. In this section we
study how the Dirac operator is transformed according to the Lorentz transformation.
In the first half of this section, we examine it in reference to the rest frame in
accordance with the approach taken in the last section. More generally, however,
we need to investigate the transformation property of the Dirac operator in reference
to a moving frame of an electron. We deal with this issue in the latter half of the
section.

24.6.1 Case I: Single Lorentz Boost

In (24.293) we considered S(Λ) = S(Λ_ϕ)S(Λ_θ)[S(Λ_b)]^{-1}. To further investigate the operation and properties of S(Λ), we consider the spatial rotation factor S(Λ_ϕ)S(Λ_θ) and the Lorentz boost [S(Λ_b)]^{-1} of (24.293) separately. Defining S_R as

S_R ≡ S(Λ_u) = S(Λ_ϕ) S(Λ_θ),    (24.317)

we have

S_R = ( e^{iϕ/2}  0          0         0
        0         e^{-iϕ/2}  0         0
        0         0          e^{iϕ/2}  0
        0         0          0         e^{-iϕ/2} )
    × ( cos(θ/2)   sin(θ/2)  0          0
        -sin(θ/2)  cos(θ/2)  0          0
        0          0         cos(θ/2)   sin(θ/2)
        0          0         -sin(θ/2)  cos(θ/2) )

    = ( e^{iϕ/2}cos(θ/2)    e^{iϕ/2}sin(θ/2)   0                    0
        -e^{-iϕ/2}sin(θ/2)  e^{-iϕ/2}cos(θ/2)  0                    0
        0                   0                  e^{iϕ/2}cos(θ/2)     e^{iϕ/2}sin(θ/2)
        0                   0                  -e^{-iϕ/2}sin(θ/2)   e^{-iϕ/2}cos(θ/2) ).    (24.318)

Also defining a (2, 2) submatrix S̃_R as

S̃_R ≡ ( e^{iϕ/2}cos(θ/2)    e^{iϕ/2}sin(θ/2)
         -e^{-iϕ/2}sin(θ/2)  e^{-iϕ/2}cos(θ/2) ),    (24.319)

we have

S_R = ( S̃_R  0
        0     S̃_R ).    (24.320)

Then, we obtain

S_R A S_R^{-1} = (-2m) diag(S̃_R, S̃_R) diag(0, E_2) diag(S̃_R^{-1}, S̃_R^{-1})
             = (-2m) diag(0, S̃_R E_2 S̃_R^{-1}) = (-2m) diag(0, E_2) = A,    (24.321)

where A is the operator that appeared in (24.301). Consequently, the functional form of the Dirac equation for the electron at rest is invariant with respect to the spatial rotation. Corresponding to (24.311), the Dirac spinor is given by, e.g.,

ψ′_1(x′) = S_R (1, 0, 0, 0)^T e^{-imx′^0} = e^{-imx′^0} (e^{iϕ/2} cos(θ/2), -e^{-iϕ/2} sin(θ/2), 0, 0)^T,    (24.322)

ψ′_2(x′) = S_R (0, 1, 0, 0)^T e^{-imx′^0} = e^{-imx′^0} (e^{iϕ/2} sin(θ/2), e^{-iϕ/2} cos(θ/2), 0, 0)^T.    (24.323)

Similarly, we have

ψ′_3(x′) = S_R (0, 0, 1, 0)^T e^{imx′^0} = e^{imx′^0} (0, 0, e^{iϕ/2} cos(θ/2), -e^{-iϕ/2} sin(θ/2))^T,    (24.324)

ψ′_4(x′) = S_R (0, 0, 0, -1)^T e^{imx′^0} = e^{imx′^0} (0, 0, -e^{iϕ/2} sin(θ/2), -e^{-iϕ/2} cos(θ/2))^T.    (24.325)

With the projection operators, we obtain, e.g.,

|ψ′_1⟩⟨ψ′_1| = ( cos²(θ/2)                   -e^{iϕ}cos(θ/2)sin(θ/2)  0  0
                 -e^{-iϕ}cos(θ/2)sin(θ/2)    sin²(θ/2)                0  0
                 0                           0                        0  0
                 0                           0                        0  0 ).    (24.326)

We find that |ψ′_1⟩⟨ψ′_1| is a Hermitian and idempotent operator (Sects. 12.4 and 14.1), i.e., a projection operator sensu stricto (Sect. 18.7). This is the case with the other operators |ψ′_k⟩⟨ψ′_k| (k = 2, 3, 4) as well. Similarly calculating the other operators, we get

Σ_{k=1}^4 |ψ′_k⟩⟨ψ′_k| = E_4,    (24.327)

where E_4 is the (4, 4) identity matrix. Other requirements for the projection operators mentioned in Sect. 14.1 are met accordingly (Definition 14.1). Equation (24.327) represents the completeness relation (see Sect. 18.7).
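A sketch verifying Hermiticity, idempotency, and the completeness relation (24.327) (Python with NumPy; the spinors are the columns of S_R of (24.318) with the sign of the fourth one reversed, as in (24.322)-(24.325), phase factors dropped):

    import numpy as np

    phi, theta = 0.3, 0.8
    c, s = np.cos(theta/2), np.sin(theta/2)
    SR = (np.diag(np.exp([1j*phi/2, -1j*phi/2, 1j*phi/2, -1j*phi/2]))
          @ np.kron(np.eye(2), [[c, s], [-s, c]]))     # (24.318)

    spinors = [SR[:, 0], SR[:, 1], SR[:, 2], -SR[:, 3]]   # (24.322)-(24.325)
    total = np.zeros((4, 4), dtype=complex)
    for w in spinors:
        P = np.outer(w, w.conj())          # |psi'_k><psi'_k|
        assert np.allclose(P, P.conj().T)  # Hermitian
        assert np.allclose(P @ P, P)       # idempotent
        total += P
    print(np.allclose(total, np.eye(4)))   # True: completeness (24.327)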

It is worth mentioning that in (24.322)-(24.325) the components of the upper two rows and the lower two rows are separated between the two pairs of Dirac spinors ψ_1(x), ψ_2(x) and ψ_3(x), ψ_4(x). The spatial rotation does not change the relative movement between the observers standing still on the individual inertial frames of reference. This is the case with the frames O and O_R. Here, we have defined O_R as an inertial frame of reference that is reached by the spatial rotation of O. Consequently, if the observer standing still on O_R is seeing an electron at rest, the other observer on O is seeing that electron at rest. Thus, the physical situation is the same in that sense for those two observers, whether on O or on O_R.

Meanwhile, defining the Lorentz boost factor S_ω ≡ [S(Λ_b)]^{-1} in (24.293) as

S_ω = ( cosh(ω/2)   0           -sinh(ω/2)  0
        0           cosh(ω/2)   0           sinh(ω/2)
        -sinh(ω/2)  0           cosh(ω/2)   0
        0           sinh(ω/2)   0           cosh(ω/2) ),    (24.328)

we get

S_ω A S_ω^{-1} = (-2m) S_ω diag(0, E_2) S_ω^{-1},

where

S_ω^{-1} = ( cosh(ω/2)  0           sinh(ω/2)  0
             0          cosh(ω/2)   0          -sinh(ω/2)
             sinh(ω/2)  0           cosh(ω/2)  0
             0          -sinh(ω/2)  0          cosh(ω/2) ),

so that

S_ω A S_ω^{-1} = (-2m) ( -sinh²(ω/2)           0                      -cosh(ω/2)sinh(ω/2)  0
                         0                     -sinh²(ω/2)            0                    cosh(ω/2)sinh(ω/2)
                         cosh(ω/2)sinh(ω/2)    0                      cosh²(ω/2)           0
                         0                     -cosh(ω/2)sinh(ω/2)    0                    cosh²(ω/2) ).    (24.329)

Rewriting (24.329) by use of the formulae of the hyperbolic functions of (24.314), we obtain

S_ω A S_ω^{-1} = ( p′^0 - m  0         p′^3        0
                   0         p′^0 - m  0           -p′^3
                   -p′^3     0         -p′^0 - m   0
                   0         p′^3      0           -p′^0 - m )  (p′^3 > 0).    (24.330)

In the above computation, we put ϕ = θ = 0 in (24.293). From (24.314) we had

cosh(ω/2) sinh(ω/2) = √[(p′^0 + m)/2m] √[(p′^0 - m)/2m] = |p′|/2m = p′^3/2m.

This is identical with the RHS of (21.80). The Dirac spinor is given by, e.g.,

ψ′_1(x′) = S_ω (1, 0, 0, 0)^T e^{-ip′x′} = e^{-ip′x′} (cosh(ω/2), 0, -sinh(ω/2), 0)^T
         = e^{-ip′x′} √[(p′^0 + m)/2m] (1, 0, -|p′^3|/(p′^0 + m), 0)^T,    (24.331)

ψ′_2(x′) = S_ω (0, 1, 0, 0)^T e^{-ip′x′} = e^{-ip′x′} (0, cosh(ω/2), 0, sinh(ω/2))^T
         = e^{-ip′x′} √[(p′^0 + m)/2m] (0, 1, 0, |p′^3|/(p′^0 + m))^T.    (24.332)

Similarly, we have

ψ′_3(x′) = S_ω (0, 0, 1, 0)^T e^{ip′x′} = e^{ip′x′} (-sinh(ω/2), 0, cosh(ω/2), 0)^T
         = e^{ip′x′} √[(p′^0 + m)/2m] (-|p′^3|/(p′^0 + m), 0, 1, 0)^T,    (24.333)

ψ′_4(x′) = S_ω (0, 0, 0, -1)^T e^{ip′x′} = e^{ip′x′} (0, -sinh(ω/2), 0, -cosh(ω/2))^T
         = e^{ip′x′} √[(p′^0 + m)/2m] (0, -|p′^3|/(p′^0 + m), 0, -1)^T.    (24.334)
In the case of the Lorentz boost, we see from (24.331) to (24.334) that the components of the upper two rows and the lower two rows are intermixed between the two pairs of Dirac spinors ψ′_1(x′), ψ′_2(x′) and ψ′_3(x′), ψ′_4(x′). This is because whereas one observer is observing the electron at rest, the other is observing the moving electron.

Using (24.301) and (24.302), we have

B = -2m - A.    (24.335)

Therefore, multiplying both sides of (24.335) by S_ω from the left and S_ω^{-1} from the right and further using (24.330), we get

S_ω B S_ω^{-1} = -2m - S_ω A S_ω^{-1} = ( -p′^0 - m  0          -p′^3     0
                                          0          -p′^0 - m  0         p′^3
                                          p′^3       0          p′^0 - m  0
                                          0          -p′^3      0         p′^0 - m ).    (24.336)

Equation (24.336) is the operator that gives the negative-energy solution of the Dirac equation. From (24.293) we see that S(Λ) is decomposed into S(Λ) = S_R S_ω. Although S_R is unitary, S_ω is neither a unitary matrix nor a normal matrix. Accordingly, S is not a unitary matrix as a whole. Using (24.329) in combination with (24.318), we obtain

SAS^{-1} = S_R S_ω A S_ω^{-1} S_R^{-1} = S(Λ) A [S(Λ)]^{-1} = (-2m) ×
( -sinh²(ω/2)                        0                                  -cos θ cosh(ω/2)sinh(ω/2)        e^{iϕ} sin θ cosh(ω/2)sinh(ω/2)
  0                                  -sinh²(ω/2)                        e^{-iϕ} sin θ cosh(ω/2)sinh(ω/2) cos θ cosh(ω/2)sinh(ω/2)
  cos θ cosh(ω/2)sinh(ω/2)           -e^{iϕ} sin θ cosh(ω/2)sinh(ω/2)   cosh²(ω/2)                       0
  -e^{-iϕ} sin θ cosh(ω/2)sinh(ω/2)  -cos θ cosh(ω/2)sinh(ω/2)          0                                cosh²(ω/2) ).    (24.337)

Using (24.314) and (21.79), (24.337) can be rewritten in the polar coordinate
representation as

SAS - 1
p′ 0 - m 0 j p′j cos θ - j p′j eiϕ sin θ
- iϕ
0 p′ - m
0
- j p′j e sin θ -j p′j cos θ
= :
- j p′j cos θ j p′j eiϕ sin θ - p′ 0 - m 0
j p′j e - iϕ sin θ j p′j cos θ 0 - p′ 0 - m
ð24:338Þ

Converting the polar coordinate into the Cartesian coordinate, we find that
(24.338) is identical with the (4, 4) matrix of (21.75). Namely, we have successfully
gotten

SAS - 1 = G: ð24:339Þ
-1
We remark that when deriving (24.338), we may choose Λ~b of (24.240) for
-1
the Lorentz boost. Using S Λb~ in place of S(Λ) = S(Λϕ)S(Λθ)[S(Λb)]-1,
instead of SAS-1 of (24.338) we get
24.6 Transformation Properties of the Dirac Operators [9] 1219

-1
S Λ~b  A  S Λ~b ,

which yields the same result as SAS-1 of (24.338). This is evident in light of
(24.321). Corresponding to (24.243), we have

-1
S Λ~b = SðΛu ÞS Λb- 1 S Λ{u = SðΛu ÞS Λb- 1 S Λu- 1

= SðΛu ÞS Λb- 1 ½SðΛu Þ - 1 = SðΛu ÞS Λb- 1 ½SðΛu Þ{

ω ω ω
cosh 0 - cos θ sinh eiϕ sin θ sinh
2 2 2
ω ω ω
0 cosh e - iϕ sin θ sinh cos θ sinh
2 2 2
¼ :
ω ω ω
- cos θ sinh eiϕ sin θ sinh cosh 0
2 2 2
- iϕ ω ω ω
e sin θ sinh cos θ sinh 0 cosh
2 2 2
ð24:340Þ

-1
In (24.340) we obtain the representation S Λ~b as a consequence of viewing

the operator S Λb- 1in reference to the frame O . The trace of (24.340) is 4 cosh ω2 ,
which is the same as [S(Λb)]-1, reflecting the fact that the unitary similarity trans-
formation keeps the trace unchanged; see (24.328).
Now, the significance of (24.339) is evident and consistent with (24.281). Tracing
the calculation (24.339) reversely, we find that S - 1 GS = A. That is, G has been
diagonalized to yield A through the similarity transformation with S. The operator G
possesses the eigenvalues of -2m (doubly degenerate) and zero (doubly degener-
ate). We also find that the rank of G is two. From (24.338) and (24.339), we find that
the trace of G is -4m that is identical with the trace of the matrices A and B of
(24.301) and (24.302).
Let us get back to (24.282) once again. Sandwiching both sides of (24.282)
between S-1 and S from the left and right, respectively, we obtain

~ = - 2mS - 1 ES:
S - 1 GS þ S - 1 GS ð24:341Þ

~ = - 2mE. Further using (24.301) and (24.302), we


Using (24.339), A þ S - 1 GS
get
1220 24 Basic Formalism

Table 24.2 Eigenvalues and their corresponding Dirac spinors (eigenspinors) of the Dirac
operators
Dirac operatora Dirac spinorb
Inertial frame Inertial frame
O O′ Eigenvalue O O′
A G 0 ψ 1(x), ψ 2(x) Sψ 1(x), Sψ 2(x)
-2m ψ 3(x), ψ 4(x) Sψ 3(x), Sψ 4(x)
B ~
G 0 ψ 3(x), ψ 4(x) Sψ 3(x), Sψ 4(x)
-2m ψ 1(x), ψ 2(x) Sψ 1(x), Sψ 2(x)
a
A and B are taken from (24.301) and (24.302), respectively. G and G ~ are connected to A and
B through similarity transformation by S, respectively; see (24.339) and (24.342)
b
ψ 1(x) and ψ 2(x) are taken from (24.305); ψ 3(x) and ψ 4(x) are taken from (24.310).

~ = - 2mE - A = B
S - 1 GS or ~
SBS - 1 = G: ð24:342Þ

The operator G ~ is found to possess the eigenvalues of -2m (doubly degenerate)


and zero (doubly degenerate) as well, in agreement with the argument made at the
end of Sect. 24.5. Thus, choosing S of (24.293) for an appropriate diagonalizing
matrix, both G and G ~ have been successfully diagonalized so as to be A and B,
respectively.
Table 24.2 summarizes eigenvalues and their corresponding Dirac spinors
(eigenspinors) of the Dirac operators. Note that both the eigenvalues 0 and -2m
are doubly degenerate. Of the Dirac spinors, those belonging to the eigenvalue 0 are
the solutions of the Dirac equation.
Following (24.277), we define

SϕðxÞ  ϕ0 ðx0 Þ and Sχ ðxÞ  χ 0 ðx0 Þ: ð24:343Þ

In (24.343) ϕ(x) is chosen from among ψ 1(x) and ψ 2(x) of (24.305) and represents
the positive-energy state of an electron at rest; χ(x) from among ψ 3(x) and ψ 4(x) of
(24.310) and for the negative-energy state of an electron at rest. Using (24.301),
(24.303) is rewritten as

AϕðxÞ = 0: ð24:344Þ

Then, we have successively

AS - 1  SϕðxÞ = 0, SAS - 1  SϕðxÞ = 0: ð24:345Þ

The second equation of (24.345) can be rewritten as

Gϕ0 ðx0 Þ = 0, ð24:346Þ


24.6 Transformation Properties of the Dirac Operators [9] 1221

where G = SAS - 1 and ϕ′(x′) = Sϕ(x). Taking a reverse process of the above three
equations, we get

Gϕ0 ðx0 Þ = 0, GS  S - 1 ϕ0 ðx0 Þ = 0, S - 1 GS  S - 1 ϕ0 ðx0 Þ = 0: ð24:347Þ

The final equation of (24.347) is identical to (24.344), where

S - 1 GS = A and S - 1 ϕ0 ðx0 Þ = ϕðxÞ: ð24:348Þ

The implication of (24.348) is that once G has been diagonalized with S, ϕ′(x′) is
at once given by operating S on a simple function ϕ(x) that represents the state of an
electron at rest. This means that the Dirac equation has been solved.
With negative-energy state of an electron, starting with (24.306), we reach a
related result described by

~ 0 ð x0 Þ = 0
Gχ ð24:349Þ

with G~ = SBS - 1 and χ ′(x′) = Sχ(x). Evidently, (24.346) and (24.349) correspond to
(21.73) and (21.74), respectively. At the same time, we find that both G and G ~
possess eigenvalues -2m and 0. It is evident from (24.303) and (24.306).
At the end of this section, we remark that in relation to the charge conjugation
mentioned in Sect. 21.5, using the matrix C of (21.164) we obtain following
relations:

CAC - 1 = B and CBC - 1 = A:

24.6.2 Case II: Non-Coaxial Lorentz Boosts

As already seen earlier, the Lorentz transformations include the unitary operators
that are related to the spatial rotation and Hermitian operators (or real symmetric
operators) called Lorentz boost. In the discussion made thus far, we focused on the
transformation that included only a single Lorentz boost. In this section, we deal with
the case where the transformation contains complex Lorentz boosts, especially
non-coaxial Lorentz boosts. When the non-coaxial Lorentz boosts are involved,
we encounter a somewhat peculiar feature.
As described in Sect. 24.2.2, the successive Lorentz transformations consisted of
a set of two spatial rotations and a single Lorentz boost. In this section, we further
make a step toward constructing the Lorentz transformations. Considering those
three successive transformations as a unit, we combine two unit transformations so
that the resulting transformation can contain two Lorentz boosts. We have two cases
for this: (i) The boosts are coaxial. (ii) The boosts are non-coaxial. Whereas in the
former case the boost directions are the same when viewed from a certain inertial
frame of reference, in the latter case the boost directions are different.
1222 24 Basic Formalism

As can be seen from Fig. 24.2a, if the rotation angle θ ≠ 0 in the frame Oθ, the
initial Lorentz boost and subsequently occurring boost are non-coaxial. Here, we
assume that the second Lorentz boost takes place along the z′-axis of the frame O′. In
the sense of the non-coaxial boost, the situation is equivalent to a following case: An
observer A1 is watching a moving electron with a constant velocity in the direction
of, say the z-axis in a certain inertial frame of reference F1 where the observer A1
stays at rest. Meanwhile, another observer A2 is moving with a constant velocity in
the direction of, say along the x-axis relative to the frame F1. On this occasion, we
have a question of how the electron is moving in reference to the frame F2 where the
observer A2 stays at rest.
In Sect. 24.2.2 we expressed the successive transformations as

Λ = Λϕ Λθ Λb- 1 : ð24:236Þ

Let us denote the transformation by (ϕ, θ, b). Let another transformation be


(ϕ′, θ′, b′) that is described by

Λ0 = Λϕ0 Λθ0 Λb-0 1 : ð24:350Þ

Suppose that (ϕ′, θ′, b′) takes place subsequently to the initial transformation
(ϕ, θ, b).
The overall transformation Ξ is then described such that

Ξ = Λ0 Λ = Λϕ0 Λθ0 Λb-0 1 Λϕ Λθ Λb- 1 : ð24:351Þ

Note that in (24.351) we are viewing the transformation in the moving coordinate
systems. That is, in the second transformation Λ′ the boost takes place in the
direction of the z′-axis of the frame O′ (Fig. 24.2a). As a result, the successive
transformations Λ′Λ cause the non-coaxial Lorentz boosts in the case of θ ≠ 0. In a
special case where θ = 0 is assumed for the original transformation Λ, however,
Ξ = Λ′Λ produces co-axial boosts.
Let the general form of the Dirac equation be

Dψ ðxÞ = 0, ð24:352Þ

where D represents either G of (24.339) or G ~ of (24.342). Starting with the rest


frame of the electron, we have D = A or D = B and ψ ðxÞ = e ± imx wð0, hÞ ; see
0

(24.296) with the latter equation. Operating S(Λ) of (24.293) on (24.352) from the
left, we obtain

SðΛÞD½SðΛÞ - 1 SðΛÞψ ðxÞ = 0: ð24:353Þ

Defining D0  SðΛÞD½SðΛÞ - 1 and using (24.277), we have


24.6 Transformation Properties of the Dirac Operators [9] 1223

D0 ψ 0 ðx0 Þ = 0: ð24:354Þ

Further operating S(Λ′) on (24.354) from the left, we get

-1
SðΛ0 ÞD0 ½SðΛ0 Þ SðΛ0 Þψ 0 ðx0 Þ = 0, ð24:355Þ
-1
where Λ′ was given in (24.350). Once again, defining D00  SðΛ0 ÞD0 ½SðΛ0 Þ and
ψ ′′(x′′)  S(Λ′)ψ ′(x′), we have

D00 ψ 00 ðx00 Þ = 0: ð24:356Þ

We assume that after the successive Lorentz transformations Λ and Λ′, we reach
the frame O′′. Also, we have

ψ 00 ðx00 Þ = SðΛ0 Þψ 0 ðx0 Þ = SðΛ0 ÞSðΛÞψ ðxÞ = SðΛ0 ΛÞψ ðxÞ,


-1 -1
D00 = SðΛ0 ÞD0 ½SðΛ0 Þ = SðΛ0 ÞSðΛÞD½SðΛÞ - 1 ½SðΛ0 Þ
-1
= SðΛ0 ÞSðΛÞD½SðΛ0 ÞSðΛÞ - 1 = SðΛ0 ΛÞD½SðΛ0 ΛÞ : ð24:357Þ

In the second equation of (24.357), with the last equality we used the fact that
S(Λ) is a representation of the Lorentz group. Repeatedly operating S(Λ), S(Λ′), etc.,
we can obtain a series of successively transformed Dirac spinors and corresponding
Dirac operators. Thus, the successive transformations of the Dirac operators can be
viewed as a similarity transformation as a whole. In what follows, we focus on a case
where two non-coaxial Lorentz boosts take place.
Example 24.4 As a tangible example, we show how the Dirac spinors and Dirac
operators are transformed as a result of the successively occurring Lorentz trans-
formations. Table 24.3 shows a (4, 4) matrix of Ξ = Λ′Λ given in (24.351).
Table 24.4 lists another (4, 4) matrix of S(Ξ) = S(Λ′Λ) = S(Λ′)S(Λ). The matrix
S(Λ) is given in (24.293). The matrix S(Λ′) is expressed as

-1
S Λ ′ = S Λϕ ′ S Λθ ′ S Λb ′ =


=2 θ′ ω′ ′ θ′ ω′ ′ θ′ ω′ ′ θ′ ω′
eiϕ cosh
cos eiϕ =2 sin cosh - eiϕ =2
cossinh eiϕ =2 sin sinh
2 2 2 2 2 2 2 2
- iϕ ′ =2 θ′ ω ′ - iϕ ′ =2 θ′ ω′ - iϕ ′ =2 θ′ ω′ - iϕ ′ =2 θ′ ω′
-e sin cosh e cos cosh e sin sinh e cos sinh
2 2 2 2 2 2 2 2 :
iϕ ′ =2 θ′ ω′ iϕ ′ =2 θ′ ω′ iϕ ′ =2 θ′ ω′ iϕ ′ =2 θ′ ω′
-e cos sinh e sin sinh e cos cosh e sin cosh
2 2 2 2 2 2 2 2
′ ′ ′ ′ ′
′ θ ω ′ θ ω′ ′ θ ω ′
′ θ ω′
e - iϕ =2 sin sinh e - iϕ =2 cos sinh - e - iϕ =2 sin cosh e - iϕ =2 cos cosh
2 2 2 2 2 2 2 2
ð24:358Þ
1224

Table 24.3 Representation matrix of Lorentz transformation Λ′Λ


Ξ=Λ′Λ=
coshω ′ coshω - sin θsinhω ′ 0 coshω ′ sinhω
þ cos θsinhω ′ sinhω þ cos θsinhω ′ coshω

cos ϕ ′ sin θ ′ sinhω ′ coshω cos ϕ ′ cos θ ′ cos ϕ cos θ - cos ϕ ′ cos θ ′ sin ϕ cos ϕ ′ sin θ ′ sinhω ′ sinhω
þ cos ϕ ′ cos θ ′ cos ϕ sin θsinhω - sin ϕ ′ sin ϕ cos θ - sin ϕ ′ cos ϕ þ cos ϕ ′ cos θ ′ cos ϕ sin θcoshω
- sin ϕ ′ sin ϕ sin θsinhω - cos ϕ ′ sin θ ′ sin θcoshω ′ - sin ϕ ′ sin ϕ sin θcoshω
þ cos ϕ ′ sin θ ′ cos θcoshω ′ sinhω þ cos ϕ ′ sin θ ′ cos θcoshω ′ coshω

sin ϕ ′ sin θ ′ sinhω ′ coshω sin ϕ ′ cos θ ′ cos ϕ cos θ - sin ϕ ′ cos θ ′ sin ϕ sin ϕ ′ sin θ ′ sinhω ′ sinhω
þ sin ϕ ′ cos θ ′ cos ϕ sin θsinhω þ cos ϕ ′ sin ϕ cos θ þ cos ϕ ′ cos ϕ þ sin ϕ ′ cos θ ′ cos ϕ sin θcoshω
þ cos ϕ ′ sin ϕ sin θsinhω - sin ϕ ′ sin θ ′ sin θcoshω ′ þ cos ϕ ′ sin ϕ sin θcoshω
þ sin ϕ ′ sin θ ′ cos θcoshω ′ sinhω þ sin ϕ ′ sin θ ′ cos θcoshω ′ coshω
cos θ ′ sinhω ′ coshω - sin θ ′ cos ϕ cos θ sin θ ′ sin ϕ cos θ ′ sinhω ′ sinhω
- sin θ ′ cos ϕ sin θsinhω - cos θ ′ sin θcoshω ′ - sin θ ′ cos ϕ sin θcoshω
þ cos θ ′ cos θcoshω ′ sinhω þ cos θ ′ cos θcoshω ′ coshω
24
Basic Formalism
24.6

Table 24.4 Representation matrix of S(Λ′Λ)


SðΛ ′ΛÞ =
0θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω 0 θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω
eiðϕ þϕÞ=2 cos cos cosh eiðϕ þϕÞ=2 cos sin cosh - eiðϕ þϕÞ=2 cos cos sinh - eiðϕ þϕÞ=2 cos sin sinh
2 2 2 2 2 2 2 2 2 2 2 2
θ0 θ ω0 - ω θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω 0
iðϕ0 - ϕÞ=2 iðϕ0 - ϕÞ=2 θ0 θ ω0 þ ω
-e sin sin cosh þe sin cos cosh - eiðϕ - ϕÞ=2 sin sin sinh þeiðϕ - ϕÞ=2 sin cos sinh
2 2 2 2 2 2 2 2 2 2 2 2

0 θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω
- e - iðϕ - ϕÞ=2 sin cos cosh - e - iðϕ - ϕÞ=2 sin sin cosh 0 θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω
2 2 2 2 2 2 e - iðϕ - ϕÞ=2 sin cos sinh e - iðϕ - ϕÞ=2 sin sin sinh
2 2 2 2 2 2
- iðϕ0 þϕÞ=2 θ0 θ ω0 - ω - iðϕ0 þϕÞ=2 θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω
-e cos sin cosh þe cos cos cosh - e - iðϕ þϕÞ=2 cos sin sinh - iðϕ0 þϕÞ=2 θ0 θ ω0 þ ω
2 2 2 2 2 2 2 2 2 þe cos cos sinh
2 2 2

0 θ0 θ ω0 - ω
0 θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω 0
- eiðϕ þϕÞ=2 cos cos sinh - eiðϕ þϕÞ=2 cos sin sinh eiðϕ þϕÞ=2 cos
θ0 θ
cos cosh
ω0 þ ω eiðϕ þϕÞ=2 cos sin cosh
2 2 2 2 2 2 2 2 2 2 2 2
0 θ0 θ ω0 þ ω
θ0 θ ω0 - ω θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω þeiðϕ - ϕÞ=2 sin cos cosh
iðϕ0 - ϕÞ=2 iðϕ0 - ϕÞ=2
-e sin sin sinh þe sin cos sinh - eiðϕ - ϕÞ=2 sin sin cosh 2 2 2
Transformation Properties of the Dirac Operators [9]

2 2 2 2 2 2 2 2 2

0 θ0 θ ω0 þ ω 0
θ0 θ ω0 - ω
0 θ0 θ ω0 þ ω 0 θ0 θ ω0 - ω - e - iðϕ - ϕÞ=2 sin cos cosh - e - iðϕ - ϕÞ=2 sin sin cosh
e - iðϕ - ϕÞ=2 sin cos sinh e - iðϕ - ϕÞ=2 sin sin sinh 2 2 2 2 2 2
2 2 2 2 2 2 0 θ0 θ ω0 - ω 0 θ0 θ ω0 þ ω
0 θ0 θ ω0 - ω 0 θ0 θ ω0 þ ω - e - iðϕ þϕÞ=2 cos sin cosh þe - iðϕ þϕÞ=2 cos cos cosh
- e - iðϕ þϕÞ=2 cos sin sinh þe - iðϕ þϕÞ=2 cos cos sinh 2 2 2 2 2 2
2 2 2 2 2 2
1225
1226

Table 24.5 Representation matrix of Dirac operator D00 a


-1
D″ = S Λ ′ Λ A S Λ ′Λ = ð - 2mÞ ×
′ ′
- cosh2ω2 sinh2ω2 þ sinh2ω2 cosh2ω2 0 - ð3; 1Þ - ð4; 1Þ
1 ′
þ2 cos θsinhω sinhω

0 ð1; 1Þ - ð4; 1Þ ð3; 1Þ


sin θ cos ϕsinhω
- 12 sin θ ′ ′ ′

′ ′
cosh2ω2 cosh2ω2 þ sinh2ω2 sinh2ω2

þ12 cos θ sinhω coshω þ cos θcoshω ′ sinhω ð4; 1Þ þ12 cos θsinhω ′ sinhω 0
′ ′
1 - iϕ ′
2e sin θsinhω eiϕ sin 2θ2 - e - iϕ cos 2θ2
′ - ð3; 1Þ
- 12e - iϕ sin θ ′ cos θcoshω ′ sinhω þ sinhω ′ coshω 0 ð3; 3Þ

a
In the above matrix, e.g., (1, 1) means that the (2, 2) matrix element should be replaced with the (1, 1) element
24
Basic Formalism
24.6 Transformation Properties of the Dirac Operators [9] 1227

As shown in Sect. 24.5, the Dirac spinors ψ(x) was expressed in reference to the
rest frame of the electron as

ψ ðxÞ = e ± ip x wð0, hÞ = e ± imx wð0, hÞ:


0 0 0
ð24:296Þ

Then, we have

ψ 00 ðx00 Þ = SðΞÞψ ðxÞ = e ± imx SðΞÞwð0, hÞ:


0
ð24:359Þ

The spinors w(0, h) were given in (24.305) and (24.310) according to the positive-
and negative-energy electron. Thus, individual columns of S(Ξ) in Table 24.4 are
directly associated with the four spinors of either the positive- or negative-energy
electron, as in the case of (24.293). The orthogonal characteristics hold with the
Dirac spinors as in (21.141) and (21.142). Individual matrix elements of S(Ξ) consist
of two complex terms. Note, however, in the case of the co-axial boosts (i.e., θ = 0)
each matrix element comprises a single term. In both the co-axial and non-coaxial
boost cases, individual four columns of S(Λ′Λ) represent the Dirac spinors with
respect to the frame O′′. As mentioned before, the sign for ψ 004 ðx00 Þ should be reversed
relative to the fourth column of S(Λ′Λ) because of the charge conjugation.
Meanwhile, Table 24.5 lists individual matrix elements of a (4, 4) matrix of the
Dirac operator D00 with the positive-energy electron. Using S(Λ′Λ), D00 viewed in
reference to the frame O′′ is described by

D00 = SðΛ0 ΛÞA½SðΛ0 ΛÞ - 1 = SðΞÞA½SðΞÞ - 1 , ð24:360Þ

where A was given in (24.301). As can be seen in (21.75), the Dirac operator
contains all the information of the rest mass of an electron and the energy-
momentum of a plane wave electron. This is clearly stated in (24.278). In the x′′-
system (i.e., the frame O′′) the Dirac equation is described in a similar way such that

00
iγ μ ∂ μ - m  ψ 00 ðx00 Þ = 0:

The point is that in any inertial frame of reference the matrix form of the gamma
matrices is held invariant. Or rather, S(Ξ) has been chosen for the gamma matrices to
be invariant in such a way that

Λνμ Sγ μ S - 1 = γ ν : ð24:276Þ

Readers are encouraged to confirm this relation by calculating a tangible example.


Thus, with the plane wave solution (of the positive-energy case) in frame O′′ we
obtain
1228 24 Basic Formalism

p00 μ γ μ - m  ψ 00 ðx00 Þ = 0,

where p′′μ is a four-momentum in O′′. Hence, even though the matrix form of
Table 24.5 is somewhat complicated, the Dirac equation must uniquely be
represented in any inertial frame of reference.
Thus, from Table 24.5 we get the following four-momenta in the frame O′′
described by

p00 = ½ð1, 1Þ - ð3, 3Þ=2, m = ½ð1, 1Þ þ ð3, 3Þ=ð- 2Þ,


0

p00 = ð1, 3Þ, p1 = ½ - ð1, 4Þ þ ð4, 1Þ=2, p2 = ½ð1, 4Þ þ ð4, 1Þi=2,


3
ð24:361Þ

where (m, n) to which 1 ≤ m, n ≤ 4 indicates the (m, n)-element of the matrix given in
Table 24.5. As noted from (21.75), we can determine other matrix elements from the
four elements indicated in Table 24.5, in which we certainly confirm that
(1, 1) + (3, 3) = - 2m. This is in agreement with G of (21.75) and (21.91). Thus,
from (24.361) and Table 24.5 we get

p000 = mðcosh ω0 cosh ω þ cos θ sinh ω0 sinh ωÞ,


p003 = m cos θ0 ðsinh ω0 cosh ω þ cos θ cosh ω0 sinh ωÞ
- m sin θ0 sin θ cos ϕ sinh ω,
p001 = m½ sin θ0 cos ϕ0 ðsinh ω0 cosh ω þ cos θ cosh ω0 sinh ωÞ
θ0 θ0
þ cosðϕ0 þ ϕÞ cos 2 sin θ sinh ω - cosðϕ0 - ϕÞ sin 2 sin θ sinh ω,
2 2

p002 = m½ sin θ0 sin ϕ0 ðsinh ω0 cosh ω þ cos θ cosh ω0 sinh ωÞ


θ0
þ sinðϕ0 þ ϕÞ cos 2 sin θ sinh ω
2
θ0
- sinðϕ0 - ϕÞ sin 2 sin θ sinh ω: ð24:362Þ
2

From the above relations, we recover

2
p000 - p002 - m2 = 0 or p000 = p002 þ m2 : ð21:79Þ

As seen above, the basic construction of the Dirac equation remains unaltered by
the operation of successive Lorentz transformations including the non-coaxial
boosts.
24.7 Projection Operators and Related Operators for the Dirac Equation 1229

Example 24.5 We show another simple example of the non-coaxial Lorentz boosts.
Suppose that in the moving coordinate systems we have successive non-coaxial
boosts Λ described by

cosh ω ′ - sinh ω ′ 0 0 cosh ω 0 0 sinh ω


′ ′
- sinh ω cosh ω 0 0 0 1 0 0
Λ=
0 0 1 0 0 0 1 0
0 0 0 1 sinh ω 0 0 cosh ω

cosh ω ′ cosh ω - sinh ω ′ 0 cosh ω ′ sinh ω


- sinh ω ′ cosh ω cosh ω ′ 0 - sinh ω ′ sinh ω
= : ð24:363Þ
0 0 1 0
sinh ω 0 0 cosh ω

Equation (24.363) corresponds to a special case of (24.351), where θ = - θ′ = π/


2 and ϕ = ϕ′ = 0; see Table 24.3. Readers are encouraged to imagine the geometric
feature of the successive non-coaxial boosts in the moving coordinate systems; see
Fig. 24.2.

24.7 Projection Operators and Related Operators


for the Dirac Equation

In this section, we examine the constitution of the Dirac equation in terms of the
projection operator. In the next section, related topics are discussed as well. To this
end, we wish to deal with the issue from a somewhat general standpoint. Let G be
expressed as

pE H
G= ðp ≠ qÞ,
-H qE

where E is a (2, 2) identity matrix; H is a (2, 2) Hermitian matrix. Then, we have

p2 E þ H 2 ðq - pÞH p2 E þ H 2 ðp - qÞH
G{ G = , GG{ = : ð24:364Þ
ðq - pÞH q2 E þ H 2 ðp - qÞH q2 E þ H 2

For G to be a normal matrix (i.e., G{G = GG{), we must have H  0. In our


present case, if we choose G of (21.91) for G, we have p = p0 - m, q = - p0 - m,
and H = - σ  p. Thus, H  0 is equivalent to p  0. This means that an electron is at
rest. Namely, G is normal (and Hermitian) only if the electron is at rest. As long as
the electron is moving (in a certain inertial frame of reference), G is not normal. In
this situation, Theorem 14.5 tells us that G could only be diagonalized via similarity
1230 24 Basic Formalism

transformation using a non-unitary operator, if p ≠ 0. In fact, in (24.339) using a


non-unitary operator S, G could be diagonalized. This means that although the
operator G is not normal (nor Hermitian), it is semi-simple (Sect. 12.5). The above
discussion is true of the operator G ~ of (21.108) as shown in (24.342).
In the discussion of the previous section, we have examined the constitution of
the operator S and known that their non-unitarity comes from the Lorentz boost.
However, once G or G ~ is given, the diagonalizing matrix form for S can uniquely be
determined (see Sect. 24.9).
Let us start our main theme about the projection operator. From (24.301) and
(24.302), we obtain

A B
þ = E, ð24:365Þ
ð- 2mÞ ð- 2mÞ

where E is the identity operator; i.e., a (4, 4) identity matrix. Operating S from the left
and S-1 from the right on both sides of (24.365), respectively, we get

SAS - 1 SBS - 1
þ = SES - 1 = E or SAS - 1 þ SBS - 1 = ð- 2mÞE: ð24:366Þ
ð- 2mÞ ð- 2mÞ

Using (24.339) and (24.342), we restore (24.282). Meanwhile, operating both


sides of (24.366) on ϕ′(x′) of (24.346) from the left and taking account of (24.282),
we obtain

~ 0 ðx0 Þ = - 2mϕ0 ðx0 Þ:


Gϕ ð24:367Þ

Operating both sides of (24.366) on χ ′(x′) of (24.349) from the left and taking
account of (24.282) once again, we have

Gχ 0 ðx0 Þ = - 2mχ 0 ðx0 Þ: ð24:368Þ

Now, suppose that the general solutions of the plane wave Dirac field Ψ(x′) are
given by

4
Ψðx0 Þ = α ψ 0 ðx0 Þ,
i=1 i i
ð24:369Þ

where ψ 0i ðx0 Þ are those of (24.311) and αi are their suitable coefficients. Then,
operating G ~ from the left on both sides of (24.369), we get

~ ðx0 Þ =

2
ð- 2mÞαi ψ 0i ðx0 Þ, ð24:370Þ
i=1
24.7 Projection Operators and Related Operators for the Dirac Equation 1231

4
GΨðx0 Þ = i=3
ð- 2mÞαi ψ 0i ðx0 Þ: ð24:371Þ

Rewriting the above equations, we obtain

~ ðx0 Þ=ð- 2mÞ =



2
α ψ 0 ðx0 Þ, ð24:372Þ
i=1 i i
4
GΨðx0 Þ=ð- 2mÞ = α ψ 0 ðx0 Þ:
i=3 i i
ð24:373Þ

Equations (24.372) and (24.373) mean that GΨ~ ðx0 Þ=ð- 2mÞ extracts the positive-

energy states from Ψ(x ) including the coefficients and that GΨðx0 Þ=ð- 2mÞ extracts
the negative-energy states from Ψ(x′) including the coefficients as well. These imply
~ ðx0 Þ=ð- 2mÞ and GΨðx0 Þ=ð- 2mÞ act as the projection
that both the operators GΨ
operators (see Sects. 14.1 and 18.7). Defining P and Q as

G~ - pμ γ μ - m pμ γ μ þ m
P = = ,
ð- 2mÞ ð- 2mÞ 2m
G pμ γ μ - m - pμ γ μ þ m
Q = = ð24:374Þ
ð- 2mÞ ð- 2mÞ 2m

and referring to (24.282), we obtain

P þ Q = E: ð24:375Þ

From (24.374), we have

pμ γ μ þ m p ρ γ ρ þ m pμ γ μ pρ γ ρ þ 2mpρ γ ρ þ m2
P2 = = : ð24:376Þ
ð2mÞð2mÞ ð2mÞð2mÞ

To calculate (24.376), we take account of the following relation described by [13]

1 1
p μ γ μ pρ γ ρ = pμ pρ γ μ γ ρ = p p ðγ μ γ ρ þ γ ρ γ μ Þ = pμ pρ  2ημρ = pμ pμ
2 μ ρ 2
= p2 = m 2 , ð24:377Þ

where with the third equality we used (21.64). Substituting (24.377) into (24.376),
we rewrite it as

2m pρ γ ρ þ m pμ γ μ þ m
P2 = = = P, ð24:378Þ
ð2mÞð2mÞ 2m

where the last equality came from the first equation of (24.374). Or, using the matrix
algebra we reach the same result at once. That is, we have
1232 24 Basic Formalism

SBS - 1
P= ,
ð- 2mÞ
SBS - 1 SBS - 1 SB2 S - 1 ð- 2mÞSBS - 1 SBS - 1
P2 =  = = = = P, ð24:379Þ
ð- 2mÞ ð- 2mÞ ð- 2mÞ2 ð- 2mÞ2 ð- 2mÞ

where with the third equality we used (24.302). Similarly, we obtain

Q2 = Q: ð24:380Þ

Moreover, we have

PQ = QP = 0: ð24:381Þ

The relations (24.375) through (24.381) imply that P and Q are projection
operators. However, notice that neither P nor Q is Hermitian. This property
originally comes from the involvement of the Lorentz boost term of (24.293). In
this regard, P and Q can be categorized as projection operators sensu lato (Sect.
18.7), even though these operators are idempotent as can be seen from (24.378) and
(24.380). Equation (24.375) represents the completeness relation.
Let us consider the following example.
Example 24.6 We have an interesting and important example in relation to the
projection operators and spectral decomposition dealt with in Sect. 14.3. A projec-
tion operator was defined using an adjoint operator (see Sect. 14.1) such that

P~k = jwk ihwk j : ð24:382Þ

Notice that (24.382) postulates that P~k is Hermitian. That is,

ðjwi ihwi jÞ{ = ðhwi jÞ{ ðjwi iÞ{ = jwi ihwi j : ð14:28Þ

To apply this notation to the present case, we define following vectors in relation
to (24.304) and (24.309) as:

j1i  u1 ð0, - 1Þ, j2i  u2 ð0, þ1Þ; ð24:383Þ


j3i  v1 ð0, þ1Þ, j4i  v2 ð0, - 1Þ: ð24:384Þ

Defining

~  A ~  B
P and Q , ð24:385Þ
ð- 2mÞ ð- 2mÞ

we have
24.7 Projection Operators and Related Operators for the Dirac Equation 1233

2 2
P þ Q = E, P = P, Q = Q, PQ = 0: ð24:386Þ

Hence, both P ~ and Q~ are Hermitian projection operators (i.e., projection operators
sensu stricto; see Sect. 18.7). This is self-evident, however, notice that neither A nor
B in (24.385) involves a Lorentz boost. Then, using (24.383) and (24.384), we obtain

j1ih1jþj2ih2j þ j3ih3jþj4ih4j = E: ð24:387Þ

Rewriting (24.387) in the form of completeness relation, we have

1 0 0 0 0 0 0 0
0 0 0 0 þ 0 1 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
þ 0 0 0 0 þ 0 0 0 0 = E: ð24:388Þ
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1

Though trivial, (24.388) represents the spectral decomposition.


Example 24.7 Another important projection operator can readily be derived from
the spin associated operator defined in Chap. 21 as

1
S σp ð21:85Þ
jpj

and

~  1
S σ  ð- pÞ = - S: ð21:107Þ
j - pj

The above operators were (2, 2) submatrices (Sect. 21.3). The projection opera-
~
~ of (4, 4) full matrix can be defined by use of the direct sum of S or S
tors Σ and Σ
such that

1 S
Σ Eþ 0
S
, ð24:389Þ
2 0

~
~ 1 Eþ
Σ S 0
, ð24:390Þ
2 0 ~
S

~ are Hermitian. We can


where E is the (4, 4) identity matrix. Notice that both Σ and Σ
readily check that
1234 24 Basic Formalism

2
Σ þ Σ = E, Σ2 = Σ, Σ = Σ, ΣΣ = ΣΣ = 0: ð24:391Þ

More importantly, we have

~ { = Σ:
Σ{ = Σ and Σ ~ ð24:392Þ

~ are the projection operators sensu stricto.


In this respect, the operators Σ and Σ
In Sect.21.3.2, as the solution of the Dirac equation (21.96) we have gotten the
eigenfunction Χ described by

χ
Χ= hSχ
: ð21:100Þ

Operating Σ of (24.389) on Χ of (21.100) from the left, for instance, we obtain

1 S χ 1 χ 1 Sχ 1 χ 1
ΣΧ = Eþ 0
S
= þ = þ hχ
2 0 hSχ 2 hSχ 2 hSSχ 2 hSχ 2 hShχ

1 χ 1 χ
= þ h : ð24:393Þ
2 hSχ 2 hSχ

χ
If in (24.393) h = 1, we have ΣΧ = hSχ
. If h = - 1, on the other hand,
χ
ΣΧ = 0. Consequently, Σ extracts hSχ
intactly when h = 1, whereas it discards
χ
hSχ
in whole when h = - 1.
For another example of the projection operator, we introduce

~
~h  1 E þ h
Σ S 0
: ð24:394Þ
2 0 ~
S

~ h of (24.394) on Χ′ of (21.133) from the left, we obtain


Operating Σ

~ ~χ
~ h Χ0 ¼ 1 E þ h
Σ S 0 S~χ
¼
1 S~χ 1
þ h S S~
2 0 ~
S - h~χ 2 - h~χ 2 ~χ
- hS~

1 S~χ 1 Sh~χ 1 S~χ 1 S~χ S~χ


¼ - h~χ
þ h - hh~χ
¼ - h~χ
þ h2 - h~χ
¼ - h~χ
¼ Χ0 : ð24:395Þ
2 2 2 2

Equation (24.395) looks trivial, but it clearly shows that Χ′ is certainly a proper
negative-energy eigenvector that belongs to the helicity h. Similarly, we define
24.8 Spectral Decomposition of the Dirac Operators 1235

1 S
Σh  Eþh 0
S
: ð24:396Þ
2 0

Also, we have

χ Sχ
Σh Χ = Σ h = Χ, Σh Χ0 = Σh = 0, Σh Χ = 0, ð24:397Þ
hSχ - hχ
{
~h = Σ
Σ h 2 = Σh , Σ
2
~h, Σ
~ h Σ h = Σh Σ ~ =Σ
~ h = 0, Σ{ = Σh , Σ ~ h , etc: ð24:398Þ
h h

To the above derivation, use (21.89) and (21.115). It is obvious that both Σh and
~ h are projection operators sensu stricto.
Σ

24.8 Spectral Decomposition of the Dirac Operators

In the previous section, we have shown how the Dirac operators G and G ~ (with the
eigenvalues -2m and 0 both doubly degenerate) can be diagonalized. In Sect. 14.3
we discussed the spectral decomposition and its condition. That is, we have shown
that a necessary and sufficient condition for an operator to be a normal operator is
that the said operator is expressed as the relation (14.74) called the spectral decom-
position. This is equivalent to the statement of Theorem 14.5. In this case, relevant
projection operators are the projection operators sensu stricto (i.e., Hermitian and
idempotent).
Meanwhile, in Sects. 12.5 and 12.7 we mentioned that similar decomposition can
be done if an operator in question can be diagonalized via similarity transformation
(namely, in the case where the said operator is semi-simple). In that case, Definition
14.1 of the projection operator has been relaxed so that the projection operators
sensu lato can play a role (see Sect. 18.7). The terminology of spectral decomposi-
tion should be taken in a broader meaning accordingly. Bearing in mind the above
discussion, we examine the conditions for the spectral decomposition of the Dirac
operators G and G. ~ Neither G nor G ~ is Hermitian or normal, but both of them are
semi-simple.
Now, operating S and S-1 on both sides of (24.387) from the left and right,
respectively, we have

Sðj1ih1jþj2ih2j þ j3ih3jþj4ih4jÞS - 1 = SES - 1 = E: ð24:399Þ

If the Lorentz boost is absent, the representation matrix S related to the Lorentz
transformation is unitary (i.e., normal). In this case, (24.399) is rewritten as
1236 24 Basic Formalism

Sðj1ih1jþj2ih2j þ j3ih3jþj4ih4jÞS{ = E: ð24:400Þ

Defining

Sjki  jk0 i ðk = 1, 2, 3, 4Þ ð24:401Þ

and taking the adjoint of (24.401), we have

hkjS{  hk0 j: ð24:402Þ

Then, from (24.400) we get

j10 ih10 jþj20 ih20 j þ j30 ih30 jþj40 ih40 j = E: ð24:403Þ

Individual projection operators | k′ihk′ j (k = 1, 2, 3, 4) are Hermitian and, hence,


these operators are the projection operators sensu stricto. We have already seen this
case in Sect. 24.7. Notice again that (24.401) represents the transformation of the
Dirac spinor caused by the Lorentz transformation.
If the Lorentz boost is present, however, S is not unitary. The spectral decompo-
sition is not straightforward in that case. Instead, we have a following relation with
S described by [11]

γ 0 S{ γ 0 = S - 1 : ð24:404Þ

Using (24.404) we rewrite (24.399) as

Sðj1ih1jþj2ih2j þ j3ih3jþj4ih4jÞγ 0 S{ γ 0 = E:

Using (24.401), we have

ðj10 ih1jþj20 ih2j þ j30 ih3jþj40 ih4jÞγ 0 S{ γ 0 = E: ð24:405Þ

Note that | 1′i and | 2′i represent two linearly independent solutions ϕ′(x′) of
(24.346) and that | 3′i and | 4′i represent two linearly independent solutions χ ′(x′) of
(24.349).
Using h1| γ 0 = h1| , h2|γ 0 = h2j; h3| γ 0 = - h3| , h4|γ 0 = - h4j, we get

ðj10 ih1jþj20 ih2j - j30 ih3j - j40 ih4jÞS{ γ 0 = E: ð24:406Þ

Using (24.402), we have


24.8 Spectral Decomposition of the Dirac Operators 1237

ðj10 ih10 jþj20 ih20 j - j30 ih30 j - j40 ih40 jÞγ 0 = E: ð24:407Þ

Defining

hk 0 jγ 0  hk 0 j, ð24:408Þ

we obtain

j10 ih10 j þ j20 ih20 j - j30 ih30 j - j40 ih40 j = E: ð24:409Þ

Note that (24.408) is equivalent to (21.135) that defines the Dirac adjoint.
Equation (24.409) represents the completeness relation of the Dirac spinors; be
aware of the minus signs of the third and fourth terms. Multiplying P of (24.374)
on both sides of (24.409) from the left and using (24.349) and (24.367), we have

pμ γ μ þ m
P j10 ih10 j þ j20 ih20 j = j10 ih10 j þ j20 ih20 j = P = : ð24:410Þ
2m

In (24.410), we identify j10 ih10 j þ j20 ih20 j with ( pμγ μ + m)/2m. This can readily be
confirmed using the coordinate representation ψ 01 ðx0 Þ and ψ 02 ðx0 Þ of (24.311) for | 1′i
and | 2′i, respectively. Readers are encouraged to confirm this.
Multiplying Q of (24.374) on both sides of (24.409) from the left and using
(24.346) and (24.368), in turn, we obtain

Q - j30 ih30 j - j40 ih40 j = - j30 ih30 j - j40 ih40 j = Q: ð24:411Þ

That is, we get

pμ γ μ - m
j30 ih30 j þ j40 ih40 j = - Q = : ð24:412Þ
2m

From (24.410) and (24.412), we recover

P þ Q = E: ð24:375Þ

Rewriting (24.410), we have

- pμ γ μ - m = - 2mj10 ih10 j þ ð- 2mÞj20 ih20 j: ð24:413Þ

Also, we obtain
1238 24 Basic Formalism

j10 ih10 j  j10 ih10 j = j10 ih10 j, j20 ih20 j  j20 ih20 j = j20 ih20 j,

j10 ih10 j  j20 ih20 j = j20 ih20 j  j10 ih10 j = 0: ð24:414Þ

Similarly, we have

pμ γ μ - m = - 2m - j30 ih30 j þ ð- 2mÞ - j40 ih40 j , ð24:415Þ

j30 ih30 j  j30 ih30 j = - j30 ih30 j, j40 ih40 j  j40 ih40 j = - j40 ih40 j,

j30 ih30 j  j40 ih40 j = j40 ih40 j  j30 ih30 j = 0:

For instance, (24.415) can readily be checked using (24.311) and (24.337).
Equation (24.413) can similarly be checked from (24.282).
Looking at (24.413) and (24.415), we find the resemblance between them and
(12.196). In fact, both these cases offer examples of the spectral decomposition (see
Sects. 12.5 and 12.7). Taking account of the fact that eigenvalues of a matrix remain
unchanged after the similarity transformation (Sect. 12.1), we find that the coeffi-
cients -2m for RHS of (24.413) and (24.415) are the eigenvalues (doubly degener-
ate) for the matrix B of (24.302) and A of (24.301).
From (24.414), j10 ih10 j and j20 ih20 j are idempotent operators. Also, from
(24.415), - j30 ih30 j and - j40 ih40 j are idempotent as well. The presence of the two
idempotent operators in (24.413) and (24.415) reflects that the rank of (-pμγ μ - m)
and ( pμγ μ - m) is 2. Notice that neither the operator j30 ih30 j nor j40 ih40 j, on the other
hand, is an idempotent operator but, say, “anti-idempotent” operator. That is, for an
anti-idempotent operator A we have

A2 = - A,

where its eigenvalues are 0 or -1. The operators of both types play an important role
in QED. Meanwhile, we have

{ { {
jk0 ihk 0 j = jk 0 ihk 0 jγ 0 = γ 0 jk 0 ihk 0 j = γ 0 jk0 ihk0 j

≠ jk 0 ihk0 jγ 0 = jk0 ihk 0 j:

Hence, jk0 ihk0 j is not Hermitian. The operator - jk 0 ihk 0 j is not Hermitian, either.
Thus, the operators ± jk0 ihk0 j may be regarded as the projection operators sensu lato
accordingly. The non-Hermiticity of the projection operators results from the fact
that S is not unitary; see the discussion made in Sect. 24.7.
As mentioned earlier in this section, however, we have an exceptional case where
the Lorentz boost is absent; see Sect. 24.6.1. In this case, we can construct the
projection operators sensu stricto; see (24.327), (24.387), and (24.403). As can be
24.8 Spectral Decomposition of the Dirac Operators 1239

seen in these examples, the Dirac operator G is Hermitian. Then, as pointed out in
Sect. 21.3.4, we must have the complete orthonormal set (CONS) of common
eigenfunctions with the operators G and H according to Theorem 14.14. Then, the
CONS constitutes the diagonalizing unitary matrix S(Λ). This can easily be checked
using (21.86), (21.102), (24.319) and (24.320). With H, we obtain

-1 0 0 0
-1 { 0 1 0 0
SR HSR = SR HSR = ,
0 0 -1 0
0 0 0 1

where SR is given by (24.318); the diagonal elements are eigenvalues -1 and 1 (each
doubly degenerate) of H. Individual columns of (24.318) certainly constitute the
CONS. Denoting them from the left as | ji ( j = 1, 2, 3, 4), we have

4
hjjji = δij, j=1
jjihj j = E:

The operator | jihjj is a projection operator sensu stricto. The above results are
essentially the same as those of Example 24.6. With G ð = AÞ, the result is self-
evident.
What about S - 1 HS, then? As already seen in Sect. 21.3.2, G and H commute
with each other; see (21.93). Consequently, we expect to have common eigenvectors
for G and H. In other words, once G has been diagonalized with S as in (24.339), H
may well be diagonalized with S as well. It is indeed the case. That is, we also have

-1 0 0 0
-1 0 1 0 0
S HS = Sω- 1 SR -1
HSR Sω = , ð24:416Þ
0 0 -1 0
0 0 0 1

where Sω is given by (24.328). Thus, we find that the common eigenvectors for G
and H are designated by the individual column vectors of S given in (24.293). As
already shown in Sects. 21.4 and 24.5, these column vectors do not
constitute CONS.
~ given in (21.116), similarly we get
Regarding the operator H

1 0 0 0
-1~ -1 0 -1 0 0
S HS= -S HS = : ð24:417Þ
0 0 1 0
0 0 0 -1

We take the normalization as in (21.141) and (21.142) and define the operators
| u-i, | u+i, etc. such that
1240 24 Basic Formalism

p p p p
2m j 1 ′ j ju - i, 2m 2 ′  juþ i, 2m j3 ′  jvþ i, 2m 4 ′  jv - i;
p p p p
2m h10 j  hu - j, 2m h20 j  huþ j, 2m h30 j  hvþ j, 2m h40 j  hv - j,
ð24:418Þ

where the subscripts - and + denote the helix -1 and +1, respectively. Thus,
rewriting (24.410) and (24.412) we get the relations described by

ju - ihu - j þ juþ ihuþ j = pμ γ μ þ m, ð24:419Þ

jvþ ihvþ j þ jv - ihv - j = pμ γ μ - m: ð24:420Þ

Using (21.141), we have

ju - ihu - j  ju - ihu - j = 2mju - ihu - j, juþ ihuþ j  juþ ihuþ j = 2mjuþ ihuþ j:

Similarly, using (21.142) we obtain

jvþ ihvþ j  jvþ ihvþ j = - 2mjvþ ihvþ j, jv - ihv - j  jv - ihv - j = - 2mjv - ihv - j:

Equivalent expressions of (24.419) and (24.420) are (21.144) and (21.145),


respectively. The above relations are frequently used for various calculations includ-
ing the field interaction (see Chap. 23). Since the operator pμγ μ in (24.419) or
(24.420) is not a normal operator, neither (24.419) nor (24.420) represents a spectral
decomposition in a strict sense (see Sect. 14.3). Yet, (24.419) and (24.420) may be
referred to as the spectral decomposition in a broader sense (Sect. 12.7).
In the present section and the precedent sections, we have examined the proper-
ties of the projection operators and related characteristics of the spectral decompo-
sition in connection with the Dirac equation on the basis of their results, we wish to
summarize important aspects below.
(i) The Dirac operators (G and G)~ and helicity operators (H and H)~ are mutually
commutative. Both the operators can be diagonalized with the operator S(Λ). In
other words, individual column vectors of S(Λ) are the eigenvectors common to
~ H, and H.
all the operators G, G, ~
(ii) Although the Dirac operators are not Hermitian (nor normal) for the case of
p ≠ 0, the helicity operators are Hermitian. The non-Hermiticity of Dirac
operators implies that the diagonalizing matrix S(Λ) is not unitary. Conse-
quently, the projection operators formed by the individual column vectors of
S(Λ) may be termed projection operators sensu lato.
Combining the Dirac adjoint with the change in the functional form of the Dirac
spinor described by (24.277), we have [10]
24.9 Connectedness of the Lorentz Group 1241

-1 { 0
ψ ′ ðx0 Þ = Sψ ðxÞ = ½Sψ ðxÞ{ γ 0 = ψ { ðxÞS{ γ 0 = ψ ðxÞ γ 0 Sγ

= ψ ðxÞγ 0 S{ γ 0 = ψ ðxÞS - 1 , ð24:421Þ

where with the last equality we used (24.404). Multiplying both sides of (24.421) by
the Dirac operator G from the right, we obtain

ψ ′ ðx0 ÞG = ψ ðxÞS - 1 G = ψ ðxÞS - 1  SAS - 1 = ψ ðxÞAS - 1 ,

where with the second equality we used (24.339). Multiplying both sides of (24.421)
~ from the right, we get
by the Dirac operator G

~ = ψ ðxÞS - 1 G
ψ ′ ðx0 ÞG ~ = ψ ðxÞS - 1  SBS - 1 = ψ ðxÞBS - 1 ,

where with the second equality we used (24.342). If we choose ψ 1(x) or ψ 2(x) of
(24.305) for ψ(x), we have ψ ðxÞA = 0. This leads to

ψ ′ ðx0 ÞG = 0, ð24:422Þ

which reproduces (21.150). Similarly, if we choose ψ 3(x) or ψ 4(x) of (24.308) for


ψ(x), in turn, we have ψ ðxÞB = 0. Then, we obtain

~ = 0:
ψ ′ ð x0 Þ G ð24:423Þ

24.9 Connectedness of the Lorentz Group

In this section, we wish to examine the properties of the Lorenz groups in terms of
the Lie groups and Lie algebras. In Chap. 20 we studied the structure and properties
of SU(2) and SO(3). We mentioned that several different Lie groups share the same
Lie algebra. Of those Lie groups, we had a single universal covering group charac-
terized as the simply connected group (Sect. 20.5). As a typical example, only
SU(2) is simply connected as the universal covering group amongst the Lie groups
such as O(3), SO(3), etc., having the same Lie algebra. In this section, we examine
how the Lorentz group is positioned as part of the continuous groups.
1242 24 Basic Formalism

24.9.1 Polar Decomposition of a Non-Singular Matrix


[14, 15]

We have already studied properties of Lie algebra of the Lorentz group (Sect 24.2).
We may ask what kind of properties the Lorentz group has in relation to the above-
mentioned characteristics of SU(2) and related continuous groups. In particular, is
there a universal covering group for the Lorentz group? To answer this question, we
need a fundamentally important theorem about the matrix algebra. The theorem is
well-known as the polar decomposition of a matrix.
Theorem 24.7 Polar Decomposition Theorem [15] Any complex non-singular
matrix A is uniquely decomposed into a product such that

A = UP, ð24:424Þ

where U is a unitary matrix and P is a positive definite Hermitian matrix.


Proof Since A is non-singular, A{A is a positive definite Hermitian matrix (see Sect.
13.2). Hence, A{A can be diagonalized via unitary similarity transformation.
Namely, we have a following equation:

λ1
λ2
W { A{ AW = Λ  , ð24:425Þ

λn

where W is a suitable unitary matrix; n is a dimension of the matrix and λi (i = 1, ⋯,


n) is a real positive number with all the off-diagonal elements being zero. Defining
another positive definite Hermitian matrix Λ1/2 as
p
þ λ1 p
þ λ2
Λ 1=2
 , ð24:426Þ
⋱ p
þ λn

we rewrite (24.425) such that

W { A{ A W = Λ1=2 Λ1=2 : ð24:427Þ

Further rewriting (24.427), we have


24.9 Connectedness of the Lorentz Group 1243

WΛ1=2 W { = Λ~1=2  Λ~1=2 = Λ1=2


~ 2
A{ A = WΛ1=2 Λ1=2 W { = WΛ1=2 W { ,
ð24:428Þ

~ is defined as
where Λ1=2

~  WΛ1=2 W { :
Λ1=2 ð24:429Þ

~
Since the unitary similarity transformation holds the eigenvalue unchanged, Λ1=2
is the positive definite Hermitian operator. Meanwhile, we define a following matrix
V as

-1
~
V  A Λ1=2 : ð24:430Þ

Then, using (24.428) and (24.430) we obtain

-1 { -1 { -1 -1
V {V = ~
Λ1=2 ~
A{ A Λ1=2 = ~
Λ1=2 ~
Λ1=2
2
~
Λ1=2

-1 -1
~
= Λ1=2 Λ~1=2
2
Λ~
1=2
= E, ð24:431Þ

-1 { { -1
where with the second equality we used ~
Λ1=2 = Λ~1=2 and with the

third equality we used the Hermiticity of Λ~1=2 . To show the former relation, suppose
that we have two non-singular matrices Q and R. We have (QR){ = R{Q{. Putting
R = Q-1, we get E{ = E = (Q-1){Q{, namely (Q{)-1 = (Q-1){ [14]. Thus, from
(24.430) it follows that there exist a unitary matrix V and a positive definite
~ which satisfy (24.424) such that
Hermitian matrix Λ1=2

A = V  Λ~1=2 : ð24:432Þ

Next, suppose that there is another combination of unitary matrix U and


Hermitian matrix P other than (24.432) and see what happens. From (24.424) we
have

A{ A = P{ U { UP = P2 , ð24:433Þ

where with the second equality we used U{U = E (U: unitary) and P{ = P (P:
Hermitian). Then, from (24.428) and (24.433), we obtain
1244 24 Basic Formalism

P2 = Λ~1=2
2
: ð24:434Þ

From (24.429) we get

~
P2 = Λ1=2
2
= WΛW { : ð24:435Þ

Here, assuming P = WΩW{, we obtain

P2 = WΩ2 W { = WΛW { : ð24:436Þ

Hence, Ω2 = Λ. Then, we have


p
± λ1 p
± λ2
Ω= : ð24:437Þ
⋱ p
± λn

From the assumption, however, P is positive definite. Consequently, only the


positive sign is allowed in (24.437). Then, we must have Ω = Λ1/2, and so from the
above assumption of P = WΩW{ and (24.429) we get

P = WΛ1=2 W { = Λ~1=2 : ð24:438Þ

Then, from (24.424) we have

-1
~
U = A Λ1=2 : ð24:439Þ

From (24.430) we have U = V. This implies that the decomposition (24.424) is


unique. These complete the proof.
In relation to the proof of Theorem 24.7, we add that including Λ1/2 we have 2n
1=2
matrices Λð ± Þ that satisfy

1=2 1=2
Λð ± Þ Λð ± Þ = Λ, ð24:440Þ

1=2
where Λð ± Þ is defined as
24.9 Connectedness of the Lorentz Group 1245

p
± λ1 p
1=2 ± λ2
Λð ± Þ = : ð24:441Þ
⋱ p
± λn

Among them, however, only Λ1/2 is the positive definite operator. The
one-dimensional polar representation of a non-zero complex number is the polar
coordinate representation expressed as z = eiθr [where the unitary part is described
first and r (>0) corresponds to a positive definite eigenvalue]. Then, we can safely
say that the polar decomposition of a matrix is a multi-dimensional polar represen-
tation of the matrix.
Corollary 24.1 [14] Any complex non-singular matrix A is uniquely decomposed
into a product such that

A = P0 U 0 , ð24:442Þ

where P′ is a positive definite Hermitian matrix and U′ is a unitary matrix. (Notice


that the product order of P′ and U′ has been reversed against Theorem 24.7.)
Proof The proof can be carried out following that of Theorem 24.7 except that we
perform the calculation using AA{ instead of A{A used before. That is, we have

λ1′
{ λ2′
U′ AA{ U ′ = Ω  , ð24:443Þ

λn′

where all the off-diagonal elements are zero. Defining Ω1/2 as

þ λ1′

Ω1=2  þ λ2′ ð24:444Þ



þ λn′

and proceeding similarly as before, we get

= Ω~1=2  Ω~1=2 = Ω~1=2


2
{ { {
AA{ = U 0 Ω1=2 Ω1=2 U 0 = U 0 Ω1=2 U 0 U 0 Ω1=2 U 0 ,

where Ω~1=2 is defined as Ω~1=2  U 0 Ω1=2 U 0{ . Moreover, defining a following unitary


-1
matrix W as W  Ω~1=2 A, we obtain
1246 24 Basic Formalism

A = Ω~1=2  W:

This means the existence of the polar decomposition of (24.442). The uniqueness
of the polar decomposition can be shown as before. This completes the proof.
Corollary 24.2 [14] Let A be a non-singular matrix. Then, there exist positive
definite Hermitian matrices H1 and H2 as well as a unitary matrix U that satisfy

A = UH 1 = H 2 U: ð24:445Þ

If and only if A is a normal matrix, then we have H1 = H2. (That is, H1 = H2 and
U are commutative.)
Proof From Theorem 24.7, there must exist a positive definite Hermitian matrix H1
and a unitary matrix U that satisfy the relation A = UH1. Rewriting this, we have

A = UH 1 U { U = H~1 U, ð24:446Þ

where H~1  UH 1 U { . This is a unitary similarity transformation, and so H~1 is a


positive definite Hermitian matrix. From Corollary 24.1 the decomposition A = H~1 U
is unique, and so comparing RHSs of (24.445) and (24.446) we can choose H2 as
H 2 = H~1 . Multiplying this by U{ from the right, we get

H 2 U = H~1 U = UH 1 U { U = UH 1 : ð24:447Þ

This gives the proof of the former half of Corollary 24.2.


From Corollary 24.1, with respect to (24.447) that U and H1 commute is
equivalent to H1 = H2. Taking account of this fact, the latter half of the proof is as
follows: If U and H1 commute, we have

A{ A = H {1 U { UH 1 = H {1 H 1 = H 1 H 1 :

Also, we have

AA{ = UH 1 H {1 U { = H 1 UU { H {1 = H 1 H {1 = H 1 H 1 , ð24:448Þ

where with the second equality we used UH1 = H1U and its adjoint H {1 U { = U { H {1 .
From the above two equations, A is normal.
Conversely, if A is normal, from (24.447) we have

A{ A = H 21 = U { H {2 H 2 U = U { H 22 U = AA{ = H 22 :
24.9 Connectedness of the Lorentz Group 1247

Hence, H 21 = H 22 . Since both H1 and H2 are positive definite Hermitian, as in the


proof of Theorem 24.7 we have H1 = H2. Namely, U and H1 = H2 commute. These
complete the proof.
Regarding real non-singular matrices, we have the following corollary.
Corollary 24.3 [8] Any real non-singular matrix A is uniquely decomposed into a
product such that

A = OP, ð24:449Þ

where O is an orthogonal matrix and P is a positive definite symmetric matrix.


Proof The proof can immediately be performed using Theorem 24.7. This is
because a real unitary matrix is an orthogonal matrix and a real Hermitian matrix
is a symmetric matrix.
Since the Lorentz transformation Λ is represented by a real (4, 4) matrix, from
Corollary 24.3 we realize that the matrix Λ is expressed as (24.449); see Sect. 24.2.
Tangible examples with respect to the above theorem and corollaries can be seen in
the following examples.
Example 24.8 Let us take an example of the polar decomposition for a matrix
A expressed as

2 1
A= : ð24:450Þ
1 0

We have

5 2
AT A = : ð24:451Þ
2 1

We dealt with the diagonalization of this matrix in Example 14.3. Borrowing U in


(14.102), we have

p p 2
3þ2 2 0p 2þ1 0
D = U HU =
T
= p : ð24:452Þ
0 3-2 2 0 2-1
2

Then, we get
p
2þ1 p0
D 1=2
= ð24:453Þ
0 2-1

as well as
1248 24 Basic Formalism

D~1=2 = UD1=2 U { = UD1=2 U T


1 1 1 1
p - p p p p
= 4-2 2 4þ2 2 2þ1
p
0 4-2 2 4þ2 2
1 1 0 2-1 1 1
p p - p p
4þ2 2 4-2 2 4þ2 2 4-2 2
p p
3=p2 1=p2
= : ð24:454Þ
1= 2 1= 2

With the matrix V, we obtain

-1
p p
- 1=
V = A D~1=2
2 1 1= p
2 p 2
=
1 0 - 1= 2 3= 2
p p
1=p2 1= p2
= : ð24:455Þ
1= 2 - 1= 2

Thus, we find that V is certainly an orthogonal matrix. The unique polar decom-
position of A is given by
p p p p
= V  D~1=2 =
2 1 1=p2 1= p2 3=p2 1=p2
A= : ð24:456Þ
1 0 1= 2 - 1= 2 1= 2 1= 2

In (24.456) V and D~1=2 are commutative, as expected, because A is a real


symmetric matrix (i.e., Hermitian and normal). The confirmation that D~1=2 is positive
definite is left for readers.
Example 24.9 Let us take an example of the polar decomposition for a triangle
matrix A expressed as

1 2
A= : ð24:457Þ
0 1

Similarly proceeding as the above, we should get a result shown as


p p p p
1 2 1= p2 1=p2 1=p2 1=p2
A= = : ð24:458Þ
0 1 - 1= 2 1= 2 1= 2 3= 2

In this case, the orthogonal matrix and the positive definite symmetric matrix are
not commutative. Finding the other decomposition is left as an exercise for readers.
With respect to the transformation matrix S(Λ) of Sect. 24.4, let us think of a
following example.
24.9 Connectedness of the Lorentz Group 1249

Example 24.10 In the general case of the representation matrix described by


(24.293) of the previous section, consider a special case of θ = 0. In that case,
from (24.293) we obtain

ω ω
eiϕ=2 cosh 0 - eiϕ=2 sinh 0
2 2
ω ω
0 e - iϕ=2 cosh 0 e - iϕ=2 sinh
2 2
ω ω
- eiϕ=2 sinh 0 eiϕ=2 cosh 0
2 2
ω ω
0 e - iϕ=2 sinh 0 e - iϕ=2 cosh
2 2
eiϕ=2 0 0 0
- iϕ=2
0 e 0 0
¼ iϕ=2
0 0 e 0
0 0 0 e - iϕ=2
ω ω
cosh 0 - sinh 0
2 2
ω ω
0 cosh 0 sinh
2 2 : ð24:459Þ
ω ω
- sinh 0 cosh 0
2 2
ω ω
0 sinh 0 cosh
2 2

The LHS of Eq. (24.459) is a normal matrix and the two matrices of RHS are
commutative as expected. As in this example, if an axis of the rotation and a
direction of the Lorentz boost coincide, such successive operations of the rotation
and boost are commutative and, hence, the relevant representation matrix S(Λ) is
normal according to Corollary 24.2.
Theorem 24.7 helps understand that universal covering group exists in the
Lorentz groups. The said group is termed special linear group and abbreviated as
SL(2, ℂ). In Sects. 20.2 and 20.5, we have easily shown that SU(2) is isomorphic to
S3, because we have many constraint conditions imposed upon the parameters of
SU(2). Then, we could readily show that since S3 is simply connected, so is SU(2).
With SL(2, ℂ), on the other hand, the number of constraint conditions is limited; we
have only two conditions resulting from that the determinant of two-dimensional
complex matrices is 1. Therefore, it is not easy to examine whether SL(2, ℂ) is
simply connected. In virtue of Theorem 24.7, however, we can readily prove that
SL(2, ℂ) is indeed simply connected. It will be one of the main subjects of the next
section.
1250 24 Basic Formalism

24.9.2 Special Linear Group: SL(2, ℂ)

In this section we examine how SL(2, ℂ) is related to the Lorentz group. We start
with the most general feature of SL(2, ℂ) on the basis of the discussion of Sect. 20.5.
The discussion is based on Definitions 20.8-20.10 (Sect. 20.5) along with the
following definition (Definition 24.5).
Definition 24.5 The special linear group SL(2, ℂ) is defined as
SL(2, ℂ)  {A = (aij); i, j = 1, 2, aij 2 ℂ, detA = 1}.
Theorem 24.8 [8] The group SL(2, ℂ) is simply connected.
Proof Let g(t, s) 2 SL(2, ℂ) be an arbitrary continuous matrix function with respect
to t in a region I × I = {(t, s); 0 ≤ t ≤ 1, 0 ≤ s ≤ 1}. Also, let g(t, 1) be an arbitrary loop
in SL(2, ℂ). Then, according to Theorem 24.7, g(t, 1) can uniquely be decomposed
such that

gðt, 1Þ = U ðt ÞPðt, 1Þ, ð24:460Þ

where U(t) is unitary and P(t, 1) is positive definite Hermitian.


Since U(t) [2SL(2, ℂ)] is unitary, we must have U(t) 2 SU(2). Suppose that the
function P(t, s) can be described as

Pðt, sÞ = exp sX ðt Þ ð0 ≤ s ≤ 1Þ, ð24:461Þ

where X ðt Þ 2 slð2, ℂÞ; slð2, ℂÞ is a Lie algebra of SL(2, ℂ). Since P(t, 1) is Hermitian
and s is real, X(t) is Hermitian as well. This can immediately be derived from (15.32).
Since we have

gðt, 1Þ = U ðt Þ exp X ðt Þ and gðt, 0Þ = U ðt Þ ð24:462Þ

and expsX(t) is analytic, g(t, 1) is homotopic to g(0, t) = U(t); see Definition 20.9.
Because SU(2) is simply connected (see Sect. 20.5.3), U(t) is homotopic to any
constant loop (Definition 20.10). Consequently, from the above statement g(t, 1) is
homotopic to a constant loop. This loop can arbitrarily be chosen and g(t, 1) is an
arbitrary loop in SL(2, ℂ). Therefore, SL(2, ℂ) is simply connected.
During the process of the above proof, an important feature of SL(2, ℂ) has
already shown up. That is, Hermitian operators are contained in slð2, ℂÞ. This feature
shows a striking contrast with the Lie algebra suðnÞ in that suðnÞ comprises anti-
Hermitian operators only.
Theorem 24.9 [8] The Lorentz group O(3, 1) is self-conjugate. That is, if
Λ 2 O(3, 1), then ΛT 2 O(3, 1).
Proof From (24.198), for a group element Λ to constitute the Lorentz group O(3, 1)
we have
24.9 Connectedness of the Lorentz Group 1251

ΛT ηΛ = η:

Multiplying both sides by Λη from the left and Λ-1η from the right, we obtain

Λη  ΛT ηΛ  Λ - 1 η = Λη  η  Λ - 1 η:

Using η2 = 1, we get

T
ΛηΛT = η or ΛT ηΛT = η:

This implies ΛT 2 O(3, 1).


Theorem 24.10 [8] Suppose that with the Lorentz group O(3, 1), individual ele-
ments Λ 2 O(3, 1) are decomposed into

Λ = UP: ð24:463Þ

Then, their factors U and P belong to O(3, 1) as well with P described by

P = exp Y, ð24:464Þ

where Y 2 oð3, 1Þ is a real symmetric matrix.


Proof From Theorem 24.9, ΛT 2 O(3, 1). Then, ΛTΛ 2 O(3, 1). Since Λ is real, from
Corollary 24.3 U is an orthogonal matrix and P is a positive definite symmetric
matrix. Meanwhile, we have

ΛT Λ = PT U T UP = PT P = P2 , ð24:465Þ

where with the second equality we used the fact that U is an orthogonal matrix; with
the last equality we used the fact that P is symmetric. Then, P2 2 O(3, 1). Therefore,
using ∃ X 2 oð3, 1Þ, P2 can be expressed as

P2 = exp X: ð24:466Þ

Since P is a positive definite symmetric matrix, so is P2. Taking the transposition


of (24.466), we have

T
P2 = P2 = ðexp X ÞT = exp X T , ð24:467Þ

where with the last equality we used (15.31). Comparing (24.466) and (24.467), we
obtain
1252 24 Basic Formalism

exp X = exp X T : ð24:468Þ

Since (24.468) equals the positive definite matrix P2, a logarithmic function
corresponding to (24.468) uniquely exists [8]. Consequently, taking the logarithm
of (24.468) we get

X T = X: ð24:469Þ

Namely, X is real symmetric. As X=2 2 oð3, 1Þ, exp(X/2) 2 O(3, 1). Since P and
P2 are positive definite, exp(X/2) is uniquely determined such that

1=2
expðX=2Þ = P2 = P: ð24:470Þ

Hence, P 2 O(3, 1). Putting X/2  Y, we obtain (24.464), to which Y 2 oð3, 1Þ is


real symmetric. From (24.463) we have U = ΛP-1, and so U 2 O(3, 1). This
completes the proof.
Theorem 24.10 shows that O(3, 1) comprises the direct product of a compact
subgroup O(3, 1) O(4) and exp ~oð3, 1Þ, where ~oð3, 1Þ is a subspace of oð3, 1Þ
consisting of real symmetric matrices. We have another two important theorems with
the Lorentz group O(3, 1).
Theorem 24.11 [8] The compact subgroup O(3, 1) O(4) is represented as

Oð3, 1Þ Oð4Þ = Oð3Þ Oð1Þ: ð24:471Þ

That is, any elements Λ of O(3, 1) O(4) can be described by

±1 0 0 0
0
ð24:472Þ

In an opposite way, any elements that can be expressed as (24.472) belong to


O(3, 1) O(4).
Proof We write Λ such that

Λ= g0
g2
g1
g3
; g0 2 ℝ, g1 : ð1, 3Þ matrix, g2 : ð3, 1Þ matrix,
g3 2 glð3, ℝÞ: ð24:473Þ

As Λ is an orthogonal matrix, i.e., ΛT = Λ-1, ΛTηΛ = η is equivalent to


24.9 Connectedness of the Lorentz Group 1253

g0 g1 g0 - g1
ηΛ = = = Λη: ð24:474Þ
- g2 - g3 g2 - g3

Therefore, we have g1 = g2 = 0. From ΛTηΛ = η and the above result, we have

g0 g2T g0 g1 g20 0
ΛT ηΛ = =
g1T g3T - g2 - g3 0 - g3T g3
1 0 0 0
0 -1 0 0
= = η: ð24:475Þ
0 0 -1 0
0 0 0 -1

From the above equation, we have

g0 = ± 1 and gT3 g3 = 1: ð24:476Þ

The second equation of (24.476) implies that g3 is an orthogonal matrix. In an


opposite way, any elements given in a form of (24.472) must belong to
O(3, 1) O(4). These complete the proof.
Theorem 24.12 [8] The Lorentz group O(3, 1) consists of four connected compo-
nents L0  SO0 ð3, 1Þ, L1 , L2 , and L3 that contain

1 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0
E0  E = , E1 = ,
0 0 1 0 0 0 1 0
0 0 0 1 0 0 0 -1

-1 0 0 0 -1 0 0 0
0 1 0 0 0 1 0 0
E2 = , E3 = , ð24:477Þ
0 0 1 0 0 0 1 0
0 0 0 1 0 0 0 -1

respectively.
Proof As in the case of the proof of Theorem 24.8, an arbitrary element Λ of the
Lorentz group O(3, 1) can be represented as

Λ = Oðt ÞPðt, sÞ, ð24:478Þ

where O(t) is an orthogonal matrix of O(4); P(t, s) = exp sX(t) for which X(t) is a real
symmetric matrix according to Theorem 24.10. Both O(t) and P(t, s) belong to
O(3, 1) (Theorem 24.10). Then, we have O(t) 2 O(3, 1) O(4).
1254 24 Basic Formalism

Consequently, making the parameter s change continuously from 1 to 0, we find


8
Λ 2 O(3, 1) is connected to O(t) in O(3, 1). Since O(t) 2 O(3, 1) O(4), O(t) must
be described in the form of (24.472) according to Theorem 24.11. The elements
Ei (i = 0, 1, 2, 3) of (24.477) certainly belong to O(t) and, hence, belong to
O(3, 1) O(4) as well.
Meanwhile, Example 20.8 tells us that within a connected set the sign of the
determinant should be constant, if the determinant is defined as a real continuous
function. Also, we have already known that O(n) is decomposed into two connected
sets (Sect. 20.5). In the case of O(3), each set is connected to either of the following
two matrices according to the determinant ±1:

1 0 0 1 0 0
0
O= 0 1 0 ,O = 0 1 0 : ð24:479Þ
0 0 1 0 0 -1

For this, see (20.326) and (20.327). Taking account of O(1) = {+1, -1} in
(24.471), we have 2 × 2 = 4 connected sets in O(3, 1). These four sets originate
from the combinations of (24.479) with ±1 of O(1) and the individual connected sets
are represented by Ei (i = 0, 1, 2, 3) of (24.477). Suppose that L0 , L1 , L2 , and L3
comprise all the elements that are connected to E0, E1, E2, and E3, respectively.
Then, these Li ði = 0, 1, 2, 3Þ are connected sets and arbitrary elements of O(3, 1) are
contained in one of them. Thus, it suffices to show that Li \ Lj = ∅ði ≠ jÞ:
First, from ΛTηΛ = η we have (detΛ)2 = 1, i.e., detΛ = ± 1. From (24.477) we
have detE0 = det E3 = 1 and detE1 = det E2 = - 1. Then, all the elements Λ
belonging to L0 and L3 have detΛ = 1 and those belonging to L1 and L2 have
detΛ = - 1. From Example 20.8, the elements of L0 cannot be connected to the
elements of L1 or those of L2 . Similarly, the elements of L3 cannot be connected to
the elements of L1 or those of L2 .
Let Λ0 2 L0 and let Λ0 be expressed as Λ0 = O0P0 with the polar decomposition.
The element Λ0 is connected to O0, and so O0 2 L0 from Theorem 24.10. Since L0 is
a collection of the elements that are connected to E0, so is O0 ½2 L0 \ Oð4Þ. The
element E0 can be expressed as E 0 = O ~ fþ1g so that detO ~ = 1 in (24.479) and det
{+1} = 1. Meanwhile, let Λ3 2 L3 be described by Λ3 = O3P3. The element Λ3 is
connected to O3, and so O3 2 L3 . Similarly to the above case, O3 ½2 L3 \ Oð4Þ is
connected to E3, which is expressed as E 3 = O~0 f - 1g so that detO~0 = - 1 in
(24.479) and det{-1} = - 1. Again, from Example 20.8, within a connected set the
sign of the determinant should be constant. Therefore, the elements of L0 cannot be
connected to the elements of L3 . The discussion equally applies to the relationship
between L1 and L2 .
Thus, after all, we get
24.9 Connectedness of the Lorentz Group 1255

Oð3, 1Þ = C ðE0 Þ [ C ðE1 Þ [ C ðE2 Þ [ C ðE3 Þ,


CðE i Þ \ C E j = ∅ði ≠ jÞ, ð24:480Þ

where CðE i Þ  Li . These complete the proof.


In the above proof, (24.480) is in parallel with (20.318) of O(3), where the
number of the connected components is two. The group L0  SO0 ð3, 1Þ is a
connected component that contains the identity element E0. Then, according to
Theorem 20.9, SO0(3, 1) is an invariant subgroup of O(3, 1). Thus, following Sect.
16.3 we have a coset decomposition described by

Oð3, 1Þ = E0 SO0 ð3, 1Þ þ E 1 SO0 ð3, 1Þ þ E 2 SO0 ð3, 1Þ þ E 3 SO0 ð3, 1Þ: ð24:481Þ

Summarizing the above, we classify the connected components such that [8]:

L0  SO0 ð3, 1Þ = fΛ; Λ 2 Oð3, 1Þ, detΛ = 1, Λ00 ≥ 1g,


L1 = fΛ; Λ 2 Oð3, 1Þ, detΛ = - 1, Λ00 ≥ 1g,
L2 = fΛ; Λ 2 Oð3, 1Þ, detΛ = - 1, Λ00 ≤ - 1g,
L3 = fΛ; Λ 2 Oð3, 1Þ, detΛ = 1, Λ00 ≤ - 1g: ð24:482Þ

The set {E0, E1, E2, E3} forms a group, which is isomorphic to the four-group we
dealt with in Sects. 17.1 and 17.2. In parallel with (20.328), we have

Oð3, 1Þ=SO0 ð3, 1Þ ffi fE0 , E 1 , E 2 , E3 g: ð24:483Þ

In theoretical physics, instead of E1 the following E~1 is often used as a represen-


tative element:

1 0 0 0
0 -1 0 0
E~1 ¼
0 0 -1 0
0 0 0 -1

The element E~1 represents the inversion symmetry (i.e., spatial inversion) and is
connected to E1. This can be shown as in the case of Example 20.6. Suppose a
following continuous matrix function:

1 0 0 0
0 cos πt - sin πt 0
f ðt Þ = :
0 sin πt cos πt 0
0 0 0 -1
1256 24 Basic Formalism

Then, f(t) is a continuous matrix function with respect to real t; f(t) represents
product transformations of mirror symmetry with respect to the xy-plane and rotation
about the z-axis. We have f(0) = E1 and f ð1Þ = E~1 . Hence, f(0) and f(1) are connected
to each other in L1 .
The element E1 (or E~1 ) is known as a discrete symmetry along with the element
E2 that represents the time reversal. The matrix C defined in (21.164), on the other
hand, is not a Lorentz transformation. This is known from Theorem 24.11. Namely,
C is unitary, and so if C belonged to O(3, 1), C would be described in a form of
(24.472). But it is not the case. This is associated with the fact that C represents the
exchange between a pair of particle and antiparticle. Note that this is not a space-time
transformation but an “inner” transformation.
As a result of Theorem 24.12 and on the basis of Theorem 20.10, we reach the
following important consequence: The Lie algebras oð3, 1Þ of O(3, 1) and so0 ð3, 1Þ,
of SO0(3, 1) are identical.
With the modulus of Λ00 , see (24.207). The special theory of relativity is based
upon L0 = O0 ð3, 1Þ where neither the spatial inversion nor time reversal is assumed.
As a tangible example, expðθA3 Þ of (24.224) and expð- ωB3 Þ of (24.225) satisfy
this criterion. Equation (24.223) reflects the polar decomposition. The representation
matrix S(Λ) of (24.293) with respect to the transformation of the Dirac spinor gives
another example of the polar decomposition. In this respect, we show some exam-
ples below.
Example 24.11: Lorentz Transformation Matrix In (24.223), expða  AÞ and
expðb  BÞ represent an orthogonal matrix and a positive definite symmetric matrix,
respectively. From Theorems 24.10 and 24.11, we have

SO0 ð3, 1Þ SOð4Þ = SOð3Þ SOð1Þ, ð24:484Þ

~ of SO0(3, 1)
where SO(1)  {1}. Any elements Λ SO(4) can therefore be described
by

1 0 0 0
0
ð24:485Þ

In fact, (24.224) is an orthogonal matrix, i.e., a specific type of (24.485), and a


unitary matrix component of the polar decomposition (Theorem 24.7 and Corollary
24.3). Equation (24.225) represents an example of a positive definite real symmetric
matrix (i.e., Hermitian) component of the polar decomposition. The matrix
expð- ωB3 Þ of (24.225) has eigenvalues

cosh ω þ sinh ω, cosh ω - sinh ω, 1, 1: ð24:486Þ


24.9 Connectedness of the Lorentz Group 1257

Since all these eigenvalues are positive and det expð- ωB3 Þ = 1, expð- ωB3 Þ is
positive definite (see Sect. 13.2) and the product of the eigenvalues of (24.486) is
1, as can be easily checked. Consequently, (24.223) is a (unique) polar
decomposition.
Example 24.12: Transformation Matrix S(Λ) of Dirac Spinor As already shown,
(24.293) describes the representation matrix S(Λ) with respect to the transformation
of the Dirac spinor caused by the Lorentz transformation. The RHS of (24.293) can
be decomposed such that

Sð Λ Þ ¼
iϕ θ iϕ θ
e 2 cos e 2 sin 0 0
2 2
iϕ θ iϕ θ
- e - 2 sin e - 2 cos 0 0
2 2
iϕ θ iϕ θ
0 0 e 2 cos e 2 sin
2 2
iϕ θ iϕ θ
0 0 - e - 2 sin e - 2 cos
2 2
ω ω
cosh 0 - sinh 0
2 2
ω ω
0 cosh 0 sinh
× 2 2 : ð24:487Þ
ω ω
- sinh 0 cosh 0
2 2
ω ω
0 sinh 0 cosh
2 2

The first factor of (24.487) is unitary and the second factor is a real symmetric
matrix. The eigenvalues of the second factor are positive. These are given by

ω ω ω
cosh þ sinh ðdoubly degenerateÞ, cosh
2 2 2
ω
- sinh ðdoubly degenerateÞ ð24:488Þ
2

with a determinant of 1 (>0). Hence, the second matrix is positive definite. Thus, as
in the case of Example 24.10, S(Λ) gives a unique polar decomposition.
Given the above tangible examples, we wish to generalize the argument a little bit
further. In Theorem 24.7 and Corollary 24.1 of Sect. 24.9.1 we mentioned two ways
of the polar decomposition. What about the inverse Lorentz transformation, then?
The inverse transformation of (24.424) is described by
1258 24 Basic Formalism

A - 1 = P - 1U - 1: ð24:489Þ

Here, suppose that a positive definite Hermitian operator P is represented by a


(n, n) square matrix. From Theorem 14.5 using a suitable unitary matrix U, we have
a following diagonal matrix D

λ1 ⋯
U { PU = D  ⋮ ⋱ ⋮ , ð24:490Þ
⋯ λn

where λ1, ⋯, λn are real positive numbers. Taking the inverse of (24.490), we obtain

1=λ1 ⋯
-1
U - 1P - 1 U{ = U{P - 1U = ⋮ ⋱ ⋮ : ð24:491Þ
⋯ 1=λn

Meanwhile, we have

{ -1
P-1 = P{ = P - 1: ð24:492Þ

Equations (24.491) and (24.492) imply that P-1 is a positive definite Hermitian
operator. Thus, the above argument ensures that the inverse transformation A-1 of
(24.489) can be dealt with on the same footing as the transformation A. In fact, in
both Examples 24.11 and 24.12, eigenvalues of the positive definite real symmetric
(i.e., Hermitian) matrices are unchanged, or switched to one another (consider e.g.,
exchange between ω and -ω) after the inversion.
We have known from Theorem 24.8 that an arbitrary continuous function g(t,
s) 2 SL(2, ℂ) is homotopic to U(t) 2 SU(2) and concluded that since SU(2) is simply
connected, SL(2, ℂ) is simply connected as well. As for the Lorentz group O(3, 1), on
the other hand, O(3, 1) is homotopic to O(t) 2 O(4), but since O(4) is not simply
connected, we infer that O(3, 1) is not simply connected, either. In the next section,
we investigate the representation of O(3, 1), especially that of the proper
orthochronous Lorentz group SO0(3, 1). We further investigate the connectedness
of the Lorentz group with this special but important case.

24.10 Representation of the Proper Orthochronous Lorentz


Group SO0(3, 1) [6, 8]

Amongst four connected components of the Lorentz group O(3, 1), the component
that is connected to the identity element E0  E does not contain the spatial inversion
or time reversal. This component itself forms a group termed SO0(3, 1) and fre-
quently appears in the elementary particle physics.
24.10 Representation of the Proper Orthochronous Lorentz. . . 1259

Let us think of a following set V4 represented by

V4 = H = hij ; i, j = 1, 2, hij 2 ℂ, H : Hermitian , ð24:493Þ

where (hij) is a complex (2, 2) Hermitian matrix. That is, V4 is a set of all (2, 2)
Hermitian matrices. The set V4 is a vector space on a real number field ℝ. To show
this, we express an arbitrary element H of V4 as

a c þ id
H= ða; b; c; d : realÞ: ð24:494Þ
c - id b

We put

1 0 0 1 0 i -1 0
σ0 = , σ1 = , σ2 = , σ3 = : ð24:495Þ
0 1 1 0 -i 0 0 1

Notice that σ k (k = 1, 2, 3) are the Pauli spin matrices that appeared in (20.41).
Also, σ k is expressed as

σ k = 2iζ k ðk = 1, 2, 3Þ,

where ζ k is given by (20.270). Whereas ζ k is anti-Hermitian, σ k is Hermitian. Then,


H of (24.494) can be described with Hermitian operators σ μ (μ = 0, 1, 2, 3) by

1 1
H= ða þ bÞσ 0 þ ð- a þ bÞσ 3 þ cσ 1 þ dσ 2 : ð24:496Þ
2 2

Equation (24.496) implies that any vector H belonging to V4 is expressed as a


linear combination of the four basis vectors σ μ (μ = 0, 1, 2, 3). Since four coefficients
of σ μ (μ = 0, 1, 2, 3) are real, it means that V4 is a four-dimensional vector space on a
real number field ℝ. If we view V4 as an inner product space, the orthonormal set is
given by

1 1 1 1
p σ0, p σ1 , p σ2, p σ3: ð24:497Þ
2 2 2 2

Notice that among σ μ (μ = 0, 1, 2, 3), σ 0 is not an element of the Lie algebra


slð2, ℂÞ since the trace of σ 0 is not zero.
Meanwhile, putting

x0 - x3 x1 þ ix2
H= x0 ; x1 ; x2 ; x3 : real , ð24:498Þ
x1 - ix2 x0 þ x 3

we have
1260 24 Basic Formalism

H = x0 σ 0 þ x1 σ 1 þ x2 σ 2 þ x3 σ 3 = xμ σ μ : ð24:499Þ

That is, (24.498) gives the coordinates xμ (μ = 0, 1, 2, 3) in reference to the basis


vectors σ μ. Now, we consider the following operation:

H ⟼ qHq{ ½q 2 SLð2, ℂÞ, ð24:500Þ

with

q= a
c
b
d
, detq = ad - bc = 1: ð24:501Þ

We have (qHq{){ = qH{q{ = qHq{. That is, qHq{ 2 V4 . Defining this mapping
as φ[q], we write

φ½q : V4 ⟶ V4 , φ½qðH Þ  qHq{ : ð24:502Þ

The mapping φ[q] is a linear transformation within V4 . It is because we have

φ½qðH 1 þ H 2 Þ = φ½qðH 1 Þ þ φ½qðH 2 Þ, φ½qðαH Þ = αφ½qðH Þ ðα 2 ℝÞ: ð24:503Þ

Also, we have

φ½qpðH Þ = qpH ðqpÞ{ = qpHp{ q{ = q pHp{ q{ = φ½q½φ½pðH Þ


= φ½qφ½pðH Þ: ð24:504Þ

Thus, we find that q ⟼ φ[q] is a real representation of SL(2, ℂ) over V4 .


Using the notation of Sect. 11.2, we have

x0
x1
H ′ = φ½qðH Þ = ðσ 0 σ 1 σ 2 σ 3 Þφ½q
x2
x3
x0
x1
= ðσ 0 φ½q σ 1 φ½q σ 2 φ½q σ 3 φ½qÞ : ð24:505Þ
x2
x3

Then, with e.g., σ 0φ[q] we obtain


24.10 Representation of the Proper Orthochronous Lorentz. . . 1261

a b 1 0 a c
σ 0 φ½q = φ½qðσ 0 Þ = qσ 0 q{ =
c d 0 1 b d
jaj2 þjbj2 ac þbd 
= a cþb d jcj2 þjdj2
: ð24:506Þ

Using the relation (24.496), after the transformation φ[q](σ 0) we get

1 2 1 2
H0 = aj2 þ b 2 þ cj2 þ d σ0 þ cj2 þ d 2 - aj2 - b σ3
2 2
þℜeðac þ bd  Þσ 1 þ ℑmðac þ bd Þσ 2 : ð24:507Þ

The coefficients of each σ i (i = 0, 1, 2, 3) act as their coordinates. Meanwhile,


from (24.498) we have

2 2 2 2
detH = x0 - x1 - x2 - x3 : ð24:508Þ

Equation (24.7) is identical with the bilinear form introduced into the
Minkowski space 4 . We know from (24.500) and (24.501) that the transforma-
tion φ[q] keeps the determinant unchanged. That is, the bilinear form (24.508) is
invariant under this transformation. This implies that φ[q] is associated with the
Lorentz transformation.
To see whether this is really the case, we continue the calculations pertinent to
(24.507). In a similar manner, we obtain related results H′ after the transformations
φ[q](σ 1), φ[q](σ 2), and φ[q](σ 3). The results are shown collectively as [6]

ðσ 0 σ 1 σ 2 σ 3 Þφ½q
= ðσ 0 σ 1 σ 2 σ 3 Þ ×
1 1
jaj2 þ jbj2 þ jcj2 þ jd j2 Reðab þ cd  Þ - Imðab þ cd Þ jbj2 þ jdj2 - jaj2 - jcj2
2 2
Reðac þ bd  Þ Reðad  þ bc Þ - Imðad  - bc Þ Reðbd  - ac Þ
:
Imðac þ bd  Þ Imðad  þ bc Þ Reðad  - bc Þ Imðbd  - ac Þ
1 1
jcj þ jd j2 - jaj2 - jbj2
2
Reðcd  - ab Þ Imðab - cd Þ jaj2 þ jd j2 - jbj2 - jcj2
2 2
ð24:509Þ

Equation (24.509) clearly gives a matrix representation of SO0(3, 1). As already


noticed, the present method based on (24.500) is in parallel to the derivation of
(20.302) containing Euler angles performed by the use of the adjoint representation.
In either case, we have represented a vector in the real vector space (either three-
dimensional or four-dimensional) and obtained the matrices (20.302) or (24.509) as
a set of the expansion coefficients with respect to the basis vectors.
Let us further inspect the properties of (24.509) in the following example.
1262 24 Basic Formalism

Example 24.13 We may freely choose the parameters a, b, c, and d under the
restriction of (24.501). Let us choose b = c = 0 and a = eiθ/2 and d = e-iθ/2 in
(24.501). That is,

eiθ=2 0
q= : ð24:510Þ
0 e - iθ=2

Then, we have ad = eiθ to get

1 0 0 0
0 cos θ - sin θ 0
φ½q = : ð24:511Þ
0 sin θ cos θ 0
0 0 0 1

This is just a rotation by θ around the z-axis. Next, we choose a = d = cosh (ω/2)
and b = c = - sinh (ω/2). Namely,

coshðω=2Þ - sinhðω=2Þ
q= : ð24:512Þ
- sinhðω=2Þ coshðω=2Þ

Then, we obtain

coshω - sinhω 0 0
- sinhω coshω 0 0
φ½ q = : ð24:513Þ
0 0 1 0
0 0 0 1

ω ω
Moreover, if we choose b = c = 0 and a = cosh 2 - sinh 2 and d =
cosh ω2 þ sinh ω2 in (24.501), we have

ω ω
cosh - sinh 0
q= 2 2 : ð24:514Þ
ω ω
0 cosh þ sinh
2 2

Then, we get

coshω 0 0 sinhω
φ½ q = 0 1 0 0 : ð24:515Þ
0 0 1 0
sinhω 0 0 coshω

The latter two illustrations represent Lorentz boosts. We find that the (2, 2) matrix
of (24.510) belongs to SU(2) and that the (2, 2) matrices of (24.512) and (24.514) are
positive definite symmetric matrices. In terms of Theorem 24.7 and Corollary 24.3,
24.10 Representation of the Proper Orthochronous Lorentz. . . 1263

these matrices have already been decomposed. If we expressly (or purposely) write
in a decomposed form, we may do so with the above examples such that

eiθ=2 0 1 0
q= , ð24:516Þ
0 e - iθ=2 0 1
1 0 coshðω=2Þ - sinhðω=2Þ
q= : ð24:517Þ
0 1 - sinhðω=2Þ coshðω=2Þ

The identity matrix of (24.516) behaves as a positive definite Hermitian matrix


P in relation to Theorem 24.7. The identity matrix of (24.517) behaves as an
orthogonal matrix O in relation to Corollary 24.3.
The parameter θ in (24.510) is bounded. A group characterized by a bounded
parameter is called a compact group, to which a unitary representation can be
constructed (Theorem 18.1). In contrast, the parameter ω of (24.512) and (24.514)
ranges from -1 to +1. A group characterized by an unbounded parameter is called
a non-compact group and the unitary representation is in general impossible. Equa-
tions (24.11) and (24.13) are typical examples.
We investigate the constitution of the Lorentz group a little further. In Sect. 20.4,
we have shown that

SU ð2Þ=F ffi SOð3Þ, F = fe, - eg, ð24:518Þ

where F is the kernel and e is the identity element. To show this, it was important to
show the existence of the kernel F . In the present case, we wish to show the
characteristic of the kernel N with respect to the homomorphism mapping φ, namely

N = fq 2 SLð2, ℂÞ; φ½q = φ½σ 0 g: ð24:519Þ

First, we show the following corollary.


Corollary 24.4 [8] Let ρ be a continuous representation of a connected linear Lie
group G and let eG be the identity element of G. Let H be another Lie group and let
H0 be a connected component of the identity element eH of H. We express the image
of G by ρ as ρ(G). If ρ(G) is contained in a Lie group H, then ρ(G) is contained in a
connected component H0.
Proof Since ρ transforms the identity element eG to eH, eH 2 ρ(G). The identity
element eH is contained in H. Because ρ(G) is an image of the connected set G, in
virtue of Theorem 20.8 we find that ρ(G) is also a connected subset of H. We have
eH 2 ρ(G), and so this implies that all the elements of ρ(G) are connected to eH. This
means that ρ(G) ⊂ H0. This completes the proof.
The representation φ[q] keeps the bilinear form given by detH of (24.508)
invariant and φ[q], to which q 2 SL(2, ℂ), is the representation of SO0(3, 1).
Meanwhile, SL(2, ℂ) is (simply) connected and SO0(3, 1) is the connected Lie
1264 24 Basic Formalism

group that contains the identity element E0. Then, according to Corollary 24.4, we
have

φ½SLð2, ℂÞ ⊂ SO0 ð3, 1Þ: ð24:520Þ

As already seen in (24.509), φ[q] [q 2 SL(2, ℂ)] has given the matrix represen-
tation of the Lorentz group SO0(3, 1). This seems to imply at first glance that

SO0 ð3, 1Þ ⊂ φ½SLð2, ℂÞ: ð24:521Þ

Strictly speaking to assert (24.521), however, we need the knowledge of the


differential representation and Theorem 25.2 based on it. Equation (24.21) will be
proven in Sect. 25.1 accordingly. Taking (24.521) in advance and combining
(24.520) with (24.521), we get

φ½SLð2, ℂÞ = SO0 ð3, 1Þ: ð24:522Þ

Now, we have a question of what kind of property φ has (see Sect. 16.4). In terms
of the mapping, if we have

f ð x1 Þ = f ð x 2 Þ ⟹ x1 = x2 , ð24:523Þ

the mapping f is injective (Sect. 11.2). If (24.523) is not true, f is not injective, and so
f is not isomorphic either. Therefore, our question can be translated into whether or
not the following proposition is true of φ:

φ ½ q  = φ½ σ 0  ⟹ q = σ0: ð24:524Þ

Meanwhile, φ[σ 0] gives the identity mapping (Sect. 16.4). In fact, we have

φ½σ 0 ðH Þ = σ 0 Hσ 0 { = σ 0 Hσ 0 = H: ð24:525Þ

Consequently, our question boils down to the question of whether or not we can
find q (≠σ 0) for which φ[q] gives the identity mapping. If we find such q, φ[q] is not
isomorphic, but has a non-trivial kernel. Let us get back to our current issue. We
have

φ½ - σ 0 ðH Þ = ð- σ 0 ÞH - σ 0 { = σ 0 Hσ 0 = H: ð24:526Þ

Then, φ[-σ 0] gives the identity mapping as well; that is, φ[-σ 0] = φ[σ 0]. Equation
(24.24) does not hold with φ accordingly. Therefore, according to Theorem 16.2 φ is
not isomorphic. To fully characterize the representation φ, we must seek element
(s) other than -σ 0 as a candidate for the kernel.
24.10 Representation of the Proper Orthochronous Lorentz. . . 1265

Now, suppose that q 2 SL(2, ℂ) is an element of the kernel N , i.e., q 2 N . Then,


for an arbitrary element H 2 V4 defined in (24.493) we must have

φ½qðH Þ = qHq{ = H:

In a special case where we choose σ 0 (2H ), the following relation must hold:

φ½qðσ 0 Þ = qσ 0 q{ = qq{ = σ 0 , ð24:527Þ

where the last equality comes from the supposition that φ[q] is contained in the
kernel N . This means that q is unitary. That is, q{ = q-1. Then, for any H 2 V4 we
have

φ½qðH Þ = qHq{ = qHq - 1 = H or qH = Hq: ð24:528Þ

Meanwhile, any (2, 2) complex matrices z can be expressed as

z = x þ iy, x, y 2 V4 , ð24:529Þ

which will be proven in Theorem 24.14 given just below. Then, from both (24.528)
and (24.529), q must be commutative with any (2, 2) complex matrices. Let C be a
set of all the (2, 2) complex matrices described as

C = C = cij ; i, j = 1, 2, cij 2 ℂ : ð24:530Þ

Then, since C ⊃ SLð2, ℂÞ, it follows that q is commutative with 8g 2 SL(2, ℂ).
Consequently, according to Schur’s Second Lemma q must be expressed as

q = αE ½α 2 ℂ, E : ð2, 2Þ identity matrix:

Since detq = 1, we must have α2 = 1, i.e., α = ± 1. This implies that we have


only the possibility that the elements of N are limited to q that satisfies

q = σ0 or q = - σ0: ð24:531Þ

Thus, we obtain

N = fσ 0 , - σ 0 g: ð24:532Þ

Any other elements belonging to V4 are excluded from the kernel N . This
implies that φ is a two-to-one homomorphic mapping. At the same time, from
Theorem 16.4 (Homomorphism Theorem) we obtain
1266 24 Basic Formalism

SO0 ð3, 1Þ ffi SLð2, ℂÞ=N : ð24:533Þ

Equation (24.32) is in parallel to the relationship between SU(2) and SO(3)


discussed in Sect. 20.4.3; see (20.314). Thus, we reach a next important theorem.
Theorem 24.13 [8] Suppose that we have a representation φ[q] [q 2 SL(2, ℂ)] on
V4 described by

φ½qðH Þ : V4 ⟶ V4 , φ½qðH Þ = qHq{ : ð24:502Þ

Then, φ defined by φ : SL(2, ℂ) ⟶ SO0(3, 1) is a two-to-one surjective homo-


morphic mapping from SL(2, ℂ) to SO0(3, 1). Namely, we have
SO0 ð3, 1Þ ffi SLð2, ℂÞ=N , where N = fσ 0 , - σ 0 g.
Thus, we find that the relationship between SO0(3, 1) and SL(2, ℂ) can be dealt
with in parallel with that between SO(3) and SU(2). As the representation of
SU(2) has led to that of SO(3), so the representation of SL(2, ℂ) boils down to that
of SO0(3, 1). We will come back to this issue in Sect. 25.1.
Moreover, we have simple but important theorems on the complex matrices and,
more specifically, on slð2, ℂÞ.
Theorem 24.14 [8] Any (2, 2) complex matrices z can be expressed as

z = x þ iy x, y 2 V4 : ð24:534Þ

Proof Let z be described as

a þ ib c þ id
z= ða; b; c; d; p; q; r; s 2 ℝÞ: ð24:535Þ
p þ iq r þ is

Then, z is written as

1
z= ½ða þ rÞσ 0 þ ð- a þ r Þσ 3 þ ðc þ pÞσ 1 þ ðd - qÞσ 2 
2
i
þ ½ðb þ sÞσ 0 þ ð- b þ sÞσ 3 þ ðd þ qÞσ 1 þ ð- c þ pÞσ 2 , ð24:536Þ
2

where σ i ði = 0, 1, 2, 3Þ 2 V4 is given by (24.495). The first and second terms of


(24.536) are evidently expressed as x and iy of (24.534), respectively. This com-
pletes the proof.
Note in Theorem 24.14 all the (2, 2) complex matrices z form the Lie algebra
glð2, ℂÞ. Therefore, from (24.535) glð2, ℂÞ has eight degrees of freedom, namely,
glð2, ℂÞ is an eight-dimensional vector space.
24.10 Representation of the Proper Orthochronous Lorentz. . . 1267

Theorem 24.15 [8] Let Z be any arbitrary element of slð2, ℂÞ. Then, z is uniquely
described as

Z = X þ iY ½X, Y 2 suð2Þ: ð24:537Þ

Proof Let X and Y be described as

X = Z - Z { =2, Y = Z þ Z { =2i: ð24:538Þ

Then, (24.537) certainly holds. Both X and Y are anti-Hermitian. As Z is traceless,


so are X and Y. Therefore, X, Y 2 suð2Þ. Next, suppose that X′(≠X) and Y′(≠Y )
satisfy (24.538). That is,

Z = X 0 þ iY 0 ½X 0 , Y 0 2 suð2Þ: ð24:539Þ

Subtracting (24.537) from (24.539), we obtain

ðX 0 - X Þ þ iðY 0 - Y Þ = 0: ð24:540Þ

Taking the adjoint of (24.540), we get

- ðX 0 - X Þ þ iðY 0 - Y Þ = 0: ð24:541Þ

Adding (24.540) and (24.541), we get

2iðY 0 - Y Þ = 0 or Y 0 = Y: ð24:542Þ

Subtracting (24.541) from (24.540) once again, we have

2ð X 0 - X Þ = 0 or X 0 = X: ð24:543Þ

Equations (24.40) and (24.41) indicate that the representation (24.537) is unique.
These complete the proof.
The fact that an element of slð2, ℂÞ is traceless is shown as follows: From (15.41),
we have

detðexp tAÞ = expðTrtAÞ; t : real: ð24:544Þ

Suppose A 2 slð2, ℂÞ. Then, exptA 2 SL(2, ℂ). Therefore, we have

expðTrtAÞ = detðexp tAÞ = 1, ð24:545Þ


1268 24 Basic Formalism

where with the first equality we used (15.41) with A replaced with tA; the last
equality came from Definition 24.5. This implies that TrtA = tTrA = 0. That is,
A is traceless.
Theorem 24.15 gives an outstanding feature to slð2, ℂÞ. That is, Theorem 24.15
tells us that the elements belonging to slð2, ℂÞ can be decomposed into a matrix
belonging to suð2Þ and that belonging to the set i  suð2Þ. In turn, the elements of
slð2, ℂÞ can be synthesized from those of suð2Þ. With suð2Þ, an arbitrary vector is
described as a linear combination on a real field ℝ. In fact, let X be a vector on suð2Þ.
Then, we have

ic a - ib
X= ða; b; c 2 ℝÞ: ð24:546Þ
- a - ib - ic

Defining the basis vectors of suð2Þ by the use of the matrices defined in (20.270)
as

0 -i 0 1 i 0
ζ1 = , ζ2 = , ζ3 = , ð24:547Þ
-i 0 -1 0 0 -i

we describe X as

X = cζ~3 þ aζ~2 þ bζ~1 ða, b, c 2 ℝÞ: ð24:548Þ

As for slð2, ℂÞ, on the other hand, we describe Z 2 slð2, ℂÞ as

a þ ib c þ id
Z= ða; b; c; d; p; q 2 ℝÞ: ð24:549Þ
p þ iq - a - ib

Using (24.547) for the basis vectors, we write Z as

1 i
Z = ðb - iaÞζ~3 þ ðc - pÞ þ ðd - qÞ ζ~2
2 2
1 i
þ - ðd þ qÞ þ ðc þ pÞ ζ~1 : ð24:550Þ
2 2

Thus, we find that slð2, ℂÞ is a complex vector space. If in general a complex Lie
algebra g has a real Lie subalgebra h (i.e., h ⊂ g) and an arbitrary element Z 2 g is
uniquely expressed as

Z = X þ iY ðX, Y 2 hÞ, ð24:551Þ

then, h is said to be a real form of g. By the same token, g is said to be


complexification of h [8]. In light of (24.551), the relationship between
References 1269

suð2Þ and slð2, ℂÞ is evident. Note that in the case of slð2, ℂÞ, X is anti-Hermitian

and iY is Hermitian. Defining the operators ~ζ k such that

~ζ   iζ~k ðk = 1, 2, 3Þ, ð24:552Þ


k

we can construct the useful commutation relations among them.


On the basis of (20.271), we obtain

ζ~k , ζ~l = ~
3
E ζ ,
m = 1 klm m
ð24:553Þ

~ζ  , ~ζ  = - 3
E ζ , ~ ð24:554Þ
k l m = 1 klm m


ζ~k , ~ζ l =
3
E ~ζ  : ð24:555Þ
m = 1 klm m

The complexification is an important concept to extend the scope of applications


of Lie algebra. Examples of the complexification will be provided in the next
chapter.
In the above discussion we mentioned the subalgebra. Its definition is formally
given as follows.
Definition 24.6 Let g be a Lie algebra. Suppose that h is a subspace of g and that
with 8 X, Y 2 h we have ½X, Y  2 h. Then, h is called a subalgebra of g.

References

1. Satake I-O (1975) Linear algebra (Pure and applied mathematics). Marcel Dekker, New York
2. Satake I (1974) Linear algebra. Shokabo, Tokyo. (in Japanese)
3. Birkhoff G, Mac Lane S (2010) A survey of modern algebra, 4th edn. CRC Press, Boca Raton
4. Byron FW Jr, Fuller RW (1992) Mathematics of classical and quantum physics. Dover,
New York
5. Rosén A (2019) Geometric multivector analysis. Birkhäuser, Cham
6. Hirai T (2001) Linear algebra and representation of groups II. Asakura Publishing, Tokyo.
(in Japanese)
7. Kugo C (1989) Quantum theory of gauge field I. Baifukan, Tokyo. (in Japanese)
8. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo.
(in Japanese)
9. Hotta S (2023) A method for direct solution of the Dirac equation. Bull Kyoto Inst Technol 15:
27–46
10. Itzykson C, Zuber J-B (2005) Quantum field theory. Dover, New York
11. Sakamoto M (2014) Quantum field theory. Shokabo, Tokyo. (in Japanese)
12. Goldstein H, Poole C, Safko J (2002) Classical mechanics, 3rd edn. Addison Wesley, San
Francisco
13. Schweber SS (1989) An introduction to relativistic quantum field theory. Dover, New York
14. Mirsky L (1990) An Introduction to linear algebra. Dover, New York
15. Hassani S (2006) Mathematical physics. Springer, New York
Chapter 25
Advanced Topics of Lie Algebra

Abstract In Chap. 20, we dealt with the Lie groups and Lie algebras within the
framework of the continuous groups. With the Lie groups the operation is multipli-
cative (see Part IV), whereas the operation of the Lie algebras is additive since the
Lie algebra forms a vector space (Part III). It is of importance and interest, however,
that the representation is multiplicative in both the Lie groups and Lie algebras. So
far, we have examined the properties of the representations of the groups (including
the Lie group), but little was mentioned about the Lie algebra. In this chapter, we
present advanced topics of the Lie algebra with emphasis upon its representation.
Among those topics, the differential representation of Lie algebra is particularly
useful. We can form a clear view of the representation theory by contrasting the
representation of the Lie algebra with that of the Lie group. Another topic is the
complexification of the Lie algebra. We illustrate how to extend the scope of
application of Lie algebra through the complexification with actual examples. In
the last part, we revisit the topic of the coupling of angular momenta and compare the
method with the previous approach based on the theory of continuous groups.

25.1 Differential Representation of Lie Algebra [1]

25.1.1 Overview

As already discussed, the Lie algebra is based on the notion of infinitesimal


transformation around the identity element of the Lie groups (Sect. 20.4). The
representation of Lie groups and that of Lie algebras should have close relationship
accordingly. As the Lie algebra is related to the infinitesimal transformation, so is its
representation. We summarize the definition of the representation of Lie algebra. We
have a differential representation as another representation specific to Lie algebra.
The differential representation is defined in relation to the representation of the Lie
group.

© Springer Nature Singapore Pte Ltd. 2023 1271


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4_25
1272 25 Advanced Topics of Lie Algebra

Definition 25.1 Let σ(X) be a mapping that relates X 2 g (i.e., X is an element of a


Lie algebra g) to a vector space V. Suppose that σ(X) is a linear transformation and
that σ(X) satisfies the following relationships (i), (ii), and (iii):
(i)
σ ðaX Þ = aσ ðX Þ ða 2 ℝÞ, ð25:1Þ

(ii)
σ ðX þ Y Þ = σ ðX Þ þ σ ðY Þ, ð25:2Þ

(iii)
σ ðXY Þ = σ ðX Þσ ðY Þ: ð25:3Þ

Then, the mapping σ(X) is called a representation of the Lie algebra g on V.


We schematically write this mapping as

σ : g→V

or

g 3 X ° σ ðX Þ 2 V: ð25:4Þ

The relationship (iii) implies that the σ(X) is a homomorphic mapping on g. In the
above definition, if V of (25.4) is identical to g, σ(X) is an endomorphism
(Chap. 11). Next, we think of the mapping σ(X), where X represents a commutator.
We have

σ ð½X, Y Þ = σ ðXY - YX Þ = σ ðXYÞ - σ ðYX Þ = σ ðX Þσ ðY Þ - σ ðY Þσ ðX Þ


= ½σ ðX Þ, σ ðY Þ,

where with the second equality we used (25.2); with the third equality we used
(25.3). That is,
(iv)
σ ð½X, Y Þ = ½σ ðX Þ, σ ðY Þ: ð25:5Þ

The reducibility and irreducibility of the representation can similarly be defined


as before (see, e.g., Sect. 20.2). The expression (25.5) plays a special role with the
representation because it holds the commutation relation unchanged in V as well as
in g. More importantly, we have a relationship described by

ρðexp tXÞ = exp½tσ ðX Þ ðX 2 g, t 2 ℝÞ, ð25:6Þ

where ρ is a representation of the Lie group ℊ corresponding to g. Differentiating


both sides of (25.6) and taking the limit of t → 0, we get
25.1 Differential Representation of Lie Algebra [1] 1273

1
lim ½ρðexp tX Þ - ρðexp 0Þ = σ ðX Þ lim ½exp tσ ðX Þ: ð25:7Þ
t→0 t t→0

From (18.4) we have

ρðexp 0Þ = ρðeÞ = E, ð25:8Þ

where e and E are an identity element of ℊ and ρ(ℊ), respectively. Then, (25.7) can
be rewritten as

1
σ ðX Þ = lim ½ρðexp tX Þ - E : ð25:9Þ
t→0 t

Thus, the relationship between the Lie group ℊ and its representation space L S
[see (18.36)] can be translated into that between the Lie algebra g and the associated
vector space V. In other words, as a whole collection consisting of X forms the Lie
algebra g (i.e., a vector space g), so a whole collection consisting of σ(X) forms the
Lie algebra on the vector space V. We summarize the above statements as a
following definition.
Definition 25.2 Suppose that we have a relationship described by

ρðexp tXÞ = exp½tσ ðX Þ ðX 2 g, t 2 ℝÞ: ð25:6Þ

The representation σ(X) of g on V is called a differential representation for the


representation ρ of the Lie group ℊ. In this instance, σ is expressed as

σ  dρ: ð25:10Þ

To specify the representation space that is associated with the Lie group and Lie
algebra, we normally write the combination of the representation σ or ρ and its
associated representation space V as [1]

ðσ, VÞ or ðρ, VÞ: ð25:11Þ

Notice that the same representation space V has been used in (25.11) for both the
representations σ and ρ relevant to the Lie algebra and Lie group, respectively.

25.1.2 Adjoint Representation of Lie Algebra [1]

One of the most important representations of the Lie algebras is an adjoint repre-
sentation. In Sect. 20.4.3, we defined the adjoint representation for the Lie group. In
the present section, we deal with the adjoint representation for the Lie algebra. It is
1274 25 Advanced Topics of Lie Algebra

defined as the differential representation of the adjoint representation Ad[g] of the


Lie group ℊ 3 g. In this case, the differential representation σ(X) is denoted by

σ ðX Þ  ad½X , ð25:12Þ

where we used an angled bracket with RHS in accordance with Sect. 20.4.3. The
operator ad[X] operates on an element Y of the Lie algebra such that

ad½X ðY Þ: ð25:13Þ

With (25.13) we have a simple but important theorem.


Theorem 25.1 [1] With X, Y 2 g we have

ad½X ðY Þ = ½X, Y : ð25:14Þ

Proof Let gX(t) be expressed as

gX ðt Þ = Ad½exp tX : ð25:15Þ

That is, gX is an adjoint representation of the Lie group ℊ on the Lie algebra g.
Notice that we are using (25.15) for ρ(exptX) of (25.6). Then, according to (25.6) and
(25.9), we have

d
ad½X ðY Þ = g0X ð0ÞY = ðexp tX ÞY exp tXÞ - 1 t=0
dt
d
= ½ðexp tX ÞY ðexp - tX Þt = 0 = XY - YX = ½X, Y , ð25:16Þ
dt

where with the third equality we used (15.28) and with the second to the last equality
we used (15.4). This completes the proof.
Equation (25.14) can be seen as a definition of the adjoint representation of the
Lie algebra. We can directly show that the operator ad[X] is indeed a representation
of g as follows: That is, with any arbitrary element Z 2 g we have

ad½X, Y ðZ Þ = ½½X, Y , Z  = - ½Z, ½X, Y  = ½X, ½Y, Z  - ½Y, ½X, Z 


= ½X, ad½Y ðZ Þ - ½Y, ad½X ðZ Þ = ad½X ðad½Y Z Þ - ad½Y ðad½X Z Þ
= ad½X ad½Y ðZ Þ - ad½Y ad½X ðZ Þ = ½ad½X , ad½Y ðZ Þ,
ð25:17Þ

where with the third equality we used Jacobi’s identity (20.267). Comparing the first
and last sides of (25.17), we get
25.1 Differential Representation of Lie Algebra [1] 1275

ad½X, Y ðZ Þ = ½ad½X , ad½Y ðZ Þ: ð25:18Þ

Equation (25.18) holds with any arbitrary element Z 2 g. This implies that

ad½X, Y  = ½ad½X , ad½Y : ð25:19Þ

From (25.5), (25.19) satisfies the condition for a representation of the Lie algebra
g.
We know that if X, Y 2 g, ½X, Y  2 g (see Sect. 20.4). Hence, we have a following
relationship described by

g 3 Y ° ad½X ðY Þ = Y 0 2 g ðX 2 gÞ ð25:20Þ

or

ad½X  : g → g: ð25:21Þ

That is, ad[X] is an endomorphism within g.


Example 25.1 In Example 20.4 we had (20.270) as suð2Þ. Among them we use
i 0
ζ 3 = 12 2 suð2Þ for X in (25.20). We have
0 -i

ad½X  : g → g ½X 2 suð2Þ, g = suð2Þ: ð25:22Þ

Also, we have

ad½ζ 3 ðY Þ = ζ 3 Y - Yζ 3 ½Y 2 suð2Þ: ð25:23Þ

Substituting ζ 1, ζ 2, and ζ 3 for Y, respectively, we obtain

ad½ζ 3 ðζ 1 Þ = ζ 2 , ad½ζ 3 ðζ 2 Þ = - ζ 1 , and ad½ζ3 ðζ 3 Þ = 0: ð25:24Þ

That is, we get

0 -1 0
ðζ 1 ζ 2 ζ 3 Þad½ζ 3  = ðζ 1 ζ 2 ζ 3 Þ 1 0 0 = ðζ 1 ζ 2 ζ 3 ÞA3 : ð25:25Þ
0 0 0

Similarly, we obtain
1276 25 Advanced Topics of Lie Algebra

Fig. 25.1 Isomorphism ad


between suð2Þ and soð3Þ
(2) (3)
ad

ad: isomorphism = ad( )

0 0 0
ðζ 1 ζ 2 ζ 3 Þad½ζ 1  = ðζ 1 ζ 2 ζ 3 Þ 0 0 -1 = ðζ 1 ζ 2 ζ 3 ÞA1 , ð25:26Þ
0 1 0
0 0 1
ðζ 1 ζ 2 ζ 3 Þad½ζ 2  = ðζ 1 ζ 2 ζ 3 Þ 0 0 0 = ðζ 1 ζ 2 ζ 3 ÞA2 : ð25:27Þ
-1 0 0

The (3, 3) matrices of RHS of (25.25), (25.26), and (25.27) are identical to A3, A1,
and A2 of (20.274), respectively. Equations (25.25)-(25.27) show that the differen-
tial mapping ad transforms the basis vectors ζ1, ζ 2, and ζ 3 of suð2Þ into basis vectors
A1, A2, and A3 of soð3Þ ½ = oð3Þ, respectively (through the medium of ζ 1, ζ 2,
and ζ 3). From the discussion of Sect. 11.4, the transformation ad is bijective.
Namely, suð2Þ and soð3Þ are isomorphic to each other.
In Fig. 25.1 we depict the isomorphism ad between suð2Þ and soð3Þ as

ad : suð2Þ → soð3Þ: ð25:28Þ

Notice that the mapping between two Lie algebras suð2Þ and soð3Þ represented in
(25.28) is in parallel with the mapping between two Lie groups SU(2) and SO(3) that
is described by

Ad : SU ð2Þ → SOð3Þ: ð20:303Þ

Thus, in accordance with (25.11), we have the following correspondence between


the differential representation ad½X  ½X 2 suð2Þ on a three-dimensional vector space
V3 (as a representation space) and the representation Ad[g] [g 2 SU(2)] on V3 such
that
25.1 Differential Representation of Lie Algebra [1] 1277

ad, V3 , Ad, V3 : ð25:29Þ

In the present example, we have chosen suð2Þ for V3 ; see Sect. 20.4.3.
In (25.6), replacing ρ, t, X, and σ with Ad, α, ζ 3, and ad, respectively, we get

expfα  ad½ζ 3 g = Ad½expðαζ 3 Þ ðζ 3 2 g, α 2 ℝÞ: ð25:30Þ

In fact, comparing (20.28) with (20.295), we find that both sides of the above
equation certainly give a same matrix expressed as

cos α - sin α 0
sin α cos α 0 :
0 0 1

Notice that [LHS of (25.30)]= (20.28) and [RHS of (25.30)]= (20.295).


As explained above, the present example clearly shows a close relationship
between the representations of Lie group and Lie algebra.
We can construct the adjoint representation of the Lie algebras using a represen-
tation other than (25.14). In that case it is again helpful to use Jacobi’s identity in
combination with (20.269) that gives structure constants which define the structure
of the Lie algebra (Sect. 20.4.2). We rewrite (20.269) in such a way that the
expression given below conforms to the Einstein summation rule

X i , X j = f ij l X l : ð25:31Þ

Jacobi’s identity is described by

Xi, Xj, Xk þ X j , ½X k , X i  þ X k , X i , X j = 0: ð25:32Þ

The term [Xi, [Xj, Xk]] can be rewritten by the use of the structure constants such
that

X i , f jk l X l = f jk l ½X i , X l  = f jk l f il m X m :

Similarly rewriting the second and third terms of (25.32), we obtain

f jk l f il m
þ f ki l f jl m
þ f ij l f kl m X m = 0:

Since Xm (m = 1, ⋯, d; d is a dimension of the representation) is linearly


independent, we must have for each m
1278 25 Advanced Topics of Lie Algebra

f jk l f il m
þ f ki l f jl m
þ f ij l f kl m = 0: ð25:33Þ

Taking account of the fact that f ki l is antisymmetric with respect to the indices
k and i, (25.33) is rewritten as

f jk l f il m
- f ik l f jl m
= f ij l f lk m : ð25:34Þ

If in (25.34) we define f jk l as

l
f jk l  σ j k
, ð25:35Þ

l
then the operator σ j k is a representation of the Lie algebra. Note that in (25.35)
l
σ j k indicates (l, k) element of the matrix σ j. By the use of (25.35), (25.34) can be
rewritten as

l m
ð σ i Þm l σj k
- σj l
ðσ i Þl k = f ij l ðσ l Þm k : ð25:36Þ

That is, we have

σ i , σ j = f ij l σ l : ð25:37Þ

Equation (25.37) is virtually the same form as (25.31) and (20.269). Hence, we
find that (25.35) is the representation of the Lie algebra, whose structure is
represented by (25.37) or (20.269). Using the notation (25.14), we get

ad½σ i  σ j = σ i , σ j = f ij l σ l : ð25:38Þ

25.1.3 Differential Representation of Lorentz Algebra [1]

As another important differential representation of the Lie algebra, we wish to


introduce the differential representation of the Lorentz algebra. In Sect. 24.10 we
examined the properties of the operation described by

H ° qHq{ ½q 2 SLð2, ℂÞ, ð24:500Þ

with
25.1 Differential Representation of Lie Algebra [1] 1279

a b
q= , detq = ad - bc = 1: ð24:501Þ
c d

This constituted the representation φ, V4 described by

φ½qðH Þ : V4 → V4 , φ½qðH Þ = qHq{ : ð24:502Þ

Now, we wish to construct another differential representation dφ, V4 . Follow-


ing Theorem 25.1 and defining f(t) as

f X ðt Þ  φðexp tX Þ ½X 2 slð2, ℂÞ,

for y 2 V4 = H = hij ; i, j = 1, 2, hij 2 ℂ, H : Hermitian we have

d
dφ½X ðyÞ = f 0X ð0ÞðyÞ = ðexp tX Þy exp tXÞ{ t=0
dt
d
= ðexp tX Þy exp tXÞ{ t=0
dt

= X exp tXÞy exp tX { þ ðexp tX ÞyX { exp tX { t=0


{
= Xy þ yX , ð25:39Þ

where with the third equality we used (15.32).


Example 25.2 Following Example 25.1, we compute dφ[X]( y) with X 2 slð2, ℂÞ
and y 2 V4 . The mapping is expressed as

dφ½X  : V4 → V4 ; dφ½X ðyÞ = Xy þ yX { ð25:40Þ

with

y 2 V4 = H = hij ; i, j = 1, 2, hij 2 ℂ, H : Hermitian ,

where V4 is spanned by σ 0, σ 1, σ 2, and σ 3 of (24.497). Note that (25.40) gives an


endomorphism on V4 .
i 0
Putting, e.g., X = ζ 3 = 12 , we have
0 -i

dφ½ζ 3 ðyÞ = ζ 3 y þ yζ {3 : ð25:41Þ

Substituting σ 0, σ 1, σ 2, and σ 3 for y, respectively, we obtain


1280 25 Advanced Topics of Lie Algebra

dφ½ζ 3 ðσ 0 Þ = 0, dφ½ζ 3 ðσ 1 Þ = σ 2 , dφ½ζ 3 ðσ 2 Þ = - σ 1 and dφ½ζ 3 ðσ 3 Þ = 0:

That is, we get

0 0 0 0
0 0 -1 0
ðσ 0 σ 1 σ 2 σ 3 Þdφ½ζ 3  = ðσ 0 σ 1 σ 2 σ 3 Þ : ð25:42Þ
0 1 0 0
0 0 0 0

Similarly, we obtain

0 0 0 0
0 0 0 0
ðσ 0 σ 1 σ 2 σ 3 Þdφ½ζ 1  = ðσ 0 σ 1 σ 2 σ 3 Þ , ð25:43Þ
0 0 0 -1
0 0 1 0
0 0 0 0
0 0 0 1
ðσ 0 σ 1 σ 2 σ 3 Þdφ½ζ 2  = ðσ 0 σ 1 σ 2 σ 3 Þ : ð25:44Þ
0 0 0 0
0 -1 0 0

On the basis of Theorem 24.15, we have known that slð2, ℂÞ is spanned by six
linearly independent basis vectors. That is, we have

slð2, ℂÞ = Spanfζ 1 , ζ 2 , ζ 3 , iζ 1 , iζ 2 , iζ 3 g, ð25:45Þ

where ζ1, ζ 2, ζ 3 have been defined in (20.270). Then, corresponding to (25.42),


(25.43), and (25.44), we must have related expressions with iζ1, iζ 2, and iζ 3. In fact,
we obtain

0 0 0 1
0 0 0 0
ðσ 0 σ 1 σ 2 σ 3 Þdφ½iζ 3  = ðσ 0 σ 1 σ 2 σ 3 Þ , ð25:46Þ
0 0 0 0
1 0 0 0
0 1 0 0
1 0 0 0
ðσ 0 σ 1 σ 2 σ 3 Þdφ½iζ 1  = ðσ 0 σ 1 σ 2 σ 3 Þ , ð25:47Þ
0 0 0 0
0 0 0 0
25.1 Differential Representation of Lie Algebra [1] 1281

0 0 1 0
0 0 0 0
ðσ 0 σ 1 σ 2 σ 3 Þdφ½iζ 2  = ðσ 0 σ 1 σ 2 σ 3 Þ : ð25:48Þ
1 0 0 0
0 0 0 0

Equations (25.42), (25.43), and (25.44) are related to the spatial rotation and
(25.46), (25.47), and (25.48) to the Lorentz boost. All these six (4, 4) real matrices
are identical to those given in (24.220). Equations (25.42)-(25.48) show that the
differential mapping dφ transforms the basis vectors ζ 1, ζ 2, ζ 3, iζ 1, iζ 2, iζ 3 of slð2, ℂÞ
into those A = ðA1 , A2 , A3 Þ, B = ðB1 , B2 , B3 Þ of so0 ð3, 1Þ, respectively (through
the medium of σ 0, σ 1, σ 2, and σ 3). Whereas slð2, ℂÞ is a six-dimensional complex
vector space, so3 ð3, 1Þ is a six-dimensional real vector space. Once again, from the
discussion of Sect. 11.4 the transformation dφ is bijective. Namely, slð2, ℂÞ and
so0 ð3, 1Þ [i.e., Lie algebras corresponding to the Lie groups SL(2, ℂ) and SO0(3, 1)]
are isomorphic to each other.
Thus, the implication of Examples 25.1 and 25.2 is evident and the differential
representation (e.g., ad and dφ in the present case) allows us to get a clear view on
the representation theory of the Lie algebra. In Sect. 24.1, we had the following
relation described by

Λ = expða  AÞ expðb  BÞ: ð24:223Þ

In this section, redefining a and b as

a  ðθ1 , θ2 , θ3 Þ and b  ð- ω1 , - ω2 , - ω3 Þ: ð25:49Þ

We establish the expression the same as (24.223). Notice that the parameters -
ωi (i = 1, 2, 3) have the minus sign. It came from the conventional definition about
the direction of the Lorentz boost.
Example 25.3 In Example 25.2, through the complexification we extended the Lie
algebra suð2Þ = Spanfζ1 , ζ 2 , ζ 3 g to slð2, ℂÞ such that

slð2, ℂÞ = Spanfζ 1 , ζ 2 , ζ 3 , iζ 1 , iζ 2 , iζ 3 g: ð25:45Þ

Meanwhile, replacing ρ, t, X, and σ with φ, ω, iζ 3, and dφ, respectively, the


following equation given by

ρðexp tXÞ = exp½tσ ðX Þ ðX 2 g, t 2 ℝÞ ð25:6Þ

can be read as
1282 25 Advanced Topics of Lie Algebra

expfω  dφ½iζ 3 g = φ½expðωiζ 3 Þ ½iζ 3 2 slð2, ℂÞ, ω 2 ℝ: ð25:50Þ

From (20.270) we have

1 -1 0
iζ 3 = :
2 0 1

The function exp(ωiζ 3) can readily be calculated so that we have

ω ω
cosh - sinh 0
expðωiζ 3 Þ = 2 2 :
ω ω
0 cosh þ sinh
2 2

Then, from (24.514) and (24.515), we obtain

cosh ω 0 0 sinh ω
0 1 0 0
φðexp ωiζ3 Þ = 0 0 1 0 :

sinh ω 0 0 cosh ω

With respect to RHS of (25.50), in turn, from (25.46) we get

0 0 0 ω
0 0 0 0
ωdφ½iζ 3  = :
0 0 0 0
ω 0 0 0

Then, calculating exp{ωdφ[iζ 3]} we immediately find that (25.50) holds.


Once again, Examples 25.1 and 25.3 provide tangible illustrations to an abstract
relation between the representations of Lie group and Lie algebra that is represented
by (25.6). With regard to Examples 25.1 and 25.2, we have a following important
theorem in relation to Theorem 24.13.
Theorem 25.2 [1] Let ðρ, VÞ be a representation of a linear Lie group G on a vector
space V and let ðσ, VÞ be a representation of the Lie algebra g of G so that σ is a
differential representation of ρ; i.e., σ = dρ. Also, let N be a kernel of ρ. That is, we
have

N = fg 2 G; ρðgÞ = eg,

where e is an identity element. Then, we have


25.1 Differential Representation of Lie Algebra [1] 1283

G=N ffi ρðGÞ:
(i)
(ii) Let n be a Lie algebra of N . Then, n is a kernel of the homomorphism mapping
σ = dρ of g. That is,

n = fX 2 g; σ ðX Þ = 0g: ð25:51Þ

Proof (i) We have already proven (i) in the proof of Theorem 16.4 (Homomorphism
Theorem). With (ii), the proof is as follows: Regarding 8 X 2 g and any real number
t, we have

ρðexp tX Þ = exp½tσ ðX Þ ðX 2 g, t 2 ℝÞ: ð25:6Þ

If σ(X) = 0, RHS of (25.6) is e for any t 2 ℝ. Therefore, according to Definition


16.3 we have exp tX 2 N with any t 2 ℝ. This means that X is an element of the Lie
algebra of N . From the definition of the Lie algebra of N , we have X 2 n. In an
opposite way, if X 2 n, we have exp tX 2 N for any t 2 ℝ and, hence, LHS of (25.6)
is always e according to Definition 16.3. Consequently, differentiating both sides of
(25.6) with respect to t, we have

0 = σ ðX Þ½exp tσ ðX Þ:

Taking the limit t → 0, we have 0 = σ(X). That is, n comprises the elements X that
satisfy (25.51). This completes the proof.
Now, we come back to the proof of Theorem 24.13. We wish to consider
dφ, V4 in correspondence with φ, V4 . Let us start a discussion about the kernel
N of φ. According to Theorem 24.13, it was given as

N = fσ 0 , - σ 0 g: ð24:532Þ

Since the differential representation dφ is related to the neighborhood of the


identity element σ 0, the kernel n with respect to dφ is only X = 0 in light of (25.40).
In fact, from Example 25.2, we have

0 d e f
-c
dφ½aζ1 þ bζ 3 þ cζ 3 þ diζ 1 þ eiζ 2 þ fiζ 3  = d
e
0
c 0 -a
b
:
f -b a 0

From (25.45), aζ 1 + ⋯ + f i ζ 3 (a, ⋯, f : real) represents all the elements slð2, ℂÞ.
For RHS to be zero element 0′ of so0 ð3, 1Þ, we need a = ⋯ = f = 0. This implies that

dφ - 1 ð00 Þ = f0g,
1284 25 Advanced Topics of Lie Algebra

Fig. 25.2 Isomorphism dφ


between slð2, ℂÞ and
(2, ) (3, 1)
so0 ð3, 1Þ

: isomorphism = ( )

where 0 denotes the zero element of slð2, ℂÞ; dφ-1 is the inverse transformation of
dφ. The zero element 0′ is represented by a (4, 4) matrix. An explicit form is given by

0 0 0 0
00 = 0
0
0
0
0
0
0
0 :
0 0 0 0

The presence of dφ-1 is equivalent to the statement that

dφ : slð2, ℂÞ⟶so0 ð3, 1Þ ð25:52Þ

is a bijective mapping (see Sect. 11.2). From Theorem 25.2, with respect to the
kernel n of dφ we have

n = fX 2 slð2, ℂÞ; dφðX Þ = 00 g:

Since dφ is bijective, from Theorem 11.4 we get

n = f0g, 0  0
0
0
0
:

Consequently, the mapping dφ can be characterized as isomorphism expressed as

slð2, ℂÞ ffi so0 ð3, 1Þ: ð25:53Þ

This relationship is in parallel with that between suð2Þ and soð3Þ; see Example
25.1. In Fig. 25.2 we depict the isomorphism dφ between slð2, ℂÞ and so0 ð3, 1Þ.
Meanwhile, Theorem 20.11 guarantees that 8g 2 SO0(3, 1) can be described by

g = ðexp t 1 X 1 Þðexp t 2 X 2 Þ⋯ðexp t 6 X 6 Þ, ð25:54Þ


25.1 Differential Representation of Lie Algebra [1] 1285

Fig. 25.3 Homomorphic (2, ) (3, 1)


mapping φ from SL(2, ℂ) to
SO0(3, 1) with the kernel
N = fσ 0 , - σ 0 g

where X1, X2, ⋯, and X6 can be chosen from among the (4, 4) matrices of, e.g.,
(25.42), (25.43), (25.44), (25.46), (25.47), and (25.48). With 8q 2 SL(2, ℂ), q can be
expressed as

q = ðexp t 1 ζ 1 Þðexp t 2 ζ 2 Þðexp t 3 ζ 3 Þðexp t 4 iζ 1 Þðexp t 5 iζ 2 Þðexp t 6 iζ 3 Þ: ð25:55Þ

Then, (25.6) is read in the present case as

φ½ðexp t1 ζ 1 Þ⋯ðexp t 6 iζ 3 Þ = expðt 1 dφ½ζ 1 Þ⋯ expðt 6 dφ½iζ 3 Þ


½ζ 1 , ⋯, iζ 3 2 slð2, ℂÞ, t i ði = 1, ⋯, 6Þ 2 ℝ:

Since from the above discussion there is the bijective mapping (or isomorphism)
between slð2, ℂÞ and so0 ð3, 1Þ, we may identify X1, ⋯, X6 with dφ[ζ 1], ⋯, dφ[iζ 3],
respectively; see (25.42)-(25.48). Thus, we get

φ½q = expðt 1 dφ½ζ 1 Þ⋯ expðt 6 dφ½iζ 3 Þ


ð25:56Þ
= ðexp t 1 X 1 Þðexp t 2 X 2 Þ⋯ðexp t 6 X 6 Þ = g:

Equation (25.56) implies that 8g 2 SO0(3, 1) can be described using ∃q 2 SL(2, ℂ)


given by (25.55). This means that

SO0 ð3, 1Þ ⊂ φ½SLð2, ℂÞ: ð24:521Þ

Combining (24.520) and (24.521), we get the relation described by

φ½SLð2, ℂÞ = SO0 ð3, 1Þ: ð24:522Þ

Figure 25.3 shows the homomorphic mapping φ from SL(2, ℂ) to SO0(3, 1) with
the kernel N = fσ 0 , - σ 0 g. In correspondence with (25.29) of Example 25.1, we
have
1286 25 Advanced Topics of Lie Algebra

Table 25.1 Representation species of Lie algebras


Representation Representation space Mapping Category
Ad suð2Þ ðrealÞ SU(2) → SO(3) Homomorphism
ad suð2Þ ðrealÞ suð2Þ → soð3Þ Isomorphism
φ V4 ðrealÞ SL(2, ℂ) → SO0(3, 1) Homomorphism
dφ V4 ðrealÞ slð2, ℂÞ → so0 ð3, 1Þ Isomorphism

dφ, V4 ⟺ φ, V4 : ð25:57Þ

Note that in Example 25.1 the representation space V4 has been defined as a four-
dimensional real vector space given in (24.493).
Comparing Figs. 25.1 and 25.2 with Figs. 25.3 and 20.6, we conclude the
following: (i) The differential representations ad and dφ exhibit the isomorphic
mapping between the Lie algebras, whereas the representations Ad and φ indicate
the homomorphic mapping between the Lie groups. (ii) This enables us to deal with
the relationship between SL(2, ℂ) and SO0(3, 1) and that between SU(2) and SO(3) in
parallel. Also, we are allowed to deal with the relation between slð2, ℂÞ and so0 ð3, 1Þ
and that between suð2Þ and soð3Þ in parallel. Table 25.1 summarizes the properties
of several (differential) representations associated with the Lie algebras and
corresponding Lie groups.
The Lie algebra is a special kind of vector space. In terms of the vector space, it is
worth taking a second look at the basic concept of Chaps. 11 and 20. In Sect. 25.1,
we have mentioned the properties of the differential representation of the Lie algebra
g on a certain vector space V. That is, we have had

σ : g⟶V: ð25:4Þ

Equation (25.4) may or may not be an endomorphism.


Meanwhile, the adjoint representation Ad of the Lie group SU(2) was not an
endomorphism; see (20.303) denoted by

Ad : SU ð2Þ ⟶ SOð3Þ: ð20:303Þ

These features not only enrich the properties of Lie groups and Lie algebras but
also make various representations including differential representation easy to use
and broaden the range of applications.
As already seen in Sect. 20.4, soð3Þ and suð2Þ have the same structure constants.
Getting back to Theorem 11.3 (Dimension Theorem), we have had

dim V n = dimAðV n Þ þ dim Ker A: ð11:45Þ

Suppose that there is a mapping M from suð2Þ to soð3Þ such that


25.2 Cartan–Weyl Basis of Lie Algebra [1] 1287

M : suð2Þ⟶ soð3Þ:

Equations (20.273) and (20.276) clearly show that both suð2Þ and soð3Þ are
spanned by three basis vectors, namely the dimension of both suð2Þ and soð3Þ is
three. Hence, (11.45) should be translated into

dim½suð2Þ = dimM½suð2Þ þ dim Ker M: ð25:58Þ

Also, we have

dim½suð2Þ = dimM½suð2Þ = dim½soð3Þ = 3: ð25:59Þ

This naturally leads to dim Ker M = 0 in (25.58). From Theorem 11.4, in turn,
we conclude that M is a bijective mapping, i.e., isomorphism. In this situation, we
expect to have the homomorphism between the corresponding Lie groups. This was
indeed the case with SU(2) and SO(3) (Theorem 20.7).
By the same token, in this section we have shown that there is one-to-one
correspondence between slð2, ℂÞ and so0 ð3, 1Þ. Corresponding to (25.58), we have

~ ½slð2, ℂÞ þ dim Ker M,


dim½slð2, ℂÞ = dimM ~ ð25:60Þ

where

~ : slð2, ℂÞ⟶so0 ð3, 1Þ:


M ð25:61Þ

In the above, we have

~ ½slð2, ℂÞ = dim½so0 ð3, 1Þ = 6:


dim½slð2, ℂÞ = dimM ð25:62Þ

Again, we have dim Ker M ~ = 0. Notice that dφ of (25.52) is a special but


~
important case of M. Similarly to the case of suð2Þ, there has been a homomorphic
mapping from SL(2, ℂ) to SO0(3, 1) (Theorem 24.13).

25.2 Cartan-Weyl Basis of Lie Algebra [1]

In this section, we wish to take the concept of complexification a step further. In this
context, we deal with the Cartan-Weyl basis that is frequently used in quantum
mechanics and the quantum theory of fields. In our case, the Cartan-Weyl basis has
already appeared in Sect. 3.5, even though we did not mention that terminology.
1288 25 Advanced Topics of Lie Algebra

25.2.1 Complexification

We focus on the application of Cartan-Weyl basis to soð3Þ [or oð3Þ] to widen the
vector space of soð3Þ by their “complexification” (see Sect. 24.10). The Lie algebra
soð3Þ is a vector space spanned by three basis vectors whose commutation relation is
described by

X i , X j = f ij k X k ði, j, k = 1, 2, 3Þ, ð25:63Þ

where Xi (i = 1, 2, 3) are the basis vectors, which are not commutative with one
another. We know that soð3Þ is isomorphic to suð2Þ. In the case of suð2Þ, the
complexification of suð2Þ enables us to deal with many problems within the
framework of slð2, ℂÞ.
Here, we construct the Cartan-Weyl basis from the elements of soð3Þ [or oð3Þ]
by extending soð3Þ to oð3, ℂÞ through the complexification. Note that oð3, ℂÞ
comprises complex skew-symmetric matrices. In Sects 20.1 and 20.2 we converted
the Hermitian operators to the anti-Hermitian operators belonging to soð3Þ such that,
e.g.,

Az  - iMz : ð20:15Þ

These give a clue to how to address the problem. In an opposite way, we get

iAz = Mz : ð25:64Þ

In light of Theorem 24.15, the Hermitian operator Mz should now be regarded as


due to the complexification of soð3Þ in (25.64). We redefine A3  Az and M3  Mz;
see (20.274). In a similar way, we get

M1  iA1 and M2  iA2 :

The operators M1 , M2 , and M3 construct oð3, ℂÞ together with A1, A2, and A3.
That is, we have

oð3, ℂÞ = Span fA1 , A2 , A3 , M1 , M2 , M3 g: ð25:65Þ

A general form of oð3, ℂÞ is given using real numbers a, b, c, d, f, g as


25.2 Cartan–Weyl Basis of Lie Algebra [1] 1289

0 a þ bi c þ di 0 a c 0 bi di
- ða þ biÞ 0 f þ gi = -a 0 f þ - bi 0 gi

- ðc þ diÞ - ðf þ giÞ 0 -c -f 0 - di - gi 0
0 a c 0 b d
= -a 0 f þi -b 0 g :
-c -f 0 -d -g 0
ð25:66Þ

Let U be an arbitrary element of oð3, ℂÞ. Then, from (25.66) U is uniquely


described as

U = V þ iW ½V, W 2 soð3Þ:

This implies that oð3, ℂÞ is a complexification of soð3Þ. Or, we may say that soð3Þ
is a real form of oð3, ℂÞ.
According to the custom, we define Fþ ½2 oð3, ℂÞ and F - ½2 oð3, ℂÞ as

Fþ  A1 þ iA2 = A1 þ M2 and F - = A1 - iA2 = A1 - M2 ,

and choose Fþ and F - for the basis vectors of oð3, ℂÞ instead of M1 and M2 . We
further define the operators J(+) and J(-) in accordance with (3.72) such that

J ðþÞ  iFþ = iA1 - A2 and J ð- Þ = iF - = iA1 þ A2 :

Thus, we have gotten back to the discussion made in Sects. 3.4 and 3.5 to directly
apply it to the present situation. Although J(+), J(-), and M3 are formed by the linear
combination of the elements of soð3Þ, these operators are not contained in soð3Þ but
contained in oð3, ℂÞ. The vectors J(+), J(-), and M3 form a subspace of oð3, ℂÞ. We
term this subspace W, to which these three vectors constitute a basis set. This basis
set is referred to as Cartan-Weyl basis and we have

W  Span J ð- Þ , M3 , J ðþÞ : ð25:67Þ

Even though M3 is Hermitian, J(+) is neither Hermitian nor anti-Hermitian. Nor is


(-)
J , either. Nonetheless, we are well used to dealing with these operators in Sect.
3.5. We have commutation relations described as
1290 25 Advanced Topics of Lie Algebra

M3 , J ðþÞ = J ðþÞ ,

M3 , J ð- Þ = - J ð- Þ , ð25:68Þ

J ðþÞ , J ð- Þ = 2M3 :

Note that the commutation relations of (25.68) have already appeared in (3.74) in
the discussion of the generalized angular momentum.
p
Using J ð~± Þ  J ð ± Þ = 2, we obtain a “normalized” relations expressed as

M3 , J ð~± Þ = ± J ð~± Þ and J ð~þÞ , J ð~- Þ = M3 : ð25:69Þ

From (25.68) and (25.69), we find that W forms a subalgebra of oð3, ℂÞ. Also, at
the same time, from (25.14) the differential representation ad with respect to W
diagonalizes ad½M3 . That is, with the three-dimensional representation, using
(25.68) or (25.69) we have

-1 0 0
J ð- Þ M3 J ðþÞ ad½M3  = J ð- Þ M3 J ðþÞ 0 0 0 : ð25:70Þ
0 0 1

Hence, we obtain

-1 0 0
ad½M3  = 0 0 0 : ð25:71Þ
0 0 1

In an alternative way, let us consider $\mathfrak{sl}(2,\mathbb{C})$ instead of $\mathfrak{o}(3,\mathbb{C})$. In a manner similar to the above case, we can choose a subspace $\widetilde{W}$ of $\mathfrak{sl}(2,\mathbb{C})$ defined by
$$\widetilde{W} \equiv \mathrm{Span}\,\{i\zeta_1 + \zeta_2,\ i\zeta_3,\ i\zeta_1 - \zeta_2\}.$$
At the same time, $\widetilde{W}$ is a subalgebra of $\mathfrak{sl}(2,\mathbb{C})$. The operators $i\zeta_1 + \zeta_2$, $i\zeta_3$, and $i\zeta_1 - \zeta_2$ correspond to $J^{(-)}$, $M_3$, and $J^{(+)}$ of the above case, respectively. Then, $W$ and $\widetilde{W}$ are isomorphic, and commutation relations exactly the same as (25.68) hold with the basis vectors $i\zeta_1 + \zeta_2$, $i\zeta_3$, and $i\zeta_1 - \zeta_2$. Corresponding to (25.71), we have
$$\mathrm{ad}[i\zeta_3] = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{25.72}$$
We find that, choosing the basis vectors $i\zeta_1 + \zeta_2$, $i\zeta_3$, and $i\zeta_1 - \zeta_2$ contained in $\mathfrak{sl}(2,\mathbb{C})$, $\mathrm{ad}[i\zeta_3]$ can be diagonalized as in (25.71). Thus, in terms of Lie algebras, the subalgebras $W$ and $\widetilde{W}$ are equivalent.

From (25.68) and on the basis of the discussions in Sects. 3.4, 3.5, 20.2, and 20.4, we can readily construct an irreducible representation $\mathrm{ad}$ of $W$ or $\widetilde{W}$ on the $(2j+1)$-dimensional representation space $V$, where $j$ is a positive integer or a positive half-odd-integer. The number $j$ is said to be the highest weight, as defined below.
Definition 25.3 Eigenvalues of the differential representation $\mathrm{ad}[M_3]$ or $\mathrm{ad}[i\zeta_3]$ are called weights of the representation $\mathrm{ad}$. Among the weights, the weight that has the largest real part is the highest weight.
In (25.71) and (25.72), for instance, the highest weight of $M_3$ or $i\zeta_3$ is 1. From the discussion of Sects. 3.4 and 3.5, we immediately obtain the matrix representation of the Cartan-Weyl basis $J^{(-)}$, $M_3$, and $J^{(+)}$ (or $i\zeta_1 + \zeta_2$, $i\zeta_3$, and $i\zeta_1 - \zeta_2$). Replacing $l$ with $j$ in (3.152), (3.153), and (3.158), we get

$$M_3 = \begin{pmatrix} -j & & & & & & \\ & -j+1 & & & & & \\ & & -j+2 & & & & \\ & & & \ddots & & & \\ & & & & k-j & & \\ & & & & & \ddots & \\ & & & & & & j \end{pmatrix}, \tag{25.73}$$
where $k = 0, 1, \cdots, 2j$ and the highest weight of $M_3$ is $j$. Also, we have
$$J^{(-)} = \begin{pmatrix} 0 & \sqrt{2j \cdot 1} & & & & & \\ & 0 & \sqrt{(2j-1) \cdot 2} & & & & \\ & & 0 & \ddots & & & \\ & & & \ddots & \sqrt{(2j-k+1) \cdot k} & & \\ & & & & 0 & \ddots & \\ & & & & & \ddots & \sqrt{1 \cdot 2j} \\ & & & & & & 0 \end{pmatrix}, \tag{25.74}$$

$$J^{(+)} = \begin{pmatrix} 0 & & & & & & \\ \sqrt{2j \cdot 1} & 0 & & & & & \\ & \sqrt{(2j-1) \cdot 2} & 0 & & & & \\ & & \ddots & \ddots & & & \\ & & & \sqrt{(2j-k+1) \cdot k} & 0 & & \\ & & & & \ddots & \ddots & \\ & & & & & \sqrt{1 \cdot 2j} & 0 \end{pmatrix}, \tag{25.75}$$
where $k = 1, 2, \cdots, 2j$. In (25.73)-(25.75), the basis vectors $i\zeta_1 + \zeta_2$, $i\zeta_3$, and $i\zeta_1 - \zeta_2$ may equally be used instead of $J^{(-)}$, $M_3$, and $J^{(+)}$, respectively.
The above matrix representations of the Cartan-Weyl basis $J^{(-)}$, $M_3$, and $J^{(+)}$ give important information by directly appealing to the intuition. The matrix $M_3$ is admittedly a diagonal matrix. The matrices $J^{(+)}$ and $J^{(-)}$ are lower and upper triangular matrices, respectively. This means that all their eigenvalues are zero (see Sect. 12.1). In turn, this immediately leads to the fact that $J^{(+)}$ and $J^{(-)}$ are nilpotent $(2j+1, 2j+1)$ square matrices whose $(2j+1)$-th power vanishes, and implies the presence of $(2j+1)$ linearly independent vectors in the representation space $V$. Also, we know that neither $J^{(+)}$ nor $J^{(-)}$ can be diagonalized; see Sect. 12.3. Let $\mathrm{ad}$ be a representation pertinent to (25.73)-(25.75). Then, the representation $(\mathrm{ad}, V)$ is irreducible. Summarizing the above results, we have the following theorem.
Theorem 25.3 [1] Let $(\mathrm{ad}, V)$ be an irreducible representation of $W$. Suppose that the highest weight of $(\mathrm{ad}, V)$ is $j\ (>0)$. Then, the following relationships hold:
(i) $\dim V = 2j + 1$.
(ii) The weights are $j, j-1, \cdots, -j+1, -j$, with the highest weight $j$.
(iii) There exist $(2j+1)$ basis vectors $|\zeta, \mu\rangle$ $[\mu = j, j-1, \cdots, -j+1, -j;\ \zeta = j(j+1)]$ that satisfy the following equations:
$$\begin{aligned} M_3\,|\zeta, \mu\rangle &= \mu\,|\zeta, \mu\rangle \quad (\mu = j, j-1, \cdots, -j+1, -j), \\ J^{(+)}\,|\zeta, \mu\rangle &= \sqrt{(j-\mu)(j+\mu+1)}\,|\zeta, \mu+1\rangle \quad (\mu = j-1, j-2, \cdots, -j), \\ J^{(-)}\,|\zeta, \mu\rangle &= \sqrt{(j-\mu+1)(j+\mu)}\,|\zeta, \mu-1\rangle \quad (\mu = j, j-1, \cdots, -j+1), \end{aligned} \tag{25.76}$$
where $j$ is chosen from among the positive integers or positive half-odd-integers.



The three operators of (25.73)-(25.75) satisfy the commutation relations described by (25.68). Readers are encouraged to confirm this. The dimension of the representation space $V$ relevant to these irreducible representations is $2j+1$. Correspondingly, $V$ is spanned by the $(2j+1)$ basis vectors $|\zeta, \mu\rangle$ $[\mu = j, j-1, \cdots, -j+1, -j;\ \zeta = j(j+1)]$. Note, however, that the dimension of $W$ as a Lie algebra is three; see (25.67). The matrix representations (25.73)-(25.75) equally hold for $i\zeta_1 + \zeta_2$, $i\zeta_3$, and $i\zeta_1 - \zeta_2$, which constitute the basis vectors of $\widetilde{W}$.
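This confirmation is easy to automate. The following sketch (Python with NumPy; the helper names cartan_weyl and comm are ours, not from Ref. [1]) builds the matrices (25.73)-(25.75) for a given $j$ and checks (25.68) together with the nilpotency of $J^{(\pm)}$:

```python
import numpy as np

def cartan_weyl(j):
    """Matrices (25.73)-(25.75) on the (2j+1)-dimensional space V.
    Columns are ordered by the weight mu = -j, -j+1, ..., j."""
    dim = int(round(2 * j)) + 1
    mus = [-j + k for k in range(dim)]
    M3 = np.diag(mus).astype(float)
    Jp = np.zeros((dim, dim))  # lower triangular, cf. (25.75)
    Jm = np.zeros((dim, dim))  # upper triangular, cf. (25.74)
    for k, mu in enumerate(mus[:-1]):
        c = np.sqrt((j - mu) * (j + mu + 1))
        Jp[k + 1, k] = c   # J(+)|zeta, mu>   = c |zeta, mu+1>, cf. (25.76)
        Jm[k, k + 1] = c   # J(-)|zeta, mu+1> = c |zeta, mu>,  cf. (25.76)
    return M3, Jp, Jm

def comm(a, b):
    return a @ b - b @ a

j = 3 / 2
M3, Jp, Jm = cartan_weyl(j)
assert np.allclose(comm(M3, Jp), Jp)       # [M3, J(+)] = J(+)
assert np.allclose(comm(M3, Jm), -Jm)      # [M3, J(-)] = -J(-)
assert np.allclose(comm(Jp, Jm), 2 * M3)   # [J(+), J(-)] = 2 M3
# J(+) and J(-) are nilpotent: their (2j+1)-th power vanishes.
assert np.allclose(np.linalg.matrix_power(Jp, int(2 * j) + 1), 0)
```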
As explained above, the notion of complexification of the Lie algebra finds wide applications in the field of continuous groups, not only from the point of view of abstract algebra but also with the physical interpretations of, e.g., the angular momenta that were dealt with in Chap. 3. In particular, the Cartan-Weyl basis has been constructed through the complexification of $\mathfrak{so}(3)$ or $\mathfrak{su}(2)$. The Lie algebras $W$ and $\widetilde{W}$ are subalgebras of $\mathfrak{o}(3,\mathbb{C})$ and $\mathfrak{sl}(2,\mathbb{C})$, respectively.
In relation to the complexification, we add the following condition to the definition of the representation of a Lie algebra so that we may extend the concept of the representation to the complex representation. The condition is described by [1]
$$\rho(iX) = i\rho(X), \tag{25.77}$$
where $\rho$ is a representation of the Lie algebra. If $\rho$ satisfies (25.77), $\rho$ is said to be a complex analytic representation of the Lie algebra [1]. Let us formally give the following definition.
Definition 25.4 Let $\mathfrak{g}$ be a complex Lie algebra and let $V$ be a vector space. Suppose that there is a complex linear mapping $\rho$ from $\mathfrak{g}$ to $\mathfrak{gl}(V)$. If $\rho$ is a representation of $\mathfrak{g}$ on $V$ and satisfies the condition (25.77), then such a $\rho$ is said to be a complex analytic representation.
By virtue of the condition (25.77), we may extend (i) of Definition 25.1 to
$$\text{(i)}'\quad \sigma(aX) = a\sigma(X) \quad (a \in \mathbb{C}). \tag{25.78}$$

Except for (i) of Definition 25.1, the conditions (ii), (iii), and (iv) hold without alteration.
In the context of the complex representation, we have the following general theorem.
Theorem 25.4 [1] Let $\mathfrak{g}$ be a complex Lie algebra and $\mathfrak{h}$ be a real form of $\mathfrak{g}$ (see Sect. 24.10). Let $\rho$ be any representation of $\mathfrak{h}$ on a vector space $V$. Moreover, let $\rho^c$ be a representation of $\mathfrak{g}$ on $V$. Suppose that with $\rho^c$ we have the following relationship:
$$\rho^c(Z) = \rho(X) + i\rho(Y) \quad (Z \in \mathfrak{g};\ X, Y \in \mathfrak{h}), \tag{25.79}$$
where $Z = X + iY$. Then, $(\rho^c, V)$ is a complex analytic representation of $\mathfrak{g}$ on $V$.



Proof For the proof, use (25.1), (25.2), and (25.5). The representation $\rho^c$ is a real linear mapping from $\mathfrak{g}$ to $\mathfrak{gl}(V)$. The representation $\rho^c$ is a complex linear mapping as well. It is because we have
$$\rho^c[i(X + iY)] = \rho^c(iX - Y) = -\rho(Y) + i\rho(X) = i\rho^c(X + iY). \tag{25.80}$$
Meanwhile, we have
$$\begin{aligned} \rho^c([X + iY, U + iW]) &= [\rho(X), \rho(U)] - [\rho(Y), \rho(W)] + i[\rho(X), \rho(W)] + i[\rho(Y), \rho(U)] \\ &= [\rho^c(X + iY), \rho^c(U + iW)]. \end{aligned}$$
Hence, $(\rho^c, V)$ is a complex analytic representation of $\mathfrak{g}$. This completes the proof.
In the next section, we make the most of the results obtained in this section.

25.2.2 Coupling of Angular Momenta: Revisited

In Sect. 20.3, we calculated the Clebsch-Gordan coefficients on the basis of the representation theory of the continuous group SU(2). Here, we wish to obtain the relevant results using the above-mentioned theory of Lie algebras. We deal with the problems that we considered in Examples 20.1 and 20.2. Since in this section we consider the coupling of the angular momenta, we modify the expression as in Sect. 20.3 such that
$$J = |j_1 - j_2|,\ |j_1 - j_2| + 1,\ \cdots,\ j_1 + j_2, \tag{20.164}$$
where $J$ indicates the momenta of the coupled system, with $j_1$ and $j_2$ being the momenta of the individual systems. At the same time, $J$ represents the highest weight. Thus, the $j$ used in the previous section should be replaced with $J$.
We show two examples below and compare the results with those of Sect. 20.3.3. In the examples given below, we adopt the following calculation rule. Let $O$ be an operator chosen from among $M_3$, $J^{(+)}$, and $J^{(-)}$ of (25.76). Suppose that the starting equation we wish to deal with is given by
$$\Xi = \Phi \otimes \Psi,$$
where $\Phi \otimes \Psi$ is a direct product of quantum states $\Phi$ and $\Psi$ (or physical states, more generally). Then, we have

$$O\Xi = O(\Phi \otimes \Psi) = (O\Phi) \otimes \Psi + \Phi \otimes (O\Psi),$$
where $O$ is an operator that is supposed to operate on $\Xi$.
If we choose $M_3$ for $O$ and both $\Phi$ and $\Psi$ are eigenstates of $M_3$, we merely recover a trivial result, as can easily be checked. If, however, we choose $J^{(+)}$ or $J^{(-)}$, we can get meaningful results. To show this, we give two examples below.
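Before turning to the examples, note that in matrix terms the above rule says that $O$ acts on the product space through $O \otimes I + I \otimes O$. A minimal sketch (continuing the NumPy code above; the helper name coupled is ours):

```python
def coupled(O_A, O_B):
    """Action on a direct product: O(Phi x Psi) = (O Phi) x Psi + Phi x (O Psi)."""
    I_A = np.eye(O_A.shape[0])
    I_B = np.eye(O_B.shape[0])
    return np.kron(O_A, I_B) + np.kron(I_A, O_B)
```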
Example 25.4 We deal with the problem that we considered in Example 20.1. That is, we are examining the case of $D^{(1/2)} \otimes D^{(1/2)}$. According to (20.164), as the highest weight $J$ we have
$$J = 1, 0.$$
(i) $J = 1$
If among the highest weights we choose the largest one, the associated coupled state is always given by the direct product of the individual component states (see Examples 20.1 and 20.2). Thus, we have
$$|1, \mathbf{3}\rangle = |1/2, \mathbf{2}\rangle_A\,|1/2, \mathbf{2}\rangle_B. \tag{25.81}$$

In (25.81), we adopted the expression $|\mu, \mathbf{2J+1}\rangle$ with $2J+1$ boldfaced according to the custom, instead of the $|\zeta, \mu\rangle$ used in the previous section. The subscripts A and B on the RHS of (25.81) designate the individual component systems. The strategy to solve the problem is to operate $J^{(-)}$ of (25.76) on both sides of (25.81) and to diminish $\mu$ stepwise with $J$ (or $\zeta$) kept unchanged. That is, we have
$$J^{(-)}|1, \mathbf{3}\rangle = J^{(-)}[\,|1/2, \mathbf{2}\rangle_A\,|1/2, \mathbf{2}\rangle_B\,]. \tag{25.82}$$
Then, we obtain
$$\sqrt{2}\,|0, \mathbf{3}\rangle = J^{(-)}|1/2, \mathbf{2}\rangle_A\,|1/2, \mathbf{2}\rangle_B + |1/2, \mathbf{2}\rangle_A\,J^{(-)}|1/2, \mathbf{2}\rangle_B. \tag{25.83}$$
Hence, we have
$$|0, \mathbf{3}\rangle = \frac{1}{\sqrt{2}}\,[\,|-1/2, \mathbf{2}\rangle_A\,|1/2, \mathbf{2}\rangle_B + |1/2, \mathbf{2}\rangle_A\,|-1/2, \mathbf{2}\rangle_B\,]. \tag{25.84}$$
Further operating $J^{(-)}$ on both sides, we get
$$|-1, \mathbf{3}\rangle = |-1/2, \mathbf{2}\rangle_A\,|-1/2, \mathbf{2}\rangle_B. \tag{25.85}$$



Thus, we have obtained the symmetric representations (25.81), (25.84), and (25.85). Note that
$$J^{(-)}|-1/2, \mathbf{2}\rangle_A = J^{(-)}|-1/2, \mathbf{2}\rangle_B = 0. \tag{25.86}$$
Looking at (25.85), we notice that we can equally start with (25.85) using $J^{(+)}$ instead of starting with (25.81) using $J^{(-)}$.
(ii) $J = 0$
In this case, we determine a suitable solution $|0, \mathbf{1}\rangle$ so that it may become orthogonal to the $|0, \mathbf{3}\rangle$ given by (25.84). We can readily find the solution such that
$$|0, \mathbf{1}\rangle = \frac{1}{\sqrt{2}}\,[\,|-1/2, \mathbf{2}\rangle_A\,|1/2, \mathbf{2}\rangle_B - |1/2, \mathbf{2}\rangle_A\,|-1/2, \mathbf{2}\rangle_B\,]. \tag{25.87}$$

Compare the above results with those obtained in Example 20.1.
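The lowering procedure (25.82)-(25.84) is mechanical enough to script. A sketch with the helpers above (basis vectors are ordered by weight, $|-1/2\rangle$ then $|+1/2\rangle$):

```python
M3, Jp, Jm = cartan_weyl(1 / 2)     # 2-dim space: basis |-1/2>, |+1/2>
Jm_tot = coupled(Jm, Jm)            # J(-) acting on the coupled system
up = np.array([0.0, 1.0])           # |1/2, 2>
state = np.kron(up, up)             # |1, 3> = |1/2, 2>_A |1/2, 2>_B, cf. (25.81)
state = Jm_tot @ state              # sqrt(2) |0, 3>, cf. (25.83)
state /= np.linalg.norm(state)      # normalize to the |0, 3> of (25.84)
print(state)  # [0, 0.707, 0.707, 0]: (|-1/2>_A|1/2>_B + |1/2>_A|-1/2>_B)/sqrt(2)
```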


Example 25.5 Next, we consider the case of the direct product $D^{(1)} \otimes D^{(1)}$ in parallel with Example 20.2. In this case, the highest weight $J$ is
$$J = 2, 1, 0. \tag{25.88}$$
The calculation processes are a little longer than in Example 25.4, but the calculation principle is virtually the same.
(i) $J = 2$
We start with
$$|2, \mathbf{5}\rangle = |1, \mathbf{3}\rangle_A\,|1, \mathbf{3}\rangle_B. \tag{25.89}$$

Proceeding in a similar manner as before, we get
$$|1, \mathbf{5}\rangle = \frac{1}{\sqrt{2}}\,(|1, \mathbf{3}\rangle_A\,|0, \mathbf{3}\rangle_B + |0, \mathbf{3}\rangle_A\,|1, \mathbf{3}\rangle_B). \tag{25.90}$$
Further operating $J^{(-)}$ on both sides of (25.90), we successively obtain
$$|0, \mathbf{5}\rangle = \frac{1}{\sqrt{6}}\,(|-1, \mathbf{3}\rangle_A\,|1, \mathbf{3}\rangle_B + 2\,|0, \mathbf{3}\rangle_A\,|0, \mathbf{3}\rangle_B + |1, \mathbf{3}\rangle_A\,|-1, \mathbf{3}\rangle_B), \tag{25.91}$$
$$|-1, \mathbf{5}\rangle = \frac{1}{\sqrt{2}}\,(|-1, \mathbf{3}\rangle_A\,|0, \mathbf{3}\rangle_B + |0, \mathbf{3}\rangle_A\,|-1, \mathbf{3}\rangle_B), \tag{25.92}$$
$$|-2, \mathbf{5}\rangle = |-1, \mathbf{3}\rangle_A\,|-1, \mathbf{3}\rangle_B. \tag{25.93}$$

(ii) $J = 1$

In this case, we determine a suitable solution $|1, \mathbf{3}\rangle$ so that it may become orthogonal to the $|1, \mathbf{5}\rangle$ given by (25.90). We can readily find the solution described by
$$|1, \mathbf{3}\rangle = \frac{1}{\sqrt{2}}\,(|1, \mathbf{3}\rangle_A\,|0, \mathbf{3}\rangle_B - |0, \mathbf{3}\rangle_A\,|1, \mathbf{3}\rangle_B). \tag{25.94}$$
Then, operating $J^{(-)}$ on both sides of (25.94) successively, we get
$$|0, \mathbf{3}\rangle = \frac{1}{\sqrt{2}}\,(|1, \mathbf{3}\rangle_A\,|-1, \mathbf{3}\rangle_B - |-1, \mathbf{3}\rangle_A\,|1, \mathbf{3}\rangle_B), \tag{25.95}$$
$$|-1, \mathbf{3}\rangle = \frac{1}{\sqrt{2}}\,(|0, \mathbf{3}\rangle_A\,|-1, \mathbf{3}\rangle_B - |-1, \mathbf{3}\rangle_A\,|0, \mathbf{3}\rangle_B). \tag{25.96}$$

(iii) $J = 0$
We determine a suitable solution $|0, \mathbf{1}\rangle$ so that it is orthogonal to the $|0, \mathbf{5}\rangle$ given by (25.91), to get the desired solution described by
$$|0, \mathbf{1}\rangle = \frac{1}{\sqrt{3}}\,(|-1, \mathbf{3}\rangle_A\,|1, \mathbf{3}\rangle_B - |0, \mathbf{3}\rangle_A\,|0, \mathbf{3}\rangle_B + |1, \mathbf{3}\rangle_A\,|-1, \mathbf{3}\rangle_B). \tag{25.97}$$
Notice that if we had chosen for $|0, \mathbf{1}\rangle$ a vector merely orthogonal to the $|0, \mathbf{3}\rangle$ of (25.95), we would not have successfully constructed the unitary matrix expressed in (20.232). The above is the whole procedure that finds a set of proper solutions. Of these, the solutions obtained with $J = 2$ and $0$ possess the symmetric representations, whereas those obtained for $J = 1$ have the antisymmetric representations.
Symbolically writing the above results of the direct product in terms of the direct sum, we have
$$\mathbf{3} \otimes \mathbf{3} = \mathbf{5} \oplus \mathbf{3} \oplus \mathbf{1}. \tag{25.98}$$

If in the above processes we start with $|-2, \mathbf{5}\rangle = |-1, \mathbf{3}\rangle_A\,|-1, \mathbf{3}\rangle_B$ and operate $J^{(+)}$ on both sides, we will reach the same conclusion as the above. Once again, compare the above results with those obtained in Example 20.2.
As shown in the above examples, we recognize that we have been able to find a whole set of solutions automatically, in contrast to the previous calculations carried out in Examples 20.1 and 20.2.
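The decomposition (25.98) can also be confirmed numerically through the Casimir operator $J^{(+)}J^{(-)} + M_3^2 - M_3$, which, by the same reasoning as for the angular momentum operators of Chap. 3 applied to the relations (25.68), takes the value $J(J+1)$ on each irreducible block. A sketch with the helpers above:

```python
M3_1, Jp_1, Jm_1 = cartan_weyl(1)        # the two component systems, j1 = j2 = 1
M3c = coupled(M3_1, M3_1)
Jpc = coupled(Jp_1, Jp_1)
Jmc = coupled(Jm_1, Jm_1)
casimir = Jpc @ Jmc + M3c @ M3c - M3c    # eigenvalue J(J+1) on each block
evals = np.round(np.linalg.eigvalsh(casimir), 6)
print(sorted(evals))  # [0, 2, 2, 2, 6, 6, 6, 6, 6] -> J = 0, 1, 2: 3 x 3 = 5 + 3 + 1
```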
And what is more, as we have seen in this chapter, various problems which the representation theory deals with can be addressed clearly and succinctly by replacing the representation $(\rho, V)$ of the group $G$ with the representation $(\sigma, V)$ of its corresponding Lie algebra $\mathfrak{g}$. Thus, the representation theory of the Lie algebras, together with that of the Lie groups, serves as a powerful tool for solving various kinds of problems in mathematical physics.

25.3 Decomposition of Lie Algebra

In Sect. 24.10 we mentioned the decomposition of the complex matrices (Theorem 24.14) and that of $\mathfrak{sl}(2,\mathbb{C})$ (Theorem 24.15) in connection with the discussion that clarified the relationship between $SL(2, \mathbb{C})$ and $SO_0(3, 1)$. Since the whole collection of the (2, 2) complex matrices forms $\mathfrak{gl}(2,\mathbb{C})$, Theorem 24.14 states the decomposition of $\mathfrak{gl}(2,\mathbb{C})$. Since the Lie algebras form vector spaces, the relevant discussion comes down to the decomposition of the vector spaces.
Let us formally define $\mathfrak{gl}(2,\mathbb{C})$ such that
$$\mathfrak{gl}(2,\mathbb{C}) \equiv \{C = (c_{ij});\ i, j = 1, 2,\ c_{ij} \in \mathbb{C}\}. \tag{25.99}$$

This has already appeared in Sect. 24.10. In fact, $\mathfrak{gl}(2,\mathbb{C})$ is identical to $C$ of (24.530).
In Sect. 11.1 we dealt with the decomposition of a vector space into a direct sum of two (or more) subspaces. Meanwhile, in Sect. 14.6 we showed that any matrix $A$ can be decomposed such that
$$A = B + iC, \tag{14.125}$$
where
$$B \equiv \frac{1}{2}\left(A + A^{\dagger}\right) \quad \text{and} \quad C \equiv \frac{1}{2i}\left(A - A^{\dagger}\right) \tag{25.100}$$
with both $B$ and $C$ being Hermitian. In other words, any matrix $A$ can be decomposed into the sum of an Hermitian matrix and an anti-Hermitian matrix. Instead of decomposing $A$ as in (14.125), we may equally decompose $A$ such that
$$A = B + iC, \tag{14.125}$$
where
$$B \equiv \frac{1}{2}\left(A - A^{\dagger}\right) \quad \text{and} \quad C \equiv \frac{1}{2i}\left(A + A^{\dagger}\right) \tag{25.101}$$
with both $B$ and $C$ being anti-Hermitian.
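Both splittings are immediate to verify numerically. A minimal sketch (NumPy, self-contained; the sample matrix is an arbitrary choice of ours):

```python
import numpy as np

A = np.array([[1.0 + 2.0j, 3.0 - 1.0j],
              [0.5j, -2.0 + 4.0j]])
# (25.100): B and C are Hermitian (note: the literal 2j below is the complex number 2i).
B = (A + A.conj().T) / 2
C = (A - A.conj().T) / (2j)
assert np.allclose(B, B.conj().T) and np.allclose(C, C.conj().T)
assert np.allclose(A, B + 1j * C)
# (25.101): B and C are anti-Hermitian.
B2 = (A - A.conj().T) / 2
C2 = (A + A.conj().T) / (2j)
assert np.allclose(B2, -B2.conj().T) and np.allclose(C2, -C2.conj().T)
assert np.allclose(A, B2 + 1j * C2)
```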


If a Lie algebra is formed by a complexification of a subalgebra (see Definition 24.2), such examples are of high practical interest. As typical examples, we had $\mathfrak{sl}(2,\mathbb{C})$ and $\mathfrak{gl}(2,\mathbb{C})$. Both (25.100) and (25.101) are used for the complexification of real Lie algebras. We have already encountered the latter example in Sect. 24.10. There, we showed that $\mathfrak{sl}(2,\mathbb{C})$ is a complexification of $\mathfrak{su}(2)$, which is a real Lie subalgebra of $\mathfrak{sl}(2,\mathbb{C})$; see Theorem 24.15. In that case, we can choose $\mathfrak{su}(2)$ for a real form $\mathfrak{h}$ of $\mathfrak{sl}(2,\mathbb{C})$. The Lie algebra $\mathfrak{sl}(2,\mathbb{C})$ is decomposed as in (14.125) in combination with (25.101), where both $B$ and $C$ are anti-Hermitian. Namely, we have
$$Z = B + iC \quad [Z \in \mathfrak{sl}(2,\mathbb{C});\ B, C \in \mathfrak{su}(2)].$$

In the case of $\mathfrak{gl}(2,\mathbb{C})$, however, $\mathfrak{gl}(2,\mathbb{C})$ is a complexification of $V_4$. The vector space $V_4$, which appeared in (24.493), is evidently a real subalgebra of $\mathfrak{gl}(2,\mathbb{C})$. (The confirmation is left to the readers.) Therefore, we can choose $V_4$ for a real form $\mathfrak{h}$ of $\mathfrak{gl}(2,\mathbb{C})$. It is decomposed as in (14.125) in combination with (25.100), where both $B$ and $C$ are Hermitian. In this case, we have
$$Z = B + iC \quad [Z \in \mathfrak{gl}(2,\mathbb{C});\ B, C \in V_4].$$

Viewing the Lie algebra as a vector space, we have the following theorem with respect to $V_4$.
Theorem 25.5 Let the vector space $V_4$ be given by
$$V_4 = \{H = (h_{ij});\ i, j = 1, 2,\ h_{ij} \in \mathbb{C},\ H:\ \text{Hermitian}\}, \tag{24.493}$$
where $(h_{ij})$ is a complex (2, 2) Hermitian matrix. Then, the Lie algebra $\mathfrak{gl}(2,\mathbb{C})$ is expressed as a direct sum of $V_4$ and $iV_4$. That is, we have
$$\mathfrak{gl}(2,\mathbb{C}) = V_4 \oplus iV_4, \tag{25.102}$$
where $iV_4 = \{iH;\ H \in V_4\}$; see (24.493).

Proof It is evident that both $V_4$ and $iV_4$ form vector spaces [i.e., subspaces of $\mathfrak{gl}(2,\mathbb{C})$]. Let $z$ be an arbitrarily chosen element of $\mathfrak{gl}(2,\mathbb{C})$. We assume that $z$ can be described as
$$z = \begin{pmatrix} a + ib & c + id \\ p + iq & r + is \end{pmatrix} \quad (a, b, c, d, p, q, r, s \in \mathbb{R}). \tag{24.535}$$
Using $\sigma_i\ (i = 0, 1, 2, 3) \in V_4$ given by (24.495), we have
$$z = \frac{1}{2}\,[(a + r)\sigma_0 + (-a + r)\sigma_3 + (c + p)\sigma_1 + (d - q)\sigma_2] + \frac{i}{2}\,[(b + s)\sigma_0 + (-b + s)\sigma_3 + (d + q)\sigma_1 + (-c + p)\sigma_2]. \tag{24.536}$$
The element of the first term of (24.536) belongs to $V_4$ and that of the second term of (24.536) belongs to $iV_4$. Thus, $z \in V_4 + iV_4$. Since $z$ is arbitrary, this implies that

$$\mathfrak{gl}(2,\mathbb{C}) \subset V_4 + iV_4. \tag{25.103}$$
Now, suppose that some $u \in \mathfrak{gl}(2,\mathbb{C})$ is contained in $V_4 \cap iV_4$. Then, $u$ can be expressed as
$$\begin{aligned} u &= \alpha\sigma_0 + \beta\sigma_1 + \gamma\sigma_2 + \delta\sigma_3 \quad (\alpha, \beta, \gamma, \delta:\ \text{arbitrary real numbers}) \\ &= i(\lambda\sigma_0 + \mu\sigma_1 + \nu\sigma_2 + \xi\sigma_3) \quad (\lambda, \mu, \nu, \xi:\ \text{arbitrary real numbers}). \end{aligned} \tag{25.104}$$
Since $\sigma_0$, $\sigma_1$, $\sigma_2$, and $\sigma_3$ are basis vectors of $V_4$, (25.104) implies that
$$\alpha = i\lambda, \quad \beta = i\mu, \quad \gamma = i\nu, \quad \delta = i\xi. \tag{25.105}$$
Since $\alpha, \beta, \cdots, \lambda, \mu$, etc. are real numbers, (25.105) means that
$$\alpha = \beta = \gamma = \delta = \lambda = \mu = \nu = \xi = 0.$$
Namely, $u \equiv 0$. This clearly indicates that
$$V_4 \cap iV_4 = \{0\}.$$
Thus, from Theorem 11.1, any vector $w \in V_4 + iV_4$ is uniquely described by
$$w = w_1 + w_2 \quad (w_1 \in V_4,\ w_2 \in iV_4). \tag{25.106}$$
That is, by the definition of the direct sum described by (11.17), $V_4 + iV_4$ is identical with $V_4 \oplus iV_4$. Meanwhile, from (25.106) we evidently have
$$V_4 \oplus iV_4 \subset \mathfrak{gl}(2,\mathbb{C}). \tag{25.107}$$
Using (25.103) and (25.107) along with (25.106), we obtain (25.102). This completes the proof.
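The coefficients of an expansion such as (24.536) can be extracted numerically by the trace projection $\mathrm{tr}(\sigma_i z)/2$, which follows from $\mathrm{tr}(\sigma_i \sigma_j) = 2\delta_{ij}$. A sketch (NumPy; we use the standard Pauli matrices as a stand-in for the book's basis (24.495)):

```python
import numpy as np

s0 = np.eye(2, dtype=complex)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
sigmas = [s0, s1, s2, s3]

z = np.array([[1.0 + 2.0j, 3.0 - 1.0j],
              [0.5j, -2.0 + 4.0j]])            # arbitrary element of gl(2, C)
coeff = [np.trace(s @ z) / 2 for s in sigmas]  # tr(sigma_i sigma_j) = 2 delta_ij
H = sum(c.real * s for c, s in zip(coeff, sigmas))        # component in V4
iH = sum(1j * c.imag * s for c, s in zip(coeff, sigmas))  # component in i V4
assert np.allclose(z, H + iH) and np.allclose(H, H.conj().T)
```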
Thus, $\mathfrak{gl}(2,\mathbb{C})$ is a complexification of $V_4$. In a complementary style, the subalgebra $V_4$ is said to be a real form of $\mathfrak{gl}(2,\mathbb{C})$. Theorem 25.5 is in parallel with the discussion of Sects. 24.9 and 25.2 on the relationship between $\mathfrak{sl}(2,\mathbb{C})$ and $\mathfrak{su}(2)$. In the latter case, we say that $\mathfrak{sl}(2,\mathbb{C})$ is a complexification of $\mathfrak{su}(2)$ and that $\mathfrak{su}(2)$ is a real form of $\mathfrak{sl}(2,\mathbb{C})$. Caution should be taken in that the real form $V_4$ comprises the Hermitian operators, whereas the real form $\mathfrak{su}(2)$ consists of the anti-Hermitian operators. We remark that although $iV_4$ is a subspace of $\mathfrak{gl}(2,\mathbb{C})$, it is not a subalgebra of $\mathfrak{gl}(2,\mathbb{C})$.
In this book, we chose $\mathfrak{gl}(2,\mathbb{C})$ and $\mathfrak{sl}(2,\mathbb{C})$ as typical examples of the complex Lie algebra. A bit confusingly, however, these Lie algebras are real Lie algebras as well. This is related to the definition of the representation of a Lie algebra. In (25.1), we designated $a$ as a real number. Suppose that the basis set of the Lie algebra $\mathfrak{g}$ is

$\{e_1, \cdots, e_d\}$, where $d$ is the dimension of $\mathfrak{g}$ (as a vector space). Then, from (25.1) and (25.2) we obtain
$$\sigma\!\left(\sum_{k=1}^{d} a_k e_k\right) = \sum_{k=1}^{d} a_k\,\sigma(e_k) \quad (a_k:\ \text{real}). \tag{25.108}$$
Now, we assume that $\sigma: \mathfrak{g} \to \mathfrak{g}'$. Also, we assume that $\sigma(e_k)\ (k = 1, \cdots, d)$ is the basis set of $\mathfrak{g}'$. Then, $\mathfrak{g}$ and $\mathfrak{g}'$ are isomorphic and $\sigma$ is a bijective mapping. If $\sigma$ is an endomorphism (i.e., $\mathfrak{g} = \mathfrak{g}'$), $\sigma$ is merely an identity mapping. Though trivial, (25.108) clearly indicates that $\sigma$ transforms a real vector $\sum_{k=1}^{d} a_k e_k$ into another real vector $\sum_{k=1}^{d} a_k\,\sigma(e_k)$. It follows that both $\mathfrak{g}$ and $\mathfrak{g}'$ are real Lie algebras. Thus, so far as $a_k$ is a real number, we must deal with even a complex Lie algebra as a real Lie algebra. As already discussed, this confusing situation applies to $\mathfrak{gl}(2,\mathbb{C})$ and $\mathfrak{sl}(2,\mathbb{C})$. Note that, in contrast, e.g., $\mathfrak{su}(2)$ is inherently not a complex algebra but a real Lie algebra.
Recall (24.550). We have mentioned that $\mathfrak{sl}(2,\mathbb{C})$ is a complex vector space. Yet, if we regard $i\zeta_1$, $i\zeta_2$, and $i\zeta_3$ as the basis vectors of $\mathfrak{sl}(2,\mathbb{C})$, then their coefficients $-a$, $(d-q)/2$, and $(c+p)/2$ of (24.550) are real, and so $\mathfrak{sl}(2,\mathbb{C})$ can be viewed as a real vector space. This situation is the same as that for $\mathfrak{gl}(2,\mathbb{C})$; see (24.536). In this respect, the simplest example is a complex number $z \in \mathbb{C}$. It can be expressed as $z = x + iy\ (x, y \in \mathbb{R})$. We rewrite it as
$$z = x \cdot 1 + y \cdot i \quad (x, y \in \mathbb{R}). \tag{25.109}$$
We have $z = 0$ if and only if $x = y = 0$. Suppose that we view the whole complex field $\mathbb{C}$ as a two-dimensional vector space. Then, $1$ and $i$ constitute the basis vectors of $\mathbb{C}$, with $x$ and $y$ being the coefficients. In that sense, $\mathbb{C}$ is a two-dimensional real vector space over $\mathbb{R}$. In terms of abstract algebra, $\mathbb{C}$ is referred to as the second-order extension of $\mathbb{R}$.
We will not go into further detail about this issue. Instead, further reading is given
at the end of this chapter (see Refs. [2, 3]).

25.4 Further Topics of Lie Algebra

In relation to the advanced topics of the Lie algebra given in this chapter, we wish to mention some related topics below. In Sect. 15.2 we obtained various properties of the exponential functions of matrices. Here, it is worth adding some more properties of those functions. We have the following additional important properties [1].

(10) Let $A$ be a symmetric matrix. Then, $\exp A$ is a symmetric matrix.
(11) Let $A$ be an Hermitian matrix. Then, $\exp A$ is an Hermitian matrix.
(12) Let $\exp tA$ be a symmetric matrix for all real $t$. Then, $A$ is a symmetric matrix.
(13) Let $\exp tA$ be an Hermitian matrix for all real $t$. Then, $A$ is an Hermitian matrix.
Properties (10) and (11) are evident from Properties (3) and (4), respectively. To show Property (12), we have
$$(\exp tA)^{T} = \exp tA^{T} = \exp tA, \tag{25.110}$$
where with the first equality we used Property (3) of (15.31). Differentiating $\exp(tA^{T})$ and $\exp tA$ of (25.110) with respect to $t$ and taking the limit $t \to 0$ in the derivative, we obtain $A^{T} = A$. In a similar manner, we can show Property (13).
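Properties (10) and (11) are easy to check numerically. A minimal sketch (using SciPy's matrix exponential, with arbitrary sample matrices of ours):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.3, 1.2], [1.2, -0.7]])                # symmetric
assert np.allclose(expm(A), expm(A).T)                  # Property (10)
H = np.array([[1.0, 2.0 - 1.0j], [2.0 + 1.0j, -0.5]])   # Hermitian
assert np.allclose(expm(H), expm(H).conj().T)           # Property (11)
# Eigenvalues of exp(H) are positive, as expected from Theorem 15.3.
assert np.all(np.linalg.eigvalsh(expm(H)) > 0)
```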
With $\mathfrak{so}_0(3,1)$ we have six basis vectors (in matrix form), as shown in (24.220). Of these, three vectors are real skew-symmetric (or anti-Hermitian) matrices and the other three are real symmetric (or Hermitian) matrices. Then, their exponential functions are real orthogonal matrices [from Property (6) of Sect. 15.2] and real symmetric (or Hermitian) matrices [from Properties (10) and (11) shown above], respectively. Since the eigenvalues of an Hermitian matrix are real, the eigenvalues of its exponential function are positive, so that the exponential function is positive definite (Theorem 15.3). Thus, we naturally have a good example of the polar decomposition of a matrix with respect to the Lorentz transformation matrix; see Theorem 24.7 and Corollary 24.3. We have the following example.
Example 25.6 Returning to the topics dealt with in Sects. 24.1 and 24.2, let us consider the successive Lorentz transformations of the rotation about the z-axis and the subsequent boost in the positive direction of the z-axis. This is a case where the direction of the rotation axis and that of the boost coincide.
The pertinent elements of the Lie algebra are given by
$$X_r = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad X_b = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \tag{25.111}$$
where $X_r$ and $X_b$ stand for the rotation and the boost, respectively. We express the rotation by $\theta$ and the boost by $\omega$ as

$$X_{\theta r} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & -\theta & 0 \\ 0 & \theta & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad X_{\omega b} = \begin{pmatrix} 0 & 0 & 0 & -\omega \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -\omega & 0 & 0 & 0 \end{pmatrix}. \tag{25.112}$$
Notice that $X_r$ and $X_b$ are commutative, and so are $X_{\theta r}$ and $X_{\omega b}$. Therefore, on the basis of Theorem 15.2 we obtain
$$\exp X_{\theta r} \exp X_{\omega b} = \exp X_{\omega b} \exp X_{\theta r} = \exp(X_{\theta r} + X_{\omega b}). \tag{25.113}$$
Meanwhile, we define $X$ as
$$X \equiv X_{\theta r} + X_{\omega b} = \begin{pmatrix} 0 & 0 & 0 & -\omega \\ 0 & 0 & -\theta & 0 \\ 0 & \theta & 0 & 0 \\ -\omega & 0 & 0 & 0 \end{pmatrix}. \tag{25.114}$$

Borrowing (24.224) and the related result, as the successive Lorentz transformations $\Lambda$ we get
$$\begin{aligned} \Lambda &= \exp X_{\theta r} \exp X_{\omega b} = \exp X_{\omega b} \exp X_{\theta r} = \exp(X_{\theta r} + X_{\omega b}) = \exp X \\ &= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cosh\omega & 0 & 0 & -\sinh\omega \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sinh\omega & 0 & 0 & \cosh\omega \end{pmatrix} \\ &= \begin{pmatrix} \cosh\omega & 0 & 0 & -\sinh\omega \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ -\sinh\omega & 0 & 0 & \cosh\omega \end{pmatrix}. \end{aligned} \tag{25.115}$$

It can be readily checked that $\Lambda$ is normal. Thus, (25.115) provides a good illustration of the polar decomposition. As expected from the discussion of Sect. 24.9.1, the two matrices in the second last equality of (25.115) are commutative by virtue of Corollary 24.2. We have

$$\begin{aligned} \Lambda\Lambda^{\dagger} = \Lambda^{\dagger}\Lambda &= (\exp X)\exp X^{\dagger} = \exp X^{\dagger}(\exp X) = \exp(X + X^{\dagger}) \\ &= \exp 2X_{\omega b} = \begin{pmatrix} \cosh 2\omega & 0 & 0 & -\sinh 2\omega \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sinh 2\omega & 0 & 0 & \cosh 2\omega \end{pmatrix}. \end{aligned} \tag{25.116}$$

As shown above, if the successive transformations comprise a rotation around a certain axis (the z-axis in the present case) and a boost in the direction of that same axis, the corresponding representation matrices $\Lambda$ and $\Lambda^{\dagger}$ are commutative (i.e., $\Lambda$ is normal). In this case, the effects of the rotation in $\Lambda$ and $\Lambda^{\dagger}$ cancel out and we are left only with a doubled effect of the boost.
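The relations (25.113)-(25.116) can be confirmed numerically. A minimal sketch (NumPy and SciPy, with arbitrary sample values of ours for $\theta$ and $\omega$):

```python
import numpy as np
from scipy.linalg import expm

theta, omega = 0.8, 0.5
Xtr = np.zeros((4, 4)); Xtr[1, 2], Xtr[2, 1] = -theta, theta  # X_{theta r}, cf. (25.112)
Xwb = np.zeros((4, 4)); Xwb[0, 3] = Xwb[3, 0] = -omega        # X_{omega b}, cf. (25.112)
L = expm(Xtr + Xwb)                                           # Lambda = exp X, cf. (25.115)
assert np.allclose(L, expm(Xtr) @ expm(Xwb))                  # (25.113): the generators commute
assert np.allclose(L @ L.T, L.T @ L)                          # Lambda is normal
assert np.allclose(L @ L.T, expm(2 * Xwb))                    # (25.116): the rotations cancel
```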

25.5 Closing Remarks

In this book we have dealt with classical electromagnetism and quantum mechanics, which are bridged by the quantum field theory. Meanwhile, chemistry-oriented topics such as quantum chemistry were studied in a framework of materials science and molecular science, including device physics. The theory of vector spaces has given us a bird's-eye view of all these topics. The theory of analytic functions and group theory supply us with powerful tools for calculating the various physical quantities and invariants that appear in the mathematical equations. The author is confident that readers will have acquired the ability to attack various problems in mathematical physical chemistry.

References

1. Yamanouchi T, Sugiura M (1960) Introduction to continuous groups. Baifukan, Tokyo (in Japanese)
2. Takeuchi G (1983) Lie algebra and the theory of elementary particles. Shokabo, Tokyo (in Japanese)
3. Gilmore R (2005) Lie groups, Lie algebras, and some of their applications. Dover, New York
Index

A Analytical solution, 45, 109, 155


Abelian group, 648, 715, 737, 772, 789 Analytic continuation, 232–234, 241, 273
Abscissa axis, 185, 202 Analytic function, 185–274, 330, 607, 608,
Absolute convergence, 609 898, 1048, 1304
Absolute temperature, 352 Analytic prolongation, 234
Absolute value, 122, 129, 141, 144, 202, 211, Anderson, C., 1036
222, 349, 366, 591, 804, 809, 837, 943, Angle-dependent emission spectra, 377, 388,
1021 390, 391
Abstract algebra, 877, 883, 914, 1293, 1301 Angle-dependent spectroscopy, 393
Abstract concept, 457, 661, 714, 915, 922 Angular coordinate, 202, 203
AC5, 374–377 Angular frequency, 4, 7, 32, 129, 131, 142, 287,
AC’7, 376 353, 358, 359, 364, 369, 994, 995, 1002,
Accumulation point, 196–198, 230, 232, 233 1107
Adherent point, 192, 197 Angular momentum
Adjoint boundary functionals, 411, 415 generalized, 75–79, 839, 840, 874, 904,
Adjoint Dirac equation, 1029, 1081 926, 1290
Adjoint Green’s functions, 419, 424 operator, 30, 89, 94, 144, 147
Adjoint matrix, 22, 550, 610 orbital, 60, 79–110, 124, 144, 874, 926, 959
Adjoint operator, 111, 409, 411, 549, 561–566, spin, 79, 873, 874, 959
1166, 1232 total, 139, 959, 960, 962
Algebraic branch point, 264 Anisotropic
Algebraic equation, 15 crystal, 381, 392
Allowed transition, 140, 149, 790, 803, 825 dielectric constant, 380
Allyl radical, 714, 715, 773, 797–805 medium, 380
Aluminum-doped zinc oxide (AZO), 377, 378, Annihilation operator, 31, 41, 113, 114, 998,
384–386 1005, 1026, 1035–1037, 1039–1041,
Ampère, A.-M., 281 1061, 1065, 1097, 1104, 1115, 1116
Ampère’s circuital law, 281 Anti-commutative, 974, 1043
Amplitude, 287, 293, 294, 308, 310, 313, 315, Anti-commutator, 947, 1033
316, 321, 336, 344, 345, 347, 364, 396, Anti-Hermitian, 29, 111, 119, 409, 830, 901,
398, 445, 446, 1073, 1115–1120, 1124, 968, 972, 974, 1030, 1086, 1191, 1208,
1126–1136 1259, 1267, 1268, 1298, 1299
Analytical equation, 3 Anti-Hermitian operator, 29, 111, 636, 834,
Analytical mechanics, 977, 983, 999 835, 839, 901, 904, 1288, 1300
Analytical method, 80, 85, 94 Antilinear, 552

© Springer Nature Singapore Pte Ltd. 2023 1305


S. Hotta, Mathematical Physical Chemistry,
https://doi.org/10.1007/978-981-99-2512-4
1306 Index

Antiparticle, 967, 970, 1039–1040, 1256 1160–1162, 1164, 1167, 1174, 1176,
Anti-phase, 349 1180, 1193, 1194, 1289, 1300
Antipodal point, 924, 925 vectors, 8, 42, 61, 63, 143, 159, 300, 418,
Antisymmetric, 435, 750–753, 890–892, 895, 457, 459, 464, 465, 467, 469–471, 473,
1003, 1053, 1278, 1297 475, 478–484, 495, 497, 501, 503, 514,
Approximation method, 155–183 516–518, 522, 532–535, 540, 542,
Arc tangent, 66, 386 544–546, 554, 558, 562–566, 571, 601,
Argument, 30, 57, 68, 156, 192, 203, 234, 238, 662, 666, 675, 677, 678, 681, 683, 695,
245, 251, 255, 261, 263, 264, 267, 268, 703, 705, 709, 710, 712, 714, 715, 717,
270, 273, 313, 328, 330, 365, 407, 416, 730, 737, 742, 746, 764, 770, 773, 787,
420–423, 425, 427, 428, 431, 434, 435, 791, 798, 810, 831, 834, 873, 883, 895,
438, 440, 441, 480, 489, 516, 525, 527, 903, 960, 1025, 1086, 1150, 1160, 1175,
543, 558, 581, 598, 604, 634, 642, 649, 1194, 1198, 1203, 1259, 1261, 1276,
673, 684, 707, 719, 729, 733, 757, 761, 1287–1290, 1302
766, 773, 777, 790, 825, 834, 846, 849, BCs, see Boundary conditions (BCs)
856, 858, 878, 932, 940, 988, 1003, Benzene, 674, 755, 773, 789, 791–797, 799
1008, 1017, 1068, 1130, 1144, 1156, Bessel functions, 642, 643
1181, 1220, 1257, 1258 Bijective, 472–474, 476, 648, 655, 924, 926,
Aromatic hydrocarbons, 755, 773 1149, 1157, 1158, 1168, 1276, 1281,
Aromatic molecules, 783, 797, 805 1284, 1285, 1287, 1301
Associated Laguerre polynomials, 59, 110, Bijective homomorphism, 655, 656
118–124 Bilinear form, 935, 936, 938, 939, 1155, 1157,
Associated Legendre differential equation, 85, 1158, 1162, 1163, 1168, 1177–1179,
94–105, 450 1183, 1187, 1193, 1261, 1263
Associated Legendre functions, 59, 85, 97, Bilinearity
105–109 of tensor product, 1152
Associative law, 24, 466, 566, 567, 650, 658, Binomial expansion, 240, 241, 883
689, 692, 943, 1083, 1173 Binomial theorem
Asymmetric, 1037 generalized, 240
Atomic orbitals, 761, 768, 771–773, 775, 778, Biphenyl, 674
783, 784, 791, 792, 798, 805, 813 Bithiophene, 674
Attribute, 662, 663 Blackbody, 351–353
Azimuthal angle, 30, 72, 700, 819, 861, 869, Blackbody radiation, 351–353
924, 925, 1194, 1196 Block matrix, 712, 891
Azimuthal quantum number, 122, 123, 144, Blueshift, 388
149 Blueshifted, 389–392, 394
Azimuth of direction, 689 Bohr radius
of hydrogen, 135, 822
of hydrogen-like atoms, 110
B Boltzmann distribution law, 351, 360
Back calculation, 587 Born probability rule, 589
Backward wave, 286, 343, 347 Boron trifluoride, 678
Ball, 226, 227, 935, 939 Bose–Einstein distribution functions, 353
Basis Boson, 1097–1102, 1139
functions, 90, 709–717, 746, 747, 771, 776, Boundary
786, 805, 806, 827, 844–846, 849, 869, functionals, 402, 406, 410, 411, 415, 431
873, 874, 876, 877, 880, 890–892, 894, term, 409
895, 1198, 1199 Boundary conditions (BCs)
set, 63, 483, 542, 543, 566, 576, 610, 711, adjoint, 415
718, 719, 746, 766, 877, 879, 885, 909, homogeneous, 445
912, 922, 934, 935, 1086, 1143–1148, homogeneous adjoint, 415–417
1151, 1152, 1154, 1156, 1158, inhomogeneous, 424, 623
Index 1307

Bounded, 25, 29, 77, 81, 212, 219, 221–223, Chemical species, 677
229, 1192, 1263 Circular argument, 156
BP1T, 374–377 Circular cone, 1185, 1186
BP3T, 392–394 Circularly polarized light
Bragg’s condition, 389, 390, 392 left-, 138, 140, 150, 302
Branch, 259, 261–264, 643 right-, 138, 140
Branch cut, 261, 262, 264–269, 271–273 Clad layer, 332, 339, 340, 343, 386
Branch point, 260, 262, 264–266, 268, 271–273 Classes, 647, 651–653, 679–681, 683, 685,
Bra vector, 22, 549, 552, 555 688, 725, 726, 731–737, 871
Brewster angle, 322–325 Classical Hamiltonian, 33
Classical numbers, 10
Classical orthogonal polynomials, 49, 97, 401,
C 450
Canonical basis, 1156, 1174 Clebsch–Gordan coefficient, 873–895, 1294
Canonical coordinate, 28, 977, 993, 1055, Clopen set, 196, 199, 200, 921
1059, 1089 Closed interval, 201
Canonical forms of matrices, 485–548, 583 Closed-open set, 196
Canonical momentum, 977, 993, 994, 1055, Closed path, 199, 200, 215, 216, 922
1059, 1089 Closed set, 190, 191, 193–195, 198, 201, 219,
Canonical variables, 983, 993 921
Cardinalities, 188, 1083 Closed shell, 767, 768, 782
Cardinal numbers, 188 Closed surface, 281, 282
Carrier space, 713 Closure, 191–194, 680, 943, 1083
Cartan–Weyl basis, 1287–1297 C-number, 10, 22, 995, 1015, 1026, 1032,
Cartesian coordinate, 8, 47, 111, 175, 291, 361, 1033, 1038, 1039, 1059, 1099, 1133
677, 762, 819, 821, 834, 852, 919, 978, Cofactor, 476, 477, 498
980, 1124, 1218 Cofactor matrix, 498
Cartesian space, 61, 460, 461 Coherent state, 128, 129, 135, 141, 152
Cauchy–Hadamard theorem, 209–212 Coinstantaneity, 944
Cauchy–Liouville Theorem, 222 Column vector, 8, 22, 42, 90, 91, 93, 299, 465,
Cauchy–Riemann conditions, 209–212 481, 483, 488, 489, 491, 492, 529,
Cauchy–Schwarz inequality, 551, 611 533–535, 537, 553, 560, 567, 580, 581,
Cauchy’s integral formula, 213–223, 225, 232, 592, 597, 598, 602, 622, 640, 684, 710,
236 786, 809, 839, 846, 849, 861, 894, 935,
Cauchy’s integral theorem, 216, 217, 251, 937, 938, 947, 949–953, 957, 968, 969,
1021–1023 1030, 1144–1146, 1152–1154, 1174,
Cavity, 33, 351, 353, 355, 357, 396, 397, 399 1176, 1193, 1201, 1210–1212, 1239,
Cavity radiation, 351 1240
Cayley’s theorem, 689 Common factor, 504, 505, 509, 510
Center of inversion, 668 Commutable, 28, 509, 513–515, 537, 550, 580,
Center-of-mass coordinate, 60 603, 668, 669, 900
Character, 479, 654, 657, 663, 723, 725, 726, Commutation relation
728, 729, 748–750, 767, 772, 774–777, anti-, 1032, 1035–1039, 1041, 1098, 1116
782–787, 792, 797, 803, 811, 825, 871, canonical, 3, 28–30, 44, 45, 56, 61, 67, 68,
872, 874, 943, 1156, 1164, 1175 977, 993, 1061, 1065
Characteristic equation, 443, 486, 487, 528, equal-time, 977, 993–999, 1005, 1015
591, 597, 715, 861 Commutative, 28, 175, 499, 544, 611, 617, 629,
Characteristic impedance 630, 641, 648, 677, 715, 854, 855, 896,
of dielectric media, 279, 280, 288, 305–351 901, 902, 919, 921, 945, 960–962, 1062,
Characteristic polynomial, 486, 508, 540, 543, 1088–1090, 1168, 1200, 1202, 1240,
547, 974 1248, 1249, 1265, 1288, 1303, 1304
Charge conjugation, 967–971, 1039, 1040, Commutative group, 648
1081, 1209, 1210, 1221, 1227 Commutative law, 648
1308 Index

Commutator, 28–30, 146, 147, 899, 903, 947, Conjugate momentum, 979
998, 1005–1007, 1017, 1032, 1062, Conjugate representations, 789
1089, 1272 Connected
Compact group, 1192, 1263 component, 916–918, 921, 926, 1189, 1255,
Compatibility relation, 776 1258, 1263
Compendium quantization method, 1031 set, 916, 917, 919, 923, 1254, 1263
Complement, 187, 188, 190, 194, 195, 571, Connectedness, 199, 200, 204, 915–926,
572, 583, 1178, 1179, 1181, 1184, 1186, 1241–1258
1187 CONS, see Complete orthonormal system
Complementary set, 188, 219, 918 (CONS)
Completely reducible, 712, 714, 855, 860 Conservation law, 283, 933
Completeness relation, 746, 1214, 1232, 1233, Conservation of charge, 282
1237 Conservation of energy, 4
Complete orthonormal system (CONS), 64, Conservative force, 982, 1078
155, 159, 167, 170, 171, 177, 873, 952, Conserved quantity, 959
962, 969, 989, 1091, 1239 Constant loop, 922, 1250
Complete system, 159 Constant matrix, 629, 630, 643
Complex amplitude, 313 Constant term, 173
Complex analysis, 185, 186, 203 Constructability, 423
Complex analytic representation, 1293, 1294 Continuity equation
Complex conjugate, 22, 24, 25, 29, 34, 119, current, 282, 1054
140, 205, 360, 412, 419, 423, 552, 562, Continuous function, 220, 915, 917, 922, 1258
718, 849, 881, 907, 932, 970, 1030, Continuous group, 196, 612, 689, 827–926,
1129 1141, 1241, 1242, 1293, 1294
Complex conjugate representation, 846 Continuous spectra, 961
Complex conjugate transposed matrix, 22, 550, Contour
740, 789 integral, 213–215, 236–238, 244, 246, 248,
Complex domain, 19, 205, 223, 234, 272–274, 250, 1021, 1047
326, 1012 integration, 217–219, 223, 227, 237, 239,
Complex function, 14, 185, 205, 210, 212, 213, 243, 245, 247, 255, 267, 270, 271, 1011,
411, 790, 1047 1020–1023, 1048
Complexification, 1268, 1269, 1281, map, 755, 756, 758, 759
1288–1294, 1298–1300 Contraposition, 210, 230, 404, 474, 503, 554,
Complex number field, 1142, 1155 693, 854, 1179
Complex phase shift, 341 Contravariant
Complex plane, 14, 185, 186, 202–204, 206, tensor, 1051, 1173, 1174
207, 212, 213, 217–219, 223, 249, 250, vector, 937–939, 1061, 1174, 1177
258–261, 272, 328, 329, 548, 915 Convergence circle, 226, 227, 233
Complex roots, 243, 442, 443 Convergence radius, 226
Complex variable, 15, 204–213, 222, 237, 557, Converse proposition, 617, 854
886, 1011 Coordinate point, 298, 301
Composite function, 66, 979, 981 Coordinate representation, 45–53, 110, 114,
Compton, A., 4, 6, 1075, 1078, 1102, 1109, 124, 129, 131, 139, 149, 161, 167, 170,
1111 172, 174, 175, 179, 181, 182, 264, 418,
Compton effect, 4 819, 852, 867, 954, 1049, 1086, 1124,
Compton scattering, 1075, 1078, 1102, 1109, 1184, 1193, 1195, 1199, 1201, 1237,
1111, 1113–1139 1245
Compton wavelength, 4 Coordinate space, 977
Condon–Shortley phase, 103, 105, 109, 125, Coordinate transformation, 300, 382, 661, 667,
135, 141–143 703, 706, 764, 845, 868, 1167, 1169,
Conductance, 336 1177, 1187
Conic section, 1185 Core layer, 300, 382, 661, 667, 703, 764, 845,
Conjugacy classes, 651, 680, 681, 683, 685, 868, 1167, 1169, 1177, 1187
688, 725, 734, 737 Corpuscular beam, 6, 7
Conjugate element, 651 Correction term, 157–160
Index 1309

Coset δ function, 418, 420, 422, 425, 444, 988, 1007,


decomposition, 650, 652, 677, 920, 1255 1008, 1024, 1025, 1047, 1065, 1068,
left, 650, 651 1106, 1116, 1117, 1122, 1124
right, 650, 651 de Moivre’s theorem, 203
Cosine rule, 1138 de Morgan’s law, 189
Coulomb gauge, 1056 Derived set, 196, 197
Coulomb integral, 766, 778, 794, 816 Determinant, 43, 61, 296, 405, 432, 457,
Coulomb potential, 59, 69 473–478, 544, 553, 557, 591, 596, 673,
Coulomb’s law, 278 680, 689, 690, 770, 815, 837, 895, 917,
Countable, 14, 588, 647, 989 918, 921, 943, 945, 949, 974, 1163,
Covariant 1188, 1189, 1249, 1254, 1257
derivative, 1076, 1080, 1082, 1083, 1085 Device physics, 342, 372, 395, 1304
form, 1052, 1060, 1064, 1095, 1202 Device substrate, 377, 384
tensor, 1173, 1174 Diagonal elements, 89, 90, 93, 296, 475, 477,
vector, 938, 939, 945, 1061, 1174, 1177 487, 488, 492, 503, 523, 525, 526, 540,
Covering group 545, 583, 595, 614, 615, 681, 716, 723,
universal, 926, 1241, 1242, 1249 771, 809, 811, 901, 902, 904, 936, 1034,
Cramer’s rule, 317, 428, 837 1160, 1161, 1239
Creation operator, 41, 998, 1004, 1005, 1026, Diagonalizable, 502, 513, 540, 543, 544
1035, 1036, 1097, 1104 Diagonalizable matrix, 539–548, 585
Critical angle, 322–325, 328, 329, 331, 341 Diagonalization, 493, 543, 592, 598–605, 627,
Cross-section area, 367, 369, 397 694, 716, 952, 1087, 1247
Crystal Diagonalized, 44, 502, 504, 513, 540, 542, 545,
anisotropic, 392 556, 571, 580, 581, 583, 585, 591, 593,
organic, 372, 377, 380, 389, 390, 393–395, 603, 605, 627, 632, 716, 717, 809, 832,
594 951–953, 962, 1087, 1141, 1160, 1161,
Cubic groups, 688 1204, 1219–1221, 1229, 1230, 1235,
Cyclic group, 648 1239, 1240, 1242, 1290, 1292
Cyclic permutation, 667, 703 Diagonal matrix, 91, 485, 512, 543, 560, 580,
Cyclopropenyl radical, 773, 783–791, 794, 796, 582, 583, 592, 593, 598, 627, 631, 664,
799 707, 715, 764, 809, 834, 856, 858, 910,
1034, 1161, 1204, 1258
Diagonal positions, 492, 513
D Dielectric constant, 278, 279, 348, 349, 380,
d’Alembertian, 932 1049
d’Alembert operator, 932 Dielectric constant ellipsoid, 381, 387, 391
Damped oscillator, 441, 443, 445, 446 Dielectric media, 279, 280, 288, 294, 305–351
Damping Differentiable, 67, 205, 207, 210, 212, 222,
constant, 441 283, 428, 607, 617, 898, 899
critical, 442 Differential cross-section, 1122, 1124
over, 442 Differential equations
weak, 442, 443 inhomogeneous, 402, 405, 424, 430, 440,
Darboux’s inequality, 219–220, 238 624, 626, 634
de Broglie, L.-V., 6 in a matrix form, 621–626
de Broglie wave, 6 second-order, 941
Definite integral, 26, 101, 114, 134, 139, 175, system of, 607, 618–637, 643, 644
236, 743, 749, 767, 769, 820, 1048 theory of, 431, 450
Degeneracy, 156 Differential operator, 15, 19, 26, 27, 51, 52, 71,
Degenerate 74, 80, 171, 177, 278, 401, 408, 410,
doubly, 795, 974, 1212, 1219, 1220, 1235, 411, 413, 415, 416, 421, 424–426, 431,
1238, 1239, 1257 436, 448–450, 549, 571, 932, 1050,
triply, 817, 823, 824 1203
1310 Index

Diffraction grating, 372, 375–378, 386, 390, of variability, 185


393, 395 Dual basis, 1156, 1158, 1167
Diffraction order, 379, 392 Dual space, 552, 1156, 1163, 1173
Dimensionless, 75, 80, 81, 110, 278–280, 287, Dual vector, 552, 1155–1165, 1174
368, 372, 819, 829 Dynamical system, 982–984, 989, 991, 1004
Dimensionless parameter, 51, 156 Dynamical variable, 959
Dimension theorem, 469, 473, 505, 519, 520,
541, 1286
Dipole E
approximation, 129, 144, 359 Effective index, 377, 383, 390–394
isolated, 365 Eigenenergy, 44, 113, 123, 155–160, 167, 168
oscillation, 351, 364 Eigenequation, 486
radiation, 361–367 Eigenfunction, 17–19, 24, 25, 27, 36, 37, 40,
transition, 127–129, 131, 134, 149, 766, 780 42, 69–71, 74, 89, 112, 114, 139, 156,
Dirac, P., 11, 12, 22, 549, 931 177, 452, 716, 742, 746, 765, 766, 796,
Dirac adjoint, 958, 964, 965, 967, 971, 1097, 800, 801, 821, 856, 892, 894, 951–955,
1237, 1240 957, 962, 964, 965, 967, 986, 987, 1234,
Dirac equation, 12, 399, 931–975, 991, 1032, 1239
1081, 1085, 1141, 1187, 1193, 1194, Eigenspace, 486, 495–500, 505–512, 514, 515,
1197–1204, 1208, 1210, 1211, 1214, 523, 540, 542, 543, 583, 589, 602, 746
1218, 1220–1222, 1227–1235, 1240 Eigenspinor, 1220
Dirac field, 960, 962, 963, 970, 977, 991, 993, Eigenstate
999, 1005, 1011, 1023, 1026–1049, energy, 167, 168, 180
1061, 1071, 1075, 1077–1079, 1083, simultaneous, 68, 71, 75, 81, 88, 92, 93,
1084, 1096, 1098, 1101, 1111, 1211 599–605, 962
Dirac operators, 1141, 1203, 1204, 1208, Eigenvalue
1212–1229, 1235–1241 equation, 13, 24, 29, 33, 52, 128, 155, 156,
Direct factor, 658, 677 164, 176, 177, 386, 430, 486, 492, 529,
Direction cosine, 288, 315, 684, 701, 702, 862, 548, 560, 602, 755, 760, 769, 771, 816,
864, 865 954, 956, 962, 989
Direct-product group, 647, 657–659, 677, 746, problem, 14, 16, 19, 21, 24, 122, 448–453,
749, 750, 874, 919 485, 548, 603, 717, 746, 813, 962, 1199,
Direct product space, 1142, 1155, 1171 1203
Direct sum, 196–198, 460, 462, 497, 500, 501, Eigenvector, 27, 37, 42, 76, 89, 156, 157, 159,
504–508, 514–516, 523, 540, 542, 543, 299, 485–495, 498, 500–505, 519–522,
571, 574, 659, 712, 717, 725, 728, 746, 525, 528–534, 537, 538, 540, 542–544,
768, 775, 797, 808, 855, 873, 916, 919, 546, 548, 580, 583, 585, 590, 599–604,
1178, 1192, 1204, 1233, 1298–1300 681, 691, 693, 702, 717, 770, 772, 809,
Dirichlet conditions, 15, 19, 26, 73, 354, 357, 861, 862, 952, 953, 962, 964, 989, 1199,
416, 430 1201, 1211, 1212, 1234, 1239, 1240
Disconnected, 199, 200 Einstein, A., 3, 4, 6, 7, 351, 367
Discrete set, 196, 197 Einstein coefficients, 359, 361, 365
Discrete topology, 199 Einstein summation convention, 937, 1177
Disjoint sets, 199, 200 Electric dipole
Dispersion of refractive index, 373–376 approximation, 144
Displacement current, 281, 283 moment, 129, 169, 170, 767
Distance function, 185, 202, 204, 550 operator, 824
Distributive law, 189, 467 transition, 127–129, 131, 134, 149, 766, 780
Divisor, 499, 500, 541, 675 Electric displacement, 278
Domain Electric field, 129, 150, 152, 161, 164, 168,
of analyticity, 213, 215–217, 219, 223, 234 170, 178, 277, 278, 280, 293–295, 297,
complex, 19, 205, 223, 234, 272–274, 326, 298, 301, 302, 307, 313–317, 321, 326,
1012 336, 337, 340–343, 347–349, 357,
Index 1311

363–365, 383–385, 396, 397, 441, 637, Equilateral triangle, 677, 678, 783, 784
781, 824, 1051 Equivalence
Electric flux density, 278, 382 law, 512, 593
Electric permittivity tensor, 380 relation, 916
Electric wave, 293–295 theorem, 932
Electromagnetic field, 33, 129, 151, 305–307, transformation, 570, 593, 594, 1161, 1162
313–315, 332, 363, 377, 380, 384, 386, Equivalent transformation, 593
398, 594, 643, 977, 991, 993, 1023, Ethylene, 755, 773–784, 786, 788, 790, 791,
1049–1073, 1075, 1077–1079, 1083, 797, 799
1084 Euclidean space, 1165, 1177–1187
Electromagnetic induction, 280 Euler angles, 695–704, 836, 840, 842, 845, 849,
Electromagnetic plane wave, 290 856, 861, 911, 1261
Electromagnetic potential, 1051, 1077 Euler equation
Electromagnetic wave, 129, 277, 289–303, second-order, 173
305–351, 353–358, 365, 370, 372, 382, Euler–Lagrange equation, 982, 992, 993, 1057
383, 396, 401, 441, 637, 1059 Euler’s formula, 203
Electromagnetism, 170, 277, 278, 287, 450, Euler’s identity, 203
594, 1049, 1061, 1304 Evanescent mode, 331
Electronic configuration, 767, 780–782, 797, Evanescent wave, 332, 338–343
802, 803, 824, 825 Even function, 19, 51, 134, 161, 867, 873, 1024
Element, 47, 86, 129, 186, 296, 359, 457, 487, Even permutations, 475, 904, 1095, 1111
549, 577, 607, 647, 661, 705, 761, 841, Excitation
936, 1032, 1083, 1145, 1271 optical, 372, 825
Elementary charge, 60, 129, 151, 359, 1076 strong, 373, 393
Elementary particle weak, 393, 394
physics, 1140, 1258 Excited state, 37, 44, 134, 156, 178, 179, 181,
reaction, 1120 351, 352, 358, 359, 366, 780, 797, 802,
Ellipse, 297, 298, 301, 302, 381, 599 803, 824, 825
Ellipsoidal coordinates, 819 Existence probability, 21, 32, 987
Elliptically polarized light, 297, 298, 301, 1061 Existence probability density, 142
Emission Expansion coefficient, 177, 225, 418, 866, 994,
direction, 377, 393 1261
gain, 392, 394 Expectation value, 25, 33–36, 53, 54, 70, 176,
grazing, 379, 390, 391 178, 588–590, 1015–1017, 1031, 1043,
lines, 370, 374, 375, 377, 393, 394 1079, 1098–1100
optical, 372, 768 Exponential function, 15, 248, 290, 308, 309,
spontaneous, 358, 359, 365, 366, 368 446, 607–644, 827, 830, 836, 856, 921,
stimulated, 358, 359, 367, 368 946, 1048, 1082, 1086, 1301, 1302
Empty set, 188–190, 652 Exponential polynomial, 309
Endomorphism, 457, 465, 468, 469, 471–474, External force field, 60, 441
478, 480, 648, 656, 907, 908, 1086, Extremals, 131, 347
1152, 1172, 1173, 1272, 1275, 1279, Extremum, 180, 388, 984
1286, 1301
Energy conservation, 4, 322, 1137
Energy eigenvalue, 20, 31, 36, 37, 53, 123, 135, F
155, 156, 158, 162, 176, 717, 746, 765, Fabry-Pérot resonator, 373
777, 779, 780, 789, 795, 796, 802, 814, Factor group, 653, 657, 677
823, 962, 967 Faraday, M., 280
Energy level, 41, 157–160, 176, 358, 781, 824 Faraday’s law of electromagnetic induction,
Energy-momentum, 991, 1107 280
Energy transport, 319–322, 331, 347 Fermion, 1036, 1043, 1095, 1098–1103, 1110,
Entire function, 213, 222, 223, 250, 1048 1111, 1139
Equation of motion, 60, 151, 441, 933, 981, Feynman
992–994, 1079, 1089 amplitude, 1073, 1115–1120, 1124, 1126,
Equation of wave motion, 277, 284–289, 334 1129
1312 Index

Feynman (cont.) 959, 983, 1023, 1047, 1051, 1083, 1156,


diagrams, 1103–1113, 1115–1120, 1139 1202, 1212, 1214, 1240
propagator, 977, 989, 994, 1007, Functional space, 549
1015–1023, 1028, 1043–1049, 1071, Function space, 25, 649, 774
1072, 1103, 1109, 1111, 1112 Fundamental set of solutions, 15, 405,
rules, 1103–1113 426–428, 431–434, 441–443, 640, 644
FIB, see Focused ion beam (FIB) Fundamental solution, 634
Fine structure constant, 1095, 1139
First-order approximation, 181, 182
First-order linear differential equation G
(FOLDE), 45, 46, 401, 406–411, 619, Gain function, 369
620, 624, 634 Gamma functions, 98–100, 887
First-order partial differential equation, 277 Gamma matrices, 947, 949, 952, 971–975,
Fixed coordinate system, 666, 683, 696, 704, 1032, 1082, 1124, 1130, 1132, 1133,
861, 924 1204, 1227
Flux, 151, 278–280, 322, 365, 369, 382, 638, Gauge
1050, 1051 field, 1055, 1082–1085
Fock invariant, 1084, 1130
basis, 1005 transformation, 1051, 1054, 1082–1084,
space, 999–1005 1126, 1129
state, 1092 Gaussian plane, 185
Focused ion beam (FIB), 377, 395 Gauss’ law, 278
FOLDE, see First-order linear differential Gegenbauer polynomials, 97, 102
equation (FOLDE) Generalized binomial theorem, 98, 240
Forbidden, 127, 781, 797, 825, 961 Generalized coordinate, 977, 978, 980, 981,
Force field 993
central, 59, 109 Generalized eigenspace, 498, 505–512, 515,
external, 60, 441 523
Forward wave, 286, 343, 345, 347 Generalized eigenvalue, 486, 528, 615
Four-group, 664, 674, 1255 Generalized eigenvector, 498, 500–505, 519,
Fourier 521, 522, 525, 528, 531–534, 538, 540,
analysis, 985–990 548
coefficients, 171 Generalized force, 981
components, 1026, 1117, 1203 Generalized Green’s identity, 432
cosine coefficients, 873 Generalized momentum, 979, 982, 993
cosine series, 873 General linear group, 648, 837
counterpart, 1049, 1116 General point, 661, 756, 844, 1194
integral transform, 987–990 General solution, 15, 32, 286, 405, 406, 431,
series expansion, 985–987 442, 443, 994, 1230
Fourier transform Generating function, 97
inverse, 985, 987–990 Geometric figures, 673
pairs, 989 Geometric object, 661–664, 666, 673, 680
Four‐momentum, 1004, 1017, 1104, 1107, Geometric series, 224, 228
1119, 1132, 1133, 1137, 1228 Glass, 277, 288, 295
Four-vector, 934, 945, 1003, 1024, 1026, 1035, Global properties, 920–926
1052–1055, 1059, 1076, 1079, 1131, Gram matrix, 555, 556, 558–560, 566, 569,
1133, 1135, 1136, 1193 570, 587, 592, 707, 1141, 1163, 1166
Fréchet axiom, 200 Gram–Schmidt orthonormalization, 567, 580,
Free field, 960, 1075, 1085 742, 965, 1165, 1179
Free particle, 21, 932, 1026, 1085 Grand orthogonality theorem (GOT), 717–723,
Free space, 305, 337, 343, 365 739
Functionals, 13, 15, 103, 120, 124, 136, 142, Grassmann numbers, 1032–1034, 1038, 1042,
165, 171, 173, 206, 210, 225, 234, 241, 1045
287, 402, 404, 406, 410, 411, 415, 431, Grating period, 379, 394
549, 620, 743, 760, 767, 772, 780, 885, Grating wavevector, 378, 379, 388, 391, 393
Index 1313

Grazing incidence, 328, 330 Hermite polynomials, 49, 51, 59, 167, 450
Greatest common factor, 505 Hermitian, 7, 34, 70, 132, 156, 360, 401, 555,
Green’s functions, 401–453, 607, 618, 620, 571, 616, 707, 765, 830, 943, 995, 1081,
621, 644 1165, 1278
Green’s identity, 420 Hermitian differential operator, 176, 178, 411,
Ground state, 36, 130, 134, 156, 157, 160, 161, 421, 424, 425, 448
168, 176, 178, 179, 181, 351, 352, 358, Hermitian matrix, 24, 133, 555–557, 569, 592,
359, 768, 780, 797, 802, 803, 824, 825 597, 616, 618, 627, 898, 907, 909, 951,
Group 952, 1192, 1197, 1229, 1242, 1243,
algebra, 726–734 1245–1247, 1259, 1263, 1298, 1299,
element, 647, 649, 651, 659, 675, 680, 685, 1302
688, 689, 705–707, 710, 712, 715, 719, Hermitian operator, 24, 25, 29, 54, 55, 71, 79,
721, 723–726, 730, 733–735, 737, 738, 111, 132, 160, 177, 181, 360, 401, 424,
746, 750, 761, 772, 780, 784, 791, 855, 450, 453, 571–605, 636, 741, 766, 834,
863–867, 870, 876, 1250 835, 839, 901, 904, 960, 962, 972, 1030,
finite, 647–649, 651, 654, 655, 657, 689, 1069, 1087, 1104, 1221, 1243, 1250,
705–707, 709, 710, 712, 717, 729, 827, 1258, 1259, 1288, 1300
863, 870, 871, 876, 918, 920 Hermiticity, 3, 19, 25, 26, 119, 413, 416, 424,
infinite, 648, 655, 661, 689, 827, 873, 876 449, 450, 601, 709, 771, 974, 1238,
representation, 711, 720, 723, 725–727 1240, 1243
symmetric, 688, 689 Hesse’s normal form, 288, 684, 687
theory, 484, 607, 647–659, 662, 669, 705, High-energy physics, 949
714, 737, 755–826 Highest-occupied molecular orbital (HOMO),
Group index, 371, 376 780, 781, 797
Group refractive index, 371 Hilbert space, 26, 588, 774, 1005
Group velocity, 7, 12, 338 Hole theory, 1036
Homeomorphic, 926
HOMO, see Highest-occupied molecular orbital
H (HOMO)
Half-integer, 78, 79 Homogeneous boundary conditions (BCs), 410,
Half-odd-integer, 78, 79, 81, 84, 841, 844, 880, 411, 416, 417, 420, 421, 424, 425, 427,
914, 1291 429–433, 445, 448, 449, 623
Hamilton–Cayley theorem, 498–499, 509, 510, Homogeneous equation, 173, 402, 419, 421,
515, 540 423, 424, 426, 427, 430, 441, 448, 449,
Hamiltonian, 11, 21, 29, 31, 33, 36, 43, 59–69, 620, 623, 626, 630, 633, 639, 640
109, 156, 161, 168, 178, 181, 558, 761, Homomorphic, 654–657, 706, 711, 734, 1265,
764, 766, 767, 771, 788, 816, 819, 1272, 1285–1287
960–964, 977, 983, 984, 999–1005, Homomorphism, 647, 653–657, 705–707, 739,
1031, 1049, 1065–1071, 1076, 1077, 745, 912–914, 1263, 1283, 1286, 1287
1079, 1085, 1088, 1091, 1094, 1104, Homomorphism theorem, 657, 914, 1265
1105, 1108, 1113, 1115 Homotopic, 922, 1250, 1258
Hamiltonian density, 999, 1028–1035, 1049, Hückel approximation, 780
1055–1059, 1065, 1079, 1085, 1094, Hydrogen, 59, 124, 127, 134–144, 167–171,
1104 176, 781, 805, 807–810, 813–819, 962
Hamilton’s principle, 984 Hydrogen-like atoms, 59–125, 127, 134, 143,
Harmonic oscillator, 30–59, 109–111, 127, 144, 155, 157, 558, 603, 949
128, 132, 155, 156, 178, 179, 351, 353, Hyperbolic functions, 942, 1211, 1216
396, 398, 399, 401, 440, 441, 558, 774, Hypersphere, 923, 1185
994, 998, 1003, 1201 Hypersurface, 599, 923
Heaviside-Lorentz units, 1049
Heaviside step function, 422
Heisenberg, W., 11, 1087–1091 I
Heisenberg equation of motion, 1089 Idempotent matrix, 505–513, 544, 546–548,
Hermite differential equation, 449, 450 588, 631
1314 Index

Identity action, 977, 984, 991–993, 1028, 1080,


element, 648, 649, 651, 653–656, 658, 663, 1084
664, 675, 723, 777, 786, 913, 916–918, line, 306, 307
921, 922, 942, 1083, 1189, 1255, 1258, surface, 305–307, 363
1263, 1271, 1273, 1282, 1283 Integrand, 221, 239, 252, 254, 266, 268, 271,
mapping, 1264, 1301 274, 985, 992, 995, 1002, 1019, 1023,
matrix, 44, 507, 509, 511, 513, 521, 569, 1026, 1031, 1041, 1044, 1045, 1047,
570, 586, 591, 608, 666, 690, 694, 706, 1066, 1068, 1079, 1117
720, 728, 729, 808, 830, 896, 915, 921, Integration
939, 947, 951, 970, 975, 1030, 1134, clockwise, 217
1203, 1214, 1229, 1230, 1233, 1263, complex, 213, 236, 1011, 1019, 1020, 1023
1265 constant, 151, 152, 397, 407, 619, 620
operator, 35, 56, 159, 171, 417, 419, 507, contour, 217–219, 223, 227, 237, 239, 243,
989, 993, 1024, 1035, 1062, 1092, 1095, 245, 247, 255, 267, 270, 271, 1011,
1104, 1177, 1188, 1230 1020–1023, 1048
transformation, 666, 689, 693, 912, 913 counter-clockwise, 217
Image definite, 134
of transformation, 468 by parts, 26, 29, 86, 106, 108, 132, 408,
Imaginary axis, 185, 202 432, 992, 1029, 1057, 1084
Imaginary unit, 185 path, 215, 217, 229, 1018, 1023
Improper rotation, 669, 673, 690, 807, 920 range, 25, 247, 438, 440, 769, 995, 1011
Incidence angle, 311, 325 real, 236
Incidence plane, 311, 314, 315, 317, 322, 326 surface, 282, 306
Incident light, 310, 312, 314 termwise, 224, 228
Indeterminate constant, 19 volume, 282
Indeterminate form of limit, 323, 960 Interface, 305–308, 310–315, 322, 325, 331,
Indiscrete topology, 199 332, 335–339, 343, 347–351, 357, 377,
Inequivalent, 711, 718, 720–722, 725, 734, 380, 385, 386, 396, 397
737, 860, 869, 871–873 Interference
Inertial frame of reference, 934, 939, 940, 943, constructive, 370
1085, 1121, 1187, 1207, 1222, theory, 372
1227–1229 Interference modulation
Infinite dielectric media, 294, 295, 305, 306 of emissions, 373, 376
Infinite-dimensional, 459, 1157 spectra, 374
Infinite power series, 608, 1095 Interior point, 191
Infinite series, 163, 224, 232 Intermediate state, 160
Inflection point, 242 Intermediate value theorem, 917
Inhomogeneous boundary conditions (BCs), Intersection, 188, 195, 649, 669, 671, 677, 688,
426, 427, 430, 438, 441, 446, 447, 623, 924
624, 626 Invariant, 487, 495–501, 504, 508, 514, 517,
Inhomogeneous equation, 402, 405, 424, 430, 529–531, 539, 540, 544, 548, 571, 583,
623, 624, 633 590, 593, 602, 603, 615, 652, 656, 659,
Initial value problems (IVPs), 401, 431–448 677, 688, 713–715, 717, 732, 746, 760,
Injective, 472–474, 478, 653, 655, 924, 1264 764, 808, 831, 834, 843, 867, 869, 880,
Inner product 882, 883, 892, 914, 917, 918, 935, 938,
of functions, 745 940, 957, 967, 970, 977, 989, 994,
space, 202, 549–571, 574, 578, 579, 587, 1005–1015, 1017, 1018, 1023, 1024,
588, 590, 592, 610, 714, 1086, 1141, 1027, 1028, 1035, 1040–1043, 1049,
1158, 1163, 1179, 1259 1052, 1054, 1059, 1072, 1082, 1084,
of vectors, 419, 601, 749 1089, 1100, 1118, 1123, 1130, 1158,
vector space, 574 1165–1171, 1175–1177, 1187, 1189,
In-quadrature, 396, 397 1207, 1214, 1227, 1255, 1261, 1263,
Integral 1304
Index 1315

Invariant delta functions, 977, 1005–1015, 1017, 1018, 1023, 1028, 1040–1043, 1072
Invariant scattering amplitude, 1118
Invariant subgroup, 652, 656, 659, 677, 688, 732, 914, 917, 918, 1189, 1255
Invariant subspace, 495–500, 508, 517, 529–531, 539, 540, 544, 548, 571, 583, 602, 603, 713, 714, 717, 808
Inverse element, 472, 648, 653, 655, 658, 664, 682, 706, 727, 733, 942, 1083
Inverse matrix, 43, 205, 457, 476, 479, 480, 538, 596, 622, 628, 631, 706, 937, 942, 944, 1170, 1195
Inverse transformation, 468, 472, 473, 648, 845, 847, 1257, 1258, 1284
Inversion symmetry, 668, 669, 673, 680, 1255
Inverted distribution, 368
Invertible, 472, 648, 656
Irradiance, 322, 367–370
Irrational functions, 256
Irreducible, 712, 714, 717–726, 728, 729, 734–738, 742–746, 749, 764, 766–768, 771–773, 775–777, 780, 782, 784, 786, 788, 792, 797–799, 801, 803, 806, 808, 810–812, 814, 815, 817, 823–825, 853–861, 868–874, 876–878, 891, 1291–1293
Irreducible character, 723, 868–874
Isolated point, 196–199
Isomorphic, 654, 655, 657, 674, 680, 688, 711, 760, 914, 924, 1157, 1249, 1255, 1264, 1276, 1281, 1286, 1288, 1290, 1301
Isomorphism, 647, 653–657, 705, 920, 1157, 1158, 1276, 1284–1287
IVPs, see Initial value problems (IVPs)

J
Jacobian, 865
Jacobi's identity, 903, 1274, 1277
j-j Coupling, 879
Jordan blocks, 517–528, 530–533, 540
Jordan canonical form, 485, 515–539
Jordan decomposition, 515
Jordan's lemma, 248–256, 1021, 1022

K
Kernel, 468, 529, 655–657, 912–914, 1263–1265, 1283–1285
Ket vector, 22, 549, 552
Klein-Gordon equation, 931–934, 946, 991, 992, 994, 1097, 1197, 1200–1204
Klein-Gordon field, 991, 994, 999, 1028
Kronecker delta, 50, 937
Kronecker product, 1149

L
Laboratory coordinate system, 381
Laboratory frame, 1121, 1136, 1138
Ladder operator, 76, 93, 94, 114
Lagrange's equations of motion, 982, 984, 985, 993
Lagrangian, 977–985, 991–993, 1028–1035, 1049, 1055–1059, 1079–1082, 1084, 1085
Lagrangian density, 991–993, 1028, 1049, 1055–1059, 1079, 1080, 1084
Lagrangian formalism, 977–985, 993
Laguerre polynomials, 59, 110, 118–124
Laplace operator, 8, 761
Laplacian, 8, 761, 762
Laser
  device, 372, 375, 395
  material, 371, 375, 377
  medium, 367–369, 371, 380
  organic, 372–395
Laser oscillation
  multimode, 376, 394
  single-mode, 394, 395
Lasing property, 372
Laurent's expansion, 227, 229, 236, 240, 241
Laurent's series, 223–230, 234, 236, 241
Law of conservation of charge, 282
Law of electromagnetic induction, 280
Law of equipartition of energy, 353
LCAO, see Linear combination of atomic orbitals (LCAO)
Least action
  principle of, 984, 992, 993
Left-circularly polarized light, 138, 140, 146, 150, 301, 302, 637
Legendre differential equation, 95, 96, 450
Legendre polynomials, 59, 88, 94, 95, 101, 103, 107–109, 273
Legendre transformation, 983, 984
Leibniz rule, 94, 96, 103, 120
Leibniz rule about differentiation, 94, 96
Levi-Civita symbol, 904
Lie algebra, 618, 626, 636, 827, 830, 896–915, 921, 922, 926, 1189–1193, 1241, 1242, 1250, 1256, 1259, 1266, 1269, 1271–1304
Lie group
  linear, 897, 898, 908, 917, 921, 922, 926, 1282
Light amplification, 367, 368
Light amplification by stimulated emission of radiation (LASER), 367
Light-emitting devices, 377, 378, 391, 392
Light-emitting materials
  organic, 372
Light-emitting properties, 372
Lightlike, 940
Lightlike vector, 1184–1186
Light propagation, 151, 277, 331, 377, 379, 381, 387
Light quantum, 3, 4, 358, 359
Limes superior, 226
Limit, 208, 212–214, 218, 219, 226, 233, 267, 323, 438, 440, 613, 617, 729, 830, 866, 889, 900, 960, 1018, 1121, 1190, 1272, 1283, 1302
Linear algebra, 518
Linear combination
  of atomic orbitals (LCAO), 755, 761, 768
  of basis vectors, 545
  of functions, 128
  of a fundamental set of solutions, 426
Linear combination of atomic orbitals (LCAO), 755, 761, 768
Linearly dependent, 17, 96, 103, 109, 309, 355, 403–405, 458, 467, 469, 479, 493, 494, 496, 553, 554, 557, 558, 568, 950, 1164, 1183
Linearly independent, 15, 17, 32, 73, 309, 353, 357, 383, 403–405, 426, 457–459, 461, 462, 470, 479, 482, 489, 493–495, 497, 500, 503–505, 516, 517, 540, 552, 554, 558, 559, 567, 569, 571, 600, 622, 640, 690, 691, 710, 714, 719, 728, 736, 741, 742, 812, 814, 840, 851, 946, 950, 952, 974, 975, 1142–1144, 1146, 1147, 1156, 1158, 1163, 1164, 1183, 1236, 1280
Linearly polarized light, 144, 146, 295, 303, 1061
Linear transformation
  endomorphic, 475, 648
  group, 648
  successive, 480, 482, 695
Linear vector space
  finite-dimensional, 457, 473, 474, 1157
  n-dimensional, 457, 458, 461, 465, 515
Lithography, 395
Local properties, 920–926
Logarithmic branch point, 264
Logarithmic functions, 256, 1252
Longitudinal multimode, 350, 370, 374, 394
Loop, 306, 922, 924, 1250
Lorentz
  algebra, 1278–1287
  boost, 1141, 1192, 1194–1197, 1205, 1206, 1212–1223, 1229, 1230, 1232, 1233, 1235, 1236, 1238, 1249, 1262, 1281
  condition, 1058, 1059, 1069, 1070
  contraction, 943, 944
  force, 150, 637–639, 642, 643, 1075–1079
  gauge, 1058, 1084
  group, 942–944, 1141, 1187–1197, 1223, 1241–1258, 1263, 1264
  invariant, 1006, 1007, 1009–1011, 1015, 1027, 1049, 1052, 1123
  transformation, 934–940, 942, 943, 957, 971, 1187–1200, 1203, 1205, 1207, 1212, 1221, 1223, 1224, 1228, 1236, 1247, 1256, 1257, 1261, 1302, 1303
Lowering operator, 93, 110, 114
Lower left off-diagonal elements, 475
Lower triangle matrix, 488, 539, 577, 632
Lowest-unoccupied molecular orbital (LUMO), 780, 781, 797
L-S Coupling, 879
Luminous flux, 322

M
Magnetic field, 150, 277, 279–281, 284, 285, 292–294, 307, 313–317, 320, 321, 334, 336, 338, 357, 365, 383–385, 396, 637
Magnetic flux, 280
Magnetic flux density, 151, 280
Magnetic permeability, 279, 380
Magnetic quantum number, 143, 144
Magnetic wave, 155, 293, 295
Magnetostatics, 279
Magnitude relationship, 434, 720, 825
Major axis, 301
Mapping
  bijective, 926, 1157, 1158, 1168, 1284, 1285, 1287, 1301
  bilinear, 1142–1149, 1155, 1172
  continuous, 916, 917, 926
  homomorphic, 655, 656, 1265, 1272, 1285–1287
  identity, 1264, 1301
  injective, 653
  inverse, 472, 1149, 1157
  invertible, 472
  isomorphic, 657, 688
  multilinear, 1171–1173
  non-invertible, 472
  reversible, 472
  surjective, 657, 912
  universal bilinear, 1149
Mathematical induction, 49, 82, 115, 117, 488, 490, 493, 568, 582, 595, 609, 1171
Matrix
  adjugate, 477
  algebra, 43, 317, 382, 457, 468, 476, 501, 556, 853, 952, 971, 1141, 1193, 1242
  decomposition, 515, 539, 585, 587, 725
  equation, 531, 949
  function, 915, 1250, 1255, 1256
  mechanics, 11
  nilpotent, 93, 94, 500–505, 511–521, 526, 536, 539, 577, 633
  self-adjoint, 555
  semi-simple, 513, 536, 539
  singular, 579, 1161
  skew-symmetric, 616, 618, 626, 902, 903, 907, 921, 1288, 1302
  symmetric, 133, 592, 594–597, 631, 1160, 1170, 1197, 1247, 1248, 1251–1253, 1262, 1302
  transposed, 405, 498, 549, 562, 1035
Matrix element
  of electric dipole, 144
  of Hamiltonian, 766, 767
Matter field, 1076
Matter wave, 6, 7
Maximum number, 458
Maxwell, J.C., 332, 333, 380, 381, 1049–1053, 1058
Maxwell's equations, 277–303, 332, 333, 380, 381, 1049, 1050, 1052, 1053, 1058
Mechanical system, 33, 124, 396–399
Meromorphic, 232, 238
Meta, 794, 795
Methane, 620, 622, 640
Method of variation of constants, 685, 755, 805–826
Metric
  Euclidean, 939, 1165
  indefinite, 476, 937, 1058, 1065–1071, 1187
  Minkowski, 936, 937, 1052
  space, 202, 204, 226, 227, 550, 937
  tensor, 936, 1165, 1170, 1175–1178, 1187
Michelson–Morley experiment, 940
Microcausality, 1015
Minimal coupling, 1075–1078, 1085
Minimal substitution, 1076
Minkowski space
  four-dimensional, 936, 1165
  n-dimensional, 1177
Minor, 296, 397, 473, 476, 559, 597
Minor axis, 301
Mirror symmetry
  plane of, 667, 671, 673, 675
Mode density, 353–358
Modulus, 202, 219, 242, 866, 1256
Molecular axis, 674, 783, 819, 823
Molecular long axis, 783, 804, 805
Molecular orbital energy, 761
Molecular orbitals (MOs), 717, 746, 761–826
Molecular science, 3, 737, 750, 755, 826, 1304
Molecular short axis, 804
Molecular states, 755
Molecular systems, 746, 749
Momentum, 3–5, 7, 28–30, 33, 57, 62, 396, 932, 958, 960, 977, 991, 1002, 1005, 1018, 1024, 1073, 1105, 1112, 1116, 1123, 1136
Momentum operator, 29–31, 33, 63, 79–94, 147, 1003
Momentum space
  representation, 1017, 1018, 1049, 1073
Monoclinic, 380
Monomials, 841, 886, 888
Moving coordinate system, 695, 696, 698, 861, 923, 1193, 1229
Multiple roots, 258, 487, 500, 505, 543, 544, 583
Multiplication table, 649, 674, 675, 714, 727
Multiplicity, 495, 508, 509, 513, 520, 525, 586, 589, 600, 809
Multiply connected, 199, 229
Multivalued function, 256–274

N
Nabla, 64, 933
Nano capacitor, 161
Nano device, 164
Naphthalene, 674, 684
Natural units, 934, 935, 991, 993, 995, 1002, 1049, 1050, 1086, 1095, 1137
Necessary and sufficient condition, 195, 198, 201, 405, 473, 476, 486, 554, 579, 580, 648, 650, 655, 855, 918, 1143, 1179, 1235
Negation, 192, 196
Negative-energy
  solution, 951, 955–960, 962, 965, 1209, 1212, 1218
  state, 948, 949, 1026, 1027, 1036, 1220, 1221, 1231
Negative helicity, 302
Neumann conditions, 27, 338, 416
Newtonian equation, 31
Newton's equation of motion, 981
Nilpotent, 93, 94, 500–505, 511–521, 525, 526, 528, 539, 577, 633, 1292
Node, 337, 338, 347–349, 397, 773, 774, 817
Non-commutative, 3, 28, 603, 960
Non-commutative group, 648, 664, 772
Non-compact group, 1192, 1263
Non-degenerate, 155–158, 167, 168, 180, 600, 1163, 1178, 1180, 1181
Non-degenerate subspace, 1180, 1181
Non-equilibrium phenomenon, 368
Non-linear equation, 1075, 1091
Non-magnetic substance, 288, 295
Non-negative, 27, 34, 51, 70, 71, 77, 96, 202, 220, 249, 250, 406, 550, 560, 571, 592, 840, 849, 886
Non-negative operator, 76, 558
Non-relativistic
  approximation, 7, 11, 12
  limit, 1121
  quantum mechanics, 959
Non-singular matrix, 457, 480, 481, 488, 489, 492, 502, 513, 525, 535, 540, 570, 593–596, 614, 707, 711, 896, 906, 938, 952, 1161, 1168, 1170, 1242–1249
Non-trivial, 403, 458, 1264
Non-trivial solution, 405, 421, 430, 449, 548
Non-vanishing, 15, 16, 93, 130, 134, 139, 149, 230, 231, 235, 282, 306, 416, 617, 681, 767, 771, 797, 812, 1044, 1133
Non-zero
  coefficient, 310
  constant, 27
  determinant, 480
  eigenvalue, 525, 527, 1163
  element, 89, 90
  vector, 70
Norm, 24, 55, 550, 551, 555, 557, 564–567, 578, 585, 590, 607, 610, 789, 790, 1068, 1069, 1092
Normalization constant, 37, 46, 53, 114, 568, 954, 957, 965
Normalized, 18, 24, 25, 29, 34, 35, 37, 40, 55, 70, 78, 109, 117, 123, 130, 134–136, 158, 369, 568, 753, 769, 779, 813, 885, 1026, 1031, 1045, 1079, 1209, 1211, 1290
Normalized eigenfunction, 37, 801
Normalized eigenstate, 55
Normalized eigenvector, 157
Normal operator, 578–581, 585, 604, 694, 1203, 1235
Normal-ordered product, 1005, 1016, 1095
Normal product, 1005, 1103, 1105
N-product, 1016, 1095–1103, 1111
n-th Root, 257–259
Nullity, 469
Null-space, 468, 529
Number density operator, 1002
Number operator, 41, 1003
Numerical vector, 459, 1146

O
O(3), 690, 918, 919, 921
O(3, 1), 1187–1190, 1250–1256, 1258
O(n), 902, 903, 906, 907, 1254
Oblique incidence
  of wave, 305, 315
Observable, 588, 590
Octants, 682
Odd function, 51, 134, 161, 1014
Odd permutation, 475, 904
Off-diagonal elements, 475, 488, 539, 584, 664, 666, 770, 771, 788, 815, 936, 1034, 1242, 1245
One-dimensional
  harmonic oscillator, 31, 59, 128, 132
  infinite potential well, 20
  position coordinate, 33
  representation, 772, 787, 828
  system, 130–134
One-parameter group, 896–905, 921
One-particle Hamiltonian, 960–964, 1031, 1077
One-particle wave function, 1031
Open ball, 227
Open neighborhood, 201
Open set, 190–193, 195, 196, 199, 201, 219
Operator, 3, 31, 63, 132, 155, 278, 359, 401, 482, 505, 549, 571, 636, 673, 710, 755, 827, 932, 989, 1077, 1141, 1274
Operator method, 33–41, 134
Optical device
  high-performance, 390
Optical path, 339, 340
Optical process, 127, 358
Optical resonator, 372, 373
Optical transition, 127–152, 155, 359, 743, 746, 749, 767, 780, 781, 783, 797, 802, 803, 824–826
Optical waveguide, 341
Optics, 305, 351, 377
Orbital energy, 761, 772, 818, 823
Order of group, 675, 722
Ordinate axis, 131
Organic materials, 375
Orthochronous Lorentz group
  proper, 1189, 1258
Orthogonal
  complement, 571–574, 714, 717, 1178, 1179, 1181, 1184, 1186, 1187
  coordinate, 202, 381, 1184, 1185
  decomposition, 1184
  group, 689–694, 902
  matrix, 569, 593, 594, 598, 616, 618, 641, 664, 690, 693, 698, 762, 851, 861, 862, 902, 907, 939, 1160, 1161, 1247, 1248, 1251–1253, 1256, 1263
  similarity transformation, 1160
  transformation, 662, 689, 695, 696, 912, 939, 1160, 1198
Orthonormal
  base, 8, 1182
  basis set, 63, 571, 576, 610, 746, 766, 877, 972
  basis vectors, 61, 63, 662, 677, 678, 742, 756, 772, 980
  eigenfunctions, 40
  eigenvectors, 157, 583
  system, 109, 155, 159, 873, 989, 1091, 1212
Orthonormalization, 567, 580, 742, 964, 965, 1165, 1179
Oscillating electromagnetic field, 129
Out-coupling
  of emission, 379, 380, 390
  of light, 377
Over damping, 442
Overlap integrals, 743, 766, 779, 780, 795, 804, 813, 816, 817

P
Pair annihilation, 1115
Pair creation, 1115
Paper-and-pencil, 824
Para, 794, 795
Parallelepiped, 367, 368, 372
Parallelogram, 878
Parallel translation, 286
Parameter space, 861–867, 923, 924, 1192
Parity, 19, 51, 132
Partial differential equations, 277
Partial differentiation, 9, 10, 291, 333, 1095
Partial fraction decomposition
  method of, 239, 1213
Partial sum, 163, 224
Particular solution, 173, 405
Path, 199, 200, 214–217, 219, 229, 339, 340, 370, 922, 1018, 1023
Pauli spin matrices, 839, 947, 971, 1204, 1259
Periodic conditions, 27, 416
Periodic function, 259, 914
Periodic modulation, 373, 374
Periodic wave, 287
Permeability, 151, 279, 316, 380, 1049
Permittivity, 60, 110, 278, 279, 316, 380, 395, 594
Permutation, 405, 475, 667, 689, 697, 703, 904, 1095, 1101, 1111
Perturbation
  calculation, 1007, 1017
  method, 36, 155–176, 183, 1075, 1078, 1102, 1103
  theory, 176, 1075
Perturbed
  state, 157, 168
Phase
  change, 329, 342
  difference, 295, 339, 340, 396
  factor, 18, 40, 76, 79, 86, 122, 135, 143, 293, 568, 957, 959, 967, 970, 1082, 1211
  matching, 379, 380, 386, 387, 394
  matching conditions, 380, 394
  refractive index, 376, 377, 384, 390
  shift, 328, 329, 331, 341, 342
  space, 1002
  velocity, 7, 8, 286, 288, 289, 313, 338, 343
Photoabsorption, 134
Photoemission, 134
Photon
  absorption, 134, 137, 140
  emission, 134, 137, 140
  energy, 368, 825, 1138
  field, 977, 999, 1005, 1011, 1023, 1026, 1027, 1049, 1063, 1071, 1072, 1084, 1096, 1098, 1102, 1110, 1117, 1129, 1130, 1133, 1136, 1137
Physical laws, 940
Physicochemical properties, 772, 780
Picture
  Heisenberg, 1087, 1089–1091
  interaction, 1085–1091
  Schrödinger, 1087, 1089–1091
π-Electron approximation, 773–805
Planar molecule, 677, 773, 783, 791
Planck, M., 3, 351, 358
Planck constant, 3, 4, 129
Planck's law of radiation, 351, 353–361
Plane of incidence, 311
Plane of mirror symmetry, 667, 671, 673
Plane wave
  electromagnetic, 290
  expansion, 1092
  solution, 334, 931, 948–955, 960, 965, 1209, 1227
Plasma etching, 395
Plastics, 295
Point group, 661, 664, 674, 678, 680, 681, 685, 772, 782, 806
Polar coordinate, 30, 47, 65, 149, 175, 202, 249, 1218
Polar coordinate representation, 149, 264, 867, 931, 1124, 1195, 1218
Polar decomposition
  of a matrix, 593, 1212, 1242, 1245, 1247, 1302
Polar form, 202, 203, 245, 257
Polarizability, 168–170, 175
Polarization
  elliptic, 277
  left-circular, 302
  longitudinal, 1059
  photon, 1106, 1124–1126
  right-circular, 302
  scalar, 1059
  transverse, 1059
  vacuum, 1109, 1110
  vector, 129, 136, 137, 140, 293, 313–316, 319, 326, 336, 343, 347, 363–365, 396, 767, 781, 824, 1059–1061
Pole
  isolated, 232, 239
  of order n, 231, 235, 1047
  simple, 231, 235, 237, 239, 243–245, 247, 266, 267, 271, 274, 1012, 1019, 1023
Polyene, 805
Polymer, 277, 288
Polynomial
  exponential, 309
  homogeneous, 983
  minimal, 499, 500, 502, 526, 527, 535, 540–544
Population inversion, 368
Portmanteau word, 196
Position coordinate, 33
Position operator, 29, 30, 57, 359, 366
Position vector, 8, 61, 129, 131, 136, 151, 169, 288, 289, 308, 361, 362, 364, 463, 564, 638, 662, 756, 761, 767, 774, 782, 802, 828, 933, 934, 959
Positive definite, 17, 27, 34, 36, 296, 297, 558, 559, 569, 571, 592, 593, 595, 707, 905, 937, 939, 965, 1141, 1158, 1165, 1177, 1178, 1197, 1242–1248, 1250–1252, 1256–1258, 1262, 1263, 1302
Positive definiteness, 296, 571, 592, 593, 1158
Positive-definite operator, 27, 1245
Positive-energy
  solution, 952, 955, 962, 1208, 1210, 1212
  state, 948, 949, 957, 1026, 1027, 1036, 1040, 1220, 1231
Positive helicity, 302
Positive semi-definite, 27, 558
Positron, 1036, 1039, 1082, 1092, 1096, 1098, 1104, 1108, 1110, 1112, 1113, 1115
Power series expansion, 110, 118, 120, 123, 124, 203, 204, 223, 608, 617
Poynting vector, 319–322, 347
Primitive function, 215
Principal axis, 381, 392, 393
Principal branch, 259, 263
Principal diagonal, 521, 523
Principal minor, 296, 477, 559, 595, 597
Principal part, 230
Principal submatrix, 382, 477
Principal value
  of branch, 259
  of integral, 243, 244
  of ln z, 267
Principle of constancy of light velocity, 934, 940, 1085
Probabilistic interpretation, 944, 991, 1031
Probability
  density, 128, 129, 131, 142, 931–933
  distribution, 128
  distribution density, 131
Projection operator
  sensu lato, 738, 1232, 1235, 1238, 1240
  sensu stricto, 741, 1214, 1233–1236, 1238, 1239
Proof by contradiction, 195
Propagating electromagnetic wave, 305
Propagation constant, 337, 338, 343, 376, 377, 379
Propagation vector, 379
Propagation velocity, 7, 286
Proper rotation, 666, 669
Proposition
  converse, 617, 854
P6T, 377, 378, 380, 381, 383–385, 390–392
Pure imaginary, 29, 326, 342, 442, 901, 944
Pure rotation group, 680, 685

Q
QED, see Quantum electrodynamics (QED)
Q-numbers, 995, 1026
Quadratic equation, 180, 297, 442, 551, 800
Quadratic form
  Hermitian, 54, 557, 558, 560, 592–599, 1165
  real symmetric, 571
Quantization
  canonical, 993, 1040, 1059, 1061–1065
  field, 986, 991, 993, 995, 1005, 1023–1028, 1035, 1039, 1040, 1063, 1187
Quantum chemical calculations, 737, 755
Quantum chemistry, 791, 1304
Quantum electrodynamics (QED), 399, 931, 971, 977, 991, 1049, 1075, 1080, 1084, 1092, 1096, 1103–1113, 1139, 1140, 1238
Quantum electromagnetism, 1061
Quantum-mechanical, 3, 21–27, 30, 59, 124, 155, 169, 605, 931, 959, 987, 1032, 1069
Quantum-mechanical harmonic oscillator, 31–58, 109, 111, 156, 164, 558, 1201
Quantum mechanics, 3, 6, 11, 12, 19, 28, 29, 31, 59, 60, 155–183, 351, 366, 401, 450, 549, 571, 699, 774, 903, 959, 977, 1075, 1080, 1304
Quantum numbers
  azimuthal, 122, 149
  magnetic, 143, 144
  orbital angular momentum, 110, 124, 144
  principal, 122–124
Quantum operator, 29, 1045
Quantum state, 21, 32, 54, 77, 93, 114, 123, 128, 129, 134, 144, 145, 148, 155–164, 167, 168, 171, 176, 366, 767, 878, 880, 957, 960, 967, 1026, 1086, 1294
Quantum theory of light, 368
Quartic equation, 148, 950
Quotient group, 653

R
Radial coordinate, 202, 834, 852, 867
Radial wave function, 109–124
Radiation field, 129, 139, 140, 1065
Raising and lowering operators, 93
Raising operator, 76, 93
Rank, 468, 469, 505, 522, 527, 530–532, 548, 950, 955, 958, 1174, 1175, 1219, 1238
Rapidity, 943
Rational number, 203
Rayleigh–Jeans law, 351, 357
Real analysis, 186, 222
Real axis, 18, 37, 185, 202, 242, 250, 254, 255, 263, 267, 269
Real domain, 205, 222, 234, 273, 592
Real form, 1268, 1289, 1293, 1298–1300
Real function, 51, 130, 143, 144, 205, 330, 414, 420, 423, 425, 426, 778, 783, 790, 933, 1082, 1083
Real number
  field, 1142, 1155, 1259
  line, 207, 418
Real space, 828, 834, 837
Real symmetric matrix, 592–594, 597, 1160, 1197, 1251
Real variable, 14, 25, 205, 212, 213, 225, 237, 411, 607
Rearrangement theorem, 649, 707, 721, 728, 730, 739
Rectangular parallelepiped, 367, 368, 372
Recurrence relation, 86
Redshift, 4, 388, 392
Redshifted, 389–392, 394
Reduced mass, 60, 61, 110, 135, 761
Reduced Planck constant, 4, 129
Reducible, 497, 712, 714, 725, 726, 728, 768, 772, 775, 780, 786, 792, 797, 854, 855, 860, 873, 876, 878, 895
Reduction, 583, 712
Reflectance, 321, 328, 349
Reflected light, 312
Reflected wave, 321, 324, 336, 339, 348
Reflection
  angle, 312
  coefficient, 317, 328, 336, 338, 349, 357
Refraction angle, 312, 319, 326
Refractive index
  of dielectric, 288, 305, 324, 325, 332, 337, 338
  group, 371
  of laser medium, 369, 371
  phase, 376, 377, 384, 390, 393
  relative, 313, 323, 329
Region, 151, 204, 210, 213–218, 221, 225–230, 232–234, 294, 295, 328, 330, 332, 343, 351, 363, 438, 873, 922, 1250
Regular hexagon, 791
Regular point, 212, 223
Regular singular point, 173
Regular tetrahedron, 687, 805
Relations between roots and coefficients, 487
Relative coordinates, 59–61
Relative permeability, 279
Relative permittivity, 279
Relative refractive index, 313, 323, 324, 329
Relativistic effect, 949
Relativistic field, 993, 1023, 1027, 1055, 1097
Relativistic quantum mechanics, 959
Relativity
  special principle of, 940
Renormalization, 1110, 1140
Representation
  adjoint, 905–915, 1261, 1273–1278, 1286
  antisymmetric, 750–753, 890–892, 895, 1297
  complex conjugate, 789, 846
  conjugate, 651
  coordinate, 3, 31, 45–53, 110, 114, 124, 129, 131, 134, 139, 149, 161, 167, 170, 172, 174, 175, 179, 181, 182, 264, 418, 819, 852, 867, 931, 934, 936, 954, 1023, 1049, 1086, 1124, 1184, 1193, 1195, 1198, 1199, 1201, 1237, 1245
  differential, 1264, 1271–1287, 1290, 1291
  dimension of, 706, 711, 713, 729, 746, 772, 860
  Dirac, 947
  direct-product, 746–751, 766, 767, 797, 824, 873–876, 880, 890–892, 894
  of groups, 705, 749
  Heisenberg, 1087
  irreducible, 712, 714, 717, 720–722, 724–726, 728, 729, 734, 735, 737, 738, 742–746, 766–768, 771–773, 775–777, 780, 782, 784, 786, 797–799, 801, 803, 806, 808, 810–812, 814, 815, 817, 823–825, 853–861, 868, 870, 871, 873, 876–878, 891, 1291–1293
  matrix, 3, 31, 34, 42–45, 89, 92, 93, 464, 465, 467, 475, 477, 480–482, 495, 517, 521, 530, 531, 545, 561–564, 577, 601, 603, 666, 670, 675–679, 685, 689, 717, 721, 756, 764, 772, 785, 828, 837, 856, 891, 894, 912, 961, 1141, 1150, 1196, 1261, 1291, 1292
  one-dimensional, 768, 772, 782, 787, 789, 810, 828, 1201, 1245
  reducible, 712, 725, 726, 728, 768, 772, 775, 786
  regular, 726–734
  Schrödinger, 1087
  space, 1018, 1049, 1073
  subduced, 776, 786, 792
  symmetric, 750–753, 760, 765, 767, 768, 771, 777, 780, 786, 797, 811, 824, 891
  theory, 690, 705–753, 755, 771, 827, 849, 874, 1281, 1294, 1297
  three-dimensional, 831, 849, 1290
  totally symmetric, 760, 765, 767, 768, 771, 777, 780, 786, 797, 811, 824
  two-dimensional, 753, 789
  unitary, 705, 706, 709, 713, 714, 741, 855, 860, 1192, 1263
Representative element, 1255
Residue, 234–237, 239, 240, 244, 267, 1014, 1047, 1048
Resolvent, 607, 621–626, 628–633, 635, 636, 641, 644
Resolvent matrix, 621–626, 628–633, 635, 636, 641, 644
Resonance integral, 778, 794
Resonance structures, 714, 715
Resonant structures, 784
Resonator, 350, 372, 373, 375
Riemann sheet, 260
Riemann surface, 256–274
Right-handed system, 293, 314, 319, 397
Rigid body, 690, 755
Rodrigues formula, 88, 95, 97, 102, 118, 273
Root
  complex, 243, 442, 443
  double, 257, 297, 442, 501, 503, 537, 545, 591, 716, 809, 968
  multiple, 258, 487, 500, 504, 505, 541, 543, 544, 548, 583
  simple, 529, 545
  square, 258, 297, 382, 818
  triple, 257, 501, 529–531, 786
Rotation
  angle, 664, 666, 667, 673, 693, 828, 861, 863–866, 868, 874–876, 919, 1204, 1222
  axis, 665–668, 673, 674, 683–685, 688, 690–694, 700–703, 861, 862, 864, 868, 869, 874, 919, 924, 925, 1302
  group, 680, 685, 689, 690, 827, 834–895, 920, 1141
  improper, 669, 673, 690, 807, 920
  matrix, 641, 664, 689–694, 849, 868, 919
  proper, 666, 669
  symmetry, 666
  transformation, 661
Row vector, 8, 405, 459, 491, 546, 554, 566, 567, 728, 860, 936, 937, 1030, 1146, 1174
Ruled surface, 1185, 1186

S
SALC, see Symmetry-adapted linear combination (SALC)
Scalar, 278, 284, 458, 459, 471, 499, 513, 552, 602, 977, 1017, 1028, 1030, 1059, 1069, 1079, 1099, 1100, 1102, 1132, 1133, 1197, 1203, 1207
Scalar field, 977, 991–1028, 1030, 1032, 1035, 1036, 1040, 1041, 1043, 1048–1051, 1058, 1061–1065, 1067, 1071, 1075, 1096, 1097, 1201
Scalar function, 286, 287, 755, 1005, 1017, 1056, 1058, 1200, 1202, 1203
Scalar potential, 1050, 1051
Scattering
  Bhabha, 1111
  cross-section, 1120–1124, 1136
  Møller, 1111
  process, 4, 5, 176, 1108, 1111, 1119
Schrödinger, E., 8
Schrödinger equation
  of motion, 60
  time-evolved, 128
Schur's First Lemma, 717, 720, 723, 810, 852, 869
Schur's lemmas, 717–723, 853
Schur's Second Lemma, 717, 719, 721, 734, 764, 854, 855, 860, 1265
Second-order differential operator, 411–416
Second-order extension, 1301
Second-order linear differential equation (SOLDE), 3, 14, 15, 33, 45, 72, 73, 85, 383, 401–406, 410–412, 417, 424, 447, 448, 451, 607, 618–620, 945, 986
Secular equation, 530, 755, 770–773, 777, 788, 794, 798, 799, 801, 812, 814, 815, 823
Selection rule, 127–152, 746
Self-adjoint, 411, 415, 416, 424, 432, 449, 450, 452, 555
Self-adjoint operator, 413, 422, 424, 449
Self-energy
  electron, 1109, 1139
  photon, 1109–1111, 1139
Sellmeier's dispersion formula, 371, 375, 376
Semicircle, 237–239, 243, 245, 246, 249, 250, 255, 1020–1022
Semiclassical, 127
Semiclassical theory, 150
Semiconductor
  crystal, 372, 373, 375, 376, 395
  organic, 372, 373, 375, 376, 395
  processing, 395
Semi-infinite dielectric media, 305, 306
Semi-simple, 512–515, 536, 539, 543, 546, 1230, 1235
Separation axiom, 200, 210
Separation condition, 200
Separation of variables, 12, 69–74, 110, 127, 353, 1086
Sesquilinear, 552
Set
  difference, 188
  finite, 188, 1083, 1086
  infinite, 188, 1083, 1086
  of points, 232
  theory, 185, 186, 202, 204, 274
Signature, 1161
Similarity transformation, 485, 487–489, 492, 502, 504, 511–513, 523, 525, 533, 535, 537, 540, 542, 556, 577, 580, 581, 583, 584, 587, 593, 603, 614, 627, 631, 635, 681, 694, 705, 707, 709, 712, 834, 851, 869, 895, 906, 919, 951, 952, 1087–1089, 1141, 1160, 1161, 1196, 1197, 1203, 1219, 1220, 1223, 1235, 1238, 1242, 1243, 1246
Simple harmonic motion, 32
Simply connected, 199, 200, 215–219, 221, 225, 228, 229, 234, 920–926, 1241, 1249, 1250, 1258, 1263
Simply reducible, 876, 878, 895
Simultaneous eigenfunction, 89
Singleton, 201
Single-valuedness, 212, 221, 259, 260
Singularity
  essential, 232
  removable, 230, 252
Singular part, 230, 232
Singular point
  isolated, 242
  regular, 173
Sinusoidal oscillation, 129, 131, 135
Skew-symmetric, 616, 618, 626, 902, 903, 907, 921, 1288, 1302
S-matrix
  element, 1104–1108, 1111, 1115, 1116, 1120
  expansion, 1091–1095, 1139
Snell's law, 313, 323, 325, 326, 343
SO0(3, 1), 1189, 1255, 1256, 1258, 1261, 1263, 1264, 1266, 1281, 1284–1287, 1298
Solid angle, 681, 684, 702, 1124
Solid-state chemistry, 395
Solid-state physics, 395
Space coordinates, 127, 934, 936, 1030
Space group, 663
Spacelike, 940, 1015, 1186
Spacelike separation, 1015
Spacelike vector, 1060, 1184, 1186
Space-time coordinates, 934–936, 946, 987, 998, 1039, 1082, 1102
Space-time point, 939, 1193
Span, 459–462, 468–470, 480, 495–498, 501, 517, 529, 539, 548, 572, 600, 602, 691, 693, 764, 771, 788, 791, 808–810, 823, 828, 846, 869, 873, 874, 886, 891, 915, 1143, 1157, 1181–1184, 1186, 1187, 1288, 1289
Special functions, 59, 109, 643
Special linear group, 1249–1258
Special orthogonal group
  SO(3), 661, 689–704, 827, 830, 834–874, 876, 878, 895, 896, 898, 904, 909, 911–915, 918, 919, 921, 923, 924, 926, 1241, 1265, 1266, 1276, 1286, 1287
  SO(n), 902, 903, 906, 907, 918
Special solution, 8
Special theory of relativity, 931, 934–944, 1256
Special unitary group
  SU(2), 827, 830, 834–880, 883, 896, 898, 904, 909–914, 923, 924, 926, 1207, 1241, 1242, 1249, 1250, 1258, 1262, 1265, 1266, 1276, 1286, 1287, 1294
  SU(n), 837, 902, 903, 906, 907
Spectral decomposition, 587–589, 604, 605, 694, 1232, 1233, 1235–1241
Spectral lines, 129, 279, 370, 755
Spherical basis, 834, 850
Spherical coordinate, 60, 62, 63, 834, 852, 978
Spherical surface harmonics, 89, 94–105, 109, 171, 846, 869, 873, 915
Spherical symmetry, 59, 60, 922
Spinor
  Dirac, 957, 958, 961–963, 1201, 1202, 1207–1212, 1214–1217, 1220, 1223, 1227, 1236, 1237, 1256, 1257
  space, 1187
Spin state, 123
Spring constant, 31
Square-integrable, 26
Square matrix, 457, 488, 490, 495, 497, 498, 501, 512, 513, 523, 608, 609, 614, 711, 718, 719, 723, 830, 858, 975, 1125, 1258
Standard deviation, 54, 57
Stark effect, 176
State vector, 129, 133, 135, 157, 1086, 1087, 1089, 1091
Stationary current, 283
Stationary wave, 343–350, 355, 396
Step function, 422
Stimulated absorption, 358, 359, 368
Stirling's interpolation formula, 612
Stokes' theorem, 306, 307
Strictly positive, 17
Structure constants, 903, 926, 1095, 1139, 1277, 1286
Structure-property relationship, 372
Sturm-Liouville system, 450
Subalgebra, 1268, 1269, 1290, 1293, 1298–1300
Subgroups, 647, 649–652, 656, 659, 675, 677, 685, 688–690, 732, 776, 784, 786, 791, 897, 906, 914, 917, 918, 921, 1252, 1255
Submatrix, 382, 476, 477, 583, 809, 891, 892, 951, 1213
Subset, 186–198, 200, 201, 204, 233, 459, 649–651, 655, 902, 915–918, 921, 1263
Subspace, 190, 199, 200, 204, 459–463, 468–470, 486, 495–501, 504, 505, 508, 514, 517, 528–531, 539, 540, 544, 548, 571–574, 583, 584, 602, 603, 649, 650, 665, 713, 714, 717, 808, 809, 902, 1142, 1145, 1178–1184, 1186, 1187, 1252, 1269, 1289, 1290, 1298–1300
Superior limit, 226
Superposed wave, 346, 347
Superposition
  of waves, 294–303, 336, 344–346, 348
Surface term, 409–411, 413, 416, 419, 424–426, 430, 432, 436–440, 446, 450, 621
Surjective, 472–474, 478, 653, 655, 657, 912, 924, 1266
Sylvester's law of inertia, 1161
Symmetric, 59, 60, 69, 133, 175, 179, 296, 360, 386, 422, 425, 571, 592–597, 616, 618, 626, 627, 630, 631, 663, 664, 688, 689, 750–753, 760, 765, 767, 768, 771, 777, 780, 786, 797, 811, 824, 825, 835, 864, 878, 890–892, 895, 898, 902, 903, 907, 921, 937, 943, 995, 1051, 1137, 1159–1161, 1170, 1192, 1197, 1221, 1247, 1248, 1251–1253, 1256–1258, 1262, 1288, 1296, 1297, 1302
Symmetry
  discrete, 1256
  groups, 648, 661–704, 761, 764, 782
  operations, 661, 663–669, 671, 673–680, 685–687, 710, 716, 717, 728, 737, 755, 759, 763–765, 771–773, 776, 777, 784, 787, 797, 806, 809, 810
  requirement, 179, 671, 673, 813, 815–817
  species, 685, 746, 772, 784, 785, 802, 824
  transformation, 774, 775, 786, 792, 798, 807, 808
Symmetry-adapted linear combination (SALC), 755, 771–773, 777–778, 784, 787, 788, 791, 792, 794, 798, 800, 801, 805, 810, 811, 813–816, 891, 892
Syn-phase, 347, 349
System of differential equations, 607–637, 643, 644
System of linear equations, 476, 478

T
Tangent space, 1187
Taylor's expansion, 223, 225, 229
Taylor's series, 99, 223–229, 231, 241, 245, 250, 272
Td, 673, 680–689, 806–808, 811, 825, 920
TE mode, see Transverse electric (TE) mode
Tensor
  contraction, 1177
  electric permittivity, 380
  equation, 1166
  invariant, 1175
  permittivity, 380, 395, 594
  product, 1149–1155, 1171–1173, 1175, 1176
Termwise integration, 224, 228
TE wave, see Transverse electric (TE) wave
θ function, 444
Thiophene, 674, 727
Thiophene/phenylene co-oligomers (TPCOs), 373, 375–377
Three-dimensional, 8, 59–61, 134–144, 168, 287, 288, 334, 354, 361, 460, 461, 477, 523, 529, 661, 664, 665, 677, 685, 689, 757, 759, 811, 828, 831, 849, 895, 958, 994, 1002, 1060, 1061, 1116, 1183–1186, 1276, 1290
Three-dimensional Cartesian coordinate, 8, 677
Time-averaged Poynting vector, 320–322
Time coordinate, 934–936, 939, 946, 948, 977, 987, 994, 998, 1030, 1039, 1082, 1102, 1185
Time dilation, 944
Timelike, 940, 1061, 1186
Timelike vector, 1060, 1084, 1186
Time-ordered product (T-product), 977, 1007, 1015–1017, 1043, 1094–1103, 1114, 1146
TM mode, see Transverse magnetic (TM) mode
TM wave, see Transverse magnetic (TM) wave
Topological group, 196, 827, 830, 926
Topological space, 185, 189–191, 194, 196, 199, 200, 204, 921, 926, 1142
Topology, 185–204
Torus, 199
Total internal reflection, 332, 338–343, 386
Totally symmetric, 760, 765, 767, 768, 771, 777, 780, 786, 797, 811, 824
Totally symmetric ground state, 797, 824
Total reflection, 305, 313, 325–331, 334, 342
TPCOs, see Thiophene/phenylene co-oligomers (TPCOs)
T-product, see Time-ordered product (T-product)
Trace, 255, 298, 301, 302, 487, 533, 592, 594, 680, 681, 694, 698, 700, 716, 720, 723–725, 729, 734, 735, 772, 774, 784–786, 792, 797, 852, 864, 869, 902–904, 906, 975, 1125, 1130–1132, 1134, 1135, 1197, 1219, 1259
Traceless, 839, 898, 902, 909, 1267
Transformation
  canonical, 1089
  charge-conjugation, 967, 968, 970
  contravariant, 1168
  coordinate, 300, 382, 661, 667, 703, 704, 764, 845, 868, 1167, 1169, 1177, 1187
  discrete, 967
  equivalence, 593, 594, 1161, 1162
  group, 648, 737, 837, 1083
  linear, 457, 463–473, 475–477, 479–482, 484, 485, 495, 497, 498, 501, 508, 511, 517, 521, 528, 533, 545, 546, 552, 558, 561–563, 565, 578, 579, 590, 648, 655, 656, 662, 695, 703, 704, 709, 710, 734, 832, 855, 907, 908, 938, 940, 1260, 1272
  local phase, 1082–1085
  matrix, 468, 477, 479, 480, 485, 565, 693, 836, 1188, 1198, 1207, 1248, 1256–1258, 1302
  particle-antiparticle, 1039
  similarity, 485, 487–489, 492, 502, 504, 511–513, 523, 525, 533, 535, 537, 540, 542, 556, 577, 580, 581, 583, 584, 587, 593, 603, 614, 627, 631, 635, 681, 694, 705, 707, 709, 712, 834, 851, 869, 895, 906, 919, 951, 952, 1087–1089, 1141, 1160, 1161, 1196, 1197, 1203, 1219, 1220, 1223, 1235, 1238, 1242, 1243, 1246
  successive, 484, 695, 696, 847, 863, 1193, 1194, 1221–1223, 1304
Transition dipole moment, 129, 134
Transition electric dipole moment, 129
Transition matrix element, 133, 766, 781, 797, 802, 803
Transition probability, 129, 135, 141, 176, 359, 767, 804, 1120
Translation symmetry, 663
Transmission
  coefficient, 317, 318
  of electromagnetic wave, 305–350
  irradiance, 322
  of light, 308
Transmittance, 321
Transpose
  complex conjugate, 22
Transposition, 550, 596, 621, 740, 881, 938, 1030, 1081, 1251
Transverse electric (TE) mode, 336, 338, 341
Transverse electric (TE) wave, 305, 313–319, 327, 329, 331–338, 341
Transverse magnetic (TM) mode, 341, 383, 386, 390
Transverse magnetic (TM) wave, 305, 313–319, 321, 323–325, 327, 329, 331–338, 341
Transverse mode, 349, 375, 377, 386, 392, 1059, 1106, 1117, 1133
Transverse wave, 290, 291, 365
Trial function, 178, 179
Triangle inequality, 551
Triangle matrix
  lower, 488, 539, 577, 632
  upper, 488, 490, 502, 512, 523, 577, 632
Trigonometric formula, 104, 132, 320, 323, 324, 342, 388, 872, 914
Trigonometric function, 143, 248, 665, 843, 866
Triple integral, 992, 1065
Triply degenerate, 817, 824
Trivial, 21, 26, 199, 309, 403, 421, 423, 432, 449, 458, 462, 479, 486, 502, 507, 548, 579, 591, 595, 633, 634, 648, 690, 693, 767, 831, 849, 854, 892, 898, 1104, 1158, 1201, 1210, 1233, 1234, 1264, 1301
Trivial solution, 421, 423, 432, 449, 479, 486, 548
Trivial topology, 199
T1-space, 199–202, 926
Two-level atoms, 351, 358–361, 366, 367, 396

U
U(1), 1082–1085
Ultraviolet catastrophe, 351, 358
Ultraviolet light, 377
Unbounded, 25, 29, 223, 1192, 1263
Uncertainty principle, 29, 53–58
Uncountable, 647, 989
Underlying set, 190
Undetermined constant, 76, 78, 86
Uniformly convergent, 224, 225, 227–229
Uniqueness
  of decomposition, 513
  of Green's function, 424
  of representation, 460, 465, 467, 658, 710
  of solution, 408, 430
Unique solution, 438, 441, 447, 472, 476, 941
Unitarity, 709, 739, 741, 745, 766, 837, 846, 848, 852, 882, 900, 951, 972, 974, 1230
Unitary
  diagonalization, 580–588, 592, 598
  group, 827, 837, 844, 901–903, 906
  matrix, 138, 299, 556, 559, 560, 566, 571, 580–583, 585, 590, 591, 597, 602–604, 616, 618, 627, 631, 694, 705, 707, 713, 716, 725, 772, 789, 796, 801, 809, 832, 833, 843, 849–851, 877, 881, 885, 894, 895, 898, 906, 907, 1192, 1218, 1239, 1242, 1243, 1245–1247, 1256, 1258, 1297
  operator, 571–605, 834, 881, 891, 900, 901, 974, 1087, 1221, 1230
  representation, 705–707, 709, 713, 714, 741, 855, 860, 1192, 1263
  transformation, 138, 142, 566, 590, 766, 789, 790, 801, 840, 880, 882, 883, 885, 891, 895, 1086, 1088, 1196, 1207
Unitary similarity transformation, 577, 580, 583, 584, 587, 603, 627, 635, 694, 834, 851, 895, 906, 919, 951, 1087–1089, 1197, 1203, 1242, 1243, 1246
Unit polarization vector
  of electric field, 129, 293, 313, 363, 364, 781, 824
  of magnetic field, 314, 365
Unit set, 201
Universal covering group, 926, 1241, 1242, 1249
Universal set, 186–190
Unknowns, 157, 293, 316, 386, 622, 624, 626
Unperturbed
  state, 157, 162
  system, 156, 157
Upper right off-diagonal elements, 475, 488

V
Vacuum expectation value, 1016, 1017, 1043, 1098–1100
Variable transformation, 48, 80, 142, 241
Variance, 53–58
Variance operator, 53
Variational method, 155, 176–183
Variational principle, 36, 112
Vector
  analysis, 284, 291, 1050
  field, 280, 281, 306, 1050, 1084, 1165
  potential, 1050, 1051, 1055, 1077–1079, 1129
  singular, 1178, 1184–1187
  transformation, 465, 466, 476, 480, 514, 661, 704, 831, 1170, 1171
  unit, 4, 131, 132, 281, 288–290, 306–308, 313–315, 336, 339, 340, 365, 460, 461, 564, 605, 638, 831
Vector space
  complex, 138, 1268, 1301
  finite-dimensional, 473
  linear, 457, 465, 469, 471, 473, 485, 544, 549, 583, 600, 609, 610, 649, 650, 653, 717, 903, 1141, 1142
  n-dimensional, 457, 458, 461, 515, 936, 1156
  theory of, 552, 1187, 1304
Venn diagram, 186, 187, 189, 191
Vertex, 688, 703, 1108, 1185, 1186
Vertical incidence, 313, 314, 323
Virtual displacement, 979, 980
Virtual work
  infinitesimal, 980

W
Wave equation, 8, 277, 287, 293, 334, 336, 353, 354, 932
Wave front, 340
Waveguide, 305, 331–343, 349, 372, 375, 377, 380, 386, 390–394
Wavelength, 4, 6, 7, 152, 312, 313, 321, 346, 348–350, 370–376, 388–395, 638
Wavelength dispersion, 371, 372, 374–376, 392, 393
Wavenumber, 4, 7, 287, 337, 338, 340, 370, 379, 1002
Wavenumber vector, 4, 5, 287, 307, 308, 311, 326, 332, 334, 368, 379, 391, 1059
Wavevector, 378, 379, 388, 391, 393
Wave zone, 363–365
Weight function, 51, 406, 413, 415, 417, 428, 432, 445, 450, 620
Wick's theorem, 1102
Wigner formula, 840–843
World interval, 939–944, 1015, 1187
Wronskian, 15, 308, 404, 405, 432, 434

X
X-ray, 4–6

Y
Yukawa potential, 109

Z
Zenithal angle, 72, 700, 861, 869, 924, 1194, 1196
Zero
  element, 1283, 1284
  matrix, 28, 495, 497, 501, 502, 514–516, 539, 586, 612, 613, 921, 947, 1204
  vector, 24, 278, 458, 460, 486, 495, 568, 585, 650, 899, 921, 1178
Zero-point energy, 1003
Zeros, 19, 229–232, 521