MATHEMATICAL
MONOGRAPHS
Volume 191
Methods of
Information Geometry
Shun-ichi Amari
Hiroshi Nagaoka
Selected Titles in This Series
191 Shun-ichi Amari and Hiroshi Nagaoka, Methods of information geometry, 2000
190 Alexander N. Starkov, Dynamical systems on homogeneous spaces, 2000
189 Mitsuru Ikawa, Hyperbolic partial differential equations and wave phenomena, 2000
188 V. V. Buldygin and Yu. V. Kozachenko, Metric characterization of random variables
and random processes, 2000
187 A. V. Fursikov, Optimal control of distributed systems. Theory and applications, 2000
186 Kazuya Kato, Nobushige Kurokawa, and Takeshi Saito, Number theory 1:
Fermat’s dream, 2000
185 Kenji Ueno, Algebraic Geometry 1: From algebraic varieties to schemes, 1999
184 A. V. Melnikov, Financial markets, 1999
183 Hajime Sato, Algebraic topology: an intuitive approach, 1999
182 I. S. Krasilshchik and A. M. Vinogradov, Editors, Symmetries and conservation
laws for differential equations of mathematical physics, 1999
181 Ya. G. Berkovich and E. M. Zhmud, Characters of finite groups. Part 2, 1999
180 A. A. Milyutin and N. P. Osmolovskii, Calculus of variations and optimal control,
1998
179 V. E. Voskresenskiı̆, Algebraic groups and their birational invariants, 1998
178 Mitsuo Morimoto, Analytic functionals on the sphere, 1998
177 Satoru Igari, Real analysis—with an introduction to wavelet theory, 1998
176 L. M. Lerman and Ya. L. Umanskiy, Four-dimensional integrable Hamiltonian
systems with simple singular points (topological aspects), 1998
175 S. K. Godunov, Modern aspects of linear algebra, 1998
174 Ya-Zhe Chen and Lan-Cheng Wu, Second order elliptic equations and elliptic
systems, 1998
173 Yu. A. Davydov, M. A. Lifshits, and N. V. Smorodina, Local properties of
distributions of stochastic functionals, 1998
172 Ya. G. Berkovich and E. M. Zhmud, Characters of finite groups. Part 1, 1998
171 E. M. Landis, Second order equations of elliptic and parabolic type, 1998
170 Viktor Prasolov and Yuri Solovyev, Elliptic functions and elliptic integrals, 1997
169 S. K. Godunov, Ordinary differential equations with constant coefficient, 1997
168 Junjiro Noguchi, Introduction to complex analysis, 1998
167 Masaya Yamaguti, Masayoshi Hata, and Jun Kigami, Mathematics of fractals, 1997
166 Kenji Ueno, An introduction to algebraic geometry, 1997
165 V. V. Ishkhanov, B. B. Lure, and D. K. Faddeev, The embedding problem in
Galois theory, 1997
164 E. I. Gordon, Nonstandard methods in commutative harmonic analysis, 1997
163 A. Ya. Dorogovtsev, D. S. Silvestrov, A. V. Skorokhod, and M. I. Yadrenko,
Probability theory: Collection of problems, 1997
162 M. V. Boldin, G. I. Simonova, and Yu. N. Tyurin, Sign-based methods in linear
statistical models, 1997
161 Michael Blank, Discreteness and continuity in problems of chaotic dynamics, 1997
160 V. G. Osmolovskiı̆, Linear and nonlinear perturbations of the operator div, 1997
159 S. Ya. Khavinson, Best approximation by linear superpositions (approximate
nomography), 1997
158 Hideki Omori, Infinite-dimensional Lie groups, 1997
157 V. B. Kolmanovskiı̆ and L. E. Shaı̆khet, Control of systems with aftereffect, 1996
156 V. N. Shevchenko, Qualitative topics in integer linear programming, 1997
10.1090/mmono/191
Translations of
MATHEMATICAL
MONOGRAPHS
Volume 191
Methods of
Information Geometry
Shun-ichi Amari
Hiroshi Nagaoka
Translated by
Daishi Harada
Editorial Board
Shoshichi Kobayashi (Chair)
Masamichi Takesaki
2000 Mathematics Subject Classification. Primary 00A69, 53–02, 53B05, 53A15, 62–02,
62F05, 62F12, 93C05, 81Q70, 94A15.
Copying and reprinting. Individual readers of this publication, and nonprofit libraries
acting for them, are permitted to make fair use of the material, such as to copy a chapter for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for such
permission should be addressed to the Acquisitions Department, American Mathematical Society,
201 Charles Street, Providence, Rhode Island 02904-2294, USA. Requests can also be made by
e-mail to reprint-permission@ams.org.
© 2000 by the American Mathematical Society. All rights reserved.
Reprinted by the American Mathematical Society, 2007.
The American Mathematical Society retains all rights
except those granted to the United States Government.
Printed in the United States of America.
∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at http://www.ams.org/
10 9 8 7 6 5 4 3 2 1 12 11 10 09 08 07
Contents
Preface
3 Dual connections
3.1 Duality of connections
3.2 Divergences: general contrast functions
3.3 Dually flat spaces
3.4 Canonical divergence
3.5 The dualistic structure of exponential families
3.6 The dualistic structure of α-affine manifolds and α-families
3.7 Mutually dual foliations
3.8 A further look at the triangular relation
Bibliography
Index
Preface
Preface to the English Edition
of wide areas of applications. Although we could not give detailed and
comprehensive explanations of many topics, we hope that readers will get a
sense of the field's flavor and vitality from the description. We would be
pleased if the book played a key role in the further development of
information geometry.
Year 2000
Shun-ichi Amari
Hiroshi Nagaoka
Guide to the Bibliography
Sections 2.6, 3.1, 3.3, 3.4 and 3.6 of the present book are mostly based on
this paper. Although these results are now well known through [9] and later
publications, the paper itself was never published in a major journal. The
editors or reviewers of some major journals were reluctant to accept new ideas
of combining statistics and differential geometry. Theorems 3.18 and 3.19 were
shown in the master's thesis [155] of Nagaoka, but were not dealt with in [161]
or [9] because the authors could find little significance in replacing
the definitions of expectation and variance with their α-versions. The attempt
made by Curado and Tsallis [74] and Tsallis et al. [208] has encouraged the
authors to revive the results.
The dualistic geometry of general divergences (contrast functions) was
developed by Eguchi [84, 85, 87], to which §3.2 is essentially indebted. The yoke
geometry of Barndorff-Nielsen [39, 40, 41] is a natural generalization of the
divergence geometry. Important examples of divergences on statistical models are
given by the f-divergences, which were introduced by Csiszár [70]. He also made
several pioneering contributions related to information geometry, such as [71, 72, 73].
A comprehensive study of the f-divergences is found in Vajda [214].
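As a small illustration of the f-divergences mentioned above, the following sketch evaluates one on finite distributions. It assumes the common convention D_f(p‖q) = Σ_i q_i f(p_i / q_i) with f convex and f(1) = 0; the book's own definition may order the arguments differently, and the function names and the sample distributions here are hypothetical, chosen only for illustration.

```python
import math

def f_divergence(p, q, f):
    """Return sum_i q_i * f(p_i / q_i) for strictly positive p and q.

    This argument order is an assumed convention, not taken from the book.
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_generator(t):
    # f(t) = t log t is convex with f(1) = 0; under the convention above
    # it generates the Kullback-Leibler divergence.
    return t * math.log(t)

def kl_divergence(p, q):
    """Direct formula sum_i p_i log(p_i / q_i), used as a cross-check."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Illustrative distributions on three points (hypothetical values).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

d_f = f_divergence(p, q, kl_generator)
d_kl = kl_divergence(p, q)
```

With this choice of f, the two computations agree up to rounding, and the divergence of a distribution from itself is zero. Other convex generators can be passed as `f` in the same way.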
There are slightly different or more general frameworks for information
geometry. Barndorff-Nielsen [38, 41] used the observed Fisher information and the
observed α-connections instead of our definitions, which he called the expected
Fisher information and the expected α-connections, to construct the observed
geometry. Barndorff-Nielsen and Jupp [47] studied a symplectic structure of a
dualistic manifold. Critchley et al. [69] proposed the preferred point geometry,
and Zhu and Rohwer [229, 230] investigated a Bayesian information geometry.
See also Burbea and Rao [61], which treats a situation where the entropy H(p)
in Equation (3.57) is replaced with a more general form of functionals.
Statistics has constantly been the primary field for applications of
information geometry. Sections 4.1–4.6 are mostly based on Amari [5, 6, 7, 9], while
the source of §4.7 is Kumon and Amari [132] and Amari [9]. The idea of
local exponential family bundles described in §4.8.1 was discussed by Amari [11]
and further pursued by Barndorff-Nielsen and Jupp [46]. The subject of §4.8.2,
the theory of estimating functions for semi-parametric and non-parametric
models using dual connections on fiber bundles, was developed by Amari and
Kumon [29] and Amari and Kawanabe [27, 28]. This theory was applied to ICA
(Independent Component Analysis) by Amari and Cardoso [24], Amari [20, 21]
and Kawanabe and Murata [120].
Besides these, there are many works related to statistical inference and
differential geometry. The book [9] contains a comprehensive list of references,
which would be helpful for those interested in the applications to statistics. In
addition, many new developments were made after the publication of this book,
among which are the following. As mentioned in §4.5, the higher-order
asymptotics of statistical estimation and prediction were further discussed by Eguchi
and Yanagimoto [88], Komaki [123] and Kano [116]. The relationship between
sequential estimation and conformal geometry was established by Okamoto et al.
[175]. Geometrical aspects of the EM algorithm (Dempster et al. [79]) were
studied by Csiszár and Tusnády [73], Amari [16] and Byrne [63], while Matsuyama
[...]
[42, 43]. In §8.3 we have briefly touched upon the dualistic structure of such
transformation models, although the subject has not yet been fully investigated.
The notion of dual connections naturally arises in the framework of affine
differential geometry as seen in §8.4. Mathematical studies on this subject are
found in Nomizu and Pinkall [166], Nomizu and Simon [168], Nomizu and Sasaki
[167], Dillen et al. [80], Kurose [133, 134, 135], Abe [1], Lauritzen [138],
Matsuzoe [149], Noguchi [165], Li et al. [141] and Uohashi et al. [213]. On the other
hand, Shima [198, 199, 200] studied Hessian manifolds, that is, Riemannian
manifolds whose metric is given by the second derivatives of a convex potential
function ψ(θ) in a local coordinate system as in Equation (3.34). This is
essentially equivalent to the notion of dually flat space. He investigated
manifolds of this kind, including their global structure, from a purely
mathematical point of view. Infinite-dimensional statistical manifolds were
studied by Kambayashi [115], von Friedrich [215] and Lafferty [136], and a
mathematical foundation was given to this problem in the framework of Orlicz
space by Pistone and his group in [97, 188, 189].
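To make the Hessian construction above concrete, the following sketch computes the matrix of second derivatives of a convex potential numerically and compares it with a closed form. The choice of potential is an assumption made only for illustration: ψ(θ) = log(1 + Σ_i exp θ_i), the log-partition function of a categorical family in its natural parameters, for which the second derivatives are p_i δ_ij − p_i p_j with p_i = exp(θ_i) / (1 + Σ_k exp θ_k). The helper names and the sample point are hypothetical.

```python
import math

def psi(theta):
    """Assumed example potential: log-partition function of a categorical
    family with len(theta) + 1 outcomes, in natural parameters."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def hessian(func, theta, h=1e-4):
    """Central finite-difference approximation of the second derivatives."""
    n = len(theta)

    def shifted(i, di, j, dj):
        x = list(theta)
        x[i] += di
        x[j] += dj
        return func(x)

    return [[(shifted(i, h, j, h) - shifted(i, h, j, -h)
              - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4.0 * h * h)
             for j in range(n)]
            for i in range(n)]

def closed_form_metric(theta):
    """Closed form p_i * delta_ij - p_i * p_j for the potential above."""
    z = 1.0 + sum(math.exp(t) for t in theta)
    p = [math.exp(t) / z for t in theta]
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]

# Hypothetical point in the natural parameter space.
theta = [0.2, -0.5]
g_numeric = hessian(psi, theta)
g_exact = closed_form_metric(theta)
```

For this potential the resulting matrix is symmetric and positive definite, so it defines a Riemannian metric at the chosen point. Finite differences are used here only to keep the sketch dependency-free; an automatic differentiation library would serve equally well.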
Bibliography
[2] D.H. Ackley, G.E. Hinton, and T.J. Sejnowski. A learning algorithm for
Boltzmann machines. Cognitive Science, 9:147–169, 1985.
[19] S. Amari. Natural gradient for over- and under-complete bases in ICA.
Neural Computation, 11:1875–1883, 1999.
[23] S. Amari, O.E. Barndorff-Nielsen, R.E. Kass, S.L. Lauritzen, and C.R.
Rao. Differential Geometry in Statistical Inference. IMS Lecture Notes:
Monograph Series 10. Institute of Mathematical Statistics, Hayward,
California, 1987.
[26] S. Amari and T.S. Han. Statistical inference under multi-terminal rate
restrictions—a differential geometrical approach. IEEE Transactions on
Information Theory, IT-35:217–227, 1989.
[34] C. Atkinson and A.F. Mitchell. Rao’s distance measure. Sankhyā: The
Indian Journal of Statistics, Ser. A, 43:345–365, 1981.
[35] K.S. Azoury and M.K. Warmuth. Relative loss bounds for on-line density
estimation with the exponential family of distributions. submitted.
[44] O.E. Barndorff-Nielsen, D.R. Cox, and N. Reid. The role of differential
geometry in statistical theory. International Statistical Review, 54:83–96,
1986.
[47] O.E. Barndorff-Nielsen and P.E. Jupp. Yokes and symplectic structures.
Journal of Statistical Planning and Inference, 63:133–146, 1997.
[51] S.L. Braunstein and C.M. Caves. Statistical distance and the geometry of
quantum states. Physical Review Letters, 72:3439–3442, 1994.
[52] S.L. Braunstein and C.M. Caves. Geometry of quantum states. In V.P.
Belavkin, O. Hirota, and R.L. Hudson, editors, Quantum Communications
and Measurement, pages 21–30. Plenum Press, New York, 1995.
[53] L.M. Bregman. The relaxation method of finding the common point of
convex sets and its application to the solution of problems in convex
programming. USSR Computational Mathematics and Mathematical Physics,
7:200–217, 1967.
[56] R.W. Brockett. Some geometric questions in the theory of linear systems.
IEEE Transactions on Automatic Control, 21:449–455, 1976.
[57] R.W. Brockett. Dynamical systems that sort lists, diagonalize matrices
and solve linear programming problems. In Proceedings of 27th IEEE
Conference on Decision and Control, pages 799–. IEEE, 1988.
[58] R.W. Brockett. Least squares matching problems. Linear Algebra and Its
Applications, 122–124:761–777, 1989.
[59] D.C. Brody and L.P. Hughston. The Geometry of Statistical Physics.
Imperial College Press/ World Scientific Pub. to appear.
[61] J. Burbea and C.R. Rao. Entropy differential metric, distance and
divergence measures in probability spaces: A unified approach. Journal of
Multivariate Analysis, 12:575–596, 1982.
[62] J.P. Burg. Maximum entropy spectral analysis. In Proceedings of the 37th
Meeting of the Society of Exploration Geophysicists, 1967. Also in Modern
Spectrum Analysis.
[64] L.L. Campbell. The relation between information theory and the
differential geometric approach to statistics. Information Sciences, 35:199–210,
1985.
[65] N.N. Chentsov (Čencov). Statistical Decision Rules and Optimal
Inference. American Mathematical Society, Rhode Island, U.S.A., 1982.
(Originally published in Russian, Nauka, Moscow, 1972).
[67] B.S. Clarke and A.R. Barron. Information theoretic asymptotics of Bayes
methods. IEEE Transactions on Information Theory, 36:453–471, 1990.
[76] A.P. Dawid. Discussion to Efron’s paper. The Annals of Statistics,
3:1231–1234, 1975.
[77] A.P. Dawid. Further comments on a paper by Bradley Efron. The Annals
of Statistics, 5:1249, 1977.
[79] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from
incomplete data via the EM algorithm. Journal of the Royal Statistical
Society, B39:1–38, 1977.
[82] A. Edelman, T. Arias, and S.T. Smith. The geometry of algorithms with
orthogonality constraints. SIAM Journal on Matrix Analysis and
Applications, 20:303–353, 1998.
[100] T.S. Han and S. Amari. Statistical inference under multiterminal data
compression. IEEE Transactions on Information Theory,
44(6):2300–2324, 1998.
[109] F. Hiai and D. Petz. The proper formula for relative entropy and its
asymptotics in quantum probability. Communications in Mathematical
Physics, 143:99–114, 1991.
[114] A.T. James. The variance information manifold and the function on it. In
P.K. Krishnaiah, editor, Multivariate Analysis, pages 157–169. Academic
Press, 1973.
[115] T. Kambayashi. Statistical manifold of infinite dimension. Transactions of
the Japan Society for Industrial and Applied Mathematics, 4(3):211–228,
1994. (in Japanese).
[116] Y. Kano. Third-order efficiency implies fourth-order efficiency. Journal
of the Japan Statistical Society, 26:101–117, 1996.
[117] R.E. Kass. Canonical parameterization and zero parameter effect
curvature. Journal of the Royal Statistical Society, B46:86–92, 1984.
[118] R.E. Kass. The geometry of asymptotic inference (with discussions).
Statistical Science, 4:188–234, 1989.
[119] R.E. Kass and P.W. Vos. Geometrical Foundations of Asymptotic
Inference. John Wiley, New York, 1997.
[120] K. Kawanabe and N. Murata. Semiparametric estimation in ICA and
information geometry. in preparation.
[121] J. Kivinen and M.K. Warmuth. Exponentiated gradient versus gradient
descent for linear predictors. Information and Computation, 132:1–63,
1997.
[122] S. Kobayashi and K. Nomizu. Foundations of Differential Geometry, I,
II. Interscience, New York, 1963, 1969.
[123] F. Komaki. On asymptotic properties of predictive distributions.
Biometrika, 83(2):299–313, 1996.
[124] F. Komaki. An estimating method for parametric spectral densities of
Gaussian time series. Journal of Time Series Analysis, 20:31–50, 1999.
[125] F. Komaki. Least informative prior for prediction. in preparation.
[126] K. Kraus. States, Effects, and Operations. Lecture Notes in Physics 190.
Springer-Verlag, 1983.
[127] R. Kubo, M. Toda, and N. Hashitsume. Statistical Physics II,
Nonequilibrium Statistical Mechanics. Springer Series in Solid State Sciences.
Springer-Verlag, 1991.
[128] R. Kulhavy. A Bayes-closed approximation of recursive non-linear
estimation. International Journal of Adaptive Control and Signal Processing,
4:271–285, 1990.
[129] R. Kulhavy. Recursive nonlinear estimation: A geometric view of recursive
estimation provides a deeper insight into the problem of its approximation
under memory limitation. Automatica, 26(3):545–555, 1990.
[164] Y. Nakamura. A Tau-function for the finite Toda molecule and
information spaces. Contemporary Mathematics, 179:205–211, 1994.
[169] T. Ogawa and H. Nagaoka. Strong converse and Stein’s lemma in quantum
hypothesis testing. IEEE Transactions on Information Theory. to appear.
[174] M. Ohya and D. Petz. Quantum entropy and its use. Springer-Verlag,
1993.
[176] J.M. Oller. Information metric for extreme value and logistic probability
distributions. Sankhyā: The Indian Journal of Statistics, Ser. A,
49:17–23, 1987.
[177] J.M. Oller and J.M. Corcuera. Intrinsic analysis of statistical estimation.
The Annals of Statistics, 23:1562–1581, 1995.
[178] J.M. Oller and C.M. Cuadras. Rao’s distance for negative multinomial
distributions. Sankhyā: The Indian Journal of Statistics, Ser. A,
47:75–83, 1985.
[184] D. Petz. Monotone metrics on matrix spaces. Linear Algebra and Its
Applications, 244:81–96, 1996.
[186] D. Petz and G. Toth. The Bogoliubov inner product in quantum statistics.
Letters in Mathematical Physics, 27:205–216, 1993.
[187] D.B. Picard. Statistical morphisms and related invariance properties.
Annals of the Institute of Statistical Mathematics, 44(1):45–61, 1992.
[188] G. Pistone and M.P. Rogantin. The exponential statistical manifold: Mean
parameters, orthogonality, and space transformation. Bernoulli,
5:721–760, 1999.
[190] C.R. Rao. Information and accuracy attainable in the estimation of
statistical parameters. Bulletin of the Calcutta Mathematical Society, 37:81–91,
1945.
[191] M. Rattray and D. Saad. Analysis of natural gradient descent for multi-
layer neural networks. Physical Review E, 59(4):4523–4532, 1999.
[192] M. Rattray, D. Saad, and S. Amari. Natural gradient descent for on-line
learning. Physical Review Letters, 81:5461–5464, 1998.
[208] C. Tsallis, R.S. Mendes, and A.R. Plastino. The role of constraints within
generalized nonextensive statistics. Physica A, 261:534–554, 1998.
[216] P.W. Vos. Fundamental equations for statistical submanifolds with
applications to the Bartlett correction. Annals of the Institute of Statistical
Mathematics, 41(3):429–450, 1989.