Foundations of Info-Metrics
MODELING, INFERENCE,
AND IMPERFECT INFORMATION

Amos Golan

Oxford University Press is a department of the University of Oxford. It furthers
the University’s objective of excellence in research, scholarship, and education
by publishing worldwide. Oxford is a registered trade mark of Oxford University
Press in the UK and certain other countries.

Published in the United States of America by Oxford University Press


198 Madison Avenue, New York, NY 10016, United States of America.

© Oxford University Press 2018

All rights reserved. No part of this publication may be reproduced, stored in


a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by license, or under terms agreed with the appropriate reproduction
rights organization. Inquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above.

You must not circulate this work in any other form


and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data


Names: Golan, Amos.
Title: Foundations of info-metrics: modeling, inference, and imperfect
information / Amos Golan.
Other titles: Foundations of info-metrics
Description: New York, NY: Oxford University Press, [2018] |
Includes bibliographical references and index.
Identifiers: LCCN 2016052820 | ISBN 9780199349524 (hardback: alk. paper) |
ISBN 9780199349531 (pbk.: alk. paper) | ISBN 9780199349548 (updf) |
ISBN 9780199349555 (ebook)
Subjects: LCSH: Measurement uncertainty (Statistics) | Inference. | Mathematical statistics.
‘Information Measurement’ and ‘Mathematical Modeling.’
Classification: LCC T50.G64 2017 |
DDC 519.5/4—dc23
LC record available at https://lccn.loc.gov/2016052820

9 8 7 6 5 4 3 2 1
Paperback printed by WebCom, Inc., Canada
Hardback printed by Bridgeport National Bindery, Inc., United States of America
To my grandparents, Sara and Jacob Spiegel and Dora and
Benjamin Katz, and my children, Maureen and Ben
CONTENTS

List of Figures xiii


List of Tables xv
List of Boxes xvii
Acknowledgments xix

1. Introduction 1
The Problem and Objectives 1
Outline of the Book 3

2. Rational Inference: A Constrained Optimization Framework 10


Inference Under Limited Information 11
Qualitative Arguments for Rational Inference 11
Probability Distributions: The Object of Interest 12
Constrained Optimization: A Preliminary Formulation 15
The Basic Questions 18
Motivating Axioms for Inference Under Limited Information 19
Axioms Set A: Defined on the Decision Function 20
Axioms Set B: Defined on the Inference Itself 20
Axioms Set C: Defined on the Inference Itself 21
Axioms Set D: Symmetry 22
Inference for Repeated Experiments 22
Axioms Versus Properties 24

3. The Metrics of Info-Metrics 32


Information, Probabilities, and Entropy 32
Information Fundamentals 32
Information and Probabilities 37
Information and Entropy 39
Information Gain and Multiple Information Sources 43
Basic Relationships 43
Entropy and the Grouping Property 44
Relative Entropy 46
Mutual Information 47
Axioms and Properties 49
Shannon’s Axioms 49
Properties 49

4. Entropy Maximization 59
Formulation and Solution: The Basic Framework 60
Information, Model, and Solution: The Linear Constraints Case 60
Model Specification 60
The Method of Lagrange Multipliers: A Simple Derivation 61
Information, Model, and Solution: The Generalized Constraints Case 68
Basic Properties of the Maximal Entropy Distribution 71
Discussion 72
Uniformity, Uncertainty, and the Solution 72
Conjugate Variables 74
Lagrange Multipliers and Information 76
The Concentrated Framework 79
Examples in an Ideal Setting 82
Geometric Moment Information 82
Arithmetic Moment Information 83
Joint Scale and Scale-Free Moment Information 87
Likelihood, Information, and Maximum Entropy: A Qualitative
Discussion 87

5. Inference in the Real World 107


Single-Parameter Problems 108
Exponential Distributions and Scales 108
Distribution of Rainfall 108
The Barometric Formula 110
Power and Pareto Laws: Scale-Free Distributions 112
Distribution of Gross Domestic Products 113
Multi-Parameter Problems 114
Size Distribution: An Industry Simulation 114
Incorporating Inequalities: Portfolio Allocation 117
Ecological Networks 122
Background 123
A Simple Info-Metrics Model 124
Efficient Network Aggregation 126

6. Advanced Inference in the Real World 135


Interval Information 136
Theory 136
Conjugate Variables 139
Weather Pattern Analysis: The Case of New York City 140
Treatment Decision for Learning Disabilities 143
Background Information and Inferential Model 143
A Simulated Example 145
Brain Cancer: Analysis and Diagnostics 147
The Information 148

The Surprisal 151


Bayesian Updating: Individual Probabilities 154

7. Efficiency, Sufficiency, and Optimality 165


Basic Properties 166
Optimality 166
Implications of Small Variations 167
Efficiency 169
Statistical Efficiency 169
Computational Efficiency 175
Sufficiency 176
Concentration Theorem 178
Conditional Limit Theorem 180
Information Compression 180

8. Prior Information 194


A Preliminary Definition 195
Entropy Deficiency: Minimum Cross Entropy 196
Grouping Property 200
Surprisal Analysis 209
Formulation 209
Extension: Unknown Expected Values or Dependent Variables 211
Transformation Groups 211
The Basics 212
Simple Examples 215
Maximum Entropy Priors 221
Empirical Priors 221
Priors, Treatment Effect, and Propensity Score Functions 222

9. A Complete Info-Metrics Framework 231


Information, Uncertainty, and Noise 232
Formulation and Solution 234
A Simple Example with Noisy Constraints 242
The Concentrated Framework 245
A Framework for Inferring Theories and Consistent Models 249
Examples in an Uncertain Setting 250
Theory Uncertainty and Approximate Theory: Markov Process 250
Example: Mixed Models in a Non-Ideal Setting 254
Uncertainty 259
The Optimal Solution 259
Lagrange Multipliers 261
The Stochastic Constraints 262
The Support Space 262
The Cost of Accommodating Uncertainty 264

Visual Representation of the Info-Metrics Framework 264


Adding Priors 268

10. Modeling and Theories 281


Core Questions 282
Basic Building Blocks 284
Problem and Entities 284
Information and Constraints 285
Incorporating Priors 286
Validation and Falsification 286
Prediction 288
A Detailed Social Science Example 288
Characterizing the Problem 288
Introducing the Basic Entities 289
Information and Constraints 291
Production 291
Consumption 292
Supply and Demand 292
Individual Preferences 292
Budget Constraints 293
The Statistical Equilibrium 294
Economic Entropy: Concentrated Model 296
Prices, Lagrange Multipliers, and Preferences 297
Priors, Validation, and Prediction 298
Model Summary 299
Other Classical Examples 300

11. Causal Inference via Constraint Satisfaction 307


Definitions 308
Info-Metrics and Nonmonotonic Reasoning 309
Nonmonotonic Reasoning and Grouping 314
Typicality and Info-Metrics 316
The Principle of Causation 316
Info-Metrics and Causal Inference 318
Causality, Inference, and Markov Transition Probabilities: An
Example 319
The Model 320
Inferred Causal Influence 322

12. Info-Metrics and Statistical Inference: Discrete Problems 334


Discrete Choice Models: Statement of the Problem 335
Example: A Die and Discrete Choice Models 335
Definitions and Problem Specification 339
The Unconstrained Model as a Maximum Likelihood 340

The Constrained Optimization Model 341


The Info-Metrics Framework: A Generalized Likelihood 343
Real-World Examples 345
Tailoring Political Messages and Testing the Impact of Negative
Messages 345
Background on the Congressional Race and the Survey 346
Inference, Prediction, and the Effect of Different Messages 346
Is There Racial Discrimination in Home Mortgage Lending? 347
Background on Loans, Minorities, and Sample Size 347
Inference, Marginal Effects, Prediction, and Discrimination 348
The Benefits of Info-Metrics for Inference in Discrete Choice
Problems 351

13. Info-Metrics and Statistical Inference: Continuous Problems 357


Continuous Regression Models: Statement of the Problem 358
Definitions and Problem Specification 359
Unconstrained Models in Traditional Inference 359
Rethinking the Problem as a Constrained Optimization 361
A Basic Model 361
A General Information-Theoretic Model 363
Generalized Entropies 364
Information-Theoretic Methods of Inference: Zero-Moment
Conditions 367
Specific Cases: Empirical and Euclidean Likelihoods 367
Exploring a Power Law: Shannon Entropy Versus Empirical
Likelihood 371
Theoretical and Empirical Examples 373
Information-Theoretic Methods of Inference: Stochastic Moment
Conditions 376
The Support Spaces 380
A Simulated Example 381
Misspecification 386
The Benefits of Info-Metrics for Inference in Continuous Problems 388
Information and Model Comparison 390

14. New Applications Across Disciplines 411


Option Pricing 412
Simple Case: One Option 413
Generalized Case: Inferring the Equilibrium Distribution 416
Implications and Significance 418
Predicting Coronary Artery Disease 418
Data and Definitions 419
Analyses and Results 420
The Complete Sample 420

Out-of-Sample Prediction 423


Sensitivity Analysis and Simulated Scenarios 424
Implications and Significance 425
Improved Election Prediction Using Priors on Individuals 426
Analyses and Results 427
The Data 427
The Priors and Analyses 428
Implications and Significance 431
Predicting Dose Effect: Drug-Induced Liver Injury 432
Medical Background and Objective 433
Data and Definitions 434
Inference and Predictions 434
A Linear Model 434
Analyzing the Residuals: Extreme Events 437
Implications and Significance 439

Epilogue 446

List of Symbols 449


Index 451
FIGURES

1.1. Chapter dependency chart 7


3.1. A graphical illustration of the information, entropy, and
probability relationships for a binary random variable with
probabilities p and 1–p 40
3.2. A simple representation of the interrelationships among entropies
and mutual information for two dependent random variables 48
4.1. A two-dimensional representation of the maximum entropy solution
for a discrete probability distribution defined over two possible
events 65
4.2. A geometrical view of maximum entropy 66
4.3. A graphical representation of the log inequality 69
5.1. The distribution for rainfall in southwest England over the period
1914 to 1962 109
5.2. Pareto GDP tail distribution of the 39 largest countries (20% of the
world’s countries) in 2012 on a log-log plot of the distribution versus
the GDP in US$ 113
5.3. A graphical illustration of the size distribution of firms in Uniformia
under the a priori assumption that all states are equally likely 116
5.4. Entropy contours of the three assets 120
5.5. Network aggregation from 12 nodes to 7 nodes with simple
weights 127
5.6. Two nonlinear network aggregation examples 129
6.1. A simple representation of the inferred temperature-range joint
distribution for New York City 141
6.2. A higher-dimensional surprisal representation of the New York City
weather results 142
8.1. A geometrical view of cross entropy with uniform priors 198
8.2. A geometrical view of cross entropy with nonuniform priors 199
8.3. The two-dice example, featuring a graphical representation of the
relationship between elementary outcomes (represented by dots)
and events 202
9.1. A simple representation of the info-metrics stochastic moments
solution for a discrete, binary random variable 240
9.2. A simplex representation of the info-metrics solution for a discrete
random variable with three possible outcomes 241

9.3. A simplex representation of the three-sided-die version of the


example 242
9.4. The inferred Lagrange multipliers of two measurements of a six-sided die as functions of the support bounds for the errors 246
9.5. A simplex representation of the solution to a mixed-theory, three-event problem 260
9.6. The info-metrics constraints and noise 265
9.7. A two-dimensional representation of the info-metrics problem and
solution 266
9.8. A simplex representation of the info-metrics problem and
solution 267
9.9. A simplex representation of the info-metrics framework and
solution for a discrete probability distribution defined over three
possible events and nonuniform priors 270
11.1. The Martian creatures, part I 315
11.2. The Martian creatures, part II 315
13.1. Graphical comparisons of the Rényi, Tsallis, and Cressie-Read
entropies of order α for a binary variable with value of 0 or 1 366
13.2. A comparison of the theoretical Benford (first-digit) distribution
with the ME and EL inferred distributions under two scenarios 374
13.3. A simplex representation of the info-metrics solution for
a linear regression problem with three parameters and ten
observations 382
14.1. Risk-neutral inferred probability distributions of a Wells Fargo call
option for October 9, 2015, specified on September 30, 2015 415
14.2. The predicted probability (gray line) of each patient together with
the correct diagnosis (dark points) of being diseased or healthy 422
14.3. The predicted (out-of-sample) probability (gray) of each patient
together with the correct diagnosis (dark points) of being diseased
or healthy 423
14.4. Comparing the prediction for the Democrats under two
scenarios 430
14.5. Comparing the prediction for the Republicans under two
scenarios 431
14.6. Residuals by dose as a result of the first-stage analysis of the liver
data 436
14.7. The return levels for the four different doses based on the
info-metrics model of the first stage when dose is treated as
a set of binary variables 438
TABLES

6.1. Correct Versus Inferred Multipliers 147


9.1. A Schematic Representation of a Three-State Transition Matrix 251
12.1. Inferred Parameters of Bank A 349
12.2. Inferred Probabilities for Bank A Using a More Comprehensive
Analysis 350
14.1. The Marginal Effects of the Major Risk Factors and Individual
Characteristics for Patients Admitted to the Emergency Room with
Some Type of Chest Pain 420
14.2. Prediction Table of Both the Risk Factor Model (Regular Font)
and the More Inclusive Model Using Also the Results of the Two
Tests (Italic Font) 421
14.3. Diagnostic Simulation 424
14.4. Prediction Table of the Binary Voting Data Based on the November
Sample 429
14.5. Prediction Table of the Multinomial Voting Data Based on the
November Sample 429
14.6. Inferred Coefficients of the First-Stage Regression of the Drug-Induced Liver Damage Information 436
BOXES

1.1. A Concise Historical Perspective 4


2.1. On the Equivalence of Probabilities and Frequencies 14
2.2. A Simple Geometrical View of Decision-Making for
Underdetermined Problems 16
3.1. Information and Guessing 35
3.2. Information, Logarithm Base, and Efficient Coding: Base 3 Is the
Winner 36
3.3. Information and Entropy: A Numerical Example 41
4.1. Graphical Representation of Inequality Constraints 67
4.2. Temperature and Its Conjugate Variable 75
4.3. Primal-Dual Graphical Relationship 81
4.4. Bayes’ Theorem 85
4.5. Maximum Entropy Inference: A Basic Recipe 89
5.1. Info-Metrics and Tomography 121
5.2. Networks for Food Webs 123
6.1. The Bose-Einstein Distribution 139
6.2. Prediction Table 146
6.3. Brain Tumor: Definitions and Medical Background 148
6.4. Prediction Accuracy, Significance Level, and miRNA 157
7.1. Maximum Entropy and Statistics: Interrelationships Among Their
Objectives and Parameters 170
7.2. Variance and Maximum Entropy 172
7.3. Relative Entropy and the Cramér-Rao Bound 174
7.4. Information, Maximum Entropy, and Compression: Numerical
Examples 183
8.1. Multivariate Discrete Distributions: Extending the Two-Dice
Problem 205
8.2. Size Distribution Revisited: Constructing the Priors 206
8.3. Simple Variable Transformation 216
8.4. Priors for a Straight Line 218
9.1. Incorporating Theoretical Information in the Info-Metrics
Framework: Inferring Strategies 255
10.1. A Toy Model of Single-Lane Traffic 299
11.1. A Six-Sided-Die Version of Default Logic 312
11.2. Markov Transition Probabilities and Causality: A Simulated
Example 323

12.1. Die, Conditional Die, and Discrete Choice Models 337


13.1. Rényi’s and Shannon’s Entropies 365
13.2. Three-Sided-Die and Information-Theoretic Methods of
Inference 370
13.3. Constraints from Statistical Requirements 383
13.4. Inequality and Nonlinear Constraints from Theory 383
ACKNOWLEDGMENTS

This book is a result of what I have learned via numerous discussions, debates,
tutorials, and interactions with many people over many years and across many
disciplines. I owe thanks and gratitude to all those who helped me with this
project. I feel fortunate to have colleagues, friends, and students who were
willing to provide me with their critiques and ideas.
First, I wish to thank Raphael (Raphy) Levine for his contributions to some
of the ideas in the early chapters of the book. In fact, Raphy’s contributions
should have made him an equal coauthor for much of the material in the first
part of this book. We had many conversations and long discussions about
info-metrics. We bounced ideas and potential examples back and forth. We sat
over many cups of coffee and glasses of wine trying to understand—and then
reconcile—the different ways natural and social scientists view the world and
the place of info-metrics within that world. Raphy also made me understand
the way prior information in the natural sciences can often emerge from the
grouping property. For all of that, I am grateful to him.
Special thanks go to my colleague and friend Robin Lumsdaine, who has
been especially generous with her time and provided me with comments
and critiques on many parts of this book. Our weekly morning meetings and
discussions helped to clarify many of the info-metrics problems and ideas
discussed here. Ariel Caticha and I have sat together many times, trying to
understand the fundamentals of info-metrics, bouncing around ideas, and
bridging the gaps between our disciplines. In addition, Ariel provided me with
his thoughts and sharp critique on some of the ideas discussed here. My good
colleague Alan Isaac is the only one who has seen all of the material discussed in this book. Alan’s vision and appraisal of the material, as well as major editorial suggestions on an earlier version, were instrumental. Furthermore, his careful analysis and patience during our many discussions contributed significantly to this book.
My academic colleagues from across many disciplines were generous with
their time and provided me with invaluable comments on parts of the book.
They include Radu Balan, Nataly Kravchenko-Balasha, Avi Bhati, Min Chen,
J. Michael (Mike) Dunn, Mirta Galesic, Ramo Gencay, Boris Gershman, Justin
Grana, Alastair Hall, Jim Hardy, John Harte, Kevin Knuth, Jeff Perloff, Steven
Kuhn, Sid Redner, Xuguang (Simon) Sheng, Mike Stutzer, Aman Ullah, and
John Willoughby. Min Chen also provided guidance on all of the graphics
incorporated here, much of which was new to me. I also thank all the students

and researchers who attended my info-metrics classes and tutorials during the years; their questions and critique were instrumental for my understanding of info-metrics. My students and colleagues T. S. Tuang Buansing, Paul Corral, Huancheng Du, Jambal Ganbaatar, and Skipper Seabold deserve special thanks. Tuang helped with all of the figures and some of the computational analysis. Huancheng worked on two of the major applications and did some of the computational work. Ganbaatar was instrumental in two of the applications. Skipper was instrumental in putting the Web page together, translating many of the codes to Python, and developing a useful testing framework for the codes. Paul tested most of the computer codes, developed new codes, helped with some of the experiments, and put all the references together. I also thank Arnob Alam, who developed the code for aggregating networks, one of the more complicated programs used in this book, and Aarti Reddy, who together with Arnob helped me during the last stage of this project.
I owe a special thanks to my former editor from Oxford University Press, Scott Parris. I have worked with Scott for quite a while (one previous book), and I am always grateful to him for his wisdom, thoughtful suggestions, recommendations, and patience. I am also thankful to Scott for sticking by me and guiding me during the long birth process of this book. I also thank David Pervin, my new editor, for his patience, suggestions, and effort. Finally, I thank Sue Warga, my copyeditor, for her careful and exceptional editing.
I am grateful for the institutions that hosted me, and for all the resources
I have received in support of this project. I am indebted to the Info-Metrics
Institute and its support of some of my research assistants and students.
I thank the Faculty of Science at the Hebrew University for hosting me twice
during the process of writing this book. I thank Raphy Levine for sharing
his grant provided by the European Commission (FP7 Future and Emerging
Technologies—Open Project BAMBI 618024) to partially support me during
my two visits to the Hebrew University. I thank the Santa Fe Institute (SFI) for
hosting me, and partially supporting me, during my many visits at the Institute
during the last three years. I also thank my many colleagues at SFI for numerous enchanting discussions and for useful suggestions that came up during my presentations of parts of this book. I thank Pembroke College (Oxford) for hosting me a few times during the process of writing this book. I am also grateful to Assen Assenov from the Center for Teaching, Research and Learning at
American University for his support and help.
Special thanks to Maureen and Ben for their contributions and edits.
1}

Introduction

Chapter Contents
The Problem and Objectives 1
Outline of the Book 3
References 7

The Problem and Objectives

The material in this book derives from the simple observation that the available information is most often insufficient to provide a unique answer or solution for most interesting decisions or inferences we wish to make. In fact, insufficient information—including limited, incomplete, complex, noisy, and uncertain information—is the norm for most problems across all disciplines.
The pervasiveness of insufficient information across the sciences has resulted in the development of discipline-specific approaches to dealing with it. These different approaches provide different insights into the problem. They also provide grist for an interdisciplinary approach that leverages the strengths of each. This is the core objective of the book. Here I develop a unified constrained optimization framework—I call it info-metrics—for information processing, modeling, and inference for problems across the scientific spectrum. The interdisciplinary aspect of this book provides new insights and synergies between distinct scientific fields. It helps create a common language for scientific inference.
Info-metrics combines the tools and principles of information theory, within a constrained optimization framework, to tackle the universal problem of insufficient information for inference, model, and theory building. In broad terms, info-metrics is the discipline of scientific inference and efficient information processing. This encompasses inference from both quantitative and qualitative information, including nonexperimental information, information and data from laboratory experiments, data from natural experiments, the information embedded in theory, and fuzzy or uncertain information from varied sources or assumptions. The unified constrained optimization framework of info-metrics helps resolve the major challenge to scientists and decision-makers of how to reason under conditions of incomplete information.
In this book I provide the mathematical and conceptual foundations for info-metrics and demonstrate how to use it to process information, solve problems, and construct models or theories across all scientific disciplines. I present a framework for inference and model or theory building that copes with limited, noisy, and incomplete information. While the level and type of uncertainty can differ among disciplines, the unified info-metrics approach efficiently handles inferential problems across disciplines using all available information. The info-metrics framework is suitable for constructing and validating new theories and models, using observed information that may be experimental or nonexperimental. It also enables us to test hypotheses about competing theories or causal mechanisms. I will show that the info-metrics framework is logically consistent and satisfies all important requirements. I will compare the info-metrics approach with other approaches to inference and show that it is typically simpler and more efficient to use and apply.
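To make the constrained optimization idea concrete before the formal development begins in Chapter 2, here is a minimal sketch of the canonical maximum entropy problem: inferring a die's face probabilities from nothing more than its observed mean. The entropy-maximizing distribution takes the exponential (Gibbs) form, so the inference reduces to finding a single Lagrange multiplier, which this sketch does by bisection. The function name and the target mean of 4.5 (Jaynes's classic example) are illustrative choices, not taken from this book.

```python
import math

def max_entropy_die(target_mean, faces=6, tol=1e-12):
    """Maximum-entropy distribution over faces 1..n with a given mean.

    The solution has the Gibbs form p_k proportional to exp(-lam * k);
    since the implied mean is strictly decreasing in lam, a simple
    bisection recovers the Lagrange multiplier lam.
    """
    def mean_for(lam):
        weights = [math.exp(-lam * k) for k in range(1, faces + 1)]
        total = sum(weights)
        return sum(k * w for k, w in zip(range(1, faces + 1), weights)) / total

    lo, hi = -50.0, 50.0  # brackets any mean strictly inside (1, faces)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_for(mid) > target_mean:
            lo = mid  # mean still too high: need a larger multiplier
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    weights = [math.exp(-lam * k) for k in range(1, faces + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# A die whose average roll is 4.5 rather than the fair 3.5.
p = max_entropy_die(4.5)
print([round(pk, 4) for pk in p])
```

With a target mean of 3.5 the multiplier is zero and the uniform distribution is recovered; pushing the mean toward 4.5 tilts the probabilities monotonically toward the high faces, exactly the "least committed" distribution consistent with the single moment constraint.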
Info-metrics is at the intersection of information theory, statistical methods of inference, applied mathematics, computer science, econometrics, complexity theory, decision analysis, modeling, and the philosophy of science. In this book, I present foundational material emerging from these sciences as well as more detailed material on the meaning and value of information, approaches to data analysis, and the role of prior information. At the same time, this primer is not a treatise for the specialist; I provide a discussion of the necessary elementary concepts needed for understanding the methods of info-metrics and their applications. As a result, this book offers even researchers who have minimal quantitative skills the necessary building blocks and framework to conduct sophisticated info-metric analyses. This book is designed to be accessible for researchers, graduate students, and practitioners across the disciplines, requiring only some basic quantitative skills and a little persistence.
With this book, I aim to provide a reference text that elucidates the mathematical and philosophical foundations of information theory and maximum entropy, generalizes it, and applies the resulting info-metrics framework to a host of scientific disciplines. The book is interdisciplinary and applications-oriented. It provides all the necessary tools and building blocks for using the info-metrics framework for solving problems, making decisions, and constructing models under incomplete information. The multidisciplinary applications provide a hands-on experience for the reader. That experience can be enhanced via the exercises and problems at the end of each chapter.

Outline of the Book

The plan of the book is as follows. The current chapter is an introductory one. It expresses the basic problem and describes the objectives and outline of the book. The next three chapters present the building blocks of info-metrics. Chapter 2 provides the rationale for using constrained optimization to do inference on the basis of limited information. This chapter invokes a specific decision function to achieve the kind of inference we desire. It also summarizes the axioms justifying this decision function. Despite the axiomatic discussion, this is a nontechnical chapter. Readers familiar with constrained optimization and with the rationale of using entropy as the decision function may even skip this chapter.
The following two chapters present the mathematical framework underpinning the building blocks of Chapter 2. Chapter 3 explores the basic metrics of info-metrics; additional quantities will be defined in later chapters. Chapter 4 formulates the inferential problem as a maximum entropy problem within the constrained optimization framework of Chapter 2, which is then formulated as an unconstrained optimization. Chapter 4 also develops the methods of validation to evaluate the inferred solutions.
The two chapters after that provide a mix of detailed cross-disciplinary applications illustrating the maximum entropy method in action. They demonstrate its formulation, its simplicity, and its generality in real-world settings. Chapter 5 starts with a relatively simple set of problems. Chapter 6 presents more advanced problems and case studies.
Chapter 7 develops some of the basic properties of the info-metrics framework. It builds directly on Chapter 4 and concentrates on the properties of efficiency, optimality, and sufficiency. Chapter 7 fully quantifies the notion of “best solution.”
Having formulated the basic building blocks, the book moves on to the broader, more general info-metrics framework. Chapter 8 introduces the concept of prior information and shows how to incorporate such information into the framework. This chapter also takes up the critical question of how to construct this prior information, and it explores three different routes. The first approach is based on the grouping property—a property of the Boltzmann-Gibbs-Shannon entropy, defined in Chapter 3—which is less familiar to social and behavioral scientists. The second approach is based on the more obscure concept of transformation groups. Finally, the chapter considers empirical priors—a concept that is familiar to social scientists but often misused. Chapter 8 places special emphasis on the extension of these ideas to common problems in the social sciences. Chapter 9 extends all previous results to accommodate all types of uncertainties, including model and parameter uncertainties. This chapter provides the complete info-metrics framework. All applications and specific problems can be modeled within the complete
BOX 1.1 } A Concise Historical Perspective

I provide here a brief historical perspective on some of the major research on
inference that leads us to info-metrics. This background is for historical interest
only; it is not needed to understand this book.
The problem of inference under uncertainty is as old as human history.
Possibly the work of the Greek philosophers and Aristotle (fourth century BC),
where the first known study of formal logic started, laid the foundations for
logical inference. But not until the seventeenth century were the mathematical
foundations of inference under uncertainty formally established. None of this pre-
seventeenth-century work extended “to the consideration of the problem: How,
from the outcome of a game (or several outcomes of the same game), could one
learn about the properties of the game and how could one quantify the uncertainty
of our inferred knowledge of these properties?” (Stigler 1986, 63).
The foundations of info-metrics can be traced to Jacob Bernoulli’s work in
the late 1600s. He established the mathematical foundations of uncertainty, and
he is widely recognized as the father of probability theory. Bernoulli’s work is
summarized in the Art of Conjecturing (1713), published eight years after his death.
Bernoulli introduced the “principle of insufficient reason,” though at times that
phrase is also used to recognize some of Laplace’s work. De Moivre and Laplace
followed on Bernoulli’s work and established the mathematical foundations of
the theory of inference. De Moivre’s three books (1718, 1738, 1756) on probability,
chance, and the binomial expansion appeared before Laplace arrived on the
scene. To support himself early on, De Moivre provided tutorials and consulting,
in London’s coffeehouses, for clients interested in learning mathematics and
quantitative inference.
Although De Moivre developed a number of groundbreaking results in
mathematics and probability theory and practiced (simple) quantitative inference,
his work did not have an immediate impact on the more empirical scientists who
were interested in knowing how to convert observable quantities into information
about the underlying process generating these observables. Approximately at
the time the second edition of De Moivre’s Doctrine of Chances was published,
Simpson (1755) and Bayes (1764) engaged (independently) in pushing Bernoulli’s
work toward establishing better tools of inference. But it was Laplace (1774, 1886),
with his deep understanding of the notions of inverse probability and “inverse
inference,” who finally laid the foundations for statistical and probabilistic
reasoning or logical inference under uncertainty. The foundations of info-metrics
grew out of that work.
To complete this brief historical note, I jump forward almost two centuries
to the seminal work of Shannon (1948) on the foundations of information
theory. Jaynes recognized the common overall objectives and the common
mathematical procedures used in all of the earlier research on inference and
modeling, and understood that the new theory developed by Shannon could
help in resolving some of the remaining open questions. As a consequence,
Jaynes formulated his classical work on the maximum entropy (ME) formalism
(1957a, 1957b). Simply stated, facing the fundamental question of drawing
inferences from limited and insufficient information, Jaynes proposed a
generalization of Bernoulli’s and Laplace’s principle of insufficient reason.
Jaynes’s original ME formalism aimed at solving any inferential problem with
a well-defined hypothesis space and noiseless but incomplete information.
This formalism was subsequently extended and applied by a large number
of researchers across many disciplines, including Levine (1980), Levine and
Tribus (1979), Tikochinsky, Tishby, and Levine (1984), Skilling (1988), Hanson
and Silver (1996), and Golan, Judge, and Miller (1996). Axiomatic foundations
for this approach were developed by Shore and Johnson (1980), Skilling (1989),
and Csiszar (1991). See also Jaynes 1984 and his nice 2003 text for additional
discussion.
Though the present book contains minimal discussion of Bayes’ theorem and
related Bayesian and information-theoretic methods, the two (Bayes’ theorem
and information-theoretic inference) are highly related. For a nice exposition, see
Caticha 2012 and the recent work of Toda (2012) as well as the original work of
Zellner (1988).
Naturally, like all scientific methods and developments, info-metrics grew
out of many independent and at times interconnected lines of research and
advancements. But unlike most other scientific advancements, info-metrics
developed out of the intersection of inferential methods across all disciplines.
Thus, rather than providing a long historical perspective, we move straight into
the heart of the book. For the history of statistics and probability prior to 1900,
see Stigler’s classic and insightful 1986 work. For a more comprehensive historical
perspective on info-metrics and information-theoretic inference, see Golan 2008,
which extends the brief historical thread provided by Jaynes (1978).

framework. It encompasses all inferential and model construction problems
under insufficient information. Chapter 9 fully develops the complete
interdisciplinary vision of this book and the complete info-metrics framework.
The examples throughout the book complement that vision.
Combining the ideas of Chapter 9 with those of the earlier chapters takes us
to model and theory building, causal inference, and the relationship between
the two. The fundamental problem of model development and theory building
is the subject of Chapter 10. The premise of this chapter is that the info-metrics
framework can be viewed as a “meta-theory”—a theory of how to construct
theories and models given the imperfect information we have. That frame-
work provides a rational perspective that helps us to identify the elements
needed for building a reasonably sound model. That premise is demonstrated
via multidisciplinary examples, one of which is very detailed. The same build-
ing blocks are used to construct each one of these examples. This chapter also
places emphasis on the idea that a model should be constructed on all of the
information and structure we know or assume, even if part of that informa-
tion is unobserved. In such cases a mechanism for connecting the observable
information to the unobserved entities of interest must be provided. Examples
of such a mechanism are discussed as well.
In Chapter 11 the emphasis shifts from model and theory building to causal
inference via constraint satisfaction. The term causal inference here is taken
to mean the causality inferred from the available information; we infer that
A causes B from information concerning the occurrences of both. The first
part of this chapter concentrates on nonmonotonic and default logic, which
were developed to deal with extremely high conditional probabilities. The sec-
ond part deals with cause and effect in a probabilistic way given the informa-
tion we have and the inferential framework we use. The chapter also provides
a detailed example that connects some of the more traditional ideas of causal
inference to the info-metrics framework.
The next two chapters connect the info-metrics framework with more tradi-
tional statistical methods of inference. In particular, they show that the family
of information-theoretic methods of estimation and inference are subsumed
within the info-metrics framework. These chapters use duality theory to con-
nect info-metrics with all other methods. Chapter 12 concentrates on discrete
models. In that setting, specific maximum likelihood approaches are special
cases of the info-metrics framework. Chapter 13 concentrates on continuous
models, such as linear and nonlinear regression analysis and system of equa-
tions analysis. It compares the info-metrics framework with the familiar least-
squares technique and other methods of moments approaches for continuous
models. Chapter 13 also shows that the info-metrics framework can accommo-
date possible misspecifications in empirical models. Misspecification issues
are common across the social and behavioral sciences, where the researcher
does not have sufficient information to determine the functional form of the
structure to be inferred. Chapter 13 also demonstrates, via familiar examples,
the trade-offs between functional forms (the constraints in our framework)
and the decision function used in the inference. To demonstrate this, the chap-
ter shows that two different formulations yield the same inferred distribution
even though one of the two is misspecified.
Chapter 14 provides four detailed, cross-disciplinary applications devel-
oped especially for this book. These applications represent diverse fields of
investigation: the medical sciences, political science, and finance. The chapter
illustrates the generality and simplicity of the info-metrics approach, while
demonstrating some of the features discussed throughout the book. Each case
study presents the required empirical background, the necessary analytics
conditional on the input information, the inferred solution, and a brief sum-
mary of its implications.
Each chapter includes exercises and extended problems. Each chapter ends
with a notes section, which summarizes the main references to that chapter
as well as readings on related topics. The book is complemented by a website,
http://info-metrics.org, that provides supporting codes and data sets (or links
[Figure 1.1 here: a chapter dependency chart with four columns (Rationale & Properties; Core Framework; Modeling & Estimation; Extended Multidisciplinary Examples & Case Studies) covering chapters 1 through 14. Legend: core chapters, with size reflecting core “level”; multidisciplinary example chapters, with size reflecting complexity level.]

FIGURE 1.1. Chapter dependency chart. The chart provides the logical flow of the book. Though there
are many examples throughout the book, the three chapters devoted solely to examples, shown above,
can be read in order (see arrows) or at any time after reading the relevant chapters.

to the data) for many of the examples presented in the book. It also provides
extended analyses of some of the examples as well as additional examples.
A simple chart illustrating the logical dependencies among the chapters is
provided above. It shows the flow of the book. Though I recommend reading
the chapters in order, the diagram helps those who may be more informed or
are just interested in a certain topic or problem. It also provides the necessary
details for instructors and students.

References

Bayes, T. 1764. “An Essay Towards Solving a Problem in the Doctrine of Chances.”
Philosophical Transactions of the Royal Society of London 53: 370–418.
Bernoulli, J. 1713. Art of Conjecturing. Basel: Thurneysen Brothers.
Caticha, A. 2012. Entropic Inference and the Foundations of Physics. Monograph com-
missioned by the 11th Brazilian Meeting on Bayesian Statistics, EBEB 2012. São
Paulo: University of São Paulo Press.
Csiszar, I. 1991. “Why Least Squares and Maximum Entropy? An Axiomatic Approach to
Inference for Linear Inverse Problem.” Annals of Statistics 19: 2032–66.
De Moivre, A. 1718. The Doctrine of Chances. London: W. Pearson.
———. 1738. The Doctrine of Chances. 2nd ed. London: Woodfall.
———. 1756. The Doctrine of Chances: or, A Method for Calculating the Probabilities of Events
in Play. 3rd ed. London: A. Millar.
Golan, A. 2008. “Information and Entropy Econometrics: A Review and Synthesis.”
Foundations and Trends in Econometrics 2, nos. 1–2: 1–145.
Golan, A., G. Judge, and D. Miller. 1996. Maximum Entropy Econometrics: Robust Estimation
with Limited Data. Chichester, UK: John Wiley & Sons.
Hanson, K. M., and R. N. Silver, 1996. Maximum Entropy and Bayesian Methods.
Dordrecht: Kluwer.
Jaynes, E. T. 1957a. “Information Theory and Statistical Mechanics.” Physical Review
106: 620–30.
———. 1957b. “Information Theory and Statistical Mechanics II.” Physical Review 108:
171–90.
———. 1978. “Where Do We Stand on Maximum Entropy?” In The Maximum Entropy
Formalism, ed. R. D. Levine and M. Tribus, 15–118. Cambridge, MA: MIT Press.
———. 1984. “Prior Information and Ambiguity in Inverse Problems.” In Inverse Problems,
ed. D. W. McLaughlin, 151–66. Providence, RI: American Mathematical Society.
———. 2003. Probability Theory: The Logic of Science. Cambridge: Cambridge
University Press.
Laplace, P. S. 1774. “Mémoire sur la probabilité des causes par les évènemens.” Mémoires de
l’Académie Royale des Sciences 6: 621–56.
———. 1886. Théorie analytique des probabilités. 3rd ed. Paris: Gauthier-Villars. Originally
published 1820.
Levine, R. D. 1980. “An Information Theoretical Approach to Inversion Problems.” Journal
of Physics A: Mathematical and General 13, no. 1: 91.
Levine, R. D., and M. Tribus, eds. 1979. The Maximum Entropy Formalism. Cambridge,
MA: MIT Press.
Shannon, C. E. 1948. “A Mathematical Theory of Communication.” Bell System Technical
Journal 27: 379–423.
Shore, J. E., and R. W. Johnson. 1980. “Axiomatic Derivation of the Principle of Maximum
Entropy and the Principle of Minimum Cross-Entropy.” IEEE Transactions on
Information Theory IT-26, no. 1: 26–37.
Simpson, T. 1755. “A Letter to the Right Honourable George Earl of Macclesfield, President
of the Royal Society, on the Advantage of Taking the Mean of a Number of Observations,
in Practical Astronomy.” Philosophical Transactions of the Royal Society of London
49: 82–93.
Skilling, J. 1988. “The Axioms of Maximum Entropy.” In Maximum-Entropy and Bayesian
Methods in Science and Engineering, ed. G. J. Erickson and C. R. Smith, 173–87.
Fundamental Theories of Physics 31–32. Boston: Kluwer Academic.
———. 1989. “Classic Maximum Entropy.” In Maximum Entropy and Bayesian Methods,
Cambridge, England, 1988, ed. J. Skilling, 45–52. Boston: Kluwer Academic.
Stigler, S. M. 1986. The History of Statistics: The Measurement of Uncertainty Before 1900.
Cambridge, MA: Harvard University Press.
Tikochinsky, Y., N. Z. Tishby, and R. D. Levine. 1984. “Alternative Approach to
Maximum-Entropy Inference.” Physical Review A 30, no. 5: 2638–44.
Toda, A. A. 2012. “Axiomatization of Maximum Entropy Without the Bayes Rule.” In AIP
Conference Proceedings. New York: American Institute of Physics.
Zellner, A. 1988. “Optimal Information Processing and Bayes Theorem.” American
Statistician 42: 278–84.
2}

Rational Inference
A CONSTRAINED OPTIMIZATION FRAMEWORK

Chapter Contents
Inference Under Limited Information 11
Qualitative Arguments for Rational Inference 11
Probability Distributions: The Object of Interest 12
Constrained Optimization: A Preliminary Formulation 15
The Basic Questions 18
Motivating Axioms for Inference Under Limited Information 19
Axioms Set A: Defined on the Decision Function 20
Axioms Set B: Defined on the Inference Itself 20
Axioms Set C: Defined on the Inference Itself 21
Axioms Set D: Symmetry 22
Inference for Repeated Experiments 22
Axioms Versus Properties 24
Summary 25
Appendix 2A: Axioms Set B—A Concise Specification 25
Notes 26
Exercises and Problems 28
References 30

In this chapter I introduce a framework for rational inference on the basis of
limited information. This foundational chapter consists of two interdependent
parts. In the first part I provide the framework for rational inference, which
involves the use of a specific decision function to achieve the kind of inference
we desire. In the second part I summarize four sets of axioms to justify the
decision function I argue for in the first part.


The second part, axioms, is not essential for the mathematical and tech-
nical understanding nor for the implementation of the inferential methods
discussed in the following chapters. If you skip it now, I hope that curiosity will
bring you back to it once you have perused a few applications in the coming
chapters.
I begin by defining rational inference in terms of an information decision
function, which I call H. Then I briefly reflect on four sets of axioms that pro-
vide alternative logical foundations for info-metric problems dealing with
inference of probability distributions. All four alternatives point to the entropy
function of Boltzmann, Gibbs, and Shannon. I then discuss an important sub-
set of problems: those where an experiment is independently repeated a very
large number of times. These types of problems are quite common in the natu-
ral sciences, but they can also arise elsewhere. When a system is governed by a
probability distribution, the frequency of observing a certain event of that sys-
tem, in a large number of trials, approximates its probability. I show that, just
as in other inferential problems I discuss, the repeated experiment setting also
naturally leads us to the entropy function of Boltzmann, Gibbs, and Shannon
as the decision function of choice. Finally, rather than investigate other sets of
axioms leading to the same conclusion, I take the complementary approach
and reflect on the basic properties of the inferential rule itself.

Inference Under Limited Information


QUALITATIVE ARGUMENTS FOR RATIONAL INFERENCE

I discuss here a framework for making rational inference based on partial,
and often uncertain and noisy, information. By partial or noisy information,
I mean that the problem is logically underdetermined: there is more than a
single inference that can be logically consistent with that information. But
even in the rare case that we are lucky and there is no uncertainty surrounding
the incomplete information we face, there may still be more than a single solu-
tion. Think, for example, of the very trivial problem of figuring out the ages
of Adam and Eve from the information that their joint age is fifty-three. This
problem generates a continuum of solutions. Which one should we choose?
The problem is magnified if there is additional uncertainty about their joint
age. Again, there is more than a single inference that can be logically consistent
with that information. In order to choose among the consistent inferences, we
need to select an inferential method and a decision criterion.
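The toy problem above can be sketched in a few lines (the restriction to integer ages and the “most even split” criterion are my own illustrative choices; the book has not yet argued for any particular criterion):

```python
# Every non-negative integer pair (a, b) with a + b = 53 is logically
# consistent with the information "their joint age is fifty-three".
solutions = [(a, 53 - a) for a in range(54)]

# The information alone cannot pick among the 54 candidates; an explicit
# decision criterion (here, arbitrarily, the most even split) can.
chosen = min(solutions, key=lambda s: abs(s[0] - s[1]))
```

Any other criterion would select a different pair; the information itself does not discriminate among them.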
The inferential method for which I argue is grounded in ordinary notions
of rational choice as optimization of an objective, or decision criterion—often
called a decision or utility function. We optimize that decision function in
order to choose a solution for our inferential problem from a set of logically
consistent solutions. The process of choosing the solution with the help of our
decision criterion is called an optimization process. We optimize (minimize or
maximize) that decision function while taking into account all of the informa-
tion we know. No other hidden information, such as hidden structures, is
imposed in the inferential process. This optimization process is our character-
ization of rational inference.
In this chapter, I discuss the logic for using a particular decision function.
I also provide details about the inferential method: what the decision function
is optimized on, what I mean by “optimization,” and in what way that optimi-
zation should be done.
There are other ways of attacking underdetermined problems. In partic-
ular, we might try to transform the problem with additional resources or
assumptions. For example, we could collect more data or impose restrictions
on functional forms. The first option is often unrealistic: we must deal with
the data at hand. Further, with noisy information the problem may remain
underdetermined regardless of the amount of information we can gather. The
second option has an unattractive feature: it requires imposing structure we
cannot verify. Therefore, I treat all of our inferential problems as inherently
underdetermined.

PROBABILITY DISTRIBUTIONS: THE OBJECT OF INTEREST

Practically all inferential problems, across all disciplines, deal with the infer-
ence of probability distributions. These distributions summarize our inferences
about the structure of the systems analyzed. With the help of these inferred
distributions, we can express the theory in terms of the inferred parameters,
or any other quantity of interest. Info-metrics, like other inferential methods,
is a translation of limited information about the true and unknown probabil-
ity density function (pdf) toward a greater knowledge of that pdf. Therefore,
we express any problem in terms of inferring a probability distribution—often
a conditional probability distribution. That distribution is our fundamental
unobserved quantity of interest.
Naturally, we want the data to inform our inferences. We specify observed
quantities via some functions of the data—such as moments of the data, or
other linear or nonlinear relationships. We choose functions that connect the
unobserved probability distributions to the observed quantities. These func-
tions are called the constraints; they capture the information we know (or
assume we know) and use for our inference.
In info-metric inference, the entity of interest is unobserved. For example,
it may be a probability (or conditional probability) distribution of character-
istics in a certain species, or a parameter capturing the potential impact of a
certain symptom of some disease. The observed quantities are usually some
expected values, such as the arithmetic or geometric means.
Often, the unobserved quantities are the micro states, while the observed
quantities capture the macro state of the system. By micro state, I mean the
precise details of the entities of interest—the elements of the system studied,
such as the exact positions and velocities of individual molecules in a con-
tainer of gas, or the exact allocation of goods and means of production among
agents in a productive social system. By macro state, I mean the values of attri-
butes of the population or system as a whole, such as the volume, mass, or total
number of molecules in a container of gas. A single macro state can corre-
spond to many different possible micro states. The macro state is usually char-
acterized by macro variables, whereas different micro states may be associated
with a particular set of values of the macro variables. A statistical analogy of
these micro-macro relationships may help. The micro state provides a high-
resolution description of the physical state of the system that captures all the
microscopic details. The macro state provides a coarser description of lower
resolution, in terms of averages or moments of the micro variables, where dif-
ferent micro states can have similar moments.
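A toy enumeration (a sketch of mine, not an example from the book) makes the distinction concrete: fix one macro quantity, the total energy of three distinguishable particles, and count the micro states compatible with it.

```python
from itertools import product

# Micro states: an energy level 0..4 assigned to each of 3 distinguishable
# particles. Macro state: the single observed quantity "total energy = 4".
micro_states = [s for s in product(range(5), repeat=3) if sum(s) == 4]
```

Fifteen distinct micro states, (4, 0, 0), (1, 1, 2), and so on, are compatible with the one macro value; the macro description is a far coarser summary of the system.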
Once the constraints are specified, we optimize a decision function sub-
ject to these constraints and other requirements. That decision function is
a function of the fundamental probability distribution of interest. There
are other potential decision criteria, some of which I discuss briefly in sub-
sequent chapters, but in this chapter I provide the rationale for the choice
of decision function used here. Using a fundamental method for finding
optima—the tools of the calculus of variations in general, and the variational
principle (attributed to Leibniz around 1707) in particular—we employ our
decision criterion for inferring probability distributions conditional on our
limited information.
Our objects of interest are most often unobserved probability distributions.
We infer these probabilities using the tools of info-metrics. With these inferred
probability distributions we can predict, or further infer, any object of interest.
However, dealing with observed information as the inputs for our inference
means that in practice we deal with frequencies rather than with “pure” proba-
bilities. In our analyses, I do not differentiate among these quantities. I provide
the reasoning for that in Box 2.1. Briefly stated, we can say that probabilities
are never observed. We can think of probability as a likelihood—a theoreti-
cal expectation of the frequency of occurrence based on some laws of nature.
Some argue that probabilities are grounded in these laws. (See Box 2.1 for a
more detailed discussion.) Others consider them to be somewhat subjective or
involving degrees of belief. Regardless of the exact definition or the research-
er’s individual interpretation, we want to infer these probabilities. Stated dif-
ferently, regardless of the exact definition, I assume that either there is one
correct (objective) probability distribution that the system actually has or that
there is a unique most rational way to assign degrees of belief to the states of
the system. These are the quantities we want to infer. Frequencies, on the other
BOX 2.1 } On the Equivalence of Probabilities and Frequencies

I summarize here the reasoning for conceptually comparing probabilities with
frequencies.
One way to interpret probabilities is as likelihoods. The likelihood of an event is
measured in terms of the observed favorable cases in relation to the total number
of cases possible. From that point of view, a probability is not a frequency of
occurrence. Rather, it is a likelihood—a theoretical expectation of the frequency
of occurrence based on some laws of nature.
Another, more commonly used interpretation is that probabilities convey
the actual frequencies of events. But that interpretation holds only for events
that occur under similar circumstances (events that arise from the exact same
universe) an arbitrarily large number of times. Under these circumstances, the
likelihood and frequency definitions of probabilities converge as the number of
independent trials becomes large. In that case, the notion of probability can be
viewed in an objective way as a limiting frequency.
Thus, under a repeated experiment setup, I treat probabilities and frequencies
similarly. This view is quite similar to that in Gell-Mann and Lloyd 1996.
But how should we handle the probabilities of events that are not repeatable—
a common case in info-metrics inference? The problem here is that we cannot
employ the notion of probability in an objective way as a limiting frequency;
rather, we should use the notion of subjective probability as a degree of rational
expectation (e.g., Dretske 2008).
In that case we can relate the notion of subjective probabilities (subjective
degree of rational expectation) to that of a subjective interpretation of
likelihood. Within an arbitrarily large number of repeated experiments (arising
from the same universe), our subjective probabilities imply relative frequency
predictions. But this doesn’t solve our problem completely, as we still may
have non-reproducible events. So I add the following: For non-reproducible
events, probabilities and observed frequencies are not the same. But the best
we can do is use our observed information under the strict assumption that
the expected values we use for the inference are correct. If they are, then we
know that our procedure will provide the desired inferred probabilities. In that
case we can say that our “predicted frequencies” correspond to our “subjective
probabilities.”
Given the above arguments, our formulation in the repeated experiments
section, and the axiom at the beginning of the axioms section—all observed
samples come from a well-defined population (universe) even if that universe
is unknown to us—I treat probabilities and frequencies as likelihoods (even if
subjective at times).
For a deeper discussion of this, see the original work of Keynes (1921), Jeffreys
(1939), Cox (1946), and Jaynes (1957a and b) and more recent discussions in the
work of Gell-Mann and Lloyd (1996, 2003) and MacKay (2003).
hand, are observed. They are not subjective. We use our inferred probabilities
to update our subjective probabilities using the information we have.
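The convergence of frequencies to probabilities in a large number of independent trials can be checked by simulation (a sketch; the event and its probability p_true = 0.3 are made up):

```python
import random

random.seed(7)   # fixed seed so the run is reproducible
p_true = 0.3     # assumed "true" probability of the event

# Observed relative frequency of the event over ever-larger numbers of
# independent trials from the same "universe".
freqs = {n: sum(random.random() < p_true for _ in range(n)) / n
         for n in (100, 10_000, 1_000_000)}
```

The gap between the observed frequency and p_true shrinks roughly like 1/√n, which is the sense in which the two notions converge for repeatable events.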

CONSTRAINED OPTIMIZATION: A PRELIMINARY FORMULATION

Simply stated, the basic inferential problem can be specified as follows. We want
to infer a discrete probability distribution P associated with the distribution of
values of a random variable X, where X can take on K discrete and distinct
values, k = 1, …, K. There are two basic constraints on these probabilities: the
probability of any particular value is not negative, p_k ≥ 0, and the probabilities
must assume values such that ∑_k p_k = 1. The latter is called the normalization
constraint. The random variable X is a certain variable of interest with K mutually
exhaustive states such as the number of atoms or molecules in a gas, or individuals’
wealth, or the incidence of a certain disease. In a more general setting,
the K possible values of X stand for possible states of a social, behavioral, physical,
or other system, or possibly the set of outcomes of a certain experiment, or
even the classification of different mutually exclusive propositions.
Let f_m(X), m = 1, …, M, be functions with mean values defined as
∑_k f_m(x_k) p_k = ⟨f_m(X)⟩ ≡ E[f_m(X)] ≡ y_m, where E[⋅] and ⟨⋅⟩ are synonymous
notations for the expectation operation, and ⟨f_m(X)⟩ and y_m stand for
the expected value of f_m(X). Our inference will be based on M + 1 pieces of
information about the p_k’s, namely, normalization and the M expected values.
This is the only information we have. In this chapter we are interested in
the underdetermined case (the most common case in info-metrics inference),
where (M + 1) < K: the number of constraints is smaller (often very much
smaller) than the number, K, of unknown probabilities that we wish to infer.
Suppose we have a differentiable decision function, H(P ). Then our infer-
ence problem is amenable to standard variational techniques. We need to
maximize (or minimize) H subject to the M + 1 constraints. The mathematical
framework for solving the problem is within the field of calculus, or for contin-
uous probability density functions it is within the calculus of variations (or the
variational principle), by searching for a stationary value, say the minimum
or maximum, of some function. Expressed in mathematical shorthand, the
constrained optimization inferential framework is:
Over all probability distributions {P}:

    Maximize H(P)
    subject to                                              (2.1)
    ∑_{k=1}^{K} p_k = 1
    and
    ∑_{k=1}^{K} f_m(x_k) p_k = ⟨f_m(X)⟩ ≡ y_m,   m = 1, …, M;   M < K.
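Problem (2.1) can be made concrete with a small numerical sketch (mine, not the book’s: the die setup, function name, and bisection solver are illustrative choices). Here X is a six-sided die, the single constraint beyond normalization is a known mean, and H is the Shannon entropy of Chapter 3. The maximum entropy solution is known to take the exponential form p_k ∝ exp(λ x_k), so the whole inference reduces to finding the scalar multiplier λ, anticipating the unconstrained (dual) formulation developed in Chapter 4.

```python
import math

def max_entropy_die(target_mean, faces=range(1, 7), tol=1e-12):
    """Infer the maximum-entropy distribution over `faces` whose expected
    value equals `target_mean` (Jaynes's die problem). The solution has the
    exponential form p_k proportional to exp(lam * x_k); we solve for the
    single multiplier lam by bisection."""
    xs = list(faces)

    def mean_for(lam):
        # Expected value of X under the exponential-family candidate.
        w = [math.exp(lam * x) for x in xs]
        z = sum(w)
        return sum(x * wi for x, wi in zip(xs, w)) / z

    lo, hi = -50.0, 50.0  # mean_for is increasing in lam, so bisection works
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(lam * x) for x in xs]
    z = sum(w)
    return [wi / z for wi in w]

probs = max_entropy_die(4.5)
```

A target mean of 3.5 returns the uniform distribution; means above 3.5 tilt probability toward the high faces.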
BOX 2.2 } A Simple Geometrical View of Decision-Making for Underdetermined Problems

Consider the simplest underdetermined problem (top panel of the figure


below) of solving for x1 and x 2 given the linear condition W = α1 x1 + α 2 x 2 where
α1 and α 2 are known constants and the value of W is the constraint. (See dark
line, with a negative slope, on the top panel of the figure that shows the x1 and
x 2 plane.) Every point on that (dark) line is consistent with the condition. As
the plot shows, there are infinitely many such points. Which point should we
choose? We need a “decision-maker.” Such a decider can be mathematically
viewed as a concave function H whose contours are plotted in the figure.
Maximizing the value of the concave function H subject to the linear constraint
W makes for a well-posed problem: at the maximal value of H that just falls on
the constraint we determine the unique optimal solution x1* and x 2* (contour C2
in the figure).
Mathematically, the slope of H must equal the slope of the linear function
at the optimal solution. The optimal solution is the single point on H where the
linear constraint is the tangent to the contour. (I ignore here the case of a corner
solution where only one of the two x’s is chosen and it is positive.) The contours
to the northeast (above C2 ) are those that are above the constraint for all their
values. Those that are below C2 intersect with the constraint at more than a single
value. Only C2 satisfies the optimality condition.
The bottom panel provides a similar representation of an underdetermined
problem, but this time it is the more realistic case where the value of the constraint
may be noisy. The light gray area in between the dashed lines captures this “noisy”
constraint (instead of the dark line in the top panel). The noisy constraint can
be expressed as W = α1 x1 + α 2 x 2 + ε , where ε represents the noise such that
the mean of ε is zero. Equivalently, we can express the noisy information as
W + ∆W = α1 x1 + α 2 x 2 , where now ∆W captures the realized value of the noise.
Again the solution is at the point where the plane is tangent to H, but this time the
optimal solution is different due to the noise.
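The tangency argument above can be checked numerically. The sketch below uses hypothetical values for α1, α2, and W, and a simple concave decider H(x) = −(x1² + x2²) standing in for the entropy criterion the book adopts later; it finds the unique point on the constraint line where a contour of H is tangent to it.

```python
# Sketch of the box's picture: infinitely many (x1, x2) satisfy
# W = a1*x1 + a2*x2; a concave "decider" H selects one of them.
# H(x) = -(x1**2 + x2**2) is an illustrative choice, not the
# entropy criterion the book ultimately adopts.
import numpy as np
from scipy.optimize import minimize

a = np.array([2.0, 1.0])       # known constants alpha_1, alpha_2
W = 10.0                       # value of the constraint

res = minimize(lambda x: x @ x,                     # maximize H = -x'x
               x0=np.zeros(2),
               constraints=[{"type": "eq", "fun": lambda x: a @ x - W}],
               method="SLSQP")
x_opt = res.x

# Tangency: grad H must be proportional to the constraint normal,
# which gives the closed form x* = W * a / (a @ a).
x_closed = W * a / (a @ a)
print(x_opt, x_closed)          # the two agree: [4., 2.]
```

The closed-form line is exactly the slope-matching condition described in the text: at the optimum the constraint line is tangent to a contour of H.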
SOME EXAMPLES OF OPTIMIZATION PROBLEMS
Economics—Consumer: Given two goods x1 and x 2 , the linear line is the budget
constraint; the α’s are the prices of each good (the slope of the line is the price
ratio), and H is a preference function (utility function, which is strictly quasi-
concave). The x * ’s are the consumer’s optimal choice.
Economics—Producer: If the x’s are inputs, the linear line is the input price ratio
and H in this case is some concave production function.
Operations Research: Any optimization problem, of any dimension, where we
need to find an optimal solution for a problem with many solutions. The H to be
used is problem specific; often it is in terms of minimizing a certain cost function.
Info-metrics: All the problems we deal with in this book where the objective
is to infer certain quantities (e.g., probabilities) from partial information. H
is the entropy (as defined in Chapter 3). The x*’s may be the optimal choice of
probabilities.
FIGURE BOX 2.2. Constrained optimization of an underdetermined problem: a graphical representation. The
figure shows the optimal solution of solving for x1 and x 2 given the linear condition W = α1 x1 + α 2 x 2 ,
where α1 and α 2 are known constants and the value of W is the constraint. The top panel shows the perfect
case where there is no additional noise. The dark line, with a negative slope, is the constraint. Every point on
that (dark) line is consistent with the condition. There are infinitely many such points. We use the concave
function H, whose contours are plotted in the figure, to choose the optimal solution. The unique optimal
solution x1* and x 2* (Contour C2 in the Figure) is at the maximal value of H that just falls on the constraint.
The contours to the north-east (above C2 ) are those that are above the constraint for all their values. Those
that are below C2 intersect with the constraint at more than a single value. Only C2 satisfies the optimality
condition. The bottom panel provides a similar representation of an underdetermined problem but where the
value of the constraint is noisy. The light gray area in between the dashed lines captures this “noisy” constraint:
W = α1 x1 + α 2 x 2 + ε , where ε represents the noise such that the mean of ε is zero. Again, the solution is at
the point where the plane is tangent to H but this time the optimal solution is different due to the noise.
In words: We maximize the decision criterion H subject to the constraints.
In Chapter 3, we will propose a decision criterion that captures the distance
(in information units) between our desired quantity P and a state of complete
uncertainty (no constraints are imposed).
The optimal solution generated by the prescription (2.1) is the inferred
solution. I must emphasize that often we do have some prior knowledge about
the possible outcomes of X. That knowledge comes from nature, behavior, or
society. It is obtained through all the information we know about the system
apart from the observed sample. Stated differently, these priors arise from
some fundamental properties of the system or from other logical reasoning.
Unlike the constraints that must be satisfied within the optimization proc-
ess, the prior information is introduced into the optimization in the criterion
function H (P ). In Chapter 8, I will discuss that prior information and will
develop different ways of specifying and formulating it within our framework
of rational inference. For now I just acknowledge the existence and impor-
tance of such information.
The Basic Questions
Using the inferential framework (2.1), we optimize a decision function
subject to the constraints. The function we optimize allows us to make a decision
about the solution for our problem. This framework provides a logical way
for converting underdetermined problems into well-posed ones. But the main
issues that are yet to be resolved are:
A. What decision (objective) function (H in our notations here) should
be used?
This is the subject of this chapter: while the points below are all
central, they depend on the specific problem analyzed, on the
observed information, and on other considerations. The choice of H is
the paramount issue.
B. What do we optimize on?
Though this is problem specific, it is a fundamental issue that arises in
each problem. I discuss this issue throughout the book and in every
example I provide.
C. How should the constraints be specified?
I discuss this in Chapters 4 and 9 as well as in the applications
presented.
D. Is there prior information, or a fundamental reference state, that should
be used? This important issue is the subject of Chapter 8.
E. What other soft information (theory, axioms, assumptions, conjectures,
beliefs, intuition) can be used?
Rational Inference: A Constrained Optimization Framework } 19
These two issues are discussed in Chapters 8–9 and 12–13 and in the exam-
ples provided throughout the book.
As discussed in Chapter 3, the objective function in the info-metrics
approach is an informational one. I will show that out of all possible infor-
mational criteria, the one due to Boltzmann-Gibbs-Shannon is the most rea-
sonable one to use. That criterion, to be quantified and studied in Chapter 3,
is H(P) = −∑_{k=1}^{K} p_k log p_k. I must emphasize that while there are different
subtleties in the exact meaning of what H is, for our info-metrics formulations we
can ignore these differences.
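As a quick sanity check on this criterion (an illustration, not part of the text): computing H for a few candidate distributions over K = 4 outcomes shows that a degenerate distribution carries zero entropy while the uniform one attains the maximum, log K.

```python
# A quick check of the criterion H(P) = -sum_k p_k log p_k: among a
# few candidate distributions over K = 4 outcomes, the uniform one
# carries the highest entropy, i.e., it is farthest from certainty.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log 0 is taken as 0
    return -np.sum(p * np.log(p))

candidates = {
    "certain": [1.0, 0.0, 0.0, 0.0],
    "skewed":  [0.7, 0.1, 0.1, 0.1],
    "uniform": [0.25, 0.25, 0.25, 0.25],
}
for name, p in candidates.items():
    print(f"{name:8s} H = {entropy(p):.4f}")
# certain -> 0.0; uniform -> log(4), about 1.3863
```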
So far I have argued that we need to choose a criterion that will identify
one of the many solutions—the one I call optimal. We then use that criterion
within a constrained optimization framework to carry out our inference. The
resulting solution is our inferred distribution. We now ask how to determine
the criterion for that decision.
Motivating Axioms for Inference Under Limited Information
In this section I briefly discuss, in a qualitative way, fundamental axioms that
can be used to validate the method of info-metrics. Since we are dealing here
with a framework for inference in inherently underdetermined problems, the
axioms should be defined on the requirements for the solution. The solution
is in terms of a probability distribution. Sometimes it is useful to extend the
framework to a set of probability distributions in multivariate or multientity
problems where the constraints are defined jointly over different distributions.
Such cases are shown throughout the book.
There are several consistent sets of axioms that are not independent but
which are different in terms of the spirit of their requirements. As is to be
expected, each one of these sets of axioms yields the very same info-metrics
framework. For philosophical reasons, I present here four sets of such axioms;
each one covers a different facet of the problem. I do not discuss here the pros
and cons of each one of these sets. For this the reader is referred to the exten-
sive literature on that issue. Generally speaking, I find it rather convincing that
all axioms lead to the same conclusion.
Before I state the axioms and conclusion, I have to specify two attributes
that are common to all the axioms I discuss. First, the axioms provided here
are independent of noise. They are defined for information in general. Second,
there is one fundamental axiom that is common to all of these sets. In fact,
I believe that it applies to practically all inferential problems. I assume that
all observed samples come from a well-defined population (universe) even
if that universe is unknown to us. That means that if we are able to design an
experiment and repeat it within that (possibly unknown) population, the basic
statistical properties captured by the samples emerging from those repeated
experiments must be the correct properties. (In Chapter 9 I discuss more
complex problems where the underlying structure is unknown and keeps
evolving.)
AXIOMS SET A: DEFINED ON THE DECISION FUNCTION
For historical perspective I start with the set of axioms specified by Jaynes
(1957a) in his original work. Jaynes’s objective was to infer a discrete proba-
bility distribution with K ≥ 2 outcomes that are mutually exclusive and such
that all possible outcomes are included, so that the distribution is normalized:
∑_{k=1}^{K} p_k = 1. His choice of outcomes was dictated by the requirement that the
conditions for “consistency” of inference must be the same as those previously
introduced by Shannon for the definition of information, which is discussed
in Chapter 3. Under this requirement H must be the entropy function as used
by Boltzmann, Gibbs, and Shannon.
AXIOMS SET B: DEFINED ON THE INFERENCE ITSELF
In this case the inference itself is assumed to be an optimization problem.
The axioms show that the info-metrics inference rule is a correct and unique
rule of inference. The axioms are the requirements necessary to determine the
exact form of H, but the axioms are on the inference rule itself. For clarity
I specify five requirements, but it is possible to condense these into a smaller
number of requirements. As with other inferential methods, it is required that
the function we search for must be independent of the information used in
the inference (it cannot be problem specific or a function of the information
used—say, the constraints). The discussion of this set of axioms is based on the
work of Shore and Johnson (1980), Skilling (1988, 1989), and Csiszar (1991).
This set of axioms is for a more general framework of problems. In these
problems, the objective is to identify the best inferential rule that will convert
some prior probabilities into inferred probabilities when new information (the
constraints) is taken into account. In the previous case, (2.1), the prior prob-
abilities were implicitly taken to be uniform. (Chapter 8 elaborates on that
point.)
Given the problem (2.1), what function H is acceptable in the sense that
it satisfies a set of axiomatic requirements? The requirements are the follow-
ing. First, uniqueness (the solution must be unique). Second, invariance. This
axiom states that if we solve a given problem in two different coordinate sys-
tems, both sets of inferred solutions are related by the same coordinate trans-
formation. The solution should not vary with a change of coordinates, say
from Cartesian to polar coordinates. This requirement means that the actual
system of coordinates used has no information in it. Third, system independ-
ence. This one deals with the totality of information in the inferential process.
It says that it does not matter if we account for independent pieces of informa-
tion about independent systems in terms of separate probability distributions
(or densities) or in terms of one joint distribution (or density).
The fourth requirement is subset independence. It means that it does not mat-
ter if we treat disjoint or independent subsets of the system states in terms of
conditional densities or in terms of a full density. A simple example inspired by
the work of Uffink (1996) may help here. Consider forecasting the results of the
next election. There are two parties: Democrats (D) and Republicans (R). Say
the former includes Green Party voters and the latter includes Tea Party voters.
A recent Republican poll provided some new information about that party’s
potential voters. It shows a shift from Tea Party to mainstream Republicans.
The subset independence requirement states that the updating inferential rule
will be such that it affects only the non-Democrat part of the distribution.
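The subset-independence requirement can be illustrated numerically with hypothetical polling numbers. The sketch below updates a prior over four voter types by minimizing the relative entropy ∑_k p_k log(p_k/q_k) subject to the new Republican-only information, and then checks that the conditional split among Democrats is exactly the prior's; all numbers here are invented for illustration.

```python
# Sketch of subset independence with hypothetical polling numbers:
# update a prior over {D-Green, D-other, R-Tea, R-main} with new
# information about Republicans only, by minimizing relative entropy
# sum_k p_k log(p_k / q_k). The Democrat part should be untouched:
# its conditional proportions stay at the prior's.
import numpy as np
from scipy.optimize import minimize

q = np.array([0.10, 0.40, 0.25, 0.25])   # prior: DG, DO, RT, RM

def kl(p):
    return np.sum(p * np.log(p / q))

# New info: within the Republican subset, the Tea Party share is 30%,
# i.e. p_RT - 0.3 * (p_RT + p_RM) = 0 (a linear constraint).
cons = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},
    {"type": "eq", "fun": lambda p: p[2] - 0.3 * (p[2] + p[3])},
]
res = minimize(kl, q, bounds=[(1e-9, 1.0)] * 4, constraints=cons,
               method="SLSQP")
p = res.x

# Conditional split among Democrats is unchanged: 0.10/0.40 = 1/4.
print(p[0] / p[1], q[0] / q[1])
```

Because the constraint involves only the Republican cells, the optimality condition scales the two Democrat cells by a common factor, leaving their ratio at the prior value — precisely the subset-independence behavior described above.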
The last axiom is scaling. It requires that in the absence of new information,
the inferred probability distribution must be equal to the prior probabilities.
Though this seems trivial, it makes much sense. If not, the inferred probabili-
ties may haphazardly change even in the absence of new information.
The following theorem (originated by Shore and Johnson [1980]) states that
for the basic problem (2.1), with a finite information set, the only probabil-
ity distribution that satisfies these five axioms, resulting from an optimiza-
tion procedure, must be the one with H being a generalized version of the
Boltzmann-Gibbs-Shannon entropy known as the relative entropy (Chapter 3).
This theorem holds for both discrete and continuous random variables.
Since this set of axioms was the original set of axioms justifying the infer-
ential rule itself for a large class of problems within info-metrics, it initiated a
large body of work, critique, and discussion in the literature. I therefore provide
a more mathematically concise statement of these axioms in Appendix 2A.
AXIOMS SET C: DEFINED ON THE INFERENCE ITSELF
In this case the inference itself is not within an optimization framework. Only
two natural requirements are imposed. These two requirements, called con-
sistency conditions here, must be satisfied by any information processing rule.
The information processing rule itself is viewed as an “algorithm” for inducing
a probability distribution for reproducible experiments. The first requirement
is that the information processing algorithm is “uniform”: information coming
from the same underlying process (population) must be treated equally. The
second requirement is reproducibility: the observed information must come
from a reproducible experiment. (See also the discussion in the notes to this
chapter.)
A theorem shows that under these requirements the same function H (the
entropy, as discussed in the previous case) arises. This set of axioms, how-
ever, holds only for inference of discrete probability distribution conditional
on linear conservation laws (linear constraints). This framework is due to
Tikochinsky, Tishby, and Levine (1984a, 1984b).
AXIOMS SET D: SYMMETRY
In this case, the inference itself could be of any form. This requirement is based
on the principle that the information processing rule chosen must yield a solu-
tion (probability distribution) that is invariant under the symmetry opera-
tions that the system admits. All the problem’s symmetries (properties) must
be transformed (from the prior information), via the information processing
rule, to the inferred solution. One can think of this as the requirement for con-
servation of properties from “input” to “output.” A theorem shows that, again,
the entropy function is the only function satisfying that invariance require-
ment. In addition to its simplicity, the advantage of this formulation is that
it allows the observer to identify and specify the information correctly. This
formulation is due to the work of Levine (1981).
Inference for Repeated Experiments
We now limit the scope of our considerations to a rather special, and yet quite
important, set of inferential problems: inference based on information result-
ing from repeated independent experiments. These circumstances are com-
mon in the natural sciences, but they can also occur in more general settings.
We consider an individual event that can result in a number of distinct out-
comes and an actual experiment that is viewed as a large number of independ-
ent repetitions of the same event.
Repeated tossing of the same die is an example where the outcomes of the
experiment are the long-run means of the observed occurrences for each face
of the die. In this case, tossing the die is the “event” and the distinct outcomes
are the numbers 1 through 6 on the upper face of the die. The experiment is the
independent repetitions of tossing the same die.
Boltzmann (1872), on the other hand, was interested in the properties of
dilute gases, where molecules move essentially freely, apart from occasional
collisions upon which their velocity can change. He divided the velocity range
into small intervals and wanted to infer the fraction of all molecules with
velocity in a given velocity bin. He already knew that even in a dilute gas there
is a huge number of molecules in a small volume. He argued that only the most
probable macro state (i.e., distribution of velocities) will be observed under
such circumstances.
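Boltzmann's "most probable macrostate" argument can be reproduced in miniature (with illustrative bin values and totals that are not from the text): enumerate every occupation vector consistent with a linear constraint and pick the one with the largest multiplicity N!/∏_k N_k!.

```python
# Boltzmann's argument in miniature: N "molecules" over three velocity
# bins with values 1, 2, 3, and a fixed total of 60 (mean 2.0). Among
# all occupation vectors (N1, N2, N3) consistent with the constraint,
# the one with the largest multiplicity N!/(N1! N2! N3!) dominates.
# The bin values and totals here are illustrative, not from the text.
from math import lgamma

N, total = 30, 60
best_logW, best_occ = None, None
for n1 in range(N + 1):
    for n2 in range(N + 1 - n1):
        n3 = N - n1 - n2
        if 1 * n1 + 2 * n2 + 3 * n3 != total:
            continue
        # log multiplicity via log-gamma: log N! - sum_k log N_k!
        logW = (lgamma(N + 1) - lgamma(n1 + 1)
                - lgamma(n2 + 1) - lgamma(n3 + 1))
        if best_logW is None or logW > best_logW:
            best_logW, best_occ = logW, (n1, n2, n3)

print(best_occ)   # (10, 10, 10): the most probable macrostate
```

The winning occupation (10, 10, 10) is the maximum-entropy (here uniform) distribution for a mean of 2, illustrating why only the most probable macrostate is observed when the number of molecules is huge.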
In a more formal way, let there be N molecules with N_k molecules
occupying velocity bin k. The set {N_k} of such occupation numbers must clearly
them being the man who had met me on the Sheikh’s behalf. They
shouted that I should come down and live wherever I pleased with
the other men, and when I replied that I was well installed, they
informed Erzib that fowls, eggs, and bread would shortly be sent,
that the stranger guest might have a really good meal prepared for
him.
Through a very broad gateway I descended into a court. Opposite
was a long house with its own entrance, to the right another
resembling it; and between the two was a passage leading to a third
dwelling that was situated at the back. To the left was a wall.
On the flat roof of the nearest of these houses stood some
enormous rush-bins for corn, and in the courtyard was another.
There also were two fireplaces, one on either side, screened off with
branches. Behind the screen to the left sat a woman laying small
faggots on the fire to warm her hands, for it was cold since the sun
had set. Some children came out of the door, but fled when they
caught sight of me, wrapped as I was in the folds of Erzib’s burnous.
From the door on the left peeped out an elderly and rather nice-
looking woman.
These two were Erzib’s wives: each had her own house; the
children belonged to the woman I saw seated by the hearth.
Erzib told his wives to come forward. This they did quite naturally
and willingly, retiring again after I had shaken hands with them.
Soon after, I saw people arriving with screaming fowls and a
basket of eggs and bread. Erzib at once drew his knife and vanished
with the hens—his intentions were easy to divine.
In the meanwhile carpets had been spread on the floor of the
house, and a couch arranged for me. I lit a candle which had been
placed in a small square recess in the wall. The room was very
irregularly shaped. The floor was of beaten clay, and the walls not
whitewashed. In the background a door led into another room
containing a loom, and where gala dresses hung on a cord, and
household goods on the wall. Through yet another door in the wall to
the right was a room with a bed in it raised on four slight stumps: this
was made of twigs, and had no coverings.
This dwelling was inhabited by the younger wife and her children
—two boys and a little girl. The wife was pretty and not old.
In the house in the courtyard the elder wife resided. In this the
anteroom was larger, and contained household goods and
implements; behind it was the sleeping apartment.
A grown-up married son, then absent, occupied a house tucked
away at the back, and designed on the same plan as the others. His
wife was at home.
Whilst the pile of wood burnt and crackled in the yard and the
women were busy preparing food, I sat on a bank outside the house
in company of my host and several other men.
The moon had risen and shone clearly over mountain and vale. I
could see down into a courtyard at the foot of the slope, where a fire
burnt brightly on a hearth. Over it hung a cauldron watched by the
housewife. She was young and pretty, and as she moved to and fro
a couple of little children trotted after her. Now and then she stood
still, shading her eyes with her hand, and gazing up in our direction;
possibly in the stillness of the night our voices reached her, for it was
not likely that she could see us. By the hearth a white dog lay and
growled, and when the woman paused and looked up he moved
restlessly, for he also was watching the stranger.
Erzib’s first wife came out and stood leaning against the doorway.
She did not speak, but was evidently interested in our conversation.
Her husband glanced at her and said abruptly—
“She has a great sorrow, and has grieved and wept for many
years. Ali, her only son, who was in service at Gabés, was sent to
prison, accused of having stolen money from the tradesman he
served. But he was innocent—that we know; he was a good boy, and
his mother loved him. It is now four years and four months since we
heard from him, and eight months more must pass before we can
have him home again.”
“Do you not even know if he lives?”
“Yes, we have learnt through strangers that he is alive, and
supposed to be imprisoned at Bona in Algeria.”
The old woman drew herself along the wall till she was close to
me when she heard of what we were talking.
“Are you from Bona?” she asked, whimpering.
“No,” I replied, “I come from a much more distant place, and have
never been in Bona.”
“Ah! then you do not know Ali,” she said, with a sob.
“No, poor woman,” I replied; “that I do not; but now you will soon
see your son alive. You have waited so long for him that the
remaining time will soon pass ere he return to you and be happy with
you again, for you love him. He will have thought so often of you,
and he will be so good to you that both of you will rejoice.”
“Ah! it was a great misfortune, for he was innocent—I am sure of
that; another must have been the culprit, for he was so young.”
“How old was he?”
“That I do not remember.”
“Don’t you know what year he was born?”
“No, I cannot recollect; we never know anything of that.”
“Don’t you know either, Erzib?”
“No, Sidi; but it was before the strangers came to this country.”[2]
The poor woman sobbed audibly, and Erzib pushed her inside the
door that her weeping might not trouble me, saying, “She is very
unhappy, Sidi.”
“Oh yes, Erzib. Would I were able to help you to get back your
son sooner, or at least to procure you tidings of him. But this I can
promise—I will speak to the Khalifa of Gabés on the subject, and, if
possible, send you greeting from your son.”
To my regret, however, I must confess that I was unable later to
do anything for these poor folk. Whether the boy is still in prison I
know not, and whether innocent or no, I know less. My sincere hope
is that he may be worthy of his parents’ touching affection.
The repast was now brought and set out in the house, on the clay
floor, where I enjoyed it; the father, surrounded by his children whom
he caressed, sat aside with Hamed and the younger wife.
When I had finished, and Hamed and Erzib had also eaten, we
remained seated. I talked with the wife about her children. The eldest
may have been about ten years old; he was a lively boy, who nodded
continually to me, and was indefatigable in showing me all the
treasures of his home, from an old musket to his father’s agricultural
implements. When I showed surprise at a very primitive and curious
harrow used to break up the soil, his father gave it to me.
Next in age to the boy was a very pretty little girl about six years
old. Unfortunately she had lost one eye; her father told me that it was
in consequence of a severe attack of inflammation when she was
quite little, and that the eye had fallen out of itself. Here in the south
one meets with an alarming number of people who are blind or
suffering from eye complaints. A doctor told me that many are born
thus; with others it is the result of dust, heat, and uncleanliness.
The youngest child was a bright little fellow of two, who clung to
his father, whose neck he clasped tightly in his arms.
Feeling disposed to take a stroll before retiring to rest, I bade
Erzib follow me. As we crossed the court, he inquired whether I
would not like to see all the dwellings. Accordingly we went first to
visit the elder lady. When we entered with a light we found her
crouching in a corner, her face buried in her hands; beside her lay a
large dog which growled at me.
Thence we went into the son’s house. Asleep on the bed, quite
dressed,—for the natives never undress at night,—was a woman
wrapped in blue clothing; she was evidently the son’s wife.
We walked on and up amongst other houses till we were nearly at
the top of the village. Beneath, we saw the lights and fires in the
courts, and heard the incessant barking of dogs. Shortly after, we
climbed a difficult ascent just over the village, to a ledge or terrace of
some width cut in the side of the cliff, which from thence rose, quite
straight and steep, to the old deserted village that lay in darkness on
the very summit. According to Erzib, we could not reach it from the
side we were on.
I contented myself with examining some real cliff caves, which I lit
up by means of matches. They were excavated from the terrace,
and, according to tradition, had once been inhabited; they were
irregular in form, and not very large.
After an hour’s enjoyment of the beautiful evening, we descended
from this high point.
Wrapping myself in my burnous I lay down on my couch on the
ground; in the same room lay both Hamed and Erzib. In the side
chamber, of which the door remained open, slept the children and
their mother. Just as I was falling asleep a woman came and spread
a covering over me; it warmed me well, and I slept till daybreak, and
was only once disturbed by a little kid coming in through the open
door leading from the courtyard and tripping over me. I heard then
the children, who with their mother were sleeping in the next room,
Hamed and Erzib moving on their beds, and, out of doors, the distant
and continuous barking of dogs. I slept again, and when I awoke saw
that what had been spread over me was a brand new festal garment
that evidently was considered none too good for the guest.
From the doorway overlooking the courtyard I saw through the
gate and down into the valley, where grew a solitary palm, and at the
same time had a view of the flat roofs of several houses, and of the
path where the horses and mules stood ready saddled. From a side
chamber the head of a cow came peering in at the gate, and above
the gateway a white dog lay on the wall watching me.
I gave some money to the children, ate a couple of dates with a
sup of water, and, having thanked the women for their hospitality,
mounted, with Erzib in front and Hamed behind me. As we left, the
women came out to throw refuse down the slope, and vanished
again behind the wall.
From the hearths rose a light blue smoke that was wafted over the
valley beneath us.
We had a view over the mountains of the valley, the plains, and
the Mediterranean Sea, as we followed the route along the western
declivity of the Matmata range, which commands the low-lying land
that extends right away to Tripoli.
For a while we were accompanied by two women who were on
their way to the mountains. They tripped along beside our horses,
and stared at me in astonishment through carelessly drawn veils.
The mountain tops, where lie the villages of Shenini and Sguimi,
are a continuation of the southern range. As I was aware that the
inhabitants of these villages were absent sowing their crops, and
having been told that the dwellings were similar to those I had
already seen, I decided not to visit them. We therefore left them on
one side and rode down the mountain and across a small plain
encircled by hills, behind which lie the great steppes. Towards the
east this plain is bounded by low hills, where water springs are
found, and where we could descry herds grazing. It was here that,
when passing through a little thicket, we spied a covey of partridges
running amongst the bushes. Erzib tried to fire at them from his
horse, but it would not stand long enough, and when he got off it was
too late—the birds had flown.
Before traversing the last of these hills, we halted and partook of
dates, bread, and water, as many hours would elapse before we
could arrive at any place of habitation.
The ride on the mule had tired me, so I preceded the others on
foot, and reached the farther side of the acclivity. There lies an
interminable flat plain stretching as far as the eye can reach from the
east to the north-west; whilst towards the south the mountains fade
away in long undulations. In the midst of the plain I distinguished a
hill, and on its summit what appeared to be a tower or fortress. This
was the signal station near Metamer. It corresponds to the one we
saw near Gabés, and also to another farther south.
I wandered down the gentle slope, through bushes and among
stones, and crossed the bed of the river, that, coming from the
mountains, winds out into the plain. There were many paths, all
leading in an easterly direction. I followed one of these, crossed yet
another stony torrent bed, and continued steadily towards the east,
making the signal station my point of direction; until, looking round, I
discovered the two riders in their white burnouses far away towards
the south. They beckoned to me, as we were compelled to make a
détour to avoid a rough and uneven river bed.
Joining once more my party, we rode farther and farther over the
plain, which becomes dismally desolate and monotonous; with the
exception of the hill and its signal station, nothing breaks the long
line of the horizon.
At last we viewed in the distance a couple of palm trees, and
concluded that the Ksar of Metamer was probably near them, but we
could not see it at all, as it lay in a hollow.
For long, naught but these trees showed on the level horizon.
Then at last the tops of other palm trees appeared, and a little later
some huts; the number of these increased, and proved to be the
outskirts of the town. The huts—of straw and branches—were round,
as a rule, with a pointed thatch. But it was easy to infer that the
inhabitants were absent, as the network which usually encloses the
verandah that runs round each hut had been removed, and only the
centre of the huts remained, their thatched eaves sticking out all
round, so that they resembled thick mushrooms on short stalks.
As the day advanced, the heat became stifling, so that I took off
my gaiters and bared my legs. But after a couple of hours they were
so scorched by the sun that, on arrival at Medinin, I had to ask a
doctor to dress them for me, to ease the pain of the sun-scorch, and
it was eight days before they recovered.
CHAPTER XII

Metamer and Medinin

Arriving at the palm grove in the hollow we had seen from the
distance, we found that it lay by a river bed. The trees were not
particularly well cared for, as could be seen at a glance; they were
far apart, and there were few ditches for irrigation.
On a slope to the east of the valley and above it, there is a village
of peculiar construction, with whitewashed buildings that are dazzling
in the sunlight. This is the “Ksar” Metamer. The ground plan of the
houses is oblong and rectangular, and their raised roofs are vaulted.
They lie lengthwise, as the houses do at home in towns dating from
the Middle Ages—the gable ends turning towards the streets. In
general they are erected round an open square. The fronts of those
facing the plain are without any aperture, except some loop-holes
here and there. In other words, every quarter, and also the town as a
whole, forms a little fortress. This is the style of building adopted
here in the plains; it is, in fact, the same plan as that employed in
cave construction, but in this case carried out aboveground; since
the natives have found it impossible to reach the inaccessible
mountain peaks, or to dig down into the rocky ground. The houses
are very often seven storeys high. On every storey there is a well-
barred door to the inner gable. This is reached by steps or by stones
projecting from the walls. The effect is most peculiar and
picturesque. Each inhabitant carries in his hand a key that he takes
with him everywhere. This locks his rooms, which are mostly used as
corn stores.
Not far from the “Ksar” are barracks for the little garrison, and
shops that supply the needs of the soldiers, not only of the place, but
also of those quartered in the neighbouring town of “Medinin.” I did
not wish to visit the camp just then, so dismounted outside an Arab
dwelling, and was invited to enter and partake of stewed kid.
After a hurried visit to the town, and having taken leave of Erzib,
who desired to ride a long way towards his home that evening, I
procured a new guide and rode eastwards over the plain, so as to
arrive before nightfall at the Ksar of Medinin. As we approached its
neighbourhood we turned into the highroad from Gabés.
Before us and to our left lay the “Ksar” of Medinin, illuminated by
the evening sun. The ends of the houses were turned outwards,
producing the effect of a circular wall scalloped at the top. Above
these vaulted gable ends I caught a glimpse of higher buildings, and
amongst them, in the centre of the town, a large square block. This
was the Kasba. Through a narrow opening in the row of houses I
saw the inner gable ends of dwellings, and doors disposed one
above the other, the whole calling to mind the pictures one sees of
Mexican “pueblos.”
[Illustration: A STREET IN BENI BARKA. Parts lay in deep shadow, parts blinding white in the sunshine.]

These lights and shadows were mingled in such dazzling contrast
that the eye could scarcely discriminate what it beheld.
We rode along the exterior wall till we came to some palms;
farther on grew others. These plantations are to the south of the
Ksar and between it and the European quarter, which showed up
gradually on the right, and consisted of barracks for the cavalry and
infantry, quarters for the officers, and those occupied by the “Bureau
de Renseignement.” The soldiers work amongst the palms, and have
enclosed a plot of ground as a garden. In the beds I saw tender
young green plants sprouting, which proved to be cress. In the open
square in front of headquarters, and before the other houses, holes
were being dug for plants by soldiers in light linen clothing.
In the future the whole military quarter will be surrounded by a
beautiful palm grove, affording shade to the dwellings now
completely exposed to the glare of the sun.
I rode up to headquarters—a large building—where the flag was
hoisted half-mast high on account of the death of Marshal
MacMahon.
Lieutenant Henry, who was at the Bureau, came out to welcome
me. He told me that I was expected, and added that I should meet
the officers of the 4th Light Brigade, whom I had known well at
Gabés, they having arrived to relieve the southern station. I was
quickly conducted to real bachelor’s quarters, consisting of a couple
of rooms. All over the walls hung weapons and curiosities collected
in these regions. The furniture, though camp-like, was very
comfortable. At last I was able to indulge in the luxury of a bath and
change.
In the meantime Hamed arrived to say farewell. He wished to ride
back to Metamer on his donkey and accompany Erzib as far as
Tujan, whence he hoped to take the donkey back to Hadeij, and
return later to Gabés.
When I was dressed I called on the Commander-in-Chief of the
district, Commandant Billet, a young man, who invited me to be his
guest.
When I told him that I was most anxious to meet some Tuareg if
possible, he replied, to my great joy, that by riding some thirty-two
miles farther south I should probably have my wish gratified, as a
telegram had just arrived from the signal station that two of these