Connectionist Approaches in Economics and Management Sciences


CONNECTIONIST APPROACHES IN ECONOMICS

AND MANAGEMENT SCIENCES


Advances in Computational Management Science
VOLUME 6
Connectionist Approaches
in Economics and Management
Sciences

Edited by

CEDRIC LESAGE
CREREG CNRS,
University of Rennes, France

and

MARIE COTTRELL
SAMOS MATISSE CNRS,
University of Paris 1, France

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.
A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-1-4419-5379-7 ISBN 978-1-4757-3722-6 (eBook)


DOI 10.1007/978-1-4757-3722-6

Printed on acid-free paper

All Rights Reserved


© 2003 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 2003
Softcover reprint of the hardcover 1st edition 2003
No part of this work may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, electronic, mechanical, photocopying, microfilming, recording
or otherwise, without written permission from the Publisher, with the exception
of any material supplied specifically for the purpose of being entered
and executed on a computer system, for exclusive use by the purchaser of the work.
Contents

Editors vii

Acknowledgements ix

Preface xi

Part I: Advances in connectionist approaches in Economics and Management Sciences

Chapter 1: Evolution of complex economic systems and uncertain information 3
J.-P. AUBIN

Chapter 2: Possibilistic case-based decisions 31
D. DUBOIS, E. HULLERMEIER, H. PRADE

Chapter 3: Introduction to multilayer perceptron and hybrid hidden Markov models 49
J. RYNKIEWICZ

Chapter 4: Issues and dilemmas in cognitive economics 71
B. PAULRE


Part II. Applications in Economics and Management Sciences 109

Chapter 5: Working times in atypical forms of employment: The special case of part-time work 111
P. LETREMY, M. COTTRELL

Chapter 6: Work and employment policies in French establishments in 1998 131
S. LEMIERE, C. PERRAUDIN, H. PETIT

Chapter 7: Measuring and optimizing the validity of Means-End data 145
J.-M. AURIFEILLE, S. MANIN

Chapter 8: Synergy modelling and financial valuation: The contribution of Fuzzy Integrals 165
X. BRY, J.-F. CASTA

Chapter 9: Mechanisms of discipline and corporate performance: Evidence from France 183
E. SEVERIN

Chapter 10: Approximation by radial basis function networks 203
A. LENDASSE, J. LEE, E. DE BODT, V. WERTZ, M. VERLEYSEN

Chapter 11: The dynamics of the term structure of interest rates: An Independent Component Analysis 215
F. MORAUX, C. VILLA

Chapter 12: Classifying hedge funds with Kohonen maps: A first attempt 233
B. MAILLET, P. ROUSSET
Editors

Cedric Lesage

Professor at the University of Rennes 1, France. PhD in Management Science.


He is a member of the CNRS-CREREG-UMR 6585 research team. His work deals
with the application of fuzzy logic in management, interactive fuzzy arithmetic
and cognitive management.

Marie Cottrell

Professor at the University of Paris 1 Panthéon-Sorbonne, France. PhD in Mathematics. She leads the SAMOS (Applied Statistics and Stochastic Modelling) research team, part of the CNRS MATISSE-UMR 8595. A scientific expert for several international journals in the field of neural networks, her work deals with stochastic processes, self-organizing algorithms, time-series forecasting and neural techniques.

Acknowledgements

We would like to thank Dominique Maraine, assistant at CREREG, for her helpful kindness and efficiency during the 8th ACSEG meeting held in Rennes in 2001.

Special thanks to Corinne Perraudin, researcher at SAMOS-MATISSE, for her technical help with the statistical analysis of the contributions to the past ACSEG meetings.

Preface

Since the beginning of the 1980's, many new approaches of biomimetic inspiration have been defined and developed for imitating the behavior of the brain, for modeling non-linear phenomena, for providing new hardware architectures, and for solving hard problems. They are named Neural Networks, Multilayer Perceptrons, Genetic Algorithms, Cellular Automata, Self-Organizing Maps, Fuzzy Logic, etc. They can be summarized by the word Connectionism, and constitute an interdisciplinary domain between neuroscience, cognitive science and engineering. First they were applied in computer science, engineering, biological models, pattern recognition, motor control, learning algorithms, etc. But it rapidly appeared that these methods could be of great interest in the fields of Economics and Management Sciences. The main difficulties were the distance between researchers, the differences in vocabulary, and the differences in basic background. The main notions used by these new techniques were not familiar to researchers in the Social and Human Sciences. What are they? Four of them are now very briefly introduced, but the reader will find more information in the following chapters.

1) Multilayer perceptrons: Multilayer perceptrons have been used in numerous scientific fields for more than 20 years. Derived from an analogy with natural neuronal networks, they may be regarded by mathematicians as non-linear, parametric and differentiable functions. Moreover, this class of functions is quite general, as shown by the universal approximation theorem, which states that any continuous function defined on a compact set may be approximated as finely as required by a multilayer perceptron function. On the other hand, their success is due to a great extent to the ease of implementation of learning algorithms and to their good results in practical problems, despite some weaknesses in their theoretical properties. Multilayer perceptrons learn by fitting their parameters, or weights, to a problem in order to minimize an error of prediction or classification. However, for a given architecture, learning algorithms only provide a sub-optimal solution, so numerous random initialisations must be made in order to avoid poor solutions. On the other hand, there exists a pitfall more difficult to avoid: "too good" learning, or "over-learning", of multilayer perceptrons. That phenomenon is comparable to learning the training data by heart: the multilayer perceptron is no longer capable of making satisfying predictions on new examples of the studied problem and its ability for "generalisation" is very poor, whereas this is the key characteristic for the user. One solution for a compromise between the complexity of the model and its ability to generalise consists in penalising the complexity of the multilayer perceptron. Another limit of this kind of model is the hypothesis of stationarity of the phenomenon that one tries to model with multilayer perceptrons, which leads one to suppose that the characteristics of the model do not evolve. A possibility to relax that hypothesis is to use several MLPs and to specialise each of them on a specific part of the phenomenon. This idea leads to hybrid systems including hidden Markov chains. This generalisation illustrates one of the MLP characteristics: the ability to be easily integrated in complex systems and to improve already existing models.
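To make this concrete, the following minimal Python sketch (using numpy; the 1-10-1 architecture, the toy data and the learning rate are arbitrary illustrative choices) fits the weights of a one-hidden-layer perceptron by gradient descent so as to minimize a prediction error:

```python
# Minimal sketch of a multilayer perceptron (one hidden layer) trained by
# gradient descent on a toy regression problem; illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))      # toy inputs
y = np.sin(X)                              # target: a non-linear function

W1 = rng.normal(scale=0.5, size=(1, 10)); b1 = np.zeros(10)   # weights (parameters)
W2 = rng.normal(scale=0.5, size=(10, 1)); b2 = np.zeros(1)

lr = 0.05
for epoch in range(2000):
    h = np.tanh(X @ W1 + b1)               # forward pass: non-linear, differentiable
    y_hat = h @ W2 + b2
    err = y_hat - y                        # prediction error to be minimized

    g_out = 2 * err / len(X)               # backward pass: gradients of the MSE
    gW2 = h.T @ g_out; gb2 = g_out.sum(0)
    g_h = (g_out @ W2.T) * (1 - h ** 2)
    gW1 = X.T @ g_h; gb1 = g_h.sum(0)

    W1 -= lr * gW1; b1 -= lr * gb1         # "learning" = fitting the weights
    W2 -= lr * gW2; b2 -= lr * gb2

print("final MSE:", float(np.mean(err ** 2)))
```

In practice, as noted above, one would repeat this from several random initialisations and monitor the error on held-out data to detect over-learning.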

2) The SOM algorithm: the SOM algorithm, also called the Kohonen algorithm, is an original classification algorithm defined by Teuvo Kohonen in the 1980s (Kohonen, 1984, 1995). The motivation was neuromimetic. The algorithm presents two essential differences with the traditional methods of classification: it is a stochastic algorithm, and an a priori neighbourhood concept between classes is defined. The neighbourhood between classes may be chosen from a wide range of representations: grid, string, cylinder or torus, called Kohonen maps. The classification algorithm is iterative. The initialisation consists of associating with each class a code vector (or representative), chosen at random in the space of observations. Then, at each stage, an observation is randomly chosen, compared to all code vectors, and a winner class is determined, i.e., the class whose code vector is nearest in the sense of a given distance. Finally, the code vectors of the winner class and those of the neighbouring classes are moved closer to the observation. Hence, for a given state of the code vectors, a map associating with each observation the number of the nearest code vector (the number of the winning class) is defined. Once the algorithm converges, this map respects the topology of the input space, in the sense that after classification, similar observations belong to the same class or to neighbouring classes. An inconvenience of the basic algorithm is that the number of classes needs to be determined a priori. To remedy this inconvenience, a hierarchical classification of the code vectors is undertaken, so as to define a smaller number of classes called clusters. The classes and clusters can then be represented on the Kohonen map corresponding to the chosen topology. As neighbouring classes contain similar observations, the clusters gather contiguous classes, which gives interesting visual properties. It is then easy to establish a typology of individuals, by describing each of the clusters by means of traditional statistics. Depending on the problem, one may also define a model (regression, auto-regression, factor analysis, etc.) adapted to each of the clusters. For qualitative data, adaptations of the Kohonen algorithm allowing the study of relationships among modalities also exist. Finally, in order to get the best results, it is recommended to use the traditional methods of data analysis and Kohonen techniques simultaneously, or successively.
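The following minimal Python sketch (illustrative only; the data, the number of units and the decreasing schedules are arbitrary choices) implements the iterative steps described above on a one-dimensional map, a "string" of units:

```python
# Minimal sketch of the SOM (Kohonen) algorithm on a string of 6 units.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))                       # observations in R^2
n_units = 6
codes = data[rng.choice(len(data), n_units)].copy()    # code vectors, chosen at random

for t in range(5000):
    lr = 0.5 * (1 - t / 5000)                          # decreasing learning rate
    radius = max(1, int(2 * (1 - t / 5000)))           # shrinking neighbourhood
    x = data[rng.integers(len(data))]                  # draw an observation at random
    winner = np.argmin(np.linalg.norm(codes - x, axis=1))   # nearest code vector
    for j in range(n_units):                           # move winner and neighbours
        if abs(j - winner) <= radius:
            codes[j] += lr * (x - codes[j])

# The classifying map: each observation -> index of its nearest code vector
classes = np.argmin(np.linalg.norm(data[:, None] - codes[None], axis=2), axis=1)
print(np.bincount(classes, minlength=n_units))
```

After convergence, neighbouring units on the string contain similar observations, which is what makes the subsequent hierarchical clustering of the code vectors and the visual interpretation possible.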

3) Genetic Algorithms: the basic principles of Genetic Algorithms, as founded by John Holland in 1975, model some biological phenomena, and more precisely the ability of populations of living organisms to adapt to their environment, via genetic inheritance and the Darwinian struggle for survival. Resolution methods and stochastic optimisation methods have been designed according to these biologically-inspired principles, giving birth to the so-called "Artificial Darwinism". The main characteristic of GAs is that they manipulate populations of points of the search space and involve a set of operations applied (stochastically) to each individual of the population, organised in generations of the artificial evolution process. The operations involved are of two types: selection, based on the individuals' performance on the problem being solved, and genetic operators, usually crossover and mutation, which produce new individuals. If correctly designed, a dynamic stochastic search process is started on the search space that converges to the global optimum of the function to be optimised. From the point of view of optimisation, GAs are powerful stochastic zeroth-order methods (i.e., requiring only values of the function to optimise) that can find the global optimum of very rough functions. This allows GAs to tackle optimisation problems for which standard methods (e.g., gradient-based algorithms requiring the existence and computation of derivatives) are not applicable. Despite the apparent simplicity of a GA process - which has driven many programmers to first write their own GA adapted to their specific problem - building an efficient EA for an application must take into account the sensitivity to parameter settings and design choices.
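As an illustration of these principles, here is a minimal Python sketch of a genetic algorithm (bit-string encoding, fitness-proportional selection, one-point crossover and mutation); the rough test function and all parameter values are arbitrary choices:

```python
# Minimal sketch of a genetic algorithm maximising a rough function on [0, 1].
import numpy as np

rng = np.random.default_rng(0)
N_BITS, POP, GENS = 16, 40, 60

def decode(bits):                          # bit string -> real number in [0, 1]
    return bits.dot(2 ** np.arange(N_BITS)) / (2 ** N_BITS - 1)

def fitness(bits):                         # individual's performance on the problem
    x = decode(bits)
    return np.sin(13 * x) * np.sin(27 * x) + 1.0

pop = rng.integers(0, 2, size=(POP, N_BITS))
for gen in range(GENS):
    fit = np.array([fitness(ind) for ind in pop])
    parents = pop[rng.choice(POP, size=POP, p=fit / fit.sum())]   # selection
    children = parents.copy()
    for i in range(0, POP - 1, 2):                                # one-point crossover
        cut = rng.integers(1, N_BITS)
        children[i, cut:] = parents[i + 1, cut:]
        children[i + 1, cut:] = parents[i, cut:]
    flips = rng.random(children.shape) < 0.01                     # mutation
    pop = np.where(flips, 1 - children, children)

best = max(pop, key=fitness)
print("best x:", decode(best), "fitness:", float(fitness(best)))
```

Even on such a small example, the results depend noticeably on the population size and on the crossover and mutation rates, which illustrates the sensitivity to parameter settings mentioned above.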

4) Fuzzy Logic: Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth - truth values between "completely true" and "completely false". It was introduced by Lotfi Zadeh in the 1960's as a means to model uncertainty. Rather than regarding fuzzy theory as a single theory, one should regard the process of "fuzzification" as a methodology for generalizing any specific theory from a crisp (discrete) to a continuous (fuzzy) form. Thus fuzzy arithmetic, fuzzy calculus, fuzzy differential equations and so on were developed. Just as there is a strong relationship between Boolean logic and the concept of a subset, there is a similar strong relationship between fuzzy logic and fuzzy subset theory. A fuzzy subset F of a set S can be defined as a set of ordered pairs, each with the first element from S and the second element from the interval [0, 1], with exactly one ordered pair present for each element of S. This defines a mapping between elements of the set S and values in the interval [0, 1]. The value zero is used to represent complete non-membership, the value one is used to represent complete membership, and values in between are used to represent intermediate degrees of membership. A very common methodology for applying fuzzy theory is the fuzzy expert system, i.e. an expert system that uses a collection of fuzzy membership functions and rules, instead of Boolean logic, to reason about data. The rules in a fuzzy expert system are usually based on an "If-Then" structure. Their ability to model a "natural" process of reasoning makes their formalisation generally more robust and less costly than analytical modelling. Conversely, fuzzy methodologies demand "context-dependent" information that may be difficult to obtain. Fuzzy sets and fuzzy logic must be viewed as a formal mathematical theory for the representation of uncertainty. Uncertainty is crucial for the management of real systems in economics and management sciences. The presence of uncertainty is the price you pay for handling a complex system.
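A minimal Python sketch of these notions follows (triangular membership functions valued in [0, 1] and two "If-Then" rules combined by a simple weighted defuzzification; the membership functions and rules are illustrative assumptions):

```python
# Minimal sketch of fuzzy subsets and of a tiny fuzzy rule base.
def triangular(a, b, c):
    """Membership function on the real line, valued in [0, 1], peaking at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Fuzzy subsets of the set S of temperatures (degrees Celsius)
cold = triangular(-10, 0, 15)
warm = triangular(10, 20, 30)

# Rules: IF temperature is cold THEN heating is high (1.0);
#        IF temperature is warm THEN heating is low  (0.0).
def heating_level(temp):
    truth_cold = cold(temp)          # partial truths in [0, 1]
    truth_warm = warm(temp)
    total = truth_cold + truth_warm
    if total == 0:
        return 0.5                   # no rule fires: neutral output
    return (truth_cold * 1.0 + truth_warm * 0.0) / total   # weighted defuzzification

for t in (0, 12, 25):
    print(t, "->", round(heating_level(t), 2))
```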

The application of neuronal and connectionist methods to economics and


management sciences started in 1994, at the first ACSEG conference. Since then, these techniques have slowly but continuously diffused into the most varied domains. On the occasion of the publication of the volume gathering
the best papers of the 2001 ACSEG, we wished to review all the communications
presented in the first eight ACSEG conferences. It seemed to us interesting to
study the evolution of the contents of papers presented over the years.

The corpus under study is made up of a set of 148 articles, which can be classified both in terms of the methods used and of the nature of the applications. For each of these two classifications, we divided the articles into seven categories.

Hence, the methods can be related to different paradigms:


1. Multilayer Perceptrons and general Neural Networks (MLP)
2. Genetic Algorithms (GA)
3. Kohonen Maps (SOM)
4. Fuzzy methods (FUZ)

5. Connectionist networks, including automaton models (CON)


6. Cognitive economy (COG)
7. Other methods (13 articles: classical data analysis, time series, etc.) (OtherM).

The applications cover various domains of which:


1. General Economics (labour markets, macroeconomics) (ECO)
2. Finance (failing enterprises, interest rate structure, financial liabilities) (FIN)
3. Marketing (purchasing behaviour, etc.) (MKG)
4. Human resources and organization theories (ORG)
5. Decision theory (DEC)
6. Methodology (MET)
7. Other applications (corresponding to a total of 8 articles) (OtherA)

Globally, about a third of the methods are related to perceptrons and general neural networks. The remaining two thirds are evenly distributed among Kohonen maps, genetic algorithms, fuzzy methods, and connectionist systems. Lastly, cognitive economy accounts for only 3% of the methods used.

[Figure: Distribution of methods (MLP 31%, SOM 15%, OtherM 9%, ...) and distribution of applications (MKG 23%, ECO 15%, OtherA 5%, ...)]

As for the applications, about half of them are related to finance and marketing, followed by economics and methodology (around 15% each), and lastly by human resources and organization theories and by decision theory (less than 10% each).
As such, we notice an interesting diversity concerning both the methods and
their applications.
The progression in time is even more insightful. An evolution towards a more
uniform distribution of methods as well as their applications can be detected.
Indeed, perceptrons and neural networks accounted for the totality of the methods used in 1994, whereas a much more uniform distribution of the methods can be observed in 2001. The same can be said of the applications.

Furthermore, over time, the areas of application have become more specific; the less well defined categories qualified as "other applications" and "methodology" occupy only a minor place in 2001.

[Figure: Evolution of methods and evolution of applications, number of papers per year from 1994 to 2001, by category (MLP, GA, SOM, FUZ, CON, COG, OtherM; ECO, FIN, MKG, ORG, DEC, MET, OtherA)]

The following graphs illustrate the transition towards a more uniform distribution
of the different types of methods and applications.

[Figure: Distribution of methods in 1994 and in 2001; distribution of applications in 1994 and in 2001]

The distribution of applications by type of method and that of methods by


application (leaving aside "Other Methods" and "Other Applications") reveal
strong associations between methods and applications. For instance, perceptrons
and genetic algorithms are mainly used in marketing, Kohonen maps in finance,
fuzzy logic in finance and in decision theory. Connectionism is more uniformly
distributed among the various applications, except for finance.
[Figure: Distribution of applications for each type of method (MLP, SOM, FUZ, CON, COG)]

Lastly, the distribution of the methods by application reveals that marketing


makes extensive use of perceptrons and general neural networks, and that half
of the applications in decision theory have recourse to fuzzy methods. Moreover,
economics and finance mostly use Kohonen maps and organization theories have
recourse to connectionist approaches.

[Figure: Distribution of methods for each type of application (ECO, FIN, MKG, ORG, DEC, MET)]

Hence, we observe that the "ACSEG community" has increasingly appropriated general neural methods and has diversified its applications.

Finally, as a wink to the followers of self-organized Kohonen maps, we have used an algorithm inspired by the SOM algorithm which allowed us to classify the 148 articles on the basis of the two qualitative variables mentioned
above (methods and applications, each with seven modalities). We have chosen a
one-dimensional map (a string) with six units. In each unit, the different methods
and applications, as well as the number of the corresponding articles have been
classified. In the map below only the modalities of the two qualitative variables
are represented.

[Figure: One-dimensional Kohonen map (string of six units) showing the modalities of the two qualitative variables (SOM, MLP, FUZ, COG, ECO, FIN, MET, DEC, OtherM, OtherA)]

As expected, we find that the fuzzy methods are associated with the applications of decision theory; also that Kohonen maps and economics are placed in the same unit, and that finance is in the neighbouring unit. Genetic algorithms and connectionist approaches go together with marketing and organization theory, etc.
The following graphical representations show the number of articles per year in each class of the constructed string. A surge of cognitive approaches in the year 2001 can be clearly observed. For easier reading of the graphs, we have represented the six units in three rows (units 1 to 2, 3 to 4 and 5 to 6).

[Figure: Number of articles per year in each of the six units of the string]

The purpose of this book is to put these new techniques at the disposal of researchers coming from different horizons, to assess the state of the art, to identify the capabilities of these new algorithms, and to demonstrate the contribution of these methods to Economics and Management Sciences. It is a privileged place to expose the know-how and to discuss new developments and problems encountered in this research.

The selected papers give a good illustration of the variety of possible applications in Economics and Management Sciences. They are all interesting examples of the advances that this field of research can produce.

A first part is dedicated to theoretical advances. J.-P. Aubin links various connectionist approaches (neural networks, fuzzy logic, genetic algorithms) from the viability theory point of view. H. Prade, E. Hüllermeier and D. Dubois demonstrate the interest of possibility theory for case-based reasoning. J. Rynkiewicz deepens the study of multilayer perceptrons, and B. Paulré suggests a research framework for cognitive economics.

Part II is devoted to the application of connectionist approaches in Economics and Management Sciences. First, M. Cottrell and P. Letrémy on part-time work, and S. Lemière et al. on work policies, apply Kohonen maps to describe the employment market. J.-M. Aurifeille and S. Manin use a genetic algorithm to optimise means-end data in order to model customer behaviour. Then five chapters show the very diversified range of possibilities of connectionist approaches in finance. X. Bry and J.-F. Casta apply fuzzy logic to model synergy effects in the firm's assets. E. Severin applies Kohonen maps to reveal the relations between corporate governance and performance for a collection of companies. A. Lendasse et al. use a neural network optimised with radial basis functions for the pricing of derivatives. F. Moraux and C. Villa suggest the use of Independent Component Analysis to analyse the dynamics of the term structure of interest rates, and finally B. Maillet and P. Rousset obtain a classification of hedge funds with the Kohonen algorithm.

All these papers contain interesting results on their subjects, results that would be very difficult to obtain with classical techniques and that have been established by using these connectionist non-linear methods.

They reflect the great diversity of connectionist approaches, from which we hope the reader will benefit in his or her own research. If this study enlarges the range of analytical tools available to researchers in economics and management, we will have reached our goal of sharing the interest that we have in these new and fascinating connectionist methods.

M. Cottrell, C. Lesage
PART I

ADVANCES IN CONNECTIONIST APPROACHES


IN ECONOMICS AND MANAGEMENT SCIENCES
Chapter 1

Evolution of complex economic systems and uncertain information

Jean-Pierre AUBIN
Université Paris-Dauphine, F-75775 Paris cedex 16, France

Abstract: Socio-economic networks describe collective phenomena through constraints


relating actions of several agents, coalitions of these agents and multilinear
connectionist operators acting on the set of actions of each coalition. We provide a
class of control systems governing the evolution of actions, coalitions and
multilinear connectionist operators under which the architecture of the network
remains viable. The controls are the "viability multipliers" of the "resource space"
in which the constraints are defined. They are involved as "tensor products" of the
actions of the coalitions and the viability multiplier, allowing us to encapsulate in this
dynamical and multilinear framework the concept of Hebbian learning rules in
neural networks in the form of "multi-Hebbian" dynamics in the evolution of
connectionist operators. They are also involved in the evolution of coalitions
through the "cost" of the constraints under the viability multiplier regarded as a
price.

Key words: Viability theory, coalition, connectionist networks.

INTRODUCTION

We begin this paper by quoting the wish J. von Neumann and O. Morgenstern
expressed in 1944 at the end of the first chapter of their monograph "Theory of
Games and Economic Behavior":
"Our theory is thoroughly static. A dynamic theory would unquestionably be
more complete and therefore, preferable ..."


"Our static theory specifies equilibria ... A dynamic theory, when one is found
-will probably describe the changes in terms of simpler concepts."
One of the economic characteristics is the presence of scarcity constraints,
and more, generally, viability constraints to which a socio-economic system must
adapt during its evolution.
It becomes then natural to specify the "minimal" conditions under which an
economy can work and to specify classes - as large as possible - of reasonable
economies whose evolution does not violate these viability conditions (as well as
other specifications).
In my opinion, when one has to design a mathematical metaphor for an
evolutionary model of socio-economic variables, one should start by gathering
the constraints of these variables which cannot- or should not- be violated.
This first requires delineating the endogenous states of the system under study and discriminating them from the rest of the variables, regarded as exogenous, which constitute in some sense the "environment" of the system under investigation. This partition between variables, which dictates the level of abstraction of a particular investigation, is the first source of constraints that the endogenous variables must obey.
Usually, there are few disputes among "modelers" when they are listing these
constraints.
Serious disagreements may begin when behavioral assumptions have to be
made.

1. Designing Dynamics through Viability Multipliers

In order to weaken such controversies, or to postpone the ultimate choice of a behavioral description of the economic agents, the strategy I suggest is to begin by characterizing an "envelope" of dynamical systems under which the constraints are viable, in the sense that, starting from any initial state satisfying these constraints, at least one evolution is viable, i.e., satisfies these viability constraints at each instant. We shall perform this task here for a class of constraints describing the architecture of an abstract socio-economic network.
We then can devise general strategies for designing dynamical behaviors of
the economic agents under which the constraints are viable. Now, the problem of
choice of a behavior of the consumers is well circumscribed: one can propose,
describe or suggest such and such shape of a change function and check whether
or not a representative of this class belongs to this "envelope". Or one can
propose a choice criterion and choose among viable dynamical economies of this
"envelope" the ones which satisfy optimally this criterion. Given the constraints
that a socio-economic system must obey, and given an initial dynamic system for
which these constraints are not viable, a theorem on viability multipliers allows

us to correct the dynamics of the initial system in order that the constraints
become viable under the corrected system. These viability multipliers play the
role of Lagrange or Kuhn-Tucker multipliers in optimization theory, where an
optimal solution of the problem of maximization of a utility function under
constraint is obtained by maximizing without constraints a corrected utility
function involving Lagrange multipliers. Both the viability and Lagrange
multipliers belong to the same space (the dual of the resource space), and are
usually interpreted as virtual prices as well as other regulatory controls, called
regulons.
Therefore viability multipliers provide one way (but not the only one) of designing dynamical economies for which the constrained set is viable, a way that should be familiar to economists, since it amounts to using as (virtual) prices
the very same multipliers - viability multipliers instead of Lagrange ones -
that are used in optimization under constraints for relaxing the constraints. In this
respect, viability theory can be regarded as an evolution theory under (viability)
constraints.
For example, ever since Adam Smith's invisible hand, what we nowadays call decentralization has been justified by the need of agents to behave in a decentralized way for complying with scarcity constraints, using for this purpose "messages" such as prices or "rationing" mechanisms which involve shortages (and lines, queues, unemployment), or "frustration" of consumers, or "monetary" mechanisms, or others. "Prices" constitute the main examples of messages, actually the messages with the smallest dimension (see for instance an introduction to this issue in Saari (1995)). Such prices appear here as viability multipliers emerging² when allocations of commodities satisfy the scarcity constraints.
The next task is to derive from the confrontation of the (corrected) dynamics
and the constraints the concealed regulation mechanisms governing viable
evolutions, and to select some of them according to some further principle. This
allows us to derive "adjustment laws" instead of founding the modelization
process based on such a law. This goes against the tradition of theoretical
economy, where the adjustment of some variables for reaching an equilibrium
plays a basic and prominent role. The so-called "law of supply and demand"
states that prices react in a determined direction in response to a difference
between supply and demand in the market: the price of a particular commodity is
assumed to vary according to the sign of the excess demand of this commodity.
Instead of reasoning with a law of adjustment a priori given, and which does not
produce viable evolutions, we shall build "dedicated" laws of supply and demand
which shall provide viable solutions. In some way, this reverse approach allows
us to "explain" a posteriori the role of such an adjustment law instead of
scrutinizing the consequences of an a priori given law for possible justifications.

In summary, the main purpose of the viability approach to dynamical


economics is to explain the evolution of a system, governed by given
nondeterministic dynamics and viability constraints, to reveal the concealed
regulation laws which allow the system to be regulated and provide selection
mechanisms for implementing them.
It assumes implicitly an "opportunistic" and "conservative" behavior of the
system: a behavior which enables the system to keep viable solutions as long as
its potential for exploration (or its lack of determinism) - described by the
availability of several evolutions- makes possible its regulation.
Therefore, using this viability approach, the modeling difficulty is confined to the elaboration of the list of constraints that the state of the system must obey, and to the use of the above theory to suggest dynamics and study their properties.

2. Complex Economic Systems

It is at this level that the concept of "connectionist complexity" - to make more precise the meaning of such a polysemous concept as "complexity" - comes into the picture. The complexity of dynamic socio-economic systems stems from such a non-teleological collective evolution of the set of agents, even though many individual economic agents think of themselves as pursuing definite and rational aims, instead of adapting permanently to the many viability constraints (among which scarcity constraints) they face under uncertainty, be it contingent, stochastic or tychastic¹. This theme has been introduced and studied in economic theory under the name of "bounded rationality". Indeed economic agents are humans, not computers: they are seldom rational, obey inertia principles, and are poor forecasters, actually very myopic. They base many decisions not on rational grounds, but on faith, beliefs and bets, rules of thumb, moods, rumors, and the like. They are more inductive than deductive in their learning processes. Actually, we can adopt Peirce's terminology and look at them as "abductive", i.e., as making conjectures rather than predictions². They are most often more conservative than innovative, afraid of changes when these are not perceived as improving their situation. They may prefer to adopt a herd - or panurgean - behavior instead of choosing dissident ways opening new avenues.
Social (and living) systems are "complex", although there is no consensus on the definition of complexity. Reading the literature on complexity, and quoting George Cowan, the founder of the Santa Fe Institute, "in the universe, everything is connected with everything" seems to be the consensual agreement of the members of this Institute. However, Seth Lloyd had found 31 different definitions of complexity at the beginning of the 90's, and this number has increased a lot since. Complexity is indeed a polysemous word that tries to embrace too many distinct phenomena of interest in the social and biological sciences³.

Since at least the works of Charles Elton (1958) and of George Hutchinson (1959) at the end of the fifties, the conventional wisdom of biologists has proposed, in some loose way, that complexity - regarded as the number of variables of the systems and their links - is justified for maintaining "stability" - a fuzzy word meaning confinement, or rather viability, as it was proposed to single out this meaning from the numerous intendments of "stability" in mathematics. Biodiversity is presently and actually championed on the basis of this objective.
In a series of papers summarized in (May 1973), Robert May and his collaborators disputed this proposition by showing that the higher the dimension, the less stable were dynamical models of Lotka-Volterra type. Therefore, either the biologists' assumption was false, or such a mathematical connotation of complexity - the dimension of the state space - or the chosen mathematical model is inadequate. This is a suggestion made by John Maynard Smith (1974), when he concluded that the stability of ecosystems is due to some specific interactions.
Here, we retain the following features:
- complexity means, in day-to-day language, not only the number of variables of a system, but also and above all the labyrinth of connections between the components of an organization or a "system" (or, for that matter, of a living organism),
- the purpose of complexity is to sustain the "stability" - another polysemous word - or, more precisely, the viability constraints set by an organization,
- the increase of complexity parallels the growth of the web of constraints whenever the system cannot comply with them in an autonomous or decentralized way,
- the organization of organisms as a hierarchical structure of relatively "autonomous" organs is due to "cycles" involved in the viability constraints or multi-stage production processes,
- the organization in organisms or subsystems is rooted in the need to offer them slowly evolving partial environments to specialize them in specific activities.

However, these attempts did not answer directly the question that some
economists or biologists asked: Complexity and hierarchical organization, yes,
but for what purpose?
This growth of "structural" complexity is the legacy that Jean-Baptiste de
Monet, chevalier de Lamarck, offered to us, the backbone of his theory of
evolution which was forgotten ever since, overshadowed as it was by other
aspects of the evolution of species, such as the Darwinian natural selection or
genetics. Claude Bernard's "constance du milieu intérieur" and the "homeostasis" of Walter Cannon - viability constraints with which dynamical systems must comply - later contributed to singling out the crucial role of constraints as a key for explaining this aspect of complexity.
In this framework of adaptation to viability constraints, the evolution of the
state no longer derives from intrinsic dynamical laws valid in the absence of
constraints, but from some "organization" that evolves together with the state of
the system in order to adapt to the viability constraints. This attempt to sustain
the viability of the system by connecting the dynamics or the constraints of its
agents may be a general feature of "complex systems".
We regard here connectionism - a less normative and more neutral term than cooperation, whenever the system, the organ, the organism or the organization arises in economics, social sciences or biology - as an answer to adapt to more and more viability constraints, which implies the emergence of links between the components of a dynamical system and their evolution.

3. Connectionist Complexity of the Architecture of Networks

We shall restrict our study to the case when the organization is described by a
"network" we now define.
A purpose of an organization is to coordinate the actions of a finite number n
of agents labelled i = 1, . . . , n. It is described here by the architecture of a
network of agents, such as:
1. socio-economic networks (see for instance Ioannides 1997; Aubin 1997,
1998b; Aubin, Foray 1998; Bonneuil 2000).
2. neural networks (see for instance Aubin 1995; 1996; 1998c),
3. genetic networks (see for instance Bonneuil 1998; Bonneuil, Saint-Pierre
2000).
This coordinated activity requires a network of communications of actions $x_i \in X_i$ ranging over $n$ finite dimensional vector spaces $X_i$.
The simplest general form of coordination is to require that a relation between actions of the form $g(A(x_1, \ldots, x_n)) \in M$ be satisfied. Here:
1. $A : \prod_{i=1}^n X_i \to Y$ is a connectionist operator relating the individual actions in a collective way,
2. $M \subset Y$ is a subset of the resource space $Y$ and $g$ is a map, regarded as a resource map.
We shall study this coordination problem in a dynamic environment, by allowing actions $x(t)$ and connectionist operators $A(t)$ to evolve⁴ according to dynamical systems we shall construct later. In this case, the coordination problem takes the form:

$$\forall t \geq 0, \quad g\bigl(A(t)(x_1(t), \ldots, x_n(t))\bigr) \in M$$

However, in the fields of motivation under investigation, the number $n$ of variables may be very large. Even though the connectionist operators $A(t)$ defining the "architecture" of the network are allowed to operate a priori on all variables $x_i(t)$, they actually operate at each instant $t$ on a coalition $S(t) \subset N := \{1, \ldots, n\}$ of such variables, varying naturally with time according to the nature of the coordination problem (see Aubin (2002); Petrosjan (2001), Petrosjan, Zenkevitch (1996), Filar, Petrosjan (2000) for closely related issues in dynamic cooperative game theory).
Therefore, our coordination problem in a dynamic environment involves the evolution:
1. of actions $(x_1(t), \ldots, x_n(t)) \in \prod_{i=1}^n X_i$,
2. of connectionist operators $A_{S(t)}(t) : \prod_{i \in S(t)} X_i \to Y$,
3. acting on coalitions $S(t) \subset N := \{1, \ldots, n\}$ of the $n$ agents,
and requires that:

$$\forall t \geq 0, \quad g\bigl(\{A_S(t)(x(t))\}_{S \subset N}\bigr) \in M$$

where $g : \prod_{S \subset N} Y_S \to Y$.

The question we raise is the following: assuming that we know the intrinsic laws of evolution of the variables $x_i$ (independently of the constraints), of the connectionist operators $A_{S(t)}$ and of the coalitions $S(t)$, there is no reason why the collective constraints defining the above architecture should be viable under these dynamics, i.e., satisfied at each instant.
One may be able, with a lot of ingenuity and an intimate knowledge of a given problem, and for "simple constraints", to derive dynamics under which the constraints are viable.
However, we can use a kind of "mathematical factory" providing classes of dynamics "correcting" the initial (intrinsic) ones through viability multipliers $q(t)$ ranging over the dual $Y^*$ of the resource space $Y$ in such a way that the viability of the constraints is guaranteed.
This may allow us to provide an explanation of the formation and the
evolution of the architecture of the network and of the active coalitions as well as
the evolution of the actions themselves.
The results presented here use this approach in the case of the above specific
constraints. We show that by doing so, the dynamics of the evolution of
connectionist operators and coalitions present some interesting features.
In order to tackle this problem mathematically, we shall:
1. restrict the connectionist operators to be multiaffine, i.e., sums over all coalitions of multilinear operators $A_S$, also called (or regarded as) tensors⁵, and thus involve tensor products,
2. next, allow coalitions $S$ to become fuzzy coalitions so that they can evolve continuously.
Fuzzy coalitions $\chi = (\chi_1, \ldots, \chi_n)$ are defined by memberships $\chi_i \in [0, 1]$ between 0 and 1, instead of being equal to either 0 or 1 as in the case of usual coalitions. The membership $\gamma_S(\chi) := \prod_{i \in S} \chi_i$ is by definition the product of the memberships of the members $i \in S$ of the coalition. Using fuzzy coalitions allows us to define their velocities and study their evolution.
The viability multipliers $q(t) \in Y^*$ can be regarded as regulons, i.e., regulation controls or parameters, or virtual prices in the language of economists. They are chosen adequately at each instant so that the viability constraints describing the network are satisfied at each instant, and the main theorem of this paper guarantees that this is possible. Another theorem tells us how to choose such regulons at each instant (the regulation law).
The main theorem asserts that for each agent $i$, the velocity $x_i'(t)$ of its state and the velocity $\chi_i'(t)$ of its membership in the fuzzy coalition $\chi(t)$ are corrected by adding:
1. the sum, over all coalitions $S$ to which he belongs, of adequate functions weighted by the membership $\gamma_S(\chi(t))$,
2. the sum, over all coalitions $S$ to which he belongs, of the costs of the constraints associated with the connectionist tensor $A_S$ of the coalition $S$, weighted by the membership $\gamma_{S \setminus i}(\chi(t))$. This type of dynamics describes a panurgean effect: the (algebraic) increase of agent $i$'s membership in the fuzzy coalition aggregates, over all coalitions to which he belongs, the cost of their constraints weighted by the products of the memberships of the agents of the coalition other than him.

As for the correction of the velocities of the connectionist tensors $A_S$, their correction is a weighted "multi-Hebbian" rule: for each component of the connectionist tensor, the correction term is the product of the membership $\gamma_S(\chi(t))$ of the coalition $S$, of the components $x_{i_k}(t)$ and of the component $q^j(t)$ of the regulon. This is a generalization of the celebrated Hebbian rule proposed by Hebb in his classic book The Organization of Behavior in 1949 as the basic learning process of synaptic weights in neural networks (see Aubin 1995, 1996, 1998c for more details). Mathematically speaking, we recognize tensor products of vectors, which boil down to matrices when only two vectors are involved.
In other words, the viability multipliers appear in the regulation of the multiaffine connectionist operators in the form of "multi-Hebbian" rules, as in Aubin, Burnod (1998) where they were introduced for the first time, compounded with the presence of the coalition memberships $\gamma_S(\chi(t))$ when coalitions of agents are allowed to form and to evolve.
Even though viability multipliers do not provide all the dynamics under which
a constrained set is viable, they provide classes of them exhibiting interesting
structures that deserve to be investigated and tested in concrete situations.

Remark: Learning Laws and the Supply and Demand Law - It is curious that both the standard supply and demand law, known as the Walrasian tâtonnement process, in economics and the Hebbian learning law in cognitive sciences were the starting points of the Walras general equilibrium theory and of learning processes in neural networks. In both theories, the choice of putting such adaptation laws as a prerequisite led to the same culs-de-sac. As we alluded to above, starting instead from dynamic laws of agents, viability theory provides "dedicated adaptation laws", so to speak, as the conclusion of the theory instead of as its primitive feature. In both cases, the point is to maintain the viability of the system: that allocations of scarce commodities satisfy the scarcity constraints in economics, and that the viability of the neural network is maintained in the cognitive
economics, that the viability of the neural network is maintained in the cognitive
sciences. For neural networks, this approach provides learning rules that possess
the features meeting the Hebbian criterion. For the general networks studied here,
these features are still satisfied in spirit.

These modeling challenges raised by the study of the evolution of socio-


economic networks require not necessarily more difficult mathematical
techniques, but new ones motivated by these questions. If we accept that physics studies much simpler phenomena than the ones investigated by the social and biological sciences, and that for this very reason it has motivated and used a more and more complex mathematical apparatus, we must also accept that the social sciences require a new and dedicated mathematical arsenal which goes beyond what is presently available. Paradoxically, the very fact that the mathematical tools useful for the social sciences are, and have to be, quite sophisticated impairs their acceptance by many social scientists and economists, and the gap threatens to widen.

4. Outline

We present examples of network structures in order of increasing difficulty. We begin with results (already) obtained for affine constraints (the case of one agent), and expose them in detail when there are only two agents and bilinear constraints are involved.
In the next section, we exhibit the results for $n$ agents and multiaffine constraints without evolving coalitions, whereas in the last section we introduce fuzzy coalitions and show how they may evolve for maintaining the viability of the architecture of the network.

1. EXAMPLES OF ARCHITECTURES INVOLVING LINEAR AND BILINEAR CONNECTIONIST MAPS

1.1 Case of Affine Constraints

For simplicity, we summarize the case, studied in Aubin (1997, 1998b, 1998c), when there is only one agent and when the operator $A : X \to Y$ is affine:

$$\forall x \in X, \quad A(x) := Wx + y \quad \text{where } W \in L(X, Y) \text{ and } y \in Y.$$

The coordination problem takes the form:

$$\forall t \geq 0, \quad W(t)x(t) + y(t) \in M$$

where the state $x$, the resource $y$ and the connectionist operator $W$ all evolve. These constraints are not necessarily viable under an arbitrary dynamical system of the form

$$\begin{cases} i) \; x'(t) = c(x(t)) \\ ii) \; y'(t) = d(y(t)) \\ iii) \; W'(t) = a(W(t)) \end{cases}$$

We can reestablish viability by involving multipliers $q \in Y^*$ ranging over the dual $Y^* := Y$ of the resource space $Y$. We denote by $W^* \in L(Y^*, X^*)$ the transpose of $W$:

$$\forall q \in Y^*, \; \forall x \in X, \quad \langle W^*q, x \rangle := \langle q, Wx \rangle$$

and by $x \otimes q \in L(X, Y^*)$ the tensor product defined by:

$$x \otimes q : p \in X^* := X \mapsto (x \otimes q)(p) := \langle p, x \rangle q$$

the matrix of which is made of entries $(x \otimes q)_i^j = x_i q^j$.


The contingent cone $T_M(y)$ to $M \subset Y$ at $y \in M$ is the set of directions $v \in Y$ such that there exist sequences $h_n > 0$ converging to 0 and $v_n$ converging to $v$ satisfying $y + h_n v_n \in M$ for every $n$. The (regular) normal cone to $M \subset Y$ at $y \in M$ is defined by:

$$N_M(y) := \{ q \in Y^* \mid \forall v \in T_M(y), \; \langle q, v \rangle \leq 0 \}$$

(see Aubin, Frankowska (1990) and Rockafellar, Wets (1997) for more details on these topics).
We can prove that the viability of the constraints can be reestablished when the initial system is replaced by the control system:

$$\begin{cases} i) \; x'(t) = c(x(t)) - W^*(t)q(t) \\ ii) \; y'(t) = d(y(t)) - q(t) \\ iii) \; W'(t) = a(W(t)) - x(t) \otimes q(t) \end{cases}$$

where $q(t) \in N_M(W(t)x(t) + y(t))$. Here $N_M(y) \subset Y^*$ denotes the normal cone to $M$ at $y \in M \subset Y$ and $x \otimes q \in L(X, Y^*)$ denotes the tensor product defined by:

$$x \otimes q : p \in X^* := X \mapsto (x \otimes q)(p) := \langle p, x \rangle q$$

the matrix of which is made of entries $(x \otimes q)_i^j = x_i q^j$ (see Aubin (1996) for more details on the relations between Hebbian rules and tensor products in the framework of neural networks).
In other words, the correction of a dynamical system for reestablishing the viability of constraints of the form $W(t)x(t) + y(t) \in M$ involves the celebrated Hebbian rule proposed by Hebb in 1949 as the basic learning process of synaptic weights: taking $a(W) = 0$, the evolution of the synaptic matrix $W := (w_i^j)$ obeys the differential equation:

$$\frac{d}{dt} w_i^j(t) = -x_i(t) q^j(t)$$

It states that the velocity of the synaptic weight is the product of the presynaptic activity and the postsynaptic activity. This intuition of a neurobiologist is confirmed mathematically by the above result. Such a learning rule "pops up" (or, more pedantically, emerges) whenever the synaptic matrices are involved in regulating the system in order to maintain the "homeostatic" constraint $W(t)x(t) + y(t) \in M$.
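As a purely numerical illustration of this rule (a minimal Python sketch with numpy, using arbitrary dimensions and values, a discretised time step and $a(W) = 0$), the Hebbian correction term $-x(t) \otimes q(t)$ of the synaptic matrix is simply an outer product:

```python
# Minimal sketch of the Hebbian correction W'(t) = -x(t) (tensor) q(t),
# with arbitrary illustrative dimensions: X = R^3, Y = R^2, a(W) = 0.
import numpy as np

x = np.array([1.0, 0.5, -0.2])   # presynaptic activity, x(t) in X
q = np.array([0.3, -0.1])        # viability multiplier (regulon), q(t) in Y*
W = np.zeros((2, 3))             # synaptic matrix W in L(X, Y)

dt = 0.1                         # discretised time step
# Each entry w_i^j moves by -x_i * q^j * dt: outer(q, x)[j, i] = q^j * x_i
W += dt * (-np.outer(q, x))
print(W)
```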

We may enrich this problem by introducing a coefficient $\chi(t) \in [0, 1]$ aimed at "tuning" the action $x(t)$, regarded as a potential action that is not wholly implemented. In this framework, the constraint becomes:

$$\forall t \geq 0, \quad \chi(t)W(t)x(t) + y(t) \in M$$

Again, one can correct a differential system of the form:

$$\begin{cases} i) \; x'(t) = c(x(t)) \\ ii) \; y'(t) = d(y(t)) \\ iii) \; \chi'(t) = \kappa(\chi(t)) \\ iv) \; W'(t) = a(W(t)) \end{cases}$$

by introducing viability multipliers as controls in a system of the form:

$$\begin{cases} i) \; x'(t) = c(x(t)) - W^*(t)q(t) \\ ii) \; y'(t) = d(y(t)) - q(t) \\ iii) \; \chi'(t) = \kappa(\chi(t)) - \langle q(t), W(t)x(t) \rangle \\ iv) \; W'(t) = a(W(t)) - x(t) \otimes q(t) \end{cases}$$

where $q(t) \in N_M(\chi(t)W(t)x(t) + y(t))$.

The correction term is the "cost of the linear constraint" $\langle q(t), W(t)x(t) \rangle$ in the law of evolution of $\chi(t)$.

1.2 Case of Bi-Affine Constraints

Before investigating the general case and confronting notational difficulties, let us explain how we go from the affine case to the bi-affine case.
Here, we now assume that $X := X_1 \times X_2$ is the product of two vector spaces. An affine constraint takes the form:

$$A_1 x_1 + A_2 x_2 + A_0 \in M$$

where $A_i \in L(X_i, Y)$ ($i = 1, 2$) and $A_0 \in Y$. But we can also involve a bilinear operator $A_{\{1,2\}} \in L_2(X_1 \times X_2, Y)$ and consider bi-affine constraints of the form:

$$A_{\{1,2\}}(x_1, x_2) + A_1 x_1 + A_2 x_2 + A_0 \in M$$

We introduce the linear operators $A_{\{1,2\}}(x_1) \in L(X_2, Y)$ defined by:

$$A_{\{1,2\}}(x_1)x_2 := A_{\{1,2\}}(x_1, x_2)$$

and $A_{\{1,2\}}(x_2) \in L(X_1, Y)$ defined by:

$$A_{\{1,2\}}(x_2)x_1 := A_{\{1,2\}}(x_1, x_2)$$
We shall prove that when these constraints are not viable under an arbitrary dynamical system of the form:

$$\begin{cases} i) \; x_i'(t) = c_i(x(t)), \quad i = 1, 2 \\ ii) \; A_0'(t) = a_0(A_0(t)) \\ iii) \; A_1'(t) = a_1(A_1(t)) \\ iv) \; A_2'(t) = a_2(A_2(t)) \\ v) \; A_{\{1,2\}}'(t) = a_{\{1,2\}}(A_{\{1,2\}}(t)) \end{cases}$$

we can still reestablish viability by involving multipliers $q \in Y^*$ and correcting the above system by the control system:

$$\begin{cases} i) \; x_1'(t) = c_1(x(t)) - A_1(t)^* q(t) - A_{\{1,2\}}(t)(x_2(t))^* q(t) \\ ii) \; x_2'(t) = c_2(x(t)) - A_2(t)^* q(t) - A_{\{1,2\}}(t)(x_1(t))^* q(t) \\ iii) \; A_0'(t) = a_0(A_0(t)) - q(t) \\ iv) \; A_1'(t) = a_1(A_1(t)) - x_1(t) \otimes q(t) \\ v) \; A_2'(t) = a_2(A_2(t)) - x_2(t) \otimes q(t) \\ vi) \; A_{\{1,2\}}'(t) = a_{\{1,2\}}(A_{\{1,2\}}(t)) - x_1(t) \otimes x_2(t) \otimes q(t) \end{cases}$$

where $q(t) \in N_M\bigl(A_{\{1,2\}}(t)(x_1(t), x_2(t)) + A_1(t)x_1(t) + A_2(t)x_2(t) + A_0(t)\bigr)$.

Hence, the structure of this control system involves the transposes $A_i(t)^* q(t)$ and $A_{\{1,2\}}(t)(x_j(t))^* q(t)$ ($i = 1, 2$, $j \neq i$) in the evolution of the variables $x_i(t)$, the tensor products $x_i(t) \otimes q(t)$ (Hebbian rules) in the evolution of the linear operators $A_i(t)$, and the tensor product $x_1(t) \otimes x_2(t) \otimes q(t)$ in the evolution of the bilinear form $A_{\{1,2\}}$.
The tensor product $x_1 \otimes x_2 \otimes q$ is a bilinear operator from $X_1^* \times X_2^*$ to $Y^*$ associating with any pair $(p_1, p_2) \in X_1^* \times X_2^*$ the element:

$$(x_1 \otimes x_2 \otimes q)(p_1, p_2) := \langle p_1, x_1 \rangle \langle p_2, x_2 \rangle q$$

If the vector spaces are supplied with bases, the components of this bilinear form - the "tensors" - can be written:

$$(x_1 \otimes x_2 \otimes q)_{i_1 i_2}^j = x_{1, i_1} x_{2, i_2} q^j$$

as the products of the components of the three factors of this tensor product. Taking $a_{\{1,2\}}(A) = 0$, the evolution of the bi-synaptic tensor $A_{\{1,2\}} := (a_{i_1 i_2}^j)$ obeys the differential equation:

$$\frac{d}{dt} a_{i_1 i_2}^j(t) = -x_{1, i_1}(t) \, x_{2, i_2}(t) \, q^j(t)$$

It states that the velocity of the synaptic tensor is the product of the presynaptic activities of the neurons arriving at the synapse $(i_1, i_2, j)$ and the postsynaptic activity (see Aubin, Burnod 1998).
We may enrich this problem by introducing coefficients $\chi_i(t) \in [0, 1]$ aimed at tuning the actions $x_i(t)$ ($i = 1, 2$), which we shall later regard as the components of a fuzzy coalition. In this framework, the constraint becomes:

$$\chi_1(t)\chi_2(t) A_{\{1,2\}}(t)(x_1(t), x_2(t)) + \chi_1(t) A_1(t)x_1(t) + \chi_2(t) A_2(t)x_2(t) + A_0(t) \in M$$

If we assume that the evolutions of these $\chi_i(t)$ are governed by differential equations:

$$\chi_i'(t) = \kappa_i(\chi_i(t)), \quad i = 1, 2$$

we shall prove that the above constraints are viable under the control system:

$$\begin{cases} i) \; x_1'(t) = c_1(x(t)) - \chi_1(t)A_1(t)^* q(t) - \chi_1(t)\chi_2(t)A_{\{1,2\}}(t)(x_2(t))^* q(t) \\ ii) \; x_2'(t) = c_2(x(t)) - \chi_2(t)A_2(t)^* q(t) - \chi_1(t)\chi_2(t)A_{\{1,2\}}(t)(x_1(t))^* q(t) \\ iii) \; \chi_1'(t) = \kappa_1(\chi_1(t)) - \bigl\langle q(t), A_1(t)x_1(t) + \chi_2(t)A_{\{1,2\}}(t)(x_1(t), x_2(t)) \bigr\rangle \\ iv) \; \chi_2'(t) = \kappa_2(\chi_2(t)) - \bigl\langle q(t), A_2(t)x_2(t) + \chi_1(t)A_{\{1,2\}}(t)(x_1(t), x_2(t)) \bigr\rangle \\ v) \; A_0'(t) = a_0(A_0(t)) - q(t) \\ vi) \; A_1'(t) = a_1(A_1(t)) - \chi_1(t)\, x_1(t) \otimes q(t) \\ vii) \; A_2'(t) = a_2(A_2(t)) - \chi_2(t)\, x_2(t) \otimes q(t) \\ viii) \; A_{\{1,2\}}'(t) = a_{\{1,2\}}(A_{\{1,2\}}(t)) - \chi_1(t)\chi_2(t)\, x_1(t) \otimes x_2(t) \otimes q(t) \end{cases}$$

where $q(t) \in N_M\bigl(\chi_1(t)\chi_2(t)A_{\{1,2\}}(t)(x_1(t), x_2(t)) + \chi_1(t)A_1(t)x_1(t) + \chi_2(t)A_2(t)x_2(t) + A_0(t)\bigr)$.

2. REGULATION BY CONNECTIONIST TENSORS

2.1 Connectionist Tensors

In order to handle more explicit and tractable formulas and results, we shall assume that the connectionist operator $A : X := \prod_{i=1}^n X_i \to Y$ is multiaffine.
For defining such a multiaffine operator, we associate with any coalition $S \subset N$ its characteristic function $\chi_S : N \to \mathbb{R}$ associating with any $i \in N$

$$\chi_S(i) := \begin{cases} 1 & \text{if } i \in S \\ 0 & \text{if } i \notin S \end{cases}$$

It defines a linear operator $\chi_S \circ$ that associates with any $x = (x_1, \ldots, x_n) \in \prod_{i=1}^n X_i$ the sequence $\chi_S \circ x$ defined by:

$$\forall i = 1, \ldots, n, \quad (\chi_S \circ x)_i := \begin{cases} x_i & \text{if } i \in S \\ 0 & \text{if } i \notin S \end{cases}$$

We associate with any coalition $S \subset N$ the subspace:

$$X^S := \chi_S \circ \Bigl(\prod_{i=1}^n X_i\Bigr)$$

since $\chi_S \circ$ is nothing other than the canonical projector from $\prod_{i=1}^n X_i$ onto $X^S$. In particular, $X^N := \prod_{i=1}^n X_i$ and $X^{\emptyset} := \{0\}$.

Let $Y$ be another finite dimensional vector space. We associate with any coalition $S \subset N$ the space $L_S(X^S, Y)$ of $S$-linear operators $A_S$. We extend such an $S$-linear operator $A_S$ to an $n$-linear operator (again denoted by) $A_S \in L_n\bigl(\prod_{i=1}^n X_i, Y\bigr)$ defined by:

$$\forall x \in \prod_{i=1}^n X_i, \quad A_S(x) = A_S(x_1, \ldots, x_n) := A_S(\chi_S \circ x)$$

A multiaffine operator $A \in A_n\bigl(\prod_{i=1}^n X_i, Y\bigr)$ is a sum of $S$-linear operators $A_S \in L_S(X^S, Y)$ when $S$ ranges over the family of coalitions:

$$A(x_1, \ldots, x_n) := \sum_{S \subset N} A_S(\chi_S \circ x) = \sum_{S \subset N} A_S(x)$$

We identify $A_{\emptyset}$ with a constant $A_0 \in Y$.
Hence the collective constraint linking multiaffine operators and actions can be written in the form:

$$\forall t \geq 0, \quad \sum_{S \subset N} A_S(t)(x(t)) \in M$$
SeN
For any $i \in S$, we shall denote by $(x_{-i}, u_i) \in X^N$ the sequence $y \in X^N$ where $y_j := x_j$ when $j \neq i$ and $y_j := u_i$ when $j = i$.
We shall denote by $A_S(x_{-i}) \in L(X_i, Y)$ the linear operator defined by $u_i \mapsto A_S(x_{-i})u_i := A_S(x_{-i}, u_i)$. We shall use its transpose $A_S(x_{-i})^* \in L(Y^*, X_i^*)$ defined by:

$$\forall q \in Y^*, \; \forall u_i \in X_i, \quad \langle A_S(x_{-i})^* q, u_i \rangle := \langle q, A_S(x_{-i})u_i \rangle$$

We associate with $q \in Y^*$ and elements $x_i \in X_i$ the multilinear operator⁶

$$x_1 \otimes \cdots \otimes x_n \otimes q \in L_n\Bigl(\prod_{i=1}^n X_i^*, Y^*\Bigr)$$

associating with any $p := (p_1, \ldots, p_n) \in \prod_{i=1}^n X_i^*$ the element:

$$(x_1 \otimes \cdots \otimes x_n \otimes q)(p) := \Bigl(\prod_{i=1}^n \langle p_i, x_i \rangle\Bigr) q \in Y^*$$

This multilinear operator $x_1 \otimes \cdots \otimes x_n \otimes q$ is called the tensor product of the $x_i$'s and $q$.
We recall that the duality product on $L_n\bigl(\prod_{i=1}^n X_i^*, Y^*\bigr) \times L_n\bigl(\prod_{i=1}^n X_i, Y\bigr)$ for pairs $(x_1 \otimes \cdots \otimes x_n \otimes q, A)$ can be written in the form:

$$\bigl\langle x_1 \otimes \cdots \otimes x_n \otimes q, A \bigr\rangle = \bigl\langle q, A(x_1, \ldots, x_n) \bigr\rangle$$

2.2 Multi-Hebbian Learning Process

Assume that we start with intrinsic dynamics of the actions $x_i$, the resources $y$, the connectionist matrices $W$ and the fuzzy coalitions $\chi$:

$$\begin{cases} i) \; x_i'(t) = c_i(x(t)), & i = 1, \ldots, n \\ ii) \; A_S'(t) = a_S(A(t)), & S \subset N \end{cases}$$

Using viability multipliers, we can modify the above dynamics by introducing regulees that are elements $q \in Y^*$ of the dual $Y^*$ of the space $Y$:

Theorem 1 Assume that the functions $c_i$, $\kappa_i$ and $a_S$ are continuous and that $M \subset Y$ is closed. Then the constraints

$$\forall t \geq 0, \quad \sum_{S \subset N} A_S(t)(x(t)) \in M$$

are viable under the control system:

$$\begin{cases} i) \; x_i'(t) = c_i(x(t)) - \sum_{S \ni i} A_S(t)(x_{-i}(t))^* q(t), & i = 1, \ldots, n \\ ii) \; A_S'(t) = a_S(A(t)) - \Bigl(\bigotimes_{j \in S} x_j(t)\Bigr) \otimes q(t), & S \subset N \end{cases}$$

where $q(t) \in N_M\Bigl(\sum_{S \subset N} A_S(t)(x(t))\Bigr)$.

Remark: Multi-Hebbian Rule - When we regard the multilinear operator $A_S$ as a tensor with components $A^j_{(i_k)_{i \in S}}$, $j = 1, \ldots, p$, $i_k = 1, \ldots, n_i$, $i \in S$, differential equation ii) can be written in the form:

$$\forall i \in S, \; j = 1, \ldots, p, \; k = 1, \ldots, n_i, \quad \frac{d}{dt} A^j_{(i_k)_{i \in S}}(t) = a^j_{(i_k)_{i \in S}}(A(t)) - \Bigl(\prod_{i \in S} x_{i, i_k}(t)\Bigr) q^j(t)$$

The correction term of the component $A^j_{(i_k)_{i \in S}}$ of the $S$-linear operator is the product of the components $x_{i, i_k}(t)$ of the actions $x_i$ in the coalition $S$ and of the component $q^j$ of the viability multiplier. This can be regarded as a multi-Hebbian rule in neural network learning algorithms, since for linear operators we find the product of the component $x_k$ of the presynaptic action and the component $q^j$ of the post-synaptic action.
Indeed, when the vector spaces $X_i := \mathbb{R}^{n_i}$ are supplied with bases $e^i_k$, $k = 1, \ldots, n_i$, whose dual bases we denote by $e^{i*}_k$, and when $Y := \mathbb{R}^p$ is supplied with a basis $f^j$ and its dual with the dual basis $f^*_j$, then the tensor products $\bigl(\bigotimes_{i \in S} e^{i*}_{i_k}\bigr) \otimes f^*_j$ ($j = 1, \ldots, p$, $k = 1, \ldots, n_i$) form a basis of $L_S(X^{S*}, Y^*)$.
Hence the components of the tensor product $\bigl(\bigotimes_{i \in S} x_i\bigr) \otimes q$ in this basis are the products $\bigl(\prod_{i \in S} x_{i, i_k}\bigr) q^j$ of the components $q^j$ of $q$ and $x_{i, i_k}$ of the $x_i$'s, where $q^j := \langle q, f^j \rangle$ and $x_{i, i_k} := \langle e^i_{i_k}, x_i \rangle$. Indeed, we can write:

$$\Bigl(\bigotimes_{i \in S} x_i\Bigr) \otimes q = \sum_{j=1}^{p} \sum_{(i_k)_{i \in S}} \Bigl(\prod_{i \in S} x_{i, i_k}\Bigr) q^j \, \Bigl(\bigotimes_{i \in S} e^{i*}_{i_k}\Bigr) \otimes f^*_j$$

3. REGULATION INVOLVING FUZZY COALITIONS

3.1 Fuzzy Coalitions

This first definition of a coalition which comes to mind being that of a subset
of players S c N is not adequate for tackling dynamical models of evolution of
coalitions since the 2n coalitions range over a finite set, preventing us from using
analytical techniques.
One way to overcome this difficulty is to embed the family of subsets of a
(discrete) set N of n players to the space R n through the map X associating with
any coalitionS E P(N) its characteristic function 7 XsE {0, 1} ncR n, since R n can
be regarded as the set of functions from N to R.
By definition, the family of fuzzy sets 8 is the convex hull [0, 1] n of the power
set {0, 1} n in R n. Therefore, we can write any fuzzy set in the form:

X= LmsXs where ms ~ 0 and Lms = 1


SEP(N) SEP(N)

The memberships are then equal to:

ViE N,xi= Lms


SJi

Consequently, if ms is regarded as the probability for the setS to be formed, the


membership of the player i to the fuzzy set9 X is the sum of the probabilities of
the coalitions to which player i belongs. Player i participates fully in X if Xi = 1,
does not participate at all if Xi = 0 and participates in a fuzzy way if Xi E ]0, 1[.
22 Chapter I

We associate with a fuzzy coalition X the set P(X) := {i E N I X; ::F 0} c N of


agents i partipating to the fuzzy coalition X·
We also introduce the membership:

YsCX) :=II x1
}E S

of a coalition S in the fuzzy coalition X as the product of the memberships of


agents i of the coalition S. It vanishes whenever one the membership of one agent
does and boils down to individual memberships for one agent coalitions. when
two coalitions are disjoint (S n T = 0), then 'YsuT (X) = Ys(X)YT(x). In particular,
for any agent i E S, Ys(X) = Xi'YsliCX).

(II X; , Y), a sum of S-linear As.


n
Let A E An E Ls(X', Y) when S ranges over
i=l
the family of coalitions, be a multiaffine operator.
When X is a fuzzy coalition, we observe that:

A (Xo x) = I n-(X) As(x)


ScP(X)

I (IIx;:
Sc P (z) jES
As(x)

We wish to encapsulate the idea that at each instant, only a number of fuzzy
coalitions X are active. Hence the collective constraint linking multiaffine
operators, fuzzy coalitions and actions can be written in the form:

V t ?_ 0, I "{s (X(t)) A s (t)(x(t))


ScP(z(t ))

I (IIx;Ct))
Sc P(z(t )) ; ES
As (t)(x(t)) EM
Evolution ofcomplex economic systems and uncertain information 23

3.2 Constructing Viable Dynamics

Assume that we start with intrinsic dynamics of the actions x;, the resources y,
the connectionist matrices Wand the fuzzy coalitions X

i) x;(t) = c,(x(t)), i = l, ... ,n


{ ii)X;~t)=K,(X(t)), i=l, ... ,n
iii)As(t)=as(A(t)) , SeN

Using viability multipliers, we can modify the above dynamics by introducing


regulees that are elements q E Y* of the dual Y* of the space Y :

Theorem 2 Assume that the functions C;, K; and as are continuous and that M c
Yare closed. Then the constraints

\1 t ~0, L As(t) (X(t) o x(t))


ScP(z(r))

:L
ScP(z(t))
[Ilx/t))
JES
As (t)(x(t)) eM

are viable under the control system:

i = l, ... ,n

i = l,..., n

iii) A~(t) = a s (A(t)) -2:[rr X/t))( <?? x (t)) ® q(t),


S3t JE S J
1

SeN

where q(t) EM["N L..s c P(z(t))


(rrxJ
jES
(t))As (t)(x(t))l
24 Chapter 1

Let us comment these formulas. First, the viability multipliers q(t) E Y* can
be regarded as regulons, i.e., regulation controls or parameters, or virtual prices
in the language of economists. They are chosen adequately at each instant in
order that the viability constraints describing the network can be satisfied at each
instant, and the above theorem guarantees that it is possible. The next section
tells us how to choose at each instant such regulons (the regulation law).
For each agent i, the velocities x' ;(t) of the state and the velocities x' ;(t) of its
membership in the fuzzy coalition X (t) are corrected by subtracting:
I. the sum over all coalitions S to which he belongs of the As (t)(x_;(t))*q(t)
weighted by the membership Ys (X(t)):

x';(t) =c;(x;(t))- L Ys (X(t)) As(t)(x.;(t))*q(t)


SJi

2. the sum over all coalitions S to which he belongs of the costs (q(t), A s (t) (x(t)))
of the constraints associated with connectionist tensor As of the coalition S
weighted by the membership n-1;(X(t)):

X';(t) = K;{l(t))- L Ysi;(X(t)) (q(t), As(t)(x(t)))


SJi

This type of dynamics describes a panurgean effect. The (algebraic) increase of


agent i's membership in the fuzzy coalition aggregates over all coalitions to
which he belongs the cost of their constraints weighted by the products of
memberships of the agents of the coalition other than him.

As for the correction of the velocities of the connectionist tensors As, their
correction is a weighted "multi-Hebbian" rule: for each component product of
components A in
of As, the correction term is the product of the membership r
ieSik

S(X(t)) of the coalitionS, of the components X;


k
(t) and of the component c/ (t) of
the regulon:

3.3 The Regulation Map

Actually, the viability multipliers q(t) regulating viable evolutions of the


actions x;(t), the fuzzy coalitions X (t) and the multiaffine operators A (t) obey the
Evolution of complex economic systems and uncertain information 25

regulation law (an "adjustment law", in the vocabulary of economists) of the


form:

Vt ~0, q(t) E RM ( x(t), X(t), A(t))

where RM : X' x Rn x An(X', Y) ~ Y* is the regulation map RM that we can


compute.
For that purpose, we introduce the operator h :XV x Rn x An(XV, Y) defined
by:

h(x, x, A) :=I, A.s{xox)


SeN

and the linear operator H(x, X A) : Y* := Y H Y defined by:

Then the regulation map is defined by:

RM ( x, x, A) := H(x, X AY 1( I, (a s(A) x) +I, (y.s{x)As(x_;, c;(x)) +


SeN iES

YSI;(X)K;(X)As(x)))- TM(h(x, z, A)))

NOTES
l. The theory of tychastic control (or "robust control") can be studied in the framework of
dynamical games, when one player plays the role of Nature that chooses - plays -
perturbations. These perturbations, disturbances, parameters that are not under the control of the
controller or the decision-maker, could be called "random variables" if this vocabulary was not
already confiscated by probabilists. We suggest to borrow to Charles Peirce the concept of
tyche, one of the three words of classical Greek meaning "chance", and to call in this case the
control system as a tychastic system. See Aubin, Pujal, Saint-Pierre (2001) for more details.
2. Non mathematical accounts of such questions can be found in Aubin (to be edited).
3. Physicists and computer scientists have attempted to measure it in various ways, through the
concept of Clausius's entropy, Claude Shannon's information, Gilbert Chauvet's nonsymmetric
information, the degree of regularity instead of randomness, "hierarchical complexity" in the
26 Chapter 1

display of level of interactions, Andrei Kolmogorov, Gregory Chaitin & Ray Solomono_
"algorithmic information contents" (see Chaitin (1992) for instance) and other temporal or
spatial computational complexity indices measuring the computer time or the amount of
computer memory needed to describe a system, "grammatical complexity" measuring the
language to describe it, etc. Some economists link complexity issues with chaos theory as in
Day (1994; to be edited) for instance. Other investigators link complexity issues with
catastrophe theory, or fractals. See among many references (Peliti, Vulpiani 1987). Physicists -
and among them, specialists of "spin glasses" such as Giorgio Parisi ( 1990; 1992; 1996)-
proposes the number of equilibria of a dynamical system as a characteristic of complexity. Or,
even more to the point, "quasi equilibria", that are "small" areas of the state space in which the
evolution remains a "long time", before "jumping quickly" to another quasi equilibrium. The
concept of static and dynamical "connectionist complexity" indices when connectionist matrices
are used as regulons to regulate viable solutions and to the search of evolution minimizing at
each instant those indices was introduced in Aubin (1998b).
4. For simplicity, the set M(t) is assumed to be constant. But they could also evolve through
mutational equations and the following results can be adapted to this case. Curiously, the overall
architecture is not changed when the set of available resources evolves under a mutational
equation. See Aubin ( 1999) for more details on mutational equations.
5. That are nothing other than matrices when the operators are linear instead of multilinear.
Tensors are the matrices of multilinear operators, so to speak, and their "entries" depend upon
several indexes instead of the two involved in matrices.

IT X; ,Y) of n-linear operators from IT X;


n n
6. We recall that the space Ln( to Y is isometric to
i=l i=l
n n

the tensor product ®xi*®


i=]
Y, the dual of which IS ® x. ®Y that
i=] I
IS isometric with

IT X; *, Y*) .
n
Ln(
i=l
7. This canonical embedding is more adapted to the nature of the power set P(N) than the universal
embedding of a discrete set M of m elements to Rm by the Dirac measure associating with any j
E M the jth element of the canonical basis of Rm. The convex hull of the image of M by this
embedding is the probability simplex of Rm. Hence fuzzy sets offer a "dedicated
convexification" procedure of the discrete power set M := P(N) instead of the universal
convexification procedure of frequencies, probabilities, mixed strategies derived from its
2
embedding in Rm = R n
8. This concept of fuzzy set was introduced in 1965 by L. A. Zadeh. Since then, it has been wildly
successful, even in many areas outside mathematics! Lately, we found in "La lutte finale",
Michel Lafon (1994), p.69 by A. Bercoffthe following quotation of the late Franyois Mitterand,
president of the French Republic (1981-1995): "Aujourd'hui, nous nageons dans Ia poesie pure
des sous ensembles flous" ... (Today, we swim in the pure poetry of fuzzy subsets)!
9. Actually, this idea of using fuzzy coalitions has already been used in the framework of
cooperative games with and without side-payments (Aubin 1981 a; 1981 b; 1979 Chapter 12;
1998a Chapter 13; Mares 200 I; Mishizaki, Sokawa 200 I; Basile 1993; 1994; to be edited;
Basile, De Simone, Graziano 1996; Florenzano 1990). Fuzzy coalitions have also been used in
dynamical models of cooperative games in (Aubin, Cellina 1984 Chapter 4) and of economic
theory in Aubin (1997 Chapter 5).
Evolution of complex economic systems and uncertain information 27

REFERENCES
Aubin J.-P. (1979), "Mathematical Methods of Game and Economic Theory", Studies in
Mathematics and its applications, 7, North- Holland.
Aubin J.-P. (1981a), "Cooperative fuzzy games", Mathematical Operational Research, 6, l-13 .
Aubin J.-P. (1981 b), "Locally lipchitz cooperative games", J. Math. Economics, 8, 241-262.
Aubin J.-P. (1982), "An alternative mathematical description of a player in game theory", IIASA
WP, 82-122.
Aubin J.-P. (1991), Viability Theory, Birkhliuser, Boston, Basel, Berlin.
Aubin J.-P. (1993), "Beyond Neural Networks: Cognitive Systems", in Demongeot J., Capasso
(eds.) Mathematics Applied to Biology and Medicine, Wuers, Winnipeg.
Aubin J.-P. (1995), "Learning as adaptive control of synaptic matrices", in Arbib M. (ed.) The
handbook of brain theory and neural networks, Bradford Books and MIT Press.
Aubin J.-P. (1996), Neural Networks and Qualitative Physics: A Viability Approach, Cambridge
University Press.
Aubin J.-P. (1997) Dynamic Economic Theory: a Viability Approach, Springer-Verlag.
Aubin J.-P. (1998 a), Optima and equilibria (2nde edition), Springer-Verlag.
Aubin J.-P. (1998 b), "Connectionist complexity and its evolution", in Equations aux deriw!es
partielles, Articles dedies aJ.-L. Lions, Elsevier, 50-79.
Aubin J.-P. (1998 c), "Minimal complexity and maximal decentralization", in Beckmann H.J.,
Johansson B., Snickars F, Thord D. (eds.) Knowledge and Information in a Dynamic Economy,
Springer, 83-l 04.
Aubin J.-P. (1999), Mutational and morphological analysis: tools for shape regulation and
morphogenesis, Birkhiiuser.
Aubin J.-P. (2002), "Dynamic Core of Fuzzy Dynamical Cooperative Games, Annals of Dynamic
Games", Ninth International Symposium on Dynamical Games and Applications, Adelaide,
2000.
Aubin J.-P. (to be edited), La mort du devin, !'emergence du demiurge. Essai sur Ia contingence,
l 'inertie et Ia viabi/ite des systemes.
Aubin J.-P., Burnod Y. (1998), "Hebbian Learning in Neural Networks with Gates", Cahiers du
Centre de Recherche Viabilite, Jeux, Contr6le 981.
Aubin J.-P., Cellina A. (1984), Differential Inclusions, Springer-Verlag.
Aubin J.-P., Dordan 0 . (1996), "Fuzzy Systems, Viability Theory and Toll Sets", in Hung Nguyen
(ed.) Handbook ofFuzzy Systems, Modeling and Control, Kluwer, 461-488.
Aubin J.-P., Foray D. (1998), "The emergence of network organizations in processes of
technological choice: a viability approach", in Cohendet P., Llerena P., Stahn H., Umbhauer G.
(eds.), The economics of networks, Springer, 283-290.
Aubin J.-P., Frankowska H. (1990) Set-Valued Analysis.
Aubin J.-P., Louis-Guerin C., Zavalloni M. (1979) Comptabilite entre conduites sociales reelles
dans les groupes et les representations symboliques de ces groupes : un essai de formalisation
mathematique, Math. Sci. Hum., 68, 27-61.
Aubin J.-P. , Pujal D., Saint-Pierre P. (2001), Dynamic Management of Portfolios with Transaction
Costs under Tychastic Uncertainty, preprint.
Basile A., De Simone A., Graziano M.G. (1996), "On the Aubin-like characterization of
competitive equilibria in infinite-dimensional economies", Rivista di Matematica perle Scienze
Economiche e Sociali, 19, 187-213.
Basile A. (1993), "Finitely additive nonatomic coalition production economies: Core-Walras
equivalence", Int. Econ. Rew., 34,993-995.
Basile A. (1994), "Finitely additive correpondences", Procedings AMS 121, 883-891.
28 Chapter I

Basile A. (to be edited) On the range of certain additive correspondences, Universita di Napoli
Bonneuil N. (2000), "Viability in dynamic social networks", Journal of Mathematical Sociology,
24, 175-182.
Bonneui1 N. (1998) "Games, equilibria, and population regulation under viability constraints: An
interpretation of the work of the anthropologist Fredrik Barth", Population: An English
selection, special issue ofNew Methodological Approaches in the Biological Sciences, 151-179.
Bonneuil N. (1998), "Population paths implied by the mean number of pairwise nucleotide
differences among mitochondrial sequences", Annals of Human Genetics, 62, 61-73.
Bonneuil N., Rosenta1 P.-A. (2002), "Changing social mobility in 19th century France", Historical
Methods, Spring, 32, 53-73.
Bonneuil N., Saint-Pierre P. (2000), "Protected polymorphism in the theo-locus haploid model with
unpredictable firnesses", Journal of Mathematical Biology, 40, 251-377 .
Bonneuil N., Saint-Pierre P. (1998), "Domaine de victoire et strategies viables dans le cas d'une
correspondance non convexe : application a l'anthropologie des pecheurs selon Fredrik Barth",
Mathematiques et Sciences Humaines, 132, 43-66.
Chaitin G.J. (1992), Algorithmic information theory, Cambridge University Press.
Chauvet G. (1995), La vie dans Ia matiere, Flammarion.
Day R.H. (1994) Complex Economic Dynamics, Vol. I, An introduction to dynamical systems and
market mechanims, MIT Press.
Day R.H. (to be edited) Complex Economic Dynamics, Vol. II, An introduction to macroeconomic
dynamics, MIT Press.
Deghdak M., Florenzano M. (1999), "Decentralizing Edgeworth equilibria in economies with many
commodities", Economic Theory, 14, 287-310.
Elton C. (1958), The ecology of invasion in plants and animals, Cambridge University Press.
Filar J.A., Petrosjan L.A. (2000), Dynamic cooperative games, International Game, Theory Review,
2, 47-65.
Florenzano M. (1990), "Edgeworth equilibria, fuzzy core and equilibria of a production economy
without ordered preferences", Journal of Math. Anal. Appl., 153, 18-36.
Florenzano M. , Marakulin V.M. (2001), "Production equilibria in vector lattices", Economic
Theory,20.
Henry C. ( 1972), "Differential equations with discontinuous right hand side", Journal of Economic
Theory, 4, 545-551.
Hutchinson G.E. (1959), "Hommage to Santa Rosalia, or why there are so many kinds of animals",
American Naturalist, 93, 145-159.
Ioannides Y.M. (1997), "Evolution of trading structures", in Arthur, Durlauf, Lane (eds.) The
Economy as an evolving complex system, Addison-Wesley.
Lamarck J.-B. (1809), Philosophie biologique.
Livi R., Ruffo S. , Ciliberto S. , Buatti M. (eds) (1988), Chaos and complexity, Word Scientific.
Mares M. (2001), Fuzzy cooperative games. Cooperation with vague expectations, Physica Verlag
May R.M. ( 1973), Stability and complexity in model ecosytems, Princeton University Press.
Mayr E. ( 1988), Toward a new philosophy of biology, Harvard University Press.
Mishizaki 1., Sokawa M. (2001), Fuzzy and multiobjective games for conflict resolution, Physica
Verlag.
Parisi G. (1990), "Emergence of a tree structure in complex systems", in Solbrig O.T., Nicolis C.
(eds.) Perspectives on biological complexity, IUBS monograph series, 6.
Parisi G. (1992), Order, disorder and simulations, World Scientific.
Parisi G. (1996), Sulla complessita, inFra ordine e caos, Tumo M., Liotta E., Oruscci F. (eds),
Cosmopoli.
Peliti L., Vulpiani A. (eds) (1987) Measures of complexity, Springer-Verlag.
Evolution of complex economic systems and uncertain information 29

Petrosjan L.A. (200 1), "Dynamic Cooperative Games", Annals ofDynamic Games.
Petrosjan L.A., Zenkevitch N.A. (1996), Game Theory, World Scientific.
Rockafellar R.T., Wets R. (1997), Variational Analysis, Springer-Verlag.
Saari D.G. (1995), Mathematical complexity of simple economics, Notices of AMS.
Saint-Pierre P. (200 1), "Approximation of capture basins for hybrid systems", Proceedings of the
ECC 2001 Conference.
Shimokawa T., Pakdaman K., Takahata T., Tanabe S. Sato S. (to be edited), "A first-passage-time
analysis of the periodically forced noisy leaky integrate-and-fire model", Biological cybernetics.
Shimokawa T., Pakdaman K. & Sato S. (1999) Coherence resonance in a noisy leaky integrate-
and-fire model.
Shimokawa T., Pakdaman K., Sato S. (1999), "Time-scale matching in the response of a leaky
integrate-and-fire neuron model to periodic stimulation with additive noise", Physical Review E,
59, 3427-3443.
Smith J.M. (1974), Models in ecology, Cambridge University Press.
Weaver W. (1948), "Science and complexity", American Scientist, 36, 536.
Wigner E. (1960), "The unreasonable effectiveness of mathematics m the natural sciences",
Communications in Pure and Applied Mathematics, 13, 1.
Chapter 2

Possibilistic case-based decisions

Didier DUBOIS\ Eyke HULLERMEIERb, Henri PRADEa


aIRIT; lnstitut de Recherche en lnformatique de Toulouse. bUniversity of Marburg,
Department of Mathematics and Computer Science

Abstract: The idea of case-based decision making has recently been proposed as an alternative
to expected utility theory. It combines concepts and principles from both decision
theory and case-based reasoning. Loosely speaking, a case-based decision maker
learns by storing already experienced decision problems, along with a rating of the
results. Whenever a new problem needs to be solved, possible actions are assessed
on the basis of experience from similar situations in which these actions have
already been applied. We formalize case-based decision making within the
framework of fuzzy sets and possibility theory. The basic idea underlying this
approach is to give preference to acts which have always led to good results for
problems which are similar to the current one. We also propose two extensions of
the basic model. Firstly, we deal separately with situations where an agent has made
very few, if any, observations. Obviously, such situations are difficult to handle for
a case-based approach. Secondly, we propose a reasonable relaxation of the original
decision principle, namely to look for acts which have yielded good results, not
necessarily for all, but at least for most cases in the past.

Key words: Decision theory, Case-based reasoning, Possibility theory, Fuzzy sets.

INTRODUCTION

Early work in artificial intelligence (AI) has mainly focused on formal logic
as a basis for knowledge representation and has largely rejected approaches from
(statistical) decision theory as being intractable and inadequate for expressing the
rich structure of (human) knowledge (Horvitz, Breese, Henrion 1988). However,
the recent development of more tractable and expressive decision-theoretic

31

C. Lesage et al. (eds.), Connectionist Approaches in Economics and Management Sciences


© Springer Science+Business Media Dordrecht 2003
32 Chapter 2

frameworks and inference strategies, such as e.g. graphical formalisms (Pearl


1988; Heckerman, Geiger, Chickering 1995) in combination with the analysis of
restrictions of traditional AI reasoning techniques have stimulated renewed
interest in decision theory. In fact, ideas from decision theory now play a
predominant role in the modeling of rationality, one of the major topics of
current research in AI (Doyle, Dean 1997). Loosely speaking, the AI paradigm
has undergone a shift from "acting logically" to "acting rationally" (Russell,
Norvig 1995). The related view of intelligent behavior deviates fundamentally
from the classical "logicist" approach. While the latter emphasizes the ability to
reach correct conclusions from correct premises, the decision-theoretic approach
considers AI as the design of (limited) rational agents (Russell, Wefald 1991).
For this "agent-based" view of AI, intelligence is strongly related to the capacity
of successful behaviour in complex and uncertain environments and, hence, to
rational decision making.
Decision theory and AI can fertilize each other in various ways (Pomerol
1997). As already suggested above, classical decision theory provides AI with
important ideas and concepts of rationality, thus contributing to a formal basis of
intelligent agent design. Yet, it has been less concerned with computational and
knowledge representational aspects. AI can particularly contribute in this
direction. As concerns the aspect of knowledge representation, for instance,
research in AI has shown various possibilities of extending the decision-theoretic
frameworks usually considered in classical approaches. Recent developments
include the modeling of decision problems within qualitative (Brafmann,
Tennenholz 1996; Dubois, Prade 1995; Dubois, Prade, Sabadin 1998) and
constraint-based (Fargier, Lang, Schiex 1995) settings and make use of formal
logic in order to represent the knowledge of a decision maker in a more flexible
way (e.g. (Boutilier 1994; Dubois, Prade, Yager 1999, Pearl 1993)). These
approaches are intended to make decision-theoretic models more realistic,
tractable and expressive.
Here, we are mainly concerned with the idea of case-based decision making
(CBDM) which is originally due Gilboa and Schmeidler (1995; 2001). As the
notion suggests, CBDM is largely motivated by ideas from case-based
reasoning (CBR), by now a widely applied problem solving technique with
roots in cognitive psychology and artificial intelligence (Kolodner 1993;
Riesbeck, Shank 1989). In case-based reasoning, new problems are solved by
recalling experience from previously solved problems which are stored in a
memory of cases. This experience is exploited against the background of the
assumption that "similar problems have similar solutions". In CBDM, the same
principle is applied in the context of decision making: An agent faced with a
decision problem relies upon its experience from similar problems encountered in
Possibilistic case-based decisions 33

the past. Loosely speaking, it chooses an act based on the performance of


(potential) acts in previous problems which are similar to the current one.
Even though the model in Gilboa, Schmeidler (1995) has mainly been
introduced with economic applications in mind, C B D M is particularly
interesting from an AI perspective. Firstly, it combines principles from two
important subfields of AI, namely decision theory and CBR. Secondly, it
touches on interesting aspects of knowledge representation and reasoning. In fact,
the mental notions of preference and belief constitute the main concepts of
classical decision theories. Corresponding mathematical models are based on
formalizations of these concepts, such as e.g. preference relations, utility
functions, and probability distributions. The aforementioned approach of Gilboa
and Schmeidler leads to a decision theory in which the cognitive concept of
similarity plays a predominant role. Needless to say, incorporating this concept
into formal approaches to decision making raises some interesting (semantic)
questions. Particularly, it has to be clarified which role similarity plays and,
hence, what the relation between this and other concepts such as preference and
belief could be. Clearly, this question concerns basic assumptions underlying a
decision-theoretic model. One should therefore not expect to find definite
answers. Classical works by Ramsey (1931), De Finetti (1937), Von Neumann
and Morgenrstern (1953), and Savage (1954) as well as recent developments in
the field of decision theory, such as e.g. non-additive expected utility
(Schmeidler 1989, Gilboa 1987) or qualitative decision making, show various
ways of formalizing the notions of preference and belief (including measure-
theoretic approaches, such as e.g. fuzzy measures (Wakker 1990) and different
types of probability (De Finetti 1973), as well as more logic-oriented symbolic
methods (Tan, Pearl 1994)( Moreover, a consensus concerning the actual
meaning of the concept itself seems to exist even less in the case of similarity
than in the case of preference or uncertainty.
In Section 1, we provide a brief review and discussion of case-based decision
theory as introduced by Gilboa and Schmeidler. An alternative approach to
CBDM based on fuzzy set and possibility theory will be discussed in Section 2.
Two extensions of this approach are proposed in Sections 3 and 4. These
extensions can be motivated, among other things, by the idea of repeated decision
making, which arises quite naturally within the context of case-based reasoning
where new cases are encountered and, hence, experience is accumulated over
time. Firstly, an agent will generally have made very few, if any, observations at
the beginning of a decision sequence. This lack of experience does inevitably
cause problems for a case-based approach to decision making which will always
seem more or less arbitrary and, hence, will be open to criticism in such
situations. We approach this problem by allowing for some kind of
"hypothetical" reasoning in connection with a generalized evaluation of acts
34 Chapter 2

which allows for the representation of uncertainty. Secondly, the case-based


valuation of acts according to the original model appears to be rather drastic
since it is based on a sort of worst case evaluation. Indeed, the original decision
rule which requires good results for any (similar) case may lead to undesirable
consequences if an agent has to act repeatedly. Therefore, we relax it by looking
for acts which have yielded good results at least in most cases in the past.

1. CASE-BASED DECISION THEORY

This section gives a brief review of the model introduced by Gilboa and
Schmeidler (1995), referred to as case-based decicion theory (CBDT) by the
authors. Putting it in a nutshell, the setup they proceed from can be characterized
as follows : Let Q and A be (finite) sets of problems and acts, respectively, and
denote by R a set of results or outcomes. Choosing act a E A for solving problem
p E Q leads to the outcome r = r(p,a) E R. A utility function u: R~U, resp. u: Q
x A -fU assigns utility values to such outcomes; the utility scale U is taken as
the set of real numbers. Let

O"Q :Q xQ -7 [0,1], O"R : R xR -7 [0,1]

be similarity measures quantifying the similarity of problems and results,


respectively. Suppose the decision making agent to have a (finite) memory:

(Eq.l)

of cases at its disposal, where (pk,ak)EQ xA, rk =r(pk , ak) (l~k~n),


and suppose furthermore that it has to choose an act for a new problem p 0 E Q .
If a certain act a0 E A has not been applied to the problem p 0 so far (i.e. there
is no case (p 0 ,a0 ,r)E M ) the agent will generally be uncertain about the result
r(p0 ,a0 ) and, hence, about the utility u(r(p0 ,a0 )). According to the
assumption underlying the paradigm of CBDT it then evaluates an act based on
its performance in similar problems in the past, as represented by (parts of) the
memory M . More precisely, the decision maker is supposed to choose an act
which maximizes a linear combination of the benefits experienced so far:

V(ao) = VPo.M (ao) = L


(p. a0 ,r )eM
O"Q (p, Po). u(r). (Eq.2)

The summation over an empty set yields the "default value" 0 which plays the
role of an "aspiration level." Despite the formal resemblance between (Eq.2) and
Possibilistic case-based decisions 35

the well-known expected utility formula one should not ignore some substantial
differences between CBDT and expected utility theory (EUT). This concerns
not only the conceptual level but also mathematical aspects. Particularly, it
should be noted that the similarity weights in (Eq.2) do not necessarily sum up to
1. Consequently, (Eq.2) must not be interpreted as an estimation of the utility
u(r(p 0 ,a0 )). As an alternative to the linear functional (Eq.2), an "averaged
similarity" version has been proposed. It results from replacing O"Q in (Eq.2) by
the similarity measure:

(Eq.3)

whenever the latter is well-defined. (Note that this measure is defined separately
for each act a0 .) Theoretical details of CBDT including an axiomatic
characterization of decision principle (Eq.2) are presented in (Gilboa, Schmeidler
1995).
The basic model has been generalized concerning several aspects. The
problem of optimizing decision behavior by adjusting the aspiration level in the
context of repeated problem solving is considered in (Gilboa, Schmeidler 1996).
In (Gilboa, Schmeidler 1997), the similarity measure in (Eq.2) is extended to
problem-act tuples: Given two similar problems, it is assumed that similar
outcomes are obtained for similar acts (not only for the same act). Indeed, it is
argued convincingly that a model of the form:

V(a 0 ) = L O"QxA ((p,a),(p 0 ,a0 )) · u(r), (Eq.4)


(p ,a ,r)aM

where O"QxA is a (problem-act) similarity measure over Q xA , is more realistic


than (Eq.2). For example, an act a0 which has not been applied as yet is
generally not evaluated by the default utility 0 if experiences with a comparable
act a have been made. In fact, an outcome r(p,a) will then influence the rating
of a0 in connection with a problem p 0 which is similar to p . Besides, it should
be noticed that (Eq.4) allows for realizing some kind of analogical reasoning.
Suppose, for instance, that the effect expected from applying a0 to Po is
comparable to the effect of applying a to p . In that sense, ( a0 , p 0 ) might
appear to be quite similar to (a, p), although a and a0 as well as p and p 0 as
such are rather dissimilar.
With regard to alternative models of C B D M proposed in subsequent sections
it is useful to picture again the following properties of the decision criteria
outlined above:
36 Chapter 2

Accumulation/averaging: The criteria (Eq.2) and (Eq.4) realize a simple


summation of (weighted) degrees of utility. Consequently, a decision maker
might prefer an act a , which always brought about rather poor results, to an
act a* which has so far yielded very good results, simply because a has been
tried more often than a* . This effect is annulled by (Eq.3), where the use of a
normalized similarity measure yields an average utility.
- Compensation: Both decision rules compensate between good results and bad
results associated with an act a .

2. POSSIBILISTIC CASE-BASED DECISIONS


Following ideas presented in (Dubois, Prade 1997), case-based decision
making has been realized in (Dubois, Esteva, Garcia et al. 1997) as a kind of
similarity-based approximate reasoning. This approach is in line with methods of
qualitative decision theory. In fact, the assumption that uncertainty and
preference can be quantified by means of, respectively, a precise probability
measure and a cardinal utility function (as it is assumed in classical decision
theory) does often appear unrealistic. As opposed to (Eq.2), the approach
discussed in this section only assumes an ordinal setting for modeling decision
problems, i.e. ordinal scales for assessing preference and similarity. This
interpretation should be kept in mind, especially since both scales will
subsequently be taken as (subsets of) the unit interval.
Let ~ be a multiple-valued implication connective, that is a binary operator
(O,l]x(O,l] ~ (0,1] which is (at least) non-increasing in its first and non-
decreasing in its second argument. Given a memory M and a new problem p 0 ,
the following (estimated) utility value is assigned to an act a E A

t .
V M(a)= min aQ(P,Po) ~ u(r) (Eq.5)
Po, (p,a,r)EM

This valuation supports the idea of finding an act a which has always resulted in
good outcomes for problems similar to the current problem Po . Indeed, (Eq.5)
can be considered as a (generalized) truth degree of the claim that "whenever a
has been applied to a problem p similar to p 0 , the corresponding outcome has
yielded a high utility." An essential idea behind (Eq.5) is that of avoiding the
accumulation and compensation effect caused by the decision criterion (Eq.2)
since these effects do not always seem appropriate 2 .
As a special realization of (Eq.5) the valuation:
Possibilistic case-based decisions 37

is proposed, where h is an order-preserving function which assures the linear


scales of similarity and preference to be commensurable and n is the order-
reversing function of the similarity scale. By taking n as x ~ 1- x in [0, 1] and
h as the identity, we obtain:

(Eq.6)

This criterion can obviously be seen as a qualitative counterpart to (Eq.2).


Besides, the criterion:

(Eq.7)

is introduced as an optimistic counterpart to (Eq.6). It can be seen as a


formalization of the idea to find an act a for which there is at least one problem
which is similar to p 0 and for which a has led to a good result. Again, let us
mention that expressions (Eq.6) and (Eq.7) are closely related to decision criteria
which have recently been derived in (Dubois et al. 1995) in connection with an
axiomatic approach to qualitative decision making under uncertainty.
In the more general context of problem-act similarity, the decision rules
(Eq.6) and (Eq.7) become:

{ } (Eq.8)
V l M(a 0 )=. mm. max l-O'QxA((p, a),(p 0 ,a0 )),u(r)
Po. (p ,a ,r)EM

i . . f } (Eq.9)
V M (ao)= max mmlO'QxA ((p,a),(p 0 ,a0 )),u(r)
Po. (p,a,r)EM

In order to make the basic principles underlying the above criteria especially
obvious, suppose the qualitative utility scale to be given by U ={0, 1} .That is,
only a crude distinction between "bad" and "good" outcomes is made. (Eq.8) and
(Eq.9) can then be simplified as follows:

l . (Eq.lO)
V M(ao)=I - max O'QxA ((p,a),(p 0,a0 ))
Po• (p,a,r)EM:u(r)=O
38 Chapter 2

i (Eq.11)
V M(a0 )= max O'QxA((p,a),(po,ao))
Po, (p ,a,r)EM:u(r)=1

According to (Eq.lO), the decision maker only takes cases (p,a,r) with bad
outcomes into account. An act a0 is discounted whenever (p 0 , a 0 ) is similar to a
corresponding problem-act tuple (p, a). Thus, the agent is cautious and looks
for an act that it does not associate with a bad experience. According to (Eq.11),
it only considers the cases with good outcomes. An act a0 appears promising if
(p 0 ,a0 ) is similar to a tuple (p,a) which has yielded a good result. In other
words, the decision maker is more adventurous and looks for an act that it
associates with a good experience.
Alternative formalizations of C B D M have also been proposed in (Dubois,
Godo, Prade, Zapico 1998; Hiillermeier 1998; Hiillermeier 1999). For estimating
the utility of an act in connection with a new problem, these methods make use of
observed cases by more indirect means than the approaches discussed so far.
More precisely, a memory M of cases is used for deriving a quantification of
how likely a certain act will yield a certain outcome (utility). The (case-based
reasoning) hypothesis underlying these approaches is the assumption that "the
more similar two problems are, the more likely it is that an act leads to similar
results." Within the framework of (Hiillermeier 1998), where likely means
probable, a probability distribution on the set R of outcomes is derived from a
memory M . Likewise, possibility distributions are obtained in connection with
the possibilistic frameworks in (Dubois, Godo, Prade, Zapico 1998) and
(Hiillermeier 1999), where likely means possible and certain, respectively.

3. COPING WITH UNCERTAINTY

There are several kinds of uncertainty which might become relevant in


connection with C B D M. A first source of uncertainty concerns the observed
cases. A problem which occurs frequently, e.g., in connection with experimental
data, is that of imprecise observations, i.e., outcomes which cannot be observed
exactly. This kind of uncertainty can be taken into account, e.g., by modeling
outcomes as fuzzy sets R E F (R ) , where F (R ) denotes the set of all (normal)
fuzzy subsets of the set R of results.
It might also become necessary to give up the assumption that a problem
pE Q and an act aE A determine a unique outcome r(p,a). For instance, a
case-based reasoning framework might be assumed in which results are treated as
random variables. Again, there are different motivations for such a non-
deterministic setting. For example, the process which determines the result
associated with a problem p and an act a might indeed be subject to some
Possibilistic case-based decisions 39

random influences. A second motivation, which seems to be of considerable


practical relevance, is related to the completeness, precision, and granularity of
information. Even though the application of a certain act might principally
determine the outcome, the characterization of the problem might be imprecise,
incomplete, or not detailed enough. Thus, choosing an act a for repeatedly
solving the (apparently) same problem p might result in different outcomes. An
uncertainty measure associated with a problem/act tuple (p,a) is then used for
characterizing the true but unknown result. Consider the case where the
description of the problem p contains missing attribute values as an example.
The uncertainty concerning the outcome r(p, a) is then directly related to the
uncertainty concerning the values of these attributes. A further example is the
problem of decision making in game playing. Namely, the outcome associated
with a certain act will generally depend on the decision of the opponent as well.
The latter, however, is not part of the problem description.
Here, we are particularly concerned with a second source of uncertainty
which actually corresponds to a lack of information, and which is not related to
the observed cases. Rather, it concerns the cases which have not been
encountered so far. By this we mean the problem that a case-based decision
procedure will inevitably get into trouble, or at least become dubious, if not
enough cases have been observed. The assignment of the "default value" 0 in
connection with (Eq.2), for instance, might appear somehow arbitrary. The
alternative models of CBDM mentioned at the end of Section 1.3 seem
advantageous with respect to this problem (Htillermeier 1999). The fact that no
cases or, more generally, no similar cases have been observed so far can be
modeled adequately by means of the possibility distribution J[ = 1 on R .
Namely, the latter is an expression of complete ignorance, which cannot be
depicted by less expressive scalar estimations such as (Eq.2) and (Eq.6).
Problems caused by a lack of information also occur in connection with
(Eq.6). As pointed out in (Dubois, Esteva, Garcia et al. 1997) this valuation only
makes sense if the memory contains at least one problem p such that
O"Q (p, Po) = 1, and where a has been chosen for solving p . Otherwise, it may
happen that (Eq.6) is very high even though none of the problems contained in
the memory is similar to the current problem p 0 . Particularly:

which does not seem satisfactory.


40 Chapter 2

Modifications of (Eq.6) and its optimistic counterpart have been proposed in


order to cope with these difficulties. The modified measures are based on some
kind of normalization of the similarity function for each act a , and a discounting
of (Eq.6) and (Eq.7) which takes the absence of problems similar to p 0 into
account. More precisely, the modified version of (Eq.6) is given by:

M(a)=minJ. M(a,p 0 ), min max{l-aQ' (p,p 0 ),u(r)}}


Vp.j,
O• yz (p,a,r)EM
(Eq.12)

where:

and a' (-, p 0 ) denotes a renormalization of O"Q (-, p 0 ) such as, e.g.,
aQ(-,p0)!hM (a,p 0 ) (assuming hM (a,p 0 )>0.) The idea behind (Eq.12) is
that the willingness of a decision maker to choose act a is upper bounded by the
existence of problems which are completely similar to p 0 , and to which a has
been applied. Moreover, aQ (-, p 0 ) is renormalized in order to obtain a
meaningful degree of inclusion. Thus, (Eq.12) corresponds to the compound
condition that "there are problems similar to p 0 to which act a has been
applied, and the problems which are most similar to p 0 are among the problems
for which a has led to good results." Observe that (Eq.6) is retrieved from
(Eq.12) as soon as hM (a,p 0 ) = 1.
We shall now propose a generalization of (Eq.6) which can handle the two
above-mentioned sources ofuncertain ty in a unified way, and which is also able
to express uncertainty in connection with the valuation of an act. To this end, it
should first be noticed that we can write (Eq.6) as:

(Eq.13)

where the values 0 = a 0 < 0"1 < ... <am =1 constitute the (finite) set
{ aQ (p,p') Ip,p'E Q} of possible similarity degrees of problems, and:

is the lowest utility obtained in connection with act a for problems which are
ak -similar to Po. Moreover, vk = 1 (by definition) if Vk = 0, which just leads
to the problem that (Eq.6) becomes large if only few observations have been
made.
Possibilistic case-based decisions 41

According to (Eq.13), the valuation (Eq.6) of an act is completely determined


by the lower bounds vk (0 ~ k ~ m), which are derived from the memory M .
This reveals that (Eq.6) can be seen as some kind of "experience-based"
approximation of the well-known MAXIMIN decision principle. The case in
which all problems are completely similar makes this especially apparent. Then,
(Eq.6) valuates an act a simply according to the worst consequence observed so
far.
More generally, the value vk can be seen as an estimation of the lower utility
bound:

i.e., the smallest degree of utility which can be obtained in connection with act a
for (not necessarily encountered) problems from Q which are ak -similar to p 0 .
Then, V P10 •M (a) can be interpreted as an approximation of:

which defines a similarity-based generalization of a MAXIMIN-evaluation. In


vt
fact, Wp!0 (a) iS equal tO ~ . M (a) if a has already been applied tO all problems
(up to p 0 ) from Q , i.e., if:

{p 13reR : (p,a,r)E M }=Q \ {p0 }

The above considerations suggest a generalization which is obtained by


replacing the scalar values vk by fuzzy sets and applying the extension principle
(Zadeh 1965) to (Eq.13):

vpj, M(a) =maxJmin wk (vk )lvl ,... ,vn E R, min max{l-(JJ. , VJ}= v} (Eq.l4)
O' ~~k~m O~k~m

where ~ E F (U) (0 ~ k ~ m) , and vLM


(a) denotes the membership
function of vLM
(a) which is now a fuzzy set. wk
represents the available
information about wk, and a value Wk( v) is understood as the possibility that
the lower bound wk is given by v. The model (Eq.6) then corresponds to the
special case where ~ is derived from M according to Wk = X{vdJ
The problem of uncertainty due to the absence of solved problems which are
(completely) similar to p 0 can now be handled in a more flexible way. Consider,
for instance, the case where vk = 0 , i.e., no problem has been encountered so
far which is ak -similar to p 0 and to which act a has been applied. As already
42 Chapter 2

mentioned above, the original approach (Eq.6) does then (implicitly) estimate the
lower bound wk by vk = 1, whereas (Eq.l3) is able, e.g., to express complete
ignorance via ~ = 1. Particularly, letting Wm = 1 in the case where Vm = 0
implies that vp:.M (a )(0) = 1 , i.e., the fact that act a should be assigned the
valuation 0 seems completely possible.
The above modeling of ignorance concerning the lower bound wk might be
generalized to the case where V, :;t: 0 by means of ~ ( v) = 1 if v ~ vk , and
Wk(v) = 0 otherwise. It seems reasonable, however, to think of more general
definitions of ~. Seen from the viewpoint of CBR, for instance, the memory
M may provide evidence concerning wk even if V, = 0 . In this connection it
seems particularly interesting to combine (Eq.5) with the possibilistic methods
mentioned at the end of section 2, which leads to a more "hypothetical"
specification of the ~ . Suppose, for example, the C B R principle, suggesting
that acts lead to similar outcomes for similar problems, to be strongly supported
by the observations which have been made so far. Moreover, suppose that a
certain act a has often led to good results for problems which are very (but not
perfectly) similar to the problem p 0 under consideration. It seems, then, likely
that a also leads to good outcomes when being applied to problems which are
completely similar to p 0 • Thus, Wm(v) should be small for small utility values
v, even though a case (p,a,r) such that O'Q (p,p 0 ) = O'm = 1 has not yet been
encountered.
Observe that (Eq.l4) also allows for the utilization ofbackground knowledge
which is not derived from the memory M . It might be known from a further
information source, for instance, that wk does definitely not fall below a certain
bound v' k , or at least that wk < v' k is unlikely, which leads to ~ ( v) = 0 resp.
Wk(v) << 1 for v < v'k.
Based on (Eq.l4) in conjunction with a generalized CBDM framework we
can also approach the first type of uncertainty, which has been mentioned at the
beginning of this section. To this end, we extend the set of outcomes to F (R ) ,
i.e., the set of all (normal) fuzzy subsets of the set R of results. A representation
Wk of knowledge about the lower bound wk is then derived from "fuzzy" cases
of the form (p, a, R) E Q xA xF (R ) . A value R(r) is interpreted as the
possibility .1l'(X =r) that the (unknown, not precisely observed) outcome X is
given by r E R . The derivation of Wk from "fuzzy" cases can be realized by
applying the extension principle to a derivation of ~ from "crisp" cases.
It has already been hinted at in the introduction that Gilboa and Schmeidler's
approach to case-based decision making is partly motivated by the idea of
avoiding any kind of "hypothetical" reasoning. As pointed out in (Gilboa,
Schmeidler 1995), such reasoning might become necessary in connection with
E UT since the decision maker has to know, e.g., all outcomes associated with
act/state pairs. It should, therefore, be mentioned that the hypothetical reasoning
in connection with (Eq.l4) is by far less demanding. Particularly, it does not
Possibilistic case-based decisions 43

require any knowledge which is not available. On the contrary, nothing prevents
us from using Wk in order to express complete ignorance. If available, however,
general background knowledge (including hypothetical knowledge "derived"
from the CBR assumption) should be utilized, and (Eq.14) presents the
opportunity for doing this. Comparing acts in the context of the generalized
model (Eq.14) turns out as the problem of comparing fuzzy sets resp. possibility
distributions. Needless to say, such a comparison is less straightforward than the
comparison of scalar values. In fact, there are different possibilities to approach
this problem (Dubois, Prade 1999) which are, however, not further discussed
here.

4. FUZZY QUANTIFICATION IN ACT EVALUATION

In some situations, the extremely pessimistic and optimistic nature of the


criteria (Eq.6) and (Eq.7), respectively, might appear at least as questionable as
the accumulation in (Eq.2). Here we shall propose a generalization of the
decision rule (Eq.6) which is a weakening of the demand that an act has always
produced good results for similar problems. In fact, one might already be
satisfied if a turned out to be a good choice for most similar problems, thus
allowing for a few exceptions (Dubois, Prade 1997). In other words, the idea is to
relax the universal "for all" quantifier. Observe that a similar generalization of
(Eq.7), which replaces "there exists" by "there are at least several" and, hence,
corresponds to a strengthening of this decision principle, seems reasonable as
well. It can be obtained analogously.
Consider a finite set A of cardinality m =i A 1. In connection with
propositions of the form "most elements of A have property X" the fuzzy
quantifier "most" can be formalized by means of a fuzzy set (Dubois et al.
1988,Yager 1985)3 , the membership function J.L: {0,1, ... ,m} ~ [0,1] of which
satisfies:

\;/ 1 ::;; k ::;; m - 1: J.L( k) ::;; J.L( k + 1) and J.L ( m) =1. (Eq.15)

The special case "for all" then corresponds to J.L(k) = 0 for 0 :s; k::;; m -1 and
J.L( m) = 1 . Given some J.L satisfying (Eq.15), we define an associated
membership function 71 by 71(0) =0 and 71(k) =1- J.L(k -1) for 1::;; k :s; m
(see e.g. (Dubois et al. 1997b)). A membership degree 71(k) can then be
interpreted as quantifying the importance that the property X is satisfied for k
(out of the m ) elements.
44 Chapter 2

Consider a memory M of cases, a problem p 0 E Q , an act a E A , and let


Ma={(p',a',r')EM la=a'}. Moreover, let ,Uformalize the above-
mentioned "for most" concept. A reasonable generalization of (Eq.6) is then
given by:

V
Po .
M (a) = O<;k'>IM
min max {1 -
al
,ll ( k) , ga (k )} , (Eq.16)

where:

oa(k)= M .emax . min max{l-crQ (p,po),u(r)},


M .:IM l=k (p ,a,r)eM

1 ::;; k ::;;JM J,
a defines the degree to which "the act a has induced good
outcomes for similar problems k times". The extent to which a (small) degree
ga (k) decreases the overall valuation of a is upper bounded by 1 = ,ll ( k) , i.e.
by the respective level of (un-)importance. Observe that we do not have to
consider all subsets M 'c M a of size k for deriving ga (k) . In fact, for
computin9 it is reasonable to arrange the m =I M a I values
v=maxt1-0'Q (p,p 0 ),u(r)} in a non-increasing order v1 ;:::v2 ;:::, •• ;:::vm.
Then, (Eq.16) is equivalent to:

VP 0 M(a) min max {1- ,ll ( k), vk} ,


= O'>k'>IM
' al

where v0 = 1.
The generalized criterion (Eq.16) can be useful, e.g. in connection with the
idea of repeated decision making which arises quite naturally in connection with
a case-based approach to decision making. We might think of different scenarios
in which repeated problem solving becomes relevant. A simple model emerges
from the assumption that problems are chosen repeatedly from Q according to
some selection process which is not under the control of the agent, such as e.g.
the repeated (and independent) selection of problems according to some
probability measure. More generally, the problem faced next by the agent might
depend on the current problem and the act which is chosen for solving it. A
Markov Decision Process extended by a similarity measure over states (which
correspond to problems) may serve as an example. Besides, we might consider
case-based decision making as a reasonable strategy within a (repeated) game
playing framework, such as e.g. the iterated prisoner's dilemma (Axelrod 1984).
See (Htillermeier, Dubois, Prade 1999) for an experimental study in which
(Eq.16) is applied in repeated decision making.
Possibilistic case-based decisions 45

CONCLUSION

We have introduced a possibilistic model of case-based decision making


which appears particularly interesting from the viewpoint of knowledge
representation and reasoning since it combines ideas from qualitative and case-
based decision making. Such an approach, however, does inevitably raise some
difficulties. Firstly, a case-based approach to decision making seems problematic
if an agent suffers from a lack of experience, in the sense that it has not yet
encountered enough cases. In order to alleviate this problem we have replaced
the scalar evaluation of an act by a "fuzzy" evaluation, which allows for the
expression of uncertainty concerning the usefulness of an act. Besides, this
approach allows for the processing of observations with imprecise outcomes, and
for making use of background knowledge which is not necessarily based on
observed cases.
Secondly, a qualitative setting for decision making rules out the idea of
"averaging" in connection with the valuation of an act and, hence, does generally
favor rather extreme decision rules. Just as, e.g., the MAXIMIN decision
principle, the case-based valuation of acts according to the above-mentioned
model is extremely pessimistic. We have, therefore, proposed a formalization of
a relaxed version of this ordinal (heuristic) decision principle. Instead of
requiring that an act has always led to good outcomes for previous problems
which are similar to the problem under consideration, the proposed version is less
demanding and allows for some exceptions. This tolerance toward exceptions
seems to be advantageous, e.g., if an agent has to act repeatedly over time.
Moreover, it makes the model more flexible and provides the opportunity of
adapting it to a particular class of problems.

NOTES
I. Needless to say, a validation or comparison of decision-.theoretic models is generally difficult,
no matter whether from a descriptive or a normative point of view.
2. Note that the accumulation effect is also the main motivation for the normalization (Eq.3).
3. Other possibilities of expressing a fuzzy quantifier exist as well, including the use of order-
statistics (Prade, Yager 1994) and an ordered weighted minimum or maximum (Dubois, de
Berre, Prade, Sabbadin 1999).
46 Chapter 2

REFERENCES
Axelrod R. (1984), The Evolution of Cooperation, Basic Books, Inc., New York.
Boutilier C. (1994), "Toward a logic for qualitative decision theory", in Doyle J., Sandewall E., Torasso P. (eds.), Proceedings KR-94, 4th International Conference on Principles of Knowledge Representation and Reasoning, Bonn, Germany, 75-86.
Brafman R., Tennenholtz M. (1996), "On the foundation of qualitative decision theory", in Proceedings AAAI-96, 13th National Conference on Artificial Intelligence, AAAI Press, 1291-1296.
De Finetti B. (1937), "La prévision : ses lois logiques, ses sources subjectives", Annales de l'Institut Henri Poincaré, VII, 1-68.
Doyle J., Dean T., "Strategic directions in artificial intelligence", AI Magazine, 18(1), 87-101.
Dubois D., Le Berre D., Prade H., Sabbadin R. (1999), "Using possibilistic logic for modelling qualitative decision: ATMS-based algorithms", Fundamenta Informaticae, 37, 1-30.
Dubois D., Esteva F., Garcia P., Godo L., de Mantaras R.L., Prade H. (1997), "Fuzzy modelling of case-based reasoning and decision", in Leake D.B., Plaza E. (eds.), Case-Based Reasoning Research and Development, Proceedings ICCBR-97, Springer-Verlag, 599-610.
Dubois D., Godo L., Prade H., Zapico A. (1998), "Making decision in a qualitative setting: from decision under uncertainty to case-based decision", in Cohn A.G., Schubert L., Shapiro S.C. (eds.), Proceedings of the 6th International Conference on Principles of Knowledge Representation and Reasoning (KR-98), Trento, Italy, 594-605.
Dubois D., Nakata M., Prade H. (1997), "Extended divisions for flexible queries in relational databases", Technical Report 97-43 R, IRIT - Institut de Recherche en Informatique de Toulouse, Université Paul Sabatier, September.
Dubois D., Prade H. (1995), "Possibility theory as a basis for qualitative decision theory", in Proceedings IJCAI-95, 14th International Joint Conference on Artificial Intelligence, Montreal, 1924-1930.
Dubois D., Prade H. (1997), "A fuzzy set approach to case-based decision", in Felix R. (ed.), EFDAN-97, 2nd European Workshop on Fuzzy Decision Analysis and Neural Networks for Management, Planning and Optimization, Dortmund, Germany, 1-9. A revised version has appeared in Reusch B., Temme K.H. (eds.) (2001), Computational Intelligence: Theory and Practice, Physica-Verlag, Heidelberg, 1-14.
Dubois D., Prade H. (1999), "A unified view of ranking techniques for fuzzy numbers", Proceedings FUZZ-IEEE-99, Seoul.
Dubois D., Prade H., Sabbadin R. (1998), "Qualitative decision theory with Sugeno integrals", in Proceedings UAI-98, 14th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 121-128.
Dubois D., Prade H., Testemale C. (1988), "Weighted fuzzy pattern matching", Fuzzy Sets and Systems, 28, 313-331.
Dubois D., Prade H., Yager R.R. (1999), "Merging fuzzy information", in Bezdek J.C., Dubois D., Prade H. (eds.), Fuzzy Sets in Approximate Reasoning and Information Systems, Kluwer Academic Publishers, Boston, 335-401.
Fargier H., Lang J., Schiex T. (1996), "Mixed constraint satisfaction: a framework for decision problems under incomplete knowledge", in Proceedings AAAI-96, 13th National Conference on Artificial Intelligence, Portland, Oregon, 175-180.
Fine T.L. (1973), Theories of Probability, Academic Press, New York.
Gilboa I. (1987), "Expected utility with purely subjective non-additive probability", Journal of Mathematical Economics, 16, 65-88.
Gilboa I., Schmeidler D. (1996), "Case-based optimisation", Games and Economic Behavior, 15(1), 1-26.
Gilboa I., Schmeidler D. (1995), "Case-based decision theory", Quarterly Journal of Economics, 110(4), 605-639.
Gilboa I., Schmeidler D. (1997), "Act similarity in case-based decision theory", Economic Theory, 9, 47-61.
Gilboa I., Schmeidler D. (2001), A Theory of Case-Based Decisions, Cambridge University Press, U.K.
Heckerman D., Geiger D., Chickering D. (1995), "Learning Bayesian networks: The combination of knowledge and statistical data", Machine Learning, 20.
Horvitz E.J., Breese J.S., Henrion M. (1988), "Decision theory in expert systems and artificial intelligence", International Journal of Approximate Reasoning, 2, 247-302.
Hullermeier E. (1998), "A Bayesian approach to case-based probabilistic reasoning", in Proceedings IPMU-98, 7th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Paris, La Sorbonne, July, Editions E.D.K., 1296-1303.
Hullermeier E. (1999), "A possibilistic formalization of case-based reasoning and decision making", in Reusch B. (ed.), Proceedings of the 6th International Conference on Computational Intelligence, Lecture Notes in Computer Science, 1625, Springer-Verlag, Dortmund, Germany, May, 411-420.
Hullermeier E., Dubois D., Prade H. (1999), "Extensions of a qualitative approach to case-based decision making: Uncertainty and fuzzy quantification in act evaluation", in Zimmermann H.J. (ed.), EUFIT-99, 7th European Congress on Intelligent Techniques and Soft Computing, Aachen.
Kolodner J.L. (1993), Case-Based Reasoning, Morgan Kaufmann, San Mateo.
Pearl J. (1988), Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, CA.
Pearl J. (1993), "From qualitative utility to conditional 'ought to'", in Heckerman D., Mamdani H. (eds.), Proceedings of the 9th International Conference on Uncertainty in Artificial Intelligence, San Mateo, CA, Morgan Kaufmann, 12-20.
Pomerol J.C. (1997), "Artificial intelligence and human decision making", European Journal of Operational Research, 99, 3-25.
Prade H., Yager R.R. (1994), "Estimations of expectedness and potential surprise in possibility theory", International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2, 417-428.
Ramsey F.P. (1931), "Truth and probability", in The Foundations of Mathematics and Other Logical Essays, Kegan Paul, London.
Riesbeck C.K., Schank R.C. (1989), Inside Case-Based Reasoning, Lawrence Erlbaum Associates, Hillsdale, NJ.
Russell S.J., Wefald E. (1991), "Principles of metareasoning", Artificial Intelligence, 49, 361-395.
Russell S.J., Norvig P. (1995), Artificial Intelligence: A Modern Approach, Prentice Hall, New Jersey.
Savage L.J. (1954), The Foundations of Statistics, John Wiley and Sons, Inc., New York.
Schmeidler D. (1989), "Subjective probability and expected utility without additivity", Econometrica, 57, 571-587 (first version 1982).
Tan S.W., Pearl J. (1994), "Qualitative decision theory", in Proceedings AAAI-94, 12th National Conference on Artificial Intelligence, Seattle, WA, 928-933.
Von Neumann J., Morgenstern O. (1953), Theory of Games and Economic Behavior, John Wiley and Sons.
Wakker P. (1990), "A behavioural foundation for fuzzy measures", Fuzzy Sets and Systems, 37, 327-350.
Yager R.R. (1985), "Aggregating evidence using quantified statements", Information Sciences, 36, 179-206.
Zadeh L.A. (1965), "Fuzzy sets", Information and Control, 8, 338-353.
Chapter 3

Introduction to multilayer perceptron and hybrid hidden Markov models

Joseph RYNKIEWICZ
SAMOS-MATISSE, University Paris I, 90, rue de Tolbiac, 75634 PARIS CEDEX 13, France, rynkiewi@univ-paris1.fr

Abstract: A time series is a succession of measurements taken at equidistant points in time. It is quite obvious that it would be a major advantage if we could guess how these series are going to behave. Statisticians have therefore developed tools to model this behaviour and to try to optimise their predictions of the future values of the process being observed. In the present paper we will be studying the contributions that neural networks, and more specifically multilayer perceptrons (MLP), have made to time series modelling. In the first section, we will mostly be looking at the selection of the MLP architecture. In the second section, we will be focusing on an example of a piecewise stationary time series which requires that we use several regression models simultaneously and choose, in a probabilistic manner and at any given moment in time, which of these models makes the most relevant prediction. Lastly, we will be applying these methods to the modelling of a pollutant emission series relating to ozone levels in Paris.

Key words: Time series, Multilayer Perceptron (MLP), Pollution

INTRODUCTION

A time series is a succession of measurements taken at equidistant points in time. For example, it can be made up of CAC 40 index prices, taken on a daily basis, or else a country's GDP, measured on a year-by-year basis. It is quite obvious that it would be a major advantage if we could guess how these series are going to behave. Statisticians have therefore developed tools to model this behaviour and to try to optimise their predictions of the future values of the process being observed. In the present paper we will be studying the
contributions that neural networks, and more specifically multilayer perceptrons (MLP), have made to time series modelling. In the first section, we will mostly be looking at the selection of the MLP architecture. This will allow us to avoid any over-parameterisation of the model, something that would cause over-learning. In the second section, we will be focusing on an example of a piecewise stationary time series which requires that we use several regression models simultaneously and choose, in a probabilistic manner and at any given moment in time, which of these models makes the most relevant prediction. Lastly, we will be applying these methods to the modelling of a pollutant emission series relating to ozone levels in Paris.

1. AUTO-REGRESSIVE MODELS

The present paper looks at the parametric modelling of time series. More
specifically, we will be studying those models that use multilayer perceptrons
(MLP) as a regression function.
Consider the following time series model:

$$Y_t = F_{W_0}(Y_{t-1}) + \varepsilon_t$$

where:
1. $Y_t \in \mathbb{R}$ is the observation at time $t$ of the time series;
2. $(\varepsilon_t)$ are independent and identically distributed (i.i.d.) random variables with zero expectation and a constant variance $\sigma^2$, for example a variable $N(0,\sigma^2)$ independent from the series' past;
3. $F_{W_0}$ is a function represented by an MLP whose parameter (weight) vector is $W_0 \in \mathbb{R}^D$.
To simplify the notation, we will only be considering autoregressive models of the first order; however, it would be easy to generalise to higher orders. As such, the phenomenon we observe, $(Y_t)$, is the combination of a deterministic function of the process's past and of a random shock. If we knew the underlying deterministic function $F_{W_0}$, we could optimise the predictions we make.
Given that this function is entirely determined by its parameter vector, the statistician's job consists of estimating the parameter $W_0$ using a finite number of observations $(y_0, \cdots, y_T)$.
To do this, we minimise in $W$ a functional such as the average of the squared residuals:

$$S_T(W) = \frac{1}{T}\sum_{t=1}^{T}\big(y_t - F_W(y_{t-1})\big)^2$$
and we note

$$\hat{W}_T = \arg\min_W S_T(W)$$

the least squares estimator of $W_0$.

1.1 Theoretical findings of an MLP with a hidden layer

1.1.1 The MLP model

A function can be represented by an MLP in the following manner:

Figure 1. Multilayer Perceptron

Here the MLP is represented by a function from $\mathbb{R}^2$ to $\mathbb{R}$ which, to $(x, y)$, associates $F_W(x,y)$ with:

$$F_W(x,y) = w_{10} + w_{11}\,\phi(w_1 + x\,w_4 + y\,w_7) + w_{12}\,\phi(w_2 + x\,w_5 + y\,w_8) + w_{13}\,\phi(w_3 + x\,w_6 + y\,w_9)$$
The activation function $\phi$ of the hidden layer is generally a sigmoid function that we can consider (without any loss of generality) as being equal to the hyperbolic tangent. As such, here the parameter vector is $W = (w_1, \cdots, w_{13})$.
From here on in, we will be assuming that our model is identifiable, meaning that there can only be one representative parameter vector for any function that can be represented by a given MLP. To obtain this property, all of the parameters have to be restricted in a suitable fashion (Sussmann 1992).
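To make the weight indexing concrete, here is a minimal Python sketch (not part of the original text) of the network of Figure 1, a two-input, three-hidden-unit perceptron with tanh activation; the function name mlp_forward and the weight ordering are illustrative assumptions that simply follow the formula above.

    import numpy as np

    def mlp_forward(w, x, y):
        """F_W(x, y) for the MLP of Figure 1: 2 inputs, 3 tanh hidden units, 1 linear output.

        w is the 13-dimensional weight vector, with w[0] = w_1, ..., w[12] = w_13,
        ordered as in the formula above (hidden biases w_1..w_3, input-to-hidden
        weights w_4..w_9, output bias w_10, hidden-to-output weights w_11..w_13).
        """
        h1 = np.tanh(w[0] + x * w[3] + y * w[6])
        h2 = np.tanh(w[1] + x * w[4] + y * w[7])
        h3 = np.tanh(w[2] + x * w[5] + y * w[8])
        return w[9] + w[10] * h1 + w[11] * h2 + w[12] * h3

    # Example: evaluate the network at (x, y) = (0.5, -1.0) with random weights.
    rng = np.random.default_rng(0)
    print(mlp_forward(rng.normal(size=13), 0.5, -1.0))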

1.1.2 Statistical properties

We are interested in the behaviour of the estimator $\hat{W}_T$ when $T$ tends towards infinity. It is useful to have two fundamental properties at our disposal:
- Consistency, meaning that $\hat{W}_T \to W_0$ almost surely as $T \to \infty$;
- Asymptotic normality, which ensures that the preceding convergence takes place at a rate of $\sqrt{T}$ and makes it possible to obtain a limit law for $\hat{W}_T$.

For example, we can find in Yao (2000) a demonstration of the following theorem.

Theorem 1 (Consistency and asymptotic normality of the estimator $\hat{W}_T$). With $\phi(x) = \tanh(x)$, we assume that:

1) $(\varepsilon_t)_{t\in\mathbb{N}^*}$ is an i.i.d. sequence such that $E(\varepsilon_t^2) < \infty$;

2) $W_0$ belongs to the interior of a compact subset of the Euclidean space $\mathbb{R}^D$;

3) if $\mu_0$ is the stationary measure of $(Y_t)$, the associated matrix $\Sigma_0$ is positive definite.

In which case:
- The estimator $\hat{W}_T$ will almost surely converge towards $W_0$ when $T$ tends towards $+\infty$.
- The term $\sqrt{T}\,(\hat{W}_T - W_0)$ converges in law towards a multidimensional Gaussian distribution $N(0, \Sigma_0^{-1})$.

1.1.3 Identification of the model

One of the main problems encountered when using increasingly complex


functions to estimate processes statistically is that the models can be overfitted.
In actual fact, if we use an overly complex model in an area where too little data
exist, we end up with a modelling of the noise that had generated the data. By so
doing, we introduce a bias that strongly undermines the model's ability to make
predictions using new and as yet unobserved data on the same process. An
efficient statistical principle for fighting against the bias introduced by a
complexification of models is the use of a penalty term that is itself a function of
the number of parameters being applied (Akaike 1974).

Let us assume that an upper limit $M$ exists for all of the possible dimensions of the model, and consider a family of dominant models of dimension $M$, so that a true parameter $W_0$, of dimension $L_0$, can be expressed as a vector of this family with $M - L_0$ null components. Let $\hat{W}_T^L$ denote a least squares estimator of dimension $L$. The parsimony principle then consists of choosing the estimator that minimises a penalised cost function, which can be written in logarithmic form as:

$$CP^*(\hat{W}_T^L) = \ln\big(S_T(\hat{W}_T^L)\big) + \frac{c(T)\,L}{T}$$

where $c(T)$ is the rate of penalisation.
If $c(T) = 2$, the penalised contrast $CP^*$ is then equal to Akaike's AIC criterion; if $c(T) = 2\ln(T)$, $CP^*$ is equal to Schwarz's BIC criterion (Schwarz 1978).
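As an illustration, the following sketch (our addition; the residual average S_T, the dimension L and the sample size T are assumed to come from an already fitted model) computes this penalised contrast with c(T) = 2 for AIC and c(T) = 2 ln(T) for BIC, as stated above; the numerical values in the example are purely hypothetical.

    import numpy as np

    def penalised_criterion(S_T, L, T, c_T):
        """CP*(W^L_T) = ln(S_T) + c(T) * L / T, as defined in the text."""
        return np.log(S_T) + c_T * L / T

    def aic(S_T, L, T):
        return penalised_criterion(S_T, L, T, c_T=2.0)

    def bic(S_T, L, T):
        return penalised_criterion(S_T, L, T, c_T=2.0 * np.log(T))

    # Hypothetical example: compare two candidate architectures fitted on T = 600 points.
    print(bic(S_T=310.0, L=10, T=600), bic(S_T=295.0, L=25, T=600))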
Based on these definitions, we state the following result, whose proof can be found in Yao (2000) or in Rynkiewicz (2000).
Theorem 2. We assume that the conditions of Theorem 1 are satisfied. We assume also that the penalisation rate $c(T)$ is such that:

$$\lim_{T\to\infty}\frac{c(T)}{T} = 0 \quad\text{and}\quad \liminf_{T\to\infty}\frac{c(T)}{2\ln\ln T} > \sigma^2\,\frac{\bar{\lambda}}{\underline{\lambda}}$$

where $\bar{\lambda}$ (resp. $\underline{\lambda}$) is the largest (resp. the smallest) eigenvalue of the matrix $\Sigma_0$.

In this case, the couple $(\hat{L}, \hat{W}_T^{\hat{L}})$ will almost surely converge towards the true dimension and the true value of the parameter $(L_0, W_0^{L_0})$ when $T$ tends towards $\infty$.

Based on this theorem, we can suggest a methodology that enables an almost


certain identification, something which allows us to determine the true model.

1.2 Practical research on the true model

1.2.1 The search for a dominant model

Using the findings from the preceding section, we can propose the following method for determining the true model. We initialise the architecture by taking all of the relevant inputs (as we would get them from a linear AR model) plus a single hidden unit. Then we progressively add units to the hidden layer, calculating the BIC criterion at each step. We continue this process as long as the BIC value drops. When the addition of another hidden unit causes a renewed upswing in the BIC, we stop the model search and take the last MLP obtained before this upswing as the dominant model (a sketch of this loop is given after Figure 2). Schematically, this search can be represented by the following figure, which corresponds to a dominant model with k+1 hidden units.
[Plot: BIC (as a function of the number of parameters) against the number of hidden units k-1, k, k+1.]

Figure 2. BIC for the search for a dominant model
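A sketch of this growth loop is given below, under the assumption of a hypothetical helper fit_mlp (which trains an MLP with k hidden units and returns the fitted model, the residual average S_T and the number of parameters) and the bic function sketched earlier; this is only an illustration of the procedure described above, not the authors' actual implementation.

    def search_dominant_model(data, max_hidden, fit_mlp, bic):
        """Grow the hidden layer one unit at a time and stop when the BIC rises again.

        fit_mlp(data, k) is assumed to return (model, S_T, n_params) for an MLP with
        k hidden units; bic(S_T, L, T) is the criterion sketched earlier.
        """
        T = len(data)
        best_model, best_bic = None, float("inf")
        for k in range(1, max_hidden + 1):
            model, S_T, L = fit_mlp(data, k)
            current = bic(S_T, L, T)
            if current >= best_bic:          # BIC rises: the previous MLP is the dominant model
                break
            best_model, best_bic = model, current
        return best_model, best_bic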

1.2.2 Determination of the true model

Once this dominant model has been obtained, we get the true model by successively pruning away superfluous weights.
Remember that $W_M = (w_1, \cdots, w_N)$ is the parameter vector that is associated with the dominant model. In principle, to estimate the true model we should exhaustively explore the finite family of all of its submodels. However, their number is exponentially large, and this is the reason why, to guide our search, we propose, as is done in linear regression, a Statistical Stepwise Method (SSM). This kind of strategy is based on the asymptotic normality of the estimator $\hat{W}_T$ (Cottrell et al. 1995) and uses Student statistics as an aid for exploring subfamilies of the dominant model. To decide whether or not the weight $w_l$ should be eliminated, we compare the BIC values of the model $F$ and those of the model $F$ without the weight $w_l$, denoted $F_l$. As $F_l$ is a submodel of $F$, it suffices that the BIC criterion diminishes for us to move somewhat closer to the true model, which minimises the BIC criterion. By so doing, we obtain a series of MLPs with fewer and fewer parameters corresponding to a decreasing BIC trajectory. The criterion for ceasing our pruning exercise is quite straightforward, in that we refuse to eliminate a weight if this causes the BIC to rise; we then keep the last MLP that retained this weight.
In short, the procedure involved in searching for a true model is as follows:
1. Determine a dominant model $F_{max}$;
2. Using the Student statistics, identify the weight $w_l$ that is a candidate for elimination;
3. Accept the elimination of this weight if and only if the BIC criterion diminishes; otherwise keep the last MLP model before this pruning took place (a sketch of this pruning loop is given below).

This procedure has been tested on a number of examples and provides excellent results as long as the volume of data is sufficiently large, generally more than 500 observations.
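A corresponding sketch of the SSM pruning loop follows; student_statistics, refit_without and bic_of are hypothetical helpers standing in for the Student statistics of the weights, the retraining of the pruned MLP and the BIC computation, so this only illustrates the stopping rule described above.

    def ssm_pruning(model, data, student_statistics, refit_without, bic_of):
        """Statistical Stepwise Method: prune the least significant weight while the BIC decreases.

        student_statistics(model) -> list of (weight_index, |t| statistic);
        refit_without(model, data, weight_index) -> refitted MLP with that weight removed;
        bic_of(model, data) -> BIC of a fitted model.
        """
        current, current_bic = model, bic_of(model, data)
        while True:
            stats = student_statistics(current)
            if not stats:
                return current
            idx, _ = min(stats, key=lambda s: s[1])   # candidate: weight with the smallest |t|
            candidate = refit_without(current, data, idx)
            candidate_bic = bic_of(candidate, data)
            if candidate_bic >= current_bic:          # the BIC would rise: keep the last MLP
                return current
            current, current_bic = candidate, candidate_bic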

2. HYBRID MODELS

2.1 Introduction

By modelling time series with the help of neural networks, we can account for any non-linearities that the model may contain. At the same time, this is based on a restrictive hypothesis as regards the model's stationarity. One simple generalisation we could make would be to account for piecewise stationary time series. For instance, Hamilton (1989) studied these kinds of models in an effort to model time series that are subject to discrete changes of regime. He used this as a way of analysing GNP series in the United States. We can therefore use such models for series featuring a particular regime for periods of economic growth, for example, and another one for periods of recession.
Although this model is more general than its predecessor, we still have need of a number of restrictive hypotheses. First of all, there has to be a finite number of possible regimes. Secondly, even though regime changes might take place, we will assume that such changes occur in a stationary manner, something that will ultimately enable us to make use of the law of large numbers and thus engage in statistical analysis.

2.2 The model

The theory of hidden Markov chains and their first applications in voice
recognition are more than 30 years old. The basic theory was published in a
series of articles written by Baum et al. (1966; 1967; 1970; 1972) in the late
1960s. Hidden Markov chains have subsequently been applied in a number of
different fields, such as genetics, biology, economics, etc.
2.2.1 Markov chains in a discrete space

Consider $(X_t)_{t\in\mathbb{Z}}$, a homogeneous Markov chain with values in the finite state space $E = \{e_1, \cdots, e_N\}$, $N \in \mathbb{N}^*$. Without any loss of generality, we can identify the state space $E$ with the simplex of $\mathbb{R}^N$, where $e_i$ is the unit vector of $\mathbb{R}^N$ with 1 on the i-th component and 0 everywhere else. The chain $X_t$ is characterized by its transition matrix $A = (a_{ij})_{1\le i,j\le N}$, which is such that:

$$a_{ij} = P(X_{t+1} = e_i \mid X_t = e_j)$$

If additionally we define $V_{t+1} := X_{t+1} - A X_t$, we get the following notation for this model:

$$X_{t+1} = A X_t + V_{t+1}$$

2.2.2 Equations of the model

Assume that the time series we have observed, $(Y_t)$, satisfies the following equations:

$$Y_t = F_{X_t}(Y_{t-1}) + \varepsilon_t(X_t)$$

where $\{F_{e_1}, \cdots, F_{e_N}\}$ are functions from $\mathbb{R}^p$ to $\mathbb{R}$ which will be represented in our example by MLPs.
For every $e_i \in E$, $(\varepsilon_t(e_i))_t$ is a sequence of random variables that are independent and identically distributed. This model makes it possible to use several MLPs (here we would talk about a mixture of experts) and to use the Markov chain $(X_t)$ in order to specify, at a time $t$, which MLP makes the most appropriate prediction. Note that since we can only observe the series $(Y_t)$, we will have to find a way of recovering the states of the chain $(X_t)$ from the behaviour of $(Y_t)$.
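To fix ideas, here is a small simulation sketch of such a model (our illustration, with hypothetical experts and a column-stochastic transition matrix following the convention used in the text); it is not the estimation procedure, only a generator of data from the model.

    import numpy as np

    def simulate_hybrid(A, experts, sigmas, y0, T, rng):
        """Simulate Y_t = F_{X_t}(Y_{t-1}) + eps_t(X_t) with a hidden Markov chain (X_t).

        A is column-stochastic (A[i, j] = P(X_{t+1} = e_i | X_t = e_j)), experts is the
        list of regression functions F_{e_i} and sigmas the noise standard deviations.
        """
        N = A.shape[0]
        x, y = int(rng.integers(N)), y0
        states, ys = [], []
        for _ in range(T):
            x = int(rng.choice(N, p=A[:, x]))                 # draw the next hidden state
            y = experts[x](y) + sigmas[x] * rng.normal()      # regime-dependent autoregression
            states.append(x)
            ys.append(y)
        return np.array(states), np.array(ys)

    # Toy example with two regimes: a linear expert and a saturating (tanh) expert.
    rng = np.random.default_rng(1)
    A = np.array([[0.95, 0.05],
                  [0.05, 0.95]])
    experts = [lambda y: 0.8 * y, lambda y: 2.0 * np.tanh(y)]
    states, ys = simulate_hybrid(A, experts, sigmas=[0.1, 0.3], y0=0.0, T=500, rng=rng)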
To fit this sort of model to our observations, we estimate the parameters (the weights of the MLPs $F_{e_i}$, the noise variances $\sigma^2(e_i)$, and the transition matrix $A$) by the method of maximum likelihood. A study of the theoretical properties of this estimator can be found in Krishnamurthy, Ryden (1998) and Douc, Moulines, Ryden (2001).

2.3 Maximum of likelihood for the hybrid models

From here on in we will consider that the densities of the noises $(\varepsilon(e_i))_{1\le i\le N}$ are Gaussian. We start by listing the free parameters that are under consideration:
- The transition matrix $A$: this matrix is stochastic, meaning that the sum of any given column of $A$ is 1; thus there are no more than $(N-1)\times N$ free parameters;
- The variances $(\sigma_{e_i})_{1\le i\le N}$, which are supposed to be strictly positive;
- The parameters of the regression functions $(F_{e_i})_{1\le i\le N}$; since we are using MLPs, the parameters will obviously be the weight vectors $(W_{e_i})_{1\le i\le N}$ of the MLPs.

The parameter vector $\theta$ will therefore be:

$$\theta = \big(W_{e_1}, \cdots, W_{e_N}, a_{11}, \cdots, a_{(N-1)N}, \sigma^2_{e_1}, \cdots, \sigma^2_{e_N}\big)$$

2.3.1 Calculation of the log-likelihood and of its derivative

In what follows we will be assuming that the first observation $y_0$ as well as the initial probability of the state $X_1$ are known, and that the conditioning of the expressions with respect to these initial conditions will always be implied.

An initial writing
The likelihood of the model for a succession of observations of the series $y := (y_0, \cdots, y_T)$, for a supposedly known path $x := (x_0, \cdots, x_T)$ of the hidden chain, is therefore:

$$L_\theta(y,x) = \pi \prod_{t=1}^{T}\prod_{i=1}^{N}\Big[\Phi_{e_i}\big(y_t - F_{e_i}(y_{t-1})\big)\Big]^{\mathbf{1}_{\{e_i\}}(x_t)}$$
where $\Phi_{e_i}$ is the density of the normal law $N(0, \sigma_{e_i})$, $\mathbf{1}_G$ the indicator function of the set $G$, and $\pi$ the probability of the initial state $x_1$. To get the overall likelihood of the observations, we add up these likelihoods along all of the potential paths of the hidden Markov chain. We then have:

$$L_\theta(y) = \sum_{x} L_\theta(y,x)$$

It is well known that the complexity of this sum is exponential, something that
makes it difficult to make the calculation whenever the number of observations is
more than a few hundred. It might also be possible to calculate the maximum
likelihood thanks to the E.M. algorithm by using Baum and Welch's forward-
backward algorithm. However, here we would prefer to use a differential
optimisation technique that is generally faster than the E.M. algorithm when the
regression functions being used involve multilayer perceptrons.

The log-likelihood
A more elegant way to write the log-likelihood is to use the predictive filter $P(X_t = e_i \mid y_{t-1}, \cdots, y_0) := p_t(i)$, since the likelihood can be written:

$$L_\theta(y_1,\cdots,y_T) = \prod_{t=1}^{T} L_\theta(y_t \mid y_{t-1},\cdots,y_0)$$
$$= L_\theta(y_T \mid y_{T-1},\cdots,y_0) \times \prod_{t=1}^{T-1} L_\theta(y_t \mid y_{t-1},\cdots,y_0)$$
$$= \Big(\sum_{i=1}^{N} L_\theta(y_T \mid X_T = e_i, y_{T-1},\cdots,y_0)\, P(X_T = e_i \mid y_{T-1},\cdots,y_0)\Big) \times \prod_{t=1}^{T-1} L_\theta(y_t \mid y_{t-1},\cdots,y_0).$$

Note that:
• $p_t$ is the vector whose i-th component is $p_t(i) = P(X_t = e_i \mid y_{t-1},\cdots,y_0)$;
• $b_t$ is the vector whose i-th component is $b_t(i) = L_\theta(y_t \mid X_t = e_i, y_{t-1},\cdots,y_0)$, i.e., the conditional density of $y_t$ given that $X_t = e_i$ and $(y_{t-1},\cdots,y_0)$;
• $u^T$ is the transpose of the vector $u$.
This gives us:

$$L_\theta(y_1,\cdots,y_T) = b_T^T p_T \times \prod_{t=1}^{T-1} L_\theta(y_t \mid y_{t-1},\cdots,y_0) = \prod_{t=1}^{T} b_t^T p_t .$$

From this we deduce a practical form of the log-likelihood:

$$\ln L_\theta(y_1,\cdots,y_T) = \sum_{t=1}^{T} \ln\big(b_t^T p_t\big) \qquad (1)$$

All we have to do is to calculate $p_t$ for $t = 1, \cdots, T$ to be able to calculate the log-likelihood, since:

$$b_t(i) = L_\theta(y_t \mid X_t = e_i, y_{t-1},\cdots,y_0) = \Phi_{e_i}\big(y_t - F_{e_i}(y_{t-1})\big)$$

Calculation of $p_t$
Writing $B_t := \mathrm{diag}(b_t)$, the diagonal matrix whose diagonal is the vector $b_t$, we can easily verify (Rynkiewicz 2000; 2001) that the predictive filter $p_t$ satisfies the recurrence:

$$p_{t+1} = \frac{A B_t p_t}{b_t^T p_t} \qquad (2)$$

We will assume that $p_1$ is the uniform distribution over $\{1, \ldots, N\}$, and we can therefore calculate $p_t$, $t = 1, \cdots, T$, by means of this recurrence. The choice of the initial value $p_1$ is relatively unimportant thanks to the filter's exponential forgetting of the initial distribution.
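A minimal sketch of the filter recurrence (2) and of the log-likelihood (1) for Gaussian noise, in the same illustrative Python style as the simulation sketch above; experts and sigmas play the same role as there, and the implementation choices are ours rather than the chapter's.

    import numpy as np

    def hmm_mlp_loglik(y, A, experts, sigmas):
        """Log-likelihood (1) computed with the predictive filter recurrence (2).

        y: observed series (y_0, ..., y_T); A: column-stochastic transition matrix;
        experts[i]: regression function F_{e_i}; sigmas[i]: noise standard deviation of regime i.
        """
        N = A.shape[0]
        sig = np.asarray(sigmas, dtype=float)
        p = np.full(N, 1.0 / N)                  # p_1: uniform initial distribution
        loglik = 0.0
        for t in range(1, len(y)):
            # b_t(i) = Gaussian density of y_t - F_{e_i}(y_{t-1})
            resid = np.array([y[t] - experts[i](y[t - 1]) for i in range(N)])
            b = np.exp(-0.5 * (resid / sig) ** 2) / (np.sqrt(2.0 * np.pi) * sig)
            norm = b @ p                          # b_t^T p_t
            loglik += np.log(norm)
            p = A @ (b * p) / norm                # p_{t+1} = A B_t p_t / (b_t^T p_t), equation (2)
        return loglik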

2.4 Derivative of the log-likelihood

Remember that we have:

$$\ln L_\theta(y_1,\cdots,y_T) = \sum_{t=1}^{T} \ln\big(b_t^T p_t\big)$$

thus, if we write $\theta_j$ the j-th parameter of the model, we get:

$$\frac{\partial \ln L_\theta(y_1,\cdots,y_T)}{\partial \theta_j} = \sum_{t=1}^{T} \frac{\partial\big(b_t^T p_t\big)/\partial\theta_j}{b_t^T p_t}$$

All we have to do then is calculate $\frac{\partial (b_t^T p_t)}{\partial\theta_j}$ in order to be able to calculate the derivative of the log-likelihood, being aware of the fact that:

$$\frac{\partial\big(b_t^T p_t\big)}{\partial\theta_j} = \frac{\partial b_t^T}{\partial\theta_j}\, p_t + b_t^T\, \frac{\partial p_t}{\partial\theta_j} \qquad (3)$$

We can find the details of the calculation of this derivative in Rynkiewicz (2000). Here we will simply explain the calculation of the derivative of the filter.

Calculation of $\partial p_t / \partial\theta_j$
Since we have the recurrence

$$p_{t+1} = \frac{A B_t p_t}{b_t^T p_t},$$

differentiating this expression with respect to the parameter $\theta_j$, we find:

$$\frac{\partial p_{t+1}}{\partial\theta_j} = \left(\frac{\partial (A B_t)}{\partial\theta_j}\, p_t + A B_t\, \frac{\partial p_t}{\partial\theta_j}\right)\times\frac{1}{b_t^T p_t} + A B_t p_t \times\left(\frac{\partial b_t^T}{\partial\theta_j}\, p_t + b_t^T\, \frac{\partial p_t}{\partial\theta_j}\right)\times\left(-\frac{1}{(b_t^T p_t)^2}\right) \qquad (4)$$

with $\frac{\partial p_1}{\partial\theta_j} = 0$ for every $j$, if $p_1$ is the initial distribution.
The rest of the calculation of the derivative should not cause any problems. As such, we can calculate, for a reasonable computational cost, the log-likelihood and its derivative. This enables us to apply a wide array of differential optimisation techniques in order to get closer to the maximum (or at least to a local maximum) of the log-likelihood.
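In practice the log-likelihood sketched above (or its analytical gradient given by equations (3) and (4)) can be handed to a generic optimiser. The following hedged sketch uses scipy's L-BFGS-B with numerical gradients and a deliberately simplified parameterisation (two linear experts and a sigmoid mapping for the diagonal transition probabilities); these choices are illustrative assumptions, not the authors' estimation code, and the function reuses the hmm_mlp_loglik sketch given above.

    import numpy as np
    from scipy.optimize import minimize

    def neg_loglik(theta, y):
        """Negative log-likelihood of a 2-regime model with linear experts F_i(u) = c_i * u.

        theta = (logit a_11, logit a_22, c_1, c_2, log sigma_1, log sigma_2); the diagonal
        transition probabilities are mapped into (0, 1) by a sigmoid so that the
        optimisation is unconstrained.
        """
        a11 = 1.0 / (1.0 + np.exp(-theta[0]))
        a22 = 1.0 / (1.0 + np.exp(-theta[1]))
        A = np.array([[a11, 1.0 - a22],
                      [1.0 - a11, a22]])          # column-stochastic by construction
        experts = [lambda u, c=theta[2]: c * u,
                   lambda u, c=theta[3]: c * u]
        sigmas = np.exp(theta[4:6])
        return -hmm_mlp_loglik(y, A, experts, sigmas)

    # Typical use (y being the observed series and theta0 an initial guess):
    # result = minimize(neg_loglik, theta0, args=(y,), method="L-BFGS-B")
    # theta_hat = result.x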

3. APPLICATION: STUDY OF POLLUTION AS MEASURED BY OZONE LEVELS

The purpose of this study is to predict the maximum daily pollution rate, measured by the level of ozone, between the months of April and September inclusive. Towards this end, we will be using the previous day's maximum of the pollution rate (i.e., of the ozone level) as our regressor - plus the following meteorological observations:

- Total radiation,
- Average daily wind speed,
- Maximum daily temperature,
- The temperature gradient over the course of the day.

Statistical modelling of ozone levels (and in particular regression models) has been studied many times over. Linear models do not seem to capture all of the complexity of this phenomenon, hence our need to use richer models (Chen, Islam, Biswas 1998; Gardner, Dorling 2000). Amongst these models, the MLPs seem to come up with better results than the linear models, even though they often require a much greater effort for just a slight improvement in the prediction (Comrie 1997).
Here we will be showing how such predictions can be further improved thanks to the hybrid HMM/MLP model. In addition, and aside from the improvement in predictions, this modelling offers additional and precious information that allows us to predict peaks in pollution.
For the present study, we have at our disposal meteorological observations and ozone pollution rates for 1994 through 1997. We will be using the 1994-1996 data to estimate our models ("in sample" data), and comparing these different models on the data from 1997 ("out of sample" data).

3.1 Comparison between the MLP and the linear model

This preliminary study will allow us to detect what the MLP contributes relative to the simple linear model. The architecture of the MLP model was determined thanks to the SSM method that was described in the first section of the present paper. Here our performance criterion is the square root of the average quadratic error (RMSE), expressed in micrograms of ozone per cubic metre (µg/m³). Table 1 summarises the findings obtained:

Table 1. Comparison MLP, linear model

Years           1994-1996 (in sample)   1997 (out of sample)
RMSE MLP        17.49 µg/m³             17.98 µg/m³
RMSE LINEAR     20.97 µg/m³             19.70 µg/m³

Note first of all that the MLP leads to a marked improvement over the performance of the linear model, whether in terms of the "in sample" or the "out of sample" data. Although there is relatively little "in sample" data to help us estimate the model (550 observations), the SSM method makes it possible to avoid over-fitting. This is achieved in an entirely satisfactory manner, given that the RMSE difference between 1994-1996 and 1997 is relatively small. Note in addition that
these findings are entirely coherent with preceding studies on atmospheric pollution in Paris (Bel et al. 1999).

Figure 3. MLP prediction on "out of sample" data

Figure 3 compares the true value of the ozone rate with its MLP prediction
based on the "out of sample" series. Note that the prediction for the average
values is a particularly good one, but that peaks are generally underestimated.
This behaviour is all the more troubling since it is the higher values that the State
authorities are more interested in. We are therefore going to use a hybrid
HMM/MLP model, hoping that an expert will specialise in the prediction of
average and lower values, whilst another will be trying to ascertain the dynamic
of the high ones.

3.2 Performance of the hybrid model for an ozone series


Since linear models are capable of correctly modelling the low values of ozone pollution, we have chosen to use for our hybrid model a linear regression expert as well as an MLP expert, whose architecture is the one determined in the preceding section of the present paper. This choice is guided by the desire to keep the number of parameters reasonable.
After estimating the parameters, we obtain the following transition matrix for
the hidden Markov chain:
$$A = \begin{pmatrix} 0.97 & 0.02 \\ 0.03 & 0.98 \end{pmatrix}$$
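Since the time spent in a regime with self-transition probability $a_{ii}$ is geometric, a quick back-of-the-envelope computation (our addition, not in the original) of the expected sojourn times illustrates this persistence:

$$E[\text{sojourn time in regime } i] = \frac{1}{1 - a_{ii}}, \qquad \frac{1}{1-0.97} \approx 33 \text{ days}, \qquad \frac{1}{1-0.98} = 50 \text{ days}.$$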

Note that the diagonal terms, which represent the probability of remaining in the same state, are very close to the maximal probability 1. This means that for long periods of time the model stays in the same state, a sign that it has indeed identified two distinct regimes. The standard deviations for the linear expert $\sigma_0$ and for the MLP $\sigma_1$ are as follows:

$$\sigma_0 = 0.11, \qquad \sigma_1 = 0.20$$

This result is intuitively coherent, since as we will see the linear model has
specialised in the easier part of the series (i.e., in the average or low values, thus
allowing us to come up with good predictions), whereas the MLP is specialised
in the more difficult part of the series, i.e., the high values.
Lastly, prediction error results are as follows:

Table 2. Prediction error results

Years    1994-1996 (in sample)   1997 (out of sample)
RMSE     16.51 µg/m³

Note first of all a significant improvement in forecasting errors versus the simple MLP. In addition, the hybrid model provides information that is far richer than is the case with the simple regression model.
We can in fact obtain a segmentation of the series that depends on the conditional probability of the two regimes (see Figure 4). Note that the high probabilities of the regime that is associated with the MLP tend to correspond to the high values. In addition, there is never a peak in pollution when this probability is low.
Figure 4. Centred and normalised series and probability of the state that is associated with the MLP

We can also break the model's prediction down into two components: the linear expert and the MLP. This gives the user a lot of flexibility: if he is only interested in the high values, he will only take into consideration the MLP predictions, which are much more relevant there. Figures 5 and 6 show scatterplots of the two experts' predictions against the true values for all of the data (in and out of sample).

Figure 5. Predictions of the linear expert, as a function of the true values


Figure 6. Predictions of the MLP, as a function of the true values

Note with these figures that the linear expert is better than the MLP for the low values, whereas for the high ones its predictions fall well short of the true values. The MLP expert is the other way around, inasmuch as it overestimates low values but is much better at estimating the high ones. If we were to use this MLP alone to make predictions, the average quadratic error would be worse than with a single auto-regressive model, but better for the part of the series that we are interested in: pollution peaks as defined by ozone levels.

CONCLUSION

The present paper has introduced two important models that can be used with
time series. The first section was devoted to the difficult problem of the model's
dimensions. We have offered a methodology that is based on the asymptotic
properties of the parametric regression models. In our experience, this provides
good results as long as there is a sufficient number of observations (at least 500).

In the second section, we have generalised our problem and studied the
example of a piecewise stationary time series. The greater complexity of the
dynamics underlying such series cannot be captured in a simple regression
model. We therefore use, for a particular series, a number of regression functions
that are interconnected via a hidden Markov chain. All of these auto-regressive
functions make a simultaneous prediction of this series, with the hidden Markov
chain's role now being to weight the predictions that these regression models are
making by the various regimes' conditional probabilities. By applying this model
to pollution data on ozone levels in Paris, we have shown that it can be a


relatively promising way of predicting the sort of phenomena that are likely to be
associated with regime changes such as pollution peaks.

We should note however that no statistical tools exist yet enabling us to


choose the number of regimes for a given series. In fact, if we overestimate the
number of regimes, we will lose the model's identifiability, and traditional
statistical tools such as those that were used during the first section of the present
paper to determine the MLPs' architecture are no longer theoretically justifiable.
Furthermore, this problem constitutes a very active area of research, one that uses
highly complex statistical and mathematical tools that go well beyond the
purview of the present paper.

REFERENCES

Akaike H. (1974), "A new look at the statistical model identification", IEEE Transactions on Automatic Control, 19, 716-723.
Baum L.E. (1972), "An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes", Inequalities, 3, 1-8.
Baum L.E., Eagon J.A. (1967), "An inequality with applications to statistical estimation for probabilistic functions of a Markov process and to a model for ecology", Bulletin of the American Mathematical Society, 73, 360-363.
Baum L.E., Petrie T. (1966), "Statistical inference for probabilistic functions of finite Markov chains", Annals of Mathematical Statistics, 37, 1559-1563.
Baum L.E., Petrie T., Soules G., Weiss N. (1970), "A maximisation technique occurring in the statistical estimation of probabilistic functions of Markov processes", Annals of Mathematical Statistics, 41(1), 164-171.
Bel L., Bellanger L., Bonneau V., Ciuperca G., et al. (1999), « Éléments de comparaison de prévisions statistiques des pics d'ozone », Revue de Statistique Appliquée, 47(3), 7-25.
Chen J.-L., Islam S., Biswas P. (1998), "Nonlinear dynamics of hourly ozone concentrations: Nonparametric short-term prediction", Atmospheric Environment, 32, 1839-1848.
Comrie A.C. (1997), "Comparing neural network and regression models for ozone forecasting", Journal of the Air and Waste Management Association, 47, 653-663.
Cottrell M., Girard B., Girard Y., Mangeas M., Muller C. (1995), "Neural modeling for time series: a statistical stepwise method for weight elimination", IEEE Transactions on Neural Networks, 6, 1355-1364.
Douc R., Moulines E., Ryden T. (2001), "Asymptotic properties of the maximum likelihood estimator in autoregressive models with Markov regimes", Technical Report 9, University of Lund.
Gardner M.W., Dorling S.R. (2000), "Statistical surface ozone models: an improved methodology to account for non-linear behaviour", Atmospheric Environment, 34, 21-34.
Hamilton J.D. (1989), "A new approach to the economic analysis of nonstationary time series and the business cycle", Econometrica, 57, 357-384.
Krishnamurthy V., Ryden T. (1998), "Consistent estimation of linear and non-linear autoregressive models with Markov regime", Journal of Time Series Analysis, 19(3), 291-307.
Rynkiewicz J. (2001), "Estimation of hybrid HMM/MLP models", in Verleysen M. (ed.), Proceedings ESANN'2001, D-Facto, Bruxelles.
Rynkiewicz J. (2000), « Modèles hybrides intégrant des réseaux de neurones artificiels à des modèles de chaînes de Markov cachées : applications à la prédiction de séries temporelles », PhD thesis, Université de Paris I, France.
Schwarz G. (1978), "Estimating the dimension of a model", The Annals of Statistics, 6(2), 461-464.
Sussmann H.J. (1992), "Uniqueness of the weights for minimal feedforward nets with a given input-output map", Neural Networks, 5, 589-593.
Yao J. (2000), "On least squares estimation for stable nonlinear AR processes", Annals of the Institute of Statistical Mathematics, 52, 316-331.
PART II

APPLICATIONS IN ECONOMICS AND


MANAGEMENT SCIENCES
Chapter 4

Issues and dilemmas in cognitive economics

Bernard PAULRE
University Paris 1 Pantheon Sorbonne, ISYS - MATISSE UMR Paris 1 - C.N.R.S. 8595

Abstract: Cognitive economics, which first made its appearance in the 1960s, now absorbs a great deal of economic research resources. It is legitimate to raise questions about the contents and/or timing of the proposal of a cognitive economics research program. In this article, we underline some of the issues at stake in this sort of clarification, focusing specifically on problems pertaining to the speeds at which knowledge or real interactions actually adjust, and to the relevance of the axioms of epistemic logic. The dilemmas that we have emphasized here all provide us with
an opportunity to highlight a certain number of significant alternatives. Specifically:
(i) issues relating to the respective roles of the knowledge economy and of cognitive
economics stricto sensu; (ii) the difference between the computable orientation that
is involved in standard approaches to economic behaviours, and the connectionist
orientation that we illustrate, for example, by the evolutionary conception of the
firm; (iii) the problem of the relationship between a cognitive economics research
program and one that relates to cognitive sciences. We present an outline of the
potential foundations for a cognitive economics research program. This kind of
program would manifest itself through potentially divergent schools of thought
whose disparities should all be seen as factors of dynamism that could be used to
drive research in this area.

Key words: Cognitive economics, cognitivism, connectionism, information economics,


knowledge economics, cognitive sciences.

INTRODUCTION

There are two reasons why the present emergence and rapid development of
what has come to be called the field of cognitive economics might be considered
a surprise. First of all, information and knowledge concepts seem consubstantial

with economics - yet aside from the debate on planning and the market economy (Taylor 1929; Hayek 1935; Lange 1935)1, amazingly enough it was not until the
1960s that the first articles dealing directly with the information economy were
published. Secondly, with the development of cybernetics and theory of
communication during the 1940s and '50s, it would have been no surprise had a
lot of economists become the prophets of these new sciences during the 1950s
and '60s, given the major effects they were having on other disciplines - but this
did not happen. Cybernetics did have some impact via the Keynesian models that
a few engineers developed using language derived from the theory of control
(Tustin 1958), and O. Lange did offer us a most remarkable (albeit isolated)
contribution when, in 1964, he wrote a book devoted to a cybernetic approach to
economics. However, for the most part, and as we will see below, we can
generally say that it was primarily in the 1970s that information and knowledge
became a major focus in economics.
Even so, this preoccupation was usually deemed to be part of a mainstream
inspired study of markets with imperfect information and, as a result, little
attention was paid to the conditions underlying agents' acquisition of
information, whether at a cognitive or a psychological level. The main focus of
study was the impact of insufficient information, with observers usually basing
their analyses on the rational expectations hypothesis instead of highlighting the
conditions that enable agents, through learning or in some other way, to increase
the information at their disposal. In the end, it was in the 1980s that we finally
saw the advent of approaches reserving a lot of space for learning, knowledge
and beliefs, and portraying these factors as forms that were less trivial in kind
(and probably not as readily compatible with a Walrasian corpus of literature).
The goal of this article is to come up with a few elements to help us reflect
upon the ideas that cognitive economics convey, and the project that it offers.
Given the level of development which this discipline has reached by now, we are probably ready to take a closer look at the meaning and the impact of those changes that have already taken place. Should we even be talking about a sub-discipline called "cognitive economics"? Should we go as far as to suggest a new research program, following in the steps of what some people were calling cognitive sciences during the 1970s and accepting the directions that B. Walliser (2000) laid out for us? This may seem a reasonable point of view, but if so, what are the main axes and principles that will unite those who support this kind of project? These are the issues with which we will be dealing.
Our approach will be as follows. First of all, we will review the progressive
emergence of economic approaches on information and knowledge in order to
identify the themes around which the field of cognitive economics historically
seems to have been either partially or entirely structured (section 1). This will
enable us to explain some of the theoretical issues at stake in a future cognitive
economics research program. We will then distinguish between two types of
dilemmas. First there are external dilemmas (section 2), located upstream from
the very definition of our cognitive economics concept. These are the dilemmas
that people encounter when their goal is to define either the boundaries of what
we have called "cognitive economics", or else the nature of a research program in
this field. After having analysed these dilemmas, we will be tracing the contours
of what a cognitive economics research program might look like (section 3),
before delving into the internal dilemmas (section 4) that will be nurturing and
supporting discussions on the various options that are available in this area, once
a research program has been defined (or at least outlined). A cognitive economics
research program does not actually mean that a unified theory or doctrine exists.
We already saw this with the development of cognitive sciences. Instead, we are
looking at a general orientation capable of federating and uniting researchers who
do not necessarily share the same basic options, themes or methodologies, even if
they are active in the same field of research.

1. THE INFORMATION AND KNOWLEDGE ECONOMY. A BRIEF PANORAMA AND THE MAIN ISSUES

It is difficult to determine when cognitive economics was born. First of all, there is the word itself. Secondly, there is the spirit or idea to which it refers, i.e., the first manifestations of a definite interest in studying the role that information and knowledge play in people's behaviour and the economy's modus operandi. If
we are prepared to grant some significance to institutional signals, we could
mention the following key dates:
- 1965: the theme appeared in the agenda of the annual conference of the American Economic Association,
- 1971: publication of the first readings on this topic (Economics of Information and Knowledge, published by D. M. Lamberton),
- 1976: introduction of an information concept in the American Economic Association's index of economic reviews.

Other dates are also noteworthy, but we feel that it would be more useful to
identify the themes and approaches comprising that which we will be calling
(with a certain amount of vagueness for the moment) the study of the economics
of information and knowledge. It is also a good idea to provide a timeline for this
discipline. To do so, we will focus on the period extending from the 1950s to the
early 1980s, an era we have good reason to highlight2 . A whole series of studies
can be explored, analyses we think should be regrouped by theme (or even by
"wave") whenever they feature a modicum of concentration or historical
coincidence. Of course, we are not saying that this overview will cover the entire
field. Our purpose is only to offer some idea as to its boundaries and
characteristic themes.

By limiting our investigation to the period between 1950 and 1980, we are not
trying to suggest that knowledge-related issues were entirely absent from earlier
economists' areas of concern. For example:
- Since the very outset, the division of labour has raised questions about the development of knowledge and intelligence, these being the factors that this division either denotes or else which it restricts (Smith 1776; Ferguson 1783, particularly),
- As far back as the 1930s (1933 precisely), F. Knight focused on themes such as the relationship between information structure and organisation,
- In 1934 N. Kaldor and A. Robinson were already studying issues such as the optimal size of the firm, linking this to the cognitive capacity for planning (the central theme of Penrose's theory, see below),
- The role that information plays in market structure and market equilibrium has been studied on several occasions: to determine how this relates to monopolistic competition and the role of advertising (Chamberlin 1933); and as part of oligopoly theory, which stresses problems with oligopolistic actors' crossed expectations (Fellner 1949),
- Hayek began to work on knowledge diffusion in the late 1930s, during the aforementioned debate about socialist planning.

However, these elements are highly dispersed, and have never been exploited
systematically. For this reason, we prefer the assertion that the history of this
field of study began in the early 1950s, an era that witnessed the hatching of our
topic's main constituent phylum - the forerunner of those modern research efforts
whose orientations we would also like to discuss.

1.1 The 1950s and the phylum of the theory of the firm

From 1950 until 1953, there was much debate about the profit maximization
principle, relating to whether the underlying hypotheses were realistic, and
whether the principle itself actually furthered the Oxford surveys on corporate
executive behavior. A. Alchian renewed the terms of this debate in 1950, as he
was the first to invoke the natural selection argument to explain, if not the
maximization of profit, then at least the constant search for it. M. Friedman used
this argument in 1953 to justify the maximization principle. It is worthwhile for
us to review this debate because of the fact that a study published by S. Winter in
1964, offering both a synthesis and a pedagogical extrapolation of our topic, was
followed by a whole series of articles that would culminate in 1982 with the
publication of the key work by R. Nelson and S. Winter, the promoter of the
modern evolutionary approach in economics. This text asserted strongly (if not
for the first time) that the analysis of the firm should center on the concept that
"routines" are the place where "knowledge resides". In this view, the storage
function that comprises one of the three constituent functions of selection in a
natural selection paradigm (along with selection itself and mutation) is enacted
through routines. According to Nelson and Winter, "organizations remember by
doing" with "routines constituting the most significant way of storing
organizations' specific operational knowledge" (p. 99). The train of history
between 1950 and 1982 was one of the main vectors for the progressive
emergence of a cognitive approach to firms - but it was not the only one. Enter
E.T. Penrose, who participated in the 1950s debates by contesting the naturalist
metaphor to which the firm was being exposed, in order to suggest and develop
(in an entirely different vein) an approach to the firm that reserved a lot of space
for a cognitive dimension.
In 1959 she published her famous book on the theory of the growth of the
firm. She started out with a principle that was unique in the context of the microeconomic theory of the time3: a firm has to be dealt with as if it were "an
autonomous administrative center for planning". It is both an organization and
the sum of a variety of means of production. Penrose pointed out that it is never
the resources themselves that constitute what can be called the factors of
production, but instead the services that they can provide. So, she had good
reason to postulate that a productive activity depends on a "productive potential,
which includes all of the production possibilities of which entrepreneurs are
aware and from which they can benefit". This being the case, the entire theory
that Penrose develops is based on what we feel should be called a cognitive
conception of the firm. One of the most noteworthy manifestations of this
approach derives from Penrose's famous theorem about the factors that restrict a
firm's growth. In short, this theorem states that a firm's rate of growth is bounded
by its managers' cognitive capacities for planning growth. Penrose's theory
contains room for learning. Although this concept is mainly used with regards to
the learning of executive managerial activities, it is also present via the analysis
of the firm's internal processes, those that are involved in the creation of new
productive services. The Penrosian vision of the growth of the firm is therefore
highly dependent on the progressive extension of productive services, on the
creation of a surplus and on the subjective growth potential. Now, these factors
are all qualitatively and quantitatively determined by experience. The so-called "resource theory" school of thought, which occupies a central position in the modern
evolutionary approach, springs from Penrose's analyses (Durand, Quelin 1999).
Certain developments relating to corporate strategy (Ansoff 1965; Drucker 1965;
Hamel, Prahalad, 1990) are also clearly rooted therein.
At the same time as Penrose, and with an analogous starting point (i.e., with
the firm being seen as an organization), a neo-rationalist approach to
organizations developed, based on the seminal writings of Simon, whose two key
works were published in 1958 and 1963. In a book that Cyert and March wrote in
1963, organizational orientation is depicted in a more radical light than was the
case with Penrose, inasmuch as the "productive" aspect of the firm disappears.
The only thing that remains is a vision of the firm as a system for making
decisions and solving problems. On the other hand, theirs is a vision that has
been enriched by an incorporation of conflicts relating to the firm's objectives4.
This is a decisional and dynamic approach in which learning plays a key role,
albeit one that is different from the function it fulfils for Penrose, given that it
now pertains to decisions and not to activities or productive services.
Furthermore, even though innovation (in the widest and not necessarily
technological sense of the term) is present in March and Simon's writings, it is
absent in Cyert and March. The former authors rely explicitly upon a conception
wherein human organization is seen as "a complex system of information
processing". For the latter, this theory (which is first presented in a verbal form)
blossoms into an information system model that can simulate a general decision-making model determining price and quantity (this model representing the
solving process that is often attributed to firms).

1.2 The late 1950s and the emergence of a microeconomic theory of innovation

Between 1958 and 1962, several seminal articles were published about R&D
and the knowledge economy.
In one of the first articles on the subject, R. Nelson (1959) raised the issue of
the quantity of resources that a country should devote to scientific research,
which he defines as "a human activity that is geared towards the advancement of
knowledge, which can itself summarily be divided into two separate categories:
the facts and data that can be derived from reproducible experiences; and the
theories and relationships between these facts." He looks at problems relating to
the externalities causing a gap between the marginal private returns or else the
social returns of basic research. He also observes that the marginal social cost of
using an already existing bit of knowledge is zero, hence the need to
"administrate knowledge as if it were a freely accessible common resource." 5 In
1962 K. Arrow focused on the same issue in a fundamental theoretical article in
which he demonstrated that in a competitive situation the incentive to invent is
greater than it is in a monopolistic environment, but that it is less than (or at best
equal to) the invention's social yield - hence an opportunity for State intervention
(1962a). In 1958, B. Klein had underlined the unavoidable duplication of R&D
activities, due to the high degree of uncertainty affecting them (Klein 1958;
Klein, Meckling 1958). This type of questioning, and more generally the issue of
R&D-related externalities or spillovers, would never cease to be central to
economists' reflections, especially within the framework of endogenous growth
studies (Romer 1990). This is the genesis of the widespread interest in networks.
In 1961, having observed that before an innovation's effects are completed an
imitation process will have already been widely launched, E. Mansfield, relying
on a classical contagion principle, demonstrated that in general "the number of
users of an innovation will approximately follow a logistic curve." This fits in
with the initial empirical findings that Z. Griliches (1957) found for this subject.
Contemporary analyses of innovation diffusion processes still refer to these two
authors.
In 1962, K. Arrow published another seminal article on learning by doing
(1962b)6. This study was macroeconomic in nature. It is different from the more
microeconomic and engineering-based "progress curve" or experience curve of
Alchian (1950).

1961: a possible birth date for the information economy

To the best of our knowledge, an article published by G. Stigler in 1961 was the first to introduce a "search" concept to describe the process by which "a buyer (or a seller) who wants to obtain the best price will poll several sellers (or buyers)". Here G. Stigler mentions the search for knowledge about goods' quality, an issue "that was studiously avoided in the present paper... [and which] is certainly more difficult at an analytical level." To conclude, he suggests that
certain economic organizations be seen as systems intended to "eliminate
uncertainties about quality". This is taken further in an article on the labour
market that evaluates the costs and yields of undertaking research in this market.
In 1966 A. Rees wrote a paper on this theme, focusing on quality issues and
informal information networks.
That same year, Muth published an article based on the principle that agents
use all available information; that the representation of an economic system is a
type of shared knowledge; and that expectations are rational. In fact, the rational
expectations hypothesis is an equilibrium concept expressing the self-fulfilling
nature of agents' expectations.

1.3 The 1960s: the first measurements of the information economy and education

1962 saw the publication of a seminal work by F. Machlup involving a


measurement of the role that cognitive activities play in the American economy.
Then in 1966 there was an article by K. Boulding on the knowledge economy, as


far as we know the first publication to carry this title. The 1970s featured several
studies on the rise of knowledge or information-related activities in the
developed economies. The most significant empirical work was by M. Porat
(1977).
To accompany these observations or in parallel to them, note the arrival of the
post-industrial society concept, created by D. Bell and developed by A. Touraine,
who called it the programmed society, and by others. The name of K. Boulding
should also be mentioned here (1965).
The human factor or "human capital" obviously accounts for a significant and
singular portion of all analyses of the role that knowledge plays in production
and growth. In a fundamental article published in 1956, T. Schultz bemoaned the
fact that two important elements were being neglected: "an improvement in
peoples' attitude as productive agents, and a raising of the art of production ...".
E. Denison, on the other hand, tried to use a breakdown of the Solow residual to evaluate the role of training levels (1962). T. Schultz attempted to estimate the indirect costs of education (1960) and then to directly assess the contribution of a
stock of education to GNP growth (1963). The utility of an increased level of
education was first assessed by Houthakker in 1959 and then by H. Miller in
1960. The theoretical reflections of G. Becker operated at the individual firm
level (1962).

1.4 The 1970s and the incorporation of imperfect information into a market's functioning

The early 1970s saw a whole corpus of articles (or chapters in collective books) devoted to an incorporation of the notion of imperfect information into market theory. However, the hypotheses that were formulated and the conclusions to which they led were not convergent.
Amongst these studies we should specifically mention those that were
devoted to unemployment and which developed the thesis that this is one
consequence of job-searching in an environment characterized by imperfect
information. As such, starting out with the idea that the search for information
about "notional" trading opportunities has a cost, A. Alchian, in a well-known
text, suggested that the type of unemployment that results from this situation "is
self-employment in information collection" (1970, p. 30).
A 1970 article by Akerlof highlights some of the difficulties that will crop up
in markets where buyers and sellers have different information, and stresses the
existence of adverse selection phenomena. In actual fact, the first article to have
delved into the issue of asymmetry appears to be one that Arrow wrote back in
1963, which focused on moral hazard (as he had done in his 1962b article).
Moral hazard involves situations in which one side of the market cannot observe
the behaviour of the other. Inversely, adverse selection relates to situations in
which one side of the market cannot observe the quality or the "type" of the
goods that are being created by the other side. For the former we speak of
"hidden behaviours" or of unobservable actions. For the latter we speak of
"hidden types" or of unobservable characteristics. We can scarcely over-estimate
the impact and significance of research on this topic, including (in recent times)
in the financial sphere.
Signals theory was first evoked in an article published by M. Spence in 1974,
shortly before a model that M. Rothschild and J. Stiglitz developed in 1976. The
Spence model treats education as a signal of competency. The other model
focuses on purchasing behaviour in the insurance market, and considers that the
type of policy being purchased is a signal of the degree of risk that the insured
party represents. Other models exploiting the same analytical principle include
ones that were published by Milgrom and Roberts (1982: price is an indicator of
the cost that an oligopoly assumes; and 1986: advertising used as a quality
signal), Reinganum and Wilde (1986: the propensity for bringing a lawsuit seen
as a signal of the strength of one's position), etc.
The firm is another area where information asymmetries are present: shareholders and executives do not possess the same level of information and do not necessarily have the same interests.7 Principal-agent literature helps us to
explore this situation, which developed from initial contributions by M. Spence
and R. Zeckhauser (1971) and Ross (1973) regarding the issues involved when
contracts are signed in a situation where asymmetrical information exists. Then
came a somewhat different approach from M. Jensen and W. Meckling (1976),
one that could be applied in a wide variety of situations. With respect to the
theory of the firm, this latter approach would become particularly relevant
whenever restrictions exist within an organization or a group regarding
decentralization and/or the delegation of authority, particularly when this is due
to informational asymmetries. Nowadays it has been extrapolated into a new type
of thinking about the nature of the firm, an approach in which analyses are made
in terms of incomplete contracts. This consists of thinking about the impacts of
such asymmetries and about the types of contracts that would make it possible to
resolve any conflicts of interest (i.e., "to align incentives"): c.f., O. Hart, 1995. It
also covers issues relating to the firm's borders (Hart, Moore 1990), thus tying
into transaction cost literature, which springs from a seminal article by R. Coase
(1937) and from Williamson's book on this topic (1976).
In retrospect, the 1970s definitely constituted a most fertile period for the
information economy.
1.5 Economic analysis of information systems in organizations and in markets

To a certain extent, instead of looking at how an imperfect market functions, information economy studies analyse the impact that imperfections and a lack of
information will have on the overall functioning of markets. The focus is on the
relationship between an information system's structure and the type of
equilibrium it achieves (or its capacity for achieving this equilibrium). We can
associate market functioning issues with issues relating to the influence that
structures or internal information systems exert on an organization's behaviour
and performances.
The thinking of Hayek unsurprisingly occupies a key position in this
discipline, since it indicates the main approaches that economists may wish to
pursue when trying to ascertain the role of information and knowledge.
According to Hayek, co-ordination is the key to all social sciences, starting with
economics. Hence a question that runs throughout much of his work: "How can a
combination of fragmented knowledge, one that has been dispersed in many
people's minds, produce outcomes that, if we were to assume that they were
intentional, would require the leading mind to possess a knowledge that no one
single person can possess?" (193 7). Here Hayek is concentrating on the idea of a
dispersed and subjective type of knowledge that he sees in terms of the division
of labour, and which ties into some very modem concepts produced by the
cognitive sciences.8 This orientation caused Hayek to express his opinion on
several of the fundamental paths and themes that are linked to knowledge and to
the development thereof. First he explored in great depth a number of topics that
are very specifically related to the theory of the mind (Hayek 1952),9 before
developing an evolutionary approach (Hayek 1973).
G. B. Richardson, as part of this "Austrian" perspective, highlighted the co-
ordination and temporal desynchronisation problems that firms have to contend
with (1960, 1972). Here reducing disequilibria means developing organizational
forms that facilitate mutual information and which can shape whatever type of
knowledge is needed if the co-ordination is to take place. G. B. Richardson refers
to the Marshallian vision according to which "capital largely consists of
knowledge and organization... knowledge being the main driver of
production ... and organization enhancing knowledge" (1890-1920, p. 115).
A more theoretical and formalized way of thinking about the conditions in
which information exists, and about how markets function, was developed from a
more axiomatic perspective, notably by L. Hurwicz (1972). During the 1960s, he
provided fodder for discussions about the informational feasibility of the
economic processes that resource allocation entails, depending on whether or not
this is occurring in a "classical" environment (i.e., whether or not there is a
presence of externalities and/or indivisibilities; a convexity of preferences and technologies, etc.).
A theoretical analysis of organizations' information systems was developed in
the early 1970s based on work by T. Marschak (1972), J. Marschak and R. Radner (1971) or K. Arrow (1974). Here a theoretical and normative approach to
organizations means trying to discover which rules of the game should be
established to help an organization to obtain the performances it desires. The
"rules of the game" encompass internal information roles and all of the feasible
joint strategies actually considered, with such elements reflecting the
technological or other sorts of constraints that the "organizer" has imposed. This
approach can be analysed in the light of the debates that were taking place on
topics like planning or the equivalency of market socialism and capitalism. This
interpretation of general equilibrium, which was furthered in a number of studies
(particularly in France during the 1960s with E. Malinvaud, J.-Cl. Milleron and
Y. Younes), is similar to an approach we would call "computable" nowadays,
where planning is described as if it were an information system algorithm. The
algorithm's architecture certainly highlights informational relationship issues, but
it does this from a perspective that has nothing to do with the way in which actual
institutional structures really function. They are much more complex.
1968 saw the publication of a book by Axel Leijonhufvud, who tried to
demonstrate that Keynesian theory is more than just a simple IS-LM model. His
thesis was that Keynes had rejected the neo-classical idea according to which a
price mechanism fulfils its information function perfectly over the short-term.
Leijonhufvud compared the Walrasian model with a Newtonian type of
mechanical system, and criticized applications of neo-classical theory to
production in "a world where human agents are to an ever greater extent
exclusively involved in the generation, transmission, reception and processing of
information, and not in the manufacturing of pins as in Smith's vision" (p. 396).
He observes a convergence between Keynes's original vision and cybernetics,
which basically arrived too late to allow analysts to grasp all of the impact of the
General Theory. The same Leijonhufvud would later create the first research
center on computable economics.10

1.6 Information, decision theory and games theory

If there is an area in which problems relating to imperfect information or to uncertainty find it easy to occupy a central position, it has to be decision theory.
In more recent times, by its very nature, games theory also concentrates on this
theme.
In 1954, L. Savage wrote what is probably the most important work of recent
years. By reducing uncertainty to risk, he placed expected utility theory in an
axiomatic framework.11 In this approach, an agent assigns subjective probabilities
to varying "states of the world", defined as a description of all relevant aspects of
the world. An event is an entire group of states of the world, or a subgroup M of
all states of the world E. Despite observations formulated by Allais as far back as
1953, it was not until the early 1980s that the experimental identification of a
certain number of paradoxes cast doubts on expected utility theory (c.f., notably Kahneman, Slovic, Tversky 1982), paving the way for alternative
theories (cf. Munier 1995).
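To fix ideas with a minimal sketch in modern notation (ours, not Savage's own): an act f assigns a consequence f(e) to each state of the world e, and the agent ranks acts by their subjective expected utility

\[ U(f) = \sum_{e \in E} p(e)\, u\bigl(f(e)\bigr), \qquad p(M) = \sum_{e \in M} p(e) \ \text{ for any event } M \subseteq E, \]

where p is the agent's subjective probability over the set of states E and u a utility function over consequences; the Allais-type paradoxes mentioned above bear precisely on the descriptive validity of this functional form.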
Games theory is more demanding than decision theory since economic agents
must be deemed to possess information on and/or beliefs about other players'
behaviour in addition to whatever information or knowledge they may have
about events. In the early 1980s a new approach arose in games theory, one that
consisted of pursuing an epistemic logic. The advent of this new orientation can
be explained by the fact that: (i) specialists were being asked to treat epistemic
aspects of behaviour in an increasingly precise manner, and (ii) the theory's
formalistic method of measurement plus the probabilities it assumes are "less
distant than could be expected of epistemic logic" (Bacharach et al. 1998). The
advantage of the epistemic logic that Hintikka proposed back in 1962 was that it
supplied a useful formal framework for modelling an agent's knowledge and
beliefs. Furthermore it was compatible with games theory. Its relevancy is
illustrated retrospectively by the debates that took place on backward induction
and "rationalisable equilibria", and by the way people were dealing with the idea
of shared knowledge (Bacharach et al. 1998). Henceforth games theory would no
longer be limited to a mere exposé of the probabilities of occurrence that agents
allocate to an event. It now portrays agents as decision-makers who acquire
knowledge, receive information and develop their own beliefs - and it supplies
axioms and rules of inference that account for these processes (c.f. inter alia:
Bonanno 2000).
Yet even with this list of publications, as impressive (albeit highly selective)
as it is, we should not be concluding that information economy studies
constituted a new field, that they had emerged "naturally", or that the
relationships between all of the aforementioned theses and models were born out
of (and found meaning within) some all-encompassing and integrated approach.
In actual fact, between 1950 and 1980 there was no attempt whatsoever to
synthesize all of these aspects of the information or knowledge economy, and a
fortiori of cognitive economics. Instead the federating theme for many of these
studies was the economics of uncertainty. As was the case elsewhere, such
studies were divided between the theory of the firm or else a macroeconomic
approach (expressed in terms of growth, productivity, technological progress,
etc.).
1.7 The issues involved in the incorporation of imperfect information and the role of knowledge

The aforementioned themes constitute one complete set of issues, inasmuch as they involve problems that could be elucidated (if not resolved) by a cognitive economics research program, and/or by advances in this field. At the same time,
analyses of these topics have also revealed variances or divergences between
them, raising a number of different kinds of issues. This makes it impossible to
design a cognitive economics research program (with the research orientation and
strategy it would tend to emphasize) that is neutral in terms of the points of view
it expresses or the issues it deems to be a priority. We feel that it is useful at this
juncture to highlight some of these fundamental issues. We will be analysing two
of them: (i) equilibrium's role as an analytical principle (vs. learning); (ii) the
nature of cognitive rationality (epistemology).

1.7.1 The first issue: an approach based on the notion of equilibrium

Let's go back to the consequences of imperfect information and note, first of all, how destructive this is for the existence of competitive equilibrium and for
the incorporation thereof. J. Stiglitz did a good job demonstrating this outcome in
his 1985 article, which readers are advised to consult.
At another level and from a different point of view (i.e., without concentrating
on a study of the imperfections that can exist in a specific market), factors such
as criticisms of the auctioneer's existence, the absence of certain markets and the
roles that strategic behaviours can potentially play, are all some of the reasons
that have led researchers to focus on the conditions in which economic agents
receive information and develop their expectations. For example, the
implementation of a sequential approach (based on a refusal of a disequilibrium
approach) has led to the introduction of learning phenomena and of historical
aspects. It has also raised questions about what happens when certain markets
(especially futures markets) are missing, and sparked analyses of the way in
which certain other markets function, i.e., asset or labour markets. The role that
contracts play, particularly in this latter market, has also become a topic of study
(for all of these points, see Hahn 1990).
However, much of the research being carried out from this point of view is,
fundamentally, not very distant from a Walrasian framework. We are still
operating within the confines of an economy that is comprised of interdependent
markets within which price adjustments play a central role, and where, despite
everything else, agents remain highly rational. This is little more than an attempt
to reshuffle or to develop the analytical framework in such a way as to preserve
its existing methodological orientation. The rational expectations hypothesis is a
key element of the reasoning that is being presented within this framework.
Associated with the hypothesis that behaviour is neutral towards risk, the
postulate seemingly constitutes a strong condition for demonstrating the possible
existence of price taking equilibria. Nevertheless, certain difficulties do remain,
notably because of the existence of multiple equilibria.
According to F. Hahn, the rational expectations hypothesis in a price taking
economy does not suffice to create a pre-determined equilibrium:
" .. since even here, to obtain our pre-determined responses we need to know
something about the learning process. Plus the equilibrium might not reflect
the fundamentals of the situation ... ".
In addition, in an economy with imperfect sector competition:
"the dynamic must be seen as a learning process both in terms of the demand
conditions and also as regards competitor strategies. Once again, when an
equilibrium is defined in terms of these processes, it would appear to remain
undetermined unless its past (in other words, information) has been explicitly
modelled and is a known commodity ... The information available to agents at
any given moment in time is very path-dependent. The economy could have
followed another path and generated quite different types of information.
There is something that is fundamentally past-oriented about the very
definition of equilibrium, and obviously in the dynamic itself" (Arrow 1958).
There are as many equilibria as there are possible pasts. An equilibrium with
given properties can stem from a number of different pasts, which can have one
or several characteristics in common.
Leaving to one side the issue of multiple equilibria, we see that a rational
expectations hypothesis has a strategic role to play here. Above and beyond the
lack of realism argument that consists of pointing out the fact that this hypothesis
attributes a cognitive capacity that is manifestly excessive in terms of agents' real
capacities and level of information, what else can we say about it?
During a conference in the late 1980s, R. Lucas made a major contribution to
this debate when he highlighted the methodological options underlying his
research strategy. According to Lucas, the way in which economic agents are
able to store the decision-making rules that they use has fundamentally nothing to
do with economic theory:
"Economics tend to focus on situations in which it can be expected that agents
"know" or have learnt the consequences of different actions, meaning that
their observable choices reveal the stable elements in their underlying
preferences... Technically I believe that economics are the study of the
decision-making rules that constitute the stationary states of certain adaptive
processes, rules that work for a certain range of situations and that are no longer revised as experience accumulates".12
Ultimately, one of the main issues in the incorporation of imperfect
information and knowledge seems to be the very existence of a competitive
(price taking) equilibrium. Hence the inclusion in a certain number of studies of
recommendations for this type of situation, as if it constituted the norm. In other
words, the main strategic debate seems to involve a questioning of the rationality
being ascribed to actors, especially the existence of an opportunity for
incorporating learning (and when this does exist, the type of hypotheses that
should be made in this respect). As such, cognition does indeed lie at the heart of the problem - something that provides us with a natural link to the second issue we now raise.

1.7.2 The second issue: what kind of cognitive rationality?

It is not absolutely necessary at this point to review past debates on the nature
of rationality in economics (i.e., the way that this topic has been dealt with since
the 1960s, particularly by H. Simon). In a context such as the one in which we
are situating ourselves at present, we feel that it would be more judicious to take
a closer look at cognitive rationality, or to be more specific at the epistemology
that economic agents apply (and at the way in which this has been portrayed in a
number of recent studies). Hence the need to revisit the aforementioned epistemic
type of logic, whose usage in games theory nowadays would appear to be well
established. Many analysts feel that this logic is widely recognized as the way to
move forward. We will be discussing the axioms upon which it is built,
immediately followed by an expose of its limitations. We find (see Osborne,
Rubinstein 1994):
- An axiom in which agents are familiar with all of the possible states of the world, and always know when a "universal" event has been realized.13 This is tantamount to excluding so-called "ignorance" situations, postulating that agents cannot be surprised by events that they did not even realize were possible.
- A deductive closing axiom: if an agent knows E and knows F, then s/he knows E ∩ F.
- A truth axiom: what an agent knows is true; nothing that s/he knows can be false.
- A transparency axiom (a.k.a., a positive introspection axiom): if agents know E, then they know that they know E. In other words, agents are always aware of what they know.
- A wisdom (or negative introspection) axiom: if an agent does not know E, then s/he knows that s/he does not know E. Agents are perfectly aware of their own ignorance.

We will be limiting ourselves at present to a summary presentation of these axioms. Note that nuances or modifications would have to be made if probabilistic beliefs, information and the role of information were introduced alongside knowledge (on these issues, see Bonanno 2000; Walliser 2000).
The axiom that is generally considered to be the most critical and which shuts
the door on a number of other approaches is the second one (deductive closing).
Here a so-called "omniscient logic" property is inferred: the agent is supposed to
be able to exploit "basic" knowledge in its entirety (B. Walliser speaks of
"primitive beliefs"). This axiom precludes any possibility of what we could call
"limited cognitive rationality", since knowledge of the implication (K(E) and
K(F) ~ K (En F)) is independent of the complexity of the deduction that is to
be carried out. Agents' capacity for reasoning is meant to be limitless (given the
complexity of the implications that their knowledge of the world supposedly
allows them to derive from this situation).
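For reference, the five axioms can be written compactly (in our own notation, in the spirit of Osborne and Rubinstein 1994) with a knowledge operator K acting on events, i.e. on subsets of the set of states \Omega:

\[ K(\Omega) = \Omega, \qquad K(E) \cap K(F) \subseteq K(E \cap F), \qquad K(E) \subseteq E, \]
\[ K(E) \subseteq K\bigl(K(E)\bigr), \qquad \neg K(E) \subseteq K\bigl(\neg K(E)\bigr), \]

where \neg denotes set complementation; read in order, these express awareness of the universal event, deductive closure, truth, positive introspection (transparency) and negative introspection (wisdom).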
One of the consequences of the truth axiom is that knowledge or know-how is
necessarily cumulative. Agents are not supposed to be able to find themselves in
situations where they "know" something that is not true - and all of the
knowledge that they do have is true. Knowledge acquisition necessarily evolves
towards greater completeness or finer detail. Great care must be taken when
discussing this topic since the axiom is acceptable in "atomistic" conceptions of knowledge, which hold that knowledge can be stored. Inversely, in a more
constructivist epistemology where the reshuffling or restructuring of know-how
is considered to be a feasible outcome, this axiom causes problems. Furthermore,
it is vulnerable whenever we are contending with the social types of knowledge
that are involved in the interactions between individuals and groups -
interactions that are quite changeable. This would be less open to discussion if
we were dealing with an empirical type of knowledge, of the kind that we find,
for example, in natural or physical sciences.14 Once again, here we are facing
issues like the stability of possible worlds, or else relative rates of learning. As
long as we are operating within the framework of a paradigm that is based upon a
relatively stable situation, we have every reason to assert that knowledge is
necessarily increasing. This explains, for example, the relevancy of knowledge
acquisition studies that are based on learning by doing.
In economics, the negative introspection axiom is difficult to accept, particularly for those who adhere to the principle that radical uncertainty exists. On the other hand, those who are more rationalist and who start out with the idea that this can lead to the drafting of incomplete contracts may accept this axiom. Hintikka, the founder of epistemic logic, even rejected this axiom.
The transparency axiom is difficult to accept, certainly for those economists who recognize the soundness of an evolutionary perspective and who, within this
framework, accept the existence of tacit knowledge. However, this point remains
open for discussion, since this tacit characteristic is clearly less aimed at
reflexivity than at the ability to express and to formulate knowledge in a
communicable form.
We do not feel that there is any need to extend this analysis to attempt to
convey the idea that the axioms characterizing the epistemology of economic
agents need to be studied in any great detail. We have chosen not to delve any
further into this particular discussion despite the fact that it merits further
analysis. This is because we have restricted ourselves to the most elementary
expression of the epistemic logic, and because we know that the difficulties we
have mentioned are familiar to anyone who builds and proposes these kinds of
systems. This has allowed us to stress how very useful it is to analyze economic
information and knowledge models as if they were a function of the axioms or
epistemic hypotheses that create them, sometimes implicitly. There is no doubt
that this trade-off between an epistemic and a more economic perspective can be
useful. This strengthens our conviction that a cognitive economics research
program could be extremely useful.

2. COGNITIVE ECONOMICS: THE EXTERNAL DILEMMA

To advance in our definition of cognitive economics, there are two problems we need to deal with. The first involves the choice between a broad or a narrow
approach to cognitive economics. The second relates to the nature of the
relationship between cognitive economics and cognitive sciences. We will be
emphasising the latter.

2.1 The first dilemma: is the knowledge economy part of cognitive economics?

In the brief panorama we provided above, we were able to observe the existence of two types of preoccupations that (albeit distinct from one another)
both illustrate an economic approach to knowledge-related phenomena.
On one hand, we note the existence of an essentially macro-economic school
of thought that focuses on measuring the role of economic activities that are
devoted to the production, processing and circulation of information and
knowledge. Studies by F. Machlup and M. Porat have already been mentioned,
but over the past few years contemporary developments in the New Information
and Communications Technologies (ICT) have sparked in-depth statistical studies whose purpose has been to tabulate accurately and to measure the role
(and performances) of any activities that tie into this sector. The same approach
has been pursued in the high-tech sectors, or more generally, in what has been
called the New Economy.
At a more micro-economic level, a certain number of models or theories
process knowledge or information not in terms of agents' cognitive capacities,
but from the point of view of their economic value, or the economic calculation
to which they can be subjected. One well-known analysis is a seminal article by
K. Arrow (1962a) where he first establishes the principle that information should
be seen as a good, and then takes apart Schumpeter's supposition about the
relationship between market structure and the propensity to innovate. More
recently (and in a very different vein), we can mention a book by C. Shapiro and
H. Varian that looks at the economic principles that allow the producers of
certain recent informational goods and services to commercialise them.
In these approaches, the information economy can be characterised in three
different ways:

- Regardless of whether this involves knowledge (c.f., K. Arrow: a new process that is protected by a patent) or else information, in the mundane sense of this term (c.f., C. Shapiro and H. Varian: access for example to a database),
information is solely dealt with in terms of the value and the price that can be
attributed to it;
- Information is viewed either as a merchandise or else as a specific activity;
- At no time does anyone wonder how the information is going to be processed
from a cognitive point of view, in other words, the impact it can have at an
individual level, its assimilation, its processing and its effects on the
behaviour of the person who is going to benefit from it. It is as if information
possessed a common "objective" value; or where it has been differentiated, as
if the utility that can be derived from it were independent of its user's
mental/psychological disposition or level of "competency".

We therefore have good reason to introduce an important distinction between the two economic approaches to knowledge and information: on one hand,
knowledge is an "objectified" good, if not a merchandise; on the other,
information and knowledge are relationships or factors that produce a change in
agents' behaviour, meaning in this case a modification in their knowledge,
perception or "vision " of the surrounding world. In this latter denotation, we
prefer talking about a change that cannot be expressed directly in terms of utility
or profit. Once this distinction has been established, the dilemma we face is
whether or not we should postulate that cognitive economics encompasses both
approaches, or else only covers the latter.
In actual fact, the cognitive dimension, in the strict sense of the term, is absent
from the first approach. In much the same way as other merchandise or economic
objects, knowledge is entirely dependent on an evaluation process that uses up
(so to speak) the characteristics or the economic meaning of the issue with which
it is having to contend. The beneficiary or recipient of the information might
derive a differentiated utility or product from this service, but at a first order level
this relationship is thought to be analogous to the one that derives from the
purchase or consumption of any other kind of good or service. The cognitive or
"mental" dimension is totally implicit therein. What we will be calling the
knowledge economy is the portion of economic thinking that deals with
information and knowledge basically as if they were activities, relationships or
objects that can be treated at an aggregate level; or which infers behaviours
within which the individual cognitive dimension is both implicit and non-
analysed.
We face a different sort of product if we focus on the conditions that
transform an agent's way of carrying out a certain operation due to the
information that s/he has received because, for example, of some previous
experience. A transformation process that is based upon the information that an
individual has received cannot be equated with a mere commercial exchange -
even if we were to assume that the information being used has a known cost, the
outcome of the operation (i.e., the change in behaviour) constitutes an innovation
that is akin to a clean break with the past. This makes it difficult, if not
impossible, to evaluate the outcome thereof from an economic point of view.
Unless we find ourselves in a situation where damages are being paid, we cannot
even establish the principle that as part of this transformation operation (which is
not necessarily a voluntary or even a conscious act) the agent is displaying an
economic type of rationality.
What we will be calling cognitive economics is that part of economic thinking
which deals with economic phenomena within which information and knowledge
play an essential role; and which actually incorporates the cognitive sphere of
agents' behaviour. The field also includes any economic research whose specific
purpose is to provide suitable interpretations and representations of actors'
behaviours, from a cognitive perspective.
Given the ambitions we have developed, what direction should we be heading
towards?
The two themes are related and should in principle be interconnected. If
indeed they have come apart, this is due to the state of the discipline, and to
the research strategies that are being pursued. Ideally, cognitive economics
should include a study of the emergence processes thanks to which we can
explain (or at least account for) the vertical relationships and conceptual or
model-related imbalances that are in play. However, our understanding has made
barely any progress in this domain, and with the exception of very few studies
(Lesourne 1991; Lesourne, Orléan (eds.) 1998; Paulré 2000) we have not even
been given sight of the articulations that are actually taking place. In fact, our
ignorance is such that despite the fact that cognitive economics and studies of the
knowledge economy are not officially distinct from one another, they cannot at
present be totally associated and integrated in a joint research program (or at
least, not if such a program is to be rendered fully operational). Note however
that this does not mean that research projects involving cognitive economics can
be carried out independently of the lessons we have drawn from the knowledge
economy, if only because they illuminate certain institutional aspects thereof, as
well as the nature of certain problems relating to activities that feature a strong
informational or cognitive dimension.
Another argument also supports our decision to separate these two domains
temporarily - their relationship with cognitive sciences. If indeed we want to
define a cognitive economics research program that draws its inspiration from the
contents (and links) of earlier cognitive science research programs, we should try
to emphasize the individual aspects of cognition. This distances us from sectoral
or macro-economic approaches that neglect such aspects, or else which do not
deal with them in a direct manner.

2.2 The second dilemma: what relationship between


cognitive economics and cognitive sciences?

At first glance there is no need to establish a relationship between cognitive sciences and cognitive economics. The former have been around for a long time.15 They have never really integrated the economic dimension, and their penetration of the social sciences has been very limited.16 Inversely, aside from
the past few years (ECHO simulation projects, research by the Santa Fe
Institute), we cannot find any economic research project that was fully and
significantly integrated (as defined by the resources used and the number of
researchers) into a cognitive sciences research program.17
Although there is no obvious relationship between the two domains, the
absence of a relationship does however remain somewhat surprising and can in
fact appear rather absurd. It is surprising because we do not see why these two
areas would neglect one another, since they deal with analogous objects and
therefore must have certain elements in common. And it can seem absurd
inasmuch as the materialization of a space of scientific thinking around cognitive
sciences would signify the availability of an acquired or basic experience that
features a number of elements which could probably turn out to be quite useful -
if only because of everything this infers about certain simulation tools (which are
also beginning to crop up in the field of economics).
In addition, if we carefully examine the current period as well as an earlier one (1950s-'60s), there is no doubt but that several economic research projects
did indeed situate themselves within a cognitive science perspective. In
chronological order:
- In the 1950s it was clear that H. Simon, through his bilateral scientific
commitment as one of the fathers of Artificial Intelligence (A. Newell and
H.A. Simon) and as the founder of the neo-rationalist school of thought in
economics and organizational science, established a significant relationship
between a program of economic research (behaviorism) and cognitive
sciences (cognitivism or computationalism). One of these projects involved
preparing simulation models and languages for use in problem solving
processes. The other involved simulating organizational decision-making via
models based on the principle that such organizations have to be analysed as
if they were decision-making and problem solving systems;
- The 1990s saw the emergence of a new research program called
computational economics. In 1993 A. Leijonhufvud created the Centre for
Computable Economics at the University of California at Los Angeles, a
forum that brought together M. Aoki, J. McCall, K. Velupillai, etc. In one of
the first texts he wrote there (Leijonhufvud 1993), he observed that the neo-
classical theory of general equilibrium is top-down, whereas a bottom-up
approach should in fact be implemented: "It is preferable to conceive of an
economy as if it were a network of interacting processes, each of which has
an information processing capability which is smaller than that which a
central processor would use to resolve the problem of the entire system's
overall allocation" (p. 9). Regarding the question of how economic theory
should be unified, he answered, "Economics should be viewed as a machine
whose function is to compute equilibrium" (p. 20);
- In recent years we have witnessed many computer-based attempts to model
artificial economic systems. One of the best known is the ECHO project run
by J. Holland, whose initial output dates from the 1970s. Here modelling has
involved complex adaptive systems built on the basis of interacting agents
who are described in terms of IF ... THEN sorts of rules. As they accumulate
experience, agents adapt by modifying these rules. However, the environment
in which each agent operates is partially comprised of other agents with the
same type of behaviour. This characteristic is "one of the main sources of the
complex temporal structure the system is able to generate." The model also
uses genetic algorithms (a minimal sketch of such rule-based adaptive agents is given just after this list);
- Neuronal networks, whose origins go back to studies by McCulloch and Pitts
(at the dawn of cybernetics) and which are exploited by the connectionist
school of thought, are also used in economics, albeit mainly as a classification
tool;
- In addition, an approach based on the concept of distributed knowledge, which in economics should be associated with the name of Hayek, is linked
nowadays with that which we would call Distributed Artificial Intelligence
(or else with multi-agent systems).
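To make the ECHO-style item above more concrete, the following is a deliberately minimal sketch of our own (it is not Holland's code, and every name in it is invented for illustration): two agents carry IF...THEN rules, accumulate payoffs from interacting with one another, and periodically replace their weakest rule by a copy of their strongest one with occasional random mutation, in the spirit of a genetic algorithm.

import random

# Minimal illustrative sketch of rule-based adaptive agents (not Holland's ECHO code).
CONDITIONS = ["rival_cooperated", "rival_defected"]
ACTIONS = ["cooperate", "defect"]

class RuleBasedAgent:
    def __init__(self):
        # one IF condition THEN action rule per observable condition
        self.rules = {c: random.choice(ACTIONS) for c in CONDITIONS}
        self.scores = {c: 0.0 for c in CONDITIONS}

    def act(self, condition):
        return self.rules[condition]

    def reward(self, condition, payoff):
        self.scores[condition] += payoff

    def adapt(self, mutation_rate=0.1):
        # crude selection step: the weakest rule adopts the action of the strongest,
        # with occasional random mutation
        worst = min(self.scores, key=self.scores.get)
        best = max(self.scores, key=self.scores.get)
        self.rules[worst] = (random.choice(ACTIONS)
                             if random.random() < mutation_rate
                             else self.rules[best])

def payoff(my_action, other_action):
    # a simple prisoner's-dilemma-like payoff table
    table = {("cooperate", "cooperate"): 3, ("cooperate", "defect"): 0,
             ("defect", "cooperate"): 5, ("defect", "defect"): 1}
    return table[(my_action, other_action)]

# each agent's environment is partly made up of the other agent, as stressed in the text
a, b = RuleBasedAgent(), RuleBasedAgent()
last_a, last_b = "cooperate", "cooperate"
for step in range(200):
    cond_a = "rival_cooperated" if last_b == "cooperate" else "rival_defected"
    cond_b = "rival_cooperated" if last_a == "cooperate" else "rival_defected"
    act_a, act_b = a.act(cond_a), b.act(cond_b)
    a.reward(cond_a, payoff(act_a, act_b))
    b.reward(cond_b, payoff(act_b, act_a))
    last_a, last_b = act_a, act_b
    if step % 20 == 19:  # adapt every 20 interactions
        a.adapt()
        b.adapt()

print(a.rules, b.rules)

The point of this sketch is only to exhibit the three ingredients singled out in the text: condition-action rules, adaptation on the basis of accumulated experience, and an environment that is itself partly made up of other adapting agents.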

If significant relationships do exist between a number of relatively specific research programs that are interested in what we call cognitive economics (a
phenomenon that we have not yet defined precisely), does this mean that if a
cognitive economics research program were to exist, it would be integrated into a
wider cognitive sciences program? We cannot answer in the affirmative, given
that cognitive sciences seem to focus mainly on the study of human intelligence,
that is to say, on the intelligence of an individual.
Nevertheless, it is not beyond the realm of possibility that cognitive
economics can be "wedged into" cognitive sciences. This can be achieved by
providing this discipline with a principal objective that consists of something
which is not key to cognitive sciences, even though it is being developed there -
collective intelligence or cognition, in other words socialized intelligence. This
offers the advantage of helping to create a level of analysis which cognitive
sciences have barely touched upon but which would appear to hold a great deal
of potential - after all, even when an individual is the object of study, intelligence
necessarily entails a social dimension.
This ultimately leads us to conceive of Cognitive Economics as a Research
Program that is linked, as far as this is possible, with Cognitive Sciences, and
which is part of a broader plan for thinking about and studying inter-individual
and collective cognition. In a program of this sort, the goal would be to study
cognition in such a way as to improve our understanding of economic and social
activities, exploring the influence that cognitive systems (depending on their
general or specific properties) have on the way in which economic systems
function and perform, at the various levels where this can be studied. The goal
could also be to elucidate the role that economic and social activities play in
developing or structuring inter-individual or collective cognition. Much in the
same way as individual knowledge still derives from a social type of interaction
(at least from a certain level upwards), we can see the close links between
economic and cognitive aspects, both at the individual and at the collective level.

3. COGNITIVE ECONOMICS AS A RESEARCH PROGRAM

To specify what a Cognitive Economics research program would entail, we will start out with D. Andler's presentation of an initial research program for
cognitive sciences, i.e. for cognitivism. It is not our intention to align ourselves
with this perspective. We are simply trying to come up with a starting point that
can be useful in two ways: (i) it can trigger the dialogue that we feel economists
should be maintaining with cognitive science specialists, thus improving the way
in which the elements from the respective research programs mesh with one
another; (ii) it will specify the nature and the characteristics of the orientations
that will be making up this cognitive economics research program. We can
benefit from D. Andler's text by using it as a basis for thinking about how to
specify our orientations in such a way that they can correspond with (or
complement) those orientations that cognitive sciences have already defined.
According to D. Andler, the classical paradigm that lies at the origin of
cognitive sciences (i.e. the cognitivist paradigm) can be characterized "in its
most simple expression" by invoking three propositions that we will be
successively commenting upon and exploiting.
1. "The mind/brain is a complex that can be described in two ways: material
or physical, in the broadest sense of the term; and informational or
functional. These two levels are largely independent, and the rapport that
builds up between them is akin to the one that connects a computer (when
seen as a physical system) to the description of the same machine as an
information processing system. "
Metaphorically, a computer shapes the architecture of a field that, according
to D. Andler, "even in these days of contestation, is still widely shared" (p. 14).
The aforementioned double description has made it possible to separate the study
of the mind from the study of its material functioning. Tantamount to the
separation of hardware and software (something that has been essential to the
emergence of cognitive sciences), this initial orientation can be copied and
accepted by economists - but it cannot have the same strategic impact in this
discipline as it does in a psychological or in a cognitive scientific context.
Inasmuch as our goal is to identify which principles are capable of driving a
research program, this seems to be a good opportunity to discuss an element that
the present demonstration continues to imply naturally, yet whose impact is even
more strategic within the confines of our discipline - the acknowledgement that a
cognitive system is a legitimate object of study, and a crucial element in our
understanding of economic systems, whatever the level involved.
From our point of view, we find it difficult to imagine that a research program
in cognitive economics could start out with any premise other than: (i) it is
important to study individuals' cognitive capacities (the constitution of their
universe of perception); (ii) there is a constant need to justify the cognitive
hypotheses that are on offer through a precise examination of their relevancy (to
theory or to reality); (iii) we need to continually explore the way in which the
cognitive capacities being ascribed to individuals will influence the outcomes of the models being proposed.
A cognitive economics research program thus commits us inevitably to a path
of micro-economics. It does this for one immediate reason, which is that
cognition by its very nature is an individual operation. However, as we have
pointed out (and as witnessed with games theory), knowledge is not only
individual - it contains a social dimension, and is also inter-individual.
We can use co-development phenomena to defend this point of view. In
ecology, we say that co-development exists when the evolution of a species affects the evolution of the species with which it interacts, meaning that there is
an overall evolution associating a species with its environment. In their 1982 text,
R. Nelson and S. Winter clearly illustrated the co-development of industrial
structures and of something that in their model constitutes the characteristic
attribute of companies' cognitive systems, to wit, their productivity (see
chapter 11).
A. Kirman stressed this point recently in a text in which he came out in favour
of economists' incorporating interaction structures into their field of study,
underlining that we should "consider that agents learn as a function of their
environment and that the environment itself learns as a function of changes in
individual behaviour. The speed of these two learning processes should not be
thought of as being very different from one another." As such, when trying to
study the influence of a system's structure on its overall attributes, we should not
view this structure as something that is stable, nor should we think that what
individuals are seeking is a way to adapt optimally to it. If, as many economists
now agree, we cannot ignore organizational arrangements (including markets),
this also means that we should not ignore the influence they exert upon agents'
behaviour, nor the feedback that results from this. Hence the requirement that we
formulate precise hypotheses (other than the ones that relate to the search for
optimal adaptation) in order to account for these decision-making systems and
cognitive dispositions.
In a cognitive sciences context, there is no point in affirming the need to
incorporate individual cognitive systems. Inversely, in economics, the same
affirmation takes on strategic importance since it infers that people acknowledge
the heterogeneity of cognitive competencies and the instability of decision-
making rules.
2. "At the informational level, a person's cognitive system... is characterised
by his/her internal or mental states and by the processes that lead from one
state to the next. These states are representational in nature - their contents
are linked to external entities (we also say that they are semantically
assessable) ".
This assertion, together with the one below, determines the "computo-
representational" nature of the cognitivist approach. The exact same idea is found
in the aforementioned works by J. March and H.A. Simon and by R. Cyert and J.
March. This is no surprise given their origins (the Carnegie Institute of
Technology's neo-rationalist school of thought, during the 1950s).
This principle is just as relevant in economics as it is in cognitive sciences.
Nevertheless it continues to be disputed by the connectionist school of thought,
which has developed a non-symbolic approach. This latter school considers that
"meaning is not enclosed within symbols: it is a function of the overall state of
the system and remains connected to the general activity in a given area."
However, as we affirm above, we are not seeking at present to find in favour of
one or the other of these schools of thought or paradigmatic options. Instead we
are focusing on the nature of the issues at stake or on the type of orientation
being emphasized by the programmatic announcements that have been coming
out of the cognitive sciences.
We will therefore be remembering that a cognitive research program has to
emphasize the importance of agents' ability to specify their world (their
"strategic universe") and that this specification must be designed in accordance
with the mode of representation, the mode of interpretation, or the mode of any
interaction schemes and correlations. To a certain extent, this act of recognition
simply involves an acknowledgement of subjectivity.
3. "These internal states or representations comprise the formula of an
internal language or "mentalese" that is similar to the formal languages of
the particular logic. The processes involved are those that the logic sees as
being effective. They can mainly be reduced to a small number of primitive
operations that will obviously be carried out by a machine (since this requires
no "interpretation") ... ".
The issue raised here relates to the language in which the individual's
"internal" state or subjectivity can be expressed. In other words, the goal is to
figure out how to describe the agent's cognitive competency in the area where
s/he is active. In a cognitive sciences context, modern logic is one preferred
mode of description. Certain schools of thought postulate a "symbolic" system
with the minimalist meaning that D. Andler lends to this term, i.e. where it
consists of symbols that refer to external entities. In economics, the
representation that is ascribed to an actor is usually expressed in the same terms
as those that are used to describe his/her environment and actions. Lack of
information can impact the probabilities that given states of the world will indeed
materialize - but the semantics involved in the agent's description of the states
are the same as the semantics that are used by the person who is making the
model.
J. Marschak and R. Radner (1972) introduced the possibility that economic agents view the world through an information structure that causes a partition in the possible states thereof.18 Potential information structures can be compared in
terms of the degree of detail they offer. The most finely detailed structure
corresponds to complete information. This axiom allows us to differentiate
between agents' varying information structures, which therefore become
representations of the world that can be distinguished by the degree of detail each
involves - but which cannot always be compared.
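As a purely illustrative example (our own, not taken from Marschak and Radner), suppose the set of possible states is E = \{e_1, e_2, e_3, e_4\} and consider the information structures

\[ P_1 = \bigl\{\{e_1, e_2\}, \{e_3, e_4\}\bigr\}, \qquad P_2 = \bigl\{\{e_1\}, \{e_2\}, \{e_3, e_4\}\bigr\}, \qquad P_3 = \bigl\{\{e_1, e_3\}, \{e_2, e_4\}\bigr\}. \]

P_2 is finer than P_1, since each of its cells is contained in a cell of P_1; complete information corresponds to the finest partition \{\{e_1\}, \{e_2\}, \{e_3\}, \{e_4\}\}; and P_1 and P_3 cannot be ranked by fineness at all, which is exactly the sense in which such representations of the world cannot always be compared.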
For B. Walliser (2000), whose interpretation relies on an epistemic type of
logic, an actor's know-how can be expressed in the same language as a model-
maker's. Know-how is generally presented "as a stock of propositions to which [the actor] adheres...". The knowledge operator, which "indicates, for any
proposition P ... that is being considered by the model-maker whether or not the
actor is familiar with this, is defined by the model-maker in such a way as to
reflect the actor's know-how". The actor is equipped with a representation of the
environment that can be expressed in the same terms and language as the
"objective" knowledge that is at the observer's disposal.
In economics there are other modes for describing an agent's behaviour or
cognitive system, notably those that mobilize the algorithm-based representations
which are derived from a computational conception of agent behaviour. R. Cyert
and J. March's general decision-making model is the best example of this. Here,
the agent's internal semantics are also identical to those that the model-maker
uses to describe the interaction between agent and environment.
It seems to us that the essential question at this juncture should be: is it
possible to imagine gaps or divergences between the semantics used to describe
the states of the world and the agent's action in this world, on one hand, and the
nature of the operators and the elements of tacit "internal" knowledge, on the
other? Can we free ourselves from the kind of representation that is most
pervasive in economics, which consists of ascribing to actors a rationality that
operates according to states or types of "know-how" that are expressed in the
same terms as those which apply to the world and to the acts themselves? There
are two reasons why this merits a positive response: (i) there is no reason why we
cannot free ourselves from a "representationist" approach, and in fact there is
even a current of research in cognitive sciences (i.e., connectionism) that sees
itself as an alternative to computational approaches; (ii) we think that in
economics, representing an agent's internal universe in terms of routines (as
defined by R. Nelson & S. Winter) provides a good illustration of this latter
approach. We return to this below.
We deduce from the above that a cognitive economics research program
should enable us to get a broader spectrum of modes of representation. This
should support the consensus that an agent is not necessarily enclosed in a pre-
determined cognitive framework by the model-maker; and/or that this latter, give
or take a few factors, is in fact simply reproducing the structure of his/her own
actual interaction with the environment. What is important is not whether this
"internal" functioning is being thought of in a "realistic" or justified manner. We
feel that from a modelling perspective, what is essential is that a dissociation be
created between these two universes so that the problem of an agent's evolution
towards an equilibrium or stable state can be expressed with all the complexity
that is inherent to this issue.

In sum, a Cognitive Economics Research Program should integrate the following elements:
1. An acknowledgement of the importance of studying individuals' cognitive
capacities and an affirmation of the need to justify the cognitive hypotheses
that are being made by examining their relevancy (to theory or to reality).
This implies that we must try to assess, as far as possible, the impact that the
cognitive capacities that are being ascribed to agents will have on the
outcomes of the models being proposed.
2. An acknowledgement of the role that representation and/or interpretation plays in the shaping or in the manifestations of knowledge.
3. An affirmation of the role that is played by Computation and by computable capacities.
4. A focus on learning processes, at several different levels. This implies some recognition of the cumulative nature and of the historical basis of knowledge. This cumulativeness cannot necessarily be assimilated with a storage phenomenon, inasmuch as restructurings or reorganizations sometimes have to take place if knowledge is to be extended.
5. A study of the manifestations or of the overall behaviour of a collective system, seen as the outcome of interactions between individuals, interactions where representations and adaptation modes can sometimes vary in a mutually dependent manner, i.e., where they can co-develop.
6. The application, where relevant, of a hierarchy principle that consists of analyzing how (collective and/or higher level) cognitive organizational functions result from the mutual adaptation of lower level units (emergence phenomenon or self-organization); from their co-development; and from the co-development of the system and its environment.
7. An acknowledgement of the heuristic role that information system tools play in simulating the complex interactions between the agents in a system.
8. An acknowledgement of the relevancy of information system concepts, and of the architectures that are being proposed by I.T. specialists.
4. THE INTERNAL DILEMMAS OF COGNITIVE
ECONOMICS.

We have said on several occasions that the announcement of a research
program should not involve the proposition of specific, quasi-paradigmatic
options as much as it should involve a declaration of the program's general
orientations, in other words, of principles that are broad enough to create a
modicum of ecumenicalism. As such, we would now like to take a look at the
"doctrinal" divergences that are still subsist, or at least at a number of problems
that will subsequently cause us to select certain options over other ones.

4.1 The first dilemma: the diversity of the schools of thought
that can be found in the cognitive sciences (cognitivism,
connectionism, constructivism)

Despite their thematic unity and the existence of a few federating orientations,
cognitive sciences remain pluralistic in nature, with several schools of thought
co-existing. Each emphasizes its own orientations, analytical principles and
specific representations. We can identify at least three such approaches or
paradigms: cognitivism (or computationalism), connectionism, and
constructivism. The first is perfectly (albeit not exclusively) represented by
behaviourism in economics, or more simply by the decision-making theories to
which economists resort. The second is manifested in this discipline by a certain
number of studies based on the use of specific simulation techniques (i.e., genetic
algorithms). The third, with which the name of J. Piaget is particularly
associated, has to our knowledge no particular projection in the field of
economics, aside from a few studies that are relatively epistemological in
nature.
This raises questions as to the impact that this paradigmatic diversity might
have on a planned cognitive economics research program. Where economic
agents are being represented as maximizing actors who are operating as per their
representation of the semantically assessable surrounding environment, most
economic research programs are likely to rally, however trivial this may be,
around the banner of cognitivism (albeit with the considerable reservation that
the computable capacities which this entails are rarely explored and assessed).
This is due to the fact that from a computational perspective studies of the
cognitive capacities being ascribed to economic agents in certain models
sometimes reveal major limitations. For example, M. Rabin demonstrated back in
1957 "that we currently find a number of strictly determined win-lose games 19
for which no computable winning strategy actually exists".
Yet even though out of all of the cognitive sciences connectionism seems the
best way to transcend cognitivism, the conception of rationality that clearly
continues to prevail in economics means that economics is not very distant from
cognitivism. The purpose of a research program in cognitive economics
would thus be to raise people's vigilance about the type (and level) of the
cognitive capacity that a given discipline ascribes to its actors. In fact, a strict
application of the aforementioned orientations, and specifically an evaluation of
the computability they entail, would in our opinion constitute a first move
forward, allowing us to specify the cognitive bases upon which many economic
models have been built - even if for other reasons (i.e., the limited rationality
argument), some analysts will continue to doubt the validity of a particular
model.
Computability can be tested via mathematics (numerical analysis) and
recursive functions. Moreover, to the non-mathematician, a more accessible path
exists - computer-based dynamic simulations, something that in our opinion
provides a precious tool for testing and evaluating the computational
requirements of a number of behavioural models.
However, a cognitive approach in economics can benefit from the type of
representation that connectionism inspires. An important illustration of this
follows.

4.2 The second dilemma: information versus knowledge

Economists have a tendency to not distinguish sufficiently between
information and knowledge concepts. One ubiquitous yet unfortunate metaphor is
to see knowledge as a stock, and information as a flow in and out of the stock.
This image stems from an intuitive vision which postulates the existence of a
memory within which agents "stock" all of the information they have obtained
over the course of time - an inventory from which they can make withdrawals as
need be.
Aside from the fact that this kind of vision keeps us from discussing whether
the information being stored in this manner is valid or coherent, the main
problem is that knowledge should not be construed as a stock. Moreover, it is
possible to re-organize knowledge. If we accept the (non-cognitivist) principle
that knowledge is manifested through an agent's acts (and that it is implied
therein), we must agree that it possesses a relational aspect. This is one of the
principles of a Piaget sort of constructivist approach, whose epistemology is the
best example of the relational conception of structure. This amounts to
postulating, following an old principle formulated by Ashby, that the object of our
study is always the whole (i.e., the system) formed by the agent and his/her
environment.
In economics, the standard vision of the agent either does not differentiate
between knowledge and structure, or else does so in an implicit manner. Decision
theory represents agents as possessing information on the state of their
environment. Now, this information might be assimilated with knowledge, but it
certainly does not constitute know-how, in B. Walliser's sense of this term: a
know-how that is explanatory in nature, derived from the causal inference operations
that an agent carries out (Walliser 2000, chapter 2). Although knowledge is
present in economic models, it is usually manifested through the way in which
the agent's possible acts are structured, i.e. through the model's causal meaning
(information on the states → acts → consequences for the states). It remains that
this refers to a capacity (or an endowment) that has been ascribed to the agent by
the builder of the model, who is using a stable structure - hence our description of
this phenomenon as something that is implied. For these reasons, most economic
models involve information alone. This is not really a surprise, given that the
implied central question is one of co-ordination (which often has more of a
quantitative than a structural definition).
In the end, the dilemma we are facing here stems from the choice we have to
make between a conception of the agent in which s/he allegedly reasons in terms of
"natural states", and another in which s/he supposedly develops plans and fits
into the world in a way that reflects existing causal relationships. In the former
case, the causality is defined once and for all in the model, and the agent reasons
in terms of his/her acts instead of as someone who is getting physically involved
in the world. S/he can play upon the intensity of these acts but cannot modify
their structure. In the latter case, the door is open to entire sequences of actions;
to action gambits; to an increase in available information during the time it takes
to carry out a plan, etc. Although economists generally remain relatively close to
representations in which the agent's operational structure is both implicit and
invariable, the increased utilisation of certain kinds of sophisticated simulation
tools could lead to a change in attitudes.

4.3 The third dilemma: joined information vs. non-joined
information

There are two ways to obtain information: via the outcome of an action that
has been undertaken for reasons other than the acquisition of information; or via
the outcome of an action that is specifically designed to obtain information. We
call the former joined information. With the latter, we say non-joined or free
information. A hybrid is possible when the action that has been decided upon is
altered to make it possible to achieve a "physical" goal and at the same time to
obtain the information that is being sought. To a certain extent, these are test
actions, or strategies that take full advantage of the possibility of an increase in
available information.
Finding out where information comes from and how it can be obtained is a
crucial issue in economics. If we can accept that the origin of a bit of information
might be external to an economic type of transaction, we are paving the way for
an incorporation of the social domain, going down the path of embeddedness of
which Granovetter is so enamoured. If information is necessarily joined, the
agent's cognition is entirely in keeping with his/her economic activity and the
cognitive universe that is born out of economic acts is specifically economic in
kind.

4.4 The fourth dilemma: the sub-symbolic level of routines
vs. the computational system

Connectionism has developed in opposition to cognitivism. More specifically,
it implies an abandonment of the principle according to which the explanation of
cognitive phenomena requires a distinct symbolic level. This is sometimes called
a "sub-symbolic paradigm" (F. Varela, 1988) to express the idea that the overall
state of the system under study emerges from a network of entities that is located
at a level which is much more finely detailed than a symbolic level could be.
Connectionism can be contrasted with computationalism, something that induces
the analyst to view the firm as a decision-making system (i.e., J. Marschak and R.
Radner); or else as an information processing system (i.e., R. Cyert and J.
March).
Situating oneself at a sub-symbolic level might seem inappropriate from an
economic point of view. In our opinion, however, there exists a simple and
famous example of this approach: R. Nelson and S. Winter's use of the concept
of routines (1982).
This concept is rooted in a sub-symbolic level. The reason is that (as is the
case with genes) an element of this type is crucial for behavior but does not
constitute any part of actors' representations of external objects. "Any one of a
firm's regular and predictable behavioral schemes constitutes a routine" (p.14).
Behavior is therefore built upon foundations that are radically different from the
ones that are found in the decision-making model that is customarily used in the
field of economics. In R. Nelson and S. Winter's evolutionary approach, a
decision-action tandem does not exist. This is why they dispense with the
[objectives - sum total of possible choices - maximisation rule] triad (1982, p.
14). Routines are places for storing knowledge that is usually tacit. A firm is a
complete set of routines, each of which can be replaced or modified. Once the
routines have been activated, they determine the overall behavior of the system
being studied (in general, the firm).
Computation is not really relevant to the activation of routines. The activation
does not imply that operations are based on symbols, which would assume "that actors
react through their representations of whatever relevant elements they discover in
the situations in which they find themselves" (Varela 1988, p.37). Rules,
procedures and task descriptions are all instruments for the representation and
implicit structuring of types of conduct that have little to do with a manipulation
of symbols. The paradigm in which we find ourselves is different from the
rationalist one, even from the bounded rationality paradigm à la Simon.
We can interpret (and extrapolate from) R. Nelson and S. Winter's assertions
by stating that routines are components of a whole, and that they are created and
then replaced depending on their mutual interactions and on the efficiency of the
system in which they participate20. The system's overall configuration is likely to
be just as important as the characteristics of each of its components. The logic
surrounding us has more to do with an emergence phenomenon than with the
framework of a decision-making entity whose components improve in line with
the information they receive and as a result of the decisional process itself. Of
course, the internal structure is not unrelated to the external constraints. This
circuit plays upon an "enactment" phenomenon, i.e., a process for structuring
reality (Weick 1979; Varela 1988).
One final and important observation. The fact that we are dealing with an
emergent type of functioning at certain levels is not incompatible with the
existence of deliberating and deliberate modes at higher levels.

CONCLUSION

Through a wide array of studies that are highly diverse both in terms of the
problems with which they deal and also as regards the theoretical or
epistemological options they favor, cognitive economics, which first made its
appearance in the 1960s, now concentrates a great deal of economic research
resources. It is entirely legitimate to raise questions about the contents and/or
timing of the proposal of a cognitive economics research program. On one hand,
this is a way of providing this field with some structure, thus clarifying its issues
and orientations. On the other hand, we are now able to move towards a
conceptual clarification and discussion of the epistemological foundations of the
research that is being conducted in this field. We have underlined some of the issues
at stake in this sort of clarification, focusing specifically on problems pertaining
to the speeds at which knowledge or real interactions actually adjust.
The dilemmas that we have emphasized here all provide us with an
opportunity to highlight a certain number of significant alternatives. Specifically:
(i) issues relating to the respective roles of the knowledge economy and of
cognitive economics stricto sensu; (ii) the difference between the computable
orientation that is involved in standard approaches to economic behaviors and the
connectionist orientation that is illustrated, for example, by the evolutionary
conception of the firm; (iii) the problem of the relationship between a cognitive
economics research program and one that relates to cognitive sciences.
We have presented an outline of the potential foundations for a cognitive
economics research program. We would like to particularly emphasize the fact
that, following in the footsteps of the cognitive sciences, this kind of program
would manifest itself through potentially divergent schools of thought whose
disparities should all be seen as factors of dynamism that could be used to drive
research in this area. We would also like to point out the significant benefits that
economists would derive from this endeavour if they were to draw greater
inspiration from a number of studies that have been undertaken in the field of
cognitive sciences, most of which could lead to an extension of their current
approaches.

NOTES
1. In actual fact, we have to go all the way back to Barone (1908) and Pareto (1897) to identify the
opening salvoes in this debate.
2. The limits of the period we are analyzing are, on one hand, the immediate aftermath of the War,
when the "first" cybernetic appeared (one that in all likelihood contributed, at least in part albeit
not for economists, to a modification of the meaning given to a certain number of phenomena);
and on the other hand the 1980s, a period during which the evolutionary school of thought
developed considerably, this being an area within which the cognitive plays a crucial role,
alongside theories of endogenous growth and the more macro-economic approaches to
economics that are said to be based on "science" or knowledge. From the 1980s onwards, the
importance of "cognitive" economics was so obvious to everyone that no one really wanted (or
felt the need) to illustrate it any more.
3. This theme was already present in C. Barnard and H.A. Simon, but had not yet truly penetrated
economist circles.
4. This enhancement disappeared from Nelson and Winter's theory following the appearance of
the truce hypothesis, something that clearly demonstrated, in certain respects at least, the retreat
that this theory represents when compared with theories of the firm during the 1960s and 1970s.
However, the different level of analysis can also explain this.
5. The same assertion lies at the heart of analyses of cognitive capitalism and of "new enclosures"
(Paulre 2001a)
6. Previously published articles on learning by doing had been micro-economic in nature. Learning
or experience curves were nothing more than reduced forms, moreover they were not based on
any analysis of the conditions in which knowledge actually arises. The trailblazer article on
learning was the one that A. Alchian wrote in 1949.
7. Clearly this is not a new problem, and it is generally considered that Berle and Means first
brought it up in 1932. Its more distant forerunners can be found in profit literature, the most
interesting example being F. Knight (1921).
8. We are alluding to Intelligence as it is defined in DAI (Distributed Artificial Intelligence).
9. The texts compiled in this volume seem to have been written in the 1930s.
10. Note in addition a compilation of articles published in 1981 under the name of Information and
Coordination, in which A. Leijonhufvud develops the idea that failures in macro-economic co-
ordination result from the problems agents have in correctly perceiving all of the opportunities
that are present in the system.
11. Of course, Savage's book was preceded by von Neumann and Morgenstern's Theory of Games
(1944).
12. Lucas argues in favour of making a distinction between economic adjustment, on one hand, and,
on the other, the learning processes by which agents discover the properties of their environment or
of the situation with which they are confronted. At the very least, these have very different rates of
adjustment. This is diametrically opposed to the philosophy that K. Arrow
very different rates of adjustment. This is diametrically opposed to the philosophy that K. Arrow
expresses in his 1958 article. For an approach that encompasses alterations in the rules of
decision-making, cf. R. Nelson and S. Winter (1982, chapter 7).
13. The sum total of the states of the world is called the "universal state". By definition, this event is
always enacted.
14.In "hard" sciences, the stability of the object concerning which knowledge is being accumulated
feeds the hope that this can be done on a step-by-step basis. However, the existence of differing
paradigms suggests that even in this field knowledge does not accumulate in a linear fashion (cf.
T. Kuhn).
15. According to F. Varela, the second phase of what he calls STC (the Sciences and Technologies
of Cognition), dates from 1956, the year two major conferences took place, one at Cambridge
and the other at Dartmouth. As the first phase corresponds to cybernetics, I believe that 1956 is
a more suitable year of birth for Cognitive Sciences, even if a certain number of theses or
techniques date from the 1940s.
16.For example, in "Progres en situation d'incertitude", the leading article in one special issue of
Le Debat ( 1987), D. Andler states that " three groups of discipline co-exist under the cognitive
banner". Yet the only social sciences to be mentioned are "(cognitive) anthropology and
(cognitive) ergonomics ", whereas in the same issue we find an article by D. Sperber entitled
"Les sciences cognitives, sciences sociales et materialisme". Inversely, in Introduction aux
sciences cognitives, published under the supervision of D. Andler, we find four articles on social
sciences.
17. One attempt took place in France based on the production of a work group that had been run by
P. Petit as part of a C.N.R.S research project. A report that was co-signed by B. Munier and by
A. Orlean was produced on this occasion.
18. The partition principle can be found in epistemic logic whenever certain axioms are being
satisfied simultaneously.
19. These are strictly determined games where one of the players is equipped with a winning strategy.
20. We choose not to discuss the issues at stake in this selection process at present (i.e., the firm or
the routine?), nor the modalities of this selection. This is a problem that can be explained by the
biological and Darwinian orientation of the analytical framework that R. Nelson & S. Winter
used - even though they quite correctly denied exploiting this metaphor. This issue can be left to
one side for the moment.

REFERENCES

Akerlof G. A. (1970), "The market for lemons: quality and the market mechanism", Quarterly
Journal of Economics, 84.
Alchian A. A. (1949), An Airframe production function, Rand Paper.


Alchian A. A. (1950), "Uncertainty, evolution and economic theory", Journal of Political
Economy, 58.
Alchian A. A. (1970), "Information Costs, Pricing, and Resource Unemployment", in Phelps E. S.
(ed.), Microeconomic Foundations of Employment and Inflation Theory, Norton.
Allais M. (1953), « Le comportement de l'homme rationnel devant le risque : critique des postulats
et axiomes de l'école américaine », Econometrica, 21.
Andler D. (1992), Introduction aux sciences cognitives, Gallimard.
Andler D. (1987), « Progrès en situation d'incertitude », Special issue of Le Débat (Une nouvelle
science de l'esprit), 47, November.
Ansoff H. I. (1965), Corporate Strategy, McGraw-Hill.
Arrow K. (1959), "Towards a theory of price adjustment", in Abramovitz, M. (ed.), The Allocation
of Economic Resources, Stanford University Press.
Arrow K. (1962a), "Economic welfare and the allocation of resources for invention", in: The Rate
and Direction of Inventive Activity: Economic and Social Factors, N.B.E.R., Princeton
University Press.
Arrow K. (1962b), "The Economic Implications of Learning by Doing", Review of Economic
Studies .
Arrow K. (1963), "Uncertainty and the Welfare economics of medical care", American Economic
Review.
Arrow K. (1974), The Limits of Organization. W.W. Norton and Co, New York.
Bacharach M., Gerard-Varet M., Mongin L.A., Shin P., (eds.), Epistemic Logic and the Theory of
Games and Decisions, Kluwer.
Barone E. (1908), "Il Ministro della Produzione nello Stato Collettivista", Giornale degli
Economisti.
Becker G. S. (1962), "Investment in human capital: A theoretical analysis", Journal of Political
Economy, 70.
Bonanno G. (2000), Information, Knowledge and Belief, University of California.
Boulding K. (1966), "The Economics of Knowledge and the Knowledge of Economics", American
Economic Review, 56(2).
Boulding K. (1968), "Knowledge as a Commodity", Beyond Economics: Essays on
Society,Religion and Ethics, University of Michigan Press.
Chamberlin E. H. (1933), Theory of Monopolistic Competition, Oxford University Press.
Coase R. (1937), The nature of the firm, Economica.
Cyert R. M., March J. G. (1963), A behavioral theory of the firm, Prentice Hall.
Denison E. (1962), The Sources of Economic Growth in the United States, Washington, D.C.:
Committee for Economic Development.
Drucker P. F. (1964), Managing for results, Harper & Row.
Durand R., Quelin B. (1999), « Contribution de la théorie des ressources à une théorie
évolutionniste de la firme », in Basle M. et alii (eds), Approches évolutionnistes de la firme et
de l'industrie, L'Harmattan.
Fellner W. (1949), Competition Among the Few - Oligopoly and Similar Market Structures, Alfred
A. Knopf.
Granovetter M. (1973), "The Strength of Weak Ties", American Journal of Sociology.
Granovetter M. (1988), "The Old and the New Economic Sociology", in Friedland R., Robertson
A. (eds.), Beyond the Marketplace, Rethinking Economy and Society, Adline de Gryter.
Griliches Z. (1960), "Hybrid corn and the economics of innovation", Science, 29th of July.
Reprinted in Rosenberg, N. (1971), The economics of technological change, Penguin,
Harmondsworth.
Hamel G., Prahalad C.K. (1990), "The Core Competence of the Corporation", Harvard Business
Review.
Hart O. (1995), Firms, contracts, and financial structure, Oxford: Oxford University Press.
Hart O., Moore J. (1990), "Property rights and the nature of the firm", Journal of Political
Economy, 98.
Hayek F. A. (ed.) (1935), Collectivist Economic Planning, Clifton. Reprint: Augustus M. Kelley,
1975.
Hayek F. A. (1945), "The Use of Knowledge in Society", American Economic Review, 35.
Hintikka J. (1962), Knowledge and Belief, Cornell University Press.
Houthakker H. S. (1959), "Education and Income", Review of Economics and Statistics.
Hurwicz L. (1972), "On Informationally Decentralized Systems", in Radner R., McGuire C. B.
(eds.) Decision and Organization (Volume in Honour of Jacob Marschak), North-Holland.
Jensen M . C., Meckling W. H. (1992), "Specific and general knowledge and organizational
structure", in Werin, L. Wijkander, H. (eds.), Contract Economics, Basil Blackwell.
Jensen M. , Meckling W. (1976), "Theory of the firm: managerial behavior, agency costs, and
ownership structure", Journal of Financial Economics.
Kahneman D., Slovic P., Tversky A. (1982), Judgment under uncertainty: Heuristics and Biases,
Cambridge University Press.
Kaldor N. (1934), "The equilibrium of the firm", Economic Journal.
Kirman A. (1997), "Interaction and markets", Southern European Economics Discussion Series, 166.
Kirman A. (1999), « Quelques réflexions à propos du point de vue des économistes sur le rôle de la
structure organisationnelle dans l'économie », Revue d'Economie Industrielle, 88.
Kirman A. (1992), "Variety: the Coexistence of Techniques", Revue d'Economie Industrielle, 62.
Klein B. H., Meckling W. H. (1958), "An application of operations research to development
decisions", Operations Research, 6.
Klein B. H. (1958), "A radical proposal for R&D", Fortune, 57(5).
Knight F. (1921), Risk, uncertainty and profit, Houghton Mifflin.
Kuhn T.S. (1970), The Structure of Scientific Revolutions, University of Chicago Press.
Lamberton D. M. (1971), Economics of Information and Knowledge, Penguin, Harmondsworth.
Lamberton D. M. (1993), "The information economy re-visited", in Babe R. (ed.), Information and
Communication in Economics, Kluwer Academic, Dordrecht.
Lange O. (1936), "On the Economic Theory of Socialism", Review of Economic Studies, 4.
Lange O. (1964), Introduction to Economic Cybernetics, Pergamon.
Leijonhufvud A. (1968), On Keynesian Economics and the Economics of Keynes: A Study in
Monetary Theory, Oxford University Press.
Leijonhufvud A. (1981), Information and Coordination. Essays in Macroeconomic Theory, Oxford
University Press.
Leijonhufvud A. (1993), "Towards a Not-Too-Rational Macroeconomics", Southern Economic
Journal, 60.
Lesourne J. (1991), Economie de l'ordre et du désordre, Economica.
Lesourne J., Orléan A. (eds.) (1998), Self-Organization and Evolutionary Economics, Economica.
Lucas R. E. Jr (1986), "Adaptive Behavior and Economic Theory", The Journal of Business,
supplement to the October issue. Reprinted in Hogarth R. M., Reder M. W. (eds.) (1987),
Rational Choice: The Contrast between Economics and Psychology, The University of Chicago
Press.
Machlup F. (1962), The production and distribution of knowledge in the U.S., Princeton University
Press.
Mansfield E. (1961), "Technical change and the rate of imitation", Econometrica, October.
March J. G., Simon H. A. (1958), Organizations, J. Wiley.
Marschak J., Radner R. (1972), The Theory of Teams, New Haven: Yale University Press.
Marshall A. (1890-1920), Principles of Economics, Macmillan.
Milgrom P., Roberts J. (1982), "Limit pricing and entry under incomplete information: An
equilibrium analysis", Econometrica, 50.
Milgrom P., Roberts J. (1986), "Price and advertising signals of product quality", Journal of
Political Economy, 94.
Milgrom P., Roberts, J. (1995), "Complementarities and fit strategy, structure, and organizational
change in manufacturing", Journal ofAccounting and Economics, 19.
Milgrom P., Roberts J. (1990), "The Economics of Modem Manufacturing: Technology, Strategy
and Organization", American Economic Review, 80.
Miller H. P. (1960), "Annual and Lifetime Income in Relation to Education", American Economic
Review.
Munier B. (1997), "La rationalité face au risque : de l'économie à la psychologie cognitive", in
Roland-Levy C., Adair P. (eds.), Psychologie économique, Economica.
Muth J. F. (1961), "Rational expectations and the theory of price movements", Econometrica, 29.
Nelson R. R. (1959), "The simple economics of basic scientific research", Journal of Political
Economy, 21.
Nelson R. R., Winter S. (1982), An Evolutionary Theory of Economic Change, Cambridge, Mass.:
Harvard University Press.
Neumann J. von, Morgenstern 0. (1944), Theory of Games, J. Wiley.
Osborne M. J., Rubinstein A. (1994), A course in game theory, M.I.T. Press.
Pareto V. (1896-1897), Cours d'économie politique, Lausanne, F. Rouge, 2 vols.
Paulre B. et alii (collectif ISYS) (2001a), "Le capitalisme cognitif comme sortie de la crise du
capitalisme industriel", communication au Forum de la régulation, Ecole Normale Supérieure,
September.
Paulre B. (2001b), Préface à Azais C., Corsani A., Dieuaide P. (eds.), Vers un capitalisme cognitif,
L'Harmattan.
Paulre B. (2000), "L'auto-organisation comme objet et comme stratégie de recherche", in Décision,
Prospective et Auto-organisation, Mélanges en l'honneur de J. Lesourne, Dunod.
Penrose E. T. (1959), Theory of the Growth of the Firm, Basil Blackwell.
Porat M. (1977), The Information Economy, Special Publication, Department of Commerce,
Washington D.C.
Porat M. (1978), "Global implications of the information society", Journal of Communication,
28(1).
Rabin M. O. (1957), "Effective Computability of Winning Strategies", in Contribution to the
Theory of Games, Dresher M.D. et al. (eds.), Annals of Mathematics Studies, 39.
Reinganum J., Wilde L. (1986), "Settlement, Litigation and the Allocation of Litigation Costs",
Rand Journal of Economics, 17.
Rees A. (1966), "Information networks in labor markets", American Economic Review, 56(2).
Richardson G. B. (1960), Information and Investment, Oxford University Press.
Richardson G. B. (1972), "The organization of industry", Economic Journal, 82.
Robinson A. (1934), "The problem of management and the size of the firm", Economic Journal.
Romer P. (1986), "Increasing Returns and Long-Run Growth", Journal of Political Economy.
Romer P. (1990), "Endogenous Technological Change", Journal of Political Economy.
Ross S. (1973), "The economic theory of agency: The principal's problem", American Economic
Review, 63.
Rothschild M., Stiglitz J. (1976), "Equilibrium in competitive insurance markets: an essay on the
economics of imperfect information", The Quarterly Journal of Economics, 90.
Savage L. J. (1954), The Foundations of Statistics, J. Wiley.
Schultz T. W. (1956), "Reflections on Agricultural Production, Output and Supply", Journal of
Farm Economics.
Schultz T. W. (1959), "Investment in Man: An Economist's View", The Social Service Review,
XXXIII.
Schultz T. W. (1960), "Capital Formation by Education", Journal of Political Economy.
Schultz T. W. (1961), "Investment in human capital", American Economic Review.
Schultz T. W. (1962), "Reflections on Investment in Man", Journal of Political Economy.
Schultz T. W. (1963), The Economic Value of Education, Columbia University Press.
Shapiro C., Varian H. R. (1998), Information Rules: A Strategic Guide to the Network Economy,
Harvard Business School Press.
Simon H. (1991), "Bounded rationality and organizational learning", Organization Science, 2.
Spence A. M. (1974), "Market signaling informational transfer in hiring and related screening
processes", Harvard economic studies, 143, Cambridge, Mass.
Spence A. M. (1973), "Job market signalling", Quarterly Journal of Economics, 87.
Spence M., Zechauser, R. (1971), "Insurance, Information and Individual Action", American
Economic Review, Papers and Proceedings, 61.
Stigler G. J. (1961), "The economics of information", Journal of Political Economy, 69.
Stigler G. J. (1962), "Information in the labor market", Journal of Political Economy, 70.
Stiglitz J. E. (1975), Information and Economic Analysis, in Parkin, Nobay (eds.), Current
Economic Problems.
Stiglitz J. E. (1985), "Information and Economic Analysis: A Perspective", The Economic Journal,
95, Supplement: Conference Papers.
Stiglitz J. E. (1987), "The Causes and Consequences of the Dependence of Quality on Prices",
Journal ofEconomic Literature, 25, March.
Stiglitz J. E. (1989), "On the Economic Role of the State", in Heertje A. (ed.) The Economic Role
of the State.
Taylor F. M. (1929), "The Guidance of Production in a Socialist State", American Economic
Review.
Tustin A. (1958), The Mechanism of Economic Systems, Heinemann.
Varela F. (1988-1989), Connaître, Seuil.
Walliser B. (2000), L'économie cognitive, Editions Odile Jacob.
Weick K. E. (1979), The social psychology of organizing, (2nd ed.), Addison-Wesley.
Weick K. E., Roberts K. H (1993), "Collective mind in organizations: Heedful interrelating on
flight decks", Administrative Science Quarterly, 38.
Williamson O. E. (1976), Market and Hierarchy, Free Press.
Williamson O. E. (1985), The Economic Institutions of Capitalism, Free Press.
Williamson O. E. (1996), The Mechanisms of Governance, Oxford: Oxford University Press.
Chapter 5

Working times in atypical forms of employment:


the special case of part-time work

Patrick LETREMY, Marie COTTRELL


Samos-Matisse, CNRS UMR 8595, Universite Paris 1, pley,cottrell@univ-Parisl.fr

Abstract: In the present article, we attempt to devise a typology of forms of part-time
employment by applying a widely used neuronal methodology called Kohonen
maps. Starting out with data that we describe using category-specific variables, we
show how it is possible to represent observations and the modalities of the variables
that define them simultaneously, on a single map. This allows us to ascertain, and to
try to describe, the main categories of part-time employment.

Key words: Kohonen maps, Working times, Classification.

INTRODUCTION

France's economic recovery since 1997 has been accompanied by strong job
creation and by a significant drop in unemployment. This does not mean however
that there has been any real reduction in the number of people working under
what has come to be known as "atypical forms of employment". Quite the
contrary, the number of persons in temporary employment (with fixed-term
contracts, hereafter FTC, or doing temporary agency work) has never been as
high. There has been an unprecedented rise in part-time work in France,
something that coincides nowadays with the ever-increasing number of female
entrants into the job market. In an Employment Survey carried out by the French
National Statistics Office (INSEE), part-time jobs represented 16.8% of the
country's employed working population, and temporary jobs (temporary work
and FTC) 6.3%. Furthermore, since 1994, there has been greater growth in
temporary work than in FTC.


[Figure: Temporary, FTC and part-time workers as a proportion (%) of total salaried employment, 1982-2000 (source: INSEE).]

Figure 1. Changes in number of workers involved in atypical forms of employment

Such atypical forms of employment still constitute a relatively small minority
of all jobs. It should be noted however that the circumstances surrounding female
work have been considerably altered by the large and increasing proportion of
women part-time workers. In addition, temporary work has a much greater effect
on the labour market than the sheer weight of the numbers involved, a prime
example being the preponderant role it plays in workers' transition back and forth
between employment and unemployment.

The rise of these atypical forms of employment has unsurprisingly drawn
attention to issues relating to the extent to which full-time employment will in the
future be carried out by people working under an open-ended contract (hereafter
OEC). The IRES and MATISSE research centres' contributions to the INSEE's
1998-99 Timetables survey focused on the working times which characterise
these atypical forms of employment. For example, a study that benefited from a
DARES research grant (Cottrell, Letremy, Macaire et al. 2001, "Working times
with atypical forms of employment, Final Report", IRES, Noisy-le-Grand,
February 2001) tried to discover whether atypical forms of employment are
subject to specific constraints in terms of the working times they entail. In other
words, do they imply circumstances that should a priori be considered to be more
difficult for those who are actually in this sort of work situation? Answering this
question means making a comparison with a benchmark norm, the obvious one
being the current situation for people in full-time employment and working under
an open-ended contract. This means that we have not only tried to discover
whether such atypical forms of employment are subject to specific time
constraints, but also whether they are having to cope with working time
constraints that are harder to deal with than is the case when the person involved
benefits from an open-ended contract and a full-time employment status.

We should specify the terms which the present article uses:


By atypical forms of employment we primarily mean i) the various modalities
of temporary salaried work, whether full-time or part-time, and ii) the various
modalities of part-time salaried work, regardless of the nature of the
employment contract.
By working times, not only do we mean issues relating to the number of hours
worked, but also schedules, calendars and working times patterns, the
variability and predictability thereof, how much choice the employee has in
these different areas, etc. Note that the study only focuses on people's current
principal activity.

Neuronal techniques such as Kohonen maps were used throughout the study
to segment groups of employees according to available quantitative variables,
before linking the category variable that is defined in this manner with informed
qualitative variables. We would like to use the present article to present an
alternative to this technique, proposing a method that makes it possible to
segment individuals by qualitative variables, even though this particular
segmentation will later be crossed with available quantitative variables.

To present this new methodology, we took a particular interest in part-time
employees working on either an open-ended or a fixed term contract. It is
common knowledge that practically all part-time employment involves women
(90% of OEC part-timers, 82.5% of FTC part-timers). This basically relates to
women employees in areas such as retail, services and the social and non-profit
sectors. However, we still wonder whether there are any differences between
OEC and FTC part-timers - for example, whether the women who find
themselves in either of these two situations have the same profile, whether they
chose their part-time status or not, etc.?

We extracted data relating to part-time employees from the INSEE's
1998-1999 Timetables survey. This covered 690 OEC and 137 FTC workers,
after eliminating data that contained input errors or missing information. We then
restricted the number of variables and kept 14 qualitative ones (type of contract,
gender, age, the regularity of the timetabling, whether this involved unsociable
hours [night or weekend shifts], employee autonomy, the schedule's
predictability, etc.), all of this for a total of 39 modalities. We also kept 5
quantitative variables relating to the number of hours worked per week. Data
presentation takes the form of a complete disjunctive table containing 827 rows,
39 columns featuring 1s or 0s and 5 columns of real data (see the appendix for
additional information on the survey).
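
To make this data layout concrete, here is a minimal Python sketch (our own illustration with made-up toy values and column names, not the survey data or the authors' code) of how such a complete disjunctive table can be built with pandas.

```python
# Minimal sketch (our illustration, not the authors' code): building a
# complete disjunctive table D from qualitative variables.
import pandas as pd

# Hypothetical toy sample; the real survey has 14 qualitative variables
# (39 modalities) and 5 quantitative variables for 827 individuals.
raw = pd.DataFrame({
    "contract": ["OEC", "FTC", "OEC"],
    "gender":   ["FEM", "FEM", "MAN"],
    "DTHEO":    [27.0, 22.5, 30.0],   # theoretical weekly hours (quantitative)
})

# One 0/1 column per modality: each row has exactly one 1 per qualitative variable.
D = pd.get_dummies(raw[["contract", "gender"]]).astype(int)

# The full table keeps the quantitative columns alongside the disjunctive part.
data = pd.concat([D, raw[["DTHEO"]]], axis=1)
print(data)
```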

Table 1. Qualitative variables

Heading | Modalities | Names
Type of employment contract | Open-ended / fixed-term contract | OEC, FTC
Gender | Man, Woman | MAN, FEM
Age | <25, [25, 40], [40, 50], ≥50 | AGE1, AGE2, AGE3, AGE4
Daily working schedules | Identical, Posted, Variable | HORIDE, HORPOS, HORVAR
Number of days worked per week | Identical, Variable | JWK1, JWK2
Night shifts | Never, Sometimes, Usually | NITE1, NITE2, NITE3
Saturday shifts | Never, Sometimes, Usually | SAT1, SAT2, SAT3
Sunday shifts | Never, Sometimes, Usually | SUN1, SUN2, SUN3
Wednesday shifts | Never, Sometimes, Usually | WED1, WED2, WED3
Able to take time off | Yes, Yes under certain conditions, No | ABS1, ABS2, ABS3
The schedule is determined by... | The firm itself, Choice is given, S/he decides him/herself, Other | DET1, DET2, DET3, DET4
Part-time status forced | Yes, No | INVOL, VOL
Worker knows his/her schedule for next day(s) | Yes, No | LEND1, LEND2
Possibility of carrying over working hours | Not applicable, Yes, No | RECUP0, RECUP1, RECUP2
Table 2. Quantitative variables

Heading Name
Minimum duration of actual workweek DMIN
Maximum duration of actual workweek DMAX
Theoretical duration of workweek DTHEO
Number of overtime hours worked per week HSUP
Number of hours of extended shift work/week HPROL

A simple cross-analysis of the variables reveals right away that men only
represent 10% of all part-time employees working on an OEC basis and 18% of
all part-time employees on an FTC. Moreover, even though forced (and therefore
involuntary) part-time work accounts for nearly 50% of all employment
contracts, it only represents 43% of the OEC, versus nearly 80% of FTC. Note
that 83% of all contracts are OEC.

After a cursory study of these descriptive statistics (we will not be delving any
further into them at present; see the appendix for elements thereof), we are now
going to carry out a segmentation of those individuals who are represented by the
14 qualitative variables defined above, as well as their 39 modalities. Towards this
end, we will be defining a new method, one that is based on the Kohonen
algorithm, but which enables an analysis of complete disjunctive tables.

1. THE KOHONEN ALGORITHM

This is the original classification algorithm that Teuvo Kohonen defined in
the 1980s based on his studies of neuromimetic motivations (Kohonen 1984;
1995). In the present data analysis framework (Kaski 1997; Cottrell, Rousset
1997), the data space is a finite set that is identified by the rows of a data table.
Each row of this table represents one of N individuals (or observations) that are
being described by an identifier and by p quantitative variables. The algorithm
then regroups the observations into separate classes, whilst respecting the
topology of the data space.

This means that a priori we have defined a concept that accounts for a
neighbourhood between classes. It also means that neighbouring observations in
the data space of dimension p will belong (once they have been classified) to the
same class or to neighbouring classes. The use of this algorithm is justified by the
fact that it enables a regrouping of individuals into small classes whose
neighbourhood is meaningful (unlike a hierarchical classification or a moving
centres algorithm), and that they themselves can then be dynamically regrouped
into super classes, preserving all the while the relationships of neighbourhood
that have been detected. The visual representation of the classes is therefore easy
to interpret, inasmuch as it occurs at a global level. Inversely, visual
representations obtained through the use of classical projection methods are
incomplete, as it becomes necessary to consult a number of successive
projections in order to derive any reliable conclusions.

The structures of neighbourhood between the various classes can be chosen in
a variety of ways, but in general we assume that the classes are laid out on a
rectangular two-dimensional grid, this being a natural definition of neighbours in
each class. We can also consider a one-dimensional topology, a so-called string,
and possibly even a toroidal structure or a cylinder.

1.1 The algorithm for the quantitative data

The classification algorithm is an iterative one. It is launched through the
association of each class with a randomly chosen code vector (or representative)
of p dimensions. We then choose one observation randomly at each stage and
compare it with all of the code vectors to determine the winning class, meaning
the one whose code vector is closest (for a distance that has been determined
beforehand). The code vectors of the winning class and of the neighbouring
classes are moved in the direction of the chosen observation, so that the distance
between them decreases.
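
For readers who prefer code to prose, the following Python sketch illustrates the stochastic update just described. It is a minimal illustration only: the rectangular grid, the linearly decreasing learning rate and neighbourhood radius (eps0, radius0, n_iter) are our own assumptions, not the settings used in the study.

```python
# Minimal sketch of the stochastic Kohonen update described above
# (illustrative schedules and parameters; not the study's implementation).
import numpy as np

def train_som(X, n_rows=7, n_cols=7, n_iter=10000, eps0=0.5, radius0=3, seed=0):
    rng = np.random.default_rng(seed)
    # One code vector per grid unit, initialised on randomly chosen observations.
    codes = X[rng.integers(0, len(X), n_rows * n_cols)].astype(float)
    grid = np.array([(r, c) for r in range(n_rows) for c in range(n_cols)])

    for t in range(n_iter):
        x = X[rng.integers(0, len(X))]                      # draw one observation
        winner = np.argmin(((codes - x) ** 2).sum(axis=1))  # closest code vector
        eps = eps0 * (1 - t / n_iter)                       # decreasing step size
        radius = int(round(radius0 * (1 - t / n_iter)))     # shrinking neighbourhood
        # Units whose grid distance to the winner is <= radius are all updated.
        neigh = np.abs(grid - grid[winner]).max(axis=1) <= radius
        codes[neigh] += eps * (x - codes[neigh])            # move towards the observation
    return codes                                            # shape (n_rows * n_cols, p)
```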

This algorithm is analogous to a moving centres algorithm (in its stochastic
version). However, the latter does not seek to conceptualise neighbourhoods of
classes. Moreover, the only thing that it modifies at each stage is the code vector
(or representative) of the winning class.

Following on from this, we assume that our readers are familiar with this
algorithm (see inter alia Cottrell, Fort, Pagès 1998).

Given that an arbitrary number of classes is chosen (it is often high because
we frequently select grids of 8 by 8 or 10 by 10), we can reduce the number of
classes, regrouping them by subjecting the vector codes to a classical hierarchical
classification. We can then colour the class groups (called super classes) to
enhance their visibility. Generally we observe that the only classes that such
super classes regroup are contiguous ones. This can be explained by one of their
properties, i.e., by the fact that the Kohonen algorithm respects the topology of
the data space. Moreover, non-compliance with this property would indicate the
algorithm's lack of convergence, or else a structure that has been particularly


"folded" into the data set.

To describe the super classes, we calculate the basic statistics of the
quantitative variables that are being used. We then study the way in which the
modalities of the qualitative variables that the Kohonen classification algorithm
does not use are distributed along the grid (Cottrell, Rousset 1997).

1.2 Classification of the observations that are being
described by the qualitative variables - the KDISJ
algorithm

This involves simultaneously classifying both individuals and the modalities
of the qualitative variables that describe them. Analysts should be aware however
that most of the time qualitative variables cannot be used in their existing form,
even when the modalities are number coded. If no ordered relationship exists
between the codes (for instance, 1 for blue eyes, 2 for brown eyes, etc.), it is no
use applying them as if they were numerical variables, in a blind attempt to use
Kohonen learning. Even if the codes were to correspond to an increasing or
decreasing progression, this would only be meaningful if a linear scale were used
(modality 2 corresponding to half of the progression between modalities 1 and 3).
A fruitful method would then consist of processing the qualitative variables
beforehand via a multiple correspondence analysis and preserving all of the co-
ordinates. This is tantamount to coding all of the individuals by the co-ordinates
that have been attributed to them as a result of this transformation. Once
individuals have been represented by numerical variables, they can be classified
using the Kohonen algorithm. We will however have lost the modalities, and the
calculations will be both cumbersome and also costly in terms of calculating
times - exactly that which we are trying to avoid by using the Kohonen
algorithm.

The present paper introduces a method that has been adapted to qualitative
variables, and which also enables a simultaneous processing of individuals and of
modalities.

Consider N individuals and a certain number K of qualitative variables. Each
variable k = 1, 2, ..., K has $m_k$ modalities. Each individual chooses one and only
one modality for each variable. If M is the total number of modalities, each
individual is represented by an M-vector comprised of 0s and 1s. There is only one
1 amongst the $m_1$ first components, only one 1 between the $(m_1+1)$th and the
$(m_1+m_2)$th, etc. The table with N rows and M columns that is formed in this way is
118 Chapter 5

the complete disjunctive table, called D. Note that it contains all of the
information that will enable us to include individuals as well as the modalities'
distribution.

We note $d_{ij}$ as the general term of this table. This can be equated to a
contingency table that crosses an "individual" variable with N modalities and a
"modality" variable with M modalities. The term $d_{ij}$ takes its values in {0, 1}.

We use an adaptation of an algorithm (KORRESP) that has been introduced
to analyse contingency tables which cross two qualitative variables. This
algorithm is a very fast and efficient way of analysing the relationships between
two qualitative variables. Please refer inter alia to Cottrell, Letremy, Roy (1993)
to see the various ways it can be applied to real data.

We calculate the row sums and column sums by:

$$d_{i.} = \sum_{j=1}^{M} d_{ij}, \qquad d_{.j} = \sum_{i=1}^{N} d_{ij}$$

Note that with a complete disjunctive table, $d_{i.}$ is equal to K, regardless of i. The
term $d_{.j}$ represents the number of persons who are associated with the modality j.

In order to use a $\chi^2$-distance along the rows as well as down the columns, and
to weight the modalities proportionately to the size of each sample, we adjust the
complete disjunctive table, and put:

$$d^{c}_{ij} = \frac{d_{ij}}{\sqrt{d_{i.}\, d_{.j}}} = \frac{d_{ij}}{\sqrt{K\, d_{.j}}}$$

When adjusted thusly, the table is called Dc (adjusted disjunctive table). This
transformation is the same as the one that Ibbou proposes in his thesis (Ibbou
1998; Cottrell, Ibbou 1995).

These adjustments are exactly the same as the ones that correspondence
analysis entails. This is in fact a principal weighted component analysis that uses
the Chi-Square distance simultaneously along the row and column profiles. It is
the equivalent of a principal components analysis of data that has been adjusted
in this way.
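
In code, the adjustment amounts to dividing each entry by the square root of the product of its row and column sums; the short numpy sketch below (our own notation, not the authors' code) illustrates it.

```python
# Minimal sketch: the chi-square-style weighting of the complete disjunctive
# table described above (D has N rows and M 0/1 columns).
import numpy as np

def adjust_disjunctive(D):
    D = np.asarray(D, dtype=float)
    row_sums = D.sum(axis=1)   # d_i. (equal to K for every individual)
    col_sums = D.sum(axis=0)   # d_.j (number of persons holding modality j)
    return D / np.sqrt(np.outer(row_sums, col_sums))   # Dc, with dc_ij = d_ij / sqrt(d_i. d_.j)
```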
We then choose a Kohonen network, and associate with each unit a code
vector that is comprised of (M + N) components, with the M first components
evolving in the space for individuals (represented by the rows of Dc) and the N
final components in the space for modalities (represented by the columns of Dc).
The Kohonen algorithm lends itself to a double learning process. At each stage,
we alternately draw a Dc row (i.e., an individual) or a column (i.e., a modality).

When we draw an individual i, we associate a modality j(i), thus maximising
the coefficient $d^{c}_{ij}$, i.e., the rarest modality out of all of the corresponding ones in
the total population. We then create an extended individual vector of dimension
(M + N). Subsequently, we try to discover which is the closest of all the code
vectors, in terms of the Euclidean distance (restricted to the M first components).
Note u the winning unit. Next we move the code vector of the unit u and its
neighbours closer to the extended vector (i, j(i)), as per the customary Kohonen
law.

When we draw a modality j with dimension N, we do not associate an
individual with it. Indeed, by construction, there are many equally placed
individuals, and this would be an arbitrary choice. We then seek the code vector
that is the closest, in terms of the Euclidean distance (restricted to the N last
components). We then move the N last components of the winning code vector
and its neighbours closer to the corresponding components of the modality vector
j, without modifying the M first components.

By so doing, we are carrying out a classical Kohonen classification of
individuals, plus a classification of modalities, maintaining all the while their
association with one another. After the convergence, the individuals and the
modalities are classified into Kohonen classes. "Neighbouring" individuals or
modalities are classified in the same class or in neighbouring classes. We call the
algorithm that has been defined thusly KDISJ.
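
The sketch below outlines this double learning in simplified Python. It is our reading of the description above (the alternation, the "rarest modality" rule and the restricted distances follow the text, while the grid handling, initialisation and decay schedules are simplified assumptions); it is not the authors' KDISJ implementation.

```python
# Simplified sketch of the KDISJ double learning described above
# (our reading of the text, not the authors' implementation).
import numpy as np

def train_kdisj(Dc, n_rows=7, n_cols=7, n_iter=20000, eps0=0.5, radius0=3, seed=0):
    rng = np.random.default_rng(seed)
    N, M = Dc.shape
    # Each unit carries an (M + N)-dimensional code vector: M "individual"
    # components followed by N "modality" components.
    codes = 0.1 * rng.standard_normal((n_rows * n_cols, M + N))
    grid = np.array([(r, c) for r in range(n_rows) for c in range(n_cols)])

    def neighbours(winner, radius):
        return np.abs(grid - grid[winner]).max(axis=1) <= radius

    for t in range(n_iter):
        eps = eps0 * (1 - t / n_iter)
        radius = int(round(radius0 * (1 - t / n_iter)))
        if t % 2 == 0:
            # Draw an individual i and associate its rarest modality j(i),
            # i.e. the j maximising dc_ij (zero entries cannot win).
            i = rng.integers(0, N)
            j = int(np.argmax(Dc[i]))
            extended = np.concatenate([Dc[i], Dc[:, j]])   # (M + N)-vector
            # The winner is found on the M first components only.
            winner = np.argmin(((codes[:, :M] - Dc[i]) ** 2).sum(axis=1))
            mask = neighbours(winner, radius)
            codes[mask] += eps * (extended - codes[mask])  # move all components
        else:
            # Draw a modality j; match and move the N last components only.
            j = rng.integers(0, M)
            col = Dc[:, j]
            winner = np.argmin(((codes[:, M:] - col) ** 2).sum(axis=1))
            mask = neighbours(winner, radius)
            codes[mask, M:] += eps * (col - codes[mask, M:])
    return codes
```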

When we are not trying to classify individuals but only modalities, we can use
another algorithm that draws its inspiration from the genuine Kohonen algorithm.
This is called KMCA. We can then classify individuals as if they were additional
data (for definitions and applications, see inter alia Ibbou's thesis, Ibbou, 1998).
We can also classify individuals alone, and then classify as additional data the
"virtual individuals" associated with the modalities that have been calculated
from the rows of the Burt matrix. Finally we can classify modalities alone (as is
the case with KMCA) and classify individuals subsequently, once they have been
properly normalised. This is what Ibbou called KMCA1 and KMCA2. These
methods generate findings that are very comparable to those that can be found
with KDISJ, but they do require a few more iterations.

2. THE CLASSIFICATION

2.1 Classification using a Kohonen matrix and a regrouping into 10 super classes

On the map below (a 7 by 7 grid) we display findings from a simultaneous
classification of individuals and variables. To simplify this representation, we
have in each case displayed the current modalities, the number of individuals
who have been classified, and between brackets the number of persons working
on an OEC or FTC basis.

[Figure: 7 by 7 Kohonen map; each unit lists the modalities it contains, the number of individuals classified there and, between brackets, the numbers of OEC and FTC workers.]

Figure 2. Distribution of modalities and individuals across the grid

Note: The squares in gray feature a much higher percentage of OEC than the total population does.
Note how modalities and individuals are distributed amongst the various
classes in a relatively balanced fashion. Fixed term contracts are mostly found to
the left of the map. Remember that they only represent 17% of all contracts.

The modalities that correspond to the best working conditions (in other
words, and for the purposes of the present paper, to more regular working times;
to no night-time, Saturday or Sunday shifts; to open-ended contracts; and to
voluntary part-time status) are associated with all age brackets, except for young
persons, and are found to the bottom right. These correspond to relatively
favourable work situations. Inversely, the young persons modality is located to
the top right, and is associated with "unpleasant" modalities such as night shifts,
Sunday shifts, no chance to take any time off etc.

The modality for women (who are present everywhere and who constitute the
vast majority of the total population, to wit 88%) is close to the centre of the map
and associated with the involuntary part-time modality that is close to the FTC
modality.

2.2 Regrouping the classes

Next we diminish the number of classes by carrying out a hierarchical
classification of the 49 code vectors. After several attempts to obtain a
reasonably small number of classes, we have kept the 10 super classes that are
represented below (see Figure 3).
Figure 3. The 10 super classes

The total population is relatively well balanced amongst these 10 super
classes, with class 4 alone featuring a much larger sample. This will become
understandable once we explain why: such individuals' working conditions are
the most standard.

Table 3. Absolute frequencies


Table 4. Description of classes using qualitative variables
(frequencies expressed as the percentage that the modality accounts for in each class)

          1    2    3    4    5    6    7    8    9   10  Tot
OEC      99   40  100   92   47   92   77   94   88   93   83
FTC       1   60    0    8   53    8   23    6   12    7   17
MAN       0   54    2    1   14   13   16   10   12    7   12
FEM     100   46   98   99   86   87   84   90   88   93   88
AGE1      0    0    0    0  100    0    5    0    0    0    6
AGE2     36   51   43   45    0   26   65   39   39   39   40
AGE3     42   22   36   32    0   39    9   44   29   43   31
AGE4     23   27   22   22    0   34   21   17   32   18   22
HORIDE   52   61   48   75   35   26   49   29   29    0   52
HORPOS    0    0    0    0    0    0   12    0    8  100    4
HORVAR   48   39   52   25   65   74   40   71   71    0   44
JWK1     96   83   89   91   78   79   47   39   68   57   79
JWK2      4   17   11    9   22   21   53   61   32   43   21
NITE1   100   92   95   99   86   95   51   64  100   75   90
NITE2     0    5    4    1   10    5   21   27    0   21    7
NITE3     0    3    1    0    4    0   28    9    0    4    3
SAT1     88   57    3   80   29   53    5    2   34   21   49
SAT2      6   23    5   10   14   26    5   91   27   68   23
SAT3      6   20   92   10   57   21   90    7   39   11   28
SUN1     96   88   82   99   69   87    0   13   83   57   76
SUN2      4   12   15    1   15   10    2   87   17   43   18
SUN3      0    0    0    0   13    3   98    0    0    0    6
WED1     41   13   21   33   10   16   14   11   15   14   23
WED2     14    9   10    9   15    8   12   46   19   43   16
WED3     45   78   69   58   72   76   74   43   66   43   61
ABS1     70   81   72   71   67   82   58   77   73   75   73
ABS2     24    9    0   21    6   13   16    8   10   18   14
ABS3      6   10   28    8   27    5   26   15   17    7   13
DET1      0   81   90   75   88   37   72   68    0   78   63
DET2      0    5    0   25    2    8   14   12    0    7   11
DET3    100   14   10    0    4   45    5   13    0   11   19
DET4      0    0    0    0    6   11    9    6  100    3    7
INVOL    12   80   74   44   82   53   60   34   46   21   50
VOL      88   20   26   56   18   47   40   66   54   79   50
LEND1   100  100  100  100  100    5   95   98  100  100   95
LEND2     0    0    0    0    0   95    5    2    0    0    5
RECUP0   44   57   40   67   65   34   44   32   37   61   52
RECUP1   34   20   31   19   23   29   40   43   39   25   28
RECUP2   22   23   29   14   12   37   16   25   24   14   20

Note: The numbers written in bold font correspond to particularly high values and those in italics
to particularly low values.

We can verify that in most cases, the modalities find themselves either within
or else close to one of the classes where they have a significant role to play. We
can check this by calculating each modality's deviation for each of the 10 super-
classes¹. As can be expected, such deviations are positive 85% of the time.
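A minimal sketch of this deviation (defined precisely in note 1 at the end of the chapter) is given below; the figures used in the call are hypothetical and only illustrate the sign interpretation.

```python
# Minimal sketch of the deviation in note 1: the observed count of a modality
# within a super class minus the count expected if the modality were
# distributed as in the total population. Figures below are hypothetical.
def deviation(n_mk, n_m, n_k, n):
    """n_mk: individuals of class k with modality m; n_m, n_k: marginal
    counts of the modality and of the class; n: total population size."""
    return n_mk - n_m * n_k / n

print(deviation(n_mk=60, n_m=940, n_k=230, n=5558))  # > 0: over-represented
```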

We then study the 5 quantitative variables' average values across the 10


classes:

Table 5. Description of the 10 classes by their quantitative variables (averages)

Variable 1 2 3 4 5 6 7 8 9 10 Total
DMIN 27.1 23.7 24.4 25.4 21.8 24.3 22.1 23.2 24.4 24.9 24.5
DMAX 29.1 27.7 27.4 27.2 24.1 29.5 32.2 32.3 28.9 32.0 28.5
DTHEO 27.0 24.4 24.5 25.7 22.3 24.8 25.4 25.8 25.1 27.1 25.4
HSUP 0.69 1.8 3.45 0.95 1.16 1.82 2 1.48 1.78 1.36 1.51
HPROL 1.65 1.97 1.54 0.78 0.75 2.08 2.53 2 2.71 0.86 1.5

Note that classes 6, 7, 8 and 10 display significant disparities between


minimum and maximum workweek durations. Fisher statistics corresponding to
these 5 variables show that they are all discriminatory in nature.
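As a hedged illustration of these Fisher statistics (with simulated data standing in for the survey variables), a one-way ANOVA F test per quantitative variable across the 10 classes can be computed as follows.

```python
# Sketch (simulated data): one-way ANOVA F test of each quantitative
# variable across the 10 super classes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
labels = rng.integers(1, 11, size=500)            # super-class labels 1..10
variables = {"DMIN": rng.normal(24.5, 3.0, 500),
             "DMAX": rng.normal(28.5, 4.0, 500)}  # placeholder columns

for name, values in variables.items():
    groups = [values[labels == k] for k in range(1, 11)]
    f_stat, p_val = stats.f_oneway(*groups)
    print(f"{name}: F = {f_stat:.2f}, p = {p_val:.3f}")
```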

Figure 4. Quantitative variables in the 10 classes and in the total population



Based on these elementary statistics, it is possible to both describe the 10


classes and to develop a typology.

Table 6. Typology

1 Employees who have voluntarily chosen to work on a part-time basis; no Saturday shifts; they


determine their own working schedules (very little overtime)
2 Men, working on a FTC basis; with Wednesday shifts; possibility of taking time off
without any problem
3 Women who have had a part-time status forced upon them; every week they have the
same number of workdays but daily schedules are variable; Saturday shifts; no
possibility of taking any time off; no carryover of working hours (a lot of overtime)
4 The largest class, with 29% of the total. Employees working on an OEC basis; over the
age of 25; no night-time or Sunday shifts; schedule is determined by the firm but
flexibility is possible; identical work schedules every week, but employees know their
schedule for the next few days; time off can be taken under certain conditions; no
reason to carry over working hours (shifts are rarely extended and there is little
overtime)
5 All young persons under the age of 25 (half OEC and half FTC); shift extensions are
infrequent
6 Employees do not know their schedules for the next few days; average of almost 4
hours a week of shift extensions or overtime hours
7 It is customary for employees to work night and Sunday shifts (an average of more
than 4h30 of shift extensions and overtime per week, some workweeks are more than
32h long even though they are allegedly doing part-time work).
8 Employees sometimes work night-time, Sunday, Saturday and Wednesday shifts, and
do not work the same number of days every week (some weeks they can work more
than 32h).
9 Working schedules are determined in a different way; shifts can be significantly
extended
10 Everyone's schedule is posted openly (possibility of an approximately 32h workweek)

On the super class representation, it is clear that the FTC and OEC
modalities are distinct and separate (class 2 and class 4), as are men and women.
As expected, women are associated with involuntary part-time work. The
"voluntary part-time" modality can be found in class 1, near the OEC modality.
Class 4 features the modalities that correspond to "normal" working conditions,
with all ages being represented except for young persons.

For a more exhaustive summary, the reader can refer to the report edited
by Cottrell, Letremy, Macaire et al. (2001).

CONCLUSION

In presenting our conclusions on part-time workers, we will refer to some of


the descriptive statistics that the present paper was unable to mobilise, due to a
lack of space.

Firstly, part-time work is more of an involuntary phenomenon for temporary


employees.
The INSEE's Timetable survey raised a number of issues about part-time
workers, during its attempt to test the "voluntary" nature of this form of
employment. Regarding open-ended contracts, nearly 60% of all part-timers
stated that this had been their choice, i.e., it was not imposed on them by their
employer, either at the time of recruitment or else through the transformation of a
full-time position into a part-time one. Around half stated that they had freely
chosen their "shift system". In comparison, amongst employees working under
fixed term contracts, fewer than 20% of all part-timers had volunteered for this
status, but around 30% were working a shift system of their choice. This is quite
a difference. Moreover, women unsurprisingly state more frequently than men do
that they were the ones who had made the decision to work on a part-time basis.
However, we know that choice is a highly relative concept, as all choice is made
under constraint. We also know that family requirements often cause women to
prefer part-time work.

Male and female part-time workers' situations vary greatly, depending on


whether they are working under the aegis of an open-ended or a fixed term
contract. Around 70% of women part-timers working on a fixed term contract do
not get to choose their schedules (versus 54% of all men in this position). For
part-timers working on an open-ended contract the gap is both reversed and
smaller - in this population, 48% of women do not get to choose their schedules,
versus 55% of men. It is as if the difference between OEC and FTC women were
greater than between OEC and FTC men, whose situation is more homogeneous.
Women on an open-ended contract basically choose their own schedule, more
than men in this situation do. Working on a fixed term contract, however, they
have less choice in their schedules.

In analysing the responses given to the question "Would you like to work
more?", we learn that the more atypical the contract, the more employees would
prefer to work more, as long as the increase in pay is proportional to the increase
in the number of hours they work. Amongst atypical jobs, it is primarily part-
time workers on fixed term contracts (and temporary workers, albeit to a lesser
extent) who would like to work more. The same opposition between part-time
and full-time work can be found in responses to questions relating to the desire to

work less: unsurprisingly it is the part-timers who are less in favour of working
fewer hours. Furthermore, those who are out looking for a new job are basically
part-time employees on a fixed term contract (more than 40%) and temporary
workers (more than 50%).

APPENDIX
The data we used comes from the latest INSEE Timetable survey, the fourth of its kind (the
previous one having been carried out in 1985-1986). It ran from February 1998 to February 1999 in
8 successive survey waves. Focusing on French lifestyle and working patterns, the full study looked
at compensated professional working times, and more specifically at people's working times in
their "main current occupation". The sample is comprised of the only salaried population that can
provide comprehensive data on its professional working times. Teachers (who often make
incoherent statements about their working times, equating them with contact hours alone) and other
abnormal cases were taken out of the sample. 1,153 individuals were eliminated in this way, leaving a
database of 5,558 wage-earning individuals.

When this sample is linked to data from the INSEE's 1998² or 1999 Employment surveys, no
major difference is detected between the two in percentage terms. If we structure the data according
to the type of work (full-time OEC, part-time OEC, full-time FTC, part-time FTC, temporary
workers, other), we come up with two very similar distributions (see table 7 below).

Table 7. Distribution of sample according to form of employment in the 1998 INSEE Timetable
and Job surveys

                    OEC FT  OEC PT  FTC FT  FTC PT   Temp  Others
Timetable survey     4,033     690     258     137     115     325
% of total sample     72.6    12.4     4.6     2.5     2.1     5.8

                       OEC     FTC    Temp  Others
Timetable survey       85%    7.1%    2.1%    5.8%
Job survey          88.12%³  5.57%⁴  2.08%   4.22%

The breakdown between permanent/non permanent workers or between part-timers/full-timers


is very comparable in the two surveys. Men represent a share of between 53 and 54% in both studies
(and women between 45 and 46%). However, in terms of respondents' ages, the Timetable survey
slightly over-represents people between the age of 40 and 49 (by 3 points) and under-represents the
25-39 age bracket.
Other differences can be observed:
- over-representation of the industrial sector (by 6 points) in the Timetable survey;
- under-representation by 5 points of the service sector;
- under-representation by around 5 points of sectors such as healthcare, education and social
work.

NOTES
1. The deviation for a modality m (shared by n_m individuals) and for a class k (with n_k individuals)
can be calculated as the difference between the number of individuals who possess this modality
and belong to the class k and the "theoretical" number n_m × n_k / n, which would correspond to a
distribution of the modality m in the class k that matches its distribution throughout the total
population.
2. Source: Employment Survey 1998, INSEE findings, no 141-142, 1998, 197 pages.
3. Except for non-tenured State and local authority employees.
4. FTC, except for State and local authority officials, plus non-tenured State and local authority
employees.

REFERENCES
Boisard P., Fermanian J.-D. (1999), "Les rythmes de travail hors normes", Economie et Statistique,
321-322 (1/2), 111-132.
Bue J., Rougerie C. (1998), "L'organisation du travail : entre contraintes et initiative - resultats de
l'enquete Conditions de travail de 1998", Premieres Syntheses, DARES, 32(1), 99.08.
Bue J., Rougerie C. (1999), "L'organisation des horaires : un etat des lieux en mars 1998",
Premieres Syntheses, 99(07), 30.01, 8 pages.
Bue J., Rougerie C. (2000), "L'organisation des horaires : un etat des lieux en mars 1998", Les
Dossiers de la Dares, 1-2, 9-15.
Cottrell M., Fort J.-C., Pages G. (1998), "Theoretical aspects of the SOM Algorithm",
Neurocomputing, 21, 119-138.
Cottrell M., Ibbou S. (1995), "Multiple correspondence analysis of a cross-tabulation matrix using
the Kohonen algorithm", in Verleysen M. (ed.), Proc. ESANN'95, D Facto, Bruxelles, 27-32.
Cottrell M., Letremy P., Macaire S., Meilland C., Michon F. (2001), Les heures de travail des formes
particulieres d'emploi. Rapport final, IRES, February, Noisy-le-Grand, France.
Cottrell M., Letremy P., Roy E. (1993), "Analysing a contingency table with Kohonen maps: a
Factorial Correspondence Analysis", in Cabestany J., Mary J., Prieto A. (eds.) (1993), Proc.
IWANN'93, Lecture Notes in Computer Science, Springer-Verlag, 305-311.
Cottrell M., Rousset P. (1997), "The Kohonen algorithm: a powerful tool for analysing and
representing multidimensional quantitative and qualitative data", Proc. IWANN'97, Lanzarote.
Freyssinet J., in Cette G. (1999), "Le temps partiel en France", Paris, La Documentation française
(collection "Les rapports du Conseil d'Analyse economique").
Galtie B. (1998), Les emplois des salaries a temps partiel : le secteur prive - Diversite des emplois
et des conditions de travail, Conseil Superieur de l'Emploi, des Revenus et des Couts, 98(03).
Gollac M., Volkoff S. (2000), Les conditions de travail, collection "Reperes", La Decouverte,
Paris.
Ibbou S. (1998), « Classification, analyse des correspondances et methodes neuronales », Doctoral
thesis, Universite Paris 1.
Kaski S. (1997), "Data Exploration Using Self-Organising Maps", Acta Polytechnica Scandinavica,
82.
Kohonen T. (1993), Self-organization and Associative Memory, 3rd ed., Springer.
Kohonen T. (1995), Self-Organizing Maps, Springer Series in Information Sciences, 30,
Springer.

Letourneux V. (1997), "Precarite et conditions de travail dans l'Union Europeenne", Fondation
Europeenne pour l'Amelioration des Conditions de Vie et de Travail, Dublin.
Letremy P., Cottrell M., Macaire S., Meilland C., Michon F. (2001), « Le temps de travail des
formes particulieres d'emploi », Rapport final, IRES, Noisy-le-Grand, February 2001.
Merllie D., Paoli P. (2000), "Dix ans de conditions de travail dans l'Union Europeenne - resume",
Fondation europeenne pour l'Amelioration des Conditions de Vie et de Travail, Dublin
(http://www.fr.eurofound.ie/publications/files/3712FR.pdf).
Paugam S. (2000), Le salarie de la precarite. Les nouvelles formes de l'integration professionnelle,
Collection "Le lien social", Documents d'enquete series, Presses Universitaires de France, Paris.
Chapter 6

Work and employment policies in French


establishments in 1998
A Kohonen Algorithm-Based Analysis

Severine LEMIERE a, Corinne PERRAUDIN b, Heloise PETIT a

a MATISSE, Universite Paris 1, 106-112 Bld de l'Hopital, 75647 Paris Cedex 13,
slemiere@univ-paris1.fr, Heloise.Petit@univ-paris1.fr
b MATISSE-SAMOS-CNRS 8595 et EUREQua-CNRS 8594, Universite Paris 1,
Corinne.Perraudin@univ-paris1.fr

Abstract: The present paper analyses current employment and work policies in French
establishments on the basis of the REPONSE survey that was conducted in 1998. By
employment and work policy we mean a parallel study of customary employment
relationship characteristics as well as work organisation practices. Our study is
rooted in several employment policy variables as well as variables relating to work
organisation. The methodology used is based on two complementary analytical
tools: multiple correspondence analysis (MCA); and Kohonen's neuronal algorithm
(KMCA). After an exploratory study our interpretations are complemented by the
construction of a typology.

Key words: Kohonen Maps, Classification, Work management

INTRODUCTION

The French productive system has gone through a number of major


transformations in recent years. Changes affecting the mode of competition
during the 1980s led to a devaluation of the Fordist production mode. At the
same time, work and employment practices derived from the concept of
flexibility emerged. Such an organisational mode may be characterised by
individualised career management and collective work organisation. It became, over
the past 20 years, a topic of heated debate for many employment economists, and
an emblem of the modern era. Yet questions remain over the magnitude and


modalities of the mode's dissemination. Has flexible production replaced


traditional forms of Fordist organisation (Boyer, Beffa, Touffut 1999) or has it
developed alongside them, as a sort of complement (Galtier 1996)? To answer
this question, we conducted a global analysis of the forms of production
organisation that have been implemented in France in recent years.

Our study is based on the 1998 REPONSE survey. We define the varying
forms of production organisation not only by the human resource management
(HRM) practices they encompass but also by their modes of work organisation.
These two poles are in fact highly complementary, and even inseparable when it
comes to defining a firm's policy towards its employees. Our analysis therefore
focuses on employment policy variables, i.e., the extent to which firms make use
of part-time work, fixed term contracts (hereafter FTC) and temporary work; the
presence of wage increases and negotiating systems; and spending on training on
one hand and work organisation variables, such as the use of forms of collective
work, flatter hierarchies or employee mobility on the other hand.

We have adopted a methodology that uses multiple correspondence analysis


and a neuronal method based on the Kohonen algorithm to complement one
another.
We first study the relationships that tie together the whole set of qualitative
variables which relate to the management of work modes. The results of a
traditional multiple correspondences analysis (MCA) both complement and
accord with the results we derive from neuronal methods (KMCA, based on the
Kohonen algorithm). They highlight a clear polarisation between establishments
pursuing a restrictive policy and others who have adopted novel wage policies,
negotiating systems and organisational innovations.
To describe these HRM policies with greater precision, we create a typology
of the establishments involved, by means of a standard classification method
whose implementation is rooted in neuronal analysis.
It turns out that with this method it is easier to build up classes that are more
discriminatory (i.e., that avoid over-emphasising abnormal observations) than
would be the case with traditional classification methods.
This is very useful when dealing with a set of variables that are as complex as
the ones we are working with. The typology we set up in this manner defines 5
types of employment and work practices.

1. PRESENTATION OF THE SURVEY

Our study is rooted in the REPONSE 98 survey (Relations Professionnelles et


Negociations d'Etablissements) carried out by the DARES (from France's
Ministere de l'emploi et de la Solidarite). This survey covered 3,022 non-farm,
non public administration establishments with 20 or more employees. It is
divided into three sections, by respondent. Information was supplied by
management representatives, by some of the establishments' employees or by staff
representatives. We pay particular attention to the employer database, comprised
of 2,978 establishments that provide information on 962 variables which have
been derived either from this survey or else from matching with the DMMO
(Declarations Mensuelles de Mouvements de Main-d'Œuvre) and the DIANE
(Disque pour l'Analyse Economique) data sets.

We select a set of (active) variables relating to workforce and work


organisation policies, having chosen not to base the typology on the
establishments' structural characteristics (i.e., size or sector of activity for
example). The variables we keep focus on different aspects of the establishments'
behaviour.
First, employment and training policies: proportion of part-time workers vs.
total staff members (TPART), percentage of temporary workers and FTC holders
employed in the establishment (PRECA) and percentage of the total wage bill
that is spent on general training (DEPFORM).
Secondly, wage policy: one variable providing information on whether any
wage hikes (across the board or individualised, bonuses) took place in 1998,
either for managers or non-managers (POLSAL) and another providing
information on whether or not there was a profit-sharing arrangements for the
establishment's employees in FY 1998 (INTERES).
Then, negotiation policy: one variable relating to whether or not there was
any wage bargaining or discussion with employees in FY 1998 (NEGSL98) and
another indicating whether over the past three years there had been any
discussions or negotiations on issues other than wages, such as employment,
technological changes, organisational innovations and working times (duration
and organisation) (AUTNEGR).
Finally, work organisation mode: one variable relating to the way in which
work is specified (either through a description of the specific tasks that are to be
executed or else through the setting of overall objectives) (ORDRES) and another
with information on employees' mobility in their work, i.e. whether their normal
job allows them to move from one workstation to another (MAJMO), the
shortening of the hierarchy (SUPNIV), collective work (percentage of the
establishment's employees that participate regularly in entities such as quality
groups, problem solving task forces, regular meetings at the workshop, office or

departmental level, autonomous production teams, multidisciplinary working


groups, project teams) (NVORGA).

The variables we use were recoded to ensure their appropriateness. This


recoding was based on both analytical and methodological criteria. Methods
such as multiple correspondences analysis (MCA) require a modality sample size
that is large enough for active variables. This is because a modality's
contribution to overall inertia is a decreasing function of its sample size, and a
modality that is based on too small a sample will bias the analysis, much in the
way that abnormal observations do. This caused us to recode the variables in
such a way as to maintain a minimum sample size for each modality (around
10%). We also tried to ensure that the study only included those establishments
that provided information on all of the active variables we had selected¹. In the
end, the sample we used covers 2,297 establishments. It remains a representative
one, given the establishments' distribution by sector of activity, size and age.
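The recoding rule itself can be sketched as follows; this is a hedged illustration only (variable name, modalities and the merging choice are invented), not the recoding actually applied to the REPONSE data.

```python
# Hedged sketch of the recoding rule: detect modalities whose sample share
# falls below roughly 10% and merge each with a substantively close modality.
import pandas as pd

def rare_modalities(s: pd.Series, min_share: float = 0.10):
    """Return the modalities whose share of the sample is below min_share."""
    freq = s.value_counts(normalize=True)
    return list(freq[freq < min_share].index)

tpart = pd.Series(["low"] * 14 + ["medium"] * 5 + ["high"])
print(rare_modalities(tpart))                  # ['high']: 5% of the sample
tpart = tpart.replace({"high": "medium"})      # merge it with a nearby level
print(tpart.value_counts(normalize=True))
```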

2. OVERALL ANALYSIS

To analyse the relationships between the qualitative variables, alongside the


MCA we used an alternative method that is based on the Kohonen classification
algorithm, named SOM (Self Organising Map)². This is called the KMCA. These
analyses were carried out on the 2,297 establishments and 11 aforementioned
active qualitative variables (for a total of 31 modalities).
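The KMCA results reported here were obtained with the programmes cited in the acknowledgements; purely as an illustrative, from-scratch sketch (grid size, learning rate and neighbourhood schedule are arbitrary assumptions), a Kohonen map can be trained on the complete disjunctive coding of the modalities along the following lines.

```python
# Minimal, illustrative SOM training loop (not the authors' implementation)
# on a complete disjunctive (one-hot) table of qualitative variables.
import numpy as np

rng = np.random.default_rng(0)
X = (rng.random((2297, 31)) < 0.3).astype(float)   # placeholder disjunctive table

rows, cols = 6, 6
W = rng.random((rows * cols, X.shape[1]))          # code vectors of the grid
grid = np.array([(i, j) for i in range(rows) for j in range(cols)])

n_iter = 20000
for t in range(n_iter):
    x = X[rng.integers(len(X))]
    eps = 0.5 * (1 - t / n_iter)                   # decreasing learning rate
    radius = max(1.0, 3.0 * (1 - t / n_iter))      # shrinking neighbourhood
    bmu = np.argmin(((W - x) ** 2).sum(axis=1))    # best-matching unit
    d = np.abs(grid - grid[bmu]).max(axis=1)       # grid distance to the BMU
    neigh = d <= radius
    W[neigh] += eps * (x - W[neigh])               # pull neighbourhood towards x

cells = np.array([np.argmin(((W - x) ** 2).sum(axis=1)) for x in X])
print(np.bincount(cells, minlength=rows * cols))   # establishments per cell
```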

2.1 MCA

The findings of our correspondences analysis can be synthesised by studying


the first three axes³ (see Figures 1 and 2).

The first axis (11% of the total inertia) is built around variables such as wages
(POLSAL, INTERES), training (DEPFORM) and negotiation policy (NEGSL98
and AUTNEGR). The specificity of the organisational forms that the
establishments implemented can be detected when this axis is analysed, even
though it does not particularly stand out (only NVORGA and SUPNIV manifest
themselves to a significant extent). Note that the four modalities of the
DEPFORM variable are distributed uniformly, and that they constitute an axis
which is almost parallel to axis 1.
More generally, axis 1 contrasts "restrictive" and "voluntarist" workforce and
work organisation policies. Indeed, on the right hand side we note an absence of
wage bargaining, negotiations on any other issues, wage hikes or profit-sharing

arrangements - as well as lower spending on training. These behaviours are


twinned with the infrequent implementation of forms of collective work or of
flatter hierarchies. Regardless of whether this relates to work or to employment
policies, such practices are the embodiment of an attitude we can term
"restrictive". Behaviours of this ilk almost always involve a non-implementation
of specific forms of action (in terms of training, pay policies, negotiations or
work organisation). This type of policy can be contrasted with the behaviours
that are described on the left hand side of axis 1, characterised by greater
spending on training; the existence of profit-sharing arrangements, wage
bargaining and negotiations on other issues; and a flat hierarchy. All in all, the
management mode at this end of the axis can be called "voluntarist". Faced with
this binary opposition, axes 2 and 3 specify behaviours in terms of work
organisation and in terms of the use of certain forms of employment.

The second axis (7% of the total inertia) is built around variables that relate to
the forms of employment (TPART and PRECA) and to the other variables which
describe the work organisation (this time around ORDRES and MAJMO). It
contrasts policies based on workers' lesser mobility between workstations, a
prescription of work through the setting of objectives and a relatively widespread
use of part-time contracts (situated to the top) with diametrically opposed
behaviours (situated below). Hence the factorial representation (1,2), which
reveals four types of behaviours. Amongst the "restrictive" policies we
distinguish behaviours in the north-east quadrant, characterised by work that
involves very little mobility and by the presence of a large number of part-timers,
with behaviours that ally themselves to the "restrictive" policies found in the
south west quadrant, where work is prescribed through specific tasks and very
few part-timers are employed. "Voluntarist" behaviours include an opposition
between a major use of part-timers and a work prescription that is defined by
overall objectives, versus a great deal of work mobility and a large number of
fixed term jobs.
Axis 3 (6% of the inertia) is built around employment policy variables
(DEPFORM4 and PRECA1) and around certain work organisation modalities
(NVORGA1 and MAJMO1)⁴. The northern part of the axis associates a relatively
insignificant recourse to fixed term contracts (FTC or temporary work) with a
high level of spending on training. This type of behaviour is also linked to the use
of a specific mode of work organisation, replete with highly mobile employees
and featuring the implementation of a wide array of forms of collective work.
The lower part of the axis is not very specific.

Figure 1. Representation of the factorial axes (1,2)


Figure 2. Representation of the factorial axes (1,3)



2.2 KMCA

The Kohonen map provides us with an all-encompassing and synthetic vision


of these various types of work and employment modes (see Figures 3, 4 and 5).
An initial north-south opposition arises between policies characterised by
varying degrees of strictness with respect to their training, wage and negotiation
system policies.
The northern part (the first two rows) corresponds to an absence of wage
hikes, profit-sharing arrangements and negotiations (wage bargaining or else
other issues); and to little spending on training.
Conversely, the policies located to the southern side of the map are
characterised by wage hikes; employee profit-sharing schemes; wage bargaining
and other negotiations; and major spending on training. We rediscover the
opposition between "restrictive" and "voluntarist" policies that the MCA had
highlighted.

A second contrast can be ascertained along the second diagonal (from the
northwest to the southeast). This relates to work organisation practices and to
wage hike and training modalities. One of these regroupings includes mixed or
general (across-the-board) wages hikes, an absence of profit-sharing
arrangements, non-flattened hierarchies, a work prescription that involves an
allocation of specific tasks, the non-implementation of forms of collective work
and lesser mobility for employees in their jobs⁵. Inversely, the southeastern part
regroups policies featuring individualised wages, a team-based work
organisation, much employee mobility, the elimination of hierarchical levels and
a work prescription expressed in overall objectives. These practices go together
with major spending on training.

With the Kohonen map, we are able to summarise the main findings of the
MCA approach. Indeed, in reading this map we again encounter an analysis that
focuses on the axes' variation. The north-south opposition mostly corresponds to
information that can be used to structure the first factorial scale, whereas the
second diagonal cuts across information we could detect on axes 2 and 3. This
crossing of analytical sources is a precious tool for interpreting the
multidimensional phenomena we analyse.

SUPNIV2 NVORGA2 TPART4
ORDRES1 NEGSL982 PRECA1
INTERES2 AUTNEGR3 DEPFORM1
POLSAL3
MAJMO3

PRECA2 AUTNEGR2 TPART1
NEGSL981 PRECA3
POLSAL2 DEPFORM2
NVORGA3 MAJMO1
INTERES1

MAJMO2 TPART2 TPART3
DEPFORM3 SUPNIV1 NVORGA1
AUTNEGR1 ORDRES2 DEPFORM4
POLSAL1

Figure 3. The Kohonen Map

Figure 4. Distance between cells and their closest neighbours

Note: we can regroup cells according to the distances that separate them. A breakdown into 3
classes regroups the 4 squares to the upper left into a first class, the 9 squares to the right (light
gray and darker) into a second class, and the rest into a third one. A breakdown into 5 classes
would make us split classes 2 and 3 in two.

Figure 5. Cell representatives (final weights).

Note: each cell contains the representation(s) of the vector codes that are associated with the
varying modalities, in the following order: Tpart1 Tpart2 Tpart3 Tpart4 Preca1 Preca2 Preca3
Nvorga1 Nvorga2 Nvorga3 Supniv1 Supniv2 Majmo1 Majmo2 Majmo3 Negsl981 Negsl982
Depform1 Depform2 Depform3 Depform4 Ordres1 Ordres2 Interes1 Interes2 Autnegr1
Autnegr2 Autnegr3 Polsal1 Polsal2 Polsal3.

Some modalities are systematically associated with one another, regardless of


the factorial representation or Kohonen map involved. We can therefore already
draw certain conclusions regarding the behaviour of the establishments we have
studied. On one hand, an absence of wage bargaining is associated with the
absence of negotiations on any other issues; with a lack of wage hikes; and with a
lesser spending on training. On the other, organisational innovations, whether
this involves working in teams or flatter hierarchies (very often associated with a
work prescription that is expressed in global objectives) are tied to a major
spending on training and to individualised wage increases. A typology of
establishments according to their employment and work practices should help us
to further fine-tune this analysis.

3. A TYPOLOGY OF THE ESTABLISHMENTS

In observing the classification tree featured in the appendix, we can clearly


see that the sample is split into two classes whose sizes (number of firms) are the
same. This corresponds to the aforementioned dualistic opposition that crops up
in MCA or KMCA analyses. A more refined breakdown (into 5 classes) allows
us to define the so-called "restrictive" and "voluntarist" policies with greater
precision. The various classes have been reinterpreted through a projection of

nearly 200 additional variables. Note that at present we will only be summarising
the main conclusions of this analysis.

Table 1. Repartition of establishments and employees by class


                  Restrictive policies            Voluntarist policies
                  Class 1   Class 2   Class 3     Class 4   Class 5
% establishments        8        40      18.7        21.2      12.1
% employees           7.6      28.9        11        33.6      18.9

3.1 Establishments that pursue a restrictive policy

On one hand, we find establishments that pursue a restrictive and relatively


inactive policy, whether in terms of their wage policy, negotiation system or
organisational innovations. They are more frequently characterised by an absence
of wage hikes and negotiations, even when the latter only relates to wage
bargaining. Profit-sharing arrangements or individualised wage polices are
relatively under-developed, and fixed term forms of employment are relatively
rare. Lastly, organisational innovations are applied sporadically, at best. On the
other hand, such establishments behave differently from one another in terms of
their use of part-time work, their training policies and certain aspects of their
work organisation (the way in which tasks are prescribed or employee mobility).
We distinguish an initial group largely comprised of mutuals or associations
operating in the healthcare sector and featuring a large number of female
employees. This group represents 8.5% of all establishments and 7.4% of all
employees. These are establishments that manifest a certain desire for projects
entailing innovative and qualitative types of work organisations and training
programmes, and which opt more generally for a strategy based on a qualitative
type of product offer. However, they also rely on employment management tools
that are very restrictive in nature, whether this relates to the wage policies,
negotiation systems or forms of employment concerned. Despite poor working
conditions, employees seem to be often motivated by their identification with the
firm's objectives. The employment relationship seems to be based on a reciprocal
commitment that has been established at a relatively low level. This type of
system can be termed compromise-oriented cost management.
A second group, representing 30% of all establishments and 17.4% of
employees is comprised of small, less capital-intensive establishments that
practice a restrictive policy combined with a very little utilisation of so-called
atypical forms of employment. Their commercial strategy is mainly geared
towards costs (both in terms of their product offer and also as regards their
internal management). This type of approach can be termed stabilised cost
management.

A third group, representing 11.5% of all establishments and 4% of employees,


regroups establishments that are even smaller than in the preceding group. This is
a category that accounts for a large number of production workers. The
employment management methods it uses are quite severe, with fewer wage
hikes, negotiations and above all training. This orientation can be termed strict
cost management.

3.2 Establishments that pursue a voluntarist policy

The other establishments pursue a policy that we can call voluntarist. This
second class is characterised by the fact that it frequently resorts to collective
work, active wage policies (with individualised pay hikes and profit-sharing
arrangements), frequent negotiations, a great deal of spending on training and a
frequent use of precarious forms of employment. Moreover, the work
organisation in these establishments is more or less geared towards polyvalence.
Two groups of establishments can be distinguished in this category however.
This distinction is based on their contrasting wage practices (general or
individualised hikes); the varying proportions of fixed term employment they
offer; and/or their recourse to collective work mechanisms.
On one hand, we have a group representing 33.5% of all establishments (and
45.1% of employees) that is comprised of large capital-intensive groups which
often belong to the industrial sector. Such firms employ a great number of
technicians and workers and feature very active training or pay policies. They
also seem to favour internal careers, meaning that the level they operate at is
closer to the traditional internal market model. Note however that these forms of
employment are accompanied in this group by policies that involve innovative
work organisation policies and production techniques. Moreover, the strong
reliance on temporary work or FTC contracts in this context means that we can
hypothesise a dualistic type of employment management, i.e., a renewed internal
market.
We distinguish another group, representing 16.5% of all establishments and
26.1% of employees, which like the one above is comprised of larger and
relatively older groups, but where such firms are less confined to the industrial
sector. The establishments here employ more managers, fewer workers, and just
as many technicians. Their work organisation and production methods are very
innovative. Spending on training is high and wage policies are based on
individualisation and on profit-sharing. They resort relatively infrequently to
fixed term contracts. All in all, we can call such work and employment policies
professionalised management. Employees' working and employment conditions
are relatively beneficial and people are very involved in (and associated with) the
firm's objectives. The work and employment organisation twins independence in

one's work with a personalised motivation, and employees feel a great sense of
responsibility.

CONCLUSION

In terms of the data that was used in the present study, the Kohonen algorithm
turned out to be entirely complementary to traditional methods of data analysis.
An association of these two methods is particularly useful in a synthesis of
complex information, whether this involves an overall analysis or the creation of
a typology.
Regarding our interpretation of the transformations that have affected the
structure of the labor market over the past 20 years, the present study has enabled
us to advance two main conclusions. On one hand, we have been able to ascertain
a mode of production and work organisation that is close to the canonical model
of flexible production (involving individualised career management and
polyvalence in work organisation). Note that to a certain extent this has
developed to the detriment of the classical forms of the internal Fordist market,
and not in parallel to them (as shown by the emergence of a renewed internal market class).
Secondly, our analysis enables a precise study of the extent to which the labour
market segmentation schema has been globally incorporated. In addition to its
definition of a professionalised work organisation segment (class 5), it has
enabled us to differentiate three types of organisation within the entity that is
generally considered globally and called the secondary market (with classes 1, 2
and 3).

ACKNOWLEDGEMENTS

The present study was carried out under the auspices of a research agreement
between the DARES - France's Ministere de l'emploi et de la solidarite - and
MATISSE - Universite Paris 1 - covering a processing of the REPONSE 98
survey. See Lemiere S., Perraudin C. and Petit H. (2001) for a complete report of
this research. We would like to thank Marie Cottrell, Patrick Letremy,
Christophe Ramaux and Bernard Gazier for their advice and suggestions during
this project. We are particularly grateful to Patrick Letremy for having allowed
us to use his computer programmes, available at http://samos.univ-paris1.fr,
logiciels. We would like to thank Alan Sitkin (Transalver Ltd.) for the translation
of this paper.

APPENDIX

Table 2. Results of the MCA analysis

Singular Values  Principal Inertias  Chi-Squares  Percents

0.45536 0.20735 5855.96 11.40%
0.35163 0.12365 3491.99 6.80%
0.33483 0.11211 3166.24 6.17%
0.31729 0.10067 2843.15 5.54%
0.31580 0.09973 2816.55 5.49%
0.30901 0.09549 2696.73 5.25%
0.30481 0.09291 2623.96 5.11%
0.30077 0.09046 2554.78 4.98%
0.29664 0.08799 2485.09 4.84%
0.29346 0.08612 2432.14 4.74%
0.28914 0.08360 2361.08 4.60%
0.28176 0.07939 2242.04 4.37%
0.28031 0.07857 2219.07 4.32%
0.27711 0.07679 2168.63 4.22%
0.27304 0.07455 2105.51 4.10%
0.26938 0.07257 2049.43 3.99%
0.26590 0.07070 1996.72 3.89%
0.26069 0.06796 1919.33 3.74%
0.24727 0.06114 1726.80 3.36%
0.23755 0.05643 1593.71 3.10%

1.81818 51348.9 (d.f. = 900)
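A quick arithmetic check on this table (a general property of MCA on an indicator matrix, not something specific to this survey): the total inertia equals the number of modalities divided by the number of variables, minus one, and each axis's percentage is its principal inertia divided by this total.

```python
# Check on Table 2: with 31 modalities and 11 active variables, the total
# inertia of the MCA is 31/11 - 1, and the first axis's share follows.
n_modalities, n_variables = 31, 11
total_inertia = n_modalities / n_variables - 1
print(round(total_inertia, 5))                   # 1.81818, as in the table
print(round(100 * 0.20735 / total_inertia, 1))   # 11.4 (% for the first axis)
```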

[Classification tree: the sample first splits into a "Voluntarist policies" branch and a "Restrictive policies" branch.]
Figure 6. Classification tree.



NOTES

1. We regrouped into one and the same modality both the non-responses and the "Does not know",
when the latter stemmed from nested questions. It remains that this modality generally involved
a small sample size, leading us to only incorporate those establishments that never answered
"Does not know" or "missing" to any of the active variables.
2. Kohonen (1984; 1993; 1995). See Allison, Yin et al. 2001; Cottrell, Fort, Pages 1998; Cottrell,
Gaubert, et al. 1999; Oja, Kaski 1999, for presentations of these methods.
3. Our interpretation will only cover the first three factorial axes. This will allow us to account for
around 25% of the total inertia. See the appendix for the results of this analysis. Note that all
the modalities are represented, but we will only interpret those modalities that have the greatest
contribution to the axes' construction (as well as those that are well represented).
4. Here we are talking about modalities and not about variables, given the disparity between the
contribution (and even the quality) of certain modalities' representation (i.e., DEPFORM1 and
DEPFORM2 cannot be analysed along axis 3, unlike DEPFORM3 and DEPFORM4).
5. This finding appears to contradict our reading of axis 2 of the MCA, which associated a lesser
mobility with a work prescription that is expressed in terms of overall objectives. On the other
hand, it corresponds to our reading of axis 3. In addition, the result is a robust one, in that it is
able to withstand a repetition of the algorithm.

REFERENCES

Allison N., Yin H., Allinson L., Slack J. (2001), Advances in Self Organising Maps, Springer.
Boyer R., Beffa J.L., Touffut J.P. (1999), Employment relationships in France: the State, the firms
and the financial markets, Published by the Saint Simon foundation, December.
Cottrell M., Fort J., Pages G. (1998), "Theoretical aspects of the SOM algorithm", Neurocomputing,
21, 119-138.
Cottrell M., Gaubert P., Letremy P., Rousset P. (1999), "Analysing and Representing
multidimensional quantitative and qualitative data: Demographic study of the Rhone valley. The
domestic consumption of the Canadian families", in Oja E., Kaski S. (eds.), Kohonen Maps,
Elsevier, June, 1-14.
Galtier B. (1996), "Gerer la main d'œuvre dans la duree : des pratiques differenciees en
renouvellement", Economie et Statistique, 298, 45-70.
Kohonen T. (1984, 1993), Self-organization and Associative Memory, 3rd ed., Springer.
Kohonen T. (1995), Self-Organizing Maps, Springer Series in Information Sciences, 30, Springer.
Lemiere S., Perraudin C., Petit H. (2001), Regimes d'emploi et de remuneration des etablissements
français en 1998. Construction d'une typologie a partir de l'enquete REPONSE, rapport dans le
cadre de la convention d'etude sur l'Enquete REPONSE, pour le compte de la Direction de
l'Animation et de la Recherche, des Etudes et des Statistiques (DARES) du Ministere de
l'emploi et de la solidarite, 51 p., novembre.
Oja E., Kaski S. (eds.) (1999), Kohonen Maps, Elsevier.
Chapter 7

Measuring and optimizing the validity of


Means-End data

Jacques-Marie AURIFEILLE, Stephane MANIN


Laboratoire GREGEOI-FACIREM, Universite de La Reunion

Abstract: The validity of means-end data is discussed from a conceptual and empirical
perspective. A dynamic programming approach (Markov chain) is then proposed
that makes it possible to measure the validity and reliability of any group of means-end data
(item, hierarchical level, whole data basis), thus making it possible to purify it and to gain
empirical insight on debated topics such as the number of means-end levels to be
considered and the validity of the means-end data collection methods.

Key words: Means-end Chains, Markov Chains, Genetic Algorithms, Consumer Behaviour.

INTRODUCTION

Product attributes usually play a major role in a consumer's final choice.


However, they rarely suffice to define a lasting marketing strategy. This is so
because most attributes are constantly changing and copied, or because their
specificity is difficult to make explicit. Since Haley's initial proposition, many
researchers have commented that expected benefits might offer a more
meaningful understanding of consumer behaviour (Haley 1968). As a result, a
major challenge in consumer marketing has been to connect product attributes
with consumer's expected benefits.
A large variety of theoretical models and many statistical methods have been
proposed to formalize the association between product attributes and benefits.
However, these works have been hampered by the lack of a data collection
method that reflects the decision process underlying the attribute/benefit


association. The proposition of an open interviewing technique, known as


"laddering interview" (Gutman, Reynolds 1979; Reynolds, Gutman 1988),
provided a new opportunity to trace the attribute/benefit relationship. Starting
from a consumer's preferred product or most important attribute in a category of
products, a laddering interview progresses constantly into the respondent's more
personal and essential motivations by repeating the same question about the
previous reply: "Why is that important to you?". Almost implacably, this
inquisitorial method describes a single chain from object to subject, through
motives of higher abstraction and deeper personal meaning, ending when the
consumer cannot think of any higher order motive. The series was called a
"ladder" or "Means-end chain" (MEC) to express that each link was a means to
reach a more personal end (Gutman, Reynolds 1979; Gutman 1982). In means-
end theory, the "ultima ratio" is generally conceptualised as a terminal value
(Rokeach 1973). However, any end-state of being could be accepted (Gutman
1982, Walker, Olson 1991; Bagozzi, Dabholkar 1994; Pieters, Baumgartner,
Allen 1995; Gutman 1997).
When several starting points are important for the consumer, a different MEC
can be generated from each starting point, by considering them in order of
decreasing importance. On average, laddering interviews generate 3 MECs per
consumer (e.g. Olson, Reynolds 1983; Clayes et al. 1995; Gengler, Reynolds
1995), sometimes going up to 5 ladders (Reynolds, Jamieson 1984).

The MEC approach to understanding consumer behaviour has several


attractive features:
1. A dynamic conception of the relationship between product attributes and
consumer benefits,
2. A multi-process view of this relationship, in accordance with several
researchers' opinion that consumer decision-making results from
simultaneous and interactive processes (Cacioppo, Petty, Kao and Rodriguez
1986; Chaiken 1987; Aurifeille, Clerfeuille and Quester, 2001; Zajonc,
Markus 1982).
3. A structure of collected data which corroborates the theoretical framework.

However, these qualities raise considerable data analysis problems, with
the appropriate length of a MEC becoming a moot point among researchers.
Gutman (1982) proposed that a MEC comprises 3 levels: the "attribute" level, the
"consequence" level and the "value" level. Although some researchers strongly
advocate that only two links should be considered (ter Hofstede, Audenaert,
Steenkamp, Wedel 1998), a typical laddering interview would often generate
more than 3 successive links. Such empirical evidence supports the idea that each
one of Gutman's basic three levels could be split into two, following Rokeach's
(1973) distinction of two value levels (terminal and instrumental). The resulting

hierarchy (Olson, Reynolds 1983; Peter, Olson 1987; Pitts, Wong, Whalen 1991;
Reynolds, Gengler, Howard 1995) is summed up in Table 1.

Table 1. Structure of a MEC, example in the case of shampoo brands.

Level Example (shampoo)


Preferred brand B
Attribute concrete Made from plants
abstract Soft
Consequence functional Frequent washing
psycho-social Elegance
Values instrumental Self-confidence
terminal Accomplishment

Longer MECs, like those collected with laddering interviews, are likely to
reflect more precisely a consumer's means-end processes. However, because of
the large number of possible arrangements, such precision could be lost and
create unnecessary data analysis problems when the population's common MECs
are sought. On average, the number of possible MEC is 85, and the observed
MECs are rarely common to more than 2% of the sample. Many other questions
have not been rigorously addressed yet, in particular that of the number of MEC
to be collected from a consumer and that of the importance of a MEC. Indeed,
even if researchers agreed to analyse only one MEC per consumer, there would
be no method for choosing the appropriate MEC among those expressed by the
consumer.

To clarify these issues a quantitative method is proposed in this paper. In the


first part, the necessity of a measure of means-end validity is justified by
studying past means-end research, at both the data collection and the data
analysis levels. A definition of means-end validity is then proposed that is
consistent with the research discussed.
In the second part of the paper, a quantitative measure of means-end validity
is proposed. This measure, derived from dynamic programming research
(Markov chain modelling), can be applied to all levels of a means-end data
analysis: the item level, the hierarchical level and the data collection method
level. Coupled with the corresponding measure of means-end reliability, the
proposed methodology makes it possible to purify means-end data sets of their less
relevant items, where relevance means that suppressing the item would significantly
degrade the means-end validity or reliability of the remaining means-end
network. The proposed methodology is extended to groups of items, thus
making it possible to study the validity and reliability of any hierarchical level(s) or data
collection method. To avoid the limitations of the stepwise analysis commonly
used for such filtering problems, a genetic algorithm (Holland, 1975) is proposed

for identifying the most valid combination of means-end items. In the third part
of the paper, the procedure is tested empirically, using actual means-end data
collected from 509 consumers of beer.
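The exact validity and reliability measures are defined later in the chapter; as a hedged illustration of the Markov-chain ingredient only, a first-order transition matrix between means-end items can be estimated from a set of ladders (the item labels and ladders below are invented).

```python
# Illustrative sketch: estimating a first-order Markov transition matrix
# between means-end items from observed ladders (hypothetical data).
import numpy as np

items = ["A1", "A2", "C1", "C2", "V1"]
idx = {it: i for i, it in enumerate(items)}
ladders = [["A1", "C1", "V1"], ["A2", "C1", "V1"], ["A1", "C2", "V1"]]

counts = np.zeros((len(items), len(items)))
for ladder in ladders:
    for a, b in zip(ladder[:-1], ladder[1:]):    # successive links of the ladder
        counts[idx[a], idx[b]] += 1

row_sums = counts.sum(axis=1, keepdims=True)
transition = np.divide(counts, row_sums, out=np.zeros_like(counts),
                       where=row_sums > 0)       # row-normalised probabilities
print(np.round(transition, 2))
```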

1. DATA COLLECTION AND ANALYSIS: A


CONTINUING DEBATE ABOUT MEANS-END
VALIDITY

This section considers the validity of MECs from the perspectives of data
collection protocols and analysis procedures. The purpose is to make clear the
main problems and advantages inherent in each area of concern and to propose a
definition of means-end validity.

1.1 Data collection issues

When Reynolds and Gutman proposed the initial laddering approach, it


seemed that accepting only an answer at each question would generate the most
meaningful links. As a consequence, only one chain could result from an
attribute. This choice, however, could discard other important chains generated
by the same attribute, therefore raising the question of the "means-end validity"
of a chain or of a collection method.
Several alternate protocols have been proposed that allow more than one
answer to be derived from any item. For example, Young and Feigin (1975), in
their "Grey Benefit Chain", systematically asked for two motives on any one
item, while Valette-Florence and Rapacchi's (1991a) card-sorting protocol
allowed for multiple motives per item. In the card sorting method, respondents
were presented with n sets of cards corresponding to n a priori means-end steps.
The items and means-end steps were based on previously completed laddering
interviews. Thus, starting from an attribute, the respondent describes a tree-like
diagram of means-end processes by juxtaposing cards from each successive step.
To obtain an even broader collection of links, an Association Pattern Technique
(APT, ter Hofstede et al. 1998; Gutman 1982) approach may be used. In this
technique, respondents are presented with two tables: one crossing a priori items
with a priori consequences and another crossing a priori consequences with a
priori values. Thus, the respondents must consider each table separately and
indicate which links (i.e. table cells) are meaningful to them.
With many more associations identified at each step, the number of MECs
increases exponentially, thus imposing a drastic limit on the number of steps. As
a consequence, when laddering interviewing is avoided, it is common practice to

consider only two links: the attribute-consequence and the consequence-value


links.
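As a small illustration of the data structure produced by such protocols (a sketch only, reusing the shampoo items of Table 1), the APT yields two boolean matrices, one per pair of consecutive levels, in which a cell is ticked when the respondent judges the link meaningful.

```python
# Sketch of APT-style data: one attribute-consequence matrix and one
# consequence-value matrix per respondent (labels taken from Table 1).
import numpy as np

attributes   = ["made from plants", "soft"]
consequences = ["frequent washing", "elegance"]
values       = ["self-confidence", "accomplishment"]

attr_cons = np.zeros((len(attributes), len(consequences)), dtype=bool)
cons_val  = np.zeros((len(consequences), len(values)), dtype=bool)

attr_cons[0, 0] = True     # "made from plants" -> "frequent washing"
cons_val[0, 0]  = True     # "frequent washing" -> "self-confidence"
print(attr_cons.sum() + cons_val.sum(), "links ticked by this respondent")
```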
However, researchers have provided conceptual and empirical evidence that
means-end processes would be better reflected by considering a larger number of
steps. In particular, significant relations could be found between MEC length and
consumer dispositions such as cognitive abstraction, product familiarity and
involvement (Walker, Celsi, Olson 1987; Perkins, Reynolds 1988; Pitts, Wong,
Whalen 1991; Mulvey, Olson, Celsi, Walker 1994; Clayes, Swinnen, Van den
Abeele 1995). Moreover, the over-specification of a MEC structure results in
some paths not being elicited. For example, in face-to-face laddering interviews
some respondents connect attributes immediately to values, effectively skipping
steps required by an a priori definition of a means-end model, or conversely they
take more than two steps to arrive at an end point (Reynolds, Gutman 1988;
Reynolds, Craddock 1988; Grunert, Grunert 1995). Finding the right number and
the right length of consumers' MEC could have important operational
consequences, in particular for advertising conceptualisation (Olson, Reynolds
1983; Reynolds, Gutman 1984; Reynolds, Craddock 1988; Reynolds, Rochon
1991). For instance, posters would use shorter MEC than a magazine (Aurifeille,
Valette-Florence 1995).

1.2 Data analysis issues

Schematically, two main approaches have been used to analyse means-end


data: one hierarchical and one multidimensional.
With hierarchical methods, concatenating the most frequent links and drawing
a graph of the resulting paths identifies the dominant chains. Reynolds and
Gutman's HVM (Hierarchical value maps) are typical of this approach
(Reynolds, Gutman 1988). Graph theory was also used in this perspective
(Valette-Florence, Rapacchi 1991b): the capacity of a MEC to convey means-end
flows from attribute to values would depend on the total capacity (i.e. frequency)
of its links. Numerous publications and algorithms have been dedicated to the
hierarchical method, with several of them providing sound results and
unambiguous graphs (Olson, Reynolds 1983; Reynolds, Gutman 1988; Reynolds,
Gengler 1995; Aurifeille, Valette-Florence 1995).
However, these hierarchical methods sometimes generate inconsistent
pathways. This is so for a number of reasons. First, as illustrated in figure 1
where the thickness of lines is used to trace the replies of 30 consumers, it is
possible that continuity of concatenated means-end flows is an artefact of the
method. Thus, consumers linking It1 to It6 are not the same as those linking It6
to It10. Rather, the consumers linking It6 to It10 have previously linked either It2
to It6 or It3 to It6. In a like manner, the chain of more frequent links [It1 > It6 >

It9 > It10] is composed of three groups of consumers. Thus, [It1 > It6] proceeds
from It1, [It6 > It9] proceeds from It2 and It3, and finally [It9 > It10] proceeds
from It4 and It5.

Figure 1. An example of means-end inconsistency

A second reason why hierarchical methods can generate inconsistent paths is


that low frequency links, being more numerous, prevail and, when added at a
subsequent level, may dominate the more homogeneous links. To avoid this
problem, several authors have applied an arbitrary "cut-off' level. Thus, any link
with a frequency below the cut-off level is not considered (Reynolds, Gutman
1988; Gengler, Reynolds 1995; Gengler, Klenovsky, Mulvey 1995; Pieters,
Baumgartner, Allen 1995). Unfortunately, whenever a cut-off is operated, the
raw data should be re-conditioned by removing all the following links in the
consumers' ladders. To our knowledge, revision of data in this way has never
been carried out. This may be because re-conditioning the data would require a
re-consideration of the cut-off level, as the total frequencies in the matrix will
have changed, thus making the fixing of a correct cut-off level extremely
complex.
The complexity of the problems raised by instituting cut-off levels suggests
an alternate approach: to filter both items and links that convey more noise than
valid means-end information. Multi-dimensional analyses have been used to
complete filtering in this way and so uncover the latent structure of the means-
end relationships. These approaches work by focusing on the frequency with
which two items occur in the same ladder, whatever the number of the links
between items. Thus, co-occurrences serve as measures of means-end association, and researchers use them to identify means-end chains by positioning items in a "means-end space" and/or by clustering them (e.g. Aurifeille, Valette-Florence
1995). Although these analyses provide more information about the latent means-
end relationships, the resulting maps or clusters are not constrained by a
hierarchical model, thus raising interpretation problems and violating some basic
assumptions. For instance, the clusters often comprise no terminal value or no
starting point (e.g. Valette-Florence, Rappachi 1991b).
To overcome the interpretation problem and to estimate more precisely the
role of the items within the MEC, a growing number of predictive studies have
been undertaken. These techniques benefit from the increasing availability of
sophisticated regression methods such as ordinal, multinomial and mixture
models (Reynolds, Sutrick 1986; Perkins, Reynolds 1988; Mulvey, Olson, Celsi,
Walker 1994; ter Hofstede, Steenkamp, Wedel 1999). With the inclusion of
interaction effects in the tested models and the advances in latent class
methodology (Wedel, DeSarbo 1995), it seems that a general method of
accounting for latent structures and means-end causality may now emerge.
However, with these formal methods, it is likely that statistical constraints will
continue to limit the complexity of the tested model. Thus, the number of items
and, more importantly, the number and the kind of possible interaction terms will
be radically limited, either at the data analysis level or at the data collection level.
As an example, the ter Hofstede et al. (1998) study is limited to second-order interactions. Similarly, the ter Hofstede et al. (1999) study applies to data that comprise exactly two steps (attribute > consequence and consequence > value) (ter Hofstede, Steenkamp, Wedel 1999). The discussion in section 1.1 suggests that this a priori and very restrictive framework could be excessive.

1.3 A definition of means-end validity

In the previous sections, it has been argued that means-end data collection has suffered from the limitations of available statistical analyses. Some statistical methods exert structural constraints on data collection which may be excessive (e.g. the APT approach). Others allow more flexible data collection (e.g. laddering) but lack the structural guidelines necessary for achieving interpretable results (e.g. multidimensional analysis and clustering).
The converse perspective on MEC research would see the development of statistical techniques that respect three criteria. First, they must exert as little
constraint as possible on the data to be measured (e.g. no a priori number of
links). Second, they need to be consistent with the reality of means-end processes
by adopting a dynamic approach (e.g. by accounting for the order of elicitation).
Third, they must be capable of checking means-end validity at every data level
(item, main hierarchical steps, and whole database), so that a better understanding of the phenomenon may be developed using relevant filtering decisions.
To achieve such measurement, a definition of means-end chain validity must
also comply with all the conceptions discussed above. Therefore, we propose the
following formulation:
"The means-end validity of a concept (item, level, network) is its capacity to
convey means-end flow from product stimuli to consumers' terminal goals".
In the next section, a measure of means-end validity is presented that fits the
three criteria as well as the above definition. It is based on a measure of validity
called "internal predictivity", developed in Markov chain modelling for comparing different dynamic processes.

2. A MARKOV CHAIN APPROACH TO MEANS-END VALIDITY

This section is composed of two parts. In the first, the use of a dynamic programming approach to means-end validity is justified, as is the concept of internal predictivity. In the second part, the method is illustrated empirically using actual data from cigarette smokers.

2.1 Markov chains and means-end chain analysis

Dynamic programming is a set of methods for analyzing recursive processes that are repeated several times following the same rules. These techniques are commonly known as Markov chains, after the Russian mathematician Andrei Andreevich Markov (1907). Markov chain theory continues to be developed, in particular in the areas of cognitive sciences and artificial intelligence. Its specific strength is extracting probabilistic decision trees from complex dynamic models (Puterman 1994; Ghahramani, Jordan 1996; Jordan, Ghahramani, Saul 1997).
Although consumer behaviour processes rarely repeat themselves exactly,
there are time patterns that a Markovian approach may successfully uncover. In
Marketing, this approach has been used to model brand switching and, therefore,
estimate long-term market shares (Meyer, Kahn 1990). Since then, the
application of Markov chains in Marketing has broadened, for instance with
studies examining the successive states of dispersion resulting from word-of-
mouth (Eliashberg, Jonker, Sawhney, Wierenga 2000). In the basic Markov
model, each relationship is assumed to capture a percentage of the previous
relationship and to predict a part of the next relationship. These percentages are
Measuring and optimizing the validity of Means-End data 153

normalized so that any item has a 100% probability of being linked. The resulting
matrix (called "transition matrix") gathers the probability of a transition between
any two items. The global transition process can then be simulated over N
periods by raising the transition matrix to its Nth power. As a result, the most
probable connections are emphasised, while the least probable are reduced to a
lesser role. In the vast majority of cases the result of this process is that the
probabilities of transition converge to stable values. Hence, the incidental
elements of the initial probabilities are progressively filtered, leaving a weighted
graph of the significant relationships.
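
As an illustration of this mechanism (a minimal sketch, not the chapter's implementation: the item labels, the toy ladders and the use of numpy are our own assumptions), the following code builds a row-normalised transition matrix from a handful of ladders and raises it to its Nth power:

```python
import numpy as np

# Hypothetical ladders (ordered item indices), for illustration only.
ladders = [[0, 2, 4], [1, 2, 4], [0, 3, 4], [1, 3, 2, 4]]
n_items = 5

# Count direct transitions i -> j over all ladders.
counts = np.zeros((n_items, n_items))
for ladder in ladders:
    for i, j in zip(ladder[:-1], ladder[1:]):
        counts[i, j] += 1

# Row-normalise so that every (non-terminal) item has probability 1 of being linked.
row_sums = counts.sum(axis=1, keepdims=True)
transition = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Simulate the process over N steps by raising the matrix to its Nth power;
# incidental links fade while the dominant connections are emphasised.
N = 10
print(np.linalg.matrix_power(transition, N).round(3))
```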
Means-end chains are very similar to a Markov chain in that they are
composed of sequential decision-making steps with discrete states and
probabilistic outcomes that can be conveniently translated into a graph (a
"HVM"). In both cases, the objective is to reduce the uncertainty, also known as
"entropy" in Information Theory, in the observed phenomena by translating the
significant information into the simplest possible graph. In order to do so a
measure of entropy is needed. Several measurements have been proposed to cope
with the variety of the existing problems. Most of them are based on the strength
of the associations (e.g. Cramer's V, Goodman and Kruskal's λ and γ). A "u" measure, based on Shannon's entropy, was also proposed by Theil (1971) to quantify how much one variable (e.g. a row) predicts another variable (e.g. a
column). These association measures can be used to quantify the entropy within a
transition matrix and, therefore, to research ad hoc methods to reduce this
entropy. However, association measures do not account for the dynamic
characteristic of the processes. This can be illustrated with the limit case of a
transition matrix whose rows are identical and whose columns are different.
According to the measures of entropy mentioned above, the association between
rows and columns is zero, meaning that no information is provided. However
there clearly is some information: the probability of a column is equal to its score
across the rows. For this reason, the measure of entropy used in this paper is
slightly different from the existing association measures. It has been recently
proposed to deal with dynamic processes by Berchtold and Ritschard (1996). Using
Shannon's entropy, these authors propose to measure the predictivity of a
transition matrix, that is the ability of any item of the process to predict any other
item. In terms of MEC theory, the predictivity measure would reflect the ability
of an item to convey means-end flow, thus corresponding to our definition of
means-end validity (section 1.3).
The measure of predictivity of a transition matrix that Berchtold and Ritschard (1996) propose is made at the item level, then averaged and adjusted for item frequency. Empirical comparisons with alternative measures indicate that this measure has a normal asymptotic distribution and is particularly reliable and valid (Berchtold 1997).
P(M) = 1 + \frac{\sum_{i} n_i \sum_{j} p_{ij} \log_2 p_{ij}}{N \log_2 c}     (1)

where:
- p_{ij} is the probability of a direct link between i and j,
- n_i is the number of occurrences of item i, and N is the total number of occurrences of all items,
- c is the number of columns with a non-zero sum,
- P(M) lies between 0 and 1, with 1 representing 100% predictivity.
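
A direct transcription of equation (1) might look as follows (our reading of the formula, not the authors' code; n_i is taken as the row total of the count matrix, N as the grand total and c as the number of non-empty columns):

```python
import numpy as np

def predictivity(counts):
    """Predictivity P(M) of a transition-count matrix, following equation (1)."""
    counts = np.asarray(counts, dtype=float)
    n_i = counts.sum(axis=1)                  # occurrences of each row item
    N = counts.sum()                          # total number of occurrences
    c = int((counts.sum(axis=0) > 0).sum())   # columns with a non-zero sum
    row_sums = n_i[:, None]
    p = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log2(p), 0.0)
    return 1.0 + (n_i * plogp.sum(axis=1)).sum() / (N * np.log2(c))

# Example: a perfectly deterministic matrix has zero entropy, hence P(M) = 1.
print(predictivity([[0, 3, 0], [0, 0, 3], [3, 0, 0]]))   # -> 1.0
```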

P(M) has the additional advantage of accounting for the reliability of the
estimates. This is useful for comparing different matrices, because the total
number of occurrences has a non-linear effect on the distribution of probabilities.
Getting into the theoretical and empirical justification of this measure would
go beyond the scope of the present paper. Therefore the readers are invited to
consult Berchtold and Ritschard (1996) as well as Berchtold (1997). Let us just stress the difference between the dynamic probabilities of an n-order transition
matrix and static association measures such as item co-occurrences. For example,
with the means-end data analysed in the next section (2.2), the correlation
between the markovian probabilities and the number of co-occurrences is -0.079,
with a probability of 0.099 (Bartlett's chi-square). Following the observations
made in other fields of consumer behavioural processes, this discrepancy
suggests that a Markovian approach may better reduce the entropy within a
means-end chain process.
The definition of the proposed predictivity measure enables three fundamental
analyses:

1. Checking the means-end predictivity of the items and filtering the least valid ones. Typically, it could follow a backward stepwise approach, removing one by one the least valid items until the matrix P(M) score is significantly degraded.

2. Checking the means-end predictivity of a hierarchical level, thus indicating how long the considered MECs could be and providing some insights on means-end theory.

3. Comparing the predictivity of means-end networks collected using different collection protocols.
The selection of the optimal combination of items or levels to remove could be done in a stepwise manner. However, in highly combinatorial problems like
the MEC processes, this classical approach has limitations that may result in sub-
optimal and non-robust results. Instead, we propose to use a bio-mimetic
heuristic: the Genetic Algorithm (GA, Holland 1975).

2.2 The genetic algorithm approach

The basic principle of bio-mimetic methods is that an optimal solution is more likely to emerge if a multiplicity of agents can search for it simultaneously and
exchange information. This multiplicity means that a GA is less dependent on the
heuristic's starting point(s) and more robust against local minima, when compared to most heuristics based on a single agent.
A GA solution consists of a vector containing possible values of the parameters
to be estimated. The corresponding score, on the function to optimise,
characterises this vector. A simple mechanism of selection/reproduction is
followed to search for better scores. Those solutions with better scores have their
parameter estimates spread more widely among the possible outcomes.
Conversely, poor scores increase the likelihood that parameter estimates will be
discarded. Through a continuing process of selection and reproduction the range
of possible solutions converges towards a single answer, which is composed of the best parameter estimates.
The selection/reproduction mechanism of a GA is directly inspired by genetic reproduction. Each solution is coded in binary terms, thus allowing a
"cross over" process analogous to the one happening among chromosomes. This
cross over capability has two qualities. First, it is extremely simple and so
effectively compensates for the burden of considering several simultaneous
solutions. Second, there is no dependence on statistical (e.g. distributional) or
mathematical (e.g. linear independence, derivability) hypotheses, thus enabling a wider range of solutions to be explored.
The optimisation capacity and robustness of GAs have been observed in many studies (e.g. Renders 1995; Man, Tang, Kwong 1999; Aurifeille 2000).
These results are particularly notable when derivative-less constraints and binary
parameters are involved, as in the case of means-end predictivity.

A basic GA is as follows:

1. Initialisation: Fix the number of parameters, fix the number of chromosomes (C = 30 to 120), fix the number of reproductions in each iteration (R = 60% to 80% of the chromosomes) and fix the number of mutations (M = 1% to 1‰ of the parameters of the whole chromosome population). Assuming that P is
the number of parameters to estimate, draw at random C series of P parameters, one per chromosome. Position the parameters at the same location in all chromosomes.
2. Measure of the chromosomes' scores: Measure the score, s_i, of each chromosome, c_i, whose score has not yet been measured. If the best score has not changed for more than 2·C iterations, go to step 6. Otherwise go to step 3.
3. Selection/reproduction: Repeat R·C/2 times the following procedure. Draw two chromosomes, c_i and c_j, with a probability proportional to their scores. Choose a random cutting point. Generate two new chromosomes, c_ij and c_ji, by swapping the parts of their codes located after the cutting point.
4. Elimination: Draw at random R·C older chromosomes and replace them by the R·C new chromosomes. The best chromosome cannot be replaced.
5. Mutation: Draw at random M·C·P parameters and replace them by their complement to 1. Go to step 2.
6. End.
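
The loop above could be sketched as follows (a simplified illustration, not the authors' implementation; the population size, rates and the toy fitness function are placeholders — in the present application the fitness would be the means-end validity of the item subset encoded by the chromosome, and fitness values are assumed to be positive):

```python
import random

def genetic_algorithm(fitness, n_params, n_chrom=40, renewal=0.6,
                      mutation=0.01, max_stall=80):
    """Basic GA over binary chromosomes, following steps 1-6 above."""
    pop = [[random.randint(0, 1) for _ in range(n_params)] for _ in range(n_chrom)]
    scores = [fitness(c) for c in pop]
    best, best_score, stall = list(pop[0]), scores[0], 0
    while stall < max_stall:
        # Selection/reproduction: parents drawn with probability proportional to score.
        children = []
        for _ in range(int(renewal * n_chrom / 2)):
            p1, p2 = random.choices(pop, weights=scores, k=2)
            cut = random.randrange(1, n_params)
            children += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
        # Elimination: random older chromosomes are replaced (elitist: keep the best).
        for child in children:
            idx = random.randrange(n_chrom)
            if pop[idx] != best:
                pop[idx], scores[idx] = child, fitness(child)
        # Mutation: flip a small proportion of all bits in the population.
        for _ in range(max(1, int(mutation * n_chrom * n_params))):
            i, j = random.randrange(n_chrom), random.randrange(n_params)
            if pop[i] != best:
                pop[i][j] ^= 1
                scores[i] = fitness(pop[i])
        # Track the best solution and the stopping criterion.
        top = max(range(n_chrom), key=lambda k: scores[k])
        if scores[top] > best_score:
            best, best_score, stall = list(pop[top]), scores[top], 0
        else:
            stall += 1
    return best, best_score

# Toy usage: maximise the number of 1s in a 20-bit chromosome.
print(genetic_algorithm(lambda c: sum(c) + 1e-9, 20))
```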

In the present study, the chromosomes are possible sets of items, with a structural constraint that a solution comprises at least two items, one of which is among the terminal goals. This derivative-less constraint is particularly easy to implement in a GA. Chromosomes that do not respect it are purely and simply discarded, meaning that the reproduction process goes on until all chromosomes conform to the necessary structure. The objective function to optimise is the predictivity of the transition matrix corresponding to the selected set of items. However, this filtering may degrade the matrix's reliability; that is, the transition probabilities may become less representative of those of the whole population. The
existing measures of reliability are adapted to continuous or binomial
distributions of probability and do not apply well to the discrete probabilities of
Markov chains. Several aggregation methods have been proposed to translate this
distribution into a binomial one, but they imply a significant reduction of the
information. Berchtold and Ritschard (1997) have then proposed a measure of
unreliability that overcomes this drawback and allows the comparison of
different matrices. Using the parameters defined in (1) the unreliability of a
transition matrix M is measured as:

R(M) = \frac{1}{n} \sum_{i=1}^{c} \frac{n_i \left(1 - \min_j (s_{ij})\right)}{n_i + 1}     (2)

where R(M) is positive, 0 denoting the greatest reliability.


Since two criteria are considered, predictivity and reliability, means-end validity can be defined and maximized as the predictivity/unreliability ratio. An
item, or a group of items, whose removal does not degrade means-end validity
can be considered as non-valid and filtered. Thus, not only could transition
matrices be purified and represented more simply but some information is also
gained about which items or hierarchical levels are not means-end valid.
Furthermore, the means-end validity of the filtered transition matrices resulting
from different data collection methods may be analysed, thus informing about the
appropriate collection method and length of a MEC.

2.3 Empirical study, discussion

This section is based on a means-end database of 482 ladders, collected following Reynolds and Gutman's laddering protocol. Respondents were a
sample of cigarette smokers, complying with Western European smokers' age,
gender and consumption levels. Fifty-five items were considered: 14 attributes (8 concrete and 6 abstract), 21 consequences (10 functional and 11 psychosocial) and 20 values (10 instrumental and 10 terminal). The stimulus was the preferred brand. As usual with laddering interviews, the lengths of the collected ladders varied from 2 to 8, with an average of 4.593 and a standard deviation of 0.873.
The genetic algorithm was operated with 35 chromosomes, a 60% renewal rate and a 1% mutation rate. These values are within the usual range of recommended ones (Man, Tang, Kwong 1999; DeJong 1975). The "elitist" rule was followed (Goldberg 1991), meaning that the best solution was systematically preserved from replacement and mutation. In addition, two cutting points were implemented, rather than one, following recommendations by DeJong (1975) as well as Spears and DeJong (1991). The heuristic could be stopped at the 200th iteration.

The initial values of predictivity and reliability were respectively 0.24 and 0.91. The optimised values were 0.408 and 0.95, corresponding to a ratio of means-end validity equal to 0.448 and indicating a progression of both criteria.

The 10 terminal values could be parsimoniously achieved with a selection of 28 transitory items among the 45 initial ones. Table 2 shows the proportion of items removed from each hierarchical level:
Table 2. Percentage of removed items in each hierarchical level

Hierarchical level              Removed items
Attributes      concrete        4/8
                abstract        1/6
Consequences    functional      5/10
                psycho-social   4/11
Values          instrumental    3/10

Two hierarchical levels seem to be more affected by the removal of some items, namely the functional consequences and the concrete attributes. Conversely, a majority of abstract attributes has been maintained. These results do not mean that the concrete attributes or functional consequences are unimportant levels: they might just indicate that fewer items of these levels are necessary to reach the terminal values.
To check the validity of any abstraction level, the same procedure should be used. Assuming that P_s(A) is the validity of the process when a specific abstraction level A is skipped, the validity measures in Table 3 are equal to 1 − P_s(A): the higher the value, the higher the validity of level A.

Table 3. Validity of the means-end levels

Hierarchical level (A)              Validity
Whole process                       0.732
Attributes      concrete            0.718
                abstract            0.738
Consequences    functional          0.737
                psycho-social       0.719
Values          instrumental        0.736

It is apparent from Table 3 that two levels are lowering the global validity of the process: the concrete attributes and the psychosocial consequences. This result is consistent with the observation that a majority of concrete attributes had a low validity (Table 2). Therefore, the hypothesis that the concrete attribute level may be skipped cannot be rejected.
Furthermore, the validity of the database was checked with the two lower
validity levels cancelled simultaneously. The resulting validity was 0.3059,
outperforming all those obtained previously. Hence, in this sample of cigarette
smokers, it appears that a better means-end connection exists between brands and
terminal values when only three levels are considered: the abstract attributes, the
functional consequences and the instrumental values. This result gives some
empirical credence to the concept that a limited number of hierarchical levels
might suffice and, more importantly, that these levels should be divided
according to Gutman's initial distinction of attributes, consequences and values.
However, this a posteriori result may depend heavily on the type of product and on the data collection method. Therefore, a cautious approach would be to follow systematically the stepwise filtering presented here.
The same validity analysis may be extended to whole databases to indicate
which has the best means-end validity and, therefore, inform on the validity of
the corresponding data collection measures. To illustrate this approach, the
relationship between MEC length and means-end validity is studied with the
cigarette data in two ways. First, by measuring the means-end validity
corresponding to all ladders with a number of items inferior or equal to a specific
value, and second, by measuring the means-end validity of two groups of ladders
(i.e. those with 4 items or less, and those with more than 4 items). The results are
given in Table 4.

Table 4. Relationship between means-end validity and MEC length

Number of items    Validity    Number of MECs
≤ 2                0.3257        3
≤ 3                0.337        51
≤ 4                0.3700      200
≤ 5                0.3547      433
≤ 6                0.3320      474
≤ 7                0.3240      481
≤ 8                0.3240      482
> 4                0.3568      282

The best means-end validity is obtained by retaining the MECs with 4 items or fewer. These MECs have a validity of 0.3700, versus 0.3568 for those with more than 4 items. These results are consistent with those of Table 3, and so confirm that a length of 4 items or less is associated with the best means-end validity levels. Since this consistency was observed with elements of a split sample, it indicates that the proposed method is robust.

CONCLUSION

In this study about means-end chains, the conceptual and empirical debate among researchers has been analysed at the data collection and data analysis levels. This discussion has laid stress on two necessities:

- to find a relevant analysis method which exerts minimal constraints on the data collection protocols and complies with the dynamics of the means-end process;
- to define a measure of means-end validity which enables the analysis of two main pending issues: the number of links to be considered in a process and the filtering of the data.

A Markov chain approach was then proposed to satisfy the first necessity in the case of transient processes. Furthermore, the second necessity could be addressed by proposing an indicator for the latent concept of "means-end validity". This indicator is based on the measure of "internal predictivity" recently developed in Markov chain modelling. It indicates to what extent any item of a dynamic process is able to transmit its input to other items. This measure has useful statistical characteristics for analysing means-end data. First, it is based on a measure at the item level, thus allowing the analysis of different levels of aggregation (item, abstraction level, whole database). Second, it is asymptotically normally distributed and adjusted for the number of items and their frequencies, thus allowing a wide range of comparisons.

A genetic algorithm enabled the filtering of low-relevance items, the determination of the optimal number of links and the comparison of means-end data whose structure reflected different collection methods. It was empirically illustrated with 482 ladders from cigarette smokers. The empirical results were consistent at three levels:

- At the item level, a greater validity of the means-end network was obtained
by identifying and cancelling the items with the lowest means-end validity,

- At the main abstraction levels, it was observed that 3 transitory levels provided better means-end validity than the 5 initially identified. Interestingly, the three remaining levels corresponded to those of the basic means-end hierarchy: attributes, consequences and values. More precisely, the remaining transitory levels are those of the abstract attributes, the functional consequences and the instrumental values,

- At the database level, the illustration was done by splitting the data into sub-samples with contrasted MEC lengths. Consistent with the analyses at the item and main abstraction levels, it appeared that the shorter MECs, in particular those with 4 items, have a better means-end validity than the longer ones.

Overall, the empirical test of the method did not reveal any conceptual or statistical anomaly. The consistency of its results across the sub-samples is also an indicator of some robustness. However, the study should be replicated with other databases. If the consistency and robustness of the method were confirmed, there would at last be a possibility of testing the different data collection protocols. This would require dual databases, e.g. one from face-to-face laddering and one from a paper-and-pencil task. When this is done, it should be easier to identify and represent the dominant MECs hidden in a database. Though this paper was positioned upstream of this question, the proposed method should provide results analogous to those classically obtained with Markov chain models, that is, trees with links whose length reflects their importance.

REFERENCES
Alwitt L.F. (1991), "Analysis Approaches To Moment By Moment Reactions To Commercials",
Advances in Consumer Research, 18,550-551.
Aurifeille J.-M., Clerfeuille F. Quester P.G. (2001), "Consumers' attitudinal profiles and
involvement", Advances in Consumer Research, XXVIII, June.
Aurifeille J.-M., (2000), "Methodological and empirical issues in market segmentation: a
comparison of the formal and the biomimetic methods", Fuzzy Economic Review, 5(1), 43-60.
Aurifeille J.-M., Valette-Fiorence P. (1995), "Determination of the Dominant Means-End Chains",
International Journal of Research in Marketing, 12,267-278.
Bagozzi R.P., Dabholkar P.A. (1994), "Consumer Recycling Goals and their Effects on Decisions
to Recycle: A Means-End Chain Analysis", Psychology & Marketing, 11(4), 313-340.
Berchtold A. (1997), "Learning in Homogeneous Markov Chains", Proceedings of the 7th congress of the AIDRI, Switzerland: Geneva University, 121-125.
Berchtold A., Ritschard G. (1996), "Le Pouvoir Prédictif des Matrices de Transition", Actes des 28èmes Journées de Statistique, Québec: Université de Laval, 164-167.
Cacioppo J.T., Petty R.E., Kao C.F., Rodriguez R. (1986), "Central and Peripheral Routes to Persuasion", Journal of Personality and Social Psychology, 51, 1032-1043.
Chaiken S. (1987), "The Heuristic Model of Persuasion", in Zanna M., Olson J., Herman C. (eds.),
Social Influence: The Ontario Symposium, 5, 143-177.
Clayes C., Swinnen A., Van den Abeele P. (1995), "Consumers' Means-End Chains for "Think" and "Feel" Products", International Journal of Research in Marketing, 12, 193-208.
DeJong K. (1975), "The analysis and behaviour of a class of genetic adaptative systems", PhD thesis, University of Michigan.
Eliashberg J., Jonker J.J., Sawhney M.S., Wierenga B. (2000), "MOVIEMOD: An Implementable
Decision Support System for Pre-Release Market Evaluation of Motion Pictures", Marketing
Science, 19(3), 226-243.
Gengler C.E., Klenosky D.B., Mulvey M.S. (1995), "Improving the Graphic Representation of
Means-End Results", International Journal of Research in Marketing, 12,245-256.
Gengler C.E., Reynolds T.J. (1995), "Consumer Understanding and Advertising Strategy: Analysis
and Strategic Translation of Laddering Data", Journal of Advertising Research, July-August,
19-33.
Ghahramani Z., Jordan M.I. (1996), "Factorial Hidden Markov Models", in Tesauro G., Touretzky D.S., Leen T.K. (eds.), Advances in Neural Information Processing Systems 7, MIT Press, Cambridge, MA.
Goldberg, D.E. (1991), Genetic Algorithms, Addison-Wesley, USA.
Gutman J. (1982), "A Means-End Model Based on Consumer Categorization Processes", Journal
of Marketing, 46, Spring, 60-72.
Gutman J. (1997), "Means End Chains as Goals Hierarchies", Psychology & Marketing, 14(6),
545-560.
Gutman J., Reynolds T.J. (1979), "An Investigation of the Levels of Cognitive Abstraction Utilized
by the Consumers in Product Differentiation", in Eighmey J. (ed.), Attitude Research Under the
Sun, American Marketing Association, Chicago, 128-150.
Grunert K.G., Grunert S.C. (1995), "Measuring Subjective Meaning Structures by the Laddering Method", International Journal of Research in Marketing, 12, 209-225.
Haley R.I. (1968), "Benefit Segmentation: A Decision Oriented Research Tool", Journal of
Marketing, 32, July, 30-35.
Holland J.H. (1975), Adaptation in natural and artificial systems, MIT Press.
Jordan M.I., Ghahramani Z., Saul L.K. (1997), "Hidden Markov decision trees", in Mozer M.C., Jordan M.I., Petsche T. (eds.), Advances in Neural Information Processing Systems 9, MIT Press, Cambridge, MA.
Man K.F., Tang K.S., Kwong S. (1999), Genetic Algorithms, Springer.
Meyer R.J., Kahn B.E. (1990), "Probabilistic Models of Consumer Choice Behaviour", in
Kassarjian H., Robertson T.(eds.), Handbook of Consumer Behaviour: Theoretical and
Empirical Constructs, Englewood Cliffs, N.J.: Prentice Hall, 85-123.
Mulvey M.S., Olson J.C., Celsi R.L. Walker B.A. (1994), "Exploring the Relationships Between
Means-End Knowledge and Involvement", Advances in Consumer Research, 21, 51-57.
Olson J.C., Reynolds T.J. (1983), "Understanding Consumers' Cognitive Structures: Implications
for Advertising Strategy", in Percy L., Woodside A.G. (eds.), Advertising And Consumer
Psychology, Lexington, M.A., Lexington Books, 77-90.
Perkins W.S., Reynolds T.J. (1988), "The Explanatory Power of Values in Preference Judgements:
Validation of the Means-End Perspective", Advances in Consumer Research, 15, 122-126.
Peter J.P., Olson J.C. (1987), "Consumer Behaviour: Marketing Strategy Perspectives",
Homewood, IL, Irwin.
Pieters R., Baumgartner H., Allen D. (1995), "A Means-End Approach to Consumer Goal
Structures", International Journal ofResearch in Marketing, 12, 227-244.
Pitts R.E., Wong J.K., Whalen D.J. (1991), "Consumers' Evaluative Structures in two Ethical Situations: A Means-End Approach", Journal of Business Research, 22, 119-130.
Puterman M.L. (1994), Markov Decision Processes: Discrete Stochastic Dynamic Programming,
John Wiley and Sons.
Renders J.-M. (1995), Algorithmes genetiques et reseaux de neurones, Hermes.
Reynolds T.J., Craddock A.B. (1988), "The Application of the MECCAS Model to the
Development and Assessment of Advertising Strategy", Journal ofAdvertising Research, April-
May, 43-54.
Reynolds T.J., Gengler C.E., Howard D.J. (1995), "A Means-End Analysis of Brand Persuasion Through Advertising", International Journal of Research in Marketing, 12, 257-266.
Reynolds T.J., Gutman J. (1984), "Advertising is Image Management", Journal of Advertising
Research, 24(1), February-March, 27-37.
Reynolds T.J., Gutman J. (1988), "Laddering Theory, Method, Analysis, and Interpretation",
Journal ofAdvertising Research, February-March, 11-31.
Reynolds T.J., Jamieson L.F. (1984), "Image Representations: An Analytical Framework", in
Jacoby J., Olson J. (eds), Perceived Quality of Products, Services and Stores, Lexington, MA,
Lexington Books.
Reynolds T.J., Rochon J.P. (1991), "Means-End Based Advertising Research: Copy Testing is not
Strategy Assessment", Journal ofBusiness Research, 22, 131-142.
Reynolds T.J., Sutrick K. (1986), "Assessing the Correspondence of One or More Vectors to a
Symmetric Matrix Using Ordinal Regression", Psychometrika, 51(1), 101-112.
Rokeach M.J. (1973), The Nature of Human Values, New York, Free Press.
Spears W.M., DeJong K. (1991), "An analysis of Multi-point crossover", in Rawlins G.J.E. (ed.),
Foundations of Genetic Algorithms, 301-315.
ter Hofstede F., Audenaert A., Steenkamp J.-B.E.M., Wedel M. (1998), "An Investigation Into the Association Pattern Technique as a Quantitative Approach to Measuring Means-End Chains", International Journal of Research in Marketing, 15, 37-50.
ter Hofstede F., Steenkamp J.-B.E.M., Wedel M. (1999), "International Market Segmentation Based on Consumer-Product Relations", Journal of Marketing Research, XXXVI, 1-17.
Theil H. (1971), "On the Estimation of Relationships Involving Qualitative Variables", American Journal of Sociology, 76, 103-154.
Valette-Florence P., Rappachi B. (1991b), "Improvements in Means End Chain Analysis Using Graph Theory and Correspondence Analysis", Journal of Advertising Research, 31, 30-45.
Walker B., Celsi R., Olson J.C. (1987), "Exploring the Structural Characteristics Of Consumers'
Knowledge", Advances In Consumer Research, 14, 17-21.
Walker B.A., Olson J.C. (1991), "Means-End Chains: Connecting Product With Self", Journal of Business Research, 22, 111-118.
Wedel M., DeSarbo W.S. (1995), "A Mixture Likelihood Approach for Generalized Linear
Models", Journal of Classification, 12, 21-55.
Young S., Feigin B. (1975), "Using the Benefit Chain for Improved Strategy Formulation", Journal of Marketing, July, 72-74.
Zajonc R.B., Markus H. (1982), "Affective and Cognitive Factors in Preferences", Journal of
Consumer Research, 9, 123-131.
Chapter 8

Synergy modelling and financial valuation:
the contribution of Fuzzy Integrals

Xavier BRY a, Jean-François CASTA b

a CEREMADE CNRS 7534, Université Paris Dauphine, bryxavier@yahoo.fr
b CEREG CNRS 7088, Université Paris Dauphine, France, casta@dauphine.fr

Abstract: Financial valuation methods use additive aggregation operators. But a patrimony
should be regarded as an organized set, and additivity makes it impossible for these
aggregation operators to formalize such phenomena as synergy or mutual inhibition
between the patrimony's components. This paper considers the application of fuzzy
measures and fuzzy integrals (such as Sugeno, Grabisch, Choquet) to financial
valuation. More specifically, we show how integration with respect to a non additive
measure can be used to handle positive or negative synergy in value construction.

Key words: Fuzzy measure, Fuzzy integral, Aggregation operator, Synergy, Financial valuation.

INTRODUCTION

Financial assessments are characterised by the importance of the role assigned to human judgement in decision making, the use of qualitative information and the dominant role of subjective evaluation. The aim of the article is to examine the specific problems raised by the modelling of synergy between the assets of a firm. As a process which aggregates information and subjective opinions, the financial evaluation of the company raises very many problems relating to issues such as measurement, imprecision and uncertainty. The methods used in the process of financial evaluation are classically based on additivity. By construction, these methods abandon the idea of expressing phenomena of synergy (or redundancy, or even mutual inhibition) linked to over-additivity (or under-additivity) that may be observed between the elements of an organised set such as the firm's assets. This synergy (respectively redundancy)

effect may lead to a value of the set of assets greater (resp. lower) than the sum of the values of all assets. This is particularly the case in the presence of intangible assets such as goodwill. We will explore the possibilities offered by non-additive aggregation operators (Choquet 1953; Grabisch et al. 1995; Sugeno 1977) with the aim of modelling this effect through fuzzy integrals (Casta, Bry 1998; Casta, Lesage 2001).

1. FINANCIAL ASSESSMENT AND ACCOUNTING MODEL

The strictly numerical approach which underlies the accounting representation is not easily compatible with the imprecise and/or uncertain nature of the data or with the ambiguity of concepts: imprecision and subjectivity of the accounting valuations, poorly defined accounting categories, the subjective nature of any risk evaluation method (see March 1987; Zebda 1991; Casta 1994; Bry, Casta 1995; de Korvin et al. 1995; Casta, Lesage 2001).
Moreover, because its computation structure stems from elementary
arithmetic, the traditional accounting model is not designed to handle features
linked to synergy.
For these problematic issues, we propose extensions of this model which deal
with the synergy affecting the data used in the elaboration of financial
statements. However, this approach requires a thorough re-examination of the
semantics of the accounting measurement of value and income of the firm.
Our discussion concerns the operating rules governing quantification used in
accounting. These rules are based on a rigorous conception of "numericity"
which relates back to a given state of mathematical technology linked to the
concept of measurement used.

1.1 Measure theory and accounting

Generally speaking, measure theory, in the mathematical sense, relates to the problem of mapping the structure of a space corresponding to observed elements
onto a space allowing numerical representation; the set R of real numbers for
example. The concept of measurement used in accounting has been influenced by
two schools of thought:
- The classic approach - the so-called measure theory - directly inspired by
the physical sciences according to which measurement is limited to the
process of attributing numerical values, thereby allowing the representation of
properties described by the laws of physics and presupposing the existence of
an additivity property;
- The modern approach - the so-called measurement theory - which has its origin in the social sciences and which extends measure theory to the evaluation of sensorial perceptions as well as to the quantification of psychological properties (Stevens 1951; 1959).

The quantitative approach to the measurement of value and income is present in all the classic authors, for whom it is a basic postulate of accounting. The introduction of Stevens' work by Mattessich (1964), Sterling (1970) and Ijiri (1967; 1975) provoked a wide-ranging debate on the modern theory of measurement but did not affect the dominant model (see Vickrey 1970).
Following criticisms of the traditional accounting model whose calculation
procedures were considered to be simple algebraic transformations of
measurements (Abdel-Magid 1979), a certain amount of work was carried out,
within an axiomatic framework, with a view to integrating the qualitative
approach. However, the restrictive nature of the hypotheses (complete and
perfect markets) (see Tippett 1978; Willet 1987) means that their approach
cannot be generally applied.
Efforts to integrate the qualitative dimension into the theory of accounting did
not come to fruition. From then on, the idea of measurement which underlies
financial accounting remained purely quantitative.

1.2 Generally accepted accounting principles

In a given historical and economic context, financial accounting is a construction which is based on a certain number of principles generally accepted
by accounting practice and by theory. An understanding of economic reality
through the accounting model representing a firm is largely conditioned by the
choice of these principles. The principle of double-entry occupies a specific
place. By prescribing, since the Middle Ages, the recording of each accounting
transaction from a dual point of view, it laid down an initial formal constraint
which affected both the recording and the processing of the data in the accounts.
Later, with the emergence of the balance sheet concept, the influence of this
principle was extended to the structuring of financial statements.

1.3 The measurement of value in accounting: the balance sheet equation

The accounting model for the measurement of value and income is structured
by the double-entry principle through what is known as the balance sheet
equation. It gives this model a strong internal coherence, in particular with regard
to the elaboration of financial statements. In fact, the balance sheet equation expresses an identity in terms of assets and liabilities:

Assets (T) = Net Equities(T) + Debts(T)


Since this is a tautological description of the company's value, this relationship is, by nature, verifiable at any time.

1.4 The algebraic structure of double-entry accounting

On a formal level, the underlying algebraic structure has been explained by Ellerman (1986). Going beyond Ijiri's classic analysis in integrating both the
mechanism of the movement of accounts and the balance sheet equation,
Ellerman identifies a group of differences: a group constructed on a commutative
and cancelling monoid, that of positive real numbers endowed with addition. He
calls this algebraic structure the Pacioli group. The Pacioli group P(M) of a
cancelling monoid M is constructed through a particular equivalence relationship
between ordered couples of elements of M.
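
Ellerman's construction is not spelled out here; as a point of reference, the standard group-of-differences construction over the additive monoid of non-negative reals (our own summary, not a quotation from Ellerman 1986) identifies ordered pairs of debits and credits as follows:

(d_1, c_1) \sim (d_2, c_2) \iff d_1 + c_2 = d_2 + c_1, \qquad (d_1, c_1) + (d_2, c_2) = (d_1 + d_2,\; c_1 + c_2)

The quotient of \mathbb{R}_{\ge 0} \times \mathbb{R}_{\ge 0} by this equivalence is then a group isomorphic to (\mathbb{R}, +).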

2. SYNERGY MODELLING AND FINANCIAL VALUATION

The determination of the value of a set of assets results from a subjective aggregation of viewpoints concerning characteristics which are objective in
nature. As we have seen, the usual methods of financial valuation are based on
additive measure concepts (as sums or integrals). They cannot, by definition,
express the relationships of reinforcement or synergy which exist between the
elements of an organised set such as assets. Fuzzy integrals, used as an operator
of non-additive integration, enable us to model the synergy relation which often
underlies financial valuation.
We present the concepts of fuzzy measure and fuzzy integrals (Choquet 1953;
Sugeno 1977) and we then suggest various learning techniques which allow the
implementation of a financial valuation model which includes the synergy
relation (Casta, Bry 1998; Casta, Lesage 2001).

2.1 Unsuitability of the classic measurement concept

First, methods of evaluating assets presuppose, for the sake of convenience, that the value V of a set of assets is equal to the sum of the values of its
components, that is:
V(\{x_i\}_{i=1,\dots,I}) = \sum_{i=1}^{I} V(x_i)

The additivity property, based on the hypothesis of the interchangeability of the monetary value of the different elements, seems intuitively justified.
However, this method of calculation proves particularly irrelevant in the case of
the structured and finalised set of assets which makes up a patrimony. Indeed, the
optimal combination of assets (for example: brands, distribution networks,
production capacities, etc.) is a question of know-how on the part of managers
and appears as a major characteristic in the creation of intangible assets. This is
why an element of a set may be of variable importance depending on the position
it occupies in the structure; moreover, its interaction with the other elements may
be at the origin of value creation such as:

V(\{x_i\}_{i=1,\dots,I}) > \sum_{i=1}^{I} V(x_i)

Secondly, the determination of value is a subjective process which requires viewpoints on different objective characteristics to be incorporated. In order to
model the behaviour of the decision-maker when faced with these multiple criteria,
the properties of the aggregation operators must be made clear. Indeed, there
exists a whole range of operators which reflect the way in which each of the
elements can intervene in the aggregated result such as: average operators,
weighted-average operators, symmetrical sums, t-norms and t-conorms, mean
operators, ordered weighted averaging (OWA).
Depending on the desired semantics, the following properties may be required
(Grabisch et al. 1995): continuity, increase (in the widest sense of the term) in
relation to each argument, commutativity, associativity, and the possibility of
weighing up the elements and of expressing the way the various points of view
balance each other out, or complement each other.
However, these simple aggregation operators do not fully express the modalities of the decision-maker's behaviour (tolerance, intolerance, preferential independence) or model the interaction between criteria (dependence, redundancy, synergy) which is characteristic of the structuring effect.

2.2 Fuzzy measures and fuzzy integrals

The concept of fuzzy integrals is in direct continuity with fuzzy measures and extends the integral to measures which are not necessarily additive. It characterises integrals of real functions in relation to a given fuzzy measure (Denneberg 1994; Grabisch et al. 1995).

2.2.1 The concept of fuzzy measure

For a finite, non-empty set X, composed of n elements, a fuzzy measure (Sugeno 1977) is a mapping μ, defined over the set P(X) of the subsets of X, with values in [0, 1], such that:
1. μ(∅) = 0
2. μ(X) = 1
3. ∀A ⊂ B, μ(A) ≤ μ(B)
There is no additivity axiom. As a result, for two disjoint sets E and F, a fuzzy measure can, depending on the modelling requirement, behave in the following manner:
- additive: μ(E∪F) = μ(E) + μ(F)
- over-additive: μ(E∪F) ≥ μ(E) + μ(F)
- under-additive: μ(E∪F) ≤ μ(E) + μ(F)

The definition of a fuzzy measure requires the measures of all subsets of X to be specified, that is to say 2^n coefficients to be calculated.
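
As a small illustration (entirely ours; the numerical values are invented), the following sketch enumerates the 2^n subsets of a three-element X and checks the axioms of a candidate fuzzy measure:

```python
from itertools import combinations

X = ("a", "b", "c")
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

# A candidate fuzzy measure: one value per subset (2^n = 8 coefficients).
mu = {frozenset(): 0.0, frozenset("a"): 0.2, frozenset("b"): 0.3, frozenset("c"): 0.1,
      frozenset("ab"): 0.8, frozenset("ac"): 0.3, frozenset("bc"): 0.5,
      frozenset("abc"): 1.0}

# Axioms: mu(empty) = 0, mu(X) = 1, and monotonicity for every nested pair A <= B.
assert mu[frozenset()] == 0.0 and mu[frozenset(X)] == 1.0
assert all(mu[A] <= mu[B] for A in subsets for B in subsets if A <= B)
print("valid fuzzy measure with", len(subsets), "coefficients")
```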

2.2.2 The fuzzy integral concept

The redefinition of the concept of fuzzy measurement implies calling into question the definition of the integral in relation to a measure (Sugeno 1977; Choquet 1953). Sugeno's integral of a measurable function f: X → [0,1] relative to a measure μ is defined as:

S(f) = \max_{\alpha \in [0,1]} \left( \min\left(\alpha,\; \mu(\{x \mid f(x) > \alpha\})\right) \right)     (4)

Since it involves only operators max and min, this integral is not appropriate
for modelling synergy.
Choquet's integral of a measurable function f: X → [0,1] relative to a measure μ is defined as:

C(f) = \int \mu(\{x \mid f(x) > y\})\, dy


[Figure: graph of f over X, showing the level set {x | f(x) > y}]

Figure 1. Choquet's integral

For example, in the case of a finite set X = {x_1, x_2, ..., x_n} with

0 ≤ f(x_1) ≤ ... ≤ f(x_n) ≤ 1,     A_i = {x_i, ..., x_n},

we have (with the convention f(x_0) = 0):

C(f) = \sum_{i=1}^{n} \left[ f(x_i) - f(x_{i-1}) \right] \mu(A_i)

Moreover, 1(A=B) being the "indicator function" which takes value 1 if A = B and 0 otherwise, we can write:

C(f) = \int \left( \sum_{A \in P(X)} \mu(A)\, 1(A = \{x \mid f(x) > y\}) \right) dy

C(f) = \sum_{A \in P(X)} \mu(A) \left( \int 1(A = \{x \mid f(x) > y\})\, dy \right)

If we denote g_A(f) the value of the expression \int 1(A = \{x \mid f(x) > y\})\, dy, Choquet's integral may be expressed in the following manner:
C(f) = \sum_{A \in P(X)} \mu(A)\, g_A(f)

Choquet's integral involves the sum and the usual product as operators. It reduces to Lebesgue's integral when μ is an additive measure, and therefore extends it to possibly non-additive measures. As a result of monotonicity, it is increasing with respect to the measure and to the integrand. Hence, Choquet's integral can be used as an aggregation operator.
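
For a finite X, the discrete formula above translates directly into code. The sketch below (a minimal illustration; the dictionary-based representation of the measure and the example numbers are ours) computes C(f) for a measure given on all subsets:

```python
def choquet(values, mu):
    """Discrete Choquet integral of f (dict item -> value in [0,1])
    with respect to a fuzzy measure mu (dict frozenset -> value)."""
    items = sorted(values, key=values.get)          # x_(1), ..., x_(n) by increasing f
    total, previous = 0.0, 0.0
    for k, x in enumerate(items):
        a_i = frozenset(items[k:])                  # A_i = {x_(i), ..., x_(n)}
        total += (values[x] - previous) * mu[a_i]
        previous = values[x]
    return total

# Toy measure on X = {a, b} exhibiting synergy: mu({a,b}) > mu({a}) + mu({b}).
mu = {frozenset(): 0.0, frozenset("a"): 0.2, frozenset("b"): 0.3, frozenset("ab"): 1.0}
print(choquet({"a": 0.5, "b": 0.8}, mu))   # 0.5 * 1.0 + (0.8 - 0.5) * 0.3 = 0.59
```

In this toy case the over-additive measure rewards the joint presence of a and b: with the additive measure μ'({a}) = 0.2, μ'({b}) = 0.3, μ'({a,b}) = 0.5, the same profile would only score 0.5 × 0.5 + 0.3 × 0.3 = 0.34.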

2.2.3 Principal applications of fuzzy integrals

Fuzzy integrals found an especially suitable field of application in the control of industrial processes (Sugeno 1977). This approach then enabled new
applications to economic theory to be made on subjects such as non-additive
probabilities, expected utility without additivity (Schmeidler 1989), and the
paradoxes relating to behaviour in the presence of risk (Wakker 1990).
More recently, they have been used as aggregation operators for the
modelling of multicriteria choice, particularly in the case of problems of
subjective evaluation and classification (Grabisch, Nicolas 1994; Grabisch et al.
1995). With regard to the latter applications, fuzzy integrals exhibit the
properties usually required from an aggregation operator whilst providing a
very general framework for formalization.
The fuzzy integral approach means that the defects of classical operators can be compensated for (Grabisch et al. 1995). Including most other operators as particular cases, fuzzy integrals permit the detailed modelling of such features as:
- Redundancy, through the specification of weights on the criteria, but also on groups of criteria. Taking the structuring effect into account makes it possible to model the interaction and interdependency of criteria: μ is under-additive when the elements are redundant or mutually inhibiting; μ is additive (decomposable) for independent elements; μ is over-additive when expressing synergy and reinforcement.
- The compensatory effect: all degrees of compensation can be expressed by a
continuous change from minimum to maximum.
- The semantics underlying the aggregation operators.

2.3 Fuzzy Measure learning method (Casta, Bry 1998)

Modelling through Choquet's integral presupposes the construction of a measure which is relevant to the semantics of the problem. Since the measure is not known a priori, the coefficients μ(A), where A ∈ P(X), have to be estimated; we suggest an indirect econometric method for estimating them. Moreover, in cases where the structure of the interaction can be defined approximately, it is possible to reduce the combinatory part of the problem by restricting the analysis of the synergy to the interior of the useful subsets (see Casta, Bry 1998). Determining a fuzzy measure (that is to say 2^n coefficients) brings us back to a problem for which many methods have been elaborated (Grabisch, Nicolas 1994; Grabisch et al. 1995). We propose a specific
method of indirect estimation on a learning sample made up of companies for
which the firm's overall evaluation and the individual value of each element in its
patrimony are known.
Let us consider I companies described by their overall value v_i and a set X of J real variables x_j representing the individual value of each element in the assets. Let f_i be the function assigning to every variable x_j its value for company i: f_i : X → ℝ. We are trying to determine a fuzzy measure μ in order to come as close as possible to the relationship:

\forall i, \quad v_i = C(f_i)

Let A be a subset of variables and g_A(f_i) be the variable, called the generator relative to A, defined as:

g_A(f_i) = \int 1(A = \{x \mid f_i(x) > y\})\, dy

Thus, we obtain the model:

\forall i, \quad v_i = \sum_{A \in P(X)} \mu(A)\, g_A(f_i) + u_i

in which u_i is a residual which must be globally minimized in the adjustment. It is possible to model this residual as a random variable or, more simply, to restrict oneself to an empirical minimization of the ordinary least squares type. The model given above is linear with 2^J parameters: the μ(A)'s for all the subsets A of variables. The dependent variable is the value v; the explanatory variables are the generators corresponding to the subsets of X. A classical multiple regression provides the estimations of these parameters, that is to say the required measure. In practice, we shall consider the discrete case with a regular subdivision of the values:

y_0 = 0,\; y_1 = dy,\; \ldots,\; y_n = n \cdot dy


and for each group A of variables, we compute the corresponding generator as:

g_A(f_i) = dy \cdot \sum_{h=0}^{n} 1(A = \{x \mid f_i(x) > y_h\})

The following principle will be used to interpret the measure thus obtained, for A ∩ B = ∅:

1. μ(A∪B) ≥ μ(A) + μ(B) ⟺ synergy between A and B,
2. μ(A∪B) ≤ μ(A) + μ(B) ⟺ mutual inhibition between A and B.

It should be noted that the suggested model is linear with respect to the generators, but obviously non-linear in the variables x_j. Moreover, the number of parameters only expresses the most general combination of interactions between the x_j. For a small number of variables (up to 5, for example) the computation remains possible. The question is not only to compute the parameters, but also to interpret all the differences of the type μ(A∪B) − (μ(A) + μ(B)).
For a greater number of variables, one may consider either starting with a preliminary Principal Components Analysis and adopting the first factors as new variables, or restricting a priori the number of interactions considered.
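
To make the procedure concrete, here is a sketch of the estimation (an illustrative transcription under our own assumptions, not the authors' code: numpy's least-squares routine stands in for the classical multiple regression, and the sample is reduced to the first four companies of Table 1 below):

```python
import numpy as np
from itertools import combinations

def generators(asset_values, dy=1.0, y_max=None):
    """Generators g_A(f_i) for every non-empty subset A of assets (one company)."""
    names = sorted(asset_values)
    y_max = max(asset_values.values()) if y_max is None else y_max
    subsets = [frozenset(c) for r in range(1, len(names) + 1)
               for c in combinations(names, r)]
    g = dict.fromkeys(subsets, 0.0)
    y = 0.0
    while y < y_max:
        level_set = frozenset(x for x in names if asset_values[x] > y)
        if level_set:
            g[level_set] += dy        # g_A accumulates dy whenever A is the level set
        y += dy
    return subsets, [g[s] for s in subsets]

# First four companies of Table 1: asset values A, B, C and overall value v.
sample = [({"A": 4, "B": 2, "C": 1}, 6.0), ({"A": 3, "B": 1, "C": 3}, 4.0),
          ({"A": 0, "B": 2, "C": 4}, 6.0), ({"A": 4, "B": 0, "C": 3}, 2.0)]
subsets, _ = generators(sample[0][0], y_max=4)
X = np.array([generators(assets, y_max=4)[1] for assets, _ in sample])  # rows match Table 1
v = np.array([value for _, value in sample])
mu_hat, *_ = np.linalg.lstsq(X, v, rcond=None)  # estimated mu(A), one per subset A
for A, coef in zip(subsets, mu_hat):
    print(sorted(A), round(coef, 3))
```

With only four companies the system is of course under-determined; in practice the regression would use the whole learning sample, as in the numerical illustration of section 2.4.1.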

2.4 Measure estimation

2.4.1 Numerical illustration

Consider a set of 35 companies evaluated globally (value V) as well as through a separate evaluation of three elements of the assets (A, B and C). Since the values of A, B, C are integer numbers (ranging from 0 to 4), we have divided the value field into unit intervals dy = 1. Computing generators is very simple. For example, take company i = 3 described in the third line of Table 1, and let us represent its values for A, B and C:
[Figure: bar chart of the values of company 3: A = 2, B = 2, C = 3]

Figure 2. Computation of the generator for company i = 3

We have:
- {x | f_3(x) > 3} = ∅
- {x | f_3(x) > 2} = {C}
- {x | f_3(x) > 1} = {A, B, C}
- {x | f_3(x) > 0} = {A, B, C}

Hence we derive the value of the generators:

g_{C}(3) = 1,   g_{A,B,C}(3) = 2,

the other generators having a value of 0. For the whole sample of companies we have the following generators (Table 1):

Table 1. Computation of generators for the whole sample

A  B  C   v     g{A}  g{B}  g{C}  g{A,B}  g{A,C}  g{B,C}  g{A,B,C}
4  2  1   6      2     0     0      1       0       0        1
3  1  3   4      0     0     0      0       2       0        1
0  2  4   6      0     0     2      0       0       2        0
4  0  3   2      1     0     0      0       3       0        0
1  1  1   3.5    0     0     0      0       0       0        1
2  2  2   6.5    0     0     0      0       0       0        2
1  2  2   6      0     0     0      0       0       1        1
2  1  2   4      0     0     0      0       1       0        1
2  1  3   4.5    0     0     1      0       1       0        1
3  1  2   4.5    1     0     0      0       1       0        1
2  2  2   6      0     0     0      0       0       0        2
1  3  1   4      0     2     0      0       0       0        1
2  2  1   5      0     0     0      1       0       0        1
0  2  4   7      0     0     2      0       0       2        0
4  0  3   2.5    1     0     0      0       3       0        0
1  1  3   4      0     0     2      0       0       0        1
3  1  1   4      2     0     0      0       0       0        1
1  1  3   4.5    0     0     2      0       0       0        1
3  1  1   4.5    2     0     0      0       0       0        1
1  1  1   3      0     0     0      0       0       0        1
0  2  4   5.5    2     0     0      2       0       0        0
3  0  4   2.5    0     0     1      0       3       0        0
1  4  3   9      0     1     0      0       0       2        1

Then, by regressing the global value on the generators, we obtain (with an R² of 0.96) the results displayed in Table 2:

Table 2. Regression of the global value on the generators

μ({A})  μ({B})  μ({C})  μ({A,B})  μ({A,C})  μ({B,C})  μ({A,B,C})
0.50    0.55    0.57    2.25      0.60      2.66      3.12

The interpretation is simple in terms of structuring effects: each of the criteria A, B and C, taken individually, has more or less the same importance. But there is a strong synergy between A and B on the one hand, and between B and C on the other; A and C partly inhibit each other (possible redundancy between the two). There is no synergy specific to the 3 criteria grouped together: μ({A,B,C}) is barely different from the sum μ({A,B}) + μ({C}), as well as from μ({B,C}) + μ({A}). The fact that it is superior to μ({A,C}) + μ({B}), denoting synergy between {A,C} and {B}, simply comes from the synergy already observed between A and B on the one hand, and between C and B on the other hand.
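
To make this reading of Table 2 explicit, the pairwise comparisons work out as follows:

\mu(\{A,B\}) = 2.25 > \mu(\{A\}) + \mu(\{B\}) = 0.50 + 0.55 = 1.05 \quad \text{(synergy)}
\mu(\{B,C\}) = 2.66 > \mu(\{B\}) + \mu(\{C\}) = 0.55 + 0.57 = 1.12 \quad \text{(synergy)}
\mu(\{A,C\}) = 0.60 < \mu(\{A\}) + \mu(\{C\}) = 0.50 + 0.57 = 1.07 \quad \text{(mutual inhibition)}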
2.4.2 Limiting a priori the combinatory effect of the interactions

2.4.2.1 The extension principle


Instead of considering the whole set P(A) of the A-subsets to define the measure, we have only considered a limited number of these subsets, the measure μ on all the other subsets being defined univocally by an extension rule. The extension rule that naturally comes to mind is addition. Indeed, addition being equivalent to non-interaction, the set of all A-subsets on which μ is not a priori defined as additive is that of all interactions considered by the model.
Take for example a set of 6 variables: {a,b,c,d,e,f}. One can restrict interactions to {b,c} and {e,f}. The measure μ will hence be defined through the following values: μ({a}), μ({b}), μ({c}), μ({d}), μ({e}), μ({f}), μ({b,c}), μ({e,f}). From these values, μ can be computed for any A-subset using addition as the extension rule. For instance, μ({a,b,c}) = μ({a}) + μ({b,c}). Note that computing μ({a,b,c}) as μ({a}) + μ({b}) + μ({c}) would violate the assumption that b and c can interact.
We shall refer to all subsets corresponding to interactions considered by the model as kernels of the measure μ. In the example above, there are 8 kernels. We shall consider that singletons (atoms of the measure) are kernels too. Taking a subset B, we shall say that a kernel N is B-maximal if and only if: N ⊂ B and there exists no other kernel N' distinct from N and such that N ⊂ N' ⊂ B.
The extension rule can then be stated accurately as follows: given any subset B, find a partition of B using only B-maximal kernels {N1, ..., NL}, and compute μ(B) as:

μ(B) = μ(N1) + ... + μ(NL)
In the example above, there are only two ways to partition the subset {a,b,c} using the kernels: B = {a} ∪ {b} ∪ {c} and B = {a} ∪ {b,c}. The latter uses only B-maximal kernels, whereas the former does not.
We therefore see that if a subset B can be partitioned in more than one way using B-maximal kernels only, and if the values of μ over the kernels are not constrained, the extension principle may lead to two distinct values of μ(B). So, in order for the extension rule to remain consistent, the set of kernels must fulfil certain conditions.
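
The extension rule and the notion of B-maximal kernel can be sketched in a few lines of Python. The helper names (b_maximal, extend_mu) and the numerical values of μ are ours and purely illustrative; the kernels are those of the six-variable example above.

# Sketch: extending a measure defined on kernels to any subset B by summing
# mu over a partition of B into B-maximal kernels.
def b_maximal(kernels, B):
    """Kernels N with N <= B and no other kernel N' such that N < N' <= B."""
    inside = [N for N in kernels if N <= B]
    return [N for N in inside
            if not any(N < M and M <= B for M in inside if M != N)]

def extend_mu(mu, kernels, B):
    """mu(B) via a greedy partition of B into B-maximal kernels (unique under rule R)."""
    remaining, total = set(B), 0.0
    for N in sorted(b_maximal(kernels, B), key=len, reverse=True):
        if N <= remaining:
            total += mu[N]
            remaining -= N
    assert not remaining, "B cannot be partitioned with B-maximal kernels"
    return total

# Kernels of the example: singletons plus the interactions {b,c} and {e,f}.
kernels = [frozenset(s) for s in ("a", "b", "c", "d", "e", "f", "bc", "ef")]
mu = {N: 1.0 for N in kernels}          # illustrative values only
mu[frozenset("bc")] = 2.5               # made-up synergy between b and c

print(extend_mu(mu, kernels, frozenset("abc")))   # = mu({a}) + mu({b,c}) = 3.5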

2.4.2.2 The kernel set structure


Let K be the set of kernels. K must abide by the following rule R:

∀N ∈ K, ∀N' ∈ K : N ∩ N' ≠ ∅ ⇒ N ∪ N' ∈ K



Rule R forbids a subset B from being partitioned in different ways using B-maximal kernels. Indeed, suppose the subset B can be partitioned in two different ways using B-maximal kernels. Then, take any kernels N1 in partition 1 and N2 in partition 2 such that N1 ∩ N2 ≠ ∅.
If rule R holds, then N1 ∪ N2 ∈ K and either N1 = N2 or one of the two kernels is not B-maximal, which contradicts one of the hypotheses. Therefore, all intersecting kernels between partitions have to be identical, and then partitions 1 and 2 have to be identical, which contradicts the other hypothesis.
Conversely, the impossibility for any subset B to be partitioned in different ways using B-maximal kernels requires rule R to hold: supposing rule R does not hold, there exist distinct intersecting kernels N1 and N2 such that N1 ∪ N2 is not a kernel. Let B = N1 ∪ N2, K1 be a B-maximal kernel containing N1 and K2 be a B-maximal kernel containing N2. B can then be partitioned in two different ways:
- partition 1 = K1 ∪ partition of B\K1 using B-maximal kernels (where operator \ is the set difference),
- partition 2 = K2 ∪ partition of B\K2 using B-maximal kernels.

There are two notable and extreme cases of structures complying with rule R:

1. A partition hierarchy, i.e. a set of subsets between which only disjunction or inclusion relations hold, e.g.:

Figure 3. Partition 1 (a hierarchy over the elements a, b, c, d, e, f, g)

Here: K = { {a}, {b}, {c}, {d}, {e}, {f}, {g}, {d,e}, {a,b,c}, {d,e,f,g} }

2. The set of all subsets of A.



Between these two extreme cases, various structures can be considered. Given, for instance, a partition of A into components, one can take every component, as well as all its subsets, as kernels:

Figure 4. Partition 2 (components {a,b,c}, {d,e} and {f,g}, together with all their subsets)

Here:

K = { {a}, {b}, {c}, {d}, {e}, {f}, {g}, {a,b}, {b,c}, {a,c}, {d,e}, {f,g}, {a,b,c} }

2.4.2.3 Integrating over the kernel structure


This is achieved by simply computing the fuzzy integral with respect to the
measure defined on the kernels and extended as stated above.

2.4.2.4 Measure estimation


Estimating the measure μ on the kernels requires one extra step: the generators corresponding to the kernels have to be computed first. Recall that:

Cμ(f) = ∫ [ Σ_{B∈℘(A)} μ(B) · 1(B = {x | f(x) > y}) ] dy

      = ∫ [ Σ_{B∈℘(A)} Σ_{N∈K, N B-maximal} μ(N) · 1(B = {x | f(x) > y}) ] dy

      = Σ_{N∈K} μ(N) · Σ_{B∈℘(A), N B-maximal} ∫ 1(B = {x | f(x) > y}) dy

The generator associated with kernel N will here be defined as:

gN(f) = Σ_{B∈℘(A), N B-maximal} ∫ 1(B = {x | f(x) > y}) dy

Thus, we have:

Cμ(f) = Σ_{N∈K} μ(N) · gN(f)

Example: Take the following f and kernel structure:

Figure 5. Example: the function f (f(a)=2, f(b)=3, f(c)=2, f(d)=4, f(e)=3, f(f)=1) and the kernel structure

Here:

K = { {a}, {b}, {c}, {d}, {e}, {f}, {a,b}, {a,b,c}, {d,e}, {d,f}, {e,f}, {d,e,f} }

We must examine, for y going from 0 to 4, every subset B = {x | f(x) > y}. We get:

- y = 0 → B = {a,b,c,d,e,f};
- y = 1 → B = {a,b,c,d,e};
- y = 2 → B = {b,d,e};
- y = 3 → B = {d};
- y = 4 → B = ∅.

Let us consider the B-maximality of kernel {d,e} with respect to each of these subsets. Kernel {d,e} appears to be B-maximal for B = {a,b,c,d,e} and for B = {b,d,e}. Hence we get: g{d,e}(f) = 1 + 1 = 2.
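
This kernel-generator computation can be sketched as follows in Python, in the same hedged style as the sketch of § 2.4.2.1 (the b_maximal helper is repeated so that the block stands alone); the values of f are those read off Figure 5.

# Sketch: kernel generators g_N(f) for the example of Figure 5.
def b_maximal(kernels, B):
    inside = [N for N in kernels if N <= B]
    return [N for N in inside
            if not any(N < M and M <= B for M in inside if M != N)]

f = {"a": 2, "b": 3, "c": 2, "d": 4, "e": 3, "f": 1}
kernels = [frozenset(s) for s in
           ("a", "b", "c", "d", "e", "f", "ab", "abc", "de", "df", "ef", "def")]

g = {N: 0 for N in kernels}
for y in range(4):                                   # unit intervals, dy = 1
    B = frozenset(x for x, v in f.items() if v > y)  # level set {x | f(x) > y}
    for N in b_maximal(kernels, B):
        g[N] += 1

print(g[frozenset("de")])    # 2, as derived in the text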

Once every kernel generator has been computed, we can proceed to the least
squares step of the learning procedure, just as in § 2.4.

CONCLUSION

After reviewing the possibilities offered by fuzzy integrals, we have observed that there exist many potential fields of application in finance for this family of operators. They enable the effects of micro-structure, synergy and redundancy to be analysed in detail, whereas these effects remain opaque in linear models. There is a price to be paid for this sophistication in terms of computational complexity. However, we have tried to show that these techniques make it possible to limit the purely combinatory effects which appear at the learning stage of the methodology.

ACKNOWLEDGEMENTS

This work took place with the support of the CEREG Laboratory (University
of Paris Dauphine). We thank Helyette Geman for her helpful comments.

Chapter 9

Mechanisms of discipline and corporate performance:


Evidence from France

Eric SEVERIN
University of Lille 2, 1 Place Deliot, BP 381, 59020 Lille, eric.severin@free.fr

Abstract: This article deals with the influence of the structure of the Board of Directors,
external and internal discipline and size on corporate performance in the economic
and financial fields. By using firstly, self organising maps, and secondly, panel data,
we highlight three main results. Firstly, our results suggest, from a sample of 136
firms, that the relation between the structure of the Board of Directors and
performance is non-linear. Furthermore, one can observe that the least effective
firms are those in which the proportion of outside directors is the highest. Secondly,
leverage and stock turnover are the main explanatory variables of performance. While leverage has a negative influence on performance (Opler, Titman 1994), stock turnover has a beneficial impact (Charreaux 1997).
Thirdly, variables of concentration of ownership structure and size have a positive,
but non-significant, influence on performance.

Key words: Outside Directors, Leverage, Performance, Self Organising Map.

INTRODUCTION

The financial literature has tried to measure the influence of ownership


structure on corporate performance for a long time. Ownership structure has
various facets. It may concern the influence of managers on capital, the kind of
shareholders, or the size and composition of the Board of Directors. The objective of this
article is twofold. Firstly, it attempts to deal with the relationship between the
composition of the Board of Directors (and more particularly the presence of
outside directors) and corporate performance of French firms. By recognizing
that the relationship between the structure of the Board of Directors and

performance appears to be unclear, we will try to understand this relationship more clearly. Is it of a linear or non-linear nature? More precisely, and this is the second objective of the paper, if the structure of the Board of Directors only partially explains performance, can performance also be explained by other variables such as internal or external discipline, or by size?

Several results appear:


Firstly, our results suggest, from a sample of 136 firms, that the relation
between structure of Board of Directors and performance is non-linear.
Furthermore, one can observe that the least effective firms are those in which the
proportion of outside directors is the highest.
Secondly, leverage and stock turnover are the main explanatory variables of performance. While leverage has a negative influence on performance (Opler, Titman 1994), stock turnover has a beneficial impact (Charreaux 1997).
Thirdly, variables of concentration of ownership structure and size have a positive, but non-significant, influence on performance.

This article is organized as follows. In the first section, we discuss the


relationship between the structure of the Board of Directors and corporate
performance and moreover, we will try to highlight the main mechanisms of
discipline influencing performance. In the second section we will describe our
data and our methodology. Our results are described in the third section.

1. THE VARIABLES EXPLAINING CORPORATE


PERFORMANCE

Within this section, we deal with the elements of the theoretical literature and
will present our working hypotheses.

1.1 The structure of the Board of Directors: theoretical


discussion and empirical verifications

Numerous papers have dealt with the influence of the Board of Directors (and more especially of the outside directors) on corporate performance. The outside director is defined in opposition to the inside director. An inside director is simultaneously on the Board of Directors and a salaried employee, or on the Board of Directors and in the senior management committee. From this general definition, one can include several types of actors in the inside directors' category: family members and people in a business relationship with the firm (but not necessarily tied to it by a working contract: consultants for example). The latter actors are qualified in the American literature as "grey directors". The outside character applies to all other members of the Board of Directors who do not belong to the categories quoted above.

Nowadays, two main trends deal with the relationship between outside directors and performance. The first is unfavourable to outside directors on the Board of Directors, for several reasons.
Some authors (Mace 1986; Vancil 1987) highlight the fact that outside directors are biased because their recruitment is strongly influenced by management. Moreover, their weak involvement gives them little incentive to defend shareholders' interests (Jensen 1993; Mace 1986; Patton, Baker 1987). In short, as J. Maati (1998) underlines, the exchange of directorships between managers allows the emergence of a microcosm based on mutual support favourable to entrenchment.
In response to these critiques, some authors underline outside directors' advantages. Firstly, as their reputation depends on their vigilance, they will be very keen to monitor managers efficiently. Secondly, belonging to several Boards of Directors allows them to diversify their human capital (which favours their independence) and to improve their appraisal skills as well as their field of expertise. Thirdly, the threat of legal action by the shareholders motivates them to act in the interest of the latter (Fama, Jensen 1983a). Fourthly, their presence on a Board is a guarantee of performance because they can examine the different proposals of the executive committee with detachment (Eisenhardt, Bourgeois 1988; Kosnik 1990).

Empirical contributions are contradictory and do not permit clear conclusions to be drawn. Some papers have shown greater efficiency for those firms with numerous outside directors. The presence of outside directors is a means of putting pressure on management.
Thus, Rosenstein, Wyatt (1990) showed a positive reaction of the market to the announcement of the arrival of an outside director. In a complementary study in 1997, these same authors showed that although the financial market reacts positively and significantly to the arrival of an outside director, it is indifferent to that of an inside director. In line with these papers, Weisbach (1988) suggested that Boards of Directors dominated by outside directors (i.e. composed of at least 60% outside directors) have an influence on management turnover when the firm is in distress.

On the contrary, other empirical studies, carried out on the American market, re-open the question of the presence and usefulness of outside directors. Thus Hermalin, Weisbach (1988) failed to observe a link between the number of outside directors and corporate performance, while others (Agrawal, Knoeber 1996; Sundaramuthy et al. 1997; Bhagat, Black 1998) observed a negative influence of these actors on performance. Some contradictory results on the French market were also noted. For Charreaux (1997), the outside directors' percentage influences the performance of managerial or controlled firms. On the contrary, Maati (1998) underlined a negative and significant relationship between the outside directors' proportion and performance (measured in book value) for firms belonging to the industrial and services sectors.

From these different contributions, several elements appear:


Firstly, the relation between the Board of Directors' structure and
performance appears to be unclear. Among all empirical contributions, no thesis
really seems to dominate.
Secondly, most of these studies are cross-sectional.
Thirdly, the authors did not use the same measurements to define the outside character of directors. Although the empirical studies on the American market distinguished between outside directors, inside directors and "grey directors", studies on the French market (Charreaux 1997; Maati 1998) defined outside directors as directors not in the management team (Maati 1998), or as directors not in the management team and who do not represent the most important shareholders (Charreaux 1997).
Fourthly, measurements of performance, as underlined by Charreaux (1997), deserve greater precision. A distinction must be made between shareholder and total value. From that, one can consider that the influence of outside directors on performance seems unclear. Even if we are conscious that corporate performance depends on numerous factors (such as the industrial sector, the economic cycle, or the management's reputation), one must note that it is difficult to take all of these factors into account. All these investigations allowed us to make the following assumption: the influence of outside directors on performance is non-linear. Can we observe such a dichotomy in practice? And if this relationship is non-linear, can we determine the factors (or variables) able to explain the causes of the different equilibria? As these factors are numerous, we will focus on the internal and external variables of discipline.

1.2 Other internal and external mechanisms of discipline

Besides the composition of the Board of Directors, ownership structure is


defined by internal and external mechanisms. The internal mechanisms consist,
in particular, of shareholders' control but also of leverage. The external
mechanisms are based on the discipline of financial markets and more especially
that exerted by investors.

1.2.1 Internal discipline mechanisms

We focus our attention on two variables: the percentage of capital held by the three main shareholders and leverage.
Concentration of capital favours the exercise of efficient control by shareholders (Shleifer, Vishny 1986; Bethel, Liebeskind 1993; Agrawal, Knoeber 1996). Indeed, if there is no main shareholder (or group of main shareholders), no shareholder has an interest in using resources (time and funds) to control the management, because he will be the only one to bear the investment cost whereas all the owners or partners will benefit from this action. Conversely, a major shareholder is strongly encouraged to invest in management control, because he will benefit from a significant extra profit. From this argument, we can develop a first hypothesis H1:
H1: Shareholder concentration positively influences performance.
Financial structure is also able to manage agency conflicts and to reduce the "free cash flow". Jensen (1986) explained that the best way to reduce conflicts of interest is to increase the debt level. Leverage provides discipline and monitoring not available to a firm completely financed by equity. According to the 'free cash flow' theory, debt creates value by imposing discipline on organizations, which in turn reduces agency costs (Jensen 1986). The use of debt has two functions: 1) it decreases the free cash flow that can be wasted by managers and, 2) it increases the probability of bankruptcy and the possibility of job loss for managers (thus leading to the disciplining effect).
Although leverage has advantages, it can also provoke harmful effects. In response to Modigliani, Miller (1963), many authors (Altman 1984²; Malecot 1984; Wruck 1990) show that operational difficulties and a rise in leverage lead, ceteris paribus, to a decrease in performance and a weakening of the firm. When leverage is not controlled, the firm is likely to file for bankruptcy (Altman 1984). The costs of bankruptcy provide an explanation for the capital structure. To this extent, Opler, Titman (1994) specified that debt is a factor of "financial distress" likely to endanger the firm. Indeed, if a firm is overextended in debt, the shareholders may doubt its durability. For example, customers may be reluctant to do business with distressed firms.
In other words, the shareholders have no confidence in a firm that is not able
to meet its commitments. The originality of the findings of Opler, Titman lies in
highlighting the existence of indirect costs harmful to the firm prior to
bankruptcy. In other words, they reverse the causality assumed by Altman. Opler
and Titman (by using a classical methodology based on multiple linear regression
and working with a large sample (46799 firms) and over a long period (1972-
1991)), showed that highly-leveraged firms (i.e. which have leverage in deciles 8
to 10) are more sensitive to an economic downturn. These firms lose market
share. Industry-adjusted sales growth is 13.6% lower (p value < 1%) for firms with leverage in deciles 8 to 10 in distressed industries than for less-leveraged firms. Similarly, industry-adjusted sales growth for firms in distressed industries with leverage in decile 10 is on average 26.4% lower (p value < 1%) than for firms in leverage decile 1 (the least-leveraged firms). In a distressed industry, the highly-leveraged firms have a drop in equity value 11.9% greater than firms with leverage in deciles 1 to 7 (significant at the 1% threshold).
In line with the 'free cash flow' hypothesis (Jensen 1986), one can expect debt to increase value (debt imposes a discipline on organizations and in particular on management). Here, debt is a source of "good stress". On the contrary, if debt is a result of difficulties borne by the firm (Opler, Titman 1994), it can have negative consequences (loss of customer confidence for instance). In this framework, debt is a source of "bad stress".
From this argument, we can develop two hypotheses H2a and H2b:
H2a: Leverage, a measurement of the lenders' monitoring, has a positive
influence on performance.

H2b: Leverage, a factor of financial distress, has a negative influence on performance.

1.2.2 Mechanisms of external discipline

Mechanisms of external discipline hinge on financial market discipline. We will use one variable of external discipline: the security turnover rate. The reason for choosing the security turnover rate is in line with the arguments of Titman, Wessels (1988). These authors indicated that a considerable volume of transactions reflects the discipline of financial markets. It is a means of putting pressure on management to act in accordance with the interests of investors. From this argument, we can develop a new hypothesis H3:
H3: The security turnover rate, a measurement of investors' monitoring, has a positive influence on performance.
After explaining our objectives and hypotheses, we will justify our methodology and the choice of our data.

2. DATA AND METHODOLOGY

2.1 Data

The sample was chosen from 136 French industrial firms³ listed on the Paris Bourse (Reglement Mensuel, Comptant and Second Marche) during the 1991-1995⁴ period. By means of the Dafsa-Pro and Annee Boursiere databases, we used all the items able to give information about ownership structure, the internal and external discipline variables and the size variable.
Thus, our sample consists of panel data. For Sevestre (1992), one of the advantages of panel data compared to other time series is to show the dynamics of individual behaviour.

2.2 Pertinence of chosen variables

As we sought to highlight the relationship between the structure of Boards of


Directors and performance, we chose two performance indicators.

2.2.1 Variables of performance

As underlined by Charreaux (1997, p. 32):


"An adequate performance measurement should be able to take into account
all the consequence on the wealth of stakeholders".
Nevertheless, the choice of a performance measurement is complex for
several reasons. Indeed it will differ depending on the shareholder or stakeholder
value maximisation. In particular, the latter considers that economic performance
is the key element because it is the origin of global (or stakeholder) value5 . From
this argument, we have chosen ROI and ROE.
ROI values the total performance of the firm. ROI is calculated in book value
and equal to: (net result+ interest6)/(equity +total debt). This conception, which
we will follow, implies that the output corresponding to amounts of vested
interests includes the net result returned to shareholders and the financial costs of
paying creditors at that time. Nevertheless, this indicator is not free of criticism.
Indeed ROI uses the net profit that very often includes non-operating elements.
That is why it would be better to choose EBITDA (earning before interest and
taxes and depreciation) because this measurement excludes amortisation. In spite
of these imperfections and to simplify, we will choose the final net result.
Financial performance is given by the ROE (net profit / equity, in book value). This indicator is very important for shareholders. However, the ROE does not permit the assessment of the profitability of all invested funds. Furthermore, we were careful to reprocess aberrant values⁷.

2.2.2 The explanatory variables of performance

- The internal mechanism variables


If one refers to the literature, the number of outside directors can influence
corporate performance both positively and negatively. We have considered the
outside directors as directors who are not in the management team (variable
OUTDIR).
To take into account shareholder concentration, we took the percentage held
by the three main shareholders (variable %A13).
In line with Charreaux (1997), we chose to measure leverage as the ratio of the book value of financial debt to the book value of total equity (variable FIDEEQUI).

- The external mechanism variables

The reason for choosing the security turnover rate (STR) as an external discipline variable is in line with the arguments of Titman, Wessels (1988). These authors indicated that a considerable volume of transactions reflects the discipline of financial markets. It is a means of putting pressure on management to act in accordance with the interests of investors. We took this data from the "Annee Boursiere" database.

- Size variable
Furthermore, we used a control variable: size defined by the logarithm of the
total assets (LNTA).
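
As a hedged illustration of how these indicators fit together, the following Python sketch builds the performance and monitoring variables from raw book values. The column names and the numbers are ours and purely illustrative; they are not the actual Dafsa-Pro or Annee Boursiere field names.

# Sketch: building the performance and monitoring variables from raw book values.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "firm":             ["F1", "F1", "F2", "F2"],
    "year":             [1991, 1992, 1991, 1992],
    "net_result":       [12.0, 15.0, -3.0, 4.0],
    "interest":         [5.0, 4.0, 6.0, 6.0],
    "equity":           [100.0, 110.0, 80.0, 82.0],
    "financial_debt":   [60.0, 55.0, 120.0, 118.0],
    "total_debt":       [90.0, 85.0, 160.0, 155.0],
    "total_assets":     [250.0, 260.0, 300.0, 305.0],
    "top3_capital_pct": [65.0, 65.0, 58.0, 58.0],
    "security_turnover":[12.0, 14.0, 7.0, 6.0],
})

df = raw.set_index(["firm", "year"])
df["ROI"]      = (df.net_result + df.interest) / (df.equity + df.total_debt)  # total performance
df["ROE"]      = df.net_result / df.equity        # equity values checked to be positive
df["FIDEEQUI"] = df.financial_debt / df.equity    # leverage
df["LNTA"]     = np.log(df.total_assets)          # size
df = df.rename(columns={"top3_capital_pct": "pctA13", "security_turnover": "STR"})
print(df[["ROI", "ROE", "pctA13", "FIDEEQUI", "STR", "LNTA"]])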

2.3 Methodology

To clarify our approach, we chose to explain the advantages and drawbacks of using Kohonen maps and panel data. These developments are presented in detail in Appendices 1 and 2.

2.3.1 Self organising maps

By assuming that, firstly, no consensus exists concerning the influence of ownership structure on performance and, secondly, that a non-linear relationship may exist between ownership structure and corporate performance (Morck et al. 1988; McConnell, Servaes 1990; Short, Keasey 1999), we focused our attention on self-organized maps (SOM) and more especially on one of the variants called the Kohonen map (Kohonen 1982; 1995). One of its major advantages is its capacity to deal with non-linear problems. Our objective was to determine several groups of homogeneous individuals.

2.3.2 Specification of the model: advantage of panel data

Based on the initial results, we firstly used some classic non-parametric tests (Wilcoxon⁸) to highlight significant differences between our groups, and we then developed a model by using the technique of panel data.
In order to better understand the impact of the monitoring variables, external control and size, we carried out an empirical analysis by means of regressions on panel data as follows⁹:

Performance = intercept + β1(%A13) + β2(FIDEEQUI) + β3(STR) + β4(LNTA) + ε   (Eq. 1)

With:
- Performance = ROI or ROE
- %A13 = Percentage of capital held by the three main shareholders
- FIDEEQUI = Financial debt (in book value) / Total equity (in book value)
- STR = Security turnover rate
- LNTA = Logarithm of total assets
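
A hedged sketch of how Eq. 1 could be estimated as a random-effects panel regression is given below. It assumes the third-party Python package linearmodels and uses a synthetic panel with the same shape as ours (136 firms, 1991-1995); it is a sketch of the estimation step only, not of our actual data.

# Sketch: Eq. 1 estimated by a random-effects panel model on a synthetic panel.
import numpy as np
import pandas as pd
from linearmodels.panel import RandomEffects

rng = np.random.default_rng(0)
firms = [f"F{i}" for i in range(1, 137)]
years = pd.to_datetime([f"{y}-12-31" for y in range(1991, 1996)])
idx = pd.MultiIndex.from_product([firms, years], names=["firm", "year"])
df = pd.DataFrame({
    "pctA13":   rng.uniform(40, 90, len(idx)),
    "FIDEEQUI": rng.uniform(0.2, 2.0, len(idx)),
    "STR":      rng.uniform(2, 40, len(idx)),
    "LNTA":     rng.uniform(12, 16, len(idx)),
}, index=idx)
df["ROI"] = (0.01 * df.pctA13 - 0.09 * df.FIDEEQUI + 0.002 * df.STR
             + rng.normal(0, 0.5, len(idx)))        # synthetic outcome, illustration only

# linearmodels expects an (entity, time) MultiIndex; the constant is added explicitly.
exog = df[["pctA13", "FIDEEQUI", "STR", "LNTA"]].assign(intercept=1.0)
res = RandomEffects(df["ROI"], exog).fit()
print(res.params)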

The advantages of panel data hinge on the fact that one has at one's disposal a satisfactory amount of data, both from the quantity and variability viewpoints. Econometrically speaking, all the available data results in greater accuracy of the estimations. Another characteristic of panel data is the predominance of inter-individual disparities ("between" estimator) in the variance of observations. The final advantage consists in the capacity to carry out cross-sectional and time-series estimations from the same data. Although the use of panel data shows certain advantages, it is however not free of drawbacks. Thus, the lack of individual information concerning a variable considerably weakens the dynamic-type regressions, like intra-individual estimations ("within" estimator), which are extremely sensitive to the biases due to omitted variables or measurement errors. A second limit results from the nature of the sample chosen. Indeed, working on panel data requires the samples to be balanced (cylindered), which calls their representativeness into question. This is the reason why all mergers and bankruptcies of firms were excluded from our sample.

3. RESULTS AND ANALYSIS

Within this section, we will present our results. Firstly, we will focus on the
influence of outside directors on performance and secondly, on the impact of the
other monitoring variables explaining performance.

3.1 The relationship between the composition of the Board


of Directors and performance: a non-linear relationship

Many methods have been used to represent the Kohonen maps. A good synthesis is provided in the Kohonen book¹⁰. In this study, we used one representation to plot the Kohonen grid (or string) on a plane, each unit having the same size. The following are the classical Kohonen map representations of the relationship between the composition of the Board of Directors and performance for each of the years studied:

U1 U2 U3 U4
Outdirs Outdirvs Outdirw Outdirvw
Roivw Roiw Rois Roivs
Roevw Roew Roes Roevs

Figure 1. Year 91

U1 U2 U3 U4
Outdirvw Rois Outdirvs Outdirw
Roivs Roes Outdirs Roivw
Roevs Roiw Roevw
Roew

Figure 2. Year 92

U1 U2 U3 U4
Outdirvw Outdirw Outdirvs Outdirs
Roivs Rois Roiw Roivw
Roevs Roevs Roevw Roevw

Figure 3. Year 93

U1 U2 U3 U4
Outdirvs Outdirw Outdirvw Outdirs
Roiw Rois Roiw Roivs
Roew Roevs Roew Roevs

Figure 4. Year 94

U1 U2 U3 U4
Outdirvs Outdirs Outdirw Outdirvw
Roivw Roiw Rois Roivs
Roevw Roew Roes Roevs

Figure 5. Year 95

With:
ROIVS, ROIS, ROIW, ROIVW: ROI very strong, strong, weak, very weak.
ROEVS, ROES, ROEW, ROEVW: ROE very strong, strong, weak, very
weak.
- OUTDIRVS, OUTDIRS, OUTDIRW, OUTDIRVW: % outside directors in
the Board of Directors very strong, strong, weak, very weak.
- U1, U2, U3, U4: Unit 1, Unit 2, Unit 3, Unit 4.

The representation of the Kohonen map highlights that all performance


variables converge (i.e., for instance, ROEVS and ROIVS are in the same unit).
It is important to note that firms with the worst performances are those where the
percentage of outside directors is the greatest. This result is consistent with those
found by Mace (1986), Patton, Baker, (1987), Vancil (1987), Jensen (1993) and
Maati (1998). Nevertheless, the influence of the structure of the Board of
Directors seems complex. The coexistence of positive and negative effects leads
us to note that the relationship between the structure of the Board of Directors
and performance is non-linear. More precisely, our results suggest (especially for
1994) that a strong percentage of outside directors is associated with very strong
performances.
This leads us to recognize that the relationship between the structure of the Board of Directors and performance remains an open question. These first results led us to question the reasons for these different equilibria. Consequently, we attempted to determine the characteristics of the individuals located in each unit of the Kohonen map. The second step consists in working on the individuals themselves.

3.2 Non-parametric tests on monitoring and size variables

Within this section, we will try to determine whether the explanatory variables of performance differ between the groups identified above.

Table 1. Descriptive statistics and Wilcoxon test on monitoring and size variables from 1991 to 1995

Year 1991             Variables   %A13    FIDEEQUI   STR      LNTA
Unit 1 (VW)  N=29     Mean        63.22   1.67       10.11    13.63
                      Median      65.30   0.88       5.50     13.64
Unit 2 (W)   N=42     Mean        61.90   0.78       16.64    14.83
                      Median      60.25   0.59       10.10    14.37
Unit 3 (S)   N=35     Mean        57.79   0.79       27.47    15.07
                      Median      60.10   0.66       25.60    14.63
Unit 4 (VS)  N=30     Mean        71.72   0.67       13.92    13.87
                      Median      72.40   0.45       11.00    13.61
Total        N=136    Mean        63.29   0.95       17.43    14.42
                      Median      63.90   0.58       10.80    14.07
                      Wilcoxon    1.65    -2.09      -1.89    0.71
                      P value     0.09*   0.04**     0.06*    0.48

Year 1992             Variables   %A13    FIDEEQUI   STR      LNTA
Unit 1 (VS)  N=26     Mean        72.62   0.54       20.20    14.37
                      Median      72.33   0.37       18.37    13.92
Unit 2 (S)   N=24     Mean        59.95   0.88       25.98    15.05
                      Median      60.30   0.85       23.80    14.83
Unit 3 (W)   N=59     Mean        62.44   1.11       22.85    14.84
                      Median      60.90   0.57       15.00    14.55
Unit 4 (VW)  N=27     Mean        66.83   1.04       13.13    13.36
                      Median      65.79   1.18       13.11    12.99
Total        N=136    Mean        64.82   0.95       20.96    14.49
                      Median      64.80   0.58       14.00    14.12
                      Wilcoxon    -1.32   1.90       -1.72    1.94
                      P value     0.19    0.06*      0.09*    0.05*

Year 1993             Variables   %A13    FIDEEQUI   STR      LNTA
Unit 1 (VS)  N=37     Mean        75.72   0.53       28.75    15.20
                      Median      77.20   0.37       19.60    15.26
Unit 2 (S)   N=24     Mean        67.16   0.58       19.10    14.44
                      Median      69.80   0.50       14.90    13.91
Unit 3 (W)   N=56     Mean        61.40   0.69       15.76    14.32
                      Median      59.75   0.58       9.95     14.17
Unit 4 (VW)  N=19     Mean        61.42   2.41       15.22    13.62
                      Median      60.00   2.18       7.10     13.11
Total        N=136    Mean        64.42   0.87       19.81    14.48
                      Median      63.85   0.58       15.20    14.11
                      Wilcoxon    -1.89   -4.39      2.27     2.40
                      P value     0.06*   0.00***    0.02**   0.02**

Year 1994             Variables   %A13    FIDEEQUI   STR      LNTA
Unit 1 (VW)  N=31     Mean        63.11   0.67       13.93    14.19
                      Median      60.00   0.57       5.53     14.06
Unit 2 (W)   N=33     Mean        68.16   1.08       18.59    14.53
                      Median      70.00   0.64       10.34    14.25
Unit 3 (S)   N=43     Mean        65.49   0.57       19.98    14.85
                      Median      62.80   0.39       15.28    14.38
Unit 4 (VS)  N=29     Mean        69.29   0.54       18.88    14.45
                      Median      67.90   0.42       18.11    14.07
Total        N=136    Mean        66.40   0.77       18.03    14.54
                      Median      66.80   0.50       12.42    14.14
                      Wilcoxon    -1.15   -0.95      -3.11    -0.95
                      P value     0.25    0.34       0.00***  0.34

Year 1995             Variables   %A13    FIDEEQUI   STR      LNTA
Unit 1 (VW)  N=29     Mean        72.29   1.47       18.41    13.68
                      Median      77.93   0.65       9.67     13.84
Unit 2 (W)   N=18     Mean        71.27   0.67       24.97    15.19
                      Median      71.25   0.58       18.77    15.11
Unit 3 (S)   N=64     Mean        67.98   0.68       25.45    14.80
                      Median      67.45   0.49       18.82    14.45
Unit 4 (VS)  N=13     Mean        78.28   0.43       23.14    14.59
                      Median      76.02   0.31       21.46    14.26
Total        N=136    Mean        71.23   0.80       24.78    14.57
                      Median      71.73   0.52       19.18    14.24
                      Wilcoxon    -0.94   2.99       1.41     -1.87
                      P value     0.35    0.00***    0.16     0.06

Notes:
1. (VW): firms with very poor performance, (W): firms with poor performance, (S): effective firms, (VS): very effective firms.
2. We use the Wilcoxon test (for monitoring and control variables) between very effective firms and firms with very poor performance. Variables %A13, FIDEEQUI, STR and LNTA are respectively: the percentage held by the 3 main shareholders, the total of financial debt divided by the total of equity in book value, the security turnover rate and the logarithm of total assets.
3. ***, ** and * significant at the 1, 5 and 10% thresholds. N = number of observations.
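
Before commenting on the findings, the following sketch shows the kind of two-sample Wilcoxon (rank-sum) comparison reported in Table 1, using scipy and illustrative leverage values; the group sizes mirror the 1991 units, but the numbers themselves are synthetic.

# Sketch: two-sample Wilcoxon (rank-sum) comparison of leverage between groups.
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(1)
lev_very_effective = rng.lognormal(mean=-0.8, sigma=0.5, size=30)   # illustrative values
lev_very_weak      = rng.lognormal(mean=0.3,  sigma=0.6, size=29)   # illustrative values

stat, pvalue = ranksums(lev_very_effective, lev_very_weak)
print(f"Wilcoxon rank-sum statistic = {stat:.2f}, p-value = {pvalue:.3f}")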

Our findings give some important points to comment on. First of all, the results indicate that the effective firms have a statistically different security turnover rate (at the 1% threshold). Our results are in line with those of Charreaux (1997). The Wilcoxon tests give values of -1.89 and -1.71 (significant at the 10% threshold) for the years 1991 and 1992. Results for 1993 and 1994 are much more significant (-2.26 and -3.11, significant at the 1% threshold).
The result for 1995 is consistent with the others but it is not significant (p value = 0.16). Our results are in line with Titman, Wessels (1988) and Charreaux
(1997). Thus the security turnover rate is a means of putting pressure on
management to act in accordance with investors' interests.
A second point leads us to observe that very strong leverage is associated with
the weakest performances. The results are significant at the 1% threshold for
1993 and 1995, significant at the 5% threshold for 1991 and significant at 10%
for 1992.
If one compares the leverage of the very effective firms with that of the least effective ones, one can observe that, on average, the leverage of the least effective firms is two to three times greater. Thus, in 1995, the mean leverage was 1.47 for the least effective firms whereas it was only 0.43 for the firms located in unit 4.
This result is not consistent with the free cash flow hypothesis (Jensen 1986), but is in line with the results of Opler, Titman (1994).
In line with the findings of Shleifer, Vishny (1986), Bethel, Liebeskind (1993) and Agrawal, Knoeber (1996), our results suggest that the most effective firms have a greater shareholder concentration.
For the years 1991 and 1993, the mean shareholder concentration was respectively 71.72% and 75.72% for the most effective firms, compared with 63.22% and 61.42% for the firms with very poor performances. Nevertheless, the results are hardly significant (significant at 10% for 1991 and 1993 for the variable %A13).
Finally, size (measured by total assets) seems to have an influence on performance, but the results are not significant except for 1993 (Wilcoxon test -2.39, significant at the 5% threshold) and for 1992 and 1995 (Wilcoxon tests of -1.93 and -1.86, both significant at the 10% threshold). This result seems to suggest that size has a positive influence on performance but that this impact remains weak.

3.3 Influence of monitoring and size variables on performance¹¹

In order to understand more clearly the role of monitoring and size variables
on corporate performance, we carried out the empirical analysis by using a
regression (Eq. 1).

Table 2. Estimation of coefficients of equation 1 (random effects)

      R² %   Intercept   %A13     FIDEEQUI      STR        LNTA
ROI   34.9   -0.01       0.01     -0.09         0.00       0.00
             (-0.61)     (1.53)   (-2.56)***    (2.59)***  (0.14)
ROE   18.2   -0.23       0.01     -0.03         0.01       0.01
             (-1.66)*    (0.47)   (-11.83)***   (0.65)     (0.21)

Note: ***, ** and * significant at the 1, 5 and 10% thresholds. N = number of observations.

Our results suggest that leverage has a negative influence on performance: ROI (coefficient of -0.09, significant at the 1% threshold) and ROE (coefficient of -0.03, also significant at the 1% threshold). These results are in line with those of Opler, Titman (1994), who note a negative and significant influence of leverage on performance, but they are not consistent with the 'free cash flow' hypothesis of Jensen (1986), for whom leverage, and more especially its excess, is a means to improve performance. Our H2b hypothesis is therefore verified, whereas the H2a hypothesis is not. The security turnover rate, a variable of external discipline, positively influences performance (ROI and ROE), which supports our H3 hypothesis. Nevertheless, the results are not significant except for the ROI (coefficient of 0.00, significant at the 1% threshold). These results are in line with those of Titman, Wessels (1988). Indeed, these authors show that a considerable volume of transactions reflects the discipline of the financial market. It is a means of putting pressure on management to act in accordance with the interests of investors.
Finally, shareholder concentration and size appear to have a positive influence on economic and financial performance (H1 verified), but the results (coefficients of 0.02 and 0.01) are not significant.

CONCLUSION

Our objective, in this paper, was to understand more clearly the relationship
between ownership structure and performance.
The contributions of this article are both economic and methodological.
With regard to the methodological aspect, this work illustrates the utility of SOM. The strong points of these methods are the following: in particular, they enable the analysis of data whose distribution is not normal, and they are able to detect non-linear links between variables. The Kohonen algorithm allows individuals to be grouped into homogeneous groups.
Several results appear:
Firstly, our results suggest, from a sample of 136 firms, that the relationship
between structure of Board of Directors and performance is non-linear.

Furthermore, one can observe that the least effective firms are those in which the
proportion of outside directors is the highest. Secondly, leverage and stock turnover are the main explanatory variables of performance. While leverage has a negative influence on performance (Opler, Titman 1994) (result significant at the 1% threshold), stock turnover has a beneficial impact (Charreaux 1997). Thirdly, variables of concentration of ownership structure and size have a positive, but not significant, influence on performance.
These results lead us to question several points. The first is the measurement
of the shareholders' concentration. The percentage of the capital (held by the
main shareholders) is probably different from the percentage of voting rights.
Unfortunately, we know of no French database with this accurate information.
The second important point in the ownership structure is the presence of
institutional investors among shareholders. These shareholders can force
managers to create value. Taking into account these elements in the future will
allow us to more clearly understand the influence of ownership structure on
performance.

ACKNOWLEDGEMENTS

The author thanks Fany Declerck, Eric De Bodt and the anonymous referees
for their comments.

APPENDIX 1
Many traditional¹² methods presuppose strong hypotheses, in particular the assumption of normality. To test this, we examined the distribution of the ratios. Using the Kolmogorov test, we were able to test normality. Our results suggest that our ratios do not have a normal distribution and that there are extreme values, which requires the use of qualitative data. This non-normality and the presence of extreme values led us to cluster our individuals into 4 classes. Hence, we first transformed each characteristic Xi (that is to say the performance variables ROI and ROE, and the composition of the Board of Directors, OUTDIR) into 4 categories (very strong, strong, weak, very weak) and secondly transformed these categories into binary variables. The table of values of X thus turned into a table of N rows (in our case the 136 individuals) and 12 binary columns (each variable has 4 categories).
Thus we used a specific kind of self-organized map (SOM) called the Kohonen map¹³. The Kohonen algorithm¹⁴ is a well-known unsupervised learning algorithm which produces a map composed of a fixed number of units (figs. 1, 2, 3, 4 and 5 present a one-dimensional map, frequently called a string). Each unit has a specific position on the map and is associated with an n-dimensional vector Wi (which will define its position in the input space), n being the number of dimensions of the input space. Moreover, a physical neighborhood relation between the units is defined. Units 1 and 3 are neighbors of Unit 2 and, for each unit i, Vr(i) represents the neighborhood of radius r centered at i.

After learning, each unit represents a group of individuals with similar features. The
correspondence between the individuals and the units more or less respects the input space
topology: individuals with similar features correspond to the same unit or to neighboring units. The
final map is said to be a self-organized map that preserves the topology of the input space.
The learning algorithm takes the following form:
- at time 0, Wi(0) is randomly defined for each unit i;
- at time t, we present a vector x(t) randomly chosen according to the input density f, and we determine the winning unit i* which minimizes the Euclidean distance between x(t) and Wi(t). We then modify the Wi in order to move the weights of the winning unit i* and of its physical neighbors towards x(t), using the following relations:

Wi(t + 1) = Wi(t) + ε(t)·(x(t) − Wi(t))   for i ∈ Vr(t)(i*)   (Eq. 1)

Wi(t + 1) = Wi(t)   for the other i,   (Eq. 2)

where ε(t) is a small positive adaptation parameter, r(t) is the radius of Vr(t), and ε(t) and r(t) are progressively decreased during the learning¹⁵.
This is clearly a competitive kind of algorithm (each unit competes to be the closest to the presented individual) which performs two interesting tasks for data analysis:
1) clustering: each unit will be associated with a similar kind of individual, the Wi vector associated with the unit converging towards the mean profile of the associated individuals;
2) reduction in the number of dimensions: the (at least local) proximities between the units give us an idea of the proximities of the clusters of individuals in the input space.
A last remark concerning the neighbourhood: it is reduced progressively to end at the value 0 (only the winning unit is displaced). The Kohonen algorithm then turns into a vector quantization algorithm.
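
A minimal Python sketch of such a one-dimensional Kohonen string is given below. It follows the update rules (Eq. 1)-(Eq. 2) above, with a simple linear decrease of ε(t) and of the neighbourhood radius; the binary profiles are synthetic stand-ins for our 12 qualitative indicators, and all names are ours.

# Sketch: a one-dimensional Kohonen string with 4 units.
import numpy as np

rng = np.random.default_rng(2)
X = (rng.random((136, 12)) < 0.25).astype(float)   # illustrative 0/1 profiles

n_units, n_steps = 4, 5000
W = rng.random((n_units, 12))                      # W_i(0) random

for t in range(n_steps):
    x = X[rng.integers(len(X))]                    # draw x(t) from the sample
    winner = int(np.argmin(((W - x) ** 2).sum(axis=1)))
    eps = 0.5 * (1.0 - t / n_steps)                # adaptation parameter, decreasing
    radius = 1 if t < n_steps // 2 else 0          # neighbourhood radius, decreasing
    for i in range(max(0, winner - radius), min(n_units, winner + radius + 1)):
        W[i] += eps * (x - W[i])                   # Eq. 1; other units unchanged (Eq. 2)

# After learning, each observation is assigned to its closest unit (its class).
labels = np.array([int(np.argmin(((W - x) ** 2).sum(axis=1))) for x in X])
print(np.bincount(labels, minlength=n_units))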

APPENDIX 2
The panel nature of our data allowed us to use a panel data methodology for our empirical research. As Dormont (1989) states, this type of analysis presents clear advantages over cross-sectional or time-series studies. For instance, it can check for firm heterogeneity, and reduce collinearity among the variables that are considered. Moreover, this technique enabled us to eliminate the potential biases in the resulting estimates due to correlation between unobservable individual effects and the explanatory variables in the study. Our panel data may be represented as follows:

Y_it = X_it β + η_i + u_it

where Y is the dependent variable, X is a vector containing all explanatory variables, β is also a vector with the variable coefficients that we attempt to estimate, η_i denotes the unobservable individual specific effect that is time-invariant, and u_it is the random error, with i denoting firms (cross-section dimension) and t denoting years (time-series dimension).
A critical question in cross-section models is to identify whether the unobservable individual effects are fixed or random, that is, whether or not these effects are orthogonal to the exogenous variables considered. Usually, the individual effects are correlated with the independent variables, and as Dormont (1989) asserts, this generates biases in the least squares estimators. Notwithstanding, one of the main advantages of panel data models, like the one we used in this work, is that they give us the possibility of eliminating the cited biases.

To verify the character of the individual effects, Hausman's specification test is generally used. The null hypothesis can be written as follows: H0: cov(η_i, X_it) = 0. If we accept the null hypothesis, the individual effects are supposed to be random and we will have to apply generalized least squares (GLS) to our model with instrumental variable estimators. However, if we find that H0 is false, the individual effects are fixed and the GLS estimator is biased and inconsistent. In this latter case we will have to transform our original model, subtracting the average of the variables from it:

Y_it − Ȳ_i = (X_it − X̄_i) β + (u_it − ū_i)

With this new model, we can use ordinary least squares (OLS) to estimate its parameters. By doing so, the model will provide unbiased estimators.
The outcome of Hausman's specification test in our study enabled us to reject the hypothesis of correlation between the individual unobservable effects and the explanatory variables, and thereby the model of random effects was chosen.
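
The within transformation described above can be sketched as follows in Python; the tiny panel is synthetic and only illustrates the demeaning step, after which a Hausman test would compare the resulting "within" estimates with the random-effects (GLS) estimates.

# Sketch: the "within" (fixed-effects) estimator via firm-level demeaning and OLS.
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([["F1", "F2", "F3"], [1991, 1992, 1993]],
                                 names=["firm", "year"])
rng = np.random.default_rng(3)
panel = pd.DataFrame({"FIDEEQUI": rng.uniform(0.2, 2.0, 9),
                      "STR": rng.uniform(2, 40, 9)}, index=idx)
panel["ROI"] = -0.09 * panel.FIDEEQUI + 0.002 * panel.STR + rng.normal(0, 0.05, 9)

x_cols, y_col = ["FIDEEQUI", "STR"], "ROI"
# Within transformation: z_it - z_bar_i removes the firm-specific effect eta_i.
demeaned = panel - panel.groupby(level="firm").transform("mean")
beta, *_ = np.linalg.lstsq(demeaned[x_cols].to_numpy(),
                           demeaned[y_col].to_numpy(), rcond=None)
print(pd.Series(beta, index=x_cols))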

NOTES
1. M.C. Jensen (1986, p. 323-329) defines free cash flow as the cash available when all projects with NPV > 0 have been realized.
2. In E.I. Altman (1984), direct costs are only one element in the total costs of bankruptcy. Indirect bankruptcy costs are the lost profits that a firm can be expected to suffer due to a significant bankruptcy potential (e.g. loss of reputation).
3. SBF is a data set in which sectorial classification is given. We can note 7 industrial sectors:
energy, basic commodities, construction, consumer goods, car industry, other consumer goods,
food industry. With a Chi-square test, we reported no statistical difference.
4. The period has been chosen in reference to the Cadbury report (1992) and Vienot report (1995).
5. The link between economic (stakeholders' interests) and financial (shareholders' interests)
performance lies in the leverage effect. Leverage represents the influence of financial structure
on corporate performance.
6. We measured the ratio interests divided by total financial debt and noted no absurd value (that is
to say value outside the market conditions).
7. In particular, we checked if the equity value was always positive. All values were positive.
8. It is a rank test. Its justification is due to the non-normality of the data. Such tests are very robust. By ranking the different observations (i.e. by giving them a rank), one identifies the place of each observation in the sample. One substitutes the rank for the observation. Thus one neutralises problems concerning the accurate measurement of the value of every observation. We can also note that the results of rank tests are not altered by the distributions of the observations (symmetrical, non-symmetrical...).
9. A very good synthesis is presented by D. Gujarati, Basic Econometrics, Third Edition, McGraw-Hill, 499-539.
10. T. Kohonen, "Self-Organising Maps", Springer Series in Information Sciences Vol. 30, Springer, Berlin, 1995.
11. The outcome of Hausman's specification test in our study enables us to reject the choice of the model with fixed effects. The values are respectively 5.09 and 3.47 for ROI and ROE. These two values are lower than the value of the Chi-square at the 10% threshold (9.23). This result leads us to accept the model with random effects.
12. A traditional method is to use a multidimensional analysis. This analysis aims at structuring the sample studied in relation to the measured variables. Thus, we can visualise the existing relations between statistical variables. If we apply this method to qualitative variables, we carry out an MCA (Multiple Correspondence Analysis) (Volle 1981).
13. An extensive presentation can be found in T. Kohonen, "Self organized formation of topologically correct feature maps", Biological Cybernetics, 43, 1982, p. 59, or in T. Kohonen, "Self-Organizing Maps", Springer Series in Information Sciences Vol. 30, Springer, Berlin, 1995.
14. The Kohonen algorithm has led to some theoretical studies (Cottrell et al. 1994; Ritter, Schulten 1988), and to interesting applications in economics (and more specifically in finance). The self-organised map (SOM) deals with quantitative data. In our paper we adapted it to qualitative data.
15. For the stochastic algorithm, ε(t) must follow the requirements of Robbins-Monro.

REFERENCES
Agrawal A., Knoeber C.R. (1996), "Firm performance and mechanisms to control agency problems
between managers and shareholders", Journal of Financial and Quantitative Analysis, 31(3),
377-397.
Altman E.I. ( 1984), "A Further empirical investigation of the bankruptcy cost question", Journal of
Finance, 39(4), 1067-1089.
Bethel J.E., Liebeskind J. (1993), "The effects of Ownership and Corporate Restructuring",
Strategic Management Journal, 14, Summer, 15-31.
Bhagat S., Black B. (1998), "Board Independence and Long-Term Firm Performance", Working Paper, 143, The Center for Law and Economic Studies, Columbia Law School.
Charreaux G. (1997), Le gouvernement des entreprises: Corporate Governance, Theories et Faits, Economica.
Cottrell M., Fort J.C., Pages G. (1994), "Two or three things that we know about the Kohonen algorithm", in Proceedings of ESANN, Bruxelles, 235-244.
Cottrell M., Fort J.C., Pages G. (1998), "Theoretical Aspects of the SOM Algorithm",
NeuroComputing, 21 , 119-138.
Cottrell M., Letremy P, Roy E. (1993), "Analysing a Contingency Table with Kohonen Maps: a
Factorial Correspondence Analysis", in Proceedings of IWANN '93, Springer Verlag, 305-311.
Dormont B. (1989), Introduction a l'econometrie des donnees de panel: theorie et applications a
des echantillons d'entreprises, Monographies d'econometries, Editions du CNRS.
Eisenhardt K., Bourgeois L. (1988), "Politics of strategic decision making in high-velocity
environments: toward a midrange theory", Academy of Management Journal, 31 , 737-770.
Fama E.F., Jensen M.C. (1983a), "Separation of ownership and control", Journal of Law and
Economics, 26, 301-324.
Fort J.C., Pages G. (1995), About Kohonen algorithm: strong or weak self organization, Neural
Networks.
Hermalin B.E., Weisbach M.S. (1988), "The determinants of board composition", Rand Journal of
Economics, 19(4), 589-606.
Jensen M.C. (1986), "Agency costs of free cash flow, corporate finance and takeovers", American
Economic Review, 76, 323-329.
Jensen M.C. (1993), "The modern industrial revolution, exit and the failure of internal control systems", Journal of Finance, 48(3), 831-880.
Kohonen T. (1982), "Self organized formation of topologically correct feature maps", Biological Cybernetics, 43, 115-164.

Kohonen T. (1995), "Self Organizing Maps", Springer Series in Information Sciences, 30, 12-27.
Kosnik R. ( 1990), "Effects of board demography and directors' incentives on corporate greenmail
decisions", Academy of Management Journal, 33, 129-151.
Maati J. (1998), "Le conseil d'administration: outil de controle et d'ordonnancement social des
firmes en France", Colloque de !'Association Franyaise Financiere (AFFI).
Mace L. (1986), Directors, myth and reality, Harvard Business School Press, Boston.
Malecot J.-F. (1984), "Theorie financiere et couts de faillite", PhD Thesis, France.
McConnell J.J., Servaes H. (1990), "Additional evidence on equity ownership and corporate value", Journal of Financial Economics, 27, 595-612.
Modigliani F., Miller M. (1963), "Corporate income taxes and the cost of capital: a correction", American Economic Review, 53, 433-443.
Morck R., Shleifer A., Vishny R.W. (1988), "Management Ownership and Market Valuation", Journal of Financial Economics, 20, 293-315.
Opler T., Titman S. (1994), "Financial Distress and Corporate Performance", Journal of Finance,
49(3), 1015-1040.
Patton A. Baker J. (1987), "Why do not directors rock the board?", Harvard Business Review, 65,
10-12.
Ritter H., Schulten K. (1988), "Convergence Properties of Kohonen's Topology Conserving Maps: Fluctuations, Stability and Dimension Selection", Biological Cybernetics, 60, 59-71.
Robbins H., Monro S. (1951), "A Stochastic Approximation Method", Annals of Mathematical Statistics, 22, 400-407.
Rosenstein S., Wyatt J.G. (1990), "Outside directors, board independence and shareholder wealth",
Journal of Financial Economics, 26, 175-191.
Rosenstein S., Wyatt J.G. (1997), "Inside directors, board effectiveness and shareholder wealth",
Journal ofFinancial Economics, 44, 229-250.
Sevestre P. (1992), "L'econometrie sur donnees individuelles-temporelles: une note introductive",
INSEE, 9204.
Shleifer A., Vishny R.W. (1986), "Large shareholders and corporate control", Journal of Political
Economy, 94,461-479.
Short H., Keasey K. ( 1999), "Managerial Ownership and the performance of firms : evidence from
the UK", Journal of Corporate Finance, 5, 79-101.
Sundaramuthy C., Mahoney J., Mahoney J. (1997), "Board structure, antitakeover provisions and
stockholder wealth", Strategic Management Journal, 18, 3, 231-245.
Titman S., Wessels R. (1988), "The determinants of capital structure choice", Journal of Finance,
43(1), 1-19.
Vancil R. (1987), Passing the baton: managing the process of CEO succession, Harvard Business
School Press, Boston.
Volle M. (1981), Analyse des données, Economica, 2nd ed., Paris.
Weisbach M.S. (1988), "Outside directors and CEO turnover", Journal of Financial Economics,
20,431-460.
Wruck K.H. (1990), "Financial distress, reorganization, and organizational efficiency", Journal of
Financial Economics, 27,419-444.
Chapter 10

Approximation by radial basis function networks


Application to Option Pricing

Amaury LENDASSEa, John LEEb, Eric DE BODTc, Vincent WERTZa,
Michel VERLEYSENb

aUniversite catholique de Louvain, CESAME, 4 av. G. Lemaitre, B-1348 Louvain-la-Neuve,
Belgium, {lendasse, wertz}@auto.ucl.ac.be; bUniversite catholique de Louvain,
DICE, 3 pl. du Levant, B-1348 Louvain-la-Neuve, Belgium, {lee, verleysen}@dice.ucl.ac.be;
cUniversite catholique de Louvain, IAG, 1 pl. des Doyens, B-1348 Louvain-la-Neuve, Belgium,
debodt@fin.ucl.ac.be.

Abstract: We propose a method of function approximation by radial basis function networks.
We will demonstrate that this approximation method can be improved by a pre-
treatment of the data based on a linear model. The approximation method is then
applied to option pricing. This choice is justified by the known nonlinear behaviour
of option prices and by the effective contribution of the proposed pre-treatment to the
implementation of radial basis function networks in this field.

Key words: Option pricing, Artificial Neural Networks, Radial-Basis Function Networks, Non-
linear approximation, Preprocessing techniques

INTRODUCTION

The approximation of functions is one of the most general uses of artificial
neural networks. The general framework of the approximation problem is the
following. One supposes the existence of a relation between several input
variables and one output variable. This relation being unknown, one tries to build
an approximator (black box model) between these inputs and this output. The
structure of this approximator must be chosen and the approximator must be
calibrated so as to best represent the input-output dependence. To carry out these
different stages, one has at one's disposal a set of input-output pairs that constitute
the learning data of the approximator.


The most common type of approximator is the linear approximator. It has the
advantage of being simple and cheap in terms of computation load, but it is
obviously not reliable if the true relation between the inputs and the output is
nonlinear. One then has to rely on nonlinear approximators such as artificial neural
networks.
The most popular artificial neural networks are the multilayer perceptrons
(MLP) developed by Werbos (1974) and Rumelhart (1986). In this chapter, we
will use another type of neural networks: the radial basis function networks (or
RBFN) (Powell 1987). These networks have the advantage of being much
simpler than the perceptrons while keeping the major property of universal
approximation of functions (Poggio, Girosi 1987). Numerous techniques have
been developed for RBFN learning. The technique that we have chosen has been
developed by Verleysen and Hlavackova (1994). This technique is undoubtedly
one of the simplest ones but it gives very good results. The RBFN and the
learning technique chosen will be presented in section 1.
We will demonstrate that the results obtained with RBFN can be improved by
a specific pre-treatment of the inputs. This pre-treatment technique is based on
linear models. It does not complicate the RBFN learning but yields very good
results. The pre-treatment technique will be presented in section 2.
These different techniques will be applied to option pricing. This problem has
been successfully handled, for instance, by Hutchinson, Lo and Poggio (1994), a
work that has certainly contributed widely to giving credibility to the use of artificial
neural networks in finance. The existence of a chapter dedicated to neural
networks in the book of Campbell, Lo and MacKinlay (1997) sufficiently attests
to it. Hutchinson et al., using notably simulated data, have demonstrated that
RBFN allow options to be priced, and also hedged portfolios to be formed. The choice
the authors made of the determination of a call option price as the application
domain of neural networks in finance is certainly not an accident. Financial
derivative assets are indeed characterized by the nonlinear relation that
links their prices to the prices of the underlying assets. The results that we obtain
are comparable to those of Hutchinson et al., but with a simplified learning
process. We will demonstrate with this example the advantages of our technique
of data pre-treatment. This example is handled in detail in section 3.

1. APPROXIMATION BY RBFN

We have at our disposal a set of inputs x_t and a set of outputs y_t. The approximation of
y_t by a RBFN will be denoted ŷ_t. This approximation is the weighted sum of m
Gaussian kernels Φ:
ŷ_t = Σ_{i=1}^{m} λ_i Φ(x_t, C_i, σ_i),    t = 1 to N,    (Eq. 1)

with

Φ(x_t, C_i, σ_i) = exp( -||x_t - C_i||² / (2σ_i²) ).    (Eq. 2)

The RBFN is illustrated in Figure 1.


The complexity of a RBFN is determined by the number of Gaussian kernels.
The different parameters to specify are the positions of the Gaussian kernels (C_i),
their variances (σ_i), and the multiplicative factors (λ_i). The technique that allows
determining them is developed in detail in Verleysen, Hlavackova (1994). We
will explain it briefly.
The position of the Gaussian kernels is chosen according to the distribution of
x_t in space. At locations where there are few inputs x_t, few nodes will be placed
and conversely, a lot of nodes will be placed where there are many input data.
The technique that allows realizing this operation is called vector quantization
and the points that summarize the position of the nodes are called centroids. The
vector quantization is composed of two stages. The centroids are first randomly
initialized in the space. They are then placed in the following way. All x_t points
are inspected, and for each of them the closest centroid will be moved in the
direction of x_t according to the following formula:

C_i ← C_i + α (x_t - C_i)    (Eq. 3)

with x_t the considered point, C_i the closest centroid to x_t, and α a parameter that
decreases with time. Further details on vector quantization methods can be found
in Kohonen (1995) and Gray (1984).
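As a concrete illustration of this competitive-learning step, a minimal Python sketch of (Eq. 3) is given below; the function name, the learning-rate schedule and the random initialization are our own choices for illustration, not those of the original implementation.

import numpy as np

def vector_quantization(X, m, n_epochs=20, a0=0.5, seed=0):
    """Competitive-learning VQ: move the closest centroid towards each point (Eq. 3).
    X: (N, n) data matrix; m: number of centroids."""
    rng = np.random.default_rng(seed)
    # random initialization of the centroids among the data points
    C = X[rng.choice(len(X), size=m, replace=False)].astype(float)
    t, total = 0, n_epochs * len(X)
    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            alpha = a0 * (1.0 - t / total)                  # parameter decreasing with time
            i = np.argmin(np.sum((C - x) ** 2, axis=1))     # closest centroid
            C[i] += alpha * (x - C[i])                      # update of Eq. 3
            t += 1
    return C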
Figure 1. Representation of a RBFN

The second parameter to be chosen is the standard deviation (or width) of the
different Gaussian kernels (σ_i). We chose to work with a different width for each
node. To estimate them, we define the Voronoï zone of a centroid as the region of
space that is closer to this centroid than to any other centroid. In each one of
these Voronoï zones, the variance of the points belonging to that zone is
calculated. The width of a Gaussian kernel is the variance in
the Voronoï zone where the node is located, multiplied by a factor k. We will
explain in our application how to choose this parameter (Benoudjit, Archambeau,
Lendasse et al. 2002). This method has several advantages, the most important
being that the Gaussian kernels better cover the space of the RBFN inputs.
The last parameters to determine are the multiplicative factors λ_i. When all
other parameters are defined, these are determined by solving a system of
linear equations.
The total number of parameters equals m(n+1)+1, with n being the
dimension of the input space and m being the number of Gaussian kernels used
in the RBFN.
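The two remaining calibration steps can be sketched as follows (Python, with hypothetical function names): the widths are taken as the Voronoï-zone variances multiplied by k (used here as the squared width, a simplifying assumption), and the weights λ_i are obtained by linear least squares. This is an illustration under these assumptions, not the authors' code.

import numpy as np

def fit_rbfn(X, y, C, k=4.0):
    """Set kernel widths from the Voronoi-zone variances (times k) and solve
    a linear system for the weights lambda. Sketch only."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)   # squared distances to centroids
    zone = d2.argmin(axis=1)                                  # Voronoi assignment
    var = np.array([X[zone == i].var() if (zone == i).any() else 1.0
                    for i in range(len(C))])                  # pooled variance per zone
    sigma2 = k * var                                          # kernel widths
    Phi = np.exp(-d2 / (2.0 * sigma2[None, :]))               # Gaussian kernels (Eq. 2)
    lam, *_ = np.linalg.lstsq(Phi, y, rcond=None)             # weights of Eq. 1
    return sigma2, lam

def predict_rbfn(X, C, sigma2, lam):
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma2[None, :])) @ lam        # Eq. 1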

2. RBFN WITH WEIGHTED INPUTS

One of the disadvantages of the RBFN that we have presented is that they
give an equal importance to all input variables. This is not the case with other
approximators of functions such as the MLP. We will try to eliminate this
disadvantage without penalizing the parameter estimation process of the RBFN.
Let's suppose first that all inputs are normalized. We understand by this that
they all have zero mean and unit variance. If we build a linear model between the
inputs and the output, the latter will be approximated by a weighted sum of the
different inputs. The weighting associated with each input determines the
importance that this input has on the approximation of the output. Indeed, if one
differentiates the linear model with respect to the different inputs, one recovers
these very same weightings. This is illustrated in the following example:

ŷ = w_1 x_1 + w_2 x_2    (Eq. 4)

which yields:

∂ŷ/∂x_1 = w_1    (Eq. 5)

∂ŷ/∂x_2 = w_2    (Eq. 6)

We thus have a very simple means of determining the relative importance that
the different inputs have on the output.
We will then multiply the different normalized inputs by the weighting factors
obtained from the linear model. These new inputs will be used in a RBFN such as
the one we presented in the previous section. This new RBFN that we will
qualify as «weighted», will thus give a different importance to the different input
variables.
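A minimal sketch of this pre-treatment, assuming ordinary least squares for the linear model (the function name and the handling of the intercept are our own choices):

import numpy as np

def weight_inputs(X, y):
    """Section 2 pre-treatment: normalize the inputs, fit a linear model,
    and rescale each normalized input by its linear coefficient (sketch)."""
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)        # zero mean, unit variance
    A = np.column_stack([Xn, np.ones(len(Xn))])      # add an intercept term
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w = coef[:-1]                                    # one weighting factor per input
    return Xn * w, w                                 # weighted inputs for the RBFN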

3. OPTION PRICING

The initial success of neural networks in finance has most surely been
motivated by the numerous applications presented in the field of asset price
prediction (Cottrell, de Bodt, Levasseur (1996) present a wide synthesis of the
results obtained in this field). The emergence of nonlinear forecasting tools and
their universal approximation properties, obviously not well understood, brought
in new hopes. Quickly though, it appeared that forecasting the price of assets
remains an extremely complex problem, and that the concept of financial markets
informational efficiency introduced by Fama (1965) is no idle notion: to
outperform financial markets, after having taken account of the transaction costs
and the level of risk taken, is not simple.
The application studied in this section, the modeling of the behavior of the
price of a call option, as developed by Hutchinson, Lo and Poggio (1994 ),
presents a typical case of application of neural networks in finance. The prices of
the derivatives depend nonlinearly on the price of the underlying assets. Major
advances have been introduced in finance to set up analytical evaluation formulas
for assets derivatives. The most famous is undoubtedly the one established by
Black and Scholes (1973), daily used nowadays by the majority of financial
operators. Evaluation formulas of options prices are based on very strict
assumptions among which, for example, the fact that the actions prices follow a
geometric Brownian motion. The fact that these assumptions are not strictly
verified in practice explains that the prices observed on financial markets deviate
more or less significantly from the theoretical prices. In this context, to dispose
of a universal function approximator, capable of capturing the nonlinear relation
that links an option price to the price of its underlying asset, but that does not rely
on the assumptions necessary for the setting up of analytic formulas, presents an
obvious interest. It is though necessary that the proposed tool be reliable and
robust for it to be adopted by the financial community. This is indeed our major
concern.

3.1 Generating data

The RBFN with weighted inputs has been tested on an example of


determination of a call option price. This example has been handled by
Hutchinson, Lo and Poggio (1994), and we will use the same method of
generation of data.
To generate their data, the authors use in their article the Black and Scholes
formula (1973) in order to simulate the call option prices. This formula is the
following:

C(t) = S(t) Φ(d_1) - X e^{-r(T-t)} Φ(d_2)    (Eq. 7)

with

d_1 = [ln(S(t)/X) + (r + σ²/2)(T-t)] / (σ √(T-t))    (Eq. 8)

and

d_2 = d_1 - σ √(T-t)    (Eq. 9)

In the above formulas, C(t) is the option price, S(t) the stock price, X the strike
price, r the risk-free interest rate, T-t the time-to-maturity, σ the volatility and Φ
is the standard normal distribution function. If r and σ are stable, which is the
case in our simulations, the price of the call option will only be a function of S(t), X
and T-t. The approximation type that has been chosen is the following:

C(t)/X = f( S(t)/X, T-t )    (Eq. 10)

For our simulation, the prices of the underlying stock during a period of two years are
generated, in a classical way, by the following formula:

S(t) = S(0) exp( Σ_{i=1}^{t} Z_i )    (Eq. 11)

taking the number of working days per year equal to 253, and Z_i a random
variable drawn from a normal distribution with mean μ = 0.10/253 and variance
σ² = 0.04/253. The value S(0) equals 50 US$.
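For illustration, the data-generating step of (Eq. 7)-(Eq. 11) can be sketched in Python as follows; scipy is assumed for the normal distribution function, the parameter values are those quoted above, and everything else (names, seed) is our own choice.

import numpy as np
from scipy.stats import norm

def bs_call(S, X, r, T_t, sigma):
    """Black and Scholes call price, Eq. 7-9."""
    d1 = (np.log(S / X) + (r + 0.5 * sigma**2) * T_t) / (sigma * np.sqrt(T_t))
    d2 = d1 - sigma * np.sqrt(T_t)
    return S * norm.cdf(d1) - X * np.exp(-r * T_t) * norm.cdf(d2)

def simulate_path(S0=50.0, days=2 * 253, mu=0.10 / 253, var=0.04 / 253, seed=0):
    """Stock path of Eq. 11: S(t) = S(0) * exp(sum of the Z_i)."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(mu, np.sqrt(var), size=days)
    return S0 * np.exp(np.cumsum(Z))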
The strike price X and the time-to-maturity T-t are determined by the rules of
the «Chicago Board Options Exchange» (CBOE) (Hull 1993). In short, the rules
are the following:

1. The strike price is a multiple of 5$ for the stock prices between 25 and 200$;
2. The two closest strike prices to the stock prices are used at each expiration of
options;
3. A third strike price is used when the stock price is too close to the strike price
(less than one dollar);
4. Four expiration dates are used: the end of the current month, the end of the
next month and the end of the next two semesters.

A typical trajectory obtained by the application of these rules is represented in
Figure 2.
Figure 2. Typical trajectory derived from CBOE's rules

Note: The continuous line represents the stock price. The oblique lines represent the different
strike prices. These are represented obliquely to make visible the different introduction and
expiration dates.

The call option prices obtained using these trajectories are represented in
Figure 3.

Figure 3. Option purchase prices obtained by using the simulated trajectories and the Black and
Scholes formula (axes: S/X and T-t).
3.2 Performance measures

Three performance measures will be used, as in Hutchinson, Lo, Poggio
(1994). The first measure is the determination coefficient R² between C and Ĉ. The
two other performance measures are the tracking error ξ and the prediction error
η. These errors are defined as follows:

(Eq. 12)

η = e^{rT} √( E²[V(T)] + Var[V(T)] )    (Eq. 13)

V(t) = V_S(t) + V_B(t) + V_C(t)    (Eq. 14)

with V(t) being the portfolio value at time t, V_S the stock value, V_B the bond
value, and V_C the option value. If the option price is correctly evaluated, V(t)
should at any time be equal to zero, given that it is a fully hedged portfolio. The
more the tracking error (ξ) deviates from 0, the more the option price thus deviates
from its theoretical value. The prediction error is based on the classical
formula of variance decomposition (the variance is equal to the difference
between the expectation of the squared variable and its squared expectation). The
expected squared V(T), in other words the average quadratic prediction error,
thus equals the sum of its squared expectation and its variance. The terms e^{rT}
represent the discounting terms in continuous time, allowing the addition of
results obtained at different moments in time. A more detailed explanation of
these criteria can be found in Hutchinson, Lo, Poggio (1994).
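As an illustration only (Eq. 12 is not reproduced here), the determination coefficient and the prediction error of (Eq. 13) could be estimated on a sample of terminal hedged-portfolio values as follows; the function names are our own.

import numpy as np

def r_squared(C_true, C_hat):
    """Determination coefficient between the true and approximated option prices."""
    ss_res = np.sum((C_true - C_hat) ** 2)
    ss_tot = np.sum((C_true - C_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def prediction_error(V_T, r, T):
    """Eta of Eq. 13: e^{rT} * sqrt(E^2[V(T)] + Var[V(T)]), estimated on a
    sample of terminal portfolio values V(T)."""
    return np.exp(r * T) * np.sqrt(np.mean(V_T) ** 2 + np.var(V_T))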

3.3 Results

In order to measure the quality of the results obtained by classical and


weighted RBFN, we have simulated a price sample of a duration of 6 months
(using formulas (Eq. 7) and (Eq. 11)). Two RBFN are calibrated on these data: a
classical RBFN and a weighted RBFN. The number of Gaussian kernels is 6.
This corresponds in fact to 19 parameters per RBFN, which is roughly equivalent
to the 20-parameter RBFN used in Hutchinson, Lo, Poggio (1994).
Then, one hundred test sets are generated (using the same formulas), and for
each of the two RBFN the coefficient R² is calculated. The values of ξ and η
obtained for the two RBFN and for the exact Black and Scholes formula are also
calculated.
The results obtained for R² (averaged over the one hundred test sets) are
presented in Figure 4, as a function of k, the coefficient used to compute the
width of the nodes. The value of k to be used is chosen as the smallest value
giving a result (in terms of R2) close to the asymptote, that is to say a value that
can be found in the elbow of the curves in Figure 4. The value of k = 4 has been
chosen in this case.


Figure 4. Value of R2 as a function of the coefficient k of the RBFN.

Note: Dotted line: classical RBFN; solid line: weighted RBFN .

The benefit of weighting is obvious. The R² obtained exceeds 97%, which is
equivalent to the results in Hutchinson, Lo, Poggio (1994), while using a RBFN
with a much simpler learning process.
The results obtained for ξ and η are also in favour of the weighted RBFN.
Table 1 presents the average values and the standard deviations of R², ξ and η for
both types of RBFN. As for the performance measures for the exact Black and
Scholes formula, we have ξ = 0.57 and η = 0.85.
Table 1. Average values and standard deviations of R², ξ and η for both types of RBFN.

                    R²             ξ              η
classical RBFN   0.93 ± 0.10   1.80 ± 0.53    2.03 ± 0.57
weighted RBFN    0.97 ± 0.02   1.24 ± 0.50    1.50 ± 0.53

CONCLUSION

In this paper, we have presented a simple method to parameterise a RBFN.


We have then proposed an improvement to this classical RBFN. This
improvement consists in the weighting of inputs by the coefficients obtained
through a linear model. These methods have then been tested for the
determination of the price of a call option. The results that we have obtained
show a clear advantage of the weighted RBFN whatever the performance
measure used. In addition, in the example used, the results are comparable to the
best RBFN or multilayer perceptrons that can be found in the literature. The
advantages of this weighted RBFN are thus simplicity of parameterisation and
quality of approximation.

ACKNOWLEDGMENTS

Michel Verleysen is Senior research associate at the Belgian Fonds National


de la Recherche Scientifique (FNRS). The work of John Lee has been realized
with the support of the Ministere de la Region wallonne, in the framework of the
Programme de Formation et d'Impulsion a la Recherche Scientifique et
Technologique. The work of A. Lendasse and V. Wertz is supported by the
Interuniversity Attraction Poles (lAP), initiated by the Belgian Federal State,
Ministry of Sciences, Technologies and Culture. The scientific responsibility
rests with the authors.

REFERENCES
Benoudjit N., Archambeau C., Lendasse A., Lee J., Verleysen M. (2002), "Width optimization of
the Gaussian kernels in Radial Basis Function Networks", ESANN 2002, European Symposium
on Artificial Neural Networks, Bruges (Belgium), 425-432.
Black F., Scholes M. (1973), "The pricing of options and corporate liabilities", Journal of Political
Economy, 81, 637-659.
Campbell J.Y., Lo A., MacKinlay A. (1997), The Econometrics of Financial Markets, Princeton
University Press, Princeton.
214 Chapter 10

Cottrell M., de Bodt E., Levasseur M. (1996), "Les réseaux de neurones en finance: Principes et
revue de la littérature", Finance, 16, 25-92.
Fama E. (1965), "The Behaviour of Stock Market Prices", Journal of Business, 38, 34-105.
Gray R. (1984), "Vector Quantization", IEEE Mag., 1, 4-29.
Hull J. (1993), Options, Futures, and Other Derivative Securities, 2nd ed., Prentice-Hall,
Englewood Cliffs, New Jersey.
Hutchinson J., Lo A., Poggio T. (1994), "A Nonparametric Approach to Pricing and Hedging
Securities Via Learning Networks", The Journal of Finance, XLIX(3).
Kohonen T. (1995), "Self-organising Maps", Springer Series in Information Sciences, 30, Springer,
Berlin.
Poggio T., Girosi F. (1987), "Networks for approximation and learning", Proceedings of IEEE 87,
1481-1497.
Powell M. (1987), "Radial basis functions for multivariable interpolation: A review", in Mason
J.C., Cox M.G. (eds.), Algorithms for Approximation, 143-167.
Rumelhart D., Hinton G., Williams R. (1986), "Learning representations by back-propagating
errors", Nature, 323, 533-536.
Verleysen M., Hlavackova K. (1994), "An Optimised RBF Network for Approximation of
Functions", ESANN /994, European Symposium on Artificial Neural Networks, Brussels
(Belgium), 175-180.
Werbos P. (1974), "Beyond regression: new tools for prediction and analysis in the behavioral
sciences", PhD thesis, Harvard University.
Chapter 11

The dynamics of the term structure of interest rates:


an Independent Component Analysis

Franck MORAUXa, Christophe VILLAb

aUniversity of Rennes I, CREREG, franck.moraux@univ-rennes1.fr; bCREST-ENSAI and
CREREG - University of Rennes I

Abstract: The movements of a term structure of interest rates are commonly assumed to be
driven by a small number of uncorrelated factors. Identified with the level, the slope,
and the curvature, these factors are routinely obtained by a Principal Component
Analysis (PCA) of historical bond prices (interest rates). In this paper, we focus on
Independent Component Analysis (ICA). The central assumption here is that
observed multivariate time series reflect the reaction of a system to some (few)
statistically independent time series. ICA seeks to extract independent
components (ICs) as well as the mixing process. Both ICA and PCA are linear
transforms of the observed series. But, whereas PCA obtains uncorrelated
(principal) components, ICA provides statistically independent components. In
contrast to PCA algorithms that use only second order statistical information, ICA
algorithms (like JADE) exploit higher order statistical information for separating the
signals. This approach is required when financial data are suspected to be non-
gaussian.

Key words: Term Structure of Interest Rates, Principal Component Analysis, Independent
Component Analysis.

INTRODUCTION

It is now well established that factor analysis is adequate for financial risk
management purposes. Factor or Principal Component analysis (FA, PCA) are
celebrated statistical methods to extract out factors from data and to measure the
way each factor affects or loads on variables. Such analyses are nowadays widely
used to decompose the dynamics of the term structure of interest rates into a few
underlying components. Among other things, it indicates that interest rate models
should not be so parsimonious as to have only a single underlying source of
uncertainty. But hedging fixed income securities remains straightforward by
introducing the so-called factor durations. Analogous to the standard Macaulay
duration, they are easy to compute. In addition, factor durations of a portfolio are
the weighted averages of the portfolio components factor durations.
As explained by Bliss (1997), economic variables that may affect interest
rates dynamics include (along with innumerable others) the supply and demand
for loans, announcements of unemployment and inflation, and changes in market
participants risk aversion arising from perceived changes in the prospects for
continued economic growth. The key assumption of a factor model is that this
multitude of influences can be compactly summarized by a few variables, called
factors, that capture the changes in the underlying determinants of interest rates.
Part of the process of performing a factor model is to examine just how
reasonable that assumption is. The factor shocks are not the fundamental causes
of changes in the term structure; rather, they are sufficient statistics for fully
capturing the underlying economic shocks that do cause the changes.
Rather than PCA or FA, this paper applies Independent Component Analysis
(ICA) to extract a structure from bond returns (see Back and Weigend (1997) for
stock returns and Ané and Labidi (2001) for implied volatilities), because this
method is an alternative approach for finding underlying factors or components
from multivariate statistical data. ICA must be contrasted with the classical
Principal Component Analysis (hereafter PCA). Both ICA and PCA linearly
transform the observed signals into components. The key difference, however, is
in the type of components obtained.
The goal of PCA is to obtain components which are uncorrelated¹. PCA
algorithms therefore use only second order statistical information. According to
several papers, e.g. Bliss (1997), the three-factor decomposition of the
movements in interest rates uncovered by Litterman and Scheinkman (1991) is
robust. The ability of the three factors (level, slope, and curvature) to explain
observed changes in interest rates is high in all subperiods studied. ICA find
components which are statistically independent. Independence being a much
stronger property than uncorrelatedness, PCA and ICA may generate different
components. Several algorithms are nowadays available. Recent contributions to
ICA lie in the neural network literature. Most of the ones found in the signal
processing literature rather exploit the algebraic structure of higher order
moments of the observations. Cardoso (1999) pointed out that this last approach
is largely ignored by the neural network community involved in ICA. He corrects
this view by showing how higher-order correlations can be efficiently exploited
to reveal independent components.
The empirical study carried out in this paper uses the JADE² algorithm
(Cardoso, Souloumiac 1993). Like many other ICA algorithms, this algorithm
follows a two-stage procedure termed Whitening and Rotation. The first stage is
performed by doing a classical PCA to whiten the observed data. The second
stage consists of finding a rotation matrix which jointly diagonalizes
eigenmatrices formed from the fourth order cumulants of the whitened data. The
outputs from this stage are the independent components. Thus one can say that a
PC Analysis solves half of the problem of IC Analysis.
This paper is organized in the following way. Section 1 provides a
background to ICA and a guide to some of the algorithms available. Our specific
experimental results for the application of ICA to US bond data are given in
section 2.

1. THE GENERAL FRAMEWORK

The goal of ICA is to recover unknown source signals from their unknown
linear mixtures using the strong assumption that the sources are mutually
independent. The mutual statistical independence of the original signals is thus
the only assumption in such methods.

1.1 Motivation and Definition

The most famous example is the so called cocktail-party problem: Imagine


you are in a room where two people are speaking simultaneously. You have two
microphones, which you hold in different locations. The microphones give you
two recorded time signals: x_1(t) and x_2(t), with x_1 and x_2 the amplitudes, and t the
time index. Each of these recorded signals is a weighted sum of the speech
signals emitted by the two speakers s_1(t) and s_2(t). We could express this situation
as a linear equation:

x_1(t) = a_11 s_1(t) + a_12 s_2(t)
x_2(t) = a_21 s_1(t) + a_22 s_2(t)

where a_11, a_12, a_21 and a_22 depend on the distances of the microphones from the


speakers. It would be very useful if you could now estimate the two original
speech signals s_1(t) and s_2(t), using only the recorded signals x_1(t) and x_2(t). If we
knew the a_ij we could solve this linear equation by classical methods. If not, the
problem is considerably more difficult.
To rigorously define ICA, we can use a statistical latent variables model. To
this end, let's denote s(t) = (s_1(t), s_2(t), ..., s_n(t))' the underlying independent factors
and x(t) = (x_1(t), x_2(t), ..., x_m(t))' the variables observed at date t. In the
standard ICA literature s is called the source signal and x is the mixed signal.
While this is not completely necessary, the number of independent components is
supposed to be equal to the number of observed variables, m = n. In addition, and
without loss of generality, the observed variables are supposed to be centered³.
Assuming a linear relationship between x and s implies with a vector-matrix
notation that the data model is given by:

x(t) = A s(t)

where A is the so-called mixing matrix. It is an unknown (n x n) real matrix.
Omitting t, one also has x = As. Denoting a_j the columns of matrix A, the model becomes:

x = Σ_{j=1}^{n} a_j s_j

The ICA model is a generative model in the sense that it describes how the
observed data are generated by mixing underlying components4 . Such
independent components are latent variables since they are not directly
observable; recall however that the mixing matrix is unknown. Because one
observes only the random vector x, both A and s must be estimated. This must be
done under as general assumptions as possible. Throughout the paper, the main
assumptions are:
- The sources are statistically independent;
- At most one source is normally distributed;
- The signals are stationary.

In addition, minor ones are necessary. The number of variables must of


course be greater than or equal to the number of sources. Whereas sources are
mutually independent at each time instant, no other time constraint is required or
relevant. Finally, there is no measurement error.
The ICA task consists of estimating both the matrix A and the underlying factors s
when we only observe x. If the coefficients of A were known and different
enough to make the matrix A invertible, B = A^{-1} could be computed. The
independent factors we are looking for would then be given by:

s(t) = B x(t)

However, as the only thing we observe is x, only the linear transformation:

y(t) = W x(t)

may be considered, where y(t) = (y_1(t), y_2(t), ..., y_n(t))' is a recovered signal. As this
signal is also related to the underlying factors, one has y(t) = WAs(t). If WA = I
then y = s and perfect separation occurs. The problem is finding W, the unmixing
matrix that gives y = Wx - the best estimate of the independent source vector. If
the unknown mixing matrix A is square and nonsingular, then W = A^{-1} and s = y.
Else, the best unmixing matrix, which separates sources as independent as
possible, may be given by a pseudo-inverse matrix (such as the Moore-Penrose
generalized inverse).
Solutions can be found just by considering the statistical independence of s. In
fact, if signals are not gaussian, standard results tell us that it is enough to find
coefficients for W such that the recovered signals (y_1(t), y_2(t), ..., y_n(t)) are
statistically independent. Once they are, the recovered signal y is equal to the
original signals s⁵. The separation problem thus boils down to the search for a
linear representation in which the components are statistically independent. In
most practical situations, however, one can find only components that are as
independent as possible.
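As a hedged illustration of this source-separation set-up (not the JADE algorithm used later in the paper), the following Python sketch mixes two invented non-gaussian sources and recovers them with scikit-learn's FastICA, a different ICA algorithm used here only as a stand-in; the signals, the mixing matrix and all names are our own assumptions.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
# two non-gaussian sources, invented for illustration
s = np.c_[np.sign(np.sin(3 * t)), rng.laplace(size=t.size)]
A = np.array([[1.0, 0.5], [0.7, 1.2]])          # "unknown" mixing matrix
x = s @ A.T                                     # observed signals, x = A s

ica = FastICA(n_components=2, random_state=0)
y = ica.fit_transform(x)                        # recovered sources (up to sign, order, scale)
W = ica.components_                             # estimate of the unmixing matrix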

1.2 Independence vs uncorrelatedness

The potential interest of ICA over PCA is nested in the difference between
independence and uncorrelatedness. Therefore, a few reminders are in order.
Let's consider two scalar-valued random variables y_1 and y_2. These variables are
said to be independent if information on the value of y_1 does not give any
information on the value of y_2, and vice versa. Let's denote p(y_1, y_2) the joint
probability density function (hereafter pdf) of y_1 and y_2 and p_1(y_1) the marginal
pdf of y_1. This pdf is given by p_1(y_1) = ∫ p(y_1, y_2) dy_2, and similarly for y_2. y_1 and
y_2 are then said to be independent if and only if the joint pdf is factorizable in the
following way:

p(y_1, y_2) = p_1(y_1) p_2(y_2).

This definition can be used to derive an important property of independent
random variables. Given two functions, h_1 and h_2, we always have

E[h_1(y_1) h_2(y_2)] = E[h_1(y_1)] E[h_2(y_2)].

Hence, a weaker form of independence may be constructed, which is
uncorrelatedness. Two random variables y_1 and y_2 are said to be uncorrelated if
their covariance is zero:

E[y_1 y_2] - E[y_1] E[y_2] = 0.

If the variables are independent, they are uncorrelated, which follows directly by
taking h_i(y_i) = y_i. On the other hand, uncorrelatedness does not imply
independence. Assume that (y_1, y_2) are discrete valued and follow such a
distribution that the pair is with probability 1/4 equal to any of the following
values: (0,1), (0,-1), (1,0), (-1,0). Then y_1 and y_2 are uncorrelated, as can be
simply calculated. On the other hand, E[y_1² y_2²] = 0 ≠ 1/4 = E[y_1²] E[y_2²], so the
variables cannot be independent. Since independence implies uncorrelatedness,
many ICA methods constrain the estimation procedure so that it always gives
uncorrelated estimates of the independent components. This reduces the number
of free parameters, and simplifies the problem.
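The discrete example above can be checked numerically with a few lines (a sketch; the variable names are ours):

import numpy as np

# the four equiprobable points (0,1), (0,-1), (1,0), (-1,0)
pts = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]], dtype=float)
y1, y2 = pts[:, 0], pts[:, 1]

cov = np.mean(y1 * y2) - np.mean(y1) * np.mean(y2)   # = 0 : uncorrelated
lhs = np.mean(y1**2 * y2**2)                         # = 0
rhs = np.mean(y1**2) * np.mean(y2**2)                # = 1/4 : hence not independent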

1.3 Potential Ambiguities

Many ambiguities must be avoided when dealing with ICA. Among them, the
first one is that the variances (energies) of the independent components cannot be
determined. The reason is that any scalar multiplier in one of the sources s_i could
always be cancelled by dividing the corresponding column a_i of A by the same
scalar. Whitening (sphering) of the independent components, i.e. choosing all
variances equal to one, E[s_i²] = 1, can solve this ambiguity. This establishes equal
"magnitudes" of the sources, but the ambiguity of the sign remains, as we can
multiply any source by -1 without affecting the model. This is insignificant in most
applications.
A second ambiguity is that we cannot determine the order of the independent
components. In fact we can freely change the order of the terms in the sum
x = Σ_{j=1}^{n} a_j s_j, and call any of the independent components the first one. A
permutation matrix P and its inverse can be substituted in the model to give
x = AP^{-1}Ps = A's' with s' = Ps, A' = AP^{-1}. The sources are then identified by
using a priori knowledge about their features.

2. ESTIMATION METHODS

Many different approaches may be followed to estimate ICs and the "mixing
process", each of them exploiting different properties of independent components.
In what follows, we focus on two ways of finding ICs and the mixing matrix:
the Maximisation of Nongaussianity measures and Tensorial Methods.

2.1 Maximisation of Nongaussianity measures

The well known Central Limit Theorem tells us that the distribution of a sum
of independent random variables, arbitrarily distributed, tends toward a gaussian
distribution under rather general conditions. This key result, in addition to the non-
gaussianity of the underlying sources, allows us to find the "demixing matrix" and
the sources we are looking for. In essence, the theorem explains that a gaussian
distribution can be considered as the result of the mixing of many independent
variables. Conversely, a transformation that yields a distribution as far away as
possible from gaussianity can be seen as separating independent components.
To see this, let's consider x distributed according to the ICA data model x=As
and let's assume for simplicity that all the ICs have identical distributions. To
estimate one of the ICs, we consider some linear combination of the x_i:

y = w'x = Σ_i w_i x_i

where w is a vector to be determined. If w were one of the rows of the inverse of


A, this linear combination would actually equal one of the ICs. One now shows
how the Central Limit Theorem helps us in determining W so that it would equal
one of the rows of the inverse of A.
Let's first make a change of variables. Defining z = wA ', one has y = w 'x =
w 'As = z 's andy is a linear combination of Sj, with weights given by Zj. Since, by
virtue of the Central Limit Theorem, a sum of even 2 independent random
variables is more gaussian than the original variables, either the z 's is more
gaussian than any of the si, or it is less gaussian and this is possible only if it
equals one of the Sj, . Hence, a natural candidate for W is a vector that maximizes
the nongaussianity of w'x. Such a vector would necessarily correspond to a z
which has only one nonzero component. This means that w'x = z's equals one of
the independent components! In a few words, maximizing the nongaussianity of
w'x thus gives us one of the independent components.
The key point now is how to measure nongaussianity. The classical measure
of nongaussianity is the so-called kurtosis or fourth-order cumulant. Defined by
E[y⁴] - 3(E[y²])², the kurtosis of a gaussian random variable is zero. For most (but
not all) nongaussian random variables, kurtosis is nonzero. The absolute value of
the kurtosis has been widely used as a measure of nongaussianity in ICA and
related fields. The advantages of this measure are its theoretical simplicity, its
low computational cost (estimation from the fourth moment of the sample data)
and its linearity property. A disadvantage, however, is that it is not a robust
measure of nongaussianity. When its value has to be estimated from a measured
sample, it can be very sensitive to outliers. Its value may also depend on only a
few observations in the tails of the distribution, which may be erroneous or
irrelevant observations.
A second possible measure of nongaussianity relies on the information-
theoretic quantity of entropy. The entropy of a random variable can be seen as a measure
of the information that the observation of the variable gives. The more random,
i.e. unpredictable and unstructured the variable is, the larger its entropy. For a
discrete random variable Y, entropy H is defined as:

H(Y) = -Σ_i P(Y = a_i) ln P(Y = a_i)

where the a_i are the possible values of Y. For a continuous-valued random vector
y with density f(y), entropy is defined by

H(y) = -∫ f(y) ln f(y) dy.

A fundamental result of information theory states that gaussian variables have the
largest entropy among all random variables of equal variance. Entropy can be
used as a measure of nongaussianity. The gaussian distribution is the most
random of all distributions. Entropy is small for distributions that are clearly
concentrated on certain values, i.e., when the variable is clearly clustered. A
closely related measure is the negentropy. Negentropy J of a random variable Y is
defined as:

J(Y) = H(Y_gauss) - H(Y)


where Y_gauss is a Gaussian random variable with the same covariance matrix as
Y. Note that the negentropy is always non-negative, and equal to zero if and only
if Y has a Gaussian distribution. It is invariant under invertible linear transformations.
Negentropy is the optimal estimator of nongaussianity as far as statistical
properties are concerned; it is thus well justified as a measure of nongaussianity
by both statistical theory and empirical tasks. Its main drawbacks are its
computational cost and the fact that it requires a (possibly nonparametric) estimate
of the pdf. For practical purposes, simpler approximations of negentropy have
to be used.

2.2 Tensorial Methods

Cardoso (1999) has shown how higher-order correlations can be efficiently
exploited to reveal independent components. Several algorithms maximize
measures of independence by a technique akin to the Jacobi method of
diagonalization. These ICA algorithms use higher order statistical information
for separating the signals. Note that uncorrelatedness alone is not enough to
separate the desired components. To keep the following exposition simple, it is
restricted to symmetric distributions. For any n x n matrix M, we define the
associated cumulant matrix T_x(M) as the n x n matrix defined component-wise by:

[T_x(M)]_{ij} = Σ_{k,l} Cum(x_i, x_j, x_k, x_l) M_{kl}

where the subscript ij means the (i,j)-th element of a matrix. T is a linear
operator, and thus has n² eigenvalues that correspond to eigenmatrices.
Cum(x_1, x_2, x_3, x_4) are fourth-order cumulants defined by:

Cum(x_1, x_2, x_3, x_4) = E[x_1 x_2 x_3 x_4]
                        - E[x_1 x_2] E[x_3 x_4]
                        - E[x_1 x_3] E[x_2 x_4]
                        - E[x_1 x_4] E[x_2 x_3]

where x_i = x_i - E[x_i].

Recall that for symmetric distributions odd-order cumulants are zero and that
second order cumulants are Cum(x_1, x_2) = E[x_1 x_2]. The variance and the kurtosis of
a real random variable x are defined as:
σ² = Cum(x, x) = E[x²],

k(x) = Cum(x, x, x, x) = E[x⁴] - 3(E[x²])².

These are the second and fourth-order auto-cumulants. A cumulant involving at
least two different variables is called a cross-cumulant.
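For illustration, the sample counterpart of this fourth-order cross-cumulant can be written as follows (a sketch with our own naming, assuming a data matrix whose columns are the observed variables):

import numpy as np

def cum4(x, i, j, k, l):
    """Sample fourth-order cumulant Cum(x_i, x_j, x_k, x_l) for symmetric
    distributions (data centered first, odd-order cumulants neglected)."""
    xc = x - x.mean(axis=0)
    xi, xj, xk, xl = xc[:, i], xc[:, j], xc[:, k], xc[:, l]
    return (np.mean(xi * xj * xk * xl)
            - np.mean(xi * xj) * np.mean(xk * xl)
            - np.mean(xi * xk) * np.mean(xj * xl)
            - np.mean(xi * xl) * np.mean(xj * xk))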

Under x = As, which also reads x_i = Σ_j a_ij s_j where a_ij denotes the ij-th entry
of matrix A, the cumulants of order 4 transform as:

Cum(x_i, x_j, x_k, x_l) = Σ_{pqrs} a_ip a_jq a_kr a_ls Cum(s_p, s_q, s_r, s_s)

Using the assumption of independence of s, one has:

Cum(s_p, s_q, s_r, s_s) = k(s_p) δ_pq δ_pr δ_ps

where δ is the Kronecker function, and we readily obtain the simple algebraic
structure of the cumulants of x = As:

Cum(x_i, x_j, x_k, x_l) = Σ_{p=1}^{n} a_ip a_jp a_kp a_lp k(s_p)

The structure of a cumulant matrix in the ICA model is easily deduced from this
last equation:

T_x(M) = A Λ(M) A'

with Λ(M) = diag( k(s_1) a_1' M a_1, ..., k(s_n) a_n' M a_n ), where a_i denotes the i-th
column of A.

In this factorization the kurtosis enters only in the diagonal matrix. Solving for
the eigenvectors of such eigenmatrices, the ICA model can be estimated. This is
typically a joint diagonalization of several matrices, but the main difficulty is that
A is not an orthogonal matrix. The Joint Approximate Diagonalization of Eigen-
Matrices algorithm (JADE) uses the fact that any linear mixture of the
independent components can be transformed into white components, in which
case this mixing matrix is orthogonal.
3. JADE ALGORITHM

This section outlines the JADE algorithm (Cardoso and Souloumiac, 1993).
The approach for the JADE algorithm is the following two-stage procedure:
Whitening and Rotation.

3.1 Whitening

Like some other ICA algorithms, JADE requires a preliminary whitening of
the data x. This means that before the application of the algorithm (and after
centering), we transform the observed vector x linearly so that we obtain a new
vector x̃ which is white (x̃ = Qx), i.e. its components are uncorrelated and their
variances equal unity. In other words E[x̃x̃'] = I, where I is the identity matrix.
The whitening transformation is always possible. The whitening transform Q can
be determined by taking the inverse square root of the covariance matrix via an
eigenvalue decomposition (EVD) of the covariance matrix or by a PCA.
Denoting Γ the orthogonal matrix of eigenvectors of E[xx'] and Λ the
associated diagonal matrix of its eigenvalues, one has E[xx'] = Γ Λ Γ'.
Whitening can now be performed by:

x̃ = Λ^{-1/2} Γ' x

That is, Q = Λ^{-1/2} Γ'. It is easy to check that E[x̃x̃'] = I by taking the expectation of:

x̃x̃' = (Λ^{-1/2} Γ' x)(Λ^{-1/2} Γ' x)' = Λ^{-1/2} Γ' xx' Γ Λ^{-1/2}

Since x = As and after whitening x̃ = Qx, one has x̃ = Ãs where Ã = QA. It
can be easily shown that Ã is an orthogonal matrix. Indeed E[x̃x̃'] = I and:

E[x̃x̃'] = Ã E[ss'] Ã' = Ã Ã'.

Recall that we assumed that the independent components s_i have unit variance.
Whitening reduces the problem of finding an arbitrary matrix A in the model x = As to
the simpler problem of finding an orthogonal matrix Ã. Once it is found, Ã is
used to solve for the independent components from the observed data by

s = Ã' x̃.
A couple of remarks are in order. First, whitening alone does not solve the
separation problem. This is because whitening is only defined up to an additional
rotation: if Q_1 is a whitening matrix, then Q_2 = U Q_1 is also a whitening matrix if
and only if U is an orthogonal matrix. Therefore, we have to find the correct
whitening matrix that equally separates the independent components. This is
done by first finding any whitening matrix Q, and later determining the
appropriate orthogonal transformation from a suitable non-quadratic criterion.
Second, whitening reduces the number of parameters to be estimated. Instead of
having to estimate the n² parameters that are the elements of the original matrix
A, we only need to estimate the new, orthogonal mixing matrix Ã.
An orthogonal matrix contains n(n-1)/2 degrees of freedom. In larger
dimensions, an orthogonal matrix thus contains only about half of the number of
parameters of an arbitrary matrix.
Thus one can say that whitening solves half of the problem of ICA. Because
whitening is a very simple and standard procedure, much simpler than any ICA
algorithm, it is a good idea to reduce the complexity of the problem this way.
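A minimal sketch of this whitening step (assuming a full-rank sample covariance matrix; names are our own):

import numpy as np

def whiten(x):
    """Whitening of section 3.1: EVD of the covariance matrix, then
    Q = Lambda^{-1/2} Gamma'."""
    xc = x - x.mean(axis=0)
    cov = np.cov(xc, rowvar=False)
    eigval, Gamma = np.linalg.eigh(cov)          # Gamma: orthogonal eigenvectors
    Q = np.diag(eigval ** -0.5) @ Gamma.T        # whitening matrix
    x_tilde = xc @ Q.T                           # E[x_tilde x_tilde'] = I
    return x_tilde, Q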

3.2 Rotation

For any n x n matrix M, we can define the associated cumulant matrix T_x̃(M)
component-wise by:

[T_x̃(M)]_{ij} = Σ_{k,l} Cum(x̃_i, x̃_j, x̃_k, x̃_l) M_{kl}

where the subscript ij means the (i,j)-th element of a matrix T. We have shown
that whitening yields the model x̃ = Ãs with Ã orthonormal. This model is
still a model of independent components. From section 2.2, the structure of the
corresponding cumulant matrix of x̃ can be written as:

T_x̃(M) = Ã Λ(M) Ã'

with Λ(M) = diag( k(s_1) ã_1' M ã_1, ..., k(s_n) ã_n' M ã_n ), where ã_i denotes the i-th
column of Ã, and this for any n x n matrix M.
Let Π = {M_1, ..., M_p} be a set of p matrices of size n x n and denote by T_x̃(M_i)
(1 ≤ i ≤ p) the associated cumulant matrices of the whitened data x̃. Again, for
all i, we have T_x̃(M_i) = Ã Λ(M_i) Ã' with Λ(M_i) a diagonal matrix. As a
measure of nondiagonality of a matrix H, define Off(H) as the sum of the squares
of the non-diagonal elements:

Off(H) = Σ_{i≠j} H_ij².

We have in particular Off( Ã' T_x̃(M_i) Ã ) = Off( Λ(M_i) ) = 0.

For any matrix set Π and any orthonormal matrix V, the JADE algorithm
optimizes a so-called orthogonal contrast:

argmin_V Σ_i Off( V' T_x̃(M_i) V ).

This criterion measures how close to diagonality an orthonormal matrix V can
simultaneously bring the cumulant matrices generated by Π. With the JADE
algorithm, this joint diagonalizer is found by a Jacobi technique.
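As an illustration, the Off(·) measure and the orthogonal contrast can be written as follows (a sketch with our own names; the Jacobi sweep that actually minimizes the contrast is not reproduced here):

import numpy as np

def off(H):
    """Sum of squares of the off-diagonal elements of H."""
    return np.sum(H**2) - np.sum(np.diag(H) ** 2)

def jade_contrast(V, cumulant_matrices):
    """Orthogonal contrast minimized by JADE: sum of Off(V' T_i V) over the
    set of cumulant matrices T_i."""
    return sum(off(V.T @ T @ V) for T in cumulant_matrices)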

4. APPLICATION TO THE TERM STRUCTURE OF


INTEREST RATES

4.1 Presentation of the data and first results

To investigate the effectiveness of ICA techniques for financial time series,


we apply ICA to the term structure of interest rates of the US. We use daily data
from October 1993 to December 2001. Principal Component Analysis is one
statistical way of finding such factors from the data and measuring the way
each factor affects or loads on the variables (here each interest rate). The relations
between interest rate changes and the factors are called factor loadings. One has
Σ_x = E[xx'] = Γ Λ Γ', where Γ is the orthogonal matrix of eigenvectors of the
variance-covariance matrix E[xx'] and Λ is the diagonal matrix of eigenvalues
(λ_i)_i.
Litterman and Scheinkman (1991) used a PCA method early on to
determine the number of factors underlying movements in interest rates. They
determined that three factors explain the majority of movements in interest rates
for various maturities. Their study examines changes in interest rates rather than
the levels of interest rates or changes in bond prices. Bliss (1997) recognizes that
for hedging purposes, it is not the levels of interest rates that are important but
the changes, which in turn produce changes in bond prices. Of Litterman and
Scheinkman's three factors, the first one accounted for an average of 89.5 percent
of the observed variation in yield changes across maturities. This factor, which
they identified as a level change factor, helps to explain why Macaulay duration
is so successful. While changes in levels are not the whole story, they are such a
large part of what goes on in interest rate movements that the assumption
underlying Macaulay duration (that is, parallel movements up and down in
interest rates) is a good first approximation. Nonetheless, Litterman and
Scheinkman show that hedging based on three factors will improve hedge
performance relative to Macaulay duration-based hedging by 28 percent on
average and in some cases much more.

Figure 1. Principal Component Analysis: factor loadings (F1, F2, F3) across maturities
(TCM3, TCM5, TCM7, TCM10, TCM20, TCM30).


The first PC accounts for 83.14% of the total variability, the second PC for
about 15.86% and the first three for 99.74%.
The first factor loading is very close to being constant across all maturities.
Since Factor 1 has roughly equal effects on all maturities, a change in Factor 1
will produce a parallel movement in all interest rates. For this reason Factor 1 can
be interpreted as a level factor, producing changes in the overall level of interest
rates. The loadings on the second factor decrease uniformly from a relatively
large positive value at the short end of maturities to a negative value at the
longest maturities.
This pattern of decreasing loadings is consistent with interpreting Factor 2 as
a slope factor, affecting the slope of the term structure but not the average level
of interest rates. Factor 2 produces movements in the long and short ends of the
term structure in opposite directions (twisting the yield curve), with
commensurate smaller changes at intermediate maturities.
Factor 3 may be interpreted as a hump or curvature factor.

4.2 An Independent Component Analysis

The empirical study carried out in this section uses the Joint Approximate
Diagonalization of Eigenmatrices algorithm (Cardoso, Souloumiac 1993)6 .
As explained in the whitening step, the first stage is performed by computing
the sample covariance matrix, giving the second order statistics of the observed
outputs. From this, a matrix is computed by eigendecomposition which whitens
the observed data. We transform the observed vector x linearly so that we obtain
a new vector x̃ which is white, i.e. its components are uncorrelated and their
variances equal unity. The whitening transform Q (x̃ = Qx) can be determined by
a principal component analysis (see section 3.1):

Q = Λ^{-1/2} Γ'

where Γ is the orthogonal matrix of eigenvectors of E[xx'] and Λ is the diagonal
matrix of eigenvalues (λ_i)_i already computed.
The second stage consists of finding a rotation matrix which jointly
diagonalizes eigenmatrices formed from the fourth order cumulants of the
whitened data. The outputs from this stage are the independent components.
Figure 2. Independent Component Analysis: loadings of the first three independent components
(F1, F2, F3) across maturities.

The loadings of the first three independent factors obtained by JADE have a
natural interpretation compared to those of the PCA. The first IC represents the
short end, the second the middle, and the third the long end of the term structure of
interest rates.

ACKNOWLEDGEMENTS

Conference participants are thanked for stimulating discussions.

NOTES
1. The principal components (PCs) are ordered in terms of their variances: the first PC defines the
direction that captures the maximum variance possible, the second PC defines (in the remaining
orthogonal subspace) the direction of maximum variance, and so forth.
2. For Joint Approximate Diagonalization of Eigenmatrices.
3. The basic preprocessing is to center x, i.e. subtract its mean vector m = E[x] (approximated by its
sample mean) so as to make x a zero-mean variable. This necessarily implies that s is zero-mean as
well, as can be seen by taking expectations on both sides of x = As. This preprocessing is
made solely to simplify the ICA algorithms: it does not mean that the mean could not be
estimated. After estimating the mixing matrix A with centered data, we can complete the
estimation by adding the mean vector of s back to the centered estimates of s. The mean vector
of s is given by A^{-1}m, where m is the mean that was subtracted in the preprocessing.
The dynamics of the term structure of interest rate 231

4. It must be stressed that ICA is a special case of a more general task termed blind source
separation (BSS). Blind means that very little, if anything, is known about the mixing matrix, and
that few assumptions are made about the source signals.
5. Note they could be multiplied by some scalar constants.
6. We are grateful to Jean-François Cardoso for making the source code of the JADE algorithm
available. The batch algorithm is an efficient version of the two-step procedure that has already
been described.

REFERENCES
Ané T., Labidi C. (2001), "Implied Volatility Surfaces and Market Activity over Time", Journal of
Economics and Finance, 25(3), 259-275 .
Bliss R. (1997), "Movements in the term structure of interest rates", Economic Review, 4th Quarter,
16-33.
Bogner R. (1992), "Blind Separation of Sources", Technical Report 4559, Defense Research
Agency, Malvern.
Cardoso J.-F. (1989), "Source separation using higher order moments", International Conference
on Acoustics, Speech and Signal Processing, 2109-2112.
Cardoso, J.-F. (1999), "High-order contrasts for independent component analysis", Neural
Computation, 11(1), 157-192.
Cardoso J.-F., Souloumiac A. (1993), "Blind beamforming for non-Gaussian signals", IEE Proc.
F, 140(6), 771-774.
Chaumeton L., Connor G., Curds R. (1996), "A Global Stock and Bond Model", Financial Analysts
Journal, 52 (6), 65-74.
Comon P. (1994), "Independent component analysis, a new concept?", Signal Processing, 36(3),
287-314.
Jamshidian F., Zhu Y. (1996), "Scenario Simulation Model: Theory and Methodology",
mimeo, Sakura Global Capital.
Kahn R. (1989), "Risk and Return in the U.S. Bond Market: A Multifactor Approach", in Fabozzi
F. (ed.), Advances & Innovations in the Bond and Mortgage Markets (Probus).
Kambhu J., Rodrigues A. (1997), "Stress Tests and Portfolio Composition", mimeo, Federal
Reserve Bank of New York.
Knez P., Litterman R., Scheinkman J. (1994), "Explorations into Factors Explaining Money Market
Returns", The Journal of Finance, XLIX (5), 1861-1882.
Litterman R., Iben T. (1991), "Corporate Bond Valuation and the Term Structure of Credit
Spreads", The Journal of Portfolio Management, Spring, 52-64.
Litterman R., Scheinkman J. (1991), "Common Factors Affecting Bond Returns", The Journal of
Fixed Income, June, 54-61.
Loretan M. (1996), "Market Risk Scenarios and Principal Components Analysis: Methodological
and Empirical Considerations", mimeo, Board of Governors of the Federal Reserve.
Murphy B., Won D. (1995), "Valuation and Risk Analysis of International Bonds", in Fabozzi F.,
Fabozzi T. (eds.), The Handbook of Fixed Income Securities, New York, Irwin.
Murphy K. (1992), "J.P. Morgan Term Structure Model", mimeo, Bond Index Group, J.P. Morgan
Securities, Inc.
Oja E. (1989), "Neural networks, principal components and subspaces", International Journal of
Neural Systems, 1, 61-68.
232 Chapter 11

Pope K., Bogner R. (1994), "Blind separation of speech signals", Proc. of the Fifth Australian Int.
Conf. on Speech Science and Technology, Perth, 46-50.
Tong L., Liu R., Soon V., Huang Y. (1991), "Indeterminacy and identifiability of blind
identification", IEEE Trans. Circuits Systems, 38(5), 499-509.
Chapter 12

Classifying hedge funds with Kohonen maps:


A first attempt

Bertrand MAILLETa, Patrick ROUSSETb

aTEAM/CNRS - University Paris-1 (Pantheon-Sorbonne), ESCP-EAP and A.A.Advisors
(ABN Amro Group). Correspondence to: Dr. B. Maillet, TEAM/CNRS, MSE, 106 bv de l'hôpital,
F-75647 Paris cedex 13. Tel: +33 1 44078269/70 (fax). bmaillet@univ-paris1.fr;
bCEREQ and SAMOS - University Paris-1 (Pantheon-Sorbonne), rousset@cereq.fr.

Abstract: The purpose of this paper is to present an empirical study of a set of hedge funds on
recent periods. Alternative investments are now widely used by institutional
investors and numerous studies highlight the main features of such investments. As
they are in general poorly correlated with the main world indexes, traditional asset
pricing models yield poor adjustments, partially because of potential non-linearities
in pay-off functions. Some funds, however, exhibit high reward to variability ratios
and can advantageously be incorporated in a portfolio in a diversification
perspective. After describing the dataset, we classify the funds employing the
Kohonen algorithm. We then cross the classification with the one based on the style
of strategies involved, wondering whether such categories are homogeneous enough to be
relevant. The map of funds allows us to characterize families of funds - whose
conditional densities differ from one another - and to define a representative
fund for each class. The structure of the network of funds is then described. In
particular, we measure inter-class similarities and visualize them both on the
network of funds and via a map of one-to-one distances between representative
funds. Finally, we underline some of characteristics of classified fund families that
may interest investors such as performance measurements.

Key words: Kohonen maps, Classification, Multidimensional Data Analysis, General Non-linear
Models, Hedge Funds, Fund-picking, Performance Measurements.


INTRODUCTION

Hedge funds are now recognized as an asset class in their own right since several studies have highlighted their specific characteristics (see, for instance, Fung, Hsieh, 1997; 1998; 1999; 2000). The success of that class of investments among worldwide investors indicates moreover that they clearly have some advantages for rational investors. Though they originally took their name from hedging positions against specific or market risks, hedge funds nowadays include various types of funds, even if they do not hedge anything. The heterogeneity of hedge funds (in terms of financial strategies, markets involved, return paths, and nature of the risk implied) calls for a robust typology that allows the investor to form a priori expectations about the future behavior of the funds, all the more so because hedge funds suffer from a lack of transparency due to well-protected strategies.
It has been shown nevertheless that one can identify factors explaining the dynamics of some fund returns. As pointed out by Lhabitant (2001), selecting the correct factors to explain return dynamics is often more of an art than a science. The techniques used for factor identification range from principal component analysis (PCA), generalized hierarchical classification (Brown, Goetzmann 1997; 2001), option-like return representative strategies (Fung, Hsieh 2000; Agarwal, Naik 2000-b), cluster analysis (Gruber 2001) and hierarchical trees (Mantegna 1999; Bonanno et al. 2000), to well-known arbitrarily specified factors in frameworks of extended CAPMs, multifactor or APT models (see Blake et al. 1999; Gruber 2001). A special case of the last technique is Sharpe's methodology for identifying styles (see Sharpe 1988; 1992). In the context of hedge funds, such analysis depends crucially on the definition of benchmarks. Style analysis works remarkably well for investment funds and traditional portfolios (see Sharpe 1992; Brown, Goetzmann 1997; Daniel et al. 1997) but performs poorly with hedge funds (Brown, Goetzmann 2001). The reasons could be at least three-fold. Firstly, the factors underlying hedge fund returns have not yet been fully identified in previous research. The use of indexes as proxies often remains a questionable and crucial issue. Secondly, hedge fund managers often have their own investment styles and their own ways of identifying market opportunities. The range of such styles and market opportunities is larger than for traditional stock and bond fund managers. Typically, while the latter are strictly regulated and must hold primarily long positions in the underlying assets with a limited and controlled cash position whatever the market conditions, the former have broad mandates, take long and short positions and use different degrees of leverage in time-varying market conditions. Thirdly, within general classes of strategies (Statistical Arbitrage for instance), the practices, models and markets involved are quite heterogeneous. In other words, similar types of general strategies lead to very different patterns. All these facts result in non-linear pay-offs that can flaw the (linear) return-based style analysis.
We propose here a method for classifying and organizing funds - grouped without any a priori assumption - that can help to model representative individuals within homogeneous classes. Several studies have shown that traditional linear asset pricing models fail to explain hedge fund returns (see, for instance, Fung, Hsieh 1999). When the standard linear statistical methods are not appropriate, due to the intrinsic structure of the observations, one can try to use the Kohonen algorithm, already widely used for data analysis in different fields and in the financial literature in particular (de Bodt et al. 1997, for interest rate curves, and Deboeck 1997, for mutual funds).
The paper is organized as follows. In section 1, we introduce the methodology and the database. In section 2, we present different visualizations of the dataset discriminating hedge funds. The last section concludes, highlighting some potential drawbacks of the method and some problems associated with the dataset. Potential financial applications of this methodology are finally introduced.

1. METHODOLOGY AND DATABASE

The Kohonen network used hereafter is a two-dimensional grid (but other topological organizations exist - see for instance Cottrell et al. 1995; Kohonen 2001). The Kohonen algorithm is unsupervised and produces a classification together with a topology of classes. It has been widely used in different fields (see Cottrell et al. 1995; 1998-b), and specifically in finance recently (see de Bodt et al. 1997; Deboeck 1997). After briefly recalling the main differences between classification methods, we present the Kohonen algorithm and some recent results concerning its properties.

1.1 Kohonen Maps: A General Introduction

Kohonen maps organize classes of individuals according to a neighborhood notion such that adjacent classes in the output space are close in the input space. The algorithm can be compared to Lloyd's and K-means family algorithms (see Anderson 1984) and is well adapted to large data sets. However, the originality of the Kohonen algorithm lies in the neighbourhood notion involved. The output map can be considered as a representation of the surface joining all the class centroids (i.e. representative individuals per class), and can be viewed as an adjustment of the data. The Kohonen algorithm is then an alternative representation to the more classical factorial analyses and, more precisely, to the combination of factorial analysis and classification when we consider the following double aspect: classification and representation. Both representation tools have the same spirit and the same goal (see Blayo, Demartines 1991). Factorial analysis allows one to fit the data set with a plane surface whose representation is straightforward. Because of its higher flexibility, the surface adjustment involved in the Kohonen method provides a better fit, but it does not imply a particular structure of representation. This last difficulty is solved by using a map of distances between class centroids. Moreover, the classification obtained with the Kohonen algorithm - briefly described below - can be used as an input of a classical hierarchical method.
A simple method for visualizing distances between neighbouring centroids has already been proposed (see Cottrell, de Bodt 1996) and has been found well adapted for visualizing distances between close classes, though, unfortunately, it does not capture the whole structure of the surface in some cases (in particular when the surface makes a fold). In this paper, we use a specific representation that allows us to visualize distances between all classes.

1.1.1 The Kohonen Algorithm

This algorithm - also called Self-organizing Map (SOM) because of its properties - is similar to the classical Lloyd's classification algorithm, to which a concept of neighborhood is added that organizes the different classes of observations.
The first step of the SOM algorithm is to define a structure for the map, a distance between elementary units and a neighborhood function. As a structure, we choose a rectangular grid for representing the network in which we group the observations according to their similarities. For instance, when the grid is an m × m = U square box, each unit u, u ∈ {1, ..., U}, is defined by its coordinates (i_u, j_u). Units can be numbered from 1 to U in an arbitrary way. The distance between two units u and v writes:

d(u, v) = max(|i_u − i_v|, |j_u − j_v|)    (Eq. 1)

where |·| is the absolute value operator.


For each ray r(s) - where s is the step of the learning procedure (i.e. the iteration of the algorithm) - the set of neighbouring units of a unit v, denoted V(v, r(s)), is defined by:

V(v, r(s)) = {u ∈ {1, ..., U} : d(u, v) ≤ r(s)}    (Eq. 2)
where r(s) is an arbitrary neighbourhood function such as, for instance:

r(s) = 2   for 0 < s ≤ S/4
r(s) = 1   for S/4 < s ≤ (3/4)S    (Eq. 3)
r(s) = 0   for (3/4)S < s ≤ S

where S is the total number of iterations.


To each unit u at step s, we associate a code vector, denoted G_u(s), belonging to ℝ^P. Supposing that the dimension of the input dataset¹ is (P × N), and once the couple (U, S) is given and s is initialized to 1, the Kohonen algorithm can then be summarized as follows:

1. Initialize the U code vectors of dimension (P × 1) relative to each unit u of the map. Such initialization is obtained in our application using a random convex combination of the original observations.

2. Increment s and draw randomly without replacement from the data one observation, denoted x. The winning unit associated with this draw, denoted u_0, is the one whose code vector is the "nearest" to x; more precisely, the code vector of the winning unit, denoted G_{u_0}, solves:

G_{u_0}(s) = Arg min_{G_u, u ∈ {1, ..., U}} ||x − G_u(s − 1)||    (Eq. 4)

where ||·|| is the Euclidean distance².

3. Modify the code vectors of the U units according to the following rule:

G_u(s + 1) = G_u(s) + η(s) [x − G_u(s)]    for each unit u ∈ V(u_0, r(s))
G_u(s + 1) = G_u(s)                        for all other units u        (Eq. 5)

where η(s) is an adaptive parameter which decreases to 0 when s goes to infinity, following the Robbins-Monro criterion³. For instance, one can choose:
η(s) = α / (β + δ·s)    (Eq. 6)

where α, β and δ are constants.


At step 3, the observation x influences the code vectors of the winning unit and of its neighbouring units and, in that sense, the map self-organizes.

4. Run the algorithm for s varying from 1 to S (repeating steps 2 and 3). When the Kohonen algorithm reaches the zero-ray phase (i.e. when (3/4)S < s ≤ S in our example), it reduces to Lloyd's simple competitive learning.

Then, at the end of the Kohonen algorithm, the classification is built by assigning to each element x of the input dataset the class and the code vector corresponding to its winning unit. The map determines a neighbourhood between classes (corresponding to units on the network) such that the code vectors of two units which are, roughly speaking, "close" on the map are "similar" in the input space.
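A minimal sketch of the training loop just described, using NumPy only. The grid distance (Chebyshev), the radius and learning-rate schedules, the grid size and the sampling with replacement are illustrative choices consistent with the text, not the authors' exact settings; `returns` is a hypothetical (N, P) array of observations.

```python
import numpy as np

def train_som(data, m=7, n_iter=10000, seed=0):
    """data: (N, P) array of N observations in R^P; returns U = m*m code vectors."""
    rng = np.random.default_rng(seed)
    n_obs, p = data.shape
    u = m * m
    coords = np.array([(i, j) for i in range(m) for j in range(m)])
    # grid distance between units (Chebyshev distance, one possible choice)
    grid_dist = np.abs(coords[:, None, :] - coords[None, :, :]).max(axis=2)
    # step 1: random convex combinations of observations as initial code vectors
    w = rng.random((u, n_obs))
    codes = (w / w.sum(axis=1, keepdims=True)) @ data
    for s in range(1, n_iter + 1):
        # neighbourhood radius schedule: 2, then 1, then 0 (pure competitive learning)
        radius = 2 if s <= n_iter // 4 else (1 if s <= 3 * n_iter // 4 else 0)
        eta = 1.0 / (1.0 + 0.01 * s)           # decreasing adaptation parameter
        x = data[rng.integers(n_obs)]          # step 2: draw one observation
        winner = int(np.argmin(np.linalg.norm(codes - x, axis=1)))
        neighbours = grid_dist[winner] <= radius
        codes[neighbours] += eta * (x - codes[neighbours])   # step 3: update
    return codes, coords

# Classification: assign each observation to its winning unit (its micro-class).
# codes, coords = train_som(returns)           # `returns` is a hypothetical (N, P) array
# classes = np.argmin(((returns[:, None, :] - codes[None, :, :]) ** 2).sum(-1), axis=1)
```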

1.1.2 Kohonen Algorithm and Data Analysis

This algorithm is used as an alternative classification method but also for data analysis (see Blayo, Demartines 1991). In that case, the map becomes a representation of the raw database: observations classified in the same unit - or in neighbouring units - are supposed to share similarities. Once the dataset has been reduced to a set of classes - called micro-classes - applying a classical hierarchical method to these micro-classes allows us to group them into macro-classes and to obtain a second level of interpretation. Finally, using exogenous qualitative variables, the classification is explained and validated with classical inference methods.
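A possible sketch of this second level, assuming the `codes` array of code vectors produced by the SOM sketch above: Ward's hierarchical clustering applied to the micro-class centroids, with an illustrative number of macro-classes.

```python
from scipy.cluster.hierarchy import linkage, fcluster

def macro_classes(codes, n_macro=10):
    """codes: (U, P) array of code vectors; returns one macro-class label per unit."""
    tree = linkage(codes, method="ward")   # Ward hierarchy over the micro-class centroids
    return fcluster(tree, t=n_macro, criterion="maxclust")

# macro = macro_classes(codes)             # macro[u] is the macro-class of unit u
```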
The originality of the method rests in the organization of the classes on a map according to a neighborhood notion. This methodology might be an advantageous alternative to other techniques even if, at this stage, some problems remain.
On the one hand, these techniques are preferred to Forgy's Mobile Centroids Method (1965) and Ward's classification principle (1963) when large databases are under review (see Anderson 1984), mainly because they are parsimonious in terms of computing time and tractability. This classification method is also robust because it is less sensitive to outliers (of the input distribution) than most other techniques. A new element in the input data set will not, in general, significantly change the result (contrary to hierarchical classification methods).
Studies based on truncated samples built with a bootstrap method (see Cottrell, de Bodt 2000) indeed support the conclusion that the classification is robust.
On the other hand, some arbitrariness is still required when applying the Kohonen classification. While it is possible to rely on Wilks and Fisher statistics for determining the right number of classes, the correct structure of the map has, in practice, to be determined on an a priori ground. In the same way, there is no criterion for choosing the set of parameters needed in the learning process (see, in the previous section, the role of the functions and parameters η(.), V(.), r(.) and S). Moreover, more extensive results - focusing on multidimensional databases - are still required concerning the convergence of the Kohonen algorithm. A state-of-the-art review of theoretical results is available in Cottrell et al. (1998-a) and Cottrell et al. (2000).

2. THE DATA

The original data - provided by Micropal™ - consist of monthly Net Asset Values of hedge funds (expressed in EUR, rescaled to 100 at inception) since December 1976. Figure 1 represents the number of funds in the data from the beginning to the end of the sample. The maximum number of fund values is reached in June 2000 (1,358 fund values are observed). Each fund is classified using a three-level typology (see Appendix for details): the first level is linked to Micropal sub-categories (50 categories); the second is made using Micropal categories at an aggregated level (18 categories); the third groups the latter categories into four meta-categories (see Appendix for the definition of categories and strategies).

Figure 1. Number of Funds in the Database - from 01/31/1976 to 12/29/2000



From this sample, applying traditional filter rules, we keep 471 funds over a 6-year period as a compromise between a large sample of funds with a short history and a small number of funds with numerous observations. We then delete fund data when too many missing values were present, interpolate - using a cubic spline - missing observations when possible, and normalize fund values to an index of 100 at the beginning of the final sample. In the end, this sample contains 294 funds and the number of observations is 67 (from January 1995 to September 2000).
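A sketch, under assumed names, of this preprocessing step: funds with too many gaps are dropped, remaining gaps are filled with a cubic spline, and each fund is rebased to 100. `nav` is a hypothetical pandas DataFrame of monthly NAVs with a DatetimeIndex (rows are month-ends, columns are funds); the filter threshold is illustrative, not the authors' exact rule.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline

def fill_cubic(series: pd.Series) -> pd.Series:
    """Interpolate the missing values of one fund with a cubic spline over time positions."""
    pos = np.arange(len(series))
    known = series.notna().to_numpy()
    if known.all() or known.sum() < 4:        # nothing to fill, or too few points for a cubic
        return series
    spline = CubicSpline(pos[known], series.to_numpy()[known])
    out = series.copy()
    out[~known] = spline(pos[~known])
    return out

def preprocess(nav: pd.DataFrame, max_missing=3) -> pd.DataFrame:
    window = nav.loc["1995-01-31":"2000-09-30"]                # final sample period
    kept = window.loc[:, window.isna().sum() <= max_missing]   # crude filter rule (assumed)
    filled = kept.apply(fill_cubic)
    return 100 * filled / filled.iloc[0]                       # rebase every fund to index 100
```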
The database may suffer from major drawbacks (see conclusion) and caution is needed when interpreting the results presented in the next sub-sections, since they are obtained with this final sample.

3. CLASSIFYING HEDGE FUNDS AND DETERMINING EXPLANATORY FACTORS: AN EMPIRICAL STUDY

In this application of the SOM, we use a grid of 49 units. This figure is somewhat arbitrary but can be justified ex ante by the number of a priori classes in the Micropal typology (50 classes) and is justified ex post by noting that only a few classes defined by the Self-organizing Map are empty. The algorithms used are those described in the previous sub-sections, with the Euclidean distance and a ten-level hierarchical classification with the Ward distance. The learning process is composed of three steps and the neighbourhood function is the one described in sub-section 1.1.1.

3.1 Defining Micro- and Macro-classes

The next figure represents the map where each fund's evolution through the sample is drawn in a unit corresponding to its micro-class (represented in a box placed in the grid). We can also group funds into macro-classes (represented by colored groups of boxes), defined using a hierarchical classification with the Ward distance. The number of funds in micro-classes is heterogeneous, varying from 1 to 40 funds per micro-class. Moreover, two separate groups of funds can be distinguished: two thirds of the data belong to the first one (Group 1), grouped into one macro-class only (the green zone), whilst one third belongs to the second one (Group 2: the nine other macro-classes). That indicates roughly that one can distinguish a homogeneous group of funds, and some others that exhibit individual particularities. Nearly a third of the funds are finally contained in the first row of the grid, indicating a strong concentration in the green region.
Based on these remarks, we first focus on the latter group of funds (Group 2), and then go back to the former group of funds (Group 1). We remark this time that the number of funds per micro-class is quite homogeneous (compared to that of Group 1), since the mean number of funds per micro-class is 4 and the maximum number of funds in a micro-class is 12.

Figure 2. Funds Classification and Macro-classes

Note: This figure represents the network, grouping funds together according to the similarities of fund return patterns (whole sample, see text). The number of funds is reported in each unit (NB=#). Colors stand for meta-categories of funds, obtained by applying a hierarchical classification to the centroids of classes.

We can also define a representative fund for each micro-class u as a result of the Kohonen algorithm, and then group these funds using a hierarchical clustering method based on the Ward distance to obtain macro-classes (colored zones of the grid). Each representative fund is represented by the code vector corresponding to a specific unit u, and code vectors are classified according to their similarities. In Figure 3, the evolutions (i.e. the code vectors) of these funds are visualized and we verify that neighboring code vectors exhibit strong similarities. One should notice that - in the southwest and east regions (units 29, 37, 43, 44 and 49) - the most volatile funds are grouped together (five funds).

Nevertheless, the chosen representation of the data (i.e. the grid) does not allow us to capture the entire structure of the data; in particular, the distances between code vectors cannot be inferred from the previous figures. A first and natural way of solving the problem is to project the code vectors onto the principal plane given by a traditional Principal Component Analysis. But the distortion due to the projection of multidimensional data onto a plane could yield some misunderstandings, and complementary tools are needed here to visualize the structure of the map. The distance between two micro-classes can be associated with a surface proportional to the similarity between representative funds (see Figure 4).

Figure 3. Representative Funds for each Sub-class

Note: This figure represents the evolutions of the NAV of the representative funds.

Figure 4. Map of Distances between Neighbor Classes

Note: This figure represents adjacent centroid distances (undashed zones between cells).

This visualization avoids misleading interpretations and gives an idea of the discrimination between classes. We use the method proposed by Cottrell et al. (1998-b): each unit is represented by an octagon. The bigger the octagon, the closer the unit is to its neighbors. So clusters appear as regions in which octagons tend to be big, and frontiers as regions that are largely undashed. In other words, the larger the undashed zone between two adjacent units, the more different the NAVs of the funds.
We can observe that, in general, the macro-class boundaries coincide with the largest distances between classes, confirming the relevance of the second-level classification. On the contrary, if a boundary occurs between two classes with a small distance, it means that the second-level classification splits a large group into two groups and that the path from one to the other is continuous. For instance, the frontiers between the magenta, green and red regions do not correspond to large distances, which indicates that perhaps we could consider - without loss of generality - a hierarchical classification with fewer classes. On the contrary, the distances between the yellow, blue, cyan, brown, grey, white and pink regions are marked. These categories are composed of 12 funds which have very particular evolutions within their classical typology. All these funds are classified as "Sector" funds in the eighteen-category typology⁴ and, more precisely, three funds are - in the fifty-category Micropal typology - "Sector: Real Estate", one is "Sector: Healthcare/Biotechnology", and eight are "Sector: Metals/Mining". These results clearly indicate that a new map based on the original dataset excluding these particular funds would be more informative. Thus, we build another map considering only the individuals of Group 1, reducing the total number of micro-classes to 36 (see Figure 5).
The next figures represent the map of funds, the evolutions of representative funds and the map of distances between micro-classes for the filtered dataset corresponding to Group 1 funds.
From Figure 5, we can see that the number of funds within a micro-class is quite homogeneous (it varies from 2 to 28) and that two large populations can be distinguished: funds in the magenta and green regions represent nearly 70% of the funds. The comparison between Figures 2 and 5 indicates that this time funds spread homogeneously over the entire map. Moreover, only one unit is empty, which allows us to put some confidence in the arbitrary number of units chosen for the grid (a 6×6 box grid). Figure 6 indicates that once again the representative fund of each category exhibits strong similarities with all the funds in the category. In that sense, a kind of benchmark for each category has been defined.

Figure 5. Funds Classification and Macro-classes

Note: This figure represents the network, grouping funds together according to the similarities of fund return patterns (second sub-sample, see text). The number of funds is reported in each unit (NB=#). Colors stand for meta-categories of funds, obtained by applying a hierarchical classification to the centroids of classes.

Figure 6. Representative Funds for each Sub-class

Note: This figure represents the evolutions of the NAV of the representative funds (second sub-sample, see text).

Figure 7. Map of Distances between Neighboring Classes

Note: This figure represents adjacent centroid distances (undashed zones between cells; second
sub-sample, see text).

From Figure 7, we can see two homogeneous classes, since distances between close classes are relatively low for the central classes (the green and magenta regions). On the contrary, large distances are located in the upper-left, upper-right and bottom-right corners (the grey, cyan and yellow regions).
But not all the information one can extract from the data can be backed out from this last representation. Indeed - as in the cyan region - distances within a macro-class are sometimes larger than distances between adjacent macro-classes. This drawback leads us to evaluate, as a complementary tool, the distances between each code vector and all the others. In the next figure, each unit u is subdivided into U sub-units u'. In each sub-unit u' of a unit u, the color (white for small distances, red for intermediate distances and dark red for relatively large distances) represents the distance between the code vectors G_u and G_u'. This superposition of maps directly allows us to visualize distances between representative funds, and it is then clear from this figure that one-to-one distances between micro-classes are very different. One can distinguish four regions: a large central one (units 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 31, 32), a ring region surrounding the previous one (composed of units 2, 3, 4, 5, 12, 18, 24, 30, 33, 34, 35) and three typical regions located in the upper-left, upper-right and bottom-right corners.


Figure 8. Map of One-to-one Class Distances

Note: This figure represents a superposition of the Kohonen network and one-to-one centroid
distances.
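The one-to-one distance map of Figure 8 boils down to the matrix of Euclidean distances between all pairs of code vectors, colour-coded at three levels. A possible sketch, with an illustrative tercile-based colour coding:

```python
import numpy as np

def one_to_one_distances(codes):
    """codes: (U, P) code vectors; returns the (U, U) distance matrix and three colour levels."""
    diff = codes[:, None, :] - codes[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # 0 = white (small), 1 = red (intermediate), 2 = dark red (large)
    levels = np.digitize(dist, np.quantile(dist, [1 / 3, 2 / 3]))
    return dist, levels
```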

3.2 A priori Characterization of Funds using Classical Hedge Fund Typologies

We now turn to the question of the redundancy of information between the Kohonen map and classical existing typologies. A first method is based on the extraction of the observations of a given class, analysing them class by class by computing means and variances for quantitative variables (used for the classification) and frequencies for qualitative variables. But this loses the neighborhood properties of the Kohonen map, so we complete the description by studying the distribution of each qualitative variable inside each class. In each cell of the Kohonen map, we draw a frequency pie, where each modality is represented by a color occupying an area proportional to its frequency in the corresponding class. By representing the frequencies of each modality across the map, we show continuities between some classes as well as breaks between others. The next figure combines the Kohonen map with, firstly, the four-category classification (funds belonging to the "Directional Trading", "Relative Value", "Specialist Credit" or "Stock Selection" families of funds).

Figure 9. Kohonen Map and Meta-category Typology - A Pie Representation

Note: This figure represents the proportion of each modality in each cell. Modalities vary from 1 (Directional Trading, in pink), 2 (Relative Value, in blue), 3 (Specialist Credit, in yellow) to 4 (Stock Selection, in grey).

It is clear from this figure that funds of the same category correspond exactly neither to micro-classes nor to macro-classes. Nevertheless, from Figures 8 and 9, one notices that the large central class is composed of funds of different styles, whilst in other regions (ring and corner areas) category 4 funds (called "Stock Selection") are essentially represented. This category is in fact not homogeneous and the meta-category classification is not discriminating, as can be seen in the green region where all categories are present. A complementary visualization focuses on a chart representation of modalities within micro-classes. Focusing this time on the location of families of funds on the network, Figure 10 gives an idea of how qualitative modalities are distributed among the micro-classes. It confirms that Directional Trading and Stock Selection funds spread over the entire map, whilst Relative Value and Specialist Credit funds are placed in the green area (with a slight tendency for Relative Value funds to lie in the south of the green area and for Specialist Credit funds to lie in its north).


Figure 10. Kohonen Map and Meta-category Typology - A Chart Representation

Note: This figure represents the localisation of each modality on the map. Modalities vary from 1 (Directional Trading, in pink), 2 (Relative Value, in blue), 3 (Specialist Credit, in yellow) to 4 (Stock Selection, in grey).

As shown by these figures, the considered level of aggregation does not appear fine enough to correspond to a classification that makes sense in terms of fund behavior. The next figures present results corresponding this time to the eighteen-level Micropal aggregation. There is obviously a trade-off between the economic relevance of the categories (in terms of performance and risk) and the number of categories it is possible to distinguish on graphs. An eighteen-level aggregation is definitely a high-level classification for visualization purposes. Nevertheless, a close look at the visualization indicates that some regularities should be underlined. First of all, the three corner regions - which were indistinguishable with the meta-categories - contain different types of funds: the upper-left corner (grey box) contains Emerging Markets funds, the upper-right (cyan zone) mainly contains Convertible Arbitrage funds and the bottom-left corner (yellow zone) is mainly characterized by Distressed Securities funds (and Macro and Funds of funds). Confirming previous results, it appears from Figures 11 and 12 that the green region is composed of funds of every category and, additionally, that the magenta region seems to be mainly represented by Funds of funds.

Figure 11. Kohonen Map and Eighteen-category Typology - A Pie Representation

Note: This figure represents the proportion of each modality in each cell. Modalities vary from 1 (Convertible Arbitrage) to 18 (Statistical Arbitrage). See text and Appendix for the list of strategies.

Figure 12. Kohonen Map and Micropal's Category Typology - A Chart Representation

Note: This figure represents the localisation of each modality on the map. Modalities vary from 1 (Convertible Arbitrage) to 18 (Statistical Arbitrage). See text and Appendix for the list of strategies.

3.3 Ex post Characterization of Funds using Performance Measurements

Finally, it could be interesting to characterize micro- and macro-classes with a qualitative variable such as the level of volatility of the fund's return, the level of its mean return or the attained level of a traditional performance measurement like its Sharpe's ratio (denoted S_1). An illustration of a possible characterization of a priori groups of funds is proposed hereafter with the small-sample approximate estimated Sharpe's ratio (see Sharpe 1966; Miller, Gehr 1978; Jobson, Korkie 1981; Sharpe 1994; Pedersen, Satchell 2000), denoted Ŝ_1 and defined such that:

S_1 = (R_1 − R_f) / σ_1    (Eq. 7)

and:

Ŝ_1 = (1 + 3/(4n) + 100/(128n)) × (R̄_1 − R_f) / σ̂_1    (Eq. 8)

where R_1 is the annualized return on the considered hedge fund and R̄_1 its sample mean counterpart, R_f is a proxy for the riskless asset return, σ_1 is the annualized standard deviation of the hedge fund return and σ̂_1 its sample estimate, and n is the number of observations in the sample.
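A direct transcription of Eq. 8 as printed above (the small-sample correction factor is kept as it appears in the text); `returns` and `risk_free` are assumed to be expressed at the same, annualized, scale.

```python
import numpy as np

def adjusted_sharpe(returns, risk_free=0.0):
    """Small-sample approximate Sharpe ratio of Eq. 8."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    raw = (r.mean() - risk_free) / r.std(ddof=1)   # (mean excess return) / (sample std)
    return (1 + 3 / (4 * n) + 100 / (128 * n)) * raw
```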
Figures 13 and 14 represent one possible discrimination obtained with Sharpe's ratios. In these figures, the color represents a four-level discretization of Sharpe's ratios (each category containing a quartile of the population, with colors ranging from magenta to blue, yellow and grey according to the ascending Sharpe's ratio ranks of the funds). From Figures 13 and 14, we remark that low and medium-low Sharpe's ratios (magenta and blue levels) can mainly be found in the ring zone of the map, that medium-high ones (yellow level) are more often in the green zone, whilst high Sharpe's ratios (grey level) are essentially located in the central zone of the map (green and magenta zones).
Nevertheless, the discretization of Sharpe's ratios operated here is somewhat arbitrary and, to go further in the characterization of the map using Sharpe's ratios, one can represent, for each unit, the conditional and unconditional Sharpe's ratio densities (see Figure 15) and the conditional and unconditional (simplified) box-plots (see Figure 16). These representations confirm that neither piece of information - adjusted performance measures or Kohonen classification - is redundant. Indeed, one can see from Figures 15 and 16 that high Sharpe's ratios are more likely to be found in the central region of the map, whilst lower Sharpe's ratios are more present in the north and east zones of the map.
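The quartile-based discretization behind Figures 13 and 14 can be sketched as follows; the level labels are illustrative.

```python
import pandas as pd

def sharpe_quartiles(sharpe_ratios):
    """Map each fund's Sharpe ratio to one of four quartile-based levels."""
    return pd.qcut(pd.Series(sharpe_ratios), q=4,
                   labels=["low", "medium-low", "medium-high", "high"])
```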

Figure 13. Qualitative Sharpe's Ratios Discrimination - A Pie Representation

Note: This figure represents the proportion of each modality in each cell. Modalities vary from 1 (Low Sharpe's ratios, in pink), 2 (Medium-low, in blue), 3 (Medium-high, in yellow) to 4 (High, in grey).

Figure 14. Qualitative Sharpe's Ratios Discrimination - A Chart Representation

Note: This figure represents the localisation of each modality on the map. Modalities vary from 1 (Low Sharpe's ratios, in pink), 2 (Medium-low, in blue), 3 (Medium-high, in yellow) to 4 (High, in grey).


Figure 15. Qualitative Sharpe's Ratios Discrimination - Conditional versus Unconditional Sharpe's Ratio Densities

Note: This figure presents, in each cell of the map, the density of Sharpe's ratios of funds in the
category, together with, below, the distribution of Sharpe's ratios for the whole dataset.

Figure 16. Qualitative Sharpe's Ratios Discrimination - Conditional versus Unconditional Box Plot

Note: This figure presents, in each cell of the map, the minimum, maximum and mean Sharpe's ratio
of the category of funds, together with, on the right, the minimum, maximum and mean Sharpe's
ratios for the whole dataset.

This is consistent with the previous analyses of the map (Figures 13 and 14) and leads to the conclusion that the central region of the map should be preferred by a rational and risk-averse investor.

Finally, note that this analysis could be applied using most performance measurements, from practitioner-oriented measures (Information ratio, Sortino ratio, Calmar ratio, Sterling ratio, ...; see Sortino, Price 1994, for instance) and traditional ones⁵ (Treynor 1965; Sharpe 1966; Jensen 1968) to other measures (Connor-Korajczyk 1986; Grinblatt-Titman 1989; Okunev 1990) or more recent ones (such as Leland 1999; Bowden 2000; Chauveau-Maillet 2001 or Dacorogna et al. 2001), depending on the hypotheses on the data generating process, on market timing effects, on the normality of fund returns, and on the wealth, risk aversion coefficient and implied preferences of the final investor.

CONCLUSION

We propose here a general methodology to analyze multidimensional data when a linear model is not satisfactory and when the observations are described by quantitative and qualitative variables. The Self-organizing Map is a useful tool for defining clear homogeneous groups of funds with little knowledge of the true category of a fund and of the financial strategies involved. Indeed, although a fund's prospectus should obviously provide this information, recent research by Dibartolomeo and Witkowski (1997), Brown and Goetzmann (1997) and Kim et al. (1999) presents evidence of serious misclassifications when self-reported investment objectives are compared to actual investment styles. A robust interpretation of Kohonen maps partially solves such misclassification problems.
The methodology also allows us to define representative funds (i.e. code vectors) that can be interpreted as "benchmarks" of the categories. In addition, the visual tools we used lead to the conclusion that existing typologies group heterogeneous funds and that some funds exhibit strong individual particularities. Specifically, dissimilarities between categories of funds are visualized using distances between representative funds.
We then propose an analysis mixing information extracted from the Kohonen map with performance measurements relative to funds - the Sharpe's ratio in our case - and report three types of representations: a combination of the units of the Kohonen map and the level of the Sharpe's ratio (see Figures 13 and 14); a combination of the units of the Kohonen map and the conditional and unconditional Sharpe's ratio densities (see Figure 15); and a combination of the units of the Kohonen map and the conditional and unconditional box-plots (see Figure 16). These allow us to identify (or highlight) categories where risk-adjusted performances are the most interesting for investors.
The first of the various applications of this methodology is the building of a robust typology. The set of representative funds (see Figure 3) may also be used as benchmarks for style analysis of funds, as in Sharpe (1988). In this framework, each fund would be modelled by a regression on benchmarks defined by the Kohonen classification (constrained coefficients would be interpreted as in the classical return-based style analysis). This type of analysis has proven to be beneficial, even if its results must be carefully interpreted since, as underlined in Lhabitant (2001), it is often incorrectly stated that the factor loadings correspond to the effective allocation of the fund's portfolio among the asset classes, and since - as stated by DeRoon et al. (2001) - extra emphasis should be put on the beta constraints and their induced biases. Nevertheless, one can say that the fund behaves as if it were invested using these specific pure strategies. A natural extension of this approach can be found in Lhabitant (2001), who uses return-based style analysis based on the nine CSFB/Tremont sub-indices to assess relevant VaR measures corresponding to specific hedge funds. Auto-selected benchmarks extracted from the Kohonen algorithm could advantageously replace traditional indexes.
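A hedged sketch of this style-analysis extension: a fund's returns are regressed on the returns of the representative funds (code vectors) under the usual style-analysis constraints (non-negative weights summing to one). SLSQP is one of several ways to solve this quadratic programme; the names below are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def style_weights(fund_returns, benchmark_returns):
    """fund_returns: (T,) array; benchmark_returns: (T, K) returns of Kohonen benchmarks."""
    t, k = benchmark_returns.shape
    sse = lambda w: np.sum((fund_returns - benchmark_returns @ w) ** 2)
    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)   # weights sum to one
    res = minimize(sse, x0=np.full(k, 1.0 / k), bounds=[(0.0, 1.0)] * k,
                   constraints=cons, method="SLSQP")
    return res.x   # constrained "style" exposures to the representative funds
```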
But the reliability of our empirical results depends crucially on the dataset as well as on our methodology. The analysis can indeed be drastically misled by survivorship and backfilling biases, non-synchronicity of observations, missing observations or measurement errors. In particular, since only survivor funds are available in our database, our results hold only for funds still alive at the end of the sample. A more complete analysis would require focusing on funds that failed during the period under review. A larger, longer and more exhaustive database is also required in order to strengthen the results. Other typologies available on the market might also be more relevant and yield more homogeneity of fund families.
Regarding the methodology used, one should underline that some additional theoretical results are still needed regarding the convergence of the algorithm to the final organization of the map. Positive results have already been obtained for many distributions for elementary networks (Cottrell et al. 2000). For more complex structures or unknown densities, it is always possible to check - during the convergence process - that the main structure of the map remains quite stable during the learning process. In our case, intermediate structures obtained during the algorithm were found in accordance with the final results. In that sense, the algorithm shows fast convergence properties on our sample. Nevertheless, due to the large restrictions and limitations of the database used, multivariate analysis could be sensitive to sample biases. In this context, Kohonen maps have proven to be quite robust to such biases, as a recent work emphasizes (see Cottrell et al. 2000), and, as soon as the data is ergodic, one could expect the structure of the map to be relevant.

APPENDIX: THREE-LEVEL HEDGE FUND A PRIORI TYPOLOGY⁶
(in parentheses, the number of fund NAVs provided as of December 2000)

A. Meta Classification

I. Directional Trading
II. Relative Value
III. Specialist Credit
IV. Stock Selection

B. The Two Micropal Classifications

1. Convertible Arbitrage
1.1 Convertible Arbitrage - Global (34)
1.2 Convertible Europe (3)
1.3 Convertible Japan (2)
2. Distressed Securities
2.4 Distressed Securities - Global (31)
3. Emerging Markets
3.5 Emerging Markets : Asia (31)
3.6 Emerging Markets: Eastern Europe (18)
3.7 Emerging Markets- Global (43)
3.8 Emerging Market: Latin America (23)
4. Equity Hedge
4.9 Equity Hedge (253)
4.10 Equity Hedge: Africa (2)
4.11 Equity Hedge: Asia (8)
4.12 Equity Hedge: Europe (33)
4.13 Equity Hedge: Japan (3)
5. Equity Market Neutral
5.14 Equity Market Neutral (42)
5.15 Equity Market Neutral: Europe (1)
6. Equity Non-hedge
6.16 Equity Non-hedge (23)
6.17 Equity Non-hedge: Asia (2)
6.18 Equity Non-hedge: Europe (3)
7. Event Driven
7.19 Event-Driven (64)
7.20 Event-Driven: Europe (2)
8. Fixed Income Arbitrage

8.21 Fixed Income: Arbitrage (16)


8.22 Fixed Income: Convertible Bonds (16)
8.23 Fixed Income: Diversified (34)
8.24 Fixed Income: High Yield (11)
8.25 Fixed Income: High Yield, Asia (2)
8.26 Fixed Income: High Yield, Europe (3)
8.27 Fixed Income: Mortgage-Backed (13)
9. Foreign Exchange
9.28 Foreign Exchange (21)
10. Funds of Funds
10.29 Fund of Funds (242)
11. Macro
11.30 Macro - Global (53)
11.31 Macro: Europe (3)
12. Managed Futures
12.32 Managed Futures (17)
13. Market Timing
13.33 Market Timing (30)
14. Merger Arbitrage
14.34 Merger Arbitrage (42)
14.35 Merger Arbitrage: Europe (3)
14.36 Merger Arbitrage: Global (1)
15. Regulation
15.37 Regulation D (18)
16. Relative Value Arbitrage
16.38 Relative Value Arbitrage (39)
16.39 Relative Value Arbitrage: Europe (2)
16.40 Relative Value Arbitrage: Japan (2)
17. Sector
17.41 Sector: Energy (5)
17.42 Sector: Financial (4)
17.43 Sector: Healthcare/Biotechnology (11)
17.44 Sector: Metals/Mining (1)
17.45 Sector: Miscellaneous (4)
17.46 Sector: Real Estate (5)
17.47 Sector: Short Selling (9)
17.48 Sector: Technology (65)
18. Statistical Arbitrage
18.49 Statistical Arbitrage (31)
18.50 Statistical Arbitrage: Europe (1)

NOTES
1. Corresponding, in our case, to the P Net Asset Value dates for the N considered hedge funds.
2. Other criteria defining the neighborhood have been chosen in financial applications, such as a notion of distance based on one minus the squared Pearson correlation coefficient between the returns associated with financial assets and a benchmark (see Mantegna 1999). The adopted definition might be an important issue when classifying funds.
3. That is, Σ_{s=1}^{+∞} η(s) = +∞ and Σ_{s=1}^{+∞} [η(s)]² < +∞.
4. We find here a singularity for these funds that has already been signalled for classical mutual funds (see Brown, Goetzmann 1997, for instance).
5. See Pedersen and Satchell (2000) for recent applications.
6. A complete description of such categories can be found, for instance, in Agarwal and Naik (2000-a) or Lhabitant (2001).

REFERENCES
Agarwal V., Naik N. (2000-a), "Multi-period Persistence Analysis of Hedge Funds", Journal of Financial and Quantitative Analysis, 35(3), September, 327-342.
Agarwal V., Naik N. (2000-b), "Characterizing Systematic Risk of Hedge Funds with Buy-and-
hold and Option-based Strategies", LBS working paper, IFA n°300, August, 51 pages.
Anderson T. (1984), An Introduction to Multivariate Statistical Analysis, John Wiley, 2nd Edition, New York.
Blake E., Elton M., Gruber C. (1999), "Common Factors in Active and Passive Portfolios", European Finance Review, 3(1), 53-78.
Blayo F., Demartines P. (1991), "Data Analysis: How to Compare Kohonen Neural Networks to
Other Techniques?", in Proceedings of IWANN '91 Conference, Springer, 469-476.
Bonanno G., Vandewalle N., Mantegna R. (2000), "Taxonomy of Stock Market Indices", Physical
Review E 62(6), December.
Bowden R. "The Ordered Mean Difference as a Portfolio Performance Measure", Journal of
Empirical Finance, 1, 195-223.
Brown S., Goetzmann W. (1997), "Mutual Fund Styles", Journal of Financial Economics, 43, 373-399.
Brown S., Goetzmann W. (2001), "Hedge Funds with Styles", NBER Working Paper n°w8173, March, 37 pages.
Chauveau T., Maillet B. (2001), "Performance with Restricted Borrowing: A Generalisation of Usual Measures", in Proceedings of EFMA'01 Conference, Lugano, June, 46 pages.
Connor G., Korajczyk R. (1986), "Performance Measurement with the Arbitrage Pricing Theory: A New Framework for Analysis", Journal of Financial Economics, 15(3), 373-394.
Cottrell M., Fort J.C., Pages G. (1995) "Two or Three Things that We Know about the Kohonen
Algorithm", in Verleysen M. (ed.), Proceedings of ESANN'94 Conference, D Facto, Bruxelles,
235-244.
Cottrell M., de Bodt E. (1996), "A Kohonen Map Representation to Avoid Misleading Interpretations", in Verleysen M. (ed.), Proceedings of ESANN'96 Conference, D Facto, Bruxelles, 103-110.
Cottrell M., de Bodt E. (2000), "Bootstrapping Self-organizing Maps to Assess the Statistical Significance of Local Proximity", in Verleysen M. (ed.), Proceedings of ESANN'00 Conference, D Facto, Bruges, 245-254.
Cottrell M., Girard B., Girard Y., Muller C., Rousset P. (1995-a), "Daily Electrical Power Curves: Classification and Forecasting Using a Kohonen Map, From Natural to Artificial Neural Computation", Proceedings of IWANN'95 Conference, Springer, 1107-1113.
258 Chapter 12

Cottrell M., Girard B., Girard Y., Mangeas M. (1995-b), "Neural Modelling for Time Series: A Statistical Stepwise Method for Weight Elimination", IEEE Tr. on Neural Networks, 6(6), November, 1355-1364.
Cottrell M., Fort J.-C., Pages G. (1998-a), "Theoretical Aspects of the SOM Algorithm", Neurocomputing, 21, 119-138.
Cottrell M., Girard B., Rousset P. (1998-b), "Forecasting of Curves Using a Kohonen
Classification", Journal of Forecasting, 17(5/6), 429-439.
Dacorogna M., Gençay R., Müller U., Pictet O. (2001), "Effective Return, Risk Aversion and Drawdowns", Physica A, 289, 229-248.
Daniel K., Grinblatt M., Titman S., Wermers R. (1997), "Measuring Mutual Fund Performance with Characteristics-based Benchmarks", Journal of Finance, 52(3), July, 1035-1058.
de Bodt E., Gregoire Ph., Cottrell M. (1997), "Projection of Long-term Interest Rates with Maps",
in Deboeck G., Kohonen T. (eds), Visual Explorations in Finance with Self-organizing Maps,
Springer, 24-38.
Deboeck G. (1997), "Picking Mutual Funds with Self-organizing Maps", in Deboeck G., Kohonen
T. (eds) Visual Explorations in Finance with Self-organizing Maps, Springer, 39-58.
DeRoon F., Nijman T., Ter Horst J. (2001), "Evaluating Style Analysis", in Proceedings of EFMA'01 Conference, Lugano, June, 33 pages.
Dibartolomeo D., Witkowski E. (1997), "Mutual Fund Misclassification: Evidence based on Style Analysis", Financial Analysts Journal, Sept-Oct, 32-43.
Fung W., Hsieh D. (1997), "Empirical Characteristics of Dynamic Trading Strategies: The Case of
Hedge Funds", Review of Financial Studies, 10(2), Summer, 275-302.
Fung W., Hsieh D. (1998), "Performance Attribution and Style Analysis. From Mutual Funds to
Hedge Funds", Working paper, Paradigm Financial Product, February, 41 pages.
Fung W., Hsieh D. (1999), "A Primer on Hedge Funds", Journal of Empirical Finance, 6(3), 309-
331.
Fung W., Hsieh D. (2000), "Performance Characteristics of Hedge Funds and Commodity Funds: Natural versus Spurious Biases", Journal of Financial and Quantitative Analysis, 35(3), September, 291-307.
Fung W., Hsieh D. (2001), "The Risk in Hedge Fund Strategies: Theory and Evidence from Trend Followers", Review of Financial Studies, 14(2), Summer, 313-341.
Grinblatt M., Titman S. (1989), "Portfolio Performance Evaluation: Old Issues and New Insights", Review of Financial Studies, 2(3), 393-421.
Gruber M. (2001), "Identifying the Risk Structure of Mutual Fund Returns", European Financial Management, 7(2), June 2001, 147-160.
Jensen M. (1968), "The Performance of Mutual Funds in the Period 1945-1964", Journal of
Finance, 23, May, 389-416.
Jobson J., Korkie B. (1981), "Performance Hypothesis Testing with the Sharpe and Treynor Measures", Journal of Finance, 36, September, 889-908.
Kohonen T. (2001), Self-organizing Maps, Springer Series in Information Sciences, Vol. 30, 3rd extended edition, Springer, Berlin.
Leland H. (1999), "Beyond Mean-variance: Performance Measurement in a Nonsymmetric World",
Financial Analysts Journal, January-February, 27-36.
Lhabitant F.-S. (2001), "Assessing Market Risk for Hedge Funds and Hedge Funds Portfolios", FAME Research Paper n°24, March, 40 pages.
Mantegna R. (1999), "Information and Hierarchical Structure in Financial Markets", Computer Physics Communications, 121-122, 153-156.
Mitev T. (1998), "Classification of Commodity Trading Advisors (CTAs) using Maximum Likelihood Factor Analysis", Journal of Alternative Investments, 1(2), Fall, 40-46.

Miller R., Gehr A. (1978), "Sample Bias and Sharpe's Performance Measure: A Note", Journal of
Financial and Quantitative Analysis, 13, December, 943-946.
Okunev J. (1990), "An Alternative Measure of Mutual Fund Performance", Journal of Business
Finance and Accounting, 17(2), Spring, 247-264.
Pedersen Ch., Satchell S. (2000), "Small Sample Analysis of Performance Measures in the Asymmetric Response Model", Journal of Financial and Quantitative Analysis, 35(3), September, 425-450.
Sharpe W. (1966), "Mutual Fund Performance", Journal of Business, 39(1), Part 2, January, 119-138.
Sharpe W. (1988), "Determining a Fund's Effective Asset Mix", Investment Management Review,
2(6), 1988, 59-69.
Sharpe W. (1992), "Asset Allocation: Management Style and Performance Measurement", Journal
of Portfolio Management, 18(2), 7-19.
Sharpe W. (1994), "The Sharpe Ratio", Journal of Portfolio Management, Fall, 49-58.
Sortino F., Price L. (1994), "Performance Measurement in a Downside Risk Framework", Journal
of Investing, Fall, 59-65.
Treynor J. (1965), "How to Rate Management of Investment Funds", Harvard Business Review,
January-February, 63-75.
