Connectionist Approaches in Economics and Management Sciences
Edited by
CEDRIC LESAGE
CREREG CNRS,
University of Rennes, France
and
MARIE COTTRELL
SAMOS MATISSE CNRS,
University of Paris 1, France
SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.
A C.I.P. Catalogue record for this book is available from the Library of Congress.
Contents

Editors vii
Acknowledgements ix
Preface xi

Preface
1) Multilayer perceptrons: Multilayer perceptrons have been used in numerous scientific fields for more than 20 years. Derived from an analogy with natural neuronal networks, they may be regarded by mathematicians as non-linear, parametric and differentiable functions. Moreover, this class of functions is quite general, as shown by the universal approximation theorem, which states that any continuous function defined on a compact set may be approximated as closely as required by a multilayer perceptron. On the other hand, their success is due to a great extent to the ease of implementing their learning algorithms and to their good results on practical problems, despite some weaknesses in their theoretical properties. Multilayer perceptrons learn by fitting their parameters, or weights, to a problem in order to minimize an error of prediction or classification. However, for a given architecture, learning algorithms only provide a sub-optimal solution, so one must perform numerous random initialisations in order to avoid particularly bad solutions. There is, moreover, a pitfall that is more difficult to avoid: "too good" learning, or the "over-learning" of multilayer perceptrons. That phenomenon is comparable to learning the training data by heart: the multilayer perceptron is no longer capable of making satisfying predictions on new examples of the studied problem, and its ability to "generalise" is very poor, whereas this is the key characteristic for a user. One solution, a compromise between the complexity of the model and its ability to generalise, consists in penalising the complexity of the multilayer perceptron. Another limit of this kind of model is the hypothesis of stationarity of the phenomenon that one tries to model with multilayer perceptrons, which amounts to supposing that the characteristics of the model do not evolve. One possibility for relaxing that hypothesis is to use several MLPs and to specialise each of them on a specific part of the phenomenon. This idea leads to hybrid systems including hidden Markov chains. This generalisation illustrates one of the MLP's characteristics: the ability to be easily integrated into complex systems and to improve already existing models.
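As a concrete illustration of the points above (sub-optimal learning, random restarts, and complexity penalisation), here is a minimal Python sketch; the architecture, toy data and hyper-parameters are illustrative assumptions, not taken from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, (200, 1)); y = np.sin(x)          # toy regression data

def train(x, y, hidden=10, steps=3000, lr=0.05, penalty=1e-4, seed=0):
    """One-hidden-layer perceptron fitted by gradient descent.
    `penalty` implements the complexity penalisation mentioned above."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (1, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    n = len(x)
    for _ in range(steps):
        h = np.tanh(x @ W1 + b1)                         # non-linear hidden layer
        out = h @ W2 + b2
        err = out - y
        # backpropagation of the mean squared error, plus weight decay
        gW2 = h.T @ err * (2 / n) + 2 * penalty * W2
        gb2 = 2 * err.mean(0)
        dh = (err @ W2.T) * (1 - h ** 2)
        gW1 = x.T @ dh * (2 / n) + 2 * penalty * W1
        gb1 = 2 * dh.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return (W1, b1, W2, b2), float(np.mean((out - y) ** 2))

# learning only finds sub-optimal solutions, hence several random initialisations
params, mse = min((train(x, y, seed=s) for s in range(5)), key=lambda t: t[1])
```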
2) The SOM algorithms: the SOM algorithm, also called the Kohonen algorithm, is an original classification algorithm defined by Teuvo Kohonen in the 1980s (Kohonen, 1984, 1995). The motivation was neuromimetic. The algorithm presents two essential differences from the traditional methods of classification: it is a stochastic algorithm, and an a priori neighbourhood concept between classes is defined. The neighbourhood between classes may be chosen from a wide range of representations: grid, string, cylinder or torus, called Kohonen maps. The classification algorithm is iterative. The initialisation consists of associating a code vector (or representative), chosen at random in the space of observations, to each class. Then, at each stage, an observation is randomly chosen, compared to all code vectors, and a winner class is determined, i.e., the class whose code vector is nearest in the sense of a given distance. Finally, the code vectors of the winner class and those of the neighbouring classes are moved closer to the observation. Hence, for a given state of the code vectors, a mapping is defined that associates to each observation the number of the nearest code vector (the number of the winning class). Once the algorithm converges, this mapping respects the topology of the input space, in the sense that after classification, similar observations belong to the same class or to neighbouring classes. A drawback of the basic algorithm is that the number of classes needs to be determined a priori. To remedy this drawback, a hierarchical classification of the code vectors is undertaken, so as to define a smaller number of classes called clusters. The classes and clusters can then be represented on the Kohonen map corresponding to the chosen topology. As neighbouring classes contain similar observations, the clusters gather contiguous classes, which gives interesting visual properties. It is then easy to establish a typology of individuals, by describing each of the clusters by means of traditional statistics. Depending on the problem, one may also define a model (of regression, auto-regression, factor analysis, etc.) adapted to each of the clusters. For qualitative data, adaptations of the Kohonen algorithm allowing the study of relationships among modalities also exist. Finally, in order to get the best results, it is recommended to use simultaneously, or successively, the traditional methods of data analysis and Kohonen techniques.
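The following short Python sketch makes the steps just described concrete (random initialisation among the observations, random draws, winner selection, neighbourhood update on a string); the schedules and parameter values are illustrative assumptions.

```python
import numpy as np

def som_string(data, n_units=10, steps=5000, seed=0):
    """On-line Kohonen algorithm on a one-dimensional map (a string)."""
    rng = np.random.default_rng(seed)
    # initialisation: one code vector per class, drawn among the observations
    code = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    for t in range(steps):
        lr = 0.5 * (1 - t / steps)                             # decreasing gain
        radius = max(1, round(n_units / 2 * (1 - t / steps)))  # shrinking neighbourhood
        x = data[rng.integers(len(data))]          # draw an observation at random
        winner = int(np.argmin(np.linalg.norm(code - x, axis=1)))
        # move the winner and its neighbours on the string closer to x
        lo, hi = max(0, winner - radius), min(n_units, winner + radius + 1)
        code[lo:hi] += lr * (x - code[lo:hi])
    return code

data = np.random.default_rng(1).normal(size=(500, 3))
code = som_string(data)
# the induced classification: each observation joins the nearest code vector
classes = np.argmin(((data[:, None, :] - code[None, :, :]) ** 2).sum(axis=2), axis=1)
```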
3) Genetic algorithms: Genetic algorithms (GAs) are stochastic zeroth-order methods (i.e., requiring only values of the function to optimise) that can find the global optimum of very rough functions. This allows GAs to tackle optimisation problems for which standard methods (e.g., gradient-based algorithms requiring the existence and computation of derivatives) are not applicable. Despite the apparent simplicity of a GA process - which has driven many programmers to first write their own GA adapted to their specific problem - building an efficient EA for an application must take into account the sensitivity to parameter settings and design choices.
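A bare-bones sketch of such a GA in Python, to make the selection/crossover/mutation loop explicit; the operators and parameter values are illustrative assumptions, and a real application would tune them carefully (the sensitivity mentioned above).

```python
import numpy as np

def ga(fitness, dim, pop_size=50, gens=200, mut_rate=0.1, seed=0):
    """Bare-bones GA: only values of `fitness` are used (zeroth order)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, (pop_size, dim))
    for _ in range(gens):
        scores = np.apply_along_axis(fitness, 1, pop)
        # tournament selection: the better (lower) of two random individuals
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(scores[a] < scores[b], a, b)]
        # one-point crossover between consecutive parents
        cut = int(rng.integers(1, dim)) if dim > 1 else 1
        children = parents.copy()
        children[0::2, cut:], children[1::2, cut:] = \
            parents[1::2, cut:].copy(), parents[0::2, cut:].copy()
        # Gaussian mutation
        mask = rng.random(children.shape) < mut_rate
        children[mask] += rng.normal(0.0, 0.3, int(mask.sum()))
        pop = children
    scores = np.apply_along_axis(fitness, 1, pop)
    return pop[int(np.argmin(scores))]

# a rough multimodal test function on which gradient methods get stuck
rastrigin = lambda x: 10 * len(x) + float(np.sum(x**2 - 10 * np.cos(2 * np.pi * x)))
best = ga(rastrigin, dim=5)
```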
4) Fuzzy Logic: Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth - truth values between "completely true" and "completely false". It was introduced by Lotfi Zadeh in the 1960s as a means to model uncertainty. Rather than regarding fuzzy theory as a single theory, one should regard the process of "fuzzification" as a methodology to generalize any specific theory from a crisp (discrete) to a continuous (fuzzy) form. Thus fuzzy arithmetic, fuzzy calculus, fuzzy differential equations and so on were developed. Just as there is a strong relationship between Boolean logic and the concept of a subset, there is a similar strong relationship between fuzzy logic and fuzzy subset theory. A fuzzy subset F of a set S can be defined as a set of ordered pairs, each with the first element from S, and the second element from the interval [0, 1], with exactly one ordered pair present for each element of S. This defines a mapping between elements of the set S and values in the interval [0, 1]. The value zero is used to represent complete non-membership, the value one is used to represent complete membership, and values in between are used to represent intermediate degrees of membership. A very common methodology for applying fuzzy theory is the fuzzy expert system, i.e. an expert system that uses a collection of fuzzy membership functions and rules, instead of Boolean logic, to reason about data. The rules in a fuzzy expert system are usually based on an "If-Then" structure. Their ability to model a "natural" process of reasoning makes their formalisation generally more robust and less costly than analytical modelling. Conversely, fuzzy methodologies demand context-dependent knowledge that may be difficult to obtain. Fuzzy sets and logic must be viewed as a formal mathematical theory for the representation of uncertainty. Uncertainty is crucial for the management of real systems in economics and management sciences. The presence of uncertainty is the price you pay for handling a complex system.
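A minimal Python sketch of a fuzzy subset and of one If-Then rule of the kind used in fuzzy expert systems; the membership functions and variable names are invented for illustration.

```python
def triangular(a, b, c):
    """Membership function rising from a to b and falling back to zero at c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

high_exposure = triangular(0.4, 0.8, 1.2)   # fuzzy subset of the unit interval
weak_control  = triangular(0.3, 0.7, 1.1)

def rule_high_risk(exposure, control):
    # "If exposure is high AND control is weak Then risk is high";
    # min plays the role of AND, and the result is a partial truth in [0, 1]
    return min(high_exposure(exposure), weak_control(control))

print(rule_high_risk(0.75, 0.6))   # intermediate degree of truth, here 0.75
```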
The corpus under study is made up of a set of 148 articles, which can be
classified in terms of the methods and the nature of applications. Following the
latter classification we chose to divide the articles up into seven categories.
Globally, about a third of the methods are related to perceptrons and general
neural networks. The remaining two thirds are evenly distributed among
Kohonen maps, genetic algorithms, fuzzy methods, and connectionist systems.
Lastly, cognitive economy accounts for only 3% of the methods used.
[Figure: two pie charts showing the overall distribution of methods (MLP 31%, OtherM 9%, ...) and of applications (ECO 15%, OtherA 5%, ...).]
As for the applications, about half of them are related to finance and marketing, followed by economics and methodology (around 15% each), and lastly by human resources and organization theories and decision theory (less than 10% each).
As such, we notice an interesting diversity concerning both the methods and
their applications.
The progression in time is even more insightful. An evolution towards a more
uniform distribution of methods as well as their applications can be detected.
Indeed, perceptrons and neural networks accounted for the totality of the methods
used in 1994, whereas a much more uniform distribution of the methods can be
observed in 2001. The same can be said of the applications.
[Figure: bar charts of the number of articles per year, 1994-2001, by method (MLP, GA, SOM, FUZ, CON, COG, OtherM) and by application (ECO, FIN, MKG, ORG, DEC, MET, OtherA).]
The following graphs illustrate the transition towards a more uniform distribution
of the different types of methods and applications.
""'
Olhe<>.t
GA:
IS%
M.J> FUZ:
·~
MLP SOM
OtherA ECO M:T
6% 6% 9%
ORG
20%
2% 36%
L_____ J
AJZ CON COG
ECO
t.£T OtherA ECO OtherA ECO
11%
9% 5% 9%
M<G
Fl'l 25%
31%
IE FN
32% 20%
9% 26%
ECO AN MKG
0\herM M.P M.P
OtherM
14%
COG COG GA
5% 3% 6% SCM MLP
CON 49%
6%
9%
ORG MLP
8%
- r OlherM DEC
7%
MLP
29%
GA M.P
33% 51%
17% 17%
Hence, we observe that the "ACSEG community" has increasingly appropriated general neural methods and has diversified its applications. We then used a Kohonen map to classify the 148 articles on the basis of the two qualitative variables mentioned
above (methods and applications, each with seven modalities). We have chosen a
one-dimensional map (a string) with six units. In each unit, the different methods
and applications, as well as the number of the corresponding articles have been
classified. In the map below only the modalities of the two qualitative variables
are represented.
[Figure: the six-unit Kohonen string, each unit showing the modalities of the two qualitative variables (SOM, MLP, FUZ, COG, DEC, OtherM together with ECO, FIN, MET, OtherA).]
As expected, we find that the fuzzy methods are associated with the applications of decision theory; also that Kohonen maps and economics are placed in the same unit and that finance is in the neighbouring one. Genetic algorithms and connectionist approaches go together with marketing and organization theory, etc.
The following graphical representations show the number of articles per year
in each class of the constructed string. An outbreak of the cognitive approaches
in the year 2001 can be clearly observed. For easier reading of the graphs, we have represented the six units in three rows (units 1 to 2, 3 to 4 and 5 to 6).
[Figure: number of articles per year in each of the six units of the string.]
All these papers contain interesting results on their respective subjects - results that would have been very difficult to obtain with classical techniques and that have been established by using these connectionist non-linear methods.

M. Cottrell, C. Lesage
PART I

Chapter 1
EVOLUTION OF COMPLEX ECONOMIC SYSTEMS AND UNCERTAIN INFORMATION

Jean-Pierre AUBIN
Université Paris-Dauphine, F-75775 Paris cedex (16), France
INTRODUCTION
We begin this paper by quoting the wish J. von Neumann and O. Morgenstern
expressed in 1944 at the end of the first chapter of their monograph "Theory of
Games and Economic Behavior":
"Our theory is thoroughly static. A dynamic theory would unquestionably be
more complete and therefore, preferable ..."
"Our static theory specifies equilibria ... A dynamic theory, when one is found
-will probably describe the changes in terms of simpler concepts."
One of the economic characteristics is the presence of scarcity constraints and, more generally, viability constraints to which a socio-economic system must adapt during its evolution.
It then becomes natural to specify the "minimal" conditions under which an
economy can work and to specify classes - as large as possible - of reasonable
economies whose evolution does not violate these viability conditions (as well as
other specifications).
In my opinion, when one has to design a mathematical metaphor for an
evolutionary model of socio-economic variables, one should start by gathering
the constraints on these variables which cannot - or should not - be violated.
This requires first to delineate the endogenous states of the system under
study and to discriminate them from the rest of the variables, regarded as
exogenous, constituting in some sense the "environment" of the system under
investigation. This partition between variables, which dictates the level of
abstraction of a particular investigation, is the first source of constraints that the
endogenous variables must obey.
Usually, there are few disputes among "modelers" when they are listing these
constraints.
Serious disagreements may begin when behavioral assumptions have to be
made.
Viability theory allows us to correct the dynamics of the initial system, through viability multipliers, in order that the constraints become viable under the corrected system. These viability multipliers play the
role of Lagrange or Kuhn-Tucker multipliers in optimization theory, where an
optimal solution of the problem of maximization of a utility function under
constraint is obtained by maximizing without constraints a corrected utility
function involving Lagrange multipliers. Both the viability and Lagrange
multipliers belong to the same space (the dual of the resource space), and are
usually interpreted as virtual prices as well as other regulatory controls, called
regulons.
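To make the analogy concrete, here is the standard constrained-optimization display it refers to (a generic textbook formulation, not a formula from this chapter):

```latex
\[
  \max_{x}\ u(x)\quad\text{subject to}\quad g(x)=y
  \qquad\Longrightarrow\qquad
  \max_{x}\ \underbrace{u(x)+\langle q,\;y-g(x)\rangle}_{\text{corrected utility}},
  \qquad q\in Y^{*},
\]
\[
  \text{with the first-order condition}\quad \nabla u(x)=g'(x)^{*}q,
\]
```

where the multiplier $q$, an element of the dual of the resource space, is what receives the interpretation of a (virtual) price.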
Therefore viability multipliers provide one way (though not the only one) of designing dynamical economies for which the constrained set is viable. This should be familiar to economists, since it amounts to using as (virtual) prices the very same multipliers - viability multipliers instead of Lagrange ones - that are used in optimization under constraints for relaxing the constraints. In this respect, viability theory can be regarded as a theory of evolution under (viability) constraints.
For example, ever since Adam Smith's invisible hand, what we call nowadays
decentralization is justified by the need of agents to behave in a decentralized
way for complying to scarcity constraints, using for this purpose "messages"
such as prices or "rationing" mechanisms which involve shortages (and lines,
queues, unemployment), or "frustration" of consumers, or "monetary"
mechanisms, or others. "Prices" constitute the main examples of messages,
actually, the messages with the smallest dimension (see for instance an
introduction to this issue in (Saari 1995). Such prices appear here as viability
multipliers emerging2 when allocations of commodities satisfy the scarcity
constraints.
The next task is to derive from the confrontation of the (corrected) dynamics
and the constraints the concealed regulation mechanisms governing viable
evolutions, and to select some of them according to some further principle. This
allows us to derive "adjustment laws" instead of founding the modelling process on such a law. This goes against the tradition of theoretical economics, where the adjustment of some variables for reaching an equilibrium
plays a basic and prominent role. The so-called "law of supply and demand"
states that prices react in a determined direction in response to a difference
between supply and demand in the market: the price of a particular commodity is
assumed to vary according to the sign of the excess demand of this commodity.
Instead of reasoning with an a priori given law of adjustment, which need not produce viable evolutions, we shall build "dedicated" laws of supply and demand which provide viable solutions. In some way, this reverse approach allows
us to "explain" a posteriori the role of such an adjustment law instead of
scrutinizing the consequences of an a priori given law for possible justifications.
Since at least the works of Charles Elton (1958) and of George Hutchinson (1959) at the end of the fifties, the conventional wisdom of biologists has proposed, in some loose way, that complexity - regarded as the number of variables of a system and their links - is justified for maintaining "stability" - a fuzzy word meaning confinement or, rather, viability, as this term was proposed to single out this meaning from the numerous meanings of "stability" in mathematics. Biodiversity is presently championed on the basis of this very objective.
In a series of papers summarized in (May 1973), Robert May and his collaborators challenged this proposition by showing that the higher the dimension, the less stable were dynamical models of Lotka-Volterra type. Therefore, either the biologists' assumption was false, or this mathematical connotation of complexity - the dimension of the state space - or the chosen mathematical model was inadequate. This is a suggestion made by John Maynard Smith (1974), who concluded that the stability of ecosystems is due to some specific interactions.
Here, we retain the following features:
- Complexity means in day-to-day language not only the number of variables of a system, but also and above all the labyrinth of connections between the components of an organization or a "system" (or, for that matter, of a living organism);
- the purpose of complexity is to sustain the "stability" - another polysemous word - or, more precisely, the viability constraints set by an organization;
- the increase of complexity parallels the growth of the web of constraints whenever the system cannot comply with them in an autonomous or decentralized way;
- the organization of organisms as a hierarchical structure of relatively "autonomous" organs is due to "cycles" involved in the viability constraints or multi-stage production processes;
- the organization in organisms or subsystems is rooted in the need to offer them slowly evolving partial environments to specialize them in specific activities.
However, these attempts did not answer directly the question that some
economists or biologists asked: Complexity and hierarchical organization, yes,
but for what purpose?
This growth of "structural" complexity is the legacy that Jean-Baptiste de
Monet, chevalier de Lamarck, offered to us, the backbone of his theory of
evolution which was forgotten ever since, overshadowed as it was by other
aspects of the evolution of species, such as the Darwinian natural selection or
genetics. Claude Bernard's "constance du milieu intérieur" and the "homeostasis" of Walter Cannon - viability constraints to which dynamical systems must comply - later contributed to singling out the crucial role of constraints as a key for explaining this aspect of complexity.
In this framework of adaptation to viability constraints, the evolution of the
state no longer derives from intrinsic dynamical laws valid in the absence of
constraints, but from some "organization" that evolves together with the state of
the system in order to adapt to the viability constraints. This attempt to sustain
the viability of the system by connecting the dynamics or the constraints of its
agents may be a general feature of "complex systems".
We regard here connectionism - a less normative and more neutral term than cooperation, whenever the system, the organ, the organism or the organization arises in economics, social sciences or biology - as an answer to the need to adapt to more and more viability constraints, which implies the emergence of links between the components of a dynamical system and their evolution.
We shall restrict our study to the case when the organization is described by a
"network" we now define.
A purpose of an organization is to coordinate the actions of a finite number n
of agents labelled i = 1, . . . , n. It is described here by the architecture of a
network of agents, such as:
1. socio-economic networks (see for instance Ioannides 1997; Aubin 1997,
1998b; Aubin, Foray 1998; Bonneuil 2000).
2. neural networks (see for instance Aubin 1995; 1996; 1998c),
3. genetic networks (see for instance Bonneuil 1998; Bonneuil, Saint-Pierre
2000).
This coordinated activity requires a network of communications of actions $x_i \in X_i$ ranging over $n$ finite dimensional vector spaces $X_i$.
The simplest general form of coordination is to require that a relation between actions of the form $g(A(x_1, \ldots, x_n)) \in M$ must be satisfied. Here:
1. $A : \prod_{i=1}^{n} X_i \to Y$ is a connectionist operator relating the individual actions in a collective way,
2. $M \subset Y$ is the subset of the resource space $Y$ and $g$ is a map, regarded as a resource map.
We shall study this coordination problem in a dynamic environment, by allowing actions $x(t)$ and connectionist operators $A(t)$ to evolve⁴ according to dynamical systems we shall construct later. In this case, the coordination problem takes the form:
$$\forall\, t \ge 0,\quad g\big(A(t)(x_1(t), \ldots, x_n(t))\big) \in M$$
or, when coalitions of agents are involved:
$$\forall\, t \ge 0,\quad g\big(\{A_S(t)(x(t))\}_{S \subset N}\big) \in M$$
where $g : \prod_{S \subset N} Y_S \to Y$.
The question we raise is the following: assuming that we know the intrinsic laws of evolution of the variables $x_i$ (independently of the constraints), of the connectionist operators $A_S(t)$ and of the coalitions $S(t)$, there is no reason why the collective constraints defining the above architecture should be viable under these dynamics, i.e., satisfied at each instant.
One may be able, with a lot of ingenuity and an intimate knowledge of a given problem, and for "simple constraints", to derive dynamics under which the constraints are viable.
However, we can use a kind of "mathematical factory" providing classes of dynamics "correcting" the initial (intrinsic) ones through viability multipliers $q(t)$ ranging over the dual $Y^*$ of the resource space $Y$, in such a way that the viability of the constraints is guaranteed.
This may allow us to provide an explanation of the formation and the
evolution of the architecture of the network and of the active coalitions as well as
the evolution of the actions themselves.
The results presented here use this approach in the case of the above specific
constraints. We show that by doing so, the dynamics of the evolution of
connectionist operators and coalitions present some interesting features.
In order to tackle mathematically this problem, we shall:
As for the correction of the velocities of the connectionist tensors $A_S$, their correction is a weighted "multi-Hebbian" rule: for each component of the connectionist tensor, the correction term is the product of the membership $\gamma_S(\chi(t))$ of the coalition $S$, of the components $x_{i_k}(t)$ and of the component $q^j(t)$ of the regulon. This is a generalization of the celebrated Hebbian rule proposed by Hebb in his classic book The Organization of Behavior in 1949 as the basic learning process of synaptic weights in neural networks (see Aubin 1995, 1996, 1998c for more details). Mathematically speaking, we recognize tensor products of vectors, which boil down to matrices when only two vectors are involved.
In other words, the viability multipliers appear in the regulation of the multiaffine connectionist operators in the form of a "multi-Hebbian" rule, as in Aubin, Burnod (1998), where they were introduced for the first time, compounded with the presence of the coalition memberships $\gamma_S(\chi(t))$ when coalitions of agents are allowed to form and to evolve.
Even though viability multipliers do not provide all the dynamics under which
a constrained set is viable, they provide classes of them exhibiting interesting
structures that deserve to be investigated and tested in concrete situations.
Remark: Learning Laws and the Supply and Demand Law - It is curious that both the standard supply and demand law, known in economics as the Walrasian tatonnement process, and the Hebbian learning law in cognitive sciences were the starting points of Walras's general equilibrium theory and of learning processes in neural networks, respectively. In both theories, the choice of positing such adaptation laws as a prerequisite led to the same culs-de-sac. As we alluded to above, starting instead from the dynamic laws of agents, viability theory provides "dedicated adaptation laws", so to speak, as the conclusion of the theory instead of as its primitive feature. In both cases, the point is to maintain the viability of the system: that allocations of scarce commodities satisfy the scarcity constraints in economics, and that the viability of the neural network is maintained in the cognitive sciences. For neural networks, this approach provides learning rules that possess the features meeting the Hebbian criterion. For the general networks studied here, these features are still satisfied in spirit.
4. Outline
We introduce fuzzy coalitions and show how they may evolve for maintaining the viability of
the architecture of the network.
For simplicity, we summarize the case when there is only one agent and when the operator $A : X \to Y$ is affine, studied in Aubin (1997, 1998b, 1998c):
$$\forall\, t \ge 0,\quad W(t)x(t) + y(t) \in M$$
where both the state $x$, the resource $y$ and the connectionist operator $W$ evolve.
These constraints are not necessarily viable under an arbitrary dynamical system of the form:
$$\begin{cases} \text{i)}\ \ x'(t) = c(x(t))\\ \text{ii)}\ \ y'(t) = d(y(t))\\ \text{iii)}\ \ W'(t) = a(W(t)) \end{cases}$$
(see Aubin, Frankowska (1990) and Rockafellar, Wets (1997) for more details on
these topics).
We can prove that the viability of the constraints can be reestablished when the initial system is replaced by a control system involving the tensor product
$$x \otimes q :\ p \in X^* \mapsto (x \otimes q)(p) := \langle p, x\rangle\, q,$$
the matrix of which is made of entries $(x \otimes q)_i^j = x_i q^j$ (see Aubin (1996) for more details on the relations between Hebbian rules and tensor products in the framework of neural networks).
In other words, the correction of a dynamical system for reestablishing the viability of constraints of the form $W(t)x(t) + y(t) \in M$ involves the celebrated Hebbian rule proposed by Hebb in 1949 as the basic learning process of synaptic weights: taking $a(W) = 0$, the evolution of the synaptic matrix $W := (w_i^j)$ obeys the differential equation:
$$\frac{d}{dt}\, w_i^j(t) = -x_i(t)\, q^j(t)$$
It states that the velocity of the synaptic weight is the product of the presynaptic
activity and the postsynaptic activity. This intuition of a neurobiologist is
confirmed mathematically by the above result. Such a learning rule "pops up"
(or, more pedantically, emerges) whenever the synaptic matrices are involved in regulating the system in order to maintain the "homeostatic" constraint $W(t)x(t) + y(t) \in M$.
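A numerical toy illustrating this mechanism (our own sketch with made-up dimensions and intrinsic dynamics; the chapter's construction is more general): we integrate the corrected system with an explicit Euler scheme and pick, at each step, the multiplier $q$ that cancels the drift of the constraint, here with $y = 0$ and $M = \{m\}$.

```python
import numpy as np

n, p, dt = 4, 2, 0.01
rng = np.random.default_rng(0)
x = rng.normal(size=n)
W = rng.normal(size=(p, n))
m = W @ x                          # start on the constraint (we take y = 0)
c = lambda x: -0.5 * x             # intrinsic dynamics x' = c(x)

for _ in range(1000):
    # multiplier q chosen so that the drift of W x - m vanishes under the
    # corrected dynamics x' = c(x) - W^T q and W' = -q x^T (Hebbian rule):
    # W c(x) - W W^T q - (x . x) q = 0
    q = np.linalg.solve(W @ W.T + (x @ x) * np.eye(p), W @ c(x))
    x_new = x + dt * (c(x) - W.T @ q)
    W = W + dt * (-np.outer(q, x))  # entrywise: d/dt w_i^j = -x_i q^j
    x = x_new

print(np.linalg.norm(W @ x - m))   # remains close to zero
```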
$$\forall\, t \ge 0,\quad W(t)\,\chi(t)x(t) + y(t) \in M$$
These constraints are not necessarily viable under an arbitrary dynamical system of the form:
$$\begin{cases} \text{i)}\ \ x'(t) = c(x(t))\\ \text{ii)}\ \ y'(t) = d(y(t))\\ \text{iii)}\ \ \chi'(t) = \kappa(\chi(t))\\ \text{iv)}\ \ W'(t) = a(W(t)) \end{cases}$$
The correction term is the "cost of the linear constraint" $\langle q(t), W(t)x(t)\rangle$ in the law of evolution of $\chi(t)$.
We shall prove that, when these constraints are not viable under an arbitrary dynamical system, their viability can be reestablished by the control system:
$$\begin{cases} \text{iii)}\ \ A_0'(t) = a_0(A_0(t)) - q(t)\\ \text{iv)}\ \ A_1'(t) = a_1(A_1(t)) - x_1(t) \otimes q(t)\\ \text{v)}\ \ A_2'(t) = a_2(A_2(t)) - x_2(t) \otimes q(t)\\ \text{vi)}\ \ A_{\{1,2\}}'(t) = a_{\{1,2\}}(A_{\{1,2\}}(t)) - x_1(t) \otimes x_2(t) \otimes q(t) \end{cases}$$
Hence, the structure of this control system involves the transposes $A_i^*(t)q(t)$ and $A_{\{1,2\}}(t)(x_j(t))^* q(t)$ ($i = 1, 2$) in the evolution of the variables $x_i(t)$, the tensor products $x_i(t) \otimes q(t)$ (Hebbian rules) in the evolution of the linear operators $A_i(t)$, and the tensor product $x_1(t) \otimes x_2(t) \otimes q(t)$ in the evolution of the bilinear form $A_{\{1,2\}}$.
The tensor product $x_1 \otimes x_2 \otimes q$ is a bilinear operator from $X_1^* \times X_2^*$ to $Y$, associating with any pair $(p_1, p_2) \in X_1^* \times X_2^*$ the element:
$$(x_1 \otimes x_2 \otimes q)(p_1, p_2) := \langle p_1, x_1\rangle\, \langle p_2, x_2\rangle\, q$$
If the vector spaces are supplied with bases, the components of this bilinear form - the "tensors" - can be written as the products of the components of the three factors of this tensor product. Taking $a_{\{1,2\}}(A) = 0$, the evolution of the bi-synaptic tensor $A_{\{1,2\}} := (a_{i_1 i_2}^j)$ obeys the differential equation:
$$\frac{d}{dt}\, a_{i_1 i_2}^j(t) = -x_{1,i_1}(t)\, x_{2,i_2}(t)\, q^j(t)$$
It states that the velocity of the synaptic tensor is the product of the presynaptic activities of the neurons arriving at the synapse $(i_1, i_2, j)$ and the postsynaptic activity (see Aubin, Burnod 1998).
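In coordinates, this correction term is just an outer product of three vectors; a one-line numpy illustration with made-up dimensions:

```python
import numpy as np

x1, x2 = np.array([1.0, 2.0]), np.array([0.5, -1.0, 3.0])
q = np.array([0.1, 0.2])
# dA[i1, i2, j] = -x1[i1] * x2[i2] * q[j]
dA = -np.einsum('i,k,j->ikj', x1, x2, q)
assert dA.shape == (2, 3, 2)
```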
We may enrich this problem by introducing coefficients $\chi_i(t) \in [0, 1]$ aimed at tuning the actions $x_i(t)$ ($i = 1, 2$), coefficients that we shall later regard as the components of a fuzzy coalition. In this framework, the constraint becomes:
we shall prove that the above constraints are viable under the control system:
where:
In order to handle more explicit and tractable formulas and results, we shall assume that the connectionist operator $A : X := \prod_{i=1}^{n} X_i \to Y$ is multiaffine.
For defining such a multiaffine operator, we associate with any coalition $S \subset N$ its characteristic function $\chi_S : N \to \mathbb{R}$ defined by:
$$\chi_S(i) := \begin{cases} 1 & \text{if } i \in S \\ 0 & \text{if } i \notin S \end{cases}$$
and the operator $\chi_S \circ$ that associates with any $x = (x_1, \ldots, x_n) \in \prod_{i=1}^{n} X_i$ the sequence $\chi_S \circ x$ defined by:
$$\forall\, i = 1, \ldots, n,\quad (\chi_S \circ x)_i := \begin{cases} x_i & \text{if } i \in S \\ 0 & \text{if } i \notin S \end{cases}$$
since $\chi_S \circ$ is nothing other than the canonical projector from $\prod_{i=1}^{n} X_i$ onto $X^S$. In particular, $X^N := \prod_{i=1}^{n} X_i$ and $X^{\emptyset} := \{0\}$.
The multiaffine operator $A_S$ is then defined by:
$$\forall\, x \in \prod_{i=1}^{n} X_i,\quad A_S(x) = A_S(x_1, \ldots, x_n) := A_S(\chi_S \circ x)$$
and the tensor product of the actions and the multiplier by:
$$x_1 \otimes \cdots \otimes x_n \otimes q :\ p = (p_1, \ldots, p_n) \in \prod_{i=1}^{n} X_i^* \ \mapsto\ \Big(\prod_{i=1}^{n} \langle p_i, x_i \rangle\Big)\, q \in Y$$
Assume that we start with intrinsic dynamics of the actions $x_i$, the resources $y$, the connectionist matrices $W$ and the fuzzy coalitions $\chi$:
Theorem 1 Assume that the functions $c_i$, $\kappa_i$ and $a_S$ are continuous and that $M \subset Y$ is closed. Then the constraints
$$\forall\, t \ge 0,\quad \sum_{S \subset N} A_S(t)(x(t)) \in M$$
are viable under the control system:
$$\begin{cases} \text{i)}\ \ x_i'(t) = c_i(x(t)) - \displaystyle\sum_{S \ni i} A_S(t)^* q(t), & i = 1, \ldots, n\\[4pt] \text{ii)}\ \ A_S'(t) = a_S(A(t)) - \Big(\displaystyle\bigotimes_{j \in S} x_j(t)\Big) \otimes q(t), & S \subset N \end{cases}$$
Componentwise, the correction of the connectionist tensors reads:
$$\frac{d}{dt}\, A^j_{(k_i)_{i \in S}}(t) = a^j_S(A(t)) - \Big(\prod_{i \in S} x_i^{k_i}(t)\Big)\, q^j(t)$$
The correction term of the component $A^j_{(k_i)_{i \in S}}$ of the $S$-linear operator is the product of the components $x_i^{k_i}(t)$ of the actions $x_i$ in the coalition $S$ and of the component $q^j$ of the viability multiplier. This can be regarded as a multi-Hebbian rule in neural network learning algorithms since, for linear operators, we find the product of the component $x^k$ of the presynaptic action and the component $q^j$ of the postsynaptic action.
Indeed, when the vector spaces $X_i := \mathbb{R}^{n_i}$ are supplied with bases $e_i^k$, $k = 1, \ldots, n_i$, whose dual bases we denote by $e^i_k$, and when $Y := \mathbb{R}^p$ is supplied with a basis $f_j$, its dual being supplied with the dual basis $f^j$, then the tensor products $\big(\bigotimes_{i \in S} e_i^{k_i}\big) \otimes f^j$ ($j = 1, \ldots, p$; $k_i = 1, \ldots, n_i$) form a basis of $L_S(X^{S*}, Y^*)$. Hence the components of the tensor product $\big(\bigotimes_{i \in S} x_i\big) \otimes q$ in this basis are the products $\big(\prod_{i \in S} x_i^{k_i}\big)\, q_j$.
This first definition of a coalition that comes to mind - that of a subset of players $S \subset N$ - is not adequate for tackling dynamical models of the evolution of coalitions, since the $2^n$ coalitions range over a finite set, preventing us from using analytical techniques.
One way to overcome this difficulty is to embed the family of subsets of a (discrete) set $N$ of $n$ players into the space $\mathbb{R}^n$ through the map $\chi$ associating with any coalition $S \in P(N)$ its characteristic function⁷ $\chi_S \in \{0,1\}^n \subset \mathbb{R}^n$, since $\mathbb{R}^n$ can be regarded as the set of functions from $N$ to $\mathbb{R}$.
By definition, the family of fuzzy sets⁸ is the convex hull $[0,1]^n$ of the power set $\{0,1\}^n$ in $\mathbb{R}^n$. Therefore, we can write any fuzzy coalition in the form $\chi \in [0,1]^n$, with memberships:
$$\gamma_S(\chi) := \prod_{j \in S} \chi_j$$
so that the connectionist operator becomes $\sum_{S \in P(N)} \big(\prod_{j \in S} \chi_j\big)\, A_S(x)$.
We wish to encapsulate the idea that at each instant, only a number of fuzzy coalitions $\chi$ are active. Hence the collective constraint linking multiaffine operators, fuzzy coalitions and actions can be written in the form:
$$\sum_{S \in P(N)} \Big(\prod_{j \in S} \chi_j(t)\Big)\, A_S(t)(x(t)) \in M$$
Assume that we start with intrinsic dynamics of the actions $x_i$, the resources $y$, the connectionist matrices $W$ and the fuzzy coalitions $\chi$.
Theorem 2 Assume that the functions $c_i$, $\kappa_i$ and $a_S$ are continuous and that $M \subset Y$ is closed. Then the constraints
$$\forall\, t \ge 0,\quad \sum_{S \in P(N)} \Big(\prod_{j \in S} \chi_j(t)\Big)\, A_S(t)(x(t)) \in M$$
are viable under a control system correcting the velocities of the actions $x_i$ ($i = 1, \ldots, n$), of the memberships $\chi_i$ ($i = 1, \ldots, n$) and of the operators $A_S$ ($S \subset N$).
Let us comment on these formulas. First, the viability multipliers $q(t) \in Y^*$ can be regarded as regulons, i.e., regulation controls or parameters, or virtual prices in the language of economists. They are chosen adequately at each instant in order that the viability constraints describing the network be satisfied at each instant, and the above theorem guarantees that this is possible. The next section tells us how to choose such regulons at each instant (the regulation law).
For each agent $i$, the velocities $x_i'(t)$ of the state and the velocities $\chi_i'(t)$ of its membership in the fuzzy coalition $\chi(t)$ are corrected by subtracting:
1. the sum, over all coalitions $S$ to which the agent belongs, of the $A_S(t)(x_i(t))^* q(t)$, weighted by the membership $\gamma_S(\chi(t))$;
2. the sum, over all coalitions $S$ to which the agent belongs, of the costs $\langle q(t), A_S(t)(x(t))\rangle$ of the constraints associated with the connectionist tensor $A_S$ of the coalition $S$, weighted by the membership $\gamma_{S \setminus \{i\}}(\chi(t))$.
As for the correction of the velocities of the connectionist tensors $A_S$, their correction is a weighted "multi-Hebbian" rule: for each component $A^j_{(k_i)_{i \in S}}$ of $A_S$, the correction term is the product of the membership $\gamma_S(\chi(t))$, of the components $x_i^{k_i}(t)$ and of the component $q^j(t)$ of the regulon.
NOTES
1. The theory of tychastic control (or "robust control") can be studied in the framework of dynamical games, where one player plays the role of Nature, which chooses - plays - perturbations. These perturbations, disturbances, parameters that are not under the control of the controller or the decision-maker could be called "random variables" if this vocabulary were not already confiscated by probabilists. We suggest borrowing from Charles Peirce the concept of tyche, one of the three words of classical Greek meaning "chance", and calling in this case the control system a tychastic system. See Aubin, Pujal, Saint-Pierre (2001) for more details.
2. Non-mathematical accounts of such questions can be found in Aubin (to be edited).
3. Physicists and computer scientists have attempted to measure it in various ways: through the concepts of Clausius's entropy, Claude Shannon's information, Gilbert Chauvet's nonsymmetric information, the degree of regularity instead of randomness, "hierarchical complexity" in the display of levels of interactions, Andrei Kolmogorov, Gregory Chaitin and Ray Solomonoff's "algorithmic information content" (see Chaitin (1992) for instance), and other temporal or spatial computational complexity indices measuring the computer time or the amount of computer memory needed to describe a system, "grammatical complexity" measuring the language needed to describe it, etc. Some economists link complexity issues with chaos theory, as in Day (1994; to be edited) for instance. Other investigators link complexity issues with catastrophe theory, or fractals. See among many references (Peliti, Vulpiani 1987). Physicists - and among them specialists of "spin glasses" such as Giorgio Parisi (1990; 1992; 1996) - propose the number of equilibria of a dynamical system as a characteristic of complexity; or, even more to the point, "quasi equilibria", i.e. "small" areas of the state space in which the evolution remains a "long time" before "jumping quickly" to another quasi equilibrium. The concept of static and dynamical "connectionist complexity" indices, when connectionist matrices are used as regulons to regulate viable solutions, together with the search for evolutions minimizing those indices at each instant, was introduced in Aubin (1998b).
4. For simplicity, the set M is assumed to be constant. But it could also evolve through mutational equations, and the following results can be adapted to this case. Curiously, the overall architecture is not changed when the set of available resources evolves under a mutational equation. See Aubin (1999) for more details on mutational equations.
5. These are nothing other than matrices when the operators are linear instead of multilinear. Tensors are the matrices of multilinear operators, so to speak, and their "entries" depend upon several indexes instead of the two involved in matrices.
6. $L_n\big(\prod_{i=1}^{n} X_i^*,\, Y^*\big)$.
7. This canonical embedding is better adapted to the nature of the power set $P(N)$ than the universal embedding of a discrete set $M$ of $m$ elements into $\mathbb{R}^m$ by the Dirac measure associating with any $j \in M$ the $j$th element of the canonical basis of $\mathbb{R}^m$. The convex hull of the image of $M$ by this embedding is the probability simplex of $\mathbb{R}^m$. Hence fuzzy sets offer a "dedicated convexification" procedure for the discrete power set $M := P(N)$, instead of the universal convexification procedure of frequencies, probabilities and mixed strategies derived from its embedding in $\mathbb{R}^m = \mathbb{R}^{2^n}$.
8. This concept of fuzzy set was introduced in 1965 by L. A. Zadeh. Since then, it has been wildly successful, even in many areas outside mathematics! Lately, we found in "La lutte finale", Michel Lafon (1994), p. 69, by A. Bercoff, the following quotation of the late François Mitterrand, president of the French Republic (1981-1995): "Aujourd'hui, nous nageons dans la poésie pure des sous-ensembles flous"... (Today, we swim in the pure poetry of fuzzy subsets)!
9. Actually, this idea of using fuzzy coalitions has already been used in the framework of cooperative games with and without side-payments (Aubin 1981a; 1981b; 1979 Chapter 12; 1998a Chapter 13; Mares 2001; Mishizaki, Sokawa 2001; Basile 1993; 1994; to be edited; Basile, De Simone, Graziano 1996; Florenzano 1990). Fuzzy coalitions have also been used in dynamical models of cooperative games in (Aubin, Cellina 1984, Chapter 4) and of economic theory in Aubin (1997, Chapter 5).
REFERENCES
Aubin J.-P. (1979), "Mathematical Methods of Game and Economic Theory", Studies in
Mathematics and its applications, 7, North- Holland.
Aubin J.-P. (1981a), "Cooperative fuzzy games", Mathematical Operational Research, 6, l-13 .
Aubin J.-P. (1981 b), "Locally lipchitz cooperative games", J. Math. Economics, 8, 241-262.
Aubin J.-P. (1982), "An alternative mathematical description of a player in game theory", IIASA
WP, 82-122.
Aubin J.-P. (1991), Viability Theory, Birkhäuser, Boston, Basel, Berlin.
Aubin J.-P. (1993), "Beyond Neural Networks: Cognitive Systems", in Demongeot J., Capasso
(eds.) Mathematics Applied to Biology and Medicine, Wuerz, Winnipeg.
Aubin J.-P. (1995), "Learning as adaptive control of synaptic matrices", in Arbib M. (ed.) The
handbook of brain theory and neural networks, Bradford Books and MIT Press.
Aubin J.-P. (1996), Neural Networks and Qualitative Physics: A Viability Approach, Cambridge
University Press.
Aubin J.-P. (1997) Dynamic Economic Theory: a Viability Approach, Springer-Verlag.
Aubin J.-P. (1998 a), Optima and equilibria (2nd edition), Springer-Verlag.
Aubin J.-P. (1998 b), "Connectionist complexity and its evolution", in Équations aux dérivées partielles, Articles dédiés à J.-L. Lions, Elsevier, 50-79.
Aubin J.-P. (1998 c), "Minimal complexity and maximal decentralization", in Beckmann H.J.,
Johansson B., Snickars F, Thord D. (eds.) Knowledge and Information in a Dynamic Economy,
Springer, 83-104.
Aubin J.-P. (1999), Mutational and morphological analysis: tools for shape regulation and
morphogenesis, Birkhäuser.
Aubin J.-P. (2002), "Dynamic Core of Fuzzy Dynamical Cooperative Games, Annals of Dynamic
Games", Ninth International Symposium on Dynamical Games and Applications, Adelaide,
2000.
Aubin J.-P. (to be edited), La mort du devin, l'émergence du démiurge. Essai sur la contingence, l'inertie et la viabilité des systèmes.
Aubin J.-P., Burnod Y. (1998), "Hebbian Learning in Neural Networks with Gates", Cahiers du
Centre de Recherche Viabilité, Jeux, Contrôle 981.
Aubin J.-P., Cellina A. (1984), Differential Inclusions, Springer-Verlag.
Aubin J.-P., Dordan O. (1996), "Fuzzy Systems, Viability Theory and Toll Sets", in Hung Nguyen (ed.) Handbook of Fuzzy Systems, Modeling and Control, Kluwer, 461-488.
Aubin J.-P., Foray D. (1998), "The emergence of network organizations in processes of
technological choice: a viability approach", in Cohendet P., Llerena P., Stahn H., Umbhauer G.
(eds.), The economics of networks, Springer, 283-290.
Aubin J.-P., Frankowska H. (1990) Set-Valued Analysis.
Aubin J.-P., Louis-Guerin C., Zavalloni M. (1979), "Compatibilité entre conduites sociales réelles dans les groupes et les représentations symboliques de ces groupes : un essai de formalisation mathématique", Math. Sci. Hum., 68, 27-61.
Aubin J.-P., Pujal D., Saint-Pierre P. (2001), Dynamic Management of Portfolios with Transaction
Costs under Tychastic Uncertainty, preprint.
Basile A., De Simone A., Graziano M.G. (1996), "On the Aubin-like characterization of
competitive equilibria in infinite-dimensional economies", Rivista di Matematica per le Scienze
Economiche e Sociali, 19, 187-213.
Basile A. (1993), "Finitely additive nonatomic coalition production economies: Core-Walras
equivalence", Int. Econ. Rew., 34,993-995.
Basile A. (1994), "Finitely additive correspondences", Proceedings AMS, 121, 883-891.
Basile A. (to be edited), "On the range of certain additive correspondences", Università di Napoli.
Bonneuil N. (2000), "Viability in dynamic social networks", Journal of Mathematical Sociology,
24, 175-182.
Bonneuil N. (1998), "Games, equilibria, and population regulation under viability constraints: An interpretation of the work of the anthropologist Fredrik Barth", Population: An English Selection, special issue of New Methodological Approaches in the Biological Sciences, 151-179.
Bonneuil N. (1998), "Population paths implied by the mean number of pairwise nucleotide
differences among mitochondrial sequences", Annals of Human Genetics, 62, 61-73.
Bonneuil N., Rosental P.-A. (2002), "Changing social mobility in 19th century France", Historical
Methods, Spring, 32, 53-73.
Bonneuil N., Saint-Pierre P. (2000), "Protected polymorphism in the two-locus haploid model with unpredictable fitnesses", Journal of Mathematical Biology, 40, 251-377.
Bonneuil N., Saint-Pierre P. (1998), "Domaine de victoire et stratégies viables dans le cas d'une correspondance non convexe : application à l'anthropologie des pêcheurs selon Fredrik Barth", Mathématiques et Sciences Humaines, 132, 43-66.
Chaitin G.J. (1992), Algorithmic information theory, Cambridge University Press.
Chauvet G. (1995), La vie dans la matière, Flammarion.
Day R.H. (1994) Complex Economic Dynamics, Vol. I, An introduction to dynamical systems and
market mechanisms, MIT Press.
Day R.H. (to be edited) Complex Economic Dynamics, Vol. II, An introduction to macroeconomic
dynamics, MIT Press.
Deghdak M., Florenzano M. (1999), "Decentralizing Edgeworth equilibria in economies with many
commodities", Economic Theory, 14, 287-310.
Elton C. (1958), The ecology of invasions by animals and plants, Cambridge University Press.
Filar J.A., Petrosjan L.A. (2000), "Dynamic cooperative games", International Game Theory Review,
2, 47-65.
Florenzano M. (1990), "Edgeworth equilibria, fuzzy core and equilibria of a production economy
without ordered preferences", Journal of Math. Anal. Appl., 153, 18-36.
Florenzano M. , Marakulin V.M. (2001), "Production equilibria in vector lattices", Economic
Theory, 20.
Henry C. (1972), "Differential equations with discontinuous right-hand side", Journal of Economic
Theory, 4, 545-551.
Hutchinson G.E. (1959), "Homage to Santa Rosalia, or why are there so many kinds of animals?",
American Naturalist, 93, 145-159.
Ioannides Y.M. (1997), "Evolution of trading structures", in Arthur, Durlauf, Lane (eds.) The
Economy as an evolving complex system, Addison-Wesley.
Lamarck J.-B. (1809), Philosophie zoologique.
Livi R., Ruffo S., Ciliberto S., Buatti M. (eds) (1988), Chaos and complexity, World Scientific.
Mares M. (2001), Fuzzy cooperative games. Cooperation with vague expectations, Physica Verlag.
May R.M. (1973), Stability and complexity in model ecosystems, Princeton University Press.
Mayr E. (1988), Toward a new philosophy of biology, Harvard University Press.
Mishizaki I., Sokawa M. (2001), Fuzzy and multiobjective games for conflict resolution, Physica
Verlag.
Parisi G. (1990), "Emergence of a tree structure in complex systems", in Solbrig O.T., Nicolis C.
(eds.) Perspectives on biological complexity, IUBS monograph series, 6.
Parisi G. (1992), Order, disorder and simulations, World Scientific.
Parisi G. (1996), "Sulla complessità", in Fra ordine e caos, Tumo M., Liotta E., Oruscci F. (eds),
Cosmopoli.
Peliti L., Vulpiani A. (eds) (1987) Measures of complexity, Springer-Verlag.
Petrosjan L.A. (2001), "Dynamic Cooperative Games", Annals of Dynamic Games.
Petrosjan L.A., Zenkevitch N.A. (1996), Game Theory, World Scientific.
Rockafellar R.T., Wets R. (1997), Variational Analysis, Springer-Verlag.
Saari D.G. (1995), Mathematical complexity of simple economics, Notices of AMS.
Saint-Pierre P. (2001), "Approximation of capture basins for hybrid systems", Proceedings of the
ECC 2001 Conference.
Shimokawa T., Pakdaman K., Takahata T., Tanabe S., Sato S. (to be edited), "A first-passage-time
analysis of the periodically forced noisy leaky integrate-and-fire model", Biological cybernetics.
Shimokawa T., Pakdaman K., Sato S. (1999), "Coherence resonance in a noisy leaky integrate-and-fire model".
Shimokawa T., Pakdaman K., Sato S. (1999), "Time-scale matching in the response of a leaky
integrate-and-fire neuron model to periodic stimulation with additive noise", Physical Review E,
59, 3427-3443.
Smith J.M. (1974), Models in ecology, Cambridge University Press.
Weaver W. (1948), "Science and complexity", American Scientist, 36, 536.
Wigner E. (1960), "The unreasonable effectiveness of mathematics m the natural sciences",
Communications in Pure and Applied Mathematics, 13, 1.
Chapter 2
POSSIBILISTIC CASE-BASED DECISIONS
Abstract: The idea of case-based decision making has recently been proposed as an alternative
to expected utility theory. It combines concepts and principles from both decision
theory and case-based reasoning. Loosely speaking, a case-based decision maker
learns by storing already experienced decision problems, along with a rating of the
results. Whenever a new problem needs to be solved, possible actions are assessed
on the basis of experience from similar situations in which these actions have
already been applied. We formalize case-based decision making within the
framework of fuzzy sets and possibility theory. The basic idea underlying this
approach is to give preference to acts which have always led to good results for
problems which are similar to the current one. We also propose two extensions of
the basic model. Firstly, we deal separately with situations where an agent has made
very few, if any, observations. Obviously, such situations are difficult to handle for
a case-based approach. Secondly, we propose a reasonable relaxation of the original
decision principle, namely to look for acts which have yielded good results, not
necessarily for all, but at least for most cases in the past.
Key words: Decision theory, Case-based reasoning, Possibility theory, Fuzzy sets.
INTRODUCTION
Early work in artificial intelligence (AI) has mainly focused on formal logic
as a basis for knowledge representation and has largely rejected approaches from
(statistical) decision theory as being intractable and inadequate for expressing the
rich structure of (human) knowledge (Horvitz, Breese, Henrion 1988). However,
the recent development of more tractable and expressive decision-theoretic
This section gives a brief review of the model introduced by Gilboa and Schmeidler (1995), referred to as case-based decision theory (CBDT) by the authors. In a nutshell, the setup they proceed from can be characterized as follows: Let $Q$ and $A$ be (finite) sets of problems and acts, respectively, and denote by $R$ a set of results or outcomes. Choosing act $a \in A$ for solving problem $p \in Q$ leads to the outcome $r = r(p,a) \in R$. A utility function $u: R \to U$, resp. $u: Q \times A \to U$, assigns utility values to such outcomes; the utility scale $U$ is taken as the set of real numbers. Let
$$M \subset Q \times A \times R \qquad (Eq.1)$$
denote the agent's memory of observed cases. Gilboa and Schmeidler evaluate an act $a_0$ for a new problem $p_0$ by the linear, similarity-weighted criterion:
$$V_{p_0,M}(a_0) := \sum_{(p,a_0,r) \in M} \sigma_Q(p,p_0)\, u(r) \qquad (Eq.2)$$
The summation over an empty set yields the "default value" 0 which plays the
role of an "aspiration level." Despite the formal resemblance between (Eq.2) and
the well-known expected utility formula one should not ignore some substantial
differences between CBDT and expected utility theory (EUT). This concerns
not only the conceptual level but also mathematical aspects. Particularly, it
should be noted that the similarity weights in (Eq.2) do not necessarily sum up to 1. Consequently, (Eq.2) must not be interpreted as an estimation of the utility $u(r(p_0,a_0))$. As an alternative to the linear functional (Eq.2), an "averaged similarity" version has been proposed. It results from replacing $\sigma_Q$ in (Eq.2) by the similarity measure:
$$\sigma'_Q(p,p_0) := \frac{\sigma_Q(p,p_0)}{\sum_{(p',a_0,r') \in M} \sigma_Q(p',p_0)} \qquad (Eq.3)$$
whenever the latter is well-defined. (Note that this measure is defined separately
for each act a0 .) Theoretical details of CBDT including an axiomatic
characterization of decision principle (Eq.2) are presented in (Gilboa, Schmeidler
1995).
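A minimal Python sketch of this criterion; the memory layout, the similarity function `sim` and the utility `u` are assumptions for illustration:

```python
def cbdt_value(memory, a0, p0, sim, u):
    """(Eq.2): similarity-weighted sum of utilities over the cases in which
    act a0 was used; an empty sum yields the aspiration level 0."""
    return sum(sim(p, p0) * u(r) for (p, a, r) in memory if a == a0)

def choose_act(memory, acts, p0, sim, u):
    # note: the weights need not sum to 1, so this is not an expected utility
    return max(acts, key=lambda a0: cbdt_value(memory, a0, p0, sim, u))
```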
The basic model has been generalized concerning several aspects. The
problem of optimizing decision behavior by adjusting the aspiration level in the
context of repeated problem solving is considered in (Gilboa, Schmeidler 1996).
In (Gilboa, Schmeidler 1997), the similarity measure in (Eq.2) is extended to
problem-act tuples: Given two similar problems, it is assumed that similar
outcomes are obtained for similar acts (not only for the same act). Indeed, it is
argued convincingly that a model of the form:
$$V_{p_0,M}(a) = \min_{(p,a,r) \in M} \big(\sigma_Q(p,p_0) \rightsquigarrow u(r)\big) \qquad (Eq.5)$$
where $\rightsquigarrow$ denotes a multiple-valued implication, is more appropriate.
This valuation supports the idea of finding an act a which has always resulted in
good outcomes for problems similar to the current problem Po . Indeed, (Eq.5)
can be considered as a (generalized) truth degree of the claim that "whenever a
has been applied to a problem p similar to p 0 , the corresponding outcome has
yielded a high utility." An essential idea behind (Eq.5) is that of avoiding the accumulation and compensation effects caused by the decision criterion (Eq.2), since these effects do not always seem appropriate².
As a special realization of (Eq.5) the valuation:
$$V_{p_0,M}(a) := \min_{(p,a,r) \in M} \max\{1 - \sigma_Q(p,p_0),\, u(r)\} \qquad (Eq.6)$$
has been proposed. Based on the extended similarity measure $\sigma_{Q \times A}$ over problem-act tuples, one obtains the pessimistic and optimistic valuations:
$$V^{\downarrow}_{p_0,M}(a_0) := \min_{(p,a,r) \in M} \max\{1 - \sigma_{Q \times A}((p,a),(p_0,a_0)),\, u(r)\} \qquad (Eq.8)$$
$$V^{\uparrow}_{p_0,M}(a_0) := \max_{(p,a,r) \in M} \min\{\sigma_{Q \times A}((p,a),(p_0,a_0)),\, u(r)\} \qquad (Eq.9)$$
In order to make the basic principles underlying the above criteria especially obvious, suppose the qualitative utility scale to be given by $U = \{0, 1\}$. That is, only a crude distinction between "bad" and "good" outcomes is made. (Eq.8) and (Eq.9) can then be simplified as follows:
$$V^{\downarrow}_{p_0,M}(a_0) = 1 - \max_{(p,a,r) \in M:\, u(r)=0} \sigma_{Q \times A}((p,a),(p_0,a_0)) \qquad (Eq.10)$$
$$V^{\uparrow}_{p_0,M}(a_0) = \max_{(p,a,r) \in M:\, u(r)=1} \sigma_{Q \times A}((p,a),(p_0,a_0)) \qquad (Eq.11)$$
According to (Eq.10), the decision maker only takes cases $(p,a,r)$ with bad outcomes into account. An act $a_0$ is discounted whenever $(p_0, a_0)$ is similar to a corresponding problem-act tuple $(p, a)$. Thus, the agent is cautious and looks for an act that it does not associate with a bad experience. According to (Eq.11), it only considers the cases with good outcomes. An act $a_0$ appears promising if $(p_0,a_0)$ is similar to a tuple $(p,a)$ which has yielded a good result. In other words, the decision maker is more adventurous and looks for an act that it associates with a good experience.
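The two binary-utility criteria are easy to state in code; a sketch under the same assumptions as before (the memory stores (problem, act, result) triples):

```python
def v_down(memory, a0, p0, sim, u):     # (Eq.10): the cautious valuation
    bad = [sim((p, a), (p0, a0)) for (p, a, r) in memory if u(r) == 0]
    return 1 - max(bad, default=0)

def v_up(memory, a0, p0, sim, u):       # (Eq.11): the adventurous valuation
    good = [sim((p, a), (p0, a0)) for (p, a, r) in memory if u(r) == 1]
    return max(good, default=0)
```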
Alternative formalizations of CBDM have also been proposed in (Dubois, Godo, Prade, Zapico 1998; Hüllermeier 1998; Hüllermeier 1999). For estimating the utility of an act in connection with a new problem, these methods make use of observed cases by more indirect means than the approaches discussed so far. More precisely, a memory $M$ of cases is used for deriving a quantification of how likely a certain act will yield a certain outcome (utility). The (case-based reasoning) hypothesis underlying these approaches is the assumption that "the more similar two problems are, the more likely it is that an act leads to similar results." Within the framework of (Hüllermeier 1998), where likely means probable, a probability distribution on the set $R$ of outcomes is derived from a memory $M$. Likewise, possibility distributions are obtained in connection with the possibilistic frameworks in (Dubois, Godo, Prade, Zapico 1998) and (Hüllermeier 1999), where likely means possible and certain, respectively.
where:
$$h_M(a,p_0) := \max_{(p,a,r) \in M} \sigma_Q(p,p_0)$$
and $\sigma'(\cdot, p_0)$ denotes a renormalization of $\sigma_Q(\cdot, p_0)$ such as, e.g., $\sigma_Q(\cdot,p_0)/h_M(a,p_0)$ (assuming $h_M(a,p_0) > 0$). The idea behind (Eq.12) is that the willingness of a decision maker to choose act $a$ is upper bounded by the existence of problems which are completely similar to $p_0$, and to which $a$ has been applied. Moreover, $\sigma_Q(\cdot, p_0)$ is renormalized in order to obtain a meaningful degree of inclusion. Thus, (Eq.12) corresponds to the compound condition that "there are problems similar to $p_0$ to which act $a$ has been applied, and the problems which are most similar to $p_0$ are among the problems for which $a$ has led to good results." Observe that (Eq.6) is retrieved from (Eq.12) as soon as $h_M(a,p_0) = 1$.
We shall now propose a generalization of (Eq.6) which can handle the two above-mentioned sources of uncertainty in a unified way, and which is also able to express uncertainty in connection with the valuation of an act. To this end, it should first be noticed that we can write (Eq.6) as:
$$V_{p_0,M}(a) = \min_{0 \le k \le m} \max\{1 - \sigma_k,\, v_k\} \qquad (Eq.13)$$
where the values $0 = \sigma_0 < \sigma_1 < \cdots < \sigma_m = 1$ constitute the (finite) set $\{\sigma_Q(p,p') \mid p,p' \in Q\}$ of possible similarity degrees of problems, and $v_k$ is the lowest utility obtained in connection with act $a$ for problems which are $\sigma_k$-similar to $p_0$. Moreover, $v_k = 1$ (by definition) if no such case has been observed, which just leads to the problem that (Eq.6) becomes large if only few observations have been made.
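A sketch of this level-wise computation; `sigmas` is assumed to enumerate the finite set of similarity degrees:

```python
def eq13_value(memory, a, p0, sim, u, sigmas):
    """memory: (problem, act, result) triples; sigmas: all similarity levels."""
    terms = []
    for s in sigmas:                               # 0 = sigma_0 < ... < sigma_m = 1
        # v_k: worst utility among cases for act a at similarity level sigma_k,
        # 1 by convention when no such case has been observed
        vk = min((u(r) for (p, act, r) in memory if act == a and sim(p, p0) == s),
                 default=1)
        terms.append(max(1 - s, vk))
    return min(terms)
```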
Denote by $w_k$ the lower bound
$$w_k := \inf\{u(r(p,a)) \mid p \in Q,\ \sigma_Q(p,p_0) = \sigma_k\},$$
i.e., the smallest degree of utility which can be obtained in connection with act $a$ for (not necessarily encountered) problems from $Q$ which are $\sigma_k$-similar to $p_0$. Representing the knowledge about each unknown bound $w_k$ by a possibility distribution $W_k$, the valuation $\tilde V_{p_0,M}(a)$ can then be interpreted as an approximation of:
$$\tilde V_{p_0,M}(a)(v) = \max\Big\{\min_{0 \le k \le m} W_k(v_k)\ \Big|\ v_0, \ldots, v_m \in U,\ \min_{0 \le k \le m} \max\{1-\sigma_k,\, v_k\} = v\Big\} \qquad (Eq.14)$$
As mentioned above, the original approach (Eq.6) does then (implicitly) estimate the lower bound $w_k$ by $v_k = 1$, whereas (Eq.13) is able, e.g., to express complete ignorance via $W_k \equiv 1$. Particularly, letting $W_m \equiv 1$ in the case where $V_m = \emptyset$ implies that $\tilde V_{p_0,M}(a)(0) = 1$, i.e., the fact that act $a$ should be assigned the valuation 0 appears completely possible.
The above modeling of ignorance concerning the lower bound $w_k$ might be generalized to the case where $V_k \ne \emptyset$ by means of $W_k(v) = 1$ if $v \le v_k$, and $W_k(v) = 0$ otherwise. It seems reasonable, however, to think of more general definitions of $W_k$. Seen from the viewpoint of CBR, for instance, the memory $M$ may provide evidence concerning $w_k$ even if $V_k = \emptyset$. In this connection it seems particularly interesting to combine (Eq.5) with the possibilistic methods mentioned at the end of section 2, which leads to a more "hypothetical" specification of the $W_k$. Suppose, for example, the CBR principle, suggesting that acts lead to similar outcomes for similar problems, to be strongly supported by the observations which have been made so far. Moreover, suppose that a certain act $a$ has often led to good results for problems which are very (but not perfectly) similar to the problem $p_0$ under consideration. It seems, then, likely that $a$ also leads to good outcomes when being applied to problems which are completely similar to $p_0$. Thus, $W_m(v)$ should be small for small utility values $v$, even though a case $(p,a,r)$ such that $\sigma_Q(p,p_0) = \sigma_m = 1$ has not yet been encountered.
Observe that (Eq.14) also allows for the utilization of background knowledge which is not derived from the memory $M$. It might be known from a further information source, for instance, that $w_k$ does definitely not fall below a certain bound $v'_k$, or at least that $w_k < v'_k$ is unlikely, which leads to $W_k(v) = 0$ resp. $W_k(v) \ll 1$ for $v < v'_k$.
Based on (Eq.14) in conjunction with a generalized CBDM framework we can also approach the first type of uncertainty, which was mentioned at the beginning of this section. To this end, we extend the set of outcomes to $\mathcal{F}(R)$, i.e., the set of all (normal) fuzzy subsets of the set $R$ of results. A representation $W_k$ of knowledge about the lower bound $w_k$ is then derived from "fuzzy" cases of the form $(p, a, R) \in Q \times A \times \mathcal{F}(R)$. A value $R(r)$ is interpreted as the possibility $\pi(X = r)$ that the (unknown, not precisely observed) outcome $X$ is given by $r \in R$. The derivation of $W_k$ from "fuzzy" cases can be realized by applying the extension principle to a derivation of $W_k$ from "crisp" cases.
It has already been hinted at in the introduction that Gilboa and Schmeidler's
approach to case-based decision making is partly motivated by the idea of
avoiding any kind of "hypothetical" reasoning. As pointed out in (Gilboa,
Schmeidler 1995), such reasoning might become necessary in connection with
E UT since the decision maker has to know, e.g., all outcomes associated with
act/state pairs. It should, therefore, be mentioned that the hypothetical reasoning
in connection with (Eq.l4) is by far less demanding. Particularly, it does not
Possibilistic case-based decisions 43
require any knowledge which is not available. On the contrary, nothing prevents
us from using Wk in order to express complete ignorance. If available, however,
general background knowledge (including hypothetical knowledge "derived"
from the CBR assumption) should be utilized, and (Eq.14) presents the
opportunity for doing this. Comparing acts in the context of the generalized
model (Eq.14) turns out as the problem of comparing fuzzy sets resp. possibility
distributions. Needless to say, such a comparison is less straightforward than the
comparison of scalar values. In fact, there are different possibilities to approach
this problem (Dubois, Prade 1999) which are, however, not further discussed
here.
∀ 1 ≤ k ≤ m−1: μ(k) ≤ μ(k+1) and μ(m) = 1.    (Eq.15)

The special case "for all" then corresponds to μ(k) = 0 for 0 ≤ k ≤ m−1 and μ(m) = 1. Given some μ satisfying (Eq.15), we define an associated membership function η by η(0) = 0 and η(k) = 1 − μ(k−1) for 1 ≤ k ≤ m (see e.g. (Dubois et al. 1997b)). A membership degree η(k) can then be interpreted as quantifying the importance that the property X is satisfied for k (out of the m) elements.
V_{p_0,M}(a) = min_{0≤k≤|M_a|} max{ 1 − μ(k) , g_a(k) },    (Eq.16)

where g_a(k), 1 ≤ k ≤ |M_a|, defines the degree to which "the act a has induced good outcomes for similar problems k times". The extent to which a (small) degree g_a(k) decreases the overall valuation of a is upper bounded by 1 − μ(k), i.e. by the respective level of (un-)importance. Observe that we do not have to consider all subsets M′ ⊂ M_a of size k for deriving g_a(k). In fact, for computing g_a(k) it is reasonable to arrange the m = |M_a| values v = max{ 1 − σ_Q(p, p_0), u(r) } in a non-increasing order v_1 ≥ v_2 ≥ … ≥ v_m. Then, (Eq.16) is equivalent to:

V_{p_0,M}(a) = min_{0≤k≤m} max{ 1 − μ(k) , v_k },

where v_0 = 1.
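As an illustration of this sorted form (which relies on our reconstruction of the equation above), the following sketch evaluates the quantifier-guided criterion; the input values and quantifier are hypothetical.

```python
# Sketch (ours) of the sorted form of (Eq.16): arrange the values
# v = max(1 - sigma_Q(p, p0), u(r)) non-increasingly and compute
# min_k max(1 - mu(k), v_k) with v_0 = 1. `mu` is the fuzzy quantifier,
# given as a list mu[0..m] that is non-decreasing with mu[m] = 1.

def quantified_valuation(values, mu):
    v = [1.0] + sorted(values, reverse=True)   # v_0 = 1, then v_1 >= ... >= v_m
    return min(max(1.0 - mu[k], v[k]) for k in range(len(v)))

# The "for all" quantifier mu = (0, ..., 0, 1) recovers the plain minimum:
print(quantified_valuation([0.9, 0.7, 0.4], mu=[0.0, 0.0, 0.0, 1.0]))  # 0.4
```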
The generalized criterion (Eq.16) can be useful, e.g., in connection with the idea of repeated decision making, which arises quite naturally in a case-based approach to decision making. We might think of different scenarios in which repeated problem solving becomes relevant. A simple model emerges from the assumption that problems are chosen repeatedly from Q according to some selection process which is not under the control of the agent, such as the repeated (and independent) selection of problems according to some probability measure. More generally, the problem faced next by the agent might depend on the current problem and the act which is chosen for solving it. A Markov Decision Process extended by a similarity measure over states (which correspond to problems) may serve as an example. Besides, we might consider case-based decision making as a reasonable strategy within a (repeated) game playing framework, such as the iterated prisoner's dilemma (Axelrod 1984). See (Hüllermeier, Dubois, Prade 1999) for an experimental study in which (Eq.16) is applied in repeated decision making.
CONCLUSION
NOTES
1. Needless to say, a validation or comparison of decision-theoretic models is generally difficult, no matter whether from a descriptive or a normative point of view.
2. Note that the accumulation effect is also the main motivation for the normalization (Eq.3).
3. Other possibilities of expressing a fuzzy quantifier exist as well, including the use of order-statistics (Prade, Yager 1994) and an ordered weighted minimum or maximum (Dubois, Le Berre, Prade, Sabbadin 1999).
REFERENCES
Axelrod R. (1984), The Evolution of Cooperation, Basic Books, Inc., New York.
Boutilier C. (1994), "Toward a logic for qualitative decision theory", in Doyle J., Sandewall E., Torasso P. (eds.), Proceedings KR-94, 4th International Conference on Principles of Knowledge Representation and Reasoning, Bonn, Germany, 75-86.
Brafman R., Tennenholtz M. (1996), "On the foundation of qualitative decision theory", in Proceedings AAAI-96, 13th National Conference on Artificial Intelligence, AAAI Press, 1291-1296.
De Finetti B. (1937), "La prévision: ses lois logiques, ses sources subjectives", Annales de l'Institut Henri Poincaré, 7, 1-68.
Doyle J., Dean T., "Strategic directions in artificial intelligence", AI Magazine, 18(1), 87-101.
Dubois D., Le Berre D., Prade H., Sabbadin R. (1999), "Using possibilistic logic for modelling qualitative decision: ATMS-based algorithms", Fundamenta Informaticae, 37, 1-30.
Dubois D., Esteva F., Garcia P., Godo L., de Mantaras R.L., Prade H. (1997), "Fuzzy modelling of case-based reasoning and decision", in Leake D.B., Plaza E. (eds.), Case-based Reasoning Research and Development, Proceedings ICCBR-97, Springer-Verlag, 599-610.
Dubois D., Godo L., Prade H., Zapico A. (1998), "Making decision in a qualitative setting: from decision under uncertainty to case-based decision", in Cohn A.G., Schubert L., Shapiro S.C. (eds.), Proceedings of the 6th International Conference on Principles of Knowledge Representation and Reasoning (KR-98), Trento, Italy, 594-605.
Dubois D., Nakata M., Prade H. (1997), "Extended divisions for flexible queries in relational databases", Technical Report 97-43 R, IRIT - Institut de Recherche en Informatique de Toulouse, Université Paul Sabatier, September.
Dubois D., Prade H. (1995), "Possibility theory as a basis for qualitative decision theory", in Proceedings IJCAI-95, 14th International Joint Conference on Artificial Intelligence, Montreal, 1924-1930.
Dubois D., Prade H. (1997), "A fuzzy set approach to case-based decision", in Felix R. (ed.), EFDAN-97, 2nd European Workshop on Fuzzy Decision Analysis and Neural Networks for Management, Planning and Optimization, Dortmund, Germany, 1-9. A revised version has appeared in Reusch B., Temme K.H. (eds) (2001), Computational Intelligence: Theory and Practice, Physica-Verlag, Heidelberg, 1-14.
Dubois D., Prade H. (1999), "A unified view of ranking techniques for fuzzy numbers", Proceedings FUZZ-IEEE-99, Seoul.
Dubois D., Prade H., Sabbadin R. (1998), "Qualitative decision theory with Sugeno integrals", in Proceedings UAI-98, 14th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 121-128.
Dubois D., Prade H., Testemale C. (1988), "Weighted fuzzy pattern matching", Fuzzy Sets and Systems, 28, 313-331.
Dubois D., Prade H., Yager R.R. (1999), "Merging fuzzy information", in Bezdek J.C., Dubois D., Prade H. (eds), Fuzzy Sets in Approximate Reasoning and Information Systems, Kluwer Academic Publishers, Boston, 335-401.
Fargier H., Lang J., Schiex T. (1996), "Mixed constraint satisfaction: a framework for decision problems under incomplete knowledge", in Proceedings AAAI-96, 13th National Conference on Artificial Intelligence, Portland, Oregon, 175-180.
Fine T.L. (1973), Theories of Probability, Academic Press, New York.
Gilboa I. (1987), "Expected utility with purely subjective non-additive probability", Journal of Mathematical Economics, 16, 65-88.
Gilboa I., Schmeidler D. (1996), "Case-based optimisation", Games and Economic Behavior, 15(1), 1-26.
Gilboa I., Schmeidler D. (1995), "Case-based decision theory", Quarterly Journal of Economics, 110(4), 605-639.
Gilboa I., Schmeidler D. (1997), "Act similarity in case-based decision theory", Economic Theory, 9, 47-61.
Gilboa I., Schmeidler D. (2001), A Theory of Case-Based Decisions, Cambridge University Press, U.K.
Heckerman D., Geiger D., Chickering D. (1995), "Learning Bayesian networks: The combination of knowledge and statistical data", Machine Learning, 20.
Horvitz E.J., Breese J.S., Henrion M. (1988), "Decision theory in expert systems and artificial intelligence", International Journal of Approximate Reasoning, 2, 247-302.
Hüllermeier E. (1998), "A Bayesian approach to case-based probabilistic reasoning", in Proceedings IPMU-98, 7th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Paris, La Sorbonne, July, Editions E.D.K., 1296-1303.
Hüllermeier E. (1999), "A possibilistic formalization of case-based reasoning and decision making", in Reusch B. (ed.), Proceedings of the 6th International Conference on Computational Intelligence, Lecture Notes in Computer Science 1625, Springer-Verlag, Dortmund, Germany, May, 411-420.
Hüllermeier E., Dubois D., Prade H. (1999), "Extensions of a qualitative approach to case-based decision making: Uncertainty and fuzzy quantification in act evaluation", in Zimmermann H.J. (ed.), EUFIT-99, 7th European Congress on Intelligent Techniques and Soft Computing, Aachen.
Kolodner J.L. (1993), Case-based reasoning, Morgan Kaufmann, San Mateo.
Pearl J. (1988), Probabilistic Reasoning in Intelligent Systems. Networks of Plausible Inference.
Morgan Kaufmann, San Mateo, CA.
Pearl J. (1993), "From qualitative utility to conditional "ought to"", in Heckerman D., Mamdani H.
(eds.), Proceedings 9'h International Conference on Uncertainty in Artificial Intelligence, San
Mateo, CA, Morgan Kaufmann, 12-20.
Pomerol J.C. (1997), "Artificial intelligence and human decision making", European Journal of
Operational Research, 99, 3-25.
Prade H., Yager R.R. (1994), "Estimations of expectedness and potential surprise in possibility theory", International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2, 417-428.
Ramsey F.P. (1931), "Truth and probability", in The Foundations of Mathematics and Other Logical Essays, Kegan Paul, London.
Riesbeck C.K., Schank R.C. (1989), Inside Case-based Reasoning, Lawrence Erlbaum, Hillsdale, NJ.
Russell S.J., Wefald E. (1991), "Principles of metareasoning", Artificial Intelligence, 49, 361-395.
Russell S.J., Norvig P. (1995), Artificial Intelligence: A Modern Approach, Prentice Hall, New Jersey.
Savage L.J. (1954), The Foundations of Statistics, John Wiley and Sons, Inc., New York.
Schmeidler D. (1989), "Subjective probability and expected utility without additivity", Econometrica, 57, 571-587 (first version 1982).
Tan S.W., Pearl J. (1994), "Qualitative decision theory", in Proceedings AAAI-94, 12th National Conference on Artificial Intelligence, Seattle, WA, 928-933.
Von Neumann J., Morgenstern O. (1953), Theory of Games and Economic Behavior, John Wiley and Sons.
Wakker P. (1990), "A behaviour foundation for fuzzy measures", Fuzzy Sets and Systems, 37, 327-350.
Yager R.R. (1985), "Aggregating evidence using quantified statements", Information Sciences, 36,
179-206.
Zadeh L.A. (1965), "Fuzzy sets", Information and Control, 8, 338-353.
Chapter 3
Introduction to multilayer perceptron
Joseph RYNKIEWICZ
SAMOS-MATISSE, University Paris 1, 90, rue de Tolbiac, 75634 PARIS CEDEX 13, France, rynkiewi@univ-paris1.fr
INTRODUCTION
1. AUTO-REGRESSIVE MODELS
The present paper looks at the parametric modelling of time series. More specifically, we will be studying those models that use multilayer perceptrons (MLP) as a regression function.
Consider the following time series model:

Y_t = F_{W_0}(Y_{t−1}) + ε_t,

where:
1. Y_t ∈ R is the observation at time t of the time series;
2. ε_t are independent and identically distributed (i.i.d.) random variables with zero expectation and constant variance σ², for example an N(0, σ²) variable independent of the series' past;
3. F_{W_0} is a function represented by an MLP whose parameter (weight) vector is W_0 ∈ R^D.
To simplify the notation, we will only be considering auto-regressive models of the first order. However, it would be easy to generalise to higher orders. As such, the phenomenon we observe (Y_t) is the combination of a deterministic function of the process's past and of a random shock. If we knew the underlying deterministic function F_{W_0}, we could make optimal predictions.
Given that this function is entirely determined by its parameter vector, the statistician's job consists of estimating the parameter W_0 using a finite number of observations (y_0, …, y_T).
To do this, we minimise in W a functional such as the average of the squared residuals:

S_T(W) = (1/T) Σ_{t=1}^{T} ( y_t − F_W(y_{t−1}) )²,

and we note:

Ŵ_T = argmin_W S_T(W).
The activation function φ of the hidden layer is generally a sigmoid function that we can consider (without any loss of generality) as being equal to the hyperbolic tangent. As such, here the parameter vector is W = (w_1, …, w_13).
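For readers who want to experiment, here is a minimal numpy sketch (ours, not the author's code) of such an AR(1) MLP with tanh hidden units, fitted by plain gradient descent on S_T; the hidden size, learning rate and number of steps are arbitrary illustrative choices.

```python
import numpy as np

# Minimal illustrative sketch (ours): an AR(1) model Y_t = F_W(Y_{t-1}) + eps_t,
# where F_W is a one-hidden-layer MLP with tanh activations, fitted by gradient
# descent on S_T(W), the mean of the squared residuals.

def mlp(y, W):
    a, b, c, d = W                        # input weights, biases, output weights, bias
    return c @ np.tanh(a * y + b) + d

def fit(series, hidden=3, lr=0.01, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    x, y = series[:-1], series[1:]
    a, b = rng.normal(size=hidden), rng.normal(size=hidden)
    c, d = rng.normal(size=hidden), 0.0
    for _ in range(steps):
        h = np.tanh(np.outer(x, a) + b)   # hidden activations, shape (T, hidden)
        resid = (h @ c + d) - y           # prediction errors
        # gradients of S_T (up to the constant factor 2, absorbed in lr)
        gc, gd = h.T @ resid / len(y), resid.mean()
        gh = np.outer(resid, c) * (1 - h**2)
        ga, gb = gh.T @ x / len(y), gh.mean(axis=0)
        a, b, c, d = a - lr*ga, b - lr*gb, c - lr*gc, d - lr*gd
    return (a, b, c, d)

# usage: W = fit(np.asarray(y_obs)); one_step_forecast = mlp(y_obs[-1], W)
```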
From here on in, we will be assuming that our model is identifiable, meaning that there can be only one parameter vector representing any function that can be represented by a given MLP. To obtain this property, all of the parameters have to be restricted in a suitable fashion (Sussmann 1992). We will further assume that the relevant limit matrix is positive definite.
In which case, the estimator Ŵ_T will almost surely converge towards W_0 when T tends towards +∞.
Let us assume that an upper limit M exists for all of the possible dimensions of the model. The penalised contrast can then be written:

CP*(W^L) = ln(S_T(W^L)) + c(T) × L / T,

where c(T) is the rate of penalisation.
If c(T) = 2, the penalised contrast CP* is equal to Akaike's AIC criterion; if c(T) = 2 ln(T), CP* is equal to Schwarz's BIC criterion (Schwarz 1978).
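In code, the penalized contrast is a one-liner; the sketch below (our illustration) follows the formula above, with L the number of parameters and S_T the mean squared residual.

```python
import numpy as np

# Sketch (ours) of the penalized contrast CP*(W^L) = ln(S_T(W^L)) + c(T) * L / T:
# c(T) = 2 gives Akaike's AIC, c(T) = 2 ln(T) gives Schwarz's BIC.

def penalized_contrast(s_t, n_params, n_obs, criterion="BIC"):
    c = 2.0 if criterion == "AIC" else 2.0 * np.log(n_obs)
    return np.log(s_t) + c * n_params / n_obs
```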
Based on these definitions, we state the finding whose proofs can be found in Yao (2000) or in Rynkiewicz (2000).
where λ̄ (resp. λ) is the largest (resp. the smallest) eigenvalue of the matrix Σ_0.
In this case, the couple (L̂, Ŵ_T^{L̂}) will almost surely converge towards the true dimension and parameter (L_0, W_0^{L_0}) when T tends towards ∞.
Using the findings from the preceding section, we can propose the following method for determining the true model. We initialise the architecture by taking all of the relevant inputs (as we would get them from a linear AR model) plus a single hidden unit. Then we progressively add units to the hidden layer, calculating the BIC criterion at each step. We continue this process as long as the BIC value drops. When the addition of another hidden unit causes a renewed upswing in the BIC, we stop the model search and take this latest MLP as the dominant model. Schematically, this search can be represented by the following figure, which here yields a dominant model with k+1 hidden units.
[Figure: BIC criterion as a function of the number of hidden units (k−1, k, k+1, …); the minimum is reached at k+1.]
Once this dominant model has been obtained, we get the real model by successively pruning away extra weights.
Remember that W_M = (w_1, …, w_N) is the parameter vector associated with the dominant model. In principle, to estimate the true model we should comprehensively explore the finite family of all of its submodels. However, their number is exponentially large, and this is the reason why, to guide our search, we propose, as is done in linear regression, a Statistical Stepwise Method (SSM). This kind of strategy is based on the asymptotic normality of the estimator Ŵ_T (Cottrell et al. 1995) and uses Student statistics as an aid for exploring subfamilies of the dominant model. To decide whether or not a weight w_l should be eliminated, we compare the BIC values of the model F and of the model F without the weight w_l, written F_l. As F_l is a submodel of F, it suffices that the BIC criterion diminishes for us to move somewhat closer to the true model, which minimises the BIC criterion. By so doing, we obtain a series of MLPs with fewer and fewer parameters, corresponding to a decreasing BIC trajectory. The criterion for ceasing our pruning exercise is quite straightforward: we refuse to eliminate a weight if this causes the BIC to rise, and we keep the final MLP, which retains this weight.
In short, the procedure involved in searching for a true model is as follows:
This procedure has been tested on a number of examples and provides excellent results as long as the volume of data is sufficiently large, generally more than 500 observations.
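Schematically, the whole grow-then-prune search can be written as follows; `train`, `bic`, `candidate_weights` and `prune` are hypothetical helpers standing in for the MLP estimation, the penalized contrast, the Student-statistic ordering of weights, and the weight-elimination step described above.

```python
# Schematic rendering (ours) of the two-phase search described above.

def select_architecture(train, bic, max_hidden=20):
    best = train(1)                            # start with a single hidden unit
    for k in range(2, max_hidden + 1):
        candidate = train(k)                   # add one hidden unit at a time
        if bic(candidate) >= bic(best):
            break                              # renewed BIC upswing: stop growing
        best = candidate
    return best                                # the "dominant" model

def prune_model(model, bic, candidate_weights, prune):
    improved = True
    while improved:                            # SSM-style backward elimination
        improved = False
        for w in candidate_weights(model):     # e.g. ordered by Student statistics
            reduced = prune(model, w)
            if bic(reduced) < bic(model):      # keep the removal only if BIC drops
                model, improved = reduced, True
                break
    return model
```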
2. HYBRID MODELS
2.1 Introduction
By modelling time series with the help of neural networks, we can account for any non-linearities that the model may contain. At the same time, this rests on a restrictive hypothesis as regards the model's stationarity. One simple generalisation we could make would be to account for piecewise stationary time series. For instance, Hamilton (1989) studied these kinds of models in an effort to model time series that are subject to discrete changes of regime. He used this as a way of analysing GNP series in the United States. We can therefore use such models for series featuring a particular regime for periods of economic growth, for example, and another one for periods of recession.
Although this model is more general than its predecessor, we still need a number of restrictive hypotheses. First of all, there has to be a finite number of possible regimes. Secondly, even though regime changes might take place, we will assume that such changes occur in a stationary manner, something that will ultimately enable us to make use of the law of large numbers and thus engage in statistical analysis.
The theory of hidden Markov chains and their first applications in speech recognition are more than 30 years old. The basic theory was published in a series of articles written by Baum et al. (1966; 1967; 1970; 1972) in the late 1960s. Hidden Markov chains have subsequently been applied in a number of different fields, such as genetics, biology, economics, etc.
If additionally we were to define V_{t+1} := X_{t+1} − A X_t, we would get the following notation for this model:
Assume that the time series we have observed (Y_t) verifies the following equations:
where {F_{e_1}, …, F_{e_N}} are functions from R^p to R, which will be represented in our example by MLPs.
For every e_i ∈ E, (ε_t(e_i)) is a sequence of independent and identically distributed random variables. This model makes it possible to use several MLPs (here we speak of a mixture of experts) and to use the Markov chain (X_t) in order to specify, at each time t, which MLP makes the most appropriate prediction. Note that since we can only observe the series (Y_t), we will have to find a way of recovering the states of the hidden chain (X_t) from the behaviour of (Y_t).
To adjust this sort of model to our observations, we estimate the parameters (the weights of the MLPs F_{e_i}, the variances of the noises ε(e_i), and the transition matrix A).
From here on in we will consider that the density of the noises (ε(e_i))_{1≤i≤N} is Gaussian. We start by listing the free parameters under consideration:
- The transition matrix A: this matrix is stochastic, meaning that the sum of any given column of A is 1; thus there are no more than (N−1) × N free parameters;
- The variances (σ_{e_i})_{1≤i≤N}, which are supposed to be strictly positive;
- The parameters of the regression functions (F_{e_i})_{1≤i≤N}: since we are using MLPs, these are the weight vectors (W_{e_i})_{1≤i≤N} of the MLPs.
An initial writing
The likelihood of the model for a sequence of observations y := (y_0, …, y_T) of the series, along a supposedly known path x := (x_0, …, x_T) of the hidden chain, is therefore:
where Φ_{e_i} is the density of the normal law N(0, σ_{e_i}), 1_G the indicator function of the set G, and π the probability of the initial state x_0. To get the overall likelihood of the observations, we add up all of these likelihoods along all of the potential paths of the hidden Markov chain. We would then have:
It is well known that the complexity of this sum is exponential, something that makes the calculation difficult whenever the number of observations exceeds a few hundred. It would also be possible to compute the maximum likelihood thanks to the E.M. algorithm, using Baum and Welch's forward-backward algorithm. However, here we prefer to use a differential optimisation technique, which is generally faster than the E.M. algorithm when the regression functions being used are multilayer perceptrons.
The log-likelihood
A more elegant way to write the likelihood is to use the predictive filter p_t(i) := P(X_t = e_i | y_{t−1}, …, y_0), since the likelihood can be written:

L_θ(y_1, …, y_T) = Π_{t=1}^{T} L_θ(y_t | y_{t−1}, …, y_0)
                 = L_θ(y_T | y_{T−1}, …, y_0) × Π_{t=1}^{T−1} L_θ(y_t | y_{t−1}, …, y_0)
                 = ( Σ_{i=1}^{N} L_θ(y_T | X_T = e_i, y_{T−1}, …, y_0) P(X_T = e_i | y_{T−1}, …, y_0) ) × Π_{t=1}^{T−1} L_θ(y_t | y_{t−1}, …, y_0).

Note that we write:
- b_t for the vector whose i-th component is b_t(i) := L_θ(y_t | X_t = e_i, y_{t−1}, …, y_0), i.e., the conditional density of y_t given that X_t = e_i and (y_{t−1}, …, y_0);
- u^T for the transpose of a vector u.

With this notation:

L_θ(y_1, …, y_T) = b_T^T p_T × Π_{t=1}^{T−1} L_θ(y_t | y_{t−1}, …, y_0) = Π_{t=1}^{T} b_t^T p_t.    (1)

All we have to do is calculate p_t for t = 1, …, T to be able to calculate the log-likelihood, since:

ln L_θ(y_1, …, y_T) = Σ_{t=1}^{T} ln( b_t^T p_t ).
Calculation of p_t
Writing B_t := diag(b_t) for the diagonal matrix whose diagonal is the vector b_t, we can easily verify (Rynkiewicz 2000; 2001) that the predictive filter p_t verifies the recurrence:

p_{t+1} = A B_t p_t / ( b_t^T p_t ).    (2)

We will assume that p_1 is the uniform distribution over {1, …, N}; we can then calculate p_t, t = 1, …, T, by recurrence. The choice of the initial value p_1 is relatively unimportant, thanks to the exponential forgetting property of the initial distribution.
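Putting (1) and (2) together, the log-likelihood can be computed in a single forward pass; the sketch below (ours) assumes Gaussian noise and a column-stochastic transition matrix A, with `experts` a hypothetical list of fitted regression functions (e.g. the linear expert and the MLP).

```python
import numpy as np

# Sketch (ours) of the forward pass: Gaussian densities b_t, the filter
# recurrence p_{t+1} = A B_t p_t / (b_t^T p_t) with a uniform p_1, and the
# log-likelihood sum of ln(b_t^T p_t). A is column-stochastic, as in the text.

def log_likelihood(y, A, experts, sigmas):
    n = len(experts)
    p = np.full(n, 1.0 / n)                      # p_1: uniform initial filter
    loglik = 0.0
    for t in range(1, len(y)):
        resid = np.array([y[t] - f(y[t - 1]) for f in experts])
        b = np.exp(-0.5 * (resid / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas)
        norm = b @ p                             # b_t^T p_t = L(y_t | past)
        loglik += np.log(norm)
        p = A @ (b * p) / norm                   # filter recurrence (2)
    return loglik
```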
The derivative of the log-likelihood with respect to a parameter θ_j is:

∂ ln L_θ(y_1, …, y_T) / ∂θ_j = Σ_{t=1}^{T} [ ∂( b_t^T p_t ) / ∂θ_j ] / ( b_t^T p_t ).

All we then have to do is calculate ∂( b_t^T p_t ) / ∂θ_j in order to be able to compute the derivative of the log-likelihood, being aware of the fact that:

∂( b_t^T p_t ) / ∂θ_j = ( ∂b_t^T / ∂θ_j ) p_t + b_t^T ( ∂p_t / ∂θ_j ).

The details of the calculation of this derivative can be found in Rynkiewicz (2000). Here we will simply explain the calculation of the derivative of the filter.
Calculation of ∂p_t / ∂θ_j
Since we have the recurrence:

p_{t+1} = A B_t p_t / ( b_t^T p_t ),

we obtain:

∂p_{t+1}/∂θ_j = [ (∂A/∂θ_j) B_t p_t + A (∂B_t/∂θ_j) p_t + A B_t (∂p_t/∂θ_j) ] × 1/( b_t^T p_t )
              + A B_t p_t × [ (∂b_t^T/∂θ_j) p_t + b_t^T (∂p_t/∂θ_j) ] × ( −1/( b_t^T p_t )² ),

which, once the terms are grouped, gives the recurrence (4) for the derivative of the filter.
The purpose of this study is to predict the maximum daily pollution rate, defined in terms of the ozone level, between the months of April and September inclusive. Towards this end, we will be using the previous day's maximum pollution rate (i.e., ozone level) as our regressor, plus the following meteorological observations:
- Total radiation,
- Average daily wind speed,
This preliminary study will allow us to detect what the MLP contributes compared with the simple linear model. The architecture of the MLP model was determined thanks to the SSM method described in the first section of the present paper. Here our performance criterion is the square root of the average quadratic error (RMSE). It is expressed in micrograms of ozone per cubic metre (µg/m³). Table 1 summarises the findings obtained:
Note first of all that the MLP leads to a marked improvement over the performance of the linear model, whether on the "in sample" or the "out of sample" data. Although there is relatively little "in sample" data with which to estimate the model (550 observations), the SSM method makes it possible to avoid over-fitting. This is achieved in an entirely satisfactory manner, given that the RMSE difference between 1994-1996 and 1997 is relatively small. Note in addition that
[Figure 3: out-of-sample ozone series (solid, "series") and MLP predictions (dashed, "predictions"), in µg/m³.]
Figure 3 compares the true value of the ozone rate with its MLP prediction on the "out of sample" series. Note that the prediction of the average values is particularly good, but that peaks are generally underestimated. This behaviour is all the more troubling since it is the higher values that the State authorities are most interested in. We are therefore going to use a hybrid HMM/MLP model, hoping that one expert will specialise in the prediction of average and lower values, whilst another will try to capture the dynamics of the high ones.
The estimated transition matrix is:

A = ( 0.97  0.02 )
    ( 0.03  0.98 )

Note that the diagonal terms, which represent the probability of remaining in the same state, are very close to the maximal probability 1. This means that for long periods of time the model stays in the same state, a sign that it has indeed identified two distinct regimes. The standard deviations for the linear expert, σ_0, and for the MLP, σ_1, are as follows:

σ_0 = 0.11,  σ_1 = 0.20.
This result is intuitively coherent since, as we will see, the linear model has specialised in the easier part of the series (i.e., the average or low values, where it produces good predictions), whereas the MLP has specialised in the more difficult part of the series, i.e., the high values.
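As a back-of-the-envelope check of the "long stays" reading of A (our illustration, assuming state 0 is the linear expert and state 1 the MLP): with geometric sojourn times, the expected number of consecutive days spent in state i is 1 / (1 − a_ii).

```python
# Expected sojourn times implied by the diagonal of the estimated A (ours).
for state, a_ii in [("linear expert", 0.97), ("MLP expert", 0.98)]:
    print(state, round(1.0 / (1.0 - a_ii), 1))   # ~33.3 and 50.0 days
```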
Lastly, the prediction error results are as follows:
[Figure 4: centred and normalised series ("series"), and the conditional probability of the state associated with the MLP ("conditional probabilities").]
We can also break the model's prediction down into two parts: the linear expert and the MLP. This gives the user a lot of flexibility: a user who is only interested in the strong values will only take into consideration the MLP predictions, which are much more relevant there. Figures 5 and 6 show scatterplots of the two experts' predictions against the true values for all of the data (in and out of sample).

[Figures 5 and 6: scatterplots of the linear expert's and the MLP expert's predictions against the true values.]
Note in these figures that the linear expert is better than the MLP for the low values, whereas for the high values its predictions fall well short of the true values. The MLP expert behaves the other way around: it overestimates low values but is much better at estimating the high ones. If we were to use this MLP alone to make predictions, the average quadratic error would be worse than with a single auto-regressive model, but better for the part of the series that we are interested in: pollution peaks as defined by ozone levels.
CONCLUSION
The present paper has introduced two important models that can be used with time series. The first section was devoted to the difficult problem of choosing the model's dimension. We have offered a methodology based on the asymptotic properties of parametric regression models. In our experience, this provides good results as long as there is a sufficient number of observations (at least 500).
In the second section, we generalised the problem and studied the example of a piecewise stationary time series. The greater complexity of the dynamics underlying such series cannot be captured by a simple regression model. We therefore use, for a given series, a number of regression functions that are interconnected via a hidden Markov chain. All of these auto-regressive functions make a simultaneous prediction of the series, with the hidden Markov chain's role being to weight the predictions that these regression models make by the various regimes' conditional probabilities. By applying this model
REFERENCES
Akaike H. (1974), "A new look at the statistical model identification", IEEE Transactions on Automatic Control, 19, 716-723.
Baum L.E. (1972), "An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes", Inequalities, 3, 1-8.
Baum L.E., Eagon J.A. (1967), "An inequality with applications to statistical estimation for probabilistic functions of a Markov process and to a model for ecology", Bulletin of the American Mathematical Society, 73, 360-363.
Baum L.E., Petrie T. (1966), "Statistical inference for probabilistic functions of finite Markov chains", Annals of Mathematical Statistics, 37, 1554-1563.
Baum L.E., Petrie T., Soules G., Weiss N. (1970), "A maximisation technique occurring in the statistical estimation of probabilistic functions of Markov processes", Annals of Mathematical Statistics, 41(1), 164-171.
Bel L., Bellanger L., Bonneau V., Ciuperca G., et al. (1999), « Éléments de comparaison de prévisions statistiques des pics d'ozone », Revue de Statistique Appliquée, 47(3), 7-25.
Chen J.-L., Islam S., Biswas P. (1998), "Nonlinear dynamics of hourly ozone concentrations: Nonparametric short-term prediction", Atmospheric Environment, 32, 1839-1848.
Comrie A.C. (1997), "Comparing neural network and regression models for ozone forecasting", Journal of the Air and Waste Management Association, 47, 653-663.
Cottrell M., Girard B., Girard Y., Mangeas M., Muller C. (1995), "Neural modeling for time series: a statistical stepwise method for weight elimination", IEEE Transactions on Neural Networks, 6, 1355-1364.
Douc R., Moulines E., Ryden T. (2001), "Asymptotic properties of the maximum likelihood estimator in autoregressive models with Markov regimes", Technical Report 9, University of Lund.
Gardner M.W., Dorling S.R. (2000), "Statistical surface ozone models: an improved methodology to account for non-linear behaviour", Atmospheric Environment, 34, 21-34.
Hamilton J.D. (1989), "A new approach to the economic analysis of nonstationary time series and the business cycle", Econometrica, 57, 357-384.
Krishnamurthy V., Ryden T. (1998), "Consistent estimation of linear and non-linear autoregressive models with Markov regime", Journal of Time Series Analysis, 19(3), 291-307.
Chapter 4
Issues and dilemmas in cognitive economics
Bernard PAULRE
University Paris 1 Panthéon Sorbonne, I.S.Y.S. - MATISSE UMR Paris 1 - C.N.R.S. 8595
Abstract: Cognitive economics, which first made its appearance in the 1960s, now absorbs a great deal of economic research resources. It is legitimate to raise questions about the contents and/or timing of the proposal of a cognitive economics research program. In this article, we underline some of the issues at stake in this sort of clarification, focusing specifically on problems pertaining to the speeds at which knowledge or real interactions actually adjust, and to the relevance of the axioms of epistemic logic. The dilemmas that we emphasize here all provide us with an opportunity to highlight a certain number of significant alternatives. Specifically: (i) issues relating to the respective roles of the knowledge economy and of cognitive economics stricto sensu; (ii) the difference between the computable orientation that is involved in standard approaches to economic behaviours, and the connectionist orientation that we illustrate, for example, by the evolutionary conception of the firm; (iii) the problem of the relationship between a cognitive economics research program and one that relates to the cognitive sciences. We present an outline of the potential foundations for a cognitive economics research program. This kind of program would manifest itself through potentially divergent schools of thought whose disparities should all be seen as factors of dynamism that could be used to drive research in this area.
INTRODUCTION
There are two reasons why the present emergence and rapid development of
what has come to be called the field of cognitive economics might be considered
a surprise. First of all, information and knowledge concepts seem consubstantial
71
with economics -yet asides from the debate on planning and the market economy
(Taylor 1929; Hayek 1935; Lange 1935) 1, amazingly enough it was not until the
1960s that the first articles dealing directly with the information economy were
published. Secondly, with the development of cybernetics and theory of
communication during the 1940s and '50s, it would have been no surprise had a
lot of economists become the prophets of these new sciences during the 1950s
and '60s, given the major effects they were having on other disciplines - but this
did not happen. Cybernetics did have some impact via the Keynesian models that
a few engineers developed using language derived from the theory of control
(Tustin 1958), and 0. Lange did offer us a most remarkable (albeit isolated)
contribution when, in 1964, he wrote a book devoted to a cybernetic approach to
economics. However, for the most part, and as we will see below, we can
generally say that it was primarily in the 1970s that information and knowledge
became a major focus in economics.
Even so, this preoccupation was usually deemed part of a mainstream-inspired study of markets with imperfect information and, as a result, little attention was paid to the conditions underlying agents' acquisition of information, whether at a cognitive or a psychological level. The main focus of study was the impact of insufficient information, with observers usually basing their analyses on the rational expectations hypothesis instead of highlighting the conditions that enable agents, through learning or in some other way, to increase the information at their disposal. In the end, it was in the 1980s that we finally saw the advent of approaches reserving a lot of space for learning, knowledge and beliefs, and portraying these factors in less trivial forms (probably not as readily compatible with the Walrasian corpus).
The goal of this article is to offer a few elements to help us reflect upon the ideas that cognitive economics conveys, and the project that it offers. Given the level of development which this discipline has reached by now, we are probably ready to take a closer look at the meaning and the impact of the changes that have already taken place. Should we even be talking about a sub-discipline called "cognitive economics"? Should we go as far as to suggest a new research program, following in the steps of what some people were calling cognitive sciences during the 1970s and accepting the directions that B. Walliser (2000) laid out for us? This may seem a reasonable point of view, but if so, what are the main axes and principles that will unite those who support this kind of project? These are the issues with which we will be dealing.
Our approach will be as follows. First of all, we will review the progressive emergence of economic approaches to information and knowledge in order to identify the themes around which the field of cognitive economics historically seems to have been either partially or entirely structured (section 1). This will enable us to explain some of the theoretical issues at stake in a future cognitive economics research program. We will then distinguish between two types of dilemmas. First there are external dilemmas (section 2), located upstream from the very definition of our cognitive economics concept. These are the dilemmas that people encounter when their goal is to define either the boundaries of what we have called "cognitive economics", or else the nature of a research program in this field. After having analysed these dilemmas, we will trace the contours of what a cognitive economics research program might look like (section 3), before delving into the internal dilemmas (section 4) that will nurture and support discussions on the various options that are available in this area, once a research program has been defined (or at least outlined). A cognitive economics research program does not actually mean that a unified theory or doctrine exists. We already saw this with the development of the cognitive sciences. Instead, we are looking at a general orientation capable of federating and uniting researchers who do not necessarily share the same basic options, themes or methodologies, even if they are active in the same field of research.
Other dates are also noteworthy, but we feel that it would be more useful to identify the themes and approaches comprising what we will be calling (with a certain amount of vagueness for the moment) the study of the economics of information and knowledge. It is also a good idea to provide a timeline for this discipline. To do so, we will focus on the period extending from the 1950s to the early 1980s, an era we have good reason to highlight². A whole series of studies can be explored, analyses we think should be regrouped by theme (or even by "wave") whenever they feature a modicum of concentration or historical coincidence. Of course, we are not saying that this overview will cover the entire field. Our purpose is only to offer some idea as to its boundaries and characteristic themes.
By limiting our investigation to the period between 1950 and 1980, we are not trying to suggest that knowledge-related issues were entirely absent from earlier economists' areas of concern. For example:
- Since the very outset, the division of labour has raised questions about the development of knowledge and intelligence, these being the factors that this division either denotes or else restricts (Smith 1776; Ferguson 1783, particularly);
- As far back as the 1930s (1933, precisely), F. Knight focused on themes such as the relationship between information structure and organisation;
- In 1934, N. Kaldor and A. Robinson were already studying issues such as the optimal size of the firm, linking this to the cognitive capacity for planning (the central theme of Penrose's theory, see below);
- The role that information plays in market structure and market equilibrium has been studied on several occasions: to determine how this relates to monopolistic competition and the role of advertising (Chamberlin 1933); and as part of oligopoly theory, which stresses problems with oligopolistic actors' crossed expectations (Fellner 1949);
- Hayek began to work on knowledge diffusion in the late 1930s, during the aforementioned debate about socialist planning.
However, these elements are highly dispersed, and have never been exploited systematically. For this reason, we prefer the assertion that the history of this field of study began in the early 1950s, an era that witnessed the hatching of our topic's main constituent phylum - the forerunner of those modern research efforts whose orientations we would also like to discuss.
1.1 The 1950s and the phylum of the theory of the firm
From 1950 until 1953, there was much debate about the profit maximization principle, relating to whether the underlying hypotheses were realistic, and whether the principle itself was actually supported by the Oxford surveys on corporate executive behavior. A. Alchian renewed the terms of this debate in 1950, as he was the first to invoke the natural selection argument to explain, if not the maximization of profit, then at least the constant search for it. M. Friedman used this argument in 1953 to justify the maximization principle. It is worthwhile for us to review this debate because a study published by S. Winter in 1964, offering both a synthesis and a pedagogical extrapolation of our topic, was followed by a whole series of articles that would culminate in 1982 with the publication of the key work by R. Nelson and S. Winter, the promoters of the modern evolutionary approach in economics. This text asserted strongly (if not for the first time) that the analysis of the firm should center on the idea that "routines" are the place where "knowledge resides". In this view, the storage function that comprises one of the three constituent functions of selection in a natural selection paradigm (along with selection itself and mutation) is enacted through routines. According to Nelson and Winter, "organizations remember by doing", with "routines constituting the most significant way of storing organizations' specific operational knowledge" (p. 99). The train of history between 1950 and 1982 was one of the main vectors for the progressive emergence of a cognitive approach to firms - but it was not the only one. Enter E.T. Penrose, who participated in the 1950s debates by contesting the naturalist metaphor to which the firm was being exposed, in order to suggest and develop (in an entirely different vein) an approach to the firm that reserved a lot of space for a cognitive dimension.
In 1959 she published her famous book on the theory of the growth of the
firm. She started out with a principle that was unique in the context of the microeconomic theory of the time³: a firm has to be dealt with as if it were "an autonomous administrative center for planning". It is both an organization and
the sum of a variety of means of production. Penrose pointed out that it is never
the resources themselves that constitute what can be called the factors of
production, but instead the services that they can provide. So, she had good
reason to postulate that a productive activity depends on a "productive potential,
which includes all of the production possibilities of which entrepreneurs are
aware and from which they can benefit". This being the case, the entire theory
that Penrose develops is based on what we feel should be called a cognitive
conception of the firm. One of the most noteworthy manifestations of this
approach derives from Penrose's famous theorem about the factors that restrict a
firm's growth. In short, this theorem states that a firm's rate of growth is bounded
by its managers' cognitive capacities for planning growth. Penrose's theory
contains room for learning. Although this concept is mainly used with regards to
the learning of executive managerial activities, it is also present via the analysis
of the firm's internal processes, those that are involved in the creation of new
productive services. The Penrosian vision of the growth of the firm is therefore
highly dependent on the progressive extension of productive services, on the
creation of a surplus and on the subjective growth potential. Now, these factors
are all qualitatively and quantitatively determined by experience. The so-called
"resource theory" school of thought, which occupies a central position in modem
evolutionary approach, springs from Penrose's analyses (Durand, Quelin 1999).
Certain developments relating to corporate strategy (Ansoff 1965; Drucker 1965;
Hamel, Prahalad, 1990) are also clearly rooted therein.
At the same time as Penrose, and with an analogous starting point (i.e., with
the firm being seen as an organization), a neo-rationalist approach to
organizations developed, based on the seminal writings of Simon, whose two key
works were published in 1958 and 1963. In a book that Cyert and March wrote in
1963, organizational orientation is depicted in a more radical light than was the
case with Penrose, inasmuch as the "productive" aspect of the firm disappears.
The only thing that remains is a vision of the firm as a system for making
decisions and solving problems. On the other hand, theirs is a vision that has
been enriched by an incorporation of conflicts relating to the firm's objectives⁴.
This is a decisional and dynamic approach in which learning plays a key role,
albeit one that is different from the function it fulfils for Penrose, given that it
now pertains to decisions and not to activities or productive services.
Furthermore, even though innovation (in the widest and not necessarily
technological sense of the term) is present in March and Simon's writings, it is
absent in Cyert and March. The former authors rely explicitly upon a conception
wherein human organization is seen as "a complex system of information
processing". For the latter, this theory (which is first presented in a verbal form)
blossoms into an information system model that can simulate a general decision-
making model determining price and quantity (this model representing the
solving process that is often attributed to firms).
Between 1958 and 1962, several seminal articles were published about R&D
and the knowledge economy.
In one of the first articles on the subject, R. Nelson (1959) raised the issue of
the quantity of resources that a country should devote to scientific research,
which he defines as "a human activity that is geared towards the advancement of
knowledge, which can itself summarily be divided into two separate categories:
the facts and data that can be derived from reproducible experiences; and the
theories and relationships between these facts." He looks at problems relating to
the externalities causing a gap between the marginal private returns and the social returns of basic research. He also observes that the marginal social cost of using an already existing piece of knowledge is zero, hence the need to "administrate knowledge as if it were a freely accessible common resource."⁵ In
1962 K. Arrow focused on the same issue in a fundamental theoretical article in
which he demonstrated that in a competitive situation the incentive to invent is
greater than it is in a monopolistic environment, but that it is less than (or at best
equal to) the invention's social yield - hence an opportunity for State intervention
(1962a). In 1958, B. Klein had underlined the unavoidable duplication of R&D
activities, due to the high degree of uncertainty affecting them (Klein 1958;
Klein, Meckling 1958). This type of questioning, and more generally the issue of
R&D-related externalities or spillovers, would never cease to be central to
economists' reflections, especially within the framework of endogenous growth
studies (Romer 1990). This is the genesis of the widespread interest in networks.
In 1961, having observed that an imitation process is widely launched well before an innovation's effects have run their course, E. Mansfield, relying on a classical contagion principle, demonstrated that in general "the number of users of an innovation will approximately follow a logistic curve." This fits in with the initial empirical findings that Z. Griliches (1957) obtained on this subject. Contemporary analyses of innovation diffusion processes still refer to these two authors.
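Mansfield's statement can be written compactly; the following logistic specification is one standard rendering (our notation, not Mansfield's exact formulation), where n(t) is the number of users, K the saturation level, β the contagion speed and t_0 the inflection date:

```latex
n(t) = \frac{K}{1 + e^{-\beta\,(t - t_0)}}
```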
In 1962, K. Arrow published another seminal article, on learning by doing (1962b)⁶. This study was macroeconomic in nature, different from the more microeconomic and engineering-based "progress curve" or experience curve of Alchian (1950).
The early 1970s saw a whole corpus of articles (or chapters in collective books) devoted to incorporating the notion of imperfect information into market theory. However, the hypotheses that were formulated and the conclusions to which they led were not convergent.
Amongst these studies we should specifically mention those that were devoted to unemployment and which developed the thesis that unemployment is one consequence of job-searching in an environment characterized by imperfect information. As such, starting out with the idea that the search for information about "notional" trading opportunities has a cost, A. Alchian, in a well-known text, suggested that the type of unemployment that results from this situation "is self-employment in information collection" (1970, p. 30).
A 1970 article by Akerlof highlights some of the difficulties that crop up in markets where buyers and sellers have different information, and stresses the existence of adverse selection phenomena. In actual fact, the first article to have delved into the issue of asymmetry appears to be one that Arrow wrote back in 1963, which focused on moral hazard (as he had done in his 1962b article).
Moral hazard involves situations in which one side of the market cannot observe the behaviour of the other. Inversely, adverse selection relates to situations in which one side of the market cannot observe the quality or the "type" of the goods that are being offered by the other side. For the former we speak of "hidden behaviours" or of unobservable actions. For the latter we speak of "hidden types" or of unobservable characteristics. We can scarcely over-estimate the impact and significance of research on this topic, including (in recent times) in the financial sphere.
Signals theory was first evoked in an article published by M. Spence in 1974, shortly before a model that M. Rothschild and J. Stiglitz developed in 1976. The Spence model treats education as a signal of competency. The other model focuses on purchasing behaviour in the insurance market, and considers that the type of policy being purchased is a signal of the degree of risk that the insured party represents. Other models exploiting the same analytical principle include ones published by Milgrom and Roberts (1982: price as an indicator of an oligopolist's cost; and 1986: advertising used as a quality signal), and Reinganum and Wilde (1986: the propensity for bringing a lawsuit seen as a signal of the strength of one's position), etc.
The firm is another area where information asymmetries are present: shareholders and executives do not possess the same level of information and do not necessarily have the same interests⁷. Principal-agent literature helps us to explore this situation; it developed from initial contributions by M. Spence and R. Zeckhauser (1971) and Ross (1973) regarding the issues involved when contracts are signed in a situation of asymmetrical information. Then came a somewhat different approach from M. Jensen and W. Meckling (1976), one that could be applied in a wide variety of situations. With respect to the theory of the firm, this latter approach would become particularly relevant whenever restrictions exist within an organization or a group regarding decentralization and/or the delegation of authority, particularly when this is due to informational asymmetries. Nowadays it has been extrapolated into a new type of thinking about the nature of the firm, an approach in which analyses are made in terms of incomplete contracts. This consists of thinking about the impacts of such asymmetries and about the types of contracts that would make it possible to resolve any conflicts of interest (i.e., "to align incentives"): cf. O. Hart, 1995. It also covers issues relating to the firm's boundaries (Hart, Moore 1990), thus tying into the transaction cost literature, which springs from a seminal article by R. Coase (1937) and from Williamson's book on this topic (1976).
In retrospect, the 1970s definitely constituted a most fertile period for the information economy.
key element of the reasoning that is being presented within this framework. Associated with the hypothesis that behaviour is neutral towards risk, the postulate seemingly constitutes a strong condition for demonstrating the possible existence of price-taking equilibria. Nevertheless, certain difficulties do remain, notably because of the existence of multiple equilibria.
According to F. Hahn, the rational expectations hypothesis in a price-taking economy does not suffice to create a pre-determined equilibrium:
"...since even here, to obtain our pre-determined responses we need to know something about the learning process. Plus the equilibrium might not reflect the fundamentals of the situation...".
In addition, in an economy with imperfect sector competition:
"the dynamic must be seen as a learning process, both in terms of the demand conditions and also as regards competitor strategies. Once again, when an equilibrium is defined in terms of these processes, it would appear to remain undetermined unless its past (in other words, information) has been explicitly modelled and is a known commodity... The information available to agents at any given moment in time is very path-dependent. The economy could have followed another path and generated quite different types of information. There is something that is fundamentally past-oriented about the very definition of equilibrium, and obviously in the dynamic itself" (Arrow 1958).
There are as many equilibria as there are possible pasts. An equilibrium with given properties can stem from a number of different pasts, which can have one or several characteristics in common.
Leaving to one side the issue of multiple equilibria, we see that the rational expectations hypothesis has a strategic role to play here. Above and beyond the lack-of-realism argument, which consists of pointing out that this hypothesis attributes to agents a cognitive capacity that is manifestly excessive in terms of their real capacities and level of information, what else can we say about it?
During a conference in the late 1980s, R. Lucas made a major contribution to this debate when he highlighted the methodological options underlying his research strategy. According to Lucas, the way in which economic agents are able to store the decision-making rules that they use has fundamentally nothing to do with economic theory:
"Economics tends to focus on situations in which it can be expected that agents "know" or have learnt the consequences of different actions, meaning that their observable choices reveal the stable elements in their underlying preferences... Technically, I believe that economics is the study of the decision-making rules that constitute the stationary states of certain adaptive processes, rules that work for a certain range of situations and that are no longer revised, even when experience starts to accumulate"¹².
Ultimately, one of the main issues in the incorporation of imperfect information and knowledge seems to be the very existence of a competitive (price-taking) equilibrium. Hence the inclusion, in a certain number of studies, of recommendations for this type of situation, as if it constituted the norm. In other words, the main strategic debate seems to involve a questioning of the rationality ascribed to actors, especially the existence of an opportunity for incorporating learning (and, when this does exist, the type of hypotheses that should be made in this respect). As such, cognition does indeed lie at the heart of the problem - something that provides us with a natural link to the second issue we now raise.
It is not absolutely necessary at this point to review past debates on the nature of rationality in economics (i.e., the way this topic has been dealt with since the 1960s, particularly by H. Simon). In the context in which we are situating ourselves at present, we feel that it would be more judicious to take a closer look at cognitive rationality or, to be more specific, at the epistemology that economic agents apply (and at the way in which this has been portrayed in a number of recent studies). Hence the need to revisit the aforementioned epistemic type of logic, whose usage in game theory nowadays appears to be well established. Many analysts feel that this logic is widely recognized as the way to move forward. We will discuss the axioms upon which it is built, immediately followed by an exposé of its limitations. We find (see Osborne, Rubinstein 1994):
- An axiom according to which agents are familiar with all of the possible states of the world, and always know when a "universal" event has occurred¹³. This is tantamount to excluding so-called "ignorance" situations, postulating that agents cannot be surprised by events that they did not even realize were possible.
- A deductive closure axiom: if an agent knows E and knows F, then s/he knows E∩F.
- A truth axiom: what an agent knows is true; nothing that s/he knows can be false.
- A transparency axiom (a.k.a. a positive introspection axiom): if agents know E, then they know that they know E. In other words, agents are always aware of what they know.
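For reference, the four axioms can be rendered compactly with a knowledge operator K on events (a standard epistemic-logic formalization of the list above; the notation is ours, not the author's):

```latex
K(\Omega) = \Omega
  \quad\text{(agents always know the universal event)}\\
K(E) \cap K(F) \subseteq K(E \cap F)
  \quad\text{(deductive closure)}\\
K(E) \subseteq E
  \quad\text{(truth)}\\
K(E) \subseteq K(K(E))
  \quad\text{(positive introspection)}
```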
In actual fact, the cognitive dimension, in the strict sense of the term, is absent
from the first approach. In much the same way as other merchandise or economic
objects, knowledge is entirely dependent on an evaluation process that uses up (so to speak) the characteristics or the economic meaning of the issue with which it has to contend. The beneficiary or recipient of the information might
derive a differentiated utility or product from this service, but at a first order level
this relationship is thought to be analogous to the one that derives from the
purchase or consumption of any other kind of good or service. The cognitive or
"mental" dimension is totally implicit therein. What we will be calling the
knowledge economy is the portion of economic thinking that deals with
information and knowledge basically as if they were activities, relationships or
objects that can be treated at an aggregate level; or which infers behaviours
within which the individual cognitive dimension is both implicit and non-
analysed.
We face a different sort of problem if we focus on the conditions that
transform an agent's way of carrying out a certain operation due to the
information that s/he has received (because, for example, of some previous
experience). A transformation process that is based upon the information that an
individual has received cannot be equated with a mere commercial exchange -
even if we were to assume that the information being used has a known cost, the
outcome of the operation (i.e., the change in behaviour) constitutes an innovation
that is akin to a clean break with the past. This makes it difficult, if not
impossible, to evaluate the outcome thereof from an economic point of view.
Unless we find ourselves in a situation where damages are being paid, we cannot
even establish the principle that as part of this transformation operation (which is
not necessarily a voluntary or even a conscious act) the agent is displaying an
economic type of rationality.
What we will be calling cognitive economics is that part of economic thinking
which deals with economic phenomena within which information and knowledge
play an essential role; and which actually incorporates the cognitive sphere of
agents' behaviour. The field also includes any economic research whose specific
purpose is to provide suitable interpretations and representations of actors'
behaviours, from a cognitive perspective.
Given the ambitions we have developed, what direction should we be heading
towards?
The two themes are related and should in principle be interconnected. If
indeed they have come apart, this is due to the state of the discipline, and to
the research strategies that are being pursued. Ideally, cognitive economics
should include a study of the emergence processes thanks to which we can
explain (or at least account for) the vertical relationships and conceptual or
model-related imbalances that are in play. However, our understanding has made
barely any progress in this domain, and with the exception of very few studies
(Lesourne 1991; Lesourne, Orléan (eds.) 1998; Paulré 2000) we have not even
been given sight of the articulations that are actually taking place. In fact, our
ignorance is such that despite the fact that cognitive economics and studies of the
knowledge economy are not officially distinct from one another, they cannot at
present be totally associated and integrated in a joint research program (or at
least, not if such a program is to be rendered fully operational). Note however
that this does not mean that research projects involving cognitive economics can
be carried out independently of the lessons we have drawn from the knowledge
economy, if only because they illuminate certain institutional aspects thereof, as
well as the nature of certain problems relating to activities that feature a strong
informational or cognitive dimension.
Another argument also supports our decision to separate these two domains
temporarily - their relationship with cognitive sciences. If indeed we want to
define a cognitive economics research program that draws its inspiration from the
contents (and links) of earlier cognitive science research programs, we should try
to emphasize the individual aspects of cognition. This distances us from sectoral
or macro-economic approaches that neglect such aspects, or else which do not
deal with them in a direct manner.
cognitive sciences, i.e. for cognitivism. It is not our intention to align ourselves
with this perspective. We are simply trying to come up with a starting point that
can be useful in two ways: (i) it can trigger the dialogue that we feel economists
should be maintaining with cognitive science specialists, thus improving the way
in which the elements from the respective research programs mesh with one
another; (ii) it will specify the nature and the characteristics of the orientations
that will be making up this cognitive economics research program. We can
benefit from D. Andler's text by using it as a basis for thinking about how to
specify our orientations in such a way that they can correspond with (or
complement) those orientations that cognitive sciences have already defined
According to D. Andler, the classical paradigm that lies at the origin of
cognitive sciences (i.e. the cognitivist paradigm) can be characterized "in its
most simple expression" by invoking three propositions that we will be
successively commenting upon and exploiting.
1. "The mind/brain is a complex that can be described in two ways: material
or physical, in the broadest sense of the term; and informational or
functional. These two levels are largely independent, and the rapport that
builds up between them is akin to the one that connects a computer (when
seen as a physical system) to the description of the same machine as an
information processing system. "
Metaphorically, a computer shapes the architecture of a field that, according
to D. Andler, "even in these days of contestation, is still widely shared" (p. 14).
The aforementioned double description has made it possible to separate the study
of the mind from the study of its material functioning. Tantamount to the
separation of hardware and software (something that has been essential to the
emergence of cognitive sciences), this initial orientation can be copied and
accepted by economists - but it cannot have the same strategic impact in this
discipline as it does in a psychological or in a cognitive scientific context.
Inasmuch as our goal is to identify which principles are capable of driving a
research program, this seems to be a good opportunity to discuss an element that
the present demonstration continues to imply naturally, yet whose impact is even
more strategic within the confines of our discipline -the acknowledgement that a
cognitive system is a legitimate object of study, and a crucial element in our
understanding of economic systems, whatever the level involved.
From our point of view, we find it difficult to imagine that a research program
in cognitive economics could start out with any premise other than: (i) it is
important to study individuals' cognitive capacities (the constitution of their
universe of perception); (ii) there is a constant need to justify the cognitive
hypotheses that are on offer through a precise examination of their relevancy (to
theory or to reality); (iii) we need to continually explore the way in which the
This assertion, together with the one below, determines the "computo-
representational" nature of the cognitivist approach. The exact same idea is found
in the aforementioned works by J. March and H.A. Simon and by R. Cyert and J.
March. This is no surprise given their origins (the Carnegie Institute of
Technology's neo-rationalist school of thought, during the 1950s).
This principle is just as relevant in economics as it is in cognitive sciences.
Nevertheless it continues to be disputed by the connectionist school of thought,
which has developed a non-symbolic approach. This latter school considers that
"meaning is not enclosed within symbols: it is a function of the overall state of
the system and remains connected to the general activity in a given area."
However, as we affirm above, we are not seeking at present to find in favour of
one or the other of these schools of thought or paradigmatic options. Instead we
are focusing on the nature of the issues at stake or on the type of orientation
being emphasized by the programmatic announcements that have been coming
out of the cognitive sciences.
We will therefore be remembering that a cognitive research program has to
emphasize the importance of agents' ability to specify their world (their
"strategic universe") and that this specification must be designed in accordance
with the mode of representation, the mode of interpretation, or with any
interaction and correlation schemes. To a certain extent, this act of recognition
simply involves an acknowledgement of subjectivity.
3. "These internal states or representations comprise the formula of an
internal language or "mentalese" that is similar to the formal languages of
the particular logic. The processes involved are those that the logic sees as
being effective. They can mainly be reduced to a small number of primitive
operations that will obviously be carried out by a machine (since this requires
no "interpretation") ... ".
The issue raised here relates to the language in which the individual's
"internal" state or subjectivity can be expressed. In other words, the goal is to
figure out how to describe the agent's cognitive competency in the area where
s/he is active. In a cognitive sciences context, modern logic is one preferred
mode of description. Certain schools of thought postulate a "symbolic" system
with the minimalist meaning that D. Andler lends to this term, i.e. where it
consists of symbols that refer to external entities. In economics, the
representation that is ascribed to an actor is usually expressed in the same terms
as those that are used to describe his/her environment and actions. Lack of
information can impact the probabilities that given states of the world will indeed
materialize - but the semantics involved in the agent's description of the states
are the same as the semantics that are used by the person who is making the
model.
or take a few factors, is in fact simply reproducing the structure of his/her own
actual interaction with the environment. What is important is not whether this
"internal" functioning is being thought of in a "realistic" or justified manner. We
feel that from a modelling perspective, what is essential is that a dissociation be
created between these two universes so that the problem of an agent's evolution
towards an equilibrium or stable state can be expressed with all the complexity
that is inherent to this issue.
Despite their thematic unity and the existence of a few federating orientations,
cognitive sciences remain pluralistic in nature, with several schools of thought
co-existing. Each emphasizes its own orientations, analytical principles and
specific representations. We can identify at least three such approaches or
paradigms: cognitivism (or computationalism), connectionism, and
constructivism. The first is perfectly (albeit not exclusively) represented by
behaviourism in economics, or more simply by the decision-making theories to
which economists resort. The second is manifested in this discipline by a certain
number of studies based on the use of specific simulation techniques (i.e., genetic
algorithms). The third, with which the name of J. Piaget is particularly
associated, has to our knowledge no particular projection in the field of
economics, aside from a few studies that are relatively epistemological in
nature.
This raises questions as to the impact that this paradigmatic diversity might
have on a planned cognitive economics research program. Where economic
agents are being represented as maximizing actors who are operating as per their
representation of the semantically assessable surrounding environment, most
economic research programs are likely to rally, however trivial this may be,
around the banner of cognitivism (albeit with the considerable reservation that
the computable capacities which this entails are rarely explored and assessed).
This is due to the fact that from a computational perspective studies of the
cognitive capacities being ascribed to economic agents in certain models
sometimes reveal major limitations. For example, M. Rabin demonstrated back in
1957 "that we currently find a number of strictly determined win-lose games 19
for which no computable winning strategy actually exists".
Yet even though, of all the cognitive sciences, connectionism seems the
best placed to transcend cognitivism, the conception of rationality that
clearly continues to prevail in economics keeps the discipline fairly close to
cognitivism. The purpose of a research program in cognitive economics
would thus be to raise people's vigilance about the type (and level) of the
cognitive capacity that a given discipline ascribes to its actors. In fact, a strict
application of the aforementioned orientations, and specifically an evaluation of
the computability they entail, would in our opinion constitute a first move
forward, allowing us to specify the cognitive bases upon which many economic
models have been built - even if for other reasons (i.e., the limited rationality
argument), some analysts will continue to doubt the validity of a particular
model.
Computability can be tested via mathematics (numerical analysis) and
recursive functions. Moreover, a more accessible path exists for the
non-mathematician - computer-based dynamic simulations, which in our opinion
provide a precious tool for testing and evaluating the computational
requirements of a number of behavioural models.
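As an illustration, here is a minimal, purely hypothetical sketch in Python of such a dynamic simulation - a cobweb market with adaptive price expectations. All function names and parameter values are ours; the point is simply that, once coded, the computational requirements of the behavioural rule become explicit (the agent stores one number and performs one update per period):

import numpy as np

def adaptive_expectations(alpha=0.5, a=10.0, b=1.0, c=0.5, d=1.0, T=50):
    """Toy cobweb market in which producers hold adaptive price
    expectations; all parameter values are illustrative."""
    p_exp = 5.0                            # initial expected price
    prices = []
    for _ in range(T):
        supply = c + d * p_exp             # producers respond to the expected price
        price = (a - supply) / b           # demand a - b*p clears the market
        p_exp += alpha * (price - p_exp)   # adaptive expectation update
        prices.append(price)
    return np.array(prices)

# the expectation path is stable whenever alpha * (1 + d/b) < 2

Running such a loop for a few hundred periods immediately reveals whether the postulated rule converges, cycles or explodes - precisely the kind of computational check discussed above.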
However, a cognitive approach in economics can benefit from the type of
representation that connectionism inspires. An important illustration of this
follows.
In economics, the standard vision of the agent either does not differentiate
between knowledge and structure, or else does so in an implicit manner. Decision
theory represents agents as possessing information on the state of their
environment. Now, this information might be assimilated with knowledge, but it
certainly does not constitute know-how, in B. Walliser's sense of this term: a
know-how that is explanatory in nature, derived from the causal inference operations
that an agent carries out (Walliser 2000, chapter 2). Although knowledge is
present in economic models, it is usually manifested through the way in which
the agent's possible acts are structured, i.e. through the model's causal meaning
(information on the states → acts → consequences for the states). It remains that
this refers to a capacity (or an endowment) that has been ascribed to the agent by
the builder of the model, who is using a stable structure - hence our description of
this phenomenon as something that is implied. For these reasons, most economic
models involve information alone. This is not really a surprise, given that the
implied central question is one of co-ordination (which often has more of a
quantitative than a structural definition).
In the end, the dilemma we are facing here stems from the choice we have to
make between a conception of the agent in which s/he allegedly reasons in terms of
"natural states", and another in which s/he supposedly develops plans and fits
into the world in a way that reflects existing causal relationships. In the former
case, the causality is defined once and for all in the model, and the agent reasons
in terms of his/her acts instead of as someone who is getting physically involved
in the world. S/he can play upon the intensity of these acts but cannot modify
their structure. In the latter case, the door is open to entire sequences of actions;
to action gambits; to an increase in available information during the time it takes
to carry out a plan, etc. Although economists generally remain relatively close to
representations in which the agent's operational structure is both implicit and
invariable, the increased utilisation of certain kinds of sophisticated simulation
tools could lead to a change in attitudes.
There are two ways to obtain information: via the outcome of an action that
has been undertaken for reasons other than the acquisition of information; or via
the outcome of an action that is specifically designed to obtain information. We
call the former joined information, and the latter non-joined or free
information. A hybrid is possible when the action that has been decided upon is
altered to make it possible to achieve a "physical" goal and at the same time to
obtain the information that is being sought. To a certain extent, these are test
CONCLUSION
Through a wide array of studies that are highly diverse both in terms of the
problems with which they deal and also as regards the theoretical or
epistemological options they favor, cognitive economics, which first made its
appearance in the 1960s, now absorbs a great deal of economic research
resources. It is entirely legitimate to raise questions about the contents and/or
timing of the proposal of a cognitive economics research program. On one hand,
this is a way of providing this field with some structure, thus clarifying its issues
and orientations. On the other hand, we are now able to move towards a
conceptual clarification and discussion of the epistemological foundations of the
research that is being conducted in this field. We have underlined some of the issues
at stake in this sort of clarification, focusing specifically on problems pertaining
to the speeds at which knowledge or real interactions actually adjust.
The dilemmas that we have emphasized here all provide us with an
opportunity to highlight a certain number of significant alternatives. Specifically:
(i) issues relating to the respective roles of the knowledge economy and of
cognitive economics stricto sensu; (ii) the difference between the computable
orientation that is involved in standard approaches to economic behaviors and the
connectionist orientation that is illustrated, for example, by the evolutionary
conception of the firm; (iii) the problem of the relationship between a cognitive
economics research program and one that relates to cognitive sciences.
We have presented an outline of the potential foundations for a cognitive
economics research program. We would like to particularly emphasize the fact
that, following in the footsteps of the cognitive sciences, this kind of program
would manifest itself through potentially divergent schools of thought whose
disparities should all be seen as factors of dynamism that could be used to drive
research in this area. We would also like to point out the significant benefits that
economists would derive from this endeavour if they were to draw greater
inspiration from a number of studies that have been undertaken in the field of
cognitive sciences, most of which could lead to an extension of their current
approaches.
NOTES
1. In actual fact, we have to go all the way back to Barone (1908) and Pareto (1897) to identify the
opening salvoes in this debate.
2. The limits of the period we are analyzing are, on one hand, the immediate aftermath of the War,
when the "first" cybernetic appeared (one that in all likelihood contributed, at least in part albeit
not for economists, to a modification of the meaning given to a certain number of phenomena);
and on the other hand the 1980s, a period during which the evolutionary school of thought
developed considerably, this being an area within which the cognitive plays a crucial role,
alongside theories of endogenous growth and the more macro-economic approaches to
economics that are said to be based on "science" or knowledge. From the 1980s onwards, the
importance of "cognitive" economics was so obvious to everyone that no one really wanted (or
felt the need) to illustrate it any more.
3. This theme was already present in C. Barnard and H.A. Simon, but had not yet truly penetrated
economist circles.
4. This enhancement disappeared from Nelson and Winter's theory following the appearance of
the truce hypothesis, something that clearly demonstrated, in certain respects at least, the retreat
that this theory represents when compared with theories of the firm during the 1960s and 1970s.
However, the different level of analysis can also explain this.
5. The same assertion lies at the heart of analyses of cognitive capitalism and of "new enclosures"
(Paulré 2001a).
6. Previously published articles on learning by doing had been micro-economic in nature. Learning
or experience curves were nothing more than reduced forms; moreover, they were not based on
any analysis of the conditions in which knowledge actually arises. The trailblazer article on
learning was the one that A. Alchian wrote in 1949.
7. Clearly this is not a new problem, and it is generally considered that Berle and Means first
brought it up in 1932. Its more distant forerunners can be found in profit literature, the most
interesting example being F. Knight (1921).
8. We are alluding to intelligence as it is defined in DAI (Distributed Artificial Intelligence).
9. The texts compiled in this volume seem to have been written in the 1930s.
10. Note in addition a compilation of articles published in 1981 under the name of Information and
Co-ordination, in which A. Leijonhufvud develops the idea that failures in macro-economic co-
ordination result from the problems agents have in correctly perceiving all of the opportunities
that are present in the system.
11. Of course, Savage's book was preceded by von Neumann and Morgenstern's Theory of Games
(1944).
12. Lucas argues in favour of being able to make a distinction between economic adjustment on one
hand and the learning processes by which agents discover the properties of their environment or
of the situation with which they are confronted. At the very least, the two have
very different rates of adjustment. This is diametrically opposed to the philosophy that K. Arrow
expresses in his 1958 article. For an approach that encompasses alterations in the rules of
decision-making, cf. R. Nelson and S. Winter (1982, chapter 7).
13. The sum total of the states of the world is called the "universal state". By definition, this event is
always enacted.
14.In "hard" sciences, the stability of the object concerning which knowledge is being accumulated
feeds the hope that this can be done on a step-by-step basis. However, the existence of differing
paradigms suggests that even in this field knowledge does not accumulate in a linear fashion (cf.
T. Kuhn).
15. According to F. Varela, the second phase of what he calls STC (the Sciences and Technologies
of Cognition), dates from 1956, the year two major conferences took place, one at Cambridge
and the other at Dartmouth. As the first phase corresponds to cybernetics, I believe that 1956 is
a more suitable year of birth for Cognitive Sciences, even if a certain number of theses or
techniques date from the 1940s.
16. For example, in "Progrès en situation d'incertitude", the leading article in one special issue of
Le Débat (1987), D. Andler states that "three groups of disciplines co-exist under the cognitive
banner". Yet the only social sciences to be mentioned are "(cognitive) anthropology and
(cognitive) ergonomics", whereas in the same issue we find an article by D. Sperber entitled
"Les sciences cognitives, sciences sociales et matérialisme". Inversely, in Introduction aux
sciences cognitives, published under the supervision of D. Andler, we find four articles on social
sciences.
17. One attempt took place in France based on the production of a work group that had been run by
P. Petit as part of a C.N.R.S research project. A report that was co-signed by B. Munier and by
A. Orléan was produced on this occasion.
18. The partition principle can be found in epistemic logic whenever certain axioms are
satisfied simultaneously.
19. These are strictly determined games in which one of the players is equipped with a winning strategy.
20. We choose not to discuss the issues at stake in this selection process at present (i.e., the firm or
the routine?), nor the modalities of this selection. This is a problem that can be explained by the
biological and Darwinian orientation of the analytical framework that R. Nelson & S. Winter
used - even though they quite correctly denied exploiting this metaphor. This issue can be left to
one side for the moment.
REFERENCES
Akerlof G. A. (1970), "The market for lemons: quality and the market mechanism", Quarterly
Journal of Economics, 84.
Hamel G., Prahalad C.K. (1990), "The Core Competence of the Corporation", Harvard Business
Review.
Hart O. (1995), Firms, contracts, and financial structure, Oxford: Oxford University Press.
Hart O., Moore J. (1990), "Property rights and the nature of the firm", Journal of Political
Economy, 98.
Hayek F. A. (ed.) (1935), Collectivist Economic Planning, Clifton. Reprint : Augustus M. Kelley,
1975.
Hayek F. A. (1945), "The Use of Knowledge in Society", American Economic Review, 35.
Hintikka J. (1962), Knowledge and Belief, Cornell University Press.
Houthakker H. S. (1959), "Education and Income", Review of Economics and Statistics.
Hurwicz L. (1972), "On Informationally Decentralized Systems", in Radner R., McGuire C. B.
(eds.) Decision and Organization (Volume in Honour of Jacob Marschak), North-Holland.
Jensen M. C., Meckling W. H. (1992), "Specific and general knowledge and organizational
structure", in Werin, L. Wijkander, H. (eds.), Contract Economics, Basil Blackwell.
Jensen M., Meckling W. (1976), "Theory of the firm: managerial behavior, agency costs, and
ownership structure", Journal of Financial Economics.
Kahneman D., Slovic P., Tversky A. (1982), Judgment under uncertainty: Heuristics and Biases,
Cambridge University Press.
Kaldor N. (1934), "The equilibrium of the firm", Economic Journal.
Kirman A. (1992), "Variety: the Coexistence of Techniques", Revue d'Économie Industrielle, 62.
Kirman A. (1997), "Interaction and markets", Southern European Economics Discussion Series, 166.
Kirman A. (1999), « Quelques réflexions à propos du point de vue des économistes sur le rôle de la
structure organisationnelle dans l'économie », Revue d'Économie Industrielle, 88.
Klein B. H., Meckling W. H. (1958), "An application of operations research to development
decisions", Operations Research, 6.
Klein B. H. (1958), "A radical proposal for R&D", Fortune, 57(5).
Knight F. H. (1921), Risk, Uncertainty and Profit, Houghton Mifflin.
Kuhn T.S. (1970), The Structure of Scientific Revolutions, University of Chicago Press.
Lamberton D. M. (1971), Economics of Information and Knowledge, Penguin, Harmondsworth.
Lamberton D. M. (1993), "The information economy re-visited", in Babe R. (ed.), Information and
Communication in Economics, Kluwer Academic, Dordrecht.
Lange O. (1936), "On the Economic Theory of Socialism", Review of Economic Studies, 4.
Lange O. (1964), Introduction to Economic Cybernetics, Pergamon.
Leijonhufvud A. (1968), On Keynesian Economics and the Economics of Keynes: A Study in
Monetary Theory, Oxford University Press.
Leijonhufvud A. (1981), Information and Coordination. Essays in Macroeconomic Theory, Oxford
University Press.
Leijonhufvud A. (1993), "Towards a Not-Too-Rational Macroeconomics", Southern Economic
Journal, 60.
Lesourne J. (1991), Économie de l'ordre et du désordre, Economica.
Lesourne J., Orléan A. (eds.) (1998), Self-Organization and Evolutionary Economics, Economica.
Lucas R. E. Jr (1986), "Adaptive Behavior and Economic Theory", The Journal of Business,
supplement to the October issue. Reprinted in Hogarth R. M., Reder M. W. (eds.) (1987),
Rational Choice: The Contrast between Economics and Psychology, The University of Chicago
Press.
Machlup F. (1962), The production and distribution of knowledge in the United States, Princeton
University Press.
Mansfield E. (1961), "Technical change and the rate of imitation", Econometrica, October.
March J. G., Simon H. A. (1958), Organizations, J. Wiley.
Marschak J., Radner R. (1972), The Theory of Teams, New Haven: Yale University Press.
Marshall A. (1890-1920), Principles of Economics, Macmillan.
Milgrom P., Roberts J. (1982), "Limit pricing and entry under incomplete information: An
equilibrium analysis", Econometrica, 50.
Milgrom P., Roberts J. (1986), "Price and advertising signals of product quality", Journal of
Political Economy, 94.
Milgrom P., Roberts J. (1995), "Complementarities and fit: strategy, structure, and organizational
change in manufacturing", Journal of Accounting and Economics, 19.
Milgrom P., Roberts J. (1990), "The Economics of Modern Manufacturing: Technology, Strategy
and Organization", American Economic Review, 80.
Miller H. P. (1960), "Annual and Lifetime Income in Relation to Education", American Economic
Review.
Munier B. (1997), "La rationalité face au risque : de l'économie à la psychologie cognitive", in
Roland-Levy C., Adair P. (eds.), Psychologie économique, Economica.
Muth J. F. (1961), "Rational expectations and the theory of price movements", Econometrica, 29.
Nelson R. R. (1959), "The simple economics of basic scientific research", Journal of Political
Economy, 21.
Nelson R. R., Winter S. (1982), An Evolutionary Theory of Economic Change, Cambridge, Mass.:
Harvard University Press.
Neumann J. von, Morgenstern O. (1944), Theory of Games, J. Wiley.
Osborne M. J., Rubinstein A. (1994), A course in game theory, M.I.T. Press.
Pareto V. (1896-1897), Cours d'économie politique, Lausanne, F. Rouge, 2 vols.
Paulré B. et alii (collectif ISYS) (2001a), "Le capitalisme cognitif comme sortie de la crise du
capitalisme industriel", communication au Forum de la régulation, École Normale Supérieure,
September.
Paulré B. (2001b), Préface à Azaïs C., Corsani A., Dieuaide P. (eds.), Vers un capitalisme cognitif,
L'Harmattan.
Paulré B. (2000), "L'auto-organisation comme objet et comme stratégie de recherche", in Décision,
Prospective et Auto-organisation, Mélanges en l'honneur de J. Lesourne, Dunod.
Penrose E. T. (1959), Theory of the Growth of the Firm, Basil Blackwell.
Porat M. (1977), The Information Economy, Special Publication, Department of Commerce,
Washington D.C.
Porat M. (1978), "Global implications of the information society", Journal of Communication,
28(1).
Rabin M. O. (1957), "Effective Computability of Winning Strategies", in Contributions to the
Theory of Games, Dresher M. D. et al. (eds.), Annals of Mathematics Studies, 39.
Reinganum J., Wilde L. (1986), "Settlement, Litigation and the Allocation of Litigation Costs",
Rand Journal of Economics, 17.
Rees A. (1966), "Information networks in labor markets", American Economic Review, 56(2).
Richardson G. B. (1960), Information and Investment, Oxford University Press.
Richardson G. B. (1972), "The organization of industry", Economic Journal, 82.
Robinson A. (1934), "The problem of management and the size of the firm", Economic Journal.
Romer P. (1986), "Increasing Returns and Long-Run Growth", Journal of Political Economy.
Romer P. (1990), "Endogenous Technological Change", Journal of Political Economy.
Ross S. (1973), "The economic theory of agency: The principal's problem", American Economic
Review, 63.
Rothschild M., Stiglitz J. (1976), "Equilibrium in competitive insurance markets: an essay on the
economics of imperfect information", The Quarterly Journal of Economics, 90.
Savage L. J. (1954), The Foundations of Statistics, J. Wiley.
INTRODUCTION
France's economic recovery since 1997 has been accompanied by strong job
creation and by a significant drop in unemployment. This does not mean however
that there has been any real reduction in the number of people working under
what has come to be known as "atypical forms of employment". Quite the
contrary, the number of persons in temporary employment (with fixed-term
contracts, hereafter FTC, or doing temporary agency work) has never been as
high. There has been an unprecedented rise in part-time work in France,
something that coincides nowadays with the ever-increasing number of female
entrants into the job market. In an Employment Survey carried out by the French
National Statistics Office (INSEE), part-time jobs represented 16.8% of the
country's employed working population, and temporary jobs (temporary work
and FTC) 6.3%. Furthermore, since 1994, there has been greater growth in
temporary work than in FTC.
[Figure: share of temporary work, FTC and part-time employment in France, 1982-2000 (source: INSEE)]
constraints that are harder to deal with than is the case when the person involved
benefits from an open-ended contract and a full-time employment status.
Neuronal techniques such as Kohonen maps were used throughout the study
to segment groups of employees according to available quantitative variables,
before linking the category variable that is defined in this manner with informed
qualitative variables. We would like to use the present article to present an
alternative to this technique, proposing a method that makes it possible to
segment individuals by qualitative variables, even though this particular
segmentation will later be crossed with available quantitative variables.
39 columns featuring 1s or 0s and 5 columns of real data (see the appendix for
additional information on the survey).
Heading Name
Minimum duration of actual workweek DMIN
Maximum duration of actual workweek DMAX
Theoretical duration of workweek DTHEO
Number of overtime hours worked per week HSUP
Number of hours of extended shift work/week HPROL
A simple cross-analysis of the variables reveals right away that men only
represent 10% of all part-time employees working on an OEC basis and 18% of
all part-time employees on an FTC. Moreover, even though forced (and therefore
involuntary) part-time work accounts for nearly 50% of all employment
contracts, it only represents 43% of the OEC, versus nearly 80% of FTC. Note
that 83% of all contracts are OEC.
After a cursory study of these descriptive statistics (we will not be delving any
further into them at present; see the appendix for some elements thereof), we are now
going to carry out a segmentation of the individuals represented by the
14 qualitative variables defined above, as well as their 39 modalities. Towards this
end, we will be defining a new method, one that is based on the Kohonen
algorithm, but which enables an analysis of complete disjunctive tables.
This means that a priori we have defined a concept that accounts for a
neighbourhood between classes. It also means that neighbouring observations in
the data space of dimension p will belong (once they have been classified) to the
same class or to neighbouring classes. The use of this algorithm is justified by the
fact that it enables a regrouping of individuals into small classes whose
neighbourhood is meaningful (unlike a hierarchical classification or a moving
centres algorithm), and that they themselves can then be dynamically regrouped
into super classes, preserving all the while the relationships of neighbourhood
that have been detected. The visual representation of the classes is therefore easy
to interpret, inasmuch as it occurs at a global level. Inversely, visual
representations obtained through the use of classical projection methods are
incomplete, as it becomes necessary to consult a number of successive
projections in order to derive any reliable conclusions.
Following on from this, we assume that our readers are familiar with this
algorithm (see inter alia Cottrell, Fort, Pagès 1998).
Given that an arbitrary number of classes is chosen (it is often high because
we frequently select grids of 8 by 8 or 10 by 10), we can reduce the number of
classes, regrouping them by subjecting the code vectors to a classical hierarchical
classification. We can then colour the class groups (called super classes) to
enhance their visibility. Generally we observe that such super classes only
regroup contiguous classes. This can be explained by one of the algorithm's
properties, i.e., by the fact that the Kohonen algorithm respects the topology of
the data space. Non-compliance with this property would indicate a poorly
organised map.
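As a sketch of this regrouping step, assuming the code vectors are stored in a NumPy array (the Ward aggregation criterion is our assumption; the text does not name the rule used):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def super_classes(code_vectors, n_super=10):
    """Regroup the Kohonen code vectors into super classes by a
    classical hierarchical classification."""
    Z = linkage(code_vectors, method="ward")
    # labels[u] gives the super class of Kohonen unit u
    return fcluster(Z, t=n_super, criterion="maxclust")

# e.g., for an 8 x 8 map: labels = super_classes(codes, n_super=10)
# if the map respects the topology of the data space, units in the same
# super class should form contiguous zones on the grid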
The present paper introduces a method that has been adapted to qualitative
variables, and which also enables a simultaneous processing of individuals and of
modalities.
We work on the complete disjunctive table, called D. Note that it contains all of the
information that will enable us to include individuals as well as the modalities'
distribution.
We write d_ij for the general term of this table. D can be equated to a
contingency table that crosses an "individual" variable with N modalities and a
"modality" variable with M modalities. The term d_ij takes its values in {0, 1}. We
denote the margins by $d_{i.} = \sum_{j=1}^{M} d_{ij}$ and $d_{.j} = \sum_{i=1}^{N} d_{ij}$.
In order to use a $\chi^2$-distance along the rows as well as down the columns, and
to weight the modalities proportionately to the size of each sample, we adjust the
complete disjunctive table and put:
$$d^{c}_{ij} = \frac{d_{ij}}{\sqrt{d_{i.}\, d_{.j}}}$$
When adjusted in this way, the table is called D^c (adjusted disjunctive table). This
transformation is the same as the one that Ibbou proposes in his thesis (Ibbou
1998; Cottrell, Ibbou, 1995).
These adjustments are exactly the same as the ones that correspondence
analysis entails. This is in fact a weighted principal component analysis that uses
the $\chi^2$ distance simultaneously along the row and column profiles. It is
the equivalent of a principal components analysis of the data adjusted
in this way.
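A minimal sketch of this adjustment in Python (array and function names are ours):

import numpy as np

def adjust_disjunctive(D):
    """D is the complete disjunctive table (N individuals x M modalities,
    entries 0/1). Returns the adjusted table with general term
    d_ij / sqrt(d_i. * d_.j)."""
    D = np.asarray(D, dtype=float)
    row_sums = D.sum(axis=1, keepdims=True)  # d_i. (equal to the number of variables)
    col_sums = D.sum(axis=0, keepdims=True)  # d_.j (the size of each modality)
    return D / np.sqrt(row_sums * col_sums)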
We then choose a Kohonen network, and associate with each unit a code
vector comprised of (M + N) components, with the first M components
evolving in the space of individuals (represented by the rows of D^c) and the last N
components in the space of modalities (represented by the columns of D^c).
The Kohonen algorithm lends itself to a double learning process: at each stage,
we alternately draw a D^c row (i.e., an individual) or a column (i.e., a
modality).
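The following simplified sketch conveys this alternation. It is not the authors' exact KDISJ implementation (see Ibbou 1998 for that); in particular, each draw here only updates the half of the code vectors that matches it, and the learning-rate and neighbourhood schedules are illustrative:

import numpy as np

def double_learning(Dc, grid=(8, 8), n_iter=20000, lr0=0.5, seed=0):
    """Alternating Kohonen learning on the adjusted disjunctive table Dc
    (N individuals x M modalities). Each unit holds a code vector of
    M + N components: the first M in the space of individuals (rows of
    Dc), the last N in the space of modalities (columns of Dc)."""
    rng = np.random.default_rng(seed)
    N, M = Dc.shape
    n_units = grid[0] * grid[1]
    codes = 0.01 * rng.standard_normal((n_units, M + N))
    coords = np.array([(u // grid[1], u % grid[1]) for u in range(n_units)])
    for t in range(n_iter):
        lr = lr0 * (1 - t / n_iter)                 # decreasing learning rate
        radius = max(1, int(3 * (1 - t / n_iter)))  # shrinking neighbourhood
        if t % 2 == 0:                              # draw a row (an individual)
            x = Dc[rng.integers(N)]
            part = slice(0, M)
        else:                                       # draw a column (a modality)
            x = Dc[:, rng.integers(M)]
            part = slice(M, M + N)
        win = np.argmin(np.linalg.norm(codes[:, part] - x, axis=1))
        # update the winner and its grid neighbours on the matching half
        near = np.abs(coords - coords[win]).max(axis=1) <= radius
        codes[near, part] += lr * (x - codes[near, part])
    return codes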
When we are not trying to classify individuals but only modalities, we can use
another algorithm that draws its inspiration from the original Kohonen algorithm.
This is called KMCA. We can then classify individuals as if they were additional
data (for definitions and applications, see inter alia Ibbou's thesis, Ibbou, 1998).
We can also classify individuals alone, and then classify as additional data the
"virtual individuals" associated with the modalities, calculated
from the rows of the Burt matrix. Finally we can classify modalities alone (as is
the case with KMCA) and classify individuals subsequently, once they have been
properly normalised. This is what Ibbou called KMCA1 and KMCA2. These
methods generate findings that are very comparable to those obtained
with KDISJ, but they do require a few more iterations.
2. THE CLASSIFICATION
[Figure: Kohonen map of the classification, showing for each class its modalities and the number of individuals assigned to it]
Note: The squares in gray feature a much higher percentage of OEC than the total population does.
Note how modalities and individuals are distributed amongst the various
classes in a relatively balanced fashion. Fixed term contracts are mostly found to
the left of the map. Remember that they only represent 17% of all contracts.
The modalities that correspond to the best working conditions (in other
words, and for the purposes of the present paper, to more regular working times;
to no night-time, Saturday or Sunday shifts; to open-ended contracts; and to
voluntary part-time status) are associated with all age brackets, except for young
persons, and are found to the bottom right. These correspond to relatively
favourable work situations. Inversely, the young persons modality is located to
the top right, and is associated with "unpleasant" modalities such as night shifts,
Sunday shifts, no chance to take any time off etc.
The modality for women (who are present everywhere and who constitute the
vast majority of the total population, to wit 88%) is close to the centre of the map
and associated with the involuntary part-time modality that is close to the FTC
modality.
Modality   1    2    3    4    5    6    7    8    9   10  Total
OEC       99   40  100   92   47   92   77   94   88   93   83
FTC        1   60    0    8   53    8   23    6   12    7   17
MAN        0   54    2    1   14   13   16   10   12    7   12
FEM      100   46   98   99   86   87   84   90   88   93   88
AGE1       0    0    0    0  100    0    5    0    0    0    6
AGE2      36   51   43   45    0   26   65   39   39   39   40
AGE3      42   22   36   32    0   39    9   44   29   43   31
AGE4      23   27   22   22    0   34   21   17   32   18   22
HORIDE    52   61   48   75   35   26   49   29   29    0   52
HORPOS     0    0    0    0    0    0   12    0    8  100    4
HORVAR    48   39   52   25   65   74   40   71   71    0   44
JWK1      96   83   89   91   78   79   47   39   68   57   79
JWK2       4   17   11    9   22   21   53   61   32   43   21
NITE1    100   92   95   99   86   95   51   64  100   75   90
NITE2      0    5    4    1   10    5   21   27    0   21    7
NITE3      0    3    1    0    4    0   28    9    0    4    3
SAT1      88   57    3   80   29   53    5    2   34   21   49
SAT2       6   23    5   10   14   26    5   91   27   68   23
SAT3       6   20   92   10   57   21   90    7   39   11   28
SUN1      96   88   82   99   69   87    0   13   83   57   76
SUN2       4   12   18    1   18   10    2   87   17   43   18
SUN3       0    0    0    0   13    3   98    0    0    0    6
WED1      41   13   21   33   10   16   14   11   15   14   23
WED2      14    9   10    9   18    8   12   46   19   43   16
WED3      45   78   69   58   72   76   74   43   66   43   61
ABS1      70   81   72   71   67   82   58   77   73   75   73
ABS2      24    9    0   21    6   13   16    8   10   18   14
ABS3       6   10   28    8   27    5   26   15   17    7   13
DET1       0   81   90   75   88   37   72   68    0   78   63
DET2       0    5    0   25    2    8   14   12    0    7   11
DET3     100   14   10    0    4   45    5   13    0   11   19
DET4       0    0    0    0    6   11    9    6  100    3    7
INVOL     12   80   74   44   82   53   60   34   46   21   50
VOL       88   20   26   56   18   47   40   66   54   79   50
LEND1    100  100  100  100  100    5   95   98  100  100   95
LEND2      0    0    0    0    0   95    5    2    0    0    5
RECUP0    44   57   40   67   65   34   44   32   37   61   52
RECUP1    34   20   31   19   23   29   40   43   39   25   28
RECUP2    22   23   29   14   12   37   16   25   24   14   20
Note: The numbers written in bold font correspond to particularly high values and those in italics
to particularly low values.
We can verify that in most cases, the modalities find themselves either within
or else close to one of the classes where they have a significant role to play. We
can check this by calculating each modality's deviation for each of the 10 super
classes 1. As can be expected, such deviations are positive 85% of the time.
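A sketch of this deviation calculation (the definition is recalled in note 1; array names are ours):

import numpy as np

def modality_deviations(D, labels, n_super=10):
    """For each modality m and super class k, return the observed number
    of individuals carrying m in k minus the "theoretical" number
    n_m * n_k / n (see note 1). D is the complete disjunctive table;
    labels[i] is the super class (0..n_super-1) of individual i."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    n = D.shape[0]
    n_m = D.sum(axis=0)                    # individuals per modality
    dev = np.empty((D.shape[1], n_super))
    for k in range(n_super):
        in_k = labels == k
        observed = D[in_k].sum(axis=0)     # modality counts inside class k
        dev[:, k] = observed - n_m * in_k.sum() / n
    return dev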
Variable    1     2     3     4     5     6     7     8     9    10   Total
DMIN     27.1  23.7  24.4  25.4  21.8  24.3  22.1  23.2  24.4  24.9   24.5
DMAX     29.1  27.7  27.4  27.2  24.1  29.5  32.2  32.3  28.9  32.0   28.5
DTHEO    27.0  24.4  24.5  25.7  22.3  24.8  25.4  25.8  25.1  27.1   25.4
HSUP     0.69  1.8   3.45  0.95  1.16  1.82  2     1.48  1.78  1.36   1.51
HPROL    1.65  1.97  1.54  0.78  0.75  2.08  2.53  2     2.71  0.86   1.5
[Figure: the five quantitative variables (DMIN, DMAX, DTHEO, HSUP, HPROL) across the 10 super classes]
Table 6. Typology
On the super-class representation, it is clear that the FTC and OEC
modalities are distinct and separate (class 2 and class 4), as are men and women.
As expected, women are associated with involuntary part-time work. The
"voluntary part-time" modality can be found in class 1, near the OEC modality.
Class 4 features the modalities that correspond to "normal" working conditions,
with all ages being represented except for young persons.
For a more exhaustive summary, the reader can refer to the report edited
by Cottrell, Letrémy, Macaire et al. (2001).
CONCLUSION
In analysing the responses given to the question "Would you like to work
more?", we learn that the more atypical the contract, the more employees would
prefer to work more, as long as the increase in pay is proportional to the increase
in the number of hours they work. Amongst atypical jobs, it is primarily part-
time workers on fixed term contracts (and temporary workers, albeit to a lesser
extent) who would like to work more. The same opposition between part-time
and full-time work can be found in responses to questions relating to the desire to
work less: unsurprisingly it is the part-timers who are less in favour of working
fewer hours. Furthermore, those who are out looking for a new job are basically
part-time employees on a fixed term contract (more than 40%) and temporary
workers (more than 50%).
APPENDIX
The data we used comes from the latest INSEE Timetable survey, the fourth of its kind (the
previous one having been carried out in 1985-1986). It ran from February 1998 to February 1999 in
8 successive survey waves. Focusing on French lifestyle and working patterns, the full study looked
at compensated professional working times, and more specifically at people's working times in
their "main current occupation". The sample is comprised of the only salaried population that can
provide comprehensive data on its professional working times. Teachers (who often make
incoherent statements about their working times, equating them with contact hours alone) and other
abnormal cases were taken out of the sample. 1,153 individuals were eliminated thusly, leaving a
database of 5,558 wage-earning individuals.
When this sample is linked to data from the INSEE's 1998 2 or 1999 Employment surveys, no
major difference is detected between the two in percentage terms. If we structure the data according
to the type of work (full-time OEC, part-time OEC, full-time FTC, part-time FTC, temporary
workers, other), we come up with two very similar distributions (see table 7 below).
Table 7. Distribution of sample according to form of employment in the 1998 INSEE Timetable
and Job surveys
NOTES
1. The deviation for a modality m (shared by n_m individuals) and for a class k (with n_k individuals)
can be calculated as the difference between the number of individuals who possess this modality
and belong to the class k and the "theoretical" number $n_m n_k / n$, which would correspond to a
distribution of the modality m in the class k that matches its distribution throughout the total
population.
2. Source: Employment Survey 1998, INSEE findings, n° 141-142, 1998, 197 pages.
3. Except for non-tenured State and local authority employees.
4. TC except for State and local authority officials, + non-tenured State and local authority
employees.
REFERENCES
Boisard P., Fermanian J.-D. (1999), "Les rythmes de travail hors normes", Economie et Statistique,
321-322 (112), 111-132.
Bue J., Rougerie C. (1998), "L'organisation du travail : entre contraintes et initiative - résultats de
l'enquête Conditions de travail de 1998", Premières Synthèses, DARES, 32(1), 99.08.
Bue J., Rougerie C. (1999), "L'organisation des horaires : un état des lieux en mars 1998",
Premières Synthèses, 99(07), 30.01, 8 pages.
Bue J., Rougerie C. (2000), "L'organisation des horaires : un état des lieux en mars 1998", Les
Dossiers de la Dares, 1-2, 9-15.
Cottrell M., Fort J.-C., Pagès G. (1998), "Theoretical aspects of the SOM Algorithm",
Neurocomputing, 21, 119-138.
Cottrell M., Ibbou S. (1995), "Multiple correspondence analysis of a cross-tabulation matrix using
the Kohonen algorithm", in Verleysen M. (ed.), Proc. ESANN '95, D Facto, Bruxelles, 27-32.
Cottrell M., Letrémy P., Macaire, Meilland, Michon (2001), Les heures de travail des formes
particulières d'emploi. Rapport final, IRES, February, Noisy-le-Grand, France.
Cottrell M., Letrémy P., Roy E. (1993), "Analysing a contingency table with Kohonen maps: a
Factorial Correspondence Analysis", Proc. IWANN'93, in Cabestany J., Mira J., Prieto A. (eds.)
(1993), Lecture Notes in Computer Science, Springer-Verlag, 305-311.
Cottrell M., Rousset P. (1997), "The Kohonen algorithm: a powerful tool for analysing and
representing multidimensional quantitative and qualitative data", Proc. IWANN'97, Lanzarote.
Freyssinet J. in G. Cette (1999), "Le temps partiel en France", Paris, La Documentation française
(collection "Les rapports du Conseil d'Analyse économique").
Galtié B. (1998), Les emplois des salariés à temps partiel dans le secteur privé - Diversité des emplois
et des conditions de travail, Conseil Supérieur de l'Emploi, des Revenus et des Coûts, 98(03).
Gollac M., Volkoff S. (2000), Les conditions de travail, collection "Reperes", La Decouverte,
Paris.
Ibbou S. (1998), « Classification, analyse des correspondances et méthodes neuronales », Doctoral
thesis, Université Paris 1.
Kaski S. (1997), "Data Exploration Using Self-Organizing Maps", Acta Polytechnica Scandinavica,
82.
Kohonen T. (1993), Self-Organization and Associative Memory, 3rd ed., Springer.
Kohonen T. (1995), "Self-Organizing Maps ", Springer Series in Information Sciences, 30,
Springer.
Abstract: The present paper analyses current employment and work policies in French
establishments on the basis of the REPONSE survey that was conducted in 1998. By
employment and work policy we mean a parallel study of customary employment
relationship characteristics as well as work organisation practices. Our study is
rooted in several employment policy variables as well as variables relating to work
organisation. The methodology used is based on two complementary analytical
tools: multiple correspondence analysis (MCA); and Kohonen's neuronal algorithm
(KMCA). After an exploratory study our interpretations are complemented by the
construction of a typology.
INTRODUCTION
Our study is based on the 1998 REPONSE survey. We define the varying
forms of production organisation not only by the human resource management
(HRM) practices they encompass but also by their modes of work organisation.
These two poles are in fact highly complementary, and even inseparable when it
comes to defining a firm's policy towards its employees. Our analysis therefore
focuses on employment policy variables, i.e., the extent to which firms make use
of part-time work, fixed term contracts (hereafter FTC) and temporary work; the
presence of wage increases and negotiating systems; and spending on training, on
one hand; and work organisation variables, such as the use of forms of collective
work, flatter hierarchies or employee mobility, on the other.
2. OVERALL ANALYSIS
2.1 MCA
The first axis (11% of the total inertia) is built around variables such as wages
(POLSAL, INTERES), training (DEPFORM) and negotiation policy (NEGSL98
and AUTNEGR). The specificity of the organisational forms that the
establishments implemented can be detected when this axis is analysed, even
though it does not particularly stand out (only NVORGA and SUPNIV manifest
themselves to a significant extent). Note that the four modalities of the
DEPFORM variable are distributed uniformly, and that they constitute an axis
which is almost parallel to axis 1.
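For readers wishing to reproduce this kind of output, the following minimal sketch (ours, not the authors' code) extracts the axes and their inertia shares from the complete disjunctive table via the standardized-residuals SVD that underlies correspondence analysis:

import numpy as np

def mca_axes(Z, n_axes=3):
    """Z is the complete disjunctive table (individuals x modalities).
    Returns the share of inertia carried by the first axes and the
    principal coordinates of the modalities on them."""
    Z = np.asarray(Z, dtype=float)
    P = Z / Z.sum()                                     # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)                 # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = s**2 / (s**2).sum()                       # share of inertia per axis
    coords = (Vt.T * s) / np.sqrt(c)[:, None]           # modality coordinates
    return inertia[:n_axes], coords[:, :n_axes]

The percentages quoted in the text (11%, 7%, 6%) correspond to the first entries of such an inertia vector.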
More generally, axis 1 contrasts "restrictive" and "voluntarist" workforce and
work organisation policies. Indeed, on the right hand side we note an absence of
wage bargaining, negotiations on any other issues, wage hikes or profit-sharing
The second axis (7% of the total inertia) is built around variables that relate to
the forms of employment (TPART and PRECA) and to the other variables which
describe the work organisation (this time around ORDRES and MAJMO). It
contrasts policies based on workers' lesser mobility between workstations, a
prescription of work through the setting of objectives and a relatively widespread
use of part-time contracts (situated to the top) with diametrically opposed
behaviours (situated below). Hence the factorial representation (1,2), which
reveals four types of behaviours. Amongst the "restrictive" policies we
distinguish behaviours in the north-east quadrant, characterised by work that
involves very little mobility and by the presence of a large number of part-timers,
with behaviours that ally themselves to the "restrictive" policies found in the
south west quadrant, where work is prescribed through specific tasks and very
few part-timers are employed. "Voluntarist" behaviours include an opposition
between a major use of part-timers and a work prescription that is defined by
overall objectives, versus a great deal of work mobility and a large number of
fixed term jobs.
Axis 3 (6% of the inertia) is built around employment policy variables
(DEPFORM4 and PRECA1) and around certain work organisation modalities
(NVORGA1, MAJMO1). The northern part of the axis associates a relatively
insignificant recourse to fixed term contracts (FTC or temporary work) with a
high level of spending on training. This type of behaviour is also linked to the use
of a specific mode of work organisation, replete with highly mobile employees
and featuring the implementation of a wide array of forms of collective work.
The lower part of the axis is not very specific.
[Figure: projection of the modalities on the factorial plane of axes 1 and 2]
[Figure: projection of the modalities on the factorial plane of axes 1 and 3]
2.2 KMCA
A second contrast can be ascertained along the second diagonal (from the
northwest to the southeast). This relates to work organisation practices and to
wage hike and training modalities. One of these regroupings includes mixed or
general (across-the-board) wages hikes, an absence of profit-sharing
arrangements, non-flattened hierarchies, a work prescription that involves an
allocation of specific tasks, the non-implementation of forms of collective work
and lesser mobility for employees in their jobs 5. Inversely, the southeastern part
regroups policies featuring individualised wages, a team-based work
organisation, much employee mobility, the elimination of hierarchical levels and
a work prescription expressed in overall objectives. These practices go together
with major spending on training.
With the Kohonen map, we are able to summarise the main findings of the
MCA approach. Indeed, in reading this map we again encounter an analysis that
focuses on the axes' variation. The north-south opposition mostly corresponds to
information that structures the first factorial axis, whereas the
second diagonal cuts across information we could detect on axes 2 and 3. This
crossing of analytical sources is a precious tool for interpreting the
multidimensional phenomena we analyse.
Figure 4. Distance between cells and their closest neighbours
Note: we can regroup cells according to the distances that separate them. A breakdown into 3
classes regroups the 4 squares to the upper left into a first class, the 9 squares to the right (light
gray and darker) into a second class, and the rest into a third one. A breakdown into 5 classes
would make us split classes 2 and 3 in two.
Note: each cell contains the representation(s) of the vector codes that are associated with the
varying modalities, in the following order: Tpart1 Tpart2 Tpart3 Tpart4 Preca1 Preca2 Preca3
Nvorga1 Nvorga2 Nvorga3 Supniv1 Supniv2 Majmo1 Majmo2 Majmo3 Negsl981 Negsl982
Depform1 Depform2 Depform3 Depform4 Ordres1 Ordres2 Interes1 Interes2 Autnegr1
Autnegr2 Autnegr3 Polsal1 Polsal2 Polsal3.
nearly 200 additional variables. Note that at present we will only be summarising
the main conclusions of this analysis.
The other establishments pursue a policy that we can call voluntarist. This
second class is characterised by the fact that it frequently resorts to collective
work, active wage policies (with individualised pay hikes and profit-sharing
arrangements), frequent negotiations, a great deal of spending on training and a
frequent use of precarious forms of employment. Moreover, the work
organisation in these establishments is more or less geared towards polyvalence.
Two groups of establishments can be distinguished in this category however.
This distinction is based on their contrasting wage practices (general or
individualised hikes); the varying proportions of fixed term employment they
offer; and/or their recourse to collective work mechanisms.
On one hand, we have a group representing 33.5% of all establishments (and
45.1% of employees) that is comprised of large capital-intensive groups which
often belong to the industrial sector. Such firms employ a great number of
technicians and workers and feature very active training or pay policies. They
also seem to favour internal careers, meaning that the level they operate at is
closer to the traditional internal market model. Note however that these forms of
employment are accompanied in this group by policies that involve innovative
work organisation policies and production techniques. Moreover, the strong
reliance on temporary work or FTC contracts in this context means that we can
hypothesise a dualistic type of employment management, i.e., a renewed internal
market.
We distinguish another group, representing 16.5% of all establishments and
26.1% of employees, which like the one above is comprised of larger and
relatively older groups, but where such firms are less confined to the industrial
sector. The establishments here employ more managers, fewer workers, and just
as many technicians. Their work organisation and production methods are very
innovative. Spending on training is high and wage policies are based on
individualisation and on profit-sharing. They resort relatively infrequently to
fixed term contracts. All in all, we can call such work and employment policies
professionalised management. Employees' working and employment conditions
are relatively beneficial and people are very involved in (and associated with) the
firm's objectives. The work and employment organisation combines independence in
one's work with personalised motivation, and employees feel a great sense of
responsibility.
CONCLUSION
In terms of the data that was used in the present study, the Kohonen algorithm
turned out to be entirely complementary to traditional methods of data analysis.
An association of these two methods is particularly useful in a synthesis of
complex information, whether this involves an overall analysis or the creation of
a typology.
Regarding our interpretation of the transformations that have affected the
structure of the labor market over the past 20 years, the present study has enabled
us to advance two main conclusions. On one hand, we have been able to ascertain
a mode of production and work organisation that is close to the canonical model
of flexible production (involving individualised career management and
polyvalence in work organisation). Note that to a certain extent this has
developed to the detriment of the classical forms of the internal Fordist market,
and not in parallel to them (as shown by the emergence of a renewed ILM class).
Secondly, our analysis enables a precise study of the extent to which the labour
market segmentation schema has been globally incorporated. In addition to its
definition of a professionalised work organisation segment (class 5), it has
enabled us to differentiate three types of organisation within the entity that is
generally considered globally and called the secondary market (with classes 1, 2
and 3).
ACKNOWLEDGEMENTS
The present study was carried out under the auspices of a research agreement
between the DARES - France's Ministere de l'emploi et de la solidarite - and
MATISSE - Universite Paris I - covering a processing of the REPONSE 98
survey. See Lemiere S., Perraudin C. and Petit H. (2001) for a complete report of
this research. We would like to thank Marie Cottrell, Patrick Letremy,
Christophe Ramaux and Bernard Gazier for their advice and suggestions during
this project. We are particularly grateful to Patrick Letremy for having allowed
us to use his computer programmes, available at http://samos.univ-paris1.fr
(logiciels). We would like to thank Alan Sitkin (Transalver Ltd.) for the translation
of this paper.
APPENDIX
[Appendix figure: map contrasting voluntarist policies and restrictive policies]
NOTES
1. We regrouped into one and the same modality both the non-responses and the "Does not know"
answers, when the latter stemmed from nested questions. It remains that this modality generally
involved a small sample size, leading us to only incorporate those establishments that never
answered "Does not know" or "missing" to any of the active variables.
2. Kohonen (1984; 1993; 1995). See Allison, Yin et al. 2001; Cottrell, Fort, Pages 1998; Cottrell,
Gaubert, et al. 1999; Oja, Kaski 1999, for presentations of these methods.
3. Our interpretation will only cover the three first factorial axes. This will allow us to account for
around 25% of the total inertia. See the appendix for the results of this analysis. Note that all
the modalities are represented, but we will only interpret those modalities that have the greatest
contribution to the axes' construction (as well as those that are well represented).
4. Here we are talking about modalities and not about variables, given the disparity between the
contribution (and even the quality) of certain modalities' representation (i.e., DEPFORM1 and
DEPFORM2 cannot be analysed along axis 3, unlike DEPFORM3 and DEPFORM4).
5. This finding appears to contradict our reading of axis 2 of the MCA, which associated a lesser
mobility with a work prescription that is expressed in terms of overall objectives. On the other
hand, it corresponds to our reading of axis 3. In addition, the result is a robust one, in that it is
able to withstand a repetition of the algorithm.
REFERENCES
Allison N., Yin H., Allinson L., Slack J. (2001), Advances in Self Organising Maps, Springer.
Boyer R., Beffa J.L., Touffut J.P. (1999), Employment relationships in France: the State, the firms
and the financial markets, Published by the Saint Simon foundation, December.
Cottrell M., Fort J., Pages G. (1998), Theoretical aspects of the SOM algorithm, Neurocomputing,
21, 119-138.
Cottrell M., Gaubert P., Letremy P., Rousset P. (1999), "Analysing and Representing
multidimensional quantitative and qualitative data: Demographic study of the Rhone valley. The
domestic consumption of the Canadian families", in Oja E., S.Kaski (eds.), Kohonen Maps,
Elsevier, June, 1-14.
Galtier B. (1996), "Gerer la main d'œuvre dans la duree: des pratiques differenciees en
renouvellement", Economie et Statistique, 298, 45-70.
Kohonen T. (1984, 1993), Self-organization and Associative Memory, 3rd ed., Springer.
Kohonen T. (1995), Self-Organizing Maps, Springer Series in Information Sciences, 30, Springer.
Lemiere S., Perraudin C., Petit H. (2001), Regimes d'emploi et de remuneration des etablissements
francais en 1998, Construction d'une typologie a partir de l'enquete REPONSE, rapport dans le
cadre de la convention d'etude sur l'Enquete REPONSE, pour le compte de la Direction de
l'Animation et de la Recherche, des Etudes et des Statistiques (DARES) du Ministere de
l'emploi et de la solidarite, 51p, novembre.
Oja E., Kaski S. (eds.) (1999), Kohonen Maps, Elsevier.
Chapter 7
Abstract: The validity of means-end data is discussed from a conceptual and empirical
perspective. A dynamic programming approach (Markov chain) is then proposed
that makes it possible to measure the validity and reliability of any group of means-end
data (item, hierarchical level, whole database), thus making it possible to purify the data
and to gain empirical insight into debated topics such as the number of means-end levels
to be considered and the validity of the means-end data collection methods.
Key words: Means-end Chains, Markov Chains, Genetic Algorithms, Consumer Behaviour.
INTRODUCTION
However, these qualities raise considerable data analysis problems, with
the appropriate length of a MEC becoming a moot point among researchers.
Gutman (1982) proposed that a MEC comprises 3 levels: the "attribute" level, the
"consequence" level and the "value" level. Although some researchers strongly
advocate that only two links should be considered (ter Hofstede, Audenaert,
Steenkamp, Wedel 1998), a typical laddering interview would often generate
more than 3 successive links. Such empirical evidence supports the idea that each
one of Gutman's basic three levels could be split into two, following Rokeach's
(1973) distinction of two value levels (terminal and instrumental). The resulting
hierarchy (Olson, Reynolds 1983; Peter, Olson 1987; Pitts, Wong, Whalen 1991;
Reynolds, Gengler, Howard 1995) is summed up in Table 1.
Longer MECs, like those collected with laddering interviews, are likely to
reflect more precisely a consumer's means-end processes. However, because of
the large number of possible arrangements, such precision could be lost and
create unnecessary data analysis problems when the population's common MECs
are sought. On average, the number of possible MECs is 85, and the observed
MECs are rarely common to more than 2% of the sample. Many other questions
have not yet been rigorously addressed, in particular that of the number of MECs
to be collected from a consumer and that of the importance of a MEC. Indeed,
even if researchers agreed to analyse only one MEC per consumer, there would
be no method for choosing the appropriate MEC among those expressed by the
consumer.
for identifying the most valid combination of means-end items. In the third part
of the paper, the procedure is tested empirically, using actual means-end data
collected from 509 consumers of beer.
This section considers the validity of MECs from the perspectives of data
collection protocols and analysis procedures. The purpose is to make clear the
main problems and advantages inherent in each area of concern and to propose a
definition of means-end validity.
[It9 > It10] is composed of three groups of consumers. Thus, [It1 > It6] proceeds
from It1, [It6 > It9] proceeds from It2 and It3, and finally [It9 > It10] proceeds
from It4 and It5.
[Figure: means-end graph linking the items, with link weights]
In the previous sections, it has been argued that means-end data collection has
suffered from the limitations of available statistical analyses. Some statistical
methods exert structural constraints on data collection which may be
excessive (e.g. the APT approach). Others allow more flexible data collection
(e.g. laddering) but lack the structural guidelines necessary for achieving
interpretable results (e.g. multidimensional analysis and clustering).
The converse perspective on MEC research would see statistical techniques
develop, which would respect three criteria. First, they must exert as little
constraint as possible on the data to be measured (e.g. no a priori number of
links). Second, they need to be consistent with the reality of means-end processes
by adopting a dynamic approach (e.g. by accounting for the order of elicitation).
Third, they must be capable of checking means-end validity at every data level
This section is composed of two parts. In the first, a dynamic
programming approach to means-end validity is justified, along with the concept
of internal predictivity. In the second part, the method is illustrated empirically
using actual data from cigarette smokers.
normalized so that any item has a 100% probability of being linked. The resulting
matrix (called "transition matrix") gathers the probability of a transition between
any two items. The global transition process can then be simulated over N
periods by raising the transition matrix to its Nth power. As a result, the most
probable connections are emphasised, while the least probable are reduced to a
lesser role. In the vast majority of cases the result of this process is that the
probabilities of transition converge to stable values. Hence, the incidental
elements of the initial probabilities are progressively filtered, leaving a weighted
graph of the significant relationships.
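As an illustration, the row normalisation and matrix-power filtering just described can be sketched as follows (numpy is assumed; the link counts are made up for the example):

```python
import numpy as np

# Hypothetical counts of observed links between 4 means-end items
# (rows = source item, columns = target item).
counts = np.array([[0, 8, 1, 1],
                   [1, 0, 7, 2],
                   [0, 1, 0, 9],
                   [1, 1, 1, 0]], dtype=float)

# Row-normalise so that every item has a 100% probability of being linked.
transition = counts / counts.sum(axis=1, keepdims=True)

# Simulate the global transition process over N periods by raising the
# matrix to its Nth power; incidental links are progressively filtered out.
N = 50
filtered = np.linalg.matrix_power(transition, N)
print(np.round(filtered, 3))  # rows converge towards a stable distribution
```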
Means-end chains are very similar to a Markov chain in that they are
composed of sequential decision-making steps with discrete states and
probabilistic outcomes that can be conveniently translated into a graph (a
"HVM"). In both cases, the objective is to reduce the uncertainty, also known as
"entropy" in Information Theory, in the observed phenomena by translating the
significant information into the simplest possible graph. In order to do so a
measure of entropy is needed. Several measurements have been proposed to cope
with the variety of the existing problems. Most of them are based on the strength
of the associations (e.g. Cramer's V, Goodman and Kruskal's lambda and gamma). A "u"
measure, based on Shannon's entropy, was also proposed by Theil (1971) to
quantify how much one variable (e.g. a row) predicts another variable (e.g. a
column). These association measures can be used to quantify the entropy within a
transition matrix and, therefore, to research ad hoc methods to reduce this
entropy. However, association measures do not account for the dynamic
characteristic of the processes. This can be illustrated with the limit case of a
transition matrix whose rows are identical and whose columns are different.
According to the measures of entropy mentioned above, the association between
rows and columns is zero, meaning that no information is provided. However
there clearly is some information: the probability of a column is equal to its score
across the rows. For this reason, the measure of entropy used in this paper is
slightly different from the existing association measures. It has been recently
proposed to deal with dynamic process by Berchtold and Ritschard (1996). Using
Shannon's entropy, these authors propose to measure the predictivity of a
transition matrix, that is the ability of any item of the process to predict any other
item. In terms of MEC theory, the predictivity measure would reflect the ability
of an item to convey means-end flow, thus corresponding to our definition of
means-end validity (2.1).
The measure of predictivity of a transition matrix that Berchtold et al. (1996)
propose is made at the item level, then averaged and adjusted for item frequency.
Empirical comparisons with alternative measures indicate that this measure has a
normal asymptotic distribution and is particularly reliable and valid
(Berchtold 1997).
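One form of the predictivity measure consistent with the notation defined below (a reconstruction on our part, not necessarily Berchtold and Ritschard's exact expression) is:

$$P(M) = 1 + \frac{1}{\log c} \sum_{i} \frac{n_i}{N} \sum_{j} P_{ij} \log P_{ij} \qquad (1)$$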
where:
- P_ij is the probability of a direct link between i and j,
- n_i is the number of occurrences of item i, and N is the total number of
occurrences of all items,
- c is the number of columns with a non-zero sum,
- P(M) lies between 0 and 1, with 1 representing a 100% predictivity.
P(M) has the additional advantage of accounting for the reliability of the
estimates. This is useful for comparing different matrices, because the total
number of occurrences has a non-linear effect on the distribution of probabilities.
Getting into the theoretical and empirical justification of this measure would
go beyond the scope of the present paper. Therefore the readers are invited to
consult Berchtold and Ritschard (1996) as well as Berchtold (1997). Let us just
stress the difference between the dynamic probabilities of an n-order transition
matrix and static association measures such as item co-occurrences. For example,
with the means-end data analysed in the next section (2.2), the correlation
between the markovian probabilities and the number of co-occurrences is -0.079,
with a probability of 0.099 (Bartlett's chi-square). Following the observations
made in other fields of consumer behavioural processes, this discrepancy
suggests that a Markovian approach may better reduce the entropy within a
means-end chains process.
The definition of the proposed predictivity measure enables three fundamental
analyses:
1. Checking the means-end predictivity of the items and filtering the less valid
ones. Typically, it could follow a backward stepwise approach, by removing
one by one the less valid items, until the matrix P(M) score is significantly
degraded (see the sketch below).
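A minimal sketch of such a backward elimination, assuming a `predictivity` function that returns P(M) for a given set of items (the tolerance and stopping rule are illustrative):

```python
def backward_filter(items, predictivity, tolerance=0.01):
    """Remove the least valid items one by one, keeping each removal only
    while the predictivity score P(M) is not significantly degraded.
    `predictivity` maps a set of items to its P(M) score (assumed given)."""
    current = set(items)
    while len(current) > 2:
        # Candidate whose removal leaves the highest P(M).
        candidate = max(current, key=lambda i: predictivity(current - {i}))
        if predictivity(current - {candidate}) < predictivity(current) - tolerance:
            break  # any further removal would significantly degrade P(M)
        current.remove(candidate)
    return current
```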
A basic GA is as follows:
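The outline below is a generic sketch of such a loop (illustrative only: the bit-string encoding, the one-max fitness and the parameter values are placeholders, not those of the study):

```python
import random

POP_SIZE, N_GENES, N_GENERATIONS = 30, 10, 100
MUT_RATE = 0.01

def fitness(chromosome):
    # Placeholder objective: count of 1-bits (the "one-max" problem).
    return sum(chromosome)

def select(population):
    # Tournament selection: the fitter of two random chromosomes survives.
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(parent1, parent2):
    # Single-point crossover.
    point = random.randint(1, N_GENES - 1)
    return parent1[:point] + parent2[point:]

def mutate(chromosome):
    # Flip each bit with a small probability.
    return [1 - g if random.random() < MUT_RATE else g for g in chromosome]

population = [[random.randint(0, 1) for _ in range(N_GENES)]
              for _ in range(POP_SIZE)]
for _ in range(N_GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

print(max(population, key=fitness))
```

In the study's setting, a chromosome would instead encode a candidate set of means-end items, the fitness would be the predictivity P(M) of the corresponding transition matrix, and structurally invalid chromosomes would be discarded, as described next.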
In the present study, the chromosomes are possible sets of items, with a
structural constraint that a solution comprises at least two items, one of which
is among the terminal goals. This non-differentiable constraint is particularly easy to
implement in a GA. Chromosomes that do not respect it are purely and simply
discarded, meaning that the reproduction process goes on until all chromosomes
conform to the necessary structure. The objective function to optimise is the
predictivity of the transition matrix corresponding to the selected set of items.
However, this filtering may degrade the matrix's reliability, that is, the transition
probabilities may become less representative of the whole population's ones. The
existing measures of reliability are adapted to continuous or binomial
distributions of probability and do not apply well to the discrete probabilities of
Markov chains. Several aggregation methods have been proposed to translate this
distribution into a binomial one, but they imply a significant reduction of the
information. Berchtold and Ritschard (1997) have then proposed a measure of
unreliability that overcomes this drawback and allows the comparison of
different matrices. Using the parameters defined in (1) the unreliability of a
transition matrix M is measured as:
$$R(M) = \frac{1}{c} \sum_{i=1}^{c} \frac{n_i \left( 1 - \min_j(s_{ij}) \right)}{n_i + 1} \qquad (2)$$
The initial values of predictivity and reliability were respectively 0.24 and
0.91. The optimised values were 0.408 and 0.95, corresponding to a ratio of
means-end validity equal to 0.448 and indicating a progression of both criteria.
It is apparent from table 3 that two levels are lowering the global validity of
the process: concrete attribute and psychosocial consequence. This result is
consistent with the observation that a majority of concrete attributes had a low
validity (table 2). Therefore, the hypothesis that the concrete attribute level may
be skipped cannot be rejected.
Furthermore, the validity of the database was checked with the two lower
validity levels cancelled simultaneously. The resulting validity was 0.3059,
outperforming all those obtained previously. Hence, in this sample of cigarette
smokers, it appears that a better means-end connection exists between brands and
terminal values when only three levels are considered: the abstract attributes, the
functional consequences and the instrumental values. This result gives some
empirical credence to the concept that a limited number of hierarchical levels
might suffice and, more importantly, that these levels should be divided
according to Gutman's initial distinction of attributes, consequences and values.
However, this a posteriori result may depend strongly on the type of product
and on the data collection method. Therefore, a cautious approach would be to
systematically follow the stepwise filtering presented here.
The same validity analysis may be extended to whole databases to indicate
which has the best means-end validity and, therefore, inform on the validity of
the corresponding data collection measures. To illustrate this approach, the
relationship between MEC length and means-end validity is studied with the
cigarette data in two ways. First, by measuring the means-end validity
corresponding to all ladders with a number of items inferior or equal to a specific
value, and second, by measuring the means-end validity of two groups of ladders
(i.e. those with 4 items or less, and those with more than 4 items). The results are
given in Table 4.
The best means-end validity is obtained by retaining the MECs with 4 items or
less. These MECs have a validity of 0.3700, versus 0.3568 for those with
more than 4 items. These results are consistent with those of table 3, and so
confirm that a length of 4 items or less is associated with the best means-end
validity levels. Since this consistency was observed with elements of a split
sample, they indicate that the proposed method is robust.
CONCLUSION
In this study about means-end chains, the conceptual and empirical debate
among researchers has been analysed at the data collection and the data analysis
levels. This discussion has laid stress on two necessities:
A Markov chain approach was then proposed to satisfy the first necessity in
the case of transient processes. Furthermore, the second necessity could be
addressed by proposing an indicator for the latent concept of "means-end
validity". This indicator is based on the measure of "internal predictivity"
recently developed in Markov chain modelling. It indicates the degree to which any
item of a dynamic process is able to transmit its input to other items. This
measure has useful statistical characteristics for analysing means-end data. First,
it is based on a measure at the item level, thus allowing the analysis of different levels
of aggregation (item, abstraction level, whole database). Second, it is
asymptotically normally distributed and adjusted for the number of items and their
frequencies, thus allowing a wide range of comparisons.
A genetic algorithm enabled the filtering of low-relevance items, the determination
of the optimal number of links and the comparison of means-end data whose structure
reflected different collection methods. It was empirically illustrated with 482 ladders
from cigarette smokers. The empirical results were consistent at three levels:
- At the item level, a greater validity of the means-end network was obtained
by identifying and cancelling the items with the lowest means-end validity,
- At the database level, the illustration was done by splitting the data into sub-
samples with contrasted MEC lengths. Consistent with the analyses at the
item and at the main abstraction levels, it appeared that the shorter MECs, in
particular those with 4 items, have a better means-end validity than the longer
ones.
Overall, the empirical test of the method did not reveal any conceptual or
statistical anomaly. The consistency of its results across the sub-samples is also an
indicator of some robustness. However, the study should be replicated with other
databases. If the consistency and robustness of the method were confirmed, there
would at last be a possibility to test the different data collection protocols. This
would require dual databases, e.g. one from face-to-face laddering and one from a
paper-and-pencil task. When this is done, it should be easier to identify and
represent the dominant MECs hidden in a database. Though this paper was
positioned upstream of this question, the proposed method should provide results
analogous to those classically obtained with Markov chain models, that is, trees
with links whose length reflects their importance.
REFERENCES
Alwitt L.F. (1991), "Analysis Approaches To Moment By Moment Reactions To Commercials",
Advances in Consumer Research, 18, 550-551.
Aurifeille J.-M., Clerfeuille F. Quester P.G. (2001), "Consumers' attitudinal profiles and
involvement", Advances in Consumer Research, XXVIII, June.
Aurifeille J.-M., (2000), "Methodological and empirical issues in market segmentation: a
comparison of the formal and the biomimetic methods", Fuzzy Economic Review, 5(1), 43-60.
Aurifeille J.-M., Valette-Fiorence P. (1995), "Determination of the Dominant Means-End Chains",
International Journal of Research in Marketing, 12,267-278.
Bagozzi R.P., Dabholkar P.A. (1994), "Consumer Recycling Goals and their Effects on Decisions
to Recycle: A Means-End Chain Analysis", Psychology & Marketing, 11(4), 313-340.
Berchtold A. (1997), "Learning in Homogeneous Markov Chains", Proceedings of the 7th congress
of the AIDRI, Switzerland: Geneva University, 121-125.
Berchtold A., Ritschard G. (1996), "Le Pouvoir Predictif des Matrices de Transition", Actes des
28emes Journees de Statistique, Quebec: Universite de Laval, 164-167.
Cacioppo J.T., Petty R.E., Kao C.F., Rodriguez R. (1986), "Central and Peripheral Routes to
Persuasion", Journal of Personality and Social Psychology, 51, I 032-1043.
Chaiken S. (1987), "The Heuristic Model of Persuasion", in Zanna M., Olson J., Herman C. (eds.),
Social Influence: The Ontario Symposium, 5, 143-177.
Clayes C., Swinnen A., Van den Abeele P. (1995), "Consumers' Means-End Chains for "Think" and
"Feel" Products", International Journal of Research in Marketing, 12, 193-208.
DeJong K. (1975), "The analysis and behaviour of a class of genetic adaptive systems", PhD
thesis, University of Michigan.
Eliashberg J., Jonker J.J., Sawhney M.S., Wierenga B. (2000), "MOVIEMOD: An Implementable
Decision Support System for Pre-Release Market Evaluation of Motion Pictures", Marketing
Science, 19(3), 226-243.
Gengler C.E., Klenosky D.B., Mulvey M.S. (1995), "Improving the Graphic Representation of
Means-End Results", International Journal of Research in Marketing, 12,245-256.
Gengler C.E., Reynolds T.J. (1995), "Consumer Understanding and Advertising Strategy: Analysis
and Strategic Translation of Laddering Data", Journal of Advertising Research, July-August,
19-33.
Ghahramani Z., Jordan M.I. (1996), "Factorial Hidden Markov Models", in Tesauro G., Touretzky
D.S., Leen T.K. (eds.), Advances in Neural Information Processing Systems 7, MIT
Press, Cambridge MA.
Goldberg, D.E. (1991), Genetic Algorithms, Addison-Wesley, USA.
Gutman J. (1982), "A Means-End Model Based on Consumer Categorization Processes", Journal
of Marketing, 46, Spring, 60-72.
Gutman J. (1997), "Means End Chains as Goals Hierarchies", Psychology & Marketing, 14(6),
545-560.
Gutman J., Reynolds T.J. (1979), "An Investigation of the Levels of Cognitive Abstraction Utilized
by the Consumers in Product Differentiation", in Eighmey J. (ed.), Attitude Research Under the
Sun, American Marketing Association, Chicago, 128-150.
Grunert K.G., Grunert S.C. (1995), "Measuring Subjective Meaning Structures by the Laddering
Method", International Journal of Research in Marketing, 12, 209-225 .
Haley R.I. (1968), "Benefit Segmentation: A Decision Oriented Research Tool", Journal of
Marketing, 32, July, 30-35.
Holland J.H. (1975), Adaptation in natural and artificial systems, MIT Press.
Jordan M.I., Ghahramani Z., Saul L.K. (1997), "Hidden Markov decision trees", in Mozer M.C.,
Jordan M.I., Petsche T. (eds.), Advances in Neural Information Processing Systems 9, MIT
Press, Cambridge, MA.
Man K.F., Tang K.S., Kwong S. (1999), Genetic Algorithms, Springer.
Meyer R.J., Kahn B.E. (1990), "Probabilistic Models of Consumer Choice Behaviour", in
Kassarjian H., Robertson T.(eds.), Handbook of Consumer Behaviour: Theoretical and
Empirical Constructs, Englewood Cliffs, N.J.: Prentice Hall, 85-123.
Mulvey M.S., Olson J.C., Celsi R.L. Walker B.A. (1994), "Exploring the Relationships Between
Means-End Knowledge and Involvement", Advances in Consumer Research, 21, 51-57.
Olson J.C., Reynolds T.J. (1983), "Understanding Consumers' Cognitive Structures: Implications
for Advertising Strategy", in Percy L., Woodside A.G. (eds.), Advertising And Consumer
Psychology, Lexington, M.A., Lexington Books, 77-90.
Perkins W.S., Reynolds T.J. (1988), "The Explanatory Power of Values in Preference Judgements:
Validation of the Means-End Perspective", Advances in Consumer Research, 15, 122-126.
Peter J.P., Olson J.C. (1987), "Consumer Behaviour: Marketing Strategy Perspectives",
Homewood, IL, Irwin.
Pieters R., Baumgartner H., Allen D. (1995), "A Means-End Approach to Consumer Goal
Structures", International Journal ofResearch in Marketing, 12, 227-244.
Pitts R.E., Wong J.K., Whalen D.J. (1991), "Consumers' Evaluative Structures in two Ethical
Situations: A Means-End Approach", Journal of Business Research, 22, 119-130.
Puterman M.L. (1994), Markov Decision Processes: Discrete Stochastic Dynamic Programming,
John Wiley and Sons.
Renders J.-M. (1995), Algorithmes genetiques et reseaux de neurones, Hermes.
Reynolds T.J., Craddock A.B. (1988), "The Application of the MECCAS Model to the
Development and Assessment of Advertising Strategy", Journal of Advertising Research, April-
May, 43-54.
Reynolds T.J., Gengler C.E., Howard D.J. (1995), "A Means-End Analysis of Brand Persuasion
Through Advertising", International Journal of Research in Marketing, 12, 257-266.
Reynolds T.J., Gutman J. (1984), "Advertising is Image Management", Journal of Advertising
Research, 24(1), February-March, 27-37.
Reynolds T.J., Gutman J. (1988), "Laddering Theory, Method, Analysis, and Interpretation",
Journal of Advertising Research, February-March, 11-31.
Reynolds T.J., Jamieson L.F. (1984), "Image Representations: An Analytical Framework", in
Jacoby J., Olson J. (eds), Perceived Quality of Products, Services and Stores, Lexington, MA,
Lexington Books.
Reynolds T.J., Rochon J.P. (1991), "Means-End Based Advertising Research: Copy Testing is not
Strategy Assessment", Journal ofBusiness Research, 22, 131-142.
Reynolds T.J., Sutrick K. (1986), "Assessing the Correspondence of One or More Vectors to a
Symmetric Matrix Using Ordinal Regression", Psychometrika, 51(1), 101-112.
Rokeach M.J. (1973), The Nature of Human Values, New York, Free Press.
Spears W.M., DeJong K. (1991), "An analysis of Multi-point crossover", in Rawlins G.J.E. (ed.),
Foundations of Genetic Algorithms, 301-315.
ter Hofstede F., Audenaert A., Steenkamp J-B.E.M ., Wedel M. (1998), "An Investigation Into the
Association Pattern Technique as a Quantitative Approach to Measuring Means-End Chains",
International Journal of Research in Marketing, 15, 37-50.
ter Hofstede F., Steenkamp J-B.E.M., Wedel M. (1999), "International Market Segmentation
Based on Consumer-Product Relations", Journal of Marketing Research, XXXVI, 1-17.
Theil H. (1971), "On the Estimation of Relationships Involving Qualitative Variables", American
Journal of Sociology, 76, 103-154.
Valette-Florence P., Rappachi B. (1991b), "Improvements in Means End Chain Analysis Using
Graph Theory and Correspondence Analysis", Journal of Advertising Research, 31, 30-45.
Walker B., Celsi R., Olson J.C. (1987), "Exploring the Structural Characteristics Of Consumers'
Knowledge", Advances In Consumer Research, 14, 17-21.
Walker B.A., Olson J.C. (1991), "Means-End Chains: Connecting Product With Self", Journal of
Business Research, 22, 111-118.
Wedel M., DeSarbo W.S. (1995), "A Mixture Likelihood Approach for Generalized Linear
Models", Journal of Classification, 12, 21-55.
Young S., Feigin B. (1975), "Using the Benefit Chain for Improved Strategy Formulation", Journal
of Marketing, July, 72-74.
Zajonc R.B., Markus H. (1982), "Affective and Cognitive Factors in Preferences", Journal of
Consumer Research, 9, 123-131.
Chapter 8
Abstract: Financial valuation methods use additive aggregation operators. But a patrimony
should be regarded as an organized set, and additivity makes it impossible for these
aggregation operators to formalize such phenomena as synergy or mutual inhibition
between the patrimony's components. This paper considers the application of fuzzy
measures and fuzzy integrals (such as Sugeno, Grabisch, Choquet) to financial
valuation. More specifically, we show how integration with respect to a non additive
measure can be used to handle positive or negative synergy in value construction.
Key words: Fuzzy measure, Fuzzy integral, Aggregation operator, Synergy, Financial valuation.
INTRODUCTION
effect may lead to a value of the set of assets greater (resp. lower) than the sum of
the values of all assets. This is particularly the case in the presence of intangible
assets such as goodwill. We will explore the possibilities offered by non-additive
aggregation operators (Choquet 1953; Grabisch et al. 1995; Sugeno 1977) with
the aim of modelling this effect through fuzzy integrals (Casta, Bry 1998; Casta,
Lesage 2001).
- The modern approach - the so-called measurement theory - which has its
origin in social sciences and which extends the measure theory to the
evaluation of sensorial perceptions as well as to the quantification of
psychological properties (Stevens 1951; 1959).
The accounting model for the measurement of value and income is structured
by the double-entry principle through what is known as the balance sheet
equation. It gives this model a strong internal coherence, in particular with regard
The concept of fuzzy integrals is in direct continuity with fuzzy measures and
extends integration to measures which are not necessarily additive. Sugeno's
integral of a measurable function f: X → [0,1] relative to a measure μ is defined
as:

$$S(f) = \max_{\alpha \in [0,1]} \min\Big(\alpha,\ \mu\big(\{x \mid f(x) > \alpha\}\big)\Big) \qquad (4)$$
Since it involves only operators max and min, this integral is not appropriate
for modelling synergy.
Choquet's integral of a measurable function f: X → [0,1] relative to a measure μ
is defined as:

$$C(f) = \int_0^1 \mu\big(\{x \mid f(x) > y\}\big)\, dy$$

[Figure: graph of f showing the level set {x | f(x) > y}]
For example, in the case of a finite set X = {x_1, x_2, ..., x_n} with
f(x_1) ≤ f(x_2) ≤ ... ≤ f(x_n), f(x_0) = 0 and A_i = {x_i, ..., x_n}, we have:

$$C(f) = \sum_{i=1}^{n} \left[ f(x_i) - f(x_{i-1}) \right] \mu(A_i)$$
$$C(f) = \int \sum_{A \in \mathcal{P}(X)} \mu(A)\, 1\big(A = \{x \mid f(x) > y\}\big)\, dy$$

If we denote by g_A(f) the value of the expression ∫ 1(A = {x | f(x) > y}) dy,
Choquet's integral may be expressed in the following manner:

$$C(f) = \sum_{A \in \mathcal{P}(X)} \mu(A)\, g_A(f)$$
Choquet's integral involves the sum and the usual product as operators. It
reduces to Lebesgue's integral when μ is Lebesgue's measure, and therefore
extends it to possibly non additive measures. As a result of monotonicity, it is
increasing with respect to the measure and to the integrand. Hence, Choquet's
integral can be used as an aggregation operator.
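For a finite X, the discrete form above can be computed directly. The following is a minimal sketch (the fuzzy measure used here is a made-up illustration, not data from the chapter):

```python
# Choquet integral of f over a finite set, using the sorted-differences form
# C(f) = sum_i [f(x_i) - f(x_{i-1})] * mu(A_i), with A_i = {x_i, ..., x_n}
# after sorting so that f(x_1) <= ... <= f(x_n).

def choquet(f, mu):
    """f: dict element -> value; mu: function frozenset -> measure."""
    elems = sorted(f, key=f.get)              # x_1, ..., x_n with increasing f
    total, previous = 0.0, 0.0
    for i, x in enumerate(elems):
        a_i = frozenset(elems[i:])            # A_i = {x_i, ..., x_n}
        total += (f[x] - previous) * mu(a_i)
        previous = f[x]
    return total

# Illustrative non-additive measure on X = {A, B, C}: the pair {A, B}
# carries more weight than mu({A}) + mu({B}) (positive synergy).
weights = {frozenset('A'): 0.2, frozenset('B'): 0.2, frozenset('C'): 0.3,
           frozenset('AB'): 0.7, frozenset('AC'): 0.5, frozenset('BC'): 0.5,
           frozenset('ABC'): 1.0, frozenset(): 0.0}

print(choquet({'A': 2, 'B': 2, 'C': 3}, weights.get))  # -> 2.3
```

Because μ({A,B}) here exceeds μ({A}) + μ({B}), the pair {A, B} exhibits the kind of positive synergy this chapter aims to model.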
$$\forall j, \quad V_j = \sum_{A \in \mathcal{P}(X)} \mu(A)\, g_A(f_j) + u_j$$

where V_j denotes the value observed for company j and u_j an error term,
and for each group A of variables x_i, we compute the corresponding generator as:

$$g_A(f_j) = \int 1\big(A = \{x_i \mid f_j(x_i) > y\}\big)\, dy$$
The following principle will be used to interpret the measure thus obtained for
AnB=0:
It should be noted that the suggested model is linear with respect to the
generators, but obviously non-linear in the variables x_i. Moreover, the number of
parameters only expresses the most general combination of interactions between
the x_i. For a small number of variables x_i (up to 5, for example), the computation
remains possible. The question is not only to compute the parameters, but also to
interpret all the differences of the type μ(A ∪ B) − (μ(A) + μ(B)).
For a greater number of variables, one may consider either to start with a
preliminary Principal Components Analysis and adopt the first factors as new
variables, or to restrict a priori the number of interactions considered.
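Because the model is linear in the generators, the weights μ(A) can be estimated by ordinary least squares once the generators have been computed for each company. A minimal sketch, using the first few rows of Table 1 below and assuming the subset ordering {A}, {B}, {C}, {A,B}, {A,C}, {B,C}, {A,B,C}:

```python
import numpy as np

# Generators g_A(f_j) for a few companies (first rows of Table 1), with
# columns ordered as {A}, {B}, {C}, {A,B}, {A,C}, {B,C}, {A,B,C}.
G = np.array([[2, 0, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 2, 0, 1],
              [0, 0, 2, 0, 0, 2, 0],
              [1, 0, 0, 0, 3, 0, 0],
              [0, 0, 0, 0, 0, 0, 1],
              [0, 0, 1, 0, 1, 0, 1],
              [0, 0, 0, 0, 0, 0, 2],
              [0, 0, 0, 0, 0, 1, 1],
              [0, 2, 0, 0, 0, 0, 1]], dtype=float)
V = np.array([6, 4, 6, 2, 3.5, 4.5, 6.5, 6, 4])  # observed company values

# Ordinary least squares; with a rank-deficient G, lstsq returns the
# minimum-norm solution.
mu, *_ = np.linalg.lstsq(G, V, rcond=None)
print(dict(zip(['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC'], np.round(mu, 2))))

# A positive mu(A u B) - (mu(A) + mu(B)) is then read as synergy between
# the groups, a negative one as mutual inhibition.
```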
[Figure: example of a function f on X = {A, B, C}, with f(A) = f(B) = 2 and f(C) = 3]
We have:
- {x | f(x) > 3} = ∅
- {x | f(x) > 2} = {C}
- {x | f(x) > 1} = {A, B, C}
- {x | f(x) > 0} = {A, B, C}
so that g_{C}(f) = 1 and g_{A,B,C}(f) = 2,
the other generators having a value of 0. For the whole sample of companies we
have the following generators (Table 1):
f(A) f(B) f(C)  V    g{A} g{B} g{C} g{A,B} g{A,C} g{B,C} g{A,B,C}
4    2    1    6     2    0    0    1      0      0      1
3    1    3    4     0    0    0    0      2      0      1
0    2    4    6     0    0    2    0      0      2      0
4    0    3    2     1    0    0    0      3      0      0
1    1    1    3.5   0    0    0    0      0      0      1
2    2    2    6.5   0    0    0    0      0      0      2
1    2    2    6     0    0    0    0      0      1      1
2    1    2    4     0    0    0    0      1      0      1
2    1    3    4.5   0    0    1    0      1      0      1
3    1    2    4.5   1    0    0    0      1      0      1
2    2    2    6     0    0    0    0      0      0      2
1    3    1    4     0    2    0    0      0      0      1
2    2    1    5     0    0    0    1      0      0      1
0    2    4    7     0    0    2    0      0      2      0
4    0    3    2.5   1    0    0    0      3      0      0
1    1    3    4     0    0    2    0      0      0      1
3    1    1    4     2    0    0    0      0      0      1
1    1    3    4.5   0    0    2    0      0      0      1
3    1    1    4.5   2    0    0    0      0      0      1
1    1    1    3     0    0    0    0      0      0      1
0    2    4    5.5   0    0    2    0      0      2      0
3    0    4    2.5   0    0    1    0      3      0      0
1    4    3    9     0    1    0    0      0      2      1
In the example above, there are only two ways to partition subset {a,b,c} using
the kernels: B = {a} ∪ {b} ∪ {c} and B = {a} ∪ {b,c}. The latter uses only B-
maximal kernels, whereas the former does not.
We therefore see that if a subset B can be partitioned in more than one way
using B-maximal kernels only, and if the values of μ over the kernels are not
constrained, the extension principle may lead to two distinct values of μ(B). So,
in order for the extension rule to remain consistent, the set of kernels must fulfil
certain conditions.
There are two notable and extreme cases of structures complying with rule R:
[Figure 3. Partition 1 (tree over {a, b, c, d, e, f, g})]
[Figure 4. Partition 2 (tree over {a, b, c, d, e, f, g})]
Here:

$$C(f) = \int \sum_{A \in \mathcal{P}(X)} \ \sum_{N \in K,\ N\ A\text{-maximal}} \mu(N)\, 1\big(A = \{x \mid f(x) > y\}\big)\, dy
       = \sum_{N \in K} \mu(N) \sum_{A \in \mathcal{P}(X),\ N\ A\text{-maximal}} \int 1\big(A = \{x \mid f(x) > y\}\big)\, dy$$

so that, setting

$$g_N(f) = \sum_{A \in \mathcal{P}(X),\ N\ A\text{-maximal}} \int 1\big(A = \{x \mid f(x) > y\}\big)\, dy,$$

we have:

$$C(f) = \sum_{N \in K} \mu(N)\, g_N(f)$$
[Figure 5. Example: a function f defined on {a, b, c, d, e, f}]
Here:
K = {{a}, {b}, {c}, {d}, {e}, {f}, {a,b}, {a,b,c}, {d,e}, {d,f}, {e,f}, {d,e,f}}.
We must examine, for y going from 0 to 4, every subset B = {x | f(x) > y}. We
get:
- y = 0 → B = {a,b,c,d,e,f};
- y = 1 → B = {a,b,c,d,e};
- y = 2 → B = {b,d,e};
- y = 3 → B = {d};
- y = 4 → B = ∅.
Let us consider the B-maximality of kernel {d,e} with respect to each of these
subsets. Kernel {d,e} appears to be B-maximal for B = {a,b,c,d,e} and for B =
{b,d,e}. Hence, we draw: g_{d,e}(f) = 1 + 1 = 2.
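This computation can be checked mechanically. In the sketch below, the function values are recovered from the level sets listed above (f(a)=2, f(b)=3, f(c)=2, f(d)=4, f(e)=3, f(f)=1), and the scan proceeds over unit intervals of y, which suffices here because f takes integer values:

```python
f = {'a': 2, 'b': 3, 'c': 2, 'd': 4, 'e': 3, 'f': 1}

kernels = [frozenset(k) for k in
           ['a', 'b', 'c', 'd', 'e', 'f', 'ab', 'abc', 'de', 'df', 'ef', 'def']]

def is_b_maximal(kernel, b, kernels):
    # N is B-maximal if N is included in B and no strictly larger kernel
    # included in B contains N.
    return kernel <= b and not any(kernel < k <= b for k in kernels)

def generator(kernel, f, kernels):
    # g_N(f) = total length of the y-range over which N is B-maximal,
    # scanning the unit intervals [y, y+1) up to max f.
    return sum(1 for y in range(max(f.values()))
               if is_b_maximal(kernel, frozenset(x for x in f if f[x] > y),
                               kernels))

print(generator(frozenset('de'), f, kernels))  # -> 2, as in the text
```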
Once every kernel generator has been computed, we can proceed to the least
squares step of the learning procedure, just as in § 2.4.
CONCLUSION
ACKNOWLEDGEMENTS
This work took place with the support of the CEREG Laboratory (University
of Paris Dauphine). We thank Helyette Geman for her helpful comments.
REFERENCES
Abdel-Magid M.F. (1979), "Toward a better understanding of the role of measurement in
accounting", The Accounting Review, April, 54(2), 346-357.
Bry X., Casta J.F. (1995), "Measurement, imprecision and uncertainty in financial accounting",
Fuzzy Economic Review, November, 43-70.
Casta J.F. (1994) "Le nombre et son ombre. Mesure, imprecision et incertitude en comptabilite", in
Annales du Management, XIIemes Journees Nationales des IAE, Montpellier, 78-100.
Casta J.F., Bry X. (1998), "Synergy, financial assessment and fuzzy integrals", in Proceedings of
IVth Meeting of the International Society for Fuzzy Management and Economy (SIGEF),
Santiago de Cuba, II, 17-42.
Casta J.F., Lesage C. (2001), "Accounting and Controlling in Uncertainty: concepts, techniques and
methodology", in J. Gii-Aiuja (ed.) Handbook of Management under Uncertainty, Kluwer
Academic Publishers, Dordrecht.
Choquet G. (1953), "Theorie des capacites", Annales de l'Institut Fourier, 5, 131-295.
Denneberg D. (1994), Non-additive measures and integral, Kluwer Academic Publishers,
Dordrecht.
Ellerman D.P. (1986), "Double-entry multidimensional accounting", Omega, International Journal
of Management Science, 14(1), 13-22.
Grabisch M. (1995), "Fuzzy integral in multicriteria decision making", Fuzzy Sets and Systems, 69,
279-298.
Grabisch M., Nguyen H.T., Walker E.A. (1995), Fundamentals of uncertainty calculi with
applications to fuzzy inference, Kluwer Academic Publishers, Dordrecht.
Grabisch M., Nicolas J.M. (1994), "Classification by Fuzzy Integral: Performance and Tests",
Fuzzy sets and systems, 65, 255-271.
Ijiri Y. (1967), The foundations of accounting measurement: a mathematical, economic and
behavioral inquiry, Prentice Hall, Englewood Cliffs.
Ijiri Y. (1975), The theory of accounting measurement, Studies in Accounting Research, 10,
American Accounting Association.
de Korvin A. (1995), "Uncertainty methods in accounting: a methodological overview", in Siegel
P.H., de Korvin A., Omer K. (eds.), Applications of fuzzy sets and the theory of evidence to
accounting, JAI Press, Stamford, Conn., 3-18.
March J.G. (1987), "Ambiguity and accounting: the elusive link between information and decision
making", Accounting, Organizations and Society, 12(2), 153-168.
Mattessich R. (1964), Accounting and analytical methods, Richard D. Irwin, Inc.
Morgenstern O. (1950), On the accuracy of economic observations, Princeton University Press.
Schmeidler D. (1989), "Subjective probability and expected utility without additivity",
Econometrica, 57(3), 571-587.
Sterling R.R. (1970), Theory of the measurement of enterprise income, The University Press of
Kansas.
Stevens S.S. (1951), "Mathematical measurement and psychophysics", in Stevens S.S (ed.)
Handbook of Experimental Psychology, John Wiley and Sons, New-York, 1-49.
Stevens S.S. (1959), "Measurement, psychophysics and utility", in Churchman C.W., Ratoosh P.
(eds.), Measurement: Definitions and Theories, John Wiley and Sons, New-York, 18-63.
Sugeno M. (1977) "Fuzzy measures and fuzzy integrals: a survey", in Gupta, Saridis, Gaines (eds.),
Fuzzy Automata and Decision Processes, 89-102.
Tippett M. (1978), "The axioms of accounting measurement", Accounting and Business Research,
Autumn, 266-278.
Vickrey D.W. (1970), "Is accounting a measurement discipline?", The Accounting Review,
October, 45, 731-742.
Wakker P. (1990), "A behavioral foundation for fuzzy measures", Fuzzy Sets and Systems, 37, 327-
350.
Willett R.J. (1987), "An axiomatic theory of accounting measurement", Accounting and Business
Research, Spring, 66, 155-171.
Zebda A. (1991), 'The problem of ambiguity and vagueness in accounting and auditing",
Behavioral Research in Accounting, 3, 117-145.
Chapter 9
Eric SEVERIN
University of Lille 2, 1 Place Deliot, BP 381, 59020 Lille, eric.severin@free.fr
Abstract: This article deals with the influence of the structure of the Board of Directors,
external and internal discipline and size on corporate performance in the economic
and financial fields. By using first self-organising maps and then panel data,
we highlight three main results. Firstly, our results suggest, from a sample of 136
firms, that the relation between the structure of the Board of Directors and
performance is non-linear. Furthermore, one can observe that the least effective
firms are those in which the proportion of outside directors is the highest. Secondly,
variables of leverage, and stock turnover are the main explanatory variables of
performance. While leverage has a negative influence on performance (Opler,
Titman 1994), stock turnover has a beneficial impact (Charreaux 1997).
Thirdly, variables of concentration of ownership structure and size have a positive,
but non-significant, influence on performance.
INTRODUCTION
Within this section, we deal with the elements of the theoretical literature and
will present our working hypotheses.
Numerous papers have dealt with the influence of the Board of Directors
(and more especially of the outside directors) on corporate performance. The
outside director is defined in opposition to the inside director. An inside director
is simultaneously on the Board of Directors and a salaried employee or on the
Board of Directors and in the senior management committee. From this general
definition, one can include several types of actors in the inside administrators'
category: family members and people in business relationship with the firm (but
Nowadays two main trends deal with the relationship between outside
administrators and performance. The first is unfavourable to outside directors on
the Board of Directors for several reasons.
Some authors (Mace 1986; Vancil 1987) highlight the fact that outside
directors are biased because their recruitment is strongly influenced by
management. Then, their weak involvement hardly incites them to defend
shareholders' interests (Jensen 1993; Mace 1986; Patton, Baker 1987). In short,
as J. Maati (1998) underlines, the exchange of directorships between managers
allows the emergence of a microcosm based on mutual support favourable to
entrenchment.
In response to these critiques, some authors underline outside directors'
advantages. Firstly, as their reputation depends on their vigilance, they will be
very keen to control leaders efficiently. Secondly, belonging to several Boards of
Directors allows them to diversify their human capital (which favours their
independence) and to improve their appraisal as well as their field of expertise.
Thirdly, the threat of legal action by the shareholders motivates them to act in the
interest of the latter (Fama, Jensen 1983a). Fourthly, their presence on a Board is
a guarantee of performance because they can examine the different proposals of
the executive committee with detachment (Eisenhardt, Bourgeois 1988; Kosnik
1990).
On the contrary, other empirical studies, carried out on the American market,
re-open the question of the presence and usefulness of outside directors. Thus
Hermalin, Weisbach (1988) failed to observe a link between the amount of
We focus our attention on two variables: the percentage of the three main
shareholders and leverage.
Concentration of capital is favourable to the exercise of efficient
control by shareholders (Shleifer, Vishny 1986; Bethel, Liebeskind 1993;
Agrawal, Knoeber 1996). Indeed if there is not one (or several) main
shareholder(s), no shareholder has an interest in using resources (time and funds)
to control the management because he will be the only one to bear the investment
cost whereas all the owners or partners will benefit from this action. Conversely,
in the case of a major shareholder, he is strongly encouraged to invest in
management control, because he will benefit from significant extra profit. From
this argument, we can develop a first hypothesis H1:
H1: Shareholder concentration positively influences performance.
Financial structure is also able to manage agency conflicts and to reduce the
"free cash flow". Jensen (1986) explained that the best way to reduce conflicts of
interest is to increase debt level. Leverage provides discipline and monitoring not
available to a firm completely financed by equity. According to the 'free cash
flow' theory, debt creates value by imposing discipline on organizations, which in
turn reduces agency costs (Jensen 1986). The use of debt has two functions: 1) it
decreases the free cash flow that can be wasted by managers and, 2) it increases
the probability of bankruptcy and the possibility of job loss for managers (thus
leading to the disciplining effect).
Although leverage has advantages, it can also provoke harmful effects. In
response to Modigliani, Miller (1963), many authors (Altman 1984 2 ; Malecot
1984; Wruck 1990) show that operational difficulties and a rise in leverage lead,
ceteris paribus, to a decrease in performance and an increase in the weakness of
the firm. When leverage is not controlled, the firm is likely to file for bankruptcy
(Altman 1984). The costs of bankruptcy provide an explanation for the capital
structure. To this extent, Opler, Titman (1994) specified that debt is a factor of
"financial distress" likely to endanger the firm. Indeed, if a firm is overextended
in debt, the shareholders may doubt its durability. For example, customers may
be reluctant to do business with distressed firms.
In other words, the shareholders have no confidence in a firm that is not able
to meet its commitments. The originality of the findings of Opler, Titman lies in
highlighting the existence of indirect costs harmful to the firm prior to
bankruptcy. In other words, they reverse the causality assumed by Altman. Opler
and Titman (using a classical methodology based on multiple linear regression,
working with a large sample (46,799 firms) over a long period (1972-
1991)) showed that highly-leveraged firms (i.e. those with leverage in deciles 8
to 10) are more sensitive to an economic downturn. These firms lose market
share. Industry-adjusted sales growth is 13.6% lower (p value <1%) for firms
with leverage deciles 8 to 10 in distressed industries than for less-leveraged
firms. Similarly, industry-adjusted sales growth for firms in distressed industries
with leverage decile 10 is on average 26.4% lower (p value <1%) than for firms
in leverage decile 1 (the least-leveraged firms). In a distressed industry the
highly-leveraged firms, compared to the others, have a drop in equity value
11.9% greater than firms with leverage deciles 1 to 7 (significant at the 1%
threshold).
In line with the hypothesis of the "free cash flow" (Jensen 1986), one can
expect debt to increase value (debt imposes a discipline on organizations and in
particular on management). Here, debt is a source of "good stress". On the
contrary, if debt is a result of difficulties that are borne by the firm (Opler,
Titman 1994), it can have negative consequences (loss of customer confidence
for instance). In this framework, debt is a source of "bad stress".
From this argument, we can develop two hypotheses H2a and H2b:
H2a: Leverage, a measurement of the lenders' monitoring, has a positive
influence on performance.
2. DATA AND METHODOLOGY
2.1 Data
The sample was chosen from 136 French industrial firms 3 listed on the Paris
Bourse (Reglement Mensuel, Comptant and Second Marche) during the 1991-
1995 4 period. By means of the Dafsa-Pro and Annee Boursiere databases, we
used all the items able to give information about ownership structure, internal
and external discipline variables and the size variable.
Thus, our sample was constituted of panel data. For Sevestre (1992), one of
the advantages of panel data compared with simple time series is that it shows
the dynamics of individual behaviour.
- Size variable
Furthermore, we used a control variable: size defined by the logarithm of the
total assets (LNTA).
2.3 Methodology
variants called the Kohonen Map (Kohonen 1982; 1995). One of the major
advantages of its use is its capacity to deal with non-linear problems in particular.
Our objective was to determine several groups of homogeneous individuals.
Based on the initial results, firstly, we used some classic non-parametric tests
(Wilcoxon8) to highlight significant differences between our groups, and
furthermore, we developed a model by using the technique of panel data.
In order better to understand the impact of surveillance variables, external
control and size, we carried out an empirical analysis by means of regressions on
panel data as follows 9 :
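Written out with the variables defined below, the estimated relation takes the following form (a reconstruction on our part; the subscripts i and t index firms and years):

$$\text{Performance}_{it} = \alpha + \beta_1\, \%A13_{it} + \beta_2\, \text{FIDEEQUI}_{it} + \beta_3\, \text{STR}_{it} + \beta_4\, \text{LNTA}_{it} + \varepsilon_{it} \qquad (1)$$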
With:
- Performance = ROI or ROE
- %A13 = percentage of capital held by the three main shareholders
- FIDEEQUI = financial debt (in book value) / total equity (in book value)
- STR = security turnover rate
- LNTA = logarithm of total assets
The advantages of panel data hinge on the fact that one has at one's disposal a
satisfactory amount of data, both from the quantity and variability viewpoints.
Econometrically speaking, all the available data results in great accuracy of the
estimations. Another characteristic of panel data is the predominance of inter-
individual disparities ("between" estimator) in the variance of observation. The
final advantage consists in the capacity to carry out cross-sectional and time-series
estimations from the same data. Although the use of panel data shows a certain
advantage, it is however not free of drawbacks. Thus, the lack of individual
information concerning a variable considerably weakens the dynamic-type
regressions, like intra-individual estimations ("within" estimator), which are
extremely sensitive to the biases due to omitted variables or measurement errors.
A second limit results from the nature of the sample chosen. Indeed, working on
panel data necessitates the samples being balanced ("cylindered"), which questions
their representativity. This is the reason why all mergers and bankruptcies of firms
were excluded from our sample.
Within this section, we will present our results. Firstly, we will focus on the
influence of outside directors on performance and secondly, on the impact of the
other monitoring variables explaining performance.
Many methods have been used to represent Kohonen maps; a good
synthesis is provided in Kohonen's book 10 . In this study, we used one
representation that plots the Kohonen grid (or string) on a plane, each unit having
the same size. This is the classical Kohonen map representation of the
relationships between the composition of the Board of Directors and performance
for each of the following years:
Ul U2 U3 U4
Outdirs Outdirvs Outdirw Outdirvw
Roivw Roiw Rois Roivs
Roevw Roew Roes Roevs
Figure 1. Year 91
Ul U2 U3 U4
Outdirvw Rois Outdirvs Outdirw
Roivs Roes Outdirs Roivw
Roevs Roiw Roevw
Roew
Figure 2. Year 92
Ul U2 U3 U4
Outdirvw Outdirw Outdirvs Outdirs
Roivs Rois Roiw Roivw
Roevs Roevs Roevw Roevw
Figure 3. Year 93
Ul U2 U3 U4
Outdirvs Outdirw Outdirvw Outdirs
Roiw Rois Roiw Roivs
Roew Roevs Roew Roevs
Figure 4. Year 94
Ul U2 U3 U4
Outdirvs Outdirs Outdirw Outdirvw
Roivw Roiw Rois Roivs
Roevw Roew Roes Roevs
Figure 5. Year 95
With:
- ROIVS, ROIS, ROIW, ROIVW: ROI very strong, strong, weak, very weak.
- ROEVS, ROES, ROEW, ROEVW: ROE very strong, strong, weak, very weak.
- OUTDIRVS, OUTDIRS, OUTDIRW, OUTDIRVW: % outside directors in
the Board of Directors very strong, strong, weak, very weak.
- U1, U2, U3, U4: Unit 1, Unit 2, Unit 3, Unit 4.
Within this section, we will try to determine if some differences exist between
the explanatory variables of performance.
Table 1. Descriptive statistics and Wilcoxon test on monitoring and size variables from 1991 to
1995
Notes:
1. (VW): firms with very poor performance, (W): firms with poor performance, (S): effective
firms, (VS): very effective firms.
2. We use the Wilcoxon test - for monitoring and control variables - between very effective firms
and firms with poor performance. Variables %A13, FIDEEQUI, STR and LNTA are
respectively: the percentage held by the 3 main shareholders, the total of financial debt divided
by the total of equity in book value, the security turnover rate and the logarithm of total assets.
3. ***, ** and * significant at the 1, 5 and 10% thresholds. N = number of observations.
Our findings give some important points to comment on. First of all, the
results indicate that the effective firms have a statistically different security
turnover rate (at the 1% threshold). Our results are in line with those of
Charreaux (1997). Wilcoxon tests give values of -1.89 and -1.71 (significant at
the 10% threshold) for years 1991 and 1992. Results for 1993 and 1994 are much
more significant (-2.26 and -3.11 significant at 1% threshold).
The result for 1995 is consistent with the others but it is not significant (p
value = 0.16). Our results are in line with Titman, Wessels (1988) and Charreaux
(1997). Thus the security turnover rate is a means of putting pressure on
management to act in accordance with investors' interests.
A second point leads us to observe that very strong leverage is associated with
the weakest performances. The results are significant at the 1% threshold for
1993 and 1995, significant at the 5% threshold for 1991 and significant at 10%
for 1992.
If one compares the leverage of the most effective firms with that of the least
effective, one can observe that, on average, the leverage of the least effective
firms is two or three times as great. Thus, in 1995, the mean leverage
was 1.47 for the least effective firms whereas it was only 0.43 for firms located
in unit 4.
This result is not consistent with the hypothesis of free cash flow (Jensen
1986), but is in line with the results of Opler, Titman (1994).
In line with the findings of Shleifer, Vishny (1986), Bethel, Liebeskind
(1993), Agrawal, Knoeber (1996), our results suggested that the most effective
firms have a greater shareholder concentration.
For the years 1991 and 1993, the mean shareholder concentration was
respectively 71.72% and 75.72% for the most effective firms, versus 63.22%
and 61.42% for firms with very poor performances. Nevertheless, the results are
hardly significant (significant at 10% for 1991 and 1993 for the variable %A13).
In short, size (measured by total assets) seems to have an influence on
performance, but the results are not significant except for 1993 (Wilcoxon test
-2.39, significant at the 5% threshold) and for 1992 and 1995 (Wilcoxon tests
of -1.93 and -1.86 respectively, significant at the 10% threshold). This result
seems to suggest that size has a positive influence on performance but that this
impact remains weak.
In order to understand more clearly the role of monitoring and size variables
on corporate performance, we carried out the empirical analysis by using a
regression (Eq. 1).
CONCLUSION
Our objective, in this paper, was to understand more clearly the relationship
between ownership structure and performance.
The contributions of this article are both economic and methodological.
With regard to the methodological aspect, this work illustrates the utility of
SOM. The strong points of these methods are the following: in particular, they
enable the analysis of data whose distribution is not normal, and they are able to
detect non-linear links between variables. The Kohonen algorithm
allows individuals to be grouped into homogeneous classes.
Several results appear:
Firstly, our results suggest, from a sample of 136 firms, that the relationship
between the structure of the Board of Directors and performance is non-linear.
Furthermore, one can observe that the least effective firms are those in which the
proportion of outside directors is the highest. Secondly, the leverage and stock
turnover variables are the main explanatory variables of performance: leverage
has a negative influence on performance (Opler, Titman 1994; result
significant at the 1% threshold), whereas stock turnover has a beneficial impact
(Charreaux 1997). Thirdly, the concentration of ownership structure and
size have a positive, but not significant, influence on performance.
These results lead us to question several points. The first is the measurement
of shareholder concentration: the percentage of the capital held by the
main shareholders is probably different from the percentage of voting rights.
Unfortunately, we know of no French database with this accurate information.
The second important point in the ownership structure is the presence of
institutional investors among shareholders. These shareholders can force
managers to create value. Taking these elements into account in the future will
allow us to understand more clearly the influence of ownership structure on
performance.
ACKNOWLEDGEMENTS
The author thanks Fany Declerck, Eric De Bodt and the anonymous referees
for their comments.
APPENDIX 1
Many traditional methods12 presuppose strong hypotheses: in particular, the assumption of
normality. To test this, we examined the distribution of the ratios. Using the Kolmogorov test, we
were able to test normality. Our results suggest that our ratios do not have a normal distribution and
that there are extreme values, which requires the use of qualitative data. This non-normality and the
presence of extreme values led us to cluster our individuals into 4 classes. Hence, we first
transformed each character Xi (that is to say the performance variables, ROI and ROE, and the
composition of the Board of Directors, OUTDIR) into 4 categories (very strong, strong, weak, very
weak) and secondly transformed our variables into binary variables. The table of values of X
turned into a table of N rows (in our case the 136 individuals) and 12 columns corresponding to
12 qualitative variables (each variable has 4 categories).
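As an illustration, here is a minimal Python sketch of this quartile coding and binary (disjunctive) expansion; the function name and the quartile cut-points are our own assumptions, as the study does not report the exact thresholds used:

```python
import numpy as np

def to_binary_categories(x):
    """Cut a ratio into 4 categories (very weak, weak, strong, very strong)
    by quartiles, then expand into 4 binary (disjunctive) columns."""
    q = np.quantile(x, [0.25, 0.5, 0.75])   # quartile cut-points (our assumption)
    cat = np.digitize(x, q)                 # category index 0..3
    return np.eye(4)[cat]                   # one binary column per category
```

Applying this coding to each of the 3 variables yields the N x 12 binary table described above.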
Thus we used a specific kind of self-organized map (SOM) called the Kohonen map13. The
Kohonen algorithm14 is a well-known unsupervised learning algorithm which produces a map
composed of a fixed number of units (figs. 1, 2, 3, 4 and 5 present a one-dimensional map,
frequently called a string). Each unit has a specific position on the map and is associated with an
n-dimensional vector Wi (which will define its position in the input space), n being the number of
dimensions of the input space. Moreover, a physical neighborhood relation between the units is
defined: units 1 and 3 are neighbors of unit 2 and, for each unit i, Vr(i) represents the
neighborhood of radius r centered at i.
After learning, each unit represents a group of individuals with similar features. The
correspondence between the individuals and the units more or less respects the input space
topology: individuals with similar features correspond to the same unit or to neighboring units. The
final map is said to be a self-organized map that preserves the topology of the input space.
The learning algorithm takes the following form:
- at time 0, the code vector $w_i(0)$ is randomly defined for each unit $i$;
- at time $t$, we present a vector $x(t)$ randomly chosen according to the input density $f$ and we
determine the winning unit $i^*$ which minimizes the Euclidean distance between $x(t)$ and $w_i(t)$.
We then modify the $w_i$ in order to move the weights of the winning unit $i^*$ and its physical
neighbors towards $x(t)$ using the following relations:

$w_i(t+1) = w_i(t) + \varepsilon(t)\,(x(t) - w_i(t))$ if $i \in V_{r(t)}(i^*)$, and $w_i(t+1) = w_i(t)$ otherwise,

where $\varepsilon(t)$ is a small positive adaptation parameter, $r(t)$ is the radius of $V_{r(t)}$, and $\varepsilon(t)$ and
$r(t)$ are progressively decreased during the learning15.
This is clearly a competitive kind of algorithm (each unit competes to be the closest to the
presented individual) which performs two tasks of interest for data analysis:
1) clustering: each unit will be associated with a similar kind of individual, the Wi vector
associated with the unit converging toward the mean profile of the associated individuals;
2) reduction of the number of dimensions: the (at least local) proximities between the units
give us an idea of the proximities of the clusters of individuals in the input space.
A last remark concerning the neighbourhood: it is reduced progressively to finish at value 0
(only the winning unit is displaced). The Kohonen algorithm then turns into a vector
quantization algorithm.
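As a rough illustration of the algorithm just described, the following Python sketch (function and parameter names, and the linear decay schedules, are our own assumptions, not the authors' implementation) trains a one-dimensional Kohonen string with a decreasing adaptation parameter and a neighbourhood radius that shrinks to 0:

```python
import numpy as np

def kohonen_string(data, n_units=4, n_steps=5000, eps0=0.5, r0=1, seed=0):
    """Minimal sketch of a one-dimensional Kohonen map (string): code
    vectors w_i, a shrinking neighbourhood radius r(t) and a decreasing
    adaptation parameter eps(t)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # at time 0, the code vectors are randomly defined
    w = data[rng.choice(n, n_units, replace=False)].astype(float)
    for t in range(n_steps):
        x = data[rng.integers(n)]                    # draw one observation
        winner = np.argmin(np.linalg.norm(w - x, axis=1))
        eps = eps0 * (1 - t / n_steps)               # eps(t) decreases
        r = int(round(r0 * (1 - t / n_steps)))       # r(t) shrinks to 0
        for i in range(max(0, winner - r), min(n_units, winner + r + 1)):
            w[i] += eps * (x - w[i])                 # move towards x(t)
    return w
```

Once the radius reaches 0, only the winner is updated, which is exactly the vector quantization regime mentioned above.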
APPENDIX 2
The panel nature of our data allowed us to use a panel data methodology for our empirical
research. As Dormont (1989) states, this type of analysis presents clear advantages over cross-
sectional or time-series studies. For instance, it can check for firm heterogeneity, and reduce
colinearity among variables that are considered. Moreover, this technique enabled us to eliminate
the potential biases in the resulting estimates due to correlation between unobservable individual
effects and the explanatory variables in the study. Our panel data may be represented as follows:
$y_{it} = x_{it}\,\beta + \eta_i + u_{it}$

where $y$ is the dependent variable, $x$ is a vector containing all explanatory variables, $\beta$ is
a vector of coefficients that we attempt to estimate, $\eta_i$ denotes the
unobservable individual-specific effect that is time-invariant, and $u_{it}$ is the random error, with $i$
denoting firms (cross-section dimension) and $t$ denoting years (time-series dimension).
A critical question in such models is whether the unobservable individual
effects are fixed or random, that is, whether these effects are orthogonal or not to the exogenous
variables considered. Usually, the individual effects are correlated with the independent variables
and, as Dormont (1989) asserts, this generates biases in the least squares estimators.
Notwithstanding, one of the main advantages of panel data models, like the one we used in this
work, is that they give us the possibility of eliminating the cited biases.
To verify the character of the individual effects, Hausman's specification test is generally used.
The null hypothesis can be written as follows: $H_0: \mathrm{cov}(\eta_i, x_{it}) = 0$. If we accept the null hypothesis,
the individual effects are supposed to be random and we will have to apply generalized least
squares (GLS) to our model with instrumental variable estimators. However, if we find that $H_0$ is
false, the individual effects are fixed and the GLS estimator is biased and inconsistent. In this latter
case we will have to transform our original model, subtracting the average of the variables from it:

$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)\,\beta + (u_{it} - \bar{u}_i)$
With this new model, we can use ordinary least squares (OLS) to estimate its parameters. By
doing so, the model will provide unbiased estimators.
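For illustration, a minimal numpy sketch of this within transformation followed by OLS (names are ours; the study's actual estimation, as noted below, retained the random effects model):

```python
import numpy as np

def within_ols(y, X, firms):
    """Sketch of the within (fixed-effects) transformation: subtract each
    firm's average from y and X, then estimate beta on the transformed
    model by ordinary least squares."""
    y = np.asarray(y, dtype=float).copy()
    X = np.asarray(X, dtype=float).copy()
    firms = np.asarray(firms)
    for f in np.unique(firms):
        rows = firms == f
        y[rows] -= y[rows].mean()            # y_it - y_bar_i
        X[rows] -= X[rows].mean(axis=0)      # x_it - x_bar_i
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```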
The outcome of Hausman's specification test in our study enabled us to reject the hypothesis
of correlation between the unobservable individual effects and the explanatory variables;
the random effects model was therefore chosen.
NOTES
1. M.C. Jensen (1986, p. 323-329) defines free cash flow as the cash available when all projects
with NPV>0 have been realized.
2. In E.I. Altman (1984) direct costs are only one element in the total costs of bankruptcy. Indirect
bankruptcy costs are the lost profits that a firm can be expected to suffer due to a significant
bankruptcy potential (e.g. loss of reputation).
3. SBF is a data set in which sectorial classification is given. We can note 7 industrial sectors:
energy, basic commodities, construction, consumer goods, car industry, other consumer goods,
food industry. With a Chi-square test, we reported no statistical difference.
4. The period has been chosen in reference to the Cadbury report (1992) and Vienot report (1995).
5. The link between economic (stakeholders' interests) and financial (shareholders' interests)
performance lies in the leverage effect. Leverage represents the influence of financial structure
on corporate performance.
6. We measured the ratio of interest expenses to total financial debt and noted no absurd values (that is
to say, values outside market conditions).
7. In particular, we checked if the equity value was always positive. All values were positive.
8. It is a rank test, justified by the non-normality of the data. Such tests are very
robust. By ranking the different observations, one identifies the place of each observation in
the sample and substitutes its rank for the observation. One thus neutralises problems
concerning the accurate measurement of the value of every observation. We can also note
that the results of rank tests are not altered by the shape of the distributions of observations
(symmetrical, non-symmetrical...).
9. A very good synthesis is presented by D. Gujarati, Basic Econometrics, Third Edition,
McGraw-Hill, 499-539.
10. T. Kohonen, Self-Organizing Maps, Springer Series in Information Sciences, Vol. 30, Springer,
Berlin, 1995.
11. The outcome of Hausman's specification test in our study enables us to reject the choice of the
model with fixed effects. The values are respectively 5.09 and 3.47 for ROI and ROE. These
two values are lower than the Chi-square value at the 10% threshold (9.23). This result leads
us to accept the model with random effects.
12. A traditional method is to use a multidimensional analysis. This analysis aims at structuring
the sample studied in relation to the measured variables. Thus, we can visualise the existing
relations between statistical variables. If we apply this method to qualitative variables, we carry
out an MCA (Multiple Correspondence Analysis) (Volle 1981).
13. An extensive presentation can be found in T. Kohonen, "Self-organized formation of
topologically correct feature maps", Biological Cybernetics, 43, 1982, p. 59, or in T. Kohonen,
Self-Organizing Maps, Springer Series in Information Sciences, Vol. 30, Springer, Berlin,
1995.
14. The Kohonen algorithm has led to some theoretical studies (Cottrell et al. 1994; Ritter, Schulten
1988), and to interesting applications in economics (and more specifically in finance). The self-
organized map (SOM) deals with quantitative data; in our paper we adapted it to qualitative
data.
15. For the stochastic algorithm, $\varepsilon(t)$ must satisfy the Robbins-Monro requirements.
REFERENCES
Agrawal A., Knoeber C.R. (1996), "Firm performance and mechanisms to control agency problems
between managers and shareholders", Journal of Financial and Quantitative Analysis, 31(3),
377-397.
Altman E.I. ( 1984), "A Further empirical investigation of the bankruptcy cost question", Journal of
Finance, 39(4), 1067-1089.
Bethel J.E., Liebeskind J. (1993), "The effects of Ownership and Corporate Restructuring",
Strategic Management Journal, 14, Summer, 15-31.
Bhagat S., Black B. (1998), "Board Independence and Long Term Firm Performance", Working
Paper, 143, The Center for Law and Economic Studies, Columbia Law School.
Charreaux G. (1997), Le gouvernement des entreprises: Corporate Governance, Théories et Faits,
Economica.
Cottrell M., Fort J.C., Pages G. (1994), "Two or three things that we know about the Kohonen
algorithm", in Proceedings of ESANN, Brussels, 235-244.
Cottrell M., Fort J.C., Pages G. (1998), "Theoretical Aspects of the SOM Algorithm",
Neurocomputing, 21, 119-138.
Cottrell M., Letremy P., Roy E. (1993), "Analysing a Contingency Table with Kohonen Maps: a
Factorial Correspondence Analysis", in Proceedings of IWANN'93, Springer Verlag, 305-311.
Dormont B. (1989), Introduction à l'économétrie des données de panel: théorie et applications à
des échantillons d'entreprises, Monographies d'économétrie, Éditions du CNRS.
Eisenhardt K., Bourgeois L. (1988), "Politics of strategic decision making in high-velocity
environments: toward a midrange theory", Academy of Management Journal, 31 , 737-770.
Fama E.F., Jensen M.C. (1983a), "Separation of ownership and control", Journal of Law and
Economics, 26, 301-324.
Fort J.C., Pages G. (1995), "About the Kohonen algorithm: strong or weak self-organization",
Neural Networks.
Hermalin B.E., Weisbach M.S. (1988), "The determinants of board composition", Rand Journal of
Economics, 19(4), 589-606.
Jensen M.C. (1986), "Agency costs of free cash flow, corporate finance and takeovers", American
Economic Review, 76, 323-329.
Jensen M.C. (1993), "The modern industrial revolution, exit and the failure of internal control
systems", Journal of Finance, 48(3), 831-880.
Kohonen T. (1982), " Self organized formation of topologically correct feature maps", Biological
Cybernetics, 43, 115-164.
202 Chapter 9
Kohonen T. (1995), Self-Organizing Maps, Springer Series in Information Sciences, 30, Springer, Berlin.
Kosnik R. ( 1990), "Effects of board demography and directors' incentives on corporate greenmail
decisions", Academy of Management Journal, 33, 129-151.
Maati J. (1998), "Le conseil d'administration: outil de contrôle et d'ordonnancement social des
firmes en France", Colloque de l'Association Française de Finance (AFFI).
Mace L. (1986), Directors, myth and reality, Harvard Business School Press, Boston.
Malecot J.-F. (1984), "Theorie financiere et couts de faillite", PhD Thesis, France.
McConnell J.J., Servaes H. (1990), "Additional evidence on equity ownership and corporate
value", Journal of Financial Economics, 27, 595-612.
Modigliani F., Miller M., (1963), "Corporate income taxes and the cost of capital: a correction",
American Economic Review, 53, 433-443 .
Morck R., Shleifer A., Vishny R.W. (1988), "Management Ownership and Market Valuation",
Journal of Financial Economics, 20, 293-315.
Opler T., Titman S. (1994), "Financial Distress and Corporate Performance", Journal of Finance,
49(3), 1015-1040.
Patton A., Baker J. (1987), "Why won't directors rock the boat?", Harvard Business Review, 65,
10-12.
Ritter H., Schulten K. (1988), "Convergence Properties of Kohonen's Topology Conserving Maps:
Fluctuations, Stability and Dimension Selection", Biological Cybernetics, 60, 59-71.
Robbins H., Monro S. (1951), "A Stochastic Approximation Method", Annals of Mathematical
Statistics, 22, 400-407.
Rosenstein S., Wyatt J.G. (1990), "Outside directors, board independence and shareholder wealth",
Journal of Financial Economics, 26, 175-191.
Rosenstein S., Wyatt J.G. (1997), "Inside directors, board effectiveness and shareholder wealth",
Journal of Financial Economics, 44, 229-250.
Sevestre P. (1992), "L'econometrie sur donnees individuelles-temporelles: une note introductive",
INSEE, 9204.
Shleifer A., Vishny R.W. (1986), "Large shareholders and corporate control", Journal of Political
Economy, 94,461-479.
Short H., Keasey K. (1999), "Managerial ownership and the performance of firms: evidence from
the UK", Journal of Corporate Finance, 5, 79-101.
Sundaramuthy C., Mahoney J., Mahoney J. (1997), "Board structure, antitakeover provisions and
stockholder wealth", Strategic Management Journal, 18, 3, 231-245.
Titman S., Wessels R. (1988), "The determinants of capital structure choice", Journal of Finance,
43(1), 1-19.
Vancil R. (1987), Passing the baton: managing the process of CEO succession, Harvard Business
School Press, Boston.
Volle M. (1981), Analyse des données, Economica, 2nd ed., Paris.
Weisbach M.S. (1988), "Outside directors and CEO turnover", Journal of Financial Economics,
20,431-460.
Wruck K.H. (1990), "Financial distress, reorganization, and organizational efficiency", Journal of
Financial Economics, 27,419-444.
Chapter 10
Key words: Option pricing, Artificial Neural Networks, Radial-Basis Function Networks, Non-
linear approximation, Preprocessing techniques
INTRODUCTION
The most common type of approximator is the linear approximator. It has the
advantage of being simple and cheap in terms of computational load, but it is
obviously not reliable if the true relation between the inputs and the output is
nonlinear. One then has to rely on nonlinear approximators such as artificial neural
networks.
The most popular artificial neural networks are the multilayer perceptrons
(MLP) developed by Werbos (1974) and Rumelhart (1986). In this chapter, we
will use another type of neural network: the radial basis function network (or
RBFN) (Powell 1987). These networks have the advantage of being much
simpler than perceptrons while keeping the major property of universal
approximation of functions (Poggio, Girosi 1987). Numerous techniques have
been developed for RBFN learning. The technique that we have chosen was
developed by Verleysen and Hlavackova (1994); it is undoubtedly
one of the simplest ones but it gives very good results. The RBFN and the
chosen learning technique will be presented in section 1.
We will demonstrate that the results obtained with RBFN can be improved by
a specific pre-treatment of the inputs. This pre-treatment technique is based on
linear models. It does not complicate the RBFN learning but yields very good
results. The pre-treatment technique will be presented in section 2.
These different techniques will be applied to option pricing. This problem has
been successfully handled by, for instance, Hutchinson, Lo and Poggio (1994), a
work that has surely contributed widely to giving credibility to the use of artificial
neural networks in finance; the existence of a chapter dedicated to neural
networks in the work of Campbell, Lo and MacKinlay (1997) sufficiently attests
to it. Hutchinson et al., notably using simulated data, have demonstrated that
RBFN make it possible to price options and also to form hedged portfolios. The
authors' choice of the determination of a call option price as an application
domain of neural networks in finance is certainly not an accident: financial
derivative assets are characterized by the nonlinear relation that
links their prices to the prices of the underlying assets. The results that we obtain
are comparable to those of Hutchinson et al. but with a simplified learning
process. We will demonstrate with this example the advantages of our technique
of data pre-treatment. This example will be handled in detail in section 3.
1. APPROXIMATION BY RBFN
The output of an RBFN is approximated as a weighted sum of radial basis functions:

$\hat{y}_t = \sum_{i=1}^{m} \lambda_i \, \Phi(x_t, C_i, \sigma_i), \qquad t = 1 \ldots N$  (Eq. 1)

with Gaussian kernels

$\Phi(x_t, C_i, \sigma_i) = \exp\!\big(-\|x_t - C_i\|^2 / 2\sigma_i^2\big)$  (Eq. 2)

The centroids $C_i$ are placed by vector quantization, using updates of the form

$C_i(t+1) = C_i(t) + \alpha\,(x_t - C_i(t))$  (Eq. 3)

with $x_t$ the considered point, $C_i$ the closest centroid to $x_t$, and $\alpha$ a parameter that
decreases with time. Further details on vector quantization methods can be found
in Kohonen (1995) and Gray (1984).
The second parameter to be chosen is the standard deviation (or width) of the
different Gaussian kernels ($\sigma_i$). We chose to work with a different width for each
node. To estimate them, we define the Voronoï zone of a centroid as the region
of space closer to this centroid than to any other centroid. In each of
these Voronoï zones, the variance of the points belonging to that zone is
calculated. The width of a Gaussian kernel will then be the variance in
the Voronoï zone where the node is located, multiplied by a factor k. We will
explain in our application how to choose this parameter (Benoudjit, Archambeau,
Lendasse et al. 2002). This method has several advantages, the most important
being that the Gaussian kernels better cover the space of the RBFN inputs.
The last parameters to determine are the multiplicative factors $\lambda_i$. When all
other parameters are defined, these are determined by solving a system of
linear equations.
The total number of parameters equals $m(n+1)+1$, with $n$ being the
dimension of the input space and $m$ being the number of Gaussian kernels used
in the RBFN.
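To fix ideas, here is a minimal Python sketch of the whole estimation chain described above (vector quantization for the centroids, Voronoï-zone variances scaled by k for the widths, and the λi obtained by linear least squares); all names, the adaptation schedule, and the handling of empty zones are our own assumptions:

```python
import numpy as np

def fit_rbfn(X, y, m=10, k=4.0, n_vq=10000, seed=0):
    """Sketch of an RBFN fit: centroids C_i by simple competitive vector
    quantization (Eq. 3), widths as k times the variance inside each
    Voronoi zone, weights lambda_i by linear least squares."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    C = X[rng.choice(N, m, replace=False)].astype(float)
    for t in range(n_vq):                           # competitive VQ updates
        x = X[rng.integers(N)]
        i = np.argmin(np.linalg.norm(C - x, axis=1))
        C[i] += 0.5 * (1 - t / n_vq) * (x - C[i])   # alpha decreases with time
    # width of each kernel: variance of the points in its Voronoi zone, times k
    dist2 = ((X[:, None, :] - C[None]) ** 2).sum(-1)
    labels = np.argmin(dist2, axis=1)
    sigma2 = k * np.array([X[labels == i].var() if np.any(labels == i) else 1.0
                           for i in range(m)])
    Phi = np.exp(-dist2 / (2 * sigma2))             # Gaussian kernels (Eq. 2)
    lam, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # solve the linear system
    return C, sigma2, lam
```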
One of the disadvantages of the RBFN that we have presented is that it
gives equal importance to all input variables. This is not the case with other
function approximators such as the MLP. We will try to eliminate this
disadvantage without penalizing the parameter estimation process of the RBFN.
Let us first suppose that all inputs are normalized, by which we mean that
they all have zero mean and unit variance. If we build a linear model between the
inputs and the output, the latter will be approximated by a weighted sum of the
different inputs. The weighting associated with each input determines the
importance that the input has on the approximation of the output. Indeed, if one
differentiates the linear model with respect to the different inputs, one recovers
these very same weightings. This is illustrated in the following example:
$\hat{y} = w_1 x_1 + w_2 x_2$  (Eq. 4)

which yields:

$\partial \hat{y} / \partial x_1 = w_1$  (Eq. 5)

$\partial \hat{y} / \partial x_2 = w_2$  (Eq. 6)
We thus have a very simple means of determining the relative importance that
the different inputs have on the output.
We then multiply the different normalized inputs by the weighting factors
obtained from the linear model. These new inputs will be used in an RBFN such as
the one presented in the previous section. This new RBFN, which we will
qualify as «weighted», will thus give a different importance to the different input
variables.
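A minimal sketch of this pre-treatment, assuming the inputs are stored in a matrix X and the output in a vector y (the function name is ours):

```python
import numpy as np

def weight_inputs(X, y):
    """Normalize the inputs, fit a linear model, then rescale each
    normalized input by the weight the linear model assigns to it."""
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)       # zero mean, unit variance
    w, *_ = np.linalg.lstsq(Xn, y - y.mean(), rcond=None)
    return Xn * w                                   # weighted inputs for the RBFN
```

The returned weighted inputs are then fed to the RBFN exactly as in the previous section.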
3. OPTION PRICING
The initial success of neural networks in finance has most surely been
motivated by the numerous applications presented in the field of asset price
prediction (Cottrell, de Bodt and Levasseur (1996) present a wide synthesis of
the results obtained in this field). The emergence of nonlinear forecasting tools
with universal approximation properties, though obviously not well understood,
brought new hopes. Quickly though, it appeared that forecasting the price of assets
remains an extremely complex problem and that the concept of informational
efficiency of financial markets introduced by Fama (1965) is no idle phrase:
outperforming financial markets, after taking account of transaction costs
and the level of risk taken, is not simple.
The application studied in this section, the modeling of the behavior of the
price of a call option, as developed by Hutchinson, Lo and Poggio (1994),
presents a typical case of application of neural networks in finance. The prices of
derivatives depend nonlinearly on the price of the underlying assets. Major
advances have been made in finance to set up analytical evaluation formulas
for derivative assets, the most famous being undoubtedly the one established by
Black and Scholes (1973), used daily nowadays by the majority of financial
operators. Evaluation formulas for option prices are based on very strict
assumptions, among which, for example, the fact that stock prices follow a
geometric Brownian motion. The fact that these assumptions are not strictly
verified in practice explains why the prices observed on financial markets deviate
more or less significantly from the theoretical prices. In this context, a
universal function approximator, capable of capturing the nonlinear relation
that links an option price to the price of its underlying asset, but that does not rely
on the assumptions necessary for deriving analytic formulas, presents an
obvious interest. It is however necessary that the proposed tool be reliable and
robust for it to be adopted by the financial community. This is indeed our major
concern.
The Black and Scholes price of a European call is

$C(t) = S(t)\,\Phi(d_1) - X e^{-r(T-t)}\,\Phi(d_2)$  (Eq. 7)

with

$d_1 = \dfrac{\ln(S(t)/X) + (r + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}$  (Eq. 8)

and

$d_2 = d_1 - \sigma\sqrt{T-t}$  (Eq. 9)
In the above formulas, $C(t)$ is the option price, $S(t)$ the stock price, $X$ the strike
price, $r$ the risk-free interest rate, $T-t$ the time-to-maturity, $\sigma$ the volatility and $\Phi$
the standard normal cumulative distribution function. If $r$ and $\sigma$ are stable, which is the
case in our simulations, the price of the call option will only be a function of $S(t)$, $X$
and $T-t$. The approximation that has been chosen is the following:

$C(t)/X = f\big(S(t)/X,\; T-t\big)$  (Eq. 10)
For our simulation, the prices of the option during a period of two years are
generated, in a classical way, by the following formula:

$S(t) = S(0)\, e^{\sum_{i=1}^{t} Z_i}$  (Eq. 11)

taking the number of working days per year equal to 253, and $Z_i$ a random
variable drawn from a normal distribution with mean $\mu = 0.10/253$ and variance
$\sigma^2 = 0.04/253$. The value $S(0)$ equals 50 US$.
The strike price X and the time-to-maturity T-t are determined by the rules of
the «Chicago Board Options Exchange» (CBOE) (Hull 1993). In short, the rules
are the following:
1. The strike price is a multiple of 5$ for the stock prices between 25 and 200$;
2. The two closest strike prices to the stock prices are used at each expiration of
options;
3. A third strike price is used when the stock price is too close to the strike price
(less than one dollar);
4. Four expiration dates are used: the end of the current month, the end of the
next month and the end of the next two semesters.
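As an illustration, the following sketch simulates one trajectory according to Eq. 11 and prices the corresponding call with the Black and Scholes formula (Eqs. 7-9). The risk-free rate r and the single fixed strike are our own assumptions, chosen only to make the example self-contained; the chapter itself generates strikes and maturities from the CBOE rules above.

```python
import numpy as np
from scipy.stats import norm

def simulate_and_price(S0=50.0, mu=0.10, var=0.04, r=0.05,
                       days=2 * 253, X=50.0, seed=0):
    """Simulate S(t) = S(0) exp(sum Z_i) with daily normal increments
    (Eq. 11) and compute the Black-Scholes call price along the path."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(mu / 253, np.sqrt(var / 253), size=days)
    S = S0 * np.exp(np.cumsum(Z))                  # simulated trajectory
    sigma = np.sqrt(var)                           # annual volatility
    T_t = (days - np.arange(1, days + 1)) / 253    # time to maturity (years)
    T_t = np.maximum(T_t, 1e-9)                    # avoid division by zero
    d1 = (np.log(S / X) + (r + sigma**2 / 2) * T_t) / (sigma * np.sqrt(T_t))
    d2 = d1 - sigma * np.sqrt(T_t)
    C = S * norm.cdf(d1) - X * np.exp(-r * T_t) * norm.cdf(d2)
    return S, C
```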
Figure 2. Simulated stock price trajectory and the corresponding exercise prices.
Note: The continuous line represents the stock price. The oblique lines represent the different
exercise prices. These are represented obliquely to make visible the different introduction and
expiration dates.
The call option prices obtained using these trajectories are represented in
Figure 3.
Figure 3. Option purchase prices, as a function of S/X and T-t, obtained by using the simulated
trajectories and the Black and Scholes formula.
$V(t) = V_S(t) + V_B(t) + V_C(t)$  (Eq. 12)

with $V(t)$ being the portfolio value at time $t$, $V_S$ the stock value, $V_B$ the bond
value, and $V_C$ the option value. If the option price is correctly evaluated, $V(t)$
should at any time be equal to zero, given that it is a fully hedged portfolio. The
more the tracking error ($\xi$) deviates from 0, the more the option price thus
deviates from its theoretical value. The prediction error is based on the classical
formula of variance decomposition (the variance is equal to the difference
between the expectation of the squared variable and its squared expectation). The
expected squared $V(T)$, in other words the average quadratic prediction error,
thus equals the sum of its squared expectation and its variance. The terms $e^{r\tau}$
represent the discounting terms in continuous time, allowing the addition of
results obtained at different moments in time. A more detailed explanation of
these criteria can be found in Hutchinson, Lo, Poggio (1994).
3.3 Results
The results obtained for R² (averaged over the one hundred test sets) are
presented in Figure 4, as a function of k, the coefficient used to compute the
width of the nodes. The value of k to be used is chosen as the smallest value
giving a result (in terms of R²) close to the asymptote, that is to say a value
located in the elbow of the curves in Figure 4. The value k = 4 has been
chosen in this case.
Figure 4. Average R² as a function of the width factor k for both types of RBFN.
The results obtained for ξ and η are also in favour of the weighted RBFN.
Table 1 presents the average values and the standard deviations of R², ξ and η for
both types of RBFN. As for the performance measures of the exact Black and
Scholes formula, we have ξ = 0.57 and η = 0.85.
Table 1. Average values and standard deviations of R², ξ and η for both types of RBFN.
CONCLUSION
ACKNOWLEDGMENTS
REFERENCES
Benoudjit N., Archambeau C., Lendasse A., Lee J., Verleysen M. (2002), "Width optimization of
the Gaussian kernels in Radial Basis Function Networks", ESANN 2002, European Symposium
on Artificial Neural Networks, Bruges (Belgium), 425-432.
Black F., Scholes M. (1973), "The pricing of options and corporate liabilities", Journal of Political
Economy, 81, 637-659.
Campbell J., Lo A., MacKinlay A. (1997), The Econometrics of Financial Markets, Princeton
University Press, Princeton.
214 Chapter 10
Cottrell M., de Bodt E., Levasseur M. (1996), "Les réseaux de neurones en finance: Principes et
revue de la littérature", Finance, 16, 25-92.
Fama E. (1965), "The Behaviour of Stock Market Prices", Journal of Business, 38, 34-105.
Gray R. (1984), "Vector Quantization", IEEE Mag., 1, 4-29.
Hull J. (1993), Options, Futures, and Other Derivative Securities, 2nd ed., Prentice-Hall,
Englewood Cliffs, New Jersey.
Hutchinson J., Lo A., Poggio T. (1994), "A Nonparametric Approach to Pricing and Hedging
Securities Via Learning Networks", The Journal of Finance, XLIX(3).
Kohonen T. (1995), "Self-organising Maps", Springer Series in Information Sciences, 30, Springer,
Berlin.
Poggio T., Girosi F. (1987), "Networks for approximation and learning", Proceedings of IEEE 87,
1481-1497.
Powell M. (1987), "Radial basis functions for multivariable interpolation: A review", in Mason
J.C., Cox M.G. (eds.), Algorithms for Approximation, 143-167.
Rumelhart D., Hinton G., Williams R. (1986), "Learning representation by back-propagating
errors", Nature, 323, 533-536.
Verleysen M., Hlavackova K. (1994), "An Optimised RBF Network for Approximation of
Functions", ESANN 1994, European Symposium on Artificial Neural Networks, Brussels
(Belgium), 175-180.
Werbos P. (1974), "Beyond regression: new tools for prediction and analysis in the behavioral
sciences", PhD thesis, Harvard University.
Chapter 11
Abstract: The movements of a term structure of interest rates are commonly assumed to be
driven by a small number of uncorrelated factors. Identified with the level, the slope,
and the curvature, these factors are routinely obtained by a Principal Component
Analysis (PCA) of historical bond prices (interest rates). In this paper, we focus on
Independent Component Analysis (ICA). The central assumption here is that the
observed multivariate time series reflect the reaction of a system to a few
statistically independent time series. ICA seeks to extract the independent
components (ICs) as well as the mixing process. Both ICA and PCA are linear
transforms of the observed series. But, whereas PCA obtains uncorrelated
(principal) components, ICA provides statistically independent components. In
contrast to PCA algorithms, which use only second order statistical information, ICA
algorithms (like JADE) exploit higher order statistical information to separate the
signals. This approach is required when financial data are suspected not to be
Gaussian.
Key words: Term Structure of Interest Rates, Principal Component Analysis, Independent
Component Analysis.
INTRODUCTION
It is now well established that factor analysis is adequate for financial risk
management purposes. Factor Analysis and Principal Component Analysis (FA, PCA) are
celebrated statistical methods for extracting factors from data and for measuring the
way each factor affects, or loads on, the variables. Such analyses are nowadays widely
used to decompose the dynamics of the term structure of interest rates into a few
underlying components. Among other things, this indicates that interest rate models
should not be so parsimonious as to have only a single underlying source of
uncertainty. Hedging fixed income securities nevertheless remains straightforward by
introducing the so-called factor durations. Analogous to the standard Macaulay
duration, they are easy to compute. In addition, the factor durations of a portfolio are
the weighted averages of the factor durations of its components.
As explained by Bliss (1997), economic variables that may affect interest
rate dynamics include (along with innumerable others) the supply of and demand
for loans, announcements of unemployment and inflation, and changes in market
participants' risk aversion arising from perceived changes in the prospects for
continued economic growth. The key assumption of a factor model is that this
multitude of influences can be compactly summarized by a few variables, called
factors, that capture the changes in the underlying determinants of interest rates.
Part of the process of performing a factor analysis is to examine just how
reasonable that assumption is. The factor shocks are not the fundamental causes
of changes in the term structure; rather, they are sufficient statistics for fully
capturing the underlying economic shocks that do cause the changes.
Rather than PCA or FA, this paper applies Independent Component Analysis
(ICA) to extract a structure from bond returns (see Back and Weigend (1997) for
stock returns and Ane and Labidi (2001) for implied volatilities), because this
method is an alternative approach for finding underlying factors or components
in multivariate statistical data. ICA must be contrasted with the classical
Principal Component Analysis (hereafter PCA). Both ICA and PCA linearly
transform the observed signals into components. The key difference, however, is
in the type of components obtained.
The goal of PCA is to obtain components which are uncorrelated1. PCA
algorithms therefore use only second order statistical information. According to
several papers, e.g. Bliss (1997), the three-factor decomposition of the
movements in interest rates uncovered by Litterman and Scheinkman (1991) is
robust: the ability of the three factors (level, slope, and curvature) to explain
observed changes in interest rates is high in all subperiods studied. ICA finds
components which are statistically independent. Independence being a much
stronger property than uncorrelatedness, PCA and ICA may generate different
components. Several algorithms are nowadays available. Recent contributions to
ICA lie in the neural network literature. Most of those found in the signal
processing literature rather exploit the algebraic structure of higher order
moments of the observations. Cardoso (1999) pointed out that this last approach
is largely ignored by the neural network community involved in ICA. He corrects
this view by showing how higher-order correlations can be efficiently exploited
to reveal independent components.
The empirical study carried out in this paper uses the JADE2 algorithm
(Cardoso, Souloumiac 1993). Like many other ICA algorithms, this algorithm
follows a two-stage procedure termed whitening and rotation. The first stage is
performed by doing a classical PCA to whiten the observed data. The second
stage consists of finding a rotation matrix which jointly diagonalizes
eigenmatrices formed from the fourth order cumulants of the whitened data. The
outputs from this stage are the independent components. Thus one can say that a
PC analysis solves half of the problem of IC analysis.
This paper is organized in the following way. Section 1 provides a
background to ICA and a guide to some of the algorithms available. Our specific
experimental results for the application of ICA to US bond data are given in
section 2.
The goal of ICA is to recover unknown source signals from their unknown
linear mixtures using the strong assumption that the sources are mutually
independent. The mutual statistical independence of the original signals is thus
the only assumption in such methods.
Let $s(t) = (s_1(t), s_2(t), \ldots, s_n(t))$ denote the source signals
and $x(t) = (x_1(t), x_2(t), \ldots, x_n(t))$ the variables observed at date $t$. In the
standard ICA literature $s$ is called the source signal and $x$ the mixed signal.
While this is not completely necessary, the number of independent components is
supposed to be equal to the number of observed variables, $m = n$. In addition, and
without loss of generality, the observed variables are supposed to be centered3.
Assuming a linear relationship between $x$ and $s$ implies, with vector-matrix
notation, that the data model is given by:

$x(t) = A\,s(t)$

where $A$ is the so-called mixing matrix, an unknown ($n \times n$) real matrix.
Omitting $t$, one also has:

$x = \sum_{j=1}^{n} a_j s_j$
The ICA model is a generative model in the sense that it describes how the
observed data are generated by mixing underlying components4. Such
independent components are latent variables since they are not directly
observable; recall moreover that the mixing matrix is unknown. Because one
observes only the random vector $x$, both $A$ and $s$ must be estimated. This must be
done under assumptions as general as possible. Throughout the paper, the main
assumptions are:
- the sources are statistically independent;
- at most one source is normally distributed;
- the signals are stationary.
The ICA task consists of estimating both the matrix $A$ and the underlying factors $s$
when we only observe $x$. If the coefficients of $A$ were known and the matrix $A$
invertible, $B = A^{-1}$ could be computed and the independent factors we are
looking for would be given by:

$s(t) = B\,x(t)$

However, as the only thing we observe is $x$, only a linear transformation:

$y(t) = W\,x(t)$

may be considered, where $y(t) = (y_1(t), y_2(t), \ldots, y_n(t))'$ is a recovered signal. As this
signal is also related to the underlying factors, one has $y(t) = WAs(t)$. If $WA = I$
then $y = s$ and perfect separation occurs. The problem is finding $W$, the unmixing
matrix that gives $y = Wx$, the best estimate of the independent source vector. If
the unknown mixing matrix $A$ is square and nonsingular, then $W = A^{-1}$ and $s = y$.
Otherwise, the best unmixing matrix, which separates sources as independently as
possible, may be given by a pseudo-inverse matrix (such as the Moore-Penrose
generalized inverse).
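A toy numeric illustration of this point (the mixing matrix A is known here only for the purpose of the check; in practice it must be estimated):

```python
import numpy as np

# With W = A^{-1}, y = W x recovers the sources exactly.
rng = np.random.default_rng(0)
s = rng.laplace(size=(1000, 3))        # non-gaussian sources
A = rng.normal(size=(3, 3))            # mixing matrix (known here only for the check)
x = s @ A.T                            # observed mixtures: x = A s
W = np.linalg.inv(A)                   # np.linalg.pinv(A) if A is not square
y = x @ W.T
print(np.allclose(y, s))               # True: perfect separation
```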
Solutions can be found just by considering the statistical independence of $s$. In
fact, if the signals are not Gaussian, standard results tell us that it is enough to find
coefficients for $W$ such that the recovered signals $(y_1(t), y_2(t), \ldots, y_n(t))$ are
statistically independent. Once they are, the recovered signal $y$ is equal to the
original signals $s$5. The separation problem thus boils down to the search for a
linear representation in which the components are statistically independent. In
most practical situations, however, one can find only components that are as
independent as possible.
The potential interest of ICA over PCA lies in the difference between
independence and uncorrelatedness, so some reminders are in order.
Let us consider two scalar-valued random variables $y_1$ and $y_2$. These variables are
said to be independent if information on the value of $y_1$ does not give any
information on the value of $y_2$, and vice versa. Let us denote by $p(y_1, y_2)$ the joint
probability density function (hereafter pdf) of $y_1$ and $y_2$, and by $p_1(y_1)$ the marginal
pdf of $y_1$, given by $p_1(y_1) = \int p(y_1, y_2)\,dy_2$, and similarly for $y_2$. $y_1$ and
$y_2$ are then said to be independent if and only if the joint pdf is factorizable in the
following way:

$p(y_1, y_2) = p_1(y_1)\,p_2(y_2)$
If the variables are independent, they are uncorrelated; this follows directly by
taking $h_i(y_i) = y_i$ in the factorized expectation
$E[h_1(y_1)\,h_2(y_2)] = E[h_1(y_1)]\,E[h_2(y_2)]$. On the other hand, uncorrelatedness
does not imply independence. Assume that $(y_1, y_2)$ are discrete valued and
distributed so that the pair is, with probability 1/4, equal to any of the following
values: (0,1), (0,-1), (1,0), (-1,0). Then $y_1$ and $y_2$ are uncorrelated, as can be
simply calculated. On the other hand, $E[y_1^2 y_2^2] = 0 \neq 1/4 = E[y_1^2]\,E[y_2^2]$, so
the variables cannot be independent. Since independence implies uncorrelatedness,
many ICA methods constrain the estimation procedure so that it always gives
uncorrelated estimates of the independent components. This reduces the number
of free parameters and simplifies the problem.
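A quick numeric check of this example in Python:

```python
import numpy as np

# Four equiprobable points that are uncorrelated yet dependent.
pts = np.array([(0, 1), (0, -1), (1, 0), (-1, 0)], dtype=float)
y1, y2 = pts[:, 0], pts[:, 1]
print(np.mean(y1 * y2))                    # 0.0  -> uncorrelated
print(np.mean(y1**2 * y2**2))              # 0.0
print(np.mean(y1**2) * np.mean(y2**2))     # 0.25 -> hence not independent
```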
Some ambiguities must be kept in mind when dealing with ICA. The first one
is that the variances (energies) of the independent components cannot be
determined. The reason is that any scalar multiplier in one of the sources $s_i$ could
always be cancelled by dividing the corresponding column $a_i$ of $A$ by the same
scalar. Whitening (sphering) of the independent components, i.e. choosing all
variances equal to one, $E[s_i^2] = 1$, solves this ambiguity. This establishes equal
"magnitudes" of the sources, but the ambiguity of the sign remains, as we can
multiply any source by -1 without affecting the model; this is insignificant in most
applications.
A second ambiguity is that we cannot determine the order of the independent
components. In fact we can freely change the order of the terms in the sum
$x = \sum_{j=1}^{n} a_j s_j$ and call any of the independent components the first one. A
permutation matrix $P$ and its inverse can be substituted in the model to give
$x = AP^{-1}Ps = A's'$ with $s' = Ps$, $A' = AP^{-1}$. The sources are then identified by
using a priori knowledge about their features.
2. ESTIMATION METHODS
Many different approaches may be followed to estimate the ICs and the "mixing
process", each of which exploits different properties of the independent components.
In what follows, we insist on two ways of finding the ICs and the mixing matrix:
the maximisation of nongaussianity measures and tensorial methods.
The well-known Central Limit Theorem tells us that the distribution of a sum
of independent random variables, arbitrarily distributed, tends toward a Gaussian
distribution under rather general conditions. This key result, together with the
non-gaussianity of the underlying sources, allows us to find the "demixing matrix" and
the sources we are looking for. In essence, the theorem explains that a Gaussian
distribution can be considered as the result of the mixing of many independent
variables. Conversely, a transformation that yields a distribution as far away as
possible from gaussianity can be seen as separating independent components.
To see this, let us consider $x$ distributed according to the ICA data model $x = As$
and let us assume for simplicity that all the ICs have identical distributions. To
estimate one of the ICs, we consider a linear combination of the $x_i$:
$y = b'x = \sum_i b_i x_i$. For a discrete-valued random variable $Y$, entropy is defined by
$H(Y) = -\sum_i P(Y = a_i)\,\log P(Y = a_i)$, where the $a_i$ are the possible values of $Y$. For a
continuous-valued random vector $y$ with density $f(y)$, (differential) entropy is defined by

$H(y) = -\int f(y)\,\ln f(y)\,dy.$
A fundamental result of information theory states that Gaussian variables have the
largest entropy among all random variables of equal variance, so entropy can be
used as a measure of nongaussianity: the Gaussian distribution is the most
random of all distributions, while entropy is small for distributions that are clearly
concentrated on certain values, i.e. when the variable is clearly clustered. A
closely related measure is the negentropy. The negentropy $J$ of a random variable $y$ is
defined as:

$J(y) = H(y_{\mathrm{gauss}}) - H(y)$

where $y_{\mathrm{gauss}}$ is a Gaussian variable with the same covariance as $y$.
$\mathrm{Cum}(x_1,x_2,x_3,x_4) = E[\bar{x}_1\bar{x}_2\bar{x}_3\bar{x}_4]
- E[\bar{x}_1\bar{x}_2]E[\bar{x}_3\bar{x}_4]
- E[\bar{x}_1\bar{x}_3]E[\bar{x}_2\bar{x}_4]
- E[\bar{x}_1\bar{x}_4]E[\bar{x}_2\bar{x}_3]$

where $\bar{x}_i = x_i - E[x_i]$.
Recall that for symmetric distributions odd-order cumulants are zero and that
second order cumulants are $\mathrm{Cum}(x_1, x_2) = E[\bar{x}_1\bar{x}_2]$. The variance and the kurtosis of
a real random variable $x$ are thus defined as $\sigma^2(x) = \mathrm{Cum}(x, x)$ and
$k(x) = \mathrm{Cum}(x, x, x, x) = E[\bar{x}^4] - 3\,(E[\bar{x}^2])^2$.
Under $x = As$, which also reads $x_i = \sum_j a_{ij} s_j$ where $a_{ij}$ denotes the $ij$-th entry
of $A$, the structure of a cumulant matrix in the ICA model is easily deduced from this
last equation:
$T_x(M) = A\,\Lambda(M)\,A'$

with $\Lambda(M) = \mathrm{diag}\big(k(s_1)\,a_1' M a_1, \ldots, k(s_n)\,a_n' M a_n\big)$, where $a_i$ denotes the $i$-th
column of $A$.
In this factorization the kurtoses enter only in the diagonal matrix. Solving for
the eigenvectors of such eigenmatrices, the ICA model can be estimated. This is
typically a joint diagonalization of several matrices, but the main difficulty is that
$A$ is not an orthogonal matrix. The Joint Approximate Diagonalization of Eigen-
Matrices algorithm (JADE) uses the fact that any linear mixture of the
independent components can be transformed into white components, in which
case the mixing matrix is orthogonal.
3. JADE ALGORITHM
This section outlines the JADE algorithm (Cardoso and Souloumiac, 1993).
The approach for the JADE algorithm is the following two-stage procedure:
Whitening and Rotation.
3.1 Whitening
That is $Q = \Lambda^{-1/2}\,\Gamma'$, where $\Gamma$ is the matrix of eigenvectors of the covariance matrix of the observations and $\Lambda$ the diagonal matrix of its eigenvalues. It is easy to check that $E[\tilde{x}\tilde{x}'] = I$ by taking the expectation of $\tilde{x}\tilde{x}' = Q\,xx'\,Q'$.
A couple of remarks are in order. First, whitening alone does not solve the
separation problem, because whitening is only defined up to an additional
rotation: if $Q_1$ is a whitening matrix, then $Q_2 = UQ_1$ is also a whitening matrix if
and only if $U$ is an orthogonal matrix. Therefore, we have to find the particular
whitening matrix that actually separates the independent components. This is
done by first finding any whitening matrix $Q$, and later determining the
appropriate orthogonal transformation from a suitable non-quadratic criterion.
Second, whitening reduces the number of parameters to be estimated: instead of
having to estimate the $n^2$ parameters that are the elements of the original matrix
$A$, we only need to estimate the new, orthogonal mixing matrix $\tilde{A}$.
An orthogonal matrix contains only $n(n-1)/2$ degrees of freedom.
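A minimal numpy sketch of this whitening stage (the function name is ours):

```python
import numpy as np

def whiten(X):
    """PCA whitening: eigendecompose the sample covariance of the
    centered data and rescale, so that the whitened series have
    identity covariance (up to sampling error)."""
    Xc = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Q = np.diag(eigval ** -0.5) @ eigvec.T      # Q = Lambda^{-1/2} Gamma'
    return Xc @ Q.T, Q                          # x_tilde = Q x
```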
3.2 Rotation
For any $n \times n$ matrix $M$, we can define the associated cumulant matrix $T_x(M)$,
defined component-wise by:

$[T_x(M)]_{ij} = \sum_{k,l} \mathrm{Cum}(x_i, x_j, x_k, x_l)\,M_{kl}$
where the subscript $ij$ denotes the $(i,j)$-th element of the matrix $T$. We have shown
that whitening leads to the model $\tilde{x} = \tilde{A}s$ with $\tilde{A}$ orthonormal. This model is
still a model of independent components. From section 1.2, the structure of the
corresponding cumulant matrix of $\tilde{x}$ can be written as:

$T_{\tilde{x}}(M) = \tilde{A}\,\Lambda(M)\,\tilde{A}'$

with $\Lambda(M) = \mathrm{diag}\big(k(s_1)\,\tilde{a}_1' M \tilde{a}_1, \ldots, k(s_n)\,\tilde{a}_n' M \tilde{a}_n\big)$, where $\tilde{a}_i$ denotes the $i$-th
column of $\tilde{A}$, and this for any $n \times n$ matrix $M$.
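For whitened, zero-mean data, the cumulant matrix above can be estimated from sample moments, using the identity $\mathrm{Cum}(x_i,x_j,x_k,x_l) = E[x_i x_j x_k x_l] - \delta_{ij}\delta_{kl} - \delta_{ik}\delta_{jl} - \delta_{il}\delta_{jk}$; a sketch of our own (this is not Cardoso's code):

```python
import numpy as np

def cumulant_matrix(Xw, M):
    """Sample estimate of [T(M)]_ij = sum_kl Cum(x_i,x_j,x_k,x_l) M_kl
    for whitened, zero-mean data Xw of shape (T, n)."""
    T, n = Xw.shape
    q = np.einsum('ti,ij,tj->t', Xw, M, Xw)      # quadratic form x' M x per date
    E4 = (Xw * q[:, None]).T @ Xw / T            # E[(x' M x) x x']
    return E4 - np.trace(M) * np.eye(n) - M - M.T
```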
Let $\Pi = \{M_1, \ldots, M_p\}$ be a set of $p$ matrices of size $n \times n$ and denote by $T_{\tilde{x}}(M_i)$
($1 \le i \le p$) the associated cumulant matrices of the whitened data $\tilde{x}$. Again, for
all $i$, we have $T_{\tilde{x}}(M_i) = \tilde{A}\,\Lambda(M_i)\,\tilde{A}'$, with $\Lambda(M_i)$ a diagonal matrix. As a
measure of non-diagonality of a matrix $H$, define $\mathrm{Off}(H)$ as the sum of the squares
of the non-diagonal elements:

$\mathrm{Off}(H) = \sum_{i \neq j} H_{ij}^2$

We have in particular $\mathrm{Off}\big(\tilde{A}'\,T_{\tilde{x}}(M_i)\,\tilde{A}\big) = \mathrm{Off}\big(\Lambda(M_i)\big) = 0$.
For any matrix set $\Pi$ and any orthonormal matrix $V$, the JADE algorithm
optimizes the so-called orthogonal contrast:

$C(V, \Pi) = \sum_{i=1}^{p} \mathrm{Off}\big(V'\,T_{\tilde{x}}(M_i)\,V\big)$
[Figure: loadings of the first three principal components (F1, F2, F3) across maturities TCM3 to TCM30.]
The first PC accounts for 83.14% of the total variability, the second PC for
about 15.86%, and the first three together for 99.74%.
The first factor loading is very close to being constant across all maturities.
Since Factor 1 has roughly equal effects on all maturities, a change in Factor 1
will produce a parallel movement in all interest rates. For this reason Factor 1 can
be interpreted as a level factor, producing changes in the overall level of interest
rates. The loadings on the second factor decrease uniformly from a relatively
large positive value at the short end of maturities to a negative value at the
longest maturities.
This pattern of decreasing loadings is consistent with interpreting Factor 2 as
a slope factor, affecting the slope of the term structure but not the average level
of interest rates. Factor 2 produces movements in the long and short ends of the
term structure in opposite directions (twisting the yield curve), with
commensurate smaller changes at intermediate maturities.
Factor 3 may be interpreted as a hump or curvature factor.
The empirical study carried out in this section uses the Joint Approximate
Diagonalization of Eigenmatrices algorithm (Cardoso, Souloumiac 1993)6.
As explained in the whitening step, the first stage is performed by computing
the sample covariance matrix, giving the second order statistics of the observed
outputs. From this, a matrix which whitens the observed data is computed by
eigendecomposition. We transform the observed vector $x$ linearly so that we obtain
a new vector $\tilde{x}$ which is white, i.e. its components are uncorrelated and their
variances equal unity. The whitening transform $Q$ ($\tilde{x} = Qx$) can be determined by
a principal component analysis (see section 3.1).
[Figure: loadings of the first three independent components (F1, F2, F3) obtained by JADE across maturities.]
The loadings of the first three independent factors obtained by JADE have a
natural interpretation compared with those of the PCA: the first IC represents the
short end, the second the middle, and the third the long end of the term structure
of interest rates.
ACKNOWLEDGEMENTS
NOTES
1. The principal components (PCs) are ordered in terms of their variances: the first PC defines the
direction that captures the maximum variance possible, the second PC defines (in the remaining
orthogonal subspace) the direction of maximum variance, and so forth.
2. For Joint Approximate Diagonalization of Eigenmatrices.
3. The basic preprocessing is to center $x$, i.e. subtract its mean vector $m = E[x]$ (approximated by its
sample mean) so as to make $x$ a zero-mean variable. This necessarily implies that $s$ is zero-mean as
well, as can be seen by taking expectations on both sides of $x = As$ above. This preprocessing is
made solely to simplify the ICA algorithms: it does not mean that the mean could not be
estimated. After estimating the mixing matrix $A$ with centered data, we can complete the
estimation by adding the mean vector of $s$ back to the centered estimates of $s$. The mean vector
of $s$ is given by $A^{-1}m$, where $m$ is the mean that was subtracted in the preprocessing.
4. It must be stressed that ICA is a special case of a more general task termed blind source
separation (BSS). Blind means that very little, if anything, is known about the mixing matrix, and
that few assumptions are made about the source signals.
5. Note they could be multiplied by some scalar constants.
6. We are grateful to Jean-François Cardoso for making the source code of the JADE algorithm
available. The batch algorithm is an efficient version of the two-step procedure that has already
been described.
REFERENCES
Ane T., Labidi C. (2001), "Implied Volatility Surfaces and Market Activity over Time", Journal of
Economics and Finance, 25(3), 259-275 .
Bliss R. (1997), "Movements in the term structure of interest rates", Economic Review, 4th Quarter,
16-33.
Bogner R. (1992), "Blind Separation of Sources", Technical Report 4559, Defense Research
Agency, Malvern.
Cardoso J.-F. (1989), "Source separation using higher order moments", International Conference
on Acoustics, Speech and Signal Processing, 2109-2112.
Cardoso J.-F. (1999), "High-order contrasts for independent component analysis", Neural
Computation, 11(1), 157-192.
Cardoso J.-F., Souloumiac A. (1993), "Blind beamforming for non-Gaussian signals", IEE Proc.
F, 140(6), 771-774.
Chaumeton L., Connor G., Curds R. (1996), "A Global Stock and Bond Model", Financial Analysts
Journal, 52(6), 65-74.
Comon P. (1994), "Independent component analysis, a new concept?", Signal Processing, 36(3),
287-314.
Jamshidian F., Zhu Y. (1996), "Scenario Simulation Model: Theory and Methodology",
mimeo, Sakura Global Capital.
Kahn R. (1989), "Risk and Return in the U.S. Bond Market: A Multifactor Approach", in Fabozzi
F. (ed.), Advances & Innovations in the Bond and Mortgage Markets (Probus).
Kambhu J., Rodrigues A. (1997), "Stress Tests and Portfolio Composition", mimeo, Federal
Reserve Bank of New York.
Knez P., Litterman R., Scheinkman J. (1994), "Explorations into Factors Explaining Money Market
Returns", The Journal of Finance, XLIX(5), 1861-1882.
Litterman R., Iben T. (1991), "Corporate Bond Valuation and the Term Structure of Credit
Spreads", The Journal of Portfolio Management, Spring, 52-64.
Litterman R., Scheinkman J. (1991), "Common Factors Affecting Bond Returns", The Journal of
Fixed Income, June, 54-61.
Loretan M. (1996), "Market Risk Scenarios and Principal Components Analysis: Methodological
and Empirical Considerations", mimeo, Board of Governors of the Federal Reserve.
Murphy B., Won D. (1995), "Valuation and Risk Analysis of International Bonds", in Fabozzi F.,
Fabozzi T. (eds.), The Handbook of Fixed Income Securities, New York, Irwin.
Murphy K. (1992), "J.P. Morgan Term Structure Model", mimeo, Bond Index Group, J.P. Morgan
Securities, Inc.
Oja E. (1989), "Neural networks, principal components and subspaces", International Journal of
Neural Systems, 1, 61-68.
232 Chapter 11
Pope K., Bogner R. (1994), "Blind separation of speech signals", Proc. of the Fifth Australian Int.
Conf. on Speech Science and Technology, Perth, 46-50.
Tong L., Liu R., Soon V., Huang Y. (1991), "Indeterminacy and identifiability of blind
identification", IEEE Trans. Circuits Systems, 38(5), 499-509.
Chapter 12
Abstract: The purpose of this paper is to present an empirical study of a set of hedge funds over
recent periods. Alternative investments are now widely used by institutional
investors, and numerous studies highlight the main features of such investments. As
they are in general poorly correlated with the main world indexes, traditional asset
pricing models yield poor adjustments, partially because of potential non-linearities
in pay-off functions. Some funds, however, exhibit high reward-to-variability ratios
and can advantageously be incorporated in a portfolio from a diversification
perspective. After describing the dataset, we classify the funds employing the
Kohonen algorithm. We then cross the classification with the one based on the style
of the strategies involved, asking whether such categories are homogeneous enough to be
relevant. The map of funds allows us to characterize families of funds - whose
conditional densities differ from one another - and to define a representative
fund for each class. The structure of the network of funds is then described. In
particular, we measure inter-class similarities and visualize them both on the
network of funds and via a map of one-to-one distances between representative
funds. Finally, we underline some characteristics of the classified fund families that
may interest investors, such as performance measurements.
Key words: Kohonen maps, Classification, Multidimensional Data Analysis, General Non-linear
Models, Hedge Funds, Fund-picking, Performance Measurements.
INTRODUCTION
Hedge funds are now recognized as an asset class in their own right, since
several studies have highlighted their specific characteristics (see, for instance,
Fung, Hsieh, 1997; 1998; 1999; 2000). The success of that class of investments
among worldwide investors indicates moreover that they clearly have some
advantages for rational investors. Though they originally take their name from
positions hedged against some specific or market risks, hedge funds nowadays
include various types of funds, even if they do not hedge anything. The
heterogeneity of hedge funds (in terms of financial strategies, markets involved,
return paths, nature of risk implied) calls for a robust typology that gives the
investor an a priori idea of the future behavior of the funds - especially for hedge
funds, because of a certain lack of transparency due to well-protected strategies.
It has nevertheless been shown that one can identify factors explaining the
dynamics of some fund returns. As pointed out by Lhabitant (2001), selecting the
correct factors to explain return dynamics is often more of an art than a science.
The techniques used for factor identification range from principal component
analysis (PCA), generalized hierarchical classification (Brown, Goetzmann 1997;
2001), option-like return representative strategies (Fung, Hsieh, 2000; Agarwal,
Naik 2000-b), cluster analysis (Gruber 2001), hierarchical trees (Mantegna 1999;
Bonanno et al. 2000), to well-known arbitrarily specified factors in frameworks of
extended CAPMs, multifactor or APT models (see Blake et al. 1999; Gruber
2001). A special case of the last technique is Sharpe's methodology for
identifying styles (see Sharpe 1988; 1992). In the context of hedge funds, such
analysis depends crucially on the definition of benchmarks. Style analysis works
remarkably well for investment funds and traditional portfolios (see Sharpe 1992;
Brown, Goetzmann 1997; Daniel et al. 1997) but performs poorly with hedge
funds (Brown, Goetzmann 2001). The reasons could be at least three-fold.
Firstly, the factors underlying hedge fund returns have not been fully identified
in previous research; the use of indexes as proxies often remains a
questionable and crucial issue. Secondly, hedge fund managers often have their
own investment styles and their own ways of identifying market opportunities. The
range of such styles and market opportunities is larger than for traditional stock
and bond fund managers. Typically, while the latter are strictly regulated and
must hold primarily long positions in the underlying assets with a limited and
controlled cash position whatever the market conditions, the former have broad
mandates, take long and short positions and use different degrees of leverage in
time-varying market conditions. Thirdly, within general classes of strategies
(Statistical Arbitrage for instance), the practices, models and markets involved
are quite heterogeneous. In other words, similar types of general strategies lead to
very different patterns. All these facts result in non-linear pay-offs that can flaw the
(linear) return-based style analysis.
We propose here a method for classifying and organizing funds - grouped
without any a priori grounds - that can help to model representative individuals in
a homogeneous class. Several studies have shown that traditional linear asset
pricing models fail to explain hedge fund returns (see, for instance, Fung, Hsieh 1999).
When the standard linear statistical methods are not appropriate, due to the
intrinsic structure of the observations, one can try to use the Kohonen algorithm,
already widely used for data analysis in different fields and in the financial
literature in particular (de Bodt et al. 1997, for interest rate curves, and Deboeck
1997, for mutual funds).
The paper is organized as follows. In section 1, we introduce the methodology
and the database. In section 2, we present different visualizations of the dataset
discriminating hedge funds. The last section concludes, highlighting some potential
drawbacks of the method and some problems associated with the dataset;
potential financial applications of the methodology are finally introduced.
classification and representation. Both representation tools share the same spirit
and the same goal (see Blayo, Demartines 1991). Factorial analysis adjusts
the data set with a plane surface whose representation is straightforward.
Because of its higher flexibility, the surface adjustment involved in the Kohonen
method provides a better fit, but it does not imply a particular structure of
representation. This last difficulty is solved by using a map of distances between
class centroids. Moreover, the classification obtained with the Kohonen
algorithm - shortly described below - can be used as an input to a classical
hierarchical method.
A simple method for visualizing distances between neighbouring centroids has
already been proposed (see Cottrell, de Bodt 1996) and has been found well
adapted for visualizing distances between close classes, though, unfortunately, it
does not capture the whole structure of the surface in some cases (in particular
when the surface makes a fold). In this paper, we use a specific representation
that allows one to visualize distances between all classes.
2. Increment s and draw randomly, without replacement, one observation from
the data, denoted x. The winning unit associated with this draw, denoted
u_0, is the one whose code vector is the "nearest" to x; more precisely, the
code vector of the winning unit, denoted $G_{u_0}$, solves:

$$u_0 = \arg\min_{u \in \{1,\dots,U\}} \left\| x - G_u \right\|^2$$

3. Modify the code vectors of the U units according to the following rule:

$$G_u \leftarrow G_u + \eta(s)\, \mathbf{1}_{\{u \in V(u_0,\,s)\}}\, (x - G_u) \qquad \text{(Eq. 6)}$$

where $\eta(s)$ is the learning rate (see note 3) and $V(u_0, s)$ the neighbourhood of the winning unit at stage s.

4. Run the algorithm for s varying from 1 to S (steps 2 and 3). When the
Kohonen algorithm reaches the zero-radius stage (i.e. when (3/4)S < s ≤ S in our
example), it then reduces to Lloyd's simple competitive learning.
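To fix ideas, here is a minimal sketch of these steps in Python (NumPy), assuming a rectangular grid with a Chebyshev grid distance, a linearly decreasing learning rate and a one-then-zero neighbourhood radius; the function name som_train and all schedule choices are illustrative, not the implementation used in this chapter, and, for simplicity, observations are drawn with replacement.

import numpy as np

def som_train(data, rows=6, cols=6, S=10000, eta0=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # Initialise code vectors G_u at random among the observations (step 1).
    G = data[rng.choice(n, rows * cols, replace=False)].astype(float).copy()
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)])
    for s in range(1, S + 1):
        x = data[rng.integers(n)]                       # step 2: draw an observation
        u0 = np.argmin(((G - x) ** 2).sum(axis=1))      # winning unit u_0
        eta = eta0 * (1.0 - s / S)                      # decreasing learning rate eta(s)
        radius = 1 if s <= 0.75 * S else 0              # zero-radius stage after (3/4)S
        near = np.abs(coords - coords[u0]).max(axis=1) <= radius   # neighbourhood V(u_0)
        G[near] += eta * (x - G[near])                  # step 3: move neighbours towards x
    return G, coords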
This algorithm is used as an alternative classification method but also for data
analysis (see Blayo, Demartines 1991). In that case, the map becomes a
representation of the raw database: observations classified in the same unit - or in
neighbouring units - are supposed to share similarities. Once the dataset has
been reduced to a small number of classes - called micro-classes - applying a classical
hierarchical method to these micro-classes allows one to group them into macro-
classes and obtain a second level of interpretation, as sketched below. Finally, using exogenous
qualitative variables, the classification is explained and validated with classical
inference methods.
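As a sketch of this two-level scheme (reusing the som_train sketch above), the code vectors of the micro-classes can be fed to a classical hierarchical method; Ward linkage and ten macro-classes mirror the map discussed later in the chapter, but both are free choices, and the data below is a toy stand-in.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.random.default_rng(1).normal(size=(294, 67))   # toy stand-in for the fund paths
G, coords = som_train(data)                              # micro-class code vectors (sketch above)
Z = linkage(G, method="ward")                            # hierarchy built on the centroids
macro = fcluster(Z, t=10, criterion="maxclust")          # one macro-class label per unit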
The originality of the method rests in the organization of classes on a map
according to a neighborhood notion. This methodology might be an
advantageous alternative to other techniques even if, at this stage, some problems
remain.
On the one hand, these techniques are preferred to Forgy's Mobile Centroids
Method (1965) and Ward's classification principle (1963) when large databases are
under review (see Anderson 1984), mainly because they are parsimonious in terms
of computing time and tractability. This classification method is also robust
because it is less sensitive to outliers (of the input distribution) than most
other techniques: a new element in the input data set will not, in general, change
the result significantly (contrary to hierarchical classification methods).
Studies based on truncated samples built with a bootstrap method (see Cottrell,
de Bodt 2000) support this robustness.
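One possible reading of such a robustness check, in the spirit of the bootstrap cited above: retrain the map on resampled data and measure how often pairs of funds remain neighbours. The function below is an illustrative sketch reusing the som_train sketch above, not the protocol of the cited study.

import numpy as np

def neighbour_stability(data, B=20, rows=6, cols=6):
    n = data.shape[0]
    together = np.zeros((n, n))
    rng = np.random.default_rng(2)
    for b in range(B):
        idx = rng.choice(n, n, replace=True)        # bootstrap resample of the funds
        G, coords = som_train(data[idx], rows, cols, seed=b)
        # classify every original fund on the bootstrapped map
        win = np.array([np.argmin(((G - x) ** 2).sum(axis=1)) for x in data])
        pos = coords[win]                           # grid position of each fund
        grid_d = np.abs(pos[:, None, :] - pos[None, :, :]).max(axis=-1)
        together += (grid_d <= 1)                   # same or adjacent unit this round
    return together / B                             # pairwise co-location frequency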
2. THE DATA
[Figure: evolution of the number of funds in the database by date (histogram); vertical axis graduated from 0 to 1,600.]
From this sample, applying traditional filter rules, we keep 471 funds over a 6-
year period, as a compromise between a large sample of funds with short histories
and a small number of funds with numerous observations. We then delete fund
data when too many values were missing, interpolate missing observations - using
a cubic spline - when possible, and normalize fund values to index 100 at
the beginning of the final sample, as sketched below. In the end, the final sample contains 294 funds and the
number of observations is 67 (from January 1995 to September 2000).
The database may suffer from major drawbacks (see conclusion) and caution
is needed when reading the results presented in the next sub-sections, since they
are obtained with this final sample.
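A minimal sketch of this preprocessing in Python (pandas), assuming the raw series sit in a DataFrame nav with one column per fund and one row per date; the 10% missing-value threshold is an assumption, since the chapter does not state its exact filter.

import pandas as pd

def prepare(nav: pd.DataFrame, max_missing: float = 0.10) -> pd.DataFrame:
    # keep only funds with few enough missing values (threshold is illustrative)
    nav = nav.loc[:, nav.isna().mean() <= max_missing]
    # fill the remaining gaps inside the sample with a cubic spline (requires scipy)
    nav = nav.interpolate(method="cubicspline")
    # normalise every fund to index 100 at the beginning of the sample
    return 100 * nav / nav.iloc[0]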
The next figure represents the map where each fund's evolution through the
sample is drawn in the unit corresponding to its micro-class (represented by a box
placed in the grid). We can then also group funds into macro-classes
(represented by colored groups of boxes) defined using a hierarchical
classification with the Ward distance. The number of funds per micro-class is
heterogeneous, varying from 1 to 40. Moreover, two
separate groups of funds can be distinguished: two thirds of the data belong to the
first one (Group 1), grouped into one macro-class only (the green zone), whilst
one third belongs to the second one (Group 2: the nine other macro-classes). This
indicates roughly that one can distinguish a homogeneous group of funds from
others that exhibit individual particularities. Nearly a third of the funds are,
finally, contained in the first row of the grid, indicating a strong concentration in
the green region.
Based on these remarks, we first focus on the latter group of funds (Group 2),
and second go back to the former group of funds (Group 1). We remark this time
Note: This figure represents the network, grouping funds together according to the similarities of
fund return patterns (whole sample, see text). The number of funds is reported in each unit
(NB=#). Colors stand for meta-categories of funds, obtained by applying a hierarchical
classification to the centroids of classes.
Nevertheless, the chosen representation of the data (i.e. the grid) does not
capture the entire structure of the data; in particular, the distances
between code vectors cannot be inferred from the previous figures. A first and
natural way of solving the problem is to project the code vectors onto the principal
plane given by a traditional Principal Component Analysis. But the
distortion due to the projection of multidimensional data onto a plane could yield
some misunderstandings, and complementary tools are needed here to visualize
the structure of the map. The distance between two micro-classes can be
associated with a surface proportional to the similarity between representative
funds (see Figure 4).
Note: This figure represents the evolutions of the NAV of representative funds.
Note: This figure represents adjacent centroid distances (undashed zones between cells).
Note: This figure represents the network, grouping funds together according to the similarities of
fund return patterns (second sub-sample, see text). The number of funds is reported in each unit
(NB=#). Colors stand for meta-categories of funds, obtained by applying a hierarchical
classification to the centroids of classes.
Note: This figure represents the evolutions of the NAV of representative funds (second sub-sample,
see text).
Note: This figure represents adjacent centroid distances (undashed zones between cells; second
sub-sample, see text).
From Figure 7, we can see two homogeneous classes, since distances between
close classes are relatively low for the central classes (the green and magenta
regions). On the contrary, large distances are located in the upper-left, upper-
right and bottom-right corners (the grey, cyan and yellow regions).
But not all the information one can extract from the data can be backed out from
this last representation. Indeed - as in the cyan region - distances within a
macro-class are sometimes larger than distances between adjacent macro-classes.
This drawback leads us to evaluate, as a complementary tool, the distances
between each code vector and all the others. In the next figure, each unit u is
subdivided into U sub-units u'. In each sub-unit u' of a unit u, the color (white
for small distances, red for intermediate distances and dark red for relatively large
distances) represents the distance between code vectors G_u and G_u'. This
superposition of maps allows one to visualize directly the distances between
representative funds, and it is then clear from this figure that one-to-one distances
between micro-classes are very different. One can distinguish four regions: a large
central one (units 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 25, 26, 27,
28, 29, 31, 32), a ring region surrounding the previous one (composed of units 2,
3, 4, 5, 12, 18, 24, 30, 33, 34, 35) and the remaining corner regions.
Note: This figure represents a superposition of the Kohonen network and one-to-one centroid
distances.
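The underlying computation is just the U x U matrix of pairwise distances between code vectors, binned into the three colour levels; a sketch follows, reusing G from the sketches above, where the tercile thresholds are an assumption since the chapter does not give its cut points.

import numpy as np
from scipy.spatial.distance import cdist

D = cdist(G, G)                                  # G: (U, d) code vectors, as above
lo, hi = np.quantile(D[D > 0], [1 / 3, 2 / 3])   # illustrative tercile cut points
level = np.digitize(D, [lo, hi])                 # 0 = white, 1 = red, 2 = dark red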
Note: This figure represents the proportion of each modality in each cell. Modalities vary from 1
(Directional Trading, in pink), 2 (Relative Value, in blue), 3 (Specialist Credit, in yellow) to 4
(Stock Selection, in grey).
It is clear from this figure that funds of the same category correspond exactly
neither to micro-classes nor to macro-classes. Nevertheless, from Figures
8 and 9, one notices that the large central class is composed of funds of different
styles, whilst in the other regions (ring and corner areas) category 4 funds (called
"Stock Selection") are essentially represented. This category is in fact not
homogeneous, and the meta-category classification is not discriminating, as can
be seen in the green region where all categories are present. A complementary
visualization focuses on a chart representation of modalities within micro-classes.
Focusing this time on the location of families of funds on the network, Figure 10
gives an idea of how the qualitative modalities are distributed across the micro-classes.
It confirms that Directional Trading and Stock Selection funds spread
over the entire map, whilst Relative Value and Specialist Credit funds are placed
in the green area (with a slight tendency for the Relative Value funds to lie in
the south of the green area and the Specialist Credit funds in the north of it).
Figure 10. Kohonen Map and Meta-Categories Typology - A Chart Representation
Note: This figure represents the localisation of each modality on the map. Modalities vary from 1
(Directional Trading, in pink), 2 (Relative Value, in blue), 3 (Specialist Credit, in yellow) to 4
(Stock Selection, in grey).
Note: This figure represents the proportion of each modality in each cell. Modalities vary from 1
(Convertible Arbitrage) to 18 (Statistical Arbitrage). See text and Appendix for the list of
strategies.
Figure 12. Kohonen Map and Micropal's Categories Typology - A Chart Representation
Note: This figure represents the localisation of each modality on the map. Modalities vary from 1
(Convertible Arbitrage) to 18 (Statistical Arbitrage). See text and Appendix for the list of
strategies.
To characterize the map further, we use the classical Sharpe ratio (see
Jobson, Korkie 1981; Sharpe 1994; Pedersen, Satchell 2000), denoted $S_i$ and
defined such that:

$$S_i = \frac{R_i - R_f}{\sigma_i} \qquad \text{(Eq. 7)}$$

and:

$$\hat{S}_i = \left(1 + \frac{3}{4n} + \frac{100}{128\,n}\right) \frac{\bar{R}_i - R_f}{\hat{\sigma}_i} \qquad \text{(Eq. 8)}$$

where $R_i$ is the annualized return on the considered hedge fund and $\bar{R}_i$ its
sample mean counterpart, $R_f$ is a proxy for the riskless asset return, $\sigma_i$ is the
annualized standard deviation of the hedge fund return and $\hat{\sigma}_i$ its sample estimate,
and n the number of observations in the sample.
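As a numerical companion to Eqs. 7-8, a sketch for monthly returns follows; the adjustment coefficients follow the reconstruction of Eq. 8 above and should be checked against the cited sources, and the function name is illustrative.

import numpy as np

def sharpe_ratios(monthly_returns, rf=0.0, periods=12):
    r = np.asarray(monthly_returns, dtype=float)
    n = r.size
    mean = r.mean() * periods                         # annualised mean return (R-bar)
    vol = r.std(ddof=1) * np.sqrt(periods)            # annualised volatility (sigma-hat)
    s = (mean - rf) / vol                             # sample Sharpe ratio (Eq. 7)
    s_adj = (1 + 3 / (4 * n) + 100 / (128 * n)) * s   # small-sample adjustment (Eq. 8)
    return s, s_adj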
Figures 13 and 14 represent one possible discrimination obtained with
Sharpe's ratios. In these figures, the color represents a four-level discretization of
Sharpe's ratios (each category containing a quartile of the population, with colors
ranging through magenta, blue, yellow and grey according to the ascending rank of
the funds' Sharpe ratios). From Figures 13 and 14, we remark that
low and medium-low Sharpe's ratios (magenta and blue levels) are mainly
found in the ring zone of the map, that medium-high ones (yellow level) are
more often in the green zone, whilst high Sharpe's ratios (grey level) are
essentially located in the central zone of the map (green and magenta zones).
Nevertheless, the discretization of Sharpe's ratios operated here is somewhat
arbitrary, and to go further in the attempt to characterize the map using
Sharpe's ratios, one can represent, for each unit, the conditional and
unconditional densities of Sharpe's ratios (see Figure 15) and conditional and
unconditional (simplified) box plots (see Figure 16). These representations
confirm that neither piece of information - adjusted performance measures and
Kohonen classification - is redundant. Indeed, one can see in Figures 15 and 16
that high Sharpe's ratios are more probably found in the central region of the
map, whilst lower Sharpe's ratios are more present in the north and east zones
of the map.
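The four-level colour coding amounts to a quartile split of the per-fund Sharpe ratios; a sketch, assuming ratios is a pandas Series of per-fund Sharpe ratios (a hypothetical input built, for instance, with the sharpe_ratios sketch above), where the labels merely mirror the figures' colours.

import pandas as pd

# ratios: pandas Series of per-fund Sharpe ratios (hypothetical input)
levels = pd.qcut(ratios, q=4, labels=["magenta", "blue", "yellow", "grey"])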
Note: This figure represents the proportion of each modality in each cell. Modalities vary from 1
(Low Sharpe's Ratios, in pink), 2 (Medium-low, in blue), 3 (Medium-high, in yellow) to 4
(High, in grey).
Note: This figure represents the localisation of each modality on the map. Modalities vary from 1
(Low Sharpe's Ratios, in pink) to 4 (High, in grey).
Note: This figure presents, in each cell of the map, the density of Sharpe's ratios of funds in the
category, together with, below, the distribution of Sharpe's ratios for the whole dataset.
Figure 16. Qualitative Sharpe's Ratios Discrimination - Conditional versus Unconditional Box Plot
Note: This figure presents, in each cell of the map, the minimum, maximum and mean Sharpe's ratio
of the category of funds, together with, on the right, the minimum, maximum and mean Sharpe's
ratios for the whole dataset.
Classifying hedge funds with Kohonen maps: A first attempt 253
This is consistent with the previous analyses of the map (Figures 13 and 14) and
leads to the conclusion that the central region of the map should be preferred by a
rational and risk-averse investor.
Finally, note that this analysis could be conducted with most performance
measures, from practitioner-oriented measures (Information ratio, Sortino
ratio, Calmar ratio, Sterling ratio...; see Sortino, Price 1994, for instance),
traditional ones5 (Treynor 1965; Sharpe 1966; Jensen 1968) and other measures
(Connor, Korajczyk 1986; Grinblatt, Titman 1989; Okunev 1990), to more recent
ones (such as Leland 1999; Bowden 2000; Chauveau, Maillet 2001 or Dacorogna
et al. 2001), depending on the hypotheses about the data generating process,
market timing effects, the normality of fund returns, and the wealth, risk aversion
coefficient and implied preferences of the final investor.
CONCLUSION
(see Cottrell, de Bodt 2000) and, as soon as the data is ergodic, one could expect the structure of the map
to be relevant.
APPENDIX
A. Meta Classification
I. Directional Trading
II. Relative Value
III. Specialist Credit
IV. Stock Selection
B. Micropal's Categories
1. Convertible Arbitrage
1.1 Convertible Arbitrage - Global (34)
1.2 Convertible Europe (3)
1.3 Convertible Japan (2)
2. Distressed Securities
2.4 Distressed Securities - Global (31)
3. Emerging Markets
3.5 Emerging Markets: Asia (31)
3.6 Emerging Markets: Eastern Europe (18)
3.7 Emerging Markets - Global (43)
3.8 Emerging Markets: Latin America (23)
4. Equity Hedge
4.9 Equity Hedge (253)
4.10 Equity Hedge: Africa (2)
4.11 Equity Hedge: Asia (8)
4.12 Equity Hedge: Europe (33)
4.13 Equity Hedge: Japan (3)
5. Equity Market Neutral
5.14 Equity Market Neutral (42)
5.15 Equity Market Neutral: Europe (1)
6. Equity Non-hedge
6.16 Equity Non-hedge (23)
6.17 Equity Non-hedge: Asia (2)
6.18 Equity Non-hedge: Europe (3)
7. Event Driven
7.19 Event-Driven (64)
7.20 Event-Driven: Europe (2)
8. Fixed Income Arbitrage
NOTES
1. Corresponding, in our case, to the P Net Asset Value dates for the N considered hedge funds.
2. Other criteria defining the neighborhood have been chosen in financial applications, such as a
notion of distance based on one minus the squared Pearson correlation coefficient between
returns associated with financial assets and a benchmark (see Mantegna 1999). The adopted
definition might be an important issue when classifying funds.
3. That is, $\sum_{s=1}^{+\infty} \eta(s) = +\infty$ and $\sum_{s=1}^{+\infty} [\eta(s)]^2 < +\infty$.
4. We find here a singularity for these funds that has already been signalled for classical mutual
funds (see Brown, Goetzmann 1997, for instance).
5. See Pedersen and Satchell (2000) for recent applications.
6. A complete description of such categories can be found, for instance, in Agarwal and Naik
(2000-a) or Lhabitant (2001).
REFERENCES
Agarwal V., Naik N. (2000-a), "Multi-period Persistence Analysis of Hedge Funds", Journal of
Financial and Quantitative Analysis, 35(3), September, 327-342.
Agarwal V., Naik N. (2000-b), "Characterizing Systematic Risk of Hedge Funds with Buy-and-
hold and Option-based Strategies", LBS working paper, IFA n°300, August, 51 pages.
Anderson T. (1984), An Introduction to Multivariate Statistical Analysis, John Wiley & Sons, 2nd
Edition, New York.
Blake C., Elton E., Gruber M. (1999), "Common Factors in Active and Passive Portfolios",
European Finance Review, 3(1), 53-78.
Blayo F., Demartines P. (1991), "Data Analysis: How to Compare Kohonen Neural Networks to
Other Techniques?", in Proceedings of IWANN'91 Conference, Springer, 469-476.
Bonanno G., Vandewalle N., Mantegna R. (2000), "Taxonomy of Stock Market Indices", Physical
Review E, 62(6), December.
Bowden R. (2000), "The Ordered Mean Difference as a Portfolio Performance Measure", Journal of
Empirical Finance, 7, 195-223.
Brown S., Goetzmann W. (1997), "Mutual Fund Styles", Journal of Financial Economics, 43, 373-
399.
Brown S., Goetzmann W. (2001), "Hedge Funds with Styles", NBER Working Paper n°w8173,
March, 37 pages.
Chauveau T., Maillet B. (2001), "Performance with Restricted Borrowing: A Generalisation of
Usual Measures", in Proceedings of EFMA'01 Conference, Lugano, June, 46 pages.
Connor G., Korajczyk R. (1986), "Performance Measurement with the Arbitrage Pricing Theory: A
New Framework for Analysis", Journal of Financial Economics, 15(3), 373-394.
Cottrell M., Fort J.-C., Pagès G. (1995), "Two or Three Things that We Know about the Kohonen
Algorithm", in Verleysen M. (ed.), Proceedings of ESANN'94 Conference, D Facto, Bruxelles,
235-244.
Cottrell M., de Bodt E. (1996), "A Kohonen Map Representation to Avoid Misleading
Interpretations", in Verleysen M. (ed.), Proceedings of ESANN'96 Conference, D Facto,
Bruxelles, 103-110.
Cottrell M., de Bodt E. (2000), "Bootstrapping Self-organizing Maps to Assess the Statistical
Significance of Local Proximity", in Verleysen M. (ed.), Proceedings of ESANN'00 Conference,
D Facto, Bruges, 245-254.
Cottrell M., Girard B., Girard Y., Muller C., Rousset P. (1995-a), "Daily Electrical Power Curves:
Classification and Forecasting Using a Kohonen Map", in From Natural to Artificial Neural
Computation, Proceedings of IWANN'95 Conference, Springer, 1107-1113.
Cottrell M., Girard B., Girard Y., Mangeas M. (1995-b), "Neural Modelling for Time Series: A
Statistical Stepwise Method for Weight Elimination", IEEE Transactions on Neural Networks,
6(6), November, 1355-1364.
Cottrell M., Fort J.-C., Pagès G. (1998-a), "Theoretical Aspects of the SOM Algorithm",
Neurocomputing, 21, 119-138.
Cottrell M., Girard B., Rousset P. (1998-b), "Forecasting of Curves Using a Kohonen
Classification", Journal of Forecasting, 17(5/6), 429-439.
Dacorogna M., Gençay R., Müller U., Pictet O. (2001), "Effective Return, Risk Aversion and
Drawdowns", Physica A, 289, 229-248.
Daniel K., Grinblatt M., Titman S., Wermers R. (1997), "Measuring Mutual Fund Performance
with Characteristics-based Benchmarks", Journal of Finance, 52(3), July, 1035-1058.
de Bodt E., Grégoire Ph., Cottrell M. (1997), "Projection of Long-term Interest Rates with Maps",
in Deboeck G., Kohonen T. (eds), Visual Explorations in Finance with Self-organizing Maps,
Springer, 24-38.
Deboeck G. (1997), "Picking Mutual Funds with Self-organizing Maps", in Deboeck G., Kohonen
T. (eds), Visual Explorations in Finance with Self-organizing Maps, Springer, 39-58.
De Roon F., Nijman T., Ter Horst J. (2001), "Evaluating Style Analysis", in Proceedings of
EFMA'01 Conference, Lugano, June, 33 pages.
diBartolomeo D., Witkowski E. (1997), "Mutual Fund Misclassification: Evidence Based on Style
Analysis", Financial Analysts Journal, Sept-Oct, 32-43.
Fung W., Hsieh D. (1997), "Empirical Characteristics of Dynamic Trading Strategies: The Case of
Hedge Funds", Review of Financial Studies, 10(2), Summer, 275-302.
Fung W., Hsieh D. (1998), "Performance Attribution and Style Analysis: From Mutual Funds to
Hedge Funds", Working paper, Paradigm Financial Products, February, 41 pages.
Fung W., Hsieh D. (1999), "A Primer on Hedge Funds", Journal of Empirical Finance, 6(3), 309-
331.
Fung W., Hsieh D. (2000), "Performance Characteristics of Hedge Funds and Commodity Funds:
Natural versus Spurious Biases", Journal of Financial and Quantitative Analysis, 35(3),
September, 291-307.
Fung W., Hsieh D. (2001), "The Risk in Hedge Fund Strategies: Theory and Evidence from Trend
Followers", Review of Financial Studies, 14(2), Summer, 313-341.
Grinblatt M., Titman S. (1989), "Portfolio Performance Evaluation: Old Issues and New Insights",
Review of Financial Studies, 2(3), 393-421.
Gruber M. (2001), "Identifying the Risk Structure of Mutual Fund Returns", European Financial
Management, 7(2), June, 147-160.
Jensen M. (1968), "The Performance of Mutual Funds in the Period 1945-1964", Journal of
Finance, 23, May, 389-416.
Jobson J., Korkie B. (1981), "Performance Hypothesis Testing with the Sharpe and Treynor
Measures", Journal of Finance, 36, September, 889-908.
Kohonen T. (2001), Self-organizing Maps, Springer Series in Information Sciences, Vol. 30, 3rd
extended edition, Springer, Berlin.
Leland H. (1999), "Beyond Mean-variance: Performance Measurement in a Nonsymmetric World",
Financial Analysts Journal, January-February, 27-36.
Lhabitant F.-S. (2001), "Assessing Market Risk for Hedge Funds and Hedge Funds Portfolios",
FAME Research Paper n°24, March, 40 pages.
Mantegna R. (1999), "Information and Hierarchical Structure in Financial Markets", Computer
Physics Communications, 121-122, 153-156.
Miller R., Gehr A. (1978), "Sample Bias and Sharpe's Performance Measure: A Note", Journal of
Financial and Quantitative Analysis, 13, December, 943-946.
Mitev T. (1998), "Classification of Commodity Trading Advisors (CTAs) Using Maximum
Likelihood Factor Analysis", Journal of Alternative Investments, 1(2), Fall, 40-46.
Okunev J. (1990), "An Alternative Measure of Mutual Fund Performance", Journal of Business
Finance and Accounting, 17(2), Spring, 247-264.
Pedersen Ch., Satchell S. (2000), "Small Sample Analysis of Performance Measures in the
Asymmetric Response Model", Journal of Financial and Quantitative Analysis, 35(3),
September, 425-450.
Sharpe W. (1966), "Mutual Fund Performance", Journal of Business, 39(1), Part 2, January, 119-
138.
Sharpe W. (1988), "Determining a Fund's Effective Asset Mix", Investment Management Review,
2(6), 59-69.
Sharpe W. (1992), "Asset Allocation: Management Style and Performance Measurement", Journal
of Portfolio Management, 18(2), 7-19.
Sharpe W. (1994), "The Sharpe Ratio", Journal of Portfolio Management, Fall, 49-58.
Sortino F., Price L. (1994), "Performance Measurement in a Downside Risk Framework", Journal
of Investing, Fall, 59-65.
Treynor J. (1965), "How to Rate Management of Investment Funds", Harvard Business Review,
January-February, 63-75.