
Multicriteria Decision Aid Classification Methods

Applied Optimization
Volume 73

Series Editors:
Panos M. Pardalos
University of Florida, U.S.A.

Donald Hearn
University of Florida, U.S.A.

The titles published in this series are listed at the end of this volume.
Multicriteria Decision Aid
Classification Methods
by
Michael Doumpos
and

Constantin Zopounidis
Technical University of Crete,
Department of Production Engineering and Management,
Financial Engineering Laboratory,
University Campus, Chania, Greece

KLUWER ACADEMIC PUBLISHERS


NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 0-306-48105-7
Print ISBN: 1-4020-0805-8

©2004 Kluwer Academic Publishers


New York, Boston, Dordrecht, London, Moscow

Print ©2002 Kluwer Academic Publishers


Dordrecht

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic,
mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com


and Kluwer's eBookstore at: http://ebooks.kluweronline.com
To my parents Christos and Aikaterini Doumpos

To my wife Kleanthi Koukouraki and my son Dimitrios Zopounidis


Table of contents

PROLOGUE xi

CHAPTER 1: INTRODUCTION TO THE CLASSIFICATION PROBLEM

1. Decision making problematics 1
2. The classification problem 4
3. General outline of classification methods 6
4. The proposed methodological approach and the objectives of
the book 10

CHAPTER 2: REVIEW OF CLASSIFICATION TECHNIQUES


1. Introduction 15
2. Statistical and econometric techniques 15
2.1 Discriminant analysis 16
2.2 Logit and probit analysis 20
3. Non-parametric techniques 24
3.1 Neural networks 24
3.2 Machine learning 27
3.3 Fuzzy set theory 30
3.4 Rough sets 32

CHAPTER 3: MULTICRITERIA DECISION AID CLASSIFICATION TECHNIQUES

1. Introduction to multicriteria decision aid 39
1.1 Objectives and general framework 39
1.2 Brief historical review 40
1.3 Basic concepts 41
2. Methodological approaches 43
2.1 Multiobjective mathematical programming 45
2.2 Multiattribute utility theory 48

2.3 Outranking relation theory 50


2.4 Preference disaggregation analysis 52
3. MCDA techniques for classification problems 55
3.1 Techniques based on the direct interrogation of the
decision maker 55
3.1.1 The AHP method 55
3.1.2 The ELECTRE TRI method 59
3.1.3 Other outranking classification methods 64
3.2 The preference disaggregation paradigm in classification
problems 66

CHAPTER 4: PREFERENCE DISAGGREGATION CLASSIFICATION METHODS

1. Introduction 77
2. The UTADIS method 78
2.1 Criteria aggregation model 78
2.2 Model development process 82
2.2.1 General framework 82
2.2.2 Mathematical formulation 86
2.3 Model development issues 96
2.3.1 The piece-wise linear modeling of marginal
utilities 96
2.3.2 Uniqueness of solutions 97
3. The multi-group hierarchical discrimination method (MHDIS) 100
3.1 Outline and main characteristics 100
3.2 The hierarchical discrimination process 101
3.3 Estimation of utility functions 105
3.4 Model extrapolation 111
Appendix: Post optimality techniques for classification model
development in the UTADIS method 113

CHAPTER 5: EXPERIMENTAL COMPARISON OF CLASSIFICATION TECHNIQUES

1. Objectives 123
2. The considered methods 124
3. Experimental design 126
3.1 The factors 126
3.2 Data generation procedure 131
4. Analysis of results 134
5. Summary of major findings 143
Appendix: Development of ELECTRE TRI classification models
using a preference disaggregation approach 150

CHAPTER 6: CLASSIFICATION PROBLEMS IN FINANCE


1. Introduction 159
2. Bankruptcy prediction 161
2.1 Problem domain 161
2.2 Data and methodology 164
2.3 The developed models 172
2.3.1 The model of the UTADIS method 172
2.3.2 The model of the MHDIS method 174
2.3.3 The ELECTRE TRI model 176
2.3.4 The rough set model 178
2.3.5 The statistical models 179
2.4 Comparison of the bankruptcy prediction models 181
3. Corporate credit risk assessment 185
3.1 Problem domain 185
3.2 Data and methodology 188
3.3 The developed models 194
3.3.1 The UTADIS model 194
3.3.2 The model of the MHDIS method 196
3.3.3 The ELECTRE TRI model 199
3.3.4 The rough set model 200
3.3.5 The models of the statistical techniques 201
3.4 Comparison of the credit risk assessment models 202
4. Stock evaluation 205
4.1 Problem domain 205
4.2 Data and methodology 209
4.3 The developed models 215
4.3.1 The MCDA models 215
4.3.2 The rough set model 220
4.4 Comparison of the stock evaluation models 222

CHAPTER 7: CONCLUSIONS AND FUTURE PERSPECTIVES


1. Summary of main findings 225
2. Issues for future research 229

REFERENCES 233

SUBJECT INDEX 251


Prologue

Decision making problems, depending on their nature, the policy of the decision maker, and the overall objective of the decision, may require the choice of an alternative solution, the ranking of the alternatives from best to worst, or the assignment of the considered alternatives into predefined homogeneous classes. This last type of decision problem is referred to as
classification or sorting. Classification problems are often encountered in a
variety of fields including finance, marketing, environmental and energy
management, human resources management, medicine, etc.
The major practical interest of the classification problem has motivated researchers to develop an arsenal of methods for studying such problems, with the aim of building mathematical models that achieve the highest possible classification accuracy and predictive ability. For several decades, multivariate statistical techniques such as discriminant analysis (linear and quadratic) and econometric techniques such as logit and probit analysis, the linear probability model, etc., have dominated this field. However, the parametric nature and the statistical assumptions/restrictions of such approaches have been a source of major criticism and skepticism about their applicability and usefulness in practice.
The continuous advances in other fields including operations research
and artificial intelligence led many scientists and researchers to exploit the
new capabilities of these fields, in developing more efficient classification
techniques. Among the attempts made one can mention neural networks,
machine learning, fuzzy sets as well as multicriteria decision aid. Multicrite-
ria decision aid (MCDA) has several distinctive and attractive features, most notably its decision support orientation. The significant advances in MCDA over the last three decades have made it a powerful non-parametric alternative methodological approach for studying classification problems. Al-
though MCDA research until the late 1970s was mainly oriented towards the fundamental aspects of the field, as well as towards the development of choice and ranking methodologies, during the 1980s and the 1990s significant research was undertaken on the study of the classification problem within the MCDA framework.
Following the MCDA framework, the objective of this book is to provide
a comprehensive discussion of the classification problem, to review the ex-
isting parametric and non-parametric techniques, their problems and limita-
tions, and to present the MCDA approach to classification problems. Special
focus is given to the preference disaggregation approach of MCDA. The
preference disaggregation approach refers to the analysis (disaggregation) of
the global preferences (judgement policy) of the decision maker in order to
identify the criteria aggregation model that underlies the preference result
(classification).
The book is organized in seven chapters as follows:
Initially, in chapter 1 an introduction to the classification problem is pre-
sented. The general concepts related to the classification problem are dis-
cussed, along with an outline of the procedures used to develop classification
models.
Chapter 2 provides a comprehensive review of existing classification
techniques. The review involves parametric approaches (statistical and
econometric techniques) such as the linear and quadratic discriminant analy-
sis, the logit and probit analysis, as well as non-parametric techniques from
the fields of neural networks, machine learning, fuzzy sets, and rough sets.
Chapter 3 is devoted to the MCDA approach. Initially, an introduction to
the main concepts of MCDA is presented along with a panorama of the
MCDA methodological streams. Then, the existing MCDA classification
techniques are reviewed, including multiattribute utility theory techniques,
outranking relation techniques and goal programming formulations.
Chapter 4 provides a detailed description of the UTADIS and MHDIS
methods, including their major features, their operation and model develop-
ment procedures, along with their mathematical formulations. Furthermore, a
series of issues is also discussed involving specific aspects of the functional-
ity of the methods and their model development processes.
Chapter 5 presents an extensive comparison of the UTADIS and MHDIS
methods with a series of well-established classification techniques including
the linear and quadratic discriminant analysis, the logit analysis and the
rough set theory. In addition, ELECTRE TRI, a well-known MCDA classification method based on the outranking relation theory, is also considered in
the comparison and a new methodology is presented to estimate the parame-
ters of classification models developed through ELECTRE TRI. The com-
parison is performed through a Monte-Carlo simulation, in order to investi-
gate the classification performance (classification accuracy) of the considered methods under different data conditions.
Chapter 6 is devoted to the real-world application of the proposed meth-
odological framework for classification problems. The applications consid-
ered originate from the field of finance, including bankruptcy prediction,
corporate credit risk assessment and stock evaluation. For each application a
comparison is also conducted with all the aforementioned techniques.
Finally, chapter 7 concludes the book, summarizes the main findings and
proposes future research directions with respect to the study of the classifica-
tion problem within a multidimensional context.
In preparing this book, we are grateful to Kiki Kosmidou, Ph.D. candi-
date at the Technical University of Crete, for her important notes on an ear-
lier version of the book and her great help in the preparation of the final
manuscript.
Chapter 1
Introduction to the classification problem

1. DECISION MAKING PROBLEMATICS

Decision science is a very broad and rapidly evolving research field at theo-
retical and practical levels. The post-war technological advances in combi-
nation with the establishment of operations research as a sound approach to
decision making problems, created a new context for addressing real-world
problems through integrated, flexible and realistic methodological ap-
proaches. At the same time, the range of problems that can be addressed ef-
ficiently has also been extended. The nature of these problems is widely di-
versified in terms of their complexity, the type of solutions that should be
investigated, as well as the methodological approaches that can be used to
address them.
Providing a full categorization of the decision making problems on the
basis of the above issues is a difficult task depending upon the scope of the
categorization. A rather straightforward approach is to define the two fol-
lowing categories of decision making problems (Figure 1.1):
Discrete problems involving the examination of a discrete set of alterna-
tives. Each alternative is described along some attributes. Within the de-
cision making context these attributes have the form of evaluation crite-
ria.
Continuous problems involving cases where the number of possible
alternatives is infinite. In such cases one can only outline the region
where the alternatives lie (feasible region), so that each point in this re-
gion corresponds to a specific alternative. Resource allocation is a representative example of this form of problems.

When considering a discrete decision making problem, there are four dif-
ferent kinds of analyses (decision making problematics) that can be per-
formed in order to provide meaningful support to decision makers (Roy,
1985; cf. Figure 1.2):
to identify the best alternative or select a limited set of the best alterna-
tives,
to construct a rank–ordering of the alternatives from the best to the worst
ones,
to classify/sort the alternatives into predefined homogenous groups,
to identify the major distinguishing features of the alternatives and per-
form their description based on these features.
The first three forms of decision making problems (choice, ranking, clas-
sification) lead to a specific result regarding the evaluation of the alterna-
tives. Both choice and ranking are based on relative judgments, involving
pair-wise comparisons between the alternatives. Consequently, the overall
evaluation result has a relative form, depending on the alternatives being
evaluated. For instance, an evaluation result of the form “product X is the
best of its kind” is the outcome of relative judgments, and it may change if
the set of products that are similar to product X is altered.
On the contrary, the classification problem is based on absolute judg-
ments. In this case each alternative is assigned to a specific group on the
basis of a pre-specified rule. The definition of this rule, usually, does not
depend on the set of alternatives being evaluated. For instance, the evalua-
tion result “product X does not meet the consumer needs” is based on abso-
lute judgments, since it does not depend on the other products that are simi-
lar to product X. Of course, these judgments are not always absolute, since
they are often defined within the general context characterizing the decision
environment. For instance, under specific circumstances of the general eco-
nomic and business environment a firm may fulfill the necessary require-
ments for its financing by a credit institution (these requirements are inde-
pendent of the population of firms seeking financing). Nevertheless, as the
economic and business conditions evolve, the financing requirements may
change towards being stricter or more relaxed. Therefore, it is possible that the same firm is denied credit in a different decision environment.
Generally, despite any changes that are made in the classification rule used,
this rule is always defined independently of the existing decision alterna-
tives. This is the major distinguishing difference between the classification
problem and the problems of choice or ranking.

2. THE CLASSIFICATION PROBLEM


As already mentioned, classification refers to the assignment of a finite set of alternatives into predefined groups; this is only a general description. There are several more specific terms often used to refer to this form of decision making problem. The most common ones are the following three:
Discrimination.
Classification.
Sorting.
The first two terms are commonly used by statisticians as well as by sci-
entists of the artificial intelligence field (neural networks, machine learning,
etc.). The term “sorting” has been established by MCDA researchers.
Although all three terms refer to the assignment of a set of alternatives into predefined groups, there is a notable difference in the kinds of problems that they describe. In particular, from the methodological point of view the above three terms describe two different kinds of problems. The terms “discrimination” and “classification” refer to problems where the groups are defined in a nominal way. In this case the alternatives belonging into different groups have different characteristics, without it being possible to establish any kind of preference relation between them (i.e., the groups provide a description of the alternatives without any further information). One of the most
well-known problems of this form is the iris classification problem used by
Fisher (1936) in his pioneering work on linear discriminant analysis. This problem involves the distinction between three species of flowers, iris setosa, iris versicolor and iris virginica, given their physical characteristics (length and width of the sepal and petal). Obviously, each group (species) provides a description of its member flowers, but this description does not incorporate any preferential information. Pattern recognition is also an ex-
tensively studied problem of this form with numerous significant applica-
tions in letter recognition, speech recognition, recognition of physical ob-
jects and human characteristics.
On the other hand, sorting refers to problems where the groups are de-
fined in an ordinal way. A typical example of this form of problems is the bankruptcy risk evaluation problem, which will be extensively studied later on in this book (Chapter 6). Bankruptcy risk evaluation models typically involve the assignment of a firm either into the group of healthy firms or into the group of bankrupt ones. This is an ordinal definition of the groups, since it is
rather obvious that the healthy firms are in a better situation than the bank-
rupt ones. Therefore, the definition of the groups in sorting problems does
not only provide a simple description of the alternatives, but it also incorpo-
rates additional preferential information, which could be of interest to the decision making context.
For simplicity, henceforth only the general term “classification” will be used in this book. However, a distinction will be made between sorting and classification when required.
Closing this introduction to the main concepts related to the classification
problem, it is important to emphasize the difference between classification
and clustering: in classification the groups are defined a priori, whereas in
clustering the objective is to identify clusters (groups) of alternatives sharing
similar characteristics. In other words, in a classification problem the analyst
knows in advance what the results of the analysis should look like, while in
clustering the analyst tries to organize the knowledge embodied in a data
sample in the most appropriate way according to some similarity measure.
Figure 1.3 outlines this difference in a graphical way.
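
To make this distinction concrete, the following minimal Python sketch contrasts the two settings. It uses scikit-learn purely for illustration (an assumption, since the book does not refer to any particular software); the data are randomly generated toy values.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))                   # 40 alternatives described on 2 criteria
groups = (X[:, 0] + X[:, 1] > 0).astype(int)   # classification: the groups are defined a priori

classifier = LogisticRegression().fit(X, groups)            # supervised: labels are given in advance
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)   # clustering: no labels, groups are discovered

print(classifier.predict(X[:3]), clusters[:3])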

The significance of classification problems extends to a wide variety of practical fields. Some characteristic examples are the following:
Medicine: medical diagnosis to assign the patients into groups (diseases)
according to the observed symptoms (Tsumoto, 1998; Belacel, 2000).
Pattern recognition: recognition of human characteristics or physical ob-
jects and their classification into properly defined groups (Ripley, 1996;
Young and Fu, 1997; Nieddu and Patrizi, 2000).
Human resources management: evaluation of personnel on the basis of their skills and their assignment to appropriate working positions.
Production management: monitoring and control of complex production
systems for fault diagnosis purposes (Catelani and Fort, 2000; Shen,
2000).
Marketing: selection of proper marketing policies for penetration into new markets, analysis of customer characteristics, customer satisfaction measurement, etc. (Dutka, 1995; Siskos et al., 1998).
Environmental management and energy policy: analysis and timely diagnosis of environmental impacts, examination of the effectiveness of energy policy measures (Diakoulaki et al., 1999).
Financial management and economics: bankruptcy prediction, credit risk
assessment, portfolio selection (stock classification), country risk assess-
ment (Zopounidis, 1998; Zopounidis and Doumpos, 1998).

3. GENERAL OUTLINE OF CLASSIFICATION METHODS

Most classification methods proposed for the development of classification
models operate on the basis of a regression philosophy, trying to exploit the
knowledge that is provided through the a priori definition of the groups. A
general outline of the procedure used to develop a classification model is
presented in Figure 1.4. This procedure is common to most of the existing
classification methods.
In traditional statistical regression, the objective is to identify the func-
tional relationship between a dependent variable Y and a vector of independ-
ent variables X given a sample of existing observations (Y, X). Most of the
existing classification methods address the classification problem in a similar way. The only real difference between statistical regression and the classification problem is that in the latter case the dependent variable is not a real-valued variable, but a discrete one. Henceforth, the dependent variable that determines the classification of the alternatives will be denoted by C, while its discrete levels (groups) will be denoted by C1, C2, …, Cq, where q is the number of groups. Similarly, g will be used to denote the vector of independent variables, i.e., g = (g1, g2, …, gn). Henceforth the independent variables will be referred to as criteria or attributes. Both terms
are quite similar. However, an attribute defines a nominal description of the
alternatives, whereas a criterion defines an ordinal description (i.e., a crite-
rion can be used to specify if an alternative is preferred over another,
whereas an attribute cannot provide this information). A more detailed discussion of the criterion concept is given in Chapter 3. The term “attrib-
ute” will only be used in the review presented in Chapter 2 regarding the
existing parametric and non-parametric classification techniques, in order to
comply with the terminology used in the disciplines discussed in the review.
All the remaining discussion made in this book will use the term “criterion”
which is established in the field of multicriteria decision aid, the main meth-
odological approach proposed in this book.
The sample of observations used to develop the classification model will
be referred to as the training sample or reference set. The number of observa-
tions of the training sample will be denoted by m. The observations will be
referred to as alternatives. Each alternative is considered as a vector consisting of the performance of the alternative on each criterion, i.e., a vector (g1, g2, …, gn), where gj denotes the performance of the alternative on criterion j.
On the basis of the above notation, addressing the classification problem involves the development of a model of the form Ĉ = f(g), which can be used to determine the classification of the alternatives given their characteristics described by the criteria vector g. The development of such a model is performed so that a predefined measure of the differences between the a priori classification C and the estimated classification Ĉ is minimized.
If the developed model performs satisfactorily in the training sample, it can be used to decide upon the classification of any new alternative that comes under consideration. This is the major point of interest in implement-
ing the above process: to be able to organize the knowledge embodied in the
training sample so that it can be used for real-time decision making pur-
poses.
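
A minimal Python sketch of this general scheme is given below: a model is developed on the training sample (reference set), its agreement with the a priori classification is measured, and it is then used to classify a new alternative. The classifier and the toy data are arbitrary choices for illustration (scikit-learn is assumed, not prescribed by the book).

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[2.1, 0.5], [1.8, 0.9], [0.3, 1.7],
              [0.1, 2.2], [1.9, 0.4], [0.2, 1.9]])   # training sample: m=6 alternatives, n=2 criteria
C = np.array([1, 1, 2, 2, 1, 2])                     # a priori classification

model = LogisticRegression().fit(X, C)               # develop the model estimating C from g

C_hat = model.predict(X)                             # estimated classification on the training sample
print("training classification error:", np.mean(C_hat != C))

new_alternative = np.array([[1.5, 1.0]])             # an alternative not in the reference set
print("assigned to group:", model.predict(new_alternative)[0])
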
The above model development and implementation process is common to
the majority of the existing classification techniques, at least as far as its
general philosophy is concerned. There are several differences, however, in
specific aspects of this process, involving mainly the details of the parameter
estimation procedure and the form of the classification model. Figure 1.5
gives a general categorization of the existing methodologies on the basis of
these two issues.
The developed classification model, most commonly, has a functional
form expressed as a function combining the alternatives’ performance on the
criteria vector g to estimate a score for each alternative. The estimated score
is a measure of the probability that an alternative belongs into a specific
group. The objective of the model development process, in this case, is to
minimize a measure of the classification error involving the assignment of
the alternatives by the model in an incorrect group. The classification models
of this form are referred to in Figure 1.5 as “quantitative”, in the sense that they rely on the development and use of a quantitative index to decide upon the assignment of the alternatives1.

Alternatively to the functional form, classification models can also have a


symbolic form. The approaches that follow this methodological approach
lead to the development of a set of “IF conditions THEN conclusion” classi-

1
The term “quantitative models” does not necessarily imply that the corresponding
approaches handle only quantitative variables. The developed function can also
consider qualitative variables too. This will be demonstrated later in this book,
through the presentation of multicriteria decision aid classification methodologies.

fication rules. The conditions part of each rule involves the characteristics of
the alternatives, thus defining the conditions that should be fulfilled in order
for the alternatives to be assigned into the group indicated in the conclusion
part of the rule. In addition to a classification recommendation, in some cases the conclusion part also includes a numerical coefficient representing the strength of the recommendation (conclusion). Procedures used to develop
such decision rules are referred to as rule induction techniques. Generally, it
is possible to develop an exhaustive set of rules covering all alternatives be-
longing in the training sample, thus producing a zero classification error.
This, however, does not ensure that the developed rules have the necessary
generalizing ability. For this reason, in order to avoid the development of
rules of limited usefulness a more compact set of rules is often developed.
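
As an illustration of how such symbolic rules can be represented, the short Python sketch below encodes one “IF conditions THEN conclusion” rule with a strength coefficient. The attribute names and thresholds are hypothetical, chosen only for this example.

def rule_high_risk(alternative):
    # IF debt_ratio > 0.70 AND profitability < 0 THEN group = "high risk" (strength 0.85)
    if alternative["debt_ratio"] > 0.70 and alternative["profitability"] < 0:
        return ("high risk", 0.85)
    return None   # the rule does not cover this alternative

example = {"debt_ratio": 0.82, "profitability": -0.03}
print(rule_high_risk(example))   # ('high risk', 0.85)
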
The plethora of real-world classification problems encountered in many
research and practical fields has been the major motivation for researchers
towards the continuous development of advanced classification methodolo-
gies. The general model development procedure presented above, reflects the
general scheme and objective of every classification methodological ap-
proach, i.e. the elicitation of knowledge from a sample of alternatives and its
representation into a functional or symbolic form, such that the reality can be
modeled as consistently as possible. A consistent modeling ensures the reli-
ability of the model’s classification recommendations.

4. THE PROPOSED METHODOLOGICAL APPROACH AND THE OBJECTIVES OF THE BOOK

Among the different methodological approaches proposed for addressing
classification problems (see Chapter 2 for an extensive review), MCDA is an
advanced field of operations research providing several advantages from the
research and practical points of view.
At the research level, MCDA provides a plethora of methodological ap-
proaches for addressing a variety of decision making situations. Many of
these approaches are well-suited to the nature of the classification problem.
The major characteristic shared by all MCDA classification approaches is
their focus on the modeling and addressing of sorting problems. This form of
classification problems is of major interest within a decision making context,
given that the concept of preference lies in the core of every real-world deci-
sion2. Furthermore, recently, there has been a number of studies on the use of

2
Recall from the discussion in section 2 of this chapter that sorting problems involve the
consideration of the existing preferences with regard to the specification of the groups (ordinal specification of the groups), while discrimination problems do not consider this special feature.
MCDA approaches for addressing discrimination problems too (Perny, 1998; Belacel, 2000). Outside MCDA, research in other fields on the special features of sorting problems is still quite limited. This characteristic of MCDA can be considered as a significant ad-
vantage within a decision making context.
The above issue has also practical implications. The main objective of
many classification methodologies is to develop “optimal” classification
models, where the term optimal is often restricted to the statistical descrip-
tion of the alternatives, or to the classification accuracy of the developed
model given a training sample. In the first case, the discrimination of a given
set of alternatives on the basis of a pre-specified statistical discrimination
measure is often inadequate in practice. This is because such an approach
assumes that the decision maker is familiar with the necessary theoretical
background required for the appropriate interpretation of the developed
models. In the second case, the development of classification models of high
accuracy is obviously of major interest from a practical perspective. This objective, however, should be accompanied by the objective of developing models that are easily interpretable and that comply with the concepts used by the decision maker. This will ensure that the decision maker can judge the logical consistency of the developed model, assess it according to his/her decision making policy and argue upon the model’s recommendations.
The objective of MCDA is to address the above issues taking into con-
sideration the decision maker’s preferential system. The major part of the
research in the development of MCDA classification methodologies has
been devoted to the theoretical aspects of the model development and im-
plementation process, given that the decision maker is willing to provide substantial information regarding his/her preferential system. However, this is
not always a feasible approach, especially within the context of repetitive
decision making situations, where the time required to make decisions is of-
ten crucial. The preference disaggregation approach (Jacquet-Lagrèze and
Siskos, 1982, 1983) of MCDA is well-suited for addressing this problem
following the general regression-based scheme outlined in section 3 of this
chapter. The preference disaggregation approach constitutes the basis of the
methodology proposed in this book for addressing classification problems.
In particular, the present book has two major objectives:
1. To illustrate the contribution of MCDA in general, and preference disag-
gregation in particular in addressing classification problems: Towards
the accomplishment of this objective, Chapter 4 presents in detail two
MCDA methods that employ the preference disaggregation paradigm.


These methods include the UTADIS method (UTilités Additives DIS-
criminantes) and the MHDIS method (Multi-group Hierarchical DIS-
crimination). Both methods constitute characteristic examples of the way
that the preference disaggregation approach can be used for model devel-
opment in classification problems. Furthermore, the preference disaggre-
gation philosophy is a useful basis which can be used in conjunction with
other MCDA streams (see Chapter 3 for a discussion of the existing
MCDA methodological streams). Between these streams, the outranking
relation approach (Roy, 1991) is the most widely studied MCDA field for
developing classification models. The new methodology presented in
Chapter 5 for specifying the parameters of the ELECTRE TRI method (a
well-known MCDA classification method based on the outranking rela-
tion approach), illustrates the capabilities provided by the preference dis-
aggregation paradigm in applying alternate MCDA approaches in a flexi-
ble way; it also illustrates the interactions that can be established between
preference disaggregation analysis and other methodological streams of
MCDA.
2. To perform a thorough investigation of the efficiency of MCDA classifi-
cation approaches in addressing classification problems: Despite the sig-
nificant theoretical developments made on the development of MCDA
classification methodologies, there is still a lack of research studies on the
investigation of the efficiency of these methodologies as opposed to other
approaches. Of course, the applications presented to date in several practical and research fields illustrate the high level of support that
MCDA methodologies provide to decision makers through:
The promotion of the direct participation of the decision maker in the
decision making process.
The interactive development of user-oriented models that facilitate the
better understanding of the major structural parameters of the problem
at hand.
In addition to these supporting features, the efficiency of the MCDA
classification methodologies is also a crucial issue for their successful
implementation in practice. The analysis of the classification efficiency
cannot be performed solely on the basis of case study applications; experimental investigation is also required. The extensive simulation pre-
sented in Chapter 5 addresses the above issue considering the classifica-
tion performance of MCDA classification methods (UTADIS, MHDIS,
ELECTRE TRI) compared to other well-known methodologies. The en-
couraging results of this comparison are further complemented by the
practical applications presented in Chapter 6. These applications involve
classification problems from the field of financial management. Financial management during the last decades has become a field of major impor-
tance for the sustainable development of firms and organizations. This is
due to the increasing complexity of the economic, financial and business
environments worldwide. These new conditions, together with the complexity of financial decision making problems, have motivated researchers
from various research fields (operations research, artificial intelligence,
computer science, etc.) to develop efficient methodologies for financial
decision making purposes. In view of these remarks, three financial decision making problems are considered: bankruptcy prediction, credit risk assessment, and portfolio selection and management. These applications high-
light the contribution of MCDA classification techniques in addressing
significant real-world decision making problems of high complexity. The
results obtained through the conducted experimental investigation and the
above applications illustrate the high performance capabilities of MCDA
classification methodologies compared to other well-established tech-
niques.
Chapter 2
Review of classification techniques

1. INTRODUCTION
As mentioned in the introductory chapter, the major practical importance of
the classification problem motivated researchers towards the development of
a variety of different classification methodologies. The purpose of this chap-
ter is to review the most well-known of these methodologies for classifica-
tion model development. The review is organized into two major parts, in-
volving respectively:
1. The statistical and econometric classification methods which constitute
the “traditional” approach to develop classification models.
2. The non-parametric techniques proposed during the past two decades as
innovative and efficient classification model development techniques.

2. STATISTICAL AND ECONOMETRIC TECHNIQUES

Statistics is the oldest science involved with the analysis of given samples in
order to make inferences about an unknown population. The classification
problem is addressed by statistical and econometric techniques within this
context. These techniques include both univariate and multivariate methods.
The former involve the development and implementation of univariate statis-
tical tests, which are mainly of a descriptive character. For this reason, such techniques will not be considered in this review. The foundations of multi-
variate techniques can be traced back to the work of Fisher (1936) on the
linear discriminant analysis (LDA). LDA has been the most extensively used
methodology for developing classification models for several decades. Ap-
proximately a decade after the publication of Fisher’s paper, Smith (1947)
extended LDA to the more general quadratic form (quadratic discriminant
analysis - QDA).
During the subsequent decades the focus of the conducted research
moved towards the development of econometric techniques. The most well-
known methods from this field include the linear probability model, logit
analysis and probit analysis. These three methods are actually special forms
of regression analysis in cases where the dependent variable is discrete. The
linear probability model is only suitable for two-group classification prob-
lems, whereas both logit and probit analysis are applicable to multi-group
problems too. The latter two methodologies have several significant advan-
tages over discriminant analysis. This has been one of the main reasons for
their extensive use.
Despite the criticism on the use of these traditional statistical and econometric approaches, they remain quite popular both as research tools and for practical purposes. This popularity is supported by the existence of a plethora of statistical and econometric software packages, which make these approaches easy and quick to apply. Furthermore, statistical and
econometric techniques are quite often considered in comparative studies
investigating the performance of new classification techniques being devel-
oped. In this regard, statistical and econometric techniques often serve as a
reference point (benchmark) in conducting such comparisons. It is also im-
portant to note that under specific data conditions, statistical techniques yield
the optimal classification rule.

2.1 Discriminant analysis


Discriminant analysis was the first multivariate statistical classification method and has been used for decades by researchers and practitioners in developing classification models. In its linear form it was developed by Fisher (1936).
Given a training sample consisting of m alternatives whose classification is a
priori known, the objective of the method is to develop a set of discriminant
functions maximizing the ratio of among-groups to within-groups variance.
In the general case where the classification involves q groups, q-1 linear
functions of the following form are developed:

Z_kl(x) = a_kl + w_kl1 x1 + w_kl2 x2 + … + w_kln xn

where x1, x2, …, xn are the attributes describing the alternatives, a_kl is a constant term, and w_kl1, w_kl2, …, w_kln are the attributes’ coefficients in the discriminant function. The indices k and l refer to a pair of groups, denoted Ck and Cl, respectively.
The estimation of the model’s parameters involves the estimation of the constant terms a_kl and the coefficient vectors w_kl. The estimation procedure is based on two major assumptions: (a) the data follow the multivariate normal distribution, (b) the variance-covariance matrices for each group are equal. Given these assumptions, the estimation of the constant terms and the attributes’ coefficients is performed as follows:

w_kl = W⁻¹(μk − μl),    a_kl = −½(μk + μl)′W⁻¹(μk − μl)

where:
μk is an n×1 vector consisting of the attributes’ mean values for group Ck,
W is the within-groups variance-covariance matrix. Denoting by m the number of alternatives in the training sample, by xi the attribute vector of alternative i, and by q the number of groups, the matrix W is specified as follows:

W = [1/(m − q)] Σk Σ(xi∈Ck) (xi − μk)(xi − μk)′

The parameters’ estimates in LDA are not unique. In particular, it is possible to develop alternative discriminant functions in which the coefficients and the constant terms are defined as linear transformations of w_kl and a_kl. This makes it difficult to ascertain the contribution of each attribute in
the classification of the alternatives1. One approach to tackle this problem is
to use the standardized discriminant function coefficients estimated using a
transformed data set so that the attributes have zero mean and unit variance.
Once the parameters (coefficients and constant term) of the discriminant
functions are estimated, the classification of an alternative is decided on
the basis of its discriminant score assigned to the alternative by each

1
In contrast to the traditional multivariate regression analysis, in discriminant analysis statis-
tical tests such as the t-test are rarely used to estimate the significance of the discriminant
function coefficients, simply because these coefficients are not unique.
discriminant function. In particular, an alternative x is classified into group Ck if, for all other groups Cl, the following rule holds:

Z_kl(x) ≥ ln[πl K(k | l) / (πk K(l | k))]

In the above rule, K(k | l) denotes the misclassification cost corresponding to an incorrect decision to classify into group Ck an alternative that actually belongs into group Cl, and πl denotes the a priori probability that an alternative belongs into group Cl. Figure 2.1 gives a graphical representation of the above linear classification rule in the two-group case, assuming that all misclassification costs and a priori probabilities are equal.
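
The following minimal numpy sketch illustrates the linear discriminant rule under the assumptions stated above (equal group covariance matrices, equal priors and misclassification costs). It computes the group mean vectors and the pooled within-groups covariance matrix and classifies an alternative through the pairwise scores; the data are toy values, not an example from the book.

import numpy as np

X = np.array([[1.0, 2.0], [1.5, 1.0], [0.5, 1.5],    # group 1 alternatives
              [3.0, 0.5], [3.5, 1.5], [2.5, 1.0]])   # group 2 alternatives
y = np.array([1, 1, 1, 2, 2, 2])
groups = np.unique(y)
m, q = len(y), len(groups)

# group mean vectors and pooled within-groups variance-covariance matrix W
mu = {k: X[y == k].mean(axis=0) for k in groups}
W = sum(((X[y == k] - mu[k]).T @ (X[y == k] - mu[k])) for k in groups) / (m - q)
W_inv = np.linalg.inv(W)

def classify(x):
    # assign x to the group whose pairwise discriminant scores are all non-negative
    for k in groups:
        scores = [(x - 0.5 * (mu[k] + mu[l])) @ W_inv @ (mu[k] - mu[l])
                  for l in groups if l != k]
        if all(s >= 0 for s in scores):
            return k

print(classify(np.array([1.1, 1.4])), classify(np.array([3.1, 1.2])))   # -> 1 2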

In the case where the group variance-covariance matrices are not equal, QDA is used instead of LDA. The general form of the quadratic discriminant function developed through QDA, for each pair of groups Ck and Cl, is the following:

Z_kl(x) = x′A_kl x + b_kl′x + c_kl

The estimation of the coefficients A_kl, b_kl and the constant term c_kl is performed as follows:

A_kl = −½(Wk⁻¹ − Wl⁻¹),   b_kl = Wk⁻¹μk − Wl⁻¹μl,
c_kl = −½(μk′Wk⁻¹μk − μl′Wl⁻¹μl) − ½ ln(|Wk| / |Wl|)
where Wk and Wl denote the within-group variance-covariance matrices for groups Ck and Cl, estimated as follows:

Wk = [1/(mk − 1)] Σ(xi∈Ck) (xi − μk)(xi − μk)′

where mk denotes the number of alternatives of the training sample that belong into group Ck.
Given the discriminant score of an alternative on every discriminant function corresponding to a pair of groups Ck and Cl, the quadratic classification rule (Figure 2.2) is similar to the linear case: the alternative is classified into group Ck if and only if the same inequality as in the linear rule holds for all other groups Cl, with the quadratic scores Z_kl(x) in place of the linear ones.
In practice, both in LDA and QDA the specification of the a priori prob-
abilities and the misclassification costs K (k | l) is a cumbersome process.
To overcome this problem, trial and error processes are often employed to
specify the optimal cut-off points in the above presented classification rules.
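
As a point of reference for such cut-off choices, the short sketch below computes the standard Bayes-rule adjustment of the two-group cut-off for given priors and costs. This is a textbook formula stated under the same assumptions as above, not a procedure taken from this book, and the numerical values are hypothetical.

import numpy as np

pi1, pi2 = 0.8, 0.2    # a priori probabilities of groups 1 and 2 (hypothetical)
K12, K21 = 5.0, 1.0    # K(1|2): cost of assigning to group 1 an alternative of group 2, and vice versa

cutoff = np.log((pi2 * K12) / (pi1 * K21))    # assign to group 1 when Z_12(x) >= cutoff
print("cut-off on Z_12:", round(cutoff, 3))   # equals 0 only when priors and costs balance out
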
Beyond the above issue, LDA and QDA have been heavily criticized for a series of other problems regarding their underlying assumptions, involving mainly the assumption of multivariate normality and the hypotheses made on the structure of the group variance-covariance matrices. A comprehensive discussion of the impact that these assumptions have on the results of discriminant analysis is presented in the book by Altman et al. (1981).
Given that the above two major underlying assumptions are valid (multi-
variate normality and known structure of the group variance-covariance ma-
trices), the use of the Bayes rule indicates that the two forms of discriminant
analysis (linear and quadratic) yield the optimal classification rule (the LDA
in the case of equal group variance-covariance matrices and the QDA in the
opposite case). In particular, the developed classification rules are asymp-
totically optimal (as the training sample size increases the statistical proper-
ties of the considered groups approximate the unknown properties of the cor-
responding populations). A formal proof of this finding is presented by Duda
and Hart (1978), as well as by Patuwo et al. (1993).
Such restrictive statistical assumptions, however, are rarely met in prac-
tice. This fact raises a major issue regarding the real effectiveness of dis-
criminant analysis in realistic conditions. Several studies have addressed this
issue. Moore (1973), Krzanowski (1975, 1977) and Dillon and Goldstein (1978) showed that when the data include discrete variables, the performance of discriminant analysis deteriorates, especially when the attributes are significantly correlated (correlation coefficient higher than 0.3). On the contrary, Lachenbruch et al. (1973) and Subrahmaniam and Chinganda (1978) concluded that even in the case of non-normal data the classification results of discriminant analysis models are quite robust, especially in the case of QDA and for data with a small degree of skewness.

2.2 Logit and probit analysis


The aforementioned problems regarding the assumptions made by discriminant analysis motivated researchers to develop more flexible methodologies. The first such methodologies to be developed include the linear probability model, as well as logit and probit analysis.
The linear probability model is based on a multivariate regression using
as dependent variable the classification of the alternatives of the training
sample. Theoretically, the result of the developed model is interpreted as the
probability that an alternative belongs into one of the pre-specified groups. Performing the regression, however, does not ensure that the model’s result
lies in the interval [0, 1], thus posing a major model interpretation problem.
Ignoring this problem, a common cut-off used to decide upon the classifica-
tion of the alternatives is 0.5. In the multi-group case, however, it is rather
difficult to provide an appropriate specification of the probability cut-off
point. This, combined with the aforementioned problem on the interpretation of the developed model, makes the use of the linear probability model quite
cumbersome, both from a theoretical and a practical perspective. For these
reasons the use of the linear probability model is rather limited and conse-
quently it will not be further considered in this book.
Logit and probit analysis originate from the field of econometrics. Al-
though both these approaches are not new to the research community2, their
use has been boosted during the 1970s with the works of Nobelist Daniel
McFadden (1974, 1980) on the discrete choice theory. The discrete choice
theory provided the necessary basis for understanding the concepts regarding
the interpretation of logit and probit models.
Both logit and probit analysis are based on the development of a non-
linear function measuring the group-membership probability for the alterna-
tives under consideration. The difference between the two approaches in-
volves the form of the function that is employed. In particular, logit analysis employs the logistic function, whereas the cumulative distribution function of the normal distribution is used in probit analysis. On the basis of these functions, and assuming a two-group classification problem, the probability P that an alternative with attribute vector x belongs into one of the two groups is defined as follows3:

Logit analysis:  P = 1 / [1 + exp(−a − b′x)]   (2.1)

Probit analysis: P = Φ(a + b′x)   (2.2)

where Φ denotes the standard normal cumulative distribution function.

The estimation of the constant term a and the vector b is performed using maximum likelihood techniques. In particular, the parameters’ estimation process involves the maximization of the following likelihood function:

L(a, b) = ∏i P(xi)^yi [1 − P(xi)]^(1 − yi)

where yi is the binary 0-1 variable designating the group of alternative xi.

2
The first studies on probit and logit analysis can be traced back to the 1930s and the 1940s
with the works of Bliss (1934) and Berkson (1944), respectively.
3
If a binary 0-1 variable is assigned to designate each group such that group C1 corresponds to 0 and group C2 to 1, then equations (2.1)-(2.2) provide the probability that an alternative belongs into group C2. If the binary variable is used in the opposite way, then equations (2.1)-(2.2) provide the probability that an alternative belongs into group C1.
The maximization of this function is a non-linear optimization problem which is often difficult to solve. Altman et al. (1981) report that if there ex-
ists a linear combination of the attributes that accurately dis-
criminates the pre-specified groups, then the optimization process will not
converge to an optimal solution.
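
A minimal sketch of this estimation process is shown below: the negative log-likelihood of the two-group logit model is minimized numerically. SciPy’s general-purpose optimizer is an assumption (the text does not prescribe a particular routine), and the data are toy, non-separable values.

import numpy as np
from scipy.optimize import minimize

X = np.array([[0.2, 1.1], [1.6, 0.7], [0.3, 0.9],    # group coded 0
              [1.5, 0.2], [0.4, 1.0], [2.0, -0.3]])  # group coded 1
y = np.array([0, 0, 0, 1, 1, 1])

def neg_log_likelihood(params):
    a, b = params[0], params[1:]
    p = 1.0 / (1.0 + np.exp(-(a + X @ b)))   # logistic group-membership probabilities
    eps = 1e-9                               # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

result = minimize(neg_log_likelihood, x0=np.zeros(3), method="BFGS")
a_hat, b_hat = result.x[0], result.x[1:]
print("estimated constant term and coefficients:", a_hat, b_hat)
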
Once the parameters’ estimation process is completed, equations (2.1)
and (2.2) are used to estimate the group-membership probabilities for all the
alternatives under consideration. The classification decision is taken on the
basis of these probabilities. For instance, in a two-group classification prob-
lem, one can impose a classification rule of the following form: “assign an alternative to group C1 if its group-membership probability exceeds 0.5; otherwise assign the alternative into group C2”. Alternate probability cut-off points, other than 0.5, can also be specified
through trial and error processes.
In the case of multi-group classification problems, logit and probit analy-
sis can be used in two forms: as multinomial or ordered logit/probit models.
The difference between multinomial and ordered models is that the former assume a nominal definition of the groups, whereas the latter assume an or-
dinal definition. In this respect, ordered models are more suitable for ad-
dressing sorting problems, while traditional discrimination/classification
problems are addressed through multinomial models.
The ordered models require the estimation of a vector of attributes’ coefficients b and a vector of constant terms a. These parameters are used to specify the probability that an alternative belongs into group Ck in the way presented in Table 2.1. The constant terms are defined such that a1 < a2 < … < aq−1.
The parameters’ estimation process is performed similarly to the two-group
case using maximum likelihood techniques.
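
The short Python sketch below shows one common parameterization of how an ordered logit model turns the coefficient vector b and the ordered constant terms into group-membership probabilities; exact sign conventions vary across texts, and the parameter values here are toy ones.

import numpy as np

def ordered_logit_probs(x, b, cuts):
    F = lambda z: 1.0 / (1.0 + np.exp(-z))              # logistic cumulative distribution function
    z = x @ b
    cdf = np.concatenate(([0.0], F(cuts - z), [1.0]))    # cumulative probabilities P(C <= k)
    return np.diff(cdf)                                  # per-group probabilities

x = np.array([0.7, 1.2])
b = np.array([0.9, -0.4])
cuts = np.array([-0.5, 0.8])                             # a1 < a2 for q = 3 ordered groups
print(ordered_logit_probs(x, b, cuts))                   # three probabilities summing to 1
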
The multinomial models require the estimation of a set of coefficient vectors bk and constant terms ak corresponding to each group Ck. On the basis of these parameters, the multinomial logit model estimates the probability that an alternative x belongs into group Ck as follows:

P(Ck | x) = exp(ak + bk′x) / Σl exp(al + bl′x)

For normalization purposes, a1 and b1 are set such that a1 = 0 and b1 = 0, whereas all other ak and bk (k = 2, …, q) are estimated through maximum likelihood techniques.
Between the logit and probit models, the former is usually preferred. This is mainly because the development of logit models requires less computational effort. Furthermore, there are no strong theoretical or practical results to support a comparative advantage of probit models in terms of their classification accuracy.
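
The following minimal function computes the multinomial logit probabilities with the normalization just described (the parameters of the first group fixed to zero); the parameter values are toy ones chosen only for illustration.

import numpy as np

def multinomial_logit_probs(x, a, B):
    # a: constant terms (a[0] = 0), B: one coefficient vector per group (B[0] = 0 vector)
    scores = a + B @ x
    exps = np.exp(scores - scores.max())   # subtract the maximum for numerical stability
    return exps / exps.sum()

x = np.array([1.2, 0.4])
a = np.array([0.0, 0.5, -0.3])                            # q = 3 groups, first group as reference
B = np.array([[0.0, 0.0], [0.8, -0.2], [-0.5, 0.6]])
print(multinomial_logit_probs(x, a, B))                   # group-membership probabilities summing to 1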

During the last three decades both logit and probit analysis have been ex-
tensively used by researchers in a wide range of fields as efficient alterna-
tives to discriminant analysis. However, despite the theoretical advantages of
these approaches over LDA and QDA (logit and probit analysis do not pose
assumptions on the statistical distribution of the data or the structure of the
group variance-covariance matrices), several comparative studies have not clearly shown that these techniques outperform discriminant analysis
(linear or quadratic) in terms of their classification performance
(Krzanowski, 1975; Press and Wilson, 1978).

3. NON-PARAMETRIC TECHNIQUES
In practice the statistical properties of the data are rarely known, since the underlying population is difficult to specify fully. This poses problems for the use of statistical techniques and has motivated researchers towards the development of non-parametric methods. Such approaches make no underlying statistical assumptions and are consequently expected to be flexible enough to adjust themselves to the characteristics of the
data under consideration. In the subsequent sections the most important of
these techniques are described.

3.1 Neural networks


Neural networks, often referred to as artificial neural networks, have been
developed by artificial intelligence researchers as an innovative modeling
methodology of complex problems. The foundations of the neural networks paradigm lie in the emulation of the operation of the human brain. The human brain consists of a huge number of neurons organized in a highly com-
plex network. Each neuron is an individual processing unit. A neuron re-
ceives an input signal (stimulus from body sensors or output signal from
other neurons), which after a processing phase produces an output signal that
is transferred to other neurons for further processing. The result of the over-
all process is the action or decision taken in accordance with the initial
stimulus.
This complex biological operation constitutes the basis for the develop-
ment of neural network models. Every neural network is a network of paral-
lel processing units (neurons) organized into layers. A typical structure of a
neural network (Figure 2.3) includes the following structural elements:
1. An input layer consisting of a set of nodes (processing units-neurons) one
for each input to the network.
2. An output layer consisting of one or more nodes depending on the form
of the desired output of the network. In classification problems, the num-
ber of nodes of the output layer is determined depending on the number
of groups. For instance, for a two-group classification problem the output layer may include only one node taking two values: 1 for group C1 and 2 for group C2 (these are arbitrarily chosen values and any other pair is possible). In the general case where there are q groups, the number of nodes in the output layer is usually defined as the smallest integer which is larger than log2(q) (Subramanian et al., 1993). Alternatively, it is also possible to
set the number of output nodes equal to the number of groups.
3. A series of intermediate layers referred to as hidden layers. The nodes of
each hidden layer are fully connected with the nodes of the subsequent
and the preceding layer. Furthermore, it is also possible to consider
more complicated structures where all layers are fully connected to each
other. Such general network structures are known as fully connected neu-
ral networks. The network presented in Figure 2.3 is an example of such
structure. There is no general rule to define the number of hidden layers.
This is, usually, performed through trial and error processes. Recently,
however, a significant part of the research has been devoted to the development of self-organizing neural network models, that is, neural networks
that adjust their structure to best match the given data conditions. Re-
search made on the use of neural networks for classification purposes
showed that, generally, a single hidden layer is adequate (Patuwo et al.,
1993; Subramanian et al., 1993). The number of nodes in this layer may
range between q and 2n+1, where q is the number of groups and n is the
number of attributes.
Each connection between two nodes of the network is assigned a weight
representing the strength of the connection. The determination of these
weights (training of the network) is accomplished through optimization
techniques. The objective of the optimization process is to minimize the dif-
ferences between the recommendations of the network and the actual classi-
fication of the alternatives belonging in the training sample.
The most widely used network training methodology is the back-propagation approach (Rumelhart et al., 1986). More recently, advanced nonlinear optimization techniques have also contributed to obtaining globally optimal estimates of the network’s connection weights (Hung and Denton, 1993).
On the basis of the connections’ weights, the input to each node is deter-
mined as the weighted average of the outputs of all other nodes with which
there is a connection established. In the general case of a fully connected
neural network (cf. Figure 2.3) the input I_i^r to node i of the hidden layer r is defined as follows:

I_i^r = Σj Σ(k=1…nj) w_ik^rj · y_k^j + e_i

where:
nj is the number of nodes at the hidden layer j,
w_ik^rj is the weight of the connection between node i of layer r and node k of layer j,
y_k^j is the output of node k at layer j,
e_i is an error term.
The output y of each node is specified through a transformation function. The most common form of this function is the logistic function:

y = 1 / (1 + e^(−I/T))

where T is a user-defined constant.
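
The numpy sketch below illustrates this forward computation: weighted inputs to each node followed by the logistic transformation with the constant T. The architecture and weights are arbitrary toy values, not a trained network.

import numpy as np

def logistic(z, T=1.0):
    return 1.0 / (1.0 + np.exp(-z / T))

x = np.array([0.4, 0.9, 0.1])                          # input layer: one node per attribute

W1 = np.random.default_rng(0).normal(size=(4, 3))      # weights: input layer -> hidden layer (4 nodes)
b1 = np.zeros(4)                                       # error terms of the hidden nodes
W2 = np.random.default_rng(1).normal(size=(2, 4))      # weights: hidden layer -> output layer (2 nodes)
b2 = np.zeros(2)

hidden = logistic(W1 @ x + b1)                         # node inputs are weighted sums of previous outputs
output = logistic(W2 @ hidden + b2)
print("output node values:", output)                   # e.g. one node per group in a two-group problem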


The major advantage of neural networks is their parallel processing abil-
ity as well as their ability to represent highly complex, nonlinear systems.
Theoretically, this enables the approximation of any real function with arbitrary accuracy (Kosko, 1992). These advantages led to the widespread appli-
cation of neural networks in many research fields. On the other hand, the
criticism on the use of neural networks is focused on two points:
1. The increased computational effort required for training the network
(specification of connections’ weights).
2. The inability to provide explanations of the network’s results. This is a
significant shortcoming, mainly from a decision support perspective,
since in a decision making context the justification of the final decision is
often a crucial point.

Beyond the above two problems, research studies investigating the
classification performance of neural networks as opposed to statistical and
econometric techniques have led to conflicting results. Subramanian et al.
(1993) compared neural networks to LDA and QDA through a simulation
experiment using data conditions that were in accordance with the assump-
tions of the two statistical techniques. Their results show that neural net-
works can be a promising approach, especially in cases of complex classifi-
cation problems involving more than two groups and a large set of attributes.
On the other hand, LDA and QDA performed better when the sample size
was increased.
A similar experimental study by Patuwo et al. (1993) leads to the conclu-
sion that there are many cases where statistical techniques outperform neural
networks. In particular, the authors compared neural networks to LDA and
QDA, considering both the case where the data conditions are in line with
the assumptions of these statistical techniques, as well as the opposite case.
According to the obtained results, when the data are multivariate normal
with equal group variance-covariance matrices, then LDA outperforms neu-
ral networks. Similarly in the case of multivariate normality with unequal
variance-covariance matrices, QDA outperformed neural networks. Even in
the case of non-normal data, the results of the analysis did not show any
clear superiority of neural networks, at least compared to QDA.
The experimental analysis of Archer and Wang (1993) is also worth men-
tioning. The authors discussed the way that neural networks can be used to
address sorting problems, and compared their approach to LDA. The results
of this comparison show a higher classification performance for the neural
networks approach, especially when there is a significant degree of group
overlap.

3.2 Machine learning


During the last two decades machine learning evolved as a major discipline
within the field of artificial intelligence. Its objective is to describe and ana-
lyze the computational procedures required to extract and organize knowl-
edge from the existing experience. Within the different learning paradigms
(Kodratoff and Michalski, 1990), inductive learning through examples is the
one most widely used.
In contrast to the classification techniques described in the previous sec-
tions, inductive learning introduces a completely different approach in mod-
eling the classification problem. In particular, inductive learning approaches
organize the extracted knowledge in a set of decision rules of the following
general form:
IF elementary conditions THEN conclusion

The first part of such rules examines the necessary and sufficient condi-
tions required for the conclusion part to be valid. The elementary conditions
are connected using the AND operator. The conclusion consists of a recom-
mendation on the classification of the alternatives satisfying the conditions
part of the rule.
One of the most widely used techniques developed on the basis of the in-
ductive learning paradigm is the C4.5 algorithm (Quinlan, 1993). C4.5 is an
improved modification of the ID3 algorithm (Quinlan, 1983, 1986). Its main
advantages over its predecessor involve:
1. The capability of handling qualitative attributes.
2. The capability of handling missing information.
3. The elimination of the overfitting problem4.
The decision rules developed through the C4.5 algorithm are organized in
the form of a decision tree such as the one presented in Figure 2.4. Every
node of the tree considers an attribute, while the branches correspond to
elementary conditions defined on the basis of the node attributes. Finally, the
leaves designate the group to which an alternative is assigned, given that it
satisfies the branches’ conditions.

4
Overfitting refers to the development of classification models that perform excellently in
classifying the alternatives of the training sample, but their performance in classifying
other alternatives is quite poor.
The development of the classification tree is performed through an itera-


tive process. Every stage of this process consists of three individual steps:
1. Evaluation of the discriminating power of the attributes in classifying the
alternatives of the training sample.
2. Selection of the attribute having the highest discriminating power.
3. Definition of subsets of alternatives on the basis of their performances on
the selected attribute.
This procedure is repeated for every subset of alternatives formed in the
third step, until all alternatives of the training sample are correctly classified.
The evaluation of the attributes’ discriminating power in the first step of the above process is performed on the basis of the amount of new information introduced by each attribute in the classification of the alternatives. The entropy of the classification introduced by each attribute is used as the appropriate information measure. In particular, assuming that each attribute introduces a partitioning of the training sample of m alternatives into t subsets S_1, S_2, …, S_t, each consisting of m_i alternatives, then the entropy of this partitioning is defined as follows:

E = Σ_{i=1}^{t} (m_i / m) [ −Σ_{k=1}^{q} (m_{ik} / m_i) log₂(m_{ik} / m_i) ]

where m_{ik} denotes the number of alternatives of set S_i that belong into group C_k. The attribute with the minimum entropy E is selected as the one
with the highest discriminating power. This attribute adds the highest
amount of new information in the classification of the alternatives.
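The attribute selection step can be illustrated with a short sketch. The toy training sample, attribute names and group labels below are hypothetical; the entropy of each candidate partitioning is computed as the weighted sum of the entropies of its subsets, as defined above, and the attribute with the minimum entropy is selected.

import math
from collections import Counter

def partition_entropy(subsets):
    # subsets: one list of group labels per value of the candidate attribute
    m = sum(len(s) for s in subsets)
    total = 0.0
    for s in subsets:
        counts = Counter(s)
        h = -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())
        total += (len(s) / m) * h
    return total

def best_attribute(sample, labels, attributes):
    # select the attribute whose partitioning of the sample has minimum entropy
    scores = {}
    for a in attributes:
        values = set(x[a] for x in sample)
        subsets = [[labels[i] for i, x in enumerate(sample) if x[a] == v] for v in values]
        scores[a] = partition_entropy(subsets)
    return min(scores, key=scores.get)

sample = [{"size": "large", "risk": "low"}, {"size": "small", "risk": "low"},
          {"size": "small", "risk": "high"}, {"size": "large", "risk": "high"}]
labels = ["C1", "C1", "C2", "C2"]
print(best_attribute(sample, labels, ["size", "risk"]))   # "risk" gives zero entropy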
The above procedure may lead to a highly specialized classification tree
with nodes covering only one alternative. This is the result of overfitting the
tree to the given data of the training sample, a phenomenon which is often
related to limited generalizing performance. C4.5 addresses this problem
through the implementation of a pruning phase, so that the decision tree’s
size is reduced, in order to improve its expected generalizing performance.
The development and implementation of pruning methodologies is a signifi-
cant research topic in the machine learning community. Some characteristic
examples of pruning techniques are the ones presented by Breiman et al.
(1984), Gelfand et al. (1991), Quinlan (1993).
The general aspects of the paradigm used in C4.5 are common to other
machine learning algorithms. Some well-known examples of such algo-
rithms include CN2 (Clark and Niblett, 1989), the AQ family of algorithms
(Michalski, 1969) and the recursive partitioning algorithm (Breiman et al.,
1984).
The main advantages of machine learning classification algorithms in-


volve the following capabilities:
1. Handling of qualitative attributes.
2. Flexibility in handling missing information.
3. Exploitation of large data sets for model development purposes through
computationally efficient procedures.
4. Development of easily understandable classification models (classifica-
tion rules or trees).

3.3 Fuzzy set theory


Decision making is often based on fuzzy, ambiguous and vague judgments.
The daily use of verbal expressions such as “almost”, “usually”, “often”, etc., provides simple yet typical examples of this remark. The fuzzy nature of
these simple verbal statements is indicative of the fuzziness encountered in
the decision making process. The fuzzy set theory developed by Zadeh
(1965), provides the necessary modeling tools for the representation of un-
certainty and fuzziness in complex real-world situations.
The core of this innovative approach is the fuzzy set concept. A fuzzy set
is a set with no crisp boundaries. In the case of a traditional crisp set A a
proposition of the form “alternative x belongs to the set A” is either true or
false; for a fuzzy set, however, it can be partly true or false. Within the con-
text of the fuzzy set theory the modeling of such fuzzy judgments is per-
formed through the definition of membership functions. A membership func-
tion defines the degree to which an object (alternative) belongs to a
fuzzy set. The membership degree ranges in the interval [0, 1]. In the afore-
mentioned example a membership degree equal to 1 indicates that the propo-
sition “alternative x belongs to the set A” is true. Similarly, if the member-
ship degree is 0, then it is concluded that the proposition is false. Any other
value for the membership degree between 0 and 1 indicates that the proposi-
tion is partly true.
Figure 2.5 presents an example of a typical form for the membership
function µ for the proposition “according to attribute g, alternative x belongs to the set A”. The membership function corresponding to the negation of this proposition is also presented (the negation defines the complement set of A, denoted as Ā; the complement set includes the alternatives not belonging
into A).
In order to derive an overall conclusion regarding the membership of an
alternative into a fuzzy set based on the consideration of all attributes, one
must aggregate the partial membership degrees for each individual attribute.
This aggregation is based on common operators such as “AND” and “OR”
operators. The former corresponds to an intersection operation, whereas the latter indicates a union operation. A combination of these two operators is


also possible.

In the case of classification problems, each group can be considered as a


fuzzy set. Similarly to the machine learning paradigm, classification models
developed through approaches that implement the fuzzy set theory have the
form of decision rules. The general form of a fuzzy rule used for classifica-
tion purposes is the following:

IF g_1(x) is A_1 AND g_2(x) is A_2 AND … AND g_n(x) is A_n THEN x ∈ C_k

where each A_i corresponds to a fuzzy set defined on the scale of attribute g_i. The strength of each individual condition is defined by the membership degree of the corresponding proposition “according to attribute g_i, alternative x belongs to the set A_i”. The rules of the above general form are usually asso-
ciated with a certainty coefficient indicating the certainty about the validity
of the conclusion part.
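The following sketch evaluates a single fuzzy classification rule of the above form. The trapezoidal membership function, the attribute names and the numerical parameters are purely hypothetical, and the AND connective is modeled with the min operator.

def trapezoid(x, a, b, c, d):
    # membership degree in [0, 1] of value x in a trapezoidal fuzzy set
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def rule_strength(memberships):
    # "AND" of the elementary conditions modeled with the min operator
    return min(memberships)

# hypothetical rule: IF profitability is "high" AND leverage is "moderate" THEN x belongs to C1
alternative = {"profitability": 0.14, "leverage": 0.45}
mu_high_profit = trapezoid(alternative["profitability"], 0.05, 0.10, 0.30, 0.40)
mu_mod_leverage = trapezoid(alternative["leverage"], 0.0, 0.1, 0.40, 0.60)
print(rule_strength([mu_high_profit, mu_mod_leverage]))   # degree to which the rule fires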
Procedures for the development of fuzzy rules in classification problems
have been proposed by several researchers. Some indicative studies on this
field are the ones of Ishibuchi et al. (1992, 1993), Inuiguchi et al. (2000),
Bastian (2000), Oh and Pedrycz (2000).
Despite the existing debate on the relation between the fuzzy set theory
and the traditional probability theory, fuzzy sets have been extensively used
to address a variety of real-world problems from several fields. Furthermore,
several researchers have exploited the underlying concepts of the fuzzy set
theory in conjunction with other disciplines such as neural networks (neuro-
fuzzy systems; Von Altrock, 1996), expert systems (fuzzy rule-based expert
systems; Langholz et al., 1996), mathematical programming (fuzzy mathe-
matical programming; Zimmermann, 1978) and MCDA (Yager, 1977; Du-
bois and Prade, 1979; Siskos, 1982; Siskos et al., 1984a; Fodor and
Roubens, 1994; Grabisch, 1995, 1996; Lootsma, 1997).

3.4 Rough sets


Pawlak (1982) introduced the rough set theory as a tool to describe depend-
encies between attributes, to evaluate the significance of attributes and to
deal with inconsistent data. As an approach to handle imperfect data (uncer-
tainty and vagueness), it complements other theories that deal with data un-
certainty, such as probability theory, evidence theory, fuzzy set theory, etc.
Generally, the rough set approach is a very useful tool in the study of classi-
fication problems, regarding the assignment of a set of alternatives into pre-
specified classes. Recently, however, there have been several advances in
this field to allow the application of the rough set theory to choice and rank-
ing problems as well (Greco et al., 1997).
The rough set philosophy is founded on the assumption that with every
alternative some information (data, knowledge) is associated. This informa-
tion involves two types of attributes; condition and decision attributes. Con-
dition attributes are those used to describe the characteristics of the objects.
For instance the set of condition attributes describing a firm can be its size,
its financial characteristics (profitability, solvency, liquidity ratios), its or-
ganization, its market position, etc. The decision attributes define a partition
of the objects into groups according to the condition attributes.
On the basis of these two types of attributes an information table S=<U,
Q, V, f > is formed, as follows:
U is a finite set of m alternatives (objects).
Q is a finite set of n attributes.
V is the union of the domains of all attributes, i.e. V = ∪_{q∈Q} V_q (the domain of each attribute q is denoted by V_q). The traditional rough set theory assumes
that the domain of each attribute is a discrete set. In this context every
quantitative real-valued attribute needs to be discretized5, using discreti-
zation algorithms such as the ones proposed by Fayyad and Irani (1992),
Chmielewski and Grzymala-Busse (1996), Zighed et al. (1998). Re-
cently, however, the traditional rough set approach has been extended so
that no discritezation is required for quantitative attributes. Typical ex-
amples of the new direction are the DOMLEM algorithm (Greco et al.,
1999a) and the MODLEM algorithm (Grzymala-Busse and Stefanowski,
2001).

5
Discretization involves the partitioning of an attribute’s domain [a, b] into h subintervals [t_0, t_1], (t_1, t_2], …, (t_{h−1}, t_h], where t_0 = a, t_h = b and t_1 < t_2 < … < t_{h−1}.
f: U × Q → V is a total function such that f(x, q) ∈ V_q for every q ∈ Q and x ∈ U, called information function (Pawlak, 1991; Pawlak and Slowinski, 1994).
Simply stated, the information table is an m×n matrix, with rows corre-
sponding to the alternatives and columns corresponding to the attributes.
Given an information table, the basis of the traditional rough set theory is
the indiscernibility between the alternatives. Two alternatives x_i and x_j are considered to be indiscernible with respect to a subset of attributes P ⊆ Q, if and only if they are characterized by the same information, i.e. f(x_i, q) = f(x_j, q) for every q ∈ P. In this way every P ⊆ Q leads to the development of a binary relation I_P on the set of alternatives. This relation is called P-indiscernibility relation, denoted by I_P; I_P is an equivalence relation for any P.
Every set of indiscernible alternatives is called an elementary set and it constitutes a basic granule of knowledge. Equivalence classes of the relation I_P are called P-elementary sets in S and I_P(x) denotes the P-elementary set containing alternative x.
Any set of objects being a union of some elementary sets is referred to as
crisp (precise) otherwise it is considered to be rough (imprecise, vague).
Consequently, each rough set has a boundary-line consisting of cases (ob-
jects) which cannot be classified with certainty as members of the set or of
its complement. Therefore, a pair of crisp sets, called the lower and the upper
approximation can represent a rough set. The lower approximation consists
of all objects that certainly belong to the set and the upper approximation
contains objects that possibly belong to the set. The difference between the
upper and the lower approximation defines the doubtful region, which in-
cludes all objects that cannot be certainly classified into the set. On the basis
of the lower and upper approximations of a rough set, the accuracy of its ap-
proximation can be calculated as the ratio of the cardinality of its lower ap-
proximation to the cardinality of its upper approximation.
Assuming that P ⊆ Q and Y ⊆ U, then the P-lower approximation, the P-upper approximation and the P-doubtful region of Y (denoted by P_*(Y), P*(Y) and Bn_P(Y) respectively), are formally defined as follows:

P_*(Y) = {x ∈ U : I_P(x) ⊆ Y}
P*(Y) = {x ∈ U : I_P(x) ∩ Y ≠ ∅}
Bn_P(Y) = P*(Y) − P_*(Y)
On the basis of these approximations it is possible to estimate the accu-


racy of the approximation of the rough set Y, denoted by α_P(Y). The accuracy of the approximation is defined as the ratio of the number of alternatives belonging into the lower approximation to the number of alternatives of the upper approximation:

α_P(Y) = |P_*(Y)| / |P*(Y)|
Within the context of a classification problem, each group C_k is considered as a rough set Y_k. The overall quality of the approximation of the classification C = {Y_1, Y_2, …, Y_q} by a set of attributes P is defined as follows:

γ_P(C) = (|P_*(Y_1)| + |P_*(Y_2)| + … + |P_*(Y_q)|) / |U|
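The following sketch illustrates these definitions on a small, deliberately inconsistent information table; the attribute names and group assignments are hypothetical. Alternatives are identified by their row index, and the lower/upper approximations, the accuracy and the quality of approximation are computed exactly as defined above.

def elementary_sets(table, P):
    # group the alternatives that are indiscernible with respect to the attributes in P
    classes = {}
    for i, row in enumerate(table):
        classes.setdefault(tuple(row[a] for a in P), set()).add(i)
    return list(classes.values())

def approximations(table, P, Y):
    # P-lower and P-upper approximations of a set of alternatives Y
    lower, upper = set(), set()
    for E in elementary_sets(table, P):
        if E <= Y:
            lower |= E
        if E & Y:
            upper |= E
    return lower, upper

def accuracy(table, P, Y):
    lower, upper = approximations(table, P, Y)
    return len(lower) / len(upper)

def quality(table, P, groups):
    # share of alternatives belonging to the lower approximation of some group
    covered = set()
    for Y in groups.values():
        covered |= approximations(table, P, Y)[0]
    return len(covered) / len(table)

# alternatives 2 and 3 are indiscernible but belong to different groups
table = [{"size": "large", "risk": "low"},  {"size": "large", "risk": "low"},
         {"size": "small", "risk": "high"}, {"size": "small", "risk": "high"}]
groups = {"C1": {0, 1, 2}, "C2": {3}}
P = ["size", "risk"]
print(approximations(table, P, groups["C1"]))   # ({0, 1}, {0, 1, 2, 3})
print(accuracy(table, P, groups["C1"]))         # 0.5
print(quality(table, P, groups))                # 0.5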
Having defined the quality of the approximation, the first major capabil-
ity that the rough set theory provides is to reduce the available information,
so as to retain only the information that is absolutely necessary for the de-
scription and classification of the alternatives. This is achieved by discover-
ing subsets R of the complete set of attributes P, which can provide the same
quality of classification as the whole attributes’ set, i.e. γ_R(C) = γ_P(C). Such subsets of attributes are called reducts and are denoted by RED(C). Generally, the reducts are more than one. In such a case the intersection of all reducts is called the core, i.e. CORE(C) = ∩RED(C). The core is the collec-
tion of the most relevant attributes, which cannot be excluded from the
analysis without reducing the quality of the obtained description (classifica-
tion). The decision maker can examine all obtained reducts and proceed to
the further analysis of the considered problem according to the reduct that
best describes reality. Heuristic procedures can also be used to identify an
appropriate reduct (Slowinski and Zopounidis, 1995).
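A brute-force sketch of reduct and core computation is given below; it simply enumerates all subsets of attributes and keeps the minimal ones preserving the quality of approximation of the full attribute set. The toy information table is hypothetical, and exhaustive enumeration is only workable for very small attribute sets, which is why heuristic procedures are used in practice.

from itertools import combinations

def quality(table, P, groups):
    # quality of approximation: share of alternatives in the lower approximation of some group
    classes = {}
    for i, row in enumerate(table):
        classes.setdefault(tuple(row[a] for a in P), set()).add(i)
    lower = set()
    for E in classes.values():
        for Y in groups.values():
            if E <= Y:
                lower |= E
    return len(lower) / len(table)

def reducts(table, attributes, groups):
    # minimal subsets of attributes that preserve the quality of the full attribute set
    full = quality(table, attributes, groups)
    candidates = [set(c) for r in range(1, len(attributes) + 1)
                  for c in combinations(attributes, r)
                  if quality(table, c, groups) == full]
    return [c for c in candidates if not any(other < c for other in candidates)]

table = [{"size": "large", "risk": "low",  "region": "north"},
         {"size": "large", "risk": "high", "region": "north"},
         {"size": "small", "risk": "high", "region": "south"}]
groups = {"C1": {0}, "C2": {1, 2}}
reds = reducts(table, ["size", "risk", "region"], groups)
core = set.intersection(*reds) if reds else set()
print(reds, core)   # [{'risk'}] {'risk'}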
The subsequent steps of the analysis involve the development of a set of
rules for the classification of the alternatives into the groups where they ac-
tually belong. The rules developed through the rough set approach have the
following form:

IF conjunction of elementary conditions


THEN disjunction of elementary decisions

The procedures used to construct a set of decision rules employ the ma-
chine learning paradigm. Such procedures developed within the context of
the rough set theory have been presented by Grzymala-Busse (1992), Slow-
inski and Stefanowski (1992), Skowron (1993), Ziarko et al. (1993), Ste-
fanowski and Vanderpooten (1994), Mienko et al. (1996), Grzymala-Busse


and Stefanowski (2001). Generally, the rule induction techniques follow one
of the following strategies:
1. Development of a minimal set of rules covering all alternatives of the
training sample (information table).
2. Development of an extensive set of rules consisting of all possible deci-
sion rules.
3. Development of a set of strong rules, even partly discriminant6, which do
not necessarily cover all alternatives of the training sample.
The first rule induction approaches developed within the rough set theory
assumed that the attributes’ domain was a set of discrete values; otherwise a
discretization was required. The most well-known approach within this cate-
gory of rule induction techniques is the LEM2 algorithm (Grzymala-Busse,
1992). This algorithm leads to the development of a minimal set of rules
(i.e., rules which are complete and non-redundant)7. The elementary condi-
tions of decision rules developed through the LEM2 algorithm have an
equality form g_j(x) = v, where g_j is a condition attribute and v ∈ V_j.
Recently new rule induction techniques have been developed that do not
require the discretization of quantitative condition attributes. The condition
part of the rules induced through these techniques has an inequality form g_j(x) ≥ v or g_j(x) ≤ v (strict inequalities are also possible). Typical examples of
such techniques are the DOMLEM algorithm (Greco et al., 1999a) and the
MODLEM algorithm (Grzymala-Busse and Stefanowski, 2001). Both these
algorithms are based on the philosophy of the LEM2 algorithm. The
DOMLEM algorithm leads to the development of rules that have the following form:

IF g_{j1}(x) ≥ v_1 AND g_{j2}(x) ≥ v_2 AND … THEN x ∈ C_k^≥
IF g_{j1}(x) ≤ v_1 AND g_{j2}(x) ≤ v_2 AND … THEN x ∈ C_k^≤
6
Rules covering only alternatives that belong to the group indicated by the conclusion of the
rule (positive examples) are called discriminant rules. On the contrary, rules that cover
both positive and negative examples (alternatives not belonging into the group indicated by
the rule) are called partly discriminant rules. Each partly discriminant rule is associated
with a coefficient measuring the consistency of the rule. This coefficient is called level of
discrimination and is defined as the ratio of positive to negative examples covered by the
rule.
7
Completeness refers to a set of rules that cover all alternatives of the training sample. A set
of rules is called non-redundant if the elimination of any single rule from the initial rule set
leads to a new set of rules that does not have the completeness property.
where C_k^≥ and C_k^≤ denote the sets of alternatives belonging into the sets of groups {C_1, C_2, …, C_k} and {C_k, C_{k+1}, …, C_q} respectively. In this context it is assumed that the groups are defined in an ordinal way, such that C_1 is the group of the most preferred alternatives and C_q is the group of the least preferred ones.
The rules developed through the MODLEM algorithm have a similar
form to the ones developed by the DOMLEM algorithm. There are two dif-
ferences, however:
1. Each elementary condition has the form g_j(x) ≥ v or g_j(x) < v.
2. The conclusion part of the rules indicates a specific classification of the alternatives rather than a set of groups.
Irrespective of the rule induction approach employed, a decision rule de-
veloped on the basis of the rough set approach has some interesting proper-
ties and features. In particular, if all alternatives that satisfy the condition
part belong into the group indicated by the conclusion of the rule, then the
rule is called consistent. In the case where the conclusion part considers only a single group, then the rule is called exact, otherwise the rule is called approximate. The conclusion part of approximate rules involves a disjunction of at least two groups (e.g., C_1 OR C_2). Approximate rules are devel-
oped when the training sample (information table) includes indiscernible
alternatives belonging into different groups. Each rule is associated with a
strength measure, indicating the number of alternatives covered by the rule.
For approximate rules their strength is estimated for each individual group
considered in their conclusion part. Stronger rules consider a limited number
of elementary conditions; thus, they are more general.
Once the rule induction process is completed, the developed rules can be
easily used to decide upon the classification of any new alternative not con-
sidered during model development. This is performed by matching the con-
ditions part of each rule to the characteristics of the alternative, in order to
identify a rule that covers the alternative. This matching process may lead to
one of the following four situations (Slowinski and Stefanowski, 1994):
1. The alternative is covered only by one exact rule.
2. The alternative is covered by more than one exact rule, all indicating the same classification.
3. The alternative is covered by one approximate rule or by more than one exact rule indicating different classifications.
4. The alternative is not covered by any rule.
The classification decision in situations (1) and (2) is straightforward. In
situation (3) the developed rule set leads to conflicting decisions regarding
the classification of the alternative. To overcome this problem, one can con-
sider the strength of the rules that cover the alternative (for approximate rules the strength for each individual group of the conclusion part must be consid-
ered). The stronger rule can be used to take the final classification decision.
This approach is employed in the LERS classification system developed by
Grzymala-Busse (1992).
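A much simplified sketch of this strength-based matching is shown below. It handles only exact rules with equality conditions, and the rules, attributes and strength values are illustrative rather than the output of an actual rule induction algorithm.

def covers(rule, alternative):
    # an alternative is covered by a rule if it satisfies every elementary condition
    return all(alternative.get(attr) == value for attr, value in rule["conditions"].items())

def classify(alternative, rules):
    covering = [r for r in rules if covers(r, alternative)]
    if not covering:
        return None                                  # situation (4): no rule covers the alternative
    groups = {r["group"] for r in covering}
    if len(groups) == 1:
        return groups.pop()                          # situations (1)-(2): a unique recommendation
    return max(covering, key=lambda r: r["strength"])["group"]   # situation (3): strongest rule wins

rules = [{"conditions": {"size": "large"}, "group": "C1", "strength": 12},
         {"conditions": {"risk": "high"},  "group": "C2", "strength": 5}]
print(classify({"size": "large", "risk": "high"}, rules))   # conflict resolved in favour of C1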
Situation (4) is the most difficult one, since using the developed rule set
one has no evidence as to the classification of the alternative. The LERS sys-
tem tackles this problem through the identification of rules that partly cover
the characteristics of the alternative under consideration8. The strength of
these rules as well as the number of elementary conditions satisfied by the
alternative are considered in making the decision. This approach will be dis-
cussed in more detail in Chapter 5. An alternative approach proposed by
Slowinski (1993), involves the identification of a rule that best matches the
characteristics of the alternative under consideration. This is based on the
construction of a valued closeness relation measuring the similarity between
each rule and the alternative. The construction of this relation is performed
in two stages. The first stage involves the identification of the attributes that
are in accordance with the affirmation “the alternative is close to rule r”. The
strength of this affirmation is measured on a numerical scale between 0 and
1. The second stage involves the identification of the characteristics that are
in discordance with the above affirmation. The strengths of the concordance and discordance tests are combined to estimate an overall index representing the
similarity of a rule to the characteristics of the alternative.
Closing this brief discussion of the rough set approach, it is important to
note the recent advances made in this field towards the use of the rough set
approach as a methodology of preference modeling in multicriteria decision
problems (Greco et al., 1999a, 2000a). The main novelty of the recently de-
veloped rough set approach concerns the possibility of handling criteria, i.e.
attributes with preference ordered domains, and preference ordered groups in
the analysis of sorting examples and the induction of decision rules. The
rough approximations of decision groups involve a dominance relation, instead of the indiscernibility relation considered in the basic rough set approach. They are built from the reference alternatives given in the sorting example (training sample). Decision rules derived from these approximations constitute a
preference model. Each “if ... then ...” decision rule is composed of: (a) a
condition part specifying a partial profile on a subset of criteria to which an
alternative is compared using the dominance relation, and (b) a decision part
suggesting an assignment of the alternative to “at least” or “at most” a given
class9.

8
Partly covering involves the case where the alternative satisfies only some of the elemen-
tary conditions of a rule.
9
The DOMLEM algorithm discussed previously in this chapter is suitable for developing
such rules.
The decision rule preference model has also been considered in terms of
conjoint measurement (Greco et al., 2001). A representation theorem for
multicriteria sorting proved by Greco et al. states the equivalence of a simple cancellation property, a general discriminant (sorting) function and a specific outranking relation (cf. Chapter 3), on the one hand, and the decision rule model, on the other hand. It is also shown that the decision rule model result-
ing from the dominance-based rough set approach has an advantage over the
usual functional and relational models because it permits handling inconsis-
tent sorting examples. The inconsistency in sorting examples is not unusual
due to instability of preference, incomplete determination of criteria and
hesitation of the decision maker.
It is also worth noting that the dominance-based rough set approach is
able to deal with sorting problems involving both criteria and regular attrib-
utes whose domains are not preference ordered (Greco et al., 2002), and
missing values in the evaluation of reference alternatives (Greco et al.,
1999b; Greco et al., 2000b). It also handles ordinal criteria in a more general way than the Sugeno integral, as proved in Greco et al. (2001).
The above recent developments have attracted the interest of MCDA re-
searchers on the use of rough sets as an alternative preference modeling
framework to the ones traditionally used in MCDA (utility function, out-
ranking relation; cf. Chapter 3). Therefore, the new extended rough set the-
ory can be considered as a MCDA approach. Nevertheless, in this book the
traditional rough set theory based on the indiscernibility relation is consid-
ered as an example of rule-based classification techniques that employ the
machine learning framework. The traditional rough set theory cannot be con-
sidered as a MCDA approach since it is only applicable with attributes (in-
stead of criteria) and with nominal groups. This is the reason for the inclu-
sion of the rough sets in this chapter rather than the consideration of rough
sets in Chapter 3 that refers to MCDA classification techniques.
Chapter 3
Multicriteria decision aid classification techniques

1. INTRODUCTION TO MULTICRITERIA
DECISION AID

1.1 Objectives and general framework


Multicriteria decision aid (MCDA) is an advanced field of operations re-
search which has evolved rapidly over the past three decades both at the re-
search and practical level.
The development of the MCDA field has been motivated by the simple
finding that resolving complex real-world decision problems cannot be per-
formed on the basis of unidimensional approaches. However, when employ-
ing a more realistic approach considering all factors relevant to a decision
making situation, one is faced with the problem referring to the aggregation
of the existing multiple factors. The complexity of this problem often pro-
hibits decision makers from employing this attractive approach.
MCDA’s scope and objective is to support decision makers in tackling such situations. Of course, MCDA is not the only field involved
with the aggregation of multiple factors. All the approaches presented in the
previous chapter are also involved with the aggregation of multiple factors
for decision making purposes. The major distinctive feature of MCDA, how-
ever, is the decision support orientation (decision aid) rather than the simple
decision model development. In this respect, MCDA approaches are focused
on the model development aspects that are related to the modeling and repre-
sentation of the decision makers’ preferences, values and judgment policy.
This feature is of major importance within a decision making context,
bearing in mind that an actual decision maker is responsible for the imple-
mentation of the results of any decision analysis procedure. Therefore, de-
veloping decision models without considering the decision maker’s prefer-
ences and system of values, may be of limited practical usefulness. The deci-
sion maker is given a rather passive role in the decision analysis context. He
does not participate actively to the model development process and his role
is restricted to the implementation of the recommendation of the developed
model, whose features are often difficult to understand.
The methodological advances made in the MCDA field involve any form
of decision making problem (choice, ranking, classification/sorting and de-
scription problems). The subsequent sub-sections describe the main MCDA
methodological approaches and their implementation to address classifica-
tion problems.

1.2 Brief historical review


Even from the early years of mankind, decision making has been a multidi-
mensional process. Traditionally, this process has been based on empirical
approaches rather than on sound quantitative analysis techniques. Pareto
(1896) first set the basis for addressing decision problems in the presence of
multiple criteria. One of the most important results of Pareto’s research was
the introduction of the efficiency concept.
During the post-war period, Koopmans (1951) extended the concept of
efficiency through the introduction of the efficient set concept: Koopmans
defined the efficient set as the set of non-dominated alternatives. During the
1940s and the 1950s Von Neumann and Morgenstern (1944) introduced the
utility theory, one of the major methodological streams of modern MCDA
and decision science in general.
These pioneering works inspired several researchers during the 1960s.
Charnes and Cooper (1961) extended the traditional mathematical program-
ming theory through the introduction of goal programming. Fishburn (1965)
studied the extension of the utility theory in the multiple criteria case. These
were all studies from US operations researchers. By the end of the 1960s,
MCDA attracted the interest of European operations researchers too. Roy
(1968), one of the pioneers in this field, introduced the outranking relation
approach; he is considered as the founder of the “European” school of
MCDA.
During the next two decades (1970–1990) MCDA evolved both at the
theoretical and practical (real-world applications) levels. The advances made
in information technology and computer science contributed towards this


direction. This contribution extends towards two major directions: (1) the
use of advanced computing techniques that enable the implementation of
computationally intensive procedures, (2) the development of user-friendly
decision support systems implementing MCDA methodologies.

1.3 Basic concepts


The major goal of MCDA is to provide a set of criteria aggregation method-
ologies that enable the development of decision support models considering
the decision makers’ preferential system and judgment policy. Achieving
this goal requires the implementation of complex processes. Most com-
monly, these processes do not lead to optimal solutions-decisions, but to sat-
isfactory ones that are in accordance with the decision maker’s policy. Roy
(1985) introduced a general framework describing the decision aiding process
that underlies the operation of all MCDA methodologies (Figure 3.1).

The first level of the above process, involves the specification of a set A
of feasible alternative solutions to the problem at hand (alternatives). The
objective of the decision is also determined. The set A can be continuous or
discrete. In the former case it is specified through constraints imposed by the
decision maker or by the decision environment, thus forming a set of feasible


solutions, a concept that is well-known within the mathematical program-
ming framework. In the case where the set A is discrete, it is assumed that
the decision maker can list some alternatives which will be subject to evalua-
tion within the given decision making framework.
The determination of the objective of the decision specifies the way that
the set A should be considered to take the final decision. This involves the
selection of the decision problematic that is most suitable to the problem at
hand:
Choice of the best alternative.
Ranking of the alternatives from the best to the worst.
Classification/sorting of the alternatives into appropriate groups.
Description of the alternatives.
The second stage involves the identification of all factors related to the
decision. MCDA assumes that these factors have the form of criteria. A cri-
terion is a real function g measuring the performance of the alternatives on
each of their individual characteristics, defined such that:

g(x) > g(x′)  ⟹  alternative x is preferred to x′ (x ≻ x′)     (3.1)
g(x) = g(x′)  ⟹  alternative x is indifferent to x′ (x ∼ x′)     (3.2)
These properties define the main distinctive feature of the criterion con-
cept compared to the attribute concept often used in other disciplines such as
statistics, econometrics, artificial intelligence, etc. (cf. the previous chapter).
Both an attribute and a criterion assign a description (quantitative or qualita-
tive) to an alternative. In the case of a criterion, however, this description
entails some preferential information regarding the performance of an alter-
native compared to other alternatives.
The set of the criteria g = {g_1, g_2, …, g_n} identified at this second stage of
the decision aiding process, must form a consistent family of criteria. A con-
sistent family of criteria is a set of criteria having the following properties:
1. Monotonicity: every criterion must satisfy the conditions described by
relations (3.1) and (3.2). Some criteria often satisfy (3.1) in the opposite
way: g(x) > g(x′) ⟹ x′ ≻ x. In this case the criterion g is referred to as cri-
terion of decreasing preference (lower values indicate higher prefer-
ence). Henceforth, this book will not make any distinction between cri-
teria of increasing or decreasing preference (any decreasing preference
criterion can be transformed to an increasing preference criterion
through sign reversal). A specified criteria set is considered to satisfy
the monotonicity property if and only if: for every pair of alternatives x and x′ for which there exists a criterion g_j such that g_j(x) > g_j(x′) and g_i(x) = g_i(x′) for every i ≠ j, it is concluded that x is preferred to x′.
2. Completeness: a set of criteria is complete if and only if for every pair
of alternatives x and x′ such that g_i(x) = g_i(x′) for every criterion g_i, it is concluded that x is indifferent to x′. If this condition does not hold, then it
is considered that the chosen criteria set does not provide enough in-
formation for a proper evaluation of the alternatives in A.
3. Non-redundancy: if the elimination of any single criterion from a crite-
ria set that satisfies the monotonicity and completeness conditions leads
to the formation of a new criteria set that does not meet these condi-
tions, then the initial set of criteria is considered to be non-redundant
(i.e., it provides only the absolutely necessary information for the
evaluation of the alternatives).
Once a consistent family of criteria has been specified, the next step of
the analysis is to proceed with the specification of the criteria aggregation
model that meets the requirements of the objective/nature of the problem
(i.e., choice, ranking, classification/sorting, description).
Finally, in the fourth stage of the analysis the decision maker is provided
with the necessary support required to understand the recommendations of
the model. Providing meaningful support is a crucial issue for the successful
implementation of the results of the analysis and the justification of the ac-
tual decision taken on the basis of the model’s recommendations.

2. METHODOLOGICAL APPROACHES
As already noted, MCDA provides a plethora of methodologies for address-
ing decision making problems. The existing differences between these meth-
odologies involve both the form of the models that are developed as well as
the model development process. In this respect, MCDA researchers have
defined several categorizations of the existing methodologies in this field.
Roy (1985) identified three major methodological streams considering the
features of the developed models:
1. Unique synthesis criterion approaches.
2. Outranking synthesis approaches.
3. Interactive local judgment approaches.
Pardalos et al. (1995) suggested an alternative scheme considering both


the features of the developed models as well as the features of model devel-
opment process1:
1. Multiobjective mathematical programming.
2. Multiattribute utility theory.
3. Outranking relations.
4. Preference disaggregation analysis.
Figure 3.2 illustrates how these four main MCDA methodological
streams contribute to the analysis of decision making problems, both discrete
and continuous. In this figure the solid lines indicate a direct contribution
and dashed lines an indirect one. In particular, multiattribute utility theory,
outranking relations and preference disaggregation analysis are traditionally
used in discrete problems. All these three approaches lead to the develop-
ment of a decision model that enables the decision maker to evaluate the per-
formance of a discrete set of alternatives for choice, ranking or classification
purposes. On the other hand, multiobjective mathematical programming is
most suitable for continuous problems.

As indicated, however, except for the easily identifiable direct contribu-


tion (solid lines) of each MCDA stream in addressing specific forms of deci-
sion making problems, it is also possible to identify an indirect contribution
(dashed lines). In particular, multiattribute utility theory, outranking rela-

1
Henceforth, all subsequent discussion made in this book adopts the approach presented by
Pardalos et al. (1995).
tions and preference disaggregation analysis can also be used within the con-
text of continuous decision problems. In this case, they provide the necessary
means to model the decision maker’s preferential system in a functional or
relational model, which can be used in a second stage in an optimization
context (multiobjective mathematical programming). A well-known example
where this framework is highly applicable is the portfolio construction prob-
lem, i.e. the construction of a portfolio of securities that maximizes the in-
vestor’s utility. In this case, the multiattribute utility theory or the preference
disaggregation analysis can be used to estimate an appropriate utility func-
tion representing the investors’ decision making policy. Similarly, the mul-
tiobjective mathematical programming framework can be used in combina-
tion with the other MCDA approaches to address discrete problems. Within
this context, multiobjective mathematical programming techniques are
commonly used for model development purposes. This approach is em-
ployed within the preference disaggregation analysis framework, discussed
later on in this chapter (cf. sub-section 2.4).
The following sub-sections outline the main concepts and features of
each of the aforementioned MCDA approaches. This discussion provides the
basis for reviewing the use of MCDA for classification purposes.

2.1 Multiobjective mathematical programming


Multiobjective mathematical programming (MMP) is an extension of the
traditional mathematical programming theory in the case where multiple ob-
jective functions need to be optimized. The general formulation of a MMP
problem is as follows:

max [f_1(x), f_2(x), …, f_K(x)]
subject to: x ∈ B

where:
x is the vector of the decision variables,
f_1, f_2, …, f_K are the objective functions (linear or non-linear) to be optimized,
B is the set of feasible solutions.
In contrast to the traditional mathematical programming theory, within
the MMP framework the concept of optimal solution is no longer applicable.
This is because the objective functions are of conflicting nature (the opposite
is rarely the case). Therefore, it is not possible to find a solution that opti-
mizes simultaneously all the objective functions. In this regard, within the
MMP framework the major point of interest is to search for an appropriate


“compromise” solution.
In searching for such a solution one does not need to consider the whole
set of feasible solutions; only a part of the feasible set needs to be consid-
ered. This part is called efficient set. The efficient set consists of solutions
which are not dominated by any other solution on the pre-specified objec-
tives. Such solutions are referred to as efficient solutions, non-dominated
solutions or Pareto optimal solutions. In the graphical example illustrated in
Figure 3.3 the efficient set is indicated by the bold line between the points A
and E. Any other feasible solution is not efficient. For instance, solution Z is
not efficient because the feasible solutions C and D dominate Z on the basis of the two objectives f_1 and f_2.
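For a discrete set of alternatives or solutions evaluated on two maximized objectives, the efficient (non-dominated) set can be obtained with a simple dominance check, as in the following sketch; the objective vectors are illustrative.

def dominates(a, b):
    # a dominates b if it is at least as good on every objective and strictly better on at least one
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def efficient_set(solutions):
    # keep the solutions that are not dominated by any other solution
    return [s for s in solutions if not any(dominates(t, s) for t in solutions if t is not s)]

# objective vectors (f1, f2) of five feasible solutions
solutions = [(8, 2), (6, 5), (3, 7), (5, 5), (2, 3)]
print(efficient_set(solutions))   # (5, 5) and (2, 3) are dominated and drop out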

In solving MMP problems, the optimization of a linear weighted aggrega-


tion of the objectives is not appropriate. As indicated in Figure 3.3, if the feasible set is not convex, then using an aggregation of the form w_1 f_1 + w_2 f_2 may lead to the identification of a limited part of the existing
efficient solutions. In the example of Figure 3.3 only solutions B and D will
be identified.
Therefore, any MMP solution methodology should accommodate the
need for searching the whole efficient set. This is performed through interac-
tive and iterative procedures. In the first stage of such procedures an initial
efficient solution is obtained and it is presented to the decision maker. If this
solution is considered acceptable by the decision maker (i.e., if it satisfies his


expectations on the given objectives), then the solution procedure stops. If
this is not the case, then the decision maker is asked to provide information
regarding his preferences on the pre-specified objectives. This information
involves the objectives that need to be improved as well as the trade-offs that
he is willing to undertake to achieve these improvements. The objective of
defining such information is to specify a new search direction for the devel-
opment of new improved solutions. This process is repeated until a solution
is obtained that is in accordance with the decision maker’s preferences, or
until no further improvement of the current solution is possible.
In the international literature several methodologies have been proposed
that operate within the above general framework for addressing MMP prob-
lems. Some well-known examples are the methods developed by Benayoun
et al. (1971), Zionts and Wallenius (1976), Wierzbicki (1980), Steuer and
Choo (1983), Korhonen (1988), Korhonen and Wallenius (1988), Siskos and
Despotis (1989), Lotfi et al. (1992).
An alternative approach to address constrained optimization problems in
the presence of multiple objectives, is the goal programming (GP) approach,
founded by Charnes and Cooper (1961). The concept of goal is different
from that of objective. An objective simply defines a search direction (e.g.,
profit maximization). On the other hand, a goal defines a target against
which the attained solutions are compared (Keeney and Raiffa, 1993). In this
regard, GP optimizes the deviations from the pre-specified targets, rather
than the performance of the solutions. The general form of a GP model is the
following:

Max/Min g(d_1^−, d_1^+, …, d_K^−, d_K^+)
subject to: f_i(x) + d_i^− − d_i^+ = t_i,   i = 1, 2, …, K
            x ∈ B,   d_i^−, d_i^+ ≥ 0

where:
f_i is goal i, defined as a function (linear or non-linear) of the decision variables,
t_i is the target value for goal i,
d_i^− and d_i^+ are the deviations from the target value t_i, representing the under-achievement and over-achievement of the goal respectively.
g is a function (usually linear) of the deviational variables.


The above general formulation shows that actually an objective function
of an MMP formulation is transformed into a constraint within the context of
a GP formulation. The right hand side of these constraints includes the target
values of the goals, which can be defined either as some satisfactory values
of the goals or as their optimal values.
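The sketch below solves a small GP model with scipy's linear programming routine. The two goals (with targets 12 and 10), the resource constraint and the objective that minimizes the total under-achievement are all hypothetical illustrations of the general formulation given above.

from scipy.optimize import linprog

# variables: x1, x2 and the deviational variables d1-, d1+, d2-, d2+
# goal 1: 3*x1 + 2*x2 should reach the target 12
# goal 2:   x1 + 2*x2 should reach the target 10
# hard constraint: x1 + x2 <= 5 (available resources)
c = [0, 0, 1, 0, 1, 0]                    # minimize the under-achievements d1- + d2-
A_eq = [[3, 2, 1, -1, 0, 0],              # 3*x1 + 2*x2 + d1- - d1+ = 12
        [1, 2, 0, 0, 1, -1]]              #   x1 + 2*x2 + d2- - d2+ = 10
b_eq = [12, 10]
A_ub = [[1, 1, 0, 0, 0, 0]]
b_ub = [5]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)   # all variables >= 0 by default
print(res.fun, res.x)                     # minimum total under-achievement and a compromise solution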
The simplicity of GP formulations has been the main reason for their
wide popularity among researchers and practitioners. Spronk (1981) pro-
vides an extensive discussion of GP as well as its applications in the field of
financial planning.

2.2 Multiattribute utility theory


Multiattribute utility theory (MAUT) extends the traditional utility theory to
the multidimensional case. Even from the early stages of the MCDA field,
MAUT has been one of the cornerstones of the development of MCDA and
its practical implementation. Directly or indirectly all other MCDA ap-
proaches employ the concepts introduced by MAUT. For instance, the un-
derlying philosophy of MMP and GP is to identify an efficient solution that
maximizes the decision maker’s utility. Obviously, this requires the devel-
opment of a utility function representing the decision maker’s system of
preferences. Some MMP methodologies employ this philosophy; they de-
velop a utility function and then, maximize it over the feasible set to identify
the most suitable solution. The methodology presented by Siskos and Despo-
tis (1989) implemented in the ADELAIS system (Aide à la DEcision pour
systèmes Linéaires multicritères par AIde à la Structuration des préférences)
is a typical example of this approach.
The objective of MAUT is to model and represent the decision maker’s
preferential system into a utility/value function U(g), where g is the vector of
the evaluation criteria (g_1, g_2, …, g_n). Generally, the utility function is a non-linear function defined on the criteria space, such that:

U(g(x)) > U(g(x′))  ⟺  x ≻ x′  (alternative x is preferred to x′)
U(g(x)) = U(g(x′))  ⟺  x ∼ x′  (alternative x is indifferent to x′)

The most commonly used form of utility function is the additive one:

U(g) = p_1 u_1(g_1) + p_2 u_2(g_2) + … + p_n u_n(g_n)

where:
u_1, u_2, …, u_n are the marginal utility functions corresponding to the evaluation criteria. Each marginal utility function u_i(g_i) defines the utility/value of the alternatives for each individual criterion g_i.
p_1, p_2, …, p_n are constants representing the trade-off that the decision maker is willing to take on a criterion in order to gain one unit on criterion g_i. These constants are often considered to represent the weights of the criteria and they are defined such that they sum up to one: p_1 + p_2 + … + p_n = 1.
The form of the additive utility function is quite similar to simple


weighted average aggregation models. Actually, such models are a special
form of an additive utility function, where all marginal utilities are defined
as linear functions on the criteria’s values.
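The sketch below evaluates such an additive utility function with piece-wise linear marginal utilities; the criteria, the breakpoints of the marginal utilities and the weights are hypothetical, with the weights summing up to one as required.

import numpy as np

def marginal_utility(value, breakpoints, utilities):
    # piece-wise linear marginal utility defined on the scale of a criterion
    return float(np.interp(value, breakpoints, utilities))

def global_utility(alternative, criteria, weights):
    # additive aggregation of the weighted marginal utilities
    return sum(w * marginal_utility(alternative[name], *criteria[name])
               for name, w in weights.items())

criteria = {"profitability": ([0.00, 0.05, 0.20], [0.0, 0.6, 1.0]),
            "liquidity":     ([0.50, 1.00, 2.00], [0.0, 0.7, 1.0])}
weights = {"profitability": 0.6, "liquidity": 0.4}
alt_a = {"profitability": 0.12, "liquidity": 1.4}
alt_b = {"profitability": 0.04, "liquidity": 1.8}
print(global_utility(alt_a, criteria, weights) > global_utility(alt_b, criteria, weights))   # a preferred to b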
The main assumption underlying the use of the additive utility function
involves the mutual preferential independence condition of the evaluation
criteria. To define the mutual preferential independence condition, the con-
cept of preferential independence must first be introduced. A subset G of the evaluation criteria is considered to be preferentially independent from the remaining criteria, if and only if the decision maker's preferences on the alternatives that differ only with respect to the criteria in G do not depend on the remaining criteria. Given this definition, the set of criteria g is considered to be mutually preferentially independent if and only if every subset G of criteria is preferentially independent from the remaining criteria (Fishburn, 1970; Keeney and Raiffa, 1993).
A detailed description of the methodological framework underlying
MAUT and its applications is presented in the book of Keeney and Raiffa
(1993).
Generally, the process for developing an additive utility function is based
on the cooperation between the decision analyst and the decision maker.
This process involves the specification of the criteria trade-offs and the form
of the marginal utility functions. The specification of these parameters is
performed through interactive procedures, such as the midpoint value tech-
nique proposed by Keeney and Raiffa (1993). The realization of such inter-
active procedures is often facilitated by the use of multicriteria decision sup-
port systems, such as the MACBETH system developed by Bana e Costa and
Vansnick (1994).
The global utility of the alternatives estimated on the basis of the devel-
oped utility function constitutes an index used for choice, ranking or classifi-
cation/sorting purposes.
2.3 Outranking relation theory


The foundations of the outranking relation theory (ORT) have been set by
Bernard Roy during the late 1960s through the development of the
ELECTRE family of methods (ELimination Et Choix Traduisant la
REalité; Roy, 1968). Since then, ORT has been widely used by MCDA re-
searchers, mainly in Europe.
All ORT techniques operate in two major stages. The first stage involves
the development of an outranking relation, whereas the second stage in-
volves the exploitation of the outranking relation in order to perform the
evaluation of the alternatives for choice, ranking, classification/sorting pur-
poses.
The concept of the outranking relation is common to both these stages.
An outranking relation is defined as a binary relation used to estimate the
strength of the preference for an alternative x over an alternative x′. This strength is defined on the basis of: (1) the existing indications supporting the preference of x over x′ (concordance of criteria), (2) the existing indications against the preference of x over x′ (discordance of criteria).
Generally, an outranking relation is a mechanism for modeling and repre-
senting the decision maker’s preferences based on an approach that differs
from the MAUT framework on two major issues:
1. The outranking relation is not transitive: In MAUT the evaluations ob-
tained through the development of a utility function are transitive. As-
suming three alternatives x, x′ and x″, the transitivity property is formally expressed as follows:

x ≻ x′ and x′ ≻ x″  ⟹  x ≻ x″
x ∼ x′ and x′ ∼ x″  ⟹  x ∼ x″
In contrast to MAUT, ORT enables the modeling and representation


of situations where the transitivity does not hold. A well-known example
is the one presented by Luce (1956) (see also Roy and Vincke, 1981):
obviously no one can tell the difference between a cup of coffee containing x grams of sugar and a cup of coffee with x + ε grams of sugar (for a sufficiently small quantity ε); therefore there is an indifference relation between these two situations. Similarly, there is indifference between x + ε grams of sugar and x + 2ε grams of sugar. If the indifference relation is transitive, then x and x + 2ε grams of sugar should be considered as indifferent. Following the same line of inference, it can be deduced that there is no difference between a cup of coffee containing x grams of sugar and a cup of coffee that is full of sugar, irrespective of x. Obviously, this is an incorrect conclusion, indicating that
there are cases where transitivity is not valid.
2. The outranking relation is not complete: In the MAUT framework only
the preference and indifference relations are considered. In addition to
these two relations, ORT introduces the incomparability relation. In-
comparability arises in cases where the considered alternatives have ma-
jor differences with respect to their characteristics (performance on the
evaluation criteria) such that their comparison is difficult to perform.
Despite the above two major differences, both MAUT and ORT use simi-
lar model development techniques, involving the direct interrogation of the
decision maker. Within the ORT context, the decision maker specifies sev-
eral structural parameters of the developed outranking relation. In most ORT
techniques these parameters involve:
1. The significance of the evaluation criteria.
2. Preference, indifference and veto thresholds. These thresholds define a
fuzzy outranking relation such as the one presented in Figure 3.4. Fur-
thermore, the introduction of the veto threshold facilitates the develop-
ment of non-compensatory models (models in which the significantly
low performance of an alternative in an evaluation criterion is not com-
pensated by the performance of the alternatives on the remaining crite-
ria).
The combination of the above information enables the decision-analyst to
measure the strength of the indications supporting the affirmation “alterna-
tive x is at least as good as alternative x′” as well as the strength of the in-
dications against this affirmation.
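A deliberately simplified sketch of the concordance and veto tests is given below; the criteria, weights and thresholds are hypothetical and the indices are illustrative of the general idea rather than the exact formulas of a specific ELECTRE or PROMETHEE method.

def concordance(x, y, weights, indifference):
    # share of the criteria weights supporting "x is at least as good as y"
    support = sum(w for j, w in weights.items() if x[j] >= y[j] - indifference[j])
    return support / sum(weights.values())

def vetoed(x, y, veto):
    # a criterion on which y is better than x by more than the veto threshold blocks the outranking
    return any(y[j] - x[j] >= veto[j] for j in veto)

weights      = {"profitability": 0.5, "liquidity": 0.3, "management": 0.2}
indifference = {"profitability": 0.01, "liquidity": 0.10, "management": 0.5}
veto         = {"profitability": 0.15, "liquidity": 1.50, "management": 4.0}
x = {"profitability": 0.12, "liquidity": 1.2, "management": 3.0}
y = {"profitability": 0.10, "liquidity": 1.6, "management": 3.0}
print(concordance(x, y, weights, indifference), vetoed(x, y, veto))   # 0.7 False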
Once the development of the outranking relation is completed on the ba-


sis of the aforementioned information, the next stage is to employ the out-
ranking relation for decision making purposes (choice, ranking, classifica-
tion/sorting of the alternatives). During this stage heuristic procedures are
commonly employed to decide upon the evaluation of the alternatives on the
basis of developed outranking relation.
The most extensively used ORT techniques are the ELECTRE methods
(Roy, 1991), as well as the PROMETHEE methods (Brans and Vincke,
1985). These two families of methods include different variants that are suit-
able for addressing choice, ranking and classification/sorting problems. Sec-
tions 3.1.2 and 3.1.3 of the present chapter discuss in more detail the applica-
tion of the ORT framework in classification problems.

2.4 Preference disaggregation analysis


From the proceeding discussion on MAUT and ORT, it is clear that both
these approaches are devoted to the modeling and representation of the deci-
sion maker’s preferential system in a pre-specified mathematical model
(functional or relational).
On the other hand, the focus in preference disaggregation analysis (PDA)
is the development of a general methodological framework, which can be
used to analyze the actual decisions taken by the decision maker so that an
appropriate model can be constructed representing the decision maker’s sys-
tem of preferences, as consistently as possible.
Essentially, PDA employs an opposite decision aiding process compared
to MAUT and ORT (cf. Figure 3.5). In particular, both MAUT and ORT
support the decision maker in aggregating different evaluation criteria on the
basis of a pre-specified modeling form (utility function or outranking rela-
tion). This is a forward process performed on the basis of the direct interro-
gation of the decision maker. The decision maker specifies all the model pa-
rameters with the help of the decision analyst who is familiar with the meth-
odological approach that is employed.
On the contrary, PDA employs a backward process. PDA does not re-
quire the decision maker to provide specific information on how the deci-
sions are taken; it rather asks the decision maker to express his actual deci-
sions. Given these decisions PDA investigates the relationship between the
decision factors (evaluation criteria) and the actual decisions. This investiga-
tion enables the specification of a criteria aggregation model that can repro-
duce the decision maker’s decisions as consistently as possible.
PDA is founded on the principle that it is, generally, difficult to elicit
specific preferential information from decision makers. This difficulty is due
to time constraints and the unwillingness of the decision makers to partici-
pate actively in such an interactive elicitation/decision aiding process. In-
stead, it is much easier for decision makers to express their actual decisions,
without providing any other information on how these decisions are taken
(e.g., significance of criteria). PDA provides increased flexibility regarding
the way that these decisions can be expressed. Most commonly, they are ex-
pressed in an ordinal scale involving a ranking or a classification of the al-
ternatives. Alternatively a ratio scale can also be employed (Lam and Choo,
1995). More detailed information is also applicable. For instance, Cook and
Kress (1991) consider the ranking of the alternatives on each evaluation cri-
terion and the ranking of the evaluation criteria according to their signifi-
cance.

The objective of gathering such information is to form a set of examples
of decisions taken by the decision maker. These examples may involve:
1. Past decisions taken by the decision maker.
2. Decisions taken for a limited set of fictitious but realistic alternatives.
3. Decisions taken for a representative subset of the alternatives under con-
sideration, which are familiar to the decision maker and consequently he
can easily express an evaluation for them.
These decision examples incorporate all the preferential information re-
quired to develop a decision support model. PDA’s objective is to analyze
these examples in order to specify the parameters of the model as consis-
tently as possible with the judgment policy of the decision maker.
Henceforth, the set of examples used for model development purposes
within the context of PDA will be referred to as the reference set. The refer-
ence set is the equivalent of the training sample used in statistics, economet-
rics and artificial intelligence (see the discussion in the previous chapters).
Generally, the PDA paradigm is similar to the regression framework used
extensively in statistics, econometrics and artificial intelligence for model
development purposes (cf. Chapter 2). In fact, the foundations of PDA have
been set by operations researchers in an attempt to develop non-parametric
regression techniques using goal-programming formulations. The first stud-
ies on this issue were made during the 1950s by Karst (1958), Kelley (1958)
and Wagner (1959). During the 1970s Srinivasan and Shocker (1973) used
goal programming formulations for the development of ordinal regression
models. In the late 1970s and the beginning of the 1980s, Jacquet–Lagrèze
and Siskos (1978, 1982, 1983) introduced the PDA concept for decision-
aiding purposes through the development of the UTA method (UTilités Ad-
ditives). A comprehensive review of this methodological approach of
MCDA and the developments made over the past two decades is presented in
the recent paper of Jacquet–Lagrèze and Siskos (2001).
The first of the aforementioned studies on PDA employed simple linear
weighted average models:

$\hat{Y} = w_1 g_1 + w_2 g_2 + \cdots + w_n g_n$

The aim of these approaches was to estimate the scalars $w_i$ (criteria
weights), so that the model's estimations $\hat{Y}$ were as consistent as possible
with the observed Y (a small numerical sketch of this estimation idea is given
right after the following list). From a decision aiding point of view, however, the
use of weighted average models has two major disadvantages:
1. The weighted average model represents a risk-neutral behavior. Risk-prone
or risk-averse behaviors cannot be represented in such a model.
2. The consideration of qualitative criteria is cumbersome in weighted av-
erage models. In several practical decision making problems from the
fields of marketing, financial management, environmental management,
etc., the consideration of qualitative information is crucial. The introduc-
tion of qualitative criteria in weighted average models requires that
each level of their qualitative scale is assigned a numerical value. Such a
quantification, however, alters the nature of the qualitative information,
while furthermore, the selection of the quantitative scale is arbitrary.
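To make the estimation idea behind these early goal-programming formulations concrete, the following sketch (in Python, using NumPy and SciPy) estimates the weights of a simple weighted average model by minimizing the sum of absolute deviations between the model's estimations and the observed evaluations Y. The data, the variable names and the absolute-deviation objective are illustrative assumptions, not the exact formulations of the original studies.

    import numpy as np
    from scipy.optimize import linprog

    # Hypothetical reference data: m alternatives evaluated on n criteria,
    # together with an observed overall evaluation Y for each alternative.
    G = np.array([[0.6, 0.3, 0.8],
                  [0.2, 0.9, 0.4],
                  [0.7, 0.5, 0.5],
                  [0.1, 0.4, 0.9]])
    Y = np.array([0.55, 0.50, 0.60, 0.45])
    m, n = G.shape

    # Variables: [w_1..w_n, e_plus_1..e_plus_m, e_minus_1..e_minus_m]
    # Goal-programming idea: G w + e_plus - e_minus = Y, minimize the sum of deviations.
    c = np.concatenate([np.zeros(n), np.ones(2 * m)])
    A_eq = np.hstack([G, np.eye(m), -np.eye(m)])
    b_eq = Y
    bounds = [(0, None)] * (n + 2 * m)   # non-negative weights and deviations

    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    print("estimated weights:", res.x[:n])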
The tools provided by MAUT are quite useful in addressing these prob-
lems. Jacquet–Lagrèze and Siskos (1978, 1982) were the first to introduce
the use of utility functions within the context of PDA. In particular, the au-
thors used linear programming techniques to estimate an additive utility
function that can be used in ordinal regression decision making problems
(ranking). Of course, the use of additive utility functions in a decision mak-
ing context has been criticized, mainly with regard to the fact that the criteria
interactions are not considered in such an approach (Lynch, 1979; Oral and
Kettani, 1989). These interactions can be modeled and represented in a
multiplicative utility function, as proposed by Oral and Kettani (1989). Nev-
ertheless, the estimation of multiplicative utility functions within the frame-
work of PDA is a computationally intensive process involving the solution
of non-linear mathematical programming problems.

3. MCDA TECHNIQUES FOR CLASSIFICATION PROBLEMS
Having defined the context of MCDA and the main methodological ap-
proaches developed within this field, the subsequent analysis is focused on
the review of the most characteristic MCDA techniques proposed for ad-
dressing classification problems. The review is performed in two phases:
1. Initially, the techniques based on the direct interrogation of the decision
maker are discussed. Such techniques originate from the MAUT and
ORT approaches.
2. In the second stage, the review extends to the contribution of the PDA
paradigm in the development of classification models.

3.1 Techniques based on the direct interrogation of the decision maker
3.1.1 The AHP method

Saaty (1980) first proposed the AHP method (Analytic Hierarchy Process)
for addressing complex decision making problems involving multiple criteria.
The method is particularly well suited for problems where the evaluation
criteria can be organized in a hierarchical way into sub-criteria. During the
last two decades the method has become very popular, among operations
researchers and decision scientists, mainly in the USA. At the same time, how-
ever, it has been heavily criticized for some major theoretical shortcomings
involving its operation.
AHP models a decision making problem through a process involving four
stages:
Stage 1 : Hierarchical structuring of the problem.
Stage 2 : Data input.
Stage 3 : Estimation of the relative weights of the evaluation criteria.
Stage 4 : Combination of the relative weights to perform an overall evalua-
tion of the alternatives (aggregation of criteria).
In the first stage the decision maker defines a hierarchical structure repre-
senting the problem at hand. A general form of such a structure is presented
in Figure 3.6. The top level of the hierarchy considers the general objective
of the problem. The second level includes all the evaluation criteria. Each
criterion is analyzed in the subsequent levels into sub-criteria. Finally, the
last level of the hierarchy involves the objects to be evaluated. Within the
context of a classification problem the elements of the final level of the hier-
archy represent the choices (groups) available to the decision maker regard-
ing the classification of the alternatives. For instance, for a two-group classi-
fication problem the last level of the hierarchy will include two elements
corresponding to group 1 and group 2.
Once the hierarchy of the problem is defined, in the second stage of the
method the decision maker performs pairwise comparisons of all elements at
each level of the hierarchy. Each of these comparisons is performed on the
basis of the elements of the preceding level of the hierarchy. For instance,
considering the general hierarchy of Figure 3.6 at the first level, no compari-
sons are required (the first level involves only one element). In the second
level, all elements (evaluation criteria) are compared in a pairwise way on
the basis of the objective of the problem (first level of the hierarchy). Then,
the sub-criteria of the third level are compared each time from a different
point of view considering each criterion of the second level of the hierarchy.
For instance, the sub-criteria and are initially compared on the basis
of the criterion then on the basis of criterion etc. The same process is
continued until all elements of the hierarchy are compared.
The objective of all these comparisons is to assess the relative signifi-
cance of all elements of the hierarchy in making the final decision according
to the initial objective. The comparisons are performed using the 9-point
scale presented in Table 3.1.

The results of the comparisons made by the decision maker are used to
form an $n_k \times n_k$ matrix for each level $k$ of the hierarchy, where $n_k$ denotes the
number of elements in level $k$:

$A_k = [a_{ij}], \quad a_{ij} \approx w_i / w_j, \quad i, j = 1, 2, \ldots, n_k$

where $w_1, w_2, \ldots, w_{n_k}$ denote the actual weights assigned to each
element included at level $k$ of the hierarchy as opposed to a specific element
of the level k–1. Assuming that all comparisons are consistent, the weights
can be estimated through the solution of the following system of linear
equalities:

$A_k w = n_k w$

If $A_k$ is known, then this relation can be used to solve for $w$. The problem
of solving for a nonzero solution to this set of equations is known as the
eigenvalue problem:

$A_k w = \lambda_{\max} w$
where $A_k$ is the matrix formed by the comparisons made by the decision
maker, $\lambda_{\max}$ is the largest eigenvalue of $A_k$, and $w$ is the vector
of the estimates of the actual weights.
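As an illustration of this eigenvalue-based estimation, the short Python sketch below computes the weight vector as the normalized principal eigenvector of a hypothetical pairwise comparison matrix; the matrix entries are illustrative, and the consistency index shown is the usual $(\lambda_{\max} - n)/(n - 1)$ measure.

    import numpy as np

    # Hypothetical pairwise comparison matrix for three criteria (9-point scale).
    A = np.array([[1.0, 3.0, 5.0],
                  [1/3, 1.0, 2.0],
                  [1/5, 1/2, 1.0]])
    n = A.shape[0]

    # Principal eigenvector -> estimated weights.
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()

    lam_max = eigvals.real[k]
    CI = (lam_max - n) / (n - 1)   # consistency index of the comparisons
    print("weights:", np.round(w, 3), "lambda_max:", round(lam_max, 3), "CI:", round(CI, 3))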
The last stage of the AHP method involves the combination of the
weights defined in the previous stage, so that an overall evaluation of the
elements belonging in the final level of the hierarchy (level k) is performed
on the basis of the initial objective of the analysis (first level of the hierar-
chy). This combination is performed as follows:

$v = W_k W_{k-1} \cdots W_2$

where $v$ is a vector consisting of the global evaluations for the elements of
level $k$, and $W_j$ is a matrix of the weights of the elements in level $j$ as opposed
to the elements of level $j$–1 (since the first level contains a single element,
$W_2$ reduces to a vector).
For a classification problem the global evaluation for the elements in the
last level of the hierarchy are used to decide upon the classification of an
alternative. Since the elements of the last level correspond to the pre-
specified groups, an alternative is assigned to the group for which the
evaluation of the corresponding element is higher. Srinivasan and Kim
(1987) used AHP in the credit granting problem in order to classify a set of
firms into two groups: the firms that should be granted credit and the ones
that should be denied credit.
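The combination of the level weights and the use of the resulting global evaluations for a two-group classification can be sketched as follows (Python). The weight values are hypothetical and the two elements of the last level correspond to illustrative "accept" and "reject" groups.

    import numpy as np

    # Hypothetical weights of three criteria with respect to the overall objective.
    w_criteria = np.array([0.6, 0.3, 0.1])

    # Hypothetical weights of the two groups (accept, reject) with respect to each
    # criterion; column i refers to criterion i.
    W_groups = np.array([[0.8, 0.4, 0.3],    # "accept" group
                         [0.2, 0.6, 0.7]])   # "reject" group

    # Global evaluation of each group = combination of the weights across the levels.
    v = W_groups @ w_criteria
    groups = ["accept", "reject"]
    print("global evaluations:", np.round(v, 3), "->", groups[int(np.argmax(v))])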
Despite the extensive use of AHP for addressing a variety of decision
making problems (for a review of AHP applications see Zahedi, 1986; Var-
gas, 1990), the method has been heavily criticized by researchers. The focal
point in this criticism is the “rank reversal” problem. This problem was first
noted by Belton and Gear (1983), who found that when a new alternative
is added to an existing set of alternatives A such that it is identical to an
existing alternative then the evaluations on the new set of alternatives
are not consistent with evaluations on the initial set of alterna-
tives A. An example of this problem is given by Harker and Vargas (1990).
The authors considered a set of three alternatives evaluated along three
criteria, applied the AHP method and obtained a ranking of the three alternatives.
When they introduced in the analysis an additional alternative that had the
same description as one of the existing ones, it would be expected that the
relative ranking of the original alternatives would be preserved. However, the
new evaluation of the alternatives was not consistent with this expected result
(cf. Harker and Vargas, 1990). Several researchers have proposed
methodologies for addressing the rank reversal problem (Schoner and Wedley,
1989, 1993; Dyer, 1990); however, an overall solution to this limitation of the
method is still not available.

3.1.2 The ELECTRE TRI method

The family of ELECTRE methods initially introduced by Roy (1968), is
founded on the ORT concepts. The ELECTRE methods are the most exten-
sively used ORT techniques. The ELECTRE TRI method (Yu, 1992) is a
member of this family of methods, developed for addressing classification
problems. The ELECTRE TRI method is based on the framework of the
ELECTRE III method (Roy, 1991). The objective of the ELECTRE TRI
method is to assign a discrete set of alternatives into q
groups. Each alternative is considered as a vector
consisting of its performances on the set of
evaluation criteria g. The groups are defined in an ordinal way, such that
the first group includes the most preferred alternatives and the last group
includes the least preferred ones. A fictitious alternative is introduced as the
boundary between each pair of consecutive groups (Figure 3.7). Henceforth,
any such fictitious alternative will be referred to as a reference profile or sim-
ply profile. Essentially, each group is delimited by two profiles, one serving as
the lower bound of the group and one as the upper bound of the
group. Each profile is a vector consisting of partial profiles defined for
each criterion. Since the groups are defined in an ordinal
way, the partial profiles must satisfy a dominance condition ensuring that each
profile is at least as good as the next (less preferred) one on every criterion
(for all k=1, 2, …, q–1 and i=1, 2, …, n).
The classification of the alternatives into the pre-specified groups is per-
formed through a two stage process. The first stage involves the develop-
ment of an outranking relation used to decide on whether an alternative out-
ranks a profile or not. The second stage involves the exploitation of the de-
veloped outranking relation to decide upon the classification of the alterna-
tives.
The development of the outranking relation in the first stage of the proc-
ess is based on the comparison of the alternatives with the reference profiles.
These comparisons are performed for all pairs j=1, 2, …, m and k=1,
2, …, q–1. Generally, the comparison of an alternative with a profile is
accomplished in two stages, involving the concordance and the discordance
test respectively. The objective of the concordance test is to assess the
strength of the indications supporting the affirmation “the alternative is at
least as good as the profile”. The measure used to assess this strength is the
global concordance index. This index ranges between 0 and 1; the
closer it is to one, the higher is the strength of the above affirmation and vice
versa. The concordance index is estimated as the weighted average of partial
concordance indices defined for each criterion:
$C(a_j, r_k) = \dfrac{\sum_{i=1}^{n} w_i \, c_i(a_j, r_k)}{\sum_{i=1}^{n} w_i}$
where $w_i$ denotes the weight of criterion $g_i$ (the criteria weights are specified
by the decision maker), and $c_i(a_j, r_k)$ denotes the partial concordance index
defined for criterion $g_i$, with $a_j$ denoting an alternative and $r_k$ a reference
profile. Each partial concordance index measures the
strength of the affirmation “alternative $a_j$ is at least as good as profile $r_k$” on
the basis of criterion $g_i$. The estimation of the partial concordance index
requires the specification of two parameters: the preference threshold and the
indifference threshold. The preference threshold for criterion represents
the largest difference compatible with a preference in favor of on
criterion The indifference threshold for criterion represents the
smallest difference that preserves indifference between an alternative
and profile on criterion The values of these thresholds are specified
by the decision maker in cooperation with the decision analyst. On the basis
of these thresholds, the partial concordance index is estimated as follows
(Figure 3.8):
$c_i(a_j, r_k) = \begin{cases} 0, & \text{if } g_i(r_k) - g_i(a_j) \ge p_i \\ 1, & \text{if } g_i(r_k) - g_i(a_j) \le q_i \\ \dfrac{p_i - g_i(r_k) + g_i(a_j)}{p_i - q_i}, & \text{otherwise} \end{cases}$

where $g_i(\cdot)$ denotes the performance on criterion $g_i$, and $p_i$ and $q_i$ denote the
preference and indifference thresholds of criterion $g_i$.
The discordance index $d_i(a_j, r_k)$ measures the strength of the indica-
tions against the affirmation “alternative $a_j$ is at least as good as profile $r_k$” on
the basis of criterion $g_i$. The estimation of the discordance index requires
the specification of an additional parameter, the veto threshold $v_i$. Concep-
tually, the veto threshold $v_i$ represents the smallest difference between a
profile and the performance of an alternative on criterion $g_i$ above
which the criterion vetoes the outranking character of the alternative over the
profile, irrespective of the performance of the alternative on the remaining
criteria. The estimation of the discordance index is performed as follows
(Figure 3.9):

$d_i(a_j, r_k) = \begin{cases} 0, & \text{if } g_i(r_k) - g_i(a_j) \le p_i \\ 1, & \text{if } g_i(r_k) - g_i(a_j) \ge v_i \\ \dfrac{g_i(r_k) - g_i(a_j) - p_i}{v_i - p_i}, & \text{otherwise} \end{cases}$
Once the concordance and discordance indices are estimated as described
above, the next stage of the process is to combine the two indices so that an
overall estimation of the strength of the outranking degree of an alternative
over the profile can be obtained considering all the evaluation criteria.
This stage involves the estimation of the credibility index $\sigma(a_j, r_k)$ measur-
ing the strength of the affirmation “alternative $a_j$ is at least as good as profile
$r_k$ according to all criteria”. The estimation of the credibility index is per-
formed as follows:

$\sigma(a_j, r_k) = C(a_j, r_k) \prod_{i \in F} \dfrac{1 - d_i(a_j, r_k)}{1 - C(a_j, r_k)}$
where F denotes the set of criteria for which the discordance index is higher
than the concordance index:

$F = \{ i : d_i(a_j, r_k) > C(a_j, r_k) \}$

Obviously, $\sigma(a_j, r_k) = C(a_j, r_k)$ if F is empty.
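A compact sketch of the concordance, discordance and credibility computations described above is given below (Python). It follows the standard ELECTRE TRI definitions as reconstructed in the formulas of this sub-section; the numerical data (criteria values, weights and thresholds) are purely illustrative.

    import numpy as np

    def partial_concordance(a, r, p, q):
        # Strength of "the alternative is at least as good as the profile" on one criterion.
        diff = r - a
        if diff >= p:
            return 0.0
        if diff <= q:
            return 1.0
        return (p - diff) / (p - q)

    def partial_discordance(a, r, p, v):
        # Strength of the indications against the affirmation (veto effect).
        diff = r - a
        if diff <= p:
            return 0.0
        if diff >= v:
            return 1.0
        return (diff - p) / (v - p)

    def credibility(a, r, w, p, q, v):
        c = np.array([partial_concordance(a[i], r[i], p[i], q[i]) for i in range(len(a))])
        d = np.array([partial_discordance(a[i], r[i], p[i], v[i]) for i in range(len(a))])
        C = np.dot(w, c) / np.sum(w)          # global concordance index
        sigma = C
        for i in range(len(a)):
            if d[i] > C:                      # set F: discordance above concordance
                sigma *= (1.0 - d[i]) / (1.0 - C)
        return sigma

    # Illustrative data: one alternative, one profile, three criteria (higher values preferred).
    a = np.array([62.0, 7.0, 3.0])
    r = np.array([60.0, 8.0, 4.0])
    w = np.array([0.5, 0.3, 0.2])
    p = np.array([5.0, 2.0, 2.0])       # preference thresholds
    q = np.array([1.0, 0.5, 0.5])       # indifference thresholds
    v = np.array([15.0, 6.0, 6.0])      # veto thresholds
    print("credibility:", round(credibility(a, r, w, p, q, v), 3))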
The credibility index provides the means to decide whether an alternative
outranks a profile or not. The outranking relation is considered to
hold if $\sigma(a_j, r_k) \ge \lambda$. The cut-off point $\lambda$ is defined by the decision analyst in
cooperation with the decision maker, such that it ranges between 0.5 and 1.
The outranking relation developed in this way is used to establish three
possible outcomes of the comparison of an alternative $a_j$ with a profile $r_k$. In
particular, this comparison may lead to the following conclusions (S denoting
the outranking relation):
1. Indifference (I): $a_j S r_k$ and $r_k S a_j$
2. Preference (P): $a_j S r_k$ and not $r_k S a_j$ (or $r_k S a_j$ and not $a_j S r_k$)
3. Incomparability (R): neither $a_j S r_k$ nor $r_k S a_j$
The modeling of the incomparability relation is one of the main distin-
guishing features of the ELECTRE TRI method and ORT techniques in gen-
eral. Incomparability arises for alternatives that have exceptionally good per-
formance on some criteria and at the same time quite poor performance on
other criteria.
The above three relations (I, P, R) provide the basis for developing the
classification rule. ELECTRE TRI employs two assignment procedures, the
optimistic and the pessimistic one. Both procedures begin by comparing an
alternative to the lowest (worst) profile. If the alternative is preferred to this
profile, the procedure continues with the comparison to the next (higher) profile.
The same procedure continues until one of the two following situations appears:
1.
2.
In the first case, both the optimistic and the pessimistic procedures will
assign the alternative into the same group. In the second case, however, the pes-
simistic procedure will assign the alternative into the lower (less preferred) of the
two candidate groups, whereas the optimistic procedure will assign it into the
higher one.
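The exploitation of the outranking relation can be sketched as follows (Python). The sketch adopts the indexing used in this section (the first group is the most preferred and each profile separates a group from the next, less preferred one) and the conventional ELECTRE TRI assignment rules; the exact wording of the two stopping conditions above is not reproduced here.

    def electre_tri_assign(sigma_a_r, sigma_r_a, lam):
        """sigma_a_r[k-1], sigma_r_a[k-1]: credibility of "a S r_k" and "r_k S a",
        for profiles r_1 (best) ... r_{q-1} (worst); lam is the cut-off point.
        Returns the (pessimistic, optimistic) group indices, with 1 = best group."""
        q = len(sigma_a_r) + 1
        a_S_r = [s >= lam for s in sigma_a_r]
        r_S_a = [s >= lam for s in sigma_r_a]

        # Pessimistic rule: scan the profiles from the best one downwards and assign
        # the alternative to the group whose lower bound is the first profile it outranks.
        pessimistic = q
        for k in range(q - 1):
            if a_S_r[k]:
                pessimistic = k + 1
                break

        # Optimistic rule: scan the profiles from the worst one upwards and assign the
        # alternative to the group lying just below the first profile that is strictly
        # preferred to it.
        optimistic = 1
        for k in range(q - 2, -1, -1):
            if r_S_a[k] and not a_S_r[k]:
                optimistic = k + 2
                break
        return pessimistic, optimistic

    # Illustrative use with three groups (two profiles) and cut-off 0.75.
    print(electre_tri_assign([0.55, 0.90], [0.80, 0.40], lam=0.75))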
Overall, the key issue for the successful implementation of the above
process is the elicitation of all the preferential parameters involved (i.e., cri-
teria weights, preference, indifference, veto thresholds, profiles). This elici-
tation is often cumbersome in real-world situations due to time constraints or
the unwillingness of the decision makers to actively participate in a direct
interrogation process managed by an expert decision analyst. Recently,
Mousseau and Slowinski (1998) proposed a methodology to infer all this
preferential information using the principles of PDA. The main features, ad-
vantages and disadvantages of this methodology will be discussed in the Ap-
pendix of Chapter 5, together with the presentation of a new approach to ad-
dress this problem.

3.1.3 Other outranking classification methods

The N–TOMIC method

The N–TOMIC method presented by Massaglia and Ostanello (1991), per-
forms an assignment (sorting) of the alternatives into nine pre-specified
groups indicating: exceptionally high performance, high perform-
ance, relatively high performance, adequate performance, uncer-
tain performance, inadequate performance, relatively low perform-
ance, low performance and significantly low performance. These
nine groups actually define a trichotomic classification of the alternatives,
i.e. good alternatives (high performance), uncertain alternatives and bad al-
ternatives (low performance).
The assignment of the alternatives into the above groups is performed
through the definition of two reference profiles. These two profiles
define the concepts of a “good” and a “bad” alternative. Every alternative
that compares favorably with the “good” profile is considered certainly good,
whereas an alternative that compares unfavorably with the “bad” profile is
considered certainly bad.
These two cases correspond to the following two affirmations:
1. “The alternative is certainly good” (affirmation 1)
2. “The alternative is certainly bad” (affirmation 2)
The method’s objective is to estimate the credibility of these affirmations
using the concordance and discordance concepts discussed previously for the
ELECTRE TRI method. The realization of the concordance and discordance
tests is based on the same information used in the context of the ELECTRE
TRI method (criteria weights, preference, indifference, veto thresholds). The
outcomes of the concordance and discordance tests involve the estimation of
a concordance and discordance index for each one of the above affirmations.
On the basis of these indices the assignment procedure is implemented in
three stages:
Stage 1: In this first stage it is examined whether an alternative can be as-
signed into one of the groups and (these groups
do not pose any certainty on whether an alternative is good, uncer-
tain or bad). Denoting the credibility indices of affirmations 1 and
2 for an alternative by and respectively, the as-
signment is performed as follows2:
If then (uncertain performance group)
If then (significantly low perform-
ance group)
If then (low performance group)
If then (exceptionally high perform-
ance group)
If then (high performance group)
Stage 2: In this stage the assignment of an alternative into one of the follow-
ing sets is considered:
1. {Good}={Alternatives belonging into groups }
2. {Uncertain}={Alternatives belonging into groups }
3. {Bad}={Alternatives belonging into groups }
The assignment is performed as follows3:
If then
If then
If then
Stage 3: This final stage extends the analysis of stage 2 through the consid-
eration of the discordance test. The assignment of the alternatives
is performed through decision trees constructed for each of the sets
{Good}, {Uncertain}, {Bad}. These trees enable the specific clas-
sification of the alternatives into the groups-members of above
sets. A detailed description of the trees used to perform the classi-
fication is presented in Massaglia and Ostanello (1991).

2
and denote two profiles ranging between 0.5 and 1, which are defined by the decision
analyst.
3
denotes a profile ranging between 0.5 and 1.
The PROAFTN method and the method of Perny (1998)

Both the ELECTRE TRI method and the N–TOMIC method are suitable for
addressing sorting problems where the groups are defined in an ordinal way.
The major distinguishing feature of the PROAFTN method (Belacel, 2000)
and the method of Perny (1998) is their applicability in classification prob-
lems with nominal groups.
In such cases, the reference profiles distinguishing the groups cannot be
defined such that they represent the lower bound of each group. Instead,
each reference profile is defined such that it indicates a representative exam-
ple of each group. On the basis of this approach both the PROAFTN method
and Perny’s method develop a fuzzy indifference relation measuring the
strength of the affirmation “the alternative is indifferent to the profile”. The
development of the fuzzy indifference relation is based on similar procedures
to the one used in ELECTRE TRI. Initially, the indications supporting the
above affirmation are considered through the concordance test. Then, the
discordance test is employed to measure the indications against the above
affirmation. The realization of the two tests leads to the estimation of the
credibility index measuring the indifference degree between an al-
ternative and a profile. The credibility index is used to decide upon
the classification of the alternatives. The assignment (classification) proce-
dure consists of comparing an alternative to all reference profiles, and as-
signing the alternative to the group for which the alternative is most similar
(indifferent) to the corresponding profile. This is formally expressed as fol-
lows:

Comprehensive descriptions of the details of the model development proc-
ess and the assignment procedures used in the above methods are provided
in the works of Perny (1998) and Belacel (2000).

3.2 The preference disaggregation paradigm in classification problems
All the aforementioned MCDA classification/sorting methods contribute to-
wards the development of a methodological context for decision support
modeling and representation purposes. Nevertheless, they all share a com-
mon problem. They require that the decision maker or the decision analyst
specify several pieces of technical and preferential information which are
necessary for the development of the classification model.
The methodological framework of preference disaggregation analysis
(PDA) constitutes a useful basis for specifying this information using regres-
sion-based techniques. Such an approach minimizes the cognitive effort re-
quired by the decision maker as well as the time required to implement the
decision aiding process.
Sub-section 2.4 of the present chapter discussed some of the early studies
made during the 1950s on using the PDA paradigm for decision making pur-
poses. During the 1960s there were the first attempts to develop classifica-
tion models using regression techniques based on mathematical program-
ming formulations.
One of the first methods to employ this approach was the MSM method
(Multi-Surface Method) presented by Mangasarian (1968). The model de-
velopment process in the MSM method involves the solution of a set of lin-
ear programming problems. The resulting model has the form of a set of hy-
perplanes discriminating the alternatives of a training sample as accurately as
possible in the pre-specified groups. Essentially, the developed classification
models introduce a piece–wise linear separation of the groups. Recently,
Nakayama and Kagaku (1998) extended the model development process of
the MSM method using goal programming and multiobjective programming
techniques to achieve higher robustness and increased generalizing perform-
ance for the developed models.
In the subsequent years there were some sparse studies by Smith (1968),
Grinold (1972), Liittschwager and Wang (1978), and Hand (1981). Despite
the innovative aspects of these pioneering studies, it was the works of Freed
and Glover (1981a, b) that really boosted this field. The authors introduced
simple goal programming formulations for the development of a hyperplane
w · g' = c that discriminates two groups of alternatives. Essentially, such a
hyperplane is a linear discriminant function similar to the one used in LDA.
The linear discriminant function looks similar to the additive utility function
defined in section 2.2 of this chapter. There are three major differences,
however, between these two modeling forms: (1) the discriminant function
cannot be considered as a preference model because it considers neither a
preference order among the decision groups nor a preference order on the criteria
domains, (2) all criteria are assumed to be quantitative (the qualitative crite-
ria should also be quantified), (3) the above discriminant function is always
linear, whereas the additive utility function can be either linear or non-linear
depending on the form of the marginal utility functions. These differences
can be considered as disadvantages of the discriminant function over the use
of a utility function. Nevertheless, many researchers use this modeling form
for two reasons: (1) its development is much easier than the development of
a utility function, since only the coefficients need to be estimated, (2) it is
a convenient modeling form when nominal groups are considered (the use of
a utility function is only applicable in sorting problems where the groups are
defined in an ordinal way). Furthermore, using the Bayes rule it can be
shown that the linear discriminant function is the optimal classification
model (in terms of the expected classification error) when the data are multi-
variate normal with equal group dispersion matrices (Patuwo et al., 1993).
These assumptions are strong, however, and only rarely satisfied in practice.
On the basis of the linear discriminant function, Freed and Glover
(1981a) used the following simple classification rule for two-group classifi-
cation problems:

The first approach proposed by Freed and Glover (1981a) introduced the
maximization of the minimum distance between the alternatives’ scores and
the cut-off point c as the model development criterion. This is known as the MMD model (maxi-
mize the minimum distance; cf. Figure 3.10):

Max d
Subject to:

d unrestricted in sign
c user-defined constant
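A minimal sketch of the MMD idea, using Python and SciPy, is shown below. Since the exact constraints of the formulation are not reproduced above, the sketch assumes the common convention that group-1 alternatives should score at least c + d and group-2 alternatives at most c − d, and it bounds the coefficients in [−1, 1] as a simple normalization (without such a device the problem can become unbounded, as discussed later in this section).

    import numpy as np
    from scipy.optimize import linprog

    # Hypothetical training data: rows are alternatives, columns are criteria.
    G1 = np.array([[0.8, 0.7], [0.9, 0.6], [0.7, 0.9]])   # group 1
    G2 = np.array([[0.3, 0.4], [0.2, 0.5], [0.4, 0.2]])   # group 2
    n = G1.shape[1]
    c_cut = 1.0                                           # user-defined cut-off constant

    # Variables: [w_1..w_n, d].  Maximize d  <=>  minimize -d.
    obj = np.concatenate([np.zeros(n), [-1.0]])

    # Group 1: w.g >= c + d  ->  -w.g + d <= -c ; Group 2: w.g <= c - d  ->  w.g + d <= c
    A_ub = np.vstack([np.hstack([-G1, np.ones((len(G1), 1))]),
                      np.hstack([ G2, np.ones((len(G2), 1))])])
    b_ub = np.concatenate([-c_cut * np.ones(len(G1)), c_cut * np.ones(len(G2))])

    bounds = [(-1, 1)] * n + [(None, None)]               # d unrestricted in sign
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    print("weights:", np.round(res.x[:n], 3), "min distance d:", round(res.x[-1], 3))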
Soon after the publication of their first paper, Freed and Glover published
a second one (Freed and Glover, 1981b) describing an arsenal of similar
goal-programming formulations for developing classification models. The
most well-known of these is the MSD model (minimize the sum of devia-
tions), which considers two measures for the quality of the classification ob-
tained through the developed models (Figure 3.11): (1) the violation of the
classification rules (3.3) by an alternative of the training sample, and (2)
the distance (absolute difference) between a correctly classified alternative
and the cut-off point that discriminates the groups. On the basis of these two
measures, the optimal discriminating hyperplane is developed through the
solution of the following linear programming problem:

Subject to:

where the two constants represent the relative significance of the two
goals of the problem (minimization of the violations and maximization of
the distances). These constants are specified by the decision maker.
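A corresponding sketch of the MSD idea is given below (Python/SciPy). For simplicity it keeps only the first of the two goals (the minimization of the violations of the classification rule); the data, the direction of the rule and the fixed cut-off point are illustrative assumptions rather than the exact original formulation.

    import numpy as np
    from scipy.optimize import linprog

    G1 = np.array([[0.8, 0.7], [0.9, 0.6], [0.7, 0.9]])   # group 1 (should score >= c)
    G2 = np.array([[0.3, 0.4], [0.2, 0.5], [0.4, 0.2]])   # group 2 (should score <= c)
    m1, m2, n = len(G1), len(G2), G1.shape[1]
    c_cut = 1.0

    # Variables: [w_1..w_n, a_1..a_{m1+m2}], where a_j >= 0 is the violation of the rule.
    obj = np.concatenate([np.zeros(n), np.ones(m1 + m2)])

    # Group 1: w.g + a >= c  ->  -w.g - a <= -c ; Group 2: w.g - a <= c
    A_ub = np.vstack([np.hstack([-G1, -np.eye(m1), np.zeros((m1, m2))]),
                      np.hstack([ G2, np.zeros((m2, m1)), -np.eye(m2)])])
    b_ub = np.concatenate([-c_cut * np.ones(m1), c_cut * np.ones(m2)])

    bounds = [(-1, 1)] * n + [(0, None)] * (m1 + m2)      # bounded weights, non-negative violations
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    print("weights:", np.round(res.x[:n], 3), "total violation:", round(res.fun, 3))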

As an alternative to the linear discriminant function, the above goal-
programming formulations are also applicable when a quadratic discriminant
function is employed:
The quadratic discriminant function has been proposed by several authors


(Duarte Silva and Stam, 1994; Östermark and Höglund, 1998; Falk and Kar-
lov, 2001) as an appropriate approach to consider the correlations between
the criteria. Essentially, the quadratic discriminant function can be consid-
ered as a simplified form of the multiplicative utility function, which can be
used to address nominal classification problems. Using the Bayes rule it can
be shown that the quadratic discriminant function is the optimal classifica-
tion model (in terms of the expected classification error) when the data are
multivariate normal with unequal group dispersion matrices.
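The quadratic case can reuse exactly the same linear programming machinery by augmenting the criteria with their squares and cross-products, as in the short, purely illustrative helper below.

    import numpy as np
    from itertools import combinations

    def quadratic_expansion(G):
        """Augment an m x n matrix of criteria values with squared and cross-product
        terms, so that a linear model on the expanded data acts as a quadratic
        discriminant function on the original criteria."""
        squares = G ** 2
        pairs = list(combinations(range(G.shape[1]), 2))
        cross = (np.column_stack([G[:, i] * G[:, l] for i, l in pairs])
                 if pairs else np.empty((len(G), 0)))
        return np.hstack([G, squares, cross])

    G = np.array([[0.8, 0.7], [0.3, 0.4]])
    print(quadratic_expansion(G))   # columns: g1, g2, g1^2, g2^2, g1*g2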
The above two studies by Freed and Glover (1981a, b) motivated several
other researchers towards employing similar approaches. The subsequent
research made in this field focused on the following issues:
1. Investigation of problems in goal programming formulations for the de-
velopment of classification models: This issue has been a major research
topic mainly during the 1980s. Soon after the studies of Freed and Glover
(1981a, b) researchers identified some problems in the formulations that
the authors proposed. Markowski and Markowski (1985) first identified
two possible problems that may occur in the use of the goal programming
formulations proposed by Freed and Glover (1981a, b).
All coefficients in the discriminant function (hyperplane) are zero
(unacceptable solution). In such a case all alternatives are classified in
the same group.
The developed classification models are not stable to data transforma-
tions (e.g., rotation).
Later Ragsdale and Stam (1991) noted two additional problems which
can be encountered:
The development of unbounded solutions. In such cases the objective
function of the goal programming formulations can be increased or
decreased without any limitation and the developed model for the
classification of the alternatives is meaningless.
The development of solutions for which all alternatives are placed on
the hyperplane that discriminates the groups (improper solutions).
2. The development of new goal programming formulations: Soon after the
identification of the problems mentioned above, it was found that these
problems were mainly due to the lack of appropriate normalization con-
straints. To address this issue several authors proposed improved formu-
lations including hybrid models (Glover et al., 1988; Glover, 1990),
nonlinear programming formulations (Stam and Joachimsthaler, 1989),
mixed-integer programming formulations (Choo and Wedley, 1985;
Koehler and Erenguc, 1990; Rubin, 1990a; Banks and Abad, 1991; Abad
and Banks, 1993; Wilson, 1996) and multiobjective programming formu-
lations (Stam, 1990). Recently, there have been several studies proposing
the use of advanced optimization techniques such as genetic algorithms
(Conway et al., 1998) and tabu search (Fraughnaugh et al., 1998; Yanev
and Balev, 1999).
3. The comparison with other classification techniques: The first compara-
tive study was performed by Bajgier and Hill (1982). The authors com-
pared the classification models developed through a mixed-integer pro-
gramming formulation, with the models developed by the two formula-
tions of Freed and Glover (1981a, b) and the ones of linear discriminant
analysis. The comparison was based on a simulation experiment and the
obtained results showed that the models developed through the mathe-
matical programming formulations provide higher classification accuracy
compared to the statistical models, except for the case where the groups
variance-covariance matrices are equal. Freed and Glover (1986) reached
similar encouraging results, despite their observation that goal
programming techniques were more sensitive to outliers. Joachimsthaler
and Stam (1988) performed a more extensive comparison considering a
goal programming formulation, linear and quadratic discriminant analy-
sis, as well as logit analysis. They found that all methods provide similar
results, even though the performance of discriminant analysis (linear and
quadratic) deteriorates as kurtosis increases. Apart from these comparative
studies that reached encouraging results, there were also other stud-
ies that reached the opposite conclusion. Markowski and Markowski (1987)
investigated the impact of qualitative variables in the classification accu-
racy of models developed using goal programming techniques and linear
discriminant analysis. Qualitative variables do not comply with the distri-
butional assumptions of linear discriminant analysis (multivariate normal-
ity). The results of the authors, however, showed that linear discriminant
analysis performed better than goal programming techniques when quali-
tative variables are introduced in the data. In particular, the performance
of linear discriminant analysis was improved with the consideration of
qualitative variables, while the performance of goal programming tech-
niques remained unaffected. The simulation study performed by Rubin
(1990b) is even more characteristic. The author compared 15 goal pro-
gramming formulations to quadratic discriminant analysis. According to
the results the author concluded that in order for the goal programming
techniques to be considered as a promising alternative for addressing
classification problems, they must be shown to outperform discriminant
analysis at least in cases where the data are not multivariate normal. Ta-
ble 3.2 summarizes some of the most significant comparative studies per-
formed during the last two decades. Comprehensive reviews of this field
are presented in the work of Joachimsthaler and Stam (1990), as well as
in a recent special issue of Annals of Operations Research (Gehrlein and
Wagner, 1997).
On the basis of the above discussion, the main existing problems regard-
ing the use of goal programming techniques for developing classification
models involve the following issues:
1. The form of the developed models. The majority of the existing research
employs simple linear models (linear discriminant functions) which of-
ten fail to represent adequately the complexity of real-world classifica-
tion problems.
2. The consideration of qualitative criteria. Using a simple weighted aver-
age or a simple discriminant function makes it difficult to consider quali-
tative criteria. This requires that for each level of the qualitative scale a
0-1 variable is introduced or that each qualitative criterion is “quanti-
fied” by introducing a numerical scale for its qualitative measurement (e.g.,
good = 1, medium = 2, bad = 3, etc.). However, both these solutions alter the
nature of qualitative criteria and the way that they are considered by the
decision maker.
3. The number of groups. Most of the existing research on the use of


mathematical programming techniques for developing classification
models is restricted to two group classification problems. The multi-
group case still needs further research. There have been some sparse
studies on this issue (Choo and Wedley, 1985; Wilson, 1996; Gochet et
al., 1997) but further analysis is required towards the investigation of the
peculiarities of multi-group classification problems within the context of
mathematical programming techniques.
4. The nature of the problems that are addressed. The existing research is
heavily focused on classification problems where the groups are defined
in a nominal way. However, bearing in mind that sorting problems (or-
dinal groups) are of particular interest in many real-world decision mak-
ing fields, it is clear that this field is of major practical and research in-
terest and it deserves further investigation.
The MCDA methods that will be presented in detail in the next chapter
address most of the above issues in an integrated and flexible framework.
Chapter 4
Preference disaggregation classification methods

1. INTRODUCTION
The review of MCDA classification methods presented in the previous chap-
ter reveals two major shortcomings:
1. Several MCDA classification methods require the definition of a signifi-
cant amount of information by the decision maker. The process involv-
ing the elicitation of this information is often cumbersome due to: (1)
time constraints, (2) the unwillingness of the decision maker to participate
actively in this process, and (3) the limited ability of the analyst to interact
efficiently with the decision maker.
2. Other MCDA techniques that employ the preference disaggregation phi-
losophy usually assume a linear relationship between the classification
of the alternatives and their characteristics (criteria). Such an approach
implicitly assumes that the decision maker is risk–neutral which is not
always the case.
This chapter presents two MCDA classification methods that respond
satisfactorily to the above limitations. The considered methods include the
UTADIS method (UTilités Additives DIScriminantes) and the MHDIS
method (Multi–group Hierarchical DIScrimination). Both methods combine
a utility function–based framework with the preference disaggregation para-
digm. The problems addressed by UTADIS and MHDIS involve the sorting
of the alternatives into q predefined groups defined in an ordinal way:

$C_1 \succ C_2 \succ \cdots \succ C_q$

where $C_1$ denotes the group consisting of the most preferred alternatives and
$C_q$ denotes the group of the least preferred alternatives.
The subsequent sections of this chapter discuss in detail all the model
development aspects of the two methods as well as all the important issues
of the model development and implementation process.

2. THE UTADIS METHOD

2.1 Criteria aggregation model


The UTADIS method was first presented by Devaud et al. (1980), while
some aspects of the method can also be found in Jacquet–Lagrèze and Siskos
(1982). The interest of MCDA researchers in this method was rather limited
until the mid 1990s. Jacquet–Lagrèze (1995) used the method to evaluate R
& D projects, while after 1997 the method has been widely used for develop-
ing classification models in financial decision making problems (Zopounidis
and Doumpos, 1997, 1998, 1999a, b; Doumpos and Zopounidis, 1998;
Zopounidis et al., 1999). Recently, the method has been implemented in
multicriteria decision support systems, such as the FINCLAS system
(Zopounidis and Doumpos, 1998) and the PREFDIS system (Zopounidis and
Doumpos, 2000a).
The UTADIS method is a variant of the well–known UTA method
(UTilités Additives). The latter is an ordinal regression method proposed by
Jacquet–Lagrèze and Siskos (1982) for developing decision models that can
be used to rank a set of alternatives from the best to the worst ones.
Within the sorting framework described in the introductory section of
this chapter, the objective of the UTADIS method is to develop a criteria
aggregation model used to determine the classification of the alternatives.
Essentially this aggregation model constitutes an index representing the
overall performance of each alternative along all criteria. The objective of
the model development process is to specify this model so that the alterna-
tives of group $C_1$ receive the highest scores, while the scores of the alterna-
tives belonging into the other groups gradually decrease as we move towards the
worst group $C_q$.
Formally, the criteria aggregation model is expressed as an additive util-
ity function:

$U(\mathbf{g}) = \sum_{i=1}^{n} p_i u_i(g_i) \qquad (4.1)$
where:
$\mathbf{g} = (g_1, g_2, \ldots, g_n)$ is the vector of the evaluation criteria.
$p_i$ is a scaling constant indicating the significance of criterion $g_i$.
$u_i(g_i)$ is the marginal utility function of criterion $g_i$.

The marginal utility functions are monotone functions (linear or nonlin-
ear) defined on the criteria’s scale, such that the following two conditions are
met:

$u_i(g_{i*}) = 0 \quad \text{and} \quad u_i(g_i^*) = 1$

where $g_{i*}$ and $g_i^*$ denote the least and the most preferred value of criterion
$g_i$, respectively. These values are specified according to the set A of the al-
ternatives under consideration, as follows:
For increasing preference criteria (criteria for which higher values in-
dicate higher preference, e.g. return/profitability criteria):

$g_{i*} = \min_{a_j \in A} g_i(a_j) \quad \text{and} \quad g_i^* = \max_{a_j \in A} g_i(a_j)$

For decreasing preference criteria (criteria for which higher values in-
dicate lower preference, e.g. risk/cost criteria):

$g_{i*} = \max_{a_j \in A} g_i(a_j) \quad \text{and} \quad g_i^* = \min_{a_j \in A} g_i(a_j)$
Essentially, the marginal utility functions provide a mechanism for trans-


forming the criterion’s scale into a new scale ranging in the interval [0, 1].
This new scale represents the utility for the decision maker of each value of
the criterion. The form of the marginal utility functions depends upon the
decision maker’s preferential system (judgment policy). Figure 4.1 presents
three characteristic cases. The concave form of the utility function presented
in Figure 4.1(a) indicates that the decision maker considers as quite signifi-
cant even small deviations from the worst performance $g_{i*}$. This corresponds to a
risk–averse attitude (i.e., the decision maker is satisfied with “acceptable”
alternatives and does not necessarily seek alternatives of top performance).
On the contrary, the case presented in Figure 4.1(b) corresponds to a risk–
prone decision maker who is mainly interested in alternatives of top per-
formance. Finally, the linear marginal utility function of Figure 4.1(c) indi-
cates a risk–neutral behavior.
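For illustration, the three shapes of Figure 4.1 can be mimicked with simple piece-wise linear marginal utilities, as in the following sketch (Python); the breakpoints and utility values are arbitrary examples, not the functions of the figure.

    import numpy as np

    # Breakpoints of a (normalized) criterion scale and three hypothetical marginal utilities.
    g_levels  = np.array([0.0, 0.25, 0.50, 0.75, 1.0])
    u_concave = np.array([0.0, 0.55, 0.80, 0.93, 1.0])   # risk-averse shape (Figure 4.1a)
    u_convex  = np.array([0.0, 0.07, 0.20, 0.45, 1.0])   # risk-prone shape (Figure 4.1b)
    u_linear  = g_levels                                  # risk-neutral shape (Figure 4.1c)

    g = 0.6   # performance of some alternative on this criterion
    for name, u in [("concave", u_concave), ("convex", u_convex), ("linear", u_linear)]:
        print(name, round(float(np.interp(g, g_levels, u)), 3))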
Transforming the criteria’s scale into utility terms through the use of
marginal utility functions has two major advantages:
1. It enables the modeling and representation of the nonlinear behavior of
the decision maker when evaluating the performance of the alternatives.
2. It enables the consideration of qualitative criteria in a flexible way. Con-
sider for instance, a qualitative corporate performance criterion repre-
senting the organization of a firm measured through a three–level quali-
tative scale: “good”, “medium”, and “poor”. Using such a qualitative cri-
terion through simple weighted average models requires the a priori as-
signment of a numerical value to each level of the qualitative scale (e.g.,
assigning the values 1, 2 and 3 to the levels “poor”, “medium” and “good”).
Such an assignment is often arbitrary
and misleading. On the contrary, the specification of the marginal utility
function provides a sound methodological mechanism to identify the
value (in quantitative terms) that the decision maker assigns to each
level of the qualitative scale. Within the context of the UTADIS method
and the preference disaggregation framework in general, the form of the
criteria’s marginal utility functions is specified through a regression–
based framework, enabling the a posteriori assignment of a numerical
value to each level of a qualitative scale rather than an arbitrary a priori
specification.
Given the above discussion on the concept of marginal utilities, the
global utility of an alternative specified through eq. (4.1) represents a
measure of the overall performance of the alternative considering its per-
formance on all criteria. The global utilities range in the interval [0, 1] and
they constitute the criterion used to decide upon the classification of the al-
ternatives. Figure 4.2 illustrates how the global utilities are used for classifi-
cation purposes in the simple two group case. The classification is performed
by comparing the global utility of each alternative with a cut–off point de-
fined on the utility scale between 0 and 1. Alternatives with global utilities
higher than the utility cut–off point are assigned into group $C_1$, whereas al-
ternatives with global utilities lower than the cut–off point are assigned into
group $C_2$.

In the general case where q groups are considered, the classification of
the alternatives is performed through the following classification rules:

$U(\mathbf{g}_j) \ge u_1 \;\Rightarrow\; a_j \in C_1$
$u_k \le U(\mathbf{g}_j) < u_{k-1} \;\Rightarrow\; a_j \in C_k, \quad k = 2, 3, \ldots, q-1 \qquad (4.2)$
$U(\mathbf{g}_j) < u_{q-1} \;\Rightarrow\; a_j \in C_q$

where $u_1 > u_2 > \cdots > u_{q-1}$ denote the utility cut–off points separating the groups.
Henceforth, these cut–off points will be referred to as utility thresholds. Es-
sentially, each utility threshold $u_k$ separates the two consecutive groups $C_k$ and $C_{k+1}$.
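The classification rules (4.2) translate directly into a simple comparison of the global utility with the ordered thresholds, e.g. (Python, with illustrative threshold values):

    def classify(global_utility, thresholds):
        """thresholds = [u_1, u_2, ..., u_{q-1}] with u_1 > u_2 > ... > u_{q-1};
        u_k is the lower bound of group C_k.  Returns 1 for C_1 (best) ... q for C_q."""
        for k, u_k in enumerate(thresholds, start=1):
            if global_utility >= u_k:
                return k
        return len(thresholds) + 1

    print(classify(0.72, [0.65, 0.40]))   # -> 1 (group C_1)
    print(classify(0.33, [0.65, 0.40]))   # -> 3 (group C_3)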
2.2 Model development process


2.2.1 General framework

The main structural parameters of the classification model developed


through the UTADIS method include the criteria weights, the marginal util-
ity functions and the utility thresholds. These parameters are specified
through the regression–based philosophy of preference disaggregation analy-
sis described in the previous chapter.
A general outline of the model development procedure in the UTADIS
method is presented in Figure 4.3. Initially, a reference set consisting of
m alternatives described along n criteria is used as the training sample
(henceforth the training sample will be referred to as the reference set in or-
der to comply with the terminology used in MCDA). The alternatives of the
reference set are classified a priori into q groups. The reference set should be
constructed in such a way so that it includes an adequate number of repre-
sentative examples (alternatives) from each group. Henceforth, the number
of alternatives of the reference set belonging into group $C_k$ will be denoted
by $m_k$.
Given the classification C of the alternatives in the reference set, the ob-
jective of the UTADIS method is to develop a criteria aggregation model
and a set of utility thresholds that minimize the classification error rate. The
error rate refers to the differences between the estimated classification
defined through the developed model and the pre–specified classification C
for the alternatives of the reference set. Such differences can be represented
by introducing a binary variable E representing the classification status of
each alternative:
$E_j = \begin{cases} 0, & \text{if alternative } a_j \text{ is classified correctly} \\ 1, & \text{if alternative } a_j \text{ is misclassified} \end{cases}$
On the basis of this binary variable, the classification error rate is de-
fined as the ratio of the number of misclassified alternatives to the total
number of alternatives in the reference set:
$e = \dfrac{1}{m} \sum_{j=1}^{m} E_j \qquad (4.3)$
This classification error rate measure is adequate for cases where the
number of alternatives of each group in the reference set is similar along all
groups. In the case, however, where there are significant differences among
the group sizes, the use of the classification error
rate defined in (4.3) may lead to misleading results. For instance, consider a
reference set consisting of 10 alternatives, 7 belonging into group $C_1$ and 3
belonging into group $C_2$. In this case a classification that as-
signs correctly all alternatives of group $C_1$ and incorrectly all alternatives of
group $C_2$ has an error rate of only 30%.

This is a misleading result. Actually, what should be the main point of
interest is the expected classification error. This is expressed in relation
to the a priori probabilities $\pi_1$ and $\pi_2$ that an alternative belongs into groups
$C_1$ and $C_2$ respectively, as follows:

$e = \pi_1 e_1 + \pi_2 e_2$

where $e_1$ and $e_2$ denote the error rates observed within groups $C_1$ and $C_2$.
In the above example the error rates for the two groups (0% for $C_1$ and
100% for $C_2$) can be considered as estimates of the corresponding misclassification
probabilities. Assuming that the a priori probabilities for the
two groups are equal ($\pi_1 = \pi_2 = 0.5$), the expected error of the classifi-
cation is 0.5:

$e = 0.5 \times 0\% + 0.5 \times 100\% = 0.5$
Such a result indicates that the obtained classification corresponds to a


random classification. In a random classification the probabilities
are determined based on the proportion of each group to the total number
of alternatives in the reference set. In this respect, in the above example a
naïve approach would be to assign 7 out of the 10 alternatives into group $C_1$
(i.e., with probability 0.7) and 3 out of the 10 alternatives into group $C_2$
(i.e., with probability 0.3).
The expected error of such a naïve approach (random
classification) is 0.5:

$e = 0.5 \times 30\% + 0.5 \times 70\% = 0.5$
To overcome this problem, a more appropriate measure of the expected


classification error rate is expressed as follows:

Even though this measure takes into consideration the a priori probabili-
ties of each group, it assumes that all classification errors are of equal cost to
the decision maker. This is not always the case. For instance the classifica-
tion error regarding the assignment of a bankrupt firm to the group of
healthy firms is much more costly than an error involving the assignment of a
healthy firm to the bankrupt group. The former leads to capital cost (loss of
the amount of credit granted to a firm), whereas the latter leads to opportu-
nity cost (loss of profit that would result from granting a credit to a healthy
firm). Therefore, it would be appropriate to extend the expected classifica-
tion error rate (4.4) so that the costs of each individual error are also consid-
ered. The resulting measure represents the expected misclassification cost
(EMC), rather than the expected classification error rate:

$EMC = \sum_{k=1}^{q} \pi_k \dfrac{1}{m_k} \sum_{a_j \in C_k} \sum_{l \ne k} c_{kl} E_{jl} \qquad (4.5)$

where:
$c_{kl}$ is the misclassification cost involving the classification of an alter-
native of group $C_k$ into group $C_l$.
$E_{jl}$ is a binary 0–1 variable defined such that $E_{jl} = 1$ if an alternative
$a_j$ is classified into group $C_l$ and $E_{jl} = 0$ if $a_j$ is not classified
into group $C_l$.
Comparing expressions (4.4) and (4.5) it becomes apparent that the ex-
pected classification error rate in (4.4) is a special case of the expected mis-
classification cost, when all costs are considered equal for every k, l=1,
2, …, q. The main difficulty related to the use of the expected misclassifica-
tion cost as the appropriate measure of the quality of the obtained classifica-
tion is that it is often quite difficult to have reliable estimates for the cost of
each type of classification error. For this reason, all subsequent discussion in
this book concentrates on the use of the expected classification error rate
defined in (4.4). Furthermore, without loss of generality, it will be assumed
that all a priori probabilities are equal to $1/q$.
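The expected classification error rate (4.4) and the expected misclassification cost (4.5) can be computed as in the following sketch (Python); the group labels, priors and cost matrix are illustrative, and the data reproduce the 7/3 example discussed above.

    import numpy as np

    def expected_error_rate(y_true, y_pred, priors):
        # priors[k]: a priori probability of group k; groups are labeled 0..q-1.
        q = len(priors)
        rates = [np.mean(y_pred[y_true == k] != k) for k in range(q)]
        return float(np.dot(priors, rates))

    def expected_misclassification_cost(y_true, y_pred, priors, cost):
        # cost[k, l]: cost of assigning an alternative of group k to group l (cost[k, k] = 0).
        q = len(priors)
        emc = 0.0
        for k in range(q):
            members = y_pred[y_true == k]
            emc += priors[k] * np.mean([cost[k, l] for l in members])
        return float(emc)

    y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])   # 7 alternatives in C1, 3 in C2
    y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])   # all alternatives assigned to C1
    priors = np.array([0.5, 0.5])
    cost = np.array([[0.0, 1.0], [1.0, 0.0]])           # equal unit costs reduce (4.5) to (4.4)
    print(expected_error_rate(y_true, y_pred, priors))                     # 0.5
    print(expected_misclassification_cost(y_true, y_pred, priors, cost))   # 0.5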
If the expected classification error rate, regarding the classification of the
alternatives that belong into the reference set, is considered satisfactory, then
this constitutes an indication that the developed classification model might
be useful in providing reliable recommendations for the classification of
other alternatives. On the other hand, if the obtained expected classification
error rate indicates that the classification of the alternatives in the reference
set is close to a random classification then the de-
cision maker must check the reference set regarding its completeness and
adequacy for providing representative information on the problem under
consideration. Alternatively, it is also possible that the criteria aggregation
model (additive utility function) is not able to provide an adequate represen-
tation of the decision maker’s preferential system. In such a case an alterna-
tive criteria aggregation model must be considered.
However, it should be pointed out that a low expected classification error
rate does not necessarily ensure the practical usefulness of the developed
classification model; it simply provides an indication supporting the possible
usefulness of the model. On the contrary, a high expected classification error
rate leads with certainty to the conclusion that the developed classification
model is inadequate.

2.2.2 Mathematical formulation

The objective of the model development process in the UTADIS
method, i.e., the maximization of the consistency between the estimated
classification and the predefined classification C, is pursued through mathemati-
cal programming techniques.
In particular, the minimization of the expected classification error rate
(4.4) requires the formulation and solution of a mixed–integer programming
(MIP) problem. The solution, however, of MIP formulations is a computa-
tionally intensive procedure. Despite the significant research that has been
made on the development of computationally efficient techniques for solving
MIP problems within the context of classification model development (cf.
sub–section 3.2 of Chapter 3), the computational effort still remains quite
significant. This problem is most significant in cases where the reference set
includes a large number of alternatives.
To overcome this problem an approximation of the error rate (4.4) is
used as follows:

$e' = \sum_{k=1}^{q} \dfrac{1}{m_k} \sum_{a_j \in C_k} \sigma_j \qquad (4.6)$

where $\sigma_j$ is a positive real variable, defined such that $\sigma_j = 0$ if alternative
$a_j$ is classified correctly and $\sigma_j > 0$ otherwise.
Essentially, $\sigma_j$ represents the magnitude of the classification error for al-
ternative $a_j$. On the basis of the classification rule (4.2), the classification
error for an alternative of group $C_1$ involves the violation of the utility
threshold $u_1$ that defines the lower bound of group $C_1$. For the alternatives of
the last (least preferred) group $C_q$ the classification error involves the viola-
tion of the utility threshold $u_{q-1}$ that defines the upper bound of group $C_q$. For
any other intermediate group $C_k$ the classification error may involve
either the violation of the upper bound $u_{k-1}$ of the group or
the violation of the lower bound $u_k$.
Henceforth the violation of the lower bound of a group will be denoted
by $\sigma^+$, whereas $\sigma^-$ will be used to denote the violation of the upper bound
of a group. Figure 4.4 provides a graphical representation of these two errors
in the simple two–group case. By definition it is not possible that the two
errors occur simultaneously ($\sigma_j^+ \sigma_j^- = 0$). Therefore, the total error for
an alternative $a_j$ is defined as $\sigma_j = \sigma_j^+ + \sigma_j^-$.
At this point it should be emphasized that the error functions (4.4) and
(4.6) are not fully equivalent. For instance, consider a reference set consist-
ing of four alternatives classified into two groups.
Assume that for this reference set an additive utility classification model
(CM1) is developed that misclassifies two alternatives with small error
magnitudes, so that according to (4.6) the total classification error
is 0.075, whereas considering (4.4) the expected classification error rate
is 0.5. An alternative classification model (CM2) classifies one of these two
alternatives correctly but retains the misclassification of the other with a
larger error magnitude. Obviously the model CM1 outperforms CM2 when
the definition (4.6) is considered, but according to the expected classification
error rate (4.4) CM2 performs better.
Despite this limitation the definition (4.6) provides a good approxima-
tion of the expected classification error rate (4.4), while reducing the compu-
tational effort required to obtain an optimal solution.
The two forms of the classification errors can be formally expressed on the
basis of the classification rule (4.2) as follows:

$\sigma_j^+ = \max\{0,\; u_k - U(\mathbf{g}_j)\} \quad \text{for } a_j \in C_k,\; k = 1, 2, \ldots, q-1$

$\sigma_j^- = \max\{0,\; U(\mathbf{g}_j) - u_{k-1}\} \quad \text{for } a_j \in C_k,\; k = 2, 3, \ldots, q$
These expressions better illustrate the notion of the two classification er-
ror forms. The error indicates that to classify correctly a misclassified
alternative that actually belongs into group its global utility
should be increased by the corresponding amount. Similarly, the error indicates that to classify correctly a misclassified alternative that actually belongs into a given group, its global utility should be decreased by the corresponding amount.
Introducing these error terms in the additive utility model, it is possible
to rewrite the classification rule (4.2) in the form of the following con-
straints:

These constraints constitute the basis for the formulation of a mathe-


matical programming problem used to estimate the parameters of the addi-
tive utility classification model (utility thresholds, marginal utilities, criteria
weights). The general form of this mathematical programming model is the
following (MP1):

subject to:

In constraints (4.11)–(4.12) is a positive constant used to avoid


cases where when Of course, is considered as the
lower bound of group In this regard the case typi-
cally, does not pose any problem during model development and implemen-
tation. However, assuming the simple two–group case the specification
may lead to the development of a classification model for which
for all and for all Since the util-
ity threshold is defined as the lower bound of group it is obvious that
such a model performs an accurate classification of the alternatives. Practi-

cally, however, since all alternatives of group are placed on the utility
threshold, the generalizing ability of such a model is expected to be limited.
Therefore, to avoid such situations a small positive (non–zero) value for the
constant should be chosen. The constant in (4.12)–(4.13) is used in a
similar way.
Constraints (4.14) and (4.15) are used to normalize the global utilities in
the interval [0, 1]. In these constraints and denote the vectors consisting
of the most and the least preferred alternatives of the evaluation criteria. Fi-
nally, constraint (4.16) is used to ensure that the utility threshold dis-
criminating groups and is higher than the utility threshold dis-
criminating groups and This specification ensures the ordering of
the groups from the most preferred to the least preferred ones. In
this ordering of the groups, higher utilities are assigned to the most preferred
groups. In constraint (4.16) s is a constant defined such that
Introducing the additive utility function (4.1) in MP1 leads to the formu-
lation of a nonlinear programming problem. This is because the additive util-
ity function (4.1) has two unknown parameters to be specified: (a) the crite-
ria weights and (b) the marginal utility functions. Therefore, constraints
(4.11)–(4.15) take a nonlinear form and the solution of the resulting nonlin-
ear programming problem can be cumbersome. To overcome this problem,
the additive utility function (4.1) is rewritten in a simplified form as follows:

where:

Both (4.1) and (4.18) are equivalent expressions for the additive utility
function. Nevertheless, the latter requires only the specification of the marginal utility functions. As illustrated in Figure 4.1, these functions can be of any form. The UTADIS method does not pre–specify a functional form for these functions. Therefore, it is necessary to express the marginal utility functions in terms of specific decision variables to be estimated through the solution of MP1. This is achieved by modeling the marginal utilities as piece–wise linear functions, a process that is graphically illustrated in Figure 4.5.

The range of each criterion is divided into subintervals. A commonly used approach to define these subintervals is based on the following simple heuristic:

Define equal subintervals


such that there is at least one alternative
belonging in each subinterval.

Henceforth this heuristic will be referred to as HEUR1. Following this


piece–wise linear modeling approach, the estimation of the unknown mar-
ginal utility functions can be performed by estimating the marginal utilities
at the break–points As illustrated in Figure 4.5 this estimation
provides an approximation of the true marginal utility functions. On the ba-
sis of this approach, it would be reasonable to assume that the larger the
number of subintervals that are specified, the better is the approximation of
the marginal utility functions. The definition of a large number of subinter-
vals, however, provides increased degrees of freedom to the additive utility
model. This increases the fitting ability of the developed model to the data of
the reference set; the instability, however, of the model is also increased (the
model becomes sample–based).
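A minimal sketch of one possible implementation of HEUR1 is given below; the cap on the number of subintervals (max_intervals) and the sample data are assumptions introduced only for illustration.

```python
import numpy as np

def heur1_breakpoints(values, max_intervals=10):
    """Pick the largest number of equal-width subintervals (up to an assumed
    cap) such that every subinterval contains at least one reference value."""
    lo, hi = float(min(values)), float(max(values))
    for n_int in range(max_intervals, 0, -1):
        edges = np.linspace(lo, hi, n_int + 1)
        bins = np.digitize(values, edges[1:-1])   # subinterval index of each value
        if len(set(bins)) == n_int:               # at least one value per subinterval
            return edges
    return np.array([lo, hi])

g = [2.1, 3.5, 3.7, 5.0, 6.4, 8.8, 9.9]           # performances on one criterion
print(heur1_breakpoints(g))
```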

Once the marginal utilities for every break–point are estimated, the marginal utility of any criterion value can be found using a simple linear interpolation between the two adjacent break–points.
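For instance, the evaluation of a piece–wise linear marginal utility at an arbitrary criterion value can be sketched as follows (the break–points and the utilities estimated at them are illustrative):

```python
import numpy as np

breakpoints = [2.0, 4.0, 6.0, 8.0, 10.0]          # subinterval end-points of one criterion
utilities_at_bp = [0.00, 0.05, 0.12, 0.20, 0.25]  # estimated marginal utilities

def marginal_utility(x):
    # linear interpolation between the two break-points surrounding x
    return float(np.interp(x, breakpoints, utilities_at_bp))

print(marginal_utility(5.0))                      # 0.085, halfway between 0.05 and 0.12
```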

On the basis of this piece–wise linear modeling approach of the marginal


utility functions, MP1 is re–written in a linear form as follows: (LP1):

subject to:

Constraints (4.21)–(4.27) of LP1 correspond to the constraints (4.11)–


(4.17) of MP1. Therefore, the two problems are equivalent. Table 4.1 pre-
sents the dimensions of LP1. According to this table, the number of con-
straints in LP1 is defined by the number of alternatives in the reference set
and the number of evaluation criteria. The latter defines the number of
monotonicity constraints (4.27). The number of such constraints can be quite
significant in cases where there is a large number of criteria subintervals. For
instance, consider a two–group classification problem with a reference set of
50 alternatives evaluated along five criteria. Assuming that each criterion’s
values are divided into 10 subintervals, then LP1 has the following con-
straints:
1. 50 classification constraints [constraints (4.21)–(4.23)],
2. 2 normalization constraints [constraints (4.24)–(4.25)] and
3. (5 criteria)×(10 subintervals)=50 monotonicity constraints [constraint
(4.27)]

Almost half of the constraints in this simple case are monotonicity con-
straints determined by the number of criteria and the definition of the subin-
tervals. The increased number of these constraints increases the computa-
tional effort required to solve LP1. This problem can be easily addressed if
the monotonicity constraints are transformed to non–negativity constraints
(non–negativity constraints do not increase the computational effort in linear
programming). This transformation is performed using the approach pro-
posed by Siskos and Yannacopoulos (1985). In particular, new variables
are introduced representing the differences between the marginal utilities of
two consecutive break–points and

On the basis of these new incremental variables, constraints (4.27) are transformed into non–negativity constraints. The marginal utilities and
the global utilities can now be expressed in terms of the incremental vari-
ables
Marginal utilities:

where

Global utilities:

where denotes the subinterval into which the


performance of alternative on criterion belongs.
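The effect of this transformation can be sketched as follows (the increment values are illustrative): the marginal utility at a break–point is the cumulative sum of the non–negative increments, and the weight of the criterion is the sum of all its increments.

```python
import numpy as np

w = np.array([0.05, 0.07, 0.08, 0.05])           # increments of one criterion, all >= 0

# marginal utility at each break-point = cumulative sum of the increments
u_at_breakpoints = np.concatenate(([0.0], np.cumsum(w)))
print(u_at_breakpoints)                          # [0.   0.05 0.12 0.2  0.25]

# the criterion weight equals the marginal utility at the most preferred level
print(u_at_breakpoints[-1])                      # 0.25
```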
Other changes made in LP1 through the introduction of the incremental
variables w include the elimination of constraint (4.25) and the transforma-
tion of constraint (4.24) as follows:

According to all the above changes LP1 is now rewritten in a new form
(LP2) presented below. Table 4.2 illustrates the dimensions of the new prob-
lem.

subject to:

Comparing Tables 4.1 and 4.2, it is clear that LP2 has fewer constraints and n fewer variables than LP1. Thus the computa-
tional effort required to solve LP2 is significantly reduced. LP2 is the formu-
lation used to develop the additive utility classification model within the
context of the UTADIS method.
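To illustrate the structure of this type of formulation, the following sketch builds a deliberately tiny LP2-style instance (one criterion split into two equal subintervals, four alternatives, two groups and an assumed separation constant of 0.01) and solves it with scipy.optimize.linprog. The variable layout and the data are illustrative assumptions, not the notation of the formulation above; since the two groups have equal size, uniform error weights coincide with the group-size weighting used in LP2.

```python
# Tiny LP2-style instance (illustrative): the variables are the two increments
# w1, w2 of the single criterion, the utility threshold u1 separating groups
# C1 and C2, and one error variable per alternative.
import numpy as np
from scipy.optimize import linprog

x = np.array([0.9, 0.7, 0.4, 0.2])               # criterion values in [0, 1]
group = np.array([1, 1, 2, 2])                   # group labels of the reference set
delta = 0.01                                     # assumed separation constant

# global utility U(a) = c1(a)*w1 + c2(a)*w2 with break-points at 0, 0.5, 1
coef = np.column_stack([np.minimum(x, 0.5) / 0.5,
                        np.maximum(x - 0.5, 0.0) / 0.5])

m = len(x)
# variable vector: [w1, w2, u1, e_1, ..., e_m]; minimize the sum of the errors
c = np.r_[0.0, 0.0, 0.0, np.ones(m)]

A_ub, b_ub = [], []
for a in range(m):
    err = np.zeros(m); err[a] = 1.0
    if group[a] == 1:        # U(a) - u1 + e_a >= delta
        A_ub.append(np.r_[-coef[a], [1.0], -err]); b_ub.append(-delta)
    else:                    # U(a) - u1 - e_a <= -delta
        A_ub.append(np.r_[coef[a], [-1.0], -err]); b_ub.append(-delta)

A_eq = [np.r_[1.0, 1.0, 0.0, np.zeros(m)]]       # w1 + w2 = 1 (normalization)
b_eq = [1.0]
bounds = [(0, None), (0, None), (0, 1)] + [(0, None)] * m

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
print(res.x[:3], res.fun)   # increments, threshold and total error (zero here,
                            # since the two groups are perfectly separable)
```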

2.3 Model development issues


2.3.1 The piece–wise linear modeling of marginal utilities

The way that the piece–wise linear modeling of the marginal utility functions
is performed is quite significant for the stability and the performance of the
additive utility classification models developed through the UTADIS
method. This issue is related to the subintervals defined for each criterion’s
range and consequently to the number of incremental variables w of LP2.
In traditional statistical regression it is known that to induce statistically
meaningful estimates for a regression model consisting of n independent
variables the model development sample should have at least n+1 observa-
tions. Horsky and Rao (1984) emphasize the fact that this observation also
holds for mathematical programming approaches.
In the case of the UTADIS method every basic solution of LP2 includes
as many variables as the number of constraints
In addition, the optimal basic solution includes the utility thresholds (q–1
variables). Therefore, overall the optimal solution includes at most
of the incremental variables w. It is obvious that if a
large number of subintervals is determined such that the number of incre-
mental variables w exceeds t, then at least incremental variables w will
not be included in any basic solution of LP2 (they will be redundant). Such a
case affects negatively the developed model, increasing the instability of the
estimates of the true significance of the criteria.
One way to address this issue is to increase the number of constraints of
LP2. Such an approach has been used by Oral and Kettani (1989). The ap-
pendix of this chapter also presents a way that such an approach can be im-
plemented. Increasing the number of constraints, however, results in in-
creased computational effort required to obtain an optimal solution.
An alternative approach that is not subject to this limitation is to con-
sider appropriate techniques for determining how the criteria scale is divided
into subintervals. The heuristic HEUR1 presented earlier in this chapter is a
simple technique that implements this approach. However, this heuristic
does not consider how alternatives of different groups are distributed in each
criterion’s scale. To accommodate this valuable information, a new simple
heuristic can be proposed, which will be referred to as HEUR2. This heuris-
tic is performed for all quantitative criteria in four steps as follows:
Step 1: Rank–order all alternatives of the reference set according to
their performances on each quantitative criterion from the
least to the most preferred ones. Set the minimum acceptable
number of alternatives belonging into a subinterval equal to zero.

Step 2: Form all non–overlapping subintervals such that the


alternative, whose performance is equal to belongs to a differ-
ent group from the alternative whose performance is equal to

Step 3: Check the number of alternatives that lie into each subinterval
formed after step 2. If the number of alternatives in a subinterval is
less than then merge this subinterval with the precedent one
(this check is skipped when ).
Step 4: Check the consistency of the total number of subintervals formed
after step 3 for all criteria, as opposed to the size of the linear pro-
gram LP2, i.e. the number of constraints. If the number of subin-
tervals leads to the specification of more than
incremental variables w, then set
and repeat the process from step 3; otherwise the procedure ends.
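These four steps can be sketched as follows; the cap max_increments stands in for the LP-size check of step 4, and the handling of boundary values is a simplification introduced for illustration.

```python
# A sketch of HEUR2 under stated assumptions (not the only possible reading).
def heur2_breakpoints(values, groups, max_increments):
    max_increments = max(1, max_increments)
    ranked = sorted(zip(values, groups))              # step 1: rank-order

    cuts = [ranked[0][0]]                             # step 2: cut at group changes
    for (v1, c1), (v2, c2) in zip(ranked, ranked[1:]):
        if c1 != c2 and v2 != cuts[-1]:
            cuts.append(v2)
    if ranked[-1][0] != cuts[-1]:
        cuts.append(ranked[-1][0])

    c_min = 0                                         # step 1: minimum count = 0
    while True:
        kept, j = list(cuts), 1
        while j < len(kept) - 1:                      # step 3: merge sparse subintervals
            count = sum(kept[j] <= v < kept[j + 1] for v, _ in ranked)
            if c_min > 0 and count < c_min:
                del kept[j]                           # merge with the preceding one
            else:
                j += 1
        if len(kept) - 1 <= max_increments:           # step 4: size check
            return kept
        c_min += 1                                    # otherwise repeat from step 3
```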
The recent study of Doumpos and Zopounidis (2001) showed that under
several data conditions HEUR2 increases the stability of the developed addi-
tive utility classification models and contributes positively to the improve-
ment of their classification performance.

2.3.2 Uniqueness of solutions

The simple linear form of LP2 ensures the existence of a global optimum
solution. However, multiple optimal solutions often exist, a situation closely related to degeneracy in linear programming theory. The existence of multiple optimal solutions is most common when the groups are perfectly separable, i.e., when there is no group overlap. In such cases all error variables are zero.
ria subintervals is positively related to the existence of multiple optimal solu-
tions (as already mentioned, as the number of subintervals increases, the degrees of freedom of the developed additive utility model also increase, and so does its fitting ability). Even if the subintervals are defined
in an appropriate way, on the basis of the remarks pointed out in the previous
sub–section, this does not necessarily eliminate the degeneracy phenomenon
for LP2 and the existence of multiple optimal solutions.
In addition to the degeneracy phenomenon, it is also important to em-
phasize that even if a unique optimal solution does exist for LP2 its stability
needs to be carefully considered. A solution is considered to be stable if it is
not significantly affected by small tradeoffs to the objective function (i.e., if
near-optimal solutions are quite similar to the optimal one). The instability
of the optimal solution is actually the result of overfitting the developed ad-
ditive utility model to the alternatives of the reference set. This may affect
negatively the generalizing classification performance of the developed clas-

sification model. In addition to the classification performance issue, the in-


stability of the additive utility model also raises interpretation problems. If
the developed model is unstable then it is clearly very difficult to derive se-
cure conclusions on the contribution of the criteria in performing the classi-
fication of the alternatives (the criteria weights are unstable and therefore,
difficult to interpret).
These issues are addressed in the UTADIS method through a post–optimality analysis that follows the solution of LP2. The objective of post–optimality analysis is to explore the existence
of alternate optimal solutions and near optimal solutions. There are many
different ways that can be used to perform the post–optimality stage consid-
ering the parameters that are involved in the model development process.
These parameters include the constants and as well as the number of
criteria subintervals. The use of mathematical programming techniques pro-
vides increased flexibility in considering a variety of different forms for the
post–optimality analysis. Some issues worth considering in the post–optimality stage include:
1. The maximization of the constants. This implies a maximization of the minimum distance between the correctly classified alternatives and the utility thresholds, thus resulting in a clearer separation of the groups.
2. Maximization of the sum of the differences between the global utilities of the correctly classified alternatives and the utility thresholds. This
approach extends the previous point considering all differences instead
of the minimum ones.
3. Minimization of the total number of misclassified alternatives using the
error function (4.4).
4. Determination of the minimum number of criteria subintervals.
The appendix of this chapter describes the mathematical programming
formulations that can be used to address these issues during the post–
optimality stage. The formulations presented in the appendix can also be
used instead of LP2 to develop the optimal additive utility classification
model.
These post–optimality approaches consider either the technical parame-
ters of the model development process (cases 1 and 4) or alternative ways to
measure the quality of the developed classification model (cases 2 and 3).
Considering, however, the aforementioned issues regarding the stability
of the developed model and its interpretation, none of these approaches en-
sures the existence of a unique and stable solution. Consequently, the uncer-
tainty on the interpretation of the model is still an issue to be considered.

To overcome this problem the post–optimality stage performed in the


UTADIS method focuses on the investigation of the stability of the criteria
weights rather than on the consideration of the technical parameters of the
model development process. In particular, during the post–optimality stage
n+q–1 new linear programs are solved, each having the same form as LP2.
The solution of LP2 is used as input to each of these new linear programs to
explore the existence of other optimal or near optimal solutions. The objec-
tive function of problem t involves the maximization of the corresponding criterion weight (for t=1, 2, …, n) or of a utility threshold (for t > n). All
new solutions found during the post–optimality stage are optimal or near
optimal for LP2. This is ensured by imposing the following constraint:

where:
is the optimal value for the objective function of LP2,
is the value of the objective function of LP2 evaluated for any new solu-
tion obtained during the post–optimality stage.
is a small portion of (a tradeoff made to the optimal value of the ob-
jective function in order to investigate the existence of near optimal so-
lutions).
This constraint is added to the formulation of LP2, and the new linear
program that is formed, is solved to maximize either the criteria weights or
the utility thresholds as noted above.
Finally, the additive utility model used to perform the classification of
the alternatives is formed from the average of all solutions obtained during
the post–optimality stage.
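The overall post–optimality loop can be sketched as follows, where solve_lp2_with is a hypothetical helper (not part of the method's published formulations) assumed to rebuild LP2 with the given objective and additional constraints and to return the complete solution vector.

```python
# Sketch of the post-optimality stage (solve_lp2_with is a hypothetical callable).
def post_optimality(solve_lp2_with, n_criteria, n_groups, F_star, k=0.05):
    # near-optimality constraint: the LP2 objective value of any new solution
    # may not exceed the optimum F* plus a small trade-off (a portion k of F*)
    near_optimal = ("lp2_objective", "<=", F_star + k * F_star)

    solutions = []
    for t in range(1, n_criteria + n_groups):        # n + q - 1 linear programs
        if t <= n_criteria:
            objective = ("maximize", ("criterion_weight", t))
        else:
            objective = ("maximize", ("utility_threshold", t - n_criteria))
        solutions.append(solve_lp2_with(objective, [near_optimal]))

    # the final additive utility model averages all optimal or near-optimal solutions
    return [sum(values) / len(solutions) for values in zip(*solutions)]
```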
Overall, despite the problems raised by the existence of multiple optimal
solutions, it should be noted that LP2 provides consistent estimates for the
parameters of the additive utility classification model. The consistency prop-
erty for mathematical programming formulations used to estimate the pa-
rameters of a decision making model was first introduced by Charnes et al.
(1955). The authors consider a mathematical programming formulation to
satisfy the consistency property if it provides estimates of the model’s pa-
rameters that approximate (asymptotically) the true values of the parameters
as the number of observations (alternatives) used for model development in-
creases. According to the authors this is the most significant property that a
mathematical programming formulation used for model development should
have, since it ensures that the formulation is able to identify the true values
of the parameters under consideration, given that enough information is
available.

LP2 has the consistency property. Indeed, as new alternatives are added to an existing reference set, and given that these alternatives add new information (i.e., they are not dominated by alternatives already belonging to the reference set), they will add new non–redundant con-
straints in LP2. These constraints reduce the size of the feasible set. Asymp-
totically, for large reference sets, this will lead to the identification of a
unique optimal solution that represents the decision maker’s judgment policy
and preferential system.

3. THE MULTI–GROUP HIERARCHICAL DISCRIMINATION METHOD (MHDIS)
3.1 Outline and main characteristics
People often employ, sometimes intuitively, a sequential/hierarchical process to classify alternatives into groups using the available information and holistic judgments: for example, one first examines whether an alternative can be assigned to the best group; if not, the second best group is considered, and so on. This is the logic of the MHDIS method (Zopounidis and Doumpos, 2000c) and its main distinctive feature compared to the UTADIS method. A second major differ-
ence between the two methods involves the mathematical programming
framework used to develop the classification models. Model development in
UTADIS is based on a linear programming formulation followed by a post–
optimality stage. In MHDIS the model development process is performed
using two linear programs and a mixed integer one that gradually calibrate
the developed model so that it accommodates two objectives: (1) the mini-
mization of the total number of misclassifications, and (2) the maximization
of the clarity of the classification1. These two objectives are pursued through
a lexicographic approach, i.e., initially the minimization of the total number
of misclassifications is sought and then the maximization of the clarity of the
classification is performed. The common feature shared by both MHDIS and
UTADIS involves the form of the criteria aggregation model that is used to
model the decision maker’s preferences in classification problems, i.e., both
methods employ a utility–based framework.

1
This objective corresponds to the maximization of the variance among groups in tradi-
tional discriminant analysis; cf. chapter 2.

3.2 The hierarchical discrimination process


The MHDIS method proceeds progressively in the classification of the alter-
natives into the predefined groups. The hierarchical discrimination process
used in MHDIS consists of q–1 stages (Figure 4.7). Each stage k is consid-
ered as a two–group classification problem, where the objective is to dis-
criminate the alternatives of group from the alternatives of the other
groups. Since the groups are defined in an ordinal way, this is translated to
the discrimination of group from the set of groups
Therefore at each stage of the hierarchical discrimination process two
choices are available for the classification of an alternative:
1. To decide that the alternative belongs into group or
2. To decide that the alternative belongs at most in the group (i.e., it
belongs into one of the groups to

Within this framework the procedure starts from group (most pre-
ferred alternatives). The alternatives found to belong into group (correctly
or incorrectly) are excluded from further consideration. In a second stage the
objective is to identify the alternatives belonging into group Once again,
all the alternatives found to belong into this group (correctly or incorrectly)
are excluded from further consideration and the same procedure continues
until all alternatives are classified into the predefined groups.
The criteria aggregation model used to decide upon the classification of
the alternatives at each stage k of the hierarchical discrimination process, has
the form of an additive utility function, similar to the one used in UTADIS.

denotes the utility of classifying any alternative into group on


the basis of the alternative’s performance on the set of criteria g, while
denotes the corresponding marginal utility function regarding the
classification of any alternative into group according to a specific crite-
rion. Conceptually, the utility function provides a measure of the
similarity of the alternatives to the characteristics of group
Nevertheless, as noted above at each stage k of the hierarchical discrimi-
nation process there are two choices available for the classification of an al-
ternative, the classification into group and the classification at most into
group The utility function measures the utility (value) of the first
choice. To make a classification decision, the utility of the second choice
(i.e., classification at most into group ) needs also to be considered. This
is measured by a second utility function denoted by that has the same
form (4.35).
Based on these two utility functions the classification of an alternative
is performed using the following rules:

where denotes the set of alternatives belonging into groups


During model development the case
is considered to be a misclassification. When the devel-
oped additive utility functions are used for extrapolation purposes, such a
2
As noted in the UTADIS method, in this expression of the additive utility function the mar-
ginal utilities range in the interval where is the weight of criterion The
criteria weights sum up to 100%.

case indicates that the classification of the alternatives is not clear and addi-
tional analysis is required. This analysis can be based on the examination of
the marginal utilities and to determine how the performance of the alternatives on each evaluation criterion affects their classification.
In both utility functions and the corresponding marginal
utilities and are monotone functions on the criteria scale.
The marginal utility functions are increasing, whereas are
decreasing functions. This specification is based on the ordinal definition of
the groups. In particular, since the alternatives of group are considered to
be preferred to the alternatives of the groups to it is expected that the
higher the performance of an alternative on criterion the more similar the
alternative is to the characteristics of group (increasing form of the mar-
ginal utility function and the less similar is to the characteristics of
the groups to (decreasing form of the marginal utility function
).
The marginal utility functions are modeled in a piece–wise linear form,
similarly to the case of the UTADIS method. The piece–wise linear model-
ing of the marginal utility functions in the MHDIS method is illustrated in
Figure 4.8. In contrast to the UTADIS method, the criteria’s scale is not di-
vided into subintervals. Instead, the alternatives of the reference set are
rank–ordered according to their performance on each criterion. The perform-
ance of each alternative is considered as a distinct criterion level. For in-
stance, assuming that the reference set includes m alternatives each having a
different performance on criterion then m criterion levels are considered,
ordered from the least preferred one to the most
preferred one where denotes the number of
distinct criterion levels (in this example equal to m).
Denoting as and two consecutive levels of criterion
the monotonicity of the marginal utilities is imposed through
the following constraints (t is a small positive constant used to define the
smallest difference between the marginal utilities of and

where,

Thus, it is possible to express the global utility of an alternative in


terms of the incremental variables w as follows denotes the position of
within the rank ordering of the criterion levels from the least preferred one
to the most preferred one

While both UTADIS and MHDIS employ a utility–based modeling


framework, it should be emphasized that the marginal utility functions in
MHDIS do not indicate the performance of an alternative with regard to an
evaluation criterion; they rather serve as a measure of the conditional simi-
larity of an alternative to the characteristics of group (on the basis of a
specific criterion) when the choice among and all the lower (worse)
groups is considered. In this regard, a high marginal utility
would indicate that when considering the performance of alterna-
tive on criterion the most appropriate decision would be to assign the
alternative into group instead of the set of groups (the
overall classification decision depends upon the examination of all criteria).

This simple example indicates that the utilities used in MHDIS do not refer to the alternatives themselves, but rather to the appropriateness of the choices (classification decisions) available to the decision maker, measured on the basis of the alternatives’ performances on the evaluation criteria.

3.3 Estimation of utility functions


According to the hierarchical discrimination procedure described above, the
classification of the alternatives in q classes requires the development of
2(q–1) utility functions. The estimation of these utility functions in MHDIS
is accomplished through mathematical programming techniques. In particu-
lar, at each stage of the hierarchical discrimination procedure, two linear pro-
grams and a mixed–integer one are solved to estimate “optimally” both
utility functions3. The term “optimally” refers to the classification of the al-
ternatives of the reference set, such that: (1) the total number of misclassifi-
cations is minimized and (2) the clarity of the classification is maximal.
These two objectives are addressed lexicographically through the se-
quential solution of two linear programming problems (LP1 and LP2) and a
mixed–integer programming problem (MIP). Essentially, the rationale be-
hind the sequential solution of these mathematical programming problems is
the following. As noted in the discussion of the UTADIS method the direct
minimization of the total classification error (cf. equations (4.4) or (4.5)) is
quite a complex and demanding problem from a computational effort point
of view. To cope with this problem in UTADIS an approximation was intro-
duced (cf. equation (4.6)) considering the magnitude of the violations of the
classification rules, rather than the number of violations, which defines the
classification error rate. As noted, this approximation overcomes the prob-
lem involving the computational intensity of optimizing the classification error
rate. Nevertheless, the results obtained from this new error function are not
necessarily optimal when the classification error rate is considered. To ad-
dress these issues MHDIS combines the approximation error function (4.6)
with the actual classification error rate. In particular, initially an error func-
tion of the form of (4.6) is employed to identify the alternatives of the refer-
ence set that are hard to classify correctly (i.e., they are misclassified). This
is performed through a linear programming formulation (LP1). Generally,
the number of these alternatives is expected to be a small portion of the
number of alternatives in the reference set. Then, a more direct error mini-

3
Henceforth, the discussion focuses on the development of a pair of utility functions at
stage k of the hierarchical discrimination process. The first utility function character-
izes the alternatives of group whereas the second utility function characterizes
the alternatives belonging in the set of groups The same process ap-
plies to all stages k=1, 2,..., q–1 of the hierarchical discrimination process.

mization approach is used considering only this reduced set of misclassified


alternatives. This approach considers the actual classification error (4.4). The
fact that the analysis at this stage focuses only on a reduced part of the reference
set (i.e. the misclassified alternatives) significantly reduces the computa-
tional effort required to minimize the actual classification error function
(4.4). The minimization of this error function is performed through a MIP
formulation. Finally, given the optimal classification model obtained through
the solution of MIP, a linear programming formulation (LP2) is employed to
maximize the clarity of the obtained classification without changing the
groups into which the alternatives are assigned. The details of this three–step
process are described below, along with the mathematical programming
formulations used at each step.
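The stage-wise sequence can be sketched as follows. The callables passed in (solve_lp1, solve_mip, solve_lp2, split_by_rule, assigned_to_group_k) are hypothetical stand-ins for the mathematical programs and the classification rule described above, not part of the method's published notation.

```python
# Sketch of the hierarchical model development loop (hypothetical helpers
# supplied as arguments).
def mhdis_train(reference_set, q, solve_lp1, solve_mip, solve_lp2,
                split_by_rule, assigned_to_group_k):
    utility_pairs = []
    remaining = list(reference_set)                  # alternatives still unclassified
    for k in range(1, q):                            # stages 1, ..., q-1
        # LP1: minimize the magnitude of the classification errors
        U_k, U_not_k = solve_lp1(remaining, k)
        correct, missed = split_by_rule(remaining, k, U_k, U_not_k)   # COR and MIS
        # MIP: minimize the number of misclassifications, touching only set MIS
        U_k, U_not_k = solve_mip(remaining, k, correct, missed, start=(U_k, U_not_k))
        # LP2: maximize the minimum distance d without altering any assignment
        U_k, U_not_k = solve_lp2(remaining, k, U_k, U_not_k)
        utility_pairs.append((U_k, U_not_k))
        # alternatives assigned to group C_k (correctly or not) leave the process
        remaining = [a for a in remaining
                     if not assigned_to_group_k(a, k, U_k, U_not_k)]
    return utility_pairs
```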

LP1: Minimizing the overall classification error

The initial step in the model development process is based on a linear pro-
gramming formulation. In this formulation the classification errors are con-
sidered as real–valued variables, defined similarly to the error variables
and used in the UTADIS method. In the case of the MHDIS method
these error variables are defined through the classification rule (4.36):

Essentially, the error indicates the misclassification of an alternative


towards a lower (worst) group compared to the one that it actually belongs,
whereas the error indicates a misclassification towards a higher (better)
group. Both errors refer to a specific stage k of the hierarchical model devel-
opment process.
On the basis of the above considerations, the initial linear program (LP1)
to be solved is the following:

Subject to:

s, t small positive constants

Constraints (4.39) and (4.40) define the classification error variables


and These constraints are formulated on the basis of the classification
rule (4.36) and the global utility functions (4.37). In the right–hand side of
these constraints a small positive constant s is used to impose the inequalities
of the classification rule (4.36). This constant is similar to the constants
and used in the linear programming formulation of the UTADIS method.
The set of constraints defined in (4.41) is used to ensure the monotonicity of
the marginal utility functions, whereas the set of constraints in (4.42) nor-
malize the global utility to range between 0 and 1.

MIP: Minimizing the number of misclassifications

The solution of LP1 leads to the development of an initial pair of utility


functions and that discriminate group from the groups to
These utility functions define a classification of the alternatives in the
reference set that is optimal considering the classification error measured in
terms of the real–valued variables and . When the classification error
rate is considered, however, these utility functions may lead to sub–optimal
results. Nevertheless, this initial pair of utility functions enables the identifi-
cation of the alternatives that can be easily classified correctly and the
“hard” alternatives. The “hard” alternatives are the ones misclassified by the
pair of utility functions developed through the solution of LP1. Henceforth,
the set of alternatives classified correctly by LP1 will be denoted by COR,
whereas the set of misclassified alternatives will be denoted by MIS.
Given that the set MIS includes at least two alternatives, it is possible to
achieve a “re–arrangement” of the magnitude of the classification errors
and for the misclassified alternatives (alternatives of MIS) that will lead

to the reduction of the number of misclassifications. The example discussed


earlier in sub–section 3.3 of this chapter is indicative of this possibility.
However, as already noted, to consider the number of misclassifications, binary 0–1 error variables need to be introduced in a MIP context. To
avoid the increased computational effort required to solve MIP problems, the
MIP formulation used in MHDIS considers only the misclassifications that
occur through the solution of LP1, while retaining all the correct classifica-
tions. Thus, it becomes apparent that LP1 is actually an exploratory problem
whose output is used as input information to MIP. This reduces significantly
the number of binary 0–1 variables, which are associated to each misclassi-
fied alternative, thus alleviating the computational effort required to obtain a
solution.
While this sequential consideration of LP1 and MIP considerably re-
duces the computational effort required to minimize the classification error
rate, it should be emphasized that the obtained classification model may be
near optimal instead of globally optimal. This is due to the fact that MIP in-
herits the solution of LP1. Therefore, the number of misclassifications at-
tained after solving MIP depends on the optimal solution identified by LP1
(i.e., different optimal solutions of LP1 may lead to different numbers of misclassifications by MIP). Nevertheless, using LP1 as a pre–processing stage that provides input to MIP is an efficient mechanism (in terms of computational effort) for obtaining an approximation of the globally minimum
number of misclassifications. Formally, MIP is expressed as follows:

Subject to:

s, t small positive constants

The first set of constraints (4.45) is used to ensure that all correct classi-
fications achieved by solving LP1 are retained. The second set of constraints
(4.46) is used only for the alternatives that were misclassified by LP1 (set
MIS). Their interpretation is similar to the constraints (4.39) and (4.40) in
LP1. Their only difference is the transformation of the real-valued error
variables and of LP1 into the binary 0–1 variables and that
indicate the classification status of an alternative. Constraints (4.46) define
these binary variables as follows: indicates that the alternative of
group is classified by the developed model into the set of groups
whereas indicates that the alternative belonging into one of the
groups to is classified by the developed model into group
Both cases are misclassifications. On the contrary the cases and
indicate the correct classification of the alternative The interpre-
tation of constraints (4.47) and (4.48) has already been discussed for the LP1
formulation. The objective of MIP involves the minimization of a weighted
sum of the error variables and The weighting is performed consid-
ering the number of alternatives of set MIS from each group This is de-
noted by

LP2: Maximizing the minimum distance

Solving LP1 and then MIP leads to the “optimal” classification of the alter-
natives, where the term “optimal” refers to the minimization of the number
of misclassified alternatives. However, it is possible that the correct classifi-
cation of some alternatives is “marginal”. This situation appears when the
classification rules (4.36) are marginally satisfied, i.e., when there is only a
slight difference between and For instance, assume a pair of
utility functions developed such that for an alternative of group its

global utilities are and Given these utilities and


considering the classification rules (4.36), it is obvious that alternative is
classified in the correct group This is, however, a mar-
ginal result. Instead, another pair of utility functions for which
and is clearly preferred, providing a more specific conclusion.
This issue is addressed in MHDIS through a third mathematical pro-
gramming formulation used on the basis of the optimal solution of MIP. At
this stage the minimum difference d between the global utilities of the cor-
rectly classified alternatives identified after solving MIP is introduced.

where COR' denotes the set of alternatives classified correctly by the pair of
utility functions developed through the solution of MIP. The objective of this
third phase of the model development procedure is to maximize d. This is
performed through the following linear programming formulation (LP2).

Subject to:

s, t small positive constants



The first set of constraints (4.51) involves only the correctly classified
alternatives. In these constraints d represents the minimum absolute differ-
ence between the global utilities of each alternative according to the two util-
ity functions. The second set of constraints (4.52) involves the alternatives
misclassified after the solution of MIP (set MIS' ) and it is used to ensure
that they will be retained as misclassified.
After the solution of LP1, MIP and LP2 at stage k of the hierarchical dis-
crimination process, the “optimal” classification is achieved between the
alternatives belonging into group and the alternatives belonging into the
groups The term “optimal” refers to the number of misclassifications
and to the clarity of the obtained discrimination. If the current stage k is the
last stage of the hierarchical discrimination process (i.e., k=q–1) then the
model development procedure stops since all utility functions required to
classify the alternatives, have been estimated. Otherwise, the procedure pro-
ceeds to stage k+1, in order to discriminate between the alternatives belong-
ing into group and the alternatives belonging into the lower groups
In stage k+1, all alternatives classified into group by the pair of utility functions developed at stage k are no longer considered. Consequently, a new reference set A' is formed, including all alternatives that remain unclassified in a specific group (i.e., the alternatives classified at stage k in the set of groups). According to the set A', the values of and are updated, and the procedure proceeds with solving once again LP1, MIP and LP2.

3.4 Model extrapolation


The classification of a new alternative is performed by descending
the hierarchy of Figure 4.7. Initially, the first two additive utility functions
and are used to determine whether the new alternative be-
longs into group or not. If then and the proce-
dure stops, while if then and the procedure pro-
ceeds with the consideration of the next pair of utility functions and
If then and the procedure stops, while if
then and the procedure continues in the same
way until the classification of the new alternative is achieved.
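The descent of the hierarchy can be sketched as follows, assuming that each trained pair of utility functions is available as a pair of callables returning the two global utilities of rule (4.36); the names are illustrative.

```python
def mhdis_classify(alternative, utility_pairs):
    """Assign a new alternative by descending the hierarchy of utility pairs;
    each pair (U_k, U_not_k) returns the two global utilities of rule (4.36)."""
    q = len(utility_pairs) + 1
    for k, (U_k, U_not_k) in enumerate(utility_pairs, start=1):
        if U_k(alternative) > U_not_k(alternative):
            return k                  # assigned to group C_k; the procedure stops
    return q                          # otherwise it falls into the last group C_q
```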
To estimate the global utility of a new alternative the partial value
(marginal utility) of this alternative on each one of the evaluation criteria
needs to be determined. Assuming that the performance of on criterion
lies between the performances of two alternatives of the reference set, i.e.,
denotes the number of distinct criterion
levels at stage k of the hierarchical discrimination process), then the mar-

ginal utilities and need to be estimated through linear interpo-


lation (cf. Figure 4.8).

APPENDIX

POST–OPTIMALITY TECHNIQUES FOR CLASSIFICATION MODEL DEVELOPMENT IN THE UTADIS METHOD

This appendix presents different techniques that can be implemented during


the post–optimality stage within the context of the UTADIS method. Fur-
thermore, the presented mathematical programming formulations can also be
employed in the first stage of the model development instead of LP2.
Throughout the discussion made in this appendix, the following notation will
be used:
the optimal (minimum) value of the objective function of LP2,
z: a trade–off to the optimal value made to explore the existence of
near optimal solutions (z is a small portion of
COR: the set of alternatives classified correctly according to the additive
utility model defined by the solution of LP2,
MIS: the set of alternatives misclassified by the additive utility model de-
fined on the basis of the solution of LP2.

Maximization of the minimum difference between the global utilities of the correctly classified alternatives and the utility thresholds
The objective in this approach is to identify an alternative optimal or near
optimal additive utility classification model that maximizes the minimum
difference between the global utilities of the correctly classified alternatives
from the utility thresholds. Formally, the minimum difference d is defined as
follows:

The maximization of the difference d eliminates the problem of defining


the values of the constants and Initially, these constants can be set to
an arbitrarily small positive value (for instance 0.001). Then the maximization
of d is performed through the following linear program.

Max d

Subject to:

Constraints (4.30)–(4.32) of LP2,


Constraints (4.33)–(4.34) of LP2

where:

Constraints (A1)–(A3) define the minimum difference d. These con-


straints apply only to alternatives that are correctly classified according to
the solution (additive utility model) of LP2 (set COR). For the misclassified
alternatives (set MIS) the constraints (4.30)–(4.32) of LP2 are retained. Con-
straints (4.33)–(4.34) of LP2 are also retained to normalize the developed
utility functions and to ensure the ordering of the groups. Constraint (A4) is

used to ensure that the classification error of the new additive utility model
developed through the solution of the above linear program does not exceed
the trade–off made on the optimal classification error defined on the basis
of the optimal solution for LP2.
The maximization of the minimum difference d can also be incorporated
into the formulation of LP2 as a secondary goal for model development (the
primary goal being the minimization of the classification error). In this case
few revisions are required to the above linear program, as follows:
1. No distinction is made between the sets COR and MIS and consequently
constraints (A1)–(A3) apply to all the alternatives of the reference set.
2. In constraints (A1)–(A3) the classification errors and are intro-
duced similarly to the constraints (4.30)–(4.32) of LP2.
3. Constraint (A4) is eliminated.
4. The new objective takes the form:

where and are weighting parameters for the two goals (minimization
of the classification error and maximization of the minimum difference)
defined such
5. During the post–optimality stage described in sub–section 2.2.3 a new
constraint is added: where:
is the maximum difference defined by solving the above linear pro-
gram considering the revisions 1-4,
is a trade-off made over to explore the existence of near optimal
solutions ( is a small portion of ).
d is the maximum difference defined through the solution of each lin-
ear program formed during the post-optimality stage described in
sub-section 2.2.3.

Similarly to the consideration of the minimum difference d (minimum


correct classification), it is also possible to consider the maximum difference d' between the global utilities of the misclassified alternatives and the utility thresholds. Essentially, the maximum difference d' represents the maximum individual classification error. A combination of the two differences is also possible.

Maximization of the sum of differences between the global utilities of the correctly classified alternatives and the utility thresholds
The consideration of the differences between the global utilities of the alter-
natives and the utility thresholds on the basis of the maximum or minimum
operators as described in the previous approach has been shown to be rather
sensitive to outliers (Freed and Glover, 1986). To address this issue it is pos-
sible to use instead of the maximum/minimum operator an metric
distance measure. This involves the sum of all differences between the alternatives’ global utilities and the utility thresholds, considering only the alternatives classified correctly by the additive utility classification model developed through LP2. The differences are defined similarly to the classifica-
tion errors and as follows:

The development of an additive utility classification model that maxi-


mizes the sum of these differences, given the classification of the alterna-
tives belonging into the reference set by LP2, is sought through the solution
of the following linear program:

Subject to:

Constraints (4.30)–(4.32) of LP2,


Constraints (4.33)–(4.34) of LP2

where:

Similarly to the case of LP2, the objective function of the above linear
program considers a weighted sum of the differences and In particu-
lar, the differences are weighted to account for variations in the number of
alternatives of each group in the reference set.
Constraints (A5)–(A7) define the differences and for the alterna-
tives classified correctly after the solution of LP2 (set COR). The uncon-
trolled maximization of these differences, however, may lead to unexpected
results regarding the classification of alternatives belonging into intermedi-
ate groups In particular, given that any intermediate
group is defined by the upper utility threshold and the lower utility
threshold, the maximization of the difference for an alternative classified correctly by LP2 may lead to the estimation of a global utility that exceeds the utility threshold (the upper boundary of the group), in which case the alternative is misclassified. A similar
phenomenon may also appear when the difference is
maximized (the new estimated global utility may violate the utility
threshold i.e., the lower boundary of group ). To avoid these cases the
differences and should not exceed the range of each group where
the range is defined by the difference Constraints (A8) introduce
these appropriate upper bounds for the differences and

For the alternatives misclassified by LP2 (set MIS) the constraints


(4.30)–(4.32) of LP2 apply. Constraints (4.33)–(4.34) of LP2 are also re-
tained to normalize the developed utility functions and to ensure the ordering
of the groups. Constraint (A9) is used to ensure that the classification error
of the new additive utility classification model complies with the trade–off
specified for the minimum classification error defined according to the solu-
tion of LP2.
Doumpos and Zopounidis (1998) presented a similar approach (UTADIS
I method), which introduces the sum of differences and as a secon-
dary goal in the objective function of LP2 for the development of the optimal
additive utility classification model. In that case, however, appropriate
weights should be specified for the two goals (minimization of the classifica-
tion errors and maximization of the differences) such that higher significance
is attributed to the minimization of the classification errors. The above ap-
proach overcomes the requirement for the specification of the appropriate
weighting parameters.

Minimization of the total number of misclassifications


As already mentioned in sub–section 2.2.1 of this chapter the error function
considered in the objective of LP2 is an approximation of the actual classifi-
cation error defined by equation (4.4). The use of this approximation is im-
posed by the increased computational effort required for the optimization of
the classification error function (4.4). Zopounidis and Doumpos (1998) pre-
sented a variation of the UTADIS method, the UTADIS II method that con-
siders the number of misclassified alternatives instead of the magnitude of
the classification errors. In this case all the error variables used in LP2 are transformed into binary 0–1 variables designating the classification status of each alternative (0 designates correct classification and 1 misclassification). The resulting mathematical programming formulation is then a mixed integer one with bi-
nary 0–1 variables (cf. Table 4.2 for the dimensions of LP2). Nevertheless,
when there is significant group overlap (even for small reference sets), the
minimization of the actual classification error (4.4) is a quite cumbersome
process (from a computational effort point of view). This problem becomes
even more significant considering that during the post–optimality analysis
stage as described in sub–section 2.2.3 of this chapter a series of similar
mixed integer programming problems needs to be solved. Considering the
minimization of the classification error function (4.4) in a post–optimality
context, given that a solution of LP2 is obtained, provides a good approach
for reducing the required computational effort. In this case, the MIP problem
to be solved is formulated as follows:

Subject to:

Constraints (4.33)–(4.34) of LP2

where:

Essentially, the above MIP formulation explores the possibility of reducing


the number of alternatives misclassified by LP2 (set MIS), without affecting
the classification of the correctly classified alternatives (set COR). This is a
similar approach to the one used in MHDIS.
Constraints (A10)–(A12) apply only to the alternatives of the set COR (alternatives classified correctly by LP2). These constraints ensure that the classification of these alternatives will remain unaffected. Constraints (A13)–(A15) apply to the misclassified alternatives of the set MIS (i.e., alternatives

misclassified by LP2). For these alternatives it is explored whether or not it is possible to correct the classification of some of them. The binary 0–1 vari-
ables and designate the classification status for each of these alter-
natives, as follows:

According to the solution of LP2 it is expected that the set MIS is a small
portion of the whole reference set. Therefore, the number of binary 0–1 vari-
ables and is considerably lower than the number of the correspond-
ing variables if all the alternatives of the reference set are considered (in this
case one should use binary 0–1 variables). This reduc-
tion in the number of binary variables is associated with a significant reduction
in the computational effort required to obtain an optimal solution. However,
the computational complexity problem still remains for large data sets. For
instance, considering a reference set consisting of 1,000 alternatives for
which LP2 misclassifies 100 alternatives (10% error), then 100 binary 0–1
variables should be introduced in the above mixed integer programming
formulation (assuming a two group classification problem). In this case sig-
nificant computational resources will be required to find an optimal solution.
The use of advanced optimization techniques, such as genetic algorithms or heuristics (tabu search; cf. Glover and Laguna, 1997), constitutes a promising approach to tackle this problem. Overall, however, it should be em-
phasized that the optimal fit of the model to the data of the reference set does
not ensure high generalizing ability. This issue needs careful investigation.

Determination of the minimum number of criteria subintervals
The specification of the criteria subintervals during the piece–wise linear
modeling of the marginal utility functions is an issue of major importance in
the UTADIS method. The discussion of this issue in this chapter has been
focused on two simple heuristic approaches (HEUR1 and HEUR2; cf. sub-
section 2.2.3). Alternatively, it is also possible to calibrate the criteria subin-
tervals through a linear programming approach. Initially, the scales of the
quantitative criteria are not divided into any subinterval. LP2 is used to de-

velop an optimal additive utility classification model and then at the post–
optimality stage the following linear program is solved:

Subject to:

Constraints (4.30)–(4.34) of LP2

The objective of the above linear program is to minimize the differences


in the slopes of two consecutive linear segments of the piece–wise linear
marginal utility functions (cf. Figure 4.5), such that the classification error of
the new additive utility model complies with the trade–off specified for the
optimal error (cf. constraint (A17)). The minimization of the differences
in the slopes corresponds to the elimination of the unnecessary criteria subin-
tervals (subintervals with equal slopes of the marginal utility segments can
be merged into one subinterval).
The slope of a linear segment of the piece–wise linear marginal utility
function between two consecutive criterion values and is defined as
follows:

On the basis of this expression, constraint (A16) defines the difference


between the slopes and of two consecutive linear segments of

the piece–wise linear marginal utility function of criterion The deviational


variables used to account for these differences are denoted by and

Nevertheless, assuming that the performance of each alternative of the


reference set defines a new level at the criterion scale (a new point for the
specification of the piece–wise linear form of the marginal utility function),
the above linear program will have too many variables (incremental vari-
ables w) that will increase the computational effort for determining the optimal solution. This problem can be easily addressed by using one of the heuristics HEUR1 and HEUR2 to determine an initial set of criteria subintervals and then proceeding with the above post–optimality approach to determine a minimum number of subintervals.
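The elimination of unnecessary subintervals can be sketched as follows: break–points at which the slopes of the two adjacent linear segments coincide (up to an assumed numerical tolerance) are dropped; the data are illustrative.

```python
# Sketch: merge consecutive subintervals whose linear segments share the same
# slope, keeping only the break-points where the slope genuinely changes.
def merge_equal_slopes(breakpoints, utilities, tol=1e-6):
    kept_bp, kept_u = [breakpoints[0]], [utilities[0]]
    for j in range(1, len(breakpoints) - 1):
        s_prev = (utilities[j] - kept_u[-1]) / (breakpoints[j] - kept_bp[-1])
        s_next = (utilities[j + 1] - utilities[j]) / (breakpoints[j + 1] - breakpoints[j])
        if abs(s_prev - s_next) > tol:            # a genuine change of slope
            kept_bp.append(breakpoints[j])
            kept_u.append(utilities[j])
    kept_bp.append(breakpoints[-1])
    kept_u.append(utilities[-1])
    return kept_bp, kept_u

print(merge_equal_slopes([0, 1, 2, 4], [0.0, 0.1, 0.2, 0.5]))
# ([0, 2, 4], [0.0, 0.2, 0.5]): the first two segments share slope 0.1
```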
Chapter 5
Experimental comparison of classification techniques

1. OBJECTIVES
The existing research in the field of MCDA classification methods has been
mainly focused on the development of appropriate methodologies for sup-
porting the decision making process in classification problems. At the practi-
cal level, the use of MCDA classification techniques in real-world classifica-
tion problems has demonstrated the capabilities that this approach provides
to decision makers.
Nevertheless, the implementation in practice of any scientific development is always the last stage of research. Before this stage, experiments need to be performed in a laboratory environment, under controlled data conditions, in order to investigate the basic features of the scientific development under consideration. Such an investigation and the corresponding experimental analysis enable the derivation of useful conclusions on the potential that the proposed research has in practice and on the possible problems that may be encountered during its practical implementation.
Within the field of MCDA, experimental studies are rather limited. Some
MCDA researchers conducted experiments to investigate the features and
peculiarities of some MCDA ranking and choice methodologies (Stewart,
1993, 1996; Carmone et al., 1997; Zanakis et al., 1998). Comparative studies
involving MCDA classification techniques have been heavily oriented to-
wards the goal programming techniques discussed in Chapter 3. Such com-
parative studies tried to evaluate the efficiency of goal programming classi-
fication formulations as opposed to traditional statistical classification tech-
niques, such as LDA, QDA and LA.

The present chapter follows this line of research to investigate the classi-
fication performance of the preference disaggregation methods presented in
Chapter 4, as opposed to other widely used classification techniques, some of which have been discussed in Chapters 2 and 3. The investigation is based
on an extensive Monte Carlo simulation experiment.

2. THE CONSIDERED METHODS


Every study investigating the classification performance of a new methodology relative to other techniques should consider techniques which are: (1) representative of a wide set of alternative methodological approaches, (2) well established among researchers, and (3) non-overlapping (i.e., the considered techniques should rely on different underlying assumptions and functionality).
On the basis of these remarks, the experimental investigation of the classifi-
cation performance of the UTADIS and the MHDIS methods considers five
other classification techniques:
1. Linear discriminant analysis.
2. Quadratic discriminant analysis.
3. Logit analysis.
4. Rough sets.
5. ELECTRE TRI.
The two forms of discriminant analysis (linear and quadratic) are among
the most widely used classification techniques. Despite their shortcomings
(cf. Chapter 2), even today they are still often used in many fields for study-
ing classification problems. Also, they often serve as benchmarks in comparisons investigating the classification performance of new techniques developed in the fields of operations research and artificial
intelligence. The use of these techniques in comparative studies should come as no surprise. Indeed, considering the fact that LDA and QDA provide the op-
timal classification rule when specific assumptions are met regarding the
statistical properties of the data under consideration (multivariate normality,
known group variance-covariance matrices), their consideration in compara-
tive studies enables the analysts to investigate the ability of new techniques
to compete with a theoretically optimal approach.
Logit analysis (LA) has been developed as an alternative to LDA and
QDA, following an econometric approach. Its consideration in the experi-
mental comparison presented in this chapter is due to its theoretical advan-
tages over the two forms of discriminant analysis in combination with its
extensive use in addressing classification problems in many disciplines. The
ordered logit model is used to apply LA in the present analysis.

The last two methods considered in this experimental comparison are examples of non-parametric classification techniques. The rough set theory
has evolved rapidly over the last two decades as a broad discipline of opera-
tions research and artificial intelligence. During its development, the rough
set theory has found several connections with other disciplines such as neural
networks, fuzzy set theory, and MCDA. In contrast to all the other methods
considered in the conducted experimental comparison, the rough set ap-
proach develops a symbolic classification model expressed in the form of
decision rules. Therefore, the consideration of rough sets enables the investi-
gation of an alternative approach to address the classification problem on the
basis of rule induction. The implementation of the rough set approach in this
study is performed through the MODLEM algorithm (Grzymala–Busse and
Stefanowski, 2001) and the LERS classification system (Grzymala–Busse,
1992). As noted in Chapter 2, the MODLEM algorithm is well suited to clas-
sification problems involving quantitative criteria (attributes) without requir-
ing the implementation of a discretization process. On the other hand, the
LERS system provides the basis to overcome problems such as the conflicts
that may be encountered in the classification of an alternative covered by
rules providing different recommendations, or the classification of an alter-
native that is not covered by any rule. Of course, the value closeness relation
(Slowinski, 1993) could have been used instead of the classification scheme
of the LERS system. Nevertheless, the implementation of the value close-
ness relation approach requires that the decision maker specifies some in-
formation that is necessary to construct the closeness relation. Since this is
an experimental comparison, there is no decision maker that can specify this
information and consequently the use of the value closeness relation is quite
cumbersome.
Finally, the ELECTRE TRI method is the most representative example
of the MCDA approach in addressing the classification problem. In contrast
to the UTADIS and the MHDIS methods, ELECTRE TRI originates from
the outranking relation approach of MCDA. Its major distinctive features as
opposed to the UTADIS and the MHDIS methods involve its non-compensatory character [1] and the modeling of the incomparability relation.
None of the other methods considered in this experimental design has these
two features.
[1] Compensatory approaches lead to the development of criteria aggregation models consid-
ering the existing trade-offs between the evaluation criteria. Techniques based on the util-
ity theory approach have a compensatory character. On the other hand, non-compensatory
approaches involve techniques that do not consider the trade-offs between the evaluation
criteria. Typical examples of non-compensatory approaches are lexicographic models,
conjunctive/disjunctive models, and techniques based on the outranking relation approach
that employ the veto concept.

Typically, the application of the ELECTRE TRI method requires the decision maker to specify several parameters (cf. Chapter 3). This
is impossible in this experimental comparison, since there is no decision
maker to interact with. To tackle this problem a new procedure has been de-
veloped allowing the specification of the parameters of the outranking rela-
tion constructed through ELECTRE TRI, using the preference disaggrega-
tion paradigm. The details of this procedure are discussed in the appendix of
this chapter.
Of course, in addition to the above techniques, other classification meth-
odologies could have also been used (e.g., neural networks, goal program-
ming formulations, etc.). Nevertheless, introducing additional classification
techniques in this experimental comparison, bearing in mind the already increased size of the experiment, would make the results difficult to analyze.
Furthermore, as already noted in Chapters 2 and 3, there have been several
comparative studies in these fields involving the relative classification per-
formance of the corresponding techniques as opposed to the statistical tech-
niques used in the analysis of this chapter. Therefore, the results of this com-
parative analysis can be examined in conjunction with the results of previous
studies to derive some conclusions on the classification efficiency of the
MCDA classification techniques compared to a variety of other non-
parametric techniques.

3. EXPERIMENTAL DESIGN

3.1 The factors


The comparison of the MCDA classification methods, presented in Chapter
4, to the methods noted in the previous sub-section (LDA, QDA, LA, rough
sets, ELECTRE TRI) is performed through an extensive Monte Carlo simu-
lation. The simulation approach provides a framework to conduct the com-
parison under several data conditions and derive useful conclusions on the
relative performance of the considered methods given the features and prop-
erties of the data. The term performance refers solely to the classification
accuracy of the methods. Of course, it should be emphasized that given the
orientation of MCDA methods towards providing support to decision mak-
ers, a comparison of MCDA classification techniques, should also consider
the interaction between the decision maker and the method itself. However,
the participation of an actual decision maker in an experimental analysis is
rather difficult. Consequently, the experiment presented in this chapter is
only concerned with the investigation of the classification accuracy of
MCDA methods on experimental data conditions.

In particular, the conducted simulation study investigates the performance of the methods on the basis of the following six factors:
1. The statistical distribution of data.
2. The number of groups.
3. The size of the reference set (training sample).
4. The correlation of the evaluation criteria.
5. The structure of the group variance-covariance matrices.
6. The degree of group overlap.
Table 5.1 presents the levels considered for each factor in the simulation
experiment. As indicated in the table the ELECTRE TRI and UTADIS
methods are both applied in two ways. In particular, ELECTRE TRI is ap-
plied with and without the discordance test (veto) in order to investigate the
impact of the veto concept on the efficiency of the method. The introduction
of the veto concept is the major distinguishing feature of the ELECTRE TRI
method (and the outranking relation approach in general) as opposed to other
MCDA methodologies employing a compensatory approach (e.g., UTADIS,
MHDIS methods).

For the UTADIS method the two heuristic approaches (HEUR1 and HEUR2) presented in Chapter 4 for the definition of the criteria sub-intervals
during the piece-wise modeling of the marginal utility functions are em-
ployed. Using the two heuristic approaches enables the investigation of their
impact on the classification accuracy of the developed additive utility classi-
fication models.
All methods defined by the first factor (F1) are compared, in terms of their classification accuracy, under the different data conditions defined by the remaining factors F2 to F7.
Factor F2 specifies the statistical distribution of the data (i.e., the distribution
of the performances of the alternatives on the evaluation criteria). Most of
the past studies conducting similar experiments have been concentrated on
univariate distributions. Through univariate distributions, however, it is dif-
ficult to model the correlations between the criteria, which are an important
issue in the model development process. In the multivariate case the majority
of the existing studies focus only on the multivariate normal distribution
which is easy to model [2]. In contrast, this experimental comparison con-
siders a rich set of multivariate distributions, including four cases. In particu-
lar, the first two of the multivariate distributions that are considered (normal
and uniform) are symmetric, while the exponential [3] and log-normal distribu-
tions are asymmetric, thus leading to a significant violation of multivariate
normality. The methodology used to simulate these multivariate distributions
is presented in the subsequent sub-section.
Factor F3 defines the number of groups into which the classification of
the objects is made. Most of the existing experimental studies consider only
the two-group case. In real-world problems, however, multi-group classifica-
tion problems are often encountered. A multi-group classification scheme
adds more flexibility to the decision making process as opposed to the strict
two-group classification framework. In this experimental design two-group
and three-group classification problems are considered. This specification
enables the derivation of useful conclusions on the performance of the meth-
ods in a wide range of situations that are often met in practice (many real-
world classification problems involve three groups).
Factor F4 is used to define the size of the reference set (training sample), and in particular the number of alternatives that it includes (henceforth this number is denoted by m). The factor has three levels corresponding to 36, 72 and 108 alternatives, distributed equally among the groups defined by factor F3.

[2] If z is a vector of n random variables that follow the standard normal distribution N(0,1),
then the elements of the vector y=Bz+µ follow the multivariate normal distribution with
mean µ and variance-covariance (dispersion) matrix BB'.
[3] This is actually a multivariate distribution that resembles the exponential distribution in
terms of its skewness and kurtosis. Nevertheless, for simplicity reasons, henceforth this will
be noted as the exponential distribution.



In all three cases the alternatives are described along five criteria. Generally,
small training samples contain limited information about the classification
problem being examined, but the corresponding complexity of the problem
is also limited. On the other hand, larger samples provide richer information,
but they also lead to increased complexity of the problem. Thus, the exami-
nation of the three levels for this factor enables the investigation of the per-
formance of the classification procedures under all these cases.
Factor F5 defines the correlations between the evaluation criteria. Two
cases (levels) are considered for this factor. In the first case the correlations
are assumed to be limited (the correlation coefficient ranges between 0 and
0.1). In the second case higher correlations are used (the correlation coeffi-
cient ranges between 0.2 and 0.5). In both cases the correlation coefficient between each pair of criteria is specified as a uniformly distributed random variable ranging in the appropriate interval.
The specified correlation coefficients for every pair of criteria define the
off-diagonal elements of the group variance-covariance matrices. The ele-
ments on the diagonal of these matrices, representing the variances of the criteria, are specified by the sixth factor (F6), which is considered in two levels. In the first level, the variances of the criteria are equal for all groups, whereas in the second level the variances differ. Denoting the variance of each criterion for each group accordingly, the realization of these two situations regarding the homogeneity of the group dispersion matrices is performed as follows:
For the multivariate normal, uniform and exponential distributions:

Level 1:

Level 2:

For the multivariate log-normal distribution, the variances are specified


so as to ensure that the kurtosis of the data ranges within reasonable levels [4], as follows:
a) In the case of two groups:

[4] In the log-normal distribution the skewness and kurtosis are defined by the mean and the
variance of the criteria for each group. The procedures for generating multivariate non-
normal data can replicate satisfactorily the prespecified values of the first three moments
(mean, standard deviation and skewness) of a statistical distribution. However, the error is
higher for the fourth moment (kurtosis). Therefore, in order to reduce this error and conse-
quently to have better control of the generated data, both the mean and the variance of the
criteria for each group in the case of the multivariate log-normal distribution, are specified
so that the coefficient of kurtosis is lower than 40.

Level 1:

Level 2:

b) In the case of three groups:

Level 1:

Level 2:

The last factor (F7) is used to specify the degree of group overlap. The higher the degree of group overlap, the more difficult it is to discriminate between the considered groups. The definition of the group overlap in this experiment is performed using Hotelling's T² statistic. For a pair of groups this statistic is defined as follows:

T² = [n_k n_l / (n_k + n_l)] (x_k − x_l)' S⁻¹ (x_k − x_l)

where n_k and n_l denote the number of alternatives of the reference set belonging to each group, x_k and x_l are the vectors of the criteria averages for each group, and S is the within-groups variance-covariance matrix, i.e. the pooled estimate S = [(n_k − 1)S_k + (n_l − 1)S_l] / (n_k + n_l − 2).

Hotelling's T² is a multivariate test for the differences between the means of two groups. To evaluate its statistical significance the statistic [(n_k + n_l − n − 1) / ((n_k + n_l − 2)n)]·T² is computed, which follows the F distribution with n and n_k + n_l − n − 1 degrees of freedom (Altman et al., 1981).
The use of Hotelling's T² statistic implies a multivariate normal distribution and equality of the group variance-covariance matrices. Studies investigating the multivariate normality assumption have shown that the results of Hotelling's T² are quite robust for multivariate non-normal data, even for small samples (Mardia, 1975). Therefore, the use of Hotelling's T² in this experiment in combination with non-normal data is not a problem. In the case where the group variance-covariance matrices are not equal, it is more appropriate to use the revised version of Hotelling's T² as defined by Anderson (1958):

where, in addition to the previous notation, the vector consisting of the performances of an alternative on the evaluation criteria is involved. In this revised version of Hotelling's T², the appropriately scaled statistic follows the F distribution with n and M–n–1 degrees of freedom.
Hotelling's T² and its revised version are used in this experiment to define the average performance of the alternatives in each group. For the multivariate normal, uniform and exponential distributions the average performance of the alternatives of the first group on all criteria is set equal to one, whereas for the multivariate log-normal distribution the average performance of the alternatives of the first group on the evaluation criteria is set so as to keep the kurtosis within the bounds noted in footnote 4. The average performance of the alternatives in the second group is specified so that Hotelling's T², or its revised version when the group variance-covariance matrices are unequal, computed between the first and the second group is significant at the 1% level (low overlap) or at the 10% level (high overlap). The average performance of the alternatives in the third group is defined in a similar way.
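To make the overlap calibration concrete, the following Python sketch computes Hotelling's T² between two groups together with its F statistic; the group means can then be adjusted until the test reaches the targeted significance level (1% for low overlap, 10% for high overlap). This is an illustrative implementation assumed here, not the authors' original Matlab code.

import numpy as np
from scipy import stats

def hotelling_t2(X_k, X_l):
    """X_k, X_l: (alternatives x criteria) arrays for two groups.
    Returns Hotelling's T^2, the associated F statistic and its p-value."""
    n_k, n = X_k.shape
    n_l = X_l.shape[0]
    diff = X_k.mean(axis=0) - X_l.mean(axis=0)
    # pooled within-groups variance-covariance matrix S
    S = ((n_k - 1) * np.cov(X_k, rowvar=False) +
         (n_l - 1) * np.cov(X_l, rowvar=False)) / (n_k + n_l - 2)
    t2 = (n_k * n_l) / (n_k + n_l) * diff @ np.linalg.solve(S, diff)
    f_stat = (n_k + n_l - n - 1) / ((n_k + n_l - 2) * n) * t2
    p_value = stats.f.sf(f_stat, n, n_k + n_l - n - 1)
    return t2, f_stat, p_value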

3.2 Data generation procedure


A crucial aspect of the experimental comparison is the generation of the data
having the required properties defined by the factors described in the previ-
ous sub-section.

The generation of data that follow the multivariate normal distribution is
a well-known process. On the other hand, the simulation of multivariate non-
normal distributions is a more complex process. In this study the methodol-
ogy proposed by Vale and Maurelli (1983) is employed. The general outline
of this methodology is presented in Figure 5.1. The outcome of this method-
ology is the generation of a vector g' consisting of n random variables
(evaluation criteria) having the statistical properties described in the previous
sub-section. In this experimental comparison the generated criteria vector g'
consists of five criteria.

Each criterion of the generated vector g' follows the specified multivariate non-normal distribution with zero mean and unit variance. The criteria are subsequently transformed so that they have the desired means and standard deviations defined by the group overlap and dispersion factors (cf. Table 5.1). Each criterion of the vector g' is defined through a polynomial transformation of the form a + by + cy² + dy³, where y is a random variable following the multivariate standard normal distribution. The constants a, b, c and d are specified through the solution of a set of non-linear equations on the basis of the desired levels of skewness and kurtosis (Fleishman, 1978).

The use of the traditional techniques for generating multivariate normal random variables is not adequate for generating the random vector y. This is because the desired correlations between the criteria of the vector g should be taken into consideration in generating y. To address this issue Vale and Maurelli (1983) proposed the construction of an intermediate correlation matrix. Each element of this matrix defines the correlation between the normal variables corresponding to a pair of criteria. The calculation of each element is performed through the solution of the following equation:

where the desired correlation between the corresponding criteria is the one defined by the correlation factor F5. The intermediate correlation matrix is decomposed so that the correlations between the random variables of the vector y correspond to the desired correlations between the criteria of the vector g. In this experimental study the decomposition of the intermediate correlation matrix is performed using principal components analysis. The data generation procedure ends with the transformation of the vector g' into the criteria vector g with the desired means and standard deviations defined by the group overlap and dispersion factors.
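The following Python sketch summarizes the generation of one group's data under the scheme just described. The polynomial constants a, b, c, d and the intermediate correlation matrix are assumed to have been computed beforehand (from the target skewness, kurtosis and correlations); the spectral decomposition below plays the role of the principal components step, and the example values are purely illustrative.

import numpy as np

def generate_group(m, coeffs, R_y, mean, std, rng):
    """m: number of alternatives; coeffs: list of (a, b, c, d) per criterion;
    R_y: intermediate correlation matrix of the underlying normals;
    mean, std: target mean and standard deviation of the criteria."""
    n = R_y.shape[0]
    # decompose the intermediate correlation matrix to obtain correlated normals
    vals, vecs = np.linalg.eigh(R_y)
    B = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None)))
    Y = rng.standard_normal((m, n)) @ B.T            # correlated standard normals
    a, b, c, d = (np.asarray(x) for x in zip(*coeffs))
    G = a + b * Y + c * Y**2 + d * Y**3              # non-normal, zero mean / unit variance
    return mean + std * G                            # rescale to the desired moments

rng = np.random.default_rng(0)
coeffs = [(0.0, 1.0, 0.0, 0.0)] * 5                  # normal case: identity transformation
X = generate_group(36, coeffs, np.eye(5), mean=1.0, std=0.5, rng=rng)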
Since all the considered MCDA methods assume an ordinal definition of
the classes, it is important to ensure that the generated data meet this re-
quirement. This is achieved through the following constraint:

This constraint ensures that an alternative of a lower (less preferred) group does not dominate the alternatives of the immediately higher (more preferred) group, thus ensuring that each group consists of alternatives that are preferred to the ones of the groups below it. A simple check of this requirement is sketched below.
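A minimal sketch of such a check (assuming criteria of increasing preference): it flags a generated pair of groups whenever some alternative of the less preferred group dominates an alternative of the more preferred one, in which case the offending observations can be regenerated.

import numpy as np

def violates_ordinal_requirement(X_better, X_worse):
    """True if some alternative of the worse group dominates (is at least as good
    on all criteria and strictly better on at least one) some alternative of the
    better group."""
    ge_all = np.all(X_worse[:, None, :] >= X_better[None, :, :], axis=2)
    gt_any = np.any(X_worse[:, None, :] > X_better[None, :, :], axis=2)
    return bool(np.any(ge_all & gt_any))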
For each combination of factors F2 to F7 the above data generation procedure is employed to produce two data samples. The first one is used as the reference set, while the second one is used as the validation sample. The number of alternatives in the reference set is specified by factor F4, whereas the validation sample consists of 216 alternatives in all cases. In both the reference set and the validation sample the alternatives are equally distributed among the groups.
This experiment is repeated 20 times for each combination of the factors F2 to F7 (192 combinations). Overall, 3,840 reference sets are considered [5], each matched to a validation sample. Each reference set is used to develop a classification model through the methods specified by factor F1 (cf. Table 5.1). This model is then applied to the corresponding validation sample to test its generalizing classification performance.
The simulation was conducted on a Pentium III 600 MHz PC, using Matlab 5.2 for data generation as well as for the application of LA and QDA.
Appropriate codes for the other methods were written by the authors in the
Visual Basic 6 programming environment. The results of the simulation have
been analyzed using the SPSS 10 statistical package.

4. ANALYSIS OF RESULTS
The results obtained from the simulation experiment involve the classifica-
tion error rates of the methods both in the reference sets and the validation
samples. However, the analysis that follows is focused only on the classifica-
tion performance of the methods on the validation samples. This is because
the error rates obtained considering the reference set are downwardly biased
compared to the actual performance of the methods, since the same sample is
used both for model development and model validation. On the other hand,
the error rates obtained using the validation samples provide a better esti-
mate of the generalizing performance of the methods, measuring the ability
of the methods to provide correct recommendations on the classification of
new alternatives (i.e., alternatives not considered during model develop-
ment).

[5] (192 combinations of factors F2 to F7)×(20 replications).

The analysis of the results is based on the use of a transformed measure of the error rate, proposed in order to stabilize the variance of the error rates (Bajgier and Hill, 1982; Joachimsthaler and Stam, 1988).

Table 5.2 presents the ANOVA results for this error rate measure de-
fined on the basis of the validation samples. All main effects and the interac-
tion effects presented in this table are significant at the 1% level. Further-
more, each effect (main or interaction) explains at least 0.5% of the total
variance in the results ( statistic6). Except for the 17 effects presented in
Table 5.2 there were 64 more effects found to be significant at the 1% level.
None of these effects, however, explained more that 0.5% of the total vari-
ance and therefore, in order to reduce the complexity of the analysis, they are
not reported.
A first important note on the obtained results is that the main effects re-
garding the seven factors (F1 to F7) are all significant. This clearly shows that each of these factors has a major impact on the classification performance of the methods. The main effects involving the statistical distribution of the data (F2), the structure of the group dispersion matrices (F6) and the classification methods (F1) explain more than 48% of the total
variance. The latter effect (classification methods) is of major importance to
this analysis. It demonstrates that there are significant differences in the clas-
sification performances of the considered methods. Figure 5.2 presents the
average error rates for each method in the validation samples for the whole
simulation. The numbers in parentheses indicate the grouping of the methods
according to Tukey's test [7] on the average transformed error rates. The homogeneous groups of classification methods formed by Tukey's test are presented in increasing order (i.e., 1, 2, ...) from the methods with the lowest error rate to those with the highest error rate.

[6] Denoting by SS the sum of squares of an effect, by MSE the mean square error, by df the degrees of freedom of the effect and by TSS the total sum of squares, the ω² statistic is calculated as ω² = (SS − df×MSE)/(TSS + MSE).
[7] Tukey's honestly significant difference test is a post-hoc comparison technique that fol-
lows the results of ANOVA enabling the identification of the means that most contribute to
the considered effect. In this simulation study the Tukey’s test is used to perform all pair-
wise comparisons among average classification error rates (transformed error rates) of each
pair of methods to form homogeneous sets of methods according to their classification error
rate. Each set includes methods that do not present statistically significant differences with
respect to their classification error rates (see Yandell, 1977 for additional details).
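For readers who wish to reproduce this kind of post-processing, the sketch below shows a simplified one-way version of the analysis (the actual study uses a seven-factor design and was analyzed in SPSS): an ANOVA on the transformed error rates over the methods factor, the ω² share of explained variance, and Tukey's HSD grouping. The column names and data frame layout are assumptions of the sketch.

import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def analyze(df):
    """df: pandas DataFrame with one row per replication and columns
    'method' and 'terr' (the transformed error rate on the validation sample)."""
    model = smf.ols("terr ~ C(method)", data=df).fit()
    anova = sm.stats.anova_lm(model, typ=2)
    # omega-squared of the method effect: (SS - df*MSE) / (TSS + MSE)
    mse = anova.loc["Residual", "sum_sq"] / anova.loc["Residual", "df"]
    ss, df_m = anova.loc["C(method)", "sum_sq"], anova.loc["C(method)", "df"]
    omega2 = (ss - df_m * mse) / (anova["sum_sq"].sum() + mse)
    tukey = pairwise_tukeyhsd(df["terr"], df["method"], alpha=0.05)
    return anova, omega2, tukey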

The results indicate the increased performance of the considered MCDA classification methods as opposed to the other techniques. In particular, UTADIS provides the best performance (lowest error rate) compared to all
the other methods. The use of the heuristic HEUR2 (UTADIS2) for the
specification of the subintervals during the piece-wise formulation of the
marginal utility functions in the UTADIS method, provides better overall
results compared to the use of the heuristic HEUR1 (UTADIS1). The differ-
ence between the two cases is significant at the 5% level according to the
Tukey’s test. UTADIS is followed by the MHDIS method, which provides
similar results to ELECTRE TRI. With regard to the ELECTRE TRI method
it should be noted that the procedure used to specify the parameters of the
outranking relation provides quite satisfactory classification results (a de-
tailed description of this procedure is given in the Appendix of this chapter).
The differences between the use of the discordance test or not (ELEC1 vs
ELEC2) are not significant. Regarding the other methods, the rough set approach provides the lowest error rate, followed by QDA, while LA and LDA provide similar results.
These results provide an overview of the overall performance of the con-
sidered methods in the experiment. Further insight can be derived by considering the significant interactions between the methods factor F1 and the data factors F2 to F7.
The most significant of these interactions is that involving the perform-
ance of the methods for the different statistical distributions considered in the experiment (interaction F1×F2). This interaction explains 8.71% of the total variance of the results (cf. Table 5.2). Table 5.3 presents the corresponding results for all combinations of these two factors (similarly to Figure 5.2, parentheses indicate the grouping of the methods through the
Tukey’s test at the 5% significance level).

For all four statistical distributions the two implementations of the UTADIS method provide the best results. In the case of the multivariate normal distribution the error rates of UTADIS1 and UTADIS2 are slightly higher than the error rate of QDA, but the differences are not statistically significant (UTADIS1 and UTADIS2 provide similar results in this
case). It is also important to note that the other MCDA classification meth-
ods (MHDIS and ELECTRE TRI) outperform both LDA and LA. For the
multivariate uniform distribution UTADIS1 and UTADIS2 provide the lowest error rates, followed by the MHDIS method and QDA. The differ-
ences between the MCDA classification methods and the traditional statisti-
cal techniques significantly increase for the two asymmetric distributions,
the exponential and the log-normal. In the exponential case, the implementa-
tion of the UTADIS method using the heuristic HEUR2 provides better re-
sults compared to all the other approaches. It should also be noted that its difference from the use of the heuristic HEUR1 is significant according to Tukey's test at the 5% level. MHDIS, ELECTRE TRI and the rough set
approach all provide similar results, which are considerably better than those of the three statistical methods. Similar results are also obtained for the log-normal distribution. The use of the heuristic HEUR2 in UTADIS once again provides the best results. The heuristic HEUR1 (UTADIS1), however, leads to results similar to those of MHDIS and ELECTRE TRI.
A second significant two-way interaction that is of interest to the analysis of the performance of the methods is the one involving factors F1 (classification methods) and F6 (structure of the group dispersion matrices). This interaction explains 4.46% of the total variance in the results of
the experiment. The corresponding results presented in Table 5.4 show that
in both cases (equal and unequal group dispersion matrices), the considered
MCDA classification techniques provide quite satisfactory results. In par-
ticular, the two implementations of the UTADIS method provide the lowest error rates both when the group dispersion matrices are equal and when they are unequal. In the former case, the use of HEUR2 provides a significantly lower error rate compared to the use of HEUR1. In the case of
equal group dispersion matrices, the UTADIS method is followed by the
MHDIS method and the two implementations of the ELECTRE TRI method
(with and without the discordance test). In this case the MHDIS method per-
forms slightly better than the ELECTRE TRI method. In the case of unequal
group dispersion matrices the differences between MHDIS and ELECTRE
TRI are not significant. It is also worth noticing the performance of QDA for
the two considered cases regarding the structure of the group dispersion ma-
trices. When these matrices are equal across all groups, QDA performs worse than all the other methods used in this experiment. The performance of the
method, however, improves significantly when the group dispersion matrices
are unequal. In this case the error rate of QDA is similar to the one of the
UTADIS method and significantly lower compared to all the other tech-
niques. These results indicate that QDA is quite sensitive to changes in the
structure of the group dispersion matrices.

The third two-way interaction found significant in this experiment for the explanation of the differences in the performance of the methods involves the size of the reference set (interaction F1×F4). This interaction explains 1.22% of the total variance in the results of the experiment. The
results of Table 5.5 show that the increase of the size of the reference set
(number of alternatives) reduces the performance of all methods. This is an
expected result, since in this experiment larger reference sets are associated
with an increased complexity of the classification problem. The most sensi-
tive methods to the size of the reference set are LDA, LA and UTADIS. On
the other hand, QDA, rough sets and MHDIS appear to be the least sensitive
methods. Nevertheless, it should be noted that irrespective of the reference
set size the considered MCDA methods always perform better than the other
methods. In particular, the two implementations of the UTADIS method
provide the best results for small to moderate reference sets (36 and 72 alter-
natives). In both cases, the differences between UTADIS1 (HEUR1) and
UTADIS2 (HEUR2) are not significant according to the Tukey’s grouping at
the 5% level. The UTADIS method is followed by MHDIS, ELECTRE TRI
and rough sets. For larger reference sets (108 alternatives) UTADIS2 pro-
vides the best results. In this case, its difference from UTADIS1 is statistically significant, thus indicating that the use of the heuristic HEUR2 is less sensitive to the size of the reference set compared to HEUR1. UTADIS1, MHDIS
and ELECTRE TRI all provide similar results in this case, followed by
rough sets and QDA.

The last two-way interaction that is of interest in this analysis is the one involving the performance of the methods according to the number of groups (interaction F1×F3). This interaction explains 0.64% of the total variance in
the results of the experiment. The corresponding results are presented in Ta-
ble 5.6. A first obvious remark is that the performance of all methods dete-
riorates significantly in the three-group classification problem as opposed to
the two-group case. This is no surprise, since the number of groups is posi-
tively related to the complexity of the problems (i.e., the complexity in-
creases with the number of groups). Nevertheless, in both the two-group and
the three-group case the use of the heuristic HEUR2 in UTADIS is the approach that provides the lowest error rate. In particular, in the two-group case UTADIS2 performs similarly to UTADIS1 (use of HEUR1), whereas in the three-group case its differences from all other methods (including UTADIS1) are all statistically significant at the 5% level according to the
grouping obtained from the Tukey’s test. It should also be noticed that
MHDIS and ELECTRE TRI are the least sensitive methods to the increase
of the number of groups. In both cases, the increase in the error rates for the
three-group problem is the smallest compared to all other methods. As a re-
sult, both MHDIS and ELECTRE TRI perform similarly to UTADIS1 in the
three-group classification problem.
In addition to the above two-way interaction results, Table 5.2 also indicates
some three–way interactions to be significant in explaining the results of this
experiment regarding the performance of the considered classification meth-
ods. The first of these three–way interactions that is of interest involves the
performance of the methods according to the form of the statistical distribu-
tion of the data and the structure of the group dispersion matrices (interaction F1×F2×F6). The corresponding results presented in Table 5.7 provide
further insight into the remarks noted previously when the statisti-
cal distribution and the structure of the group dispersion matrices were ex-
amined independently from each other (cf. Tables 5.3 and 5.4); the interac-
tion of these two factors is examined now.

The results of the above table indicate that when the data are multivariate normal and the group dispersion matrices are equal, LDA and LA provide the lowest error rates, whereas when the group dispersion matrices are unequal, QDA outperforms all the other methods, followed by UTADIS. These
results are to be expected considering that multivariate normality and the a priori knowledge of the structure of the group dispersion matrices are the two major assumptions underlying the use of both LDA and QDA. On the other
hand, when the data are not multivariate normal and the group dispersion
matrices are equal, then the MCDA classification methods (UTADIS,
MHDIS, ELECTRE TRI) provide the best results compared to the other
methods considered in this experiment. In all these cases the use of the
UTADIS method with the heuristic HEUR2 (UTADIS2) provides the best
results. Its differences from all the other MCDA approaches are significant
for the exponential and the log-normal distributions, whereas for the uniform
distribution its results are similar to those of UTADIS1. The results obtained when
the data are not multivariate normal and the dispersion matrices are unequal
are rather similar. The differences, however, between the MCDA methods,
rough sets and QDA are reduced in this case. In particular, for the uniform
distribution QDA performs similarly to the UTADIS method, while outper-
forming both MHDIS and ELECTRE TRI. A similar situation also appears
for the log-normal distribution. On the other hand, for the exponential distri-
bution UTADIS outperforms all the other methods, followed by MHDIS and
rough sets.
The second three-way interaction that is of interest involves the per-
formance of the classification methods according to the form of the statisti-
cal distribution of the data and the size of the reference set (interaction F1×F2×F4). The results presented in Table 5.8 show that for low and moderate sizes of the reference set (36 and 72 alternatives) the MCDA classification methods compare favorably (in most cases) to the other techniques, irrespective of the form of the statistical distribution. Furthermore, it is interest-
ing to note that as the size of the reference set increases, the performance of
MHDIS and ELECTRE TRI relative to the other methods is improved. The
improvement is more significant for the two asymmetric distributions (expo-
nential and log-normal). For instance, in the case of the log-normal distribu-
tion with a large reference set (108 alternatives) both MHDIS and
ELECTRE TRI perform significantly better than the UTADIS method when
the heuristic HEUR1 (UTADIS1) is used.

5. SUMMARY OF MAJOR FINDINGS


The experiment presented in this chapter provided useful results regarding
the efficiency of a variety of MCDA classification methods compared to
other established approaches. Additionally, it facilitated the investigation of
the relative performance of the MCDA classification methods compared to
each other. The methods UTADIS, MHDIS and ELECTRE TRI originate
from different MCDA approaches. The conducted extensive experiment
helped in considering the relative classification performance of these meth-
ods for a variety of different data conditions.
Overall, the main findings of the experimental analysis presented in this
chapter can be summarized in the following points:
1. The considered MCDA classification methods can be regarded as an efficient alternative to widely used statistical techniques, at least in cases
where the assumptions of these techniques are not met in the data under
consideration. Furthermore, the MCDA classification methods appear to
be quite effective compared to other non-parametric classification tech-
niques. Of course, in this analysis only the rough set approach was con-
sidered as an example of a non-parametric classification approach.
Therefore, the obtained results regarding the comparison of MCDA
methods and other non-parametric classification techniques should be
further extended considering a wider range of methods, such as neural
networks, machine learning, mathematical programming, etc. Despite
this shortcoming, it is important to consider the present results as op-
posed to the results of other experimental studies on the comparison of
multivariate statistical classification methods and non-parametric ap-
proaches (cf. Chapter 2). The fact that some of these studies often do not
show a clear superiority of the existing non-parametric techniques over
statistical classification methods, in conjunction with the results of the
above analysis, provides a first positive indication of the performance of
the considered MCDA classification methods as opposed to other non-
parametric techniques.
Table 5.9 provides a synopsis of the results of the experiment in
terms of pair-wise comparisons of the methods with regard to their error
rates in the validation samples. Furthermore, Table 5.10 presents the methods with the lowest error rates (in the validation samples) for each combination of the four factors that were found the most significant for the explanation of the differences between the methods. These factors include the form of the statistical distribution of the data (F2), the number of groups (F3), the size of the reference set (F4) and the structure of the group dispersion matrices (F6). The methods presented in Table 5.10 as the ones with the lowest error rates do not have significant differences according to the grouping of Tukey's test at the 5% level. The methods are presented in ascending order from the ones with the lowest error rates to those with the highest error rates.

The results of Table 5.9 show that the MCDA classification methods
(UTADIS, MHDIS, ELECTRE TRI) outperform, in most cases, the
other approaches. The high efficiency of the considered MCDA methods
is also illustrated in the results presented in Table 5.10. The analysis of
Table 5.10 shows that the implementation of UTADIS with the heuristic
HEUR2 provides the lowest error rates in most cases, especially when
the data come from an asymmetric distribution (exponential and log-
normal). In the same cases, the MHDIS method and ELECTRE TRI also
perform well.
The results of Tables 5.9 and 5.10 lead to the conclusion that the
modeling framework of MCDA methods is quite efficient in addressing
classification problems. The UTADIS and MHDIS methods that employ
a utility-based modeling approach seem to outperform the outranking re-
lations framework of the ELECTRE TRI method. Nevertheless, the dif-
ferences between these approaches are reduced when more complex problems are considered (e.g., classification problems with three groups and problems with larger reference sets).
2. The procedure proposed for estimating the parameters of the outranking
relation in the context of the ELECTRE TRI method (cf. the Appendix
of this chapter for a detailed description of the procedure) seems to be
well-suited to the study of classification problems. Extending this proce-
dure to consider also the optimistic assignment approach will contribute
to the full exploitation of the particular features and capabilities of
ELECTRE TRI. This will enable the modeling of the incomparability re-
lation which provides significant information to the decision maker.
Overall, during the whole experiment the discordance test in the
ELECTRE TRI method was performed in 1,250 out of the 3,840 total
replications conducted in the experiment (32.6%). In the proposed pro-
cedure used to specify the parameters of the outranking relation in the
ELECTRE TRI method the discordance test is performed only if it is
found to improve the classification of the alternatives in the reference
set. The limited use of the discordance test in this experiment is most probably due to the nature of the considered data. Generally, the discor-
dance test is useful in the evaluation of alternatives that have good per-
formance on some criteria but very poor performance on other criteria.
In such cases, it is possible that a criterion where the alternative has poor
performance vetoes the overall evaluation of the alternative, irrespective
of its good features on the other criteria. Such cases where the perform-
ances of the alternatives on the criteria have significant fluctuations were
not considered in this experiment. Modeling such cases within an ex-
perimental study would be an interesting further extension of this analy-
sis in order to form a better view of the impact of the discordance
test on the classification results of the ELECTRE TRI method.
Table 5.11 presents the percentage of replications in which the discordance test was conducted for each combination of the four factors found to be the most significant in this experiment (i.e., the form of the
statistical distribution, the number of groups, the size of the reference set
and the structure of the group dispersion matrices). It should be noted
that for each combination of these four factors 80 replications were per-
formed.

The results of Table 5.11 indicate that the discordance test was most
frequently used in the three-group case. Furthermore, it is interesting to
note that the frequency of the use of the discordance test was reduced for
larger reference sets. Finally, it can also be observed that the heterogene-
ity of the group dispersion matrices reduced the frequency of the use of
the discordance test.
Of course, these results on the use of the discordance test need fur-
ther consideration. The discordance test is a key feature of the
ELECTRE TRI method together with the ability of the method to model
the incomparability relation. These two features are the major distin-
guishing characteristics of classification models developed through out-
ranking relation approaches compared to compensatory approaches such
as the UTADIS and the MHDIS methods. The analysis of the existing
differences in the recommendations (evaluation results) of such methods
will contribute to the understanding of the way that the peculiarities of
each approach affect their classification performance.

The experimental analysis presented in this chapter did not address this
issue. Instead, the focal point of interest was the investigation of the classifi-
cation performance of MCDA classification methods compared to other
techniques. The obtained results can be considered as encouraging for the
MCDA approach. Moreover, they provide the basis for further analysis
along the lines of the above remarks.

APPENDIX

DEVELOPMENT OF ELECTRE TRI CLASSIFICATION MODELS USING A PREFERENCE DISAGGREGATION APPROACH

1. Prior research
As noted in the presentation of the ELECTRE TRI method in Chapter 3, the use of the method to develop a classification model in the form of an outranking relation requires the specification of several parameters, including:
1. The weight of each criterion.
2. The reference profiles distinguishing each pair of consecutive groups, for all k=1, 2, …, q–1.
3. The preference, indifference and veto thresholds, for all criteria and for all k=1, 2, …, q–1.
4. The cut-off threshold that defines the minimum value of the credibility index above which it can be ascertained that the affirmation “the alternative is at least as good as the profile” is valid.
The method assumes that all these parameters are specified by the deci-
sion maker in cooperation with the decision analyst through an interactive
process. Nevertheless, this process is often difficult to implement in practice. This is due to two main reasons:
a) The increased amount of time required to elicit preferential information from the decision maker.
b) The unwillingness of the decision makers to participate actively in the
process and to provide the required information.
These problems are often met in several fields (e.g., stock evaluation,
credit risk assessment, etc.) where decisions have to be taken on a daily ba-
sis, and the time and cost are crucial factors for the use of any decision mak-
ing methodology.
To overcome this problem Mousseau and Slowinski (1998) proposed an
approach to specify the parameters of the outranking relation classification
model of the ELECTRE TRI method using the principles of preference dis-
aggregation. In particular, the authors suggested the use of a reference set for
the specification of the above parameters, so that the misclassifications of
the alternatives in the reference set are minimized. This approach is similar
to the one used in UTADIS and MHDIS (cf. Chapter 4). The methodology
used by the authors implements only the pessimistic assignment procedure
(cf. Chapter 3), without considering the discordance test. In the proposed
methodology the partial concordance index is approximated through
a sigmoid function as follows (cf. Mousseau and Slowinski, 1998):

where:

On the basis of this approximation, a mathematical programming prob-
lem with non-linear constraints is formulated and solved to specify optimally
the reference profiles, the criteria’s weights, the preference and the indiffer-
ence thresholds, as well as the cut-off point. This non-linear mathematical
programming problem includes 4m+3n(q–1)+2 constraints, where m is the
number of the alternatives of the reference set, n is the number of criteria and
q is the number of groups; 2m of these constraints have a non-linear form.
Therefore, assuming a reference set of 100 alternatives evaluated along five
criteria and classified into three groups, the proposed non-linear mathemati-
cal programming problem consists of 4×100+3×5×(3–1)+2=432 constraints
overall, including 200 (2×100) non-linear constraints. This simple example
indicates that even for rather small reference sets, the optimization process in the methodology of Mousseau and Slowinski (1998) can be quite demanding in terms of the computational resources required to implement it efficiently.
Subsequent studies using this methodology assumed that the reference
profiles, as well as the preference and indifference thresholds are known
(i.e., specified by the decision maker) and focused only on the estimation of
the criteria’s weights (Dias et al., 2000; Mousseau et al., 2000). In this sim-
plified context the resulting mathematical programming formulation has a
linear form and consequently its solution is easy even for large reference
sets. Nevertheless, this simplification does not address the problem of speci-
fying the parameters of the outranking relation adequately, since some criti-
cal parameters (reference profiles, the preference and indifference thresh-
olds) need to be specified by the decision maker.
Furthermore, it should be emphasized that using the pessimistic assignment procedure in ELECTRE TRI without the discordance test is quite similar to the utility-based approach used in the UTADIS method. In particular, the partial concordance index can be considered as a form of marginal utility function: the higher the partial concordance index for the affirmation “the alternative is at least as good as the reference profile on the basis of a given criterion”, the higher is the utility/value of the alternative on that criterion.
These remarks show that the main distinguishing feature of the two ap-
proaches (outranking relations vs utility-based techniques) is the non-
compensatory philosophy of the outranking relation approaches that is im-
plemented through the discordance test. In this regard, the use of the ELEC-
TRE TRI method without the discordance test cannot be considered as a
different approach to model the classification problem compared to com-
pensatory techniques such as the use of additive utility functions.

2. The proposed approach


In order to address all the aforementioned issues a new methodology has
been developed to estimate the parameters of an outranking relation classifi-
cation model within the context of the ELECTRE TRI method. Similarly to
the approach of Mousseau and Slowinski (1998), the proposed methodology
implements the pessimistic assignment procedure, considering both the con-
cordance and the discordance tests. The methodology combines heuristic
techniques for the specification of the preference, indifference and veto
thresholds, as well as linear programming techniques for the specification of
the criteria's weights and the cut-off point. The major advantages of this methodology can be summarized in the following two points:
1. It is computationally efficient, even for large data sets.
2. It implements the most significant feature of the ELECTRE TRI method,
i.e., the discordance test.
The proposed methodology is a regression-based approach that implements the preference disaggregation philosophy, leading to an indirect specification of the parameters involved in the construction and exploitation of an outranking relation within the context of the ELECTRE TRI method. In particular, the determination of the parameters involved in the ELECTRE TRI method is based on the analysis of a reference set of m alternatives which are classified into the pre-specified ordered groups (the first group consists of the most preferred alternatives and the last group of the least preferred alternatives). The outcome of the methodology involves the specification of the criteria's weights, the cut-off point and, for every criterion and profile, two auxiliary parameters defined from the reference profile and the corresponding preference and indifference thresholds. This definition shows that the specification of the auxiliary parameters provides similar information to the specification of the profiles and thresholds themselves. In particular, using these parameters, the computation of the partial concordance index and the discordance index can be performed as follows:

To facilitate the presentation of the proposed approach, vector notation will henceforth be used for these parameters, and the global concordance index will denote the weighted sum of the partial concordance indices [8].

Similarly, the credibility index will be defined on the basis of the global concordance index and the discordance indices, in the usual way for the ELECTRE TRI method (cf. Chapter 3).

[8] In contrast to the discussion of the ELECTRE TRI method in Chapter 3, in this presenta-
tion the criteria’s weights are assumed to sum up to 1.
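Since the analytical expressions were given in Chapter 3, only a compact computational sketch is reproduced here; it uses the classical preference, indifference and veto thresholds (p, q, v) rather than the reparameterization introduced above, which carries the same information. Criteria are assumed to be of increasing preference, p > q, and the weights to sum to one; all arguments are arrays over the criteria.

import numpy as np

def partial_concordance(g_a, g_b, p, q):
    # c = 1 if g_a >= g_b - q, c = 0 if g_a <= g_b - p, linear in between
    return np.clip((g_a - g_b + p) / (p - q), 0.0, 1.0)

def discordance(g_a, g_b, p, v):
    # d = 0 if g_a >= g_b - p, d = 1 if g_a <= g_b - v, linear in between
    return np.clip((g_b - g_a - p) / (v - p), 0.0, 1.0)

def credibility(g_a, g_b, w, p, q, v):
    c = partial_concordance(g_a, g_b, p, q)
    C = float(np.dot(w, c))                  # global concordance (weights sum to 1)
    d = discordance(g_a, g_b, p, v)
    strong = d > C                           # only criteria with d > C weaken the index
    if not np.any(strong):
        return C
    return C * float(np.prod((1.0 - d[strong]) / (1.0 - C)))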

The specification of the parameters in the proposed methodology is performed in two stages. The first stage involves the concordance test, in order to determine the criteria's weights and the two concordance-related parameters. The latter are specified through the following algorithm, which is applied on the reference set for each criterion separately.
Step 1: Rank-order the distinct performances of the alternatives of the reference set on the criterion, from the lowest to the highest one [9].
Step 2: Break down the range of criterion values into sub-intervals whose endpoints are consecutive distinct values corresponding to two alternatives from different groups. Calculate the mid-point of each such sub-interval.
Step 3: For each profile, set the two parameters equal to a pair of these mid-points, chosen so that the following difference is maximized:

where the two sets involved in this difference consist of the alternatives belonging to the groups on either side of the corresponding profile (the more preferred groups and the less preferred groups, respectively).
Steps 1 and 2 are inspired by the algorithm of Fayyad and Irani (1992), which is one of the most popular approaches for the discretization of quantitative criteria in machine learning. It should be noted that the above algorithm does not lead to the optimal specification of these parameters. Instead, it leads to the identification of “reasonable” values for
these parameters which can provide a useful basis for an interactive decision
aiding process.
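A small Python sketch of steps 1 and 2 follows (the boundary mid-points in the spirit of Fayyad and Irani, 1992), together with a generic version of step 3 in which the group-separation measure is passed in as a function, since its exact expression depends on the partial concordance indices built from the candidate values; the ordering imposed on the selected pair is an assumption of the sketch.

import numpy as np

def candidate_cutpoints(values, labels):
    """values: performances of the reference alternatives on one criterion;
    labels: their group indices. Returns the mid-points of consecutive distinct
    values that belong to alternatives of different groups (steps 1 and 2)."""
    order = np.argsort(values)
    v, y = np.asarray(values)[order], np.asarray(labels)[order]
    cuts = []
    for j in range(len(v) - 1):
        if v[j] != v[j + 1] and y[j] != y[j + 1]:
            cuts.append(0.5 * (v[j] + v[j + 1]))
    return cuts

def choose_pair(cuts, separation):
    """Step 3 (schematically): pick the pair of candidate cut-points that
    maximizes the supplied group-separation measure."""
    best, best_val = None, -np.inf
    for t in cuts:
        for r in cuts:
            if r <= t:                        # assumed ordering of the two parameters
                val = separation(t, r)
                if val > best_val:
                    best, best_val = (t, r), val
    return best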
Once these parameters are specified through the above procedure, the following linear program is solved to estimate the criteria's weights (in the formulation, the number of alternatives of the reference set belonging to each group is also used):

[9] All criteria are assumed to be of increasing preference.

The linear program minimizes the total magnitude of the classification violations over the reference alternatives, subject to constraints that express the concordance-based assignment of each reference alternative to its group; a small positive constant is used to model the strict inequalities.

Linear programs of the above form often have multiple optimal solu-
tions. Furthermore, it is even possible that near-optimal solutions provide a
more accurate classification of the alternatives than the attained optimum
solution (this is because the objective function of the above linear program
does not consider the number of misclassifications). These issues clearly in-
dicate the necessity of exploring the existence of alternative optimal or near-
optimal solutions. However, performing a thorough search of the polyhedron defined by the constraints of the above linear program could be a difficult and time-consuming process. To overcome this problem the heuristic proce-
dure proposed by Jacquet-Lagrèze and Siskos (1982) for the UTA method is
used (this procedure is also used in the UTADIS method, cf. sub-section
2.3.2 of Chapter 4). This procedure involves the realization of a post opti-
mality analysis stage in order to identify a characteristic subset of the set of
feasible solutions of the above linear program. In particular, the partial ex-
ploration of the feasible set involves the identification of solutions that
maximize the criteria’s weights. Thus, during this post optimality stage n
alternative optimal or near–optimal solutions are identified corresponding to
the maximization of the weights of the n criteria, one at a time. This enables
the derivation of useful conclusions on the stability of the estimated parame-
ters (criteria’s weights). The criteria’s weights which are used in building the
outranking relation are then computed as the average of all solutions identi-
fied during the post optimality process (Jacquet-Lagrèze and Siskos, 1982).
Alternative procedures to aggregate the results of the post optimality analy-
sis are also possible (Siskos, 1982).
At this point of the methodology all the information required to compute the
concordance index is available. Assuming (for the moment) that no criterion
has a veto capability, the assignment of the alternatives is performed through
the pessimistic procedure (classification rule (A11)): each alternative x_i is
assigned to the best (most preferred) group C_k for which C(x_i, r_k) ≥ λ,
where r_k is the profile that delimits group C_k from below; if no such group
exists, x_i is assigned to the least preferred group C_q.
Before the end of the first stage of the methodology and the concordance
test, the sets of misclassified alternatives are identified (rule (A12)). These
sets include the misclassified alternatives identified on the basis of the
concordance test. In particular, for each group C_k the corresponding set
includes the alternatives of group C_k that are assigned (according to the
concordance test and the classification rule (A11)) to the set of better groups
C_1, …, C_{k−1} or to the set of worse groups C_{k+1}, …, C_q.
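The following Python sketch illustrates the concordance-based assignment and the identification of the misclassified alternatives under the conventions assumed here (groups C_1, …, C_q ordered from the most to the least preferred, profile r_k delimiting group C_k from below, and the credibility index coinciding with the global concordance index when no veto is used); all names are illustrative.

```python
def global_concordance(weights, partial_concordance):
    """C(x, r_k) = sum_j w_j * c_j(x, r_k), with the weights summing to 1."""
    return sum(w * c for w, c in zip(weights, partial_concordance))

def pessimistic_assignment(credibilities, cut_off):
    """Assign an alternative to the best group C_k whose lower profile r_k it
    outranks with credibility >= cut_off; otherwise to the worst group C_q.
    `credibilities` holds sigma(x, r_1), ..., sigma(x, r_{q-1}), ordered from
    the profile of the best group to that of the worst."""
    for k, sigma in enumerate(credibilities, start=1):
        if sigma >= cut_off:
            return k
    return len(credibilities) + 1  # worst group C_q

def misclassified_sets(assignments, true_groups, q):
    """For each group C_k, collect the alternatives of C_k that the rule
    assigns to a better or a worse group (cf. the sets formed by rule (A12))."""
    sets = {k: [] for k in range(1, q + 1)}
    for i, (pred, true) in enumerate(zip(assignments, true_groups)):
        if pred != true:
            sets[true].append(i)
    return sets
```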
The second stage of the process involves the discordance test and the
specification of the parameter v_j that corresponds to the veto threshold of
each criterion. The specification of this parameter is based on the use of the
three-step algorithm also used in the concordance test. Beginning from step 2
of this algorithm, a criterion is given veto ability only if this does not
aggravate the classification of the alternatives that is obtained on the basis
of the concordance test. In particular, for all k = 1, 2, …, q−1 and every
criterion g_j, the veto threshold is initially set to one of the candidate
mid-point values β_i determined in step 2. On the basis of this setting, (A2)
is used to compute the discordance index d_j(x_i, r_k) and then the credibility
index σ(x_i, r_k) is computed using (A3). On the basis of the credibility
index, new sets of misclassified alternatives are formed using (A12). In using
relation (A12) at this stage, the cut-off point λ is set equal to 0.5 and the
credibility index σ(x_i, r_k) is used instead of the global concordance index
C(x_i, r_k).
If the cardinality of each new set of misclassified alternatives is smaller
than or equal to the cardinality of the corresponding set obtained from the
concordance test, then it is considered that the choice does not aggravate the
classification of the alternatives that is obtained on the basis of the
concordance test. Therefore, the candidate value is acceptable for the veto
threshold parameter v_j. If this is not the case, then the candidate value is
unacceptable. In either case, the procedure proceeds with the next candidate
mid-point value and the above process is repeated.
If the result of the above process indicates that all candidate values for
criterion g_j provide a worse classification of the alternatives compared to
the concordance test, then no veto capability is given to the criterion
regarding the comparison of the alternatives to the profile r_k. If, overall,
no criterion is given a veto capability, then the procedure ends; otherwise the
cut-off level λ needs to be determined. This is performed as follows:
If the groups are perfectly separable on the basis of the credibility indices,
then the cut-off level λ can be set directly within the interval that separates
the two sets of credibility values; otherwise, the specification of λ is
performed through a linear program that minimizes the violations of the
classification rule (A12) over the reference set, with λ as the decision
variable and δ a small positive constant.


Chapter 6
Classification problems in finance

1. INTRODUCTION
Financial management is a broad and rapidly developing field of manage-
ment science. The role of financial management covers all aspects of busi-
ness activity, including investment, financing and dividend policy issues.
During the last decades the globalization of the financial markets, the in-
tensifying competition between corporate entities and the socio-political and
technological changes have increased the complexity of the business, eco-
nomic and financial environments.
Within this new context the smooth financial operation of any corporate
entity and organization becomes a crucial issue for its sustainable growth
and development. Nevertheless, the increasing complexity of the financial
environment poses new challenges that need to be faced. The plethora of the
new financial products that are now available to firms and organizations as
risk management, investment and financing instruments is indicative of the
transformations that have occurred in the finance industry over the past dec-
ades and the existing complexity in this field.
To address this complexity it is necessary to adjust the financial decision-
making methodologies so that they meet the requirements of the new finan-
cial environment. Empirical approaches are no longer adequate. Instead,
gradually there is a worldwide increasing trend towards the development and
implementation of more sophisticated approaches based on advanced quanti-
tative analysis techniques, such as statistics, optimization, forecasting, simu-
lation, stochastic processes, artificial intelligence and operations research.

The roots of this new approach to financial decision-making can
be traced back to the 1950s and the work of Nobelist Harry Markowitz
(1952, 1959) on portfolio theory and the use of mathematical programming
techniques for portfolio construction. Since then, the contribution of applied
mathematics, statistics and econometrics, operations research, artificial intel-
ligence and computer science, in conjunction with the advances of the fi-
nance theory, have played a major role in addressing the complexity of fi-
nancial decision-making problems.
The application of the aforementioned quantitative analysis techniques in
financial decision-making is of interest both to practitioners and researchers.
In particular, practitioners in the finance industry are interested in the devel-
opment and implementation of efficient quantitative approaches that can
provide efficient support in their daily practice. On the other hand, research-
ers from the aforementioned fields often consider financial decision-making
problems as an excellent field where the outcomes of the ongoing theoretical
research can be tested under complex and challenging real-world conditions.
Several quantitative analysis techniques applied in finance implement
the classification paradigm. This should be of no surprise. A variety of fi-
nancial decisions need to be taken following the classification approach.
Some typical examples include:
Business failure prediction: discrimination between failed and non-failed
firms.
Credit risk assessment: discrimination of firms of low credit risk from
firms of high risk (default).
Corporate mergers and acquisitions: discrimination between firms that
are likely to be merged or acquired and firms whose ownership
status is not expected to change.
Stock evaluation and mutual funds’ assessment: classification of the
stocks or mutual funds into predefined groups according to their suitabil-
ity as investment instruments for a particular investor. The classification
can be performed in terms of their expected future returns, their risk or
any other evaluation criterion that is considered relevant by the decision-
maker/investor. Several investment firms have adopted this approach in
their evaluation of stocks and mutual funds (Standard & Poor’s Rating
Services, 1997, 2000; Moody’s Investors Service, 1998, 2000; Sharpe,
1998).
Bond rating: evaluation of corporate or government bond issues accord-
ing to the characteristics of the issuer and classification into rating
groups. Several well-known financial institutions follow this approach in
their bond ratings (e.g., Moody's, Standard & Poor's, Fitch Investors
Service).

Country risk assessment: evaluation of the performance of countries tak-
ing into consideration economic measures as well as social and political
indicators, in order to classify the countries into predefined groups ac-
cording to their default risk. Such classifications are available from lead-
ing financial institutions including Moody’s (Moody’s Investors Service,
1999) and Standard & Poor’s.
Venture capital investments: evaluation of venture capital investment
projects and classification into the ones that should be accepted, rejected
or submitted to further analysis (Zopounidis, 1990).
Assessment of the financial performance of organizations (banks, insur-
ance companies, public firms, etc.): classification of the organizations
into predefined groups according to their financial performance.
All the above examples are illustrative of the significance of developing
efficient classification models for financial decision-making purposes.
On the basis of this finding, the objective of this chapter is to explore the
efficiency of the proposed MCDA paradigm in modeling and addressing fi-
nancial classification problems. This analysis extends the results of the pre-
vious chapter through the consideration of real-world classification prob-
lems. Three financial decision-making problems are used for this purpose:
1. Bankruptcy prediction.
2. Corporate credit risk assessment.
3. Stock evaluation.
In each of these problems the methods considered in the experimental de-
sign of Chapter 5 are used to develop appropriate classification models. With
regard to the application of the UTADIS method, it should be noted that all
the subsequent results are obtained using the heuristic HEUR2, which was
found to outperform HEUR1 in many cases considered in the simulation of
Chapter 5.

2. BANKRUPTCY PREDICTION

2.1 Problem domain


In the field of corporate finance any individual, firm or organization that es-
tablishes some form of relationship with a corporate entity (i.e., as an inves-
tor, creditor or stockholder) is interested in the analysis of the performance
and viability of the firm under consideration.
Financial researchers have explored this issue from different points of
view considering the different forms of financial distress, including default,
insolvency and bankruptcy. Essentially, the term “bankruptcy” refers to the
termination of the operation of the firm following a filing for bankruptcy due
to severe financial difficulties of the firm in meeting its financial obligations
to its creditors. On the other hand, the other forms of financial distress do not
necessarily lead to the termination of the operation of the firm. Further de-
tails on the different forms of financial distress can be found in the books of
Altman (1993) and Zopounidis and Dimitras (1998).
The consequences of bankruptcy are not restricted to the individuals,
firms or organizations that have an established relationship with the bankrupt
firm; they often extend to the whole economic, business and social environ-
ment of a country or a region. For instance, developing countries are often
quite vulnerable to corporate bankruptcies, especially when the bankruptcy
involves a firm with a major impact in the country’s economy. Furthermore,
taking into account the globalization of the economic environment, it be-
comes clear that such a case may also have global implications. The recent
crisis in Southeast Asia is an indicative example.
These findings demonstrate the necessity of developing and implement-
ing efficient procedures for bankruptcy prediction. Such procedures are nec-
essary for financial institutions, individual and institutional investors, as well
as for the firms themselves and even for policy makers (e.g., government
officers, central banks, etc.).
The main goal of bankruptcy prediction procedures is to discriminate the
firms that are likely to go bankrupt from the healthy firms. This is a two-
group classification problem. However, often an additional group is also
considered to add flexibility to the analysis. The intermediate group may
include firms for which it is difficult to make a clear conclusion. Some re-
searchers place in such an intermediate group distressed firms that finally
survive through restructuring plans, including mergers and acquisitions
(Theodossiou et al., 1996).
The classification of the firms into groups according to their bankruptcy
risk is usually performed on the basis of their financial characteristics using
information derived from the available financial statements (i.e., balance sheet
and income statement). Financial ratios calculated through the accounts of
the financial statements are the most widely used bankruptcy prediction cri-
teria. Nevertheless, making bankruptcy prediction solely on the basis of fi-
nancial ratios has been criticized by several researchers (Dimitras et al.,
1996; Laitinen, 1992). The criticism has been mainly focused on the fact
that financial ratios are only the symptoms of the operating and financial
problems that a firm faces rather than the cause of these problems. To over-
come this shortcoming, several researchers have noted the significance of
considering additional qualitative information in bankruptcy prediction.
Such qualitative information involves criteria such as the management of the
firms, their organization, the market niche/position, the market’s trends, their
special competitive advantages, etc. (Zopounidis, 1987). However, this in-
formation is not publicly available and consequently quite difficult to gather.
This difficulty justifies the fact that most existing studies on bankruptcy pre-
diction are based only on financial ratios.
The first approaches used for bankruptcy prediction were empirical. The
most well-known approaches of this type include the “5 C method” (Charac-
ter, Capacity, Capital, Conditions, Coverage), the “LAPP” method (Liquid-
ity, Activity, Profitability, Potential), and the “Creditmen” method
(Zopounidis, 1995). Later more sophisticated univariate statistical ap-
proaches were introduced in this field to study the discriminating power of
financial ratios in distinguishing the bankrupt firms from the non-bankrupt
ones (Beaver, 1966).
However, the real thrust in the field of bankruptcy prediction was given
by the work of Altman (1968) on the use of linear discriminant analysis
(LDA) for developing bankruptcy prediction models. Altman used LDA in
order to develop a bankruptcy prediction model considering several financial
ratios in a multivariate context. This study motivated several other research-
ers towards the exploration of statistical and econometric techniques for
bankruptcy prediction purposes. Some characteristic studies include the
work of Altman et al. (1977) on the use of QDA, the works of Jensen (1971),
Gupta and Huefner (1972) on cluster analysis, the work of Vranas (1992) on
the linear probability model, the works of Martin (1977), Ohlson (1980),
Zavgren (1985), Peel (1987), Keasey et al. (1990) on logit analysis, the
works of Zmijewski (1984), Casey et al. (1986), Skogsvik (1990) on probit
analysis, the work of Luoma and Laitinen (1991) on survival analysis, and
the work of Scapens et al. (1981) on catastrophe theory.
During the last two decades new non-parametric approaches have gained
the interest of the researchers in the field. These approaches include among
others, mathematical programming (Gupta et al., 1990), expert systems
(Elmer and Borowski, 1988; Messier and Hansen, 1988), machine learning
(Frydman et al., 1985), rough sets (Slowinski and Zopounidis, 1995; Dimi-
tras et al., 1999), neural networks (Wilson and Sharda, 1994; Boritz and
Kennedy, 1995), and MCDA (Zopounidis, 1987; Andenmatten, 1995; Dimi-
tras et al., 1995; Zopounidis, 1995; Zopounidis and Dimitras, 1998). The
results of these studies have shown that the aforementioned new approaches
are well-suited to the bankruptcy prediction problem providing satisfactory
results compared to the traditional statistical and econometric techniques.
A comprehensive review of the relevant literature on the bankruptcy pre-
diction problem can be found in the books of Altman (1993), Zopounidis and
Dimitras (1998), as well as in the works of Keasey and Watson (1991),
Dimitras et al. (1996), Altman and Saunders (1998).

2.2 Data and methodology


The data of this application originate from the study of Dimitras et al.
(1999). Two samples of Greek industrial firms are considered. The first
sample includes 80 firms and is used for model development purposes, while
the second sample consists of 38 firms and serves as the validation sample.
Henceforth, the first sample will be referred to as the basic sample, and the
second one as the holdout sample.
The basic sample includes 40 firms that went bankrupt during the period
1986–1990. The specific time of bankruptcy is not common to all firms. In
particular, among these 40 firms, 6 went bankrupt in 1986, 10 in 1987, 9 in
1988, 11 in 1989 and 4 in 1990. For each of the bankrupt firms, financial
data are collected for up to five years prior to bankruptcy using their pub-
lished financial statements. For instance, for the firms that went bankrupt in
1986, the collected financial data span the period 1981–1985. Consequently,
the basic sample actually spans the period 1981–1989. To facilitate the pres-
entation and discussion of the results, each year prior to bankruptcy will be
denoted as year –1, year –2, year –3, year –4 and year –5. Year –1 refers to
the first year prior to bankruptcy (e.g., for the firms that went bankrupt in
1986, year –1 refers to 1985); year –2 refers to the second year prior to bank-
ruptcy (e.g., for the firms that went bankrupt in 1986, year –2 refers to
1984), etc. The bankrupt firms operate in 13 different industrial sectors in-
cluding food firms, textile firms, chemical firms, transport, wear and foot-
wear industries, metallurgical industries, etc. To each of these bankrupt firms
a non-bankrupt firm is matched from the same business sector. The matching
is performed on the basis of the size of the firms, measured in terms of their
total assets and the number of employees.
The holdout sample was compiled in a similar fashion. This sample in-
cludes 19 firms that went bankrupt in the period 1991–1993. The financial
data gathered for these firms span over a three-year period prior to bank-
ruptcy. A matching approach, similar to the one used for the basic sample,
has also been used for the selection of the non-bankrupt firms. The fact that
the holdout sample covers a different period than the one of the basic sam-
ple, enables the better investigation of the robustness of the performance of
the developed bankruptcy prediction models.
On the basis of the available financial data for the firms of the two sam-
ples 12 financial ratios have been calculated to be used as bankruptcy predic-
tion criteria. These financial ratios are presented in Table 6.1. The selection
of these ratios is based on the availability of financial data, their relevance to
the bankruptcy prediction problem as reported in the international financial
literature, as well as on the experience of an expert credit manager of a lead-
ing Greek commercial bank (Dimitras et al., 1999).

Among the financial ratios considered, the first four ratios measure the
profitability of the firms. High values of these ratios correspond to profitable
firms. Thus, all these ratios are negatively related to the probability of bank-
ruptcy. The financial ratios current assets/current liabilities and quick
assets/current liabilities involve the liquidity of the firms and they are
commonly used to predict bankruptcy (Altman et al., 1977; Gloubos and
Grammaticos, 1988; Zavgren, 1985; Keasey et al., 1990; Theodossiou, 1991;
Theodossiou et al., 1996). Firms having enough liquid assets (current assets)
are in a better liquidity position and are more capable of meeting their short–
term obligations to their creditors. Thus, these two ratios are negatively re-
lated to the probability of bankruptcy. The remaining ratios are related to the
solvency of the firms and their working capital management. High values of
the solvency ratios indicate severe indebtedness, in which
case the firms have to generate more income to meet their obligations and
repay their debt. Consequently both ratios are positively related to the prob-
ability of bankruptcy. The working capital ratios reflect the working capital
management efficiency of the firms. Generally, the higher the working
capital of a firm, the less likely it is that it will go bankrupt. In that regard,
most of these ratios are negatively related to the probability of bankruptcy,
whereas the ratio involving inventories is positively related to bankruptcy (inventories are often
difficult to liquidate and consequently a firm holding a significant amount of
inventory is likely to face liquidity problems).

Of course, the different industry sectors included both in the basic and
the holdout sample are expected to have different financial characteristics,
thus presenting differences in the financial ratios that are employed. Some
researchers have examined the industry effects on bankruptcy prediction
models, by adjusting the financial ratios to industry averages. However, the
obtained results are controversial. Platt and Platt (1990) concluded that an
adjusted bankruptcy prediction model performs better than an unadjusted
one, while Theodossiou (1987) did not find any essential difference or im-
provement. Furthermore, Theodossiou et al. (1996) argue that adjusted in-
dustry or time models implicitly assume that bankruptcy rates for businesses
are homogeneous across industries and time, an assumption that hardly holds
in practice. On this basis, no industry adjustment is made to the selected
financial ratios.
Tables 6.2-6.6 present some descriptive statistics for the two samples
with regard to the selected financial ratios for the two groups of firms
(the bankrupt and the non-bankrupt firms, respectively). In particular,
Table 6.2 presents the means of the financial ratios, Tables 6.3 and 6.4 pre-
sent the skewness and kurtosis coefficients, whereas Tables 6.5 and 6.6 pre-
sent the correlation coefficients between the selected financial ratios.
It is interesting to note from Table 6.2 that many of the considered finan-
cial ratios significantly differentiate the two groups of firms at least in the
case of the basic sample. However, in the holdout sample the differences
between the two groups of firms are less significant. Actually, the only ratio
that significantly differentiates the two groups for all three years of the hold-
out sample is the solvency ratio total liabilities/total assets, which meas-
ures the debt capacity of the firms.
On the basis of the two samples and the selected set of financial ratios,
the development and validation of bankruptcy prediction models is per-
formed in three stages:
1. In the first stage the data of the firms included in the basic sample for the
first year prior to bankruptcy (year -1) are used to develop a bankruptcy
prediction model. The predictions made with this model involve a time-
depth of one year. In that sense, the model uses as input the financial ra-
tios of the firms for a given year and its output involves an assessment
of the bankruptcy risk for the firms in year Alternatively, it could be
possible to develop different models for each year prior to bankruptcy
(years –1 up to –5). In this scheme, each model would use the financial
ratios for a year to produce an estimate of the bankruptcy risk in years
t+1, t+2, …, t+5. Also, it could be possible to develop a multi-group
bankruptcy prediction model considering not only the status of the firms
(bankrupt or non-bankrupt) but also the time bankruptcy occurs. In this
multi-group scheme the groups could be defined as follows: non-
bankrupt firms firms that will go bankrupt in the forthcoming year
firms that will go bankrupt in year firms that will go
bankrupt in year firms that will go bankrupt in year
firms that will go bankrupt in year This approach has been used
in the study of Keasey et al. (1990). Nevertheless, the fact that the hold-
out sample involves a three-year period, as opposed to the five-year pe-
riod of the basic sample, poses problems for the validation of the classi-
fication models developed through these alternative schemes.
2. In the second stage of the analysis the developed bankruptcy prediction
models are applied to the data of the firms of the basic sample for the
years –2, –3, –4 and –5. This enables the investigation of the ability of
the developed models to provide early warning signals for the bank-
ruptcy status of the firms used to develop these models.

3. In the final stage of the analysis, the developed models are applied to the
three years of the holdout sample. This provides an assessment of the
generalizing ability of the models when different firms for a different
time period are considered.
The above three-stage procedure was used for all the considered classifi-
cation methods. The sub-sections that follow analyze and compare the ob-
tained results.

2.3 The developed models


2.3.1 The model of the UTADIS method

To apply the UTADIS method the heuristic HEUR2 is used for the specifica-
tion of the piece-wise linear form of the marginal utility functions. Follow-
ing this approach, the additive utility model developed for bankruptcy pre-
diction purposes has the following form:

The coefficients of the marginal utilities in this function show that the
most significant ratios for the discrimination of the two groups of firms and
the prediction of bankruptcy are the solvency ratios total liabilities/total as-
sets and net worth/(net worth + long-term liabilities). For the other
ratios there are no significant differences in their contribution to bankruptcy
prediction. Only the ratio inventory/working capital has a very low
significance in the developed additive utility model. The specific form of the
marginal utility functions of the developed model is illustrated in Figure 6.1.
On the basis of this additive utility model the classification of a firm x_i as
bankrupt or non-bankrupt is performed through the following classification
rules, where U(x_i) denotes the firm's global utility and u the estimated
cut-off utility level:

If U(x_i) ≥ u, then firm x_i is classified as non-bankrupt.

If U(x_i) < u, then firm x_i is classified as bankrupt.

2.3.2 The model of the MHDIS method

In the case of the MHDIS method, based on the financial ratios of the firms
for year –1, two additive utility functions are developed, since there are only
two groups (bankrupt and non-bankrupt firms). The first additive utility
function, denoted as U_1, characterizes the non-bankrupt firms, while the
second one, denoted as U_~1, characterizes the bankrupt firms. The form
of these two functions is the following:

According to the weighting coefficients of the marginal utilities in the
above utility functions, the dominant factors for the estimation of bankruptcy
risk are the profitability ratio net income/total assets and the solvency
ratio total liabilities/total assets. The latter was also found to be signifi-
cant in the bankruptcy prediction model of the UTADIS method. In the case
of the MHDIS method the ratio net income/total assets mainly characterizes
the non-bankrupt firms, since its weight in the additive utility function U_1
is more than 85%. On the other hand, the fact that the weight of this ratio in
the utility function U_~1 is only 3.85% indicates that while high values of
net income/total assets are a significant characteristic of non-bankrupt firms,
low values do not necessarily indicate bankruptcy. The second ratio that is
found significant, i.e., the ratio total liabilities/total assets, mainly character-
izes the bankrupt firms, since its weight in the utility function U_~1 exceeds
88%. However, the weight of this ratio in the utility function U_1 is only
2.81%, indicating that while high values of total liabilities/total assets char-
acterize the bankrupt firms, low values of this ratio are not a significant
characteristic for non-bankrupt firms.
The forms of the marginal utility functions for these two ratios (Figure
6.2) provide some insight on the above remarks. In particular, the form of
the marginal utility function for the profitability ratio net income/total assets
indicates that firms with net income/total assets higher than 1.29% are more
likely to be classified as non–bankrupt. On the other hand, firms with total
liabilities/total assets higher than 77.30% are more likely to go bankrupt.
These results indicate that profitability and solvency are the two main distin-
guishing characteristics of non–bankrupt and bankrupt firms, according to
the model developed through MHDIS.

The decision regarding the classification of a firm into one of the two
considered groups (bankrupt and non-bankrupt) is based upon the global
utilities obtained through the two developed additive utility functions. In that
sense, a firm x_i is considered bankrupt if U_~1(x_i) > U_1(x_i) and
non-bankrupt if U_1(x_i) > U_~1(x_i). Throughout the application there were
no cases where U_1(x_i) = U_~1(x_i).
2.3.3 The ELECTRE TRI model

The application of the ELECTRE TRI method is based on the use of the procedure
presented in Chapter 5 for the specification of the parameters of the outrank-
ing relation classification model. In applying this procedure, first the con-
cordance test is performed to specify the parameter vector corresponding to
the preference thresholds of all criteria and the parameter vector correspond-
ing to the indifference thresholds of all criteria. At the same stage the
criteria's weights are estimated. An initial estimate of the cut-off point λ
is also obtained. All this information is used to perform an initial
classification of the firms of the reference set (year -1, basic sample) and to
measure the classification error rate. Then, in a second stage, the impact of
the discordance test on the classification results is explored. If the discor-
dance test improves the classification of the firms, then it is used in the con-
struction of the outranking relation classification model, otherwise the re-
sults of the discordance test are ignored and the classification is performed
considering only the results of the concordance test.
In this bankruptcy prediction case study, the discordance test was not
found to improve the classification results obtained through the concordance
test. Consequently all the presented results are the ones obtained only from
the first step of the parameter estimation procedure involving the concor-
dance test. The results of the parameter estimation process for the considered
bankruptcy prediction data are presented in Table 6.7.

According to the weights of the financial ratios in the outranking relation
model of the ELECTRE TRI method, the most significant ratios for the pre-
diction of bankruptcy are the profitability ratio net income/total assets
and the solvency ratios net worth/(net worth + long-term liabilities) and
current liabilities/total assets. The net income/total assets ratio was also
found to be a significant factor in the bankruptcy prediction model of the
MHDIS method.
The classification of the firms as bankrupt or non-bankrupt according to
the outranking relation model of the ELECTRE TRI method is performed
using only the results of the concordance test, since the use of the discor-
dance test was not found to improve the classification results for the data of
the reference set (i.e., year -1 of the basic sample). Therefore, the credibility
index is defined on the basis of the global concordance index, as follows:

σ(x_i, r_1) = C(x_i, r_1) for all firms x_i,

where C(x_i, r_1) = \sum_{j=1}^{12} w_j c_j(x_i, r_1).

On the basis of the credibility index calculated in this way, the classifica-
tion of the firms is performed through the following classification rule (the
cut-off point λ is estimated through the procedure described in the
appendix of Chapter 5):

If σ(x_i, r_1) ≥ λ, then firm x_i is classified as non-bankrupt.

If σ(x_i, r_1) < λ, then firm x_i is classified as bankrupt.

2.3.4 The rough set model

Similarly to the experimental simulation of Chapter 5, the MODLEM algo-
rithm is used to develop a rough set model for the prediction of bankruptcy.
The resulting model consists of 10 decision rules presented in Table 6.8.
The rule-based model considers only six ratios, thus leading to a signifi-
cant decrease in the information required to derive bankruptcy predictions.
The ratios used in the developed rules include profitability ratios, solvency
ratios and a working capital ratio. Some
of these ratios were also found significant in the bankruptcy models of the
MCDA classification techniques. In particular: (a) the net income/total assets
ratio was found significant by the MHDIS method and the ELECTRE
TRI method, (b) the total liabilities/total assets ratio was found signifi-
cant by UTADIS and MHDIS, and (c) the net worth/(net worth + long-term
liabilities) ratio was found significant by UTADIS and ELECTRE TRI.
Six of the developed rules involve the bankrupt firms, whereas the re-
maining four rules involve the non-bankrupt ones. It is also interesting to
note that the rules for the non-bankrupt firms are stronger than the rules
corresponding to the bankrupt firms. The average strength of the rules for the
non-bankrupt firms is 15.75 as opposed to 11.17 which is the average
strength of the rules for the bankrupt firms. These two findings (the number
and the strength of the rules per group) indicate that, generally, it is more
difficult to describe the bankrupt firms than the non-bankrupt ones.
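To illustrate how a rule-based model of this kind is applied, the sketch below uses two invented rules (they are illustrative only and are not the rules of Table 6.8); the strength of a rule is the number of reference-set firms that it covers.

```python
# Hypothetical MODLEM-style "if ... then ..." decision rules
rules = [
    {"condition": lambda f: f["total liabilities/total assets"] >= 0.80,
     "decision": "bankrupt", "strength": 12},
    {"condition": lambda f: f["net income/total assets"] >= 0.03,
     "decision": "non-bankrupt", "strength": 18},
]

def classify(firm, rules, default="unclassified"):
    """Return the decision of the first rule whose condition the firm satisfies."""
    for rule in rules:
        if rule["condition"](firm):
            return rule["decision"]
    return default

firm = {"total liabilities/total assets": 0.85, "net income/total assets": -0.02}
print(classify(firm, rules))   # -> "bankrupt"
```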

2.3.5 The statistical models

The application of the three statistical classification techniques to the bank-
ruptcy prediction data led to the development of three bankruptcy prediction
models. The corresponding bankruptcy prediction rules are the following:
1. Linear discriminant analysis (LDA): if the linear discriminant score of
firm x_i exceeds the estimated cut-off score, then the firm is classified as
non-bankrupt; otherwise it is classified as bankrupt.
2. Quadratic discriminant analysis (QDA): if the quadratic discriminant score
of firm x_i exceeds the estimated cut-off score, then the firm is classified
as non-bankrupt; otherwise it is classified as bankrupt.
3. Logit analysis (LA): if the bankruptcy probability estimated by the logit
model for firm x_i is lower than the chosen cut-off probability, then the
firm is classified as non-bankrupt; otherwise it is classified as bankrupt.

Tables 6.9 and 6.10 present the estimates for the parameters of the above
models, including the constant terms, the discriminant coefficients and
the cross-product terms for the case of QDA.
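For readers who wish to experiment with models of this type, discriminant and logit models can be fitted with standard Python libraries; the snippet below is a generic sketch with placeholder data and does not reproduce the coefficients of Tables 6.9 and 6.10.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.linear_model import LogisticRegression

# X: matrix of the 12 financial ratios for the reference-set firms (year -1)
# y: group labels (0 = non-bankrupt, 1 = bankrupt) -- placeholder data here
X = np.random.rand(80, 12)
y = np.random.randint(0, 2, 80)

lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)
logit = LogisticRegression(max_iter=1000).fit(X, y)

# Each model classifies a firm by comparing its score (or estimated
# probability of bankruptcy) with the corresponding cut-off value.
predictions = {"LDA": lda.predict(X), "QDA": qda.predict(X), "LA": logit.predict(X)}
```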

2.4 Comparison of the bankruptcy prediction models


The detailed classification results obtained through all the above bankruptcy
prediction models are presented in Tables 6.10 and 6.11 for the basic and the
holdout sample, respectively.
The presented results involve the two types of error rates. The type I error
refers to the classification of bankrupt firms as non-bankrupt ones, whereas
the type II error refers to the classification of non-bankrupt firms as bankrupt
ones. Generally, the type I error leads to capital loss (e.g., a firm that goes
bankrupt cannot fulfill its debt obligations to its creditors), while the cost of
the type II error has the form of an opportunity cost (e.g., the creditor loses
the opportunity to gain revenues from granting credit to a healthy firm). In
that sense, it is obvious that the type I error is much more significant than the
type II error. Altman (1993) argues that the cost of the type I error for banks
in the USA is approximately 62% of the amount of the granted loan, whereas the
cost of the type II error is only 2% (the difference between a risk-free investment and the interest
rate of the loan). However, in obtaining a good measure of the overall classi-
fication performance it is necessary to consider both the cost of the individual error types,
and the a-priori probability that an error of a specific type may occur. In par-
ticular, generally, the number of non-bankrupt firms is considerably larger
than the number of firms that go bankrupt. For instance in USA, Altman
(1993) notes that bankrupt firms constitute approximately 5% of the total
population of firms. Of course, this percentage varies from country to coun-
try and for different time periods. Nevertheless, it gives a clear indication
that the probability that a firm will go bankrupt is considerably lower than
the probability that a firm will not go bankrupt. On the basis of these re-
marks, since the higher cost of the type I error is counterbalanced by the
lower prior probability of bankruptcy, it is reasonable to argue that both the
type I error and the type II error contribute equally to the estimation of the
overall error rate. Therefore, the
overall error rate is estimated as the average of the two error types. For more
details on the manipulation of the probabilities and the costs associated with
the type I and II errors see Theodossiou et al. (1996) and Bardos (1998).
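The computation of the error rates used in the comparisons below can be summarized in a few lines of Python (the labels "B" and "NB" and the sample data are illustrative):

```python
def error_rates(true_labels, predicted_labels):
    """Type I error: bankrupt firms classified as non-bankrupt.
    Type II error: non-bankrupt firms classified as bankrupt.
    The overall error is the average of the two, i.e. both error types
    are weighted equally, as argued in the text."""
    bankrupt = [(t, p) for t, p in zip(true_labels, predicted_labels) if t == "B"]
    healthy  = [(t, p) for t, p in zip(true_labels, predicted_labels) if t == "NB"]
    type1 = sum(1 for t, p in bankrupt if p == "NB") / len(bankrupt)
    type2 = sum(1 for t, p in healthy  if p == "B")  / len(healthy)
    return type1, type2, (type1 + type2) / 2

# Example: 4 bankrupt and 4 non-bankrupt firms
t1, t2, overall = error_rates(["B", "B", "B", "B", "NB", "NB", "NB", "NB"],
                              ["B", "NB", "B", "B", "NB", "NB", "B", "NB"])
print(t1, t2, overall)   # 0.25 0.25 0.25
```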
The comparison of the bankruptcy prediction results regarding the overall
error rate in the basic sample shows that UTADIS, MHDIS and rough sets
perform rather better than the other techniques. In particular, in year -1
rough sets perform better than all the other techniques, followed by
UTADIS, MHDIS and ELECTRE TRI. This is of no surprise: recall that the
data of the firms for the first year prior to bankruptcy are used as the refer-
ence set for model development. Rough sets and all the MCDA classification
techniques have a higher fitting ability compared to the three statistical tech-
niques and consequently they are expected to perform better in terms of the
error rate in the reference set. In year -2 UTADIS and MHDIS provide the
same overall error rate which is significantly lower than the error rate of all
the other methods. In year -3 MHDIS provides the lowest error rate, fol-
lowed by UTADIS (HEUR2). In year -4 the lowest overall error rate is ob-
tained by the UTADIS model, whereas in year -5 the best result is obtained
by the models of MHDIS, rough sets and LA.
As far as the holdout sample is concerned, it is clear that the overall error
rate of all methods is increased compared to the case of the basic sample.
The comparison of the methods shows that the bankruptcy prediction model
of UTADIS performs better than the other models in years -1 and -2,
whereas in year -3 the lowest overall error rate is obtained by the model of
the MHDIS method. It is important to note that all three MCDA methods
(UTADIS, MHDIS and ELECTRE TRI) outperform the three statistical
techniques in all three years of the holdout sample. This is a significant finding
with regard to the relative efficiency of the corresponding bankruptcy pre-
diction models. It is also interesting to note that the two bankruptcy predic-
tion models of UTADIS and MHDIS provide significantly lower type I error
rates compared to the other techniques. For both these models the type I er-
ror rate is lower than 50% for all years of the holdout sample, whereas in
many cases regarding the other methods it considerably exceeds 50%.
Overall, it should be noticed that the type I error rate is higher than the
type II error rate for all methods in both the basic and the holdout samples.
Nevertheless, this is not a surprising result. Generally the process which
leads a firm to bankruptcy is a dynamic one and it cannot be fully explained
through the examination of the financial characteristics of the firm. In the
beginning of this process the financial characteristics of both non-bankrupt
and bankrupt firms are usually similar (Dimitras et al., 1998). As time
evolves some specific changes in the environment (internal and external) in
which the firm operates, such as changes in the management of the firm or
changes in the market, may lead the firm in facing significant problems
which ultimately lead to bankruptcy. Thus, non-bankrupt firms are rather
easier to describe than the bankrupt firms, in terms of their financial charac-
teristics (they remain in good position over time).
The above finding has motivated researchers to propose the consideration
of additional qualitative strategic variables in bankruptcy prediction models,
including among others management of the firms, their organization, their
market niche/position, their technical facilities, etc. (Zopounidis, 1987). Ac-
tually, as pointed out by Laitinen (1992) the inefficiency of the firms along
these qualitative factors is the true cause of bankruptcy; the poor financial
performance is only a symptom of bankruptcy rather than its cause.

Despite the significance of the consideration of qualitative strategic vari-
ables in developing bankruptcy prediction models, their collection is a diffi-
cult process since they are not publicly available to researchers and analysts.
This difficulty was also encountered in this case study.
the analysis of qualitative information can be considered as one of the main
reasons justifying the rather low performance of all the developed models
that is apparent in some cases in both the basic and the holdout samples.
However, similar results are obtained in most of the existing studies that em-
ploy the same methodological framework for bankruptcy prediction pur-
poses (i.e., the use of financial ratios), thus indicating that additional qualita-
tive information is required to perform a better description of the bankruptcy
process.
Another form of information that could be useful for modeling and esti-
mating bankruptcy risk involves the general economic conditions within
which firms operate. The economic environment (e.g., inflation, interest
rates, exchange rates, taxation, etc.) has often a significant impact on the per-
formance and the viability of the firms. Consequently, the consideration of
the sensitivity of the firms to the existing economic conditions could add a
significant amount of useful information for bankruptcy prediction purposes,
especially in cases of economic vulnerability and crises. Some studies have
adopted this approach (Rose et al., 1982; Foster, 1986) but still further re-
search is required on this issue.
Finally, it would be useful to consider the bankruptcy prediction problem
in a dynamic context rather than in a static one. As already noted, bank-
ruptcy is a time-evolving event. Therefore, it could be useful to consider all
the available bankruptcy related information as time evolves in order to de-
velop more reliable early warning models for bankruptcy prediction. Kahya
and Theodossiou (1999) followed this approach and they modeled the bank-
ruptcy prediction problem in a time-series context.
All the above remarks constitute interesting research directions in the
field of bankruptcy prediction. The fact that there is no research study that
combines all the above issues to develop a unified bankruptcy prediction
theory indicates the complexity of the problem and the significant research
effort that still needs to be made.
Despite this finding the existing research on the development of bank-
ruptcy prediction models should not be considered inadequate for meeting
the needs of practitioners. Indeed, any model that considers publicly avail-
able information (e.g., financial ratios) and manages to outperform the esti-
mations of expert analysts has an obvious practical usefulness. Studies on
this issue have shown that bankruptcy prediction models such as the ones
developed above often perform better than experts (Lennox, 1999).

3. CORPORATE CREDIT RISK ASSESSMENT

3.1 Problem domain


Credit risk assessment refers to the analysis of the likelihood that a debtor
(firm, organization or individual)1 will not be able to meet its debt obliga-
tions to its creditors (default). This inability can be either temporary or per-
manent. This problem is often related to bankruptcy prediction. Actually,
bankruptcy prediction models are often used within the credit risk assess-
ment context. However, the two problems are slightly different: bankruptcy
has mainly a legal interpretation, whereas default has a financial interpreta-

1 Without loss of generality the subsequent analysis will focus on the case where the debtor is a firm or organization (corporate credit risk assessment).
tion. Indeed, most authors consider that a firm is in a situation of default
when the book value of its liabilities exceeds the market value of its assets
(Altman, 1993).
Apart from this difference in the definition of bankruptcy and credit risk,
there is also a significant underlying difference in the practical context
within which they are addressed. In particular, bankruptcy prediction simply
involves the assessment of the likelihood that a firm will go bankrupt. On the
other hand, credit risk assessment decisions need to be taken in a broader
context considering the following two issues:
1. The estimated loss from granting credit to a firm that will ultimately de-
fault (default risk).
2. The estimated profit from granting credit to a healthy firm.
The tradeoff between the estimated losses and profits is a key issue for
deciding on the acceptance or rejection of the credit as well as on the amount
of credit that will be granted. Within this context, the credit risk assessment
problem can be addressed within a three-stage framework (Srinivasan and
Kim, 1987):
Stage 1: Estimation of the present value of the expected profits and
losses for each period of the loan, on the basis of the background (credit
history) of the firm.
Stage 2: Combination of the present value of the expected profit/losses
with the probabilities of default and non-default to estimate the net pre-
sent value from granting the credit.
Stage 3: If the net present value is negative then the credit is rejected,
otherwise it is accepted and the amount of loan to grant is determined.
The implementation of this framework assumes that the credit granting prob-
lem (and consequently credit risk assessment) is a multi-period problem (cf.
stage 1). This is true considering that the repayment of the loan is performed
through a series of interest payments (monthly, semi-annual or annual)
spread over a period of time (usually some years). Over this period the
credit institution has the opportunity to extend its cooperation with the firm.
In this regard, profits are not only derived from the interest that the firm pays
for the loan, but they may also be derived through the extended cooperation
between the bank and the firm.
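A minimal numerical sketch of stages 2 and 3 of this framework is given below; all figures, including the probabilities and present values, are illustrative assumptions rather than data from the application.

```python
def expected_npv(pv_profit_if_healthy, pv_loss_if_default, prob_default):
    """Stage 2: combine the present values of profits and losses with the
    default / non-default probabilities estimated by a classification model."""
    return (1 - prob_default) * pv_profit_if_healthy - prob_default * pv_loss_if_default

# Stage 3: grant the credit only if the expected net present value is positive
pv_profit = 12_000.0   # present value of interest income and future cooperation
pv_loss = 80_000.0     # present value of the capital lost in case of default
for p in (0.02, 0.10, 0.20):
    npv = expected_npv(pv_profit, pv_loss, p)
    print(p, round(npv, 1), "accept" if npv > 0 else "reject")
# 0.02 -> 10160.0 accept ; 0.10 -> 2800.0 accept ; 0.20 -> -6400.0 reject
```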
The contribution of classification techniques in the implementation of the
above framework is realized in the second stage involving the estimation of
the probabilities of default and non-default. Default is often followed by a
bankruptcy filing, thus the analysis presented in the previous sub-section for
the development of bankruptcy prediction models is similar to that used of-
ten for credit risk assessment purposes. Nevertheless, it should be empha-
sized that default does not necessarily mean that a firm will go bankrupt. For
instance, a firm in default may implement a restructuring plan to recover
from its problems and ultimately avoid bankruptcy (Altman, 1993).
The use of classification techniques for credit risk assessment aims at de-
veloping models which assign the firms into groups, according to their credit
risk level. Usually, two groups are used: a) firms for which the credit should
be granted and b) firms for which the credit should be rejected. The gather-
ing of the data required for the development of the appropriate credit risk
model (i.e., construction of a reference set/training sample) can be realized
using the existing credit portfolio of the financial institution for which the
development of the model takes place. The development of such a credit risk
assessment model provides significant advantages for financial institutions
(Khalil et al., 2000):
It introduces a common basis for the evaluation of firms that request
financing. The credit applications are, usually, evaluated at a peripheral
level and not at a central one, particularly in cases where the amount of
the credit is limited. The practical implementation of a credit risk as-
sessment model allows the use of a common evaluation system, thus re-
ducing the peremptoriness and subjectivity that often characterize indi-
vidual credit analysts.
It constitutes a useful guide for the definition of the amount of the credit
that could be granted (Srinivasan and Kim, 1987).
It reduces the time and cost of the evaluation procedure, which could be
restricted to firms of high credit risk. Further analysis of the credit appli-
cations of these firms can be realized thoroughly from the specialized
credit analysts, at a central level.
It facilitates the management and monitoring of the whole credit portfo-
lio of the financial institution.
The above four points justify the widespread use of credit risk assess-
ment systems. At the research level, there has been a wide use of statistical
approaches up to today. An analytical presentation of the relevant applica-
tions is outlined in the book of Altman et al. (1981). However, recently there
has been a spread of alternative approaches such as machine learning and
expert systems (Cronan et al. 1991; Tessmer, 1997; Matsatsinis et al., 1997),
decision support systems (Srinivasan and Ruparel, 1990; Duchessi and
Belardo, 1987; Zopounidis et al., 1996; Zopounidis and Doumpos, 2000b),
genetic algorithms and neural networks (Fritz and Hosemann, 2000), multic-
riteria analysis (Bergeron et al., 1996; Zopounidis and Doumpos, 1998;
Jablonsky, 1993; Lee et al., 1995; Khalil et al., 2000), etc.
The objective of the application presented in the subsequent subsections
is to illustrate the potentials of credit risk assessment models within the
aforementioned credit granting framework, using data derived from the
credit portfolio of a leading Greek commercial bank. For this purpose, the
classification methods used in Chapter 5 are employed and the obtained re-
sults are compared.

3.2 Data and methodology


The data of this application involve 60 industrial firms derived from the
credit portfolio of a leading Greek commercial bank. The data span the
three-year period 1993–1995. The firms in the sample are classified into
two groups:
The firms with high financial performance that cooperate smoothly with
the bank and manage to fulfill their debt obligations. These firms are
considered as typical examples of firms with low credit risk that should
be financed by the bank. The number of these low credit risk firms in the
sample is 30.
The firms with poor financial performance. The cooperation of the bank
with these firms had several problems since the firms were not able to
meet adequately their debt obligations. In that regard, these firms are
considered as typical examples of firms in default for which credit
should be rejected. The sample includes 30 such firms.
Based on the detailed financial data of the firms for the period under con-
sideration, all the 30 financial ratios which are included in the financial
model base of the FINCLAS system (FINancial CLASsification; Zopouni-
dis and Doumpos, 1998) were computed as an initial set of evaluation crite-
ria describing the financial position and the credit risk of the firms. Table
6.12 presents these ratios. Of course, developing a credit risk assessment
model, involving such a large amount of information, is of rather limited
practical interest. Indeed, a broad set of ratios is likely to include ratios that
provide similar information (i.e., highly correlated ratios). Such a situation
poses both practical and model development problems. At the practical level,
credit risk analysts do not feel comfortable with the examination of ratios
that provide the same information, because data collection is time consuming
and costly. Therefore, the analysis needs to be based on a compact set of ra-
tios to avoid the collection of unnecessary and overlapping data. From the
model development point of view, the consideration of correlated ratios
poses problems on the stability and the interpretation of the developed
model.
To avoid these problems a factor analysis was initially performed.
Through this process nine factors were obtained. Table 6.13 presents the fac-
tor loadings of each financial ratio.

It was decided to include in the analysis the ratios with factor loadings
greater than 0.7 (in absolute terms). Therefore, the ratios net income/sales,
net worth/total liabilities, current assets/current liabilities, quick
assets/current liabilities, cash/current liabilities, dividends/cash flow,
working capital/total assets and current liabilities/inventories were selected.
These ratios cover the profitability, the solvency and liquidity, and the
managerial performance of the firms (Courtis, 1978).
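A sketch of this screening step is given below; the data matrix and ratio names are placeholders, and with real data one would typically inspect rotated factor loadings before applying the 0.7 cut-off.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# ratios: 60 firms x 30 financial ratios (placeholder data)
ratios = np.random.rand(60, 30)
ratio_names = [f"ratio_{j + 1}" for j in range(30)]

fa = FactorAnalysis(n_components=9).fit(ratios)
loadings = fa.components_.T          # 30 ratios x 9 factors

# keep the ratios whose largest absolute loading on some factor exceeds 0.7
selected = [name for name, row in zip(ratio_names, loadings)
            if np.max(np.abs(row)) > 0.7]
print(selected)
```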
In addition to the above ratios selected through factor analysis, in a sec-
ond stage it was decided to incorporate in the analysis some additional fi-
nancial ratios which are usually considered as important factors in the as-
sessment of credit risk, in order to have a more complete description of the
firms’ credit risk and financial performance. Through a cooperative process
and discussion with the credit managers of the bank, four additional financial
ratios were selected at this stage: the profitability ratios earnings before
interest and taxes/total assets and net income/net worth, the solvency
ratio total liabilities/total assets and the managerial performance ratio
interest expenses/sales. The final set of the selected ratios is presented
in Table 6.14.

Most of the selected financial ratios are negatively related to credit risk
(i.e., higher values indicate lower credit risk). Only the ratios total liabili-
ties/total assets, interest expenses/sales and current liabili-
ties/inventories are positively related to credit risk (i.e., higher values indi-
cate higher credit risk).
Tables 6.15-6.17 present some descriptive statistics regarding the two
groups of firms (i.e., low credit risk and high credit risk) with regard
to the selected financial ratios. In particular, Table 6.15 presents the means
of the financial ratios, Table 6.16 presents the skewness and kurtosis coeffi-
cients, whereas Table 6.17 presents the correlation coefficients between the
selected financial ratios.

From the results of the descriptive statistical analysis it is interesting to


observe that most financial ratios significantly differentiate the two groups
of firms (cf. Table 6.15). The only exceptions involve the ratios cash/current
liabilities and current liabilities/inventories, which are not found
statistically significant in any of the three years of the analysis. It should also
be noted that the skewness and kurtosis are similar for the two groups of
firms for many ratios (cf. Table 6.16), while their changes over time are in
many cases limited.
On the basis of the considered sample of firms, the credit risk model de-
velopment and validation process is similar to the one used in the bankruptcy
prediction case study discussed earlier in section 2 of this chapter. In particu-
lar, the data of the firms in the sample for the most recent year (year 1995)
are used as the reference set for the development of appropriate credit risk
assessment models that distinguish the low risk firms from the high risk
ones. In a second stage the developed models are applied on years 1994 and
1993 to test their ability in providing reliable early-warning signals of the
credit risk level of firms. Following this analysis framework the subsequent
sub-sections present in detail the obtained results for all methods.

3.3 The developed models


3.3.1 The UTADIS model

Using the data of the firms for the most recent year (year 1995) as the refer-
ence set, the application of the UTADIS method (HEUR2) led to the devel-
opment of the following additive utility function as the appropriate credit
risk assessment model:

Figure 6.3 presents the form of the marginal utility functions in the
above additive utility model.

The weighting coefficients of the marginal utility functions in the credit risk assessment model (6.2) of the UTADIS method indicate that the most
significant ratios for assessing credit risk are the profitability ratios net income/net worth and net income/sales. The weights for most of the other ratios are similar, ranging between 6% and 9%. Only the ratios
cash/current liabilities and dividends/cash flow have weights
lower than 6% and consequently they can be considered as the least signifi-
cant factors for determining the level of credit risk.
On the basis of the developed additive utility credit risk assessment
model of the UTADIS method, the distinction between low and high risk
firms is performed through the following rules (the cut-off point 0.6198 is
estimated by the method during the model development process):

If the global utility of a firm is greater than or equal to 0.6198, then the firm is classified as a low risk firm.

If the global utility of a firm is lower than 0.6198, then the firm is classified as a high risk firm.
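To illustrate how such an additive utility model is applied in practice, the following minimal sketch (in Python) implements the classification rule above. The criteria subset, weights and marginal utility break-points are hypothetical placeholders rather than the estimates of model (6.2); only the cut-off point 0.6198 is taken from the text.

import numpy as np

# Hypothetical piece-wise linear marginal utilities: (break-points, utilities in [0, 1]).
# Only the cut-off point below comes from the estimated UTADIS model; everything
# else is an illustrative placeholder.
marginal_utilities = {
    "net income/net worth": ([-0.5, 0.0, 0.2, 0.6], [0.0, 0.4, 0.8, 1.0]),
    "net income/sales": ([-0.3, 0.0, 0.1, 0.4], [0.0, 0.5, 0.9, 1.0]),
    "total liabilities/total assets": ([0.2, 0.5, 0.8, 1.2], [1.0, 0.7, 0.3, 0.0]),  # decreasing criterion
}
weights = {"net income/net worth": 0.5, "net income/sales": 0.3,
           "total liabilities/total assets": 0.2}
CUT_OFF = 0.6198  # cut-off point estimated by UTADIS during model development

def global_utility(firm):
    """Weighted sum of the piece-wise linear marginal utilities of the firm's ratios."""
    total = 0.0
    for ratio, (points, utils) in marginal_utilities.items():
        total += weights[ratio] * np.interp(firm[ratio], points, utils)
    return total

def classify(firm):
    return "low credit risk" if global_utility(firm) >= CUT_OFF else "high credit risk"

print(classify({"net income/net worth": 0.15, "net income/sales": 0.05,
                "total liabilities/total assets": 0.60}))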

3.3.2 The model of the MHDIS method

Similarly to the case of the UTADIS method, the data of the firms in the sample for year 1995 are used to develop a credit risk assessment model through the MHDIS method. The resulting model consists of two additive utility functions. The former characterizes the firms of low credit risk, whereas the latter characterizes the firms of high credit risk.
Both functions are additive utility functions of the selected financial ratios.

The weighting coefficients of the marginal utilities in the above utility functions differ for each of the two groups. In particular, the main character-
istics of the low risk firms are the ratios earnings before interest and taxes/total assets, current assets/current liabilities, quick assets/current liabilities, dividends/cash flow, interest expenses/sales and current liabilities/inventories. The weights of these ratios in the corresponding utility function are 15.74%, 11.58%, 14.38%, 14.89%, 16.21% and 15.26% respectively. On the other hand, the ratios that best describe the high risk firms are the ratios net income/net worth, net income/sales and net worth/total liabilities, with weights in the corresponding function equal to 22.42%, 29% and 18.11% respectively. The first two of these ratios were also
found significant in the UTADIS credit risk assessment model. The marginal
utility functions of all financial ratios are illustrated in Figure 6.4.

The decision regarding the classification of a firm into one of the two
considered groups (low credit risk and high credit risk) is based upon the
global utilities obtained through the two developed additive utility functions.
In that sense, a firm is considered to be a low risk firm if its global utility in the first function exceeds its global utility in the second function; otherwise the firm is classified as a firm of high credit risk.
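A minimal sketch of this pairwise assignment rule is given below; the two global utilities are assumed to be computed by the pair of additive utility functions described above, and the numerical values in the example are hypothetical.

def mhdis_classify(utility_low_risk: float, utility_high_risk: float) -> str:
    """Assign the firm to the group whose characteristic utility function scores it higher."""
    # Ties are broken here in favour of the low risk group; this convention is
    # an assumption of the sketch, not part of the reported model.
    return "low credit risk" if utility_low_risk >= utility_high_risk else "high credit risk"

print(mhdis_classify(0.72, 0.41))  # -> low credit risk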

3.3.3 The ELECTRE TRI model

The main parameters of the credit risk assessment model developed through the ELECTRE TRI method are presented in Table 6.18. Similarly to the bank-
ruptcy prediction model discussed earlier in this chapter, the developed
credit risk assessment model does not employ the discordance test of the
ELECTRE TRI method. Consequently, all the presented results and the clas-
sification of the firms are solely based on the concordance test. The classifi-
cation is performed as follows:

If the concordance index of the comparison of a firm with the reference profile is at least equal to the cut-off level, then the firm is classified as a low credit risk firm.

If the concordance index is lower than the cut-off level, then the firm is classified as a high credit risk firm.
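Since only the concordance test is used here, the assignment can be sketched as follows. The partial concordance index is the standard ELECTRE TRI one; the weights, profile values, thresholds and cut-off level in the example are hypothetical placeholders for the estimates reported in Table 6.18.

def partial_concordance(value, profile, q, p, increasing=True):
    """Standard ELECTRE TRI partial concordance index in [0, 1]."""
    diff = (value - profile) if increasing else (profile - value)
    if diff >= -q:
        return 1.0
    if diff <= -p:
        return 0.0
    return (p + diff) / (p - q)

def global_concordance(firm, profile, weights, q, p, increasing):
    """Weighted average of the partial concordance indices over all ratios."""
    total_weight = sum(weights.values())
    return sum(weights[r] * partial_concordance(firm[r], profile[r], q[r], p[r], increasing[r])
               for r in firm) / total_weight

# Hypothetical two-ratio example (the estimated parameters are those of Table 6.18).
firm       = {"net income/net worth": 0.12, "total liabilities/total assets": 0.55}
profile    = {"net income/net worth": 0.10, "total liabilities/total assets": 0.60}
weights    = {"net income/net worth": 0.6,  "total liabilities/total assets": 0.4}
q          = {"net income/net worth": 0.02, "total liabilities/total assets": 0.05}  # indifference thresholds
p          = {"net income/net worth": 0.05, "total liabilities/total assets": 0.10}  # preference thresholds
increasing = {"net income/net worth": True, "total liabilities/total assets": False}
CUT_OFF_LEVEL = 0.75  # hypothetical cut-off level

concordance = global_concordance(firm, profile, weights, q, p, increasing)
print("low credit risk" if concordance >= CUT_OFF_LEVEL else "high credit risk")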

The estimated weights of the financial ratios in the ELECTRE TRI model indicate that the most significant factors for assessing corporate credit risk include the ratios net income/net worth, net income/sales, current assets/current liabilities and dividends/cash flow. These results have some similarities with the conclusions drawn from the models developed through the UTADIS and the MHDIS methods. In particular, the ratios net income/net worth and net income/sales were found significant in the UTADIS model. The same ratios were found significant in characterizing the firms of high credit risk according to the model of the MHDIS method. Furthermore, the ratios current assets/current liabilities and dividends/cash flow were found significant in describing the firms of low credit risk
according to the credit risk model of MHDIS.

3.3.4 The rough set model

The credit risk assessment model developed through the rough set approach
consists of only three simple decision rules, presented in Table 6.19.

The first rule covers all the low risk firms included in the reference set (year 1995). Its condition part considers two ratios, namely net income/sales and working capital/total assets. The former ratio was
found significant in all the credit risk models developed through UTADIS,
MHDIS and ELECTRE TRI. This result indicates that the net profit margin
(net income/sales) is indeed a decisive factor in discriminating the low risk
firms from the high risk ones. Rules 2 and 3 describe the firms of high credit
risk. It is interesting to note that these rules are actually the negation of rule
1 that describes the low risk firms. Therefore, the above rule set actually
consists of only one rule, that is rule 1. If rule 1 is fulfilled then it is con-
cluded that the firm under consideration is of low credit risk, otherwise it is a
high risk firm.
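The compact rule set can therefore be applied with a trivial check of two conditions, as in the sketch below. The two thresholds are hypothetical placeholders; the actual condition values are those reported in Table 6.19.

# Hypothetical thresholds standing in for the actual rule conditions of Table 6.19.
NET_INCOME_TO_SALES_MIN = 0.03
WORKING_CAPITAL_TO_ASSETS_MIN = 0.10

def classify(net_income_to_sales: float, working_capital_to_assets: float) -> str:
    # Rule 1: both conditions satisfied -> low credit risk.
    # Rules 2-3 (the negation of rule 1) -> high credit risk.
    if (net_income_to_sales >= NET_INCOME_TO_SALES_MIN
            and working_capital_to_assets >= WORKING_CAPITAL_TO_ASSETS_MIN):
        return "low credit risk"
    return "high credit risk"

print(classify(0.05, 0.22))  # -> low credit risk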
The major advantage of the credit risk assessment model developed through the rough set approach is its compactness, in terms of the information required to implement it. The analyst using this model needs to consider only two ratios. Therefore, credit risk assessment decisions can be taken in a very short time without
requiring the use of any specialized software to implement the developed
model. This significantly reduces the time and the cost of the decisions taken
by the credit analysts.

3.3.5 The models of the statistical techniques

The application of LDA, QDA and LA to the sample used for credit risk model development led to the estimation of three credit risk assessment
models. The form of these models is similar to the one of the bankruptcy
prediction models presented in sub-section 2.3.5 earlier in this chapter. Ta-
bles 6.20 and 6.21 present the parameters of these models (i.e., discriminant
coefficients, constant terms and cross-product term of the QDA model). The
classification of the firms as high or low risk is performed using the classifi-
cation rules discussed in sub-section 2.3.5 for the bankruptcy prediction case
and consequently these rules are not repeated here.
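As an illustration of how models of this form are applied, the sketch below computes a linear discriminant score and assigns the firm according to its sign. The coefficients, the constant term and the zero cut-off are hypothetical; the actual classification rules are the ones of sub-section 2.3.5 with the parameters of Tables 6.20 and 6.21.

# Hypothetical coefficients and constant term (the estimated values are those of Table 6.20).
coefficients = {"net income/net worth": 2.1, "total liabilities/total assets": -1.4}
constant = 0.3

def discriminant_score(firm):
    """z = constant + sum of coefficient * ratio value."""
    return constant + sum(coefficients[r] * firm[r] for r in coefficients)

firm = {"net income/net worth": 0.12, "total liabilities/total assets": 0.55}
print("low credit risk" if discriminant_score(firm) >= 0 else "high credit risk")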

3.4 Comparison of the credit risk assessment models


The detailed classification results for all the credit risk models described in the previous sub-section are presented in Table 6.22. Similarly to the bankruptcy prediction case, the results involve the two types of error rates, i.e., the type I and the type II error rates. In the credit risk assessment problem, a type I error corresponds to the classification of a high risk firm as a low risk one, resulting in a potential capital loss. On the other hand, a type II error corresponds to the classification of a low risk firm as a high risk one, resulting in a potential opportunity cost. The discussion on the contribution of
these two error types in estimating the overall error presented earlier in sub-
section 2.4 of this chapter for the bankruptcy prediction case, is still valid for
the credit risk assessment problem due to the similarity of the two problems.
Therefore, following the arguments made in sub-section 2.4 earlier in this
chapter, the overall error rate is estimated as the average of the type I and the
type II error rates, assuming that both contribute equally to the estimation of
the overall error of a credit risk assessment model.
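Expressed as a formula, this equal-weighting scheme is simply
\[ E = \tfrac{1}{2}\,(E_{\mathrm{I}} + E_{\mathrm{II}}), \]
where \(E_{\mathrm{I}}\) and \(E_{\mathrm{II}}\) denote the type I and the type II error rates respectively (the symbols are introduced here only for convenience).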
In analyzing the obtained results, recall that the data of the firms for the
year 1995 served as the reference set for the development of the credit risk
assessment models. Consequently, it is not surprising that the models of
most of the methods do not misclassify any firm. Only the models of LDA
and QDA provide a small overall error rate of 5%.
Since the results obtained for the reference set are only indicative of the
fitting ability of the models, the results in years 1994 and 1993 are of major
interest. These results are obtained by applying the developed models on the
data of the firms in the corresponding two years to perform their classification and then comparing the models' classification with the a priori known
classification of the firms in the two credit risk classes. Thus, the obtained
results are indicative of the ability of the models to provide accurate early-warning estimations of the credit risk of the firms.

A close examination of the results indicates that the credit risk assess-
ment models developed through the three MCDA classification methods
(UTADIS, MHDIS and ELECTRE TRI) are more effective compared to the
models of the other methods. In particular, in terms of the overall error rate,
the UTADIS model outperforms the models of rough sets, LDA, QDA and
LA in both years 1994 and 1993. Similarly, the MHDIS model outperforms
the credit risk models of the three statistical methods in both 1994 and 1993.
Compared to the rough set model, MHDIS provides the same performance in 1994, but in 1993 its overall error rate is significantly lower than that of the rough set model. Similar conclusions can also be derived for the performance of
the ELECTRE TRI credit risk assessment model as opposed to the rough set
model and the models of the three statistical methods. The comparison of the
three MCDA models to each other shows that UTADIS provides the best
result in 1994, but its performance in 1993 deteriorates significantly com-
pared both to MHDIS and ELECTRE TRI. On the other hand, MHDIS
seems to provide more robust results, since its overall error rate deteriorates
more slowly from 1994 to 1993 compared to the two other MCDA models.
Another interesting point that needs to be noted in the obtained results is
that the credit risk models developed through the four non-parametric classi-
fication techniques (UTADIS, MHDIS, ELECTRE TRI and rough sets) pro-
vide significantly lower type I error rates in both 1994 and 1993 compared to
the models of the three statistical techniques. This fact indicates that these
models are able to identify the high risk firms with a higher success rate than
the statistical methods. This finding has significant practical implications for
the selection of the appropriate credit risk assessment model, since analysts often feel more comfortable with models that are efficient in identifying the firms of high credit risk.
A further analysis of the practical usefulness of these credit risk assess-
ment models could be performed through the comparison of their results
with the corresponding error rates that are obtained by the expert credit ana-
lysts of the bank from which the data have been derived. Obviously, a credit
risk assessment model that performs consistently worse than the actual credit
analyst cannot provide meaningful support in the credit risk assessment process. From a decision-aiding perspective such a model is not consistent with the credit analyst's evaluation judgment and therefore it is of limited practical use. On the other hand, models that are able to perform at least as well as the analysts' estimations can be considered consistent with the credit analyst's evaluation judgment; furthermore, they have the
ability to eliminate the inconsistencies that often arise in credit risk assess-
ment based on human judgment. Therefore, the incorporation of such models
in the bank’s credit risk management process is of major help, both for as-
sessing new credit applications submitted to the bank, as well as for monitor-
ing the risk exposure of the bank from its current credit portfolio. However,
the information required to perform such a comparison between the devel-
oped models’ estimations and the corresponding estimations of the bank’s
credit analysts was not available and consequently the aforementioned
analysis was solely based on the comparison between the selected classifica-
tion methods.

4. STOCK EVALUATION
4.1 Problem domain
Portfolio selection and management has been one of the major fields of in-
terest in the area of finance for almost the last 50 years. Generally stated,
portfolio selection and management involves the construction of a portfolio
of securities (stocks, bonds, treasury bills, mutual funds, repos, financial de-
rivatives, etc.) that maximizes the investor’s2 utility. The term “construction
of a portfolio” refers to the allocation of a known amount of capital to the
securities under consideration. Generally, portfolio construction can be real-
ized as a two stage process:
1. Initially, in the first stage of the process, the investor needs to evaluate
the available securities that constitute possible investment opportunities
on the basis of their future perspectives. This evaluation leads to the se-
lection of a reduced set consisting of the best securities. Considering the
huge number of securities that are nowadays traded in the international
financial markets, the significance of this stage becomes apparent. It is very difficult for the investor to manage a portfolio consisting of a large number of securities. Such a portfolio is quite inflexible, since the investor needs to gather and analyze a huge amount of daily information on the securities in the portfolio. This is a difficult and
time consuming process. Consequently, it will be difficult to update the portfolio in order to adjust to the rapidly changing market conditions.
Furthermore, a portfolio consisting of many securities imposes increased
trading costs which are often a decisive factor in portfolio investment
decisions. Therefore, a compact set of securities needs to be formed for
portfolio construction purposes.
2. Once this compact set of the best securities is specified after the evalua-
tion in the first stage, the investor needs to decide on the allocation of the
available capital to these securities. The allocation should be performed
so that the resulting portfolio best meets the investor’s policy, goals and
objectives. Since these goals/objectives are often diversified in nature
(some are related to the expected return, whereas some are related to the
risk of the portfolio), the resulting portfolio cannot be an optimal one, at
least in the sense that the term “optimal” has in the traditional optimiza-
tion framework where the existence of a single objective is assumed. Instead, the constructed portfolio will be a satisfying one, i.e., a portfolio that meets in a satisfactory way (but not necessarily optimally) all the goals and objectives of the investor.
2 The term “investor” refers both to individual investors as well as to institutional investors,
such as portfolio managers and mutual funds managers. Henceforth, the term “investor” is
used to refer to anyone (individual, firm or organization) who is involved with portfolio
construction and management.

The implementation of the above two stage process is based on the clear
specification of how the terms “best securities” and “satisfying portfolio” are
defined. The theory of financial markets assumes that the investor’s policy
can be represented through a utility function of some unknown form. This
function is implicitly used by the investor in his/her decision making proc-
ess. The pioneer of modern portfolio theory, Harry Markowitz, assumed that
this unknown utility function is a function of two variables/criteria: (1) the
expected return of the portfolio and (2) the risk of the portfolio (Markowitz,
1952, 1959). These two criteria define the two main objectives of the portfo-
lio selection and management process, i.e.: (1) to maximize the expected
return and (2) to minimize the risk of the investment. Markowitz proposed
two well-known statistical measures for considering the return and the risk
of a portfolio. In particular, he used the average as a tool to estimate the ex-
pected return and the variance as a measure of risk. Within this modeling
framework, Markowitz proposed the use of a quadratic programming formu-
lation in order to specify an efficient portfolio that minimizes the risk (vari-
ance) for a given level of return.
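In its standard textbook form (the notation below is generic and not taken from this study), the quadratic program referred to above can be written as
\[ \min_{w}\; w^{\top}\Sigma\, w \quad \text{subject to} \quad \mu^{\top}w \ge R, \quad \mathbf{1}^{\top}w = 1, \quad w \ge 0, \]
where \(w\) is the vector of portfolio weights, \(\Sigma\) the covariance matrix of the securities' returns, \(\mu\) the vector of expected returns and \(R\) the required level of expected portfolio return.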
The mean-variance model of Markowitz provided the basis for extending
the portfolio selection and management process over a wide variety of as-
pects. Typical examples of the extensions made to the Markowitz mean-variance model include single and multi-index models, average correlation
models, mixed models, utility models, as well as models based on the con-
cepts of geometric mean return, safety first, stochastic dominance, skewness,
etc. A comprehensive review of all these approaches is presented in the book
of Elton and Gruber (1995), whereas Pardalos et al. (1994) provide a review
on the use of optimization techniques in portfolio selection and management.
Generally, the existing research on the portfolio selection and manage-
ment problem can be organized into three major categories:
1. The studies focusing on the securities’ risk/return characteristics. These
studies are primarily conducted by financial researchers in order to spec-
ify the determinants of risk and return in securities investment decisions.
The most well known examples of studies within this category include
Sharpe’s study on the capital asset pricing model (CAPM; Sharpe,
1964), Ross’ study on the arbitrage pricing theory (APT; Ross, 1976)
and the Black-Scholes study on option valuation (Black and Scholes,
1973).

2. The studies focusing on the development of methodologies for evaluating the performance of securities according to different performance
measures. These studies can be further categorized into two groups:
The first group includes studies on the modeling and representation
of the investor’s policy, goals and objectives in a mathematical
model, usually of a functional form. This model aggregates all the
pertinent factors describing the performance of the securities to pro-
duce an overall evaluation of the securities that complies with the
policy of the investor. The securities with the highest overall evaluation according to the developed model are selected for portfolio construction purposes in a later stage of the analysis. The developed model usually has the form of a utility function following the general framework of portfolio theory, according to which the investor
is interested in constructing a portfolio that maximizes his utility.
Thus, making explicit the form of this utility function contributes
significantly to the portfolio selection and management process,
both as a security evaluation mechanism as well as for portfolio
construction.
MCDA researchers have been heavily involved with this line of
research. Some characteristic studies employing MCDA methods to
model the investor’s policy include those of Saaty et al. (1980),
Rios-Garcia and Rios-Insua (1983), Evrard and Zisswiller (1983),
Martel et al. (1988), Szala (1990), Khoury et al. (1993), Dominiak
(1997), Hurson and Ricci (1998), Zopounidis (1993), Hurson and
Zopounidis (1995, 1996, 1997), Zopounidis et al. (1999). A compre-
hensive review of the use of MCDA techniques in the field of port-
folio selection and management is presented in the book of Hurson
and Zopounidis (1997) as well as in the studies of Spronk and
Hallerbach (1997) and Zopounidis (1999).
The second group involves studies on the forecasting of securities’
prices. The objective of this forecasting-based approach is to de-
velop models that are able to provide accurate predictions on the fu-
ture prices of the securities. Given that reliable predictions can be
obtained from historical time-series data, the investor can select the
securities with the highest anticipated future upward trend in their
price. These securities are then considered for portfolio construction
purposes.
The development of such forecasting models is traditionally a
major field of interest for researchers in econometrics and statistics.
Nevertheless, recently the interest in the use of artificial intelligence
techniques has significantly increased. This is mainly due to the
flexibility of these techniques in modeling and representing the
complexity that describes securities' price movements and the highly non-linear behavior of the financial markets. Some examples based on this new approach include neural networks (Wood and
Dasgupta, 1996; Trippi and Turban, 1996; Kohara et al., 1997;
Steiner and Wittkemper, 1997), machine learning (Tam et al., 1991;
John et al., 1996), expert systems (Lee et al., 1989; Lee and Jo,
1999; Liu and Lee, 1997), fuzzy set theory (Wong et al., 1992; Lee
and Kim, 1997) and rough sets (Jog et al., 1999). With regard to the
contribution of these new techniques in portfolio selection and man-
agement, it is important to note that their use is not solely restricted
on the academic research, but they are also often used in the daily
practice of investors worldwide.
3. The studies on the development of methodologies for portfolio construc-
tion. These methodologies follow an optimization perspective, usually
in a multiobjective context. This complies with the nature of the portfo-
lio construction problem. Indeed, portfolio construction is a multiobjec-
tive optimization problem, even if it is considered in the mean-variance
framework of Markowitz. Within this framework, the investor is inter-
ested in constructing a portfolio that maximizes the expected return and
minimizes the risk of the investment. This is a two-objective optimiza-
tion problem. Furthermore, considering that actually both return and risk
are multidimensional, it is possible to extend the traditional mean-
variance framework so that all pertinent risk and return factors are con-
sidered. For instance, risk includes both systematic and non-systematic
risk. The traditional mean-variance framework considers only the non-
systematic risk, while within an extended framework the systematic risk
(beta coefficient) can also be considered (e.g., construction of a portfo-
lio with a pre-specified beta). Such an extended optimization frame-
work can consider any goal/objective as perceived by the investor and
not necessarily following a probabilistic approach such as the one of the
mean-variance model. Actually, as noted by Martel et al. (1988), meas-
uring risk and return in a probabilistic context does not always comply
with the investors’ perception of these two key concepts. This finding
motivated several researchers to introduce additional goals/objectives in
the portfolio construction process (e.g., marketability, dividend yield,
earning per share, price/earnings, etc.).
Following this line of research, the construction of portfolios within
this extended optimization framework, can be performed through mul-
tiobjective mathematical and goal programming techniques. Some typi-
cal studies following this approach have been presented by Lee and
Chesser (1980), Nakayama et al. (1983), Rios-Garcia and Rios-Insua
(1983), Colson and De Bruyn (1989), Tamiz et al. (1997), Zopounidis et
al. (1998), Hurson and Zopounidis (1995, 1996, 1997), Bertsimas et al.
(1999), Zopounidis and Doumpos (2000d).
The use of classification techniques in the two-stage portfolio construc-
tion process discussed in the beginning of this sub-section can be realized
during the first stage and it can be classified in the second group of studies
mentioned above. The use of a classification scheme is not an uncommon
approach to practitioners who are involved with security evaluation. For in-
stance, in the case of stock evaluation most investment analysts and financial
institutions periodically announce their estimations on the performance of
the stocks in the form of recommendations such as “strong buy”, “buy”,
“market perform”, etc. Smith (1965) first used a classification method
(LDA) in order to develop a model that can reproduce such experts' recommendations. A similar study was conducted by White (1975). Some more re-
cent studies such as the ones of Hurson and Zopounidis (1995, 1996, 1997),
Zopounidis et al. (1999) employ MCDA classification methods including
ELECTRE TRI and UTADIS for the development of stock classification
models considering the investor’s policy and preferences.
Of course, apart from the evaluation and classification on the basis of experts' judgments, other classification schemes can also be considered. For
instance, Klemkowsky and Petty (1973) used LDA to develop a stock classi-
fication model that classified stocks into risk classes on the basis of their his-
torical return volatility. Alternatively, it is also possible to consider a classi-
fication scheme where the stocks are classified on the basis of their expected
future return (e.g., stocks that will outperform the market, stocks that will
not outperform the market, etc.). Jog et al. (1999) adopted this approach and
used the rough set theory to develop a rule-based model that used past data
to classify the stocks into classes according to their expected future return, as
top performers (stocks with the highest future return), intermediate stocks,
low performers (stocks with the lowest future return). A similar approach
was used by John et al. (1996) who employed a machine learning methodol-
ogy, whereas Liu and Lee (1997) developed an expert system that provides
buy and sell recommendations (a two-group classification scheme) on the
basis of technical analysis indicators for the stocks (Murphy, 1995).
The results obtained through such classification models can be integrated
in a later stage of the analysis with an optimization methodology (goal pro-
gramming, multiobjective programming) to perform the construction of the
most appropriate portfolio.

4.2 Data and methodology


Following the framework described in the above brief review, the applica-
tion in this sub-section involves the development of a stock evaluation model
that classifies the stocks into classes specified by an expert stock market ana-
lyst. The development of such a model has both research and practical impli-
cations for at least two reasons:
The model can be used by stock market analysts and investors in their
daily practice as a supportive tool for the evaluation of stocks on the ba-
sis of their financial and stock market performance. This reduces signifi-
cantly the time and cost of the analysis of financial and stock market data
on a daily basis.
If the developed model has a specific quantitative form (utility function,
discriminant function, outranking relation, etc.) it can be incorporated in
the portfolio construction process. Assuming that the developed model is
a stock performance evaluation mechanism representing the judgment
policy of an expert stock market analyst, then the construction of a port-
folio that achieves the best performance according to the developed
model can be considered as an “optimal” one in the sense that it best
meets the decision-maker’s preferences.
From the methodological point of view, this application has several dif-
ferences compared to the previous two applications on bankruptcy prediction
and credit risk assessment:
1. The stock evaluation problem is considered as a multi-group classifica-
tion problem (the stocks are classified into three groups). Both bank-
ruptcy prediction and credit risk assessment were treated as two-group
problems.
2. There is an imbalance in the size of the groups in the considered sample.
In both bankruptcy prediction and credit risk assessment each group in
the samples used consisted of half the total number of firms. On the
other hand, in this application the number of stocks per group differs for
the three groups. This feature in combination with the consideration of
more than two groups increases the complexity of this application com-
pared to the two previous ones.
3. The sample used in this application involves only one period and there
is no additional holdout sample. Consequently, the model validation
techniques used in the bankruptcy prediction and in the credit risk as-
sessment cases are not suitable in this application. To tackle this prob-
lem a jackknife model validation approach is employed (McLachlan,
1992; Kahya and Theodossiou, 1999; Doumpos et al., 2001) to obtain an
unbiased estimate of the classification performance of the developed
stock evaluation models. The details of this approach will be discussed
later.
Having in mind these features, the presented stock evaluation case study
involves the evaluation of 98 stocks listed in the Athens Stock Exchange
(ASE). The data are derived from the studies of Karapistolis et al. (1996) and
Zopounidis et al. (1999). All stocks in the sample were listed in ASE during
1992, constituting 68.5% of the total number of stocks listed in ASE at
that time (143 stocks). The objective of the application is to develop a stock
evaluation model that will classify the stocks into the following three groups:
Group C1: This group consists of 9 stocks with the best investment potential in the medium/long run. These stocks are attractive to investors, while the corresponding firms are in a sound financial position and have a very positive reputation in the market.
Group C2: The second group includes 31 stocks. The overall performance and stock market behavior of these stocks is rather moderate. However, they could be used by the portfolio manager to achieve portfolio diversification.
Group C3: This is the largest group, since it includes 58 of the 98 stocks in the sample. The stocks belonging to this group do not seem to be good investment opportunities, at least for the medium and long term. The consideration of these stocks in a portfolio construction context can only be realized within a risk-prone investment policy seeking short-term profits.
This trichotomous classification approach enables the portfolio manager
to distinguish the promising stocks from the less promising ones. However,
the stocks that are found to belong to the third class (less promising stocks)
are not necessarily excluded from further consideration. Although the portfo-
lio manager is informed about their poor stock market and financial per-
formance in the long-term, he may select some of them (the best ones ac-
cording to their performance measured on the basis of the developed classi-
fication model) in order to achieve portfolio diversification or to make short-
term profits. In that sense, the obtained classification provides an essential
form of information to portfolio managers and investors; it supports the
stock evaluation procedure and leads to the selection of a limited number of
stocks for portfolio construction.
The classification of the stocks in the sample was specified by an expert
stock market analyst with experience in the ASE. The classification of the
stocks by this expert was based on the consideration of 15 financial and
stock market criteria describing different facets of the performance of the
stocks. These criteria are presented in Table 6.23.
Some of the criteria describe the stock market behavior of the stocks, whereas the remaining criteria are commonly used financial ratios similar to the ones employed in the previous two case studies. The combination of stock market
indices and financial ratios enables the evaluation of all the fundamental fea-
tures of the stocks and the corresponding firms.

For most of the criteria, the portfolio managers' preferences are increasing functions on their scale; this means that the greater the value of the criteria, the greater is the satisfaction of the portfolio manager. On the contrary, the P/E ratio criterion has a decreasing preference scale, which means that the portfolio managers' preference decreases as the value of this criterion increases (i.e., a portfolio manager would prefer a stock with a low price that could yield high earnings). Furthermore, although two of the criteria are obviously correlated, the expert portfolio manager who collaborated in this case study indicated that both criteria should be retained in the analysis.

3 Gross book value per share=Total assets/Number of shares outstanding
Capitalization ratio=1/(Price/Earning per share)
Stock market value=(Number of shares outstanding)×(Price)
Marketability=Trading volume/Number of shares outstanding
Financial position progress=(Book value at year t)/(Book value at year t-1)
Dividend yield=(Dividend paid at time t)/(Price at time t)
Capital gain=(Price at time t- Price at time t-1)/(Price at time t-1)
Exchange flow ratio=(Number of days within a year when transactions for the stock took
place)/(Number of days within a year when transactions took place in ASE)
Round lots traded per day=(Trading volume over a year)/[(Number of days within a year
when transactions took place in ASE)×(Minimum stock negotiation unit)]
Transactions value per day=(Transactions value over a year)/(Number of days within a year
when transactions for the stock took place)

Tables 6.24 and 6.25 present some descriptive statistics (group means,
skewness, kurtosis and correlations) regarding the performance of the stocks
in the sample on the considered evaluation criteria. The comparison of the
criteria averages for the three groups of stocks presented in Table 6.24 shows that the three groups have significantly different performance as far as the stock market criteria are concerned (only one criterion is found insignificant). On the contrary, the existing differences for the financial ratios are not found to be significant (except for the net income/net worth ratio).

As noted earlier the model validation techniques used in the bankruptcy prediction case study and in credit risk assessment (i.e., application of the
models in earlier years of the data or in the holdout sample) cannot be used
in this stock evaluation case. Instead, a jackknife model validation approach
is employed. This approach is implemented in five steps as follows:
Step 1: Select at random one stock from each of the three groups C1, C2 and C3. The three selected stocks form a random holdout sample. A random number generator is used to perform the selection of the stocks. For each group the random number generator produces a random integer between one and the number of stocks belonging to the group; this integer indicates the stock to be selected.
Step 2: A stock classification model is developed using as the reference set the initial sample of stocks, excluding the three stocks selected randomly at step 1.
Step 3: The model developed at step 2 is used to classify the three stocks of the holdout sample formed at step 1.
Step 4: The classification error is estimated for both the reference set and the holdout sample.
Step 5: Steps 1-4 are repeated 150 times.
The procedure provides an unbiased estimate of the classification per-
formance of the compared methods in classifying the stocks into the three
groups.
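The following sketch outlines this procedure in code form; build_and_test_model is a placeholder for the model development and error estimation performed by any of the compared methods (steps 2-4).

import random

def jackknife_validation(stocks_by_group, build_and_test_model, replications=150, seed=0):
    """Average holdout error over the replications of the jackknife procedure."""
    rng = random.Random(seed)
    holdout_errors = []
    for _ in range(replications):
        # Step 1: draw one stock at random from each group to form the holdout sample.
        holdout = {group: rng.choice(stocks) for group, stocks in stocks_by_group.items()}
        # Step 2: the reference set is the initial sample minus the three held-out stocks.
        reference = {group: [s for s in stocks if s is not holdout[group]]
                     for group, stocks in stocks_by_group.items()}
        # Steps 3-4: develop a model on the reference set, classify the holdout
        # stocks and record the resulting error rate.
        holdout_errors.append(build_and_test_model(reference, holdout))
    # Step 5 (repetition) is the loop above; the average estimates the performance.
    return sum(holdout_errors) / len(holdout_errors)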

4.3 The developed models


For each replication of the above model validation process (150 replications
overall) a different stock evaluation model is developed. Therefore, to facili-
tate the presentation of the developed models, the subsequent analysis is
based on statistical information obtained for the models’ parameters and the
classification error rates over all 150 replications.

4.3.1 The MCDA models

The most significant parameters of the models developed through UTADIS (HEUR2), MHDIS and ELECTRE TRI are the criteria weights. Since
different models are developed at each replication of the jackknife model
validation process, different estimations are obtained at each replication for
the criteria’s weights. Tables 6.26 and 6.27 present some summary statistics
for the different estimates of the stock evaluation criteria weights, including
the average weights, their standard deviation and their coefficient of variation. It should be noted that for the MHDIS method these statistics involve
all the additive utility functions that are developed. These include four utility functions, developed in two pairs. The first pair is developed at the first stage of the hierarchical discrimination process for the discrimination of the stocks belonging to group C1 from the stocks of groups C2 and C3. The former function of this pair characterizes the high performance stocks (group C1), whereas the latter characterizes the stocks of groups C2 and C3 (medium performance stocks and low performance stocks, respectively). The second pair of utility functions is developed at the second stage of the hierarchical discrimination process for the distinction between medium performance stocks and low performance stocks. The former function of this pair characterizes the medium performance stocks, whereas the latter characterizes the low performance stocks.

According to the results of Tables 6.26-6.27 there are some interesting similarities among the models of the three MCDA methods as far as the most contributing criteria that describe the performance and classification of the stocks are concerned. In particular, the results of both the UTADIS and ELECTRE TRI methods agree that the exchange flow ratio, the transactions value per day and the financial ratio net worth/total assets are significant factors for the classification of the stocks. Of course, the weights assigned by each method to these stock evaluation criteria differ, but they all lead to the above conclusion. In the MHDIS method the exchange flow ratio is found significant in discriminating between medium and low performance stocks (its weights in both utility functions of the second stage exceed 10%), whereas the transactions value per day is found significant at all stages of the hierarchical discrimination process employed in MHDIS (its weights in all four utility functions are higher than 15%).
Considering the coefficient of variation as a measure of the stability of
the criteria’s weights, it is interesting to observe that the estimates of
UTADIS and MHDIS are considerably more robust compared to the ones of ELECTRE TRI. For instance, in the case of the UTADIS method the coefficient of variation for the weights of 9 out of the 15 ratios is lower than one.
In the ELECTRE TRI results only two weight estimates have a coefficient of
variation lower than one, whereas in the MHDIS method the weight esti-
mates for all the significant ratios (ratios with average weight higher than
10%) have coefficient of variation lower than one.
Table 6.28 provides some further results with regard to the significance
of the stock evaluation criteria in the classification models developed by the
three MCDA classification methods. In particular, this table presents the
ranking of the stock evaluation criteria according to their importance in the
models of each method. The criteria are ranked from the most significant
(lowest entries in the table) to the least significant ones (highest entries in the
table). In each replication of the jackknife experiment, the criteria are ranked
in this way for each of the models developed by the three MCDA methods.
The different rankings obtained over all 150 replications of the jackknife
experiment are then averaged. The average rankings are the ones illustrated in Table 6.28. Kendall's coefficient of concordance W is also reported for
each method as a measure of the concordance in the rankings of the criteria
over the 150 replications. The values of W indicate that the rankings obtained with the UTADIS method are considerably more robust than the ones of ELECTRE TRI and MHDIS.
In addition, the significance of the transactions value per day is clearly
indicated since in all methods this criterion has one of the highest positions
in the rankings.
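For reference, the standard definition of Kendall's coefficient of concordance for \(m\) rankings of \(n\) items (here \(m = 150\) replications and \(n = 15\) criteria), ignoring the correction for ties, is
\[ W = \frac{12\,S}{m^{2}(n^{3} - n)}, \]
where \(S\) is the sum of squared deviations of the criteria's rank sums from their mean rank sum; \(W = 1\) corresponds to identical rankings over the replications, whereas values close to zero indicate little agreement.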

In addition to the criteria weights, the ELECTRE TRI models involve three additional types of parameters: the preference, indifference and veto thresholds. The corresponding results for these parameters are presented in Table 6.29 (the presented results involve the average estimates over the 150 ELECTRE TRI models developed during the replications of the jackknife procedure).
The results of Table 6.29 show that most ratios are given a veto ability when the discrimination between the high performance stocks and the medium performance stocks is performed (comparison with the first reference profile). In particular, during the 150 replications one ratio was given a veto ability in 87 replications, another ratio in 10 replications, a few ratios in 4 replications, and one more ratio in only two replications. In the case where the stocks are compared to the second reference profile in order to discriminate between the medium performance stocks and the low performance stocks (groups C2 and C3, respectively), the discordance test is used less frequently. In particular, one ratio was given a veto ability in four replications, whereas two other ratios were given a veto ability in only one replication.

4.3.2 The rough set model

The rule-based stock evaluation models developed at each replication of the jackknife process are analyzed in terms of the number of rules describing each group of stocks and the strength of the rules (i.e., the number of stocks covered by each rule). However, given that the sample is not balanced in terms of the number of stocks per group (9 stocks in C1, 31 stocks in C2 and 58 stocks in C3), it is obvious that the rules describing the smaller groups will have lower strength. To overcome this difficulty in using the traditional strength measure, defined as the number of stocks covered by each rule, a modified strength measure is employed in this application. The modified measure used is referred to as relative strength. The relative strength of a rule describing a group (i.e., a rule whose conclusion part recommends the classification into that group) is defined as the ratio of the number of stocks that belong to the group and are covered by the rule to the total number of stocks belonging to the group. The higher the relative strength of a rule corresponding to a group, the more general is the description of this group of stocks. Tables 6.30 and 6.31 present the statistics on the developed rule-based stock classification models in terms of the three aforementioned features (strength, relative strength, and number of rules). The results show that the number of rules describing the high performance stocks (group C1) is smaller compared to the number of rules corresponding to the medium and the low performance stocks (groups C2 and C3, respectively). This is of no surprise, since as noted earlier there are only nine high performance stocks in the sample and consequently only a small number of rules is required to describe them. Despite this fact, it is interesting to note that the relative strength of the rules corresponding to the high performance stocks is considerably higher than the relative strength of the rules developed for the two other groups of stocks, thus indicating that the rules of group C1 are more general compared to the other rules.
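In symbols (introduced here only for convenience), the relative strength of a rule \(r\) whose conclusion recommends group \(C_k\) can be written as
\[ RS(r) = \frac{\big|\{\text{stocks of } C_k \text{ covered by } r\}\big|}{|C_k|}, \]
so that, for instance, a rule covering all nine stocks of group C1 has a relative strength equal to one.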

Figure 6.5 presents some results derived from the developed rule-based
models with regard to the significance of the stock evaluation criteria. The
presented results involve the number of replications for which each stock
evaluation criterion was considered in the developed stock classification
rules.

The more frequently a criterion appears in the developed rules the more
significant it is considered. On the basis of this remark, the results of Figure
6.5 show that the gross book value per share ratio and the exchange flow
ratio are the most significant factors in the stock classification rules,
since they are used in all 150 rule-based models developed at each replica-
tion of the jackknife process. The former ratio was found significant in the
MHDIS models (cf. Table 6.27), while in the ELECTRE TRI models it was
often given a veto ability. On the other hand, the exchange flow ratio
was found significant by all the three MCDA classification methods. An-
other ratio that was found significant in the models developed by the MCDA methods, the transactions value per day ratio, is quite often used in the
stock classification rules constructed through the rough set approach. In par-
ticular, this ratio is used in 127 out of the 150 rule-based models that are de-
veloped. On the other hand the net worth/total assets ratio that was
found significant in the UTADIS and the ELECTRE TRI models is not
found to be significant by the rough set approach; it is used only in 19 out of
the 150 rule-based models.

4.4 Comparison of the stock evaluation models


The classification results obtained by all the above described stock evalua-
tion models as well as the corresponding results of the statistical classifica-
tion techniques (LDA and LA)4 are summarized in Table 6.32. The presented
results are averages over all 150 jackknife replications; they refer to the
holdout sample formed at each replication, which consisted of three randomly
selected stocks. Along with the classification results, Table 6.32 also pre-
sents (in parentheses) the grouping of the considered methods on the basis of
their overall error rate using the Tukey’s test at the 5% significance level.
In terms of the overall error rate UTADIS provides the best results, fol-
lowed closely by MHDIS and LA. The results of these three methods do not
differ significantly according to the grouping of the Tukey’s test. Neverthe-
less, it should be noted that UTADIS is the only method for which the over-
all error rate is lower than 30%. Furthermore, both UTADIS and MHDIS are
quite effective in classifying correctly the high performance stocks (group C1) as well as the low performance stocks (group C3). On the other hand, the classification performance of these two methods with regard to the medium performance stocks (group C2) is rather poor compared to the other
methods considered in the comparison.
In contrast to UTADIS and MHDIS, ELECTRE TRI is quite effective in
the group of medium performance stocks, both when the discordance test is
employed as well as in the case where this test is not considered (ELECTRE
TRI with veto vs. ELECTRE TRI without veto). In terms of the overall error
rate, the performance of the ELECTRE TRI models that do not consider the
discordance test is slightly inferior to the case where the discordance test is
performed, but the differences between the two cases are not statistically
significant. The results of the ELECTRE TRI models are similar to the ones
of LDA. Finally, the rough set approach seems to provide the worst results
compared to the other methods in terms of the overall error rate. One reason
for the low performance of the rough set approach could be the imbalance of the three groups of stocks in the sample (9 stocks in group C1, 31 stocks in group C2 and 58 stocks in group C3).
4 QDA was not used in the stock evaluation case study due to some high correlations between the criteria for the high performance stocks (group C1). This posed problems in the estimation of the quadratic discriminant functions' coefficients.

As far as the individual error rates are concerned, it is interesting to note that the ELECTRE TRI models that consider the discordance test do not lead to any misclassification of a high performance stock as a low performance one or of a low performance stock as a high performance one. Errors of the former type are associated with an opportunity cost for the investor/portfolio manager due to the decision not to invest in a high performance stock. On the other hand, errors of the latter type are likely to lead to capital losses in the medium/long term, since they correspond to an incorrect decision to invest in a low performance stock. The ELECTRE TRI models that do not consider the discordance test also lead to very limited errors of these two forms, and so do the models of the MHDIS method. One of these two error types is also limited in the other methods, but the other type is higher, especially in the case of LDA and LA.
Overall, the rather high error rates of all methods (28.89%-41.83%) indi-
cate the complexity of the stock evaluation problem. The dynamic nature of
the stock markets in combination with the plethora of internal and external
factors that affect stock performance as well as the huge volume of financial
and stock market information that is available to investors and stock market
analysts, all contribute to the complexity of the stock evaluation problem.
Chapter 7
Conclusions and future perspectives

1. SUMMARY OF MAIN FINDINGS


The classification problem has always been a problem of major practical and
research interest. This remark is justified by the plethora of decision making
problems that require absolute judgments/comparisons of the alternatives
with explicit or implicit reference profiles in order to decide upon the classi-
fication of the alternatives into predefined homogenous groups. Several typi-
cal examples of such problems have been mentioned throughout this book.
Most of the existing research on classification problems is focused on the
development of efficient methodologies for developing classification models
that aggregate all the pertinent factors (evaluation criteria) describing the
problem at hand so that the following two main objectives can be met:
1. The support of the decision maker in the evaluation of the alternatives in an accurate and reliable way, by providing correct recommendations on the
classification of the alternatives.
2. The analysis of the impact that the considered evaluation criteria have on
the evaluation and classification of the alternatives.
Methodologies that meet these objectives satisfactorily are of major inter-
est as tools to study complex decision making problems and provide efficient
support to decision makers.
The traditional methodologies used for developing classification models
originate from the fields of multivariate statistics and econometrics. These
methodologies have set the basis for understanding the nature of the classification problem and for the modeling and representation of classification problems in quantitative models. Nevertheless, similarly to any statistical procedure, these methodologies are based on specific statistical assumptions. The validity of these assumptions is often impossible to check in real
world problems, since in most cases the analyst cannot have a full descrip-
tion of the actual phenomenon under consideration, but only a small sample.
This restriction has motivated researchers to explore the development of
more flexible and less restrictive methodologies for developing classification
models. Towards this direction a significant part of the research has been
focused on the use of artificial intelligence techniques and operations re-
search methods.
This book followed the operations research approach and in particular the
MCDA paradigm. MCDA has evolved over the past three decades as one of
the most significant fields of operations research and decision sciences. The
book was mainly focused on the preference disaggregation approach of
MCDA. Two preference disaggregation methods were introduced to develop
classification models, namely the UTADIS and the MHDIS methods. Both
methods assume that the decision maker’s system of preferences can be rep-
resented in the form of an additive utility function that is used to decide on
the classification of the alternatives into the predefined groups. The main
advantage of both methods compared to other MCDA techniques is that the
development of the additive utility models requires only minimal informa-
tion to be specified by the decision maker. In contrast to other methods (e.g.,
ELECTRE TRI), in preference disaggregation analysis the decision maker
does not need to specify detailed preferential information in the form of cri-
teria weights, reference profiles, indifference, preference, veto thresholds,
etc. Instead, only a representative sample of the actual decisions that he takes
is needed. The decisions taken by the decision maker are the outcome of his
system of preferences and values, i.e., the outcome of the decision policy
that he employs in his daily practice. In this regard, a sample of decisions
taken on representative situations encompasses all the information required
to describe the decision maker’s system of preferences. Given that such a
sample can be formed, the model development process is then focused on the
development of a model that can reproduce the actual decisions taken by the
decision maker. This approach is analogous to the well-known regression
paradigm used in statistics and econometrics.
The use of mathematical programming techniques for model develop-
ment purposes within the above context provides increased flexibility. In
particular, the discussion in Chapter 4 shows that the use of mathematical programming enables the use of different approaches to measure the efficiency of the classification model to be developed. Apart from the increased model development flexibility, the implementation of the preference disaggregation paradigm within the context of the two proposed methods
(UTADIS and MHDIS) has the following two main advantages:

1. The parameters of the additive utility models (criteria weights and marginal utility functions) have a clear interpretation that can be understood
by the decision maker. This is a very important issue for understanding
the results and the recommendations of the developed models with re-
gard to the classification of the alternatives and the amelioration of the
model so that it is as consistent as possible with the decision maker’s
system of preferences. Actually, the model development process in the
context of the proposed MCDA methods should not be considered as a
straightforward automatic process involving the solution of an optimiza-
tion problem. Instead, the specification of the model’s parameters
through an optimization procedure is only the first stage of the model
development process. The results obtained at this first stage constitute
only an initial basis for the further calibration of the model through the
interactive communication between the decision maker and the analyst.
The implementation of this interactive process will clarify and eliminate
the possible inconsistencies in the model or even in the decision
maker’s judgments.
2. The use of the additive utility function as the modeling and representa-
tion form enables the use of qualitative criteria. Many classification
methods from the fields of statistics and econometrics but also non-
parametric classification techniques such as mathematical programming
and neural networks assume that all criteria (variables) are quantitative.
For qualitative criteria two approaches are usually employed: (a) Quanti-
fication of the qualitative scale by assigning an arbitrarily chosen real or integer value to each level of the scale (e.g., 0=low, 1=medium, 2=high).
(b) Consideration of each level of the qualitative scale as a distinct bi-
nary variable (criterion). For instance, the criterion market reputation of
the firm measured in the three level scale {good, medium, bad}, follow-
ing this second approach would be broken down into three binary crite-
ria: market reputation good={0, 1}, market reputation medium={0, 1},
market reputation bad={0, 1}, where zeros correspond to no and ones to
yes (a brief illustration of both encodings is given after this list). Both these approaches alter the nature of the qualitative criteria and
hardly correspond to the way that the decision maker perceives them. On
the other hand, the proposed MCDA methods do not require any change
in the way that the qualitative criteria are measured, and consequently
the developed classification models can easily combine quantitative and
qualitative criteria. This is an important advantage, mainly for real-world
problems where qualitative information is vital.
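The two encodings mentioned in points (a) and (b) above can be illustrated with the following sketch (the variable names are hypothetical):

market_reputation = "medium"   # qualitative scale: {bad, medium, good}

# (a) Arbitrary integer coding of the levels of the scale.
integer_code = {"bad": 0, "medium": 1, "good": 2}[market_reputation]

# (b) One binary variable (criterion) per level of the scale.
binary_code = {"market_reputation_" + level: int(level == market_reputation)
               for level in ("bad", "medium", "good")}

print(integer_code)   # 1
print(binary_code)    # {'market_reputation_bad': 0, 'market_reputation_medium': 1, 'market_reputation_good': 0}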
Of course, additive utility functions are not the only available choice for
the representation and modeling of the decision maker’s preferences in clas-
sification problems within the preference disaggregation paradigm. In Chap-
ter 5 the use of the outranking relation model was considered as the criteria aggregation mechanism for classification purposes. The methodology proposed in Chapter 5 (appendix) for the specification of the parameters of an
outranking relation classification model (criteria weights, preference, indif-
ference and veto thresholds) on the basis of the preference disaggregation
paradigm is a research direction that is worth further investigation. The ap-
proach presented in Chapter 5 is new attempt to address this issue that over-
comes some of the problems of previous similar techniques (Mousseau and
Slowinski, 1998), mainly with regard to computational complexity issues
and the modeling of the discordance test. Nevertheless, further research is
still required in order to take full advantage of the capabilities that the out-
ranking relation modeling framework provides, such as the introduction of
incomparability relation.
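To make the role of these parameters more concrete, the following sketch illustrates how criteria weights and the indifference, preference and veto thresholds enter the concordance and discordance calculations of an outranking (ELECTRE TRI-type) credibility index. It is only an illustration of the general mechanism, not the exact formulation developed in Chapter 5, and all numerical values are hypothetical.

# Sketch of the credibility index sigma(a, b) of an outranking model, for
# criteria to be maximized; weights and thresholds are hypothetical values
# of exactly the kind that a disaggregation procedure would try to infer.

def partial_concordance(ga, gb, q, p):
    # Degree to which "a is at least as good as b" holds on one criterion.
    diff = gb - ga
    if diff <= q:
        return 1.0
    if diff >= p:
        return 0.0
    return (p - diff) / (p - q)

def partial_discordance(ga, gb, p, v):
    # Degree to which the criterion vetoes the assertion "a outranks b".
    diff = gb - ga
    if diff <= p:
        return 0.0
    if diff >= v:
        return 1.0
    return (diff - p) / (v - p)

def credibility(a, b, weights, q, p, v):
    c = [partial_concordance(ai, bi, qi, pi)
         for ai, bi, qi, pi in zip(a, b, q, p)]
    d = [partial_discordance(ai, bi, pi, vi)
         for ai, bi, pi, vi in zip(a, b, p, v)]
    C = sum(w * cj for w, cj in zip(weights, c)) / sum(weights)
    sigma = C
    for dj in d:
        if dj > C:  # the discordance test weakens the outranking
            sigma *= (1.0 - dj) / (1.0 - C)
    return sigma

# An alternative compared with a category profile on two criteria.
print(credibility(a=[12.0, 7.0], b=[10.0, 9.0], weights=[0.6, 0.4],
                  q=[1.0, 1.0], p=[3.0, 3.0], v=[8.0, 8.0]))

In a disaggregation setting, it is precisely the weights and thresholds appearing as arguments above that are inferred from the assignment examples supplied by the decision maker.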
Apart from the above methodological issues, the book also focused on
more “practical” issues regarding the comparison of MCDA classification
methods to other well-known techniques. Most existing comparative studies
in the field of MCDA involve the comparison of different MCDA methods
in terms of their theoretical grounds and the kind of support that they provide
to decision makers (Siskos et al., 1984b; Roy and Bouyssou, 1986; Carmone
et al., 1997; Zanakis et al., 1998). Such studies provide valuable insight into
the peculiarities and features that underlie the operation of MCDA methods,
thus contributing to the understanding of the way that MCDA can be used to
support and ultimately improve the decision making process. Nevertheless,
there is an additional question that needs to be answered; this involves the
analysis of the relative performance of MCDA classification methods as op-
posed to other existing and widely used techniques. Such analysis is of major
practical interest. An actual decision maker is not interested merely in an approach that provides enhanced preference modeling capabilities; he or she also wants a method that addresses a complex real-world problem as effectively as possible and provides accurate recommendations.
The investigation of this issue in the present book was realized in two directions. The first involved an extensive experimental comparison of MCDA methods with other classification approaches, whereas at a second
stage the analysis was based on real-world financial data (bankruptcy predic-
tion, credit risk assessment, stock evaluation). The results obtained in both
cases can be considered as encouraging for the MCDA classification meth-
ods (UTADIS, MHDIS, ELECTRE TRI). In most cases, their classification performance was found to be superior to that of widely used techniques such as linear and quadratic discriminant analysis, logit analysis, and the non-parametric rule-based framework of the rough set approach.
2. ISSUES FOR FURTHER RESEARCH
In this book an effort was made to cover as comprehensively as possible a plethora of issues regarding the model development techniques for MCDA classification methods and their performance. Nevertheless, in the fields of classification, in general, and MCDA classification methods, in particular, there are still many topics that are worth further research and analysis.
Two of the most characteristic future research topics involve the valida-
tion of the developed models and their uniqueness. Both issues are of major
interest for MCDA methodologies that employ indirect model development
approaches. Methodologies that employ direct interrogation procedures for
model development, cope with these issues using the information that the
decision maker provides directly to the analyst during model development.
However, when employing an indirect approach for preferential elicitation
purposes, there are several issues raised. Most of the proposed methodolo-
gies rest with the development of a classification/sorting model that satisfies
some optimality criteria defined on the reference set of alternatives. Never-
theless, no matter what these optimality criteria are, it is possible that there
are other optimal or sub–optimal (near–optimal) classification/sorting mod-
els that can provide a more appropriate representation of the problem and the
decision maker’s preferences. This is because what is optimal according to
the limited information included in the reference set cannot be guaranteed to
remain optimal when complete information becomes available. In some
methods, such as the UTADIS method, this problem is addressed through
post–optimality analysis. In MHDIS the sequential solution of three mathe-
matical programming problems provides similar results. The further investi-
gation of this issue is clearly of major importance towards performing a
more thorough analysis of the information included in the reference set, thus
providing significant support to the decision maker in selecting the most ap-
propriate model not according to a pre–specified “optimality” criterion, but
rather according to his/her preferential system.
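The post-optimality analysis mentioned above can be sketched, in the spirit of UTA-type techniques and not necessarily in the exact form of the programs given in the appendix of Chapter 4, as a family of auxiliary optimization problems:

\begin{aligned}
\max\ (\text{or}\ \min)\quad & w_i, \qquad i = 1, \dots, n,\\
\text{subject to}\quad & F \le F^{*} + \varepsilon,\\
& \text{all constraints of the original model development problem,}
\end{aligned}

where $w_i$ denotes the weight of criterion $i$, $F$ the classification error function being minimized, $F^{*}$ its optimal value, and $\varepsilon$ a small positive tolerance. The characteristic near-optimal models obtained in this way can then be combined (e.g., averaged) or presented to the decision maker for the final selection.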
The above issue is closely related to another possible future research di-
rection. Since it is possible to develop many classification/sorting models
(optimal or near–optimal) from a given reference set, it would be interesting
to explore ways of combining their outcomes (classification/sorting recom-
mendations). Such combination of classification/sorting models of the same
or different forms could have a positive impact on the accuracy of the classi-
fication/sorting decisions taken through these models. This issue has been
widely studied by researchers in the field of machine learning through the
development of voting algorithms (Breiman, 1996; Quinlan, 1996; Jelanek
and Stefanowski, 1998). The consideration of similar approaches for MCDA
classification/sorting models is also worth investigating.
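As a minimal illustration of this idea (a sketch of ours, not a method described in the book), the class recommendations of several models developed from the same reference set could be combined by simple majority voting:

# Minimal sketch of majority voting over the class recommendations of
# several (near-)optimal classification models; all names are hypothetical.

from collections import Counter

def combine_by_voting(models, alternative):
    # Each "model" is any callable mapping an alternative to a class label;
    # ties are broken arbitrarily by Counter.most_common.
    votes = Counter(model(alternative) for model in models)
    return votes.most_common(1)[0][0]

# Three hypothetical models assigning an alternative to class C1 or C2.
models = [lambda a: "C1", lambda a: "C2", lambda a: "C1"]
print(combine_by_voting(models, alternative="firm_X"))  # prints "C1"

More elaborate schemes could weight each model by its accuracy on the reference set, along the lines of the voting algorithms cited above.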
Special treatment should also be given to the specification of the techni-
cal parameters involved in the model development process (e.g., optimality
criterion, normalization constraints imposed on the development of discrimi-
nant functions, number of break-points for the piece-wise linear formulation
of marginal utility functions in utility-based techniques, etc.). The specifica-
tion of these parameters affects both the performance of the developed classification/sorting models and their stability. Thus, investigating the way that these parameters should be specified will help eliminate a source of arbitrariness during model development, facilitating the interactive development of classification and sorting
models. Interactivity is an issue of major importance in MCDA and its fur-
ther consideration for the development of decision rules from assignment
examples is also an important topic for future research.
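To illustrate the role of one of these parameters, the sketch below (with hypothetical break-points and utility values) shows how a piece-wise linear marginal utility is evaluated; the chosen number of break-points determines how finely the criterion scale is partitioned and, therefore, how flexible and how stable the estimated function can be.

# Hypothetical piece-wise linear marginal utility on a profitability criterion:
# four break-points define three linear segments.

def marginal_utility(g, breakpoints, values):
    # Linear interpolation between consecutive break-points; constant outside.
    if g <= breakpoints[0]:
        return values[0]
    if g >= breakpoints[-1]:
        return values[-1]
    for (b0, b1), (u0, u1) in zip(zip(breakpoints, breakpoints[1:]),
                                  zip(values, values[1:])):
        if b0 <= g <= b1:
            return u0 + (u1 - u0) * (g - b0) / (b1 - b0)

print(marginal_utility(7.5, breakpoints=[0, 5, 10, 20],
                       values=[0.0, 0.10, 0.40, 1.0]))  # -> 0.25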
Apart from the above model development and validation issues, future research can also focus on the form of the criteria aggregation models that are used. In the present book, only the additive utility function and the outranking relation have been considered. As noted above, the outranking relation approach needs further investigation, so that outranking relation models that consider the incomparability between alternatives can be developed through the preference disaggregation paradigm. The way that the discordance test is performed also needs careful consideration. The methodology proposed in the appendix of Chapter 5 involved the use of the discordance test only in cases where it was found to improve the classification results. Other ways of realizing the discordance test can also be explored, so as to comply better with its nature and with the introduction of the veto ability of the criteria. With regard to the utility function framework, it is worth investigating the use of forms other than the additive one. The multiplicative
utility function is a typical example (Keeney and Raiffa, 1993) having some
interesting advantages over the additive case, mainly with regard to the
modeling of the interactions between the criteria (additive utility functions
assume that the criteria are preferentially independent; cf. Chapter 3). How-
ever, the use of the multiplicative utility function in optimization problems
such as the ones used in UTADIS and MHDIS leads to non-linear mathe-
matical programming formulations with non-linear constraints, which can be quite difficult to solve, especially for large data sets. The use of advanced optimization algorithms, such as genetic algorithms and other heuristic optimization techniques (e.g., tabu search), could be helpful at this point.
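For reference, the multiplicative form discussed by Keeney and Raiffa (1993) can be written, in the notation of the additive model used throughout the book, as

\[
1 + k\,U(\mathbf{g}) \;=\; \prod_{i=1}^{n}\bigl[1 + k\,k_i\,u_i(g_i)\bigr],
\qquad
1 + k \;=\; \prod_{i=1}^{n}(1 + k\,k_i),
\]

where $u_i$ are the marginal utility functions, $k_i$ the scaling constants, and $k > -1$ ($k \neq 0$) the solution of the second equation; when $\sum_i k_i = 1$ the form reduces to the additive model. The products of marginal utilities are the source of the non-linearities mentioned above.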
Another future research direction, of interest at both the research and the practical level, is the investigation of the similarities and differences of MCDA classification and sorting methods. Since there is an arsenal
of different MCDA classification/sorting methods, the analyst should be able
to recommend to the decision maker the most appropriate one according to
the features of the problem at hand. Providing such a recommendation
requires the examination of the circumstances under which different MCDA
models give similar or different results, as well as the relative comparison of
the classification and sorting performance of these models subject to differ-
ent data conditions.
Finally, it is important to emphasize the necessity for the development of multicriteria decision support systems that do not focus only on model development, but also incorporate all the above future research directions. Such integrated systems, employing different classification/sorting models, will provide major support to decision makers in constructing appropriate classification/sorting models for real-world decision making purposes.
References
Abad, P.L. and Banks, W.J. (1993), “New LP based heuristics for the classification problem”,
European Journal of Operational Research, 67, 88–100.
Altman, E.I. (1968), “Financial ratios, discriminant analysis and the prediction of corporate
bankruptcy”, Journal of Finance, 23, 589-609.
Altman, E.I. (1993), Corporate Financial Distress and Bankruptcy, John Wiley and Sons,
New York.
Altman, E.I. and Saunders, A. (1998), “Credit risk measurement: Developments over the last
20 years”, Journal of Banking and Finance, 21, 1721–1742.
Altman, E.I., Avery, R., Eisenbeis, R. and Stinkey, J. (1981), Application of Classification
Techniques in Business, Banking and Finance, Contemporary Studies in Economic and
Financial Analysis, Vol. 3, JAI Press, Greenwich.
Altman E.I., Hadelman, R.G. and Narayanan, P. (1977), “Zeta analysis: A new model to iden-
tify bankruptcy risk of corporations”, Journal of Banking and Finance, 1, 29–51.
Andenmatten, A. (1995), Evaluation du Risque de Défaillance des Emetteurs d’Obligations:
Une Approche par l’Aide Multicritère à la Décision, Presses Polytechniques et Universi-
taires Romandes, Lausanne.
Anderson, T.W. (1958), An Introduction to Multivariate Statistical Analysis, Wiley, New
York.
Archer, N.P. and Wang, S. (1993), “Application of the back propagation neural networks
algorithm with monotonicity constraints for two-group classification problems”, Deci-
sion Sciences, 24, 60-75.
Bajgier, S.M. and Hill, A.V. (1982), “A comparison of statistical and linear programming
approaches to the discriminant problem”, Decision Sciences, 13, 604–618.
Bana e Costa, C.A. and Vansnick, J.C. (1994), “MACBETH: An interactive path towards the
construction of cardinal value functions”, International Transactions on Operations Re-
search, 1, 489-500.
Banks, W.J. and Abad, P.L. (1991), “An efficient optimal solution algorithm for the classifi-
cation problem”, Decision Sciences, 22, 1008–1023.
Bardos, M. (1998), “Detecting the risk of company failure at the Banque de France”, Journal
of Banking and Finance, 22, 1405–1419.
Bastian, A. (2000), “Identifying fuzzy models utilizing genetic programming”, Fuzzy Sets and
Systems, 113, 333-350.
Beaver, W.H. (1966), “Financial ratios as predictors of failure”, Empirical Research in Ac-
counting: Selected Studies, Supplement to Journal of Accounting Research, 5, 179-199.
Belacel, N. (2000), “Multicriteria assignment method PROAFTN: Methodology and medical
applications”, European Journal of Operational Research, 125, 175-183.
Belton, V. and Gear, T. (1983), “On a short-coming of Saaty’s method of analytic hierar-
chies”, Omega, 11/3, 228-230.
Benayoun, R., De Montgolfier, J., Tergny, J. and Larichev, O. (1971), "Linear programming
with multiple objective function: Stem method (STEM)", Mathematical Programming,
1/3, 366-375.
Bergeron, M., Martel, J.M. and Twarabimenye, P. (1996), “The evaluation of corporate loan
applications based on the MCDA”, Journal of Euro-Asian Management, 2/2, 16-46.
Berkson, J. (1944), “Application of the logistic function to bio-assay”, Journal of the Ameri-
can Statistical Association, 39, 357-365.
Bertsimas, D., Darnell, C. and Soucy, R. (1999), “Portfolio construction through mixed-
integer programming at Grantham, Mayo, Van Otterloo and Company”, Interfaces, 29,
49-66.
Black, F. and Scholes, M. (1973), “The pricing of options and corporate liabilities”, Journal
of Political Economy, 81, 659-674.
Bliss, C.I. (1934), “The method of probits”, Science, 79, 38-39.
Boritz, J.E. and Kennedy, D.B. (1995), “Effectiveness of neural network types for prediction
of business failure”, Expert Systems with Applications, 9/4, 503-512.
Brans, J.P. and Vincke, Ph. (1985), “A preference ranking organization method”, Manage-
ment Science, 31/6, 647-656.
Breiman, L., Friedman, J.H., Olsen, R.A. and Stone, C.J. (1984), Classification and Regres-
sion Trees, Pacific Grove, California.
Carmone Jr., F.J., Kara, A. and Zanakis, S.H. (1997), “A Monte Carlo investigation of in-
complete pairwise comparison matrices in AHP”, European Journal of Operational Re-
search, 102, 538-553.
Catelani, M. and Fort, A., (2000), “Fault diagnosis of electronic analog circuits using a radial
basis function network classifier”, Measurement, 28/3, 147-158.
Casey, M., McGee, V. and Stinkey, C. (1986), “Discriminating between reorganized and liq-
uidated firms in bankruptcy”, The Accounting Review, April, 249–262.
Charnes, A. and Cooper, W.W. (1961), Management Models and Industrial Applications of
Linear Programming, Wiley, New York.
Charnes, A., Cooper, W.W. and Ferguson, R.O. (1955), “Optimal estimation of executive
compensation by linear programming”, Management Science, 2, 138-151.
Chmielewski, M.R. and Grzymala-Busse, J.W. (1996), “Global discretization of continuous
attributes as preprocessing for machine learning”, International Journal of Approximate
Reasoning, 15, 319-331.
Choo, E.U. and Wedley, W.C. (1985), “Optimal criterion weights in repetitive multicriteria
decision–making”, Journal of the Operational Research Society, 36/11, 983–992.
Clark, P. and Niblett, T. (1989), “The CN2 induction algorithm”, Machine Learning, 3, 261-
283.
Colson, G. and de Bruyn, Ch. (1989), “An integrated multiobjective portfolio management
system”, Mathematical and Computer Modelling, 12/10-11, 1359-1381.
Conway, D.G., Victor Cabot A. and Venkataramanan, M.A. (1998), “A genetic algorithm for
discriminant analysis”, Annals of Operations Research, 78, 71-82.
Cook, W.D. and Kress, M. (1991), “A multiple criteria decision model with ordinal prefer-
ence data”, European Journal of Operational Research, 54, 191-198.
Courtis, J.K. (1978), “Modelling a financial ratios categoric framework”, Journal of Business
Finance & Accounting, 5/4, 371-387.
Cronan, T.P., Glorfeld, L.W. and Perry, L.G. (1991), “Production system development for
expert systems using a recursive partitioning induction approach: An application to
mortgage, commercial and consumer lending”, Decision Sciences, 22, 812-845.
Devaud, J.M., Groussaud, G. and Jacquet-Lagrèze, E. (1980), “UTADIS: Une méthode de
construction de fonctions d’utilité additives rendant compte de jugements globaux”,
European Working Group on Multicriteria Decision Aid, Bochum.
Diakoulaki, D., Zopounidis, C., Mavrotas, G. and Doumpos, M. (1999), “The use of a prefer-
ence disaggregation method in energy analysis and policy making”, Energy–The Interna-
tional Journal, 24/2, 157-166.
Dias, L., Mousseau, V., Figueira, J. and Climaco, J. (2000), “An aggregation/disaggregation
approach to obtain robust conclusions with ELECTRE TRI”, Cahier du LAMSADE, No
174, Université de Paris-Dauphine.
Dillon, W.R. and Goldstein, M. (1978), “On the performance of some multinomial classifica-
tion rules”, Journal of the American Statistical Association, 73, 305-313.
Dimitras, A.I., Zopounidis, C. and Hurson, C. (1995), “A multicriteria decision aid method
for the assessment of business failure risk”, Foundations of Computing and Decision Sci-
ences, 20/2, 99-112.
Dimitras, A.I., Zanakis, S.H. and Zopounidis, C. (1996), “A survey of business failures with an
emphasis on prediction methods and industrial applications”, European Journal of Opera-
tional Research, 90, 487-513.
Dimitras, A.I., Slowinski, R., Susmaga, R. and Zopounidis, C. (1999), “Business failure pre-
diction using rough sets”, European Journal of Operational Research, 114, 263-280.
Dominiak, C. (1997), “Portfolio selection using the idea of reference solution”, in: G. Fandel and Th. Gal (eds.), Multiple Criteria Decision Making, Proceedings of the Twelfth International Conference, Lectures Notes in Economics and Mathematical Systems 448, Hagen, Germany, Berlin-Heidelberg, 593-602.
Doumpos, M. and Zopounidis, C. (1998), “The use of the preference disaggregation analysis in
the assessment of financial risks”, Fuzzy Economic Review, 3/1, 39-57.
Doumpos, M. and Zopounidis, C. (2001), “Developing sorting models using preference disag-
gregation analysis: An experimental investigation”, in: C. Zopounidis, P.M. Pardalos and
G. Baourakis (Eds), Fuzzy Sets in Management, Economy and Marketing, World Scien-
tific, Singapore, 51-67.
Doumpos, M., Pentaraki, K., Zopounidis, C. and Agorastos, C. (2001), “Assessing country
risk using a multi–group discrimination method: A comparative analysis”, Managerial
Finance, 27/7-8, 16-34.
Duarte Silva, A.P. and Stam, A. (1994), “Second-order mathematical programming formula-
tions for discriminant analysis”, European Journal of Operational Research, 74, 4-22.
Dubois, D. and Prade, H. (1979), “Decision-making under fuzziness”, in: M.M. Gupta, R.K.
Ragade and R.R. Yager (Eds), Advances in Fuzzy Set Theory and Applications, North-
Holland, Amsterdam, 279-302.
Duchessi, P. and Belardo, S. (1987), “Lending analysis support system (LASS): An applica-
tion of a knowledge-based system to support commercial loan analysis”, IEEE Transac-
tions on Systems, Man, and Cybernetics, 17/4, 608-616.
Duda, R.O. and Hart, P.E. (1978), Pattern Classification and Scene Analysis, John Wiley and
Sons, New York.
Dutka, A. (1995), AMA Handbook of Customer Satisfaction: A Guide to Research, Planning
and Implementation, NTC Publishing Group, Illinois.
Dyer, J.S. (1990), “A clarification of ‘Remarks on the analytic hierarchy process’”, Manage-
ment Science, 36/3, 274-275.
Elmer, P.J. and Borowski, D.M. (1988), “An expert system approach to financial analysis:
The case of S&L bankruptcy”, Financial Management, 17, 66-76.
Elton, E.J. and Gruber, M.J. (1995), Modern Portfolio Theory and Investment Analysis (5th
edition), John Wiley and Sons, New York.
Evrard, Y. and Zisswiller, R. (1982), “Une analyse des décisions d’investissement fondée sur
les modèles de choix multi-attributs”, Finance, 3/1, 51-68.
Falk, J.E. and Karlov, V.E. (2001), “Robust separation of finite sets via quadratics”, Com-
puters and Operations Research, 28, 537–561.
Fayyad, U.M. and Irani, K.B. (1992), “On the handling of continuous-valued attributes in
decision tree generation”, Machine Learning, 8, 87-102.
Fishburn, P.C. (1965), “Independence in utility theory with whole product sets”, Operations
Research 13, 28-45.
Fishburn, P.C. (1970), Utility Theory for Decision Making, Wiley, New York.
Fisher, R.A. (1936), “The use of multiple measurements in taxonomic problems”, Annals of
Eugenics, 7, 179-188.
Fleishman, A.I. (1978), “A method for simulating nonnormal distributions”, Psychometrika, 43, 521-532.
Fodor, J. and Roubens, M. (1994), Fuzzy Preference Modelling and Multicriteria Decision
Support, Kluwer Academic Publishers, Dordrecht.
Foster, G. (1986), Financial Statements Analysis, Prentice Hall, London.
Fraughnaugh, K., Ryan, J., Zullo, H. and Cox Jr., L.A. (1998), “Heuristics for efficient classi-
fication”, Annals of Operations Research, 78, 189-200.
Freed, N. and Glover, F. (1981a), “A linear programming approach to the discriminant prob-
lem”, Decision Sciences, 12, 68-74.
Freed, N. and Glover, F. (1981b), “Simple but powerful goal programming models for dis-
criminant problems”, European Journal of Operational Research, 7, 44-60.
Freed, N. and Glover, F. (1986), “Evaluating alternative linear programming models to solve
the two–group discriminant problem”, Decision Sciences, 17, 151–162.
Fritz, S. and Hosemann, D. (2000), “Restructuring the credit process: Behavior scoring for
German corporates”, International Journal of Intelligent Systems in Accounting, Finance
and Management, 9, 9-21.
Frydman, H., Altman, E.I. and Kao, D.L. (1985), “Introducing recursive partitioning for fi-
nancial classification: The case of financial distress”, Journal of Finance, XL/1, 269-291.
Gehrlein, W.V. and Wagner, B.J. (1997), Nontraditional Approaches to the Statistical Classi-
fication and Regression Problems, Special Issue in Annals of Operations Research, 74.
Gelfand, S., Ravishankar, C. and Delp, E. (1991), “An iterative growing and pruning algo-
rithm for classification tree design”, IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence, 13/2, 163–174.
Glover, F. (1990), “Improved linear programming models for discriminant analysis”, Deci-
sion Sciences, 21, 771–785.
Glover, F. and Laguna, M. (1997), Tabu Search, Kluwer Academic Publishers, Boston.
Glover, F., Keene, S. and Duea, B. (1988), “A new class of models for the discriminant prob-
lem”, Decision Sciences, 19, 269–280.
Gloubos, G. and Grammatikos, T. (1988), “The success of bankruptcy prediction models in
Greece”, Studies in Banking and Finance supplement to the Journal of Banking and Fi-
nance, 7, 37–46.
Gochet, W., Stam, A., Srinivasan, V. and Chen, S. (1997), “Multigroup discriminant analysis
using linear programming”, Operations Research, 45/2, 213-225.
Grabisch, M. (1995), “Fuzzy integrals in multicriteria decision making”, Fuzzy Sets and Sys-
tems, 69, 279-298.
Grabisch, M. (1996), “The application of fuzzy integrals in multicriteria decision making”,
European Journal of Operational Research, 89, 445-456.
Greco, S., Matarazzo, B. and Slowinski, R. (1997), “Rough set approach to multi-attribute
choice and ranking problems”, in: G. Fandel and T. Gal (Eds), Multiple Criteria Decision
Making, Springer-Verlag, Berlin, 318-329.
Greco, S., Matarazzo, B. and Slowinski, R. (1999a), “The use of rough sets and fuzzy sets in
MCDM”, in: T. Gal, T. Hanne and T. Stewart (eds.), Advances in Multiple Criteria Deci-
sion Making, Kluwer Academic Publishers, Dordrecht, 14.1-14.59.
Greco, S., Matarazzo, B., Slowinski, R. and Zanakis, S. (1999b), “Rough set analysis of in-
formation tables with missing values”, in: D. Despotis and C. Zopounidis (Eds.), Integrat-
ing Technology & Human Decisions: Bridging into the 21st Century, Vol. II, Proceedings of
the 5th International Meeting of the Decision Sciences Institute, New Technologies Edi-
tions, Athens, 1359–1362.
Greco, S., Matarazzo, B. and Slowinski, R. (2000a), “Extension of the rough set approach to
multicriteria decision support”, INFOR, 38/3, 161–196.
Greco, S., Matarazzo, B. and Slowinski, R. (2000b), “Dealing with missing values in rough
set analysis of multi-attribute and multi-criteria decision problems”, in: S.H. Zanakis, G.
Doukidis and C. Zopounidis (Eds.), Decision Making: Recent Developments and World-
wide Applications, Kluwer Academic Publishers, Dordrecht, 295–316.
Greco, S., Matarazzo, B. and Slowinski, R. (2001), “Conjoint measurement and rough sets
approach for multicriteria sorting problems in presence of ordinal data”, in: A. Colorni,
M. Paruccini and B. Roy (eds), AMCDA-Aide Multicritère à la decision (Multiple Crite-
ria Decision Aiding), EUR Report, Joint Research Centre, The European Commission,
Ispra (to appear).
Greco, S., Matarazzo, B. and Slowinski, R. (2002), “Rough sets methodology for sorting
problems in presence of multiple attributes and criteria”, European Journal of Opera-
tional Research, 138, 247-259.
Grinold, R.C. (1972), “Mathematical programming methods for pattern classification”, Man-
agement Science, 19, 272-289.
Grzymala-Busse, J.W. (1992), “LERS: A system for learning from examples based on rough
sets”, in: R. Slowinski (ed.), Intelligent Decision Support. Handbook of Applications and
Advances of the Rough Sets Theory, Kluwer Academic Publishers, Dordrecht, 3–18.
Grzymala-Busse, J.W. and Stefanowski, J. (2001), “Three discretization methods for rule
induction”, International Journal of Intelligent Systems, 26, 29-38.
Gupta, M.C. and Huefner, R.J. (1972), “A cluster analysis study of financial ratios and indus-
try characteristics”, Journal of Accounting Research, Spring, 77-95.
Gupta, Y.P., Rao, R.P. and Bagghi, P.K. (1990), “Linear goal programming as an alternative
to multivariate discriminant analysis: A Note”, Journal of Business Finance and Ac-
counting, 17/4, 593-598.
Hand, D.J. (1981), Discrimination and Classification, Wiley, New York.
Harker, P.T. and Vargas, L.G. (1990), “Reply to ‘Remark on the analytic hierarchy process’ by J.S. Dyer”, Management Science, 36/3, 269-273.
Horsky, D. and Rao, M.R. (1984), “Estimation of attribute weights from preference compari-
sons”, Management Science, 30/7, 801-822.
Hosseini, J.C. and Armacost, R.L. (1994), “The two-group discriminant problem with equal
group mean vectors: An experimental evaluation of six linear/nonlinear programming
formulations”, European Journal of Operational Research, 77, 241-252.
Hung, M.S. and Denton, J.W. (1993), “Training neural networks with the GRG2 nonlinear
optimizer”, European Journal of Operational Research, 69, 83-91.
Hurson Ch. and Ricci, N. (1998), “Multicriteria decision making and portfolio management
with arbitrage pricing theory”, in: C. Zopounidis (ed.), Operational Tools in The Man-
agement of Financial Risks, Kluwer Academic Publishers, Dordrecht, 31-55.
Hurson, Ch. and Zopounidis, C. (1995), “On the use of multi-criteria decision aid methods to
portfolio selection”, Journal of Euro-Asian Management, 1/2, 69-94.
Hurson Ch. and Zopounidis C. (1996), “Méthodologie multicritère pour l’évaluation et la
gestion de portefeuilles d’actions”, Banque et Marché 28, Novembre-Décembre, 11-23.
Hurson, Ch. and Zopounidis, C. (1997), Gestion de Portefeuille et Analyse Multicritère,
Economica, Paris.
Ishibuchi, H., Nozaki, K. and Tanaka, H. (1992), “Distributed representation of fuzzy rules
and its application to pattern classification”, Fuzzy Sets and Systems, 52, 21-32.
Ishibuchi, H., Nozaki, K. and Tanaka, H. (1993), “Efficient fuzzy partition of pattern space
for classification problems”, Fuzzy Sets and Systems, 59, 295-304.
Inuiguchi, M., Tanino, T. and Sakawa, M. (2000), “Membership function elicitation in possi-
bilistic programming problems”, Fuzzy Sets and Systems, 111, 29-45.
Jablonsky, J. (1993), “Multicriteria evaluation of clients in financial houses”, Central Euro-
pean Journal of Operations Research and Economics, 3/2, 257-264.
Jacquet-Lagrèze, E. (1995), “An application of the UTA discriminant model for the evalua-
tion of R & D projects”, in: P.M. Pardalos, Y. Siskos, C. Zopounidis (eds.), Advances in
Multicriteria Analysis, Kluwer Academic Publishers, Dordrecht, 203-211.
Jacquet-Lagrèze, E. and Siskos, J. (1978), “Une méthode de construction de fonctions d'utilité additives explicatives d'une préférence globale”, Cahier du LAMSADE, No 16,
Université de Paris-Dauphine.
Jacquet-Lagrèze, E. and Siskos, Y. (1982), “Assessing a set of additive utility functions for
multicriteria decision making: The UTA method”, European Journal of Operational Re-
search, 10, 151-164.
Jacquet-Lagrèze, E. and Siskos, J. (1983), Méthodes de Décision Multicritère, Editions
Hommes et Techniques, Paris.
Jacquet-Lagrèze, E. and Siskos, J. (2001), “Preference disaggregation: Twenty years of
MCDA experience”, European Journal of Operational Research, 130, 233-245.
Jelanek, J. and Stefanowski, J. (1998), “Experiments on solving multiclass learning problems
by n2-classifier”, in: Proceedings of the 10th European Conference on Machine Learning,
Chemnitz, April 21-24, 1998, Lecture Notes in AI, vol. 1398, Springer-Verlag, Berlin,
172-177.
Jensen, R.E. (1971), “A cluster analysis study of financial performance of selected firms”,
The Accounting Review, XLVI, January, 36-56.
Joachimsthaler, E.A. and Stam, A. (1988), “Four approaches to the classification problem in
discriminant analysis: An experimental study”, Decision Sciences, 19, 322–333.
Joachimsthaler, E.A. and Stam, A. (1990), “Mathematical programming approaches for the
classification problem in two-group discriminant analysis”, Multivariate Behavioral Re-
search, 25/4, 427-454.
Jog, V., Michalowski, W., Slowinski, R. and Susmaga, R. (1999), “The Rough Sets Analysis
and the Neural Networks Classifier: A Hybrid Approach to Predicting Stocks’ Perform-
ance”, in: D.K. Despotis and C. Zopounidis (eds.), Integrating Technology & Human De-
cisions: Bridging into the 21st Century, Vol. II, Proceedings of the 5th International Meet-
ing of the Decision Sciences Institute, New Technologies Editions, Athens, 1386-1388.
John, G.H., Miller, P. and Kerber, R. (1996), “Stock selection using RECON”, in: Y. Abu-
Mostafa, J. Moody, P. Refenes and A. Weigend (eds.), Neural Networks in Financial Engi-
neering, World Scientific, London, 303-316.
Karapistolis, D., Katos, A., and Papadimitriou, G. (1996), “Selection of a solvent portfolio
using discriminant analysis”, in: Y. Siskos, C. Zopounidis, and K. Pappis (Eds.), Man-
agement of small firms, Cretan University Editions, Iraklio, 135-140 (in Greek).
Kahya, E. and Theodossiou, P. (1999), “Predicting corporate financial distress: A time-series
CUSUM methodology”, Review of Quantitative Finance and Accounting, 13, 323-345.
Karst, O.J. (1958), “Linear curve fitting using least deviations”, Journal of the American Sta-
tistical Association, 53, 118-132.
Keasey, K. and Watson, R. (1991), “Financial distress prediction models: A review of their
usefulness”, British Journal of Management, 2, 89-102.
Keasey, K., McGuinness, P. and Short, H. (1990), “Multilogit approach to predicting corpo-
rate failure-Further analysis and the issue of signal consistency”, Omega, 18/1, 85-94.
Kelley, J.E. (1958), “An application of linear programming to curve fitting”, Journal of In-
dustrial and Applied Mathematics, 6, 15-22.
Keeney, R.L. and Raiffa, H. (1993), Decisions with Multiple Objectives: Preferences and
Value Trade-offs, Cambridge University Press, Cambridge.
Khalil, J., Martel, J-M. and Jutras, P. (2000), “A multicriterion system for credit risk rating”,
Gestion 2000: Belgian Management Magazine, 15/1, 125-146.
Khoury, N.T., Martel, J.M. and Veilleux, M. (1993), “Méthode multicritère de sélection de
portefeuilles indiciels internationaux”, L’Actualité Economique, Revue d’Analyse
Economique, 69/1, 171-190.
Klemkowsky, R. and Petty, J.W. (1973), “A multivariate analysis of stock price variability”,
Journal of Business Research, Summer.
Koehler, G.J. and Erenguc, S.S. (1990), “Minimizing misclassifications in linear discriminant
analysis”, Decision Sciences, 21, 63–85.
Kohara, K., Ishikawa, T., Fukuhara, Y. and Nakamura, Y. (1997), “Stock price prediction
using prior knowledge and neural networks”, Intelligent Systems in Accounting, Finance
and Management, 6, 11-22.
Koopmans, T.C. (1951), Activity Analysis of Production and Allocation, John Wiley and
Sons, New York.
Kodratoff, Y. and Michalski, R.S. (1990), Machine Learning: An Artificial Intelligence Ap-
proach, Volume III, Morgan Kaufmann Publishers, Los Altos, California.
Korhonen, P. (1988), “A visual reference direction approach to solving discrete multiple crite-
ria problems”, European Journal of Operational Research, 34, 152-159.
Korhonen, P. and Wallenius, J. (1988), “A Pareto race”, Naval Research Logistics, 35, 615-
623.
Kosko, B. (1992), Neural Networks and Fuzzy Systems, Prentice-Hall, Englewood Cliffs,
New Jersey.
Krzanowski, W.J. (1975), “Discrimination and classification using both binary and continu-
ous variables”, Journal of the American Statistical Association, 70, 782-790.
Krzanowski, W.J. (1977), “The performance of Fisher’s linear discriminant function under
nonoptimal conditions”, Technometrics, 19, 191-200.
Laitinen, E.K. (1992), “Prediction of failure of a newly founded firm”, Journal of Business
Venturing, 7, 323-340.
Lam, K.F. and Choo, E.U. (1995), “Goal programming in preference decomposition”, Journal
of the Operational Research Society, 46, 205-213.
Lachenbruch, P.A., Sneeringer, C. and Revo, L.T. (1973), “Robustness of the linear and
quadratic discriminant function to certain types of non-normality”, Communications in
Statistics, 1, 39-56.
Langholz, G., Kandel, A., Schneider, M. and Chew, G. (1996), Fuzzy Expert System Tools,
John Wiley and Sons, New York.
Lee, S.M. and Chesser, D.L. (1980), “Goal programming for portfolio selection”, The Journal
of Portfolio Management, Spring, 22-26.
Lee, K.C. and Kim, H.S. (1997), “A fuzzy cognitive map-based bi-directional inference
mechanism: An application to stock investment analysis”, Intelligent Systems in Account-
ing, Finance & Management, 6, 41-57.
Lee, C.K. and Ord, K.J. (1990), “Discriminant analysis using least absolute deviations”, Deci-
sion Science, 21, 86-96.
Lee, K.H. and Jo, G.S. (1999), “Expert system for predicting stock market timing using a
candlestick chart”, Expert Systems with Applications, 16, 357-364.
Lee, J.K., Kim, H.S. and Chu, S.C. (1989), “Intelligent stock portfolio management system”,
Expert Systems, 6/2, 74-85.
Lee, H., Kwak, W. and Han, I. (1995), “Developing a business performance evaluation sys-
tem: An analytic hierarchical model”, The Engineering Economist, 30/4, 343-357.
Lennox, C.S. (1999), “The accuracy and incremental information content of audit reports in
predicting bankruptcy”, Journal of Business Finance & Accounting, 26/5-6, 757-778.
Liittschwager, J.M., and Wang, C. (1978), “Integer programming solution of a classification
problem”, Management Science, 24/14, 1515-1525.
Liu, N.K. and Lee, K.K. (1997), “An intelligent business advisor system for stock invest-
ment”, Expert Systems, 14/4, 129-139.
Lofti, V., Stewart, T.J. and Zionts, S. (1992), “An aspiration-level interactive model for mul-
tiple criteria decision making”, Computers and Operations Research, 19, 677-681.
Lootsma, F.A. (1997), Fuzzy Logic for Planning and Decision Making, Kluwer Academic
Publishers, Dordrecht.
Luce, D. (1956), “Semiorders and a theory of utility discrimination”, Econometrica, 24.
Luoma, M. and Laitinen, E.K. (1991), “Survival analysis as a tool for company failure predic-
tion”, Omega, 19/6, 673-678.
Lynch, J.G. (1979), “Why additive utility models fail as descriptions of choice behavior”,
Journal of Experimental Social Phychology, 15, 397-417.
Mangasarian, O.L. (1968), “Multisurface method for pattern separation”, IEEE Transactions
on Information Theory, IT-14/6, 801-807.
Mardia, K.V. (1975), “Assessment of multinormality and the robustness of Hotelling’s T2
test”, Applied Statistics, 24, 163-171.
Markowitz, H. (1952), “Portfolio selection”, Journal of Finance, 7/1, 77-91.
Markowitz, H. (1959), Portfolio Selection: Efficient Diversification of Investments, John
Wiley and Sons, New York.
Markowski, C.A. (1990), “On the balancing of error rates for LP discriminant methods”,
Managerial and Decision Economics, 11, 235-241.
Markowski, E.P. and Markowski, C.A. (1985), “Some difficulties and improvements in ap-
plying linear programming formulations to the discriminant problem”, Decision Sci-
ences, 16, 237-247.
Markowski, C.A. and Markowski, E.P. (1987), “An experimental comparison of several ap-
proaches to the discriminant problem with both qualitative and quantitative variables”,
European Journal of Operational Research, 28, 74-78.
Martel, J.M., Khoury, N.T. and Bergeron, M. (1988), “An application of a multicriteria ap-
proach to portfolio comparisons”, Journal of the Operational Research Society, 39/7,
617-628.
Martin, D. (1977), “Early warning of bank failure: A logit regression approach”, Journal of
Banking and Finance, 1, 249-276.
Massaglia, M. and Ostanello, A. (1991), “N-TOMIC: A decision support for multicriteria
segmentation problems”, in: P. Korhonen (ed.), International Workshop on Multicriteria
Decision Support, Lecture Notes in Economics and Mathematics Systems 356, Springer-
Verlag, Berlin, 167-174.
Matsatsinis, N.F., Doumpos, M. and Zopounidis, C. (1997), “Knowledge acquisition and repre-
sentation for expert systems in the field of financial analysis”, Expert Systems with Applica-
tions, 12/2, 247-262.
McFadden, D. (1974), “Conditional logit analysis in qualitative choice behavior”, in: P. Za-
rembka (ed.), Frontiers in Econometrics, Academic Press, New York.
McFadden, D. (1980), “Structural discrete probability models derived from the theories of
choice”, in: C.F. Manski and D. McFadden (eds.), Structural Analysis of Discrete Data
with Econometric Applications, MIT Press, Cambridge, Mass.
McLachlan, G. J. (1992), Discriminant Analysis and Statistical Pattern Recognition, Wiley,
New York.
Messier, W.F. and Hansen, J.V. (1988), “Inducing rules for expert system development: An
example using default and bankruptcy data”, Management Science, 34/12, 1403-1415.
Michalski, R.S. (1969), “On the quasi-minimal solution of the general covering problem”,
Proceedings of the 5th International Federation on Automatic Control, Vol. 27, 109-129.
Mienko, R., Stefanowski, J., Toumi, K. and Vanderpooten, D. (1996), “Discovery-oriented
induction of decision rules”, Cahier du LAMSADE no. 141, Université de Paris Dau-
phine, Paris.
Moody’s Investors Service (1998), Moody’s Equity Fund Analyzer (MFA): An Analytical
Tool to Assess the Performance and Risk Characteristics of Equity Mutual Funds,
Moody’s Investors Service, New York.
Moody’s Investors Service (1999), Moody’s Sovereign Ratings: A Ratings Guide, Moody’s
Investors Service, New York.
Moody’s Investors Service (2000), Moody’s Three Point Plot: A New Approach to Mapping
Equity Fund Returns, Moody’s Investors Service, New York.
Moore, D.H. (1973), “Evaluation of five discriminant procedures for binary variables”, Jour-
nal of the American Statistical Association, 68, 399-404.
Mousseau, V. and Slowinski, R. (1998), “Inferring an ELECTRE-TRI model from assignment
examples”, Journal of Global Optimization, 12/2, 157-174.
Mousseau, V., Slowinski, R. and Zielniewicz, P. (2000), “A user-oriented implementation of
the ELECTRE-TRI method integrating preference elicitation support”, Computers and
Operations Research, 27/7-8, 757-777.
Murphy, J. (1999), Technical Analysis of the Financial Markets: A Comprehensive Guide to
Trading Methods and Applications, Prentice Hall Press, New Jersey.
Nakayama, H. and Kagaku, N. (1998), “Pattern classification by linear goal programming and
its extensions”, Journal of Global Optimization, 12/2, 111-126.
Nakayama, H., Takeguchi, T. and Sano, M. (1983), “Interactive graphics for portfolio selec-
tion”, in: P. Hansen (ed.), Essays and Surveys on Multiple Criteria Decision Making,
Lectures Notes in Economics and Mathematical Systems 209, Springer Verlag, Berlin-
Heidelberg, 280-289.
Nieddu, L. and Patrizi, G. (2000), “Formal methods in pattern recognition: A review”, Euro-
pean Journal of Operational Research, 120, 459-495.
Oh, S. and Pedrycz, W. (2000), “Identification of fuzzy systems by means of an auto-tuning
algorithm and its application to nonlinear systems”, Fuzzy Sets and Systems, 115, 205-
230.
Ohlson, J.A. (1980), “Financial ratios and the probabilistic prediction of bankruptcy”, Journal
of Accounting Research, 18, 109–131.
Oral, M. and Kettani, O. (1989), “Modelling the process of multiattribute choice”, Journal of
the Operational Research Society, 40/3, 281-291.
Östermark, R. and Höglund, R. (1998), “Addressing the multigroup discriminant problem
using multivariate statistics and mathematical programming”, European Journal of Op-
erational Research, 108, 224-237.
Pareto, V. (1896), Cours d'Economie Politique, Lausanne.
Pardalos, P.M., Sandström, M. and Zopounidis, C. (1994), “On the use of optimization mod-
els for portfolio selection: A review and some computational results”, Computational
Economics, 7/4, 227-244.
Pardalos, P.M., Siskos, Y. and Zopounidis, C. (1995), Advances in Multicriteria Analysis,
Kluwer Academic Publishers, Dordrecht.
Patuwo, E., Hu, M.Y. and Hung, M.S. (1993), “Two-group classification using neural net-
works”, Decision Sciences, 24, 825-845.
Pawlak, Z. (1982), “Rough sets”, International Journal of Information and Computer Sci-
ences, 11, 341–356.
Pawlak, Z. (1991) Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Aca-
demic Publishers, Dordrecht.
Pawlak, Z. and Slowinski, R. (1994), “Rough set approach to multi-attribute decision analy-
sis”, European Journal of Operational Research, 72, 443-459.
Peel, M.J. (1987), “Timeliness of private company reports predicting corporate failure”, In-
vestment Analysis, 83, 23-27.
Perny, P. (1998), “Multicriteria filtering methods based on concordance and non-discordance
principles”, Annals of Operations Research, 80, 137-165.
Platt, H.D. and Platt, M.B. (1990), “Development of a class of stable predictive variables: The
case of bankruptcy prediction”, Journal of Business Finance and Accounting, 17/1, 31–
51.
Press, S.J. and Wilson, S. (1978), “Choosing between logistic regression and discriminant
analysis”, Journal of the American Statistical Association, 73, 699-705.
Quinlan, J.R. (1983), “Learning efficient classification procedures and their application to
chess end games”, in: R.S. Michalski, J.G. Carbonell and T.M. Mitchell (eds.), Machine
Learning: An Artificial Intelligence Approach, Tioga Publishing Company, Palo Alto,
CA.
Quinlan, J.R. (1986), “Induction of decision trees”, Machine Learning 1, 81–106.
Quinlan J.R. (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers,
Los Altos, California.
Ragsdale, C.T. and Stam, A. (1991), “Mathematical programming formulations for the dis-
criminant problem: An old dog does new tricks”, Decision Sciences, 22, 296-307.
Rios-Garcia, S. and Rios-Insua, S. (1983), “The portfolio problem with multiattributes and
multiple criteria”, in: P. Hansen (ed.), Essays and Surveys on Multiple Criteria Decision
Making, Lectures Notes in Economics and Mathematical Systems 209, Springer Verlag,
Berlin Heidelberg, 317-325.
Ripley, B.D. (1996), Pattern Recognition and Neural Networks, Cambridge University Press,
Cambridge.
Rose, P.S., Andrews W.T. and Giroux, G.A. (1982), “Predicting business failure: A macro-
economic perspective”, Journal of Accounting and Finance, 6/1, 20-31.
Ross, S. (1976), “The arbitrage theory of capital asset pricing”, Journal of Economic Theory,
13, 343-362.
Roy, B. (1968), “Classement et choix en présence de points de vue multiples: La méthode
ELECTRE”, R.I.R.O, 8, 57-75.
Roy, B. (1985), Méthodologie Multicritère d'Aide à la Décision, Economica, Paris.
Roy, B. (1991), “The outranking approach and the foundations of ELECTRE methods”, The-
ory and Decision, 31, 49-73.
Roy, B. and Vincke, Ph. (1981), “Multicriteria analysis: Survey and new directions”, Euro-
pean Journal of Operational Research, 8, 207-218.
Roy, B. and Bouyssou D. (1986), “Comparison of two decision-aid models applied to a nu-
clear power plant sitting example”, European Journal of Operational Research, 25, 200-
215.
Rubin, P.A. (1990a), “Heuristic solution procedures for a mixed–integer programming dis-
criminant model”, Managerial and Decision Economics, 11, 255–266.
Rubin, P.A. (1990b), “A comparison of linear programming and parametric approaches to the
two–group discriminant problem”, Decision Sciences, 21, 373–386.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986), “Learning internal representation by
error propagation”, in: D.E. Rumelhart and J.L. Williams (eds.), Parallel Distributed
Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge,
Mass.
Saaty, T.L. (1980), The Analytic Hierarchy Process, McGraw-Hill, New York.
Saaty, T.L., Rogers, P.C. and Pell, R. (1980), “Portfolio selection through hierarchies”, The
Journal of Portfolio Management, Spring, 16-21.
Scapens, R.W., Ryan, R.J. and Flecher, L. (1981), “Explaining corporate failure: A ca-
tastrophe theory approach”, Journal of Business Finance and Accounting, 8/1, 1-26.
Schoner B. and Wedley, W.C. (1989), “Ambiguous criteria weights in AHP: Consequences
and solutions”, Decision Sciences, 20, 462-475.
Schoner B. and Wedley, W.C. (1993), “A unified approach to AHP with linking pins”, Euro-
pean Journal of Operational Research, 64, 384-392.
Sharpe, W. (1964), “Capital asset prices: A theory of market equilibrium under conditions of
risk”, Journal of Finance, 19, 425-442.
Sharpe, W. (1998), “Morningstar’s risk adjusted ratings”, Financial Analysts Journal,
July/August, 21-33.
Shen, L., Tay, F.E.H, Qu, L. and Shen, Y. (2000), “Fault diagnosis using rough sets theory”,
Computers in Industry 43, 61-72.
Siskos, J. (1982), “A way to deal with fuzzy preferences in multicriteria decision problems”,
European Journal of Operational Research, 10, 314-324.
Siskos, J. and Despotis, D.K. (1989), “A DSS oriented method for multiobjective linear pro-
gramming problems”, Decision Support Systems, 5, 47-55.
Siskos, Y. and Yannacopoulos, D. (1985), “UTASTAR: An ordinal regression method for
building additive value functions”, Investigação Operacional, 5/1, 39-53.
Siskos, J., Lochard, J. and Lombardo, J. (1984a), “A multicriteria decision-making methodol-
ogy under fuzziness: Application to the evaluation of radiological protection in nuclear
power plants”, in: H.J. Zimmermann, L.A. Zadeh, B.R. Gaines (eds.), Fuzzy Sets and
Decision Analysis, North-Holland, Amsterdam, 261-283.
Siskos, J., Wascher, G. and Winkels, H.M. (1984b), “Outranking approaches versus MAUT
in MCDM”, European Journal of Operational Research, 16, 270-271.
Siskos, Y., Grigoroudis, E., Zopounidis, C. and Saurais, O. (1998), “Measuring customer satis-
faction using a survey based preference disaggregation model”, Journal of Global Optimi-
zation, 12/2, 175-195.
Skogsvik, K. (1990), “Current cost accounting ratios as predictors of business failure: The
Swedish case”, Journal of Business Finance and Accounting, 17/1, 137-160.
Skowron, A. (1993), “Boolean reasoning for decision rules generation”, in: J. Komorowski
and Z. W. Ras (eds.), Methodologies for Intelligent Systems, Lecture Notes in Artificial
Intelligence vol. 689, Springer-Verlag, Berlin, 295–305.
Slowinski, R. (1993), “Rough set learning of preferential attitude in multi-criteria decision
making”, in: J. Komorowski and Z. W. Ras (eds.), Methodologies for Intelligent Sys-
tems. Lecture Notes in Artificial Intelligence vol. 689, Springer-Verlag, Berlin, 642–651.
Slowinski, R. and Stefanowski, J. (1992), “RoughDAS and RoughClass software implemen-
tations of the rough sets approach”, in: R. Slowinski (ed.), Intelligent Decision Support:
Handbook of Applications and Advances of the Rough Sets Theory, Kluwer Academic
Publishers, Dordrecht, 445-456.
Slowinski, R. and Stefanowski, J. (1994), “Rough classification with valued closeness rela-
tion”, in: E. Diday et al. (eds.), New Approaches in Classification and Data Analysis,
Springer-Verlag, Berlin, 482–488.
Slowinski, R. and Zopounidis, C. (1995), “Application of the rough set approach to evaluation
of bankruptcy risk”, International Journal of Intelligent Systems in Accounting, Finance
and Management, 4, 27–41.
Smith, C. (1947), “Some examples of discrimination”, Annals of Eugenics, 13, 272-282.
Smith, K.V. (1965), “Classification of investment securities using multiple discriminant
analysis”, Institute Paper No. 101, Institute for Research in the Behavioral, Economic
and Management Sciences, Purdue University.
Smith, F.W. (1968), “Pattern classifier design by linear programming”, IEEE Transactions on
Computers, C-17,4, 367-372.
Spronk, J. (1981), Interactive Multiple Goal Programming Application to Financial Planning,
Martinus Nijhoff Publishing, Boston.
Spronk, J. and Hallerbach, W. (1997), “Financial modeling: Where to go? With an illustration
for portfolio management”, European Journal of Operational Research, 99, 113-125.
Srinivasan, V. and Kim, Y.H. (1987), “Credit granting: A comparative analysis of classifica-
tion procedures”, Journal of Finance, XLII/3, 665–683.
Srinivasan, V. and Ruparel, B. (1990), “CGX: An expert support system for credit granting”,
European Journal of Operational Research, 45, 293-308.
Srinivasan, V. and Shocker, A.D. (1973), “Linear programming techniques for multidimen-
sional analysis of preferences”, Psychometrika, 38/3, 337–396.
Stam, A. (1990), “Extensions of mathematical programming-based classification rules: A
multicriteria approach”, European Journal of Operational Research, 48, 351-361.
Stam, A. and Joachimsthaler, E.A. (1989), “Solving the classification problem via linear and
nonlinear programming methods”, Decision Sciences, 20, 285–293.
Standard & Poor’s Rating Services (1997), International Managed Funds: Profiles, Criteria,
Related Analytics, Standard & Poor’s, New York.
Standard & Poor’s Rating Services (2000), Money Market Fund Criteria, Standard & Poor’s,
New York.
Stefanowski, J. and Vanderpooten, D. (1994), “A general two-stage approach to inducing
rules from examples”, in: W. Ziarko (ed.) Rough Sets, Fuzzy Sets and Knowledge Dis-
covery, Springer-Verlag, London, 317–325.
Steiner, M. and Wittkemper, H.G. (1997), “Portfolio optimization with a neural network im-
plementation of the coherent market hypothesis”, European Journal of Operational Re-
search, 100, 27-40.
Steuer, R.E. and Choo, E.U. (1983), “An interactive weighted Tchebycheff procedure for
multiple objective programming”, Mathematical Programming, 26/1, 326-344.
Stewart, T.J. (1993), “Use of piecewise linear value functions in interactive multicriteria deci-
sion support: A Monte Carlo study”, Management Science, 39, 1369-1381.
Stewart, T.J. (1996), “Robustness of additive value function methods in MCDM”, Journal of
Multi-Criteria Decision Analysis, 5, 301-309.
Subramanian, V., Hung, M.S. and Hu, M.Y. (1993), “An experimental evaluation of neural
networks for classification”, Computers and Operations Research, 20/7, 769-782.
Subrahmaniam, K. and Chinganda, E.F. (1978), “Robustness of the linear discriminant func-
tion to nonnormality: Edgeworth series”, Journal of Statistical Planning and Inference,
2, 79-91.
Szala, A. (1990), L'Aide à la Décision en Gestion de Portefeuille, Diplôme Supérieur de Re-
cherches Appliquées, Université de Paris Dauphine.
Tam, K.Y., Kiang, M.Y. and Chi, R.T.H. (1991), “Inducing stock screening rules for portfolio
construction”, Journal of the Operational Research Society, 42/9, 747-757.
Tamiz, M., Hasham, R. and Jones, D.F. (1997), “A comparison between goal programming
and regression analysis for portfolio selection”, in: G. Fandel and Th. Gal (eds.), Lectures
Notes in Economics and Mathematical Systems 448, Multiple Criteria Decision Making,
Proceedings of the Twelfth International Conference, Hagen, Germany, Berlin-
Heidelberg, 422-432.
Tessmer, A.C. (1997), “What to learn from near misses: An inductive learning approach to
credit risk assessment”, Decision Sciences, 28/1, 105-120.
Theodossiou, P. (1987), Corporate Failure Prediction Models for the US Manufacturing and
Retailing Sectors, Unpublished Ph.D. Thesis, City University of New York.
Theodossiou, P. (1991), “Alternative models for assessing the financial condition of business
in Greece”, Journal of Business Finance and Accounting, 18/5, 697–720.
Theodossiou, P., Kahya, E., Saidi, R. and Philippatos, G. (1996), “Financial distress and cor-
porate acquisitions: Further empirical evidence”, Journal of Business Finance and Ac-
counting, 23/5–6, 699–719.
Trippi, R.R. and Turban, R. (1996), Neural Networks in Finance and Investing, Irwin, Chi-
cago.
Tsumoto, S. (1998), “Automated extraction of medical expert system rules from clinical data-
bases based on rough set theory”, Information Sciences, 112, 67-84.
Wagner, H.M. (1959), “Linear programming techniques for regression analysis”, Journal of
the American Statistical Association, 54, 206-212.
White, R. (1975), “A multivariate analysis of common stock quality ratings”, Financial Man-
agement Association Meetings.
Wierzbicki, A.P. (1980), “The use of reference objectives in multiobjective optimization”, in:
G. Fandel and T. Gal (eds.), Multiple Criteria Decision Making: Theory and Applica-
tions, Lecture Notes in Economic and Mathematical Systems 177, Springer-Verlag, Ber-
lin-Heidelberg, 468-486.
Wilson, J.M. (1996), “Integer programming formulation of statistical classification prob-
lems”, Omega, 24/6, 681–688.
Wilson, R.L. and Sharda, R. (1994), “Bankruptcy prediction using neural networks”, Decision
Support Systems, 11, 545-557.
Wong, F.S., Wang, P.Z., Goh, T.H. and Quek, B.K. (1992), “Fuzzy neural systems for stock
selection”, Financial Analysts Journal, January/February, 47-52.
Wood, D. and Dasgupta, B. (1996), “Classifying trend movements in the MSCI U.S.A. capital
market index: A comparison of regression, ARIMA and neural network methods”, Com-
puters and Operations Research, 23/6, 611-622.
Vale, D.C. and Maurelli, V.A. (1983), “Simulating multivariate nonnormal distributions”,
Psychometrika, 48/3, 465-471.
Vargas, L.G. (1990), “An overview of the AHP and its applications”, European Journal of
Operational Research, 48, 2-8.
Von Altrock, C. (1996), Fuzzy Logic and Neurofuzzy Applications in Business and Finance,
Prentice Hall, New Jersey.
Von Neumann, J. and Morgenstern, O. (1944), Theory of Games and Economic Behavior,
Princeton, New Jersey.
Yager, R.R. (1977), “Multiple objective decision-making using fuzzy sets”, International
Journal of Man-Machine Studies, 9, 375-382.
Yandell, B.S. (1977), Practical Data Analysis for Designed Experiments, Chapman & Hall,
London.
Yanev, N. and Balev, S. (1999), “A combinatorial approach to the classification problem”, European Journal of Operational Research, 115, 339-350.
Young, T.Y. and Fu, K.-S. (1997), Handbook of Pattern Recognition and Image Processing,
Handbooks in Science and Technology, Academic Press.
Yu, W. (1992), “ELECTRE TRI: Aspects méthodologiques et manuel d'utilisation”, Document du LAMSADE No 74, Université de Paris-Dauphine.
Vranas, A.S. (1992), “The significance of financial characteristics in predicting business fail-
ure: An analysis in the Greek context,” Foundations of Computing and Decision Sci-
ences, 17/4, 257-275.
Zadeh, L.A. (1965), “Fuzzy sets”, Information and Control, 8, 338-353.
Zahedi, F. (1986), “The analytic hierarchy process: A survey of the method and its applica-
tions”, Interfaces, 16, 96-108.
Zanakis, S.H., Solomon, A., Wishart, N. and Duvlish, S. (1998), “Multi-attribute decision
making: A simulation comparison of select methods”, European Journal of Operational
Research, 107, 507-529.
Zavgren, C.V. (1985), “Assessing the vulnerability to failure of American industrial firms. A
logistic analysis”, Journal of Business Finance and Accounting, 12/1, 19–45.
Ziarko, W., Golan, D. and Edwards, D. (1993), “An application of DATALOGIC/R knowl-
edge discovery tool to identify strong predictive rules in stock market data”, in: Proceed-
ings of the AAAI Workshop on Knowledge Discovery in Databases, Washington D.C.,
89–101.
Zighed, D., Rabaseda, S. and Rakotomala, R. (1998), “FUSINTER: A method for discretisa-
tion of continuous attributes”, International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems, 6/3, 307-326.
Zionts, S. and Wallenius, J. (1976), “An interactive programming method for solving the mul-
ticriteria problem”, Management Science, 22, 652-663.
Zimmermann, H.J. (1978), “Fuzzy programming and linear programming with several objec-
tive functions”, Fuzzy Sets and Systems, 1, 45-55.
Zmijewski, M.E. (1984), “Methodological issues related to the estimation of financial distress
prediction models”, Journal of Accounting Research, 22 (Supplement), 59-82.
Zopounidis, C. (1987), “A multicriteria decision making methodology for the evaluation of
the risk of failure and an application”, Foundations of Control Engineering, 12/1, 45-67.
Zopounidis, C. (1990), La Gestion du Capital-Risque, Economica, Paris.
Zopounidis, C. (1993), “On the use of the MINORA decision aiding system to portfolio selec-
tion and management”, Journal of Information Science and Technology, 2/2, 150-156.
Zopounidis, C. (1995), Evaluation du Risque de Défaillance de l’Entreprise: Méthodes et Cas
d’Application, Economica, Paris.
Zopounidis, C. (1998), Operational Tools in the Management of Financial Risks, Kluwer Aca-
demic Publishers, Dordrecht.
Zopounidis, C. (1999), “Multicriteria decision aid in financial management”, European Jour-
nal of Operational Research, 119, 404-415.
Zopounidis, C. and Dimitras, A.I. (1998), Multicriteria Decision Aid Methods for the Predic-
tion of Business Failure, Kluwer Academic Publishers, Dordrecht.
Zopounidis, C. and Doumpos, M. (1997), “A multicriteria decision aid methodology for the
assessment of country risk”, European Research on Management and Business Econom-
ics, 3/3, 13-33.
Zopounidis, C. and Doumpos, M. (1998), “Developing a multicriteria decision support system
for financial classification problems: The FINCLAS system”, Optimization Methods and
Software, 8, 277-304.
Zopounidis, C. and Doumpos, M. (1999a), “Business failure prediction using UTADIS multic-
riteria analysis”, Journal of the Operational Research Society, 50/11, 1138-1148.
Zopounidis, C. and Doumpos, M. (1999b), “A multicriteria decision aid methodology for sort-
ing decision problems: The case of financial distress”, Computational Economics, 14/3,
197-218.
Zopounidis, C. and Doumpos, M. (2000a), “PREFDIS: A multicriteria decision support sys-
tem for sorting decision problems”, Computers and Operations Research, 27/7-8, 779-
797.
Zopounidis, C. and Doumpos, M. (2000b), Intelligent Decision Aiding Systems Based on Mul-
tiple Criteria for Financial Engineering, Kluwer Academic Publishers, Dordrecht.
Zopounidis, C. and Doumpos, M. (2000c), “Building additive utilities for multi-group hierar-
chical discrimination: The M.H.DIS method”, Optimization Methods and Software, 14/3,
219-240.
Zopounidis, C. and Doumpos, M. (2000d), “INVESTOR: A decision support system based on
multiple criteria for portfolio selection and composition”, in: A. Colorni, M. Paruccini and
B. Roy (eds.), A-MCD-A (Aide Multi Critère à la Décision – Multiple Criteria Decision
Aiding), European Commission Joint Research Centre, 371-381.
Zopounidis, C., Matsatsinis, N.F. and Doumpos, M. (1996), “Developing a multicriteria knowl-
edge-based decision support system for the assessment of corporate performance and vi-
ability: The FINEVA system”, Fuzzy Economic Review, 1/2, 35-53.
Zopounidis, C., Despotis, D.K. and Kamaratou, I. (1998), “Portfolio selection using the
ADELAIS multiobjective linear programming system”, Computational Economics, 11/3,
189-204.
Zopounidis, C., Doumpos, M. and Zanakis, S.H. (1999), “Stock evaluation using a preference
disaggregation methodology”, Decision Sciences, 30/2, 313-336.
Subject index

Arbitrage pricing theory, 206
Bankruptcy prediction, 6, 161-163
Bayes rule, 20, 68, 70
Bond rating, 160
C4.5, 28-29
Capital asset pricing model, 206
Capital losses, 223
Classification error rate, 82, 84-88
Clustering, 5
Coefficient of variation, 216-217
Compensatory approaches, 125, 149
Consistency, 99
Consistent family of criteria, 42-43
Correlation coefficient, 129, 166, 191
Country risk, 6, 161
Credit granting, 58, 185-188
Credit risk assessment, 6, 13, 185-188
Decision problematics, 1-3
Decision rules, 27-28, 31, 34-37
Decision support systems, 41, 49, 78, 187
Decision trees, 28
Default risk, 161, 186
Degeneracy, 97
Descriptive statistics, 166, 191, 213
Discriminant analysis
   Linear, 16-18
   Quadratic, 18-19
Discriminant function
   Linear, 16
   Quadratic, 18
Dividend policy, 159
Dominance relation, 37
Efficient set, 40, 46
ELECTRE TRI
   Assignment procedures, 63-64
   Concordance index, 60
   Concordance test, 60
   Credibility index, 62
   Discordance index, 62
   Discordance test, 60
   Indifference threshold, 61
   Preference threshold, 61
   Reference profiles, 59-60
   Veto threshold, 62
Error types
   Type I error, 181
   Type II error, 181
Experimental design, 126-127
Expert systems, 31, 187, 208
Factor analysis, 188
Financial management, 13, 54, 159
Financial ratios, 162-171
Financial statements, 162
FINCLAS system, 178, 188
Forecasting, 159, 207
Fuzzy sets, 30-32
Genetic algorithms, 71, 120, 187
Goal programming, 47
Group overlap, 130
ID3, 28
Incomparability relation, 51
Jackknife, 215
Kurtosis, 72
LERS system, 37
Linear interpolation, 92
Linear probability model, 20
Logit analysis
   Logit model, 20-23
   Multinomial model, 22
   Ordered model, 23
Machine learning, 27-30
Mean-variance model, 206
Mergers and acquisitions, 160
MHDIS
   Classification rule, 102
   Hierarchical discrimination, 101-105
   Marginal utility functions, 102-104
   Model extrapolation, 111-112
Mixed-integer programming, 71
Model validation, 134, 210, 215
Monotonicity, 42-43
Multiattribute utility theory, 48-49
Multicriteria decision aid, 39-55
Multi-group classification, 22, 128, 210
Multiobjective mathematical programming, 45-48
Mutual funds, 160, 205
Net present value, 186
Neural networks, 24-27
Non-compensatory approaches, 125
Opportunity cost, 85, 181
Option valuation, 206
Ordinal regression, 54
Outranking relation theory, 50-52
Portfolio theory, 160, 206, 207
PREFDIS system, 178
Preference disaggregation analysis, 52-55
Preferential independence, 49
Principal components analysis, 133
Quadratic programming, 206
Random data generation, 131-134
Rank reversal, 58-59
Reference set, 54, 82
Regression analysis, 6
Risk attitude, 79
Rough sets
   Core, 34
   Decision rules, 34-37
   Discretization, 32
   DOMLEM algorithm, 35
   Indiscernibility relation, 33
   MODLEM algorithm, 36
   Reduct, 34
   Rule induction, 34-36
   Valued closeness relation, 37
Skewness, 133
Sorting, 4
Statistical distribution, 127
Stock evaluation, 205-209
Tabu search, 71, 120, 230
Time-series, 185, 207
Trade-off, 47, 49
Training sample, 8
UTADIS
   Additive utility function, 78, 90, 94
   Classification rules, 82
   Criteria subintervals, 91, 96-97
   Marginal utility functions, 79-81, 91-92
   Piece-wise linear modeling, 96-98
   Post-optimality analysis, 98-99, 113-122
   Utility thresholds, 82
Utility functions
   Additive utility function, 48-49
   Multiplicative utility function, 55
Variance-covariance matrix, 17, 19
Venture capital, 161
Voting algorithms, 229
Weighted average model, 54, 80
Applied Optimization

1. D.-Z. Du and D.F. Hsu (eds.): Combinatorial Network Theory. 1996
ISBN 0-7923-3777-8
2. M.J. Panik: Linear Programming: Mathematics, Theory and Algorithms. 1996
ISBN 0-7923-3782-4
3. R.B. Kearfott and V. Kreinovich (eds.): Applications of Interval Computations.
1996 ISBN 0-7923-3847-2
4. N. Hritonenko and Y. Yatsenko: Modeling and Optimization of the Lifetime of Tech-
nology. 1996 ISBN 0-7923-4014-0
5. T. Terlaky (ed.): Interior Point Methods of Mathematical Programming. 1996
ISBN 0-7923-4201-1
6. B. Jansen: Interior Point Techniques in Optimization. Complementarity, Sensitivity
and Algorithms. 1997 ISBN 0-7923-4430-8
7. A. Migdalas, P.M. Pardalos and S. Storøy (eds.): Parallel Computing in Optimization.
1997 ISBN 0-7923-4583-5
8. F. A. Lootsma: Fuzzy Logic for Planning and Decision Making. 1997
ISBN 0-7923-4681-5
9. J.A. dos Santos Gromicho: Quasiconvex Optimization and Location Theory. 1998
ISBN 0-7923-4694-7
10. V. Kreinovich, A. Lakeyev, J. Rohn and P. Kahl: Computational Complexity and
Feasibility of Data Processing and Interval Computations. 1998
ISBN 0-7923-4865-6
11. J. Gil-Aluja: The Interactive Management of Human Resources in Uncertainty. 1998
ISBN 0-7923-4886-9
12. C. Zopounidis and A.I. Dimitras: Multicriteria Decision Aid Methods for the Predic-
tion of Business Failure. 1998 ISBN 0-7923-4900-8
13. F. Giannessi, S. Komlósi and T. Rapcsák (eds.): New Trends in Mathematical Pro-
gramming. Homage to Steven Vajda. 1998 ISBN 0-7923-5036-7
14. Ya-xiang Yuan (ed.): Advances in Nonlinear Programming. Proceedings of the ’96
International Conference on Nonlinear Programming. 1998 ISBN 0-7923-5053-7
15. W.W. Hager and P.M. Pardalos: Optimal Control. Theory, Algorithms, and Applica-
tions. 1998 ISBN 0-7923-5067-7
16. Gang Yu (ed.): Industrial Applications of Combinatorial Optimization. 1998
ISBN 0-7923-5073-1
17. D. Braha and O. Maimon (eds.): A Mathematical Theory of Design: Foundations,
Algorithms and Applications. 1998 ISBN 0-7923-5079-0
18. O. Maimon, E. Khmelnitsky and K. Kogan: Optimal Flow Control in Manufacturing.
Production Planning and Scheduling. 1998 ISBN 0-7923-5106-1
19. C. Zopounidis and P.M. Pardalos (eds.): Managing in Uncertainty: Theory and Prac-
tice. 1998 ISBN 0-7923-5110-X
20. A.S. Belenky: Operations Research in Transportation Systems: Ideas and Schemes
of Optimization Methods for Strategic Planning and Operations Management. 1998
ISBN 0-7923-5157-6
21. J. Gil-Aluja: Investment in Uncertainty. 1999 ISBN 0-7923-5296-3
22. M. Fukushima and L. Qi (eds.): Reformulation: Nonsmooth, Piecewise Smooth,
Semismooth and Smoothing Methods. 1999 ISBN 0-7923-5320-X
23. M. Patriksson: Nonlinear Programming and Variational Inequality Problems. A Uni-
fied Approach. 1999 ISBN 0-7923-5455-9
24. R. De Leone, A. Murli, P.M. Pardalos and G. Toraldo (eds.): High Performance
Algorithms and Software in Nonlinear Optimization. 1999 ISBN 0-7923-5483-4
25. A. Schöbel: Locating Lines and Hyperplanes. Theory and Algorithms. 1999
ISBN 0-7923-5559-8
26. R.B. Statnikov: Multicriteria Design. Optimization and Identification. 1999
ISBN 0-7923-5560-1
27. V. Tsurkov and A. Mironov: Minimax under Transportation Constraints. 1999
ISBN 0-7923-5609-8
28. V.I. Ivanov: Model Development and Optimization. 1999 ISBN 0-7923-5610-1
29. F.A. Lootsma: Multi-Criteria Decision Analysis via Ratio and Difference Judgement.
1999 ISBN 0-7923-5669-1
30. A. Eberhard, R. Hill, D. Ralph and B.M. Glover (eds.): Progress in Optimization.
Contributions from Australasia. 1999 ISBN 0-7923-5733-7
31. T. Hürlimann: Mathematical Modeling and Optimization. An Essay for the Design
of Computer-Based Modeling Tools. 1999 ISBN 0-7923-5927-5
32. J. Gil-Aluja: Elements for a Theory of Decision in Uncertainty. 1999
ISBN 0-7923-5987-9
33. H. Frenk, K. Roos, T. Terlaky and S. Zhang (eds.): High Performance Optimization.
1999 ISBN 0-7923-6013-3
34. N. Hritonenko and Y. Yatsenko: Mathematical Modeling in Economics, Ecology and
the Environment. 1999 ISBN 0-7923-6015-X
35. J. Virant: Design Considerations of Time in Fuzzy Systems. 2000
ISBN 0-7923-6100-8
36. G. Di Pillo and F. Giannessi (eds.): Nonlinear Optimization and Related Topics. 2000
ISBN 0-7923-6109-1
37. V. Tsurkov: Hierarchical Optimization and Mathematical Physics. 2000
ISBN 0-7923-6175-X
38. C. Zopounidis and M. Doumpos: Intelligent Decision Aiding Systems Based on
Multiple Criteria for Financial Engineering. 2000 ISBN 0-7923-6273-X
39. X. Yang, A.I. Mees, M. Fisher and L. Jennings (eds.): Progress in Optimization.
Contributions from Australasia. 2000 ISBN 0-7923-6286-1
40. D. Butnariu and A.N. Iusem: Totally Convex Functions for Fixed Points Computation
and Infinite Dimensional Optimization. 2000 ISBN 0-7923-6287-X
41. J. Mockus: A Set of Examples of Global and Discrete Optimization. Applications of
Bayesian Heuristic Approach. 2000 ISBN 0-7923-6359-0
42. H. Neunzert and A.H. Siddiqi: Topics in Industrial Mathematics. Case Studies and
Related Mathematical Methods. 2000 ISBN 0-7923-6417-1
43. K. Kogan and E. Khmelnitsky: Scheduling: Control-Based Theory and Polynomial-
Time Algorithms. 2000 ISBN 0-7923-6486-4
44. E. Triantaphyllou: Multi-Criteria Decision Making Methods. A Comparative Study.
2000 ISBN 0-7923-6607-7
45. S.H. Zanakis, G. Doukidis and C. Zopounidis (eds.): Decision Making: Recent Devel-
opments and Worldwide Applications. 2000 ISBN 0-7923-6621-2
46. G.E. Stavroulakis: Inverse and Crack Identification Problems in Engineering Mech-
anics. 2000 ISBN 0-7923-6690-5
47. A. Rubinov and B. Glover (eds.): Optimization and Related Topics. 2001
ISBN 0-7923-6732-4
48. M. Pursula and J. Niittymäki (eds.): Mathematical Methods on Optimization in Trans-
portation Systems. 2000 ISBN 0-7923-6774-X
49. E. Cascetta: Transportation Systems Engineering: Theory and Methods. 2001
ISBN 0-7923-6792-8
50. M.C. Ferris, O.L. Mangasarian and J.-S. Pang (eds.): Complementarity: Applications,
Algorithms and Extensions. 2001 ISBN 0-7923-6816-9
51. V. Tsurkov: Large-scale Optimization – Problems and Methods. 2001
ISBN 0-7923-6817-7
52. X. Yang, K.L. Teo and L. Caccetta (eds.): Optimization Methods and Applications.
2001 ISBN 0-7923-6866-5
53. S.M. Stefanov: Separable Programming. Theory and Methods. 2001
ISBN 0-7923-6882-7
54. S.P. Uryasev and P.M. Pardalos (eds.): Stochastic Optimization: Algorithms and
Applications. 2001 ISBN 0-7923-6951-3
55. J. Gil-Aluja (ed.): Handbook of Management under Uncertainty. 2001
ISBN 0-7923-7025-2
56. B.-N. Vo, A. Cantoni and K.L. Teo: Filter Design with Time Domain Mask Con-
straints: Theory and Applications. 2001 ISBN 0-7923-7138-0
57. S. Zlobec: Stable Parametric Programming. 2001 ISBN 0-7923-7139-9
58. M.G. Nicholls, S. Clarke and B. Lehaney (eds.): Mixed-Mode Modelling: Mixing
Methodologies for Organisational Intervention. 2001 ISBN 0-7923-7151-8
59. F. Giannessi, P.M. Pardalos and T. Rapcsák (eds.): Optimization Theory. Recent
Developments from Mátraháza. 2001 ISBN 1-4020-0009-X
60. K.M. Hangos, R. Lakner and M. Gerzson: Intelligent Control Systems. An Introduc-
tion with Examples. 2001 ISBN 1-4020-0134-7
61. D. Gstach: Estimating Output-Specific Efficiencies. 2002 ISBN 1-4020-0483-4
62. J. Geunes, P.M. Pardalos and H.E. Romeijn (eds.): Supply Chain Management:
Models, Applications, and Research Directions. 2002 ISBN 1-4020-0487-7
63. M. Gendreau and P. Marcotte (eds.): Transportation and Network Analysis: Current
Trends. Miscellanea in Honor of Michael Florian. 2002 ISBN 1-4020-0488-5
64. M. Patriksson and M. Labbé (eds.): Transportation Planning. State of the Art. 2002
ISBN 1-4020-0546-6
65. E. de Klerk: Aspects of Semidefinite Programming. Interior Point Algorithms and
Selected Applications. 2002 ISBN 1-4020-0547-4
66. R. Murphey and P.M. Pardalos (eds.): Cooperative Control and Optimization. 2002
ISBN 1-4020-0549-0
67. R. Corrêa, I. Dutra, M. Fiallos and F. Gomes (eds.): Models for Parallel and Distri-
buted Computation. Theory, Algorithmic Techniques and Applications. 2002
ISBN 1-4020-0623-3
68. G. Cristescu and L. Lupsa: Non-Connected Convexities and Applications. 2002
ISBN 1-4020-0624-1
69. S.I. Lyashko: Generalized Optimal Control of Linear Systems with Distributed Para-
meters. 2002 ISBN 1-4020-0625-X
70. P.M. Pardalos and V.K. Tsitsiringos (eds.): Financial Engineering, E-commerce and
Supply Chain. 2002 ISBN 1-4020-0640-3
71. P.S. Knopov and E.J. Kasitskaya: Empirical Estimates in Stochastic Optimization
and Identification. 2002 ISBN 1-4020-0707-8
KLUWER ACADEMIC PUBLISHERS – DORDRECHT / BOSTON / LONDON
