Applied Optimization
Volume 73
Series Editors:
Panos M. Pardalos
University of Florida, U.S.A.
Donald Hearn
University of Florida, U.S.A.
The titles published in this series are listed at the end of this volume.
Multicriteria Decision Aid
Classification Methods
by
Michael Doumpos
and
Constantin Zopounidis
Technical University of Crete,
Department of Production Engineering and Management,
Financial Engineering Laboratory,
University Campus, Chania, Greece
PROLOGUE
Decision making problems, according to their nature, the policy of the deci-
sion maker, and the overall objective of the decision, may require the choice
of an alternative solution, the ranking of the alternatives from the best to the worst, or the assignment of the considered alternatives into predefined homogeneous classes. This last type of decision problem is referred to as
classification or sorting. Classification problems are often encountered in a
variety of fields including finance, marketing, environmental and energy
management, human resources management, medicine, etc.
The major practical interest of the classification problem has motivated researchers to develop an arsenal of methods for studying such problems, in order to build mathematical models achieving the highest possible classification accuracy and predictive ability. For several decades multivariate
statistical analysis techniques such as discriminant analysis (linear and quad-
ratic), and econometric techniques such as logit and probit analysis, the lin-
ear probability model, etc., have dominated this field. However, the paramet-
ric nature and the statistical assumptions/restrictions of such approaches
have been an issue of major criticism and skepticism on the applicability and
the usefulness of such methods in practice.
The continuous advances in other fields including operations research
and artificial intelligence led many scientists and researchers to exploit the
new capabilities of these fields, in developing more efficient classification
techniques. Among the attempts made one can mention neural networks,
machine learning, fuzzy sets, as well as multicriteria decision aid. Multicriteria decision aid (MCDA) has several distinctive and attractive features, most notably its decision support orientation. The significant advances made in MCDA over the last three decades have established it as a powerful non-parametric methodological alternative for studying classification problems. Although MCDA research until the late 1970s was mainly oriented towards the fundamental aspects of the field, as well as the development of choice and ranking methodologies, during the 1980s and the 1990s significant research was undertaken on the study of the classification problem within the MCDA framework.
Following the MCDA framework, the objective of this book is to provide
a comprehensive discussion of the classification problem, to review the ex-
isting parametric and non-parametric techniques, their problems and limita-
tions, and to present the MCDA approach to classification problems. Special
focus is given to the preference disaggregation approach of MCDA. The
preference disaggregation approach refers to the analysis (disaggregation) of
the global preferences (judgement policy) of the decision maker in order to
identify the criteria aggregation model that underlies the preference result
(classification).
The book is organized in seven chapters as follows:
Initially, Chapter 1 presents an introduction to the classification problem. The general concepts related to the classification problem are discussed, along with an outline of the procedures used to develop classification models.
Chapter 2 provides a comprehensive review of existing classification
techniques. The review involves parametric approaches (statistical and
econometric techniques) such as the linear and quadratic discriminant analy-
sis, the logit and probit analysis, as well as non-parametric techniques from
the fields of neural networks, machine learning, fuzzy sets, and rough sets.
Chapter 3 is devoted to the MCDA approach. Initially, an introduction to
the main concepts of MCDA is presented along with a panorama of the
MCDA methodological streams. Then, the existing MCDA classification
techniques are reviewed, including multiattribute utility theory techniques,
outranking relation techniques and goal programming formulations.
Chapter 4 provides a detailed description of the UTADIS and MHDIS
methods, including their major features, their operation and model develop-
ment procedures, along with their mathematical formulations. Furthermore, a
series of issues is also discussed involving specific aspects of the functional-
ity of the methods and their model development processes.
Chapter 5 presents an extensive comparison of the UTADIS and MHDIS
methods with a series of well-established classification techniques including
the linear and quadratic discriminant analysis, the logit analysis and the
rough set theory. In addition, ELECTRE TRI, a well-known MCDA classification method based on the outranking relation theory, is also considered in the comparison, and a new methodology is presented to estimate the parameters of classification models developed through ELECTRE TRI. The comparison is performed through a Monte Carlo simulation, in order to investigate the performance of the methods under different data conditions.
Chapter 1
Introduction to the classification problem
Decision science is a very broad and rapidly evolving research field at theo-
retical and practical levels. The post-war technological advances, in combination with the establishment of operations research as a sound approach to decision making problems, created a new context for addressing real-world problems through integrated, flexible and realistic methodological approaches.
proaches. At the same time, the range of problems that can be addressed ef-
ficiently has also been extended. The nature of these problems is widely di-
versified in terms of their complexity, the type of solutions that should be
investigated, as well as the methodological approaches that can be used to
address them.
Providing a full categorization of the decision making problems on the
basis of the above issues is a difficult task depending upon the scope of the
categorization. A rather straightforward approach is to define the two fol-
lowing categories of decision making problems (Figure 1.1):
Discrete problems involving the examination of a discrete set of alterna-
tives. Each alternative is described along some attributes. Within the de-
cision making context these attributes have the form of evaluation crite-
ria.
Continuous problems involving cases where the number of possible
alternatives is infinite. In such cases one can only outline the region
where the alternatives lie (feasible region), so that each point in this region corresponds to a possible alternative.
When considering a discrete decision making problem, there are four dif-
ferent kinds of analyses (decision making problematics) that can be per-
formed in order to provide meaningful support to decision makers (Roy,
1985; cf. Figure 1.2):
to identify the best alternative or select a limited set of the best alterna-
tives,
to construct a rank–ordering of the alternatives from the best to the worst
ones,
to classify/sort the alternatives into predefined homogeneous groups,
to identify the major distinguishing features of the alternatives and per-
form their description based on these features.
The first three forms of decision making problems (choice, ranking, clas-
sification) lead to a specific result regarding the evaluation of the alterna-
tives. Both choice and ranking are based on relative judgments, involving
pair-wise comparisons between the alternatives. Consequently, the overall
evaluation result has a relative form, depending on the alternatives being
evaluated. For instance, an evaluation result of the form “product X is the
best of its kind” is the outcome of relative judgments, and it may change if
the set of products that are similar to product X is altered.
On the contrary, the classification problem is based on absolute judg-
ments. In this case each alternative is assigned to a specific group on the
basis of a pre-specified rule. The definition of this rule, usually, does not
depend on the set of alternatives being evaluated. For instance, the evalua-
tion result “product X does not meet the consumer needs” is based on abso-
lute judgments, since it does not depend on the other products that are simi-
lar to product X. Of course, these judgments are not always absolute, since
they are often defined within the general context characterizing the decision
environment. For instance, under specific circumstances of the general eco-
nomic and business environment a firm may fulfill the necessary require-
ments for its financing by a credit institution (these requirements are inde-
pendent of the population of firms seeking financing). Nevertheless, as the
economic and business conditions evolve, the financing requirements may
change towards being stricter or more relaxed. Therefore, it is possible that the same firm is denied credit in a different decision environment.
Generally, despite any changes that are made in the classification rule used,
this rule is always defined independently of the existing decision alterna-
tives. This is the major distinguishing difference between the classification
problem and the problems of choice or ranking.
Quantitative models rely on the development and use of a quantitative index to decide upon the assignment of the alternatives.¹
¹ The term “quantitative models” does not necessarily imply that the corresponding approaches handle only quantitative variables. The developed function can also consider qualitative variables. This will be demonstrated later in this book, through the presentation of multicriteria decision aid classification methodologies.
Rule-based classification techniques represent the extracted knowledge in the form of classification rules. The conditions part of each rule involves the characteristics of
the alternatives, thus defining the conditions that should be fulfilled in order
for the alternatives to be assigned into the group indicated in the conclusion
part of the rule. Besides the classification recommendation, in some cases the conclusion part also includes a numerical coefficient representing the strength of the recommendation (conclusion). Procedures used to develop such decision rules are referred to as rule induction techniques. Generally, it is possible to develop an exhaustive set of rules covering all alternatives belonging to the training sample, thus producing a zero classification error.
This, however, does not ensure that the developed rules have the necessary
generalizing ability. For this reason, in order to avoid the development of
rules of limited usefulness a more compact set of rules is often developed.
The plethora of real-world classification problems encountered in many
research and practical fields has been the major motivation for researchers
towards the continuous development of advanced classification methodolo-
gies. The general model development procedure presented above reflects the general scheme and objective of every classification methodological approach, i.e. the elicitation of knowledge from a sample of alternatives and its representation in a functional or symbolic form, such that reality is modeled as consistently as possible. A consistent modeling ensures the reliability of the model’s classification recommendations.
² Recall from the discussion in section 2 of this chapter that sorting problems involve the consideration of the existing preferences with regard to the specification of the groups (ordinal specification of the groups), while discrimination problems do not consider this special feature.
Chapter 2
Review of classification techniques
1. INTRODUCTION
As mentioned in the introductory chapter, the major practical importance of
the classification problem motivated researchers towards the development of
a variety of different classification methodologies. The purpose of this chap-
ter is to review the most well-known of these methodologies for classifica-
tion model development. The review is organized into two major parts, in-
volving respectively:
1. The statistical and econometric classification methods which constitute
the “traditional” approach to develop classification models.
2. The non-parametric techniques proposed during the past two decades as
innovative and efficient classification model development techniques.
2. STATISTICAL AND ECONOMETRIC TECHNIQUES

The origins of multivariate classification techniques can be traced back to the work of Fisher (1936) on linear discriminant analysis (LDA). LDA has been the most extensively used methodology for developing classification models for several decades. Approximately a decade after the publication of Fisher’s paper, Smith (1947)
extended LDA to the more general quadratic form (quadratic discriminant
analysis - QDA).
During the subsequent decades the focus of the conducted research
moved towards the development of econometric techniques. The most well-
known methods from this field include the linear probability model, logit
analysis and probit analysis. These three methods are actually special forms
of regression analysis in cases where the dependent variable is discrete. The
linear probability model is only suitable for two-group classification prob-
lems, whereas both logit and probit analysis are applicable to multi-group
problems too. The latter two methodologies have several significant advan-
tages over discriminant analysis. This has been one of the main reasons for
their extensive use.
Despite the criticism on the use of these traditional statistical and econo-
metric approaches, they still remain quite popular both as research tools as
well as for practical purposes. This popularity is supported by the wide availability of statistical and econometric software, which contributes to the easy and timely use of these approaches. Furthermore, statistical and
econometric techniques are quite often considered in comparative studies
investigating the performance of new classification techniques being devel-
oped. In this regard, statistical and econometric techniques often serve as a
reference point (benchmark) in conducting such comparisons. It is also im-
portant to note that under specific data conditions, statistical techniques yield
the optimal classification rule.
The linear discriminant function developed for each pair of groups C_k and C_l has the form:

d_kl(x_i) = (μ_k − μ_l)ᵀ W⁻¹ x_i − ½ (μ_k − μ_l)ᵀ W⁻¹ (μ_k + μ_l)

where:
μ_k is a n×1 vector consisting of the attributes’ mean values for group C_k,
W is the within-groups variance-covariance matrix. Denoting by m the number of alternatives in the training sample, by x_i the vector of the attribute values of alternative i, and by q the number of groups, the matrix W is specified as follows:

W = [1/(m − q)] ∑_{k=1..q} ∑_{x_i ∈ C_k} (x_i − μ_k)(x_i − μ_k)ᵀ
¹ In contrast to the traditional multivariate regression analysis, in discriminant analysis statistical tests such as the t-test are rarely used to assess the significance of the discriminant function coefficients, simply because these coefficients are not unique.
In the case where the group variance-covariance matrices are not equal, QDA is used instead of LDA. The general form of the quadratic discriminant function developed through QDA, for each pair of groups C_k and C_l, is the following:

d_kl(x_i) = ½ ln(|Σ_l| / |Σ_k|) + ½ (x_i − μ_l)ᵀ Σ_l⁻¹ (x_i − μ_l) − ½ (x_i − μ_k)ᵀ Σ_k⁻¹ (x_i − μ_k)

where each group variance-covariance matrix is estimated separately from the corresponding part of the training sample:

Σ_k = [1/(m_k − 1)] ∑_{x_i ∈ C_k} (x_i − μ_k)(x_i − μ_k)ᵀ

and m_k denotes the number of alternatives of the training sample that belong into group C_k.

Given the discriminant score d_kl(x_i) of an alternative x_i on every discriminant function corresponding to a pair of groups C_k and C_l, the quadratic classification rule (Figure 2.2) is similar to the linear case: the alternative is classified into group C_k if and only if for all other groups C_l the following inequality holds:

d_kl(x_i) ≥ ln[π_l K(k|l) / (π_k K(l|k))]
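As an illustration of this rule, the following minimal sketch estimates μ_k and Σ_k from a training sample and assigns an alternative to the group with the highest quadratic score, which is equivalent to checking d_kl(x_i) ≥ 0 against every other group under the simplifying assumptions of equal a priori probabilities and equal misclassification costs. This is one straightforward reading of the rule, not the authors' implementation; all function names are illustrative.

    import numpy as np

    def fit_group_stats(X, y):
        # Estimate the mean vector and covariance matrix of each group.
        return {k: (X[y == k].mean(axis=0), np.cov(X[y == k], rowvar=False))
                for k in np.unique(y)}

    def qda_assign(x, stats):
        # Quadratic score: -0.5*ln|S_k| - 0.5*(x - mu_k)' S_k^{-1} (x - mu_k).
        # With equal priors and costs the cut-off is zero, so taking the
        # maximum score enforces d_kl(x) >= 0 against every other group.
        def score(mu, S):
            diff = x - mu
            return -0.5 * np.log(np.linalg.det(S)) - 0.5 * diff @ np.linalg.solve(S, diff)
        return max(stats, key=lambda k: score(*stats[k]))

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 0.5, (50, 2))])
    y = np.repeat([0, 1], 50)
    print(qda_assign(np.array([1.8, 2.1]), fit_group_stats(X, y)))   # -> 1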
In practice, both in LDA and QDA, the specification of the a priori probabilities π_k and the misclassification costs K(k|l) is a cumbersome process. To overcome this problem, trial-and-error processes are often employed to specify the optimal cut-off points in the above classification rules.
Beyond this issue, LDA and QDA have been heavily criticized for a series of other problems regarding their underlying assumptions, involving mainly the assumption of multivariate normality and the hypotheses made on the structure of the group variance-covariance matrices. A comprehensive discussion of the impact that these assumptions have on the obtained discriminant analysis results is presented in the book by Altman et al. (1981).
Given that the above two major underlying assumptions are valid (multi-
variate normality and known structure of the group variance-covariance ma-
trices), the use of the Bayes rule indicates that the two forms of discriminant
analysis (linear and quadratic) yield the optimal classification rule (the LDA
in the case of equal group variance-covariance matrices and the QDA in the
opposite case). In particular, the developed classification rules are asymp-
totically optimal (as the training sample size increases the statistical proper-
ties of the considered groups approximate the unknown properties of the cor-
responding populations). A formal proof of this finding is presented by Duda
and Hart (1978), as well as by Patuwo et al. (1993).
Such restrictive statistical assumptions, however, are rarely met in prac-
tice. This fact raises a major issue regarding the real effectiveness of dis-
criminant analysis in realistic conditions. Several studies have addressed this
issue. Moore (1973), Krzanowski (1975, 1977), Dillon and Goldstein (1978)
showed that when the data include discrete variables, then the performance
of discriminant analysis deteriorates especially when the attributes are sig-
nificantly correlated (correlation coefficient higher than 0.3). On the contrary, Lachenbruch et al. (1973) and Subrahmaniam and Chinganda (1978) concluded that even in the case of non-normal data the classification results of discriminant analysis models are quite robust, especially in the case of QDA and for data with a small degree of skewness.
Logit and probit analysis² model the probability that an alternative x_i belongs to one of the two groups as a function of the linear combination a + bᵀx_i of the attributes:

Logit analysis:  P(y_i = 1) = e^(a + bᵀx_i) / [1 + e^(a + bᵀx_i)]   (2.1)

Probit analysis:  P(y_i = 1) = Φ(a + bᵀx_i)   (2.2)

where Φ denotes the standard normal cumulative distribution function.³ The estimation of the constant term a and the vector b is performed using maximum likelihood techniques. In particular, the parameters’ estimation process involves the maximization of the following likelihood function:

L(a, b) = ∏_{i=1..m} P(y_i = 1)^{y_i} [1 − P(y_i = 1)]^{1 − y_i}
² The first studies on probit and logit analysis can be traced back to the 1930s and the 1940s with the works of Bliss (1934) and Berkson (1944), respectively.
³ If a binary 0–1 variable y_i is assigned to designate each group such that y_i = 0 for x_i ∈ C₁ and y_i = 1 for x_i ∈ C₂, then equations (2.1)–(2.2) provide the probability that an alternative belongs into group C₂. If the binary variable is used in the opposite way, then equations (2.1)–(2.2) provide the probability that an alternative belongs into group C₁.
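A minimal sketch of the maximum likelihood estimation described above, using scipy's general-purpose optimizer on the negative logarithm of the likelihood function; the function name and the data layout are assumptions made only for illustration.

    import numpy as np
    from scipy.optimize import minimize

    def fit_logit(X, y):
        # Maximizing the likelihood of (2.1) is equivalent to minimizing the
        # negative log-likelihood; log(1 + e^z) - y*z is its stable form.
        def nll(theta):
            z = theta[0] + X @ theta[1:]
            return np.sum(np.logaddexp(0.0, z)) - np.sum(y * z)
        res = minimize(nll, np.zeros(X.shape[1] + 1), method="BFGS")
        return res.x[0], res.x[1:]    # constant term a, coefficient vector b

    # Here y holds the binary 0-1 group indicators described in footnote 3.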
During the last three decades both logit and probit analysis have been ex-
tensively used by researchers in a wide range of fields as efficient alterna-
tives to discriminant analysis. However, despite the theoretical advantages of
these approaches over LDA and QDA (logit and probit analysis do not pose assumptions on the statistical distribution of the data or the structure of the group variance-covariance matrices), several comparative studies have not clearly shown that these techniques outperform discriminant analysis (linear or quadratic) in terms of classification performance (Krzanowski, 1975; Press and Wilson, 1978).
3. NON-PARAMETRIC TECHNIQUES
In practice the statistical properties of the data are rarely known, since the underlying population is difficult to specify fully. This poses problems for the use of statistical techniques and has motivated researchers towards the development of non-parametric methods. Such approaches have no underlying statistical assumptions and consequently it is expected that they are
flexible enough to adjust themselves according to the characteristics of the
data under consideration. In the subsequent sections the most important of
these techniques are described.
The most widely used network training methodology is the back propagation approach (Rumelhart et al., 1986). More recently, advanced nonlinear optimization techniques have also contributed to obtaining globally optimal estimates of the network’s connection weights (Hung and Denton, 1993).
On the basis of the connection weights, the input to each node is determined as the weighted sum of the outputs of all other nodes with which there is a connection established. In the general case of a fully connected neural network (cf. Figure 2.3) the input u_i^r to node i of the hidden layer r is defined as follows:

u_i^r = ∑_j ∑_{k=1..n_j} w_ik^rj o_k^j + e_i

where:
n_j is the number of nodes at the hidden layer j,
w_ik^rj is the weight of the connection between node i of layer r and node k of layer j,
o_k^j is the output of node k at layer j,
e_i is an error term.
The output of each node is specified through a transformation function. The most common form of this function is the logistic function:

o_i = f(u_i) = 1 / (1 + e^(−u_i))
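The following sketch propagates an alternative's attribute vector through a layered network using the logistic transformation. It assumes the common layer-to-layer topology (a simplification of the fully connected form above) and treats the error term as a bias vector; all shapes and names are illustrative.

    import numpy as np

    def logistic(u):
        return 1.0 / (1.0 + np.exp(-u))

    def forward(x, weights, biases):
        # Each node receives the weighted sum of the previous layer's
        # outputs plus a bias term, and emits its logistic transformation.
        o = np.asarray(x, dtype=float)
        for W, b in zip(weights, biases):
            o = logistic(W @ o + b)
        return o

    # A 3-4-2 network: 3 input attributes, 4 hidden nodes, 2 output nodes.
    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
    print(forward([0.2, -1.0, 0.5], [W1, W2], [np.zeros(4), np.zeros(2)]))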
Beyond the above two problems, research studies investigating the classification performance of neural networks as opposed to statistical and econometric techniques have led to conflicting results. Subramanian et al.
(1993) compared neural networks to LDA and QDA through a simulation
experiment using data conditions that were in accordance with the assump-
tions of the two statistical techniques. Their results show that neural net-
works can be a promising approach, especially in cases of complex classifi-
cation problems involving more than two groups and a large set of attributes.
On the other hand, LDA and QDA performed better when the sample size
was increased.
A similar experimental study by Patuwo et al. (1993) leads to the conclu-
sion that there are many cases where statistical techniques outperform neural
networks. In particular, the authors compared neural networks to LDA and
QDA, considering both the case where the data conditions are in line with
the assumptions of these statistical techniques, as well as the opposite case.
According to the obtained results, when the data are multivariate normal
with equal group variance-covariance matrices, then LDA outperforms neu-
ral networks. Similarly in the case of multivariate normality with unequal
variance-covariance matrices, QDA outperformed neural networks. Even in
the case of non-normal data, the results of the analysis did not show any
clear superiority of neural networks, at least compared to QDA.
The experimental analysis of Archer and Wang (1993) is also worth men-
tioning. The authors discussed the way that neural networks can be used to
address sorting problems, and compared their approach to LDA. The results
of this comparison show a higher classification performance for the neural
networks approach, especially when there is a significant degree of group
overlap.
The first part of such rules examines the necessary and sufficient condi-
tions required for the conclusion part to be valid. The elementary conditions
are connected using the AND operator. The conclusion consists of a recom-
mendation on the classification of the alternatives satisfying the conditions
part of the rule.
One of the most widely used techniques developed on the basis of the in-
ductive learning paradigm is the C4.5 algorithm (Quinlan, 1993). C4.5 is an
improved modification of the ID3 algorithm (Quinlan, 1983, 1986). Its main
advantages over its predecessor involve:
1. The capability of handling qualitative attributes.
2. The capability of handling missing information.
3. The elimination of the overfitting problem.⁴
The decision rules developed through the C4.5 algorithm are organized in
the form of a decision tree such as the one presented in Figure 2.4. Every
node of the tree considers an attribute, while the branches correspond to
elementary conditions defined on the basis of the node attributes. Finally, the
leaves designate the group to which an alternative is assigned, given that it
satisfies the branches’ conditions.
⁴ Overfitting refers to the development of classification models that perform excellently in classifying the alternatives of the training sample, but whose performance in classifying other alternatives is quite poor.
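C4.5 itself is not packaged in the common scientific Python libraries, but the decision-tree structure of Figure 2.4 can be illustrated with scikit-learn's CART implementation, which induces the same kind of attribute-test tree; the depth cap is one simple guard against the overfitting discussed in the footnote. The data here are synthetic and the attribute names are placeholders.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 3))                        # 100 alternatives, 3 attributes
    y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)  # two groups

    tree = DecisionTreeClassifier(max_depth=3)                 # depth cap curbs overfitting
    tree.fit(X_train, y_train)
    print(export_text(tree, feature_names=["g1", "g2", "g3"]))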
(Dubois and Prade, 1979; Siskos, 1982; Siskos et al., 1984a; Fodor and Roubens, 1994; Grabisch, 1995, 1996; Lootsma, 1997).
⁵ Discretization involves the partitioning of an attribute’s domain [a, b] into h subintervals [a₀, a₁), [a₁, a₂), …, [a_{h−1}, a_h], where a₀ = a and a_h = b.
Having defined the quality of the approximation, the first major capabil-
ity that the rough set theory provides is to reduce the available information,
so as to retain only the information that is absolutely necessary for the de-
scription and classification of the alternatives. This is achieved by discovering subsets R of the complete set of attributes P which provide the same quality of classification as the whole attribute set, i.e. γ_R = γ_P. Such subsets of attributes are called reducts and are denoted by RED(P). Generally, there may be more than one reduct. In such a case the intersection of all reducts is called the core, i.e. CORE(P) = ∩_{R ∈ RED(P)} R. The core is the collection of the most relevant attributes, which cannot be excluded from the
analysis without reducing the quality of the obtained description (classifica-
tion). The decision maker can examine all obtained reducts and proceed to
the further analysis of the considered problem according to the reduct that
best describes reality. Heuristic procedures can also be used to identify an
appropriate reduct (Slowinski and Zopounidis, 1995).
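For small information tables the reduct and core computations can be sketched by brute force: γ is the fraction of alternatives whose indiscernibility class is consistent with a single group, reducts are the minimal attribute subsets preserving the γ of the full attribute set, and the core is their intersection. This enumeration is only workable for a few attributes — the heuristics cited above exist precisely because the general problem is hard — and all names are illustrative.

    from itertools import combinations

    def quality(attrs, table, groups):
        # gamma: share of alternatives whose indiscernibility class
        # (w.r.t. the chosen attributes) lies entirely within one group.
        classes = {}
        for i, row in enumerate(table):
            classes.setdefault(tuple(row[a] for a in attrs), []).append(i)
        ok = sum(len(c) for c in classes.values()
                 if len({groups[i] for i in c}) == 1)
        return ok / len(table)

    def reducts(table, groups):
        # Minimal attribute subsets R with quality equal to the full set P.
        n = len(table[0])
        full = quality(range(n), table, groups)
        found = []
        for r in range(1, n + 1):
            for R in combinations(range(n), r):
                if any(set(f) <= set(R) for f in found):
                    continue            # contains a smaller reduct: not minimal
                if quality(R, table, groups) == full:
                    found.append(R)
        return found

    table = [(1, 0, 2), (1, 1, 2), (0, 1, 1), (0, 0, 1)]
    groups = ["A", "A", "B", "B"]
    reds = reducts(table, groups)
    core = set.intersection(*(set(r) for r in reds))
    print(reds, core)   # [(0,), (2,)] and an empty core: either attribute suffices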
The subsequent steps of the analysis involve the development of a set of
rules for the classification of the alternatives into the groups where they ac-
tually belong. The rules developed through the rough set approach have the following form:

IF f(x, a₁) = v₁ AND f(x, a₂) = v₂ AND … THEN x belongs to group C_k
The procedures used to construct a set of decision rules employ the ma-
chine learning paradigm. Such procedures developed within the context of
the rough set theory have been presented by Grzymala-Busse (1992), Slow-
inski and Stefanowski (1992), Skowron (1993), Ziarko et al. (1993), and others.
⁶ Rules covering only alternatives that belong to the group indicated by the conclusion of the rule (positive examples) are called discriminant rules. On the contrary, rules that cover both positive and negative examples (alternatives not belonging into the group indicated by the rule) are called partly discriminant rules. Each partly discriminant rule is associated with a coefficient measuring the consistency of the rule. This coefficient is called level of discrimination and is defined as the ratio of positive to negative examples covered by the rule.
⁷ Completeness refers to a set of rules that cover all alternatives of the training sample. A set of rules is called non-redundant if the elimination of any single rule from the initial rule set leads to a new set of rules that does not have the completeness property.
the strength for each individual group of the condition part must be consid-
ered). The stronger rule can be used to take the final classification decision.
This approach is employed in the LERS classification system developed by
Grzymala-Busse (1992).
Situation (4) is the most difficult one, since using the developed rule set one has no evidence as to the classification of the alternative. The LERS system tackles this problem through the identification of rules that partly cover the characteristics of the alternative under consideration.⁸ The strength of these rules, as well as the number of elementary conditions satisfied by the alternative, are considered in making the decision. This approach will be discussed in more detail in Chapter 5. An alternative approach proposed by
Slowinski (1993), involves the identification of a rule that best matches the
characteristics of the alternative under consideration. This is based on the
construction of a valued closeness relation measuring the similarity between
each rule and the alternative. The construction of this relation is performed
in two stages. The first stage involves the identification of the attributes that are in accordance with the affirmation “the alternative is close to rule r”. The strength of this affirmation is measured on a numerical scale between 0 and
1. The second stage involves the identification of the characteristics that are
in discordance with the above affirmation. The strength of concordance and
discordance tests are combined to estimate an overall index representing the
similarity of a rule to the characteristics of the alternative.
Closing this brief discussion of the rough set approach, it is important to
note the recent advances made in this field towards the use of the rough set
approach as a methodology of preference modeling in multicriteria decision
problems (Greco et al., 1999a, 2000a). The main novelty of the recently de-
veloped rough set approach concerns the possibility of handling criteria, i.e.
attributes with preference ordered domains, and preference ordered groups in
the analysis of sorting examples and the induction of decision rules. The
rough approximations of decision groups involve the dominance relation, instead of the indiscernibility relation considered in the basic rough set approach. They are built from reference alternatives given in the sorting example (training sample). Decision rules derived from these approximations constitute a
ing sample). Decision rules derived from these approximations constitute a
preference model. Each “if ... then ...” decision rule is composed of: (a) a
condition part specifying a partial profile on a subset of criteria to which an
alternative is compared using the dominance relation, and (b) a decision part
suggesting an assignment of the alternative to “at least” or “at most” a given class.⁹
⁸ Partly covering involves the case where the alternative satisfies only some of the elementary conditions of a rule.
⁹ The DOMLEM algorithm discussed previously in this chapter is suitable for developing such rules.
The decision rule preference model has also been considered in terms of conjoint measurement (Greco et al., 2001). A representation theorem for multicriteria sorting proved by Greco et al. states the equivalence of a simple cancellation property, a general discriminant (sorting) function and a specific outranking relation (cf. Chapter 3), on the one hand, and the decision rule model on the other hand. It is also shown that the decision rule model resulting from the dominance-based rough set approach has an advantage over the usual functional and relational models because it permits handling inconsistent sorting examples. Inconsistency in sorting examples is not unusual, owing to instability of preferences, incomplete determination of criteria and hesitation of the decision maker.
It is also worth noting that the dominance-based rough set approach is
able to deal with sorting problems involving both criteria and regular attrib-
utes whose domains are not preference ordered (Greco et al., 2002), and
missing values in the evaluation of reference alternatives (Greco et al.,
1999b; Greco et al., 2000b). It also handles ordinal criteria in a more general way than the Sugeno integral, as proved in Greco et al. (2001).
The above recent developments have attracted the interest of MCDA re-
searchers on the use of rough sets as an alternative preference modeling
framework to the ones traditionally used in MCDA (utility function, out-
ranking relation; cf. Chapter 3). Therefore, the new extended rough set theory can be considered an MCDA approach. Nevertheless, in this book the traditional rough set theory based on the indiscernibility relation is considered as an example of rule-based classification techniques that employ the machine learning framework. The traditional rough set theory cannot be considered an MCDA approach, since it is only applicable with attributes (instead of criteria) and with nominal groups. This is the reason for including rough sets in this chapter, rather than in Chapter 3, which refers to MCDA classification techniques.
Chapter 3
Multicriteria decision aid classification techniques
1. INTRODUCTION TO MULTICRITERIA
DECISION AID
A distinctive characteristic of MCDA, compared with the techniques reviewed in the previous chapter, is the emphasis placed on the model development aspects that are related to the modeling and representation of the decision maker’s preferences, values and judgment policy.
This feature is of major importance within a decision making context,
bearing in mind that an actual decision maker is responsible for the imple-
mentation of the results of any decision analysis procedure. Therefore, de-
veloping decision models without considering the decision maker’s prefer-
ences and system of values may be of limited practical usefulness. In such approaches the decision maker is given a rather passive role in the decision analysis context: he does not participate actively in the model development process and his role is restricted to the implementation of the recommendation of the developed model, whose features are often difficult to understand.
The methodological advances made in the MCDA field involve any form
of decision making problem (choice, ranking, classification/sorting and de-
scription problems). The subsequent sub-sections describe the main MCDA
methodological approaches and their implementation to address classifica-
tion problems.
The first level of the above process involves the specification of a set A of feasible alternative solutions to the problem at hand (alternatives). The objective of the decision is also determined. The set A can be continuous or discrete. In the former case it is specified through constraints imposed by the nature of the problem, whereas in the latter case the alternatives are simply enumerated.
These properties define the main distinctive feature of the criterion con-
cept compared to the attribute concept often used in other disciplines such as
statistics, econometrics, artificial intelligence, etc. (cf. the previous chapter).
Both an attribute and a criterion assign a description (quantitative or qualita-
tive) to an alternative. In the case of a criterion, however, this description
entails some preferential information regarding the performance of an alter-
native compared to other alternatives.
The set of criteria g = {g₁, g₂, …, g_n} identified at this second stage of the decision aiding process must form a consistent family of criteria. A consistent family of criteria is a set of criteria having the following properties:
1. Monotonicity: every criterion must satisfy the conditions described by relations (3.1) and (3.2). Some criteria satisfy (3.1) in the opposite way: g_j(x) > g_j(y) implies that y is preferred to x. In this case the criterion g_j is referred to as a criterion of decreasing preference (lower values indicate higher preference). Henceforth, this book will not make any distinction between criteria of increasing or decreasing preference (any decreasing preference criterion can be transformed to an increasing preference criterion through sign reversal). A specified criteria set is considered to satisfy the monotonicity property if and only if, for every pair of alternatives x and y, improving the performance of an alternative on any criterion, all else being equal, does not reduce its overall preference.
2. METHODOLOGICAL APPROACHES
As already noted, MCDA provides a plethora of methodologies for address-
ing decision making problems. The existing differences between these meth-
odologies involve both the form of the models that are developed as well as
the model development process. In this respect, MCDA researchers have
defined several categorizations of the existing methodologies in this field.
Roy (1985) identified three major methodological streams considering the
features of the developed models:
1. Unique synthesis criterion approaches.
2. Outranking synthesis approaches.
3. Interactive local judgment approaches.
¹ Henceforth, all subsequent discussion made in this book adopts the approach presented by Pardalos et al. (1995).
Multiattribute utility functions, outranking relations and preference disaggregation analysis can also be used within the context of continuous decision problems. In this case, they provide the necessary
means to model the decision maker’s preferential system in a functional or
relational model, which can be used in a second stage in an optimization
context (multiobjective mathematical programming). A well-known example
where this framework is highly applicable is the portfolio construction prob-
lem, i.e. the construction of a portfolio of securities that maximizes the in-
vestor’s utility. In this case, the multiattribute utility theory or the preference
disaggregation analysis can be used to estimate an appropriate utility func-
tion representing the investor’s decision making policy. Similarly, the mul-
tiobjective mathematical programming framework can be used in combina-
tion with the other MCDA approaches to address discrete problems. Within
this context, multiobjective mathematical programming techniques are
commonly used for model development purposes. This approach is em-
ployed within the preference disaggregation analysis framework, discussed
later on in this chapter (cf. sub-section 2.4).
The following sub-sections outline the main concepts and features of
each of the aforementioned MCDA approaches. This discussion provides the
basis for reviewing the use of MCDA for classification purposes.
Formally, an MMP problem is expressed as follows:

max/min {f₁(x), f₂(x), …, f_K(x)}
subject to: x ∈ B

where:
x is the vector of the decision variables,
f₁, …, f_K are the objective functions (linear or non-linear) to be optimized,
B is the set of feasible solutions.
In contrast to the traditional mathematical programming theory, within
the MMP framework the concept of optimal solution is no longer applicable.
This is because the objective functions are of conflicting nature (the opposite
is rarely the case). Therefore, it is not possible to find a solution that opti-
mizes simultaneously all the objective functions. In this regard, within the MMP framework the interest focuses on the identification of efficient (Pareto optimal) solutions.
A typical goal programming formulation has the following general form:

Max/Min f(d₁⁻, d₁⁺, …, d_s⁻, d_s⁺)
subject to:
g_i(x) + d_i⁻ − d_i⁺ = t_i,  i = 1, 2, …, s
x ∈ B,  d_i⁻, d_i⁺ ≥ 0

where:
g_i(x) is goal i defined as a function (linear or non-linear) of the decision variables x,
t_i is the target value for goal i,
d_i⁻ and d_i⁺ are the deviations from the target value representing the under-achievement and over-achievement of the goal, respectively.
U(x) > U(x′) ⇔ x ≻ x′  (alternative x is preferred to x′)
U(x) = U(x′) ⇔ x ∼ x′  (alternative x is indifferent to x′)

The most commonly used form of utility function is the additive one:

U(x) = ∑_j w_j u_j(g_j(x))

where w_j and u_j denote, respectively, the weight and the marginal utility function of criterion g_j.
1. Preferences are not necessarily transitive: by transitivity of indifference, a decision maker who cannot distinguish between successive, very similar amounts of sugar would be indifferent between a cup of coffee containing no sugar and a cup of coffee that is full of sugar, irrespective of the difference between them. Obviously, this is an incorrect conclusion, indicating that there are cases where transitivity is not valid.
2. The outranking relation is not complete: In the MAUT framework only
the preference and indifference relations are considered. In addition to
these two relations, ORT introduces the incomparability relation. In-
comparability arises in cases where the considered alternatives have ma-
jor differences with respect to their characteristics (performance on the
evaluation criteria) such that their comparison is difficult to be per-
formed.
Despite the above two major differences, both MAUT and ORT use simi-
lar model development techniques, involving the direct interrogation of the
decision maker. Within the ORT context, the decision maker specifies sev-
eral structural parameters of the developed outranking relation. In most ORT
techniques these parameters involve:
1. The significance of the evaluation criteria.
2. Preference, indifference and veto thresholds. These thresholds define a
fuzzy outranking relation such as the one presented in Figure 3.4. Fur-
thermore, the introduction of the veto threshold facilitates the develop-
ment of non-compensatory models (models in which the significantly
low performance of an alternative in an evaluation criterion is not com-
pensated by the performance of the alternatives on the remaining crite-
ria).
The combination of the above information enables the decision analyst to measure the strength of the indications supporting the affirmation “alternative x is at least as good as alternative x′”, as well as the strength of the indications against this affirmation.
Saaty (1980) first proposed the AHP method (Analytic Hierarchy Process) for addressing complex decision making problems involving multiple criteria. The method is particularly well suited for problems where the evaluation criteria can be organized in a hierarchical way into sub-criteria. During the last two decades the method has become very popular among operations researchers and decision scientists, mainly in the USA. At the same time, however, it has been heavily criticized for some major theoretical shortcomings involving its operation.
AHP models a decision making problem through a process involving four
stages:
Stage 1 : Hierarchical structuring of the problem.
Stage 2 : Data input.
Stage 3 : Estimation of the relative weights of the evaluation criteria.
Stage 4 : Combination of the relative weights to perform an overall evalua-
tion of the alternatives (aggregation of criteria).
In the first stage the decision maker defines a hierarchical structure repre-
senting the problem at hand. A general form of such a structure is presented
in Figure 3.6. The top level of the hierarchy considers the general objective
of the problem. The second level includes all the evaluation criteria. Each
criterion is analyzed in the subsequent levels into sub-criteria. Finally, the
last level of the hierarchy involves the objects to be evaluated. Within the
context of a classification problem the elements of the final level of the hier-
archy represent the choices (groups) available to the decision maker regard-
ing the classification of the alternatives. For instance, for a two-group classi-
fication problem the last level of the hierarchy will include two elements
corresponding to group 1 and group 2.
Once the hierarchy of the problem is defined, in the second stage of the method the decision maker performs pairwise comparisons of all elements at each level of the hierarchy. Each of these comparisons is performed on the basis of the elements of the preceding level of the hierarchy. For instance,
considering the general hierarchy of Figure 3.6 at the first level, no compari-
sons are required (the first level involves only one element). In the second
level, all elements (evaluation criteria) are compared in a pairwise way on
the basis of the objective of the problem (first level of the hierarchy). Then,
the sub-criteria of the third level are compared each time from a different point of view, considering each criterion of the second level of the hierarchy. For instance, the sub-criteria g₁₁ and g₁₂ are initially compared on the basis of criterion g₁, then on the basis of criterion g₂, etc. The same process is continued until all elements of the hierarchy are compared.
The objective of all these comparisons is to assess the relative signifi-
cance of all elements of the hierarchy in making the final decision according
to the initial objective. The comparisons are performed using the 9-point
scale presented in Table 3.1.
The results of the comparisons made by the decision maker are used to form an n_k × n_k matrix A for each level k of the hierarchy, where n_k denotes the number of elements in level k. A separate matrix is formed with respect to each element of the level k−1. Assuming that all comparisons are consistent, the vector w of the elements’ weights can be estimated through the solution of the following system of linear equalities:

A w = n_k w

If A is known, then this relation can be used to solve for w. The problem of solving for a nonzero solution w to this set of equations is known as the eigenvalue problem:

A w = λ_max w
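In practice the comparisons are rarely perfectly consistent, and the weights are commonly taken from the principal eigenvector of the comparison matrix. A short sketch with numpy, where the matrix values and function names are illustrative:

    import numpy as np

    def ahp_weights(A):
        # Principal eigenvector of the pairwise comparison matrix,
        # normalized so that the weights sum to 1.
        eigvals, eigvecs = np.linalg.eig(A)
        k = np.argmax(eigvals.real)
        w = np.abs(eigvecs[:, k].real)
        return w / w.sum(), eigvals[k].real

    # Three criteria compared on the 9-point scale (reciprocal matrix).
    A = np.array([[1.0, 3.0, 5.0],
                  [1/3, 1.0, 2.0],
                  [1/5, 1/2, 1.0]])
    w, lam = ahp_weights(A)
    ci = (lam - len(A)) / (len(A) - 1)   # consistency index (lambda_max - n)/(n - 1)
    print(w, ci)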
The concordance index C(x, r_h), measuring the overall strength of the indications supporting the affirmation “alternative x is at least as good as profile r_h”, is computed as the weighted sum of partial concordance indices:

C(x, r_h) = ∑_j w_j c_j(x, r_h) / ∑_j w_j

where w_j denotes the weight of criterion g_j (the criteria weights are specified by the decision maker), and c_j(x, r_h) denotes the partial concordance index defined for criterion g_j. Each partial concordance index measures the strength of the affirmation “alternative x is at least as good as profile r_h on the basis of criterion g_j”. The estimation of the partial concordance index requires the specification of two parameters: the preference threshold p_j and the indifference threshold q_j. The preference threshold p_j for criterion g_j represents the largest difference g_j(r_h) − g_j(x) compatible with a preference in favor of r_h on criterion g_j. The indifference threshold q_j for criterion g_j represents the smallest difference g_j(r_h) − g_j(x) that preserves indifference between an alternative x and profile r_h on criterion g_j. The values of these thresholds are specified by the decision maker in cooperation with the decision analyst. On the basis of these thresholds, the partial concordance index is estimated as follows (Figure 3.8):

c_j(x, r_h) = 1,  if g_j(r_h) − g_j(x) ≤ q_j
c_j(x, r_h) = 0,  if g_j(r_h) − g_j(x) ≥ p_j
c_j(x, r_h) = [p_j − (g_j(r_h) − g_j(x))] / (p_j − q_j),  otherwise
The credibility index σ(x, r_h) of the affirmation is then estimated by combining the concordance index with the partial discordance indices d_j(x, r_h):

σ(x, r_h) = C(x, r_h) · ∏_{j ∈ F} [1 − d_j(x, r_h)] / [1 − C(x, r_h)]

where F denotes the set of criteria for which the discordance index is higher than the concordance index:

F = { j : d_j(x, r_h) > C(x, r_h) }

Obviously, if F = ∅, then σ(x, r_h) = C(x, r_h).
In the first case, both the optimistic and the pessimistic procedures will assign the alternative into the same group. In the second case, however, the pessimistic procedure will assign the alternative into the lower (less preferred) group, whereas the optimistic procedure will assign the alternative into the higher (more preferred) group.
Overall, the key issue for the successful implementation of the above
process is the elicitation of all the preferential parameters involved (i.e., cri-
teria weights, preference, indifference, veto thresholds, profiles). This elici-
tation is often cumbersome in real-world situations due to time constraints or
the unwillingness of the decision makers to actively participate in a direct
interrogation process managed by an expert decision analyst. Recently,
Mousseau and Slowinski (1998) proposed a methodology to infer all this
preferential information using the principles of PDA. The main features, ad-
vantages and disadvantages of this methodology will be discussed in the Ap-
pendix of Chapter 5, together with the presentation of a new approach to ad-
dress this problem.
² λ₁ and λ₂ denote two cutting levels ranging between 0.5 and 1, which are defined by the decision analyst.
³ λ denotes a cutting level ranging between 0.5 and 1.
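To fix ideas, the following simplified sketch computes a concordance-only credibility index (discordance and veto effects are omitted for brevity) and applies the pessimistic assignment procedure with an assumed cutting level. Thresholds, profiles and names are illustrative; this is not the full ELECTRE TRI algorithm.

    import numpy as np

    def partial_concordance(gx, gr, q, p):
        # c_j(x, r): 1 if x reaches the profile within the indifference
        # threshold, 0 if it falls short by more than the preference
        # threshold, linear interpolation in between.
        diff = gr - gx
        if diff <= q:
            return 1.0
        if diff >= p:
            return 0.0
        return (p - diff) / (p - q)

    def credibility(x, r, w, q, p):
        c = [partial_concordance(x[j], r[j], q[j], p[j]) for j in range(len(x))]
        return np.dot(w, c) / np.sum(w)

    def pessimistic_assignment(x, profiles, w, q, p, cut=0.75):
        # Profiles ordered from the best group downwards: assign x to the
        # first group whose lower profile it outranks at the cutting level.
        for h, r in enumerate(profiles):
            if credibility(x, r, w, q, p) >= cut:
                return h                 # 0 = most preferred group
        return len(profiles)             # least preferred group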
Both the ELECTRE TRI method and the N–TOMIC method are suitable for
addressing sorting problems where the groups are defined in an ordinal way.
The major distinguishing feature of the PROAFTN method (Belacel, 2000)
and the method of Perny (1998) is their applicability in classification prob-
lems with nominal groups.
In such cases, the reference profiles distinguishing the groups cannot be
defined such that they represent the lower bound of each group. Instead,
each reference profile is defined such that it indicates a representative exam-
ple of each group. On the basis of this approach both the PROAFTN method and Perny’s method develop a fuzzy indifference relation measuring the strength of the affirmation “alternative x is indifferent to profile r_h”. The development of the fuzzy indifference relation is based on procedures similar to the ones used in ELECTRE TRI. Initially, the indications supporting the above affirmation are considered through the concordance test. Then, the discordance test is employed to measure the indications against the above affirmation. The realization of the two tests leads to the estimation of the credibility index σ(x, r_h) measuring the indifference degree between an alternative x and the profile r_h. The credibility index is used to decide upon the classification of the alternatives. The assignment (classification) procedure consists of comparing an alternative to all reference profiles, and assigning the alternative to the group for which the alternative is most similar (indifferent) to the corresponding profile. This is formally expressed as follows:

x ∈ C_k  ⇔  σ(x, r_k) = max_h σ(x, r_h)
a utility function is only applicable in sorting problems where the groups are
defined in an ordinal way). Furthermore, using the Bayes rule it can be
shown that the linear discriminant function is the optimal classification
model (in terms of the expected classification error) when the data are multi-
variate normal with equal group dispersion matrices (Patuwo et al., 1993).
These assumptions are strong, however, and only rarely satisfied in practice.
On the basis of the linear discriminant function, Freed and Glover (1981a) used the following simple classification rule for two-group classification problems:

x_i ∈ C₁ if bᵀx_i ≥ c,   x_i ∈ C₂ if bᵀx_i < c   (3.3)

where b is the vector of the attribute coefficients and c a cut-off point.
The first approach proposed by Freed and Glover (1981a) introduced the minimum distance between the alternatives’ scores and the cut-off point c as the model development criterion. This is known as the MMD model (maximize the minimum distance; cf. Figure 3.10):

Max d
Subject to:
bᵀx_i ≥ c + d,  ∀ x_i ∈ C₁
bᵀx_i ≤ c − d,  ∀ x_i ∈ C₂
d unrestricted in sign,
c user-defined constant
Soon after the publication of their first paper, Freed and Glover published
a second one (Freed and Glover, 1981b) describing an arsenal of similar
goal-programming formulations for developing classification models. The
most well-known of these is the MSD model (minimize the sum of devia-
tions), which considers two measures for the quality of the classification ob-
tained through the developed models (Figure 3.11): (1) the violation of the
classification rules (3.3) by an alternative of the training sample, and (2)
the distance (absolute difference) between a correctly classified alternative
and the cut-off point that discriminates the groups. On the basis of these two
3. Multicriteria decision aid classification techniques 69
Subject to:
where and are constants representing the relative significance of the two
goals of the problem (minimization of the violations and maximization of
the distances These constants are specified by the decision maker such
that
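Such formulations translate directly into linear programs. The sketch below sets up an MSD-style model with scipy; bounding the distances is one of several normalizations used in the literature to rule out unbounded or trivial solutions, and all names and parameter values are illustrative rather than the exact formulation of the original papers.

    import numpy as np
    from scipy.optimize import linprog

    def fit_msd(X1, X2, c=1.0, w1=10.0, w2=1.0):
        # Variables: [b (n), e (m), d (m)]; minimize w1*sum(e) - w2*sum(d)
        # subject to  X1 b + e - d = c  and  X2 b - e + d = c.
        m1, n = X1.shape
        m = m1 + X2.shape[0]
        A_eq = np.zeros((m, n + 2 * m))
        A_eq[:m1, :n], A_eq[m1:, :n] = X1, X2
        A_eq[:m1, n:n + m1] = np.eye(m1)                  # +e for C1
        A_eq[:m1, n + m:n + m + m1] = -np.eye(m1)         # -d for C1
        A_eq[m1:, n + m1:n + m] = -np.eye(m - m1)         # -e for C2
        A_eq[m1:, n + m + m1:] = np.eye(m - m1)           # +d for C2
        cost = np.concatenate([np.zeros(n), w1 * np.ones(m), -w2 * np.ones(m)])
        # bounding d avoids an unbounded LP (scaling b would otherwise
        # inflate the distances indefinitely)
        bounds = [(None, None)] * n + [(0, None)] * m + [(0, 1.0)] * m
        res = linprog(cost, A_eq=A_eq, b_eq=np.full(m, c), bounds=bounds)
        return res.x[:n]                                  # coefficient vector b

    rng = np.random.default_rng(2)
    b = fit_msd(rng.normal(1.0, 0.5, (20, 2)), rng.normal(-1.0, 0.5, (20, 2)))
    print(b)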
4. The nature of the problems that are addressed. The existing research is
heavily focused on classification problems where the groups are defined
in a nominal way. However, bearing in mind that sorting problems (or-
dinal groups) are of particular interest in many real-world decision mak-
ing fields, it is clear that this field is of major practical and research in-
terest and it deserves further investigation.
The MCDA methods that will be presented in detail in the next chapter
address most of the above issues in an integrated and flexible framework.
Chapter 4
Preference disaggregation classification methods
1. INTRODUCTION
The review of MCDA classification methods presented in the previous chap-
ter reveals two major shortcomings:
1. Several MCDA classification methods require the definition of a signifi-
cant amount of information by the decision maker. The process involv-
ing the elicitation of this information is often cumbersome due to: (1) time constraints, (2) the unwillingness of the decision maker to participate actively in this process, and (3) the limited ability of the analyst to interact efficiently with the decision maker.
2. Other MCDA techniques that employ the preference disaggregation phi-
losophy usually assume a linear relationship between the classification
of the alternatives and their characteristics (criteria). Such an approach
implicitly assumes that the decision maker is risk–neutral which is not
always the case.
This chapter presents two MCDA classification methods that respond satisfactorily to the above limitations. The considered methods are the UTADIS method (UTilités Additives DIScriminantes) and the MHDIS method (Multi-group Hierarchical DIScrimination). Both methods combine a utility function-based framework with the preference disaggregation paradigm. The problems addressed by UTADIS and MHDIS involve the sorting of the alternatives into q predefined groups defined in an ordinal way:

C₁ ≻ C₂ ≻ ⋯ ≻ C_q

where C₁ denotes the group consisting of the most preferred alternatives and C_q denotes the group of the least preferred alternatives.
The subsequent sections of this chapter discuss in detail all the model
development aspects of the two methods as well as all the important issues
of the model development and implementation process.
Within the UTADIS method, the evaluation of each alternative x_i is performed through an additive utility function of the form:

U(x_i) = ∑_{j=1..n} w_j u_j(g_j(x_i))   (4.1)

where:
w_j is the weight of criterion g_j (the weights are non-negative and sum up to 1),
u_j is the marginal utility function of criterion g_j, normalized such that u_j(g_{j*}) = 0 and u_j(g_j^*) = 1,
where g_{j*} and g_j^* denote the least and the most preferred value of criterion g_j, respectively. These values are specified according to the set A of the alternatives under consideration, as follows:

For increasing preference criteria (criteria for which higher values indicate higher preference, e.g. return/profitability criteria):
g_{j*} = min_{x ∈ A} g_j(x)  and  g_j^* = max_{x ∈ A} g_j(x)

For decreasing preference criteria (criteria for which higher values indicate lower preference, e.g. risk/cost criteria):
g_{j*} = max_{x ∈ A} g_j(x)  and  g_j^* = min_{x ∈ A} g_j(x)
Transforming the criteria’s scale into utility terms through the use of
marginal utility functions has two major advantages:
1. It enables the modeling and representation of the nonlinear behavior of
the decision maker when evaluating the performance of the alternatives.
2. It enables the consideration of qualitative criteria in a flexible way. Con-
sider for instance, a qualitative corporate performance criterion repre-
senting the organization of a firm measured through a three–level quali-
tative scale: “good”, “medium”, and “poor”. Using such a qualitative cri-
terion through simple weighted average models requires the a priori assignment of a numerical value to each level of the qualitative scale (e.g., good = 3, medium = 2, poor = 1). Such an assignment is often arbitrary
and misleading. On the contrary, the specification of the marginal utility
function provides a sound methodological mechanism to identify the
value (in quantitative terms) that the decision maker assigns to each
level of the qualitative scale. Within the context of the UTADIS method
and the preference disaggregation framework in general, the form of the
On the basis of this binary variable e(x_i), which equals 1 if alternative x_i is misclassified and 0 otherwise, the classification error rate is defined as the ratio of the number of misclassified alternatives to the total number of alternatives in the reference set:

E = (1/m) ∑_{i=1..m} e(x_i)   (4.3)
This classification error rate measure is adequate for cases where the number of alternatives of each group in the reference set is similar along all groups (m₁ ≈ m₂ ≈ ⋯ ≈ m_q). In the case, however, where there are significant differences among the group sizes, the use of the classification error rate defined in (4.3) may lead to misleading results. For instance, consider a reference set consisting of 10 alternatives, 7 belonging into group C₁ and 3 belonging into group C₂. In this case a classification that assigns correctly all alternatives of group C₁ and incorrectly all alternatives of group C₂ has an error rate E = 3/10 = 30%, even though group C₂ is entirely misclassified.
In the above example the error rates for the two groups (0% for C₁ and 100% for C₂) can be considered as estimates of the probabilities of misclassifying an alternative of group C₁ and of group C₂, respectively. Assuming that the a priori probabilities π₁ and π₂ of the two groups are equal (π₁ = π₂ = 0.5), the expected error of the classification is 0.5:

E′ = ∑_k π_k (1/m_k) ∑_{x_i ∈ C_k} e(x_i) = 0.5 × 0 + 0.5 × 1 = 0.5   (4.4)
Even though this measure takes into consideration the a priori probabilities of each group, it assumes that all classification errors are of equal cost to the decision maker. This is not always the case. For instance, the classification error regarding the assignment of a bankrupt firm to the group of healthy firms is much more costly than an error involving the assignment of a healthy firm to the bankrupt group. The former leads to capital cost (loss of the amount of credit granted to a firm), whereas the latter leads to opportunity cost (loss of the profit that would result from granting a credit to a healthy firm). Therefore, it would be appropriate to extend the expected classification error rate (4.4) so that the costs of each individual error are also considered. The resulting measure represents the expected misclassification cost (EMC), rather than the expected classification error rate:

EMC = ∑_k π_k (1/m_k) ∑_{l ≠ k} c_kl ∑_{x_i ∈ C_k} e_l(x_i)   (4.5)
where:
c_kl is the misclassification cost involving the classification of an alternative of group C_k into group C_l,
e_l(x_i) is a binary 0–1 variable defined such that e_l(x_i) = 1 if an alternative x_i ∈ C_k is classified into group C_l (l ≠ k), and e_l(x_i) = 0 if x_i is not classified into group C_l.
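As a small numeric check of (4.5), the following lines recompute the 7/3 example above from a confusion matrix, with equal a priori probabilities and unit costs; the function name and data layout are illustrative.

    import numpy as np

    def expected_misclassification_cost(conf, priors, costs):
        # conf[k, l]: number of group-k alternatives assigned to group l;
        # costs[k, l]: cost c_kl of such an assignment.
        rates = conf / conf.sum(axis=1, keepdims=True)
        c = np.array(costs, dtype=float)
        np.fill_diagonal(c, 0.0)          # correct assignments incur no cost
        return float(np.sum(np.asarray(priors)[:, None] * c * rates))

    conf = np.array([[7, 0], [3, 0]])     # all of C2 wrongly assigned to C1
    print(expected_misclassification_cost(conf, [0.5, 0.5], np.ones((2, 2))))  # 0.5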
Comparing expressions (4.4) and (4.5) it becomes apparent that the expected classification error rate in (4.4) is a special case of the expected misclassification cost, when all costs c_kl are considered equal for every k, l = 1, 2, …, q. The main difficulty related to the use of the expected misclassifica-
tion cost as the appropriate measure of the quality of the obtained classifica-
tion is that it is often quite difficult to have reliable estimates for the cost of
each type of classification error. For this reason, all subsequent discussion in
this book concentrates on the use of the expected classification error rate
defined in (4.4). Furthermore, without loss of generality, it will be assumed
that all a priori probabilities are equal to 1/q.
If the expected classification error rate, regarding the classification of the alternatives that belong into the reference set, is considered satisfactory, then this constitutes an indication that the developed classification model might be useful in providing reliable recommendations for the classification of other alternatives. On the other hand, if the obtained expected classification error rate indicates that the classification of the alternatives in the reference set is close to a random classification (i.e., an expected error rate near 1 − 1/q, the rate of a purely random assignment under equal priors), then the decision maker must check the reference set regarding its completeness and adequacy in providing representative information on the problem under consideration. Alternatively, it is also possible that the criteria aggregation model (additive utility function) is not able to provide an adequate representation of the decision maker's preferential system. In such a case an alternative criteria aggregation model must be considered.
However, it should be pointed out that a low expected classification error rate does not necessarily ensure the practical usefulness of the developed classification model; it simply provides an indication supporting the possible usefulness of the model. On the contrary, a high expected classification error rate leads with certainty to the conclusion that the developed classification model is inadequate.
These expressions illustrate better the notion of the two classification error forms. The error σ^+(a) indicates that, to classify correctly a misclassified alternative a that actually belongs into group C_k, its global utility should be increased by σ^+(a). Similarly, the error σ^-(a) indicates that, to classify correctly a misclassified alternative that actually belongs into C_k, its global utility should be decreased by σ^-(a).
Introducing these error terms in the additive utility model, it is possible to rewrite the classification rule (4.2) in the form of the following constraints:
subject to:
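The detailed constraints are given in the original formulation; in outline, and using the notation introduced above, they may be sketched as follows (a reconstruction, with δ_1 and δ_2 denoting the small positive constants discussed next):

\[
U(\mathbf{g}(a)) - u_k + \sigma^{+}(a) \geq \delta_1, \qquad \forall\, a \in C_k,\; k = 1, \dots, q-1
\]
\[
U(\mathbf{g}(a)) - u_{k-1} - \sigma^{-}(a) \leq -\delta_2, \qquad \forall\, a \in C_k,\; k = 2, \dots, q
\]

In words, an alternative of group C_k should score above the lower utility threshold of its group and below the upper one, and the error variables absorb any violation of these requirements.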
Practically, however, since all alternatives of the group are then placed exactly on the utility threshold, the generalizing ability of such a model is expected to be limited. Therefore, to avoid such situations a small positive (non-zero) value for the constant δ_1 should be chosen. The constant δ_2 in (4.12)–(4.13) is used in a similar way.
Constraints (4.14) and (4.15) are used to normalize the global utilities in the interval [0, 1]. In these constraints, g* and g_* denote the vectors consisting of the most and the least preferred values of the evaluation criteria. Finally, constraint (4.16) is used to ensure that the utility threshold discriminating groups C_k and C_{k+1} is higher than the utility threshold discriminating groups C_{k+1} and C_{k+2}. This specification ensures the ordering of the groups from the most preferred to the least preferred ones (C_1 ≻ C_2 ≻ ⋯ ≻ C_q). In this ordering of the groups, higher utilities are assigned to the most preferred groups. In constraint (4.16), s is a small positive constant defining the minimum difference between two consecutive utility thresholds.
Introducing the additive utility function (4.1) in MP1 leads to the formulation of a nonlinear programming problem. This is because the additive utility function (4.1) has two sets of unknown parameters to be specified: (a) the criteria weights and (b) the marginal utility functions. Therefore, constraints (4.11)–(4.15) take a nonlinear form and the solution of the resulting nonlinear programming problem can be cumbersome. To overcome this problem, the additive utility function (4.1) is rewritten in a simplified form as follows:
where u'_i(g_i) = p_i u_i(g_i) denotes the weighted marginal utility of criterion g_i.
Both (4.1) and (4.18) are equivalent expressions of the additive utility function. Nevertheless, the latter requires only the specification of the marginal utility functions u'_i(g_i). As illustrated in Figure 4.1, these functions can be of any form. The UTADIS method does not pre-specify a functional form for these functions. Therefore, it is necessary to express the marginal utility functions in terms of specific decision variables to be estimated through the solution of MP1. This is achieved through the modeling of the marginal utilities as piece-wise linear functions, through a process that is graphically illustrated in Figure 4.5.
Once the marginal utilities for every break-point are estimated, the marginal utility of any criterion value g_i(a) can be found using a simple linear interpolation:

u'_i(g_i(a)) = u'_i(g_i^j) + [(g_i(a) − g_i^j) / (g_i^{j+1} − g_i^j)] · [u'_i(g_i^{j+1}) − u'_i(g_i^j)]

where g_i^j and g_i^{j+1} denote the two consecutive break-points surrounding g_i(a).
subject to:
Almost half of the constraints in this simple case are monotonicity constraints, determined by the number of criteria and the definition of the subintervals. The increased number of these constraints increases the computational effort required to solve LP1. This problem can be easily addressed if the monotonicity constraints are transformed into non-negativity constraints (non-negativity constraints do not increase the computational effort in linear programming). This transformation is performed using the approach proposed by Siskos and Yannacopoulos (1985). In particular, new variables w_ij are introduced, representing the differences between the marginal utilities of two consecutive break-points g_i^j and g_i^{j+1}:

w_ij = u'_i(g_i^{j+1}) − u'_i(g_i^j) ≥ 0
where j = 1, 2, …, α_i − 1, with α_i denoting the number of break-points of criterion g_i. The marginal utility of each break-point is then obtained as the cumulative sum u'_i(g_i^j) = w_i1 + w_i2 + ⋯ + w_i,j−1, and the global utility of an alternative is expressed directly in terms of the incremental variables w.
According to all the above changes LP1 is now rewritten in a new form
(LP2) presented below. Table 4.2 illustrates the dimensions of the new prob-
lem.
subject to:
The way that the piece–wise linear modeling of the marginal utility functions
is performed is quite significant for the stability and the performance of the
additive utility classification models developed through the UTADIS
method. This issue is related to the subintervals defined for each criterion’s
range and consequently to the number of incremental variables w of LP2.
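The following Python sketch illustrates this piece-wise linear modeling; the break-points and the estimated increments w are hypothetical stand-ins for the quantities LP2 would produce:

import numpy as np

# Hypothetical break-points of one criterion's scale and the non-negative
# increments w between consecutive break-points (as estimated by LP2).
breakpoints = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
w = np.array([0.05, 0.10, 0.02, 0.08])

# Marginal utility at each break-point: u(g^1) = 0, then cumulative sums of w.
u_break = np.concatenate(([0.0], np.cumsum(w)))

def marginal_utility(x):
    """Linear interpolation between the two break-points surrounding x."""
    j = np.searchsorted(breakpoints, x, side="right") - 1
    j = min(max(j, 0), len(breakpoints) - 2)
    frac = (x - breakpoints[j]) / (breakpoints[j + 1] - breakpoints[j])
    return u_break[j] + frac * (u_break[j + 1] - u_break[j])

print(marginal_utility(6.0))  # interpolates in [5.0, 7.5]; prints 0.158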
In traditional statistical regression it is known that to induce statistically
meaningful estimates for a regression model consisting of n independent
variables the model development sample should have at least n+1 observa-
tions. Horsky and Rao (1984) emphasize the fact that this observation also
holds for mathematical programming approaches.
In the case of the UTADIS method, every basic solution of LP2 includes as many variables as the number of constraints (t). In addition, the optimal basic solution includes the utility thresholds (q−1 variables). Therefore, overall, the optimal solution includes at most t−q+1 of the incremental variables w. It is obvious that if a large number of subintervals is determined, such that the number of incremental variables w exceeds t, then at least some of the incremental variables w will not be included in any basic solution of LP2 (they will be redundant). Such a case affects negatively the developed model, increasing the instability of the estimates of the true significance of the criteria.
One way to address this issue is to increase the number of constraints of
LP2. Such an approach has been used by Oral and Kettani (1989). The ap-
pendix of this chapter also presents a way that such an approach can be im-
plemented. Increasing the number of constraints, however, results in increased computational effort required to obtain an optimal solution.
An alternative approach that is not subject to this limitation is to con-
sider appropriate techniques for determining how the criteria scale is divided
into subintervals. The heuristic HEUR1 presented earlier in this chapter is a
simple technique that implements this approach. However, this heuristic
does not consider how alternatives of different groups are distributed in each
criterion’s scale. To accommodate this valuable information, a new simple
heuristic can be proposed, which will be referred to as HEUR2. This heuris-
tic is performed for all quantitative criteria in five steps as follows:
Step 1: Rank–order all alternatives of the reference set according to
their performances on each quantitative criterion from the
least to the most preferred ones. Set the minimum acceptable
number of alternatives belonging into a subinterval equal to zero
Step 3: Check the number of alternatives that lie into each subinterval formed after step 2. If the number of alternatives in a subinterval is less than the minimum acceptable number, then merge this subinterval with the preceding one (this check is skipped when the minimum is zero).
Step 4: Check the consistency of the total number of subintervals formed after step 3 for all criteria, as opposed to the size of the linear program LP2, i.e. the number of constraints. If the number of subintervals leads to the specification of too many incremental variables w, then increase the minimum acceptable number of alternatives per subinterval and repeat the process from step 3; otherwise the procedure ends (a sketch of this logic is given after this list).
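A rough Python sketch of this heuristic for a single criterion is given below. The details of step 2 are not reproduced above, so the sketch assumes, for illustration only, that step 2 places a cut-point wherever two consecutive alternatives in the ranking belong to different groups:

def heur2(values, groups, max_intervals, step=1):
    # Step 1: rank-order alternatives by their criterion values; the minimum
    # acceptable number of alternatives per subinterval starts at zero.
    order = sorted(range(len(values)), key=lambda i: values[i])
    min_size = 0
    while True:
        # Assumed step 2: cut-points where consecutive ranked alternatives
        # belong to different groups (recomputed each pass for simplicity).
        cuts = [j for j in range(1, len(order))
                if groups[order[j]] != groups[order[j - 1]]]
        # Step 3: merge subintervals with fewer than min_size alternatives
        # into the preceding one (skipped while min_size == 0).
        if min_size > 0:
            merged, start = [], 0
            for c in cuts:
                if c - start >= min_size:
                    merged.append(c)
                    start = c
            cuts = merged
        # Step 4: stop when the number of subintervals is acceptable;
        # otherwise raise the minimum size and repeat from step 3.
        if len(cuts) + 1 <= max_intervals:
            return [values[order[c]] for c in cuts]
        min_size += step

print(heur2([0.1, 0.4, 0.35, 0.8, 0.9, 0.2],
            [1, 1, 2, 2, 2, 1], max_intervals=3))  # [0.35, 0.8]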
The recent study of Doumpos and Zopounidis (2001) showed that under
several data conditions HEUR2 increases the stability of the developed addi-
tive utility classification models and contributes positively to the improve-
ment of their classification performance.
The simple linear form of LP2 ensures the existence of a global optimum. However, often there are multiple optimal solutions. In linear programming theory this phenomenon is known as degeneracy. The existence of multiple optimal solutions is most common when the groups are perfectly separable, i.e., when there is no group overlap. In such cases all error variables σ^+ and σ^- are zero. The determination of a large number of criteria subintervals is positively related to the existence of multiple optimal solutions (as already mentioned, as the number of subintervals increases, the degrees of freedom of the developed additive utility model also increase and so does the fitting ability of the model). Even if the subintervals are defined in an appropriate way, on the basis of the remarks pointed out in the previous sub-section, this does not necessarily eliminate the degeneracy phenomenon for LP2 and the existence of multiple optimal solutions.
In addition to the degeneracy phenomenon, it is also important to emphasize that even if a unique optimal solution does exist for LP2, its stability needs to be carefully considered. A solution is considered to be stable if it is not significantly affected by small tradeoffs in the objective function (i.e., if near-optimal solutions are quite similar to the optimal one). The instability of the optimal solution is actually the result of overfitting the developed additive utility model to the alternatives of the reference set. This may affect negatively the generalizing classification performance of the developed classification model.
To explore the existence of near-optimal solutions, a constraint of the following form is considered:

z ≤ z* + k(z*)

where:
z* is the optimal value of the objective function of LP2,
z is the value of the objective function of LP2 evaluated for any new solution obtained during the post-optimality stage,
k(z*) is a small portion of z* (a tradeoff made to the optimal value of the objective function in order to investigate the existence of near-optimal solutions).
This constraint is added to the formulation of LP2 and the new linear program that is formed is solved to maximize either the criteria weights or the utility thresholds, as noted above.
Finally, the additive utility model used to perform the classification of
the alternatives is formed from the average of all solutions obtained during
the post–optimality stage.
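The mechanics of this averaging step can be sketched as follows; the near-optimal weight vectors shown are hypothetical, whereas in practice each one would come from re-solving LP2 with the constraint z ≤ z* + k(z*) while maximizing one quantity at a time:

import numpy as np

z_star = 0.12                      # optimal objective value of LP2 (hypothetical)
tolerance = 0.05 * z_star          # k(z*): a small portion of z*

# Hypothetical near-optimal solutions (criteria weights), one per maximized
# quantity, all satisfying z <= z* + k(z*).
solutions = np.array([
    [0.42, 0.33, 0.25],
    [0.38, 0.40, 0.22],
    [0.40, 0.31, 0.29],
])

final_weights = solutions.mean(axis=0)   # the averaged classification model
print(final_weights)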
Overall, despite the problems raised by the existence of multiple optimal solutions, it should be noted that LP2 provides consistent estimates of the parameters of the additive utility classification model. The consistency property for mathematical programming formulations used to estimate the parameters of a decision making model was first introduced by Charnes et al. (1955). The authors consider a mathematical programming formulation to satisfy the consistency property if it provides estimates of the model's parameters that approximate (asymptotically) the true values of the parameters as the number of observations (alternatives) used for model development increases. According to the authors, this is the most significant property that a mathematical programming formulation used for model development should have, since it ensures that the formulation is able to identify the true values of the parameters under consideration, given that enough information is available.
LP2 has the consistency property. Indeed, as new alternatives are added to an existing reference set, and given that these alternatives add new information (i.e., they are not dominated by alternatives already belonging to the reference set), the new alternatives will add new non-redundant constraints to LP2. These constraints reduce the size of the feasible set. Asymptotically, for large reference sets, this leads to the identification of a unique optimal solution that represents the decision maker's judgment policy and preferential system.
¹ This objective corresponds to the maximization of the variance among groups in traditional discriminant analysis; cf. Chapter 2.
Within this framework the procedure starts from group C_1 (the most preferred alternatives). The alternatives found to belong into group C_1 (correctly or incorrectly) are excluded from further consideration. In a second stage the objective is to identify the alternatives belonging into group C_2. Once again, all the alternatives found to belong into this group (correctly or incorrectly) are excluded from further consideration, and the same procedure continues until all alternatives are classified into the predefined groups.
The criteria aggregation model used to decide upon the classification of the alternatives at each stage k of the hierarchical discrimination process has the form of an additive utility function, similar to the one used in UTADIS.
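The hierarchical assignment logic can be sketched in a few lines of Python (function and variable names are illustrative; the utility functions stand in for the pairs developed at each stage):

# utility_pairs[k] = (U_k, U_not_k): the pair of utility functions developed
# at stage k; an alternative is assigned to group k+1 as soon as U_k exceeds
# U_not_k, otherwise it proceeds to the next stage.
def mhdis_classify(alternative, utility_pairs, q):
    for k, (u_k, u_not_k) in enumerate(utility_pairs):
        if u_k(alternative) > u_not_k(alternative):
            return k + 1          # assigned to group C_{k+1} (1-indexed)
    return q                      # remaining alternatives go to the last group

# Hypothetical two-stage example for q = 3 groups:
pairs = [(lambda a: 0.7 * a, lambda a: 1 - 0.7 * a),
         (lambda a: 0.9 * a, lambda a: 1 - 0.9 * a)]
print(mhdis_classify(0.9, pairs, q=3))   # group 1
print(mhdis_classify(0.3, pairs, q=3))   # group 3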
Such a case indicates that the classification of the alternatives is not clear and additional analysis is required. This analysis can be based on the examination of the marginal utilities, to determine how the performance of the alternatives on each evaluation criterion affects their classification.
In both utility functions u_k and u_{~k}, the corresponding marginal utilities u_k(g_i) and u_{~k}(g_i) are monotone functions on the criteria scale. The marginal utility functions u_k(g_i) are increasing, whereas the u_{~k}(g_i) are decreasing functions. This specification is based on the ordinal definition of the groups. In particular, since the alternatives of group C_k are considered to be preferred to the alternatives of the groups C_{k+1} to C_q, it is expected that the higher the performance of an alternative on criterion g_i, the more similar the alternative is to the characteristics of group C_k (increasing form of the marginal utility function u_k(g_i)) and the less similar it is to the characteristics of the groups C_{k+1} to C_q (decreasing form of the marginal utility function u_{~k}(g_i)).
The marginal utility functions are modeled in a piece-wise linear form, similarly to the case of the UTADIS method. The piece-wise linear modeling of the marginal utility functions in the MHDIS method is illustrated in Figure 4.8. In contrast to the UTADIS method, the criteria scale is not divided into subintervals. Instead, the alternatives of the reference set are rank-ordered according to their performance on each criterion. The performance of each alternative is considered as a distinct criterion level. For instance, assuming that the reference set includes m alternatives, each having a different performance on criterion g_i, then m criterion levels are considered, ordered from the least preferred one g_i^1 to the most preferred one g_i^{α_i}, where α_i denotes the number of distinct criterion levels (in this example α_i = m).
Denoting as g_i^j and g_i^{j+1} two consecutive levels of criterion g_i, the monotonicity of the marginal utilities is imposed through the following constraints (t is a small positive constant used to define the smallest difference between the marginal utilities of g_i^j and g_i^{j+1}):

u_k(g_i^{j+1}) − u_k(g_i^j) ≥ t,    u_{~k}(g_i^j) − u_{~k}(g_i^{j+1}) ≥ t
This simple example indicates that the use of utilities in MHDIS does not correspond to the alternatives themselves, but rather to the appropriateness of the classification decisions, measured on the basis of the alternatives' performances on the evaluation criteria.
³ Henceforth, the discussion focuses on the development of a pair of utility functions at stage k of the hierarchical discrimination process. The first utility function u_k characterizes the alternatives of group C_k, whereas the second utility function u_{~k} characterizes the alternatives belonging in the set of groups C_{k+1}, …, C_q. The same process applies to all stages k = 1, 2, …, q−1 of the hierarchical discrimination process.
The initial step in the model development process is based on a linear programming formulation. In this formulation the classification errors are considered as real-valued variables, defined similarly to the error variables σ^+ and σ^- used in the UTADIS method. In the case of the MHDIS method these error variables are defined through the classification rule (4.36):
Subject to:
Subject to:
The first set of constraints (4.45) is used to ensure that all correct classifications achieved by solving LP1 are retained. The second set of constraints (4.46) is used only for the alternatives that were misclassified by LP1 (set MIS). Their interpretation is similar to that of constraints (4.39) and (4.40) in LP1. Their only difference is the transformation of the real-valued error variables of LP1 into binary 0–1 variables E(a) and E′(a) that indicate the classification status of an alternative. Constraints (4.46) define these binary variables as follows: E(a)=1 indicates that an alternative a of group C_k is classified by the developed model into the set of groups C_{k+1}, …, C_q, whereas E′(a)=1 indicates that an alternative belonging into one of the groups C_{k+1} to C_q is classified by the developed model into group C_k. Both cases are misclassifications. On the contrary, the cases E(a)=0 and E′(a)=0 indicate the correct classification of the alternative a. The interpretation of constraints (4.47) and (4.48) has already been discussed for the LP1 formulation. The objective of MIP involves the minimization of a weighted sum of the error variables E(a) and E′(a), where the weighting considers the number of alternatives of the set MIS from each group.
Solving LP1 and then MIP leads to the "optimal" classification of the alternatives, where the term "optimal" refers to the minimization of the number of misclassified alternatives. However, it is possible that the correct classification of some alternatives is "marginal". This situation appears when the classification rules (4.36) are marginally satisfied, i.e., when there is only a slight difference between the global utilities of an alternative according to the two utility functions. For instance, assume a pair of utility functions developed such that for an alternative of group C_k its
where COR' denotes the set of alternatives classified correctly by the pair of
utility functions developed through the solution of MIP. The objective of this
third phase of the model development procedure is to maximize d. This is
performed through the following linear programming formulation (LP2).
Subject to:
The first set of constraints (4.51) involves only the correctly classified alternatives. In these constraints, d represents the minimum absolute difference between the global utilities of each alternative according to the two utility functions. The second set of constraints (4.52) involves the alternatives misclassified after the solution of MIP (set MIS′) and is used to ensure that they will be retained as misclassified.
After the solution of LP1, MIP and LP2 at stage k of the hierarchical discrimination process, the "optimal" classification is achieved between the alternatives belonging into group C_k and the alternatives belonging into the groups C_{k+1}, …, C_q. The term "optimal" refers to the number of misclassifications and to the clarity of the obtained discrimination. If the current stage k is the last stage of the hierarchical discrimination process (i.e., k = q−1), then the model development procedure stops, since all utility functions required to classify the alternatives have been estimated. Otherwise, the procedure proceeds to stage k+1, in order to discriminate between the alternatives belonging into group C_{k+1} and the alternatives belonging into the lower groups C_{k+2}, …, C_q. In stage k+1 all alternatives classified by the pair of utility functions developed at stage k into group C_k are not considered. Consequently, a new reference set A′ is formed, including all alternatives that remain unclassified in a specific group (i.e., the alternatives classified at stage k in the set of groups C_{k+1}, …, C_q). According to the set A′, the corresponding group sizes are updated, and the procedure proceeds with solving once again LP1, MIP and LP2.
APPENDIX
Max d
Subject to:
where the added constraint is used to ensure that the classification error of the new additive utility model developed through the solution of the above linear program does not exceed the trade-off made on the optimal classification error, defined on the basis of the optimal solution of LP2.
The maximization of the minimum difference d can also be incorporated into the formulation of LP2 as a secondary goal for model development (the primary goal being the minimization of the classification error). In this case a few revisions are required to the above linear program, as follows:
1. No distinction is made between the sets COR and MIS and consequently constraints (A1)–(A3) apply to all the alternatives of the reference set.
2. In constraints (A1)–(A3) the classification errors σ^+ and σ^- are introduced, similarly to the constraints (4.30)–(4.32) of LP2.
3. Constraint (A4) is eliminated.
4. The new objective takes the form of a weighted combination of the two goals:
where λ_1 and λ_2 are weighting parameters for the two goals (minimization of the classification error and maximization of the minimum difference), defined such that λ_1 + λ_2 = 1.
5. During the post-optimality stage described in sub-section 2.2.3 a new constraint is added: d ≥ d* − k(d*), where:
d* is the maximum difference defined by solving the above linear program considering the revisions 1–4,
k(d*) is a trade-off made over d* to explore the existence of near-optimal solutions (k(d*) is a small portion of d*),
d is the maximum difference defined through the solution of each linear program formed during the post-optimality stage described in sub-section 2.2.3.
Subject to:
where:
Similarly to the case of LP2, the objective function of the above linear program considers a weighted sum of the differences d^+ and d^-. In particular, the differences are weighted to account for variations in the number of alternatives of each group in the reference set.
Constraints (A5)–(A7) define the differences d^+ and d^- for the alternatives classified correctly after the solution of LP2 (set COR). The uncontrolled maximization of these differences, however, may lead to unexpected results regarding the classification of alternatives belonging into intermediate groups. In particular, given that any intermediate group C_k is defined by the upper utility threshold u_{k−1} and the lower utility threshold u_k, the maximization of the difference d^+ for an alternative of group C_k classified correctly by LP2 may lead to the estimation of a global utility that exceeds the utility threshold u_{k−1} (the upper boundary of group C_k); in that case the alternative is misclassified. A similar phenomenon may also appear when the difference d^- is maximized (the new estimated global utility may violate the utility threshold u_k, i.e., the lower boundary of group C_k). To avoid these cases, the differences d^+ and d^- should not exceed the range of each group, where the range is defined by the difference u_{k−1} − u_k. Constraints (A8) introduce these appropriate upper bounds for the differences d^+ and d^-.
Subject to:
where:
According to the solution of LP2, it is expected that the set MIS is a small portion of the whole reference set. Therefore, the number of binary 0–1 variables required is considerably lower than the number of the corresponding variables if all the alternatives of the reference set were considered (in that case as many binary 0–1 variables as the alternatives of the reference set should be used). This reduction in the number of binary variables is associated with a significant reduction in the computational effort required to obtain an optimal solution. However, the computational complexity problem still remains for large data sets. For instance, considering a reference set consisting of 1,000 alternatives for which LP2 misclassifies 100 alternatives (10% error), then 100 binary 0–1 variables should be introduced in the above mixed integer programming formulation (assuming a two-group classification problem). In this case significant computational resources will be required to find an optimal solution. The use of advanced optimization techniques, such as genetic algorithms or heuristics (tabu search; cf. Glover and Laguna, 1997), constitutes a promising approach to tackle this problem. Overall, however, it should be emphasized that the optimal fit of the model to the data of the reference set does not ensure high generalizing ability. This issue needs careful investigation.
velop an optimal additive utility classification model, and then at the post-optimality stage the following linear program is solved:
Subject to:
1. OBJECTIVES
The existing research in the field of MCDA classification methods has mainly focused on the development of appropriate methodologies for supporting the decision making process in classification problems. At the practical level, the use of MCDA classification techniques in real-world classification problems has demonstrated the capabilities that this approach provides to decision makers.
Nevertheless, the implementation in practice of any scientific development is always the last stage of research. Before this stage, experiments need to be performed in a laboratory environment, under controlled data conditions, in order to investigate the basic features of the scientific development under consideration. Such an investigation and the corresponding experimental analysis enable the derivation of useful conclusions on the potential that the proposed research has in practice and the possible problems that may be encountered during its practical implementation.
Within the field of MCDA, experimental studies are rather limited. Some MCDA researchers have conducted experiments to investigate the features and peculiarities of some MCDA ranking and choice methodologies (Stewart, 1993, 1996; Carmone et al., 1997; Zanakis et al., 1998). Comparative studies involving MCDA classification techniques have been heavily oriented towards the goal programming techniques discussed in Chapter 3. Such comparative studies tried to evaluate the efficiency of goal programming classification formulations as opposed to traditional statistical classification techniques, such as LDA, QDA and LA.
The present chapter follows this line of research to investigate the classi-
fication performance of the preference disaggregation methods presented in
Chapter 4, as opposed to other widely used classification techniques some of
which have been discussed in Chapters 2 and 3. The investigation is based
on an extensive Monte Carlo simulation experiment.
¹ Compensatory approaches lead to the development of criteria aggregation models considering the existing trade-offs between the evaluation criteria. Techniques based on the utility theory approach have a compensatory character. On the other hand, non-compensatory approaches involve techniques that do not consider the trade-offs between the evaluation criteria. Typical examples of non-compensatory approaches are lexicographic models, conjunctive/disjunctive models, and techniques based on the outranking relation approach that employ the veto concept.
quires the decision maker to specify several parameters (cf. Chapter 3). This is impossible in this experimental comparison, since there is no decision maker to interact with. To tackle this problem, a new procedure has been developed allowing the specification of the parameters of the outranking relation constructed through ELECTRE TRI, using the preference disaggregation paradigm. The details of this procedure are discussed in the appendix of this chapter.
Of course, in addition to the above techniques, other classification methodologies could also have been used (e.g., neural networks, goal programming formulations, etc.). Nevertheless, introducing additional classification techniques in this experimental comparison, bearing in mind the already increased size of the experiment, would make the results difficult to analyze. Furthermore, as already noted in Chapters 2 and 3, there have been several comparative studies in these fields involving the relative classification performance of the corresponding techniques as opposed to the statistical techniques used in the analysis of this chapter. Therefore, the results of this comparative analysis can be examined in conjunction with the results of previous studies, to derive some conclusions on the classification efficiency of the MCDA classification techniques compared to a variety of other non-parametric techniques.
3. EXPERIMENTAL DESIGN
² If z is a vector of n random variables that follow the standard normal distribution N(0,1), then the elements of the vector y = Bz + µ follow the multivariate normal distribution with mean µ and variance-covariance (dispersion) matrix BB′.
³ This is actually a multivariate distribution that resembles the exponential distribution in terms of its skewness and kurtosis. Nevertheless, for simplicity reasons, henceforth it will be referred to as the exponential distribution.
⁴ In the log-normal distribution the skewness and kurtosis are defined by the mean and the variance of the criteria for each group. The procedures for generating multivariate non-normal data can replicate satisfactorily the prespecified values of the first three moments (mean, standard deviation and skewness) of a statistical distribution. However, the error is higher for the fourth moment (kurtosis). Therefore, in order to reduce this error and consequently to have better control of the generated data, both the mean and the variance of the criteria for each group in the case of the multivariate log-normal distribution are specified so that the coefficient of kurtosis is lower than 40.
The last factor is used to specify the degree of group overlap. The higher the degree of group overlap, the more difficult it is to discriminate the considered groups. The definition of the group overlap in this experiment is performed using the Hotelling's T² statistic. For a pair of groups C_k and C_l this statistic is defined as follows:

T² = [n_k n_l / (n_k + n_l)] (m_k − m_l)′ S⁻¹ (m_k − m_l)

where n_k and n_l denote the number of alternatives of the reference set belonging into each group, m_k and m_l are the vectors of the criteria averages for each group, and S is the within-groups variance-covariance matrix:

S = [(n_k − 1)S_k + (n_l − 1)S_l] / (n_k + n_l − 2)
Studies investigating the multivariate normality assumption have shown that the results of the Hotelling's T² are quite robust for multivariate non-normal data, even for small samples (Mardia, 1975). Therefore, the use of the Hotelling's T² in this experiment, combined with non-normal data, is not a problem. In the case where the group variance-covariance matrices are not equal, it is more appropriate to use the revised version of the Hotelling's T² as defined by Anderson (1958):

T² = (m_k − m_l)′ (S_k/n_k + S_l/n_l)⁻¹ (m_k − m_l)

where S_k and S_l denote the variance-covariance matrices of the two groups.
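For illustration, the pooled-covariance version of this statistic can be computed as follows (a sketch with synthetic data; rows are alternatives, columns are criteria):

import numpy as np

def hotelling_t2(X1, X2):
    # Pooled within-groups covariance matrix and the T^2 statistic above.
    n1, n2 = len(X1), len(X2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S = ((n1 - 1) * np.cov(X1, rowvar=False) +
         (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    diff = m1 - m2
    return (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S, diff)

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(30, 3))
X2 = rng.normal(0.8, 1.0, size=(30, 3))
print(hotelling_t2(X1, X2))   # larger values indicate less group overlap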
Each criterion of the generated vector g′ follows the specified multivariate non-normal distribution with zero mean and unit variance. Their transformation, so that they have the desired mean and standard deviation defined by the corresponding factors (cf. Table 5.1), is performed through the relation g_i = µ_i + σ_i g′_i.
Each criterion g′_i of the vector g′ is defined as g′_i = a + b z_i + c z_i² + d z_i³, where z_i is a random variable following the multivariate standard normal distribution. The constants a, b, c, d are specified through the solution of a set of non-linear equations, on the basis of the desired level of skewness and kurtosis (Fleishman, 1978):
where ρ_ij denotes the desired correlation between the criteria g_i and g_j, as defined by the corresponding factor. The intermediate correlation matrix is decomposed so that the correlations between the random variables of the vector y correspond to the desired correlations between the criteria of the vector g. In this experimental study the decomposition of the intermediate correlation matrix is performed using principal components analysis. The data generation procedure ends with the transformation of the vector g′ to the criteria vector g, with the desired mean and standard deviation defined by the corresponding factors.
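The following Python sketch outlines this data-generation chain under simplifying assumptions: a Cholesky decomposition is used instead of principal components, and the Fleishman coefficients a, b, c, d are hypothetical (solving the non-linear system for the target skewness and kurtosis is not shown):

import numpy as np

rng = np.random.default_rng(1)
n, m = 108, 5                                  # alternatives, criteria
R = 0.3 * np.ones((m, m)) + 0.7 * np.eye(m)    # intermediate correlation matrix

# Correlated standard normals (Cholesky used here for simplicity).
z = rng.standard_normal((n, m)) @ np.linalg.cholesky(R).T

# Hypothetical Fleishman coefficients (a = -c by construction, for zero mean).
a, b, c, d = -0.2, 0.9, 0.2, 0.03
g_prime = a + b * z + c * z**2 + d * z**3      # non-normal, approx. zero mean

# Final scaling to the desired mean and standard deviation.
mu, sigma = 1.5, 0.4
g = mu + sigma * g_prime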
Since all the considered MCDA methods assume an ordinal definition of
the classes, it is important to ensure that the generated data meet this re-
quirement. This is achieved through the following constraint:
4. ANALYSIS OF RESULTS
The results obtained from the simulation experiment involve the classifica-
tion error rates of the methods both in the reference sets and the validation
samples. However, the analysis that follows is focused only on the classifica-
tion performance of the methods on the validation samples. This is because
the error rates obtained considering the reference set are downwardly biased
compared to the actual performance of the methods, since the same sample is
used both for model development and model validation. On the other hand,
the error rates obtained using the validation samples provide a better esti-
mate of the generalizing performance of the methods, measuring the ability
of the methods to provide correct recommendations on the classification of
new alternatives (i.e., alternatives not considered during model develop-
ment).
⁵ (192 combinations of factors F2 to F7) × (20 replications).
Table 5.2 presents the ANOVA results for this error rate measure, defined on the basis of the validation samples. All main effects and the interaction effects presented in this table are significant at the 1% level. Furthermore, each effect (main or interaction) explains at least 0.5% of the total variance in the results (ω² statistic⁶). Except for the 17 effects presented in Table 5.2, there were 64 more effects found to be significant at the 1% level. None of these effects, however, explained more than 0.5% of the total variance; therefore, in order to reduce the complexity of the analysis, they are not reported.
A first important note on the obtained results is that the main effects of all seven factors are significant. This clearly shows that each of these factors has a major impact on the classification performance of the methods. The main effects involving the statistical distribution of the data, the structure of the group dispersion matrices and the classification methods explain more than 48% of the total variance. The latter effect (classification methods) is of major importance to this analysis. It demonstrates that there are significant differences in the classification performances of the considered methods. Figure 5.2 presents the average error rates for each method in the validation samples for the whole simulation. The numbers in parentheses indicate the grouping of the methods according to the Tukey's test⁷ on the average transformed error rates. The homogeneous groups of classification methods formed by the Tukey's test are presented in increasing order (i.e., 1, 2, ...), from the methods with the lower error rate to those with the higher error rate.
⁶ Denoting by SS the sum of squares of an effect, by MSE the mean square error, by df the degrees of freedom of the effect and by TSS the total sum of squares, the ω² statistic is calculated as follows: ω² = (SS − df·MSE) / (TSS + MSE).
⁷ Tukey's honestly significant difference test is a post-hoc comparison technique that follows the results of ANOVA, enabling the identification of the means that contribute most to the considered effect. In this simulation study the Tukey's test is used to perform all pairwise comparisons among the average classification error rates (transformed error rates) of each pair of methods, so as to form homogeneous sets of methods according to their classification error rate. Each set includes methods that do not present statistically significant differences with respect to their classification error rates (see Yandell, 1977 for additional details).
are unequal. In this case the error rate of QDA is similar to that of the UTADIS method and significantly lower compared to all the other techniques. These results indicate that QDA is quite sensitive to changes in the structure of the group dispersion matrices.
The last two-way interaction that is of interest in this analysis is the one involving the performance of the methods according to the number of groups. This interaction explains 0.64% of the total variance in the results of the experiment. The corresponding results are presented in Table 5.6. A first obvious remark is that the performance of all methods deteriorates significantly in the three-group classification problem as opposed to the two-group case. This is no surprise, since the number of groups is positively related to the complexity of the problems (i.e., the complexity increases with the number of groups). Nevertheless, in both the two-group and the three-group case the use of the heuristic HEUR2 in UTADIS is the approach that provides the lower error rate. In particular, in the two-group case UTADIS2 performs similarly to UTADIS1 (use of HEUR1), whereas in the three-group case its differences from all other methods (including UTADIS1) are all statistically significant at the 5% level, according to the grouping obtained from the Tukey's test. It should also be noticed that MHDIS and ELECTRE TRI are the least sensitive methods to the increase of the number of groups. In both cases, the increase in the error rates for the three-group problem is the smallest compared to all other methods. As a result, both MHDIS and ELECTRE TRI perform similarly to UTADIS1 in the three-group classification problem.
Except for the above two-way interaction results, Table 5.2 also indicates some three-way interactions to be significant in explaining the results of this experiment regarding the performance of the considered classification methods. The first of these three-way interactions that is of interest involves the performance of the methods according to the form of the statistical distribution of the data and the structure of the group dispersion matrices. The corresponding results, presented in Table 5.7, provide more insight into the remarks noted previously, when the statistical distribution and the structure of the group dispersion matrices were examined independently of each other (cf. Tables 5.3 and 5.4); the interaction of these two factors is examined now.
The results of the above table indicate that when the data are multivariate normal and the group dispersion matrices are equal, LDA and LA provide the lower error rates, whereas when the group dispersion matrices are unequal, QDA outperforms all the other methods, followed by UTADIS. These results are to be expected, considering that multivariate normality and the a priori knowledge of the structure of the group dispersion matrices are the two major assumptions underlying the use of both LDA and QDA. On the other hand, when the data are not multivariate normal and the group dispersion matrices are equal, the MCDA classification methods (UTADIS, MHDIS, ELECTRE TRI) provide the best results compared to the other methods considered in this experiment. In all these cases the use of the UTADIS method with the heuristic HEUR2 (UTADIS2) provides the best results. Its differences from all the other MCDA approaches are significant for the exponential and the log-normal distributions, whereas for the uniform distribution its results are similar to those of UTADIS1. The results obtained when the data are not multivariate normal and the dispersion matrices are unequal are rather similar. The differences, however, between the MCDA methods, rough sets and QDA are reduced in this case. In particular, for the uniform distribution QDA performs similarly to the UTADIS method, while outperforming both MHDIS and ELECTRE TRI. A similar situation also appears for the log-normal distribution. On the other hand, for the exponential distribution UTADIS outperforms all the other methods, followed by MHDIS and rough sets.
The second three-way interaction that is of interest involves the performance of the classification methods according to the form of the statistical distribution of the data and the size of the reference set. The results presented in Table 5.8 show that for low and moderate sizes of the reference set (36 and 72 alternatives) the MCDA classification methods compare favorably (in most cases) to the other techniques, irrespective of the form of the statistical distribution. Furthermore, it is interesting to note that as the size of the reference set increases, the performance of MHDIS and ELECTRE TRI relative to the other methods improves. The improvement is more significant for the two asymmetric distributions (exponential and log-normal). For instance, in the case of the log-normal distribution with a large reference set (108 alternatives), both MHDIS and ELECTRE TRI perform significantly better than the UTADIS method when the heuristic HEUR1 (UTADIS1) is used.
The results of Table 5.9 show that the MCDA classification methods (UTADIS, MHDIS, ELECTRE TRI) outperform, in most cases, the other approaches. The high efficiency of the considered MCDA methods is also illustrated in the results presented in Table 5.10. The analysis of Table 5.10 shows that the implementation of UTADIS with the heuristic HEUR2 provides the lowest error rates in most cases, especially when the data come from an asymmetric distribution (exponential and log-normal). In the same cases, the MHDIS method and ELECTRE TRI also perform well.
The results of Tables 5.9 and 5.10 lead to the conclusion that the modeling framework of MCDA methods is quite efficient in addressing classification problems. The UTADIS and MHDIS methods that employ a utility-based modeling approach seem to outperform the outranking relations framework of the ELECTRE TRI method. Nevertheless, the differences between these approaches are reduced when more complex problems are considered (e.g., classification problems with three groups and problems with larger reference sets).
2. The procedure proposed for estimating the parameters of the outranking relation in the context of the ELECTRE TRI method (cf. the appendix of this chapter for a detailed description of the procedure) seems to be well-suited to the study of classification problems. Extending this procedure to also consider the optimistic assignment approach will contribute to the full exploitation of the particular features and capabilities of ELECTRE TRI. This will enable the modeling of the incomparability relation, which provides significant information to the decision maker.
Overall, during the whole experiment the discordance test in the ELECTRE TRI method was performed in 1,250 out of the 3,840 total replications conducted in the experiment (32.6%). In the proposed procedure used to specify the parameters of the outranking relation in the ELECTRE TRI method, the discordance test is performed only if it is found to improve the classification of the alternatives in the reference set. The limited use of the discordance test in this experiment is most probably due to the nature of the considered data. Generally, the discordance test is useful in the evaluation of alternatives that have good performance on some criteria but very poor performance on other criteria. In such cases, it is possible that a criterion on which the alternative has poor performance vetoes the overall evaluation of the alternative, irrespective of its good features on the other criteria. Such cases, where the performances of the alternatives on the criteria have significant fluctuations, were not considered in this experiment. Modeling such cases within an experimental study would be an interesting further extension of this analysis, in order to formulate a better view of the impact of the discordance test on the classification results of the ELECTRE TRI method.
Table 5.11 presents the percentage of replications in which the discordance test was conducted, for each combination of the four factors found to be the most significant in this experiment (i.e., the form of the statistical distribution, the number of groups, the size of the reference set and the structure of the group dispersion matrices). It should be noted that for each combination of these four factors 80 replications were performed.
The results of Table 5.11 indicate that the discordance test was most frequently used in the three-group case. Furthermore, it is interesting to note that the frequency of the use of the discordance test was reduced for larger reference sets. Finally, it can also be observed that the heterogeneity of the group dispersion matrices reduced the frequency of the use of the discordance test.
Of course, these results on the use of the discordance test need further consideration. The discordance test is a key feature of the ELECTRE TRI method, together with the ability of the method to model the incomparability relation. These two features are the major distinguishing characteristics of classification models developed through outranking relation approaches, compared to compensatory approaches such as the UTADIS and MHDIS methods. The analysis of the existing differences in the recommendations (evaluation results) of such methods will contribute to the understanding of the way that the peculiarities of each approach affect their classification performance.
The experimental analysis presented in this chapter did not address this
issue. Instead, the focal point of interest was the investigation of the classifi-
cation performance of MCDA classification methods compared to other
techniques. The obtained results can be considered as encouraging for the
MCDA approach. Moreover, they provide the basis for further analysis
along the lines of the above remarks.
APPENDIX
1. Prior research
As noted in the presentation of the ELECTRE TRI method in Chapter 3, the use of the method to develop a classification model in the form of an outranking relation requires the specification of several parameters, including:
1. The weight w_i of each criterion g_i.
2. The reference profiles r_1, r_2, …, r_{q−1} distinguishing two consecutive groups C_k and C_{k+1}, for all k = 1, 2, …, q−1.
3. The preference, indifference and veto thresholds for all criteria and for all k = 1, 2, …, q−1.
4. The cut-off threshold λ that defines the minimum value of the credibility index above which it can be ascertained that the affirmation "alternative a is at least as good as profile r_k" is valid.
The method assumes that all these parameters are specified by the decision maker in cooperation with the decision analyst, through an interactive process. Nevertheless, this process is often difficult to implement in practice. This is due to two main reasons:
a) The increased amount of time required to elicit preferential information from the decision maker.
b) The unwillingness of decision makers to participate actively in the process and to provide the required information.
These problems are often met in several fields (e.g., stock evaluation, credit risk assessment, etc.) where decisions have to be taken on a daily basis, and time and cost are crucial factors for the use of any decision making methodology.
To overcome this problem Mousseau and Slowinski (1998) proposed an
approach to specify the parameters of the outranking relation classification
model of the ELECTRE TRI method using the principles of preference dis-
aggregation. In particular, the authors suggested the use of a reference set for
where:
sets. Nevertheless, this simplification does not adequately address the problem of specifying the parameters of the outranking relation, since some critical parameters (the reference profiles, the preference and indifference thresholds) still need to be specified by the decision maker.
Furthermore, it should be emphasized that using the pessimistic assignment procedure in ELECTRE TRI without the discordance test is quite similar to the utility-based approach used in the UTADIS method. In particular, the partial concordance index can be considered as a form of marginal utility function. The higher the partial concordance index for the affirmation "alternative a is at least as good as reference profile r_k on the basis of criterion g_i", the higher is the utility/value of alternative a on criterion g_i. These remarks show that the main distinguishing feature of the two approaches (outranking relations vs utility-based techniques) is the non-compensatory philosophy of the outranking relation approaches, which is implemented through the discordance test. In this regard, the use of the ELECTRE TRI method without the discordance test cannot be considered as a different approach to model the classification problem, compared to compensatory techniques such as the use of additive utility functions.
⁸ In contrast to the discussion of the ELECTRE TRI method in Chapter 3, in this presentation the criteria weights are assumed to sum up to 1.
⁹ All criteria are assumed to be of increasing preference.
Minimize
Subject to:
Linear programs of the above form often have multiple optimal solutions. Furthermore, it is even possible that near-optimal solutions provide a more accurate classification of the alternatives than the attained optimal solution (this is because the objective function of the above linear program does not consider the number of misclassifications). These issues clearly indicate the necessity of exploring the existence of alternative optimal or near-optimal solutions. However, performing a thorough search of the polyhedron defined by the constraints of the above linear program could be a difficult and time-consuming process. To overcome this problem, the heuristic procedure proposed by Jacquet-Lagrèze and Siskos (1982) for the UTA method is used (this procedure is also used in the UTADIS method, cf. sub-section 2.3.2 of Chapter 4). This procedure involves the realization of a post-optimality analysis stage, in order to identify a characteristic subset of the set of feasible solutions of the above linear program. In particular, the partial exploration of the feasible set involves the identification of solutions that
maximize the criteria weights. Thus, during this post-optimality stage, n alternative optimal or near-optimal solutions are identified, corresponding to the maximization of the weights of the n criteria, one at a time. This enables the derivation of useful conclusions on the stability of the estimated parameters (the criteria weights). The criteria weights which are used in building the outranking relation are then computed as the average of all solutions identified during the post-optimality process (Jacquet-Lagrèze and Siskos, 1982). Alternative procedures to aggregate the results of the post-optimality analysis are also possible (Siskos, 1982).
At this point of the methodology, all the information required to compute the concordance index is available. Assuming (for the moment) that no criterion has a veto capability, the assignment of the alternatives is performed as follows (the pessimistic procedure is employed):
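A sketch of the pessimistic assignment without veto is given below, using the standard partial concordance index with indifference and preference thresholds; all names and numerical values are illustrative:

# Partial concordance of "a is at least as good as profile r" on one
# criterion of increasing preference, with indifference threshold q_i and
# preference threshold p_i (linear in between, as in standard ELECTRE).
def partial_concordance(a_i, r_i, q_i, p_i):
    if a_i >= r_i - q_i:
        return 1.0
    if a_i <= r_i - p_i:
        return 0.0
    return (a_i - (r_i - p_i)) / (p_i - q_i)

# Pessimistic procedure: profiles ordered from the best to the worst; the
# alternative is assigned to the first (highest) group whose profile it
# outranks at the cut-off level.
def pessimistic_assignment(a, profiles, weights, q_thr, p_thr, cut):
    for k, r in enumerate(profiles):
        concordance = sum(w * partial_concordance(ai, ri, qi, pi)
                          for ai, ri, w, qi, pi
                          in zip(a, r, weights, q_thr, p_thr))
        if concordance >= cut:
            return k + 1          # assigned to group k+1 (1 = most preferred)
    return len(profiles) + 1      # otherwise, the least preferred group

profiles = [[0.7, 0.7], [0.4, 0.4]]   # hypothetical profiles (2 criteria, 3 groups)
print(pessimistic_assignment([0.65, 0.8], profiles, [0.5, 0.5],
                             [0.05, 0.05], [0.15, 0.15], cut=0.75))  # 1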
Before the end of the first stage of the methodology and the concordance test, the relevant sets of alternatives are identified as follows:
Subject to:
1. INTRODUCTION
Financial management is a broad and rapidly developing field of manage-
ment science. The role of financial management covers all aspects of busi-
ness activity, including investment, financing and dividend policy issues.
During the last decades the globalization of the financial markets, the in-
tensifying competition between corporate entities and the socio-political and
technological changes have increased the complexity of the business, eco-
nomic and financial environments.
Within this new context, the smooth financial operation of any corporate entity and organization becomes a crucial issue for its sustainable growth and development. Nevertheless, the increasing complexity of the financial environment poses new challenges that need to be faced. The plethora of new financial products that are now available to firms and organizations as risk management, investment and financing instruments is indicative of the transformations that have occurred in the finance industry over the past decades and of the existing complexity in this field.
To address this complexity it is necessary to adjust the financial decision-making methodologies so that they meet the requirements of the new financial environment. Empirical approaches are no longer adequate. Instead, there is a gradually increasing worldwide trend towards the development and implementation of more sophisticated approaches based on advanced quantitative analysis techniques, such as statistics, optimization, forecasting, simulation, stochastic processes, artificial intelligence and operations research.
2. BANKRUPTCY PREDICTION
termination of the operation of the firm, following a filing for bankruptcy due to severe financial difficulties of the firm in meeting its financial obligations to its creditors. On the other hand, the other forms of financial distress do not necessarily lead to the termination of the operation of the firm. Further details on the different forms of financial distress can be found in the books of Altman (1993) and Zopounidis and Dimitras (1998).
The consequences of bankruptcy are not restricted to the individuals, firms or organizations that have an established relationship with the bankrupt firm; they often extend to the whole economic, business and social environment of a country or a region. For instance, developing countries are often quite vulnerable to corporate bankruptcies, especially when the bankruptcy involves a firm with a major impact on the country's economy. Furthermore, taking into account the globalization of the economic environment, it becomes clear that such a case may also have global implications. The recent crisis in Southeast Asia is an indicative example.
These findings demonstrate the necessity of developing and implement-
ing efficient procedures for bankruptcy prediction. Such procedures are nec-
essary for financial institutions, individual and institutional investors, as well
as for the firms themselves and even for policy makers (e.g., government
officers, central banks, etc.).
The main goal of bankruptcy prediction procedures is to discriminate the
firms that are likely to go bankrupt from the healthy firms. This is a two-
group classification problem. However, often an additional group is also
considered to add flexibility to the analysis. The intermediate group may
include firms for which it is difficult to make a clear conclusion. Some re-
searchers place in such an intermediate group distressed firms that finally
survive through restructuring plans, including mergers and acquisitions
(Theodossiou et al., 1996).
The classification of the firms into groups according to their bankruptcy risk is usually performed on the basis of their financial characteristics, using information derived from the available financial statements (i.e., the balance sheet and the income statement). Financial ratios calculated through the accounts of the financial statements are the most widely used bankruptcy prediction criteria. Nevertheless, making bankruptcy predictions solely on the basis of financial ratios has been criticized by several researchers (Dimitras et al., 1996; Laitinen, 1992). The criticism has mainly focused on the fact that financial ratios are only the symptoms of the operating and financial problems that a firm faces, rather than the causes of these problems. To overcome this shortcoming, several researchers have noted the significance of considering additional qualitative information in bankruptcy prediction. Such qualitative information involves criteria such as the management of the firms, their organization, their market niche/position, the market trends, their
6. Classification problems in finance 163
Among the financial ratios considered, the first four ratios measure the profitability of the firms. High values of these ratios correspond to profitable firms; thus, all these ratios are negatively related to the probability of bankruptcy. The ratios current assets/current liabilities and quick assets/current liabilities involve the liquidity of the firms and are commonly used to predict bankruptcy (Altman et al., 1977; Gloubos and Grammaticos, 1988; Zavgren, 1985; Keasey et al., 1990; Theodossiou, 1991; Theodossiou et al., 1996). Firms having enough liquid assets (current assets) are in a better liquidity position and are more capable of meeting their short-term obligations to their creditors. Thus, these two ratios are negatively related to the probability of bankruptcy. The remaining ratios are related to the solvency of the firms and their working capital management. High values of the solvency ratios indicate severe indebtedness, in which case the firms have to generate more income to meet their obligations and repay their debt; consequently, both solvency ratios are positively related to the probability of bankruptcy. The last ratios are related to the working capital management efficiency of the firms. Generally, the higher the working capital of a firm, the less likely it is that the firm will go bankrupt. In that regard, the working capital ratios are negatively related to the probability of bankruptcy, whereas the inventory ratio is positively related to bankruptcy (inventories are often difficult to liquidate, and consequently a firm holding a significant amount of inventory is likely to face liquidity problems).
Of course, the different industry sectors included in both the basic and the holdout samples are expected to have different financial characteristics, thus presenting differences in the financial ratios that are employed. Some researchers have examined the industry effects on bankruptcy prediction models by adjusting the financial ratios to industry averages. However, the obtained results are conflicting. Platt and Platt (1990) concluded that an adjusted bankruptcy prediction model performs better than an unadjusted one, while Theodossiou (1987) did not find any essential difference or improvement. Furthermore, Theodossiou et al. (1996) argue that industry- or time-adjusted models implicitly assume that bankruptcy rates for businesses are homogeneous across industries and time, an assumption that hardly holds in practice. On this basis, no industry adjustment is made to the selected financial ratios.
Tables 6.2-6.6 present some descriptive statistics regarding the two samples with respect to the selected financial ratios for the two groups of firms (the bankrupt and the non-bankrupt firms). In particular, Table 6.2 presents the means of the financial ratios, Tables 6.3 and 6.4 present the skewness and kurtosis coefficients, whereas Tables 6.5 and 6.6 present the correlation coefficients between the selected financial ratios.
It is interesting to note from Table 6.2 that many of the considered financial ratios significantly differentiate the two groups of firms, at least in the case of the basic sample. However, in the holdout sample the differences between the two groups of firms are less significant. Actually, the only ratio that significantly differentiates the two groups for all three years of the holdout sample is the solvency ratio total liabilities/total assets, which measures the debt capacity of the firms.
On the basis of the two samples and the selected set of financial ratios,
the development and validation of bankruptcy prediction models is per-
formed in three stages:
1. In the first stage the data of the firms included in the basic sample for the first year prior to bankruptcy (year -1) are used to develop a bankruptcy prediction model. The predictions made with this model involve a time horizon of one year: the model uses as input the financial ratios of the firms for a given year t, and its output involves an assessment of the bankruptcy risk of the firms in year t+1. Alternatively, it would be possible to develop different models for each year prior to bankruptcy (years -1 up to -5). In this scheme, each model would use the financial ratios for a year t to produce an estimate of the bankruptcy risk in years t+1, t+2, ..., t+5. Also, it would be possible to develop a multi-group bankruptcy prediction model considering not only the status of the firms (bankrupt or non-bankrupt) but also the time at which bankruptcy occurs. In this multi-group scheme the groups could be defined as follows: non-bankrupt firms, firms that will go bankrupt in the forthcoming year (t+1), firms that will go bankrupt in year t+2, firms that will go bankrupt in year t+3, firms that will go bankrupt in year t+4, and firms that will go bankrupt in year t+5. This approach has been used in the study of Keasey et al. (1990). Nevertheless, the fact that the holdout sample involves a three-year period as opposed to the five-year period of the basic sample poses problems for the validation of the classification models developed through these alternative schemes.
2. In the second stage of the analysis the developed bankruptcy prediction
models are applied to the data of the firms of the basic sample for the
years –2, –3, –4 and –5. This enables the investigation of the ability of
the developed models to provide early warning signals for the bank-
ruptcy status of the firms used to develop these models.
3. In the final stage of the analysis, the developed models are applied to the
three years of the holdout sample. This provides an assessment of the
generalizing ability of the models when different firms for a different
time period are considered.
The above three-stage procedure was used for all the considered classifi-
cation methods. The sub-sections that follow analyze and compare the ob-
tained results.
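To make this workflow concrete, the following minimal sketch (in Python) organizes the three stages around generic model fitting and error-rate functions; the function names and the year-indexed data containers are illustrative placeholders, not part of the software actually used in the study.

# Schematic driver for the three-stage development/validation procedure.
# `fit` and `error_rate` stand in for any of the considered classification
# methods; `basic` and `holdout` map each year to (ratios, group labels).

def run_three_stages(fit, error_rate, basic, holdout):
    # Stage 1: develop the model on the basic sample, year -1.
    model = fit(*basic[-1])

    # Stage 2: apply it to years -2 .. -5 of the basic sample
    # to assess its early-warning ability.
    early_warning = {year: error_rate(model, *basic[year])
                     for year in (-2, -3, -4, -5)}

    # Stage 3: apply it to the three years of the holdout sample
    # to assess its generalizing ability.
    generalization = {year: error_rate(model, *holdout[year])
                      for year in (-1, -2, -3)}
    return model, early_warning, generalization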
To apply the UTADIS method, the heuristic HEUR2 is used for the specification of the piecewise linear form of the marginal utility functions. Following this approach, an additive utility model is developed for bankruptcy prediction purposes.
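In general terms, the UTADIS additive utility model aggregates piecewise linear marginal utilities into a global utility of the form

$$U(\mathbf{g}) = \sum_{i=1}^{n} u_i(g_i), \qquad 0 \le U(\mathbf{g}) \le 1,$$

where $g_i$ is the value of a firm on financial ratio $i$ and each marginal utility $u_i$ is piecewise linear, the marginal utilities being normalized so that the global utility ranges between 0 and 1.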
The coefficients of the marginal utilities in this function show that the most significant ratios for the discrimination of the two groups of firms and the prediction of bankruptcy are the solvency ratios total liabilities/total assets and net worth/(net worth + long-term liabilities). For the other ratios there are no significant differences in their contribution to bankruptcy prediction; only the ratio inventory/working capital has very low significance in the developed additive utility model. The specific form of the marginal utility functions of the developed model is illustrated in Figure 6.1.
On the basis of this additive utility model, the classification of a firm as bankrupt or non-bankrupt is performed by comparing the firm's global utility to the cut-off utility threshold estimated during model development.
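As an illustration of how such a rule operates in practice, the sketch below computes the global utility of a firm from piecewise linear marginal utilities and compares it to the cut-off; the ratio names, breakpoints, marginal utility values and cut-off are illustrative placeholders, not the estimates of the actual model.

import numpy as np

# Illustrative piecewise linear marginal utility functions: for each criterion,
# breakpoints on the criterion scale and the (weighted) marginal utility values
# at those breakpoints. All numbers are placeholders, not the book's estimates.
MARGINALS = {
    "total_liabilities/total_assets":      ([0.0, 0.5, 0.8, 1.5], [0.40, 0.25, 0.05, 0.0]),
    "net_income/total_assets":             ([-0.5, 0.0, 0.1, 0.5], [0.0, 0.10, 0.25, 0.30]),
    "current_assets/current_liabilities":  ([0.0, 1.0, 2.0, 5.0], [0.0, 0.10, 0.25, 0.30]),
}
CUT_OFF = 0.55  # illustrative cut-off utility threshold

def global_utility(firm: dict) -> float:
    """Sum the piecewise linear marginal utilities over all criteria."""
    return sum(np.interp(firm[c], xp, up) for c, (xp, up) in MARGINALS.items())

def classify(firm: dict) -> str:
    # UTADIS-style rule: global utility at or above the cut-off -> non-bankrupt.
    return "non-bankrupt" if global_utility(firm) >= CUT_OFF else "bankrupt"

firm = {"total_liabilities/total_assets": 0.9,
        "net_income/total_assets": -0.02,
        "current_assets/current_liabilities": 0.8}
print(classify(firm), round(global_utility(firm), 3))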
In the case of the MHDIS method, based on the financial ratios of the firms for year -1, two additive utility functions are developed, since there are only two groups (bankrupt and non-bankrupt firms). The first additive utility function characterizes the non-bankrupt firms, while the second one characterizes the bankrupt firms. In these functions, the ratios net income/total assets and total liabilities/total assets were found significant; in particular, the ratio total liabilities/total assets mainly characterizes the bankrupt firms, since its weight in the corresponding utility function exceeds 88%. However, the weight of this ratio in the utility function of the non-bankrupt firms is only 2.81%, indicating that while high values of total liabilities/total assets characterize the bankrupt firms, low values of this ratio are not a significant characteristic of the non-bankrupt firms.
The forms of the marginal utility functions for these two ratios (Figure 6.2) provide some insight into the above remarks. In particular, the form of
the marginal utility function for the profitability ratio net income/total assets
indicates that firms with net income/total assets higher than 1.29% are more
likely to be classified as non–bankrupt. On the other hand, firms with total
liabilities/total assets higher than 77.30% are more likely to go bankrupt.
These results indicate that profitability and solvency are the two main distin-
guishing characteristics of non–bankrupt and bankrupt firms, according to
the model developed through MHDIS.
The decision regarding the classification of a firm into one of the two considered groups (bankrupt and non-bankrupt) is based upon the global utilities obtained through the two developed additive utility functions. In that sense, a firm is considered bankrupt if its global utility in the function characterizing the bankrupt firms exceeds its global utility in the function characterizing the non-bankrupt firms, and non-bankrupt otherwise. Throughout the application there were no cases where the two global utilities were equal.
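This pairwise comparison can be sketched as follows; the two utility evaluators are placeholders standing in for the developed functions.

def classify_mhdis(firm, u_non_bankrupt, u_bankrupt):
    """MHDIS-style assignment: the firm joins the group whose additive
    utility function gives it the higher global utility (ties, which did
    not occur in this application, would need a tie-breaking convention)."""
    return "non-bankrupt" if u_non_bankrupt(firm) > u_bankrupt(firm) else "bankrupt"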
The application of ELECTRE TRI is based on the procedure presented in Chapter 5 for the specification of the parameters of the outranking relation classification model. In applying this procedure, first the concordance test is performed to specify the parameter vectors corresponding to the preference and indifference thresholds of all criteria. At the same stage the criteria weights are estimated, and an initial estimation of the cut-off point is also made.
The final classification model was developed using only the results of the concordance test, since the use of the discordance test was not found to improve the classification results for the data of the reference set (i.e., year -1 of the basic sample). Therefore, the credibility index is defined on the basis of the global concordance index, as follows:

$$\sigma(x, r) = C(x, r) = \sum_{i=1}^{n} w_i \, c_i(x, r)$$

where $c_i(x, r)$ is the partial concordance index of criterion $i$ for the comparison of firm $x$ to the reference profile $r$, and $w_i$ is the corresponding criterion weight (the weights being normalized to sum to one).
On the basis of the credibility index calculated in this way, the classification of the firms is performed by comparing the credibility index to a cut-off point, which is estimated through the procedure described in the appendix of Chapter 5.
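The assignment logic can be sketched as follows; the weights, profile values, thresholds and cut-off point below are illustrative placeholders, not the actual estimates.

# Illustrative ELECTRE TRI-style concordance test against a single reference
# profile r separating the two groups. All numbers are placeholders.
CRITERIA = {  # name: (weight w, profile value r, indifference q, preference p,
              #        +1 if larger values are better, -1 otherwise)
    "net_income/total_assets":             (0.30, 0.02, 0.01, 0.05, +1),
    "total_liabilities/total_assets":      (0.45, 0.75, 0.05, 0.15, -1),
    "current_assets/current_liabilities":  (0.25, 1.20, 0.10, 0.40, +1),
}
LAMBDA = 0.6  # illustrative cut-off point

def partial_concordance(diff, q, p):
    """c_i = 1 within the indifference zone, 0 beyond the preference
    threshold, linear in between; diff is oriented so that positive
    means at least as good as the profile."""
    if diff >= -q:
        return 1.0
    if diff <= -p:
        return 0.0
    return (p + diff) / (p - q)

def credibility(firm):
    # With no discordance test, the credibility index equals the weighted
    # global concordance index.
    return sum(w * partial_concordance(s * (firm[c] - r), q, p)
               for c, (w, r, q, p, s) in CRITERIA.items())

def classify(firm):
    return "non-bankrupt" if credibility(firm) >= LAMBDA else "bankrupt"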
Turning to the rule-based model developed through the rough set approach, it is interesting to note that the rules for the non-bankrupt firms are stronger than the rules corresponding to the bankrupt firms: the average strength of the rules for the non-bankrupt firms is 15.75, as opposed to 11.17 for the bankrupt firms. These two findings (the number and the strength of the rules per group) indicate that, generally, it is more difficult to describe the bankrupt firms than the non-bankrupt ones.
Tables 6.9 and 6.10 present the estimates of the parameters of the above models, including the constant terms, the discriminant coefficients and the cross-product terms for the case of QDA.
In year -3 MHDIS provides the lowest error rate, followed by UTADIS (HEUR2). In year -4 the lowest overall error rate is obtained by the UTADIS model, whereas in year -5 the best result is obtained by the models of MHDIS, rough sets and LA.
As far as the holdout sample is concerned, it is clear that the overall error rates of all methods increase compared to the case of the basic sample. The comparison of the methods shows that the bankruptcy prediction model of UTADIS performs better than the other models in years -1 and -2, whereas in year -3 the lowest overall error rate is obtained by the model of the MHDIS method. It is important to note that all three MCDA methods (UTADIS, MHDIS and ELECTRE TRI) outperform the three statistical techniques in all three years of the holdout sample. This is a significant finding with regard to the relative efficiency of the corresponding bankruptcy prediction models.
diction models. It is also interesting to note that the two bankruptcy predic-
tion models of UTADIS and MHDIS provide significantly lower type I error
rates compared to the other techniques. For both these models the type I er-
ror rate is lower than 50% for all years of the holdout sample, whereas in
many cases regarding the other methods it considerably exceeds 50%.
Overall, it should be noted that the type I error rate is higher than the type II error rate for all methods in both the basic and the holdout samples. Nevertheless, this is not a surprising result. Generally, the process that leads a firm to bankruptcy is a dynamic one, and it cannot be fully explained through the examination of the financial characteristics of the firm. At the beginning of this process the financial characteristics of both non-bankrupt and bankrupt firms are usually similar (Dimitras et al., 1998). As time evolves, specific changes in the environment (internal and external) in which the firm operates, such as changes in the management of the firm or changes in the market, may lead the firm to face significant problems which ultimately lead to bankruptcy. Thus, non-bankrupt firms are easier to describe than bankrupt firms in terms of their financial characteristics (they remain in a good position over time).
The above finding has motivated researchers to propose the consideration
of additional qualitative strategic variables in bankruptcy prediction models,
including among others management of the firms, their organization, their
market niche/position, their technical facilities, etc. (Zopounidis, 1987). Actually, as pointed out by Laitinen (1992), the inefficiency of the firms along these qualitative factors is the true cause of bankruptcy; poor financial performance is only a symptom of bankruptcy rather than its cause.
The macroeconomic environment (interest rates, exchange rates, taxation, etc.) often has a significant impact on the performance and the viability of firms. Consequently, considering the sensitivity of the firms to the existing economic conditions could add a significant amount of useful information for bankruptcy prediction purposes, especially in periods of economic vulnerability and crisis. Some studies have adopted this approach (Rose et al., 1982; Foster, 1986), but further research is still required on this issue.
Finally, it would be useful to consider the bankruptcy prediction problem
in a dynamic context rather than in a static one. As already noted, bank-
ruptcy is a time-evolving event. Therefore, it could be useful to consider all
the available bankruptcy related information as time evolves in order to de-
velop more reliable early warning models for bankruptcy prediction. Kahya and Theodossiou (1999) followed this approach and modeled the bankruptcy prediction problem in a time-series context.
All the above remarks constitute interesting research directions in the
field of bankruptcy prediction. The fact that there is no research study that
combines all the above issues to develop a unified bankruptcy prediction
theory indicates the complexity of the problem and the significant research
effort that still needs to be made.
Despite this finding, the existing research on the development of bankruptcy prediction models should not be considered inadequate for meeting the needs of practitioners. Indeed, any model that considers publicly available information (e.g., financial ratios) and manages to outperform the estimations of expert analysts has obvious practical usefulness. Studies on this issue have shown that bankruptcy prediction models such as the ones developed above often perform better than experts (Lennox, 1999).
1 Without loss of generality the subsequent analysis will focus on the case where the debtor is a firm or organization (corporate credit risk assessment).
3. CREDIT RISK ASSESSMENT

The case study presented in this section involves a sample of firms derived from the credit portfolio of a leading Greek commercial bank. For this purpose, the classification methods used in Chapter 5 are employed and the obtained results are compared.
It was decided to include in the analysis the ratios with factor loadings greater than 0.7 (in absolute terms). Therefore, the ratios net income/sales, net worth/total liabilities, current assets/current liabilities, quick assets/current liabilities, cash/current liabilities, dividends/cash flow, working capital/total assets and current liabilities/inventories were selected. Ratios such as net income/sales capture the profitability of the firms.
Most of the selected financial ratios are negatively related to credit risk (i.e., higher values indicate lower credit risk). Only the ratios total liabilities/total assets, interest expenses/sales and current liabilities/inventories are positively related to credit risk (i.e., higher values indicate higher credit risk).
Tables 6.15-6.17 present some descriptive statistics regarding the two groups of firms (i.e., the low credit risk and the high credit risk firms) with respect to the selected financial ratios. In particular, Table 6.15 presents the means
of the financial ratios, Table 6.16 presents the skewness and kurtosis coeffi-
cients, whereas Table 6.17 presents the correlation coefficients between the
selected financial ratios.
In particular, the data of the firms in the sample for the most recent year (1995) are used as the reference set for the development of appropriate credit risk assessment models that distinguish the low risk firms from the high risk ones. In a second stage, the developed models are applied to the years 1994 and 1993 to test their ability to provide reliable early-warning signals of the credit risk level of the firms. Following this analysis framework, the subsequent sub-sections present in detail the obtained results for all methods.
Using the data of the firms for the most recent year (1995) as the reference set, the application of the UTADIS method (HEUR2) led to the development of an additive utility function as the appropriate credit risk assessment model.
Similarly to the case of the UTADIS method, the data of the firms in the sample for year 1995 are used to develop a credit risk assessment model through the MHDIS method. The resulting model consists of two additive utility functions: the former characterizes the firms of low credit risk, whereas the latter characterizes the firms of high credit risk.
The decision regarding the classification of a firm into one of the two considered groups (low credit risk and high credit risk) is based upon the global utilities obtained through the two developed additive utility functions. In that sense, a firm is considered to be of low credit risk if its global utility in the function characterizing the low risk firms exceeds its global utility in the function characterizing the high risk firms; otherwise, it is classified as a firm of high credit risk.
The main parameters of the credit risk assessment model developed through the ELECTRE TRI method are presented in Table 6.18. Similarly to the bankruptcy prediction model discussed earlier in this chapter, the developed credit risk assessment model does not employ the discordance test of the ELECTRE TRI method. Consequently, all the presented results and the classification of the firms are based solely on the concordance test, in the same way as in the bankruptcy prediction case.
The estimated weights of the financial ratios in the ELECTRE TRI model indicate that the most significant factors for assessing corporate credit risk include the ratios net income/net worth, net income/sales, current assets/current liabilities and dividends/cash flow. These results have some similarities with the conclusions drawn from the models developed through the UTADIS and the MHDIS methods. In particular, the ratios net income/net worth and net income/sales were found significant in the UTADIS model. The same ratios were found significant in characterizing the firms of high credit risk according to the model of the MHDIS method. Furthermore, the ratios current assets/current liabilities and dividends/cash flow were found significant in describing the firms of low credit risk according to the credit risk model of MHDIS.
The credit risk assessment model developed through the rough set approach
consists of only three simple decision rules, presented in Table 6.19.
The first rule covers all the low risk firms included in the reference set (year 1995). Its condition part considers two ratios, namely net income/sales and working capital/total assets. The former ratio was
found significant in all the credit risk models developed through UTADIS,
MHDIS and ELECTRE TRI. This result indicates that the net profit margin
(net income/sales) is indeed a decisive factor in discriminating the low risk
firms from the high risk ones. Rules 2 and 3 describe the firms of high credit
risk. It is interesting to note that these rules are actually the negation of rule
1 that describes the low risk firms. Therefore, the above rule set actually
consists of only one rule, that is rule 1. If rule 1 is fulfilled then it is con-
cluded that the firm under consideration is of low credit risk, otherwise it is a
high risk firm.
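Such a one-rule classifier is trivially implemented; in the sketch below the two condition thresholds are illustrative placeholders rather than the actual cut values of Table 6.19.

# Placeholder thresholds; the actual condition values come from Table 6.19.
NET_PROFIT_MARGIN_MIN = 0.05      # net income / sales
WORKING_CAPITAL_RATIO_MIN = 0.10  # working capital / total assets

def classify_credit_risk(net_income_sales, working_capital_total_assets):
    """Rule 1 describes the low risk firms; its negation yields high risk."""
    if (net_income_sales >= NET_PROFIT_MARGIN_MIN
            and working_capital_total_assets >= WORKING_CAPITAL_RATIO_MIN):
        return "low risk"
    return "high risk"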
The major advantage of the credit risk assessment model developed through the rough set approach derives from the fact that it is quite compact in terms of the information required to implement it. The analyst using this model needs to specify only two ratios. Therefore, credit risk assessment decisions can be made in a very short time, without requiring any specialized software to implement the developed model. This significantly reduces the time and the cost of the decisions made by the credit analysts.
The application of LDA, QDA and LA to the sample for credit risk model development led to the estimation of three credit risk assessment models. The form of these models is similar to that of the bankruptcy prediction models presented in sub-section 2.3.5 earlier in this chapter. Tables 6.20 and 6.21 present the parameters of these models (i.e., the discriminant coefficients, constant terms and cross-product terms of the QDA model). The classification of the firms as high or low risk is performed using the classification rules discussed in sub-section 2.3.5 for the bankruptcy prediction case, and consequently these rules are not repeated here.
The validation of the developed models is performed by applying them to the data of years 1994 and 1993 and then comparing the models’ classifications with the a priori known classification of the firms in the two credit risk classes. Thus, the obtained results are indicative of the ability of the models to provide accurate early-warning estimations of the credit risk of the firms.
A close examination of the results indicates that the credit risk assess-
ment models developed through the three MCDA classification methods
(UTADIS, MHDIS and ELECTRE TRI) are more effective compared to the
models of the other methods. In particular, in terms of the overall error rate,
the UTADIS model outperforms the models of rough sets, LDA, QDA and
LA in both years 1994 and 1993. Similarly, the MHDIS model outperforms
the credit risk models of the three statistical methods in both 1994 and 1993.
Compared to the rough set model, MHDIS provides the same performance in 1994, but in 1993 its overall error rate is significantly lower than that of the rough set model. Similar conclusions can also be derived for the performance of
the ELECTRE TRI credit risk assessment model as opposed to the rough set
model and the models of the three statistical methods. The comparison of the
three MCDA models to each other shows that UTADIS provides the best
result in 1994, but its performance in 1993 deteriorates significantly com-
pared both to MHDIS and ELECTRE TRI. On the other hand, MHDIS
seems to provide more robust results, since its overall error rate deteriorates
more slowly from 1994 to 1993 compared to the two other MCDA models.
Another interesting point in the obtained results is that the credit risk models developed through the four non-parametric classification techniques (UTADIS, MHDIS, ELECTRE TRI and rough sets) provide significantly lower type I error rates in both 1994 and 1993 compared to the models of the three statistical techniques. This indicates that these models are able to identify the high risk firms with a higher success rate than the statistical methods. This finding has significant practical implications for the selection of the appropriate credit risk assessment model, since analysts often feel more comfortable with models that are efficient in identifying the firms of high credit risk.
A further analysis of the practical usefulness of these credit risk assess-
ment models could be performed through the comparison of their results
with the corresponding error rates that are obtained by the expert credit ana-
lysts of the bank from which the data have been derived. Obviously, a credit
risk assessment model that performs consistently worse than the actual credit
analyst, cannot provide meaningful support in the credit risk assessment
process. From a decision-aiding perspective such a model is not consistent
with the credit analyst’s evaluation judgment and therefore it is of limited
practical use. On the other hand, models that perform at least as well as the analysts’ estimations can be considered consistent with the credit analyst’s evaluation judgment; furthermore, they have the ability to eliminate the inconsistencies that often arise in credit risk assessment based on human judgment. Therefore, the incorporation of such models
in the bank’s credit risk management process is of major help, both for assessing new credit applications submitted to the bank and for monitoring the risk exposure of the bank from its current credit portfolio. However,
the information required to perform such a comparison between the devel-
oped models’ estimations and the corresponding estimations of the bank’s
credit analysts was not available and consequently the aforementioned
analysis was solely based on the comparison between the selected classifica-
tion methods.
4. STOCK EVALUATION
4.1 Problem domain
Portfolio selection and management has been one of the major fields of interest in the area of finance for almost 50 years. Generally stated,
portfolio selection and management involves the construction of a portfolio
of securities (stocks, bonds, treasury bills, mutual funds, repos, financial de-
rivatives, etc.) that maximizes the investor’s2 utility. The term “construction
of a portfolio” refers to the allocation of a known amount of capital to the
securities under consideration. Generally, portfolio construction can be real-
ized as a two stage process:
1. Initially, in the first stage of the process, the investor needs to evaluate the available securities that constitute possible investment opportunities on the basis of their future prospects. This evaluation leads to the selection of a reduced set consisting of the best securities. Considering the huge number of securities that are nowadays traded in the international
financial markets, the significance of this stage becomes apparent. It is very difficult for the investor to manage a portfolio consisting of a large number of securities. Such a portfolio is quite inflexible, since the investor needs to gather and analyze a huge amount of daily information on the securities in the portfolio. This is a difficult and time-consuming process; consequently, it is difficult to update the portfolio in order to adjust to rapidly changing market conditions. Furthermore, a portfolio consisting of many securities imposes increased trading costs, which are often a decisive factor in portfolio investment decisions. Therefore, a compact set of securities needs to be formed for portfolio construction purposes.
2. Once this compact set of the best securities is specified after the evalua-
tion in the first stage, the investor needs to decide on the allocation of the
available capital to these securities. The allocation should be performed
so that the resulting portfolio best meets the investor’s policy, goals and
objectives. Since these goals/objectives are often diversified in nature (some are related to the expected return, whereas others are related to the risk of the portfolio), the resulting portfolio cannot be an optimal one, at least in the sense that the term “optimal” has in the traditional optimization framework, where the existence of a single objective is assumed.
2 The term “investor” refers both to individual investors and to institutional investors, such as portfolio managers and mutual fund managers. Henceforth, the term “investor” is used to refer to anyone (individual, firm or organization) who is involved in portfolio construction and management.
Relevant studies include, among others, Hurson and Zopounidis (1995, 1996, 1997), Bertsimas et al. (1999) and Zopounidis and Doumpos (2000d).
The use of classification techniques in the two-stage portfolio construc-
tion process discussed in the beginning of this sub-section can be realized
during the first stage and it can be classified in the second group of studies
mentioned above. The use of a classification scheme is not an uncommon
approach to practitioners who are involved with security evaluation. For in-
stance, in the case of stock evaluation most investment analysts and financial
institutions periodically announce their estimations on the performance of
the stocks in the form of recommendations such as “strong buy”, “buy”,
“market perform”, etc. Smith (1965) was the first to use a classification method (LDA) to develop a model that can reproduce such experts’ recommendations. A similar study was conducted by White (1975). Some more recent studies, such as those of Hurson and Zopounidis (1995, 1996, 1997) and
Zopounidis et al. (1999) employ MCDA classification methods including
ELECTRE TRI and UTADIS for the development of stock classification
models considering the investor’s policy and preferences.
Of course, apart from the evaluation and classification on the basis of experts’ judgments, other classification schemes can also be considered. For
instance, Klemkowsky and Petty (1973) used LDA to develop a stock classi-
fication model that classified stocks into risk classes on the basis of their his-
torical return volatility. Alternatively, it is also possible to consider a classi-
fication scheme where the stocks are classified on the basis of their expected
future return (e.g., stocks that will outperform the market, stocks that will
not outperform the market, etc.). Jog et al. (1999) adopted this approach and used rough set theory to develop a rule-based model that classifies stocks, on the basis of past data, into classes according to their expected future return: top performers (stocks with the highest future return), intermediate stocks, and low performers (stocks with the lowest future return). A similar approach
was used by John et al. (1996) who employed a machine learning methodol-
ogy, whereas Liu and Lee (1997) developed an expert system that provides
buy and sell recommendations (a two-group classification scheme) on the
basis of technical analysis indicators for the stocks (Murphy, 1995).
The results obtained through such classification models can be integrated at a later stage of the analysis with an optimization methodology (goal programming, multiobjective programming) in order to construct the most appropriate portfolio.
The objective of the present application is to develop a stock evaluation model that classifies the stocks into classes specified by an expert stock market analyst. The development of such a model has both research and practical implications, for at least two reasons:
The model can be used by stock market analysts and investors in their daily practice as a supportive tool for the evaluation of stocks on the basis of their financial and stock market performance. This significantly reduces the time and cost of analyzing financial and stock market data on a daily basis.
If the developed model has a specific quantitative form (utility function,
discriminant function, outranking relation, etc.) it can be incorporated in
the portfolio construction process. Assuming that the developed model is
a stock performance evaluation mechanism representing the judgment
policy of an expert stock market analyst, then the construction of a port-
folio that achieves the best performance according to the developed
model can be considered as an “optimal” one in the sense that it best
meets the decision-maker’s preferences.
From the methodological point of view, this application has several dif-
ferences compared to the previous two applications on bankruptcy prediction
and credit risk assessment:
1. The stock evaluation problem is considered as a multi-group classifica-
tion problem (the stocks are classified into three groups). Both bank-
ruptcy prediction and credit risk assessment were treated as two-group
problems.
2. There is an imbalance in the size of the groups in the considered sample.
In both bankruptcy prediction and credit risk assessment each group in
the samples used consisted of half the total number of firms. On the
other hand, in this application the number of stocks per group differs for
the three groups. This feature in combination with the consideration of
more than two groups increases the complexity of this application com-
pared to the two previous ones.
3. The sample used in this application involves only one time period and there is no additional holdout sample. Consequently, the model validation techniques used in the bankruptcy prediction and credit risk assessment cases are not suitable in this application. To tackle this problem, a jackknife model validation approach is employed (McLachlan, 1992; Kahya and Theodossiou, 1999; Doumpos et al., 2001) to obtain an unbiased estimate of the classification performance of the developed stock evaluation models; the basic idea is sketched after this list, and the details of the approach are discussed later.
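The study relies on repeated replications of the jackknife experiment (150 replications, as reported later in this section); the sketch below shows the basic leave-one-out form of the idea, with generic fit and predict functions standing in for any of the classification methods.

def jackknife_error(fit, predict, X, y):
    """Leave-one-out estimate of the classification error: each stock is
    classified by a model developed on the remaining n-1 stocks."""
    n = len(X)
    errors = 0
    for i in range(n):
        train_X = X[:i] + X[i+1:]
        train_y = y[:i] + y[i+1:]
        model = fit(train_X, train_y)
        errors += predict(model, X[i]) != y[i]
    return errors / n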
Having in mind these features, the presented stock evaluation case study
involves the evaluation of 98 stocks listed in the Athens Stock Exchange
(ASE). The data are derived from the studies of Karapistolis et al. (1996) and
Zopounidis et al. (1999). All stocks in the sample were listed in the ASE during 1992, constituting 68.5% of the total number of stocks listed in the ASE at that time (143 stocks). The objective of the application is to develop a stock
evaluation model that will classify the stocks into the following three groups:
Group 1: This group consists of the 9 stocks with the best investment potential in the medium/long run. These stocks are attractive to investors, while the corresponding firms are in a sound financial position and have a very positive reputation in the market.
Group 2: The second group includes 31 stocks. The overall performance and stock market behavior of these stocks is rather moderate. However, they could be used by the portfolio manager to achieve portfolio diversification.
Group 3: This is the largest group of stocks, since it includes 58 of the 98 stocks in the sample. The stocks belonging to this group do not seem to be good investment opportunities, at least for the medium and long term. The consideration of these stocks in a portfolio construction context can only be realized within a risk-prone investment policy seeking short-term profits.
This trichotomous classification approach enables the portfolio manager
to distinguish the promising stocks from the less promising ones. However,
the stocks that are found to belong to the third class (less promising stocks)
are not necessarily excluded from further consideration. Although the portfolio manager is informed about their poor stock market and financial performance in the long term, he may select some of them (the best ones according to their performance, measured on the basis of the developed classification model) in order to achieve portfolio diversification or to make short-term profits. In that sense, the obtained classification provides an essential
form of information to portfolio managers and investors; it supports the
stock evaluation procedure and leads to the selection of a limited number of
stocks for portfolio construction.
The classification of the stocks in the sample was specified by an expert
stock market analyst with experience on ASE. The classification of the
stocks by this expert was based on the consideration of 15 financial and
stock market criteria describing different facets of the performance of the
stocks. These criteria are presented in Table 6.23.
The first group of criteria describes the stock market behavior of the stocks, whereas the remaining criteria are commonly used financial ratios similar to the ones employed in the previous two case studies. The combination of stock market
indices and financial ratios enables the evaluation of all the fundamental fea-
tures of the stocks and the corresponding firms.
For most of the criteria, the portfolio manager's preferences are increasing functions of their scale; this means that the greater the value of the criterion, the greater the satisfaction of the portfolio manager. On the contrary, the P/E ratio has a decreasing preference scale, which means that the portfolio manager's preference decreases as the value of this criterion increases (i.e., a portfolio manager would prefer a stock with a low price that could yield high earnings). Furthermore, although two of the criteria are obviously correlated, the expert portfolio manager who collaborated in this case study indicated that both criteria should be retained in the analysis.
3 Gross book value per share = Total assets / Number of shares outstanding
Capitalization ratio = 1 / (Price / Earnings per share)
Stock market value = (Number of shares outstanding) × (Price)
Marketability = Trading volume / Number of shares outstanding
Financial position progress = (Book value at year t) / (Book value at year t-1)
Dividend yield = (Dividend paid at time t) / (Price at time t)
Capital gain = (Price at time t - Price at time t-1) / (Price at time t-1)
Exchange flow ratio = (Number of days within a year when transactions for the stock took place) / (Number of days within a year when transactions took place in the ASE)
Round lots traded per day = (Trading volume over a year) / [(Number of days within a year when transactions took place in the ASE) × (Minimum stock negotiation unit)]
Transactions value per day = (Transactions value over a year) / (Number of days within a year when transactions for the stock took place)
Tables 6.24 and 6.25 present some descriptive statistics (group means, skewness, kurtosis and correlations) regarding the performance of the stocks in the sample on the considered evaluation criteria. The comparison of the criteria averages for the three groups of stocks presented in Table 6.24 shows that the three groups have significantly different performance as far as the stock market criteria are concerned (only one criterion is found insignificant). On the contrary, the existing differences in the financial ratios are not found to be significant (except for the net income/net worth ratio).
It should be noted that for the MHDIS method these statistics involve all the additive utility functions that are developed. These include four utility functions. The first pair is developed at the first stage of the hierarchical discrimination process, for the discrimination of the stocks of the first group from the stocks of the other two groups: the former function characterizes the high performance stocks, whereas the latter characterizes the stocks of the other two groups (the medium performance and low performance stocks, respectively). The second pair of utility functions is developed at the second stage of the hierarchical discrimination process, for the distinction between medium performance and low performance stocks: the former function characterizes the medium performance stocks, whereas the latter characterizes the low performance stocks.
In the UTADIS results, the coefficient of variation for the weights of 9 out of the 15 criteria is lower than one. In the ELECTRE TRI results only two weight estimates have a coefficient of variation lower than one, whereas in the MHDIS method the weight estimates for all the significant criteria (those with average weight higher than 10%) have a coefficient of variation lower than one.
Table 6.28 provides some further results with regard to the significance
of the stock evaluation criteria in the classification models developed by the
three MCDA classification methods. In particular, this table presents the
ranking of the stock evaluation criteria according to their importance in the
models of each method. The criteria are ranked from the most significant
(lowest entries in the table) to the least significant ones (highest entries in the
table). In each replication of the jackknife experiment, the criteria are ranked
in this way for each of the models developed by the three MCDA methods.
The different rankings obtained over all 150 replications of the jackknife experiment are then averaged; the average rankings are the ones illustrated in Table 6.28. Kendall's coefficient of concordance W is also reported for each method, as a measure of the concordance of the criteria rankings over the 150 replications. The results indicate that the rankings of the UTADIS method are considerably more robust than those of ELECTRE TRI and MHDIS. In addition, the significance of the transactions value per day criterion is clearly indicated, since in all methods this criterion occupies one of the top positions in the rankings.
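For reference, a standard form of Kendall's coefficient of concordance for $m$ rankings of $n$ items (here, $m = 150$ replications ranking the $n = 15$ criteria) is

$$W = \frac{12 \sum_{i=1}^{n} \left( R_i - \frac{m(n+1)}{2} \right)^2}{m^2 \left( n^3 - n \right)},$$

where $R_i$ is the sum of the ranks assigned to criterion $i$ over the $m$ replications; $W = 1$ indicates identical rankings in all replications, whereas values near 0 indicate little agreement.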
Figure 6.5 presents some results derived from the developed rule-based
models with regard to the significance of the stock evaluation criteria. The
presented results involve the number of replications for which each stock
evaluation criterion was considered in the developed stock classification
rules.
The more frequently a criterion appears in the developed rules, the more significant it is considered to be. On the basis of this remark, the results of Figure 6.5 show that the gross book value per share and the exchange flow ratio are the most significant factors in the stock classification rules, since they are used in all 150 rule-based models developed at each replication of the jackknife process. The former ratio was found significant in the MHDIS models (cf. Table 6.27), while in the ELECTRE TRI models it was often given a veto ability. On the other hand, the exchange flow ratio was found significant by all three MCDA classification methods. Another ratio that was found significant in the models developed by the MCDA methods, the transactions value per day, is quite often used in the stock classification rules constructed through the rough set approach; in particular, it is used in 127 out of the 150 rule-based models that are developed. On the other hand, the net worth/total assets ratio, which was found significant in the UTADIS and the ELECTRE TRI models, is not found to be significant by the rough set approach; it is used in only 19 out of the 150 rule-based models.
Reasonably good results are obtained by the other methods too, but the errors are higher, especially in the case of LDA and LA.
Overall, the rather high error rates of all methods (28.89%-41.83%) indicate the complexity of the stock evaluation problem. The dynamic nature of stock markets, in combination with the plethora of internal and external factors that affect stock performance, as well as the huge volume of financial and stock market information that is available to investors and stock market analysts, all contribute to the complexity of the stock evaluation problem.
Chapter 7
Conclusions and future perspectives
1. The parameters of the additive utility models (criteria weights and marginal utility functions) have a clear interpretation that can be understood by the decision maker. This is a very important issue for understanding the results and the recommendations of the developed models with regard to the classification of the alternatives, and for the improvement of the model so that it is as consistent as possible with the decision maker's system of preferences. Actually, the model development process in the context of the proposed MCDA methods should not be considered a straightforward automatic process involving the solution of an optimization problem. Instead, the specification of the model's parameters through an optimization procedure is only the first stage of the model development process. The results obtained at this first stage constitute only an initial basis for the further calibration of the model through the interactive communication between the decision maker and the analyst. The implementation of this interactive process will clarify and eliminate possible inconsistencies in the model or even in the decision maker's judgments.
2. The use of the additive utility function as the modeling and representation form enables the use of qualitative criteria. Many classification methods from the fields of statistics and econometrics, but also non-parametric classification techniques such as mathematical programming and neural networks, assume that all criteria (variables) are quantitative. For qualitative criteria two approaches are usually employed (see the sketch after this list): (a) quantification of the qualitative scale by assigning an arbitrarily chosen real or integer value to each level of the scale (e.g., 0=low, 1=medium, 2=high); (b) consideration of each level of the qualitative scale as a distinct binary variable (criterion). For instance, the criterion market reputation of the firm, measured on the three-level scale {good, medium, bad}, would following this second approach be broken down into three binary criteria: market reputation good={0, 1}, market reputation medium={0, 1}, market reputation bad={0, 1}, where zeros correspond to no and ones to yes. Both of these approaches alter the nature of the qualitative criteria and hardly correspond to the way that the decision maker perceives them. On the other hand, the proposed MCDA methods do not require any change in the way that the qualitative criteria are measured, and consequently the developed classification models can easily combine quantitative and qualitative criteria. This is an important advantage, mainly for real-world problems where qualitative information is vital.
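The two usual encodings can be illustrated with the market reputation example; the sketch below is merely a restatement of approaches (a) and (b) above.

# (a) Arbitrary integer quantification of the qualitative scale.
REPUTATION_CODES = {"bad": 0, "medium": 1, "good": 2}

# (b) One binary criterion per level of the scale.
def one_hot_reputation(level):
    return {f"market_reputation_{v}": int(v == level)
            for v in ("good", "medium", "bad")}

print(REPUTATION_CODES["medium"])    # 1
print(one_hot_reputation("medium"))  # {'market_reputation_good': 0, 'market_reputation_medium': 1, 'market_reputation_bad': 0}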
Of course, additive utility functions are not the only available choice for the representation and modeling of the decision maker’s preferences in classification problems within the preference disaggregation paradigm. In Chapter 5 the use of the outranking relation model was also considered as the criteria aggregation model.
Abad, P.L. and Banks, W.J. (1993), “New LP based heuristics for the classification problem”,
European Journal of Operational Research, 67, 88–100.
Altman, E.I. (1968), “Financial ratios, discriminant analysis and the prediction of corporate
bankruptcy”, Journal of Finance, 23, 589-609.
Altman, E.I. (1993), Corporate Financial Distress and Bankruptcy, John Wiley and Sons,
New York.
Altman, E.I. and Saunders, A. (1998), “Credit risk measurement: Developments over the last
20 years”, Journal of Banking and Finance, 21, 1721–1742.
Altman, E.I., Avery, R., Eisenbeis, R. and Sinkey, J. (1981), Application of Classification Techniques in Business, Banking and Finance, Contemporary Studies in Economic and Financial Analysis, Vol. 3, JAI Press, Greenwich.
Altman, E.I., Haldeman, R.G. and Narayanan, P. (1977), “Zeta analysis: A new model to identify bankruptcy risk of corporations”, Journal of Banking and Finance, 1, 29–51.
Andenmatten, A. (1995), Evaluation du Risque de Défaillance des Emetteurs d’Obligations:
Une Approche par l’Aide Multicritère à la Décision, Presses Polytechniques et Universi-
taires Romandes, Lausanne.
Anderson, T.W. (1958), An Introduction to Multivariate Statistical Analysis, Wiley, New
York.
Archer, N.P. and Wang, S. (1993), “Application of the back propagation neural networks
algorithm with monotonicity constraints for two-group classification problems”, Deci-
sion Sciences, 24, 60-75.
Bajgier, S.M. and Hill, A.V. (1982), “A comparison of statistical and linear programming
approaches to the discriminant problem”, Decision Sciences, 13, 604–618.
Bana e Costa, C.A. and Vansnick, J.C. (1994), “MACBETH: An interactive path towards the
construction of cardinal value functions”, International Transactions on Operations Re-
search, 1, 489-500.
Banks, W.J. and Abad, P.L. (1991), “An efficient optimal solution algorithm for the classifi-
cation problem”, Decision Sciences, 22, 1008–1023.
Bardos, M. (1998), “Detecting the risk of company failure at the Banque de France”, Journal
of Banking and Finance, 22, 1405–1419.
Bastian, A. (2000), “Identifying fuzzy models utilizing genetic programming”, Fuzzy Sets and
Systems, 113, 333-350.
Beaver, W.H. (1966), “Financial ratios as predictors of failure”, Empirical Research in Ac-
counting: Selected Studies, Supplement to Journal of Accounting Research, 5, 179-199.
Belacel, N. (2000), “Multicriteria assignment method PROAFTN: Methodology and medical
applications”, European Journal of Operational Research, 125, 175-183.
Belton, V. and Gear, T. (1983), “On a short-coming of Saaty’s method of analytic hierar-
chies”, Omega, 11/3, 228-230.
Benayoun, R., De Montgolfier, J., Tergny, J. and Larichev, O. (1971), “Linear programming with multiple objective functions: Step method (STEM)”, Mathematical Programming, 1/3, 366-375.
Bergeron, M., Martel, J.M. and Twarabimenye, P. (1996), “The evaluation of corporate loan
applications based on the MCDA”, Journal of Euro-Asian Management, 2/2, 16-46.
Berkson, J. (1944), “Application of the logistic function to bio-assay”, Journal of the Ameri-
can Statistical Association, 39, 357-365.
Bertsimas, D., Darnell, C. and Soucy, R. (1999), “Portfolio construction through mixed-
integer programming at Grantham, Mayo, Van Otterloo and Company”, Interfaces, 29,
49-66.
Black, F. and Scholes, M. (1973), “The pricing of options and corporate liabilities”, Journal
of Political Economy, 81, 659-674.
Bliss, C.I. (1934), “The method of probits”, Science, 79, 38-39.
Boritz, J.E. and Kennedy, D.B. (1995), “Effectiveness of neural network types for prediction
of business failure”, Expert Systems with Applications, 9/4, 503-512.
Brans, J.P. and Vincke, Ph. (1985), “A preference ranking organization method”, Manage-
ment Science, 31/6, 647-656.
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984), Classification and Regression Trees, Wadsworth, Pacific Grove, California.
Carmone Jr., F.J., Kara, A. and Zanakis, S.H. (1997), “A Monte Carlo investigation of in-
complete pairwise comparison matrices in AHP”, European Journal of Operational Re-
search, 102, 538-553.
Casey, M., McGee, V. and Stickney, C. (1986), “Discriminating between reorganized and liquidated firms in bankruptcy”, The Accounting Review, April, 249–262.
Catelani, M. and Fort, A. (2000), “Fault diagnosis of electronic analog circuits using a radial basis function network classifier”, Measurement, 28/3, 147-158.
Charnes, A. and Cooper, W.W. (1961), Management Models and Industrial Applications of
Linear Programming, Wiley, New York.
Charnes, A., Cooper, W.W. and Ferguson, R.O. (1955), “Optimal estimation of executive
compensation by linear programming”, Management Science, 2, 138-151.
Chmielewski, M.R. and Grzymala-Busse, J.W. (1996), “Global discretization of continuous
attributes as preprocessing for machine learning”, International Journal of Approximate
Reasoning, 15, 319-331.
Choo, E.U. and Wedley, W.C. (1985), “Optimal criterion weights in repetitive multicriteria
decision–making”, Journal of the Operational Research Society, 36/11, 983–992.
Clark, P. and Niblett, T. (1989), “The CN2 induction algorithm”, Machine Learning, 3, 261-
283.
Colson, G. and de Bruyn, Ch. (1989), “An integrated multiobjective portfolio management
system”, Mathematical and Computer Modelling, 12/10-11, 1359-1381.
Conway, D.G., Victor Cabot A. and Venkataramanan, M.A. (1998), “A genetic algorithm for
discriminant analysis”, Annals of Operations Research, 78, 71-82.
Cook, W.D. and Kress, M. (1991), “A multiple criteria decision model with ordinal prefer-
ence data”, European Journal of Operational Research, 54, 191-198.
Courtis, J.K. (1978), “Modelling a financial ratios categoric framework”, Journal of Business
Finance & Accounting, 5/4, 371-387.
Cronan, T.P., Glorfeld, L.W. and Perry, L.G. (1991), “Production system development for
expert systems using a recursive partitioning induction approach: An application to
mortgage, commercial and consumer lending”, Decision Sciences, 22, 812-845.
Devaud, J.M., Groussaud, G. and Jacquet-Lagrèze, E. (1980), “UTADIS: Une méthode de
construction de fonctions d’utilité additives rendant compte de jugements globaux”,
European Working Group on Multicriteria Decision Aid, Bochum.
Diakoulaki, D., Zopounidis, C., Mavrotas, G. and Doumpos, M. (1999), “The use of a prefer-
ence disaggregation method in energy analysis and policy making”, Energy–The Interna-
tional Journal, 24/2, 157-166.
Dias, L., Mousseau, V., Figueira, J. and Climaco, J. (2000), “An aggregation/disaggregation
approach to obtain robust conclusions with ELECTRE TRI”, Cahier du LAMSADE, No
174, Université de Paris-Dauphine.
Dillon, W.R. and Goldstein, M. (1978), “On the performance of some multinomial classifica-
tion rules”, Journal of the American Statistical Association, 73, 305-313.
Dimitras, A.I., Zopounidis, C. and Hurson, C. (1995), “A multicriteria decision aid method
for the assessment of business failure risk”, Foundations of Computing and Decision Sci-
ences, 20/2, 99-112.
Dimitras, A.I., Zanakis, S.H. and Zopounidis, C. (1996), “A survey of business failures with an
emphasis on prediction methods and industrial applications”, European Journal of Opera-
tional Research, 90, 487-513.
Dimitras, A.I., Slowinski, R., Susmaga, R. and Zopounidis, C. (1999), “Business failure pre-
diction using rough sets”, European Journal of Operational Research, 114, 263-280.
Dominiak, C. (1997), “Portfolio selection using the idea of reference solution”, in: G. Fandel and Th. Gal (eds.), Multiple Criteria Decision Making, Proceedings of the Twelfth International Conference.
Greco, S., Matarazzo, B. and Slowinski, R. (1999a), “The use of rough sets and fuzzy sets in
MCDM”, in: T. Gal, T. Hanne and T. Stewart (eds.), Advances in Multiple Criteria Deci-
sion Making, Kluwer Academic Publishers, Dordrecht, 14.1-14.59.
Greco, S., Matarazzo, B., Slowinski, R. and Zanakis, S. (1999b), “Rough set analysis of in-
formation tables with missing values”, in: D. Despotis and C. Zopounidis (Eds.), Integrat-
ing Technology & Human Decisions: Bridging into the 21st Century, Vol. II, Proceedings of
the 5th International Meeting of the Decision Sciences Institute, New Technologies Edi-
tions, Athens, 1359–1362.
Greco, S., Matarazzo, B. and Slowinski, R. (2000a), “Extension of the rough set approach to
multicriteria decision support”, INFOR, 38/3, 161–196.
Greco, S., Matarazzo, B. and Slowinski, R. (2000b), “Dealing with missing values in rough
set analysis of multi-attribute and multi-criteria decision problems”, in: S.H. Zanakis, G.
Doukidis and C. Zopounidis (Eds.), Decision Making: Recent Developments and World-
wide Applications, Kluwer Academic Publishers, Dordrecht, 295–316.
Greco, S., Matarazzo, B. and Slowinski, R. (2001), “Conjoint measurement and rough sets
approach for multicriteria sorting problems in presence of ordinal data”, in: A. Colorni,
M. Paruccini and B. Roy (eds), AMCDA-Aide Multicritère à la decision (Multiple Crite-
ria Decision Aiding), EUR Report, Joint Research Centre, The European Commission,
Ispra (to appear).
Greco, S., Matarazzo, B. and Slowinski, R. (2002), “Rough sets methodology for sorting
problems in presence of multiple attributes and criteria”, European Journal of Opera-
tional Research, 138, 247-259.
Grinold, R.C. (1972), “Mathematical programming methods for pattern classification”, Man-
agement Science, 19, 272-289.
Grzymala-Busse, J.W. (1992), “LERS: A system for learning from examples based on rough
sets”, in: R. Slowinski (ed.), Intelligent Decision Support. Handbook of Applications and
Advances of the Rough Sets Theory, Kluwer Academic Publishers, Dordrecht, 3–18.
Grzymala-Busse, J.W. and Stefanowski, J. (2001), “Three discretization methods for rule
induction”, International Journal of Intelligent Systems, 16, 29-38.
Gupta, M.C. and Huefner, R.J. (1972), “A cluster analysis study of financial ratios and indus-
try characteristics”, Journal of Accounting Research, Spring, 77-95.
Gupta, Y.P., Rao, R.P. and Bagchi, P.K. (1990), “Linear goal programming as an alternative
to multivariate discriminant analysis: A note”, Journal of Business Finance and Ac-
counting, 17/4, 593-598.
Hand, D.J. (1981), Discrimination and Classification, Wiley, New York.
Harker, P.T. and Vargas, L.G. (1990), “Reply to ‘Remark on the analytic hierarchy process’
by J.S. Dyer”, Management Science, 36/3, 269-273.
Horsky, D. and Rao, M.R. (1984), “Estimation of attribute weights from preference compari-
sons”, Management Science, 30/7, 801-822.
Hosseini, J.C. and Armacost, R.L. (1994), “The two-group discriminant problem with equal
group mean vectors: An experimental evaluation of six linear/nonlinear programming
formulations”, European Journal of Operational Research, 77, 241-252.
Hung, M.S. and Denton, J.W. (1993), “Training neural networks with the GRG2 nonlinear
optimizer”, European Journal of Operational Research, 69, 83-91.
Hurson, Ch. and Ricci, N. (1998), “Multicriteria decision making and portfolio management
with arbitrage pricing theory”, in: C. Zopounidis (ed.), Operational Tools in the Man-
agement of Financial Risks, Kluwer Academic Publishers, Dordrecht, 31-55.
Hurson, Ch. and Zopounidis, C. (1995), “On the use of multi-criteria decision aid methods to
portfolio selection”, Journal of Euro-Asian Management, 1/2, 69-94.
Hurson, Ch. and Zopounidis, C. (1996), “Méthodologie multicritère pour l’évaluation et la
gestion de portefeuilles d’actions”, Banque et Marché, 28, Novembre-Décembre, 11-23.
Hurson, Ch. and Zopounidis, C. (1997), Gestion de Portefeuille et Analyse Multicritère,
Economica, Paris.
Inuiguchi, M., Tanino, T. and Sakawa, M. (2000), “Membership function elicitation in possi-
bilistic programming problems”, Fuzzy Sets and Systems, 111, 29-45.
Ishibuchi, H., Nozaki, K. and Tanaka, H. (1992), “Distributed representation of fuzzy rules
and its application to pattern classification”, Fuzzy Sets and Systems, 52, 21-32.
Ishibuchi, H., Nozaki, K. and Tanaka, H. (1993), “Efficient fuzzy partition of pattern space
for classification problems”, Fuzzy Sets and Systems, 59, 295-304.
Jablonsky, J. (1993), “Multicriteria evaluation of clients in financial houses”, Central Euro-
pean Journal of Operations Research and Economics, 3/2, 257-264.
Jacquet-Lagrèze, E. (1995), “An application of the UTA discriminant model for the evalua-
tion of R & D projects”, in: P.M. Pardalos, Y. Siskos, C. Zopounidis (eds.), Advances in
Multicriteria Analysis, Kluwer Academic Publishers, Dordrecht, 203-211.
Jacquet-Lagrèze, E. and Siskos, J. (1978), “Une méthode de construction de fonctions
d’utilité additives explicatives d’une préférence globale”, Cahier du LAMSADE, No 16,
Université de Paris-Dauphine.
Jacquet-Lagrèze, E. and Siskos, Y. (1982), “Assessing a set of additive utility functions for
multicriteria decision making: The UTA method”, European Journal of Operational Re-
search, 10, 151-164.
Jacquet-Lagrèze, E. and Siskos, J. (1983), Méthodes de Décision Multicritère, Editions
Hommes et Techniques, Paris.
Jacquet-Lagrèze, E. and Siskos, J. (2001), “Preference disaggregation: Twenty years of
MCDA experience”, European Journal of Operational Research, 130, 233-245.
Jelonek, J. and Stefanowski, J. (1998), “Experiments on solving multiclass learning problems
by n²-classifier”, in: Proceedings of the 10th European Conference on Machine Learning,
Chemnitz, April 21-24, 1998, Lecture Notes in AI, vol. 1398, Springer-Verlag, Berlin,
172-177.
Jensen, R.E. (1971), “A cluster analysis study of financial performance of selected firms”,
The Accounting Review, XLVI, January, 36-56.
Joachimsthaler, E.A. and Stam, A. (1988), “Four approaches to the classification problem in
discriminant analysis: An experimental study”, Decision Sciences, 19, 322–333.
Joachimsthaler, E.A. and Stam, A. (1990), “Mathematical programming approaches for the
classification problem in two-group discriminant analysis”, Multivariate Behavioral Re-
search, 25/4, 427-454.
Jog, V., Michalowski, W., Slowinski, R. and Susmaga, R. (1999), “The rough sets analysis
and the neural networks classifier: A hybrid approach to predicting stocks’ perform-
ance”, in: D.K. Despotis and C. Zopounidis (eds.), Integrating Technology & Human De-
cisions: Bridging into the 21st Century, Vol. II, Proceedings of the 5th International Meet-
ing of the Decision Sciences Institute, New Technologies Editions, Athens, 1386-1388.
John, G.H., Miller, P. and Kerber, R. (1996), “Stock selection using RECON”, in: Y. Abu-
Mostafa, J. Moody, P. Refenes and A. Weigend (eds.), Neural Networks in Financial Engi-
neering, World Scientific, London, 303-316.
Kahya, E. and Theodossiou, P. (1999), “Predicting corporate financial distress: A time-series
CUSUM methodology”, Review of Quantitative Finance and Accounting, 13, 323-345.
Karapistolis, D., Katos, A. and Papadimitriou, G. (1996), “Selection of a solvent portfolio
using discriminant analysis”, in: Y. Siskos, C. Zopounidis and K. Pappis (eds.), Man-
agement of Small Firms, Cretan University Editions, Iraklio, 135-140 (in Greek).
Karst, O.J. (1958), “Linear curve fitting using least deviations”, Journal of the American Sta-
tistical Association, 53, 118-132.
Keasey, K. and Watson, R. (1991), “Financial distress prediction models: A review of their
usefulness”, British Journal of Management, 2, 89-102.
Keasey, K., McGuinness, P. and Short, H. (1990), “Multilogit approach to predicting corpo-
rate failure-Further analysis and the issue of signal consistency”, Omega, 18/1, 85-94.
Keeney, R.L. and Raiffa, H. (1993), Decisions with Multiple Objectives: Preferences and
Value Trade-offs, Cambridge University Press, Cambridge.
Kelley, J.E. (1958), “An application of linear programming to curve fitting”, Journal of the
Society for Industrial and Applied Mathematics, 6, 15-22.
Khalil, J., Martel, J-M. and Jutras, P. (2000), “A multicriterion system for credit risk rating”,
Gestion 2000: Belgian Management Magazine, 15/1, 125-146.
Khoury, N.T., Martel, J.M. and Veilleux, M. (1993), “Méthode multicritère de sélection de
portefeuilles indiciels internationaux”, L’Actualité Economique, Revue d’Analyse
Economique, 69/1, 171-190.
Klemkosky, R. and Petty, J.W. (1973), “A multivariate analysis of stock price variability”,
Journal of Business Research, Summer.
Koehler, G.J. and Erenguc, S.S. (1990), “Minimizing misclassifications in linear discriminant
analysis”, Decision Sciences, 21, 63–85.
Kohara, K., Ishikawa, T., Fukuhara, Y. and Nakamura, Y. (1997), “Stock price prediction
using prior knowledge and neural networks”, Intelligent Systems in Accounting, Finance
and Management, 6, 11-22.
Koopmans, T.C. (1951), Activity Analysis of Production and Allocation, John Wiley and
Sons, New York.
Kodratoff, Y. and Michalski, R.S. (1990), Machine Learning: An Artificial Intelligence Ap-
proach, Volume III, Morgan Kaufmann Publishers, Los Altos, California.
Korhonen, P. (1988), “A visual reference direction approach to solving discrete multiple crite-
ria problems”, European Journal of Operational Research, 34, 152-159.
Korhonen, P. and Wallenius, J. (1988), “A Pareto race”, Naval Research Logistics, 35, 615-
623.
Kosko, B. (1992), Neural Networks and Fuzzy Systems, Prentice-Hall, Englewood Cliffs,
New Jersey.
Krzanowski, W.J. (1975), “Discrimination and classification using both binary and continu-
ous variables”, Journal of the American Statistical Association, 70, 782-790.
Krzanowski, W.J. (1977), “The performance of Fisher’s linear discriminant function under
nonoptimal conditions”, Technometrics, 19, 191-200.
Laitinen, E.K. (1992), “Prediction of failure of a newly founded firm”, Journal of Business
Venturing, 7, 323-340.
Lam, K.F. and Choo, E.U. (1995), “Goal programming in preference decomposition”, Journal
of the Operational Research Society, 46, 205-213.
Lachenbruch, P.A., Sneeringer, C. and Revo, L.T. (1973), “Robustness of the linear and
quadratic discriminant function to certain types of non-normality”, Communications in
Statistics, 1, 39-56.
Langholz, G., Kandel, A., Schneider, M. and Chew, G. (1996), Fuzzy Expert System Tools,
John Wiley and Sons, New York.
Lee, S.M. and Chesser, D.L. (1980), “Goal programming for portfolio selection”, The Journal
of Portfolio Management, Spring, 22-26.
Lee, K.C. and Kim, H.S. (1997), “A fuzzy cognitive map-based bi-directional inference
mechanism: An application to stock investment analysis”, Intelligent Systems in Account-
ing, Finance & Management, 6, 41-57.
Lee, C.K. and Ord, K.J. (1990), “Discriminant analysis using least absolute deviations”, Deci-
sion Sciences, 21, 86-96.
Lee, K.H. and Jo, G.S. (1999), “Expert system for predicting stock market timing using a
candlestick chart”, Expert Systems with Applications, 16, 357-364.
Lee, J.K., Kim, H.S. and Chu, S.C. (1989), “Intelligent stock portfolio management system”,
Expert Systems, 6/2, 74-85.
Lee, H., Kwak, W. and Han, I. (1995), “Developing a business performance evaluation sys-
tem: An analytic hierarchical model”, The Engineering Economist, 30/4, 343-357.
Lennox, C.S. (1999), “The accuracy and incremental information content of audit reports in
predicting bankruptcy”, Journal of Business Finance & Accounting, 26/5-6, 757-778.
Liittschwager, J.M. and Wang, C. (1978), “Integer programming solution of a classification
problem”, Management Science, 24/14, 1515-1525.
Liu, N.K. and Lee, K.K. (1997), “An intelligent business advisor system for stock invest-
ment”, Expert Systems, 14/4, 129-139.
Lotfi, V., Stewart, T.J. and Zionts, S. (1992), “An aspiration-level interactive model for mul-
tiple criteria decision making”, Computers and Operations Research, 19, 677-681.
Lootsma, F.A. (1997), Fuzzy Logic for Planning and Decision Making, Kluwer Academic
Publishers, Dordrecht.
Luce, R.D. (1956), “Semiorders and a theory of utility discrimination”, Econometrica, 24,
178-191.
Luoma, M. and Laitinen, E.K. (1991), “Survival analysis as a tool for company failure predic-
tion”, Omega, 19/6, 673-678.
Lynch, J.G. (1979), “Why additive utility models fail as descriptions of choice behavior”,
Journal of Experimental Social Psychology, 15, 397-417.
Mangasarian, O.L. (1968), “Multisurface method for pattern separation”, IEEE Transactions
on Information Theory, IT-14/6, 801-807.
Mardia, K.V. (1975), “Assessment of multinormality and the robustness of Hotelling’s T²
test”, Applied Statistics, 24, 163-171.
Markowitz, H. (1952), “Portfolio selection”, Journal of Finance, 7/1, 77-91.
Markowitz, H. (1959), Portfolio Selection: Efficient Diversification of Investments, John
Wiley and Sons, New York.
Markowski, C.A. (1990), “On the balancing of error rates for LP discriminant methods”,
Managerial and Decision Economics, 11, 235-241.
Markowski, E.P. and Markowski, C.A. (1985), “Some difficulties and improvements in ap-
plying linear programming formulations to the discriminant problem”, Decision Sci-
ences, 16, 237-247.
Markowski, C.A. and Markowski, E.P. (1987), “An experimental comparison of several ap-
proaches to the discriminant problem with both qualitative and quantitative variables”,
European Journal of Operational Research, 28, 74-78.
Martel, J.M., Khoury, N.T. and Bergeron, M. (1988), “An application of a multicriteria ap-
proach to portfolio comparisons”, Journal of the Operational Research Society, 39/7,
617-628.
Martin, D. (1977), “Early warning of bank failure: A logit regression approach”, Journal of
Banking and Finance, 1, 249-276.
Massaglia, M. and Ostanello, A. (1991), “N-TOMIC: A decision support for multicriteria
segmentation problems”, in: P. Korhonen (ed.), International Workshop on Multicriteria
Decision Support, Lecture Notes in Economics and Mathematical Systems 356, Springer-
Verlag, Berlin, 167-174.
Matsatsinis, N.F., Doumpos, M. and Zopounidis, C. (1997), “Knowledge acquisition and repre-
sentation for expert systems in the field of financial analysis”, Expert Systems with Applica-
tions, 12/2, 247-262.
McFadden, D. (1974), “Conditional logit analysis in qualitative choice behavior”, in: P. Za-
rembka (ed.), Frontiers in Econometrics, Academic Press, New York.
McFadden, D. (1980), “Structural discrete probability models derived from the theories of
choice”, in: C.F. Manski and D. McFadden (eds.), Structural Analysis of Discrete Data
with Econometric Applications, MIT Press, Cambridge, Mass.
McLachlan, G. J. (1992), Discriminant Analysis and Statistical Pattern Recognition, Wiley,
New York.
Messier, W.F. and Hansen, J.V. (1988), “Inducing rules for expert system development: An
example using default and bankruptcy data”, Management Science, 34/12, 1403-1415.
Michalski, R.S. (1969), “On the quasi-minimal solution of the general covering problem”,
Proceedings of the 5th International Federation on Automatic Control, Vol. 27, 109-129.
Mienko, R., Stefanowski, J., Toumi, K. and Vanderpooten, D. (1996), “Discovery-oriented
induction of decision rules”, Cahier du LAMSADE, No 141, Université de Paris-
Dauphine, Paris.
Moody’s Investors Service (1998), Moody’s Equity Fund Analyzer (MFA): An Analytical
Tool to Assess the Performance and Risk Characteristics of Equity Mutual Funds,
Moody’s Investors Service, New York.
Moody’s Investors Service (1999), Moody’s Sovereign Ratings: A Ratings Guide, Moody’s
Investors Service, New York.
Moody’s Investors Service (2000), Moody’s Three Point Plot: A New Approach to Mapping
Equity Fund Returns, Moody’s Investors Service, New York.
Moore, D.H. (1973), “Evaluation of five discriminant procedures for binary variables”, Jour-
nal of the American Statistical Association, 68, 399-404.
Mousseau, V. and Slowinski, R. (1998), “Inferring an ELECTRE-TRI model from assignment
examples”, Journal of Global Optimization, 12/2, 157-174.
Mousseau, V., Slowinski, R. and Zielniewicz, P. (2000), “A user-oriented implementation of
the ELECTRE-TRI method integrating preference elicitation support”, Computers and
Operations Research, 27/7-8, 757-777.
Murphy, J. (1999), Technical Analysis of the Financial Markets: A Comprehensive Guide to
Trading Methods and Applications, Prentice Hall Press, New Jersey.
Nakayama, H. and Kagaku, N. (1998), “Pattern classification by linear goal programming and
its extensions”, Journal of Global Optimization, 12/2, 111-126.
Nakayama, H., Takeguchi, T. and Sano, M. (1983), “Interactive graphics for portfolio selec-
tion”, in: P. Hansen (ed.), Essays and Surveys on Multiple Criteria Decision Making,
Lecture Notes in Economics and Mathematical Systems 209, Springer-Verlag, Berlin-
Heidelberg, 280-289.
Nieddu, L. and Patrizi, G. (2000), “Formal methods in pattern recognition: A review”, Euro-
pean Journal of Operational Research, 120, 459-495.
Oh, S. and Pedrycz, W. (2000), “Identification of fuzzy systems by means of an auto-tuning
algorithm and its application to nonlinear systems”, Fuzzy Sets and Systems, 115, 205-
230.
Ohlson, J.A. (1980), “Financial ratios and the probabilistic prediction of bankruptcy”, Journal
of Accounting Research, 18, 109–131.
Oral, M. and Kettani, O. (1989), “Modelling the process of multiattribute choice”, Journal of
the Operational Research Society, 40/3, 281-291.
Östermark, R. and Höglund, R. (1998), “Addressing the multigroup discriminant problem
using multivariate statistics and mathematical programming”, European Journal of Op-
erational Research, 108, 224-237.
Pardalos, P.M., Sandström, M. and Zopounidis, C. (1994), “On the use of optimization mod-
els for portfolio selection: A review and some computational results”, Computational
Economics, 7/4, 227-244.
Pardalos, P.M., Siskos, Y. and Zopounidis, C. (1995), Advances in Multicriteria Analysis,
Kluwer Academic Publishers, Dordrecht.
Pareto, V. (1896), Cours d’Economie Politique, Lausanne.
Patuwo, E., Hu, M.Y. and Hung, M.S. (1993), “Two-group classification using neural net-
works”, Decision Sciences, 24, 825-845.
Pawlak, Z. (1982), “Rough sets”, International Journal of Computer and Information Sci-
ences, 11, 341-356.
Pawlak, Z. (1991), Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Aca-
demic Publishers, Dordrecht.
Pawlak, Z. and Slowinski, R. (1994), “Rough set approach to multi-attribute decision analy-
sis”, European Journal of Operational Research, 72, 443-459.
Peel, M.J. (1987), “Timeliness of private company reports predicting corporate failure”, In-
vestment Analysis, 83, 23-27.
Perny, P. (1998), “Multicriteria filtering methods based on concordance and non-discordance
principles”, Annals of Operations Research, 80, 137-165.
Platt, H.D. and Platt, M.B. (1990), “Development of a class of stable predictive variables: The
case of bankruptcy prediction”, Journal of Business Finance and Accounting, 17/1, 31–
51.
Press, S.J. and Wilson, S. (1978), “Choosing between logistic regression and discriminant
analysis”, Journal of the American Statistical Association, 73, 699-705.
Quinlan, J.R. (1983), “Learning efficient classification procedures and their application to
chess end games”, in: R.S. Michalski, J.G. Carbonell and T.M. Mitchell (eds.), Machine
Learning: An Artificial Intelligence Approach, Tioga Publishing Company, Palo Alto,
CA.
Quinlan, J.R. (1986), “Induction of decision trees”, Machine Learning, 1, 81-106.
Quinlan, J.R. (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers,
Los Altos, California.
Ragsdale, C.T. and Stam, A. (1991), “Mathematical programming formulations for the dis-
criminant problem: An old dog does new tricks”, Decision Sciences, 22, 296-307.
Rios-Garcia, S. and Rios-Insua, S. (1983), “The portfolio problem with multiattributes and
multiple criteria”, in: P. Hansen (ed.), Essays and Surveys on Multiple Criteria Decision
Making, Lecture Notes in Economics and Mathematical Systems 209, Springer-Verlag,
Berlin-Heidelberg, 317-325.
Ripley, B.D. (1996), Pattern Recognition and Neural Networks, Cambridge University Press,
Cambridge.
Rose, P.S., Andrews, W.T. and Giroux, G.A. (1982), “Predicting business failure: A macro-
economic perspective”, Journal of Accounting, Auditing and Finance, 6/1, 20-31.
Ross, S. (1976), “The arbitrage theory of capital asset pricing”, Journal of Economic Theory,
13, 343-362.
Roy, B. (1968), “Classement et choix en présence de points de vue multiples: La méthode
ELECTRE”, R.I.R.O., 8, 57-75.
Roy, B. (1985), Méthodologie Multicritère d’Aide à la Décision, Economica, Paris.
Roy, B. (1991), “The outranking approach and the foundations of ELECTRE methods”, The-
ory and Decision, 31, 49-73.
Roy, B. and Vincke, Ph. (1981), “Multicriteria analysis: Survey and new directions”, Euro-
pean Journal of Operational Research, 8, 207-218.
Roy, B. and Bouyssou, D. (1986), “Comparison of two decision-aid models applied to a nu-
clear power plant siting example”, European Journal of Operational Research, 25, 200-
215.
Rubin, P.A. (1990a), “Heuristic solution procedures for a mixed–integer programming dis-
criminant model”, Managerial and Decision Economics, 11, 255–266.
Rubin, P.A. (1990b), “A comparison of linear programming and parametric approaches to the
two–group discriminant problem”, Decision Sciences, 21, 373–386.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986), “Learning internal representations
by error propagation”, in: D.E. Rumelhart and J.L. McClelland (eds.), Parallel Distributed
Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge,
Mass.
Saaty, T.L. (1980), The Analytic Hierarchy Process, McGraw-Hill, New York.
Saaty, T.L., Rogers, P.C. and Pell, R. (1980), “Portfolio selection through hierarchies”, The
Journal of Portfolio Management, Spring, 16-21.
Scapens, R.W., Ryan, R.J. and Fletcher, L. (1981), “Explaining corporate failure: A ca-
tastrophe theory approach”, Journal of Business Finance and Accounting, 8/1, 1-26.
Schoner, B. and Wedley, W.C. (1989), “Ambiguous criteria weights in AHP: Consequences
and solutions”, Decision Sciences, 20, 462-475.
Schoner, B. and Wedley, W.C. (1993), “A unified approach to AHP with linking pins”, Euro-
pean Journal of Operational Research, 64, 384-392.
Sharpe, W. (1964), “Capital asset prices: A theory of market equilibrium under conditions of
risk”, Journal of Finance, 19, 425-442.
Sharpe, W. (1998), “Morningstar’s risk-adjusted ratings”, Financial Analysts Journal,
July/August, 21-33.
Shen, L., Tay, F.E.H., Qu, L. and Shen, Y. (2000), “Fault diagnosis using rough sets theory”,
Computers in Industry, 43, 61-72.
Siskos, J. (1982), “A way to deal with fuzzy preferences in multicriteria decision problems”,
European Journal of Operational Research, 10, 314-324.
Siskos, J. and Despotis, D.K. (1989), “A DSS oriented method for multiobjective linear pro-
gramming problems”, Decision Support Systems, 5, 47-55.
Siskos, Y. and Yannacopoulos, D. (1985), “UTASTAR: An ordinal regression method for
building additive value functions”, Investigação Operacional, 5/1, 39-53.
Siskos, J., Lochard, J. and Lombardo, J. (1984a), “A multicriteria decision-making methodol-
ogy under fuzziness: Application to the evaluation of radiological protection in nuclear
power plants”, in: H.J. Zimmermann, L.A. Zadeh, B.R. Gaines (eds.), Fuzzy Sets and
Decision Analysis, North-Holland, Amsterdam, 261-283.
Siskos, J., Wascher, G. and Winkels, H.M. (1984b), “Outranking approaches versus MAUT
in MCDM”, European Journal of Operational Research, 16, 270-271.
Siskos, Y., Grigoroudis, E., Zopounidis, C. and Saurais, O. (1998), “Measuring customer satis-
faction using a survey based preference disaggregation model”, Journal of Global Optimi-
zation, 12/2, 175-195.
Skogsvik, K. (1990), “Current cost accounting ratios as predictors of business failure: The
Swedish case”, Journal of Business Finance and Accounting, 17/1, 137-160.
Skowron, A. (1993), “Boolean reasoning for decision rules generation”, in: J. Komorowski
and Z. W. Ras (eds.), Methodologies for Intelligent Systems, Lecture Notes in Artificial
Intelligence vol. 689, Springer-Verlag, Berlin, 295–305.
Slowinski, R. (1993), “Rough set learning of preferential attitude in multi-criteria decision
making”, in: J. Komorowski and Z. W. Ras (eds.), Methodologies for Intelligent Sys-
tems, Lecture Notes in Artificial Intelligence vol. 689, Springer-Verlag, Berlin, 642-651.
Slowinski, R. and Stefanowski, J. (1992), “RoughDAS and RoughClass software implemen-
tations of the rough sets approach”, in: R. Slowinski (ed.), Intelligent Decision Support:
Handbook of Applications and Advances of the Rough Sets Theory, Kluwer Academic
Publishers, Dordrecht, 445-456.
Slowinski, R. and Stefanowski, J. (1994), “Rough classification with valued closeness rela-
tion”, in: E. Diday et al. (eds.), New Approaches in Classification and Data Analysis,
Springer-Verlag, Berlin, 482–488.
Slowinski, R. and Zopounidis, C. (1995), “Application of the rough set approach to evaluation
of bankruptcy risk”, International Journal of Intelligent Systems in Accounting, Finance
and Management, 4, 27–41.
Smith, C. (1947), “Some examples of discrimination”, Annals of Eugenics, 13, 272-282.
Smith, K.V. (1965), “Classification of investment securities using multiple discriminant
analysis”, Institute Paper No. 101, Institute for Research in the Behavioral, Economic
and Management Sciences, Purdue University.
Smith, F.W. (1968), “Pattern classifier design by linear programming”, IEEE Transactions on
Computers, C-17/4, 367-372.
Spronk, J. (1981), Interactive Multiple Goal Programming: Applications to Financial Planning,
Martinus Nijhoff Publishing, Boston.
Spronk, J. and Hallerbach, W. (1997), “Financial modeling: Where to go? With an illustration
for portfolio management”, European Journal of Operational Research, 99, 113-125.
Srinivasan, V. and Kim, Y.H. (1987), “Credit granting: A comparative analysis of classifica-
tion procedures”, Journal of Finance, XLII/3, 665–683.
Srinivasan, V. and Ruparel, B. (1990), “CGX: An expert support system for credit granting”,
European Journal of Operational Research, 45, 293-308.
Srinivasan, V. and Shocker, A.D. (1973), “Linear programming techniques for multidimen-
sional analysis of preferences”, Psychometrika, 38/3, 337-369.
Stam, A. (1990), “Extensions of mathematical programming-based classification rules: A
multicriteria approach”, European Journal of Operational Research, 48, 351-361.
Stam, A. and Joachimsthaler, E.A. (1989), “Solving the classification problem via linear and
nonlinear programming methods”, Decision Sciences, 20, 285–293.
Standard & Poor’s Rating Services (1997), International Managed Funds: Profiles, Criteria,
Related Analytics, Standard & Poor’s, New York.
Standard & Poor’s Rating Services (2000), Money Market Fund Criteria, Standard & Poor’s,
New York.
Stefanowski, J. and Vanderpooten, D. (1994), “A general two-stage approach to inducing
rules from examples”, in: W. Ziarko (ed.), Rough Sets, Fuzzy Sets and Knowledge Dis-
covery, Springer-Verlag, London, 317–325.
Steiner, M. and Wittkemper, H.G. (1997), “Portfolio optimization with a neural network im-
plementation of the coherent market hypothesis”, European Journal of Operational Re-
search, 100, 27-40.
Steuer, R.E. and Choo, E.U. (1983), “An interactive weighted Tchebycheff procedure for
multiple objective programming”, Mathematical Programming, 26/1, 326-344.
Stewart, T.J. (1993), “Use of piecewise linear value functions in interactive multicriteria deci-
sion support: A Monte Carlo study”, Management Science, 39, 1369-1381.
Stewart, T.J. (1996), “Robustness of additive value function methods in MCDM”, Journal of
Multi-Criteria Decision Analysis, 5, 301-309.
Subrahmaniam, K. and Chinganda, E.F. (1978), “Robustness of the linear discriminant func-
tion to nonnormality: Edgeworth series”, Journal of Statistical Planning and Inference,
2, 79-91.
Subramanian, V., Hung, M.S. and Hu, M.Y. (1993), “An experimental evaluation of neural
networks for classification”, Computers and Operations Research, 20/7, 769-782.
Szala, A. (1990), L’Aide à la Décision en Gestion de Portefeuille, Diplôme Supérieur de Re-
cherches Appliquées, Université de Paris Dauphine.
Tam, K.Y., Kiang, M.Y. and Chi, R.T.H. (1991), “Inducing stock screening rules for portfolio
construction”, Journal of the Operational Research Society, 42/9, 747-757.
Tamiz, M., Hasham, R. and Jones, D.F. (1997), “A comparison between goal programming
and regression analysis for portfolio selection”, in: G. Fandel and Th. Gal (eds.), Lecture
Notes in Economics and Mathematical Systems 448, Multiple Criteria Decision Making,
Proceedings of the Twelfth International Conference, Hagen, Germany, Springer-Verlag,
Berlin-Heidelberg, 422-432.
Tessmer, A.C. (1997), “What to learn from near misses: An inductive learning approach to
credit risk assessment”, Decision Sciences, 28/1, 105-120.
Theodossiou, P. (1987), Corporate Failure Prediction Models for the US Manufacturing and
Retailing Sectors, Unpublished Ph.D. Thesis, City University of New York.
Theodossiou, P. (1991), “Alternative models for assessing the financial condition of business
in Greece”, Journal of Business Finance and Accounting, 18/5, 697–720.
Theodossiou, P., Kahya, E., Saidi, R. and Philippatos, G. (1996), “Financial distress and cor-
porate acquisitions: Further empirical evidence”, Journal of Business Finance and Ac-
counting, 23/5–6, 699–719.
Trippi, R.R. and Turban, E. (1996), Neural Networks in Finance and Investing, Irwin, Chi-
cago.
Tsumoto, S. (1998), “Automated extraction of medical expert system rules from clinical data-
bases based on rough set theory”, Information Sciences, 112, 67-84.
Vale, D.C. and Maurelli, V.A. (1983), “Simulating multivariate nonnormal distributions”,
Psychometrika, 48/3, 465-471.
Vargas, L.G. (1990), “An overview of the AHP and its applications”, European Journal of
Operational Research, 48, 2-8.
Von Altrock, C. (1996), Fuzzy Logic and Neurofuzzy Applications in Business and Finance,
Prentice Hall, New Jersey.
Von Neumann, J. and Morgenstern, O. (1944), Theory of Games and Economic Behavior,
Princeton University Press, Princeton, New Jersey.
Wagner, H.M. (1959), “Linear programming techniques for regression analysis”, Journal of
the American Statistical Association, 54, 206-212.
White, R. (1975), “A multivariate analysis of common stock quality ratings”, Financial Man-
agement Association Meetings.
Wierzbicki, A.P. (1980), “The use of reference objectives in multiobjective optimization”, in:
G. Fandel and T. Gal (eds.), Multiple Criteria Decision Making: Theory and Applica-
tions, Lecture Notes in Economics and Mathematical Systems 177, Springer-Verlag,
Berlin-Heidelberg, 468-486.
Wilson, J.M. (1996), “Integer programming formulation of statistical classification prob-
lems”, Omega, 24/6, 681-688.
Wilson, R.L. and Sharda, R. (1994), “Bankruptcy prediction using neural networks”, Decision
Support Systems, 11, 545-557.
Wong, F.S., Wang, P.Z., Goh, T.H. and Quek, B.K. (1992), “Fuzzy neural systems for stock
selection”, Financial Analysts Journal, January/February, 47-52.
Wood, D. and Dasgupta, B. (1996), “Classifying trend movements in the MSCI U.S.A. capital
market index: A comparison of regression, ARIMA and neural network methods”, Com-
puters and Operations Research, 23/6, 611-622.
Yager, R.R. (1977), “Multiple objective decision-making using fuzzy sets”, International
Journal of Man-Machine Studies, 9, 375-382.
Yandell, B.S. (1997), Practical Data Analysis for Designed Experiments, Chapman & Hall,
London.
36. G. Di Pillo and F. Giannessi (eds.): Nonlinear Optimization and Related Topics. 2000
ISBN 0-7923-6109-1
37. V. Tsurkov: Hierarchical Optimization and Mathematical Physics. 2000
ISBN 0-7923-6175-X
38. C. Zopounidis and M. Doumpos: Intelligent Decision Aiding Systems Based on
Multiple Criteria for Financial Engineering. 2000 ISBN 0-7923-6273-X
39. X. Yang, A.I. Mees, M. Fisher and L. Jennings (eds.): Progress in Optimization.
Contributions from Australasia. 2000 ISBN 0-7923-6286-1
40. D. Butnariu and A.N. Iusem: Totally Convex Functions for Fixed Points Computation
and Infinite Dimensional Optimization. 2000 ISBN 0-7923-6287-X
41. J. Mockus: A Set of Examples of Global and Discrete Optimization. Applications of
Bayesian Heuristic Approach. 2000 ISBN 0-7923-6359-0
42. H. Neunzert and A.H. Siddiqi: Topics in Industrial Mathematics. Case Studies and
Related Mathematical Methods. 2000 ISBN 0-7923-6417-1
43. K. Kogan and E. Khmelnitsky: Scheduling: Control-Based Theory and Polynomial-
Time Algorithms. 2000 ISBN 0-7923-6486-4
44. E. Triantaphyllou: Multi-Criteria Decision Making Methods. A Comparative Study.
2000 ISBN 0-7923-6607-7
45. S.H. Zanakis, G. Doukidis and C. Zopounidis (eds.): Decision Making: Recent Devel-
opments and Worldwide Applications. 2000 ISBN 0-7923-6621-2
46. G.E. Stavroulakis: Inverse and Crack Identification Problems in Engineering Mech-
anics. 2000 ISBN 0-7923-6690-5
47. A. Rubinov and B. Glover (eds.): Optimization and Related Topics. 2001
ISBN 0-7923-6732-4
48. M. Pursula and J. Niittymäki (eds.): Mathematical Methods on Optimization in Trans-
portation Systems. 2000 ISBN 0-7923-6774-X
49. E. Cascetta: Transportation Systems Engineering: Theory and Methods. 2001
ISBN 0-7923-6792-8
50. M.C. Ferris, O.L. Mangasarian and J.-S. Pang (eds.): Complementarity: Applications,
Algorithms and Extensions. 2001 ISBN 0-7923-6816-9
51. V. Tsurkov: Large-scale Optimization – Problems and Methods. 2001
ISBN 0-7923-6817-7
52. X. Yang, K.L. Teo and L. Caccetta (eds.): Optimization Methods and Applications.
2001 ISBN 0-7923-6866-5
53. S.M. Stefanov: Separable Programming Theory and Methods. 2001
ISBN 0-7923-6882-7
54. S.P. Uryasev and P.M. Pardalos (eds.): Stochastic Optimization: Algorithms and
Applications. 2001 ISBN 0-7923-6951-3
55. J. Gil-Aluja (ed.): Handbook of Management under Uncertainty. 2001
ISBN 0-7923-7025-2
56. B.-N. Vo, A. Cantoni and K.L. Teo: Filter Design with Time Domain Mask Con-
straints: Theory and Applications. 2001 ISBN 0-7923-7138-0
57. S. Zlobec: Stable Parametric Programming. 2001 ISBN 0-7923-7139-9
58. M.G. Nicholls, S. Clarke and B. Lehaney (eds.): Mixed-Mode Modelling: Mixing
Methodologies for Organisational Intervention. 2001 ISBN 0-7923-7151-8
59. F. Giannessi, P.M. Pardalos and T. Rapcsák (eds.): Optimization Theory. Recent
Developments from Mátraháza. 2001 ISBN 1-4020-0009-X
60. K.M. Hangos, R. Lakner and M. Gerzson: Intelligent Control Systems. An Introduc-
tion with Examples. 2001 ISBN 1-4020-0134-7
61. D. Gstach: Estimating Output-Specific Efficiencies. 2002 ISBN 1-4020-0483-4
62. J. Geunes, P.M. Pardalos and H.E. Romeijn (eds.): Supply Chain Management:
Models, Applications, and Research Directions. 2002 ISBN 1-4020-0487-7
63. M. Gendreau and P. Marcotte (eds.): Transportation and Network Analysis: Current
Trends. Miscellanea in Honor of Michael Florian. 2002 ISBN 1-4020-0488-5
64. M. Patriksson and M. Labbé (eds.): Transportation Planning. State of the Art. 2002
ISBN 1-4020-0546-6
65. E. de Klerk: Aspects of Semidefinite Programming. Interior Point Algorithms and
Selected Applications. 2002 ISBN 1-4020-0547-4
66. R. Murphey and P.M. Pardalos (eds.): Cooperative Control and Optimization. 2002
ISBN 1-4020-0549-0
67. R. Corrêa, I. Dutra, M. Fiallos and F. Gomes (eds.): Models for Parallel and Distri-
buted Computation. Theory, Algorithmic Techniques and Applications. 2002
ISBN 1-4020-0623-3
68. G. Cristescu and L. Lupsa: Non-Connected Convexities and Applications. 2002
ISBN 1-4020-0624-1
69. S.I. Lyashko: Generalized Optimal Control of Linear Systems with Distributed Para-
meters. 2002 ISBN 1-4020-0625-X
70. P.M. Pardalos and V.K. Tsitsiringos (eds.): Financial Engineering, E-commerce and
Supply Chain. 2002 ISBN 1-4020-0640-3
71. P.S. Knopov and E.J. Kasitskaya: Empirical Estimates in Stochastic Optimization
and Identification. 2002 ISBN 1-4020-0707-8
KLUWER ACADEMIC PUBLISHERS – DORDRECHT / BOSTON / LONDON