Professional Documents
Culture Documents
Applied Soft Computing: Magdalene Marinaki, Yannis Marinakis, Constantin Zopounidis
Applied Soft Computing: Magdalene Marinaki, Yannis Marinakis, Constantin Zopounidis
Applied Soft Computing: Magdalene Marinaki, Yannis Marinakis, Constantin Zopounidis
Technical University of Crete, Department of Production Engineering and Management, Industrial Systems Control Laboratory, 73100 Chania, Greece
Technical University of Crete, Department of Production Engineering and Management, Decision Support Systems Laboratory, 73100 Chania, Greece
c
Technical University of Crete, Department of Production Engineering and Management, Financial Engineering Laboratory, University Campus, 73100 Chania, Greece
b
A R T I C L E I N F O
A B S T R A C T
Article history:
Received 2 December 2008
Received in revised form 8 September 2009
Accepted 10 September 2009
Available online 17 September 2009
Nature inspired methods are approaches that are used in various elds and for the solution for a number
of problems. This study uses a nature inspired method, namely Honey Bees Mating Optimization, that is
based on the mating behaviour of honey bees for a nancial classication problem. Financial decisions
are often based on classication models which are used to assign a set of observations into predened
groups. One important step towards the development of accurate nancial classication models involves
the selection of the appropriate independent variables (features) which are relevant for the problem at
hand. The proposed method uses for the feature selection step, the Honey Bees Mating Optimization
algorithm while for the classication step, Nearest Neighbor based classiers are used. The performance
of the method is tested in a nancial classication task involving credit risk assessment. The results of the
proposed method are compared with the results of a particle swarm optimization algorithm, an ant
colony optimization, a genetic algorithm and a tabu search algorithm.
2009 Elsevier B.V. All rights reserved.
Keywords:
Honey Bees Mating Optimization
Metaheuristics
Feature selection problem
Financial classication problem
1. Introduction
Several biological and natural processes have been inuencing
the methodologies in science and technology in an increasing
manner in the past years. Feedback control processes, articial
neurons, the DNA molecule description and similar genomics
matters, studies of the behaviour of natural immunological
systems, and more, represent some of the very successful domains
of this kind in a variety of real world applications. During the last
decade, nature inspired intelligence becomes increasingly popular
through the development and utilization of intelligent paradigms
in advanced information systems design. Cross-disciplinary teambased thinking attempts to cross-fertilize engineering and life
science understanding into advanced inter-operable systems. The
methods contribute to technological advances driven by concepts
from nature/biology including advances in structural genomics
(intelligent drug design through imprecise data bases), mapping of
genes to proteins and proteins to genes (one-to-many and manyto-one characteristics of naturally occurring organisms), modelling
of complete cell structures (showing modularity and hierarchy),
functional genomics (handling of hybrid sources and heterogeneous and inconsistent origins of disparate databases), self-
* Corresponding author. Tel.: +30 28210 37236; fax: +30 28210 69410.
E-mail addresses: magda@dssl.tuc.gr (M. Marinaki), marinakis@ergasya.tuc.gr
(Y. Marinakis), kostas@dpem.tuc.gr (C. Zopounidis).
1568-4946/$ see front matter 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.asoc.2009.09.010
807
(genotype) of the one queen and more than one drones. In our
proposed algorithm except of this classic procedure that has
been used from the researchers in the previous published
algorithms based on Honey Bees Mating Optimization [1418],
we use also an adaptive memory procedure in order the queen
to have the possibility to store from previous selected good
drones (in previous mating ights) part of their solutions in
order to use them in a new mating ight and to produce more
ttest drones. Another difference from a classic memetic
algorithm is the role of the workers. Someone could say that
since the role of the workers is simply the brood care and they
are only a local search phase in the algorithm, then we have one
of the basic characteristics of the memetic algorithms (a genetic
algorithm with a local search phase [19]). But here we have a
strict parallelism of the local search phase with what happens in
real life, i.e. with the foraging behaviour of the honey bees. We
mean that each one of the workers, which are different honey
bees, takes care one brood in order to nd for him food and feed
him with the royal jelly in order to make him ttest and if this
brood is better than the queen to take her place. And thus, this
algorithm combines both the mating process of the queen and
one part of the foraging behaviour of the honey bees inside the
hive.
This paper presents a novel approach to solve the Feature
Subset Selection Problem using Honey Bees Mating Optimization
Algorithm. In the classication phase of the proposed algorithm,
a number of variants of the Nearest Neighbor classication
method are used [20]. The algorithm is used for the credit risk
assessment classication task that is a very challenging and
important management science problem in the domain of
nancial analysis [21]. Modern nance is a broad eld often
involved with hard decision-making problems related to risk
management. In several cases, nancial decision-making problems require the assignment of the available options into
predened groups/classes. Credit risk analysis, bankruptcy
prediction, and country risk assessment, among other are some
typical examples [22]. In this context the development of reliable
classication models is clearly of major importance to researchers and practitioners. The development of nancial classication
models is a complicated process, involving careful data collection
and pre-processing, model development, validation and implementation. Focusing on model development, several methods
have been used, including statistical methods, articial intelligence techniques and operations research methodologies. In all
cases, the quality of the data is a fundamental point. This is
mainly related to the adequacy of the sample data in terms of the
number of observation and the relevance of the decision
attributes (i.e., independent variables) used in the analysis.
During the last years the application of nature inspired methods
to nancial problems have been developed [2328]. More
precisely, in [24] a procedure that utilizes a genetic algorithm
in order to solve the Feature Subset Selection Problem is
presented and is combined with a number of Nearest Neighbor
based classiers. The genetic based classication algorithm is
applied for the solution of the credit risk assessment classication problem. In [25], a memetic algorithm, which is based on the
concepts of genetic algorithms and particle swarm optimization
is presented. Contrary to genetic algorithms, in this algorithm the
evolution of each individual of the population is performed using
a particle swarm optimization algorithm. The memetic-based
classication algorithm is combined with a number of nearest
neighbor based classiers and is tested in a very signicant
nancial classication task, involving the identication of
qualied audit reports. In [26] a tahu search metaheuristic
combined with a number of Nearest Neighbor classiers is
applied for the solution of the credit risk assessment problem
808
l1
i
k
X
i
(2)
(3)
(4)
OCA 100
i1
i1
C X
C
X
ci j
(5)
i1 j1
809
(6)
(7)
energyt 1 a energyt
(8)
810
numbers are selected (Cr 1 and Cr 2 ) that control the fraction of the
parameters that are selected for the adaptive memory, the selected
drones and the queen. If there are common parts in the solutions
(queen, drones and adaptive memory) then these common parts
are inherited to the brood, else the Cr 1 and Cr2 values are compared
with the output of a random number generator, randi 0; 1. If the
random number is less or equal to the Cr 1 the corresponding value
is inherited from the queen, if the random number is between the
Cr 1 and the Cr 2 then the corresponding value is inherited,
randomly, from the one of the elite solutions that are in the
adaptive memory, otherwise it is selected, randomly, from the
solutions of the drones that are stored in spermatheca. Thus, if the
solution of the brood is denoted by bi t (t is the iteration number),
the solution of the queen is denoted by qi t, the solution in the
adaptive memory is denoted by adi t and the solution of the drone
by di t:
8
if randi 0; 1 Cr 1
< qi t;
(9)
bi t adi t;
if Cr 1 < randi 0; 1 Cr 2
:
di t;
otherwise:
In each iteration the adaptive memory is updated based on the best
solution. In real life, the role of the workers is restricted to brood
care and for this reason the workers are not separate members of
the population but they are used as local search procedures in
order to improve the broods produced by the mating ight of the
queen. Each of the workers have different capabilities and the
choice of two different workers may produce different solutions.
Each of the worker have the possibility to activate or deactivate a
number of different features. Each of the brood will choose,
randomly, one worker to feed it with royal jelly (local search phase)
having as a result the possibility of replacing the queen if the
solution of the brood is better than the solution of the current
queen. If the brood fails to replace the queen, then in the next
mating ight of the queen this brood will be one of the drones. The
number of drones remains constant in each iteration and the
population of drones consists of the ttest drones of all
generations.
4. Application
The nature inspired algorithm is applied to a nancial
classication problem which is related to credit risk assessment.
The data, taken from Doumpos and Pasiouras [46] involve 1330
rm-year observations for UK non-nancial rms, over the period
19992001. The sample observations are classied into ve risk
groups according to their level of likelihood of default, measured
on the basis of their QuiScore, a credit rating assigned by Qui Credit
Assessment Ltd. In particular, on the basis of their QuiScore, the
rms are classied into ve risk groups as follows:
Secure group, consisting of rms for which failure is very unusual
and normally occurs only as a result of exceptional market
changes.
Stable group, consisting of rms for which failure is rare and will
only come about if there are major company or market changes.
Normal group, consisting of rms that do not fail, as well as some
that fail.
Unstable (caution) group, consisting of rms that have a
signicant risk of failure.
High-risk group, consisting of rms which are unlikely to be able
to continue their operation unless signicant remedial action is
undertaken.
Such multi-group credit risk rating systems are widely used in
practice, mainly by banking institutions. According to Treacy and
Carey [47] the internal reporting process in banking institutions is
Table 1
List of nancial ratios.
Feature number
Financial ratios
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Current ratio
Quick ratio
Shareholders liquidity ratio
Solvency ratio
Asset cover
Annual change in xed assets
Annual change in current assets
Annual change in stock
Annual change in debtors
Annual change in total assets
Annual change in current liabilities
Annual change creditors
Annual change in loans/overdraft
Annual change in long-term liabilities
Net prot margin
Return on total assets
Interest cover
Stock turnover
Debtors turnover
Debtors collection (days)
Creditors payment (days)
Fixed assets turnover
Salaries/turnover
Gross prot margin
Earning Before Interest and Taxes (EBIT) margin
Earning Before Interest and Taxes, Depreciation
and Amortization (EBITDA) margin
811
Table 2
Classication results.
Methods
Overall classication
accuracy (%)
Average no.
of features
HBMO 1-nn
HBMO k-nn
HBMO wk-nn
ACO 1-nn
ACO k-nn
ACO wk-nn
PSO 1-nn
PSO k-nn
PSO wk-nn
Gen 1-nn
Gen k-nn
Gen wk-nn
Tabu 1-nn
Tabu 5-nn
Tabu w5-nn
1-nn
75.21
74.18
73.52
72.18
72.85
71.72
72.10
73.30
71.05
68.12
69.84
69.02
67.74
66.09
67.36
55.78
10.2
10.3
10.6
11.2
12
11.8
11.3
10.4
11.7
12.4
13
11.2
12.2
12
12.4
26
Table 3
Selection frequencies of the features in the optimal feature sets.
Features
Frequency
Relative
frequency
Features
Frequency
Relative
frequency
1
2
3
4
5
6
7
8
9
10
11
12
13
17
19
14
27
15
11
11
14
5
11
6
2
8
(56.66)
(63.33)
(46.66)
(90)
(50)
(36.67)
(36.67)
(46.67)
(16.67)
(36.67)
(20)
(6.67)
(26.67)
14
15
16
17
18
19
20
21
22
23
24
25
26
8
18
13
10
14
10
5
8
13
7
8
18
19
(26.67)
(60)
(43.33)
(33.33)
(46.67)
(33.33)
(16.67)
(26.67)
(43.33)
(23.33)
(26.67)
(60)
(63.33)
812
[20] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classication and Scene Analysis, second
edition, John Wiley and Sons, New York, 2001.
[21] M. Doumpos, K. Kosmidou, G. Baourakis, C. Zopounidis, Credit risk assessment
using a multicriteria hierarchical discrimination approach: A comparative analysis, European Journal of Operational Research 138 (2002) 392412.
[22] M. Doumpos, C. Zopounidis, P.M. Pardalos, Multicriteria sorting methodology:
Application to nancial decision problems, Parallel Algorithms and Applications
15 (12) (2000) 113129.
[23] A. Brabazon, M. ONeill, Biologically Inspired Algorithms for Financial Modelling,
Springer-Verlag, Berlin, 2006.
[24] M. Marinaki, Y. Marinakis, C. Zopounidis, Application of a genetic algorithm for
the credit risk assessment problem, Foundations of Computing and Decisions
Sciences 32 (2) (2007) 139152.
[25] M. Marinaki, Y. Marinakis, C. Zopounidis, A memetic algorithm for the identication of rms receiving qualied audit reports, Journal of Computational Optimization in Economics and Finance 1 (1) (2008) 5971.
[26] Y. Marinakis, M. Marinaki, M. Doumpos, N. Matsatsinis, C. Zopounidis, Optimization of nearest neighbor classiers via metaheuristic algorithms for credit risk
assessment, Journal of Global Optimization 42 (2008) 279293.
[27] Y. Marinakis, M. Marinaki, C. Zopounidis, Application of ant colony optimization
to credit risk assessment, New Mathematics and Natural Computation 4 (1)
(2008) 107122.
[28] L. Yu, S. Wang, K.K. Lai, L. Zhou, Bio-Inspired Credit Risk Analysis Computational
Intelligence with Support Vector Machines, Springer-Verlag, Berlin, 2008.
[29] R. Kohavi, G. John, Wrappers for feature subset selection, Articial Intelligence 97
(1997) 273324.
[30] A. Jain, D. Zongker, Feature selection: evaluation, application, and small sample
performance, IEEE Transactions on Pattern Analysis and Machine Intelligence 19
(1997) 153158.
[31] P.M. Narendra, K. Fukunaga, A branch and bound algorithm for feature subset
selection, IEEE Transactions on Computers 26 (9) (1977) 917922.
[32] D.W. Aha, R.L. Bankert, A comparative evaluation of sequential feature selection
algorithms, in: D. Fisher, J.-H. Lenx (Eds.), Articial Intelligence and Statistics,
Springer-Verlag, New York, 1996.
[33] E. Cantu-Paz, S. Newsam, C. Kamath, Feature selection in scientic application, in:
Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, 2004, pp. 788793.
[34] P. Pudil, J. Novovicova, J. Kittler, Floating search methods in feature selection,
Pattern Recognition Letters 15 (1994) 11191125.
[35] E. Cantu-Paz, Feature subset selection, class separability, and genetic algorithms,
in: Genetic and Evolutionary Computation Conference, 2004, 959970.
[36] K. Kira, L. Rendell, A practical approach to feature selection, in: Proceedings of the
Ninth International Conference on Machine Learning, Aberdeen, Scotland, (1992),
pp. 249256.
[37] W. Siedlecki, J. Sklansky, On automatic feature selection, International Journal of
Pattern Recognition and Articial Intelligence 2 (2) (1988) 197220.
[38] F.G. Lopez, M.G. Torres, B.M. Batista, J.A.M. Perez, J.M. Moreno-Vega, Solving
feature subset selection problem by a parallel scatter search, European Journal of
Operational Research 169 (2006) 477489.
[39] A. Al-Ani, Feature subset selection using ant colony optimization, International
Journal of Computational Intelligence 2 (1) (2005) 5358.
[40] A. Al-Ani, Ant colony optimization for feature subset selection, Transactions on
Engineering Computing and Technology 4 (2005) 3538.
[41] R.S. Parpinelli, H.S. Lopes, A.A. Freitas, An ant colony algorithm for classication
rule discovery, in: H. Abbas, R. Sarker, C. Newton (Eds.), Data Mining: A Heuristic
Approach, Idea Group Publishing, London, UK, 2002, pp. 191208.
[42] P.S. Shelokar, V.K. Jayaraman, B.D. Kulkarni, An ant colony classier system:
application to some process engineering problems, Computers and Chemical
Engineering 28 (2004) 15771584.
[43] C. Zhang, H. Hu, Ant colony optimization combining with mutual information for
feature selection in support vector machines, in: S. Zhang, R. Jarvis (Eds.), AI 2005.
LNAI 3809, 2005, 918921.
[44] W. Siedlecki, J. Sklansky, A note on genetic algorithms for large-scale feature
selection, Pattern Recognition Letters 10 (1989) 335347.
[45] Y. Rochat, E.D. Taillard, Probabilistic diversication and intensication in local
search for vehicle routing, Journal of Heuristics 1 (1995) 147167.
[46] M. Doumpos, F. Pasiouras, Developing and testing models for replicating credit
ratings: a multicriteria approach, Computational Economics 25 (2005) 327341.
[47] W.F. Treacy, M. Cavey, Credit risk rating systems at large US banks, Journal of
Banking and Finance 24 (2000) 167201.