
Adaptive Resonance Theory Applications for Business Forecasting

Anatoli Nachev and Seamus Hill


Business Information Systems
Cairnes School of Business & Economics, NUI Galway, Ireland

1 Introduction
One of the most significant threats for many businesses today, regardless of their size and the nature of their
operation, is insolvency. The economic cost of business failures is significant. The suppliers of capital,
investors, and creditors, as well as the management and employees, are severely affected by business failure.
The need for reliable empirical models that predict corporate failure promptly and accurately is imperative
to enable decision makers to take either preventive or corrective action.
To estimate the potential for insolvency, decision makers usually apply scoring systems that take
into account factors such as leverage, earnings, reputation, and so on. Due to the lack of metrics and
the subjectivity of such estimates, decisions can sometimes be unrealistic and inconsistent [Vellido et al., 1999].
Generally, a prediction of corporate insolvency can be viewed as a pattern recognition problem, and
as such, it can be solved using one of two approaches: structural and empirical. The former derives the
probability of default for a company based on its characteristics and dynamics, while the latter approach
relies on previous knowledge and relationships in that area, learns from existing data or experience, and
deploys statistical or other methods to predict failure.
Kumar and Ravi provided a comprehensive survey of empirical techniques used to predict insolvency,
all grouped into two categories: statistical and intelligent [Kumar & Ravi, 2007]. The most popular statistical
techniques are linear discriminant analysis, multivariate discriminant analysis, quadratic discriminant
analysis, logistic regression, and factor analysis. Among the most common intelligent techniques are
neural networks, decision trees, case-based reasoning, evolutionary approaches, and so on [Kumar & Ravi,
2007]. Traditional statistical techniques have often been criticized because of their assumptions about linear
separability of training data, multivariate normality, and independence of the predictive variables [Lacher
et al., 1995, Vellido et al., 1999]. Such constraints are incompatible with the complex nature, boundaries,
and interrelationships of most of the financial ratios used for learning and prediction. Intelligent techniques
have proven more appropriate for the task. For instance, neural networks do not rely on a priori
assumptions about the distribution of data and work well in unstructured and noisy environments [Vellido et
al., 1999].
Multilayer perceptron is the most common, well-known, and widely used model of supervised neural
networks. Sharda, Wilson, and Odom [Odom & Sharda, 1990, Odom & Sharda, 1993, Sharda, & Wilson,
1993, Wilson & Sharda, 1994] used five financial ratios introduced by Altman [Altman, 1968] and a
multilayer perceptron to predict bankruptcy. They reported significantly better prediction accuracy than statistical
techniques applied to the same data. Rahimian et al. compared the performance of multilayer perceptron,
Athena (an entropy-based neural network), and single-layer perceptron on the bankruptcy prediction problem
using the same Altman's ratios [Rahimian et al., 1993]. Serrano-Cinca also used Altman's ratios and
compared his multilayer perceptron with others’ and some statistical techniques [Serrano-Cinca, 1996]. Bell
et al., Hart, Yoon et al., and Curram and Mingers also compared the classifying power of different statistical
tools and multilayer perceptron [Bell et al., 1990, Hart, 1992, Yoon, 1993, Curram, 1994].
Despite their good performance, multilayer perceptrons have a well-known drawback: their structure is
unclear. The choice of an optimal network architecture, particularly the number of layers and hidden nodes,
is a challenge that has no theoretical answer. A good architecture for an application can be found only by
empirical means and experimentation.
Many other techniques have been used to predict bankruptcy. Salcedo-Sanz et al. proposed genetic
programming applied to data from insurance companies [Salcedo-Sanz et al., 2005]. Their results were
compared with rough sets approaches. Shin, Lee, and Kim used support vector machines for modelling
business failure prediction [Shin & Lee, 2002]. Cielen, Peeters, and Vanhoof suggested the combined use of
linear programming and inductive machine learning [Cielen, 2004].
Our research was motivated by the fact that a class of neural networks – those based on the Adaptive
Resonance Theory – is still unexplored as a tool for bankruptcy prediction. This chapter explores four of
them, namely, fuzzy ARTMAP, distributed ARTMAP, instance counting ARTMAP, and default ARTMAP.
This study uses a dataset that has already been used with other neural network models, particularly multilayer
perceptrons and self-organizing feature maps, which allows us to compare results.
The chapter is organized as follows: Section 2 provides an overview of the neural network architectures
used in this study; Section 3 discusses the research design, dataset, data preprocessing, and techniques of
the analysis; Section 4 presents and discusses the experimental results; and Section 5 gives the conclusions.

2 Neural Networks

There is a variety of neural network models for clustering and classification, ranging from very general
architectures applicable to most learning problems to highly specialized networks that address
specific problems. Each model has a particular topology that determines the layout of the neurons (nodes)
and a specific algorithm to train the network or to recall stored information. Among the models, the most
common is the multilayer perceptron (MLP), which has a feed-forward topology and error-backpropagation
learning algorithm [Rumelhart & McClelland, 1986]. Authors often refer to the MLP simply as a 'neural
network', but this is not quite correct, as there are other members of the large family of neural networks,
including those with recurrent topology, such as the self-organizing feature maps and Adaptive Resonance
Theory networks discussed here [Kohonen, 1997, Grossberg, 1976].

2.1 Neural Networks Based on the Adaptive Resonance Theory


The Adaptive Resonance Theory (ART), introduced by Grossberg in the 1970s, began with the analysis
of human cognitive information processing [Carpenter & Grossberg, 1987, Grossberg, 1976]. It led to the
creation of a family of self-organizing neural networks for fast learning, pattern recognition, and prediction.
Some popular members of the family include both unsupervised models (e.g., ART1, ART2, ART2-A, ART3,
fuzzy ART, distributed ART) and supervised models (e.g., ARTMAP, instance counting ARTMAP, fuzzy
ARTMAP, distributed ARTMAP, and default ARTMAP) [Carpenter, 2003]. The fundamental computational
design goals of these models are memory stability, fast or slow learning mechanisms in an open and
evolving input environment, and implementation by a system of ordinary differential equations approximated
by appropriate techniques [Grossberg, 1976].


A remarkable feature of the ART neural networks is their on-line, one-pass fast learning algorithm.
In contrast, the MLP uses an off-line, slow learning procedure that requires all training
patterns to be available at once to avoid catastrophic forgetting in an open input environment. This adaptiveness makes ART
networks suitable for classification problems in dynamic and evolving domains, whereas MLP is mostly
suited for problems related to static environments.
Many applications of ART networks are classification problems where the trained system tries to predict
a correct category of an input sample [Nachev & Stoyanov, 2007, Nachev, 2008]. In fact, these tasks are
pattern-recognition problems, and classification may be viewed as a many-to-one mapping task that entails
clustering of the input space and the association of the produced clusters with a limited number of class
labels.

2.2 ARTMAP Architecture


ARTMAP is a supervised neural network that consists of two unsupervised ART modules, ARTa and ARTb,
and an inter-ART module called a map field (see Figure 1) [Carpenter et al., 1991a]. An ART module has
three layers of nodes: an input layer F0, a comparison layer F1, and a recognition layer F2. A set of real-valued
weights W_j is associated with the F1-to-F2 connections between nodes. Each F2 node represents a
recognition category that learns a binary prototype vector w_j. The F2 layer is connected through weighted
associative links to a map field F^ab.
Figure 1: Block diagram of an ARTMAP neural network, adapted from [Carpenter et al., 1991a]. The diagram shows the ARTa and ARTb modules (layers F0, F1, and F2, with vigilance parameters ρa and ρb), the inter-ART map field F^ab with weights w^ab_j and vigilance ρab, and the match-tracking connection.

The following algorithm describes the ARTMAP learning [Carpenter et al., 1991a, Granger et al., 2001]:
1. Initialization. Initially, all the F2 nodes are uncommitted, and all weight values are initialized. Values
of network parameters are set.
2. Input pattern coding. When a training pattern is presented to the network as a pair of two components
(a, t), where a is the training sample and t is the class label, a process called complement coding takes
place. It transforms the pattern into a form suited to the network. A network parameter called the vigilance
parameter (ρ) is set to its initial value. This parameter controls the network's 'vigilance', that is, the level
of detail used by the system when it compares the input pattern with the memorized categories.

3. Prototype selection. The input pattern activates layer F1 and propagates to layer F2, which produces
a binary pattern of activity such that only the F2 node with the greatest activation value remains active,
that is, ‘winner-takes-all’. The activated node propagates its signals back onto F1, where a vigilance
test takes place. If the test is passed, then resonance is said to occur. Otherwise, the network inhibits
the active F2 node and searches for another node that passes the vigilance test. If such a node does not
exist, an uncommitted F2 node becomes active and undergoes learning.

4. Class prediction. The class label t activates the F ab layer in which the most active node yields the
class prediction. If that node constitutes an incorrect class prediction, then another search among F2
nodes in Step 3 takes place. This search continues until an uncommitted F2 node becomes active (and
learning directly ensues in Step 5), or a node that has previously learned the correct class prediction
becomes active.

5. Learning. The neural network gradually updates its adaptive weights towards the presented training
patterns until convergence occurs. The learning dynamics can be described by a system of ordinary
differential equations [Carpenter et al., 1991a].
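As a concrete illustration, the five steps above can be sketched in Python. This is a minimal, simplified single-module fuzzy ARTMAP with winner-takes-all activation and fast learning, not the implementation used in this study; the class name, parameter defaults, and the toy two-ratio samples are our own assumptions, and inputs are assumed to be rescaled to [0, 1] before complement coding.

```python
import numpy as np

def complement_code(a):
    """Step 2: complement coding; assumes inputs already rescaled to [0, 1]."""
    a = np.asarray(a, dtype=float)
    return np.concatenate([a, 1.0 - a])

class SimpleFuzzyARTMAP:
    """Toy single-module fuzzy ARTMAP: winner-takes-all, fast learning."""
    def __init__(self, rho=0.5, alpha=0.001, eps=0.001):
        self.rho_base = rho    # baseline vigilance (step 2)
        self.alpha = alpha     # choice parameter
        self.eps = eps         # match-tracking increment (step 4)
        self.W = []            # F2 category prototype vectors
        self.labels = []       # map-field class label per category

    def _choice(self, A):
        # Fuzzy choice function T_j = |A ^ w_j| / (alpha + |w_j|)
        return [np.minimum(A, w).sum() / (self.alpha + w.sum()) for w in self.W]

    def train_one(self, a, label):
        A = complement_code(a)
        rho = self.rho_base
        order = np.argsort(self._choice(A))[::-1] if self.W else []
        for j in order:                               # step 3: prototype selection
            match = np.minimum(A, self.W[j]).sum() / A.sum()
            if match < rho:
                continue                              # vigilance test failed
            if self.labels[j] == label:               # step 4: correct prediction
                self.W[j] = np.minimum(A, self.W[j])  # step 5: fast learning
                return
            rho = match + self.eps                    # match tracking raises rho
        self.W.append(A.copy())                       # commit a new F2 node
        self.labels.append(label)

    def predict(self, a):
        A = complement_code(a)
        return self.labels[int(np.argmax(self._choice(A)))]

net = SimpleFuzzyARTMAP(rho=0.5)
net.train_one([0.1, 0.2], "bankrupt")   # toy two-ratio samples
net.train_one([0.9, 0.8], "solvent")
print(net.predict([0.15, 0.25]))        # bankrupt
```

Note how match tracking (step 4) raises the vigilance just above the failed match value, forcing the search in step 3 to consider finer categories.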

2.3 Fuzzy, Distributed, Instance Counting, and Default ARTMAP Neural Networks
Fuzzy ARTMAP was developed as a natural extension to the ARTMAP architecture [Carpenter et al., 1992].
This is accomplished by using fuzzy ART modules instead of ART1, which replaces the crisp (binary)
logic embedded in the ART1 module with fuzzy logic. Specifically, the intersection operator (∩) that describes the
ART1 dynamics is replaced by the fuzzy AND operator (∧) from fuzzy set theory, (p ∧ q)_i ≡ min(p_i, q_i)
[Carpenter et al., 1991b]. This allows the fuzzy ARTMAP to learn stable categories in response to either
analog or binary patterns in contrast with the basic ARTMAP, which operates with binary patterns only.
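The substitution can be seen in two lines of NumPy: the crisp intersection on binary ART1 patterns generalizes to the component-wise minimum, which also accepts analog patterns. The vector values below are arbitrary illustrations.

```python
import numpy as np

# Crisp intersection on binary ART1 patterns...
p = np.array([1, 0, 1, 1])
q = np.array([1, 1, 0, 1])
print((p & q).tolist())              # [1, 0, 0, 1]

# ...generalizes to the fuzzy AND (component-wise minimum),
# which also handles analog patterns in [0, 1].
x = np.array([0.2, 0.9, 0.5])
w = np.array([0.6, 0.4, 0.5])
print(np.minimum(x, w).tolist())     # [0.2, 0.4, 0.5]
```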
An ART1 module maps the categories into F2 nodes according to the winner-takes-all rule, as discussed
above, but this way of functioning can cause category proliferation in a noisy input environment. The
explanation of this fact is that the system adds more and more F2 category nodes to meet the demands
of predictive accuracy, believing that the noisy patterns are samples of new categories. To address this
drawback, Carpenter introduced a new distributed ART module, which features a number of innovations
such as new distributed instar and outstar learning laws [Carpenter, 1996]. If the ART1 module of the basic
ARTMAP is replaced by a distributed ART module, the resulting network is called distributed ARTMAP
[Carpenter, 1996, Carpenter, 1997]. Some experiments show that a distributed ARTMAP retains the fuzzy
ARTMAP accuracy while significantly reducing the network size [Carpenter, 1997].
Instance Counting (IC) ARTMAP adds to the basic fuzzy ARTMAP system new capabilities designed
to solve computational problems that frequently arise in prediction. One such problem is inconsistent cases,
where identical input vectors correspond to cases with different outcomes. A small modification of the
fuzzy ARTMAP match-tracking search algorithm allows the IC ARTMAP to encode inconsistent cases and
make distributed probability estimates during testing even when training employs fast learning [Carpenter
& Markuzon, 1998]. IC ARTMAP extends the fuzzy ARTMAP, thus giving good performance on various
problems.
A comparative analysis of the ARTMAP modifications, including fuzzy ARTMAP, IC ARTMAP, and
distributed ARTMAP, has led to the identification of the default ARTMAP network [Carpenter, 2003], which
combines winner-takes-all category node activation during training, distributed activation during testing,
and a set of default network parameter values that define a ready-to-use, general-purpose neural network
for supervised learning and recognition. The default ARTMAP features simplicity of design and robust
performance in many application domains.

3 Research Design
Our motivation to conduct this research is to fill a gap in the bankruptcy prediction field by using four
models of neural networks still unexplored in that domain. The main research objective is to investigate how
ARTMAP models find common characteristics amongst failing firms and distinguish them from the viable
firms to predict bankruptcy. Another objective is to investigate how different variants of the ARTMAP
networks perform and which one is the most appropriate. We also want to compare ARTMAP performance
with that of other classification techniques by using the same datasets and experimental conditions.
To estimate neural network performance, we used several metrics and analysis techniques: accuracy,
true and false positive rates, receiver operating characteristic (ROC) analysis, and unit-cost analysis.
To validate results, we used the k-fold cross-validation method, where k = 5. According to Carpenter,
the value of 5 is sufficient to validate the majority of ARTMAP applications and is recommended as a default
parameter value [Carpenter et al., 1991a].
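A five-fold split of the kind used here can be sketched as follows; the function name and random seed are illustrative choices, not part of the original experimental setup.

```python
import numpy as np

def five_fold_indices(n, k=5, seed=0):
    """Shuffle n sample indices and split them into k folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

folds = five_fold_indices(129)   # 129 firms, as in the dataset used here
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # fold i: train the model on train_idx, evaluate it on test_idx
    assert len(train_idx) + len(test_idx) == 129
```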
Finally, we want to estimate the efficiency of ARTMAP neural networks in terms of training and testing
time and memory consumption.

3.1 Data
The dataset we used has been used in other studies in the domain [Odom & Sharda, 1993, Rahimian et
al., 1993, Serrano-Cinca, 1996, Wilson & Sharda, 1994]. It contains financial information from Moody's
Industrial Manual for a number of years for a total of 129 firms, of which 65 are bankrupt and the rest are
solvent.
According to Brigham, there is an empirical basis for grouping financial ratios into seven categories: (1)
return on investment, (2) financial leverage, (3) capital turnover, (4) short-term liquidity, (5) cash position,
(6) inventory turnover, and (7) receivables turnover [Brigham & Gapenski, 1991]. However, Altman found
that only certain ratios have discriminating capabilities [Altman, 1968]. This research uses Altman's
ratios, which are as follows:
• Working Capital/Total Assets (R1). Working capital is defined as the difference between current
assets and current liabilities. It can be regarded as net current assets; hence, it is a good measure of
short-term solvency.
• Retained Earnings/Total Assets (R2). Retained profits are obviously higher for older, established
firms than for new entrants, all else being equal. This ratio may seem to be unfairly weighted in favor
of older firms, but actually it is not.
• EBIT/Total Assets (R3). EBIT is the abbreviation for 'Earnings Before Interest and Taxes'. This ratio
is useful for comparing firms in different tax situations and with varying degrees of financial leverage.
• Market Value of Equity/Book Value of Total Debt (R4). This measure examines a firm's competitive
market place value.
• Sales/Total Assets (R5). This ratio, sometimes called the assets turnover ratio, measures the firm's
asset utilization.

Most neural network applications transform raw data into a form suitable for training and testing.
A common transformation is linear rescaling of the input variables. This is necessary because different
variables may have values that differ significantly in magnitude due to different units of measurement. Such an
imbalance can reduce the predictive ability of the model, as some variables can dominate others.
A linear transformation was used to bring all the inputs to similar scales. Transformed variables
have zero means and unit standard deviations over the transformed training set.
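Such a transformation can be sketched as follows. The ratio values are hypothetical, and the rescaling statistics are taken from the training set only, consistent with the statement above that the transformed variables have zero mean and unit standard deviation over the training set.

```python
import numpy as np

def standardize(train, test):
    """Rescale to zero mean and unit standard deviation, using
    statistics computed from the training set only."""
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    return (train - mu) / sigma, (test - mu) / sigma

# Hypothetical values for two financial ratios
train = np.array([[0.10, 1.2], [0.30, 0.8], [0.20, 1.0]])
test = np.array([[0.25, 0.9]])
ztrain, ztest = standardize(train, test)
print(np.allclose(ztrain.mean(axis=0), 0.0))   # True
print(np.allclose(ztrain.std(axis=0), 1.0))    # True
```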

3.2 Reduction of Dimensionality


The principal motivation for the reduction of dimensionality is that a network with fewer inputs has fewer
adaptive parameters to be determined, and these are more likely to be properly constrained by a data set of
limited size (such as the dataset used in this study), leading to a network with better generalization properties.
In addition, a neural network with fewer weights may be faster to train. Using univariate F-ratio analysis,
Serrano-Cinca ranked Altman's ratios and suggested that the second and third variables have greater
discriminatory power, in contrast with the fifth one [Serrano-Cinca, 1996]. The univariate analysis, however, does
not estimate combinations of variables. Furthermore, the optimal variable selection is different for different
classification models, that is, there is no guarantee that an optimal set for an MLP will perform well with
an ARTMAP neural network. Ideally, the optimal subset for a model can be selected by the exhaustive
(brute-force) search approach, which checks whether each variable subset satisfies the requirements of the
bankruptcy prediction problem. For n candidate variables, there are 2^n − 1 nonempty subsets, as each
variable can be present or absent. We took advantage of the small number of variables (i.e., five) to apply
exhaustive search using 31 subsets of the train and test datasets.
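Enumerating the 31 nonempty subsets of the five Altman ratios is straightforward, for example:

```python
from itertools import combinations

ratios = ["R1", "R2", "R3", "R4", "R5"]
subsets = [list(c) for r in range(1, len(ratios) + 1)
           for c in combinations(ratios, r)]
print(len(subsets))   # 31
```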

3.3 Performance Metrics


Binary classifiers, such as the ARTMAP neural nets, map test dataset instances to one element of the set
of positive and negative class labels {p, n}. Outcomes from classifications can be summarized into a 2×2
matrix (confusion matrix or contingency table) where the four values are as follows:

• true positive (TP), also known as hits;


• true negative (TN) or correct rejection;
• false positive (FP), also called Type I error;
• false negative (FN) or Type II error.

The confusion matrix forms the basis for several common metrics defined below:

• true positive rate (TPR), also known as hit rate, recall, or sensitivity:

  TPR = TP / (TP + FN) = TP / P

• false positive rate (FPR), also known as false alarm rate or fall-out:

  FPR = FP / (FP + TN) = FP / N

• true negative rate (TNR), also known as specificity:

  TNR = SPC = TN / (FP + TN) = 1 − FPR

• false negative rate (FNR) or positive error:

  FNR = FN / (TP + FN)

• accuracy (ACC):

  ACC = (TP + TN) / (P + N)
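The definitions above can be collected into a small helper; the confusion-matrix counts below are hypothetical and chosen only to sum to a 129-instance test set.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Rates defined above, computed from confusion-matrix counts."""
    p, n = tp + fn, fp + tn
    return {
        "TPR": tp / p,               # hit rate / recall / sensitivity
        "FPR": fp / n,               # false alarm rate / fall-out
        "TNR": tn / n,               # specificity = 1 - FPR
        "FNR": fn / p,
        "ACC": (tp + tn) / (p + n),
    }

# Hypothetical counts for a 129-instance test set
m = confusion_metrics(tp=55, fp=10, tn=54, fn=10)
print(round(m["ACC"], 3))   # 0.845
```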

3.4 Receiver Operating Characteristic Analysis


Although accuracy is a common figure of merit for classifiers, it can be misleading if an important class
is underrepresented, that is, when a dataset contains only a few instances of bankrupt companies. In this case,
sensitivity and specificity can be more relevant performance estimators [Wilson & Sharda, 1994]. Moreover,
accuracy depends on the classifier's operating threshold, such as the vigilance parameter of ARTMAP,
and choosing the optimal threshold can be challenging. In addition, different types of misclassifications have
different costs. For example, in the bankruptcy prediction domain, Type I and Type II errors can produce
different consequences and have different costs.
In recent years, there has been increased use of receiver operating characteristic (ROC) analysis
in machine learning research due to the realization that simple classification accuracy is often a poor
metric for measuring performance [Fawcett, 2006]. ROC curves have properties that make them especially
useful for applications with skewed class distributions and unequal classification error costs. An ROC
space is defined by FPR and TPR as x and y axes, respectively, which depicts relative trade-offs between
true positive (benefits) and false positive (costs). Each prediction result or one instance of a confusion matrix
represents one point in the ROC space. The perfect classification will yield a point in the upper left corner or
coordinate (0, 1), representing 100% sensitivity (all true positives are found) and 100% specificity (no false
positives are found). A completely random guess will give a point along the no-discrimination line from the
left bottom to the top right corner.
Soft or probabilistic classifiers, such as the naive Bayesian classifier and the MLP neural network, produce
probability values representing the degree to which an instance belongs to a class. For these methods, setting a
threshold value determines a point in the ROC space. In contrast, crisp or discrete classifiers, such as ARTMAP,
decision trees, and so on, yield only a class label. When a test set is given to such a classifier, the
result is a single point in the ROC space. As we want to generate an ROC curve from an ARTMAP classifier,
we convert it to a soft classifier by 'looking inside' it: varying the vigilance parameter generates
an aggregation of points in the ROC space.
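The conversion can be sketched as follows. The classifier below is a toy thresholding rule standing in for an ARTMAP, and the scores and labels are invented; it only illustrates how sweeping a vigilance-like parameter traces out ROC points.

```python
import numpy as np

def roc_points(classify, data, labels, thresholds):
    """Sweep a threshold-like parameter (e.g., vigilance) and collect
    one (FPR, TPR) point per setting."""
    pos, neg = labels.sum(), len(labels) - labels.sum()
    points = []
    for rho in thresholds:
        pred = np.array([classify(x, rho) for x in data])
        points.append(((pred & ~labels).sum() / neg,
                       (pred & labels).sum() / pos))
    return points

# Toy stand-in classifier: flags a firm whenever its score exceeds rho.
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])
labels = np.array([False, False, True, True, True, False])
pts = roc_points(lambda x, rho: x > rho, scores, labels,
                 np.arange(0.0, 1.0001, 0.025))   # 41 settings, as in Section 4
print(pts[0], pts[-1])   # (1.0, 1.0) ... (0.0, 0.0)
```

The lowest setting accepts everything, giving the 'liberal' corner (1, 1); the highest rejects everything, giving the trivial point (0, 0).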

4 Empirical Results and Discussion


In machine learning applications, classification performance is often measured by accuracy as a figure of
merit. For a given operating point of a classifier, the accuracy is the total number of correctly classified
instances divided by the total number of all available instances.
Figure 2: Prediction accuracy in percent obtained by fuzzy, distributed, IC, and default ARTMAP
neural networks using variables {R2, R4} and 41 values of the vigilance parameter from 0 to 1
with an incremental step of 0.025.

A series of experiments sought to estimate the bankruptcy prediction accuracy of the fuzzy, IC, distributed,
and default ARTMAP neural networks. Using five-fold cross-validation and in accordance with the exhaustive
search strategy discussed above, 31 datasets were composed and indexed from 1 to 31, where the indexes
represent data relevant to the subsets of financial ratios. To investigate how the vigilance parameter relates to
prediction accuracy, each subset was presented to the four models and iterated 41 times using vigilance
parameter values from 0 to 1 with an increment of 0.025. We can summarize that, with single financial ratios,
the fuzzy, IC, and default models perform better than distributed ARTMAP. In addition, when ARTMAP works with
individual Altman's ratios, two of them have the highest discriminatory power – R2 and R4. Finally, all
ARTMAP models used with a single Altman's ratio can outperform the statistical technique of linear
discriminant analysis (74.5% accuracy) used with all ratios [Odom & Sharda, 1993].
From the results above, we can expect that when R2 and R4 are used together, their combined
discriminatory power can lead to even better results. Indeed, experiments show that {R2, R4} yields the highest
score for all ARTMAP models, with the subset {R2, R4, R5} second best. With a well-tuned vigilance
parameter and using financial ratios R2 and R4, the ARTMAP neural networks achieve prediction accuracy as
follows: 85.5% for the fuzzy and default models and 83.6% for the IC and distributed models. Fuzzy and
default ARTMAP (85.5%) thus outperform the best accuracy reported for the MLP (83.6%)
[Serrano-Cinca, 1996], while the IC and distributed models match the MLP score. Figure 2 provides a more
detailed view of prediction accuracy obtained by the four ARTMAP models using the best performing subset
of financial ratios {R2, R4} and varying the vigilance parameter from 0 to 1.

4.1 ROC Analysis Results


Each classifier, obtained for a certain value of the vigilance parameter, can be mapped to a point in the ROC
space. One point is better than another if it lies to the northwest of the other (higher TPR, lower FPR, or both).
Classifiers appearing on the left-hand side near the x axis may be considered more 'conservative', as they make
positive classifications only with strong evidence, thus making few
false positive errors. Classifiers close to the upper right-hand side of an ROC graph may be considered more
‘liberal’ as they make positive classifications with weak evidence. Thus, they classify nearly all positives
correctly but often have high false positive rates. According to the ROC analysis, the best classifiers lie on
the ROC convex hull. This curve is constructed by connecting the most northwesterly points together with the two
trivial classifiers at (0, 0) and (1, 1). The convex hull has a number of useful implications. The classifiers below
the convex hull curve can be discarded because there is no combination of class distribution/cost matrix for
which they can be optimal. As only the classifiers on the convex hull are potentially optimal, no others need
be retained. Selection of the optimal classifier depends on the context of application, class distribution, and
error cost, which is a subject of the cost analysis that follows.

4.2 Cost Analysis


The classifier with the lowest error rate is frequently not the best classifier. In many applications, not all the
errors produced by a predictor have the same consequences. The goal is not to obtain the classifier
with the fewest errors but the one with the lowest cost. ROC graphs have been criticized for their inability
to handle example-specific costs, as they are based on the assumption that the costs of all true positives and
false positives are equivalent. Cost analysis requires transformation of the confusion matrix into a cost-benefit
matrix. Using the cost matrix, all points belonging to the convex hull curve are evaluated, as they are
candidates for optimal classifiers. To calculate the unit cost of a classifier, we first calculate the slope at that
point:

slope = (FP_cost / FN_cost) × (Neg / Pos),

and then the cost per unit:

cost per unit = FNR + slope × FPR.
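Applied directly, the two formulas give, for example, the following; the error rates and the assumption that a missed bankruptcy (FN) costs five times a false alarm (FP) are hypothetical, while the class counts match the 129-firm dataset (65 bankrupt, 64 solvent).

```python
def unit_cost(fnr, fpr, fp_cost, fn_cost, neg, pos):
    """Unit cost from the two formulas above."""
    slope = (fp_cost / fn_cost) * (neg / pos)
    return fnr + slope * fpr

# Hypothetical costs: a missed bankruptcy (FN) is 5x worse than a false alarm.
c = unit_cost(fnr=0.10, fpr=0.20, fp_cost=1.0, fn_cost=5.0, neg=64, pos=65)
print(round(c, 4))   # 0.1394
```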

The analysis helps to rank the four ARTMAP models: the two best classifiers are distributed and IC,
followed by fuzzy and the default model. Comparing the results with those of the ROC analysis, it can be
concluded that the distributed model is not only the best performer after careful tuning but also provides the
best overall performance. The IC model has poorer overall performance, but after careful tuning it can show
the best results. The fuzzy model shows moderate results, which exceed those of the default ARTMAP model.
These conclusions, however, are application-specific rather than general. The conclusions also demonstrate
that different performance metrics, such as accuracy, ROC, and unit cost, show different results.

4.3 Further Experiments


A series of experiments sought to investigate whether the four ARTMAP models are sensitive to outliers, that is,
data points that can be excluded because of inconsistency with the statistical nature of the bulk of the data.
We marked data points as outliers if their values are more than three standard deviations away
from the mean of the relevant variable. The four models were trained and tested without outliers, and the
results show no difference from those obtained by the experiments discussed before. The four ARTMAP
models show no sensitivity to the outliers in the context of the domain and datasets discussed here.
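The three-sigma rule described above can be expressed as follows. The data are a toy single-variable sample; note that a reasonably large sample is needed, since with only a handful of points a single extreme value inflates the standard deviation enough to mask itself.

```python
import numpy as np

def mark_outliers(X):
    """Flag rows where any variable lies more than three standard
    deviations from that variable's mean."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (np.abs(X - mu) > 3 * sigma).any(axis=1)

# Toy single-variable sample with one extreme value
X = np.array([0.1] * 10 + [0.2] * 9 + [5.0]).reshape(-1, 1)
print(int(mark_outliers(X).sum()))   # 1: only the extreme value is flagged
```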
We also examined the efficiency of the four models in terms of training and testing time. An explanation for
their instant responsiveness is that the ARTMAP models feature one-pass learning. In contrast, the widely used
MLPs require multiple presentations (epochs) of the training set to learn. Some studies report that MLPs with
similarly sized datasets achieve convergence after 1,400 training iterations [Coats & Fant, 1993], 100,000
iterations [Wilson & Sharda, 1994], and 191,400 iterations over 24 hours [Odom & Sharda, 1990]. These
figures illustrate once again some of the advantages of the ARTMAP models.

5 Conclusion
Today, financial institutions are paying a heavy price for their indiscriminate practices. Corporate
bankruptcies have put many institutions on the brink of insolvency, and many others have been merged with or
acquired by other financial institutions. Decision-making problems in the area require efficient analytic
tools and techniques, most of which involve machine learning to predict the future financial status of firms.
Our research was motivated by a gap in the studies on bankruptcy prediction methods, that is, using
some still unexplored techniques based on the ART neural networks. Here, we examined four of them:
fuzzy ARTMAP, distributed ARTMAP, instance counting ARTMAP, and default ARTMAP. To illustrate
the network performance and compare results with those of other techniques, we used data, financial ratios, and
experimental conditions identical to those published in previous studies.
Our experiments show that the financial ratios Retained Earnings/Total Assets and Market Value of
Equity/Book Value of Total Debt provide the highest discriminatory power and ensure the best prediction
accuracy for all the ARTMAP networks. We also found that with appropriate network parameters, ARTMAP
provides 85.5% accuracy, which outperforms all MLP networks and other classification techniques applied
to the same data. To avoid bias from relying on prediction accuracy alone and to estimate the overall classifier
performance, we used five-fold cross-validation, an exhaustive search over input variables, and receiver
operating characteristic analysis. These figures show that the distributed ARTMAP is the best performer, followed by
fuzzy, default, and instance counting ARTMAP. We also conducted an application-specific cost analysis to
find optimal network parameters. The unit cost metric shows that through proper tuning of the network
vigilance, fuzzy ARTMAP appears to be the best application-specific classifier, followed by instance counting,
distributed, and default ARTMAP.
In conclusion, we find that ARTMAP neural networks are a promising technique for bankruptcy prediction,
outperforming the popular MLP networks not only in prediction accuracy but also in training time and
adaptiveness in a changing environment.

References
[Altman, 1968] Altman, E. (1968). Financial Ratios, Discriminant Analysis, and the Prediction of Corporate Bankruptcy. Journal
of Finance, 23(4), 589-609.
[Bell et al., 1990] Bell, T., Ribar, G., & Verchio, J. R. (1990). Neural Nets vs. Logistic Regression: a Comparison of each Models
Ability to Predict Commercial Bank Failures, In Proc. of the 1990 Deloitte & Touche / University of Kansas Symposium on
Auditing Problems (pp. 29-53).
[Brigham & Gapenski, 1991] Brigham, E.F., and Gapenski, L.C. (1991) Financial Management Theory and Practice. New York:
Dryden Press.
[Carpenter & Grossberg, 1987] Carpenter, G. & Grossberg, S. (1987). A Massively Parallel Architecture for a Self-Organizing
Neural Pattern Recognition Machine, Computer Vision, Graphics, and Image Processing, 37, 54-115.
[Carpenter et al., 1991a] Carpenter, G., Grossberg, S., & Reynolds, J. (1991a). ARTMAP: Supervised Real-Time Learning and
Classification of Nonstationary Data by a Self-Organizing Neural Network. Neural Networks, 4(5), 565-588.
[Carpenter et al., 1991b] Carpenter, G., Grossberg, S., & Rosen, D. (1991b). Fuzzy ART: Fast Stable Learning and Categorization
of Analog Patterns by an Adaptive Resonance System. Neural Networks, 4(6), 759-771.
[Carpenter et al., 1992] Carpenter, G., Grossberg, S., Markuzon, N., Reynolds, J.H., & Rosen, D.B. (1992). Fuzzy ARTMAP: A
Neural Network Architecture for Incremental Supervised Learning of Analog Multidimensional Maps. IEEE Transactions on
Neural Networks, 3(5), 698-713.
[Carpenter, 1996] Carpenter, G. (1996). Distributed Activation, Search, and Learning by ART and ARTMAP Neural Networks. In
Proceedings of the International Conference on Neural Networks (ICNN'96), (pp. 244-249).
[Carpenter, 1997] Carpenter, G. (1997). Distributed Learning, Recognition, and Prediction by ART and ARTMAP Neural Networks.
Neural Networks, 10(8), 1473-1494.
[Carpenter & Markuzon, 1998] Carpenter, G. & Markuzon, N. (1998). ARTMAP-IC and Medical Diagnosis: Instance Counting
and Inconsistent Cases. Neural Networks, 11(2), 323-336.
[Carpenter, 2003] Carpenter, G. (2003). Default ARTMAP. In Proceedings of the International Joint Conference on Neural Net-
works (IJCNN'03).
[Cielen, 2004] Cielen, A., Peeters, L., & Vanhoof, K. (2004). Bankruptcy Prediction Using a Data Envelopment Analysis. European
Journal of Operational Research, 154(2), 526-532.
[Coats & Fant, 1993] Coats, P. & Fant, L. (1993). Recognizing Financial Distress Patterns Using a Neural Network Tool. Financial
Management, November 1993, 142-155.
[Curram, 1994] Curram, S. P. & Mingers, J. (1994). Neural Networks, Decision Tree Induction and Discriminant Analysis: An
Empirical Comparison. Journal of the Operational Research Society, 45(4), 440-450.
[Fawcett, 2006] Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-874.
[Granger et al., 2001] Granger, E., Rubin, M., Grossberg, S., & Lavoie, P. (2001). A What-and-Where Fusion Neural Network for
Recognition and Tracking of Multiple Radar Emitters. Neural Networks, 14(3), 325-344.
[Grossberg, 1976] Grossberg, S. (1976). Adaptive Pattern Classification and Universal Recoding: II. Feedback, Expectation,
Olfaction, Illusions. Biological Cybernetics, 23, 187-202.
[Hart, 1992] Hart, A. (1992). Using Neural Networks for Classification Tasks: Some Experiments on Datasets and Practical Advice.
Journal of the Operational Research Society, 43(3), 215-226.
[Kohonen, 1997] Kohonen, T. (1997). Self-Organizing Maps. Springer Series in Information Sciences, vol. 30, 2nd ed. Heidelberg: Springer.
[Kumar & Ravi, 2007] Kumar, P. & Ravi, V. (2007). Bankruptcy Prediction in Banks and Firms via Statistical and Intelligent Tech-
niques. European Journal of Operational Research, 180(1), 1-28.
[Lacher et al., 1995] Lacher, R., Coats, P., Sharma, S., & Fant, L. (1995). A Neural Network for Classifying the Financial Health of
a Firm. European Journal of Operational Research, 85, 53-65.
[Nachev & Stoyanov, 2007] Nachev, A. & Stoyanov, B. (2007). Default ARTMAP Neural Networks for Financial Diagnosis, In
Proceedings of the International Conference on Data Mining DMIN’07, (pp. 1-9).
[Nachev, 2008] Nachev, A. (2008). Forecasting with ARTMAP-IC Neural Networks. An Application using Corporate Bankruptcy
Data. In Proceedings of the 10th International Conference on Enterprise Information Systems (ICEIS’08), (pp. 167-172).
[Odom & Sharda, 1990] Odom, M. & Sharda, R. (1990). A Neural Network Model for Bankruptcy Prediction. In Proceedings of the
IEEE International Joint Conference on Neural Networks, (pp. 163-168).
[Odom & Sharda, 1993] Odom, M., and Sharda, R. (1993). A Neural Network Model for Bankruptcy Prediction. In R. Trippi and
E. Turban, Eds., Neural Networks in Finance and Investing, Chicago: Probus Publishing Company.
[Rahimian et al., 1993] Rahimian, E., Singh, S., Thammachote, T., & Virmani, R. (1993). Bankruptcy Prediction by Neural Net-
works. In R. Trippi and E. Turban, Eds., Neural Networks in Finance and Investing. Chicago: Probus Publishing Company.
[Rumelhart & McClelland, 1986] Rumelhart, D. & McClelland, J. (1986) Parallel Distributed Processing. Explorations in the Mi-
crostructure of Cognition, Cambridge, MA: MIT Press.
[Salcedo-Sanz et al., 2005] Salcedo-Sanz, S., Fernandez-Villacanas, J., Segovia-Vargas M., & Bousono-Calzon, C. (2005) Genetic
Programming for the Prediction of Insolvency in Non-life Insurance Companies, Computers and Operations Research, 32, 749–
765.
[Serrano-Cinca, 1996] Serrano-Cinca, C. (1996) Self Organizing Neural Networks for Financial Diagnosis. Decision Support Sys-
tems, 17, 227–238.
[Sharda & Wilson, 1993] Sharda, R. & Wilson, R. (1993). Performance Comparison Issues in Neural Network Experiments for
Classification Problems. In Proceedings of the 26th Hawaii International Conference on System Sciences.
[Shin & Lee, 2002] Shin, K. & Lee, Y. (2002) A Genetic Algorithm Application in Bankruptcy Prediction Modelling, Expert Sys-
tems with Applications, 23(3), 321–328.
[Vellido et al., 1999] Vellido, A., Lisboa, P., & Vaughan, J. (1999). Neural Networks in Business: A Survey of Applications. Expert
Systems with Applications, 17, 51-70.
[Wilson & Sharda, 1994] Wilson, R. & Sharda, R. (1994). Bankruptcy Prediction Using Neural Networks. Decision Support Systems,
11(5), 545-557.
[Yoon, 1993] Yoon, Y., Swales, G., & Margavio, T. (1993). A Comparison of Discriminant Analysis versus Artificial Neural Net-
works. Journal of the Operational Research Society, 44(1), 51-60.
