
Annals of Operations Research 99, 403–425, 2000

© 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.

Comparative Analysis of Artificial Neural Network Models: Application in Bankruptcy Prediction

CHRIS CHARALAMBOUS, ANDREAS CHARITOU and FROSO KAOUROU   {bachris, charitou}@ucy.ac.cy
Department of Business Administration, University of Cyprus, P.O. Box 537, Nicosia, CY 1678, Cyprus

Abstract. This study compares the predictive performance of three neural network methods, namely the
learning vector quantization, the radial basis function, and the feedforward network that uses the conjugate
gradient optimization algorithm, with the performance of the logistic regression and the backpropagation
algorithm. All these methods are applied to a dataset of 139 matched-pairs of bankrupt and non-bankrupt
US firms for the period 1983–1994. The results of this study indicate that the contemporary neural network
methods applied in the present study provide superior results to those obtained from the logistic regression
method and the backpropagation algorithm.

1. Introduction

Artificial Neural Networks (ANN) are steadily becoming a very popular research subject
with applications in many areas, such as medicine, business, politics and technology.
Their application in business, and specifically in bankruptcy prediction has become even
more important as recent evidence suggests that ANN models can effectively capture and
represent complex relationships in areas where other statistical methods do not perform
that well. In addition, ANN models can overcome some limitations imposed by such
statistical methods [15].
Thus far, most researchers have experimented mainly with simple feedforward net-
works trained by the Back-Propagation (BP) algorithm. Evidence regarding the predic-
tive ability of ANN models based on other ANN structures is limited. The purpose of
this study is to compare the predictive performance of three contemporary ANN meth-
ods with the performance of the logistic regression and the simple feedforward network
using the BP algorithm. The present study differs from prior research in the following
respects: (a) it uses the Kohonen Learning Vector Quantization (LVQ) training algorithms [17], the Radial Basis Function (RBF) network [8], and the feedforward network that minimizes the Least Squares Error Function (LSEF) with and without a penalty term using the conjugate gradient optimization algorithm [9], in addition to the common feedfor-
ward network trained by the BP algorithm [23], (b) it compares the results of these ANN
methods with the results of the logistic regression method, by applying these meth-
ods to bankruptcy prediction. We chose to apply these ANN methodologies to bank-
ruptcy prediction because bankruptcy is considered one of the most significant threats
for many businesses today. Evidence shows that in the past two decades business fail-
ures worldwide have occurred at rates higher than at any time since the early 1930s.

Bankruptcy affects not only the organization itself but also the overall economy. Specifically, investors, creditors, management and employees are severely affected by business failures. Since the economic cost of corporate failure is large,
there is a great need for models giving timely and robust prediction for this particular
event [5,16,25,28,31,33].
Our data set consists of 139 matched-pairs of bankrupt and non-bankrupt US firms
for the period 1983–1994. Our empirical findings indicate that the contemporary ANN
methods employed achieve better prediction results than the feedforward-BP model and
the logistic regression in all three years prior to bankruptcy.
The remainder of this study is organized as follows: section 2 motivates the study; section 3 describes the data set; section 4 presents the ANN methodology employed; empirical evidence from the application of the ANN models to bankruptcy prediction is given in section 5; the final section provides a summary and conclusions.

2. Motivation for the study

Extant evidence shows that several researchers have assessed the predictive ability of ANN models in bankruptcy prediction. However, most of them employed the feedforward
network structure in conjunction with the standard BP algorithm. More specifically,
Odom and Sharda [19], Rahimian et al. [22], Coats and Fant [11], and Wilson and
Sharda [32] applied Altman’s model [1] to compare the predictive ability of the feed-
forward networks with the performance of the multiple discriminant analysis (MDA).
These researchers conclude that the classification results of the ANN models are su-
perior to the results provided by the MDA method. Moreover, Altman [3] concludes
that the feedforward ANN models consisting of more than two layers achieve better
classification results than the MDA models. Leshno and Spector [18] also conclude
that (a) the prediction results of the ANN models are superior to the results of the
MDA model, and (b) the prediction accuracy is improved when learning is performed
through enhanced techniques and when financial information from distinct periods is
presented to the network. Furthermore, other researchers experimented extensively with a number of statistical methods, including logistic regression, multiple discriminant analysis and ANN models on the bankruptcy problem. Boritz et
al. [6], Tam and Kiang [29,30], Charitou and Charalambous [10], Huang et al. [15],
Brocket et al. [7], Salchenberger et al. [24] and Fanning and Cogger [12] compared
the predictive ability of feedforward networks with the performance of other statisti-
cal methods, such as the k-nearest neighbor method, the non-parametric discriminant
analysis, and the logit and probit models [20]. These researchers conclude that the
ANN models are superior to the other statistical methods, partly because the ANN
models are not constrained to a particular distribution. Contrary to the above evi-
dence, Barniv et al. [4] find no significant differences in the predictive ability of the
feedforward ANN, MDA, and logit methods. Finally, Ragupathi et al. [21] and Suh
and Kim [27] applied different ANN models to bankruptcy prediction. More specif-
ically, Ragupathi et al. [21] conclude that the two-hidden-layer networks outperform

the single-hidden-layer networks because a single-hidden-layer network could not effec-
tively capture the complex relationships between the predictor financial variables. Suh
and Kim [27] also tried to discriminate between healthy, bankrupt and financially dis-
tressed firms using single-hidden-layer feedforward networks but the results regarding
the separation of the sample consisting of the bankrupt and distressed firms were inconclu-
sive.
In summary, the aforementioned bankruptcy studies indicate that the prediction re-
sults of the feedforward networks trained by the BP algorithm are superior to the results
obtained from the standard statistical techniques (i.e., logistic regression and multiple
discriminant analysis). Even though there are other contemporary ANN methods, researchers have not applied these methods to determine whether they provide superior classification results. The objective of this study is to compare the predictive performance of three neural network methods, namely the learning vector quantization, the radial basis function, and the feedforward network that uses the conjugate gradient optimization algorithm, with the performance of the logistic regression and the backpropagation algorithm.

3. Data set

Our data set consists of 139 matched-pairs of bankrupt and non-bankrupt US firms for
the period 1983–1994. The bankrupt firms were matched with non-bankrupt firms on the
basis of their industry classification, size, and year of bankruptcy filing. All data used
in this study were collected from the Compustat database. The data were separated into
two sub-samples, the training and the testing samples. The training sample consists of all
firms, bankrupt and healthy, included in the data set for the period 1983–1991 (total of
192 firms). The testing or forecasting sample consists of the remaining 86 firms included
in the data set for the period 1992–1994. In order to identify the predictor variables we
use twenty-seven financial variables that were found to be the most significant in the
literature. By employing univariate and stepwise regression analysis we find that the
following seven variables are the most significant in predicting bankruptcy and these are
the ones used subsequently:

• CHETA: Cash and equivalents/Total assets;
• CLTA: Current liabilities/Total assets;
• DAR: Change in accounts receivables;
• DER: (Debt due in one year + Long term debt)/Total assets;
• OPN12N: Dummy for operating income, 1 if negative for the last two years, and 0 otherwise;
• UCFFOM: Change in cash flow from operations/Market value of equity at fiscal year end;
• WCFOM: Working capital from operations/Market value of equity at fiscal year end.

4. The ANN models

The following ANN models and training algorithms are used in this study:
1. Kohonen’s self-organizing map plus the three learning vector quantization training algorithms (versions 1–3).
2. The radial basis functions network, with optimization.
3. The simple feedforward network when:
(i) trained by the standard back-propagation algorithm, and
(ii) trained by the conjugate gradient optimization algorithm, minimising the least-
squares error function with and without a penalty term.
A detailed discussion of these ANN models follows:

4.1. Kohonen Self-Organizing Feature Map (SOFM)

The architecture of Kohonen’s 2-dimensional SOFM [17] is shown in figure 1. It consists of n = nx × ny neurons arranged in a rectangular array (a hexagonal array is also used in practice). For each neuron i there is an associated weight vector wi,
wi = [wi1, wi2, . . . , wiN],
where N is the dimensionality of the input patterns. For input pattern X ∈ RN, the neuron whose weight vector matches the input pattern most closely (typically, in the sense of minimum Euclidean distance between the two vectors) is declared the winner. On the same figure we show neighborhoods N(i, R) of neuron i of radii R = 2, 1 and 0.

Figure 1. Kohonen’s 2-dimensional rectangular array.



Kohonen’s SOFM gives a solution to the following problem: given a set of input patterns X(i) ∈ RN, i = 1, 2, . . . , s, with an unknown probability density function p(x), we seek the weight vectors associated with the n neurons of Kohonen’s 2-dimensional array such that their density function approximates p(x) in an orderly fashion. By orderly fashion we mean that patterns that are close in the input space will give winning neurons that are close in the 2-dimensional array map. It is crucial to the formation of ordered maps that during the self-organizing process, the winner neuron and its neighboring neurons update their weights such that they move closer to the input pattern. Hence one of the main activities of the Kohonen SOFM is the clustering of the input patterns into n clusters.
The following steps describe Kohonen’s SOFM algorithm.
Step 0: Set the grid size (i.e., values of nx and ny );
Set the initial value of the learning rate parameter α to α0 ;
Set the initial value of the radius R;
Initialize the weight vector for each neuron; in this study we adopt a random
initialization of the weight vectors within the range of the input space.
Step 1: While stopping condition is false do steps 1.1–1.3.
Step 1.1: For each input pattern X, do steps (i), (ii):
(i) Determine the winner neuron I ∗ ;
(ii) Update the network:
wi(new) = wi(old) + α(X − wi(old)),   i ∈ N(I*, R),
wi(new) = wi(old)   otherwise.
Step 1.2: Reduce the value of the learning rate α;
Reduce the value of R.
Step 1.3: Test stopping condition: the condition may specify fixed number of
iterations.
The value of R is relatively large at the start so as to include all neurons. As the learning process continues it decreases until it ultimately reaches zero, at which point only the weight of the winner neuron is updated.
The learning rate α is a slowly decreasing function of training epochs, ep. In this
work α was updated using the formula
α = α0 (1 − ep/epmax),
where epmax is the maximum number of epochs.
The Kohonen SOFM’s main activity is clustering the input patterns; on its own it cannot be used as a pattern classifier.
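A minimal sketch of steps 0–1.3 in Python/NumPy is given below. The grid layout, the neighborhood measured by grid (Chebyshev) distance, the radius schedule and the default parameter values are our own illustrative choices; only the linearly decaying learning rate follows the formula above.

import numpy as np

def train_sofm(patterns, nx, ny, alpha0=0.5, r0=2, ep_max=100, seed=0):
    """Illustrative SOFM training loop (not the authors' implementation).

    patterns : array (s, N). Returns the (nx*ny, N) weight matrix; neuron i is
    placed at grid position (i // ny, i % ny).
    """
    rng = np.random.default_rng(seed)
    n, N = nx * ny, patterns.shape[1]
    # Step 0: random initialization within the range of the input space.
    lo, hi = patterns.min(axis=0), patterns.max(axis=0)
    w = rng.uniform(lo, hi, size=(n, N))
    grid = np.array([(i // ny, i % ny) for i in range(n)])

    for ep in range(ep_max):
        alpha = alpha0 * (1.0 - ep / ep_max)             # alpha = alpha0 (1 - ep/ep_max)
        radius = int(round(r0 * (1.0 - ep / ep_max)))    # one possible shrinking schedule for R
        for x in patterns:
            i_star = np.argmin(np.linalg.norm(w - x, axis=1))              # winner neuron I*
            in_nbhd = np.abs(grid - grid[i_star]).max(axis=1) <= radius    # N(I*, R) on the grid
            w[in_nbhd] += alpha * (x - w[in_nbhd])       # move the neighborhood towards the pattern
        # Step 1.3: the stopping condition here is simply a fixed number of epochs.
    return w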

4.2. Learning Vector Quantization (LVQ)

Learning vector quantization is a pattern classification method. Several output neurons, called reference neurons, are assigned to each class. Their corresponding weight vectors are called reference vectors for the class that they represent. The initial values of the reference vectors can be obtained by applying the Kohonen SOFM on the patterns in each class. Kohonen [17] suggested three different ways of adjusting the position of the reference vectors, resulting in the algorithms LVQ1, LVQ2 and LVQ3. An unknown input
pattern X will be assigned to the class that the winner reference vector represents. From
all reference vectors there is only one winner neuron; the neuron whose reference weight
vector is closest to input pattern X. Neurons cannot change classes.
The following are the main steps of the training phase of the three versions of the LVQ algorithm, i.e., LVQ1, LVQ2 and LVQ3.

LVQ1
Step 0: Set the number of neurons representing each class; For each class use Kohonen
SOFM to find the initial position of the reference vectors of these neurons.
Initialize the learning rate α.
Step 1: While stopping condition is false, do steps 1.1–1.3:
Step 1.1: For each training input pattern X do steps (i), (ii):
(i) Determine the winner neuron I ∗ .
(ii) Update the network:
wI*(new) = wI*(old) + α(X − wI*(old))   if neuron I* and pattern X belong to the same class,
wI*(new) = wI*(old) − α(X − wI*(old))   if neuron I* and pattern X belong to different classes,
wi(new) = wi(old)   for i ≠ I*.
Step 1.2: Reduce the value of the learning rate α.
Step 1.3: Test stopping condition.
In the LVQ1 method only the reference vector that corresponds to the winning neuron is updated. In the improved algorithms (LVQ2 and LVQ3), two vectors, the winner and the runner-up, are updated if certain conditions are satisfied. The idea is that if the input is approximately the same distance from both the winner and the runner-up, then both corresponding reference vectors should be updated.
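The LVQ1 update rule can be sketched as follows (Python/NumPy); the epoch structure and the names are illustrative, and the learning-rate schedule is left to the caller.

import numpy as np

def lvq1_epoch(w, ref_labels, patterns, targets, alpha):
    """One LVQ1 pass over the training patterns (sketch).

    w          : (n_ref, N) reference vectors.
    ref_labels : (n_ref,) class of each reference vector (fixed).
    patterns   : (s, N) training patterns; targets: (s,) their classes.
    """
    for x, t in zip(patterns, targets):
        i_star = np.argmin(np.linalg.norm(w - x, axis=1))   # winner reference vector
        if ref_labels[i_star] == t:
            w[i_star] += alpha * (x - w[i_star])            # same class: attract
        else:
            w[i_star] -= alpha * (x - w[i_star])            # different class: repel
    return w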

LVQ2
In this case step 1.1 is modified as follows:
(i) Determine the winner neuron I ∗ and the runner-up J ∗ .
(ii) Update both wI ∗ and wJ ∗ , if the following two conditions hold:

(a) I* and J* belong to two different classes;
(b) the distance of the input vector to the winner neuron, dI*, and the distance of the input vector to the runner-up, dJ*, are approximately equal. This condition can be expressed in the form
min(dI*/dJ*, dJ*/dI*) > ρ,
where ρ is usually set to the value 2/3.
In this case,
wi(new) = wi(old) + α(X − wi(old)), where i is either I* or J* and belongs to the same class as X,
wj(new) = wj(old) − α(X − wj(old)), where j is either I* or J* (j ≠ i) and does not belong to the same class as X,
wk(new) = wk(old) for k ≠ I*, J*.


(iii) If (ii) does not hold use the LVQ1 algorithm, in which case only the reference
weight vector corresponding to the winner neuron is updated.
The LVQ2 algorithm presented above is a modification to the original LVQ2 algo-
rithm proposed by Kohonen [17]. In the original LVQ2 algorithm part (iii) is missing,
and in part (ii) there is the additional condition to be satisfied, that X and J ∗ belong to
the same class. As Kohonen commented, in the original LVQ2 algorithm the classification accuracy first improves and then decreases if the algorithm is allowed to run for too long. This modification tries to alleviate that problem. The modified LVQ2 algorithm was used to obtain the numerical results presented in this paper.
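A sketch of the single-pattern update of this modified LVQ2, including the window test with ρ = 2/3 and the fall-back to the LVQ1 rule in part (iii), is shown below; the names and the two-class usage are our own assumptions.

import numpy as np

def lvq2_update(w, ref_labels, x, t, alpha, rho=2/3):
    """Modified LVQ2 update for one pattern x with class t (illustrative sketch)."""
    d = np.linalg.norm(w - x, axis=1)
    i_star, j_star = np.argsort(d)[:2]                        # winner and runner-up
    window = min(d[i_star] / d[j_star], d[j_star] / d[i_star]) > rho
    if ref_labels[i_star] != ref_labels[j_star] and window:   # conditions (a) and (b)
        for k in (i_star, j_star):
            if ref_labels[k] == t:
                w[k] += alpha * (x - w[k])                    # reference of X's class: attract
            else:
                w[k] -= alpha * (x - w[k])                    # the other reference: repel
    else:                                                     # part (iii): fall back to LVQ1
        sign = 1.0 if ref_labels[i_star] == t else -1.0
        w[i_star] += sign * alpha * (x - w[i_star])
    return w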

LVQ3
This algorithm is the same as the LVQ2 algorithm except part (iii) of step 1.1, which is
modified as follows:
(iii) If X, wI* and wJ* belong to the same class then
wk(new) = wk(old) + εα(X − wk(old))   for k ∈ {I*, J*},
wk(new) = wk(old)   for k ∉ {I*, J*},
else (this case occurs if I* and J* belong to two different classes and the conditions stated in (ii) do not hold)
wk(new) = wk(old)   for all k.

An illustrative example
This example illustrates the learning behavior of the LVQ algorithms as pattern clas-
sifiers. We would like to distinguish between two classes of “overlapping”, two-
dimensional, normally-distributed patterns, labeled C1 and C2 .
Class C1 consists of 200 sample points and has the following parameters: µ1 = mean = [0 0]T, σ1² = variance = 1.
Class C2 consists of 200 sample points and has the following parameters: µ2 = [2 0]T, σ2² = 4.
Figure 2 shows the joint scatter plots of the two classes. Two reference vectors are used
for each class, denoted by circles for C1 and by stars for C2 . Figures 2–4 show the final
position of the reference vectors and the decision boundaries obtained by applying the
LVQ1, the LVQ2 and the LVQ3 algorithms. The classification accuracies obtained by
the three methods are:
LVQ1: 98% for class 1 and 66.5% for class 2, giving 82.3% overall correct classifica-
tion.
LVQ2: 95% for class 1 and 70% for class 2, giving 82.8% overall correct classification.
LVQ3: 87% for class 1 and 74% for class 2, giving 80.5% overall correct classification.
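The two overlapping Gaussian classes of this example can be generated as follows (Python/NumPy); the random seed is arbitrary and the 0/1 class labels are our own encoding.

import numpy as np

rng = np.random.default_rng(0)

# 200 two-dimensional samples per class, with the means and variances quoted above.
c1 = rng.normal(loc=[0.0, 0.0], scale=np.sqrt(1.0), size=(200, 2))   # mu1 = [0 0]T, variance 1
c2 = rng.normal(loc=[2.0, 0.0], scale=np.sqrt(4.0), size=(200, 2))   # mu2 = [2 0]T, variance 4

patterns = np.vstack([c1, c2])
targets = np.array([0] * 200 + [1] * 200)   # C1 -> 0, C2 -> 1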

Figure 2. Decision boundaries constructed by the LVQ1 method.



Figure 3. Decision boundaries constructed by the LVQ2 method.

Figure 4. Decision boundaries constructed by the LVQ3 method.



It is important to note that the decision boundary between two class regions is made up of linear pieces. As shown in figure 4, even though we started with two reference vectors for C1, these vectors overlap, resulting in a single reference vector for C1. The probability of correct classification produced by the Bayesian (optimum) classifier is calculated to be 81.5% (see [14]). The fact that this optimum result was exceeded by the LVQ1 and LVQ2 algorithms is attributed to the fact that we used a finite number of sample points for the two classes.
Furthermore, in order to examine the connection between the LVQ’s architecture
and the single perceptron architecture, we consider the case where one reference vector
for each class is used, for a two class problem. Let w1 and w2 be the final position of
the two reference vectors. In this case the resulting decision boundary between the two
class regions obtained by the LVQ method is the hyperplane that is orthogonal to the line
segment joining the two reference vectors and cuts the line segment in the middle (see
figure 5(a)).

Figure 5. Both Kohonen’s architecture and that of the single perceptron lead to the same hyperplane as the decision boundary. (a) A two-class decision boundary; (b) Kohonen’s architecture; (c) single perceptron architecture.

Let X be any point on the decision boundary. Using the orthogonality property
between vectors X − (w1 + w2 )/2 and (w1 − w2 ) we obtain the equation of the decision
boundary,
(w1 − w2)T (X − (w1 + w2)/2) = 0,
which can be expressed in the form
wT x + w0 = 0,
where
w = w1 − w2,   w0 = −(1/2)(w1T w1 − w2T w2).
A perceptron neuron is shown in figure 5(c). Its analog output ψ is given by
ψ = wT x + w0,
where w is the perceptron’s N-dimensional weight vector, w = [w1, w2, . . . , wN]T, and w0 is its threshold weight. The analog output is passed through a hard-limit function to produce the binary output y; y = 1 if ψ ≥ 0, and y = −1 if ψ < 0. The hyperplane wT x + w0 = 0 (ψ = 0) divides the input space, RN, into two subregions. For a two
class classification problem we want to choose weights w and w0 such that we get the
“best” separation between the two classes. Hence in this case the LVQ network shown
in figure 5(b) and the perceptron network shown in figure 5(c) will produce the same
decision boundary if weights w1 , w2 of the LVQ network and weights w, w0 of the
perceptron network are related by the above equations.
By noting that the difference between the single perceptron model and the logistic
model is the replacement of the hard-limiting function by the log-sigmoid function, the
connection between the LVQ’s architecture and that of the logistic model can be seen.
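The equivalence derived above is easy to check numerically; the following sketch (helper names are ours) maps the two final reference vectors to the weights w, w0 of the linear classifier and applies both decision rules to the same input.

import numpy as np

def lvq_to_hyperplane(w1, w2):
    """Weights of the equivalent linear classifier: w = w1 - w2, w0 = -(1/2)(w1'w1 - w2'w2)."""
    return w1 - w2, -0.5 * (w1 @ w1 - w2 @ w2)

def lvq_class(x, w1, w2):
    """Nearest-reference-vector rule (class 1 if x is closer to w1, else -1)."""
    return 1 if np.linalg.norm(x - w1) <= np.linalg.norm(x - w2) else -1

def perceptron_class(x, w, w0):
    """Hard-limit rule on the analog output psi = w'x + w0."""
    return 1 if w @ x + w0 >= 0 else -1

# Both rules assign the same class to any input x (illustrative values):
w1, w2 = np.array([0.1, 0.0]), np.array([1.9, 0.1])
w, w0 = lvq_to_hyperplane(w1, w2)
x = np.array([1.2, -0.3])
assert lvq_class(x, w1, w2) == perceptron_class(x, w, w0)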

4.3. Radial Basis Functions Network (RBF)

The second ANN structure examined is the Radial Basis Function (RBF) network shown
in figure 6. Although this network has a feedforward structure, consisting of a single hidden layer and an output layer, it differs from the simple feedforward network because its hidden layer has centers rather than weights. The basic attribute
of this network is that all neurons in the hidden layer have locally tuned response charac-
teristics. These neurons are fully interconnected to a number of linear units in the output
layer (in this case there is only one output neuron). The purpose of RBF networks is to
transform a non-linearly separable classification problem into a linearly separable one.
Once an input vector is presented to the network, the hidden unit outputs are obtained
by calculating the closeness of the input vector X to the weight vector (center) of each
one of the hidden units. The function used to calculate the closeness is
Φ(ri) = exp(−ri² / (2σi²)),

Figure 6. Architecture of the RBF network.

where:
ri = ‖X − wi‖: the distance between X and wi;
wi : weight vector associated with neuron i in the hidden layer;
σi2 : variance associated with neuron i;
X: input vector.
The above function gives an appreciable value (close to 1) only when the distance
between the input vector X and the weight vector of neuron i, wi is small; otherwise
it gives a value very close to zero. In order to reach the final output of the network,
we have to multiply the output vector of the hidden layer by the corresponding weight
vector associated with the neuron in the output layer. This weight vector is computed
using the pseudoinverse method, and the given targeted values. Since the output neuron
is linear, the actual output of the network is
y = v0 + Σ_{i=1}^{H} vi Φ(ri),
where H is the number of neurons in the hidden layer.


The weight vectors, wi of the neurons in the hidden layer can be randomly chosen
directly from the training data, or selected in a self-organized way. In the self-organized
way, we apply the Kohonen SOFM to cluster the training data of one of the two classes
into H -classes. In this paper we used the self-organized way to assign the values wi .
The variance σi2 associated with neuron i is set equal to the variance of the training data
in cluster i.
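A minimal sketch of this construction, with the hidden-layer outputs and the pseudoinverse solution for the output weights, is given below in Python/NumPy; the function names are ours, and the centers and variances are assumed to have been obtained already (e.g., with the SOFM as described).

import numpy as np

def rbf_hidden(X, centers, variances):
    """Hidden-layer outputs Phi(r_i) = exp(-r_i^2 / (2 sigma_i^2)), plus a bias column for v0."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # squared distances r_i^2
    phi = np.exp(-d2 / (2.0 * variances))
    return np.hstack([np.ones((X.shape[0], 1)), phi])

def rbf_fit_output(X, targets, centers, variances):
    """Output weights [v0, v1, ..., vH] from the pseudoinverse of the hidden-layer outputs."""
    return np.linalg.pinv(rbf_hidden(X, centers, variances)) @ targets

def rbf_output(X, centers, variances, v):
    """Network output y = v0 + sum_i v_i Phi(r_i)."""
    return rbf_hidden(X, centers, variances) @ v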
The RBF model just described is a simple model. In order to improve its per-
formance we applied optimization techniques for training the network. The conjugate
gradient optimization algorithm is used for the optimization phase, where it seeks to
minimise the least-squares error function between the actual output and the corresponding targeted output, using the weight vectors wj and the weights of the output layer, v0 and vi, as variables. The initial values of the wj are those obtained above by using the SOFM and the values of the v0 and vi are those obtained by using the pseudoinverse method. The values of σi are kept fixed, as above.

Figure 7. Architecture of a single hidden layer feedforward network.

4.4. Feedforward network

Figure 7 shows the final ANN architecture applied in this study, which is based on a
simple feedforward network. The network used consists of three layers, the input layer,
the hidden layer with a number of hidden neurons, and the output layer with a single
neuron. The hidden layer uses the hyperbolic tangent sigmoid activation function fH (·),
while the output layer uses the log-sigmoid activation function f0 (·). (The hyperbolic
tangent sigmoid activation function can also be used in the output layer.) In this study,
the simple feedforward network is applied with two different training algorithms: (a) the
standard back-propagation algorithm, and (b) the conjugate gradient optimization algo-
rithm which minimises the least-squares error function plus a penalty term.
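For one input pattern, the forward pass of this architecture can be sketched as follows (Python/NumPy); the weight layout mirrors table 3, where the last column of W and the last entry of V hold the threshold weights, but the function itself is only an illustration.

import numpy as np

def forward(x, W, V):
    """Output of the single-hidden-layer network for one input x (sketch).

    W : (H, N+1) hidden-layer weights, last column = threshold weights.
    V : (H+1,)   output-layer weights, last entry = output threshold.
    """
    h = np.tanh(W @ np.append(x, 1.0))      # hyperbolic tangent sigmoid hidden layer
    psi = V @ np.append(h, 1.0)             # analog output of the single output neuron
    return 1.0 / (1.0 + np.exp(-psi))       # log-sigmoid output activation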

4.4.1. The Back-Propagation (BP) training algorithm


The BP algorithm attempts to minimize the least-squares error function (LSEF), defined as the sum of squared differences between the actual network outputs and the targeted outputs. In its general form it adapts the weight vector by adding to its present value a nonnegative linear combination of the negative of the gradient vector of the LSEF and the most recent weight vector change, with the learning rate λ and the momentum coefficient µ as multipliers. Although the BP algorithm is a simple learning algorithm
for training feedforward networks, unfortunately it is considered very inefficient and
unreliable. The algorithm can exhibit oscillatory behavior or can even diverge depending
on the values of the coefficients λ and µ used [9].
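In code, the weight update just described amounts to the following sketch; lam and mu are illustrative values, and the gradient of the LSEF is assumed to be computed elsewhere by the usual backward pass.

def bp_step(weights, grad, prev_delta, lam=0.1, mu=0.9):
    """One BP update: delta = -lam * grad + mu * prev_delta (nonnegative combination).

    weights, grad, prev_delta are arrays (or floats) of the same shape.
    Returns the updated weights and the weight change to reuse as momentum next time.
    """
    delta = -lam * grad + mu * prev_delta
    return weights + delta, delta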

4.4.2. The conjugate gradient optimization algorithm


This particular algorithm seeks to find the best weights for all neurons in the hidden and
the output layer by utilizing the conjugate gradient optimization algorithm (see [13]) to
minimise the penalty function Pf (see [26]).

Pf(W) = F(W) + P(W),
where
F(W) = 0.5 Σ_{i=1}^{s} (yi(W) − ti)²,
P(W) = e1 Σ_{l=1}^{N} Σ_{m=1}^{H} β(wlm)² / (1 + β(wlm)²) + e2 Σ_{l=1}^{N} Σ_{m=1}^{H} (wlm)²,

yi is the actual output of the network to input X(i),
ti is the targeted output to input X(i),
s denotes the number of feature vectors,
H is the number of hidden units,
e1, e2 are penalty parameters,
w^m is the N-dimensional weight vector, with components wlm, associated with the connection of the input layer and the mth hidden unit.
The function F(W) is the least-squares error function between the actual and the targeted outputs of the network and P(W) is the penalty term. The penalty term is designed to reduce the size of the weights (2nd term) and eliminate small weights (1st term), hence improving generalization irrespective of the optimization algorithm used to minimize Pf(W). The value of β controls the minimum magnitude of the weights for which each term in the summation of the 1st term is approximately equal to e1. The following parameter values were used in this work: e1 = 10^−1, e2 = 10^−4, β = 10.
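The penalty function can be evaluated directly from the hidden-layer weight matrix; the sketch below uses the parameter values quoted above and assumes the least-squares error F has been computed separately (the names are ours).

import numpy as np

def penalty_function(F, W_hidden, e1=1e-1, e2=1e-4, beta=10.0):
    """Pf(W) = F(W) + P(W) for the hidden-layer weights (sketch).

    W_hidden : (N, H) matrix of the weights w_lm between input l and hidden unit m.
    F        : least-squares error 0.5 * sum_i (y_i - t_i)^2, computed elsewhere.
    """
    w2 = W_hidden ** 2
    P = e1 * np.sum(beta * w2 / (1.0 + beta * w2)) + e2 * np.sum(w2)   # small-weight and weight-decay terms
    return F + P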
In order to ensure that the optimization part works better than that used by the BP and in order to improve generalization, we minimize the penalty function Pf(W) by using the conjugate gradient algorithm. The conjugate gradient algorithm is an improved
optimization algorithm that eliminates the arbitrary choice of the values of the learning
parameter λ and the momentum parameter µ used by BP. Furthermore, this algorithm
is based on a sound theoretical basis. With a fairly accurate line search algorithm, this
method is guaranteed to find a local minimum with a fast rate of convergence (for details
see [13]). Therefore, it is expected that this algorithm will produce better results than
the standard BP algorithm when employed for the training of a feedforward network by
minimizing function Pf(W). Charalambous [9] demonstrated by a number of examples
that the performance of the conjugate gradient training algorithm is superior to that of
the conventional BP training algorithm.

5. Empirical results

In this section we discuss (a) an overview of the data and (b) a comparison of the prediction results obtained by the logistic regression and by the ANN models.

(a) Overview of data

Figure 8 shows the trend of the seven significant variables used in all our models. Con-
sistent with our expectations, these trends reveal that there are major differences between
the two groups of companies (i.e., the bankrupt and healthy companies). As the year of
bankruptcy approaches, the medians of the WCFOM, OPNI2N and DER variables drop
substantially, whereas the medians of these variables for the healthy group remain stable
over the three-year period tested. Moreover, the median of the CLTA and DAR variables
for the bankrupt companies increases as the firm approaches bankruptcy, whereas it re-
mains relatively stable and at a lower level for the healthy companies. The median of the
UCFFOM variable also shows a significant difference between the two groups of firms.
These results show that as the firms get closer to the year of bankruptcy they are unable
to generate cash from their operations. Finally, as it was expected, the median of the
CHETA variable is lower for the bankrupt firms in all years examined.
Table 2 in the appendix shows the coefficients resulting from the application of
the logistic regression method on data taken over the period 1983–1991 for 96 matched-
pairs of bankrupt and non-bankrupt firms. These coefficients are then used to estimate
the probability of bankruptcy for a testing sample consisting of 43 matched-pairs of
bankrupt and non-bankrupt firms over the period 1992–1994.
Table 3 in the appendix shows the final weights of the training phase of the five
types of neural networks applied over the three years prior to bankruptcy. The symbol H denotes the number of neurons in the hidden layer for the feedforward and RBF networks, and the number of reference neurons for the LVQ network (the best prediction
results of the LVQ2 and LVQ3 are quoted). The ith row of matrix W represents the
weight vector wi associated with neuron i. For the feedforward network, the last column
of W and V correspond to threshold weights.

(b) Comparison of the prediction results

Table 1 presents training and prediction results obtained by applying logistic regression
and neural network methods. The results of the logistic regression method on the training
sample show that the model classifies correctly 82.3%, 74.5% and 69.8% of the total
sample, for the first, second and third year prior to bankruptcy, respectively. To validate
these prediction results we use an out-of-sample-period ex-ante test. The testing results
show that the logistic regression model classifies correctly 77.9%, 68.6% and 64%, one,
two and three years prior to bankruptcy respectively. Comparing the type I and type II
error rates of these models we observe that the type I error rates are much higher in all

Figure 8. Trends of the significant variables used in the models. Medians are presented for each variable
for all three years prior to bankruptcy.

Table 1
Empirical results.

Models                                   Year prior to bankruptcy   Training Overall   Testing Overall   Testing Type I   Testing Type II
Logistic regression 1 82.3% 77.9% 30.2% 14%
2 74.5% 68.6% 37.2% 25.6%
3 69.8% 64.0% 41.9% 30.2%
Feedforward network BP 1 71.4% 73.3% 30.2% 23.3%
2 64.6% 68.6% 39.5% 34.9%
3 68.8% 67.4% 30.2% 34.9%
Feedforward network (LSEF+PEN+CG) 1 87.0% 82.6% 11.6% 23.3%
2 78.1% 73.3% 27.9% 25.6%
3 74.5% 70.9% 27.9% 30.2%
Feedforward network LSEF+CG 1 89.6% 81.4% 14.0% 23.3%
2 79.2% 73.3% 21.0% 32.6%
3 74.5% 72.1% 18.6% 37.2%
Kohonen SOFM with LVQ 1 83.3% 80.2% 20.9% 18.6%
2 75.0% 69.8% 32.6% 27.9%
3 68.2% 69.8% 25.6% 34.9%
RBF network 1 81.3% 79.1% 20.9% 20.9%
2 74.5% 73.3% 25.6% 27.9%
3 72.9% 69.8% 27.9% 32.6%
LVQ with 2 neurons 1 83.9% 79.1% 25.6% 16.3%
2 75.0% 69.8% 32.6% 27.9%
3 68.2% 69.8% 25.6% 34.9%
This table shows the empirical findings resulting from the application of the logistic regression and the
ANN methods. The training column represents the overall prediction results of the training phase using
data over the period 1983–1991, while the testing column represents the overall prediction results, as
well as the type I and type II errors of the testing phase using data over the period 1992–1994. Results are
presented for all three years prior to bankruptcy.

years tested.1 These results are not so encouraging since evidence shows that the type I
error rates could be 35 times more costly than the type II error rates [2].
Since the major objective of our study is to compare the predictive performance
of various ANN methods with the classification accuracy of the logistic approach, we
also apply the ANN methods described in the previous section. The first three models
applied are based on the feedforward structure. The first model uses the common BP learn-
ing algorithm while the second and the third models make use of the conjugate gradient
optimization algorithm, which minimises the LSEF with and without the penalty term.
All three networks consist of two layers of weights and have the same number of hidden
neurons in their hidden layer. The number of neurons in the hidden layer is based on the
1 Type I error is the misclassification of a bankrupt firm as healthy and type II error is the misclassification
of a healthy firm as bankrupt.

performance of the second feedforward network. Furthermore, in order to improve the classification accuracy of the models we use a different feedforward network for each
year prior to bankruptcy. More specifically, the hidden layer consists of two neurons for
the first and second year prior to failure, and it consists of three neurons for the third
year prior to bankruptcy.2 The prediction results of the feedforward networks trained
by the conjugate gradient algorithm outperform the results of the network trained by the
standard BP in all years tested. More specifically, the feedforward network using the
BP achieves 73.3%, 68.6% and 67.4% overall correct prediction for the first, second and
third year prior to failure, respectively (see table 1). On the other hand, the network
trained by the conjugate gradient algorithm by minimizing the LSEF with the penalty
term (without the penalty term) achieved 82.6%, 73.3%, 70.9% (81.4%, 73.3%, 72.1%)
overall correct prediction for the first, second and third year prior to failure, respectively.
However, it should be noted that the models using the conjugate gradient algorithm not
only achieve higher overall prediction results, but they also produce much lower type I
error rates than the model using the BP algorithm. It should be also noted that, at least
for the problem considered in this paper, the improvement gain over the common BP is
mainly due to the application of the conjugate gradient algorithm.
Another neural network method examined in the present study is the Kohonen
SOFM model using the LVQ algorithms for enhanced learning and improved classifi-
cation. This model uses a different number of neurons each year prior to bankruptcy.
Hence, the map consists of two reference vectors for each class in the first year prior
to bankruptcy and one reference vector for each class in the second and third year prior
to bankruptcy. Table 1 presents the best prediction results of the LVQ2 and the LVQ3
methods. After fifty epochs these models produced an overall correct classification accu-
racy of 80.2%, 69.8% and 69.8% for the first, second and third year prior to bankruptcy,
respectively. This ANN method also results in lower type I error rates than the rates
obtained from the feedforward network method trained by the BP algorithm and from
the logistic regression method.
Moreover, the RBF network was applied using the Kohonen SOFM to find the
initial positions of the centers. In order to improve the prediction accuracy of the models
we use a different number of neurons each year. Two neurons are used for the first year
and six neurons for the second and the third year prior to bankruptcy. After a maximum
of forty epochs using the conjugate gradient algorithm to minimise the LSEF (without
the penalty term), the model produced an overall prediction accuracy of 79.1%, 73.3%
and 69.8% for the first, second and third year prior to failure, respectively. Again, these
prediction results are superior to those results obtained from both the logistic regression
method and from the simple feedforward method using the BP algorithm.
Finally, table 1 also presents the results obtained from the LVQ method using one
reference vector for each class. As discussed in the previous section, the decision
2 The feedforward network using the BP algorithm was also applied with seven and ten neurons on data
from each year prior to bankruptcy. The best results obtained on the testing data were 79.1% for the first
year (10 neurons), 68.6% for the second year (7 neurons), and 58.1% for the third year (10 neurons) prior
to bankruptcy.

surface between the two classes is a single hyperplane. Comparing these results with
the results given by the logistic regression method we observe that for this problem
the results given by the decision hyperplane of the LVQ model are superior to those
results given by the decision hyperplane of the logistic regression. Hence, if we are interested in a single hyperplane to classify our data, it is worthwhile to use the LVQ method with one reference vector for each class in addition to the logistic regression method.

6. Conclusions

This study provides new evidence on the predictive ability of three contemporary Neural
Network (NN) methods in predicting bankruptcy. Specifically, our results indicate that
the three contemporary NN methods applied in the present study provide superior re-
sults to those obtained from the logistic regression method and from the BP algorithm.
The results of this study also encourage further research that may improve our under-
standing of the usefulness of these contemporary NN methods to other business issues.
Since the predictive performance of these NN methods may depend on the characteris-
tics of the dataset and on the complexity of the issue under examination, researchers
are encouraged to apply these NN methods to other business and non-business is-
sues in order to determine whether these methods indeed provide superior results to those obtained from the most common statistical and NN methods applied thus far.

Appendix

Table 2
Logistic regression models.

Years prior to bankruptcy   Constant   CHETA   CLTA   DAR   DER   OPNI2N   UCFFOM   WCFOM
1 −2.1571 −7.5201 3.3017 0.2619 1.8878 1.7422 −0.017 −0.8003
(0.0001) (0.0076) (0.0014) (0.0388) (0.0165) (0.0002) (0.0544) (0.0056)
2 −1.9448 −0.9192 3.4995 0.0499 1.4871 1.3414 −0.0045 −0.6698
(0.0003) (0.4934) (0.0003) (0.5536) (0.0858) (0.0023) (0.6574) (0.0506)
3 −2.142 0.3761 3.5297 0.2625 2.1247 2.0053 −0.0133 −0.2581
(0.7243) (0.2914) (0.0147) (0.0003) (0.2655) (0.6334)
The dependent variable takes the value of 1 if the firm belongs in the bankrupt group and it takes the value of
0 if the firm belongs in the non-bankrupt group; CHETA: Cash and equivalents/Total assets; CLTA: Current
liabilities/Total assets; DAR: Change in accounts receivables; DER: (Debt due in one year + Long term
debt)/Total assets; OPN12N: Dummy for operating income, 1 if negative for the last two years, 0 otherwise;
UCFFOM: Change in cash flow from operations/Market value of equity; WCFOM: Working capital from
operations/Market value of equity at fiscal year end.

Table 3
Neural network models.
One year prior to bankruptcy
Panel A1: Feedforward network – BP, H = 2
W = [  1.1378    6.035     0.4225    5.6004   10.0373  −69.0788  −25.447   10.2213
       0.4319   −1.1075    0.0573   −1.0423   −2.414    26.2243    7.6243   −2.1903 ]
V = [ 0.4911  −0.6286   0.2291 ]

Panel B1: Feedforward network – (LSEF+PEN+CG), H = 2
W = [ −1.8927   0.6696   0.8902   1.4811   0.1404   0.6403  −5.4501   0.0304
      −0.8818  −0.1251   0.7115  −1.0603  −2.9015  −2.4670  −2.3493   1.2592 ]
V = [ 5.3332   3.6595   2.5470 ]

Panel C1: Feedforward network – (LSEF+CG), H = 2
W = [ −1.0651   0.6937   1.0149   0.6666  −0.2104   0.4248  −3.6631  −0.1501
      −0.8713  −0.7776  −0.3728  −0.4014   2.4054  −2.3387  −1.2266   0.5088 ]
V = [ 3.2120   1.9415   1.7375 ]

PanelD1: LVQ H = 4 (2 neurons for each class) 


−0.3337 0.5692 0.3717 0.2359 0.5626 −0.4619 −0.3882
 −0.2948 0.9144 −0.0417 0.7625 0.8896 −0.4498 −0.6868 
W =  0.9355 −0.9349 −0.3837 −0.3083

−0.7402 0.4992 0.3215 
0.6008 −0.7945 −0.3682 −0.4376 −0.9681 0.8216 0.5317

Panel E1: RBF network, H = 2
W = [ −6.0284   9.0388   2.8713   6.6862  10.8605  −5.3700  −7.1086
       1.6729  −1.4228   0.0322  −0.4301  −1.6995  −1.0753   3.1556 ]
V = [ 4.2264  −4.5051   0.2403 ]


Two years prior to bankruptcy
Panel A2: Feedforward network – BP, H = 2
W = [ −0.9472   0.4568   0.0153   0.2079   0.9137  −0.0025   0.0008  −0.0610
      −0.9129   0.6657   0.0209   0.3435  −0.4679  −0.0015   0.0042  −0.3602 ]
V = [ 2.3203  −1.5673  −0.4738 ]

Panel B2: Feedforward network – (LSEF+PEN+CG), H = 2
W = [  0.4265  −0.2505  −1.3732  −0.1375  −2.2110   0.6281   4.6694  −1.6069
      −1.0015  −0.0135  −0.0106   0.8958   0.0351   0.3540   0.3133   0.0055 ]
V = [ −2.6912   0.7041   0.1428 ]

Panel C2: Feedforward network – (LSEF+CG), H = 2
W = [  0.4459  −0.2638   0.6065  −0.6256  −2.1943  −0.4012   4.2731  −1.7907
       0.2655   0.5412   1.4057  −0.0675   0.3286  −2.8280  −2.0322  −1.3579 ]
V = [ −1.7905   2.3249   1.2258 ]



Table 3
(continued).
Panel D2: LVQ, H = 2 (1 neuron in each class)
W = [ −0.3616   1.5204   0.3626   0.9658   1.6687  −0.9212  −0.9856
       0.9059  −1.6916  −0.8440  −1.3930  −1.6048   0.7174   1.0205 ]

PanelE2: RBF network H = 6 


0.5491 0.0933 −0.2754 0.5073 0.1703 3.2189 0.1299
 −0.6229 2.4257 0.0938 1.8564 1.5911 −0.3132 −1.9662 
 
 0.1068 −0.5187 5.7912 0.9618 0.5884 −0.8149 0.1660 
W =  1.0077 −1.2296

 0.3284 −0.2106 −0.3967 0.8109 3.1105 

 −0.8129 0.2631 −0.4201 2.0725 2.2448 0.0883 12.2349 
−0.2778 −0.3915 −0.5226 2.7688 2.0541 0.1313 −0.0734

V = [0.0793 1.0007 0.9312 − 4.2426 2.2369 − 0.0665 − 1.6174]

Three years prior to bankruptcy


PanelA3: Feedforward network – BP H =3 
−0.7270 0.3792 0.0689 −0.0019 −1.1852 0.8447 0.0755 1.5048
W =  −0.3440 −0.0648 −0.2545 −0.6848 −0.3843 −0.2574 0.0060 0.1898 
0.7429 −0.6306 −0.9100 −0.5446 −1.3823 0.8954 0.2596 −1.6827

V = [−0.8188 − 1.1062 − 1.1074 − 0.1888]

PanelB3: Feedforward network – (LSEF+PEN +CG) H = 3 


−0.1967 −0.6042 −0.3799 −1.0167 −0.9688 1.8338 0.6458 0.1237
W =  −0.7895 1.0564 1.0607 0.6921 1.3755 −0.0276 −1.2904 −0.4315 
−0.0201 1.1906 1.7053 2.5600 3.2045 −0.0235 −1.9127 −0.7539

V = [−4.5711 − 0.4560 − 1.9622 0.0439]

PanelC3: Feedforward network –(LSEF+CG) H =3 


−0.7125 −0.1782 0.2156 −0.5470 2.0544 0.4692 −1.6413 −0.2063
W =  −0.6069 −1.4670 0.1097 −3.8194 0.8800 3.3380 −0.8444 1.2036 
1.3347 2.2249 2.0288 −2.4566 0.2956 −1.4395 −4.3841 −1.6313

V = [0.8597 − 1.2520 1.2879 1.5396]

Panel D3: LVQ, H = 2 (1 neuron in each class)
W = [  0.1626   1.4887   0.8099   1.0113   2.1671  −0.6303  −1.5203
       0.6731  −2.4469  −1.3860  −1.5096  −1.4135   1.3178   0.1943 ]

PanelE3: RBF network H = 6 


0.4701 3.1690 8.3139 −0.4060 0.7337 −0.2029 0.6064
 2.1673 −0.0191 −0.4199 2.0093 4.5180 0.7420 −3.8664 
 
 0.5702 0.6723 −0.1506 0.3015 2.3727 −0.2148 −0.1712 
W =  0.8561 −2.1416 −1.7890

 −0.9070 0.4528 1.9088 0.3285 

 0.1254 −1.1884 0.3241 −0.8669 −1.5368 −0.3077 −0.9046 
−0.6828 −1.1513 −0.4106 0.9076 1.8634 3.2214 −1.2427

V = [1.7243 1.7332 0.2569 − 3.2426 − 0.7570 − 0.0646 − 0.9124]



Acknowledgements

We gratefully acknowledge the helpful comments and suggestions of P. Hadjicostas,


G. Hadjinicolas, A. Soteriou, L. Trigeorgis, A. Kyprianou and workshop participants at
the 1998 APMOD conference. This project was partially financially supported by the
University of Cyprus. Remaining errors are the responsibility of the authors.

References

[1] E. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, Journal
of Finance XXIII (September 1968).
[2] E. Altman, R. Haldeman and P. Narayanan, Zeta analysis, Journal of Banking and Finance (June
1977).
[3] E. Altman, G. Marco and F. Varetto, Corporate distress diagnosis: Comparisons using linear dis-
criminant analysis and neural networks (the Italian experience), Journal of Banking and Finance 18
(1994).
[4] R. Barniv, A. Agarwal and R. Leach, Predicting the outcome following bankruptcy filing: A three-
state classification using neural networks, International Journal of Intelligent Systems in Accounting,
Finance and Management 6 (1997).
[5] J.E. Boritz, The “Going Concern” assumption: Accounting and auditing implications, Research Re-
port, CICA (1991).
[6] J.E. Boritz, D.B. Kennedy and A. de Miranda e Albuquerque, Predicting corporate failure using a
neural network approach, International Journal of Intelligent Systems in Accounting, Finance and
Management 14 (1995).
[7] P.L. Brockett, W.W. Cooper, L.L. Golden and U. Pitaktong, A neural network model for obtaining an
early warning of insurer insolvency, Journal of Risk and Insurance 61(3) (September 1994).
[8] D.S. Broomhead and D. Lowe, Multivariate functional interpolation and adaptive networks, Complex
Systems 2 (1988) 321–355.
[9] C. Charalambous, Conjugate gradient algorithm for efficient training of artificial neural networks,
IEEE Proceedings 139(3) (June 1992).
[10] A. Charitou and C. Charalambous, The prediction of earnings using financial statement information:
Empirical evidence with logit models and artificial neural networks, International Journal of Intelli-
gent Systems in Accounting, Finance and Management 5 (1996).
[11] P. Coats and F.L. Fant, Recognizing financial distress patterns using a neural network tool, Financial
Management (Autumn 1993).
[12] K. Fanning and K. Cogger, A comparative analysis of artificial neural networks using financial distress
prediction, International Journal of Intelligent Systems in Accounting, Finance and Management 3
(1994).
[13] R. Fletcher, Practical Optimization (Wiley, 1980).
[14] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. (Prentice-Hall, 1999).
[15] C.S. Huang, R.E. Dorsey and M.A. Boose, Life insurer financial distress prediction: A neural network
model, Journal of Insurance Regulation 13(2) (1995).
[16] F. Jones, Current techniques in bankruptcy prediction, Journal of Accounting Literature 6 (1987).
[17] T. Kohonen, The self-organizing map, Proc. IEEE 78(9) (September 1990).
[18] M. Leshno and Y. Spector, Neural network prediction analysis: The bankruptcy case, Neurocomputing
10 (1996).
[19] M. Odom and R. Sharda, A neural network model for bankruptcy prediction, in: Proc. IEEE Interna-
tional Conference on Neural Networks (San Diego, CA, 1990).

[20] J. Ohlson, Financial ratios and the probabilistic prediction of bankruptcy, Journal of Accounting Re-
search 18(1) (Spring 1980).
[21] W. Ragupathi, L.L. Schkade and B.S. Raju, A neural network approach to bankruptcy prediction, in:
Proc. IEEE 24th Annual Hawaii International Conference on Systems Science (1991).
[22] E. Rahimian, S. Singh, T. Thammachote and R. Virmani, Bankruptcy prediction by neural network,
in: Neural Networks in Finance and Investing, eds. R.R. Trippi and E. Turban (Probus, Chicago,
1992).
[23] D. Rumelhart, G. Hinton and R. Williams, Learning internal representations by error propagation, in:
Parallel Distributed Processing, Vol. 1, eds. D. Rumelhart and J. McClelland (MIT Press, 1986).
[24] L. Salchenberger, E. Cinar and N. Lash, Neural networks: A new tool for predicting thrift failures,
Decision Sciences 23 (1992).
[25] J. Scott, The probability of bankruptcy: A comparison of empirical predictions and theoretical models,
Journal of Banking and Finance 5 (1981).
[26] R. Setiono and H. Liu, Neural-network feature selector, IEEE Transactions on Neural Networks 8(3)
(1997) 654–661.
[27] Y. Suh and J. Kim, Current artificial neural network models for bankruptcy prediction, Journal of
Accounting & Business Research 4 (1996).
[28] T.S. Suan and K.H. Chye, Neural network applications in accounting and business, Accounting and
Business Review 4(2) (July 1997).
[29] K.Y. Tam and M.Y. Kiang, Managerial applications of neural networks: The case of bank failure
predictions, Management Science 28 (1992).
[30] K.Y. Tam and M.Y. Kiang, Predicting bank failures: A neural network approach, Applied Artificial
Intelligence 4 (1990).
[31] D. Trigueiros and R. Taffler, Neural networks and empirical research in accounting, Accounting and
Business Research 26(4) (1996).
[32] R. Wilson and R. Sharda, Bankruptcy prediction using neural networks, Decision Support Systems 11
(1994).
[33] C. Zavgren, The prediction of corporate failure: The state of the art, Journal of Accounting Literature
2 (1983).
