
Expert Systems With Applications, Vol. 9, No. 4, pp. 503-512, 1995
Copyright © 1995 Elsevier Science Ltd
Printed in the USA. All rights reserved
0957-4174/95 $9.50 + .00
0957-4174(95)00020-8

Effectiveness of Neural Network Types for Prediction of Business Failure

J. EFRIM BORITZ AND DUANE B. KENNEDY

School of Accountancy, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1

Abstract--The study examines the effectiveness of different neural networks in predicting bankruptcy filing. Two approaches for training neural networks, Back-Propagation and Optimal Estimation Theory, are considered. Within the back-propagation training method, four different models (Back-Propagation, Functional Link Back-Propagation With Sines, Pruned Back-Propagation, and Cumulative Predictive Back-Propagation) are tested. The neural networks are compared against traditional bankruptcy prediction techniques such as discriminant analysis, logit, and probit. The results show that the level of Type I and Type II errors varies greatly across techniques. The Optimal Estimation Theory neural network has the lowest level of Type I error and the highest level of Type II error while the traditional statistical techniques have the reverse relationship (i.e., high Type I error and low Type II error). The back-propagation neural networks have intermediate levels of Type I and Type II error. We demonstrate that the performance of the neural networks tested is sensitive to the choice of variables selected and that the networks cannot be relied upon to "sift through" variables and focus on the most important variables (network performance based on the combined set of Ohlson and Altman data was frequently worse than their performance with one of the subsets). It is also important to note that the results are quite sensitive to sampling error. The significant variations across replications for some of the models indicate the sensitivity of the models to variations in the data.

1. INTRODUCTION

THE ABILITY TO accurately predict business failure is a concern to anyone who relies on a particular business for income, either employment or investment, goods, or services. In the past several years, a number of artificial neural network models have been developed to predict corporate bankruptcy. The studies using neural networks for bankruptcy prediction have shown both promise (Salchenberger, Cinar, & Lash, 1992; Tam & Kiang, 1992; Fanning & Cogger, 1994) and disappointment (Bell, Ribar, & Verchio, 1990).

Most studies in this area have focussed on the effectiveness of back-propagation networks relative to benchmarks represented by conventional statistical techniques such as discriminant analysis, logit and probit. However, these studies have not reported the sensitivity of model accuracy to variations in the variables selected, sampling methods, network type, and network architecture. Thus, we have a limited understanding about the degree to which the results of studies of the application of neural networks to prediction of business failure are constrained by the choices of input variables, models and model structure. Like other techniques, neural network approaches can be significantly influenced by such choices. For example, in a previous study (Boritz, Kennedy, & Albuquerque, 1995) we have demonstrated the sensitivity of neural network performance (i.e., predictive accuracy) to the choice of input variables and sampling proportions in the training sample.

The purpose of this study is to extend our previous work by examining the effect of neural network type on bankruptcy prediction accuracy. Two approaches for training neural networks, Back-Propagation and Optimal Estimation Theory, are considered. Within the back-propagation training method, four different models (Back-Propagation, Functional Link Back-Propagation With Sines, Pruned Back-Propagation, and Cumulative Predictive Back-Propagation) are tested. The neural networks are compared against traditional prediction techniques such as discriminant analysis, logit, and probit which serve as benchmarks for evaluating the relative effectiveness of the neural network approach. Three sets of variables are considered in order to determine if the results are sensitive to the choice of variables.

Requests for reprints should be sent to Duane Kennedy, School of Accountancy, Faculty of Arts, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada.


The analysis is restricted to a single neural network architecture consisting of one layer of hidden nodes with the number of hidden nodes set equal to the number of input variables.

The remainder of this paper is divided into five sections. The next section contains a general description of neural networks and then describes the networks examined in this paper. The following section reviews previous research on the prediction of bankruptcy filing. The fourth section describes the research design. The fifth section presents results of the analysis and the final section contains concluding comments.

2. NEURAL NETWORKS

Artificial neural networks consist of nodes and weights for the degree of influence which one node exerts on other nodes. The nodes are arranged in a pattern with a series of input nodes forming the base layer, one or more output nodes at the top layer, and one or more hidden layers of nodes between the input and output layers. The input values for a node can be either excitatory, if the weight is positive, or inhibitory, if the weight is negative. The output of a node is equal to the weighted sum of its input values transformed by a transfer function. Transfer functions can be linear or non-linear. A commonly used function is the sigmoid function 1/(1 + e^(-z)), which has upper and lower limits such that input value changes outside of these limits do not affect the output of the node.
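
To make the node computation concrete, here is a minimal Python sketch of a single node's output under the description above; the function and variable names are ours, not part of the paper.

    import math

    def node_output(inputs, weights):
        """Weighted sum of the input values, passed through the sigmoid."""
        z = sum(x * w for x, w in zip(inputs, weights))
        return 1.0 / (1.0 + math.exp(-z))   # flattens out for large |z|

    # Positive weights are excitatory, negative weights inhibitory.
    print(node_output([0.5, -0.2, 0.8], [1.5, -0.7, 2.0]))
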
Neural network architectures can be either feedback or feedforward. In a feedforward network, inputs are only received from previous layers. Hence the information always flows in a forward direction, from the input nodes, through the hidden layers of nodes, to the output nodes. The back-propagation paradigm, a common training method for neural networks, is a feedforward network. In a feedback network, nodes can receive inputs from nodes in any other layer in the network (including the same layer); the direction of information flow is not restricted.

Neural network operation involves two phases, learning and testing. In the learning phase, the network is taught to solve certain problems or identify specific patterns based on given information. Two types of learning are available for network operation, supervized and unsupervized learning. In unsupervized learning, the network is presented with a set of facts; however, no response is known by the network. The network processes the input data and formulates an algorithm to classify the presented input. This type of learning can be viewed as a multivariate technique to explore the underlying structure of the data. In the supervized learning mode, the network is presented with a set of facts along with the correct response. The network uses the information and correct response to formulate connection weights for each node in the network. Once the network can correctly classify an acceptable level of the presented facts, it can be used to classify new data. During the testing phase, the neural network's usefulness is measured by its ability to correctly classify previously unseen data.

One of the most common training methods for neural networks is the Back-Propagation paradigm. Back-propagation networks learn by modifying the input connection weights until the difference (error) between the output and target is minimized. These networks use the generalized delta rule for propagating errors through the network. This rule places responsibility for the error on every node in the network. The weights are first modified for the output node. Then, the error is propagated backward to all nodes pointing to the output units in the preceding layer. These nodes, in turn, propagate the error backward to nodes that point to them, and so on, until the input level nodes are reached (Jones & Hoskins, 1987). The standard back-propagation method for training neural networks is examined in this study as well as three variations of the back-propagation method (which are briefly described in the following subsections). In addition, we examine a neural network based on Optimal Estimation Theory. In all cases, the networks are used to find a relationship between the input variables and a dichotomous output variable which has values of 1 for bankrupt companies and 0 for non-bankrupt companies.
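
The error propagation just described can be sketched for a one-hidden-layer network as follows. This is our illustration of the textbook generalized delta rule (sigmoid transfer functions, arbitrary learning rate and initialization), not the NeuralWorks implementation used in the study.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_step(x, target, W_hid, W_out, lr=0.1):
        """One generalized-delta-rule update for a one-hidden-layer network."""
        h = sigmoid(W_hid @ x)                      # hidden-layer activations
        y = sigmoid(W_out @ h)                      # network output
        # Error at the output node, scaled by the sigmoid derivative.
        delta_out = (target - y) * y * (1.0 - y)
        # Propagate the error backward to the hidden nodes feeding the output.
        delta_hid = (W_out.T @ delta_out) * h * (1.0 - h)
        # Adjust weights layer by layer, output first.
        W_out += lr * np.outer(delta_out, h)
        W_hid += lr * np.outer(delta_hid, x)
        return float(y[0])

    # Repeated presentation of one observation (three inputs, target 1 = bankrupt).
    rng = np.random.default_rng(0)
    W_hid, W_out = rng.normal(size=(3, 3)) * 0.5, rng.normal(size=(1, 3)) * 0.5
    for _ in range(1000):
        train_step(np.array([0.2, 0.7, 0.1]), 1.0, W_hid, W_out)
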
2.1. Functional Link Back-Propagation With Sines

This network uses the standard back-propagation algorithm to adjust weights; however, it introduces additional nodes in the input layer to help increase the learning rate. The input variables are transformed by two functions. Pair-wise interaction terms are added between all input variables and then the sine function is used to functionally expand the data.¹ The overall effect is to map the input data items into a larger pattern space, thereby enhancing the representation and increasing the prediction accuracy. This results in a large number of nodes in the Functional Link Back-Propagation With Sines networks; hence, learning by this network takes a great deal of time. The Functional Link Back-Propagation With Sines networks typically took approximately twice the time to train compared to the other back-propagation networks.

¹ In theory, higher order interaction terms such as second or third order can be constructed. NeuralWorks Professional II, the software used to construct the back-propagation neural networks in this study, automatically constructs the pair-wise interaction terms. The software also allows for multiple levels of sine functions for each input variable; for example, three levels would consist of the sin(πX_i), sin(2πX_i), and sin(3πX_i) functions where X_i are the input variables. Pilot testing by Boucher (1990) found that the predictive accuracy was higher with one layer of sine transformations compared to the accuracy of a model that contained only the pair-wise interaction terms.

2.2. Pruned Back-Propagation

This network disables all connections that have little impact on the output of the network (i.e., low weight values). It is based on a concept introduced by David Rumelhart of Stanford University, where both the network complexity and error are minimized as the network learns. Rumelhart observed that generalization ability declines with increased complexity of neural networks. Too many nodes produce a neural network that memorizes the input data and lacks the ability to generalize (often referred to as overfitting). As a result, increases in the network's ability to classify companies in the learning set are linked with a decline in its ability to classify companies in the testing set (Salchenberger et al., 1992).
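
A sketch of the pruning idea; the paper does not give the exact criterion used by the software, so a simple absolute-magnitude threshold is assumed here:

    import numpy as np

    def prune_weights(W, threshold=0.05):
        """Disable (zero) connections whose weights have little impact on
        the output, i.e. small absolute magnitude. Threshold is assumed."""
        W = W.copy()
        W[np.abs(W) < threshold] = 0.0
        return W

    W = np.array([[0.80, -0.01], [0.03, -1.20]])
    print(prune_weights(W))   # the two near-zero weights are disabled
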
2.3. Predictive Cumulative Back-Propagation

This is a standard back-propagation network with the input and output layers using a linear transfer function, and the hidden layer using a sigmoid transfer function. This network uses the cumulative generalized delta rule to speed up neural network training. The cumulative generalized delta rule updates the weights after two data items are presented to the network instead of after each item. The weights are accumulated in another variable until both data items are presented, at which time the actual node weights are updated. This approach can decrease learning time up to 50%.
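
The cumulative update can be sketched as follows; `gradient` stands for an assumed helper that returns the per-weight changes for one observation, and the two-item batching follows the description above:

    def train_cumulative(data, weights, gradient, lr=0.1, batch=2):
        """Cumulative generalized delta rule: weight changes for `batch`
        items are accumulated in a separate variable and applied together."""
        accum = [0.0] * len(weights)
        for i, (x, target) in enumerate(data, start=1):
            g = gradient(weights, x, target)   # assumed helper: per-weight deltas
            accum = [a + lr * gi for a, gi in zip(accum, g)]
            if i % batch == 0:                 # update only after two items
                weights = [w + a for w, a in zip(weights, accum)]
                accum = [0.0] * len(weights)
        return weights
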
2.4. Optimal Estimation Theory Neural Network

Shepanski (1988) introduced the Optimal Estimation Theory paradigm as an alternative approach to training a feedforward neural network. It generates a set of least squares estimators for the interconnection weights, allowing the optimal set of weights to be determined in a single pass rather than requiring the training data set be presented to the network many times as in back-propagation. One advantage of the Optimal Estimation Theory approach is significant reductions in the time required to train a neural network compared to the back-propagation paradigm. A limitation of the basic Optimal Estimation Theory method is the necessity of having the same number of neurons in the hidden and input layers. To overcome this limitation, the model can be modified by using a nonlinear transformation of the input variables (Pao, 1989). The enhanced variables are constructed by finding the k least correlated pairs of variables, taking the product of one variable (in the pair) by the cosine of the other variable, and using these products as additional inputs to the neural network. Multiple values for k are considered in our analysis.
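
A sketch of the enhancement step as described in the text; the correlation measure (Pearson) and the variable names are our assumptions:

    import numpy as np
    from itertools import combinations

    def enhance(X, k):
        """Append k enhanced variables: for each of the k least correlated
        pairs (i, j), add the product of one variable and the cosine of
        the other."""
        corr = np.corrcoef(X, rowvar=False)
        pairs = sorted(combinations(range(X.shape[1]), 2),
                       key=lambda p: abs(corr[p[0], p[1]]))  # least correlated first
        extras = [X[:, i] * np.cos(X[:, j]) for i, j in pairs[:k]]
        return np.column_stack([X] + extras)

    X = np.random.rand(100, 9)    # e.g., the nine Ohlson variables
    print(enhance(X, 10).shape)   # (100, 19)
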
"healthy," while the company is classified as bankrupt
for lower Z values.
3. PREVIOUS RESEARCH
Using the frequency of Type I and Type II errors as a
Bankruptcy prediction, through classification of known basis for evaluation, Airman's model typically has good
cases and generalization to other cases, has been a predictive power for one year before bankruptcy but the

TABLE 1
Definition of Variables

Panel A: Altman

Variables:
WCTA = working capital/total assets
RETA = retained earnings/total assets
EBITTA = earnings before interest and taxes/total assets
MEDEBT = market value equity/book value of total debt
SALETA = sales/total assets

Model:
Z = 0.012 WCTA + 0.014 RETA + 0.033 EBITTA + 0.006 MEDEBT + 0.999 SALETA

Panel B: Ohlson

Variables:
SIZE = log(total assets/GNP price-level index)
TLTA = total liabilities/total assets
WCTA = working capital/total assets
CLCA = current liabilities/current assets
OENEG = 1 if total liabilities exceed total assets, 0 otherwise
NITA = net income/total assets
FUTL = funds provided by operations/total liabilities
INTWO = 1 if net income was negative for the last two years, 0 otherwise
CHIN = change in net income = (NI_t - NI_{t-1})/(|NI_t| + |NI_{t-1}|)
NYSE, AMSE = dummy variables for exchange listing

9 Variable Model:
y = -1.32 - 0.407 SIZE + 6.03 TLTA - 1.43 WCTA + 0.0757 CLCA - 2.37 NITA
    - 1.83 FUTL + 0.285 INTWO - 1.72 OENEG - 0.521 CHIN

11 Variable Model:
y = -2.63 - 0.267 SIZE + 5.63 TLTA - 1.43 WCTA + 0.0585 CLCA - 2.35 NITA
    - 1.99 FUTL + 0.307 INTWO - 1.56 OENEG - 0.5092 CHIN - 0.854 NYSE
    - 0.0513 AMSE
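
Applying Panel A is a one-line computation. The sketch below uses the Panel A coefficients and the 2.675 cut-off from the text, and assumes Altman's (1968) original scaling, under which the first four ratios enter as percentages and SALETA as a plain ratio:

    def altman_z(wcta, reta, ebitta, medebt, saleta):
        """Altman (1968) Z score; the first four ratios enter as
        percentages (e.g., 10% -> 10.0) and SALETA as a plain ratio."""
        return (0.012 * wcta + 0.014 * reta + 0.033 * ebitta
                + 0.006 * medebt + 0.999 * saleta)

    def classify_z(z, cutoff=2.675):
        return "healthy" if z > cutoff else "bankrupt"

    z = altman_z(wcta=20.0, reta=30.0, ebitta=10.0, medebt=150.0, saleta=1.9)
    print(round(z, 3), classify_z(z))   # 3.788 healthy
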

3.2. Ohlson's Model

Ohlson (1980) introduced the logistic regression technique to estimate the probability of failing/non-failing for a company. This approach potentially can give a decision maker a better sense of the distribution of financial risk than a model based on discriminant analysis. The objective is to determine the conditional probability of an observation belonging to a class, given values of the independent variables for the observation. Logit, and the related probit technique, make use of a cumulative probability function to constrain the predicted value of the dependent variable within the interval (0, 1). The two models are similar, except that the logit model uses the cumulative logistic function and probit uses the cumulative normal distribution.

Ohlson's variables and models are shown in Panel B of Table 1. Two models are shown, the primary model using nine variables and a second model which is extended to eleven variables by adding dummy variables for exchange listing. The cumulative logistic function is used to transform the y value into a bankruptcy probability. For probabilities greater than 0.50, the company is classified as bankrupt, while the company is classified as healthy for lower probabilities.
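
A sketch of the 9-variable model applied to one company, using the Panel B coefficients and the cumulative logistic transform; the example input values are ours and purely illustrative:

    import math

    COEF = {"SIZE": -0.407, "TLTA": 6.03, "WCTA": -1.43, "CLCA": 0.0757,
            "NITA": -2.37, "FUTL": -1.83, "INTWO": 0.285, "OENEG": -1.72,
            "CHIN": -0.521}

    def ohlson_probability(ratios):
        """Cumulative logistic transform of the 9-variable Ohlson y score."""
        y = -1.32 + sum(coef * ratios[name] for name, coef in COEF.items())
        return 1.0 / (1.0 + math.exp(-y))   # probability of bankruptcy

    # A leveraged, marginally profitable firm (illustrative values only).
    p = ohlson_probability({"SIZE": 3.7, "TLTA": 0.6, "WCTA": 0.1, "CLCA": 0.9,
                            "NITA": 0.02, "FUTL": 0.05, "INTWO": 0, "OENEG": 0,
                            "CHIN": -0.1})
    print(round(p, 2), "bankrupt" if p > 0.50 else "healthy")
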
Boritz et al. (1995) tested the effect of various proportions of bankrupt firms in the learning and testing populations on the predictive ability of conventional techniques and neural networks; samples of bankrupt firms ranging from 1% to 50% were created for both learning and testing populations. Boritz et al. also showed that when the techniques were trained on a sample containing 1% bankrupt firms (which approximates the population proportion), all techniques except quadratic discriminant analysis predicted that substantially all observations would be non-bankrupt. As the bankruptcy proportion increased in the training sample, the techniques became more capable of differentiating between bankrupt and non-bankrupt firms.

The back-propagation neural network continued to predict that all observations were non-bankrupt until the proportion of bankrupt firms in the training sample exceeded 20%, while the other techniques began to differentiate at lower training sample proportions. The proportion of bankrupt firms in the testing sample had little impact on the rates of Type I or Type II errors.

4. RESEARCH DESIGN

This study examines four versions of the Back-Propagation algorithm (standard Back-Propagation, Functional Link Back-Propagation With Sines, Pruned Back-Propagation, and Cumulative Predictive Back-Propagation), the Optimal Estimation Theory approach for training neural networks, and several conventional techniques such as discriminant analysis, logit and probit. Three sets of variables (Altman, Ohlson, and combined Altman and Ohlson) are considered to enhance the generalizability of the results.

4.1. Data Selection

The sample of bankrupt companies used for the present study was obtained from Boritz et al. (1995), which was based on the data set developed by Kennedy and Shaw (1991). The sample consisted of 171 companies which filed for bankruptcy between 1971 and 1984 inclusive. Kennedy and Shaw (1991) determined the exact date of bankruptcy filing for all companies in the data set. This ensured that the sample contained the last set of financial statement data available prior to the bankruptcy filing.²

The non-bankrupt companies used in the present study also were obtained from Boritz et al. (1995). The sample was collected from the Compustat II Database and consisted of 6,153 non-bankrupt companies selected from the same time period as the bankrupt companies. These companies were selected by randomly assigning a "failing" year for each company in order to obtain a similar distribution over time for both the non-bankrupt and bankrupt companies.³

For both the bankrupt and non-bankrupt firms, data were collected to allow calculation of the variables used by Altman (1968) and Ohlson (1980). These variables are listed in Table 1. We present results for the primary Ohlson model which uses 9 variables rather than the 11 variable version. For each set of variables and data set (bankrupt and non-bankrupt), a learning subset and testing subset, consisting of two-thirds and one-third of the data, respectively, were randomly selected. The result was two learning (testing) subsets, one consisting of 115 (56) bankrupt companies and the other with 4,100 (2,053) non-bankrupt companies. Random samples containing 115 (56) companies were selected from the learning (testing) subsets of non-bankrupt companies. These samples were combined with the subsets of bankrupt companies to create a learning (testing) sample of 230 (112) companies containing fifty percent bankrupt firms. The learning data set is used to calculate coefficients/network weights and then the test sample is used to measure predictive accuracy of the models. This random data set selection, estimation, and testing process is replicated five times to reduce the impact of random variation in data set composition.⁴

² For a complete description of the data collection procedure for the bankrupt companies, see Kennedy and Shaw (1991), p. 100.
³ For a complete description of the data collection procedure for the non-bankrupt companies, see Boritz et al. (1995).
⁴ The number of firms in the example represents the theoretical number for each replication. The actual number of companies in each replication may differ from this value due to the random selection process. For computational simplicity, the two random splits into learning/testing samples and the 50% proportion of bankrupt firms were actually conducted simultaneously.
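
One replication of this sampling design can be sketched as follows; the seed handling and list representation are ours, and the counts come from the text:

    import random

    def make_samples(bankrupt, non_bankrupt, seed):
        """One replication: 2/3-1/3 split of each group, then learning and
        testing samples that are fifty percent bankrupt (230 and 112 firms)."""
        rng = random.Random(seed)
        b = rng.sample(bankrupt, len(bankrupt))           # shuffle 171 bankrupt firms
        nb = rng.sample(non_bankrupt, len(non_bankrupt))  # shuffle 6,153 non-bankrupt
        b_learn, b_test = b[:115], b[115:]                # 115 / 56 bankrupt
        learn = b_learn + rng.sample(nb[:4100], 115)      # plus 115 non-bankrupt
        test = b_test + rng.sample(nb[4100:], 56)         # plus 56 non-bankrupt
        return learn, test

    # Five replications, as in the study (firm identifiers are placeholders).
    replications = [make_samples(list(range(171)), list(range(171, 6324)), s)
                    for s in range(5)]
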
4.2. Computational Procedure

The SAS statistics package (SAS Institute Inc., 1989) was used to construct the discriminant analysis, logit, and probit models. The Optimal Estimation Theory neural network was constructed using a specialized Fortran program. The back-propagation neural networks were constructed using NeuralWorks Professional II.⁵ This software package requires scaling of all input data. Scaling involves mapping each variable to a range with minimum and maximum values of 0.0 and 1.0, respectively. For example, if the variable WCTA has minimum and maximum values of -0.80 and 0.60, respectively, then the values -0.80 and 0.60 are transformed to become 0.0 and 1.0, respectively, and all other values are mapped linearly between 0.0 and 1.0. As a result, the minimum and maximum values must be determined for each variable. The minimum and maximum values were calculated for the entire data set (i.e., including all bankrupt and non-bankrupt firms) to ensure that values for data in the testing sample would be within the range used by the learning sample.⁶
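
The scaling step is straightforward; a sketch reproducing the WCTA example from the text:

    def minmax_scale(column):
        """Map a variable linearly onto [0.0, 1.0] using the minimum and
        maximum computed over the entire data set."""
        lo, hi = min(column), max(column)
        return [(v - lo) / (hi - lo) for v in column]

    # The WCTA example from the text: -0.80 -> 0.0 and 0.60 -> 1.0.
    print(minmax_scale([-0.80, -0.10, 0.60]))   # [0.0, 0.5, 1.0]
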
The discriminant analysis, logit, and probit techniques and the Optimal Estimation Theory neural network calculate parameter values directly from the data set. The back-propagation neural network uses a learning process requiring that each observation be presented multiple times to the network. Each observation in the learning set was presented 5,000 times to all four back-propagation networks. One hidden layer consisting of 5 (9, 13) processing elements was used for all four neural networks for the analysis based on Altman (Ohlson, combined) variables.⁷ The Functional Link Back-Propagation With Sines network used one level of sine transformations. Three different values are tested for the number of enhanced variables in the Optimal Estimation Theory neural network: 0, 5, and 10 for the Altman variables; 0, 10, and 25 for the Ohlson variables; and 0, 10, and 20 for the combined variables.

⁵ The discriminant analysis, logit, and probit models were constructed using version 6.07 of SAS on a DEC 5000/200 computer with the Ultrix operating system. The Optimal Estimation Theory model also was estimated on the DEC computer. The back-propagation was carried out on a 486DX2 66 MHz DOS-based microcomputer with 16 megabytes of memory.
⁶ Examination of the data disclosed a small number of outlier observations. For example, for the variable Current Liabilities/Current Assets there was one data value in excess of 6,000 while the 99th percentile for the variable was 5.4. Winsorization is a process where extreme values are replaced by less extreme values, which has the effect of reducing the impact of the outlier while retaining the observation. A total of six data values involving two Ohlson variables (Current Liabilities/Current Assets and Funds From Operations/Total Debt) were winsorized to reduce the effect of outliers on the range of variables. Similarly, a total of seven data values involving two Altman variables (Market Value of Equity/Book Value of Total Debt and Retained Earnings/Total Assets) were winsorized. The winsorized values were used for all techniques.
⁷ This architecture was chosen because the Optimal Estimation Theory network is restricted to having an equal number of nodes in the input and hidden layers. The number of nodes in the hidden layer will be varied in future work on this project.

TABLE 2
Type I and Type II Errors and Misclassification Costs for Ohlson's Variables

                                                       Type I   Type II   Cost ratios
                                                       error    error     0.05    1      20

Back-propagation NN                                    27.30    17.15     17.63   22.23  26.82
Functional link back-propagation with sines NN         28.73    17.90     18.42   23.32  28.21
Pruned back-propagation NN                             21.45    17.25     17.45   19.35  21.25
Predictive cumulative back-propagation NN              20.82    14.99     15.27   17.91  20.54
Optimal estimation theory NN (0 enhanced variables)     6.91    47.27     45.35   27.09   8.83
Optimal estimation theory NN (10 enhanced variables)    6.51    37.80     36.31   22.16   8.00
Optimal estimation theory NN (25 enhanced variables)   11.06    33.98     32.89   22.52  12.15
Linear discriminant analysis                           28.05    13.60     14.29   20.83  27.36
Quadratic discriminant analysis                        52.46     7.07      9.23   29.77  50.30
Non-parametric discriminant analysis                   23.32    14.17     14.61   18.75  22.88
Logit                                                  28.75    13.82     14.53   21.29  28.04
Probit                                                 29.16    13.82     14.55   21.49  28.43
Altman model                                           34.37    28.45     28.73   31.41  34.09
Ohlson model (9 variable version)                       8.91    38.65     37.23   23.78  10.33
Ohlson model (11 variable version)                     34.26    17.33     18.14   25.80  33.45

The results shown represent averages for 5 replications.
The first two numeric columns represent the Type I and Type II errors (expressed as percentages). Type I error is defined as a prediction of non-bankrupt when the company is actually bankrupt. Type II error is defined as a prediction of bankrupt when the company is actually non-bankrupt.
The remaining numeric columns present the misclassification costs for the cost ratio shown at the top of the column. Misclassification cost is calculated as (Type I error rate times the percentage of bankrupt firms times the cost ratio plus Type II error rate times the percentage of non-bankrupt firms) divided by (percentage of bankrupt firms times cost ratio plus percentage of non-bankrupt firms).
The Altman and Ohlson models represent benchmarks and are based on the coefficients reported in Altman (1968) and Ohlson (1980). The models are shown in Table 1. The Ohlson models use the coefficients for one year prior to bankruptcy and 9 and 11 explanatory variables (i.e., omitting or including the dummy variables for stock exchange).

5. RESULTS

The neural networks in this study use financial statement variables to classify companies as either bankrupt or non-bankrupt. The following table describes the resulting Type I and Type II errors.

                             Actually bankrupt    Actually non-bankrupt
    Predicted bankrupt       Correct              Type II error
    Predicted non-bankrupt   Type I error         Correct
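
The two error definitions translate directly into code; a sketch (labels 1 = bankrupt, 0 = non-bankrupt, as in Section 2; names are ours):

    def error_rates(actual, predicted):
        """Type I: bankrupt firm predicted non-bankrupt; Type II:
        non-bankrupt firm predicted bankrupt."""
        pairs = list(zip(actual, predicted))
        n_bankrupt = sum(1 for a, _ in pairs if a == 1)
        n_healthy = len(pairs) - n_bankrupt
        type1 = sum(1 for a, p in pairs if a == 1 and p == 0) / n_bankrupt
        type2 = sum(1 for a, p in pairs if a == 0 and p == 1) / n_healthy
        return 100.0 * type1, 100.0 * type2   # percentages, as in Tables 2-4

    print(error_rates([1, 1, 1, 0, 0, 0], [1, 0, 1, 0, 1, 0]))
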
The prediction accuracies for all techniques based on Ohlson's variables are summarized in Table 2.⁸ Table 3 summarizes the results for the Altman variables. Table 4 summarizes the results for the combined set of both the Ohlson and Altman variables.

Determination of which technique is best is a function of the relative costs of Type I and Type II errors.

⁸ The results in Tables 2, 3, and 4 are based on training and testing samples containing 50% bankrupt firms. The analysis was replicated using a testing sample containing 1% bankrupt firms and the Type I and Type II errors were similar to errors reported in the tables. The analysis was also replicated using a training sample containing one percent bankrupt firms. Similar to Boritz et al. (1995), all techniques except quadratic discriminant analysis produced models that predicted that substantially all new observations would be non-bankrupt (i.e., Type I errors approaching 100% and Type II errors approaching 0%).

TABLE 3
Type I and Type II Errors and Misclassification Costs for Altman's Variables

                                                       Type I   Type II   Cost ratios
                                                       error    error     0.05    1      20

Back-propagation NN                                    40.60    13.46     14.75   27.03  39.31
Functional link back-propagation with sines NN         29.27    19.96     20.40   24.62  28.83
Pruned back-propagation NN                             39.56    17.49     18.54   28.53  38.51
Predictive cumulative back-propagation NN              33.54    20.03     20.67   26.79  32.90
Optimal estimation theory NN (0 enhanced variables)    10.85    73.35     70.37   42.10  13.83
Optimal estimation theory NN (5 enhanced variables)    10.86    55.41     53.29   33.14  12.98
Optimal estimation theory NN (10 enhanced variables)   11.50    55.21     53.13   33.36  13.58
Linear discriminant analysis                           40.91    15.50     16.71   28.21  39.70
Quadratic discriminant analysis                         4.01    70.16     67.01   37.09   7.16
Non-parametric discriminant analysis                   30.66    22.35     22.75   26.51  30.26
Logit                                                  22.90    19.97     20.11   21.44  22.76
Probit                                                 25.38    21.08     21.28   23.23  25.18
Altman model                                           34.37    28.45     28.73   31.41  34.09
Ohlson model (9 variable version)                       8.91    38.65     37.23   23.78  10.33
Ohlson model (11 variable version)                     34.26    17.33     18.14   25.80  33.45

The results shown represent averages for 5 replications.
The first two numeric columns represent the Type I and Type II errors (expressed as percentages). Type I error is defined as a prediction of non-bankrupt when the company is actually bankrupt. Type II error is defined as a prediction of bankrupt when the company is actually non-bankrupt.
The remaining numeric columns present the misclassification costs for the cost ratio shown at the top of the column. Misclassification cost is calculated as (Type I error rate times the percentage of bankrupt firms times the cost ratio plus Type II error rate times the percentage of non-bankrupt firms) divided by (percentage of bankrupt firms times cost ratio plus percentage of non-bankrupt firms).
The Altman and Ohlson models represent benchmarks and are based on the coefficients reported in Altman (1968) and Ohlson (1980). The models are shown in Table 1. The Ohlson models use the coefficients for one year prior to bankruptcy and 9 and 11 explanatory variables (i.e., omitting or including the dummy variables for stock exchange).

In general, Type I and Type II errors are unlikely to be equally costly. Unfortunately, there is no generally accepted basis for trading off Type I and Type II errors in bankruptcy prediction models. Different parties have different perceptions of the relative costs of Type I and Type II errors. For example, investors may be willing to tolerate a higher rate of Type II error in order to have a low rate of Type I errors. In contrast, management may prefer a lower rate of Type II errors to avoid the "self-fulfilling prophecies" when false bankruptcy signals actually result in bankruptcy filings. We calculate misclassification costs under three assumptions regarding the ratio of the cost of the Type I and Type II errors. For example, a cost ratio of 20 (0.05) indicates that a Type I (II) error is considered to be twenty times more costly than a Type II (I) error, and the cost ratio of one implies that the errors are equally costly. Misclassification cost is calculated as:

    (Type I error × %bankrupt firms × cost ratio + Type II error × %non-bankrupt firms)
    / (%bankrupt firms × cost ratio + %non-bankrupt firms)
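
A sketch implementing this cost formula, checked against the worked example that follows; the function name is ours:

    def misclassification_cost(type1, type2, pct_bankrupt, cost_ratio):
        """Weighted cost as defined above; error rates in percent,
        pct_bankrupt as a fraction (0.5 for the samples used here)."""
        num = type1 * pct_bankrupt * cost_ratio + type2 * (1.0 - pct_bankrupt)
        den = pct_bankrupt * cost_ratio + (1.0 - pct_bankrupt)
        return num / den

    # Table 2, back-propagation NN, cost ratio 0.05:
    print(round(misclassification_cost(27.30, 17.15, 0.5, 0.05), 2))  # 17.63
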
For example, in Table 2 the misclassification cost of 17.63 for the Back-Propagation Neural Network with cost ratio of 0.05 is calculated as [(27.30 × 0.5 × 0.05) + (17.15 × 0.5)] / [(0.5 × 0.05) + 0.5].

In Tables 2, 3, and 4, the results labeled "Altman" and "Ohlson 9 (11)" represent performance benchmarks. These results are calculated using the variables and coefficients exactly as reported by Altman (1968) and Ohlson (1980) (see Table 1) but applied to our test samples.

With reference to Table 2, the Optimal Estimation Theory networks have the lowest level of Type I error, followed by the back-propagation networks and the traditional statistical techniques. The Predictive Cumulative and Pruned Back-Propagation neural networks and non-parametric discriminant analysis have similar levels of Type I error, while the standard Back-Propagation and Functional Link Back-Propagation With Sines neural networks have similar Type I errors compared to linear discriminant analysis, logit and probit. Type I error is high for quadratic discriminant analysis. However, this relationship is reversed for Type II error, where the traditional techniques have the lowest level, followed by the back-propagation networks. The Optimal Estimation Theory networks have much higher levels of Type II errors.

The Ohlson 9 variable benchmark has low Type I error and high Type II error, which is comparable to the Optimal Estimation Theory networks in both cases. The Altman and Ohlson 11 variable benchmarks have relatively high Type I error (only quadratic discriminant analysis is higher), while the Type II error is moderate for Ohlson 11 variables relative to the other techniques and higher for the Altman variables.

TABLE 4
Type I and Type II Errors and Misclassification Costs for the Combined Set of Altman and Ohlson Variables

                                                       Type I   Type II   Cost ratios
                                                       error    error     0.05    1      20

Back-propagation NN                                    25.73    15.97     16.43   20.85  25.27
Functional link back-propagation with sines NN         25.52    16.68     17.10   21.10  25.10
Pruned back-propagation NN                             38.02    11.18     12.46   24.60  36.74
Predictive cumulative back-propagation NN              25.23    14.88     15.37   20.06  24.74
Optimal estimation theory NN (0 enhanced variables)    28.59    73.35     71.22   50.97  30.72
Optimal estimation theory NN (10 enhanced variables)   26.52    73.61     71.37   50.07  28.76
Optimal estimation theory NN (20 enhanced variables)   36.20    67.08     65.61   51.64  37.67
Linear discriminant analysis                           27.49    12.66     13.37   20.08  26.78
Quadratic discriminant analysis                        32.68    24.60     24.98   28.64  32.30
Non-parametric discriminant analysis                   25.90    13.13     13.74   19.52  25.29
Logit                                                  24.74    16.10     16.51   20.42  24.33
Probit                                                 25.10    16.10     16.53   20.60  24.67
Altman model                                           34.37    28.45     28.73   31.41  34.09
Ohlson model (9 variable version)                       8.91    38.65     37.23   23.78  10.33
Ohlson model (11 variable version)                     34.26    17.33     18.14   25.80  33.45

The results shown represent averages for 5 replications.
The first two numeric columns represent the Type I and Type II errors (expressed as percentages). Type I error is defined as a prediction of non-bankrupt when the company is actually bankrupt. Type II error is defined as a prediction of bankrupt when the company is actually non-bankrupt.
The remaining numeric columns present the misclassification costs for the cost ratio shown at the top of the column. Misclassification cost is calculated as (Type I error rate times the percentage of bankrupt firms times the cost ratio plus Type II error rate times the percentage of non-bankrupt firms) divided by (percentage of bankrupt firms times cost ratio plus percentage of non-bankrupt firms).
The Altman and Ohlson models represent benchmarks and are based on the coefficients reported in Altman (1968) and Ohlson (1980). The models are shown in Table 1. The Ohlson models use the coefficients for one year prior to bankruptcy and 9 and 11 explanatory variables (i.e., omitting or including the dummy variables for stock exchange).

When the cost ratio is one, the Predictive Cumulative Back-Propagation Neural Network and non-parametric discriminant analysis have the lowest misclassification costs. As the cost ratio increases, the Optimal Estimation Theory networks have the lowest misclassification cost, while linear and non-parametric discriminant analysis, logit, and probit are favoured for the lowest cost ratios.

Table 3 summarizes the results for the Altman variables. The results differ from those based on the Ohlson variables reported in Table 2. Quadratic discriminant analysis has the lowest Type I error (4%), followed by the Optimal Estimation Theory networks (approx. 11%), logit and probit (23-25%), and then the back-propagation networks and the other two discriminant analysis techniques (30-41%). The relative results for Type II error are almost exactly reversed. When the cost ratio is one, logit and probit have the lowest misclassification costs. As the cost ratio increases (decreases), quadratic discriminant analysis and the Optimal Estimation Theory neural networks have the lowest (highest) costs.

Table 4 summarizes the results for the combined set of Altman's and Ohlson's variables. The Type I error is similar for many of the techniques (24.7-28.6%) except for quadratic discriminant analysis, the Optimal Estimation Theory neural network with 20 enhanced variables, and the Pruned Back-Propagation neural network. The Type II errors are similar for many techniques (11.2-16.7%) except quadratic discriminant analysis and the Optimal Estimation Theory neural networks.

A comparison of Tables 2 and 3 indicates that, generally speaking, all models based on Ohlson's variables are superior to those same models based on Altman's variables. That is, in most instances (20 out of 24 cells) the models based on Ohlson's variables have both Type I and Type II error lower in Table 2 than in Table 3. There are only four exceptions to this general pattern: Back-Propagation Type II error, Quadratic Discriminant Analysis Type I error, Logit Type I error and Probit Type I error. Also, when the cost ratio is 1, the models based on Ohlson's variables are superior in the sense of having a lower overall misclassification cost in all 12 of the cells in that column. When the cost ratios are 0.05 and 20, the models based on Ohlson's variables are superior in 11 and 9 of the 12 cells in their respective columns.

Comparison of Type I error and Type II error in Tables 2 and 4 shows that 12 of the 24 cells in Table 4 (which is based on the combined Altman and Ohlson variables) have superior performance compared to that reported in corresponding cells in Table 2, which is based solely on Ohlson's variables.

TABLE 5
Coefficients of Variation for Type I and Type II Error Statistics Reported in Tables 2, 3, and 4

                                                       Ohlson            Altman            Combined
                                                       Type I   Type II  Type I   Type II  Type I   Type II
                                                       error    error    error    error    error    error

Back-propagation NN                                    52.49    48.95    38.15    50.12    28.17    58.39
Functional link back-propagation with sines NN         39.87    72.79    51.27    77.26    54.33    40.70
Pruned back-propagation NN                             18.48    45.83    38.30    87.24    38.19    32.08
Predictive cumulative back-propagation NN              44.34    27.00    48.56    89.93    37.53    27.40
Optimal estimation theory NN (0 enhanced variables)    55.87    16.94    25.37    16.40    107.38   37.89
Optimal estimation theory NN (5 enhanced variables)    -        -        49.80    13.89    -        -
Optimal estimation theory NN (10 enhanced variables)   59.75    26.63    48.15     2.60    110.50   32.05
Optimal estimation theory NN (20 enhanced variables)   -        -        -        -         92.90   40.72
Optimal estimation theory NN (25 enhanced variables)   67.15    29.96    -        -        -        -
Linear discriminant analysis                           25.35    25.50    26.55    35.49    25.33    39.96
Quadratic discriminant analysis                        39.96    121.32   104.00   21.27    88.43    105.11
Non-parametric discriminant analysis                   22.75    23.43    33.59    12.06    17.95    30.93
Logit                                                  17.16    46.34    24.69    32.41    19.48    30.14
Probit                                                 17.84    46.34    38.61    44.14    18.36    30.14
Altman model                                           13.46    27.14    13.46    27.14    13.46    27.14
Ohlson model (9 variable version)                       4.85    13.32     4.85    13.32     4.85    13.32
Ohlson model (11 variable version)                      8.93    16.08     8.93    16.08     8.93    16.08

The results shown are based on 5 replications. A dash indicates that the corresponding number of enhanced variables was not tested for that variable set.
Type I error is defined as a prediction of non-bankrupt when the company is actually bankrupt. Type II error is defined as a prediction of bankrupt when the company is actually non-bankrupt.
The Altman and Ohlson models represent benchmarks and are based on the coefficients reported in Altman (1968) and Ohlson (1980). The models are shown in Table 1. The Ohlson models use the coefficients for one year prior to bankruptcy and 9 and 11 explanatory variables (i.e., omitting or including the dummy variables for stock exchange).

Some interesting additional observations about Table 4 are:
• the Type II error for all of the Back-Propagation networks is lower in Table 4 compared with Table 2,
• the Type I errors for the Back-Propagation neural network and the Functional Link Back-Propagation with Sines neural network are lower, but interestingly, the Type I errors for the Pruned Back-Propagation neural network and the Predictive Cumulative Back-Propagation neural network are significantly higher in Table 4 compared with Table 2,
• all three versions of the Optimal Estimation Theory neural networks are significantly inferior with the combined set of variables relative to the results using Ohlson's variables alone. This underlines the necessity of selecting appropriate variables to apply in a neural network setting and not simply leaving it to the neural network itself to sort out those variables.

We examined the sensitivity of results across the five replications by considering the coefficients of variation for the various models. As can be seen from an inspection of Table 5, there is a fair degree of variation in the results for the back-propagation models, and significant variation in Type I errors for the OET models based on Ohlson and Combined variables and the quadratic discriminant analysis models.
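
These coefficients of variation can be computed as follows; we assume the usual definition, the sample standard deviation expressed as a percentage of the mean across the five replications:

    import statistics

    def coefficient_of_variation(values):
        """Sample standard deviation as a percentage of the mean."""
        return 100.0 * statistics.stdev(values) / statistics.mean(values)

    # e.g., one model's Type I error rates from the five replications
    # (illustrative numbers, not taken from the study).
    print(round(coefficient_of_variation([27.3, 30.1, 22.4, 35.8, 20.9]), 1))
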

6. CONCLUSIONS

This study found that there are large differences in Type I error, Type II error, and misclassification costs among the neural networks tested. The Optimal Estimation Theory neural networks had the lowest rate of Type I error but the highest rate of Type II error. The Back-Propagation neural networks had high Type I error but lower rates of Type II error.

It is difficult to make blanket statements about the performance of the various techniques because there is a good deal of variation in their relative strengths and weaknesses according to which set of variables is used to fit the model, and whether Type I errors or Type II errors are of greatest interest. While the neural network models' performance is in line with that of the more conventional techniques such as discriminant analysis and logit/probit, it is noteworthy that their performance is not a dramatic improvement over those conventional techniques. The latter have the advantage of being fairly well understood and easy to apply using readily available statistical packages such as SAS. In contrast, the neural network models are not as well understood, are not as easy to apply, and do not appear to provide a significant advantage in terms of dramatic reductions in misclassification costs.

This observation is not intended to be a sweeping rejection of neural network models. For, while we have varied a number of important factors in this study, we did not attempt to develop optimal neural network architectures for addressing the prediction of business failure with the various networks. We are in the process of conducting a study wherein we will, in fact, vary the architectures of these networks.

In summary, while our study demonstrates that the various types of neural networks perform reasonably well in predicting business failure, we find that their performance is not in any systematic way superior to the more conventional techniques that have been applied to this problem such as discriminant analysis and logit/probit. Comparing results across the three sets of variables shows that the relative performance of neural networks and traditional statistical techniques is affected by the choice of variables in the learning sample. We demonstrate that the performance of the neural networks tested is sensitive to the choice of variables selected and that the networks cannot be relied upon to "sift through" variables and focus on the most important variables (network performance based on the combined set of Ohlson and Altman variables was frequently worse than their performance with one of the subsets). It is also important to note that the results are quite sensitive to sampling error. The significant variations across replications for some of the models indicate the sensitivity of the models to variations in the data. Researchers reporting on the results of applying neural networks should replicate their models several times to obtain a reliable measure of model performance. Our results suggest the need to exercise caution in interpreting results of studies where such replication was not performed.

There is potential for extending the research reported here as neural networks are affected by many parameters. Examples include using different neural networks, changing the number of hidden processing elements, and changing the set of explanatory variables.

Acknowledgements--We thank the Editor, anonymous reviewers, and participants at the Third Annual Research Workshop on AI/ES in Accounting, Auditing and Tax for their useful comments. We gratefully acknowledge the contribution of Jeff Boucher and Shafin Kanji in pilot testing some aspects of this research, computing assistance provided by Nick Favron, and financial support provided by the Social Sciences and Humanities Research Council of Canada.

REFERENCES

Altman, E. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), (September), 589-609.
Altman, E. (1983). Corporate financial distress--A complete guide to predicting, avoiding and dealing with bankruptcy. New York: John Wiley & Sons.
Beaver, W. (1966). Financial ratios as predictors of failure. Empirical Research in Accounting: Selected Studies 1966, Journal of Accounting Research, 4, (Supplement), 71-111.
Beaver, W. (1968). Market prices, financial ratios, and the prediction of failure. Journal of Accounting Research, 6(2), (Autumn), 179-192.
Bell, T. B., Ribar, G. S., & Verchio, J. (1990). Neural nets vs. logistic regression: A comparison of each model's ability to predict commercial bank failures. Proceedings of the 1990 Deloitte Touche/University of Kansas Symposium on Auditing Problems, 29-53.
Boritz, J. E. (1991). The going concern assumption. Toronto: Canadian Institute of Chartered Accountants.
Boritz, J. E., Kennedy, D. B., & Albuquerque, A. (1995). Predicting corporate failure using a neural network approach. International Journal of Intelligent Systems in Accounting, Finance and Management, 4(2), (June), 95-111.
Boucher, J. (1990). Artificial neural systems and the prediction of corporate failure. Master's Project, School of Accountancy, University of Waterloo.
Compustat Services, Inc. (1992). Compustat II Database. Englewood, CO: Standard & Poor's Compustat Services Inc.
Fanning, K. M., & Cogger, K. O. (1994). A comparative analysis of artificial neural networks using financial distress prediction. International Journal of Intelligent Systems in Accounting, Finance and Management, 3(4), (December), 241-252.
Jones, F. L. (1987). Current techniques in bankruptcy prediction. Journal of Accounting Literature, 6, 131-164.
Jones, W., & Hoskins, J. (1987). Back propagation, a generalized delta rule. Byte, (October), 155-162.
Kennedy, D. B. (1992). Classification techniques in accounting research: Empirical evidence of comparative performance. Contemporary Accounting Research, 8(2), (Spring), 419-442.
Kennedy, D. B., & Shaw, W. H. (1991). Factors influencing auditor opinion on bankrupt firms. Contemporary Accounting Research, 8(1), (Fall), 97-114.
Ohlson, J. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 18(1), (Spring), 109-131.
Pao, Y. H. (1989). Adaptive pattern recognition and neural networks. Reading, MA: Addison-Wesley.
Salchenberger, L. M., Cinar, E. M., & Lash, N. A. (1992). Neural networks: A new tool for predicting thrift failures. Decision Sciences, 23(4), (July/August), 899-916.
SAS Institute Inc. (1989). SAS/STAT User's Guide, Version 6, 4th ed., Vol. 1 and 2. Cary, NC: SAS Institute Inc.
Shepanski, J. F. (1988). Fast learning in artificial neural systems: Multilayer perceptron training using optimal estimation. Proceedings of the IEEE 2nd International Conference on Neural Nets, Vol. 4, San Diego, July 1988.
Tam, K. Y., & Kiang, M. Y. (1992). Managerial applications of neural networks: The case of bank failure predictions. Management Science, 38(7), (July), 926-947.
Zavgren, C. V. (1983). The prediction of corporate failure: The state of the art. Journal of Accounting Literature, 2, (Spring), 1-38.
