Effectiveness of Neural Network Types For Prediction of Business Failure

Expert Systems With Applications, Vol. 9, pp. 503-512, 1995
Copyright © 1995 Elsevier Science Ltd
Pergamon. Printed in the USA. All rights reserved
0957-4174/95 $9.50 + .00
0957-4174(95)00020-8
Abstract: The study examines the effectiveness of different neural networks in predicting bankruptcy
filing. Two approaches for training neural networks, Back-Propagation and Optimal Estimation Theory,
are considered. Within the back-propagation training method, four different models (Back-Propagation,
Functional Link Back-Propagation With Sines, Pruned Back-Propagation, and Cumulative Predictive
Back-Propagation) are tested. The neural networks are compared against traditional bankruptcy
prediction techniques such as discriminant analysis, logit, and probit. The results show that the level of
Type I and Type II errors varies greatly across techniques. The Optimal Estimation Theory neural
network has the lowest level of Type I error and the highest level of Type II error while the traditional
statistical techniques have the reverse relationship (i.e., high Type I error and low Type II error). The
back-propagation neural networks have intermediate levels of Type I and Type II error. We demonstrate
that the performance of the neural networks tested is sensitive to the choice of variables selected and that
the networks cannot be relied upon to "sift through" variables and focus on the most important variables
(network performance based on the combined set of Ohlson and Altman data was frequently worse than
their performance with one of the subsets). It is also important to note that the results are quite sensitive
to sampling error. The significant variations across replications for some of the models indicate the
sensitivity of the models to variations in the data.
J. E. Boritz and D. B. Kennedy

network architecture consisting of one layer of hidden nodes with the number of hidden nodes set equal to the number of input variables.

The remainder of this paper is divided into five sections. The next section contains a general description of neural networks and then describes the networks examined in this paper. The following section reviews previous research on the prediction of bankruptcy filing. The fourth section describes the research design. The fifth section presents results of the analysis and the final section contains concluding comments.

2. NEURAL NETWORKS

Artificial neural networks consist of nodes and weights for the degree of influence which one node exerts on other nodes. The nodes are arranged in a pattern with a series of input nodes forming the base layer, one or more output nodes at the top layer, and one or more hidden layers of nodes between the input and output layers. The input values for a node can be either excitatory, if the weight is positive, or inhibitory, if the weight is negative. The output of a node is equal to the weighted sum of its input values transformed by a transfer function. Transfer functions can be linear or non-linear. A commonly used function is the sigmoid function 1/(1 + e^(-z)), which has upper and lower limits such that input value changes outside of these limits do not affect the output of the node.

Neural network architectures can be either feedback or feedforward. In a feedforward network, inputs are only received from previous layers. Hence the information always flows in a forward direction, from the input nodes, through the hidden layers of nodes, to the output nodes. The back-propagation paradigm, a common training method for neural networks, is a feedforward network. In a feedback network, nodes can receive inputs from nodes in any other layer in the network (including the same layer); the direction of information flow is not restricted.

Neural network operation involves two phases, learning and testing. In the learning phase, the network is taught to solve certain problems or identify specific patterns based on given information. Two types of learning are available for network operation, supervized and unsupervized learning. In unsupervized learning, the network is presented with a set of facts; however, no response is known by the network. The network processes the input data and formulates an algorithm to classify the presented input. This type of learning can be viewed as a multivariate technique to explore the underlying structure of the data. In the supervized learning mode, the network is presented with a set of facts along with the correct response. The network uses the information and correct response to formulate connection weights for each node in the network. Once the network can correctly classify an acceptable level of the presented facts, it can be used to classify new data. During the testing phase, the neural network's usefulness is measured by its ability to correctly classify previously unseen data.

One of the most common training methods for neural networks is the Back-Propagation paradigm. Back-propagation networks learn by modifying the input connection weights until the difference (error) between the output and target is minimized. These networks use the generalized delta rule for propagating errors through the network. This rule places responsibility for the error on every node in the network. The weights are first modified for the output node. Then, the error is propagated backward to all nodes pointing to the output units in the preceding layer. These nodes, in turn, propagate the error backward to nodes that point to them, and so on, until the input level nodes are reached (Jones & Hoskins, 1987). The standard back-propagation method for training neural networks is examined in this study as well as three variations of the back-propagation method (which are briefly described in the following subsections). In addition, we examine a neural network based on Optimal Estimation Theory. In all cases, the networks are used to find a relationship between the input variables and a dichotomous output variable which has values of 1 for bankrupt companies and 0 for non-bankrupt companies.

2.1. Functional Link Back-Propagation With Sines

This network uses the standard back-propagation algorithm to adjust weights; however, it introduces additional nodes in the input layer to help increase the learning rate. The input variables are transformed by two functions. Pair-wise interaction terms are added between all input variables and then the sine function is used to functionally expand the data.¹ The overall effect is to map the input data items into a larger pattern space, thereby enhancing the representation and increasing the prediction accuracy. This results in a large number of nodes in the Functional Link Back-Propagation With Sines networks; hence, learning by this network takes a great deal of time. The Functional Link Back-Propagation With Sines networks typically took approximately twice the time to train compared to the other back-propagation networks.

¹ In theory, higher order interaction terms such as second or third order can be constructed. NeuralWorks Professional II, the software used to construct the back-propagation neural networks in this study, automatically constructs the pair-wise interaction terms. The software also allows for multiple levels of sine functions for each input variable; for example, three levels would consist of the sin(πXi), sin(2πXi), and sin(3πXi) functions where Xi are the input variables. Pilot testing by Boucher (1990) found that the predictive accuracy was higher with one layer of sine transformations compared to the accuracy of a model that contained only the pair-wise interaction terms.
2.2. Pruned Back-Propagation

This network disables all connections that have little impact on the output of the network (i.e., low weight values). It is based on a concept introduced by David Rumelhart of Stanford University, where both the network complexity and error are minimized as the network learns. Rumelhart observed that generalization ability declines with increased complexity of neural networks. Too many nodes produce a neural network that memorizes the input data and lacks the ability to generalize (often referred to as overfitting). As a result, increases in the network's ability to classify companies in the learning set are linked with a decline in its ability to classify companies in the testing set (Salchenberger et al., 1992).

2.3. Predictive Cumulative Back-Propagation

This is a standard back-propagation network with the input and output layers using a linear transfer function, and the hidden layer using a sigmoid transfer function. This network uses the cumulative generalized delta rule to speed up neural network training. The cumulative generalized delta rule updates the weights after two data items are presented to the network instead of after each item. The weight changes are accumulated in another variable until both data items are presented, at which time the actual node weights are updated. This approach can decrease learning time up to 50%.

2.4. Optimal Estimation Theory Neural Network

Shepanski (1988) introduced the Optimal Estimation Theory paradigm as an alternative approach to training a feedforward neural network. It generates a set of least squares estimators for the interconnection weights, allowing the optimal set of weights to be determined in a single pass rather than requiring the training data set be presented to the network many times as in back-propagation. One advantage of the Optimal Estimation Theory approach is significant reductions in the time required to train a neural network compared to the back-propagation paradigm. A limitation of the basic Optimal Estimation Theory method is the necessity of having the same number of neurons in the hidden and input layers. To overcome this limitation, the model can be modified by using a nonlinear transformation of the input variables (Pao, 1989). The enhanced variables are constructed by finding the k least correlated pairs of variables, taking the product of one variable (in the pair) by the cosine of the other variable, and using these products as additional inputs to the neural network. Multiple values for k are considered in our analysis.

3. PREVIOUS RESEARCH

Bankruptcy prediction, through classification of known cases and generalization to other cases, has been a subject of study for almost 30 years. The accurate prediction of bankruptcy is important to investors, creditors and auditors. An accurate prediction model can help investors and creditors avoid heavy losses stemming from surprise bankruptcies and could help alert auditors about potential going concern problems (Boritz, 1991).

Beaver (1966, 1968) was one of the first researchers to study the prediction of bankruptcy. He analyzed several financial statement ratios, one-by-one, to evaluate their predictive ability, then developed cut-off scores for each ratio to classify companies as failed and non-failed. For example, if a company had a debt-to-equity ratio of 1.0 and the cut-off score for debt-to-equity is 2.0, then the company would be classified as healthy. Although this method has high predictive power, limitations of this process include the requirement to apply the classification technique using one ratio at a time and the difficulty of resolving conflicts when one ratio classifies the company as healthy while another predicts distress.

Beaver's work was followed by Altman's (1968, 1983) discriminant analysis-based model and Ohlson's (1980) logit-based model. Zavgren (1983) and Jones (1987) review relevant bankruptcy studies and analyze several key techniques, the data collection process, the relative merits of the statistical approaches to prediction, and consider alternative ways of evaluating the validity of a prediction.

This study focuses on two major models, representing different classes of statistical techniques: Altman's (1968) model based on multiple discriminant analysis and Ohlson's (1980) model based on the use of logistic regression. We rely on these models in our choice of input variables for training and testing the various neural networks being studied. We also use these models and their underlying statistical approaches for developing benchmarks against which we compare the performance of the neural networks.

3.1. Altman's Model

Altman (1968, 1983) introduced a class of models based on the discriminant analysis technique to produce a dichotomous classification of failing/non-failing firms. This technique involves the application of discriminant analysis to a representative "learning" set of data to identify the discriminant function which best classifies the companies into known categories (e.g., bankrupt and non-bankrupt). This function is then used to classify other data. Altman's variables and discriminant function are shown in Panel A of Table 1. The model's prediction is based on the value obtained for the Z score. For Z values greater than 2.675, the company is classified as "healthy," while the company is classified as bankrupt for lower Z values.

Using the frequency of Type I and Type II errors as a basis for evaluation, Altman's model typically has good predictive power for one year before bankruptcy but the

TABLE 1
Definition of Variables

Panel A: Altman

Variables:
WCTA = working capital/total assets
RETA = retained earnings/total assets
EBITTA = earnings before interest and taxes/total assets
MEDEBT = market value of equity/book value of total debt
SALETA = sales/total assets

Model:
Z = 0.012 WCTA + 0.014 RETA + 0.033 EBITTA + 0.006 MEDEBT + 0.999 SALETA

Panel B: Ohlson

Variables:
SIZE = log(total assets/GNP price-level index)
TLTA = total liabilities/total assets
WCTA = working capital/total assets
CLCA = current liabilities/current assets
OENEG = 1 if total liabilities > total assets
NITA = net income/total assets
FUTL = funds provided by operations/total liabilities
INTWO = 1 if net income negative for last two years
CHIN = change in net income = (NIt - NIt-1)/(|NIt| + |NIt-1|)
NYSE, AMSE = dummy variables for exchange listing

9-Variable Model:
y = -1.32 - 0.407 SIZE + 6.03 TLTA - 1.43 WCTA + 0.0757 CLCA - 2.37 NITA - 1.83 FUTL + 0.285 INTWO - 1.72 OENEG - 0.521 CHIN

11-Variable Model:
y = -2.63 - 0.267 SIZE + 5.63 TLTA - 1.43 WCTA + 0.0585 CLCA - 2.35 NITA - 1.99 FUTL + 0.307 INTWO - 1.56 OENEG - 0.5092 CHIN - 0.854 NYSE - 0.0513 AMSE
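The two benchmark models in Table 1 can be applied directly in code. The sketch below is illustrative, not the study's estimation code: the Altman coefficients follow the form in Altman (1968), under the conventional assumption that the first four ratios are expressed in percent and SALETA in times, and the Ohlson sketch applies the 9-variable coefficients with the cumulative logistic transform:

```python
import math

def altman_z(wcta, reta, ebitta, medebt, saleta):
    """Altman discriminant function (Table 1, Panel A). Assumes the
    first four ratios are in percent and SALETA in times, as is
    conventional for Altman (1968)."""
    return (0.012 * wcta + 0.014 * reta + 0.033 * ebitta
            + 0.006 * medebt + 0.999 * saleta)

def classify_altman(z, cutoff=2.675):
    # Z above the cutoff is classified as "healthy"
    return "healthy" if z > cutoff else "bankrupt"

# Ohlson 9-variable model (Table 1, Panel B), mapped to a
# probability via the cumulative logistic function.
OHLSON_COEF = {"SIZE": -0.407, "TLTA": 6.03, "WCTA": -1.43,
               "CLCA": 0.0757, "NITA": -2.37, "FUTL": -1.83,
               "INTWO": 0.285, "OENEG": -1.72, "CHIN": -0.521}

def ohlson_probability(ratios):
    y = -1.32 + sum(OHLSON_COEF[k] * v for k, v in ratios.items())
    return 1.0 / (1.0 + math.exp(-y))
```

For example, a hypothetical firm with WCTA = 20, RETA = 30, EBITTA = 10, MEDEBT = 150, and SALETA = 1.5 gets Z = 3.3885, above the 2.675 cutoff.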
model's predictive ability decreases substantially for predictions two and three years prior to bankruptcy. According to Altman (1983), the model's weakness in predicting bankruptcy in an early stage is caused by a dramatic deteriorating trend in ratios as bankruptcy approaches, and this deterioration is most serious between the third and the second years prior to bankruptcy.

3.2. Ohlson's Model

Ohlson (1980) introduced the logistic regression technique to estimate the probability of failing/non-failing for a company. This approach potentially can give a decision maker a better sense of the distribution of financial risk than a model based on discriminant analysis. The objective is to determine the conditional probability of an observation belonging to a class, given values of the independent variables for the observation. Logit, and the related probit technique, make use of a cumulative probability function to constrain the predicted value of the dependent variable within the interval (0, 1). The two models are similar, except that the logit model uses the cumulative logistic function and probit uses the cumulative normal distribution.

Ohlson's variables and models are shown in Panel B of Table 1. Two models are shown, the primary model using nine variables and a second model which is extended to eleven variables by adding dummy variables for exchange listing. The cumulative logistic function is used to transform the y value into a bankruptcy probability. For probabilities greater than 0.50, the company is classified as "healthy," while the company is classified as bankrupt for lower probabilities.

Boritz et al. (1995) tested the effect of various proportions of bankrupt firms in the learning and testing populations on the predictive ability of conventional techniques and neural networks; samples of bankrupt firms ranging from 1% to 50% were created for both learning and testing populations. Boritz et al. also showed that when the techniques were trained on a sample containing 1% bankrupt firms (which approximates the population proportion), all techniques except quadratic discriminant analysis predicted that substantially all observations would be non-bankrupt. As the bankruptcy proportion increased in the training sample, the techniques became more capable of differentiating between bankrupt and non-bankrupt.

The back-propagation neural network continued to predict that all observations were non-bankrupt until the proportion of bankrupt firms in the training sample exceeded 20%, while the other techniques began to
differentiate at lower training sample proportions. The proportion of bankrupt firms in the testing sample had little impact on the rates of Type I or Type II errors.

4. RESEARCH DESIGN

This study examines four versions of the Back-Propagation algorithm (standard Back-Propagation, Functional Link Back-Propagation With Sines, Pruned Back-Propagation, and Cumulative Predictive Back-Propagation), the Optimal Estimation Theory approach for training neural networks, and several conventional techniques such as discriminant analysis, logit and probit. Three sets of variables (Altman, Ohlson, and combined Altman and Ohlson) are considered to enhance the generalizability of the results.

4.1. Data Selection

The sample of bankrupt companies used for the present study was obtained from Boritz et al. (1995), which was based on the data set developed by Kennedy and Shaw (1991). The sample consisted of 171 companies which filed for bankruptcy between 1971 and 1984 inclusive. Kennedy and Shaw (1991) determined the exact date of bankruptcy filing for all companies in the data set. This ensured that the sample contained the last set of financial statement data available prior to the bankruptcy filing.²

The non-bankrupt companies used in the present study also were obtained from Boritz et al. (1995). The sample was collected from the Compustat II Database and consisted of 6,153 non-bankrupt companies selected from the same time period as the bankrupt companies. These companies were selected by randomly assigning a "failing" year for each company in order to obtain a similar distribution over time for both the non-bankrupt and bankrupt companies.³

For both the bankrupt and non-bankrupt firms, data were collected to allow calculation of the variables used by Altman (1968) and Ohlson (1980). These variables are listed in Table 1. We present results for the primary Ohlson model which uses 9 variables rather than the 11 variable version. For each set of variables and data set (bankrupt and non-bankrupt), a learning subset and testing subset, consisting of two-thirds and one-third of the data, respectively, were randomly selected. The result was two learning (testing) subsets, one consisting of 115 (56) bankrupt companies and the other with 4,100 (2,053) non-bankrupt companies. Random samples containing 115 (56) companies were selected from the learning (testing) subsets of non-bankrupt companies. These samples were combined with the subsets of bankrupt companies to create a learning (testing) sample of 230 (112) companies containing fifty percent bankrupt firms. The learning data set is used to calculate coefficients/network weights and then the test sample is used to measure predictive accuracy of the models. This random data set selection, estimation, and testing process is replicated five times to reduce the impact of random variation in data set composition.⁴

4.2. Computational Procedure

The SAS statistics package (SAS Institute Inc., 1989) was used to construct the discriminant analysis, logit, and probit models. The Optimal Estimation Theory neural network was constructed using a specialized Fortran program. The back-propagation neural networks were constructed using NeuralWorks Professional II.⁵ This software package requires scaling of all input data. Scaling involves mapping each variable to a range with minimum and maximum values of 0.0 and 1.0, respectively. For example, if the variable WCTA has minimum and maximum values of -0.80 and 0.60, respectively, then the values -0.80 and 0.60 are transformed to become 0.0 and 1.0, respectively, and all other values are mapped linearly between 0.0 and 1.0. As a result, the minimum and maximum values must be determined for each variable. The minimum and maximum values were calculated for the entire data set (i.e., including all bankrupt and non-bankrupt firms) to ensure that values for data in the testing sample would be within the range used by the learning sample.⁶

The discriminant analysis, logit, and probit techniques and the Optimal Estimation Theory neural network calculate parameter values directly from the data set. The back-propagation neural network uses a learning process

² For a complete description of the data collection procedure for the bankrupt companies, see Kennedy and Shaw (1991), p. 100.
³ For a complete description of the data collection procedure for the non-bankrupt companies, see Boritz et al. (1995).
⁴ The number of firms in the example represents the theoretical number for each replication. The actual number of companies in each replication may differ from this value due to the random selection process. For computational simplicity, the two random splits into learning/testing samples and the 50% proportion of bankrupt firms were actually conducted simultaneously.
⁵ The discriminant analysis, logit, and probit models were constructed using version 6.07 of SAS on a DEC 5000/200 computer with the Ultrix operating system. The Optimal Estimation Theory model also was estimated on the DEC computer. The back-propagation was carried out on a 486DX2 66 MHz DOS-based microcomputer with 16 megabytes of memory.
⁶ Examination of the data disclosed a small number of outlier observations. For example, for the variable Current Liabilities/Current Assets there was one data value in excess of 6,000 while the 99th percentile for the variable was 5.4. Winsorization is a process where extreme values are replaced by less extreme values, which has the effect of reducing the impact of the outlier while retaining the observation. A total of six data values involving two Ohlson variables (Current Liabilities/Current Assets and Funds From Operations/Total Debt) were winsorized to reduce the effect of outliers on the range of variables. Similarly, a total of seven data values involving two Altman variables (Market Value of Equity/Book Value of Total Debt and Retained Earnings/Total Assets) were winsorized. The winsorized values were used for all techniques.
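The scaling step described in Section 4.2 can be sketched directly from the WCTA example in the text (the function name is illustrative):

```python
def minmax_scale(values, lo, hi):
    """Map values linearly onto [0, 1] so that lo -> 0.0 and
    hi -> 1.0, as required by the back-propagation software."""
    return [(v - lo) / (hi - lo) for v in values]

# WCTA example from the text: the minimum -0.80 maps to 0.0,
# the maximum 0.60 maps to 1.0, and other values fall in between.
scaled = minmax_scale([-0.80, -0.10, 0.60], lo=-0.80, hi=0.60)
# -> approximately [0.0, 0.5, 1.0]
```

Because `lo` and `hi` are computed over the entire data set, test-sample values are guaranteed to land inside [0, 1], which is the rationale given in the text for pooling bankrupt and non-bankrupt firms when finding each variable's range.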
TABLE 2
Type I and Type II Errors and Misclassification Costs for Ohlson's Variables (columns: cost ratios; table body not recovered)

TABLE 3
Type I and Type II Errors and Misclassification Costs for Altman's Variables (columns: cost ratios; table body not recovered)

In general, Type I and Type II errors are unlikely to be equally costly. Unfortunately, there is no generally accepted basis for trading off Type I and Type II errors in bankruptcy prediction models. Different parties have different perceptions of the relative costs of Type I and Type II errors. For example, investors may be willing to tolerate a higher rate of Type II error in order to have a low rate of Type I errors. In contrast, management may prefer a lower rate of Type II errors to avoid the "self-fulfilling prophecies" when false bankruptcy signals actually result in bankruptcy filings. We calculate misclassification costs under three assumptions regarding the ratio of the cost of the Type I and Type II errors. For example, a cost ratio of 20 (0.05) indicates that a Type I (II) error is considered to be twenty times more costly than a Type II (I) error, and a cost ratio of one implies that the errors are equally costly. Misclassification cost is calculated as:

[(Type I error * %bankrupt firms * cost ratio) + (Type II error * %non-bankrupt firms)]
/ [(%bankrupt firms * cost ratio) + %non-bankrupt firms]

For example, in Table 2 the misclassification cost of 17.63 for the Back-Propagation neural network with a cost ratio of 0.05 is calculated as [(27.30*0.5*0.05) + (17.15*0.5)]/[(0.5*0.05) + 0.5].

In Tables 2, 3, and 4, the results labeled "Altman" and "Ohlson 9 (11)" represent performance benchmarks. These results are calculated using the variables and coefficients exactly as reported by Altman (1968) and Ohlson (1980) (see Table 1) but applied to our test samples.

With reference to Table 2, the Optimal Estimation Theory networks have the lowest level of Type I error, followed by the back-propagation networks and the traditional statistical techniques. The Predictive Cumulative and Pruned Back-Propagation neural networks and non-parametric discriminant analysis have similar levels of Type I error, while the standard Back-Propagation and Functional Link Back-Propagation With Sines neural networks have similar Type I errors compared to linear discriminant analysis, logit and probit. Type I error is high for quadratic discriminant analysis. However, this relationship is reversed for Type II error, where the traditional techniques have the lowest level, followed by the back-propagation networks. The Optimal Estimation Theory networks have much higher levels of Type II errors.

The Ohlson 9 variable benchmark has low Type I error and high Type II error, which is comparable to the Optimal Estimation Theory networks in both cases. The Altman and Ohlson 11 variable benchmarks have
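The misclassification-cost formula can be written out and checked against the worked example in the text (function and parameter names are illustrative; error rates are in percent):

```python
def misclassification_cost(type1, type2, pct_bankrupt, cost_ratio):
    """Weighted misclassification cost: the Type I error rate is
    weighted by the bankrupt proportion times the cost ratio, and
    the Type II error rate by the non-bankrupt proportion."""
    pct_non = 1.0 - pct_bankrupt
    num = type1 * pct_bankrupt * cost_ratio + type2 * pct_non
    den = pct_bankrupt * cost_ratio + pct_non
    return num / den

# Worked example from the text: Back-Propagation on Ohlson's
# variables, Type I = 27.30%, Type II = 17.15%, cost ratio 0.05.
cost = misclassification_cost(27.30, 17.15, 0.5, 0.05)  # ≈ 17.63
```

Note that at a cost ratio of one, the measure reduces to the simple average of the two error rates (with equal 50% class proportions), which is why the rankings at that ratio track overall error.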
TABLE 4
Type I and Type II Errors and Misclassification Costs for the Combined Set of Altman and Ohlson Variables (columns: cost ratios; table body not recovered)

relatively high Type I error (only quadratic discriminant analysis is higher) while the Type II error is moderate for the Ohlson 11 variables relative to the other techniques and higher for the Altman variables.

When the cost ratio is one, the Predictive Cumulative Back-Propagation neural network and non-parametric discriminant analysis have the lowest misclassification costs. As the cost ratio increases, the Optimal Estimation Theory networks have the lowest misclassification cost, while linear and non-parametric discriminant analysis, logit, and probit are favoured for the lowest cost ratios.

Table 3 summarizes the results for the Altman variables. The results differ from those based on the Ohlson variables reported in Table 2. Quadratic discriminant analysis has the lowest Type I error (4%), followed by the Optimal Estimation Theory networks (approx. 11%), logit and probit (23-25%), and then the back-propagation networks and the other two discriminant analysis techniques (30-41%). The relative results for Type II error are almost exactly reversed. When the cost ratio is one, logit and probit have the lowest misclassification costs. As the cost ratio increases (decreases), quadratic discriminant analysis and the Optimal Estimation Theory neural networks have the lowest (highest) costs.

Table 4 summarizes the results for the combined set of Altman's and Ohlson's variables. The Type I error is similar for many of the techniques (24.7-28.6%) except for quadratic discriminant analysis, the Optimal Estimation Theory neural network with 20 enhanced variables, and the Pruned Back-Propagation neural network. The Type II errors are similar for many techniques (11.2-16.7%) except quadratic discriminant analysis and the Optimal Estimation Theory neural networks.

A comparison of Tables 2 and 3 indicates that, generally speaking, all models based on Ohlson's variables are superior to those same models based on Altman's variables. That is, in most instances (20 out of 24 cells) the models based on Ohlson's variables have both Type I and Type II error lower in Table 2 than in Table 3. There are only four exceptions to this general pattern: Back-Propagation Type II error, Quadratic Discriminant Analysis Type I error, Logit Type I error and Probit Type I error. Also, when the cost ratio is 1, the models based on Ohlson's variables are superior in the sense of having a lower overall misclassification cost in all 12 of the cells in that column. When the cost ratios are 0.05 and 20, the models based on Ohlson's variables are superior in 11 and 9 of the 12 cells in their respective columns.

Comparison of Type I error and Type II error in Tables 2 and 4 shows that 12 of the 24 cells in Table 4 (which is based on the combined Altman and Ohlson variables) have superior performance compared to that reported in
corresponding cells in Table 2, which is based solely on Ohlson's variables. Some interesting additional observations about Table 4 are:
• the Type II error for all of the Back-Propagation networks is lower in Table 4 compared with Table 2;
• the Type I errors for the Back-Propagation neural network and the Functional Link Back-Propagation With Sines neural network are lower, but interestingly, the Type I errors for the Pruned Back-Propagation neural network and the Predictive Cumulative Back-Propagation neural network are significantly higher in Table 4 compared with Table 2;
• all three versions of the Optimal Estimation Theory neural networks are significantly inferior with the combined set of variables relative to the results using Ohlson's variables alone. This underlines the necessity of selecting appropriate variables to apply in a neural network setting and not simply leaving it to the neural network itself to sort out those variables.

TABLE 5
Coefficients of Variation for Type I and Type II Error Statistics Reported in Tables 2, 3, and 4 (table body not recovered)

We examined the sensitivity of results across the five replications by considering the coefficients of variation for the various models. As can be seen from an inspection of Table 5, there is a fair degree of variation in the results for the back-propagation models, and significant variation in Type I errors for the OET models based on Ohlson and Combined variables and the quadratic discriminant analysis models.

6. CONCLUSIONS

This study found that there are large differences in Type I error, Type II error, and misclassification costs among the neural networks tested. The Optimal Estimation Theory neural networks had the lowest rate of Type I error but the highest rate of Type II error. The Back-Propagation neural networks had high Type I error but lower rates of Type II error.

It is difficult to make blanket statements about the performance of the various techniques because there is a good deal of variation in their relative strengths and weaknesses according to which set of variables is used to fit the model, and whether Type I errors or Type II errors are of greatest interest. While the neural network models' performance is in line with that of the more conventional techniques such as discriminant analysis and logit/probit, it is noteworthy that their performance is not a dramatic improvement over those conventional techniques. The latter have the advantage of being fairly well understood