Comparison of Artificial Neural Network and Bayesian Belief Network in a Computer-Assisted Diagnosis Scheme for Mammography

Bin Zheng, Yuan-Hsiang Chang, Xiao-Hui Wang, Walter F. Good


Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15261-0001
Bzheng@radserv.arad.upmc.edu

0-7803-5529-6/99/$10.00 ©1999 IEEE

Abstract

Artificial neural networks (ANN) have been widely used in computer-assisted diagnosis (CAD) schemes as a classification tool to identify abnormalities in digitized mammograms. Because of certain limitations of ANNs, some investigators argue that Bayesian belief networks (BBN) may exhibit higher performance. In this study we compared the performance of an ANN and a BBN used in the same CAD scheme. The common databases and the same genetic algorithm (GA) were used to optimize both networks. The experimental results demonstrated that using GA optimization, the performance of the two networks converged to the same level in detecting masses from digitized mammograms. Therefore, in this study we concluded that improving the performance of CAD schemes might be more dependent on optimization of feature selection and diversity of training database than on any particular machine classification paradigm.

Key Words: Artificial neural network, Bayesian belief network, Genetic algorithm, Computer-assisted diagnosis.

I. Introduction

After intensive investigation for more than a decade by many research groups, a large number of computer-assisted diagnosis (CAD) schemes in mammography have been developed [1-10]. For mass detection, the CAD schemes usually adopt three steps to identify positive and negative mass regions. In the first step, they use different image segmentation methods to filter out basic tissue structure and select initial suspicious regions. The second step is to compute or extract features from each suspicious region. Then, the third step uses a classifier to identify positive and negative regions based on the set of extracted features. Although each scheme was independently developed using databases of limited size, most current CAD schemes yield similar performance (i.e., 85% to 90% sensitivity with 1 to 2 false-positive regions per image in mass detection). Unlike many other pattern recognition problems where the feature domain is reasonably limited and well defined (e.g., optical character recognition), the feature space in CAD of mammography is very complex, due to the wide diversity of normal tissue patterns and the variety of abnormalities. To improve CAD performance, a large number of features are usually extracted. Most of these features are neither visible nor understandable by human observers. It is very difficult to find the correlation and effectiveness of these features in identifying masses in mammograms. Thus, many machine learning (classification) methods have been tested to identify the positive mass regions based on a set of computed features. Artificial neural networks (ANN) are the most popular paradigm used as a machine learning classifier in current CAD schemes.

The ANN uses a "hill-climbing" approach to learn the correct response or output for each of the training samples. After training, the structure of the ANN has been self-organized to enable extrapolation when faced with new, yet similar, patterns, on the basis of "experience" with the training set. One attractive feature when using an ANN in a complex pattern recognition problem is that the required amount of a priori knowledge of the input features and internal system operation is minimal [11]. Although ANNs have the ability to learn complex patterns directly from observations, their reasoning process is inaccessible to human understanding and observers cannot be certain what the ANN has learned [12]. Because of this, it may be hard for physicians to accept and act on a computer system's advice without knowing the basis for the system's decision [13]. Furthermore, due to the "hill-climbing" optimization process of ANNs, possible data over-fitting may significantly deteriorate the robustness of ANN-based CAD schemes in a real clinical environment [14].

Bayesian belief networks (BBN) use a different training concept. A BBN is a causal probabilistic network that compactly represents the joint probability distribution of a problem domain by exploiting conditional dependencies. The BBN captures knowledge of a given problem domain in a natural and efficient way. A BBN can also explain its reasoning and can avoid the danger of data over-fitting [15]. Because of these unique characteristics, BBNs have been widely used in many machine learning applications [16]. In the area of computer-assisted diagnosis of breast cancer, some researchers have claimed that BBNs should perform much better and more reliably than ANNs [17]. Because in current CAD studies different schemes have been trained and tested using different databases, the performance of these schemes cannot be compared [18].

In this study, we used the same database to train both an ANN and a BBN. After optimizing the topologies of these networks using a genetic algorithm (GA), an independent database was used to test the performance and robustness of the ANN and the BBN. In this way, we can objectively compare the performance and robustness of the ANN and the BBN based CAD schemes developed in this experiment. A description of the approach, along with the preliminary experimental results derived from three independent databases involving a total of 1,557 images, is presented here.

II. Materials and Methods

2.1. Three independent databases

Three independently acquired image databases were used in this study. The first one was used for training the networks. The second was used to evaluate a fitness function in the GA, and the third was used to assess the performance and robustness of the optimized networks. The three databases, which were acquired from three different medical centers, included 545, 579, and 433 images. All of these 1,557 images were digitized in our laboratory using the same laser film digitizer (Lumisys 150) with a pixel size of 100 µm x 100 µm and 12-bit gray-level resolution. The digitized images were then sub-sampled by a factor of four in both directions to generate new images with sizes of approximately 600 x 450 pixels. In each database there is a mixture of images with and without mass regions. All masses were pathology verified. The locations of these verified mass regions in the original film mammograms were identified by expert radiologists in the different medical centers where the mammograms were acquired. Most of the "negative" images in these databases were considered to be difficult controls because they had dense breast parenchyma with highly fluctuated image features. It should be noted that because in this study we were only interested in mass detection, an image without a positive mass was considered as a "negative" image even though it might contain other abnormalities (e.g., microcalcification clusters).

The 1,557 images were individually processed by our multi-layer topographic based CAD scheme, which has been described elsewhere [3]. In brief, this scheme has three distinct stages for the identification of suspicious regions. The first stage of dual kernel filtering, subtraction, thresholding, and labeling resulted in the selection of a large number of suspicious regions (approximately 18 regions per image when applied to these image databases). Based on local contrast measurements, the second stage used an adaptive region growth algorithm to define three topographic layers for each suspicious region. In each growth layer, a set of simple intra-layer boundary conditions on the growth ratio and change of shape factor of the region was applied to eliminate a large number of initial suspicious regions (> 80%), which may include both positive and negative regions. Only the regions that successfully passed through three topographic growths were retained as suspicious regions for further classification. After the second stage, the number of suspicious regions (including both positive and negative regions) decreased to 5,560 (approximately 3.6 per image) in these 1,557 images. For each of these remaining regions, a set of image features was automatically computed by our CAD scheme. Using these features, in the third stage of the scheme, different classification tools based on nonlinear multi-layer feature analysis were incorporated to identify positive and negative mass regions. The classification tools that have been tested in our previous studies include a rule-based expert classifier [3], set enumeration trees [8], an ANN [19], and a BBN [20]. In this study, both an ANN and a BBN were used as classification tools. All suspicious regions identified by the second stage of the CAD scheme were included in the experimental databases.

Of these 5,560 regions, 772 depicted verified masses, while 4,788 suspicious regions were actually negative. With the exception of the suspicious regions that matched the verified masses, all other regions that had been identified by our CAD scheme in the second stage as suspicious were determined to be negative. No pathologic verification was available for the negative regions. In summary, there are 288, 172, and 312 positive mass regions in these three image databases, respectively. The CAD scheme detected 1,651, 1,876, and 1,261 negative (false-positive) regions in these three image databases. A feature vector extracted by the CAD scheme was used to represent each suspicious mass region. This feature vector contains 38 features. Within these features, 32 were computed from the interior of the region and its surrounding background, which were considered to be local features, while the other six were global features that represent the global tissue patterns of the breast. Definitions of these features and related computational methods using our CAD scheme have been described elsewhere [8,19,20].

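The database figures quoted in Section 2.1 can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the reported numbers; the variable names are illustrative, and the stage-one total is an estimate because the text gives "approximately 18 regions per image" rather than an exact count.

```python
# Figures quoted in Section 2.1, listed in the order the text gives them
# (training, evaluation, and testing databases).
images = [545, 579, 433]
positives = [288, 172, 312]
negatives = [1651, 1876, 1261]

total_images = sum(images)
total_positives = sum(positives)
total_negatives = sum(negatives)
total_regions = total_positives + total_negatives
regions_per_image = total_regions / total_images

# Approximate stage-one output: about 18 suspicious regions per image (assumed exact here).
stage_one_regions = 18 * total_images
eliminated_fraction = 1 - total_regions / stage_one_regions

# Sub-sampling 100 um pixels by a factor of four gives the effective pixel size.
effective_pixel_um = 100 * 4

print(total_images, total_regions, round(regions_per_image, 1))
print(round(eliminated_fraction, 3), effective_pixel_um)
```

Under these figures the totals agree with the text (1,557 images, 772 positive and 4,788 negative regions, 5,560 regions in all, about 3.6 per image), and the estimated share of stage-one regions eliminated comes out just above 80%.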
2.2. Topology of networks and GA initialization

In these experiments, all 38 features were used to train and test the classifiers. The topologies of the ANN and the BBN are different. The ANN has 38 input neurons, 16 hidden neurons, and one output neuron. The topology of the BBN was similar to the BBN that we have developed and tested for mass detection in our previous studies [20]. Basically, the six global features are located in the top layer of the network and comprise the "parent" nodes to the detection node (output for the mass identification). The 32 local features are "child" nodes located in a layer below the detection node. Unlike the ANN, where the input features are continuous data (e.g., from 0 to 1), in a BBN each node must be quantified to a fixed number of exclusive states. The continuous data for these features must be converted to discrete data. In this study, each feature was divided into five discrete states. The methods to convert these features into discrete states have been reported elsewhere [20]. Although each feature vector contains 38 features, many of them might be redundant. The redundant features used in the input nodes of an ANN or a BBN make very little contribution to information but add a lot of noise, which results in poor generalization for the networks. Even though the topology of a BBN can be interpreted, manual selection of independent and effective features is a difficult task. To find a small number of independent features and eliminate the redundant ones in this feature vector, a genetic algorithm (GA) was used to optimize the feature set and topologies of the ANN and the BBN. The GA software, GENESIS, was acquired from Prime Time Freeware for AI [21] and used in this study.

A GA solves a complex optimization problem by emulating the evolutionary concept that only the strongest survive. A population of possible chromosomes is created, evaluated, recombined, and mutated to generate more and different chromosomes. The best are kept as a basis for evolving better chromosomes. In general, a GA involves the steps of initialization, evaluation, selection, search, and termination [22]. Although the software, GENESIS, provides a basic program structure for optimization, users need to determine many detailed parameters and functions, such as encoding, fitness function, and evaluation criterion.

Based on their distributions in our databases, each of the 38 features was normalized to a range between 0 and 1. In the GA, a binary coded chromosome was used. Each feature corresponded to a gene in a chromosome (or a bit of the structure defined in GENESIS). In this binary coded chromosome, 1 indicates the presence of a gene (the feature is used as an input node) and 0 indicates its absence (the feature is rejected). In our experiments, all chromosomes have a fixed length of 38 (including six global features and 32 local features). The initial population size of the chromosomes was set as 50. In order to incorporate our experience in the feature selection and also to achieve a diverse initial population, about one third of the initial chromosomes were manually selected with a small number of bits of 1 (≤ 10), while the rest of the initial population was randomly assigned by the GA software. Meanwhile, the crossover rate, the mutation rate, and the generation gap used in the GA were set at 0.6, 0.001, and 1.0, respectively.

2.3. Optimization of feature selection

From the initial 38 features, we used the GA to select sub-sets of features, x_i, i = 1, 2, ..., n, where n ≤ 38. The selected features were then connected to the input neurons in the ANN and the probability nodes in the BBN. The number of hidden neurons (h_j, j = 1, 2, ..., m) in the ANN was determined as half of the input neurons, or m = (int)(0.5 + n/2). There is one neuron in the output layer to represent the result of mass detection. The detailed description of the ANN structure used in our CAD scheme was previously reported [19]. In this study, the number of training iterations was fixed at 1,000. The momentum and learning rate were set as 0.8 and 0.01, respectively. In the BBN, the features selected by the GA were located in the different nodes. If the selected feature was a global feature, it was placed in the top layer; otherwise the feature was connected to one of the "child" nodes in the BBN.

Because a GA is a task independent optimizer, users must provide or define a fitness function and an evaluation criterion, so that the GA has an optimization goal. In the experiments, the fitness function was the receiver operating characteristic (ROC) curve, and the evaluation criterion was the maximum area under the ROC curve (Az value). ROC curves were generated by the ROCFIT program [23], based on output data from the networks. Once the GA selected a chromosome, the corresponding set of features was extracted to be used in the networks. The training database was used to train the ANN by setting its weights or to train the BBN by computing the conditional probability table. After training the networks, the second database (or evaluation database) was used to examine the performance of the networks. The output values of evaluation were directly used as input data to compute an ROC curve. The ROCFIT program produced an Az value for each selected chromosome. Chromosomes with higher Az values have higher priority to be selected to generate new chromosomes in the next generation by the GA using the techniques of crossover and mutation. The GA was terminated when it had converged to a maximum Az value or the searching generation had reached 50. The two chromosomes that produced the highest Az values for the ANN and the BBN were selected to build an optimal ANN and an optimal BBN, respectively. The performance and robustness of these networks were compared using another independent testing database containing 312 true-positive mass regions and 1,261 false-positive regions. This test database was never involved in any of the optimization processes. Finally, we tested a hybrid classifier, in which feature vectors passed through the two networks separately, and the ultimate output was the average score from the outputs of the two networks.

III. Results

Using the same training database and all 38 features to train and test the ANN and BBN, we achieved Az values of 0.791 ± 0.012 and 0.783 ± 0.011 on the evaluation database involving 172 positive mass regions and 1,876 negative regions for the ANN and BBN, respectively.

Table 1 demonstrates that after GA optimization, the number of features selected in both the ANN and the BBN was significantly reduced. Less than half of the original 38 features were retained for these two networks. Although the features selected in the two networks were not exactly the same, the performance levels of the two networks converged to the same level.

Table 1: Optimization results for the ANN and the BBN.

Network   Number of local features   Number of global features   Az
ANN       12                         2                           0.866
BBN       14                         3                           0.868

In an independent test of these optimal networks, the results for the ANN and the BBN remained at the same level. For the ANN, the Az value was 0.847 ± 0.014, and for the BBN, the Az value was 0.845 ± 0.011. Finally, using a hybrid classifier containing both the ANN and the BBN, the Az value on the test database was increased to 0.859 ± 0.01.

IV. Discussion

Objectively evaluating CAD performance and robustness is a very complicated and difficult task [18]. The performance of a scheme depends on many factors, such as case difficulty in the training and testing databases [24], the size of the training database [19], validation methods [25], and the ground truth for the comparison [26]. Basically, CAD schemes that were developed at different institutions and optimized using different databases are not comparable [24]. In this study, we used the same database and the same optimization protocol to train and test two machine learning classifiers, an ANN and a BBN. Hence, the performance of these two networks can be objectively compared. When applied to a new independent database, the performance deterioration of a CAD scheme may be caused by two factors. The first is bias in the training samples due to the limited size of the training set compared to the diverse feature distributions in real clinical testing populations. The second is data over-fitting in certain learning algorithms, which makes classifiers much more sensitive to the noise patterns in the testing samples. The ANN and BBN represent two very typical and popular classifiers used in CAD. The ANN uses a black box "hill-climbing" method to search for the relationship between training samples and classification results. Both the sample bias and the data over-fitting have an impact on the robustness of the ANN. The BBN uses Bayesian probability theory to find the optimal relationship between the input features and output results. Although there is no data over-fitting problem in the BBN, bias in the learning samples generates incorrect probability tables that reduce the robustness of the BBN. As a result, the ANN usually outperforms the BBN in training results, but falls behind in testing new cases. In our experiments, several methods have been used to minimize over-fitting during ANN training, which include limiting the number of training iterations and maintaining a large ratio between momentum and learning rate [11]. The training iteration number was limited to 1,000 and the ratio between momentum and learning rate was set at 80.

Although data over-fitting is a potential danger to testing an ANN, by using a GA and setting an appropriate fitness criterion, the impact of over-fitting can be significantly reduced, as shown in this study. In both optimization and independent testing, the ANN and the BBN achieved the same performance level (Az value), which indicates that, in this experiment, the performance "deterioration" in independent testing is mainly caused by the bias in the training database. There is no significant difference between using an ANN and a BBN in our CAD scheme for mass detection, as long as each network has been properly optimized or trained. This study demonstrated that improving performance and robustness of CAD schemes might be more dependent on feature selection and database diversity than on any particular machine learning or classification algorithm.

V. Acknowledgements

This work is supported in part by the National Cancer Institute under the grants of CA77850 and CA79587, and US Army under grant DAMD17-98-1-8018.
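
The GA-based feature selection of Sections 2.2 and 2.3 can be sketched roughly as follows. This is a minimal illustration, not the GENESIS software used in the study: it reuses the stated parameters (chromosome length 38, population 50, crossover rate 0.6, mutation rate 0.001, at most 50 generations), but roulette-wheel selection, single-point crossover, and the `toy_fitness` function are assumptions made for the example. In the study the fitness was the Az value of a trained network on the evaluation database, and the run could also stop early on convergence, which this sketch does not implement.

```python
import random

N_FEATURES = 38        # six global + 32 local features
POP_SIZE = 50
CROSSOVER_RATE = 0.6
MUTATION_RATE = 0.001
MAX_GENERATIONS = 50


def hidden_neurons(n_inputs):
    """Hidden-layer size from the text: m = (int)(0.5 + n / 2)."""
    return int(0.5 + n_inputs / 2)


def initial_population(rng):
    """About one third sparse chromosomes (at most 10 bits set), the rest random."""
    population = []
    for k in range(POP_SIZE):
        if k < POP_SIZE // 3:
            # Stand-in for the manually chosen chromosomes, which are not published.
            chosen = set(rng.sample(range(N_FEATURES), rng.randint(1, 10)))
            population.append([1 if i in chosen else 0 for i in range(N_FEATURES)])
        else:
            population.append([rng.randint(0, 1) for _ in range(N_FEATURES)])
    return population


def select(population, scores, rng):
    """Fitness-proportional choice of one parent (an assumed selection scheme)."""
    return rng.choices(population, weights=scores, k=1)[0]


def evolve(fitness, seed=0):
    """Run the GA; `fitness` maps a chromosome to a positive score such as an Az value."""
    rng = random.Random(seed)
    population = initial_population(rng)
    best, best_score = None, float("-inf")
    for _ in range(MAX_GENERATIONS):
        scores = [fitness(c) for c in population]
        for c, s in zip(population, scores):
            if s > best_score:
                best, best_score = c[:], s
        next_population = []
        while len(next_population) < POP_SIZE:  # generation gap 1.0: replace everyone
            a = select(population, scores, rng)[:]
            b = select(population, scores, rng)[:]
            if rng.random() < CROSSOVER_RATE:
                cut = rng.randint(1, N_FEATURES - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for i in range(N_FEATURES):
                    if rng.random() < MUTATION_RATE:
                        child[i] ^= 1  # flip the bit
                next_population.append(child)
        population = next_population[:POP_SIZE]
    return best, best_score


# Hypothetical stand-in for "train a network, then measure Az on the evaluation set".
USEFUL = {0, 3, 7, 11, 19, 25, 30, 33, 35, 37}


def toy_fitness(chromosome):
    selected = [i for i, bit in enumerate(chromosome) if bit]
    hits = sum(1 for i in selected if i in USEFUL)
    return 0.5 + 0.03 * hits - 0.005 * len(selected)


best, best_score = evolve(toy_fitness)
print(sum(best), "features selected; hidden neurons:", hidden_neurons(sum(best)))
```

Replacing `toy_fitness` with a function that trains the ANN or the BBN on the selected features and returns its Az value on the evaluation database would give the loop described in the text. The hidden-layer size then follows from the number of selected features, for example 7 hidden neurons for the 14 features retained by the ANN in Table 1.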

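The evaluation criterion and the hybrid classifier of Section 2.3 can be illustrated with a small sketch. Note the substitution: the study fitted a binormal ROC curve with ROCFIT to obtain Az, whereas the code below computes the nonparametric (Mann-Whitney) area under the ROC curve, which is simpler but not identical. The scores are invented toy values chosen only to show how averaging two imperfect classifiers can raise the area; they are not results from the paper.

```python
def empirical_auc(pos_scores, neg_scores):
    """Chance that a positive region outscores a negative one (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hybrid_scores(ann_scores, bbn_scores):
    """Average the two network outputs region by region."""
    return [(a + b) / 2.0 for a, b in zip(ann_scores, bbn_scores)]


# Invented outputs for three positive and four negative regions.
ann_pos, ann_neg = [1.0, 0.75, 0.25], [0.5, 0.0, 0.25, 0.0]
bbn_pos, bbn_neg = [0.75, 0.25, 1.0], [0.0, 0.5, 0.0, 0.25]

auc_ann = empirical_auc(ann_pos, ann_neg)
auc_bbn = empirical_auc(bbn_pos, bbn_neg)
auc_hybrid = empirical_auc(hybrid_scores(ann_pos, bbn_pos),
                           hybrid_scores(ann_neg, bbn_neg))
print(auc_ann, auc_bbn, auc_hybrid)
```

In this toy case each network alone mis-ranks a different region, so the averaged score separates the classes better than either input. Any gain on real data depends on how far the two networks' errors differ.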
VI. References

1. Vyborny CJ, Giger ML, Computer vision and artificial intelligence in mammography, Am. J. Roentgen. 1994; 162:699-708.
2. Kegelmeyer WP, Pruneda JM, Bourland PD, Hillis A, Riggs MW, Nipper ML, Computer-aided mammographic screening for spiculated lesions, Radiology 1994; 191:331-337.
3. Zheng B, Chang YH, Gur D, Computerized detection of masses from digitized mammograms using a single image segmentation and a multi-layer topographic feature analysis, Acad Radiol 1995; 2:959-966.
4. Zhang W, Doi K, Giger ML, Nishikawa RM, Schmidt RA, An improved shift-invariant artificial neural network for computerized detection of clustered microcalcifications in digital mammograms, Med Phys 1996; 23:595-601.
5. Sahiner B, Chan HP, Wei D, Petrick N, Helvie MA, Adler DD, Goodsitt MM, Image feature selection by a genetic algorithm: application to classification of mass and normal breast tissue, Med Phys 1996; 23:1671-1684.
6. Li L, Qian W, Clarke LP, Computer-assisted diagnosis method for mass detection with multiorientation and multiresolution wavelet transforms, Acad Radiol 1997; 4:724-731.
7. Polakowski WE, Cournoyer DA, Rogers SK, Computer-aided breast cancer detection and diagnosis of masses using difference of Gaussians and derivative-based feature saliency, IEEE Trans Med Imaging 1997; 16:811-819.
8. Rymon R, Zheng B, Chang YH, Gur D, Incorporation of a set enumeration tree based classifier into a hybrid computer-assisted diagnosis scheme for mass detection, Acad Radiol 1998; 5:181-187.
9. Cheng HD, Lui YM, Freimanis, A novel approach to microcalcification detection using fuzzy logic technique, IEEE Trans Med Imaging 1998; 17:442-450.
10. Yu S, Guan L, Brown S, Automated detection of clustered microcalcifications in digitized mammogram film images, J. Electronic Imaging 1999; 8:76-82.
11. Schalkoff R, Pattern recognition: statistical, structural and neural approaches, John Wiley & Sons, Inc., New York, NY, 1992.
12. Diederich J, Explanation and artificial neural networks, Int. J. Man-Machine Stud. 1992; 37:335-341.
13. Teach RL, Shortliffe EH, An analysis of physician attitudes regarding computer-based clinical consultation systems, Comput. Biomed. Res. 1981; 14:542-548.
14. Chan HP, Sahiner B, Wagner RF, Petrick N, Effects of sample size on classifier design for computer-aided diagnosis, Proc SPIE 1998; 3338:845-858.
15. Mitchell TM, Machine learning, WCB McGraw-Hill, Boston, MA, 1997.
16. Michie D, Spiegelhalter DJ, Taylor CC, Machine learning, neural and statistical classification, Ellis Horwood, New York, NY, 1994.
17. Kahn CE, Roberts LM, Shaffer KA, Haddawy P, Construction of a Bayesian network for mammographic diagnosis of breast cancer, Comput. Biol. Med. 1997; 27:19-29.
18. Nishikawa RM, Variations in measured performance of CAD schemes due to database composition and scoring protocol, Proc SPIE on Med Imaging 1998; 3338:838-844.
19. Zheng B, Chang YH, Good WF, Gur D, Adequacy testing of training set sample sizes in the development of a computer-assisted diagnosis scheme, Acad Radiol 1997; 4:497-502.
20. Zheng B, Chang YH, Wang XH, Good WF, Gur D, Application of a Bayesian belief network in a computer-assisted diagnosis scheme for mass detection, Proc SPIE on Med Imaging 1999; 3661-167.
21. Kantrowitz M (editor), Prime time freeware for AI, Issue 1-1, Selected materials from the Carnegie Mellon University Artificial Intelligence Repository, 1994.
22. G, Genetic algorithms, in chapter 12 of The handbook of applied expert systems, edited by Liebowitz J, CRC Press, Boca Raton, FL, 1997.
23. Kronman HB, Wang PL, Shen JH, ROCFIT: a modified maximum likelihood algorithm for estimating a binormal ROC curve from confidence rating data, University of Chicago, Chicago, IL, 1985.
24. Nishikawa RM, Giger ML, Doi K, Effect of case selection on the performance of computer-aided detection schemes, Med Phys 1994; 21:265-269.
25. Tourassi GD, Floyd CE, The effect of data sampling on the performance evaluation of artificial neural networks in medical diagnosis, Med. Decis. Making 1997; 17:186-197.
26. Kallergi M, Carney GM, Gaviria J, Evaluating the performance of detection algorithms in digital mammography, Med. Phys. 1999; 26:267-275.

