Original Article
Published: 25 August 2012

Probabilistic neural network for breast cancer classification
Ahmad Taher Azar & Shaimaa Ahmed El-Said

Neural Computing and Applications, volume 23, pages 1737–1751 (2013)


Abstract
Breast cancer is the second leading cause of cancer death among women. To reduce the
high number of unnecessary breast biopsies, several computer-aided diagnosis systems have
been proposed in recent years. These systems help physicians decide whether to perform a
breast biopsy on a suspicious lesion seen in a mammogram or to perform a short-term follow-up
examination instead. In clinical diagnosis, artificial intelligence techniques such as
neural networks have shown great potential. In this paper, three classification
algorithms, the multi-layer perceptron (MLP), radial basis function (RBF) and probabilistic neural
network (PNN), are applied to the detection and classification of breast cancer.
Decision making is performed in two stages: training the classifiers with features from the
Wisconsin Breast Cancer database, and then testing. The performance of the proposed
structure is evaluated in terms of sensitivity, specificity, accuracy and ROC. The results
revealed that PNN was the best classifier, achieving accuracy rates of 100 and 97.66 % in
the training and testing phases, respectively. MLP ranked second and
achieved 97.80 and 96.34 % classification accuracy in the training and validation
phases, respectively, using the scaled conjugate gradient learning algorithm. RBF
performed better than MLP in the training phase, but achieved the lowest accuracy in
the validation phase.


Introduction
Breast cancer is a disease in which abnormal cells in the breast divide and multiply in an
uncontrolled fashion; it refers to the erratic growth and proliferation of cells that
originate in the breast tissue [53]. The
cells can invade nearby tissues and can spread through the bloodstream and lymphatic
system (lymph nodes) to other parts of the body. Early breast cancer usually does not cause
pain and may exhibit no noticeable symptoms. As the cancer progresses, signs and symptoms
can include a lump or thickening in or near the breast; a change in the size or shape of the
breast; nipple discharge, tenderness or retraction (turning inward); and skin irritation,
dimpling or scaliness. These changes can occur as part of many different conditions, however.
Breast cancer continues to be a significant public health problem in the world. Worldwide,
breast cancer comprises 22.9 % of all cancers in women [13]. In 2008, breast cancer caused
458,503 deaths worldwide (13.7 % of cancer deaths in women) [13]. In the west, earlier
research has demonstrated that one in nine women will develop breast cancer in their life,
and this risk has been further stratified according to age, with patients up to 25 years, 1 in
15,000; up to age 30, 1 in 1,900; and up to 40, 1 in 200 [61, 67]. In Egypt, breast cancer is the
most common cancer among women, representing 18.9 % of total cancer cases (35.1 % in
women and 2.2 % in men) among the Egypt National Cancer Institute (NCI) series of 10,556
patients during the year 2001 [30, 68], with an age-adjusted rate of 49.6 per 100,000
population. Currently there are no methods to prevent breast cancer, which is why early
detection represents a very important factor in cancer treatment and allows reaching a high
survival rate. Mammography is considered the most reliable method in early detection of
breast cancer. Due to the high volume of mammograms to be read by physicians, the accuracy
rate tends to decrease and automatic reading of digital mammograms becomes highly
desirable. That is why the computer-aided diagnosis systems are necessary to assist the
medical staff to achieve high efficiency and effectiveness.

Breast cancer diagnosis

Increased breast cancer awareness with breast self-examinations and major improvements in
routine breast cancer screening had a paramount effect on early detection of breast cancer.
Early diagnosis requires an accurate and reliable diagnosis procedure that allows physicians
to distinguish benign breast tumors from malignant ones without going for surgical biopsy.
Improvements range from conventional mammography (an X-ray technique) to multimodal breast
cancer diagnosis supported by automatic localization of small lesions that are visible only
in the mammograms or the MR image.

Breast cancer screening tests

A mammogram is the current gold standard imaging modality for breast cancer diagnosis,
providing detailed 2D projection images of the compressed breast. A diagnostic mammogram
is used to diagnose breast disease in women who have breast symptoms or an abnormal
result on a screening mammogram. Two types of mammography are known: screen-film
mammography (SFM) and digital mammography (DM). Screening mammograms are used to
look for breast disease in women who are asymptomatic; that is, those who appear to have no
breast problems. Screening mammograms usually take two views (X-ray pictures taken from
different angles) of each breast. For a mammogram, the breast is compressed between two
plates to flatten and spread the tissue. The compression only lasts a few seconds. The entire
procedure for a screening mammogram takes about 20 min. SFM (also known as conventional
mammography) has its known limitations, which include the variability of diagnosis among
the screening radiologists as well as limitations in the detection of benign tumors [38]. Hence,
DM was introduced as an alternative diagnostic technique in order to overcome the problems
of SFM. DM incorporates a new technique called computer-aided diagnosis (CAD), which
employs the tools of image processing for image enhancement and diagnosis [6, 20, 26, 27, 51].
Digital mammograms are recorded and stored on a computer. After the examination, the
doctor can view them on a computer screen and adjust the image size, brightness or contrast
to see certain areas more clearly. Subjectivity among screening radiologists in the
interpretation of mammograms results in a high percentage of misdiagnosed cancer cases and
a high percentage of missed cancer cases. This subjectivity is the end product of several
factors including radiologists’ fatigue, incompetence and lack of training, to name a few.
Mammograms show a projection of the breast that can be made from different angles. The
two most common projections are medio-lateral oblique and cranio-caudal (see Fig. 1).

Fig. 1

Mammograms of two oblique and two cranio-caudal films [90]


The advantage of the medio-lateral oblique projection is that almost the whole breast is
visible, often including lymph nodes. Part of the pectoral muscle will be shown in upper part
of the image, which is superimposed on a portion of the breast. The cranio-caudal view is
taken from above, resulting in an image that sometimes does not show the area close to the
chest wall. For certain women at high risk of breast cancer, screening magnetic resonance
imaging (MRI) is recommended along with a yearly mammogram [96, 97]. MRI is not generally
recommended as a screening tool by itself, because although it is a sensitive test, it may still
miss some cancers that mammograms would detect. MRI may also be used in other situations,
such as to better examine suspicious areas found by a mammogram. MRI can also be used in
women who have already been diagnosed with breast cancer to better determine the actual
size of the cancer and to look for any other cancers in the breast. In addition to providing 3D
images with excellent soft tissue contrast, MR imaging has the ability to show dynamic
functionality of the breast through the use of contrast agent injections [84]. MRI scans use
magnets and radio waves instead of X-rays to produce very detailed, cross-sectional images of
the body (see Fig. 2). Clinical breast MR imaging is performed with the use of a contrast agent,
referred to as contrast-enhanced (CE) imaging. MRI contrast agents contain gadolinium, a
paramagnetic material that affects the magnetic response of nearby atomic nuclei and causes
them to appear brighter on the resulting image [28].

Fig. 2
Breast cancer screening by magnetic resonance imaging (MRI)
(www.Breast.Cancer.org)


Ultrasound, also known as sonography [4], is an imaging method in which sound waves are
used to look inside a part of the body. For this test, a small, microphone-like instrument, called
a transducer, is placed on the skin (which is often first lubricated with ultrasound gel). It emits
sound waves and picks up the echoes as they bounce off body tissues. The echoes are
converted by a computer into a black-and-white image that is displayed on a computer screen.
This test is painless and does not expose the patient to radiation. Some studies have suggested
that ultrasound may be a helpful addition to mammography when screening women with
dense breast tissue (which is hard to evaluate with a mammogram), but the use of ultrasound
instead of mammograms for breast cancer screening is not recommended [5, 22, 86, 98].
Ultrasound images in Fig. 3 reveal a hypoechoic, poorly defined, irregular mass in the breast.
There is also evidence of acoustic shadowing posteriorly. These findings on sonography
suggest malignant mass of the breast.

Fig. 3

Ultrasound images of breast malignant lesion (ultrasound image gallery
http://www.ultrasound-images.com; images courtesy of Dr. Nirmali Dutta, UAE)


Diagnosis difficulties

Digital mammograms are among the most difficult medical images to be read due to their low
contrast and differences in the types of tissues. Important visual clues of breast cancer include
preliminary signs of masses and calcification clusters. Also, tumors are of different shapes,
and some of them have the characteristics of the normal tissue. All these reasons make the
decisions that are made on such images more difficult. Unfortunately, in the early stages of
breast cancer, these signs are very subtle and varied in appearance, making diagnosis
difficult, challenging even for specialists. A false-positive detection may cause an unnecessary
biopsy. Statistics show that only 20–30 % of breast biopsy cases are proved cancerous. In a
false-negative detection, an actual tumor remains undetected, which can lead to higher costs or
even the loss of a human life. This is the trade-off inherent in developing a
classification system that directly affects human lives. In addition, the existing tumors are
of different types.

Problem statement

Although mammography is considered the most reliable method in early detection of breast
cancer, the high volume of mammograms to be read by physicians reduces the accuracy rate.
Accurate diagnosis of mammogram image data is not an easy task and is always time
consuming. In extreme scenarios, an incorrect diagnosis, or a delay in delivering a
correct one, can occur because of the complexity of the cognitive process
involved. Hence, automatic reading of digital mammograms becomes highly desirable. The
use of artificial intelligent techniques such as artificial neural networks (ANN) has shown
great potential in this field. With the involvement of soft computing, pattern matching,
classification and detection algorithms with direct applications to many medical
problems have become much easier to implement. Hence, this paper investigates
the efficiency of the delta-rule NN, Parzen PNN and RBF techniques for breast
cancer classification and detection. These systems classify the digital mammograms
in two categories: normal and abnormal. The normal ones are those characterizing a healthy
patient. The abnormal ones include both benign cases, representing mammograms showing a
tumor that is not formed by cancerous cells, and malignant cases, those mammograms taken
from patients with cancerous tumors.

The rest of the paper is organized as follows: Sect. 2 depicts the related work. Subjects and
methods are presented in Sect. 3. The description of the neural network classification systems is
presented in Sect. 4. The performance analysis and comparison of the classification systems
are introduced in Sect. 5. Finally, the conclusion is presented in Sect. 6.

Related work
Artificial neural networks (ANN) possess a variety of alternative features such as massive
parallelism, distributed representation and computation, generalization ability, adaptability
and inherent contextual information processing [12, 23, 37, 39, 44, 63, 76]. An ANN is an
information processing system that roughly replicates the behavior of a human brain by
emulating the operations and connectivity of biological neurons [93]. In many fields of clinical
medicine and biomedical engineering, artificial neural networks (ANN) have been used
successfully to solve complex and chaotic problems without the need of mathematical models
and a precise understanding of the mechanisms involved [21, 55, 71, 75, 78, 92, 94]. When NNs
are used in medical diagnosis, they are not affected by factors such as human fatigue, emotional
states and habituation. They are capable of rapid identification, analysis of conditions and
diagnosis in real time. This paper extends much of the earlier work on breast cancer
classification using ANN [1–3, 15, 17, 19, 24, 31, 33, 35, 41, 46, 50, 52, 54, 58–60, 62, 65, 66, 77, 79, 85, 89, 91].

Recently, Salim et al. [81] have developed an approach in breast cancer diagnosis by using
hybrid magnetoacoustic method (HMM) and artificial neural network. The result shows the
advantages of HMM outputs in providing a combination of bioelectric and acoustic
information of tissue for a better breast cancer diagnosis consideration. An extensive survey
is carried out by Pradhan and Sahu [73] on artificial neural network–based cancer prediction
using soft computing approach. They have attempted to explain, compare and assess the
performance of different soft computing techniques that are being applied to cancer
prediction and prognosis. A comparative study is performed by Padmavati [69] on Wisconsin
breast cancer (WBC) dataset for breast cancer prediction using radial basis function (RBF) and
multi-layer perceptron (MLP) along with logistic regression. Logistic regression was
performed in the SPSS package, and the MLP and RBF were constructed
in MATLAB. It was observed that the neural networks took slightly more time than logistic
regression, but the sensitivity and specificity of both neural network models had a better
predictive power compared with logistic regression. Kala et al. [49] have proposed modular
evolutionary neural network architecture for breast cancer diagnosis. The performance with
the use of evolutionary neural networks was 96.28 % on the training data and 95.78 % on the
testing data. The major limitation of this work is that all simulations are restricted to a single
database of breast cancer. This database is not very scalable in nature. Hence, the
experimentation needs to be done on different databases with more attributes and data
recordings. Sarvestan et al. [83] have provided a comparison analysis among the capabilities
of various neural networks such as multi-layer perceptron (MLP), self-organizing map (SOM),
RBF and probabilistic neural network (PNN) that are used to classify breast cancer data. The
performance of these neural network structures was investigated for breast cancer diagnosis
problem. RBF and PNN were proved as the best classifiers in the training set. But the PNN
gave the best classification accuracy when the test set is considered. This work showed that
statistical neural networks can be effectively used for breast cancer diagnosis by applying
several neural network structures. Islam et al. [42] have presented an automatic mass
classification into benign and malignant based on the statistical and textural features
extracted from mass from the breast region using ANN. Using the proposed ANN classifier,
sensitivity of 90.91 % and specificity of 83.87 % were achieved. Rani [74] has proposed a
parallel approach by using neural network technique to help in the diagnosis of breast cancer.
The neural network is trained with breast cancer database by using feed-forward neural
network model and back-propagation learning algorithm with momentum and variable
learning rate. It was observed that 92 % test data are correctly classified and 8 % are
misclassified because of the analog conversion to digital conversion of dataset. Janghel et al.
[45] have implemented four models of neural networks: back-propagation, RBF, learning
vector quantization (LVQ) and competitive learning (CL) network. The experimental analysis
showed that LVQ has the best performance in the testing dataset followed by CL, MLP and
RBF.

Subjects and methods


In this comparative study, the medical data related to breast cancer is considered. This
database was obtained from the University of Wisconsin Hospital, Madison, from Dr. William
H. Wolberg [56, 57, 95]. Wisconsin breast cancer database can be used to predict the severity
(benign or malignant) of a mammographic mass lesion from the patient attributes. It consists
of nine input real variables, two output classes and 699 cases, of which 458 are diagnosed as
benign and the remaining 241 are malignant, as identified on full-field digital
mammograms. After 16 instances are removed from the dataset due to
missing values, there are 683 instances, of which 444 are benign (65.01 %) and the remaining
239 (34.99 %) are diagnosed as malignant. There are 10 attributes in total (one class and nine
numeric features): clump thickness, uniformity of cell size, uniformity of cell shape,
marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli
and mitoses. It is obvious that there is a close relationship between these characteristics and
identification of benign and malignant cells. The data were rescaled to the range [0, 1] to
prevent features with a larger numeric range from dominating those with a smaller range;
the data should therefore be normalized before being fed to the classifier.
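The rescaling described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' preprocessing code; the example values only mimic the 1–10 integer range of the WBC features.

```python
import numpy as np

def min_max_rescale(X):
    """Rescale each feature column to [0, 1] so that features with a
    larger numeric range cannot dominate those with a smaller range."""
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0)
    spans = X.max(axis=0) - mins
    spans[spans == 0] = 1.0  # guard against constant columns
    return (X - mins) / spans

# Illustrative values only: the nine WBC features are integers in 1..10
X = np.array([[1, 10, 4],
              [5, 1, 7],
              [10, 5, 1]])
print(min_max_rescale(X))
```

After rescaling, every column spans exactly [0, 1], so distance-based classifiers such as RBF and PNN weight all features comparably.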

Classifier models description


This section provides detailed description for three common ANNs classifiers: multi-layer
perceptron (MLP), radial basis function (RBF) and probabilistic neural networks (PNNs). MLPs
are very common, but require a large amount of time to train and assign weights to the
neurons. RBFs are similar to MLPs in that the neurons have weights, but they have fewer
weights to train and each neuron is assigned a distribution. PNNs are similar to RBFs in that
each neuron is assigned a distribution, but there are no weights to train, making them
relatively fast compared with RBFs and MLPs [11].

Multi-layer perceptron (MLP)

Multi-layer perceptron (MLP) is one of the most frequently used neural network architectures
in biomedical applications, and it belongs to the class of supervised neural networks [14]. It
performs a humanlike reasoning, learns the attitude and stores the relationship of the
processes on the basis of a representative dataset that already exists. The attraction of MLP
can be explained by the ability of the network to learn complex relationships between input
and output patterns, which would be difficult to model with conventional methods. It consists
of a network of nodes arranged in layers. A typical MLP network consists of three or more
layers of processing nodes: an input layer, one or more hidden layers and an output layer (see
Fig. 4). The input layer distributes the inputs to subsequent layers. Input nodes have linear
activation functions (AFs) and no thresholds. Each hidden-unit node and each output node
have thresholds associated with them in addition to the weights. The hidden-unit nodes have
nonlinear AFs, and the outputs have linear AFs. Hence, each signal feeding into a node in a
subsequent layer has the original input multiplied by a weight with a threshold added and
then is passed through the AFs that may be linear or nonlinear (hidden units). Note that
unlike other layers, no computation is involved in the input layer. The principle of the
network is that when data are presented at the input layer, the network nodes perform
calculations in the successive layers until an output value is obtained at each of the output
nodes. This output signal should be able to indicate the appropriate class for the input data.

Fig. 4

Multi-layer perceptron (MLP) neural network


As shown in Fig. 4, the input to the jth hidden layer neuron is given by:
$$\mathrm{net}_j = \sum_{i=1}^{N_i} w_{ji}\, x_i - \theta_j, \qquad 1 \le j \le N_H \tag{1}$$

where $\mathrm{net}_j$ is the weighted sum of the inputs $x_1, x_2, \ldots, x_{N_i}$, $\theta_j$ is the bias, and $w_{ji}$ is the connection
weight between the input $x_i$ and the hidden neuron $j$. The output of the jth hidden neuron is
expressed by:

$$y_j = f(\mathrm{net}_j) \tag{2}$$

The logistic (sigmoid) function is a common choice of the AFs in the hidden layers, as defined
in Eq. (3):

$$f(\mathrm{net}_j) = \frac{1}{1 + e^{-s\,\mathrm{net}_j}} \tag{3}$$

where $s$ is the slope of the sigmoid function; setting $s$ to less than unity makes the slope
shallower, with the effect that the output will be less clear. The bias term $\theta_j$
shifts the sigmoid AF to the left or right, depending on whether $\theta_j$ takes a
positive or negative value. This function approaches 1 for large positive values of $\mathrm{net}_j$ and 0
for large negative values of $\mathrm{net}_j$ [48, 82]. The input to the kth output neuron is given by:

$$\mathrm{net}_k = \sum_{j=1}^{N_H} y_j\, w_{jk} \tag{4}$$

The output of the whole neural network is given by:

$$y_k = f(\mathrm{net}_k) \tag{5}$$

The overall performance of the MLP is measured by the mean square error (MSE) expressed
as:
$$E = \frac{1}{N}\sum_{p=1}^{N_v} E_p = \frac{1}{N}\sum_{p=1}^{N_v}\sum_{k=1}^{M}\bigl(t_k(p) - y_k(p)\bigr)^2 \tag{6}$$

where $E_p$ corresponds to the error for the pth pattern, $N$ is the number of training patterns, $t_k$
is the desired target of the kth output neuron for the pth pattern, and $y_k$ is the output of the
kth output neuron from the trained network.
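A single forward pass through Eqs. (1)–(6) can be sketched in NumPy. This is an illustrative sketch, not the authors' MATLAB network; the layer sizes (4 hidden units, 1 output) and random weights are assumptions for the example.

```python
import numpy as np

def sigmoid(net, s=1.0):
    # Eq. (3): logistic activation with slope s
    return 1.0 / (1.0 + np.exp(-s * net))

def mlp_forward(x, W_h, theta, W_o):
    """Single forward pass of a three-layer MLP.
    W_h: (N_H, N_i) input-to-hidden weights, theta: (N_H,) biases,
    W_o: (M, N_H) hidden-to-output weights."""
    net_j = W_h @ x - theta      # Eq. (1): weighted sum minus bias
    y_j = sigmoid(net_j)         # Eqs. (2)-(3): hidden-layer outputs
    net_k = W_o @ y_j            # Eq. (4): input to output neurons
    return sigmoid(net_k)        # Eq. (5): network output

def mse(T, Y):
    # Eq. (6): mean over patterns of the summed squared output errors
    return np.mean(np.sum((T - Y) ** 2, axis=1))

rng = np.random.default_rng(0)
W_h = rng.normal(size=(4, 9))    # N_H = 4 hidden units, N_i = 9 WBC features
theta = rng.normal(size=4)
W_o = rng.normal(size=(1, 4))    # M = 1 output (benign/malignant score)
y = mlp_forward(rng.random(9), W_h, theta, W_o)
print(y)
```

Training would then adjust `W_h`, `theta` and `W_o` to minimize `mse`, e.g. with the scaled conjugate gradient algorithm discussed later in the paper.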

Radial basis functions (RBF)

The construction of a radial basis function network in its most basic form involves three
entirely different layers. The input layer is made up of source nodes. The second layer is a
hidden layer of high enough dimension, which serves a different purpose from that in a multi-
layer perceptron. The output layer supplies the response of the network to the activation
patterns applied to the input layer. The transformation from the input space to the hidden-
unit space is nonlinear, whereas the transformation from the hidden-unit space to the output
space is linear. A static Gaussian function is used as the nonlinearity for the hidden layer
processing elements (see Fig. 5). The Gaussian function responds only to a small region of the
input space where the Gaussian is centered [16].

Fig. 5
Example of radial basis function neural networks [11]


The Gaussian activation function for RBF networks is given by:


$$\varphi_j(X) = \exp\!\left[-(x - \mu_j)^{T}\,\Sigma_j^{-1}\,(x - \mu_j)\right] \tag{7}$$

for $j = 1, \ldots, L$, where $X$ is the input feature vector, $L$ is the number of hidden units, and $\mu_j$ and $\Sigma_j$
are the mean and the covariance matrix of the jth Gaussian function. The mean vector $\mu_j$
represents the location, while $\Sigma_j$ models the shape of the activation function. Statistically, an
activation function models a probability density function, where $\mu_j$ and $\Sigma_j$ represent the first-
and second-order statistics. The output layer implements a weighted sum of the hidden-unit
outputs:
$$\psi_k(X) = \sum_{j=1}^{L} \lambda_{jk}\,\varphi_j(X) \tag{8}$$

for $k = 1, \ldots, M$, where $\lambda_{jk}$ are the output weights, each corresponding to the connection between
a hidden unit and an output unit, and $M$ represents the number of output units. The weights
$\lambda_{jk}$ show the contribution of a hidden unit to the respective output unit. In a classification
problem, if $\lambda_{jk} > 0$, the activation field of the hidden unit $j$ is contained in the activation field of
the output unit $k$. In pattern classification applications, the output of the radial basis function
is limited to the interval (0, 1) by a sigmoidal function:

$$Y_k(X) = \frac{1}{1 + \exp[-\psi_k(X)]} \tag{9}$$

The advantage of the radial basis function network is that it finds the input to output map
using local approximators. Usually the supervised segment is simply a linear combination of
the approximators. Since linear combiners have few weights, these networks train extremely
fast and require fewer training samples.
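Eqs. (7)–(9) can be sketched compactly in NumPy. This is a minimal sketch, not the paper's implementation; it assumes spherical covariances ($\Sigma_j = \sigma_j^2 I$), a common simplification, and the centers, widths and output weights below are made up for illustration.

```python
import numpy as np

def rbf_forward(x, mus, sigmas, lam):
    """Forward pass of a small RBF classifier.
    mus: (L, d) Gaussian centers; sigmas: (L,) widths, assuming
    spherical covariance Sigma_j = sigma_j^2 * I; lam: (M, L) weights."""
    diff = x - mus                                           # (L, d) offsets from centers
    phi = np.exp(-np.sum(diff ** 2, axis=1) / sigmas ** 2)   # Eq. (7): local Gaussian responses
    psi = lam @ phi                                          # Eq. (8): linear combiner
    return 1.0 / (1.0 + np.exp(-psi))                        # Eq. (9): squash to (0, 1)

mus = np.array([[0.0, 0.0], [1.0, 1.0]])   # L = 2 hidden units
sigmas = np.array([0.5, 0.5])
lam = np.array([[2.0, -2.0]])              # M = 1 output unit
print(rbf_forward(np.array([0.0, 0.0]), mus, sigmas, lam))
```

Because each Gaussian responds only near its center, an input close to the first center excites mainly the positively weighted unit, pushing the output toward 1; an input near the second center pushes it toward 0.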

Probabilistic neural networks (PNN)

Probabilistic neural networks can be used for classification problems. When an input is
presented, the first layer computes distances from the input vector to the training input
vectors and produces a vector whose elements indicate how close the input is to a training
input [8]. The second layer sums these contributions for each class of inputs to produce as its
net output a vector of probabilities. Finally, a compete transfer function on the output of the
second layer picks the maximum of these probabilities and produces a 1 for that class and a 0
for the other classes [8]. The Parzen probabilistic neural network (PPNN) is a simple type of
neural network used to classify data vectors. These classifiers are really fast, simple to learn
and based on the Bayesian theory where a posteriori probability density function is estimated
from data using the Parzen window technique [88]. The good classification performance can
be obtained for a certain class of data distributions. The Bayesian classifiers use the Bayesian
equation to estimate the posteriori probability density P(w i |x):

$$P(w_i \mid x) = \frac{P(x \mid w_i)\,P(w_i)}{\sum_i P(x \mid w_i)\,P(w_i)} \tag{10}$$

Obviously, this method requires the probabilities $P(x \mid w_i)$ and $P(w_i)$ to be known. One technique is
to parameterize these pdfs; another is to estimate them from the data. The Parzen window
technique estimates the probability defining a window (given the window size) and a function
on this window (i.e., a hypersphere with the Gaussian function truncated inside). This
obviously requires that the window function must have the integral (the hyper volume under
the function) equal to 1 to maintain the scale in the estimated pdf. The PPNN is a simple tool
that is the composition of the pdf estimation with the Parzen window and the Bayesian
classification. In this quick explanation, the particular derivations are not reported, but can be
found in Duda et al. [29]. A Parzen PNN is a four-layer neural network (NN) where the input
data are fully connected with the first neuron layer and the first layer is sparsely connected
with the second (and output) layer (see Fig. 6). The output layer is composed of c neurons
where c is the number of classes of the classifier. The weights on the first layer are trained as
follows: each sample datum is normalized so that its length becomes unitary; each sample
datum then becomes a neuron with the normalized values as its weights w. The input data x is
dot-multiplied by the weights, giving the network activation signal (net) as in Eq. (11).

Fig. 6

Architecture of a four-layered PNN for n training cases of two classes


$$\mathrm{net} = w^{T} x \tag{11}$$

Then, the exponential nonlinearity is computed to obtain the synaptic activation signals (act)
as in Eq. (12).

$$\mathrm{act} = \exp\!\left[\operatorname{sigm}\!\left(\frac{\mathrm{net}+1}{2}\right)\right] \tag{12}$$

During the learning process, each first-layer neuron is connected to the output layer neuron
related to its class with weight 1. During the classification process, the output neuron of each
class sums the activation signals from all the neurons of the first layer. Simply, the highest
output value selects the class of the input data.

Learning algorithm of Parzen PNN

The pattern layer contains one pattern neuron for each training case, with an exponential
activation function. A pattern neuron N i computes the squared Euclidean distance between a
new input vector X r and the ith training vector of the jth class. This distance is then
transformed by the neuron’s activation function (the exponential). In the PNN of Fig. 6, the
training set comprises cases belonging to two classes: A and B. In total, m training cases belong
to class A [10]. The associated pattern neurons are N 1,…,N m . For example, the neuron N 3
contains the third training case of class A. Class B contains n − m training cases; the associated
pattern neurons are N m+1,…,N n . For example, the neuron N m+2 contains the second training
case of class B. For each class, the summation layer contains a summation neuron. Since we
have two classes in this example, the PNN has two summation neurons. The summation
neuron for class A sums the output of the pattern neurons that contain the training cases of
class A. The summation neuron for class B sums the output of the pattern neurons that
contain the training cases of class B. The activation of the summation neuron for a class is
equivalent to the estimated density function value of this class. The summation neurons feed
their result to the output neurons in the output layer [10]. The training of PNN involves no
heuristic searches, but consists essentially of incorporating the training cases into the pattern
layer. However, finding the best smoothing factor for the training set remains an optimization
problem. PNNs tolerate erroneous samples and outliers. Sparse samples are adequate for the
PNN. When new training data become available, PNNs do not need to be reconfigured or
retrained from scratch; new training data can be incrementally incorporated into the pattern
layer.
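The pattern/summation/output structure described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' network: it uses the equivalent Gaussian-kernel-on-squared-distance form of the pattern layer described in this subsection (rather than the normalized dot product of Eq. 11), and the smoothing factor `sigma` and the toy training points are assumptions.

```python
import numpy as np

def pnn_classify(x, train_X, train_y, sigma=0.5):
    """Parzen PNN decision: one pattern neuron per training case, one
    summation neuron per class, and a compete (argmax) output layer.
    The smoothing factor sigma must still be tuned for the dataset."""
    d2 = np.sum((train_X - x) ** 2, axis=1)   # squared Euclidean distances to all training cases
    act = np.exp(-d2 / (2.0 * sigma ** 2))    # exponential pattern-neuron activations
    classes = np.unique(train_y)
    scores = np.array([act[train_y == c].sum() for c in classes])  # summation layer per class
    return classes[np.argmax(scores)]         # highest summed activation selects the class

# "Training" is just storing the cases; incremental learning appends rows.
train_X = np.array([[0.1, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]])
train_y = np.array([0, 0, 1, 1])
print(pnn_classify(np.array([0.15, 0.05]), train_X, train_y))
```

Note how the incremental-training property falls out of the structure: adding a new labeled case is simply adding one row to `train_X` and `train_y`, with no retraining step.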

Results and discussion


Performance analysis

The performance of each ANN classifier was evaluated by using performance indices such as
sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV),
accuracy and F-measure. Receiver operating characteristic, ROC, curves are also used to
evaluate the performance of a diagnostic test [43]. This method conveys a great deal of information
for understanding and improving classifier performance. The ROC curve is based on
sensitivity and specificity, which are, respectively, the number of true-positive decisions over the
number of actually positive cases and the number of true-negative decisions over the number of
actually negative cases. This method leads to the measurement of diagnostic test
performance [70]. The advantages of ROC analysis are a robust description of the network's
predictive ability and an easy way to adapt the existing network to differential costs
of misclassification and varying prior probabilities of class occurrence. However, it requires
visual inspection, because the best classifiers are hard to recognize when the curves are
mixed.
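The performance indices listed above all derive from the 2x2 confusion matrix. As a reference, a minimal sketch (the counts in the example call are made up for illustration):

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Standard performance indices from a 2x2 confusion matrix
    (tp/fp = true/false positives, tn/fn = true/false negatives)."""
    sensitivity = tp / (tp + fn)                 # true-positive rate (recall)
    specificity = tn / (tn + fp)                 # true-negative rate
    ppv = tp / (tp + fp)                         # positive predictive value (precision)
    npv = tn / (tn + fn)                         # negative predictive value
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {"sensitivity": sensitivity, "specificity": specificity,
            "PPV": ppv, "NPV": npv, "accuracy": accuracy, "F": f_measure}

print(diagnostic_metrics(tp=40, tn=50, fp=5, fn=5))
```

An ROC curve is then traced by sweeping the classifier's decision threshold and plotting sensitivity against 1 − specificity at each threshold.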

Training phase of classifiers

The classification process starts by obtaining a dataset (input–output data pairs) and dividing
it into a training set and validation dataset. To avoid overfitting problems during modeling
process, k-fold cross-validation was used for better reliability of test results [34]. In k-fold
cross-validation, the original sample is randomly partitioned into k subsamples. A single
subsample is retained as the validation data for testing the model, and the remaining k − 1
subsamples are used as training data. The cross-validation process is then repeated k times
(the “folds”), with each of the k subsamples used exactly once as the validation data. The
average of the k results gives the validation accuracy of the algorithm [25]. In this study,
tenfold cross-validation was used.
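The partitioning scheme described above can be sketched in a few lines of Python. The fold construction here is a generic illustration of k-fold cross-validation, not the authors' implementation:

```python
import random

def kfold_indices(n_samples, k, seed=0):
    """Randomly partition sample indices into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = kfold_indices(n_samples=444, k=10)

# Each fold serves once as the validation set; the remaining k-1 folds
# form the training set. The average of the k scores gives the
# cross-validated accuracy.
for val_idx in folds:
    train_idx = [j for f in folds if f is not val_idx for j in f]
    # ... train the classifier on train_idx, evaluate on val_idx ...

print(len(folds), sum(len(f) for f in folds))  # 10 444
```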

During training of MLP, the weights and biases of the network are iteratively adjusted to
minimize the network performance function (see Eq. 6). Back-propagation (BP) is widely used
for classification modeling by using the concept of MLP training and testing [80]. However, the
major disadvantages of standard BP are its relatively slow convergence rate and its tendency
to become trapped in local minima [100]. Back-propagation using gradient descent often converges
very slowly or not at all. On large-scale problems, its success depends on user-specified
learning rate and momentum parameters. There is no automatic way to select these
parameters, and if incorrect values are specified, the convergence may be exceedingly slow,
or it may not converge at all. While back-propagation with gradient descent is still used in
many neural network programs, it is no longer considered to be the best or fastest algorithm.
Therefore, many powerful optimization algorithms based on conjugate gradient algorithms
have been devised for faster convergence than steepest descent direction methods [7, 9, 32, 36,
40, 47, 72]. However, these methods require a line search at each iteration. This line search is
computationally expensive, because it requires that the network response to all training
inputs be computed several times for each search. Therefore, the scaled conjugate gradient
algorithm (SCG), developed by Moller [64], was designed to avoid the time-consuming line
search per learning iteration, which makes the algorithm faster than other second-order
algorithms. SCG also yielded better results than the other training methods and neural
networks tested, such as a standard back-propagation neural network.

To choose the best architecture of the MLP neural network, it was trained and tested with
different configurations. A three-layer model (one input, one hidden and one output layer) is
recommended for MLP. The number of neurons in the hidden layer was varied
systematically from 2 to 20 in steps of one while monitoring the classification error. All
networks were trained in batch mode. Different topologies with nine inputs and three layers
(9-X-2) were trained sequentially, using log-sigmoid (logsig) activation functions (AFs) in the
hidden layer and linear AFs in the input layer. The weights of the network with the minimum
training error were saved as the final network structure. The optimum topology was found
experimentally to be 9-7-2, as shown in Fig. 7: seven neurons in the hidden layer resulted in
the lowest misclassification rate of 2.9283 %. The training parameters of the MLP are
summarized in Table 1.
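The topology search just described amounts to a simple model-selection loop. The sketch below shows its structure; `train_and_score` is a hypothetical stand-in for training a 9-X-2 network and returning its misclassification percentage, simulated here with a made-up error curve for illustration:

```python
def train_and_score(n_hidden):
    # Hypothetical stand-in: in the study this would train a 9-X-2 MLP
    # (batch mode, SCG learning) and return its misclassification percentage.
    # A made-up convex error curve with its minimum at 7 hidden neurons:
    return 2.9283 + 0.5 * (n_hidden - 7) ** 2

# Sweep the hidden layer size from 2 to 20 in steps of one,
# keeping the configuration with the lowest error.
errors = {h: train_and_score(h) for h in range(2, 21)}
best_h = min(errors, key=errors.get)
print(best_h, errors[best_h])  # 7 2.9283
```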

Fig. 7
The optimum topology of learning algorithm for MLP


Table 1 MLP training parameters



The training phase of RBF starts by determining the number of neurons in the hidden
layer, the coordinates of the center of each hidden-layer neuron, the radius (spread) of each
RBF function in each dimension and the weights applied to the RBF function outputs as they
are passed to the summation layer [87]. The training parameters of the RBFNN are summarized
in Table 2. The optimum number of neurons in the hidden layer was determined using an
evolutionary method called repeating weighted boosting search (RWBS), developed by Chen et
al. [18]. This algorithm determines the optimal center points and spreads for each neuron [87].
After training, 23 neurons was found to be the optimum number in the hidden layer of the
RBFNN, as shown in Fig. 8.
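For reference, a Gaussian RBF hidden unit computes exp(−‖x − c‖² / (2·spread²)). The minimal sketch below evaluates one such hidden layer; the centers and spreads are illustrative, not the RWBS-optimized values from the study:

```python
import math

def rbf_layer(x, centers, spreads):
    """Outputs of Gaussian RBF hidden units for one input vector x."""
    outputs = []
    for c, s in zip(centers, spreads):
        # Squared Euclidean distance from the input to this unit's center.
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        outputs.append(math.exp(-d2 / (2 * s ** 2)))
    return outputs

# Illustrative 2-D example (the study used 9 inputs and 23 hidden neurons):
centers = [(0.0, 0.0), (1.0, 1.0)]
spreads = [0.5, 0.5]
print(rbf_layer((0.0, 0.0), centers, spreads))  # [1.0, exp(-4)]
```

These hidden-layer outputs are then combined by the weighted summation layer to produce the network output.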

Table 2 RBF training parameters



Fig. 8
The optimum number of hidden neurons for RBF using the repeating weighted boosting
search (RWBS) optimization technique


The primary task in training a PNN is selecting the optimal sigma values that control the
spread of the RBF functions. A conjugate gradient algorithm was used to compute the
optimal sigma values in this study [87]. One disadvantage of the PNN model compared
with MLP networks is its size: there is one neuron for each training row, which makes the
model slower than an MLP network when scoring new rows [87]. A separate sigma value
was calculated for each predictor variable in the model. This allows the influence of each
variable on neighboring points to differ, and it is a good compromise between having a single
sigma and allowing a separate sigma for each target category. A Gaussian kernel was used,
so that the influence of a point declines according to the value (height) of a Gaussian
distribution centered on that point.
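The classification rule this describes can be sketched directly from Specht's PNN formulation [88]: each class score is a sum of Gaussian kernels over that class's training rows, here with a separate sigma per predictor variable as in the study. The toy data and sigma values below are illustrative only:

```python
import math

def pnn_classify(x, train_rows, train_labels, sigmas):
    """Score each class by summing Gaussian kernels over its training rows."""
    scores = {}
    for row, label in zip(train_rows, train_labels):
        # Per-variable sigmas let each predictor's influence decline differently.
        d2 = sum(((xi - ri) / s) ** 2 for xi, ri, s in zip(x, row, sigmas))
        scores[label] = scores.get(label, 0.0) + math.exp(-0.5 * d2)
    return max(scores, key=scores.get)

# Toy two-class example with two predictors (values are illustrative):
rows = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
labels = ["benign", "benign", "malignant", "malignant"]
sigmas = (0.3, 0.3)

print(pnn_classify((0.15, 0.15), rows, labels, sigmas))  # benign
print(pnn_classify((0.85, 0.85), rows, labels, sigmas))  # malignant
```

Note that classification visits every stored training row, which is why a PNN scores new rows more slowly than a trained MLP.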

The classification results of the training set by MLP, RBF and PNN are given in Table 3 using a
confusion matrix. In a confusion matrix, each cell contains the raw number of exemplars
classified for the corresponding combination of desired and actual classifier outputs: TP and
TN are the numbers of samples correctly identified by the classifier as positive or negative,
respectively, while FN and FP are the numbers of samples mistakenly classified as benign or
malignant, respectively. Because the positive and negative samples in the dataset are
imbalanced, another appropriate quantity for evaluating classification accuracy is the
Matthews correlation coefficient (MCC), which is given as follows [99]:

MCC = (TP · TN − FN · FP) / √((TP + FN)(TP + FP)(TN + FN)(TN + FP))

(13)
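Equation (13) translates directly into code. The sketch below checks it on illustrative counts, not the paper's confusion matrices:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, Eq. (13)."""
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))
    return (tp * tn - fn * fp) / denom if denom else 0.0

print(mcc(tp=50, tn=50, fp=0, fn=0))    # 1.0 -- perfect classifier
print(mcc(tp=25, tn=25, fp=25, fn=25))  # 0.0 -- no better than chance
```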

Table 3 Classification results of the training phase



The MCC lies within the range [−1, 1]; the larger the MCC value, the better the classifier
performance.
The performance analysis of the training set by MLP, RBF and PNN is given in Table 4 and
also represented graphically in Fig. 9. As noted from Tables 3 and 4, PNN gives the best
accuracy in the training phase, 100 %, with 444 correct classifications, followed by RBF with
98.10 % and 436 correct classifications, while MLP has the lowest accuracy, 97.80 %, with
435 correct classifications. The ROC area and MCC for PNN both reached 1.00, marking it
as a superior classifier.

Fig. 9

Performance comparison of MLP, RBF and PNN classifiers during training phase


Table 4 Performance indices for training phase of classifiers



Besides classification accuracy, the time needed for classifier construction and for
classification is also an important factor. In this regard, computational expenses were
compared between all methods, as shown in Table 4. The results showed that the training
time of RBF was much longer than that of the other methods, while MLP was significantly
faster than the other classifiers. Overall, the time cost difference mainly came from the
training phase, as the testing time for all methods was short and practically feasible.

Validation phase of classifiers

Once the model structure and parameters have been identified, it is necessary to validate the
quality of the resulting model. In principle, the model validation should not only validate the
accuracy of the model, but also verify whether the model can be easily interpreted to give a
better understanding of the modeled process. It is therefore important to combine data-driven
validation, aiming at checking the accuracy and robustness of the model, with more subjective
validation, concerning the interpretability of the model. There is usually a trade-off
between flexibility and interpretability, the outcome of which depends on their relative
importance for a given application. While numerous cross-validation methods exist, the
choice of a suitable method for the ANN is based on a trade-off between maximizing
accuracy and stability and minimizing the operation time. In this research, the tenfold
cross-validation method was adopted for the ANNs because of its accuracy and ease of
implementation. The classification and performance results of the
validation phase by MLP, RBF and PNN are given in Tables 5 and 6.

Table 5 Classification results during the validation phase


Table 6 Performance indices for validation phase of classifiers

As noted from Tables 5 and 6, PNN gives the best accuracy in the validation phase, 97.66 %,
with 438 correct classifications, followed by MLP with 96.34 % and 431 correct
classifications, while RBF has the lowest accuracy, 96.05 %, with 429 correct classifications.
Although RBF performed better than MLP in the training phase, it achieved the lowest
accuracy in the validation phase. Performance index comparisons are shown in Fig. 10. The
ROC area and MCC for PNN reached 0.99382 and 0.9484, respectively.

Fig. 10

Performance comparison of MLP, RBF and PNN classifiers during validation phase


ROC curves for detecting benign breast tumors using PNN classifier during validation phase
are shown in Fig. 11. The sensitivity and specificity versus probability threshold during the
validation phase of PNN for the benign class are shown in Fig. 12. This chart shows how sensitivity and
specificity can be adjusted by shifting the probability threshold for classifying cases as benign
or malignant. According to the overall results, it is seen that the most suitable neural network
model for classifying breast cancer tumors is PNN.
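The threshold-shifting behavior shown in Fig. 12 can be reproduced with a short sweep. The predicted benign-class probabilities below are fabricated for illustration only:

```python
def sens_spec_at_threshold(probs, labels, thr, positive="benign"):
    """Sensitivity and specificity when P(positive) >= thr predicts positive."""
    tp = sum(p >= thr and y == positive for p, y in zip(probs, labels))
    fn = sum(p < thr and y == positive for p, y in zip(probs, labels))
    tn = sum(p < thr and y != positive for p, y in zip(probs, labels))
    fp = sum(p >= thr and y != positive for p, y in zip(probs, labels))
    return tp / (tp + fn), tn / (tn + fp)

# Fabricated benign-class probabilities from a hypothetical classifier:
probs = [0.95, 0.90, 0.80, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = ["benign"] * 4 + ["malignant"] * 4

# Raising the threshold trades sensitivity for specificity, as in Fig. 12.
for thr in (0.25, 0.50, 0.75):
    print(thr, sens_spec_at_threshold(probs, labels, thr))
```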

Fig. 11

ROC for detecting benign breast tumors using PNN classifier during validation
phase


Fig. 12
Sensitivity and specificity versus probability threshold during validation phase for
benign classification using PNN


Conclusion
The prediction and classification of breast cancer has been a challenging research problem for
many researchers. Early detection of breast cancer helps to increase the chance of survival,
since early treatment can then be planned for patients who suffer from this disease. To attain
the best solution to a specific problem, several NN learning algorithms must be tested. The
choice of ANN type is critical, and it is clear that no single algorithm outperforms all other
NN algorithms on all problems. Each algorithm should be executed several times with
different weight initializations, and the best ANN chosen. Therefore, a comparative analysis
of the generalization capability of the MLP, RBF and PNN classifiers has been carried out.
The goal of the classification is to distinguish between cancerous (malignant) and
non-cancerous (benign) tumors. The performance of the proposed structure is evaluated in
terms of sensitivity, specificity, accuracy and ROC. The results revealed that PNN was the
best classifier, achieving accuracy rates of 100 and 97.66 % in the training and testing phases,
respectively. MLP was ranked as the second classifier and was capable of achieving 97.80 and
96.34 % classification accuracy for the training and validation phases, respectively,
using the scaled conjugate gradient (SCG) learning algorithm. The best topology of the MLP
was found to be 9-7-2, with linear AFs in the input layer, log-sigmoid AFs in the hidden layer
and log-sigmoid AFs in the output layer. The classification accuracy of RBF was obtained
using 23 hidden neurons, and RBF was ranked as the third classifier: although it performed
better than MLP in the training phase, it achieved the lowest accuracy in the validation
phase. The repeating weighted boosting search (RWBS) optimization technique was used as
an evolutionary method for obtaining the optimum number of hidden neurons and achieving
the best accuracy. The experimental results proved that neural network techniques provide
satisfactory results for the classification of breast cancer tumors. Future investigation will
focus on evaluating PNN in other medical diagnosis problems such as microarray gene
selection, internet and other data mining problems, and on improving the performance of
PNN using high-performance computing techniques. In addition, the combination of the
approaches mentioned above should yield an efficient classifier for many applications.

References
1. Abbass HA (2002) An evolutionary artificial neural networks approach for breast cancer diagnosis. Artif Intell Med 25(3):265–281

2. Abdolmaleki P, Buadu LD, Murayama S et al (1997) Neural network analysis of breast cancer from MRI findings. Radiat Med 15(5):283–293

3. Abdolmaleki P, Buadu LD, Naderimansh H (2001) Feature extraction and classification of breast cancer on dynamic magnetic resonance imaging using artificial neural network. Cancer Lett 171(2):183–191

4. American Cancer Society (2010) Detailed guide: breast cancer. http://www.cancer.org/Cancer/BreastCancer/DetailedGuide/index. Accessed 14 April 2012

5. Athanasiou A, Tardivon A, Ollivier L et al (2009) How to optimize breast ultrasound. Eur J Radiol 69(1):6–13

6. Balakumaran T, Vennila ILA, Gowri Shankar C (2010) Microcalcification detection in digital mammograms using novel filter bank. Procedia CS 2:272–282

7. Battiti R, Masulli F (1990) BFGS optimization for faster and automated supervised learning. In: INCC 90 Paris, international neural network conference, pp 757–760

8. Beale MH, Hagan MT, Demuth HB (2011) Neural network toolbox™ 7 user's guide. The MathWorks, Inc., Natick

9. Beale EML (1972) A derivation of conjugate gradients. In: Lootsma FA (ed) Numerical methods for nonlinear optimization. Academic Press, London

10. Berrar DP, Downes CS, Dubitzky W (2003) Multiclass cancer classification using gene expression profiling and probabilistic neural networks. In: Proceedings of the 8th Pacific symposium on biocomputing (PSB 2003), Lihue, Hawaii, USA, Jan 3–7, pp 5–16

11. Bednar EM (2011) Identification and classification of player types in massive multiplayer online games using avatar behavior. Ph.D. thesis, Air Force Institute of Technology, Ohio, USA

12. Bishop CM (1995) Neural networks for pattern recognition. Clarendon Press, Oxford

13. Boyle P, Levin B (2008) World cancer report 2008. International Agency for Research on Cancer, Lyon

14. Bridle JS (1989) Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In: Fougelman-Soulie F (ed) Neurocomputing: algorithms, architectures and applications. Springer, Berlin, pp 227–236

15. Burke HB, Goodman PH, Rosen DB et al (1997) Artificial neural networks improve the accuracy of cancer survival prediction. Cancer 79(4):857–862

16. Buhmann MD (2003) Radial basis functions: theory and implementations. Cambridge University Press, Cambridge

17. Chen Y, Abraham A, Yang B (2005) Hybrid neurocomputing for breast cancer detection. In: The fourth IEEE international workshop on soft computing as transdisciplinary science and technology (WSTST'05), pp 884–892

18. Chen S, Wang X, Harris CJ (2005) Experiments with repeating weighted boosting search for optimization in signal processing applications. IEEE Trans Syst Man Cybern B Cybern 35(4):682–693

19. Chou SM, Lee TS, Shao YE, Chen IF (2004) Mining the breast cancer pattern using artificial neural networks and multivariate adaptive regression splines. Expert Syst Appl 27(1):133–142

20. Christoyianni I, Koutras A, Dermatas E, Kokkinakis G (2002) Computer aided diagnosis of breast cancer in digitized mammograms. Comput Med Imaging Graph 26(5):309–319

21. Cross SS, Harrison RF, Kennedy RL (1995) Introduction to neural networks. Lancet 346(8982):1075–1079

22. Crowe JP, Patrick RJ, Rybicki LA et al (2003) Does ultrasound core breast biopsy predict histologic finding on excisional biopsy? Am J Surg 186(4):397–399

23. Cybenko G (1996) Neural networks in computational science and engineering. IEEE Comput Sci Eng 36–42. doi:10.1109/99.486759

24. De Laurentiis M, De Placido S, Bianco AR et al (1999) A prognostic model that makes quantitative estimates of probability of relapse for breast cancer patients. Clin Cancer Res 5(12):4133–4139

25. Diamantidis NA, Karlis D, Giakoumakis EA (2000) Unsupervised stratification of cross-validation for accuracy estimation. Artif Intell 116:1–16

26. Doi K, MacMahon H, Katsuragawa S et al (1999) Computer-aided diagnosis in radiology: potential and pitfalls. Eur J Radiol 31(2):97–109

27. Doi K (2007) Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Comput Med Imaging Graph 31(4–5):198–211

28. Dowsett DJ, Kenny PA, Johnston RE (2006) The physics of diagnostic imaging, 2nd edn. Oxford University Press, Oxford

29. Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York

30. Elatar I (2002) Cancer registration, NCI Egypt 2001. National Cancer Institute, Cairo, Egypt. http://www.nci.edu.eg/Journal/nci2001%20.pdf. Accessed 1 April 2004

31. Floyd CE, Lo JY, Yun AJ et al (1994) Prediction of breast cancer malignancy using an artificial neural network. Cancer 74(11):2944–2948

32. Fletcher R (1975) Practical methods of optimization. Wiley, New York

33. Fogel DB, Wasson EC, Boughton EM, Porto VW (1998) Evolving artificial neural networks for screening features from mammograms. Artif Intell Med 14(3):317–326

34. Francois D, Rossi F, Wertz V, Verleysen M (2007) Resampling methods for parameter-free and robust feature selection with mutual information. Neurocomputing 70:1276–1288

35. Furundzic D, Djordjevic M, Bekic AJ (1998) Neural networks approach to early breast cancer detection. J Syst Architect 44(8):617–633

36. Gill PE, Murray W, Wright MH (1980) Practical optimization. Academic Press Inc., Massachusetts

37. Gurney K, Wright MJ (1997) An introduction to neural networks. UCL Press (Taylor & Francis Group), London

38. Hambly NM, McNicholas MM, Phelan N et al (2009) Comparison of digital mammography and screen-film mammography in breast cancer screening: a review in the Irish breast screening program. AJR Am J Roentgenol 193(4):1010–1018

39. Haykin S (1999) Neural networks, 2nd edn. Prentice Hall, New Jersey. Health Canada, "Canadian Mammography Quality Guidelines," 2002

40. Hestenes M (1980) Conjugate direction methods in optimization. Springer, New York

41. Hu Y, Zhang SZ, Yu JK et al (2005) Diagnostic application of serum protein pattern and artificial neural network software in breast cancer. Ai Zheng 24(1):67–71

42. Islam MJ, Ahmadi M, Sid-Ahmed MA (2010) An efficient automatic mass classification method in digitized mammograms using artificial neural network. Int J Artif Intell Appl (IJAIA) 1(3):1–13

43. Kerekes J (2008) Receiver operating characteristic curve confidence intervals and regions. IEEE Geosci Remote Sens Lett 5(2):251–255

44. Jain KA, Mao J, Mohiuddin KM (1996) Artificial neural networks: a tutorial. IEEE Comput 29(3):31–44

45. Janghel RR, Shukla A, Tiwari R, Kala R (2010) Breast cancer diagnosis using artificial neural network models. In: Proceedings of the 3rd IEEE international conference on information sciences and interaction sciences, Chengdu, China, June 23–25, pp 89–94

46. Jerez-Aragonés JM, Gomez-Ruiz JA, Ramos-Jiménez G et al (2003) A combined neural network and decision trees model for prognosis of breast cancer relapse. Artif Intell Med 27(1):45–63

47. Johansson EM, Dowla FU, Goodman DM (1990) Backpropagation learning for multi-layer feed-forward neural networks using the conjugate gradient method. Lawrence Livermore National Laboratory, Preprint UCRL-JC-104850

48. Jordan MI (1995) Why the logistic function? A tutorial discussion on probabilities and neural networks. MIT computational cognitive science report 9503. http://www.cs.berkeley.edu/~jordan

49. Kala R, Janghel RR, Tiwari R, Shukla A (2011) Diagnosis of breast cancer by modular evolutionary neural networks. Int J Biomed Eng Technol (IJBET) 7(2):194–211

50. Kiyan T, Yildirim T (2004) Breast cancer diagnosis using statistical neural networks. J Electr Electron Eng 4(2):1149–1153

51. Li H, Giger ML, Yuan Y et al (2008) Evaluation of computer-aided diagnosis on a large clinical full-field digital mammographic dataset. Acad Radiol 15(11):1437–1445

52. Lisboa PJ, Wong H, Harris P, Swindell RA (2003) Bayesian neural network approach for modelling censored data with an application to prognosis after surgery for breast cancer. Artif Intell Med 28(1):1–25

53. Locasale JW, Cantley LC (2010) Altered metabolism in cancer. BMC Biol 88:88

54. Lundin M, Lundin J, Burke HB et al (1999) Artificial neural networks applied to survival prediction in breast cancer. Oncology 57(4):281–286

55. Malmgren H, Borga M, Niklasson L (2000) Artificial neural networks in medicine and biology, perspectives in neural computing. Springer, Goteborg

56. Mangasarian OL, Wolberg WH (1990) Cancer diagnosis via linear programming. SIAM News 23(5):1–18

57. Mangasarian OL, Setiono R, Wolberg WH (1990) Pattern recognition via linear programming: theory and application to medical diagnosis. In: Coleman TF, Li Y (eds) Large-scale numerical optimization. SIAM Publications, Philadelphia, pp 22–30

58. Marchevsky AM, Shah S, Patel S (1999) Reasoning with uncertainty in pathology: artificial neural networks and logistic regression as tools for prediction of lymph node status in breast cancer patients. Mod Pathol 12(5):505–513

59. Mariani L, Coradini D, Biganzoli E et al (1997) Prognostic factors for metachronous contralateral breast cancer: a comparison of the linear Cox regression model and its artificial neural network extension. Breast Cancer Res Treat 44(2):167–178

60. Mattfeldt T, Kestler HA, Sinn HP (2004) Prediction of the axillary lymph node status in mammary cancer on the basis of clinicopathological data and flow cytometry. Med Biol Eng Comput 42(6):733–739

61. McAree B, O'Donnell ME, Spence A et al (2010) Breast cancer in women under 40 years of age: a series of 57 cases from Northern Ireland. Breast 19(2):97–104

62. Mian S, Ball G, Hornbuckle J et al (2003) A prototype methodology combining surface-enhanced laser desorption/ionization protein chip technology and artificial neural network algorithms to predict the chemoresponsiveness of breast cancer cell lines exposed to Paclitaxel and Doxorubicin under in vitro conditions. Proteomics 3(9):1725–1737

63. Mitchell T (1997) Machine learning. The McGraw-Hill Companies Inc., New York

64. Moller MF (1993) A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw 6:525–533

65. Naguib RN, Sakim HA, Lakshmi MS et al (1999) DNA ploidy and cell cycle distribution of breast cancer aspirate cells measured by image cytometry and analyzed by artificial neural networks for their prognostic significance. IEEE Trans Inf Technol Biomed 3(1):61–69

66. Naguib RN, Adams AE, Horne CH et al (1997) Prediction of nodal metastasis and prognosis in breast cancer: a neural model. Anticancer Res 17(4A):2735–2741

67. NHS breast screening programmes: annual review 2011. ISBN 978-1-84463-079-0. http://www.cancerscreening.nhs.uk/breastscreen/

68. Omar S, Khaled H, Gaafar R et al (2003) Breast cancer in Egypt: a review of disease presentation and detection strategies. East Mediterr Health J 9(3):448–463

69. Padmavati J (2011) A comparative study on breast cancer prediction using RBF and MLP. Int J Sci Eng Res 2(1):1–5

70. Park SH, Goo JM, Jo CH (2004) Receiver operating characteristic (ROC) curve: practical review for radiologists. Korean J Radiol 5(1):11–18


71. Penny W, Frost D (1996) Neural networks in clinical medicine. Med Decis Mak 16(4):386–398

72. Powell M (1977) Restart procedures for the conjugate gradient method. Math Program 12(1):241–254

73. Pradhan M, Sahu RK (2011) An extensive survey on artificial neural network based cancer prediction using soft computing approach. Int J Comput Sci Emerg Technol (IJCSET) 2(4):2044–6004

74. Rani KU (2010) Parallel approach for diagnosis of breast cancer using neural network technique. Int J Comput Appl 10(3):1–5

75. Reggia JA (1993) Neural computation in medicine. Artif Intell Med 5(2):143–157

76. Ripley BD (1996) Pattern recognition and neural networks. Cambridge University Press, Cambridge

77. Ripley RM, Harris AL, Tarassenko L (2004) Non-linear survival analysis using neural networks. Stat Med 23(5):825–842

78. Rodvold DM, McLeod DG, Brandt JM et al (2001) Introduction to artificial neural networks for physicians: taking the lid off the black box. Prostate 46(1):39–44

79. Ronco AL (1999) Use of artificial neural networks in modeling associations of discriminant factors: towards an intelligent selective breast cancer screening. Artif Intell Med 16(3):299–309

80. Rumelhart DE, McClelland JL (1986) Parallel distributed processing: exploration in the microstructure of cognition. MIT Press, Cambridge

81. Salim MI, Ahmad AH, Ariffin I et al (2012) Development of breast cancer diagnosis tool using hybrid magnetoacoustic method and artificial neural network. Int J Biol Biomed Eng 6(1):61–68

82. Sarle WS (1997) Neural network FAQ. Periodic posting to the Usenet newsgroup comp.ai.neural-nets. ftp://ftp.sas.com/pub/neural/FAQ.html

83. Sarvestan SA, Safavi AA, Parandeh MN, Salehi M (2010) Predicting breast cancer survivability using data mining techniques. In: Software technology and engineering (ICSTE), 2nd international conference, vol 2, pp 227–231

84. Saslow D, Boetes C, Burke W et al (2007) American Cancer Society guidelines for breast screening with MRI as an adjunct to mammography. CA Cancer J Clin 57(2):75–89

85. Sawarkar SD, Ghatol AA, Pande AP (2006) Neural network aided breast cancer detection and diagnosis. In: Proceedings of the 7th WSEAS international conference on neural networks, Cavtat, Croatia, June 12–14, pp 158–163

86. Sehgal CM, Weinstein SP, Arger PH, Conant EF (2006) A review of breast ultrasound. J Mammary Gland Biol Neoplasia 11(2):113–123

87. Sherrod PH (2012) DTREG predictive modeling software. http://www.dtreg.com. Accessed April 2012

88. Specht DF (1990) Probabilistic neural networks. Neural Netw 3:109–118

89. Street W (1998) A neural network model for prognostic prediction. In: Proceedings of the fifteenth international conference on machine learning (ICML '98), pp 540–546. ISBN 1-55860-556-8

90. Te Brake GM (2000) Computer aided detection of masses in digital mammograms. PhD thesis in medical sciences, Radboud University, Nijmegen

91. Tourassi GD, Markey MK, Lo JY, Floyd CE (2001) A neural network approach to breast cancer diagnosis as a constraint satisfaction problem. Med Phys 28(5):804–811

92. Trujillano J, March J, Sorribas A (2004) Methodological approach to the use of artificial neural networks for predicting results in medicine. Med Clin (Barc) 122(Suppl 1):59–67

93. Tsoukalas LH, Uhrig RE (1997) Fuzzy and neural approaches in engineering. Wiley, New York

94. Tu JV (1996) Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol 49(11):1225–1231

95. UCI (2012) Machine learning repository. http://archive.ics.uci.edu/ml/index.html. Accessed 1 Aug 2012

96. Warner E, Messersmith H, Causer P et al (2008) Systematic review: using magnetic resonance imaging to screen women at high risk for breast cancer. Ann Int Med 148(9):671–679

97. Warner E (2008) The role of magnetic resonance imaging in screening women at high risk of breast cancer. Top Magn Reson Imaging 19(3):163–169

98. Weinstein SP, Conant EF, Sehgal C (2006) Technical advances in breast ultrasound imaging. Semin Ultrasound CT MR 27(4):273–283

99. Yuan Q, Cai C, Xiao H et al (2007) Diagnosis of breast tumours and evaluation of prognostic risk by using machine learning approaches. Commun Comput Inf Sci 2:1250–1260

100. Zweiri YH, Whidborne JF, Sceviratne LD (2002) A three-term backpropagation algorithm. Neurocomputing 50:305–318


Acknowledgments
I would like to gratefully acknowledge Phillip H. Sherrod, software developer and consultant
on predictive modeling, for his support and consultation during the modeling process.

Author information
Affiliations

1. Faculty of Engineering, Misr University for Science and Technology (MUST), 6th of
October City, Egypt

Ahmad Taher Azar

2. Electronics and Communications Department, Faculty of Engineering, Zagazig


University, Zagazig, Sharkia, Egypt

Shaimaa Ahmed El-Said


Corresponding author
Correspondence to Ahmad Taher Azar.


About this article


Cite this article

Azar, A.T., El-Said, S.A. Probabilistic neural network for breast cancer classification. Neural
Comput & Applic 23, 1737–1751 (2013). https://doi.org/10.1007/s00521-012-1134-8

Received: 04 April 2012

Accepted: 06 August 2012

Published: 25 August 2012

Issue Date: November 2013

DOI: https://doi.org/10.1007/s00521-012-1134-8


Keywords

Classification
Computer-aided diagnosis (CAD)
Scaled conjugate gradient (SCG)
Probabilistic neural networks (PNN)
Radial basis function (RBF)
Repeating weighted boosting search (RWBS)
