
(IJARAI) International Journal of Advanced Research in Artificial Intelligence,

Vol. 3, No. 7, 2014

Breast Cancer Diagnosis using Artificial Neural Networks with Extreme Learning Techniques

Chandra Prasetyo Utomo
YARSI E-Health Research Center
Faculty of Information Technology
YARSI University, Jakarta, Indonesia

Aan Kardiana
YARSI E-Health Research Center
Faculty of Information Technology
YARSI University, Jakarta, Indonesia

Rika Yuliwulandari
YARSI Genetics Research Center
Faculty of Medicine
YARSI University, Jakarta, Indonesia

Abstract—Breast cancer is the second leading cause of death among women. Early detection followed by appropriate cancer treatment can reduce the risk of death. Medical professionals can make mistakes while identifying a disease, and the help of technology such as data mining and machine learning can substantially improve diagnosis accuracy. Artificial Neural Networks (ANN) have been widely used in intelligent breast cancer diagnosis. However, the standard Gradient-Based Back Propagation Artificial Neural Network (BP ANN) has some limitations: parameters must be set at the beginning, the training process takes a long time, and training may become trapped in local minima. In this research, we implemented ANN with extreme learning techniques for diagnosing breast cancer based on the Breast Cancer Wisconsin Dataset. Results showed that the Extreme Learning Machine Neural Network (ELM ANN) produces a classifier model with better generalization than BP ANN. The development of this technique is promising as an intelligent component in medical decision support systems.

Keywords—breast cancer; artificial neural networks; extreme learning machine; medical decision support systems

I. INTRODUCTION

The out-of-control growth of cells in an organ is called a tumor, which can be cancerous. There are two kinds of tumors: benign and malignant. Benign, or non-cancerous, tumors do not spread and are not life threatening. On the other hand, malignant, or cancerous, tumors expand and are life threatening [1]. Malignant breast cancer is defined as such growth of cells in the breast tissue. Breast cancer is the second overall cause of mortality among women and the first cause of death among women between the ages of 40 and 55 [2].

Regular breast cancer diagnosis followed by appropriate cancer treatment can reduce this risk. It is suggested to perform a tumor evaluation test every 4-6 weeks. For this reason, detecting benign and malignant tumors based on classification features becomes very important [3].

Careful diagnosis in early detection has been proven to lessen the death rate from breast cancer [4]. Depending on their expertise, medical professionals can make mistakes while identifying a disease. With the help of technology such as data mining and machine learning, diagnosis can be more accurate (91.1%) compared to a diagnosis made by an experienced doctor (79.9%) [5].

ANN is one of the best artificial intelligence techniques for common data mining tasks, such as classification and regression problems. A lot of research has shown that ANN delivers good accuracy in breast cancer diagnosis. However, this method has several limitations. First, ANN has some parameters to be tuned at the beginning of the training process, such as the number of hidden layers and hidden nodes, the learning rates, and the activation function. Second, training takes a long time due to the complex architecture and the parameter updates in each iteration, which require expensive computation. Third, training can be trapped in local minima, so optimal performance cannot be guaranteed. Numerous efforts have been made to overcome these limitations of neural networks. Huang and Babri [6] proved that a Single Hidden Layer Feedforward Neural Network (SLFN) with a three-step extreme learning process, called ELM, can solve these problems.

In this paper, we present the implementation of artificial neural networks with extreme learning techniques in breast cancer diagnosis. The dataset used for the experiments was the Breast Cancer Wisconsin Dataset, obtained from the University of Wisconsin Hospital, Madison, from Dr. William H. Wolberg [7]. We compared the performance of ELM with a conventional BP ANN using gradient descent based learning algorithms. Sensitivity, specificity, and accuracy were used as performance measurements. Results showed that ELM ANN generally produced better results than BP ANN.

The rest of this paper is organized as follows. Section 2 is dedicated to the literature review, presenting a brief review of previous work in breast cancer diagnosis. In Section 3, the concept, mathematical model, and training process of the extreme learning machine are explained. In Section 4, experiments, results, and analysis are provided. Finally, conclusions and future work are given in Section 5.

II. LITERATURE REVIEW

The use of classification systems in medical diagnosis, including breast cancer diagnosis, is growing rapidly. The evaluation and decision-making process of expert medical diagnosis is a key factor. However, intelligent classification algorithms may help doctors, especially in minimizing errors by inexperienced practitioners [3].

Several techniques have been deployed to predict and recognize meaningful patterns for breast cancer diagnosis. Ryu

This research is funded by YARSI University under the Internal Research Grant Scheme 2013/2014.

10 | P a g e
www.ijarai.thesai.org

[8] developed a data classification method called isotonic separation. Its performance was compared against support vector machines, learning vector quantization, decision tree induction, and other methods on two breast cancer data sets, one sufficient and one insufficient. The experimental results demonstrated that isotonic separation is a practical tool for classification in the medical domain.

A hybrid machine learning method was applied by Sahan [9] for diagnosing breast cancer. The method hybridized a fuzzy-artificial immune system with the k-nearest neighbour algorithm. The hybrid method delivered good accuracy on the Wisconsin Breast Cancer Dataset (WBCD). They believe it can also be tested on other breast cancer diagnosis problems.

A comprehensive view of automated diagnostic system implementations for breast cancer detection was provided by Ubeyli [10]. It compared the performances of the multilayer perceptron neural network (MLPNN), combined neural network (CNN), probabilistic neural network (PNN), recurrent neural network (RNN), and support vector machine (SVM). The aim of that work was to serve as a guide for readers who want to develop this kind of system.

Numerous combinations and hybrid systems have used neural networks as a component. However, since almost all of the employed neural networks are conventional gradient descent BP ANNs, these novel or hybrid methods still suffer from the neural network drawbacks mentioned in the previous section.

III. EXTREME LEARNING MACHINE

Huge efforts have been made to overcome the weaknesses of BP ANN. Huang and Babri [6] demonstrated that a single hidden layer feedforward neural network (SLFN) with at most m hidden nodes is capable of approximating a function over m distinct vectors in a training dataset.

Given m instances D = {(x(k), t(k)) | x(k) ∈ R^n, t(k) ∈ R^p, k = 1,...,m} as a training dataset, where x(k) = [x1(k), x2(k), ..., xn(k)]T are the features and t(k) = [t1(k), t2(k), ..., tp(k)]T is the target, an SLFN with M hidden nodes, activation function g(x) in the hidden nodes, and a linear activation function in the output nodes is mathematically written as:

o(k) = Σ_{i=1}^{M} βi g(wi ∙ x(k) + bi),  k = 1, ..., m,  (1)

where

wi ∈ R^n is the weight vector between the input nodes and the i-th hidden node,

wi = [wi1, wi2, ..., win]T,  (2)

βi ∈ R^p is the weight vector between the i-th hidden node and the output nodes,

βi = [βi1, βi2, ..., βip]T,  (3)

wi ∙ x(k) is the inner product of wi and x(k), bi is the bias of the i-th hidden node, and o(k) ∈ R^p is the output of the neural network for the k-th vector.

That the SLFN can approximate the m vectors means that there exist wi, βi, and bi such that:

Σ_{k=1}^{m} ||o(k) − t(k)|| = 0,  (4)

that is,

Σ_{i=1}^{M} βi g(wi ∙ x(k) + bi) = t(k),  k = 1, ..., m.  (5)

Equation (5) can be written as:

Hβ = T,  (6)

where

H ∈ R^(m×M) is the hidden layer output matrix of the neural network,

H = [ g(w1 ∙ x(1) + b1)  …  g(wM ∙ x(1) + bM)
             ⋮           ⋱          ⋮
      g(w1 ∙ x(m) + b1)  …  g(wM ∙ x(m) + bM) ],  (7)

β ∈ R^(M×p) holds the weights between the hidden and output layers,

β = [ β1T
       ⋮
      βMT ],  (8)

and T ∈ R^(m×p) holds the target values of the m vectors in the training dataset,

T = [ t(1)T
        ⋮
      t(m)T ].  (9)

In the traditional gradient descent based learning algorithm, the weights wi connecting the input and hidden layers and the biases bi of the hidden nodes need to be initialized and then tuned in every iteration. This is the main factor that often makes the training process of neural networks time consuming, and the trained model may not reach the global minimum.
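Under the notation above, the hidden layer output matrix H of Eq. (7) can be assembled with a few lines of NumPy. The sketch below is illustrative only: the data are random, and the sigmoid activation is our assumption, since the paper leaves g(x) generic.

```python
import numpy as np

def hidden_output_matrix(X, W, b, g=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Build H per Eq. (7): H[k, i] = g(w_i . x(k) + b_i).

    X: (m, n) training features; W: (n, M) input-hidden weights;
    b: (M,) hidden biases; g: activation (sigmoid assumed here).
    """
    return g(X @ W + b)  # broadcasting adds bias b_i to column i

rng = np.random.default_rng(0)
m, n, M, p = 6, 9, 4, 1             # instances, features, hidden nodes, outputs
X = rng.uniform(-1.0, 1.0, (m, n))  # toy stand-in for normalized features
W = rng.standard_normal((n, M))     # columns are the w_i vectors of Eq. (2)
b = rng.standard_normal(M)          # hidden biases b_i

H = hidden_output_matrix(X, W, b)   # shape (m, M), as in Eq. (7)
beta = rng.standard_normal((M, p))  # hidden-output weights, Eq. (8)
O = H @ beta                        # network outputs o(k); Eq. (6) asks H beta = T
print(H.shape, O.shape)             # → (6, 4) (6, 1)
```

With a sigmoid g, every entry of H lies in (0, 1), and each row of O is the output o(k) of Eq. (1) for one training vector.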


Huang [11] proposed a minimum norm least-squares solution for the SLFN which does not need to tune those parameters. Training an SLFN with fixed input weights wi and hidden layer biases bi is equivalent to finding a least-squares solution β̂ of the linear system:

Hβ = T.  (10)

The smallest norm least-squares solution of that linear system is

β̂ = H†T,  (11)

where H† is the Moore-Penrose generalized inverse of matrix H. This solution has three important properties: minimum training error, smallest norm of weights, and uniqueness of the solution β̂ = H†T.

The above minimum norm least-squares solution for the SLFN is called the extreme learning machine (ELM). Given m instances in a training dataset D = {(x(k), t(k)) | x(k) ∈ R^n, t(k) ∈ R^p, k = 1,...,m}, an activation function g(x), and a number of hidden nodes M, the training process of ELM is the following:

1. Randomly set the input-hidden layer weights wi and biases bi, i = 1,...,M.
2. Compute the hidden layer output matrix H.
3. Compute the hidden-output layer weights β̂ = H†T, where T = [t(1),..., t(m)]T.

Based on that definition, there are three main differences between BP ANN and ELM ANN. First, BP ANN needs several parameters to be tuned, such as the number of hidden nodes, learning rates, momentum, and termination criteria. On the other hand, ELM ANN is a simple, essentially tuning-free algorithm; the only parameter to be defined is the number of hidden nodes. Second, BP ANN works only with differentiable activation functions in the hidden and output nodes, while ELM ANN can use both differentiable and non-differentiable activation functions. Finally, BP ANN produces a trained model with minimum training error only, so there is a possibility of finishing in a local minimum. On the other hand, ELM ANN produces a trained model with both minimum training error and smallest norm of weights, so it can produce a model with better generalization and reach the global minimum [12].

IV. EXPERIMENTS, RESULTS, AND ANALYSIS

This section discusses the experimental design, the generated results, and the analytical process used to reach a valid conclusion.

A. Experiments

The experiments consisted of three main steps: data gathering, data preprocessing, and performance evaluation. The dataset used in this experiment was the Breast Cancer Wisconsin Dataset obtained from the University of Wisconsin Hospital, Madison, from Dr. William H. Wolberg [7]. The data has 699 instances with 10 attributes plus the class attribute. The class distribution is 65.5% (458 instances) benign and 34.5% (241 instances) malignant. The attribute information can be seen in TABLE I.

TABLE I. ATTRIBUTE INFORMATION

#  | Attribute                   | Domain
1  | Sample Code Number          | id number
2  | Clump Thickness             | 1 – 10
3  | Uniformity of Cell Size     | 1 – 10
4  | Uniformity of Cell Shape    | 1 – 10
5  | Marginal Adhesion           | 1 – 10
6  | Single Epithelial Cell Size | 1 – 10
7  | Bare Nuclei                 | 1 – 10
8  | Bland Chromatin             | 1 – 10
9  | Normal Nucleoli             | 1 – 10
10 | Mitoses                     | 1 – 10
11 | Class                       | 2: benign; 4: malignant

In the second step, the raw dataset was preprocessed to produce well-formed data suitable for the training and testing process. The first attribute, sample code number, was removed because it is not relevant to the diagnosis. The next nine attributes were normalized into [-1, 1] and used as predictors. The last attribute was transformed to 0 (benign) and 1 (malignant) so that it could be properly fitted to the standard BP ANN and ELM ANN implementations.

The method in this experiment was k-fold cross-validation with k = 5. This means the data were randomly divided into 5 partitions, giving 5 experiments. In each experiment, one partition was used as testing data and the remaining partitions were treated as training data.

The standard performance measurement for classification problems is accuracy. However, since the class distribution was not balanced, it was important to use specificity and sensitivity as supplementary measurements. In addition, to minimize the effect of the randomly generated weights in BP ANN and ELM ANN, each experiment was run three times and the average results were recorded.

Sensitivity = TP / (TP + FN)  (13)

Specificity = TN / (TN + FP)  (14)

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (15)

B. Results and Analysis

With the 5-fold cross-validation method and each experiment run three times, there were 15 experiments in total. All steps were carried out on a computer with an Intel® Core™ i3 processor, 4096 MB RAM, and the Windows 7 OS. The results of ELM are given in TABLE II.
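The three-step ELM training procedure described above reduces to one random draw and one pseudoinverse. The following NumPy sketch is not the authors' implementation: the sigmoid activation, the toy target, and the 0.5 decision threshold are our assumptions for illustration.

```python
import numpy as np

def elm_train(X, T, M, rng=None):
    """ELM steps 1-3: random (w_i, b_i), build H, solve beta = pinv(H) @ T."""
    rng = rng or np.random.default_rng(0)
    n = X.shape[1]
    W = rng.standard_normal((n, M))         # step 1: random input-hidden weights
    b = rng.standard_normal(M)              # step 1: random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # step 2: hidden layer output matrix
    beta = np.linalg.pinv(H) @ T            # step 3: Moore-Penrose solution, Eq. (11)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                         # linear output nodes

# toy sanity check: learn a linearly separable 0/1 rule (not the WBCD data)
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 9))
T = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)
W, b, beta = elm_train(X, T, M=30)
pred = (elm_predict(X, W, b, beta) > 0.5).astype(float)
train_acc = float((pred == T).mean())
print(round(train_acc, 2))
```

Note that there is no iteration: unlike BP ANN, the only choice made here is the number of hidden nodes M.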


TABLE II. ELM PERFORMANCES

Experiment | Running | Sensitivity | Specificity | Accuracy
#1         | 1       | 0.902       | 0.965       | 0.942
#1         | 2       | 0.882       | 0.977       | 0.942
#1         | 3       | 0.882       | 0.965       | 0.934
#2         | 1       | 0.975       | 0.958       | 0.963
#2         | 2       | 0.950       | 0.979       | 0.971
#2         | 3       | 1.000       | 0.969       | 0.978
#3         | 1       | 0.909       | 0.988       | 0.956
#3         | 2       | 0.964       | 0.963       | 0.963
#3         | 3       | 0.964       | 0.975       | 0.971
#4         | 1       | 0.957       | 1.000       | 0.985
#4         | 2       | 0.957       | 0.989       | 0.978
#4         | 3       | 0.957       | 1.000       | 0.985
#5         | 1       | 0.957       | 0.967       | 0.964
#5         | 2       | 1.000       | 0.956       | 0.971
#5         | 3       | 0.957       | 0.956       | 0.956

In order to compare with the ELM ANN performances, training and testing with BP ANN were conducted with an identical experimental design. The performances of BP ANN can be seen in TABLE III.

TABLE III. BP ANN PERFORMANCES

Experiment | Running | Sensitivity | Specificity | Accuracy
#1         | 1       | 0.875       | 0.975       | 0.934
#1         | 2       | 0.750       | 0.988       | 0.891
#1         | 3       | 0.804       | 0.963       | 0.898
#2         | 1       | 0.855       | 0.987       | 0.927
#2         | 2       | 0.871       | 0.973       | 0.927
#2         | 3       | 0.903       | 0.973       | 0.941
#3         | 1       | 0.825       | 0.949       | 0.897
#3         | 2       | 0.860       | 0.987       | 0.934
#3         | 3       | 0.825       | 0.975       | 0.912
#4         | 1       | 0.790       | 0.975       | 0.898
#4         | 2       | 0.825       | 0.975       | 0.912
#4         | 3       | 0.807       | 0.975       | 0.905
#5         | 1       | 0.892       | 1.000       | 0.949
#5         | 2       | 0.892       | 1.000       | 0.949
#5         | 3       | 0.877       | 1.000       | 0.942

To give a clear view of the performances of ELM ANN and BP ANN, the results were transformed into graphical charts. For each performance measurement, the average values were computed in each experiment. Fig. 1 shows the comparison of average sensitivity rates between BP ANN and ELM ANN. The comparison of specificity rates is given in Fig. 2, while accuracy rates can be seen in Fig. 3.

Fig. 1. Average sensitivity rates of BP ANN and ELM ANN

Fig. 2. Average specificity rates of BP ANN and ELM ANN

Fig. 3. Average accuracy rates of BP ANN and ELM ANN

Based on Fig. 1 and Fig. 3, we can see that ELM ANN was superior to BP ANN: ELM ANN had better performance in terms of sensitivity and accuracy in all experiments.
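The evaluation protocol described in this section (5 folds, three runs per fold, averaged sensitivity, specificity, and accuracy from Eqs. (13)-(15)) can be sketched as follows. The classifier here is a trivial threshold rule standing in for BP ANN or ELM ANN, and the data are synthetic stand-ins for the preprocessed WBCD; both are assumptions for illustration only.

```python
import numpy as np

def metrics(t, y):
    """Eqs. (13)-(15) from TP/TN/FP/FN counts (positive class = malignant = 1)."""
    tp = np.sum((y == 1) & (t == 1)); tn = np.sum((y == 0) & (t == 0))
    fp = np.sum((y == 1) & (t == 0)); fn = np.sum((y == 0) & (t == 1))
    sens = tp / (tp + fn)                      # Eq. (13)
    spec = tn / (tn + fp)                      # Eq. (14)
    acc = (tp + tn) / (tp + tn + fp + fn)      # Eq. (15)
    return sens, spec, acc

def cross_validate(X, t, fit_predict, k=5, runs=3, seed=0):
    """k random folds x `runs` repetitions; returns mean (sens, spec, acc)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for f in folds:
        train = np.setdiff1d(idx, f)           # remaining partitions as training data
        for _ in range(runs):                  # average out randomly generated weights
            y = fit_predict(X[train], t[train], X[f])
            scores.append(metrics(t[f], y))
    return np.mean(scores, axis=0)

# synthetic stand-in for the preprocessed WBCD (nine features already in [-1, 1])
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (300, 9))
t = (X.sum(axis=1) > 0).astype(int)

# hypothetical threshold "classifier"; it reproduces the labeling rule exactly
dummy = lambda Xtr, ttr, Xte: (Xte.sum(axis=1) > 0).astype(int)
sens, spec, acc = cross_validate(X, t, dummy)
print(round(sens, 2), round(spec, 2), round(acc, 2))  # → 1.0 1.0 1.0
```

Because the stand-in classifier reproduces the labeling rule exactly, all three averaged metrics come out as 1.0; with a real BP ANN or ELM ANN plugged into `fit_predict`, the same loop yields the per-experiment averages plotted in Figs. 1-3.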


However, in terms of specificity, BP ANN had better performance in three experiments: #1, #2, and #5.

To reach a general conclusion, an overall comparison needs to be computed. For each performance measurement, the cumulative average rates were compared. Fig. 4 displays the overall sensitivity, specificity, and accuracy average rates of BP ANN and ELM ANN. The results show that, in general, ELM ANN was better than BP ANN.

Fig. 4. Overall average rates of BP ANN and ELM ANN (sensitivity: 0.843 vs. 0.948; specificity: 0.980 vs. 0.974; accuracy: 0.921 vs. 0.964)

V. CONCLUSION AND FUTURE WORKS

The performance of ELM ANN was generally better than that of BP ANN in breast cancer diagnosis. Although its specificity rate was slightly lower than BP ANN's, it can be clearly seen that ELM ANN remarkably improved the sensitivity and accuracy rates. Based on these results, we can conclude that ELM ANN has a better generalization model than BP ANN for diagnosing breast cancer based on the Breast Cancer Wisconsin Dataset.

There is some necessary work to be done in the near future. First, it is important to communicate the model with domain experts. Hybrids of ELM ANN with decision trees or other techniques that can produce meaningful knowledge representations will be promising. Second, to make an intelligent diagnosis tool that can be used by end users, it is necessary to develop an interactive user interface; developing interfaces for mobile, desktop, or web applications may be useful. Third, new cases are added regularly in the hospital, so intelligent diagnosis systems that can learn not only from the data available in repositories but also from newly available data will be required.

REFERENCES

[1] T. S. Subashini, V. Ramalingam, and S. Palanivel, "Breast mass classification based on cytological patterns using RBFNN and SVM," Expert Syst. Appl., vol. 36, no. 3, pp. 5284–5290, Apr. 2009.
[2] The American Cancer Society, "What is Breast Cancer."
[3] M. F. Akay, "Support vector machines combined with feature selection for breast cancer diagnosis," Expert Syst. Appl., vol. 36, no. 2, pp. 3240–3247, Mar. 2009.
[4] D. West, P. Mangiameli, R. Rampal, and V. West, "Ensemble strategies for a medical diagnostic decision support system: A breast cancer diagnosis application," Eur. J. Oper. Res., vol. 162, no. 2, pp. 532–551, Apr. 2005.
[5] R. W. Brause, "Medical Analysis and Diagnosis by Neural Networks," in Proceedings of the Second International Symposium on Medical Data Analysis (ISMDA '01), 2001, pp. 1–13.
[6] G. B. Huang and H. A. Babri, "Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions," IEEE Trans. Neural Netw., vol. 9, no. 1, pp. 224–229, Jan. 1998.
[7] W. H. Wolberg and O. L. Mangasarian, "Multisurface method of pattern separation for medical diagnosis applied to breast cytology," in Proceedings of the National Academy of Sciences, U.S.A., vol. 87, Dec. 1990, pp. 9193–9196.
[8] Y. U. Ryu, R. Chandrasekaran, and V. S. Jacob, "Breast cancer prediction using the isotonic separation technique," Eur. J. Oper. Res., vol. 181, no. 2, pp. 842–854, Sep. 2007.
[9] S. Sahan, K. Polat, H. Kodaz, and S. Güneş, "A new hybrid method based on fuzzy-artificial immune system and k-nn algorithm for breast cancer diagnosis," Comput. Biol. Med., vol. 37, no. 3, pp. 415–423, Mar. 2007.
[10] E. D. Übeyli, "Implementing automated diagnostic systems for breast cancer detection," Expert Syst. Appl., vol. 33, no. 4, pp. 1054–1062, Nov. 2007.
[11] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, Dec. 2006.
[12] C. P. Utomo, "The Hybrid of Classification Tree and Extreme Learning Machine for Permeability Prediction in Oil Reservoir," Int. J. Comput. Sci. Issues, vol. 10, no. 1, pp. 52–60, 2013.
