
Received: 4 March 2020 | Revised: 20 August 2020 | Accepted: 9 September 2020

DOI: 10.1049/cvi2.12021

ORIGINAL RESEARCH PAPER - IET Computer Vision

Going deeper: magnification‐invariant approach for breast cancer classification using histopathological images

S. Alkassar1 | Bilal A. Jebur1 | Mohammed A. M. Abdullah1 | Joanna H. Al‐Khalidy1 | J. A. Chambers2

1 Computer and Information Engineering Department, Ninevah University, Mosul, Iraq
2 Department of Engineering, University of Leicester, Leicester, UK

Correspondence
Sinan Alkassar, Computer and Information Engineering Department, Ninevah University, Mosul, Iraq.
Email: sinan.alkassar@uoninevah.edu.iq

Abstract
Breast cancer has the highest fatality rate among women compared with other types of cancer. Generally, early diagnosis of cancer is crucial to increase the chances of successful treatment. Early diagnosis is possible through physical examination, screening, and obtaining a biopsy of the dubious area. In essence, utilizing histopathology slides of biopsies is more efficient than typical screening methods. Nevertheless, the diagnosis process is still tiresome and prone to human error during slide preparation, such as when dyeing and imaging. Therefore, a novel method is proposed for diagnosing breast cancer as benign or malignant in a magnification‐specific binary (MSB) classification. Besides, the introduced method classifies each type into four subclasses in a magnification‐specific multi‐category (MSM) fashion. The proposed method involves normalizing the hematoxylin and eosin stains to enhance colour separation and contrast. Then, two types of novel features, deep and shallow features, are extracted using two deep structure networks based on DenseNet and Xception. Finally, a multi‐classifier method based on the maximum value is utilized to achieve the best performance. The proposed method is evaluated using the BreakHis histopathology data set, and the results in terms of diagnostic accuracy are promising, achieving 99% and 92% in terms of MSB and MSM, respectively, compared with recent state‐of‐the‐art methods reported in the survey conducted by Benhammou et al. on the BreakHis data set using deep learning and texture‐based models.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2021 The Authors. IET Computer Vision published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

1 | INTRODUCTION

Cancer is one of the most fatal diseases affecting humans in recent years. According to the World Health Organization, the International Agency for Research on Cancer report in 2014 highlighted around 14 million new cases, of which 8 million led to mortality, and more than 30 million people will be affected over the next decade [1]. This has driven researchers from different disciplines to conduct much research on imaging, diagnosing, and curing prospective patients. In general, a tumour can affect many anatomical parts of the human body, but breast cancer is the most common type, especially in women. Breast cancer also has a higher rate of fatality compared with other types of cancer [2,3]. Moreover, other factors, such as wars and a lack of healthy nutrition, have led to patients' unawareness of the symptoms of the disease, resulting in late diagnosis and increased mortality.

The breast cancer detection process involves first examining and imaging the suspected area in the breast using versatile imaging devices such as mammogram x‐ray, magnetic resonance imaging, thermography, and sonography [4]. Although there have been significant studies on detecting and diagnosing breast cancer using the aforementioned imaging techniques, acquiring a biopsy from the suspect malignant area is still the most effective method to determine whether the breast tissue is cancerous or not. This process, however, has some drawbacks, as the pathologist has to inspect the histological samples visually using a microscope, taking into consideration multiple samples and magnification factors to make a diagnosis. Besides, the detection process will be subjective, as it is open to the pathologist's interpretations.



FIGURE 1 Examples of two staining protocols, H&E and IHC, used for dyeing histopathological images. The first two images from the left depict H&E staining, whereas the other two images show IHC staining examples

Thus, research on histopathology images of tumours has been emerging to make the process of detecting and diagnosing cancer less prone to errors and to introduce a fully automatic classification system [6]. This has made digitizing the workflow of most pathology labs a necessity, making it easier for experts to diagnose and identify anomalies in histological images [5].

Usually, hematoxylin and eosin (H&E) is the standard staining protocol, where the hematoxylin dyes the nuclei purple or blue, whereas the eosin colours other structures, such as cytoplasm, in pink [7]. There is another, more advanced staining procedure, called immunohistochemistry (IHC) staining, which utilizes antibodies to dye certain antigens in the affected tissue [8]. Examples of the H&E and IHC staining samples are shown in Figure 1. After staining, slide digitization is achieved either by using a digital camera mounted on a microscope or via modern whole slide imaging (WSI) scanners. However, most cancer detection centres around the world, especially in developing countries, have neither the funding nor the advanced techniques to use WSI scanners, leaving them with typical microscopic slide digitization. Hence, a digital technique or computer‐aided diagnosis (CAD) is beneficial to help the pathologist in determining the biopsy type, and therefore, we propose a method for classifying histopathology biopsy images into benign and malignant using deep and shallow features.

1.1 | Literature review

For the last couple of decades, there has been much literature on developing a CAD system for breast cancer detection [43]. Nevertheless, all the ongoing work still faces challenges and issues. For instance, different staining procedures exploit various techniques depending on how the structural parts of the cell are dyed. In particular, the H&E staining protocol has been investigated, and many approaches are utilized. One of the major challenges with H&E stained breast cancer histopathological images is the variability of appearance due to many reasons, such as tissue and staining preparation, slide digitization, as well as the heterogeneity of cancer. On the other hand, cancer classification into malignant and benign has been achieved using two approaches found in the literature, each having its pros and cons. The classification is accomplished either by explicitly segmenting parts of the tissue, such as the nuclei shape and cell mitotic structure, or by using a global approach where all the image pixels are utilized.

Veta et al. [5] provided an overview of the recent methods used to diagnose breast cancer using histopathology images, in particular using WSI scanner images to predict tumour type via relevant segmented parts of such images to construct tissue microarrays. They concluded that there is a lack in the number of data sets in general or in the annotated images within the data sets. Also in [9], various CAD systems have been reviewed utilizing different data sets, such as the digital database for screening mammography (DDSM) [10] and BancoWeb [11], which are all mammographic data sets. They reported that most of the work on breast cancer CAD systems consists of four stages: preprocessing, segmentation, feature extraction and selection, and classification. In [20], the authors surveyed deep learning techniques used to detect numerous cancer types, in terms of network architecture, such as auto‐encoder, fully connected, and convolutional networks, as well as their effectiveness in detecting various cancer types. Similarly, they concluded that most of the work is applied to unpublished data sets. As a result, Benhammou et al. [33] presented a survey of recent breast cancer diagnosis methods using an efficient public data set called BreakHis [3] and provided a comprehensive list of all related work in terms of handcrafted or deep neural network methods and magnification dependency.

As mentioned, some approaches extract a region of interest (ROI) in these slides to classify the tumour, and hence, the efficiency of the segmentation method directly affects classification accuracy. In this context, a novel approach is proposed in [18] to segment the nuclei in H&E stained histological images using spatially constrained expectation maximization, while in [17], an improved watershed method is presented to segment nuclei and overcome issues regarding overlapped nuclei regions. In [19], Kurmi et al. focused on developing an enhanced method to segment overlapped nuclei by using a saliency‐based active contour to determine the nuclei areas and applied a Gaussian distribution with entropy maximization for the final segmentation. In contrast, Bejnordi et al. [34] introduced an automatic ROI detection method for histopathology images captured via the WSI scanner. Their approach exploits a multiscale superpixel classification technique that outperforms the regular identification method using rectangular patches. Nevertheless, segmenting ROI areas might be prone to errors, resulting in misclassification of some regions.

In terms of classification methods, Spanhol et al. [13] utilized convolutional neural networks (CNNs), trained them on patches extracted from the histopathology images, and examined various fusion rules for different combinations of CNNs. Also in [12], the authors used DeCaf features for breast cancer image classification and compared their work with appearance and textural descriptors, while in [14], different textural descriptors have been used to diagnose breast cancer cells as benign or malignant: an extended adaptive top-bottom transform is used to enhance the images, nuclei regions are extracted, and the Rotation Forest is then utilized to classify these cell regions. In [15], biomarkers such as estrogen receptor and human epidermal growth factor receptor 2 have been used beside histopathology images dyed using the IHC protocol to predict the cancer type through a deep trained network. In addition, a fully automated deep discriminative network for histopathology images that performs contextual grading of heterogeneous holistic‐level descriptions to distinguish the benign and malignant classes is suggested in [16]. In [32], Han et al. suggested a multi‐classification method using a new deep trained network structure. They tested their hypothesis on the wide‐scale database BreakHis [3] and achieved an accuracy of around 93%.

Bardou et al. [35] compared two machine learning approaches, one based on a CNN and one based on the extraction of handcrafted traits employing two coding modes, to classify each tumour as benign or malignant. Song et al. [38] suggested an improved method for classifying benign and malignant tumours on the basis of transfer learning, using the Fisher vector of local features. They first extracted local features using the Gaussian mixture model. Then, they added a new adaptive layer to the VGG deep network [30] using the transfer learning approach. Gupta and Bhavsar [39] presented a joint colour-texture feature extraction and classification to produce a set of scores and discussed the best feature sets for various magnifying factors. The transition module in the Inception network has been investigated for tumour classification in [40]: average pooling layers with different filter sizes are utilized in the learning process through the transition module to encourage class‐specific filters, and the transition module is then integrated into AlexNet and ZFNet for classification. Nejad et al. [41] used a pretrained CNN model for patient‐specific tumour descriptors to obtain semantic scores that enhance binary classifier performance. Finally, a binary classification via a deep active learning framework is proposed in [42]; its advantage is reducing the labelling burden, as unlabelled samples are chosen on the basis of valuable information rather than random selection.

Although current work has covered many aspects, such as histopathology image enhancement and handcrafted and neural network‐based classification methods, some limitations and challenges remain, which can be listed as follows: (1) Most studies on breast cancer CAD system evaluation have been carried out using only small or non‐public data sets; besides, some have utilized images captured from screening the breast area, which are not as accurate as histopathology images. (2) These studies have neither reviewed special techniques employed in deep trained networks to enhance diagnosis nor considered magnification‐invariant classification approaches. (3) These studies have only used binary classification to distinguish between malignant and benign tumours. Additionally, classifying breast tumours using histopathology images faces problems with feature selection, whether features are extracted using handcrafted methods or via a deep trained neural network, especially when images are captured using many zooming factors.

1.2 | Our contribution

Lately, the rise of deep networks and their superior performance for object recognition, image classification, and segmentation have driven medical research to adopt neural networks for diagnosing diseases. We introduce a novel method to diagnose breast cancer, and our contribution can be summarized as follows:

• Applying stain normalization to overcome the variance in the staining procedure
• Introducing a magnification‐invariant method for breast cancer detection using deep features (DF) and shallow features (SF) extracted on the basis of the DenseNet and Xception architectures to classify breast cancer into benign and malignant and accordingly into eight subclasses utilizing binary and multi‐classification approaches
• Using an ensemble classifier set and applying the maximum rule to determine the best classifier for various magnification scales
• Comparing DF and SF with features extracted utilizing recent state‐of‐the‐art deep trained networks
• Evaluating the proposed method using a large‐scale public data set called BreakHis that provides histopathology slides captured with multiple magnification factors

The organization of content here is as follows: Section 2 presents the data set used to evaluate the proposed method. Section 3 describes the proposed system, including stain normalization, feature extraction, and classification, and Section 4 presents and reviews the results. Finally, we draw our conclusions in Section 5.

2 | BreakHis DATA SET

The large‐scale BreakHis data set [3] is used to evaluate the proposed system. The data set comprises histopathological microscopic images of breast biopsies representing two tumour classes, malignant and benign. Besides, each class is divided into four subclasses representing different types of malignant and benign tumours, namely, ductal carcinoma (DC), lobular carcinoma (LC), mucinous carcinoma (MC), papillary carcinoma (PC), adenosis (A), fibroadenoma (F), phyllodes

tumour (PT), and tubular adenoma (TA), which are shown in Table 1. The obtained samples are digitalized using four magnifying factors (40X, 100X, 200X, and 400X) having an effective pixel size of 0.49, 0.20, 0.10, and 0.05 μm, respectively. The black borders of original slides are removed during digitalization, resulting in RGB images. Each channel is of eight‐bit depth, and the final size is 700 × 460. However, neither stain normalization nor colour standardization is applied. Samples of several magnifying factors for the biopsy slides are presented in Figure 2. The total number of digitalized samples collected from 82 patients is 7909 images, divided as 2368 images for 24 benign‐diagnosed patients and 5429 images for malignant‐diagnosed cases. The distribution of each subclass is depicted in Figure 3.

TABLE 1 Types of tumours in the BreakHis data set

Malignant tumours           Benign tumours
Ductal Carcinoma (DC)       Adenosis (A)
Lobular Carcinoma (LC)      Fibroadenoma (F)
Mucinous Carcinoma (MC)     Phyllodes Tumour (PT)
Papillary Carcinoma (PC)    Tubular Adenoma (TA)

FIGURE 2 Examples of the magnifying factors introduced when capturing slides in the BreakHis data set. The top row from the left shows the 40X and 100X zooming factors, respectively, whereas the bottom row from the left presents the 200X and 400X, respectively

FIGURE 3 The distribution of each category in the BreakHis data set in terms of the number of captured slides: (a) malignant tumour subclasses and (b) benign tumour subclasses

3 | PROPOSED METHOD

The proposed method consists of several steps, including stain normalization, DF and SF extraction, and comparison of these features on the basis of the two classification scenarios shown in Figure 4.

3.1 | Histopathology image stain normalization

The initial process before classifying histology images is to normalize the stain intensities (i.e. adjusting colour intensities for each pixel) of the H&E stains. This is an essential process for many reasons, such as the variation in stain portions, tissue preparation disparity, user variation, and the degree of absorbed light. We therefore applied the method suggested in [36] for stain normalization. This method depends on a non‐linear mapping of exemplary reference slides for the H&E stains using colour deconvolution to regulate all the slides. The general idea is thus to unify all the H&E images to a standard pigmentation degree using a reference slide.

The normalization process consists of four modules: stain matrix estimation, colour deconvolution (CD), statistics non‐linear mapping, and slide reconstruction. Generating the stain matrix involves transforming both the original and reference slides from RGB colour space into a new colour space defined by the stain colour descriptor (SCD). The SCD has two phases to calculate the stain matrix: (1) training, where a set of quantized slide histograms is utilized to derive principal colour histograms and obtain SCDs, and (2) learning, which employs both colour spaces (RGB and SCD) to create classification models consisting of probability maps for the H&E stains and the slide background. These classification models are then utilized to estimate stain colours for new slides. After that, and for both the deconvolved source and reference slides, a set of statistics using B‐spline non‐linear mapping is calculated for each stain and background channel to normalize the colour statistics of the source slides to the corresponding reference channels. Finally, the normalized stain channels of the source slide are recombined in a per‐pixel fashion to form the enhanced histology slide. Examples of the normalized histopathology images are shown in Figure 5.
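To make the deconvolution and statistics-mapping steps concrete, the following is a minimal Python sketch under simplifying assumptions: it uses a fixed Ruifrok-Johnston H&E stain matrix and a linear mean/standard-deviation mapping per stain channel, whereas the method of [36] estimates the stain matrix per slide via the trained SCD models and applies a B-spline non-linear mapping. The function names here are illustrative, not the authors' code.

```python
import numpy as np

# Fixed H&E stain vectors (Ruifrok-Johnston); the method of [36] instead
# estimates this matrix per slide with the trained SCD classifier.
HE_STAINS = np.array([[0.650, 0.704, 0.286],   # hematoxylin (OD space)
                      [0.072, 0.990, 0.105],   # eosin
                      [0.268, 0.570, 0.776]])  # residual channel

def rgb_to_od(img):
    """Convert an 8-bit RGB image to optical density (Beer-Lambert law)."""
    return -np.log((img.astype(np.float64) + 1.0) / 256.0)

def od_to_rgb(od):
    """Invert the optical-density transform back to 8-bit RGB."""
    return np.clip(256.0 * np.exp(-od) - 1.0, 0, 255).astype(np.uint8)

def normalize_stains(source, reference, stains=HE_STAINS):
    """Deconvolve both slides into stain concentrations, then map the mean
    and standard deviation of each source channel onto the reference
    (a linear stand-in for the B-spline statistics mapping of [36])."""
    inv = np.linalg.inv(stains)                       # OD -> concentrations
    src = rgb_to_od(source).reshape(-1, 3) @ inv
    ref = rgb_to_od(reference).reshape(-1, 3) @ inv
    for c in range(3):                                # per stain channel
        src[:, c] = (src[:, c] - src[:, c].mean()) / (src[:, c].std() + 1e-8)
        src[:, c] = src[:, c] * ref[:, c].std() + ref[:, c].mean()
    od = src @ stains                                 # recombine per pixel
    return od_to_rgb(od).reshape(source.shape)
```

Estimating the stain matrix per slide, as in [36], is what makes the normalization robust to the staining variability discussed above; the fixed matrix here is only a reasonable default.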

FIGURE 4 Block diagram of the proposed breast cancer classification method, including stain normalization, SF and DF extraction, the ECmax combiner, and MSB and MSM classification

FIGURE 5 Examples of the stain normalization process using the non‐linear mapping stain colour descriptor (SCD). The top row presents the non‐enhanced histology images, whereas the bottom row shows the normalized images. The bottom right image is an example of incorrect normalization

3.2 | Extracting SF and DF

3.2.1 | Shallow features

For the SF, we propose to use a new architecture inspired by the Inception network [25], named Extreme Inception or Xception [31]. The Inception module factors the process of convolutional filtering at the cross‐channel correlation level by applying a set of 1 × 1 convolution filters, mapping the input image into many separate feature sets. These sets are then mapped together using higher convolutions such as 5 × 5. In comparison, the Xception module uses a 1 × 1 convolution filter first for cross‐channel correlations, and then, for each output channel, the spatial correlations are mapped separately in a technique called depthwise

separable convolution (DSC) [37]. The DSC deals with both spatial (width and height) and depth (channel) dimensions, working well with kernels that cannot be factored into a smaller size.

The SF are extracted from the middle‐flow layers, which are a linear stack of DSC layers with residual connections that map the data from the entry flow. For each DSC module, the data are first passed through channel‐wise pooling via a 1 × 1 convolutional filter. Then, the extracted feature set is divided into 728 channels, and each is filtered using a higher convolutional filter size. After that, the filtered features are grouped and passed on to the next DSC module. In the original design, the middle flow is repeated 8 times, whereas we modified the network middle flow to be processed 12 times, chosen empirically to achieve maximum performance. This is shown in Figure 6.

FIGURE 6 The structure of the depthwise separable convolution (DSC) modules utilized to extract shallow features (SF) from a histopathology image
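The DSC module described above can be sketched roughly as follows in PyTorch: a 1 × 1 pointwise convolution for cross-channel correlations, then a depthwise convolution that filters each of the 728 channels spatially, with a residual connection across the module. The normalization and activation placement are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DSCModule(nn.Module):
    """One middle-flow style module: a 1x1 pointwise convolution handles
    cross-channel correlations, then a depthwise convolution filters each
    channel spatially, with a residual connection across the module."""
    def __init__(self, channels=728, kernel_size=3):
        super().__init__()
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.act(self.bn(self.depthwise(self.pointwise(x))))
        return out + x                      # residual mapping between modules

# Modified middle flow: 12 stacked modules instead of Xception's original 8.
middle_flow = nn.Sequential(*[DSCModule() for _ in range(12)])
sf_maps = middle_flow(torch.randn(1, 728, 19, 19))   # SF activation maps
```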
3.2.2 | Deep features

For the DF, we utilized a modified version of the DenseNet architecture suggested in [22], itself a modified version of ResNet [27], as our feature descriptor. DenseNet has many compelling advantages, such as strengthening feature propagation and encouraging feature reuse, but its most important trait is its proficiency in providing maximum information flow between layers. This means that each layer obtains additional features passed on from all preceding layers, and the same layer transfers its features to all subsequent layers. As a result, deep layers will have rich feature maps that have lost minimal spatial information. Another advantage of dense connection is preventing overfitting, as it has a regularizing effect [22].

FIGURE 7 Deep features (DF) obtained from a set of dense blocks and transition layers, including convolutional filtering and pooling layers

In terms of the modified DenseNet architecture shown in Figure 7, an input histopathology image $X_i$ passes through a series of layers $L$. Each layer builds a non‐linear transformation $H_l$, including operations such as batch normalization, pooling, rectified linear units, or convolutional filters, where $l$ represents the index of a given layer. The connectivity of the layers can be defined as

$X_l = H_l\{X_i + X_{i+1} + X_{i+2} + \cdots + X_{l-1}\}$,    (1)

where $X_l$ receives feature maps from all preceding layers and connects to all subsequent layers. As the size of the concatenated feature maps changes, downsampling via a pooling operation is used. As a result, the final DF extracted from the final layers can be viewed as the collective knowledge of the whole network.
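A compact sketch of the dense connectivity of Equation (1), realized as channel concatenation as in DenseNet [22], is given below; the growth rate and the number of layers per block are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One transformation H_l of Equation (1): BN -> ReLU -> 1x1 conv ->
    BN -> ReLU -> kxk conv. The second kernel size k is the one selected
    empirically per block (odd sizes 3 to 9, see below)."""
    def __init__(self, in_channels, growth_rate=32, kernel_size=3):
        super().__init__()
        self.h = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 4 * growth_rate, kernel_size=1),
            nn.BatchNorm2d(4 * growth_rate), nn.ReLU(inplace=True),
            nn.Conv2d(4 * growth_rate, growth_rate, kernel_size,
                      padding=kernel_size // 2))

    def forward(self, features):
        # Each layer sees the concatenated maps of ALL preceding layers.
        return self.h(torch.cat(features, dim=1))

class DenseBlock(nn.Module):
    def __init__(self, in_channels, num_layers=6, growth_rate=32, kernel_size=3):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate, kernel_size)
            for i in range(num_layers))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(features))  # feature reuse across layers
        return torch.cat(features, dim=1)     # collective knowledge out
```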

Each dense block consists of a set of two convolutional filters of size 1 × 1 and 3 × 3, respectively. Employing a smaller filter size is most advantageous for obtaining more information; however, this may cause the extracted features to be confused during classification. We therefore retained the first filter size and modified the second filter size to be selected empirically for each dense block using a routine similar to a Gaussian pyramid, where we try odd filter sizes ranging from three to nine and pretrain that layer for fine‐tuning. The best filter size is discussed in Section 4. In the end, each feature vector has 1920 elements representing the training and testing images.
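This empirical selection can be read as a simple search loop. In the sketch below, build_block, finetune, and validate are hypothetical hooks standing in for the surrounding training pipeline, which the paper does not publish:

```python
def select_kernel_size(build_block, finetune, validate, sizes=(3, 5, 7, 9)):
    """Try each odd kernel size for the second convolution of a dense
    block, briefly fine-tune that layer, and keep the best performer."""
    best_size, best_acc = sizes[0], -1.0
    for k in sizes:
        block = build_block(kernel_size=k)  # e.g. DenseBlock(..., kernel_size=k)
        finetune(block)                     # pretrain the layer for fine-tuning
        acc = validate(block)               # validation accuracy for size k
        if acc > best_acc:
            best_size, best_acc = k, acc
    return best_size
```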
3.3 | Classification

After extracting both DF and SF, we propose using an Ensemble of Classifiers with the maximum rule (ECmax) to make a prediction. Each classifier $k$ generates a score defined as $s_k^j \in \{0,1\}$ for $J \le 2$, or $s_k^j \in \{0, \ldots, J-1\}$ for $J > 2$, where $j = 1, \ldots, J$ indexes the classes and $k = 1, 2, \ldots, K$ indexes the classifiers. We utilized various up‐to‐the‐minute classification techniques for our ECmax combiner to enhance prediction. These methods and their modified versions include Tree (fine, medium, and coarse tuning), Linear and Quadratic Discriminant, Logistic Regression, Naive Bayes (Gaussian and kernel), support vector machine (SVM) (linear, quadratic, cubic, and Gaussian kernels), and K‐nearest neighbours (cubic, cosine, and weighted). Then, ECmax is applied, where the final ensemble decision can be defined as

$\mathrm{ECmax}_j = \arg\max_{k=1,\ldots,K} s_k^j$,    (2)

meaning that class $j$ is selected on the basis of the largest support generated from these classifiers.
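Under one natural reading of Equation (2), with each classifier's supports collected as comparable scores, the combiner reduces to picking the class backed by the largest support across all K classifiers. A minimal sketch, assuming normalized per-class scores:

```python
import numpy as np

def ecmax(scores):
    """ECmax combiner, one reading of Equation (2): `scores` has shape
    (K, J), holding the support s_k^j of each of the K classifiers for
    each of the J classes; the class with the largest support over all
    classifiers wins."""
    scores = np.asarray(scores, dtype=float)
    return int(np.argmax(scores.max(axis=0)))

# Toy usage, K = 3 classifiers and J = 2 classes (0 benign, 1 malignant):
supports = [[0.40, 0.60],
            [0.45, 0.55],
            [0.10, 0.90]]
print(ecmax(supports))   # -> 1: the largest support (0.90) backs malignant
```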
features, especially the SF where they tend to outperform DF
in all the deep networks. This is because SF has rich spatial
4 | RESULTS AND DISCUSSION information due to not suffering from multiprocess degrada-
tion, which can be highly beneficial for the image‐classification
4.1 | Results approach. Additionally, the accuracy of SF with and without
the stain normalization process is compared in Table 3 which
We evaluated the proposed system in two reformulations: shows the significance of normalizing colour intensities of the
(1) The first one classifies all the images using binary classifi- H&E stains.
cation into benign and malignant, (2) and the second evaluates Next, we plotted the ROC curves for DF and SF utilizing
the system using all subclasses mentioned in Table 1. The MSB and MSM reformulations, respectively, as shown in
taxonomy of comparison based on these two reformulations is Figure 8. Moreover, confusion matrices showing the predictive
also referred to as magnification‐specific binary (MSB) and positive rate (PPR) for evaluating DF and SF in MSB and MSM
magnification‐specific multi‐category (MSM). For consistency, and for all magnifying factors are also depicted in Figures 9–12.
we utilized the exact five‐fold configuration provided online1 Both feature types achieved high accuracy rates using all the
by the authors in [3], where 70% of the data set is used for magnification factors when the tumour is only classified into
either benign or malignant in MSB taxonomy. Also, the pro-
posed system in the MSM classification has performed
1
http://web.inf.ufpr.br/vri/breast‐cancer‐database significantly well compared with recent work.
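For reference, Equation (3) and the per-class PPR reported in the confusion matrices can be computed as in the following sketch; the class encoding and toy labels are illustrative only:

```python
import numpy as np

def icac(y_true, y_pred):
    """Image classification accuracy of Equation (3): Im_corr / Im_tot."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true == y_pred).mean())

def positive_predictive_rate(y_true, y_pred, num_classes):
    """Per-class PPR as shown in the confusion matrices: of all images
    predicted as class j, the fraction that truly belong to class j."""
    conf = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        conf[t, p] += 1                # rows: true class, columns: predicted
    return np.diag(conf) / np.maximum(conf.sum(axis=0), 1)

# Toy MSB example (0 benign, 1 malignant):
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]
print(icac(y_true, y_pred))                          # 0.833...
print(positive_predictive_rate(y_true, y_pred, 2))   # [0.75 1.  ]
```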

TABLE 2 Comparison of proposed DF and SF in terms of mean accuracy (%) with off‐the‐shelf deep networks (rows ordered 40X, 100X, 200X, 400X)

Network              Features   40X     100X    200X    400X
AlexNet              DF         80.40   79.65   80.30   79.30
                     SF         83.75   82.75   82.55   82.10
GoogleNet            DF         79.75   74.75   77.15   75.10
                     SF         95.40   92.10   91.20   89.60
InceptionV3          DF         86.80   83.20   81.75   80.15
                     SF         96.20   94.40   92.75   91.90
ResNet               DF         87.55   85.15   85.20   83.30
                     SF         95.50   93.80   93.50   91.35
InceptionResNetV2    DF         87.75   84.50   84.60   81.30
                     SF         91.40   88.55   88.65   85.45
ShuffleNet           DF         86.30   83.65   83.80   81.30
                     SF         95.35   87.70   91.50   88.95
MobileNetV2          DF         83.95   85.35   80.65   79.05
                     SF         92.75   90.25   89.65   88.70
SqueezeNet           DF         80.75   80.35   81.65   81.30
                     SF         91.10   90.55   90.10   88.40
VGG                  DF         82.65   80.35   79.05   77.45
                     SF         85.05   83.00   82.20   79.25
Proposed method      DF         94.00   93.35   92.75   91.50
                     SF         96.40   95.40   94.90   94.75

TABLE 3 Comparison of the proposed SF in terms of MSB taxonomy accuracy (%) with and without stain normalization

Zooming factor   Without stain normalization   With stain normalization
40X              89.2                          99.0
100X             87.0                          98.5
200X             83.8                          98.5
400X             85.1                          98.0

Finally, we compared the proposed method with the state‐of‐the‐art methods reported in the survey conducted by Benhammou et al. [33] on the BreakHis data set using deep learning and texture‐based models. The taxonomy of comparison is based on the two reformulations MSB and MSM, as shown in Table 4. Notably, most of the recent work adopts the MSB reformulation, while a few methods use the MSM reformulation. Nevertheless, our proposed method using DF and SF along with the ECmax classifier combiner system has higher accuracy rates for all zooming factors, providing an invariant approach across different magnification histopathology slides.

4.2 | Discussion

Histopathology images have different issues in terms of stain colour intensity, contrast, brightness, blurring, and so on. Therefore, any classification method might dramatically fail if no stain normalization is considered, as setting these images to an equal standard is crucial. As a consequence, we applied stain normalization, trying to standardize the quality of the images. Still, this part of the algorithm is a wide research area where more investigation is necessary.

In general, utilizing deep CNNs has outperformed traditional methods in terms of classification accuracy, as shown in Table 4, where the work in [14] employed nuclei detection for feature selection and a Rotation Forest classification method. However, the complexity and tuning of deep networks have always been an issue for researchers, as such networks are considered black boxes, and no deep learning method is guaranteed to give the desired solution. Nonetheless, our proposed method takes advantage of using activation layers of the DSC and dense blocks as features and classifies them using off‐the‐shelf classifiers, yielding 99% and 92% PPR in terms of the MSB and MSM classification, respectively. The MSM PPR is lower due to the higher number of classes and the difficulty of discriminating between some malignant and benign types owing to tissue similarity. However, using the activation layers as either deep or shallow features does not necessarily achieve higher accuracy if certain layer structures and blocks are utilized for the designated task.



FIGURE 8 ROC curves of DF and SF depicting the accuracy of the best classifiers using 40X, 100X, 200X, and 400X magnifying factors for breast cancer histopathological images: (a) MSB classification using DF, (b) MSB classification using SF, (c) MSM classification using DF, and (d) MSM classification using SF. ROC, receiver operating characteristic


FIGURE 9 Confusion matrices of DF for various magnifying factors using MSB reformulation. (a) 40X, (b) 100X, (c) 200X and (d) 400X


FIGURE 10 Confusion matrices of SF for various magnifying factors using MSB reformulation. (a) 40X, (b) 100X, (c) 200X and (d) 400X


FIGURE 11 Confusion matrices of DF for various magnifying factors using MSM reformulation. (a) 40X, (b) 100X, (c) 200X and (d) 400X

Therefore, and as shown in Table 2, extracting DF and SF from well‐known deep networks has not performed well at some zooming factors; GoogLeNet, for example, achieves low accuracy rates using DF. In contrast, our method has performed well with both DF and SF. Also, regarding the convolutional filter size, we noticed that using a small‐size filter is efficient for higher magnifying factors. However, this is not the case with some images, as noise factors such as blurring are introduced. Therefore, image quality measures must be considered in a real‐world scenario.


FIGURE 12 Confusion matrices of SF for various magnifying factors using MSM reformulation. (a) 40X, (b) 100X, (c) 200X and (d) 400X

In terms of evaluating DF and SF using the MSB and MSM taxonomies, it is observed that SF perform better than DF in the MSM classification. This is evident in Figures 9 and 10, where the lowest PPR is 97%. Examples of the activation channels and the strongest activation channel of the SF are shown in Figure 13. SF perform better due to the structure of DSC filtering: DSC has a vast advantage over normal convolution, being faster, and it transforms the image only once and then elongates it into many channels. Hence, it preserves more image spatial information.

Nevertheless, SF attained higher PPR compared with DF using an SVM classifier. Also, it is noticeable in the BreakHis data set that the higher the magnifying factor, the lower the PPR. Besides, utilizing SF and DF has a benefit over other methods in that the accuracy is effectively magnification invariant. Moreover, one particular malignant type, LC, achieved low PPR in comparison with the other malignant and benign tumour types, as shown in Figures 11 and 12; the lowest is 64% and 71% using DF and SF, respectively. The low number of LC tumour images available in the BreakHis data set may be the cause of this dramatic drop in PPR. Another reason for the lower LC diagnosis rate is that, even after stain normalization, the contrast between the H&E stains still overlaps in some areas. Finally, it is clear from the previous discussion that the proposed method has outperformed the related state‐of‐the‐art methods in both the MSB and MSM classification taxonomies, presenting a promising breast cancer diagnosis system.

5 | CONCLUSION

A novel magnification‐invariant breast cancer diagnosis system based on histopathology images was proposed here. First, an efficient stain normalization method was applied to enhance the contrast of stains in tumour histopathology images. Then, two types of novel features were extracted: densely connected layers producing the DF and DSC modules extracting the SF.

TABLE 4 Accuracy comparison of the proposed breast cancer classification method with recent state‐of‐the‐art methods

                                                                                                                                    Result (%)
Method     Preprocessing                Feature extraction   Classifier           Transfer learning   Training/testing   Metric   Reformulation   40X     100X    200X    400X
[13]       Resizing and subtract mean   Na                   ECmax (AlexNet)      ImageNet            70%/30%            ICAc     MSB             85.6    83.5    84.6    82.0
[12]       None                         CaffeNet             Logistic Regression  ImageNet            70%/30%            ICAc     MSB             84.6    84.8    84.2    81.6
[14]       Nuclei detection             PFTAS                Rotation Forest      None                Not specified      ICAc     MSB             81.7    81.2    80.7    81.5
[32]       Data augmentation            Na                   GoogleNet            ImageNet            50%/50%            ICAc     MSB             92.8    93.9    93.7    92.9
[38]       None                         Fisher Vector VGG    SVM                  ImageNet            70%/30%            ICAc     MSB             87.0    86.2    85.2    82.9
[39]       None                         Na                   Integrated           None                Not specified      ICAc     MSB             Overall avg 88.09
[40]       None                         Na                   AlexNet              None                Not specified      ICAc     MSB             82.7    Na      Na      Na
[41]       Resizing and subtract mean   VGG‐19               SVM                  ImageNet            98%/2%             ICAc     MSB             Overall avg 80.0
[42]       None                         Na                   AlexNet              ImageNet            70%/30%            ICAc     MSB             90.69   90.46   90.64   90.96
[35]       Data augmentation            Na                   Designed CNN         None                70%/30%            ICAc     MSB             98.33   97.12   97.85   96.15
                                                                                                                                  MSM             87.0    82.5    84.0    87.91
Proposed   Stain normalization          DF                   ECmax                ImageNet            70%/30%            ICAc     MSB             99.0    97.0    97.5    96.0
                                                                                                                                  MSM             86.8    87.4    84.3    81.7
Proposed   Stain normalization          SF                   ECmax                ImageNet            70%/30%            ICAc     MSB             99.0    98.5    98.5    98.0
                                                                                                                                  MSM             92.2    88.2    89.4    88.5

Abbreviation: SVM, support vector machine.



FIGURE 13 Visualization of shallow features (SF) using the depthwise separable convolution (DSC) architecture: (a) malignant ductal carcinoma (DC) tumour subclass, (b) the activation channels of the DSC, (c) the SF activation channel, and (d) the deep features (DF) activation channel

Finally, after extracting all the features, the ECmax combiner was utilized to select the best classifier and acquire the highest accuracy in both the MSB and MSM reformulations. The large‐scale histopathology data set BreakHis was used to evaluate the proposed system, and the results exhibited higher accuracy rates compared with recent work, achieving 99% and 92% in the MSB and MSM classifications, respectively. Furthermore, the proposed method utilized modern deep learning techniques, such as the dense block and DSC, as efficient feature extractors. Some issues remain to be considered, such as stain normalization, which was inefficient on LC tumour images, producing the lowest PPR score. Additionally, the effect of low‐contrast histopathology images on accuracy and fusing both handcrafted and deep learning methods to enhance performance will be our future work.

ORCID
S. Alkassar https://orcid.org/0000-0002-6211-6549
Bilal A. Jebur https://orcid.org/0000-0003-1621-7733
Mohammed A. M. Abdullah https://orcid.org/0000-0002-3340-8489

REFERENCES
1. Stewart, B., Wild, C.: World Cancer Report 2014. http://publications.iarc.fr/Non-Series-Publications/World-Cancer-Reports/World-Cancer-Report-2014. Accessed 8 November 2019
2. Bary, F., McCarron, P., Parkin, D.M.: The changing global patterns of female breast cancer incidence and mortality. Breast Cancer Res. 6(6), 1–11 (2004)
3. Spanhol, F., et al.: A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 63(7), 1455–1462 (2015)
4. Fletcher, S.: Breast cancer screening: a 35‐year perspective. Epidemiol. Rev. 33(1), 165–175 (2011)
5. Veta, M., et al.: Breast cancer histopathology image analysis: a review. IEEE Trans. Biomed. Eng. 61(5), 1400–1411 (2014)
6. Meijer, G.A., et al.: Origins of image analysis in clinical pathology. J. Clin. Pathol. 50(5), 365 (1997)
7. Chan, J.: The wonderful colors of the hematoxylin‐eosin stain in diagnostic surgical pathology. Int. J. Surg. Pathol. 22(1), 12–32 (2014)
8. Rizzardi, A., et al.: Quantitative comparison of immunohistochemical staining measured by digital image analysis versus pathologist visual scoring. Diagn. Pathol. 7(1), 1–10 (2012)
9. Yassin, N., et al.: Machine learning techniques for breast cancer computer aided diagnosis using different image modalities: a systematic review. Comput. Methods Programs Biomed. 156, 25–45 (2018)
10. Heath, M., et al.: The digital database for screening mammography. In: Proceedings of the 5th International Workshop on Digital Mammography, pp. 212–218. Medical Physics Publishing (2000)
11. Matheus, B., Schiabel, H.: Online mammographic images database for development and comparison of CAD schemes. J. Digit. Imaging 24(3), 500–506 (2011)
12. Spanhol, F., et al.: Deep features for breast cancer histopathological image classification. IEEE Int. Conf. Syst. Man Cybern. 1868–1873 (2017)
13. Spanhol, F., et al.: Breast cancer histopathological image classification using convolutional neural networks. IEEE Int. Joint Conf. Neural Netw. 2560–2567 (2016)
14. Sharma, M., Singh, R., Bhattacharya, M.: Classification of breast tumors as benign and malignant using textural feature descriptor. IEEE Int. Conf. Bioinform. Biomed. 1110–1113 (2017)
15. Robertson, S., et al.: Digital image analysis in breast pathology - from image processing techniques to artificial intelligence. Transl. Res. 194, 10–35 (2018)
16. Pratiher, S., Chattoraj, S.: Diving deep onto discriminative ensemble of histological hashing & class‐specific manifold learning for multi‐class breast carcinoma taxonomy. IEEE Int. Conf. Acoust. Speech Signal Process. 1025–1029 (2019)
17. Shen, P., et al.: Segmenting multiple overlapping nuclei in H&E stained breast cancer histopathology images based on an improved watershed. IEEE Int. Conf. Bioinform. Biomed. 1–4 (2015)
18. Monaco, J., et al.: Image segmentation with implicit color standardization using spatially constrained expectation maximization: detection of nuclei. Int. Conf. Med. Image Comput. Comput. Assist. Interv. 365–372 (2012)
19. Kurmi, Y., Chaurasia, V.: Multifeature based medical image segmentation. IET Image Process. 12(8), 1491–1498 (2018)
20. Hu, Z., et al.: Deep learning for image‐based cancer detection and diagnosis: a survey. Pattern Recognit. 83, 134–149 (2018)
21. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 1097–1105 (2012)
22. Huang, G., et al.: Densely connected convolutional networks. IEEE Conf. Comput. Vis. Pattern Recognit. 4700–4708 (2017)
23. Szegedy, C., et al.: Going deeper with convolutions. IEEE Conf. Comput. Vis. Pattern Recognit. 1–9 (2015)
24. Szegedy, C., et al.: Inception‐v4, Inception‐ResNet and the impact of residual connections on learning. 31st AAAI Conference on Artificial Intelligence (2017)
25. Szegedy, C., et al.: Rethinking the Inception architecture for computer vision. IEEE Conf. Comput. Vis. Pattern Recognit. 2818–2826 (2016)
26. Sandler, M., et al.: MobileNetV2: inverted residuals and linear bottlenecks. IEEE Conf. Comput. Vis. Pattern Recognit. 4510–4520 (2018)
27. He, K., et al.: Deep residual learning for image recognition. IEEE Conf. Comput. Vis. Pattern Recognit. 770–778 (2016)
28. Zhang, X., et al.: ShuffleNet: an extremely efficient convolutional neural network for mobile devices. IEEE Conf. Comput. Vis. Pattern Recognit. 6848–6856 (2018)
29. Iandola, F., et al.: SqueezeNet: AlexNet‐level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360 (2016)
30. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large‐scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
31. Chollet, F.: Xception: deep learning with depthwise separable convolutions. IEEE Conf. Comput. Vis. Pattern Recognit. 1251–1258 (2017)
32. Han, Z., et al.: Breast cancer multi‐classification from histopathological images with structured deep learning model. Sci. Rep. 7(1), 4172 (2017)
33. Benhammou, Y., et al.: BreakHis based breast cancer automatic diagnosis using deep learning: taxonomy, survey and insights. Neurocomputing 375, 9–24 (2020)
34. Bejnordi, B., et al.: A multi‐scale superpixel classification approach to the detection of regions of interest in whole slide histopathology images. Med. Imaging Digit. Pathol. (2015)
35. Bardou, D., Zhang, K., Ahmed, S.: Classification of breast cancer based on histology images using convolutional neural networks. IEEE Access 6, 24680–24693 (2018)
36. Khan, A., et al.: A nonlinear mapping approach to stain normalization in digital histopathology images using image‐specific color deconvolution. IEEE Trans. Biomed. Eng. 61(6), 1729–1738 (2014)
37. Sifre, L., Mallat, S.: Rigid‐motion scattering for image classification. Ph.D. dissertation (2014)
38. Song, Y., et al.: Adapting Fisher vectors for histopathology image classification. In: IEEE 14th International Symposium on Biomedical Imaging (ISBI), pp. 600–603 (2017)
39. Gupta, V., Bhavsar, A.: An integrated multi‐scale model for breast cancer histopathological image classification with joint colour‐texture features. Int. Conf. Comput. Anal. Images Patterns 354–366 (2017)
40. Akbar, S., et al.: Transitioning between convolutional and fully connected layers in neural networks. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 143–150. Springer (2017)
41. Nejad, E., et al.: Transferred semantic scores for scalable retrieval of histopathological breast cancer images. Int. J. Multimed. Inf. Retr. 7(4), 241–249 (2018)
42. Du, B., et al.: Breast cancer histopathological image classification via deep active learning and confidence boosting. Int. Conf. Artif. Neural Netw. 109–116 (2018)
43. Cheng, H., et al.: Automated breast cancer detection and classification using ultrasound images: a survey. Pattern Recognit. 43(1), 299–317 (2010)

How to cite this article: Alkassar S, Jebur BA, Abdullah MAM, Al‐Khalidy JH, Chambers JA. Going deeper: Magnification‐invariant approach for breast cancer classification using histopathological images. IET Comput. Vis. 2021;15:151–164. https://doi.org/10.1049/cvi2.12021
