A BERT Encoding With Recurrent Neural Network and Long-Short Term Memory For Breast Cancer Image Classification
∗ Corresponding author.
E-mail address: sushovan.chaudhury@gmail.com (S. Chaudhury).
https://doi.org/10.1016/j.dajour.2023.100177
Received 15 November 2022; Received in revised form 23 January 2023; Accepted 28 January 2023
Available online 2 February 2023
2772-6622/© 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
S. Chaudhury and K. Sau Decision Analytics Journal 6 (2023) 100177
benign, and malignant. Although it was intended for lesion detection rather than classification, our work uses it to classify lesions.
Histopathological Image Data: The original dataset consisted of images of Breast Cancer (BC) specimens split into training, validation, and testing sets, taken from the Kaggle repository "Breast Histopathology Images".
3.2. Pre-processing
During the initial pre-processing of the ultrasound breast images, the outlines of the breast are first identified, and the noise content is then removed without affecting the essential information in the image. We carried out experiments on the BUSI-1311 dataset. The training set of the BUSI-1311 dataset consists of various images, whereas the test set consists of ultrasound breast images obtained at baseline from females between the ages of 25 and 75. To reduce the impact of stains on biological tissues in medical imaging, special image normalization techniques must be used. A standardized method for the quantitative examination of tissue slices was put forth by Bejnordi [44]; it takes into account the staining procedures used to prepare the tissue slices. First, the image's colour is converted into an optical density (OD) via a logarithmic transformation. The OD tuple is then subjected to singular value decomposition to produce a 2D projection with a high variance. The original image is then transformed using the produced colour space. The image's histogram is then dynamically expanded to cover more than 90% of the data. Histological images were pre-processed through augmentation techniques [45] to reduce class imbalances among the image classes and were reduced to small patches. Figs. 4 and 5 show sample breast histology and breast ultrasound images after finding the region of interest from the masks. Figs. 7A, 7B and 7C show the pre-processing techniques; the data is split into training and test sets in an 80–20 ratio.
3.3. Feature extraction using BERT (Transformer encoder)
A transformer is a neural architecture built mainly on two ideas: sequence-to-sequence learning with an Encoder–Decoder architecture, and attention. The attention mechanism looks at an input sequence and decides at each step which other parts of the sequence are important [46]. The Transformer converts one sequence into another with the aid of two components (Encoder and Decoder), similar to LSTM-based models, but differs from earlier sequence-to-sequence models in that it does not use any recurrent networks [47]. Researchers investigated the transformer's potential in computer vision after seeing how well it performed in NLP tasks. One of the implementations was the ViT (Vision Transformer). To meet the input requirement of the transformer, an image is split into a collection of image patches. The so-called patch embedding process is essentially a fully connected layer that applies a linear transformation. Specifically, N = HW/P^2 when an input image of size H (height) × W (width) × C (channels) is partitioned into N patches (i.e., tokens), each of size P × P × C, (P, P) being the resolution of each patch. Then, a vector of size D is produced from
Fig. 5. Sample ultrasound images from BUSI1311 dataset with mask and specific ROI.
each patch. The input is thus converted into a 2D tensor of size N × D. Moreover, an extra [CLS] token is added to the transformer input to encode order-related information and remove possible biases, unlike in CNN-based architectures. Other pre-training frameworks, like the Bidirectional Encoder Representations from Transformers (BERT), frequently use this strategy [48]. A position encoding vector is also added to each patch embedding to preserve the relative positions of the patches, producing the token embedding that is used by the first layer of the transformer encoder. Fig. 6A shows the overall flow of this research work. We use BERT-style feature encoding, in particular BEiT (BERT Pre-Training of Image Transformers) [49] from the Hugging Face library. The BEiT pre-training task is adapted from [49] and shown in Fig. 6B. In our method the images have two views of representation, namely image patches and visual tokens. The two types serve as input and output representations during pre-training, respectively [49]. The image data is tokenized using a tokenizer [50]. The image is also broken down into smaller patches of equal size as explained above, and some of those patches are masked. The block-wise patches along with the masked areas are flattened as shown in Fig. 6B. Position embedding is performed on the flattened patches, which are linearly projected (similar to word embedding in BERT) and fed into the BEiT encoder, which in turn uses attention mechanisms with the masked image modelling head to encode the most important features into the visual tokens. The features are thus encoded using pre-trained BEiT and, after some fine-tuning, can be used for downstream tasks such as classification by feeding them into the RNN-LSTM classifier.
Figs. 7A, 7B and 7C show the pre-processing code for the histology and ultrasound images. For ultrasound images, the code converts the images to grayscale, normalizes the pixel values, and applies histogram equalization to enhance the contrast. For histology images, the code normalizes the pixel values and applies adaptive histogram equalization to enhance the contrast.
The code snippet in Fig. 8 shows how a pretrained model is used to extract features using pretrained Vision Transformers. The code in Fig. 9 shows how BEiT is used for feature extraction.
Figs. 8 and 9 depict the steps for feature encoding using Vision Transformers and BEiT. The pretrained vision transformer models are loaded from the Hugging Face library and features are extracted from
the breast ultrasound and histology images and fed to the RNN-LSTM based classifier, as shown in Figs. 12 and 13.
3.4. Training model (RNN-LSTM)
One of the more advanced varieties of RNN is the LSTM network, which can learn order dependence in sequence prediction tasks. The goal of the LSTM architecture is to provide an appropriate back-propagation (BP) training mechanism. It is an upgraded RNN architecture. The LSTM model was first created to handle the vanishing gradient that is commonly present in standard RNNs [51]. By enforcing constant error flow through constant error carousels in special units and preventing the error stream from collapsing as it is propagated "back in time", the LSTM technique lessens this issue. The authors of [52] have long been aware of the challenges involved in training RNNs to capture long-term dependencies. There have, however, been some effective strategies for overcoming this crucial problem, such as altering the state-to-state transition function to encourage the hidden neurons to organize long-term memory and produce routes in the time-unfolded RNN (see Fig. 10).
The LSTM approach replaces the hidden neurons built on sigmoid or tanh functions with memory cells whose input and output are controlled by several gates. These gates control the information flow to the hidden neurons. High complexity in the hidden layer is a problem for LSTM: the number of parameters in an LSTM is several times greater than in an RNN. Different approaches, including Framework LSTM and complex networks, have been attempted to resolve the issue of long-term dependencies [53]. The latest LSTM modification was given by [54]. As portrayed in Fig. 5, C_t represents the memory cell units in the LSTM, together with the input gate, the forget gate, and the output gate.
It is essential to identify the information that will be erased from the cell state. As shown in the equation below, this decision is made by the forget gate using a sigmoid function:
f_t = \sigma(W_{fh} h_{t-1} + W_{fx} x_t + b_f)    (1)
Here f_t is the forget layer, \sigma is the sigmoid activation function in the forget gate, W_{fh} is the weight connection with the previous hidden state h_{t-1}, W_{fx} is the weight connection with the input pattern x_t, and b_f is the bias at the forget layer. Determining the additional information to be stored in the cell state is also crucial. The input gate layer, which chooses the values to be updated, and the temporary cell state layer, which creates new candidate values as vectors that could be added to the state, carry out this step as expressed in Eqs. (2) and (3).
i_t = \sigma(W_{ih} h_{t-1} + W_{ix} x_t + b_i)    (2)
\tilde{C}_t = \tanh(W_{ch} h_{t-1} + W_{cx} x_t + b_c)    (3)
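As a minimal numerical sketch of the gate computations in Eqs. (1)–(3), the following pure-Python example evaluates the forget gate, the input gate and the candidate cell state for one time step. The parameter names (`W_fh`, `W_fx`, `b_f`, and so on), the layer sizes and the random initialization are assumptions made for illustration; they are not taken from the authors' implementation. The final cell-state update is the standard LSTM rule and is likewise an assumption, since it is not written out in the text.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid used by the forget and input gates."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W_h, h_prev, W_x, x_t, b):
    """Return W_h @ h_prev + W_x @ x_t + b for plain Python lists."""
    return [
        sum(w * h for w, h in zip(row_h, h_prev))
        + sum(w * x for w, x in zip(row_x, x_t))
        + bias
        for row_h, row_x, bias in zip(W_h, W_x, b)
    ]


def lstm_gates(h_prev, x_t, p):
    """Evaluate Eqs. (1)-(3) for a single time step."""
    f_t = [sigmoid(z) for z in affine(p["W_fh"], h_prev, p["W_fx"], x_t, p["b_f"])]
    i_t = [sigmoid(z) for z in affine(p["W_ih"], h_prev, p["W_ix"], x_t, p["b_i"])]
    c_tilde = [math.tanh(z) for z in affine(p["W_ch"], h_prev, p["W_cx"], x_t, p["b_c"])]
    return f_t, i_t, c_tilde


def make_params(hidden, inputs, seed=0):
    """Hypothetical small random weights and zero biases (illustration only)."""
    rng = random.Random(seed)
    p = {}
    for gate in ("f", "i", "c"):
        p[f"W_{gate}h"] = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(hidden)]
        p[f"W_{gate}x"] = [[rng.uniform(-0.1, 0.1) for _ in range(inputs)] for _ in range(hidden)]
        p[f"b_{gate}"] = [0.0] * hidden
    return p


hidden, inputs = 4, 3
params = make_params(hidden, inputs)
h_prev = [0.0] * hidden
c_prev = [0.0] * hidden
x_t = [1.0] * inputs

f_t, i_t, c_tilde = lstm_gates(h_prev, x_t, params)

# Standard LSTM cell-state update (assumed; not stated explicitly in the text).
c_t = [f * c + i * ct for f, c, i, ct in zip(f_t, c_prev, i_t, c_tilde)]
```

Because both gates pass through a sigmoid, their entries lie strictly between 0 and 1, so they act as soft switches on what is forgotten and what is written to the cell state.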
Eq. (4) can be used to determine the LSTM's final output.
h_t = o_t * \tanh(C_t)    (4)
4. Results and discussion
4.1. Classification using RNN-LSTM training
We proposed RNN-LSTM, a deep fully connected and convolutional layer-based architecture for the discriminator and generator in Transformer encoding. Our proposed architecture generates samples of higher quality than conventional architectures on a variety of benchmark image datasets. As illustrated in Fig. 6, we demonstrated that RNN-LSTM learns faster than conventional architectures and can produce recognizable, high-quality images after just a few training rounds. Before the convolution layers in the generator, we used a number of deep fully connected layers to convert the low-dimensional noise vector into a high-dimensional representation in the feature space of the image. The discriminator transforms the high-dimensional features retrieved by convolutional layers using several deep fully connected layers before classification (see Fig. 11).
The pre-processed ultrasound and histology images are concatenated so that they can be used together as input to the RNN-LSTM classifier and benefit from the advantages of both modalities. The ultrasound images may not be as informative as histological images, but they are much faster and cheaper to acquire, and they are non-invasive. Combining both modalities can improve the overall performance of the classifier. Furthermore, concatenating the images allows the model to learn from both modalities and make predictions based on both, which is particularly useful when the data from one modality is not enough to make a decision and the model can rely on the data from the other modality.
Transformer encoding, which is based on an RNN-LSTM model, learns the distribution more quickly than the CNN model does. The Transformer encoding, which is based on RNN-LSTM, provides easily
[Equation (6) is not recoverable from the source: only the form a/(a + b) survives.]
F1\text{-}Score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}    (7)
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (8)
Jaccard index = J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}    (9)
where X and Y are regions, and TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives.
\kappa = \frac{o \sum_i M_{ii} - \sum_i (M_{i+} \cdot M_{+i})}{o^2 - \sum_i (M_{i+} \cdot M_{+i})}    (10)
where the sums run over all rows i of the confusion matrix M, M_{i+} is the marginal total of row i, M_{+i} is the marginal total of column i, and o is the number of observations.
4.3. Confusion matrix for histopathological image dataset
Several metrics were computed for this RNN-LSTM classifier: the accuracy, sensitivity, specificity and error rate are 91%, 100%, 68.28% and 9.01%, respectively.
The sensitivity, accuracy, and efficiency of the LSTM classifier are demonstrated by the confusion matrices in Figs. 8 and 9 for the training and testing data (see Figs. 15 and 16).
4.4. Comparison of ultrasound image and histopathological image dataset classification using RNN-LSTM
A comparison of the accuracy scores of the proposed Transformer encoding based on the RNN-LSTM method for both the ultrasound image and histopathological data shows that the ultrasound data gives better accuracy and precision than the histopathological data (Table 1).
4.5. Comparison of the proposed system to the current one for evaluation
A comparison of the accuracy scores of the proposed Transformer encoding based on the RNN-LSTM method and the image classification models available in the literature shows that the proposed model achieves successful results (Table 2).
In Table 2, the proposed method is compared to the state-of-the-art methods. The accuracy reported by the authors of [55] for ultrasound scans was 98.26%. The adaptive histogram equalization method, which was used in [56] to enhance ultrasound images, had an accuracy rate of 93.04%. Ref. [59] describes a ResNet system for tumour identification that combines composites of various CNN models with imaging data from various image formats; the accuracy rate for this data collection was 95.32%. Ref. [57] states that fuzzy boosting methods were applied following bilateral filtering to the underlying breast ultrasound image; an accuracy of 95.48% was reached. The accuracy of the semi-supervised GAN model used by the authors in [58] was 90.41%. The proposed method achieved a 99% accuracy rate when tested on the enhanced BUSI-1311 image dataset (Table 2).
4.6. Setting optimized parameters and coefficients for the classifier
There are several ways to set optimal parameters for an RNN-LSTM based classification model. Some of the common ones are:
Grid Search: This method involves defining a set of possible parameter values, and then training and evaluating the model with all possible combinations of these values. This can be computationally expensive, especially for models with many parameters, but it can be useful for finding a good set of starting parameters.
Random Search: This method is similar to grid search, but it involves randomly sampling from the parameter space instead of trying all possible combinations. This can be less computationally expensive than grid search and can still yield good results.
Bayesian Optimization: This method uses a probabilistic model of the relationship between the parameters and the model's performance. It can be more efficient than grid search or random search because it uses the information from previous evaluations to guide the search for optimal parameters.
Hyperopt: A library for model selection and hyperparameter tuning. It includes optimization algorithms such as random search and the Tree-structured Parzen Estimator (TPE); TPE uses the information from previous evaluations to guide the search for optimal parameters.
Optuna: Another library for model selection and hyperparameter tuning. By default it uses a tree-structured Parzen estimator (TPE) algorithm to optimize the model's parameters.
Manual tuning: This method involves manually adjusting the parameters based on the model's performance on a validation set. This can be time-consuming, but it can be useful for gaining
Table 1
Result of proposed model.
Parameters                   Performance of LSTM,         Performance of LSTM,
                             ultrasound image dataset     histopathological dataset
Accuracy                     99%                          90.98%
Sensitivity                  100%                         100%
Specificity                  100%                         68.28%
Error rate                   1%                           9.01%
Positive predictive value    1                            0.8882
Negative predictive value    1                            1
Positive likelihood          NaN                          3.1520
Negative likelihood          0                            0
Overall Precision            99.1%                        84.5%
Overall Recall               98.5%                        94.4%
F1 Score Value               98.8%                        89.18%
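The quantities reported in Table 1 can all be derived from the four cells of a binary confusion matrix. The sketch below shows one way to do so; the function names and the example counts are hypothetical and are not the paper's data. It also illustrates that the positive likelihood ratio involves a division by 1 − specificity, which is zero when specificity is 100%; this may be why that entry appears as NaN for the ultrasound dataset.

```python
import math


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den != 0 else math.nan


def binary_metrics(tp, fn, fp, tn):
    """Compute confusion-matrix metrics as fractions (not percentages)."""
    total = tp + fn + fp + tn
    sensitivity = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    return {
        "accuracy": safe_div(tp + tn, total),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "error_rate": safe_div(fp + fn, total),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "lr_positive": safe_div(sensitivity, 1 - specificity),
        "lr_negative": safe_div(1 - sensitivity, specificity),
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = binary_metrics(tp=90, fn=10, fp=20, tn=80)

# With no false positives, specificity is 1 and the positive
# likelihood ratio is undefined (reported here as NaN).
perfect_specificity = binary_metrics(tp=90, fn=10, fp=0, tn=100)
```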
a deeper understanding of the model and the relationship between the parameters and the model's performance.
Ultimately, the best method for finding optimal parameters will depend on the specific model and dataset. It is important to have a good understanding of the model and its parameters, as well as a good sense of what performance to expect, to set optimal parameters.
Once the optimal parameters for an RNN-LSTM based classification model have been determined, the next step is to set the optimal coefficients, also known as weights. There are several ways to do this; some of the most common methods include:
Random Initialization: This method involves initializing the model's coefficients with random values, and then training the model using the optimal parameters determined earlier.
Pre-trained models: Another way to set the model's coefficients is to use pre-trained models, which have already been trained on large amounts of data. This can be useful when the dataset and task are similar to those of the pre-trained model.
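The difference between grid search and random search described in this section can be sketched in a few lines of standard-library Python. Everything here is illustrative: the search space, the parameter names and `toy_validation_score` are hypothetical stand-ins for training the RNN-LSTM classifier and measuring its validation accuracy, and they do not reflect the settings used in this work.

```python
import itertools
import random

# Hypothetical search space (illustration only).
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "hidden_units": [64, 128, 256],
    "dropout": [0.0, 0.2, 0.5],
}


def toy_validation_score(cfg):
    """Stand-in objective; a real run would train and validate the model."""
    score = 1.0
    score -= abs(cfg["learning_rate"] - 1e-3) * 100
    score -= abs(cfg["hidden_units"] - 128) / 1000
    score -= abs(cfg["dropout"] - 0.2)
    return score


def grid_search(space, objective):
    """Evaluate every combination of values and keep the best one."""
    names = list(space)
    best_cfg, best_score = None, float("-inf")
    for values in itertools.product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def random_search(space, objective, n_trials, seed=0):
    """Evaluate n_trials randomly sampled combinations and keep the best one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


grid_cfg, grid_score = grid_search(SEARCH_SPACE, toy_validation_score)
rand_cfg, rand_score = random_search(SEARCH_SPACE, toy_validation_score, n_trials=10)
```

Grid search here evaluates all 36 combinations, while random search evaluates only 10; the random result can never beat the exhaustive one on the same discrete space, but it costs fewer evaluations.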
99% and 91% respectively for the BUSI-1311 and breast histology datasets. When compared to more recent techniques, the proposed framework yields superior outcomes. The best features were chosen to eliminate unneeded features, the dataset augmentation improved the training strength, and the combination technique improved accuracy consistency. These elements work together to strengthen this study. The limitation of this model is that it is built on a pre-trained model and is not compared with the results of other image transformers as explained in the base papers [49,50]. The pre-trained model's accuracy ranges from 80 to 86 percent only and hence it is not used for the classification task in our model. In our model we have used LSTM-RNN as the classifier. The optimal coefficients (type of optimizer used, learning rate, etc.) can only be modified for the downstream task of the LSTM based classifier on top of transfer learning. Future work will focus on two essential tasks: (i) training a Vision Transformer from scratch and/or fine-tuning the BEiT model for downstream tasks like segmentation and classification; (ii) using Variational Auto-encoders for tokenizing the image data and representing the most relevant patches using understandable heat maps. We will discuss our proposed approach with professionals in ultrasound imaging, radiology and medicine to implement it in hospitals.
Table 2
Comparative evaluation of the state-of-the-art approaches and the suggested approach.
Methods                                                                                      Accuracy (%)   Ref.
ResNet-18 (ICS-ELM)                                                                          98.26          [55]
CNN                                                                                          93.04          [56]
Transfer learning                                                                            95.48          [57]
GAN-CNN                                                                                      90.41          [58]
Inception ResNet V2                                                                          95.32          [59]
Bilateral Knowledge Distillation in breast histology dataset                                 96.0           [45]
Breast Cancer Calcification Identification using K Means, GLCM and HMM classifier            97.8           [60]
Hybrid dilated Ghost Model                                                                   99.3           [61]
Segmentation approach on Breast Mammograms using CLAHE and Fuzzy SVM                         96.0           [62]
SVM Kernel trick and Hyper parameter tuning in WBCD                                          99.1           [63]
Vision Transformer (ViT) encoding using pre-trained BEiT and RNN-LSTM for BUSI dataset       99.00          Our proposed model
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
Data will be made available on request.
References
[1] F. Bray, J. Ferlay, I. Soerjomataram, R.L. Siegel, L.A. Torre, A. Jemal, et al., Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA Cancer J. Clin. 68 (2018) 394–424, http://dx.doi.org/10.3322/caac.21492.
[2] Q. Hu, H.M. Whitney, M.L. Giger, A deep learning methodology for improved breast cancer diagnosis using multiparametric MRI, Sci. Rep. 10 (2020) 10536, http://dx.doi.org/10.1038/s41598-020-67441-4.
[3] H.K. Mewada, A.V. Patel, M. Hassaballah, M.H. Alkinani, K. Mahant, Spectral–spatial features integrated convolution neural network for breast cancer classification, Sensors 20 (2020) 4747, http://dx.doi.org/10.3390/s20174747.
[4] W.K. Moon, Y.W. Lee, H.H. Ke, S.H. Lee, C.S. Huang, R.F. Chang, et al., Computer-aided diagnosis of breast ultrasound images using ensemble learning from convolutional neural networks, Comput. Methods Programs Biomed. 190 (2020) 105361, http://dx.doi.org/10.1016/j.cmpb.2020.105361.
[5] S. Boumaraf, X. Liu, Z. Zheng, X. Ma, C. Ferkous, A new transfer learning based approach to magnification dependent and independent classification of breast cancer in histopathological images, Biomed. Signal Process. Control 63 (2021) 102192, http://dx.doi.org/10.1016/j.bspc.2020.102192.
[6] Y. Eroğlu, M. Yildirim, A. Çinar, Convolutional neural networks based classification of breast ultrasonography images by hybrid method with respect to benign, malignant, and normal using mRMR, Comput. Biol. Med. 133 (2021) 104407, http://dx.doi.org/10.1016/j.compbiomed.2021.104407.
[7] A.K. Mishra, P. Roy, S. Bandyopadhyay, S.K. Das, Breast ultrasound tumour classification: A machine learning–radiomics based approach, Expert Syst. 38 (2021) e12713, http://dx.doi.org/10.1111/exsy.12713.
[8] L. Abdelrahman, M. Al Ghamdi, F. Collado-Mesa, M. Abdel-Mottaleb, Convolutional neural networks for breast cancer detection in mammography: A survey, Comput. Biol. Med. 131 (2021) 104248, http://dx.doi.org/10.1016/j.compbiomed.2021.104248.
[9] D. Singh, A.K. Singh, Role of image thermography in early breast cancer detection-past, present and future, Comput. Methods Programs Biomed. 183 (2020) 105074, http://dx.doi.org/10.1016/j.cmpb.2019.105074.
[10] Y. Benhammou, B. Achchab, F. Herrera, S. Tabik, BreakHis based breast cancer automatic diagnosis using deep learning: Taxonomy, survey and insights, Neurocomputing 375 (2020) 9–24, http://dx.doi.org/10.1016/j.neucom.2019.09.044.
[11] A. Das, M.S. Nair, S.D. Peter, Computer-aided histopathological image analysis techniques for automated nuclear atypia scoring of breast cancer: A review, J. Digit. Imaging 33 (2020) 1091–1121, http://dx.doi.org/10.1007/s10278-019-00295-z.
[12] C. Kaushal, S. Bhat, D. Koundal, A. Singla, Recent trends in computer assisted diagnosis (CAD) system for breast cancer diagnosis using histopathological images, IRBM 40 (2019) 211–227, http://dx.doi.org/10.1016/j.irbm.2019.06.001.
[13] G. Hamed, M.A.E.R. Marey, S.E.S. Amin, M.F. Tolba, Deep learning in breast cancer detection and classification, in: The International Conference on Artificial Intelligence and Computer Vision, Springer, Berlin, Germany, 2020, pp. 322–333.
[14] A.L. Beam, I.S. Kohane, Big data and machine learning in health care, JAMA 319 (2018) 1317–1318, http://dx.doi.org/10.1001/jama.2017.18391.
[15] G. Li, Z. Xiao, Transfer learning-based neuronal cell instance segmentation with pointwise attentive path fusion, IEEE Access (2022).
[16] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, et al., An image is worth 16x16 words: Transformers for image recognition at scale, 2020, arXiv Prepr. arXiv:2010.11929.
[17] B. Gheflati, H. Rivaz, Vision transformer for classification of breast ultrasound images, 2021, arXiv Prepr. arXiv:2110.14731.
[18] J.E. Van Engelen, H.H. Hoos, A survey on semi-supervised learning, Mach. Learn. 109 (2020) 373–440, http://dx.doi.org/10.1007/s10994-019-05855-6.
[19] M. Fayyaz, S.A. Kouhpayegani, F.R. Jafari, E. Sommerlade, H.R.V. Joze, H. Pirsiavash, et al., Ats: Adaptive token sampling for efficient vision transformers, 2021, arXiv Prepr. arXiv:2111.15667.
[20] Q. Xie, Z. Dai, E. Hovy, T. Luong, Q. Le, Unsupervised data augmentation for consistency training, Adv. Neural Inf. Process. Syst. 33 (2020) 6256–6268.
[21] Z. Li, F. Liu, W. Yang, S. Peng, J. Zhou, A survey of convolutional neural networks: Analysis, applications, and prospects, IEEE Trans. Neural Netw. Learn. Syst. 2021 (2021) 1–21, http://dx.doi.org/10.1109/TNNLS.2021.3084827.
[22] M. Masud, A.E. Eldin Rashed, M.S. Hossain, Convolutional neural network-based models for diagnosis of breast cancer, Neural Comput. Appl. (2020) http://dx.doi.org/10.1007/s00521-020-05394-5.
[23] A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst. 25 (2012).
[24] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, et al., Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[26] A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, et al., Mobilenets: Efficient convolutional neural networks for mobile vision applications, 2017, arXiv Prepr. arXiv:1704.04861.
[27] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[28] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.
[29] G. Huang, Z. Liu, L. van der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[30] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.C. Chen, Mobilenetv2: Inverted residuals and linear bottlenecks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
[31] D.A. Pisner, D.M. Schnyer, Support vector machine, in: Machine Learning, Elsevier, Amsterdam, Netherlands, 2020, pp. 101–121.
[32] M.Z. Alom, C. Yakopcic, M.S. Nasrin, T.M. Taha, V.K. Asari, Breast cancer classification from histopathological images with inception recurrent residual convolutional neural network, J. Digit. Imaging 32 (2019) 605–617, http://dx.doi.org/10.1007/s10278-019-00182-7.
[33] S. Sabour, N. Frosst, G.E. Hinton, Dynamic routing between capsules, Adv. Neural Inf. Process. Syst. 30 (2017).
[34] N. Zemmal, N. Azizi, N. Dey, M. Sellami, Adaptive semi supervised support vector machine semi supervised learning with features cooperation for breast cancer classification, J. Med. Imaging Health Inf. 6 (2016) 53–62, http://dx.doi.org/10.1166/jmihi.2016.1591.
[35] A.K. Jaiswal, I. Panshin, D. Shulkin, N. Aneja, S. Abramov, Semi-supervised learning for cancer detection of lymph node metastases, 2019, arXiv Prepr. arXiv:1906.09587.
[36] M. Shi, B. Zhang, Semi-supervised learning improves gene expression-based prediction of cancer recurrence, Bioinformatics 27 (2011) 3017–3023, http://dx.doi.org/10.1093/bioinformatics/btr502.
[37] T. Ma, A. Zhang, Affinity network fusion and semi-supervised learning for cancer patient clustering, Methods 145 (2018) 16–24, http://dx.doi.org/10.1016/j.ymeth.2018.05.020.
[38] Y. Liang, H. Chai, X.Y. Liu, Z.B. Xu, H. Zhang, K.S. Leung, et al., Cancer survival analysis using semi-supervised learning method based on cox and AFT models with L1/2 regularization, BMC Med. Genomics 9 (2016) 11, http://dx.doi.org/10.1186/s12920-016-0169-6.
[39] A. Masood, A. Al-Jumaily, K. Anam, Self-supervised learning model for skin cancer diagnosis, in: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, pp. 1012–1015, http://dx.doi.org/10.1109/NER.2015.7146798.
[40] G. Yu, K. Sun, C. Xu, X.H. Shi, C. Wu, T. Xie, et al., Accurate recognition of colorectal cancer with semi-supervised deep learning on pathological images, Nature Commun. 12 (2021) 6311, http://dx.doi.org/10.1038/s41467-021-26643-8.
[41] J. Chaki, M. Woźniak, Deep learning for neurodegenerative disorder (2016 to 2022): a systematic review, Biomed. Signal Process. Control 80 (Pt. 1) (2023) 1–20, http://dx.doi.org/10.1016/j.bspc.2022.104223.
[42] M. Wieczorek, J. Siłka, M. Woźniak, S. Garg, M.M. Hassan, Lightweight convolutional neural network model for human face detection in risk situations, IEEE Trans. Ind. Inform. 18 (7) (2022) 4820–4829, http://dx.doi.org/10.1109/TII.2021.3129629.
[43] M. Woźniak, J. Siłka, M. Wieczorek, Deep neural network correlation learning mechanism for CT brain tumor detection, Neural Comput. Appl. (2021) http://dx.doi.org/10.1007/s00521-021-05841-x.
[44] B.E. Bejnordi, G. Litjens, N. Timofeeva, I. Otte-Höller, A. Homeyer, N. Karssemeijer, J.A. van der Laak, Stain specific standardization of whole-slide histopathological images, IEEE Trans. Med. Imaging 35 (2015) 404–415.
[45] S. Chaudhury, N. Shelke, K. Sau, B. Prasanalakshmi, M. Shabaz, A novel approach to classifying breast cancer histopathology biopsy images using bilateral knowledge distillation and label smoothing regularization, Comput. Math. Methods Med. (2021) 4019358, http://dx.doi.org/10.1155/2021/4019358.
[46] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, et al., Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017).
[47] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014, arXiv Prepr. arXiv:1412.3555.
[48] J. Devlin, M.W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, 2018, arXiv Prepr. arXiv:1810.04805.
[49] H. Bao, L. Dong, F. Wei, Beit: Bert pre-training of image transformers, 2021, arXiv Prepr. arXiv:2106.08254.
[50] H. Touvron, M. Cord, A. Sablayrolles, G. Synnaeve, H. Jégou, Going deeper with image transformers, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 32–42.
[51] J. Schmidhuber, D. Wierstra, M. Gagliolo, F. Gomez, Training recurrent networks by evolino, Neural Comput. 19 (3) (2007) 757–779.
[52] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in: S.C. Kremer, J.F. Kolen (Eds.), A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, 2001.
[53] Q.V. Le, N. Jaitly, G.E. Hinton, A simple way to initialize recurrent networks of rectified linear units, 2015, arXiv Prepr. arXiv:1504.00941.
[54] W. Zaremba, I. Sutskever, Recurrent neural network regularization, 2014, arXiv Prepr. arXiv:1409.2329.
[55] S.S. Chakravarthy, H. Rajaguru, Automatic detection and classification of mammograms using improved extreme learning machine with deep learning, IRBM 43 (2021) 49–61.
[56] E.M. El Houby, N.I. Yassin, Malignant and nonmalignant classification of breast lesions in mammograms using convolutional neural networks, Biomed. Signal Process. Control 70 (2021) 102954.
[57] Z. Zhuang, Z. Yang, A.N.J. Raj, C. Wei, P. Jin, S. Zhuang, Breast ultrasound tumor image classification using image decomposition and fusion based on adaptive multi-model spatial feature fusion, Comput. Methods Programs Biomed. 208 (2021) 106221.
[58] T. Pang, J.H.D. Wong, W.L. Ng, C.S. Chan, Semi-supervised GAN-based radiomics model for data augmentation in breast ultrasound mass classification, Comput. Methods Programs Biomed. 203 (2021) 106018.
[59] M.A. Al-Antari, S.-M. Han, T.-S. Kim, Evaluation of deep learning detection and classification towards a computer-aided diagnosis of breast lesions in digital X-ray mammograms, Comput. Methods Programs Biomed. 196 (2020) 105584.
[60] S. Chaudhury, M. Rakhra, N. Memon, K. Sau, M.T. Ayana, Breast cancer calcifications: Identification using a novel segmentation approach, Comput. Math. Methods Med. (2021) 9905808, http://dx.doi.org/10.1155/2021/9905808.
[61] Edwin Ramirez-Asis, Romel Percy Melgarejo Bolivar, Leonid Alemán Gonzales, Sushovan Chaudhury, Ramgopal Kashyap, Walaa F. Alsanie, G.K. Viju, A lightweight hybrid dilated ghost model-based approach for the prognosis of breast cancer, Comput. Intell. Neurosci. (2022) 9325452, http://dx.doi.org/10.1155/2022/9325452.
[62] S. Chaudhury, A.N. Krishna, S. Gupta, K. Sakthidasan Sankaran, S. Khan, K. Sau, A. Raghuvanshi, F. Sammy, Effective image processing and segmentation-based machine learning techniques for diagnosis of breast cancer, Comput. Math. Methods Med. (2022) 6841334, http://dx.doi.org/10.1155/2022/684133.
[63] S. Chaudhury, N. Shelke, Z.M. Rashid, K. Sau, Effect of grid search and hyper parameter tuned pipeline with various classifiers and PCA for breast cancer detection, Curr. Signal Transduct. Therapy 17 (2022) e150722206811, http://dx.doi.org/10.2174/1574362417666220715105527.