A BERT Encoding With Recurrent Neural Network and Long-Short Term Memory For Breast Cancer Image Classification

Decision Analytics Journal 6 (2023) 100177
A BERT encoding with Recurrent Neural Network and Long-Short Term Memory for breast cancer image classification
Sushovan Chaudhury ∗, Kartik Sau
Department of Computer Science and Engineering, University of Engineering and Management, Kolkata, West Bengal, India
ARTICLE INFO
Keywords: Bidirectional Encoder Representations from Transformers (BERT); Recurrent Neural Networks; Long Short-Term Memory; Breast cancer; Ultrasound image classification

ABSTRACT
Computer-aided diagnostic systems developed with Artificial Intelligence (AI) are a major advancement in healthcare analytics, giving radiologists a second opinion on cancer diagnosis. In this study, we explore two modalities of breast images, ultrasound and histology, and classify them into cancerous and non-cancerous categories. Traditionally, Convolutional Neural Networks (CNNs) have done a commendable job of extracting features from images with convolution kernels, achieving high accuracy. Bidirectional Encoder Representations from Transformers (BERT), in turn, has been widely used in Natural Language Processing (NLP) for feature encoding and for downstream tasks such as segmentation and classification. We harness the power of Vision Transformers (ViT) and implement transfer learning using BERT Pre-training of Image Transformers (BEiT) as a feature encoding technique. The encoded features are then classified with a Recurrent Neural Network with Long Short-Term Memory (RNN-LSTM). Classification is performed on breast image datasets of two modalities: BUSI1311 (ultrasound) and breast histopathological images. Both modalities yielded competitive accuracies: 99 percent on the BUSI1311 dataset compared with 91 percent on the breast histology images.
1. Introduction

Breast cancer (BC) is the leading cause of cancer death among women. According to the World Cancer Research Fund's 2020 report, over two million new cases of BC were diagnosed in 2018 [1]. These alarming figures underline the need to make good use of current technological breakthroughs and develop powerful tools for diagnosing breast cancer in its early stages. To build more effective computer-aided diagnosis (CAD) systems for BC detection, researchers have recently investigated the use of deep learning models in a variety of healthcare applications [2–7] through health analytics and precision Artificial Intelligence.

X-ray imaging (mammography) [8], ultrasound (sonography) [4,7], thermography [9], magnetic resonance imaging (MRI), and histopathology imaging are among the imaging techniques that can be used to identify and analyse BC [10]. In the diagnosis of BC, ultrasound has become a commonly used, affordable, painless, and non-radioactive imaging modality. It is often followed by histological examination, in which a biopsy is used to obtain cell and tissue samples that are then mounted on a microscope slide, stained, and inspected under a microscope. Histopathological diagnosis has become the gold standard for virtually all disease types, offering a high level of certainty [11]. Despite the availability of numerous imaging modalities, visual review by radiologists or pathologists is still essential; it takes time and requires a high degree of radiological or pathological expertise. Moreover, various studies have shown that when the same set of images is examined by different specialists, there is a significant degree of inter-observer variability [12]. An AI-powered system may be able to reduce this disagreement, which arises from differences in human background, analytical approach, and knowledge, and produce a more accurate diagnostic outcome to support clinical decision-making [13].

The field of healthcare has been exploring advances in artificial intelligence, particularly in deep learning [14,15], and deep learning is increasingly being used to recognize breast cancer [13]. The authors' review of the literature found two commonalities among previous efforts to classify breast cancer images. First, the learning models used, which include traditional deep convolutional neural network (CNN) architectures, new CNNs, and models that incorporate a CNN, are generally well established. Although CNN-based representation models are important, recent advances have led to a new vision model, the Vision Transformer (ViT) [16], which has been found to be more accurate on various public benchmarks. Few studies have investigated the use of the ViT for BC classification, and its full potential in this field has not been thoroughly analysed [17]. Second, most existing research relies on supervised learning, which requires complete annotation of every image sample in the dataset. Annotation takes time and
∗ Corresponding author.
E-mail address: sushovan.chaudhury@gmail.com (S. Chaudhury).
https://doi.org/10.1016/j.dajour.2023.100177
Received 15 November 2022; Received in revised form 23 January 2023; Accepted 28 January 2023
Available online 2 February 2023
2772-6622/© 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
requires domain expertise. Semi-supervised learning (SSL) [18], on the other hand, incorporates a larger set of unlabelled data during training and requires annotation of only a small subset of the training data. SSL can therefore substantially reduce the effort needed for annotation. However, SSL has not been widely used in recent BC detection research.

Our research aims to fill these methodological gaps. Specifically, we propose a ViT-based BC classification pipeline built on supervised learning and SSL. We use an adaptive token sampling (ATS) strategy that enables the original ViT model to dynamically select the most informative image tokens [19]. We also provide a customized consistency training (CT) technique that combines supervised and unsupervised learning with image augmentation [20]. When used with an ATS-ViT (i.e., a ViT with ATS), the CT-based SSL can substantially improve the model's performance. The proposed strategy has been tested on two datasets: the Breast Cancer Histopathological Image Classification (BreakHis) dataset and the breast ultrasound images (BUS) dataset. Compared with CNN models, our method has produced positive and superior results. The project is available at TWO and is distributed under the MIT License.

1.1. Problem statement

Every year, almost 1.5 million women worldwide are affected by breast cancer, which also accounts for the largest number of cancer deaths among women. According to current trends, 1 in 8 women will be diagnosed with breast cancer during her lifetime. In recent years, breast cancer has surpassed cervical cancer as the most prevalent type of cancer in Indian metropolitan centres; cervical cancer still ranks second in rural India. Women are now being affected at strikingly younger ages than they were 25 years ago. The average age at which Indian women develop breast cancer is now roughly 40, which is significantly younger than the average age of onset in other countries.

Damaged DNA and family history are the main genetic causes of breast cancer. Other risk factors, however, may be related to lifestyle or environment, such as alcohol consumption (studies show that women who have three drinks a day have a 1.5 times higher risk), obesity, hormonal therapies (elevated estrogen levels due to hormone replacement therapy or contraceptive pills can be linked to breast cancer), and a sedentary lifestyle without regular exercise. Additional causes may include childbearing later in life or inadequate lactation.

The largest causes, though, are a lack of knowledge about treatment options and screening procedures. Key problems are the shortage of skilled radiologists and diagnostic facilities and the delay in providing necessary care. We aim to build deep learning models that recognize suspicious lesions and thereby deliver prompt and accurate diagnoses in support of this growing cause.

One might assume that, because breast cancer has been the subject of extensive research, similar learning systems have already been built. We found that while such tools do exist, they are far from perfect, and clinical assessment requires more reliable services. Moreover, relatively little research has used Indian datasets; the vast majority of results, and in fact all datasets, are based on international patients. We observe several differences in the breast development and structure of Indian women, such as differences in the age of cancer onset and in breast composition. Indian breasts have more fibrous tissue, making them appear whiter and denser. We believe that the same networks cannot produce accurate results when applied to Indian cases. Conventionally, convolution kernels have been used to extract features from images, but such models are often not lightweight and suffer from inherent biases. We observe that feature extraction techniques based on BERT have done a commendable job on NLP problems. We therefore look for a viable alternative to CNN-based architectures and use the pre-trained BEiT model from the Hugging Face library to extract features from image patches.

Fig. 1. Left and right breasts as seen from the MLO and CC.

1.2. Objective

Rather than CNNs, we want to develop transformer-based breast cancer detection models that are better tailored to breast type and that additionally use age, breast density, prior history, and other readily available data.
(i) We aim to show that the transformer can be used as a viable alternative to standard CNN techniques for spotting breast cancer in both histological and ultrasound images.
(ii) To effectively harness Vision Transformers and a BERT-pre-trained Vision Transformer model to extract meaningful information from images, which are fed into the transformer as a combination of blockwise masks and image patches.
(iii) To use an attention-based mechanism to form visual tokens as part of image encoding and to feed the encoded features into the classifiers.
(iv) To compare the results on ultrasound image data and histopathological data using RNN-LSTM classifiers.

1.3. Terminology

The following are some key terms related to breast cancer:

1. Mammography: Like X-rays, mammograms are screening tools. Mammogram images are used to detect breast lesions and tumours for the first time. White areas denote tissue, while fat appears as grey areas. For both breasts, mammograms are recorded in the MLO (mediolateral-oblique) and CC (craniocaudal) views (see Fig. 1).

2. Magnetic Resonance Imaging (MRI) and ultrasound scans: These are two further imaging techniques used in the evaluation of breast cancer. If a suspicious lesion is found during mammography, an ultrasound is typically performed next. It helps distinguish between a solid mass and a benign fluid-filled cyst.

3. Biopsy: A biopsy involves surgically removing a sample of tissue, whose cells are then examined. This can reveal the type of cancer present and whether the cells are malignant. A biopsy provides the strongest evidence yet for breast cancer.

4. Benign and malignant: A lesion may be benign, malignant, or neither. When breast imaging shows no sign of a lesion, the breast is regarded as normal. Malignant lesions are cancerous, whereas benign lesions are not.

5. Mass and calcification: The two most frequent breast abnormalities are masses and calcifications. Calcifications are calcium deposits that appear in the breast as tiny specks. The term "masses" refers to tissue lumps that are dense in the centre and less dense at the edges. Both masses and calcifications have benign and malignant variants. Coarse (popcorn), dispersed, lucent-centred, and milk-of-calcium calcifications are examples of benign calcification types, whereas fine and amorphous calcifications are examples of malignant types. Spiculated tumours, which distort the breast's architecture, are almost certainly malignant (see Fig. 2).
Fig. 2. Mass, left; calcification, right.

2. Literature review

This section summarizes previous research in two areas, namely SSL used in biomedical image classification and DNN-based BC detection techniques.

2.1. Detection of breast cancer with deep neural networks

Although breast cancer detection has been studied for many years, earlier approaches tended to use hand-crafted features, such as tissue density, age, and other characteristics of the breast, and applied machine learning classifiers to those features. Deep learning automates feature extraction and ensures that the extracted features are those most pertinent to the task at hand. CNN-based methods have been used for breast cancer, but they provide only classification, not detection: image patches are typically fed into classification networks, and a binary classifier determines whether each patch is benign or cancerous. To support cancer detection, however, we must be able to take a complete breast image and precisely delineate the areas harbouring suspicious lesions. This is the task of localization, or detection. Deep learning techniques for lesion localization are still under investigation, with only a few well-known papers establishing the gold standard.

Our goal is to identify lesions in mammograms, pinpoint their location, and then determine whether they are cancerous or benign.

Ribli et al. published "Detecting and classifying lesions in mammograms with Deep Learning" in Scientific Reports (Nature). The study introduced a CAD system based on Faster R-CNN, one of the best object detection frameworks. Without human intervention, the system can detect lesions on a mammogram and classify them as benign or malignant. On the well-known public INbreast digital mammography dataset, the method achieved state-of-the-art classification performance, with an AUC of 0.95. With an AUC of 0.85 on the DREAM dataset, the authors' method placed second in the Digital Mammography DREAM Challenge. When used as a detector on the INbreast dataset, the system achieved high sensitivity with a very low number of false-positive marks per image. To validate their detections, the authors plotted a FROC curve of sensitivity against the average number of false-positive marks per image, and used ROC curves to determine the AUC. This paper has served as our starting point: our efforts first went into replicating it and then into proposing our own ensemble network for improved breast cancer detection performance (see Fig. 3).

Various pre-existing and customized deep CNN models have been applied to the characterization of breast tumours in ultrasound and histopathological images. Deep neural network (DNN) models, like CNNs, can learn discriminative patterns from automatically extracted features to represent an image sample, as opposed to feature-based learning models that require hand-crafted features [7,21]. Masud [22] proposed a novel CNN model for ultrasound imaging and compared it with several earlier CNN models, such as AlexNet [23], Darknet19 [24], GoogleNet [25], MobileNet [26], ResNet18 [27], ResNet50, and V [28]. Ensemble learning has also been used instead of single models. Moon [4] combined three CNN models, VGGNet, ResNet, and DenseNet [29], by fusing their image representations. Similarly, Eroğlu [6] combined features from AlexNet, MobileNetV2 [30], and ResNet50, then used a Minimum Redundancy Maximum Relevance (mRMR) feature selection technique to select a subset of the most important features, which were used to train a feature-based classifier such as a support vector machine (SVM) [31] or k-nearest neighbours. For histopathological imaging, earlier studies used CNN models that had proven successful in other domains. To combine the predictive power of the recurrent CNN and ResNet with the Inception network, Alom [32] built an Inception Recurrent Residual Convolutional Neural Network (IRRCNN). To integrate a CNN and CapsNet [33] with deep feature fusion and enhanced routing, Wang et al. developed FE-BkCapsNet. To address convergence problems during training, Mewada [3] proposed combining the spatial features of a CNN with the spectral information of a wavelet transform. New training procedures have also been developed alongside these model improvements: block-wise fine-tuning was employed by [5], making the CNN's final few residual blocks more domain-specific. Although DNN-based models for BC detection have received significant attention, other model types have not yet been thoroughly explored. As a recently developed and prominent vision model, the ViT has attracted considerable interest across various tasks, so validating its effectiveness for imaging-based BC detection is desirable. Our study is one such work.

2.2. Semi-supervised learning-based classification of biological images

SSL can reduce the number of training instances required by a fully supervised learning system. Collecting data in the biomedical domain can take a long time, especially in cancer research, where it may take months or even years to determine a patient's final condition [34]. Previous studies have therefore used SSL to exploit unlabelled data. For BC detection, Zemmal [34] used a Semi-Supervised Support Vector Machine (S3VM) with hand-crafted features. Jaiswal [35] used pseudo-labels at the PatchCamelyon level to detect metastatic cancer cells in histology images. Shi and Zhang [36] performed gene-expression-based outcome prediction for cancer recurrence using low-density separation, an SSL technique. Ma and Zhang [37] combined affinity network fusion and neural networks in an SSL model to perform few-shot learning, which substantially improves the model's learning capacity with less training data. Other applications of SSL include colorectal cancer detection, skin cancer diagnosis, bladder cancer grading, and cancer survival analysis [38–40]. To the best of our knowledge, CT has not been studied for BC detection; our research therefore aims to fill this gap. Further related papers can be found in [41–43].

3. Methodology

3.1. Dataset

Ultrasound images: A dataset must generally be available for developing a deep learning healthcare system. In this investigation, two different breast ultrasound datasets are used. The BUSI dataset was collected over time from ultrasound (US) systems with a range of specifications. The BUSI-1311 Images dataset contains ultrasound breast images collected at baseline from women aged between 25 and 75. The data are divided into three categories: normal,
Fig. 3. Outline of a CAD model based on faster RCNN.
Fig. 4. Sample breast histology images.
benign, and malignant. Although the dataset was intended for lesion detection rather than classification, our work uses it to classify lesions.

Histopathological image data: The original dataset consists of images of breast cancer (BC) specimens, split into testing, training, and validation sets, from the Kaggle repository: Breast Histopathology Images, Kaggle.

3.2. Pre-processing

During the initial pre-processing of the ultrasound breast images, the breast contours are first identified, and noise is then removed without affecting the image's essential information. We carried out experiments on the BUSI-1311 dataset. The training set of BUSI-1311 Images consists of various images, whereas the test set consists of ultrasound breast images obtained at baseline from women aged between 25 and 75. To reduce the impact of staining on biological tissue in medical images, special image normalization techniques must be used. Bejnordi [44] proposed a standardized method for the quantitative examination of tissue slides that accounts for the staining procedures used to prepare them. First, the image's colours are converted into optical density (OD) via a logarithmic transformation. The OD tuples are then subjected to singular value decomposition to produce a 2D projection with high variance. The original image is then transformed using the resulting colour space, and the image's histogram is dynamically expanded to cover more than 90% of the data. Histological images were pre-processed with augmentation techniques [45] to reduce class imbalances among the image classes, and were cut into small patches. Figs. 4 and 5 show sample breast histology images and breast ultrasound images, the latter after extracting the region of interest from the masks. Figs. 7A, 7B, and 7C show the pre-processing techniques; the data are split into training and test sets in an 80–20 ratio.

3.3. Feature extraction using BERT (Transformer encoder)

A transformer is a neural architecture built mainly on two ideas, sequence-to-sequence learning and attention, arranged in an encoder–decoder architecture. The attention mechanism looks at an input sequence and decides, at each step, which other parts of the sequence are important [46]. A transformer converts one sequence into another with two components (an encoder and a decoder), similar to LSTM-based models, but unlike the sequence-to-sequence models described previously it does not use any recurrent networks [47]. After seeing how well transformers performed on NLP tasks, researchers investigated their potential in computer vision; one such implementation is the ViT (Vision Transformer). To meet the transformer's input requirements, an image is split into a set of image patches. The so-called patch embedding process is essentially a fully connected layer that applies a linear transformation. Specifically, if an input image of size H (height) × W (width) × C (channels) is partitioned into $N$ patches (i.e., tokens), each of size P × P × C, where (P, P) is the resolution of each patch, then $N = HW/P^2$. Then, a vector of size D is created by projecting
Fig. 5. Sample ultrasound images from BUSI1311 dataset with mask and specific ROI.
each patch. The input is thus converted into a 2D tensor of size N × D. In addition, an extra [CLS] token is added to the transformer input to encode order-related information and remove possible biases, unlike in CNN-based architectures. Other pre-training frameworks, such as Bidirectional Encoder Representations from Transformers (BERT), often use this strategy [48]. A position encoding vector is also added to each patch embedding to preserve the relative positions of different patches, producing the token embedding used by the first layer of the transformer encoder.

Fig. 6A shows the overall flow of this research work: we use BERT for feature encoding, specifically BEiT (BERT Pre-training of Image Transformers) [49] from the Hugging Face library. The BEiT pre-training task is adapted from [49] and shown in Fig. 6B. In our method, each image has two views of representation, namely image patches and visual tokens, which serve as the input and output representations during pre-training, respectively [49]. The image is tokenized using a tokenizer [50]. The image is also broken down into smaller patches of equal size, as explained above, and some of those patches are masked. The blockwise patches, together with the masked areas, are flattened as shown in Fig. 6B. The flattened patches are linearly projected, similar to word embeddings in BERT, position embeddings are added, and the result is fed into the BEiT encoder, which uses attention mechanisms with a Masked Image Modelling head to encode the most important features into visual tokens. The features encoded with the pre-trained BEiT can then be used for downstream tasks such as classification, after some fine-tuning, by feeding them into the RNN-LSTM classifier.

Figs. 7A, 7B, and 7C show the pre-processing code for histology and ultrasound images. For ultrasound images, the code converts the images to grayscale, normalizes the pixel values, and applies histogram equalization to enhance contrast. For histology images, the code normalizes the pixel values and applies adaptive histogram equalization to enhance contrast.

Fig. 8 shows how a pre-trained Vision Transformer is used to extract features, and Fig. 9 shows how BEiT is used for feature extraction. Together, Figs. 8 and 9 depict the steps of feature encoding with Vision Transformers and BEiT: the pre-trained models are loaded from the Hugging Face library, and features are extracted from
Fig. 6A. Proposed flow chart.
the breast ultrasound and histology images to feed them into the RNN-LSTM-based classifier, as shown in Figs. 12 and 13.

3.4. Training model (RNN-LSTM)

The LSTM network is one of the more advanced variants of the RNN and can learn order dependence in sequence prediction tasks. The LSTM architecture was designed to provide an appropriate backpropagation (BP) training mechanism; it is an upgraded RNN architecture. The LSTM model was originally created to address the vanishing gradient problem that affects standard RNNs [51]. By enforcing constant error flow through constant error carousels within special units, and by preventing the error signal from decaying as it is propagated back in time, the LSTM mitigates this issue. The difficulty of training RNNs to capture long-term dependencies has long been recognized [52]. There have, however, been several effective strategies for overcoming this crucial problem, such as modifying the state-to-state transition function to encourage the hidden neurons to form long-term memory and to create paths through the time-unfolded RNN (see Fig. 10).

The LSTM replaces the hidden neurons, which use sigmoid or tanh activations, with memory cells whose input and output are controlled by several gates; these gates regulate the flow of information to the hidden neurons. A drawback of the LSTM is the high complexity of its hidden layer: an LSTM has several times more parameters than a simple RNN of the same size (roughly four times, since it keeps separate weights for three gates and the candidate state). Various approaches, including Framework LSTM and complex networks, have been tried to resolve the problem of long-term dependencies [53]. The latest LSTM modification was given by [54]. As depicted in Fig. 10, $C_t$ denotes the memory cell of the LSTM, which is regulated by the input gate, the forget gate, and the output gate.

It is essential to determine which information will be discarded from the cell state. As shown in the equation below, this decision is made by the forget gate using a sigmoid function:

$$f_t = \sigma\left(W_f \cdot h_{t-1} + U_f \cdot x_t + b_f\right) \tag{1}$$

Here $f_t$ is the forget layer, $\sigma$ is the sigmoid activation function of the forget gate, $W_f$ is the weight connecting to the previous hidden state $h_{t-1}$, $U_f$ is the weight connecting to the input $x_t$, and $b_f$ is the bias of the forget layer. It is equally important to determine what new information can be stored in the cell state. This is done by the input gate layer, which decides which values to update, and by the candidate cell state layer, which creates a vector of new candidate values that could be added to the state, as expressed in Eqs. (2) and (3):

$$i_t = \sigma\left(W_i \cdot h_{t-1} + U_i \cdot x_t + b_i\right) \tag{2}$$

$$\tilde{C}_t = \tanh\left(W_c \cdot h_{t-1} + U_c \cdot x_t + b_c\right) \tag{3}$$
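To make the gate equations concrete, the following is a minimal pure-Python sketch of one LSTM time step, covering Eqs. (1)–(3) and the output equation, Eq. (4). It is an illustration under stated assumptions rather than the authors' implementation: the input-weight matrices (named U here), the output gate $o_t$, and the cell update $C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t$ follow the standard LSTM formulation, and the names `lstm_step` and `run_lstm` and the `(W, U, b)` parameter layout are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical parameter layout: gate name -> (W, U, b), where W multiplies
# h_{t-1}, U multiplies x_t and b is the bias, mirroring Eqs. (1)-(3).
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lstm_step(x: Sequence[float], h_prev: Sequence[float], c_prev: Sequence[float],
              params: Params) -> Tuple[Vector, Vector]:
    """One time step of a standard LSTM cell; returns (h_t, C_t)."""

    def gate(name: str, act: Callable[[float], float]) -> Vector:
        w, u, b = params[name]
        pre = [a + c + d for a, c, d in zip(matvec(w, h_prev), matvec(u, x), b)]
        return [act(z) for z in pre]

    f_t = gate("f", sigmoid)        # forget gate, Eq. (1)
    i_t = gate("i", sigmoid)        # input gate, Eq. (2)
    c_tilde = gate("c", math.tanh)  # candidate cell state, Eq. (3)
    o_t = gate("o", sigmoid)        # output gate (standard LSTM, assumed)
    # Keep part of the old memory and add part of the new candidate values.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # Eq. (4)
    return h_t, c_t


def run_lstm(xs: Sequence[Sequence[float]], params: Params, hidden: int) -> Vector:
    """Feeds a sequence of feature vectors through the cell, starting from zero
    states, and returns the last hidden state for a downstream classifier."""
    h: Vector = [0.0] * hidden
    c: Vector = [0.0] * hidden
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h
```

With every weight and bias set to zero, each sigmoid gate outputs 0.5 and the candidate is tanh(0) = 0, so a call such as `lstm_step([1.0], [0.0], [1.0], params)` halves the cell state to 0.5; this makes a convenient sanity check before plugging in trained weights.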
Fig. 6B. BEiT pre-training.

Source: Adapted from [49].
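The patch arithmetic of Section 3.3 ($N = HW/P^2$ tokens of length $P^2C$, projected to width D, plus a [CLS] token and position embeddings) and the blockwise masking illustrated in Fig. 6B can be sketched in plain Python. This is a simplified illustration, not the BEiT or Hugging Face code: the function names are hypothetical, the projection weights and embeddings would normally be learned, and the masking loop only approximates BEiT's block sampling by masking random rectangles of varying aspect ratio until roughly the target share of patches is covered.

```python
import math
import random
from typing import List, Sequence, Set, Tuple

Image = List[List[List[float]]]  # H x W x C as nested lists


def num_patches(h: int, w: int, p: int) -> int:
    """N = HW / P^2; assumes H and W are multiples of the patch size P."""
    if h % p or w % p:
        raise ValueError("H and W must be divisible by the patch size P")
    return (h * w) // (p * p)


def patchify(img: Image, p: int) -> List[List[float]]:
    """Splits an H x W x C image into N patches (row-major order), each
    flattened to a vector of P * P * C values."""
    h, w, ch = len(img), len(img[0]), len(img[0][0])
    num_patches(h, w, p)  # raises if the image cannot be tiled exactly
    return [
        [img[r][c][k] for r in range(top, top + p) for c in range(left, left + p) for k in range(ch)]
        for top in range(0, h, p)
        for left in range(0, w, p)
    ]


def embed(patches: List[List[float]], weight: List[List[float]],
          cls_token: Sequence[float], pos_emb: List[List[float]]) -> List[List[float]]:
    """Projects each patch to D values (weight is D x P*P*C), prepends the
    [CLS] token and adds one position embedding per token: (N + 1) x D."""
    tokens = [list(cls_token)] + [[sum(a * b for a, b in zip(row, x)) for row in weight] for x in patches]
    return [[t + q for t, q in zip(tok, pos)] for tok, pos in zip(tokens, pos_emb)]


def blockwise_mask(rows: int, cols: int, ratio: float, rng: random.Random,
                   min_block: int = 4) -> Set[Tuple[int, int]]:
    """Simplified BEiT-style blockwise masking on a rows x cols patch grid:
    masks random rectangles until at least ratio * rows * cols patches are masked."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    target = int(ratio * rows * cols)
    masked: Set[Tuple[int, int]] = set()
    while len(masked) < target:
        area = rng.randint(min_block, max(min_block, target - len(masked)))
        aspect = math.exp(rng.uniform(math.log(0.3), math.log(1 / 0.3)))
        bh = max(1, min(rows, round(math.sqrt(area * aspect))))
        bw = max(1, min(cols, round(math.sqrt(area / aspect))))
        top, left = rng.randint(0, rows - bh), rng.randint(0, cols - bw)
        masked.update((r, c) for r in range(top, top + bh) for c in range(left, left + bw))
    return masked
```

For a 224 × 224 input with P = 16, `num_patches(224, 224, 16)` gives N = 196, i.e. 197 tokens once [CLS] is prepended; on the corresponding 14 × 14 grid, `blockwise_mask(14, 14, 0.4, random.Random(0))` masks at least 78 patches, in line with the roughly 40% masking ratio used in BEiT pre-training.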
Fig. 7A. Preprocessing.
Fig. 7B. Preprocessing.
Fig. 7C. Preprocessing.
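The pre-processing steps described for Figs. 7A–7C (grayscale conversion, pixel normalization, and histogram equalization) can be sketched without any imaging library. This is an illustrative version under stated assumptions, not the code shown in the figures: it uses ITU-R BT.601 luma weights for the grayscale conversion, applies global histogram equalization to 8-bit values before scaling them to [0, 1], and the function names are hypothetical. The adaptive variant used for the histology images (e.g. CLAHE) applies the same idea per image tile with a clip limit; in practice, OpenCV's `cv2.equalizeHist` and `cv2.createCLAHE` provide both.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Gray = List[List[int]]


def to_grayscale(rgb: List[List[RGB]]) -> Gray:
    """Luma conversion with ITU-R BT.601 weights (an assumed choice)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]


def equalize(img: Gray, levels: int = 256) -> Gray:
    """Global histogram equalization: map each level through the normalized CDF."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # a constant image has no contrast to stretch
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[v] for v in row] for row in img]


def normalize(img: Gray, levels: int = 256) -> List[List[float]]:
    """Scales integer pixel values to the range [0, 1]."""
    return [[v / (levels - 1) for v in row] for row in img]


def preprocess_ultrasound(rgb: List[List[RGB]]) -> List[List[float]]:
    """Grayscale, then histogram equalization, then [0, 1] scaling (order assumed)."""
    return normalize(equalize(to_grayscale(rgb)))
```

Equalizing the integer levels first keeps the lookup table exact; the same result could be obtained on normalized floats by binning them into 256 levels.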
Finally, the LSTM output is given by Eq. (4):

$$h_t = o_t \ast \tanh\left(C_t\right) \tag{4}$$

where $o_t$ is the output gate, computed like Eqs. (1) and (2) with its own weights, and $C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t$ is the updated cell state obtained from Eqs. (1)–(3).

4. Results and discussion

4.1. Classification using RNN-LSTM training

We proposed RNN-LSTM, a deep fully connected and convolutional layer-based architecture for the discriminator and generator in Transformer encoding. Our proposed architecture generates samples of higher quality than conventional architectures on a variety of benchmark image datasets. As illustrated in Fig. 6, we demonstrated that RNN-LSTM learns faster than conventional architectures and can produce recognizable, high-quality images after just a few training rounds. Before the convolution layers in the generator, we used a number of deep fully connected layers to convert the low-dimensional noise vector into a high-dimensional representation of the image features. The discriminator transforms the high-dimensional features retrieved by the convolutional layers using several deep fully connected layers before classification (see Fig. 11).

The reason for concatenating the pre-processed ultrasound and histology images is to use them together as input to the RNN-LSTM classifier and benefit from the advantages of both modalities. Ultrasound images may not be as informative as histological images, but they are much faster and cheaper to acquire, and they are non-invasive. Combining both modalities can therefore improve the overall performance of the classifier. Furthermore, concatenating the images allows the model to learn from both modalities and base its predictions on both, which is particularly useful when the data from one modality is insufficient for a decision and the model can rely on the other modality instead.

The Transformer encoding, which is based on an RNN-LSTM model, learns the distribution more quickly than the CNN model does. The Transformer encoding based on RNN-LSTM provides easily
Fig. 8. Feature extraction using pretrained visual transformers.
Fig. 9. Feature extraction using pretrained BEiT.
recognizable numbers after a limited number of epochs, in contrast to the CNN model. After epoch 150, all models produce good images (Fig. 6), but the Transformer encoding built on RNN-LSTM models continues to outperform the CNN model in terms of image quality.
4.2. Confusion matrix for ultrasound image dataset

Several parameters were computed for this RNN-LSTM classifier: its accuracy, sensitivity, specificity, and error rate are 99%, 100%, 100%, and 1%, respectively.
The sensitivity, accuracy, and efficiency of the LSTM classifier are shown by the confusion matrix in Fig. 14.
A variety of performance indicators have been used to assess the
suggested approach’s correctness. They consist of precision, recall;
recall rate, recall accuracy, specificity, sensitivity, error rate, positive
predictive value, positive likelihood, negative predictive value, and
negative likelihood.
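The indicators listed above can all be derived from the four counts of a binary confusion matrix. The following Python sketch assumes that the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) are already known. The function names and the counts in the usage lines are hypothetical. Undefined ratios are returned as NaN, which matches how Table 1 reports the positive likelihood ratio when specificity is 100%.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Confusion-matrix indicators for a two-class (e.g. benign vs. malignant) task."""

    def ratio(num: float, den: float) -> float:
        # A zero denominator makes the indicator undefined; report it as NaN.
        return num / den if den else math.nan

    sensitivity = ratio(tp, tp + fn)             # also called recall
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    return {
        "precision": ratio(tp, tp + fp),         # positive predictive value
        "recall": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "npv": ratio(tn, tn + fn),               # negative predictive value
        "lr_pos": ratio(sensitivity, 1.0 - specificity),  # positive likelihood ratio
        "lr_neg": ratio(1.0 - sensitivity, specificity),  # negative likelihood ratio
    }


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the usual F1 score."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, not taken from the paper.
m = binary_metrics(tp=40, fp=5, tn=50, fn=5)
print(round(m["accuracy"], 3), round(m["lr_pos"], 2))  # 0.9 9.78

# Re-deriving the F1 scores of Table 1 from its overall precision and recall.
print(round(f_beta(0.991, 0.985), 3))  # 0.988 (ultrasound)
print(round(f_beta(0.845, 0.944), 4))  # 0.8918 (histopathology)
```

The last two calls only re-derive the F1 scores in Table 1 from the overall precision and recall reported there. They do not reproduce the underlying confusion matrices.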
Fig. 10. Structure of LSTM.
According to the multi-class confusion matrix, the trained model used 43, 21, and 35 images for the three classes.
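For a multi-class confusion matrix like this one, the kappa statistic of Eq. (10) can summarize overall agreement between predicted and true classes. Eq. (9) appears to define the Jaccard index, the intersection over union of two regions X and Y. The NumPy sketch below assumes that Eq. (10) is the standard Cohen's kappa, written in terms of the diagonal entries M_ii and the marginal sums M_i+ and M_+i. The function names and the matrix in the usage lines are hypothetical and do not reproduce the paper's results.

```python
import numpy as np


def cohen_kappa(cm) -> float:
    """Kappa statistic from a square confusion matrix (rows: true class, columns: predicted)."""
    cm = np.asarray(cm, dtype=float)
    o = cm.sum()                                              # number of observations
    observed = np.trace(cm)                                   # sum of diagonal entries M_ii
    chance = float(np.sum(cm.sum(axis=1) * cm.sum(axis=0)))   # sum of M_i+ * M_+i
    return float((o * observed - chance) / (o ** 2 - chance))


def jaccard(x, y) -> float:
    """Jaccard index |X ∩ Y| / |X ∪ Y| of two binary region masks."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    union = np.logical_or(x, y).sum()
    if union == 0:
        return 1.0  # two empty regions are treated as identical
    return float(np.logical_and(x, y).sum() / union)


# Hypothetical three-class matrix, not the paper's results.
cm = [[18, 2, 0],
      [1, 25, 4],
      [0, 3, 27]]
print(round(cohen_kappa(cm), 3))                        # 0.809
print(round(jaccard([1, 1, 0, 0], [1, 0, 1, 0]), 3))    # 0.333
```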
The performance indicators are defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{5}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{6}$$
$$F_1\text{-score} = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}, \quad \beta = 1 \tag{7}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.
$$J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} \tag{9}$$
where X and Y are regions.
$$\kappa = \frac{o \sum_i M_{ii} - \sum_i \left(M_{i+} \cdot M_{+i}\right)}{o^2 - \sum_i \left(M_{i+} \cdot M_{+i}\right)} \tag{10}$$
where the sums run over all rows $i$ of the confusion matrix $M$. $M_{ii}$ is its $i$-th diagonal entry, $M_{i+}$ is the marginal sum of row $i$, $M_{+i}$ is the marginal sum of column $i$, and $o$ is the number of observations.
Fig. 11. Training and data validation for prediction of accuracy.
4.3. Confusion matrix for histopathological image dataset
Several parameters were recorded for this RNN-LSTM classifier: accuracy, sensitivity, specificity, and error rate. Their values are 91%, 100%, 68.28%, and 9.01%, respectively.
The sensitivity, accuracy, and efficiency of the LSTM classifier are demonstrated by the confusion matrices of the training and testing data in Figs. 15 and 16.
4.4. Comparison of ultrasound image and histopathological image dataset classification using RNN-LSTM
A comparison of the accuracy scores of the proposed RNN-LSTM-based Transformer encoding method on the ultrasound and histopathological data shows that the ultrasound data gives better accuracy and precision than the histopathological data (Table 1).
4.5. Comparison of the proposed system to the current one for evaluation of
A comparison of the accuracy scores of the proposed RNN-LSTM-based Transformer encoding method with the image classification models available in the literature shows that the suggested model achieves competitive results (Table 2).
In Table 2, the suggested method is compared with state-of-the-art methods. The accuracy reported by the authors of [55] on ultrasound scans was 98.26%. The adaptive histogram equalization method used in [56] to enhance ultrasound images had an accuracy rate of 93.04%. [59] describes a ResNet system for tumour identification that combines composites of various CNN models with imaging data from various image formats. Its accuracy on that data collection was 95.32%. [57] states that fuzzy boosting methods were applied after bilateral filtering of the underlying breast ultrasound image, reaching an accuracy of 95.48%. The semi-supervised GAN model used by the authors of [58] reached an accuracy of 90.41%. The suggested method achieved a 99% accuracy rate when tested on the upgraded BUSI-1311 image dataset (Table 2).
4.6. Setting optimized parameters and coefficients for the classifier
There are several ways to set optimal parameters for an RNN-LSTM based classification model. Common approaches include:
Grid Search: This method defines a set of possible parameter values and then trains and evaluates the model on all possible combinations of these values. It can be computationally expensive, especially for models with many parameters, but it can be useful for finding a good set of starting parameters.
Random Search: This method is similar to grid search, but it randomly samples from the parameter space instead of trying all possible combinations. It can be less computationally expensive than grid search and can still yield good results.
Bayesian Optimization: This method uses a probabilistic model of the relationship between the parameters and the model's performance. It can be more efficient than grid search or random search because it uses the information from previous evaluations to guide the search for optimal parameters.
Hyperopt: A library for model selection and hyperparameter tuning. It includes optimization algorithms such as random search and the Tree-structured Parzen Estimator (TPE). TPE uses the information from previous evaluations to guide the search for optimal parameters.
Optuna: Another library for model selection and hyperparameter tuning. By default, it uses a Tree-structured Parzen Estimator (TPE) algorithm to optimize the model's parameters.
Manual tuning: This method involves manually adjusting the parameters based on the model's performance on a validation set. It can be time-consuming, but it can be useful for gaining
a deeper understanding of the model and of the relationship between the parameters and the model's performance.
Ultimately, the best method for finding optimal parameters depends on the specific model and dataset. Setting optimal parameters requires a good understanding of the model and its parameters, as well as a good sense of what performance to expect.
Once the optimal parameters for an RNN-LSTM based classification model have been determined, the next step is to set the optimal coefficients, also known as weights. Some of the most common methods are:
Random Initialization: This method initializes the model's coefficients with random values and then trains the model using the optimal parameters determined earlier.
Pre-trained models: Another way to set the model's coefficients is to use pre-trained models, which have already been trained on large amounts of data. This can be useful if your dataset and task are similar to those of the pre-trained model.
Transfer Learning: This method uses a pre-trained model as a starting point and then fine-tunes it on your specific dataset. This can be useful if you have a small dataset and want to leverage the knowledge learned by a pre-trained model.
Using Optimization Algorithms: Optimization algorithms such as Stochastic Gradient Descent, Adam, and L-BFGS can be used to optimize the coefficients of the model. These algorithms use the gradients of the loss function to update the model's coefficients.
Bayesian optimization: This method uses a probabilistic model of the relationship between the coefficients and the model's performance. It can be more efficient than grid search or random search because it uses the information from previous evaluations to guide the search for optimal coefficients.
Fig. 12. Classification using RNN-LSTM.
Table 1
Result of proposed model.
Parameter | Performance of LSTM, ultrasound image dataset | Performance of LSTM, histopathological dataset
Accuracy | 99% | 90.98%
Sensitivity | 100% | 100%
Specificity | 100% | 68.28%
Error rate | 1% | 9.01%
Positive predictive value | 1 | 0.8882
Negative predictive value | 1 | 1
Positive likelihood ratio | NaN | 3.1520
Negative likelihood ratio | 0 | 0
Overall precision | 99.1% | 84.5%
Overall recall | 98.5% | 94.4%
F1 score | 98.8% | 89.18%
Fig. 13. Code to generate confusion matrix.
Fig. 14. Confusion matrix of the testing dataset of ultrasound images.
Fig. 15. Confusion matrix of the training dataset of histopathological images.
Fig. 16. Confusion matrix of the testing dataset of histopathological images.
5. Conclusion
We proposed an automated method for classifying breast cancer tumours using ultrasound images and breast histological biopsy images. The breast ultrasound images are masked with a clearly demarcated Region of Interest (ROI), so the BUSI dataset gave better accuracy. By contrast, the stains used in biopsy images produce similar colour combinations, so the classification results for breast histology were comparatively lower. The histology images would work better with Auto-encoders and distinguishable colour maps. These would help the system identify which parts of the coloured stains are responsible for the graded classification of the images into benign and malignant. Across numerous tests, the proposed technique, with pre-trained BEiT and an RNN-LSTM classifier, had the highest accuracy: 99% for the BUSI1311 dataset and 91% for the breast histology dataset. When compared to more recent techniques, the proposed framework yields superior outcomes. The best features were selected to eliminate unneeded ones, dataset augmentation improved the
training strength, and the combination technique improved accuracy consistency. These elements work together to strengthen this study. A limitation of this model is that it relies on a pre-trained model and is not compared with the results of other image transformers, as explained in the base papers [49,50]. The pre-trained model's own accuracy ranges from only 80 to 86 percent, so it is not used for the classification task in our model. Instead, we use an LSTM-RNN as the classifier. The optimal coefficients (type of optimizer, learning rate, etc.) can only be modified for the downstream LSTM-based classifier on top of transfer learning. Future work will focus on two essential tasks. The first is training a Vision Transformer from scratch and/or fine-tuning the BEiT model for downstream tasks such as segmentation and classification. The second is using Variational Auto-encoders to tokenize the image data and to represent the most relevant patches with understandable heat maps. We will discuss our proposed approach with professionals in ultrasound imaging, radiology, and medicine with a view to implementing it in hospitals.
Table 2
Comparative evaluation of the state-of-the-art approaches and the suggested approach.
Methods | Accuracy (%) | Ref.
ResNet-18 (ICS-ELM) | 98.26 | [55]
CNN | 93.04 | [56]
Transfer learning | 95.48 | [57]
GAN-CNN | 90.41 | [58]
Inception ResNet V2 | 95.32 | [59]
Bilateral Knowledge Distillation in breast histology dataset | 96.0 | [45]
Breast Cancer Calcification Identification using K-Means, GLCM and HMM classifier | 97.8 | [60]
Hybrid dilated Ghost Model | 99.3 | [61]
Segmentation approach on Breast Mammograms using CLAHE and Fuzzy SVM | 96.0 | [62]
SVM Kernel trick and Hyper parameter tuning in WBCD | 99.1 | [63]
Vision Transformer (ViT) encoding using pre-trained BEiT and RNN-LSTM for BUSI dataset | 99.00 | Our proposed model
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
Data will be made available on request.
References
[1] F. Bray, J. Ferlay, I. Soerjomataram, R.L. Siegel, L.A. Torre, A. Jemal, et al., Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA Cancer J. Clin. 68 (2018) 394–424, http://dx.doi.org/10.3322/caac.21492.
[2] Q. Hu, H.M. Whitney, M.L. Giger, A deep learning methodology for improved breast cancer diagnosis using multiparametric MRI, Sci. Rep. 10 (2020) 10536, http://dx.doi.org/10.1038/s41598-020-67441-4.
[3] H.K. Mewada, A.V. Patel, M. Hassaballah, M.H. Alkinani, K. Mahant, Spectral–spatial features integrated convolution neural network for breast cancer classification, Sensors 20 (2020) 4747, http://dx.doi.org/10.3390/s20174747.
[4] W.K. Moon, Y.W. Lee, H.H. Ke, S.H. Lee, C.S. Huang, R.F. Chang, et al., Computer-aided diagnosis of breast ultrasound images using ensemble learning from convolutional neural networks, Comput. Methods Programs Biomed. 190 (2020) 105361, http://dx.doi.org/10.1016/j.cmpb.2020.105361.
[5] S. Boumaraf, X. Liu, Z. Zheng, X. Ma, C. Ferkous, A new transfer learning based approach to magnification dependent and independent classification of breast cancer in histopathological images, Biomed. Signal Process. Control 63 (2021) 102192, http://dx.doi.org/10.1016/j.bspc.2020.102192.
[6] Y. Eroğlu, M. Yildirim, A. Çinar, Convolutional neural networks based classification of breast ultrasonography images by hybrid method with respect to benign, malignant, and normal using mRMR, Comput. Biol. Med. 133 (2021) 104407, http://dx.doi.org/10.1016/j.compbiomed.2021.104407.
[7] A.K. Mishra, P. Roy, S. Bandyopadhyay, S.K. Das, Breast ultrasound tumour classification: A machine learning—radiomics based approach, Expert Syst. 38 (2021) e12713, http://dx.doi.org/10.1111/exsy.12713.
[8] L. Abdelrahman, M. Al Ghamdi, F. Collado-Mesa, M. Abdel-Mottaleb, Convolutional neural networks for breast cancer detection in mammography: A survey, Comput. Biol. Med. 131 (2021) 104248, http://dx.doi.org/10.1016/j.compbiomed.2021.104248.
[9] D. Singh, A.K. Singh, Role of image thermography in early breast cancer detection-past, present and future, Comput. Methods Programs Biomed. 183 (2020) 105074, http://dx.doi.org/10.1016/j.cmpb.2019.105074.
[10] Y. Benhammou, B. Achchab, F. Herrera, S. Tabik, BreakHis based breast cancer automatic diagnosis using deep learning: Taxonomy, survey and insights, Neurocomputing 375 (2020) 9–24, http://dx.doi.org/10.1016/j.neucom.2019.09.044.
[11] A. Das, M.S. Nair, S.D. Peter, Computer-aided histopathological image analysis techniques for automated nuclear atypia scoring of breast cancer: A review, J. Digit. Imaging 33 (2020) 1091–1121, http://dx.doi.org/10.1007/s10278-019-00295-z.
[12] C. Kaushal, S. Bhat, D. Koundal, A. Singla, Recent trends in computer assisted diagnosis (CAD) system for breast cancer diagnosis using histopathological images, IRBM 40 (2019) 211–227, http://dx.doi.org/10.1016/j.irbm.2019.06.001.
[13] G. Hamed, M.A.E.R. Marey, S.E.S. Amin, M.F. Tolba, Deep learning in breast cancer detection and classification, in: The International Conference on Artificial Intelligence and Computer Vision, Springer, Berlin, Germany, 2020, pp. 322–333.
[14] A.L. Beam, I.S. Kohane, Big data and machine learning in health care, JAMA 319 (2018) 1317–1318, http://dx.doi.org/10.1001/jama.2017.18391.
[15] G. Li, Z. Xiao, Transfer learning-based neuronal cell instance segmentation with pointwise attentive path fusion, IEEE Access (2022).
[16] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, et al., An image is worth 16x16 words: Transformers for image recognition at scale, 2020, arXiv Prepr. arXiv:2010.11929.
[17] B. Gheflati, H. Rivaz, Vision transformer for classification of breast ultrasound images, 2021, arXiv Prepr. arXiv:2110.14731.
[18] J.E. Van Engelen, H.H. Hoos, A survey on semi-supervised learning, Mach. Learn. 109 (2020) 373–440, http://dx.doi.org/10.1007/s10994-019-05855-6.
[19] M. Fayyaz, S.A. Kouhpayegani, F.R. Jafari, E. Sommerlade, H.R.V. Joze, H. Pirsiavash, et al., ATS: Adaptive token sampling for efficient vision transformers, 2021, arXiv Prepr. arXiv:2111.15667.
[20] Q. Xie, Z. Dai, E. Hovy, T. Luong, Q. Le, Unsupervised data augmentation for consistency training, Adv. Neural Inf. Process. Syst. 33 (2020) 6256–6268.
[21] Z. Li, F. Liu, W. Yang, S. Peng, J. Zhou, A survey of convolutional neural networks: Analysis, applications, and prospects, IEEE Trans. Neural Netw. Learn. Syst. 2021 (2021) 1–21, http://dx.doi.org/10.1109/TNNLS.2021.3084827.
[22] M. Masud, A.E. Eldin Rashed, M.S. Hossain, Convolutional neural network-based models for diagnosis of breast cancer, Neural Comput. Appl. (2020) http://dx.doi.org/10.1007/s00521-020-05394-5.
[23] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst. 25 (2012).
[24] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, et al., Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[26] A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, et al., MobileNets: Efficient convolutional neural networks for mobile vision applications, 2017, arXiv Prepr. arXiv:1704.04861.
[27] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[28] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.
[29] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[30] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.C. Chen, MobileNetV2: Inverted residuals and linear bottlenecks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
[31] D.A. Pisner, D.M. Schnyer, Support vector machine, in: Machine Learning, Elsevier, Amsterdam, Netherlands, 2020, pp. 101–121.
[32] M.Z. Alom, C. Yakopcic, M.S. Nasrin, T.M. Taha, V.K. Asari, Breast cancer classification from histopathological images with inception recurrent residual convolutional neural network, J. Digit. Imaging 32 (2019) 605–617, http://dx.doi.org/10.1007/s10278-019-00182-7.
[33] S. Sabour, N. Frosst, G.E. Hinton, Dynamic routing between capsules, Adv. Neural Inf. Process. Syst. 30 (2017).
[34] N. Zemmal, N. Azizi, N. Dey, M. Sellami, Adaptive semi supervised support vector machine semi supervised learning with features cooperation for breast cancer classification, J. Med. Imaging Health Inf. 6 (2016) 53–62, http://dx.doi.org/10.1166/jmihi.2016.1591.
[35] A.K. Jaiswal, I. Panshin, D. Shulkin, N. Aneja, S. Abramov, Semi-supervised learning for cancer detection of lymph node metastases, 2019, arXiv Prepr. arXiv:1906.09587.
[36] M. Shi, B. Zhang, Semi-supervised learning improves gene expression-based prediction of cancer recurrence, Bioinformatics 27 (2011) 3017–3023, http://dx.doi.org/10.1093/bioinformatics/btr502.
[37] T. Ma, A. Zhang, Affinity network fusion and semi-supervised learning for cancer patient clustering, Methods 145 (2018) 16–24, http://dx.doi.org/10.1016/j.ymeth.2018.05.020.
[38] Y. Liang, H. Chai, X.Y. Liu, Z.B. Xu, H. Zhang, K.S. Leung, et al., Cancer survival analysis using semi-supervised learning method based on cox and AFT models with L1/2 regularization, BMC Med. Genomics 9 (2016) 11, http://dx.doi.org/10.1186/s12920-016-0169-6.
[39] A. Masood, A. Al-Jumaily, K. Anam, Self-supervised learning model for skin cancer diagnosis, in: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), 2015, pp. 1012–1015, http://dx.doi.org/10.1109/NER.2015.7146798.
[40] G. Yu, K. Sun, C. Xu, X.H. Shi, C. Wu, T. Xie, et al., Accurate recognition of colorectal cancer with semi-supervised deep learning on pathological images, Nature Commun. 12 (2021) 6311, http://dx.doi.org/10.1038/s41467-021-26643-8.
[41] J. Chaki, M. Woźniak, Deep learning for neurodegenerative disorder (2016 to 2022): a systematic review, Biomed. Signal Process. Control 80 (Pt. 1) (2023) 1–20, http://dx.doi.org/10.1016/j.bspc.2022.104223.
[42] M. Wieczorek, J. Siłka, M. Woźniak, S. Garg, M.M. Hassan, Lightweight convolutional neural network model for human face detection in risk situations, IEEE Trans. Ind. Inform. 18 (7) (2022) 4820–4829, http://dx.doi.org/10.1109/TII.2021.3129629.
[43] M. Woźniak, J. Siłka, M. Wieczorek, Deep neural network correlation learning mechanism for CT brain tumor detection, Neural Comput. Appl. (2021) http://dx.doi.org/10.1007/s00521-021-05841-x.
[44] B.E. Bejnordi, G. Litjens, N. Timofeeva, I. Otte-Höller, A. Homeyer, N. Karssemeijer, J.A. van der Laak, Stain specific standardization of whole-slide histopathological images, IEEE Trans. Med. Imaging 35 (2015) 404–415.
[45] S. Chaudhury, N. Shelke, K. Sau, B. Prasanalakshmi, M. Shabaz, A novel approach to classifying breast cancer histopathology biopsy images using bilateral knowledge distillation and label smoothing regularization, Comput. Math. Methods Med. (2021) 4019358, http://dx.doi.org/10.1155/2021/4019358.
[46] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, et al., Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017).
[47] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014, arXiv Prepr. arXiv:1412.3555.
[48] J. Devlin, M.W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, 2018, arXiv Prepr. arXiv:1810.04805.
[49] H. Bao, L. Dong, F. Wei, BEiT: BERT pre-training of image transformers, 2021, arXiv Prepr. arXiv:2106.08254.
[50] H. Touvron, M. Cord, A. Sablayrolles, G. Synnaeve, H. Jégou, Going deeper with image transformers, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 32–42.
[51] J. Schmidhuber, D. Wierstra, M. Gagliolo, F. Gomez, Training recurrent networks by Evolino, Neural Comput. 19 (3) (2007) 757–779.
[52] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in: S.C. Kremer, J.F. Kolen (Eds.), A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, 2001.
[53] Q.V. Le, N. Jaitly, G.E. Hinton, A simple way to initialize recurrent networks of rectified linear units, 2015, arXiv Prepr. arXiv:1504.00941.
[54] W. Zaremba, I. Sutskever, Recurrent neural network regularization, 2014, arXiv Prepr. arXiv:1409.2329.
[55] S.S. Chakravarthy, H. Rajaguru, Automatic detection and classification of mammograms using improved extreme learning machine with deep learning, IRBM 43 (2021) 49–61.
[56] E.M. El Houby, N.I. Yassin, Malignant and nonmalignant classification of breast lesions in mammograms using convolutional neural networks, Biomed. Signal Process. Control 70 (2021) 102954.
[57] Z. Zhuang, Z. Yang, A.N.J. Raj, C. Wei, P. Jin, S. Zhuang, Breast ultrasound tumor image classification using image decomposition and fusion based on adaptive multi-model spatial feature fusion, Comput. Methods Programs Biomed. 208 (2021) 106221.
[58] T. Pang, J.H.D. Wong, W.L. Ng, C.S. Chan, Semi-supervised GAN-based radiomics model for data augmentation in breast ultrasound mass classification, Comput. Methods Programs Biomed. 203 (2021) 106018.
[59] M.A. Al-Antari, S.-M. Han, T.-S. Kim, Evaluation of deep learning detection and classification towards a computer-aided diagnosis of breast lesions in digital X-ray mammograms, Comput. Methods Programs Biomed. 196 (2020) 105584.
[60] S. Chaudhury, M. Rakhra, N. Memon, K. Sau, M.T. Ayana, Breast cancer calcifications: Identification using a novel segmentation approach, Comput. Math. Methods Med. (2021) 9905808, http://dx.doi.org/10.1155/2021/9905808.
[61] E. Ramirez-Asis, R.P. Melgarejo Bolivar, L. Alemán Gonzales, S. Chaudhury, R. Kashyap, W.F. Alsanie, G.K. Viju, A lightweight hybrid dilated ghost model-based approach for the prognosis of breast cancer, Comput. Intell. Neurosci. (2022) 9325452, http://dx.doi.org/10.1155/2022/9325452.
[62] S. Chaudhury, A.N. Krishna, S. Gupta, K.S. Sankaran, S. Khan, K. Sau, A. Raghuvanshi, F. Sammy, Effective image processing and segmentation-based machine learning techniques for diagnosis of breast cancer, Comput. Math. Methods Med. (2022) 6841334, http://dx.doi.org/10.1155/2022/684133.
[63] S. Chaudhury, N. Shelke, Z.M. Rashid, K. Sau, Effect of grid search and hyper parameter tuned pipeline with various classifiers and PCA for breast cancer detection, Curr. Signal Transduct. Therapy 17 (2022) e150722206811, http://dx.doi.org/10.2174/1574362417666220715105527.
