A Multi-Task Feature Fusion Model For Cervical Cell Classification

4668 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 26, NO. 9, SEPTEMBER 2022
Jian Qin, Yongjun He, Jinping Ge, and Yiqin Liang
Abstract—Cervical cell classification is a crucial technique for automatic screening of cervical cancer. Although deep learning has greatly improved the accuracy of cell classification, the performance still cannot meet the needs of practical applications. To solve this problem, we propose a multi-task feature fusion model that consists of one auxiliary task of manual feature fitting and two main classification tasks. The auxiliary task enhances the main tasks through low-layer feature fusion. The main tasks, i.e., a 2-class classification task and a 5-class classification task, are learned together so that they reinforce each other and alleviate the influence of unreliable labels. In addition, a label smoothing method based on cell category similarity is designed to bring inter-class information into the label. Comparative experiments with other state-of-the-art models on the HUSTC and SIPaKMeD datasets demonstrate the effectiveness of the proposed method. With a sensitivity of 99.82% and a specificity of 98.12% for the 2-class classification task on the HUSTC dataset, our method shows potential to reduce cytologist workload.

Index Terms—Cervical cell classification, feature fusion, label smoothing, manual features, multi-task.

Manuscript received 15 October 2021; revised 4 May 2022 and 31 May 2022; accepted 2 June 2022. Date of publication 7 June 2022; date of current version 9 September 2022. This work was supported in part by the National Natural Science Foundation of China under Grant 61673142 and in part by the Natural Science Foundation of Heilongjiang Province of China under Grant JJ2019JQ0013. (Corresponding author: Yongjun He.)

The authors are with the School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China (e-mail: jian.qin1@qq.com; holywit@163.com; 13079655433@163.com; 984013452@qq.com).

Digital Object Identifier 10.1109/JBHI.2022.3180989

2168-2194 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

Cervical cancer is a significant health threat to women worldwide. It was responsible for 311,000 deaths in 2018, and 85% of these deaths occurred in low- and middle-income countries [1]. Nevertheless, cervical cancer can be effectively managed through preventive clinical management strategies. Cervical cytopathology (Pap smear or liquid-based cytology) is the most common method for identifying precancerous lesions of cervical cancer. Manually screening a cervical cytology sample containing 20,000–50,000 cells for abnormal cells is tedious and laborious for cytologists. Moreover, the miss rate of manual cervical screening is still as high as 20–30% [2]. Hence, automatic screening techniques are needed to help cytologists diagnose smears.

Cervical cells can be categorized according to The Bethesda System [3], and some examples are shown in Fig. 1. The categories are negative for intraepithelial lesion or malignancy (NILM), atypical squamous cells of undetermined significance (ASC-US), low-grade squamous intraepithelial lesion (LSIL), atypical squamous cells, cannot exclude HSIL (ASC-H), and high-grade squamous intraepithelial lesion (HSIL). NILM indicates normal cells, and the others indicate abnormal cells. ASC-US and LSIL are primarily found in epidermal squamous cells, generally at the beginning of HPV infection. ASC-H and HSIL arise from basal squamous cells and are usually warning signs that a cancerous process may occur.

Recently, deep learning has brought significant progress in cervical cell classification, but its performance still cannot meet the needs of practical applications. There are two main reasons. First, current deep learning networks need large amounts of labeled data. Such data are difficult to obtain because labeling requires professional cytologists. Second, cervical cancer lesions develop as a continuous process, so there is no clear threshold that defines the boundary between cell categories. This makes it challenging to obtain accurate category annotations, which degrades performance. Two current research hotspots are therefore how to train a better model without more labeled data and how to reduce the impact of mislabeling.

Multi-task learning can effectively improve the performance of deep learning because related tasks benefit from each other when they jointly learn shared or mutually related representations. The signals that come from different tasks can be regarded as implicit data augmentation or additional regularization [4]. The model therefore learns representations that serve several tasks at once, which reduces overfitting and improves generalization.

Manual features defined by The Bethesda System rules [3] can effectively help distinguish normal cells from abnormal cells. Many works have classified cervical cells from manual features using machine learning [5], [6]. Because these manual features are highly relevant to the cell classification task, we use a multi-task model that predicts manual features and cell categories simultaneously. Predicting manual features lets the cell classification task learn relevant hidden features, which improves performance. In addition, current label smoothing methods treat non-target categories equally and assign each of them the same fixed probability [7]. If prior knowledge of category similarity is available, non-target categories can instead be assigned different probabilities to further improve classification performance.
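The similarity-aware smoothing motivated above is formalized in Section III-B (Eq. (5)). The following is a minimal plain-Python sketch of the idea, contrasted with uniform smoothing (Eq. (4)). It is an illustration, not the authors' code. The following are assumptions: the class order, the helper names `uniform_smoothed_label` and `group_smoothed_label`, and the handling of a category with no group mates, such as NILM. The sketch also divides the out-of-group share α(1 − β) evenly among the categories outside the target's group, so that every soft label sums to one. With α = 0.1 and β = 0.7, it reproduces the ASC-US example in Section III-B: 90% on ASC-US and 7% on LSIL.

```python
"""Sketch: uniform label smoothing (Eq. (4)) vs. group-aware smoothing (Eq. (5))."""
from typing import Dict, List, Sequence

# Assumed class order for the HUSTC 5-class task; the grouping follows the text
# (epidermal: ASC-US, LSIL; basal: ASC-H, HSIL; NILM on its own).
CLASSES: List[str] = ["NILM", "ASC-US", "LSIL", "ASC-H", "HSIL"]
GROUPS: List[List[str]] = [["NILM"], ["ASC-US", "LSIL"], ["ASC-H", "HSIL"]]


def uniform_smoothed_label(target: str, classes: Sequence[str],
                           alpha: float = 0.1) -> Dict[str, float]:
    """Eq. (4): (1 - alpha) * one-hot label plus alpha / K for every category."""
    k = len(classes)
    return {c: (1.0 - alpha) * (c == target) + alpha / k for c in classes}


def group_smoothed_label(target: str, classes: Sequence[str],
                         groups: Sequence[Sequence[str]],
                         alpha: float = 0.1, beta: float = 0.7) -> Dict[str, float]:
    """Keep 1 - alpha on the target, give alpha * beta to the target's group
    mates and alpha * (1 - beta) to the categories outside its group."""
    group = next(g for g in groups if target in g)
    mates = [c for c in classes if c in group and c != target]
    outsiders = [c for c in classes if c not in group]

    # Assumption: if one side is empty (e.g. NILM has no group mates),
    # the whole alpha budget goes to the other side so the label sums to 1.
    if mates and outsiders:
        mate_mass = alpha * beta
    elif mates:
        mate_mass = alpha
    else:
        mate_mass = 0.0
    outsider_mass = alpha - mate_mass

    label = {c: 0.0 for c in classes}
    label[target] = 1.0 - alpha
    for c in mates:
        label[c] = mate_mass / len(mates)
    for c in outsiders:
        label[c] = outsider_mass / len(outsiders)
    return label


if __name__ == "__main__":
    soft = group_smoothed_label("ASC-US", CLASSES, GROUPS)
    print({c: round(p, 3) for c, p in soft.items()})
    # {'NILM': 0.01, 'ASC-US': 0.9, 'LSIL': 0.07, 'ASC-H': 0.01, 'HSIL': 0.01}
```

With uniform smoothing, every non-target category would receive the same α/K = 0.02. The group-aware version instead moves most of that mass onto LSIL, the category an ASC-US cell is most plausibly confused with.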
Fig. 1. Illustration of the different cells. Left: normal cells defined as NILM. Right: abnormal cells including ASC-US, LSIL, HSIL and ASC-H.
ASC-US and LSIL are usually found in epidermal squamous cells, while ASC-H and HSIL are generally found in basal squamous cells.
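The normal/abnormal split in Fig. 1 is also what links the model's two classification tasks. The following plain-Python sketch works on a single sample. It shows how a 2-class target can be derived from a 5-class label and how the three terms of Eqs. (1)–(3) in Section III-A combine. It is not the authors' implementation, which uses PyTorch and timm (Section IV-B). The following are assumptions: the class index order, the helper names, averaging the smooth-L1 term over the N manual features, and applying the 0.6/0.4 normal/abnormal weights to both tasks. The default λ = 0.5 follows the text.

```python
"""Sketch of the per-sample multi-task objective, Eqs. (1)-(3)."""
import math
from typing import List, Optional, Sequence

# Assumed class order; per Fig. 1, NILM is normal and the other four are abnormal.
FIVE_CLASSES: List[str] = ["NILM", "ASC-US", "LSIL", "ASC-H", "HSIL"]
# Assumption: the 0.6 (normal) / 0.4 (abnormal) weights apply to both tasks.
W5: List[float] = [0.6, 0.4, 0.4, 0.4, 0.4]
W2: List[float] = [0.6, 0.4]


def to_binary(label5: int) -> int:
    """Map a 5-class index to the 2-class task: 0 = normal (NILM), 1 = abnormal."""
    return 0 if FIVE_CLASSES[label5] == "NILM" else 1


def one_hot(index: int, k: int) -> List[float]:
    return [1.0 if i == index else 0.0 for i in range(k)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # shift by the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_ce(logits: Sequence[float], y: Sequence[float],
                weights: Sequence[float]) -> float:
    """Eq. (3): -sum_i w_i * y_i * log(p_i); y may be one-hot or a soft label."""
    p = softmax(logits)
    return -sum(w * yi * math.log(pi) for w, yi, pi in zip(weights, y, p) if yi > 0)


def smooth_l1(pred: Sequence[float], label: Sequence[float]) -> float:
    """Eq. (2), averaged over the N manual features (the averaging is assumed)."""
    total = 0.0
    for x, t in zip(pred, label):
        d = abs(x - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(pred)


def multitask_loss(logits5: Sequence[float], logits2: Sequence[float],
                   mf_pred: Sequence[float], mf_label: Sequence[float],
                   label5: int, lam: float = 0.5,
                   y5: Optional[Sequence[float]] = None) -> float:
    """Eq. (1): lam * L_mf + L_5-class + L_2-class for a single sample.

    y5 optionally replaces the one-hot 5-class target with a smoothed label."""
    target5 = one_hot(label5, len(FIVE_CLASSES)) if y5 is None else y5
    target2 = one_hot(to_binary(label5), 2)
    return (lam * smooth_l1(mf_pred, mf_label)
            + weighted_ce(logits5, target5, W5)
            + weighted_ce(logits2, target2, W2))


if __name__ == "__main__":
    # Uniform logits and two hypothetical normalized manual features for an ASC-US cell.
    loss = multitask_loss([0.0] * 5, [0.0, 0.0], [0.0, 2.0], [0.5, 0.0], label5=1)
    print(round(loss, 4))  # prints 1.3273
```

Because the 2-class target is computed from the 5-class label, only the 5-class annotation has to be stored. This matches the point in Section III-A that an intra-group labeling error (e.g., HSIL instead of ASC-H) still yields a correct 2-class target.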
In this paper, we propose a multi-task feature fusion model that makes full use of manual features. Specifically, the proposed model contains a manual features fitting branch and a classification branch. Rather than directly fusing manual features with deep learning features, we fuse the middle-layer features of the manual features fitting branch with those of the classification branch. The classification branch simultaneously performs 2-class and 5-class classification tasks. Features of the manual features fitting branch, which carry cervical cell domain knowledge, are incorporated into the classification branch through a connection module. To reduce the influence of incorrect labeling, we propose a new label smoothing method that assigns different probabilities to non-target categories according to prior knowledge of the similarity between categories. Furthermore, supervised contrastive learning is used to pre-train the backbone network parameters. Our contributions are summarized as follows:
1) We propose a multi-task model for cervical cell classification, which uses manual features fitting as an auxiliary task to help the model extract discriminative features.
2) The classification branch of the proposed model performs 2-class and 5-class classification tasks simultaneously. This allows the two tasks to promote each other and reduces the influence of incorrect annotations among intra-group classes.
3) We propose a new label smoothing method that assigns different probabilities to non-target categories according to the similarity of cell categories. This improves model performance by bringing inter-class information into the label.
4) The proposed method is evaluated on the HUSTC and SIPaKMeD datasets [8]. The effectiveness of the proposed multi-task feature fusion model is confirmed by comparing it with other state-of-the-art models.

II. RELATED WORK

A. Cervical Cell Classification

Research on automatic cervical cancer screening techniques began in the 1950s and entered its active research phase in the 1970s [2]. Afterward, a series of cervical screening systems were launched with some success [9]. In recent decades, automation technology and deep learning have made remarkable progress in medicine [10], which has strongly promoted the development of automatic cervical cancer screening.

In the early days, most works relied on manual features and traditional machine learning classifiers such as support vector machines [11] and AdaBoost [12]. Wang et al. [13] used feature selection algorithms to filter shape, texture, and Gabor features, and then classified the selected features with a support vector machine. Arya et al. [14] used multiple texture features for cervical cell classification: first-order histogram, grey-level co-occurrence matrix, local binary pattern, Laws' energy texture, and discrete wavelet transform features. Win et al. [15] used a random forest algorithm for feature selection and then combined the outputs of five classifiers with a bagging ensemble to obtain the final result. These methods essentially use one or more classifiers to classify
features. As a result, they are inevitably affected by the choice of feature extraction and feature selection methods.

With the development of computer hardware, deep learning has become widely used in image recognition. Deep learning methods are designed to discover intricate discriminative information automatically from raw data. They have been used successfully in biomedical image processing, for example to classify lung diseases [16] and lymph nodes [17]. Deep learning models such as ResNet [18] and EfficientNet [19] have been successfully applied to cervical cell classification. DeepCervix [20] achieves leading performance in cervical cell classification by fusing hybrid deep features from different models. DeepPap [21] directly classifies a single cervical cell as normal or abnormal from image patches centered on the nucleus centroid. Lin et al. [22] proposed a CNN-based method that combines cell image appearance with cell morphology for cervical cell classification. Basak et al. [23] used the gray wolf optimizer to select reduced-dimensional CNN features and thus improve classification performance. More recent methods have shown the effectiveness of Transformer-based architectures for image classification [24]–[26]. The Transformer directly models long-range dependencies between local patches through the self-attention mechanism, which gives it greater flexibility in modeling visual content.

However, most current methods use only category labels to train the model and do not use the manual features of cervical cells to guide training. In this paper, we use manual feature fitting as an auxiliary learning task so that the information contained in the manual features can be brought into the model.

B. Label Smoothing

Training deep learning models with hard labels (1 for the target category and 0 for the non-target categories) often results in over-confident models. In supervised learning, the quality of labels determines model performance, and modifying the labels is a simple and effective way to alleviate overfitting. Reed et al. [27] used the predicted distribution or the predicted category to smooth hard labels and improve performance on noisy data. Dubey et al. [28] perturbed a small portion of the labels during training to prevent overfitting. Wang et al. [29] proposed a framework that learns feature representations by predicting non-spatial and spatial transformation parameters. Zhang et al. [30] proposed an Online Label Smoothing (OLS) strategy, which generates soft labels from the statistics of the model's predictions for the target category. Unlike the approaches above, our method generates soft labels from prior knowledge of cervical cells.

Fig. 2. Overview of the proposed method for cervical cell classification. First, target cells are marked on whole slide images and cropped out. After that, the cells are segmented and used to compute manual features. Finally, the manual features and the smoothed labels are used to train our proposed multi-task model.

III. METHODS

The block diagram of the proposed work is shown in Fig. 2. We first label cells on the whole slide images to obtain training data. Secondly, the labeled cells are cropped out from the whole slide images (WSIs), and the cells are segmented for subsequent
processing. Thirdly, we calculate the manual features of the cells according to the rules of The Bethesda System. Finally, the multi-task model is trained using smoothed classification labels and manual features.

A. Multi-Task Model Architecture

Multi-task learning can improve deep learning models by learning multiple related tasks simultaneously. Manual features of cervical cells encode a large amount of prior knowledge and are highly relevant to the classification task. Therefore, we propose a multi-task model that uses manual feature fitting as an auxiliary learning task. This allows the model to learn mutually related representations between tasks and thereby improves its performance.

The proposed model borrows the idea of NDDR-CNN [31] and uses Neural Discriminative Dimensionality Reduction (NDDR) as a layerwise feature fusion scheme. NDDR is formulated by combining existing CNN components and has a clear mathematical interpretation as discriminative dimensionality reduction. In the proposed model, we use NDDR to fuse features between the classification branch and the manual feature fitting branch. Unlike NDDR-CNN, we feed the fused features only into the classification task instead of into all tasks, because our goal is to improve cell classification rather than manual feature prediction. The proposed model comprises a classification branch and a manual features fitting branch (Fig. 3), which are used for cell classification and manual feature prediction, respectively. Cervical cell domain knowledge in the manual feature fitting branch is incorporated into the classification branch by a connection module. The manual features fitting branch and the classification branch are explained in detail below.

Fig. 3. The architecture of the proposed multi-task model. The model comprises a classification branch and a manual features fitting branch. Rest1-Rest4 represent the layers of the Resnet model, respectively. ‘Conv’ denotes a 1 × 1 convolution operation with a stride of 1. ‘BN’ and ‘FC’ denote batch normalization and fully connected layers, respectively.

1) Manual Features Fitting Branch: Resnet-50 effectively alleviates the vanishing gradient problem and extracts strong features, which helps it predict cellular manual features. The manual features fitting branch uses a pre-trained Resnet-50 to predict manual features from the input image. We retain all the feature extraction blocks of Resnet-50 and change only the fully connected layer.

2) Classification Branch: The classification branch also uses Resnet-50 as the feature extraction module, where Resnet-50 is pre-trained on the 5-class dataset through supervised contrastive learning. As shown in Fig. 3, the feature extraction blocks of the classification branch and the manual features fitting branch are connected by a connection module. A 1 × 1 convolution then reduces the number of feature channels by half. Finally, the fused features are fed into the classification branch.

The development of cervical cancer is a continuous process, so there is no clear boundary between the categories of abnormal cells. The relationships between the categories of cervical cells are shown in Fig. 1. ASC-US and LSIL belong to the epidermal group, while ASC-H and HSIL belong to the basal group. Furthermore, cells within each of these two groups look similar. During data labeling, if a cytologist does not find enough evidence to confirm an LSIL cell, they tend to label it as ASC-US. Manual labeling of cells is therefore subjective, which inevitably results in mislabeling. An LSIL cell predicted as HSIL should be penalized more than one predicted as ASC-US. The classification loss of the model consists of the 5-class and 2-class classification losses, which together act much like the synergistic grouping loss (SGL) [32]
in the classification task. For example, if an ASC-H cell is incorrectly labeled as HSIL instead of ASC-US, the label is still correct for the 2-class classification task. As a result, an ASC-H cell that is classified as ASC-US receives more punishment than one misclassified as HSIL. In this way, the combined 2-class and 5-class losses can better optimize the model and reduce the influence of inter-class noise on fine-grained classification.

Since our model is a multi-task model, the model loss consists of the classification losses and the manual features fitting loss. The loss function of the model is

$$\mathrm{Loss} = \lambda L_{mf} + L_{5\text{-class}} + L_{2\text{-class}} \tag{1}$$

$$L_{mf} = \begin{cases} 0.5\,(x_f - y_f)^2, & \text{if } |x_f - y_f| < 1 \\ |x_f - y_f| - 0.5, & \text{otherwise} \end{cases} \tag{2}$$

where $x_f$ and $y_f$ denote the prediction and the label of the manual features, respectively. The manual features form an $N \times 1$ vector that combines different features, where $N$ is the number of features. $\lambda$ controls the weight of the manual feature prediction loss in the overall loss, and we set $\lambda$ to 0.5 in this paper. $L_{5\text{-class}}$ and $L_{2\text{-class}}$ are the weighted cross-entropy losses of the 5-class and 2-class classification, respectively. The weighted cross-entropy loss is

$$L = -\sum_{i=1}^{C} w_c\, y_i \log(p_i) \tag{3}$$

where $p_i$ is the predicted result, $y_i$ is the ground truth, $w_c$ is the weight, and $C$ is the number of classes. We set the weight of the normal class to 0.6 and the weight of the abnormal class to 0.4.

B. Smoothing Noisy Label Regularization

Unlike natural images, the various categories of cervical cells look similar. Because cervical cells are complex, noisy labels are common. This is especially true of intra-group mislabeling within the epidermal (ASC-US and LSIL) and basal (ASC-H and HSIL) groups. Label smoothing can effectively reduce the effect of mislabeling on the classification model [30]. Xiang et al. [33] used label smoothing to reduce the influence of mislabeling on cervical cell classification. However, the original label smoothing operation [7] treats all non-target categories equally. It assigns them an identical probability without considering the relationships between categories, which limits the improvement in classification performance. To address this problem, we assign different smoothing probabilities according to prior knowledge of the relationships between the classes of cervical cells.

The cross-entropy loss makes models over-confident in their predictions, which reduces their generalization ability, especially when the training data contain many uncertain labels. The label smoothing operation [7] is often used to alleviate this problem. Fig. 4(b) shows the result of this label smoothing, which is defined as

$$Y_i^{LS} = Y_i(1 - \alpha) + \frac{\alpha}{K} \tag{4}$$

where $\alpha$ is a small constant, $K$ is the number of label categories, and $Y_i^{LS}$ is the smoothed label. This smoothing operation does not consider the similarity between classes. ASC-US and LSIL belong to the epidermal group, while ASC-H and HSIL belong to the basal group. Cells of the same group have similar pathological characteristics, so labels of the same group should receive greater confidence in the label smoothing operation. Therefore, we smooth the label distribution with

$$Y_i^{LS} = Y_i(1 - \alpha) + \frac{\alpha\beta}{G_i - 1}\sum_{Y_j \in g_i,\, j \neq i} Y_j + \frac{\alpha(1 - \beta)}{K - G_i - 1}\sum_{Y_j \notin g_i} Y_j \tag{5}$$

where $\alpha$ and $\beta$ are the parameters that control the smoothing, and $K$ is the number of label categories. $g_i$ denotes the labels in the same group as $Y_i$, and $G_i$ is the number of labels in $g_i$. In our 5-class classification task, $\alpha$ and $\beta$ are set to 0.1 and 0.7, respectively. For a cell labeled ASC-US, the smoothed label therefore assigns a probability of 90% to ASC-US and 7% to LSIL.

Fig. 4. Different kinds of label distributions on the HUSTC dataset. The target category is ASC-US, and the y-axis is log-scaled for visualization. (a), (b), and (c) show the original hard label, the soft label generated by [7], and the soft label generated by our method, respectively.

To determine $\alpha$ and $\beta$, we try different combinations of parameters to search the approximate global
optimization of the hyper-parameters on the validation set. First, $\alpha$ and $\beta$ are set to the minimum values of the search range. Then $\alpha$ and $\beta$ are increased by step sizes $r_\alpha$ and $r_\beta$, respectively, where $r_\alpha$ and $r_\beta$ are the increments applied at each step of the search. The search range of $\alpha$ and $\beta$ is 0–1. Finally, we evaluate all parameter combinations and take the combination with the highest accuracy on the validation set as the optimized hyper-parameters.

C. Supervised Contrastive Learning

In recent years, contrastive learning, which is widely used in self-supervised representation learning, has performed well on many visual tasks [34]. The common idea of these methods is to pull anchors and positive samples together in the embedding space and push anchors away from negative samples. A positive pair often consists of two data augmentations of the same sample, and negative pairs are formed by the anchor and randomly chosen samples from the minibatch. Self-supervised contrastive learning has been extended to the fully supervised setting, where label information can be fully exploited. Because label information is included, supervised contrastive learning makes normalized embeddings from the same class tighter than those from different classes [35].

In this paper, we use supervised contrastive learning to pre-train the Resnet-50 backbone of the classification branch on the HUSTC training set. The supervised contrastive loss treats all samples of the same class as positives and contrasts them against the negative samples from the remainder of the batch. This selection uses class labels to build an embedding space in which samples of the same class lie closer together. First, we apply data augmentation twice to the input batch to obtain two copies. Resnet-50 encodes both copies, and during training these representations are passed through the projection network. Finally, the supervised contrastive loss is computed on the output of the projection network. This enables the model to acquire rich feature information and thus improves classification performance.

D. Manual Features

To extract manual features of cervical cells, we first need to segment the cells. In this paper, we use CE-Net [36] as the segmentation model. CE-Net consists of three major parts: a feature encoder, a context extractor, and a feature decoder. CE-Net has been trained with sufficient cell segmentation annotations and segments cell nuclei well.

According to the rules of The Bethesda System, we select several discriminative manual features. These include morphological features (e.g., area of nucleus, nuclear to cytoplasm ratio, and nucleus roundness), integral optical density, and texture features. The formulas for these features are given below.

1) Area of Nucleus: It is defined as the number of pixels in the region of a nucleus.

2) Nuclear to Cytoplasm Ratio: It is defined as the ratio of the nucleus area to the cytoplasm area:

$$N/C = \frac{A_{nu}}{A_{cy}} \tag{6}$$

where $A_{nu}$ is the nucleus area and $A_{cy}$ is the cytoplasm area.

3) Nucleus Roundness: The roundness is calculated as the ratio of the actual area to the area of a circle, where the circle's area is approximated from the target's perimeter. The roundness is given by

$$N_{rd} = \frac{4\pi \cdot A_{nu}}{N_{cir}^{2}} \tag{7}$$

where $A_{nu}$ is the nucleus area and $N_{cir}$ is the perimeter of the nucleus.

4) Integral Optical Density (IOD): To calculate the IOD of the nucleus, we first need the brightness of the image. In our experiments, the brightness [37] is calculated by

$$Y_{(i,j)} = 0.299R_{(i,j)} + 0.587G_{(i,j)} + 0.114B_{(i,j)} \tag{8}$$

where $Y_{(i,j)}$ is the brightness of the image at position $(i, j)$, and $R_{(i,j)}$, $G_{(i,j)}$, and $B_{(i,j)}$ are the pixel values at position $(i, j)$ in the three RGB channels.

The optical density (OD) is derived from the Beer-Lambert law, which describes how much light is absorbed when parallel monochromatic light passes through a uniform, non-scattering absorber. The optical density is related to the chromatin content of the cell. IOD is the sum of the optical densities of the points within the measurement area, and the IOD of the nucleus is calculated via

$$\mathrm{IOD} = \sum_{(i,j) \in S} \lg \frac{Y_{\mu}}{Y_{(i,j)}} \tag{9}$$

where $Y_{(i,j)}$ denotes the brightness at position $(i, j)$, and $S$ represents the nucleus region. $Y_{\mu}$ denotes the average brightness of the image background, which approximates the brightness of the incident light.

5) Texture Features: To obtain the texture features of the cells, we use the gray-level co-occurrence matrix (GLCM) [38]. The GLCM contains the conditional joint probabilities of all pairwise combinations of gray levels given two parameters: the interpixel distance $d$ and the interpixel orientation $\theta$. The probability measure can be defined as

$$\Pr(x) = \{\, p(i, j) \mid (d, \theta) \,\} \tag{10}$$

where $p(i, j)$ is defined as

$$p(i, j) = \frac{C(i, j)}{\sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1} C(i, j)} \tag{11}$$

where $C(i, j)$ is the number of co-occurrences of gray levels $i$ and $j$ within the window, and $N_g$ is the total number of gray levels.

The contrast and entropy features are selected from the GLCM. Contrast measures the local variations present in an image, and entropy measures the randomness of
the image texture. They are defined as

$$\mathrm{Contrast} = \sum_{k=0}^{N_g-1} k^2 \left\{ \sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1} p(i, j) \;\middle|\; |i - j| = k \right\} \tag{12}$$

$$\mathrm{Entropy} = -\sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1} p(i, j)\log\big(p(i, j)\big) \tag{13}$$

In the proposed method, every feature is normalized as follows:

$$\hat{x}_i = \frac{x_i - \mu_i}{\sigma_i} \tag{14}$$

where $x_i$ is the feature to be normalized, and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of that feature, respectively.

IV. EXPERIMENTS AND ANALYSIS

A. Dataset

Two cervical cell datasets, HUSTC and SIPaKMeD, are used to evaluate the proposed method. The two datasets cover different staining methods and imaging conditions.

1) HUSTC Dataset: HUSTC is a cervical cell classification dataset built by our team. We collected 800 slides from The Second Affiliated Hospital of Harbin Medical University and scanned them using digital slide scanners (WS-10, WISILEAP) with 20× objective lenses. The data were labeled and verified by multiple experienced cytologists. The dataset contains 70,197 single-cell images. The normal class is NILM, and the abnormal classes are ASC-US, LSIL, ASC-H, and HSIL. The number of images in each category is shown in Table I.

TABLE I
DESCRIPTIONS OF THE HUSTC DATASET

2) SIPaKMeD Dataset: SIPaKMeD is an open-source Pap smear image database that contains 4049 cell images [8]. In the SIPaKMeD dataset, cells are classified into five categories according to their morphology and cellular appearance. The division of these cell categories is shown in Table II. The SIPaKMeD categories can be grouped into three category groups: Normal (Superficial, Parabasal), Abnormal (Koilocytotic, Dyskeratotic), and Benign (Metaplastic).

TABLE II
DESCRIPTIONS OF THE SIPAKMED DATASET

B. Implementation Details

We use the PyTorch library and the PyTorch Image Models library (timm) [39] to implement our models and conduct all experiments. Our experiments are conducted on a computer with four GeForce RTX 2080 Ti GPUs and dual Intel Xeon E5-2678 v3 (12-core, 2.50 GHz) CPUs. For a fair comparison, we use the same training scheme for all models. During training, the input image size is 224 × 224. All methods apply data augmentation, including flipping, rotation, Gaussian blur, and color perturbation. We train the models for 310 epochs using an SGD optimizer with a mini-batch size of 32. The initial learning rate is 0.02 and is decayed according to cosine annealing. In addition, we use a linear warm-up during the first three epochs.

C. Evaluation Metrics

In this work, we adopt accuracy, sensitivity, specificity, precision, and F1-score to assess the performance of the models. These metrics are defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{16}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{17}$$

$$\mathrm{F1\text{-}Score} = \frac{2TP}{2TP + FP + FN} \tag{18}$$

where $TP$ is the number of correctly detected positive samples, and $FP$ is the number of negative samples classified as positive. $TN$ is the number of correctly classified negative samples, and $FN$ is the number of positive samples predicted as negative.

D. Results and Analysis

We have conducted extensive experiments to evaluate the proposed model. First, we conduct ablation experiments to evaluate each module of our method. Second, we show the effect of label smoothing with different parameters. Finally, we compare our proposed method with existing methods.

1) Ablation Study: In this experiment, we analyze the role of the four key modules in our model and verify their contributions. These modules are: 1) the manual features fitting branch, 2) the classification branch, 3) pre-training using supervised contrastive learning, and 4) our proposed label smoothing operation.

Our proposed method is based on Resnet-50, so we use Resnet-50 as the baseline model. The results of the ablation study are shown in Table III. As we can see, the proposed model achieves the best performance in both the 5-class and 2-class classification tasks. Our proposed method improves accuracy over Resnet-50 by 2.83% and 0.43% in the 5-class and 2-class classification tasks, respectively. As seen from
the confusion matrix in Fig. 5, our proposed method improves the classification of the hard-to-distinguish ASC-US category while effectively reducing the false positive rate. After the manual features fitting branch is added to the model, the accuracy of the 5-class classification improves to 80.89%. Adding the multi-classification branch prediction module improves accuracy by 1.11% and 0.12% over the baseline model on the 5-class and 2-class classification tasks, respectively. Fur-

TABLE III
ABLATION ANALYSIS FOR THE PROPOSED METHOD
‘MF’ indicates the addition of the manual features fitting branch, ‘ML’ denotes the addition of the multi-classification branch, ‘SC’ means pre-training using supervised contrastive learning, and ‘SL’ refers to using our proposed label smoothing operation.

Fig. 5. Confusion matrices for classification on the HUSTC dataset. We treat abnormal cells as positive samples in the binary classification task.

label smoothing operation, we divide cells into different groups. The detailed division is shown in Table I and Table II. The HUSTC dataset includes three category groups: epidermal, basal, and NILM. The SIPaKMeD dataset includes normal, abnormal, and benign category groups.

To reduce the search space, we set $\alpha$ to 0.1 and search only for $\beta$, with $r_\beta$ set to 0.1. When $\alpha = 0$, label smoothing has no effect. When $\alpha = 0.1$ and $\beta = 0.2$, our method approximates smoothing hard labels with a uniform distribution [7]. As shown in Fig. 6(a), the model achieves its highest accuracy of 0.8188 on the HUSTC dataset when $\beta$ is set to 0.7. As shown in Fig. 6(b), our label smoothing operation effectively improves performance on the SIPaKMeD dataset when $\beta$ is between 0.3 and 1. It achieves the best accuracy of 0.9867 when $\beta = 0.6$. These results further confirm the effectiveness of our proposed label smoothing method.

3) Quantitative Comparison: We compare our model with other methods on the HUSTC and SIPaKMeD datasets. On the HUSTC dataset, we compare against the following models:
- the multi-task model NDDR-CNN [31];
- CNN-based models (e.g., ResNet-152, EfficientNet-B3 [19], and VOLO-D2 [40]);
- Mixer-B/16 [41];
- Transformer-based models (e.g., ViT-B/16 [24], Swin-B [25], and XCiT-S24 [26]).

The results are shown in Table IV. Compared with the advanced multi-task learning model NDDR-CNN, our proposed model improves the information interaction between tasks. This enables the classification tasks to acquire more prior knowledge from manual features. As a result, our model outperforms NDDR-CNN on all metrics. Among the compared CNN-based models, EfficientNet-B3 achieves the best results thanks to neural architecture search. Compared to EfficientNet-B3, our model performs better on both the 5-class and 2-class classification tasks, with improvements in specificity of 0.3% and 0.28%, respectively. XCiT-S24 combines the accuracy of conventional transformers with the scalability of convolutional architectures, and achieves the best performance
thermore, supervised contrastive learning pre-training and label among the compared Transformer-based models in the 5-class
smoothing operation can help the model further improve its per- classification task. Compared to these mainstream classifica-
formance. Finally, our model achieves 81.88% and 99.52% accu- tion models, our method achieves the best performance on the
racy on the 5-class and 2-class classification tasks, respectively. HUSTC dataset.
2) Smoothing Label Regularization: Instead of using hard On the SIPaKMeD dataset, we use 5-fold cross-validation
labels in model training, the labels are processed with the for model verification. Our method is compared with
smoothing operations shown in Eq. (5). To use our proposed several conventional classifiers(e.g., DenseNet-121, Inception
Fig. 6. Label smoothing performance with different β.
TABLE IV
COMPARISON RESULTS WITH DIFFERENT CLASSIFICATION METHODS ON THE HUSTC DATASET
TABLE V
COMPARISON RESULTS WITH DIFFERENT CLASSIFICATION METHODS ON THE SIPAKMED DATASET
v3, EfficientNet-B3 [19], XCiT-S24 [26], Swin-B [25]), and lower than that of EfficientNet-B3, XCiT-S24 and Swin-B,
models designed for cervical cells (e.g., Plissiti et al. [8], Win et which is not conducive to the practical application. In the 2-class
al. [15], DeepPap [21], CompactVGG [42]). On the SIPaKMeD classification, our proposed model achieved the highest accuracy
dataset, we classify Benign as the normal class to complete the rate of 98.96%. Among the contrasting methods of the 5-class
2-class classification task. The experiment results are shown in classification, Swin-B achieves the best results. However, the
Table V. Win et al. [15] used a marker-controlled watershed performance of Swin-B in the 5-class classification task is still
approach to segment cells and extracted manual features. After lower than that of our proposed model.
that, a bagging ensemble classifier which combined the results
of five classifiers was applied achieved an accuracy of 94.09%.
CompactVGG makes VGG more compact through reasonable V. DISCUSSION
design. While reducing the computational effort, the accuracy Pap smear screening plays a fundamental role in preventing
of the model exceeds that of Inception v3, DenseNet-121 and women from cervical cancer. It is widely believed that computer-
EfficientNet-B3. However, the Specificity of CompactVGG is assisted diagnosis can reduce the workload of cytologists and
decrease the rate of potential misdiagnosis. Deep learning models have been applied to cervical cell classification and have proven more effective than traditional cell morphological features. Moreover, their classification performance keeps improving as network architectures are optimized. However, these models are trained only with the category labels of the cells and cannot learn from other informative features.

Manual features contain rich classification information, so we aim to use the manual features of cells to assist classification models. Win et al. [15] achieved 94.09% accuracy on the 5-class task of the SIPaKMeD dataset using manual features alone. Basak et al. [23] combined features from multiple deep learning models to improve classification performance. Unlike these approaches, we ask whether the information in manual features can be fused with the features extracted by a deep learning model. We therefore propose a multi-task feature fusion deep learning model that makes full use of manual features. This design lets our model reach 98.67% accuracy on the 5-class task of the SIPaKMeD dataset.

In supervised learning, label quality determines model performance. Label smoothing can effectively improve the performance of a classification model. As shown in Fig. 6, the model's accuracy improves substantially once label smoothing is applied. However, existing label smoothing methods (e.g., OLS [30], LS [7]) do not use prior knowledge about the labels. We therefore smooth the labels according to the similarity between different classes of cervical cells. Compared with the traditional smoothing method [7], this prior-knowledge-based smoothing achieves better performance on the HUSTC dataset.

Although the proposed method achieves improvements, it still has limitations:
1) Manual features are required as supervision during training, and computing them usually requires segmenting the nucleus and cytoplasm. Future work needs to investigate cell segmentation in the presence of overlapping cells and impurities, so that the manual features are accurate.
2) Our label smoothing method finds its best hyperparameters by search, which is computationally expensive. Future research can focus on more stable and faster label smoothing algorithms.
3) Our method shows high specificity and sensitivity in the 2-class classification of cervical cells. However, the 5-class sensitivity on complex datasets such as HUSTC needs further improvement. We attribute this to two causes. First, the HUSTC data are highly complex, which reduces performance. Second, there is no clear boundary between cells of different categories. For example, ASC-US cells are similar to LSIL cells, and ASC-H cells are similar to HSIL cells. In future work, we will explore how cytologists distinguish different cervical cells and combine their professional knowledge with deep learning to improve performance.

In computer-aided diagnosis systems, the sensitivity and specificity requirements for cell classification differ from those for patient classification. In cell classification, we want both sensitivity and specificity to be as high as possible. Low sensitivity leads to missed abnormal cells. Low specificity produces many false positive cells, which adds a heavy workload for cytologists. Cell classification is the basis of patient classification, which can be made by considering the number and lesion degree of the abnormal cells. At the patient level, high sensitivity of the diagnostic system is preferable because more truly positive patients can be found. Combining the high sensitivity of the diagnostic system with the high specificity of cytologists can effectively improve diagnostic accuracy.

Our method predicts the 5-class and 2-class labels simultaneously. The high-precision 2-class task screens out abnormal cells accurately. Recommended fine-grained categories are then given according to the 5-class results. This lets automated computer-aided diagnosis give more refined predictions while guaranteeing screening accuracy. Our model can effectively promote the development of computer-aided diagnosis methods for cervical cancer and improve the accuracy of cytologists in cancer diagnosis.

VI. CONCLUSION
In this paper, we propose a multi-task model that effectively classifies cervical cells. We train a sub-network that fits manual features as an auxiliary task. We then fuse the middle-layer features of this sub-network with those of the classification sub-network. The proposed multi-task model consists of two branches: a manual features fitting branch and a classification branch. Learning mutually related representations between the two branches effectively improves cell classification. In particular, the classification branch performs both 5-class and 2-class classification of cervical cells. Furthermore, we propose a novel label smoothing method that assigns different probabilities to the non-target classes according to the similarity of the cell categories. Smoothing the labels gives the model more inter-category information and prevents over-fitting. We evaluate the proposed method on the HUSTC and SIPaKMeD datasets, where it outperforms the compared methods. On the HUSTC dataset, our method achieves both high sensitivity and high specificity in the 2-class task. This shows its potential to reduce the workload of cytologists and to support the development of automatic cervical cancer screening.

REFERENCES
[1] M. Arbyn et al., “Estimates of incidence and mortality of cervical cancer in 2018: A worldwide analysis,” Lancet Glob. Health, vol. 8, no. 2, pp. e191–e203, 2020.
[2] E. Bengtsson and P. Malm, “Screening for cervical cancer using automated analysis of PAP-smears,” Comput. Math. Methods Med., vol. 2014, 2014, Art. no. 842037.
[3] R. Nayar and D. C. Wilbur, The Bethesda System for Reporting Cervical Cytology: Definitions, Criteria, and Explanatory Notes. Berlin, Germany: Springer, 2015.
[4] S. Ruder, “An overview of multi-task learning in deep neural networks,” 2017, arXiv:1706.05098.
[5] K. Bora, M. Chowdhury, L. B. Mahanta, M. K. Kundu, and A. K. Das, “Automated classification of pap smear images to detect cervical dysplasia,” Comput. Methods Prog. Biomed., vol. 138, pp. 31–47, 2017.
[6] T. Chankong, N. Theera-Umpon, and S. Auephanwiriyakul, “Automatic cervical cell segmentation and classification in pap smears,” Comput. Methods Prog. Biomed., vol. 113, no. 2, pp. 539–556, 2014.
[7] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2818–2826.
[8] M. E. Plissiti, P. Dimitrakopoulos, G. Sfikas, C. Nikou, O. Krikoni, and A. Charchanti, “SIPAKMED: A new dataset for feature and image based classification of normal and pathological cervical cells in pap smear images,” in Proc. 25th IEEE Int. Conf. Image Process., 2018, pp. 3144–3148.
[9] A. D. Brown and A. M. Garber, “Cost-effectiveness of 3 methods to enhance the sensitivity of papanicolaou testing,” JAMA, vol. 281, no. 4, pp. 347–353, 1999.
[10] M. M. Rahaman et al., “A survey for cervical cytopathology image analysis using deep learning,” IEEE Access, vol. 8, pp. 61687–61710, 2020.
[11] J. A. K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Process. Lett., vol. 9, no. 3, pp. 293–300, 1999.
[12] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” J. Comput. Syst. Sci., vol. 55, no. 1, pp. 119–139, 1997.
[13] P. Wang, L. Wang, Y. Li, Q. Song, S. Lv, and X. Hu, “Automatic cell nuclei segmentation and classification of cervical pap smear images,” Biomed. Signal Process. Control, vol. 48, pp. 93–103, 2019.
[14] M. Arya, N. Mittal, and G. Singh, “Texture-based feature extraction of smear images for the detection of cervical cancer,” IET Comput. Vis., vol. 12, no. 8, pp. 1049–1059, 2018.
[15] K. P. Win, Y. Kitjaidure, K. Hamamoto, and T. Myo Aung, “Computer-assisted screening for cervical cancer using digital image processing of pap smear images,” Appl. Sci., vol. 10, no. 5, 2020, Art. no. 1800.
[16] J. Yao, X. Zhu, J. Jonnagaddala, N. Hawkins, and J. Huang, “Whole slide images based cancer survival prediction using attention guided deep multiple instance learning networks,” Med. Image Anal., vol. 65, 2020, Art. no. 101789.
[17] P. Bandi et al., “From detection of individual metastases to classification of lymph node status at the patient level: The Camelyon17 challenge,” IEEE Trans. Med. Imag., vol. 38, no. 2, pp. 550–560, Feb. 2019.
[18] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[19] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proc. Int. Conf. Mach. Learn., 2019, pp. 6105–6114.
[20] M. M. Rahaman et al., “DeepCervix: A deep learning-based framework for the classification of cervical cells using hybrid deep feature fusion techniques,” Comput. Biol. Med., vol. 136, 2021, Art. no. 104649.
[21] L. Zhang, L. Lu, I. Nogues, R. M. Summers, S. Liu, and J. Yao, “DeepPap: Deep convolutional networks for cervical cell classification,” IEEE J. Biomed. Health Informat., vol. 21, no. 6, pp. 1633–1643, Nov. 2017.
[22] H. Lin, Y. Hu, S. Chen, J. Yao, and L. Zhang, “Fine-grained classification of cervical cells using morphological and appearance based convolutional neural networks,” IEEE Access, vol. 7, pp. 71541–71549, 2019.
[23] H. Basak, R. Kundu, S. Chakraborty, and N. Das, “Cervical cytology classification using PCA and GWO enhanced deep features selection,” SN Comput. Sci., vol. 2, no. 5, pp. 1–17, 2021.
[24] A. Dosovitskiy et al., “An image is worth 16 × 16 words: Transformers for image recognition at scale,” 2020, arXiv:2010.11929.
[25] Z. Liu et al., “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 10012–10022.
[26] A. Ali et al., “XCiT: Cross-covariance image transformers,” in Proc. Adv. Neural Inf. Process. Syst., vol. 34, pp. 20014–20027, 2021.
[27] S. E. Reed, H. Lee, D. Anguelov, C. Szegedy, D. Erhan, and A. Rabinovich, “Training deep neural networks on noisy labels with bootstrapping,” 2014, arXiv:1412.6596.
[28] A. Dubey, O. Gupta, P. Guo, R. Raskar, R. Farrell, and N. Naik, “Pairwise confusion for fine-grained visual classification,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 71–88.
[29] X. Wang, D. Kihara, J. Luo, and G.-J. Qi, “EnAET: A self-trained framework for semi-supervised and supervised learning with ensemble transformations,” IEEE Trans. Image Process., vol. 30, pp. 1639–1647, 2021.
[30] C.-B. Zhang et al., “Delving deep into label smoothing,” IEEE Trans. Image Process., vol. 30, pp. 5984–5996, 2021.
[31] Y. Gao, J. Ma, M. Zhao, W. Liu, and A. L. Yuille, “NDDR-CNN: Layerwise feature fusing in multi-task CNNs by neural discriminative dimensionality reduction,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 3205–3214.
[32] H. Lin, H. Chen, X. Wang, Q. Wang, L. Wang, and P.-A. Heng, “Dual-path network with synergistic grouping loss and evidence driven risk stratification for whole slide cervical image analysis,” Med. Image Anal., vol. 69, 2021, Art. no. 101955.
[33] Y. Xiang, W. Sun, C. Pan, M. Yan, Z. Yin, and Y. Liang, “A novel automation-assisted cervical cancer reading method based on convolutional neural network,” Biocybernetics Biomed. Eng., vol. 40, no. 2, pp. 611–623, 2020.
[34] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 9729–9738.
[35] P. Khosla et al., “Supervised contrastive learning,” in Proc. Adv. Neural Inf. Process. Syst., 2020, pp. 18661–18673.
[36] Z. Gu et al., “CE-Net: Context encoder network for 2D medical image segmentation,” IEEE Trans. Med. Imag., vol. 38, no. 10, pp. 2281–2292, Oct. 2019.
[37] J. Jantzen, J. Norup, G. Dounias, and B. Bjerregaard, “Pap-smear benchmark data for pattern classification,” in Nature inspired Smart Inf. Syst., 2005, pp. 1–9.
[38] R. M. Haralick, K. Shanmugam, and I. H. Dinstein, “Textural features for image classification,” IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6, pp. 610–621, Nov. 1973.
[39] R. Wightman, “PyTorch image models,” GitHub repository, 2019. [Online]. Available: https://github.com/rwightman/pytorch-image-models
[40] L. Yuan, Q. Hou, Z. Jiang, J. Feng, and S. Yan, “VOLO: Vision outlooker for visual recognition,” 2021, arXiv:2106.13112.
[41] I. O. Tolstikhin et al., “MLP-Mixer: An all-MLP architecture for vision,” in Proc. Adv. Neural Inf. Process. Syst., vol. 34, pp. 24261–24272, 2021.
[42] H. Chen et al., “CytoBrain: Cervical cancer screening system based on deep learning technology,” J. Comput. Sci. Technol., vol. 36, no. 2, pp. 347–360, 2021.
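The following Python sketch is a supplementary illustration of the similarity-based label smoothing described under Smoothing Label Regularization. It builds soft labels from category groups and runs the fixed-α, β-only grid search. Eq. (5) is not reproduced in this section, so the smoothing rule below is an assumed form, not the authors' formula:
- the target keeps 1 − α;
- a share β of α is split evenly among the other classes in the target's category group;
- the remaining (1 − β)α is split evenly among the classes in other groups.

The names `grouped_smooth_label`, `search_beta`, `HUSTC_CLASSES`, `HUSTC_GROUPS`, and the `evaluate` callback are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical HUSTC class order and grouping, built from the category groups
# named in the text (epidermic: ASC-US, LSIL; basal: ASC-H, HSIL; NILM alone).
HUSTC_CLASSES: List[str] = ["NILM", "ASC-US", "LSIL", "ASC-H", "HSIL"]
HUSTC_GROUPS: Dict[str, str] = {
    "NILM": "NILM",
    "ASC-US": "epidermic",
    "LSIL": "epidermic",
    "ASC-H": "basal",
    "HSIL": "basal",
}


def grouped_smooth_label(target: str, classes: List[str], groups: Dict[str, str],
                         alpha: float = 0.1, beta: float = 0.7) -> List[float]:
    """Soft label under an ASSUMED group-aware rule (not the paper's Eq. (5)).

    The target keeps 1 - alpha. A fraction beta of alpha is shared evenly by the
    other classes in the target's group; the rest, (1 - beta) * alpha, is shared
    evenly by classes in other groups. If one side is empty (e.g. NILM has no
    same-group neighbour), all of alpha goes to the other side.
    """
    same = [c for c in classes if c != target and groups[c] == groups[target]]
    other = [c for c in classes if groups[c] != groups[target]]
    if not same:
        same_mass, other_mass = 0.0, alpha
    elif not other:
        same_mass, other_mass = alpha, 0.0
    else:
        same_mass, other_mass = beta * alpha, (1.0 - beta) * alpha

    label = []
    for c in classes:
        if c == target:
            label.append(1.0 - alpha)
        elif c in same:
            label.append(same_mass / len(same))
        else:
            label.append(other_mass / len(other))
    return label


def search_beta(evaluate: Callable[[float, float], float], alpha: float = 0.1,
                step: float = 0.1) -> Tuple[Optional[float], float]:
    """Grid-search beta over [0, 1] with alpha fixed. `evaluate` is a
    hypothetical callable (alpha, beta) -> validation accuracy."""
    best_beta, best_acc = None, float("-inf")
    for i in range(int(round(1.0 / step)) + 1):
        beta = round(i * step, 6)  # avoid float drift such as 0.7000000000000001
        acc = evaluate(alpha, beta)
        if acc > best_acc:
            best_beta, best_acc = beta, acc
    return best_beta, best_acc
```

Under this assumed rule, with α = 0.1 and β = 0.7, an ASC-US cell receives about 0.9 on ASC-US, 0.07 on LSIL, and 0.01 on each of NILM, ASC-H, and HSIL. In practice, `evaluate` would train and validate a model for each (α, β) pair. That is why the text notes that the search is computationally expensive.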
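The Python sketch below illustrates two points from the Discussion. The first is cell-level sensitivity and specificity, with abnormal cells treated as positives as in the binary HUSTC task. The second is the use of the 2-class output to screen cells before a 5-class category is reported. The text does not specify how the two heads are merged. The rule in `screen_then_refine` is therefore a hypothetical combination: threshold the 2-class abnormal probability, then take the most probable abnormal category from the 5-class output. The class order and the 0.5 threshold are also assumptions.

```python
from typing import List, Sequence, Tuple

# Assumed class order for the 5-class head; index 0 is the normal class (NILM).
FIVE_CLASSES: List[str] = ["NILM", "ASC-US", "LSIL", "ASC-H", "HSIL"]


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Binary cell-level metrics with abnormal = 1 (positive), normal = 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # missed abnormal cells lower this
    specificity = tn / (tn + fp) if tn + fp else 0.0  # false positive cells lower this
    return sensitivity, specificity


def screen_then_refine(p_abnormal: float, p_five: Sequence[float],
                       threshold: float = 0.5) -> str:
    """Hypothetical fusion of the two heads: the 2-class output screens the cell,
    and only cells flagged abnormal get a fine-grained abnormal category."""
    if p_abnormal < threshold:
        return FIVE_CLASSES[0]
    # Choose the most probable abnormal category (indices 1..4) from the 5-class head.
    best = max(range(1, len(FIVE_CLASSES)), key=lambda i: p_five[i])
    return FIVE_CLASSES[best]
```

Lowering `threshold` flags more cells as abnormal. This can only raise or hold sensitivity, and can only lower or hold specificity. It is the trade-off the Discussion describes when it favours high system sensitivity backed by high-specificity review from cytologists.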
