Traitement du Signal
https://doi.org/10.18280/ts.380502

Received: 12 August 2021
Accepted: 12 October 2021

Keywords: class activation maps, deep learning, explainable artificial intelligence, medicine, neonates, thermography, visualization

ABSTRACT

To determine the health status of neonates, studies focus either on the statistical behavior of the thermograms' temperature distributions or only on correctly classifying the thermograms. However, the classification processes always lack explainability. Especially in medical studies, doctors need explanations to assess the possible consequences of the decisions. In this new study, we show for the first time, by using Class Activation Maps (CAMs), how Convolutional Neural Networks (CNNs) decide the health status of neonates. VGG16, one of the pre-trained models, was selected as the CNN model, and its last layers were tuned according to the CAMs. When the model was trained for 50 epochs, the training and validation accuracies reached over 95%, and the test sensitivity and specificity were 80.701% and 96.842%, respectively. According to our findings, the CNN learns the temperature distribution of the body by looking mainly at the neck, armpit, and abdomen regions. The focused regions for the healthy babies are the armpit and abdomen, whereas those for the unhealthy babies are the neck and abdomen. Thus, we can say that the CNN focuses on dedicated regions to monitor neonates and decide their health status.
We fill this big gap in neonatal healthy-unhealthy classification with this study and show the differences between healthy and unhealthy neonates by visualizing activation maps. Thanks to this study, doctors and specialists will be able to know how the AI model decides the health status of neonates. The main contributions of our study to the literature are as follows:
• We classify neonatal thermograms as healthy and unhealthy.
• Deep learning methods such as CNNs and transfer learning are used.
• This is the first study to include explainable features of neonatal classification.

The rest of the paper is organized as follows: the related works are given in Section 2; the imaging system used in the NICU is described in Section 3; the methods and evaluation metrics used in this study are given in Section 4; Section 5 presents the experiments and results; and the conclusion is given in Section 6.

However, such methods do not work efficiently in big models such as CNNs because of their limitations in terms of time and computational cost.

Visual explanations are typically produced in convolution-based methods by creating activation maps (also known as saliency masks or heat maps). Using the activation maps, the contribution of each pixel can be represented on the input image. One of the visual explanation methods is Class Activation Maps (CAMs) [17]. With the development of CAMs, researchers have been able to uncover the class-related pixel contributions to the classification process [18, 19].

The three methods explained above can be combined to increase performance, but it should not be forgotten that there is always a trade-off between the performance of a model and its explainability.

3. OBTAINING THERMOGRAMS FROM THE NEONATAL INTENSIVE CARE UNIT (NICU)
… such as weights, gestational ages, health information, heart rates, respiratory rates, and diseases. To the best of our knowledge, there is no other neonatal thermal image dataset this large. The dataset used in this study consists of 3800 thermograms taken from 38 different neonates, half of them unhealthy and half healthy.

As can be inferred from Table 2 and Table 3, the mean weight of the 19 unhealthy neonates is 2094.684 g with a standard deviation (std) of 785.826 g, whereas the corresponding values for the 19 healthy neonates are 1583.429 g and 509.894 g. Since the neonates are in the NICU, they are premature babies with low weights. As for the age distributions (gestational age), the unhealthy neonates' mean value is greater than the healthy neonates' by approximately 20 days; accordingly, the unhealthy neonates also have a greater mean weight than the healthy neonates, by approximately 500 g.

While a CNN learns the important features of the images and classifies them into the desired classes, transfer learning reduces the time needed to train a CNN model effectively and improves the model's performance. The models used in transfer learning are described as pre-trained models. By using a pre-trained model, users do not train a CNN model from scratch; they only perform fine-tuning operations on pre-trained models that were trained on millions of different images, which decreases the training time.

As seen in Figure 4, first the important features such as edges, corners, and textures are extracted by a pre-trained model; then the last layer of the pre-trained model is changed according to the desired outputs to produce class-related activations. Finally, the model is trained and the class-related activations are obtained.
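The fine-tuning workflow described above can be sketched as follows (a minimal sketch assuming a TensorFlow/Keras implementation; the head layers, optimizer, and other hyper-parameters shown here are illustrative choices, not the paper's exact configuration):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # Load VGG16 pre-trained on ImageNet, without its fully connected head.
    base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))

    # Freeze the convolutional base: low-level features (edges, corners,
    # textures) transfer across tasks and do not need to be re-learned.
    base.trainable = False

    # Train only a new classification head for the healthy/unhealthy task.
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),   # illustrative head size
        layers.Dense(2, activation="softmax"),  # healthy / unhealthy
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, train_labels, epochs=50, validation_data=...)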
Figure 5. A CNN model with the pre-trained VGG16 architecture
Since lower-level features such as edges are typically the same in each classification process, the weights of the first convolutional layers of the pre-trained models can be used directly instead of re-learning these features for every training. The CNN models thus retain convolution weights that find textures such as tissues.

We used the 16-layer VGG model known as VGG16, which can be seen in Figure 5; it has 5 pooling layers, 13 convolutional layers, and 3 fully connected layers [27] (end-to-end, 21,137,986 parameters, a 224x224x3 input size, and a 7x7x512 last convolutional layer).

4.3 Class activation maps (CAMs)

There are visualization techniques for CNNs such as layer-output visualization and filter visualization, but these techniques are not enough to assess the decisions. The layer outputs give information about how each layer affects the layers connected to it, and filter visualization shows which types of convolutions have been learned. Since we need to explain how CNNs work, advanced techniques such as CAMs, which determine which parts of an image the CNN has learned, are necessary to visualize the important regions (i.e., the class-related activations) of the images.

We are trying to find the activations seen in Figure 6 because those activations are related to diseases; if we apply them to the images, we will find the essential regions for the classification. How a CAM is built is shown in Figure 7, and a detailed representation of the CAM side is given in Figure 8.

If Figure 5 and Figure 7 are compared, it can be seen that some important changes are needed. To turn the VGG16 model into a CAM model, we need to add a new convolutional layer, whose size equals the number of desired outputs, as the last convolution.
As can be seen in part 9 of Figure 5, there are three convolutional layers (#512, #512, and #512). Since our output has two classes, healthy and unhealthy, the size of the last convolutional layer has been changed from #512 to #2 (part 10 of Figures 7 and 8).

After obtaining a last convolution whose size matches the desired output, Global Average Pooling (GAP) (Eq. (1)) [28] is applied to each of its feature maps (part 11 of Figures 7 and 8).
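The modification described above can be sketched in code (assuming a Keras implementation; the kernel size and the layer name cam_conv are our illustrative choices, not the authors' exact settings):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # Pre-trained VGG16 convolutional base; its last feature maps are 7x7x512.
    base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))

    # New last convolution sized to the number of classes (#512 -> #2), so
    # that one class-related activation map is produced per class.
    cam_maps = layers.Conv2D(2, kernel_size=3, padding="same",
                             activation="relu", name="cam_conv")(base.output)

    # Global Average Pooling (Eq. (1)) collapses each 7x7 class map into a
    # single class score: GAP_c = (1/Z) * sum over (i, j) of f_c(i, j).
    scores = layers.GlobalAveragePooling2D()(cam_maps)

    # Softmax turns the class scores into probabilities (Eq. (3)).
    outputs = layers.Softmax()(scores)
    cam_model = models.Model(base.input, outputs)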
Figure 7. A CAM model created from the pre-trained VGG16 model
When it comes to the activation maps, the class score $I_c$ obtained by the GAP over each class map is passed through the softmax function:

$SFT_c = \frac{e^{I_c}}{\sum_{c} e^{I_c}}$  (3)

The softmax function gives probabilities that sum to 1, for example 0.3 healthy and 0.7 unhealthy.

Here, TP is the number of real patient data labeled as patient, FP is the number of healthy data labeled as patient, TN is the number of healthy data labeled as healthy, and FN is the number of unhealthy data labeled as healthy. The confusion matrix is obtained from these four quantities, as shown in Table 4.

Table 4. The confusion matrix

                     Actual patient   Actual healthy
    Labeled patient  TP               FP
    Labeled healthy  FN               TN

$specificity = \frac{TN}{TN + FP}$  (5)

$sensitivity = \frac{TP}{TP + FN}$  (6)

For the experiments, the new last convolutional layer, the GAP, and a new neural layer with softmax were added to the model. Then the hyper-parameters were tuned as shown in Table 6, and the training and validation processes ran for 50 epochs. As shown in Figure 9, the training and validation accuracies reached over 95%.

Figure 9. Accuracy values belonging to the training and validation phases
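For concreteness, Eqs. (5) and (6) can be computed directly from the four confusion-matrix counts; a small sketch follows (the counts are made-up placeholders, not the paper's reported results):

    def sensitivity(tp: int, fn: int) -> float:
        # Rate of real patient data correctly labeled as patient (Eq. (6)).
        return tp / (tp + fn)

    def specificity(tn: int, fp: int) -> float:
        # Rate of healthy data correctly labeled as healthy (Eq. (5)).
        return tn / (tn + fp)

    tp, fn, tn, fp = 46, 11, 92, 3  # illustrative counts only
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")
    print(f"specificity = {specificity(tn, fp):.3f}")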
Figure 10. The outputs of the CAMs belonging to three unhealthy neonates

Figure 11. The outputs of the CAMs belonging to three healthy neonates

It is clearly seen that the model tries to find the thermal distributions of the neck, armpit, and abdomen regions.
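Overlays like those in Figures 10 and 11 could be produced from the CAM model sketched in Section 4.3 along these lines (a hedged sketch, not the authors' code; the probe model, the cam_conv layer name, and the random placeholder image are our assumptions):

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import models
    import matplotlib.pyplot as plt

    # Probe returning both the class activation maps and the probabilities.
    probe = models.Model(cam_model.input,
                         [cam_model.get_layer("cam_conv").output,
                          cam_model.output])

    img = np.random.rand(1, 224, 224, 3).astype("float32")  # placeholder
    maps, probs = probe.predict(img)

    c = int(np.argmax(probs[0]))                    # predicted class (0 or 1)
    cam = maps[0, :, :, c]                          # its 7x7 activation map
    cam = (cam - cam.min()) / (np.ptp(cam) + 1e-8)  # normalize to [0, 1]
    cam = tf.image.resize(cam[..., None], (224, 224)).numpy().squeeze()

    plt.imshow(img[0])                      # the thermogram
    plt.imshow(cam, cmap="jet", alpha=0.4)  # CAM heat-map overlay
    plt.axis("off")
    plt.show()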
In particular, for the healthy class-related activations the model looks at the armpit regions, whereas for the unhealthy class-related activations it looks at the neck and abdomen regions.

These findings show that the model learns the features of the neonates, meaning that it does not look at unrelated regions such as the background. Moreover, the class-specific regions are the same across all the thermograms.

ACKNOWLEDGMENT

This study was supported by Huawei Turkey R&D Center and the Scientific and Technological Research Council of Turkey (TUBITAK, project number: 215E019).
REFERENCES
[13] … premature infants during the first week of life: Exploration using infrared thermal imaging. Journal of Thermal Biology, 69: 118-123. https://doi.org/10.1016/j.jtherbio.2017.06.005
[14] Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., Yang, G.Z. (2019). XAI—Explainable artificial intelligence. Science Robotics, 4(37): eaay7120. https://doi.org/10.1126/scirobotics.aay7120
[15] Lei, S. (2012). A feature selection method based on information gain and genetic algorithm. In 2012 International Conference on Computer Science and Electronics Engineering, 2: 355-358. https://doi.org/10.1109/ICCSEE.2012.97
[16] Safavian, S.R., Landgrebe, D. (1991). A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics, 21: 660-674. https://doi.org/10.1109/21.97458
[17] Zhou, B., Khosla, A., Oliva, A., Torralba, A. (2016). Learning deep features for discriminative localization. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2921-2929. https://doi.org/10.1109/CVPR.2016.319
[18] Muhammad, M.B., Yeasin, M. (2020). Eigen-CAM: Class activation map using principal components. In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1-7. https://doi.org/10.1109/IJCNN48605.2020.9206626
[19] Patro, B.N., Lunayach, M., Patel, S., Namboodiri, V.P. (2019). U-CAM: Visual explanation using uncertainty based class activation maps. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7444-7453. https://doi.org/10.1109/ICCV.2019.00754
[20] Krizhevsky, A., Sutskever, I., Hinton, G.E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84-90. https://doi.org/10.1145/3065386
[21] Dollar, P., Tu, Z., Tao, H., Belongie, S. (2007). Feature mining for image classification. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8. https://doi.org/10.1109/CVPR.2007.383046
[22] Saeys, Y., Inza, I., Larrañaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, 23: 2507-2517. https://doi.org/10.1093/bioinformatics/btm344
[23] Xie, M., Jean, N., Burke, M., Lobell, D., Ermon, S. (2016). Transfer learning from deep features for remote sensing and poverty mapping. arXiv preprint arXiv:1510.00098.
[24] Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360.
[25] Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A. (2016). Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261.
[26] Qin, Z., Zhang, Z., Chen, X., Wang, C., Peng, Y. (2018). FD-MobileNet: Improved MobileNet with a fast downsampling strategy. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 1363-1367. https://doi.org/10.1109/ICIP.2018.8451355
[27] Simonyan, K., Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[28] Hsiao, T.Y., Chang, Y.C., Chou, H.H., Chiu, C.T. (2019). Filter-based deep-compression with global average pooling for convolutional networks. Journal of Systems Architecture, 95: 9-18. https://doi.org/10.1016/j.sysarc.2019.02.008
[29] Kouretas, I., Paliouras, V. (2019). Simplified hardware implementation of the softmax activation function. In 2019 8th International Conference on Modern Circuits and Systems Technologies (MOCAST), pp. 1-4. https://doi.org/10.1109/MOCAST.2019.8741677