A Computational Method To Assist The Diagnosis
Abstract: Breast cancer is the second leading cause of cancer death among women. New techniques to enhance early diagnosis are very important to improve the chances of cure. This paper proposes and evaluates an image analysis method to automatically detect patients with benign and malignant breast changes (tumors). The method explores the difference in Dynamic Infrared Thermography (DIT) patterns observed in the patients' skin. DIT images of the breast are acquired by an infrared camera while the breasts, after being cooled, recover to the thermal stabilization of the skin temperature in an examination room. Tumors produce temperature time series that differ from the time series of healthy tissue. The images are available at the Database for Mastology Research by Infrared Image (DMR-IR, http://visual.ic.uff.br/dmi). After obtaining the sequential DIT images of each patient, their temperature arrays are computed and new images are generated. The regions of interest (ROIs) of the grayscale images are then segmented and, from them, the ROIs of the temperature arrays are determined. Features are extracted from the ROIs, based on statistics, clustering, histogram comparison, fractal geometry, diversity indices and spatial statistics. Time series, broken down into 9 subsets of different cardinalities, are generated from such features. The best features are automatically selected and used in a Support Vector Machine (SVM) classifier to classify the breasts, using Leave-One-Out Cross-Validation. An accuracy of 100% was obtained for a sample of 64 breasts (32 healthy and 32 with some abnormality). Tests were also carried out with 12 cases that did not participate in the construction of the classification model, and 100% accuracy was achieved in the classification of these 12 new cases.
1. Introduction
Breast cancer is the most common type of cancer among women. The World Health Organization (WHO) states that 2.1 million women are affected by this disease each year and that approximately 15% of all cancer deaths among women in 2018 were from breast cancer [1]. Some risk factors for the development of breast cancer are well known, such as aging, women's reproductive history, family history, alcohol consumption, obesity, physical inactivity and exposure to ionizing radiation.
For a better prognosis, less aggressive treatment, and a decrease in mortality, the early diagnosis of breast cancer plays a decisive role. In epidemiology, the term primary prevention refers to measures capable of preventing the onset of a health problem; secondary prevention, on the other hand, refers to measures that prevent the development of the next stage of a disease. Due to its complex and multifactorial etiopathogenesis, there is to date no effective primary prevention for breast cancer [2], so efforts are directed to measures of secondary prevention. Survival in early-stage disease can reach 98% at 5 years.
Breast cancer screening aims at detecting asymptomatic tumors in order to reduce mortality from the disease and increase survival chances after diagnosis. Specific questions include determining who should be screened (risk stratification, and the ages to begin and stop screening) and which method to use. As more information becomes available about the occurrence of false-positive diagnostic results and as treatment for breast cancer becomes increasingly effective, the balance between the benefits and harms of breast cancer screening has received increasing attention.
Mammography is the examination used for screening and is considered the gold standard for cancer detection. However, even mammography has limitations, such as high false-positive rates, insufficient effectiveness in dense breasts and the use of ionizing radiation. According to Arabi et al. [3], the risk of a patient developing breast cancer increases 2% with each X-ray exposure.
Breast thermography is yet another breast disease screening tool. It is based on the principle that breast tissues with some disease behave thermally differently from healthy tissues. Contrary to what Figure 1 suggests, the temperature difference between healthy and diseased breasts is not always so evident. Often, the temperature variation is quite subtle in the early stages of malignancy, and changes may go unnoticed in the direct visual interpretation of thermographies. Normal breasts with increased vascularization are generally warmer and may be misinterpreted as abnormal. Therefore, the subjective interpretation of thermographies often leads to false diagnoses, and intelligent detection systems are required [4].
This article proposes a method capable of assisting in the detection of breast abnormalities. The method is based on the analysis of time series generated from features computed from the breast region. These features are calculated from DIT, using the image acquisition protocol described in [5].
The remaining text is organized as follows. Section 2 presents a literature review. In Section 3, the proposed method is described. Section 4 presents and analyzes the results achieved by the proposed method. Section 5 presents the conclusions and the future work that can be developed from the method and results of this paper.
2. Basic Concepts
This section presents the main concepts behind the proposed method: the static and dynamic breast infrared exams and computer-aided diagnosis.
Thermography is the recording of the body's temperature distribution using the infrared radiation emitted by its surface. Special cameras detect such radiation and, along with other parameters, it is used to compute the object's temperature [6]. Advances in infrared camera technologies have increased the application of such systems in the medical field [7]. Breast thermography has been considered a method to aid in the screening of breast cancer, because anatomical changes in tissue may be preceded by physiological changes such as an increase in metabolic activity [8].
The differentiated thermal behavior of breasts with cancer can be explained by the pathological angiogenesis process, i.e., the emergence of blood vessels to supply the tumor [6]. In this process, there is a decrease in the smooth muscle cells of the vessels, making them unable to perform the vasoconstriction that would normally occur [9]. Moreover, nitric oxide, a vasodilating substance, is also locally
Version June 6, 2020 submitted to Journal Not Specified 3 of 23
produced by cancer cells; the inflammation process caused by cancer also contributes [6]. Thermography has advantages in screening for breast disease in younger women, because mammography is not indicated for women with dense breasts [10]. Infrared thermography can be categorized in relation to the body's behavior under heat transfer, being classified as static or dynamic.
sequence. The Fast Fourier Transform is applied over each time series, generating an energy spectrum for each one. Features are extracted from these spectra and statistical analyses are performed on them in order to discriminate between healthy and sick breasts. The conclusions are based only on graphs and tables, without using computational techniques in the decisions. Specificity and sensitivity above 95% were achieved in some tests, using 100 patients, 343 with cancer and 66 with benign tumors.
In the work developed by Wishart et al. [19], images are obtained by DIT. For 5 minutes, a cold air flow was directed at the patients' breasts, and during this period 250 breast thermographies were acquired by a thermal camera. Images of 106 patients were used; of these, 65 were diagnosed with cancer and 41 with benign lesions. Thermal asymmetry features and temperature differences were extracted from the sequential images. For classification, an artificial intelligence method developed by the authors, called NoTouch BreastScan, was used. The authors did not clarify exactly how many features were extracted, nor did they detail the classification technique. The method obtained a sensitivity of 70% and a specificity of 48%.
Gerasimova et al. [20] proposed multifractal analysis of DIT as an efficient method to identify women at risk of breast cancer. They used a wavelet-based multiscale method to analyze temporal fluctuations in breast skin temperature, obtained from a group of breast cancer patients and some healthy volunteers. The authors demonstrated that the multifractal complexity of temperature fluctuations observed in healthy breasts is lost in breasts with malignant tumors. Forty-seven patients participated in the study, 33 diagnosed with cancer and 14 healthy. The classification was performed visually through graphs and tables. The authors obtained a sensitivity of 76% and a specificity of 86%, using a camera with an acquisition rate of 50 images per second.
Saniei et al. [21] developed a method in which only two images per patient are used. A first thermogram is captured; then the patient is instructed to immerse her hands in a container with water and ice for 1 minute (temperature of approximately 5 °C) in order for vasoconstriction to occur on the breast surface. Immediately after the hands are removed from the container, a second thermogram is captured. Using computational techniques, both breasts are segmented and the images are registered. The next step is the segmentation and skeletonization of the vascular pattern present in the breasts in the two thermograms. Features are extracted from these vascular patterns, and techniques similar to those used for fingerprint recognition are applied to compare the vascular patterns from the thermograms before and after thermal stress. In order to quantify the degree of similarity between the patterns in the two images, a matching score is generated by means of a mathematical expression, and the patterns are considered similar if the score is below a certain threshold, which is defined empirically. According to the authors, the method reached a sensitivity of 86% and a specificity of 61%. Images of 50 patients, 25 with and 25 without breast cancer, were used.
Silva et al. [22] developed a method for analyzing thermal images of the breast obtained by DIT, in which the patient is classified as healthy or as having a breast abnormality. Time series of temperature were generated from the region of interest (the breasts), and unsupervised and supervised machine learning techniques were applied to them. A database with 70 patients was used, 35 healthy and 35 with some breast anomaly. According to the authors, the method reached a sensitivity of 98% and a specificity of 100%. A camera with a rate of 0.066 images per second was used for image acquisition.
Baffa and Lattari [23] used deep learning (a Convolutional Neural Network) to classify patients as healthy or unhealthy, using the thermographies obtained by DIT and SIT from the DMR-IR database [5]. To analyze the images obtained by DIT, they used data from 137 patients, of which 95 were considered healthy and 42 were assigned to the non-healthy set. In addition to assessing the correctness rate of the classification, the training time of the neural network was recorded. For the images obtained by DIT, four strategies were defined to generate a single image to feed the neural network; one example is to generate, for each patient, an image of the difference between the last and the first image of the sequence. They obtained 98% average accuracy with images obtained by DIT and 95% with images obtained by SIT.
Table 1 contains the main features of the works presented in this section. With the exception of the works of Silva et al. [22] and of Baffa and Lattari [23], the proposed diagnosis-aid methodologies reached their decisions based on computationally processed graphics or images, but not on results from machine learning techniques, as the present work does.
Sequential images of 64 breasts were used, of which 32 were diagnosed with breast cancer or some radiological finding, and 32 were healthy. As 20 sequential images were analyzed for each breast, a total of 1,280 images are used in the learning phase. The camera used is a FLIR model SC620 [24], which has a thermal sensitivity of less than 0.04 °C and a standard range from −40 °C to 500 °C. Acquired frames have a dimension of 640×480 pixels.
these arrays allow the use of some features based on the real temperatures (in Celsius degrees, with decimals) and not only on the integer gray-scale values. The aim is to increase the precision of the information extracted from the breasts, since details are lost when the arrays are quantized for the generation of tonal images.
The generation of these images is justified by two reasons. The first is the need to segment the region of interest, which happens in the next step of the method. The second is the computation of some image-based features. The segmentation of the region of interest is very important: it is necessary because the analyses performed in this work are based on the region of the patients' breasts, disregarding the background of the image and the rest of the thorax. The analyses based on the temperature arrays use the limits of the segmented breasts as a mask, in order to consider only the temperature values belonging to the regions of the breasts.
The gray-scale images are computed from the temperature arrays according to Equation 1, where pixelvalue[i] is the pixel value of the image and min and max correspond to the lowest and highest temperature values of the array. Each temperature value of the array is used in the transformation to obtain the corresponding pixel of the gray-scale image. It is noteworthy that, during the generation of the gray-scale images, the same maximum and minimum temperature values were used for all exams. In other words, the highest and lowest temperatures over all the temperature arrays used in the study were obtained first. Thus, it was ensured that all the images were generated on the same scale, so that comparisons could be made among them.
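Equation 1 itself did not survive extraction here. A minimal sketch of a mapping consistent with the description (a linear rescaling to gray levels using a single global min and max shared by all exams) is given below; the function name and the 8-bit output range are assumptions, not the paper's stated formula.

```python
import numpy as np

def to_grayscale(temps, global_min, global_max):
    """Map a temperature array (deg C) to 8-bit gray levels using the
    global min/max over all exams, so every image shares one scale.
    The exact form of the paper's Equation 1 is assumed to be a
    linear min-max rescaling."""
    scaled = (temps - global_min) / (global_max - global_min)
    return np.round(255 * scaled).astype(np.uint8)

# Example: two frames converted with the same global scale
frames = [np.array([[28.0, 30.0], [32.0, 34.0]]),
          np.array([[29.0, 31.0], [33.0, 35.0]])]
gmin = min(f.min() for f in frames)   # 28.0
gmax = max(f.max() for f in frames)   # 35.0
imgs = [to_grayscale(f, gmin, gmax) for f in frames]
```

Because gmin and gmax come from the whole set of frames, a pixel value of 0 or 255 means the global coldest or hottest temperature, making images of different exams directly comparable.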
All 1,280 images (64 breasts, each with 20 sequential images) used in the learning stage of the method and the 240 images (12 breasts, each with 20 sequential images) used in the test step were manually segmented using the GIMP tool [26], a free image editing software. Three people were responsible for these segmentations: two of them performed the manual segmentation, while the third checked whether each segmentation was done properly. If it was not, the segmentation was repeated. All segmented images are available at http://visual.ic.uff.br/proeng/thiagoelias/.
Even during the extraction of the features based on the temperature arrays (not on the gray-scale images), it is necessary to use the segmented gray-scale images, because they serve as a mask for the arrays. Thus, only the temperature elements belonging to the breasts are considered in the computation of the features. Figure 7 exemplifies the use of the gray-scale ROI for determining the temperature array inside the ROI.
Figure 7. Determination of the ROI of the temperature array from the ROI in grayscale.
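The masking step described above can be sketched in a few lines; the convention that the segmented image is nonzero inside the breast region is an assumption for illustration.

```python
import numpy as np

def roi_temperatures(temp_array, roi_mask):
    """Return only the temperature values inside the segmented breast.
    `roi_mask` is the manually segmented grayscale image, assumed to be
    nonzero inside the breast region and zero elsewhere."""
    return temp_array[roi_mask > 0]

temps = np.array([[30.1, 31.2],
                  [29.8, 33.0]])
mask = np.array([[0, 255],
                 [0, 255]])            # right column marks the ROI
roi = roi_temperatures(temps, mask)    # only in-breast temperatures remain
```

All subsequent features based on the temperature arrays operate on the flat vector returned here, so background and thorax pixels never enter the computation.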
These features are the Average Temperature and the Standard Deviation. They are extracted from the ROI's temperature values, i.e., considering only the temperatures of the breast regions of the patients. For this purpose, the segmented images of the breasts were used as masks for the temperature arrays, as detailed in Section 4.4. For each of the 20 sequential arrays of a breast under analysis, the mean temperature and the standard deviation were calculated, generating 20 values for each of these features.
Cluster-based features are also computed from the temperature arrays, not from the gray-scale images. The temperatures in Celsius degrees of the sequential images of each breast are analyzed, with the segmented images of the breasts used as masks for the arrays. To extract these features, the k-means clustering algorithm proposed by MacQueen [27] was used. Seed counts of 3, 6, 9, 12 and 15 were tested, and the best results were obtained with 9 seeds. The initialization of the algorithm with the 9 seeds followed the proposal of Arthur and Vassilvitskii [28], an approximation algorithm for the NP-hard k-means problem. After completing this step, each element of a given group was re-positioned into another group in order to verify whether the new positioning generated a better degree of internal homogeneity than the randomly generated grouping. The degree of homogeneity was obtained from the residual sum of squares, given by Equation 2.
SQRes(j) = Σ_{i=1}^{n_j} d²(o_i(j); ō(j))    (2)
where o_i(j), ō(j) and n_j are, respectively, the value of the i-th record, the center of group j, and the number of records in that group. The movement of elements between groups in search of the greatest internal homogeneity was repeated 1,000 times with a precision of 0.01. After the grouping stage of each breast temperature array, the centroid with the highest temperature value among the 9 centroids was identified. This largest centroid value was used as a feature, named Grouping. The procedure was repeated for the 20 sequential arrays of each breast.
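The Grouping feature described above can be sketched as follows. This is a minimal k-means on the one-dimensional ROI temperatures with random initialization; the paper initializes with k-means++ (Arthur and Vassilvitskii) and adds a refinement stage, which are not reproduced here.

```python
import numpy as np

def grouping_feature(roi_temps, k=9, iters=50, seed=0):
    """'Grouping' feature: cluster the ROI temperatures with a minimal
    k-means and return the hottest centroid. Random initialization is a
    simplification of the paper's k-means++ scheme."""
    rng = np.random.default_rng(seed)
    x = np.asarray(roi_temps, dtype=float)
    centers = rng.choice(x, size=k, replace=False)
    for _ in range(iters):
        # assign every temperature to its nearest centroid, then update
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return centers.max()

# Toy check: a cool cluster near 30 degC and a hot cluster near 36 degC
temps = np.concatenate([30.0 + 0.01 * np.arange(50),
                        36.0 + 0.01 * np.arange(20)])
hottest = grouping_feature(temps)
```

For each breast, this value is computed once per sequential array, yielding the 20-point Grouping time series.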
An image histogram is a quantification of the number of pixels at each gray level of the image; in other words, it is a representation of the intensities present in an image [29]. Here, the comparison of two histograms is done based on the Bhattacharyya distance, the Chi-square distance and the Intersection distance [30]; we call these features, respectively, Bhattacharyya, Chi-square and Intersection. In this work, the comparison is made between two subsequent frames, that is, the distances between the i-th histogram and the (i+1)-th histogram. Thus, as two histograms are necessary to generate a single value, 19 (= 20 − 1) values were obtained for each histogram-comparison feature of a given breast.
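The three measures can be sketched as below. The paper does not spell out its exact formulas, so common conventions over normalized histograms are assumed here.

```python
import numpy as np

def hist_distances(h1, h2):
    """Bhattacharyya, chi-square and intersection measures between two
    histograms, computed over their normalized forms. These are common
    conventions; the paper's exact definitions are assumed, not quoted."""
    p = h1 / h1.sum()
    q = h2 / h2.sum()
    # Bhattacharyya: 0 for identical distributions, 1 for disjoint ones
    bhattacharyya = np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(p * q))))
    chi_square = np.sum((p - q) ** 2 / (p + q + 1e-12))
    intersection = np.sum(np.minimum(p, q))   # 1 for identical histograms
    return bhattacharyya, chi_square, intersection

h = np.array([1.0, 2.0, 3.0, 4.0])
b_same, c_same, i_same = hist_distances(h, h)
```

Applied to the histograms of frames i and i+1, each measure yields the 19 values per breast mentioned above.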
4.5.1.4 Texture Features from the Sum and Difference Histograms
Six features presented by Baraldi and Parmiggiani [31] were computed from the sum and difference histograms (Section 4.5.1.3), as proposed by Unser [32], for the study of the texture of each breast. These features are: energy, entropy, contrast, variance, correlation and homogeneity. The analysis was carried out in two ways: the first considering the normalized sum and difference histograms with Dx = 1 and Dy = 0, generating the features we call Horizontal Energy, Horizontal Entropy, Horizontal Contrast, Horizontal Variance, Horizontal Correlation and Horizontal Homogeneity; and the second considering the normalized sum and difference histograms with Dx = 0 and Dy = 1, generating the features we call Vertical Energy, Vertical Entropy, Vertical Contrast, Vertical Variance, Vertical Correlation and Vertical Homogeneity.
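A sketch of the sum/difference histogram construction for one displacement (Dx, Dy) and four of the six features follows; the formulas are taken from Unser's definitions, and variance and correlation are omitted for brevity.

```python
import numpy as np

def sum_diff_texture(img, dx=1, dy=0):
    """Unser-style texture features from the sum and difference histograms
    of pixel pairs at displacement (dx, dy). Only energy, entropy,
    contrast and homogeneity are sketched here."""
    a = img.astype(int)                 # avoid uint8 overflow in sums
    h, w = a.shape
    p1 = a[0:h - dy, 0:w - dx]          # pixel (x, y)
    p2 = a[dy:h, dx:w]                  # pixel (x + dx, y + dy)
    s = (p1 + p2).ravel()               # sums in 0..510
    d = (p1 - p2).ravel()               # differences in -255..255
    ps = np.bincount(s, minlength=511) / s.size
    pd = np.bincount(d + 255, minlength=511) / d.size
    j = np.arange(511) - 255            # difference values per bin
    energy = np.sum(ps ** 2) * np.sum(pd ** 2)
    ent = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    entropy = ent(ps) + ent(pd)
    contrast = np.sum(j ** 2 * pd)
    homogeneity = np.sum(pd / (1.0 + j ** 2))
    return energy, entropy, contrast, homogeneity

flat = np.full((4, 4), 10, dtype=np.uint8)     # perfectly uniform texture
energy, entropy, contrast, homogeneity = sum_diff_texture(flat)
```

Calling with (dx=1, dy=0) gives the "Horizontal" variants and (dx=0, dy=1) the "Vertical" ones; a uniform image yields energy 1, entropy 0, contrast 0 and homogeneity 1, which is a useful sanity check.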
The following ecological diversity indices were used to analyze the 20 sequential images of each breast: the McIntosh, Simpson and Berger-Parker indices [33]. These indices were calculated on the sum and difference histograms (Section 4.5.1.3), with the population taken as all frequencies of those histograms. The species were created by segmenting the sum and difference histograms into groups of 7 bins. For example, considering the histogram of sums, which ranges from 0 to 510, the first species goes from bin 0 to bin 6, the second species from bin 7 to bin 13, and so on. The same was done with the histogram of differences, which ranges from −255 to 255. In total, 146 different species were created (73 from the histogram of sums and 73 from the histogram of differences).
As in the last subsection, because these features are also based on sum and difference histograms, two neighborhoods are considered in computing these histograms: Dx = 1, Dy = 0 and Dx = 0, Dy = 1. Thus, six features were generated, which we call Horizontal McIntosh, Horizontal Simpson, Horizontal Berger-Parker, Vertical McIntosh, Vertical Simpson and Vertical Berger-Parker.
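The species construction and the three indices can be sketched as follows. Standard ecological definitions are assumed (Simpson in its dominance form Σp²; the paper may use a complementary variant).

```python
import numpy as np

def diversity_indices(counts):
    """Simpson (dominance form), Berger-Parker and McIntosh diversity of
    'species' abundances; here the species are histogram frequencies
    grouped into 7-bin blocks. Standard ecological formulas assumed."""
    n = np.asarray(counts, dtype=float)
    N = n.sum()
    p = n / N
    simpson = np.sum(p ** 2)
    berger_parker = p.max()
    mcintosh = (N - np.sqrt(np.sum(n ** 2))) / (N - np.sqrt(N))
    return simpson, berger_parker, mcintosh

# Species from a sum histogram: 511 bins grouped into 73 blocks of 7
hist = np.arange(511)                      # toy sum-histogram frequencies
species = hist.reshape(73, 7).sum(axis=1)  # abundance of each species
```

The same grouping applied to the 511-bin difference histogram yields the other 73 species, giving the 146 species mentioned above.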
Like histograms, spatiograms capture higher-order moments of an array or image. The second-order spatiogram of an object is similar to a histogram of the same object, except that it also stores additional spatial information: here, the mean and covariance of the spatial positions of the pixels falling in each bin of the histogram are used [34]. As in the comparison of histograms, the spatiogram comparisons follow the sequential order of frames in the examination: the spatiogram of the i-th frame is compared to that of the (i+1)-th frame. Thus, 19 spatiogram comparisons were made for each breast. However, in order to perform a correct positional comparison, the sequential images of each breast must first be submitted to image registration. The registration technique used in this work was presented in [35]. We call this feature Spatiogram.
The fractal dimension can be seen as a measure of the irregularity of many physical processes. However, different fractal sets with diverse textures may share the same fractal dimension value [36], so the fractal dimension alone is not enough to discriminate such textures. Mandelbrot [36] therefore introduced a second measure, lacunarity, to better distinguish fractals of the same dimension but with different texture appearances. Although several authors propose different methods to calculate lacunarity, we chose the method proposed by Du and Yeo [37], applied to the gray-scale images. We use gray-scale images instead of binary images, which would have a lower computational cost, because the binarization process causes loss of information, reducing the study possibilities. In this work, the Lacunarity feature was computed with sliding windows of the following sizes, in pixels: 2×2, 4×4, 8×8, 16×16, 32×32 and 64×64.
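The sliding-window computation can be sketched with the generic gliding-box definition of lacunarity, Λ = E[M²]/E[M]², where M is the "mass" (sum of gray levels) inside each window; the exact grayscale variant of Du and Yeo is not reproduced here.

```python
import numpy as np

def lacunarity(img, box):
    """Gliding-box lacunarity of a grayscale image for one window size.
    Generic definition Lambda = E[M^2] / E[M]^2 over all window
    positions; a uniform image gives exactly 1."""
    a = img.astype(float)
    h, w = a.shape
    masses = [a[i:i + box, j:j + box].sum()
              for i in range(h - box + 1)
              for j in range(w - box + 1)]
    m = np.asarray(masses)
    return np.mean(m ** 2) / np.mean(m) ** 2

sizes = [2, 4, 8, 16, 32, 64]          # window sizes used in the paper
flat = np.full((8, 8), 5.0)            # homogeneous texture
lac_flat = lacunarity(flat, 2)
spot = np.zeros((4, 4)); spot[0, 0] = 10.0   # one isolated hot pixel
lac_spot = lacunarity(spot, 2)
```

Higher values indicate gappier, more heterogeneous textures, which is why the single hot pixel scores far above the uniform image.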
Figure 11. Time series of average temperature for sick (red) and healthy (blue) breasts
In the next two sections, the process of generating the time series with their different numbers of elements (cardinalities) is presented, as well as the process of computing the various types of features.
A time series is a sequence of data obtained at regular intervals of time during a specific period. In the analysis of a time series, the main goal is to model the studied phenomenon in order to describe its behavior over time [38]. Here, given a breast and a particular feature extracted from its N gray-scale images or temperature arrays, a time series is obtained when these N values are ordered to represent the variation of the feature over time, while the breast temperature recovers after the induced cooling, as already described.
Based on Higuchi's proposal [39], sub-series of different resolutions can be generated from a time series and analyzed for better observation of its behavior at a later stage. Given a one-dimensional series of N values X = x(1), x(2), ..., x(N), new series are generated by:

X_k^m = x(m), x(m + k), x(m + 2k), ..., x(m + int((N − m)/k) · k)    (3)

where k and m are integers: k represents the discrete interval between the points and m indicates the initial value of the analyzed series, with m = 0, 1, ..., (k − 1). The length associated with each new series is computed by:

L(m, k) = [ ( Σ_{i=1}^{⌊(N−m)/k⌋} |x(m + ik) − x(m + (i − 1)k)| ) · (N − 1) / ( ⌊(N − m)/k⌋ · k ) ] / k    (4)

where ⌊a⌋ represents the integer part of a, and (N − 1)/(⌊(N − m)/k⌋ · k) is the normalization factor of the series. In this work, the value of k ranges from 1 to 4. Thus, 10 sub-series can be generated for each proposed feature, as can be observed in Table 2.
Table 2. The cardinality of the time series generated for each feature.
Figure 12 shows an example of the 10 resolutions generated from the sub-series of a given feature for two breasts, varying k and m as explained previously. The horizontal axis represents the sequential images and the vertical axis shows the value of the feature. The series in red represents the feature extracted from the sequential images of a sick breast; the series in blue represents the same feature extracted from the images of a healthy breast. The main goal of the new series is to eliminate outliers that may arise, as observed in the first plot of Figure 12 (k = 1, m = 0). For small time increments, there is no clear separation between the diseased and healthy breasts; on the other hand, when k = 4 and m = 3, the separation between the diseased and the healthy breast is clear. The same procedure was repeated for all features computed from each breast analyzed in this work.
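The sub-series generation of Equation 3 amounts to strided sampling of the original series, which can be sketched as:

```python
def subseries(x, k):
    """Higuchi-style sub-series X_k^m of series x for m = 0..k-1
    (Equation 3): x(m), x(m+k), x(m+2k), ..."""
    return [x[m::k] for m in range(k)]

x = list(range(20))                    # stand-in for 20 feature values
all_series = {k: subseries(x, k) for k in range(1, 5)}   # k = 1..4
total = sum(len(v) for v in all_series.values())         # 1+2+3+4 = 10
```

With k ranging from 1 to 4, the 10 sub-series per feature listed in Table 2 are obtained; larger k gives coarser, smoother views of the same feature.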
Figure 12. Time series of two breasts, healthy (blue) and sick (red), with different cardinalities
After the time series of all the features of each breast were created, new features were computed from them. Firstly, the series were separated by the groups of features described in Section 4.5.1. Secondly, each group was subdivided by the type of the series, that is, by the values of k and m. Groups of series were thus formed, as, for instance, those in Figure 13 and Figure 14. These figures show the series (with k = 2) of the group of features based on the comparison of histograms, explained in Section 4.5.1, for two breasts: one healthy (in blue) and one diseased (in red). The group of features in question is formed by three distances between histograms: Bhattacharyya, Chi-square and Intersection. Thereby, for each breast, there are 3 time series, one for each of the 3 features of the group. The solid lines represent the intersection between the histograms, the dashed lines represent the chi-square distance and the dotted lines represent the Bhattacharyya distance.
Figure 13. Time series based on the comparison of histograms, for healthy (blue) and sick (red) breasts,
k = 2 and m = 0.
Figure 14. Time series based on the comparison of histograms, for healthy (blue) and sick (red) breasts,
k = 2 and m = 1.
After the organization of the time series by feature group and by type of series, the second phase of the feature extraction process starts, which corresponds to the computation of features from these new time series. From each time series, two new features were computed. The first corresponds to the range (R) of the feature, that is, the difference between the largest and the smallest value of the feature in the time series. The second corresponds to the square root of the mean of the squares of the time series values (M2). They are defined by Equations 5 and 6, respectively.

R = max_i(x_i) − min_i(x_i)    (5)

M2 = sqrt( (Σ_{i=1}^{n} x_i²) / n )    (6)
where x_i represents the time series values and n is the number of values. Thus, using, for example, the series of dimensions k = 2, m = 0 of the group of features based on histogram comparison, shown in Figure 13, a total of 6 features were obtained for each breast: two features (R and M2) from each of the 3 time series representing the 3 histogram comparisons.
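The two per-series features can be computed directly from their definitions:

```python
import math

def series_features(xs):
    """Range R (Equation 5) and root-mean-square M2 (Equation 6)
    of one feature time series."""
    r = max(xs) - min(xs)
    m2 = math.sqrt(sum(v * v for v in xs) / len(xs))
    return r, m2

r, m2 = series_features([3.0, 4.0])    # r = 1.0, m2 = sqrt(12.5)
```

Applied to every sub-series of every feature group, these two values form the final feature vectors fed to the classifier.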
Y_i = (X_i − X_min) / (X_max − X_min)    (7)
where Y_i is the value of the attribute X_i after being scaled to [0, 1], and X_min and X_max are, respectively, the minimum and maximum values obtained by inspecting the whole sample of the same group. Normalization is used to prevent attributes with larger numerical ranges from dominating those with smaller numerical ranges. It also avoids numerical difficulties during the calculations, since some processes generally depend on the inner products of feature vectors. Consequently, it is always recommended to normalize the attributes to [−1, 1] or [0, 1].
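The min-max scaling of Equation 7 can be sketched per attribute as:

```python
import numpy as np

def minmax_scale(column):
    """Scale one attribute to [0, 1] over the whole sample (Equation 7).
    In practice the training-set min/max should be reused for any
    later test cases."""
    x = np.asarray(column, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

y = minmax_scale([2.0, 4.0, 6.0])      # -> [0.0, 0.5, 1.0]
```

Each feature column is scaled independently, so every attribute contributes to the kernel on a comparable footing.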
The SVC (Support Vector Classification) used allows classification into two classes. To generate the classification model, learning was configured in two ways: C-SVC and nu-SVC, where C is a regularization factor for the feature space and nu is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. The kernel function used for the classification was the Radial Basis Function (RBF), as indicated by Hsu et al. [41]; the tests performed confirmed that this function indeed presented the best results. The leave-one-out validation technique was used to assess the classifier performance. It is a validation technique used to ensure that the results presented by a classifier are independent of the dataset used. The technique uses a single feature vector of a class as the validation data and all remaining vectors of the sample as the training data; this is repeated so that each feature vector in the sample is used once as validation data.
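The paper does not name its SVM implementation; a sketch of C-SVC with an RBF kernel under leave-one-out cross-validation, using scikit-learn on invented stand-in data (the feature values, dimensionality and class layout below are illustrative only), could look like:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical stand-in data: 64 "breasts" x 6 selected features in [0, 1],
# with the two classes deliberately well separated for illustration
rng = np.random.default_rng(42)
X = np.vstack([rng.uniform(0.0, 0.4, (32, 6)),    # "healthy"
               rng.uniform(0.6, 1.0, (32, 6))])   # "abnormal"
y = np.array([0] * 32 + [1] * 32)

clf = SVC(kernel="rbf", C=1.0)                    # C-SVC with RBF kernel
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
accuracy = scores.mean()                          # fraction of correct folds
```

`NuSVC` can be substituted for `SVC` to obtain the nu-SVC formulation; each of the 64 leave-one-out folds trains on 63 vectors and validates on the remaining one, exactly as described above.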
5. Results
In this section, the results achieved by the method proposed in the previous section are presented and discussed. A summary is presented first, with the best classification results by group of features. Then, feature selection is performed and the results are presented for each series. In the next subsection, the classification results based on the outcome of the feature selection are presented. The results achieved are then discussed, and the proposed method is tested with new breast exams that did not participate in its learning stage. The results achieved by the proposed method are also compared with other related studies in the literature on the diagnosis of breast diseases based on infrared images.
Figure 15 shows, on the vertical axis, the groups of features considered in Section 4.5.1, plus a group formed by all features. On the horizontal axis, a scale with accuracy values, as percentages, is presented. The bars in the graph represent the highest accuracy values achieved by varying the size of the series and the classifier (C-SVC and Nu-SVC) for each group of features. Still in relation to Figure 15, it is possible to notice that the greatest accuracies were obtained by the features of the group of simple statistics and by the union of all features, reaching values close to 97%. On the other hand, the groups formed by the diversity indices (horizontal and vertical) and by lacunarity obtained the lowest accuracy rates.
k m Feature Names of
value value # selected features
1 0 7 R Average Temperature; A Average Temperature; R Grouping; R Horizontal Energy;
R Spatiogram; A Horizontal Contrast; A Vertical Contrast
2 0 6 R Average Temperature; A Average Temperature; R Grouping; R Lacunarity;
A Intersection; A Vertical Contrast
2 1 8 R Average Temperature; A Average Temperature; R Grouping; A Horizontal Energy;
R Horizontal Energy; R Spatiogram; A Intersection; A Vertical Contrast
3 0 7 R Average Temperature; A Average Temperature; R Grouping; R Horizontal Energy;
R Horizontal Contrast; R Lacunarity; A Vertical Contrast
3 1 5 R Average Temperature; A Average Temperature; R Lacunarity; A Intersection;
A Horizontal Energy
3 2 7 R Average Temperature; A Average Temperature; R Lacunarity; R Spatiogram;
A Intersection; A Horizontal Homogeneity; A Vertical Contrast
4 0 6 R Average Temperature; A Average Temperature; R Grouping; R Horizontal Energy;
A Horizontal Energy; R Lacunarity
4 1 6 R Average Temperature; A Average Temperature; R Horizontal Entropy; R Spatiogram;
R Lacunarity; A Intersection
4 2 7 R Average Temperature; A Average Temperature; R Grouping; R Horizontal Entropy;
R Lacunarity; A Bhattacharyya; A Vertical Contrast
4 3 6 R Average Temperature; A Average Temperature; R Grouping; A Intersection;
R Lacunarity; A Vertical Contrast
460 From the results presented in Table 3, it can be seen that some features were not selected in any of
461 the types of time series considered in this work. All features related to diversity indexes; standard
462 deviation; the comparison of histograms using the chi-square; the horizontal variance; the horizontal
463 correlation; the vertical variance; the vertical energy; the vertical correlation; the vertical entropy and
464 the vertical homogeneity were not selected for any time series. On the other hand, other features such
465 as those related to the Average Temperature, clustering, comparison of histograms using the intersection,
466 horizontal energy, vertical contrast and lacunarity were selected in all or most of them.
Table 4. Result of the Classification with C-SVC for each Series After the Feature Selection.
472
Table 5. Result of the classification with Nu-SVC for each series after the feature selection.
473
474 Using C-SVC, the best results were observed with 100% accuracy in the series k = 4, m = 0 in which 6 features
475 were selected; and in the series k = 4, m = 2, where 7 features were selected. Using Nu-SVC, the best results were
476 achieved with 100% accuracy in the series of k = 2, m = 0, in which 6 features were selected; in the series k = 4,
477 m = 2, where 7 features were selected; and in the series k = 4, m = 3, where 6 features were selected.
479 The best results were achieved after the step of selecting features, obtaining an accuracy of 100% in 5 different
480 time series with their respective selected features, as shown in Table 4 and Table 5. For the choice of the series
481 recommended by this work and its respective selected features, the lowest number of ROIs and the smallest number
482 of selected features were adopted as selection criteria.
483 Although 100% accuracy has been achieved in 2 different types of time series when using C-SVC and in 3 types
484 of series when using Nu-SVC, only the series of k = 4, m = 0 using C-SVC, with their respective selected features,
485 is recommended by this work for the construction of a model to aid the diagnosis of breast diseases using dynamic
486 thermography.
Version June 6, 2020 submitted to Journal Not Specified 21 of 23
487 Analyzing the 5 types of time series and their respective selected features, which provided 100% accuracy, in
488 two of them (k = 4, m = 2 with C-SVC and Nu-SVC) 7 features were selected. These are not recommended by this
489 study since they require a greater number of features extracted from the sequential ROIs of each breast.
491 In order to further test the method proposed in this work, a classification of 12 new breasts that did not
492 participate in the development stage of the method was carried out. Of these 12 new cases, 6 are considered sick and
493 6 are healthy. Table 6 shows the classification result:
Table 6. Result of the classification of 12 new cases using the proposed method.
Sensitivity Specificity Accuracy Youden’s index ROC area
100% 100% 100% 1 1
494 6. Conclusions
495 Breast cancer is the cancer that most affects women worldwide. The diagnosis of the disease in the first years
496 is decisive for its cure. In this article, a promising method to aid in the detection of breast diseases was presented.
497 From the method based on dynamic thermographies, it is possible to have indications for complementary exams
498 to be carried out in order to detect possible diseases. The presented ideas would assist in the initial screening of
499 patients, not replacing the other tests currently on used for breast screening.
500 The best results of this approach reached an accuracy of 100% using SVM implemented in WEKA with all its
501 default parameters. The validation technique used was Leave-One-Out Cross-Validation. It is worth mentioning that
502 these results are achiever using the Database for Mastology Research with Infrared Image - DMR-IR, accessible at
503 http://visual.ic.uff.br/dmi. This data stores, in addition to infrared data other exams, such as mammography, and
504 clinical data of volunteer. Therefore, future works would do the association of analysis of images and clinical data of
505 patients to assist in the diagnosis of breast diseases. The proposed method is able to assist in the diagnosis of breast
506 diseases, giving indications of which patient’s breast has an anomaly. However, it is not possible to indicate in which
507 region of the breast the anomaly is located. Thus, it is suggested as a future work the development of techniques
508 that, in addition to assisting in the indication of the possibly sick breast, indicate the region of the breast where the
509 anomaly is located. In view of the time required for the manual segmentation of the ROIs of the images to be used
510 by the presented method the development of an automatic technique for segmenting the ROIs is an important point.
511 Acknowledgments: The authors are thankful to the patients who consent on the use of their data for research. T.
512 A. E. S. gratefully acknowledge the support from CAPES/DINTER financial support. A. C. thanks to FAPERJ
513 project SIADE2, INCT-MACC and CNPq Universal 402988/2016-7 and PQ 305416/2018-9.
514 References
515 1. Organization, W.H. Breast cancer. https://www.who.int/cancer/prevention/diagnosis-screening/
516 breast-cancer/en/, 2020. March 24, 2020.
517 2. Warner, E. Breast-Cancer Screening. The New England journal of medicine 2011, 365, 1025–32.
518 3. Arabi, P.; Muttan, S.; Suji, R. Image enhancement for detection of early breast carcinoma by external
519 irradiation. Computing Communication and Networking Technologies (ICCCNT), 2010 International
520 Conference on, 2010, pp. 1–9.
521 4. Francis, S.V.; Sasikala, M.; Bharathi, G.B.; Jaipurkar, S.D. Breast cancer detection in rotational thermography
522 images using texture features. Infrared Physics Technology 2014, 67, 490 – 496.
523 5. Silva, L.F.; Saade, D.C.M.; Sequeiros, G.O.; Silva, A.C.; Paiva, A.C.; Bravo, R.S.; Conci, A. A New Database
524 for Breast Research with Infrared Image. Journal of Medical Imaging and Health Informatics 2014, 4, 92–100.
525 6. Kennedy, D.A.; Lee, T.; Seely, D. A Comparative Review of Thermography as a Breast Cancer Screening
526 Technique. Integrative Cancer Therapies 2009, 8, 9–16.
Version June 6, 2020 submitted to Journal Not Specified 22 of 23
527 7. EtehadTavakol, M.; Lucas, C.; Sadri, S.; Ng, E. Analysis of Breast Thermography Using Fractal Dimension
528 to Establish Possible Difference between Malignant and Benign Patterns. Journal of Healthcare Engineering
529 2010, 1, 27–44.
530 8. Gerasimova, E.; Audit, B.; Roux, S.G.; Khalil, A.; Argoul, F.; Naimark1, O.; Arneodo, A. Multifractal analysis
531 of dynamic infrared imaging of breast cancer. EPL (Europhysics Letters) 2013, 104, 68001–p1–68001–p6.
532 9. Ziyad, S.; Iruela-Arispe, M.L. Molecular mechanisms of tumor angiogenesis. Genes & Cancer 2 2011,
533 2, 1085–1096.
534 10. Ng, E.Y.K. A review of thermography as promising noninvasive detection modality for breast tumor.
535 International Journal of Thermal Sciences 2009, 48, 849–859.
536 11. Amalu, W.C.; Hobbins, W.B.; Head, J.F.; Elliot, R.L. Infrared Imaging of the Breast: A Review. CRC Press,
537 Taylor and Francis Group 2008, pp. 9.1–9.22. Book: Medical Infrared Imaging; publishers: Nicholas A.
538 Diakides e Joseph D. Bronzino.
539 12. Herman, C. The role of dynamic infrared imaging in melanoma diagnosis. Expert Review of Dermatology
540 2013, 8, 177–184.
541 13. Anbar, M. Dynamic Thermal Assessment. CRC Press, Taylor and Francis Group 2008, pp. 9.1–9.22. Book:
542 Medical Infrared Imaging; publishers: Nicholas A. Diakides e Joseph D. Bronzino.
543 14. Amalu, W.C. Nondestructive Testing of the Human Breast: The Validity of Dynamic Stress Testing in
544 Medical Infrared Breast Imaging. Engineering in Medicine and Biology Society 2004, 1, 1174–1177.
545 15. Moghbel, M.; Mashohor, S. A review of computer assisted detection/diagnosis (CAD) in breast
546 thermography for breast cancer detection. Artificial Intelligence Review 2011, 39, 305–313.
547 16. Calas, M.; Gutfilen, B.; Pereira, W. CAD and mammography: why use this tool? Radiologia Brasileira 2012,
548 45, 46–52.
549 17. Tang, J.; Agaian, S.; Thompson, I. Guest Editorial: Computer-Aided Detection or Diagnosis (CAD) Systems.
550 IEEE Systems Journal 2014, 8, 907–909.
551 18. Anbar, M.; Milescu, L.; Naumov, A.; Brown, C.; Button, T.; Carty, C.; AlDulaimi, K. Detection of Cancerous
552 Breasts by Dynamic Area Telethermometry. Engineering in Medicine and Biology Magazine, IEEE 2001,
553 20, 80–91.
554 19. Wishart, G.C.; Campisid, M.; Boswella, M.; Chapmana, D.; Shackletona, V.; Iddlesa, S.; Halletta, A.; Britton,
555 P.D. The accuracy of digital infrared imaging for breast cancer detection in women undergoing breast
556 biopsy. European Journal of Surgical Oncology 2010, 36, 535–540.
557 20. Gerasimova, E.; Audit, B.; Roux, S.; Khalil, A.; Gileva, O.; Argoul, F.; Naimark, O.; Arneodo, A.
558 Wavelet-based multifractal analysis of dynamic infrared thermograms to assist in early breast cancer
559 diagnosis. Frontiers in Physiology 2014, 5.
560 21. Saniei, E.; Setayeshi, S.; Akbari, M.E.; Navid, M. A vascular network matching in dynamic thermography
561 for breast cancer detection. Quantitative InfraRed Thermography Journal 2015, 12, 24–36.
562 22. Silva, L.F.; Santos, A.A.S.; Bravo, R.S.; Silva, A.C.; Muchaluat-Saade, D.C.; Conci, A. Hybrid analysis for
563 indicating patients with breast cancer using temperature time series. Computer Methods and Programs in
564 Biomedicine 2016, 130, 142–153.
565 23. de Freitas Oliveira Baffa, M.; Grassano Lattari, L. Convolutional Neural Networks for Static and Dynamic
566 Breast Infrared Imaging Classification. 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images
567 (SIBGRAPI), 2018, pp. 174–181.
568 24. Flir. SC620 INFRARED CAMERA SYSTEM [WWW Document].
569 http://w1.sayato.com/7040/file/FLIR%20SC620.pdf. accessed on April 2, 2020.
570 25. Flir. QuickReport Software. https://flir-quickreport.software.informer.com/1.2/. accessed on April 2,
571 2020.
572 26. GIMP. GNU Image Manipulation Program. https://docs.gimp.org/2.10/en/. accessed on April 2, 2020.
573 27. MacQueen, J. Some methods for classification and analysis of multivariate observations. Proceedings of
574 the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics; University
575 of California Press: Berkeley, Calif., 1967; pp. 281–297.
576 28. Arthur, D.; Vassilvitskii, S. K-Means++: The Advantages of Careful Seeding. 2007, Vol. 8, pp. 1027–1035.
577 29. Gonzalez, R.C.; Woods, R.E. Digital image processing (3rd Edition); Prentice Hall, 2008.
578 30. Ding, Y. Visual Quality Assessment for Natural and Medical Image; Springer, 2018.
Version June 6, 2020 submitted to Journal Not Specified 23 of 23
579 31. Baraldi, A.; Panniggiani, F. An investigation of the textural characteristics associated with gray level
580 cooccurrence matrix statistical parameters. IEEE Transactions on Geoscience and Remote Sensing 1995,
581 33, 293–304.
582 32. Unser, M. Sum and Difference Histograms for Texture Classification. IEEE Transactions on Pattern Analysis
583 and Machine Intelligence 1986, PAMI-8, 118–125.
584 33. Mueller, G.; Bills, G.; Foster, M. Biodiversity of Fungi: Inventory and Monitoring Methods. Biodiversity of
585 Fungi: Inventory and Monitoring Methods 2004, pp. 1–777.
586 34. Conaire, C.O.; O’Connor, N.E.; Smeaton, A.F. An Improved Spatiogram Similarity Measure for Robust
587 Object Localisation. 2007 IEEE International Conference on Acoustics, Speech and Signal Processing -
588 ICASSP ’07, 2007, Vol. 1, pp. I–1069–I–1072.
589 35. Falcon, A.D.; Galvao, S.S.L.; Conci, A. A non linear registration without the use of the brightness constancy
590 hypothesis. The 27th International Conference on Systems, Signals and Image Processing;
591 36. Mandelbrot, B.B. The Fractal Geometry of Nature; Henry Holt and Company, 1983.
592 37. Gan Du.; Tat Soon Yeo. A novel lacunarity estimation method applied to SAR image segmentation. IEEE
593 Transactions on Geoscience and Remote Sensing 2002, 40, 2687–2691.
594 38. Gan.; Guojun.; Ma, C.; Wu, J. Data Clustering: Theory, Algorithms, and Applications, ASA-SIAM Series on
595 Statistics and Applied Probability, SIAM; Philadelphia, ASA, Alexandria, VA, 2007.
596 39. Higuchi, T. Approach to an irregular time series on the basis of the fractal theory. Physica D: Nonlinear
597 Phenomena 1988, 31, 277–283.
598 40. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA Data Mining
599 Software: An Update. SIGKDD Explorations 2009, 11, 10–18.
600 41. Hsu, C.; Chang, C.; Lin, C.J. A Practical Guide to Support Vector Classication. Bioinformatics 2003, 1.
601 Sample Availability: Samples of the compounds ...... are available from the authors.
602
c 2020 by the authors. Submitted to Journal Not Specified for possible open access
603 publication under the terms and conditions of the Creative Commons Attribution (CC BY) license
604 (http://creativecommons.org/licenses/by/4.0/).