3456-Thesis in PDF Format-20769-1-2-20220726

MAMMOGRAM IMAGE PRE-PROCESSING WITH CONVOLUTIONAL
NEURAL NETWORK FOR BREAST CANCER DETECTION AND

CLASSIFICATION
By
Michael Jeremy
11806003
BACHELOR’S DEGREE
in
BIOMEDICAL ENGINEERING
FACULTY OF LIFE SCIENCES AND TECHNOLOGY
SWISS GERMAN UNIVERSITY

The Prominence Tower
Jalan Jalur Sutera Barat No. 15, Alam Sutera
Tangerang, Banten 15143 - Indonesia
Revision after Thesis Defense on 13 July 2022

MAMMOGRAM IMAGE PRE-PROCESSING WITH CONVOLUTIONAL NEURAL NETWORK Page 2 of 100
FOR BREAST CANCER DETECTION AND CLASSIFICATION
STATEMENT BY THE AUTHOR
I hereby declare that this submission is my own work and to the best of my knowledge,
it contains no material previously published or written by another person, nor material
which to a substantial extent has been accepted for the award of any other degree or
diploma at any educational institution, except where due acknowledgement is made in
the thesis.
Michael Jeremy
_____________________________________________
Student Date
Approved by:
Aulia Arif Iskandar, S.T., M.T.

_____________________________________________
Thesis Advisor Date
Muhammad Fathony, Ph.D.

_____________________________________________
Thesis Co-Advisor Date
Dr. Dipl.-Ing. Samuel P. Kusumocahyo

_____________________________________________
Dean Date
Michael Jeremy
ABSTRACT
MAMMOGRAM IMAGE PRE-PROCESSING WITH CONVOLUTIONAL

NEURAL NETWORK FOR BREAST CANCER DETECTION AND
CLASSIFICATION
By
Michael Jeremy
Aulia Arif Iskandar, S.T., M.T., Advisor
Muhammad Fathony Ph.D., Co-Advisor
SWISS GERMAN UNIVERSITY
Breast Cancer is one of the most common types of cancer. This research was conducted
with the purpose of developing a Computer-Aided Diagnosis to detect breast cancers
from mammogram images. The mammogram images were obtained from the INbreast
Dataset and Husada Hospital in Jakarta. The program used pre-processing which
includes Median Filtering, Otsu thresholding, Truncation Normalization, and Contrast
Limited Adaptive Histogram Equalization, then Convolutional Neural Network to
classify the mammogram images into either mass or normal, or either benign or
malignant. The best result achieved reached an accuracy, precision and sensitivity of
94.1%, 100% and 85.7% in classifying the mammogram images into benign or
malignant, and 88.3%, 92.6% and 83.3% in classifying the mammogram images into
mass or normal. The best average accuracy, precision and sensitivity obtained were
90.8%, 85.3% and 86.7% in classifying benign or malignant, and 87.2%, 87.8% and
86.7% in classifying mass or normal. In conclusion, the algorithm was able to classify
mammogram images and has provided results as high as other related researches. The
algorithm can be used as a tool for doctors and radiologists to aid breast cancer
diagnosis in Indonesia.
Keywords: Breast Cancer, Classification, Convolutional Neural Network, Image
Processing, Mammography.
Michael Jeremy
© Copyright 2022
by Michael Jeremy
All rights reserved
Michael Jeremy
DEDICATION
I dedicate this thesis to, first and foremost, Jesus Christ who have guided me and gave
me strength. In addition, to my family that have helped me and supported me, and my
friends that have provided me with the emotional support.
Michael Jeremy
ACKNOWLEDGEMENTS
These past three to four months have been very challenging yet rewarding for me.
Therefore, I would like to express my deepest gratitude towards some very special
people that have helped and supported me throughout this thesis. First and foremost, I
would like to thank my God, Jesus Christ, for it is because of Him, I was able to have
the strength to be able to complete this thesis.
I would like to thank Mr. Aulia Arif Iskandar, S.T., M.T. for mentoring throughout this
thesis work to help me develop valuable skills and knowledge that I will cherish and
use in the future. He has been a very trusting and very supportive despite my lack of
communication skills and stubbornness. Therefore, I really appreciate his gentle
encouragement, and thankful for his guidance, patience, support and advice. I would
also like to thank my co-advisor, Muhammad Fathony, Ph.D, who also has been very
supportive and have given exceptional advices and guidance. It is because of them that
I was able to have knowledge in programming and engineering.
Special thanks are due to my aunt and my uncle, Ms. Lisa Kosin and Mr. Hans, and
also Ms. Intan Lasmanasari and Ms. Nuryana from Husada Hospital for helping me and
giving me valuable advice both for my studies and personal life.
I would also like to thank Elnora Listianto Lie, my literal partner-in-crime during this
thesis. Thank you for all the advice and the time spent together trying to figure out and
understand things, staying up late, and giving mental support. This thesis would not
have been completed without you. Equally important, thank you for David Jourdan
Manalu, Sylvester Mozes, Joseph Bryan, and everyone in “Keluarga Cemara” and
“Ecclesiastes” and also others that I have not mentioned for accompanying me and
giving me advice and inputs, survival advice, and friendship. Last but not least, I would
like to thank my parents, my brothers, my sister. Because of them, I have learned the
importance of prioritizing and managing my schedule, and importance in having pride
in my work. I am truly blessed to have many amazing people around me, and I cannot
thank you enough for all the prayers for me during this thesis.
Michael Jeremy
TABLE OF CONTENTS
Page
STATEMENT BY THE AUTHOR ............................................................................... 2

ABSTRACT ................................................................................................................... 3
DEDICATION ............................................................................................................... 5
ACKNOWLEDGEMENTS ........................................................................................... 6
TABLE OF CONTENTS ............................................................................................... 7
LIST OF FIGURES ....................................................................................................... 9
LIST OF TABLES ....................................................................................................... 11
CHAPTER 1 - INTRODUCTION ............................................................................... 12
1.1 Background ............................................................................................................ 12
1.2 Research Problems ................................................................................................. 15
1.3 Research Objectives ............................................................................................... 15
1.4 Significance of Study ............................................................................................. 15
1.5 Research Questions ................................................................................................ 15
1.6 Hypothesis.............................................................................................................. 16
CHAPTER 2 - LITERATURE REVIEW .................................................................... 17
2.1 Cancer .................................................................................................................... 17
2.2 Breast Cancer ......................................................................................................... 20
2.3 Diagnosis of Breast Cancer .................................................................................... 21
2.4 Breast Cancer Diagnosis through Mammography ................................................. 22
2.5 Previously Proposed Methods................................................................................ 29
2.6 Image Pre-processing ............................................................................................. 32
2.7 Confusion Matrix ................................................................................................... 42
CHAPTER 3 – RESEARCH METHODS ................................................................... 44
3.1 Study Design .......................................................................................................... 44
3.2 Materials and Equipment ....................................................................................... 46
3.3 Algorithm for Image Classification ....................................................................... 49
3.4 Doctor and Radiologist Opinions........................................................................... 58
CHAPTER 4 – RESULTS AND DISCUSSIONS....................................................... 60
4.1 Results of Pre-processing ....................................................................................... 60
4.2 Results of Training and Testing ............................................................................. 68
4.3 Final Output of the Program .................................................................................. 81
Michael Jeremy
4.4 Discussion .............................................................................................................. 83
CHAPTER 5 – CONCLUSIONS AND RECCOMENDATIONS .............................. 90
5.1 Conclusions ............................................................................................................ 90
5.2 Recommendations .................................................................................................. 91
GLOSSARY ................................................................................................................ 92
REFERENCES ............................................................................................................ 94
APPENDICES ............................................................................................................. 97
CURRICULUM VITAE .............................................................................................. 99
Michael Jeremy
LIST OF FIGURES
Figures Page
Figure 1. Stages of Tumor Development (Kanwal, 2013) ........................................... 19

Figure 2. Craniocaudal (CC) and Medial Lateral Oblique (MLO) View ................... 23
Figure 3. Different Types of Breast Density in Mammogram Images (Orr, 2020) ..... 25
Figure 4. Example of a Mass Found in a Mammogram Image ................................... 26
Figure 5. Schematic of a Breast (Thomas, 2010)......................................................... 27
Figure 6. A Digital Image (Khandelwal, 2021) ........................................................... 33
Figure 7. Median Filtering Process (Lu, Loh and Huang, 2019) ................................. 36
Figure 8. Image Histogram and Histogram Equalization (Ghr, 2022)......................... 37
Figure 9. Convolutional Layer Mechanism (IBM Cloud Education, 2020) ................ 41
Figure 10. A Fully Connected Layer (Charlotte Pelletier, no date) ............................. 42
Figure 11. Confusion Matrix........................................................................................ 43
Figure 12. Study Design Block Diagram ..................................................................... 44
Figure 13. Pre-processing Step Diagram ..................................................................... 45
Figure 14. Classification Step Diagram ....................................................................... 46
Figure 15. Block Diagram for Mammogram Classification ........................................ 49
Figure 16. Pre-processing Algorithm ........................................................................... 50
Figure 17. Example of Median Filter (Median Filtering, 2019).................................. 51
Figure 18. Example of Otsu Thresholding (Otsu Thresholding, 2019) ....................... 51
Figure 19. Padding Algorithm ..................................................................................... 53
Figure 20. Mass Normal Dataset Folder Structure ...................................................... 54
Figure 21. Benign Malignant Dataset Folder Structure ............................................... 55
Figure 22. Dataset Splitting Flowchart ........................................................................ 57
Figure 23. CNN Model Summary ................................................................................ 58
Figure 24. CNN Classification ..................................................................................... 58
Figure 25. Original Mammogram Image with Its Corresponding Histogram: (a)
Craniocaudal View; (b) Mediolateral Oblique View ................................................... 60
Figure 26. Output of Median Filtering: (a) Original Image; (b) Median Filtered ....... 61
Figure 27. Otsu Thresholding Results ......................................................................... 62
Michael Jeremy
Figure 28. Masked Image and Generated Bounding Box ............................................ 63
Figure 29. Truncation Normalization Results .............................................................. 64
Figure 30. Synthesized Image Result........................................................................... 65
Figure 31. Final Preprocessing Output: (a) Right Breast; (b) Left Breast ................... 66
Figure 32. (a) Training Accuracy vs Evaluation Accuracy (b) Training Loss vs
Evaluation Loss ............................................................................................................ 69
Figure 33. Confusion Matrix Result of DenseNet201 CNN Testing on the “Mass
Normal Dataset” Validation Folder using INbreast data ............................................. 70
Evaluation Loss ............................................................................................................ 73
Figure 35. Confusion Matrix Result of DenseNet201 CNN Testing on the “Benign
Malignant Dataset” Validation Folder using INbreast Data ........................................ 74
Evaluation Loss ............................................................................................................ 77
Figure 37. Confusion Matrix Result of DenseNet201 CNN Testing on “Mass Normal
Dataset” Validation Folder using INbreast and Husada Data ...................................... 78
Figure 38. Training Accuracy vs Evaluation Accuracy; Training Loss vs Evaluation
Loss .............................................................................................................................. 79
Figure 39. Confusion Matrix Result of DenseNet201 CNN Testing on “Benign
Malignant Dataset” using INbreast and Husada Data .................................................. 80
Figure 40. Output using "Mass Normal Dataset" ........................................................ 82
Figure 41. Output using "Benign Malignant Dataset" ................................................. 82
Figure 42. Comparison of Current Study with Related Researches ............................ 86
Figure 43. Original image vs Pre-processed Image; Left Image(s) (original); Right
Image(s) (Pre-processed) ............................................................................................. 88
Figure 44. (left image) Original image vs (right image) Pre-processed Image ........... 88
Michael Jeremy
LIST OF TABLES
Table Page
Table 1. BI-RADS Coding ........................................................................................... 28

Table 2. Results of Other Research .............................................................................. 32
Table 3. Experiment Trials with Pre-processing.......................................................... 71
Table 4. Experiment Trials without Pre-processing..................................................... 72
Table 5. Experiment Trials with Pre-processing.......................................................... 75
Table 6. Experiment Trials without Pre-Processing .................................................... 76
Table 7. Experiment Trials .......................................................................................... 78
Table 8. Experiment Results ........................................................................................ 81
Table 9. Results Comparison to Other Research ......................................................... 85
Michael Jeremy
CHAPTER 1 - INTRODUCTION
1.1 Background
Amongst the common fatal diseases, cancer is one of the leading types of diseases that
causes millions of deaths in a year worldwide. The human body is composed of trillions
of cells that group together and perform millions of functions which makes up for a
complete individual. Cancer arises due to an accumulation of mutations that transforms
normal cells to cancerous cells. Normally, these mutations are able to be detected by
the cells. The cells then either fix the mutations or self-destruct before it becomes
cancerous. However, a mutation can grow unchecked and accumulate. When this
happens, the accumulation of mutation then causes the cell to become cancerous and
invade nearby cells and tissues. From nearby cells and tissues, the cancer can even
metastasize to distant organs.
A cancer can become incurable once it has metastasized. As mentioned above, because
the body is composed of trillions of cells, cancer cells can form anywhere within the
body. According to the National Cancer Institute, based in the year of 2013 to 2017, the
rate of cancer incidents occurred at 442.4 per 100.000 men and women per year, and
the rate of cancer deaths occurred at 158.3 per 100.000 men and women per year
(Cancer Statistics, 2020).
There are many types of cancer; lung cancer, prostate cancer, bone cancer, and many
others. However, one of the most frequent types of cancer is breast cancer. Breast
cancer is a type of cancer that occurs in the cells of the breast tissues. Though almost
all breast cancer cases occur in women, in rare cases, men can also develop breast
cancer. According to Cancer Statistics 2020, breast cancer accounts for 30% of cancer
cases with 276,480 new cases and 42,000 estimated deaths in 2020 (Siegel, Miller and
Jemal, 2020). In Indonesia alone, breast cancer incidents occur with an incident rate of
42 cases in 100,000 women, and the mortality rate is found to be higher than the average
global rate (Icanervilia et al., 2021). The primary reason for this higher-than-average
mortality rate is the lack of early detection. In other words, the cancer is more often
detected at its late stages, or after it has metastasized. As mentioned above, a cancer
Michael Jeremy
becomes almost incurable when it has already reached its late stages or has
metastasized.
Therefore, early detection of breast cancer is crucial, as breast cancer survival depends
largely on effective and affordable treatments where proper follow ups and timely
treatments can be conducted. Early detection usually leads to a more effective and less
intensive treatment (Ginsburg et al., 2020). Late stages of breast cancer will need to be
followed up with surgical incision to remove the cancer from the breast or even a
mastectomy. In response, one of the methods aiming towards early detection of breast
cancer is a widespread screening using mammography. Using mammography, X-ray
images of the breast tissues are taken and any potential cancer can be seen on the film.
A mammography screening will result in a large quantity of mammogram images, as a
single scan for a patient will produce up to 4 images. Patients will then be notified
regarding their results and the appropriate steps needed if a cancer were to be detected
in their mammogram images.
Owing to the barrage of data obtained from such screenings, it is difficult to maintain a
highly accurate diagnosis when having to manually evaluate each mammogram image
obtained from the widespread screening (Icanervilia et al., 2021). It’s also evident that
misdiagnosis and misinterpretations by radiologists can also occur. This is caused by
the fatiguing task of evaluating hundreds of images, and in addition, histopathology is
also considered a subjective analysis. In research conducted by (Waheed et al., 2019),
out of 943 samples collected, 15 breast cancers were detected. However, 7 (46.6%) out
of the 15 breast cancer cases were missed from the assessment, and 3 (43%) cases were
missed due to misinterpretations by the radiologists. In addition, Indonesia is the 4th
most populated country, but there are only roughly 1,500 radiologists and 681
pathologists throughout the country and the distribution are highly uneven, with more
radiologists being concentrated in Western Indonesia compared to Eastern Indonesia
(Ramkita and Murti, 2021).
Therefore, numerous attempts have aimed to develop a Computer-Aided Diagnosis

(CAD) to help detect the indicators of breast cancer and to improve the accuracy of a
diagnosis. CAD is designed to help radiologists analyze images and serve as a second
opinion, or supporting tool for radiologists. For example, in research conducted by (Lu,
Michael Jeremy
Loh and Huang, 2019), a CAD is developed using convolutional neural network or
CNN to classify whether a mammogram image is benign or malignant. They also
utilized various image processing methods to pre-process the image before it is fed into
the CNN. Pre-processing an image helps to prepare the images first before the CNN,
and therefore, the performance of the CNN can be much more efficient. The pre-
processing methods (Lu, Loh and Huang, 2019) used consist of median filtering to
remove noises from the image and contrast limited adaptive histogram equalization or
CLAHE to sharpen the contrast of the image. The research also used data augmentation
to give more variance to the training samples. However, they used the whole
mammography image, which also contains the background of the image. In other
words, a great number of pixels in the image are actually the background pixels, which
provide no valuable information. This may lead to proportions of pixels that do contain
the breast tissue or possibly the mass to be underrepresented.
Other than (Lu, Loh and Huang, 2019), there have also been other researches which are
aimed to develop CAD to detect breast cancer, whether from mammogram images or
histopathology slides. Various methods have been utilized, which varies from different
pre-processing methods to the different classification methods. For example, (Khuzi et
al., 2009) developed a CAD which utilizes Gray Level Co-occurrence Matrix or GLCM
to extract features from mammogram images and detecting masses in the mammogram
images based on a decision tree algorithm. (Khuzi et al., 2009) also used a pre-
processing method which include CLAHE to enhance the contrast of the image, and
then used various segmentation techniques to obtain the breast region from the images.
Therefore, this study aims to develop an image processing method to pre-process

mammogram images for breast cancer detection and classification using CNN. This
study proposes the usage of Otsu thresholding to crop a range of interest or ROI from
the mammogram image which reduces unnecessary pixels in the image, median
filtering which removes any noise in the image and enhance the quality of the image,
and sharpening the contrast using CLAHE and truncation normalization. Data
augmentation will also be used to enlarge the dataset and balance the amount of data in
the dataset. Afterwards, CNN is used to provide a positive and negative diagnosis from
the mammogram images.
Michael Jeremy
1.2 Research Problems
1. Manually evaluating large sets of images obtained from screenings is a
repetitive and fatiguing task that could result in misdiagnosis or
misinterpretations. In addition, the assessments and analysis are also subjective
and dependent on the experience of the radiologist.
2. The performance of a trained CNN model is heavily dependent on the amount

of data in the dataset, different pre-processing and classification methods.
1.3 Research Objectives

1. To obtain a CAD using image processing (median filter, Otsu threshold,
CLAHE and truncation normalization) to enhance and extract the features of
mammogram images and then feeding it to a CNN to automatically detect and
classify breast cancer.
2. To obtain an improved program that can support or assist medical practitioners

in analyzing and identifying cancers in mammogram images.
1.4 Significance of Study

This research covers the development of a CAD with the use of image processing
methods Otsu thresholding, CLAHE, median filter, truncation normalization, data
augmentation and a classification method using CNN. The program can then be used to
obtain diagnostic results from a certain dataset, and the performance of the program is
then compared to previous researches.
This research is significant with regards to the potential to use the program to help in
detecting breast cancer with a more objective analysis, and possibly help detecting
cancer at its early stages. This study is limited towards the usage of mammogram
images as the primary data for detecting breast cancers.
1.5 Research Questions

Question #1 Will the usage of cropping to obtain the ROI using Otsu
thresholding, combined with denoising using median filtering,
enhancing the contrast using CLAHE and truncation
Michael Jeremy
normalization be able to provide enhanced images to detect breast
cancers from mammogram images?
Question #2 Will the image processing Otsu thresholding, median filtering,

CLAHE, truncation normalization and data augmentation as the
pre-processing method and a convolutional neural network
provide an improved performance to detect breast cancers from
mammogram images?
1.6 Hypothesis
Hypothesis #1 The usage of Otsu thresholding to crop the ROI combined with
median filtering to denoise, CLAHE and truncation normalization
to enhance the contrast will be able to provide good images to
detect breast cancer from mammogram images.
Hypothesis #2 The pre-processing method and convolutional neural network will

be able to give an improved performance to detect breast cancer.
Michael Jeremy
CHAPTER 2 - LITERATURE REVIEW
In this chapter, everything related towards literatures, theories, and past researches will
be covered. The disease covered in this chapter focuses on cancer, specifically towards
breast cancer. This chapter also includes the stages of cancer, how to diagnose a breast
cancer, and why it is important to detect breast cancer as early as possible. Afterwards,
mammography is discussed more deeply as it is one of the available and common tools
for breast cancer detection, how a mammogram is captured, how to interpret and report
a mammography scan, and what a mammogram can show. Previous research is
discussed at length in this chapter.
Additionally, to be able to develop a program to detect breast cancer, image processing

tools will be utilized extensively to pre-process the raw data. Therefore, in this chapter,
image processing tools are also discussed, which includes the following: median
filtering, Otsu thresholding, CLAHE, truncation normalization, and data augmentation.
The Convolutional Neural Network as the detection and classification tool is also
covered in more depth in this chapter. Image processing tools such as histogram are
covered as well as it is one of the tools used to analyze and understand an image.
2.1 Cancer
The human body consists of around 30 trillion cells. These cells are connected to each
other under a very complex system. Similar cells will form tissues. Afterwards,
different tissues form together to develop the organs. Then the organs work together to
form the system organ, and then it becomes one single individual. Various functions at
different levels are integrated to form the complete individual. Nevertheless, a complete
individual will not exist without its very basic building blocks, which are the cells. In
other words, cells can simply be defined as the basic building blocks of a person. And
under an interdependent complex system, cells regulate each other for its proliferation
(Weinberg, 1996).
Michael Jeremy
Proliferation of cells are carefully regulated and maintained by growth factors. Growth
factors will generate signals to promote a coordinated entry into the cell cycle (Yang,
Ray and Krafts, 2014). Normally, cells proliferate or reproduce only when it has the
instruction to proliferate from nearby cells within their vicinity (Weinberg, 1996). This
ensures that the size and architecture of the tissues that the body needs are well
maintained and regulated. However, mutations may occur and may lead towards
potential cancerous cells that can invade surrounding cells. The mutation that occurs
causes violations towards the regulating system, where the cell no longer follows
proliferation instructions and reproduce outside the body’s control. These cells are
called cancer cells.
Cancer cells does not follow the normal regulation followed by normal cells. Not only
does a cancer cell proliferate under its own regulation, it also possesses insidious
properties. In other words, cancer cells can migrate and invade nearby cells and tissues.
When this happens, it will form masses, and these masses can migrate from one organ
to another organ. When left untreated, cancer will become more and more dangerous,
and lethal after a certain threshold of time.
Over the years, scientists have discovered that cells in a tumor descend from a common
ancestral cell (Weinberg, 1996). This ancestral cell will initiate the unregulated
proliferation of cells, and will usually decay before the tumor is palpable. It is
mentioned before that the violation is caused by a mutation that occurs in the cell. This
mutation is the key to understand how cancer arises. Within the brain of the cell, the
nucleus carries chromosomes which contains the genes. These genes are utilized to
produce proteins which are essential for the human body. Specific gene sequences will
encode to a specific protein synthesis when they are turned on.
There are two gene classes that play a major role towards the occurrence of cancer,
proto-oncogenes and tumor suppressor (Weinberg, 1996). Proto-oncogenes drives the
growth of cells, whereas tumor suppressor inhibits the growth. Mutations in the proto-
oncogenes will result in carcinogenic oncogenes which drives excessive growth of
cells. The mutation forces the gene to yield its encoded protein which stimulates
growth. On the other hand, mutations in the tumor suppressor genes will cause the gene
Michael Jeremy
the deactivate. The deactivation of this gene will result in the loss of function of the cell
to suppress its growth.
Figure 1. Stages of Tumor Development (Kanwal, 2013)
The development of tumor occurs in different stages. The stages of tumor can be seen
in Figure 1, where a simple schematic is presented to demonstrate how tumor develops
in the epithelial cells. As mentioned before, a cancer arises when a genetic mutation
occurs within a cell, which is indicated by the genetically altered epithelial cell. This
mutation then increases the cell’s tendency to reproduce and proliferate, even though at
the time it normally wouldn’t proliferate (Weinberg, 1996). The genetically altered cell
will reproduce and will create its offspring. Although these cells appear normal, they
now have an abnormal rate of proliferation. This condition is called as hyperplasia,
indicated by the letter A in Figure 1.
Over time, the cells will accumulate its mutation which increases the rate of
proliferation. The offspring which are the genetically altered cells produced by the
ancestor cell will then change form and the shape and orientation of these cells becomes
abnormal. This stage is called dysplasia, indicated by the letter B in Figure 1.
Afterwards, the affected cells will continue to change its form and appearance, and also
increase its abnormal rate of proliferation. In situ cancer is the stage where the cancer
Michael Jeremy
has not yet broken through the tissues. Once the cancer cells invade the underlying
tissue and shed into the blood or lymph, the cancer is now a malignant tumor or cancer.
The cancer cells can then form new tumors at distant sites in the body through the
renegade cells which are shed from the originating tumor. This is called metastases.
And when the cancer disrupts a vital organ, it may become lethal.
2.2 Breast Cancer

Cancer can originate anywhere within our body. Almost every cell in our body has the
potential to be cancer cells or cancerous. One of the most frequent types of cancer to
occur today is breast cancer, which is simply referring to cancers that originates in the
breast tissues of an individual. Breast cancer is one of the leading causes of deaths
among women worldwide. Breast cancer incident rates and mortality rate are higher in
developing countries, whereas incident rates and mortality rates are found to be lower
in developed countries (Key, Verkasalo and Banks, 2001). One of the major
contributing factors towards the occurrence of breast cancer is age, where incident rate
increases rapidly in response to increasing age. It starts rapidly increasing during the
reproductive years and continues to increase at a slower rate after the age of 50.
According to Cancer Statistics 2020, breast cancer accounts for 30% of cancer cases in
2020, with 276,480 new cases and 42,000 estimated deaths (Siegel, Miller and Jemal,
2020). In Indonesia alone, breast cancer occurs with an incident rate of 42 cases in
100.000 women, and according to the study by (Icanervilia et al., 2021), the mortality
rate in Indonesia is found to be higher than the average global rate.
Breast cancer is considered a heterogeneous disease on its molecular level. Despite the
permanency molecular heterogeneity of breast cancers, some features are shared and
may also influence its therapy (Harbeck et al., 2019). Early breast cancers are
considered to be still curable, whereas metastasized breast cancer is not considered
curable with the currently available therapies. Most therapies that deal with
metastasized breast cancers aim to prolong the patient’s survival and control the
disease’s symptoms.
The classification of breast cancer reported by (Perou et al., 2000) can be divided into
four subtypes of breast cancer: luminal A, luminal B, basal-like and human epidermal
Michael Jeremy
growth factor receptor 2 (HER2)-enriched. Luminal A and luminal B types express the
estrogen receptor (ER), whereas basal-like and HER2 types are without the expression
of ER. Through this classification, clinical management of breast cancer has been
shifted towards biology-centered approaches (Harbeck et al., 2019). As of today, the
classification is categorized into five subtypes based on its histological and molecular
characteristics. Hormone receptor-positive breast cancers are breast cancers that its
tumors express ER and/or progesterone receptor (PR). Breast cancers which does not
express ER, PR, or HER2 are classified into the triple-negative breast cancer or TNBC
(Harbeck et al., 2019).
Many various treatments are currently being used to manage breast cancers. These
therapies include endocrine therapy, chemotherapy, anti-HER2 therapy, polymerase
inhibitors, immunotherapy, and many others. Therapies that are under development are
directed towards individualizing the therapy and managing the treatment based on the
tumor’s biology and its response towards the therapy. However, aside from the
challenges in developing the therapy, it remains a major obstacle to create an innovative
treatment that is able to be widely accessed worldwide.
2.3 Diagnosis of Breast Cancer

Before conducting any sort of treatments mentioned in the previous section, the first
step of any treatment is the diagnosis of the disease itself. Symptoms that indicate the
requirement for proper diagnostic evaluation include breast changes such as, lump,
localized pain, nipple symptoms or skin changes. Women that are recalled due to a
positive screening mammography are also required to undergo proper diagnostic
evaluation.
Additional tests are often conducted to find or diagnose breast cancer. However,
diagnosing a breast cancer is conducted based on a triple test. The triple test is a
recommended diagnostic approach for doctors to conduct on a patient, where it includes
clinical examination, imaging (either using mammography or ultrasonography) and
needle biopsy (Irwig, Macaskill and Houssami, 2002). Through proper diagnosis, it will
increase the accuracy of distinguishing patients that have breast cancer from patients
that have benign conditions or normal breast changes. After a proper diagnosis, it can
be assured that properly managed follow-up treatments can be conducted.
Michael Jeremy
Early detection of breast cancer increases a patient’s chance of survival rapidly.
Therefore, it has been a hot topic to use mammography to perform screening tests with
the aim to find any signs of potential breast cancers or early stages of breast cancer.
Afterwards, proper treatment can be recommended based on the findings from the
screening.
2.4 Breast Cancer Diagnosis through Mammography

As mentioned in the previous sections, mammography examination is one of the
available techniques currently that is used to detect the presence of breast cancer. A
mammography is a medical radiology imaging equipment which is used to take images
of the breast using low-doses of X-rays. A mammography scan will then produce X-
ray images of the breast, which are called mammograms.
As mentioned in Section 2.2, a tumor will become more and more lethal as time
progresses, and in addition, advanced breast cancer is also considered incurable.
Therefore, early detection of breast cancer is very crucial in increasing the patient’s rate
of survival. Mammograms allows for visualization of changes within the breast tissues.
That means that through mammography scans, early signs of breast cancers can be
detected.
A mammography has 2 types of main applications: screening mammograms and

diagnostic mammograms (Mammograms, 2019). A screening mammogram is a
procedure performed to search for any signs of breast cancer. Screening mammograms
are done to all women, even to women that does not have any symptoms. Whereas
diagnostic mammograms are used in women that have symptoms, such as a lump, pain
or nipple discharge. Diagnostic mammograms are also performed when a person is
recalled due to the result of their screening mammograms. The image results from a
mammogram are then assessed and analyzed by a radiologist.
2.4.1 Standard Views of Mammogram Images

A mammography scan typically results in 4 different images of the breast, which
consists of 2 different views on each breast. The name of the image is based on the
location and direction of the X-ray beam, where there are 2 different types of view:
Craniocaudal view or CC, and Mediolateral Oblique view or MLO.
Michael Jeremy
A CC view is obtained by passing through an X-ray beam from the cranial or superior
aspect of the breast (from the top) towards the patient’s feet. The breast is placed in
compression, and the X-ray beam is initiated from above the breast towards the image
receptor which is placed beneath the breast. A CC view of a mammogram image can
be seen in Figure 2, where 2 images are obtained which represents the right breast (A)
and the left breast (B) respectively. The upper aspect of the breast in both images
represents the outer part of the breast, whereas the lower aspect of the breast represents
the inner part of the breast.
In an MLO view image, the breast is placed under compression at a 45˚ angle, and the
image is obtained by passing through an X-ray beam from the medial aspect of the
breast towards the lateral aspect of the breast. The result of an MLO view of a
mammogram image can also be seen in Figure 2, where 2 images are also obtained
which represents the right breast (C) and the left breast (D) respectively. In an MLO
view, the upper part of the breast in the image represents the superior aspect of the
breast, and the lower part of the breast in the image represents the inferior aspect of the
breast.
Figure 2. Craniocaudal (CC) and Medial Lateral Oblique (MLO) View
These 4 different images are used to determine the location of a finding. If for example,
a mass is found in the top right of the breast in image D and the bottom part of the breast
in image B. Hence, it can be concluded that a mass is located in the left breast,
specifically in the upper and inner left breast.
Michael Jeremy
2.4.2 Patient Positioning
Patient positioning must also be considered, as it’s an essential step before performing
any mammogram scans. The breast of the patient must be pulled out far enough from
the chest wall to make sure that all of the breast is being imaged and examined. To
ensure this, the image must show the pectoralis muscle on the CC and MLO view, and
the distance from the pectoralis muscle to the nipple between the images from the CC
and MLO view of each breast must not differ more than 1 cm. Differences more than 1
cm suggests that one of the images have captured more of the breast than the other
image, and some parts of the breast is missing in one of those images.
The nipple must also be shown in at least one of the images from each breast, either the
CC view or the MLO view image. This ensures that the scan have adequately imaged
the retroareolar region of the breast; which refers to the region within two cm from the
nipple. This is done to avoid confusion from the nipple with a potential underlying
mass. The nipple on the MLO view must also be angled slightly upward, to ensure that
the entire breast is being lifted, and the inferior portion of the breast can be examined
fully. The MLO view must also adequately capture the inframammary fold to ensure
that the breast is fully captured in the posterior and inferior region.
2.4.3 Image Interpretation

A mammogram image is first evaluated based on the tissue density of the breast.
Afterwards, a mammogram is then examined and evaluated in search for any masses,
calcifications, and architectural distortions. Presence of these findings may suggest the
presence of breast cancer within an image.
2.4.3.1 Tissue Density

Tissue density can be defined as the percentage of glandular tissue in relative to breast
fat within the breast. Glandular tissue refers to the part of the breast that contains ducts
and lobules where milk is produced if a woman were to be lactating. A glandular tissue
is typically represented as a white colour within a mammogram image. Breast fat is
typically represented as a grey colour on a mammogram image. A mammogram image
has a spectrum of proportion that is categorized into four different categories: fatty,
scattered fibroglandular, heterogeneously dense, and dense, which can be seen in Figure
3. The far most left mammogram image represents what is called a predominantly fatty
Michael Jeremy
breast. This is indicated by the absence of white glandular tissues within the breast, and
the mammogram image is predominantly grey or black in colour. The second image
shows a mammogram image categorized into the scattered fibroglandular type, where
the type is indicated by small white areas within the breast that represents small areas
of fibroglandular tissues. The next type of breast is a heterogeneously dense breast. This
type of breast is indicated by the spike of white glandular tissues frequency which is
scattered throughout the breast. And the last type of breast is the dense type of breast,
where unlike fatty breast, a dense breast is predominantly composed of the white
fibroglandular tissues.
Figure 3. Different Types of Breast Density in Mammogram Images (Orr, 2020)
As a breast increases in its fibroglandular density, the chance of an underlying breast

cancer to be missed from the scan or from the evaluation also increases. The main
reason behind this occurrence is due to same colour representation of fibroglandular
tissues and a breast cancer in a mammography scan. The higher quantity of white areas
present due to fibroglandular tissues, the higher the possibility for it to mask an
underlying tumor due to the same colour representation.
2.4.3.2 Mass
A mass can be defined as a space occupying lesion that can be seen within on both of
the projections obtained from a mammography scan, which are the CC view and the
MLO view. An example of a mass can be seen in Figure 4, where a mass is found on
the upper middle part of the left breast.
Michael Jeremy
Figure 4. Example of a Mass Found in a Mammogram Image
When encountering a finding that is also present on both the CC view and the MLO
view, but the finding does not have as discrete borders when compared to a mass, it is
termed as a focal asymmetry. Findings that are only present in one view are termed as
asymmetry.
2.4.3.3 Calcifications
Calcifications are defined as calcium deposits that lie within the breast. Calcifications
can occur due to various reasons, and it can be both benign and malignant.
Calcifications are represented as white dots on the mammogram image. Within the
breast tissue, ducts branch into smaller ducts and ductules, and end in a bud-like
structure called the Terminal Ductal Lobular Unit or TDLU as seen in Figure 5. It is
here in the TDLU that milk is produced and transported through the ducts towards the
nipple when a woman is lactating. It is in the TDLU also where predominant breast
cancers and calcifications form.
Michael Jeremy
Figure 5. Schematic of a Breast (Thomas, 2010)
Despite not lactating, fluid may be present within the TDLU, and the fluid can calcify.
The calcification layer within the fluid and these calcifications are considered benign
calcifications. Calcifications that are linear and branching in the ducts are associated
with malignancy and are classic signs of breast cancers.
2.4.3.4 Architectural Distortions

Architectural distortions are used as a secondary sign of breast cancer within the breast.
These distortions are indicated by the “pulling in” of the tissues around the cancer. This
can be seen in the left image in Figure 4, where a “sunburst” like pulling is present
around the mass. Signs of architectural distortions must also have an underlying
explanation as to why there is an area of pulling in. Evaluation must provide a clear
conclusion as to whether the “pulling in” structure is associated with changes of the
breast tissue due to the presence of breast cancer, or whether the distortion is related to
any prior surgery or trauma to the breast in this location.
2.4.4 Breast Imaging Reporting and Data System

To report a mammogram scan and study, the “Breast Imaging Reporting and Data
System” or abbreviated as BI-RADS standard is widely used as reference on how a
report or study is dictated and how the result is encoded. The BI-RADS standard
provides recommended words and terms to describe findings on mammogram images.
This is done so any person, presumably doctors or radiologists, understand whether a
finding is considered concerning or not. For example, a benign mass might be indicated
by the words "round”, “circumscribed”, “absence of vascularity”. The BI-RADS
Michael Jeremy
standard also provides a BI-RADS Coding scheme that is required for each report made
on a scan and study.
Table 1. BI-RADS Coding

BI-RADS
Assessment Meaning Risk of Cancer
category
0 Assessment is incomplete Not applicable
1 Negative Negligible
2 Benign Finding Negligible
3 Probably Benign < 2%
4 Suspicious Abnormality 23 – 34%
5 Highly Suggestive of Malignancy ≥ 95%
6 Biopsy Proven Malignancy 100%
The coding scheme can be seen in Table 1. A mammogram scan coded with BI-RADS
0 indicates that the scan or assessment is incomplete, and the patient must be recalled
for additional imaging, or prior images can be obtained to compare to the current
images. The BI-RADS code 1 is used when a mammogram scan shows no findings in
the images. The BI-RADS code 2 is used when a scan shows a finding, but the finding
is certain to not be cancerous. An example of this is the finding of a lymph node present
within the breast, but not cancer.
BI-RADS code 3 is used when a scan shows a finding, but evaluation of the image is
not certain whether the image is cancerous or not. The usage of this code is usually
followed by a recommendation for a short interval follow-up, ranging from 1 month to
6 months. When a scan is coded with BI-RADS 4, the evaluation suggests that the
finding is of concern, and it requires to be further biopsied.
BI-RADS 4 is further categorized into three sub-categories, BI-RADS 4a, 4b, 4c, which
increases in its concernment respectively. BI-RADS 5 is a code used to indicate that the
evaluation of the image shows a finding that is highly suggestive of cancer or
malignancy. Scans encoded with BI-RADS 5 will usually be followed by further
biopsies, and even surgery. BI-RADS 6 is used when the patient is already confirmed
to have breast cancer at the current time of the scan, and is yet to be treated.
Michael Jeremy
Images from a mammography scan can then be stored either in a disk for analog
mammograms, or in a computer for digital mammograms. The most common format
for a mammogram file is the DICOM format.
2.5 Previously Proposed Methods

Development of a CAD dates back as far back as in the 1980s. A CAD system is a term
used to define computer systems that are built with the goal of providing assistive
support in detecting or diagnosing diseases among patients. A CAD system currently is
mostly intended to serve as a “second opinion” towards diagnosis and detection, where
it could improve on the accuracy of assessment by radiologists and also decreases the
time usually needed to do a full interpretation and assessment (Firmino et al., 2016).
CAD systems are categorized into two categories: Computer Aided Detection (CADe)
systems and Computer Aided Diagnosis (CADx) systems.
CADe systems are made in with the functional ability to detect the presence and
location of a lesion within an image. Whereas CADx systems are made with the
functional ability to perform characterization of the lesions, and could even make a
distinction and classification on malignancy and benign (Firmino et al., 2016). There
have been various CAD systems that have been developed for a variety of diseases,
ranging from brain tumours, pulmonary nodules, malaria diagnosis, and many others.
However, it is a current trend for CAD to be developed with the purpose of detecting
cancers, including breast cancers.
Different various algorithms and tools have been addressed in numerous researches and
papers to develop a CAD for automatic breast cancer detection. These papers used
either one of two samples: histopathology slide samples or mammogram images.
(Bandyopadhyay, Maitra and Kim, 2011) conducted a study on various image
processing methods that can be utilized for future studies. The paper went through
image processing methods such as, histogram equalization, thresholding, colour
quantization.
(Khuzi et al., 2009) used Grey Level Co-occurrence Matrix or GLCM to extract
features from mammogram images that were obtained from a publicly available MIAS
database. The extraction was done by the GLCM from four different angles: 0, 45, 90,
Michael Jeremy
135 and a block size of 8x8. These parameters showed significant texture information
extraction for mass and non-mass classification. The images were loaded into an
algorithm that held the mammogram images as an 8 bit greyscaled images with levels
from 0 – 225.
There have been numerous studies also in using artificial neural networks, such as CNN
as a way for automated classification and detection of breast cancer either using
histopathologic images or mammogram images. For example, (Wei et al., 2017)
proposed the usage of CNN for identifying and classifying histopathological images of
breast cancer. The study used data augmentation to balance and enlarge the data to
improve on the accuracy of training and validation.
(Zhou, Zaninovich and Gregory, 2017) has conducted a study where they utilized CNN
to detect masses and calcifications in mammogram image. This study only utilized data
augmentation to improve the dataset for CNN training, where the images were rotated
90, 180 and 270 degrees. (Zhou, Zaninovich and Gregory, 2017) specifically focused
on how they could minimize the risk of overfitting towards the CNN training. From
their experiment, their method achieved an accuracy of: 0.609 in classifying benign and
malignant classes, 1 in detecting calcifications, and 0.75 in detecting masses. Their
experiment was conducted using a Macbook with dual core 2.7Ghz Intel i5 processor.
An alternative method has also been proposed in another study conducted by (Lu, Loh
and Huang, 2019), where they adopted the usage of median filtering and CLAHE to
denoise and enhance the contrast of the mammogram images. Then the images were
loaded into a CNN of 13 convolutional layers and 4 pooling layers. Adam optimizer
was also used, and the learning rate was set to 0.0001. The CNN was used to classify
an image into either benign or malignant. The proposed method reached an accuracy of
82.3% in the testing dataset, and a sensitivity, specificity and F1 score of 0.91, 0.57,
and 0.88 respectively. The experiment was conducted on a computer with RTX 2080TI
GPU, and Intel i7-7900X CPU, along with 11GB memory.
(Wang et al., 2019) proposed a CAD for mass detection in mammogram images using
Extreme Learning Machine (ELM) or Support Vector Machine (SVM) based on a fused
feature set using CNN deep features. The study used pre-processing methods adaptive
Michael Jeremy
mean filter algorithm to eliminate any noise present within the original image. After
noise reduction, the study used dynamic histogram equalization to enhance the contrast
of the image. Then CNN was used to extract features from the images. These features
are called deep features. The study also extracted morphological features, texture
features, and density features from the images. (Wang et al., 2019) conducted extensive
experiments where ELM or SVM was used along with the features obtained, and the
highest accuracy reached was 0.865 where the trial used the ELM classifier with
multiple features which consist of CNN deep feature combined with morphological
features, density features and texture features.
The study conducted by (El Houby and Yassin, 2021) used CNN for malignant and
non-malignant classification of breast lesions in mammogram images. This study
presents a CNN model which is built from scratch and designed to be able to learn
features from mammogram images. The pre-processing methods is similar to that of
(Lu, Loh and Huang, 2019), but differs in the usage of ROI.
(Lu, Loh and Huang, 2019) used the whole mammogram image in their study, which
does not minimize the amount of background pixels of the mammogram image,
whereas (El Houby and Yassin, 2021) used a simple algorithm to crop the ROI (breast
region) from the image, thereby reducing the amount of background pixels. Data
augmentation was also used in this study, due to the unbalance of data in each class.
The proposed method reached an accuracy of 0.965 for images that have the ROIs
annotated. An accuracy of 0.93 was achieved for images that are not annotated. This
accuracy was achieved using the INbreast dataset which is the same dataset used in this
thesis. The experiment with the MIAS and DDSM dataset reached an accuracy of 0.934
and 0.912 respectively.
(Zhang et al., 2021) proposed another method through the combination two advanced
neural network of CNN and Graph Convolutional Network or GCN. A standard 8-
layered CNN is combined with a 2-layered GCN. The images were obtained from the
MIAS dataset and was processed with data augmentation prior to the CNN-GCN model.
The proposed method yielded an accuracy of 0.961. The mentioned studies can be seen
in Table 2, comparing the achieved accuracy, sensitivity, precision and specificity.
Michael Jeremy
Table 2. Results of Other Research
Classification
Model Accuracy Sensitivity Precision Specificity
Classes
Combination of CNN
with Graph
Convolutional 0.961 0.962 - 0.960
Network (Zhang et al.,
2021)
CNN with median
filtering, CLAHE, ROI
cropping and data
0.930 0.948 0.917 0.912
augmentation (El
Houby and Yassin,
2021)
Extreme Machine
2-class: Learning and CNN
Benign, Deep Features with
Malignant adaptive mean filtering 0.865 0.851 0.845 0.880
and dynamic histogram
equalization (Wang et
al., 2019)
CNN with pre-
processing method
median filtering and 0.823 0.913 0.856 0.569
CLAHE (Lu, Loh and
Huang, 2019)
CNN with pre-
processing data
augmentation (Zhou, 0.75 0.5 0.714 -
Zaninovich and
Gregory, 2017)
2.6 Image Pre-processing

A digital image can be defined as an array or a matrix. An array of an image consists
of pixels which are arranged in columns and rows. Like the cell being the building of
Michael Jeremy
the body, these pixels are the building blocks of an image. Each pixel within an image
contains information about intensity and colour. Then these pixels group together in
columns and rows in a way that it creates a whole image.
An image can be represented as a 2D function F(x,y), where x and y are the spatial
coordinates (Khandelwal, 2021). When represented as a 3D array, it is commonly
known as an RGB image or colour images. A colour image consists of 3 layers of 2D
images, where these layers are defined as channels. A typical colour image usually
consists of 3 layers which are red, green, and blue channels, as depicted in Figure 6,
where the channels conform together to form a complete colour image. For greyscaled
images, it only consists of one single channel, where each pixel in that channel has an
intensity of 0 to 255. In a greyscaled image, a pixel containing an intensity of 0 will be
represented as the colour black, where as a pixel with an intensity of 255 will be
represented as the colour white.
Figure 6. A Digital Image (Khandelwal, 2021)
Image processing can be defined as the utilization of various methods and techniques
to take an image, and put it under a process to either enhance, restore, degrade, and
others depending on the objective that is desired. Image processing is used with the goal
of processing an image in a way that the image could be used for further analysis or
decision making. There are many image processing methods, for example Gaussian
filters that can be used to reduce the noise present within an image, fourier transforms
which breaks down an image into sine and cosine components, which can be used for
Michael Jeremy
image reconstruction, Canny Edge which detects the edges of an image, thresholding
techniques that can be used to mitigate unnecessary pixels, and many others.
These methods are all methods that are utilized to process a given image, and the
methods are selected based on the necessary ones needed for the objective of the
processing. In a CAD, image processing is used as a pre-processing step to improve,
sharpen or enhance the image before it is inputted into a machine learning, classifier or
even an artificial neural network.
The previously mentioned truncation normalization, Otsu thresholding, median

filtering, CLAHE, and data augmentation are all part of an image processing tools that
can be used as a pre-processing step, used to enhance the quality of the training dataset.
Each of these methods are selected to enhance certain features of an image, mitigate
unnecessary information, or extract the information needed from an image to classify
it. Selecting the appropriate tools is an important step in a pre-processing step.
Inappropriate tools may lead to the loss of important information and data from the
image, which may hinder the performance of a program. Selecting the appropriate tools
isn’t only based on enhancing the quality of an image, but also reducing it. Reduction
of an image quality may also improve a model’s performance through faster
computation time, or reducing the strain on the processor of the computer. The methods
used in this thesis will be discussed in the following chapters.
2.6.1 Otsu Thresholding

OpenCV is one of the most widely used computer vision tool that is usually used for
image processing. It is an open-source computer vision and machine learning library
that is used in applications of computer vision. OpenCV was developed to provide the
base infrastructure for computer vision applications that is common to all programmers
around the world. The library contains more than 2500 optimized algorithms (About
OpenCV, no date). OpenCV contains algorithms that can be used for face detection,
object identification, classifications, motion or object tracking, and other applications.
It includes C++, Python, Java and Matlab interfaces and is supported in Windows,
Linux, Android and MacOS. One technique that is available in OpenCV is the
thresholding technique.
Michael Jeremy
Thresholding technique can be defined as the assignment of pixel values within an
image that is determined by the provided threshold value. Thresholding technique is
very commonly used as a segmentation technique, where it can separate the object in
an image from its background. As explained in Section 2.6, an image is an array of
pixels, with each pixel containing its own value and information. Using the thresholding
technique, a pixel value within an image is compared to a given threshold value. In a
simple thresholding in python, if a pixel value is smaller than the given threshold value,
the pixel value will then be set to 0. On the contrary, if a pixel value is higher than a
threshold value, the pixel value will then be set to the maximum value, which is
generally 255. Due to the nature of the thresholding technique, this method is applied
to only greyscaled images. If a given image is not in a greyscale colour space, it must
first be converted into a greyscale colour space.
In pseudocode, thresholding can be interpreted as:
If f (x,y) < T
Then f (x,y) = 0
Else
f (x,y) = 255
where:
f (x,y) = Coordinate Pixel Value
T = Threshold value
In its application, the thresholding algorithm is written as cv2.threshold() function

in OpenCV python. The syntax is written as cv2.threshold(source,
thresholdValue, maxVal, thresholdingTechnique). In simple thresholding, there
are various thresholding techniques available. However, regardless of the technique, in

simple thresholding, the threshold value used will remain constant throughout its
application towards the image.
Another type of thresholding is the adaptive thresholding, which is a thresholding

technique where the threshold value is calculated for smaller regions in an image. In
other words, unlike with the simple thresholding, an adaptive thresholding’s threshold
value will not remain constant. The function of adaptive thresholding is
Michael Jeremy
cv2.adaptiveTHreshold(source, maxVal, adaptiveMethod, thresholdType,
blockSize, constant).
While simple thresholding uses a global value for the threshold value, and adaptive
thresholding use a threshold value that will be different for different regions in respect
to the lighting of an image, and in Otsu thresholding, the value of the threshold isn’t
chosen but determined through an automated process.
The function of an Otsu thresholding is the normal cv2.threshold() function.

However, an additional flag cv2.THRESH_OTSU is used at the end of the function.
Therefore, the syntax in python for Otsu thresholding is cv2.threshold(source,
thresholdValue, maxVal, thresholdTechnique + cv2.THRESH_OTSU).
2.6.2 Median Filtering

Pre-processing methods may include the usage of filters. Filtering methods are used to
remove noises from an image, and it can be done through various methods as there are
many different filters available. A median filter is a non-linear filter method that allows
the edges of an image to be preserved. The median filter removes the noises by
replacing each pixel value with the median of its neighbouring pixel (Lu, Loh and
Huang, 2019). A representation of the median filter mechanism can be seen in Figure 7
below.
Figure 7. Median Filtering Process (Lu, Loh and Huang, 2019)
Michael Jeremy
2.6.3 Contrast Limited Adaptive Histogram Equalization
An image histogram is a type of histogram that allows the visualization of the tonal
distribution in graphical representation. By getting the histogram of an image, the
quantity of pixel holding a certain value of intensity can be seen. An image histogram
can also be used to see how an image processing does towards the intensity pixel
distribution of an image, and the comparison before and after an image processing
method is applied. An example of an image and the corresponding image histogram can
be seen in the top graph of Figure 8, where the X axis represents the intensity value of
a pixel between 0 - 255, and the Y axis represents the frequency of an intensity value.
In a gray scaled image, the graph only has 1 channel. It can be seen that the original
image has poor contrast and poor quality. The original image can be enhanced and
histogram equalization is commonly used to enhance an image. A histogram
equalization will stretch the intensity distribution throughout the X axis, the resulting
effect on the graph and the image can be seen also in Figure 8 on the bottom graph.
Figure 8. Image Histogram and Histogram Equalization (Ghr, 2022)
In mammogram images, it is crucial that the model be able to distinguish the lesion
from the tissue of the breast. And this can be done through histogram equalization
where the contrast between the lesion from the surrounding breast tissue will be more
Michael Jeremy
distinguished. Adaptive Histogram Equalization or AHE is one of the techniques that
could be used to enhance the contrast between a lesion from the other breast tissues. A
histogram equalization stretches the intensity, thereby increasing the contrast.
However, the intensity distribution in a mammogram image is confined within a very
small interval. Stretching the intensity using histogram equalization may lead towards
the introduction of numerous noises into the image, which greatly reduce the image
quality.
Therefore, Contrast Limited Adaptive Histogram Equalization or abbreviated as

CLAHE is used, instead of the normal AHE. CLAHE provides a good enhancement on
the medical image and have been used in numerous studies. It is used as an amplifier
towards the contrast of an image. CLAHE has its advantages due to the reduction of
noise amplification that may occur when using other histogram equalization technique.
It reduces noise amplification by limiting the slope of cumulative distribution function
or CDF, and clipping the histogram before the CDF computation (Lu, Loh and Huang,
2019).
2.6.4 Truncation Normalization

Truncation Normalization is another tool that has the function to enhance the contrast
of mammogram images. In a mammogram image, it contains a lot of pixels that are
meaningless, where their values represent the background, not the value of the breast
itself (zero value regions), even after cropping the range of interest within the
mammogram image. Using the truncation normalization method, the breast mass region
can be greatly highlighted.
The truncation normalization consists of two processes: truncation processing and

normalizing. The truncation process sorts pixels whose value is greater than zero in an
ascending order. They use a pair of value (maximum intensity value or Pmax and
minimum intensity value or Pmin) from the mammogram image to cut of the intensity
of the image, and then the image is normalized. The Pmax value is the 1% largest values
of the image intensity, while the Pmin value is the 5% smallest values of the image
intensity.
Michael Jeremy
The process of truncation process sorts under three conditions. If the intensity (P) of a
pixel is lower than or equal to Pmin, then the value is set to Pmin. Otherwise, if P is
greater than Pmin, but lower than Pmax, then the pixel intensity value is set to P.
Otherwise, if P is greater than or equal to Pmax, the pixel intensity value is set to Pmax.
Under these three conditions, the original intensity value from the original image can
be retained prior to the normalization process, and it reduces the amount of noise
influence in the image. The truncation process can be represented in Equation 1 below:
(1)
Afterwards, the normalization process can be applied to the truncated image, and the
normalization formula can be seen in Equation 2 below:
(2)
2.6.5 Data Augmentation

Data augmentation is widely used when a small dataset is used or when an unbalanced
data structure is present. Through data augmentation, it utilizes the available training
data and either rotate, flip, zoom, crop, and other processes to enlarge the dataset. The
research conducted by (Lu, Loh and Huang, 2019) stated that the data augmentation
was used because the amount of benign data is far more larger than the malignant data.
Therefore, by flipping the available malignant tissue images horizontally and vertically,
a larger malignant tissue data was obtained. Thereby, the development of the model has
a balanced data for its training.
A balanced data reduces the risk of overfitting when training a model (Zhou,
Zaninovich and Gregory, 2017). There are several methods available for data
augmentation. This study utilizes the augmentor library, which can be imported in
python through the command import augmentor. The augmentor pipeline then saves
the augmented image results to the computer in an output folder created within the
Michael Jeremy
source folder. The target quantity of images produced is inputted in the sample
processing.
2.6.6 Neural Networks

Deep learning or Deep Neural Networks is one of the most powerful and popular tools
to handle large amount of data. Deep learning or Deep Neural Networks are Artificial
Neural Networks, which are neurons or nodes connected through multiple layers. These
neural networks are used in many applications, such as, speech recognition, text
recognition, and especially in pattern recognition.
There are three types of neural networks: Artificial Neural Network or ANN,
Convolutional Neural Network or CNN, and Recurrent Neural Network or RNN. CNN
is widely used for image classification, where it acts as an object recognizer and
classifier from an input image and predicts or classify based on the input given to the
CNN. CNN has three main layers, which are the convolutional layer, pooling layer, and
the fully connected layer.
The convolutional layer is the first layer of CNN. It can then either be followed by
another convolutional layer or a pooling layer. Nevertheless, the fully connected layer
is the last layer of the CNN. The complexity of CNN increases proportionally to each
layer (IBM Cloud Education, 2020), where the first layers will extract the simplest
features of an image, and further down the layer, the CNN will recognize larger
components of the image.
The majority of the computation will be done in the convolutional layer. The
convolutional layer uses a filter which is in the form of a matrix and the filter will then
be applied to areas of an image. The filter is typically a 3x3 matrix with fixed weights.
The pixels of the image and the filter will then be multiplied through a dot product.
After the computation of a certain area, the filter will then shift to another area of an
image. The shift depends on the stride given, and the filter or kernel will keep shifting
until the entire image is covered. The final output array of the computation is known as
the feature map. The schematic representation of this computation can be seen in Figure
9.
Michael Jeremy
Figure 9. Convolutional Layer Mechanism (IBM Cloud Education, 2020)
The pooling layer reduces the number of parameters in the input by reducing the
dimensionality of the input (Albawi, Mohammed and Al-Zawi, 2017). The process is
similar to the convolutional layer where a filter is swept across the entire input.
However, unlike the convolutional layer, the filter used in the pooling layer does not
include any weights. Instead, the filter will aggregate the values within a selected area,
which will then be put onto the output array (IBM Cloud Education, 2020). There are
two types of pooling: max pooling and average pooling.
Max pooling selects the pixel with the highest value and that given value is sent towards
the output array. Meanwhile, the average pooling calculates the average value within a
selected area and the averaged value is sent towards the output array. Based on the
mechanism of the pooling layer, this layer is known as the down sampling operation,
where lots of information of an image is lost (Albawi, Mohammed and Al-Zawi, 2017).
In return from the loss of information, the pooling layer helps in reducing the
complexity of the image, improving efficiency and limiting the risk of overfitting.
Michael Jeremy
Figure 10. A Fully Connected Layer (Charlotte Pelletier, no date)
The fully connected layer can be visualized as seen in Figure 10 where it is the final
layer of CNN. Each node from the output layer is connected directly to a node from the
previous layer. This differs from the convolutional layer and the pooling layer where
the input image are not directly connected to the output layer, or partially connected
(IBM Cloud Education, 2020). Fully connected layers typically use a softmax activation
function to classify or detect an input appropriately. This will result in a probability
from 0 to 1. Convolutional layers and pooling layers typically use the ReLu functions.
There are several popular architectures of CNN, which includes LeNet-5, AlexNet,
VGGNet, GoogLeNet, ResNet, and ZFNet. This study uses the DenseNet-201
Convolutional Neural Network which is a CNN with 201 layers deep (Huang et al.,
2017). DenseNet utilizes a dense connection between its layers, which decreases
information loss. A common problem arises with CNN as the layers go deeper, where
the path for the input information becomes too wide that the information gets lost in the
process. DenseNet counters this problem through a maximum flow of information by
connecting each layer to each other.
2.7 Confusion Matrix

The confusion matrix is a method to assess the performance of a model in classifying
each class correctly and incorrectly. In a binary classification, the confusion matrix
consists of a 2x2 square with the value True Positive (TP), True Negative (TN), False
Positive (FP), and False Negative (FN). With these 4 parameters, it grants quantitative
analysis regarding how the program performed in classifying a set of samples. Through
Michael Jeremy
the confusion matrix, other parameters can also be calculated, such as accuracy,
precision, sensitivity, and others. The confusion matrix can be seen in Figure 11 below.
Figure 11. Confusion Matrix

Using the values obtained from the confusion matrix, the equation to calculate accuracy,
precision, sensitivity, and specificity can be seen in Equation 3 to Equation 6. Accuracy
shows how well the program performed in classifying every sample into their correct
class. The higher the accuracy, the better the program performed. Precision and
Sensitivity value represents how well the program in detecting positive classes.
Precision and sensitivity are used depending on the case. For example, in making a
program to classify whether a mail is a spam or not, where positive indicates a spam
mail, and negative indicates an important mail. It is crucial that the program does not
miss any important mails, it means that the FP value must be low. Therefore, it is
important that the program have a high precision, and the sensitivity can be
compromised. In this study, positive indicates that the breast in the mammogram image
have some abnormalities. Therefore, it is important that the FN value be low. Sensitivity
is a more important parameter rather than precision. Specificity represents how well the
program performed in classifying the negative classes.
𝑇𝑃 + 𝑇𝑁
𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = (3)
𝑇𝑃 + 𝐹𝑃 + 𝐹𝑁 + 𝑇𝑁
𝑇𝑃
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 = (4)
𝑇𝑃 + 𝐹𝑃
𝑇𝑃
𝑆𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦 = (5)
𝑇𝑃 + 𝐹𝑁
𝑇𝑁
𝑆𝑝𝑒𝑐𝑖𝑓𝑖𝑐𝑖𝑡𝑦 = (6)
𝑇𝑁 + 𝐹𝑃
Michael Jeremy
CHAPTER 3 – RESEARCH METHODS
In this chapter, the design and the methods used for the study is explained. The design
of the study includes the design block diagram, the algorithms, the pre-processing
algorithm which includes the detailed pre-processing steps, and the classification
process. The materials and equipment which are used in this study are also discussed in
the following sections. The materials include the dataset used and the data obtained,
whereas the equipment includes the hardware, the programming language, the platform,
and the environment.
3.1 Study Design

The study design explains the flow from the start of the program until the end. Figure
12 represents the design block diagram for the algorithm developed in this study. The
input image represents the mammogram image to be analyze or predicted. It can be
seen that to produce the prediction for the mammogram image, the whole process
includes two major steps: The “Image Pre-processing” step and the “CNN
Classification” step. The pre-processing step was designed to manipulate and enhance
the input image. The details on how the pre-processing step enhances the input image
will be discussed later in this section. However, the “image pre-processing” step was
done with the aim to further increase the performance of the CNN used in the
classification step. The “CNN classification” step used a transfer learning method that
used the DenseNet201 CNN architecture which took the pre-processed mammogram
image input and produced the predicted classification output for the corresponding
input image. The predicted output will also be discussed further in the following
paragraph in this section.
Figure 12. Study Design Block Diagram
Michael Jeremy
As seen in Figure 13, the flow of the “Pre-processing Step” starts from the input image
or the original image, and ends after the image is resized. The resized image is the final
output of the pre-processing step which then ready to be classified by the model. The
pre-processing pipeline involves 4 major steps which includes Median Filtering, Otsu
thresholding, Truncation Normalization, and CLAHE. The product of this pre-
processing pipeline is the synthesized image which is a 3-channeled image coloured
image in PNG format. Then the synthesized image is converted into a grayscaled image.
Afterwards, the image is further processed to meet the requirements for the
DenseNet201 CNN input. The requirement for DenseNet201 CNN input is a 224x224
image. Since the image has an uneven ratio due to the cropping by the Otsu
thresholding, therefore, to avoid losing clarity of the breast, the image was padded.
Flipping was required for right breast mammogram images both CC and MLO view.
Left Breast, CC and MLO view were not required to be flipped. Therefore, the images
were only required to be padded on its right side. After the padding process, the images
are then resized to 224x224. The end product of the “Pre-processing step” is the
processed image in PNG format. The processed image will then be either augmented
for CNN training or classified during the CNN testing. The data augmentation process
enlarges the dataset for efficient CNN training.
Figure 13. Pre-processing Step Diagram

Michael Jeremy
In Figure 14, the “Classification Step” shows how the deep learning model is structured
and how the layers are made, and the output results. The model used was the
DenseNet201 architecture with the customized additional top layers. The customized
top layers produced a variation of output shape. The classification step utilizes the
transfer learning method where a pre-trained weight is used to learn the mammogram
images. There are two different outputs of the classification step depending on which
dataset is used: Mass or Normal, Benign or Malignant. Mass is when the breast in the
image contains a mass, whereas normal is when the breast does not contain any mass.
Malignant is when the breast has a mass which suggests high potential to be cancer,
whereas benign is when the breast does not contain a mass or has a mass but does not
suggest a high potential to be cancer.
Figure 14. Classification Step Diagram
3.2 Materials and Equipment

Before CNN was used to be able to classify mammogram images, it needs sufficient
images to train the CNN, and the training results were saved in the form of weights
which was then used for the classification process. The more images to train the CNN,
the better the performance of the CNN will be. In this study, the mammogram images
were obtained from the INbreast dataset and mammogram images were also obtained
from Husada Hospital located in Jakarta.
Michael Jeremy
3.2.1 Mammogram Dataset
The data used in this study consist of mammogram image samples obtained from a
publicly available dataset, the INbreast dataset (Moreira et al., 2012) (can be seen in
Appendix 1). The INbreast dataset consists of 115 cases. Each case has either four
images (left and right, CC and MLO view) or two images (left or right, CC or MLO
view). Out of the 115 cases, 90 cases were from women with both breasts assessed
which results in four images per case, and 25 cases were from mastectomy patients
which results in 2 images per case. This means that the INbreast dataset has in total 410
images. There are several types of lesions included within the images from the INbreast
dataset, which consists of: mass, calcifications, asymmetries, and distortions. The
dataset also provides accurate contours made by radiologists in XML format.
In this study, only breasts that does not have any findings and breasts that have been
assessed to have masses were selected for this study. Out of the 4 types of lesions in the
mammogram images in the dataset, this study only used mass as it has the most
distinctive shape in a mammogram image. Out of the 410 images, the INbreast dataset
has in total 108 images that has a mass finding within the breast. Breasts that do not
have any findings were 68 images in total. Therefore, the total images used from the
INbreast dataset were 176 images.
This study was also provided with mammogram images from Husada Hospital which
is a hospital located in Jakarta. The mammogram images from Husada Hospital were
obtained from 15 patients dated from September 2021 until February 2022. In total, the
images obtained from Husada Hospital were 64 images. However, 2 images were not
used since it was defected. Therefore, the total mammogram images used from Husada
Hospital was 62 images (can be seen in Appendix 2).
To make the dataset to be used for the model in the classification step, the images were
then divided into 2 different folders: train and validation. Within each train and
validation folder consists of other folders which represents the classes the images were
to be classified into. The train folder is made to be the source folder for CNN training,
whereas the validation folder is to test the performance of the CNN.
Michael Jeremy
All images used in this dataset were provided in DICOM format, which stands for
Digital Imaging and Communications in Medicine. The DICOM format is an
international standard format, where it can be used to view, store, retrieve and share the
images. The DICOM format is used because it is standardized to conform to protocols
designed to maintain the accuracy of information relayed through medical images.
Therefore, to be able to process these images using python, the pydicom library must
be imported first prior to any further steps.
3.2.2 Equipment
The algorithm was designed and tested using the following hardware:
• Hardware: Acer Aspire 5
• Processor: Intel(R) Core(TM) i3-1005G1 CPU @ 1.20GHz, 10th GEN
• RAM: 12.0 GB
The image pre-processing scripts or codes developed for this research, were conducted
and implemented with the environment below:
• Operating System: Windows 64-bit OS, x64-based processor
• Platform: Jupyter Notebook 6.4.5
• Programming Language: Python 3.9.7
• OpenCV Python 4.5.5.62
The model or the CNN scripts or codes developed for this research was designed and
tested using the environment below:
• Operating System: Windows 64-bit OS, x64-based processor
• Platform: Kaggle
• Programming Language: Python 3.7.12
• GPU Accelerator: NVIDIA Tesla P100
Experiments for the CNN codes or the classification step were conducted on an online
platform, Kaggle. The Kaggle website provides a fully online and customable, Jupyter
Notebooks environment, as well as a GPU accelerator which accelerates image
processing and neural networks. Using the GPU accelerator, experiments can be
conducted at a much faster rate. However, Kaggle has set a limit of 30 hours of GPU
usage for a session per week. Using Kaggle, it allows users to edit and run code scripts
Michael Jeremy
on the browser rather than running it on the local computer. Therefore, the heavy tasks
are done online and under the environment set by Kaggle. By utilizing Kaggle, the time
needed to execute the CNN training and testing was greatly reduced, as well as the
weight on the local computer used. The pre-processing step was all conducted on the
local computer, utilizing the platform Jupyter Notebook. The pre-processing step was
done on the local computer because the task was not as heavy compared to train and
test the CNN. The programming language used in this study was Python.
3.3 Algorithm for Image Classification

The algorithm consists of image acquisition, pre-processing, and CNN training and
testing. The block diagram can be seen Figure 15 below.
Figure 15. Block Diagram for Mammogram Classification
3.3.1 Pre-processing Algorithm

After acquiring the images and selecting the needed images, the images were then
preprocessed to enhance its quality. The main preprocessing step includes processing
the image with Median Filtering, Otsu thresholding, Truncation normalization,
CLAHE. Afterwards, the images were then manipulated by flipping, padding and
Michael Jeremy
resizing to meet the CNN requirements. Data augmentation was used to enlarge and
balance the data. The augmentation code was used to enlarge the dataset for CNN
training to around 1200 images, which enlarges the dataset for training to around 500
to 600 images for each class. This is done through rotating the images either 90, 180,
and 270 degrees, and flipping the images left, right, top and bottom. However, prior to
the augmentation process, the images were first preprocessed. The mentioned methods
for the preprocessing pipeline can be seen in Figure 16.
Figure 16. Pre-processing Algorithm
Michael Jeremy
The median filter works to blur an input image. Through median filtering, the edges of
the breast were more preserved before any further preprocessing and it reduced the
amount of noise present within the image. The kernel size for the median filter is set to
3x3. The larger the kernel size, the more blurred the image becomes. In this study, 3x3
sized kernel was used since it provides the most optimum results.
Figure 17. Example of Median Filter (Median Filtering, 2019)
Figure 17 represents an example on how the median filter affects an image. In this
example, the original image is full of salt and pepper noise which lowers the quality of
the image. After processing using the median filter, the noise was removed from the
image and the quality of the image is enhanced. The filter works by blurring parts of
the image to remove those noises.
Figure 18. Example of Otsu Thresholding (Otsu Thresholding, 2019)
In Figure 18, it can be seen how the Otsu thresholding method separates the foreground
of interest of the image from its background, where the foreground is represented as the
colour white, and the background as black. Since most of a mammogram image consists
Michael Jeremy
of the black background, Otsu thresholding was used to get the breast region from the
image. Using Otsu thresholding, separation from the breast region or foreground from
the black background of a mammogram image was obtained. The bounding box for the
breast region can then be calculated from the resulting masked image. Using the
bounding box of the breast region’s dimensions, the image is then cropped to obtain
only the breast region. Therefore, the main purpose of using the Otsu thresholding is to
separate the foreground from the background, obtaining the masked image, and then
using the generated bounding box to crop the image to obtain only the foreground
region.
Afterwards, the contrast of the image is enhanced through truncation and normalization
and CLAHE. Truncation normalization is used to enhance the contrast of the mass from
the surrounding breast tissue. Because even after Otsu thresholding, the image still has
a considerable amount of black background. If a standard normalization is used, then it
would induce an adverse effect towards the image. Therefore, the normalization method
used is the truncation normalization. As explained in Chapter 2, Section 2.6.4, the
truncation normalization takes in two values, Pmin and Pmax which is determined from
the intensity distribution of the image. Afterwards, the image is processed using the
truncation processing, which can be seen in Equation 1. And then the image is
normalized using Equation 2.
Afterwards CLAHE is used to further define the contrast of the normalized

mammogram image obtained from the truncation normalization process. Two CLAHEs
were used where the clip limit was set to 1.0 and 2.0. Using two CLAHEs with two
different clip limits gave better contrast to differentiate any mass from the surrounding
breast tissue. The other reason to use two different limits was because as seen in Figure
12, the synthesized image was aimed to produce an image with 3 channels, like an RGB
image. Therefore, the processed image was synthesized where it uses the image
obtained from the truncation normalization, CLAHE with clip limit 1.0 and CLAHE
with clip limit 2.0. The three images were merged together to form a 3-channel image.
Afterwards, the image was converted into a gray scaled image to reduce the complexity
for the model in the classification step.
Michael Jeremy
Figure 19. Padding Algorithm
Since the model used in this study, which is the DenseNet201 CNN, requires a 224 x
224 image input, all the images needed to be resized to 224 x 224. Before resizing the
image, because of the Otsu thresholding, the ratio of the images became uneven due to
the cropping. Therefore, any image with an uneven ratio is first padded into a square
image, to avoid losing the clarity of the breast region due to resizing a non-square image
into a square image. To simplify the process, images with the breast region on its right
side were flipped. By flipping the image, the image can then be padded only on its right
side. After the image is padded into square, the image is resized into 224 x 224. The
padding algorithm can be seen in Figure 19.
3.3.2 Preparation for CNN Training

Prior to train the CNN, data augmentation was conducted and then the datasets were
developed. The images synthesized from the previous steps were augmented using an
augmentation pipeline. The aim of the data augmentation is to enlarge the dataset and
balance the amount of data in each class. This is done because a sufficient data will
produce good results from the CNN’s performance, and avoid unwanted situations,
such as overfitting. The output of the augmentation process is saved in the hard drive
of the used hardware within a generated output folder in the source folder. Before the
images were augmented, a folder for the completed dataset to be used was made. As
Michael Jeremy
mentioned in Section 3.2.1, the dataset consists of two folders: train and validation. In
this study, there are two different datasets made.
Figure 20. Mass Normal Dataset Folder Structure
The folder structure of the first dataset developed can be seen in Figure 20. In the first
dataset, within the train and validation folder, each consists of another two separate
folders which are mass and normal which represents the classification class for the
classification process. After the dataset folder was made, the synthesized images from
the preprocessing step were split and put into the train and validation folder, with each
image going into its respective classes. For the mass folder, all mammogram images
that have a mass finding were put into the mass folder in the train folder or the validation
folder. These images were labeled as mass. Mammogram images that do not have any
mass findings are put into the normal folder, also in the train or validation folder. These
images were labeled as normal. The split was set to 70% training and 30% validation
for the mass labeled images and 50% training and 50% validation for the normal labeled
images. The split was set to 50-50 for normal images due to the lack amount of normal
labeled images. The images in the train folder were then augmented using data
augmentation. In this study, this dataset was called Mass Normal Dataset. Using only
mammogram images from the INbreast dataset, the Mass Normal Dataset contains 577
mass images and 580 normal images in the train folder, and 30 mass and normal images
in the validation folder. Incorporating mammogram images from Husada Hospital into
Michael Jeremy
the dataset, the amount of normal images increased, since there were 4 mass images
and 58 normal images from Husada Hospital. Therefore, the split was set to 80%
training and 20% validation for both mass and normal images. After the addition of
mammogram images from Husada Hospital, the dataset contains 641 mass images and
653 normal images in the train folder, and 22 mass images and 25 normal images.
Figure 21. Benign Malignant Dataset Folder Structure
The folder structure for the second developed dataset can be seen in Figure 21. For the
second dataset, the structure of the dataset also follows the structure of the first dataset.
However, the difference is in the folder within the train and validation folder. In the
second dataset, the folders in the train and validation folder were benign and malignant.
Referring to Table 1 in Section 2.4.4, the mammogram images that were selected for
the malignant folder were mammogram images that were reported as BI-RADS 4 to BI-
RADS 6. These images were labeled as malignant, and all these images are guaranteed
to have a mass lesion, and the mass lesion are highly suggestive of cancer. For the
benign folder, the mammogram images that were selected for this folder were
mammogram images that were reported as BI-RADS 1 to BI-RADS 3. These images
were labeled as benign. The mammogram images in the benign folder consist of breasts
that did not have any findings and also breasts that have a mass, but not suggestive of
cancer or a benign mass. The split for the second dataset was 80% training and 20%
validation. In this study, this dataset was called Benign Malignant Dataset. Using only
Michael Jeremy
mammogram images from the INbreast dataset, the Benign Malignant Dataset contains
636 benign images and 617 malignant images in the train folder, and 20 benign images
and 14 malignant images in the validation folder. After the addition of mammogram
images from Husada Hospital, the dataset contains 659 benign images and 633
malignant images in the train folder, and 32 benign images and 15 malignant images.
After making the dataset, the CNN can now be trained and tested with the data in the
dataset. The purpose of the train folder is to train the CNN and update the pre-trained
weights which will be used for the classification step. The training of the CNN purely
used the train folder only. Each epoch or the iteration of the training process separates
the images from the train folder into 80% training and 20% evaluation, where the
training step is aimed to update the pre-trained weight that was obtained from imagenet.
In each epoch, it uses images which was separated for the 20% evaluation to evaluate
the CNN’s performance with the obtained weights. It must be noted that the evaluation
that is conducted in the training process of the CNN is not the same as the validation
step which will be explained in the next sub-chapter. The evaluation in the training
process only used images that are in the train folder, whereas the validation in the
validation testing process uses images in the validation folder.
The program starts by loading all of the images in the dataset in Figure 20 and Figure
21 into numpy arrays which will be the dataframe and then creating the labels for each
image. The labels for normal or benign images (depending on which dataset used) were
set to zero using np.zeros, whereas the labels for mass or malignant images (depending
on which dataset used) were set to ones using np.ones. Then, the data is categorized
into X_train, Y_train, X_test and Y_test, where: X_train is the mammogram images
from the train folder, Y_train is the labels for the images in X_train, X_test is the
mammogram images from the validation folder and Y_test is the labels for the images
in X_test. Afterwards, the data in the dataframe were shuffled. The data is then
categorized and split into train and evaluation for training. The split used images and
labels in the X_train and Y_train data and categorized into x_train, y_train, x_val and
y_val. The ratio was set to 0.2, which means that 80% of the data in X_train and Y_train
were categorized into x_train and y_train, and 20% of the data were categorized into
x_val and y_val. The flowchart of the process can be seen in Figure 22.
Michael Jeremy
Figure 22. Dataset Splitting Flowchart
Each epoch in the training of the CNN produced 4 parameters: loss which represents
the training loss, accuracy which represents the training accuracy, val_loss which
represents the evaluation loss, and val_accuracy which represents the evaluation
accuracy. The parameters loss and accuracy indicate the performance of the CNN using
x_train and y_train, whereas val_loss and val_accuracy indicate the evaluation of
performance of the CNN using x_val and y_val. A learning rate reducer was used to
reduce the learning rate of the CNN. The learning rate reducer was used to avoid the
problem where the CNN learns to fast and results in a plateau where the performance
of the CNN does not improve. The learning rate reducer monitors the val_accuracy, and
the patience was set to 3. This means that if the val_accuracy does not improve after 3
epochs, the learning rate of the CNN will be reduced. The learning rate was set to 1e-4
and the minimum learning rate was set to 1e-7. Each epoch will update and generate a
pre-trained weight which is named weights.best and is saved in HDF5 format. The
program will assess whether the validation accuracy improved with the generated
weight in each respective epoch. If the validation accuracy improved, then the weights
is updated. The program will only save the best weight from the best epoch.
3.3.3 CNN Testing

The model summary of the CNN used can be seen in Figure 23. The CNN starts by the
DenseNet201 architecture. To avoid the DenseNet201 CNN creating its own output,
the top layer of the DenseNet201 CNN was disabled, and replaced by a customized
layers to create the outputs based on the class that have been designed. The layers
consist of a global average pooling layer, followed by a series of dropout and dense
layers and a batch normalization layer.
Michael Jeremy
Figure 23. CNN Model Summary
After the training step, the program was tested in the validation or testing step, where it
uses the validation folder which has also been loaded into the dataframe, categorized
into X_test and Y_test. Up until this step, the CNN have only utilized the images only
from the train folder, which means the images in the validation folder hasn’t been used
by the CNN. Therefore, the performance of the CNN can be evaluated. The processed
images are inputted into the CNN with the DenseNet201 architecture with pre-trained
weights obtained from imagenet that has been updated from the training process.
Figure 24. CNN Classification
The overall flow on how the CNN classifies a mammogram image can be seen in the
classification process block diagram which can be seen in Figure 24.
3.4 Doctor and Radiologist Opinions

As a part of validation method, consultations were made with an expert radiologist from
Husada Hospital. Opinions from doctor’s and radiologist’s on how the developed image
Michael Jeremy
pre-processing methods used in this study affected the mammogram images were
obtained through these consultations. Confirmation on the presence of cancer was also
obtained. Due to the shortage of time, these consultations were only interview based,
where several mammogram images from the pre-processing pipeline were directly
shown to the expert radiologist, and then the insight was noted for further discussion in
the next chapter. Further improvements towards the consultation and interview method
still need to be conducted to gain more insight on the medical perspective regarding this
study. These improvements may include the usage of questioners, where proper records
can be obtained from the consultations.
Michael Jeremy
CHAPTER 4 – RESULTS AND DISCUSSIONS
4.1 Results of Pre-processing

As explained in the Section 2.6, preprocessing an image is aimed to manipulate and
enhance the input image for better CNN performance.
Figure 25. Original Mammogram Image with Its Corresponding Histogram: (a)
Craniocaudal View; (b) Mediolateral Oblique View
An example of an original mammogram input image can be seen in Figure 25, which is
the image that will be obtained when performing a mammogram scan. The input image
will then be processed through several steps. The example image in Figure 25 represents
the left breast from the same patient. Figure 25(a) represents the CC view, whereas
Figure 25(b) represents the MLO view. As seen in the image, the mass is found to be at
the middle and upper part of the breast. Analyzing the histogram, both histograms have
a maximum peak at the intensity value 0 represented by the dashed line, and the breast
region is distributed to a certain range within the intensity distribution. Because of the
Michael Jeremy
high number of pixels with the intensity value of 0, this made the breast region’s pixel
intensity distribution very small and barely visible. These histograms put a heavy
emphasis on how much background there was compared to the needed part of the image
or the ROI of the image which is the breast region. Therefore, to reduce the burden on
the processor and the computation time, preprocessing was needed to crop the ROI of
the image, which is why the Otsu thresholding is needed.
4.1.1 Median Filtering

However, the first preprocess step is to filter the image using Median Filtering. The
Median Filter blurs the image in correspondence to the kernel size used. In this study,
the kernel size was set to 3x3. A larger kernel size will over-blur the input image. Other
than blurring, the filter also preserves the edges of the breast in the mammogram image.
By preserving the edges of the breast in the image, the next preprocessing steps will not
degrade the shape of the breast from the mammogram image. Figure 26(a) and Figure
26(b) shows the before and after the image is processed using the median filter
respectively. Looking at the filtered image, the image can be seen that it is slightly
blurred when compared to the original image.
Figure 26. Output of Median Filtering: (a) Original Image; (b) Median Filtered
4.1.2 Otsu Thresholding

After the median filter, it can be seen that the majority of the mammogram image was
still composed of the black background, which means that there were still a high number
of pixels with intensity value 0. These 0 values hold no valuable information as the
needed information are only contained only within the breast region of the image.
Therefore, Otsu thresholding was used to crop the background of the image by
Michael Jeremy
separating the foreground which is the breast region from the black background. Using
the Otsu thresholding, the masked image was obtained where it identified which part of
the image is the foreground and which part of the image is the background.
Figure 27. Otsu Thresholding Results
The output obtained from the Otsu thresholding can be seen in Figure 27, where the top
image represents the input image, and the bottom image is the cropped image.
Assessing the image histogram, it can be seen that the number of pixels with the
intensity value of 0 in the original image was 1.06x107. After cropping using Otsu, the
breast region’s pixel intensity was more visible because the amount of pixel with the
value of 0 were clipped. Through Otsu thresholding, the number of pixels with the
intensity value of 0 was reduced from 1.06x107 number of pixels to 861,150 number of
pixels. In other words, there was a reduction of 91.9% for pixels with intensity value 0,
and 71.4% in the total number of pixels from the original image. Thus, there was a clear
reduction of background from the image after Otsu thresholding, which is beneficial to
reduce processing burden and computational time for the next preprocessing steps. It is
also important to note that the Otsu thresholding method retained the information from
the breast, which can be seen when comparing the original image histogram and the
Michael Jeremy
cropped image histogram, the shape of both histograms was still identical, and only the
number of pixels with intensity value 0 were reduced. Therefore, through the Otsu
threshold, it maximizes the process to reduce computation time and processing burden.
From Otsu thresholding, the masked version of the original image was also obtained.
The foreground was represented as a white region in the masked image, and the black
region represented the background. The masked version can be seen in Figure 28. The
separation of the foreground from the background results in the masked image. Then
by using cv2.findContours, the bounding box (x, y, w, h) for the breast region can be
obtained, where x represents the x axis value, y represents the y axis value, w represents
the width, and h represents the height. For the original mammogram image input, the
bounding box obtained was 6, 420, 1493, 2600 which are x, y, w and h respectively. It
is by using the bounding box obtained that the input image was cropped to obtain only
the breast region from the image. Each image will have a different bounding box output,
as the shape of the breast are not always the same.
Figure 28. Masked Image and Generated Bounding Box
4.1.3 Truncation Normalization

Afterwards, the cropped image is enhanced using truncation normalization and
CLAHE. As explained in Section 2.6, an image is stored as a numpy array. The Pmin
and Pmax was set using np.percentile where: Pmin (minimum value) is selected as the
5% small end position of the sorted pixel intensity; Pmax (maximum value) is selected
at the 1% at the large value end position of the sorted pixel intensity. These parameters
were optimally set based on all the images. The process used the breast mask obtained
from the Otsu threshold to extract the intensity only from the region corresponding to
the breast region.
Michael Jeremy
Figure 29. Truncation Normalization Results
After setting the Pmin and Pmax values, the intensity P of each pixel within the image
is subjected towards the truncation process where the process can be seen in Equation
1. After the truncation process, the pixels were then normalized using Equation 2. The
result of the truncation normalization process can be seen in Figure 29. It can be seen
that the mass has become much clearer after normalizing and because of the truncation
process, the breast region’s intensity has also been retained. After normalizing the
image, it can clearly be seen that the intensity distribution of the breast region shifted.
Originally, the intensity distribution for the breast region in the original mammogram
image was narrower, and many of the pixels are confined into a small region within the
intensity value distribution. After the truncation normalization, the intensity distribution
had become wider, where it occupies the whole region of the histogram. By shifting the
intensity distribution, it can be seen that the mass lesion within the image had become
very visible, whereas the visualization of the breast tissue was reduced to maximize the
visualization on the mass lesion. However, the area surrounding the mass lesion had
become too dark, and the vascular system in the breast tissue had also become too dark.
Therefore, the next pre-processing step, which is the CLAHE method was done to
further improve the contrast from the normalized mammogram image.
Michael Jeremy
4.1.4 Contrast Limited Adaptive Histogram Equalization
Using the image obtained from the truncation normalization, the contrast of the image
is further enhanced using CLAHE. As explained in Section 2.6.3, a histogram
equalization algorithm will stretch the pixel intensity distribution throughout the
histogram. A standard Adaptive Histogram Equalization however induces a problem
where if an image has a narrow interval within its histogram, equalizing it will stretch
the narrow interval to the entire intensity range. When this occurs, noises will be
generated within the image, which is to be avoided. By using CLAHE, it introduces the
usage of a clip limit where after the calculation of probability density function (PDF),
the code loops through the PDF, comparing it to the clip limit. If the PDF is greater than
the clip limit, the code will “clip” the count of pixels at the given pixel intensity. In this
study, a clip limit of 1.0 and 2.0 was used. After enhancing the contrast using CLAHE,
the normalized image and the CLAHE enhanced image is merged together to ensure
that the CNN model will be able to extract the essential features from the image.
Afterwards, the image is converted into grayscale to reduce the computation burden
towards the CNN.
Figure 30. Synthesized Image Result
In Figure 30, the comparison between the original image and the enhanced image using
truncation normalization and CLAHE can be seen. It can be seen that after processing
Michael Jeremy
with truncation normalization and CLAHE, the mass within the breast has become very
defined. The confined intensity distribution from the original image has also been
stretch and smoothen in the synthesized image. Through this process, even the vascular
system within the breast tissue can be seen.
4.1.5 Padding, Flipping, and Resizing

The DenseNet201 CNN has a requirement for the input image to be a square image
with a ratio of 224 x 224. Because the images currently have an uneven ratio due to the
cropping from Otsu thresholding, therefore, the images are required to be padded into
a square image. This is done to avoid losing the clarity of the breast due to resizing an
uneven ratioed image into a square image. To simplify the process, all the mammogram
images that showed the left breast was flipped, so the padding process only requires the
image to be padded on its right side of the image. The process follows the algorithm
mentioned in Section 3.3.1 in Figure 19. After padding the image, the image can then
be resized into 224 x 224. Figure 31(a) shows the padding of a right breast mammogram
image, whereas Figure 31(b) shows the padding of a left breast mammogram image.
Figure 31. Final Preprocessing Output: (a) Right Breast; (b) Left Breast
Michael Jeremy
After pre-processing, the final file size for the images were around 20kb. The small file
size was largely caused by the reduction of image size to 224x224. The reduction of
image size was necessary because the input image of the DenseNet201 CNN needs to
be 224x224. However, prior to resizing, the smallest file size of the images after
preprocessing were 645KB, and the largest file size of the images after preprocessing
were 5.25 MB. The original images provided in the DICOM format, were either 16.2
MB or 25.9 MB in size.
4.1.6 Data Augmentation

After padding and resizing the images, the dataset was ready to be made, since all the
images are now preprocessed using the image processing methods. The structure of the
dataset was mentioned in Section 3.3.2 in Figure 20 and Figure 21 where it consists of
2 main folders: train and validation, and each folder has 2 classes: normal and mass for
“Mass Normal Dataset”; benign and malignant for “Benign Malignant Dataset”. After
splitting the images into the train and validation folder, the images in the train folder
were augmented.
The images were augmented using an augmentor library available in python. Using the
augmentor, the images in each class were enlarged and balanced out in quantity. This
was done because there were only 68 normal images, 108 mass images, and 104 benign
images and 72 malignant images from the INbreast Dataset. The images taken from
Husada Hospital provided with 4 mass and malignant images and 58 normal and benign
images. The augmentation process marks the end of the preprocessing steps. After the
data augmentation, the dataset is now ready to be put into the CNN for training and
testing.
For the “Mass Normal Dataset”, after augmentation, using only mammogram images
from the INbreast dataset, the Mass Normal Dataset contains 577 mass images and 580
normal images in the train folder, and 30 mass and normal images in the validation
folder. Incorporating mammogram images from Husada Hospital into the dataset, the
dataset contains 641 mass images and 653 normal images in the train folder, and 22
mass images and 25 normal images.
Michael Jeremy
For the “Benign Malignant Dataset”, after augmentation, using only mammogram
images from the INbreast dataset, the dataset contains 636 benign images and 617
malignant images in the train folder, and 20 benign images and 14 malignant images in
the validation folder. After incorporating mammogram images from Husada Hospital,
the dataset contains 659 benign images and 633 malignant images in the train folder,
and 32 benign images and 15 malignant images.
4.2 Results of Training and Testing

For the results of training and testing on DenseNet201 CNN, several experiments using
the developed dataset was conducted. Because there were two different datasets
developed, several experiments were conducted using the “Mass Normal Dataset”, and
several experiments were conducted using the “Benign Malignant Dataset”. Because
the mammogram images from Husada Hospital were obtained near the end of the
research, the first few trials were conducted using mammogram images only from the
INbreast dataset. After the images from Husada was obtained, then the dataset was
added with the images from Husada Hospital. Therefore, in this subchapter, Section
4.2.1 first shows the results using mammogram images from INbreast dataset only, and
Section 4.2.2 shows the results using mammogram images from the INbreast dataset
and Husada Hospital.
4.2.1 Results using Images from INbreast Dataset

The training of CNN was conducted at the Kaggle website with the usage of GPU
accelerator to accelerate the processing of images and the process of CNN. The
developed datasets were uploaded into the Kaggle website by creating a new dataset.
Therefore, all the datasets were stored inside the Kaggle’s online server, and doesn’t
burden the memory of the local computer. As explained in the previous chapter, the
program starts by loading the images from the dataset into numpy arrays and creating
the labels for each image.
4.2.1.1 Results using “Mass Normal Dataset”

Using the “Mass Normal Dataset”, at 30 epochs, the training of DenseNet201 CNN
reached the highest accuracy score of 97.8% during the training, which is the highest
accuracy obtained from the highest evaluation accuracy during the training. The “Mass
Normal Dataset” contains a total of 1,157 images for CNN training which consists of
Michael Jeremy
577 mass images and 580 normal images in the train folder. A total of 60 images for
CNN testing was used which are 30 mass and normal images in the validation folder.
Evaluation Loss
Using the Training Accuracy vs Evaluation Accuracy graph, and Training Loss vs
Evaluation Loss graph, the performance during the CNN training on the 1,157 training
images can be seen in the graph in Figure 32. The graphs show how the CNN performs
over the 30 epochs. The X axis in both graphs represents the epoch and the Y axis
represents the accuracy and the loss. It can be seen that the higher the epoch, the
accuracy increases while the loss decreases. This behavior is synonymous for both the
training (the blue line) and the evaluation (the orange line). The graph shows that the
more the CNN learns and trains, the higher the accuracy is obtained by the CNN,
whereas the loss decreases. This is the expected result when training the CNN, and it
shows that using the preprocessing methods from earlier to obtain the used dataset, the
program is able to perform efficiently. It can also be seen from the graph that there were
no case of overfitting in the performance of the program. Thus, it can be said that the
training of the CNN was very efficient and highly likely to produce a good result during
the testing.
After training, the CNN was then tested to assess its performance. Earlier, the training
used the images from the train folder. To test the performance of the CNN, the images
from the validation folder was used. The validation folder in the “Mass Normal Dataset”
consisted of 60 images where 30 images were mass images and 30 images were normal
images. Using the weights obtained from the training, the CNN was tested to classify
these 60 images.
Michael Jeremy
Figure 33. Confusion Matrix Result of DenseNet201 CNN Testing on the “Mass
Normal Dataset” Validation Folder using INbreast data
The CNN testing was carried out 3 times. One of the results of the testing can be seen
in Figure 33, where it is represented by the confusion matrix. As mentioned in Section
2.7, the confusion matrix shows the true positive, true negative, false positive and false
negative values which represents which image is classified correctly and incorrectly by
the program. In this case, positive indicates Mass, whereas negative indicates normal.
A positive result indicates mass is present within the mammogram image, which
classifies the corresponding mammogram image into mass class, whereas a negative
result indicates no mass is present within the mammogram image, which classifies the
corresponding mammogram image into normal class. Looking at the confusion matrix,
it can be seen that in this trial, out of the 60 images, the program has predicted 53 images
correctly and 7 images incorrectly. The correctly predicted normal images were 26
images out of 30 images which is defined as the True Negative value (TN), and the
correctly predicted mass images were 27 images out of 30 images, which is defined as
the True Positive value (TP). That means that the program falsely predicted 3 mass
images and 4 normal images, defined as False Negative value (FN) and False Positive
value (FP) respectively. From the values obtained, the accuracy, sensitivity, precision,
and specificity can be calculated using Equation 3 to Equation 6 in Section 2.7.
Therefore, the accuracy obtained is 88.33%, the sensitivity obtained is 90%, the
precision obtained is 87.10% and the specificity obtained is 86.67%. The values showed
that the program was able to classify the mammogram images with some misclassified
results.
Michael Jeremy
Table 3. Experiment Trials with Pre-processing
Experimental Runs (%) Average

Result
Run 1 Run 2 Run 3 Performance (%)
Model : DenseNet201 CNN
TP 26 27 25
FN 4 3 5
TN 25 26 28
FP 5 4 2
Accuracy 85.00 88.33 88.33 87.22
Specificity 83.33 86.67 93.33 87.78
Sensitivity 86.67 90.00 83.33 86.67
Precision 83.87 87.10 92.59 87.85
Mammogram Image Source(s) INbreast Data
Dataset Used Mass Normal Dataset
Note: Positive (Mass), Negative (Normal)
Table 3 shows all the results from 3 experimental runs. It must be noted that the most
significant parameter to assess in this study is the accuracy achieved by the CNN. It can
be seen that the highest accuracy achieved from the CNN testing was 88.33% which
was achieved in experimental run 2 and experimental run 3 (yellow filled cell), and the
average accuracy performance for 3 experimental runs was 87.22%. It can also be seen
that both experimental run 2 and 3 had the same achieved accuracy. However, the case
in this study, it is crucial that the program does not miss any mass images, since it was
more important that the program does not miss any mass cases from patients in real
world application. Therefore, falsely predicted mass images as normal images by the
program were deemed more critical rather than falsely predicted normal images.
Therefore, we can compromise the precision and the sensitivity or the recall should be
higher than the precision. Thus, it can be seen that experimental run 2 achieved the best
in accuracy as well as sensitivity, correctly classifying mass images with 27 TP value,
and the least FN value of all experimental runs, with just 3 FN value. Compared to
experimental run 3, this trial yielded a better precision, due to the FP value being only
2. However, it can be said that the CNN performance in experimental run 3 was not the
more desired performance compared to experimental run 2 as it yielded more FN
predictions, which means that there are mass images that have been missed by the
program.
Michael Jeremy
Table 4. Experiment Trials without Pre-processing

Result
TP 26 24 30
FN 4 6 0
TN 0 0 0
FP 30 30 30
Accuracy 43.33 40.00 50.00 44.44
Specificity 0.00 0.00 0.00 0.00
Sensitivity 86.67 80.00 100.00 88.89
Precision 46.43 44.44 50.00 46.96
Mammogram Image Source(s) INbreast Data
Another experiment trial was also conducted to compare how the pre-processing steps
helped in increasing the CNN’s performance. Table 4 shows results for CNN testing
without any usage of image pre-processing. It can clearly be seen that the program
performed better when the pre-processing methods were used, since in this experiment
trials, the highest accuracy achieved was only 50%. Without the usage of pre-
processing, the CNN tends to classify most of the mammogram images into mass class,
resulting in a high FP value, which can clearly be seen in the poor precision and poor
specificity, along with a high sensitivity. From the 3 experimental runs, not a single
normal image was correctly classified, which can be seen in the 0 TN value throughout
the 3 experimental runs. The main reason for this poor performance majorly revolves
around the insufficient data to train the CNN. Without the usage of data augmentation
to pre-process the images and enlarge the dataset, the program didn’t get enough
mammogram data to be able to train efficiently. It is clear that without the pre-
processing methods, which results in CNN’s inability to differentiate mass images and
normal images, it offers no benefits for real world applications.
4.2.1.2 Results using “Benign Malignant Dataset”

The “Benign Malignant Dataset” was also uploaded onto the Kaggle online server, and
the DenseNet201 CNN training was also conducted at the Kaggle platform using the
Michael Jeremy
GPU accelerator. Using the “Benign Malignant Dataset”, at also 30 epochs, the program
reached the highest accuracy score of 95.62%, which is the highest achieved evaluation
accuracy achieved during the DenseNet201 CNN training. The “Benign Malignant
Dataset” contains 1,253 training images which consist of 636 benign images and 617
malignant images.
Evaluation Loss
Figure 34 represents how the program performed during its training. The training
accuracy vs evaluation accuracy and training loss vs evaluation loss graph also did not
indicate any overfitting in the performance of the program. The graph also shows that
the behaviour of accuracy and loss was similar to the graph produced when using the
“Mass Normal Dataset”, where the accuracy increases and the loss decreases as the
epoch progresses. However in this case, it can be seen that the validation loss curve lies
considerably beneath the training loss curve. This indicates a possibility that the images
in the validation folder in the “Benign Malignant Dataset” was easier to predict for the
program compared to the images in the validation folder in the “Mass Normal Dataset.
This occurrence is also commonly associated with poor sampling procedures where
duplicate samples exist in the training and evaluation datasets. It can also happen due
to a less variety of mammogram image in the evaluation dataset. Another possibility
for this occurrence is information leakage, where features from the mammogram image
samples in the training dataset has a direct link or ties towards the features of
mammogram image samples in the evaluation dataset. After training and obtaining the
weight, the program was then tested again to test its performance. The validation folder
in the “Benign Malignant Dataset” had in total 34 images which consists of 20 benign
images and 14 malignant images.
Michael Jeremy
Figure 35. Confusion Matrix Result of DenseNet201 CNN Testing on the “Benign
Malignant Dataset” Validation Folder using INbreast Data
There were also three experimental runs conducted for the CNN testing. In Figure 35,
shows one of the three results of the testing, represented by the confusion matrix. In
this case, positive indicates malignant images which refers to mammogram images that
have mass lesions and is highly suggestive of cancer, whereas negative indicates benign
images which refers to mammogram images that do not have any mass lesion findings
or have mass lesions but is not suggestive of cancer. Looking at the confusion matrix,
it can be seen that the program correctly predicted all 20 benign images, whereas 12 out
of 14 malignant images were correctly classified. Therefore, only 2 images were falsely
predicted by the CNN out of the 34 mammogram images in the validation folder. In
other words, there are 2 FN result in this confusion matrix. Using the values obtained
through the confusion matrix, the accuracy obtained was 94.1%, the sensitivity obtained
was 85.7%, the precision obtained was 100%, and the specificity obtained was also
100%. Improvements towards the program can be made to increase the sensitivity, to
reduce the number of FN, thereby reducing the risk of missing a malignant case from a
mammogram image. However, the achieved performance from the CNN testing has
given a satisfactory result for benign malignant classification of mammogram images
which will be beneficial to help or support doctors and radiologists diagnose a possible
malignant case from the obtained mammogram image .
Michael Jeremy
Table 5. Experiment Trials with Pre-processing

Result
TP 12 12 10
FN 2 2 4
TN 20 18 20
FP 0 2 0
Accuracy 94.12 88.24 88.24 90.20
Specificity 100.00 90.00 100.00 96.67
Sensitivity 85.71 85.71 71.43 80.95
Precision 100.00 85.71 100.00 95.24
Mammogram Image Source INbreast Data
Dataset Used Benign Malignant Dataset
Note: Positive (Malignant), Negative (Benign)
All three experiment results using the “Benign Malignant Dataset” can be seen in Table
5. Assessing the results using the “Benign Malignant Dataset”, it can be seen that the
program also performed efficiently during the testing. The best performance obtained
was during the first experiment run with an accuracy score of 94.12%, which has
already been represented by the confusion matrix in Figure 35. It can also be seen that
the program performed better in detecting benign cases compared to malignant cases,
which can be seen where the highest number of FP value was only 2, and also evident
by the higher precision rather than sensitivity. However, the overall performance by the
CNN have given satisfactory results with an average accuracy performance of 90.2%,
which is already considered a high accuracy. However, since the CNN was stuck at the
lowest, 2 FN value. Improvements can still be made to further decrease the FN value
and thereby increasing the accuracy score.
Trials without the usage of pre-processing were also conducted to compare how the
pre-processing steps affect the performance of the CNN classification. Again, it can has
been discovered that the developed pre-processing plays a major role in obtaining
optimum results from the program. Without the usage of pre-processing, the program
performed very poorly in detecting malignant cases, which is the more critical class in
this experiment. Without the usage of pre-processing, the program tended to classify
cases into the benign class, even malignant cases, which is not the desired outcome.
Michael Jeremy
The highest accuracy achieved was only 55.88, which is the same for all the
experimental runs. Looking at these low achieved accuracies, it can be concluded that
without the pre-processing step, the program will not be able to give the desired
outcome. The results from experiment trials without pre-processing can be seen in
Table 6 below.
Table 6. Experiment Trials without Pre-Processing

Result
TP 2 1 0
FN 12 13 14
TN 17 18 19
FP 3 2 1
Accuracy 55.88 55.88 55.88 55.88
Specificity 85.00 90.00 95.00 90.00
Sensitivity 14.29 7.14 0.00 7.14
Precision 40.00 33.33 0.00 24.44
Mammogram Image Source INbreast Data
4.2.2 Results using Images from INbreast Dataset and Husada Hospital
As mentioned in Section 3.2.1, the mammogram images that was obtained from Husada
Hospital were 64 images in total. However, 2 images were left out since there are
defects within the image. These images were incorporated onto the existing dataset,
which means that both “Mass Normal Dataset” and “Benign Malignant Dataset” now
consisted of mammogram images from the INbreast Dataset, and mammogram images
from Husada Hospital. The images were added into the “Mass Normal Dataset” and
“Benign Malignant Dataset” according to the criteria that was set for the respective
dataset. After the addition of data, the “Mass Normal Dataset” contains 1,294 training
images (641 mass images, 653 normal images), and 47 testing images (22 mass images,
25 normal images), whereas the “Benign Malignant Dataset” contains 1,292 training
images (659 benign images, 633 malignant images) and 47 testing images (32 benign
images, 15 malignant images).
Michael Jeremy
4.2.2.1 Results using “Mass Normal Dataset”

Using the “Mass Normal Dataset”, the program reached the highest training accuracy
of 94.98% at 30 epochs, which is the highest evaluation accuracy achieved during the
DenseNet201 CNN training.
Evaluation Loss
Using the graph representing training accuracy vs evaluation accuracy and training loss
vs evaluation loss from the CNN model’s training, the performance of the CNN during
training was observed. The graphs seen in Figure 36 shows the performance of CNN
during its training on the training dataset of “Mass Normal Dataset”. Assessing the
graphs in Figure 36, it can be seen that both the training accuracy and evaluation
accuracy increases and both training loss and evaluation loss decreases as the epoch
increases which is synonymous to the previous graphs in Section 4.2.1.1 and Section
4.2.1.2. Again, this behavior from the CNN during its training is to be expected as the
higher the epoch, the more CNN have learned regarding the data in the dataset,
therefore, the increasing accuracy and decreasing loss. There were also no case of
overfitting during the training of the CNN. However, in this case, it can also be seen
the validation loss curve lies considerably below the training loss curve, which indicates
the possibility that the mammogram images in the validation dataset was easier to
predict for the CNN. This behavior was also seen in the CNN training performance in
Section 4.2.1.2. After training, the validation folder from the dataset was then used to
test the CNN’s performance. The validation folder consisted of 47 images in total which
consists of 22 mass images and 25 normal images.
Michael Jeremy
Figure 37. Confusion Matrix Result of DenseNet201 CNN Testing on “Mass

Normal Dataset” Validation Folder using INbreast and Husada Data
The testing of the performance of the CNN on the validation folder was also carried out
in 3 repetitions, and one of the results can be seen in Figure 37 represented by the
confusion matrix. In Figure 37, it can be seen that the CNN correctly predicted 41 out
of 47 images, with 3 images from each class were incorrectly predicted. Calculating
using the values obtained from the confusion matrix resulted in 87.23% accuracy, with
86.36% precision and sensitivity. In this result, it can be seen that the CNN
misclassified each 3 images for mass and normal mammogram images, which can be
seen by the same precision and sensitivity achieved.
Table 7. Experiment Trials

Result
TP 19 20 19
FN 3 2 3
TN 22 20 22
FP 4 5 3
Accuracy 85.42 85.11 87.23 85.92
Specificity 84.62 80.00 88.00 84.21
Sensitivity 86.36 90.91 86.36 87.88
Precision 82.61 80.00 86.36 82.99
Mammogram Image Source INbreast and Husada Data
Michael Jeremy
All the experimental runs conducted can be seen in table 7. The highest accuracy
achieved was 87.2% in experimental run 3, and the average accuracy performance was
85.92%. It can also be debated that although the program reached the best accuracy in
experimental run 3, it can also be said that the most preferred performance of the CNN
in a real world application is the performance obtained in experimental run 2. Compared
to experimental run 3, the performance of CNN in experimental run 2 had the least
amount of false negative results, which results in the highest sensitivity out of the 3
experimental runs. In other words, experimental run 2 had the least risk of missing a
mass lesion from a mammogram image. However, in terms on the most accurate CNN
performance, it was obtained in experimental run 3 with only 6 misclassified
mammogram images.
4.2.2.2 Results using “Benign Malignant Dataset”

Using the “Benign Malignant Dataset”, there were in total 1,292 images for CNN
training and 47 images for CNN testing. Using the dataset resulted in the highest
training accuracy of 94.59% at 30 epochs, which is the highest achieved evaluation
accuracy during DenseNet201 CNN training.
Figure 38. Training Accuracy vs Evaluation Accuracy; Training Loss vs

Evaluation Loss
The performance of the CNN during training can be seen in Figure 38 represented by
the graph training accuracy vs evaluation accuracy and training loss vs evaluation loss.
The behaviour seen in the graphs in Figure 38 was also synonymous with the graph
results from the preceding subchapters, where the training accuracy and evaluation
accuracy increased and the training loss and evaluation loss decreased as the epoch
increases. With the increasing epoch, the CNN has trained and evaluate using the data
Michael Jeremy
and updated the weight to be used to classify the mammogram images. There were also
no case of overfitting from the program also. However, the evaluation loss curve also
lies considerably below the training loss curve, which also indicates the possibility that
the mammogram images within the validation folder was easier to predict by the CNN
due to poor sampling, lack of variety in the mammogram image samples and possible
information leak from the training dataset to the validation dataset. After training, the
CNN was then tested using the validation dataset which consists of 15 malignant
images and 32 benign images.
Figure 39. Confusion Matrix Result of DenseNet201 CNN Testing on “Benign

Malignant Dataset” using INbreast and Husada Data
The CNN testing was also carried in 3 repetitions. One of the results of the experiment
runs can be seen in Figure 39 represented by the confusion matrix. From the confusion
matrix, it can be seen that the program predicted 44 out of 47 images correctly and 3
images were predicted incorrectly. 2 malignant images and 1 benign image were falsely
classified, resulting in a 2 FN value and 1 FP value. Using the values obtained from the
confusion matrix, the accuracy, sensitivity and precision can be calculated. The
corresponding calculation resulted in a 93.62% accuracy, 86.67% sensitivity and
92.86% precision. However, despite the high accuracy, the FN value is still higher than
the FP value, which is not ideal in real world applications. However, the achieved
accuracy by the CNN was already considered a high accuracy, and the performance of
the CNN is satisfactory.
Michael Jeremy
Table 8. Experiment Results

Result
TP 13 13 13
FN 2 2 2
TN 30 28 31
FP 2 4 1
Accuracy 91.49 87.23 93.62 90.78
Specificity 93.75 87.50 96.88 92.71
Sensitivity 86.67 86.67 86.67 86.67
Precision 86.67 76.47 92.86 85.33
Mammogram Image Source INbreast and Husada Data
The 3 experimental runs conducted for CNN testing can be seen in Table 8. From the
results taken from 3 experimental runs, the program reached the highest accuracy of
93.62% in the third experimental run which is also the best CNN performance where
the performance resulted in the least amount of incorrect classifications. The program
reached an average accuracy performance of 90.8%. However, it can be seen that all
trials had the same amount of FN which is stuck at 2 misclassified malignant images.
Thus, it resulted in the same sensitivity obtained throughout all the trials. Therefore, it
can be concluded that the CNN performed the best in the third run where it was most
accurate in its classification.
4.3 Final Output of the Program

To visualize the predicted outputs made by the CNN, at the end of the CNN testing,
code scripts were made to take out some of the mammogram images in the validation
folder, as well as their predicted output with its corresponding truth label. In Figure 40,
the output of the program using “Mass Normal Dataset” can be seen. The images shown
was examples of classification results made by the CNN for the images in the validation
folder of the dataset. On top of each figure showed the predicted result by the program
either mass or normal. when using “Mass Normal Dataset” and benign or malignant
when using “Benign Malignant Dataset”, and it also showed the actual result of the
image. In Figure 40(a), it shows some correct predictions made by the program using
Michael Jeremy
the “Mass Normal Dataset”, whereas Figure 40(b) shows some incorrect predictions. It
must also be noted that repeated training of the CNN may not result in the same
predicted result for each image.
Figure 40. Output using "Mass Normal Dataset"
In Figure 41 also shows the output of the program but using the “Benign Malignant
Dataset”. Figure 41(a) shows the correct predictions made by the program using the
“Benign Malignant Dataset”, whereas Figure 41(b) shows the incorrect predictions.
Figure 41. Output using "Benign Malignant Dataset"
Michael Jeremy
4.4 Discussion
After conducting several experiments using the developed algorithm for pre-processing
and CNN, it can be said that the program was able to be used for mammogram image
classification for breast cancer detection. Although improvements must be made
towards the program, this research has produced satisfactory results and to a certain
extent, provide improved results. This research has provided a solid stepping stone for
future studies to continue developing the program to further increase the capability and
potential of this program.
The developed pre-processing pipeline consisting of Median Filtering, Otsu

Thresholding, Truncation Normalization, and CLAHE have enhanced the mammogram
images. The pre-processing methods have made the mass lesion within the breast tissue
very defined and more visible, and the vascularity and tissues were also clearly defined
after the mammogram image is manipulated.
All of the trainings of CNN using the developed datasets showed no signs of overfitting.
The achieved evaluation accuracy score during CNN training was also high. However,
it must be noted that some graphs, specifically the graphs in Figure 34 in Section
4.2.1.2, Figure 36 in Section 4.2.2.1, Figure 38 in Section 4.2.2.2, the evaluation loss
curve lies considerably beneath the training loss curve which may indicate the
possibility that the mammogram images in the evaluation dataset were easier to predict
by the CNN. This may be caused by poor sampling where duplicate or similar samples
exists in both training dataset and evaluation dataset, poor variety in the evaluation
dataset, or information leak where features from the training dataset have direct ties
towards features in the evaluation dataset.
Based on the experiment results for CNN testing in Section 4.2.1 and Section 4.2.2,
using accuracy as the primary parameter, the program performed better using the
“Benign Malignant Dataset”, compared to using the “Mass Normal Dataset”. The
highest accuracy achieved using the “Benign Malignant Dataset” was 94.1%, whereas
the highest accuracy achieved using the “Mass Normal Dataset” was 88.3%. This may
be caused by the difficulty in identifying the mass due to the variation of tissue density
in mammogram images, where masses in breasts with high tissue density were more
Michael Jeremy
difficult to distinguish and spot compared to masses in breast with low tissue density or
fatty breast. Because both fibroglandular tissues and mass lesion is represented as white
areas within a mammogram image, the higher the density of the fibroglandular tissues,
the higher the chance of the program misclassifying the corresponding mammogram
image, because the white areas caused by the high tissue density may mask an
underlying mass. From repeated experimental runs, the highest average accuracy
obtained was 90.8% in classifying benign or malignant, and 87.2% in classifying mass
or normal. It can be seen that the accuracy in classifying benign or malignant was still
higher compared to classifying mass or normal. Experiments using datasets that have
been added with mammogram images obtained from Husada Hospital have achieved a
relatively similar accuracy. Therefore, it can also be said that the CNN can still perform
using different sources of mammogram images and also mammogram images from
Indonesia.
The pre-processing methods have also been proven to play a major role in obtaining the
desired results from the program. As proven from experiments from Table 4 in Section
4.2.1.1 and Table 6 in Section 4.2.1.2 which shows experiments of CNN testing without
the pre-processing pipeline, the accuracy achieved was far inferior compared to the
accuracy achieved when using the pre-processing pipeline.
Several limitations were also encountered during this study. The code developed in this
program, specifically the Truncation Normalization code to manipulate the contrast of
the mammogram image can be improved to be more adaptive to each image. The
parameters for the truncation process were fixed which was optimally decided based on
all images. The hardware used in this study was also limited to which conducting the
training and testing of the CNN was not possible using the local computer because of
the experimentally exhaustive and high computational power and burden. Therefore, an
online server was needed to conduct the experiments using the CNN. This will cause
some problems towards future usage of the program, because some hospital’s ethical
code does not allow any patient data or image be uploaded onto an online server. The
size of the dataset used in this study was also substantially small, whereas CNN works
better in large datasets. Therefore, the performance of the CNN can drastically be
improved if a large enough dataset was used.
Michael Jeremy
In the future, this program can be beneficial as it can serve as a second opinion for
doctors and radiologists in analyzing mammogram images for breast cancers. The
image obtained from the pre-processing step can also provide enhanced mammogram
images that are easier for doctors and radiologists to analyze, and hopefully this
program can improve or provide support towards their diagnosis, reducing the risk of
missing a cancer from a mammogram image.
4.4.1 Comparison to Previous Research

Table 9. Results Comparison to Other Research
Classification
Research Accuracy Sensitivity Precision Specificity
Classes
(Zhang et al., 2021) 0.961 0.962 - 0.960
(El Houby and
2 Class: 0.930 0.948 0.917 0.912
Yassin, 2021)
Benign,
Current Research
Malignant 0.908 0.867 0.853 0.927
Model B
Current Research
0.902 0.810 0.952 0.967
Model A
2 Class: Current Research
0.872 0.867 0.878 0.878
Mass, Normal Model C
2 Class:
Benign, (Wang et al., 2019) 0.865 0.851 0.845 0.880
Malignant
2 Class: Current Research
0.859 0.879 0.830 0.842
Mass, Normal Model D
(Lu, Loh and
2 Class: 0.823 0.913 0.856 0.569
Huang, 2019)
Benign,
(Zhou, Zaninovich
Malignant 0.75 0.5 0.714 -
and Gregory, 2017)
Model A: Using “Benign Malignant Dataset” (INbreast Dataset)
Model B: Using “Benign Malignant Dataset” (INbreast Dataset and Husada)
Model C: Using “Mass Normal Dataset” (INbreast Dataset)
Model D: Using “Mass Normal Dataset” (INbreast Dataset and Husada)
Michael Jeremy
Based on Table 2 in Section 2.5, Table 9 was also made in the order comparing
“Accuracy” as the main parameter of comparison. Table 9 was made to compare the
results achieved in this research with results achieved by other research that have
similar design of study. All related study within Table 9 uses CNN, and mammogram
as the primary data. It can be seen that the program has produced a comparable or
improved results compared to other related researches. But there are also other
researches that had better results compared to the results obtain in this current research.
Other research can obtain high accuracy which can be credited to their advanced CNN
or deep learning techniques and also advanced or higher-level hardware with high
computational power. Advanced CNN techniques include methods such as combining
different architectures of CNN. However, it can be seen that the program developed in
this research have produced comparable results compared to other researches.
Comparison of Current Study With Related Researches

100
96,1
95 93
Average Accuracy (%)
90,8 90,2
90 87,2 86,5 85,9
85 82,3
80
75
75
70
Combined Average of Other Research (86,58%)
Figure 42. Comparison of Current Study with Related Researches
In bar graph representation, figure 42 is also made to compare all related studies
regarding breast cancer detection and classification using mammography as the primary
source of data. All the related research in the figure used CNN with various methods,
either combining different architectures, transfer learning, or others. The combined
Michael Jeremy
average accuracy for other related study was 86.58%, which sets the trendline for a
standard average for comparison. It can be seen that the accuracy achieved in all models
that have been developed in this study have exceeded the trendline, with the exception
of “Current Research Model D. Research that achieved better results were credited to
their advanced CNN and deep learning techniques, as well as advanced and higher level
of methodology and equipment. However, the performance achieved using the “Benign
Malignant Dataset” can clearly be seen that it is above the standard average of other
related researches.
The main difference from this study compared to other related study is highlighted in
the methods, hardware, and data used. The hardware used in this study was a standard
computer with the average processor. The amount of mammogram images used in this
study was also very lacking in comparison to other researches. However, the results
obtained in this research have provided comparable and also improvements to a certain
extent. This study has also had the privilege to use mammogram images that were
obtained from real patient-case from a local hospital in Jakarta, Husada Hospital.
Compared to other countries, there is a lack of mammography data available in
Indonesia. This is most likely caused by the still-developing health infrastructure in
Indonesia currently. This study has also provided a perspective on how doctors and
radiologist’ view a pre-processed mammogram images. Insights on the pre-processed
mammogram images from an expert radiologist is discussed in the following sub-
chapter.
4.4.2 Doctors and Radiologist’ Perspective

During the data retrieval, consultations were made with an expert radiologist from
Husada Hospital, an insight regarding the program was obtained from a medical
perspective. The image output obtained from the pre-processing method was done to
enhance the program’s performance. After the pre-processing method, there were
images that had better quality and enhanced information from a medical’s perspective.
However, there were some images that had a loss of information.
In Figure 43, both the left image shows the image before preprocessing, while both the
right image shows the image after preprocessing. From a medical’s perspective, both
images had an enhancement in quality after preprocessing, where it clearly shows the
Michael Jeremy
irregularity of the outer shape of the mass, indicating the signs of cancer. In Figure
43(b) specifically, although the mass within the breast was small in size, it was also
categorized into a malignant type of mass which also indicates cancer. Both images had
the mass lesion within the breast greatly enhanced which also benefits doctors and
radiologists.
Figure 43. Original image vs Pre-processed Image; Left Image(s) (original);

Right Image(s) (Pre-processed)
Figure 44 also shows an image before preprocessing (left image) and after
preprocessing (right image). However, in this image, although it is clear that the mass
lesion within the breast was also enhanced, there was an issue caused by the pre-
processing method from a medical’s perspective. The issue lies specifically on some of
the outer part of the breast, which became too dark due to the contrast enhancement
from the pre-processing method. Hence, some of the outer part of the breast, which
represents the breast tissue was lost. Although the important region of the image, the
mass lesion was enhanced, doctors and radiologists still need to assess the entirety of
the image, which also includes the outer part of the breast.
Figure 44. (left image) Original image vs (right image) Pre-processed Image
Michael Jeremy
The expert radiologist also gave insights regarding how the pre-processing method
should be conducted. Rather than putting all the image through the same pre-processing
code with all the same parameters, it is better to one-by-one pre-process the image, to
ensure that the pre-processing method brings out the maximum outcome from the
mammogram image, which benefits both from an engineering’s perspective and a
medical’s perspective. The development of the program was also greatly encouraged to
be used as a tool to provide a second opinion for doctors and radiologists. And therefore,
the chance of missing a cancer, or misdiagnosing a case can greatly be reduced.
Michael Jeremy
CHAPTER 5 – CONCLUSIONS AND RECCOMENDATIONS
In conclusion, this study has met the first objective, which is to obtain a program using
pre-processing Median Filtering, Otsu Thresholding, Truncation Normalization and
CLAHE, and Convolutional Neural Network which is able to classify mammogram
images, and the second objective which is to obtain a program that could assist doctors,
radiologists. The public INbreast Dataset, was used as the source of mammogram
images to train and test the program. Near the end of the study, an opportunity was also
given to obtain and use mammogram images from Husada Hospital.
5.1 Conclusions
The best result achieved during this study obtained an accuracy, precision and
sensitivity of 94.1%, 100%, and 85.71% in classifying benign or malignant. The best
result achieved to classify mass or normal obtained an accuracy, precision and
sensitivity of 88.3%, 92.6% and 83.3%.
The best average accuracy, precision and sensitivity obtained were 90.8%, 85.3%,
86.7% in classifying benign or malignant, and 87.2%, 87.8% and 86.7% in classifying
mass or normal.
The pre-processing methods followed by the CNN have given improved and
satisfactory results where the program achieved an accuracy higher or as high as other
related researches in predicting breast cancers from mammogram images.
Thus, it can be concluded that the first and second hypothesis were retained. The pre-
processing method have been proven to be able to produce enhanced mammogram
images for the program to detect mass lesions and cancers from the mammogram
images, and also for doctors and radiologists. It was also found that the pre-processing
method used was a crucial step in this program. Experiments using the program without
the usage of pre-processing resulted in the program not being able to classify the
images.
Michael Jeremy
5.2 Recommendations
There are several recommendations that could be implemented to further improve the
development of this program. This study has proposed good results but also have left
rooms for improvements and future enhancements. The followings are some ideas for
the development of computer aided diagnostic for breast cancer detection in Indonesia:
• For the pre-processing step, segmentation can be added to isolate just the mass
lesion from the image. By isolating the mass from the breast, the program can
focus in analyzing just the mass lesion. Segmenting the mass can also give other
valuable information, such as: visualizing the shape more clearly, calculating
the area of the mass in respect to the breast region as the reference. By
calculating the size or area of the mass, it will be beneficial specifically towards
monitoring the treatment of the patient.
• Since the program developed in this study only used mammogram images that
have mass lesions, it is also important to improve the program to be able to
distinguish the other findings that are assessed from a mammogram image,
which includes: tissue density, calcifications and architectural distortions.
Michael Jeremy
GLOSSARY
Benign – A term that is used to indicate that a finding is not fatal and not a potential
cancer.
CAD – Computer-Aided Diagnosis
CLAHE – A type of histogram equalization technique that introduces the use of a clip-
limit value to avoid excessive stretching towards the histogram.
Class – Pre-established categories to which the input variables can be assigned to.
Diagnose – The process and result of determining and distinguishing an illness through
an analysis.
Enhancement – The process of improving the quality of a data.
False Positive – In a two-class classification, it is a misclassification error that predicts

a negative class as a positive class.
False Negative – In a two-class classification, it is a misclassification error that predicts

a positive class as a negative class.
Histogram – A graphical representation of the number of pixels and its corresponding

intensity value.
Image Processing – The process of manipulating images using various methods.
Malignant – A term that is used to indicate that a finding is dangerous and strongly
suggest a cancer.
Mammogram Images – The product of an X-Ray scan performed on the breast.

Typically performed to diagnose a breast cancer.
Mass – A term used to define a finding of a space-occupying lesion from a

mammogram image.
Michael Jeremy
Median Filter – A nonlinear spatial filter that replaces the value of the center pixel
with the value of the input group of pixels.
Otsu Thresholding – A type of thresholding used to separate the foreground of an

image from its background.
Pre-Processing – The process conducted prior to any training and testing of a Machine
Learning or Artificial Intelligence, to enhance the images to improve the performance
of the program.
True Negative – In a two-class classification, the predicted negative class matches the
actual negative class.
True Positive – In a two-class classification, the predicted positive class matches the
actual positive class.
Transfer Learning – The process of taking an existing architecture and the pre-trained
weights and using it to learn a new type of data.
Truncation Normalization – A type of normalization technique that consists of two

steps: truncation and normalization.
Michael Jeremy
REFERENCES
About OpenCV (no date).
Albawi, S., Mohammed, T. A. and Al-Zawi, S. (2017) ‘Understanding of a

convolutional neural network’, in 2017 International Conference on Engineering and
Technology (ICET). Ieee, pp. 1–6.
Bandyopadhyay, S. K., Maitra, I. K. and Kim, T.-H. (2011) ‘Identification of

abnormal masses in digital mammography images’, in 2011 International Conference
on Ubiquitous Computing and Multimedia Applications. IEEE, pp. 35–41.
Cancer Statistics (2020) National Cancer Institute. Available at:

https://www.cancer.gov/about-cancer/understanding/statistics.
Charlotte Pelletier (no date) Temporal Convolutional Neural Network for the
Classification of Satellite Image Time Series, Scientific Figure on ResearchGate.
Explain in detail about Median Filtering (2019) i2tutorials. Available at:

https://www.i2tutorials.com/explain-in-detail-about-median-filtering/.
Firmino, M. et al. (2016) ‘Computer-aided detection (CADe) and diagnosis (CADx)

system for lung cancer with likelihood of malignancy’, Biomedical engineering
online, 15(1), pp. 1–17.
Ghr, M. (2022) Histogram Equalization, MATLAB Central File Exchange. Available

at: https://www.mathworks.com/matlabcentral/fileexchange/50457-histogram-
equalization (Accessed: 19 April 2022).
Ginsburg, O. et al. (2020) ‘Breast cancer early detection: A phased approach to

implementation’, Cancer, 126, pp. 2379–2393.
Harbeck, N. et al. (2019) ‘Breast cancer’, Nature reviews Disease primers, 5(1), pp.
1–31.
El Houby, E. M. F. and Yassin, N. I. R. (2021) ‘Malignant and nonmalignant

classification of breast lesions in mammograms using convolutional neural networks’,
Biomedical Signal Processing and Control, 70, p. 102954.
Huang, G. et al. (2017) ‘Densely connected convolutional networks’, in Proceedings

of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708.
IBM Cloud Education (2020) Convolutional Neural Networks, IBM Cloud Learn
Hub.
Icanervilia, A. V. et al. (2021) ‘A Qualitative Study: Early Detection of Breast Cancer

in Indonesia (After Universal Health Coverage Implementation)’.
Michael Jeremy
Irwig, L., Macaskill, P. and Houssami, N. (2002) ‘Evidence relevant to the
investigation of breast symptoms: the triple test’, The Breast, 11(3), pp. 215–220.
Kanwal, S. (2013) ‘Effect of O-GlcNAcylation on tamoxifen sensitivity in breast

cancer derived MCF-7 cells’. Paris 5.
Key, T. J., Verkasalo, P. K. and Banks, E. (2001) ‘Epidemiology of breast cancer’,

The lancet oncology, 2(3), pp. 133–140.
Khandelwal, N. (2021) Image Processing in Python: Algorithms, Tools, and Methods

You Should Know.
Khuzi, A. M. et al. (2009) ‘Identification of masses in digital mammogram using gray

level co-occurrence matrices’, Biomedical imaging and intervention journal, 5(3).
Lu, H.-C., Loh, E.-W. and Huang, S.-C. (2019) ‘The Classification of Mammogram
Using Convolutional Neural Network with Specific Image Preprocessing for Breast
Cancer Detection’, in 2019 2nd International Conference on Artificial Intelligence
and Big Data (ICAIBD). IEEE, pp. 9–12.
Mammograms (2019) American Cancer Society, ACS Journals.
Mengenal Otsu Thresholding (2019) PPIG ULM. Available at:

https://ppiig.ulm.ac.id/2019/04/08/mengenal-otsu-thresholding/.
Moreira, I. C. et al. (2012) ‘INbreast: Toward a Full-field Digital Mammographic

Database’, Academic Radiology, 19(2), pp. 236–248. doi:
10.1016/J.ACRA.2011.09.014.
Orr, L. (2020) Advice for Breast Cancer Awareness Month: Know Your Breast
Density, University of Rochester Medical Center. Available at:
https://www.urmc.rochester.edu/news/story/advice-for-breast-cancer-awareness-
month-know-your-breast-density.
Perou, C. M. et al. (2000) ‘Molecular portraits of human breast tumours’, nature,

406(6797), pp. 747–752.
Ramkita, N. and Murti, K. (2021) ‘The Challenges of Anatomic Pathology Medical

Education in Indonesia’, Jurnal RSMH Palembang, 2(1), pp. 119–124.
Siegel, R. L., Miller, K. D. and Jemal, A. (2020) ‘Cancer Statistics, 2020’, American
Cancer Society, ACS Journals.
Thomas, P. A. (2010) Breast Cancer and Its Precursor Lesions: Making Sense and
Making it Early. Springer Science & Business Media.
Waheed, K. B. et al. (2019) ‘Breast cancers missed during screening in a tertiary-care

hospital mammography facility’, Annals of Saudi medicine, 39(4), pp. 236–243.
Wang, Z. et al. (2019) ‘Breast cancer detection using extreme learning machine based
Michael Jeremy
on feature fusion with CNN deep features’, IEEE Access, 7, pp. 105146–105158.
Wei, B. et al. (2017) ‘Deep learning model based breast cancer histopathological
image classification’, in 2017 IEEE 2nd international conference on cloud computing
and big data analysis (ICCCBDA). IEEE, pp. 348–353.
Weinberg, R. A. (1996) ‘How cancer arises’, Scientific American, 275(3), pp. 62–70.
Yang, N., Ray, S. D. and Krafts, K. (2014) ‘Cell proliferation’, in Encyclopedia of

Toxicology: Third Edition. Elsevier, pp. 761–765.
Zhang, Y.-D. et al. (2021) ‘Improved breast cancer classification through combining
graph convolutional network and convolutional neural network’, Information
Processing & Management, 58(2), p. 102439.
Zhou, H., Zaninovich, Y. and Gregory, C. (2017) ‘Mammogram classification using

convolutional neural networks’, in International conference on technology trends.
Michael Jeremy
APPENDICES
Appendix 1. INbreast Dataset
Michael Jeremy
Appendix 2. Husada Dataset
Appendix 3. Architecture of DenseNet-201 Model (Huang et al., 2017)
Michael Jeremy
CURRICULUM VITAE
Name : Michael Jeremy
Place and Date of Birth : Jakarta, 24th August 2000
Phone Number : 0813-8252-8237
Green Garden O1/37, Kebon

Address :
Jeruk, Jakarta, 11520
Email : michaeljeremy248@gmail.com
Educational Background
2018 - 2022 Swiss German University, The Prominence Tower,

Jl. Jalur Sutera Barat No.15 Alam Sutera,
Tangerang, Banten 15143, Indonesia
2015 - 2018 SMA IPEKA Tomang, Jl. Green Ville Blok D, Duri
Kepa, Kebon Jeruk, Jakarta Barat, DKI Jakarta
11510, Indonesia
Internship and Social Work Experience
February – August 2021 Internship at Hochschule für Gesundheit Bochum,

Gesundheitscampus 6-8, Bochum 44801, Germany
January – February 2020 Internship at Bioteknik Design, Setra Duta Laguna 6

Blok C1/9, Bandung, Jawa Barat 40559, Indonesia
Michael Jeremy
January 2019 – April 2020 Teacher at Blessing Kids Learning Center, Ruko
Vienna Blok A/17, Jl. Kelapa Dua Raya 17, Gading
Serpong, Tangerang, Banten 15810, Indonesia
Academic and Non-Academic Awards
January 2017 2nd Place winner at Band Competition at

CASANOVA 2017, Kolese Kanisius
October 2016 1st Place winner at Vocal Group

Competition in West Jakarta 2017
Publications
“Joint-Development of Mobility Assistive Devices for Patients with Special

Treatment”
Authors: Dedy H.B. Wicaksono, Michael Jeremy, Lydia Anggraini Kidarsa
Prosiding PKM-CSR Vol 3 (2020), Page 376, Subsection Kesehatan: Peran Perguruan
Tinggi dan Dunia Usaha Dalam Pemberdayaan Masyarakat Untuk Menyongsong
Tatanan Kehidupan Baru, e-ISSN: 2655-3570
DOI: https://doi.org/10.37695/pkmcsr.v3i0.898
Michael Jeremy

3456-Thesis in PDF Format-20769-1-2-20220726

Uploaded by

Copyright:

Available Formats

You might also like

3456-Thesis in PDF Format-20769-1-2-20220726

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

3456-Thesis in PDF Format-20769-1-2-20220726

Uploaded by

Copyright:

Available Formats

MAMMOGRAM IMAGE PRE-PROCESSING WITH CONVOLUTIONAL

NEURAL NETWORK FOR BREAST CANCER DETECTION AND

SWISS GERMAN UNIVERSITY

Revision after Thesis Defense on 13 July 2022

STATEMENT BY THE AUTHOR

Aulia Arif Iskandar, S.T., M.T.

Muhammad Fathony, Ph.D.

Dr. Dipl.-Ing. Samuel P. Kusumocahyo

MAMMOGRAM IMAGE PRE-PROCESSING WITH CONVOLUTIONAL

SWISS GERMAN UNIVERSITY

STATEMENT BY THE AUTHOR ............................................................................... 2

Figure 1. Stages of Tumor Development (Kanwal, 2013) ........................................... 19

Table 1. BI-RADS Coding ........................................................................................... 28

Therefore, numerous attempts have aimed to develop a Computer-Aided Diagnosis

Therefore, this study aims to develop an image processing method to pre-process

2. The performance of a trained CNN model is heavily dependent on the amount

1.3 Research Objectives

2. To obtain an improved program that can support or assist medical practitioners

1.4 Significance of Study

1.5 Research Questions

Question #2 Will the image processing Otsu thresholding, median filtering,

Hypothesis #2 The pre-processing method and convolutional neural network will

CHAPTER 2 - LITERATURE REVIEW

Additionally, to be able to develop a program to detect breast cancer, image processing

Figure 1. Stages of Tumor Development (Kanwal, 2013)

2.2 Breast Cancer

2.3 Diagnosis of Breast Cancer

2.4 Breast Cancer Diagnosis through Mammography

A mammography has 2 types of main applications: screening mammograms and

2.4.1 Standard Views of Mammogram Images

Figure 2. Craniocaudal (CC) and Medial Lateral Oblique (MLO) View

2.4.3 Image Interpretation

2.4.3.1 Tissue Density

Figure 3. Different Types of Breast Density in Mammogram Images (Orr, 2020)

As a breast increases in its fibroglandular density, the chance of an underlying breast

Figure 4. Example of a Mass Found in a Mammogram Image

Figure 5. Schematic of a Breast (Thomas, 2010)

2.4.3.4 Architectural Distortions

2.4.4 Breast Imaging Reporting and Data System

Table 1. BI-RADS Coding

2.5 Previously Proposed Methods

2.6 Image Pre-processing

Figure 6. A Digital Image (Khandelwal, 2021)

The previously mentioned truncation normalization, Otsu thresholding, median

2.6.1 Otsu Thresholding

In pseudocode, thresholding can be interpreted as:

In its application, the thresholding algorithm is written as cv2.threshold() function

thresholdValue, maxVal, thresholdingTechnique). In simple thresholding, there

are various thresholding techniques available. However, regardless of the technique, in

Another type of thresholding is the adaptive thresholding, which is a thresholding

The function of an Otsu thresholding is the normal cv2.threshold() function.

2.6.2 Median Filtering

Figure 7. Median Filtering Process (Lu, Loh and Huang, 2019)

Figure 8. Image Histogram and Histogram Equalization (Ghr, 2022)

Therefore, Contrast Limited Adaptive Histogram Equalization or abbreviated as

2.6.4 Truncation Normalization

The truncation normalization consists of two processes: truncation processing and

2.6.5 Data Augmentation

2.6.6 Neural Networks