Ukoha Chinonso Precious 17CG023225
BY
JULY 2021
CERTIFICATION
I hereby certify that this project was carried out by Ukoha Chinonso Precious in the
Department of Computer and Information Sciences, College of Science and Technology,
Covenant University, Ogun State, Nigeria, under my supervision.
PAGE ii
DEDICATION
This report is dedicated to my parents, Dr & Mrs. Ukoha for their ever-present support,
encouragement, and advice.
Most importantly, to the Almighty God, for the wisdom, guidance, understanding, and
favor all through my schooling experience. It was your grace from the start to finish.
ACKNOWLEDGEMENT
My sincere appreciation goes to God, for His grace and favor, without which this study
would have been impossible.
I thank my parents, Dr and Mrs. Ukoha, for their unfailing support and encouragement during
my academic years and the completion of this study, and my siblings, for always keeping
me on track whenever I faced challenges.
Special thanks go to my final year project supervisor, Dr Olanma Iheanetu, who made this
work a reality. Her advice was invaluable, and her enlightening suggestions, support, and
counsel led me through all phases of my project. I would also like to express my
gratitude to my final year defense panelists for making my experience a pleasurable one
and for their insightful remarks and recommendations. To the HOD and the entire Department
of Computer and Information Sciences, thank you for the tireless hours spent imparting
priceless knowledge to my life. God bless you all.
TABLE OF CONTENTS
Title Page
Certification
Dedication
Acknowledgement
Table of contents
List of figures
Abstract
1.5. Methodology
Breast Cancer
2.3.3. Predicting Breast Cancer Risk Using Personal Health Data and Machine Learning Models
3.3.2. Functional Requirements
5.2. Summary
5.3. Recommendation
5.4. Conclusion
REFERENCES
LIST OF FIGURES
Figure 2.1: Obi-Hep Breast Cancer Diagnosis System
LIST OF TABLES
Table 3.1: User Creation Table
Table 3.2: Table of the Patient Diagnosis Data
Table 3.3: Table of the Patient Risk Assessment Data
ABSTRACT
Breast cancer and other noncommunicable diseases have been a great cause of concern to
Nigeria and other Low and Middle Income Countries (LMICs). The inadequacy of skilled
personnel and infrastructure has led to a high rate of breast cancer misdiagnosis and
mortality. The insufficient amount of indigenous data in the nation has further aggravated
the situation. This study aims to automate the process of breast cancer diagnosis by
developing a web-based Breast Cancer Diagnosis System using Convolutional Neural
Networks (CNN), which also takes patients' risk factors into account to assess breast
cancer preventability and serves as a standardized data repository for patient
data. To carry out this study, an extensive review of existing systems and journals was
carried out to discover which machine learning algorithms have previously diagnosed
breast cancer and performed breast cancer risk assessment efficiently. Using the TensorFlow
Python library and JupyterLab, a MobileNet Convolutional Neural Network (CNN) model was
trained on the Breast Cancer Histopathological Image Classification (BreakHis) dataset to
diagnose breast cancer from biopsy images, and a Logistic Regression model was trained on
the Breast Cancer Surveillance Consortium (BCSC) Risk Estimation dataset to classify
breast cancer risk as high or low. A web-based application was
built using HTML, CSS, Bootstrap, and the Django web framework for the server side, with
MySQL for the database, and the models were embedded into the system. It was
found that the MobileNet CNN architecture can efficiently diagnose breast cancer, with a
training accuracy of 93.8%. On further evaluation with 9 data samples from Clinix
hospital, the system predicted 7 samples correctly, achieving an accuracy of 77.8%. The
Logistic Regression classifier predicted breast cancer risk with an accuracy of 99%. This
system can serve as a second opinion to pathologists or as a substitute for pathologists
in cases where they are unavailable.
CHAPTER ONE
INTRODUCTION
Disease can be defined as any harmful deviation from an organism's normal state. It
differs from physical injury in that it is generally accompanied by
clinical symptoms. Diseases may have their origins within the individual, they may be the
product of a medical procedure, or they may be triggered by a foreign agent such as a
toxic chemical. In the latter situation, the illness is noncommunicable, meaning that it
only affects the individual who is exposed to it. The World Health Organization (WHO)
has recognized four major types of Noncommunicable Diseases (NCDs): cardiovascular
disease, chronic respiratory disease, diabetes mellitus, and cancer (Burrows & Scarpelli,
2020). Cancer is responsible for a large share of NCD deaths in the world (WHO, 2021b).
Cancer is one of the most prevalent causes of NCD mortality globally, responsible for
almost ten million deaths each year, or nearly one in every six
deaths worldwide (Ferlay et al., 2019). The most commonly diagnosed cancers globally in
2020 were breast, lung, colon and rectum, prostate, non-melanoma skin, and stomach
cancers. The leading causes of global cancer mortality in 2020 were lung cancer, colon
and rectum cancer, liver cancer, stomach cancer, and breast cancer. About 70
percent of cancer deaths occur in developing countries (WHO, 2021a).
In Nigeria, cancer is responsible for about 70,000 deaths annually (28,414 male and 41,913
female). Breast, cervix uteri, prostate, non-Hodgkin lymphoma, and liver
cancers are the five cancers with the highest reported incidence in the nation. Breast,
cervix uteri, prostate, liver, and non-Hodgkin lymphoma cancers have the highest
projected death rates. Breast cancer is now Nigeria's most diagnosed cancer (Ferlay et al.,
2019).
Breast cancer occurs when some cells in the breast start to grow abnormally. The affected
cells divide far more rapidly than unaffected cells, forming a lump,
and keep growing until they eventually spread to the lymph nodes and subsequently
to the rest of the body (Mayo Clinic, 2021). Breast cancer affects both genders, but women
are more susceptible to the disease. This is because male breast tissue is mostly fat and
fibrous tissue (stroma), with fewer ducts and lobules than female breast
tissue. Women also have higher estrogen levels than men, which can increase their risk
of breast cancer (CCTA, 2019).
The number of women at risk of breast cancer in Nigeria has grown steadily, from about
24.5 million in 1990 to nearly 40 million in 2010, and was expected to rise above
50 million in 2020 (Olatunji et al., 2019). Nigeria has one of the highest age-standardised
incidence rates (ASR) of breast cancer in Sub-Saharan Africa, second only to South Africa,
though lower than those of Europe and North America. In 2010, the country's ASR for breast
cancer was 54.3 per 100,000 people. Although this is much lower than Belgium's rate of
111.9 per 100,000 women and the United States' rate of 92.9 per 100,000 women, it reflected
a 100 percent rise in breast cancer incidence in the country over the preceding ten years.
Nigeria also has the world's third highest breast cancer death rate (25.9 per 100,000
women), resulting in the deaths of half the women affected (Emilia, 2017).
Breast cancer-related mortality keeps increasing in Nigeria because the public
and health-care providers lack an understanding of the importance of early detection of
breast cancer, and as a result, late-stage diagnosis remains the norm. This is bad for
patients' prognosis, as the stage at breast cancer diagnosis is among the most important
prognostic factors (Emilia, 2017). There is also a significant shortage of healthcare
professionals in Nigeria. With at least 3,000 doctors graduating per year, the Nigeria
Medical Association (NMA) estimates that the country will need about twenty-five (25)
years to produce enough doctors to meet the country's needs (Agency Report, 2019).
The situation is even worse in the field of pathology. Pathologists are very important
to breast cancer diagnosis because they examine breast tissue obtained by biopsy, which
is the surest method of determining whether cells are cancerous, to derive
important breast cancer values. According to the College of Nigerian Pathologists,
there are only about 500 pathologists in the country. This puts the ratio of
pathologists to the population at a concerning 1:400,000, far
below the recommended standard of one pathologist to about 40,000 people. At the current
rate, Nigeria will take about 500 years to achieve the pathologist-to-patient
ratio that exists in the United States and the United Kingdom today. This is
detrimental to the healthcare system because pathology is used to render evidence-based
diagnoses of 70 to 80 percent of diseases (College of Nigerian Pathologists, 2020).
The lack of pathologists often leads to misdiagnosis, and according to the Care
Organization Public Enlightenment (COPE), 70 percent of cancer patients are
misdiagnosed in Nigeria (Chioma, 2020). The most common reasons for long intervals before
breast cancer tumor staging in Nigeria are symptom misinformation and misdiagnosis
(Agodirin et al., 2019). In the case of Mary Abia, a misdiagnosis of her biopsy reports led
to her ultimately progressing to stage IV cancer, which is the deadliest cancer stage
with the worst prognosis (Ake, 2018). There is also a severe lack of awareness of breast
cancer risk factors and issues among Nigerian women, with most women attributing their
symptoms to spiritual attacks (Agatha Ogunkorode et al., 2021).
A number of risk factors affect an individual's chances of developing
breast cancer. While all cancers are caused by several mutations, these
mutations are caused by environmental interaction. Studies show that the majority of
cancers are not inherited and that environmental influences such as eating patterns,
obesity, alcohol intake, and infections have a substantial effect on their growth. The fact
that cancer has a lower genetic impact and that environmental causes can be changed
suggests that cancer can be avoided (Anand et al., 2008). Breast cancer risk assessment
is done by taking risk factors into account and evaluating them to determine a person's
level of susceptibility to breast cancer. It is important to perform proper risk
assessment in order to carry out preventive measures, prolong and improve quality of life,
and foster early detection and diagnosis of breast cancer (Akinnuwesi et al., 2020).
Research into how to improve quality of life and the health care system as a whole has
been hindered by the country's lack of standardized data. According to Nigeria's
minister of health, Isaac Adewole, Nigeria struggles with high-burden diseases and must
encourage data collection to foster research on how to combat those diseases. He also
emphasized how the lack of data collection as a whole is restricting the health sector's
ability to carry out practical and evidence-based control programmes for
noncommunicable diseases in the country. The commissioner of health in Lagos state,
Jide Idris, also described the need to utilize Information and Communication Technology
to help the healthcare system in Nigeria minimize the obstacles caused by inadequate
data. It is evident that a proper and reliable data collection system will be key in
overcoming the challenges brought on by breast cancer's high growth rate and diagnosis
issues (Anthonia Obokoh, 2019).
The fundamental factors that determine breast cancer survivability are proper risk
assessment and early, accurate diagnosis. It is important for breast cancer to be
detected and for risk to be assessed as early as possible, because this plays a significant
role in reducing the mortality rate: patients can begin receiving the
necessary treatments sooner (Akinnuwesi et al., 2020). Hence, there is a dire
need to employ strategies to manage maldistribution and other human-related factors in
the healthcare industry. One of the most efficient strategies is the use of artificial
intelligence and machine learning techniques.
Machine learning techniques have been applied to the classification of breast images and
biopsy records in the form of Computer-Aided Detection (CAD) technologies, to aid
doctors in reading and interpreting medical images and reports. The aim of CAD programs
is to improve sensitivity and precision so that doctors can make better diagnoses
(Akinnuwesi et al., 2020). Machine learning techniques can also be applied to breast
cancer risk assessment to determine the effective risk factors and their associations. The
models built from these techniques can be used to predict and estimate an
individual's susceptibility to breast cancer. Efficient risk assessment can enhance the
success of treatment, improve survival chances, and minimize expenses. Hence, it is
necessary to build dependable and efficient machine learning models to perform proper
breast cancer risk assessment (Al-Quraishi et al., 2017).
Mismanagement of symptoms and risk factors leads to a worse prognosis of breast cancer
(Agodirin et al., 2019).
Lack of standardized datasets in Nigeria is restricting the research interventions to
combat Noncommunicable diseases such as breast cancer (Anthonia Obokoh,
2019).
1.5. METHODOLOGY
The techniques and tools used to achieve the objectives of the study include:
This is in order to gather knowledge about limitations of the existing systems and
the most efficient machine learning algorithm in providing a solution to the
problem statement.
Objective 4: This objective will be achieved by carrying out component testing,
system testing, and acceptance testing, and by evaluating the diagnosis model's performance
on unseen data.
The breast cancer risk assessment tool will only classify breast cancer risk as high
or low and will not give a 5-year or lifetime risk prediction. The diagnosis data is
limited to women only.
CHAPTER TWO
LITERATURE REVIEW
2.1. INTRODUCTION
Cancer is typically named for the part of the body where it first appears; breast cancer
therefore refers to the accelerated, uncontrolled growth of breast tissue
(Sharma et al., 2010). Female breasts are made up of a variety of tissues: the
glandular tissue (lobules) that produces milk, the fatty tissue that determines breast
size, and the fibrous tissue that holds the glandular and fatty tissue in place.
The muscles that attach the breasts to the ribs are not part of the breast
anatomy (Cleveland Clinic, 2020).
Breast cancer may originate from a range of sites in the breast. Ductal cancers arise from
the ducts that transport milk to the nipple, while lobular cancers arise from the glands that
produce breast milk. Other forms, such as phyllodes tumor and angiosarcoma, are
less common. A small percentage of breast cancers originate from other tissues
(American Cancer Society, 2019). Tumors can grow in different areas of the breasts. Most
breast tumors result from benign (non-cancerous) changes. An example is
fibrocystic change, a non-cancerous condition in which women experience cysts, fibrosis,
lumpiness, and areas of swelling, tenderness, or breast pain (Sharma et al., 2010).
Breast cancer can be divided into two types: non-invasive and invasive. Non-invasive
breast cancers, also known as carcinoma in situ or pre-cancers, remain confined to the
breast's milk ducts or lobules and do not infiltrate the normal tissues inside or outside
the breast. Invasive breast cancers, on the other hand, break through the duct or lobular
walls and infiltrate the surrounding fatty and connective tissues. Most breast
cancer types are invasive. Cancer type and patient response to therapy are determined by
breast cancer metastasis, i.e., cancer spread. Breast cancer can generally be classified as
one of the following:
Ductal Carcinoma In Situ
Lobular Carcinoma In Situ
Invasive Ductal Carcinoma
Invasive Lobular Carcinoma
Inflammatory Breast Cancer
Male Breast Cancer
Paget's Disease of the Nipple
Phyllodes Tumors of the Breast
Metastatic Breast Cancer.
Many people are predisposed to breast cancer due to risk factors that cannot be changed,
such as gender, age, and family history.
However, some risk factors, such as alcohol consumption and body weight, can be altered
(University of California San Francisco, 2020). The factors that raise the risk of having
breast cancer are listed below, including both those that cannot be changed and those that
can (American Cancer Society, 2020).
estrogen hormone, which is responsible for irregular cell growth, in women's
breasts.
Age is also a major breast cancer risk factor. Breast cancer is more likely to
develop as a person gets older.
Race also plays an important role in breast cancer risk. White women are
likelier than Black, Latino, and Asian women to develop breast cancer, but Black
women tend to be diagnosed at an earlier age and at a later stage with a poorer prognosis,
and also have a higher mortality rate.
Family history is also a factor, as women who have a first-degree family member
who previously had or currently has breast cancer are more susceptible to the
disease.
A woman's age at menarche is also a risk factor. Women who start menstruating
before the age of 12 are thought to be at a marginally greater risk. This is because
the level of female sex hormones a woman is exposed to over her lifespan is also a
risk factor. The greater her exposure, the greater her chance of getting
breast cancer.
A woman's age at her first birth is another risk factor. Women who first give
birth after 29 or do not have children are more susceptible to breast cancer than women
who first give birth before 29.
A woman's age when she reaches menopause is also a risk factor. Women
who reach menopause at 54 or younger have a lower risk than
women who reach menopause after 54. The increased risk in the latter group may be
linked to their elevated lifetime exposure to sex hormones.
Women with higher breast density are more susceptible to the disease, and their
breast cancer tumors are more difficult to detect.
Physical exercise has been linked to reduced breast cancer risk in
postmenopausal and premenopausal women. Increased levels of physical exercise,
especially in postmenopausal women, reduce breast cancer susceptibility.
Consumption of alcoholic drinks increases the risk of HER2 positive breast
cancer. Alcohol increases the level of estrogen hormone, which is related to HER2
positive breast cancer. Alcohol also damages the DNA in cells, which can in turn
increase breast cancer susceptibility.
Women who use Hormone Replacement Therapy (HRT) for a long time have a
greater chance of developing breast cancer than women who do not
use it. If women stop their HRT intake, their risk of breast cancer decreases, but
some additional risk persists for more than ten years.
All women are at some level of risk for breast cancer, but the risk level varies greatly
from one woman to the next. Understanding risk is crucial because it affects medical
choices ranging from whether a symptom-free woman should get a mammogram to how actively to
seek preventive measures such as anti-estrogens or prophylactic mastectomy and ovary
removal.
2.1.2. Machine Learning
Conceptually, machine learning is a field of computer science that arose from the study of
pattern recognition and artificial intelligence. It is the practice of designing
algorithms that can learn from experience and predict outputs using data sets. Instead of
following a static, linear instruction set, these algorithms use input variables to make
guided decisions or judgements (Simon et al., 2020).
Broadly speaking there are three main machine learning methods. These include:
Supervised learning: Supervised machine learning algorithms are algorithms that
require labeled examples as guidance. The input dataset is split into training and testing
datasets. An output attribute, the one that requires prediction or classification, is
included in the training dataset. For prediction or classification, the algorithms learn
patterns from the training set and apply them to the test set. Examples include the
decision tree, naïve Bayes, and support vector machine algorithms.
Unsupervised learning: Unsupervised learning algorithms learn
features from the unlabeled data they are given and use these previously learned
features to recognize the class of new data. They are mostly used for feature reduction
and clustering. Examples include k-means clustering and principal component
analysis.
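The two approaches described above can be contrasted in a minimal sketch. It assumes scikit-learn is available and uses synthetic data from `make_classification`; the sample counts, tree depth, and number of clusters are illustrative choices, not values taken from this study.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 300 samples, 8 features, 2 classes.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)

# Supervised learning: the labels y guide training, and the held-out test
# set measures how well the learned patterns carry over to unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
test_accuracy = tree.score(X_test, y_test)

# Unsupervised learning: the labels are never used.
# PCA reduces the 8 features to 2, then k-means groups the samples.
X_reduced = PCA(n_components=2, random_state=0).fit_transform(X)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)

print(f"decision tree test accuracy: {test_accuracy:.2f}")
print(f"samples per cluster: {[int((clusters == c).sum()) for c in (0, 1)]}")
```

The supervised model is scored against known labels, whereas the clustering output has no ground truth to compare with and would need separate interpretation.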
Machine learning has been used in various areas of medicine such as cardiology, cancer,
and diabetes. Most machine learning models built in the laboratory are used for prognosis,
diagnosis, or classification of clinical groups, which shows that they may be useful in the
creation of automated decision support tools. Large databases and precise labels, which are
usually provided by expert practitioners, are important for developing these methods. The
premise is to find the data patterns or factors that are related to the desired result
(e.g., the cancer diagnosis of a woman). This helps to obtain important insights from the
given data, so that patients can keep track of their overall wellbeing and healthcare
practitioners can make appropriate decisions on treatment and management
(Triantafyllidis & Tsanas, 2019).
Deep learning is a subfield of machine learning that learns through its own layered
algorithmic structure. It is modeled on how human beings make judgments and organizes
what it learns into a unified framework. This is accomplished by integrating
many algorithms into a multilayer network known as an artificial neural network (ANN),
which was created to model the way the human brain assimilates information using its
biological neural network. Deep learning is therefore said to be more efficient than
baseline machine learning models (Krishna et al., 2018). Breast
image classification primarily relies on specialized techniques from natural image
classification and machine learning. This gives medical practitioners an
assistive tool while also saving them time (Nahid & Kong, 2017).
The contribution of machine learning to breast cancer risk assessment has had a positive
impact on breast cancer treatment. It has the ability to decrease breast cancer deaths by
aiding the early identification of the disease. It helps ensure that the risk of
developing breast cancer is correctly measured and detected, so that women at high risk
of the disease are identified early and can take the necessary
preventive measures, and so that breast cancer misdiagnosis is avoided. It also has the
ability to bring down the cost of breast cancer treatment for health care providers and
patients (Akinnuwesi et al., 2020). Since machine learning algorithms are not limited to a
fixed set of risk factors, they can modify or integrate new ones (Krishna et al., 2018)
and offer the promising prospect of better and more accurate risk estimates (Ming et
al., 2019).
on the device. The system's usage can be controlled and monitored by an administrator
who is in charge of the application's access information (Okikiola et al., 2016).
Figure 2.1: Obi-Hep Breast Cancer Diagnosis System (Okikiola et al., 2016)
The Cedars-Sinai Breast Health Assessment is an online tool for performing breast cancer
risk assessment. The system takes in a range of values, including breast cancer risk
factors such as age, age at menstruation, age at menopause, and weight, and gives a risk
value based on each of these features. The limitation of this tool is that its prediction
model was trained predominantly on data from white women and as such may not predict
accurately for black women.
The images were preprocessed by eliminating the label anomaly that is found in all
images in the archive, equalizing the intensity levels in the images, and decreasing the
image quality and contrast. The rectified linear unit (ReLU) was used as the nonlinear layer
in a convolutional neural network architecture consisting of a convolutional layer
with thirty filters and a five-by-five kernel size, three two-by-two pooling layers with a
stride of two (which keep the most significant values and reduce the size of the feature
maps), and four fully connected layers.
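To make the building blocks named above concrete, the following is a minimal NumPy sketch of one convolution, ReLU, and pooling stage. It is not the reviewed authors' implementation: the 28 by 28 input, the random (untrained) filter values, the absence of padding, and the ordering of a single conv/ReLU/pool stage are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
image = rng.random((28, 28))            # hypothetical grayscale input
filters = rng.normal(size=(30, 5, 5))   # thirty 5x5 filters (random, untrained)

def conv2d_valid(img, kernels):
    """Slide every kernel over the image without padding (cross-correlation)."""
    windows = sliding_window_view(img, kernels.shape[1:])   # (H-4, W-4, 5, 5)
    return np.einsum("ijkl,fkl->fij", windows, kernels)     # (30, H-4, W-4)

def relu(x):
    """Rectified linear unit: negative activations become zero."""
    return np.maximum(x, 0.0)

def max_pool_2x2(maps):
    """2x2 max pooling with stride 2; assumes even height and width."""
    f, h, w = maps.shape
    return maps.reshape(f, h // 2, 2, w // 2, 2).max(axis=(2, 4))

feature_maps = conv2d_valid(image, filters)   # 30 maps of 24x24
activated = relu(feature_maps)
pooled = max_pool_2x2(activated)              # 30 maps of 12x12

print(feature_maps.shape, pooled.shape)
```

Each pooling step halves the height and width of the feature maps, which is how the pooling layers reduce the amount of data passed on to the fully connected layers.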
The contribution of this method was the preprocessing of mammogram images with the
contourlet transform and the proposal of a new neural network topology suited to
the task of breast cancer detection. The limitations of this study are its limited dataset
and its use of mammography images for training the model, which are less effective for
identifying breast cancer in dense breasts. Future work involves testing the proposed
algorithm with a larger database to avoid possible overfitting (Pirouzbakht & Mejía, 2017).
2.3.2. Machine Learning Classification Techniques for Breast Cancer Diagnosis
The focus of this research is to introduce an automated approach to diagnosing breast
cancer that follows a logical workflow. The Wisconsin Diagnostic Breast Cancer (WDBC)
dataset, contributed on the 1st of November, 1995, was used in this research. It contains
569 instances, 357 of which are non-cancerous and 212 of which are cancerous. It has 32
attributes: an ID number, a class label (B = benign, M = malignant),
and 30 real-valued attributes. These attributes were derived by digitising images of a
fine needle aspirate taken from a breast tumor and describe the
properties of the cell nuclei in the image.
Data selection and preprocessing comprised data cleaning, data partitioning
into testing and validation data, feature selection by the Recursive Feature Elimination
(RFE) and Correlation-based Feature Selection (CFS) methods, and feature extraction by
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). The
machine learning classification techniques applied to the processed data were Support
Vector Machine (SVM), Naïve Bayes Classifier (NBC), and Artificial Neural Networks
(ANN).
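A workflow of this kind can be sketched with scikit-learn, which bundles a copy of the WDBC data as `load_breast_cancer`. This is a hedged illustration rather than the reviewed study's code: the 70/30 split, the choice of 10 retained features or components, and the use of logistic regression as the ranking estimator inside RFE are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()       # scikit-learn's bundled copy of WDBC
X, y = data.data, data.target     # target: 0 = malignant, 1 = benign

# Assumed 70/30 split, stratified so both parts keep the class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

pipelines = {
    # Feature selection: RFE keeps 10 of the 30 original attributes.
    "rfe+svm": make_pipeline(
        StandardScaler(),
        RFE(LogisticRegression(max_iter=1000), n_features_to_select=10),
        SVC(kernel="rbf")),
    # Feature extraction: PCA builds 10 new components from all attributes.
    "pca+svm": make_pipeline(
        StandardScaler(), PCA(n_components=10), SVC(kernel="rbf")),
}

scores = {}
for name, model in pipelines.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Wrapping each step in a pipeline means scaling, selection, and extraction are fitted on the training part only, so the test part stays unseen until scoring.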
The study illustrated that machine learning techniques such as the Support Vector
Machine (SVM), Artificial Neural Networks (ANN), and Naïve Bayes (NB) can
efficiently improve the detection of non-cancerous and cancerous tumours. It also served
as a base for a comparative study of the aforementioned methods and of how feature
selection and feature extraction can assist in the selection of an appropriate machine
learning algorithm when constructing an adaptive intelligent model.
The limitations of this study are that it makes use of a small dataset, does not have an
interface, and uses numerical data for training the model, which can also be prone to
misdiagnosis. Future work includes developing the proposed approach into a
feasible practical tool for assisting doctors with a fast second opinion on
breast cancer diagnosis, comparing more machine learning algorithms used for breast
cancer diagnosis, analysing the obtained results, and applying the proposed approach to
more diseases (Omondiagbe et al., 2019).
2.3.3. Predicting Breast Cancer Risk Using Personal Health Data and Machine
Learning Models
The study aimed to create machine learning models that can determine five-year
breast cancer risk with a higher accuracy than the standardized Breast Cancer
Risk Assessment Tool (BCRAT), and thereby foster early detection and prevention of
breast cancer. The Prostate, Lung, Colorectal and Ovarian (PLCO) data set was used in
this study. The data was collected as part of a longitudinal, randomized controlled
trial to assess the efficacy of various prostate, lung, colorectal, and ovarian cancer
screenings. Participants completed a survey form outlining their past and existing health
problems from November 1993 to July 2001. The dataset processing was done entirely in
Python.
The study compared six machine learning classifiers: logistic regression, Gaussian naive
Bayes, decision tree, linear discriminant analysis, support vector machine, and a
feed-forward artificial neural network. Each model was tested on how well it estimated the
likelihood of an individual developing breast cancer in the next five years based on the
PLCO survey form data. These machine learning models were chosen because each has
distinct benefits that could make it the best fit for the aim of the study.
The implementation code was written in the Python programming language. The Python
scikit-learn package was used to build the logistic regression, naive Bayes, decision tree,
support vector machine, and linear discriminant analysis models. TensorFlow, a Python
library, was used to build the neural networks. The biases were initialized as constants
and all neural network weights used Xavier initialization. The neural networks had logistic
activation functions after each hidden layer and after the output layer. The loss was
defined as cross-entropy and minimized using an Adam optimizer.
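The network configuration described above (Xavier-initialized weights, constant biases, logistic activations, cross-entropy loss) can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions, not the reviewed study's TensorFlow code: the layer sizes, the bias constant of 0.1, and the synthetic batch are hypothetical, and the Adam update step is left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    """Xavier (Glorot) uniform initialisation: U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy between binary labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1 - y_true) * np.log(1 - y_prob)))

n_features, n_hidden = 7, 16                  # hypothetical layer sizes
W1 = xavier_uniform(n_features, n_hidden)
b1 = np.full(n_hidden, 0.1)                   # constant bias (value assumed)
W2 = xavier_uniform(n_hidden, 1)
b2 = np.full(1, 0.1)

X = rng.normal(size=(32, n_features))         # synthetic batch of 32 samples
y = rng.integers(0, 2, size=32)               # synthetic binary labels

hidden = sigmoid(X @ W1 + b1)                 # logistic activation, hidden layer
y_prob = sigmoid(hidden @ W2 + b2).ravel()    # logistic activation, output layer
loss = binary_cross_entropy(y, y_prob)

print(f"cross-entropy on the untrained network: {loss:.3f}")
```

In a full training loop, an optimizer such as Adam would repeatedly adjust `W1`, `b1`, `W2`, and `b2` to reduce this loss.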
The study's drawbacks include the absence of adequate biopsy or atypical hyperplasia
evidence in the PLCO dataset, as well as the fact that models were trained and tested on
different parts of one data set. The prospects of the study include evaluating how
effectively the machine learning models generalise to unseen data after training on the
total PLCO data set, determining if the selected machine learning models perform better
when trained on a larger data set after feature selection, and comparing the machine
learning models to the BCRAT when all models are trained with the seven BCRAT inputs
(Stark et al., 2019).
A variant of the AlexNet architecture was used in the process. The design
contained three convolutional layers with receptive fields (kernels) of size 5×5,
zero-padding set to 2, and the stride set to 1, with one pooling layer after each
convolutional layer, each set to use a 3×3 receptive field. The study's main contribution
was the increased accuracy of the CNN compared with standard machine learning
models. In order to increase accuracy, future research might look at various CNN
architectures, hyper-parameter optimisation, and techniques for selecting representative
patches (Spanhol et al., 2016).
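The effect of these layer settings on feature-map size follows from the standard output-size formula, floor((n + 2p - k) / s) + 1, for input width n, kernel k, padding p, and stride s. The sketch below traces three convolution and pooling stages; the 32-pixel input and the pooling stride of 2 are assumptions, since the text above does not state them.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output width of a convolution or pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32      # hypothetical square input patch, 32x32 pixels
trace = []
for stage in range(1, 4):
    # 5x5 kernel, zero-padding 2, stride 1: the spatial size is preserved.
    after_conv = conv_output_size(size, kernel=5, padding=2, stride=1)
    # 3x3 pooling; the stride of 2 is an assumption for this sketch.
    after_pool = conv_output_size(after_conv, kernel=3, padding=0, stride=2)
    trace.append((after_conv, after_pool))
    print(f"stage {stage}: conv -> {after_conv}x{after_conv}, "
          f"pool -> {after_pool}x{after_pool}")
    size = after_pool
```

With a 5×5 kernel, a padding of 2 and a stride of 1 leave the width unchanged, so in this sketch only the pooling layers shrink the feature maps.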
The data was pre-processed and transformed to remove missing values and other
anomalies and converted into an suitable format for further processing that
was comprehensible and consistent with the data mining methods used on the dataset.
SMOTE was used as the first class balancing method to oversample the minority class
label, resulting in the creation of a separate dataset. The SpreadSubsample method was
used to apply undersampling as the second method of class balancing on the original
training dataset, and a new training dataset was created. Following that, the imbalanced
dataset was resampled using a combination of oversampling and undersampling
techniques, and a third training dataset was created using this hybrid method. Data mining
techniques such as the Bayesian Network, Random Forest, and Decision Tree (C4.5) were
applied to the dataset. The k-fold method (10 folds) was used to validate the models. The
class-imbalanced training data and the three class-balanced training sets were all
cross-validated.
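The hybrid class balancing idea described above can be sketched as follows. This is a minimal illustration and not the procedure of the reviewed study: full SMOTE interpolates between a minority sample and one of its k nearest minority neighbours, whereas this simplified version interpolates between two randomly chosen minority samples. The data, class sizes and function names are made up for the example.

```python
import random
from collections import Counter


def smote_like_oversample(minority, n_new, rng):
    """Create n_new synthetic points by interpolating between pairs of minority points.

    Simplification: real SMOTE picks the partner among the k nearest neighbours.
    """
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        gap = rng.random()
        synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return synthetic


def random_undersample(majority, n_keep, rng):
    """Keep a random subset of the majority class."""
    return rng.sample(majority, n_keep)


rng = random.Random(0)
# Made-up, imbalanced two-feature data: 90 majority and 10 minority samples.
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]

# Hybrid approach: oversample the minority class and undersample the majority class.
minority_balanced = minority + smote_like_oversample(minority, 20, rng)
majority_balanced = random_undersample(majority, 30, rng)

labels = [0] * len(majority_balanced) + [1] * len(minority_balanced)
print(Counter(labels))
```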
The findings revealed that the Bayesian Network generated using the hybrid approach
from class-balanced BCSC data had the best overall performance in terms of accuracy
(99.1%), ROC (0.937), sensitivity (78.1%), and false positive rate (0%), that is, a
precision of 100%.
The contributions of this study include introducing class balancing methods on the BCSC
dataset and introducing Bayesian network, which is rarely explored in previous studies as
a method to achieve high accuracy.
Further work for the study involves employing feature selection on the BCSC dataset and
separating variables with the same properties, developing a predictive model based on
feature selection and similar variables that can generate a generic model with fewer risk
factors to be used for prediction, and applying the technique to other data with
features including form, position, tumor size, or radiation intensity (Rajendran et al.,
2020).
Classification models such as Support Vector Machine (SVM), Decision Tree (DT),
Logistic Regression (LR), Random Forest Classifier (RF) and Naïve Bayes (NB) were applied.
Stratified K-folds was applied from 0 to 20 with a step size of 5. Dimensionality reduction
was performed by applying Principal Component Analysis (PCA).
After applying Stratified K-Fold, it was discovered that the Random Forest classification
model had the highest accuracy, at 96.05 percent. Dimensionality reduction was
effective mostly for the Support Vector Machine, where the accuracy increased sharply
from 62.57 percent to 95.5 percent. However, owing to the randomization of the
train-test split, this resulted in a very small decrease in the accuracy of Random
Forest, to 95 percent.
The limitation of this study is that the application may fail to classify correctly because
the class label includes a large number of groups. The prospects of the study include
using a larger dataset, dealing with outliers and incomplete values, tuning algorithms,
grouping several models, and applying deep learning (Sreenivasa B C, 2020).
Three supervised machine learning algorithms were selected for this project: Logistic
regression, J48 Decision Tree and Random forest. Logistic regression (LR) is an
algorithm that resolves problems relating to binary classification by matching the data to a
logistic function to estimate the risk of an occurrence (in this case, developing invasive
cancer). The J48 Decision Tree (DT) is a rank-based model made up of decision rules
that repeatedly separates independent variables into similar sections depending on the
input variables' most important splitter. Random forest (RF) is an algorithm that
generates, constructs, and then joins a series of classification trees to predict the
probability of breast cancer. The Logistic regression and Random forest models
were selected because they do not overfit easily, while the Decision Tree model can
accommodate non-linear correlations between variables. The models were created using
Weka 3.8, an open source data mining platform. The calibration and discrimination
accuracy of each model were calculated to assess their efficiency.
There are two drawbacks to this research. To begin, the data for this analysis came from a
single research center. As a result, unless the six prediction models established in this
analysis are further validated with larger health data sets, they cannot be generalized.
Second, the prediction models were created using data from primarily white women
(91.1%), posing a generalizability problem once more. Risk for women of other races, such
as African Americans, might not be correctly estimated by the models. Prospects of the
study include further evaluation of the models on data sets with a varied racial/ethnic
mix (Choi et al., 2020).
CHAPTER THREE
3.1. INTRODUCTION
This chapter gives an explanation of the methodologies and designs of the proposed
implementation of the breast cancer diagnosis system in depth. The analysis begins with a
comprehensive written description of the system's requirements, which is then used to
develop precise graphical models of the breast cancer diagnosis software application.
This system aimed to diagnose breast cancer from mammography images using
convolutional neural networks. However, a small dataset was used, mammography
images are unreliable for detecting cancerous tissue in dense breasts, and the
system lacked a user interface (Pirouzbakht & Mejía, 2017).
There are two types of requirements: functional and non-functional. In most cases, a
functional requirement describes a behaviour from the perspective of the end-user. It
defines a user behaviour or activity that is not constrained by any system limitations and
is conducted by the system user. Non-functional requirements define the outcome of user
actions and are limited by the system's platform, environment, design restrictions, or
dependencies (Demirel & Das, 2018).
The system should only authenticate users with a valid username and password.
The system diagnosis and risk assessment should have a short response time.
The system should update the database as required with no null fields.
System architecture activities aim to produce a holistic result based on interrelated and
uniform concepts, notions, and characteristics. The solution architecture provides
aspects that, to the maximum extent possible, address the issue or opportunity outlined by
a set of system requirements and product lifecycle ideas and can be implemented using
technologies, one of them being machine learning (Barrier, 2003).
Figure 3.1: Convolutional Neural Network Architecture (Phung & Rhee, 2019)
The CNN is made of three types of layers: convolutional layers, pooling layers,
and fully-connected (FC) layers. The connection of these layers produces a CNN
architecture. Aside from these layers, there are two other important components: the
dropout layer and the activation function. These are explained below (Gurucharan, 2020):
Convolutional Layer: This is the very first layer in a CNN architecture. It is
responsible for deriving distinct properties from the images passed in as an input.
This is the layer where the convolutional mathematical operation is performed by
taking the dot product between the filter and the input image sections, with respect
to the filter size, by sliding the filter across the input image (MxM). This process
produces a feature map, which includes all necessary details about the picture such
as its corners and edges. This feature map is passed to additional layers in order
for them to learn all the various distinct features from the input image.
Pooling Layer: The pooling layer usually comes after the convolutional layer. It
aims to reduce computing cost by minimizing the convolved feature map size. To fulfil
its aim, the pooling layer reduces the connections between layers and works
independently on each feature map. Pooling procedures can be classified based on
the operation applied. These are: max pooling, where the largest
element is selected from each section of the feature map; average pooling, where the
mean value of the elements in a section of predefined size is calculated; and sum
pooling, in which the sum of all the elements in the predefined section is calculated.
The pooling layer serves as a link between the convolutional and fully connected
layers.
Fully Connected (FC) Layer: This layer consists of the weights, biases and
neurons and acts as the connection between the neurons of two layers. FC layers
are usually placed before the output layer and form the last few layers of a CNN
architecture: the output of the preceding layers is flattened and passed to the FC
layer. Following this step, the flattened vector passes through a few extra FC layers,
where the mathematical operations are typically carried out and the classification
procedure begins.
Dropout: This layer is used to prevent the machine learning model from only
performing well on its training data and failing to generalize to unseen data, in
other words overfitting. It accomplishes this by creating a smaller model
through the process of dropping a few neurons from the neural network during
training. For example, when a dropout of 0.3 is passed, 30% of the neural network
nodes are dropped at random.
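The layer operations described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this project: the 5×5 image, the 2×2 vertical-edge filter and the function names are made up, and the dropout function uses the common "inverted dropout" convention of rescaling the values that are kept.

```python
import random


def conv2d(image, kernel):
    """Slide a square kernel over the image (stride 1, no padding) and take dot products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest element of each size x size section."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def dropout(values, rate, rng):
    """Zero each value with probability `rate` and rescale the values that are kept."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


# Made-up image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
vertical_edge = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, vertical_edge)   # responds only where the edge is
pooled = max_pool(feature_map, size=2)       # halves each spatial dimension
flattened = [v for row in pooled for v in row]
dropped = dropout(flattened, rate=0.3, rng=random.Random(0))

print(feature_map)
print(pooled)
```

The feature map is non-zero only at the column where the pixel values change, which is the sense in which a convolutional filter "detects" an edge; pooling then keeps that response while shrinking the map.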
There are several CNN pretrained models that are used to solve image classification
problems rather than building a model from scratch. For the purpose of training the
MELIGNANT breast cancer diagnosis model, the pre-trained model MobileNet is used.
MobileNet is a convolutional neural network for mobile vision applications that is simple,
efficient, and computationally light. Object identification, fine-grained classification,
facial attribute recognition, and localisation are just a few of the real-world
applications that employ MobileNet. The features of MobileNet include:
Depth-wise separable convolution: This is made up of the depth-wise layer and
the point-wise layer. In essence, the first layer filters the input channels, while the
second layer combines them to generate new features.
MobileNet Parameters: Despite the fact that the fundamental MobileNet design is
tiny and computationally light, it contains two global hyperparameters that
further reduce the computing cost: the width multiplier and
the resolution multiplier.
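To make the saving from depth-wise separable convolution concrete, the sketch below compares the number of multiply-add operations of a standard convolution with that of a depth-wise plus point-wise pair, and shows the effect of the width and resolution multipliers. The layer dimensions (3×3 kernel, 64 input channels, 128 output channels, 56×56 feature map) and the multiplier values are assumptions chosen for illustration, not values taken from this project.

```python
def standard_conv_cost(k, m, n, f):
    """Multiply-adds of a standard k x k convolution: m input channels, n output channels, f x f map."""
    return k * k * m * n * f * f


def separable_conv_cost(k, m, n, f, alpha=1.0, rho=1.0):
    """Multiply-adds of a depth-wise separable convolution.

    alpha is the width multiplier (scales channel counts) and
    rho is the resolution multiplier (scales the feature map size).
    """
    m_a, n_a, f_r = int(alpha * m), int(alpha * n), int(rho * f)
    depthwise = k * k * m_a * f_r * f_r   # one k x k filter per input channel
    pointwise = m_a * n_a * f_r * f_r     # 1 x 1 convolution combining the channels
    return depthwise + pointwise


# Assumed layer dimensions, for illustration only.
std = standard_conv_cost(3, 64, 128, 56)
sep = separable_conv_cost(3, 64, 128, 56)
slim = separable_conv_cost(3, 64, 128, 56, alpha=0.5, rho=0.5)

print(f"separable / standard = {sep / std:.3f}")
print(f"with alpha=0.5, rho=0.5: {slim / std:.4f}")
```

For these assumed dimensions the separable form costs roughly 1/n + 1/k² of the standard convolution, and the two multipliers shrink the cost further at the price of model capacity and input detail.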
Data Processing: The acquired data from the data acquisition layer is
subsequently transferred to the data processing layer for further refining and
integration. This involves data normalization, cleaning, transformation, and
encoding. Data processing is affected by the type of learning that is employed.
In supervised learning, the input data is separated into several portions of
sample data necessary for training the system, and the resulting data is called
training data.
Data Modelling: This layer entails the process of selecting algorithms
that can train the system to solve the machine learning problem. These
algorithms are either developed or taken from a collection of libraries.
The algorithm used for training the MELIGNANT breast cancer risk
assessment model is a Logistic Regression Classifier. Logistic Regression is a
predictive regression analysis used for binary classification. It describes the data
and the association between one output variable and one or more input variables.
The types of logistic regression include: binary logistic regression, which is used
when the output variable has two categories; multinomial logistic regression,
which is used when the output variable has more than two unordered categories; and
ordinal logistic regression, which is used when the output variable categories are
ranked.
Execution: Experimentation, testing, and tweaking are all done at this phase in
machine learning. Improving the algorithm is the overall objective of this stage, so
as to derive the required computation result and improve system efficiency. The
output of this step is a fine-tuned result capable of supplying required data for the
decision making process of the machine.
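The binary logistic regression classifier named in the data modelling layer above can be sketched as follows. This is a minimal illustration assuming a single standardised, made-up feature and plain batch gradient descent; the risk assessment model in this project was trained with a library implementation, and the function names and data here are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, weights, bias) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Made-up data: one standardised feature, larger values labelled as the positive class.
X = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train(X, y)
predictions = [1 if predict_proba(x, weights, bias) >= 0.5 else 0 for x in X]
print(predictions)
```

The output of `predict_proba` is a probability, which is why logistic regression suits risk assessment: the 0.5 threshold used here is only one possible decision rule.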
Figure 3.3: Supervised Learning Architecture (Pedamkar, 2021)
Django, written in the Python programming language, is a popular and widely used
open-source web framework. Django follows a variant of the Model-View-Controller (MVC)
architecture (its documentation uses the terms Model-Template-View), and can be
categorized as:
The Model, which is represented by a database (e.g. MySQL or PostgreSQL), is the
logical data structure that supports the program.
The View, or user interface, is the part users see when they visit a website in the
browser. HTML/CSS/JavaScript files are used to represent it.
The Controller, which acts as the connection between the view and the model, is
responsible for delivering data from the latter to the former.
Hence the software will concentrate on the model using MVC, by displaying or modifying
it.
Figure 3.4: Class Diagram of the Breast Cancer Diagnosis System
3.5.2. Sequence Diagram
Also known as event diagrams or event scenarios, this is a behavioral diagram that
describes the order and the manner in which objects in an application interact. This diagram provides an
understanding of the specified requirements of a new system or an existing activity to
software and business professionals. The sequence diagram of MELIGNANT Breast
Cancer Diagnosis System is shown below:
Figure 3.5: Sequence Diagram of the Breast Cancer Diagnosis System
3.5.3. Use Case Diagram
This is the fundamental description of system or software requirements for a software
application that is yet to undergo development. It only defines the appropriate system
behavior and not the methodologies, and can be displayed in either a textual or graphical
manner. It helps to explain a system from the users’ viewpoint and it describes system
behavior to users by detailing every noticeable system process. The use case diagram of
the MELIGNANT Breast Cancer Diagnosis System is shown below.
Figure 3.6: Use Case Diagram of the Breast Cancer Diagnosis System
3.6. DESCRIPTION OF TABLES
The breast cancer diagnosis system database design consists of some tables that are
described in the section below.
3.6.1. User Creation Table
This is the table that stores the information of each pathologist that uses the application.
Table 3.1: User Creation Table
3.6.3. Patient Risk Assessment Data Table
This is the table that stores the information of each patient that has taken a
risk assessment test via the application.
Table 3.3: Table of the Patient Risk Assessment Data
CHAPTER FOUR
SYSTEM IMPLEMENTATION
4.1. INTRODUCTION
This chapter discusses the hardware and software requirements, user interface modules,
data analysis and processing outcomes, programming languages, frameworks, and
libraries used to accomplish the project's goals. The breast cancer diagnosis system's
deployment is described and demonstrated. The system was created with the intent
to maximise user experience, understanding, simplicity of use, and predictive accuracy.
Requirement          Hardware
Processor            Intel Pentium II 2.5GHz or higher
Primary Memory       4GB or higher
Architecture         64Bit (X64)
Microscope           Digital or Stereo Microscope
USB                  USB Microscope
Secondary Storage    32GB HDD or higher
4.2.2. Software Requirements
Software requirements define the software resource needs and necessities that must be
preinstalled in order for a program to work properly. The minimum software requirements
for running the MELIGNANT Breast Cancer Diagnosis System include:
Table 4.2: Software Requirements
Requirement                   Software
Operating System              I. MAC: OS X v10.7 or higher
                              II. Windows: 7 or newer
                              III. Linux: Ubuntu
                              IV. iOS: v5 or higher
                              V. Android: v2.1 or higher
Programming Language          Python, HTML, CSS, Bootstrap
Development Tool              Visual Studio Code, Jupyter Notebook
Database Management System    MySQL
Web Framework                 Django
Supported Browsers            I. Chrome 4.0 and above
                              II. Safari 5.0 and above
Numpy: For scientific computing, NumPy is the most significant Python
package. It provides a multidimensional array object, derivative objects,
and a number of array-related functions.
Tensorflow: This is an open-source machine learning platform. It allows
researchers and developers to construct and launch machine learning
applications with a vast and versatile range of packages, libraries and
community resources.
Matplotlib: This is a charting library which embeds charts in applications
using an object-oriented API.
Scikit-Learn: This is a popular machine learning package. Scikit-learn
consists of machine learning tools that include mathematical, analytical,
and general-purpose algorithms that serve as the foundation for a variety
of machine learning technologies.
Pandas: This is a quick, powerful, versatile, and simple to use
open-source tool used for analyzing and manipulating data. It has functions
and data structures for handling numerical tables and time series.
Seaborn: This is a module for developing statistical visualizations built
on matplotlib. It interacts with pandas data structures and assists people
with data exploration and comprehension. The seaborn charting functions
perform the required semantic mapping and statistical aggregation to produce
useful charts and operate on dataframes and arrays that contain whole
datasets.
Visual Studio Code: Also known as VS Code, this is a free source-code editor.
VS Code is one of the most popular code editors as it has
various prominent features, despite being lightweight. It works with a variety of
programming languages and it allows users to add and develop new extensions, e.g.
code linters, debuggers, and support for cloud and web development.
HTML and CSS: These are the most widely used web development
technologies. HTML is used for page organization while CSS improves the
visuals of the page layout. HTML, CSS, images and programming, are the
fundamentals for creating web applications.
The first step in building the diagnosis model was importing the necessary libraries. These
include tensorflow library with tools such as keras for training convolutional neural
network models and matplotlib for visualizing the model’s metrics.
The next step was to load the dataset and preprocess the images. This was done using
the Keras ImageDataGenerator class for image data augmentation. Image data augmentation
is the process of artificially enlarging the training data by generating
altered versions of the images in the dataset. The ImageDataGenerator class in Keras
supports a variety of augmentation strategies and is a fast and simple way to augment
images. The images were preprocessed using the preprocessing method of the pretrained
MobileNet CNN, resized, and grouped into batches of ten (10).
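The preprocessing steps described above can be illustrated without Keras as follows. This is a simplified sketch and not the project's code: it assumes the usual MobileNet convention of scaling 8-bit pixel values to the range [-1, 1], uses a horizontal flip as one example of an augmentation, and works on a tiny made-up greyscale image. All function names are illustrative.

```python
def mobilenet_scale(pixel: int) -> float:
    """Map an 8-bit pixel value from [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0


def horizontal_flip(image):
    """One simple augmentation: mirror each row of a 2D image."""
    return [row[::-1] for row in image]


def batches(items, batch_size=10):
    """Yield consecutive batches; the last batch may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Tiny made-up 2 x 3 greyscale image.
image = [[0, 128, 255], [255, 128, 0]]

scaled = [[mobilenet_scale(p) for p in row] for row in image]
flipped = horizontal_flip(image)

# 25 hypothetical image indices grouped into batches of ten, as in the text.
batch_sizes = [len(b) for b in batches(list(range(25)), batch_size=10)]

print(scaled)
print(flipped)
print(batch_sizes)
```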
To visualize the preprocessed images, matplotlib was used to plot a single batch of the
images.
For model architecture definition, the CNN pretrained model, MobileNet was imported
and modified to suit the purpose of classifying breast cancer histopathological images
accurately, and the summary() method was called to view the model architecture.
To configure the learning process, the compile() method was called with an Adam
optimizer, a categorical cross-entropy loss function, and accuracy as the model
evaluation metric. The ReduceLROnPlateau callback was used to reduce the learning
rate of the model when the monitored metric stopped improving, in order to help the
model converge, and the weights with the highest validation accuracy were saved to an
HDF5 file for future evaluation. The model was trained for ten (10) epochs and gave a
training accuracy of 98.7% and a validation accuracy of 94.2%.
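The idea behind reducing the learning rate on a plateau can be sketched as follows. This is a simplified stand-in for the Keras callback, not its actual implementation: it omits options such as cooldown and minimum improvement, and the class name, starting learning rate, factor, patience and validation losses are all assumed values for illustration.

```python
class PlateauScheduler:
    """Reduce the learning rate when the validation loss stops improving."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-6):
        self.lr = lr
        self.factor = factor        # multiplier applied to the learning rate
        self.patience = patience    # epochs without improvement before reducing
        self.min_lr = min_lr        # lower bound on the learning rate
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the learning rate to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


# Made-up validation losses with two plateaus.
val_losses = [0.90, 0.70, 0.71, 0.72, 0.65, 0.66, 0.67, 0.68]
scheduler = PlateauScheduler(lr=0.001, factor=0.5, patience=2)
history = [scheduler.step(loss) for loss in val_losses]
print(history)
```

In this made-up run the learning rate is halved after each pair of epochs without improvement, which lets the optimizer take smaller steps once progress has stalled.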
Figure 4.6: Training the diagnosis model
The learning curve of the model training accuracy against validation accuracy and training
loss against validation loss was plotted.
Figure 4.7: Accuracy and loss graphs against training and validation data
The model was evaluated again with the test dataset and a confusion matrix was plotted.
On the test set, the model had an accuracy of 93.8%.
Figure 4.8: Confusion matrix of the Breast Cancer Diagnosis model on the test data
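For readers unfamiliar with the confusion matrix, the sketch below shows how its four counts and the metrics derived from them are computed for a binary problem. The labels are made up (1 standing for malignant and 0 for benign is an assumption for the example) and do not reproduce the results reported for the diagnosis model.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def summarise(counts):
    """Derive accuracy, precision and recall from the four counts."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
scores = summarise(counts)
print(counts)
print(scores)
```

Accuracy alone can hide poor performance on the rarer class, which is why precision and recall are worth reading alongside it.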
Next the model was saved to be used in the web application using the keras model.save()
method. Keras saves the model and all trackable objects such as the layers and variables
linked to it using the save() method. The optimizer, weights, and model settings are also
preserved. Additionally the settings and information for each Keras layer linked to the
model are preserved.
Figure 4.10: Embedding the diagnosis model
The Pandas describe() method was called in order to know the fundamental statistics of
the dataset. The derived statistics include count, mean, standard deviation, minimum
value, maximum value, 25th percentile, 50th percentile and 75th percentile.
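The statistics listed above can be reproduced with the Python standard library, which may help clarify what each one means. The sketch below is an illustration and not the project's code: the `describe` function is a hypothetical stand-in for the pandas method, the ages are made up, the standard deviation is the sample standard deviation, and the percentiles use linear interpolation between data points.

```python
import statistics


def describe(values):
    """Summary statistics similar to those reported for a numeric column."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),   # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# Made-up patient ages, for illustration only.
ages = [35, 42, 47, 51, 58, 63, 70]
summary = describe(ages)
for name, value in summary.items():
    print(f"{name:>5}: {value}")
```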
The next step was to visualize the data using matplotlib and seaborn. This is done to
visualize the relationship between features of the datasets and draw conclusions. First we
visualize the distribution of the columns of the dataset and plot a histogram which is seen
below.
Figure 4.13: Distribution of the dataset columns
Next we visualize the distribution of the target feature (cancer) in the dataset and plot
a histogram. From this visualization we can see that the dataset is highly imbalanced,
with a vast majority of the patients not having breast cancer.
Figure 4.15: Distribution of each age group in the dataset
Next, the training and count features were dropped because they were not risk factors and
therefore insignificant to the study, and a correlation matrix was plotted to discover which
features were highly correlated with the target feature. After plotting, it was discovered
that a history of breast cancer had a high correlation with breast cancer diagnosis.
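Each cell of a correlation matrix is a Pearson correlation coefficient between two columns. The sketch below computes the coefficients between each feature and the target for a tiny made-up dataset; the column names and values are hypothetical and are not taken from the dataset used in this project.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_with_target(columns, target):
    """Correlation of every other column with the target column."""
    return {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }


# Made-up columns, for illustration only.
columns = {
    "prior_history": [0, 0, 0, 1, 1, 0, 1, 0],
    "age_group":     [1, 2, 3, 4, 5, 6, 7, 8],
    "cancer":        [0, 0, 0, 1, 1, 0, 0, 0],
}

correlations = correlation_with_target(columns, target="cancer")
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
print(correlations)
print(ranked)
```

A high coefficient indicates a linear association only; it does not by itself show that a feature causes the outcome.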
Next, the data was split into training and testing sets using 75% for training the model and
25% for testing the trained model on unseen data. The split was stratified on the target
class to maintain proportionality and improve accuracy.
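A stratified 75/25 split can be sketched in plain Python as follows. This is an illustrative version only, with a made-up label list and a hypothetical function name; it shows how splitting each class separately keeps the class proportions the same in the training and testing sets.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up imbalanced labels: 92 negative and 8 positive cases.
labels = [0] * 92 + [1] * 8
train_idx, test_idx = stratified_split(labels)

print(Counter(labels[i] for i in train_idx))
print(Counter(labels[i] for i in test_idx))
```

Without stratification, a random 25% sample of such an imbalanced dataset could contain very few positive cases, or none, which would make the test results unreliable.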
Figure 4.20: Pickling the Risk Assessment Model
4.4. PROGRAM MODULES AND INTERFACES
4.4.1. Landing Page Module
The landing page is the first page users interact with when using the application. It welcomes
users to the application, provides information about the application and gives helpful links
to navigate to the login, signup or about us page.
4.4.2. Signup Page Module
Users are required to register on the application by creating an account before performing
diagnosis. This page consists of the user’s first name, last name, Medical ID, email
address, password and confirmation of password. The user is notified if a field does not
match the intended format, if a required field is empty, or if the password is too common
or does not match the password confirmation field. The user is registered on the
application and redirected to the login page if all these requirements are met.
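The validation rules described above can be sketched as a plain Python function. This is a hypothetical illustration and not the application's code, which relies on the web framework's own form handling: the field names, error messages, email pattern and the three-entry list of common passwords are all assumptions made for the example.

```python
import re

# Tiny illustrative list; a real system would use a much larger one.
COMMON_PASSWORDS = {"password", "12345678", "qwerty123"}

REQUIRED_FIELDS = (
    "first_name", "last_name", "medical_id",
    "email", "password", "password_confirm",
)


def validate_signup(form):
    """Return a dict mapping field names to error messages (empty if the form is valid)."""
    errors = {}
    for field in REQUIRED_FIELDS:
        if not (form.get(field) or "").strip():
            errors[field] = "This field is required."

    email = form.get("email") or ""
    if email and not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        errors["email"] = "Enter a valid email address."

    password = form.get("password") or ""
    if password and password.lower() in COMMON_PASSWORDS:
        errors["password"] = "This password is too common."
    if password and password != (form.get("password_confirm") or ""):
        errors["password_confirm"] = "The two password fields do not match."
    return errors


valid_form = {
    "first_name": "Ada", "last_name": "Obi", "medical_id": "MD-001",
    "email": "ada@example.com",
    "password": "c0rrect-horse-staple", "password_confirm": "c0rrect-horse-staple",
}
bad_form = dict(valid_form, email="not-an-email",
                password="password", password_confirm="different")

print(validate_signup(valid_form))
print(validate_signup(bad_form))
```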
Figure 4.26: Login Page of the Breast Cancer Diagnosis System
Figure 4.29: View Diagnosis Results Page
4.4.6. Risk Assessment Page
This is the second essential page of the application containing all details required to
predict breast cancer risk. All the fields are required and assigned error messages in case
of issues. The user is required to select or input Patient ID. The user then selects the
patient’s menopausal status, current age group, BI-RADS breast density result, body mass
index, age at first birth, history of relatives with breast cancer, previous breast procedure,
result of last mammogram, natural or surgical menopausal status, use of hormone
replacement therapy and history of breast cancer. Based on the selections, the user can
then view patient’s risk assessment results. The user can also view a history of all past
risk assessment tests and their corresponding results.
Figure 4.32: Risk Assessment Form
Figure 4.34: Risk Assessment History
CHAPTER FIVE
5.1. INTRODUCTION
This chapter gives an overview of the project and future work that can be
considered to close the gaps identified in the project.
5.2. SUMMARY
This project was able to proffer machine learning techniques as a means of providing a
second opinion to pathologists and eventually substituting for pathologists in
environments where none exist. In the course of the study, it was discovered that the MobileNet
CNN architecture was able to classify malignant and benign tissues with an accuracy of
94.2% and Logistic Regression classifier was able to predict breast cancer risk with an
accuracy of 99%. It was also discovered that a history of breast cancer, breast density,
body mass index, age of women at first birth and the number of relatives with breast
cancer are the most important factors for developing breast cancer. With the high rate at
which breast cancer is spreading it is important to consider those factors and treat women
that fall into those categories as high priority. These models were then embedded into a
Django interface in order to enable pathologists to input a patient’s histopathological
image from a microscope and risk factor data from their case files to ensure proper
diagnosis and prevention steps are taken.
5.3. RECOMMENDATION
Although this project was able to achieve a high accuracy in diagnosing and predicting
breast cancer, there are some factors that need to be improved due to the limitations and
challenges faced while developing the system. Future work on this project includes:
Using images annotated by a pathologist for retraining the diagnosis model so that
the system can better discern between malignant and benign tissue cells.
Retraining the risk assessment model with more cancer-positive
samples in order to improve the precision and recall of the risk assessment model
on the cancer-positive class.
Using the accumulated histopathological image dataset from the system for
analysis on indigenous data in order to detect outliers or anomalies and retraining
the model on that dataset.
5.4. CONCLUSION
Early diagnosis of breast cancer is very important as it improves patient survivability and
prognosis. Regular risk assessment tests are necessary to support the prevention of breast
cancer. This project has been able to develop a user-friendly system to
effectively diagnose and predict breast cancer. It has also created a database that can be
used to store subsequent indigenous data for further research. Proper diagnosis of breast
cancer in Nigeria is a very important issue as the disease is widespread and the country
lacks the necessary facilities and labor for handling it. As a result, this system will prove
to be valuable in the health sector.
REFERENCES
Agency Report. (2019) It will take 25 years to reduce doctors’ shortage in Nigeria –
NMA President. Premium Times. Retrieved from
https://www.premiumtimesng.com/health/health-news/367620-it-will-take-25-years-
to-reduce-doctors-shortage-in-nigeria-nma-president.html
Agodirin, O., Olatoke, S., Rahman, G., Olaogun, J., Kolawole, O., Agboola, J., …
Fatudimu, O. (2019). Impact of primary care delay on progression of breast cancer in
a black african population: A multicentered survey. Journal of Cancer Epidemiology,
2019, 1–10. https://doi.org/10.1155/2019/2407138
Ake, A. (2018). Battling Cancer Misdiagnosis. This Day. Retrieved from
https://www.thisdaylive.com/index.php/2018/08/30/battling-cancer-misdiagnosis/
Akinnuwesi, B. A., Macaulay, B. O., & Aribisala, B. S. (2020). Breast cancer risk
assessment and early diagnosis using Principal Component Analysis and support
vector machine techniques. Informatics in Medicine Unlocked, 21, 100459.
https://doi.org/10.1016/j.imu.2020.100459
Al-Quraishi, T., Abawajy, J., Chowdhury, M. U., Rajasegarar, S., & Abdalrada, A. S.
(2017). Breast cancer risk assessment prediction using an ensemble classifier. 30th
International Conference on Computer Applications in Industry and Engineering,
CAINE 2017, February 2018, 177–183.
American Cancer Society. (2020). Breast cancer risk and prevention. Cancer.Org, 1–45.
Anand, P., Kunnumakara, A. B., Sundaram, C., Harikumar, K. B., Tharakan, S. T., Lai, O.
S., Sung, B., & Aggarwal, B. B. (2008). Cancer is a preventable disease that requires
major lifestyle changes. Pharmaceutical Research, 25(9), 2097–2116.
https://doi.org/10.1007/s11095-008-9661-9
Anthonia Obokoh. (2019). How lack of data, research hinders Nigeria healthcare system.
Business Day. Retrieved from https://businessday.ng/health/article/how-lack-of-data-
research-hinders-nigeria-healthcare-system/
Barrier, T. (2003). Encyclopedia of information systems (1st ed.). Michigan: Academic
Press
Burrows, W. & Scarpelli, D. G. (2020). Disease. Encyclopedia Britannica. Retrieved
from https://www.britannica.com/science/disease
CTCA. (2019). What’s the difference? Male breast cancer and female breast cancer.
Retrieved from https://www.cancercenter.com/community/blog/2019/07/whats-the-
difference-female-male-breast-cancer
Chioma, O. (2020). How misdiagnosis kills 70 percent of Nigeria’s cancer patients –
Expert. Vanguard Nigeria. Retrieved from
https://www.vanguardngr.com/2019/03/how-misdiagnosis-kills-70-percent-of-
nigerias-cancer-patients-expert/
Choi, J., Jung, H.-T., & Choi, W. J. (2020). Development of a breast cancer risk
Assessment model using a machine learning approach. 24(2), Retrieved from
https://www.mendeley.com/catalogue/bc719895-bf9b-34ef-a82c-d42288ffd7e8/?
utm_source=desktop&utm_medium=1.19.8&utm_campaign=open_catalog&userDo
cumentId=%7Bc081cf63-d152-3b6c-aff5-dd58b20cb0a7%7D
College of Nigerian Pathologists. (2020). Only 500 pathologists in Nigeria’s healthcare
sector – CNP. Retrieved from https://theeagleonline.com.ng/only-500-pathologists-
in-nigerias-healthcare-sector-cnp/
Demirel, S., & Das, R. (2018). Software requirement analysis: research challenges and
technical approaches. 6th International Symposium on Digital Forensic and Security
(ISDFS). https://doi.org/10.1109/ISDFS.2018.8355322
Emilia, J.-A. (2017). Breast cancer in sub-Saharan Africa : determinants of stage at
diagnosis and diagnostic delays in women with symptomatic breast cancer. London
School of Hygiene and Tropical Medicine, 10(17).
Felsenstein, D. (2003). Encyclopedia of information systems (1st ed.). Michigan:
Academic Press
Ferlay, J., Colombet, M., Soerjomataram, I., Mathers, C., Parkin, D. M., Piñeros, M.,
Znaor, A., & Bray, F. (2019). Estimating the global cancer incidence and mortality in
2018: GLOBOCAN sources and methods. International Journal of Cancer (Vol.
144, Issue 8, pp. 1941–1953). Wiley-Liss Inc. https://doi.org/10.1002/ijc.31937
Gogolla, M. (2009). Unified modeling language. Encyclopedia of Database Systems,
3232–3239. https://doi.org/10.1007/978-0-387-39940-9_440
Gurucharan, M. (2020). Basic CNN Architecture: Explaining 5 Layers of Convolutional
Neural Network. Retrieved from
https://www.upgrad.com/blog/basic-cnn-architecture/
MacIntyre, J. (2014). Predictive Analytics Using Machine Learning. AI Journal.
Krishna, M., Neelima, M., Harshali, M., & Rao, M. V. G. (2018). Image classification
using Deep learning. International Journal of Engineering and Technology(UAE),
7(March), 614–617. https://doi.org/10.14419/ijet.v7i2.7.10892
Mayo Clinic. (2021). Breast Cancer - Symptoms and Causes. Retrieved from
https://www.mayoclinic.org/diseases-conditions/breast-cancer/symptoms-causes/syc-
20352470
Ming, C., Viassolo, V., Probst-Hensch, N., Chappuis, P. O., Dinov, I. D., & Katapodi, M.
C. (2019). Machine learning techniques for personalized breast cancer risk
prediction: comparison with the BCRAT and BOADICEA models. Breast Cancer
Research, 21(1), 75. https://doi.org/10.1186/s13058-019-1158-4
Nahid, A.-A., & Kong, Y. (2017). Involvement of machine learning for breast cancer
image classification: a survey. Computational and Mathematical Methods in
Medicine https://doi.org/10.1155/2017/3781951
Niu, Q., Teng, Y., & Chen, L. (2019). Design of gesture recognition system based on
Deep Learning. Journal of Physics: Conference Series, 1168(3).
https://doi.org/10.1088/1742-6596/1168/3/032082
Okikiola, F. M., Aigbokhan, E. ., Mustapha, A. M., Onadokun, I. O., & Akinade, O. A.
(2016). Design and implementation of a fuzzy expert system for diagnosing breast
cancer.
Olatunji, T., Sowunmi, A., Ketiku, K., & Campbell, O. (2019). Sociodemographic
correlates and management of breast cancer in Radiotherapy Department, Lagos
University Teaching Hospital: A 10-year review. Journal of Clinical Sciences, 16(4),
111. https://doi.org/10.4103/jcls.jcls_82_18
Omondiagbe, D. A., Veeramani, S., & Sidhu, A. S. (2019). Machine learning
classification techniques for breast cancer diagnosis. IOP Conf. Series: Materials
Science and Engineering. https://doi.org/10.1088/1757-899X/495/1/012033
Pedamkar, P. (2021). Machine learning architecture: process and types of machine
learning. EDUCBA. Retrieved from https://www.educba.com/machine-learning-architecture/
Phung, V. H., & Rhee, E. J. (2019). A high-accuracy model average ensemble of
convolutional neural networks for classification of cloud image patches on small
datasets. Applied Sciences (Switzerland), 9(21). https://doi.org/10.3390/APP9214500
Pilone, D., & Pitman, N. (2005). UML 2.0 in a Nutshell.
Pirouzbakht, N., & Mejía, J. (2017). Algorithm for the detection of breast cancer in digital
mammograms using deep learning. CEUR Workshop Proceedings, 2031, 46–49.
Rajendran, K., Jayabalan, M., & Thiruchelvam, V. (2020). Predicting breast cancer via
supervised machine learning methods on class imbalanced data. International
Journal of Advanced Computer Science and Applications, 11(8).
www.ijacsa.thesai.org
Silaparasetty, N. (2020). An Overview of Machine Learning. In Machine Learning
Concepts with Python and the Jupyter Notebook Environment. Apress, Berkeley,
CA. https://doi.org/10.1007/978-1-4842-5967-2_2
Spanhol, F. A., Oliveira, L. S., Petitjean, C., & Heutte, L. (2016). Breast cancer
histopathological image classification using Convolutional Neural Networks.
Proceedings of the International Joint Conference on Neural Networks, February
2018, 2560–2567. https://doi.org/10.1109/IJCNN.2016.7727519
Sreenivasa, B. C. (2020). Breast cancer and prostate cancer detection using classification
algorithms. International Journal of Engineering Research and Technology, 9(06).
https://doi.org/10.17577/ijertv9is060085
Triantafyllidis, A. K., & Tsanas, A. (2019). Applications of machine learning in real-life
digital health interventions: Review of the literature. Journal of Medical Internet
Research, 21(4), 1–9. https://doi.org/10.2196/12286
University of California San Francisco. (2020). Breast Cancer Risk Factors.
W.H.O. (2008). Constitution of World Health Organization. (Vol. 44, Issue 3).
https://doi.org/10.2307/2004468
WHO. (2021a). Cancer. Retrieved from
https://www.who.int/news-room/fact-sheets/detail/cancer
WHO. (2021b). Noncommunicable diseases. Retrieved from
https://www.who.int/news-room/fact-sheets/detail/noncommunicable-diseases