
DEVELOPMENT OF MACHINE LEARNING CLASSIFIERS FOR BREAST CANCER
CHAPTER ONE
INTRODUCTION
1.1 Background of the Study
Cancer is a disease characterized by the uncontrolled growth of cells in any part of the
body. These abnormal cells, called tumor cells or malignant cells, can invade the tissues of
the human body. The continuous division of such cells over time can result in the formation
of a lump, microcalcifications, or architectural distortions, which are usually referred to
as tumors (Ebrahim and Wu, 2016). These tumors can be cancerous (malignant) or non-cancerous
(benign). Breast cancer is a malignant tumor that grows in the cells of breast tissue. It has
been identified as one of the leading causes of death among women (Barrios, 2022). There are
diverse types of breast cancer; the type depends on the particular cell in the breast that
develops into cancer.
Most breast cancers begin in the ducts or lobules. Breast cancer can spread outside the
breast through blood vessels and lymph vessels. When breast cancer spreads to other parts of
the body, it is said to have metastasized (CDC, 2022). Breast cancer can be diagnosed
reliably by a radiologist only if the abnormalities in the breast are detected early.
Accurate diagnosis and early detection have been shown to decrease the mortality rate arising
from breast cancer. Human diagnosis is prone to error, and its accuracy depends on the
expertise of the doctor. Effective tools for diagnosing breast cancer will therefore aid
medical practitioners in accurately diagnosing and appropriately treating patients.
Diagnosis made with the help of machine learning has been reported to be more accurate
(91.1%) than diagnosis made by an experienced physician (79.97%). There
is a gradual increase in the use of classifier systems for medical diagnosis (Ali and Feng,
2016). The evaluations and decisions of expert physicians remain important factors.
However, classifier algorithms based on artificial intelligence also help experts, especially
by minimizing the errors that inexperienced practitioners can make.
Artificial intelligence (AI) technologies are valuable because they provide faster,
time-sensitive responses while reducing error rates for individual patients. A growing volume
of clinical notes, diagnostic images, and reports is being saved in electronic medical
records (Panahiazar et al., 2021). This heterogeneous and highly complex data poses
challenges for data analytics and reusability, which calls for modern methods to store,
manage, process, and reuse big data. There is therefore an urgent need for new, extensible AI
frameworks and generic processes that allow healthcare providers to access knowledge about
individual patients, yielding better decisions and outcomes.
Machine learning is a field of study that aims to train machines to perform cognitive tasks
in the way that humans do. While machines have far fewer cognitive abilities than ordinary
people, they are capable of quickly processing large amounts of data and extracting
significant commercial insights. Machine learning algorithms employ a computational approach
to “learn” information directly from data rather than depending on a model based on a
predetermined equation. As the number of samples available for learning grows, the
algorithms adapt and improve their performance. Deep learning is a specialized form of
machine learning (Janiesch et al., 2021).
As a result, a practitioner in machine learning may come across a variety of forms of
learning, ranging from entire fields of research to individual methodologies. Machine
learning employs two main methods. The first method is supervised learning, which trains a
model to predict future outputs based on known input and output data. The second method,
unsupervised learning, finds hidden patterns or internal structures in its input data.
Supervised learning makes use of classification and regression approaches to develop
machine learning models. The classification approach classifies the input data and predicts
discrete responses, while the regression approach predicts continuous responses. If the
response being predicted is a real number, such as temperature or the time until a piece of
equipment fails, then regression techniques are more suitable. Unsupervised learning is used
to draw inferences from datasets consisting of input data without labeled responses (Anand,
2022).
Support Vector Machine (SVM) is an approach to supervised pattern classification that has
been successfully applied to a wide range of pattern recognition problems; it is also a
training algorithm for learning classification and regression rules from data. SVM is
particularly suitable for working accurately and efficiently with high-dimensional feature
spaces. In addition, SVM has a strong mathematical foundation and yields simple yet very
powerful algorithms (Ali and Feng, 2016).
The standard SVM algorithm builds a binary classifier. A simple way to model a binary
classifier is to construct a hyperplane separating class members from non-members in the
input space. SVM can also find a nonlinear decision function in the input space by mapping
the data into a higher-dimensional feature space and separating it there using a
maximum-margin hyperplane. The method automatically identifies a small number of informative
points, known as support vectors, and uses them to represent the separating hyperplane,
which is a sparse linear combination of these points. Finally, SVM solves a simple convex
optimization problem.
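For reference, the maximum-margin problem described above is commonly written in the
following standard hard-margin textbook form. The notation is assumed here for illustration
and is not taken from the cited sources: w is the weight vector of the hyperplane, b is its
bias, x_i are the n training points, and y_i in {-1, +1} are their class labels.

```latex
\min_{\mathbf{w},\, b} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,
\qquad i = 1, \dots, n
```

The objective is a convex quadratic function and the constraints are linear, which is why
the problem is convex. Under this formulation the margin width equals 2/||w||, and the
support vectors are the training points for which the constraint holds with equality.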
1.2 Statement of Problem
Breast cancer is a critical health priority across the globe. As the burden of the disease
increases globally, current estimates indicate that in the coming decades much of the
incidence and mortality related to breast cancer will be seen in underserved populations.
The fragile and ill-equipped healthcare systems in low- and middle-income countries (LMICs)
must take up this challenge and provide solutions despite their limited resources.
Significant disparities can be recognized at the presentation stage, as detection of
early-stage disease is compromised, resulting in the severe consequences associated with
late diagnosis.
Addressing breast cancer at a global level presents challenges that reveal critical
disparities. Recent information from the World Health Organization (WHO) indicates that
although 70% of countries have established cancer guidelines and 62% report screening
programs, 40% report important restrictions on access to management and treatment, and less
than half have palliative care plans (WHO Cancer Report, 2020). This is a complicated problem
involving multiple stakeholders and several aspects to consider. Focusing on the most
important, we could argue that access to healthcare, late-stage diagnosis, and lack of timely
and appropriate diagnostic and treatment procedures are probably at the top of the list.
These issues are all closely related (Sung et al., 2021).
Diagnosis of this deadly disease is also one of the major challenges medical practitioners
face, especially in developing countries such as Nigeria. Yet correct and prompt diagnosis of
diseases is essential in the medical field. Limited human capability reduces the rate of
correct diagnosis. Machine learning algorithms such as the support vector machine (SVM),
among other techniques, can help
physicians diagnose more accurately, because the SVM has been identified as a powerful and
flexible machine learning model capable of linear or nonlinear classification, regression,
and even outlier detection. SVM is particularly well suited to the classification of complex
but small or medium-sized datasets.
1.3 Justification
The spread of cancerous cells in the breast must be properly managed in order to reduce
death rates among women. Measures should likewise be employed to aid the early detection of
cancerous cell growth, so as to avoid the worse conditions that result from late detection.
For precise diagnosis, this research aims to move from the traditional approach to cancer
diagnosis toward a more advanced approach that supports effective treatment.
The proposed approach uses a machine learning algorithm known as the Support Vector Machine
(SVM), which solves problems in pattern recognition, regression, and classification. The SVM
modeling process entails collecting data on breast cells, pre-processing the data,
extracting features from the data, building a predictive model that determines whether the
cells are benign or malignant, and validating the model through the ROC curve.
1.4 Aim and Objectives
The major aim of this research is to classify breast cancer using a support vector machine
(SVM).
The specific objectives of this research are to:
(i) select a diagnostic data set for breast cancer;
(ii) compute a two-dimensional independent component analysis of the selected data set;
(iii) train and test the support vector machine (SVM) classifier; and
(iv) evaluate the performance of the breast cancer classifier using ROC curve analysis and
empirical measures.
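As an illustration of the ROC curve analysis named in objective (iv), the following minimal
Python sketch builds ROC points by sweeping the decision threshold over classifier scores
and computes the area under the curve (AUC) with the trapezoidal rule. The function names
`roc_points` and `auc`, the label convention (1 for malignant, 0 for benign), and the toy
scores are assumptions made for this example; they are not results of this study.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 for the positive class (assumed malignant), 0 for the negative class.
    scores: classifier outputs where a higher score means "more likely positive".
    Assumes both classes are present in labels.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Samples with the same score cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example with made-up scores (not study data).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 4))
```

An AUC of 1.0 corresponds to a classifier that ranks every positive case above every
negative case, while 0.5 corresponds to random ranking.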
1.5 Scope of the Study
This study is limited to:
i. the use of machine learning techniques in the classification of breast cancer;
ii. the support vector machine as the only machine learning technique employed;
iii. ROC analysis for evaluating classifier performance; and
iv. sensitivity, accuracy, and specificity as the only empirical measures used to evaluate
the effectiveness of the classifiers.
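The three empirical measures listed above can be computed from the counts of a confusion
matrix. The sketch below is a minimal illustration using their standard definitions; the
function names, the label convention (1 for malignant, 0 for benign), and the toy
predictions are assumptions for this example only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, and false negatives."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of actual positives that were detected (true positive rate)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual negatives that were correctly rejected (true negative rate)."""
    return tn / (tn + fp)


def accuracy(tp, fp, tn, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Toy example with made-up predictions (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, fp, tn, fn))
```

The sketch assumes that both classes occur in the true labels; otherwise one of the
denominators would be zero.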
CHAPTER TWO
LITERATURE REVIEW
2.1 Definition of Machine Learning
Machine learning is an interdisciplinary field of study that evaluates algorithms that
facilitate pattern recognition, classification, and prediction based on models derived from
existing data. It is applied in diverse computing areas that require algorithms with
high-performance output, such as medical diagnosis, internet stock trading, online fraud
detection, email spam filtering, and character detection and recognition, to mention a few
(Alzubi et al., 2018). Moreover, it provides a set of tools for creating and evaluating such
algorithms. Machine learning is based on four steps: data collection, model selection, model
training, and model testing (Amrane et al., 2018).
In broad terms, two facets of machine learning should be considered. First, the product of
machine learning is a classifier that can feasibly run on available hardware. Second, the
creation of the classifier should itself be highly mechanized, without involving much human
input, so that automatic algorithm construction methods minimize the chance that human
biases affect the performance of the algorithm (Tarca et al., 2007). This branch of computer
science enables computers to learn without being explicitly programmed. It has the potential
to transform the epidemiological sciences and offers epidemiologists a new means of solving
problems by integrating machine learning techniques into their research. However, some
machine learning concepts lack statistical or epidemiologic parallels, and terminology may
differ even where the underlying concepts are the same.
Researchers from different backgrounds have applied machine learning to biological data and
healthcare systems for improved outcomes. The technology plays a crucial role in resolving
healthcare issues and other scientific problems. Machine learning can be applied to image
recognition, medical diagnosis, extraction, and statistical arbitrage, among other areas,
using classifiers such as Artificial Neural Networks, decision trees, Support Vector
Machines, and random forests (Siddiqui et al., 2020).
Currently, machine learning plays a vital role in healthcare analysis by supporting early
diagnosis, improving medical image processing, and reducing processing time, especially in
healthcare informatics. Before the recent advancement and use of machine learning in
healthcare systems, it had been challenging to achieve accurate results (Kumar et al., 2021).
The healthcare industry generates clinical information in the form of big data, which is
stored as electronic health records for effective healthcare management.
However, the medical industry needs access to appropriate technology for effective data
analysis to aid in detecting diseases and in recommending treatment and clinical services to
doctors. Thus, algorithms such as decision tree, naïve Bayes, and random forest help classify
a patient dataset based on a particular disease, predict the presence of diseases such as
cancer, heart disease, and neurodevelopmental disorders, and provide more information about
a patient’s health (Kokilavani et al., 2021).
Fig. 2.1: Multi-disciplinary Machine Learning
Source: Alzubi et al., (2018).
2.2 Machine Learning Classifiers
In machine learning, the term classifier (also known as an estimator) generally refers to an
algorithm that performs a prediction or classification of interest. In healthcare management,
a machine learning classifier makes decisions from a patient’s data, which reduces the
workload of medical practitioners and allows treatment to be carried out within a shorter
time. Such algorithms are effective in analyzing both structured and unstructured data and
are useful for classifying a patient dataset based on a particular disease (Qifang et al.,
2019).
In machine learning, a classifier is an algorithm that automatically assigns data points
to a range of categories or classes. Within the classifier category, there are two main
models: supervised and unsupervised. In the supervised model, classifiers are trained on
labeled data to distinguish between classes. This training allows them to recognize patterns
and ultimately classify new, unlabeled data on their own. Unsupervised algorithms use pattern
recognition to group unlabeled datasets, progressively becoming more accurate (Indeed,
2022).
Machine learning is widely classified, based on how a computer learns, as either supervised
learning or unsupervised learning. A third approach, semi-supervised learning, fits models to
both labeled and unlabeled data. It supplements limited labeled data with an abundance of
unlabeled data to improve model performance; studies indicate that unlabeled data can help
build a better classifier, but appropriate model selection is critical (Zhu et al., 2009).
Supervised learning is based on training samples from the data source with the correct
classification already assigned. On the other hand, unsupervised learning is a
machine learning discipline that discovers patterns in large data sets or groups the data
into several categories without being trained explicitly. Such algorithms identify natural
relationships within data without reference to any outcome (Duda et al., 2012).
Generally, supervised learning uses training data known as labeled data. This training data
has one or more inputs with a labeled output. Models use these labeled outputs to assess
themselves during training, with the purpose of improving predictions on new data.
Supervised learning models usually focus on classification and regression. In the medical
field, classification problems are common: diagnosing patients requires doctors in clinics
and other healthcare environments to classify illnesses or diseases based on symptoms.
Algorithms used in supervised learning include decision trees, random forests, Naïve Bayes
models, support vector machines, and linear and logistic regression; neural networks can also
be trained with supervised learning (Toh et al., 2020).
Unsupervised learning uses unlabeled data to detect patterns within the data. These
learning algorithms excel at clustering data into relevant groups to detect latent
characteristics; however, they are more computationally intensive and require a large amount
of data to operate. K-means clustering and deep learning are well-known and common
algorithms, although deep learning can also be used in a supervised approach. Since no
human-provided labels are involved, the models are regarded as unsupervised (Längkvist et
al., 2014).
2.3 Algorithms Classifications in Machine Learning
Although machine learning can be classified into two main types, supervised learning and
unsupervised learning, a wide variety of classification algorithms are used in machine
learning, and each one uses a different mechanism to analyze data. Five common classification
algorithms are Naïve Bayes, Logistic Regression, K-Nearest Neighbors, Decision Tree, and
Support Vector Machines.
Fig. 2.2: Machine Learning Approaches
Source: Hasan et al., (2021).
2.3.1 Naive Bayes classifier
The naïve Bayes classifier is based on Bayes’ theorem and assumes that the effect of an
attribute value on a given class is independent of the values of the other attributes. This
assumption, called class conditional independence, is made to simplify the computation
involved and is the reason the classifier is considered naïve. These classifiers are
statistical classifiers that predict class membership probabilities (Leung, 2007). A naïve
Bayes algorithm calculates the probability associated with each possible class conditional
on a set of covariates, which is proportional to the product of the prior probability and
the likelihood function. The classifier then selects the class with the highest probability
as the “correct” class.
Naive Bayes classifiers use probability to predict whether an input will fit into a
certain category. The Naive Bayes algorithm family includes a range of different classifiers
based on Bayes’ theorem. These classifiers can determine the probability of an input fitting
into one or more categories. In multiple-category scenarios, the algorithm evaluates the
probability that a data point fits into each category. After comparing these probabilities,
it outputs the category that is most likely to match the given input (Saritas et al., 2019).
Many companies use this type of algorithm to assign tags to text segments such as email
subject lines, customer comments, and articles.
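The rule described in this section, choosing the class that maximizes the product of the
prior and the likelihood under the independence assumption, can be sketched as follows. This
is a minimal Gaussian variant for numeric features; the Gaussian likelihood, the function
names, the small variance floor, and the toy measurements are assumptions for illustration
and are not drawn from the cited works.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature (mean, variance) for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        prior = len(rows) / len(X)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant avoids a zero variance (an assumption of this sketch).
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        # Independence assumption: per-feature log likelihoods simply add up.
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy example with made-up feature values (not study data).
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 4.0], [3.2, 3.8], [2.8, 4.2]]
y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.1]), predict_gaussian_nb(model, [3.1, 3.9]))
```

Working with logarithms turns the product of many small probabilities into a sum, which
avoids numerical underflow without changing which class scores highest.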
2.3.2 Decision tree
Decision tree classifiers are regarded as among the most well-known methods for
representing classifiers in data classification. Researchers from various fields and
backgrounds, such as machine learning, pattern recognition, and statistics, have considered
the problem of building a decision tree from available data. The use of decision tree
classifiers has been proposed in many fields, such as medical disease analysis, text
classification, user smartphone classification, images, and many more (Jijo et al., 2021).
A decision tree is a classification algorithm that uses a process of division to split data
into increasingly specific categories. It is called a decision tree because the
classification process resembles the branches of a tree when represented graphically. The
algorithm works on a supervised model and requires high-quality data to produce good results.
Since the primary goal of a decision tree is to make increasingly specific distinctions, it
must continuously learn new classification rules. It learns these rules by applying if-then
logic to training data. The algorithm continues the classification process until it reaches a
designated stopping condition (Leung, 2007).
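A single if-then rule of the kind described above can be learned by searching for the
feature and threshold that best separate the classes. The sketch below uses Gini impurity as
the splitting criterion, which is one common choice and an assumption here, since the text
does not name a criterion. The function names and toy data are likewise illustrative.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X, y):
    """Return (feature_index, threshold, weighted_gini) of the best single split.

    The learned rule reads: if x[feature_index] <= threshold go left, else go right.
    """
    best = (None, None, float("inf"))
    n = len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds lie midway between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, threshold, score)
    return best


# Toy example with made-up feature values (not study data).
X = [[2.0, 7.0], [3.0, 1.0], [4.0, 6.0], [6.0, 2.0], [7.0, 8.0], [8.0, 3.0]]
y = [0, 0, 0, 1, 1, 1]
print(best_split(X, y))
```

A full decision tree would apply this search recursively to each resulting group until a
stopping condition, such as a pure group or a maximum depth, is reached.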
2.3.3 Artificial neural networks
An artificial neural network is a well-known machine learning technique inspired by the
biological neural network in the human brain. Such networks are computing frameworks made up
of many individual algorithms. Their mechanism of action mimics how human brains work and
includes a collection of artificial neurons that transmit signals. This makes artificial
neural networks capable of solving extremely complex problems that involve multiple layers.
Because of their complexity, ANNs can be challenging to train and adjust, and they often
require large amounts of training data.
However, a fully trained ANN can perform tasks that would be impossible for single
algorithms. There are many types of artificial neural networks, including the feedforward
neural network, feedback neural network, recurrent neural network, classification-prediction
network, radial basis function network, dynamic neural network, and modular neural network
(Saritas et al., 2019). An artificial neural network (ANN) is a data processing algorithm in
which the computations simulate a biological neural network. An ANN model
has three layers: an input layer, a hidden layer, and an output layer. Links connect nodes in
different layers.
Fig. 2.3: Decision Tree
Source: Jijo et al., (2021).
Nodes in the input layer represent predictors, and nodes in the output layer represent
outcomes (Lou et al., 2020). A commonly used neural network method is the multilayer
backpropagation learning algorithm, which can model nonlinear systems.
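The three-layer structure described above (input, hidden, and output layers joined by
weighted links) can be illustrated with a forward pass through a tiny network. This is a
minimal sketch: the sigmoid activation, the function names, and the weight values are
assumptions chosen for illustration, and no training (backpropagation) is performed.

```python
import math


def sigmoid(z):
    """Logistic activation that squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Compute one layer: each node applies sigmoid(weighted sum of inputs + bias)."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Propagate the predictors x through the hidden layer to the output layer."""
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)


# Two predictors, two hidden nodes, one output node; the weights are arbitrary examples.
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.1, -0.2]
out_w = [[1.2, -0.7]]
out_b = [0.05]
print(forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b))
```

Backpropagation would adjust the weights and biases to reduce the difference between this
output and the known outcome for each training example.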
2.3.4 K-nearest neighbor
K-nearest neighbor (KNN) is a supervised lazy learner algorithm used in machine learning.
This means that it stores the labeled training data it is given and compares it to new data
to make predictions. While the training period for such algorithms is often shorter than for
"eager learners," they are often slower to make predictions. After storing its training data,
a KNN algorithm compares each test point with the stored data and measures the degree of
similarity between them. It then uses the most similar stored instances to predict the class
of the new point. While this algorithm is common in classification, many professionals also
use it to complete regression tasks (Islam et al., 2007). The algorithm has been used in
different applications such as healthcare, finance, image and video recognition, and
handwriting recognition. It is first given labeled data from different classes and is then
tested using new points. The algorithm finds the nearest known neighbors of each new data
point using a distance measure such as Manhattan distance, Hamming distance, Minkowski
distance, or Euclidean distance. The new point is assigned to the class of its nearest known
neighbors, and the algorithm repeats the same procedure for all new points. Combining K-NN
with SVM can improve the efficiency of diagnosis.
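The procedure described above can be sketched in a few lines: store the labeled points,
measure the distance from a new point to each stored point, and take a majority vote among
the k closest. The function names, the default k = 3, and the toy data are assumptions for
illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Classify query by majority vote among its k nearest training points."""
    neighbors = sorted(
        zip(train_X, train_y), key=lambda pair: distance(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy example with made-up feature values (not study data).
train_X = [[1, 1], [1, 2], [2, 1], [6, 6], [7, 6], [6, 7]]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
print(knn_predict(train_X, train_y, [1.5, 1.5]))
print(knn_predict(train_X, train_y, [6.5, 6.5], distance=manhattan))
```

Because all computation happens at prediction time, the cost of each prediction grows with
the size of the stored training set, which is the trade-off of a lazy learner noted above.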
2.3.5 Support vector machine
Support Vector Machines are a set of supervised learning methods used for classification
and regression problems. They work by finding hyperplanes within a data distribution, which
can be visualized as a line separating two different classes of data. There are often many
hyperplanes capable of separating the data, and the algorithm selects the optimum one. In the
SVM model, the optimum hyperplane is the dividing line that offers the greatest margin
between the different classes. SVMs can work in higher dimensions if they are unable to find
an ideal hyperplane to separate the data in two dimensions. This makes them effective for
creating classifications from complicated data distributions and well suited as machine
learning tools for complex data inputs (Qifang et al., 2019).
SVMs are binary classifier algorithms that seek to create a linear boundary that separates
classes in a high-dimensional feature space. To successfully create linear separators (known
as hyperplanes) for complex nonlinear data sets, SVMs use a technique known as the kernel
trick. The kernel trick allows the algorithm to implicitly transform the input data so that a
linear hyperplane can separate the classes (e.g., will a patient experience financial
toxicity or not) (Rani et al., 2022).
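The idea behind the kernel trick can be checked numerically: a kernel function evaluated on
the original inputs gives the same value as an ordinary dot product taken after an explicit
mapping into a higher-dimensional feature space. The sketch below uses the degree-2
polynomial kernel on two-dimensional inputs; the choice of kernel, the function names, and
the sample points are assumptions for illustration.

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map for 2-D input whose dot product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel, another common choice; gamma is a free parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


x, z = (1.0, 2.0), (3.0, 4.0)
# Both values agree (up to floating-point rounding) without the kernel ever
# constructing the three-dimensional vectors phi(x) and phi(z).
print(poly2_kernel(x, z), dot(phi(x), phi(z)))
```

The kernel computes the feature-space inner product directly from the original inputs, which
is what allows an SVM to fit a linear hyperplane in a space it never constructs explicitly.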
2.4 Classification of Breast Cancer
Breast cancer is one of the most significant and common diseases causing death in women. It
is a complex multifactorial disease encompassing a great variety of entities that show
considerable variation in clinical, morphological, and molecular attributes. Its causes are
multifaceted and include genes, hormones, reproductive factors, and family history or
hereditary factors, leading to millions of deaths in women every year. However, about half of
those affected by this disease die due to late detection by doctors, according to the World
Health Organization (Amrane et al., 2018). It is essential to prevent the progression of this
disease in women through early diagnosis and treatment to reduce morbidity rates.
Fig. 2.4: Components of SVM
Source: (Rani et al., 2022).
Diagnosis is carried out through detection and regular checkups using ultrasound imaging
and mammography, followed by a breast tissue biopsy once the check-up shows the possibility
of malignant tissue growth. Breast cancers are common among women and very rare in men,
because breast cells are less developed in males. According to statistics from the Centers
for Disease Control and Prevention, about one out of every hundred breast cancers diagnosed
is found in a man. Most men do not check for signs of lumps; as a result, male breast cancers
are diagnosed at a much later stage (Nesamani et al., 2021).
Because of the heterogeneity of breast cancer, it cannot be viewed as a single
clinicopathological entity and must be dissected into more homogeneous entities; thus, it
needs to be classified. Generally, a suitable classification of any disease must be
scientifically sound, clinically useful, easily applicable, and widely reproducible, but
unfortunately, a perfect classification of breast cancer has not been established despite
years of research. Breast cancer begins with a mutation in a single cell, which may either be
suppressed by the body or lead to uncontrolled cell division.
Fig. 2.5: Types of Cancer. Source: (Tahmooresi et al., 2018).
Moreover, malignant tumors spread to neighboring cells and can extend to other parts of the
body, whereas benign masses cannot spread to other tissues. It may be difficult to detect
breast cancer at an early stage because symptoms are often absent, but after tests and
diagnosis, malignant and benign tumors can be differentiated (Amrane et al., 2018). Machine
learning has been applied to breast cancer for decades to classify tumors and other related
malignancies.
2.5 Application of Machine Learning Technology for Diagnostic Analysis
The application of machine learning technology in various capacities has changed the way
problems are detected and solved. In healthcare management, machine learning has been
employed on many different kinds of healthcare data. Well-known machine learning techniques,
including Neural Network (NN), K-Nearest Neighbor (K-NN), Decision Tree (DT), and the
classical Support Vector Machine (SVM), are used for diagnosing diseases such as cancer,
cardiovascular disease, diabetes, hepatitis, and other related diseases (Mohammad et al.,
2020).
The use of machine learning in healthcare management examines many varying data points and
can give accurate results. Machine learning is a significant field of Artificial
Intelligence, and it continues to evolve. Valuable machine learning applications in medicine
and healthcare include the detection and treatment of disease, drug discovery and
manufacturing, medical imaging diagnosis, medicine, health records, behavioral modification,
and clinical research and trials, among others (Butryn et al., 2021).
Due to the steady increase in the fatality rate of diseases like diabetes, cancer, heart
disease, and hepatitis, and the large volume of patient data from which valuable information
can be obtained, scientists utilize data mining to tackle real-world medical issues and treat
diseases. These days, data mining technology is employed by scientists in assessing
numerous diseases like cancer, diabetes, cardiovascular disease, and hepatitis, and many
machine learning techniques have shown accurate results when implemented (Obaid et al.,
2018).
Machine learning is a key technology for supporting management processes in many areas of
life, including the healthcare system. Collecting information resources about diseases and
searching for diagnostic tools is a suitable field of application for artificial intelligence
tools. Machine learning and neural networks can be used to analyze patient data through
categorization and to create advanced predictive models that determine the probability of a
patient suffering from a specific disease. This is of particular importance in the
implementation of diagnostic processes, which, through the support of artificial intelligence
solutions, can become significantly more effective (Butryn et al., 2021).
Cancer is a class of diseases characterized by the abnormal growth of cells that tend to
invade and spread to different organs of the body. Medical specialists categorize cancer into
different types based on where it develops in the body; sarcomas, leukemias, carcinomas, and
lymphomas are the four major types. Among women, breast cancer, colorectal cancer, lung
cancer, and cervical cancer are the most common types, while prostate, stomach, lung, and
colorectal cancer are predominant among men. Some cancer deaths are attributed to smoking,
obesity, poor nutrition, lack of physical exercise, and excessive alcohol consumption, while
others stem from hereditary factors. Cancer can be detected with the support of machine
learning technology applied to medical imaging and confirmed by biopsy, a procedure in which
tissue pieces or cell samples are removed from the body for laboratory tests (Mohammad et
al., 2020).
Fig. 2.6: The main Application of Machine Learning in Medicine
Source: (Obaid et al., 2018).
2.6 Role of Machine Learning in Breast Cancer Classification
Machine learning plays a crucial role in a wide range of critical applications. In the
healthcare management system, it helps identify biomarker genes to assess and diagnose
diseases. Breast cancer has become a significant concern in the medical field and has been a
recurrent cause of death globally. According to statistics from India, over a million cases
of breast cancer have been recorded over the years, with thousands of women suffering from it
and many losing their lives due to late diagnosis of the disease. Early diagnosis of this
disease will reduce mortality and save the lives of breast cancer patients.
To detect and identify this disease, mass spectrometry is employed and combined with
various tools to boost the accuracy of pathological analysis using biomarkers. Once the
diagnostic sample is collected from the patient, it is analyzed under specific biological
circumstances and pathological procedures. In classifying cancerous and non-cancerous gene
conditions, biomarkers are distinctive determinant features. Machine learning is employed in
the detection of breast cancer, and classification is carried out by applying different
computing techniques to the dataset to determine the probability of cancer (Bellaachia et
al., 2006). Machine learning plays a vital role in the diagnosis and treatment of breast
cancer, with the Support Vector Machine reported to be the most efficient among the
techniques, as its classification performance yields better accuracy and sensitivity in
diagnostic conclusions; thus, it has become the most widely employed diagnostic instrument
for accurate prediction and detection of breast cancer (Sahu et al., 2020).
2.7 Support Vector Machine for Breast Cancer Classification
Support Vector Machine (SVM) is a supervised learning algorithm that learns classification
and regression models from collated data. It has been utilized by various researchers to
solve regression and classification problems, with classification being the more common use.
Given n features, each data item is represented as a point in an n-dimensional space, with
one coordinate per feature. The algorithm considers candidate separating boundaries, called
hyperplanes, in this space and selects the one that has the maximum margin. The margin is
the distance between the hyperplane and the nearest data points of the classes that it
segregates. Various studies have used this algorithm to classify breast cancer tumors and
achieved promising results. These studies compared different algorithms (i.e., SVM, K-NN,
C4.5, NB, K-means, EM, PAM, and fuzzy c-means), and it was found that the SVM algorithm
achieved higher accuracy than the other algorithms (Medjahed et al., 2013).
2.8 Previous Related Research
Table 3.1 below summarizes previous related work done by various researchers.
S/N | Title, Author(s), and Year of Publication | Research Findings | Research Limitations and Gaps
1 | Breast Cancer Classification using Machine Learning (Amrane et al., 2018). | The researchers present two different classifiers, the Naïve Bayes classifier and K-nearest neighbor (KNN), to classify breast cancer and evaluate their accuracy using cross-validation. | In this study, only two different classifiers were presented and evaluated.
2 | Development of Machine Learning Algorithms for the Prediction of Financial Toxicity in Localized Breast Cancer Following Surgical Treatment (Sidey-Gibbons et al., 2021). | In this paper, the authors develop and test a tool to accurately predict an individual’s risk of financial toxicity before initiation of breast cancer treatment. The researchers also explore whether supervised machine learning algorithms can reliably predict financial toxicity in patients with breast cancer who undergo surgical treatment. | This paper only develops machine learning algorithms for the prediction of financial toxicity in localized breast cancer cases.
3 | Comparative study of machine learning algorithms for breast cancer detection and diagnosis (Bazazeh et al., 2016). | This paper compares three of the most popular ML techniques commonly used for breast cancer detection and diagnosis, namely Support Vector Machine (SVM), Random Forest (RF), and Bayesian Networks (BN). | This paper only compares support vector machines, random forests, and Bayesian networks.
4 | A comparative analysis of nonlinear machine learning algorithms for breast cancer detection (Bataineh, 2019). | This paper conducts a performance comparison between five nonlinear machine learning algorithms, viz. Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Gaussian Naïve Bayes (NB), and Support Vector Machines (SVM), on the Wisconsin Breast Cancer Diagnostic (WBCD) dataset. | The study conducts a performance comparison between five non-linear machine learning algorithms.
5 | Performance analysis of different machine learning algorithms in breast cancer predictions (Battineni et al., 2020). | This study develops a machine learning model coupled with limited features to produce high classification accuracy in tumor classification by considering a dataset of 569 females diagnosed as 212 malignant and 357 benign types. For model development, three supervised ML algorithms, namely support vector machines (SVM), logistic regression (LR), and K-nearest neighbors (KNN), were employed. Each model was further validated by 10-fold cross-validation, and performance measures were defined to evaluate the model outcomes. | This study only evaluates the performance analysis of different machine learning algorithms in breast cancer predictions.
6 | Machine Learning Algorithms to Predict Recurrence within 10 Years after Breast Cancer Surgery: A Prospective Cohort Study (Lou et al., 2020). | This study had two key findings: (1) the comparison of goodness-of-fit results indicated that provider characteristics (e.g., the volume of breast cancer patients per surgeon and per hospital) are essential considerations in the design of clinical decision support systems; and (2) the comparison of AUROC values indicated that the ANN model is superior to other prediction models. | This study has several limitations inherent in any large database analysis. First, the validity of the comparisons in the study is limited by the exclusion of complications associated with recurrence after surgery. Second, the analysis was limited to recurrence over 10 years after surgery, which reduced the subset of breast cancer patients in which the ANN model is clinically applicable. Third, this study only compared individual ANN, KNN, SVM, NBC, and COX models. Future works may consider the use of an alternative study design that compares a balanced sample of surgeons or hospitals at the first level and then randomly selects breast cancer patients at the second level.
7 | Breast Cancer Type Classification Using Machine Learning (Wu et al., 2021). | The researchers evaluated the performance of four ML-based classification algorithms (K-NNs, NGB, DT, and SVM) for the classification of breast cancer into triple-negative breast cancers (TNBC) and non-TNBC using gene expression data. The investigation revealed that ML algorithms could classify breast cancer into TNBC and non-TNBC. The SVM algorithm turned out to be the most accurate among the four algorithms. | Further research is recommended to investigate the power of ML algorithms in the classification of subtypes of triple-negative breast cancers (TNBC) and non-TNBC, to identify the best classification features, and to integrate radiomics with genomics data.
8 | Using Three Machine Learning Techniques for Predicting Breast Cancer Recurrence (Ahmad et al., 2013). | In this paper, using data mining techniques, the authors developed models to predict the recurrence of breast cancer by analyzing data collected from the Iranian Center for Breast Cancer (ICBC) registry. | This paper has explored risk factors for predicting breast cancer by using data mining techniques. Each method has its limitations and strengths specific to the type of application. There are some limitations in the current study. Many cases were lost in the follow-up, and records with missing values were omitted. Some important variables such as S-phase fraction and DNA index were not included in the study because of their unavailability, which may have decreased the performance of the models, and there was some degree of omission in the data. However, the results were obtained from a new database at the Iranian Center for Breast Cancer by comparing three different data mining methodologies using the Weka toolkit.
9 | Comparison of Machine Learning Methods for Breast Cancer Diagnosis (Bayrak et al., 2019). | Two of the most popular machine learning techniques were employed for the classification of the Wisconsin Breast Cancer (Original) dataset, and the classification performance of these techniques has been compared using the values of accuracy, precision, recall, and ROC Area. The best performance has been obtained by the Support Vector Machine technique with the highest accuracy. | In this study, only two of the most popular machine learning techniques were employed for the classification of the Wisconsin Breast Cancer (Original) dataset.
10 | Discovering Mammography-based Machine Learning Classifiers for Breast Cancer Diagnosis (Ramos-Pollan et al., 2012). | This research explores the design of mammography-based machine learning classifiers (MLC) and proposes a new method to build MLC for breast cancer diagnosis. Moreover, the study evaluated MLC configurations to classify feature vectors extracted from segmented regions (pathological lesion or normal tissue) on craniocaudal (CC) and/or mediolateral oblique (MLO) mammography image views, providing BI-RADS diagnosis. | The study only evaluates Machine Learning Classifier configurations to classify feature vectors extracted from segmented regions (pathological lesion or normal tissue) on craniocaudal (CC) and/or mediolateral oblique (MLO) mammography image views, providing BI-RADS diagnosis.
11 | Evaluating the performance of Machine Learning Techniques in the Classification of Wisconsin Breast Cancer (Obaid et al., 2018). | In this paper, three machine learning algorithms (Support Vector Machine, K-nearest neighbors, and Decision tree) have been used, and the performance of these classifiers has been compared to detect which classifier works better in the classification of breast cancer. Furthermore, the Wisconsin Breast Cancer (Diagnostic) dataset has been used in this study. The main aim of this work is to make a comparison among several classifiers and find the best classifier which gives better accuracy. | In this study, only three machine learning algorithms were employed to detect which classifier works better in the classification of breast cancer.
12 | Developing a Novel Machine Learning-Based Classification Scheme for Predicting Secondary Primary Cancers (SPCs) in Breast Cancer Survivors (Chang et al., 2019). | The researchers used machine learning techniques to develop a novel classification scheme, which included the transformation of data, clustering, resampling, and ensemble learning (TCRE), to predict Secondary Primary Cancers (SPCs) in women who have had breast cancer. The results of this study suggest that age, sequence of radiotherapy and surgery, surgical margins of the primary site, HER2, dose to CTV high, and ER, when appropriate, should be recommended for patients with breast cancer. | The study only aims at evaluating the development of a Novel Machine Learning-Based Classification Scheme for Predicting Secondary Primary Cancers (SPCs) in Breast Cancer Survivors.
13 | Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm (Heidari et al., 2018). | This study demonstrated that applying the LPP algorithm effectively reduced feature dimensionality, and yielded higher and potentially more robust performance in predicting short-term breast cancer risk. | This study only aims to investigate the advantages of applying a machine learning approach embedded with a locally preserving projection (LPP) based feature combination and regeneration algorithm to predict short-term breast cancer risk.
14 | Breast Cancer Prediction Using Machine Learning | In this research, four algorithms (SVM, Logistic Regression, Random Forest, and KNN) that predict the breast cancer outcome have been compared using different datasets. All experiments are executed within a simulation environment. The aim of the research is categorized into three domains. The first domain is the prediction of cancer before diagnosis, the second domain is the prediction of diagnosis and treatment, and the third domain focuses on the outcome during treatment. The proposed work can be used to predict the outcome of different techniques, and suitable techniques can be used depending upon the requirement. | Further research in this field should be carried out for the better performance of the classification techniques so that they can predict more variables.
15 | Application of Machine Learning Models for Survival Prognosis in Breast Cancer Studies (Mihaylov et al., 2019). | The paper discusses an approach to the problem in which the main factor used to predict survival time is the originally developed tumor-integrated clinical feature, which combines tumor stage, tumor size, and age at diagnosis. Two datasets from corresponding breast cancer studies are united by applying a data integration approach based on horizontal and vertical integration by using proper document-oriented and graph databases, which show good performance and no data losses. | The paper only presents the application of machine learning models for the accurate prediction of survival time in breast cancer based on clinical data.
CHAPTER THREE
METHODOLOGY
3.1 Project Analysis
Developing a classifier system using SVM for the classification of breast cancer is of
great importance in the field of medicine. The accuracy and effectiveness of SVM in
high-dimensional feature spaces motivate its use for this complex problem. SVM is an
effective statistical learning method for pattern recognition that finds the optimum
hyperplane separating the classes. This work will be executed by choosing a diagnostic
breast cancer data set and classifying tumors as benign or malignant using SVM, which will
assist medical experts by minimizing the errors that inexperienced practitioners can make,
especially in the diagnosis and treatment of breast cancer. This chapter describes how the
aim of the project is to be achieved.
The methodologies to be used in achieving the aim of the project are:
- Diagnostic Breast Cancer (DBC) data set will be chosen and the data set will
further be used to classify tumors as benign or malignant.
- Independent Component Analysis (ICA) algorithm will be used to compute
two-dimensional ICs of the chosen data set.
- Reduced two-dimensional feature vector will be used to test and train Support
Vector Machine (SVM) classifiers with linear, radial basis function (RBF)
and polynomial kernels.
- The effect of ICA on breast cancer classification using the SVM classifier
will be analyzed.
- Classifiers’ performance will be evaluated.
Therefore, the above tasks are required for the effective classification of breast
cancer using SVM.
3.2 Experiment Environment
All experiments on the classifiers described in this study will be conducted using
libraries from the WEKA machine learning environment. WEKA contains a collection of
machine learning algorithms for data pre-processing, classification, regression, clustering,
and association rules. Machine learning techniques implemented in WEKA are applied to a
variety of real-world problems. The program offers a well-defined framework for
experimenters and developers to build and evaluate their models.
3.3 Breast Cancer Dataset Collection and Preparation
The Wisconsin Breast Cancer (Original) dataset from the UCI Machine Learning
Repository will be used in this study. The dataset has 699 instances (Benign: 458,
Malignant: 241) and is partitioned into two classes, the benign class (B) and the
malignant class (M), which constitute 65.5% and 34.5% of the instances respectively. Breast
cancer is one of the most prominent diseases in the field of medical diagnostics, and its
incidence is increasing each year. The dataset has 32 features, including Radius mean,
Texture mean, Area mean, Smoothness, Compactness, and Concavity, excluding the sample code
number and class. In this study, the benign instances are represented as the positive class,
as they do not harm the body, and the malignant instances are represented as the negative
class, as they are the cancerous cells that harm the body. There are 11 integer-valued
attributes in the data set. Finally, the data set is randomized to ensure that the
instances are evenly distributed.
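The class proportions and the randomization step described above can be checked with a short sketch. It is only an illustration under stated assumptions: the class counts are those reported in the text, while the function names and the random seed are hypothetical and are not part of the WEKA workflow used in this study.

```python
import random

# Class counts as reported in the text for the Wisconsin Breast Cancer data.
CLASS_COUNTS = {"benign": 458, "malignant": 241}


def class_percentages(counts):
    """Return each class's share of the data set as a percentage."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


def shuffled_labels(counts, seed=0):
    """Build one label per instance and shuffle them reproducibly.

    The seed is a hypothetical choice so that the order can be repeated.
    """
    labels = [label for label, n in counts.items() for _ in range(n)]
    random.Random(seed).shuffle(labels)
    return labels


percentages = class_percentages(CLASS_COUNTS)
labels = shuffled_labels(CLASS_COUNTS)
print({label: round(value, 1) for label, value in percentages.items()})
print(len(labels), labels[:5])
```

Shuffling changes only the order of the instances, so the class counts are the same before and after randomization.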
3.4 SVM Algorithm Model
In the SVM algorithm model, each data item is plotted as a point in n-dimensional
space, where n is the total number of features used for classification. The value of each
feature is the value of the corresponding coordinate of the data point. The SVM constructs
decision hyperplanes that divide the different classes of data points with the maximum
margin. The data points nearest to the hyperplanes are called support vectors. With kernel
functions, the classification process can also generate non-linear decision boundaries and
classify data points that are not linearly separable in the original vector space. Using the
SVM model, raw tumor data can be classified quickly as benign or malignant.
Fig. 3.1: Flow Diagram of Work
3.5 Independent Component Analysis
The Independent Component Analysis (ICA) algorithm will be used to compute two-dimensional
independent components (ICs) of the WDBC data set with 30 features. The reduced
two-dimensional feature vectors will be used to train and test SVM classifiers with linear,
radial basis function (RBF), and polynomial kernels. The effect of ICA on breast cancer
classification using the SVM classifier will also be analyzed. Performance measures
including sensitivity, specificity, accuracy, and the ROC curve with its criterion values
will be computed and presented in order to compare the classification results obtained with
the original feature set against those obtained with the reduced two-dimensional feature
vectors using ICA and SVM.
Suppose that the measured signal consists of a linear combination of two independently
distributed signals. The measured signal, x, can be written as:
$x = As$ (1)
where s is a vector of source signals and A is an unknown mixing matrix consisting of
constant elements. The aim of using ICA is to recover the original signals. When a
separating matrix W, which is the inverse of A, can be computed, the original signals can
be found by:
$\hat{s} = Wx$ (2)
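The relationship between Eq. (1) and Eq. (2) can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the mixing matrix A is known and is 2x2; in real ICA, A is unknown and W has to be estimated from the data. The matrix values, sample values, and function names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 2x2 matrix (list of rows) by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def inverse_2x2(m):
    """Return the inverse of a 2x2 matrix; raise if it is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


# Hypothetical mixing matrix and source samples, chosen only for illustration.
A = [[2.0, 1.0],
     [1.0, 3.0]]
sources = [[1.0, -1.0], [0.5, 2.0], [-3.0, 0.25]]

W = inverse_2x2(A)                            # separating matrix, W = A^-1
mixed = [mat_vec(A, s) for s in sources]      # x = A s
recovered = [mat_vec(W, x) for x in mixed]    # s_hat = W x

for s, s_hat in zip(sources, recovered):
    print(s, [round(v, 6) for v in s_hat])
```

Because W is the exact inverse of A here, the recovered signals match the sources up to floating-point rounding.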
ICA computation starts by centering the data, that is, removing the mean values of the
variables, as in PCA. The next step is to whiten the data in order to decorrelate the
variables, again as in PCA. A linear transformation is then applied to the whitened data to
compute the ICs using the following equation:
$ic_i = b_i^{T} x$ (3)
where $ic_i$ is the i-th independent component and $b_i$ is the vector used to reconstruct
$ic_i$. Many different approaches can be used to estimate $b_i$; they use an objective
function that relates to variable independence.
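The centering and whitening steps can be sketched for two variables as follows. This is a minimal illustration and not the ICA implementation used in the study: it covers only the preprocessing shared with PCA and omits the estimation of the vectors b. The sample values and function names are hypothetical, and the closed-form eigen-decomposition applies to the 2x2 case only.

```python
import math


def center(data):
    """Remove the mean of each of the two variables from every sample."""
    n = len(data)
    means = [sum(row[j] for row in data) / n for j in range(2)]
    return [[row[0] - means[0], row[1] - means[1]] for row in data]


def covariance(centered):
    """2x2 covariance matrix of already-centered data (divides by n)."""
    n = len(centered)
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(2)]
            for i in range(2)]


def whitening_matrix(cov):
    """Rows are eigenvectors of cov scaled by 1/sqrt(eigenvalue)."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - c)   # rotation that diagonalizes cov
    eigvecs = [(math.cos(theta), math.sin(theta)),
               (-math.sin(theta), math.cos(theta))]
    rows = []
    for e0, e1 in eigvecs:
        eigval = a * e0 * e0 + 2.0 * b * e0 * e1 + c * e1 * e1   # e^T C e
        rows.append([e0 / math.sqrt(eigval), e1 / math.sqrt(eigval)])
    return rows


# Hypothetical two-variable samples, for illustration only.
samples = [[2.0, 1.0], [3.0, 4.0], [5.0, 4.5], [6.0, 8.0], [9.0, 7.5]]

centered = center(samples)
V = whitening_matrix(covariance(centered))
whitened = [[V[0][0] * x0 + V[0][1] * x1, V[1][0] * x0 + V[1][1] * x1]
            for x0, x1 in centered]
white_cov = covariance(whitened)
print([[round(v, 6) for v in row] for row in white_cov])
```

After whitening, the covariance matrix of the data is the identity matrix up to rounding, which is the starting point for estimating the independent components.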
3.6 Support Vector Machine
The SVM method searches for an optimal separating hyperplane (OSH) that separates
two classes. The data points that lie on the margins closest to the OSH are called
"support vectors". The geometric view of the support vectors lying on the margins and of
the hyperplane is shown in Fig. 3.2.
Fig. 3.2: The geometric view of support vectors and hyperplane
The hyperplane can be found by:
$g(x) = w^{T} x + b$ (4)
where x refers to the data points, w is a coefficient vector, and b is the offset from the
origin. In a linear SVM, $g(x) \ge 0$ for the closest point of one class and $g(x) < 0$ for
the closest point belonging to the other class.
The margin between the support vectors is defined by:
$d = \frac{2}{\lVert w \rVert}$ (5)
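The decision function in Eq. (4) and the margin in Eq. (5) can be evaluated directly once w and b are known. The sketch below assumes hypothetical values for w, b, and the data points; in practice, w and b are obtained by training the SVM.

```python
import math


def decision_value(w, b, x):
    """Evaluate g(x) = w^T x + b for one data point."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign +1 when g(x) >= 0 and -1 otherwise."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the two margins, d = 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane parameters and points, for illustration only.
w, b = [3.0, 4.0], -5.0
points = [[3.0, 1.0], [0.0, 0.0], [1.0, 0.5]]

values = [decision_value(w, b, p) for p in points]
classes = [classify(w, b, p) for p in points]
print(values, classes, margin_width(w))
```

A smaller norm of w gives a wider margin, which is why maximizing d is equivalent to minimizing the norm of w.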
The margin d should be maximized for better separation. For this reason, the norm
$\lVert w \rVert$ must be minimized using the Lagrange function:
$L_P(w, b, \alpha) = \frac{1}{2} \lVert w \rVert^{2} - \sum_{i=1}^{n} \alpha_i \{ y_i (w^{T} x_i + b) - 1 \}$ (6)
Here $y_i (w^{T} x_i + b) \ge 1$ for $i = 1, 2, \ldots, n$, $y_i \in \{+1, -1\}$ represents
the class labels, and $\alpha_i$ are the Lagrange multipliers. $L_P$ must be minimized to
compute the optimal w and b.
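The step from the Lagrange function in Eq. (6) to a problem expressed only in the multipliers is not spelled out in the text. The following is a sketch of the standard textbook derivation, using the same symbols; the name $L_D$ for the resulting dual function is an assumption of this sketch. Setting the derivatives of $L_P$ with respect to w and b to zero and substituting the results back gives:

```latex
\begin{aligned}
\frac{\partial L_P}{\partial w} = 0 \;&\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i, \\
\frac{\partial L_P}{\partial b} = 0 \;&\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0, \\
L_D(\alpha) &= \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^{T} x_j,
  \qquad \alpha_i \ge 0 .
\end{aligned}
```

The data points enter $L_D$ only through the inner products $x_i^{T} x_j$, which is what allows a kernel function to be used in their place for non-linear problems.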
In the case of a non-linear classification problem, an SVM with a kernel function is used.
The kernel function maps the data onto a higher-dimensional space in which a hyperplane
separating the classes can be constructed. The new discriminant function used in an SVM
with kernel functions is:
$g(x) = w^{T} \phi(x) + b$ (7)
Here $\phi(x)$ represents the mapping of the input data x onto the kernel space. Therefore,
the optimization problem can be written as:
Maximize $L_D(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$ (8)
where $K(x_i, x_j)$ refers to the kernel function. The kernel functions used are RBF or
polynomial.
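The kernel functions named above can be written in a few lines. The sketch below uses common textbook forms of the linear, polynomial, and RBF kernels; the parameter names and default values (degree, coef0, gamma) are assumptions made for illustration and are not the settings used in this study.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = x^T z."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """K(x, z) = (x^T z + coef0)^degree, with hypothetical default values."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2), with a hypothetical gamma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical two-dimensional points, for illustration only.
x, z = [1.0, 2.0], [3.0, 0.5]
print(linear_kernel(x, z))
print(polynomial_kernel(x, z, degree=2))
print(rbf_kernel(x, z), rbf_kernel(x, x))
```

The RBF kernel equals 1 for identical points and decays towards 0 as the points move apart, so gamma controls how local the resulting decision boundary is.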
3.7 The Performance Measure Indices
The performance of machine learning techniques is measured by several performance
indicators. A confusion matrix of actual and predicted classes is formed, comprising TP
(True Positive), FP (False Positive), TN (True Negative), and FN (False Negative) counts,
from which the measures are evaluated. The meaning of the terms is given below.
TP (True Positive) = Correctly Identified
FP (False Positive) = Incorrectly Identified
TN (True Negative) = Correctly Rejected
FN (False Negative) = Incorrectly Rejected
The most common empirical measure for evaluating the effectiveness of a classifier is
accuracy, which is formulated as
$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$ (9)
Sensitivity is the proportion of actual positives that are correctly identified, and
specificity is the proportion of actual negatives that are correctly identified. These are
calculated by
$\text{Sensitivity} = \frac{TP}{TP + FN}$ (10)
$\text{Specificity} = \frac{TN}{TN + FP}$ (11)
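The three measures can be computed from the confusion-matrix counts as sketched below. The labels and predictions are hypothetical and serve only as an example; following the convention stated in Section 3.3, the benign class ("B") is treated as the positive class.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, TN, and FN for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1      # correctly identified
            else:
                fp += 1      # incorrectly identified
        else:
            if a == positive:
                fn += 1      # incorrectly rejected
            else:
                tn += 1      # correctly rejected
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + fn + tn)


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Hypothetical labels: "B" = benign (positive class), "M" = malignant.
actual = ["B", "B", "B", "M", "M", "B", "M", "B"]
predicted = ["B", "B", "M", "M", "B", "B", "M", "B"]

tp, fp, tn, fn = confusion_counts(actual, predicted, positive="B")
print(tp, fp, tn, fn)
print(accuracy(tp, fp, tn, fn), sensitivity(tp, fn), specificity(tn, fp))
```

Reporting sensitivity and specificity alongside accuracy matters for this data set because the two classes are not equally frequent.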
The diagnostic performance of a test or a classifier in separating diseased cases from
healthy cases will be evaluated using ROC curve analysis.
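An ROC curve is obtained by sweeping the decision threshold over the classifier scores and recording the true positive rate against the false positive rate. The sketch below is a minimal illustration with hypothetical labels and scores; the function names are assumptions, and the area under the curve is computed with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    labels: 1 for the positive class, 0 for the negative class.
    scores: higher values mean "more likely positive".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle tied scores together so they share a single ROC point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and classifier scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
area = auc(points)
print(points)
print(round(area, 4))
```

An area of 1.0 corresponds to perfect separation of the two classes, while 0.5 corresponds to a classifier that ranks cases no better than chance.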
References
Abdul Halim, A., Andrew, A., Mohd Yasin, M., Abd Rahman, M., Jusoh, M., and
Veeraperumal, V. (2021). Existing and Emerging Breast Cancer Detection
Technologies and Its Challenges: A Review. Applied Sciences, 11(22), 10753.
https://doi.org/10.3390/app112210753
Ali, E., and Feng, W. (2016). Breast Cancer Classification using Support Vector Machine
and Neural Network. International Journal of Science and Research (IJSR), 5(3), 1-
6. https://doi.org/10.21275/v5i3.nov161719
Alzubi J., Nayyar A., and Kumar A. (2018). Machine Learning from Theory to
Algorithms: An Overview. Journal of Physics: Conference Series, 1142, 012012.
https://doi.org/10.1088/1742-6596/1142/1/012012
Atif, M., Siddiqui, Jamshed, J., Talib, F., and Sohail, S. (2020). Applications Of Machine
Learning Techniques for Disease Diagnosis: A Review. Journal of Critical Reviews.
7, 2652-2661. https://doi.org/10.31838/jcr.07.17.330.
Amrane M., Oukid S., Gagaoua I., and Ensari T. (2018). Breast Cancer Classification
Using Machine Learning. Conference: 2018 Electric Electronics, Computer Science,
Biomedical Engineerings' Meeting (EBBT).
Anand, A. (2022). Top 6 Machine Learning Techniques | Analytics Steps.
Analyticssteps.com. Retrieved 9 June 2022, from
https://www.analyticssteps.com/blogs/top-6-machine-learning-techniques.
Barrios, C. (2022). Global challenges in breast cancer detection and treatment. The
Breast, 62, S3-S6. https://doi.org/10.1016/j.breast.2022.02.003
Bellaachia, A., and Guven, E. (2006). Predicting breast cancer survivability using data
mining techniques. Age. 58(13), 10-110.
Bi Q., Goodman K., Kaminsky J., and Lessler J., (2019). What is Machine Learning? A
Primer for the Epidemiologist. American Journal of Epidemiology, 188(12), 2222-
2239.
Butryn, B., Chomiak-Orsa, I., Hauke, K., Pondel, M., and Siennicka, A. (2021).
Application of Machine Learning in medical data analysis illustrated with an
example of association rules, Procedia Computer Science, 192, 3134-3143
CDC, (2022). Retrieved 9 June 2022, from
https://www.cdc.gov/cancer/breast/basic_info/what-is-breast-cancer.htm.
Duda R.O, Hart P.E, Stork D.G., (2012). Pattern Classification. 2nd ed. Hoboken, NJ: John
Wiley & Sons, Inc.; 2012:517.
Ebrahim, E., and Wu, Z. (2016). Breast Cancer Classification using Support Vector
Machine and Neural Network. International Journal of Science and Research
(IJSR), 5(3), 1-6. https://doi.org/10.21275/v5i3.nov161719
Hasan, S., Sagheer, A.M., and Veisi, H., (2021). Breast Cancer Classification Using
Machine Learning Techniques: A Review. Turkish Journal of Computer and
Mathematics Education (TURCOMAT), 12, 1970-1979.
Islam, M. J., Wu, Q. M. J., Ahmadi, M., & Sid-Ahmed, M. A. (2007). Investigating the
Performance of Naive- Bayes Classifiers and K- Nearest Neighbor Classifiers. 2007
International Conference on Convergence Information Technology (ICCIT 2007).
Janiesch, C., Zschech, P., and Heinrich, K. (2021). Machine learning and deep learning.
Electron Markets. 31, 685–695
Kokilavani, T., and Beena L. A, (2021). Machine Learning–Based Case Studies for
Healthcare Analytics. Machine learning and analytics in healthcare systems 1, 18.
https://doi.org/10.1201/9781003185246
Kumar K.S., and Rajendran A., Machine Learning Classifiers in Health Care, 1, 22.
Längkvist M, Karlsson L, and Loutfi A. (2014). A review of unsupervised feature learning
and deep learning for time-series modeling. Pattern Recognition Letters, 42(1):11-24
Leung, K. M. (2007). Naive bayesian classifier. Polytechnic University Department of
Computer Science/Finance and Risk Engineering, 123-156.
Medjahed, S., Saadi T., and Benyettou A. (2013). “Breast Cancer Diagnosis by using k-
Nearest Neighbor with Different Distances and Classification Rules,” International
Journal of Computer Applications, 62(1), 0975 – 8887
Qifang Bi, Katherine E Goodman, Joshua Kaminsky, Justin Lessler, what is Machine
Learning? A Primer for the Epidemiologist, American Journal of Epidemiology,
188(12), 2222–2239
Panahiazar, M., Chen, N., Lituiev, D., and Hadley, D. (2021). Empowering study of breast
cancer data with application of artificial intelligence technology: promises,
challenges, and use cases. Clinical & Experimental Metastasis, 39(1), 249-254.
https://doi.org/10.1007/s10585-021-10125-8
Rani A., Kumar N., Kumar J., Kumar J., and Sinha N. K. (2022). Chapter 6 - Machine
learning for soil moisture assessment. In Cognitive Data Science in Sustainable
Computing, Deep Learning for Sustainable Agriculture, Academic Press, 143-168,
Sahu, B., and Panigrahi, A. (2020). Efficient Role of Machine Learning Classifiers in the
Prediction and Detection of Breast Cancer. 5th International Conference on Next
Generation Computing Technologies (NGCT-2019).
Saritas M. M, and Yasar A. (2019). “Performance Analysis of ANN and Naive Bayes
Classification Algorithm for Data Classification”, International
Journal of Intelligent Systems and Applications in Engineering (IJISAE). 7(2), 88–
91.
Siddiqui, M.K., Morales-Menendez, R., Huang, X., and Hussain N., (2020). A Review of
Epileptic Seizure Detection using Machine Learning Classifiers. Brain Infection. 7,
(5).
Sidey-Gibbons, C., Pfob, A., Asaad, M., Boukovalas, S., Lin, Y. L., Selber, J. C., Butler C.
E., and Offodile, A. C. (2021). Development of machine learning algorithms for the
prediction of financial toxicity in localized breast cancer following surgical
treatment. JCO clinical cancer informatics, 5, 338-347.
Sung, H., Ferlay, J., Siegel, R., Laversanne, M., Soerjomataram, I., Jemal, A., and Bray, F.
(2021). Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and
Mortality Worldwide for 36 Cancers in 185 Countries. CA: A Cancer Journal for
Clinicians, 71(3), 209-249. https://doi.org/10.3322/caac.21660
Tahmooresi M., Afshar A., Rad B., Nowshath K. B., and Bamiah M. A. (2018). Early
Detection of Breast Cancer Using Machine Learning Techniques. Journal of
Telecommunication, Electronic, and Computer Engineering. 10, (3-2).
Tarca A.L, Carey V.J, Chen X, Romero R, and Draghici A., (2007). Machine Learning and
its Applications to Biology. PLoS Computational Biology. 3(6): e16.
Jijo B. T., and Abdulazeez A. M., (2021). Classification Based on Decision Tree
Algorithm for Machine Learning. Journal of Applied Science and Technology
Trends, 2(1), 20 – 28.
Toh, C., and Brody, J. P. (2020). Applications of Machine Learning in Healthcare. In
(Ed.), Smart Manufacturing - When Artificial Intelligence Meets the Internet of
Things. IntechOpen.
WHO- Cancer Report, (2020). Retrieved 10 June 2022, from https://www.who.int/news-
room/fact-sheets/detail/cancer.
Zhu X., Goldberg A. B., (2009). Introduction to Semi-Supervised Learning. San Rafael,
CA: Morgan and Claypool Publishers; 1(11).