DEVELOPMENT OF MACHINE LEARNING CLASSIFIERS FOR BREAST CANCER

CHAPTER ONE

INTRODUCTION

1.1    Background of the Study

Cancer is known to be a disease that causes the uncontrollable growth of cells in any

part of the body. These malformed cells called tumor cells or malignant cells can penetrate

the tissues that are present in the human body. The continuous splitting of cells that have

been developed over time could result in the formation of a lump, microcalcifications, or

architectural distortions which are usually referred to as tumors (Ebrahim and Wu, 2016).

These tumors can be cancerous or not cancerous (benign).  Breast cancer is known to be a

malignant tumor that grows in the cells of breast tissue. Breast cancer has been identified to

be one of the leading causes of death among women (Barrios, 2022). There exist diverse

types of breast cancer; however, the type of breast cancer depends on the particular cell

that is present in the breast which develops into cancer. 

Most breast cancers begin in the ducts or lobules. There exists the possibility of breast

cancer spreading outside the breast through blood vessels and lymph vessels. When breast

cancer spreads to various body parts, it is said to have metastasized (CDC, 2022). Breast

cancer can easily be diagnosed by the radiologist only if the abnormalities in the breast are

detected early. Accurate diagnosis and early detection have been shown to decrease the

mortality rate arising from breast cancer. Human diagnosis is prone to error, and its

accuracy depends on the expertise of the doctor. Effective tools that

can be used to diagnose breasts that have been affected by cancer will aid medical

practitioners in accurately diagnosing and aptly treating the patients. 

Diagnosis made with the help of machine learning has been reported to be more accurate

(91.1%) than diagnosis made by an experienced physician (79.97%). There

is a gradual increase in the usage of classifier systems for medical diagnosis (Ali and Feng,

2016). The evaluations and decisions of expert physicians remain important factors.

However, classifier algorithms based on artificial intelligence also help experts, especially by

minimizing possible errors which inexperienced practitioners can make.

Artificial intelligence (AI) technologies are significant in usage as they provide an

improved response that is sensitive to time while reducing the rates of error for individual

patients. The increase in the generation of clinical notes, diagnostic images, and reports is

being saved in medical records that are electronically generated (Panahiazar et al., 2021).

This heterogeneous data poses highly complex challenges for data analytics and reusability,

and has prompted modern methods to store, manage, process, and reuse big data.

This points to an urgent need to advance new and

extensible AI frameworks and generic processes that allow healthcare providers to

access knowledge for individual patients, yielding better decisions and outcomes.

Machine learning is a discipline of study that aims to train machines to do cognitive

tasks in the same way that humans do. While they have far fewer cognitive abilities than

ordinary people, they are capable of quickly processing data of large amounts and extracting

significant commercial insights. Machine learning algorithms employ a computational

approach to “learn” information directly from data rather than depending on a model

that is based on a predetermined equation. As the number of samples available for learning

grows, the algorithms adapt and improve their performance. Deep learning is an especially complex part

of machine learning (Janiesch et al., 2021).

As a result, a practitioner in machine learning may come across a variety of forms of

learning, ranging from entire fields of research to individual methodologies. Machine

learning employs the use of two methods. The first method is supervised learning which

trains a model to predict future outputs based on the input and output data that has been

known. The second method, unsupervised learning, finds hidden patterns or

internal structures in its input data. Supervised learning makes use of regression and

classification approaches to advance the models in machine learning. The classification

approach classifies the input data and predicts discrete responses, while the regression

approach predicts continuous responses. If the data range or response nature that is being

worked upon is a real number, like time until there is a failure in the piece of equipment or

temperature, then it is more suitable to use regression techniques. Unsupervised learning

detects hidden patterns or internal structures in unlabeled data. It is used to

draw inferences from datasets containing input data without labeled responses (Anand, 2022).

Support Vector Machine (SVM) is an approach to supervised pattern classification

which has been effectively implemented in a wide range of pattern recognition problems and

it is likewise a training algorithm for learning regression and classification rules from data.

SVM is particularly suited to accurate and efficient work in high-dimensional feature

spaces. In addition, SVM is firmly grounded in mathematics and yields

simple yet very powerful algorithms (Ali and Feng, 2016).

The standard SVM algorithm builds a binary classifier. A simple way to model a

binary classifier is to construct a hyperplane separating class members from non-members in

the input space. SVM also finds a nonlinear decision function in the input space by mapping

the data into a higher-dimensional feature space and separating it using a maximum-margin

hyperplane. The algorithm automatically identifies a subset

of informative points known as support vectors and uses them to represent the

separating hyperplane as a sparse linear combination of these points. Finally, SVM solves

a simple convex optimization problem.
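
As a sketch of the convex optimization problem referred to above, the soft-margin formulation commonly given in the general SVM literature can be written as follows. The notation is an assumption introduced here for illustration and is not taken from this study: x_i are the training inputs, y_i in {-1, +1} are the class labels, w and b define the hyperplane, xi_i are slack variables, and C is a penalty parameter.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n.
```

In this formulation the margin width is 2/||w||, so minimizing ||w|| maximizes the margin, and the support vectors are the training points that lie on or inside the margin.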

1.2    Statement of Problem

Breast cancer is a crucial health priority across the globe. As it becomes a universal

concern and the burden of the disease increases globally, current estimates

indicate that in the coming decades, much of the incidence and mortality related to

breast cancer will be seen in underserved populations. The fragile

and ill-equipped healthcare systems in low- and middle-income countries (LMICs) must take up

this challenge and provide solutions despite their limited resources.

Significant disparities are evident in the stage at presentation, because detection of

early-stage disease is compromised, resulting in the severe

consequences associated with late diagnosis.

There are challenges addressing breast cancer at a global level indicating critical

disparities. Recent information from the World Health Organization (WHO) indicates that

although 70% of countries have established cancer guidelines and 62% report screening

programs, at the same time, 40% report important management and treatment access

restrictions and less than half have palliative care plans (WHO- Cancer Report, 2020). This is

a complicated problem involving multiple stakeholders and several aspects to consider.

Focusing on the most important, we could argue that access to healthcare, late-stage

diagnosis, and lack of timely and appropriate diagnostic and treatment procedures are

probably at the top of the list. These issues are all intimately related (Sung et al.,

2021).

Diagnosis of this deadly disease is also one of the major challenges medical

practitioners face, especially in developing countries such as Nigeria. Yet

correct and prompt diagnosis of disease is essential in the medical field. Limited

human capability decreases the rate of correct diagnosis. Algorithms of

machine learning such as support vector machine (SVM) among other techniques can help

physicians to diagnose more correctly, because SVM has been identified as a versatile and

flexible machine learning model, capable of linear or nonlinear classification, regression,

and even outlier detection. SVM is particularly well suited to the classification of

complex but small or medium-sized datasets. 

1.3    Justification 

The constant spread of cancerous cells in the breast needs to be properly taken care of,

in the bid to lessen the death rates in women. Measures should likewise be employed that

would aid the detection of the early stage growth of the cancerous cells to avoid the worse

conditions that emanate from late cancerous cell growth. For a precise diagnosis, this

research is purposed to move from the old approach to solving cancer problems to an

advanced approach for effective treatment.  

The approach adopted here uses a machine learning algorithm known as a Support

Vector Machine (SVM), which solves problems of pattern recognition, regression, and

classification. The SVM workflow entails collecting data on breast cells labeled as benign

or malignant, pre-processing the data, extracting the features of the

data, building a predictive model, and validating the model through the ROC curve.

1.4    Aim and Objectives

The major aim of this research is to classify breast cancer by using a support vector machine

(SVM).

The specific objectives of this research are to:

(i) select a diagnostic data set for breast cancer.

(ii) compute a two-dimensional independent component analysis of the selected data set.

(iii) train and test a support vector machine (SVM) classifier.

(iv) evaluate the performance of breast cancer classifiers using ROC curve analysis and

empirical measures.
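
The ROC curve analysis named in objective (iv) can be illustrated with a minimal sketch in plain Python. The labels and classifier scores below are invented for illustration only and are not results of this study; the function names `roc_points` and `auc` are likewise hypothetical.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 for malignant (positive), 0 for benign (negative).
    scores: classifier scores; higher means "more likely malignant".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a tied score group.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical data: six cases with made-up scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
area = auc(curve)  # 8/9 for this toy data
```

An area of 1.0 corresponds to a classifier that ranks every malignant case above every benign case, while 0.5 corresponds to random ranking.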

1.5    Scope of the Study

This study is limited to:

i.  The usage of machine learning techniques in the classification of cancer of the breast

ii. Support vector machine as the only machine learning technique employed

iii. The performance measures will be evaluated using ROC analysis

iv. The empirical measures that will be used to evaluate the effectiveness of the

classifiers will only include sensitivity, accuracy, and specificity
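
The three empirical measures listed in item iv follow directly from the counts of true and false positives and negatives. The sketch below uses their standard definitions; the label vectors are invented for illustration, and the function name `empirical_measures` is hypothetical.

```python
def empirical_measures(y_true, y_pred):
    """Compute sensitivity, specificity, and accuracy for binary labels.

    Labels use 1 for malignant (positive) and 0 for benign (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, missed
    return {
        "sensitivity": tp / (tp + fn),  # share of malignant cases detected
        "specificity": tn / (tn + fp),  # share of benign cases cleared
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical ground truth and predictions for ten cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
measures = empirical_measures(y_true, y_pred)
```

For this toy data one malignant case is missed and one benign case is flagged, which gives a sensitivity of 0.75, a specificity of 5/6, and an accuracy of 0.8.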

CHAPTER TWO

LITERATURE REVIEW

2.1 Definition of Machine Learning

Machine Learning is an interdisciplinary field of study that evaluates algorithms that

facilitate pattern recognition, classification, and prediction based on models derived from

existing data. It applies to diverse computing areas that call for algorithms with

high-performance output, such as medical diagnosis, internet stock trading, detection of

online fraud, filtering of email spam, and character detection and recognition, to mention a few

(Alzubi et al., 2018). Moreover, these are a set of tools utilized for creating and evaluating

algorithms that facilitate prediction, pattern recognition, and classification. The learning process is

based on four steps: collection of data, selection of a model, training of the model, and

testing of the model (Amrane et al., 2018).

In broad terms, machine learning has two aims. The first

is to produce a classifier that can be implemented in hardware, so that

classification is highly automated and requires little human input. The second

is to use automatic algorithm construction methods that

minimize the chance that human biases affect the performance of the algorithm (Tarca et al.,

2007). This branch of computer science enables computers to learn without being explicitly

programmed. It has the potential to transform the epidemiological sciences and offers a new

means of solving problems for epidemiologists seeking to integrate machine learning

techniques into research work. Some machine learning concepts lack statistical or

epidemiologic parallels, and terminology sometimes differs even where the underlying concepts are the

same.

Researchers from different backgrounds of study have employed machine learning on

biological data and healthcare systems for improved outcomes. This technology plays a

crucial role in resolving healthcare issues and other scientific problems. Image recognition,

medical diagnosis, extraction, and statistical arbitrage, amongst other areas of human life, are

where machine learning can be applied, and this application can be carried out by machine

learning classifiers like Artificial Neural Networks, decision trees, Support Vector Machine,

and random forest (Siddiqui et al., 2020).

Currently, machine learning plays a vital role in the field of healthcare analysis by

helping in early diagnosis and analysis which improves medical image processing and

reduces processing time, especially in healthcare informatics. Before the recent advancement

and use of machine learning in healthcare systems, it had been challenging to achieve

accurate results (Kumar et al., 2021). The healthcare industry generates clinical information

in the form of big data, which is stored as electronic health records for effective

healthcare management.

However, the medical industry needs access to appropriate technology for effective

data analysis to aid in detecting diseases and recommending treatment and clinical services

to doctors. Thus, algorithms such as decision tree, naïve Bayes, and random forest

help classify a patient dataset based on a particular disease, predicting the presence of

diseases such as cancer, heart disease, and neurodevelopmental disorders, and providing more

information about a patient’s health (Kokilavani et al., 2021).

Fig. 2.1: Multi-disciplinary Machine Learning

Source: Alzubi et al., (2018).

2.2 Machine Learning Classifiers

The term classifier, also known as estimator, is generally used in machine learning to refer to

algorithms that perform a prediction or classification of interest. In healthcare management, a

machine learning classifier gives decisions from a patient’s data which further reduces the

work of medical practitioners, with treatment being carried out within a shorter time. Its

algorithms are effective in analyzing both structured and unstructured data and are useful for

classifying the patient dataset based on a particular disease (Qifang et al., 2019).

In machine learning, a classifier is an algorithm that automatically assigns data points

to a range of categories or classes. Within the classifier category, there are two main models:

supervised and unsupervised. In the supervised model, classifiers are trained on labeled data

to distinguish between classes. This training allows them to recognize patterns and

ultimately to classify new, unlabeled data autonomously. Unsupervised algorithms use pattern

recognition to classify unlabeled datasets, progressively becoming more accurate (Indeed,

2022).

Machine learning is broadly classified by the way the computer learns, which may be

either supervised or unsupervised. A third variant, semi-supervised learning, fits models to

both labeled and unlabeled data. It supplements limited labeled data with an

abundance of unlabeled data to improve model performance; studies indicate that

unlabeled data can help build a better classifier, but appropriate model selection is critical (Zhu

et al., 2009). Supervised learning is based on a data training sample from the data source with

the correct classification already assigned. On the other hand, unsupervised learning is a

machine learning discipline that discovers patterns in large data sets or classifies the data into

several categories without being trained explicitly. This algorithm identifies natural

relationships within data without reference to any outcome (Duda et al., 2012).

Generally, supervised learning uses training data known as labeled data.

This training data has one or more inputs with a labeled output. Models use these labeled

outputs to assess themselves during training, with the aim of improving prediction on new

data. Supervised learning models usually focus on regression and

classification. In the medical field, classification problems are common: diagnosing

patients requires doctors in clinics and other healthcare environments to classify illnesses or

diseases based on symptoms. Algorithms used in this kind of learning include decision trees, random

forests, Naïve Bayes models, support vector machines, and linear and logistic regression;

neural networks can also be trained with supervised learning (Toh et al., 2020).

Unsupervised learning uses unlabeled data to detect patterns within

the data. These learning algorithms excel at clustering data into relevant groups to detect

latent characteristics; however, they are more computationally intensive and require a large

amount of data to operate. K-means clustering and some deep learning methods are well-known and

common examples, although deep learning is also used in a supervised approach. Since no human-provided labels are used, the

models are regarded as unsupervised (Längkvist et al., 2014).

2.3 Algorithms Classifications in Machine Learning

Although machine learning can be classified into two main types which are

supervised learning and unsupervised learning, there are a wide variety of classification

algorithms used in machine learning and each one uses a different mechanism to analyze

data. Five common types of classification algorithms in machine learning

are Naïve Bayes, Logistic Regression, K-Nearest Neighbors, Decision Tree, and Support

Vector Machines.

Fig. 2.2: Machine Learning Approaches

Source: Hasan et al., (2021).

2.3.1 Naive Bayes classifier

The naïve Bayes classifier is based on Bayes’ theorem and assumes that the effect of

an attribute value on a given class is independent of the values of the other attributes. This

assumption, called class conditional independence, is made to simplify the

computation involved, and for this reason the classifier is considered naïve. These classifiers are statistical

classifiers that predict class membership probabilities (Leung, 2007). A naïve Bayes

algorithm calculates the probability associated with each possible class conditional on a set of

covariates, which is proportional to the product of the prior probability and the likelihood function. The

classifier then selects the class with the highest probability as the “correct” class.
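
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the two binary features (read here as "irregular shape" and "large size") and the eight training rows are invented, Laplace smoothing is added as an assumption to avoid zero probabilities, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v]: number of class-c rows whose feature j equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(rows, labels):
        for j, value in enumerate(row):
            feature_counts[c][j][value] += 1
    return class_counts, feature_counts


def predict(row, class_counts, feature_counts, n_values=2):
    """Return the most probable class and the unnormalized class scores."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior probability of class c
        for j, value in enumerate(row):
            # Likelihood of this feature value given class c (Laplace-smoothed);
            # multiplying the terms is the class conditional independence step.
            score *= (feature_counts[c][j][value] + 1) / (n_c + n_values)
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical training data: (irregular_shape, large_size) per case.
rows = [(1, 1), (1, 0), (1, 1), (0, 1), (0, 0), (0, 0), (1, 0), (0, 0)]
labels = ["malignant"] * 4 + ["benign"] * 4
class_counts, feature_counts = train_naive_bayes(rows, labels)
predicted, scores = predict((1, 1), class_counts, feature_counts)
```

The scores are the products of prior and likelihood for each class; they are not normalized, because only their ranking is needed to select the class.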

Naive Bayes classifiers use probability to predict whether an input will fit into a

certain category. The Naive Bayes algorithm family includes a range of different classifiers

based on a theorem of probability. These classifiers can determine the probability of an input

fitting into one or more categories. In multiple category scenarios, the algorithm reviews the

probability that a data point fits into each classification. After comparing the probability of a

match in each category, it outputs the category that is most likely to match the given text

(Saritas et al., 2019). Many companies use this type of algorithm to assign tags to text

segments like email subject lines, customer comments, and articles.

2.3.2 Decision tree

Decision tree classifiers are regarded as one of the most well-known

methods of representing classifiers for data classification. Researchers from various

fields, such as machine learning, pattern recognition, and statistics, have considered the problem of

constructing a decision tree from available data. In various fields

such as medical disease analysis, text classification, user smartphone classification, images,

and many more, the use of decision tree classifiers has been proposed in many ways

(Jijo et al., 2021).

A decision tree is a classification algorithm that uses a process of division to split data

into increasingly specific categories. It's called a decision tree because the classification

process resembles a tree's branches when represented graphically. The algorithm works on a

supervised model and requires high-quality data to produce good results. Since the primary

goal of a decision tree is to make increasingly specific distinctions, it must continuously learn

new classification rules. It learns these rules by applying if-then logic to training data. The

algorithm continues the classification process until it reaches a designated stopping

condition (Leung, 2007).
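
A single if-then rule of the kind a decision tree learns can be sketched as follows. The sketch assumes Gini impurity as the splitting criterion, which is one common choice and is not stated in the sources cited above; the tumor sizes and labels are invented, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 = benign, 1 = malignant)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Find the threshold t for the rule 'if value <= t' with the lowest impurity.

    Returns (threshold, weighted_gini_after_split).
    """
    best = (None, float("inf"))
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical tumor sizes (cm) and labels.
sizes = [1.0, 1.5, 2.0, 3.5, 4.0, 4.5]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(sizes, labels)
```

A full tree would apply the same search recursively to each resulting subset until a stopping condition, such as a maximum depth or a pure subset, is met.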

2.3.3 Artificial neural networks

An artificial neural network is a well-known machine learning technique influenced by the

biological neural network in the human brain. These are computing frameworks made up of

many individual algorithms. Their mechanism of action mimics how human brains work, and

includes a collection of artificial neurons that transmit signals. This makes artificial neural

networks capable of solving extremely complex problems that involve multiple layers.

Because of their complexity, it can be challenging to train and adjust ANNs, and it often

requires large amounts of training data.

However, a fully trained ANN can perform tasks that would be impossible for single

algorithms. There are many types of artificial neural networks, including feedforward neural

network, feedback neural network, recurrent neural network, classification-prediction

network, radial basis function network, dynamic neural network, and modular neural

network (Saritas et al., 2019). An artificial neural network (ANN) is a data processing

algorithm in which the computations simulate a biological neural network. An ANN model

has three layers: an input layer, a hidden layer, and an output layer. Links connect nodes in

different layers.

Fig. 2.3: Decision Tree

Source: Jijo et al., (2021).

Nodes in the input layer represent predictors, and nodes in the output layer represent

outcomes (Lou et al., 2020). A common application of neural networks is the multilayer back

propagation learning algorithm, which models nonlinear systems.

2.3.4 K-nearest neighbor

K-nearest neighbor (KNN) is a supervised lazy learner algorithm used in machine

learning. This means that it stores the training data that supervisors present and compares it to

other data to make predictions. While the training period for these algorithms is often shorter

than for "eager learners," they're often slower to make predictions. After storing its training

data, a KNN algorithm compares it with test data and measures the degree of similarity

between them. It then stores all instances that correspond with the training data. Next, the

algorithm attempts to predict the likelihood that future data will correspond to the dataset it

compiled. While this algorithm is common in classification, many professionals also use it to

complete regression tasks (Islam et al., 2007). This algorithm has been used in different

applications such as healthcare, finance, image and video recognition, and handwriting recognition.

The algorithm first trains a model with labeled data with different classes and then tests the

model using new points. The algorithm calculates the nearest known neighbor points to the

new data points using one of the approaches such as Manhattan distance, Hamming distance,

Minkowski distance, and Euclidean distance. The new point is assigned to the class

most common among its k nearest neighbors, and the algorithm repeats the same procedure for all new points.

Combining K-NN with SVM can improve the efficiency of diagnosis.
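
The neighbor search and majority vote can be sketched in plain Python as follows. The two-feature training points and their labels are invented for illustration, and the function names are hypothetical; only the Euclidean and Manhattan distances are shown.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training data.
train_points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
                (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
train_labels = ["benign"] * 3 + ["malignant"] * 3

near_benign = knn_predict(train_points, train_labels, (2.0, 2.0))
near_malignant = knn_predict(train_points, train_labels, (5.0, 4.0),
                             distance=manhattan)
```

Because no model is fitted in advance, all of the work happens at prediction time, which is why the method is described as a lazy learner.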

2.3.5 Support vector machine

Support Vector Machines are a set of supervised learning methods used for

classification and regression problems. They work by finding

hyperplanes within a data distribution, which can be visualized as a line separating two

different classes of data. There are often many hyperplanes capable of separating the data,

and the algorithm will select the optimum line of separation. In the SVM model, the optimum

hyperplane is the dividing line that offers the greatest margin between the different classes.

SVMs can work in higher dimensions if they are unable to find an ideal hyperplane to

separate the data in two dimensions. This makes them extremely effective for creating

classifications from complicated data distributions. SVMs can remain accurate even when the data inputs

are complex, making them excellent machine learning tools (Qifang et

al., 2019).

SVMs are binary classifier algorithms that seek to create a linear boundary that

separates classes in a high-dimensional feature space. To successfully create linear separators

(known as hyperplanes) within complex nonlinear data sets, SVMs use a technique known as

a kernel trick. The kernel trick allows the algorithm to transform the input data to straighten

out the complexity and to allow a linear hyperplane to separate the classes (e.g., will a patient

experience financial toxicity or not) (Rani et al., 2022).
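
The idea behind the kernel trick can be illustrated with a toy one-dimensional example. The data, the feature map `phi`, and the hand-picked hyperplane below are assumptions chosen for illustration; a trained SVM would learn the hyperplane from the data rather than have it fixed by hand.

```python
def phi(x):
    """Explicit feature map from one dimension to two: x -> (x, x^2)."""
    return (x, x * x)


def kernel(x, z):
    """Kernel equal to phi(x) . phi(z), computed without building phi."""
    return x * z + (x * z) ** 2


def separable_by_threshold(xs, ys):
    """True if a single threshold on x splits the two classes."""
    ordered = [y for _, y in sorted(zip(xs, ys))]
    changes = sum(a != b for a, b in zip(ordered, ordered[1:]))
    return changes <= 1


# Hypothetical data: the outer points are class +1, the inner points class -1.
xs = [-3.0, -2.0, -0.5, 0.5, 2.0, 3.0]
ys = [1, 1, -1, -1, 1, 1]

# In the original space no single threshold separates the classes.
linear_in_input_space = separable_by_threshold(xs, ys)

# In the mapped space the hand-picked hyperplane x^2 - 2 = 0 separates them.
w, b = (0.0, 1.0), -2.0
predictions = [1 if w[0] * phi(x)[0] + w[1] * phi(x)[1] + b > 0 else -1
               for x in xs]
```

The function `kernel` returns the same value as the dot product of the mapped points, which is what allows an SVM to work in the higher-dimensional space without computing the mapping explicitly.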

2.4 Classification of Breast Cancer

One of the significant and common diseases that causes death in women is breast

cancer. This cancer is a complex multifactorial disease encompassing a great variety of

entities that show considerable variation in clinical, morphological, and molecular attributes.

The causes of this disease are multifaceted and are transmitted through genes, hormones,

reproductive factors, and history of family or hereditary factors, leading to millions of death

cases in women every year. However, about half of those affected by this disease die due

to late detection by doctors, according to the World Health Organization (Amrane et al., 2018).

It is essential to prevent the progression of this disease in women through early diagnosis and

treatment to reduce morbidity rates.

Fig. 2.4: Components of SVM

Source: (Rani et al., 2022).

This diagnosis is carried out by detection and regular checkups with the use of

ultrasound imaging and mammography followed by breast tissue biopsy once the check-up

shows the possibility of malignant tissue growth. Breast cancers are common among women

and are very rare in men. They are uncommon in men because breast

tissue is less developed in males. The Centers for Disease Control and Prevention reports that about one

out of every hundred breast cancers diagnosed is found in a man. Most men do not check

for signs of lumps; as a result, male breast cancers are diagnosed at a much later stage

(Nesamani et al., 2021).

Because of the heterogeneity of breast cancer, it cannot be viewed as a single clinicopathological

entity and must be dissected into more homogeneous entities; thus it needs to be classified.

Generally, a suitable classification of any disease must be scientifically sound, clinically

useful, easily applicable, and widely reproducible, but unfortunately, the perfect classification

of breast cancer has not been expressly stated in years of research to date. Breast cancer is

caused by a mutation in one cell which can be shut down by the body system or cause the

division of cells.

Fig. 2.5: Types of Cancer. Source: (Tahmooresi et al., 2018).

Moreover, malignant tumors expand into neighboring cells and can

extend to other parts of the body, whereas benign masses cannot expand to other tissues. It may be

difficult to detect breast cancer at an early stage due to no symptoms in the breast, but after

tests and diagnosis, malignant and benign tumors will be easily differentiated (Amrane et al.,

2018). The interconnection between breast cancer and machine learning has existed for

decades to classify tumors and other related malignancies.

2.5 Application of Machine Learning Technology for Diagnostic Analysis

The application of machine learning technology in various capacities to solve life

problems has changed the way problems are detected and solved. In healthcare management,

machine learning has been employed for different varieties of healthcare data. Some well-

known Machine learning techniques which include Neural Network (NN), K-Nearest

Neighbor (K-NN), Decision Tree (DT), and classical Support Vector Machine (SVM) are

used for diagnosing diseases such as cancer, cardiovascular disease, diabetes, hepatitis, and

other related diseases (Mohammad et al., 2020).

The use of machine learning in healthcare management examines many varying data points

and gives accurate results. This learning is a significant field of Artificial Intelligence and it

continues to evolve. Some valuable machine learning applications in medicine and

healthcare are in the detection and treatment of disease, discovery, and manufacturing of

drugs, medical imaging diagnosis, medicine, health records, behavioral modification through

machine learning, clinical research, and trials, amongst others (Butryn et al., 2021).

Due to the regular increase in the fatality rate of diseases like diabetes, cancer, heart

disease, and hepatitis, and the large volume of patient data used for getting valuable

information, scientists utilize data mining to tackle real-world medical issues and treat

diseases. These days Data mining technology is employed by scientists in assessing

numerous diseases like cancer, diabetes, cardiovascular disease, and hepatitis, and many

machine learning techniques when implemented showed accuracy in results (Obaid et al.,

2018).

Machine learning is a key technology in supporting management processes in all areas

of life such as the healthcare system by collating information resources about diseases and

researching for diagnostic tools in an adequate field of application for artificial intelligence

tools. Machine learning and neural networks are technologies that can be used for data

analysis of patients through categorization to create advanced predictive models that

determine the probability of a patient suffering from a specific disease. This is of particular

importance in the implementation of diagnostic processes, which, through the support of

artificial intelligence solutions, can become significantly more effective (Butryn et al., 2021).

Cancer is a class of diseases characterized by unusual growth of cells that tends to

invade and spread to different organs of the body. Medical specialists categorize cancer into

different types based on where it develops in the body such as sarcomas, leukemias,

carcinomas, and lymphomas which are four major types of cancer. Amongst the female

gender, breast cancer, colorectal cancer, lung cancer, and cervical cancer are the most

common types, while prostate cancer, stomach, lung, and colorectal cancer are predominant

among the male gender. Many cancer deaths are associated with smoking,

obesity, poor nutrition, lack of physical exercise, and excessive consumption of alcohol, while

some cancers are hereditary. Machine learning technology supports the detection of cancer,

specifically through medical imaging and biopsy, a procedure

used in removing tissue pieces or cell samples from the body for laboratory tests (Mohammad

et al., 2020).

Fig. 2.6: The main Application of Machine Learning in Medicine

Source: (Obaid et al., 2018).

2.6 Role of Machine Learning in Breast Cancer Classification

Machine learning plays a crucial role in a wide range of critical applications. In

the healthcare management system, it helps identify biomarker genes used to assess and

diagnose diseases. Breast cancer has become a significant concern in the medical field and

is a recurrent cause of death globally. According to statistics from India over

the years, over a million cases of breast cancer have been recorded, with thousands of women

suffering from it and most likely losing their lives due to late diagnosis of the disease. Early

diagnosis of this disease will lower mortality and save the lives of breast cancer patients.

To detect and identify this disease, mass spectrometry is employed and combined with various tools to boost the accuracy of pathological analysis using biomarkers. Once a diagnostic sample is collected from the patient, it is analyzed under specific biological conditions and pathological procedures. In distinguishing cancerous from non-cancerous gene conditions, biomarkers are the distinctive determinant features. Machine learning is employed in the detection of breast cancer, and classification is carried out by applying different computing techniques to the dataset to determine the probability of cancer (Bellaachia et al., 2006). Machine learning plays a vital role in the diagnosis and treatment of breast cancer, with the Support Vector Machine reported to be the most efficient among the techniques: its classification performance yields better accuracy and sensitivity, and it has therefore become widely employed as a diagnostic instrument for accurate prediction and detection of breast cancer (Sahu et al., 2020).

2.7 Support Vector Machine for Breast Cancer Classification

Support Vector Machine (SVM) is a supervised learning algorithm that is trained on collated data for classification and regression. It has been used by various researchers to solve both kinds of problems, with classification being the more common use. Each data point is represented in an n-dimensional space, where n is the number of features and each feature corresponds to one coordinate. The algorithm considers candidate separating boundaries, called hyperplanes, in this space and selects the one that has the maximum margin. The margin is the distance separating the hyperplane from the nearest data points of each class. Various studies have used this algorithm to classify breast cancer tumors and achieved promising results. These studies compared different algorithms (i.e., SVM, K-NN, C4.5, NB, K-means, EM, PAM, and fuzzy c-means), and the SVM algorithm was found to achieve higher accuracy than the others (Medjahed et al., 2013).

2.8 Previous Related Research

Table 3.1 below summarizes previous related works by various researchers.

Columns: S/N; Title, Author(s), and Year of Publication; Research Findings; Research Limitations and Gaps

S/N: 1
Title, Author(s), and Year of Publication: Breast Cancer Classification using Machine Learning (Amrane et al., 2018).
Research Findings: The researchers present two different classifiers, the Naïve Bayes classifier and K-nearest neighbor (KNN), to classify breast cancer and evaluate their accuracy using cross-validation.
Research Limitations and Gaps: In this study, only two different classifiers were presented and evaluated.

S/N: 2
Title, Author(s), and Year of Publication: Development of Machine Learning Algorithms for the Prediction of Financial Toxicity in Localized Breast Cancer Following Surgical Treatment (Sidey-Gibbons et al., 2021).
Research Findings: In this paper, the authors develop and test a tool to accurately predict an individual's risk of financial toxicity before initiation of breast cancer treatment. The researchers also explore whether supervised machine learning algorithms can reliably predict financial toxicity in patients with breast cancer who undergo surgical treatment.
Research Limitations and Gaps: This paper only develops machine learning algorithms for the prediction of financial toxicity in localized breast cancer cases.

S/N: 3
Title, Author(s), and Year of Publication: Comparative study of machine learning algorithms for breast cancer detection and diagnosis (Bazazeh et al., 2016).
Research Findings: This paper compares three of the most popular ML techniques commonly used for breast cancer detection and diagnosis, namely Support Vector Machine (SVM), Random Forest (RF), and Bayesian Networks (BN).
Research Limitations and Gaps: This paper only compares support vector machines, random forests, and Bayesian networks.

S/N: 4
Title, Author(s), and Year of Publication: A comparative analysis of nonlinear machine learning algorithms for breast cancer detection (Bataineh, 2019).
Research Findings: This paper conducts a performance comparison between five nonlinear machine learning algorithms, namely Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Gaussian Naïve Bayes (NB), and Support Vector Machines (SVM), on the Wisconsin Breast Cancer Diagnostic (WBCD) dataset.
Research Limitations and Gaps: The study conducts a performance comparison between five non-linear machine learning algorithms.

S/N: 5
Title, Author(s), and Year of Publication: Performance analysis of different machine learning algorithms in breast cancer predictions (Battineni et al., 2020).
Research Findings: This study develops a machine learning model coupled with limited features to produce high classification accuracy in tumor classification by considering a dataset of 569 females diagnosed as 212 malignant and 357 benign types. For model development, three supervised ML algorithms, namely support vector machines (SVM), logistic regression (LR), and K-nearest neighbors (KNN), were employed. Each model was further validated by 10-fold cross-validation, and performance measures were defined to evaluate the model outcomes.
Research Limitations and Gaps: This study only evaluates the performance analysis of different machine learning algorithms in breast cancer predictions.

S/N: 6
Title, Author(s), and Year of Publication: Machine Learning Algorithms to Predict Recurrence within 10 Years after Breast Cancer Surgery: A Prospective Cohort Study (Lou et al., 2020).
Research Findings: This study had two key findings: (1) the comparison of goodness-of-fit results indicated that provider characteristics (e.g., the volume of breast cancer patients per surgeon and per hospital) are essential considerations in the design of clinical decision support systems; and (2) the comparison of AUROC values indicated that the ANN model is superior to other prediction models.
Research Limitations and Gaps: This study has several limitations inherent in any large database analysis. First, the validity of the comparisons in the study is limited by the exclusion of complications associated with recurrence after surgery. Second, the analysis was limited to recurrence over 10 years after surgery, which reduced the subset of breast cancer patients in which the ANN model is clinically applicable. Third, this study only compared individual ANN, KNN, SVM, NBC, and COX models. Future works may consider the use of an alternative study design that compares a balanced sample of surgeons or hospitals at the first level and then randomly selects breast cancer patients at the second level.

S/N: 7
Title, Author(s), and Year of Publication: Breast Cancer Type Classification Using Machine Learning (Wu et al., 2021).
Research Findings: The researchers evaluated the performance of four ML-based classification algorithms (K-NNs, NGB, DT, and SVM) for the classification of breast cancer into triple-negative breast cancers (TNBC) and non-TNBC using gene expression data. The investigation revealed that ML algorithms could classify breast cancer into TNBC and non-TNBC. The SVM algorithm turned out to be the most accurate among the four algorithms.
Research Limitations and Gaps: Further research is recommended to investigate the power of ML algorithms in the classification of subtypes of triple-negative breast cancers (TNBC) and non-TNBC, to identify the best classification features, and to integrate radiomics with genomics data.

S/N: 8
Title, Author(s), and Year of Publication: Using Three Machine Learning Techniques for Predicting Breast Cancer Recurrence (Ahmad et al., 2013).
Research Findings: In this paper, using data mining techniques, the authors developed models to predict the recurrence of breast cancer by analyzing data collected from the Iranian Center for Breast Cancer (ICBC) registry.
Research Limitations and Gaps: This paper has explored risk factors for predicting breast cancer by using data mining techniques. Each method has its limitations and strengths specific to the type of application. There are some limitations in the current study. Many cases were lost in the follow-up, and records with missing values were omitted. Some important variables such as S-phase fraction and DNA index were not included in the study because of their unavailability, which may have decreased the performance of the models, and there was some degree of omission in the data. However, the results were obtained from a new database at the Iranian Center for Breast Cancer by comparing three different data mining methodologies using the Weka toolkit.

S/N: 9
Title, Author(s), and Year of Publication: Comparison of Machine Learning Methods for Breast Cancer Diagnosis (Bayrak et al., 2019).
Research Findings: Two of the most popular machine learning techniques were employed for the classification of the Wisconsin Breast Cancer (Original) dataset, and the classification performance of these techniques was compared using the values of accuracy, precision, recall, and ROC Area. The best performance was obtained by the Support Vector Machine technique, with the highest accuracy.
Research Limitations and Gaps: In this study, only two of the most popular machine learning techniques were employed for the classification of the Wisconsin Breast Cancer (Original) dataset.

S/N: 10
Title, Author(s), and Year of Publication: Discovering Mammography-based Machine Learning Classifiers for Breast Cancer Diagnosis (Ramos-Pollan et al., 2012).
Research Findings: This research explores the design of mammography-based machine learning classifiers (MLC) and proposes a new method to build MLC for breast cancer diagnosis. Moreover, the study evaluated MLC configurations to classify feature vectors extracted from segmented regions (pathological lesion or normal tissue) on craniocaudal (CC) and/or mediolateral oblique (MLO) mammography image views, providing BI-RADS diagnosis.
Research Limitations and Gaps: The study only evaluates Machine Learning Classifier configurations to classify feature vectors extracted from segmented regions (pathological lesion or normal tissue) on craniocaudal (CC) and/or mediolateral oblique (MLO) mammography image views, providing BI-RADS diagnosis.

S/N: 11
Title, Author(s), and Year of Publication: Evaluating the performance of Machine Learning Techniques in the Classification of Wisconsin Breast Cancer (Obaid et al., 2018).
Research Findings: In this paper, three machine-learning algorithms (Support Vector Machine, K-nearest neighbors, and Decision tree) have been used, and the performance of these classifiers has been compared to detect which classifier works better in the classification of breast cancer. Furthermore, the Wisconsin Breast Cancer (Diagnostic) dataset has been used in this study. The main aim of this work is to make a comparison among several classifiers and find the classifier which gives the best accuracy.
Research Limitations and Gaps: In this study, only three machine learning algorithms were employed to detect which classifier works better in the classification of breast cancer.

S/N: 12
Title, Author(s), and Year of Publication: Developing a Novel Machine Learning-Based Classification Scheme for Predicting Secondary Primary Cancers (SPCs) in Breast Cancer Survivors (Chang et al., 2019).
Research Findings: The researchers used machine learning techniques to develop a novel classification scheme, which included the transformation of data, clustering, resampling, and ensemble learning (TCRE), to predict Secondary Primary Cancers (SPCs) in women who have had breast cancer. The results of this study suggest that age, sequence of radiotherapy and surgery, surgical margins of the primary site, HER2, dose to CTV high, and ER, when appropriate, should be recommended for patients with breast cancer.
Research Limitations and Gaps: The study only aims at evaluating the development of a Novel Machine Learning-Based Classification Scheme for Predicting Secondary Primary Cancers (SPCs) in Breast Cancer Survivors.

S/N: 13
Title, Author(s), and Year of Publication: Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm (Heidari et al., 2018).
Research Findings: This study demonstrated that applying the LPP algorithm effectively reduced feature dimensionality and yielded higher and potentially more robust performance in predicting short-term breast cancer risk.
Research Limitations and Gaps: This study only aims to investigate the advantages of applying a machine learning approach embedded with a locality preserving projection (LPP) based feature combination and regeneration algorithm to predict short-term breast cancer risk.

S/N: 14
Title, Author(s), and Year of Publication: Breast Cancer Prediction Using Machine Learning
Research Findings: In this research, four algorithms (SVM, Logistic Regression, Random Forest, and KNN) that predict the breast cancer outcome have been compared using different datasets. All experiments are executed within a simulation environment. The aim of the research falls into three domains. The first domain is the prediction of cancer before diagnosis, the second domain is the prediction of diagnosis and treatment, and the third domain focuses on the outcome during treatment. The proposed work can be used to predict the outcome of different techniques, and suitable techniques can be used depending upon the requirement.
Research Limitations and Gaps: Further research in this field should be carried out to improve the performance of the classification techniques so that they can predict more variables.

S/N: 15
Title, Author(s), and Year of Publication: Application of Machine Learning Models for Survival Prognosis in Breast Cancer Studies (Mihaylov et al., 2019).
Research Findings: The paper discusses an approach to the problem in which the main factor used to predict survival time is the originally developed tumor-integrated clinical feature, which combines tumor stage, tumor size, and age at diagnosis. Two datasets from corresponding breast cancer studies are united by applying a data integration approach based on horizontal and vertical integration, using document-oriented and graph databases which show good performance and no data losses.
Research Limitations and Gaps: The paper only presents the application of machine learning models for the accurate prediction of survival time in breast cancer based on clinical data.

CHAPTER THREE

METHODOLOGY

3.1 Project Analysis

Developing an SVM-based classifier for breast cancer is of great importance in the field of medicine. The accuracy of SVM and its effectiveness in high-dimensional feature spaces motivate its use for this complex problem. SVM is an effective statistical learning method for pattern recognition that finds the optimum hyperplane separating the classes. This work will be carried out by choosing a diagnostic breast cancer data set and classifying tumors as benign or malignant using SVM, which will support medical experts by minimizing the errors that inexperienced practitioners can make, especially in the diagnosis and treatment of breast cancer. This chapter describes how the aim of the project will be achieved.

The methodologies to be used in achieving the aim of the project are:

- Diagnostic Breast Cancer (DBC) data set will be chosen and the data set will further be used to classify tumors as benign or malignant.

- Independent Component Analysis (ICA) algorithm will be used to compute two-dimensional ICs of the chosen data set.

- Reduced two-dimensional feature vectors will be used to train and test Support Vector Machine (SVM) classifiers with linear, radial basis function (RBF), and polynomial kernels.

- The effect of ICA on breast cancer classification using the SVM classifier will be analyzed.

- Classifiers' performance will be evaluated.

Therefore, the above tasks are required for the effective classification of breast cancer using SVM.

3.2 Experiment Environment

All experiments on the classifiers described in this study will be conducted using

libraries from the WEKA machine learning environment. WEKA contains a collection of

machine learning algorithms for data pre-processing, classification, regression, clustering,

and association rules. Machine learning techniques implemented in WEKA are applied to a

variety of real-world problems. The program offers a well-defined framework for

experimenters and developers to build and evaluate their models.

3.3 Breast Cancer Dataset Collection and Preparation

The Wisconsin Breast Cancer (Original) dataset from the UCI Machine Learning Repository will be used in this study. It has 699 instances (benign: 458, malignant: 241) partitioned into two classes, the benign class (B) and the malignant class (M), which constitute 65.5% and 34.5% of the data respectively. There are 11 integer-valued attributes in the data set: a sample code number, nine cytological features, and the class label. The Wisconsin Diagnostic Breast Cancer (WDBC) data set used for the ICA experiments has 32 attributes instead: an ID number, the diagnosis, and 30 real-valued features such as radius mean, texture mean, area mean, smoothness, compactness, and concavity. Breast cancer is among the most prominent diseases in medical diagnostics, and its incidence is increasing each year. In this study, the benign instances are represented as the positive class, as they do not harm the body, and the malignant instances are represented as the negative class, as they are the cancerous cells that harm the body. Finally, the data set is shuffled so that the original ordering of the instances does not bias training and testing.
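The class proportions quoted for the 699-instance data set follow directly from the instance counts. The short Python sketch below reproduces that arithmetic and shows one possible way to randomize the instance order. The variable names and the fixed random seed are illustrative assumptions and are not part of the study, which relies on WEKA.

```python
import random

# Instance counts for the Wisconsin Breast Cancer (Original) data set, as stated in the text.
counts = {"benign": 458, "malignant": 241}
total = sum(counts.values())

# Percentage share of each class, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Randomize the instance order; the fixed seed is an assumption made for reproducibility.
indices = list(range(total))
random.Random(0).shuffle(indices)

print(total, shares)
```

Shuffling indices rather than the records themselves keeps the original file untouched while still giving a random order for any later train and test partition.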

3.4 SVM Algorithm Model

In the SVM algorithm model, each data item is plotted as a point in n-dimensional space, where n is the total number of features used for classification and the value of each feature is one coordinate of the point. The SVM constructs a decision hyperplane that divides the classes of data points with the maximum margin. The data points closest to the hyperplane are called support vectors. With kernel functions, the classification process can also generate non-linear decision boundaries for data that are not linearly separable in the original feature space. Once trained, a model based on the SVM algorithm can quickly classify a new tumor record as benign or malignant.

Fig. 3.1: Flow Diagram of Work

3.5 Independent Component Analysis

The Independent Component Analysis (ICA) algorithm will be used to compute two-dimensional ICs of the WDBC data set with 30 features. The reduced two-dimensional feature vectors will be used to train and test SVM classifiers with linear, radial basis function (RBF), and polynomial kernels. The effect of ICA on breast cancer classification using the SVM classifier will also be analyzed. Performance measures including sensitivity, specificity, accuracy, and the ROC curve with its criterion values will be computed and presented in order to compare the classification results obtained with the original feature set against those obtained with the reduced two-dimensional feature vectors using ICA and SVM.

Suppose that the measured signal consists of a linear combination of two independently distributed signals. The measured signal x can be written as:

$$x = As \tag{1}$$

where s refers to a vector of source signals and A is an unknown mixing matrix consisting of constant elements. The aim of using ICA is to recover the original signals. When a separating matrix W, which is the inverse of A, can be computed, the original signals can be found by

$$\hat{s} = Wx \tag{2}$$

ICA computation starts with centering the data by removing the mean values of the variables, as in PCA. The next step is to whiten the data in order to decorrelate them, also as in PCA. A linear transformation is then applied to the whitened data in order to compute the ICs by the following equation:

$$ic_i = b_i^{T} x \tag{3}$$

where $ic_i$ is the i-th independent component and $b_i$ is the vector used to reconstruct it. Many different approaches can be used to estimate $b_i$; they use an objective function that relates to the independence of the variables.
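The two preprocessing steps named above, centering and whitening, can be sketched in plain Python for the two-dimensional case. This is only an illustration under stated assumptions: the mixing matrix, the synthetic source signals, and all function names are hypothetical, and the final ICA step (estimating the vectors b_i by optimizing an independence objective) is deliberately left out.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]

def covariance_2d(rows):
    """Entries (a, b, c) of the covariance matrix [[a, b], [b, c]] of centered 2-D rows."""
    n = len(rows)
    a = sum(r[0] * r[0] for r in rows) / n
    b = sum(r[0] * r[1] for r in rows) / n
    c = sum(r[1] * r[1] for r in rows) / n
    return a, b, c

def whiten_2d(rows):
    """Rotate centered 2-D data onto its principal axes and rescale each axis to unit variance."""
    a, b, c = covariance_2d(rows)
    # Closed-form rotation angle that diagonalizes the symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Variances along the two rotated axes (the eigenvalues of the covariance matrix).
    lam1 = a * cos_t ** 2 + 2 * b * cos_t * sin_t + c * sin_t ** 2
    lam2 = a * sin_t ** 2 - 2 * b * cos_t * sin_t + c * cos_t ** 2
    whitened = []
    for x, y in rows:
        u = (cos_t * x + sin_t * y) / math.sqrt(lam1)
        v = (-sin_t * x + cos_t * y) / math.sqrt(lam2)
        whitened.append([u, v])
    return whitened

# Two synthetic source signals mixed by a hypothetical matrix A, i.e. x = A s.
sources = [[math.sin(0.3 * t), ((7 * t) % 11) / 11.0] for t in range(200)]
A = [[1.0, 0.5], [0.3, 1.0]]
mixed = [[A[0][0] * s1 + A[0][1] * s2, A[1][0] * s1 + A[1][1] * s2] for s1, s2 in sources]

whitened = whiten_2d(center(mixed))
cov = covariance_2d(whitened)  # close to (1, 0, 1): unit variances, zero correlation
```

After whitening, the remaining task of ICA reduces to finding a rotation of the data that makes the components as independent as possible.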

3.6 Support Vector Machine

The SVM method searches for an optimal separating hyperplane (OSH) between two classes. The data points nearest to the OSH, which lie on the margin boundaries, are called "support vectors". The geometric view of the support vectors lying on the margins and of the hyperplane is shown in Fig. 3.2.

Fig. 3.2: The geometric view of support vectors and hyperplane

The hyperplane can be found by:

$$g(x) = w^{T} x + b \tag{4}$$

where x refers to the data points, w is a coefficient vector, and b is the offset from the origin. In linear SVM, $g(x) \ge 1$ for the points of one class and $g(x) \le -1$ for the points of the other class, with equality holding for the closest points (the support vectors).

The margin between support vectors is defined by:

$$d = \frac{2}{\lVert w \rVert} \tag{5}$$
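Equations (4) and (5) can be evaluated directly once w and b are known. The Python sketch below does so for a small two-dimensional case; the values of w, b, and the two test points are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def decision_value(w, b, x):
    """Linear discriminant g(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Margin d = 2 / ||w|| between the two margin boundaries."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane parameters, used only for illustration.
w = [3.0, 4.0]
b = -2.0

g_pos = decision_value(w, b, [1.0, 0.0])    # 3 + 0 - 2 = 1, on one margin boundary
g_neg = decision_value(w, b, [-1.0, 1.0])   # -3 + 4 - 2 = -1, on the other boundary
d = margin_width(w)                         # ||w|| = 5, so d = 0.4
```

A smaller norm of w gives a wider margin, which is why maximizing d is equivalent to minimizing the norm of w.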

The margin d should be maximized for better separation. For this reason, the norm $\lVert w \rVert$ must be minimized, using the Lagrange function

$$L_P(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^{2} - \sum_{i=1}^{n} \alpha_i \left\{ y_i \left( w^{T} x_i + b \right) - 1 \right\} \tag{6}$$

Here $y_i (w^{T} x_i + b) \ge 1$ for $i = 1, 2, \ldots, n$, $y_i \in \{+1, -1\}$ represents the class labels, and $\alpha_i$ are the Lagrange multipliers. $L_P$ must be minimized to compute the optimal w and b.
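The step from the Lagrange function to an optimization over the multipliers alone is standard and can be sketched as follows. Setting the derivatives of the Lagrange function with respect to w and b to zero and substituting back gives a problem in the multipliers only; the name L_D for the resulting function is an assumption introduced here for illustration.

```latex
\begin{align*}
\frac{\partial L_P}{\partial w} = 0 &\;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i \\
\frac{\partial L_P}{\partial b} = 0 &\;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0 \\
L_D(\alpha) &= \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^{T} x_j ,
  \qquad \alpha_i \ge 0
\end{align*}
```

The data enter this expression only through the inner products of pairs of points, so replacing each inner product with a kernel value K(x_i, x_j) yields the form used for non-linear problems.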

In the case of a non-linear classification problem, SVM with a kernel function is used. The kernel function maps the data onto a higher-dimensional space in which a hyperplane separating the classes can be constructed. The discriminant function used in SVM with kernel functions is:

$$g(x) = w^{T} \phi(x) + b \tag{7}$$

Here $\phi(x)$ represents the mapping of the input data x onto the kernel space. Therefore, the optimization problem can be written as:

$$\text{Maximize } \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to } \sum_{i=1}^{n} \alpha_i y_i = 0, \; \alpha_i \ge 0 \tag{8}$$

where $K(x_i, x_j) = \phi(x_i)^{T} \phi(x_j)$ refers to the kernel function. The kernel functions used are RBF or polynomial.
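The kernels named in this section can be written as small Python functions. The sketch below uses common textbook forms of each kernel; the parameter names and default values (gamma, degree, coef0), the support vectors, and the multipliers are assumptions for illustration, not settings taken from this study. The decision function shown, a weighted sum of kernel values plus the offset, is the standard form that follows from expressing w through the support vectors.

```python
import math

def linear_kernel(x, z):
    """K(x, z) = x^T z."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """K(x, z) = (x^T z + coef0)^degree."""
    return (linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def kernel_decision(x, support_vectors, labels, alphas, b, kernel):
    """g(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b

# Hypothetical support vectors, labels, and multipliers for a toy problem.
support_vectors = [[0.0, 0.0], [2.0, 0.0]]
labels = [-1, 1]
alphas = [1.0, 1.0]

g_near_positive = kernel_decision([2.0, 0.0], support_vectors, labels, alphas, 0.0, rbf_kernel)
g_near_negative = kernel_decision([0.0, 0.0], support_vectors, labels, alphas, 0.0, rbf_kernel)
```

With these toy values, a point at the positive support vector receives a positive decision value and a point at the negative support vector receives a negative one.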

3.7 The Performance Measure Indices

The performance of machine learning techniques is measured by several performance indicators. A confusion matrix of actual versus predicted classes is formed, comprising the TP (True Positive), FP (False Positive), TN (True Negative), and FN (False Negative) counts, from which the measures are computed. The meaning of each term is given below.

TP (True Positive) = Correctly Identified

TN (True Negative) = Correctly Rejected

FP (False Positive) = Incorrectly Identified

FN (False Negative) = Incorrectly Rejected

The most common empirical measure of a classifier's effectiveness is accuracy, which is formulated as

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{9}$$

Sensitivity is the proportion of actual positives which are correctly identified, and specificity is the proportion of actual negatives which are correctly identified. These are calculated by

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{10}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{11}$$
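The three measures can be computed from a list of actual and predicted labels as in the Python sketch below. Accuracy is taken as the fraction of correct predictions, (TP + TN) divided by the total number of cases. The label values "B" and "M" and the small example lists are hypothetical; benign is treated as the positive class, following the convention adopted for the data set in this chapter.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, TN, and FN for a binary problem with the given positive label."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1   # positive case correctly identified
            else:
                fp += 1   # negative case incorrectly identified as positive
        else:
            if a == positive:
                fn += 1   # positive case incorrectly rejected
            else:
                tn += 1   # negative case correctly rejected
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Hypothetical labels: "B" = benign (positive class), "M" = malignant (negative class).
actual    = ["B", "B", "B", "M", "M", "B", "M", "B"]
predicted = ["B", "B", "M", "M", "B", "B", "M", "B"]

tp, fp, tn, fn = confusion_counts(actual, predicted, positive="B")
acc = accuracy(tp, fp, tn, fn)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

For these eight cases the counts are TP = 4, FP = 1, TN = 2, and FN = 1, giving an accuracy of 0.75, a sensitivity of 0.8, and a specificity of 2/3.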

The diagnostic performance of a test or a classifier in separating diseased cases from healthy cases will be evaluated using ROC curve analysis.
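A ROC curve is obtained by sweeping the decision threshold over the classifier's scores and recording the true positive rate against the false positive rate at each step. The Python sketch below builds the curve and computes the area under it with the trapezoidal rule. The scores and labels are hypothetical, with label 1 standing for the positive class; this illustrates the general technique rather than WEKA's implementation.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping the threshold from high to low scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Cases with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as a list of (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
auc = area_under_curve(curve)
```

For this example the area is 8/9, which equals the fraction of positive and negative pairs in which the positive case receives the higher score.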

References

Abdul Halim, A., Andrew, A., Mohd Yasin, M., Abd Rahman, M., Jusoh, M., and

Veeraperumal, V. (2021). Existing and Emerging Breast Cancer Detection

Technologies and Its Challenges: A Review. Applied Sciences, 11(22), 10753.

https://doi.org/10.3390/app112210753

Ali, E., and Feng, W. (2016). Breast Cancer Classification using Support Vector Machine

and Neural Network. International Journal of Science and Research (IJSR), 5(3), 1-

6. https://doi.org/10.21275/v5i3.nov161719

Alzubi J., Nayyar A., and Kumar A. (2018). Machine Learning from Theory to

Algorithms: An Overview. Journal of Physics: Conference Series, 1142, 012012.

https://doi.org/10.1088/1742-6596/1142/1/012012

Atif, M., Siddiqui, Jamshed, J., Talib, F., and Sohail, S. (2020). Applications Of Machine

Learning Techniques for Disease Diagnosis: A Review. Journal of Critical Reviews.

7, 2652-2661. https://doi.org/10.31838/jcr.07.17.330.

Amrane M., Oukid S., Gagaoua I., and Ensari T. (2018). Breast Cancer Classification

Using Machine Learning. Conference: 2018 Electric Electronics, Computer Science,

Biomedical Engineerings' Meeting (EBBT).

Anand, A. (2022). Top 6 Machine Learning Techniques | Analytics Steps.

Analyticssteps.com. Retrieved 9 June 2022, from

https://www.analyticssteps.com/blogs/top-6-machine-learning-techniques.

Barrios, C. (2022). Global challenges in breast cancer detection and treatment. The

Breast, 62, S3-S6. https://doi.org/10.1016/j.breast.2022.02.003

Bellaachia, A., and Guven, E. (2006). Predicting breast cancer survivability using data

mining techniques. Age. 58(13), 10-110.

Bi Q., Goodman K., Kaminsky J., and Lessler J., (2019). What is Machine Learning? A

Primer for the Epidemiologist. American Journal of Epidemiology, 188(12), 2222-

2239.

Butryn, B., Chomiak-Orsa, I., Hauke, K., Pondel, M., and Siennicka, A. (2021).

Application of Machine Learning in medical data analysis illustrated with an

example of association rules, Procedia Computer Science, 192, 3134-3143

CDC, (2022). Retrieved 9 June 2022, from

https://www.cdc.gov/cancer/breast/basic_info/what-is-breast-cancer.htm.

Duda R.O, Hart P.E, Stork D.G., (2012). Pattern Classification. 2nd ed. Hoboken, NJ: John

Wiley & Sons, Inc.; 2012:517.

Ebrahim, E., and Wu, Z. (2016). Breast Cancer Classification using Support Vector

Machine and Neural Network. International Journal of Science and Research

(IJSR), 5(3), 1-6. https://doi.org/10.21275/v5i3.nov161719

Hasan, S., Sagheer, A.M., and Veisi, H., (2021). Breast Cancer Classification Using

Machine Learning Techniques: A Review. Turkish Journal of Computer and

Mathematics Education (TURCOMAT), 12, 1970-1979.

Islam, M. J., Wu, Q. M. J., Ahmadi, M., & Sid-Ahmed, M. A. (2007). Investigating the

Performance of Naive- Bayes Classifiers and K- Nearest Neighbor Classifiers. 2007

International Conference on Convergence Information Technology (ICCIT 2007).

Janiesch, C., Zschech, P., and Heinrich, K. (2021). Machine learning and deep learning.

Electron Markets. 31, 685–695

Kokilavani, T., and Beena L. A, (2021). Machine Learning–Based Case Studies for

Healthcare Analytics. Machine learning and analytics in healthcare systems 1, 18.

https://doi.org/10.1201/9781003185246

Kumar K.S., and Rajendran A., Machine Learning Classifiers in Health Care, 1, 22.

Längkvist M, Karlsson L, and Loutfi A. (2014). A review of unsupervised feature learning

and deep learning for time-series modeling. Pattern Recognition Letters, 42(1):11-24

Leung, K. M. (2007). Naive bayesian classifier. Polytechnic University Department of

Computer Science/Finance and Risk Engineering, 123-156.

Medjahed, S., Saadi T., and Benyettou A. (2013). “Breast Cancer Diagnosis by using k-

Nearest Neighbor with Different Distances and Classification Rules,” International

Journal of Computer Applications, 62(1), 0975 – 8887

Qifang Bi, Katherine E Goodman, Joshua Kaminsky, Justin Lessler, what is Machine

Learning? A Primer for the Epidemiologist, American Journal of Epidemiology,

188(12), 2222–2239

Panahiazar, M., Chen, N., Lituiev, D., and Hadley, D. (2021). Empowering study of breast

cancer data with application of artificial intelligence technology: promises,

challenges, and use cases. Clinical & Experimental Metastasis, 39(1), 249-254.

https://doi.org/10.1007/s10585-021-10125-8

Rani A., Kumar N., Kumar J., Kumar J., and Sinha N. K. (2022). Chapter 6 - Machine

learning for soil moisture assessment. In Cognitive Data Science in Sustainable

Computing, Deep Learning for Sustainable Agriculture, Academic Press, 143-168,

Sahu, B., and Panigrahi, A. (2020). Efficient Role of Machine Learning Classifiers in the

Prediction and Detection of Breast Cancer. 5th International Conference on Next

Generation Computing Technologies (NGCT-2019).

Saritas M. M, and Yasar A. (2019). “Performance Analysis of ANN and Naive Bayes

Classification Algorithm for Data Classification”, International

Journal of Intelligent Systems and Applications in Engineering (IJISAE). 7(2), 88–

91.

Siddiqui, M.K., Morales-Menendez, R., Huang, X., and Hussain N., (2020). A Review of

Epileptic Seizure Detection using Machine Learning Classifiers. Brain Infection. 7,

(5).

Sidey-Gibbons, C., Pfob, A., Asaad, M., Boukovalas, S., Lin, Y. L., Selber, J. C., Butler C.

E., and Offodile, A. C. (2021). Development of machine learning algorithms for the

prediction of financial toxicity in localized breast cancer following surgical

treatment. JCO clinical cancer informatics, 5, 338-347.

Sung, H., Ferlay, J., Siegel, R., Laversanne, M., Soerjomataram, I., Jemal, A., and Bray, F.

(2021). Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and

Mortality Worldwide for 36 Cancers in 185 Countries. CA: A Cancer Journal for

Clinicians, 71(3), 209-249. https://doi.org/10.3322/caac.21660

Tahmooresi M., Afshar A., Rad B., Nowshath K. B., and Bamiah M. A. (2018). Early

Detection of Breast Cancer Using Machine Learning Techniques. Journal of

Telecommunication, Electronic, and Computer Engineering. 10, (3-2).

Tarca A.L, Carey V.J, Chen X, Romero R, and Draghici A., (2007). Machine Learning and

its Applications to Biology. PLoS Computational Biology. 3(6): e16.

Tijo B. T., and Abdulazeez A. M., (2021). Classification Based on Decision Tree

Algorithm for Machine Learning. Journal of Applied Science and Technology

Trends, 2(1), 20 – 28.

Toh, C., and Brody, J. P. (2020). Applications of Machine Learning in Healthcare. In

(Ed.), Smart Manufacturing - When Artificial Intelligence Meets the Internet of

Things. IntechOpen.

WHO- Cancer Report, (2020). Retrieved 10 June 2022, from https://www.who.int/news-

room/fact-sheets/detail/cancer. 

Zhu X., Goldberg A. B., (2009). Introduction to Semi-Supervised Learning. San Rafael,

CA: Morgan and Claypool Publishers; 1(11).
