
Machine Learning Algorithms for Breast Cancer Prediction

Machine learning is an automated learning method in which algorithms are designed to learn from past data. We input a large amount of data, the machine learning model analyzes it, and on the basis of the trained model we can make predictions about future cases. For breast cancer prediction, the major machine learning algorithms are as follows:

Artificial Neural Network (ANN)

The Artificial Neural Network is a common algorithm in the data mining process. A neural network consists of an input layer, one or more hidden layers and an output layer. The technique is used to extract patterns that are too complex for simpler methods. The algorithm is based on parallel processing, distributed memory, collective solution finding and its network architecture.
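Purely as an illustration, a minimal scikit-learn sketch of such a network is shown below; the bundled breast cancer dataset, the single 30-unit hidden layer and the other hyperparameters are assumptions made for this example, not details taken from the text.

```python
# Illustrative sketch only: a small multilayer perceptron (input layer,
# one hidden layer, output layer) on scikit-learn's breast cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# One hidden layer of 30 units; features are standardised before training.
ann = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(30,), max_iter=1000, random_state=0))
ann.fit(X_train, y_train)
print("ANN test accuracy:", ann.score(X_test, y_test))
```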

Logistic Regression (LR)

It is a supervised learning algorithm that relates a dependent variable to one or more independent variables. The response of this algorithm is in binary form. Logistic regression provides a continuous probability output for a given data point, which is then mapped to a class. The algorithm is a statistical model for binary variables.

K-Nearest Neighbor (KNN)

This algorithm is used in pattern recognition and is a good approach for breast cancer prediction. In order to recognize a pattern, each class is given equal importance. K-Nearest Neighbor extracts the most similarly featured data points from a large dataset, and on the basis of this feature similarity a large dataset can be classified.

Decision Tree (DT)

The decision tree is based on a classification and regression model. The dataset is divided into smaller subsets, and these smaller sets of data can make predictions with a high level of precision. Decision tree methods include CART, C4.5, C5.0 and conditional inference trees.
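As a rough illustration of the idea, the sketch below fits a CART-style tree with scikit-learn; the dataset, depth limit and other parameters are assumptions for the example.

```python
# Illustrative sketch only: a CART-style decision tree classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Each split partitions the data into smaller subsets, as described above.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
print("Decision tree test accuracy:", tree.score(X_test, y_test))
```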

Naive Bayes Algorithm (NB)

This model makes a conditional independence assumption about the features of the training dataset. The algorithm calculates class probabilities using Bayes' theorem. It can provide high accuracy even when the probabilities are calculated from noisy input data, and it classifies a new tuple by comparing it against the statistics learned from the training dataset.
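A minimal sketch of this approach, assuming Gaussian Naive Bayes from scikit-learn and the bundled breast cancer dataset (both assumptions, not details from the text), could look like:

```python
# Illustrative sketch only: Gaussian Naive Bayes applying Bayes' theorem
# with a feature-independence assumption.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

nb = GaussianNB()
nb.fit(X_train, y_train)
# predict_proba returns the per-class probabilities computed via Bayes' rule.
print("Naive Bayes class probabilities (first row):", nb.predict_proba(X_test)[0])
print("Naive Bayes test accuracy:", nb.score(X_test, y_test))
```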

Support Vector Machine (SVM)

It is a supervised learning algorithm that is used for both classification and regression problems. It uses theoretical and numeric (kernel) functions to solve the regression problem. It provides a high accuracy rate when making predictions on large datasets. It is a strong machine learning technique that separates classes with a hyperplane in two- or higher-dimensional feature space.

Random Forest (RF)

The Random Forest algorithm is a supervised learning method that is used to solve both classification and regression problems. It is a building block of machine learning that predicts new data on the basis of a previous dataset.

K-Means Algorithm

K-Means is a clustering algorithm that partitions data into small clusters. The algorithm finds the similarity between different data points. Each data point is assigned to exactly one cluster, which makes the method well suited to the evaluation of large datasets.
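A minimal sketch of K-Means with scikit-learn is shown below; the use of the bundled breast cancer features and the choice of k = 2 are assumptions made for illustration.

```python
# Illustrative sketch only: K-means partitioning the data into two clusters.
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)   # each point is assigned to exactly one cluster
print("Cluster sizes:", [int((labels == c).sum()) for c in (0, 1)])
```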
C-Means Algorithm

Clusters are identified on the basis of similarity; a cluster consisting of similar data points belongs to one single family. In the fuzzy C-means algorithm each data point can belong to more than one cluster, with a degree of membership in each. It is mostly used in medical image segmentation and disease prediction.
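Since no specific implementation is described, the following is a bare-bones NumPy sketch of the fuzzy C-means update loop; the synthetic data, c = 2 clusters and fuzzifier m = 2.0 are illustrative assumptions.

```python
# Illustrative sketch only: fuzzy C-means with alternating updates of
# cluster centres and membership degrees.
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)            # memberships of each point sum to 1
    for _ in range(n_iter):
        um = u ** m
        centers = um.T @ X / um.sum(axis=0)[:, None]          # membership-weighted centres
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-10
        u = 1.0 / (d ** (2 / (m - 1)))
        u /= u.sum(axis=1, keepdims=True)        # renormalise membership degrees
    return centers, u

X = np.random.default_rng(1).normal(size=(200, 2))   # synthetic 2-D data for illustration
centers, memberships = fuzzy_c_means(X)
print("Cluster centres:\n", centers)
```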

Hierarchical Algorithm

The hierarchical algorithm usually evaluates raw data in the form of a distance (proximity) matrix. Each cluster is separated from the other clusters in the form of a hierarchy, and every single cluster consists of similar data points. A distance (linkage) measure is used to decide which clusters to merge or split.
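Illustratively, agglomerative clustering in scikit-learn builds such a hierarchy bottom-up; the dataset, the two-cluster cut and Ward linkage below are assumptions for the example.

```python
# Illustrative sketch only: agglomerative (hierarchical) clustering.
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# The closest pairs of clusters are merged step by step to build the hierarchy.
hier = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = hier.fit_predict(X_scaled)
print("Cluster sizes:", [int((labels == c).sum()) for c in (0, 1)])
```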

Gaussian Mixture Algorithm

It is one of the most popular techniques of unsupervised learning. It is known as a soft clustering technique because it computes, for every data point, the probability of belonging to each cluster. The implementation of this algorithm is based on expectation maximization (EM).
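A minimal sketch with scikit-learn's Gaussian mixture model (fitted via EM) is shown below; the dataset and the two-component choice are assumptions for illustration.

```python
# Illustrative sketch only: a two-component Gaussian mixture fitted by EM.
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture

X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X_scaled)

# "Soft" clustering: each point gets a probability of belonging to each component.
probs = gmm.predict_proba(X_scaled)
print("Membership probabilities of the first three points:\n", probs[:3])
```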
Logistic Regression:
The logistic model is used to model the probability of a certain class or event existing, such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc. We can understand the working of the Logistic Regression algorithm with the help of the following steps; a short code sketch follows the list −

Step 1: Load the data.
Step 2: Logistic Regression measures the relationship between the dependent variable and the independent variables by estimating probabilities using its underlying logistic function.
Step 3: These probabilities must then be transformed into binary values in order to actually make a prediction, using the sigmoid function.
Step 4: The sigmoid function takes any real-valued number and maps it into a value between 0 and 1.
Step 5: These values between 0 and 1 are then transformed into either 0 or 1 using a threshold classifier.
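The sketch below walks through these steps with scikit-learn; the dataset, the 0.5 threshold and the preprocessing are assumptions for the example, not values fixed by the text.

```python
# Illustrative sketch of the steps above: load data, fit the logistic model,
# threshold the sigmoid probabilities at 0.5 to get 0/1 predictions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Step 1: load the data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Steps 2-4: fit the model; predict_proba gives probabilities in [0, 1].
lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
lr.fit(X_train, y_train)
probs = lr.predict_proba(X_test)[:, 1]

# Step 5: threshold the probabilities to obtain the 0/1 class labels.
preds = (probs >= 0.5).astype(int)
print("Logistic regression test accuracy:", np.mean(preds == y_test))
```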
Random Forest:
Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. We can understand the working of the Random Forest algorithm with the help of the following steps; a short code sketch follows the list −

Step 1 − First, start with the selection of random samples from a given dataset.
Step 2 − Next, the algorithm constructs a decision tree for every sample and obtains a prediction result from every decision tree.
Step 3 − In this step, voting is performed for every predicted result.
Step 4 − At last, the most voted prediction result is selected as the final prediction result.
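As a sketch of these steps, the example below uses scikit-learn's RandomForestClassifier; the dataset and the choice of 100 trees are assumptions made for illustration.

```python
# Illustrative sketch of the steps above: bootstrap samples, one tree per
# sample, and majority voting over the trees.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Steps 1-2: each of the 100 trees is grown on a random bootstrap sample.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Steps 3-4: predict() returns the majority vote over the individual trees.
print("Random forest test accuracy:", rf.score(X_test, y_test))
```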

Steps for K-Nearest Neighbor:


It is one of the most basic yet essential classification algorithms in Machine Learning. It belongs to the supervised learning domain and finds extensive application in pattern recognition, data mining and intrusion detection. It is widely applicable in real-life scenarios since it is non-parametric, meaning it does not make any underlying assumptions about the distribution of the data.

We can implement a KNN model by following the steps below; a short code sketch follows the list:


Step 1 − Load the data
Step 2 − Initialise the value of k
Step 3 − For getting the predicted class, iterate from 1 to total number
of training data points
Step 4 − Calculate the distance between test data and each row of
training data. Here we will use Euclidean distance as our distance metric
since it’s the most popular method. The other metrics that can be used
are Chebyshev, cosine, etc.
Step 5 − Sort the calculated distances in ascending order based on
distance values
Step 6 − Get top k rows from the sorted array
Step 7 − Get the most frequent class of these rows
Step 8 − Return the predicted class
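The sketch below covers the same steps with scikit-learn, which handles the distance computation, sorting and majority vote internally; the dataset, k = 5 and the Euclidean metric are assumptions chosen to match the description.

```python
# Illustrative sketch of the steps above using Euclidean distance and k = 5.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Step 1: load the data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 2: initialise k. Steps 3-8 (distances, sorting, taking the top k rows
# and the most frequent class) are performed inside KNeighborsClassifier.
knn = make_pipeline(StandardScaler(),
                    KNeighborsClassifier(n_neighbors=5, metric="euclidean"))
knn.fit(X_train, y_train)
print("KNN test accuracy:", knn.score(X_test, y_test))
```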

Steps for Support Vector Machine:


A support vector classifier is the classifier model of the support vector machine. Support Vector Machine (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges. An SVM is a discriminative classifier formally defined by a separating hyperplane. In other words, given labelled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on either side.

We can implement an SVM model by following the steps below; a short code sketch follows the list:


Step 1: Load the data
Step 2: Split data into train and test
Step 3: Find decision boundaries that correctly classify the training dataset.
Step 4: Pick the decision boundary which has the maximum distance from the nearest points (the support vectors) of the two classes as the best one.
Step 5: Predict values using the SVM model.
Step 6: Calculate the accuracy and precision.
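A sketch of these steps with scikit-learn's SVC is given below; the dataset, the linear kernel and the feature scaling are assumptions for the example.

```python
# Illustrative sketch of the steps above: split the data, fit a maximum-margin
# classifier, then report accuracy and precision on the test set.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score

# Steps 1-2: load the data and split it into train and test sets.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Steps 3-4: SVC finds the separating hyperplane with the maximum margin
# from the nearest points (the support vectors).
svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
svm.fit(X_train, y_train)

# Steps 5-6: predict on the test set and compute accuracy and precision.
preds = svm.predict(X_test)
print("SVM accuracy:", accuracy_score(y_test, preds))
print("SVM precision:", precision_score(y_test, preds))
```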

Conclusion
Logistic Regression provided the highest accuracy of 95%, which is slightly higher than K-Nearest Neighbour, while Support Vector Machine gave the worst accuracy for prediction. Overall, we get good accuracy results using Logistic Regression, Random Forest and Support Vector Machine, correctly classifying tumours as malignant or benign almost 95-97% of the time.
