308387_Brain and Breast


Similarity Report ID: oid:27535:15899281

Paper name: 1649598308387_Brain and Breast.doc
Author: Nitin Yadav
Word count: 3442 words
Character count: 19482 characters
Page count: 5 pages
File size: 1.2MB
Submission date: Apr 11, 2022 11:19 AM GMT+5:30
Report date: Apr 11, 2022 11:21 AM GMT+5:30

69% Overall Similarity

The combined total of all matches, including overlapping sources, for each database.
60% Internet database
40% Publications database
Crossref database
Crossref Posted Content database
18% Submitted Works database

Excluded from Similarity Report
Bibliographic material
Small Matches (Less than 8 words)
Manually excluded text blocks
Cancer Cell Detection using Machine Learning
Ankit Kumar Singh
Department of Computer Science and Engineering
Galgotias College of Engineering and Technology
Greater Noida, 201310, Uttar Pradesh, India
ankitsinghchs07@gmail.com

Ms. Tanu Shree
Department of Computer Science and Engineering
Galgotias College of Engineering and Technology
Greater Noida, 201310, Uttar Pradesh, India
tanu.shree@galgotiacollege.edu

Nitin Kumar Yadav
Department of Computer Science and Engineering
Galgotias College of Engineering and Technology
Greater Noida, 201310, Uttar Pradesh, India
nk157798@gmail.com

Akash Pandey
Department of Computer Science and Engineering
Galgotias College of Engineering and Technology
Greater Noida, 201310, Uttar Pradesh, India

Abstract: Early detection of cancer is required to provide proper and personalized treatment to the patient and to reduce the risk of death due to cancer. Detection of cancerous cells at later stages leads to more suffering and potentially increases the chances of death. Researchers have been developing various machine learning solutions that produce encouraging results. In this paper, we explore the techniques and technologies already in practice for detecting cancer cells in their early stages, as well as the work presently under way in the industry. The main objective of this project is to develop a machine learning algorithm that requires minimal human intervention.

Keywords: cancer, machine learning, dataset, algorithm

I. INTRODUCTION

Cancer is the world's second biggest killer after heart disease and stroke. Cancer is a group of diseases characterized by cells in the body that grow abnormally and without control, and it can spread very quickly to other parts of the body. Cancer damages the human body gradually when cells start growing uncontrollably to form lumps of tissue called tumors. Not all tumors are cancerous, and some tumors do not spread in the body. Tumors may grow and interfere with other parts of the body, such as the nervous system, the digestive system or the circulatory system. The affected parts of the body can release hormones that cause changes in the body. Tumor cells can invade neighbouring cells and destroy the surrounding tissue, which causes other tumors to develop. Tumors are of two types, malignant and benign. Malignant tumors are more dangerous and can be life-threatening. Benign tumors usually do not cause much damage, but they can become dangerous if they grow large, and they may become malignant after a certain amount of time.

Cancer has various symptoms such as tumors, abnormal bleeding and excessive weight loss. To provide appropriate treatment to patients, the symptoms must be studied properly, and an automatic prediction system is required that classifies a tumor as benign or malignant. There are about 100 different types of cancer affecting the human body.

As soon as the disease is discovered, the next task is to determine the stage of the cancer. The stage can be determined from various factors such as thickness, the depth of penetration, and the extent to which the melanoma or the infection has spread. Patients are treated according to the stage determined. In recent years, owing to the increased use of cosmetics, pollution and radiation, cancer has become a common disease.

II. RELATED WORK

In [2], the authors derived symptom clusters for breast cancer from both social media and research study data using improved K-medoid clustering. The improved method raises clustering performance by reassigning some of the symptoms with negative average silhouette width (ASW) to other clusters after the initial K-medoid clustering.

In [7], the authors explored many features and classifiers for selecting genes extracted from microarray data, which are very noisy. They used two datasets, brain cancer and breast cancer, with 72 and 47 samples respectively. For feature selection they used Pearson's and Spearman's correlation coefficients, Euclidean distance, information gain, mutual information and signal-to-noise ratio. For classification they used MLP, kNN, SVM and SOM. They ran experiments on all the given datasets and reported a best accuracy of 97.1% on the Leukemia dataset with all the classifiers.

In [8], the authors explored PSO for the prediction of patient survival using gene expression data. PSO reduces the dimensionality and is combined with a Probabilistic NN (PNN). The experimental results of PSO/PNN on the B-cell
Lymphoma dataset of 240 samples showed up to 80% accuracy in predicting survival.

In [11], the authors proposed a novel approach based on feature selection to classify high-dimensional cancer microarray data. The approach uses a filtering technique, the signal-to-noise ratio (SNR), together with PSO for optimization. They reported that PSO gives better results when it is used along with SVM, k-NN and PNN. They described a brain cancer dataset with 72 instances and 7129 genes, a breast cancer dataset with 62 instances and 2000 genes, and a DLBCL dataset with 77 instances and 6817 genes. PSO combined with the other classifiers gave 100% accuracy in the case of breast cancer.

In [12], the authors sought to extract differentially expressed (DE) genes between early and advanced stages of multiple cancer types using RNA sequencing data. The importance of these genes was further examined by developing predictive models using K-nearest neighbour and linear discriminant analysis classifiers. The results indicate that a pan-cancer analysis may be highly comparable to standard analyses of individual cancers for identifying biologically relevant DE genes and can assist in developing powerful predictive models for cancer prediction.

Microarray gene expression data normally contain an enormous number of genes compared with the small number of samples available. It is therefore a challenging task to identify a small subset of informative genes that alone can be used to classify cancer subgroups accurately. Accordingly, in [24] a computationally efficient yet accurate gene identification technique was proposed. First, the t-test is used to reduce the dimensionality of the dataset; then the proposed particle swarm optimization based approach is used to find the useful genes. The technique was applied to the small round blue cell tumor (SRBCT) data to classify four subgroups, namely neuroblastoma, non-Hodgkin lymphoma, rhabdomyosarcoma and Ewing sarcoma.

In [18], the authors studied many classification methods and feature selection methods for expressed genes in microarray data. They compared the efficiency of classification methods such as SVM, Radial Basis Function, Multi-Layer Perceptron, DT and RF. 9-fold cross validation was applied to calculate the accuracy of the classifiers, including K-means. The efficiency of the feature selection methods SVM-RFE, Chi-Squared and Correlation based feature selection (CFS) was also measured. In conclusion, the authors obtained the best result with the SVM-RFE feature selection method, with 100% accuracy in identifying the significant genes.

In [21], the authors applied data mining techniques on a large scale to discover valuable knowledge. Rough set theory was used to find the data dependencies and reduce the feature set contained in the dataset. Hybrid Particle Genetic Swarm Optimization was used to optimize the selected features of brain cancer at different stages. A multi-class SVM was adopted to classify normal tissue or different stages of brain cancer using the optimized feature set. The brain cancer dataset was composed of 12042 genes with 493 instances. The multi-class SVM, ANN and Naïve Bayes classifiers achieved accuracies of 96%, 93% and 90% respectively.

Paper [25] compared different machine learning algorithms, SVM, C4.5, NB and kNN, on the WBCD dataset, which has 699 instances and 11 integer-valued attributes. Among all the algorithms, SVM gave the highest accuracy of 97.13% and the lowest error rate. The experiments were conducted in the WEKA data mining tool.

III. METHODS

The following steps are performed for the early detection of cancerous cells in human beings.

A. DATASET
The datasets were collected from existing databases, which already hold cancer cell data from which cancer cells can be identified. Many repositories, such as Kaggle, host such data.

B. PRE-PROCESSING
Pre-processing is the process of improving the quality of the datasets that are going to be used for training and testing the machine. It mainly includes thresholding, filtering and log transformation.

C. CLASSIFICATION TECHNIQUES
Different types of classification techniques are used to detect cancerous cells. The following are a few of the techniques used:

i) Stochastic Gradient Descent (SGD)

ii) Support Vector Machines (SVM- Linear Kernel)

iii) Support Vector Machines (SVM- Gaussian Kernel)

iv) Convolutional Neural Network (CNN)

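The pre-processing steps named in Section III-B can be illustrated with a minimal sketch. It applies thresholding and a log transformation to one feature column and then standardizes it with z = (X - μ)/σ, the form used later for the breast cancer dataset. The clipping bounds `floor` and `ceiling`, the base-2 logarithm and the sample values are assumptions chosen only for illustration; the filtering step is not shown because the paper does not say which filter was used.

```python
import math
from statistics import mean, pstdev


def preprocess(values, floor=1.0, ceiling=16000.0):
    """Threshold, log-transform, then standardize one feature column."""
    # Thresholding: clip readings into [floor, ceiling] (hypothetical bounds).
    clipped = [min(max(v, floor), ceiling) for v in values]
    # Log transformation: compress the dynamic range of the readings.
    logged = [math.log2(v) for v in clipped]
    # Standardization: z = (x - mu) / sigma.
    mu = mean(logged)
    sigma = pstdev(logged)
    if sigma == 0:
        return [0.0 for _ in logged]
    return [(x - mu) / sigma for x in logged]


# Made-up sample column, not data from the paper.
column = [0.5, 2.0, 8.0, 32.0, 50000.0]
z = preprocess(column)
```

After this step the column has zero mean and unit standard deviation, so no single feature dominates a distance-based classifier merely because of its scale.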
D. Classification Techniques used

i) Stochastic Gradient Descent (SGD)- forms the basis of neural networks. It is an iterative algorithm that starts from a random point on a function and then travels down its slope in steps until it reaches the lowest point of that function. This algorithm is useful in cases where the optimal points cannot be found by equating the slope of the function to zero.
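As a minimal sketch of the idea, the code below uses SGD to fit a straight line y = w*x + b to noise-free points. It starts from a random (w, b) and takes one small step down the squared-error slope per training point. The function name `sgd_fit_line`, the learning rate, the epoch count and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import random


def sgd_fit_line(xs, ys, lr=0.1, epochs=500, seed=0):
    """Fit y = w*x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    # Start from a random point on the loss surface.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a random order each epoch
        for i in order:
            err = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * err**2 with respect to w and b.
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b


# Toy data on the line y = 2x + 1 (made up for this example).
xs = [i / 19 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_fit_line(xs, ys)
```

Each update uses a single sample instead of the whole dataset, which is what distinguishes stochastic gradient descent from ordinary (batch) gradient descent.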

Fig. 1 Custom Dataset for Breast Cancer

ii) Support Vector Machines (SVM- Linear Kernel)- SVM with a linear kernel learns a linear decision boundary in the original feature space. It can be used for both classification and regression problems. In this algorithm, we plot each data item as a point in x-dimensional space (where x is the number of features), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyper-plane that differentiates the two classes very well.
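The hyper-plane search can be sketched with sub-gradient descent on the L2-regularised hinge loss, which is the primal objective of a linear SVM. This is a simplified stand-in for a full SVM solver, not the implementation used in the paper. The function names, the hyper-parameters `lr`, `lam` and `epochs`, and the eight two-dimensional toy points are assumptions for illustration.

```python
def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=1000):
    """Sub-gradient descent on the L2-regularised hinge loss."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularisation shrinks w a little on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Point is misclassified or inside the margin: move the plane.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyper-plane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Two well-separated toy clusters (made up for this example).
points = [(0, 0), (1, 0), (0, 1), (1, 1), (4, 4), (5, 4), (4, 5), (5, 5)]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, p) for p in points]
```

Here the label -1 could stand for benign and +1 for malignant; the learned pair (w, b) defines the separating hyper-plane w·x + b = 0.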

iii) Support Vector Machines (SVM- Gaussian Kernel)- The Gaussian kernel is another popular kernel method used in SVM models. It is a function whose value depends on the distance from the origin or from some point. The Gaussian kernel has the following form:

K(X1, X2) = exp(-||X1 - X2||^2 / (2σ^2))

where ||X1 - X2|| is the Euclidean distance between X1 and X2 and σ is the width of the kernel.
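A minimal sketch of a Gaussian (RBF) kernel follows, assuming the standard form exp(-||X1 - X2||^2 / (2σ^2)). The function name `gaussian_kernel` and the width parameter `sigma` are illustrative names, not identifiers from the paper.

```python
import math


def gaussian_kernel(x1, x2, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    # Squared Euclidean distance ||x1 - x2||^2.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


same = gaussian_kernel((1.0, 2.0), (1.0, 2.0))           # identical points
far = gaussian_kernel((0.0, 0.0), (3.0, 4.0), sigma=5.0)  # distance 5
```

The kernel equals 1 for identical points and decays towards 0 as the distance grows, so it behaves as a similarity score; a smaller `sigma` makes the decay faster and the decision boundary more flexible.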

Fig. 2 Difference between malignant and benign cancers

iv) Convolutional Neural Networks (CNN) - A CNN is a type of artificial neural network that is used in image recognition and processing. Its applications in machine vision include image and video recognition, and it is also used in recommendation systems and natural language processing. A CNN uses a system much like a multilayer perceptron that has been designed for reduced processing requirements. It consists of an input layer, an output layer, and hidden layers that include multiple convolutional layers, pooling layers, fully connected layers and normalization layers.
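To make the roles of the convolutional and pooling layers concrete, the sketch below runs one convolution, one ReLU activation and one 2x2 max-pooling step on a tiny image in plain Python. The image, the vertical-edge kernel and the function names are assumptions for illustration. A real CNN learns its kernels during training and stacks many such layers.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, the operation used in convolutional layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# 5x5 toy image with a dark-to-bright vertical edge (made up for this example).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges
features = relu(conv2d(image, kernel))
pooled = max_pool(features)
```

The convolution produces a strong response only where the edge lies, and pooling halves the resolution while keeping that response, which is how these layers reduce the processing requirements mentioned above.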

IV. EXPERIMENT and RESULTS

A. For Brain Cancer


i) Input MRI Image- The image acquisition stage starts with taking a group of images from the database. The images are held in MATLAB and displayed as grey scale images.

Fig 3. Picture depicting images of a brain that has brain cancer

The above figures depict the different types of dataset that are taken into account while predicting the different types of cancer, namely brain cancer and breast cancer. The models are trained and then tested on these datasets in order to remove any errors still persisting in the model that might interfere with the final results. The accuracy of the different algorithms is also evaluated in order to get a clear picture of which algorithm is better.

ii) Pre-Processing - The main task of pre-processing is to enhance the input image and make it suitable for either a human or a machine vision system. Pre-processing helps to improve parameters of MR images such as the signal-to-noise ratio (SNR) by removing noise artifacts, smoothing inner regions and preserving edges. To enhance the SNR and the clarity of raw MR images, we apply adaptive contrast enhancement based on a modified sigmoid function.

iii) Feature Extraction - It is the process of collecting higher-level information about an image such as shape, texture, colour and contrast. Texture analysis is a very important parameter for human perception and for machine learning algorithms. It is used effectively to improve the accuracy of the diagnosis system by selecting prominent features. One of the most widely used image analysis techniques is the Grey Level Co-occurrence Matrix (GLCM) with its texture features.

iv) Classification - The classification of MRI images is the more difficult task in the automated detection of tumor images. Classification determines whether or not the image contains a tumor. Several classifiers can be applied for this purpose.

Fig 5. Histogram plotted for different class labels used
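The GLCM named in the Feature Extraction step above can be sketched as follows. The code counts how often a grey level i has a grey level j as its right-hand neighbour, normalizes the counts to probabilities, and derives two common texture features, contrast and energy. The 3x3 image with three grey levels, the horizontal offset and the function names are assumptions for illustration. The paper does not state which offsets or features were used.

```python
def glcm(image, levels, dr=0, dc=1):
    """Normalized grey level co-occurrence matrix for offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count the pair (pixel, neighbour at the given offset).
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weights each pair by its squared grey-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def energy(p):
    """Sum of squared probabilities; high for uniform textures."""
    return sum(v * v for row in p for v in row)


# Tiny quantized image with grey levels 0..2 (made up for this example).
image = [
    [0, 0, 1],
    [1, 2, 2],
    [2, 2, 0],
]
p = glcm(image, levels=3)
texture_features = (contrast(p), energy(p))
```

Feature values such as these can then be passed to any of the classifiers described in Section III.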
B. For Breast Cancer

i) The Dataset - The machine learning algorithms were trained to detect breast cancer using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The dataset consists of features that were calculated from a digitized image of a fine needle aspirate (FNA) of a breast mass. These features describe the characteristics of the cell nuclei. The dataset features are as follows: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension.

ii) Dataset Pre-processing - To avoid inappropriate assignment of relevance, the dataset was standardized as:

z = (X - μ)/σ

where X is the feature to be standardized, μ is the mean value of the feature, and σ is the standard deviation of the feature.

iii) Machine Learning Algorithms - For the detection of breast cancer three different algorithms are used, namely stochastic gradient descent with an accuracy of 95.53%, SVM with linear kernel with an accuracy of 96.49% and SVM with Gaussian kernel with an accuracy of 97.56%.

Results

Fig. 4 Graph plot of true and false positive rate in SVM with linear kernel

Discussions

In this paper we have proposed a machine learning model and used different machine learning algorithms for the diagnosis and detection of breast and brain cancers. For pre-processing of the dataset, we used the standardization method. The dataset is extracted automatically from websites like Kaggle. We then implemented our ML algorithms and achieved 94.56% accuracy.

V. CONCLUSION

From this survey we conclude that most automatic cancer prediction systems are based on machine learning concepts, including classification and clustering algorithms. This paper presents an extensive review of various machine learning classification techniques for the prediction of cancer and of the standard datasets used for various types of cancer such as brain cancer and breast cancer. The results reported by many researchers using various computational intelligence techniques have been compiled. The most successful approach is SVM, alone or in combination with other techniques, which gave up to 99% accuracy on a small number of training datasets; this is not a reliable indication of performance on large datasets. However, there is scope for improving the prediction of cancer at an early stage, and many datasets are available for further exploration. There are a huge number of cancer types with unknown and invariable functions.

VI. ACKNOWLEDGEMENT

We are highly indebted to Ms. Tanu Shree for her guidance and constant supervision. We are also highly thankful to her for providing necessary information regarding the project and for her invaluable support in completing the project. We are extremely indebted to Dr. Vishnu Sharma, HOD, Department of Computer Science and Engineering, GCET and Dr. Jaya Sinha, Project Coordinator, Department of Computer Science and Engineering, GCET for their valuable suggestions and constant support throughout our project tenure. We would also like to express our sincere thanks to all faculty and staff members of the Department of Computer Science and Engineering, GCET.

REFERENCES

[1] World Health Organization (WHO), Cancer Fact Sheet, http://www.who.int/mediacentre/factsheets/fs297/en/ (Accessed on: October 2017), 2017.
[2] Q. Ping, C. C. Yang, S. A. Marshall, N. E. Avis, and E. H. Ip, "Breast cancer symptom clusters derived from social media and research study data using improved K-Medoid clustering," in IEEE Transactions on Computational Social Systems, vol. 3, no. 2, pp. 63–74, June 2016.
[3] Alpaydin E. Introduction to Machine Learning. MIT press; 2009.
[4] Linthicum KP, Schafer KM, Ribeiro JD. Machine learning in suicide science: Applications and ethics. Behav Sci Law. 2019;37(3):214-222.
[5] Marsland, S. (2009) Machine Learning an Algorithmic Perspective. CRC Press, New Zealand, 6-7.
[6] Y. Hu, K. Ashenayi, R. Veltri, G. O'Dowd, G. Miller, R. Hurst and R. Bonner, "A Comparison of Neural Network and Fuzzy c-Means Methods in Bladder Cancer Cell Classification", Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94), pp 3461-3466, ISBN: 0-7803-1901-X, DOI: 10.1109/ICNN.1994.374891, http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=374891
[7] Sung-Bae Cho, Hong-Hee Won, "Machine Learning in DNA Microarray Analysis for Cancer Classification", First Asia-Pacific Bioinformatics Conference, Adelaide, Australia. Conferences in Research and Practice in Information Technology, Vol. 19, 2003.
[8] Rui Xu, Xindi Cai, Donald C. Wunsch II, "Gene Expression Data for DLBCL Cancer Survival Prediction with A Combination of Machine Learning Technologies", Proceedings of the 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, 2005, pp 894-897, ISBN 0780387406.
[9] P. Rajeswari, G. Sophia Reena, "Human Liver Cancer Classification using Microarray Gene Expression Data", International Journal of Computer Applications (0975–8887), Volume 34, No.6, November 2011, pp 25-37.
[10] Jayashree Dev, Sanjit K Dash, Swet Dash, Madhusmita Swain, "A Classification Technique for Microarray Gene Expression Data using PSO-FLANN", International Journal on Computer Science and Engineering (IJCSE), Vol. 4 No. 09, Sep 2012, ISSN: 0975-3397, pp 1534-1539.
[11] Barnali Sahu, Debahuti Mishra, "A Novel Feature Selection Algorithm using Particle Swarm Optimization for Cancer Microarray Data", International Conference on Modeling Optimization and Computing (ICMOC-2012), ELSEVIER Procedia Engineering 38 (2012) pp 27–31.
[12] S. Mishra, C. D. Kaddi, and M. D. Wang, "Pan-cancer analysis for studying cancer stage using protein and gene expression data," in 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, 2016, pp. 2440–2443.
[13] Cuong Nguyen, Yong Wang, Ha Nam Nguyen, "Random forest classifier combined with feature selection for breast cancer diagnosis and prognostic", J. Biomedical Science and Engineering, 2013, 6, pp 551-560, DOI: http://dx.doi.org/10.4236/jbise.2013.65070
[14] Ammu P K, Preeja V, "Review on Feature Selection Techniques of DNA Microarray Data", Intl. J. of Computer Applications (0975–8887), Volume 61, No.12, January 2013, pp 39-44.
[15] B.M. Gayathri, C.P. Sumathi, T. Santhanam, "Breast cancer diagnosis using machine learning algorithms–a survey", International Journal of Distributed and Parallel Systems (IJDPS), Vol.4, No.3, May 2013, pp 105-112.
[16] Ahmad LG, Eshlaghy AT, Poorebrahimi A, Ebrahimi M, Razavi AR, "Using Three Machine Learning Techniques for Predicting Breast Cancer Recurrence", Open Access, Journal of Health & Medical Informatics 2013, vol 4, issue 2. ISSN: 2157-7420, http://dx.doi.org/10.4172/2157-7420.1000124
[17] P. Ramachandran, N. Girija, T. Bhuvaneswari, "Early Detection and Prevention of Cancer using Data Mining Techniques", International Journal of Computer Applications (0975–8887), Volume 97, No.13, July 2014, pp 48-53.
[18] Mehdi Pirooznia, Jack Y Yang, Mary Qu Yang, Youping Deng, "A comparative study of different machine learning methods on microarray gene expression data", BMC Genomics, Open Access BioMed Central, 2008, International Conference on Bioinformatics & Computational Biology (BIOCOMP'07) Las Vegas, NV, USA. 25-28 June 2007, DOI: 10.1186/1471-2164-9-S1-S13.
[19] Arunanand T A, Abdul Nazeer K A, Mathew J P, Meeta Pradhant, "A Nature-inspired Hybrid Fuzzy C-means algorithm for Better Clustering of Biological Data Sets", IEEE International Conference on Data Science & Engineering (ICDSE '14), pp 76-82.
[20] Vikas Chaurasia, Saurabh Pal, "Data Mining Techniques: To Predict and Resolve Breast Cancer Survivability", International Journal of Computer Science and Mobile Computing, Vol.3 Issue.1, January 2014, pg. 10-22, ISSN 2320–088X.
[21] P. Yasodha, N.R. Anathanarayanan, "Analysing Big Data to Build Knowledge Based System for Early Detection of Ovarian Cancer", Indian Journal of Science and Technology (IJST), Vol 8(14), July 2015, ISSN 0974-5645, DOI: 10.17485/ijst/2015/v8i14/65745
[22] Ammu P K, Siva Kumar K C, Sathish M, "A BBO Based Feature Selection Method for DNA Microarray", International Journal of Research Studies in Biosciences (IJRSB), Volume 3, Issue 1, January 2015, pp 201-204, ISSN 2349-0365.
[23] K. Sivakami, "Mining Big Data: Breast Cancer Prediction using DT-SVM Hybrid Model", International Journal of Scientific Engineering and Applied Science (IJSEAS), Volume-1, Issue-5, August 2015, pp 418-429, ISSN: 2395-3470.
[24] Kar, Subhajit, Kaushik Das Sharma, and Madhubanti Maitra, "A particle swarm optimization based gene identification technique for classification of cancer subgroups," in 2nd IEEE International Conference on Control, Instrumentation, Energy and Communication (CIEC), 2016.
[25] Hiba Asri, Hajar Mousannif, Hassan Moatassime, Thomas Noel, "Using Machine Learning Algorithms for Breast Cancer Risk Prediction and Diagnosis", ELSEVIER 6th International Symposium on Frontiers in Ambient and Mobile Systems (FAMS 2016), Procedia Computer Science 83 (2016) pp 1064–1069.

TOP SOURCES
The sources with the highest number of matches within the submission. Overlapping sources will not be displayed.

1. ijsret.com (Internet): 42%
2. A. Keerthana, B. Kavin Kumar, K.S Akshaya, S. Kamalraj. "Brain Tumour... (Crossref): 7%
3. towardsdatascience.com (Internet): 3%
4. Priyank Hajela, Ambika Vishal Pawar, Swati Ahirrao. "Deep Learning for... (Crossref): 3%
5. ijrat.org (Internet): 3%
6. arxiv.org (Internet): 3%
7. Liverpool John Moores University on 2021-08-15 (Submitted works): 2%
8. mdpi.com (Internet): 1%
9. Queen Mary and Westfield College on 2022-03-13 (Submitted works): <1%
10. coursehero.com (Internet): <1%
11. Florida Institute of Technology on 2021-12-16 (Submitted works): <1%
12. University of Wales Institute, Cardiff on 2019-01-16 (Submitted works): <1%
13. University of Technology on 2019-10-22 (Submitted works): <1%
14. deepai.org (Internet): <1%
15. repository.tudelft.nl (Internet): <1%
16. ijert.org (Internet): <1%
17. "Proceedings of the International Conference on Big Data, IoT, and Ma... (Crossref): <1%
18. "Intelligent Systems and Computer Technology", IOS Press, 2020 (Crossref): <1%
19. researchgate.net (Internet): <1%

EXCLUDED TEXT BLOCKS

Department of Computer Science andEngineeringGalgotias College of Engineering...


Tanu Shree, Rajiv Kumar, Nikhil Kumar. "Green Computing in Cloud Computing", 2020 2nd International Conf...

Department of Computer Science andEngineeringGalgotias College of Engineering...


Shri Guru Gobind Singhji Institute of Engineering and Technology on 2018-03-15

Department of Computer Science andEngineeringGalgotias College of Engineering...


www.ijert.org
