Deep Learning of Mitosis From Cancer Tissue Images Using Convolutional Neural Networks
2021. 05. 23
Korean Abstract
The 16th Korea Scholars’ Conference for Youth
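…exhibit abnormal mitotic activity and fail to detect pathogenic mutations. Researchers often hypothesize that using deep learning to develop a model that can detect cancer cells can improve the prognosis and reduce the fatality of this disease. Using convolutional neural networks and a multilayer perceptron, this study can accurately identify non-mitotic cells (the cause of cancer cells) in mitotic cell images. In this study, the model is trained on the MITOS dataset, and the test set images are then classified using K-means clustering. The model divides the images into five clusters of cells with similar features and then, by classifying similar cells, …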
-Abstract-
Key Words: Deep learning, Mitosis, Artificial Intelligence, Convolutional Neural Networks, Cancer
Cancer is the result of unregulated cell duplication. Cancer cells often ignore the cell death cycle, show abnormal mitotic activity, and fail to detect pathogenic mutations. Researchers often hypothesize that using deep learning to build a model that detects cancer cells can improve the prognosis and reduce the fatality of the disease. Using convolutional neural networks and a multilayer perceptron, this study identifies non-mitotic cells (the cause of cancer cells) among mitotic cell images. The study trains the model on the MITOS dataset and then uses K-means clustering to classify the test set images. The model is shown to classify mitotic and non-mitotic cells effectively by first dividing the images into five clusters of cells with similar features and then classifying mitotic vs. non-mitotic cells within each cluster. The model achieves a 77% validation accuracy and a classification accuracy of over 90% in two of the five clusters formed from the data. However, accuracy in the remaining three clusters leaves room for improvement. This study can be a stepping stone toward an accurate and clinically significant deep learning model that helps diagnose and detect cancerous activity before the onset of cancer in patients.
Contents
I. Introduction
V. Conclusion
VI. Reference List
I. Introduction
Despite advances in medicine and healthcare, cancer remains a widely feared and prevalent disease. It accounts for about 25% of all deaths in most medically and economically developed countries and is the second leading cause of death from disease, after circulatory disease. The etiology of cancer is obscure because it has no single, clear cause; it can be caused by carcinogens, viruses, genetics, and countless other factors. Cancer cells ignore the cell death cycle and divide abnormally, disrupting the function of the human body. These cells can originate from any living tissue in the body and display
Mitosis is the process by which a cell divides into two daughter cells; it takes place during the M phase of the cell cycle. The cell cycle is divided into four phases: G1, S, G2, and M. Between phases there are checkpoints at which the cell verifies that the preceding process was completed correctly before moving forward. If a cell cannot pass the checkpoint between the G1 and S phases, it enters a dormant state known as the G0 phase. If the cell remains unable to divide, it may eventually undergo apoptosis, or programmed cell death. Cells that exhibit mutations or other abnormalities are typically stopped at this checkpoint so that the mutation is not passed on through division.
Cancer cells are mutated, abnormal cells that pass this checkpoint regardless and keep dividing, causing cancerous cells to proliferate within the tissue and surrounding region. Over time, a tumor can develop, and cancerous cells can migrate to other organs via the lymphatic or circulatory system and metastasize throughout the body. Once metastasis occurs, meaning the cancer has spread beyond its original site to other organs, treatment becomes extremely difficult. Clinically, cancer is classified into four stages according to how far it has spread in the patient. It is well established that the earlier the stage of cancer, the easier it is to treat and the better the prognosis. Because no cancer treatment guarantees long-term survival without relapse, physicians and healthcare providers focus on early detection.
A key limitation of cancer detection is that it requires human work and labor: diagnosis depends on active, deliberate examination. Researchers hypothesize that artificial intelligence could allow pathologists to detect cancer cells even before stage 1 or before any symptoms appear. Building such a model requires a dataset of cells that have already been identified as cancerous on which to train it; this dataset is called the training set. Since cancer can arise in any living cell, from any type of tissue and any patient, cancer cells come with countless features.
Results on various image recognition tasks are emerging, and studies on cancer cell image recognition and classification in particular have made rapid progress. From a medical point of view, deep learning has not yet reached a recognition rate or accuracy reliable enough to remove the need for human review. However, if deep learning first completes an initial classification of the cell images, the pathologist can concentrate on the low-confidence or ambiguous images, which can significantly reduce the pathologist's fatigue and time spent.
In this paper, we aim to develop an algorithm that detects mitotic and non-mitotic cells using supervised and unsupervised learning. Developing algorithms that detect cancer cells directly is too difficult, because cancer cells vary widely and show diverse, unpredictable traits. We therefore focus on cell mitosis, since cancer cells typically arise from errors in DNA duplication, which is part of the cell division cycle. Even mitosis detection is difficult: mitotic nuclei vary greatly in appearance and often resemble non-mitotic nuclei and other organelles. Therefore, the machine learning process will have multiple layers to categorize random cells into different groups based on
The goal of this study is to test and validate the effectiveness of deep learning models in classifying mitotic and non-mitotic cells. A deep learning classification model is trained on a labeled training set and then validated on a separate set of similar data. The main logic behind this model is to first use supervised learning, which starts from data with known responses and discovers the relationship between the data (cell images) and the response (label). After the features and this relationship have been refined, the model is given a test set and uses the learned features to assign a label to each data point. The labels the model predicts can then be compared with the actual labels to calculate the test accuracy.
The convolutional neural network was first tested on the MNIST dataset, which consists of small, 28×28-pixel grayscale images of handwritten digits (0 to 9). The training curve was also observed using transfer learning, which is a way to speed up training and
The learning method used in this study to extract features from the data is the convolutional neural network (CNN). Within this study, a pre-existing Python implementation of VGGNet was used. Convolutional neural networks stack convolutional layers, each of which maps its input to a set of feature maps, with max pooling layers interleaved to downsample those maps. As the layers progress, the depth, that is, the number of kernels (and therefore output channels) per layer, increases. Using more layers increases the number of features and can improve accuracy; however, using too many features can cause overfitting. The VGGNet variant used here, VGG19, has 19 weight layers: 16 convolutional layers and 3 fully connected layers.
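The convolution and max pooling operations described above can be sketched in pure Python. This is a toy illustration under stated assumptions, not VGGNet itself: it applies a single hypothetical 2×2 edge-detecting kernel to a made-up 5×5 patch, whereas VGGNet uses many 3×3 kernels per layer followed by 2×2 max pooling. Like most deep learning libraries, the `conv2d` helper computes cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers (cross-correlation, no flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise ReLU activation: negative responses become 0."""
    return [[max(0, x) for x in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window (stride = size)."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + u][j * size + v] for u in range(size) for v in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 5x5 grayscale patch (dark left, bright right) and a vertical-edge kernel.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
kernel = [[-1, 1], [-1, 1]]

features = relu(conv2d(image, kernel))  # 4x4 feature map
pooled = max_pool2d(features)           # 2x2 after pooling
print(features)  # -> [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
print(pooled)    # -> [[2, 0], [2, 0]]
```

The feature map responds only where brightness changes, and pooling halves each spatial dimension while keeping that strongest response. Applying several kernels in the same layer would produce several feature maps, which is what the growing depth of later layers refers to.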
When the cell images were classified through transfer learning, the existing CNN model VGG19 was used. VGGNet variants are named after their number of weight layers: VGG16 has 16 and VGG19 has 19. The VGGNet study observed that classification error decreases as depth increases from 11 to 13, 16, and 19 layers. In other words, the deeper it is, the
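The naming rule can be made concrete by encoding the four VGG configurations compared by Simonyan and Zisserman (configurations A, B, D, and E in their paper) and counting weight layers. The dictionary and function names are illustrative, not a library API. Only layers with learned weights (convolutional and fully connected) are counted, which is why VGG19 has 16 convolutional layers rather than 19.

```python
# Layer configurations from the VGG paper (Simonyan & Zisserman, 2014).
# Integers are 3x3 convolutional layers with that many output channels;
# "M" marks a 2x2 max-pooling layer, which has no weights.
VGG_CONFIGS = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}
NUM_FC_LAYERS = 3  # every VGG variant ends in three fully connected layers


def weight_layer_count(config):
    """Return (convolutional layers, total weight layers) for a VGG configuration."""
    conv = sum(1 for layer in config if layer != "M")
    return conv, conv + NUM_FC_LAYERS


for name, config in VGG_CONFIGS.items():
    conv, total = weight_layer_count(config)
    print(f"{name}: {conv} conv + {NUM_FC_LAYERS} FC = {total} weight layers")
# VGG11: 8 conv + 3 FC = 11 weight layers
# VGG13: 10 conv + 3 FC = 13 weight layers
# VGG16: 13 conv + 3 FC = 16 weight layers
# VGG19: 16 conv + 3 FC = 19 weight layers
```

The channel counts also show the growing depth described for CNNs in general: they double after each pooling stage, from 64 up to 512.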
The data from the classification were then extracted and plotted with UMAP. UMAP (Uniform Manifold Approximation and Projection) is a manifold learning technique for nonlinear dimension reduction. It approximates the manifold on which the data lie by covering it with balls of some radius centered on the data points. With real, finite data, however, a single fixed radius may not cover the manifold as intended: where points are sparse, the balls leave gaps and disconnected pieces, and where points are dense, they overlap so heavily that the approximation becomes coarser and more connected than intended. If the data were uniformly distributed across the manifold, choosing a suitable radius would be easy, since the average distance between points would work well. Under this uniformity assumption, the cover spans the entire manifold with no gaps and no unnecessarily disconnected components. UMAP makes this assumption hold locally by rescaling distances around each point according to the distances to its nearest neighbors.
The plotted data were then grouped with K-means clustering, an analysis method based on the distance, or similarity, between observations. The data were grouped into several clusters in order to understand the structure of the entire dataset. K-means separates the data into K groups so that the distance from each point to the center of its group is minimal; in other words, it minimizes the within-cluster variance, the sum of squared distances between each point and its cluster center, and an algorithm
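K-means as described above is usually computed with Lloyd's algorithm, which alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, until the centroids stop changing. The sketch below is a minimal pure-Python version, assuming 2-D points such as UMAP coordinates and caller-supplied initial centroids; the function names and data are hypothetical, and the toy example uses K = 2 for readability rather than the five clusters reported in the abstract.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    points: list of equal-length tuples (e.g. 2-D UMAP coordinates).
    initial_centroids: K starting centers, fixed here for reproducibility.
    Returns (centroids, labels).
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, labels


def within_cluster_sse(points, centroids, labels):
    """The quantity K-means minimizes: sum of squared point-to-centroid distances."""
    return sum(math.dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))


# Hypothetical 2-D embedding with two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids, labels = kmeans(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # -> [0, 0, 0, 1, 1, 1]
print(centroids)
print(within_cluster_sse(points, centroids, labels))
```

Each iteration can only lower (or keep) the within-cluster sum of squares, which is why the loop converges. A library implementation such as scikit-learn's `KMeans(n_clusters=5)` performs the same two steps but chooses starting centroids automatically (k-means++ by default) instead of requiring them from the caller.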
the use of supervised and unsupervised learning on cell images is expected to show good potential for classifying mitotic and non-mitotic cells. In the future, this research can contribute to a wide variety of improvements and tools in the medical and oncology fields. Currently, cancer is not diagnosed until a tumor or a significant number of cell mutations has appeared, because non-mitotic cells are still hard to detect with current diagnostic technology. A model that detects non-mitotic cells could lead to software, diagnostic machines, and tools that allow physicians and pathologists to detect abnormal mitotic activity and warn patients of the possibility of developing cancer in the future. It could also lead to surgical tools and other methods that detect and destroy these abnormal cells before the cancer has progressed. This research could be at the cutting edge of the medical field and increase the
V. Conclusion
The next step is to refine the models and test them on more data to improve the features the model learns. First, the models themselves should be improved. Because we could only use preset models, we did not have the flexibility to add layers or make any other modifications to them. Second, more diverse data should be collected. Since abnormal and normal mitotic activity involve countless variables and features, a dataset with many diverse cell types would make the model less prone to overfitting and reduce the risk of bias and of confounding features or factors. This could lead to a broader, more accurate learning model that can correctly identify whether any given cell is normal. The main limitation would be data collection: there are so many different cells from all types of people and animals that it would be hard to create such a diverse data set
VI. Reference List
Shen, D., Wu, G., & Suk, H.-I. (2017). Deep learning in medical image analysis. Annual Review of Biomedical Engineering, 19, 221-248.
Giusti, A., et al. (2014). A comparison of algorithms and humans for mitosis detection. In 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI). IEEE.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426.