Deep Learning of Mitosis From Cancer Tissue Images Using Convolutional Neural Networks

The 16th Korea Scholars’ Conference for Youth
Deep Learning of Mitosis from Cancer Tissue Images using Convolutional Neural Networks
- Detecting mitotic and non-mitotic cells using machine learning -
2021. 05. 23
Northfield Mount Hermon School

Cheongna Dalton International School
Euijin Lee & Jinhyuk Jang
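Abstract (Korean version, translated)
Key words: deep learning, mitosis, artificial intelligence, convolutional neural networks, cancer
Cancer is the result of unregulated cell replication. Cancer cells often ignore the death cycle,
show abnormal mitotic activity, and fail to detect pathogenic mutations. Researchers often hypothesize
that using deep learning to develop a model that can detect cancer cells can improve the prognosis and
fatality of the disease. Using convolutional neural networks and a multilayer perceptron, this study
identifies non-mitotic cells in mitotic cell images. The study trains the model on the MITOS dataset and
then uses K-means clustering to classify the test-set images. The model divides the images into five
clusters of cells with similar features and then classifies mitotic versus non-mitotic cells within those
groups. The model shows a validation accuracy of 77% and a classification accuracy of over 90% for two
groups created within the data. In the remaining three clusters, however, the accuracy leaves room for
improvement. This study can be a stepping stone toward an accurate and clinically significant deep
learning model that helps diagnose and detect cancerous activity before the onset of cancer in patients.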

-Abstract-
Key Words: Deep learning, Mitosis, Artificial Intelligence, Convolutional Neural Networks, Cancer
Cancer is the result of unregulated duplication of cells. Cancer cells often ignore the death cycle, show
abnormal mitotic activity, and fail to detect pathogenic mutations. Researchers often hypothesize that a
deep learning model able to detect cancer cells could improve the prognosis and lower the fatality of the
disease. Using convolutional neural networks and a multilayer perceptron, this study distinguishes
non-mitotic cells (which the study associates with cancer) from mitotic cells in cell images. The model is
trained on the MITOS dataset, and K-means clustering is then used to classify the test-set images. The
model divides the images into five clusters of cells with similar features and then classifies mitotic
versus non-mitotic cells within those groups. It shows a validation accuracy of 77% and a classification
accuracy of over 90% for two of the groups created within the data. In the remaining three clusters,
however, the accuracy leaves room for improvement. This study can be a stepping stone toward an accurate
and clinically significant deep learning model that helps diagnose and detect cancerous activity before the
onset of cancer in patients.
Contents
1. Introduction
2. Purpose and Significance of the Study
3. Methods of Analysis and Data Collection
4. Expected Results and Limitations
5. Conclusion
6. Reference List
I. Introduction
Despite advances in medicine and healthcare, cancer remains a feared and widespread disease. It
accounts for about 25% of all deaths in most medically and economically developed countries and is the
second leading cause of death from disease, after circulatory disease. The etiology of cancer is obscure
because there is no single clear cause; it can arise from carcinogens, viruses, genetics, and many other
factors. Cancer cells ignore the death cycle and duplicate abnormally, disrupting the functions of the
human body. These cells can originate from any living tissue in the body and display abnormal mitotic
activity.
Mitosis is the process by which a cell divides. The cell cycle is divided into stages: G1 phase, S
phase, G2 phase, and M phase. Between phases there are checkpoints at which the cell verifies that the
preceding step was completed correctly before moving forward. If a cell cannot pass the checkpoint between
the G1 and S phases, it enters a dormant phase (known as the G0 phase). If the cell ultimately remains
unable to divide, it undergoes apoptosis, or programmed cell death. Cells that carry mutations or other
abnormalities are usually stopped at this checkpoint so that the mutation is not passed on.
Cancer cells are mutated, abnormal cells that pass this checkpoint regardless and keep duplicating,
causing cancerous cells to proliferate within the tissue and the surrounding region. Over time a tumor can
develop, and the cancerous cells can migrate to other organs via the lymphatic or circulatory system and
metastasize throughout the body. Once metastasis occurs, meaning the cancer has escaped its original
location and spread to other organs, treatment is extremely difficult. Clinically, cancer is classified
into four stages according to how far it has spread through the patient. It is well established that the
earlier the stage, the easier the cancer is to treat and the better the prognosis. Since no cancer
treatment guarantees long-term survival without relapse, physicians and healthcare providers focus on
early detection.
A limitation of cancer detection is that it depends on human labor: diagnosis requires active and
deliberate examination. Researchers hypothesize that artificial intelligence could allow pathologists to
detect cancer cells even before stage 1 or the appearance of any symptoms. A limitation in building such a
tool is that it requires a dataset of cells already identified as cancerous with which to train the model;
this is called the training set. Because cancer can arise in any living cell, from any type of tissue and
any patient, cancer cells come with countless features and can differ widely from one another.
II. Purpose and Significance of the Study

Research using convolutional neural networks is progressing actively, and results across a variety
of image recognition tasks are emerging. In particular, studies on recognizing and classifying cancer cell
images have made rapid progress. From a medical point of view, deep learning does not yet offer a
recognition rate or accuracy high enough to dispense with human review. However, if deep learning first
completes an initial classification of the cells, the pathologist can concentrate on the low-accuracy or
ambiguous images, which can significantly reduce the pathologist's fatigue and time spent.
In this paper, we aim to develop an algorithm that distinguishes mitotic from non-mitotic cells using
supervised and unsupervised learning. Developing algorithms that detect cancer cells directly is too
difficult, because cancer cells come in a large variety and show different and seemingly random traits. We
focus instead on mitosis, since cancer cells typically arise from errors in DNA duplication, which is part
of the cell division process. Even mitosis detection is very difficult: mitotic nuclei vary widely in
appearance and often resemble non-mitotic nuclei and other cellular structures. The machine learning
process therefore uses multiple stages to sort cells into different groups based on its classification
methods.

The goal of this study is to test and validate the effectiveness of deep learning models in
classifying mitotic and non-mitotic cells. In a deep learning classification model, a labeled training set
is used to train the model, which is then validated on another set of similar data. The model first uses
supervised learning, which takes data with known responses and discovers the relationship between the data
and the response. After the features and that relationship have been refined, the model is applied to a
test set, where it uses the learned features to assign a response to each data point. The model's answers
are then compared with the true labels to calculate the test accuracy.
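The train-then-test logic described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the study's pipeline: the single numeric feature per cell, the synthetic data, and the threshold "model" are all hypothetical stand-ins for CNN features and a real classifier.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle labeled samples and split them into training and test lists."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """Toy supervised 'model': the midpoint between the two class means."""
    mitotic = [x for x, label in train if label == "mitotic"]
    other = [x for x, label in train if label == "non-mitotic"]
    return (sum(mitotic) / len(mitotic) + sum(other) / len(other)) / 2

def predict(threshold, x):
    # Assumes mitotic samples have the larger feature value (true for the data below).
    return "mitotic" if x > threshold else "non-mitotic"

def accuracy(threshold, data):
    """Fraction of samples whose predicted label matches the known label."""
    correct = sum(predict(threshold, x) == label for x, label in data)
    return correct / len(data)

# Hypothetical data: one made-up feature value per cell image, with its label.
rng = random.Random(1)
samples = [(rng.gauss(0.7, 0.1), "mitotic") for _ in range(100)]
samples += [(rng.gauss(0.3, 0.1), "non-mitotic") for _ in range(100)]

train, test = train_test_split(samples)     # labels are known for both sets
threshold = fit_threshold(train)            # learn from the training set only
test_accuracy = accuracy(threshold, test)   # compare predictions with true labels
print(f"test accuracy: {test_accuracy:.2f}")
```

The point of the split is that the test samples play no part in fitting, so the reported accuracy estimates how the model behaves on unseen cells.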
III. Methods of Analysis and Data Collection
This study tests and validates the effectiveness of deep learning models in classifying mitotic and
non-mitotic cells. A labeled training set is used to train the model, which is then validated on other sets
of similar data. The model first uses supervised learning, which involves data with a known response, to
discover the relationship between the data (cell images) and the response (label). After the features and
that relationship have been refined, the model is applied to a test set.
The convolutional neural network was first tested on the MNIST dataset. MNIST consists of grayscale
images of handwritten digits (0-9), each only 28x28 pixels in size. The training curve was also observed
under transfer learning, a technique that speeds up training and improves accuracy by building new models
on top of existing ones.
The learning method used in this study to extract features from the data is the convolutional neural
network. A pre-existing VGGNet model available in Python was used. Convolutional neural networks stack
convolutional layers, each of which produces feature maps, interleaved with max pooling layers that
downsample those maps. As the layers progress, the depth (the number of kernels applied to the pixels)
increases. Using more layers increases the number of features and can improve accuracy; however, too many
features can cause overfitting. The VGGNet variant used here, VGG19, has 19 weight layers (16
convolutional and 3 fully connected).
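To make the convolution and max pooling steps concrete, here is a small pure-Python sketch. It is illustrative only: the 6x6 image, the hand-written edge kernel, and the helper names are assumptions, whereas a real network such as VGGNet learns its kernels from data and runs on an optimized library.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation in a CNN conv layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise ReLU activation: negative responses become zero."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

# Hypothetical 6x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-written vertical-edge kernel: responds where brightness rises left to right.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = relu(conv2d(image, kernel))  # 4x4 map, strongest at the edge
pooled = max_pool2x2(feature_map)          # 2x2 map after downsampling

for row in feature_map:
    print(row)
print(pooled)
```

The feature map is non-zero only in the columns that straddle the dark-to-bright boundary, and pooling halves each spatial dimension while keeping the strongest response in each 2x2 window.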
When the cell images were classified through transfer learning, the existing CNN model VGG19 was
used. VGGNet is called VGG16 when it is composed of 16 layers and VGG19 when it is composed of 19 layers.
With this family of models, the classification error can be observed to decrease as the depth increases
through 11, 13, 16, and 19 layers. In other words, the deeper the network, the better its performance.
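The layer counts behind the VGG names can be checked with a short sketch. The lists below follow the commonly published VGG configurations (numbers are convolution output channels, "M" marks a max pooling layer); the variable and function names are illustrative, and the code only counts layers rather than building a network.

```python
# Convolutional configurations of the VGG family; "M" is a max pooling layer.
VGG_CONFIGS = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M",
              512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

# Every variant ends with the same three fully connected layers.
NUM_FC_LAYERS = 3

def weight_layers(config):
    """Count layers with learned weights: convolutions plus fully connected layers."""
    conv_layers = sum(1 for item in config if item != "M")
    return conv_layers + NUM_FC_LAYERS

depths = {name: weight_layers(cfg) for name, cfg in VGG_CONFIGS.items()}
for name, depth in depths.items():
    print(name, depth)
```

Pooling layers carry no weights, so they are excluded from the count; the number in each model name is the count of convolutional plus fully connected layers.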
The features obtained from the classification were then extracted and processed with UMAP plotting.
UMAP (Uniform Manifold Approximation and Projection) is a manifold learning technique for nonlinear
dimension reduction. It approximates the manifold on which the data lie by covering it with balls of some
radius centered on the data points. With real, finite data, a radius that is too small fails to cover the
manifold and leaves gaps, while a radius that is too large covers too much and yields a coarser
simplification than intended. If the data are uniformly distributed across the manifold, a suitable radius
is easy to choose: the average distance between points works well. Under that uniformity assumption, the
cover is guaranteed to span the entire manifold with no gaps and no unnecessarily disconnected components.
To group the plotted data, K-means clustering was used, an analysis method based on the distance or
similarity between the given observations. The data were divided into several groups in order to
understand the structure of the whole dataset. K-means bundles the given data into k predefined clusters,
assigning each point to the cluster whose center is nearest, and operates so as to minimize the variance
of the distances between the points in each cluster and that cluster's center.
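The K-means procedure described above can be sketched as follows. This is a minimal pure-Python illustration, not the study's code: the 2D points stand in for UMAP-embedded features, k is 2 rather than the five clusters reported in the abstract, and the deterministic farthest-first initialization is an assumption chosen to keep the example reproducible.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def farthest_first_init(points, k):
    """Assumed initialization: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = farthest_first_init(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster happens to be empty.
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances, the quantity K-means reduces."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical 2D embedding with two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(points, k=2)
total_inertia = inertia(points, centroids, labels)
print(labels)
```

Each iteration either lowers the within-cluster sum of squares or leaves it unchanged, which is why the loop can stop as soon as the centroids no longer move. The result still depends on the initialization, so practical implementations usually run several restarts.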
IV. Expected Results

Although mitotic cell detection with machine learning still needs considerable improvement in
accuracy, combining supervised and unsupervised learning is expected to show good potential for
classifying mitotic and non-mitotic cells. In the future, this research could contribute to a wide range
of improvements and tools in medicine and oncology. Currently, cancer is not diagnosed until a tumor or a
significant enough mutation of cells has occurred, because non-mitotic cells are still hard to detect with
current diagnostic technology. A model that detects non-mitotic cells could lead to software and
diagnostic tools that let physicians and pathologists detect abnormal mitotic activity and warn the patient
of the possibility of developing cancer in the future. It could also lead to surgical tools and other
methods that detect and destroy these abnormal cells before the cancer has progressed. Such research could
help the medical field increase the chances of preventing cancer and other deadly diseases.
V. Conclusion
The next step is to refine the models and test them on more data. The first task is to improve the
models themselves: because we could only use preset models, we lacked the flexibility to add layers or
otherwise modify them. The second is to collect more diverse data. Since abnormal and normal mitotic
activity involve countless variables and features, a dataset covering many diverse cell types would make
the model less prone to overfitting and reduce the risk of bias from confounding features and factors.
This could lead to a broader and more accurate model that can correctly identify whether a given cell is
normal. The main limitation lies in data collection: there are so many different cells, from all types of
people and animals, that building such a diverse dataset would be hard given limits on funding and time.
VI. Reference List

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436-444 (2015).
Shen, D., Wu, G. & Suk, H.-I. Deep learning in medical image analysis. Annual Review of Biomedical
Engineering 19, 221-248 (2017).
Giusti, A. et al. A comparison of algorithms and humans for mitosis detection. In 2014 IEEE 11th
International Symposium on Biomedical Imaging (ISBI) (IEEE, 2014).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural
networks. In Advances in Neural Information Processing Systems, 1097-1105 (2012).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv
preprint arXiv:1409.1556 (2014).
McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for dimension
reduction. arXiv preprint arXiv:1802.03426 (2018).
