
The 16th Korea Scholars’ Conference for Youth

Deep Learning of Mitosis from Cancer Tissue Images using Convolutional Neural Networks
- Detecting mitotic and non-mitotic cells using machine learning -

2021. 05. 23

Northfield Mount Hermon School


Cheongna Dalton International School
Euijin Lee & Jinhyuk Jang

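Korean Abstract

Keywords: deep learning, mitosis, artificial intelligence, convolutional neural networks, cancer

Cancer is the result of unregulated cell replication. Cancer cells often ignore the death cycle, show abnormal mitotic activity, and fail to detect pathogenic mutations. Researchers often hypothesize that using deep learning to develop a model that can detect cancer cells could improve the prognosis and fatality rate of the disease. This study uses convolutional neural networks and a multilayer perceptron to distinguish non-mitotic cells from mitotic cells in cell images. The model is trained on the MITOS dataset, and K-means clustering is then used to classify the test set images. The model divides the images into five clusters of cells with similar features and then classifies mitotic versus non-mitotic cells within those groups. The results show a validation accuracy of 77% and a classification accuracy of over 90% for two of the groups created within the data. However, accuracy in the remaining three clusters leaves room for improvement. This study can be a stepping stone toward an accurate and clinically significant deep learning model that helps diagnose and detect cancerous activity before the onset of cancer in patients.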



-Abstract-

Key Words: Deep learning, Mitosis, Artificial Intelligence, Convolutional Neural Networks, Cancer

Cancer is the result of unregulated cell division. Cancer cells often ignore the death cycle, show abnormal mitotic activity, and fail to detect pathogenic mutations. Researchers often hypothesize that deep learning models able to detect cancer cells can improve the prognosis and reduce the fatality rate of the disease. Using convolutional neural networks and a multilayer perceptron, this study distinguishes mitotic from non-mitotic cells in tissue images. The model is trained on the MITOS dataset, and K-means clustering is then used to group the test set images. The images are divided into five clusters of cells with similar features, and mitotic versus non-mitotic cells are then classified within those clusters. The model reaches a validation accuracy of 77% and a classification accuracy of over 90% for two of the clusters created within the data. However, accuracy in the remaining three clusters leaves room for improvement. This study can be a stepping stone toward an accurate and clinically significant deep learning model that helps diagnose and detect cancerous activity before the onset of cancer in patients.

Contents

1. Introduction

2. Purpose and Significance of the Study 

3. Methods of Analysis and Data Collection 

4. Expected Results and Limitations 

5. Conclusion

6. Reference List


I. Introduction      
Despite advances in medicine and healthcare, cancer remains a feared and widespread disease. It accounts for about 25% of all deaths in most medically and economically developed countries and is the second leading cause of death from disease, after circulatory disease. The etiology of cancer is obscure because there is no single, clear cause; it can be triggered by carcinogens, viruses, genetics, and countless other factors. Cancer cells ignore the death cycle and duplicate abnormally, impairing the function of the human body. These cells can originate from any living tissue in the body and display abnormal mitotic activity.

Mitosis is the process by which a cell divides, and it forms one stage of the cell cycle. The cell cycle is divided into four stages: G1 phase, S phase, G2 phase, and M phase (mitosis). At checkpoints between phases, the cell checks whether the preceding steps were done correctly and completely before it moves forward. If a cell is unable to pass the checkpoint between G1 and S phase, it enters a dormant phase (otherwise known as the G0 phase). If the cell is ultimately unable to divide, it goes through apoptosis, or programmed cell death. Cells that exhibit mutations or other abnormalities are normally stopped at this checkpoint so that the mutation is not passed on.

Cancer cells are mutated, abnormal cells that pass this checkpoint regardless and keep duplicating, which causes cancerous cells to proliferate within the tissue and the surrounding region. Over time a tumor can develop, and the cancerous cells can migrate to other organs via the lymphatic or circulatory system and metastasize throughout the body. Once metastasis occurs, meaning the cancer has escaped its original location and spread to other organs, treatment is extremely difficult. Clinically, cancer is classified into four stages depending on how far it has spread through the patient. It is well established that the earlier the stage, the easier the cancer is to treat and the better the prognosis. Since no cancer treatment guarantees long-term survival without relapse, physicians and healthcare providers focus on early detection.

A limitation of current cancer detection is that it requires human labor: diagnosis depends on active and deliberate examination. Researchers hypothesize that artificial intelligence can help pathologists detect cancer cells even before stage 1 or the appearance of any symptoms. A limitation in building such a tool is that it requires a dataset of cells already identified as cancerous with which to train the model; this is called the training set. Since cancer can arise in any living cell from any type of tissue and any patient, cancer cells have countless features and can look very different from one another.

II. Purpose and Significance of the Study 


Recently, as research on convolutional neural networks has progressed, results on many kinds of image recognition have emerged. In particular, studies on cancer cell image recognition and classification have advanced rapidly. From a medical point of view, deep learning does not yet offer a recognition rate or accuracy high enough to remove the need for human review. However, if deep learning first completes an initial classification of cancer cells, the pathologist can focus on the low-accuracy or ambiguous images, which can significantly reduce the pathologist's fatigue and time spent.

In this paper, we aim to develop an algorithm that detects mitotic and non-mitotic cells using supervised and unsupervised learning. Directly developing algorithms to detect cancer cells is too difficult, because cancer cells come in a large variety and show different and seemingly random traits. We focus instead on cell mitosis, because cancer typically arises from errors in DNA duplication during the cell division cycle. Even mitosis detection is very difficult: mitotic nuclei vary widely in appearance and often look similar to non-mitotic nuclei and other organelles. Therefore, the machine learning process uses multiple stages to sort cells into different groups before classifying them.



The goal of this study is to test and validate the effectiveness of deep learning models in classifying mitotic and non-mitotic cells. A deep learning classification model is trained on a labeled training set and then validated on another set of similar data. The main logic is to first use supervised learning, which takes data with known responses and discovers the relationship between the data and the response. Then, after the features and the learned relationship have been refined, the model is given a test set and uses those features to label each data point with a predicted response. These predictions can then be compared with the actual labels to calculate the test accuracy.
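The final comparison step can be sketched in a few lines of Python. This is only an illustration: the function name `accuracy` and the eight labels below are hypothetical and are not data from this study.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for eight test images (not data from this study).
actual = ["mitotic", "non-mitotic", "non-mitotic", "mitotic",
          "non-mitotic", "non-mitotic", "mitotic", "non-mitotic"]
predicted = ["mitotic", "non-mitotic", "mitotic", "mitotic",
             "non-mitotic", "non-mitotic", "non-mitotic", "non-mitotic"]

test_accuracy = accuracy(predicted, actual)
print(f"test accuracy: {test_accuracy:.2f}")  # 6 of 8 correct
```

The same calculation applies to a validation set; only the held-out images differ.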

III. Methods of Analysis and Data Collection 

The goal of this study is to test and validate the effectiveness of deep learning models in classifying mitotic and non-mitotic cells. The models are trained on a labeled training set and validated on other sets of similar data. The main logic is to first use supervised learning, which involves data with a known response, to discover the relationship between the data (cell images) and the response (label). Then, after the features and the learned relationship have been refined, the model is applied to a test set of data.

The convolutional neural network was initially tested on the MNIST dataset. MNIST consists of black-and-white images of handwritten digits (0 to 9), each small in size at 28×28 pixels. The training curve was also observed under transfer learning, a way to speed up training and improve accuracy by building a new model on top of an existing, already trained model.

The learning method used in this study to extract features from the data is the convolutional neural network. Within this study, a pre-existing Python implementation of VGGNet was used. Convolutional neural networks stack convolutional layers, each of which maps features from its input, interleaved with max pooling layers that downsample the resulting feature maps. As the layers progress, the depth, that is, the number of kernels (feature channels) applied to the input, increases. Using more layers increases the number of features the network can learn and tends to improve accuracy. However, using too many features can cause overfitting. The VGGNet variant used here, VGG19, has 19 weight layers: 16 convolutional layers and 3 fully connected layers. The features obtained from the classification model were then extracted and plotted with UMAP.
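To make the two operations concrete, the following is a minimal pure-Python sketch of one convolution followed by max pooling. It is not the VGGNet implementation used in the study: the helper names `conv2d` and `max_pool`, the toy 6×6 image, and the edge-detecting kernel are all assumptions chosen for illustration.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; any ragged edge is dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 6x6 "image": dark (0) on the left, bright (1) in the last two columns.
image = [[0, 0, 0, 0, 1, 1] for _ in range(6)]
# Hypothetical 3x3 kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d(image, kernel)  # 4x4 map, large where the edge is
pooled = max_pool(feature_map)       # 2x2 summary of the feature map
print(feature_map)
print(pooled)
```

In a trained network the kernel values are learned from the data rather than written by hand, and each layer applies many kernels in parallel.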

When the cell images were classified through transfer learning, the existing CNN model VGG19 was used. VGGNet is named after its number of weight layers: VGG16 has 16 layers and VGG19 has 19. With this family of models, the classification error was observed to decrease as the depth increases through 11, 13, 16, and 19 layers. In other words, deeper configurations perform better, although the improvement levels off at the deepest configurations rather than growing without limit.
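The naming convention can be checked by counting layers. The sketch below lists the convolutional stages of four VGG configurations as given in Simonyan and Zisserman (2014), where each number is the output channel count of one convolutional layer and "M" marks a max pooling step; the dictionary and function names are assumptions made for this example. Each configuration ends with three fully connected layers, so VGG19 has 16 convolutional layers plus 3 fully connected layers.

```python
# Convolutional stages per configuration ("M" = max pooling, not a weight layer).
VGG_CONFIGS = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M",
              512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}
FULLY_CONNECTED_LAYERS = 3  # every configuration ends with three of these


def count_weight_layers(config):
    """Convolutional layers plus fully connected layers."""
    conv_layers = sum(1 for item in config if item != "M")
    return conv_layers + FULLY_CONNECTED_LAYERS


layer_counts = {name: count_weight_layers(cfg)
                for name, cfg in VGG_CONFIGS.items()}
print(layer_counts)
```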

The features extracted from the classification model were then processed and plotted with UMAP. UMAP (Uniform Manifold Approximation and Projection) is a relatively new manifold learning technique for nonlinear dimension reduction. Its intuition is to cover the manifold on which the data lie with balls of a fixed radius centered on the data points. With real, finite data, a poorly chosen radius causes problems: where points are sparse, the balls leave gaps and do not cover the manifold, and where points are dense, the balls overlap heavily and the resulting structure is more densely connected than ideal. If the data are uniformly distributed across the manifold, a suitable radius is easy to choose, and the average distance between points works well. Under this uniformity assumption, the cover spans the entire manifold with no gaps and no unnecessarily disconnected components.

After plotting, the data were grouped with K-means clustering, an analysis method that uses the distance or similarity between the given observations. The data were divided into several groups in order to understand the structure of the whole dataset. K-means bundles the given data into k predefined clusters so that the distance from each point to the center of its cluster is minimal; that is, it minimizes the variance of the distances within each cluster.
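The clustering step can be illustrated with a minimal pure-Python sketch of K-means (Lloyd's algorithm). It is not the code used in the study: the function name `kmeans`, the evenly spaced choice of starting centers, and the six 2D points (standing in for UMAP coordinates) are assumptions made for this example.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Group points into k clusters by repeatedly assigning each point to
    its nearest center and moving each center to the mean of its members."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: deterministic start from evenly spaced points in the list.
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return labels, centers


# Hypothetical 2D coordinates forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

The result of K-means depends on the starting centers, so practical implementations usually try several random starts and keep the best one.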

IV. Expected Results


Although mitotic cell detection using machine learning still requires considerable improvement in accuracy, combining supervised and unsupervised learning is expected to show good potential for classifying mitotic and non-mitotic cells. In the future, this research could contribute to a wide variety of improvements and tools in medicine and oncology. Currently, cancer is usually not diagnosed until a tumor has formed or a significant number of cells have mutated, because abnormal mitotic activity is still hard to detect with current diagnostic technology. A model that detects such cells could lead to software and diagnostic tools that allow physicians and pathologists to detect abnormal mitotic activity and warn the patient of the possibility of developing cancer in the future. It could also lead to the development of surgical tools and other methods to detect and destroy these abnormal cells before the cancer has progressed. Such tools would be a significant advance for the medical field and could improve the chances of preventing cancer and other deadly diseases.

V. Conclusion
The next step is to refine the models and test them on more data. The first task is to improve the models themselves. Since we could only use preset models, we did not have the flexibility to add layers or otherwise modify the models used. The second task is to collect more diverse data. Since normal and abnormal mitotic activity involves countless variables and features, a dataset with many diverse cell types would make the model less prone to overfitting and reduce the risk of bias or confounding features. This can lead to a broader and more accurate model that can correctly identify whether any given cell is normal. The main limitation lies in data collection: there are so many different cells from all types of people and animals that such a diverse dataset would be hard to create within the limits of funding and time.

VI. Reference List


LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521(7553), 436-444 (2015).

Shen, D., Wu, G. & Suk, H.-I. Deep learning in medical image analysis. Annual Review of Biomedical Engineering 19, 221-248 (2017).

Giusti, A., et al. A comparison of algorithms and humans for mitosis detection. 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI). IEEE (2014).

Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 1097-1105 (2012).

Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).

McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426 (2018).
