Thyroid Disorder Prediction System

1. Introduction
Information retrieval is a prominent field of machine learning and is widely used in
clinical research. Conventional diagnostic frameworks are giving way to machine-based
systems, such as expert systems, that support accurate decisions. Diagnosing a patient at
an early stage is challenging for the medical field because thyroid levels fluctuate while
the gland is developing. A wrong diagnosis of thyroid growth can mislead the patient, and
this is a major issue in healthcare. In these situations, correctly diagnosing the type of
thyroid disorder becomes especially important.

The thyroid is an endocrine gland that produces the thyroid hormones. It is located at
the front of the neck, below the jaw. The thyroid hormones maintain and balance the
body's metabolism, regulate body temperature and support overall energy production. The
thyroid releases two main hormones into the bloodstream: thyroxine (T4) and
triiodothyronine (T3). There are two common thyroid disorders: hyperthyroidism and
hypothyroidism. In hyperthyroidism the gland releases too much thyroid hormone into the
bloodstream, and in hypothyroidism it releases too little. In other words, the thyroid is
overactive in hyperthyroidism and underactive in hypothyroidism.

This study aims to diagnose thyroid disease using a classification method, the support
vector machine (SVM). SVM is an effective supervised machine learning algorithm, and here
it is used to classify thyroid data. The algorithm works by finding a boundary that
separates the classes of data points. We will use a support vector machine to predict
whether new patient data indicate a thyroid disorder or a normal state.

2. Problem statement
Nepal is a landlocked country surrounded by India and China, so the risk of iodine
deficiency disease is high[1]. Thyroid disorder is one of the major iodine deficiency
diseases among the Nepalese people, with a prevalence of hyperthyroidism (13.68%) and
hypothyroidism (17.19%) in eastern Nepal (Helfand et al., 998) and 17.42% in western
Nepal (Risal et al., 2010)[1][2]. Women and the elderly are more vulnerable to thyroid
disorder. Because Nepal is a developing country where health facilities have limited
reach among citizens, diagnosing thyroid disorder has been a major challenge. Even in
developed cities, the small number of doctors relative to the large number of patients
makes thyroid diagnosis difficult.

3. Objective

● The research is expected to establish a model that can assist physicians and doctors
in diagnosing disease.
● The model combines human expertise with machine intelligence to achieve more accurate
diagnoses for better and more effective treatment plans.
● The model helps to expand the reach of thyroid diagnosis services and to collect
patients' health information for future research and study.

4. Scope and limitation


4.1. Scope:

This model is intended for semi-skilled and skilled personnel in the health sector, such
as physicians, doctors and health assistants. It combines clinical decision support with
computer-based patient records to reduce medical errors, enhance patient safety, decrease
unwanted practice variation, and improve patients' test outcomes.

4.2. Limitation:

● Medical diagnosis is a significant yet intricate task that needs to be carried out
precisely and efficiently, so only semi-skilled and skilled personnel are capable of
operating this model.
● Our model is based on data from test reports, so the area of operation must have basic
facilities such as blood tests.

5. Methodology

5.1. Requirement identification

5.1.1. Study of the Existing System


Literature review
For the literature review, we studied an existing work titled "Prediction of Thyroid
Disorders Using Advanced Machine Learning Techniques" by Priyanka Duggal and Shipra
Shukla, students in the Department of CSE, Amity University, Uttar Pradesh, India.

The paper presented several methods of feature selection and classification for thyroid
disease diagnosis, framed as a machine learning classification problem. The authors
addressed an important problem in pattern recognition: extracting or selecting a feature
set.

The proposed feature selection methods were Univariate Selection, Recursive Feature
Elimination and Tree-Based Feature Selection. Three classification techniques were used:
Naive Bayes, Support Vector Machines and Random Forest. The results showed that the
combination of Recursive Feature Elimination and the Support Vector Machine proved
effective on their selected dataset, giving an accuracy of 92.92%. This combination was
therefore used to separate thyroid cases into 4 classes: Hypothyroid, Hyperthyroid, Sick
Euthyroid and Euthyroid (negative). The candidate feature set used for selection was
‘Age’, ‘Sex’, ‘TSH’, ‘TT4’, ‘T4U’, ‘T3’ and ‘FTI’. [3]

The second existing work we reviewed is "Classification of Hypothyroid Disorder using
Optimized SVM Method" by Vaishali S. Vairale, a Research Scholar from Engineering Kengeri
Campus, Bangalore, India, and Dr. Samiksha Shukla, an Associate Professor in the Data
Science Department at Lavasa Pune Campus, Pune, Maharashtra, India.

The research focused on hypothyroidism, a disorder in which the thyroid gland does not
produce enough thyroid hormone. Detecting hypothyroidism requires suitable diagnostic
tests to enable prompt analysis and medication. The authors studied how to interpret
large volumes of data and retrieve accurate, relevant information from them. Their study
draws on a hypothyroid dataset to predict the level of disease.

To identify the level of hypothyroid disorder, they used four machine learning
classification techniques: KNN (K-Nearest Neighbor), SVM (Support Vector Machine), LR
(Logistic Regression) and NN (Artificial Neural Network). The experimental results
compared the classification accuracy of the four methods. Logistic Regression initially
achieved 96.08% accuracy, ahead of the other three classifiers. However, SVM proved to be
the best classifier after the data were standardized and its parameters tuned, with an
accuracy of 99.08%. [4]
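
The standardization and parameter tuning step described in that work can be sketched as follows. This is a minimal illustration, assuming scikit-learn is available; the data are synthetic stand-ins generated with make_classification, and the values in param_grid are hypothetical, since the reviewed paper's actual search settings are not given here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a thyroid dataset (NOT real patient data).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Standardize the features, then fit an SVM. Because the scaler is part of
# the pipeline, it is refit on the training portion of every CV fold.
pipeline = make_pipeline(StandardScaler(), SVC())

# Hypothetical search grid, chosen only for illustration.
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline is a design choice: it prevents statistics from the validation folds from leaking into training during the search.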

Algorithm Usage
After studying and reviewing the existing works and literature, we chose RFE for feature
selection and SVM for classification.

In the Recursive Feature Elimination (RFE) method, an external estimator assigns
weights or importance values to the features, and features are eliminated recursively to
produce a smaller and smaller feature set. Initially the estimator is trained on the
entire set of features by calling its ‘fit’ method, after which its
‘feature_importances_’ or ‘coef_’ attribute gives the significance of each feature. The
least important features are then discarded from the dataset, and these steps are
repeated until the desired number of features is reached. The estimated accuracy is about
77.5%, and the 5 best features chosen by RFE are TSH, T4U, TT4, T3, and FTI.
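
The elimination loop described above can be sketched with scikit-learn's RFE class. This is a minimal illustration under stated assumptions: the data are synthetic, the seven column names are borrowed from the feature list in the reviewed paper purely as labels, and a linear-kernel SVC is assumed as the external estimator. The features that survive on synthetic data carry no clinical meaning.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Column labels only; the values below are synthetic, NOT thyroid measurements.
feature_names = ["Age", "Sex", "TSH", "TT4", "T4U", "T3", "FTI"]

X, y = make_classification(n_samples=200, n_features=7, n_informative=5,
                           n_redundant=0, random_state=0)

# A linear-kernel SVM exposes coef_, which RFE uses to rank the features.
estimator = SVC(kernel="linear")

# Drop one feature per round (step=1) until 5 features remain.
selector = RFE(estimator, n_features_to_select=5, step=1)
selector.fit(X, y)

selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
ranking = sorted(int(r) for r in selector.ranking_)

print(selected)  # which labels survive depends entirely on the synthetic data
print(ranking)   # rank 1 marks a selected feature; higher ranks were dropped earlier
```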

Support Vector Machine (SVM) is a supervised learning algorithm used for classification
problems in machine learning. It is a classifier that works by separating classes with a
hyperplane. The input to the algorithm is a set of labeled training data (supervised
learning), and the output is an optimal hyperplane that assigns new instances of data to
the classes. In two dimensions the hyperplane is simply a line dividing the plane into 2
parts, with each class lying on one side of the line. SVM can be used for both regression
and classification problems, although it is mostly used for classification. Every data
point is plotted in an n-dimensional space, where ‘n’ is the total number of features and
the value of each feature is the value of the corresponding coordinate. Support vectors
are the observations that lie closest to the separating hyperplane, and a Support Vector
Machine is the model whose hyperplane best separates the 2 classes, that is, with the
widest margin between them.
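
To make the hyperplane idea concrete, the sketch below classifies four 2-dimensional points by the sign of w·x + b and finds the points closest to the boundary. The weights, bias and points are hand-picked, hypothetical values, not the result of training; they are chosen so that the line x1 + x2 = 3 separates the two groups.

```python
import math

def decision_value(weights, bias, point):
    """Signed score w.x + b; zero means the point lies on the hyperplane."""
    return sum(w * x for w, x in zip(weights, point)) + bias

def classify(weights, bias, point):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(weights, bias, point) >= 0 else -1

def distance_to_hyperplane(weights, bias, point):
    """Geometric distance |w.x + b| / ||w|| from the point to the hyperplane."""
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(decision_value(weights, bias, point)) / norm

# Hypothetical hyperplane in 2 dimensions: x1 + x2 - 3 = 0
weights, bias = [1.0, 1.0], -3.0
points = [(0.5, 1.0), (1.0, 1.0), (2.0, 2.0), (3.0, 2.5)]

labels = [classify(weights, bias, p) for p in points]
distances = [distance_to_hyperplane(weights, bias, p) for p in points]

# The points nearest to the boundary play the role of support vectors.
margin = min(distances)
closest = [p for p, d in zip(points, distances) if math.isclose(d, margin)]

print(labels)   # [-1, -1, 1, 1]
print(closest)  # [(1.0, 1.0), (2.0, 2.0)]
```

A trained SVM searches for the weights and bias that make this smallest distance as large as possible.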

5.1.2. Requirement Collection


Functional requirement: -
The model should take user data as input and, if a thyroid disorder is present, output
its type.

Non-functional requirement: -
● Time: - The model should produce its output quickly.
● Accuracy: - The model should generate output with acceptable accuracy.
● Simplicity: - The model should be simple and easy to use so that semi-skilled
personnel can operate it without difficulty.

5.2. Feasibility Study


A feasibility study is an analysis that takes all of a project's relevant factors into account
—including economic, technical, legal, and scheduling considerations—to ascertain the
likelihood of completing the project successfully.

5.2.1. Technical feasibility
The project is technically feasible because it can be built using existing, readily
available technologies. The model is implemented in the Python language.

5.2.2 Economic feasibility

The project is economically feasible because its only cost is hosting. As the number of
data samples increases, training consumes more time and processing power, in which case a
better processor might be needed. For this project, we were able to build, train,
validate and test the model on our personal laptops, although the training and testing
phases were time consuming.

5.2.3 Operational feasibility

The project is operationally feasible as the user (physician) has basic knowledge of
computers and the Internet. Disease Predictor is based on a client-server architecture in
which the client is the user (physician) and the server is the machine where the datasets
are stored.

5.2.4 Legal Feasibility:

This project does not violate any laws or regulations.

5.2.5 Schedule Feasibility: -

Timeline: 12 weeks (1st to 12th week)

Task                  Duration
Study and analysis    3 weeks
Data collection       2 weeks
Implementation        4 weeks
Testing               2 weeks
Documentation         6 weeks
Review                2 weeks
Presentation          1 week

Fig: - Schedule Feasibility

5.3. Data Collection
In our research, the UCI repository will be our data source for developing automatic
machine learning tools that produce useful predictive methods for thyroid diagnosis. We
plan to use its thyroid disease dataset to train our model, while test data are planned
to be collected nationally from health services, such as hospitals and clinics, that
diagnose thyroid disorders.

5.4. Tools

5.4.1. Analysis and design tools


This phase lays out the detailed structure of the system and gives an overview of its
overall process.

Input data -> Feature selection -> Hyperthyroid / Normal / Hypothyroid

Fig: - flow chart

5.4.2. Implementation tool

Python: project programming language
Jupyter Notebook: for implementation

5.5. Testing

In our project, the data are split into a training dataset and a test dataset: 70% of
the data will be used for training and 30% for testing. Holding out a test set lets us
evaluate how well the model performs on unseen data.

The training set contains known outputs, and the model learns from it so that it can
generalize to other data later on. The test dataset is then used to check the model's
predictions.
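
The 70/30 split can be sketched in plain Python as follows. The helper name split_70_30, the seed value and the list of 100 integer "records" are all hypothetical and serve only to show the mechanics; real patient rows would take the place of the integers.

```python
import random

def split_70_30(records, seed=42):
    """Shuffle a copy of the records and return (train, test) in a 70/30 ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    cut = (7 * len(shuffled)) // 10         # index where the training part ends
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))  # hypothetical stand-in for 100 patient rows
train, test = split_70_30(records)

print(len(train), len(test))  # 70 30
```

In practice, scikit-learn's train_test_split with test_size=0.3 performs the same job and can also stratify the split by class label, which helps when one class is rare.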

6. High Level Design of Proposed System

Fig: - flow chart of overall process

7. Expected outcomes

We are building a predictive tool that can help doctors and physicians diagnose and
predict thyroid gland disorders. The expected outcome is to classify a tested patient's
health information into 3 classes: Hypothyroid, Hyperthyroid and Normal.
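
A three-class prediction of this kind might look like the sketch below. It assumes scikit-learn, uses synthetic data in place of patient records, and introduces a hypothetical helper, predict_condition, plus an arbitrary mapping from the numeric labels 0, 1, 2 to the three class names. scikit-learn's SVC handles more than two classes internally with a one-vs-one scheme.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Arbitrary mapping from numeric labels to class names (an assumption).
CLASS_NAMES = ["Hypothyroid", "Hyperthyroid", "Normal"]

# Synthetic three-class data with 5 features; NOT real patient records.
X, y = make_classification(n_samples=300, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

# Scale the features, then fit a multi-class SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X, y)

def predict_condition(record):
    """Return the class name predicted for one record of 5 feature values."""
    label = model.predict([record])[0]
    return CLASS_NAMES[int(label)]

new_record = X[0]  # stand-in for a new patient's test values
prediction = predict_condition(new_record)
print(prediction)
```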

References
[1] RV Mahato, B Jha, KP Singh, BK Yadav, SK Shah and M Lamsal. "Status of Thyroid
Disorders in Central Nepal: A Tertiary Care Hospital Based Study." Internet:
https://www.researchgate.net/publication/277595234_Status_of_Thyroid_Disorders_in_Central_Nepal_A_Tertiary_Care_Hospital_Based_Study,
March 2015 [June, 2020].

[2] Madhukar Aryal, Prabin Gyawali, Nirakar Rajbhandari, Pratibha Aryal and Dipendra Raj
Pandeya. "A prevalence of thyroid dysfunction in Kathmandu University Hospital, Nepal."
Internet:
https://www.researchgate.net/publication/215774289_A_prevalence_of_thyroid_dysfunction_in_Kathmandu_University_Hospital_Nepal,
Oct. 2010 [June, 2020].

[3] Priyanka Duggal and Shipra Shukla (2020). "Prediction of Thyroid Disorders Using
Advanced Machine Learning Techniques." Journal title. [Online]. Vol. (issue), 670-675.
Available: site/path/file [June, 2020].

[4] Vaishali S. Vairale and Dr. Samiksha Shukla (2019). "Classification of Hypothyroid
Disorder using Optimized SVM Method." Journal title. [Online]. Vol. (issue), 258-263.
Available: site/path/file [June, 2020].
