Thyroid Disorder Prediction System
1. Introduction
Machine learning and information retrieval techniques are widely used in clinical
research. Conventional diagnostic frameworks are giving way to machine-based systems,
such as expert systems, that support more accurate decisions. It is challenging for
clinicians to diagnose a patient at an early stage, when thyroid levels fluctuate as the
gland grows. A wrong diagnosis of the growth of the thyroid gland can mislead the patient,
and this is a major issue in healthcare. In these situations, correctly diagnosing the
type of thyroid disorder becomes especially important.
The thyroid is an endocrine gland that produces the thyroid hormones. It is located at
the front of the neck, below the jaw. The thyroid hormones maintain and balance the
body's metabolism, regulate body temperature and support overall energy production. The
thyroid releases two main hormones into the bloodstream: thyroxine (T4) and
triiodothyronine (T3). There are two common types of thyroid disorder: hyperthyroidism
and hypothyroidism. In hyperthyroidism the gland is overactive and releases too much
thyroid hormone into the bloodstream; in hypothyroidism it is underactive and releases
too little.
The study aims to diagnose thyroid disease using a classification method, the support
vector machine (SVM). SVM is an effective supervised machine learning algorithm that we
use to classify thyroid data. The algorithm finds a boundary that separates two classes
of data points. We will use an SVM to predict whether a new patient's data indicates a
thyroid disorder or a normal state.
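As a rough illustration of this idea, the sketch below fits a linear SVM to a tiny
two-class dataset and classifies two new samples. It assumes scikit-learn is available.
The six training points and the two "new patients" are made-up numbers chosen only to be
clearly separable; they are not clinical measurements.

```python
from sklearn.svm import SVC

# Toy, made-up values (two features per patient); NOT real clinical data.
X_train = [[0.5, 1.0], [1.0, 1.5], [1.5, 1.0],   # class 0: normal
           [4.0, 4.5], [4.5, 5.0], [5.0, 4.0]]   # class 1: thyroid disorder
y_train = [0, 0, 0, 1, 1, 1]

# A linear kernel looks for the separating boundary with the widest margin.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Classify two hypothetical new patients.
new_patients = [[1.0, 1.0], [4.5, 4.5]]
predictions = clf.predict(new_patients)
print(predictions)
```

The first new sample falls on the "normal" side of the boundary and the second on the
"disorder" side; a real model would be trained on the full feature set rather than two
invented columns.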
2. Problem statement
Nepal is a landlocked country bordered by India and China, so the risk of iodine
deficiency disease is high [1]. Thyroid disorder is one of the major iodine deficiency
diseases among the Nepalese people, with a prevalence of hyperthyroidism (13.68%) and
hypothyroidism (17.19%) in eastern Nepal (Helfand et al., 998) and 17.42% in western
Nepal (Risal et al., 2010) [1][2]. Women and the elderly are more vulnerable to thyroid
disorder. Because Nepal is a developing country where health facilities reach only part
of the population, diagnosing thyroid disorder has been a major challenge. Even in
developed cities, the small number of doctors relative to the large number of patients
makes thyroid diagnosis difficult.
3. Objective
● The research is expected to establish a model that can assist physicians and doctors
in diagnosing disease.
● The model combines human expertise with machine intelligence to achieve a more
accurate diagnosis and thus better, more effective treatment plans.
● The model helps to expand the reach of thyroid diagnosis services and to collect
patients' health information for future research and study.
The model is intended for semi-skilled and skilled personnel in the health sector, such
as physicians, doctors and health assistants. It combines clinical decision support with
computer-based patient records to reduce medical errors, enhance patient safety, decrease
unwanted practice variation, and improve patients' test outcomes.
4.2. Limitation:
5. Methodology
Duggal and Shukla [3] presented several methods of feature selection and classification
for thyroid disease diagnosis, framed as a machine learning classification problem. They
addressed the pattern recognition problem of extracting or selecting a feature set.
The feature selection methods they proposed were Univariate Selection, Recursive Feature
Elimination and Tree-Based Feature Selection. The three classification techniques used
were Naive Bayes, Support Vector Machines and Random Forest.
Their results showed that combining Recursive Feature Elimination with the Support
Vector Machine was effective on their selected dataset, giving an accuracy of 92.92%.
This combination was used to separate thyroid cases into 4 classes: Hypothyroid,
Hyperthyroid, Sick Euthyroid and Euthyroid (negative).
The features considered for selection were ‘Age’, ‘Sex’, ‘TSH’, ‘TT4’, ‘T4U’, ‘T3’ and
‘FTI’. [3]
Vairale and Shukla [4] focused on hypothyroidism, a disorder in which the thyroid gland
does not produce enough thyroid hormone. Detecting hypothyroidism requires suitable
diagnostic tests to support prompt analysis and medication. They examined how to
interpret large volumes of data and retrieve accurate, relevant information from it.
Their study draws knowledge from a hypothyroid dataset to predict the level of disease.
To identify the level of hypothyroid disorder, they used four machine learning
classification techniques: KNN (K-Nearest Neighbor), SVM (Support Vector Machine), LR
(Logistic Regression) and NN (Artificial Neural Network). The experimental results
compared the classification accuracy of the four methods. Logistic Regression achieved
96.08% accuracy, ahead of the other three classifiers. However, SVM became the best
classifier once the data was standardized and its parameters were tuned, reaching an
accuracy of 99.08%. [4]
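Standardizing the data and tuning SVM parameters, as described for [4], can be sketched
with a scikit-learn pipeline and a grid search. This is a generic illustration, not the
procedure from [4]: the synthetic dataset, the RBF kernel, the parameter grid and the
5-fold cross-validation are all assumptions made so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); replace with the real thyroid dataset.
X, y = make_classification(n_samples=200, n_features=7, n_informative=4,
                           n_redundant=1, random_state=0)

# Scaling lives inside the pipeline so it is re-fitted on each training fold.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical grid; make_pipeline names the SVC step "svc".
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 3))
```

Keeping the scaler inside the pipeline avoids leaking statistics from the validation
folds into training, which would otherwise inflate the reported accuracy.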
Algorithm Usage
After a detailed study and review of existing work and literature, we chose RFE for
feature selection and SVM for classification.
In the Recursive Feature Elimination (RFE) method, an external estimator assigns
weights or importance values to features, and features are selected recursively by
considering smaller and smaller feature sets. Initially the estimator is trained on the
entire set of features, and its ‘feature_importances_’ or ‘coef_’ attribute, populated
by the ‘fit’ method, gives the significance of each feature. The least important
features are then discarded from the dataset, and these steps are repeated until the
desired number of features is reached. The estimated accuracy is about 77.5% and the 5
best features chosen by RFE are: TSH, T4U, TT4, T3, and FTI.
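The elimination loop described above can be sketched with scikit-learn's `RFE` wrapped
around a linear SVM. The column names follow the feature set listed earlier, but the
data is synthetic (an assumption so the example runs on its own). The features it
selects therefore carry no clinical meaning and will not necessarily match the five
reported above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Names mirror the text; the values are synthetic placeholders, not patient data.
feature_names = ["Age", "Sex", "TSH", "TT4", "T4U", "T3", "FTI"]
X, y = make_classification(n_samples=200, n_features=7, n_informative=4,
                           n_redundant=1, random_state=0)

# A linear SVM exposes coef_, which RFE uses to rank the features.
estimator = SVC(kernel="linear")

# Refit and drop the weakest feature each round (step=1) until 5 remain.
selector = RFE(estimator, n_features_to_select=5, step=1)
selector.fit(X, y)

selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
print("Selected features:", selected)
print("Ranking (1 = kept):", list(selector.ranking_))
```

`support_` is a boolean mask over the columns, and `ranking_` records the elimination
order: kept features have rank 1, and the feature dropped first has the highest rank.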
Non-functional requirements:
● Time: The model should produce its output quickly.
● Accuracy: The model should generate output with acceptable accuracy.
● Simplicity: The model should be simple and easy to use so that semi-skilled
personnel can use it without difficulty.
5.2.1. Technical feasibility
The project is technically feasible as it can be built using existing, available
technologies. The model is implemented in the Python language.
The project is economically feasible as its only cost is hosting. As the number of data
samples increases, training consumes more time and processing power, in which case a
better processor might be needed. For this project we were able to build, train,
validate and test the model on our personal laptops, although the training and testing
phases were time-consuming.
The project is operationally feasible as the user (physician) has basic knowledge of
computers and the Internet. Disease Predictor is based on a client-server architecture
in which the client is the user (physician) and the server is the machine where the
datasets are stored.
This project does not violate any laws or regulations.
Study and analysis    3W
Data collection       2W
Implementation        4W
Testing               2W
Documentation         6W
Review                2W
Presentation          1W
5.3. Data Collection
In our research, the UCI repository will be our data source for developing automatic
machine learning tools that produce useful predictive methods for thyroid diagnosis. We
plan to use its thyroid disease dataset to train our model, whereas the testing data is
planned to be collected nationally through health services such as hospitals and clinics
involved in thyroid disorder diagnosis.
5.4. Tools
[Figure: process diagram; recoverable labels: Input data, Feature selection]
5.4.2. Implementation tool
5.5. Testing
In our project, the data is split into a training dataset and a test dataset: 70% of the
data will be used for training and 30% for testing. The data is split so that the model
can be evaluated on data it has not seen during training.
The training set contains known outputs, and the model learns from it so that it can
generalize to other data later on. The test dataset is used to check the model's
predictions.
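A 70/30 split of this kind can be sketched with scikit-learn's `train_test_split`. The
synthetic data, the fixed `random_state` and the stratified sampling (which keeps the
class proportions similar in both parts) are assumptions added for the illustration;
the proposal itself only fixes the 70/30 ratio.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the thyroid dataset (assumption).
X, y = make_classification(n_samples=100, n_features=7, n_informative=4,
                           n_redundant=1, random_state=0)

# Hold out 30% of the samples; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

model = SVC(kernel="linear")
model.fit(X_train, y_train)

# Accuracy on the held-out 30% estimates performance on unseen data.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(len(X_train), len(X_test), round(test_accuracy, 3))
```

With 100 samples this yields 70 training rows and 30 test rows; the accuracy printed
here reflects only the synthetic data.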
6. High Level Design of Proposed System
7. Expected outcomes
We are building a predictive tool that can help doctors and physicians diagnose and
predict thyroid gland disorder. The expected outcome is to classify a tested patient's
health information into one of 3 classes: Hypothyroid, Hyperthyroid or Normal.
References
[1] RV Mahato, B Jha, KP Singh, BK Yadav, SK Shah and M Lamsal. “Status of Thyroid
Disorders in Central Nepal: A Tertiary Care Hospital Based Study.” Internet:
https://www.researchgate.net/publication/277595234_Status_of_Thyroid_Disorders_in_Central_Nepal_A_Tertiary_Care_Hospital_Based_Study,
March 2015 [June, 2020].
[2] Madhukar Aryal, Prabin Gyawali, Nirakar Rajbhandari, Pratibha Aryal and Dipendra Raj
Pandeya. “A prevalence of thyroid dysfunction in Kathmandu University Hospital Nepal.”
Internet:
https://www.researchgate.net/publication/215774289_A_prevalence_of_thyroid_dysfunction_in_Kathmandu_University_Hospital_Nepal,
Oct. 2010 [June, 2020].
[3] Priyanka Duggal and Shipra Shukla (2020). “Prediction Of Thyroid Disorders Using
Advanced Machine Learning Techniques.” Journal title. [Online]. Vol. (issue), 670-675.
Available: site/path/file [June, 2020].
[4] Vaishali S. Vairale and Dr. Samiksha Shukla (2019). “Classification of Hypothyroid
Disorder using Optimized SVM Method.” Journal title. [Online]. Vol. (issue), 258-263.
Available: site/path/file [June, 2020].