AI-based Chatbot For Skin Disease Prediction Using CNN and ID3 Decision Tree

AI-based chatbot for skin disease prediction
using CNN and ID3 Decision Tree

G-13
Team Members:
Vishnupriya N 2019103599
Ishwarya Rani M 2019103527
Navvya L 2019103548
Under the guidance of

Dr. P . Geetha
1. OVERALL OBJECTIVE
• Skin diseases can be difficult to diagnose, and patients may not always have easy access to a dermatologist. A
possible solution to this problem is to develop a system that can predict skin diseases based on images and provide
basic information about the condition through a chatbot.
• The chatbot will use a convolutional neural network (CNN) for classification of skin diseases based on images
submitted by the user. This will be a critical component of the chatbot, as it will enable the system to accurately
diagnose skin diseases based on visual cues.
• To aid in the diagnosis process, the chatbot will use a Dynamic ID3 decision tree for preparing the questionnaire.
This will enable the chatbot to ask relevant questions to the user, which will help narrow down the potential
diagnoses.
• Additionally, the chatbot will use natural language processing (NLP) for other chatbot-related user questions with a
pre-built FAQ dataset. This will enable the chatbot to answer any general questions the user may have about skin
diseases, as well as provide additional information and guidance.
• Overall, the objective is to create a user-friendly and efficient system for diagnosing skin diseases that leverages the
power of CNN, Dynamic ID3 decision tree, and NLP. By doing so, the chatbot will be able to accurately diagnose
skin diseases, provide additional information and guidance, and ultimately help improve the health and well-being
of its users.
2. OVERALL ARCHITECTURE
3. DETAILED MODULE DESIGN
Module 1: Customized CNN Model Creation on ISIC2019 Dataset
Module 2: Shortlisting of skin diseases for given user image
Module 3: Dynamic ID3 Decision Tree Model
Module 4: Creation of Chatbot and Questionnaire Model
Module 5: Prediction of exact Skin Disease by the user- chatbot interaction

Module 1: Customized CNN Model Creation on ISIC2019 Dataset
• CNN model is created for the ISCI2019 skin disease image data set.
• This dataset consists of images of six skin diseases which are actinic keratosis, basal cell carcinoma, melanoma,
nevus, seborrheic keratosis, squamous cell carcinoma.
• For this image dataset, the customized CNN model is trained and created
INPUT - ISIC2019 skin disease image data set.

OUTPUT - CNN model
ALGORITHM:
1. Import all required libraries
2. Define path for train and test images of ISIC2019 dataset.
3. Visualise images
4. Perform data augmentation on train data set
4.1. for each class:
4.1.1. specify output directory
4.1.2. image -> rotate with probability 0.7
4.1.2.1. max_left_rotation = 10
4.1.2.2. max_right_rotation = 10
4.1.3. maintain images count as 1000
5. Spilt augmented data for train and validation
5.1. Train -> 80%
5.2. Validation data -> 20%
5.3. Rescale images to 180 x 180

6. Create Sequential CNN Model
6.1. Layer 1 ->
6.1.1. add Conv2D() -> filters=32, size = 3, padding = ‘same’,
activation = ‘relu’
6.1.2. add MaxPool2d()
6.2. Layer 2 ->
6.2.1. add Conv2D() -> filters=64, size = 3, padding = ‘same’,
6.3. Layer 3 ->
6.3.1. add Conv2D()->filters=128, size =3, padding = ‘same’,
6.3.3. add Dropout() -> rate = 0.15
6.4. Layer 4 ->
6.5. Layer 5 ->
6.6. Layer 6 ->
6.6.1. add Flatten()
6.6.2. add Dense() -> units = 1024, activation = ‘rule’
6.7. Layer 7 ->
6.7.1. add Dense() -> units = 6, activation = ‘sigmoid’
7. Compile CNN model
7.1. Optimiser -> Adam, learning_rate = 0.001
7.2. Loss Function -> SparseCategoricalCrossentropy, from_logits =
True
7.3. metrics -> accuracy
8. Model.fit(), epochs = 25
9. Print -> accuracy
10. Save the created Model and the learned weights
FORMULAS:
 Activation function:
Y = ( Σ ( weight * input ) + bias )
 Sigmoid activation:
sig(x) = 1/(1+exp(-x))
 Relu Activation Function:
F(x) = max(0,x) = x if x>0 or o if x<0
 Sparse categorical cross-entropy (S - samples, C - classes, s∈c - sample belongs to class c) :
−1/N(∑s∈S∑c∈C1s∈clogp(s∈c))
DETAILED DIAGRAM:
Module 2: Shortlisting of skin diseases for given user image
• Shortlisting skin diseases for given user image is done using the CNN model which was created in module 1.
• User image is collected and is applied on the created CNN model.
• The skin disease above the threshold 0.5 are shortlisted for the next process.
INPUT – CNN model and User Image

OUTPUT – Shortlisted skin diseases
ALGORITHM:
1. Load the created CNN model

2. Threshold = 0.5
3. The diseases are mapped to short forms:
ACK –>Actinic Keratosis
BCC –> Basal Cell Carcinoma
MEL –> Melanoma
NEV –> Nevus
SEK –> Seborrheic Keratosis
SCC –> Squamous Cell Carcinoma
4. User image is collected FORMULAS:
5. The image is rescaled to 180 X 180  Activation function:
6. Convert image to array
Y = ( Σ ( weight * input ) + bias )
7. Apply the created CNN model
8. Predictions -> sigmoid values of each class  Sigmoid activation:
9. Find the shortlist using the predictions sig(x) = 1/(1+exp(-x))
9.1. Initialise empty list ‘shortlist’
9.2. for i in each prediction
9.2.1. if i > threshold
9.2.1.1. Append to ‘shortlist’
9.3. Return shortlist
DETAILED DESIGN:
Module 3: Dynamic ID3 Decision Tree Model
• Here another dataset PAD-UFES-20, which consist of the external factors is used.
• Only the details of the diseases that were shortlisted from the previous module are extracted from the PAD-UFES-20
dataset.
• ID3 decision tree model will be created for the extracted details.
INPUT – Shortlisted skin diseases

OUTPUT – ID3 Decision tree model
ALGORITHM:

1. Extract only the data of shortlisted diseases from PAD-UFES-20
dataset
1.1 Initialise empty data[]
1.1.1. for i in dataset:
1.1.2. if i[disease] in shortlisted_diseases:
1.1.3. append to data[]
1.2. return data
2. Calculate entropy for whole data
3. For each attribute/feature
3.1. Calculate entropy for all its categorical values
3.2. Calculate information gain for the feature
4. Find the feature with maximum information gain
5. Add feature as node in the tree
6. Repeat steps 2 – 5 to get the decision tree
7. Dynamic Decision tree created
FORMULAS:
• Entropy:
S - The current dataset for which entropy is being calculated

C - Set of classes in S
p(c) - The proportion of the number of elements in class c to the number of elements in set S.
• Information Gain:
H(S) - Entropy of set S.
T - The subsets created from splitting set S by attribute A such that

p(t) - The proportion of the number of elements in t to the number
of elements in set S.
H(t) - Entropy of subset t.

DETAILED DESIGN:
Module 4: Creation of Chatbot and Questionnaire Model
• The chatbot is created using the questions that need to be answered by the user.
• A model for creating the customized question and answer is created. Using this model, the chatbot generates the
customized question-answer for the dynamically generated decision tree.
• The Chatbot collects the user image for further processing.
INPUT - NLP Model and Question-Answer model

OUTPUT - Chatbot
ALGORITHM:
(a) Questionnaire model:
1. Each attribute is framed as a question.

2. Each question has a defined answer set.
3. Traverse the decision tree obtained from the input user image provided.
4. checked each attribute if it has more than one branch
4.1. If it has exactly one branch –> Skip the question.
4.2. If more than one branch –>Add them all as options
4.3 If there can be ‘n’ set of options, but only ‘m’ set of options present as branches to a
particular attribute, set ‘m’ set of options to be displayed.
5. Add the question and answer accordingly
6. Proceed with the questions and navigate through the tree until a decision is reached.
7. Question- Answer Model is created
(b) NLP Model:

1. Tokenization
1.1. break entire sentences into words.
1.2. extract the tokens from sentences
1.3. prepare a vocabulary
2. Normalising
2.1. get rid of randomness
3. Recognizing Entities
3.1. make the chatbot to understand the subject of conversation
4. Dependency Parsing
4.1. identify the dependencies between different phrases in a sentence
4.2. determining the correct grammatical structure of a sentence
5. Response Generation
5.1 puts together all the information obtained in the previous four steps
5.2 decide the most accurate response that should be given to the user
DETAILED DESIGN
Module 5: Prediction of exact Skin Disease by the user- chatbot interaction
• Accurate questions are queried by the chatbot to the user based on the Questionnaire that is generated.
• Based on the user response about the symptoms follow-up questions are also generated.
• The exact skin disease of the user is predicted.
• Remedies for the predicted disease are also provided.
INPUT - Chatbot
OUTPUT - Skin disease of the user is predicted
ALGORITHM:

1. user interacts with chat
1.1. chat bot queries responses using NLP model
1.2. user skin disease image is requested
1.3. user provides image
4. the decision tree created from the user image is given as input to the questionnaire model
5. questionnaire model generates the question and answer
6. chatbot queries question according the model
7. according to the user response follow – up questions are generated from the model
8. skin is diseases is predicted based on the leaf node that is attained
9. remedies are provided
10. other user queries are also resolved
DETAILED DESIGN:
4. IMPLEMENTATION DETAILS 100%
MODULE 1- CNN MODEL CREATION FOR ISIC2019 SKIN DISEASES IMAGE DATA SET
Sample conversations with the bot: NLP bot that answers user questions.
5. METRICS FOR EVALUATION
For the performance evaluation, we calculate the accuracy of the model by the percentage ratio of the number of
correct prediction to the total cases:

Our CNN model has produced an validation accuracy of 0.874.

The full dataset has been passed into the CNN model and the shortlist for every image is obtained.
Decision tree is created with this shortlist for each image and this is given as input to the chatbot. It takes
the answers from the data and predicts the skin disease.
Total number of correct predictions = 2104
Total number of cases = 2298
Accuracy = 91.557
6. TEST CASES
For testing the accuracy of our model the full dataset has been passed into the CNN model, ie each image from the
data has been passed into the CNN model and the shortlist for every image is obtained. Decision tree is created with
this shortlist for each image and this is given as input to the chatbot. It takes the answers from the data and predicts
the skin disease.
Counting the incorrect and correct predictions by getting the actual value and comparing it with the predicted value.
An accuracy of 91.557 on the whole has been obtained.

CNN MODEL ACCURACY:
An validation accuracy of 0.874 has been obtained for the created CNN MODEL
TEST CASE 1: Skin Disease Prediction Chatbot:
Testing the working of skin disease chatbot. The data corresponds to BCC. And the skin disease has been correctly predicted.
TEST CASE 2: Testing the working of NLP
CHATBOT
7. References
[1] Yao, P., Shen, S., Xu, M., Liu, P., Zhang, F., Xing, J., Shao, P., Kaffenberger, B. and Xu, R.X.
(2022). Single Model Deep Learning on Imbalanced Small Datasets for Skin Lesion
Classification. IEEE Transactions on Medical Imaging, 41(5), pp.1242–1254.
doi:10.1109/tmi.2021.3136682.
[2] Chakraborty, S., Paul, H., Ghatak, S., Pandey, S.K., Kumar, A., Singh, K.U. and Shah, M.A.
(2022). An AI-Based Medical Chatbot Model for Infectious Disease Prediction. IEEE Access, 10,
pp.128469–128483. doi:10.1109/access.2022.3227208.
[3] Thurnhofer-Hemsi, K., Lopez-Rubio, E., Dominguez, E. and Elizondo, D.A. (2021). Skin
Lesion Classification by Ensembles of Deep Convolutional Networks and Regularly Spaced
Shifting. IEEE Access, 9, pp.112193–112205. doi:10.1109/access.2021.3103410.
[4]‌Sanas, S., Pawale, P., Ghadage, G. and Sahani, M. (2021). SKIN DISEASE PREDICTION.
8(4), pp.4344–4347.
References
[5] Yu, H.Q. and Reiff-Marganiec, S. (2021). Targeted Ensemble Machine Classification Approach for
Supporting IoT Enabled Skin Disease Detection. IEEE Access, pp.1–1. doi:10.1109/access.2021.3069024.
[6] Mtende Mkandawire and Dr. Glorindal Selvam (2022). Prediction of Skin Diseases using Machine Learning
Algorithms. International Journal of Advanced Research in Science, Communication and Technology, pp.54–61.
doi:10.48175/ijarsct-7139.
[7] Safi, Z., Abd-Alrazaq, A., Khalifa, M. and Househ, M. (2020). Technical Aspects of Developing Chatbots
for Medical Applications: Scoping Review. Journal of Medical Internet Research, 22(12), p.e19127.
doi:10.2196/19127.
References
[8] Gandhi, M., Kumar Singh, V. and Kumar, V. (2019). IntelliDoctor – AI based Medical Assistant. In: 2019
Fifth International Conference on Science Technology Engineering and Mathematics (ICONSTEM).
[9] Yao, C., Qu, Y., Jin, B., Guo, L., Li, C., Cui, W. and Feng, L. (2016). A Convolutional Neural Network
Model for Online Medical Guidance. IEEE Access, 4, pp.4094–4103. doi:10.1109/access.2016.2594839.
[10] Patil, A. (2022). Healthcare Chatbot using Artificial Intelligence. International Journal for Research in
Applied Science and Engineering Technology, 10(8), pp.905–909. doi:10.22214/ijraset.2022.46299.

AI-based Chatbot For Skin Disease Prediction Using CNN and ID3 Decision Tree

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

AI-based Chatbot For Skin Disease Prediction Using CNN and ID3 Decision Tree

Uploaded by

Copyright:

Available Formats

AI-based chatbot for skin disease prediction

using CNN and ID3 Decision Tree

Under the guidance of

INPUT - ISIC2019 skin disease image data set.

INPUT – CNN model and User Image

1. Load the created CNN model

INPUT – Shortlisted skin diseases

S - The current dataset for which entropy is being calculated

T - The subsets created from splitting set S by attribute A such that

H(t) - Entropy of subset t.

INPUT - NLP Model and Question-Answer model

1. Each attribute is framed as a question.

(b) NLP Model:

Our CNN model has produced an validation accuracy of 0.874.

An accuracy of 91.557 on the whole has been obtained.

You might also like