Detection of Autism Spectrum Disorder in Children Using Machine Learning Techniques
Md. Abdullah
20CSE049
Abstract:
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition that affects
language, speech, cognitive, and social skills throughout an individual's life. It
usually becomes noticeable in early childhood and affects around 1% of the global
population. Both genetic and environmental factors contribute to its development.
Early detection and intervention are crucial for improving symptoms. Diagnosis
currently relies on clinical tests, which are time-consuming and costly. To make
screening faster and more efficient, this study applied five machine learning
methods (Support Vector Machine, Random Forest Classifier, Naïve Bayes, Logistic
Regression, and K-Nearest Neighbors) to predict early susceptibility to ASD in
children. Logistic Regression achieved the highest accuracy of the tested models
on the given dataset.
Introduction:
Autism Spectrum Disorder (ASD) is a developmental condition that affects how people interact and
communicate. It can be caused by genetic or environmental factors and affects many aspects of a
person's life, including cognitive, social, emotional, and physical well-being. Symptoms vary widely and
may include difficulties with social communication, repetitive behaviors, and intensely focused
interests. Diagnosing ASD, especially in children, is a complex process that requires comprehensive
assessment by psychologists and other certified professionals. Traditional diagnostic instruments such as
the Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule-Revised
(ADOS-R) are time-consuming and demanding to administer.
Although ASD is prevalent among pediatric populations, diagnosing it early is challenging because
current diagnostic procedures are subjective and time-consuming. This leads to a waiting period of at
least 13 months from initial suspicion to confirmed diagnosis. The lengthy process also creates a
mismatch between the high demand for appointments and the limited capacity of pediatric clinics.
Review of Literature:
 In our research, we applied five distinct ML models to classify individuals as
ASD or No-ASD, using features such as age, sex, and ethnicity. We evaluated
each classifier's performance to identify the best-performing model.
Working Model:
 Figure 1 illustrates the system's workflow:

Methodology:
 Data Preprocessing:
 The dataset compiled by Dr. Fadi Thabtah comprises 1054 instances with 18
attributes of categorical, continuous, and binary types. To make the data
suitable for modeling, the non-contributing attributes 'Case_No', 'Who
completed the test', and 'Qchat-10-Score' were removed. Categorical values
were handled with label encoding, which converts class labels into numbers
that the models can process. This was applied to the four features that have
two classes each: 'Sex', 'Jaundice', 'Family_mem_with_ASD', and
'Class/ASD_Traits'. For features with more than two classes, such as
'Ethnicity' with 11 classes, one-hot encoding was used instead, because
integer labels would impose an artificial ordering on the categories.
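The preprocessing steps above can be sketched in plain Python as follows. The column names come from the text; the two sample records and their values are hypothetical, and the helpers `label_encode`, `one_hot_encode`, and `preprocess` are illustrative names, not the study's code. Codes are assigned in sorted order, similar to scikit-learn's `LabelEncoder`.

```python
# Column groups taken from the attribute names listed above.
DROPPED = ["Case_No", "Who completed the test", "Qchat-10-Score"]
BINARY = ["Sex", "Jaundice", "Family_mem_with_ASD", "Class/ASD_Traits"]
MULTI_CLASS = ["Ethnicity"]


def label_encode(rows, column):
    """Replace each value of a two-class column with 0 or 1 (sorted order)."""
    classes = sorted({r[column] for r in rows})
    mapping = {value: code for code, value in enumerate(classes)}
    for r in rows:
        r[column] = mapping[r[column]]
    return mapping


def one_hot_encode(rows, column):
    """Replace a multi-class column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    for r in rows:
        value = r.pop(column)
        for category in categories:
            r[f"{column}_{category}"] = int(value == category)
    return categories


def preprocess(rows):
    # Copy each row without the non-contributing attributes.
    cleaned = [{k: v for k, v in r.items() if k not in DROPPED} for r in rows]
    for column in BINARY:
        label_encode(cleaned, column)
    for column in MULTI_CLASS:
        one_hot_encode(cleaned, column)
    return cleaned


# Two hypothetical records; the real dataset has 1054 rows and more columns.
sample = [
    {"Case_No": 1, "A1": 0, "Sex": "m", "Ethnicity": "asian", "Jaundice": "yes",
     "Family_mem_with_ASD": "no", "Who completed the test": "family member",
     "Qchat-10-Score": 7, "Class/ASD_Traits": "Yes"},
    {"Case_No": 2, "A1": 1, "Sex": "f", "Ethnicity": "middle eastern", "Jaundice": "no",
     "Family_mem_with_ASD": "yes", "Who completed the test": "family member",
     "Qchat-10-Score": 2, "Class/ASD_Traits": "No"},
]
encoded = preprocess(sample)
print(encoded[0])
```

One-hot encoding turns 'Ethnicity' into one indicator column per category, so no category is treated as "larger" than another.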
Classification Algorithms:
 The dataset is divided into a training set (80% of the data), used to fit each
model, and a test set (20% of the data), used to evaluate its accuracy on
unseen data. Comparing performance on the two sets reveals overfitting or
underfitting. Overfitting occurs when a model performs well on the training
data but poorly on the test data; underfitting occurs when it performs poorly
on both. A good model does neither: it captures the patterns in the training
data and still generalizes to new data.
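A minimal sketch of the 80/20 split and of a rough over/underfitting check follows. The seed and the thresholds in `diagnose_fit` are hypothetical. The split rounds the test-set size up, so 20% of 1054 instances gives 211 test instances, which matches the total of each confusion matrix in Table 4.

```python
import math
import random


def train_test_split(X, y, test_fraction=0.2, seed=42):
    """Shuffle indices reproducibly, then hold out test_fraction of them."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(len(X) * test_fraction)  # round the test size up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


def diagnose_fit(train_acc, test_acc, low=0.8, max_gap=0.05):
    """Rule of thumb with hypothetical thresholds, not a formal test."""
    if train_acc < low and test_acc < low:
        return "underfitting"   # poor on both sets
    if train_acc - test_acc > max_gap:
        return "overfitting"    # much better on training data than on test data
    return "balanced"


X_all = list(range(1054))  # stand-ins for the 1054 instances
y_all = [i % 2 for i in X_all]
X_train, X_test, y_train, y_test = train_test_split(X_all, y_all)
print(len(X_train), len(X_test))  # → 843 211
print(diagnose_fit(0.99, 0.80))   # → overfitting
```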
Dataset Analysis:
Table 2 Mapping of dataset features to the Q-CHAT-10 screening questions:
Dataset variable   Description
A1                 Child responds when you call his/her name
A2                 Ease of getting eye contact from the child
A3                 Child points to objects he/she wants
A4                 Child points to draw your attention to his/her interests
A5                 Whether the child shows pretend play
A6                 Ease with which the child follows where you point or look
A7                 Whether the child wants to comfort someone who is upset
A8                 Child's first words
A9                 Whether the child uses basic gestures
A10                Whether the child daydreams or stares at nothing
Evaluation Metrics:
 For a binary classifier such as the ones used here, every prediction falls into one of four categories:
 1. True positive (TP): the individual has ASD, and the model correctly predicts ASD.
 2. True negative (TN): the individual does not have ASD, and the model correctly predicts no ASD.
 3. False positive (FP): the individual does not have ASD, but the model incorrectly predicts ASD. This is a Type I error.
 4. False negative (FN): the individual has ASD, but the model incorrectly predicts no ASD. This is a Type II error.
Table 3 Confusion matrix for ASD prediction:
Prediction \ Actual    Individual has ASD    Individual does not have ASD
ASD is predicted       True positive         False positive
ASD is not predicted   False negative        True negative
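Counting the four categories from actual and predicted labels can be sketched as follows, assuming label 1 means ASD. The labels below are hypothetical. The result is arranged as in Table 3 (rows are predictions, columns are actual conditions); note that scikit-learn's `confusion_matrix` uses the transposed layout, with rows for actual labels and columns for predictions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return [[TP, FP], [FN, TN]], treating `positive` (assumed 1 = ASD) as positive."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1  # Type I error
        else:
            if actual == positive:
                fn += 1  # Type II error
            else:
                tn += 1
    # Rows = prediction, columns = actual condition, as in Table 3.
    return [[tp, fp], [fn, tn]]


y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical ground truth
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]  # hypothetical model output
print(confusion_counts(y_true, y_pred))  # → [[2, 1], [2, 3]]
```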
Table 4 A comparison of the applied ML models:
Model   Accuracy   Confusion matrix        F1 score
LR      97.15%     [[57, 1], [5, 148]]     0.98
NB      94.79%     [[56, 5], [6, 144]]     0.96
SVM     93.84%     [[52, 3], [10, 146]]    0.95
KNN     90.52%     [[51, 9], [11, 140]]    0.93
RFC     81.52%     [[45, 14], [17, 135]]   0.88
Precision and Recall Curves:
 Precision measures how accurate the positive predictions are: of all the points
predicted to be positive, how many are actually positive.
 $$\text{Precision} = \frac{TP}{TP + FP}$$
 Recall measures what fraction of the actual positives the model detected: of all
the points labeled positive, how many were correctly predicted as positive. Recall
is also known as sensitivity.
 $$\text{Recall} = \frac{TP}{TP + FN}$$
 Accuracy is the fraction of all predictions that the classifier gets right.
 $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
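The formulas above translate directly into code. The counts in the first example are hypothetical. F1 is reported in Table 4 but not defined in the text; the sketch uses the standard definition (harmonic mean of precision and recall). The last line recomputes the LR accuracy from the Table 4 confusion matrix; accuracy depends only on the diagonal (57 and 148), so it does not matter which off-diagonal cell is FP and which is FN.

```python
def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    # Also called sensitivity.
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    # Standard harmonic mean of precision and recall (equal to 2TP / (2TP + FP + FN)).
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


tp, fp, fn, tn = 8, 2, 1, 9  # hypothetical counts
print(precision(tp, fp), recall(tp, fn), accuracy(tp, tn, fp, fn), f1_score(tp, fp, fn))

# LR row of Table 4: diagonal 57 and 148, off-diagonal 1 and 5, 211 test instances.
print(round(accuracy(57, 148, 1, 5), 4))  # → 0.9716
```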
Conclusion:
 The research focused on developing an automated model to predict Autism
Spectrum Disorder (ASD) from behavioral traits. Current ASD assessment is
time-consuming because of overlapping symptoms, and no quick and accurate
diagnostic test exists. Using a limited dataset, the team compared five models
and found that Logistic Regression achieved the highest accuracy. A key
challenge was the lack of large open-source ASD datasets. Despite this
limitation, the study offers insights for medical practitioners and provides
classification models that other researchers can build on. In future work, the
team aims to use larger datasets and integrate deep learning techniques such as
convolutional neural networks (CNNs) to improve the system's performance.