Early Screening of Autism in Toddlers Via Response-To-Instructions Protocol

IEEE TRANSACTIONS ON CYBERNETICS, VOL. 52, NO. 5, MAY 2022
Abstract—Early screening of autism spectrum disorder (ASD) is crucial, since early intervention demonstrably yields significant improvement in toddlers' functional social behavior. This article bootstraps the response-to-instructions (RTI) protocol with vision-based solutions to assist professional clinicians with automatic autism diagnosis. The correlation between detected objects and toddlers' emotional features, such as gaze, is constructed to analyze their autistic symptoms. Twenty toddlers between 16 and 32 months of age, 15 of whom were diagnosed with ASD, participated in this study. The RTI method is validated against human codings, and group differences between ASD and typically developing (TD) toddlers are analyzed. The results show that the agreement between clinical diagnosis and the RTI method reaches 95% across all 20 subjects, indicating that vision-based solutions are highly feasible for automatic autism diagnosis.

Index Terms—Autistic early screening, gaze estimation, social behavior disorder.

Manuscript received February 14, 2020; revised June 4, 2020; accepted August 5, 2020. Date of publication September 23, 2020; date of current version May 19, 2022. This work was supported by the National Natural Science Foundation of China under Grant 61733011 and Grant 51575338. This article was recommended by Associate Editor S. Chen. (Corresponding authors: Xiu Xu; Honghai Liu.)

Jingjing Liu and Zhiyong Wang are with the State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, China, and also with the State Key Laboratory of Robotics and Systems, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China (e-mail: lily121@sjtu.edu.cn).

Kai Xu and Bin Ji are with the State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, China.

Gongyue Zhang is with the School of Computing, University of Portsmouth, Portsmouth PO1 3HE, U.K.

Yi Wang, Jingxin Deng, Qiong Xu, and Xiu Xu are with the Department of Child Health Care, Children's Hospital of Fudan University, Shanghai 201102, China (e-mail: xuxiu@fudan.edu.cn).

Honghai Liu is with the State Key Laboratory of Robotics and Systems, Harbin Institute of Technology Shenzhen, Shenzhen, China (e-mail: honghai.liu@icloud.com).

Digital Object Identifier 10.1109/TCYB.2020.3017866

I. INTRODUCTION

Autism spectrum disorder (ASD), characterized by severe impairments in social communication and by unusual, restricted, or repetitive behaviors, comprises a series of neurodevelopmental disorders [1]. The signs of autism include difficulty in communicating, using language, and talking about or understanding feelings; disinclination to share or engage in reciprocal play with others; lack of behaviors such as eye contact and joint attention; and sensitivity to physical contact [2]. The incidence of autism is far beyond public imagination: an estimated 1 in 59 children in the U.S. had autism in 2018, while the number of autistic individuals worldwide has reached 67 million [3]. Children with ASD bring about enormous costs for families and governments, including special education services and lost parental productivity [4]. Individuals with ASD can be severely stressed by the difficulty of proper mutual communication and by being misunderstood. Lacking necessary social skills, their personal lives are impacted and their career opportunities hampered, which also burdens society. Although there is currently no cure for autism, authoritative research suggests that early behavioral treatment can improve the symptoms of children with autism, and the earlier the intervention, the better the effect [5], [6]. Thus, the ability to screen toddlers for ASD early is significant, since early screening and diagnosis are the premise of early intervention. The importance of early screening of children with autism has also been highlighted in recent practice guidelines issued by the American Academy of Pediatrics [7].

Currently, the screening and diagnosis of autism are conducted on the basis of developmental history, assessment scales, and behavior observation by professional clinicians [8]. In the process of clinical diagnosis, physicians observe and record children's behaviors with reference to assessment criteria and scales such as the autism diagnostic interview-revised (ADI-R) and the autism diagnostic observation schedule (ADOS), which are considered the most standard tools in autism diagnosis [9], [10]. ADI-R is an interview with parents that collects children's behavioral manifestations in detail. ADOS is mainly employed to assess the language communication, interpersonal communication, play, and imagination of individuals with suspected autism or other pervasive developmental disorders. Because different clinicians bring different perspectives and levels of clinical experience, there is some variability across their diagnostic results, since assessment based on these scales relies heavily on manual observation. Thus, accurate diagnosis requires extensive clinical experience. Besides, ASD diagnosis is time consuming [11], and qualified professional clinicians are in short supply. To cope with these challenges in clinical autism screening, increasing research effort has been devoted to technical means that make autism diagnosis objective and automatic.
2168-2267 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
LIU et al.: EARLY SCREENING OF AUTISM IN TODDLERS VIA RESPONSE-TO-INSTRUCTIONS PROTOCOL 3915
Various methods are employed to collect and process autism-related information [12]–[16], such as EEG recording systems, wristbands with accelerometers, fMRI, and vision- and vocalization-based approaches. Among these, computer vision methods [17], [18] especially stand out, since they offer a more intuitive way to model the pathological mechanism of autism based on behavioral factors. By capturing videos of subjects in an unconstrained environment, these methods can provide quantitative analysis of human language, eye gaze, expressions, and actions that reflect typical autism patterns.

There is evidence that reduced levels of social attention and social communication, as well as increased repetitive behavior with objects, are early markers of ASD between 12 and 24 months of age [7]. Social attention and communication indicators of ASD include decreased response to one's name being called (i.e., "orienting to name"), reduced visual attention to socially meaningful stimuli, and less frequent use of joint attention and communicative gestures. Besides, failure to understand language instructions is also an essential observational indicator in the screening and diagnosis of ASD. Some researchers [19]–[21] have focused on recognizing one or two early indicators of autism with computer vision algorithms. In contrast, we aim to develop a vision-based system to assist autism screening in which various early indicators of autism are considered comprehensively. A multisensor platform is elaborately designed and built to collect video information from different views. In this article, we focus on toddlers' ability to disengage from a nonsocial stimulus and respond to language instructions. Incomprehension or neglect of primary interactive language is taken as a severe defect in toddlers with autism, because an appropriate reaction to basic vocabulary is the first step in social communication. A novel experimental protocol is proposed in which toddlers are presented with toys as a central stimulus and tested on their ability to disengage from the toys and respond to instructions. Computer vision algorithms, including hand detection and gaze estimation, are employed to assess the toddlers' performance, and their validity is verified by comparison against human codings. The main contributions of this article can be summarized as follows.

1) A novel experimental protocol called response-to-instructions (RTI) is proposed to assist the screening of autism, and relevant details are standardized for subsequent unified assessment.
2) Appropriate technical solutions are proposed to automatically assess the protocol in unconstrained conditions. Besides, a database called TASD (the ASD database), including children's hand movements, is established, which can be used for subsequent analysis of children's hand gestures.
3) A multivision sensor system is constructed to capture ASD children's quantitative behavior, ranging from body movement to emotional states.

The remainder of this article is organized as follows. Section II reviews related work on computational methods for characterizing autism symptoms. Section III presents the proposed RTI protocol and its hardware platform, as well as the related algorithms for realizing automatic assessment. Section IV describes experiments on ASD and TD children and the results validating the feasibility of RTI. Finally, Section V concludes this article with future work.

II. RELATED WORK

Core symptoms of autism fall into three categories: 1) social interaction disorder; 2) verbal and nonverbal communication disorder; and 3) narrow interests and stereotypy. We mainly focus on the former two. The social interaction deficits of ASD children are characterized by external behaviors such as avoiding eye contact, lack of interest in and response to the human voice, lack of interest in socializing, and so on. Internally, ASD children have dysfunctional emotional perception, manifested as difficulty in recognizing emotional and social information from faces. Objective measurement of these symptoms [22]–[24] would undoubtedly enhance the reliability of assessment methods. In this regard, various computer vision methods [25], [26] have been employed to generate automated measurements and reveal intrinsic information. Wang et al. [27] proposed an objective and effective method to assist autism screening: a multisensor system for 3-D gaze direction estimation that assesses a common clinical task in the autism diagnosis process, response to name calling. Experiments on ten adults and seven children (five ASD subjects and two healthy subjects) achieved an average classification score of 92.7%. Joint attention also plays a major role in the development of autism, and eye gaze is a key factor in it [28]. Courgeon et al. [29] attempted to simulate joint attention with virtual humans endowed with the ability to follow the user's attention by eye tracking. Some works [30]–[32] attempt to uncover characteristics of facial expressions that are notably distinct from those of typically developing (TD) children. Leo et al. [20] used a single-camera system to assess the capability of ASD children to produce facial expressions. A comparison of the system's outputs with evaluations performed by psychologists made it evident that the proposed system could perform quantitative analysis and overcome human limitations in observing and understanding behaviors. ASD researchers have also proposed automatic emotion annotation solutions [33] to help autistic patients perceive facial expressions of emotion in their social lives. Hashemi et al. [21] provided computer vision tools for the early detection of autism based on three critical autism observation scale for infants (AOSI) activities that assess visual tracking, disengagement of attention, and sharing interest, respectively. Visual attention is assessed using head motion obtained by tracking facial features.

As for communication disorders, in terms of nonverbal communication ASD children fail to use the right body language to express desires or transmit messages. A common scene is that the child pulls an adult's hand toward what he wants, without corresponding facial expressions and eye contact. However, there are few relevant studies on ASD children's hand gestures and body language using computer vision methods. For verbal communication disorders embodied in
Fig. 2. Overview of the whole protocol. There are at most four rounds of testing, depending on the subject's performance. Both the audio data flow and the video data flow are used to assess the subject's performance. The audio data flow triggers assessment via speech recognition. The video data from different sources are then used for object detection and gaze estimation, respectively. The relevant results are combined to score the subject's behavior. The final score is the sum over the four rounds.
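The data flow in the caption can be condensed into a small scoring sketch. The per-round rubric below (one point when the combined object-detection/gaze criteria are met after the spoken instruction triggers assessment) is a simplified assumption for illustration, not the paper's exact scoring rule; `RoundResult` and its fields are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class RoundResult:
    instruction_heard: bool  # speech recognition detected the instruction (trigger)
    responded: bool          # object-detection + gaze-estimation criteria were met

def score_round(r: RoundResult) -> int:
    # Hypothetical rubric: one point only if the trigger fired and the
    # combined vision results indicate a response to the instruction.
    return 1 if (r.instruction_heard and r.responded) else 0

def final_score(rounds: list) -> int:
    # At most four rounds are run; the final score is their sum.
    return sum(score_round(r) for r in rounds[:4])
```

The cap at four rounds mirrors the protocol overview; the actual weighting per round may differ in the authors' implementation.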
rounds. If the child leaves his seat during the experiment, the clinician guides him back to the chair and the experiment continues.

TABLE I
mAP OF DIFFERENT CLASSES OVER 0.5 IOU
B. Algorithms

The processing algorithms for automatic measurement of the RTI protocol are divided into two parts: the interpretation algorithms that analyze human behavior, and the evaluation part that assesses the RTI protocol based on the interpretation results.

1) Interpretation Algorithms: The interpretation algorithms mainly detect human hand movement and attention states using object detection and gaze estimation.

1) Object Detection: The successive images captured by camera C1 are used to detect the locations of toys and human hands on the tabletop. Since a hand is simply an instantiation of one particular object, it is treated as another object class alongside the toys. Due to the flexible shapes of human hands and the occlusions that arise during interaction, deep learning-based algorithms are employed for object detection instead of traditional ones. In this article, a single-shot multibox detector (SSD) [36] model is taken as the preliminary detector because of its high speed and accuracy. In practice, an SSD model pretrained on the COCO dataset [37] is taken as the starting point, followed by transfer learning on our own dataset. Children's hands, the clinician's hands, the balls, and the wind-up toy are labeled as four object classes. The original 800 × 600-pixel images from C1 are cropped to 500 × 500-pixel squares to eliminate unnecessary background. More than 15 000 images of ten subjects are used to build our dataset, called TASD, in which images are shuffled and split into three parts: a) 50% for training; b) 40% for testing; and c) 10% for evaluation. The network is trained for 200k steps using the RMSProp optimizer with a starting learning rate of 0.004, a weight decay of 0.05, and a momentum of 0.9. Although the trained model performed well in detecting children's hands, the clinician's hands, and the wind-up toy, the mean average precision (mAP) for the balls is relatively low. These unsatisfactory results may stem from the SSD model's misdetection of small objects, since the balls are commonly occluded by hands. A simple but effective method based on traditional image processing is therefore used to recover missed balls. Color identification in HSV space is first applied to the RGB images captured by camera C1. Then, because the resulting binary image contains many noisy pixels and interfering regions, two requirements are imposed to pick the ball contours from the candidate contours. First, the area, aspect ratio, width, and height of a candidate contour have to satisfy certain thresholds. Second, the center of the ball's bounding box has to lie inside the table's contour; the table contour is likewise detected via color threshold segmentation in HSV space. As shown in Table I, the mAP of the four object classes over 0.5 IOU exceeds 90%.

2) Gaze Estimation: Typical social communication and interaction impairments of ASD children are reflected in their visual attention and gaze patterns. Thus, gaze direction is another significant indicator measured in the RTI protocol. Images captured by camera C2 are used for children's head pose
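The two contour-selection requirements for recovering missed balls can be sketched in pure Python. In practice, the candidate boxes and the table contour would come from the HSV thresholding step (e.g., OpenCV's findContours); the threshold values here are illustrative assumptions, not the paper's parameters.

```python
def point_in_polygon(px, py, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = (x2 - x1) * (py - y1) / (y2 - y1) + x1
            if px < x_cross:
                inside = not inside
    return inside

def pick_ball_boxes(candidates, table_poly,
                    min_area=100, max_area=2500, max_aspect=1.5):
    """Filter candidate contour bounding boxes (x, y, w, h).

    Requirement 1: area and aspect ratio must satisfy the thresholds.
    Requirement 2: the box center must lie inside the table contour.
    """
    balls = []
    for (x, y, w, h) in candidates:
        area = w * h
        aspect = max(w, h) / max(min(w, h), 1)
        if not (min_area <= area <= max_area and aspect <= max_aspect):
            continue  # fails requirement 1
        cx, cy = x + w / 2, y + h / 2
        if point_in_polygon(cx, cy, table_poly):  # requirement 2
            balls.append((x, y, w, h))
    return balls
```

Boxes that are too elongated or whose centers fall off the table are rejected, which is what suppresses the noisy contours left by HSV thresholding.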
Fig. 5. (a) Time series of the toy's location for round 3 after the instruction is given (time = 0), measured in pixels. The red dot is the vertical distance between the toy's current and initial locations, that is, Δy,k,t for frame t and subject k. The blue cross represents the horizontal distance between the center of the toy and the table, i.e., Δx,k,t for frame t and subject k. Time segments meeting the threshold values are labeled as green bars along the timeline. Screenshots from the recorded video are displayed. (b) Time series of the toy's location for round 4. The green dot represents the distance between the center of the toy and the clinician's hands, that is, Δk,t for frame t and subject k. (c) Time series of the gaze direction. The green diamond denotes the gaze angle φ in degrees, corresponding to ϕ in radians. φ1 and φ2 are the degree values converted from ϕ1 and ϕ2.
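The green bars in the caption mark contiguous time segments in which a measured quantity stays past its threshold. A minimal sketch of that segment labeling follows; the minimum run length is an illustrative assumption, not the paper's value.

```python
import math

def threshold_segments(values, thresh, min_len=3):
    """Return (start, end) frame-index pairs of contiguous runs where
    values[i] >= thresh lasts at least min_len frames -- the green bars
    along the timeline. min_len is an illustrative smoothing assumption."""
    segs, start = [], None
    for i, v in enumerate(values):
        if v >= thresh:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                segs.append((start, i - 1))
            start = None
    if start is not None and len(values) - start >= min_len:
        segs.append((start, len(values) - 1))
    return segs

# Gaze angles estimated in radians are compared against degree
# thresholds after conversion, as in panel (c):
gaze_deg = [math.degrees(phi) for phi in [0.1, 0.6, 0.7, 0.8, 0.2]]
attended = threshold_segments(gaze_deg, 30.0, min_len=3)
```

The same routine applies to the pixel-distance series in panels (a) and (b), with the distance thresholds in place of the angle threshold.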
[2] B. Scassellati, H. Admoni, and M. Mataric, "Robots for use in autism research," Annu. Rev. Biomed. Eng., vol. 14, pp. 275–294, May 2012.
[3] J. Baio et al., "Prevalence of autism spectrum disorder among children aged 8 years—Autism and developmental disabilities monitoring network, 11 sites, United States, 2014," Morbidity Mortality Weekly Rep., vol. 67, no. 45, p. 1280, Nov. 2018.
[4] X. Liu, Q. Wu, W. Zhao, and X. Luo, "Technology-facilitated diagnosis and treatment of individuals with autism spectrum disorder: An engineering perspective," Appl. Sci., vol. 7, no. 10, pp. 731–736, Oct. 2017.
[5] I. Magiati, T. Charman, and P. Howlin, "A two-year prospective follow-up study of community-based early intensive behavioural intervention and specialist nursery provision for children with autism spectrum disorders," J. Child Psychol. Psychiat., vol. 48, no. 8, pp. 803–812, 2010.
[6] S. Ming, T. A. Mulhern, I. Stewart, L. Moran, and K. Bynum, "Training class inclusion responding in typically-developing children and individuals with autism," J. Appl. Behav. Anal., vol. 51, no. 1, pp. 53–60, 2018.
[7] L. Zwaigenbaum et al., "Early identification of autism spectrum disorder: Recommendations for practice and research," Pediatrics, vol. 136, no. S1, p. S10, 2015.
[8] E. Fernell, M. Eriksson, and C. Gillberg, "Early diagnosis of autism and impact on prognosis: A narrative review," Clin. Epidemiol., vol. 5, pp. 33–43, Feb. 2013.
[9] C. J. Dover and A. L. Couteur, "How to diagnose autism," Archives Disease Childhood, vol. 92, no. 6, p. 540, 2007.
[10] J. L. Matson, M. Nebel-Schwalm, and M. L. Matson, "A review of methodological issues in the differential diagnosis of autism spectrum disorders in children," Res. Autism Spectr. Disord., vol. 1, no. 1, pp. 38–54, 2007.
[11] N. Muty and Z. Azizul, "Detecting arm flapping in children with autism spectrum disorder using human pose estimation and skeletal representation algorithms," in Proc. Int. Conf. Adv. Informat. Concepts, 2017, pp. 33–45.
[12] T. Heunis et al., "Recurrence quantification analysis of resting state EEG signals in autism spectrum disorder—A systematic methodological exploration of technical and demographic confounders in the search for biomarkers," BMC Med., vol. 16, no. 1, pp. 28–37, 2018.
[13] J. R. Sato, M. Vidal, S. de Siqueira Santos, K. B. Massirer, and A. Fujita, "Complex network measures in autism spectrum disorders," IEEE/ACM Trans. Comput. Biol. Bioinformat., vol. 15, no. 2, pp. 581–587, Mar./Apr. 2018.
[14] M. S. Goodwin, M. Haghighi, Q. Tang, M. Akcakaya, D. Erdogmus, and S. Intille, "Moving towards a real-time system for automatically recognizing stereotypical motor movements in individuals on the autism spectrum using wireless accelerometry," in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput. (UbiComp), 2014, pp. 861–872.
[15] J. Wang, Q. Wang, H. Zhang, J. Chen, S. Wang, and D. Shen, "Sparse multiview task-centralized ensemble learning for ASD diagnosis based on age- and sex-related functional connectivity patterns," IEEE Trans. Cybern., vol. 49, no. 8, pp. 3141–3154, Aug. 2019.
[16] O. Rudovic, J. Lee, M. Dai, B. Schuller, and R. W. Picard, "Personalized machine learning for robot perception of affect and engagement in autism therapy," Sci. Robot., vol. 3, no. 19, 2018, Art. no. eaao6760.
[17] T. Li, B. Ni, M. Xu, M. Wang, Q. Gao, and S. Yan, "Data-driven affective filtering for images and videos," IEEE Trans. Cybern., vol. 45, no. 10, pp. 2336–2349, Oct. 2015.
[18] L. Shao, X. Zhen, D. Tao, and X. Li, "Spatio-temporal Laplacian pyramid coding for action recognition," IEEE Trans. Cybern., vol. 44, no. 6, pp. 817–827, Jun. 2014.
[19] K. Campbell et al., "Computer vision analysis captures atypical attention in toddlers with autism," Autism, vol. 23, no. 2, pp. 619–628, 2018.
[20] M. Leo et al., "Computational assessment of facial expression production in ASD children," Sensors, vol. 18, p. 3993, Nov. 2018.
[21] J. Hashemi et al., "Computer vision tools for the non-invasive assessment of autism-related behavioral markers," 2012. [Online]. Available: arXiv:1210.7014.
[22] T. Zhang, W. Zheng, Z. Cui, Y. Zong, and Y. Li, "Spatial–temporal recurrent neural network for emotion recognition," IEEE Trans. Cybern., vol. 49, no. 3, pp. 839–847, Mar. 2019.
[23] H. Meng, N. Bianchi-Berthouze, Y. Deng, J. Cheng, and J. P. Cosmas, "Time-delay neural network for continuous emotional dimension prediction from facial expression sequences," IEEE Trans. Cybern., vol. 46, no. 4, pp. 916–929, Apr. 2016.
[24] G. Boccignone and M. Ferraro, "Ecological sampling of gaze shifts," IEEE Trans. Cybern., vol. 44, no. 2, pp. 266–279, Apr. 2014.
[25] C. Liu, K. Conn, N. Sarkar, and W. Stone, "Online affect detection and robot behavior adaptation for intervention of children with autism," IEEE Trans. Robot., vol. 24, no. 4, pp. 883–896, Aug. 2008.
[26] S. Parisot et al., "Disease prediction using graph convolutional networks: Application to autism spectrum disorder and Alzheimer's disease," Med. Image Anal., vol. 48, pp. 117–130, Aug. 2018.
[27] Z. Wang, J. Liu, K. He, Q. Xu, X. Xu, and H. Liu, "Screening early children with autism spectrum disorder via response-to-name protocol," IEEE Trans. Ind. Informat., early access, Dec. 9, 2019, doi: 10.1109/TII.2019.2958106.
[28] Z. Yucel, A. A. Salah, C. Mericli, T. Mericli, R. Valenti, and T. Gevers, "Joint attention by gaze interpolation and saliency," IEEE Trans. Cybern., vol. 43, no. 3, pp. 829–842, May 2013.
[29] M. Courgeon, G. Rautureau, J.-C. Martin, and O. Grynszpan, "Joint attention simulation using eye-tracking and virtual humans," IEEE Trans. Affect. Comput., vol. 5, no. 3, pp. 238–250, Jul. 2014.
[30] M. Samad, N. Diawara, J. Bobzien, C. Taylor, J. Harrington, and K. Iftekharuddin, "A pilot study to identify autism related traits in spontaneous facial actions using computer vision," Res. Autism Spectr. Disord., vol. 65, pp. 14–24, Nov. 2019.
[31] K. Owada et al., "Computer-analyzed facial expression as a surrogate marker for autism spectrum social core symptoms," PLoS ONE, vol. 13, no. 1, 2018, Art. no. e0190442.
[32] T. Guha, Z. Yang, R. B. Grossman, and S. S. Narayanan, "A computational study of expressive facial dynamics in children with autism," IEEE Trans. Affect. Comput., vol. 9, no. 1, pp. 14–20, Jan.–Mar. 2018.
[33] X. Zhao, J. Zou, H. Li, E. Dellandrea, I. A. Kakadiaris, and L. Chen, "Automatic 2.5-D facial landmarking and emotion annotation for social interaction assistance," IEEE Trans. Cybern., vol. 46, no. 9, pp. 2042–2055, Aug. 2016.
[34] J. Depriest, A. Glushko, K. Steinhauer, and S. Koelsch, "Language and music phrase boundary processing in autism spectrum disorder: An ERP study," Sci. Rep., vol. 7, no. 1, 2017, Art. no. 14465.
[35] G. Metta, P. Fitzpatrick, and L. Natale, "YARP: Yet another robot platform," Int. J. Adv. Robot. Syst., vol. 3, no. 1, 2006.
[36] W. Liu et al., "SSD: Single shot multibox detector," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 21–37.
[37] T.-Y. Lin et al., "Microsoft COCO: Common objects in context," in Proc. Eur. Conf. Comput. Vis., 2014, pp. 740–755.
[38] N. Ruiz, E. Chong, and J. M. Rehg, "Fine-grained head pose estimation without keypoints," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), 2018.
[39] Y. Wang, Y. Hui, J. Dong, B. Stevens, and H. Liu, "Facial expression-aware face frontalization," in Proc. Asian Conf. Comput. Vis., 2016, pp. 375–388.
[40] Z. Wang, H. Cai, and H. Liu, "Robust eye center localization based on an improved SVR method," in Proc. 25th Int. Conf. Neural Inf. Process. (ICONIP), 2018, pp. 623–634.
[41] X. Zhou, H. Cai, Y. Li, and H. Liu, "Two-eye model-based gaze estimation from a Kinect sensor," in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), 2017, pp. 1646–1653.
[42] iFLYTEK Open Platform. Accessed: Apr. 2019. [Online]. Available: https://www.xfyun.cn/services/voicedictation
[43] T. Ohno and H. Ogasawara, "Information acquisition model of highly interactive tasks," in Proc. ICCS/JCSS, Aug. 1999, p. 26.
[44] A. Scalmato, A. Sgorbissa, and R. Zaccaria, "Describing and recognizing patterns of events in smart environments with description logic," IEEE Trans. Cybern., vol. 43, no. 6, pp. 1882–1897, Dec. 2013.
[45] F. Happe and U. Frith, "The weak coherence account: Detail-focused cognitive style in autism spectrum disorders," J. Autism Develop. Disord., vol. 36, no. 1, p. 5, 2006.
[46] J. Han, L. Shao, D. Xu, and J. Shotton, "Enhanced computer vision with Microsoft Kinect sensor: A review," IEEE Trans. Cybern., vol. 43, no. 5, pp. 1318–1334, Oct. 2013.
[47] H. Zhou, H. Hu, H. Liu, and J. Tang, "Classification of upper limb motion trajectories using shape features," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 42, no. 6, pp. 970–982, Nov. 2012.
[48] B. Liu, Z. Ju, and H. Liu, "A structured multi-feature representation for recognizing human action and interaction," Neurocomputing, vol. 318, pp. 287–296, Nov. 2018.
Jingjing Liu received the B.E. degree from the Beijing Institute of Technology, Beijing, China, in 2017. She is currently pursuing the Ph.D. degree with the Robotics Institute, Shanghai Jiao Tong University, Shanghai, China.
She is also a Visiting Scholar with the State Key Laboratory of Robotics and Systems, Harbin Institute of Technology Shenzhen, Shenzhen, China. Her current research interests include computer vision and its applications in autistic screening and intervention.

Zhiyong Wang (Graduate Student Member, IEEE) received the B.E. degree from the South China University of Technology, Guangzhou, China, in 2016. He is currently pursuing the Ph.D. degree with the Robotics Institute, Shanghai Jiao Tong University, Shanghai, China.
He is also a Visiting Scholar with the State Key Laboratory of Robotics and Systems, Harbin Institute of Technology Shenzhen, Shenzhen, China. His research interests include gaze estimation, human motion analytics, and applications in autistic screening and intervention.

Yi Wang received the master's degree in child healthcare from Fudan University, Shanghai, China, in 2020.
Her research interests include molecular mechanisms and treatments of autism.

Jingxin Deng received the bachelor's degree in clinical medicine from Chongqing Medical University, Chongqing, China, in 2018. She is currently pursuing the M.Med. degree with the Children's Hospital of Fudan University, Shanghai, China.
Her research interests include early child development and mechanisms of autism spectrum disorders.

Qiong Xu received the Ph.D. degree in pediatrics from Fudan University, Shanghai, China, in 2015.
She is a Chief Physician and the Associate Chief of the Division of Child Health Care, Children's Hospital of Fudan University. She was a Visiting Scholar with the Duke Children's Hospital and Health Center, Durham, NC, USA, from 2012 to 2013. She is working on early detection and early intervention for ASD in community- and hospital-based practices in Shanghai, as well as exploring the molecular mechanisms of genetic mutations in humans and animal models.