A Multi-Center Clinical Trial For Camera-Based Infant Sleep and Awake Detection in Neonatal Intensive Care Unit
Abstract—Infants need adequate sleep to develop their brain and cardiovascular systems, especially preterm infants in the Neonatal Intensive Care Unit (NICU). Camera-based infant monitoring is an emerging direction of research in video health monitoring. Intuitively, a camera can easily identify the sleep-awake stage of infants by detecting the state of the eyes, e.g., closed or open. In this paper, we therefore propose to explore the unique advantage of camera-based facial analysis for sleep-awake detection, as a fundamental step toward infant sleep monitoring. A multi-center clinical trial was conducted to collect infant videos for investigating the feasibility of our proposal. A benchmark including four machine learning methods, SVM, KNN, MLP, and CNN (ResNet18), was set up to classify the sleep/awake stage of infants. To alleviate the overfitting caused by over-sampling a sleeping infant, we propose to integrate ResNet18 with a contrastive learning strategy that strengthens the consistency of facial features learned from different infants. The clinical evaluation shows that all benchmarked methods obtained an accuracy above 75%, while the proposed method achieved the best accuracy of 86%. This invites further exploration of facial/eye features of infants for sleep-awake staging, toward intelligent contactless sleep analysis of infants in combination with camera-based vital signs monitoring.

Index Terms—Infant sleep, sleep and awake classification, clinical trial, contrastive learning

I. INTRODUCTION

Infants usually need to sleep 14-17 hours per day to secrete enough growth hormone for developing various body tissues and organs, especially the nervous system [1]. However, due to disruptions caused by certain diseases (sudden infant death syndrome, sleep apnea, etc.) and frequent medical interventions, infants in the Neonatal Intensive Care Unit (NICU) are particularly susceptible to sleep disorders. This may lead to malnutrition and weakened immunity, aggravating the problem of irregular breathing patterns or apnea [2]. There is thus an urgent need to monitor infant sleep in the NICU. As a first step, sleep-awake detection can provide clinicians with basic information on an infant's total sleeping time. This may guide caregivers to optimize the workflow for neonatal care and take appropriate medical interventions to improve the prognosis of infants in the NICU.

Clinically, polysomnography, which combines cerebral and physiological measurements, is considered the gold standard for infant sleep monitoring [3]. However, it requires multiple electrodes attached to the infant's fragile skin to obtain physiological signals, which increases the risk of skin damage and infections. This is not preferred for preterm infants, especially critically ill infants. Recently, contact-free infant monitoring has been achieved with video cameras [4], which can solve the above issues related to contact-based monitoring. However, almost all related work in this area has focused on vital signs monitoring rather than behavioral monitoring such as context understanding in the NICU, though there is a clear need for tracking the cognition-related neuro-development of infants. Long et al. [5] extracted whole-body motion from videos to detect the sleep-awake state of infants.

Despite these prospective experiments, camera-based infant sleep-awake detection has not been fully explored and still faces many challenges. First, most methods are trained and evaluated on datasets with fewer than 20 infants [6]–[8]. Since sleeping infants show limited variation across the images sampled from a video, a network trained on such data may be overfitted to a few infants in specific conditions. Second, most methods exploit motion cues from an infrared camera for sleep-awake detection rather than facial features [5], [8]–[10]. Since the sleep and awake states can be derived directly from the eyes (i.e., closed or open), we consider exploiting facial features for sleep-awake detection a more straightforward option, especially for awake but quiet infants without much body motion.

To explore the feasibility of using facial information for infant sleep-awake classification, we conducted a multi-center clinical trial at three Chinese hospitals to record videos of infants in the NICU. This study was approved by the hospitals' Institutional Review Boards, and informed consents were obtained from the legal guardians of the infants. To detect the sleep-awake state of an infant, we benchmarked four machine-learning methods including support vector machine (SVM), k-nearest neighbors (KNN), multilayer perceptron

This work is supported by the National Key R&D Program of China (2022YFC2407800), General Program of National Natural Science Foundation of China (62271241), Guangdong Basic and Applied Basic Research Foundation (2023A1515012983), Shenzhen Science and Technology Program (JSGGKQTD20221103174704003), and Shenzhen Fundamental Research Program (JCYJ20220530112601003).
1 Department of Biomedical Engineering, Southern University of Science and Technology, China.
2 Department of Obstetrics, Baoan Hospital of Traditional Chinese Medicine in Shenzhen, China.
3 Neonatal Intensive Care Unit, Nanfang Hospital of Southern Medical University, China.
4 Neonatal Intensive Care Unit, The Third People's Hospital of Shenzhen, China.
† These authors contributed equally to this work.
∗ Corresponding author: Hongzhou Lu (luhongzhou@fudan.edu.cn), Wenjin Wang (wangwj3@sustech.edu.cn)
979-8-3503-0230-1/23/$31.00 ©2023 IEEE
2023 IEEE International Conference on E-health Networking, Application & Services (Healthcom)
[Fig. 1 diagram: in the clinical scenario, sensors and a recorder capture the infant video stream; MediaPipe (CNN-based) detects the face, from which a right-eye crop (24×24×3) and a face image (224×224×3) are extracted; the face image feeds ResNet18 and the improved ResNet18-CL model (a CNN-based mapper with dense heads trained with the losses Lmsup and Lc); medical staff annotate the awake/sleep labels.]
Fig. 1. Clinical setup and pipeline for infant sleep-awake detection. The multi-center clinical trial was performed at three hospitals to collect infant videos covering both the awake and sleep states. The preprocessing unifies the data format. The benchmarked methods evaluate the feasibility of using infant facial features for sleep and awake classification, and ResNet18-CL is proposed to improve the classification performance.
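The ResNet18-CL branch of Fig. 1 attaches a projection head trained with a supervised contrastive loss (Lmsup) alongside the classification loss (Lc). The exact form of Lmsup is not spelled out in this excerpt, so the following numpy sketch only illustrates the arithmetic of a generic supervised contrastive loss, in which samples sharing a sleep/awake label (possibly from different infants) act as positives for one another; the function name and temperature value are illustrative, not the paper's.

```python
import numpy as np

def sup_con_loss(features, labels, temperature=0.07):
    """Generic supervised contrastive loss (a sketch, not the paper's exact Lmsup).

    features: (N, D) embeddings from the projection head.
    labels:   (N,) sleep/awake labels; samples with the same label,
              even from different infants, are treated as positives.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature                      # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                   # exclude self-pairs
    # row-wise log-softmax over all other samples
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
    # positives: same label, different sample
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(f), dtype=bool)
    pos_count = np.maximum(pos.sum(axis=1), 1)
    # negative mean log-probability of the positives for each anchor
    return float(-(np.where(pos, log_prob, 0.0).sum(axis=1) / pos_count).mean())

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
lab = np.array([0, 0, 1, 1, 0, 1, 0, 1])
print(round(sup_con_loss(emb, lab), 4))
```

Minimizing this term pulls same-state embeddings together across infants, which is the stated goal of strengthening feature consistency against over-sampled sleeping infants; in practice it would be computed on a deep learning framework's tensors within each training batch.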
(MLP), and convolutional neural network (CNN). In particular, to address the scarcity of data diversity caused by over-sampling the data of a sleeping infant, we propose a novel method combining ResNet18 and contrastive learning, which improves the consistency of the features learned from different infants with the same state (or label) by pairwise comparison between the data of different infants. Extensive experiments on clinical data demonstrate the feasibility of camera-based sleep and awake detection on unseen infants.

II. CLINICAL INFANT SLEEP DATASET

Previous studies [6]–[10] collected video data from a small number of infants (fewer than 20) to train and evaluate their proposed methods. Due to this lack of diversity in the training data, the evaluation of devices or methods may be biased. To address this, a multi-center clinical trial was conducted to collect videos of infants with different gestational ages and physical conditions (see Fig. 1).

In the NICU of the Third People's Hospital of Shenzhen and the Nanfang Hospital of Southern Medical University, an RGB camera (JX6420, JieXiang Optoelectronics, China) was used to record infant videos at resolutions of 160×90 pixels and 320×180 pixels, sampled at 20 frames per second (FPS). Each video, ranging from 10 minutes to 2 hours, recorded a variety of infant activities without any constraints, including crying, body movement, wakefulness, and sleep. Twenty-six preterm or critically ill infants with gestational ages from 27 to 39 weeks were recorded.

In the NICU of the Shenzhen Baoan Hospital of Traditional Chinese Medicine, we used a different RGB camera (IDS-Ul3860C, Germany) to capture infant videos at a resolution of 968×608 pixels, sampled at 20 FPS. Three videos of 2 preterm infants and 1 full-term infant were recorded, with recording lengths from 2 to 10 minutes. Their gestational ages were between 32 and 37 weeks.

In the Department of Obstetrics of the Shenzhen Baoan Hospital of Traditional Chinese Medicine, infant videos were recorded by nurses in an unconstrained environment using the 48MP camera of a smartphone. Each video was recorded for between 20 and 35 seconds, with a resolution of 720×1280 pixels. The goal of this setup is to evaluate the performance of trained models in a new setting with limited training data. Here we collected videos of 26 full-term infants within one hour of birth.

In total, a clinical dataset of 55 infants (full-term, preterm, and critically ill) was created in our multi-center clinical trial. To annotate the dataset, medical physicians labeled two states (awake and sleep) based on the guidelines [11]. The machine learning methods were then trained and evaluated on 5754 images (2831 sleep images and 2923 awake images) from the 55 infants.

III. METHOD

The pipelines of the benchmarked methods and the proposed method are shown in Fig. 1. The benchmarked methods fall into two categories: (i) handcrafted feature-based methods. Monitoring eye movements can identify the infant sleep/awake state, and rapid eye movement is associated with different sleep stages [12]. For facial analysis, it has been reported that the histogram of oriented gradients (HOG) is an effective feature for characterizing the texture patterns of the eye [13]. Thus, we extract HOG features from the eye areas and use SVM, KNN, and MLP to classify between the sleep and awake states; (ii) CNN-based methods. A CNN can automatically learn task-relevant features from labeled data in an end-to-end fashion. It can also leverage existing data from neighboring fields (e.g., ImageNet) to improve domain-specific performance by fine-tuning. ResNet18 that
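The HOG branch described above can be sketched as follows. In practice one would use scikit-image's `hog` or OpenCV's `HOGDescriptor`; to stay self-contained, this simplified numpy version computes only per-cell orientation histograms (a full HOG adds overlapping block normalization). The cell size and bin count are illustrative choices matched to the 24×24 eye crop shown in Fig. 1, not values taken from the paper.

```python
import numpy as np

def hog_features(eye_gray, cell=6, bins=9):
    """Simplified HOG descriptor for a small grayscale eye crop.

    Accumulates per-cell orientation histograms weighted by gradient
    magnitude; block normalization of the reference HOG is omitted.
    """
    img = eye_gray.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]        # central differences
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientations
    h, w = img.shape
    hists = []
    for r in range(0, h, cell):
        for c in range(0, w, cell):
            hist, _ = np.histogram(
                ang[r:r + cell, c:c + cell],
                bins=bins, range=(0, 180),
                weights=mag[r:r + cell, c:c + cell])
            hists.append(hist)
    feat = np.concatenate(hists).astype(float)
    return feat / (np.linalg.norm(feat) + 1e-8)   # L2-normalize

# a 24x24 crop yields a 4x4 grid of cells, i.e. a 144-dimensional descriptor
eye = np.zeros((24, 24))
eye[:, 12:] = 1.0   # synthetic vertical edge standing in for an eyelid contour
print(hog_features(eye).shape)
```

The resulting descriptors would then be fed to standard classifiers (e.g. scikit-learn's `SVC`, `KNeighborsClassifier`, and `MLPClassifier`) to reproduce the SVM/KNN/MLP baselines.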
Method        | Sleep→Sleep | Sleep→Awake | Awake→Sleep | Awake→Awake
(a) SVM         | 0.64 | 0.36 | 0.15  | 0.85
(b) KNN         | 0.71 | 0.29 | 0.073 | 0.93
(c) MLP         | 0.62 | 0.38 | 0.13  | 0.87
(d) ResNet18    | 0.84 | 0.16 | 0.20  | 0.80
(e) ResNet18-CL | 0.84 | 0.16 | 0.11  | 0.89
(rows give the true state, row-normalized over the predicted states)
Fig. 2. The confusion matrices obtained by the benchmarked methods on the test set of the infant sleep dataset.