An Efficient Edge Deep Learning Computer Vision System To Prevent Sudden Infant Death Syndrome


2021 IEEE International Conference on Smart Computing (SMARTCOMP)

An Efficient Edge Deep Learning Computer Vision System to Prevent Sudden Infant Death Syndrome
2021 IEEE International Conference on Smart Computing (SMARTCOMP) | 978-1-6654-1252-0/21/$31.00 ©2021 IEEE | DOI: 10.1109/SMARTCOMP52413.2021.00061

Vivek Bharati
Homestead High School
Cupertino, California, USA
vbharati238@student.fuhsd.org

Abstract—Sudden Infant Death Syndrome (SIDS) causes infants under one year of age to die inexplicably. One of the most important external factors, also called an “outside stressor,” that is responsible for Sudden Infant Death Syndrome (SIDS), is the sleeping position of the baby. According to past research, the risk of SIDS increases when the baby sleeps facedown on the stomach. We propose a Convolutional Neural Network (CNN) based computer-vision system that estimates the sleeping pose of the baby and alerts caregivers on their mobile phones within a few seconds of the baby moving to the hazardous facedown sleeping position. The system has a low computational load and a low memory footprint. This characteristic allows the system to be embedded in low power edge devices such as certain baby monitors. Processing at the edge also alleviates privacy concerns with regard to sending images into the network. We experimented with various numbers of convolutional processing units and dense layers as well as the number of convolutional kernels to arrive at the optimal production configuration. We observed a consistently high accuracy of detection of infant sleeping position changes from turning to facedown positions with a potential towards even higher accuracies with caregiver feedback for model retraining. Therefore, this system is a viable candidate for consideration as a non-intrusive solution to assist in preventing SIDS.

Index Terms—Sudden Infant Death Syndrome, SIDS, convolutional neural network, image classification, posture estimation, edge deep learning

I. INTRODUCTION

Sudden Infant Death Syndrome (SIDS) is a leading cause of death in infants. Research has led the scientific community to believe that babies who succumb to SIDS are born with one or more conditions that may lead to a fatality in response to what are called internal and external stressors [1]. Researchers have identified three primary risk factors that must all be present and combine to result in the death of an infant from SIDS [2]. The first of the risk factors is the infant being vulnerable. Infants may be susceptible to an abnormality in the portion of the brain that controls heart rate and respiration. The second risk factor is the crucial nature of the developmental period of the baby. For example, during the first six months of a baby’s life, rapid growth may destabilize the baby’s internal systems. The third and most important risk factor is the set of external or outside stressors. These include things such as the sleeping position of the baby, especially when the baby sleeps on the stomach in a facedown position. All three risk factors listed above must be present simultaneously for a SIDS fatality to occur. It is recognized by experts that since the first two risk factors stated above cannot be seen or pinpointed, the most effective way to reduce the risk of SIDS is to reduce or remove outside stressors altogether [1, 2]. This involves eliminating the sleeping-position-related risk by placing the sleeping baby on the back instead of on the stomach and ensuring the baby doesn’t turn from the back sleeping position to the stomach sleeping position.

A number of tools have been created to attempt the task of avoiding the sleeping position as an outside stressor. These include baby sleeping sacks, breathing monitors, movement sensor pads, special mattresses, bumpers, etc. [3]. The U.S. Food and Drug Administration (FDA) has made it clear that a safe sleep environment for a baby is for the infant to be alone in the crib or bassinet, free of other objects including the devices mentioned above [4]. Therefore, it would be beneficial to create a non-intrusive, contact-less, and automated system that can be used to ensure the baby maintains a back-sleeping position while asleep. The system should have no devices, objects, or sensors in contact with the baby in order to prevent risks. The device should be expedient in detecting the sleeping position outside stressor and should alert caregivers quickly so that they can take immediate remedial action.

In this paper, we have proposed a custom, convolutional neural network (CNN) based computer-vision system that can process images from a camera focused on the baby’s bed which can automatically detect the sleeping position of the baby. Figure 1 presents the sleeping positions of the baby that need to be detected. The system also detects an attempt by the baby to change position from the back sleep position to the stomach sleep position. The system automatically generates a local audible alert and sends an alert notification to caregivers’ mobiles. The system allows the caregivers to provide a validation feedback regarding whether the alert is a valid alert or is a false positive. This feedback can be used to retrain the system so that accuracy levels are increased. The system is optimized for low computational power and a low memory footprint, which creates the potential for it to be embedded in edge devices, such as certain baby monitors and home computers.

Our work highlights the potential to use deep learning techniques at the edge to create non-intrusive life-saving solutions that eliminate the need for manual monitoring. The deep learning techniques can also eliminate the use of automated monitoring devices, which, because they are intrusive, may interfere with the objectives of such monitoring. We have

demonstrated that deep learning techniques can be optimized for settings that necessitate the use of low power and low memory footprints. Deep learning techniques optimized for low power and low memory footprint requirements would open up the avenues for embedding machine learning in edge devices, such as cameras, that are normally found in clinical settings. Optimized deep learning techniques would also expand the use of deep learning techniques into settings where privacy concerns would hinder the transfer of images from a clinical setting into the cloud for processing. Integration of edge deep-learning models with mobile notification services as demonstrated in our work creates the potential to free-up the anxiety of caregivers while increasing the effectiveness of care.

Fig. 1. Sleeping position classifications.

II. RELATED WORK

The use of Convolutional Neural Network (CNN) systems for various computer vision tasks is well established, inspired by the work of Hubel and Wiesel on the vision systems of animals [5]. A progression of various advancements in the architecture of CNNs can be found in [6-11]. Currently, one of the primary uses of CNNs in computer vision is image classification [12-15]. In image classification, the input images are classified into one of a set of pre-specified categories. The key aspect in using image classification CNNs for a certain task is to formulate the task as a classification problem. A related class of tasks is activity detection from videos [16-19] wherein a specific activity, typically by a human, is detected from a given video. An area of research related to baby sleeping-position detection is that of human-pose estimation, which is the localization of human joints in images or videos and the search for a specific pose among all articulated poses. Examples of prior research in human pose estimation include [20-25].

We experimented with various prior classes of neural network model architectures for the objectives of this work. Most of the prior art deep-learning models for image classification and posture estimation that we looked at required significant computing resources as well as memory ranging from tens of megabytes to hundreds of megabytes, precluding their use in low power, low memory edge devices. Hosting these models in the cloud creates cost and privacy challenges. These models also take up to tens of seconds to process images. This would render their use in time-critical tasks, such as quickly detecting the SIDS outside stressor, impractical. Stripping-down prior art models to make them edge-ready led to downgrades in their performance. While techniques for patient monitoring using sensors such as bed sensors and thermal sensors can be used to detect movement and temperature of the baby, they are not useful in identifying the specific sleeping posture of the baby.

III. MODEL

Since images of the head and shoulders area contain a limited number of features, we hypothesized a simplified custom convolutional neural network which implements image classification for sleeping pose estimation. Then, we trained the model on a substantial number of baby images in various sleeping positions that would be suitable for accomplishing the objectives of our work. The three classifications for the image classification task were the following: (1) the baby in a back sleeping position (normal); (2) the baby changing from one position to another; (3) the baby sleeping in a stomach sleeping position (outside stressor). Our proposed architecture (see Figure 2) draws upon recent advancements in image classification and pose estimation, but, at the same time, simplifies these so as to address the challenges of computational load and memory footprint at the edge.

Fig. 2. Custom CNN architecture.

In section III.A, we describe the components of our custom CNN model. In section III.B, we describe the components that surround the core custom CNN model in order to provide the full functionality of a complete practical system. In section III.C, we describe the process adopted to generate the training and test datasets. The approach we took for evaluation of our candidate models is presented in section III.D. Results of the evaluation are presented in section III.E.

A. Model Architecture

The overall architecture of our custom CNN can be represented as follows:

Input image → [Convolutional Layer + Max Pooling Layer] x multiple times → Fully Connected Layer(s) → Output.

1) Input Image: A 2D image of the baby captured by the camera is the input data. The input image is converted into grayscale and rescaled to 100 pixel x 100 pixel size. This preprocessing to grayscale and resizing is performed both for training data as well as for live images captured for sleeping pose estimation. This preprocessing improves computational efficiency and speed.

2) Convolutional Layer: For this system, a 3x3 convolutional window was used. The output of the above convolution is passed through an activation function. We used the Rectified Linear Unit (ReLU) as the activation function due to its ability to eliminate the issue of vanishing gradients in large scale

convolutional neural networks. Once the output is calculated, the input window is shifted by the stride value. For this system, a stride value of 1 was used. Multiple filters could be used within a convolutional layer.

3) Max Pooling Layer: Max Pooling layer helps in picking up the most relevant features and reducing the size of the data. For this system, the window/kernel size of 2x2 with a stride of 2 was used with Max (maximum of the given values) as the pooling function. Max Pooling retains the largest value within each window.

4) Dense Layer: Dense layers or fully connected layers are the multilayer perceptron layers, where all inputs are connected to the required outputs. In the final layer in our implementation, the output is the set of probabilities of the three classes corresponding to the three possible sleeping positions. In this particular application of dense layers, the Sigmoid activation function was used. The final dense layer had a dropout rate of 0.5 (50% probability of shutting down randomly chosen nodes in each layer of the neural network [26]).

The custom CNN system was built leveraging Python 3 with Jupyter Notebook as the development environment. The core of the model was built using TensorFlow/Keras. Sequential modeling (where layers are organized sequentially) was used to develop the various layers in the neural network. The open source code of our implementation can be found at: https://github.com/viveksbharati/SIDS

B. Supporting components of end-to-end system

Figure 3 shows the high-level system functionality with end-to-end system components including the above CNN architecture at its core.

Fig. 3. End-to-end system design.

1) A camera is positioned facing the baby. The camera is connected to a computer (which could be a processing unit that is part of the camera). The program running on the computer is capable of taking pictures.
2) The high resolution color image from the camera is converted to grayscale, resized to 100 x 100 pixels, and passed on to a Convolutional Neural Network whose architecture was described earlier. The model returns confidence levels against the three categories of sleeping positions (Sleeping on Back – OK, Baby trying to turn – Alert, Baby turned over and lying on stomach – Alarm).
3) For a specific image captured, if the system decides the baby is OK, it moves to Step 1 to capture the next image and proceeds to further steps.
4) In the case when the system decides to notify the caregivers, an onscreen visible indicator and an audible alert are switched on.
5) Also, the system can generate multiple channels of alerts such as an SMS text message. These alerts are sent to pre-configured mobile numbers. The alerts can contain messages such as “The baby has turned over. Please check on the baby.” In our implementation, we used Amazon Simple Notification Service to send notifications.
6) Alternatively, a Push Notification can be sent to a mobile application with the alert message along with the picture from the camera. The system can generate multiple alerts in multiple channels (for example, SMS to a few phones, Push Notifications to multiple mobile devices, voice messages on home automation devices, etc). The intent is to ensure that a channel failure or latency does not hinder the performance of the system.
7) The Push Notifications can be sent along with the captured baby image with options for the caregiver to acknowledge and validate the notification. Two options are provided for the caregiver: a) “OK” to acknowledge the notification as correct and switch off the alarm; b) “False Alarm” to indicate that the system prediction is incorrect.
8) Based on the caregiver’s acknowledgement that the notification is correct, the corresponding images are annotated to the predicted class. If the caregiver disagrees with the prediction, the prediction with the next higher probability value is taken as the annotation for the next round of training.
9) Periodic retraining with live images can also be performed. As explained in Step 1, the system continuously captures the baby’s images for predicting the baby’s position. These images are retained for retraining along with the predictions corresponding to these images. Any image where there is an acknowledgement from the caregiver (Step 8) is also included for the retraining process. Periodic retraining of the model with the images captured as part of the system execution adapts the system to a specific baby.

C. Dataset generation

The datasets needed to train and test the model, which would be a large number of images of sleeping babies in the three positions discussed earlier, were not publicly available. Therefore, we generated the datasets needed for this work. Lifelike baby dolls of three different ethnicities in equal proportions were used to generate the datasets. The continuous stream of image data was provided by a webcam that was trained on the head and shoulders area of the baby. OpenCV

was used to periodically capture images from the video feed to create the datasets. While capturing the images, the angle of the baby, position of the arms and the position of the baby relative to the crib were changed continuously to account for movements of the baby in the crib. Given the limited set of features in the head and shoulders areas of the baby, training of the model with datasets generated with realistic baby dolls provides a starting point for the model, which then gets enhanced with live images during periodic retraining when deployed in live settings.

A total of 9,000 images generated as above were used as seeds. Every seed image was also used to generate additional images using image transformations. The ImageDataGenerator part of Keras image processing package was used for rotating, cropping, and zooming the seed images. Additional transformations involved altering the height and width of each seed image and flipping the seed images horizontally. The parameters for the transformations were:
• Image rotation with range up to 20 degrees
• Image shift (pan) horizontally/vertically by 10% of the image width/height
• Horizontal flips
• Zoom range of 20%
• Up to 5% shearing of the image

This transformation process was carried out to ensure the success of the neural network model in classifying different sleeping positions regardless of any new positioning of the camera, as well as variations in the image quality and camera quality. Along with the seed and transformed images, the total dataset included about 18,000 images across the three sleeping position categories in nearly equal proportions. 5% of that (about 900) was used as the test set, and the remaining 95% was used as the training set. Care was taken to avoid including transformed images of training set images in the test set. The images were then randomized and labeled to the three categories (OK - Face Up, Alert - Turning, Alarm - Facedown). All images were converted to grayscale, and scaled down to 100 x 100 pixels. The image size was chosen after trial and error between the visible features of the baby’s position and optimal size for model generation. The sized images were pickled (utility to marshal and unmarshal a data structure) for later use. The data was normalized using (x - x.min()) / (x.max() - x.min()). In our case, since the grayscale image had just one channel, we could normalize the grayscale image data by dividing it by 255, changing every pixel value to be between 0.0 and 1.0.

D. Evaluation

The combination of a Convolutional Layer with a Max Pooling Layer was used as a single convolutional processing unit. Different models were built and evaluated with five different combinations of
• Number of (Convolutional layer + Max Pooling layer) convolutional processing units
• Number of filters within each Convolutional layer
• Number of Dense layers

The five models evaluated were:
• Model 1: 1 x (2D convolutional layer with 32 kernels + 1 Max Pooling layer) followed by 1 Dense layer.
• Model 2: 2 x (2D convolutional layer with 32 kernels + 1 Max Pooling layer) followed by 1 Dense layer.
• Model 3: 3 x (2D convolutional layer with 32 kernels + 1 Max Pooling layer) followed by 1 Dense layer.
• Model 4: 3 x (2D convolutional layer with 64 kernels + 1 Max Pooling layer) followed by 2 Dense layers.
• Model 5: 3 x (2D convolutional layer with 32 kernels + 1 Max Pooling layer) followed by 2 Dense layers.

The models were compiled with the following parameters:
• The loss function used was “Sparse Categorical Cross Entropy”, which computes the cross entropy loss between the labels and predictions.
• The Adam optimizer was used with learning rate = 0.001, beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-07 and amsgrad = False
• Capturing of the accuracy matrix was enabled using TensorBoard

Accuracy is calculated by counting the correct predictions and dividing it by the total number of predictions. The Sparse Categorical Cross Entropy function was used to calculate the loss. The model that yielded the best results, in terms of a higher accuracy rate combined with lower error/loss as well as lower memory and lower training time, was sought for production deployment.

Accuracy evaluation: Model 1 never improved over an accuracy of 0.4. The rest of the models yielded much higher accuracy levels and tended to plateau around 10 epochs. Therefore, the number of training epochs was reduced to 12. Figure 4 presents the results of the accuracy evaluation for training. Figure 5 presents the results of the accuracy evaluation for validation. Model 1 is not included in either figure because of its significantly lower performance.

Fig. 4. Accuracy evaluation - Training.
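
The accuracy and loss definitions used in this evaluation can be illustrated with a small plain-Python sketch. It is not the Keras implementation used in our work; the class indices (0 = OK, 1 = Alert, 2 = Alarm), the function names, the clipping constant, and the sample probabilities are assumptions made purely for illustration.

```python
import math

# Assumed class indices for illustration: 0 = OK, 1 = Alert, 2 = Alarm.

def accuracy(labels, probs):
    """Fraction of samples whose highest-probability class equals the label."""
    correct = 0
    for label, p in zip(labels, probs):
        predicted = max(range(len(p)), key=lambda i: p[i])
        if predicted == label:
            correct += 1
    return correct / len(labels)

def sparse_categorical_cross_entropy(labels, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true (integer) label."""
    total = 0.0
    for label, p in zip(labels, probs):
        # Clip to avoid log(0); the eps value is an assumption.
        total += -math.log(min(max(p[label], eps), 1.0))
    return total / len(labels)

labels = [0, 1, 2, 2]
probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.5, 0.3, 0.2],  # misclassified: the true class is 2
]
print(accuracy(labels, probs))
print(sparse_categorical_cross_entropy(labels, probs))
```

In this made-up batch three of the four samples are classified correctly, so the accuracy is 0.75, while the single confident mistake dominates the loss. This is why the two quantities are tracked separately.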

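To make the five evaluated configurations more concrete, the following sketch works out the size of the feature map that reaches the first Dense layer for a 100 x 100 grayscale input. It assumes unpadded 3x3 convolutions with stride 1 and 2x2 max pooling with stride 2; the padding mode is not stated in the paper, and the function names are hypothetical.

```python
def window_output(size, kernel, stride):
    """Output length of an unpadded sliding window along one dimension."""
    return (size - kernel) // stride + 1

def flattened_features(units, kernels, input_size=100):
    """Return (side, count) of the feature map after `units` conv+pool units."""
    size = input_size
    for _ in range(units):
        size = window_output(size, kernel=3, stride=1)  # 3x3 convolution, stride 1
        size = window_output(size, kernel=2, stride=2)  # 2x2 max pooling, stride 2
    return size, size * size * kernels

# (convolutional processing units, kernels per convolutional layer)
CONFIGS = {
    "Model 1": (1, 32),
    "Model 2": (2, 32),
    "Model 3": (3, 32),
    "Model 4": (3, 64),
    "Model 5": (3, 32),
}

for name, (units, kernels) in CONFIGS.items():
    side, features = flattened_features(units, kernels)
    print(f"{name}: {side}x{side}x{kernels} -> {features} values into the Dense layer")
```

Under these assumptions each additional processing unit shrinks the input to the Dense layer (76,832 values for one unit, 16,928 for two, and 3,200 for three with 32 kernels), which is one plausible reason a deeper configuration can still be the smaller model. The Dense layer widths are not given in the paper, so exact model sizes cannot be derived from this sketch.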
Fig. 5. Accuracy evaluation - Validation.

Loss evaluation: Model 1 had its lowest loss function of 0.5 after 14 epochs. For the other models, the loss function considerably reduced after 10 epochs. Figure 6 presents the results of the loss evaluation for training. Figure 7 presents the results of the loss evaluation for validation. Since Model 1 had significantly poorer performance compared to the other models, it is not included in either figure.

Fig. 7. Loss evaluation - Validation.

E. Results

Model 1 had a training time of 20 minutes. Since the training and validation accuracies didn’t improve beyond 0.4 and loss didn’t reduce below 0.5, this configuration was discarded.
Model 2 had good accuracy and loss performance. The size of the model was 850 kB. The training time was 162 minutes.
Model 3 had the best accuracy and loss performance. The size of the model was 500 kB. The training time was 25 minutes.
Model 4 had good accuracy and loss performance. The size
of the model was 5.7 MB. The training time was 52 minutes.
Model 5 had good accuracy and loss performance. The size
of the model was 1.5 MB. The training time was 25 minutes.
Model 3 had the best overall accuracy and loss performance,
along with the least memory footprint and training time.
Therefore, it was selected for production deployment.
During the live test runs, the system always predicted
the right posture, especially for the faceup and facedown
positions (i.e., the accuracy rates were close to 100% during
our observations). The processing time taken for the system
to capture three continuous images, transmit the data to the
model, classify all three images, perform majority voting on
the posture estimations, and generate corresponding notifica-
tions locally and on the mobile (in case of the baby turning
or lying facedown) was within 3 seconds.
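
The majority-voting step over three consecutive frames can be sketched as follows. The label order, the function names, the sample probabilities, and the tie-breaking rule (prefer the more severe label) are assumptions for illustration; the text above does not describe how ties are resolved.

```python
from collections import Counter

LABELS = ("OK", "Alert", "Alarm")  # assumed order, least to most severe

def predicted_class(probs):
    """Index of the class with the highest probability for one frame."""
    return max(range(len(probs)), key=lambda i: probs[i])

def majority_vote(frame_probs):
    """Label predicted for most frames in the batch."""
    votes = Counter(predicted_class(p) for p in frame_probs)
    best_count = max(votes.values())
    # Tie-break (assumption): choose the most severe class among the leaders.
    winner = max(cls for cls, count in votes.items() if count == best_count)
    return LABELS[winner]

def should_notify(label):
    """Notify caregivers for anything other than the normal back-sleeping pose."""
    return label != "OK"

frames = [
    [0.10, 0.15, 0.75],
    [0.20, 0.50, 0.30],
    [0.05, 0.10, 0.85],
]
label = majority_vote(frames)
print(label, should_notify(label))
```

Voting over several frames trades a short delay for robustness: a single misclassified frame cannot trigger or suppress a notification on its own.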
Fig. 6. Loss evaluation - Training.

IV. RUNTIME CONFIGURATION OPTIONS

The production model could be deployed on low-powered single-board computers, for example, Raspberry Pi or Nvidia Jetson Nano. We have implemented the system on a laptop with a connected webcam as well as on a Raspberry Pi device with a miniature HD camera. The Raspberry Pi implementation setup was as follows:

Hardware:
• Raspberry Pi 4 Model-B with 8 GB RAM, Quad-core 64-bit Broadcom 2711, Cortex A-72 processor
• OV5647 5 MP 1080p IR-cut camera mounted on the crib and focused on baby’s head and shoulders area

Software:
• 64 bit Debian GNU/Linux OS installed on Pi 4. Camera module enabled and Pi connected to WiFi.
• Python 3.7.3 for runtime
• Tensorflow 2.3.0/Keras 2.4.0 to load and run model
• OpenCV 4.1.1 for image capture and manipulations
• Boto3 - 1.17.30 to communicate with Amazon Web Services Simple Notification Service

The model could be further fine-tuned and ported on to TensorFlow Lite to optimize the footprint and execution efficiency. This would help in ensuring portability to a host of edge platforms, including mobile phones (iOS/Android) as well as platforms with embedded Linux and microcontroller-based single-board computers based on ARM Cortex-M Series.

V. CONCLUSION

We have proposed a custom CNN system to help prevent the Sudden Infant Death Syndrome. It can provide alerts via mobile notifications to help avoid the sleeping position outside stressor. The system has been developed to minimize computational load and memory footprint, which enables it to be embedded in edge devices, such as baby monitors, low-end home computers and low-cost single-board computers such as Raspberry Pi devices. Our experiments show that a model with three sequential sets of convolutional processing units, each consisting of a 2D Convolutional layer with 32 kernels and a Max Pooling layer, followed by one fully connected Dense layer is able to estimate the sleeping pose of the baby and any pose changes with high accuracy, and alert caregivers on mobile devices in about three seconds. The memory footprint of the model was around 500 kilobytes. This gives us confidence that this custom convolutional neural network model is a viable candidate for consideration as a non-intrusive, contact-less, low-cost technology to help prevent Sudden Infant Death Syndrome. Our work highlights the potential to use deep learning techniques at the edge to create non-intrusive life-saving solutions that eliminate the need for manual monitoring.

REFERENCES

[1] National Center of Child Health and Human Development, “Research on Possible Causes of SIDS,” nichd.nih.gov, [Online]. Available: https://bit.ly/2XHaKrb. [Accessed December 12, 2020].
[2] National Center of Child Health and Human Development, “What is SIDS?,” nichd.nih.gov, [Online]. Available: https://bit.ly/3qgsaY5. [Accessed December 12, 2020].
[3] A. Wehrli, “SIDS: The 10 Best Prevention Products and 5 That Were Recalled,” babygaga.com, May 26, 2018 [Online]. Available: https://www.babygaga.com/sids-the-10-best-prevention-products-and-5-that-were-recalled/. [Accessed December 12, 2020].
[4] U.S. Food and Drug Administration, “Baby Products with SIDS Prevention Claims,” fda.gov, October 17, 2019. [Online]. Available: https://www.fda.gov/medical-devices/products-and-medical-procedures/baby-products-sids-prevention-claims. [Accessed December 12, 2020].
[5] D. H. Hubel and T. N. Wiesel, “Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex,” The Journal of Physiology, vol. 160, pp. 106–154, 1962.
[6] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2323, 1998.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification With Deep Convolutional Neural Networks,” In Proc. Advances in Neural Information Processing 25, 2012, pp. 1097-1105.
[8] M. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” In Proc. European Conference on Computer Vision 2014, 2014, pp. 818-833.
[9] C. Szegedy, W. Liu, Y. Jia, et al, “Going Deeper with Convolution,” In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1-9.
[10] K. He, X. Zhang, S. Ren et al, “Deep Residual Learning for Image Recognition”, In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[11] F. Chollet, “Xception: Deeplearning with Depthwise Separable Convolution,” arXiv.org, Apr. 2017. [Online]. Available: https://arxiv.org/abs/1610.02357. [Accessed December 12, 2020].
[12] S. Soffer, A. Ben-Cohen, O. Shimon, et al., “Convolutional Neural Networks for Radiologic Images,” Radiology, vol. 209, no. 3, pp. 590-606. 2019.
[13] S. Ahlawat, A. Choudhary, A. Nayyar, et al., “Improved Handwriting Recognition Using Convolutional Neural Networks,” Sensors 2020, vol. 20, no. 12, pp. 3344, 2020.
[14] M. Coskun, A. Ucar, O. Yildrium, et al., “Face recognition based on convolutional neural network,” In Proc. International Conference on Modern Electrical and Energy Systems (MEES), 2017, pp. 376-379.
[15] Z. Zhao and S. Xu, “Object Detection with Deep Learning: A Review,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3212-3232, 2019.
[16] A. S. Voulodimos, D. I. Kosmopoulos, N. D. Doulamis, and T. A. Varvarigou, “A top-down event-driven approach for concurrent activity recognition,” Multimedia Tools and Applications, vol. 69, no. 2, pp. 293–311, 2014.
[17] A. S. Voulodimos, N. D. Doulamis, D. I. Kosmopoulos, and T. A. Varvarigou, “Improving multi-camera activity recognition by employing neural network based readjustment,” Applied Artificial Intelligence, vol. 26, no. 1-2, pp. 97–118, 2012.
[18] T. Kautz, B. H. Groh, J. Hannink, U. Jensen, H. Strubberg, and B. M. Eskofier, “Activity recognition in beach volleyball using a DEEp Convolutional Neural NETwork: leveraging the potential of DEEp Learning in sports,” Data Mining and Knowledge Discovery, vol. 31, no. 6, pp. 1678–1705, 2017.
[19] K. Tang, B. Yao, L. Fei-Fei, and D. Koller, “Combining the right features for complex event recognition,” In Proc. 2013 14th IEEE International Conference on Computer Vision, 2013, pp. 2696–2703.
[20] A. Toshev and C. Szegedy, “DeepPose: Human pose estimation via Deep Neural Networks,” arXiv.org, Aug. 2014. [Online]. Available: https://arxiv.org/abs/1312.4659. [Accessed December 12, 2020].
[21] J. Tompson, R. Goroshin, A. Jain, et al., “Efficient object localization using convolutional networks,” arXiv.org, June 2015. [Online]. Available: https://arxiv.org/abs/1411.4280. [Accessed December 12, 2020]
[22] S. Wei, V. Ramakrishna, et al., “Convolutional Pose Machines”, arXiv.org, Apr. 2016. [Online]. Available: https://arxiv.org/abs/1602.00134. [Accessed December 12, 2020]
[23] J. Carreira, P. Agrawal, et al., “Human Pose Estimation with Iterative Error Feedback”, arXiv.org, June 2016. [Online]. Available: https://arxiv.org/abs/1507.06550. [Accessed December 12, 2020]
[24] A. Newell, K. Yang, J. Deng, “Stacked Hourglass Networks for Human Pose Estimation”, arXiv.org, July 2016. [Online]. Available: https://arxiv.org/abs/1603.06937. [Accessed December 12, 2020]
[25] Papandreou, George, et al. “PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model,” arXiv.org, Mar. 2018, [Online]. Available: arxiv.org/abs/1803.08225. [Accessed December 12, 2020]
[26] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. 2014. “Dropout: A Simple Way To Prevent Neural Networks From Overfitting,” Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929–1958, 2014.
