Considering Interpersonal Differences in Validating Wearable Sleep-Tracking Technologies
Abstract—This study investigated an important yet usually neglected question in validating consumer sleep tracking devices: do interpersonal differences play a role in the validity of consumer sleep trackers? Most existing validation studies assume that the accuracy of these devices is consistent across individuals, ignoring the fact that human sleep demonstrates significant interpersonal differences. This study aimed to test this assumption by validating two of the newest home sleep tracking technologies, i.e. Fitbit Charge 2 (wearable wristband) and Neuroon (wearable EEG eye mask), against a clinical device. Two participants were recruited to track their sleep using a Fitbit, a Neuroon, and a clinical device, SLEEP SCOPE, simultaneously for approximately 40 nights. Data analysis was conducted for each participant separately. The results showed that the Fitbit measurements completely deviated from the clinical measurements for one participant, but agreed very well for the other participant, especially in the dimension of total sleep time. Neuroon measurements agreed well with but were uncorrelated to the clinical measurements for one participant, and were correlated significantly but agreed poorly for the other participant. This study suggests that interpersonal differences should be considered in future validation studies and in designing better consumer sleep-tracking technologies.

Keywords—Sleep; wearables; EEG; Fitbit; validation; n-of-1; personal informatics.

I. INTRODUCTION

As sensing and wearable technologies advance, many home sleep tracking technologies have emerged in the consumer market. These technologies range from free mobile applications (e.g. Sleep As Android, Sleep Bot) to affordable wearable devices (e.g. Fitbit, Jawbone) to portable EEG devices (e.g. Neuroon and Sleep Shepherd). Since human sleep demonstrates significant interpersonal differences [1-2], it is an interesting question whether consumer sleep trackers are equally accurate for all individuals. Most of the work in the literature assumes that the accuracy of these devices is consistent across individuals [3].

The aim of this study is to test this assumption by validating two of the newest home sleep tracking technologies against a clinical device. To investigate interpersonal differences, we adopted the n-of-1 approach (i.e. personalized approach) [4] in place of the 1-of-n approach (i.e. large-sample approach) [5] that has been used in most of the existing studies. We recruited two participants who wore a Fitbit Charge 2 wearable wristband, a Neuroon wearable EEG eye mask, and a clinical device named SLEEP SCOPE (Sleep Well Co., Osaka, Japan) for approximately 40 nights. The measurements of the consumer devices were validated against those of SLEEP SCOPE, and the analysis was done for each participant separately. We then compared the results obtained from each of the participants. The measurements by SLEEP SCOPE were used as the ground truth.

We used several statistical techniques to analyze the validity of consumer sleep trackers. The results revealed significant interpersonal differences. First, the Fitbit measurements completely deviated from the clinical measurements for one participant, but agreed very well for the other participant, especially in the dimension of total sleep time (TST). Second, the Neuroon measurements agreed well with but were uncorrelated to the clinical measurements for one participant (on sleep onset latency, SOL), and were correlated significantly but agreed poorly for the other participant (on wake after sleep onset, WASO, and sleep efficiency, SE). In addition, we found that good agreement between Neuroon and SLEEP SCOPE generally occurred when the user had a good night of sleep. The main contributions of this study are twofold: (1) the results revealed that the validity of consumer wearable devices varies from user to user; (2) we identified conditions for good validity of consumer wearable devices to inform the future design of accurate consumer sleep trackers. The outcome of this study brought forward an important question that we need to consider in future validation studies and in designing better consumer sleep tracking technologies.

II. RELATED WORK

In clinical studies, polysomnography (PSG) and actigraphy are the most widely used methods for objectively measuring sleep. PSG has been most useful for the diagnosis and treatment of sleep disorders such as obstructive sleep apnea (OSA), narcolepsy, and rapid-eye-movement sleep behavior disorder [6]. Actigraphy, on the other hand, is mostly used for the diagnosis of circadian rhythm disorders [7].

In addition to the clinical sleep monitoring devices, many consumer sleep tracking devices have emerged over recent years. These technologies can be generally divided into two categories, i.e. sleep tracking devices based on movement (e.g. Fitbit, Jawbone, mobile applications) and sleep tracking devices based on brain activity signals (e.g. Zeo headband, Neuroon eye mask, Sleep Shepherd headband) [14-15]. This field is expanding rapidly and new sleep tracking devices are introduced to the consumer market every year. Validation studies on previous versions of Fitbit and Jawbone found that
This study was sponsored by JSPS KAKENHI Grant-in-Aid for Research
Activity Start-up (Grant Number 16H07469). The opinions expressed in
this paper do not represent the views of the second author’s company.
978-4-907626-31-0/17/$31.00 ©2017 IEEE
Authorized licensed use limited to: Mapua University. Downloaded on May 01,2021 at 12:32:40 UTC from IEEE Xplore. Restrictions apply.
2017 Tenth International Conference on Mobile Computing and Ubiquitous Network (ICMU)
Fitbit devices overestimated TST and SE and underestimated WASO compared to PSG and clinical actigraphy in “normal mode”, while in “sensitive mode” Fitbit devices substantially underestimated TST and SE and overestimated WASO [8-9]. In this study, we aim to investigate the impact of interpersonal differences on the accuracy of recently released sleep trackers in the consumer market, i.e. Fitbit Charge 2 wristband and Neuroon wearable EEG eye mask, with the purpose of establishing a baseline for the state of the art in this field.

TABLE I. DEFINITION OF MEASURED SLEEP METRICS

participants are shown in Table II (according to the measurements of SLEEP SCOPE).

TABLE II. BASIC STATISTICS OF SLEEP QUALITY

               TST (min)  WASO (min)  SOL (min)  NAWK   SE (%)
Participant 1  408±30     34±11       21±14      34±8   88±4
Participant 2  412±40     34±33       8±7        27±8   90±7

TABLE III. Z-VALUES OF WSR TEST ON SLEEP METRICS
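As a rough illustration of how a z-value of the kind reported in Table III could be obtained, the sketch below applies the Wilcoxon signed-rank test [12] (presumably what "WSR" denotes) to paired nightly measurements, using the normal approximation. The function name, the sample values, and the choices to drop zero differences and to omit the tie correction of the variance are assumptions made for this example, not details taken from the study.

```python
import math


def wilcoxon_signed_rank_z(consumer, clinical):
    """Return (W_plus, z) for paired samples via the normal approximation.

    Assumptions of this sketch: zero differences are dropped, tied absolute
    differences share their average rank, and no tie correction is applied
    to the variance.
    """
    diffs = [c - g for c, g in zip(consumer, clinical) if c != g]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences (1-based), averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Sum of the ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Mean and standard deviation of W_plus under the null hypothesis.
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, (w_plus - mean_w) / sd_w


# Hypothetical nightly TST values in minutes (not data from the study).
fitbit_tst = [420, 435, 410, 450, 400, 445]
scope_tst = [400, 410, 405, 420, 402, 415]
w_plus, z = wilcoxon_signed_rank_z(fitbit_tst, scope_tst)
print(f"W+ = {w_plus}, z = {z:.2f}")
```

A large absolute z suggests a systematic difference between the consumer device and the clinical device for that participant. With only a handful of nights, an exact test would be more appropriate than the normal approximation.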
D. Relationship between Consumer and Clinical Devices

The Pearson correlation coefficients were computed to assess the linear relationship between the measurements by the consumer devices and those by the clinical device. The results on TST, WASO, SOL, NAWK and SE are summarized in Table IV. For participant 1, no significant correlation was found between Neuroon/Fitbit and SLEEP SCOPE. For participant 2, strong correlation was found between Fitbit and SLEEP SCOPE in the dimension of TST (r = 0.72, p < 0.001), and moderate correlation was found in terms of WASO (r = 0.43, p = 0.049). Moderate correlation was also found between Neuroon and SLEEP SCOPE in multiple sleep dimensions for participant 2, including WASO (r = 0.46, p = 0.029), SE (r = 0.42, p = 0.054), and Wake Ratio (r = 0.45, p = 0.036).

E. Discussions

The results presented above demonstrated that the validity of Fitbit and Neuroon was not consistent between the two participants.
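The per-participant correlation analysis of Section D can be sketched as follows. This is a minimal illustration, assuming one value per night and per device. The function names and the sample values are hypothetical, and the p-value step is left out because it needs the t distribution with n - 2 degrees of freedom, which the Python standard library does not provide.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length with n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t statistic is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical nightly TST values in minutes for one participant.
consumer_tst = [402, 431, 388, 445, 417, 396, 428]
clinical_tst = [395, 420, 380, 452, 410, 385, 433]

r = pearson_r(consumer_tst, clinical_tst)
t = t_statistic(r, len(consumer_tst))
print(f"r = {r:.2f}, t = {t:.2f}")
```

Note that a high correlation only shows that the two devices move together from night to night. It does not show that they agree in absolute terms, which is why correlation and agreement are assessed separately.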
(a) Bland-Altman plots between Neuroon and SLEEP SCOPE on aggregated sleep metrics for participant 1.
(b) Bland-Altman plots between Neuroon and SLEEP SCOPE on aggregated sleep metrics for participant 2.
Fig. 1. Bland-Altman plots of aggregated sleep metrics (TST, WASO, NAWK, SOL, SE) measured by Neuroon and SLEEP SCOPE for each participant. (The x-axis is the mean of Neuroon and SLEEP SCOPE, and the y-axis is the difference between the two devices. Dotted line represents two standard deviations from the mean.)
(a) Bland-Altman plots between Fitbit and SLEEP SCOPE on aggregated sleep metrics for participant 1.
(b) Bland-Altman plots between Fitbit and SLEEP SCOPE on aggregated sleep metrics for participant 2.
Fig. 2. Bland-Altman plots of aggregated sleep metrics (TST, WASO, NAWK, SOL, SE) measured by Fitbit and SLEEP SCOPE for each participant. (The x-axis is the mean of Fitbit and SLEEP SCOPE, and the y-axis is the difference between the two devices. Dotted line represents two standard deviations from the mean.)
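The quantities behind the Bland-Altman plots [11] in Figs. 1 and 2 can be computed as in the sketch below. Following the captions, the x-axis values are the per-night means of the two devices, the y-axis values are their differences, and the limits are placed two standard deviations from the mean difference. The function name, the sign convention (consumer device minus clinical device), and the sample values are assumptions made for this example.

```python
import statistics


def bland_altman(device, reference, k=2.0):
    """Compute Bland-Altman quantities for paired measurements.

    Returns the per-pair means (x-axis), the per-pair differences
    (y-axis, device minus reference), the bias (mean difference), and
    the lower and upper limits at bias -/+ k sample standard deviations.
    """
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need two sequences of equal length with n >= 2")
    means = [(d + r) / 2 for d, r in zip(device, reference)]
    diffs = [d - r for d, r in zip(device, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - k * sd,
        "upper": bias + k * sd,
    }


# Hypothetical nightly SOL values in minutes (not data from the study).
neuroon_sol = [18, 25, 12, 30, 22, 15]
scope_sol = [20, 24, 15, 28, 25, 14]

result = bland_altman(neuroon_sol, scope_sol)
print(f"bias = {result['bias']:.1f} min, "
      f"limits = [{result['lower']:.1f}, {result['upper']:.1f}] min")
```

A bias near zero with narrow limits indicates good agreement. Many analyses use k = 1.96 instead of 2 for 95% limits of agreement, which is why the multiplier is left as a parameter.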
TABLE IV. CORRELATION ANALYSIS ON AGGREGATED SLEEP METRICS

                                        TST       WASO   SOL    NAWK  SE
Fitbit vs SLEEP SCOPE (Participant 1)   0.25      0.25   0.04   0.08  0.23
Neuroon vs SLEEP SCOPE (Participant 1)  0.42      0.09   -0.53  0.18  -0.16
Fitbit vs SLEEP SCOPE (Participant 2)   0.72****  0.43*  0.22   0.02  0.23
Neuroon vs SLEEP SCOPE (Participant 2)  0.36      0.46*  -0.28  0.27  0.42*

a. Significant correlations are marked with asterisks (*p ≤ 0.05, **p ≤ 0.01, ***p ≤ 0.001, ****p < 0.000).

Fitbit measurements completely deviated from those by SLEEP SCOPE for participant 1, whereas for participant 2 Fitbit achieved good agreement and significant correlation with SLEEP SCOPE in the dimensions of TST and WASO. An investigation into the scatterplot between Fitbit and SLEEP SCOPE on SOL showed that Fitbit underestimated SOL compared to SLEEP SCOPE. On the other hand, Neuroon showed good agreement with SLEEP SCOPE in the dimension of SOL for participant 1, but agreed poorly for participant 2. Noting that participant 1 had longer SOL than participant 2, it is likely that participant 1 was simply lying still while falling asleep. Since Fitbit measures sleep based on movement, it may count this segment as time asleep rather than time to fall asleep. In contrast, because Neuroon measures sleep based on brainwaves, it was able to accurately classify this segment as time to fall asleep even if the participant was not moving. As for participant 2, the disparity between Neuroon and SLEEP SCOPE in the dimension of SOL may be due to the fact that he often fell asleep while reading in bed without putting on the Neuroon eye mask properly. After a little while, he would wake up to put on the eye mask. This may explain the overestimation of SOL for him.

In addition to the interpersonal differences, we also found a common trend in the validity of Neuroon for both participants. Good agreement between Neuroon and SLEEP SCOPE occurred when the sleep quality of the user matched the clinical definition of good sleep, which suggests that wearable EEG sleep trackers may be more accurate when the user has a good night of sleep.

Based on this analysis, we may conclude that the validity of consumer sleep tracking devices is affected by users’ sleep quality per se as well as users’ sleep habits. Fitbit may produce less accurate sleep measurements for people who take a long time to fall asleep, while Neuroon may not work well for people who read in bed. In the next step, we will recruit a larger cohort to investigate the characteristics of people who tend to generate inaccurate Fitbit and Neuroon measurements as well as the role of demographic variables such as gender, BMI and age. Based on such information, we may tune the sleep analysis model for each type of user.

V. CONCLUSION

In this study we validated two of the newest consumer sleep tracking devices, i.e., Fitbit Charge 2 and Neuroon wearable EEG eye mask, in comparison to a clinical device, SLEEP SCOPE, on two participants separately. The results demonstrated significant interpersonal differences. On the one hand, the Fitbit measurements completely deviated from the clinical measurements for one participant, but agreed very well for the other participant, especially in the dimension of TST. On the other hand, the Neuroon measurements agreed well with but were uncorrelated to the clinical measurements for one participant (on SOL), and correlated significantly but agreed poorly for the other participant (on WASO, SE, Wake Ratio). Our study revealed that the validity of consumer sleep tracking devices varies from user to user. Such interpersonal differences should be considered in future validation studies and in designing better consumer sleep-tracking technologies.

REFERENCES

[1] Z. Liang, M. A. Chapa-Martell, and B. Ploderer, “Inter-individual differences in sleep quality: insights from mining wearable sleep-tracking data,” IPSJ Technical Report, 2017-MBL-82(54), pp. 1-6, 2017.
[2] H. P. A. van Dongen, K. M. Vitellaro, and D. F. Dinges, “Individual differences in adult human sleep and wakefulness: Leitmotif for a research agenda,” Sleep, vol. 28, no. 4, pp. 479-496, 2005.
[3] J. Mantua, N. Gravel, and R. Spencer, “Reliability of sleep measures from four personal health monitoring devices compared to research-based actigraphy and polysomnography,” Sensors, vol. 16, 646, 2016.
[4] E. O. Lillie, B. Patay, J. Diamant, et al., “The n-of-1 clinical trial: the ultimate strategy for individualizing medicine?” Personalized Medicine, vol. 8, no. 2, pp. 161-173, 2011.
[5] J. C. Cappelleri and N. Ting, “A modified large-sample approach to approximate interval estimation for a particular intraclass correlation coefficient,” Statistics in Medicine, vol. 22, no. 11, pp. 1861-1877, 2003.
[6] C. A. Kushida, M. R. Littner, T. M. Morgenthaler, et al., “Practice parameters for the indications for polysomnography and related procedures: an update for 2005,” Sleep, vol. 28, no. 4, pp. 499-519, 2005.
[7] M. Littner, C. A. Kushida, W. M. Anderson, et al., “Practice parameters for the role of actigraphy in the study of sleep and circadian rhythms: an update for 2002,” Sleep, vol. 26, no. 3, pp. 337-341, 2003.
[8] B. P. Kolla, S. Mansukhani, and M. P. Mansukhani, “Consumer sleep tracking devices: a review of mechanisms, validity and utility,” Expert Review of Medical Devices, vol. 13, no. 5, pp. 497-506, 2016.
[9] M. de Zambotti, S. Claudatos, S. Inkelis, et al., “Evaluation of a consumer fitness-tracking device to assess sleep in adults,” Chronobiol Int., vol. 32, no. 7, pp. 1024-1028, 2015.
[10] D. Shrivastava, S. Jung, M. Saadat, et al., “How to interpret the results of a sleep study,” Journal of Community Hospital Internal Medicine Perspectives, vol. 4, 24983, 2014.
[11] J. M. Bland and D. G. Altman, “Statistical methods for assessing agreement between two methods of clinical measurement,” Lancet, vol. 1, pp. 307-310, 1986.
[12] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80-83, 1945.
[13] M. M. Ohayon, M. A. Carskadon, C. Guilleminault, and M. V. Vitiello, “Meta-analysis of quantitative sleep parameters from childhood to old age in healthy individuals: developing normative sleep values across the human lifespan,” Sleep, vol. 27, no. 7, 2004.
[14] Z. Liang and B. Ploderer, “Sleep tracking in the real world: a qualitative study into barriers for improving sleep,” in Proceedings of the 28th Australian Conference on Computer-Human Interaction, pp. 537-541, 2016.
[15] W. Liu, B. Ploderer, and T. Hoang, “In bed with technology: challenges and opportunities for sleep tracking,” in Proceedings of the Australian Computer-Human Interaction Conference, pp. 142-151, 2015.