Learning Sleep Stages From Radio Signals: A Conditional Adversarial Architecture

Mingmin Zhao 1, Shichao Yue 1, Dina Katabi 1, Tommi S. Jaakkola 1, Matt T. Bianchi 2
[Figure 1. Model and Extended Game: (a) Model (Ideal Game); (b) Extended Game. The dotted arrow indicates that information does not propagate back on this link.]

Our work also contributes to learning invariant representations in deep adversarial networks. Adversarial networks were introduced to effectively train complex generative models of images (Goodfellow et al., 2014; Radford et al., 2015; Chen et al., 2016), where the adversary (discriminator) was introduced so as to match generated samples with observed ones. The broader approach has since been adopted as the training paradigm across a number of other tasks as well, from learning representations for semi-supervised learning (Makhzani et al., 2015) and modeling dynamic evolution (Vondrick et al., 2016; Purushotham et al., 2017) to inverse maps for inference (Donahue et al., 2017; Dumoulin et al., 2017), and many others. Substantial work has also gone into improving the stability of adversarial training (Metz et al., 2016; Arjovsky et al., 2017; Arjovsky & Bottou, 2017).

On a technical level, our work is most related to adversarial architectures for domain adaptation (Ganin & Lempitsky, 2015; Ganin et al., 2016; Tzeng et al., 2015; 2016). Yet, there are key differences between our approach and the above references, beyond the main application of sleep staging that we introduce. First, our goal is to remove conditional dependencies rather than making the representation domain independent. Thus, unlike the above references, which do not involve conditioning in the adversary, our adversary takes the representation but is also conditioned on the predicted label distribution. Second, our game-theoretic setup controls the information flow differently, ensuring that only the representation encoder is modified based on the adversary's performance. Specifically, the predicted distribution over stages is strategically decoupled from the adversary (conditioning is uni-directional). Third, we show that this new conditioning guarantees an equilibrium solution that fully preserves the ability to predict sleep stages while removing, conditionally, extraneous information specific to the individuals or measurement conditions. Guarantees of this kind are particularly important for healthcare data, where the measurements are noisy with a variety of dependencies that need to be controlled.

Finally, our work is naturally also related to other non-adversarial literature on multi-source domain adaptation (Crammer et al., 2008; Long et al., 2015), and work on metrics for measuring distance between distributions (Ben-David et al., 2010; Fernando et al., 2013).

3. Model

Let x_t ∈ Ωx be the input sample at time t, and y ∈ {1, 2, ..., n_y} an output label. Let s ∈ {1, 2, ..., n_s} denote an auxiliary label that refers to the source of a specific input sample. We write x = [x_1, x_2, ..., x_t] for the sequence of input samples from the beginning of time until the current time t.

In the context of our application, the above notation translates into the following: each input sample is a 30-second RF spectrogram, and the output label y is a sleep stage that takes one of four values: Awake, Light Sleep, Deep Sleep, or REM. The sequence x refers to the RF spectrograms from the beginning of the night until the current time. Since RF signals carry information about the subject and the measurement environment, we assign each input x an auxiliary label s which identifies the subject-environment pair, hereafter referred to as the source.

Our goal is to learn a latent representation (i.e., an encoder) that can be used to predict label y; yet, we want this representation to generalize well to predict sleep stages for new subjects without having labeled data from them. Simply making the representation invariant to the source domains could hamper the accuracy of the predictive task. Instead, we would like to remove conditional dependencies between the representation and the source domains.

We introduce a multi-domain adversarial model that achieves the above goal. Our model is shown in Fig. 1(a). It has three components: an encoder E, a label predictor F, and a source discriminator D. Our model is set up as a game, where the representation encoder plays a cooperative game with the label predictor to allow it to predict the correct labels using the encoded representation. The encoder also plays a minimax game against the source discriminator to prevent it from decoding the source label from the encoded representation.
A key characteristic of our model is the conditioning of the source discriminator on the label distribution Py(·|x) (see Fig. 1(a)). This conditioning of the adversary allows the learned representation to correlate with the domains, but only via the label distribution; i.e., it removes conditional dependencies between the representation and the sources.

The rest of this section is organized as follows. We first formally define the three players E, F, and D and the representation invariance they are trained to achieve. In Sec. 3.1, we analyze the game and prove that at equilibrium the encoder discards all extraneous information about the source that is not beneficial for label prediction (i.e., predicting y). Training the ideal model in Fig. 1(a) is challenging because it requires access to the label distribution Py(·|x). To derive an efficient training algorithm, we define in Sec. 3.2 an extended game where the source discriminator uses the output of the label predictor as an approximation of the posterior probabilities, as shown in Fig. 1(b). We prove that the equilibria of the original game are also equilibria of the extended one.

Encoder E: An encoder E(·) : Ωx → Ωz is a function that takes a sequence of input samples x and returns a vector summary of x as z = E(x).

Label Predictor F: A label predictor F(·) : Ωz → [0, 1]^{n_y} takes a latent representation E(x) as input and predicts the probability of each label y associated with input x as QF(y|E(x)). The goal of an ideal predictor F is to approximate Py(·|x) with QF(·|E(x)); its loss Lf(F; E) is the corresponding cross-entropy E_{x,y}[−log QF(y|E(x))].

To measure the invariance of an encoder E, we define the loss of the source discriminator D as the cross-entropy between Ps(·|x) and QD(·|E(x), Py(·|x)):

$$\mathcal{L}_d(D; E) = \mathbb{E}_{x,s}\big[-\log Q_D(s \mid E(x), P_y(\cdot \mid x))\big] \qquad (2)$$

During training, encoder E and discriminator D play a minimax game: while D is trained to minimize the source prediction loss, encoder E is trained to maximize it in order to achieve the above invariance.

3.1. Ideal Game

During training, encoder E plays a cooperative game with predictor F, and a minimax game with discriminator D. We define a value function of E, F and D with λ > 0:

$$V(E, F, D) = \mathcal{L}_f(F; E) - \lambda \cdot \mathcal{L}_d(D; E) \qquad (3)$$

The training procedure can be viewed as a three-player minimax game of E, F and D:

$$\min_E \min_F \max_D V(E, F, D) = \min_{E,F} \max_D V(E, F, D) \qquad (4)$$

Proposition 2 (Optimal predictor). Given encoder E,

$$\mathcal{L}_f(E) \triangleq \min_F \mathcal{L}_f(F; E) \geq H(y \mid E(x)), \qquad (5)$$

where H(·) is entropy. The optimal predictor F∗ that achieves equality is:

$$Q_{F^*}(y \mid E(x)) = p(y \mid E(x)) \qquad (6)$$

Proposition 3 (Optimal discriminator). Given encoder E,

$$\mathcal{L}_d(E) \triangleq \min_D \mathcal{L}_d(D; E) \geq H(s \mid E(x), P_y(\cdot \mid x)). \qquad (7)$$

Proof. The loss of D is a cross-entropy, which is minimized by the true posterior,

$$Q_{D^*}(s \mid E(x), P_y(\cdot \mid x)) = P(s \mid E(x), P_y(\cdot \mid x)), \qquad (8)$$

at which point the loss equals the conditional entropy H(s|E(x), Py(·|x)).
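To make the game concrete, the following is a minimal PyTorch sketch of the three players and the losses above. The module names, layer choices, and sizes are illustrative assumptions rather than the paper's actual architecture; as in Sec. 3.2 below, the predictor's output QF(·|E(x)) stands in for Py(·|x) when conditioning the adversary.

```python
# Minimal sketch of the three players and the losses in Eqs. (2) and (3).
# Architectures and sizes are illustrative assumptions, not the paper's networks.
import torch
import torch.nn as nn
import torch.nn.functional as TF  # aliased so it does not clash with predictor F

class Encoder(nn.Module):
    """E: sequence of input samples x -> vector summary z = E(x)."""
    def __init__(self, d_in, d_z):
        super().__init__()
        self.rnn = nn.LSTM(d_in, d_z, batch_first=True)

    def forward(self, x):            # x: (batch, time, d_in)
        _, (h, _) = self.rnn(x)
        return h[-1]                 # z: (batch, d_z)

class Predictor(nn.Module):
    """F: z -> logits over the n_y stages; softmax gives Q_F(y|E(x))."""
    def __init__(self, d_z, n_y):
        super().__init__()
        self.fc = nn.Linear(d_z, n_y)

    def forward(self, z):
        return self.fc(z)

class Discriminator(nn.Module):
    """D: (z, label distribution) -> logits over the n_s sources."""
    def __init__(self, d_z, n_y, n_s):
        super().__init__()
        self.fc = nn.Linear(d_z + n_y, n_s)

    def forward(self, z, q_y):       # conditioned on the label distribution
        return self.fc(torch.cat([z, q_y], dim=-1))

def value_function(E, F, D, x, y, s, lam):
    """V(E,F,D) = L_f(F;E) - lam * L_d(D;E), per Eqs. (2)-(3)."""
    z = E(x)
    logits_y = F(z)
    L_f = TF.cross_entropy(logits_y, y)           # cross-entropy loss of F
    q_y = TF.softmax(logits_y, dim=-1).detach()   # uni-directional conditioning
    L_d = TF.cross_entropy(D(z, q_y), s)          # Eq. (2), with Q_F as proxy
    return L_f - lam * L_d, L_f, L_d
```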
Corollary 3.1. H(s) is an upper bound of the loss of the optimal discriminator D∗ for any encoder E.

Next, we state the virtual training criterion of the encoder.

Proposition 4. If predictor F and discriminator D have enough capacity and are trained to achieve their optimal losses, the minimax game (4) can be rewritten as the following training procedure of encoder E:

$$\min_E \big[ H(y \mid E(x)) - \lambda \cdot H(s \mid E(x), P_y(\cdot \mid x)) \big] \qquad (9)$$

Proof. Based on the losses of the optimal predictor F∗ and the optimal discriminator D∗ in Proposition 2 and Proposition 3, the minimax game (4) can be rewritten as (9). Thus, encoder E is trained to minimize a virtual training criterion C(E) = H(y|E(x)) − λ · H(s|E(x), Py(·|x)).
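Spelling the substitution out: by (5) and (7), the inner optimizations collapse to conditional entropies (maximizing −λ·Ld over D is the same as minimizing Ld over D), so

$$\min_{E,F} \max_D V(E, F, D) = \min_E \Big[ \min_F \mathcal{L}_f(F; E) - \lambda \min_D \mathcal{L}_d(D; E) \Big] = \min_E \big[ H(y \mid E(x)) - \lambda \cdot H(s \mid E(x), P_y(\cdot \mid x)) \big].$$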
Next, we describe the optimal encoder.

Theorem 5 (Optimal encoder). If encoder E, predictor F and discriminator D have enough capacity and are trained to reach optimum, any globally optimal encoder E∗ has the following properties:

$$H(y \mid E^*(x)) = H(y \mid x) \qquad (10a)$$

$$H(s \mid E^*(x), P_y(\cdot \mid x)) = H(s \mid P_y(\cdot \mid x)) \qquad (10b)$$

Proof. Since E(x) is a function of x:

$$\mathcal{L}_f(E) = H(y \mid E(x)) \geq H(y \mid x) \qquad (11a)$$

$$\mathcal{L}_d(E) = H(s \mid E(x), P_y(\cdot \mid x)) \leq H(s \mid P_y(\cdot \mid x)) \qquad (11b)$$

Hence, C(E) = H(y|E(x)) − λ · H(s|E(x), Py(·|x)) ≥ H(y|x) − λ · H(s|Py(·|x)). The equality holds if and only if both (10a) and (10b) are satisfied. Therefore, we only need to prove that the optimal value of C(E) equals H(y|x) − λ · H(s|Py(·|x)) in order to prove that any globally optimal encoder E∗ satisfies both (10a) and (10b). We show that C(E) can achieve H(y|x) − λ · H(s|Py(·|x)) by considering the following encoder E0: E0(x) = Py(·|x). It can be verified that H(y|E0(x)) = H(y|x) and H(s|E0(x), Py(·|x)) = H(s|Py(·|x)).
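A short sketch of why these two identities hold (the text states them without detailed proof): conditioned on the event E0(x) = Py(·|x) = q, the label y is distributed exactly according to q, so

$$H(y \mid E_0(x)) = \mathbb{E}_{x}\big[H\big(P_y(\cdot \mid x)\big)\big] = H(y \mid x);$$

and since the pair (E0(x), Py(·|x)) duplicates the same quantity, conditioning on it is the same as conditioning on Py(·|x) alone:

$$H\big(s \mid E_0(x), P_y(\cdot \mid x)\big) = H\big(s \mid P_y(\cdot \mid x)\big).$$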
Adversarial training of D can be viewed as a regularizer, which leads to a common representation space for multiple source domains. From Theorem 5, the optimal encoder E∗ under adversarial training satisfies H(y|E∗(x)) = H(y|x), which is the maximal discriminative capability that any encoder E can achieve. Thus, we have the following corollary.

Corollary 5.1. Adversarial training of the discriminator does not reduce the discriminative capability of the representation.

Remark 5.1. During the proof of Theorem 5, we construct an encoder E0(x) = Py(·|x) that achieves the optimal value of C(E). However, we argue that training will not converge to this trivial encoder in practice. This is because Py(·|x) is a mapping from the full signal history to the distribution over stages at the current step, and is therefore itself highly complex. Since we use the RNN state as the encoding E(x), and it feeds into the LSTM gates, the distribution over stages at the previous step does not represent a sufficient summary of the history up to the current one. Therefore, E(x) must be able to anticipate the temporal evolution of the signal and contain a more effective summary than Py(·|x) would be.

Corollary 5.2. If encoder E and predictor F have enough capacity and are trained to reach optimum, the output of F is equal to Py(·|x).

Proof. When predictor F is optimal (Proposition 2), QF(y|E(x)) = p(y|E(x)). When E is optimal (Theorem 5), H(y|E(x)) = H(y|x), that is, p(y|E(x)) = p(y|x). Therefore, QF(y|E(x)) = p(y|x).

3.2. Extended Game

In practice, estimating the posterior label distribution Py(·|x) from labeled data is a non-trivial task. Fortunately, however, our predictor F and encoder E play a cooperative game to approximate this posterior label distribution Py(·|x) with QF(·|E(x)). Therefore, we use QF(·|E(x)), the output of predictor F, as a proxy for Py(·|x) and feed it as input to discriminator D (Fig. 1(b)).

An extended three-player game arises: while encoder E still plays a cooperative game with predictor F and a minimax game with discriminator D, discriminator D depends strategically on predictor F but not vice versa. The dotted line in Fig. 1(b) illustrates this dependency.

The relationship between the ideal minimax game (Sec. 3.1) and the extended one is stated below.

Proposition 6. If encoder E, predictor F and discriminator D have enough capacity, the solution comprising the optimal encoder E∗, predictor F∗, and discriminator D∗ of the ideal minimax game is also an equilibrium solution of the extended game.

Proof. By Corollary 5.2, when encoder E and predictor F are optimal, QF(·|E(x)) is equal to Py(·|x). Thus, the extended game becomes equivalent to the ideal game, and (E∗, F∗, D∗) is an equilibrium solution of both games.
Algorithm 1 Encoder, predictor and discriminator training

    Input: Labeled data {(x_i, y_i, s_i)}_{i=1}^M, learning rate η.
    Compute stop criterion for inner loop: δ_d ← H(s)
    for number of training iterations do
        Sample a mini-batch of training data {(x_i, y_i, s_i)}_{i=1}^m
        L_f^i ← −log Q_F(y_i | E(x_i))
        w_i ← Q_F(· | E(x_i))        ▷ stop gradient along this link
        L_d^i ← −log Q_D(s_i | E(x_i), w_i)
        V^i = L_f^i − λ · L_d^i
        Update encoder E:
            θ_e ← θ_e − η_e ∇_{θ_e} (1/m) Σ_{i=1}^m V^i
        Update predictor F:
            θ_f ← θ_f − η_f ∇_{θ_f} (1/m) Σ_{i=1}^m V^i
        repeat
            Update discriminator D:
                θ_d ← θ_d + η_d ∇_{θ_d} (1/m) Σ_{i=1}^m V^i
        until (1/m) Σ_{i=1}^m L_d^i ≤ δ_d
    end for

…adversarial feedback even when the target labels are uncertain. For example, in the sleep staging problem, each 30-second window is given one label. Yet, many such windows include transitions between sleep stages, e.g., a transition from light to deep sleep. These transitions are gradual, and hence the transition windows can be intrinsically different from both light and deep sleep. It would be desirable to have the learned representation capture the concept of transition and make it invariant to the source (see the results in Sec. 4.5). 3) It allows the conditioning to remain available for additional guiding of representations based on unlabeled data. The model can incorporate unlabeled data for either semi-supervised learning or transductive learning within a unified framework.
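Below is a runnable sketch of Algorithm 1 in the same PyTorch style as the earlier sketch. The optimizer choice, data loading, the uniform-source assumption behind δ_d = H(s) = log n_s, and the cap on inner discriminator steps are all our assumptions; only the update structure follows the algorithm: E and F take a descent step on V, then D is trained toward optimality (ascending V in θ_d, which is equivalent to descending L_d since V = L_f − λ·L_d) until its loss falls below δ_d.

```python
# Sketch of Algorithm 1 using the Encoder/Predictor/Discriminator modules above.
# Optimizers, batching, and the inner-step cap are assumptions, not the paper's.
import math
import torch
import torch.nn.functional as TF

def train(E, F, D, loader, n_s, lam=1.0, eta=1e-3, n_iters=1000, max_inner=50):
    opt_e = torch.optim.SGD(E.parameters(), lr=eta)
    opt_f = torch.optim.SGD(F.parameters(), lr=eta)
    opt_d = torch.optim.SGD(D.parameters(), lr=eta)
    delta_d = math.log(n_s)  # H(s), assuming sources are uniformly distributed
    batches = iter(loader)   # assumes loader yields at least n_iters batches
    for _ in range(n_iters):
        x, y, s = next(batches)
        # Update encoder E and predictor F by descending on V = L_f - lam * L_d.
        z = E(x)
        logits_y = F(z)
        L_f = TF.cross_entropy(logits_y, y)
        w = TF.softmax(logits_y, dim=-1).detach()  # stop gradient along this link
        L_d = TF.cross_entropy(D(z, w), s)
        V = L_f - lam * L_d
        opt_e.zero_grad(); opt_f.zero_grad()
        V.backward()
        opt_e.step(); opt_f.step()
        # Inner loop: train D to (near-)optimality, i.e. minimize L_d,
        # stopping once the loss drops below delta_d = H(s).
        for _ in range(max_inner):
            z = E(x).detach()
            w = TF.softmax(F(z), dim=-1).detach()
            L_d = TF.cross_entropy(D(z, w), s)
            opt_d.zero_grad()
            L_d.backward()
            opt_d.step()
            if L_d.item() <= delta_d:
                break
```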
4. Experiments

In this section, we empirically evaluate our model.

Table 2. Sleep Stage Classification Accuracy and Kappa

    Approach                     Accuracy    κ
    Tataraidze et al. (2016b)    0.635       0.49
    Zaffaroni et al. (2014)      0.641       0.45
    Ours                         0.798       0.70
[Figure 2. (a) Confusion matrix over Awake, REM, Light, and Deep (columns: predicted stage; recovered rows: REM .03/.82/.15/.00, Light .04/.07/.83/.06, Deep .01/.00/.24/.75); (b) accuracy on each of the 25 subjects. Fig. 2(a) shows that our model can distinguish deep and light sleep with high accuracy, and Fig. 2(b) illustrates that our model works well for different subjects and environments.]
[Figure 3. Three examples of full-night predictions corresponding to the average (80.4%), best (91.2%), and worst classification accuracy. Each panel shows ground truth and prediction over Wake, REM, Light, and Deep against time since lights off (h).]

[Figure 4. Visualizations of the CNN and RNN responses (scatter plots of the two responses, colored by stage: Awake, REM, Light, Deep). The CNN can separate Wake and REM from the other stages, yet Deep and Light Sleep can only be distinguished by the RNN.]

4.3. Classification Results

We evaluate the model on every subject while training on the data collected from the other subjects (i.e., the model is never trained on data from the test subject). The training data is divided into a training set and a validation set (75%/25%).

Cohen's Kappa coefficient κ (Cohen, 1960) takes into account the possibility of the agreement occurring by chance and is usually a more robust metric; κ > 0.4, κ > 0.6, and κ > 0.8 are considered to be moderate, substantial, and almost perfect agreement, respectively (Landis & Koch, 1977).
References

Liu, Xuefeng, Cao, Jiannong, Tang, Shaojie, and Wen, Jiaqi. Wi-Sleep: Contactless sleep monitoring via WiFi signals. RTSS, 2014.

Long, Mingsheng, Cao, Yue, Wang, Jianmin, and Jordan, Michael I. Learning transferable features with deep adaptation networks. ICML, 2015.

Long, Xi, Yang, Jie, Weysen, Tim, Haakma, Reinder, Foussier, Jérôme, Fonseca, Pedro, and Aarts, Ronald M. Measuring dissimilarity between respiratory effort signals based on uniform scaling for sleep staging. Physiological Measurement, 2014.

Lucey, Brendan P, Mcleland, Jennifer S, Toedebusch, Cristina D, Boyd, Jill, Morris, John C, Landsness, Eric C, Yamada, Kelvin, and Holtzman, David M. Comparison of a single-channel EEG sleep study to polysomnography. Journal of Sleep Research, 2016.

Maaten, Laurens van der and Hinton, Geoffrey. Visualizing data using t-SNE. JMLR, 2008.

Makhzani, Alireza, Shlens, Jonathon, Jaitly, Navdeep, Goodfellow, Ian, and Frey, Brendan. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.

Metz, Luke, Poole, Ben, Pfau, David, and Sohl-Dickstein, Jascha. Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163, 2016.

National Institutes of Health. Sleep disorders. http://www.ninds.nih.gov/disorders/brain_basics/understanding_sleep.htm#sleep_disorders.

Pigou, Lionel, Van Den Oord, Aäron, Dieleman, Sander, Van Herreweghe, Mieke, and Dambre, Joni. Beyond temporal pooling: Recurrence and temporal convolutions for gesture recognition in video. IJCV, 2015.

Pollak, Charles P, Tryon, Warren W, Nagaraja, Haikady, and Dzwonczyk, Roger. How accurately does wrist actigraphy identify the states of sleep and wakefulness? Sleep, 2001.

Popovic, Djordje, Khoo, Michael, and Westbrook, Philip. Automatic scoring of sleep stages and cortical arousals using two electrodes on the forehead: validation in healthy adults. Journal of Sleep Research, 2014.

Purushotham, Sanjay, Carvalho, Wilka, Nilanon, Tanachat, and Liu, Yan. Variational recurrent adversarial deep domain adaptation. ICLR, 2017.

Radford, Alec, Metz, Luke, and Chintala, Soumith. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

Rahman, Tauhidur, Adams, Alexander T, Ravichandran, Ruth Vinisha, Zhang, Mi, Patel, Shwetak N, Kientz, Julie A, and Choudhury, Tanzeem. DoppleSleep: A contactless unobtrusive sleep sensing system using short-range Doppler radar. ACM UbiComp, 2015.

Rechtschaffen, Allan and Kales, Anthony. A manual of standardized terminology, techniques and scoring system for sleep stages of human subjects. US Government Printing Office, US Public Health Service, 1968.

Shah, Purav C, Yudelevich, Eric, Genese, Frank, Martillo, Miguel, Ventura, Iazsmin B, Fuhrmann, Katherine, Mortel, Marie, Levendowski, Daniel, Gibson, Charlisa D, Ochieng, Pius, et al. Can disrupted sleep affect mortality in the mechanically ventilated critically ill? A state of unrest: Sleep/SDB in the ICU and hospital, 2016.

Shambroom, John R, Fábregas, Stephan E, and Johnstone, Jack. Validation of an automated wireless system to monitor sleep in healthy adults. Journal of Sleep Research, 2012.

Sutskever, Ilya, Vinyals, Oriol, and Le, Quoc V. Sequence to sequence learning with neural networks. NIPS, 2014.

Szegedy, Christian, Liu, Wei, Jia, Yangqing, Sermanet, Pierre, Reed, Scott, Anguelov, Dragomir, Erhan, Dumitru, Vanhoucke, Vincent, and Rabinovich, Andrew. Going deeper with convolutions. CVPR, 2015.

Tataraidze, Alexander, Korostovtseva, Lyudmila, Anishchenko, Lesya, Bochkarev, Mikhail, and Sviryaev, Yurii. Sleep architecture measurement based on cardiorespiratory parameters. IEEE EMBC, 2016a.

Tataraidze, Alexander, Korostovtseva, Lyudmila, Anishchenko, Lesya, Bochkarev, Mikhail, Sviryaev, Yurii, and Ivashov, Sergey. Bioradiolocation-based sleep stage classification. IEEE EMBC, 2016b.

Tzeng, Eric, Hoffman, Judy, Darrell, Trevor, and Saenko, Kate. Simultaneous deep transfer across domains and tasks. ICCV, 2015.

Tzeng, Eric, Hoffman, Judy, Saenko, Kate, and Darrell, Trevor. Adversarial discriminative domain adaptation. NIPS Workshop on Adversarial Training, 2016.

Vondrick, Carl, Pirsiavash, Hamed, and Torralba, Antonio. Generating videos with scene dynamics. NIPS, 2016.

Zaffaroni, A, Gahan, L, Collins, L, O'Hare, E, Heneghan, C, Garcia, C, Fietze, I, and Penzel, T. Automated sleep staging classification using a non-contact biomotion sensor. Journal of Sleep Research, 2014.

Zhao, Mingmin, Adib, Fadel, and Katabi, Dina. Emotion recognition using wireless signals. ACM MobiCom, 2016.