Learning Sleep Stages From Radio Signals: A Conditional Adversarial Architecture

Mingmin Zhao 1, Shichao Yue 1, Dina Katabi 1, Tommi S. Jaakkola 1, Matt T. Bianchi 2
[Figure 1. Model and Extended Game: (a) Model (Ideal Game); (b) Extended Game. The dotted arrow indicates that information does not propagate back on this link.]

Our work also contributes to learning invariant representations in deep adversarial networks. Adversarial networks were introduced to effectively train complex generative models of images (Goodfellow et al., 2014; Radford et al., 2015; Chen et al., 2016), where the adversary (discriminator) was introduced so as to match generated samples with observed ones. The broader approach has since been adopted as the training paradigm across a number of other tasks as well, from learning representations for semi-supervised learning (Makhzani et al., 2015) and modeling dynamic evolution (Vondrick et al., 2016; Purushotham et al., 2017) to inverse maps for inference (Donahue et al., 2017; Dumoulin et al., 2017), and many others. Substantial work has also gone into improving the stability of adversarial training (Metz et al., 2016; Arjovsky et al., 2017; Arjovsky & Bottou, 2017).

On a technical level, our work is most related to adversarial architectures for domain adaptation (Ganin & Lempitsky, 2015; Ganin et al., 2016; Tzeng et al., 2015; 2016). Yet, there are key differences between our approach and the above references, beyond the main application of sleep staging that we introduce. First, our goal is to remove conditional dependencies rather than making the representation domain independent. Thus, unlike the above references, which do not involve conditioning in the adversary, our adversary takes the representation but is also conditioned on the predicted label distribution. Second, our game-theoretic setup controls the information flow differently, ensuring that only the representation encoder is modified based on the adversary's performance. Specifically, the predicted distribution over stages is strategically decoupled from the adversary (conditioning is uni-directional). Third, we show that this new conditioning guarantees an equilibrium solution that fully preserves the ability to predict sleep stages while removing, conditionally, extraneous information specific to the individuals or measurement conditions. Guarantees of this kind are particularly important for healthcare data, where the measurements are noisy with a variety of dependencies that need to be controlled.

Finally, our work is naturally also related to other non-adversarial literature on multi-source domain adaptation (Crammer et al., 2008; Long et al., 2015), and work on metrics for measuring distance between distributions (Ben-David et al., 2010; Fernando et al., 2013).

3. Model

Let x_t ∈ Ωx be the input sample at time t, and y ∈ {1, 2, ..., n_y} an output label. Let s ∈ {1, 2, ..., n_s} denote an auxiliary label that refers to the source of a specific input sample. We write x = [x_1, x_2, ..., x_t] for the sequence of input samples from the beginning of time until the current time t.

In the context of our application, the above notation translates into the following: each input sample is a 30-second RF spectrogram, and the output label y is a sleep stage that takes one of four values: Awake, Light Sleep, Deep Sleep, or REM. The sequence x refers to the RF spectrograms from the beginning of the night until the current time. Since RF signals carry information about the subject and the measurement environment, we assign each input x an auxiliary label s which identifies the subject-environment pair, hereafter referred to as the source.

Our goal is to learn a latent representation (i.e., an encoder) that can be used to predict label y; yet, we want this representation to generalize well to predict sleep stages for new subjects without having labeled data from them. Simply making the representation invariant to the source domains could hamper the accuracy of the predictive task. Instead, we would like to remove conditional dependencies between the representation and the source domains.

We introduce a multi-domain adversarial model that achieves the above goal. Our model is shown in Fig. 1(a). It has three components: an encoder E, a label predictor F, and a source discriminator D. Our model is set up as a game, where the representation encoder plays a cooperative game with the label predictor to allow it to predict the correct labels using the encoded representation. The encoder also plays a minimax game against the source discriminator to prevent it from decoding the source label from the encoded representation.
A key characteristic of our model is the conditioning of the source discriminator on the label distribution Py(·|x) (see Fig. 1(a)). This conditioning of the adversary allows the learned representation to correlate with the domains, but only via the label distribution; i.e., it removes conditional dependencies between the representation and the sources.

The rest of this section is organized as follows. We first formally define the three players E, F, and D and the representation invariance they are trained to achieve. In Sec. 3.1, we analyze the game and prove that at equilibrium the encoder discards all extraneous information about the source that is not beneficial for label prediction (i.e., predicting y). Training the ideal model in Fig. 1(a) is challenging because it requires access to the label distribution Py(·|x). To derive an efficient training algorithm, we define in Sec. 3.2 an extended game where the source discriminator uses the output of the label predictor as an approximation of the posterior probabilities, as shown in Fig. 1(b). We prove that the equilibria of the original game are also equilibria of the extended one.

Encoder E: An encoder E(·) : Ωx → Ωz is a function that takes a sequence of input samples x and returns a vector summary of x as z = E(x).

Label Predictor F: A label predictor F(·) : Ωz → [0, 1]^{n_y} takes a latent representation E(x) as input and predicts the probability of each label y associated with input x as QF(y|E(x)). The goal of an ideal predictor F is to approximate Py(·|x) with QF(·|E(x)); its loss Lf(F; E) is the corresponding cross-entropy E_{x,y}[−log QF(y|E(x))].

To measure the invariance of an encoder E, we define the loss of the source discriminator D as the cross-entropy between Ps(·|x) and QD(·|E(x), Py(·|x)):

$$\mathcal{L}_d(D; E) = \mathbb{E}_{x,s}\big[-\log Q_D(s \mid E(x), P_y(\cdot \mid x))\big] \qquad (2)$$

During training, encoder E and discriminator D play a minimax game: while D is trained to minimize the source prediction loss, encoder E is trained to maximize it in order to achieve the above invariance.

3.1. Ideal Game

During training, encoder E plays a cooperative game with predictor F, and a minimax game with discriminator D. We define a value function of E, F and D with λ > 0:

$$V(E, F, D) = \mathcal{L}_f(F; E) - \lambda \cdot \mathcal{L}_d(D; E) \qquad (3)$$

The training procedure can be viewed as a three-player minimax game of E, F and D:

$$\min_E \min_F \max_D V(E, F, D) = \min_{E,F} \max_D V(E, F, D) \qquad (4)$$

Proposition 2 (Optimal predictor). Given encoder E,

$$\mathcal{L}_f(E) \triangleq \min_F \mathcal{L}_f(F; E) \geq H(y \mid E(x)), \qquad (5)$$

where H(·) is entropy. The optimal predictor F∗ that achieves equality is:

$$Q_{F^*}(y \mid E(x)) = p(y \mid E(x)) \qquad (6)$$

Proposition 3 (Optimal discriminator). Given encoder E,

$$\mathcal{L}_d(E) \triangleq \min_D \mathcal{L}_d(D; E) \geq H(s \mid E(x), P_y(\cdot \mid x)). \qquad (7)$$

Proof. The loss of D is a cross-entropy, which is minimized by the true posterior,

$$Q_{D^*}(s \mid E(x), P_y(\cdot \mid x)) = P(s \mid E(x), P_y(\cdot \mid x)), \qquad (8)$$

at which point the loss equals the conditional entropy H(s|E(x), Py(·|x)).
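To make the game concrete, the following is a minimal PyTorch sketch of the three players and the losses above. The module names, layer choices, and sizes are illustrative assumptions rather than the paper's actual architecture; as in Sec. 3.2 below, the predictor's output QF(·|E(x)) stands in for Py(·|x) when conditioning the adversary.

```python
# Minimal sketch of the three players and the losses in Eqs. (2) and (3).
# Architectures and sizes are illustrative assumptions, not the paper's networks.
import torch
import torch.nn as nn
import torch.nn.functional as TF  # aliased so it does not clash with predictor F

class Encoder(nn.Module):
    """E: sequence of input samples x -> vector summary z = E(x)."""
    def __init__(self, d_in, d_z):
        super().__init__()
        self.rnn = nn.LSTM(d_in, d_z, batch_first=True)

    def forward(self, x):            # x: (batch, time, d_in)
        _, (h, _) = self.rnn(x)
        return h[-1]                 # z: (batch, d_z)

class Predictor(nn.Module):
    """F: z -> logits over the n_y stages; softmax gives Q_F(y|E(x))."""
    def __init__(self, d_z, n_y):
        super().__init__()
        self.fc = nn.Linear(d_z, n_y)

    def forward(self, z):
        return self.fc(z)

class Discriminator(nn.Module):
    """D: (z, label distribution) -> logits over the n_s sources."""
    def __init__(self, d_z, n_y, n_s):
        super().__init__()
        self.fc = nn.Linear(d_z + n_y, n_s)

    def forward(self, z, q_y):       # conditioned on the label distribution
        return self.fc(torch.cat([z, q_y], dim=-1))

def value_function(E, F, D, x, y, s, lam):
    """V(E,F,D) = L_f(F;E) - lam * L_d(D;E), per Eqs. (2)-(3)."""
    z = E(x)
    logits_y = F(z)
    L_f = TF.cross_entropy(logits_y, y)           # cross-entropy loss of F
    q_y = TF.softmax(logits_y, dim=-1).detach()   # uni-directional conditioning
    L_d = TF.cross_entropy(D(z, q_y), s)          # Eq. (2), with Q_F as proxy
    return L_f - lam * L_d, L_f, L_d
```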
Corollary 3.1. H(s) is an upper bound of the loss of the optimal discriminator D∗ for any encoder E.

Next, we state the virtual training criterion of the encoder.

Proposition 4. If predictor F and discriminator D have enough capacity and are trained to achieve their optimal losses, the minimax game (4) can be rewritten as the following training procedure of encoder E:

$$\min_E \big[ H(y \mid E(x)) - \lambda \cdot H(s \mid E(x), P_y(\cdot \mid x)) \big] \qquad (9)$$

Proof. Based on the losses of the optimal predictor F∗ and the optimal discriminator D∗ in Proposition 2 and Proposition 3, the minimax game (4) can be rewritten as (9). Thus, encoder E is trained to minimize a virtual training criterion C(E) = H(y|E(x)) − λ · H(s|E(x), Py(·|x)).
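Spelling the substitution out: by (5) and (7), the inner optimizations collapse to conditional entropies (maximizing −λ·Ld over D is the same as minimizing Ld over D), so

$$\min_{E,F} \max_D V(E, F, D) = \min_E \Big[ \min_F \mathcal{L}_f(F; E) - \lambda \min_D \mathcal{L}_d(D; E) \Big] = \min_E \big[ H(y \mid E(x)) - \lambda \cdot H(s \mid E(x), P_y(\cdot \mid x)) \big].$$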
Next, we describe the optimal encoder.

Theorem 5 (Optimal encoder). If encoder E, predictor F and discriminator D have enough capacity and are trained to reach optimum, any globally optimal encoder E∗ has the following properties:

$$H(y \mid E^*(x)) = H(y \mid x) \qquad (10a)$$

$$H(s \mid E^*(x), P_y(\cdot \mid x)) = H(s \mid P_y(\cdot \mid x)) \qquad (10b)$$

Proof. Since E(x) is a function of x:

$$\mathcal{L}_f(E) = H(y \mid E(x)) \geq H(y \mid x) \qquad (11a)$$

$$\mathcal{L}_d(E) = H(s \mid E(x), P_y(\cdot \mid x)) \leq H(s \mid P_y(\cdot \mid x)) \qquad (11b)$$

Hence, C(E) = H(y|E(x)) − λ · H(s|E(x), Py(·|x)) ≥ H(y|x) − λ · H(s|Py(·|x)). The equality holds if and only if both (10a) and (10b) are satisfied. Therefore, we only need to prove that the optimal value of C(E) equals H(y|x) − λ · H(s|Py(·|x)) in order to prove that any globally optimal encoder E∗ satisfies both (10a) and (10b). We show that C(E) can achieve H(y|x) − λ · H(s|Py(·|x)) by considering the following encoder E0: E0(x) = Py(·|x). It can be verified that H(y|E0(x)) = H(y|x) and H(s|E0(x), Py(·|x)) = H(s|Py(·|x)).
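A short sketch of why these two identities hold (the text states them without detailed proof): conditioned on the event E0(x) = Py(·|x) = q, the label y is distributed exactly according to q, so

$$H(y \mid E_0(x)) = \mathbb{E}_{x}\big[H\big(P_y(\cdot \mid x)\big)\big] = H(y \mid x);$$

and since the pair (E0(x), Py(·|x)) duplicates the same quantity, conditioning on it is the same as conditioning on Py(·|x) alone:

$$H\big(s \mid E_0(x), P_y(\cdot \mid x)\big) = H\big(s \mid P_y(\cdot \mid x)\big).$$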
Adversarial training of D can be viewed as a regularizer, which leads to a common representation space for multiple source domains. From Theorem 5, the optimal encoder E∗ under adversarial training satisfies H(y|E∗(x)) = H(y|x), which is the maximal discriminative capability that any encoder E can achieve. Thus, we have the following corollary.

Corollary 5.1. Adversarial training of the discriminator does not reduce the discriminative capability of the representation.

Remark 5.1. During the proof of Theorem 5, we construct an encoder E0(x) = Py(·|x) that achieves the optimal value of C(E). However, we argue that training will not converge to this trivial encoder in practice. This is because Py(·|x) is a mapping from the full signal history to the distribution over stages at the current step, and is therefore itself highly complex. Since we use the RNN state as the encoding E(x), and it feeds into the LSTM gates, the distribution over stages at the previous step does not represent a sufficient summary of the history up to the current one. Therefore, E(x) must be able to anticipate the temporal evolution of the signal and contain a more effective summary than Py(·|x) would be.

Corollary 5.2. If encoder E and predictor F have enough capacity and are trained to reach optimum, the output of F is equal to Py(·|x).

Proof. When predictor F is optimal (Proposition 2), QF(y|E(x)) = p(y|E(x)). When E is optimal (Theorem 5), H(y|E(x)) = H(y|x), that is, p(y|E(x)) = p(y|x). Therefore, QF(y|E(x)) = p(y|x).

3.2. Extended Game

In practice, estimating the posterior label distribution Py(·|x) from labeled data is a non-trivial task. Fortunately, however, our predictor F and encoder E play a cooperative game to approximate this posterior label distribution Py(·|x) with QF(·|E(x)). Therefore, we use QF(·|E(x)), the output of predictor F, as a proxy for Py(·|x) and feed it as input to discriminator D (Fig. 1(b)).

An extended three-player game arises: while encoder E still plays a cooperative game with predictor F and a minimax game with discriminator D, discriminator D depends strategically on predictor F but not vice versa. The dotted line in Fig. 1(b) illustrates this dependency.

The relationship between the ideal minimax game (Sec. 3.1) and the extended one is stated below.

Proposition 6. If encoder E, predictor F and discriminator D have enough capacity, the solution comprising the optimal encoder E∗, predictor F∗, and discriminator D∗ of the ideal minimax game is also an equilibrium solution of the extended game.

Proof. By Corollary 5.2, when encoder E and predictor F are optimal, QF(·|E(x)) is equal to Py(·|x). Thus, the extended game becomes equivalent to the ideal game, and (E∗, F∗, D∗) is an equilibrium solution of both games.
Algorithm 1 Encoder, predictor and discriminator training

    Input: Labeled data {(x_i, y_i, s_i)}_{i=1}^M, learning rate η.
    Compute stop criterion for inner loop: δ_d ← H(s)
    for number of training iterations do
        Sample a mini-batch of training data {(x_i, y_i, s_i)}_{i=1}^m
        L_f^i ← −log Q_F(y_i | E(x_i))
        w_i ← Q_F(· | E(x_i))        ▷ stop gradient along this link
        L_d^i ← −log Q_D(s_i | E(x_i), w_i)
        V^i = L_f^i − λ · L_d^i
        Update encoder E:
            θ_e ← θ_e − η_e ∇_{θ_e} (1/m) Σ_{i=1}^m V^i
        Update predictor F:
            θ_f ← θ_f − η_f ∇_{θ_f} (1/m) Σ_{i=1}^m V^i
        repeat
            Update discriminator D:
                θ_d ← θ_d + η_d ∇_{θ_d} (1/m) Σ_{i=1}^m V^i
        until (1/m) Σ_{i=1}^m L_d^i ≤ δ_d
    end for

…adversarial feedback even when the target labels are uncertain. For example, in the sleep staging problem, each 30-second window is given one label. Yet, many such windows include transitions between sleep stages, e.g., a transition from light to deep sleep. These transitions are gradual, and hence the transition windows can be intrinsically different from both light and deep sleep. It would be desirable to have the learned representation capture the concept of transition and make it invariant to the source (see the results in Sec. 4.5). 3) It allows the conditioning to remain available for additional guiding of representations based on unlabeled data. The model can incorporate unlabeled data for either semi-supervised learning or transductive learning within a unified framework.
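Below is a runnable sketch of Algorithm 1 in the same PyTorch style as the earlier sketch. The optimizer choice, data loading, the uniform-source assumption behind δ_d = H(s) = log n_s, and the cap on inner discriminator steps are all our assumptions; only the update structure follows the algorithm: E and F take a descent step on V, then D is trained toward optimality (ascending V in θ_d, which is equivalent to descending L_d since V = L_f − λ·L_d) until its loss falls below δ_d.

```python
# Sketch of Algorithm 1 using the Encoder/Predictor/Discriminator modules above.
# Optimizers, batching, and the inner-step cap are assumptions, not the paper's.
import math
import torch
import torch.nn.functional as TF

def train(E, F, D, loader, n_s, lam=1.0, eta=1e-3, n_iters=1000, max_inner=50):
    opt_e = torch.optim.SGD(E.parameters(), lr=eta)
    opt_f = torch.optim.SGD(F.parameters(), lr=eta)
    opt_d = torch.optim.SGD(D.parameters(), lr=eta)
    delta_d = math.log(n_s)  # H(s), assuming sources are uniformly distributed
    batches = iter(loader)   # assumes loader yields at least n_iters batches
    for _ in range(n_iters):
        x, y, s = next(batches)
        # Update encoder E and predictor F by descending on V = L_f - lam * L_d.
        z = E(x)
        logits_y = F(z)
        L_f = TF.cross_entropy(logits_y, y)
        w = TF.softmax(logits_y, dim=-1).detach()  # stop gradient along this link
        L_d = TF.cross_entropy(D(z, w), s)
        V = L_f - lam * L_d
        opt_e.zero_grad(); opt_f.zero_grad()
        V.backward()
        opt_e.step(); opt_f.step()
        # Inner loop: train D to (near-)optimality, i.e. minimize L_d,
        # stopping once the loss drops below delta_d = H(s).
        for _ in range(max_inner):
            z = E(x).detach()
            w = TF.softmax(F(z), dim=-1).detach()
            L_d = TF.cross_entropy(D(z, w), s)
            opt_d.zero_grad()
            L_d.backward()
            opt_d.step()
            if L_d.item() <= delta_d:
                break
```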
4. Experiments

In this section, we empirically evaluate our model.

Table 2. Sleep Stage Classification Accuracy and Kappa

    Approach                     Accuracy    κ
    Tataraidze et al. (2016b)    0.635       0.49
    Zaffaroni et al. (2014)      0.641       0.45
    Ours                         0.798       0.70
[Figure 2. (a) Confusion matrix over Awake, REM, Light, and Deep (columns: predicted stage; recovered rows: REM .03/.82/.15/.00, Light .04/.07/.83/.06, Deep .01/.00/.24/.75); (b) accuracy on each of the 25 subjects. Fig. 2(a) shows that our model can distinguish deep and light sleep with high accuracy, and Fig. 2(b) illustrates that our model works well for different subjects and environments.]
[Figure 3. Three examples of full-night predictions corresponding to the average (80.4%), best (91.2%), and worst classification accuracy. Each panel shows ground truth and prediction over Wake, REM, Light, and Deep against time since lights off (h).]

[Figure 4. Visualizations of the CNN and RNN responses (scatter plots of the two responses, colored by stage: Awake, REM, Light, Deep). The CNN can separate Wake and REM from the other stages, yet Deep and Light Sleep can only be distinguished by the RNN.]

4.3. Classification Results

We evaluate the model on every subject while training on the data collected from the other subjects (i.e., the model is never trained on data from the test subject). The training data is divided into a training set and a validation set (75%/25%).

Cohen's Kappa coefficient κ (Cohen, 1960) takes into account the possibility of the agreement occurring by chance and is usually a more robust metric; κ > 0.4, κ > 0.6, and κ > 0.8 are considered to be moderate, substantial, and almost perfect agreement, respectively (Landis & Koch, 1977).
References

Liu, Xuefeng, Cao, Jiannong, Tang, Shaojie, and Wen, Jiaqi. Wi-Sleep: Contactless sleep monitoring via WiFi signals. RTSS, 2014.

Long, Mingsheng, Cao, Yue, Wang, Jianmin, and Jordan, Michael I. Learning transferable features with deep adaptation networks. ICML, 2015.

Long, Xi, Yang, Jie, Weysen, Tim, Haakma, Reinder, Foussier, Jérôme, Fonseca, Pedro, and Aarts, Ronald M. Measuring dissimilarity between respiratory effort signals based on uniform scaling for sleep staging. Physiological Measurement, 2014.

Lucey, Brendan P, Mcleland, Jennifer S, Toedebusch, Cristina D, Boyd, Jill, Morris, John C, Landsness, Eric C, Yamada, Kelvin, and Holtzman, David M. Comparison of a single-channel EEG sleep study to polysomnography. Journal of Sleep Research, 2016.

Maaten, Laurens van der and Hinton, Geoffrey. Visualizing data using t-SNE. JMLR, 2008.

Makhzani, Alireza, Shlens, Jonathon, Jaitly, Navdeep, Goodfellow, Ian, and Frey, Brendan. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.

Metz, Luke, Poole, Ben, Pfau, David, and Sohl-Dickstein, Jascha. Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163, 2016.

National Institutes of Health. Sleep disorders. http://www.ninds.nih.gov/disorders/brain_basics/understanding_sleep.htm#sleep_disorders.

Pigou, Lionel, Van Den Oord, Aäron, Dieleman, Sander, Van Herreweghe, Mieke, and Dambre, Joni. Beyond temporal pooling: Recurrence and temporal convolutions for gesture recognition in video. IJCV, 2015.

Pollak, Charles P, Tryon, Warren W, Nagaraja, Haikady, and Dzwonczyk, Roger. How accurately does wrist actigraphy identify the states of sleep and wakefulness? Sleep, 2001.

Popovic, Djordje, Khoo, Michael, and Westbrook, Philip. Automatic scoring of sleep stages and cortical arousals using two electrodes on the forehead: validation in healthy adults. Journal of Sleep Research, 2014.

Purushotham, Sanjay, Carvalho, Wilka, Nilanon, Tanachat, and Liu, Yan. Variational recurrent adversarial deep domain adaptation. ICLR, 2017.

Radford, Alec, Metz, Luke, and Chintala, Soumith. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

Rahman, Tauhidur, Adams, Alexander T, Ravichandran, Ruth Vinisha, Zhang, Mi, Patel, Shwetak N, Kientz, Julie A, and Choudhury, Tanzeem. DoppleSleep: A contactless unobtrusive sleep sensing system using short-range Doppler radar. ACM UbiComp, 2015.

Rechtschaffen, Allan and Kales, Anthony. A manual of standardized terminology, techniques and scoring system for sleep stages of human subjects. US Government Printing Office, US Public Health Service, 1968.

Shah, Purav C, Yudelevich, Eric, Genese, Frank, Martillo, Miguel, Ventura, Iazsmin B, Fuhrmann, Katherine, Mortel, Marie, Levendowski, Daniel, Gibson, Charlisa D, Ochieng, Pius, et al. Can disrupted sleep affect mortality in the mechanically ventilated critically ill? A state of unrest: Sleep/SDB in the ICU and hospital, 2016.

Shambroom, John R, Fábregas, Stephan E, and Johnstone, Jack. Validation of an automated wireless system to monitor sleep in healthy adults. Journal of Sleep Research, 2012.

Sutskever, Ilya, Vinyals, Oriol, and Le, Quoc V. Sequence to sequence learning with neural networks. NIPS, 2014.

Szegedy, Christian, Liu, Wei, Jia, Yangqing, Sermanet, Pierre, Reed, Scott, Anguelov, Dragomir, Erhan, Dumitru, Vanhoucke, Vincent, and Rabinovich, Andrew. Going deeper with convolutions. CVPR, 2015.

Tataraidze, Alexander, Korostovtseva, Lyudmila, Anishchenko, Lesya, Bochkarev, Mikhail, and Sviryaev, Yurii. Sleep architecture measurement based on cardiorespiratory parameters. IEEE EMBC, 2016a.

Tataraidze, Alexander, Korostovtseva, Lyudmila, Anishchenko, Lesya, Bochkarev, Mikhail, Sviryaev, Yurii, and Ivashov, Sergey. Bioradiolocation-based sleep stage classification. IEEE EMBC, 2016b.

Tzeng, Eric, Hoffman, Judy, Darrell, Trevor, and Saenko, Kate. Simultaneous deep transfer across domains and tasks. ICCV, 2015.

Tzeng, Eric, Hoffman, Judy, Saenko, Kate, and Darrell, Trevor. Adversarial discriminative domain adaptation. NIPS Workshop on Adversarial Training, 2016.

Vondrick, Carl, Pirsiavash, Hamed, and Torralba, Antonio. Generating videos with scene dynamics. NIPS, 2016.

Zaffaroni, A, Gahan, L, Collins, L, O'Hare, E, Heneghan, C, Garcia, C, Fietze, I, and Penzel, T. Automated sleep staging classification using a non-contact biomotion sensor. Journal of Sleep Research, 2014.

Zhao, Mingmin, Adib, Fadel, and Katabi, Dina. Emotion recognition using wireless signals. ACM MobiCom, 2016.