
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 66, NO. 4, APRIL 2020

Distributed Hypothesis Testing Over Discrete Memoryless Channels

Sreejith Sreekumar, Student Member, IEEE, and Deniz Gündüz, Senior Member, IEEE

Abstract— A distributed binary hypothesis testing (HT) problem involving two parties, one referred to as the observer and the other as the detector, is studied. The observer observes a discrete memoryless source (DMS) and communicates its observations to the detector over a discrete memoryless channel (DMC). The detector observes another DMS correlated with that at the observer, and performs a binary hypothesis test on the joint distribution of the two DMS's using its own observed data and the information received from the observer. The trade-off between the type I error probability and the type II error-exponent of the HT is explored. Single-letter lower bounds on the optimal type II error-exponent are obtained by using two different coding schemes: a separate HT and channel coding scheme, and a joint HT and channel coding scheme based on hybrid coding for the matched bandwidth case. An exact single-letter characterization of the same is established for the special case of testing against conditional independence, and it is shown to be achieved by the separate HT and channel coding scheme. An example is provided where the joint scheme achieves strictly better performance than the separation based scheme.

Index Terms— Distributed hypothesis testing, noisy channel, error-exponents, reliability, statistical inference, separate scheme, unequal error protection, joint scheme, hybrid coding.

Manuscript received June 2, 2018; revised November 5, 2019; accepted November 7, 2019. Date of publication November 15, 2019; date of current version March 17, 2020. This work was supported in part by the European Research Council (ERC) through Starting Grant BEACON under Grant 677854. This work was presented in part at the International Symposium on Information Theory (ISIT), Aachen, Germany, 2017. The authors are with Imperial College London, London SW7 2AZ, U.K. (e-mail: s.sreekumar15@imperial.ac.uk; d.gunduz@imperial.ac.uk). Communicated by N. Merhav, Associate Editor for Shannon Theory. Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIT.2019.2953750

Fig. 1. Distributed HT over a DMC.

I. INTRODUCTION

GIVEN data samples, statistical hypothesis testing (HT) deals with the problem of ascertaining the true assumption, that is, the true hypothesis, about the data from among a set of hypotheses. In modern communication networks (such as sensor networks, cloud computing and the Internet of Things (IoT)), data is gathered at multiple remote nodes, referred to as observers, and transmitted over noisy links to another node for further processing. Often, there is some prior statistical knowledge available about the data, for example, that the joint probability distribution of the data belongs to a certain prescribed set. In such scenarios, it is of interest to identify the true underlying probability distribution, and this naturally leads to the problem of distributed HT over noisy channels [1]. The simplest case of such a scenario is depicted in Fig. 1, where there is a single observer and two possibilities for the joint distribution of the data. The observer observes k independent and identically distributed (i.i.d.) data samples U^k, and communicates its observation to the detector by n uses of the DMC, characterized by the conditional distribution P_{Y|X}. The detector performs a binary hypothesis test on the joint distribution of the data (U^k, V^k) to decide between the two hypotheses, based on the channel outputs Y^n as well as its own observations V^k. The null and the alternate hypotheses of the test are given by

H_0 : (U^k, V^k) ~ ∏_{i=1}^{k} P_{UV},   (1a)

and

H_1 : (U^k, V^k) ~ ∏_{i=1}^{k} Q_{UV},   (1b)

respectively. Our goal is to characterize the optimal exponential rate of decay of the type II error probability asymptotically, known as the type II error-exponent (henceforth also referred to as the error-exponent), for a prescribed constraint on the type I error probability of the above hypothesis test.

In the centralized scenario, in which the detector performs a binary hypothesis test on the probability distribution of the data it observes directly, the optimal error-exponent is characterized by the well-known lemma of Stein [2] (see also [3]). The study of distributed statistical inference under communication constraints was conceived by Berger in [4]. In [4], and in the follow-up literature summarized below, communication from the observers to the detector is assumed to be over rate-limited error-free channels. Some of the fundamental results in this setting for the case of a single observer were established by Ahlswede and Csiszár in [5]. They obtained a tight single-letter characterization of the optimal error-exponent for a special case of HT known as testing against independence (TAI), in which Q_{UV} = P_U × P_V. Furthermore, the authors established a lower bound on the optimal error-exponent for the
0018-9448 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
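As a quick numerical illustration of the dichotomy in (1a)-(1b), the sketch below draws i.i.d. samples under H_0 and checks that the normalized log-likelihood ratio concentrates at the KL divergence D(P_UV || Q_UV), the quantity that governs the type II error-exponent in the centralized case via Stein's lemma. The joint distributions P_UV and Q_UV here are hypothetical examples chosen for illustration, not taken from the paper.

```python
# Minimal sketch of the binary hypothesis test in (1a)-(1b), with
# hypothetical example distributions P_UV and Q_UV (not from the paper).
import math
import random

random.seed(0)

# Joint pmfs over (U, V) in {0,1}^2, indexed as pmf[(u, v)].
P_UV = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}      # null H0
Q_UV = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}  # alternate H1

def sample_iid(pmf, k):
    """Draw k i.i.d. pairs (U_i, V_i) from the given joint pmf."""
    pairs, weights = zip(*pmf.items())
    return random.choices(pairs, weights=weights, k=k)

def normalized_llr(samples):
    """(1/k) * sum_i log( P_UV(u_i, v_i) / Q_UV(u_i, v_i) )."""
    return sum(math.log(P_UV[s] / Q_UV[s]) for s in samples) / len(samples)

# D(P_UV || Q_UV), the centralized Stein exponent for this pair.
kl = sum(p * math.log(p / Q_UV[s]) for s, p in P_UV.items())

# Under H0 the normalized log-likelihood ratio concentrates at the KL value.
llr = normalized_llr(sample_iid(P_UV, 100000))
print(f"D(P_UV||Q_UV) = {kl:.4f}, empirical LLR under H0 = {llr:.4f}")
```

By the law of large numbers the empirical value lands close to the divergence; under H_1 it would instead concentrate at the negative divergence −D(Q_UV || P_UV).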


general HT case, and proved a strong converse result, which states that the optimal achievable error-exponent is independent of the constraint on the type I error probability. A tighter lower bound for the general HT problem was established by Han [6], which recovers the corresponding lower bound in [5]. Han also considered complete data compression in a related setting where either U, or V, or both (the latter also referred to as the two-sided compression setting) are compressed and communicated to the detector using a message set of size two. It is shown that, asymptotically, the optimal error-exponents achieved in these three settings are equal. In contrast, a single-letter characterization of the optimal error-exponent for even TAI with two-sided compression and general rate constraints remains open to date. Shalaby and Papamarcou [7] extended the complete data compression result of Han to show that the optimal error-exponent is not improved even if the rate constraint is relaxed to that of zero-rate compression (a sub-exponential message set with respect to the blocklength k). Shimokawa et al. [8] obtained a tighter lower bound on the optimal error-exponent for general HT by considering quantization and binning at the encoder, along with a minimum empirical-entropy decoder. Rahman and Wagner [9] studied the setting with multiple observers, in which they showed that for the case of a single observer, the quantize-bin-test scheme achieves the optimal error-exponent for testing against conditional independence (TACI), in which V = (E, Z) and Q_{UEZ} = P_{UZ} P_{E|Z}. Extensions of the distributed HT problem have also been considered in several other interesting scenarios, involving multiple detectors [10], multiple observers [11], interactive HT [12], [13], collaborative HT [14], HT with lossy source reconstruction [15], HT over a multi-hop relay network [16], etc., in which the authors obtain a single-letter characterization of the optimal error-exponent in some special cases.

While the works mentioned above have studied the asymmetric case, focusing on the error-exponent under a constraint on the type I error probability, other works have analyzed the trade-off between the type I and type II error probabilities in the exponential sense. In this direction, the optimal trade-off between the type I and type II error-exponents in the centralized scenario is obtained in [17]. The distributed version of this problem was first studied in [18], where inner bounds on the above trade-off are established. This problem has also been explored from an information-geometric perspective for the zero-rate compression scenario in [19] and [20], which provide further insights into the geometric properties of the optimal trade-off between the two exponents. A Neyman-Pearson like test in the zero-rate compression scenario is proposed in [21], which, in addition to achieving the optimal trade-off between the two exponents, also achieves the optimal second order asymptotic performance among all symmetric (type-based) encoding schemes. However, the optimal trade-off between the type I and type II error-exponents for the general distributed HT problem remains open. Recently, an inner bound for this trade-off was obtained in [22], by using the reliability function of the optimal channel detection codes.

In contrast, HT in distributed settings that involve communication over noisy channels has not been considered until now. In noiseless rate-limited settings, the encoder can reliably communicate its observation subject to a rate constraint. However, this is no longer the case in noisy settings, which complicates the study of error-exponents in HT. Since the capacity of the channel P_{Y|X}, denoted by C(P_{Y|X}), quantifies the maximum rate of reliable communication over the channel, it is reasonable to expect that it plays a role in the characterization of the optimal error-exponent similar to that of the rate-constraint R in the noiseless setting. Another measure of the noisiness of the channel is the so-called reliability function E(R, P_{Y|X}) [23], which is defined as the maximum achievable exponential decay rate of the probability of error (asymptotically) with respect to the blocklength, for message rate R. It appears natural that the reliability function plays a role in the characterization of the achievable error-exponent for distributed HT over a noisy channel. Indeed, in Theorem 2 given below, we provide a lower bound on the optimal error-exponent that depends on the expurgated exponent at rate R, E_x(R, P_{Y|X}), which is a lower bound on E(R, P_{Y|X}) [24]. However, surprisingly, it will turn out that the reliability function does not play a role in the characterization of the error-exponent for TACI in the regime of vanishing type I error probability constraint.

The goal of this paper is to study the best attainable error-exponent for distributed HT over a DMC with a single observer, and to obtain a computable characterization of the same. Although a complete solution is not to be expected for this problem (since even the corresponding noiseless case is still open), the aim is to provide an achievable scheme for the general problem, and to identify special cases in which a tight characterization can be obtained. In the sequel, we first introduce a separation based scheme that performs independent hypothesis testing and channel coding, which we refer to as the separate hypothesis testing and channel coding (SHTCC) scheme. This scheme combines the Shimokawa-Han-Amari scheme [8], which is the best known coding scheme to date for distributed HT over a rate-limited noiseless channel, with a channel coding scheme that achieves the expurgated exponent [23], [24] of the channel along with the best channel coding error-exponent for a single special message. The channel coding scheme is based on the Borade-Nakiboğlu-Zheng unequal error-protection scheme [25]. As we show later, the SHTCC scheme achieves the optimal error-exponent for TACI. Although the SHTCC scheme is attractive due to its modular design, joint source channel coding (JSCC) schemes are known to outperform separation based schemes in several different contexts, for example, the error exponent for reliable transmission of a source over a DMC [26], reliable transmission of correlated sources over a multiple-access channel [27], etc. While in separation based schemes coding is usually performed by first quantizing the observed source sequence to an index, and then transmitting the channel codeword corresponding to that index (independent of the source sequence), JSCC schemes allow the channel codeword to depend on the source sequence, in addition to the quantization index. Motivated by this, we propose a second scheme, referred to as the joint HT and channel coding (JHTCC) scheme, based on hybrid coding [28] for the communication between the observer and the detector.


Our main contributions can be summarized as follows.

(i) We propose two different coding schemes (namely, SHTCC and JHTCC) for distributed HT over a DMC, and analyze the error-exponents achieved by these schemes.
(ii) We obtain an exact single-letter characterization of the optimal error-exponent for the special case of TACI with a vanishing type I error probability constraint, and show that it is achievable by the SHTCC scheme.
(iii) We provide an example where the JHTCC scheme achieves a strictly better error-exponent than the SHTCC scheme.

The rest of the paper is organized as follows. In Section II, we introduce the notation, the detailed system model and the definitions. Following this, we present the main results in Sections III and IV: the achievable schemes are presented in Section III, and the optimality results for special cases are discussed in Section IV. Finally, Section V concludes the paper.

II. PRELIMINARIES

A. Notations

Random variables (r.v.'s) are denoted by capital letters (e.g., X), their realizations by the corresponding lower case letters (e.g., x), and their support by calligraphic letters (e.g., 𝒳). The cardinality of a finite set 𝒳 is denoted by |𝒳|. The set of all probability distributions on an alphabet 𝒳 is denoted by 𝒫_𝒳. Similar notations apply to sets of conditional probability distributions, e.g., 𝒫_{𝒴|𝒳}. X − Y − Z denotes that X, Y and Z form a Markov chain. For m ∈ Z_+, X^m denotes the sequence X_1, …, X_m. Following the notation in [23], for a probability distribution P_X on r.v. X, T^m_{P_X} and T^m_{[P_X]_δ} (or T^m_{[X]_δ}) denote the set of sequences x^m ∈ 𝒳^m of type P_X and the set of P_X-typical sequences, respectively. The set of all possible types of sequences of length m with alphabet 𝒳 is denoted by 𝒯^m_𝒳, and ∪_{m ∈ Z_+} 𝒯^m_𝒳 is denoted by 𝒯_𝒳. Similar notations apply to pairs and larger combinations of r.v.'s, e.g., T^m_{P_{XY}}, T^m_{[P_{XY}]_δ}, 𝒯^m_{𝒳𝒴}, 𝒯_{𝒳𝒴}, etc. The standard information-theoretic quantities, namely the Kullback-Leibler (KL) divergence between distributions P_X and Q_X, the entropy of X with distribution P_X, the conditional entropy of X given Y, and the mutual information between X and Y with joint distribution P_{XY}, are denoted by D(P_X||Q_X), H_{P_X}(X), H_{P_{XY}}(X|Y) and I_{P_{XY}}(X; Y), respectively. When the distributions of the r.v.'s involved are clear from the context, the last three quantities are denoted simply by H(X), H(X|Y) and I(X; Y), respectively. Given realizations X^m = x^m and Y^m = y^m, H_e(x^m|y^m) denotes the conditional empirical entropy, defined as

H_e(x^m|y^m) := H_{P_{X̃Ỹ}}(X̃|Ỹ),   (2)

where P_{X̃Ỹ} denotes the joint type of (x^m, y^m), and := represents equality by definition (throughout this paper). For a ∈ R_+, [a] denotes the set of integers {1, 2, …, a}. All logarithms in this paper are with respect to base e unless specified otherwise. For any set 𝒢, 𝒢^c denotes its complement. a_k →^{(k)} b represents lim_{k→∞} a_k = b, and similar notations are used for inequalities that hold asymptotically, e.g., a_k ≥^{(k)} b denotes lim_{k→∞} a_k ≥ b. P(𝓔) denotes the probability of event 𝓔. For functions f_1 : A → B and f_2 : B → C, f_2 ∘ f_1 denotes function composition. Finally, 1(·) denotes the indicator function, and O(·) and o(·) denote the standard asymptotic notation.

B. Problem Formulation

All the r.v.'s considered henceforth are discrete with finite support. Unless specified otherwise, we denote the probability distribution of a r.v. Z under the null and alternate hypotheses by P_Z and Q_Z, respectively. Let k, n ∈ Z_+ be arbitrary. The encoder (at the observer) observes U^k and transmits the codeword X^n = f^{(k,n)}(U^k), where f^{(k,n)} : 𝒰^k → 𝒳^n represents the (possibly stochastic) encoding function. Let τ := n/k denote the bandwidth ratio. The channel output Y^n is given by the probability law

P_{Y^n|X^n}(y^n|x^n) = ∏_{j=1}^{n} P_{Y|X}(y_j|x_j),   (3)

i.e., the channel between the observer and the detector is memoryless. Depending on the received symbols Y^n and its own observations V^k, the detector makes a decision between the two hypotheses H_0 and H_1 given in (1). Let H ∈ {0, 1} denote the actual hypothesis and Ĥ ∈ {0, 1} the output of the hypothesis test, where 0 and 1 stand for H_0 and H_1, respectively, and let A^{(k,n)} ⊆ 𝒴^n × 𝒱^k denote the acceptance region for H_0. The decision rule g^{(k,n)} : 𝒴^n × 𝒱^k → {0, 1} is then given by

g^{(k,n)}(y^n, v^k) = 1 − 1((y^n, v^k) ∈ A^{(k,n)}).

Let

α(k, n, f^{(k,n)}, g^{(k,n)}) := 1 − P_{Y^n V^k}(A^{(k,n)}),
and β(k, n, f^{(k,n)}, g^{(k,n)}) := Q_{Y^n V^k}(A^{(k,n)}),

denote the type I and type II error probabilities for the encoding function f^{(k,n)} and decision rule g^{(k,n)}, respectively.

Definition 1. An error-exponent κ is (τ, ε) achievable if there exist a sequence of integers k, and corresponding sequences of encoding functions f^{(k,n_k)} and decision rules g^{(k,n_k)}, such that n_k ≤ τk, ∀k,

lim inf_{k→∞} (−1/k) log β(k, n_k, f^{(k,n_k)}, g^{(k,n_k)}) ≥ κ,   (4a)
and lim sup_{k→∞} α(k, n_k, f^{(k,n_k)}, g^{(k,n_k)}) ≤ ε.   (4b)

For (τ, ε) ∈ R_+ × [0, 1], let

κ(τ, ε) := sup{κ′ : κ′ is (τ, ε) achievable}.   (5)

We are interested in obtaining a computable characterization of κ(τ, ε).

It is well known that the Neyman-Pearson test [29] gives the optimal trade-off between the type I and type II error probabilities, and hence, also between the error-exponents in HT.
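The definitions of α and β above can be exercised on a toy instance of the system model: uncoded transmission of a binary source over a binary symmetric channel, with a detector that accepts H_0 when Y^k and V^k agree often enough. All parameter values below (crossover probabilities, threshold) are illustrative choices, not from the paper; the point is only to exhibit a concrete (f^{(k,n)}, g^{(k,n)}) pair whose type I and type II error probabilities are both small.

```python
# A toy instance of the formalism above: uncoded transmission (X^k = U^k)
# over a BSC, and a detector that accepts H0 when Y^k and V^k agree often
# enough.  All parameter values are illustrative, not from the paper.
import random

random.seed(1)
k = 500            # blocklength, with k = n (bandwidth ratio tau = 1)
p_channel = 0.1    # crossover probability of the BSC P_{Y|X}
p_corr = 0.9       # under H0, V_i = U_i with probability p_corr
threshold = 0.66   # accept H0 if the fraction of agreements exceeds this

def run_trial(h):
    """Simulate one block under hypothesis h; return the decision H_hat."""
    agree = 0
    for _ in range(k):
        u = random.randint(0, 1)
        if h == 0:
            v = u if random.random() < p_corr else 1 - u   # correlated V
        else:
            v = random.randint(0, 1)                       # independent V
        y = u if random.random() >= p_channel else 1 - u   # BSC output
        agree += (y == v)
    return 0 if agree / k > threshold else 1

trials = 200
alpha = sum(run_trial(0) == 1 for _ in range(trials)) / trials  # type I
beta = sum(run_trial(1) == 0 for _ in range(trials)) / trials   # type II
print(f"alpha ~ {alpha}, beta ~ {beta}")
```

Under H_0 the agreement probability per symbol is 0.9 · 0.9 + 0.1 · 0.1 = 0.82, while under H_1 it is 0.5, so a threshold between the two separates the hypotheses with error probabilities decaying exponentially in k.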


It follows that the optimal error-exponent for distributed HT over a DMC is achieved when the channel-input X^n is generated correlated with U^k according to some optimal conditional distribution P_{X^n|U^k}, and the optimal Neyman-Pearson test is performed on the data available (both received and observed) at the detector. It can be shown, similarly to [5, Theorem 1], that the optimal error-exponent for a vanishing type I error probability constraint is characterized by the multi-letter expression (see [30])

lim_{ε→0} κ(τ, ε) = sup_{P_{X^n|U^k} ∈ 𝒫_{𝒳^n|𝒰^k}, k,n ∈ Z_+, n ≤ τk} (1/k) D(P_{Y^n V^k} || Q_{Y^n V^k}).   (6)

However, the above expression does not single-letterize in general, and hence is intractable, as it involves optimization over large dimensional probability simplexes when k and n are large. Moreover, the encoder and the detector of a scheme achieving the error-exponent given in (6) would be computationally complex to implement from a practical viewpoint. Consequently, we establish two computable single-letter lower bounds on κ(τ, ε) in the next section by using the SHTCC and JHTCC schemes.

III. ACHIEVABLE SCHEMES

In [8], Shimokawa et al. obtained a lower bound on the optimal error-exponent for distributed HT over a rate-limited noiseless channel by using a coding scheme that involves quantization and binning at the encoder. In this scheme, the type^1 of the observed sequence U^k = u^k is transmitted by the encoder to the detector, which is useful to improve the performance of the hypothesis test. In fact, in order to achieve the error-exponent proposed in [8], it is sufficient to send a message indicating whether U^k is typical or not, rather than sending the exact type of U^k. Although it is not possible to get perfect reliability for messages transmitted over a noisy channel, intuitively, it is desirable to protect the typicality information about the observed sequence as reliably as possible. Based on this intuition, we next propose the SHTCC scheme, which performs independent HT and channel coding, and protects the message indicating whether U^k is typical or not as reliably as possible.

^1 Since the number of types is polynomial in the blocklength, these can be communicated error-free at asymptotically zero rate.

A. SHTCC Scheme

In the SHTCC scheme, the encoding and decoding functions are restricted to be of the form f^{(k,n)} = f_c^{(k,n)} ∘ f_s^{(k)} and g^{(k,n)} = g_s^{(k)} ∘ g_c^{(k,n)}, respectively. The source encoder f_s^{(k)} : 𝒰^k → ℳ = {0, 1, …, e^{kR}} generates an index M = f_s^{(k)}(U^k), and the channel encoder f_c^{(k,n)} : ℳ → 𝒞̃ = {X^n(j), j ∈ [0 : e^{kR}]} generates the channel-input codeword X^n = f_c^{(k,n)}(M). Note that the rate of this coding scheme is kR/n = R/τ bits per channel use. The channel decoder g_c^{(k,n)} : 𝒴^n → ℳ maps the channel-output Y^n into an index M̂ = g_c^{(k,n)}(Y^n), and g_s^{(k)} : ℳ × 𝒱^k → {0, 1} outputs the result of the HT as Ĥ = g_s^{(k)}(M̂, V^k). Note that f_c^{(k,n)} depends on U^k only through the output of f_s^{(k)}, and g_s^{(k)} depends on Y^n only through the output of g_c^{(k,n)}. Hence, the scheme is modular in the sense that (f_c^{(k,n)}, g_c^{(k,n)}) can be designed independently of (f_s^{(k)}, g_s^{(k)}). In other words, any good channel coding scheme may be used in conjunction with a good compression scheme. If U^k is not typical according to P_U, f_s^{(k)} outputs a special message, referred to as the error message and denoted by M = 0, to inform the detector to declare Ĥ = 1. There is obviously a trade-off in channel coding between the reliability of the error message and that of the other messages. The best known reliability for protecting a single special message, when the other messages M ∈ [e^{nR}] of rate R, referred to as ordinary messages, are required to be communicated reliably, is given by the red-alert exponent in [25], defined as

E_m(R, P_{Y|X}) := max_{P_{SX} : 𝒮 = 𝒳, I(X;Y|S) = R, S−X−Y} Σ_{s ∈ 𝒮} P_S(s) D(P_{Y|S=s} || P_{Y|X=s}).   (7)

Borade et al.'s scheme uses an appropriately generated codebook along with a two-stage decoding procedure. The first stage is a joint-typicality decoder that decides whether X^n(0) was transmitted, while the second stage is a maximum-likelihood decoder that decodes the ordinary message if the output of the first stage is not zero, i.e., M̂ ≠ 0. On the other hand, it is well known that if the rate of the messages is R, a channel coding error-exponent equal to E_x(R, P_{Y|X}) is achievable, where E_x(R, P_{Y|X}) is the expurgated exponent at rate R [23], [24], defined as

E_x(R, P_{Y|X}) := max_{P_X} max_{ρ ≥ 1} { −ρR − ρ log [ Σ_{x,x̃} P_X(x) P_X(x̃) ( Σ_y √(P_{Y|X}(y|x) P_{Y|X}(y|x̃)) )^{1/ρ} ] }.   (8)

Let

E_m(P_{SX}, P_{Y|X}) := Σ_{s ∈ 𝒮} P_S(s) D(P_{Y|S=s} || P_{Y|X=s}),   (9)

where 𝒮 = 𝒳 and S − X − Y, and let E_x(R, P_{SX}, P_{Y|X}) be as defined in (10) below. Although Borade et al.'s scheme is concerned only with the reliability of the special message, it is not hard to see, using the technique of random coding, that for a fixed distribution P_{SX} there exists a codebook 𝒞̃, and an encoder and decoder as in Borade et al.'s scheme, such that for any rate 0 ≤ R ≤ I(X; Y|S) the special message achieves a reliability equal to E_m(P_{SX}, P_{Y|X}), while the ordinary messages achieve a reliability equal to E_x(R, P_{SX}, P_{Y|X}). Note that E_m(P_{SX}, P_{Y|X}) and E_x(R, P_{SX}, P_{Y|X}) denote Borade et al.'s red-alert exponent and the expurgated exponent with fixed distribution P_{SX}, respectively, and that both are inter-dependent through P_{SX}. Thus, varying P_{SX} provides a trade-off between the reliability for the ordinary messages and
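The expurgated exponent in (8) is directly computable for small alphabets. The sketch below evaluates it for a binary symmetric channel by grid search over P_X and ρ; the channel crossover probability and the rates (in nats, matching the base-e logarithms used in the paper) are illustrative choices, and the finite grids make the result a lower bound on the true maximum in (8).

```python
# Grid-search evaluation of the expurgated exponent E_x(R, P_{Y|X}) in (8)
# for a binary-input channel.  BSC(0.1) and the rates are illustrative;
# the finite rho/P_X grids yield a lower bound on the exact maximum.
import math

W = [[0.9, 0.1], [0.1, 0.9]]  # BSC(0.1): W[x][y] = P_{Y|X}(y|x)

def bhattacharyya(x, xt):
    """sum_y sqrt( P_{Y|X}(y|x) * P_{Y|X}(y|x~) ), the inner sum in (8)."""
    return sum(math.sqrt(W[x][y] * W[xt][y]) for y in range(2))

def expurgated_exponent(R, rho_grid, px_grid):
    """max over P_X and rho >= 1 of -rho*R - rho*log(sum ...), clamped at 0,
    since the exponent vanishes once the rate exceeds the cutoff."""
    best = 0.0
    for p in px_grid:
        PX = [p, 1.0 - p]
        for rho in rho_grid:
            s = sum(PX[x] * PX[xt] * bhattacharyya(x, xt) ** (1.0 / rho)
                    for x in range(2) for xt in range(2))
            best = max(best, -rho * R - rho * math.log(s))
    return best

rho_grid = [1.0 + 0.25 * i for i in range(100)]   # rho in [1, 25.75]
px_grid = [0.05 * i for i in range(1, 20)]        # P_X(0) in (0, 1)
ex_low = expurgated_exponent(0.05, rho_grid, px_grid)    # rate in nats
ex_high = expurgated_exponent(0.30, rho_grid, px_grid)
print(f"E_x(0.05) >= {ex_low:.4f}, E_x(0.30) >= {ex_high:.4f}")
```

For this symmetric channel the uniform input distribution is optimal, and the exponent is positive only below the cutoff rate −log(0.5 + √(p(1−p))) ≈ 0.223 nats, which is why the evaluation at R = 0.30 returns zero.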


the special message. We will use Borade et al.'s scheme for channel coding in the SHTCC scheme, such that the error message and the other messages correspond to the special and ordinary messages, respectively. The SHTCC scheme is described in detail in Appendix A. We next state a lower bound on κ(τ, ε) that is achieved by the SHTCC scheme. For brevity, we will use the shorter notations C, E_m(P_{SX}) and E_x(R, P_{SX}) in place of C(P_{Y|X}), E_m(P_{SX}, P_{Y|X}) and E_x(R, P_{SX}, P_{Y|X}), respectively.

Theorem 2. For τ ≥ 0, κ(τ, ε) ≥ κ_s(τ), ∀ ε ∈ (0, 1], where κ_s(τ) is defined in (11).

Here,

E_x(R, P_{SX}, P_{Y|X}) := max_{ρ ≥ 1} { −ρR − ρ log [ Σ_{s,x,x̃} P_S(s) P_{X|S}(x|s) P_{X|S}(x̃|s) ( Σ_y √(P_{Y|X}(y|x) P_{Y|X}(y|x̃)) )^{1/ρ} ] },   (10)

κ_s(τ) := sup_{(P_{W|U}, P_{SX}, R) ∈ 𝓑(τ, P_{Y|X})} min{ E_1(P_{W|U}), E_2(P_{W|U}, P_{SX}, R), E_3(P_{W|U}, P_{SX}, R, τ), E_4(P_{W|U}, P_{SX}, R, τ) },   (11)

where

𝓑(τ, P_{Y|X}) := { (P_{W|U}, P_{SX}, R) : 𝒮 = 𝒳, P_{UVWSXY}(P_{W|U}, P_{SX}) := P_{UV} P_{W|U} P_{SX} P_{Y|X}, I_P(U; W|V) ≤ R < τ I_P(X; Y|S) },   (12)

E_1(P_{W|U}) := min_{P_{ŨṼW̃} ∈ 𝒯_1(P_{UW}, P_{VW})} D(P_{ŨṼW̃} || Q_{UVW}),   (13)

E_2(P_{W|U}, P_{SX}, R) := { min_{P_{ŨṼW̃} ∈ 𝒯_2(P_{UW}, P_V)} D(P_{ŨṼW̃} || Q_{UVW}) + R − I_P(U; W|V), if I_P(U; W) > R,
                            ∞, otherwise,   (14)

E_3(P_{W|U}, P_{SX}, R, τ) := { min_{P_{ŨṼW̃} ∈ 𝒯_3(P_{UW}, P_V)} D(P_{ŨṼW̃} || Q_{UVW}) + R − I_P(U; W|V) + τ E_x(R/τ, P_{SX}), if I_P(U; W) > R,
                                min_{P_{ŨṼW̃} ∈ 𝒯_3(P_{UW}, P_V)} D(P_{ŨṼW̃} || Q_{UVW}) + I_P(V; W) + τ E_x(R/τ, P_{SX}), otherwise,   (15)

E_4(P_{W|U}, P_{SX}, R, τ) := { D(P_V || Q_V) + R − I_P(U; W|V) + τ E_m(P_{SX}), if I_P(U; W) > R,
                                D(P_V || Q_V) + I_P(V; W) + τ E_m(P_{SX}), otherwise,   (16)

with

Q_{UVW} := Q_{UV} P_{W|U},
𝒯_1(P_{UW}, P_{VW}) := { P_{ŨṼW̃} ∈ 𝒯_{𝒰𝒱𝒲} : P_{ŨW̃} = P_{UW}, P_{ṼW̃} = P_{VW} },
𝒯_2(P_{UW}, P_V) := { P_{ŨṼW̃} ∈ 𝒯_{𝒰𝒱𝒲} : P_{ŨW̃} = P_{UW}, P_Ṽ = P_V, H(W̃|Ṽ) ≥ H_P(W|V) },
𝒯_3(P_{UW}, P_V) := { P_{ŨṼW̃} ∈ 𝒯_{𝒰𝒱𝒲} : P_{ŨW̃} = P_{UW}, P_Ṽ = P_V }.

The proof of Theorem 2 is given in Appendix A. Although the expression for κ_s(τ) in Theorem 2 appears complicated, the terms E_1(P_{W|U}) to E_4(P_{W|U}, P_{SX}, R, τ) can be understood to correspond to distinct events that can possibly lead to a type II error. Note that E_1(P_{W|U}) and E_2(P_{W|U}, P_{SX}, R) are the same terms appearing in the error-exponent achieved by Shimokawa et al.'s scheme [8] for the noiseless channel setting, while E_3(P_{W|U}, P_{SX}, R, τ) and E_4(P_{W|U}, P_{SX}, R, τ) are additional terms introduced due to the noisiness of the channel. E_3(P_{W|U}, P_{SX}, R, τ) corresponds to the event in which M ≠ 0, M̂ ≠ M and g_s^{(k)}(M̂, V^k) = 0, whereas E_4(P_{W|U}, P_{SX}, R, τ) is due to the event in which M = 0, M̂ ≠ M and g_s^{(k)}(M̂, V^k) = 0. Note that, in general, E_m(P_{SX}) can take the value ∞, and when this happens the term τ E_m(P_{SX}) becomes undefined for τ = 0. In this case, we define τ E_m(P_{SX}) := 0.

Remark 3. In the SHTCC scheme, although we use Borade et al.'s channel coding scheme, which is concerned specifically with the protection of a special message when the ordinary message rate is R, any other channel coding scheme with the same rate can be employed. For instance, the ordinary messages can be transmitted with an error-exponent equal to the reliability function E(R, P_{Y|X}) [23] of the channel P_{Y|X} at rate R, while the special message achieves the maximum reliability possible subject to this constraint. However, it should be noted that a computable characterization of neither E(R, P_{Y|X}) (for all values of R) nor


the associated best reliability achievable for a single message we assume the matched-bandwidth scenario, i.e., k = n
is known in general. (τ = 1). In hybrid coding, the source U n is first mapped to
one of the codewords W̄ n within a compression codebook.
Remark 4. Similarly to the zero-rate compression scenario
Then, a symbol-by-symbol function (deterministic) of W̄ n
considered in [6] for the case of a rate-limited noiseless
and U n is transmitted as the channel codeword X n . This
channel, it is possible to achieve an error-exponent of κ0 (τ ) in
procedure is reversed at the decoder, in which, the decoder
general by using a one-bit communication scheme (see [30]), ˆ n of W̄ n using the
first attempts to obtain an estimate W̄
where
 n
channel output Y and its own correlated side information
D(PV ||QV ) , if τ = 0, V n . Then, the reconstruction Û n of the source is obtained as
κ0 (τ ) :=
min {β0 , τ Ec + D(PV ||QV )} , otherwise. a symbol-by-symbol function of the reconstructed codeword,
Here, Y n and V n . In this subsection, we propose a lower bound
on the optimal error-exponent that is achieved by a scheme
β0 := β0 (PU , PV , QUV ) := min D(PŨ Ṽ ||QUV ), that utilizes hybrid coding for the communication between the
PŨ Ṽ :
PŨ =PU , PṼ =PV observer and the detector, which we refer to as the JHTCC
scheme. Post estimation of W̄ ˆ n , the detector performs the
(17)
ˆ
hypothesis test using W̄ n , Y n and V n , instead of estimating
and
Û n as is done in JSCC problems. We will in fact consider a
Ec := Ec (PY |X ) := D(PY |X=a ||PY |X=b ), (18)
slightly generalized form of hybrid coding in that the encoder
where a and b denote channel input symbols that satisfy and detector is allowed to perform “time-sharing” according
(a, b) = arg max D(PY |X=x ||PY |X=x ). (19) to a sequence S n that is known a priori to both parties.
(x,x )∈X ×X Also, the input X n is allowed to be generated according
Note that β0 denotes the optimal error-exponent for distributed to an arbitrary memoryless stochastic function instead of a
HT over a noiseless channel, when the communication rate-constraint is zero [6], [7].

In [30], it is shown that the one-bit communication scheme mentioned in Remark 4 achieves the optimal error-exponent for HT over a DMC, i.e., when the detector has no side-information. Moreover, it is also proved that the optimal error-exponent is not improved if the type I error probability constraint is relaxed; hence, the strong converse holds. In the limiting case of zero channel capacity, i.e., C(PY|X) = 0, it is intuitive to expect that communication from the observer to the detector does not improve the achievable error-exponent for distributed HT. In Appendix C below, we show that this is indeed the case in a strong converse sense, i.e., the optimal error-exponent depends only on the side-information V^k, and is given by D(PV||QV), for any constraint ε ∈ (0, 1) on the type I error probability. This is in contrast to the zero-rate compression case considered in [6], where one bit of communication between the observer and the detector can achieve a strictly positive error-exponent, in general.

The SHTCC scheme introduced above performs independent HT and channel coding, i.e., the channel encoder fc^(k,n) neglects U^k given the output M of the source encoder fs^(k), and gs^(k) neglects Y^n given the output of the channel decoder gc^(k,n). The following scheme removes these restrictions and uses hybrid coding to perform joint HT and channel coding.

B. JHTCC Scheme

Hybrid coding is a form of JSCC introduced in [28] for the lossy transmission of sources over noisy networks. As the name suggests, hybrid coding is a combination of the digital and analog (uncoded) transmission schemes. For simplicity², … deterministic function. The JHTCC scheme will be described in detail in Appendix B. Next, we state a lower bound on κ(τ, ε) that is achieved by the JHTCC scheme.

Theorem 5. κ(1, ε) ≥ κh, ∀ ε ∈ (0, 1], where κh is defined in (20) shown in the next page.

The proof of Theorem 5 is given in Appendix B. The different factors inside the minimum in (20) can be intuitively understood to be related to the various events that could possibly lead to a type 2 error. More specifically, let the event that the encoder is unsuccessful in finding a codeword W̄^n in the quantization codebook that is typical with U^n be referred to as the encoding error, and the event that a wrong codeword Ŵ̄^n (unintended by the encoder) is reconstructed at the detector be referred to as the decoding error. Then, E1(PS, PW̄|US, PX|USW̄) is related to the event that neither the encoding nor the decoding error occurs, while E2(PS, PW̄|US, PX|USW̄) and E3(PS, PW̄|US, PX'|US, PX|USW̄) are related to the events that only the decoding error and both the encoding and decoding errors occur, respectively. From Theorem 2 and Theorem 5, we have the following corollary.

Corollary 6.

κ(1, ε) ≥ max{κh, κs(1)}, ∀ ε ∈ (0, 1].   (27)

It is well-known that in the context of JSCC, hybrid coding recovers separate source-channel coding as a special case [28]. It is also known that hybrid coding, of which uncoded transmission is a special case, strictly outperforms separation based schemes in certain multi-terminal settings [27]. Below, we provide an example where the error-exponent achieved by the JHTCC scheme is strictly better than that achieved by the SHTCC scheme, i.e., κh > κs(1).

² For the case τ ≠ 1, as mentioned in [28], we can consider hybrid coding over super symbols U^{k*} and X^{n*}, where k* and n* are some integers satisfying the constraint n* ≤ τk*. However, we omit its description since the technique is standard and only adds notational clutter.
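The channel capacity C(PY|X) invoked throughout this discussion can be computed numerically for any DMC. As an illustrative aside (this sketch is ours, not part of the paper; the function name, iteration count, and the BSC test channel are our choices), a minimal Blahut-Arimoto implementation:

```python
import math

def channel_capacity(W, iters=500):
    """Blahut-Arimoto: capacity in bits of a DMC given as a row-stochastic matrix W[x][y]."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx                                   # input distribution, start uniform
    for _ in range(iters):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]   # output marginal
        # d[x] = D(P_{Y|X=x} || q), the per-input divergence driving the update
        d = [sum(W[x][y] * math.log2(W[x][y] / q[y])
                 for y in range(ny) if W[x][y] > 0) for x in range(nx)]
        z = [p[x] * 2 ** d[x] for x in range(nx)]
        s = sum(z)
        p = [v / s for v in z]
    q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    return sum(p[x] * W[x][y] * math.log2(W[x][y] / q[y])
               for x in range(nx) for y in range(ny) if W[x][y] > 0)

# BSC with crossover 0.2 (the channel used later in Example 1): C = 1 - hb(0.2)
print(round(channel_capacity([[0.8, 0.2], [0.2, 0.8]]), 4))  # 0.2781
```

For a symmetric channel the uniform input is already optimal, so the iteration converges immediately; for asymmetric channels (e.g., a Z-channel) the update is what does the work.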
2050 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 66, NO. 4, APRIL 2020
Example 1. Let U = V = X = Y = {0, 1} and PU = QU = [0.5 0.5]. Let

PV|U = [1 − p0, p0; p0, 1 − p0],   QV|U = [1 − p1, p1; p1, 1 − p1],

and

PY|X = [1 − q, q; q, 1 − q],

where q = 0.2, p0 = 0.8 and p1 = 0.25. For this example, we have κh ≥ 0.3244 > 0.161 ≥ κs(1).

Proof: Note that PV = QV = [0.5 0.5], and

HQ(V|W) ≥ HP(V̄|W) = HP(V|W),   V̄ = V ⊕ 1,   (28)

for any W that satisfies V − U − W, since

PV̄|U = [1 − p̄0, p̄0; p̄0, 1 − p̄0]

with p̄0 = 0.2 < p1. Then, the lower bound κs(1) simplifies as

κs(1) = sup_{(PW|U, PSX, R) ∈ B(1,PY|X)} min{E1(PW|U), E2(PW|U, PSX, R), E3(PW|U, PSX, R, 1)}.   (29)

To see this, consider an arbitrary (PW|U, PSX, R) ∈ B(1, PY|X). We have E1(PW|U), E2(PW|U, PSX, R) and E3(PW|U, PSX, R, 1) as given in (30)-(32) below, since QUVW ∈ T2(PUW, PV) ∩ T3(PUW, PV), which follows from (28), PUW = QUW and PV = QV. This in turn implies that

min_{PŨṼW̃ ∈ T2(PUW, PV)} D(PŨṼW̃||QUVW) = min_{PŨṼW̃ ∈ T3(PUW, PV)} D(PŨṼW̃||QUVW) = 0.   (33)

Also, we have

E4(PW|U, PSX, R, 1) := { R − IP(U;W|V) + Em(PSX), if IP(U;W) > R;  IP(V;W) + Em(PSX), otherwise }   (34)
                     ≥ E3(PW|U, PSX, R, 1),

since Em(PSX) ≥ Ex(R, PSX) (the reliability of a special message in Borade et al.'s scheme is at least as good as that of an ordinary message), which implies (29). Given that (29) holds, |S| can be taken to be equal to 1, and PX can be chosen to be the capacity achieving channel input distribution (PX(0) = PX(1) = 0.5), which maximizes Ex(R, PSX) (for any R) (see [24] and [23, Exercise 10.26]) without loss of generality. Hence, IP(X;Y) = C(PY|X) = 1 − hb(q).

Let r := hb⁻¹(HP(U|W)) = hb⁻¹(HQ(U|W)), where hb⁻¹ : [0, 1] → [0, 0.5] is the inverse of the binary entropy function given by hb(r) := −r log2(r) − (1 − r) log2(1 − r). First, consider PW|U ∈ B̃, where

B̃ := {PW|U : IP(U;W) < IP(X;Y) = 1 − hb(q)}.   (35)

Note that if R ≥ IP(U;W), then E2(PW|U, PSX, R) = ∞, and E3(PW|U, PSX, R, 1) = IP(V;W) + Ex(R, PSX). Hence,

min{E2(PW|U, PSX, R), E3(PW|U, PSX, R, 1)} = IP(V;W) + Ex(R, PSX) ≤ IP(V;W) + Ex(I(U;W), PSX),   (36)

where (36) follows since Ex(R, PSX) is a decreasing function of R. On the other hand, if R < IP(U;W), then E2(PW|U, PSX, R) = R − IP(U;W|V) and
κh := sup_{b ∈ Bh} min{E1(PS, PW̄|US, PX|USW̄), E2(PS, PW̄|US, PX|USW̄), E3(PS, PW̄|US, PX'|US, PX|USW̄)},   (20)

Bh := { b = (PS, PW̄|US, PX'|US, PX|USW̄) : IP̂(U; W̄|S) < IP̂(W̄; Y, V|S), X' ≠ X,
        P̂UVSW̄X'XY(PS, PW̄|US, PX'|US, PX|USW̄) := PUV PS PW̄|US PX'|US PX|USW̄ PY|X },   (21)

E1(PS, PW̄|US, PX|USW̄) := min_{PŨṼS̃W̃Ỹ ∈ T1(P̂USW̄, P̂VSW̄Y)} D(PŨṼS̃W̃Ỹ || Q̂UVSW̄Y),   (22)

E2(PS, PW̄|US, PX|USW̄) := min_{PŨṼS̃W̃Ỹ ∈ T2(P̂USW̄, P̂VSW̄Y)} D(PŨṼS̃W̃Ỹ || Q̂UVSW̄Y) + IP̂(W̄; V, Y|S) − IP̂(U; W̄|S),   (23)

E3(PS, PW̄|US, PX'|US, PX|USW̄) := D(P̂VSY || Q̌VSY) + IP̂(W̄; V, Y|S) − IP̂(U; W̄|S),   (24)

Q̂UVSW̄X'XY(PS, PW̄|US, PX'|US, PX|USW̄) := QUV PS PW̄|US PX'|US PX|USW̄ PY|X,   (25)

Q̌UVSX'XY(PS, PX'|US) := QUV PS PX'|US 𝟙(X = X') PY|X,   (26)

T1(P̂USW̄, P̂VSW̄Y) := {PŨṼS̃W̃Ỹ ∈ T_UVSWY : PŨS̃W̃ = P̂USW̄, PṼS̃W̃Ỹ = P̂VSW̄Y},

T2(P̂USW̄, P̂VSW̄Y) := {PŨṼS̃W̃Ỹ ∈ T_UVSWY : PŨS̃W̃ = P̂USW̄, PṼS̃Ỹ = P̂VSY, H(W̃|Ṽ, S̃, Ỹ) ≥ HP̂(W̄|V, S, Y)}.
SREEKUMAR AND GÜNDÜZ: DISTRIBUTED HYPOTHESIS TESTING OVER DISCRETE MEMORYLESS CHANNELS 2051
E3(PW|U, PSX, R, 1) = R − IP(U;W|V) + Ex(R, PSX),

yielding that

min{E2(PW|U, PSX, R), E3(PW|U, PSX, R, 1)} = R − IP(U;W|V) ≤ IP(V;W).   (37)

Hence, from (36) and (37), we have

sup_{(PW|U, PSX, R) ∈ B(1,PY|X) : PW|U ∈ B̃} min{E2(PW|U, PSX, R), E3(PW|U, PSX, R, 1)} ≤ IP(V;W) + Ex(I(U;W), PSX).

Also, note that (35) implies hb(r) ≥ hb(q); and hence, r ∈ [q, 0.5]. Thus, we can write

IP(V;W) + Ex(I(U;W), PSX)
= 1 − HP(V|W) + Ex(I(U;W), PSX)
≤ 1 − hb(hb⁻¹(H(U|W)) ∗ p0) + Ex(I(U;W), PSX)   (38)
= 1 − hb(r ∗ p0) + Ex(1 − hb(r), PSX) := f'(r),   (39)

where p ∗ q := (1 − p)q + p(1 − q), and (38) follows by an application of Mrs. Gerber's Lemma [31]. The plot of f'(r) as a function of r ∈ [q, 0.5] is shown in Fig. 2 below, which uses the expression for Ex(R, PSX) given in [23, Exercise 10.26]. As is evident from the plot, the maximum value of f'(r) is attained at r = 0.5, and equals f'(0.5) = Ex(0) = −0.5 ∗ 0.5 ∗ log2(4q(1 − q)) = 0.161. It follows that

sup_{(PW|U, PSX, R) ∈ B(1,PY|X) : PW|U ∈ B̃} min{E2(PW|U, PSX, R), E3(PW|U, PSX, R, 1)} ≤ 0.161.   (40)

Fig. 2. Plot of f'(r) in the range r ∈ [0.2, 0.5].

Next, consider that PW|U ∈ B̃^c, where

B̃^c := {PW|U : IP(W;U) ≥ 1 − hb(q) ≥ IP(U;W|V)}.

Note that the first and second inequalities in the definition of B̃^c imply, respectively, that r ∈ [0, q], and

1 − hb(r) − (1 − hb(r ∗ p0)) ≤ 1 − hb(q).   (41)

Also, since R < 1 − hb(q) holds for any (PW|U, PSX, R) ∈ B(1, PY|X), we have IP(U;W) > R, and hence,

sup_{(PW|U, PSX, R) ∈ B(1,PY|X) : PW|U ∈ B̃^c} min{E2(PW|U, PSX, R), E3(PW|U, PSX, R, 1)}
≤ sup_{(PW|U, PSX, R) ∈ B(1,PY|X) : PW|U ∈ B̃^c} R − IP(U;W|V)
< 1 − hb(q) − (hb(r ∗ p0) − hb(r))   (42)
≤ 1 − hb(q ∗ p0) = 0.0956,   (43)

where (42) follows again from Mrs. Gerber's lemma, and (43) follows since the R.H.S. of (42) is an increasing function of r and hence the maximum is attained at r = q in the range [0, q]. Thus, from (40) and (43), it follows that κs(1) ≤ 0.161.

Finally, we show that the JHTCC scheme can achieve a strictly larger error-exponent, i.e., κh > 0.161. In fact, uncoded transmission, which is a special case of the JHTCC scheme with X = X' = U and W = S = constant, achieves an error-exponent of

D(PVY||QVY) = Db(q ∗ p0 || q ∗ p1) = Db(0.68||0.35) = 0.3244,   (44)

E1(PW|U) := min_{PŨṼW̃ ∈ T1(PUW, PVW)} D(PŨṼW̃||QUVW),   (30)

E2(PW|U, PSX, R) = { R − IP(U;W|V), if IP(U;W) > R;  ∞, otherwise },   (31)

E3(PW|U, PSX, R, 1) := { R − IP(U;W|V) + Ex(R, PSX), if IP(U;W) > R;  IP(V;W) + Ex(R, PSX), otherwise }.   (32)
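The numerical claims in Example 1 are easy to verify. A short script (ours, not the authors') reproducing f'(0.5) = Ex(0) ≈ 0.161 and the bound in (43):

```python
import math

def hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bconv(p, q):
    """Binary convolution p * q = (1-p)q + p(1-q)."""
    return (1 - p) * q + p * (1 - q)

q, p0 = 0.2, 0.8
# At r = 0.5, bconv(0.5, p0) = 0.5, so 1 - hb(r * p0) vanishes and
# f'(0.5) = Ex(0) = -0.5 * 0.5 * log2(4q(1-q))
ex0 = -0.5 * 0.5 * math.log2(4 * q * (1 - q))
print(round(ex0, 3))                    # 0.161
# The bound (43): 1 - hb(q * p0)
print(round(1 - hb(bconv(q, p0)), 4))   # 0.0956
```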
where Db denotes the binary KL divergence defined as Db(p||q) := p log2(p/q) + (1 − p) log2((1 − p)/(1 − q)). Thus, we have shown that the error-exponent achieved by the JHTCC scheme is strictly greater than that achieved by the SHTCC scheme.

Thus far, we obtained lower bounds on the optimal error-exponent for distributed HT over a DMC, and showed via an example that the joint scheme strictly outperforms the separation based scheme in some cases. In order to get an exact characterization of the optimal error-exponent, a matching upper bound is required. However, obtaining a tight computable upper bound remains a challenging open problem in the general hypothesis testing case even when the channel is noiseless, and consequently, an exact computable characterization of the optimal error-exponent is unknown. However, as we show in the next section, the problem does admit a single-letter characterization for TACI.

IV. OPTIMALITY RESULT FOR TACI

Recall that for TACI, V = (E, Z) and QUEZ = PUZ PE|Z. Let

κ(τ) := lim_{ε→0} κ(τ, ε).   (45)

We will drop the subscript P from information theoretic quantities like mutual information, entropy, etc., as there is no ambiguity on the joint distribution involved, e.g., IP(U;W) will be denoted by I(U;W). The following result holds.

Proposition 7. For TACI over a DMC PY|X,

κ(τ) = sup { I(E;W|Z) : ∃ W s.t. I(U;W|Z) ≤ τ C(PY|X), (Z, E) − U − W, |W| ≤ |U| + 1 }.   (46)

Proof: For the proof of achievability, we will show that κs(τ), when specialized to TACI, recovers (46). Let μ > 0 be an arbitrarily small positive number, and let B'(τ, PY|X) be as defined in (47) below. Note that B'(τ, PY|X) ⊆ B(τ, PY|X) since I(U;W|E,Z) ≤ I(U;W|Z), which holds due to the Markov chain (Z, E) − U − W. Now, consider (PW|U, PSX, Rm) ∈ B'(τ, PY|X). Then, we have

E1(PW|U) = min_{PŨẼZ̃W̃ ∈ T1(PUW, PEZW)} D(PŨẼZ̃W̃ || PZ PU|Z PE|Z PW|U)
≥ min_{PŨẼZ̃W̃ ∈ T1(PUW, PEZW)} D(PẼZ̃W̃ || PZ PE|Z PW|Z)   (48)
= I(E;W|Z),

where (48) follows from the log-sum inequality [23]. Also, we have

E2(PW|U, PSX, Rm) ≥ Rm − I(U;W|E,Z) ≥ I(U;W|Z) − I(U;W|E,Z) = I(E;W|Z),

min_{PŨẼZ̃W̃ ∈ T3(PUW, PEZ)} D(PŨẼZ̃W̃ || PZ PU|Z PE|Z PW|U) + Rm − I(U;W|E,Z) + τ Ex(Rm/τ, PSX)
≥ I(U;W|Z) − I(U;W|E,Z) = I(E;W|Z),   (49)

min_{PŨẼZ̃W̃ ∈ T3(PUW, PEZ)} D(PŨẼZ̃W̃ || PZ PU|Z PE|Z PW|U) + I(E,Z;W) + τ Ex(Rm/τ, PSX) ≥ I(E;W|Z),   (50)

D(PEZ||PEZ) + Rm − I(U;W|E,Z) + τ Em(PSX) ≥ I(U;W|Z) − I(U;W|E,Z) = I(E;W|Z),   (51)

D(PEZ||PEZ) + I(E,Z;W) + τ Em(PSX) ≥ I(E;W|Z),   (52)

where in (49)-(52), we used the non-negativity of the KL-divergence, Ex(·,·) and Em(·). Thus, from (49)-(52), it follows that

E3(PW|U, PSX, Rm, τ) ≥ I(E;W|Z),   (53)

and

E4(PW|U, PSX, Rm, τ) ≥ I(E;W|Z).   (54)

Denoting B(τ, PY|X) and B'(τ, PY|X) by B and B', respectively, we obtain

κ(τ, ε) ≥ sup_{(PW|U, PSX, Rm) ∈ B} min{E1(PW|U), E2(PW|U, PSX, Rm), E3(PW|U, PSX, Rm, τ), E4(PW|U, PSX, Rm, τ)}
≥ sup_{(PW|U, PSX, Rm) ∈ B} I(E;W|Z)
≥ sup_{(PW|U, PSX, Rm) ∈ B'} I(E;W|Z)   (55)
= sup_{PW|U : I(W;U|Z) ≤ τ C(PY|X) − μ} I(E;W|Z),   (56)

where (55) follows from the fact that B' ⊆ B; and (56) follows by maximizing over all PSX and noting that sup_{PXS} I(X;Y|S) = C(PY|X). The proof of achievability is complete by noting that μ > 0 is arbitrary and that I(E;W|Z) and I(U;W|Z) are continuous functions of PW|U.

B'(τ, PY|X) := { (PW|U, PSX, Rm) : S = X, PUEZWSXY(PW|U, PSX) := PUEZ PW|U PSX PY|X,
                 I(U;W|Z) ≤ Rm := τ I(X;Y|S) − μ < τ I(X;Y|S) }.   (47)
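The value in (44) can be checked directly from the definition of Db. A short script (ours) with the parameters of Example 1:

```python
import math

def db(p, q):
    """Binary KL divergence Db(p||q) in bits."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

def bconv(p, q):
    """Binary convolution p * q = (1-p)q + p(1-q)."""
    return (1 - p) * q + p * (1 - q)

q, p0, p1 = 0.2, 0.8, 0.25
a = bconv(q, p0)                     # q * p0 = 0.68, the P_{VY}-side crossover
b = bconv(q, p1)                     # q * p1 = 0.35, the Q_{VY}-side crossover
print(round(a, 2), round(b, 2))      # 0.68 0.35
print(round(db(a, b), 4))            # 0.3244
```

This matches the error-exponent D(PVY||QVY) claimed for uncoded transmission in the example.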
Converse: For any sequence of encoding functions f^(k,nk) and acceptance regions A^(k,nk) for H0 such that nk ≤ τk and

lim sup_{k→∞} α(k, nk, f^(k,nk), g^(k,nk)) = 0,   (57)

we have, similar to [5, Theorem 1 (b)], that

lim sup_{k→∞} (−1/k) log β(k, nk, f^(k,nk), g^(k,nk))
≤ lim sup_{k→∞} (1/k) D(P_{Y^{nk} E^k Z^k} || Q_{Y^{nk} E^k Z^k})   (58)
= lim sup_{k→∞} (1/k) I(Y^{nk}; E^k | Z^k)   (59)
= H(E|Z) − lim inf_{k→∞} (1/k) H(E^k | Y^{nk}, Z^k),   (60)

where (59) follows since Q_{Y^{nk} E^k Z^k} = P_{Y^{nk} Z^k} P_{E^k|Z^k}. Now, let T be a r.v. uniformly distributed over [k] and independent of all the other r.v.'s (U^k, E^k, Z^k, X^{nk}, Y^{nk}). Define an auxiliary r.v. W := (WT, T), where Wi := (Y^{nk}, E^{i−1}, Z^{i−1}, Z^k_{i+1}), i ∈ [k]. Then, the last term can be single-letterized as follows:

H(E^k | Y^{nk}, Z^k) = Σ_{i=1}^{k} H(Ei | E^{i−1}, Y^{nk}, Z^k)
= Σ_{i=1}^{k} H(Ei | Zi, Wi)
= k H(ET | ZT, WT, T)
= k H(E | Z, W).   (61)

Substituting (61) in (60), we obtain

lim sup_{k→∞} (−1/k) log β(k, nk, f^(k,nk), g^(k,nk)) ≤ I(E;W|Z).

Next, note that the data processing inequality applied to the Markov chain (Z^k, E^k) − U^k − X^{nk} − Y^{nk} yields I(U^k; Y^{nk}) ≤ I(X^{nk}; Y^{nk}), which implies that

I(U^k; Y^{nk}) − I(U^k; Z^k) ≤ I(X^{nk}; Y^{nk}).   (62)

The R.H.S. of (62) can be upper bounded due to the memoryless nature of the channel as

I(X^{nk}; Y^{nk}) ≤ nk max_{PX} I(X;Y) = nk C(PY|X),   (63)

while the left hand side (L.H.S.) can be simplified as follows:

I(U^k; Y^{nk}) − I(U^k; Z^k)
= I(U^k; Y^{nk} | Z^k)   (64)
= Σ_{i=1}^{k} I(Y^{nk}; Ui | U^{i−1}, Z^k)
= Σ_{i=1}^{k} I(Y^{nk}, U^{i−1}, Z^{i−1}, Z^k_{i+1}; Ui | Zi)   (65)
= Σ_{i=1}^{k} I(Y^{nk}, U^{i−1}, Z^{i−1}, Z^k_{i+1}, E^{i−1}; Ui | Zi)   (66)
≥ Σ_{i=1}^{k} I(Y^{nk}, Z^{i−1}, Z^k_{i+1}, E^{i−1}; Ui | Zi)
= Σ_{i=1}^{k} I(Wi; Ui | Zi) = k I(WT; UT | ZT, T)
= k I(WT, T; UT | ZT)   (67)
= k I(W; U | Z).

Here, (64) follows due to Z^k − U^k − Y^{nk}; (65) follows since the sequences (U^k, Z^k) are memoryless; (66) follows since E^{i−1} − (Y^{nk}, U^{i−1}, Z^{i−1}, Z^k_{i+1}) − Ui; and (67) follows from the fact that T is independent of all the other r.v.'s. Finally, note that (E, Z) − U − W holds and that the cardinality bound on W follows by standard arguments based on Caratheodory's theorem. This completes the proof of the converse, and hence of the proposition.

As the above result shows, TACI is an instance of distributed HT over a DMC in which the optimal error-exponent is equal to that achieved over a noiseless channel of the same capacity. Hence, a noisy channel does not always degrade the achievable error-exponent. Also, notice that a separation based coding scheme that performs independent HT and channel coding is sufficient to achieve the optimal error-exponent for TACI. The investigation of a single-letter characterization of the optimal error-exponent for TACI over a DMC is inspired by an analogous result for TACI over a noiseless channel. It would be interesting to explore whether the noisiness of the channel enables obtaining computable characterizations of the error-exponent for some other special cases of the problem.

V. CONCLUDING REMARKS

In this paper, we have studied the error-exponent achievable for the distributed HT problem over a DMC with side information available at the detector. We obtained single-letter lower bounds on the optimal error-exponent for general HT, and an exact single-letter characterization for TACI. It is interesting to note from our results that the reliability function of the channel does not play a role in the characterization of the optimal error-exponent for TACI; only the channel capacity matters. We also showed via an example that the lower bound on the error-exponent obtained using our joint hypothesis testing and channel coding scheme is strictly better than that obtained using our separation based scheme. Although this does not imply that "separation does not hold" for distributed HT over a DMC, it points to the possibility that joint HT and channel coding schemes outperform separation based schemes in general, and it is worthwhile investigating this aspect in greater detail. While a strong converse holds for distributed HT over a rate-limited noiseless channel [5], it remains an open question whether this property holds for noisy channels. As a first step, it is shown in [30] that this is indeed the case for HT over a DMC with no side-information. While we did not discuss the complexity of the schemes considered in this paper, it is an important factor that needs to be taken into account in any practical implementation of these schemes. In this regard, it is evident that the SHTCC and JHTCC schemes are in increasing order of complexity.

APPENDIX A
PROOF OF THEOREM 2

The proof outline is as follows. We first describe the encoding and decoding operations of the SHTCC scheme. The random coding method is used to analyze the type I and type II error probabilities achieved by this scheme, averaged over the ensemble of randomly generated codebooks. By the standard
expurgation technique [24] (e.g., removing "worst" codebooks in the ensemble with the highest type I error probability such that the total probability of the removed codebooks lies in the interval (0.5, 1)), this guarantees the existence of at least one deterministic codebook that achieves type I and type II error probabilities of the same order, i.e., within a constant multiplicative factor. Since, in our scheme below, the type I error probability averaged over the random code ensemble vanishes asymptotically with the number of samples k, the same holds for the codebook obtained after expurgation. Moreover, the error-exponent is not affected by a constant multiplicative factor on the type II error probability, and thus, this codebook asymptotically achieves the same type I error probability and error-exponent as the average.

For brevity, in the proof below, we denote the information theoretic quantities like IP(U;W), T^k_{[PUW]δ}, etc., that are computed with respect to the joint distribution PUVWSXY given in (68) by I(U;W), T^k_{[UW]δ}, etc.

Codebook Generation: Let k ∈ Z+ and n = τk. Fix a finite alphabet W, a (small) positive number δ > 0, and distributions PW|U and PSX. Let δ' := δ/2, δ̂ := |U|δ, δ̃ := 2δ, δ̄ := δ/|V|, δ̌ := |W|δ̃ and

PUVWSXY(PW|U, PSX) := PUV PW|U PSX PY|X.   (68)

Let μ = O(δ) (subject to constraints that will be specified below) and let R be such that

I(U;W|V) + 2μ ≤ R ≤ τ I(X;Y|S) − μ.   (69)

Denoting Mk := e^{k(I(U;W)+μ)}, the source codebook C used by the source encoder fs^(k) is obtained by generating Mk sequences w^k(j), j ∈ [Mk], independently at random according to the distribution Π_{i=1}^{k} PW(wi), where

PW(w) = Σ_{u∈U} PW|U(w|u) PU(u), ∀ w ∈ W.

The channel codebook C̃ used by fc^(k,n) is obtained as follows. The codeword length n is divided into |S| = |X| blocks, where the length of the first block is PS(s1)n, the length of the second block is PS(s2)n, and so on, and the length of the last block is chosen such that the total length is n. The codeword x^n(0) = s^n corresponding to M = 0 is obtained by repeating the letter si in block i. The remaining e^{kR} ordinary codewords x^n(m), m ∈ [e^{kR}], are obtained by blockwise i.i.d. random coding, i.e., the symbols in the ith block of each codeword are generated i.i.d. according to PX|S=si. The sequence s^n is revealed to the detector.

Encoding: If I(U;W) + μ > R, i.e., the number of codewords in the source codebook is larger than the number of codewords in the channel codebook, the encoder performs uniform random binning on the sequences w^k(i), i ∈ [Mk], in C, i.e., for each codeword in C, it selects an index uniformly at random from the set [e^{kR}]. Denote the bin index selected for w^k(i) by fB(i). If the observed sequence U^k = u^k is typical, i.e., u^k ∈ T^k_{[U]δ'}, the source encoder fs^(k) first looks for a sequence w^k(j) in C such that (u^k, w^k(j)) ∈ T^k_{[UW]δ}. If there exist multiple such codewords, it chooses an index j among them uniformly at random, and outputs the bin-index M = m = fB(j), m ∈ [e^{kR}], or M = m = j, depending on whether I(U;W) + μ > R or otherwise. If u^k ∉ T^k_{[U]δ'} or such an index j does not exist, fs^(k) outputs the error message M = 0. The channel encoder fc^(k,n) transmits the codeword x^n(m) from codebook C̃.

Decoding: At the decoder, gc^(k,n) outputs M̂ = 0 if, for some 1 ≤ i ≤ |S|, the channel outputs corresponding to the ith block do not belong to T^n_{[PY|S=si]δ}. Otherwise, M̂ is set as the index of the codeword corresponding to the maximum-likelihood candidate among the ordinary codewords. If M̂ = 0, H1 is declared. Else, given the side information sequence V^k = v^k and the estimated bin-index M̂ = m̂, gs^(k,n) searches for a typical sequence ŵ^k = w^k(ĵ) ∈ T^k_{[W]δ̂} in codebook C such that

ĵ = argmin_{l : fB(l) = m̂, w^k(l) ∈ T^k_{[W]δ̂}} He(w^k(l)|v^k), if I(U;W) + μ > R;
ĵ = m̂, otherwise.

The decoder declares Ĥ = 0 if (ŵ^k, v^k) ∈ T^k_{[WV]δ̃}. Else, Ĥ = 1 is declared.

We next analyze the type I and type II error probabilities achieved by the above scheme.

Analysis of Type I error: A type I error occurs only if one of the following events happens:

ETE = {(U^k, V^k) ∉ T^k_{[UV]δ̄}},
EEE = {∄ j ∈ [Mk] : (U^k, W^k(j)) ∈ T^k_{[UW]δ}},
EME = {(V^k, W^k(J)) ∉ T^k_{[VW]δ̃}},
EDE = {∃ l ∈ [Mk], l ≠ J : fB(l) = fB(J), W^k(l) ∈ T^k_{[W]δ̂}, He(W^k(l)|V^k) ≤ He(W^k(J)|V^k)},
ECD = {gc^(k,n)(Y^n) ≠ M}.

P(ETE|H = 0) tends to 0 asymptotically by the weak law of large numbers. Conditioned on ETE^c, U^k ∈ T^k_{[U]δ'}, and by the covering lemma [23, Lemma 9.1], it is well known that for μ = O(δ) chosen appropriately, P(EEE|ETE^c) tends to 0 doubly exponentially with k. Given that EEE^c ∩ ETE^c holds, it follows from the Markov chain relation V − U − W and the Markov lemma [31] that P(EME|ETE^c ∩ EEE^c) tends to zero as k → ∞. Next, we consider P(EDE). Given that EME^c ∩ EEE^c ∩ ETE^c holds, note that for k sufficiently large, He(W^k(J)|V^k) ≤ H(W|V) + O(δ). Thus, following the steps leading to (73) given below, we have (for sufficiently large k) that

P(EDE | V^k = v^k, W^k(J) = w^k, EME^c ∩ EEE^c ∩ ETE^c) ≤ e^{−k(R − I(U;W|V) − δ1^(k))},   (70)

where

δ1^(k) = μ + O(δ) + |V||W| (1/k) log(k + 1) + log(2)/k.

To obtain (71) shown at the bottom of the next page, we used the fact that

P(W^k(l) = w̃^k | EME^c ∩ EEE^c ∩ ETE^c, W^k(J) = w^k, V^k = v^k) ≤ 2 · P(W^k(l) = w̃^k).   (74)
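The codebook generation and uniform random binning steps described above can be sketched in code. This is an illustrative toy of ours, not the authors' construction: the values of k, R, I(U;W), μ and the binary alphabet for W are placeholder choices, and the sizes are far from the asymptotic regime the proof requires.

```python
import math
import random

random.seed(0)

def gen_codebook(Mk, k, PW):
    """Mk i.i.d. length-k codewords, each symbol drawn from the marginal PW."""
    symbols, probs = zip(*PW.items())
    return [random.choices(symbols, probs, k=k) for _ in range(Mk)]

def random_binning(Mk, num_bins):
    """Uniform random binning: an independent bin index fB(i) for each codeword."""
    return [random.randrange(num_bins) for _ in range(Mk)]

# toy parameters (placeholders): binning is active since I(U;W) + mu > R
k, R, I_UW, mu = 20, 0.5, 0.4, 0.05
Mk = math.ceil(math.exp(k * (I_UW + mu)))   # ~ e^{k(I(U;W)+mu)} source codewords
num_bins = math.ceil(math.exp(k * R))       # ~ e^{kR} bins / channel codewords
C = gen_codebook(Mk, k, {0: 0.5, 1: 0.5})
fB = random_binning(Mk, num_bins)
print(len(C), len(C[0]))                    # 8104 20
```

Nats are used for the exponentials, matching the e^{kR} convention of the proof.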
This follows similarly to (101), which is discussed in the type II error analysis section below. In order to obtain the expression in (72), we first summed over the types PW̃ of sequences within the typical set T^k_{[W]δ̂} that have empirical entropy less than He(w^k|v^k), and used the facts that the number of sequences within such a type is upper bounded by e^{k(H(W|V)+γ1(k))}, and that the total number of types is upper bounded by (k+1)^{|V||W|} [23]. Summing over all (w^k, v^k) ∈ T^k_{[VW]δ̃}, we obtain (for sufficiently large k) that

P(EDE | EME^c ∩ EEE^c ∩ ETE^c)
≤ Σ_{(w^k, v^k) ∈ T^k_{[WV]δ̃}} P(W^k(J) = w^k, V^k = v^k | EME^c ∩ EEE^c ∩ ETE^c) e^{−k(R − I(U;W|V) − δ1^(k))}
≤ e^{−k(R − I(U;W|V) − δ1^(k))} ≤ e^{−kμ/2},   (75)

where (75) follows from (69) by choosing μ = O(δ) appropriately.

Finally, we consider the event ECD. Denoting by ECT the event that the channel outputs corresponding to the ith block do not belong to T^n_{[PY|S=si]δ} for some 1 ≤ i ≤ |S|, it follows from the weak law of large numbers and the union bound that

P(ECT | EEE^c) → 0 as k → ∞.   (76)

Also, it follows from [23, Exercises 10.18, 10.24] that for sufficiently large n (depending on μ, τ, |X| and |Y|),

P(ECD | EEE^c ∩ ECT^c) ≤ e^{−n Ex(R/τ + μ/(2τ), PSX)}.   (77)

This implies that the probability that an error occurs at the channel decoder gc^(k,n) tends to 0 as n → ∞, since Ex(R/τ + μ/(2τ), PSX) > 0 for R ≤ τ I(X;Y|S) − μ. Thus, since I(U;W|V) + μ ≤ R ≤ τ I(X;Y|S) − μ, the probability of the events causing a type I error tends to zero asymptotically.

Analysis of Type II error: First, note that a type II error occurs only if V^k ∈ T^k_{[V]δ̌}, and hence, we can restrict the type II error analysis to only such V^k. Denote the event that a type II error happens by D0. Let

E0 = {U^k ∉ T^k_{[U]δ'}}.   (78)

Then, the type II error probability can be written as

β(k, n, f^(k,n), g^(k,n)) = Σ_{(u^k, v^k) ∈ U^k × V^k} P(U^k = u^k, V^k = v^k | H = 1) P(D0 | U^k = u^k, V^k = v^k).   (79)

Let ENE := EEE^c ∩ E0^c. The last term in (79) can be upper bounded as follows:

P(D0 | U^k = u^k, V^k = v^k)
= P(ENE | U^k = u^k, V^k = v^k) P(D0 | U^k = u^k, V^k = v^k, ENE) + P(ENE^c | U^k = u^k, V^k = v^k) P(D0 | U^k = u^k, V^k = v^k, ENE^c)
≤ P(D0 | U^k = u^k, V^k = v^k, ENE) + P(D0 | U^k = u^k, V^k = v^k, ENE^c).

Thus, we can write (80), shown at the bottom of the next page. Now, first we assume that ENE holds. Then,

P(D0 | U^k = u^k, V^k = v^k, ENE)
= Σ_{j=1}^{Mk} Σ_{m=1}^{e^{kR}} P(J = j, fB(J) = m | U^k = u^k, V^k = v^k, ENE) P(D0 | U^k = u^k, V^k = v^k, J = j, fB(J) = m, ENE).   (81)

By the symmetry of the codebook generation, encoding and decoding procedures, the term P(D0 | U^k = u^k, V^k = v^k, J = j, fB(J) = m, ENE) in (81) is independent of the value of J and fB(J). Hence, w.l.o.g. assuming J = 1 and fB(J) = 1, we can write (82), shown at the bottom of the next page.

P(EDE | V^k = v^k, W^k(J) = w^k, EME^c ∩ EEE^c ∩ ETE^c)
≤ Σ_{l=1, l≠J}^{Mk} Σ_{w̃^k ∈ T^k_{[W]δ̂} : He(w̃^k|v^k) ≤ He(w^k|v^k)} P(fB(l) = fB(J), W^k(l) = w̃^k | V^k = v^k, W^k(J) = w^k, EME^c ∩ EEE^c ∩ ETE^c)
= Σ_{l=1, l≠J}^{Mk} Σ_{w̃^k ∈ T^k_{[W]δ̂} : He(w̃^k|v^k) ≤ He(w^k|v^k)} (1/e^{kR}) P(W^k(l) = w̃^k | V^k = v^k, W^k(J) = w^k, EME^c ∩ EEE^c ∩ ETE^c)
≤ Σ_{l=1, l≠J}^{Mk} Σ_{w̃^k ∈ T^k_{[W]δ̂} : He(w̃^k|v^k) ≤ He(w^k|v^k)} 2 · e^{−kR} e^{−k(H(W) − O(δ))}   (71)
≤ Σ_{l=1, l≠J}^{Mk} (k+1)^{|V||W|} e^{k(H(W|V)+O(δ))} · 2 · e^{−kR} e^{−k(H(W) − O(δ))}   (72)
≤ e^{−k(R − I(U;W|V) − δ1^(k))},   (73)
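The empirical conditional entropy He(w^k|v^k) that drives the decoding rule and the event EDE above is computed from joint type counts. A minimal sketch (ours; the toy sequences are placeholders and typicality checks are omitted for brevity):

```python
import math
from collections import Counter

def He(wk, vk):
    """Empirical conditional entropy He(w^k | v^k) in bits, from joint type counts."""
    k = len(wk)
    joint = Counter(zip(vk, wk))      # counts of (v, w) pairs
    marg = Counter(vk)                # counts of v symbols
    return -sum(n / k * math.log2(n / marg[v]) for (v, _), n in joint.items())

def decode_bin(codebook, fB, m_hat, vk):
    """Minimum-empirical-conditional-entropy selection inside bin m_hat."""
    cands = [j for j in range(len(codebook)) if fB[j] == m_hat]
    return min(cands, key=lambda j: He(codebook[j], vk), default=None)

vk = [0, 0, 1, 1, 0, 1]
codebook = [[0, 0, 1, 1, 0, 1],       # perfectly correlated with vk, so He = 0
            [1, 0, 0, 1, 1, 0]]
fB = [0, 0]                           # both codewords land in bin 0
print(decode_bin(codebook, fB, 0, vk))  # 0
```

The decoder favors the codeword most strongly correlated with the side information, which is exactly why a competing codeword in the same bin with smaller empirical entropy causes the event EDE.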
Given that ENE holds, D0 may occur in three possible ways: (i) when M̂ ≠ 0, i.e., ECT^c occurs, the channel decoder makes an error and the codeword retrieved from the bin is jointly typical with V^k; (ii) when an unintended wrong codeword that is jointly typical with V^k is retrieved from the correct bin; and (iii) when there is no error at the channel decoder and the correct codeword, which is also jointly typical with V^k, is retrieved from the bin. We refer to the event in case (i) as the channel error event ECE, and the one in case (ii) as the binning error event EBE. More specifically,

ECE = {ECT^c and M̂ = gc^(k,n)(Y^n) ≠ M},   (83)

and

EBE = {∃ l ∈ [Mk], l ≠ J : fB(l) = M̂, W^k(l) ∈ T^k_{[W]δ̂}, (V^k, W^k(l)) ∈ T^k_{[VW]δ̃}}.   (84)

Define the events F, F1, F2, F21 and F22 as in (85)-(89), shown in the next page. The last term in (82) can be expressed as follows:

P(D0|F) = P(ECE|F) P(D0|F1) + P(ECE^c|F) P(D0|F2),

where

P(D0|F2) = P(EBE|F2) P(D0|F21) + P(EBE^c|F2) P(D0|F22).   (90)

It follows from (77) that for sufficiently large k,

P(ECE|F) ≤ e^{−n Ex(R/τ + μ/(2τ), PSX)} = e^{−kτ Ex(R/τ + μ/(2τ), PSX)}.   (91)

Next, consider the type II error event that happens when an error occurs at the channel decoder. We need to consider two separate cases: I(U;W) + μ > R and I(U;W) + μ ≤ R. Note that in the former case, binning is performed, and a type II error happens at the decoder only if a sequence W^k(l) exists in the wrong bin M̂ ≠ M = fB(J) such that (V^k, W^k(l)) ∈ T^k_{[VW]δ̃}. As noted in [28], the calculation of the probability of this event does not follow from the standard random coding argument usually encountered in achievability proofs, due to the fact that the chosen codeword W^k(J) depends on the entire codebook. Following steps similar to those in [28], we analyze the probability of this event (averaged over codebooks C and random binning) as follows. We first consider the case when I(U;W) + μ > R. Then, through the steps leading to (94) shown in the next page, we obtain

P(D0|F1) ≤ Σ_{l=2}^{Mk} Σ_{w̃^k : (w̃^k, v^k) ∈ T^k_{[WV]δ̃}} P(W^k(l) = w̃^k|F1) (1/e^{kR}).   (92)

Let C1,l^− = C \ {W^k(1), W^k(l)}. Then,

P(W^k(l) = w̃^k|F1) = Σ_{c} P(C1,l^− = c|F1) P(W^k(l) = w̃^k | F1, C1,l^− = c).   (95)

The term in (95) can be upper bounded as shown in (97) at the bottom of the next page. Since the codewords are generated independently of each other and the binning operation is independent of the codebook generation, we have

P(W^k(1) = w^k | W^k(l) = w̃^k, U^k = u^k, V^k = v^k, C1,l^− = c) = P(W^k(1) = w^k | U^k = u^k, V^k = v^k, C1,l^− = c),

and

P(fB(J) = 1 | J = 1, W^k(1) = w^k, W^k(l) = w̃^k, U^k = u^k, V^k = v^k, C1,l^− = c)
= P(fB(J) = 1 | J = 1, W^k(1) = w^k, U^k = u^k, V^k = v^k, C1,l^− = c).

Also, note that

P(ENE, ECE | fB(J) = 1, J = 1, W^k(1) = w^k, W^k(l) = w̃^k, U^k = u^k, V^k = v^k, C1,l^− = c)
= P(ENE, ECE | fB(J) = 1, J = 1, W^k(1) = w^k, U^k = u^k, V^k = v^k, C1,l^− = c).

Next, consider the term in (96) shown at the bottom of the next page. Let N(u^k, C1,l^−) = |{w^k(l') ∈ C1,l^− : l' ≠ 1, l' ≠ l, (w^k(l'), u^k) ∈ T^k_{[WU]δ}}|. Recall that if there are multiple sequences in codebook C that are jointly typical with the

β(k, n, f^(k,n), g^(k,n)) ≤ Σ_{(u^k, v^k) ∈ U^k × V^k} P(U^k = u^k, V^k = v^k | H = 1) [P(D0 | U^k = u^k, V^k = v^k, ENE) + P(D0 | U^k = u^k, V^k = v^k, ENE^c)].   (80)

P(D0 | U^k = u^k, V^k = v^k, ENE)
= Σ_{j=1}^{Mk} Σ_{m=1}^{e^{kR}} P(J = j, fB(J) = m | U^k = u^k, V^k = v^k, ENE) P(D0 | U^k = u^k, V^k = v^k, J = 1, fB(J) = 1, ENE)
= P(D0 | U^k = u^k, V^k = v^k, J = 1, fB(J) = 1, ENE)
= Σ_{w^k ∈ W^k} P(W^k(1) = w^k | U^k = u^k, V^k = v^k, J = 1, fB(J) = 1, ENE) P(D0 | U^k = u^k, V^k = v^k, J = 1, fB(J) = 1, W^k(1) = w^k, ENE).   (82)
observed sequence U k , then the encoder selects one of them Substituting (101) in (92), we obtain
uniformly at random. Also, note that given F1 , (wk , uk ) ∈
k P(D0 |F1 )
T[W U]δ . Thus, the term in (96) can be bounded as shown Mk
in (99) or (100) at the bottom of the next page, depending   1
≤ 2 P(W k (l) = w̃k )
on whether (w̃k , uk ) ∈ T[W k k k
U ]δ or (w̃ , u ) ∈
k
/ T[W U ]δ , l=1 k
ekR
respectively. This implies that the term in (95) can be upper (w̃ k ,v kw̃
)∈T: k
[W V ]
δ̃

Mk
bounded as   1
= 2 · e−k(H(W )−O(δ̂))
ekR
P(W k (l) = w̃
k
|F1 ) l=1
(w̃ k ,v kw̃
)∈T
k
: k
[W V ]
δ̃

≤ P(C1,l = c|F1 ) 2 P(W k (l) = w̃k |U k = uk , 1
=2 Mk e k(H(W |V )+δ)
e−k(H(W )−O(δ̂))

C1,l =c (k)
ekR
V k k
= v , C1,l−
=
c) ≤ e−k(R−I(U;W |V )−δ2 )
, (102)
= 2 P(W (l) = w̃k |U k = uk , V k = v k )
k (k)
where δ2 := O(δ) + log(2)
k . For the case I(U ; W ) + μ ≤ R
= 2 P(W k (l) = w̃k ). (101) (when binning is not done), the terms can be bounded
$$
F = \{U^k=u^k, V^k=v^k, J=1, f_B(J)=1, W^k(1)=w^k, E_{NE}\}, \quad (85)
$$
$$
F_1 = \{U^k=u^k, V^k=v^k, J=1, f_B(J)=1, W^k(1)=w^k, E_{NE}, E_{CE}\}, \quad (86)
$$
$$
F_2 = \{U^k=u^k, V^k=v^k, J=1, f_B(J)=1, W^k(1)=w^k, E_{NE}, E_{CE}^c\}, \quad (87)
$$
$$
F_{21} = \{U^k=u^k, V^k=v^k, J=1, f_B(J)=1, W^k(1)=w^k, E_{NE}, E_{CE}^c, E_{BE}\}, \quad (88)
$$
$$
F_{22} = \{U^k=u^k, V^k=v^k, J=1, f_B(J)=1, W^k(1)=w^k, E_{NE}, E_{CE}^c, E_{BE}^c\}. \quad (89)
$$
$$
P(D_0|F_1) \le P\big(\exists\, W^k(l):\ f_B(l)=\hat M=1,\ (W^k(l), v^k)\in T^k_{[WV]_{\tilde\delta}}\,\big|\,F_1\big)
$$
$$
\le \sum_{l=2}^{M_k}\sum_{\hat m=1}^{e^{kR}} P(\hat M=\hat m\,|\,F_1)\; P\big((W^k(l), v^k)\in T^k_{[WV]_{\tilde\delta}},\ f_B(l)=\hat m\,\big|\,F_1\big)
$$
$$
= \sum_{l=2}^{M_k}\sum_{\hat m=1}^{e^{kR}}\ \sum_{\tilde w^k:\,(\tilde w^k, v^k)\in T^k_{[WV]_{\tilde\delta}}} P(\hat M=\hat m\,|\,F_1)\; P\big(W^k(l)=\tilde w^k,\ f_B(l)=\hat m\,\big|\,F_1\big)
$$
$$
= \sum_{l=2}^{M_k}\sum_{\hat m=1}^{e^{kR}}\ \sum_{\tilde w^k:\,(\tilde w^k, v^k)\in T^k_{[WV]_{\tilde\delta}}} P(\hat M=\hat m\,|\,F_1)\; P(W^k(l)=\tilde w^k\,|\,F_1)\, \frac{1}{e^{kR}} \quad (93)
$$
$$
= \sum_{l=2}^{M_k}\ \sum_{\tilde w^k:\,(\tilde w^k, v^k)\in T^k_{[WV]_{\tilde\delta}}} P(W^k(l)=\tilde w^k\,|\,F_1)\, \frac{1}{e^{kR}}. \quad (94)
$$
$$
P(W^k(l)=\tilde w^k\,|\,F_1, C_{1,l}=c)
= P(W^k(l)=\tilde w^k\,|\,U^k=u^k, V^k=v^k, C_{1,l}=c)\;
\frac{P(W^k(1)=w^k\,|\,W^k(l)=\tilde w^k, U^k=u^k, V^k=v^k, C_{1,l}=c)}{P(W^k(1)=w^k\,|\,U^k=u^k, V^k=v^k, C_{1,l}=c)}
$$
$$
\cdot\;\frac{P(J=1\,|\,W^k(1)=w^k, W^k(l)=\tilde w^k, U^k=u^k, V^k=v^k, C_{1,l}=c)}{P(J=1\,|\,W^k(1)=w^k, U^k=u^k, V^k=v^k, C_{1,l}=c)} \quad (96)
$$
$$
\cdot\;\frac{P(f_B(J)=1\,|\,J=1, W^k(1)=w^k, W^k(l)=\tilde w^k, U^k=u^k, V^k=v^k, C_{1,l}=c)}{P(f_B(J)=1\,|\,J=1, W^k(1)=w^k, U^k=u^k, V^k=v^k, C_{1,l}=c)}
$$
$$
\cdot\;\frac{P(E_{NE}, E_{CE}\,|\,f_B(J)=1, J=1, W^k(1)=w^k, W^k(l)=\tilde w^k, U^k=u^k, V^k=v^k, C_{1,l}=c)}{P(E_{NE}, E_{CE}\,|\,f_B(J)=1, J=1, W^k(1)=w^k, U^k=u^k, V^k=v^k, C_{1,l}=c)}. \quad (97)
$$
similarly using (101) as follows:

$$
P(D_0|F_1) = \sum_{\hat m} P(\hat M=\hat m\,|\,F_1)\; P\big((W^k(\hat m), v^k)\in T^k_{[WV]_{\tilde\delta}}\,\big|\,F_1\big)
\le \sum_{\hat m} P(\hat M=\hat m\,|\,F_1)\; 2 \sum_{\tilde w^k:\,(\tilde w^k, v^k)\in T^k_{[WV]_{\tilde\delta}}} P(W^k(\hat m)=\tilde w^k)
\le e^{-k\big(I(V;W)-\delta_2^{(k)}\big)}. \quad (103)
$$

Next, consider the event in which there are no encoding or channel errors, i.e., $E_{NE}\cap E_{CE}^c$. For the case $I(U;W)+\mu > R$, the binning error event, denoted by $E_{BE}$, happens when a wrong codeword $W^k(l)$, $l \ne J$, is retrieved from the bin with index $M$ by the empirical entropy decoder such that $(W^k(l), V^k)\in T^k_{[WV]_\delta}$. Let $P_{\tilde U\tilde V\tilde W}$ denote the joint type of $(U^k, V^k, W^k(J))$. Note that $P_{\tilde U\tilde W}\in T^k_{[UW]_\delta}$ when $E_{NE}$ holds. If $H(\tilde W|\tilde V) < H(W|V)$, then in the bin with index $M$, there exists a codeword with empirical entropy strictly less than $H(W|V)$. Hence, the decoded codeword $\hat W^k$ is such that $(\hat W^k, V^k)\notin T^k_{[WV]_{\tilde\delta}}$ (asymptotically), since $(\hat W^k, V^k)\in T^k_{[WV]_{\tilde\delta}}$ necessarily implies that $H_e(\hat W^k|V^k)\ge H(W|V)-O(\delta)$ (for $\delta$ small enough). Consequently, a type II error can happen under the event $E_{BE}$ only when $H(\tilde W|\tilde V)\ge H(W|V)-O(\delta)$. The probability of the event $E_{BE}$ can be upper bounded under this condition as shown in (105) on the next page. In (104) given below, we used the fact that

$$
P(W^k(l)=\tilde w^k\,|\,F_2) \le 2\, P(W^k(l)=\tilde w^k), \quad (106)
$$

which follows in a similar way as (101). Also, note that, by definition, $P(D_0|F_{21}) = 1$.

We proceed to analyze the R.H.S. of (80), which upper bounds the type II error probability. Towards this end, we first focus on the case when $E_{NE}$ holds. From (82), (107) follows. Rewriting the summation in (107) as a sum over types and over sequences within a type, we obtain (108) below. Also, note that (109) holds, where $P_{\tilde U\tilde V\tilde W}$ denotes the joint type of the sequence $(u^k, v^k, w^k)$. With (91), (102), (103), (105) and (109), we have the necessary machinery to analyze (108).

First, consider the case in which the event $E_{NE}\cap E_{CE}^c\cap E_{BE}^c$ holds. In this case, we have

$$
P(D_0|F_{22}) = P(D_0\,|\,U^k=u^k, V^k=v^k, J=1, f_B(J)=1, W^k(1)=w^k, E_{NE}, E_{CE}^c, E_{BE}^c)
= \begin{cases} 1, & \text{if } P_{u^k w^k}\in T^k_{[UW]_\delta} \text{ and } P_{v^k w^k}\in T^k_{[VW]_{\tilde\delta}},\\[2pt] 0, & \text{otherwise.}\end{cases} \quad (110)
$$

Thus, the corresponding terms in (108) can be simplified (for sufficiently large $k$) as shown in (111) on the next page. To obtain (111), we used (109) and (110). Note that for $\delta$ small enough, (113) holds.

Next, consider the terms corresponding to the event $E_{NE}\cap E_{CE}^c\cap E_{BE}$ in (108). Note that given that the event $F_{21} = \{U^k=u^k, V^k=v^k, J=1, f_B(J)=1, W^k(1)=w^k, E_{NE}, E_{CE}^c, E_{BE}\}$ occurs, $P_{u^k w^k}\in T^k_{[UW]_\delta}$. Also, $D_0$ can happen only if $H_e(w^k|v^k)\ge H(W|V)-O(\tilde\delta)$ and $P_{v^k}\in T^k_{[V]_{\check\delta}}$. Using these facts to simplify the terms corresponding to the event $E_{NE}\cap E_{CE}^c\cap E_{BE}$ in (108), we obtain (114) given below. Also, note that $E_{BE}$ occurs only when $I(U;W)+\mu > R$.

Next, consider that the event $E_{NE}\cap E_{CE}$ holds. As in the case above, note that given $F_1 = \{U^k=u^k, V^k=v^k, J=1, f_B(J)=1, W^k(1)=w^k, E_{NE}, E_{CE}\}$, we have $P_{u^k w^k}\in T^k_{[UW]_\delta}$, and $D_0$ occurs only if $P_{v^k}\in T^k_{[V]_{\check\delta}}$. Using these facts and equations (102), (103) and (91), it can be shown that the terms corresponding to this event in (108) result in the factor $E_3(P_{W|U}, P_{SX}, R, \tau) - O(\delta)$ in the error-exponent.

Finally, we analyze the case when the event $E_{NE}^c$ occurs. Since the encoder declares $H_1$ if $\hat M = 0$, it is clear that $D_0$ occurs only when the channel error event $E_{CE}$ happens. Thus, we have

$$
P(D_0\,|\,U^k=u^k, V^k=v^k, E_{NE}^c) = P(E_{CE}\,|\,U^k=u^k, V^k=v^k, E_{NE}^c)\; P(D_0\,|\,U^k=u^k, V^k=v^k, E_{NE}^c\cap E_{CE}). \quad (117)
$$
$$
\frac{P(J=1\,|\,W^k(1)=w^k, W^k(l)=\tilde w^k, U^k=u^k, V^k=v^k, E_{NE}, E_{CE}, C_{1,l}=c)}{P(J=1\,|\,W^k(1)=w^k, U^k=u^k, V^k=v^k, C_{1,l}=c)}
= \frac{1}{N(u^k, C_{1,l})+2}\left[\frac{1}{P(J=1\,|\,W^k(1)=w^k, U^k=u^k, V^k=v^k, C_{1,l}=c)}\right] \quad (98)
$$
$$
\le \frac{N(u^k, C_{1,l})+2}{N(u^k, C_{1,l})+2} = 1. \quad (99)
$$

$$
\frac{P(J=1\,|\,W^k(1)=w^k, W^k(l)=\tilde w^k, U^k=u^k, V^k=v^k, C_{1,l}=c)}{P(J=1\,|\,W^k(1)=w^k, U^k=u^k, V^k=v^k, C_{1,l}=c)}
= \frac{1}{N(u^k, C_{1,l})+1}\left[\frac{1}{P(J=1\,|\,W^k(1)=w^k, U^k=u^k, V^k=v^k, C_{1,l}=c)}\right]
\le \frac{N(u^k, C_{1,l})+2}{N(u^k, C_{1,l})+1} \le 2. \quad (100)
$$

It follows from Borade et al.’s coding scheme [25] that When binning is performed at the encoder, D0 occurs only
asymptotically, if there exists a sequence Ŵ k in the bin M̂ = 0 such
that (Ŵ k , V k ) ∈ T[W
k
V ]δ̃ . Also, recalling that the encoder
P(ECE | U k = uk , V k = v k , EN
c
E)
sends the error message M = 0 independent of the source
≤ e−n(Em (PSX )−O(δ)) = e−kτ (Em (PSX )−O(δ)) . (118) codebook C, it can be shown using standard arguments that
$$
P(E_{BE}\,|\,F_2) \le P\big(\exists\, l\ne 1,\ l\in[M_k]:\ f_B(l)=1 \text{ and } (W^k(l), v^k)\in T^k_{[WV]_{\tilde\delta}}\,\big|\,F_2\big)
$$
$$
\le \sum_{l=2}^{M_k} P\big((W^k(l), v^k)\in T^k_{[WV]_{\tilde\delta}}\,\big|\,F_2\big)\; P\big(f_B(l)=1\,\big|\,F_2,\ (W^k(l), v^k)\in T^k_{[WV]_{\tilde\delta}}\big)
$$
$$
= \sum_{l=2}^{M_k} P\big((W^k(l), v^k)\in T^k_{[WV]_{\tilde\delta}}\,\big|\,F_2\big)\; e^{-kR}
\le \sum_{l=2}^{M_k} 2 \sum_{\tilde w^k:\,(\tilde w^k, v^k)\in T^k_{[WV]_{\tilde\delta}}} P(W^k(l)=\tilde w^k)\; e^{-kR} \quad (104)
$$
$$
= e^{-k\big(R-I(U;W|V)-\delta_2^{(k)}\big)}. \quad (105)
$$
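The bound in (104) rests on a generic property of uniform random binning: each codeword's bin index $f_B(l)$ is drawn uniformly from $e^{kR}$ bins, so any fixed wrong codeword lands in the bin of the transmitted codeword with probability exactly $e^{-kR}$. This is easy to check numerically; the sketch below is an illustration only (the parameters are arbitrary and not from the paper), hashing codeword indices uniformly into bins and comparing the empirical collision rate with the predicted value.

```python
import random

def collision_rate(num_codewords, num_bins, trials, seed=0):
    """Empirical probability that a fixed wrong codeword lands in the
    same bin as the transmitted codeword under uniform binning."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Uniform, independent bin assignment f_B for every codeword.
        bins = [rng.randrange(num_bins) for _ in range(num_codewords)]
        sent = 0
        other = rng.randrange(1, num_codewords)
        hits += (bins[other] == bins[sent])
    return hits / trials

rate = collision_rate(num_codewords=64, num_bins=16, trials=20000)
# Predicted collision probability for a single wrong codeword: 1/num_bins.
print(rate)
```

Summing this collision probability over the roughly $e^{kI(U;W|V)}$ wrong codewords that are jointly typical with $v^k$ gives the exponent $R - I(U;W|V)$ appearing in (105).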
$$
\sum_{(u^k,v^k)\in\,\mathcal U^k\times\mathcal V^k} P(U^k=u^k, V^k=v^k|H=1)\; P(D_0\,|\,U^k=u^k, V^k=v^k, E_{NE})
$$
$$
= \sum_{(u^k,v^k)\in\,\mathcal U^k\times\mathcal V^k} P(U^k=u^k, V^k=v^k|H=1)\; P(D_0\,|\,U^k=u^k, V^k=v^k, J=1, f_B(J)=1, E_{NE}). \quad (107)
$$
$$
P(D_0\,|\,E_{NE}, H=1) = \sum_{P_{\tilde U\tilde V\tilde W}\in\, T^k_{UVW}}\ \sum_{(u^k,v^k,w^k)\in\, T_{P_{\tilde U\tilde V\tilde W}}} P(U^k=u^k, V^k=v^k|H=1)\,\Big[P(D_0|F)\; P(W^k(1)=w^k\,|\,U^k=u^k, V^k=v^k, J=1, f_B(J)=1, E_{NE})\Big]. \quad (108)
$$

P(U k = uk , V k = v k |H = 1) P(W k (1) = wk |U k = uk , V k = v k , J = 1, fB (J) = 1, EN E )


& k '

= QUV (ui , vi ) P(W k (1) = wk |U k = uk , V k = v k , J = 1, fB (J) = 1, EN E )
i=1
& k
'
 1 1
≤ QUV (ui , vi ) ≤ e−k(H(Ũ Ṽ )+D(PŨ Ṽ ||QU V )+H(W̃ |Ũ)− k |U ||W| log(k+1)) , (109)
i=1
|TPW̃ |Ũ |
$$
\sum_{P_{\tilde U\tilde V\tilde W}\in\, T^k_{UVW}}\ \sum_{(u^k,v^k,w^k)\in\, T_{P_{\tilde U\tilde V\tilde W}}} P(U^k=u^k, V^k=v^k|H=1)\,\Big[P(E_{CE}^c|F)\, P(E_{BE}^c|F_2)\, P(D_0|F_{22})\, P(W^k(1)=w^k\,|\,U^k=u^k, V^k=v^k, J=1, f_B(J)=1, E_{NE})\Big]
$$
$$
\le \sum_{P_{\tilde U\tilde V\tilde W}\in\, T^k_{UVW}}\ \sum_{(u^k,v^k,w^k)\in\, T_{P_{\tilde U\tilde V\tilde W}}} P(U^k=u^k, V^k=v^k|H=1)\, P(D_0|F_{22})\, P(W^k(1)=w^k\,|\,U^k=u^k, V^k=v^k, J=1, f_B(J)=1, E_{NE})
$$
$$
\le (k+1)^{|\mathcal U||\mathcal V||\mathcal W|} \max_{P_{\tilde U\tilde V\tilde W}\in\, \hat T_1^{(k)}(P_{UW}, P_{VW})} e^{kH(\tilde U\tilde V\tilde W)}\, e^{-k\left(H(\tilde U\tilde V)+D(P_{\tilde U\tilde V}\|Q_{UV})+H(\tilde W|\tilde U)-\frac{1}{k}|\mathcal U||\mathcal W|\log(k+1)\right)} = e^{-k\tilde E_{1k}}, \quad (111)
$$

where

$$
\hat T_1^{(k)}(P_{UW}, P_{VW}) := \big\{P_{\tilde U\tilde V\tilde W}:\ P_{\tilde U\tilde W}\in T^k_{[UW]_\delta} \text{ and } P_{\tilde V\tilde W}\in T^k_{[VW]_{\tilde\delta}}\big\},
$$

and

$$
\tilde E_{1k} := \min_{P_{\tilde U\tilde V\tilde W}\in\, \hat T_1^{(k)}(P_{UW}, P_{VW})} H(\tilde U\tilde V)+D(P_{\tilde U\tilde V}\|Q_{UV})+H(\tilde W|\tilde U)-H(\tilde U\tilde V\tilde W) - \frac{1}{k}|\mathcal U|(|\mathcal V|+1)|\mathcal W|\log(k+1). \quad (112)
$$
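The single-sequence estimate used in (109) and (111) is the standard method-of-types identity $Q^k(x^k) = e^{-k\left(H(P_{x^k}) + D(P_{x^k}\|Q)\right)}$, where $P_{x^k}$ is the empirical type of $x^k$. The identity is exact, not merely an exponent estimate, and can be verified numerically; the minimal sketch below (illustrative alphabet and sequence, not from the paper) does so in nats.

```python
import math
from collections import Counter

def iid_logprob(seq, Q):
    """log Q^k(x^k) for an i.i.d. measure Q (nats)."""
    return sum(math.log(Q[x]) for x in seq)

def type_exponent(seq, Q):
    """-k * (H(P̂) + D(P̂||Q)) for the empirical type P̂ of seq."""
    k = len(seq)
    counts = Counter(seq)
    total = 0.0
    for x, c in counts.items():
        p = c / k
        # H(P̂) contributes -p log p; D(P̂||Q) contributes p log(p/Q(x)).
        total += p * (-math.log(p)) + p * math.log(p / Q[x])
    return -k * total

Q = {0: 0.3, 1: 0.7}
seq = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
print(abs(iid_logprob(seq, Q) - type_exponent(seq, Q)))  # ~0 up to float error
```

The two quantities agree term by term because $-p\log p + p\log(p/Q(x)) = -p\log Q(x)$, so the sum telescopes to $-\frac{1}{k}\log Q^k(x^k)$.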
for such $v^k\in T^k_{[V]_{\check\delta}}$,

$$
P(D_0\,|\,U^k=u^k, V^k=v^k, E_{NE}^c\cap E_{CE}) \le e^{-k(R-I(U;W|V)-O(\delta))}. \quad (119)
$$

Thus, from (117), (118) and (119), we obtain (asymptotically) that

$$
\sum_{u^k, v^k} P(U^k=u^k, V^k=v^k|H=1)\; P(D_0\,|\,U^k=u^k, V^k=v^k, E_{NE}^c\cap E_{CE}) \le e^{-k\left(R-I(U;W|V)+D(P_V\|Q_V)+\tau E_m(P_{SX})-O(\delta)\right)}. \quad (120)
$$

On the other hand, when binning is not performed, $D_0$ occurs only if $(W^k(\hat M), V^k)\in T^k_{[WV]_{\tilde\delta}}$, and in this case, we obtain (asymptotically) that

$$
\sum_{u^k, v^k} P(U^k=u^k, V^k=v^k|H=1)\; P(D_0\,|\,U^k=u^k, V^k=v^k, E_{NE}^c\cap E_{CE}) \le e^{-k\left(I(V;W)+D(P_V\|Q_V)+\tau E_m(P_{SX})-O(\delta)\right)}. \quad (121)
$$

This results in the factor $E_4(P_{W|U}, P_{SX}, R, \tau) - O(\delta)$ in the error-exponent. Since the error-exponent is lower bounded by the minimal value of the exponent due to the various type II error events, the proof of the theorem is complete by noting that $\delta > 0$ is arbitrary.

APPENDIX B
PROOF OF THEOREM 5

We only give a sketch of the proof, as the intermediate steps follow similarly to those in the proof of Theorem 2. We will use the random coding method combined with the expurgation technique explained in the proof of Theorem 2 to guarantee the existence of at least one deterministic codebook that achieves the type I error probability and error-exponent claimed in Theorem 5. For brevity, we will denote information-theoretic quantities like $I_{\hat P}(U,S;\bar W)$, $T^n_{[\hat P_{US\bar W}]_\delta}$, etc., that are computed with respect to the joint distribution $\hat P_{UVS\bar W X'XY}$ given below in (122), simply by $I(U,S;\bar W)$, $T^n_{[US\bar W]_\delta}$, etc.

Fix distributions $(P_S, P_{\bar W|US}, P_{X'|US}, P_{X|US\bar W})\in\mathcal B_h$ and a positive number $\delta > 0$. Let $\mu = O(\delta)$ subject to constraints that will be specified below. Let $\hat\delta := |\bar{\mathcal W}|\delta$, $\delta' := \frac{\delta}{2}$, $\bar\delta := \frac{\delta}{|\mathcal V|}$, $\tilde\delta := 2\delta$, and

$$
\hat P_{UVS\bar W X'XY}(P_S, P_{\bar W|US}, P_{X'|S}, P_{X|US\bar W}) := P_{UV}\, P_S\, P_{\bar W|US}\, P_{X'|US}\, P_{X|US\bar W}\, P_{Y|X}. \quad (122)
$$

Generate a sequence $S^n$ i.i.d. according to $\prod_{i=1}^n P_S(s_i)$. The realization $S^n = s^n$ is revealed to both the encoder and the detector. Generate the quantization codebook $C = \{\bar w^n(j),\ j\in[e^{n(I(U,S;\bar W)+\mu)}]\}$, where each codeword $\bar w^n(j)$ is generated
$$
\tilde E_{1k} \ge \min_{P_{\tilde U\tilde V\tilde W}\in\, T_1(P_{UW}, P_{VW})} \sum P_{\tilde U\tilde V\tilde W} \log\left[\frac{P_{\tilde U\tilde V}}{Q_{UV}}\cdot\frac{P_{\tilde U\tilde V\tilde W}}{P_{\tilde U\tilde V}}\cdot\frac{P_{\tilde U}}{P_{\tilde U\tilde W}}\right] - O(\delta)
= \min_{P_{\tilde U\tilde V\tilde W}\in\, T_1(P_{UW}, P_{VW})} D(P_{\tilde U\tilde V\tilde W}\|Q_{UVW}) - O(\delta) = E_1(P_{W|U}) - O(\delta), \quad (113)
$$

$$
\sum_{P_{\tilde U\tilde V\tilde W}\in\, T^k_{UVW}}\ \sum_{(u^k,v^k,w^k)\in\, T_{P_{\tilde U\tilde V\tilde W}}} P(U^k=u^k, V^k=v^k|H=1)\,\Big[P(E_{CE}^c|F)\, P(E_{BE}|F_2)\, P(D_0|F_{21})\, P(W^k(1)=w^k\,|\,U^k=u^k, V^k=v^k, J=1, f_B(J)=1, E_{NE})\Big]
$$
$$
\le \sum_{P_{\tilde U\tilde V\tilde W}\in\, T^k_{UVW}}\ \sum_{(u^k,v^k,w^k)\in\, T_{P_{\tilde U\tilde V\tilde W}}} P(U^k=u^k, V^k=v^k|H=1)\,\Big[P(E_{BE}|F_2)\, P(D_0|F_{21})\, P(W^k(1)=w^k\,|\,U^k=u^k, V^k=v^k, J=1, f_B(J)=1, E_{NE})\Big]
$$
$$
\le \max_{P_{\tilde U\tilde V\tilde W}\in\, \hat T_2^{(k)}(P_{UW}, P_V)} e^{kH(\tilde U\tilde V\tilde W)}\, e^{-k\left(H(\tilde U\tilde V)+D(P_{\tilde U\tilde V}\|Q_{UV})+H(\tilde W|\tilde U)+R-I(U;W|V)-O(\delta)\right)}\; e^{\left(|\mathcal U||\mathcal V||\mathcal W|\log(k+1)+|\mathcal U||\mathcal W|\log(k+1)\right)} = e^{-k\tilde E_{2k}}, \quad (114)
$$
$$
\hat T_2^{(k)}(P_{UW}, P_V) := \big\{P_{\tilde U\tilde V\tilde W}:\ P_{\tilde U\tilde W}\in T^k_{[UW]_\delta},\ P_{\tilde V}\in T^k_{[V]_{\check\delta}} \text{ and } H(\tilde W|\tilde V)\ge H(W|V)-O(\delta)\big\}, \quad (115)
$$
$$
\tilde E_{2k} := \min_{P_{\tilde U\tilde V\tilde W}\in\, T_2(P_{UW}, P_V)} H(\tilde U\tilde V)+D(P_{\tilde U\tilde V}\|Q_{UV})+H(\tilde W|\tilde U)+R-I(U;W|V) - \frac{1}{k}|\mathcal U|(|\mathcal V|+1)|\mathcal W|\log(k+1) - O(\delta) \ge E_2(P_{W|U}, P_{SX}, R) - O(\delta). \quad (116)
$$
independently according to the distribution $\prod_{i=1}^n \hat P_{\bar W}$, where

$$
\hat P_{\bar W}(\bar w) = \sum_{(u,s)\in\,\mathcal U\times\mathcal S} P_U(u)\, P_S(s)\, P_{\bar W|US}(\bar w|u,s).
$$

Encoding: If $(u^n, s^n)$ is typical, i.e., $(u^n, s^n)\in T^n_{[US]_\delta}$, the encoder first looks for a sequence $\bar w^n(j)$ such that $(u^n, s^n, \bar w^n(j))\in T^n_{[US\bar W]_\delta}$. If there exist multiple such codewords, it chooses one among them uniformly at random. The encoder transmits $X^n = x^n$ over the channel, where $X^n$ is generated according to the distribution $\prod_{i=1}^n P_{X|US\bar W}(x_i|u_i, s_i, \bar w_i(j))$. If $(u^n, s^n)\notin T^n_{[US]_\delta}$, or if such an index $j$ does not exist, the encoder generates the channel input $X^n = x^n$ randomly according to $\prod_{i=1}^n P_{X'|US}(x_i|u_i, s_i)$.

Decoding: Given the side-information sequence $V^n = v^n$, the received sequence $Y^n = y^n$ and $s^n$, the detector first checks whether $(v^n, s^n, y^n)\in T^n_{[VSY]_{\tilde\delta}}$, $\tilde\delta > \delta$. If the check is unsuccessful, $\hat H = 1$. Else, it searches for a typical sequence $\hat{\bar w}^n = \bar w^n(\hat j)\in T^n_{[\bar W]_{\hat\delta}}$ in the codebook such that

$$
\hat j = \arg\min_{l:\, \bar w^n(l)\in T^n_{[\bar W]_{\hat\delta}}} H_e\big(\bar w^n(l)\,\big|\,v^n, s^n, y^n\big).
$$

If $(v^n, s^n, y^n, \hat{\bar w}^n)\in T^n_{[VSY\bar W]_{\tilde\delta}}$, then $\hat H = 0$. Else, $\hat H = 1$.

Analysis of type I error: A type I error occurs only if one of the following events occurs:

$$
\tilde E_{TE} = \big\{(U^n, V^n, S^n)\notin T^n_{[UVS]_{\bar\delta}}\big\},
$$
$$
\tilde E_{EE} = \big\{\nexists\, j\in[e^{n(I(U,S;\bar W)+\mu)}]:\ (U^n, S^n, \bar W^n(j))\in T^n_{[US\bar W]_\delta}\big\},
$$
$$
\tilde E_{ME} = \big\{(V^n, S^n, \bar W^n(J))\notin T^n_{[VS\bar W]_{\tilde\delta}}\big\},
$$
$$
\tilde E_{CE} = \big\{(V^n, S^n, \bar W^n(J), Y^n)\notin T^n_{[VS\bar W Y]_{\tilde\delta}}\big\},
$$
$$
\tilde E_{DE} = \big\{\exists\, l\in[e^{n(I(U,S;\bar W)+\mu)}],\ l\ne J,\ \bar W^n(l)\in T^n_{[\bar W]_{\hat\delta}},\ H_e(\bar W^n(l)|V^n, S^n, Y^n)\le H_e(\bar W^n(J)|V^n, S^n, Y^n)\big\}.
$$

By the weak law of large numbers, $P(\tilde E_{TE})$ tends to 0 asymptotically with $n$. The covering lemma guarantees that $P(\tilde E_{EE}\cap\tilde E_{TE}^c)$ tends to 0 doubly exponentially if $\mu = O(\delta)$ is chosen appropriately. Given that $\tilde E_{EE}^c\cap\tilde E_{TE}^c$ holds, it follows from the Markov lemma and the weak law of large numbers, respectively, that $P(\tilde E_{ME})$ and $P(\tilde E_{CE})$ tend to zero asymptotically. Next, we consider the probability of the event $\tilde E_{DE}$. Given that $\tilde E_{CE}^c\cap\tilde E_{ME}^c\cap\tilde E_{EE}^c\cap\tilde E_{TE}^c$ holds, note that $H_e(\bar W^n(J)|V^n, S^n, Y^n)\ge H(\bar W|V,S,Y)-O(\delta)$. Hence, similarly to (70) in Appendix A, it can be shown that

$$
P\big(\tilde E_{DE}\,\big|\,\tilde E_{CE}^c\cap\tilde E_{ME}^c\cap\tilde E_{EE}^c\cap\tilde E_{TE}^c\big) \le e^{-n\left(I_{\hat P}(\bar W; V,S,Y)-I_{\hat P}(U,S;\bar W)-\delta_3^{(n)}\right)},
$$

where $\delta_3^{(n)}\xrightarrow{(n)} O(\delta)$. Hence, for $\delta>0$ small enough, the probability of the events causing a type I error tends to zero asymptotically, since $I(U;\bar W|S) < I(\bar W; Y,V|S)$.

Analysis of type II error: The analysis of the error-exponent is very similar to that of the SHTCC scheme given in Appendix A. Hence, only a sketch of the proof is provided, with the differences from the proof of the SHTCC scheme highlighted. Let

$$
\bar E_0 := \big\{(U^n, S^n)\notin T^n_{[US]_\delta}\big\}. \quad (123)
$$

Then, the type II error probability can be written as shown in (124) at the bottom of the page, where $\tilde E_{NE} := \tilde E_{EE}^c\cap\bar E_0^c$. It is sufficient to restrict the analysis to the events $\tilde E_{NE}$ and $\bar E_0$ that dominate the type II error. We define the following events:

$$
\tilde E_{T2} = \big\{\exists\, l\in[e^{n(I(U,S;\bar W)+\mu)}],\ l\ne J,\ \bar W^n(l)\in T^n_{[\bar W]_{\hat\delta}},\ (V^n, \bar W^n(l), S^n, Y^n)\in T^n_{[VS\bar W Y]_{\tilde\delta}}\big\}, \quad (125)
$$
$$
\tilde F = \{U^n=u^n, V^n=v^n, J=1, \bar W^n(1)=\bar w^n, S^n=s^n, Y^n=y^n, \tilde E_{NE}\}, \quad (126)
$$
$$
\tilde F_1 = \{U^n=u^n, V^n=v^n, J=1, \bar W^n(1)=\bar w^n, S^n=s^n, Y^n=y^n, \tilde E_{NE}, \tilde E_{T2}^c\}, \quad (127)
$$
$$
\tilde F_2 = \{U^n=u^n, V^n=v^n, J=1, \bar W^n(1)=\bar w^n, S^n=s^n, Y^n=y^n, \tilde E_{NE}, \tilde E_{T2}\}. \quad (128)
$$

By the symmetry of the codebook generation, encoding and decoding procedures, the term $P(D_0\,|\,U^n=u^n, V^n=v^n, J=j, \tilde E_{NE})$ is independent of the value of $J$. Hence, w.l.o.g. assuming $J=1$, we obtain (129) shown on the next page. The last term in (129) can be upper bounded using the events in (126)-(128) as follows:

$$
P(D_0\,|\,\tilde F) \le P(D_0\,|\,\tilde F_1) + P(\tilde E_{T2}\,|\,\tilde F)\; P(D_0\,|\,\tilde F_2).
$$

We next analyze the R.H.S. of (124), which upper bounds the type II error probability. We can write

$$
P(D_0\,|\,\tilde F_1) = \begin{cases} 1, & \text{if } P_{u^n s^n \bar w^n}\in T^n_{[US\bar W]_\delta} \text{ and } P_{v^n \bar w^n s^n y^n}\in T^n_{[VS\bar W Y]_{\tilde\delta}},\\[2pt] 0, & \text{otherwise.}\end{cases} \quad (130)
$$

$$
\beta\big(n, n, f^{(n,n)}, g^{(n,n)}\big) \le \sum_{(u^n, v^n)\in\,\mathcal U^n\times\mathcal V^n} P(U^n=u^n, V^n=v^n|H=1)\,\Big[P\big(\tilde E_{EE}\cap\bar E_0^c\,\big|\,U^n=u^n, V^n=v^n\big) + P(D_0\,|\,U^n=u^n, V^n=v^n, \tilde E_{NE}) + P(D_0\,|\,U^n=u^n, V^n=v^n, \bar E_0)\Big]. \quad (124)
$$
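The decoder above selects the codeword minimizing the empirical conditional entropy $H_e(\bar w^n(l)\,|\,v^n, s^n, y^n)$, which is computed from the joint type of the candidate codeword with the observed sequences. A minimal sketch of such a decoder follows (an illustration only; the alphabets, sequences and function names are ours, not the paper's).

```python
import math
from collections import Counter

def empirical_cond_entropy(w, side):
    """H_e(w | side): conditional empirical entropy in nats, computed
    from the joint type of (w, side); side is a tuple of sequences."""
    n = len(w)
    joint = Counter(zip(w, *side))          # counts of (w_i, v_i, s_i, y_i)
    marg = Counter(zip(*side))              # counts of (v_i, s_i, y_i)
    h = 0.0
    for key, c in joint.items():
        p_joint = c / n
        p_side = marg[key[1:]] / n
        h -= p_joint * math.log(p_joint / p_side)
    return h

def decode(codebook, v, s, y):
    """Pick the codeword index minimizing H_e(w | v, s, y)."""
    side = (v, s, y)
    return min(range(len(codebook)),
               key=lambda l: empirical_cond_entropy(codebook[l], side))

# Toy check: a codeword that is a deterministic function of the side
# information has zero conditional empirical entropy and is selected.
v = [0, 1, 0, 1, 0, 1]; s = [0] * 6; y = [0] * 6
codebook = [[0, 0, 1, 1, 0, 1],   # not a function of (v, s, y)
            [0, 1, 0, 1, 0, 1]]   # equals v exactly
print(decode(codebook, v, s, y))  # → 1
```

In the scheme itself, ties and atypical codewords are additionally excluded by the typicality check on $(v^n, s^n, y^n, \hat{\bar w}^n)$ before declaring $\hat H = 0$.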
Hence, the terms corresponding to the event $\tilde F_1$ in (124) can be upper bounded as shown in (131) below. Here, (132) follows from the fact that $P_{\tilde S\tilde W|\tilde U}\to P_{S\bar W|U}$ as $\delta\to 0$, given $\tilde E_{NE}$.

Next, consider the terms corresponding to the event $\tilde F_2$ in (124). Given $\tilde F_2$, we have $P_{\tilde U\tilde S\tilde W}\in T^n_{[US\bar W]_\delta}$, and $D_0$ occurs only if $(V^n, S^n, Y^n)\in T^n_{[VSY]_{\delta'}}$, $\delta' = |\bar{\mathcal W}|\tilde\delta$, and $H(\tilde W|\tilde V,\tilde S,\tilde Y)\ge H(\bar W|V,S,Y)-O(\tilde\delta)$. Thus, we can write (134) shown on the next page. In (133), we used the fact that

$$
P(\tilde E_{T2}\,|\,\tilde F) \le 2\cdot e^{-n\left(I(\bar W; V,Y|S)-I(U;\bar W|S)-O(\delta)\right)},
$$

which follows from

$$
P\big(\bar W^n(l)=\tilde w^n\,\big|\,\tilde F\big) \le 2\, P(\bar W^n(l)=\tilde w^n). \quad (136)
$$

Equation (136) can be proved similarly to (101).
$$
P(D_0\,|\,U^n=u^n, V^n=v^n, \tilde E_{NE}) = \sum_{j=1}^{e^{n(I(U,S;\bar W)+\mu)}} P(J=j\,|\,U^n=u^n, V^n=v^n, \tilde E_{NE})\; P(D_0\,|\,U^n=u^n, V^n=v^n, J=1, \tilde E_{NE})
$$
$$
= P(D_0\,|\,U^n=u^n, V^n=v^n, J=1, \tilde E_{NE})
$$
$$
= \sum_{(\bar w^n, s^n, y^n)\in\,\bar{\mathcal W}^n\times\mathcal S^n\times\mathcal Y^n} P(\bar W^n(1)=\bar w^n, S^n=s^n, Y^n=y^n\,|\,U^n=u^n, V^n=v^n, J=1, \tilde E_{NE})\; P(D_0\,|\,U^n=u^n, V^n=v^n, J=1, \bar W^n(1)=\bar w^n, S^n=s^n, Y^n=y^n, \tilde E_{NE})
$$
$$
= \sum_{(\bar w^n, s^n, y^n)\in\,\bar{\mathcal W}^n\times\mathcal S^n\times\mathcal Y^n} P(\bar W^n(1)=\bar w^n, S^n=s^n, Y^n=y^n\,|\,U^n=u^n, V^n=v^n, J=1, \tilde E_{NE})\; P(D_0\,|\,\tilde F). \quad (129)
$$
$$
\sum_{(u^n, v^n, \bar w^n, s^n, y^n)\in\,\mathcal U^n\times\mathcal V^n\times\bar{\mathcal W}^n\times\mathcal S^n\times\mathcal Y^n} P(U^n=u^n, V^n=v^n|H=1)\,\Big[P(D_0|\tilde F_1)\; P(\bar W^n(1)=\bar w^n, S^n=s^n, Y^n=y^n\,|\,U^n=u^n, V^n=v^n, J=1, \tilde E_{NE})\Big]
$$
$$
\le \sum_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}\in\, T^n_{UV\bar WSY}}\ \sum_{(u^n, v^n, \bar w^n, s^n, y^n)\in\, T_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}}} P(U^n=u^n, V^n=v^n|H=1)\,\Big[P(D_0|\tilde F_1)\; P(S^n=s^n, \bar W^n(1)=\bar w^n\,|\,U^n=u^n, J=1, \tilde E_{NE})\; P(Y^n=y^n\,|\,U^n=u^n, S^n=s^n, J=1, \bar W^n(1)=\bar w^n, \tilde E_{NE})\Big]
$$
$$
\le \sum_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}}\ \sum_{(u^n, v^n, \bar w^n, s^n, y^n)\in\, T_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}}} \Big[P(D_0|\tilde F_1)\; e^{-n\left(H(\tilde U\tilde V)+D(P_{\tilde U\tilde V}\|Q_{UV})\right)}\; e^{-n\left(H(\tilde S\tilde W|\tilde U)-\frac{1}{n}|\mathcal U||\bar{\mathcal W}||\mathcal S|\log(n+1)\right)}\; e^{-n\left(H(\tilde Y|\tilde U\tilde S\tilde W)+D(P_{\tilde Y|\tilde U\tilde S\tilde W}\|\hat P_{Y|US\bar W}|P_{\tilde U\tilde S\tilde W})\right)}\Big]
$$
$$
\le \max_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}\in\, T_1^{(n)}(\hat P_{US\bar W}, \hat P_{VS\bar W Y})} e^{-n\left(H(\tilde U\tilde V)+D(P_{\tilde U\tilde V}\|Q_{UV})\right)}\; e^{-n\left(H(\tilde S\tilde W|\tilde U)-\frac{1}{n}|\mathcal U||\bar{\mathcal W}||\mathcal S|\log(n+1)\right)}\; e^{-n\left(H(\tilde Y|\tilde U\tilde S\tilde W)+D(P_{\tilde Y|\tilde U\tilde S\tilde W}\|\hat P_{Y|US\bar W}|P_{\tilde U\tilde S\tilde W})\right)}\; e^{n\left(H(\tilde U\tilde V\tilde S\tilde W\tilde Y)-\frac{1}{n}|\mathcal U||\mathcal V||\bar{\mathcal W}||\mathcal S||\mathcal Y|\log(n+1)\right)} = e^{-nE_{1n}}, \quad (131)
$$

where

$$
T_1^{(n)}(\hat P_{US\bar W}, \hat P_{VS\bar W Y}) := \big\{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}\in T^n_{UVSWY}:\ P_{\tilde U\tilde S\tilde W}\in T^n_{[US\bar W]_\delta},\ P_{\tilde V\tilde S\tilde W\tilde Y}\in T^n_{[VS\bar W Y]_{\tilde\delta}}\big\},
$$
$$
E_{1n} := \min_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}\in\, T_1^{(n)}(\hat P_{US\bar W}, \hat P_{VS\bar W Y})} \Big[H(\tilde U\tilde V)+D(P_{\tilde U\tilde V}\|Q_{UV})+H(\tilde S\tilde W|\tilde U)+H(\tilde Y|\tilde U\tilde S\tilde W)-H(\tilde U\tilde V\tilde W\tilde S\tilde Y) + D(P_{\tilde Y|\tilde U\tilde S\tilde W}\|\hat P_{Y|US\bar W}|P_{\tilde U\tilde S\tilde W}) - \frac{1}{n}\big(|\mathcal U||\bar{\mathcal W}| + |\mathcal U||\mathcal V||\bar{\mathcal W}||\mathcal S||\mathcal Y|\big)\log(n+1)\Big]
$$
$$
\ge \min_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}\in\, T_1(\hat P_{US\bar W}, \hat P_{VS\bar W Y})} D\big(P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}\,\big\|\,Q_{UV}\, P_{\tilde S\tilde W|\tilde U}\, \hat P_{Y|US\bar W}\big) - O(\delta) = E_1(P_S, P_{\bar W|US}, P_{X|US\bar W}) - O(\delta). \quad (132)
$$
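Exponents such as $E_1$ in (132) are minimizations of a KL divergence over joint distributions obeying marginal constraints. For small alphabets, such minimizations can be sanity-checked by brute force; the sketch below is purely illustrative (a crude grid search over binary joint distributions with a pinned $U$-marginal, not the paper's algorithm) and shows the general pattern.

```python
import math
import itertools

def kl(p, q):
    """D(p||q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def min_kl_with_marginal(q, target_u, grid=200):
    """Grid search: minimize D(P_UV || Q_UV) over binary joints
    P = (p00, p01, p10, p11) whose U-marginal equals target_u."""
    best = float("inf")
    for i, j in itertools.product(range(grid + 1), repeat=2):
        # p00 + p01 = target_u and p10 + p11 = 1 - target_u by construction.
        p00 = target_u * i / grid
        p01 = target_u - p00
        p10 = (1 - target_u) * j / grid
        p11 = (1 - target_u) - p10
        best = min(best, kl((p00, p01, p10, p11), q))
    return best

q = (0.25, 0.25, 0.25, 0.25)
# When Q itself satisfies the constraint (target_u = 0.5),
# the minimum is 0, attained at P = Q.
print(min_kl_with_marginal(q, target_u=0.5))  # → 0.0
```

When the constraint excludes $Q$ (for instance `target_u=0.8` above), the minimum is strictly positive, mirroring how the constraint sets $T_1^{(n)}$, $T_2^{(n)}$ force a positive exponent.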
Finally, we consider the case when $\bar E_0$ holds. This event can first be upper bounded as shown in (137) below. The event $D_0$ occurs only if there exists a sequence $(\bar W^n(l), V^n, S^n, Y^n)\in T^n_{[\bar W VSY]_{\tilde\delta}}$ for some $l\in[e^{n(I(U,S;\bar W)+\mu)}]$. Noting that the quantization codebook is independent of $(V^n, S^n, Y^n)$ given that $\bar E_0$ holds, it can be shown using standard arguments that

$$
P(D_0\,|\,U^n=u^n, V^n=v^n, S^n=s^n, Y^n=y^n, \bar E_0) \le e^{-n\left(I(\bar W; V,Y|S)-I(U;\bar W|S)-O(\delta)\right)}. \quad (138)
$$

Also,

$$
P(S^n=s^n, Y^n=y^n\,|\,U^n=u^n, \bar E_0) \le e^{-n\left(H(\tilde S\tilde Y|\tilde U)+D(P_{\tilde S\tilde Y|\tilde U}\|\check Q_{SY|U}|P_{\tilde U})\right)}. \quad (139)
$$

Hence, using (138) and (139) in (137), we obtain

$$
\sum_{u^n, v^n} P(U^n=u^n, V^n=v^n|H=1)\; P(D_0\,|\,U^n=u^n, V^n=v^n, \bar E_0)
$$
$$
\sum_{(u^n, v^n, \bar w^n, s^n, y^n)} P(U^n=u^n, V^n=v^n|H=1)\,\Big[P(D_0|\tilde F_2)\; P(\tilde E_{T2}|\tilde F)\; P(\bar W^n(1)=\bar w^n, S^n=s^n, Y^n=y^n\,|\,U^n=u^n, V^n=v^n, J=1, \tilde E_{NE})\Big]
$$
$$
\le \sum_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}\in\, T^n(\mathcal U\times\mathcal V\times\bar{\mathcal W}\times\mathcal S\times\mathcal Y)}\ \sum_{(u^n, v^n, \bar w^n, s^n, y^n)\in\, T_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}}} P(U^n=u^n, V^n=v^n|H=1)\,\Big[P(D_0|\tilde F_2)\; P(\tilde E_{T2}|\tilde F)\; P(S^n=s^n, \bar W^n(1)=\bar w^n\,|\,U^n=u^n, J=1, \tilde E_{NE})\; P(Y^n=y^n\,|\,U^n=u^n, S^n=s^n, J=1, \bar W^n(1)=\bar w^n, \tilde E_{NE})\Big]
$$
$$
\le \sum_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}}\ \sum_{(u^n, v^n, \bar w^n, s^n, y^n)\in\, T_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}}} e^{-n\left(H(\tilde U\tilde V)+D(P_{\tilde U\tilde V}\|Q_{UV})\right)}\; P(D_0|\tilde F_2)\cdot 2\cdot e^{-n\left(I(\bar W; V,S,Y)-I(U,S;\bar W)-O(\delta)\right)}\; e^{-n\left(H(\tilde S\tilde W|\tilde U)-\frac{1}{n}|\mathcal U||\bar{\mathcal W}||\mathcal S|\log(n+1)\right)}\; e^{-n\left(H(\tilde Y|\tilde U\tilde S\tilde W)+D(P_{\tilde Y|\tilde U\tilde S\tilde W}\|\hat P_{Y|US\bar W}|P_{\tilde U\tilde S\tilde W})\right)} \quad (133)
$$
$$
\le \max_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}\in\, T_2^{(n)}(\hat P_{US\bar W}, \hat P_{VS\bar W Y})} e^{-n\left(H(\tilde U\tilde V)+D(P_{\tilde U\tilde V}\|Q_{UV})\right)}\; e^{-n\left(H(\tilde S\tilde W|\tilde U)-\frac{1}{n}|\mathcal U||\bar{\mathcal W}||\mathcal S|\log(n+1)\right)}\; e^{-n\left(I(\bar W; V,S,Y)-I(U,S;\bar W)-O(\delta)-\frac{1}{n}\right)}\; e^{-n\left(H(\tilde Y|\tilde U\tilde S\tilde W)+D(P_{\tilde Y|\tilde U\tilde S\tilde W}\|\hat P_{Y|US\bar W}|P_{\tilde U\tilde S\tilde W})\right)}\; e^{n\left(H(\tilde U\tilde V\tilde S\tilde W\tilde Y)-\frac{1}{n}|\mathcal U||\mathcal V||\bar{\mathcal W}||\mathcal S||\mathcal Y|\log(n+1)\right)} = e^{-nE_{2n}}, \quad (134)
$$

where

$$
T_2^{(n)}(\hat P_{US\bar W}, \hat P_{VS\bar W Y}) := \big\{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}\in T^n_{UVSWY}:\ P_{\tilde U\tilde S\tilde W}\in T^n_{[US\bar W]_\delta},\ P_{\tilde V\tilde S\tilde W\tilde Y}\in T^n_{[VS\bar W Y]_{\tilde\delta}},\ \text{and } H(\tilde W|\tilde V,\tilde S,\tilde Y)\ge H(\bar W|V,S,Y)-O(\delta)\big\},
$$

and

$$
E_{2n} \ge \min_{P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}\in\, T_2(\hat P_{US\bar W}, \hat P_{VS\bar W Y})} \Big[D\big(P_{\tilde U\tilde V\tilde S\tilde W\tilde Y}\,\big\|\,Q_{UV}\, P_{\tilde S\tilde W|\tilde U}\, \hat P_{Y|US\bar W}\big) + I(\bar W; V,Y|S) - I(U;\bar W|S)\Big] - O(\delta) = E_2(P_S, P_{\bar W|US}, P_{X|US\bar W}) - O(\delta). \quad (135)
$$

P(U n = un , V n = v n |H = 1) P(D0 | U n = un , V n = v n , Ē0 )
un ,v n
 
= P(U n = un , V n = v n |H = 1) P(S n = sn , Y n = y n , D0 | U n = un , V n = v n , Ē0 )
un ,v n sn ,y n
 $ 
= P(U n = un , V n = v n |H = 1) P(S n = sn , Y n = y n | U n = un , V n = v n , E¯0 )
un ,v n sn ,y n
%
P(D0 | U n = un , V n = v n , S n = sn , Y n = y n , Ē0 )
 $ 
= P(U n = un , V n = v n |H = 1) P(S n = sn , Y n = y n | U n = un , Ē0 )
un ,v n sn ,y n
%
P(D0 | U n = un , V n = v n , S n = sn , Y n = y n , Ē0 ) . (137)

$$
\le (n+1)^{|\mathcal U||\mathcal V||\mathcal S||\mathcal Y|} \max_{P_{\tilde U\tilde V\tilde S\tilde Y}:\, P_{\tilde V\tilde S\tilde Y}=\hat P_{VSY}} e^{nH(\tilde U\tilde V\tilde S\tilde Y)}\, e^{-nH(\tilde U\tilde V)}\, e^{-nD(P_{\tilde U\tilde V}\|Q_{UV})}\, e^{-n\left(H(\tilde S\tilde Y|\tilde U)+D(P_{\tilde S\tilde Y|\tilde U}\|\check Q_{SY|U}|P_{\tilde U})\right)}\, e^{-n\left(I(\bar W; V,Y|S)-I(U;\bar W|S)-O(\delta)\right)} = e^{-nE_{3n}},
$$

where

$$
E_{3n} = \min_{P_{\tilde V\tilde S\tilde Y}=\hat P_{VSY}} D(P_{\tilde V\tilde S\tilde Y}\|\check Q_{VSY}) + I(\bar W; V,Y|S) - I(U;\bar W|S) - \frac{1}{n}|\mathcal U||\mathcal V||\mathcal S||\mathcal Y|\log(n+1) - O(\delta)
\xrightarrow{(n)} E_3\big(P_S, P_{\bar W|US}, P_{X'|US}, P_{X|US\bar W}\big) - O(\delta).
$$

Since the error-exponent is lower bounded by the minimal value of the exponent due to the various type II error events, this completes the proof of the theorem.

APPENDIX C
OPTIMAL SINGLE-LETTER CHARACTERIZATION OF THE ERROR-EXPONENT WHEN $C(P_{Y|X}) = 0$

The proof of achievability follows from the one-bit scheme mentioned in Remark 4, which states that for $\tau \ge 0$, $\kappa(\tau, \epsilon) \ge \kappa_0(\tau)$, $\forall\, \epsilon\in(0,1]$. Now, it is well known (see [23]) that $C(P_{Y|X}) = 0$ only if

$$
P_Y^* := P_{Y|X=x} = P_{Y|X=x'},\quad \forall\, x, x'\in\mathcal X. \quad (140)
$$

From (140), it follows that $E_c(P_{Y|X}) = 0$. Also,

$$
\beta_0 \ge D(P_V\|Q_V) + \min_{P_{\tilde U\tilde V}:\ P_{\tilde U}=P_U,\ P_{\tilde V}=P_V} D\big(P_{\tilde U|\tilde V}\,\big\|\,Q_{U|V}\,\big|\,P_{\tilde V}\big) \ge D(P_V\|Q_V),
$$

which implies that $\kappa_0(\tau) \ge D(P_V\|Q_V)$.

Converse: We first show the weak converse, i.e., $\kappa(\tau) \le D(P_V\|Q_V)$, where $\kappa(\tau)$ is as defined in (45). For any sequence of encoding functions $f^{(k,n_k)}$ and acceptance regions $A^{(k,n_k)}$ for $H_0$ that satisfy $n_k \le \tau k$ and (57), it follows, similarly to (58), that

$$
\limsup_{k\to\infty} \frac{-1}{k}\log\beta\big(k, n_k, f^{(k,n_k)}, g^{(k,n_k)}\big) \le \limsup_{k\to\infty} \frac{1}{k}\, D\big(P_{Y^{n_k}V^k}\,\big\|\,Q_{Y^{n_k}V^k}\big). \quad (141)
$$

The terms on the R.H.S. of (141) can be expanded as

$$
\frac{1}{k}\, D\big(P_{Y^{n_k}V^k}\,\big\|\,Q_{Y^{n_k}V^k}\big) = D(P_V\|Q_V) + \frac{1}{k} \sum_{(v^k, y^{n_k})\in\,\mathcal V^k\times\mathcal Y^{n_k}} P_{V^k Y^{n_k}}(v^k, y^{n_k})\, \log\frac{P_{Y^{n_k}|V^k}(y^{n_k}|v^k)}{Q_{Y^{n_k}|V^k}(y^{n_k}|v^k)}. \quad (142)
$$

Next, note that

$$
P_{Y^{n_k}|V^k}(y^{n_k}|v^k) = \sum_{(u^k, x^{n_k})\in\,\mathcal U^k\times\mathcal X^{n_k}} P_{U^k|V^k}(u^k|v^k)\, P_{X^{n_k}|U^k}(x^{n_k}|u^k)\, P_{Y^{n_k}|X^{n_k}}(y^{n_k}|x^{n_k})
$$
$$
= \prod_{i=1}^{n_k} P_Y^*(y_i) \sum_{(u^k, x^{n_k})\in\,\mathcal U^k\times\mathcal X^{n_k}} P_{U^k|V^k}(u^k|v^k)\, P_{X^{n_k}|U^k}(x^{n_k}|u^k) \quad (143)
$$
$$
= \prod_{i=1}^{n_k} P_Y^*(y_i), \quad (144)
$$

where (143) follows from (3) and (140). Similarly, it follows that

$$
Q_{Y^{n_k}|V^k}(y^{n_k}|v^k) = \prod_{i=1}^{n_k} P_Y^*(y_i). \quad (145)
$$

From (141), (142), (144) and (145), we obtain that

$$
\limsup_{k\to\infty} \frac{-1}{k}\log\beta\big(k, n_k, f^{(k,n_k)}, g^{(k,n_k)}\big) \le D(P_V\|Q_V).
$$

This completes the proof of the weak converse.

Next, we proceed to show that $D(P_V\|Q_V)$ is the optimal error-exponent for every $\epsilon\in(0,1)$. For any fixed $\epsilon\in(0,1)$, let $f^{(k,n_k)}$ and $A^{(k,n_k)}$ denote any encoding function and acceptance region for $H_0$, respectively, such that $n_k \le \tau k$ and

$$
\limsup_{k\to\infty} \alpha\big(k, n_k, f^{(k,n_k)}, g^{(k,n_k)}\big) \le \epsilon. \quad (146)
$$

The joint distribution of $(V^k, Y^{n_k})$ under the null and alternative hypotheses is given by

$$
P_{V^k Y^{n_k}}(v^k, y^{n_k}) = \prod_{i=1}^{k} P_V(v_i)\, \Bigg(\prod_{j=1}^{n_k} P_Y^*(y_j)\Bigg), \quad (147)
$$

and

$$
Q_{V^k Y^{n_k}}(v^k, y^{n_k}) = \prod_{i=1}^{k} Q_V(v_i)\, \Bigg(\prod_{j=1}^{n_k} P_Y^*(y_j)\Bigg), \quad (148)
$$

respectively. By the weak law of large numbers, for any $\delta > 0$, (147) implies that

$$
\lim_{k\to\infty} P_{V^k Y^{n_k}}\Big(T^k_{[P_V]_\delta}\times T^{n_k}_{[P_Y^*]_\delta}\Big) = 1. \quad (149)
$$

Also, from (146), we have

$$
\liminf_{k\to\infty} P_{V^k Y^{n_k}}\big(A^{(k,n_k)}\big) \ge 1-\epsilon. \quad (150)
$$

From (149) and (150), it follows that

$$
P_{V^k Y^{n_k}}\Big(A^{(k,n_k)}\cap T^k_{[P_V]_\delta}\times T^{n_k}_{[P_Y^*]_\delta}\Big) \ge 1-\epsilon', \quad (151)
$$
for any $\epsilon' > \epsilon$ and $k$ sufficiently large ($k \ge k_0(\delta, |\mathcal V|, |\mathcal Y|)$). Let

$$
A(v^k, \delta) := \Big\{y^{n_k}:\ (v^k, y^{n_k})\in A^{(k,n_k)}\cap T^k_{[P_V]_\delta}\times T^{n_k}_{[P_Y^*]_\delta}\Big\},
$$

and

$$
D(\eta, \delta) := \Big\{v^k\in T^k_{[P_V]_\delta}:\ P_{Y^{n_k}}\big(A(v^k, \delta)\big)\ge \eta\Big\}. \quad (152)
$$

Fix $0 < \eta' < 1-\epsilon'$. Then, we have from (151) that for any $\delta > 0$ and sufficiently large $k$,

$$
P_{V^k}\big(D(\eta', \delta)\big) \ge \frac{1-\epsilon'-\eta'}{1-\eta'}. \quad (153)
$$

From [23, Lemma 2.14], (153) implies that $D(\eta', \delta)$ should contain at least a $\frac{1-\epsilon'-\eta'}{1-\eta'}$ fraction (approximately) of the sequences in $T^k_{[P_V]_\delta}$, and for each $v^k\in D(\eta', \delta)$, (152) implies that $A(v^k, \delta)$ should contain at least an $\eta'$ fraction (approximately) of the sequences in $T^{n_k}_{[P_Y^*]_\delta}$, asymptotically. Hence, for sufficiently large $k$, we have

$$
Q_{V^k Y^{n_k}}\big(A^{(k,n_k)}\big) \ge \sum_{v^k\in D(\eta',\delta)} Q_{V^k}(v^k) \sum_{y^{n_k}\in A(v^k,\delta)} P_{Y^{n_k}}(y^{n_k})
\ge e^{-k\left(D(P_V\|Q_V)-\frac{1}{k}\log\frac{1-\epsilon'-\eta'}{1-\eta'}-\frac{1}{k}\log(\eta')-O(\delta)\right)}. \quad (154)
$$

Here, (154) follows from [23, Lemma 2.6]. Let $A^{(k,n_k)} := T^k_{[P_V]_\delta}\times T^{n_k}_{[P_Y^*]_\delta}$. Then, for sufficiently large $k$,

$$
P_{V^k Y^{n_k}}\big(A^{(k,n_k)}\big) \xrightarrow{(k)} 1, \quad (155)
$$

and

$$
Q_{V^k Y^{n_k}}\big(A^{(k,n_k)}\big) \le e^{-k(D(P_V\|Q_V)-O(\delta))}, \quad (156)
$$

where (155) and (156) follow from the weak law of large numbers and [23, Lemma 2.6], respectively. Together, (154), (155) and (156) imply that

$$
|\kappa(\tau, \epsilon) - \kappa(\tau)| \le O(\delta),
$$

and the proposition is proved since $\delta > 0$ is arbitrary.

REFERENCES

[1] S. Sreekumar and D. Gündüz, "Distributed hypothesis testing over noisy channels," in Proc. IEEE Int. Symp. Inf. Theory, Aachen, Germany, Jun. 2017, pp. 983-987.
[2] H. Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations," Ann. Math. Statist., vol. 23, no. 4, pp. 493-507, 1952.
[3] W. Hoeffding, "Asymptotically optimal tests for multinomial distributions," Ann. Math. Statist., vol. 36, no. 2, pp. 369-401, 1965.
[4] T. Berger, "Decentralized estimation and decision theory," in Proc. IEEE 7th Springs Workshop Inf. Theory, Mount Kisco, NY, USA, Sep. 1979.
[5] R. Ahlswede and I. Csiszár, "Hypothesis testing with communication constraints," IEEE Trans. Inf. Theory, vol. IT-32, no. 4, pp. 533-542, Jul. 1986.
[6] T. S. Han, "Hypothesis testing with multiterminal data compression," IEEE Trans. Inf. Theory, vol. IT-33, no. 6, pp. 759-772, Nov. 1987.
[7] H. M. H. Shalaby and A. Papamarcou, "Multiterminal detection with zero-rate data compression," IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 254-267, Mar. 1992.
[8] H. Shimokawa, T. S. Han, and S. Amari, "Error bound of hypothesis testing with data compression," in Proc. IEEE Int. Symp. Inf. Theory, Trondheim, Norway, 1994, p. 114.
[9] M. S. Rahman and A. B. Wagner, "On the optimality of binning for distributed hypothesis testing," IEEE Trans. Inf. Theory, vol. 58, no. 10, pp. 6282-6303, Oct. 2012.
[10] M. Wigger and R. Timo, "Testing against independence with multiple decision centers," in Proc. Int. Conf. Signal Process. Commun., Bengaluru, India, Jun. 2016, pp. 1-5.
[11] W. Zhao and L. Lai, "Distributed testing against independence with multiple terminals," in Proc. 52nd Annu. Allerton Conf. Commun., Control, Comput., Monticello, IL, USA, Sep. 2014, pp. 1246-1251.
[12] Y. Xiang and Y.-H. Kim, "Interactive hypothesis testing against independence," in Proc. IEEE Int. Symp. Inf. Theory, Istanbul, Turkey, Jul. 2013, pp. 2840-2844.
[13] Y. Xiang and Y.-H. Kim, "Interactive hypothesis testing with communication constraints," in Proc. 50th Annu. Allerton Conf. Commun., Control, Comput., Monticello, IL, USA, Oct. 2012, pp. 1065-1072.
[14] G. Katz, P. Piantanida, and M. Debbah, "Collaborative distributed hypothesis testing," Apr. 2016, arXiv:1604.01292. Accessed: Jan. 2018. [Online]. Available: https://arxiv.org/abs/1604.01292
[15] G. Katz, P. Piantanida, and M. Debbah, "Distributed binary detection with lossy data compression," IEEE Trans. Inf. Theory, vol. 63, no. 8, pp. 5207-5227, Aug. 2017.
[16] S. Salehkalaibar, M. Wigger, and L. Wang, "Hypothesis testing over the two-hop relay network," IEEE Trans. Inf. Theory, vol. 65, no. 7, pp. 4411-4433, Jul. 2019.
[17] R. E. Blahut, "Hypothesis testing and information theory," IEEE Trans. Inf. Theory, vol. IT-20, no. 4, pp. 405-417, Jul. 1974.
[18] T. S. Han and K. Kobayashi, "Exponential-type error probabilities for multiterminal hypothesis testing," IEEE Trans. Inf. Theory, vol. 35, no. 1, pp. 2-14, Jan. 1989.
[19] S.-I. Amari and T. S. Han, "Statistical inference under multiterminal rate restrictions: A differential geometric approach," IEEE Trans. Inf. Theory, vol. 35, no. 2, pp. 217-227, Mar. 1989.
[20] T. S. Han and S.-I. Amari, "Statistical inference under multiterminal data compression," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2300-2324, Oct. 1998.
[21] S. Watanabe, "Neyman–Pearson test for zero-rate multiterminal hypothesis testing," IEEE Trans. Inf. Theory, vol. 64, no. 7, pp. 4923-4939, Jul. 2018.
[22] N. Weinberger and Y. Kochman, "On the reliability function of distributed hypothesis testing under optimal detection," IEEE Trans. Inf. Theory, vol. 65, no. 8, pp. 4940-4965, Aug. 2019.
[23] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge, U.K.: Cambridge Univ. Press, 2011.
[24] R. G. Gallager, "A simple derivation of the coding theorem and some applications," IEEE Trans. Inf. Theory, vol. IT-11, no. 1, pp. 3-18, Jan. 1965.
[25] S. Borade, B. Nakiboğlu, and L. Zheng, "Unequal error protection: An information-theoretic perspective," IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5511-5539, Dec. 2009.
[26] I. Csiszár, "Joint source-channel error exponent," Probl. Control Inf. Theory, vol. 9, no. 5, pp. 315-328, 1980.
[27] T. M. Cover, A. El Gamal, and M. Salehi, "Multiple access channels with arbitrarily correlated sources," IEEE Trans. Inf. Theory, vol. IT-26, no. 6, pp. 648-657, Nov. 1980.
[28] P. Minero, S. H. Lim, and Y.-H. Kim, "A unified approach to hybrid coding," IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1509-1523, Apr. 2015.
[29] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Philos. Trans. Roy. Soc. London A, Containing Papers Math. Phys. Character, vol. 231, nos. 694-706, pp. 289-337, 1933.
[30] S. Sreekumar and D. Gündüz, "Hypothesis testing over a noisy channel," in Proc. IEEE Int. Symp. Inf. Theory, Paris, France, Jul. 2019, pp. 2004-2008.
[31] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge, U.K.: Cambridge Univ. Press, 2011.
Sreejith Sreekumar (S'15) is a PhD student at Imperial College London. He received the B.Tech. degree in electrical engineering from the National Institute of Technology, Calicut, in 2011, and the M.Tech. degree in communication engineering from the Indian Institute of Technology, Bombay, in 2013. From 2013 to 2015, he worked as a Systems Design Engineer at Broadcom Communications Pvt. Ltd, Bengaluru. His research interests lie in the areas of information and communication theory, coding theory, and machine learning.

Deniz Gündüz (S'03–M'08–SM'13) received the B.S. degree in electrical and electronics engineering from METU, Turkey, in 2002, and the M.S. and Ph.D. degrees in electrical engineering from NYU Tandon School of Engineering (formerly Polytechnic University) in 2004 and 2007, respectively. After his Ph.D., he served as a postdoctoral research associate at Princeton University, and as a consulting assistant professor at Stanford University. He was a research associate at CTTC in Barcelona, Spain, until September 2012, when he joined the Electrical and Electronic Engineering Department of Imperial College London, U.K., where he is currently a Reader (Associate Professor) in information theory and communications, serves as the deputy head of the Intelligent Systems and Networks Group, and leads the Information Processing and Communications Laboratory (IPC-Lab). His research interests lie in the areas of communications and information theory, machine learning, and privacy.

Dr. Gündüz is an Editor of the IEEE Transactions on Wireless Communications and the IEEE Transactions on Green Communications and Networking. He also served as a Guest Editor of the IEEE JSAC Special Issue on Machine Learning in Wireless Communication (2019), and an Editor of the IEEE Transactions on Communications (2013-18). He is a Distinguished Lecturer for the IEEE Information Theory Society (2020-2021). He is the recipient of the IEEE Communications Society Communication Theory Technical Committee (CTTC) Early Achievement Award in 2017, a Starting Grant of the European Research Council (ERC) in 2016, the IEEE Communications Society Best Young Researcher Award for the Europe, Middle East, and Africa Region in 2014, the Best Paper Award at the 2016 IEEE Wireless Communications and Networking Conference (WCNC), and the Best Student Paper Awards at the 2018 IEEE Wireless Communications and Networking Conference (WCNC) and the 2007 IEEE International Symposium on Information Theory (ISIT). He was the General Co-chair of the 2019 London Symposium on Information Theory, the 2018 International ITG Workshop on Smart Antennas, and the 2016 IEEE Information Theory Workshop.