Reliability Criteria in
Information Theory
and in Statistical
Hypothesis Testing
Evgueni A. Haroutunian
National Academy of Sciences of Armenia
Republic of Armenia
eghishe@sci.am
Mariam E. Haroutunian
National Academy of Sciences of Armenia
Republic of Armenia
armar@ipia.sci.am
Ashot N. Harutyunyan
Universität Duisburg-Essen
Germany
ashot@iem.uni-due.de
Boston – Delft
Foundations and Trends® in Communications and Information Theory
ISBN: 978-1-60198-046-5
© 2008 E. A. Haroutunian, M. E. Haroutunian and A. N. Harutyunyan
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, mechanical, photocopying, recording
or otherwise, without prior written permission of the publishers.
Photocopying. In the USA: This journal is registered at the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users registered with the Copyright Clearance Center (CCC). The 'services' for users can be found on the internet at: www.copyright.com
For those organizations that have been granted a photocopy license, a separate system of payment has been arranged. Authorization does not extend to other kinds of copying, such as that for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. In the rest of the world: Permission to photocopy must be obtained from the copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA; Tel. +1-781-871-0245; www.nowpublishers.com; sales@nowpublishers.com
now Publishers Inc. has an exclusive license to publish this material worldwide. Permission
to use this content must be obtained from the copyright license holder. Please apply to now
Publishers, PO Box 179, 2600 AD Delft, The Netherlands, www.nowpublishers.com; e-mail:
sales@nowpublishers.com
Foundations and Trends® in Communications and Information Theory
Volume 4 Issue 2–3, 2007
Editorial Board
Editor-in-Chief:
Sergio Verdú
Department of Electrical Engineering
Princeton University
Princeton, New Jersey 08544
Editors
Evgueni A. Haroutunian¹, Mariam E. Haroutunian² and Ashot N. Harutyunyan³
¹ Inst. for Informatics and Automation Problems, National Academy of Sciences of Armenia, Yerevan, Republic of Armenia, eghishe@sci.am
² Inst. for Informatics and Automation Problems, National Academy of Sciences of Armenia, Yerevan, Republic of Armenia, armar@ipia.sci.am
³ AvH Fellow, Inst. für Experimentelle Mathematik, Universität Duisburg-Essen, Essen, Germany, ashot@iem.uni-due.de
Abstract
This survey is devoted to one of the central problems of Information Theory: the determination of the interdependence between coding rate and error probability exponent for different information transmission systems. The overview deals with memoryless systems over finite alphabets. It presents material complementary to the contents of the most remarkable books in Information Theory by Feinstein, Fano, Wolfowitz, Gallager, Csiszár and Körner, Kolesnik and Poltyrev, Blahut, Cover and Thomas, and of the papers by Dobrushin, Gelfand and Prelov.
We briefly formulate the fundamental notions and results of Shannon theory on reliable transmission via coding and give a survey of results obtained in the last two to three decades by the authors, their colleagues, and other researchers. The paper is written with the goal of making the theory of rate-reliability accessible to a broader circle of readers. We regard this concept as useful for advancing the solution of the noted problem, in parallel with the elaboration of the notion of reliability-reliability dependence in statistical hypothesis testing and identification.
Preface ix
1 Introduction 1
3 Multiuser Channels 33
3.1 Two-Way Channels 33
3.2 Interference Channels 38
3.3 Broadcast Channels 43
3.4 Multiple-Access Channels 47
References 157
1
Introduction
so, if C0 > 0; in the other case C(E) = 0 when E is large enough). So, by analogy with the definition of the capacity, this characteristic of the channel may be called E-capacity. On the other hand, the name rate-reliability function is also logical. One of the advantages of our approach is its convenience in studying the optimal rates of source codes that ensure a given exponential decrease of the probability of exceeding a given distortion level in the restoration of messages. This is the rate-reliability-distortion function R(E, ∆, P), the inverse of the exponent function E(R, ∆, P) of Marton [171]. The name thus shows which dependence of characteristics is under study. Later on, it is also possible to consider other arguments, for example, coding rates at the other inputs of a channel or source if their number is greater than one. This makes the theory better proportioned and more comprehensible.
Concerning methods for constructing the bounds, it turns out that Shannon's random coding method [191] of proving the existence of codes with definite properties can be applied with the same success to the study of the rate-reliability function. For the deduction of upper bounds of the converse coding theorem type (the so-called sphere packing bounds), E. Haroutunian proposed a simple combinatorial method [98, 102], which can be applied to various systems. This method is based on the proof of the strong converse coding theorem, as in the method put forth in [99] and used by other authors [35, 51], and in [152] for the deduction of the sphere packing bound for the reliability function. Moreover, the upper bound of C(E), in the limit E → ∞, becomes an upper bound for the zero-error capacity C0.
We note the following practically useful circumstance: comparison of the analytical form of the sphere packing bound for C(E) with the expression for the capacity C in some cases allows us to write down formally the bound for each system for which the achievable rates region (capacity) is known. In rate-reliability-distortion theory, an advantage of the approach is the technical ease of treating the coding rate as a function of distortion and error exponent, which allows the results from the rate-reliability-distortion area to be converted readily into rate-distortion ones by looking at the extremal values of the reliability, e.g., E → 0, E → ∞. This fact is especially important when one deals with a multidimensional situation. Having solved the problem
1.3 Notations for Measures of Information and Some Identities
joint PD of RV X and Y be
P ∘ V ≜ {P ∘ V(x, y) = P(x)V(y|x), x ∈ X, y ∈ Y},
and PD of RV Y be
PV ≜ {PV(y) = Σ_{x∈X} P(x)V(y|x), y ∈ Y}.
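The two definitions above can be sketched numerically; the following is an illustrative computation (the PD and matrix values are made up, not taken from the text) of the joint PD P ∘ V and the output PD PV.

```python
def joint_pd(P, V):
    """P∘V(x, y) = P(x) · V(y|x)."""
    return {(x, y): P[x] * V[x][y] for x in P for y in V[x]}

def output_pd(P, V):
    """PV(y) = Σ_x P(x) · V(y|x)."""
    PV = {}
    for x in P:
        for y, v in V[x].items():
            PV[y] = PV.get(y, 0.0) + P[x] * v
    return PV

P = {0: 0.4, 1: 0.6}                  # input PD on X = {0, 1}
V = {0: {0: 0.9, 1: 0.1},             # stochastic matrix V(y|x)
     1: {0: 0.2, 1: 0.8}}

PV = output_pd(P, V)
# The joint PD sums to 1 and its Y-marginal is exactly PV.
assert abs(sum(joint_pd(P, V).values()) - 1.0) < 1e-12
```

The dictionaries stand in for the finite alphabets X and Y; any finite index sets work the same way.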
W : X → Y.
W^N : X^N → Y^N,
2
E-capacity of the Discrete Memoryless Channel
We consider two versions of error probability of the code (f, g): the maximal probability of error
e(f, g, N, W) ≜ max_{m∈M} e(m),   (2.3)
e(M, N, W) ≜ min_{(f,g)} e(f, g, N, W),
where the minimum is taken over all codes (f, g) of volume M, and the average probability of error for equiprobable messages
ē(f, g, N, W) ≜ (1/M) Σ_{m∈M} e(m),   (2.4)
with
ē(M, N, W) ≜ min_{(f,g)} ē(f, g, N, W),
ē(f, g, N, W) ≤ e(f, g, N, W).
The same result is valid for the case of maximal probability of error.
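The relation ē ≤ e between the two criteria can be checked on a toy example: a length-3 repetition code over a BSC with majority decoding (all parameters below are illustrative, not from the text).

```python
from itertools import product

w = 0.1                                   # BSC crossover probability
codewords = {0: (0, 0, 0), 1: (1, 1, 1)}  # encoder f: repetition code

def channel_prob(x, y):
    """W^N(y|x) for the memoryless BSC."""
    p = 1.0
    for xi, yi in zip(x, y):
        p *= w if xi != yi else 1 - w
    return p

def decode(y):
    """Majority-vote decoder g."""
    return 1 if sum(y) >= 2 else 0

# e(m): probability that the decoded message differs from the sent m
e = {m: sum(channel_prob(x, y) for y in product((0, 1), repeat=3)
            if decode(y) != m)
     for m, x in codewords.items()}

e_max = max(e.values())                   # maximal error probability, as in (2.3)
e_avg = sum(e.values()) / len(e)          # average error probability, as in (2.4)
assert e_avg <= e_max + 1e-12
```

Here the code is symmetric, so the two criteria coincide; in general the average can be strictly smaller than the maximum.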
One of the first proofs of Shannon’s theorem was given in the
works of Feinstein [74, 75], where it was also established that for
R < C(W ) the probability of error tends to zero exponentially with
growing N .
For a given channel W and given rate R, the optimal exponent E(R) of the exponential decrease of the error probability was first considered by Shannon [195], who called it the reliability function.
Blahut [34]:
Esp(R, P, W) ≜ min_{V: I_{P,V}(X∧Y) ≤ R} D(V‖W|P),   (2.8)
Esp(R, W) ≜ max_P Esp(R, P, W).   (2.9)
Theorem 2.1. For any DMC W and all R ∈ (0, C(W)) the following inequality takes place:
E(R, W) ≤ Esp(R, W).
The proofs are in [35, 51, 99, 152], see also [178].
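The sphere packing exponent (2.8) can be evaluated numerically. The sketch below does this for a BSC with uniform input, restricting the minimization to binary symmetric test channels V (parameter v), for which D(V‖W|P) reduces to the binary divergence d(v‖w) and I_{P,V}(X∧Y) to 1 − h(v); the crossover value is illustrative.

```python
import math

def h(v):
    """Binary entropy in bits."""
    return -v * math.log2(v) - (1 - v) * math.log2(1 - v)

def d(v, w):
    """Binary KL divergence d(v||w) in bits."""
    return v * math.log2(v / w) + (1 - v) * math.log2((1 - v) / (1 - w))

def Esp_bsc(R, w, grid=100000):
    """min of d(v||w) over symmetric test channels v with 1 − h(v) ≤ R."""
    best = float('inf')
    for i in range(1, grid):
        v = i / grid
        if 1 - h(v) <= R:
            best = min(best, d(v, w))
    return best

# At R equal to the capacity 1 − h(w) the minimizing v is w itself,
# so the exponent vanishes; below capacity it is strictly positive.
w = 0.1
assert Esp_bsc(1 - h(w), w) < 1e-3
assert Esp_bsc(0.4, w) > 0
```

The grid search is a crude stand-in for the exact variational solution, but it reproduces the qualitative behavior: Esp(R) decreases to 0 as R grows to C(W).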
A form of writing of the random coding bound of the reliability function Er(R, W), similar to (2.8) and (2.9), was introduced by Csiszár and Körner [51] and defined as
Er(R, P, W) ≜ min_V (D(V‖W|P) + |I_{P,V}(X ∧ Y) − R|⁺),
Ex(R, P, W) = min_{P_{X,X̃}: P_X = P_{X̃} = P, I_{P,V}(X∧X̃) ≤ R} [E d_B(X, X̃) + I(X ∧ X̃) − R],
where d_B(x, x̃) is the Bhattacharyya distance [35, 50] between x and x̃, both from X.
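The Bhattacharyya distance that enters the expurgated bound can be sketched numerically as d_B(x, x̃) = −log Σ_y √(W(y|x)·W(y|x̃)); the channel values below are illustrative.

```python
import math

def bhattacharyya(W, x, xt):
    """d_B(x, x̃) = −log Σ_y sqrt(W(y|x) · W(y|x̃))."""
    s = sum(math.sqrt(W[x][y] * W[xt][y]) for y in W[x])
    return -math.log(s)

W = {0: {0: 0.9, 1: 0.1},      # a BSC with crossover 0.1
     1: {0: 0.1, 1: 0.9}}

dB = bhattacharyya(W, 0, 1)
# d_B(x, x) = 0, and distinct inputs give a strictly positive distance.
assert bhattacharyya(W, 0, 0) < 1e-12 and dB > 0
```

For this BSC, d_B(0, 1) = −log(2√(0.9·0.1)) = −log 0.6.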
Theorem 2.2. For any DMC W and R ∈ (0, C(W)) the following inequality holds:
E(R, W) ≥ Er(R, W).
The smallest value of R, at which the convex curve Esp (R, W ) meets
its supporting line of slope −1, is called the critical rate and denoted
by Rcr .
The comparison of Esp (R, W ) and Er (R, W ) results in
Corollary 2.1. For Rcr ≤ R < C(W ) the reliability function of DMC
W is found exactly: E(R, W ) = Esp (R, W ) = Er (R, W ).
and with its help the above-mentioned refinement of the upper bound for the reliability function was obtained.
The graph of the typical behavior of the reliability function bounds
for the DMC is given in Figure 2.2.
In [100] Haroutunian proved that the reliability function E(R, W )
of DMC is a continuous and strictly monotone function of R for all
R > C0 (W ). This fact follows from the inequality
E((R1 + R2)/2, W) ≤ (Esp(R1, W) + E(R2, W))/2,   (2.12)
which is valid for all R1 ≤ R2 < C(W). The inequality (2.12) in turn follows from the inequality (2.11).
and denoted by Rsp (E, W ), of the E-capacity C(E, W ) for the average
error probability. This bound, as the name shows, is the analogue of
the sphere packing bound (2.8), (2.9) for E(R, W ).
Let V : X → Y be a stochastic matrix. Consider the following functions:
Rsp(P, E, W) ≜ min_{V: D(V‖W|P) ≤ E} I_{P,V}(X ∧ Y),
Rsp(E, W) ≜ max_P Rsp(P, E, W).   (2.15)
Theorem 2.4. For DMC W and E > 0 the following inequalities hold.
Proof. Let E and δ be given such that E > δ > 0. Let the code (f, g) of length N be defined, let R be the rate of the code, and let the average error probability satisfy the condition
≤ M exp{−N (E − δ)}.
or
Σ_{m: f(m)∈T_{P*}^N(X)} ( |T_{P*,V}^N(Y|f(m))| − M exp{−N(E − δ)} / exp{−N(D(V‖W|P*) + H_{P*,V}(Y|X))} )
≤ Σ_{m: f(m)∈T_{P*}^N(X)} |T_{P*,V}^N(Y|f(m)) ∩ g^{−1}(m)|,

D(V‖W|P*) ≤ E − δ.
Example 2.1. We shall calculate Rsp(E, W) for the binary symmetric channel (BSC). Consider the BSC W with
W(0′|1) = W(1′|0) = w1 > 0, W(0′|0) = W(1′|1) = w2 > 0.
It is clear that w1 + w2 = 1, v1 + v2 = 1.
The maximal value of the mutual information IP,V (X ∧ Y ) in the
definition of Rsp (E, W ) is obtained when p∗ (0) = p∗ (1) = 1/2 because
of symmetry of the channel, therefore
IP ∗ ,V (X ∧ Y ) = 1 + v1 log v1 + v2 log v2 .
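With the uniform input of Example 2.1, Rsp(E, W) for the BSC can be evaluated by a grid search over symmetric test channels: I_{P*,V}(X∧Y) = 1 + v1 log v1 + v2 log v2 = 1 − h(v1), minimized subject to d(v1‖w1) ≤ E. The crossover value below is illustrative.

```python
import math

def h(v):
    """Binary entropy in bits."""
    return -v * math.log2(v) - (1 - v) * math.log2(1 - v)

def d(v, w):
    """Binary KL divergence d(v||w) in bits."""
    return v * math.log2(v / w) + (1 - v) * math.log2((1 - v) / (1 - w))

def Rsp_bsc(E, w1, grid=100000):
    """min of 1 − h(v1) over symmetric test channels with d(v1||w1) ≤ E."""
    best = 1.0
    for i in range(1, grid):
        v1 = i / grid
        if d(v1, w1) <= E:
            best = min(best, 1 - h(v1))
    return best

w1 = 0.1
# As E → 0 the only feasible v1 is w1, so Rsp tends to the capacity 1 − h(w1);
# large E lets v1 reach 1/2, driving the rate down to 0.
assert abs(Rsp_bsc(1e-9, w1) - (1 - h(w1))) < 1e-3
assert Rsp_bsc(1.0, w1) < 1e-6
```

This illustrates the two limiting regimes of the E-capacity noted in the Introduction: C(E) → C as E → 0, and C(E) = 0 for E large enough.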
Theorem 2.5. For DMC W and all E > 0 the following bound on E-capacity holds:
Rr(E, W) ≤ C(E, W) ≤ C̄(E, W).
is valid for any M. It remains to prove Lemma 2.6 for V′ such that D(V′‖W|P) < E. Denote
Am(C, V, V′) ≜ T_{P,V}^N(Y|x(m)) ∩ ∪_{m′≠m} T_{P,V′}^N(Y|x(m′))
and
Am(C) ≜ (N + 1)^{|X||Y|} Σ_V Σ_{V′: D(V′‖W|P)<E} |Am(C, V, V′)| × exp{N(E − D(V′‖W|P) − H_{P,V}(Y|X))}.
On account of (1.4), C satisfies (2.21) for every m, V, V′, if
Am(C) ≤ 1, m = 1, …, M.   (2.22)
From (2.24) for any V 0 such that D(V 0 kW |P ) < E it follows that
and we get
or
and
Denote
D(P) ≜ {V, V′ : (2.25) is valid}.
where d_B(X, X̃) is the Bhattacharyya distance (2.10) and
Rx(E, W) ≜ max_P Rx(P, E, W).
Theorem 2.7. For DMC W and any E > 0 the following bound holds.
In the next theorem the region where the upper and the lower bounds coincide is pointed out. Let
Ecr(P, W) ≜ min { E : ∂Rsp(P, E, W)/∂E ≥ −1 }.
Theorem 2.8. For DMC W and PD P , for E ∈ [0, Ecr (P, W )] we have
and
Lemma 2.10. For any type P and any r ∈ (0, |T_P^N(X)|) a set C exists such that C ⊂ T_P^N(X), |C| ≥ r, and for any x̃ ∈ C and matrix V : X → X different from the identity matrix, the following inequality holds:
|T_{P,V}^N(X|x̃) ∩ C| ≤ r |T_{P,V}^N(X|x̃)| exp{−N(H_P(X) − δ_N)},   (2.27)
gα : Y N → MN ,
M
Here Ve = {Ve (y|e e ∈ X , y ∈ Y} is a matrix different from V but guar-
x), x
anteeing that
X X
P (x)V (y|x) = x)Ve (y|x), y ∈ Y,
P (e (2.28)
x∈X e∈X
x
or equivalently, P V = P Ve .
+ |D(V‖W|P) − E|⁺,
Rα,x(P, E, W) ≜ min_{V,Ṽ: Ṽ ≺α V} { I_{P,V}(X ∧ X̃) + I_{P,V,Ṽ}(Y ∧ X̃|X) + |D(V‖W|P) − E|⁺ }.
Then
moreover, for
Rr(P, E, W) = min_{E′: E′ ≤ E} |Rsp(P, E′, W) + E′ − E|⁺.
Proof. Since the function Rsp(P, E, W) is convex in E, for values of E less than Ecr(P, W) the slope of the tangent is less than −1, and for E greater than Ecr(P, W) it is equal to or greater than −1. In other words,
(Rsp(P, E, W) − Rsp(P, E′, W)) / (E − E′) < −1, when E′ < E ≤ Ecr,
from where
and consequently
We obtain from this equality and Lemma 2.16 the statement of the
lemma for the case E ≤ Ecr (P, W ). Now if Ecr (P, W ) ≤ E 0 < E then
and consequently
Again using Lemma 2.16 and the latter equality we obtain that for the case E ≥ Ecr
Rr(P, E, W) = min { min_{E′: Ecr ≤ E′ < E} |Rsp(P, E′, W) + E′ − E|⁺, min_{E′: E′ ≤ Ecr} |Rsp(P, E′, W) + E′ − E|⁺ }
= |Rsp(P, Ecr, W) + Ecr − E|⁺.
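The geometric identity just derived can be checked numerically. The sketch below uses a convex, decreasing toy curve Rsp(E′) = (1 − √E′)² on [0, 1] as a stand-in for Rsp(P, E′, W) (not a real channel); its tangent has slope −1 at Ecr = 1/4, and for E ≥ Ecr the minimum of |Rsp(E′) + E′ − E|⁺ over E′ ≤ E is attained at Ecr.

```python
def Rsp(Ep):
    """Toy convex decreasing curve standing in for Rsp(P, E', W)."""
    return (1 - Ep ** 0.5) ** 2

def Rr(E, grid=100000):
    """min over E' in (0, E] of |Rsp(E') + E' − E|⁺, by grid search."""
    best = float('inf')
    for i in range(1, grid + 1):
        Ep = E * i / grid
        best = min(best, max(Rsp(Ep) + Ep - E, 0.0))
    return best

Ecr = 0.25                          # slope of Rsp equals −1 here
E = 0.4                             # any E ≥ Ecr
expected = max(Rsp(Ecr) + Ecr - E, 0.0)
assert abs(Rr(E) - expected) < 1e-4

# For E ≤ Ecr the minimum sits at E' = E, recovering Rr(E) = Rsp(E),
# in agreement with Remark 2.4.
assert abs(Rr(0.2) - Rsp(0.2)) < 1e-4
```

The same check works for any convex decreasing Rsp; only the location of Ecr changes.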
Remark 2.4. In the interval (0, Ecr (P, W )] the functions R(P, E, W )
and R(E, W ) are exactly determined by
and
3
Multiuser Channels
Denote
Wi(yi|x1, x2) = Σ_{y_{3−i}} W(y1, y2|x1, x2), i = 1, 2.
then
ei(m1, m2) = Wi^N{Yi^N − gi^{−1}(mi|m_{3−i}) | f(m1, m2)}, i = 1, 2,   (3.1)
The region of all E-achievable rate pairs is called the E-capacity region
for average error probability and denoted by C(E, WT ).
The RTWC as well as the general TWC were first investigated
by Shannon [196], who obtained the capacity region C(WT ) of the
RTWC. The capacity region of the general TWC has not been found
up to now. Important results relative to various models of two-way
channels were obtained by authors of works [1, 2, 3, 61, 62, 144,
188, 228]. In particular, Dueck [61] demonstrated that the capacity
regions of TWC for average and maximal error probabilities do not
coincide.
Here the outer and inner bounds for C(E, WT) are constructed. Unfortunately, these bounds do not coincide, and closing this gap remains an open problem.
Consider the following PDs:
P = {P(x1, x2), x1 ∈ X1, x2 ∈ X2},
Pi = {Pi(xi) = Σ_{x_{3−i}} P(x1, x2), xi ∈ Xi}, i = 1, 2,
and
Rsp(E, WT) = co( ∪_P Rsp(P, E, WT) ).
R1 ≤ min_{P,V1: D(P∘V1‖P*∘W1) ≤ E1} |I_{P,V1}(X1 ∧ X2, Y1) + D(P∘V1‖P*∘W1) − E1|⁺,   (3.6)
R2 ≤ min_{P,V2: D(P∘V2‖P*∘W2) ≤ E2} |I_{P,V2}(X2 ∧ X1, Y2) + D(P∘V2‖P*∘W2) − E2|⁺,   (3.7)
and
Rr(E, WT) = co( ∪_{P*} Rr(P*, E, WT) ).
where
Fig. 3.2 (a) The outer bound of the capacity region. (b) The inner bound on the capacity
region.
Fig. 3.3 (a) The inner bound of E-capacity. (b) The outer bound of E-capacity.
[25, 44, 66, 90, 96, 169, 189, 203] but the capacity region is found only
in particular cases.
The main definitions are the same as for TWC except the error
probabilities of messages m1 and m2 , which are
ei (m1 , m2 ) = WiN {YiN − gi−1 (mi )|f (m1 , m2 )}, i = 1, 2,
because the decoding functions are gi : YiN → Mi , i = 1, 2.
In [129] the following theorem is proved.
R1 ≤ min_{P,V1: D(P∘V1‖P*∘W1) ≤ E1} |I_{P,V1}(Y1 ∧ X1) + D(P∘V1‖P*∘W1) − E1|⁺,
R2 ≤ min_{P,V2: D(P∘V2‖P*∘W2) ≤ E2} |I_{P,V2}(Y2 ∧ X2) + D(P∘V2‖P*∘W2) − E2|⁺.
Rr(E, WI) = ∪_{P*} Rr(P*, E, WI)
is the random coding bound of the E-capacity region in the case of average error probability for the GIFC:
Rr(E, WI) ⊆ C(E, WI).
Lemma 3.3. For any E1 > 0, E2 > 0, δ ∈ (0, min(E1, E2)) and type P* on X1 × X2, if
0 ≤ (1/N) log M1 ≤ min_{P,V1: D(P∘V1‖P*∘W1) ≤ E1} |I_{P,V1}(Y1 ∧ X1) + D(P∘V1‖P*∘W1) − E1|⁺ − δ,
0 ≤ (1/N) log M2 ≤ min_{P,V2: D(P∘V2‖P*∘W2) ≤ E2} |I_{P,V2}(Y2 ∧ X2) + D(P∘V2‖P*∘W2) − E2|⁺ − δ,
then there exist M1 not necessarily distinct vectors x1(m1) ∈ T_{P1}(X1) and M2 vectors x2(m2) ∈ T_{P2}(X2) such that for all PDs P, P′ on X1 × X2, Vi : (X1 × X2) → Yi, Vi′ : (X1 × X2) → Yi, i = 1, 2, for sufficiently large N the following inequalities take place:
Σ_{f(m1,m2)∈T_P(X1,X2)} |T_{P,V1}(Y1|f(m1, m2)) ∩ ∪_{m′2; m′1≠m1} T_{P′,V1′}(Y1|f(m′1, m′2))|
≤ exp{N H_{P,V1}(Y1|X1X2)} exp{−N|E1 − D(P′∘V1′‖P*∘W1)|⁺}
Now consider the situation when the second encoder learns from the first encoder the codeword that will be sent in the present block. This model, called the IFC with cribbing encoders, is given in Figure 3.5. In this case the second codeword depends on the choice of the first one: f2 : M2 × X1^N → X2^N, and the random coding bound of the E-capacity region for average error probability in Theorem 3.2 takes the following form:
Rr(P, E, W) = {(R1, R2) : R1 ≥ 0, R2 ≥ 0,
R1 ≤ min_{V1: D(V1‖W1|P) ≤ E1} |I_{P,V1}(Y1 ∧ X1) + D(V1‖W1|P) − E1|⁺,
R2 ≤ min_{V2: D(V2‖W2|P) ≤ E2} |I_{P,V2}(Y2 ∧ X2) − I_P(X1 ∧ X2) + D(V2‖W2|P) − E2|⁺},
Rr(E, W) = ∪_P Rr(P, E, W).
Lemma 3.4. For all E1 > 0, E2 > 0, δ ∈ (0, min(E1, E2)) and type P, if
(1/N) log M1 ≤ min_{V1: D(V1‖W1|P) ≤ E1} |I_{P,V1}(Y1 ∧ X1) + D(V1‖W1|P) − E1|⁺ − δ,
(1/N) log M2 ≤ min_{V2: D(V2‖W2|P) ≤ E2} |I_{P,V2}(Y2 ∧ X2) − I_P(X1 ∧ X2) + D(V2‖W2|P) − E2|⁺ − δ,
then there exist M1 not necessarily distinct vectors x1(m1) ∈ T_{P1}(X1), and for each x1(m1) ∈ T_{P1}(X1) there exist M2 not necessarily distinct vectors x2(m2, x1(m1)) ∈ T_{P2}(X2|x1(m1)), such that for all Vi : (X1 × X2) → Yi, Vi′ : (X1 × X2) → Yi, i = 1, 2, for sufficiently large N
Here we omit the proofs of Theorem 3.2 and Lemmas 3.3 and 3.4, leaving them as exercises for the reader.
and average
ēi(f, gi, N, WB) = (1/(M0 M1 M2)) Σ_{m0,m1,m2} ei(m0, m1, m2), i = 1, 2,   (3.9)
error probabilities.
Let E = (E1, E2), Ei > 0, i = 1, 2. Nonnegative real numbers R0, R1, R2 are called an E-achievable rates triple for the GBC if for any δ > 0 and sufficiently large N there exists a code such that
(1/N) log M0 Mi ≥ R0 + Ri − δ, i = 1, 2,   (3.10)
and
Q ∘ P ∘ Vi = {Q ∘ P ∘ Vi(u0, u1, u2, x, yi) = Q(u0, u1, u2) P(x|u0, u1, u2) Vi(yi|x), i = 1, 2},
where
Q = {Q(u0, u1, u2), ui ∈ Ui, i = 0, 1, 2},
P = {P(x|u0, u1, u2), x ∈ X, ui ∈ Ui, i = 0, 1, 2},
Vi = {Vi(yi|x), x ∈ X, yi ∈ Yi}, i = 1, 2.
Rir(Q, P, E, WB) = {(R0, R1, R2) : inequalities (3.13), (3.14), (3.15) take place for some (U0, U1, U2) → X → Yi, i = 1, 2},
Rr(Q, P, E, WB) = ∪_{i=1,2} Rir(Q, P, E, WB),   (3.16)
Rr(E, WB) = ∪_{QP ∈ QP(U0×U1×U2×X)} Rr(Q, P, E, WB).
Theorem 3.5. For all E1 > 0, E2 > 0 the region Rr(E, WB) is an inner estimate for the E-capacity region of the BC:
Rr(E, WB) ⊆ C(E, WB) ⊆ C̄(E, WB),
or, in other words, any rate triple (R0, R1, R2) ∈ Rr(E, WB) is E-achievable for the BC WB.
where
Rr(Q, P, WB) = {(R0, R1, R2) : for some (U0, U1, U2) → X → Yi, i = 1, 2,
0 ≤ R0 ≤ min {I_{Q,P,W1}(Y1 ∧ U0); I_{Q,P,W2}(Y2 ∧ U0)},
0 ≤ R0 + Ri ≤ I_{Q,P,Wi}(Yi ∧ U0 Ui), i = 1, 2,
R0 + R1 + R2 ≤ min {I_{Q,P,W1}(Y1 ∧ U0); I_{Q,P,W2}(Y2 ∧ U0)} + I_{Q,P,W1}(Y1 ∧ U1|U0) + I_{Q,P,W2}(Y2 ∧ U2|U0) − I_Q(U1 ∧ U2|U0)}.
WM = {W (y|x1 , x2 ), x1 ∈ X1 , x2 ∈ X2 , y ∈ Y},
where X1 and X2 are the finite alphabets of the first and the second
inputs of the channel and Y is the finite output alphabet.
There exist various configurations of the MAC [2, 185, 199, 217].
The most general model of the MAC, the MAC with correlated encoder
inputs, was first studied by Slepian and Wolf [199] and then by Han
[89]. Three independent sources create messages to be transmitted by
two encoders (Figure 3.7). One of the sources is connected with both
encoders and each of the two others is connected with only one of the
encoders.
Let M0 = {1, 2, …, M0}, M1 = {1, 2, …, M1} and M2 = {1, 2, …, M2} be the message sets of the corresponding sources. The code of length N for this model is a collection of mappings (f1, f2, g), where f1 : M0 × M1 → X1^N, f2 : M0 × M2 → X2^N are the encodings and g : Y^N → M0 × M1 × M2 is the decoding. The numbers N^{−1} log Mi, i = 0, 1, 2, are called code rates. Denote
then
Dueck [61] has shown that in general the maximal error capacity region of the MAC is smaller than the corresponding average error capacity region. Determination of the maximal error capacity region of the MAC in various communication situations is still an open problem.
In [199] the achievable rates region of the MAC with correlated sources was found and the random coding bound for the reliability function was constructed, and in [101] a sphere packing bound was obtained.
Various bounds for error probability exponents and related results
have been derived also in [64, 65, 81, 166, 182], E-capacity region was
investigated in [119, 132].
To formulate the results let us introduce an auxiliary RV U with values in a finite set U. Let RVs U, X1, X2, Y with values in alphabets U, X1, X2, Y, respectively, form the following Markov chain: U → (X1, X2) → Y, and be given by the following PDs:
and
Rsp(E, WM) = co( ∪_P Rsp(P, E, WM) ).
Theorem 3.6. For all E > 0, for MAC with correlated sources
Rr (E, WM ) ⊆ C(E, WM ) ⊆ Rsp (E, WM ).
but differ by the PDs P and P*. The inner bound coincides with the capacity region:
Rr(P*, WM) = {(R0, R1, R2) :
0 ≤ Ri ≤ I_{P*,W}(Xi ∧ Y|X_{3−i}, U), i = 1, 2,
R1 + R2 ≤ I_{P*,W}(X1, X2 ∧ Y|U),
R0 + R1 + R2 ≤ I_{P*,W}(X1, X2 ∧ Y)},
obtained in [199], where it was also proved that in this case it is enough to consider |U| ≤ |Y| + 3.
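The mutual-information bounds defining this region can be evaluated directly for a small hand-picked example. The sketch below builds a joint PD over (U, X1, X2, Y) with the Markov structure U → (X1, X2) → Y and computes the conditional mutual informations; all numeric values are illustrative, not from the text.

```python
import math
from collections import defaultdict

def marginal(p, idx):
    """Marginal of a tuple-keyed joint PD onto the positions in idx."""
    m = defaultdict(float)
    for k, v in p.items():
        m[tuple(k[i] for i in idx)] += v
    return m

def cond_mi(p, A, B, C):
    """I(A ∧ B | C) in bits for a joint PD p over tuple-indexed RVs."""
    pac, pbc = marginal(p, A + C), marginal(p, B + C)
    pabc, pc = marginal(p, A + B + C), marginal(p, C)
    total = 0.0
    for k, v in p.items():
        if v == 0:
            continue
        a = tuple(k[i] for i in A)
        b = tuple(k[i] for i in B)
        c = tuple(k[i] for i in C)
        total += v * math.log2(pabc[a + b + c] * pc[c] / (pac[a + c] * pbc[b + c]))
    return total

# joint PD over (u, x1, x2, y): P*(u) · P*(x1|u) · P*(x2|u) · W(y|x1, x2)
Pu = {0: 0.5, 1: 0.5}
Px = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}       # both encoders track u
W = {(0, 0): {0: 0.95, 1: 0.05}, (0, 1): {0: 0.5, 1: 0.5},
     (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {0: 0.05, 1: 0.95}}

p = {(u, x1, x2, y): Pu[u] * Px[u][x1] * Px[u][x2] * W[(x1, x2)][y]
     for u in Pu for x1 in (0, 1) for x2 in (0, 1) for y in (0, 1)}

# indices: 0 = U, 1 = X1, 2 = X2, 3 = Y
R1 = cond_mi(p, (1,), (3,), (2, 0))       # I(X1 ∧ Y | X2, U)
Rsum = cond_mi(p, (1, 2), (3,), (0,))     # I(X1, X2 ∧ Y | U)
Rtot = cond_mi(p, (1, 2), (3,), ())       # I(X1, X2 ∧ Y)
# Under the Markov chain the bounds nest: R1 ≤ Rsum ≤ Rtot.
assert -1e-12 <= R1 <= Rsum + 1e-12 and Rsum <= Rtot + 1e-12
```

The nesting of the three quantities mirrors the shape of the region: the individual-rate constraint is tighter than the pairwise one, which is tighter than the total-sum one.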
The results for the following special cases can be obtained from the
general case.
Regular MAC. In the case with M0 = 1 (Figure 3.8) we have the
classical MAC studied by Ahlswede [2, 3] and van der Meulen [209].
Ahlswede obtained a simple characterization of the capacity region.
+ D(P ∘ V‖P* ∘ W) − E|⁺,
R1 + R2 ≤ min_{P,V: D(P∘V‖P*∘W) ≤ E} |I_{P,V}(X1, X2 ∧ Y|U) + D(P ∘ V‖P* ∘ W) − E|⁺,
Rsp(P, E, WM) = {(R1, R2) : 0 ≤ Ri ≤ min_{V: D(V‖W|P) ≤ E} I_{P,V}(Xi ∧ Y|X_{3−i}), i = 1, 2,
R1 + R2 ≤ min_{V: D(V‖W|P) ≤ E} I_{P,V}(X1, X2 ∧ Y)},
and
Rsp(P, E, WM) = {(R0, R2) : 0 ≤ R2 ≤ min_{V: D(V‖W|P) ≤ E} I_{P,V}(X2 ∧ Y|X1),
R0 + R2 ≤ min_{V: D(V‖W|P) ≤ E} I_{P,V}(X1, X2 ∧ Y)}.
In this case, when E → 0, the outer and inner bounds are equal and coincide with the capacity region of the asymmetric MAC [101, 212].
MAC with cribbing encoders. Willems [217] and Willems and van der Meulen [219] investigated the MAC with cribbing encoders in various communication situations and established the corresponding capacity regions. We shall consider only one of these configurations (Figure 3.10), investigated by van der Meulen [211], where the first encoder has information about the codeword produced by the second encoder.
Theorem 3.9. For the MAC with cribbing encoders the outer and inner bounds are:
Rr(P, E, WM) = {(R1, R2) : R1 ≥ 0, R2 ≥ 0,
Ri ≤ min_{V: D(V‖W|P) ≤ E} |I_{P,V}(Xi ∧ Y|X_{3−i}) + D(V‖W|P) − E|⁺, i = 1, 2,
R1 + R2 ≤ min_{V: D(V‖W|P) ≤ E} |I_{P,V}(X1, X2 ∧ Y) + D(V‖W|P) − E|⁺},
and
Rsp(P, E, WM) = {(R1, R2) : R1 ≥ 0, R2 ≥ 0,
R1 ≤ min_{i=1,2} min_{P,Vi: D(P∘Vi‖P*∘Wi) ≤ Ei}
and
Rr(E, WM) = ∪_{P*} Rr(P*, E, WM),
Rr(E, WM) ⊆ C(E, WM).
4
E-capacity of Varying Channels
DCC were first studied by Blackwell et al. [33] and Dobrushin [56].
The capacity was found by Wolfowitz [221], who has shown that the
knowledge of the state s at the decoder does not improve the asymptotic
characteristics of the channel. So it is enough to study the channel in
two cases.
As for the DMC, the capacities of the compound channel for average and maximal error probabilities are the same.
In the book by Csiszár and Körner [51] the random coding bound
and the sphere packing bound for reliability function of DCC are given.
We shall formulate the sphere packing, random coding, and expurgated
bounds of E-capacity of DCC.
Let us denote by C(E, WC ) the E-capacity of DCC in the case,
when the state s is unknown at the encoder and decoder. In the case,
when s is known at the encoder and decoder, the E-capacity will be
denoted by Ĉ(E, WC ).
Let us introduce the following functions:
Rsp(E, WC) ≜ max_P min_{s∈S} min_{V: D(V‖Ws|P) ≤ E} I_{P,V}(X ∧ Y),
R̂sp(E, WC) ≜ min_{s∈S} max_P min_{V: D(V‖Ws|P) ≤ E} I_{P,V}(X ∧ Y),
and
Rx(E, WC) ≜ max_P min_{s∈S} min_{P_X = P_{X̃} = P} { I_{P,V}(X ∧ X̃) + |E d_{W,s}(X, X̃) − E|⁺ },
R̂x(E, WC) ≜ min_{s∈S} max_P min_{P_X = P_{X̃} = P} { I_{P,V}(X ∧ X̃) + |E d_{W,s}(X, X̃) − E|⁺ }.
Theorem 4.1. For any DCC and any E > 0 the following inequalities hold.
+ D(Q′ ∘ P ∘ V‖Q ∘ P ∘ W) − E|⁺.
The following theorem was proved by M. Haroutunian in [128, 131].
Theorem 4.2. For all E > 0, for CRP with states sequence known to
the sender the following inequalities are valid
Rr(E, WQ) ≤ C(E, WQ) ≤ C̄(E, WQ) ≤ Rsp(E, WQ).
Note that when E → 0 we obtain the upper and the lower bounds
for capacity of the channel WQ :
Rsp(WQ) ≜ max_P Rsp(Q, P, V),
Rr(WQ) ≜ max_P Rr(Q, P, V),
where Rr (WQ ) coincides with the capacity C(WQ ) of the CRP, obtained
by Gelfand and Pinsker [85]. They also proved that it is enough to
consider RV U with |U| ≤ |X | + |S|.
For the model with states sequence known at the encoder and
decoder Rsp (E, WQ ) is the same, but upper and lower bounds coincide
for small E, because
Rr(E, WQ) ≜ max_P min_{Q′,V: D(Q′∘P∘V‖Q∘P∘W) ≤ E} |I_{Q′,P,V}(Y ∧ X|S) + D(Q′ ∘ P ∘ V‖Q ∘ P ∘ W) − E|⁺,
The results for this and next two cases are published in [117].
If the state sequence is unknown at the encoder and decoder, let us take W*(y|x) = Σ_{s∈S} Q(s)W(y|x, s). Then the bounds take the following form:
Rsp(E, WQ) ≜ max_P min_{V: D(V‖W*|P) ≤ E} I_{P,V}(Y ∧ X),
Rr(E, WQ) ≜ max_P min_{V: D(V‖W*|P) ≤ E} |I_{P,V}(Y ∧ X) + D(V‖W*|P) − E|⁺,
In the case when the state sequence is known at the decoder and unknown at the encoder the following bounds are valid:
Rsp(E, WQ) ≜ max_P min_{Q′,V: D(Q′∘V‖Q∘W|P) ≤ E} I_{Q′,P,V}(Y, S ∧ X),
Rr(E, WQ) ≜ max_P min_{Q′,V: D(Q′∘V‖Q∘W|P) ≤ E} |I_{Q′,P,V}(Y, S ∧ X) + D(Q′ ∘ V‖Q ∘ W|P) − E|⁺,
where
R_sp^{P′}(E, W) ≜ inf_{Q∈P′} Rsp(E, WQ),
R_r^{P′}(E, W) ≜ inf_{Q∈P′} Rr(E, WQ).
f : M × S^N × K^N → X^N
is the encoder, mapping a host data block s, a message m, and side information k to a sequence x = f(s, m, k), which satisfies the following distortion constraint:
d1^N(s, f(s, m, k)) ≤ ∆1,
and
g : Y^N × K^N → M
is the decoding.
An attack channel, subject to distortion ∆2, satisfies the following condition:
Σ_{x∈X^N} Σ_{y∈Y^N} d2^N(x, y) A(y|x) p^N(x) ≤ ∆2.
The set of all attack channels under the condition of covert channel P ∈ P(Q, ∆1) and subject to distortion level ∆2 is denoted by A(Q, P, ∆2). The sets P(Q, ∆1) and A(Q, P, ∆2) are defined by linear inequality constraints and hence are convex.
The error probability of the message m averaged over all (s, k) ∈ S^N × K^N equals
e(f, g, N, m, Q, A) ≜ Σ_{(s,k)∈S^N×K^N} Q(s, k) A{Y^N − g^{−1}(m|k) | f(m, s, k)}.
then
Q = {Q(s), s ∈ S},
Pi* = {Pi*(xi|s), xi ∈ Xi}, i = 1, 2,
P* = {P1*(x1|s)P2*(x2|s), x1 ∈ X1, x2 ∈ X2},
P = {P(x1, x2|s), x1 ∈ X1, x2 ∈ X2},
with
Σ_{x_{3−i}} P(x1, x2|s) = Pi*(xi|s), i = 1, 2,
and joint PD
where
V = {V(y|x1, x2, s), s ∈ S, x1 ∈ X1, x2 ∈ X2, y ∈ Y}
with
Rr(P*, E, WR) ≜ {(R1, R2) :
0 ≤ R1 ≤ min_{Q′,P,V: D(Q′∘P∘V‖Q∘P*∘W) ≤ E} |I_{Q′,P,V}(X1 ∧ X2, Y|S) + D(Q′ ∘ P ∘ V‖Q ∘ P* ∘ W) − E|⁺,
0 ≤ R2 ≤ min_{Q′,P,V: D(Q′∘P∘V‖Q∘P*∘W) ≤ E} |I_{Q′,P,V}(X2 ∧ X1, Y|S) + D(Q′ ∘ P ∘ V‖Q ∘ P* ∘ W) − E|⁺,
R1 + R2 ≤ min_{Q′,P,V: D(Q′∘P∘V‖Q∘P*∘W) ≤ E} |I_{Q′,P,V}(X1, X2 ∧ Y|S) + I_{Q′,P}(X1 ∧ X2|S) + D(Q′ ∘ P ∘ V‖Q ∘ P* ∘ W) − E|⁺}.
where
Rsp(P, E, WR) ≜ {(R1, R2) :
0 ≤ R1 ≤ min_{Q′,V: D(Q′∘P∘V‖Q∘P∘W) ≤ E} I_{Q′,P,V}(X1 ∧ Y|X2, S),
Theorem 4.4. For all E > 0, for MAC with random parameter the
following inclusions are valid
For this model when the states are unknown at the encoders and
the decoder the mappings (f1 , f2 , g) are f1 : M1 → X1N , f2 : M2 → X2N
and g : Y N → M1 × M2 . Then
In this case the bounds in Theorem 4.4 take the following form:
Rr(P*, E, WR) ≜ {(R1, R2) : R1 ≥ 0, R2 ≥ 0,
R1 ≤ min_{P,V: D(P∘V‖P*∘W*) ≤ E} |I_{P,V}(X1 ∧ X2, Y) + D(P ∘ V‖P* ∘ W*) − E|⁺,
R2 ≤ min_{P,V: D(P∘V‖P*∘W*) ≤ E} |I_{P,V}(X2 ∧ X1, Y) + D(P ∘ V‖P* ∘ W*) − E|⁺,
and
Rsp(P, E, WR) ≜ {(R1, R2) :
0 ≤ R1 ≤ min_{Q′,V: D(Q′∘V‖Q∘W|P) ≤ E} I_{Q′,P,V}(X1 ∧ Y, S|X2),
0 ≤ R2 ≤ min_{Q′,V: D(Q′∘V‖Q∘W|P) ≤ E} I_{Q′,P,V}(X2 ∧ Y, S|X1),
R1 + R2 ≤ min_{Q′,V: D(Q′∘V‖Q∘W|P) ≤ E} I_{Q′,P,V}(X1, X2 ∧ Y, S)}.
When E → 0, we obtain the inner and outer estimates for the channel capacity region, the expressions of which, as in the previous case, are similar but differ by the PDs P and P*. The inner bound is
Rr(P*, WR) ≜ {(R1, R2) :
0 ≤ Ri ≤ I_{Q,P*,W}(Xi ∧ Y|X_{3−i}, S), i = 1, 2,
R1 + R2 ≤ I_{Q,P*,W}(X1, X2 ∧ Y|S)}.
The case when the states of the channel are known at the encoders and unknown at the decoder is characterized by encodings f1 : M1 × S^N → X1^N and f2 : M2 × S^N → X2^N and decoding g : Y^N → M1 × M2. Then
Q = {Q(s), s ∈ S},
Pi* = {Pi*(ui, xi|s), xi ∈ Xi, ui ∈ Ui}, i = 1, 2,
P* = {P1*(u1, x1|s)P2*(u2, x2|s), x1 ∈ X1, x2 ∈ X2},
P = {P(u1, u2, x1, x2|s), x1 ∈ X1, x2 ∈ X2},
and
V = {V (y|x1 , x2 , s), s ∈ S, x1 ∈ X1 , x2 ∈ X2 , y ∈ Y}
Another case is when the states are known at one of the encoders and unknown at the other encoder and at the decoder. For definiteness we shall assume that the first encoder has information about the state of the channel. Then the code consists of the following mappings: f1 : M1 × S^N → X1^N and f2 : M2 → X2^N are encodings and g : Y^N → M1 × M2 is the decoding. The probability of erroneous transmission of messages m1 and m2 is
e(m1, m2, s) = W^N{Y^N − g^{−1}(m1, m2) | f1(m1, s), f2(m2), s}.
Let the auxiliary RV U take values in some finite set U and
Q = {Q(s), s ∈ S},
P1∗ = {P1∗ (u, x1 |s), x1 ∈ X1 , u ∈ U},
P2∗ = {P2∗ (x2 ), x2 ∈ X2 },
P ∗ = {P1∗ (u, x1 |s)P2∗ (x2 ), x1 ∈ X1 , x2 ∈ X2 },
P = {P (u, x1 , x2 |s), x1 ∈ X1 , x2 ∈ X2 },
V = {V (y|x1 , x2 , s), s ∈ S, x1 ∈ X1 , x2 ∈ X2 , y ∈ Y}.
The random coding bound of the E-capacity region will be
Rr(P*, E, WR) ≜ {(R1, R2) :
R1 ≤ min_{Q′,P,V: D(Q′∘P∘V‖Q∘P*∘W) ≤ E} |I_{Q′,P,V}(U ∧ Y, X2) − I_{Q′,P1*}(U ∧ S) + D(Q′ ∘ P ∘ V‖Q ∘ P* ∘ W) − E|⁺,
R2 ≤ min_{Q′,P,V: D(Q′∘P∘V‖Q∘P*∘W) ≤ E} |I_{Q′,P,V}(X2 ∧ Y, U) − I_{Q′,P1*}(U ∧ S) + D(Q′ ∘ P ∘ V‖Q ∘ P* ∘ W) − E|⁺,
0 ≤ R1 + R2 ≤ min_{Q′,P,V: D(Q′∘P∘V‖Q∘P*∘W) ≤ E} |I_{Q′,P,V}(U, X2 ∧ Y)

Rr(P*, WR) = {(R1, R2) :
0 ≤ R1 ≤ I_{Q,P*,W}(U ∧ Y|X2) − I_{Q,P1*}(U ∧ S),
0 ≤ R2 ≤ I_{Q,P*,W}(X2 ∧ Y|U) − I_{Q,P1*}(U ∧ S),
R1 + R2 ≤ I_{Q,P*,W}(U, X2 ∧ Y) − I_{Q,P1*}(U ∧ S)}.
where Rsp(Q, P, V) and Rr(Q, P, V) are defined in (4.4) and (4.5), respectively.
Theorem 4.5. For all E > 0, for the arbitrarily varying channel with state sequence known to the sender the following inequalities are valid.
Note that when E → 0 we obtain the upper and the lower bounds for capacity:
Rsp(W) ≜ min_{Q∈Q(S)} Rsp(WQ),
Rr(W) ≜ min_{Q∈Q(S)} Rr(WQ).
5
Source Coding Rates Subject to Fidelity and Reliability Criteria
P* = {P*(x), x ∈ X}   (5.1)
The finite set X̂, different in general from X, is the reproduction alphabet at the receiver. Let
d : X × X̂ → [0; ∞)   (5.3)
We call the code, denoted (f, g), the family of two mappings: a coding
and a decoding
N^{−1} log L(N) → R
Proof. It follows from Lemma 5.2 that for any fixed ∆ ≥ 0 and P* for
every 0 < E2 ≤ E1 the following inequalities hold:

E = inf_{P: R(∆,P) > R} D(P ‖ P*),

since now only the identity matrix satisfies the condition on the expectation under the minimum in (5.14). This is the solution to the lossless case.
Taking into account these two facts, namely (5.15) and (5.16), it
is straightforward to determine the minimum asymptotic rate sufficient
for lossless (zero-distortion) transmission of the source under the reliability requirement. Let us denote by R(E, P*) the special case of
the rate-reliability-distortion function R(E, ∆, P*) for ∆ = 0 and a
generic PD P*, and call it the rate-reliability function in source
coding.
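As a numerical sketch (our own illustration, not from the text), the rate-reliability function of a binary DMS can be evaluated directly from the representation R(E, P*) = max{H_P(X) : D(P ‖ P*) ≤ E} by a grid search over binary PDs; the value p* = 0.15 below is an assumed example.

```python
import math

def h(p):
    """Binary entropy in bits; h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def kl(p, q):
    """Binary divergence D({p, 1-p} || {q, 1-q}) in bits, 0 < q < 1."""
    s = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            s += a * math.log2(a / b)
    return s

def rate_reliability(E, p_star, steps=100_000):
    """R(E, P*) = max{ H(P) : D(P || P*) <= E } over binary PDs P = {p, 1-p}."""
    best = 0.0
    for i in range(steps + 1):
        p = i / steps
        if kl(p, p_star) <= E:
            best = max(best, h(p))
    return best

# Small E: the maximizing P sits on the boundary D(P || P*) = E.
print(rate_reliability(0.05, 0.15))
# E large enough that the equiprobable PD belongs to alpha(E, P*): rate = 1 bit.
print(rate_reliability(2.0, 0.15))
```

Consistent with the surrounding discussion, the rate grows with E and saturates at log |X| = 1 bit once the equiprobable PD becomes admissible.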
We define a code (f, g) for the message vectors of the type P with the
encoding

f(x) = { j,  when x ∈ C(P, QP, j), P ∈ α(E + δ, P*),
       { j0, when x ∈ T_P^N(X), P ∉ α(E + δ, P*),

and the decoding

g(j) = x̂_j,  g(j0) = x̂_0,

where j0 is a fixed number and x̂_0 is a fixed reconstruction vector. So,
it is not difficult to see that in our coding scheme an error occurs only
when j0 was sent by the coder.
According to the definition of the code (f, g), Lemma 5.5, and the
inequality (5.12) we have for P ∈ α(E + δ, P*)

d(x, x̂_j) = N^{-1} Σ_{x,x̂} n(x, x̂ | x, x̂_j) d(x, x̂)
          = Σ_{x,x̂} P(x) QP(x̂ | x) d(x, x̂)
          = E_{P,QP} d(X, X̂) ≤ ∆,  j = 1, J(P, QP).

For a fixed type P and a corresponding conditional type QP the number
of vectors x̂ used in the encoding, denoted by L_{P,QP}(N), is

L_{P,QP}(N) = exp{N(I_{P,QP}(X ∧ X̂) + ε)}.
Using the polynomial upper estimate (Lemma 1.1) for the number of
conditional types QP, we have for N sufficiently large

|A ∩ T_P^N(X)| ≤ (N + 1)^{|X|} max_{QP} |A ∩ T_{QP}^N(X)| ≤ exp{Nε/2} max_{QP} |A ∩ T_{QP}^N(X)|. (5.28)
From the last inequality, (5.27) and (5.28) it is not difficult to arrive at
the inequality

for any P ∈ α(E − ε, P*). This affirms the inequality (5.26) and hence
(5.25).
Proof. First note that α(E, P*) is a convex set, that is, if P′ ∈ α(E, P*)
and P″ ∈ α(E, P*), then λP′ + (1 − λ)P″ ∈ α(E, P*),
because

D(λP′ + (1 − λ)P″ ‖ P*) ≤ λD(P′ ‖ P*) + (1 − λ)D(P″ ‖ P*)
                        ≤ λE + (1 − λ)E = E,
5.4 Binary Hamming Rate-Reliability-Distortion Function
Proof. For fixed E let the points (∆1, R1) and (∆2, R2) belong to the
curve of R(E, ∆, P*) and ∆1 ≤ ∆2. We shall prove that for every λ
from (0, 1),

R(E, λ∆1 + (1 − λ)∆2, P*) ≤ λR(E, ∆1, P*) + (1 − λ)R(E, ∆2, P*).

Consider for any fixed PD P* the rate-distortion function (5.14). Using
the fact that the rate-distortion function R(∆, P*) is a convex function
in ∆, one can readily deduce

R(E, λ∆1 + (1 − λ)∆2, P*) = max_{P∈α(E,P*)} R(λ∆1 + (1 − λ)∆2, P)
  ≤ max_{P∈α(E,P*)} [λR(∆1, P) + (1 − λ)R(∆2, P)]
  ≤ λR(E, ∆1, P*) + (1 − λ)R(E, ∆2, P*).
Proof. In (5.17) the maximization is taken over the convex set α(E, P*).
From the concavity of HP(X) in P, it follows that the maximum is
attained on the boundary of α(E, P*), that is, at those P for which
D(P ‖ P*) = E, unless the equiprobable PD (1/|X|, 1/|X|, ..., 1/|X|)
belongs to α(E, P*), where the entropy attains its maximum value.

Let E1 and E2 be arbitrary values from the definitional domain of
R(E, P*), i.e., Ei ∈ (0, ∞), i = 1, 2, with 0 < E1 < E2. And let

R(E1, P*) = H_{P_{E1}}(X), (5.29)
and

R_BH(∆, P*) = { H_{P*}(X) − H_∆(X),  0 ≤ ∆ ≤ min{p*, 1 − p*},
             { 0,                    ∆ > min{p*, 1 − p*}.   (5.31)
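Formula (5.31) is simple enough to evaluate directly; the following sketch (with the assumed value p* = 0.15, matching the figures cited in the next chapter) computes the binary Hamming rate-distortion function.

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def r_bh(delta, p_star):
    """Binary Hamming rate-distortion function R_BH(delta, P*) from (5.31)."""
    if delta > min(p_star, 1 - p_star):
        return 0.0
    return h(p_star) - h(delta)

print(r_bh(0.0, 0.15))   # lossless case: the full entropy H(P*)
print(r_bh(0.1, 0.15))   # positive rate below min{p*, 1 - p*}
print(r_bh(0.2, 0.15))   # distortion above min{p*, 1 - p*}: zero rate
```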
Using the following shortening

p_max ≜ max_{P∈α(E,P*)} min{p, 1 − p},

we have

max_{P∈α(E,P*)} HP(X) = 1,

which holds when (1/2) log(1/(2p*)) + (1/2) log(1/(2(1 − p*))) ≤ E, or

p*(1 − p*) ≥ 2^{−2(E+1)}, (5.33)

that is, p*^2 − p* + 2^{−2(E+1)} ≤ 0.
Therefore, the lemma is proved (at the same time proving that pmax =
pE ) and (5.34) is obtained, which gives us (5.32).
Note that, in particular, E_inf(∆) can take the value 0, providing
the concavity of the binary Hamming rate-reliability-distortion function R_BH(E, ∆, P*) on the whole domain (0, ∞). The explicit form
(5.32) of the function allows us to conclude that this concavity always
holds when R_BH(∆, P*) > 0; when R_BH(∆, P*) = 0, it holds under the
condition R_BH(E, ∆, P*) > R_BH(∆, P*) for all values of E from (0, ∞).
We will turn to some illustrations of this point in an example concerning
the robust descriptions system elaborated in Section 6.1 of the next
chapter.
maximum error exponent one in the setting of Section 5.2. The formulas derived there constitute more general results implying the main
formulas discussed in the previous sections of this review. In particular,
in the zero-distortion case these formulas specialize to the ones derived
by Fu and Shen [77] from their hypothesis testing analysis for the AVS.

The model of the AVS is more general than that of the DMS: in
the former, the distribution of the source outputs depends on the source
state, which varies within a finite set from one time instant to the next
in an arbitrary manner.
Let X and S be finite sets, X assigned for the source alphabet and
S for the source states, and let

P ≜ {Ps, s ∈ S}

be a family of discrete PDs

Ps ≜ {P(x | s) : x ∈ X}

on X. The probability of an x ∈ X^N subject to a sequence of source
states s is determined by

P^N(x | s) ≜ ∏_{n=1}^{N} P(x_n | s_n).

Therefore, the AVS defined by P is a sequence of RVs {X_i}_{i=1}^∞ for
which the PD of the N-length RV X^N is an unknown element from P^N,
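The product formula for P^N(x | s) can be sketched directly; the alphabets, states, and conditional PDs in the snippet below are hypothetical.

```python
# Hypothetical family P: conditional PD of the source letter given the state.
P = {
    's0': {'a': 0.9, 'b': 0.1},
    's1': {'a': 0.3, 'b': 0.7},
}

def avs_prob(x, s):
    """P^N(x | s) = prod_{n=1}^{N} P(x_n | s_n) for an arbitrarily varying source."""
    assert len(x) == len(s)
    prob = 1.0
    for xn, sn in zip(x, s):
        prob *= P[sn][xn]
    return prob

# The state sequence s varies arbitrarily from one time instant to the next.
print(avs_prob(['a', 'b', 'a'], ['s0', 's1', 's0']))  # 0.9 * 0.7 * 0.9
```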
Theorem 5.13. Let the AVS be given by the family P. For any ∆ ≥ 0,

R(∆) = max_{P∈P̄} R(∆, P), (5.35)
Theorem 5.14. Let the AVS be given by the family P. For any E > 0
and ∆ ≥ 0,

or, equivalently,

R(E, ∆) = max_{P∈P̄} max_{Q: D(Q‖P)≤E} R(∆, Q). (5.37)
Equation (5.36) is the first general formula, which implies several others
afterwards. Evidently, R(∆, P) is easily derivable from it.
The formula for the AVS maximum error exponent function E(R, ∆),
inverse to R(E, ∆), is the second general result.
Theorem 5.17.
for
Corollary 5.8. R(E, ∆) and R(E, ∆, P) have the same limit as E → ∞,
namely the zero-error rate-distortion function R̄(∆) (see [51]). In
particular, (5.36) or (5.37) implies

R̄(∆) = max_{P∈P(X)} R(∆, P).
These were the main issues concerning AVS coding that we
aimed to survey here. From the claims cited above one may be convinced
of the generality of the results on source coding under distortion and
error-exponent criteria. They specialize without effort to the DMS
models, which have been studied more intensively in the field.
with Q ≜ {q, 1 − q} and Q_E ≜ {q_E, 1 − q_E} binary PDs on the source
alphabet X, Q′ ≜ {q′, 1 − q′} another binary PD on X, and

p_max() ≜ max_{Q: p ≤ q ≤ p+} max_{Q′: D(Q′‖Q)≤E} min{q′, 1 − q′}.
6 Reliability Criterion in Multiterminal Source Coding
Fig. 6.2 (a) R_BH(E^1, ∆^1, P*) for p* = 0.15, ∆^1 = 0.1; (b) for p* = 0.15, ∆^1 = 0.3.

Fig. 6.4 R_BH(E, ∆, P*) as a function of ∆^1 and ∆^2 for P* = (0.15, 0.85), E^1 = 0.09, E^2 = 0.49.
E_{P,QP} d_k(X, X^k) = Σ_{x,x^k} P(x) QP(x^k | x) d_k(x, x^k) ≤ ∆^k,  k = 1, K.
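The expected-distortion constraint just displayed can be checked numerically; the source PD, test channel, distortion measure, and threshold in this sketch are all hypothetical.

```python
# Hypothetical binary example: P is the source PD, Q the conditional PD of the
# reproduction given the source letter, d the Hamming distortion measure.
P = {0: 0.85, 1: 0.15}
Q = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.40, 1: 0.60}}
d = lambda x, xh: 0 if x == xh else 1

def expected_distortion(P, Q, d):
    """E_{P,Q} d(X, X^) = sum over x, x^ of P(x) Q(x^ | x) d(x, x^)."""
    return sum(P[x] * Q[x][xh] * d(x, xh) for x in P for xh in Q[x])

delta = 0.11
ed = expected_distortion(P, Q, d)
print(ed, ed <= delta)  # 0.85*0.05 + 0.15*0.40 = 0.1025, within the threshold
```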
Fig. 6.5 R_BH(E, ∆, P*) as a function of E^1 and E^2 for P* = (0.15, 0.85), ∆^1 = 0.1, ∆^2 = 0.13.

Fig. 6.6 R_BH(E, ∆, P*) for P* = (0.15, 0.85), ∆^1 = 0.1, ∆^2 = 0.13, E^1 = 0.09.
R(E, ∆, P ∗ ) = R∗ (E, ∆, P ∗ ).
For E^1 such that (1/2, 1/2) ∈ α(E^1, P*), Theorem 5.9 results in the
following reduction:

R_BH(E^1, ∆^1, P*) = { 1 − H_{∆^1}(X),  when ∆^1 ≤ 1/2,
                     { 0,               when ∆^1 > 1/2.   (6.1)
D(P_{E^1} ‖ P*) = E^1.

P_{E^2} is defined analogously. For E^1 such that (1/2, 1/2) ∉ α(E^1, P*),

R_BH(E, ∆, P*) = max[ max_{P∈α(E^1,P*)} min_{QP∈Q^1(P,∆)} I_{P,QP}(X ∧ X^1, X^2),
                      max_{P∈α(E^2,P*)−α(E^1,P*)} min_{QP∈Q^2(P,∆)} I_{P,QP}(X ∧ X^2) ].
6.1 Robust Descriptions System
For E^1 such that (1/2, 1/2) ∉ α(E^1, P*)

For E^2 such that (1/2, 1/2) ∉ α(E^2, P*) − α(E^1, P*)

and when

then

R_BH(E, ∆, P*) = max[H_{P_{E^1}}(X) − H_{∆^1}(X), H_{P_{E^2}}(X) − H_{∆^2}(X)],

and when

then
where x ∈ X^N, y ∈ Y^N, x1 ∈ X1^N, x2 ∈ X2^N, y1 ∈ Y1^N, y2 ∈ Y2^N.

For the considered system the code (f, f̂, g, ĝ) is a family of four
mappings:

f: X^N × Y^N → {1, 2, ..., K(N)},
f̂: {1, 2, ..., K(N)} → {1, 2, ..., L(N)},
g: {1, 2, ..., K(N)} → X1^N × Y1^N,
ĝ: {1, 2, ..., L(N)} → X2^N × Y2^N.
Define the sets

A_i ≜ {(x, y) : g(f(x, y)) = (x1, y1), ĝ(f̂(f(x, y))) = (x2, y2),
      d_x^i(x, xi) ≤ ∆_x^i, d_y^i(y, yi) ≤ ∆_y^i},  i = 1, 2.
For given levels of admissible distortions ∆_x^1 ≥ 0, ∆_x^2 ≥ 0, ∆_y^1 ≥ 0,
∆_y^2 ≥ 0 the error probabilities of the code are

and

e_i(f, f̂, g, ĝ, P*, ∆_x^i, ∆_y^i) ≤ exp{−N(E_i − δ)},  i = 1, 2. (6.5)
6.2 Cascade System Coding Rates
Let R(E, ∆, P*) be the set of all pairs of (E, ∆)-achievable rates and
R(∆, P*) be the corresponding rate-distortion region. If at the first
decoder only the messages of the source X and at the second decoder
only the messages of the source Y are reconstructed, then R(∆, P*)
becomes the set R1(∆_x^1, ∆_y^2, P*) of all (∆_x^1, ∆_y^2)-achievable rate pairs
studied by Yamamoto in [223].

We present the rate-reliability-distortion region R(E, ∆, P*) without proof.
Let P = {P(x, y), x ∈ X, y ∈ Y} be some PD on X × Y and

Q = {Q(x1, y1, x2, y2 | x, y), x ∈ X, y ∈ Y, x1 ∈ X1, y1 ∈ Y1, x2 ∈ X2, y2 ∈ Y2}

be a conditional PD. Consider the inequalities

E_{P,QP} d_x^1(X, X1) = Σ_{x,y,x1,y1,x2,y2} P(x, y) QP(x1, y1, x2, y2 | x, y) d_x^1(x, x1) ≤ ∆_x^1, (6.6)

E_{P,QP} d_y^1(Y, Y1) = Σ_{x,y,x1,y1,x2,y2} P(x, y) QP(x1, y1, x2, y2 | x, y) d_y^1(y, y1) ≤ ∆_y^1, (6.7)

E_{P,QP} d_x^2(X, X2) = Σ_{x,y,x1,y1,x2,y2} P(x, y) QP(x1, y1, x2, y2 | x, y) d_x^2(x, x2) ≤ ∆_x^2, (6.8)

E_{P,QP} d_y^2(Y, Y2) = Σ_{x,y,x1,y1,x2,y2} P(x, y) QP(x1, y1, x2, y2 | x, y) d_y^2(y, y2) ≤ ∆_y^2. (6.9)
otherwise, if P ∈ α(E^2, P*) − α(E^1, P*), then only (6.8) and (6.9) hold.

(2) E^2 ≤ E^1. If P ∈ α(E^2, P*), let the set Q(P, E, ∆) consist
of those conditional PDs QP which make the inequalities
(6.6)–(6.9) valid; otherwise, if P ∈ α(E^1, P*) − α(E^2, P*),
then only (6.6) and (6.7) are satisfied. When E^1 = E^2 = 0,
the notation Q(P*, ∆) specializes Q(P, E, ∆).
Introduce the following sets:

(1) for E^1 ≤ E^2,

R1(E, ∆, P*) = ∩_{P∈α(E^1,P*)} ∪_{QP∈Q(P,E,∆)} {(R, R̂) :
    R ≥ I_{P,QP}(X, Y ∧ X1, Y1, X2, Y2),
    R̂ ≥ I_{P,QP}(X, Y ∧ X2, Y2)},

R2(E, ∆, P*) = ∩_{P∈α(E^2,P*)−α(E^1,P*)} ∪_{QP∈Q(P,E,∆)} {(R, R̂) :
    R ≥ R̂,
    R̂ ≥ I_{P,QP}(X, Y ∧ X2, Y2)},

(2) for E^2 ≤ E^1,

R3(E, ∆, P*) = ∩_{P∈α(E^2,P*)} ∪_{QP∈Q(P,E,∆)} {(R, R̂) :
    R ≥ max[ I_{P,QP}(X, Y ∧ X1, Y1, X2, Y2),
             max_{P∈α(E^1,P*)−α(E^2,P*)} min_{QP∈Q(P,E,∆)} I_{P,QP}(X, Y ∧ X1, Y1) ],
    R̂ ≥ I_{P,QP}(X, Y ∧ X2, Y2)}.
6.3 Reliability Criterion in Successive Refinement
R ≥ I_{P,QP}(X, Y ∧ X1, Y1, X2, Y2),
R̂ ≥ I_{P,QP}(X, Y ∧ X2, Y2).

R ≥ I_{P*,QP*}(X, Y ∧ X1, Y1, X2, Y2),
R̂ ≥ I_{P*,QP*}(X, Y ∧ X2, Y2).
Corollary 6.4. If the first decoder recovers only the messages of the
source X and the second decoder recovers only the messages of Y, we
obtain the result by Yamamoto [223]:

R(∆, P*) = ∪_{QP*∈Q(P*,∆)} {(R, R̂) : R ≥ I_{P*,QP*}(X, Y ∧ X1, Y2),
                                      R̂ ≥ I_{P*,QP*}(X, Y ∧ Y2)}.
f_k: X^N → {1, 2, ..., L_k(N)}, k = 1, 2,
g1: {1, 2, ..., L1(N)} → X1^N,
g2: {1, 2, ..., L1(N)} × {1, 2, ..., L2(N)} → X2^N,

and only

E_{P,QP} d2(X, X2) ≤ ∆2
Or, equivalently,

R_{P*}(E1, E2, ∆1, ∆2) = ∩_{P∈α(E2,P*)} ∪_{QP∈Q(P,E1,E2,∆1,∆2)} {(R1, R2) :
    R1 ≥ I_{P,QP}(X ∧ X1),
    R1 + R2 ≥ max[ max_{P∈α(E2,P*)−α(E1,P*)} R(∆2, P), I_{P,QP}(X ∧ X1, X2) ]}. (6.12)

Note that the equivalency of (6.10) and (6.11) is due to the fact
that R(∆1, P) ≤ I_{P,QP}(X ∧ X1) for each P ∈ α(E2, P*) and the representation (5.16).
Or, equivalently,

R_{P*}(E1, E2, ∆1, ∆2) = ∩_{P∈α(E1,P*)} ∪_{QP∈Q(P,E1,E2,∆1,∆2)} {(R1, R2) :
    R1 ≥ I_{P,QP}(X ∧ X1),
    R1 + R2 ≥ max(R(E2, ∆2, P*), I_{P,QP}(X ∧ X1, X2))}. (6.13)
R_{P*}(∆1, ∆2) = ∪_{QP*∈Q(P*,∆1,∆2)} {(R1, R2) : R1 ≥ I_{P*,QP*}(X ∧ X1),
    R1 + R2 ≥ I_{P*,QP*}(X ∧ X1, X2)}. (6.15)
E1 ≥ E2 case: For this situation, from (6.11) it follows that the rate
pair (6.16) is achievable iff for each P ∈ α(E2, P*) there exists a QP ∈
Q(P, E1, E2, ∆1, ∆2) such that the inequalities

R(E2, ∆2, P*) ≥ I_{P,QP}(X ∧ X1, X2) ≥ I_{P,QP}(X ∧ X2) ≥ R(∆2, P*) (6.20)

hold for (6.18).
By the corollary (5.16) for the rate-reliability-distortion function it
follows that (6.19) and (6.20) hold for each P ∈ α(E2 , P ∗ ) iff there exist
a PD P̄ ∈ α(E2 , P ∗ ) and a conditional PD QP̄ ∈ Q(P̄ , E1 , E2 , ∆1 , ∆2 ),
such that X → X 2 → X 1 forms a Markov chain in that order and at
the same time
and
and
and
Then, recalling (5.16) again, the inequalities (6.27) and (6.28) in turn
hold for each P ∈ α(E1, P*) iff

and meantime

Now, noting that the right-hand side of the last inequality does not
depend on E2 and that the function R(E2, ∆2, P*) is monotonically nondecreasing in E2, we arrive at the conclusion that (6.30) will be satisfied
for Q̄_P meeting (6.29) iff E2 ≥ Ê2, where
where the right-hand side expression is the value of the zero-error rate-
distortion function (5.19) for the second hierarchy.
Finally note that we obtain the conditions for the successive refinability in distortion sense [70, 157], and [187], letting E1 = E2 = E → 0,
as in (6.21) and (6.22). We quote those particularized conditions according to the theorem by Equitz and Cover [70].
R1 = max_{P∈α(E1,P*)} HP(X),

R1 + R2 = max_{P∈α(E2,P*)} HP(X),
7.1 Prelude

This section illustrates the usefulness of combinatorial methods
developed in information theory for the investigation of logarithmically asymptotically optimal (LAO) testing of statistical hypotheses
(see also [49, 51, 54, 167]).
Applications of information-theoretic methods in mathematical
statistics are reflected in the monographs by Kullback [160], Csiszár
and Körner [51], Cover and Thomas [48], Blahut [35], Han [94], Ihara
[147], Chen and Alajaji [43], and Csiszár and Shields [54]. Verdú's book [216]
on multi-user detection is an important information theory reference
where hypothesis testing is in the service of information theory in its
practical elaborations.

The paper by Dobrushin, Pinsker, and Shiryaev [59] presents a series
of prospective problems on this subject. Blahut [34] used the results
of statistical hypothesis testing for the solution of information-theoretic
problems.
For series of independent experiments the well-known Stein's
lemma [51] shows that for a given fixed first-kind error probability
α1^(N) = α1, the exponential rate of convergence to zero of the second-kind
error probability α2^(N), as the number of experiments tends to infinity,
is as follows:

lim_{N→∞} N^{-1} log α2^(N)(α1) = −D(P1 ‖ P2),
α1(φN) ≤ e^{−N E1},

takes its maximal value, denoted by E2(E1). By analogy with the notion
introduced by Shannon into information theory (see (2.7), (2.12)),
it is natural to refer to the function E2(E1) also as a reliability function.
Let P = {P(j|i), i = 1, I, j = 1, I} be a matrix of transition probabilities of a stationary Markov chain with the same state set X,
and let Q = {Q(i), i = 1, I} be the corresponding stationary distribution.
Let us denote by D(Q ◦ P ‖ Ql ◦ Pl) the Kullback–Leibler divergence of the
distribution

Q ◦ P = {Q(i)P(j|i), i = 1, I, j = 1, I}

from the distribution Ql ◦ Pl, where

D(Q ◦ P ‖ Ql ◦ Pl) = Σ_{i,j} Q(i)P(j|i)[log Q(i)P(j|i) − log Ql(i)Pl(j|i)]
                   = D(Q ‖ Ql) + D(Q ◦ P ‖ Q ◦ Pl),

with

D(Q ‖ Ql) = Σ_i Q(i)[log Q(i) − log Ql(i)],  l = 1, 2.
Theorem 7.1. For any E1 > 0 the reliability function E2(E1) of LAO
tests for testing of two hypotheses P1 and P2 concerning Markov
chains is given by the formula:

then
Proof. The proof of Theorem 7.1 consists of two parts. In the first part,
by means of construction of a test sequence, it is proved that E2(E1)
is not less than the expression noted in (7.1). In the second part it is
proved that E2(E1) cannot be greater than this expression. Let us call
the second-order type of a vector x (cf. [87, 148]) the square matrix of I^2
relative frequencies {N(i,j)N^{-1}, i = 1, I, j = 1, I} of the simultaneous
appearance of the states i and j at pairs of neighboring places. It is
clear that Σ_{i,j} N(i,j) = N. Denote by T^N_{Q◦P} the set of vectors from X^{N+1}
which have the type such that for some joint PD Q ◦ P

N(i,j) = N Q(i) P(j|i),  i = 1, I, j = 1, I.
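The second-order type of a trajectory is just the matrix of neighboring-pair frequencies; a minimal sketch:

```python
from collections import Counter

def second_order_type(x):
    """Relative frequencies N(i, j)/N of the pairs of neighboring states in x."""
    N = len(x) - 1                       # x has N + 1 entries
    counts = Counter(zip(x, x[1:]))      # simultaneous appearances of (i, j)
    return {pair: n / N for pair, n in counts.items()}, N

x = [0, 1, 1, 0, 1, 1, 1, 0, 0]          # a trajectory of N + 1 = 9 states
t, N = second_order_type(x)
print(N, t)
print(sum(t.values()))                   # sum_{i,j} N(i, j) = N, so this is 1
```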
Note that if the vector x ∈ T^N_{Q◦P}, then

Σ_j N(i,j) = N Q(i),  i = 1, I,
Σ_i N(i,j) = N Q′(j),  j = 1, I,
|N Q(i) − N Q′(i)| ≤ 1,  i = 1, I,
The test φN(x) can be given by the set B(E1), the part of the space
X^{N+1} in which the hypothesis P1 is adopted. We shall verify that the
test for which

B(E1) = ∪_{Q,P: D(Q◦P ‖ Q◦P1) ≤ E1, ∃Q1: D(Q ‖ Q1) < ∞} T^N_{Q◦P} (7.2)
D(Q ◦ P ‖ Ql ◦ Pl) = ∞,

but since

Ql ◦ Pl^N(T^N_{Q◦P}) = exp{−N(D(Q ◦ P ‖ Q ◦ Pl) + o(1))},

where

o(1) = max_l max_i {|N^{-1} log Ql(i)| : Ql(i) > 0},

and

max_i {|N^{-1} log Ql(i)| : Ql(i) > 0} → 0, when N → ∞.
Indeed, this is not difficult to verify taking into account that the number
|T^N_{Q◦P}| of vectors in T^N_{Q◦P} is equal to

exp{N(−Σ_{i,j} Q(i)P(j|i) log P(j|i) + o(1))}.

By analogy with (1.2) the number of different types T^N_{Q◦P} does not
exceed (N + 1)^{|X|^2}; then
Coming to the second part of the proof, note that for any x ∈ T^N_{Q◦P}
the probability Ql ◦ Pl^N(x) is constant. Hence for the optimal test the
corresponding set B′(E1) contains only whole types T^N_{Q◦P}. Now we

Remark 7.1. The test given by the set B(E1) is robust because it is
the same for different alternative hypotheses P2.
This probability is called [36] the error probability of the rth kind of
the test φN. The square matrix of L^2 error probabilities A(φN) =
{α^(N)_{l|r}(φ), r = 1, L, l = 1, L} is sometimes called the power of the tests.
For every trajectory x the test φN results in the choice of one
hypothesis among the L, so the space X^{N+1} is divided into
L parts
and

Denote

E_{l|r}(φ) = lim_{N→∞} −(1/N) log α_{l|r}(φN),  r, l = 1, L. (7.4)

R_l ≜ {Q◦P : D(Q◦P ‖ Q◦Pl) ≤ E_{l|l}, ∃Ql : D(Q ‖ Ql) < ∞},  l = 1, L − 1,

R_L ≜ {Q◦P : D(Q◦P ‖ Q◦Pl) ≥ E_{l|l},  l = 1, L − 1},

E*_{L|r}(E_{1|1}, ..., E_{L−1|L−1}) ≜ inf_{Q◦P∈R_L} D(Q◦P ‖ Q◦Pr),  r = 1, L − 1,

E*_{L|L}(E_{1|1}, ..., E_{L−1|L−1}) ≜ min_{l=1,L−1} E*_{l|L}.
The minimum over the empty set will always be taken equal to infinity.
We name the following conditions compatibility conditions:

positive.

(2) If the compatibility conditions are violated, then for any tests
at least one element of the reliability matrix will be equal to zero, that
is, the corresponding error probabilities will not decrease exponentially.
(N (1|x)/N, . . . , N (I|x)/N ),
E*_{l|l} = E*_{l|l}(E_{l|l}) = E_{l|l},  l = 1, L − 1,

E*_{l|r} = E*_{l|r}(E_{l|l}) = inf_{P∈R_l} D(P ‖ Pr),  r = 1, L, r ≠ l, l = 1, L − 1,

E*_{L|r} = E*_{L|r}(E_{1|1}, ..., E_{L−1|L−1}) = inf_{P∈R_L} D(P ‖ Pr),  r = 1, L − 1,

E*_{L|L} = E*_{L|L}(E_{1|1}, ..., E_{L−1|L−1}) = min_{l=1,L−1} E*_{l|L}. (7.7)
Note that the parameter E*_{r|l} may be equal to infinity; this may occur
when some measures Pl are not absolutely continuous with respect to
some others.
Theorem 7.2 admits the following form.

· · · (7.8)

0 < E_{r|r} < min[ min_{l=1,r−1} E*_{l|r}(E_{l|l}), min_{l=r+1,L} D(Pl ‖ Pr) ],  r = 2, L − 1,
α^(N)_{l1,l2,...,lK | m1,m2,...,mK},  (m1, m2, ..., mK) ≠ (l1, l2, ..., lK),  mk, lk = 1, L, k = 1, K,
α^(N)_{l=r|m≠r} = Pr^(N)(m ≠ r, l = r) / Pr(m ≠ r) = (Σ_{m:m≠r} Pr(m))^{-1} Σ_{m:m≠r} α^(N)_{r|m} Pr(m).
Theorem 7.4. For the model with different distributions, under the
condition that the probabilities of all L hypotheses are positive, the
reliability E_{m≠r|l=r} for given E_{m=r|l≠r} = E_{r|r} is defined by (7.10).
P1 = {0.10, 0.90},
P2 = {0.65, 0.35},
P3 = {0.45, 0.55},
P4 = {0.85, 0.15},
P5 = {0.23, 0.77}.
In Figures 7.2 and 7.3 the results of calculations of the same dependence are presented for four distributions taken from the previous five.

In [113] Haroutunian and Hakobyan solved the problem of identification of distributions for two independent objects, which will be
presented in the lines to come.
We begin with a lemma from [110] on LAO testing for two indepen-
dent objects and L hypotheses concerning each of them. Let a sequence
Fig. 7.2 The function El=r|m6=r for four distributions taken from five.
The LAO test Φ* is the compound test, and for it the equalities
(7.11) and (7.12) are valid.

7.4 Optimal Testing and Identification for Statistical Hypothesis

For identification the statistician has to answer the question of
whether the pair of distributions (r1, r2) occurred or not. Let us consider two types of error probabilities for each pair (r1, r2), r1, r2 =
1, L. We denote by α^(N)_{(l1,l2)≠(r1,r2)|(m1,m2)=(r1,r2)} the probability that
the pair (r1, r2) is true, but it is rejected. Note that this probability
is equal to α^(N)_{r1,r2|r1,r2}. Let α^(N)_{(l1,l2)=(r1,r2)|(m1,m2)≠(r1,r2)} be the probability that (r1, r2) is accepted when it is not correct. The corresponding reliabilities are E_{(l1,l2)≠(r1,r2)|(m1,m2)=(r1,r2)} = E_{r1,r2|r1,r2} and
E_{(l1,l2)=(r1,r2)|(m1,m2)≠(r1,r2)}. Our aim is to determine the dependence of
E_{(l1,l2)=(r1,r2)|(m1,m2)≠(r1,r2)} on given E_{r1,r2|r1,r2}(ΦN).
Now let us suppose that the hypotheses P1, P2, ..., PL for the two objects
have a priori positive probabilities Pr(r1, r2), r1, r2 = 1, L, and consider
the probability which we are interested in:

α^(N)_{(l1,l2)=(r1,r2)|(m1,m2)≠(r1,r2)}

E_{(l1,l2)=(r1,r2)|(m1,m2)≠(r1,r2)} = min_{(m1,m2):(m1,m2)≠(r1,r2)} E_{r1,r2|m1,m2}. (7.13)
For every LAO test Φ* from (7.9), (7.11), (7.12), and (7.14) we
obtain that

where E^I_{r1|m1}(E_{r1|r1}) and E^II_{r2|m2}(E_{r2|r2}) are determined by (7.7) for, correspondingly, the first and the second objects. For every LAO test Φ*
from (7.9), (7.11), and (7.12) we deduce that

E_{r1,r2|r1,r2} = min_{m1≠r1, m2≠r2}(E^I_{r1|m1}, E^II_{r2|m2}) = min(E^I_{r1|r1}, E^II_{r2|r2}), (7.15)

and each of E^I_{r1|r1}, E^II_{r2|r2} satisfies the following conditions:

0 < E^I_{r1|r1} < min[ min_{l=1,r1−1} E^{*I}_{l|r1}(E_{l|l}), min_{l=r1+1,L} D(Pl ‖ Pr1) ], (7.16)

0 < E^II_{r2|r2} < min[ min_{l=1,r2−1} E^{*II}_{l|r2}(E_{l|l}), min_{l=r2+1,L} D(Pl ‖ Pr2) ]. (7.17)

We can rewrite (7.16) and (7.17) as follows:

0 < E^I_{r1|r1} < min[ min_{l=1,r1−1} D(Pr1 ‖ Pl), min_{l=r1+1,L} D(Pl ‖ Pr1) ], (7.18)

0 < E^II_{r2|r2} < min[ min_{l=1,r2−1} D(Pr2 ‖ Pl), min_{l=r2+1,L} D(Pl ‖ Pr2) ]. (7.19)
where Er1 |m1 (Er1 ,r2 |r1 ,r2 ) and Er2 |m2 (Er1 ,r2 |r1 ,r2 ) are determined by
(7.7).
Finally we obtained
|a|^+    max(a, 0)
⌊a⌋    integer part of the number a
AVC arbitrarily varying channel
AVS arbitrarily varying source
BC broadcast channel
BSC binary symmetric channel
C(W)    capacity of DMC W
C̄(W)    capacity of DMC W for average error probability
C(E, W ) = R(E, W ) E-capacity or rate-reliability function
C0 (W ) zero error capacity of DMC W
CRP channel with random parameter
co(A) the convex hull of the set A
d    distortion measure
d(x, x̂)    average distortion measure of vectors x and x̂
DCC discrete compound channel
DMC discrete memoryless channel
DMS discrete memoryless source
D(P kQ) divergence of PD P from Q
EP X expectation of the RV X
e(f, g, N, W ) the maximal error probability of N -block code
(f, g) for channel W
e(f, g, P ∗ , ∆, N ) error probability of DMS P ∗ N -block code (f, g)
subject to distortion ∆
E error probability exponent (reliability)
E(R, W ) reliability function of DMC W
Esp (R, W ) sphere packing bound for the reliability function
of DMC W
Er (R, W ) random coding bound for the reliability function
of DMC W
Ex (R, W ) expurgated bound for the reliability function of
DMC W
exp, log are to the base two
GCRP generalized channel with random parameter
GIFC general interference channel
g −1 (m) {y : g(y) = m}
HP (X) entropy of RV X with PD P
HP,V (Y |X) conditional entropy of RV Y for given RV X
with PD P and conditional PD V of Y
given X
I_{P,V}(Y ∧ X)    mutual information of RVs X and Y with PD P and conditional PD V
I_{Q,P,V}(Y ∧ X | U)    conditional mutual information of RVs X and Y given RV U
IFC interference channel
L(N ) volume of source code
MAC multiple-access channel
MACRP multiple-access channel with random parameter
M message set
M number of messages of the set M
n = 1, N n = 1, 2, . . . , N
PD probability distribution
P(X ) set of all PD on X
PN (X ) subset of P(X ) consisting of the possible types
of sequences x ∈ X N
P, Q, V, W, . . . PD
References
[42] P.-N. Chen, “General formula for the Neyman-Pearson type-II error exponent
subject to fixed and exponential type-I error bounds,” IEEE Transactions on
Information Theory, vol. 42, no. 1, pp. 316–323, 1996.
[43] P.-N. Chen and F. Alajaji, Lecture Notes in Information Theory. vol. I and
II, http://shannon.cm.nctu.edu.tw.
[44] M. H. M. Costa and A. El Gamal, “The capacity region of the discrete mem-
oryless interference channel with strong interference,” IEEE Transactions on
Information Theory, vol. 33, pp. 710–711, 1987.
[45] T. M. Cover, “Broadcast channels,” IEEE Transactions on Information The-
ory, vol. 18, no. 1, pp. 2–14, 1972.
[46] T. M. Cover, “An achievable rate region for the broadcast channel,” IEEE
Transactions on Information Theory, vol. 21, pp. 399–404, 1975.
[47] T. M. Cover, “Comments on broadcast channels,” IEEE Transactions on
Information Theory, vol. 44, pp. 2524–2530, 1998.
[48] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York:
Wiley, 1991.
[49] I. Csiszár, “The method of types,” IEEE Transactions on Information Theory,
vol. 44, no. 6, pp. 2505–2523, 1998.
[50] I. Csiszár and J. Körner, “Graph decomposition: A new key to coding theo-
rems,” IEEE Transactions on Information Theory, vol. 27, no. 1, pp. 5–12,
1981.
[51] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete
Memoryless Systems. New York: Academic Press, 1981. (Russian translation,
Mir, Moscow, 1985).
[52] I. Csiszár, J. Körner, and K. Marton, “A new look at the error exponent of
a discrete memoryless channel,” in Presented at the IEEE International Sym-
posium on Information Theory, Ithaca, NY: Cornell Univ., 1977. (Preprint).
[53] I. Csiszár and G. Longo, “On the error exponent for source coding and for
testing simple statistical hypotheses,” Studia Scientiarum Mathematicarum
Hungarica, vol. 6, pp. 181–191, 1971.
[54] I. Csiszár and P. Shields, "Information theory and statistics: A tutorial,"
Foundations and Trends in Communications and Information Theory,
Hanover, MA, USA: now Publishers, 2004.
[55] A. Das and P. Narayan, “Capacities of time-varying multiple-access channels
with side information,” IEEE Transactions on Information Theory, vol. 48,
no. 1, pp. 4–25, 2002.
[56] R. L. Dobrushin, “Optimal information transmission in a channel with
unknown parameters,” (in Russian), Radiotekhnika Electronika, vol. 4,
pp. 1951–1956, 1959.
[57] R. L. Dobrushin, “Asymptotic bounds of the probability of error for the trans-
mission of messages over a memoryless channel with a symmetric transition
probability matrix,” (in Russian), Teorija Veroyatnost. i Primenen, vol. 7,
no. 3, pp. 283–311, 1962.
[58] R. L. Dobrushin, “Survey of Soviet research in information theory,” IEEE
Transactions on Information Theory, vol. 18, pp. 703–724, 1972.
[95] T. S. Han and S. Amari, “Statistical inference under multiterminal data com-
pression,” IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2300–
2324, 1998.
[96] T. S. Han and K. Kobayashi, “A new achievable region for the interference
channel,” IEEE Transactions on Information Theory, vol. 27, no. 1, pp. 49–60,
1981.
[97] T. S. Han and K. Kobayashi, “Exponential-type error probabilities for multi-
terminal hypothesis testing,” IEEE Transactions on Information Theory,
vol. 35, no. 1, pp. 2–13, 1989.
[98] E. A. Haroutunian, “Upper estimate of transmission rate for memoryless chan-
nel with countable number of output signals under given error probability
exponent,” in 3rd All Union Conference on Theory of Information Trans-
mission and Coding, Uzhgorod, Publishing House of the Uzbek Academy of
Science, pp. 83–86, Tashkent, 1967. (in Russian).
[99] E. A. Haroutunian, “Estimates of the error probability exponent for a semicon-
tinuous memoryless channel,” (in Russian), Problems on Information Trans-
mission, vol. 4, no. 4, pp. 37–48, 1968.
[100] E. A. Haroutunian, “On the optimality of information transmission by a chan-
nel with finite number of states known at the input,” (in Russian), Izvestiya
Akademii Nauk Armenii, Matematika, vol. 4, no. 2, pp. 81–90, 1969.
[101] E. A. Haroutunian, “Error probability lower bound for the multiple-access
communication channels,” (in Russian), Problems of Information Transmis-
sion, vol. 11, no. 2, pp. 22–36, 1975.
[102] E. A. Haroutunian, “Combinatorial method of construction of the upper
bound for E-capacity,” (in Russian), Mezhvuz. Sbornic Nouchnikh Trudov,
Matematika, Yerevan, vol. 1, pp. 213–220, 1982.
[103] E. A. Haroutunian, “On asymptotically optimal testing of hypotheses concern-
ing Markov chain,” (in Russian), Izvestiya Akademii Nauk Armenii, Matem-
atika, vol. 23, no. 1, pp. 76–80, 1988.
[104] E. A. Haroutunian, “Asymptotically optimal testing of many statistical
hypotheses concerning Markov chain,” in 5th International Vilnius Conference
on Probability Theory and Mathematical Statistics, vol. 1(A–L), pp. 202–203,
1989.
[105] E. A. Haroutunian, “On asymptotically optimal criteria for Markov chains,”
(in Russian), First World Congress of Bernoulli Society, section 2, vol. 2,
no. 3, pp. 153–156, 1989.
[106] E. A. Haroutunian, “Logarithmically asymptotically optimal testing of mul-
tiple statistical hypotheses,” Problems of Control and Information Theory,
vol. 19, no. 5–6, pp. 413–421, 1990.
[107] E. A. Haroutunian, “On Bounds for E-capacity of DMC,” IEEE Transactions
on Information Theory, vol. 53, no. 11, pp. 4210–4220, 2007.
[108] E. A. Haroutunian and B. Belbashir, “Lower estimate of optimal transmission
rates with given error probability for discrete memoryless channel and for
asymmetric broadcast channel,” in 6-th International Symposium on Infor-
mation Theory, pp. 19–21, Tashkent, 1984. (in Russian).
[148] P. Jacquet and W. Szpankowski, "Markov types and minimax redundancy for
Markov sources," IEEE Transactions on Information Theory, vol. 50, no. 7,
pp. 1393–1402, 2004.
[149] J. Jahn, “Coding of arbitrarily varying multiuser channels,” IEEE Transac-
tions on Information Theory, vol. 27, no. 2, pp. 212–226, 1981.
[150] F. Jelinek, “Evaluation of expurgated bound exponents,” IEEE Transactions
on Information Theory, vol. 14, pp. 501–505, 1968.
[151] A. Kanlis and P. Narayan, “Error exponents for successive refinement by par-
titioning,” IEEE Transactions on Information Theory, vol. 42, no. 1, pp. 275–
282, 1996.
[152] V. D. Kolesnik and G. S. Poltirev, Textbook of Information Theory. (in Rus-
sian), Nauka, Moscow, 1982.
[153] J. Körner and K. Marton, “General broadcast channels with degraded message
sets,” IEEE Transactions on Information Theory, vol. 23, pp. 60–64, 1977.
[154] J. Körner and A. Sgarro, “Universally attainable error exponents for broadcast
channels with degraded message sets,” IEEE Transactions on Information
Theory, vol. 26, pp. 670–679, 1980.
[155] V. N. Koshelev, “Multilevel source coding and data-transmission theorem,”
in Proceedings of VII All-Union Conference on Theory of Coding and Data
Transmission, pp. 85–92, Vilnius, U.S.S.R., pt. 1, 1978.
[156] V. N. Koshelev, “Hierarchical coding of discrete sources,” (in Russian), Prob-
lems of Information Transmission, vol. 16, no. 3, pp. 31–49, 1980.
[157] V. N. Koshelev, “An evaluation of the average distortion for discrete scheme
of sequential approximation,” (in Russian), Problems of Information Trans-
mission, vol. 17, no. 3, pp. 20–30, 1981.
[158] V. N. Koshelev, “On divisibility of discrete sources with the single-letter-
additive measure of distortion,” (in Russian), Problems of Information Trans-
mission, vol. 30, no. 1, pp. 31–50, 1994.
[159] B. D. Kudryashov and G. S. Poltyrev, “Upper bounds for decoding error prob-
ability in some broadcast channels,” (in Russian), Problems of Information
Transmission, vol. 15, no. 3, pp. 3–17, 1979.
[160] S. Kullback, Information Theory and Statistics. New York: Wiley, 1959.
[161] E. Levitan and N. Merhav, “A competitive Neyman-Pearson approach to uni-
versal hypothesis testing with applications,” IEEE Transactions on Informa-
tion Theory, vol. 48, no. 8, pp. 2215–2229, 2002.
[162] Y. Liang and G. Kramer, “Rate regions for relay broadcast channels,” IEEE
Transactions on Information Theory, vol. 53, no. 10, pp. 3517–3535, 2007.
[163] Y. N. Lin’kov, “On asymptotical discrimination of two simple statistical
hypotheses,” (in Russian), Preprint 86.45, Kiev, 1986.
[164] Y. N. Lin’kov, “Methods of solving asymptotical problems of two simple sta-
tistical hypotheses testing,” (in Russian), Preprint 89.05, Doneck, 1989.
[165] Y. N. Lin’kov, Asymptotical Methods of Random Processes Statistics, (in Rus-
sian). Naukova Dumka, Kiev, 1993.
[166] Y. S. Liu and B. L. Hughes, “A new universal coding bound for the multiple-
access channel,” IEEE Transactions on Information Theory, vol. 42, pp. 376–
386, 1996.
[167] G. Longo and A. Sgarro, “The error exponent for the testing of simple sta-
tistical hypotheses: A combinatorial approach,” Journal of Combinatorics,
Information and System Sciences, vol. 5, no. 1, pp. 58–67, 1980.
[168] A. A. Lyapunov, “On selection between finite number of distributions,” (in
Russian), Uspekhi Matematicheskikh Nauk, vol. 6, no. 1, pp. 178–186, 1951.
[169] I. Marić, R. D. Yates, and G. Kramer, “Capacity of interference channels with
partial transmitter cooperation,” IEEE Transactions on Information Theory,
vol. 53, no. 10, pp. 3536–3548, 2007.
[170] R. S. Maroutian, “Achievable rates for multiple descriptions with given expo-
nent and distortion levels,” (in Russian), Problems of Information Transmis-
sion, vol. 26, no. 1, pp. 83–89, 1990.
[171] K. Marton, “Error exponent for source coding with a fidelity criterion,” IEEE
Transactions on Information Theory, vol. 20, no. 2, pp. 197–199, 1974.
[172] K. Marton, “A coding theorem for the discrete memoryless broadcast chan-
nel,” IEEE Transactions on Information Theory, vol. 25, pp. 306–311, 1979.
[173] U. M. Maurer, “Authentication theory and hypothesis testing,” IEEE Trans-
actions on Information Theory, vol. 46, no. 4, pp. 1350–1356, 2000.
[174] N. Merhav, “On random coding error exponents of watermarking systems,”
IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 420–430, 2000.
[175] P. Moulin and J. A. O’Sullivan, “Information theoretic analysis of information
hiding,” IEEE Transactions on Information Theory, vol. 49, no. 3, pp. 563–
593, 2003.
[176] P. Moulin and Y. Wang, “Capacity and random-coding exponents for channel
coding with side information,” IEEE Transactions on Information Theory,
vol. 53, no. 4, pp. 1326–1347, 2007.
[177] S. Natarajan, “Large deviations, hypotheses testing, and source coding for
finite Markov chains,” IEEE Transactions on Information Theory, vol. 31,
no. 3, pp. 360–365, 1985.
[178] J. K. Omura, “A lower bounding method for channel and source coding prob-
abilities,” Information and Control, vol. 27, pp. 148–177, 1975.
[179] A. Peres, “Second-type-error exponent given the first-type-error exponent in
the testing of statistical hypotheses by unfitted procedures,” in 6th Interna-
tional Symposium on Information Theory, pp. 277–279, Tashkent, Part 1, 1984.
[180] M. S. Pinsker, “Capacity of noiseless broadcast channels,” (in Russian), Prob-
lems of Information Transmission, vol. 14, no. 2, pp. 28–34, 1978.
[181] M. S. Pinsker, “Multi-user channels,” in II Joint Swedish-Soviet International
workshop on Information Theory, pp. 160–165, Gränna, Sweden, 1985.
[182] J. Pokorny and H. M. Wallmeier, “Random coding bound and codes pro-
duced by permutations for the multiple-access channel,” IEEE Transactions
on Information Theory, vol. 31, pp. 741–750, 1985.
[183] G. S. Poltyrev, “Random coding bounds for some broadcast channels,” (in
Russian), Problems of Information Transmission, vol. 19, no. 1, pp. 9–20,
1983.
[184] H. V. Poor and S. Verdú, “A lower bound on the probability of error in
multihypothesis testing,” IEEE Transactions on Information Theory, vol. 41,
no. 6, pp. 1992–1995, 1995.